Question Is there any reason to use Unsafe Code in Unity in 2023?

Discussion in 'Scripting' started by DTECTOR, Jun 8, 2023.

  1. DTECTOR

    DTECTOR

    Joined:
    Aug 23, 2020
    Posts:
    128
    With things like IL2CPP, jobs and burst as well as improvements to C# structures and allocations over time... Are there any potential use cases for using unmanaged/unsafe C# in a unity project in 2023? And are there any potential pitfalls or conflicts that can arise due to incompatibility issues? For example, does IL2CPP have any issues with unsafe C#? Does anyone have any experience with writing unsafe C# in Unity?
     
  2. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    I use
    unsafe
    stackalloc
    fixed
    and pointers every day.

    Can't think without them. Lets me code like I'm in C.

    They work the same in all versions of Unity.

    Don't require any of the Packages.

    Build times aren't affected. Try using IL2CPP to build, you'll know what I mean. Even adding just Burst and Mathematics blows out build time massively.

    There's no building up of special "Native" arrays and contexts and structs and then issuance of jobs, and being uncertain if they're done, like with Jobs. You just do it. And it just works.

    It's AMAZINGLY FAST!!! For anything on its own thread, it's far faster than Jobs and Burst.

    You can use OnAudioFilterRead to have that thread without any setup!
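
For readers who have not seen these four features together, here is a minimal sketch that uses `unsafe`, `stackalloc`, `fixed` and a raw pointer in one method. The class and method names are made up for illustration, and it assumes "Allow 'unsafe' Code" is enabled in the Player Settings (or `/unsafe` when compiling outside Unity).

```csharp
using System;

public static class PointerBasics
{
    // Multiplies every element by `scale` into a stack buffer, then sums it.
    // Keep the input small: stackalloc memory comes out of the thread's stack.
    public static unsafe float SumScaled(float[] values, float scale)
    {
        int count = values.Length;

        // stackalloc: scratch memory on the stack, released when the method returns.
        float* scratch = stackalloc float[count];

        // fixed: pins the managed array so the GC cannot move it
        // while we hold a raw pointer into it.
        fixed (float* src = values)
        {
            for (int i = 0; i < count; i++)
            {
                scratch[i] = src[i] * scale;
            }
        }

        float sum = 0f;
        for (int i = 0; i < count; i++)
        {
            sum += scratch[i];
        }
        return sum;
    }
}
```

Calling `PointerBasics.SumScaled(new float[] { 1f, 2f, 3f }, 2f)` returns 12.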
     
    DebugLogError likes this.
  3. DTECTOR

    DTECTOR

    Joined:
    Aug 23, 2020
    Posts:
    128
    Does it impact deployment?
    I've seen some forums suggest that there may be some issues with platform dependency/compatibility issues but no real examples given.

    Quick summary from chatgpt for example, which is basically what I've seen being regurgitated everywhere on forums without any real world examples:

    "In the context of Unity, using unmanaged code through PInvoke is possible, but it might complicate the deployment process as you need to ensure that the DLLs are available on the target system and that they're compatible with the system's architecture (x86, x64, ARM, etc.). COM Interop, on the other hand, is generally less suitable for Unity projects, especially those intended to run on platforms other than Windows."
     
  4. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    I've never tried deploying a build to Android, and if I'm lucky that will continue.

    On everything else it works fine. Really well, in fact. I don't do much tricky stuff. Just basic use of arrays and pointers and math operations that are SIMD friendly, and they tend to just fly through.

    One thing to note, if you wanna go nuts, the single threaded prowess of Apple's iOS chips is amazing, and their caches are bigger, too.
     
  5. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    Perhaps something I should add... It's fast to do, once you understand how to use "C" within C#, you can just do it, anytime you want, anywhere you want, and hit "Play" and it just works. And because you don't have any of the big frameworks installed to do this, the iteration time if you make mistakes and cause crashes is super fast, too. I tend to use a spare, barebones project to do the initial creation and testing of the "Low Level C within C#" stuff, and once I'm happy with it bring it into the main project. This way iteration time is crazy fast. No packages other than Rider.
     
    Kurt-Dekker likes this.
  6. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,134
    Last edited: Jun 8, 2023
    Yuchen_Chang and DTECTOR like this.
  7. DTECTOR

    DTECTOR

    Joined:
    Aug 23, 2020
    Posts:
    128
    I don't have a lot of experience with C, and unsafe code examples seem to be very sparse and are very basic to the point of useless. Should I look into C for better examples to get up to speed with unsafe C#?

    Also, do you have any examples off the top of your head where you would consider using unsafe C# to solve a problem as opposed to managed CLR C#? Or even just an example of things you might do with it simply because you prefer it to managed code?

    I would greatly appreciate it.
     
  8. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    36,762
    I seem to recall there were issues trying to build for WebGL but this might no longer be the case.

    Since I am lazy I am going to ask you first, Unifikation, if you are able to build and run your unsafe games in WebGL?

    Thinking on it, I imagine that now it DOES work, simply because IL2CPP... maybe I will go test it myself. After all, I got ALL my native C games compiling under Unity for WebGL for my KurtMaster2D game. Hm.

    I've just never written C# unsafe code before... my XP in that area is 0.
     
    Unifikation and DTECTOR like this.
  9. MelvMay

    MelvMay

    Unity Technologies

    Joined:
    May 24, 2013
    Posts:
    10,533
    unsafe code is used in Unity packages, you should be good to go. The Unity.Physics package uses it a fair bit to manipulate collider structures to force a kind of inheritance. stack stuff is also used there to generate the BVH's. Super fast!
     
    _geo__, SisusCo, Yuchen_Chang and 6 others like this.
  10. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    36,762
    Fear is the mind killer.

    I love stuff like this.

    "Reach over into that struct, set that byte to 1, call the same method, then restore the byte to what it was before."

    Done.

    EDIT: I did this with my graphics library back in C days: the thing that wrote to the screen buf could simply be told "hey your screen is this little shape over here, go blit on it," and it would happily do so. Then restore the graphics back to the main presentation surface. Good times.
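
The "set that byte, call the method, restore the byte" trick quoted above might look roughly like this in unsafe C#. The `RenderState` struct, its `Flags` field and the `Describe` method are hypothetical stand-ins, not anything from Unity.Physics, and the sketch assumes unsafe code is enabled for the project.

```csharp
using System;

public struct RenderState
{
    public int Width;
    public int Height;
    public byte Flags; // hypothetical flag byte that changes how Describe behaves
}

public static class BytePoke
{
    // Stand-in for "the same method": only does work when the flag is 1.
    public static int Describe(RenderState state)
    {
        return state.Flags == 1 ? state.Width * state.Height : 0;
    }

    public static unsafe int CallWithFlagForced(ref RenderState state)
    {
        // A ref parameter may point into a heap object, so pin it first.
        fixed (RenderState* p = &state)
        {
            byte* flag = &p->Flags; // reach into the struct
            byte saved = *flag;     // remember the old value
            *flag = 1;              // set that byte to 1
            int result = Describe(*p);
            *flag = saved;          // restore the byte to what it was
            return result;
        }
    }
}
```

With `Width = 4`, `Height = 5` and `Flags = 0`, the call returns 20 and `Flags` is 0 again afterwards.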
     
  11. DTECTOR

    DTECTOR

    Joined:
    Aug 23, 2020
    Posts:
    128
    Would you by chance, have any idea of a good resource covering unsafe code? Or a good source for becoming comfortable with working in unsafe code? Or any wisdom for someone who is just considering it for the first time? The only thing I seem to see everywhere is that those with a background in C seem to understand it.
     
    Unifikation and Kurt-Dekker like this.
  12. MelvMay

    MelvMay

    Unity Technologies

    Joined:
    May 24, 2013
    Posts:
    10,533
    Unfortunately I don't have any resources to hand. I'm sure there must be lots around but from my perspective, I already had several decades of using pointers (of various kinds) so the only real things to understand were the syntax and managed quirks.

    I would say though, try to use "ref" all over the place before you go unsafe. :) Not in the same sense as multithreading but you can sure get yourself into trouble quickly nevertheless.
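
As a rough illustration of the "try `ref` first" advice, the sketch below mutates structs in place and hands back a reference to an array element without any pointers or unsafe context. `Particle` and `RefFirst` are invented names; ref returns need C# 7.0 or later.

```csharp
using System;

public struct Particle
{
    public float X, Y, VelX, VelY;
}

public static class RefFirst
{
    // ref parameter: works on the caller's struct directly, no copy made.
    public static void Step(ref Particle p, float dt)
    {
        p.X += p.VelX * dt;
        p.Y += p.VelY * dt;
    }

    // ref return: gives the caller a reference to the array element itself.
    // Assumes a non-empty array.
    public static ref Particle Fastest(Particle[] particles)
    {
        int best = 0;
        float bestSq = -1f;
        for (int i = 0; i < particles.Length; i++)
        {
            float sq = particles[i].VelX * particles[i].VelX
                     + particles[i].VelY * particles[i].VelY;
            if (sq > bestSq)
            {
                bestSq = sq;
                best = i;
            }
        }
        return ref particles[best];
    }
}
```

Usage: `RefFirst.Step(ref particles[0], dt);` updates the element in the array, and `RefFirst.Fastest(particles).X = 0f;` writes straight into the fastest element.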
     
    DTECTOR and Unifikation like this.
  13. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    36,762
    This sounds like one of those things you put on a teeshirt.

    "Never go full unsafe!!"

    I'm curious to test this, just to see how fast C# blitting can work...
     
  14. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    Anything you need to do a lot of in a single frame, and want to be sure it’s done on time. Collisions, checks, commandeering particles of a system and get them to do something else, predict what’s going to happen in a number of frames from now, analyse audio, create audio, some kind of primitive path finding and choice making, and the king of all uses, running your own little occlusion culling to deactivate geometry/renderers you don’t really need.

    I use it mostly for ADSR envelopes on sound, sound creation and my own sound modding/altering/post processing.

    There’s a feature/callback known as OnAudioFilterRead, that’s called every time the audio buffer needs another chunk of data. If you run in High Performance Audio mode (and I think everyone should, as this greatly reduces audio latency) you get a call every 256/44100 of a second, roughly every 5.8ms. To my mind this is the single coolest feature in all of Unity.

    You can’t use MonoBehaviours or any other Unity APIs within OnAudioFilterRead. But you can use print("Some Details here"), oddly enough, to send yourself notifications on what’s happening, which is super handy for debugging.

    But the real power of OnAudioFilterRead is that you don’t need to use it for audio, you can make anything with it, on any Unity Game Object, and edit any and all normal arrays, because you can fix them in place whilst you work on them with pointers, and then access them later in Update, with the work all done. It’s like its own little Jobs system because OnAudioFilterRead is running on its own thread at its own cycles.
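
The handoff described above (work done on the audio thread, results picked up later in Update) can be sketched without Unity at all. In the sketch below, `WorkerCallback` stands in for `OnAudioFilterRead` and `TryConsume` for `Update`; both names and the class are assumptions for illustration, and the pointer work is left out so that only the flag-based handoff is shown.

```csharp
using System;
using System.Threading;

public sealed class BackgroundFill
{
    private readonly float[] _results;
    private int _ready; // 0 = worker still writing, 1 = results ready to read

    public BackgroundFill(int length)
    {
        if (length < 1) throw new ArgumentOutOfRangeException(nameof(length));
        _results = new float[length];
    }

    // Stand-in for OnAudioFilterRead: runs on a thread other than the main one.
    public void WorkerCallback()
    {
        for (int i = 0; i < _results.Length; i++)
        {
            _results[i] = i * 0.5f;
        }
        // Publish only after every element has been written.
        Volatile.Write(ref _ready, 1);
    }

    // Stand-in for Update: touches the array only once the flag is set.
    public bool TryConsume(out float last)
    {
        if (Volatile.Read(ref _ready) == 0)
        {
            last = 0f;
            return false;
        }
        last = _results[_results.Length - 1];
        Volatile.Write(ref _ready, 0); // hand the buffer back to the worker
        return true;
    }
}
```

The flag is the only thing both threads touch at the same time; the array is written by one side and read by the other strictly in turn.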
     
    CodeRonnie and DTECTOR like this.
  15. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    36,762
    WAAAAAT! That's wild! I am so tired of the slight lag when I use OnAudioFilterRead!!!

    Do you just unsafe {} the entire code block basically?

    Yeah, Debug.Log() is one of those few bits of the Unity API you can call from other threads... I don't think that is documented probably because they want to reserve the right to remove it.

    But you won't SEE the output in the console window until the main thread runs to paint it. :) If you go south in a handbasket, well, that's all she wrote.
     
    Unifikation likes this.
  16. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    You don't have to unsafe{} the whole lot, but that's often what I'm doing in there.

    But yeah, it's the most fun usage of Unity there is, I think... probably because it's not Unity and barely C# ;)

    And because it's out of sync with the other updates, and faster than them, you can set flags back and forth for your Updates and FixedUpdates to know when stuff is ready and waiting for them in arrays they might be wanting, and when there's work waiting to be done by your OnAudioFilterRead's "jobs" system.

    I should point out that the buffer is usually 512 samples (256 for each side of stereo), so the interval is actually 256/44100 of a second. But you don't have to put anything in it, at all.
     
  17. DTECTOR

    DTECTOR

    Joined:
    Aug 23, 2020
    Posts:
    128
    Oh my lord that's crazy. You are clearly on another level. As are all of you on this post :D (excluding me)
     
    Last edited: Jun 8, 2023
  18. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    Sorry, couple more things. Windows defaults to 48,000Hz and future MacOS will, too. 44,100Hz is everything Apple, and Sony PS5 can be set to it.

    No, not me. But Kurt and Melvyn most certainly are!!!
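
The callback interval quoted earlier is just frames per buffer divided by sample rate. A tiny sketch of that arithmetic for both sample rates mentioned (the class and method names are invented):

```csharp
using System;

public static class AudioTiming
{
    // Milliseconds between callbacks for a buffer of `framesPerBuffer`
    // frames (samples per channel) at `sampleRate` Hz.
    public static double CallbackIntervalMs(int framesPerBuffer, int sampleRate)
    {
        return 1000.0 * framesPerBuffer / sampleRate;
    }
}
```

`AudioTiming.CallbackIntervalMs(256, 44100)` is about 5.80 ms, and `AudioTiming.CallbackIntervalMs(256, 48000)` is about 5.33 ms.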
     
  19. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    To learn a little about C# and pointers, type those two words (C# & pointers) into youtube. A bunch of videos came up, but I don't know which ones are good.

    Pointers take a little while for modern programmers to get their head around. For us old timers, it's hard to think without them. They're incredibly addictive because they're real, not abstract.
     
    Kurt-Dekker and DTECTOR like this.
  20. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,043
    It's not a silver bullet though. The last time I tried a fixed-layout struct (an overlapping union) for my random generator, it turned out that injecting bits into a floating point value is slower than straight-up division. Who would've guessed?
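
For context, the two approaches being compared probably look something like the sketch below. This is a guess at the general technique, not orionsyndrome's actual generator: `FloatBits` and `RandomFloat` are invented names, and no claim is made here about which version is faster on any given platform.

```csharp
using System;
using System.Runtime.InteropServices;

// Overlapping union: both fields share the same four bytes.
[StructLayout(LayoutKind.Explicit)]
public struct FloatBits
{
    [FieldOffset(0)] public uint Bits;
    [FieldOffset(0)] public float Value;
}

public static class RandomFloat
{
    // Bit injection: drop 23 random bits into the mantissa of 1.0f,
    // which yields a value in [1, 2), then subtract 1 to get [0, 1).
    public static float FromBits(uint random)
    {
        FloatBits u = new FloatBits();
        u.Bits = 0x3F800000u | (random >> 9);
        return u.Value - 1f;
    }

    // Division: keep the top 24 bits and scale by 2^24 to get [0, 1).
    public static float FromDivision(uint random)
    {
        return (random >> 8) / 16777216f;
    }
}
```

Both map `0x80000000` to exactly 0.5; the difference is only in how the float is produced, which is why it is worth benchmarking rather than assuming.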

    I had some pretty fun voxel stuff done in the unsafe context, but it's harder and harder for me to find uses where it really pays off.
     
    Ryiah and DTECTOR like this.
  21. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,380
    I use unsafe code here and there. Super useful for unwrapping structs whose underlying members are private (see: System.Guid).
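
One way to "unwrap" a struct like `System.Guid` is to read its bytes through a pointer. The sketch below is only a guess at the kind of thing meant here (the post does not show code); `GuidPeek` is an invented name and unsafe code must be enabled.

```csharp
using System;

public static class GuidPeek
{
    // Copies the 16 raw bytes of a Guid exactly as they sit in memory.
    public static unsafe byte[] RawBytes(Guid value)
    {
        byte[] bytes = new byte[16];

        // `value` is a by-value parameter, so it lives on the stack and
        // its address can be taken without a fixed statement.
        byte* p = (byte*)&value;
        for (int i = 0; i < 16; i++)
        {
            bytes[i] = p[i];
        }
        return bytes;
    }
}
```

On little-endian hardware the result matches `Guid.ToByteArray()`, which is a handy sanity check that the pointer read is seeing the private fields in the expected order.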
     
    DTECTOR, Ryiah and Unifikation like this.
  22. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    287
    I've seen where .NET uses unsafe code in some of their sources for working with arrays and things as though you were using pointers in C++, in order to achieve better performance, or in BitConverter for turning a float or double into an int or long that represents the bits, not a similar number.

    However, I've never seen a true need to use it myself personally. Usually .NET has provided something that will do the unsafe bit for you if there really is a valid need. Using Spans is really useful for slicing up arrays. I haven't ever had a need yet.

    However, that does not mean that there is never a reason to use an unsafe technique, only that I personally haven't had to.
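
Two of the "let .NET do the unsafe bit" options mentioned above, as a small sketch. `SafeAlternatives` is an invented name; `BitConverter.SingleToInt32Bits` and `Span<T>` assume a .NET Standard 2.1 profile (in Unity terms, a recent enough editor version).

```csharp
using System;

public static class SafeAlternatives
{
    // The bits of a float as an int, not a rounded number.
    public static int FloatBits(float value)
    {
        return BitConverter.SingleToInt32Bits(value);
    }

    // Sums a window of an array through a Span: a view, not a copy.
    public static int SumWindow(int[] data, int start, int length)
    {
        Span<int> window = data.AsSpan(start, length);
        int sum = 0;
        for (int i = 0; i < window.Length; i++)
        {
            sum += window[i];
        }
        return sum;
    }
}
```

`SafeAlternatives.FloatBits(1f)` returns `0x3F800000`, and `SafeAlternatives.SumWindow(new[] { 1, 2, 3, 4, 5 }, 1, 3)` returns 9.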
     
    Last edited: Jun 9, 2023
    DTECTOR likes this.
  23. DTECTOR

    DTECTOR

    Joined:
    Aug 23, 2020
    Posts:
    128
    This seems to be what the current climate looks like with newer versions of C# & dotnet. A lot of the places where unsafe actually seems to provide a performance boost are lower level operations where heap allocations and passing whole copies of value types are the bottleneck. Although, as @MelvMay mentioned, you can pass value types by ref now, so you can circumvent that issue a lot of the time without introducing the plethora of problems that come with using raw pointers and getting in the way of the garbage collector. I am still learning, so if anyone thinks what I am saying here is not correct in some way please do feel free to correct me. I am trying a lot of smaller stuff and it seems like in newer .NET versions the performance difference is negligible in most situations, if not actually slower due to the overhead created by unsafe code operating in a managed environment. It would appear it will only really create a performance benefit in a large series of operations. Would anyone with some serious experience here say that is a correct conclusion?

    I was also running into things like SIMD which @Unifikation mentioned but it looks like SIMD is not really applicable to scripting gameplay in Unity and is mostly useful in lower level operations with very rudimentary value types like int, float & double. Again if someone has experience with SIMD in a higher level example that would allow a more performant way to pack in useful operations in gameplay scripts I would love to hear about it.
     
    CodeRonnie likes this.
  24. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    Gameplay scripts for game object logic is too short an exercise, in most cases, to need moving into unsafe, Jobs or even ECS and/or Burst, and definitely unlikely to need or benefit from trying to be SIMD friendly.

    -----

    More generally, to your other points, the biggest advantage of heading into the unsafe realm is stackalloc. This gives you immediate stack arrays, and they don't need to be small, you can go to quite amazing sizes, with which you can do truly stunningly performant calculations that are both SIMD friendly and cache optimal, almost by default, too.

    This does take a little bit of re thinking about what you're doing. Someone needs to be able to say "here's something that can benefit from doing this, this way" before it makes sense. The examples I gave previously are some of them. But an even easier way of thinking about this might be: if you think it's something you'd prefer to do with compute shaders, but it's not possible or too time consuming to try do it that way (in terms of engineering effort) then those are the things that stackalloc and SIMD considerations can do huge things for you.

    Similarly, if there's something you're pretty sure will work well with Jobs, but is only needed to be done in a single frame, there's a good chance that stackalloc is an easier way to set it up and perform it, and far easier to code, requiring much less coding endeavour and far faster compilation/build/Play mode entry as you don't need Jobs/Burst/ECS/Mathematics to play this way.

    One of the best examples of something that benefits from this is plotting all the points of a trajectory. With Melvyn's fantastic development of multiple physics scenes, it's possible to do this in a new scene, but it can be a little slow if it's something you want to perform a lot, with lots of accuracy. If you know the math you can do truly astonishingly high resolution trajectory calculations in a single frame by putting everything you need in a small variety of stackalloc arrays and operating on/between them to create the results.

    This can then be all thrown away without any garbage issues, as it's all stack, not heap, and the desired results "printed" back to a fixed managed array via Memcpy, which is so fast it's stunning.
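
A stripped-down version of the trajectory idea above might look like this. It is a sketch under assumptions: the names are invented, the physics is plain constant-gravity motion, and `Buffer.MemoryCopy` from .NET is used for the copy (Unity also ships `UnsafeUtility.MemCpy` in `Unity.Collections.LowLevel.Unsafe`, which is probably what "Memcpy" refers to). Unsafe code must be enabled.

```csharp
using System;

public static class Trajectory
{
    // Fills `heights` with y(t) = y0 + vy*t - g*t*t/2 sampled every `dt` seconds.
    // All intermediate data lives in stackalloc buffers, so nothing is left
    // for the garbage collector. Keep `heights` modest in size: the scratch
    // buffers come out of the calling thread's stack.
    public static unsafe void SampleHeights(
        float y0, float vy, float gravity, float dt, float[] heights)
    {
        int count = heights.Length;
        if (count == 0) return;

        float* times = stackalloc float[count];
        float* ys = stackalloc float[count];

        // Simple, branch-free loops over contiguous memory.
        for (int i = 0; i < count; i++)
        {
            times[i] = i * dt;
        }
        for (int i = 0; i < count; i++)
        {
            float t = times[i];
            ys[i] = y0 + vy * t - 0.5f * gravity * t * t;
        }

        // Pin the managed array and copy the finished results across in one go.
        fixed (float* dst = heights)
        {
            long bytes = (long)count * sizeof(float);
            Buffer.MemoryCopy(ys, dst, bytes, bytes);
        }
    }
}
```

With `y0 = 0`, `vy = 10`, `gravity = 10`, `dt = 1` and a three-element array, the results are 0, 5 and 0.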
     
    CodeRonnie and DTECTOR like this.
  25. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    287
    I use stackalloc for creating temporary arrays without needing an unsafe context by assigning the result of the stackalloc to a Span<T>. Span handles all of the actual unsafe operations, and protects it from leaving the stack because Span is a readonly ref struct.

    I'm not sure if my use case relates exactly to all of the scenarios you've described, and I'm certainly not trying to negate any of your points. I only thought it useful to point out that creating a temporary array with stackalloc does not always require an unsafe code context.

    Here is an example where I copy the contents of a string into an array of characters that is temporarily allocated on the stack.

    Code (CSharp):
    using System;

    public static class Foo
    {
        public static void Bar()
        {
            string example = "Hello world!";
            // Assigning stackalloc to a Span<T> needs no unsafe context.
            Span<char> span = stackalloc char[example.Length];
            example.AsSpan().CopyTo(span);
        }
    }
     
    Last edited: Jun 9, 2023
    _geo__, DTECTOR and Unifikation like this.
  26. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    Span is not available in all Unity versions, and one of the strengths of stackalloc under unsafe is that it's available all the way back into almost any old version of Unity. Even the one that Kurt uses ;)
     
    Kurt-Dekker, DTECTOR and CodeRonnie like this.
  27. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    287
    Haha. That's fair. It does indeed require at least Unity 2021.
     
  28. DTECTOR

    DTECTOR

    Joined:
    Aug 23, 2020
    Posts:
    128
    Wow, lots of great information here. @Unifikation @CodeRonnie @orionsyndrome so would you say it would be considered a default mode of thinking to assume unsafe and using stackalloc should be considered for a small window of time and for very temporary objects? You would not use it for something with a long lifecycle? Or something that may persist as long as the game is alive? If what I understand about gc is correct, g2 objects and any links between g0, g1 and g2 also cause significant slowdown of gc traversal which causes overhead to spike in CPU? But I also am aware that unsafe and fixed objects get in the way of gc and can cause performance decrease as well? I imagine because of this, stack and unsafe is not a viable option for long living objects to eke out some extra ms per frame for other stuff on the cpu?

    And then of course my biggest struggle is the black box that is il2cpp, I've tried to find resources to understand better how this would differ from mono, as relating to the gc behavior and the fact that it's compiling aot to C++? So would that mean that pointers and all the unsafe stuff in C# would be potentially optimized in this way on the build process? Because then I wonder what the difference is in using managed code versus unsafe if IL2CPP optimizes away the difference on build. (Maybe the only way to figure this out is to drop 100k for source?:()
     
    Last edited: Jun 9, 2023
  29. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    Perhaps the easiest way is to have a private chat with three or four people that use stackalloc in different ways, and a public one with Melvyn that we can all learn from. I suspect Melvyn might be a master of low level coding.
     
    DTECTOR likes this.
  30. DTECTOR

    DTECTOR

    Joined:
    Aug 23, 2020
    Posts:
    128
    That's a great idea. Where would you suggest doing that? I don't think there is private chat available on unity.com, is there? Conversations look like they go into public forums.
     
  31. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    287
    First of all, for anyone thinking about optimizing to this degree, you must use BenchmarkDotNet: https://github.com/dotnet/BenchmarkDotNet Until you run actual measurements, you just can't really say with 100% certainty which way of programming something will run faster based on what you've heard from others on the internet. You absolutely must measure it for yourself if you really want to feel like you *know* what's going on.

    The precision of the Stopwatch class is, at best, 100ns. So, if you're trying to figure out which line of code runs the fastest, and you're measuring down into one and two digit numbers of nanoseconds, just trying to do it yourself with Stopwatch is folly. BenchmarkDotNet will take your test blocks of code and run them millions, or billions of times. It will run them in loops that are large enough to ensure the overall measurement will be greater than 100ns, and take the average of all the iterations. It will do that over, and over, and over, and it will calculate how much of that is probably just overhead from the system running. Trying to measure the actual performance of something without it is like either just not actually knowing the correct answer, or trying to re-invent a very, very good wheel that already exists.

    BenchmarkDotNet can also tell you whether something is allocating garbage memory or not. The only limitation is that you have to structure the various tests like very small unit tests that each do one single thing. But, if you know the performance of those individual operations, then you can know that the performance of the overall architecture will be better when you put it all back together.

    I don't want to try to be prescriptive, but it's certainly not my default mode. I use it in exactly one place in my class library, and it was a decision I made based on the balancing of many concerns, that ultimately came down to my preference for how I was going to design the solution. To be more specific, there is a place where I have data representing one coherent object conceptually, but potentially broken into many separate arrays under the hood. The method that I need to pass that data into will only accept an argument for one contiguous span of data. It's not my method (or I should say the interface is out of my control) so I can't pass the data any other way. I need to copy the data from all of the separate arrays into one block of memory before passing it to this one method.

    I could easily have done it with an array that doesn't live on the stack, but that I have cached somewhere to re-use every time I need to pass an argument into this particular method. However, that raises new concerns. The array has to then be the maximum length of any data that could possibly need to be copied. I didn't feel like I wanted to automatically, and permanently, allocate that big of an array for users of my class library just because they used a particular method. Also, if a multi-threaded environment is a concern, as it was in this example, now you have to think about allocating a separate array for each thread, or locking threads so that only one of them can copy into this big array at a time. See what I mean about balancing many concerns? If something is allocated on the stack it can help with thread safety because each thread uses its own stack. So, if multiple threads call my method, each thread will just call stackalloc and temporarily create its own buffer to copy and pass the data without needing to lock or conflict with one another.

    The down side is that if someone invokes this method, and they *do* need to copy the maximum amount of data, that stackalloc call will probably be slower than an alternative solution. That's why, ultimately it was a design decision for me to make. If one of my users encounters that issue, the method in question is a convenience method, and there is a workaround I could prescribe for them to handle the scenario manually. So, I went with stackalloc for the common case, assuming they probably won't stackalloc a mega giant array in the expected use case.
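
The gather-then-pass pattern described above can be sketched as follows. Everything here is illustrative: `Checksum` stands in for the external method that only accepts one contiguous span, and the heap fallback above 1024 bytes is an assumption added for safety, not something the post describes.

```csharp
using System;

public static class Gather
{
    // Stand-in for the method whose interface is out of our control.
    public static int Checksum(ReadOnlySpan<byte> data)
    {
        int sum = 0;
        foreach (byte b in data)
        {
            sum += b;
        }
        return sum;
    }

    public static int ChecksumSegments(byte[][] segments)
    {
        int total = 0;
        foreach (byte[] segment in segments)
        {
            total += segment.Length;
        }

        // Each calling thread gets its own stack buffer, so no locking
        // is needed. Large inputs fall back to a heap array (assumed limit).
        Span<byte> buffer = total <= 1024
            ? stackalloc byte[total]
            : new byte[total];

        // Copy every segment into the one contiguous block.
        int offset = 0;
        foreach (byte[] segment in segments)
        {
            segment.AsSpan().CopyTo(buffer.Slice(offset));
            offset += segment.Length;
        }

        return Checksum(buffer);
    }
}
```

For the segments `{1, 2}`, `{3}` and `{4, 5, 6}` the result is 21, the same as if the data had been one array all along.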

    You absolutely could use it for something with a long lifecycle. If Unity stackallocs something before starting the main loop that the whole player and Update() live inside of, then that object will be on the stack the whole entire time. The question is, what is the scenario where you are doing this, and why? Why would just allocating one big array and hanging onto your reference to it not be better?

    You definitely want to avoid the Large Object Heap (LOH) of the garbage collector. The threshold for the LOH is 85000 bytes. If an object is larger than that, probably a giant array of some kind (remember a bunch of separate objects totalling over 85k are not themselves one 85k object), then it will end up in gen 2 and cleaning it up can be very costly yes. That's why in my class library I have a limit on those big data sets that always keeps everything under 85kB. Here are some bookmarks that can explain much better than I can.

    WARNING: This first blog post doesn't seem to have a valid security certificate, and there are some truly strange ads displayed on the page. You can skip straight to the next link if you don't want to deal with that.
    Regarding Unity memory optimization, you may encounter these problems
    https://blog.actorsfit.com/a?ID=01000-706b19bf-2ea7-4385-9430-1a80c54bc0f2
    A Microsoft expert's guide to profiling garbage collection, as referenced in the above blog post
    https://github.com/Maoni0/mem-doc/blob/master/doc/.NETMemoryPerformanceAnalysis.md
    Perfview, the tool to use for profiling garbage collection
    https://github.com/microsoft/perfview
    How to Debug GC Issues Using PerfView
    https://www.philosophicalgeek.com/2012/07/16/how-to-debug-gc-issues-using-perfview/
    PerfView User's Guide
    http://htmlpreview.github.io/?https...main/src/PerfView/SupportFiles/UsersGuide.htm
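
To make the 85000-byte figure mentioned above concrete, here is a small sketch of the arithmetic. The names are invented, the calculation ignores the few bytes of array header, and it describes the .NET runtime's garbage collector that the benchmarks in this thread ran on; Unity's own Boehm-based collector is not generational, so treat this as background rather than a Unity guarantee.

```csharp
using System;

public static class LohCheck
{
    // The .NET GC treats objects of 85,000 bytes or more as large objects.
    public const int ThresholdBytes = 85000;

    // True when the element payload alone reaches the threshold.
    public static bool PayloadReachesThreshold(int elementCount, int elementSizeBytes)
    {
        return (long)elementCount * elementSizeBytes >= ThresholdBytes;
    }
}
```

A `char[32768]` is 65,536 bytes of payload and stays under the threshold, while an `int[32768]` is 131,072 bytes and goes over it.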

    I only used PerfView once for a few tests, but I needed to know with my solution that took longer for the individual operations to run but allocated no garbage, which I knew from BenchmarkDotNet, if the entire test application would still complete faster, simulating faster frames per second due to the effects of said garbage collection. That can't be tested in BenchmarkDotNet because it's a much larger effect that plays out over a much bigger time frame.

    I *think* what you're referring to has something to do with how unsafe contexts need to lock access to something while that's happening. Something, something boundary. It's ringing a bell, but I'm at the limit of my knowledge there. I'm pretty sure that stackalloc with a Span<T> will not cause any such interference with the GC.

    Edit: This is what I was trying to think of: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/fixed, but as I said I'm not the expert on unsafe. I've only experimented with Span based stackalloc.

    Oh, it certainly can provide a big performance increase, it just depends on what you're doing and why. Like I said, benchmark it for yourself, and know why you are doing it that way.

    Thanks to these forums, I recently found this: https://docs.unity3d.com/Manual/ScriptingRestrictions.html, (actually I realize now I already had it bookmarked and forgot about it), which is potentially relevant for users of my code, so I actually need to add a note to my own documentation. I think usually it is things that rely on reflection which could be stripped out because the compiler cannot infer what types need to be included beyond that reflection boundary. It's similar to the case in the link where a bunch of generic interfaces, only used as interfaces, never as their concrete types, could make it difficult for the compiler to know what to strip during code stripping.

    Since the things that are unsafe in C# are allowable in C++ there may be some interesting things to be done during IL2CPP compilation, but I'm not a compiler engineer and I would have no clue.
     
    Last edited: Jun 10, 2023
    DTECTOR and Unifikation like this.
  32. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    287
    I thought it would be more helpful for me to show my work. So, I went to my benchmarks project and opened up the stackalloc test I had written for char arrays. I tried to duplicate the test for a TestObject class and plain old objects, but the compiler wouldn't allow it. Maybe it's something you can do in an unsafe context. So, I also made an int array version. Here is the test:
    Code (CSharp):
    using System;
    using BenchmarkDotNet.Attributes;

    namespace Benchmarks
    {
        [MemoryDiagnoser]
        public class Stackalloc
        {
            [ParamsSource(nameof(Sizes))]
            public int Size;

            public int[] Sizes { get; } = new int[]
            {
                4,
                8,
                16,
                32,
                64,
                128,
                256,
                512,
                1024,
                2048,
                4096,
                8192,
                16384,
                32768, // Large Object Heap territory for int[] (131072 bytes)
            };

            [Benchmark]
            public void StackallocCharArray()
            {
                Span<char> span = stackalloc char[Size];
            }

            [Benchmark]
            public void NewCharArray()
            {
                char[] array = new char[Size];
            }

            [Benchmark]
            public void StackallocIntArray()
            {
                Span<int> span = stackalloc int[Size];
            }

            [Benchmark]
            public void NewIntArray()
            {
                int[] array = new int[Size];
            }
        }
    }

    Summary of char[] results:
    BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19045.2965)
    Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
    .NET SDK=7.0.202
    [Host] : .NET Core 3.1.32 (CoreCLR 4.700.22.55902, CoreFX 4.700.22.56512), X64 RyuJIT AVX2
    DefaultJob : .NET Core 3.1.32 (CoreCLR 4.700.22.55902, CoreFX 4.700.22.56512), X64 RyuJIT AVX2
    Code (CSharp):
    string summary = @"
    |              Method |  Size |         Mean |      Error |      StdDev |    Gen0 | Allocated |
    |-------------------- |------ |-------------:|-----------:|------------:|--------:|----------:|
    | StackallocCharArray |     4 |     1.660 ns |  0.0754 ns |   0.1541 ns |       - |         - |
    |        NewCharArray |     4 |     4.070 ns |  0.1291 ns |   0.2487 ns |  0.0102 |      32 B |
    | StackallocCharArray |     8 |     1.647 ns |  0.0773 ns |   0.1507 ns |       - |         - |
    |        NewCharArray |     8 |     4.939 ns |  0.2465 ns |   0.6829 ns |  0.0127 |      40 B |
    | StackallocCharArray |    16 |     2.263 ns |  0.0870 ns |   0.1999 ns |       - |         - |
    |        NewCharArray |    16 |     5.770 ns |  0.1611 ns |   0.3502 ns |  0.0179 |      56 B |
    | StackallocCharArray |    32 |     3.300 ns |  0.1028 ns |   0.1906 ns |       - |         - |
    |        NewCharArray |    32 |     7.007 ns |  0.1837 ns |   0.2750 ns |  0.0280 |      88 B |
    | StackallocCharArray |    64 |     5.440 ns |  0.1513 ns |   0.2071 ns |       - |         - |
    |        NewCharArray |    64 |    10.786 ns |  0.1773 ns |   0.3779 ns |  0.0485 |     152 B |
    | StackallocCharArray |   128 |    13.665 ns |  0.3149 ns |   0.3369 ns |       - |         - |
    |        NewCharArray |   128 |    17.202 ns |  0.3860 ns |   0.8058 ns |  0.0893 |     280 B |
    | StackallocCharArray |   256 |    25.675 ns |  0.4406 ns |   0.3679 ns |       - |         - |
    |        NewCharArray |   256 |    32.875 ns |  0.8304 ns |   2.2590 ns |  0.1708 |     536 B |
    | StackallocCharArray |   512 |    52.286 ns |  1.0803 ns |   2.2550 ns |       - |         - |
    |        NewCharArray |   512 |    58.044 ns |  1.4465 ns |   3.9840 ns |  0.3340 |    1048 B |
    | StackallocCharArray |  1024 |   100.170 ns |  2.0188 ns |   4.5977 ns |       - |         - |
    |        NewCharArray |  1024 |   114.577 ns |  2.4496 ns |   6.4959 ns |  0.6604 |    2072 B |
    | StackallocCharArray |  2048 |   208.603 ns |  4.1785 ns |   9.6843 ns |       - |         - |
    |        NewCharArray |  2048 |   226.176 ns |  4.5482 ns |  10.8093 ns |  1.3123 |    4120 B |
    | StackallocCharArray |  4096 |   394.288 ns |  7.8228 ns |  16.6711 ns |       - |         - |
    |        NewCharArray |  4096 |   461.951 ns | 10.1325 ns |  27.7375 ns |  2.6174 |    8216 B |
    | StackallocCharArray |  8192 |   823.174 ns | 16.1552 ns |  24.1804 ns |       - |         - |
    |        NewCharArray |  8192 |   801.459 ns | 15.6762 ns |  27.0407 ns |  5.2080 |   16408 B |
    | StackallocCharArray | 16384 | 1,767.479 ns | 34.1742 ns |  30.2946 ns |       - |         - |
    |        NewCharArray | 16384 | 1,590.997 ns | 31.3940 ns |  46.0170 ns | 10.4160 |   32792 B |
    | StackallocCharArray | 32768 | 3,712.781 ns | 71.6048 ns | 104.9574 ns |       - |         - |
    |        NewCharArray | 32768 | 3,123.477 ns | 60.2412 ns |  84.4497 ns | 20.8321 |   65560 B |
    ";

    Summary of int[] results:
    BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19045.2965)
    Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
    .NET SDK=7.0.202
    [Host] : .NET Core 3.1.32 (CoreCLR 4.700.22.55902, CoreFX 4.700.22.56512), X64 RyuJIT AVX2
    DefaultJob : .NET Core 3.1.32 (CoreCLR 4.700.22.55902, CoreFX 4.700.22.56512), X64 RyuJIT AVX2
    Code (CSharp):
    string summary = @"
    |             Method |  Size |         Mean |      Error |      StdDev |       Median |    Gen0 | Allocated |
    |------------------- |------ |-------------:|-----------:|------------:|-------------:|--------:|----------:|
    | StackallocIntArray |     4 |     1.910 ns |  0.0792 ns |   0.1563 ns |     1.954 ns |       - |         - |
    |        NewIntArray |     4 |     4.564 ns |  0.1354 ns |   0.2441 ns |     4.582 ns |  0.0127 |      40 B |
    | StackallocIntArray |     8 |     2.396 ns |  0.0637 ns |   0.0498 ns |     2.403 ns |       - |         - |
    |        NewIntArray |     8 |     5.247 ns |  0.1216 ns |   0.1016 ns |     5.197 ns |  0.0179 |      56 B |
    | StackallocIntArray |    16 |     3.401 ns |  0.1077 ns |   0.0900 ns |     3.411 ns |       - |         - |
    |        NewIntArray |    16 |     7.299 ns |  0.1846 ns |   0.1726 ns |     7.288 ns |  0.0280 |      88 B |
    | StackallocIntArray |    32 |     5.405 ns |  0.1513 ns |   0.2727 ns |     5.467 ns |       - |         - |
    |        NewIntArray |    32 |    10.953 ns |  0.2601 ns |   0.5977 ns |    11.013 ns |  0.0485 |     152 B |
    | StackallocIntArray |    64 |    11.939 ns |  0.2811 ns |   0.5990 ns |    11.948 ns |       - |         - |
    |        NewIntArray |    64 |    16.794 ns |  0.3773 ns |   0.7179 ns |    16.776 ns |  0.0893 |     280 B |
    | StackallocIntArray |   128 |    24.508 ns |  0.5305 ns |   1.1073 ns |    24.775 ns |       - |         - |
    |        NewIntArray |   128 |    31.171 ns |  0.6606 ns |   1.2079 ns |    31.227 ns |  0.1708 |     536 B |
    | StackallocIntArray |   256 |    51.378 ns |  0.6825 ns |   0.5328 ns |    51.443 ns |       - |         - |
    |        NewIntArray |   256 |    58.860 ns |  1.1991 ns |   1.6413 ns |    59.088 ns |  0.3340 |    1048 B |
    | StackallocIntArray |   512 |   106.007 ns |  2.1457 ns |   3.6436 ns |   106.722 ns |       - |         - |
    |        NewIntArray |   512 |   119.103 ns |  2.6758 ns |   7.5909 ns |   118.834 ns |  0.6604 |    2072 B |
    | StackallocIntArray |  1024 |   205.882 ns |  4.0845 ns |   7.5708 ns |   208.194 ns |       - |         - |
    |        NewIntArray |  1024 |   230.277 ns |  4.5729 ns |   7.5134 ns |   232.114 ns |  1.3123 |    4120 B |
    | StackallocIntArray |  2048 |   401.804 ns |  8.0523 ns |  16.6293 ns |   406.737 ns |       - |         - |
    |        NewIntArray |  2048 |   471.393 ns |  9.4168 ns |  24.6420 ns |   482.024 ns |  2.6174 |    8216 B |
    | StackallocIntArray |  4096 |   778.366 ns | 15.2403 ns |  22.8109 ns |   779.046 ns |       - |         - |
    |        NewIntArray |  4096 |   840.995 ns | 16.5353 ns |  27.6267 ns |   838.943 ns |  5.2080 |   16408 B |
    | StackallocIntArray |  8192 | 1,807.228 ns | 35.2487 ns |  49.4137 ns | 1,813.243 ns |       - |         - |
    |        NewIntArray |  8192 | 1,574.962 ns | 37.2680 ns | 108.7124 ns | 1,572.383 ns | 10.4160 |   32792 B |
    | StackallocIntArray | 16384 | 3,684.930 ns | 72.0967 ns | 107.9110 ns | 3,701.458 ns |       - |         - |
    |        NewIntArray | 16384 | 3,108.149 ns | 61.8280 ns | 176.3987 ns | 3,115.799 ns | 20.8321 |   65560 B |
    ";

    So, you can see that stackalloc is pretty much always worth it, up until the largest sizes (8,192 elements and above in these runs), where new performs slightly better. However, the difference is only a few hundred nanoseconds, which is negligible, and new always allocates memory for later garbage collection, while stackalloc never does. Larger arrays will start to cross the threshold for the Large Object Heap of the garbage collector, so it's questionable whether you should ever call new on them anyway. Sounds like stackalloc is pretty good! It's no wonder I found a reason to use it.
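    As a rough cross-check of the Allocated column (a sketch, not something BenchmarkDotNet reports): the figures are consistent with a fixed 24-byte array overhead on 64-bit CoreCLR plus the element data, rounded up to a multiple of 8 bytes. The class and method names below are hypothetical, and the 85,000-byte value is the commonly documented default Large Object Heap threshold.

```csharp
using System;

public static class ArraySizeEstimate
{
    // Assumption: 64-bit CoreCLR, where an array carries about 24 bytes of
    // overhead (object header, type pointer, length) before its elements.
    private const int ArrayOverheadBytes = 24;

    // Commonly documented default threshold for the Large Object Heap.
    public const int LargeObjectHeapThresholdBytes = 85000;

    // Estimated heap size of an array with the given length and element size.
    public static long EstimateBytes(int length, int elementSize)
    {
        long raw = ArrayOverheadBytes + (long)length * elementSize;
        return (raw + 7) / 8 * 8; // round up to 8-byte alignment
    }

    // True when the estimate reaches the assumed LOH threshold.
    public static bool IsLargeObject(int length, int elementSize)
    {
        return EstimateBytes(length, elementSize) >= LargeObjectHeapThresholdBytes;
    }
}
```

    Under these assumptions, `EstimateBytes(4, sizeof(char))` gives 32 and `EstimateBytes(32768, sizeof(char))` gives 65560, matching the first and last char[] rows. That last figure is still below the assumed threshold, whereas an int[32768] would come to 131096 bytes and land above it.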

    So, you see? I wasn't even remembering correctly off the top of my head. It pays to measure. To answer your previous question again, it seems like you could default to stackalloc in situations where it makes sense. The performance will almost always be just as good or better, with no GC memory allocated.

    However, that doesn't mean that you should just stackalloc arrays whenever you need them, because it may be obviously better to create a new array and cache the reference for longer than a single stack frame. Doing stackalloc every frame would certainly be much worse. If you don't want the cached array to be garbage collected later on, it can be pooled if you have a good solution for doing so. Also, the stack is small: typically only about 1 MB to 4 MB per thread, depending on the platform and whether the process is 32-bit or 64-bit. That is another clear limitation on stackalloc.
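    A minimal sketch of the "create it once and cache the reference" alternative, using a hypothetical `EvenNumberFilter` class that is not from the benchmark project. The scratch array is allocated up front and only reallocated when it is too small, so repeated calls produce no new garbage.

```csharp
using System;

public sealed class EvenNumberFilter
{
    // Allocated once and reused across calls, so steady-state calls
    // do not allocate.
    private int[] _scratch = new int[64];

    // Copies the even values from source into the cached buffer and
    // returns a view over just the filled portion.
    public ArraySegment<int> Filter(int[] source)
    {
        if (_scratch.Length < source.Length)
        {
            // Grow rarely (at least doubling) instead of allocating per call.
            _scratch = new int[Math.Max(source.Length, _scratch.Length * 2)];
        }

        int count = 0;
        for (int i = 0; i < source.Length; i++)
        {
            if (source[i] % 2 == 0)
            {
                _scratch[count++] = source[i];
            }
        }

        return new ArraySegment<int>(_scratch, 0, count);
    }
}
```

    The trade-off is that the returned segment is only valid until the next call to `Filter`, because the same backing array is overwritten each time.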

    As far as the other benefits of unsafe code more broadly, there may be other examples where it is useful that someone else can provide. If you have Unity 2021 or greater, stackalloc does not require an unsafe context as long as the result is assigned to a Span<T>.
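    For illustration, here is a sketch of stackalloc without an unsafe block, combined with a size guard so that large inputs fall back to a pooled array instead of risking the stack. The `ScratchBuffer` class, the `Reverse` helper and the 256-element cut-off are assumptions made for the example; `Span<T>` and `ArrayPool<T>` need a runtime profile that provides them (for example .NET Standard 2.1).

```csharp
using System;
using System.Buffers;

public static class ScratchBuffer
{
    // Hypothetical cut-off that keeps stack use small and predictable.
    private const int MaxStackChars = 256;

    public static string Reverse(string text)
    {
        char[] rented = null;

        // No unsafe context is needed because the stackalloc result is
        // assigned to a Span<char> rather than to a raw pointer.
        Span<char> buffer = text.Length <= MaxStackChars
            ? stackalloc char[MaxStackChars]
            : (rented = ArrayPool<char>.Shared.Rent(text.Length));

        try
        {
            // Work only on the part of the buffer that is actually needed.
            Span<char> slice = buffer.Slice(0, text.Length);
            text.AsSpan().CopyTo(slice);
            slice.Reverse();
            return slice.ToString();
        }
        finally
        {
            // Only the pooled path has anything to give back.
            if (rented != null)
            {
                ArrayPool<char>.Shared.Return(rented);
            }
        }
    }
}
```

    Short strings never touch the managed heap for the scratch space, and long strings reuse pooled arrays, so the only allocation in either path is the returned string itself.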
     
    Last edited: Jun 11, 2023
  33. DTECTOR

    DTECTOR

    Joined:
    Aug 23, 2020
    Posts:
    128
    Wow, that's pretty cool. I am just learning all of this; I have pretty much used C# exclusively, at a very limited level of understanding. For a long time my concept of performance was staying away from allocations in C# by using for over foreach and writing my own sorting and filtering instead of using LINQ and query syntax. But I've recently come to a point where I am worried I may need to begin optimizing, because what I have to represent in code is a lot of data and a lot of moving parts, and I now have a lot more garbage than I did before I had finished implementing everything. So I am just trying to make sense of all this stuff, as it's rather new to me.

    I've never thought much about the stack. Right now, I am beginning to believe that stackalloc is probably pretty useless in gameplay-related scripting, especially with how small the stack is. And if I am understanding all of this correctly, the reference to the data for reference types is stored on the stack and points to the data stored on the heap. So you are already filling the stack up with reference types and "new"... So even then you are already working with something that probably looks like a lot less than 4 MB on the main thread of your Unity app, which is the only thread on which you can call and manage Unity objects, because the API is not thread safe.

    With that size, I would imagine the stack is mostly designed for, or assumed to be used for, primitives and not complex types. I imagine that's another reason structs are intended to remain small, aside from the fact that they traditionally pass by copy. But they also allocate on the stack, so I don't know how I would optimize much other than very small stack allocations like ints and floats, by passing them as ref and reusing them. I imagine using other threads to allocate more stack resources would be a bad idea. Maybe DOTS is the only solution? I am using 2020, so DOTS 1.0 is not available to me.

    But I guess I am well out of bounds of the original question on this post. Thanks guys for all your insightful input. @CodeRonnie I really appreciate all the resources you provided. I am checking them all out. Real helpful gems.

    If you guys have any thoughts on my rambling here. I'd appreciate your insight.
     
  34. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    287
    Most of what is in memory is going to be allocated on the heap. That's what's occupying all of the RAM. But, as long as something is referenced by something, they're all connected in this big tree of data. Once something becomes fully snipped off of that tree, it will eventually get garbage collected. C# maintains a separate tree that it can compare, and free up the memory for the bits that have become fully de-referenced.

    The stack is for all of the local variables in the methods that are running, and that sort of thing. Basically, if you need a local array, you could stackalloc it, and it will get cleaned up easily when the method ends. However, you would want to do that infrequently, sort of event based. You wouldn't want it happening in the update loop because you may as well just hang onto the array at that point.
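    To make the distinction concrete, here is a small sketch with hypothetical types. A struct local is copied when passed by value and needs no garbage collection of its own (it typically lives on the stack unless it is boxed or captured), while a class instance lives on the heap and is shared through references. Passing the struct by ref avoids the copy.

```csharp
public struct PointValue
{
    public int X;
}

public sealed class PointObject
{
    public int X;
}

public static class ValueVsReference
{
    // Receives a copy of the struct, so the caller's value is unchanged.
    public static void MoveCopy(PointValue p) { p.X += 10; }

    // Receives a reference to the caller's struct, so no copy is made.
    public static void MoveInPlace(ref PointValue p) { p.X += 10; }

    // Receives a reference to the heap object, which the caller also sees.
    public static void Move(PointObject p) { p.X += 10; }

    // Returns { value after MoveCopy, value after MoveInPlace, object after Move }.
    public static int[] Demo()
    {
        PointValue v = new PointValue { X = 1 };
        PointObject o = new PointObject { X = 1 };

        MoveCopy(v);
        int afterCopy = v.X;

        MoveInPlace(ref v);
        Move(o);

        return new[] { afterCopy, v.X, o.X };
    }
}
```

    `Demo()` returns 1, 11 and 11: the by-value call leaves the struct untouched, while the ref call and the class call both change what the caller sees.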

    It's a lifelong practice to expand our knowledge of programming. I've been coding for years, and I've only just learned much of this recently. I fully intend to keep learning as long as I'm writing code. So, keep it up. :)
     
    Last edited: Jun 11, 2023
  35. MelvMay

    MelvMay

    Unity Technologies

    Joined:
    May 24, 2013
    Posts:
    10,533
    The feeling of "power" that comes from such direct manipulation when I was a 15 year old Nerd simply cannot be described. Was it really *ahem* nearly 40 years ago. :eek: Time flies when you're poking memory.
     
  36. Verne33

    Verne33

    Joined:
    Apr 12, 2020
    Posts:
    30
    Is "high performance audio mode" an actual feature, or were you referring to the unsafe block? I have never heard of this, nor can I find it on google ;)
     
  37. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    Apologies, I paraphrased, and got it wrong... it's called "Best Latency" and is an option in Audio Settings. See the letter "C" in this image, that's where you set it:

    https://docs.unity3d.com/Manual/comp-ManagerGroup.html
     
  38. Verne33

    Verne33

    Joined:
    Apr 12, 2020
    Posts:
    30
    After some experimentation, there is an unfortunate caveat that Windows cannot seem to handle any Unity DSP Buffer Size under 1024, which includes 'Best Latency' and 'Good Latency'. It is a known issue that causes your audio to crackle/stutter.

    Apparently it works OK on Android, iOS, and Mac machines
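    For a sense of scale, the time covered by one DSP buffer is simply the buffer size divided by the sample rate. The helper below is a hypothetical sketch of that arithmetic; the 48,000 Hz rate in the note after it is an assumption, and real output latency also depends on how many buffers are queued.

```csharp
public static class DspLatency
{
    // Duration of a single DSP buffer, in milliseconds.
    public static double BufferMilliseconds(int bufferSizeInSamples, int sampleRateHz)
    {
        return 1000.0 * bufferSizeInSamples / sampleRateHz;
    }
}
```

    At an assumed 48,000 Hz, a 1024-sample buffer covers about 21.3 ms and a 256-sample buffer about 5.3 ms, which is why the smaller settings matter for fast games. The buffer size actually in use can be checked at runtime with `AudioSettings.GetDSPBufferSize`.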
     
  39. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    36,762
    What's sad is that I bet Windows95 could have handled it. :)
     
  40. Verne33

    Verne33

    Joined:
    Apr 12, 2020
    Posts:
    30
    I did read older forum posts stating it wasn't happening prior to some version of Unity 5. No idea whether that's true, or if it's more on Windows side, as you seem to imply

    In other words, I'd be curious if ways to circumvent this issue via middleware, custom plugins, etc are viable.
     
  41. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    This is a Unity problem.

    They should have fixed this.

    I'll have a look into it when I find some patience to deal with Windows and Unity at the same time ;)

    DAWs (audio workstations) happily work down to crazy low DSP packets on Windows. Daughter runs at 64 for audio recording to reduce latency, so it's definitely not inherent to Windows, just Unity's relationship with Windows.
     
  42. AcidArrow

    AcidArrow

    Joined:
    May 20, 2010
    Posts:
    11,004
    Unity changed something at some point (during Unity 5 sounds right) and they have been refusing to touch their audio stuff since.
     
  43. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    Just done a bit of research. The situation is worse than I first thought. @AcidArrow has also summed it up accurately: "action to correct" is not a thing.

    This is, sadly, generally consistent with Unity's way of not doing things and leaving traps every other place you dare to step to do something with their engine.

    Sorry, I don't think there's any other solution other than making your own thread(s).
     
  44. Verne33

    Verne33

    Joined:
    Apr 12, 2020
    Posts:
    30
    I know we are majorly sidetracked from the original topic, but anyone have insight into Windows + FMOD and Wwise? I know they have their own API with options to change buffer sizes and such.

    Or Unreal?

    I've mostly been able to overcome these limitations with my app, but it was honestly way harder than it should've been. And Unity's lack of attention to audio is concerning going forward.
     
  45. AcidArrow

    AcidArrow

    Joined:
    May 20, 2010
    Posts:
    11,004
    I use FMOD Studio. I like it. My latency requirements are not crazy though.

    I found Wwise less straightforward to use and not for much benefit (although in theory it has better tools / automation to assist you with voiced dialogue etc).

    Try them, these days they both have free licenses for indies and affordable ones for bigger productions.
     
  46. Unifikation

    Unifikation

    Joined:
    Jan 4, 2023
    Posts:
    1,046
    It's only going to get worse, I fear.

    Unlike AcidArrow, I do have latency requirements; my game is fast. Knowing what you've now told me, it's probably not possible on Windows, as I lean on OnAudioFilterRead and low-latency audio for all sorts of things, and the players expect and need the audio to be low latency, such is the speed of the game. This works well enough on iOS and Mac with Unity, after much work figuring out how to get it to work.

    As you say, it should just work.

    Unreal has a LOT of this stuff covered, quite well, including its own sound synthesis system.

    Unity started and then stopped audio development, several times, and now leans on providing FMOD for free as an add-on. But this won't help with latency, at all. May make it slightly worse.
     