Discussion Why Doesn't C# Automatically Release Memory for Temporary Strings

Discussion in 'Scripting' started by GuirieSanchez, May 26, 2024.

  1. GuirieSanchez

    GuirieSanchez

    Joined:
    Oct 12, 2021
    Posts:
    462
    Hi everyone,

    I've been wondering why the memory of strings held in local (stack) variables in C# isn't released automatically when they go out of scope, and is instead managed by the GC like any other object. I understand the decision behind using the GC to make the C# developer's life easier when it comes to memory management. However, I was wondering why making an exception only for string objects would be a bad idea, or whether it has ever been considered and why it was rejected.

    For instance, consider this quick example:
    Code (CSharp):
    for (int i = 0; i < 1000; i++)
    {
        string a = "aaaa";
    }
    This outputs 2.88KB of garbage that will eventually be reclaimed by the GC. In C++:
    Code (C++):
    for (int i = 0; i < 1000; i++)
    {
        std::string a = "aaaa";
    }
    The string object gets automatically deleted and its memory is automatically released when it goes out of scope (in other words, at the end of each iteration of the loop). Naturally, the memory snapshot at the end of the loop is 0KB since all the allocated memory is managed by C++ (so no stress on the developer for this particular case).
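
    One way to check the "nothing is left after the loop" claim without a profiler is to give the string a counting allocator. This is only a sketch: CountingAllocator, CountedString, g_allocs, g_live and live_buffers_after_loop are made-up names for illustration, and whether a short literal such as "aaaa" touches the heap at all depends on the standard library's small-string buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical bookkeeping, for illustration only.
static long g_allocs = 0;  // total heap requests made by our strings
static long g_live = 0;    // buffers allocated and not yet freed

template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        ++g_allocs;
        ++g_live;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        --g_live;
        ::operator delete(p);
    }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Runs the loop from the post and returns how many buffers are still live.
long live_buffers_after_loop(const char* text) {
    for (int i = 0; i < 1000; i++) {
        CountedString a = text;  // may or may not need heap memory
        assert(!a.empty());
    }  // 'a' is destroyed here, at the end of every iteration
    return g_live;
}
```

    Calling live_buffers_after_loop with a text too long for any inline buffer makes every iteration allocate and free once, so the live count returns to zero. With "aaaa" the allocator is typically never called.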

    The code in release mode for both projects runs at the same speed, the only difference being the garbage output from the C# side. Since string creation and manipulation are so common in applications of all sorts, I wonder why using an approach similar to C++ for strings in C# wouldn't be feasible (like making an exception for strings, while having all other classes/objects be managed by the GC).

    So, I guess my question is if it was a decision with the purpose of making the behavior consistent and align with the GC and the managed nature of C#, or if it's something difficult to implement, or if it has some performance considerations that only arise in a C# environment that prevent it from being as performant as the C++ counterpart.

    Thanks!
     
  2. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    39,369
    There are rivers of digital ink already spilled out there about the differences between these languages.

    Here's one:

    https://www.geeksforgeeks.org/c-vs-c-sharp/

    Mark and sweep GC is fundamentally different from ref-counting new/delete systems.
     
    DragonCoder and marcoantap like this.
  3. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,594
    No it won't. Both of these in C/C++ and C# will behave in slightly similar manners (not exactly, but slightly similar).

    Both of these examples are setting a local variable from a string literal. Both C/C++ and C# have special ways they deal with string literals. In both languages there is effectively a read-only section of memory where string literals are stored, and that's where the characters come from.

    In the case of C#, 'System.String' is just a plain old class whose interface is defined in an immutable manner (you can't modify a string, every method that modifies the string actually creates a new string...). So the variable is just a reference to the string literal in the intern pool, where all string literals are stored as a part of program memory.

    In the case of std::string from C++... well your implementation really defines exactly what is going on. But the general shape of std::string is that there is a char* inside of the class as well as some other fields for length and whatnot. Constructing it from a literal copies the characters into the string's own buffer (inside the object itself for short strings on the common implementations, otherwise on the heap). It's the plain C way of doing it, where you would have just defined your string as a 'const char* a' directly, that points straight at the literal in the read-only portion of memory.

    Each string relies on the manner in which memory is managed in each language.

    You're missing a crucial step in the C/C++ side of things since you're relying on the std::string class which hides away a lot of the logic internally. In the end the C/C++ std::string is actually a char*... it's a pointer to a place in memory.

    Your general string implementation will allocate the memory for the actual string data, and in its destructor it will deallocate it.

    If you've stack allocated your string like in your example code above, then at the end of the enclosing scope (each loop iteration, in your example) the compiler automatically calls the destructor, which in turn runs the necessary logic to properly manage the memory of the string.
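
    The general shape described here (a char* plus a length, with the buffer released when the variable leaves scope) can be sketched as a toy class. MiniString, g_live_buffers and live_inside_scope are made-up names; a real std::string also tracks capacity, takes an allocator, and usually keeps short strings inline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical counter so the sketch can observe allocation and release.
static int g_live_buffers = 0;

// MiniString is a heavily simplified stand-in for std::string.
class MiniString {
public:
    explicit MiniString(const char* text)
        : length_(std::strlen(text)), data_(new char[length_ + 1]) {
        std::memcpy(data_, text, length_ + 1);  // copy characters plus '\0'
        ++g_live_buffers;
    }
    ~MiniString() {
        delete[] data_;  // the string frees its own buffer
        --g_live_buffers;
    }
    // Copying is disabled to keep the sketch short (avoids a double delete).
    MiniString(const MiniString&) = delete;
    MiniString& operator=(const MiniString&) = delete;

    std::size_t length() const { return length_; }
    const char* c_str() const { return data_; }

private:
    std::size_t length_;  // declared first, so it is initialized first
    char* data_;
};

// Returns the number of live buffers while the local string still exists.
int live_inside_scope() {
    MiniString a("aaaa");
    assert(a.length() == 4);
    return g_live_buffers;
}  // a's destructor runs here and frees the buffer
```

    live_inside_scope() reports one live buffer while the local exists, and the counter is back at zero once the function has returned.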

    If you go and dig up some source code like say the GNU libstdc++ code you'll notice that there is actually an 'Allocator' passed along to the basic_string template for dealing with this. If I understand it correctly, the default allocator simply forwards to operator new and operator delete, and a string as short as "aaaa" never reaches it at all, because the common implementations keep short strings inside the string object itself (the small string optimization), so the destructor has no heap memory to release.

    BUT if you had actually generated a string at runtime (and so isn't known to the program memory) then that string will get pruned from memory.

    If you had allocated your string manually instead of stack allocated it like in your example... well you would have had to deallocate it in the standard way you would any old class in C/C++.

    Because in the end C# string and C/C++ std::string rely on the same fundamental principles of how the languages and memory management behave. And therefore C#'s implementation relies on the GC for its memory management. Because that's how all classes in C# are dealt with.

    ...

    And in both cases this actually makes sense. In both C# and C/C++ stack allocations are generally a fixed size. When the function is called a section of the stack is dedicated to that function call. All local variables of that function exist in that little section of memory on the stack. Every byte, int, float, char, bool, exists right there... including a struct of those things. And the compiler knows exactly how big each and every one of them are as well.

    In C, this includes constant sized arrays. If your function declares an array like so:
    char a[16]...

    We know exactly how big this array is. It's 16 * sizeof(char).

    But if you want some resizable string... well... things get a bit different. Either you create a char[] bigger than the biggest string you need and terminate it with \0, or you alloc a dynamically sized array, or in the case of std::string you allow the classes implementation to handle it all for you magically under the hood.
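
    The first option, a fixed char array that is larger than the biggest string you need and terminated with \0, might look like the sketch below. format_score and stack_buffer_demo are hypothetical helper names, and the "Score: " prefix is just an example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Writes "Score: <value>" into a caller-provided buffer. Returns the number
// of characters written (excluding '\0'), or 0 if the text did not fit.
std::size_t format_score(char* out, std::size_t capacity, int value) {
    assert(out != nullptr && capacity > 0);
    const int written = std::snprintf(out, capacity, "Score: %d", value);
    if (written < 0 || static_cast<std::size_t>(written) >= capacity) {
        out[0] = '\0';  // truncated or failed: report an empty string
        return 0;
    }
    return static_cast<std::size_t>(written);
}

// Formats into a 16-byte stack array and returns the resulting length.
std::size_t stack_buffer_demo(int value) {
    char text[16];  // size known at compile time, lives in the stack frame
    static_assert(sizeof(text) == 16 * sizeof(char), "fixed-size array");
    return format_score(text, sizeof(text), value);
}
```

    Nothing here touches the heap, but the price is the fixed upper bound: a value whose text needs 16 characters or more is rejected instead of growing the buffer.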

    ...

    But here's the really convenient part of C/C++... structs and classes are the same thing fundamentally at the end of the day (yes there's differences... but really as you boil it down they're the same general idea as a result of how C++ was built on top of C).

    In C# that's not how it rolls. You have to pick struct or class and their behaviour is distinct and limited.

    A class is always allocated on the heap and referenced. It can be of a dynamic size since the allocation happens at run time, at the moment of saying "new".

    A struct is allocated wherever the context needs it. Meaning if you're in a function it's on the stack, or if it's a field of a class it's in the allocated memory of the class's object. This freedom of where it can be in memory comes with a major limitation though... its memory size has to be fixed. A Vector3 struct is ALWAYS 3 floats. A float[] might be any number of floats in size.... including zero.

    Strings also aren't fixed in size, they behave more like an array. So therefore they need to be a class. And by being a class they come with all the limitations of a class in how its memory is managed since at the end of the day we're relying on the fundamental aspects of the language.

    ...

    Now keep in mind. Garbage Collection isn't a bad thing.

    The problem that arises is that here in Unity land we perceive it as a bad thing because we're making video games. We're constructing software that is inherently concerned about the frame rate of our game.

    But C# wasn't designed for video games. It was designed at the turn of the millennium to fulfill a similar task as Java. Things like enterprise software and/or rapid application development. It was designed to do what the likes of VB was doing in the 90s but better. So when the GC runs and we need to stall the game for a moment, since GC doesn't like you poking its innards while it's doing its thing, that is not a C# problem... it's a Unity problem.

    The fact that a string is a regular class that is managed by GC is... a feature!

    Unity just happened to pick C# for its scripting environment.
     
    Last edited: May 27, 2024
    Sluggy, ClementSXD, Lekret and 7 others like this.
  4. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    4,194
    @lordofduct That's an impressive summary :)

    I'd like to add that C# recently has received a new expression: stackalloc. It allows you to allocate a fixed-size block of an unmanaged type on the stack. This "looks" like a normal managed array, but isn't.

    Arrays just like strings are objects and therefore always live on the heap. A "Span" is actually a special kind of struct which essentially wraps a memory pointer with some safeguards. Many methods in the .NET framework which take a string now have a version that takes a Span<char> as well, specifically to avoid garbage allocations. Though many still do not. Since this is relatively new, Unity is only slowly adopting those new features.

    The example of

    Code (CSharp):
    string a = "aaaa";
    as already mentioned by our lord, does not allocate a string object as string literals are "interned" strings. So each iteration you define a "new" variable on the stack which is a reference to a string object and each would just reference that same pooled object.

    So a more realistic example would be
    Code (CSharp):
    for (int i = 0; i < 1000; i++)
    {
        string a = "aaaa" + i;
    }
    This would indeed allocate a new string object on the heap that needs to be garbage collected. Note that there's a difference between defining a local variable and allocating an object. In C# all local variables are allocated when you enter the method and cleaned up when you leave the method. So it's essentially hardcoded in the compiled method. How the reserved stack memory is used depends on where and how your local variables are defined. In your case your "a" variable would just eat up 4 (or 8) bytes and those are reused in each iteration. Of course this stack memory has nothing to do with the string object which you may store in that variable besides that it is counted as a reference to the object. The GC will actually check the current stack if there are references to objects so they are not collected. My loop would actually create 1000 individual string objects (actually more than that since "i" would be converted to a string itself before it gets concatenated).
     
  5. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,687
    We're also dealing with a less advanced, less efficient, and less performant garbage collector. One of the things coming with the new scripting runtime is a more competent garbage collector. Josh has a few blog posts on the topic but I've linked the main one below.

    https://blog.unity.com/engine-platform/porting-unity-to-coreclr
     
    Last edited: May 27, 2024
  6. halley

    halley

    Joined:
    Aug 26, 2013
    Posts:
    2,558
    I Cmd+F'd the thread, saw "interned", saw Bunny83, had nothing to add.
     
  7. Spy-Master

    Spy-Master

    Joined:
    Aug 4, 2022
    Posts:
    844
    Ryiah, Bunny83, SisusCo and 1 other person like this.
  8. GuirieSanchez

    GuirieSanchez

    Joined:
    Oct 12, 2021
    Posts:
    462
    Nice post. C and C++ are my favorite programming languages. I grew up dissecting games like Pokémon, some of which were written in C, and learning how they managed memory to squeeze out good performance on old consoles with very limited memory. For instance, they would place 8 booleans into a single byte and use other clever tricks. These languages allow for tight communication with hardware, precise control over memory patterns, and other low-level stuff. I believe you can do similar things in C# too, although there are more restrictions. I didn’t spend nearly as much time delving into the inner workings of C# as I did with C++ when I first learned it, so there are still some things that escape me.
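
    For anyone who hasn't seen the "8 booleans in a single byte" trick, a minimal sketch is below. set_flag and get_flag are hypothetical helpers written for this post, not code taken from any actual game.

```cpp
#include <cassert>
#include <cstdint>

// Returns 'flags' with bit 'index' (0..7) switched on or off.
std::uint8_t set_flag(std::uint8_t flags, unsigned index, bool value) {
    assert(index < 8);
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << index);
    return value ? static_cast<std::uint8_t>(flags | mask)
                 : static_cast<std::uint8_t>(flags & ~mask);
}

// Reads bit 'index' (0..7) back as a bool.
bool get_flag(std::uint8_t flags, unsigned index) {
    assert(index < 8);
    return ((flags >> index) & 1u) != 0;
}
```

    Eight separate bool variables would normally take eight bytes, while this keeps all of them in one, at the cost of a shift and a mask on every access.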

    On the one hand, we have C#, which was created with the intention of ease of use, safety, and rapid development, sometimes at the cost of performance. On the other hand, Unity chose C# as their scripting language because it is safer, easier to work with, and appeals to a broader audience, which I believe was one of the main reasons for their choice. Nonetheless, it’s fast enough for most use cases, although not as fast for highly demanding games where you need to apply some hardcore optimizations.

    Now, coming back to C# strings, I was wondering specifically whether Unity is able to “hack” C# in a way that allows stack-allocated strings to be immediately deallocated from memory when they go out of scope. Or at least provide the option, since string manipulation is quite common. Of course, if this isn’t being done, there must be a reason (perhaps it’s not as performant as we assume, maybe it’s not that easy, or maybe it’s simply not worth it). I’m just curious, maybe some Unity staff might have insights on this.

    PS: Just to clarify, I’m not making any suggestions or requests; my question is purely out of curiosity.
     
  9. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    39,369
    That's an interesting idea... I don't think they really would want to do all that extra complexity.

    In C++ you call new / delete.

    In C# you just stop referring to something and it becomes eligible for GC... but AFAIK it is GC that does the actual delete / free after it determines (via mark and sweep) that something has no more references.

    I guess if there WAS a hack it would be on Unity to determine that a string really isn't used by anyone, then immediately call the GC and say "hey this one's back for you." It would be sorta... well, I dunno what would be the point.

    Remember also that Unity is IL2CPP on many major platforms (iOS, Android and WebGL), so to trans-hack that C++ that was made from C# in order to do this... oooh, ouch... I think my head exploded just now.
     
    GuirieSanchez likes this.
  10. GuirieSanchez

    GuirieSanchez

    Joined:
    Oct 12, 2021
    Posts:
    462
    Yes, the reason I got 2.88KB is because I took a snapshot in debug mode, so the compiler didn’t apply any optimizations. But yeah, the idea is that. When you concatenate strings, for instance, you create new strings and allocate memory that won’t be immediately deallocated when it goes out of scope.

    :D
     
    Last edited: May 27, 2024
  11. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    4,194
    Right, but you could only use it in unsafe code :) So yeah, the keyword did already exist since the beginning. Haven't checked the IL but it usually allows much more than what the high level languages support. With pure IL you could do all sorts of tricks that C# wouldn't allow. Anyways, that's kinda off topic :)
     
  12. karliss_coldwild

    karliss_coldwild

    Joined:
    Oct 1, 2020
    Posts:
    613
    The tricky part is how to keep it compatible with the rest of the language semantics.

    Sure, in simple test code where you don't do anything with the string, it may seem obvious that it can immediately be freed.
    Code (CSharp):
    for (int i=0; i<1000; i++) {
        string str = $"aaa {i}";
    }
    But real code is more complicated than that. In real code you will probably want to do something with the string.

    Code (CSharp):
    for (int i=0; i<1000; i++) {
        string str = $"aaa {i}";
        DoSomethingWithString(str);
    }
    How do you know that DoSomethingWithString doesn't store the str somewhere? And what happens if it does: does the str become invalid, does the lifetime of str become magically extended, does str get automatically copied, does the code not compile at all, or do you accept memory unsafety and make tracking this the responsibility of the programmer? How does a function accepting a string know whether the string it receives needs to be handled by the GC later?

    Different programming languages solve this differently, but in most cases it has a major impact on the overall type system and memory model of the programming language.

    For example, Rust has a complex borrowing mechanism which it uses to ensure at compile time that no reference to a borrowed string outlives the object it was borrowed from.

    C++ embraces UB, and says that if a programmer writes code where this causes a problem it's their own fault.
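
    To make the C++ side concrete, here is a sketch of a callee that does keep the string around. The names g_saved, DoSomethingWithString and run_loop are made up for this post. It is safe only because it stores a copy. Had it stored the address of str instead, the compiler would have accepted that too, and every saved pointer would dangle after the iteration ended.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical registry that outlives the loop below.
static std::vector<std::string> g_saved;

// Stores a COPY of the caller's string. Keeping &str instead would leave a
// dangling pointer once the caller's string is destroyed, and reading
// through it would be undefined behaviour.
void DoSomethingWithString(const std::string& str) {
    g_saved.push_back(str);
}

// Runs the loop from the post and returns how many strings were kept.
std::size_t run_loop() {
    g_saved.clear();
    for (int i = 0; i < 1000; i++) {
        std::string str = "aaa " + std::to_string(i);
        assert(!str.empty());
        DoSomethingWithString(str);
    }  // str is destroyed here; the copies in g_saved are unaffected
    return g_saved.size();
}
```

    So the scope-based free works, but only because the programmer chose the copy, which is exactly the kind of decision the C# runtime cannot leave to convention.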

    How do you ensure the stack allocated string doesn't get mixed up with GC strings?

    You could introduce a completely new string type. But programming in a language where you have a dozen different string types isn't pleasant; it's annoying when a function expects one kind of string but you have a different one. You often end up adding a bunch of conversions all over the place, potentially ruining the performance benefits for which you introduced multiple string types in the first place. Both Rust and C++ have this, with &str/String and char*/std::string respectively.

    You could hide the runtime tracking of the way memory for string was allocated under the hood, but doing it this way you will likely end up with something that's no better than reference-counting-based garbage collection.
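
    Reference counting is what std::shared_ptr gives you in C++, so the "lifetime gets magically extended" option can be sketched like this. KeepString, g_kept and the two owners_* functions are hypothetical names for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical holder that may keep the string alive after the caller is done.
static std::shared_ptr<std::string> g_kept;

void KeepString(const std::shared_ptr<std::string>& str) {
    g_kept = str;  // a second owner: the reference count goes up
}

// Returns the number of owners left once the local variable is out of scope.
long owners_after_scope() {
    {
        auto str = std::make_shared<std::string>("aaa 42");
        assert(str.use_count() == 1);
        KeepString(str);
        assert(str.use_count() == 2);
    }  // the local owner is gone, but nothing is freed yet
    return g_kept.use_count();
}

// Dropping the last owner is what finally releases the string.
long owners_after_release() {
    g_kept.reset();
    return g_kept.use_count();
}
```

    Every copy and every scope exit now pays for a counter update, which is the runtime tracking cost mentioned above.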

    You could make a type system where object lifetime and allocation are orthogonal to the types, so that you can have the same string type but combine it with different kinds of storage mechanisms. In some sense you still get multiple string types, but in certain usage patterns the conversion is free. Both C++ and Rust do this, since they have stuff like T, T*, T&, unique_ptr<T>, shared_ptr<T> etc. which can be combined with almost any type T. Although you usually don't use each of them as much in combination with strings, since std::string under the hood is essentially unique_ptr<char[]> with a few extras.
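
    A small sketch of that orthogonality in C++: one string type, five ways of holding it, and a single function that accepts all of them. length_of and total_length are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Borrows any std::string, regardless of who owns it or where it lives.
std::size_t length_of(const std::string& s) { return s.size(); }

std::size_t total_length() {
    std::string on_stack = "aaa";                          // T: automatic storage
    std::string* raw = &on_stack;                          // T*: non-owning pointer
    std::string& ref = on_stack;                           // T&: alias
    auto unique = std::make_unique<std::string>("bbbb");   // unique_ptr<T>: sole owner
    auto shared = std::make_shared<std::string>("ccccc");  // shared_ptr<T>: counted owners

    assert(raw == &ref);
    return length_of(on_stack) + length_of(*raw) + length_of(ref)
         + length_of(*unique) + length_of(*shared);
}
```

    The callee never learns how the string is stored, and the caller pays nothing for the conversion. The flip side is that the callee must not keep the reference beyond the call.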

    One more consideration is backwards compatibility. There is already a large amount of existing C# code passing around strings with the expectation that you can do everything you currently can. A lot of solutions to get non-GC strings would likely introduce an additional string type or changes to the type system, both of which would have poor compatibility with existing C# code.

    Back to the core problem. A common usecase which introduces a lot of string garbage is when you build longer strings from multiple parts. That's why you have stuff like StringBuilder https://learn.microsoft.com/en-us/dotnet/standard/base-types/stringbuilder . And if you are simply passing around strings without any modifications there shouldn't be any garbage generated since strings in C# are immutable so they can all point to the same object.
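
    StringBuilder itself is a C# class. To stay with the C++ used for comparison in this thread, here is a sketch of the same idea (build the result in one growing buffer instead of creating a fresh string per concatenation). join_parts and build_greeting are hypothetical names, and this is an analogue, not how StringBuilder is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins the parts into one string using a single up-front reservation.
std::string join_parts(const std::vector<std::string>& parts) {
    std::size_t total = 0;
    for (const std::string& p : parts) total += p.size();

    std::string result;
    result.reserve(total);  // one buffer, sized for the final text
    for (const std::string& p : parts) {
        result += p;        // appends in place, no intermediate strings
    }
    assert(result.size() == total);
    return result;
}

std::string build_greeting() {
    return join_parts({"Score: ", "1234", " / ", "9999"});
}
```

    Chaining + over the same four parts would create intermediate results along the way, which is the garbage the builder pattern avoids in C#.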

    I am somewhat tempted to repeat your test, as it really shouldn't be generating any garbage even in debug mode (unless there were some important details that you didn't mention). Did you test it with Mono or IL2CPP? And how exactly did you observe that 2.88 KB?
     
    Lekret, Ryiah, halley and 2 others like this.
  13. GuirieSanchez

    GuirieSanchez

    Joined:
    Oct 12, 2021
    Posts:
    462
    It all has to do with scopes. In C#, the object behind that str only becomes eligible to be garbage collected once it has no more references. The super simplistic idea would be to free the memory of the string object immediately at this stage, instead of tossing it to the GC. Again, the hypothetical case is to give developers the option to make an exception just for string objects (maybe having a special String class), which might be useful in some cases where heavy string manipulation is needed. But of course, I understand this might not be desirable for Unity if they want to keep everything consistent with the overall design of C#.
     
  14. GuirieSanchez

    GuirieSanchez

    Joined:
    Oct 12, 2021
    Posts:
    462
    Sorry, I didn't mean to say that my code snippet (in particular) generates garbage, especially in a final executable with optimizations in place (it only did in debug mode with no optimization).

    I simply wanted to point out that creating new strings generates garbage in a C# environment. Since strings are immutable, concatenating strings creates new strings and therefore generates garbage (a better example of this is Bunny83's). In C++, for example, the counterpart will immediately release the allocated memory when those newly created strings go out of scope.
     
  15. GuirieSanchez

    GuirieSanchez

    Joined:
    Oct 12, 2021
    Posts:
    462
    C#:
    Code (CSharp):
    static void Main(string[] args)
    {
        // Memory snapshot 1

        for (int i = 0; i < 1000; i++)
        {
            string a = "aaaa" + i.ToString();
        }

        // Memory snapshot 2
    }
    C++:
    Code (C++):
    #include <string>

    int main()
    {
        // Memory snapshot 1

        for (int i = 0; i < 1000; i++)
        {
            std::string a = "asas" + std::to_string(i);
        }

        // Memory snapshot 2
    }
    The difference in the heap size in C#:
    upload_2024-5-27_16-38-35.png

    (+5.08KB) as expected since in each iteration of the loop a string was discarded and is eligible to be garbage collected eventually.

    The difference in C++:
    upload_2024-5-27_16-40-34.png

    also as expected since in each iteration of the loop a string was discarded and the memory it pointed to is immediately deallocated.

    -------------

    I wasn't concerned about speed, just memory, but I figured I would also share it. Both in release with optimization, C# yields about 110μs on average while C++ yields 19μs. Of course, not a fair comparison (different classes and implementations, different compilers, or being able to use IL2CPP in C#, etc.). Using raw chars would probably yield similar speeds for both.
     
    Last edited: May 27, 2024
  16. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    4,194
    That's simply not an option. In C# objects live exclusively on the managed heap and the heap management is completely in the hands of the GC and runs on a separate thread. Apart from that you know that object references can be passed around and stored somewhere else. So "just freeing the memory" is not an option for any reference type, as you don't know who may have a reference. Figuring out who has a reference is the job of the GC by examining static variables and the callstack.

    You essentially want an exception to store an object in stack memory which is not possible at all in C#. As I said, objects always live on the heap. The goal of having a managed memory is specifically to avoid such potential issues that some part may free memory that others still use.

    When or in which case would you actually need a string just on the stack? What would you do with that string? You could not pass a reference to such a string to any method which could potentially store it somewhere. Since the CLR can not enforce this, it's generally not allowed. You can do all sort of nasty stuff in unmanaged or unsafe code with native memory that is not managed memory. Though you can not store managed objects like strings there.

    I'm not sure I understood what you meant by that but it seems you have the wrong idea what is happening here. Your str is of course just a name for a bit of memory. This memory in case of a local variable is the stack and as I mentioned before, the memory for all local variables is allocated when you enter the method and freed when you return from that method. That variable is just a reference. So it occupies 4 or 8 bytes as it just holds a reference to an object. Nobody "references" the variable itself. The identifier that within this scope essentially refers to the 4 or 8 bytes of memory on the stack can of course only be used inside the scope it is defined with. This is only about the identifier. As I said, the memory exists before you enter the scope and will last until the method returned. The actual string object lives on the managed heap. When you pass your "str" variable to another method, you actually copy the reference that is stored in your local str variable into the stackframe of the method that you're calling. That means the actual object can outlive the original variable that held the reference for the first time.

    If you allocate a new string object each iteration, all those strings (given you did not store those references somewhere else) would be up for garbage collection right away, even while your loop still runs. The only exception is the one string object that is currently referenced by your str variable.

    So when you do

    Code (CSharp):
    for (int i = 0; i < 1000; i++)
    {
        string a = i.ToString();
        if (i % 100 == 99)
            System.GC.Collect();
    }
    This would, with a very high chance, collect 99 string objects (i = 0 to 98) which are all up for collection. So in this loop there would be at max 100 strings on the heap at the same time (just from this code, of course). If you ran GC.Collect every iteration, there would only be 2 strings on the heap and each iteration one would be collected. Variables on the stack do not occupy memory that needs to be collected, only objects do and those can not be "removed" manually because we have a managed heap. You can not decide when this memory is no longer in use. That's done by the highest court: the garbage collector.
     
    Last edited: May 27, 2024
    Ryiah likes this.
  17. halley

    halley

    Joined:
    Aug 26, 2013
    Posts:
    2,558
    In addition to the fundamental rules Bunny laid out (the reference is on the stack, the backing data is on the heap), you'd have to ask yourself why string is special among all objects. They're first-class objects because the syntax has special sugar to handle them easily, but at the end of the day, they're just objects. Why would string get special considerations for this mysterious "freed instantly" behavior, while, say, anything else in the class library, or indeed [MagicallyFree] class MyMagicalClass would not? You're taking on the work of the GC in two different places: the magically-go-out-of-scope-free and the gradually-work-our-way-through-heap-marking-and-sweeping-free. Gotta make sure it's bullet-proof in all cases and now both freeing functions must be sure they're not interfering with each other.
     
    Last edited: May 27, 2024
    Bunny83 and Ryiah like this.
  18. GuirieSanchez

    GuirieSanchez

    Joined:
    Oct 12, 2021
    Posts:
    462
    This is exactly what I was saying, perhaps I didn't explain myself correctly. string a is a local variable whose scope is that of the loop. Once the current iteration of the loop finishes, string a is gone and up for the GC. But what I was essentially comparing is a C# version where you generate garbage continuously as the loop runs, vs a C++ version that leaves no garbage after the loop ends, thus avoiding performance hits from garbage collection.

    Now, the original question was if it's feasible, or worth even considering I should say, to have a special class (string for example) that uses a C++ approach: for instance, having those objects not living in the managed heap. You said:
    I am curious about the feasibility and difficulty of deviating from the managed heap allocation in C#, particularly for performance-critical sections where manual memory management could be beneficial. If you know the details on how exactly this is not possible and care to share, I'd appreciate it. And to clarify one more time, I'm not requesting anything to Unity, or requesting something specifically about strings; rather, I ask simply out of curiosity because I like to know the inner workings of these things.

    The example wasn't meant to represent any realistic use case but just to showcase garbage generation in C# vs immediate deallocation in C++. I personally don't know how the string class (and some features of C#) works under the hood, so forgive me if I make wrong assumptions. I'm going to deviate a little bit to discuss scopes, so please bear with me for a second. Consider this useless piece of code: **(no real use case)**
    Code (C++):
    #include <cstddef> // std::size_t
    #include <new>     // placement new

    void foo()
    {
        // Allocate memory for the Entity object once
        const std::size_t bufferSize = sizeof(Entity);
        char* buffer = new char[bufferSize]; // not const: objects get constructed in it

        for (int i = 0; i < 1000; i++)
        {
            // Construct a new Entity object in the same memory location
            Entity* entity = new (buffer) Entity("aaaa");

            // Do something

            // Explicitly call the destructor to clean up the object without freeing memory
            entity->~Entity();
        }

        // Free the allocated memory
        delete[] buffer;
    }
    We have a local pointer (called buffer) that points to an array allocated in the heap. The scope of the buffer pointer is that of the function foo, which is where the pointer was created. buffer is just an integer (8 bytes in x64) that represents an address pointing to the start of the memory in the heap where the array of chars starts (in this case, an array of 48 elements/bytes since the size of the Entity is 48 bytes). If we fail to free the memory (maybe there's an exception, or the class that implements the function gets destroyed, you name it), the memory for the buffer pointer itself gets deleted since it goes out of scope, but the array of chars will keep living in memory.

    Now, smart pointers can help with some edge cases since they work in a managed way, more similar to a higher-level language:
    Code (C++):
    #include <cstddef> // std::size_t
    #include <memory>  // std::unique_ptr, std::make_unique
    #include <new>     // placement new

    void foo_v1()
    {
        // Allocate memory for the Entity object once
        const std::size_t bufferSize = sizeof(Entity);
        std::unique_ptr<char[]> buffer = std::make_unique<char[]>(bufferSize);

        for (int i = 0; i < 1000; i++)
        {
            // Construct a new Entity object in the same memory location
            Entity* entity = new (buffer.get()) Entity("aaaa");

            // Do something

            // Explicitly call the destructor to clean up the object without freeing memory
            entity->~Entity();
        }
    }
    The buffer variable is a unique_ptr instance on the stack; std::make_unique calls new[] for you and allocates the char array. Since it's a local variable, when foo_v1 ends (even if it exits early through an exception), buffer goes out of scope and its destructor runs, which in turn calls delete[] for you, freeing the heap memory that holds the char array.
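
    To make the "even if not gracefully" part concrete, here is a minimal sketch. CountingDeleter, risky and released_after_throw are hypothetical names made up for this illustration; the custom deleter only exists so the release can be observed.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical deleter that counts releases so the effect is observable.
struct CountingDeleter {
    int* released;
    void operator()(char* p) const {
        delete[] p;
        ++*released;
    }
};

// Hypothetical function that fails right after allocating its buffer.
void risky(int* released) {
    std::unique_ptr<char[], CountingDeleter> buffer(new char[48], CountingDeleter{released});
    throw std::runtime_error("something went wrong");
    // No delete[] needed: the unique_ptr destructor runs during stack unwinding.
}

// Returns how many buffers had been released by the time the exception was caught.
int released_after_throw() {
    int released = 0;
    try {
        risky(&released);
    } catch (const std::runtime_error&) {
        assert(released == 1);  // buffer was already freed during unwinding
    }
    return released;
}
```

    Calling released_after_throw() should report exactly one release, even though risky() never reaches a delete[] of its own. Note that this relies on the exception being caught somewhere; that is an assumption of the sketch.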

    What I want to say with this is that deallocation/garbage generation happens regardless, either implicitly or explicitly. The discussion is not about whether we want to keep a reference or pointer that outlives the scope in which it was created, but rather, to see if stack-allocated objects, which are automatically managed and destroyed by the runtime, actually delay memory deallocation or not.

    Here's a more realistic example when discussing strings and garbage generation in C#:
    Code (CSharp):
    void UpdateScore(int score)
    {
        m_ScoreText.text = score.ToString();
    }
    Imagine we update the score quite frequently during gameplay. .ToString() creates a new string object in memory, which is immediately assigned to m_ScoreText.text. The old string that m_ScoreText held is no longer referenced and thus becomes eligible for garbage collection (in other words, garbage is generated). In a C++ approach, the previous string that is no longer referenced gets deallocated automatically.

    Text and string manipulations in general are very common in many applications and games, that's why I chose 'strings' as an example in this post. For example, score updates, text display, fps count, console text, chats, and so on. It's not about giving strings a special use over other managed objects, but rather, to see if an alternative to garbage allocation could be feasible (or worth it) in Unity as an option for areas that need to be optimized as per project needs.
     
  19. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,687
    Thanks to this thread I'm currently playing around with a NativeArray-based string implementation. I don't see why you couldn't use something like this where you manipulate the individual entries of the array manually to avoid the allocations.

    Code (csharp):
    using System;
    using Unity.Collections;

    // Fixed-length string backed by a NativeArray<char>. Call Dispose() when done.
    public struct NativeString : IDisposable, IEquatable<NativeString>
    {
        private NativeArray<char> nativeArray;
        public int Length { get; private set; }

        public NativeString(string initial, Allocator allocator = Allocator.Persistent)
        {
            Length = initial.Length;
            nativeArray = new NativeArray<char>(Length, allocator);

            for (int i = 0; i < Length; i++)
                nativeArray[i] = initial[i];
        }

        public char this[int index]
        {
            get
            {
                if (index < 0 || index >= Length)
                    throw new IndexOutOfRangeException($"Index {index} is out of range for NativeString of length {Length}.");

                return nativeArray[index];
            }

            set
            {
                if (index < 0 || index >= Length)
                    throw new IndexOutOfRangeException($"Index {index} is out of range for NativeString of length {Length}.");

                nativeArray[index] = value;
            }
        }

        // Ordinal, character-by-character comparison.
        public int CompareTo(NativeString other)
        {
            int minLength = Math.Min(this.Length, other.Length);

            for (int i = 0; i < minLength; i++)
            {
                int result = this[i].CompareTo(other[i]);
                if (result != 0)
                {
                    return result;
                }
            }

            return this.Length.CompareTo(other.Length);
        }

        // Typed overload: compares contents without boxing either operand.
        public bool Equals(NativeString other)
        {
            if (this.Length != other.Length)
                return false;

            for (int i = 0; i < this.Length; i++)
            {
                if (this[i] != other[i])
                    return false;
            }

            return true;
        }

        public override bool Equals(object obj)
        {
            return obj is NativeString other && Equals(other);
        }

        public override int GetHashCode()
        {
            unchecked // let the hash wrap around instead of throwing on overflow
            {
                int hash = 17;
                for (int i = 0; i < this.Length; i++)
                {
                    hash = hash * 31 + this[i].GetHashCode();
                }
                return hash;
            }
        }

        // No ReferenceEquals check here: on structs it boxes both operands and is always false.
        public static bool operator ==(NativeString lhs, NativeString rhs)
        {
            return lhs.Equals(rhs);
        }

        public static bool operator !=(NativeString lhs, NativeString rhs)
        {
            return !(lhs == rhs);
        }

        public void Dispose()
        {
            if (nativeArray.IsCreated)
                nativeArray.Dispose();
        }
    }

    Edit: Skimming through the TMP docs while writing the above code I noticed you can pass SetText both a char[] and a StringBuilder.

    https://docs.unity3d.com/Packages/c...xt.html#TMPro_TMP_Text_SetText_System_Char___
    https://docs.unity3d.com/Packages/c...o_TMP_Text_SetText_System_Text_StringBuilder_
     
    Last edited: May 27, 2024
    SisusCo, GuirieSanchez and Spy-Master like this.
  20. Spy-Master

    Spy-Master

    Joined:
    Aug 4, 2022
    Posts:
    844
    We have UTF-8 stuff in the Collections package, mainly NativeText/UnsafeText as well as fixed-length text structures varying from 29 to 4093 bytes of capacity. Considering current .NET has UTF-8 literals ("text"u8), combining these two and avoiding UTF-16 to UTF-8 conversion for literals would be interesting.
     
    SisusCo and Ryiah like this.
  21. meredoth

    meredoth

    Joined:
    Jan 29, 2014
    Posts:
    9
    There are no stack-allocated strings in C#: the reference to the string lives on the stack, but the string object itself lives on the heap.

    In C++ the string object is allocated on the stack, but the string contents live on the heap and are referenced by the object. Sometimes, if the string is small enough, its contents are stored inside the string object itself (the small-string optimization used by common standard library implementations), so they end up on the stack too, but that is the exception.
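
    A rough way to see where a given std::string keeps its characters is to check whether data() points inside the object itself. This is only a sketch: stores_inline and long_string_is_on_heap are hypothetical helpers, and whether a short string such as "aaaa" is stored inline depends on the standard library implementation, so the sketch makes no promise about that case.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns true when the character data lives inside the std::string object
// itself rather than in a separate heap block.
bool stores_inline(const std::string& s) {
    const auto begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto end = begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= begin && data < end;
}

// A string far larger than the object itself cannot be stored inline,
// so its characters must live in a separate allocation.
bool long_string_is_on_heap() {
    const std::string big(1000, 'a');
    assert(big.size() > sizeof(std::string));
    return !stores_inline(big);
}
```

    Passing a short local string to stores_inline tells you whether your particular standard library applies the optimization at that length.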

    The GC cannot manage stack memory in C#, and in C++ stack memory cannot be freed selectively by hand. That is because the stack is, well, a stack: a LIFO structure in which things get deleted in the opposite order to the one they were created in.

    The string object in C++ lives on the stack and points to an array of chars on the heap, and when the object goes out of scope it frees that memory. That means the implementer of the string class was responsible for freeing the memory.

    In contrast, a string in C# lives in heap memory, and when no reference to it remains, it becomes eligible for garbage collection (it may or may not get collected right away) and the string contents "may" get deleted from memory too. The reason is that string data in C# is not necessarily exclusive to one variable; strings are implemented in a way that saves memory. For example in C#:

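    Code (CSharp):
    string a = "A string";

    string b = "A string";

    Console.WriteLine(Object.ReferenceEquals(a, b)); // true: identical literals are interned
    Here we have two string variables, but memory is allocated for only one "A string", because identical string literals are interned and share a single instance. Strings built at run time are not interned automatically.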

    So, we actually have two questions to answer: why the string is managed by the GC like everything else, and why it was chosen to be implemented this way.

    For the first, it would be weird for a managed language to have one of its basic types unmanaged. There seems to be a misconception that the garbage collector is slower at freeing memory than doing it manually, the way the string class in C++ does, but in reality freeing memory is a very small part of memory management.

    The garbage collector does many other things that someone coding in C++ has to do manually, and has to do better than the GC if they want their code to be more efficient. It is much harder than just manually deleting the memory. Some examples:

    You have to keep track of all the references that exist for a string and only delete it when the last one goes out of scope, unless you implement strings the way C++ does, where each string owns its own copy, which is less memory efficient for big strings. The GC does this tracking automatically, at some performance cost, but building a mechanism to do it manually in C++ costs performance too.
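
    In C++ the usual off-the-shelf mechanism for this kind of tracking is reference counting. The sketch below uses std::shared_ptr to share one immutable string between two owners; owners_after_inner_scope is a hypothetical name, and this is just one way to do it, not how any particular string class works.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Several owners share one immutable string; the buffer is freed only when
// the last owner goes away.
long owners_after_inner_scope() {
    auto a = std::make_shared<const std::string>("A string");
    {
        auto b = a;                  // second reference, no copy of the text
        assert(a.use_count() == 2);
    }                                // b goes out of scope, count drops back to 1
    return a.use_count();
}
```

    Every copy and every destruction of a shared_ptr updates the counter, which is the performance cost of doing the tracking manually.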

    You have to be careful with memory fragmentation. This is very important and a big problem in C++: it is not as simple as freeing the memory, because eventually you end up with empty "pockets" of memory. In C# this is solved differently depending on the garbage collector.

    Unity uses the Boehm GC, which keeps a pointer to each free block, and every time memory needs to be allocated it checks whether the object fits in that block; if not, it tries the next one, and so on. The standard .NET GC instead consolidates (compacts) memory when it collects, so that a single pointer marks the start of the free space. That makes allocation faster but collection slower, and for that reason the consolidation isn't done all in one go.

    Without a garbage collector, this is a problem for the programmer, who has to implement one of those approaches (or a different one of their choosing) and do it in a way that is faster than the GC's implementation. If you just create and delete strings without doing that, your memory will eventually look like Swiss cheese.
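
    The "Swiss cheese" effect can be reproduced with a toy first-fit free list. This is a deliberately simplified sketch: ToyHeap, FreeBlock and fragmented_request_fails are hypothetical names, the "heap" only tracks offsets, and it is not meant to mirror how the Boehm GC or any real allocator is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap: tracks free regions as (offset, size) pairs, with no coalescing.
struct FreeBlock { std::size_t offset; std::size_t size; };

class ToyHeap {
public:
    explicit ToyHeap(std::size_t capacity) { free_.push_back({0, capacity}); }

    // First fit: walk the free list and take the first block that is big enough.
    // Returns the offset, or -1 if no single free block can hold the request.
    long allocate(std::size_t size) {
        for (std::size_t i = 0; i < free_.size(); ++i) {
            if (free_[i].size >= size) {
                const std::size_t offset = free_[i].offset;
                free_[i].offset += size;
                free_[i].size -= size;
                if (free_[i].size == 0)
                    free_.erase(free_.begin() + static_cast<std::ptrdiff_t>(i));
                return static_cast<long>(offset);
            }
        }
        return -1;
    }

    void release(std::size_t offset, std::size_t size) { free_.push_back({offset, size}); }

    std::size_t total_free() const {
        std::size_t sum = 0;
        for (const FreeBlock& b : free_) sum += b.size;
        return sum;
    }

private:
    std::vector<FreeBlock> free_;
};

// Fill a 64-byte heap with eight 8-byte "strings", free every other one,
// then ask for 16 bytes.
bool fragmented_request_fails() {
    ToyHeap heap(64);
    long offsets[8];
    for (int i = 0; i < 8; ++i) offsets[i] = heap.allocate(8);
    for (int i = 0; i < 8; i += 2) heap.release(static_cast<std::size_t>(offsets[i]), 8);
    // 32 bytes are free in total, but only as four separate 8-byte holes.
    return heap.total_free() == 32 && heap.allocate(16) == -1;
}
```

    Half of the heap is free, yet the 16-byte request fails because no single hole is large enough. The holes are not adjacent, so merging neighbouring free blocks would not help either; only moving the live blocks (compaction) would.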

    Another problem is that you have to lock the memory manually while doing any defragmentation; another is bounds checking for your memory; another is updating all references every time you move memory, and so on.

    I hope this answers the first question, why string is managed by the GC like everything else: manually freeing memory the way the string class in C++ does involves much more, performance-wise, than just the act of freeing it.

    For the second question, why this implementation was chosen: strings are optimized for memory size. In general, strings in C# are assumed to be relatively large and immutable. C# is a general-purpose programming language, and in 99% of cases the performance hit of the GC for strings doesn't matter.

    For the remaining 1%, you can, 99% of the time, cache your strings, null the references when it is convenient to do so (when loading a level, pausing the game, showing a menu, etc.) and call the GC then. Caching the references, nulling them and calling the GC at a convenient moment is much easier than managing all of the memory problems above. For the last 1% of those cases, C# offers low-level tools to manage memory manually yourself, or you can use someone's library that provides a low-level string implementation. But this is rare, and for this reason the managed string is preferred.

    Finally, there is the "object" problem. All reference types in C# inherit from the object class. This helps with conversions, polymorphism and the boxing of value types. If strings weren't managed, many things that currently seem natural in C#, like writing the value of an int to the console, which can box the int and call its ToString method, would be much more complicated, slower (because there would need to be a mechanism for crossing between the managed and the unmanaged space) and harder for programmers, as any implicit conversion to string would need manual memory handling.

    So the answer to why strings are not unmanaged as in C++ is that it is an all-or-nothing deal. Either everything stored on the heap is managed, so that conversions and memory management are easier for the programmer and performance is gained by caching and letting garbage be collected outside the performance-critical parts of your app, or everything on the heap is unmanaged by the language and the programmer deals with memory manually, in a way that is more performant for their use case than the GC, by creating mechanisms that handle all of those memory management problems, not just freeing the memory.

    Mixing the two takes the worst of each: less performance because of the crossing between managed and unmanaged space and the implicit string creations needed, less safety because of the manual memory management, and less productivity because the programmer has to think about memory in two different ways, low-level for the unmanaged part and high-level for the managed part, often both in the same statement, e.g.:
    Code (CSharp):
    Console.WriteLine($"{person.Name} {person.Age}");
    (person is a managed instance of a class, Name an unmanaged string, Age an int converted to string.) What gets manually managed here and what doesn't? Is the conversion of Age to string performant? Does Name reference the unmanaged space and, if so, is it the only reference? When should the string created from the Age integer be freed? Do I need it later, so I should cache it to avoid the cost of crossing the managed/unmanaged boundary? If I free it, do I need to consolidate the unmanaged memory now or later? Etc.

    I'm sure there are other complications of mixing managed and unmanaged instances that someone with experience in building language compilers will know about, but I hope the few I could think of answer your question.
     
    Last edited: May 28, 2024
  22. GuirieSanchez

    GuirieSanchez

    Joined:
    Oct 12, 2021
    Posts:
    462
    Yep, you're right. I should've worded it differently (e.g., when the stack-allocated pointer gets out of scope -> proposal: the string it points to gets immediately deallocated from the heap).

    Thanks for sharing your opinion, I think you hit the nail on the head.

    I don't think mixing both approaches necessarily takes the worst of each though. It's not uncommon to see environments where managed memory management is mixed with manual memory management (for example in Unreal Engine). However, it does involve advanced programmers who know how to handle memory effectively on their own. It's only then when you'd get the best of both worlds rather than the worst.

    But yeah, I have to agree with everything else. For sure, you could make use of the manual string class I was theorizing about by properly managing memory allocations and deallocations, e.g., using custom allocators, pools, placement new operators, etc. (basically following standard rules that avoid fragmentation). However, realistically speaking, what percentage of people would even want to mess around with this? So, yeah, that answers it. No wonder why it hasn't even been considered in the first place.

    In other scripting environments, like those using C++ or other low-level languages, a mixed approach to memory management might make more sense. But in Unity, given that its scripting environment is C#, well, it doesn't.
     
  23. icauroboros

    icauroboros

    Joined:
    Apr 30, 2021
    Posts:
    177
    Manual memory management is not an advanced skill. Stop buying GC gang propaganda.
    99% of the time a heap-allocated object's lifetime is bound to its parent object (or MonoBehaviour), which in turn is bound to its own parent if it has one, all the way up to Main(). You allocate in the constructor (or Awake) and free in Dispose or the destructor (or OnDestroy). Not to mention that class and struct are the same thing in C++, unlike C#: both behave like a C# struct, so they are freed automatically when they go out of scope. And when you forget to free something, your allocator will give you an error message in debug mode.

    The GC heap gets fragmented too; in fact you will have more fragmentation in a GC language, because nearly everything is heap-allocated even when it doesn't need to be. A GC may move memory behind the scenes, but that is not free performance-wise.

    There is something that excels in performance, robustness and simplicity at the same time, in my opinion: one-frame allocators (Allocator.Temp).

    By the nature of a game, many allocations are made at the start of the frame, then freed (or overwritten in preallocated caches, since the GC is slow in Unity) and not used in later frames. One-frame allocators specialize in this: they work much like a stack and are flushed every frame, so they don't fragment. Allocation is fast because there is no need to search for gaps.
    There is no need to free manually, since the memory is released by itself. So no memory leaks.
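
    The idea behind a one-frame allocator can be sketched in a few lines. FrameArena and frame_string are hypothetical names for this illustration, and this is not how Unity's Allocator.Temp is implemented; it only shows the bump-and-reset principle. The sketch assumes the backing storage is suitably aligned for the requested alignments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Minimal per-frame bump allocator (hypothetical; not Unity's Allocator.Temp).
class FrameArena {
public:
    explicit FrameArena(std::size_t capacity) : storage_(capacity), used_(0) {}

    // Allocation is a bounds check plus a pointer bump; no free-list search.
    // Returns nullptr when the arena is exhausted.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start + size > storage_.size()) return nullptr;
        used_ = start + size;
        return storage_.data() + start;
    }

    // Called once at the end of the frame: everything is released at once.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_;
};

// Copies text into frame memory; the result is valid only until the next reset().
const char* frame_string(FrameArena& arena, const char* text) {
    const std::size_t length = std::strlen(text) + 1;
    char* copy = static_cast<char*>(arena.allocate(length, alignof(char)));
    if (copy != nullptr) std::memcpy(copy, text, length);
    return copy;
}
```

    A game loop would call frame_string (or allocate) as often as it likes during a frame and reset() once at the end. The trade-off is that nothing allocated this way may be kept across frames.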

    I think the power of the Temp allocator (or Collections altogether) is not taken seriously by the majority of Unity devs.

    >>Why string not automatically deallocated when out of scope?
    because it's a class.

    If you need a stack-allocated string, there is Span<char> for UTF-16 strings and Span<byte> for UTF-8.
    For a struct string, there is FixedString from Collections.
    For "pooled" class strings, there is StringBuilder.
    And there is ZString for pooled struct strings.
     
    Last edited: May 30, 2024
    GuirieSanchez likes this.
  24. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    444
    Maybe the C++ compiler is smart enough to see that you are creating a string of the same size and reuses it until the loop finishes. It might also see that you are creating it locally and not using it for anything, so it can treat it as irrelevant.
     
  25. SisusCo

    SisusCo

    Joined:
    Jan 29, 2019
    Posts:
    1,363
    Some string operations could in theory be automatically optimized by the compiler to get rid of allocations. For example:

    User's code:
    Code (CSharp):
    bool IsProjectRelativePath(string path)
    {
        var fullPath = Path.GetFullPath(path);
        string fullPathWithForwardSlashes = fullPath.Replace('\\', '/');
        return fullPathWithForwardSlashes.StartsWith(FullProjectPathWithForwardSlashes);
    }
    Lowered code:
    Code (CSharp):
    bool IsProjectRelativePath(string path)
    {
        var fullPath = Path.GetFullPath(path).AsSpan();
        Span<char> fullPathWithForwardSlashes = stackalloc char[fullPath.Length];
        MemoryExtensions.Replace(fullPath, fullPathWithForwardSlashes, '\\', '/');
        return MemoryExtensions.StartsWith(fullPathWithForwardSlashes, FullProjectPathWithForwardSlashes);
    }
    Using spans in your code like this can hurt readability quite a bit, so if the compiler could do all this automatically during lowering, it would be nice.

    ReadOnlySpan<T> / Span<T> support is being added to more and more .NET APIs, so over time it should become easier to perform various string manipulations without causing so many allocations.
     
    GuirieSanchez, spiney199 and Bunny83 like this.
  26. GuirieSanchez

    GuirieSanchez

    Joined:
    Oct 12, 2021
    Posts:
    462
    Yeah, Unity could benefit from using a custom allocator for the String class that uses a dedicated area on the heap for one-frame operations. All temporary strings (in the scope of a specific function) that are created as a result of string manipulations, such as in
    Code (CSharp):
    // inside a method
    string fullPathWithForwardSlashes = fullPath.Replace('\\', '/');
    can use this allocator to store the memory in the heap, and flush it immediately after it's used at the end of the method. This has 3 advantages: (1) No overhead due to garbage generation, (2) No fragmentation in the heap by operating on a frame-by-frame basis (akin to stack frames), ensuring contiguous memory allocation and deallocation, and (3) comparable speed for allocation and deallocation between the custom heap and stack due to their similar operation mechanisms.

    For smaller strings, we can get away by using Span<char> and allocate them on the stack with similar and excellent performance.

    It would be nice if performant operations like this were done automatically by the compiler or, alternatively, under the hood via wrappers, so that the complexity is abstracted away when working in C#. People tend to argue about the complexity of addressing string-related optimizations, but I think everybody would enjoy these optimizations if they were readily available.

    Personally, I would love to see a more performant default string solution in Unity, without relying on non-standard methods with more limited compatibility or options. StringBuilder, for example, is great for string manipulations, but then you have to perform a ".ToString()" at the end, which I would like to avoid. For instance, I'd like to see a string type that allocates memory in fixed-size blocks, resizing only when necessary (like a List<T> growing its capacity, or like StringBuilder). Thus, when assigning new values to this string (assuming no resizes), we would overwrite the data at this string's address at virtually no cost, rather than dump it to the GC and perform a new allocation for each operation. Then, you can have methods like ScoreUpdate that update the text (even using localization or culture info) with no side effects or performance hits. All temporary strings that are created as a result of copy operations or manipulations can be handled using custom heap allocators or stack allocators.
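
    For what it's worth, the "overwrite in place" idea looks roughly like this in C++. ScoreText is a hypothetical type made up for this sketch, with a fixed capacity and no resizing, so it only covers the "assuming no resizes" case described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical fixed-capacity text field whose contents are overwritten in place.
class ScoreText {
public:
    // Formats the score into the existing buffer; no heap allocation happens.
    // Returns false (and clears the text) if the number does not fit.
    bool set(int score) {
        const int written = std::snprintf(buffer_, sizeof(buffer_), "%d", score);
        if (written < 0 || static_cast<std::size_t>(written) >= sizeof(buffer_)) {
            buffer_[0] = '\0';
            length_ = 0;
            return false;
        }
        length_ = static_cast<std::size_t>(written);
        return true;
    }

    const char* c_str() const { return buffer_; }
    std::size_t length() const { return length_; }

private:
    char buffer_[16] = {};
    std::size_t length_ = 0;
};
```

    Each call to set() reuses the same 16 bytes, so updating the score every frame produces no garbage. On the C# side, the closest thing mentioned earlier in the thread is passing a reused char[] or StringBuilder to TMP's SetText.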
     
    SisusCo likes this.