
Unity c# garbage collector

Discussion in 'General Discussion' started by felipechavespixida, Oct 1, 2021.

  1. felipechavespixida

    Joined:
    May 31, 2021
    Posts:
    2
    If I'm in the Update method (called every frame) and I use:

    new Vector3(my_x, my_y, my_z);

    how does the C#/Unity garbage collector deal with this?

    E.g. is it better to declare a Vector3 outside the method as a field and update it, rather than creating a new vector every time?
     
  2. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    It doesn't. Vector3 is a struct, which is a value type. It is stored on the stack, and there's no garbage generated.

    Reference types generate garbage when you create them, and even then the GC only has work to do when they are collected after you stop referencing them. In the case of value types, "new" is a semantic quirk that does not mean a heap allocation.
     
  3. felipechavespixida


    Joined:
    May 31, 2021
    Posts:
    2
    Good, so no worries about creating a new Vector3? That sounds good.

    So the same goes for Color?
     
  4. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    There are value types and reference types.
    Reference types are allocated on the heap, and cause allocations.
    Value types are stored on the stack, and do not cause allocations.

    However, reference types can contain value types, and value types can contain reference types.

    A "new" struct causes no allocation only if it doesn't create any reference types in its constructor.
    So, if your struct stores a class or a string in its fields, it will cause an allocation for those members when initialized.

    Color, Vector3 and many other types in Unity are pure value types with no reference type members, so they cause no allocations.
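
    That distinction can be sketched like this (hypothetical standalone types for illustration, not Unity's):

    ```csharp
    using System;

    // Pure value type: creating it with "new" performs no heap allocation.
    struct Point3
    {
        public float x, y, z;
        public Point3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    // Value type with a reference-type field: the struct itself still lives on
    // the stack, but initializing the string field allocates on the heap.
    struct Labelled
    {
        public float value;
        public string label;                // reference type member
        public Labelled(float value, int id)
        {
            this.value = value;
            this.label = "item #" + id;     // heap allocation happens here
        }
    }

    class Demo
    {
        static void Main()
        {
            var p = new Point3(1f, 2f, 3f); // no GC pressure
            var l = new Labelled(4f, 7);    // allocates a string
            Console.WriteLine(p.x + " " + l.label);
        }
    }
    ```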
     
  5. Gladyon


    Joined:
    Sep 10, 2015
    Posts:
    389
    Note that if you're really into micro-optimization, you have to know that while 'new Vector3(my_x, my_y, my_z)' will not disturb the garbage collector, it will call the 'Vector3' constructor, so some code will be executed.

    When doing a lot of Vector3 operations, it is often (as in '99% of the time') better to do it 'per axis'.
    For example:
    v2 = v2 + v3;
    is quite commonly used, and yet this is faster:
    v2.x = v2.x + v3.x;
    v2.y = v2.y + v3.y;
    v2.z = v2.z + v3.z;
    It is faster because the single line above will in fact call the 'Vector3' constructor, while these 3 lines will not.

    Of course, it has the disadvantage of being a lot less readable and a lot more error-prone.
    So, only do it when it really matters.
    And as always with optimization: profile, profile, profile, etc., profile again and again.
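
    In a Unity script, the trade-off might look like this (an illustrative component, not a benchmark; only worth doing in hot paths the profiler has flagged):

    ```csharp
    using UnityEngine;

    public class PerAxisExample : MonoBehaviour
    {
        Vector3 velocity = new Vector3(1f, 2f, 3f);
        Vector3 position;

        void Update()
        {
            // Idiomatic form: operator+ and operator* each construct a new Vector3.
            // position += velocity * Time.deltaTime;

            // Hand-inlined form: writes the fields directly, no constructor calls.
            float dt = Time.deltaTime;
            position.x += velocity.x * dt;
            position.y += velocity.y * dt;
            position.z += velocity.z * dt;
        }
    }
    ```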
     
  6. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    This is really not good advice.

    The number 1 priority of any code is its readability, and with modern computational power, trying to avoid a function call is premature optimization in almost all cases.

    This kind of thing should ONLY be done if the profiler points at it as a bottleneck, and otherwise it should never even be considered.
     
  7. Gladyon


    Joined:
    Sep 10, 2015
    Posts:
    389

    It is excellent advice, if you bother to read the last 3 lines I wrote:
    which are about the same as what you said.
     
  8. koirat


    Joined:
    Jul 7, 2012
    Posts:
    2,008
    What is the performance difference between a constructor and a non-constructor method?
     
  9. Peter77


    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,438
    There is also stackalloc, which allows you to allocate buffers of unmanaged value types on the stack. It can be useful for small temporary arrays and things like that imo.
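
    A minimal sketch of stackalloc with Span&lt;T&gt; (C# 7.2+ form; keep the sizes small, since stack space is limited):

    ```csharp
    using System;

    class StackAllocDemo
    {
        static int SumOfSquares(int count)
        {
            // stackalloc reserves the buffer on the stack; it is reclaimed
            // automatically when the method returns, so the GC never sees it.
            Span<int> squares = stackalloc int[count];
            for (int i = 0; i < count; i++)
                squares[i] = i * i;

            int sum = 0;
            foreach (int s in squares)
                sum += s;
            return sum;
        }

        static void Main()
        {
            Console.WriteLine(SumOfSquares(4)); // 0 + 1 + 4 + 9 = 14
        }
    }
    ```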
     
  10. Neto_Kokku


    Joined:
    Feb 15, 2018
    Posts:
    1,751
    The constructor is a function call(*). The same goes for the Vector3 operators (+, -, *, etc.): they're all functions under the hood.

    Function calls are quite cheap, but not free, since the program must push a stack frame, jump, pop the frame, and jump back. If you're working on several thousand Vector3s in loops every frame, it can add up. If you have such functions show up high in the profiler, it's worth testing how they perform by changing the math to be per component.

    However, the cost of the function call depends on whether or not it gets inlined when converted into machine code. An inlined function has its body "copied" into the call site. This is done at the whims of the C# compiler (and the C++ compiler for IL2CPP), but you can increase the odds of a function being inlined by using
    [MethodImpl(MethodImplOptions.AggressiveInlining)], which causes the inlining to happen in the C# IL if possible.

    (*) In the case of structs, only the constructors which take parameters: the default constructor cannot be overridden by the user and is not a function.
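
    Applying the attribute might look like this (a hypothetical Vec3 stand-in, not UnityEngine.Vector3; the attribute is a request to the JIT, not a guarantee):

    ```csharp
    using System.Runtime.CompilerServices;

    public struct Vec3
    {
        public float x, y, z;

        public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

        // Hint to the JIT (and IL2CPP's C++ compiler) that this tiny method
        // is a good inlining candidate.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static Vec3 Add(Vec3 a, Vec3 b)
            => new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
    }
    ```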
     
  11. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    It is not good advice, because it is the very last thing you should ever consider, and the gains are minuscule. Even mentioning this to a newbie nudges them towards premature optimization.

    Also, the disadvantages are severe. Do you know DRY? It goes against that. By manually inlining a call, you're copy-pasting it, and if there is a bug, you'll have to fix it in multiple places instead of just one.

    They're both methods, so it should be the same.

    Be aware that some languages can eliminate function call overhead and inline function calls into code. C++ does that. C#, however, doesn't.
     
  12. Arhaam


    Joined:
    Oct 4, 2021
    Posts:
    1
    Does unity have a garbage collector?

    I think Unity uses the Boehm–Demers–Weiser garbage collector, a stop-the-world garbage collector. Whenever Unity needs to perform garbage collection, it stops running your program code and only resumes normal execution when the garbage collector has finished all its work. If I am wrong, please let me know!
     
  13. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    Yes. Some objects are exempt and are not collected, however.

    Yes.
    https://docs.unity3d.com/Manual/performance-incremental-garbage-collection.html
     
  14. runner78


    Joined:
    Mar 14, 2015
    Posts:
    760
    Not the C# compiler, but the JIT compiler.

    And you can tell the compiler to inline with the attribute:
    [MethodImpl(MethodImplOptions.AggressiveInlining)]

    I played around with it in the past and tested the performance. With classes, my static methods seem to have been inlined anyway; there was no difference. With structs, however, the same calls were slower, and only after I had added the attribute were they as fast as the class version.
     
  15. Gladyon


    Joined:
    Sep 10, 2015
    Posts:
    389
    It's not possible to tell without profiling.

    I used that in an explosion algorithm for a block game; the explosion algorithm explores cells in a grid to compute the damage.
    Just by inlining the Vector3 operations, it reduced the total execution time of the algorithm by about 20%.
    Usually the gain isn't that high (it is less than 2-3% in most cases), but in that case there were so many Vector3 operations that it really helped.

    Of course, I profiled before doing that, and then I profiled again after.
    I did that for other algorithms, and when the gain was under 3-5% I reverted, because the gain in performance wasn't enough to justify the loss in maintainability.


    I think there's no golden technical rule when optimizing; you always need to see the real impact of an optimization:
    - you profile to find out what part is too slow
    - then you try an optimization and profile to see if it has an impact
    - if it has a negligible impact, you revert and try another optimization
    - if it decreases the execution time enough, then you keep it, commit, and decide whether you need another optimization or not
     
  16. koirat


    Joined:
    Jul 7, 2012
    Posts:
    2,008
    20% !!! I'm surprised it gave you that much of a performance boost.
    I was expecting something in the range of 5%-10% max.
    Have you tried a custom addition method instead of writing the additions by hand?

    Code (csharp):
    Add(ref Vector3 destination, ref Vector3 addition)
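
    Such a helper might look like this (a hypothetical Vector3Util class; 'ref' avoids copying the struct at the call site, and no new Vector3 is constructed):

    ```csharp
    using UnityEngine;

    public static class Vector3Util
    {
        // Adds 'addition' into 'destination' in place.
        public static void Add(ref Vector3 destination, ref Vector3 addition)
        {
            destination.x += addition.x;
            destination.y += addition.y;
            destination.z += addition.z;
        }
    }
    ```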
     
  17. hippocoder


    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    The cost of a collection goes up with every managed object/allocation you have, though, even if you only release one, so it's not the only way to think about this.
     
  18. HellGate94


    Joined:
    Sep 21, 2017
    Posts:
    132

    I've raised this issue in Unity.Mathematics here https://github.com/Unity-Technologies/Unity.Mathematics/issues/194 and I seem to recall that they will do something like this for the default Vector3 structs as well in the new versions (maybe just for IL2CPP; I can't seem to find the thread again).
     
  19. runner78


    Joined:
    Mar 14, 2015
    Posts:
    760
    The "in" parameter should only be used with readonly structs; otherwise the compiler creates defensive copies, and the performance advantage is lost. But Burst and IL2CPP can get around that.
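
    A sketch of the defensive-copy pitfall (hypothetical types; the readonly modifier is what lets the compiler skip the hidden copy):

    ```csharp
    // Non-readonly struct: calling a method through an 'in' parameter forces
    // the compiler to make a defensive copy first, since the method might
    // mutate the struct.
    public struct MutableVec
    {
        public float x;
        public float Magnitude() => x < 0 ? -x : x;
    }

    // readonly struct: the compiler knows no member can mutate it, so 'in'
    // parameters are passed by reference with no hidden copy.
    public readonly struct ReadonlyVec
    {
        public readonly float x;
        public ReadonlyVec(float x) { this.x = x; }
        public float Magnitude() => x < 0 ? -x : x;
    }

    public static class Demo
    {
        public static float Slow(in MutableVec v) => v.Magnitude();  // defensive copy
        public static float Fast(in ReadonlyVec v) => v.Magnitude(); // no copy
    }
    ```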
     
    Last edited: Oct 4, 2021
    HellGate94 likes this.
  20. koirat


    Joined:
    Jul 7, 2012
    Posts:
    2,008
    Is there a list somewhere of different C# optimization techniques?
    Such as the one you mentioned, for example.
     
  21. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    And that's why I said it was bad advice. Another person getting sidetracked into premature optimization.

    A good rule of thumb is to check the code path you're trying to optimize and consider how much your overall performance would improve if you reduced the execution time of that code fragment to zero.

    You can make a code fragment 50 times faster and it will have zero effect on the final product, because it wasn't a bottleneck in the first place.
     
  22. koirat


    Joined:
    Jul 7, 2012
    Posts:
    2,008
    Yes, but if I can use Vector3.Add instead of Vector3 + Vector3, then I might decide to use it.
    If it's not a readability problem, I choose the more performant option from the beginning.
     
  23. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    In my opinion, focusing on readability is more important, as people tend to VASTLY overestimate the impact of performance concerns and vastly underestimate the computational power of a modern PC. That's why I never recommend manually inlining any computations.

    That's all.
     
  24. Ryiah


    Joined:
    Oct 11, 2012
    Posts:
    20,124
    I feel that it's an exaggeration to call this "a lot less readable" and "a lot more error-prone".
     
  25. koirat


    Joined:
    Jul 7, 2012
    Posts:
    2,008
    What I have is my own MathUtility class where I put my math helper methods.
    Why should I not use the most performant method there?
     
  26. hippocoder


    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    So if the argument is "clean code is better than obscure code", then DOTS has its work cut out in many ways to convince a whole world's worth of programmers.

    Also... in some parts of the render pipelines, the code is very proper and also very slow. That's not great either.
     
  27. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    Yes. It'll probably fail to convince them, by the way.

    --

    Unity as framework initially focused on accessibility to people that didn't dive too deep into gamedev and programming. That resulted in a clean framework that is easy to use. One of the cleanest, actually.

    DOTS is something I personally don't feel like touching, as it looks like some sort of dialect of Fortran.
    -------
    Well, you won't be able to do that.

    The proposed idea was to inline function calls by hand. If you do so, you cannot have a MathUtility class, because a method call into a class is exactly what you'd be eliminating. You'd have to keep all the method bodies in mind and copy-paste them.

    You can automate the "inline by hand" approach in other languages, for example in C (without ++) and C++: they have macros. C# doesn't have macros in the normal sense. Common Lisp also has an amazing macro facility, where you can write a function that generates functions.

    The other problem is that math operations are in many cases SMALL. They're Lego bricks. You can't optimize a Lego brick much. There are uncommon scenarios (SVD, for example); in those cases, sure, feel free to knock yourself out, although you might consider a C/C++ DLL instead.

    The main problem with this approach is that you decide "this is more performant" and then commit yourself to the practice. The problem here is that, being human, you are not trustworthy at estimating performance. Tools are trustworthy; profiling results are trustworthy. But you haven't done any profiling yet and have already jumped to conclusions. You're doing work that is most likely not needed, and that will produce more work in the long term.

    It is like trying to optimize square root computation on a modern CPU. It can calculate insane number of them, but if you try to get rid of one square root somewhere in a line of sight calculation script, all you're actually doing is wasting time.

    The normal idea, in my opinion, is to make a clean prototype first, and only when you run into performance problems, profile and seek bottlenecks. Chances are you can save more milliseconds via algorithmic optimization anyway.
     
  28. koirat


    Joined:
    Jul 7, 2012
    Posts:
    2,008
    Well, it is still going to be faster than my current non-optimal solutions, which are also not going to be inlined but are using Vector3 constructors.
     
  29. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    The point was that trying to optimize Vector3 operations is the wrong idea in the first place. It is like trying to write faster integer addition. It makes sense when you're number crunching, trying to do billions of operations per second, and have identified it as a bottleneck.

    But in the most likely scenario, that didn't happen. There was no bottleneck; people are worrying in advance, and that is premature optimization.

    It doesn't matter if it is faster if the need to make it faster never arose.
     
  30. Lurking-Ninja


    Joined:
    Jan 20, 2015
    Posts:
    9,904
    I usually agree with you, on almost every occasion, but on this one I cannot. With this attitude, Burst, for example, would never have happened. They aren't crunching billions of numbers; they are building a general-purpose framework, and here we are.

    You're mixing up building a fast utility library and specific, specialized optimization. These two are vastly different things. Both should happen, of course, and they aren't mutually exclusive.

    Winging it, building whatever comes to mind because "you will optimize it later", is a bad habit, generally. It works in some cases but cripples others.

    Mind you, it doesn't mean one should build a fast library on a gut feeling, of course, but on tested and measured facts. A very good example is what Jackson Dunstan is doing/has done.
     
  31. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    It is not "you will optimize it later", but "you will not optimize it, ever, unless the need arises". In many cases the need will never arise.

    In the past I've been burned plenty of times doing something useless that I thought was important. Hence the attitude.
     
  32. Gladyon


    Joined:
    Sep 10, 2015
    Posts:
    389
    I read a lot about 'inlining' here, but I do not think that's the main reason why Vector3 operations can have their performance increased when they are used intensively.
    I think it's the memory allocation. Even if it's allocated on the stack rather than the heap, it's still an allocation, and it probably reduces cache efficiency.
    Code (CSharp):
    // Adds two vectors.
    [MethodImpl(MethodImplOptionsEx.AggressiveInlining)]
    public static Vector3 operator+(Vector3 a, Vector3 b) { return new Vector3(a.x + b.x, a.y + b.y, a.z + b.z); }
    The problem is not the inlining here, but the allocation of a new Vector3.
    If you write 'v1 += v2', then you are not modifying v1; you are in fact creating a whole new Vector3 and putting it in 'v1'.

    My guess is (but it's worth nothing, only a profiler could tell) that when you do a lot of Vector3 operations in a row, with nothing in between, using the Vector3 operators will mess up the cache very badly, and that could explain why in one situation I've seen a 20% improvement.



    You are totally right: when optimizing, you usually only need to modify about 0.01% of the code, and it is not possible to know which part of the code it will be without profiling.


    That said, and because every rule has exceptions, when you're writing a framework it may be different.
    That's because you do not know how your functions will be used, so you do not know whether they need to be optimized or not.
    If you take a look at the 'Vector3.SmoothDamp()' method, you'll see that Unity does not use its own Vector3 operators; they have chosen to copy/paste the operators' code in order to optimize.

    But when not writing a framework, I do not see any reason to optimize early.
     
  33. runner78


    Joined:
    Mar 14, 2015
    Posts:
    760
    Allocation on the stack is technically very simple: take the current stack pointer and advance it. That's very cheap. And from what I've read, CPUs are optimized for stack access.
     
  34. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    Vector3 is a struct and that makes it a value type.

    There's no allocation, as it doesn't contain any reference type members.
     
  35. Gladyon


    Joined:
    Sep 10, 2015
    Posts:
    389
    There is an allocation; the value has to go somewhere.
    It's just that it's allocated on the stack, which is much faster.
    And I think that it doesn't need to be deallocated by the garbage collector, but don't take my word on that.

    The problem with a new allocation, even a very efficient one, is that it will be at a different memory address, so the value will not be in the cache and will have to be placed into it, meaning that other values will no longer be in the cache.
    That's the only way I can explain the 20% improvement I have seen. It was doing a lot of Vector3 operations with nearly nothing else, and I'm pretty sure that the inlining alone wouldn't have been so effective, but having fewer cache misses might explain such an improvement.

    I don't know how .NET does it internally, but here is what I think happens for this line:
    v1 += v2;
    I think that a temporary variable is created (on the stack), v1 + v2 is put into it, and then v1 is assigned a copy of the content of the temporary variable.
     
  36. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,321
    It doesn't work that way.

    An "allocation" is when you create a block on the heap. In C# that creates work for the GC, which is why people avoid it.

    Variables created on the stack are free. The stack is also always sitting in the same place and is constantly reused. I'd be very surprised if it DOESN'T end up cached.

    Lastly, a decent language with a decent compiler does not translate your instructions into machine code verbatim. Optimization is performed, and the resulting code can be very different from yours. To see what's going on, you'd have to open the disassembly.

    Compilers performing optimization is one more reason not to make assumptions about what is going to be faster.
     
  37. Gladyon


    Joined:
    Sep 10, 2015
    Posts:
    389
    I doubt that it's free.
    To my knowledge, nothing is free, because there's real hardware behind the scenes.
    There are micro-instructions (not assembly: the things executed directly by the microprocessor).
    It is probably not a lot, but it cannot be nothing at all.


    I am doing the opposite: I have witnessed something going faster, and I'm trying to find out why.
    And because I really doubt that just avoiding the calls is the reason for such an improvement, I am supposing that it may be related to the cache.

    But you may have a lead here. Maybe inlining the Vector3 operations triggered some optimization which wasn't done previously for some reason.
    I have nearly no knowledge of compiler optimization, apart from something I read about readonly variables possibly being slower than standard variables in .NET, which is so strange that I haven't looked any further.
     
  38. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,321
    On real hardware, stack allocation is subtracting a number from a CPU register.

    On real hardware, heap allocation is going through the heap manager, finding an unused spot, and, if necessary, requesting more memory via an OS call. It is significantly more expensive.

    Compared to that, stack allocation is free: the stack is already there, so you only need to move the stack pointer.
     
  39. runner78


    Joined:
    Mar 14, 2015
    Posts:
    760
    Heap and stack live on the same hardware, somewhere in RAM. The difference is that the stack is preallocated at the start of a thread (generally 8 MB for 64-bit programs).
     
  40. neginfinity


    Joined:
    Jan 27, 2013
    Posts:
    13,321
    This one's misleading.

    The difference is that stack fragmentation is not a thing: you cannot deallocate a stack value from the middle of it.
    Heap fragmentation absolutely is a thing: you can allocate a block for A, then a block for B, then free A, leaving a "hole".
     
  41. runner78


    Joined:
    Mar 14, 2015
    Posts:
    760
    In what way is it misleading? I didn't say anything about fragmentation. The RAM hardware has no concept of heap/stack. There is no fragmentation inside the stack's memory block, but the block itself is allocated somewhere random in RAM. However, since the location of the stack does not change during the lifetime of the thread, fragmentation is not a big issue here.
     
  42. Neto_Kokku


    Joined:
    Feb 15, 2018
    Posts:
    1,751
    "Allocating" a local Vector3 variable is no more expensive than "allocating" three consecutive local float variables.
     
  43. Neto_Kokku


    Joined:
    Feb 15, 2018
    Posts:
    1,751
    The fact that the location doesn't change is the reason it's so much faster than the heap, even though it's using the same underlying hardware. The logic for stack allocation can be as simple as incrementing a number, and since the stack is usually contiguous, the odds of things staying in cache are higher.
     
  44. runner78


    Joined:
    Mar 14, 2015
    Posts:
    760
    That is what I already described above :D