
Garbage Collection, Allocations, and Third Party Assets in the Asset Store

Discussion in 'General Discussion' started by Games-Foundry, Jun 21, 2012.

  1. TowerOfBricks

    TowerOfBricks

    Joined:
    Oct 20, 2007
    Posts:
    963
    @gamesfoundry

    Because of JIT compilation, may I suggest placing your tests inside a small for loop to minimize the effects of the JIT:

    Code (csharp):
    for (int i = 0; i < 10; i++) {
        // Do tests here
        // Log test result
    }
    The first iteration usually takes around 2 to 10 times longer to run due to JIT compilation and may skew the results.
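    A minimal sketch of this warm-up idea in plain C#, with no Unity dependency. `BenchmarkHarness`, `Measure` and `BestAfterWarmUp` are hypothetical names, not part of Unity or .NET. It times each run with `System.Diagnostics.Stopwatch` and ignores the first run, which is the one most likely to include JIT compilation.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for illustration; not a Unity or .NET API.
public static class BenchmarkHarness
{
    // Runs `body` the given number of times and records the elapsed time of each run.
    // The first entry usually includes JIT compilation and should be treated as warm-up.
    public static List<TimeSpan> Measure(Action body, int iterations)
    {
        if (iterations < 2)
            throw new ArgumentOutOfRangeException("iterations", "Need at least one run after the warm-up run.");

        var results = new List<TimeSpan>(iterations);
        for (int i = 0; i < iterations; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            body();
            sw.Stop();
            results.Add(sw.Elapsed);
        }
        return results;
    }

    // Returns the fastest run, skipping index 0 (the warm-up run).
    public static TimeSpan BestAfterWarmUp(List<TimeSpan> runs)
    {
        TimeSpan best = TimeSpan.MaxValue;
        for (int i = 1; i < runs.Count; i++)
        {
            if (runs[i] < best) best = runs[i];
        }
        return best;
    }
}
```

    Reporting the fastest post-warm-up run is one common choice for micro-benchmarks, since GC pauses and other processes can only make a run slower; averaging the remaining runs is a reasonable alternative.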
     
  2. TowerOfBricks

    TowerOfBricks

    Joined:
    Oct 20, 2007
    Posts:
    963
    Also, in your test I think it would be more appropriate to store the GoldenRetriever as a Dog object instead of casting it every time. That would mimic how a Unity event system would do it (probably storing things as MonoBehaviour). I don't think it would make much of a difference in time... but that's what profiling is for, isn't it? To measure that difference.

    We must also remember that while the override-based system may seem faster, Unity would need to call the override on every single MonoBehaviour, even those that don't need to know that the event has occurred. I would guess that at least 90% of those calls would be unnecessary in such a situation, whereas an event system driven by delegates (System.Action, for example) uses no CPU time at all if no script has registered to receive the event (apart from an "is null?" check).
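    A rough sketch of the two dispatch styles being compared, assuming hypothetical `OverrideListener`, `OverrideDispatcher` and `EventSource` types (this is not how Unity dispatches its own messages): the override version calls every listener whether or not it cares, while the delegate version does nothing beyond a null check until someone subscribes.

```csharp
using System;

// Override style: every listener's virtual method is called on every broadcast.
// `Calls` only counts invocations so the cost is visible.
public class OverrideListener
{
    public int Calls;
    public virtual void OnSomethingHappened() { Calls++; }
}

// A listener that has no interest in the event is still called.
public class UninterestedListener : OverrideListener { }

public static class OverrideDispatcher
{
    public static void Broadcast(OverrideListener[] listeners)
    {
        for (int i = 0; i < listeners.Length; i++)
            listeners[i].OnSomethingHappened();
    }
}

// Delegate style: only registered handlers run; with none, raising is just a null check.
public class EventSource
{
    public event Action SomethingHappened;

    public bool HasSubscribers { get { return SomethingHappened != null; } }

    public void Raise()
    {
        Action handler = SomethingHappened; // local copy guards against a concurrent unsubscribe
        if (handler != null) handler();
    }
}
```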

    Unity does seem to use a more sophisticated system than plain overrides, since when you write a function such as OnTriggerEnter, you don't use the override keyword. Perhaps they are using reflection in some clever way to increase performance... or are they using Invoke("OnTriggerEnter")? I really hope it's not the latter.
    Can anyone enlighten me on that?

    Sorry for being a bit off topic.
     
    Last edited: Jun 28, 2012
  3. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    7,190
    This is great for micro-tweaking and caching, but it really misses the point of the 80:20 rule.

    80% of the time your code will be fine.

    20% of the time you will need to optimise your code, and basically you should only spend time where it is needed, after you have found slowdowns and used the profiler to locate the bottlenecks.
     
  4. tatoforever

    tatoforever

    Joined:
    Apr 16, 2009
    Posts:
    3,964
    It's the opposite that is true.
    20% of your time, your code will be fine.
    The remaining 80% of your time, your code will need to be tweaked, optimized, adjusted.
    :rolleyes:
     
  5. TowerOfBricks

    TowerOfBricks

    Joined:
    Oct 20, 2007
    Posts:
    963
    True (yes, I'm also pessimistic :) ), but maybe not the optimizations we are talking about here (except perhaps pooling). Optimizations almost always need to be done, but when you need to measure things very carefully with millisecond accuracy and start thinking about whether you should use a member variable or a static variable for performance reasons... then you are far away from the usual optimizations you need to make.
     
  6. jasonkaler

    jasonkaler

    Joined:
    Feb 14, 2011
    Posts:
    241
    The difference will probably be less drastic if you add a few parameters to the event, as is often the case.
     
  7. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    No difference. In the original test ( prior to the code posted ) I passed the object in the event. The results were the same.
     
  8. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    Thanks to MattCarr for posting up a performance graph for his Advanced Surface Shaders volumetric shader. Hopefully we can encourage other asset store publishers to follow suit and maybe one day it will become expected practice.
     
  9. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    7,190
    But wouldn't this be completely dependent on the performance of the platform the asset is running on? :???:

    Maybe that says more about your code than mine :rolleyes:
     
  10. fholm

    fholm

    Joined:
    Aug 20, 2011
    Posts:
    2,043
    So much misinformation and so many assumptions in this thread. Anyone experienced with either the .NET or Mono runtimes will know these things, but I assume that not many people using Unity come from a .NET background. Anyway, in no specific order:

    • For all intents and purposes, value types (structs, primitives) are never put on the heap, though they do affect the size of a reference type (class) if they exist inside it as a field. There are exceptions to this, but they're rare and have to do with boxing and interfaces.
    • The reason it's faster to recycle objects than to create new ones is not the constructor call, but rather that it lessens the pressure on the GC.
    • Local variables are faster than members of reference types because they are allocated on the stack instead of the heap.
    • The reason a "for ..." is faster than a "foreach ..." is that foreach uses the built-in IEnumerable interface in .NET/Mono, which causes a couple of extra method calls. However, using "foreach ..." on an array should result in the compiler emitting a "for ..." loop instead (I'm not 100% sure whether this is true on Mono).
    • Events are slower than overriding because the event system sits on top of the Delegate/MulticastDelegate subsystem of .NET, and while invoking the delegate itself ends up as a callvirt instruction, there is added ceremony around invoking a delegate which costs performance.
    • Obviously calling "new Vector3()" is more expensive than reusing already initialized memory, but that has nothing to do with the GC; it is simply because invoking the struct's constructor has a cost.
    • Passing reference types into methods instead of large structs does not, in general, improve performance: whatever tiny gain (which I doubt is even measurable) you get from allocating less stack space for the method call is offset by the cost to the GC when it has to clean up the reference object.
    • Even in the MS CLR, while they state that value types can be stored on the heap, it's always in connection with a reference object. When you write "Vector3 v = new Vector3();" that value will always be on the stack.
    • The complaints about some of the Unity methods allocating memory are generally invalid, as most of the time it's because they return an array of objects, and the array needs to be allocated on the heap.
    • Replacing a List<T> with a LinkedList<T> and using it in the same way as you would a list will lead to more allocations.
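    To illustrate the boxing exception mentioned in the first bullet, here is a small sketch with hypothetical `Counter`/`ICounter` types: converting a struct to an interface type allocates a box on the heap holding a copy of the struct, so changes made through the interface do not affect the original local.

```csharp
// Hypothetical types for illustration only.
public interface ICounter
{
    int Value { get; }
    void Increment();
}

// A mutable struct; as a local it lives on the stack (or in registers), not the heap.
public struct Counter : ICounter
{
    private int count;
    public int Value { get { return count; } }
    public void Increment() { count++; }
}

public static class BoxingDemo
{
    public static void Run(out int localValue, out int boxedValue)
    {
        Counter local = new Counter();
        ICounter boxed = local; // boxing: allocates a heap object holding a copy of `local`
        boxed.Increment();      // changes only the boxed copy
        local.Increment();
        local.Increment();      // changes only the stack copy
        localValue = local.Value;  // 2
        boxedValue = boxed.Value;  // 1
    }
}
```

    This kind of hidden boxing is one way value types can still generate garbage, for example when structs are stored in interface-typed or object-typed collections.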

    I think that's it, g'nite!
     
  11. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    Yes, the timers would vary, so they are only really a sanity check: e.g. a process showing 10ms per frame would warrant further investigation, whereas 0.1ms may not, depending on what the function does. In the context of feeding the GC, the allocations for just a single object will be the more important metric. Some awareness of platforms and an ability to extrapolate the test results to your own project will be required, but you'll probably have that if you are concerned about timings and allocation. Silly example: planning to use a 0.1ms-per-object function (as tested on a desktop) on a million objects per frame on iOS.

    Perhaps people have some thoughts on the best way for content authors to publish test reports with the greatest meaning and broadest application?
     
  12. superpig

    superpig

    Quis aedificabit ipsos aedificatores? Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,227
    I don't think it's so "rare" in Unity. What do you think happens to value-type local variables in coroutines?

    It's not the constructor call, it's the allocator call (i.e. locating the empty chunk of memory to use for the object). It's true that the performance benefit in general comes from reducing GC pressure - it eliminates the framerate spikes, which are a much bigger problem in general - but when doing a like-for-like comparison for object creation, the GC's not relevant. It's not involved in allocating objects, only in deallocating them.

    Accessing stack variables is not necessarily going to be faster than the heap (though it usually will be as it'll be in cache). For example, accessing a stack variable from 10 frames up after doing a lot of other unrelated work will probably not be faster than accessing a heap value that you just accessed a moment ago.

    Yes. Mono certainly does this some of the time, but not always. I've not yet been able to figure out when it does and when it doesn't.

    Er, given that structs are value types and thus passed by value, what about the time impact of creating the copy of the struct in the new stack frame? (It doesn't make a whole lot of sense to compare these scenarios as the semantics are different, to be fair).

    They're not invalid. A better solution would be if there were versions of those methods that let us pass in an array to populate. That way we could allocate a maximum-length array once at load time, and then reuse it repeatedly without any further allocations. This is particularly crucial for things that will return large arrays (such as particles - which is exactly why Shuriken's GetParticles method takes an array to populate, as compared to Legacy's particle accessor that allocates a whole new array on every get).
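    A minimal sketch of the "pass in an array to populate" pattern described above, using a hypothetical `FindEvenNonAlloc` method over plain integers (not a Unity API): the caller allocates the buffer once, and each call writes into it and returns how many entries are valid, in the same spirit as Shuriken's `GetParticles`.

```csharp
// Hypothetical query for illustration; not a Unity API.
public static class NonAllocQuery
{
    // Writes the even values of `source` into the caller-owned `buffer`
    // and returns how many were written. Nothing is allocated per call;
    // matches that don't fit in the buffer are dropped.
    public static int FindEvenNonAlloc(int[] source, int[] buffer)
    {
        int count = 0;
        for (int i = 0; i < source.Length && count < buffer.Length; i++)
        {
            if (source[i] % 2 == 0)
                buffer[count++] = source[i];
        }
        return count;
    }
}
```

    This version silently drops matches that don't fit; returning the total match count instead, so the caller can detect truncation and grow the buffer, would be an equally valid design.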
     
  13. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    26,071
    Btw, take .NET results with a grain of salt, because Mono is not .NET; it's its own port of .NET and probably not as optimised...

    In answer to the question posed to me above, yeah we cached everything and got a 5fps boost, enough to reach 60 for the platform game. I think that makes it worthwhile. It's not just the garbage you worry about but the expense of repeated component calls.

    I'm happy Unity caches transforms on their end, because sometimes you can't know in advance, short of using dictionaries (which ends up with messy code).

    And since caching stuff is hardly difficult (a nice class, a nice function to stuff it with) why wouldn't you?
     
    Last edited: Jun 30, 2012
  14. tatoforever

    tatoforever

    Joined:
    Apr 16, 2009
    Posts:
    3,964
    On Windows the default framework for Mono is MS .NET. But still, you can use MS or Mono .NET on Windows. XD
     
    Last edited: Jun 30, 2012
  15. Eric5h5

    Eric5h5

    Volunteer Moderator Moderator

    Joined:
    Jul 19, 2006
    Posts:
    32,200
    No. It doesn't work that way. Unity is all Mono, on all platforms, and makes no use of .NET. As Hippocoder said, Mono is not .NET (it's an open-source clean-room reimplementation of .NET), and while it's similar, it doesn't work the same way.

    --Eric
     
  16. tatoforever

    tatoforever

    Joined:
    Apr 16, 2009
    Posts:
    3,964
    Well, I didn't mean that. I meant that on Windows you can use MonoDevelop* with whichever .NET framework you have (Mono or MS .NET). Sorry for the confusion. :rolleyes:

     
    Last edited: Jun 30, 2012
  17. Eric5h5

    Eric5h5

    Volunteer Moderator Moderator

    Joined:
    Jul 19, 2006
    Posts:
    32,200
    Not that it makes any difference, since Unity still only uses Mono....

    --Eric
     
  18. recon

    recon

    Joined:
    Nov 28, 2009
    Posts:
    119
    I couldn't agree more.
    There are too many cases where the developer has no other choice, because there are no function overloads that take pre-allocated arrays:
    Physics.RaycastAll, Physics.OverlapSphere, Physics.CapsuleCastAll, Physics.SphereCastAll and GameObject.GetComponents
    are just some that I could think of.
     
  19. mindlube

    mindlube

    Joined:
    Oct 3, 2008
    Posts:
    993
  20. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    Yup, it's covered in here somewhere.
     
  21. alexzzzz

    alexzzzz

    Joined:
    Nov 20, 2010
    Posts:
    1,410
    I've compared the resulting IL code and the outputs are almost identical.

    There's no need to locate anything. The runtime knows exactly where the unfragmented piece of free memory begins. Allocations are almost free unless they awake the GC.

    ... is just impossible. You can access a stack variable only inside the function it has been allocated in.
     
    Last edited: Jul 4, 2012
  22. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    Just discovered that Int32.ToString() causes heap allocation. Given we regularly update multiple UI labels over 20+ frames by lerping values, this is something I wanted to remove. Here's a little class that uses a lookup table.

    Code (csharp):
    using UnityEngine;

    public class StringExtensions
    {
        // Cached string representations of 0..9998, built once at startup.
        private static string[] lookupInts;
        private static bool mIsInitialized;

        public static void Initialize ()
        {
            lookupInts = new string[9999];

            for ( int i=0; i<9999; i++ )
            {
                lookupInts[i] = i.ToString ();
            }

            mIsInitialized = true;
        }

        // Returns a cached string for 0..9998 (no per-call allocation);
        // other values fall back to ToString(), which allocates.
        public static string ToStringLookup ( int val )
        {
            if ( !mIsInitialized ) Initialize ();

            if ( val >= 0 && val < 9999 )
            {
                return lookupInts[val];
            }
            else
            {
                return val.ToString ();
            }
        }
    }
     
    Last edited: Nov 16, 2012
  23. alexzzzz

    alexzzzz

    Joined:
    Nov 20, 2010
    Posts:
    1,410
    Code (csharp):
    lookupInts = new string[9999];
    for (int i = 0; i < 9999; i++)
    {
        lookupInts[i] = i.ToString();
    }
    You've just increased garbage collection time by about 100 extra microseconds.
     
  24. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    I presume you mean the call to ToString() which indeed causes allocation. We run Initialize() once at the start when we initialize all our Managers, after which we do a single forced gc call ( not shown ) to clean that and other launch garbage up before play begins. Sure, this causes allocation on setup, but the benefit is there is no per-frame allocation once play begins.
     
  25. alexzzzz

    alexzzzz

    Joined:
    Nov 20, 2010
    Posts:
    1,410
    I mean, the benefit is obvious, no per-frame allocations, but there is also a downside: from now on every garbage collection will take a bit longer, because there are 10000 more references the GC has to process. Every time you decide to cache something, you should check whether the benefit of caching outweighs the cost of growing the heap.
     
  26. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    True, but given the current state of Mono 2.6 gc, even a single call to GC during gameplay has to be avoided like the plague because it's so slow. Were we to get the faster GC in Mono 2.8+, then it would be hoped we could actually allow concurrent GC and there would no longer be a need for such a workaround. Currently the benefit outweighs the potential downside ( in our case anyway ).

    It's great to debate these matters btw. Thanks for joining in the discussion.
     
    Last edited: Nov 17, 2012
  27. alexzzzz

    alexzzzz

    Joined:
    Nov 20, 2010
    Posts:
    1,410
    By the way, how long does an average garbage collection actually take in your case?
     
  28. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    Ignoring the CPU dependency, it's 177ms in the editor. The million pathfinding nodes in the A*Pathfinding grid graph we have might be a big factor in that ( large map requiring fine granularity, plus we can't use navmesh as we have yet to find a solution that will allow us to recalculate areas at runtime when a player places buildings ). To my knowledge, only grid graph allows us to raycast with physics to refresh the graph under a building.
     
    Last edited: Nov 17, 2012
  29. superpig

    superpig

    Quis aedificabit ipsos aedificatores? Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,227
    Can't you reuse the nodes between pathing queries, so they don't get GCed?
     
  30. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    Aron Granberg kindly addressed that in A*Pathfinding Project after we requested it: paths do get recycled. What probably causes gc to be slow for us is the million node references that form the actual grid graph that would be checked each gc cycle. If only there was a way to mark data to inform the gc to skip it during reference checking.

    Actually, since we last spoke, it looks like Aron has added quite a bit, so I'll have a chat with him next time he's on Skype. http://www.arongranberg.com/astar/docs/graph-updates.php. That said, it still looks like navmesh updates are problematic.
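    One commonly suggested way to reduce the number of references a tracing GC has to walk is to store nodes as structs in a single array instead of as one class instance per node. The sketch below assumes hypothetical `GridNode`/`StructGrid` types (it is not the A* Pathfinding Project's actual data layout): the whole grid becomes one array object with no reference fields inside it. Whether a particular Mono/Boehm version actually skips scanning such an array is an assumption worth confirming in the profiler.

```csharp
using System;

// Hypothetical node type: stored by value, with no reference-type fields,
// so an array of these contains nothing for a tracing GC to follow.
public struct GridNode
{
    public bool Blocked;   // default false = walkable
    public int Penalty;
}

public sealed class StructGrid
{
    private readonly GridNode[] nodes; // single allocation for the whole grid
    public readonly int Width;
    public readonly int Height;

    public StructGrid(int width, int height)
    {
        Width = width;
        Height = height;
        nodes = new GridNode[width * height];
    }

    public bool IsWalkable(int x, int y)
    {
        return !nodes[y * Width + x].Blocked;
    }

    // Marks an inclusive rectangle as blocked, e.g. the footprint of a placed building.
    // Coordinates are clamped to the grid.
    public void BlockArea(int x0, int y0, int x1, int y1)
    {
        int minX = Math.Max(0, Math.Min(x0, x1)), maxX = Math.Min(Width - 1, Math.Max(x0, x1));
        int minY = Math.Max(0, Math.Min(y0, y1)), maxY = Math.Min(Height - 1, Math.Max(y0, y1));
        for (int y = minY; y <= maxY; y++)
            for (int x = minX; x <= maxX; x++)
                nodes[y * Width + x].Blocked = true; // array elements are modified in place
    }
}
```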
     
    Last edited: Nov 17, 2012
  31. techmage

    techmage

    Joined:
    Oct 31, 2009
    Posts:
    2,085
    Ok I did some tests here.

    Is iterating through an array of ushorts faster than iterating through an array of ints?

    Code (csharp):
    ushort[] testArray = new ushort[65535];
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for( ushort i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 65535;
    }
    for( ushort i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 0;
    }
    for( ushort i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 65535;
    }
    for( ushort i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 0;
    }
    for( ushort i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 65535;
    }
    for( ushort i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 0;
    }
    sw.Stop();
    print(sw.Elapsed); // a TimeSpan (hh:mm:ss.fffffff), not milliseconds

    00:00:00.0026730
    00:00:00.0027073
    00:00:00.0028338

    Code (csharp):
    int[] testArray = new int[65535];
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for( int i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 65535;
    }
    for( int i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 0;
    }
    for( int i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 65535;
    }
    for( int i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 0;
    }
    for( int i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 65535;
    }
    for( int i = 0; i < testArray.Length; i++)
    {
        testArray[i] = 0;
    }
    sw.Stop();
    print(sw.Elapsed); // a TimeSpan (hh:mm:ss.fffffff), not milliseconds

    00:00:00.0019804
    00:00:00.0019543
    00:00:00.0019990

    So it appears that iterating through a ushort array is about 25% slower than iterating through an int array.

    That seems wrong. Does anyone know why that would be?

    Let's see what it's like when declaring the index and limit outside the loop:

    Code (csharp):
    ushort[] testArray = new ushort[65535];
    ushort o;
    ushort limit = 65535;
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 65535;
    }
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 0;
    }
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 65535;
    }
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 0;
    }
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 65535;
    }
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 0;
    }
    sw.Stop();
    print(sw.Elapsed); // a TimeSpan (hh:mm:ss.fffffff), not milliseconds

    00:00:00.0027841
    00:00:00.0027710
    00:00:00.0028356

    Interestingly, declaring the ushort iterator and limit outside the loop shows little speed improvement over declaring them in the for loop above, unlike the int version:


    Code (csharp):
    int[] testArray = new int[65535];
    int o;
    int limit = 65535;
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 65535;
    }
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 0;
    }
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 65535;
    }
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 0;
    }
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 65535;
    }
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 0;
    }
    sw.Stop();
    print(sw.Elapsed);

    00:00:00.0018625
    00:00:00.0018743
    00:00:00.0018905

    Iterating through ints is about 25% faster than iterating through ushorts.


    Does setting an int to a larger or smaller number make a difference?

    Code (csharp):
    int[] testArray = new int[500000];
    int o;
    int limit = testArray.Length;
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 0;
    }
    sw.Stop();
    print(sw.Elapsed);

    00:00:00.0024679
    00:00:00.0024435
    00:00:00.0023730

    Code (csharp):
    int[] testArray = new int[500000];
    int o;
    int limit = testArray.Length;
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for( o = 0; o < limit; o++)
    {
        testArray[o] = 5000000;
    }
    sw.Stop();
    print(sw.Elapsed);

    00:00:00.0023987
    00:00:00.0025090
    00:00:00.0023672

    No difference


    Which is faster: a > comparison on an int, or a true/false check on a bool?

    Code (csharp):
    int[] testArray = new int[500000];
    int o;
    int limit = testArray.Length;
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for( o = 0; o < limit; o++)
    {
        if(testArray[o] > 300)
            testArray[o] = 5000000;
    }
    sw.Stop();
    print(sw.Elapsed);

    00:00:00.0025280
    00:00:00.0025867
    00:00:00.0025122

    Code (csharp):
    int[] testArray = new int[500000];
    bool[] testBoolArray = new bool[500000];
    int o;
    int limit = testArray.Length;
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for( o = 0; o < limit; o++)
    {
        if(testBoolArray[o])
            testArray[o] = 5000000;
    }
    sw.Stop();
    print(sw.Elapsed);

    00:00:00.0024250
    00:00:00.0024105
    00:00:00.0024439

    The bool check is marginally faster.

    What about a byte comparison?

    Code (csharp):
    int[] testArray = new int[500000];
    byte[] testByteArray = new byte[500000];
    int o;
    int limit = testArray.Length;
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for( o = 0; o < limit; o++)
    {
        if(testByteArray[o] > 30)
            testArray[o] = 5000000;
    }
    sw.Stop();
    print(sw.Elapsed);

    00:00:00.0025063
    00:00:00.0025392
    00:00:00.0025081

    Byte comparison is somewhere between bool and int comparison.
     
    Last edited: Nov 19, 2012
  32. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
  33. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,178
    Generally (and I state very generally as I've only tested this in one use case for a specific situation), I've found that you want to quantize things as either 8-bit or 32-bit. I converted some code from looking up 32-bit values to 16-bit values and actually got a significant slowdown. Byte arrays gave me a large performance increase, though - I'm guessing from greatly reduced cache misses. This is on a 64-bit processor architecture (32 bit program, though).
     
  34. npsf3000

    npsf3000

    Joined:
    Sep 19, 2010
    Posts:
    3,832
    And every single one of your ushort tests is broken. Do not iterate with a ushort but with an int; otherwise you're mixing two separate tests together [and will therefore get garbage output]. Furthermore, you're assigning ints to the ushort array, which could be affecting the test.
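    A sketch of the benchmark along the lines suggested above, assuming a hypothetical `ArrayFillBenchmark` class: both versions use an `int` loop index and values of the array's own element type, so the element type is the only thing that differs. Calling each method once before timing it also keeps JIT compilation out of the measurement.

```csharp
using System;
using System.Diagnostics;

// Hypothetical benchmark helpers for illustration.
public static class ArrayFillBenchmark
{
    // Alternately fills the array with 65535 and 0, `passes` times.
    // The loop index is an int and the stored values are ushort, so only the
    // element type differs from FillInt below.
    public static TimeSpan FillUShort(ushort[] data, int passes)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int p = 0; p < passes; p++)
        {
            ushort value = (p % 2 == 0) ? (ushort)65535 : (ushort)0;
            for (int i = 0; i < data.Length; i++)
                data[i] = value;
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static TimeSpan FillInt(int[] data, int passes)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int p = 0; p < passes; p++)
        {
            int value = (p % 2 == 0) ? 65535 : 0;
            for (int i = 0; i < data.Length; i++)
                data[i] = value;
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

    For example, call `FillUShort(data, 1)` once to warm up, then time `FillUShort(data, 6)` and compare it with the `int[]` version on the same machine.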
     
    Last edited: Nov 19, 2012
  35. alexzzzz

    alexzzzz

    Joined:
    Nov 20, 2010
    Posts:
    1,410
    It should give you a slowdown on local and private arrays (OOPS: just on local arrays and arrays passed as a parameter), because it forces the jit-compiler to insert boundary checks.

    PS
    However, I'm not sure Mono's JIT compiler is able to remove bounds checks the way Microsoft's does.
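    To make the bounds-check point concrete, here is a sketch of the two loop shapes used in this thread (hypothetical `BoundsCheckLoops` class). The Microsoft JIT is known to drop the per-element check when a local or parameter array is indexed by `i` with the loop condition `i < arr.Length`; with a separately stored `limit` it cannot prove every index is in range, so the check may stay. For an array held in a field, copying it to a local first is the usual workaround. As noted above, whether Mono's JIT does any of this is uncertain.

```csharp
// Hypothetical class for illustration.
public static class BoundsCheckLoops
{
    // Shape a JIT can recognise: the condition compares the index directly
    // against arr.Length, so every arr[i] is provably in range.
    public static long SumUsingLength(int[] arr)
    {
        long sum = 0;
        for (int i = 0; i < arr.Length; i++)
            sum += arr[i];
        return sum;
    }

    // Equally correct when limit <= arr.Length, but the JIT may not be able to
    // prove that, so it may keep a bounds check on each access.
    public static long SumUsingCachedLimit(int[] arr, int limit)
    {
        long sum = 0;
        for (int i = 0; i < limit; i++)
            sum += arr[i]; // throws IndexOutOfRangeException if limit > arr.Length
        return sum;
    }
}
```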
     
    Last edited: Nov 19, 2012
  36. techmage

    techmage

    Joined:
    Oct 31, 2009
    Posts:
    2,085
    This is good to know... Should do some more tests on this.

    I was totally thinking I was optimizing things by using shorts where possible.
     
  37. techmage

    techmage

    Joined:
    Oct 31, 2009
    Posts:
    2,085
    You're right, I'll try that when I can.
     
  38. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    BUG REPORT #502423: Terrain.CullAllTerrains API call causes C# allocation

    Terrain.GetRenderer()
    Terrain.GetTerrainRendererDontCreate()
    Light.GetLights()


    This is the first post regarding allocations from within the Unity API following this topic. This post includes 3 instances within one API call, as highlighted on the attached deep profile image. We have 25 terrains in the scene so the allocations will probably be higher than testing with just a single terrain.

    [Attached deep profile image: Allocation_Bug_Report_1.PNG]
     
    Last edited: Nov 27, 2012
  39. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    BUG REPORT #502436: Component.GetComponent et al API calls cause C# allocation

    Component.GetComponent()
    Component.GetComponentInChildren()
    Component.GetComponentsInChildren()
    Component.GetComponents()


    This is obvious given the nature of the calls ( new component or array instantiation ), and in many cases caching can remove per frame use of these functions. However, where that is not possible, superpig and I are suggesting the following overloads/enhancements:

    Component.GetComponent ( myComponent )
    Component.GetComponentInChildren ( myComponent )
    Component.GetComponentsInChildren ( List<T> intoList )
    Component.GetComponents ( List<T> intoList )


    Developers would supply an existing component, array or list to be populated by the call. Where the result set is larger than the array, the method returns the total number of results along with the completely filled array. Where the result set is smaller than the array, the method returns the size of the result set.

    FEATURE REQUEST
    Please vote for this feature request.
     
    Last edited: Nov 27, 2012
  40. superpig

    superpig

    Quis aedificabit ipsos aedificatores? Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,227
    Fwiw, upon further consideration I think using List<T> might be better than arrays: it's backed by an array internally, ensures you can never access "leftover" slots in the array, and eliminates problems with there being more components than you had room for. Capacity management is less obvious, but it's easy to learn how to do that with a couple of example snippets.
     
  41. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    Let me make sure I understand correctly: let's say you pass in a List<T> with a capacity of 10, and there are 12 results. The API would allocate and add the extra 2, changing the list capacity to 12. While that's an allocation, it doesn't matter, because it's our recyclable List<T> that we always reuse, just with a larger capacity.

    What about the case where there are fewer results than the capacity of the list? While the count will reflect the result size and the capacity is left unchanged, won't the contents previously in the unused slots, now marked as null, be garbage collected?
     
    Last edited: Nov 27, 2012
  42. recon

    recon

    Joined:
    Nov 28, 2009
    Posts:
    119
    I'm at work so I can't submit a bug report right now, but I can confirm that

    Code (csharp):
    Physics.RaycastAll()
    Physics.OverlapSphere()
    Physics.SphereCastAll()
    Physics.CapsuleCastAll()
    all produce garbage, as they return arrays.
     
  43. superpig

    superpig

    Quis aedificabit ipsos aedificatores? Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,227
    Yeah. It's an allocation, but it's one-time, and we can avoid it by correctly sizing our list to 12 in the first place. If we really wanted to go all out on this, there should be an extra 'expandCapacity' flag that indicates whether the list should be expanded to 12 or whether Unity should just discard 2 of the results.

    Not in this case - remember, the unused slots contain component references. The components they reference will be alive as long as the GameObjects they're attached to are alive, and the references themselves are value types and so won't get GCed (think of them like C++ pointers). This is why calling GetComponent() to get a single component doesn't allocate anything.
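    A sketch of the List<T>-based variant discussed above, including the optional 'expandCapacity' behaviour. `FindAbove` is a hypothetical method over plain integers, not a proposed Unity signature: the caller's list is cleared (which resets `Count` but keeps `Capacity`) and refilled, and it only grows when the results exceed the current capacity and growth is allowed.

```csharp
using System.Collections.Generic;

// Hypothetical query for illustration; not a Unity API.
public static class ListQuery
{
    // `results` is owned by the caller and reused: it is cleared and refilled with
    // the values in `source` greater than `threshold`. If expandCapacity is false,
    // results beyond the list's current Capacity are discarded, so the call never
    // reallocates the list's backing array.
    public static int FindAbove(int[] source, int threshold, List<int> results, bool expandCapacity)
    {
        results.Clear(); // Count becomes 0; Capacity is unchanged
        for (int i = 0; i < source.Length; i++)
        {
            if (source[i] <= threshold) continue;
            if (!expandCapacity && results.Count == results.Capacity) break;
            results.Add(source[i]); // grows the backing array only when Count == Capacity
        }
        return results.Count;
    }
}
```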
     
  44. alexzzzz

    alexzzzz

    Joined:
    Nov 20, 2010
    Posts:
    1,410
  45. superpig

    superpig

    Quis aedificabit ipsos aedificatores? Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,227
    SGen would definitely be better than the current GC (Boehm?) but it still won't be as good as simply not needing to run the GC at all.
     
  46. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    7,190
    OK, what about a memory management system built into Unity?

    Gamasutra article about building one here

    If you have a bespoke build of Mono and the C++ Unity Game Engine code you could probably create a memory management system that is very game friendly!
     
  47. superpig

    superpig

    Quis aedificabit ipsos aedificatores? Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,227
    Similar to the unavoidable terrain wastage (even when you're not using terrains) I'm seeing ~7KB allocated every frame by SendMouseEvents, even though I'm not using any of the OnMouse* events anywhere in my codebase. A flag to turn it off would be most welcome, though if one already exists I'd appreciate someone pointing me at it.
     
  48. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    BUG REPORT #502648: Physics Cast API calls cause C# Allocation

    Physics.RaycastAll()
    Physics.OverlapSphere()
    Physics.SphereCastAll()
    Physics.CapsuleCastAll()


    Returning new arrays each call causes allocation. A solution similar to that proposed in bug #502436 would be perfect where we can pass in our own cached list.

    Physics.RaycastAll ( ..., List<RaycastHit> myExistingList )
    Physics.OverlapSphere ( ..., List<Collider> myExistingList )
    Physics.SphereCastAll ( ..., List<RaycastHit> myExistingList )
    Physics.CapsuleCastAll ( ..., List<RaycastHit> myExistingList )
     
  49. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    I'm going to use this post as a regularly updated list of possible things to investigate. If others confirm they are also seeing it, or I notice it a lot as I analyze our project, I'll raise a bug report.

    API:
    None outstanding, all submitted as bug reports.

    Mono:
    StartCoroutine()

    Workarounds
    Object.get_name() - added cachedName property
    Component.get_tag() - will add cachedTag property or replace any lookups with existing global enum taxonomy
     
    Last edited: Nov 28, 2012
  50. Games-Foundry

    Games-Foundry

    Joined:
    May 19, 2011
    Posts:
    632
    BUG REPORT #502675: SendMouseEvents.DoSendMouseEvents() causes C# Allocation

    The following calls within DoSendMouseEvents() cause allocation:

    Camera.get_targetTexture()
    Camera.get_allCameras()
    RaycastHit.get_rigidbody()
    GUILayer.HitTest()


    A request was included to provide a means to turn off mouse events.

    [Attached deep profile image: Allocation_Bug_Report_2.PNG]
     
    Last edited: Nov 28, 2012