
Question Is there a difference?

Discussion in 'Scripting' started by Ne0mega, Dec 2, 2023.

  1. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    I have been writing extra lines of code for no reason, I think, but I am not sure.

    Example 1:
    What I do
    Code (csharp):

    int arrayLength = terrainDimension.x * terrainDimension.z;

    for (int i = 0; i < arrayLength; i++)
    {
        DoSomething(i);
    }
    What I see in examples:

    Code (csharp):

    for (int i = 0; i < terrainArray.Length; i++)
    {
        DoSomething(i);
    }
    I think this comes from some misunderstanding I had long ago about a problem, where I thought that if you use terrainArray.Length, the iterator had to look up the length every time, and that it was much faster to cache that value before the loop.

    Example 2:
    What I do
    Code (csharp):

    float foo;
    for (int i = 0; i < terrainArrayLength; i++)
    {
        foo = MathStuff(i);
    }
    What I see

    Code (csharp):

    for (int i = 0; i < terrainArrayLength; i++)
    {
        float foo = MathStuff(i);
    }
    In this case, I do this when I want to reduce garbage collection, as I think a single value being overwritten requires four bytes of GC, as opposed to four bytes times arrayLength. I don't always do it, like during startup, since I don't care about the tiny GC spike as the game loads, but during runtime, I do.

    I actually don't know if I am correct, but if I am, why ever declare a variable inside a loop?
     
  2. halley

    halley

    Joined:
    Aug 26, 2013
    Posts:
    2,276
    Yes, you can grab terrainArray.Length once and use it, and it might technically be faster. Not much. However, spending your time on micro-optimizations like this is wasteful. It's like picking all the little pebbles that were stuck in the treads of your car tires before you drive to work in the morning. Sure, technically, it makes your car weigh less and is therefore more efficient, but by such a tiny amount it makes no measurable difference.

    Measure the WORST PERFORMING part of your code (with the profiler) and focus ALL your attention on coming up with a better way to get that code to work faster. Always focus on the worst, until it's not the worst anymore. Then focus on the new worst. And so on. If you're not working on the worst part of your code, you're wasting your time.

    Also, local variables don't create garbage memory allocations that need later garbage collections. Don't worry about those at all. When they go out of scope, they're gone, and it does not take the CPU any extra time to dismiss them.
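    A quick way to check halley's claim yourself: a minimal console sketch (assuming a modern .NET runtime where GC.GetAllocatedBytesForCurrentThread() is available; Unity's Mono may behave differently) that measures heap allocation across a loop declaring a value-type local on every iteration.

    ```csharp
    using System;

    class LocalVariableAllocationCheck
    {
        // Returns the managed heap bytes allocated by a loop that declares
        // a float inside the body on every iteration.
        public static long MeasureLoopAllocations(int iterations)
        {
            long before = GC.GetAllocatedBytesForCurrentThread();
            for (int i = 0; i < iterations; i++)
            {
                float foo = i * 0.5f;   // value type: lives on the stack, no heap allocation
            }
            long after = GC.GetAllocatedBytesForCurrentThread();
            return after - before;
        }
    }
    ```

    On a plain .NET console runtime the delta comes back as zero: the float never touches the heap, so there is nothing for the garbage collector to do.
    
    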
     
    Kurt-Dekker likes this.
  3. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,571
    I tend to write it like this:
    Code (CSharp):

    for (int i = 0, iend = terrainArray.Length; i < iend; i++)
    {
        DoSomething(i);
    }
     
    CodeSmile, SisusCo and Ne0mega like this.
  4. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    When I generate a map, I iterate over a million values at least twenty times, maybe 50 - 100.

    During runtime, every few frames, one of the dozens of strategic AIs scan the entire million map points.

    The habit of caching the array length value first costs me three seconds of coding time; if anything, I'd like to reduce the extra line of code if I could, which QA Jesus showed how to do above.

    I'd guess the savings is roughly proportional to that minor three-second extra line of code, but understand, even if it only saves 1/60th of a second each map generation, I have to generate that map every iteration. That means every hour I save half a second; every day, that could be six to eight seconds.

    Now, add up all the other so-called pointless micro-optimizations, and everything counts in large amounts, and I am saving maybe 10 - 20 minutes a day in iteration time.

    So why not learn to always use the fastest by habit?
     
    Last edited: Dec 2, 2023
  5. halley

    halley

    Joined:
    Aug 26, 2013
    Posts:
    2,276
    It sounds like your worst-performing piece of code might be the need to scan an entire million point map. Even for a strategy game, you can use different approaches to cut that kind of workload significantly.
     
  6. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    Yes, until the late-stage game, where most AIs see most or all of the map(s), so for now, I keep it at the worst-case scenario from the beginning to understand the impacts.

    Anyways,

    Every time I ask an optimization question, I get a response here like yours, no matter how significant the optimization.

    So again, I ask all who constantly repeat such so-called wisdom:

    Why not learn to always use the fastest by habit?
     
  7. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    38,282
    You really really REALLY don't want to work in a codebase that has had that happen to it.

    It's horrible.

    Optimize to a point, then no more. Don't do stupid stuff. Don't do stuff more than you need to. But after that, stop when the code is clear and concise.

    Remember: over-optimized code is always harder to work with because it either assumes more, checks less, combines and conflates more, or is otherwise unusually organized.

    The CPU is not the only consumer of code. YOU need to "consume" it again every time you contemplate changes.
     
    CodeSmile and halley like this.
  8. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    I probably just lost all that iteration optimization saved for the next year, by reading and responding to this thread. :/
     
    Nad_B likes this.
  9. halley

    halley

    Joined:
    Aug 26, 2013
    Posts:
    2,276
    There ARE some small habits that will enhance performance. Avoid Find*() when you can. Avoid GetComponent*() when you can. Cache component references when you can't avoid searches.

    There ARE some bigger habits that will enhance performance greatly, but add extra complexity to your code. Avoid APIs that return a whole List<T> when all you need is to find the best T in the list. Use preallocated arrays instead of APIs that allocate new arrays on every call. Don't use SendMessage() when you can subscribe to an event.

    But from your earlier question about whether a local variable like
    foo
    is creating a garbage collection problem (it's definitely not, in either variation of code you showed), it seems like you're not very familiar with how C# code translates into actual machine instructions. Changing algorithms will ALWAYS have a bigger impact than changing little stuff like whether you dereference .Length more often. If you have a thousand units moving on a million-space map, it's likely better to process the map area around the thousand units than it is to reprocess every space on the map.

    And as Kurt pointed out, code that is overly "clever" at wringing every CPU tick will be harder to maintain and modify. Your best use of your time is writing new functionality that works "good enough" and getting that out to players.
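    halley's "preallocated arrays" habit can be sketched outside Unity. The names below (EnemyQuery, FindVisible) are made up for illustration; the shape mirrors API pairs like Physics.OverlapSphere vs Physics.OverlapSphereNonAlloc, where the second fills a caller-owned buffer instead of allocating a new array every call.

    ```csharp
    using System;
    using System.Collections.Generic;

    static class EnemyQuery
    {
        // Allocating version: convenient, but creates a new List and array per call.
        public static int[] FindVisible(int[] all, int threshold)
        {
            var result = new List<int>();
            foreach (int id in all)
                if (id >= threshold) result.Add(id);
            return result.ToArray();
        }

        // Non-allocating version: writes matches into a preallocated buffer
        // and returns the count, so repeated calls create no garbage.
        public static int FindVisibleNonAlloc(int[] all, int threshold, int[] buffer)
        {
            int count = 0;
            foreach (int id in all)
            {
                if (id >= threshold && count < buffer.Length)
                    buffer[count++] = id;
            }
            return count;
        }
    }
    ```

    The caller keeps one buffer alive for the lifetime of the system and reuses it every frame; the tradeoff is deciding a maximum capacity up front.
    
    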
     
  10. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    I don't do any of that already, except GetComponent calls during generation, and that's because I find editor assignment of objects to be tedious and bug-inducing during changes.

    I get the impression most of this anti-optimization current here is from people who design games that are heavy on editor work, light on procedural generation, and light (or average) on AI complexity.

    There are only a few people here who get that, and just answer the question, sometimes in precise detail.

    And both your answers fall flat in this case. There is nothing more complex about any of the four code snippets.

    That's an assumption, as indicated by the word "likely", and I don't have the time to explain everything happening, but it is far more than Arm vs Core and metal spots on the map.
     
    Last edited: Dec 2, 2023
  11. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    594
    I'm pretty sure the compiler will optimize this:
    Code (CSharp):

    for (int i = 0; i < terrainArray.Length; i++)
    {
        DoSomething(i);
    }

    To this:

    Code (CSharp):

    int length = terrainArray.Length;
    for (int i = 0; i < length; i++)
    {
        DoSomething(i);
    }
    So it's not even micro optimization, but redundancy.
     
    CodeSmile likes this.
  12. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    I kind of thought it might anyway, as that number will be constant for the life of the loop.

    It's not a huge deal either way, except every time I do it, I think, "is there a difference?"... Hence the thread.
     
    Nad_B likes this.
  13. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    521
    I agree with the general idea that you shouldn't worry about micro optimization in normal scenarios over just using standard practices. However, I never practice what I preach there.

    I have a BenchmarkDotNet project full of micro benchmarks testing exactly this kind of stuff. I can share some results when I'm at my computer. However, one important lesson I've learned is that the results can potentially be affected by every factor of the environment. It matters if I'm targeting .NET 6 or .NET 7. It matters that Unity doesn't exactly match either of those with its runtime. It matters whether you're compiling for Mono or IL2CPP in Unity. The list goes on and on, so it's hard to ever get a perfectly definitive answer.

    That being said, I did want to share my general thoughts on what benchmarking different loop expressions has told me.

    My view on the matter is that hoisting the array Length doesn't matter in a for loop. However, hoisting the Count property of a List does improve performance. You can cache the Count either outside of the loop or within the loop declaration; it doesn't matter which. That said, the improvement is negligible in 99% of circumstances, and if it becomes relevant, it is important to assess whether that loop is definitely the best way to be handling something. (I still usually hoist the Count of a List into a local count variable in the for loop declaration, just out of habit.)

    Also, if you're just iterating and not changing the contents, a foreach loop over an array has the best performance. I believe it's a compiler optimization, so it's not really running a foreach loop the way you'd expect for other enumerables. But you should never use a foreach loop over a List. Sticking to for loops over both will give the best overall performance if you can't remember which loop to use on which collection.

    Also, you didn't ask, but there may be a teeny, tiny, negligible difference in using a post-increment or pre-increment operator in a for loop. I thought a pre-increment operator might perform better because it usually does in most scenarios: using ++i is usually faster than i++ in general code because post-increment needs to make a temporary variable. (I'm no IL or compiler expert.)

    But it seems like all of my tests tend to run i++ for loops like a nanosecond faster. It could be my imagination, or noise in the results, but I know I've seen info suggesting the compiler has special recognition of a signature like the standard for(int i = 0; i < someVar; i++), so it's then able to provide a compiler optimization that gives the best results. I've seen other examples I can't quite remember off the top of my head, deep in .NET code and GitHub issues, where it really does matter how you write something so the compiler can recognize the syntax and provide an optimization.

    Which brings up another good point: sometimes attempting to optimize because you think it should run faster that way can actually make it run slower.
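    For reference, here is a sketch of the loop variants discussed (data and method names are hypothetical). All three return the same result, so any difference between them is purely in how the condition and increment compile.

    ```csharp
    using System.Collections.Generic;

    static class LoopVariants
    {
        // Standard form: Count is read on every iteration.
        public static int MaxStandard(List<int> values)
        {
            int max = int.MinValue;
            for (int i = 0; i < values.Count; i++)
                if (values[i] > max) max = values[i];
            return max;
        }

        // Hoisted form: Count is read once in the loop declaration.
        public static int MaxHoisted(List<int> values)
        {
            int max = int.MinValue;
            for (int i = 0, iend = values.Count; i < iend; i++)
                if (values[i] > max) max = values[i];
            return max;
        }

        // Pre-increment form: behaviourally identical here, since the
        // increment's result is discarded either way.
        public static int MaxPreIncrement(List<int> values)
        {
            int max = int.MinValue;
            for (int i = 0; i < values.Count; ++i)
                if (values[i] > max) max = values[i];
            return max;
        }
    }
    ```
    
    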
     
    Last edited: Dec 2, 2023
    Ne0mega and Nad_B like this.
  14. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    594
    Yes, the runtime matters a lot, and here with Unity, we're still stuck with Mono. To take your example of a foreach loop on a List: in .NET 8, they implemented some optimizations that made it 3x-4x faster than .NET 7.0 (and I wonder how many x faster than Unity's Mono...), which brings it on par with arrays.

    Here's a good article about loop optimizations in .NET, and some of them are totally unexpected.
     
    CodeRonnie, Ne0mega and CodeSmile like this.
  15. zulo3d

    zulo3d

    Joined:
    Feb 18, 2023
    Posts:
    758
    When it comes to optimizing, rather than asking on a forum whether something is beneficial, you should just test for yourself. I wouldn't even trust the guys who wrote the compiler with their advice. Compilers are complicated.

    When iterating over something a billion times, even the smallest micro-optimization can make a big difference. So it's definitely worth testing all the different possibilities.
     
    Bunny83 and CodeRonnie like this.
  16. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    Thanks, CodeRonnie.

    Another reason I asked: every time, I fear that declaring variables outside the loop, or using arrayLength, could add a millionth of a second, because compilers do tricky things.

    In that case, my philosophy would be: don't do it, and use array.Length. Even though the effect is negligible, why add the extra millionth of a second? Learn the fastest, and use the fastest. That's my philosophy. How much it saves is irrelevant to me.

    But it's like the Hippocratic oath: "first, do no harm".
     
    Nad_B likes this.
  17. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    594
    Well if this is how you write code, then don't ask anybody. Just write your code, as long as it works and you can understand it.
     
    halley likes this.
  18. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738

    Yes, but I asked because setting all that up would take massive time and effort, and it would possibly be full of mistakes I don't know I am making while setting up the tests... So my best option is to defer to people who have done the tests, knowing they know more than me on the subject, for sure.
     
  19. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    521
    Yep. The only test that really matters is profiling a Release build of your app.
     
  20. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    3,840
    Well, unfortunately you're not correct :) First of all, local variables inside a method never allocate memory on the heap. They always live on the stack, even reference type variables. Of course, if you create objects which are stored in a reference type variable, the object itself lives on the heap and would create garbage, but the variable would not.

    Primitive types and struct types live directly in the variable itself, so declaring or assigning a value to a primitive-type variable would never generate garbage.

    Well, there could be many reasons. Though in most cases it would not make much of a difference. However in some cases it actually does. Funny enough the main reason why you would have to declare the variable inside the loop has to do with closures and objects that generate garbage ^^. Though a much more plausible and straight forward reason is: keep your scope clean. If you have a temporary variable that is only needed inside that loop, it should be declared inside that loop. That way the same name could be reused later in the same method. A good example is the loop variable itself. It's actually declared inside the scope of the loop and only exists in that loop. When the loop has finished the name (and memory) would be free again to be reused.

    What most people don't realise is that the memory for local variables inside a method is reserved on the stack the moment the method is called. That memory is "released" once the method finishes. All the memory for all local variables is allocated at the method entry and that allocation is actually free since it is just a shift of the stack pointer. So the method knows how much stack memory it needs.

    The variable inside the loop only needs the memory once because the same bit of memory is reused the next iteration. No allocations will actually happen. Note that the compiled code doesn't even remember the names of local variables. Local variables are just offsets / indices into the stack. The variable inside a loop always uses the same reserved memory slot.

    There's an exception which I mentioned above and that is when you create a closure inside a loop. A closure can "capture" a variable. Technically the compiler works different in that case because instead of declaring a local variable, the compiler generates an actual closure object on the heap which will contain the variable. The point of a closure is that the variable can be used anonymously even outside the method where it was declared in. Though this is a very special case.
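    Bunny83's closure case can be made concrete with a small sketch. Capturing the for-loop variable makes the compiler hoist it into a heap-allocated closure object shared by every lambda (for a `for` loop there is only one loop variable), while declaring a copy inside the body gives each iteration its own capture.

    ```csharp
    using System;
    using System.Collections.Generic;

    static class ClosureCaptureDemo
    {
        public static int SumOfSharedCapture()
        {
            var getters = new List<Func<int>>();
            for (int i = 0; i < 3; i++)
            {
                // 'i' is captured by reference into one compiler-generated
                // closure object on the heap; all three lambdas share it.
                getters.Add(() => i);
            }
            int sum = 0;
            foreach (var f in getters) sum += f();  // each call sees the final i == 3
            return sum;                             // 3 + 3 + 3 = 9
        }

        public static int SumOfPerIterationCapture()
        {
            var getters = new List<Func<int>>();
            for (int i = 0; i < 3; i++)
            {
                int copy = i;             // declared inside the loop body:
                getters.Add(() => copy);  // each iteration allocates its own closure
            }
            int sum = 0;
            foreach (var f in getters) sum += f();  // sees 0, 1, 2
            return sum;                             // 0 + 1 + 2 = 3
        }
    }
    ```

    (Note `foreach` has behaved like the second version since C# 5, which is why this surprise mostly bites `for` loops.)
    
    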
     
    CodeRonnie, Ne0mega and halley like this.
  21. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    I need to know which is faster to "just write my code", hence the question.
     
  22. halley

    halley

    Joined:
    Aug 26, 2013
    Posts:
    2,276
    You don't seem to be doing any deferring at all, and you're saying "aw, these people who reply must not know anything anyway." As Nad says, if you want to code that way, then code that way.
     
  23. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    521
    Yeah, even if everything I said is totally accurate, which nobody should just trust without evidence, it could all be totally obsolete if someone is reading this from years in the future and Unity has finished Core CLR support, and everything just compiles differently at that point.
     
    Nad_B likes this.
  24. zulo3d

    zulo3d

    Joined:
    Feb 18, 2023
    Posts:
    758
    The ultimate way to optimize something is to not do something.

    So just remove the loop!:


    Code (CSharp):

    DoSomething(0);
    DoSomething(1);
    DoSomething(2);
    DoSomething(3);
    DoSomething(4);
    DoSomething(5);
    DoSomething(6);
    DoSomething(7);
    DoSome...
     
    Nad_B likes this.
  25. halley

    halley

    Joined:
    Aug 26, 2013
    Posts:
    2,276
    Nope. That's 8x call overhead. Better just cut and paste the contents of that function out in the open eight times.
     
    Nad_B likes this.
  26. zulo3d

    zulo3d

    Joined:
    Feb 18, 2023
    Posts:
    758
    Now we just need more memory!!
     
    Nad_B likes this.
  27. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    594
    Your code is good; there's no problem at all in declaring a variable that holds your array length and using it in a loop. In fact, I just did it yesterday, because I needed the List length (Count) for some calculations before the loop itself, and I find it cleaner and easier to read to use a variable than several list.Count in the same method:

    Code (CSharp):

    var stockPositionsCount = stockPositions.Count;

    if (stockPositionsCount == 0)
        return;

    if (stockSize > totalCapacity)
        stockSize = totalCapacity;

    var stockCount = Math.Floor(stockSize / totalCapacity * stockPositionsCount);

    // few lines later:

    for (var i = 0; i < stockPositionsCount; i++)
    {
        // loop
    }
    As for your second example, it makes no sense. Floats are value types, which means each time you assign a value to a float variable, the value is simply copied into it. So declaring it inside or outside the loop has exactly the same effect. The good news is that value types are allocated on the stack*, which means they're not "garbage" and do not need to be garbage collected.

    * There are some exceptions though (a Struct that contains a reference type, class level variables i.e. fields...)
     
    Last edited: Dec 2, 2023
  28. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    594
    You know what, I'll write my next game entirely in one method, using gotos, ASM style! This is the real ultimate optimization, and my game will run +5% faster!
     
  29. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    I am deferring to the articles posted, and to the people who have spent time testing it.

    In fact, I just learned about hoisting and cloning, and that using an Int32, a less-than comparison, and incrementing only by one is indeed optimal, if the compiler will clone.

    There was a bunch more good stuff in that small article, which answered many questions on the whole subject I forgot I had.

    I also just learned that if you do math inside a loop that can be done outside of it, the compiler will hoist it out for you. So no more worrying about that, although honestly, I will still do it manually anyway, as it has become habit.

    This is called hoisting, which, incidentally, is what the first question was about. The answer is usually that the compiler will hoist for you.

    One of the times it will not is if you use a computation in the loop declaration (like i = 0; i < length - 1; i++), or, surprisingly, if you use a compound statement in the loop, i.e. a *= b instead of a = a * b.

    It was great.
     
    Last edited: Dec 3, 2023
    Nad_B likes this.
  30. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    594
    Yes, the article is great, but always remember that in 99.99% of cases you don't need those tiny-uber-micro-optimizations, especially in Unity, since the bottlenecks usually come from the GC, the number of game objects and their scripts in a scene, and the GPU rendering. You can always try to save a few microseconds here and there, but what for, when your rendering loop takes 15 milliseconds and you create MBs of garbage per frame that take 10 milliseconds to collect...
     
    Last edited: Dec 2, 2023
    CodeRonnie likes this.
  31. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    I killed GC issues long ago. I can't even remember which best practices I adopted, except the one I asked about with variables declared inside loops, which has always seemed like a cloudy afterthought of my own logic.

    But overall, my main bottleneck right now is my AIs' sight and memory cycles, which is CPU, as they are scanning for a lot... It's pretty fast right now.

    As for shader/GPU optimizations, that is also something I tend to be fairly confident about.

    However, perhaps I will wait until next week on lerpNode vs (multiply*mask + multiply*oneMinusMask) for color blending in Shader Graph.

    The lerpNode takes less space on the graph, so I just started to use it, but I read lerps were kind of heavy compared to simple adds and multiplies.

    And again, just like everything else, if one is only slightly faster than the other, I use the faster by default, regardless of the tininess of the benefit. I just don't see a reason to ever use the slower option if you know the faster method.
     
  32. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    594
    For AI, the general rule is that you don't need to run the logic every single frame. 2-5 times a second is more than enough for most cases, and it's natural behaviour that aligns with reaction times (a real soldier won't react instantly when they see an enemy, but after 200 ms to a few seconds, right?).

    Alternatively, you can create a "Scheduler" that runs all your agents' AI logic, measures the total time spent, and if it crosses a threshold (e.g. 2 ms) yields execution to the next frame, then continues processing the remaining AIs. This way you can be sure that your AI processing never exceeds an allocated time budget. This is what I'm using in my current game; it works great, and I never worry about AI "eating" more than it should, even with a lot of agents. Couple this with an AI "LOD" (invisible/far agents get a very simple, dumbed-down AI) and you can have thousands (tens of thousands if you use Unity's Job system) of AI agents with no problem.
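    A minimal, Unity-free sketch of that scheduler idea (the class and agent representation are made up for illustration): each "frame" processes agents until the time budget is spent, then resumes from the same index next frame.

    ```csharp
    using System;
    using System.Diagnostics;

    // Hypothetical time-sliced scheduler: runs as many agents as fit in the
    // per-frame budget, then picks up where it left off on the next frame.
    class AiScheduler
    {
        private readonly Action[] _agents;
        private int _next;  // index of the next agent to process

        public AiScheduler(Action[] agents) => _agents = agents;

        // Call once per frame; returns how many agents were processed.
        public int RunFrame(TimeSpan budget)
        {
            var sw = Stopwatch.StartNew();
            int processed = 0;
            // At most one full pass per frame, and never past the budget.
            while (processed < _agents.Length && sw.Elapsed < budget)
            {
                _agents[_next]();
                _next = (_next + 1) % _agents.Length;  // wrap back to the first agent
                processed++;
            }
            return processed;
        }
    }
    ```

    In a real game the `Action[]` would be your agent update methods, and agents skipped this frame simply run a frame or two later, which is usually invisible at 2-5 updates per second.
    
    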
     
    Last edited: Dec 2, 2023
    CodeRonnie likes this.
  33. spiney199

    spiney199

    Joined:
    Feb 11, 2021
    Posts:
    7,258
    Future you will be thankful when past you has written code that's clean and maintainable. Very often, overly-optimised code is neither of those, and your development velocity will slow to a crawl every time you have to figure out what kind of crack past you was on when you wrote said code.

    Optimise where needed, otherwise keep your code clean. And measure before you go optimising.
     
    CodeRonnie and Nad_B like this.
  34. halley

    halley

    Joined:
    Aug 26, 2013
    Posts:
    2,276
    Not sure what you read, but a lerp() is typically just one subtraction, one multiplication, and one addition; some implementations have two multiplications.
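    Written out, the two common forms halley describes look like this (a generic sketch, not any particular engine's implementation):

    ```csharp
    static class MathSketch
    {
        // One subtraction, one multiplication, one addition.
        public static float Lerp(float a, float b, float t) => a + (b - a) * t;

        // The two-multiplication variant some implementations use,
        // which is exact at both endpoints even with float rounding.
        public static float LerpPrecise(float a, float b, float t) => a * (1f - t) + b * t;
    }
    ```

    Either way it is a handful of basic ALU operations, nothing a GPU would consider heavy.
    
    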
     
    Nad_B and Ne0mega like this.
  35. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    594
    Well it depends, as everything is relative. Take this as an example:
    Code (CSharp):

    private Vector2 GetVector(VectorType type)
    {
        if (type == VectorType.A)
            return new Vector2(0, 5);

        if (type == VectorType.A)
            return new Vector2(2, 5);

        if (type == VectorType.B)
            return new Vector2(4, 3);

        if (type == VectorType.C)
            return new Vector2(1, 9);

        if (type == VectorType.D)
            return new Vector2(6, 0);

        if (type == VectorType.E)
            return new Vector2(3, 1);

        if (type == VectorType.F)
            return new Vector2(4, 2);

        if (type == VectorType.G)
            return new Vector2(1, -5);

        if (type == VectorType.H)
            return new Vector2(4, 2);

        if (type == VectorType.I)
            return new Vector2(3, -2);

        if (type == VectorType.J)
            return new Vector2(0, 2);

        if (type == VectorType.K)
            return new Vector2(4, 1);
    }
    Compared to this:

    Code (CSharp):

    private static readonly Dictionary<VectorType, Vector2> _vectorsDictionary = new()
    {
        [VectorType.A] = new Vector2(0, 5)
        // ... etc, I'd populate this automatically from an SO or via CustomAttribute/Reflection
    };

    private Vector2 GetVector(VectorType type)
    {
        return _vectorsDictionary[type];
    }
    Is the first method "faster"? Absolutely yes. Is it worth it? Absolutely not. I won't sacrifice readability, debuggability (is this even a word?) and ease of modification/extension for a few micro (nano?) seconds.

    So no, better performance is not always the best option, especially if you sacrifice code quality for a practically non-existent performance boost.
     
    CodeRonnie likes this.
  36. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738

    That is what I am doing. They don't run every frame. One of the many problems is that there are enemies and allies, and potential enemies and allies. I can't have my AIs assume peace means they can let their guard down.

    Also, the AIs have a kind of memory system that lets them remember last-seen threats, and still include those in their threat calculations based on position, power, and time last seen.

    And this isn't even getting into roads, buildings, resources, closed borders, danger zones, optimal defensive paths, optimal stealth paths, etc, etc, etc.

    My sight is data-driven, as collider-driven turned out to be unfeasible, due to the amount of "things" this AI has to track and calculate.

    I've added some things that make sight handle better, such as static sight zones, where a unit does not run its own sight if it is deep enough inside a static sight zone, and I have been contemplating how to have a unit tell another AI it is in its sight, instead of AIs looking, but for now I cannot figure out a way better than the current system.

    And I was doing yet another Jobs tutorial when this question popped up again... ...and this is so much easier than coding... Bleh...
     
    Last edited: Dec 2, 2023
  37. tleylan

    tleylan

    Joined:
    Jun 17, 2020
    Posts:
    606
    Wow, what a thread :( It seems to be code-optimization opinion threads that draw everyone out every time. One rule I follow (or tend to follow after I get it working) is to "hide the details", so to begin with (in your example) GetVector would be a method in a class. The first iteration could be the if-type version you posted (realistically it wouldn't, but...). I would notice it was silly and would incorporate the dictionary (or some other faster, cleaner solution), but it makes no difference to the consumer of the method calling GetVector.

    You have two VectorType.A in your first example BTW.:rolleyes:

    Personally, I believe I would use integers and return an element from an array of Vector2 objects.
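    A sketch of that suggestion, assuming the enum values are contiguous from zero (so the cast to int is a direct array index), and using System.Numerics.Vector2 as a stand-in for Unity's Vector2:

    ```csharp
    using System.Numerics;  // stand-in for UnityEngine.Vector2

    enum VectorType { A, B, C, D }

    static class VectorTable
    {
        // Array order must match the enum's declaration order (A = 0, B = 1, ...).
        private static readonly Vector2[] _vectors =
        {
            new Vector2(0, 5),   // A
            new Vector2(4, 3),   // B
            new Vector2(1, 9),   // C
            new Vector2(6, 0),   // D
        };

        // One cast and one array index: no branching, no dictionary hashing.
        public static Vector2 GetVector(VectorType type) => _vectors[(int)type];
    }
    ```

    This keeps the dictionary version's single-lookup readability while being a plain indexed load, at the cost of the array order silently depending on the enum order.
    
    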
     
  38. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    594
    It was just an example to illustrate what I was trying to say, written with Unity's forum "Insert Code" function.
     
  39. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    521
    I wanted to demonstrate some fresh benchmark results, and my total ridiculousness, so here we go. One of the main motivations for why I reran all of this in such detail is because this thread got me curious to check my work, and I noticed something. There was a significant difference in my test results when I targeted .NET 6 vs .NET 7. So, I wanted to retest everything, thoroughly enough to present for public scrutiny. I ran everything through BenchmarkDotNet version 0.13.10, which runs each method millions of times, and prevents typical mistakes with performance benchmarking. "Under the hood, it performs a lot of magic that guarantees reliable and precise results thanks to the perfolizer statistical engine."

    Every benchmark presented here performs identical work within the loop, to control for any differences in the results. They all operate on collections of ints, so that each loop is iterating over a contiguous collection of value types, and not pulling in fragmented heap references. The operation compares each value in the collection to see if it is higher than the result, and if so, assigns it to the result. It's important to return a result and ensure that the compiler won't just optimize the entire test away, invalidating the results. So, I assign the value of the index to each element, then randomize the order of those values. I don't actually think the compiler would know to just skip the loop and assign the last value of the collection to the result, but just to make 100% sure that doesn't happen, I put the values in "random" order. I use the same seed value for Random, so that every collection contains the exact same values in the exact same order. Each loop iterates through the entire collection. This seems to me like a bare-bones version of what most loops might be doing: accessing each value for a conditional, then maybe doing something with it.

    There are a lot of different things you can tweak about a loop expression, and the permutations of all possible differences combined into all of the various resulting expressions gets out of hand fast. Especially since it takes a long time to run each benchmark. So, I start with what I consider the most standard expression of each type of loop. Then, I test each tweak to that expression individually to determine any differences in performance, on a large enough collection to make the differences stand out.

    I ran the basic for-loop-over-an-array test on various array lengths to see where the performance curve really starts to change between small and large collections. Some of the differences between small collection sizes are in nanoseconds, and it's important to understand the tradeoffs between readability and performance: everything below 8192 blends together at the bottom of the curve, and if you're only saving nanoseconds or microseconds, it may be entirely unnecessary to sacrifice readability or normalcy. From there, the curve bends sharply upwards as I double the length up into the millions. I ended up selecting 32768 because that gave me the most consistent results that displayed well on box plots across repeated runs. Even though each benchmark already runs each method millions of times, I ran every test at least twice and compared the box plots, just to make sure I was getting really consistent results.

    While I'm on the subject of large collections, it's also worth pointing out that objects over 85,000 bytes in size end up on the Large Object Heap, as far as garbage collection is concerned. So, it's usually a good idea to keep such objects around permanently and prevent them from ever being collected, if you really need them at all. However, that applies to the standard .NET garbage collector; Unity uses its own incremental garbage collector, so I'm not sure of the exact relevance there. In any case, be aware of the memory footprint of your collections: an array of millions of items is a single object that accounts for megabytes of memory on its own.
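    A quick way to see this on the standard .NET GC (not Unity's): large objects are allocated directly into generation 2, so GC.GetGeneration reports 2 for them immediately. The sizes below are my own illustrative picks; the 85,000-byte threshold is the documented default for the standard .NET runtime.

```csharp
using System;

// 32,768 ints * 4 bytes = 131,072 bytes of payload, which is over the
// 85,000-byte Large Object Heap threshold of the standard .NET GC.
int[] big = new int[32768];

// 1,024 ints * 4 bytes = 4,096 bytes, a normal small-object allocation.
int[] small = new int[1024];

// LOH objects are treated as generation 2 from the moment of allocation.
Console.WriteLine(GC.GetGeneration(big));   // 2
Console.WriteLine(GC.GetGeneration(small)); // 0 (freshly allocated)
```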

    Oh, and none of these tests allocate any new memory for garbage collection, but I have the memory diagnoser on just to confirm that.

    I tested various ways of writing for loops over arrays, Lists, Spans, and ReadOnlySpans. Foreach loops can only really be written one way, so there's no ideal version to search for there. I also tested various types of while loops, just to throw them in for comparison. I will probably skip posting some of the box plot graphs for finding the best version of each type of loop over each collection, just because it's a lot; however, I've included the source code and the results in text form at the link below. If you want to see a specific box plot, I can post the graph that goes with it.

    After finding the best version of each type of loop for each type of collection, I run each side-by-side to compare them all to each other. I also added versions that iterate spans over lists created with CollectionsMarshal.AsSpan<T>(List<T>).
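    For anyone who hasn't used it, here's a minimal sketch of the CollectionsMarshal.AsSpan technique (my own example values, not the benchmark code). It exposes the List's backing array directly as a Span, skipping the List indexer's bounds logic:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var list = new List<int> { 3, 1, 4, 1, 5 };

// AsSpan hands back a Span over the List's internal backing array.
// Caution: the span is invalidated if the list is resized afterwards,
// so don't Add/Remove while holding it.
Span<int> span = CollectionsMarshal.AsSpan(list);

int result = 0;
for (int i = 0; i < span.Length; i++)
    if (span[i] > result)
        result = span[i];

Console.WriteLine(result); // 5
```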

    Then, there is one more very important thing to test. I change the project from targeting .NET 6 to .NET 7 and test everything all over again. This demonstrates the differences in the results depending on the environment, which is another important lesson to learn from this. None of these results were taken from Unity. Remember that these results are not directly indicative of how things will run in your Unity app, depending on your version of Unity, different hardware platforms, compiled for Mono or IL2CPP, etc. The only test that matters in the end is testing the actual performance of your application.

    All of the code used to get these benchmark results is included here:
    https://gist.github.com/moonymachine/cedbc41b10d5d085c46a705b7ab90d38

    For loops over arrays all perform roughly equally, except that you should not hoist Array.Length into a variable declared outside the loop.
    Code (CSharp):
    1.         [Benchmark]
    2.         public int B()
    3.         {
    4.             int result = 0;
    5.  
    6.             int length = Array.Length;
    7.             for(int i = 0; i < length; i++)
    8.             {
    9.                 if(Array[i] > result)
    10.                     result = Array[i];
    11.             }
    12.  
    13.             return result;
    14.         }
    .NET 6
    Benchmarks.ForLoopArray-boxplot.png
    .NET 7
    Benchmarks.ForLoopArray-boxplot.png
    For loops over Lists are very interesting. This is actually what got me to retest everything when I first noticed it. In .NET 6, you should hoist the Count, either into an external variable or within the loop declaration. In .NET 7, you should not! Completely opposite results.
    Code (CSharp):
    1.         [Benchmark]
    2.         public int B()
    3.         {
    4.             int result = 0;
    5.  
    6.             int count = List.Count;
    7.             for(int i = 0; i < count; i++)
    8.             {
    9.                 if(List[i] > result)
    10.                     result = List[i];
    11.             }
    12.  
    13.             return result;
    14.         }
    15.  
    16.         [Benchmark]
    17.         public int C()
    18.         {
    19.             int result = 0;
    20.  
    21.             for(int i = 0, count = List.Count; i < count; i++)
    22.             {
    23.                 if(List[i] > result)
    24.                     result = List[i];
    25.             }
    26.  
    27.             return result;
    28.         }
    .NET 6
    Benchmarks.ForLoopList-boxplot.png
    .NET 7
    Benchmarks.ForLoopList-boxplot.png
    For loops and while loops over Spans and ReadOnlySpans performed relatively equally no matter how they were expressed. While loops over arrays also all performed relatively equally. So, I will skip displaying their graphs and use the standard expression format for those in the later comparison test.
     
  40. CodeRonnie

    CodeRonnie

    Joined:
    Oct 2, 2015
    Posts:
    521
    While loops over Lists display the same peculiar characteristic that we saw in the for loops. Hoisting the count is good in .NET 6, but bad in .NET 7.
    Code (CSharp):
    1.         [Benchmark]
    2.         public int D()
    3.         {
    4.             int result = 0;
    5.  
    6.             int i = 0;
    7.             int count = List.Count;
    8.             while(i < count)
    9.             {
    10.                 if(List[i] > result)
    11.                     result = List[i];
    12.                 i++;
    13.             }
    14.  
    15.             return result;
    16.         }
    .NET 6
    Benchmarks.WhileList-boxplot.png
    .NET 7
    Benchmarks.WhileList-boxplot.png
    So, having determined the best form of expressing every type of loop over every type of collection, let's see them all side-by-side. I removed the ReadOnlySpan versions because their performance was essentially the same as the Span versions, so the results are less cluttered.

    .NET 6, List.Count is hoisted
    Benchmarks.LoopTest-boxplot.png
    EL = foreach over a List
    FL = for loop over a List with Count property hoisted
    WL = while loop over a List with Count hoisted

    .NET 7, List for loop is written in standard form
    Benchmarks.LoopTest-boxplot.png
    EL = foreach over a List

    So, what is the conclusion here? Every loop, whether for, foreach, or while, over an array, a List, or a Span over either of those, performs about the same. Except: in older versions you should hoist the Count of a List (but not the Length of an array, if using an external variable), and you should never use a foreach loop over a List. In newer versions you should hoist neither the Count of a List nor the Length of an array. Also, for some reason, a foreach loop over a List becomes the fastest way to iterate in newer code, instead of the slowest.

    So, is there a rule of thumb? Yes, just write all of your loops in the standard form.
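    To be explicit about what I mean by "standard form" (my own illustrative example, not benchmark code): query Count or Length directly in the loop condition, declare the index inside the loop declaration, and don't hoist anything by hand.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 3, 1, 4 };

// Standard-form for loop: Count read directly in the condition,
// index declared in the loop header, no manual hoisting.
int sum = 0;
for (int i = 0; i < list.Count; i++)
    sum += list[i];

// Standard-form foreach loop.
foreach (int value in list)
    sum += value;

Console.WriteLine(sum); // 16 (each element counted twice)
```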

    If you have a really hot path you can try avoiding foreach loops over Lists, and hoisting the Count in a for loop (old compiler), or using a foreach over a List (new compiler). But, you'd better be 100% sure you know which one you're getting, or you'll actually get the worst performance instead of the best performance.

    I believe Unity, at least in 2021.3, will probably behave more like the older .NET 6 versions, but when Unity eventually upgrades their .NET support it will probably all work like the newer .NET 7 results.

    There, matter resolved.
     
    Last edited: Dec 8, 2023
  41. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    738
    You're awesome dude.
     