Search Unity

List vs Array - finding out which is faster when being iterated using foreach/enumerator

Discussion in 'Scripting' started by Maccyfin, Jun 29, 2020.

  1. Maccyfin

    Maccyfin

    Joined:
    Aug 19, 2012
    Posts:
    87
    Hey all.
    So continuing on from my last speed test (for loops vs foreach loops here: https://forum.unity.com/threads/for...in-unity-which-is-faster.915644/#post-6012926), I decided to test out the speed of arrays vs lists. I already knew that arrays were faster than lists, I believe because arrays are sequentially lined up in memory where as lists are scattered about; my knowledge of memory usage in code isn't amazing.

    What I wanted to find out though was if converting a list to an array then looping over the array, was faster than looping over a list. And it was.
    Here's the results:


    SPEED

    As you can see from the chart above. The speed from fastest to slowest goes:

    1) Array
    2) List converted to Array
    3) List


    LESSON LEARNT

    When implementing a List, try converting it to an array first and see if you get better performance.


    THE CODE

    Code (CSharp):
    1.     private const int ELEMENT_COUNT = 1000;
    2.  
    3.  
    4.     public float ArrayForEachTest()
    5.     {
    6.         //create the array
    7.         int[] array = new int[ELEMENT_COUNT];
    8.         for (int i = 0; i < array.Length; i++)
    9.         {
    10.             array[i] = i;
    11.         }
    12.  
    13.         Stopwatch stopwatch = new Stopwatch();
    14.         stopwatch.Start();
    15.  
    16.         foreach (int iNumber in array)
    17.         {
    18.             //just a basic operation
    19.             int iNumberSquard = iNumber * iNumber;
    20.         }
    21.  
    22.         stopwatch.Stop();
    23.         //return stopwatch.ElapsedMilliseconds;
    24.         return (float)stopwatch.Elapsed.TotalMilliseconds;
    25.     }
    26.  
    27.     public float ListForeachTest()
    28.     {
    29.         //create the list
    30.         List<int> list = new List<int>();
    31.         for(int i = 0; i < ELEMENT_COUNT; i++)
    32.         {
    33.             list.Add(i);
    34.         }
    35.  
    36.         Stopwatch stopwatch = new Stopwatch();
    37.         stopwatch.Start();
    38.  
    39.         foreach (int iNumber in list)
    40.         {
    41.             //just a basic operation
    42.             int iNumberSquard = iNumber * iNumber;
    43.         }
    44.  
    45.         stopwatch.Stop();
    46.         return (float)stopwatch.Elapsed.TotalMilliseconds;
    47.     }
    48.  
    49.     public float ListToArrayForeachTest()
    50.     {
    51.         List<int> list = new List<int>();
    52.         for (int i = 0; i < ELEMENT_COUNT; i++)
    53.         {
    54.             list.Add(i);
    55.         }
    56.  
    57.         Stopwatch stopwatch = new Stopwatch();
    58.         stopwatch.Start();
    59.  
    60.         int[] array = list.ToArray();
    61.         foreach (int iNumber in array)
    62.         {
    63.             //just a basic operation
    64.             int iNumberSquard = iNumber * iNumber;
    65.         }
    66.  
    67.         stopwatch.Stop();
    68.         return (float)stopwatch.Elapsed.TotalMilliseconds;
    69.     }

    Let me know if I've missed any glaring issues in the code. Will be good to discuss.

    Happy Developing
    Martin
     
  2. Ryiah and neginfinity like this.
  3. Maccyfin

    Maccyfin

    Joined:
    Aug 19, 2012
    Posts:
    87
  4. Owen-Reynolds

    Owen-Reynolds

    Joined:
    Feb 15, 2012
    Posts:
    1,998
    In-order lists (where there's a definite 1st, 2nd... item) are implemented in 2 ways: array or linked-list. A linked list is that second thing you mention. Choosing the right one is a big deal for speed, like 1000x slower. Looking up things like "array vs. linked list big O" should explain why. Experienced coders immediately google "C# List implementation" to find out. It turns out C# lists are array-backed. In other words, a List is a class holding a private array variable, plus lots of functions to make it easy-to-use.

    As a practical matter, Unity uses arrays in lots of places, and you may as well stick with them there. Inspector arrays can change size easily when you aren't running, so arrays are fine there. But for your own, if you'll ever want to add or remove an item while running, using a List is so much easier than hand-writing your own an array insert.
     
    richardt_aie likes this.
  5. Joe-Censored

    Joe-Censored

    Joined:
    Mar 26, 2013
    Posts:
    11,847
    I would venture a guess that when converting a list to array your get some overhead from that conversion, but as you iterate over the resulting array that part is as fast as iterating over any array. Did you see any cut off as far as size of the list where it would no longer benefit from being converted to an array first before iterating?
     
  6. Baste

    Baste

    Joined:
    Jan 24, 2013
    Posts:
    6,338
    So the big-o perf of linked list insertion and removal is faster than in an array/array list. It might also perform better if you set up some test cases where you add or remove objects, because in those tests all of those objects will be allocated in order, so they'll probably be close to each other.

    But in real use cases, those objects get spread all over memory, as objects that are added and removed are allocated over time. Big-o will tell you that iterating an array list and a linked list is about as fast, but that's very, very wrong.

    So the only instances where you want to use a linked list that's implemented through cell-to-cell pointers is cases where you add and remove very often and almost never iterate the list. Which is... not a common use case.


    You can improve the situation a lot if you have an implementation where the cells are all in an array, and their next/prev pointers are array indices instead. But that's not how LinkedList<T> is implemented.
     
    angrypenguin likes this.
  7. MDADigital

    MDADigital

    Joined:
    Apr 18, 2020
    Posts:
    2,198
    In core 3.1 I get with 1M elements

    list: 26660 ticks
    array: 22600 ticks
    list with toArray: 29649

    So toarray is slowest, not strange since it needs to copy it to a array Though not as slow as I thought it would be. Must be using the underlying array somehow so it doe not need todo a o(n). I wonder why itterating over list is slower. It uses array underneath. Probebly overhead from calling underlaying array on getting next element
     
    angrypenguin likes this.
  8. Baste

    Baste

    Joined:
    Jan 24, 2013
    Posts:
    6,338
    Oh, yeah, right, I forgot to touch on that.

    @Maccyfin Don't copy the list to an array for performance. The difference is minimal, and you'll be hit with a larger GC cost later when that array you created needs to be cleaned up. The difference between the two versions is also probably more due to for/foreach
     
    angrypenguin likes this.
  9. MDADigital

    MDADigital

    Joined:
    Apr 18, 2020
    Posts:
    2,198
    by the way 26660 ticks is ~2ms And thats for 1 million elements. So maybe not worry so much ;)
     
  10. Owen-Reynolds

    Owen-Reynolds

    Joined:
    Feb 15, 2012
    Posts:
    1,998
    That's correct. Big-O is about finding things that are 1,000 or more times too slow. Things within about 2 or 3 times the speed of each other count as "about the same".

    The point is, the OP thought C# Lists might be using linked-lists. If that was the case, it would be pointless to compare them to arrays. Depending on the problem, one would probably blow the doors off the other since they were made for different things. Comparing straight arrays to arrays hidden in a list is at least a fair comparison (but I don't see the point -- Lists are so much easier to use, and are clearly only a tad slower).
     
    angrypenguin likes this.
  11. Glader

    Glader

    Joined:
    Aug 19, 2013
    Posts:
    456
    The original post is wrong in its assumption of implementation of the C# generic List type, List in C# is not a linked List. It is a Type that is a resizeable collection around a managed array that can be resized as needed.

    You can see proof of this in the source for the .NET List<T> implementation: https://github.com/dotnet/corefx/bl...oreLib/System/Collections/Generic/List.cs#L26

    Pretty sure .NET Runtime JIT is able to inline properties from interfaces too even though they are technically virtual? So the method call to List indexer will be optimized out. Likely leaving only this code as the difference:
    Code (CSharp):
    1. // Following trick can reduce the range check by one
    2.                 if ((uint)index >= (uint)_size)
    3.                 {
    4.                     ThrowHelper.ThrowArgumentOutOfRange_IndexException();
    5.                 }
    As the difference between indexing a list vs indexing a List in C#. This code will run every time its indexed *maybe*. Hard to tell, I am not a runtime or JIT expert by any stretch. But yea.

    Make sure all tests are done in RELEASE and relevant methods called before running benchmarks for initial JIT cost to be removed from the run time.

    Also the poor performance for enumerating a List<T> with Foreach is unfortunately the expected outcome if you think about it. Because a List<T> manages an internal but potentially RESIZEABLE array reference then it's possible, in the middle of iteration, that it changes. It does not just yield on array index access but instead uses an Enumerator object that checks the state to see if the List was modified and throws. That's why it will never compete with direct index access for or array foreach.
     
    Last edited: Jul 1, 2020
  12. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,620
    In cases where you don't need the extras a List<> gives I'd always go with the lighter weight alternative, especially for any kind of "low level" code.

    For general purpose stuff I agree, though. My default for non-performance-critical code (ie: almost all of it) is List<Whatever> because it'll usually have much more direct impact on the programmer than it will the code.

    I think this is a classic case of why doing performance tests in synthetic tests like this can be misleading. The average case may be a win in terms of specifically what you're measuring, but as others have pointed out the GC cost of allocating new arrays is going to kick you later.

    I haven't checked out how well the new GC in Unity performs, but another consideration here is this: iteration time over your collection is probably fairly predictable, where the GC spikes you could cause are not.

    I personally wouldn't want to introduce unpredictable slowdowns into my code. If it's performance critical I'd use a higher performance collection in the first place, even if that means more work. If it's not performance critical then, as long as it's not stupid, I'd take more stable over slightly faster.

    Also consider "faster" in context. How often is the operation in question performed? How many items are in the collection? Unless it happens a lot, or the data set is large, or you've profiled to find the presence of an actual performance issue then considerations like this are likely to be premature optimisation.


    Edit: That said, I don't want to appear condemning of curiosity. That's good! Just make sure you're looking at the whole impact.
     
  13. MDADigital

    MDADigital

    Joined:
    Apr 18, 2020
    Posts:
    2,198
    I was going to type we dont have a single list in our domain but I did a check and we have a few, 7 to be more precise. We have several hundred arrays. Also dictionarys and hashmaps are more common than Lists in our domain (25 dictionaries, 5 hashmaps).

    I think Lists are naturally not needed in unity game dev, atleast not for us. Also when serializing things you use arrays.

    edit: this is not including vendor code offcourse
     
  14. Baste

    Baste

    Joined:
    Jan 24, 2013
    Posts:
    6,338
    Does that mean that you generally don't have arbitrary sized collections of stuff you need to keep track of, or that you stuff those things in an array and do the listy stuff manually?

    For example, some system needing to keep track of everything of type X that's in range (like AI senses or whatnot)
     
    Last edited: Jul 1, 2020
    angrypenguin likes this.
  15. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
    Same. Plus I recently built a fancy custom editor and it was much simpler to use List<> thanks to Unity having APIs in place to easily manipulate that collection type in editors.
     
  16. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,620
    I don't know about their particular case, but some games it makes sense to allocate everything up front, in which cases there might not be a lot of benefit to dynamic resizing.

    Another possibility is that they do have arbitrary collections of stuff and their access patterns benefit more from fast random lookup than fast sequential access.
     
  17. MDADigital

    MDADigital

    Joined:
    Apr 18, 2020
    Posts:
    2,198
    Was that reply for me? No, but we are more often interested to look those up fast in a dictionary. Though I didnt check our AI branch. Checked our main dev branch so might be a bit more lists in that branch
     
  18. Owen-Reynolds

    Owen-Reynolds

    Joined:
    Feb 15, 2012
    Posts:
    1,998
    I feel like there were a few years when the answer to every question here was "use a dictionary". Like someone had just discovered them and was eager to talk dictionaries in every thread. As a result, I'm pretty sure plenty of people are using dictionaries with keys like 0-19 (which means an array or list would be better in every way).

    I think that's one of the problems learning programming from the microsoft pages -- you don't get an intro to Data Structures and list/linkedList/dictionary remains so mysterious.
     
  19. MDADigital

    MDADigital

    Joined:
    Apr 18, 2020
    Posts:
    2,198
    I have a master of science degree, one entire class was dedicated to data structures. Though I agree, most don't know what they do when it comes to data structures.

    We use it for stuff like OnTrigger/Collision/enter/exit and map colliders to behavior instances using lookup.
     
  20. MDADigital

    MDADigital

    Joined:
    Apr 18, 2020
    Posts:
    2,198
    It's also cleaner code doing dic[somekey] than doing foreach(var item in items) if some condition etc, do something break;
     
  21. EternalAmbiguity

    EternalAmbiguity

    Joined:
    Dec 27, 2014
    Posts:
    3,144
    I personally find 2D arrays easier to understand for grids, at least in C#, since you can make them non-jagged.
     
  22. Maccyfin

    Maccyfin

    Joined:
    Aug 19, 2012
    Posts:
    87
    Hey guys, thanks for all the replies and input. Really interesting to read through.

    I think when it comes to using a list vs array vs dictionary etc, it really depends on the situation. I'd encourage people do their own benchmark in their projects and see what the difference is. A colleague of mine switched from a list to an array when looping over a large data set and the different was literally a minute + faster.

    FYI, I did another benchmark only using 10 elements in the array / list, and using .ToArray() was faster.

    Micro optimisations won't make a big difference but if you find that an array works better for your needs, then go ahead with that, and perhaps manually call GC.Collect (i think thats the command) at a natural break point in the game. A single array vs list wont make a big different but if you have 100+ throughout the codebase, all of which are looping over 1000+ elements, it might be good to look into this during an optimisation phase. Again, its a case by case basis.

    Gonna be posting some more benchmark tests on here soon guys. Perhaps to do with string concatination as I'm curious about that one.

    Martin :)
     
  23. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,620
    WIth no further context we've got no idea what conclusions to make from that.

    Have a read here in the manual, and here for people diescussing use of GC.Collect() in general.

    In short, calling GC.Collect() can be beneficial, but doing it without having a solid understanding of what's happening under the hood can hurt rather than help. Also note, as I mentioned before, that Unity now has an incremental GC which will also change some of this stuff.

    Note that the manual link specifically calls out "when a function returns an array value" as a problematic case, and that's exactly what ToArray() is doing. It may well make iteration faster, but are you also capturing the performance impact of the garbage collection that it's causing at some unknown time in the future?

    You're right that it's highly situational, but you don't need to do benchmarks in most cases to get a solid idea of which is best. Posters here have already mentioned learning about data structures and the pros and cons of each. This is relatively foundational knowledge that any good programmer should pick up relatively early on.

    In short, different data structures have strengths and weaknesses which lend themselves to different situations really well. Understanding how each structure stores data in memory, how items are added / removed / accessed, and how well (or poorly) that matches up with your usage pattern will obliterate your need to benchmark this stuff in the vast majority of cases.*

    Fixed number of items and you need it iterate the whole collection regularly? Array is a good fit.

    Same deal but you need occasional insertion and removal? List is the way to go. (Not a "linked list", but a managed array.)

    Random access speed is more important? Dictionary or hashset are much faster there because you don't need to iterate the whole collection to find what you want.

    On that note, you can also do things like have a sorted array / list to get both iteration speed and faster lookups. And there are other collection types too, depending on your need.

    * Also note that the majority of cases aren't performance critical code, so "good enough" is fine without obsessing over the minutia. Otherwise we'd always spend the extra time to custom make the perfect structure for every case, and frameworks wouldn't need to come with built-in ones.
     
    Ryiah and EternalAmbiguity like this.
  24. Owen-Reynolds

    Owen-Reynolds

    Joined:
    Feb 15, 2012
    Posts:
    1,998
    If you run the numbers, it makes a pretty good case why not to do this sort of thing. Your best case was a 20% speed-up by converting a list into an array. But that's for a loop doing 1 thing. A real loop would have several statements, which take just as long either way. So speed-up on a real loop might be 5%. Let's say the loops are 65% of the time each frame (after all, we've got rendering, animation, lots of non-loop code... .) So we're at 3%. But that's our maximum speed-up if we change all 100 loops/lists That's going to take a while.

    Meanwhile, maybe we could combine loops. Or some of our logic could be rewritten to go twice as fast. That's more work, but a better payoff if we can do it for only 20 of those 100. We might realize we can do those loops every 5 frames. That's a 50% speed-up (cutting the 65% down to 1/5th -- 13% then re-adding the non-loop 35%), with almost no work, which we wouldn't think of if we were busy making 100 things barely faster. Meanwhile, meanwhile, our new design doc means we now need to add/delete from those lists which are now arrays. We can write and test those array functions, or just go back to lists.
     
  25. EternalAmbiguity

    EternalAmbiguity

    Joined:
    Dec 27, 2014
    Posts:
    3,144
    Hang on, insertion or addition?

    Edit: a better question - do you mean creating a new index or putting something at an already existing index?
     
    Last edited: Jul 3, 2020
  26. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,620
    Either way I usually wouldn't care, because they're "occasional" but I need to "iterate the whole collection regularly". The exceptions to that are if the insertions / additions need to happen at especially critical times, and in that case I'm going to be looking at the whole algorithm, not just what data structure I'm using.
     
    EternalAmbiguity likes this.
  27. Maccyfin

    Maccyfin

    Joined:
    Aug 19, 2012
    Posts:
    87
  28. Neonlyte

    Neonlyte

    Joined:
    Oct 17, 2013
    Posts:
    516
    I think the thread title should be rephrased to: "List vs Array - which one is faster when being iterated using foreach/enumerator"
     
  29. Maccyfin

    Maccyfin

    Joined:
    Aug 19, 2012
    Posts:
    87
    Good call, changed :)