Search Unity

Alternative way to Nested Loops

Discussion in 'Scripting' started by JotaRata, Nov 21, 2019.

  1. JotaRata

    JotaRata

    Joined:
    Dec 8, 2014
    Posts:
    61
    Hi everyone,
    I wanted to share a trick I just discovered as an alternative way to use loops when iterating over a 2D array.

    So let's say you have a 2D Array Initializated like this:
    Code (CSharp):
    1. public void CreateMatrix(int size){
    2.     float [,] arr = new float[size, size];
    3.     }
    Let's say you want to initialize it with every number set to one, the usual approach should be like:
    Code (CSharp):
    1. public void CreateMatrix(int size){
    2.     float [,] arr = new float[size, size];
    3.  
    4.     for (int i = 0; i < size; i++){
    5.            for (int j = 0; j < size; j++){
    6.                  arr[i,j] = 1;
    7.             }
    8.        }
    9.     }
    Instead of using two for-loops to get the element of an array, you could simply use one loop which is much faster than nested loops.
    Code (CSharp):
    1. public void CreateMatrix(int size){
    2.     float [,] arr = new float[size, size];
    3.     for (int i = 0; i < size * size; i++){
    4.           int row = i / size;          // Note that this is integer division..
    5.           int col = i - row * size;
    6.           arr[row, col] = 1;
    7.     }
    Note that there is an integer division at line 4, later the variable assigned (row) is multiplied with the size parameter, thus they dont cancel each other.

    This approach also works for rectangular Arrays, in that case the code would be like:
    Code (CSharp):
    1. public void CreateMatrix(int rows, int columns){
    2.     float [,] arr = new float[rows, columns];
    3.      for (int i = 0; i < rows * columns; i ++){
    4.             int row = i / columns;
    5.             int column = i - row * columns;
    6.             arr [row, column] = 1;
    7.      }
    8.     }
    Note that this approach is running through columns first and rows later like the first example of this diagram (from wikpedia):
    upload_2019-11-21_0-24-40.png

    Example: Identity Matrix Initilization:
    Code (CSharp):
    1. public float[,] Identity(int size){
    2.         float[,] arr = new float[size, size];
    3.         for (int i = 0; i < size * size; i++){
    4.               int row = i / size;
    5.               int column = i - row * size;
    6.               if (row == column){
    7.                      arr[row, column] = 1;
    8.               }else{
    9.                      arr[row, column] = 0;
    10.               }
    11.        return arr;
    12. }
    EDIT: I changed the title because I've just realized that this methos is not efficient for large arrays (1000 x 1000), in that case ested loops are a better option. This method works well with small arrays though.
     
    Last edited: Nov 21, 2019
  2. dgoyette

    dgoyette

    Joined:
    Jul 1, 2016
    Posts:
    4,195
    I'm not sure I understand what would be faster about the approach you've described. With the nested loops, we're looping through N^2 times. With your approach, it's still looping through N^2 times, but then it's also performing an extra multiplication and a division for each of those N^2 passes through the loop. Am I missing something? This seems like more work, as well as being significantly more complicated for someone to look at and understand what it's doing.
     
  3. exiguous

    exiguous

    Joined:
    Nov 21, 2010
    Posts:
    1,749
    I also don't think this is faster. Have you actually profiled it? If the array really has a need for speed (pun intended) I would go with a 1d array and map the index like: index = y * size + x. This should be as fast as you can get and is way less complicated.
     
    SisusCo and JotaRata like this.
  4. JotaRata

    JotaRata

    Joined:
    Dec 8, 2014
    Posts:
    61
    The thing is that in nested loops, for each element in the first loop you make another loop to compute a simple thing, I think this is slower than just creating a single loop to make the same result. Also the integer dovision is also faster than regular float division
     
  5. JotaRata

    JotaRata

    Joined:
    Dec 8, 2014
    Posts:
    61
    I did a profile on this, and realized that this method works great with small arrays (4x4) or (8x8)
    Array size 8:
    Average time Single-loop(ms): 1.2
    Average time Nested-loop(ms): 1.5

    for large arrays the nested loop method is more efficient:
    Array size 1000:
    Average time Single-loop(ms):57.9
    Average time Nested-loop(ms): 21.7

    I changed the title, and removed the tags. This is not an optimization :(
     
    Last edited: Nov 21, 2019
    palex-nx and eisenpony like this.
  6. dgoyette

    dgoyette

    Joined:
    Jul 1, 2016
    Posts:
    4,195
    For your smaller test case, with an array size of 8, that seems like an enormous amount of time, for either method. I would have expected that to be closer to 0.01 ms, not 1+ ms. Are you just basing that on Unity's profiler telling you how long the method took to run? Or were you testing this in some other way, like capturing DateTime.Now before and after running the method? If it's more like the later, you should run the method several times, throwing away the best and worst samples, to get a better sense of the performance. I've often seen that the first time you call DateTime.Now, for example, it really slows things down for a short time, throwing off performance testing.

    Anyway, overall, I'm not surprised by the big picture results. Thanks for profiling it, though. I wouldn't be surprised if nested loops, or loops in general, were just some just some terribly optimized thing in .NET, and that it would be pretty tough to beat their performance overall.
     
    JotaRata likes this.
  7. eisenpony

    eisenpony

    Joined:
    May 8, 2015
    Posts:
    974
    I've read that in c, multidimensional arrays are implemented as 1 dimensional arrays and the lookups inline the same formula mentioned here.

    In C#, that doesn't seem to be the case. 1 dimensional arrays are created in IL with a newarr command while multidimensional arrays are newobj. I won't speculate on the implications except that 1 dimensional arrays seem to get some special treatment and are probably faster to create and perform look-ups.

    Theoretically, multidimensional arrays could run into a problem if you run out of registers to hold the loop variables.

    Of course, you're not supposed to think about things to this level since they are considered implementation details of the language abstractions. Add to this the il2cpp option in Unity, and I wouldn't even consider guessing at which will be most performant.

    Better idea to make your game first, then profile if performance is a problem. Most likely you will find something else to change.
     
  8. exiguous

    exiguous

    Joined:
    Nov 21, 2010
    Posts:
    1,749
    Instead of DateTime.Now maybe use System.Diagnostics.Stopwatch for incode profiling. This could also be used in build and thus should be preferrable for "real" measurements since the editor has its own issues and is not too reliable for such considerations.

    Anyway. I think there are 2 points to "learn" here.
    1. First make it run. Then make it fast (if necessary).
    2. There are happening much things under the hood and its hard to "guess" what will be faster. So if there are speed issues you HAVE to profile.

     
  9. JotaRata

    JotaRata

    Joined:
    Dec 8, 2014
    Posts:
    61
    I was using System.Diagnostics.Stopwatch for the tests, but it was profiling inside my game so I guess there was an extra work doing the game stuff, but matrices of size 4 took 0 ms to initialize with both methods.

    I used Stopwatches indeed. I will do additional profiles to this but the overall picture shows that the nested loops method is better ( especially with large amounts of data ). But as @eisenpony said maybe c# does a similar job when trying to lookup elements of 2D arrays, so my code shown is redundant.

    I apologize for not profiling before posting this, mostly because I thought that in theory it should work better than the usual methods.
     
    Joe-Censored likes this.
  10. eisenpony

    eisenpony

    Joined:
    May 8, 2015
    Posts:
    974
    There's nothing wrong with doing the tests in your game. I find Performance tests done in a laboratory are good for just that: optimizing a laboratory. If you want your optimizations to be useful, context is king.

    Stopwatch is the right tool for this kind of low level profiling, but you also need to run batches and do a bit of statistical analysis. It's more work than it's worth most of the time. The profiler is usually a more tangible way to improve your games performance.

    No need to apologize. This forum is more of a discussion group than a reference guide. They're, thankfully, not trying to be Stack Overflow. I for one appreciate that you followed up after your theory was questioned.
     
    Joe-Censored and JotaRata like this.
  11. Antistone

    Antistone

    Joined:
    Feb 22, 2014
    Posts:
    2,836
    Since the nested loops involve N^2 + N conditional jumps and the single loop involves only N^2 conditional jumps, it is vaguely plausible that the single loop could be faster for small arrays but slower for large arrays (because those N extra jumps become a smaller percentage of (N^2 + N) as N gets bigger). Particularly because those specific N extra jumps are probably the ones where branch prediction will fail.

    If optimizing this loop is important for some reason, you could try using loop unrolling. But if you're just doing this to initialize the array, it's unlikely to be performance-critical (because initialization usually happens less frequently than other operations).
     
    JotaRata likes this.
  12. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    38,689
    It's a hard impulse to reject, but never optimize speculatively. I have seen more damage to otherwise good codebases done by speculative optimization than pretty much any other reason, in total.

    In the specific example above, if I see you initializing a 2D array, I know instantly and intuitively what's happening, and every member of my team could see that instantly too.

    If I see a single linear loop and a bunch of modulo/addition operators, I don't see it instantly. I have to track down "Is there a bug in this math?" "What does this math even do?" etc.

    Write your code to express what you are doing, and only when the profiler tells you something is slow, ONLY then consider optimizing.
     
  13. Dameon_

    Dameon_

    Joined:
    Apr 11, 2014
    Posts:
    542
    Stopwatches aren't a very good way to profile, either. For best results, use the actual profiler you have built into the Unity editor. It's easy to use Profiler.BeginSample("SomeSampleName") and Profiler.EndSample() to get very precise performance information on a piece of code. Then just filter the profiler by SomeSampleName to easily see those results.

    This is definitely not an optimization. Creating "more loops" doesn't cause a performance hit. Simply put, if you have two arrays with 20 items each, you're going over 40 items total, no matter how you cut it. This is basic time complexity. This is why premature optimization is bad, especially if you don't understand time complexity. Rather than looking for miniscule ways to optimize your code, use the profiler in Unity to identify bottlenecks and solve your actual performance problems.
     
  14. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    Its a neat little trick. Although your application isn't the best example of how to use it to get performance.

    In general a large a 2D array is slightly less performant then a 1D array. So you can use this math trick is to access data in a 1D array as if it were a 2D array. To outside consumers it looks as if you are handling 2D data, which is convenient to code to, when in reality you are using a 1D array which is marginally more performant.

    Unity uses this trick in a few places. Texture.GetPixels() returns a 1D array which represents 2D data.

    You've got to be using the array a lot though to make it worth the effort. I'm talking multiple operations on every element of the array every frame. And if you are chasing that much performance, you might get more bang for your buck out of switching to DOTS or pushing the whole thing to a shader and letting the graphics card do the heavy work.
     
    JotaRata likes this.
  15. palex-nx

    palex-nx

    Joined:
    Jul 23, 2018
    Posts:
    1,748
    Did you made a warmup run? If you don't call a profiled function at least once before measurements, then you just profiling JIT and not your code.
    Also note, what using arrays of arrays is much faster because of crappy multidimensional array implementation in .NET.
    Another thing here is IL2CPP so you might get completely different results when measuring player vs editor.
     
    JotaRata likes this.
  16. Chris-Trueman

    Chris-Trueman

    Joined:
    Oct 10, 2014
    Posts:
    1,261
    The reason for the increase in performance in the large nested loop is that calculation needed to get the indexes. Division and modulus are VERY expensive operations, when being performed in a tight loop they will eat CPU cycles. Division can take up to 21 cycles to complete, modulus even more as it is division with an extra step(s). Multiplication is a max of 5 from what I remember. Simple array look ups are probably a 2 or 3 at best. The addition is minimal as well.

    I may be wrong on the numbers, but not on the fact that they are costly and I avoid division where I can, especially in loops. Need half of an int or float multiple by 0.5 its faster. This is a micro optimization though, only change it if it's a problem. Like in the test you performed, the nested loop took roughly a third the time to complete.
     
    JotaRata likes this.
  17. JotaRata

    JotaRata

    Joined:
    Dec 8, 2014
    Posts:
    61
    I haven't thought of that, loop unrolling could be the key here, just I will have to be careful computing too many operations in a single interation. Thanks for your comment.

    In python there is a module called numPy which has a function called numpy.ravel which can flatten multidimensional arrays into a 1D list, I will investigate how to implement that in C# and use your tip to optimize my code. thanks for that!

    No I didn't :( maybe I was profiling the wrong thing but overall it seems that my method isn't effective at all, I will try arrays of arrays which I read that are faster to iterate.

    I don't know if that's the issue, I've read that integer division shouldn't be a problem at all, but I will compare and see later.



    I have used the profiler before and sometimes it lags the whole editor giving me false readings..

    Thanks for your response, I will look forward to this and give a definitive optimization to this kind of loops. Again thanks for writing this and let me know that the Unity Community is always there to listen and help.

    Thanks you all for your comments, I will post here when I find a better method that I want to share with you.
     
    dgoyette likes this.
  18. Dameon_

    Dameon_

    Joined:
    Apr 11, 2014
    Posts:
    542
    Just run your test on multiple frames, and choose one where the editor hasn't lagged.
     
  19. Joe-Censored

    Joe-Censored

    Joined:
    Mar 26, 2013
    Posts:
    11,847
    Profile a build instead of in editor, and if necessary run the build on a separate computer from the editor running the profiler.
     
  20. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    The only reason I use modulo and divide is because nested loop are ugly and hard to read, too much clutter too much useless bracket