Vector3 and other structs optimization of operators

Discussion in 'Scripting' started by Aka_ToolBuddy, Jun 17, 2017.

  1. Aka_ToolBuddy

    Aka_ToolBuddy

    Joined:
    Feb 25, 2014
    Posts:
    121
    Hi,
    While working on some real-time mesh generation code that performed hundreds of thousands of Vector3 operations per frame, I was surprised to find that the operators (*, +, ...) of Vector3 (and of other Unity structs) can be easily and massively optimized.

    The current implementation of the * operator in Vector3 is:
    Code (CSharp):
    public static Vector3 operator *(Vector3 a, float d)
    {
        return new Vector3(a.x * d, a.y * d, a.z * d);
    }
    The optimized implementation I suggest is:
    Code (CSharp):
    public static Vector3 operator *(Vector3 a, float d)
    {
        Vector3 result;
        result.x = a.x * d;
        result.y = a.y * d;
        result.z = a.z * d;
        return result;
    }
    I ran a simple comparison test, which you can find here: https://dropb.in/ponda.nimrod. The result was:
    When run 50000 times, Unity's current operator took 18.9 ms to execute, while the optimized one took 2.5 ms.
    The reason for this difference is that the optimized version avoids an unnecessary call to the Vector3 constructor.
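
    A comparison of this kind can be sketched roughly as follows. This is not the linked test script: Vec3, MulCtor, MulFields and OperatorBenchmark are hypothetical names, and a stand-in struct is used so that the code compiles outside Unity. Timings depend heavily on the runtime, the JIT and the build settings, so the sketch implies no particular numbers.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for UnityEngine.Vector3 so the sketch compiles outside Unity.
public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Constructor-based multiply, mirroring the current operator quoted above.
    public static Vec3 MulCtor(Vec3 a, float d)
    {
        return new Vec3(a.x * d, a.y * d, a.z * d);
    }

    // Field-assignment multiply, mirroring the suggested operator.
    public static Vec3 MulFields(Vec3 a, float d)
    {
        Vec3 result;
        result.x = a.x * d;
        result.y = a.y * d;
        result.z = a.z * d;
        return result;
    }
}

public static class OperatorBenchmark
{
    // Times `iterations` calls of each variant and reports elapsed milliseconds.
    public static void Run(int iterations, out double ctorMs, out double fieldsMs)
    {
        Vec3 sink = new Vec3(1f, 2f, 3f); // results are fed back in so the loops do real work

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink = Vec3.MulCtor(sink, 1.0f);
        sw.Stop();
        ctorMs = sw.Elapsed.TotalMilliseconds;

        sw.Reset();
        sw.Start();
        for (int i = 0; i < iterations; i++)
            sink = Vec3.MulFields(sink, 1.0f);
        sw.Stop();
        fieldsMs = sw.Elapsed.TotalMilliseconds;

        // Print the accumulated value so it is observably used.
        Console.WriteLine("sink = ({0}, {1}, {2})", sink.x, sink.y, sink.z);
    }
}
```

    Calling OperatorBenchmark.Run(50000, out ctorMs, out fieldsMs) from a Main method or a MonoBehaviour would give the two timings. It is worth running it several times and in a non-development build, since the first pass includes JIT warm-up.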

    I opened a suggestion on Unity's feedback site, so please support it by voting for it, so that we can see this optimization integrated into Unity some day:
    https://feedback.unity3d.com/suggestions/vector3-and-other-structs-optimization-of-operators

    Thanks and have a nice day.
     
    Last edited: Jun 19, 2017
    pixelPhil, DaDonik, zxkne and 2 others like this.
  2. ThomasTrenkwalder

    ThomasTrenkwalder

    Joined:
    Jun 18, 2017
    Posts:
    10
    Interesting, I never thought about checking the performance of Unity's math library.
    I just tried out the operator you mentioned, and I do indeed get better performance when I roll my own struct and implement the operator as you suggest.
    Unity's Vector3 seems to take 30% more time for me, both inside the editor and inside a build (on 5.6.1f1).
     
    Aka_ToolBuddy likes this.
  3. Aka_ToolBuddy

    Thanks a lot, ThomasTrenkwalder, for your tests and for voting for my suggestion. I hope it will make the Unity team consider it.
     
  4. CrystalConflux

    CrystalConflux

    Joined:
    May 25, 2017
    Posts:
    99
    Interesting. Considering that the current implementation is less verbose, and all that the constructor does is assign the corresponding fields, I wonder why the C# compiler doesn't optimize this?

    Have you tested this in standalone release mode? Maybe it only affects debug mode?

    When you tested it in standalone did you disable development build/script debugging?
     
    Last edited: Jun 18, 2017
  5. Aka_ToolBuddy

    I wonder the same thing. It seems like something the compiler could handle, but I suppose things are more complicated than I imagine.

    I confirm the optimization works under those conditions as well. I used a heavier version of the script above and measured the FPS with Fraps (to rule out any possible interference from Unity's profiler). Here are the results:
    - Optimized version: 53 FPS
    - Unoptimized version: 42 FPS
     
  6. Rick-Gamez

    Rick-Gamez

    Joined:
    Mar 23, 2015
    Posts:
    218
    Wow, I didn't realize that running the constructor in this case would make that big of a difference. (I'm self-taught, BTW.) Thanks for this insight, I will keep it in mind when developing my stuff. Thanks for the info!
     
  7. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    6,646
    Yep, a constructor function is just that... a function.

    So it allocates a stack frame to call it.

    If you don't call the constructor though, it just allocates the memory needed for the struct with empty values.

    This is why structs don't allow field initializers: they MUST start with empty values. Classes, on the other hand, always have a constructor phase, so they don't have this restriction.
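
    The two ways of getting a struct value without calling a user-defined constructor can be sketched like this. Point3 and StructInitDemo are hypothetical names used only for illustration, and the sketch shows what the language guarantees, not how any particular runtime lays out the stack.

```csharp
using System;

// Hypothetical struct used only for illustration; it declares no constructor.
public struct Point3
{
    public float x, y, z;
}

public static class StructInitDemo
{
    // default(T) on a struct yields a value with every field zeroed.
    public static Point3 Zeroed()
    {
        return default(Point3);
    }

    // A struct local can be declared without `new` and used once every field is assigned.
    public static Point3 FieldByField(float x, float y, float z)
    {
        Point3 p;
        p.x = x;
        p.y = y;
        p.z = z;
        return p; // compiles only because all three fields are definitely assigned
    }
}
```

    Removing any one of the three assignments in FieldByField makes the C# compiler reject the return statement, because the local would not be fully assigned.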

    ...

    I find this a minor optimization, probably a leftover from early Unity. I bet it came about because the Unity devs were all C++ programmers first and foremost, and so didn't really consider the inner workings of the Mono CLR. But it is an area of optimization that could potentially give a little oomph, since vector construction is very common.
     
    Rick-Gamez likes this.
  8. Rick-Gamez

    Yeah, I knew that the constructor is basically a glorified method, but that sheds some light on how C# allocates its stack frames, so thank you for that info!
     
  9. ThomasTrenkwalder

    Yup, no development build here. I measured the times using the .NET Stopwatch class.
    One would think that the compilers (either the C# one or the JIT) should be able to inline this constructor call, but apparently they just don't.

    Considering that working with Vector3s and other math structs is quite common in many games, optimizing these operators would provide a nice benefit, and it doesn't even look like a lot of work ^^
     
  10. Aka_ToolBuddy

    I completely agree. I think that implementing these optimizations could be done in less than a man-day.
    You are welcome :) And please consider voting for the suggestion to hopefully make Unity's team implement it.
    https://feedback.unity3d.com/suggestions/vector3-and-other-structs-optimization-of-operators
     
  11. Aka_ToolBuddy

    For those who are interested, here are the IL instructions for the optimized Vector3 multiplication:

    Code (CSharp):
    .method public hidebysig static
        valuetype [UnityEngine]UnityEngine.Vector3 Optimized_Multiplication (
            valuetype [UnityEngine]UnityEngine.Vector3 a,
            float32 d
        ) cil managed
    {
        // Method begins at RVA 0x20e0
        // Code size 58 (0x3a)
        .maxstack 3
        .locals init (
            [0] valuetype [UnityEngine]UnityEngine.Vector3,
            [1] valuetype [UnityEngine]UnityEngine.Vector3
        )

        IL_0000: nop
        IL_0001: ldloca.s 0
        IL_0003: ldarga.s a
        IL_0005: ldfld float32 [UnityEngine]UnityEngine.Vector3::x
        IL_000a: ldarg.1
        IL_000b: mul
        IL_000c: stfld float32 [UnityEngine]UnityEngine.Vector3::x
        IL_0011: ldloca.s 0
        IL_0013: ldarga.s a
        IL_0015: ldfld float32 [UnityEngine]UnityEngine.Vector3::y
        IL_001a: ldarg.1
        IL_001b: mul
        IL_001c: stfld float32 [UnityEngine]UnityEngine.Vector3::y
        IL_0021: ldloca.s 0
        IL_0023: ldarga.s a
        IL_0025: ldfld float32 [UnityEngine]UnityEngine.Vector3::z
        IL_002a: ldarg.1
        IL_002b: mul
        IL_002c: stfld float32 [UnityEngine]UnityEngine.Vector3::z
        IL_0031: ldloc.0
        IL_0032: stloc.1
        IL_0033: br IL_0038

        IL_0038: ldloc.1
        IL_0039: ret
    } // end of method test::Optimized_Multiplication
    And here are those for the unoptimized one:

    Code (CSharp):
    .method public hidebysig specialname static
        valuetype UnityEngine.Vector3 op_Multiply (
            valuetype UnityEngine.Vector3 a,
            float32 d
        ) cil managed
    {
        // Method begins at RVA 0xb5b8
        // Code size 41 (0x29)
        .maxstack 4
        .locals init (
            [0] valuetype UnityEngine.Vector3
        )

        IL_0000: nop
        IL_0001: ldarga.s a
        IL_0003: ldfld float32 UnityEngine.Vector3::x
        IL_0008: ldarg.1
        IL_0009: mul
        IL_000a: ldarga.s a
        IL_000c: ldfld float32 UnityEngine.Vector3::y
        IL_0011: ldarg.1
        IL_0012: mul
        IL_0013: ldarga.s a
        IL_0015: ldfld float32 UnityEngine.Vector3::z
        IL_001a: ldarg.1
        IL_001b: mul
        IL_001c: newobj instance void UnityEngine.Vector3::.ctor(float32, float32, float32)
        IL_0021: stloc.0
        IL_0022: br IL_0027

        IL_0027: ldloc.0
        IL_0028: ret
    } // end of method Vector3::op_Multiply
     
    CrystalConflux and Rick-Gamez like this.
  12. Invertex

    Invertex

    Joined:
    Nov 7, 2013
    Posts:
    895
    Did a test because I was curious if the same issue would happen with the object initializer {} feature.

    Code (CSharp):
    public static Vector3 GetSomeVector3()
    {
        Vector3 vec;
        vec.x = 3.4f; vec.y = 2.3f; vec.z = 55.5f;
        return vec;
    }

    IL_0000 nop
    IL_0001 ldloca.s  vec
    IL_0003 ldc.r4    3.4
    IL_0008 stfld     System.Single UnityEngine.Vector3::x
    IL_000D ldloca.s  vec
    IL_000F ldc.r4    2.3
    IL_0014 stfld     System.Single UnityEngine.Vector3::y
    IL_0019 ldloca.s  vec
    IL_001B ldc.r4    55.5
    IL_0020 stfld     System.Single UnityEngine.Vector3::z
    IL_0025 ldloc.0
    IL_0026 stloc.1
    IL_0027 br.s      IL_0029
    IL_0029 ldloc.1
    IL_002A ret

    public static Vector3 SomeNewVector3()
    {
        return new Vector3 {x = 3.4f, y = 2.3f, z = 55.5f };
    }

    IL_0000 nop
    IL_0001 ldloca.s  V_0 //Extra Instruction
    IL_0003 initobj   UnityEngine.Vector3 //Extra Instruction
    IL_0009 ldloca.s  V_0
    IL_000B ldc.r4    3.4
    IL_0010 stfld     System.Single UnityEngine.Vector3::x
    IL_0015 ldloca.s  V_0
    IL_0017 ldc.r4    2.3
    IL_001C stfld     System.Single UnityEngine.Vector3::y
    IL_0021 ldloca.s  V_0
    IL_0023 ldc.r4    55.5
    IL_0028 stfld     System.Single UnityEngine.Vector3::z
    IL_002D ldloc.0
    IL_002E stloc.1
    IL_002F br.s      IL_0031
    IL_0031 ldloc.1
    IL_0032 ret

    The object initializer version also avoids the call to the constructor, but it still has two extra instructions. The important one is initobj, which does a bit of extra work by initializing all the fields of the struct to zero or null. So while that should still be a lot better than the call to the constructor, the local declaration and assignment still wins out.

    I'm really surprised the CLR doesn't optimize this initobj call out if it detects you're assigning to every value in the struct.
     
    bobisgod234 and Peter77 like this.
  13. Aka_ToolBuddy

    Thanks for that extra information. I didn't think to test that as well.
     
  14. TJHeuvel-net

    TJHeuvel-net

    Joined:
    Jul 31, 2012
    Posts:
    411
  15. Doug_B

    Doug_B

    Joined:
    Jun 4, 2017
    Posts:
    1,585
    I linked to it over on this other thread earlier on. Vote count has gone from 152 to 165 in three hours.

    I wonder why they cannot just fix the aforementioned request rather than create a whole new library that you have to know about, get and integrate? I appreciate that a release of Unity (which is presumably what would be required) is no small matter. However, this seems to be such a fundamental part of a 3D platform that one can reasonably expect an efficient implementation.

    But then maybe I am simply missing something here. :)
     
  16. Invertex

    You are missing something :p
    That mathematics library isn't the "solution" to this tiny little problem here; it's completely unrelated to it. It is designed to help ensure highly efficient compilation of your complex vector/matrix/etc. math in general, helping it be tightly packed and memory efficient in the Burst compiler.
    That mathematics library will be integrated into Unity... It's just that it's quite beta right now, so people who want to mess with it can do so through the repository and also help find bugs or contribute improvements (at some point, potentially).
     
  17. Doug_B

    Ah, ok. I've got my wires crossed. That means my vote for improved struct performance may not have been wasted then - assuming that ever gets looked at. :)
     
  18. Peter77

    Peter77

    Joined:
    Jun 12, 2013
    Posts:
    4,010
    I rewrote the IL of some of Unity's DLLs and measured the performance of a few applications. My conclusion was that Unity Technologies could achieve considerable performance improvements with very little work and only trivial changes, without changing anything in user code.

    Yes, they do provide a new math lib, but to make use of it you need to change your project. This probably gives better performance, but it might not be a trivial change. If Unity instead just changed some simple code in their Vector structs, every existing Unity project would benefit from those changes automagically.

    Here are my findings:
    https://forum.unity.com/threads/wip...faster-without-any-changes-in-seconds.531169/
     
  19. Doug_B

    Interesting video. Thumbs up from me. :)
     
    Peter77 likes this.
  20. Aka_ToolBuddy

    The new Unity mathematics library definitely has its benefits, and they are greater than those of the optimizations this forum thread is about. But using that library means you have to modify or rewrite parts of your code. The Vector3 (and similar) optimization works with zero modifications to your code.

    What kills me the most is knowing that this optimization should hardly take Unity's developers more than a man-day to implement, which is peanuts given the performance increase it brings (Peter77 spoke here about a 4% increase in his game). Since people at Unity are aware of this optimization (through the suggestion ticket and through me writing to them), the most probable explanation I see is that the internal organization of the company has become so complicated that making such simple, useful modifications has become a daunting task.
     
  21. Aka_ToolBuddy

    Wow, that's some great tooling there. Thanks a lot Peter for making this, and pushing the idea beyond where I stopped.
     
    Last edited: Jun 5, 2018
    Doug_B and Peter77 like this.
  22. Aka_ToolBuddy

    Thanks a lot for spreading the word.
     
  23. Aka_ToolBuddy

    Hi again,
    I implemented the optimizations in a free asset. Here is the link: https://www.assetstore.unity3d.com/#!/content/120660?aid=1101l3N9P
    It is easy to use: just import it and rebuild your game.
    To be fully aware of its limitations, please read the asset description.
    Thanks for your interest, everyone, and have a nice day.
     
    Joe-Censored, recursive and Doug_B like this.