Search Unity

  1. Welcome to the Unity Forums! Please take the time to read our Code of Conduct to familiarize yourself with the forum rules and how to post constructively.
  2. Dismiss Notice

Unity's compiler does not optimize condition scopes based on internal variables well

Discussion in 'Editor & General Support' started by kiriri, Dec 5, 2014.

  1. kiriri

    kiriri

    Joined:
    Jan 14, 2011
    Posts:
    107
    Abstract : For loops using internal Unity Objects like mesh etc are not optimized by the compiler

    Hi,
    I just found a major bottleneck in some code of mine that I really didn't expect. Have a look at this simple benchmark code :

    Code (CSharp):
    1.  
    2.         Stopwatch s = new Stopwatch();
    3.         s.Start();
    4.         for (i = 0; i < 15145 * 3300; i++) // CASE 1
    5.         {
    6.             var o = 0;
    7.         }
    8.         Debug.Log(s.ElapsedMilliseconds);
    9.         s.Reset();
    10.         s.Start();
    11.         for (i = 0; i < rend.sharedMesh.vertexCount * 3300; i++) // CASE 2
    12.         {
    13.             int o = 0;
    14.         }
    15.         Debug.Log(s.ElapsedMilliseconds);
    16.         s.Reset();
    17.         s.Start();
    18.         var count = rend.sharedMesh.vertexCount;
    19.         for (i = 0; i < count * 3300; i++) // CASE 3
    20.         {
    21.             int o = 0;
    22.         }
    23.         Debug.Log(s.ElapsedMilliseconds);
    I have a mesh that contains 15145 vertices. In case 1 I entered the vertex count directly into the for condition.
    In case 2 I used rend.sharedMesh.vertexCount instead, which should be the same.
    In case 3 I created the new variable count and applied rend.sharedMesh.vertexCount's value to it.

    Now here are the results :
    Case1 120 ms
    Case2 4949 ms
    Case3 119 ms

    It's obvious what happened. Using rend.sharedMesh.vertexCount requires Marshalling to get the mesh information, which is managed by c++ code, into c#. However, doing that just once has no real impact on performance as you can see in case 3 .
    Therefore it stands to reason that rend.sharedMesh.vertexCount is being fetched each loop, probably because the compiler can't decide whether any of the actions within the loop affected it.

    Which is the whole point of this post. Unity uses c++ for pretty much everything and most of us users have not the slightest clue on how it works internally. In this environment, is it too much to expect a compiler that can inspect c# code as well as c++ code and that can determine whether the conditional values remain constant or not? Or at least if any c++ calls whatsoever are being made?

    I'm posting this here because I'd really like some professional opinions. Should I pass this on as a bug? Would people even like a fix that could possibly increase compile times?
     
  2. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,500
    Why should it be the same? The value is the same, but the expression is not, because a) you're accessing it via two levels of indirection (one of which is via marshalling, as you point out) and b) there's no way the compiler can know it's not going to be changed within the loop.

    Edit: I want to expand on that last one a bit. It could inspect the code and confirm that the value is changed in the loop, but I don't think it can inspect it and guarantee that it isn't changed in the loop. Consider that the loop might fire an event, which could be picked up elsewhere (eg: not even in your code), which could in turn do something that modifies the data you're iterating over.

    If speed is important and you know that the value isn't going to change, your case 3 is the way to go because it removes issues a) and b). You can do that fairly easily as a human, where it's pretty darn difficult for a computer.
     
    Last edited: Dec 5, 2014
  3. kiriri

    kiriri

    Joined:
    Jan 14, 2011
    Posts:
    107
    Now I can't say I understand much of compilers, but if this was all c# then wouldn't the compiler probably have replaced case2 with something like case3, where a getter needs only be called once instead of each time the condition is checked. That's what I learned at least.
    Instead it stops optimizing as soon as it hits anything external. However, in this case the for loop does not contain anything related to external calls at all. Therefore it should be impossible that the vertexCount, being external/marshalled, changes. Ergo case3 should be just as valid in this case.
    Seems like a rather simple thing to check for a compiler. And now that I know of this issue I will most certainly always use case3, but what about all the other users who may have never heard of it?

    EDIT: I do see your point though. Implementing something like this would require loads and loads of special cases, eg functions like SendMessage or reflection would all need to remarshall the vertex count again and again for each loop.
    All in all it's probably not worth the effort anyways, since only very few people here are into microoptimization and they should know better :D
     
    Last edited: Dec 5, 2014
  4. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,500
    But it does relate to external calls. You said so yourself - the data you're referring to is marshaled from the native side of the engine. The Mono compiler has literally no idea what's going on over there*. It has no way to check, because whatever is happening on the other side of that call isn't a part of your code. The compiler can't see it.

    It's super easy for you, as a human, to know that "I'm iterating over the verts in a mesh which I am not modifying". All the compiler knows, however, is that you're getting some value from a place it can't see and using it in a comparison. Since it has no idea what might be happening to that value it can't make decisions about how to optimize it for you.

    This is why I think it's important for programmers to know "what they're telling the computer to do" in addition to "how to write an algorithm in code form". Cases 1 and 3 tell the compiler to use a locally specified number. Case 2 tells the compiler to follow two references to get a number so, in the absence of information that would allow it to determine a superior equivalent, that's exactly what it's doing.

    Edit: * Though it is only reading a variable, which should be safe enough in and of itself. I don't know enough about the managed/native crossover to know where complications could arise, though. For instance, depending on how the native stuff is wrapped the member we're accessing could be something like a property in C#, which could have side effects. But I've never looked into that.
     
    Last edited: Dec 5, 2014
  5. kiriri

    kiriri

    Joined:
    Jan 14, 2011
    Posts:
    107
    You are correct. Simplest scenario, vertexCount could be a getter that increments vertexCount itself, after returning it, like
    return vertexCount++;

    Which means a full c++ compiler needs to be implemented. Considering how long for example UE4 takes to build, this would bloat compile times in the Unity editor out of proportions.
    I guess that's it then. No choice but to use my own head . Thank you for the discussion though, I really appreciated it :)

    Cheers,
    kiriri​
     
  6. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,500
    A C++ compiler wouldn't help, though. Being able to compile a language doesn't mean you can read existing binaries, which is actually what we're talking about here. Unity doesn't recompile the engine when you change a script, it just recompiles and reloads the DLLs that have stuff we've made (though I assume you know that).
     
  7. Dustin-Horne

    Dustin-Horne

    Joined:
    Apr 4, 2013
    Posts:
    4,568
    It's interesting... in this case it's pretty easy for the compiler to do static analysis on "count" and know that it will never be changed. It's a value type that's being created in scope and used. Now, it would be interesting if you set count, then passed it into another method by ref and that method used a separate thread to modify the value of count... I wonder if that would cause a performance hit (aside from the obvious threading) like directly grabbing vertexCount does.

    At any rate, you have a couple of issues, which penguin pretty much touched on:

    The compiler has no idea when vertexCount may change. My guess is that the vertexCount property probably isn't even inlined... it's likely being marshalled, but also likely that the getter is making a method call which reduces additional optimizations.
     
  8. Zerot

    Zerot

    Joined:
    Jul 13, 2011
    Posts:
    135
    No, the compiler would not do that. In a for loop the condition will be executed every iteration. Because the condition in this case is also property which calls into the native side, there is no way for the compiler to know that the count won't change. Properties can have side-effects.

    Also, it doesn't matter that the loop itself does not do anything that could modify the result of the property. The property might be tied to hardware changes for all the compiler knows.(e.g. a physical switch). And with multithreading and multiple cores there is no guarantee when something changes.
     
    Dustin-Horne likes this.
  9. thxfoo

    thxfoo

    Joined:
    Apr 4, 2014
    Posts:
    515
    I don't think this is true.
    The optimizing JIT could cache the value here.
    Also multithreading is no argument here. That is exactly why you need a lot of tricks to e.g. getting the double checked locking pattern to work (e.g. with memory barriers), because you cannot assume that other threads see the changes you do in your thread.

    The only problem here is if the value comes from C++. And even this could be solved if the compilers were smarter (if it could use the info that a method is const/pure in C++).

    @kiriri:
    But did you run this code some thousand times in a loop before measuring on the final iteration? Because maybe the optimizing JIT was not triggered.

    [rant]
    And anyway, in my eyes the .NET JIT is not very good, look at benchmarks comparing languages, .NET is orders of magnitude slower than Java in many tests.
    [/rant]
     
  10. Dustin-Horne

    Dustin-Horne

    Joined:
    Apr 4, 2013
    Posts:
    4,568
    but it takes orders of magnitude longer just to get a Java app installed and running (and sometimes fixed when you update Java). :p

    Anyway, it would be bad for the compiler to cache a property value as was said earlier they may have side effects.
     
  11. alexzzzz

    alexzzzz

    Joined:
    Nov 20, 2010
    Posts:
    1,447
    Caching this is like caching DateTime.Now value.
     
    angrypenguin likes this.
  12. thxfoo

    thxfoo

    Joined:
    Apr 4, 2014
    Posts:
    515
    No, the compiler could see if getting a property has a side effect.
    The only problem here is that it comes from C++, so the compiler is blind. Otherwise this would not be a problem.

    Hell, JIT can even optimize stuff in a way that can go wrong and check for it later to correct. But that would not even be needed in this simple case.

    Disclaimer: was involved in Java optimizing JIT development. [Stuff like swapping interface method calls with direct call to most probable candidate, you check if it was correct otherwise you virtually call the right method. Beats standard interface method call by orders of magnitude in almost any benchmark. Not sure if .NET JIT has such features.]
     
    Last edited: Dec 5, 2014
  13. alexzzzz

    alexzzzz

    Joined:
    Nov 20, 2010
    Posts:
    1,447
    Code (csharp):
    1. for (i = 0; i < rend.sharedMesh.vertexCount * 3300; i++)
    Ok, putting aside all the theory about what compilers could do, the only practical solution for any observable future is to follow the rule: if you do something and it hurts, don't do that.
     
    R-Lindsay likes this.
  14. Dustin-Horne

    Dustin-Horne

    Joined:
    Apr 4, 2013
    Posts:
    4,568
    The point is that we don't know if vertexCount has side effects. It very well could or the getter could be making a method call which would stop I inlining of the property by the .net compiler thus compounding the marshalling issue.
     
  15. alexzzzz

    alexzzzz

    Joined:
    Nov 20, 2010
    Posts:
    1,447
    We don't know if vertexCount has side effects, we don't know if sharedMesh always returns the same value and has no side effects, we don't even know if rend is always the same. And compilers are not humans, even C++ compilers aren't. There's nothing we can do except adding one simple line of code that just caches the value that needs to be cached.
     
    angrypenguin and Dustin-Horne like this.