IL2CPP build running at only 42% the speed of Mono build

Discussion in 'Windows' started by guavaman, Apr 29, 2020.

  1. guavaman

    Joined:
    Nov 20, 2009
    Posts:
    5,624
    I spent several days working through issues in a complex project to get it to build with IL2CPP on Windows Standalone, in an attempt to get more performance out of Unity. After I had resolved all the errors and issues, I did some performance testing in builds to see the results of my efforts. Things didn't turn out as I'd hoped.

    In the exact same scene, with everything identical in both builds (same screen resolution, same display mode, same everything), the Mono build runs about 140% faster than the IL2CPP build. :confused:

    Mono: Avg: 103 fps, Min: 85 fps, Max: 129 fps
    IL2CPP: Avg: 43 fps, Min: 45 fps, Max: 46 fps

    Average frame rates were sampled over 1 second. VSync is disabled on both builds.
    • Unity 2018.4.20f1
    • 64-bit builds
    • 4.x API compatibility level
    • Development build is disabled
    • Tried building the IL2CPP project manually, as a Release build in VS 2017 with the optimizer set to optimize for speed. No difference.
    • Neither build is spamming the log.
    • Core i7-8700K @ 3.7GHz
    • 32 GB RAM
    • Nvidia Geforce RTX 2070
    • Disabled AV software
    I'm baffled by this result.

    Are there known causes for poor performance with IL2CPP vs Mono?
     
    Last edited: Apr 29, 2020
  2. Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,674
    Did you try profiling it? Does everything run slower or is there a particular function that just takes forever on il2cpp?
     
  3. guavaman

    Joined:
    Nov 20, 2009
    Posts:
    5,624
    Thanks for the reply.

    I've tried, but the profiler will not connect to the build. I've tried a development build with Autoconnect Profiler, and manually connecting at various IP addresses, including 127.0.0.1, 192.168.1.6, and 192.168.56.1 (the IP addresses of the local NICs). Nothing works, whether I use Build and Run or launch the build manually. Windows Defender Firewall and Avast AV are fully disabled.

    I will try updating this project to 2019.3 and see if I get different results.
     
  4. guavaman

    Joined:
    Nov 20, 2009
    Posts:
    5,624
    After getting the Profiler connection working and adding a bunch of profiling code to my application, I could see that most of the performance difference came down to one area of code that calls certain functions potentially hundreds of times per frame: it retrieves and returns hundreds of small poolable objects (essentially data structs, but implemented as classes) that are pooled to avoid GC. These objects come from an ObjectPool<T> helper class I made, which takes a C# lock whenever an item is retrieved or returned, or the pool is cleared, so that the pool can be accessed from multiple threads. (In this case, multiple threads were not accessing this pool instance.)
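To illustrate the pattern being described, a lock-guarded pool might look roughly like the sketch below. This is not the original ObjectPool<T>: the `LockedObjectPool<T>` name, the `Stack<T>` storage, and the `Get`/`Return`/`Clear` method names are assumptions for illustration. The relevant detail is that every call goes through `lock`, even when only one thread ever touches the pool.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a pool that takes a C# lock on every Get, Return, and Clear.
// Names and storage are assumptions, not the original ObjectPool<T>.
public class LockedObjectPool<T> where T : class
{
    private readonly Stack<T> _items = new Stack<T>();
    private readonly Func<T> _factory;
    private readonly object _sync = new object();

    public LockedObjectPool(Func<T> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    // Retrieve a pooled instance, or create a new one if the pool is empty.
    public T Get()
    {
        lock (_sync)
        {
            return _items.Count > 0 ? _items.Pop() : _factory();
        }
    }

    // Hand an instance back so a later Get can reuse it.
    public void Return(T item)
    {
        if (item == null) throw new ArgumentNullException(nameof(item));
        lock (_sync)
        {
            _items.Push(item);
        }
    }

    // Drop all pooled instances.
    public void Clear()
    {
        lock (_sync)
        {
            _items.Clear();
        }
    }

    // Number of instances currently waiting in the pool.
    public int Count
    {
        get { lock (_sync) { return _items.Count; } }
    }
}
```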

    This section of code that was calling lock so many times was running in Mono at up to 5x the speed of IL2CPP. Other areas of the application were actually running faster in IL2CPP, but any performance benefit from that was being vastly outweighed by the huge performance hit of lock.

    Digging even deeper, I re-implemented lock by hand with Monitor.Enter and Monitor.Exit, as the compiler would generate it. The results were identical: a gigantic performance hit.

    Code (csharp):
    System.Threading.Monitor.Enter(x);
    try {
        // do something
    } finally {
        System.Threading.Monitor.Exit(x);
    }
    Edit: I initially thought the try/finally block was responsible for the slowdown, but it wasn't. The culprits are the calls to Monitor.Enter and Monitor.Exit. Specifically, Monitor.Exit is where the performance hit happens.
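For reference, since C# 4 the compiler expands `lock (x) { ... }` slightly differently from the code above: it calls the `Monitor.Enter(object, ref bool)` overload inside the `try` and only calls `Monitor.Exit` if the lock was actually taken. A sketch of that expansion follows; the `Counter` class and `Increment` method are hypothetical scaffolding. Either form still costs one `Monitor.Enter` and one `Monitor.Exit` per `lock`, which is where the IL2CPP slowdown showed up.

```csharp
using System.Threading;

// Hypothetical example class; only the body of Increment matters.
public class Counter
{
    private readonly object _sync = new object();
    private int _value;

    public int Value => _value;

    // Hand-written equivalent of: lock (_sync) { _value++; }
    public void Increment()
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(_sync, ref lockTaken);
            _value++; // the protected work goes here
        }
        finally
        {
            // Only release the monitor if Enter actually acquired it.
            if (lockTaken)
            {
                Monitor.Exit(_sync);
            }
        }
    }
}
```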

    Now the results are:
    Mono: Avg: 103 fps, Min: 85 fps, Max: 129 fps
    IL2CPP: Avg: 156 fps, Min: 128 fps, Max: 182 fps

    Now IL2CPP runs on average 151% the speed of Mono.
    I multiplied the performance of my application by 3.6x by removing lock statements from one class.

    Takeaway:
    Lock (Monitor.Enter/Exit) has an incredible performance overhead in IL2CPP and should be avoided in any code executed frequently.
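A rough way to see this overhead in a particular scripting backend is to time a tight loop with and without an uncontended `lock`. The sketch below is a hypothetical microbenchmark (`LockBenchmark`, `Run`, and the iteration count are assumptions, not the test used in this thread). Absolute tick counts depend on the machine, backend, and build settings, so only the ratio between the two loops is worth comparing.

```csharp
using System.Diagnostics;

// Hypothetical microbenchmark: an uncontended lock versus no lock at all.
public static class LockBenchmark
{
    private static readonly object Sync = new object();

    // Returns the accumulated sum so the loops have an observable result,
    // and reports the elapsed ticks of each loop through out parameters.
    public static long Run(int iterations, out long plainTicks, out long lockedTicks)
    {
        long sum = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sum += i;
        }
        plainTicks = sw.ElapsedTicks;

        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            lock (Sync) // single thread, so the lock is never contended
            {
                sum += i;
            }
        }
        lockedTicks = sw.ElapsedTicks;

        return sum;
    }
}
```

In a Unity project you could call `LockBenchmark.Run` from a MonoBehaviour and log both tick counts with `Debug.Log`, once in a Mono build and once in an IL2CPP build.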

    Is this a known issue with IL2CPP? Is there some big page somewhere of all the do's and don'ts when using IL2CPP? If there isn't, there seriously should be. All these pitfalls need to be known by developers. It isn't just a simple process of writing C# code and having your application run correctly or fast when built to IL2CPP.
     
    Last edited: Apr 30, 2020
  5. Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,674
    I would never have guessed that try/finally would have such a profound effect on performance. I think this might actually be a bug. I'll bring it to the attention of the people who work on IL2CPP.
     
  6. guavaman

    Joined:
    Nov 20, 2009
    Posts:
    5,624
    Give me a minute. I may be wrong about try/finally. I think it's actually Monitor.Enter and Monitor.Exit: in my previous test I may have been skipping the Exit call by accident because of a return in the contained code. Testing again. If it's not try/finally, it's definitely lock.
     
  7. guavaman

    Joined:
    Nov 20, 2009
    Posts:
    5,624
    Yeah, it's Monitor.Exit. I had a sneaky return that I hadn't accounted for, so the Monitor.Exit call was being skipped. With that fixed, I'm back to 43 fps without try/finally. I'll edit the post above to reflect this. I guess it's lock after all, not try/finally.
     
  8. Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,674
    For now you should be able to work around this by using a different synchronization mechanism (like a mutex).
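Swapping `lock` for `System.Threading.Mutex` might look like the sketch below (`MutexGuardedList<T>` is a hypothetical name). A `Mutex` wraps an operating-system synchronization object and must be released by the thread that acquired it, so each `WaitOne`/`ReleaseMutex` pair typically costs more than an uncontended `Monitor` lock, which is consistent with the Mutex numbers reported later in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical example: a list whose Add and Count are guarded by a Mutex instead of lock.
public sealed class MutexGuardedList<T> : IDisposable
{
    private readonly List<T> _items = new List<T>();
    private readonly Mutex _mutex = new Mutex(); // unnamed, not initially owned

    public void Add(T item)
    {
        _mutex.WaitOne();
        try
        {
            _items.Add(item);
        }
        finally
        {
            _mutex.ReleaseMutex(); // must run on the thread that called WaitOne
        }
    }

    public int Count
    {
        get
        {
            _mutex.WaitOne();
            try { return _items.Count; }
            finally { _mutex.ReleaseMutex(); }
        }
    }

    public void Dispose() => _mutex.Dispose();
}
```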
     
  9. guavaman

    Joined:
    Nov 20, 2009
    Posts:
    5,624
    Unfortunately Mutex is even slower at 26 fps in IL2CPP and 24 fps in Mono.
     
  10. Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,674
    Damn, how many times are you locking it per frame?
     
  11. guavaman

    Joined:
    Nov 20, 2009
    Posts:
    5,624
    Too many, clearly. :D

    It looks to me like lock calls are overhead-free in Mono when no other thread holds the lock. That's clearly not the case in IL2CPP.

    I'm going to stop using thread-safe classes like this unless I actually need thread safety.
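One way to act on this without maintaining two pool classes is to make the locking opt-in. The sketch below is hypothetical (`OptionalLockPool<T>` and its `threadSafe` constructor parameter are invented for illustration): single-threaded callers skip `Monitor` entirely, while callers that genuinely share the pool across threads still get a lock.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool where locking is opt-in, so single-threaded use pays no Monitor cost.
public class OptionalLockPool<T> where T : class
{
    private readonly Stack<T> _items = new Stack<T>();
    private readonly Func<T> _factory;
    private readonly object _sync; // null when thread safety was not requested

    public OptionalLockPool(Func<T> factory, bool threadSafe)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
        _sync = threadSafe ? new object() : null;
    }

    public T Get()
    {
        if (_sync == null)
        {
            return _items.Count > 0 ? _items.Pop() : _factory(); // no lock at all
        }
        lock (_sync)
        {
            return _items.Count > 0 ? _items.Pop() : _factory();
        }
    }

    public void Return(T item)
    {
        if (item == null) throw new ArgumentNullException(nameof(item));
        if (_sync == null)
        {
            _items.Push(item); // no lock at all
            return;
        }
        lock (_sync)
        {
            _items.Push(item);
        }
    }
}
```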

    Thanks!
     
  12. guavaman

    Joined:
    Nov 20, 2009
    Posts:
    5,624
    Using Interlocked.Exchange and SpinWait for synchronization in this scenario works much better than lock: the IL2CPP build runs at 153 fps. However, SpinWait has potential problems.
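The exact code isn't shown here, but a spin lock built from `Interlocked.Exchange` and `SpinWait` usually looks something like the sketch below (`SimpleSpinLock` is a hypothetical name). The uncontended path is a single atomic exchange with no `Monitor` call. Likely candidates for the "potential problems" are that waiting threads burn CPU instead of blocking, and that this lock is not reentrant: a thread that calls `Enter` twice without `Exit` will spin forever. .NET also ships a `System.Threading.SpinLock` struct built on the same idea.

```csharp
using System.Threading;

// Hypothetical minimal spin lock: 0 = free, 1 = held. Not reentrant.
public sealed class SimpleSpinLock
{
    private int _state;

    public void Enter()
    {
        // Atomically set the state to 1; a previous value of 0 means we acquired the lock.
        if (Interlocked.Exchange(ref _state, 1) == 0) return; // fast, uncontended path

        var spinner = new SpinWait();
        while (Interlocked.Exchange(ref _state, 1) != 0)
        {
            spinner.SpinOnce(); // spins briefly at first, then starts yielding the thread
        }
    }

    public void Exit()
    {
        // Atomic write with a full fence, so writes made under the lock are visible before release.
        Interlocked.Exchange(ref _state, 0);
    }
}
```

Usage follows the same shape as the `Monitor` expansion: `spin.Enter(); try { ... } finally { spin.Exit(); }`.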
     
    Last edited: Apr 30, 2020