IL2CPP super SLOW vs Mono Standalone

Discussion in 'General Discussion' started by Rafael_CS, Jul 16, 2020.

  1. Rafael_CS

    Rafael_CS

    Joined:
    Sep 16, 2013
    Posts:
    142
    Hello guys, I'm working on a WebRTC implementation in full C#.

    Last week I finally finished my H264 codec decoder (100% C#), but I'm facing a super strange problem when building the project with IL2CPP.

    In a Mono build I can achieve around 30~40fps while decoding a 1920x1020 H264 stream, but when I swap to IL2CPP I can only reach half of that (around 15~20fps).

    I tried everything in the project settings (Master C++ Compiler Configuration, Strip Engine Code, etc.), but it seems that IL2CPP can't handle heavy CPU byte[] operations well.

    Here are my results after trying everything:
    Code (CSharp):
    // Tested in Unity 2019.4.1f1 with 8 threads on an Intel i7-6700HQ
    --------------------------------------
    | MONO          | 38.42 frames/sec   |
    --------------------------------------
    | IL2CPP        | 19.55 frames/sec   |
    --------------------------------------
    Maybe someone has a tip to improve performance in IL2CPP? Is Mono really that much better optimized for heavy byte[] operations?

    PS: I already tried replacing byte[] with unsafe byte* calls, and changing for(int i=0; i<array.Length; i++) to for(int i=0; i<n; i++), but I noticed no improvement in the final decoding speed.

    EDIT:
    Well, I made another benchmark with a simple loop of 600000000 iterations,
    and it seems that IL2CPP is super slow with generics. The weird thing is that I only use byte[] and MemoryStreams in the H264 operations... maybe it is related to MemoryStream, or Array.Copy / Buffer.BlockCopy?
    upload_2020-7-16_1-49-45.png
     
    Last edited: Jul 18, 2020
    DrSeltsam and Meltdown like this.
  2. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    5,561
    Can you share this project via a bug report? We would love to have a look at the behavior and improve the performance with IL2CPP here.
     
  3. Rafael_CS

    Rafael_CS

    Joined:
    Sep 16, 2013
    Posts:
    142
  4. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    5,561
    Thanks! Can you let me know the bug report number?
     
  5. Rafael_CS

    Rafael_CS

    Joined:
    Sep 16, 2013
    Posts:
    142
    @JoshPeterson number 1264028

    Another tip: my investigation revealed that one generic method that simplifies BlockCopy adds a huge overhead in IL2CPP:

    ArrayHelper.BlockCopy<T>

    Maybe generics don't perform well in IL2CPP?

    EDIT:

    Okay, I finally found that all the problems were related to functions that take IList<T> as a parameter.
    Changing IList<T> to T[] finally fixed the performance problems in IL2CPP (and it now runs faster than Mono).
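
For readers following along, here is a minimal sketch of the kind of signature change described above. The body of `ArrayHelper.BlockCopy<T>` is not shown in this thread, so the class and method names below (`ArrayHelperSketch`, `CopyViaIList`, `CopyViaArray`) are hypothetical and only illustrate the difference between an interface-typed and an array-typed parameter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not the original ArrayHelper from the project.
public static class ArrayHelperSketch
{
    // Interface-typed parameters: every element read and write goes
    // through the IList<T> indexer, which is an interface method call.
    public static void CopyViaIList<T>(IList<T> src, int srcOffset,
                                       IList<T> dst, int dstOffset, int count)
    {
        for (int i = 0; i < count; i++)
        {
            dst[dstOffset + i] = src[srcOffset + i];
        }
    }

    // Array-typed parameters: no interface dispatch is involved, and the
    // copy can be delegated to Array.Copy.
    public static void CopyViaArray<T>(T[] src, int srcOffset,
                                       T[] dst, int dstOffset, int count)
    {
        Array.Copy(src, srcOffset, dst, dstOffset, count);
    }
}
```

Both methods produce the same result for arrays; only the parameter types differ, which is the change reported here as fixing the slowdown.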

    EDIT 2: .NET Core seems to run twice as fast as the final IL2CPP build, is that correct? The RyuJIT compiler seems to be much faster than Unity.

    EDIT 3:

    The Unity Editor runs 20 times slower than the IL2CPP build, and my code is useless inside the editor.
     
    Last edited: Jul 19, 2020
    Joe-Censored likes this.
  6. Zuntatos

    Zuntatos

    Joined:
    Nov 18, 2012
    Posts:
    586
    I think IList<T> looping isn't generics being slow, it's interfaces.
    A method like:
    void Loop<T> (IList<T> list) { }
    will pass an interface and use virtual calls it can't inline etc, unless you're lucky and the entire method gets inlined away

    I'd be interested to see how it performs if you adjust it to
    void Loop<T, U>(U list) where U : IList<T> { }
    as I believe that'll instantiate functions for every list type you pass in, getting rid of the virtual calls (unless your U is an interface? idk if that works) and making inlining etc easier for the compiler
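
A minimal sketch of the two signatures being compared, written out so they compile. The names `LoopSketch`, `CountViaInterface` and `CountConstrained` are made up for illustration. Whether the constrained version actually avoids interface dispatch depends on the runtime and on the type passed in, so treat it as something to measure rather than a guaranteed win.

```csharp
using System.Collections.Generic;

// Hypothetical example type; names are illustrative only.
public static class LoopSketch
{
    // The parameter is the interface itself: Count and the indexer are
    // called through IList<T>.
    public static int CountViaInterface<T>(IList<T> list, T value)
    {
        var comparer = EqualityComparer<T>.Default;
        int hits = 0;
        for (int i = 0; i < list.Count; i++)
        {
            if (comparer.Equals(list[i], value)) hits++;
        }
        return hits;
    }

    // The parameter is a type parameter constrained to IList<T>, so the
    // compiler knows the concrete list type at each call site.
    public static int CountConstrained<T, TList>(TList list, T value)
        where TList : IList<T>
    {
        var comparer = EqualityComparer<T>.Default;
        int hits = 0;
        for (int i = 0; i < list.Count; i++)
        {
            if (comparer.Equals(list[i], value)) hits++;
        }
        return hits;
    }
}
```

Note that C# does not infer `T` from the constraint alone, so callers of the second form have to spell out both type arguments, for example `LoopSketch.CountConstrained<byte, byte[]>(data, (byte)2)`.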
     
  7. Rafael_CS

    Rafael_CS

    Joined:
    Sep 16, 2013
    Posts:
    142
    Nice tip, let me try your suggestion.
     
  8. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    5,561
    Thanks for the updates. We will investigate this.

    Regarding the editor performance - what version of Unity are you using? You may need to ensure that debug code generation is disabled. In versions of Unity prior to 2020.1 (I think that version is correct), code generation is debug by default, and can be changed in the editor preferences. Later versions of Unity switched to release code generation in the editor by default.
     
  9. Rafael_CS

    Rafael_CS

    Joined:
    Sep 16, 2013
    Posts:
    142
    Thanks for the reply @JoshPeterson, I'm using Unity 2019.4.

    Where can I find this option? I never saw this property in the editor preferences. I tried to find this "debug code generation" option but failed to find it.
     
  10. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    5,561
    You can find it in the Edit > Preferences > External Tools dialog (on Windows, on macOS I believe it is Unity > Preferences > External Tools). You want to disable the "Editor Attaching" option.

    You will need to restart the editor for this change to take effect. Once you do that, the JIT will emit release code, but you won't be able to attach the managed debugger to the editor until you enable this option and restart the editor again.

    In 2020.1, this workflow is much better - you get release code generation by default. Then when a debugger is attached, the editor will prompt you to switch to debug code generation. It can do this without restarting.
     
    Rafael_CS likes this.
  11. Rafael_CS

    Rafael_CS

    Joined:
    Sep 16, 2013
    Posts:
    142
    Thanks @JoshPeterson, I will upgrade to Unity 2020 to get this better workflow. Thank you for your time.

    @Zuntatos I tested your approach, but it seems to run ultra slow too.

    The problem is with IList<T>... even if I pass the parameter as U with U : IList<T>, the methods used will still be the ones from the IList interface, even if I pass, for example, byte[]. The performance is the same as using plain IList<T> as the parameter.
     
    Last edited: Jul 20, 2020
    Zuntatos likes this.
  12. Rafael_CS

    Rafael_CS

    Joined:
    Sep 16, 2013
    Posts:
    142
  13. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    5,561
    I wanted to follow up on this issue, as I've just had a chance to investigate it. As @Zuntatos mentioned, the issue here is not generics, but is instead the interface calls. IL2CPP has a worse algorithm (in terms of run time performance) than Mono for interface method calls. So I would recommend avoiding tight loops which do interface method calls if at all possible.
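
One way to act on this advice without changing a public signature is to keep `IList<T>` at the API boundary and branch to an array-typed loop when the argument happens to be an array. The sketch below is only an illustration of that idea, not code from the bug report; `FastPathSketch` and `Sum` are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical example; shows a type check hoisted out of the hot loop.
public static class FastPathSketch
{
    public static long Sum(IList<byte> data)
    {
        // Fast path: one type check up front, then a plain array loop
        // with no interface calls inside it.
        if (data is byte[] array)
        {
            long arrayTotal = 0;
            for (int i = 0; i < array.Length; i++)
            {
                arrayTotal += array[i];
            }
            return arrayTotal;
        }

        // Fallback: any other IList<byte> still works, but each element
        // access is an interface method call.
        long total = 0;
        int count = data.Count;
        for (int i = 0; i < count; i++)
        {
            total += data[i];
        }
        return total;
    }
}
```

Callers keep passing whatever list type they already have; only inputs that are actually arrays take the loop without interface calls.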
     
  14. JohnnyA

    JohnnyA

    Joined:
    Apr 9, 2010
    Posts:
    5,006
    It's great that you got a response back to us, but this seems like a pretty painful limitation (it's easy to work around, but the workaround likely comes at the expense of the reusability of your code).

    Is there any idea of if/when this might change?
     
    Last edited: Nov 4, 2020
    MadeFromPolygons and Rafael_CS like this.
  15. MadeFromPolygons

    MadeFromPolygons

    Joined:
    Oct 5, 2013
    Posts:
    3,267
    When can we expect that to be fixed? Having to avoid interfaces is obviously a major issue. Some of us have client projects that are already live and at scale, and that use interfaces extensively because there was no information beforehand to say otherwise. Rewriting these would not be a small task, and would come at a serious cost to our business, both financially and in terms of time.
     
    Last edited: Nov 4, 2020
  16. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    5,561
    I don't have an ETA for this work, sorry. As with anything else, we need to weigh the cost of improving this against other priorities. If you have specific cases where this is causing a performance problem, we would love to have a look at them. If we have enough data, that may raise the priority of doing this work, so I would recommend profiling code that you think might be impacted by IL2CPP's interface method invocation algorithm.

    Microbenchmarks like the one discussed here are important for testing and improving specific performance issues. But we've found that benchmarks like this are not indicative of whole program performance. While interface method calls are slow, they usually make up a very small fraction of the overall time spent during program execution. So while improving them would be positive, there are often other changes that can provide more benefit.

    But again, data is the best way to inform decisions like this, so I'd love to see any profiling information. Thanks!
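
If the Unity Profiler is not an option, a rough first measurement can be taken with `System.Diagnostics.Stopwatch`. This is only a sketch under assumptions: the names `InterfaceCallTiming`, `SumArray`, `SumIList` and `BestOfMs` are hypothetical, and a number from a loop like this says nothing about whole-program performance.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical timing helper for comparing two loop shapes.
public static class InterfaceCallTiming
{
    // Plain array loop: no interface calls.
    public static long SumArray(byte[] data)
    {
        long total = 0;
        for (int i = 0; i < data.Length; i++) total += data[i];
        return total;
    }

    // Same loop through IList<byte>: Count and the indexer are interface calls.
    public static long SumIList(IList<byte> data)
    {
        long total = 0;
        int count = data.Count;
        for (int i = 0; i < count; i++) total += data[i];
        return total;
    }

    // Runs `body` `runs` times and returns the fastest run in milliseconds.
    public static double BestOfMs(int runs, Action body)
    {
        double best = double.MaxValue;
        for (int r = 0; r < runs; r++)
        {
            var stopwatch = Stopwatch.StartNew();
            body();
            stopwatch.Stop();
            best = Math.Min(best, stopwatch.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}

// Usage (for example from a script in a built player):
//   var data = new byte[16 * 1024 * 1024];
//   long sink = 0; // keeps the result alive so the loop is not optimized away
//   double arrayMs = InterfaceCallTiming.BestOfMs(5, () => sink += InterfaceCallTiming.SumArray(data));
//   double ilistMs = InterfaceCallTiming.BestOfMs(5, () => sink += InterfaceCallTiming.SumIList(data));
```

Run it in a release player build for both scripting backends, since editor timings can differ a lot from build timings.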
     
  17. MadeFromPolygons

    MadeFromPolygons

    Joined:
    Oct 5, 2013
    Posts:
    3,267
    Thanks, unfortunately I don't think I have the rights to do that with our codebase, but I will talk to management and see if we can get some sort of reports together for you so that you have some data. Thanks for getting back to us!
     
  18. DrSeltsam

    DrSeltsam

    Joined:
    Jul 24, 2019
    Posts:
    76
    I also want to emphasize that interfaces are fairly important for us - and sometimes they're unavoidable. It's good to know that they can potentially cause performance issues, so we will take a closer look at the parts of our code which use them extensively (e.g. our networking implementation).
     
    Last edited: Nov 14, 2020
    Rafael_CS likes this.
  19. Kamyker

    Kamyker

    Joined:
    May 14, 2013
    Posts:
    719
    Andresmonte, Havyx and Rafael_CS like this.
  20. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,035
  21. MDADigital

    MDADigital

    Joined:
    Apr 18, 2020
    Posts:
    2,200
    We have done some benchmarking of IL2CPP vs .NET 5, and in all cases .NET 5 is faster than IL2CPP. In many cases it's also faster than C++. Pretty cool what hardware-specific just-in-time compilation can do for performance.

    Not being able to use interfaces in a tight loop is a showstopper if you want to keep an abstract and maintainable domain.
     
    Andresmonte likes this.
  22. Havyx

    Havyx

    Joined:
    Oct 20, 2020
    Posts:
    140
    Still doesn't make sense imo. I thought this was the reason for the issue tracker and upvoting (users decide Unity's priority to a certain extent via voting for stuff they want fixed).

    "View bugs we have successfully reproduced, and vote for the bugs you want to see fixed most urgently."

    But we can't vote for it. Instead we are supposed to deep profile and show specific use-cases where it has an obvious detrimental impact.... and then Unity might consider bumping it up their priority list.

    It's just kind of weird because it's marked as "By Design".

    RESOLUTION NOTE:
    Interface method calls in IL2CPP are slower than they are in Mono. This is a known issue that we plan to address in the future, although we don't have any ETA for that work to be completed.

    So.... the initial IL2CPP implementation is slower by design? I assume there was some trade-off made where it is slower than mono but there are other benefits that outweigh this reduction in speed.
     
    Andresmonte and Kamyker like this.
  23. Kamyker

    Kamyker

    Joined:
    May 14, 2013
    Posts:
    719
    Obviously it should be marked as Postponed, which doesn't disable voting.
     
  24. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    5,561
    Yes, the current design favors smaller executable code size and memory usage than the approach Mono takes.
     
    Kamyker, MadeFromPolygons and Havyx like this.
  25. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    5,561
    I've changed the bug report to be postponed now. It might take a few minutes to show up on the issue tracker, but please vote for this issue if it matters for your use case. Thanks!
     
  26. Rafael_CS

    Rafael_CS

    Joined:
    Sep 16, 2013
    Posts:
    142