Question WebGL Framerate Fluctuations and Profiling Issues

Discussion in 'Web' started by wattse13, Sep 19, 2023.

  1. wattse13

    wattse13

    Joined:
    Apr 3, 2019
    Posts:
    9
    I originally posted this problem in General but am moving it over here as this discussion board is more specific to my problem. The original thread can be found here: Original Thread Link.

    Original Thread tldr;

    The Goal:
    • We are targeting the WebGL platform.
    • It is a 3D game with what I think is a moderate amount of 3D assets.
    • Ideally this game could be played on a Chromebook, or other similar low-spec hardware, in a classroom setting.
    • We are using Unity Engine 2019.4.20f1
    The Issues:
    • Framerate fluctuates between 50 and 15 fps.
    • I have been unsuccessful in using the profiler to find the underlying cause.
    Things I Have Tried:
    • I have tried creating/profiling builds in which known-high poly 3D assets have been deleted/disabled.
    • I have tried creating/profiling builds in which no Nav-Mesh agents were active.
    • I have tried creating/profiling builds with baked lighting.
    • Removed scripts that generated garbage.
    When the above changes were made, framerate improved slightly, but the fluctuations did not stop – i.e., with fewer assets the framerate didn’t drop to 15 as often but did still drop severely and regularly.
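To put numbers on the fluctuation, it can help to think in frame times rather than fps: 50 fps is 20 ms per frame, while 15 fps is roughly 66.7 ms. The sketch below summarizes a list of captured frame times; `fpsToFrameMs`, `summarizeFrames`, and the budget value are hypothetical helpers for illustration, not part of Unity or the Profile Analyzer.

```typescript
// Hypothetical helpers for summarizing captured frame times (in milliseconds).
interface FrameStats {
  meanMs: number;
  worstMs: number;
  p95Ms: number;
  framesOverBudget: number;
}

// Convert a frames-per-second figure to a per-frame time in milliseconds.
function fpsToFrameMs(fps: number): number {
  return 1000 / fps;
}

function summarizeFrames(frameTimesMs: number[], budgetMs: number): FrameStats {
  if (frameTimesMs.length === 0) {
    throw new Error("need at least one frame time");
  }
  const sorted = frameTimesMs.slice().sort((a, b) => a - b);
  const sum = sorted.reduce((acc, t) => acc + t, 0);
  // Nearest-rank 95th percentile.
  const rank = Math.ceil(0.95 * sorted.length);
  return {
    meanMs: sum / sorted.length,
    worstMs: sorted[sorted.length - 1],
    p95Ms: sorted[rank - 1],
    framesOverBudget: sorted.filter((t) => t > budgetMs).length,
  };
}
```

Comparing the mean against the 95th percentile and the worst frame gives a rough sense of whether the problem is a uniformly slow frame or occasional spikes.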

    profiler09-19.png

    profileAnalysis09-19.png

    The large spikes in the middle of the Profiler and the second half of the Profile Analysis are from WaitForTargetFPS and occurred when the browser window that was running the game was minimized. I think they can be ignored?

    Since posting, I have tried using the Profile Analyzer tool to further investigate what is going on. Between the Profile Analyzer and the Profiler, I get the impression that my options are:
    • Reduce the number of assets that need rendering.
    • Stop using Cinemachine.
    • Upgrade to a newer version of Unity.
    I feel like those are my options because based on the Profiler it looks like rendering operations are the most expensive by a large margin. It looks like there might be some possible improvements to be made with Animators, UI elements, or Nav-Mesh objects, but I’m not sure how much of a difference that would actually make.

    If anyone can see something that I am missing, please let me know!
     
  2. MartinTilo

    MartinTilo

    Unity Technologies

    Joined:
    Aug 16, 2017
    Posts:
    2,236
    If you have elements in your uGUI canvas that need updating every frame, like some text display, then moving them into their own, less complex canvas might be beneficial. (Compare before and after in Profile Analyzer.)

    Otherwise, yeah, fewer draw calls would probably help. Physics has to run a bit more often in the slower frames, but it's unclear how much of the frame time that makes up. Reducing the FixedUpdate frequency via the TimeManager settings might help make things a bit more stable?
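As a rough illustration of why slower frames run physics more often, here is a simplified accumulator model of a fixed-timestep loop. It is only a sketch: `fixedStepsPerFrame` is a hypothetical helper, not Unity's actual scheduler, and it ignores details such as the maximum allowed timestep. With Unity's default Fixed Timestep of 0.02 s, a 50 fps frame needs one physics step, whereas a 15 fps frame (about 0.0667 s) needs three or four.

```typescript
// Count how many fixed-timestep updates run for each rendered frame.
// Simplified accumulator model; hypothetical helper, not Unity's scheduler.
function fixedStepsPerFrame(frameTimesSec: number[], fixedDeltaSec: number): number[] {
  let accumulator = 0;
  const steps: number[] = [];
  for (const dt of frameTimesSec) {
    accumulator += dt;
    let stepsThisFrame = 0;
    // The small epsilon guards against floating-point drift in the accumulator.
    while (accumulator >= fixedDeltaSec - 1e-9) {
      accumulator -= fixedDeltaSec;
      stepsThisFrame++;
    }
    steps.push(stepsThisFrame);
  }
  return steps;
}
```

Calling it with the same slow frame times and a larger timestep (for example 0.04 s instead of 0.02 s) shows the per-frame step count dropping, which is the effect a lower FixedUpdate frequency is meant to have.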

    Also, an upgrade might be worth looking into.
    So might turning off Incremental GC, just to check what the performance impact of the write barriers is. If you've already reduced the GC.Alloc amount, incremental GC might cost you more across all frames than the spikes that non-incremental GC incurs.
     
  3. wattse13

    wattse13

    Joined:
    Apr 3, 2019
    Posts:
    9
    Thank you for the suggestions!

    I think because we are targeting WebGL, Incremental GC is unavailable.

    This is a little embarrassing, as lighting/shaders seems like a very obvious place to look first for performance optimizations, but I was able to make some significant performance gains by swapping "Lit" shaders with "Simple Lit" shaders. The Profile Analyzer screenshot below shows the difference between the two. I think I will take it a step further and remove all dynamic lighting and just use baked lighting so that I can use "Baked Lit" Shaders.

    profileAnalysisCompare.png

    If I understand the profiler correctly, and this discussion correctly, would it be accurate to say that this project is CPU bound because it spends very little time in Gfx.WaitForPresentOnGfxThread processes?

    If yes, would it also be fair to think that because our target hardware will be low-spec (integrated or very weak gpus) that being CPU bound is better?

    Thanks again for your time and help! I didn't know about the Profile Analyzer tool and learning more about it has been a great help. We are still working through some of the other suggestions you made and we will hopefully be able to implement most/all of them into our project!
     
  4. MartinTilo

    MartinTilo

    Unity Technologies

    Joined:
    Aug 16, 2017
    Posts:
    2,236
    Given that WebGL builds are single threaded, the Render Thread work is pulled into the Main Thread, which reduces parallelism and increases CPU boundness for anything that does a lot of rendering. And yeah, I think you are CPU bound, with a lot of time spent in Rendering.


    Whether being CPU bound is better on low-spec hardware depends on the relative performance of the CPU, I'd say. Also, if the hardware has to split power/heat budgets across CPU and GPU, then it only helps you if the GPU has idle time.
     
  5. jukka_j

    jukka_j

    Unity Technologies

    Joined:
    May 4, 2018
    Posts:
    951
    In web browsers, it is the browser itself that waits for presentation. Browsers provide no mechanism for JavaScript or WebAssembly code to wait synchronously, so code such as Gfx.WaitForPresentOnGfxThread or WaitForTargetFPS should not be running on WebGL, and in all scenarios you should see very little time spent in those blocks (they are basically no-ops).

    This is because web browsers run the web page's event loop (the user interaction with any web page elements) on the same main thread as the WebGL rendered content, so if the WebGL content stopped to wait synchronously, it would block the browser's own event loop from processing user interaction.

    So in Web platform builds, all rendering is based on JavaScript requestAnimationFrame() or setInterval() event callbacks, Unity never stops to wait, and instead the browser always waits for presentation and the GPU to catch up.
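The callback-driven pattern described above can be sketched as follows. This is a minimal illustration, not Unity's actual main loop: `FrameScheduler` and `startFrameLoop` are hypothetical names, and the scheduler is passed in as a parameter so that the same code can run outside a browser.

```typescript
// A frame scheduler has the same shape as window.requestAnimationFrame:
// it takes a callback and invokes it later with a timestamp in milliseconds.
type FrameScheduler = (callback: (timestampMs: number) => void) => void;

// Runs `update` once per scheduled frame until it returns false.
// There is no blocking wait anywhere: each tick does its work, re-registers
// itself, and returns control to the caller (in a browser, the event loop).
function startFrameLoop(
  schedule: FrameScheduler,
  update: (deltaMs: number) => boolean
): void {
  let previousMs: number | null = null;
  const tick = (timestampMs: number): void => {
    const deltaMs = previousMs === null ? 0 : timestampMs - previousMs;
    previousMs = timestampMs;
    if (update(deltaMs)) {
      schedule(tick);
    }
  };
  schedule(tick);
}
```

In a browser, the scheduler would typically be `(cb) => { requestAnimationFrame(cb); }`. Because every tick returns immediately, any time between ticks is spent inside the browser, which is why a profiler running inside the page cannot attribute it.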

    Unfortunately, web browsers do not provide any mechanism for JavaScript/WebAssembly to distinguish time that the browser spends idle with no work from time that it spends waiting for the GPU to finish rendering. It can be approximated with WebGL GPU timer queries, but the approximation is not that great. So Unity's profiler won't know exactly whether it is a CPU-idle-GPU-idle, CPU-busy-GPU-idle or CPU-idle-GPU-busy scenario.

    As a developer, you can get a feel for which one it is by doing exactly what you did: changing GPU-heavy parts (shaders) and testing whether that changes the performance. If it does, that is a good indication that the system was GPU bound.
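One way to make that before/after comparison less subjective is to compare median frame times from two captures. The sketch below is a hypothetical heuristic: the function names and the 10% threshold are assumptions, and since a shader swap may also change CPU-side rendering work, the result is a hint rather than proof.

```typescript
// Median of a non-empty list of frame times (milliseconds).
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error("need at least one value");
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Relative improvement in median frame time (0.25 means 25% faster).
function relativeImprovement(beforeMs: number[], afterMs: number[]): number {
  const before = median(beforeMs);
  return (before - median(afterMs)) / before;
}

// The 10% default threshold is an arbitrary assumption, meant to sit
// above typical run-to-run noise; tune it for your own captures.
function suggestsGpuBound(beforeMs: number[], afterMs: number[], threshold: number = 0.1): boolean {
  return relativeImprovement(beforeMs, afterMs) >= threshold;
}
```

Here `beforeMs` would hold frame times captured with the original shaders and `afterMs` those captured after the only change was making the GPU work cheaper.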

    I recommend using Firefox Profiler to get a view of the issue from the browser's perspective. It shows what the performance situation looks like at the browser level, and helps you see whether the CPU is at times idle (waiting for the GPU) or 100% busy. Do note that Firefox, Chrome and Safari each have somewhat different performance footprints, so there may be a little mapping to do to extrapolate to other browsers. Unfortunately, at the time of writing, Chrome and Safari do not provide profiling tools anywhere near as good as Firefox's.
     
  6. MartinTilo

    MartinTilo

    Unity Technologies

    Joined:
    Aug 16, 2017
    Posts:
    2,236
    TIL, thanks :)
     
  7. wattse13

    wattse13

    Joined:
    Apr 3, 2019
    Posts:
    9

    Thank you both @MartinTilo and @jukka_j for the responses! Sorry it took me a while to respond. I wanted to try and familiarize myself with the tools you both suggested.

    @jukka_j I read through another thread you were active in where you described and suggested the Firefox profiler. I tried to set it up as described in that thread and have been trying to use it on my own project.

    When I use the Firefox profiler, I can see that:
    • There is a lot of jank. Like all jank all the time.
    • At least a third of the frame time is spent on LateUpdateFinishFrameRendering processes.
    • The GPU is completely inactive?
    Link to the Profile: https://share.firefox.dev/3LJENrM

    jank10-02023.png

    In previous tests/iterations of the project when we moved from Lit shaders to Simple Lit, performance improved significantly. When we moved from Simple Lit to Baked Lit, performance improved somewhat.

    bakedLitSimpleLit.png

    Because shaders are GPU intensive, the increase in performance we got makes me think that our performance issues are GPU related rather than CPU. However, I am confused because the Firefox profiler seems to be telling me that the GPU isn’t doing much of anything and that the CPU is spending a lot of time in rendering processes.

    We expect that our target hardware will have very low-end GPUs. Because of that, we have been avoiding features like GPU instancing. Is that a correct assumption to make?

    Thank you again for all the suggestions and help so far!
     
  8. jukka_j

    jukka_j

    Unity Technologies

    Joined:
    May 4, 2018
    Posts:
    951
    Thanks for sharing the profile, that helps look at the same thing.

    I like to use the 'm' button in Firefox Profiler to merge callstack frames from distinct intermediate call paths that lead to the same subtrees (right-click in the Flame Graph view to see a context menu of options).

    Cleaning up the Flame Graph profile gives https://share.firefox.dev/46c3LID, which looks like this:

    upload_2023-10-3_13-31-19.png

    The general "shape" of the profile looks typical of Unity web projects, so nothing too bad there. However, the Bottom-Up call stack of JS functions looks like this:

    upload_2023-10-3_13-30-58.png

    The top hot call paths in that profile are in Performance.now(). That suggests you are running a Development build with Unity's profiler enabled.

    In general when profiling with Firefox Profiler, it is recommended to do a Release build with Embedded Debug Symbols and Unity's Profiler disabled, so that there will not be overhead from two profilers running at the same time. That will give cleaner profiles.

    Another observation there is the relatively long time spent in _glClientWaitSync. Firefox likes to spend time there when it is in a GPU backpressure situation waiting for the GPU to catch up, so this does suggest that the game is GPU bound.

    Are you looking at this swimlane?

    upload_2023-10-3_13-39-47.png

    The fact that this swimlane looks empty does not mean that the GPU is idle. Firefox Profiler never actually measures GPU utilization; the GPU Process, CanvasRenderer and Renderer threads shown there display the time that Firefox's worker processes spend on the CPU side while submitting work for the GPU to do.

    Avoiding GPU instancing because the target GPUs are low-end would not be recommended. GPU instancing is a feature designed to improve CPU->GPU work dispatch performance. It has not come with any drawbacks for the GPU since DirectX 10, circa 2007. All GPUs, even the lowest-end ones, should readily support instancing.

    With Unity, in order to get good rendering performance, it is recommended to always steer towards instancing, which in SRP/URP is implemented by the "SRP Batcher" feature: https://docs.unity3d.com/2023.3/Documentation/Manual/SRPBatcher.html

    For maximal performance, typically all the instancing and batching features should be enabled: https://docs.unity3d.com/Manual/DrawCallBatching.html

    After enabling these features, Unity's Frame Debugger will be able to give an account of whether the renderer was successful in actually enabling instancing or batching in the draw calls: https://docs.unity3d.com/Manual/frame-debugger-window.html

    See also https://docs.unity3d.com/Manual/ProfilerRendering.html for an overview of rendering stats, including how many draw calls are performed. In general in Unity, lots of draw calls will swamp performance on the CPU side.
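To illustrate why instancing reduces CPU-side dispatch work, here is a deliberately simplified model that counts draw submissions with and without grouping objects that share a mesh and a material. It is only a sketch: `DrawItem` and both functions are hypothetical, and real batching in Unity depends on more conditions (shader variants, material properties, renderer settings) than this model covers.

```typescript
// Illustrative model only; not Unity's batching logic.
interface DrawItem {
  meshId: string;
  materialId: string;
}

// Without instancing, every item is submitted as its own draw call.
function drawCallsWithoutInstancing(items: DrawItem[]): number {
  return items.length;
}

// With instancing, items that share both mesh and material
// collapse into a single instanced draw call.
function drawCallsWithInstancing(items: DrawItem[]): number {
  const groups: Record<string, boolean> = {};
  for (const item of items) {
    groups[item.meshId + "|" + item.materialId] = true;
  }
  return Object.keys(groups).length;
}
```

In this model, a scene with many copies of the same prop goes from one submission per copy to one submission per unique mesh and material pair. The Frame Debugger remains the way to check what the renderer actually did.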
     
  9. wattse13

    wattse13

    Joined:
    Apr 3, 2019
    Posts:
    9
    Thank you!

    Alright so I took your advice and turned on all batching and instancing features.

    When I turned on GPU instancing, both on the various materials used in our game and at the project settings level, I noticed that the number of apply shader calls in the SRP loop decreased and the number of apply shader calls in the standard loop increased.

    However, the Firefox profiler showed less time being spent in _glClientWaitSync with GPU instancing enabled. The Profile Analyzer tool also showed better performance with GPU instancing enabled.

    LWRP; SRP: https://share.firefox.dev/3LIiXoH

    LWRP; SRP & GPU Instancing: https://share.firefox.dev/3RKgfTf

    SRPvsSRPAndGPUInstancing.png

    I am assuming that going forward will be a balancing act of trying to find the most efficient batching/instancing combinations for our target hardware. The materials/textures in the project are also all a bit messy at the moment, so it will probably take some time to get them organized in a way that best takes advantage of the batching/instancing features.

    I am still worried about how much jank there is when I profile the game using the Firefox profiler. Is jank just a general indicator of poor performance? I am struggling a bit to find more about it in the profiler documentation.

    Yes, I thought the GPU swimlane was measuring GPU usage. It is good to know that it doesn't. Is looking at time spent in _glClientWaitSync generally a good way of seeing how much is happening on the GPU side of things? Is there a general target for how much time should be spent in _glClientWaitSync?

    I noticed that when the SRP Batch Profiler ran in a build of the game, it said that all apply shader calls were happening in the standard loop. It said nothing was happening in the SRP loop. Is it normal to see a difference in the SRP Batch Profiler between editor play mode and builds?

    Thank you again @jukka_j for all the help and your patience with all my newbie questions! Both my general well-being and the project’s performance have dramatically increased since getting help on this forum :)