Material Instancing negative performance

Discussion in 'General Graphics' started by HugPila, Apr 19, 2021.

  1. HugPila

    Joined:
    Jun 26, 2019
    Posts:
    11
    Hi!

    I am very confused. I created a test scene where I load a mesh X times with two different materials, one with instancing enabled and one without. The Stats panel in the editor tells me the non-instanced version runs faster. How can that be?
     
  2. HugPila

    See here
     

    Attached Files:

  3. LennartJohansen

    Joined:
    Dec 1, 2014
    Posts:
    2,394
    I would first try the regular Profiler on a build. The Stats window and profiling in the editor both have a lot of overhead.

    But in general, just enabling instancing will not speed things up that much. You are still using GameObjects, and Unity needs to cull these and group them into batches for instancing before drawing the final meshes. This work is similar in both cases. I would think the CPU is limiting the FPS, while the GPU is faster with instancing.

    To get the real benefits, try skipping the GameObjects. Do your own culling in jobs or compute shaders and draw the meshes with the instancing API.
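
As a rough, language-agnostic sketch of the bookkeeping this suggestion implies (written in Python for brevity; in Unity it would be C# feeding `Graphics.DrawMeshInstanced`, which accepts at most 1023 instances per call): cull the instance list yourself, then split the survivors into per-draw-call chunks. The helper names `cull_by_distance` and `chunk` are hypothetical, and the distance test is a stand-in for real frustum culling.

```python
from math import dist

# Graphics.DrawMeshInstanced accepts at most 1023 instances per call.
MAX_INSTANCES_PER_CALL = 1023


def cull_by_distance(positions, camera_pos, max_distance):
    """Keep only the instances within max_distance of the camera.

    Hypothetical stand-in for real frustum culling.
    """
    return [p for p in positions if dist(p, camera_pos) <= max_distance]


def chunk(items, size=MAX_INSTANCES_PER_CALL):
    """Split the visible instances into batches, one per instanced draw call."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# 5,000 instances spaced one unit apart along the x axis (made-up test data).
positions = [(float(x), 0.0, 0.0) for x in range(5000)]
visible = cull_by_distance(positions, camera_pos=(0.0, 0.0, 0.0), max_distance=2999.5)
batches = chunk(visible)

print(len(visible), [len(b) for b in batches])
```

With no GameObject per instance, the per-frame CPU work is roughly this filter plus one draw call per chunk.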
     
  4. HugPila

    From the Unity documentation:
    https://docs.unity3d.com/Manual/GPUInstancing.html

    It says rendering performance improves significantly with instancing. It is quite striking that you are telling me it does not speed things up that much after all, especially since my measurements show worse performance.

    What do you mean by skipping the game objects? What do you mean by "drawing the meshes with the instancing API"?

    Thanks a lot!
     
  5. hopeful

    Joined:
    Nov 20, 2013
    Posts:
    5,676
    Along similar lines ... low-poly meshes that share the same material are better combined than instanced.
     
  6. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,329
    I'd say Unity's documentation is a little too resolute in its assertion that instancing improves performance. It can improve performance. But it can also hurt it. This post from @zeroyao sums it up nicely.
    https://forum.unity.com/threads/gpu...m-to-improve-performance.437766/#post-2833127
    To add to the above quote, instanced rendering on the GPU is often slower than non-instanced rendering. At least if you are to compare a single draw vs an instanced draw of the same complexity. How much slower depends on the GPU. You might think "then what is the purpose of instanced rendering if it's slower?" The answer is it might not be significantly slower, especially on newer GPUs, and the minor additional rendering cost might still be an overall win if the GPU is spending less time idle waiting for rendering commands from the CPU.

    If you look at your screenshots you can see the render thread time has been reduced from 3.4 ms to 1.0 ms. However, your main thread time has increased from 35.1 ms to 42.3 ms. This could be due to the "small amount of overhead" mentioned in the quote above being a more significant percentage of your CPU time, or because your GPU has a much higher cost for instanced rendering than for non-instanced rendering and the main thread is mostly waiting in Gfx.WaitForPresent (i.e. GPU limited).

    Unity was hesitant to add instancing to the engine for a long time, specifically because on a lot of lower-end hardware it was a net negative.
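
To put the numbers quoted above in perspective, here is a small arithmetic sketch. It assumes the frame time is set by the slowest thread, which is a simplification, and the helper names are made up for illustration.

```python
# Thread timings quoted in the post above, in milliseconds.
non_instanced = {"main_ms": 35.1, "render_ms": 3.4}
instanced = {"main_ms": 42.3, "render_ms": 1.0}


def frame_ms(timings):
    """Assumption: the slowest thread determines the frame time."""
    return max(timings["main_ms"], timings["render_ms"])


def fps(ms_per_frame):
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


render_saved = round(non_instanced["render_ms"] - instanced["render_ms"], 1)
main_added = round(instanced["main_ms"] - non_instanced["main_ms"], 1)

print(f"render thread saved {render_saved} ms, main thread gained {main_added} ms")
print(f"non-instanced: {fps(frame_ms(non_instanced)):.1f} FPS")
print(f"instanced:     {fps(frame_ms(instanced)):.1f} FPS")
```

Under that assumption, the 2.4 ms saved on the render thread does not help, because the main thread is the bottleneck in both runs and got 7.2 ms slower.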
     
  7. HugPila

    Thanks a lot for your answer bgolus, very helpful.

    And yes hopeful, you are referring to meshes with fewer than 300 vertices, right?
     
  8. bgolus

    Dynamic batching is generally a lot faster to render on the GPU, and can be faster on the CPU. But yes, it has limits both in total vertex count and in the amount of data each vertex needs (position, normal, tangent, vertex color, UVs): either 300 vertices or 900 total attributes per mesh.

    The total number of meshes batched into one is limited by the mesh vertex limits. So it'll happily batch together multiple thousands of meshes as long as the total vertex count for all of those meshes combined is under ~65k vertices (the limit for meshes using a 16 bit index buffer). So if you have a mesh within the per mesh limit of 300 vertices you can batch together a little over 200 of them. But if your mesh is a single quad you can dynamically batch over 15,000 of them.

    Your example meshes appear to be just over 8k vertices, which means dynamic batching is out of the question. Though you could still use static batching if they don't need to move, or manual batching (via Mesh.CombineMeshes) if you want to manage it yourself. Though at ~8 meshes per "batch" it may not be worthwhile. Technically you could bump up the index buffer to 32 bits and then batch together several hundred thousand.
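
The batch sizes mentioned in this post can be checked with a few integer divisions. This is only a sketch of the arithmetic: it takes "~65k" to mean 65,535 vertices, rounds the "just over 8k" mesh down to 8,000 vertices, and treats the 32-bit case as a pure index-range limit, ignoring memory and other practical constraints.

```python
MAX_VERTS_16BIT = 2**16 - 1  # 65,535: assumed value of the "~65k" limit
MAX_VERTS_32BIT = 2**32 - 1  # index range only; real limits would be hit earlier


def meshes_per_batch(verts_per_mesh, max_verts=MAX_VERTS_16BIT):
    """How many copies of a mesh fit into one combined mesh."""
    return max_verts // verts_per_mesh


print(meshes_per_batch(300))                     # mesh at the 300-vertex limit
print(meshes_per_batch(4))                       # a single quad
print(meshes_per_batch(8000))                    # the ~8k-vertex example mesh
print(meshes_per_batch(8000, MAX_VERTS_32BIT))   # same mesh, 32-bit indices
```

These match the figures in the post: a little over 200 meshes at 300 vertices each, over 15,000 quads, about 8 of the example meshes per 16-bit batch, and several hundred thousand with 32-bit indices.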
     
  9. hopeful

    Like you, I was surprised to find instancing wasn't as magical as it first seemed. Object pooling, mesh combining, and imposters / billboards still have their roles in scene design, along with instancing. I'm still slowly figuring it all out.

    You are on the right track with checking performance empirically. That's the smart thing to do.

    @bgolus, BTW, is one of the best sources of knowledge you're going to find in the Unity community. :)
     
  10. BattleAngelAlita

    Joined:
    Nov 20, 2016
    Posts:
    400
    If by instancing you mean the "Enable GPU Instancing" checkbox in the material inspector, then yes, it's very, very slow and far from optimally written.
    If you do the instancing manually with BatchRendererGroup, performance will be much better.
     
  11. HugPila

    Thanks a lot for all the great info. Yes, it seems like a tough road ahead.

    I recently tried LODs. First in a static, empty scene, where it worked perfectly. And then in my main game scene, where, again, the performance got worse! I went from 4M triangles to 400k, and my FPS was still worse.
    How can this be explained?

    For dynamic batching, I don't need to turn on or check anything, right? It happens automatically once I fulfill the prerequisites?
     
  12. hopeful

    I think the challenge you keep meeting is that the optimizations aren't free. You pay for them in some way, and you need to be aware of that and work it into your performance budget.

    Keep playing with it, juggling parameters like distance and LOD levels. The most practical thing is to have few LOD levels with fairly drastic changes. Like ... full poly, half texture size, billboard, gone.
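
A minimal sketch of what "few LOD levels with fairly drastic changes" could look like as a distance lookup. The distance thresholds are invented for illustration and would need tuning per scene; only the four level names come from the post above.

```python
from bisect import bisect_right

# Hypothetical switch distances in world units; tune these per scene.
LOD_DISTANCES = [30.0, 80.0, 200.0]
# The four levels suggested above, from nearest to farthest.
LOD_NAMES = ["full poly", "half texture size", "billboard", "gone"]


def select_lod(distance):
    """Return the LOD level name for an object at the given camera distance."""
    return LOD_NAMES[bisect_right(LOD_DISTANCES, distance)]


for d in (10.0, 50.0, 100.0, 500.0):
    print(d, "->", select_lod(d))
```

Keeping the table this short is the point: each extra level adds switching and memory cost, which is part of the budget mentioned above.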