
Why would you not have GPU instancing on?

Discussion in 'General Graphics' started by SamohtVII, Aug 13, 2019.

  1. SamohtVII

    SamohtVII

    Joined:
    Jun 30, 2014
    Posts:
    332
    I am trying to optimise my game and came across GPU instancing, which seems perfect for the trees in my game. I am going to turn GPU instancing on for all of them, but why would such a feature not be on by default, with no need to ever turn it off? I haven't found a downside to it.
     
  2. AcidArrow

    AcidArrow

    Joined:
    May 20, 2010
    Posts:
    9,611
    It has been discussed before. It has overhead, it might make performance worse, it depends, profile your game.
     
  3. LightStriker

    LightStriker

    Joined:
    Aug 3, 2013
    Posts:
    2,709
    Have you a link to that?

    I know static and dynamic batching both have pros and cons, but I have yet to see cons for GPU instancing.
     
  4. SamohtVII

    SamohtVII

    Joined:
    Jun 30, 2014
    Posts:
    332
  5. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    11,823
    Instancing has some additional fixed costs associated with it, both on the CPU and GPU. For a small number of objects it's potentially more expensive to use instancing vs even just drawing each individually.
     
  6. godbian

    godbian

    Joined:
    Aug 3, 2020
    Posts:
    4
    Can you be more specific? I want to know more about it.
     
  7. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    11,823
    To do instancing, the CPU has to gather all of the renderers using the same material, mesh, and shader variant, then pack up all of the transforms and other per-instance data into one or more arrays and upload them to the GPU all at once. This is similar to the work that happens for dynamic batching, though instancing has the additional work of going through the optional material property blocks assigned to the renderers on top of just the base materials. Those arrays also either need to be recreated each frame if the number of objects changes, or you're reusing an existing larger array which you're re-uploading even though only part of the data is changing. Both have a cost, but you do want the number of objects to change, because you want to cull the objects not in view so you're not paying the GPU cost of calculating vertices for meshes you won't see.
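
The CPU-side gathering described above can be sketched roughly as follows. This is a plain Python illustration, not Unity's actual implementation: the dictionary fields (`mesh`, `material`, `variant`, `transform`, `color`) and the function name `build_instance_batches` are made up for the example, and the optional `color` field only stands in for a material property block override.

```python
from collections import defaultdict

# Hypothetical default used when a renderer has no per-instance override.
DEFAULT_COLOR = (1.0, 1.0, 1.0, 1.0)

def build_instance_batches(renderers):
    """Group renderers by (mesh, material, variant) and pack per-instance arrays."""
    groups = defaultdict(list)
    for r in renderers:
        # Only renderers that share all three can go into the same instanced draw.
        key = (r["mesh"], r["material"], r["variant"])
        groups[key].append(r)

    batches = []
    for key, members in groups.items():
        # One array per kind of per-instance data, indexed by instance ID.
        transforms = [r["transform"] for r in members]
        # Extra pass over the optional per-renderer overrides.
        colors = [r.get("color", DEFAULT_COLOR) for r in members]
        batches.append({"key": key, "transforms": transforms, "colors": colors})
    return batches

renderers = [
    {"mesh": "tree", "material": "bark", "variant": "lit", "transform": (0, 0, 0)},
    {"mesh": "tree", "material": "bark", "variant": "lit", "transform": (5, 0, 2),
     "color": (0.8, 1.0, 0.8, 1.0)},
    {"mesh": "rock", "material": "stone", "variant": "lit", "transform": (1, 0, 9)},
]
batches = build_instance_batches(renderers)
```

In this toy version the arrays are rebuilt from scratch on every call, which corresponds to the "recreated each frame" case; the other option mentioned above would keep a larger array around and overwrite it.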

    On the GPU, using instanced data adds some indirect data access, which adds some cost. When using instancing, all of the data is in those arrays, but the shader only gets told the instance ID and then has to go to each of those arrays and fetch the data from them, whereas with normal rendering or static/dynamic batching the data is handed directly to the shader, ready to be used immediately. How much this costs depends on the GPU, with mobile GPUs taking a bigger hit than desktop GPUs.
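
As a rough picture of that extra indirection, here is a small Python stand-in for the two shader paths. It is not real shader code: the function names are invented, and a simple translation is used in place of a full transform matrix.

```python
def shade_direct(vertex, transform, color):
    # Non-instanced path: the per-object data arrives ready to use.
    x, y, z = vertex
    tx, ty, tz = transform
    return (x + tx, y + ty, z + tz), color

def shade_instanced(vertex, instance_id, transforms, colors):
    # Instanced path: only the instance ID is handed in, so every piece of
    # per-instance data costs an extra array lookup before any work starts.
    transform = transforms[instance_id]
    color = colors[instance_id]
    return shade_direct(vertex, transform, color)

# Hypothetical per-instance arrays for two instances.
transforms = [(0, 0, 0), (5, 0, 2)]
colors = [(1.0, 1.0, 1.0, 1.0), (0.8, 1.0, 0.8, 1.0)]

direct_result = shade_direct((1, 2, 3), transforms[1], colors[1])
instanced_result = shade_instanced((1, 2, 3), 1, transforms, colors)
```

Both paths produce the same result; the instanced one just takes a detour through the arrays to get there.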

    One thing most people get wrong about both dynamic batching and instancing is that neither is really about making the GPU render faster; they're about GPU utilization. Technically speaking, the GPU is doing roughly the same amount of "work" in roughly the same amount of time regardless of whether objects are rendered individually, batched, or instanced (and actually more work for instancing). What both do is remove the downtime in which the GPU sits stalled waiting for additional commands from the CPU. If you tell a modern GPU, even a mobile one, to render a single sprite, it'll be finished with that in nanoseconds and then sit stalled waiting for the next command from the CPU. So both spend a little more time on the CPU to get a "bigger" draw ready, with the expectation that preparing one "bigger" draw will take less CPU time than the total for lots of individual draws, and that the GPU will stay busy long enough for the CPU to send the next draw command before the previous one finishes, so the GPU isn't waiting around.
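
The utilization argument can be illustrated with a toy timeline model. Everything here is an assumption made for the example: the function `gpu_idle_time`, the single-queue model where the GPU starts a draw as soon as both the command is ready and the previous draw is done, and the cost numbers, which are arbitrary units rather than measurements.

```python
def gpu_idle_time(draws):
    """draws: list of (cpu_submit_cost, gpu_exec_cost) in arbitrary time units.

    Returns (time the last draw finishes, total time the GPU sat waiting).
    """
    cpu_clock = 0      # when the CPU finishes preparing the current draw
    gpu_free_at = 0    # when the GPU finishes its previous draw
    idle = 0
    for cpu_cost, gpu_cost in draws:
        cpu_clock += cpu_cost
        start = max(cpu_clock, gpu_free_at)
        idle += start - gpu_free_at   # GPU stalled waiting for the command
        gpu_free_at = start + gpu_cost
    return gpu_free_at, idle

# 100 small draws: cheap for the GPU, but each one costs CPU time to submit.
individual = gpu_idle_time([(10, 2)] * 100)

# One "bigger" draw: more CPU prep and slightly more GPU work in total,
# but the GPU only waits once.
batched = gpu_idle_time([(150, 220)])
```

With these made-up numbers the GPU does a bit more work in the batched case, yet the frame finishes far sooner because most of the individual-draw timeline is the GPU waiting on the CPU.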

    But because getting those "bigger" draws takes a bit more time on the CPU, if you're only doing it with a smaller number of meshes you may end up increasing how long the CPU takes vs individual draws.
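
A back-of-the-envelope way to see that break-even point, assuming a simple linear cost model that is not taken from Unity: individual drawing costs `per_draw` per object, while instancing costs a one-off `fixed` setup plus a smaller `per_instance` for each object. The function names and the numbers are hypothetical; real values would have to come from profiling.

```python
import math

def cpu_cost_individual(n, per_draw):
    # n separate draw calls, each paying the full per-draw cost.
    return n * per_draw

def cpu_cost_instanced(n, fixed, per_instance):
    # One-off setup cost plus a smaller cost to pack each instance.
    return fixed + n * per_instance

def break_even_count(per_draw, fixed, per_instance):
    """Smallest object count at which instancing is strictly cheaper, or None."""
    if per_instance >= per_draw:
        return None  # instancing never wins under this model
    # Solve n * per_draw > fixed + n * per_instance for the smallest integer n.
    return math.floor(fixed / (per_draw - per_instance)) + 1

# Made-up costs in arbitrary units.
n_min = break_even_count(per_draw=10, fixed=60, per_instance=2)
```

Below `n_min` objects the fixed cost dominates and individual draws are cheaper in this model; above it, instancing pulls ahead.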