Just how big a deal is Batch by Shader?

Discussion in 'Universal Render Pipeline' started by Enoch, Oct 2, 2019.

  1. Enoch

    Enoch

    Joined:
    Mar 19, 2013
    Posts:
    198
    I am still trying to evaluate URP in terms of when and if I should use it for projects. I honestly don't know much about this feature. The asset store has plenty of assets built around solving the issue of rendering speed, and most involve trying to get fewer draw calls, usually by packing geometry with the same material into one model.

    I have always hated this limitation and the idea of always needing to be concerned with draw calls. Is batching by shader at the engine level the final answer to this annoyance? Can I now build the ultimate virtual dreamworld without any concern for how many GameObjects I am pumping out to the engine? Will the engine finally be able to optimise this for me?

    If this is true, I might just be fine with 8 lights per object (until deferred rendering?), since my objects no longer need to be merged into the same object geometry. I can deal with that and all of the other shortcomings of URP if I can finally not have to think about object batching as much.

    BTW, are there other limitations to the batching in URP? Is it like batching SRP, where it wouldn't batch objects with more than X (900?) verts?
     
  2. joshcamas

    joshcamas

    Joined:
    Jun 16, 2017
    Posts:
    1,277
    There is no, and never will be, a magic bullet that will fix draw calls. There will always be limits, and systems like the static batcher, dynamic batcher and SRP batcher are some ways to reduce draw calls. All of these have pros and cons, with good uses and bad. As a game developer, you'll always need to be mindful of this.

    But from what I've seen, the SRP Batcher is pretty awesome, and is a pretty amazing batcher. I don't know the details, so I can't answer your final questions, but if you're really wanting to throw around thousands of meshes everywhere I suppose SRP would be your best bet :')
     
    Enoch and spryx like this.
  3. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,019
    There are no such limitations on how this gets batched together. It's batched on a shader basis. This reduces the amount of state changes between real graphics API draw calls, which makes it faster.
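
    To illustrate the point about state changes, here is a toy model in Python. It is only a sketch, not how the engine is implemented: it assumes each draw is a (shader, material) pair and that every change of bound shader counts as one expensive state change. The function names and sample data are made up for the example.

```python
# Toy model only: each draw is a (shader, material) pair, and every change
# of bound shader counts as one expensive state change. All names are made up.

def count_shader_switches(draws):
    """Count how often the bound shader changes while walking the draw list."""
    switches = 0
    bound_shader = None
    for shader, _material in draws:
        if shader != bound_shader:
            switches += 1
            bound_shader = shader
    return switches


def group_by_shader(draws):
    """Reorder draws so that draws sharing a shader are adjacent (stable sort)."""
    return sorted(draws, key=lambda draw: draw[0])


draws = [
    ("Lit", "Wood"),
    ("Unlit", "Glow"),
    ("Lit", "Stone"),
    ("Unlit", "Sky"),
    ("Lit", "Metal"),
]

print(count_shader_switches(draws))                   # 5
print(count_shader_switches(group_by_shader(draws)))  # 2
print(len(group_by_shader(draws)))                    # 5
```

    The number of draws stays at five in both cases. Only the number of shader switches drops, which is the kind of saving described above.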
     
    Enoch likes this.
  4. WendelinReich

    WendelinReich

    Joined:
    Dec 22, 2011
    Posts:
    228
    Hi @aleksandrk, I'd like to follow up on that because I'm looking into how the SRP Batcher can simplify our production pipeline and help optimize graphics performance.

    - Just to clarify, the SRP Batcher only does static batching, right? (For some reason, this is never actually mentioned in the manual page here.) Does it have all of the limitations of static batching then, or is it somehow different?

    - Batching-by-Shader is extremely interesting to us. Does it mean that texture atlases are now less important, because it doesn't matter if one big or several smaller textures (used by multiple materials that share the same shader variant) are passed over to GPU memory in a single draw call?

    - The manual says that data persistence in GPU memory only works when material properties aren't changed at runtime. Some materials require dynamic per-frame updates (e.g., vegetation affected by wind). Is the SRP Batcher still efficient in such a scenario?

    - Could you tell us how the SRP Batcher relates to GPU instancing? What would be a rule of thumb to decide between one or the other?

    This is exciting stuff. Thank you :)
     
    Lars-Steenhoff likes this.
  5. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,019
    Nope, the opposite :)
    It's a purely runtime batcher.
    Texture atlases are still useful. Switching a texture can still have an overhead. The difference is that it won't cost as much as it did with the previous batching system.
    Well, it's still efficient. The diagram in the link says that it uploads the custom buffers only when the data changes, so once per frame should be fine.
    I'll take a look at the code and answer later :)
     
    WendelinReich likes this.
  6. WendelinReich

    WendelinReich

    Joined:
    Dec 22, 2011
    Posts:
    228
    I see. In that case, why do you allow us (in the URP settings asset) to toggle Dynamic Batching on and off?
    [Screenshot: upload_2020-5-11_21-0-51.png]

    Cool, I'll watch this space. Thanks already for your answers!
     
  7. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,019
    No idea :)
    I would think that they are mutually exclusive, but I might be totally wrong here.

    Re: Instancing
    I'm not entirely sure in which cases you really have to pick one over the other. I suppose there's a thread about this somewhere (and if not, I guess it would deserve a separate one).

    If I were to choose between the two, I would first write a couple of tests to measure the performance on my target platforms, and then make an informed decision :)
     
  8. WendelinReich

    WendelinReich

    Joined:
    Dec 22, 2011
    Posts:
    228
     
  9. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,019
    @WendelinReich just did, thanks for reminding me to do that :)
    They are _kinda_ mutually exclusive. If the shader is SRP batcher compatible and SRP batcher is turned on, it will not do dynamic batching. If the shader is not SRP batcher compatible, though, it can use dynamic batching.
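
    The rule above can be sketched as a small decision function in Python. This is a paraphrase of the post, not engine code, and the function name, parameters, and return labels are hypothetical. The case of a compatible shader with the SRP Batcher turned off is not covered in the post, so falling back to dynamic batching there is an assumption.

```python
def pick_batching_path(shader_is_srp_compatible, srp_batcher_enabled,
                       dynamic_batching_enabled):
    """Return a label for the batching path one object would take.

    Hypothetical helper that paraphrases the rule from the post.
    """
    # A compatible shader with the SRP Batcher on never uses dynamic batching.
    if shader_is_srp_compatible and srp_batcher_enabled:
        return "srp_batcher"
    # Otherwise dynamic batching can still be used if it is switched on.
    # (Assumption: this also applies when the SRP Batcher is turned off.)
    if dynamic_batching_enabled:
        return "dynamic_batching"
    return "unbatched"


print(pick_batching_path(True, True, True))     # srp_batcher
print(pick_batching_path(False, True, True))    # dynamic_batching
print(pick_batching_path(False, True, False))   # unbatched
```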
     
  10. o1o101

    o1o101

    Joined:
    Jan 19, 2014
    Posts:
    639
    Epic disagrees. ;)
    From my experience I have worse performance with the SRP batcher compared to legacy methods on iOS and Android.
    My setup is all good: I use the same shader variants everywhere possible, etc., and I'm not GPU bound. Still, the SRP batcher bumps my CPU time past 16.7 ms, which never happens with legacy methods.
     
  11. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,019
    This is possible. In my experience, it depends on the actual content, hardware and graphics API.
     
  12. WendelinReich

    WendelinReich

    Joined:
    Dec 22, 2011
    Posts:
    228
    It really shouldn't. "Actual graphics API" on Android is one out of three (OpenGLES2/3, Vulkan), and OpenGLES3 seems a reasonable default for most games on most devices at the moment. "Actual hardware" should also not have an impact on whether or not we use the SRP Batcher - that's kinda the point of using Unity in the first place, right? And when it comes to "actual content", OP said they use the same shader variants everywhere possible, so this should be a perfect use case for the batcher.

    This is important - could you please help us understand if the SRP Batcher is or isn't a reliable way of achieving better performance on a vast majority of mobile devices? (Assuming of course its conditions are met, especially the one about having a very small number of shader variants.)
     
  13. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,019
    It probably shouldn't, but that's how it is.

    SRP Batcher requires using constant buffers. Using constant buffers instead of individual variables can be slower on some mobile hardware (subject to actual content). At the same time, SRP Batcher makes the general draw call setup cheaper. So, depending on which one outweighs the other (the performance loss from using constant buffers or the performance gain from cheaper draw call setup), it can be either faster or slower. This can also vary between graphics APIs, since the drivers can do different optimizations there. With Vulkan, for example, SRP batcher should be faster in all cases (I haven't seen it being slower, at least). In GLES3, there are various vendor-specific driver optimizations that can be disabled when using constant buffers, so performance may suffer from turning it on, and this depends on actual content.
    GLES2 is out of this equation, as there's no constant buffer support anyway :)
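
    The trade-off above comes down to simple arithmetic, sketched here in Python. The function name is hypothetical and the per-draw costs are invented placeholder numbers, not measurements from any device. Real values would have to come from profiling on the target hardware.

```python
def srp_batcher_net_gain_us(draw_calls, setup_saving_per_draw_us,
                            cbuffer_penalty_per_draw_us):
    """Net CPU time saved per frame, in microseconds (negative means slower).

    Hypothetical model: gain from cheaper draw call setup minus the loss
    from using constant buffers, both assumed to scale with draw count.
    """
    return draw_calls * (setup_saving_per_draw_us - cbuffer_penalty_per_draw_us)


# Invented numbers: cheap constant buffers, so the batcher wins.
print(srp_batcher_net_gain_us(1000, 4, 1))   # 3000

# Invented numbers: constant buffers are slow on this driver, so it loses.
print(srp_batcher_net_gain_us(1000, 4, 9))   # -5000
```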

    So there can be no universal "turn SRP batcher on and enjoy faster rendering everywhere" answer. For some GPUs we had to disable constant buffers completely (and hence SRP batcher), because using them resulted in an order of magnitude slower performance, even in the perfect case.

    Devices that support SRP Batcher usually provide Vulkan support as well anyway (as it requires OpenGL ES 3.1), so if your graphics API list has Vulkan as the first graphics API, in the majority of cases performance will be better if SRP Batcher is turned on.
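
    As a rough summary of this post, the expectations per graphics API can be written as a lookup table in Python. The dictionary, the function name, and the API labels are made up for illustration, and the wording of each entry is a paraphrase rather than an official statement.

```python
# Paraphrased from the post above; labels and wording are illustrative only.
SRP_BATCHER_OUTLOOK = {
    "Vulkan": "usually faster with the SRP Batcher on",
    "OpenGLES3": "can go either way, measure on the target device",
    "OpenGLES2": "not available, no constant buffer support",
}


def outlook_for(api_list):
    """Return the outlook for the first graphics API in a non-empty list."""
    first_api = api_list[0]
    return SRP_BATCHER_OUTLOOK.get(
        first_api, "unknown, measure on the target device")


print(outlook_for(["Vulkan", "OpenGLES3"]))
print(outlook_for(["OpenGLES3"]))
```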
     
    WendelinReich and Lars-Steenhoff like this.