Clarifying how Particle systems use Instancing and Dynamic Batching

Discussion in 'General Graphics' started by Deleted User, May 15, 2019.

  1. Deleted User

    Deleted User

    Guest

    Hello!

    I've been doing some experiments to try and understand better how to optimize draw call count with particle systems. This is particularly important in my case where I'm targeting a mobile platform with very tight performance constraints.

    My first experiment was to have just one mesh-based particle system in my scene, with both instancing and dynamic batching disabled. In that case I was expecting each particle to take one draw call. Here is what I got instead in the Unity profiler (hooked up to the device): Draw Calls 2 Total Batches 2. One draw call probably comes from the Clear call for the background, which means it took one draw call to render the entire system even with instancing and dynamic batching off. This leads to my first set of questions: even with instancing and dynamic batching disabled, does Unity still batch particles within one system? If so, what's the benefit of using instancing here? Would it only allow me to have different properties per instance without breaking batching?

    I then duplicated the particle system. It's using the same material and the same settings except a different transform. Instancing and dynamic batching still off. When profiling on device I now get: Draw Calls 4 Total Batches 2. This seems somewhat consistent with how dynamic batching is supposed to work with particle systems: no extra batch but extra draw calls, supposedly cheaper than regular draw calls. This leads to more questions: why is the Draw Call count increasing by 2 and not one? Also, why is this happening given that I have dynamic batching off? Instancing wouldn’t allow me to combine draw calls across various particle systems, correct?

    Now I keep adding particle systems until there are 5 in total. Same settings, same material, just different transforms. Now I get Draw Calls 10 Total Batches 2.

    What happens if I do the same test but enable dynamic batching? Same results, which kind of makes sense, since the earlier results looked like the system was acting as if dynamic batching was enabled. Why is there no difference with dynamic batching on vs off?

    Thanks for your help!
     
  2. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,236
    A Particle System doesn't really use "dynamic batching" so much as it generates a combined mesh every frame to render with, which is similar to what dynamic batching does. It isn't limited to 300 vertices or 900 attributes per mesh, and can't be turned off* using project settings.

    Unity also will not dynamically batch two particle systems. Most of the time the meshes a particle system produces are likely more complex than the dynamic batching system would touch anyway. There are also other reasons, which I'll get to below.

    * It can be turned off on the particle system itself by enabling instanced rendering, assuming the material in use also supports particle instancing.

    The benefit of instancing is that it avoids the perf cost of generating a combined mesh on the CPU and uploading it to the GPU, which can get prohibitive with a lot of particles and/or complex particle meshes. 1000 particles of a 500 vertex mesh means you're uploading a 500,000 vertex mesh every frame. Actually, you may be generating and uploading 8 meshes a frame since Unity caps out at 65,536 vertices per mesh on some platforms. Instancing gets around this by only uploading the position data for those 1000 particles, which is way less data than 500,000 vertices. This means you can render much higher particle counts with less impact on the CPU, at a slightly higher GPU cost.
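
    As a rough check of those numbers, here is a minimal Python sketch. The function name is hypothetical, and it assumes a particle is never split across two meshes, which the post does not state.

```python
import math


def combined_mesh_upload(particles, verts_per_particle, max_verts_per_mesh=65_536):
    """Return (total_vertices, mesh_count) for a CPU-built combined mesh."""
    total_vertices = particles * verts_per_particle
    # Assumption: whole particles only, so each mesh holds this many particles.
    particles_per_mesh = max_verts_per_mesh // verts_per_particle
    mesh_count = math.ceil(particles / particles_per_mesh)
    return total_vertices, mesh_count


total, meshes = combined_mesh_upload(particles=1000, verts_per_particle=500)
print(total, meshes)  # 500000 8
```

    With the instanced path, the per-frame upload scales with the particle count only, not with particles times vertices.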

    I would suggest you check the frame debugger and see where the "draw calls" are coming from. I do not believe Clear() counts. But otherwise, yes, a particle system will take a single draw call regardless of instancing or batching settings, as it ignores those project settings.

    Could be a two pass shader, could be shadows, could be you have two lights in the scene, could be you have the camera depth texture enabled (can be enabled via script on the camera, or by having soft particles enabled in the quality settings, or by having a directional light in the scene on non-mobile platforms). No idea. You'd have to look yourself.

    See above.

    It would not. It certainly could, but it's not something Unity does. And there's a good reason for that too: sorting. All particles within a single system are sorted between themselves depending on the settings you select, but then each particle system is sorted against other particle systems in the scene. Oftentimes you'll have multiple parts of an effect that need to sort in a specific order, and you want two copies of that effect to sort separately. For example, imagine a cartoon explosion with smoke and "BLAM" text that pops up. Within one copy of the effect you always want the "BLAM" to show up on top of the smoke, but you probably want the smoke from a closer copy to render on top of the "BLAM" from a copy behind it. That means you have to render smoke > blam > smoke > blam, which you can't do if you batch or combine instances.

    Again, they could do it in cases where two particle systems ended up next to each other in the sorted render order, or if they were opaque, but it probably wasn't worth the added complexity for the rare cases where it would be a gain.
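
    The sorting argument can be illustrated with a small Python model. This is only a sketch of the reasoning, not how Unity's renderer is implemented: it assumes draws can be merged only when matching systems sit next to each other in the sorted render order, and the `count_draws` helper is hypothetical.

```python
from itertools import groupby


def count_draws(sorted_systems):
    """Count draws when only neighbours in render order can be merged."""
    # groupby collapses consecutive equal entries into one group.
    return sum(1 for _ in groupby(sorted_systems))


# Two copies of the effect, back to front: far smoke, far BLAM, near smoke, near BLAM.
interleaved = ["smoke", "blam", "smoke", "blam"]
# Hypothetical case where the sort happens to place matching systems together.
adjacent = ["smoke", "smoke", "blam", "blam"]

print(count_draws(interleaved))  # 4
print(count_draws(adjacent))     # 2
```

    In the interleaved order nothing can be merged without changing what ends up on top, which is the case described above.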
     
  3. Deleted User

    Deleted User

    Guest

    bgolus, thank you infinitely for your detailed answer, this clarifies A LOT! A few follow-ups:

    Does this mean that unless my particle systems have high particle count and / or complex meshes it wouldn't be worth it to switch them to use instancing? Or would it still be beneficial, particularly if I'm CPU-bound? I'm expecting most of my systems to be low particle count (20 or less) and simple meshes (100 vertices or less). Probably rarely more than 2-3k verts total per system.

    Interesting, I'll have to double check. The test project I'm using for profiling has no shadows or lights, but I'll look into the depth texture and soft particles settings you mention.

    I see. I'm expecting most of my particle systems to be opaque so sorting wouldn't be an issue. Might be worth it to explore how I could build a system that would leverage instancing across various particle systems on top of Unity, but might be difficult. Any thoughts on how you'd approach it? More generally, if you had several instances of the same particle system trigger at once in your scene, how would you approach minimizing the number of draw calls?

    Thank you!
     
  4. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,236
    If you’re CPU bound, it might be worth it. It’s not hard to test, assuming you’re already using a particle shader that has support for particle mesh GPU instancing (like the two Particle Standard shaders).

    Use a single particle system and spawn particles using the Emit() function or manually updating the particle list with Get/SetParticles().
     
  5. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,239
    If you are CPU bound, I would expect using instancing to always be more efficient, even on simple mesh particle systems.
     
  6. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,239
    Using Emit() from script is a good way to fake multiple systems as one. Just remember they get culled/drawn as one giant system: if one particle is visible, they all are :)
     
  7. Deleted User

    Deleted User

    Guest

    Thank you to you two for your help! I have a lot of things to go experiment with now :)
     
  8. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,236
    Maybe.

    An instanced particle at minimum passes a float3x4 matrix, uint color, and float animFrame for each particle. That's 448 bits per particle that has to be transferred from the CPU to the GPU.

    A default billboard particle combined mesh is at minimum a float4 position, float2 uv, and byte4 color per vertex, multiplied by 4. That's 224 bits per particle of vertex data, plus 3 16 bit indices for each triangle, it comes out to a total of 272 bits per particle.

    So the CPU is doing an additional 4 transforms per quad, but it's transferring 60% of the data. It'll depend heavily on where exactly the bottleneck is.


    Plus isn't the instanced path always going to be sending some max number of particles (whatever the buffer size is) worth of data? Like, if the particle system does a burst of 1000 particles, it's now always sending at least 1000 particles' worth of data even if there are only 10 particles visible? The actual API side of data transfers is a blind area of mine since I've always worked on projects in a position where that's abstracted away, so I don't know if you can update only a section of a structured buffer.
     
  9. Deleted User

    Deleted User

    Guest

    Turns out this was because I'm targeting VR and I had single pass stereo disabled so everything was rendered once for left eye, once for right eye.


    Did a little more exploration today and getting more expected results now that I understand better how things work thanks to your help! One thing I'm curious about is that when I use instancing for my 5 particle systems I get Draw Calls 5 Total Batches 5, whereas when I don't use it and rely on Unity's mesh generation I get Draw Calls 5 Total Batches 1. Is this happening for the same reason described here about how dynamic batching works with particle systems although I'm not actually using dynamic batching?

    Meaning that even though both approaches have the same number of draw calls, the draw calls in the latter case without instancing are cheaper than with instancing?
     
  10. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,239
    You only added 3 indices, not 6 ;)

    It’s an interesting comparison though.
    And from our testing, the benefits do indeed scale exponentially based on mesh complexity, so maybe at the very bottom end there is no clear winner - admittedly I haven’t checked that.
     
  11. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,236
    Woops! 320 bits, so ~70% of the data. Still a decent savings.

    For mobile devices I suspect the old-school combined mesh method would still be the winner due to the reduced bandwidth, even though the CPU is way slower than a desktop's*. Then again, everything is slower there. On consoles or modern desktops I suspect instancing wins 100% of the time due to increased bandwidth / shared memory, and crazy fast GPUs.
    * Except when it's not ... modern SoCs are getting amazingly fast compared to laptops from just a few years ago.

    I haven't tested too much either, though for PS4 & my ~3 year old workstation I didn't see any obvious advantages from either option, but I'm also not trying to stress the system too much. If you swapped to triangle meshes though, it's less than half the data! Probably a more obvious win / wash there even on newer hardware.
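
    The per-particle arithmetic from the last few posts can be checked with a short Python sketch. It assumes the attribute layouts exactly as listed in the thread, with 32-bit floats and 8-bit color channels. Note that float4 position + float2 uv + byte4 color sums to 224 bits for a single vertex, so the sketch prints both the total as posted (224 bits of vertex data per quad) and the total when all four vertices are counted.

```python
BITS_FLOAT = 32

# Instanced path, per particle: float3x4 matrix + uint color + float animFrame.
instanced_bits = 12 * BITS_FLOAT + 32 + BITS_FLOAT

# Combined-mesh path, per vertex: float4 position + float2 uv + byte4 color.
vertex_bits = 4 * BITS_FLOAT + 2 * BITS_FLOAT + 4 * 8

# A quad is two triangles, so six 16-bit indices.
quad_index_bits = 6 * 16

# Total as posted in the thread: 224 bits of vertex data plus the index data.
as_posted_bits = vertex_bits + quad_index_bits

# Total if the 224 bits are counted once for each of the quad's four vertices.
four_vertex_bits = 4 * vertex_bits + quad_index_bits

print(instanced_bits)                                                  # 448
print(as_posted_bits, round(as_posted_bits / instanced_bits, 2))       # 320 0.71
print(four_vertex_bits, round(four_vertex_bits / instanced_bits, 2))   # 992 2.21
```

    Under the four-vertex count the combined quad carries more data per particle than the instanced path rather than less, so which path transfers less depends on how the vertex data is counted. Either way, the advice to profile on the target device still applies.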
     
  12. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,239
    Flurgle and bgolus like this.
  13. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,236
    The default sphere is 515 verts, 768 tris, so that's 40469 bits per particle using 16 bit indices, 77333 bits if you end up using 32 bit indices, as you would for 10k particles. That's just over 92 MB (or just over 48 MB, and 79 separate meshes!) a frame for 10k default spheres. Compared to 0.5 MB for instancing, and not having to process 5150000 vertices ... yeah, there's no question which one is going to be faster on the CPU. :)
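
    A minimal Python sketch of the 10k-sphere comparison follows. The vertex count, mesh cap and 448-bit instance size come from the thread. The combined-mesh size additionally assumes that every sphere vertex uses the 224-bit layout from the earlier billboard post and that particles are not split across meshes, neither of which the thread confirms for mesh particles.

```python
import math

PARTICLES = 10_000
SPHERE_VERTS = 515
SPHERE_TRIS = 768
MAX_VERTS_PER_MESH = 65_536
INSTANCE_BITS = 448   # float3x4 + uint color + float animFrame
VERTEX_BITS = 224     # assumed layout: float4 position + float2 uv + byte4 color
MIB = 8 * 2**20       # bits per MiB

total_vertices = PARTICLES * SPHERE_VERTS
particles_per_mesh = MAX_VERTS_PER_MESH // SPHERE_VERTS
mesh_count = math.ceil(PARTICLES / particles_per_mesh)

# Instanced path: one small record per particle.
instanced_mib = PARTICLES * INSTANCE_BITS / MIB

# Combined-mesh path with 16-bit indices, under the assumed vertex layout.
bits_per_sphere = SPHERE_VERTS * VERTEX_BITS + SPHERE_TRIS * 3 * 16
combined_mib = PARTICLES * bits_per_sphere / MIB

print(total_vertices, mesh_count)  # 5150000 79
print(round(instanced_mib, 2))     # 0.53
print(round(combined_mib, 1))      # 181.5
```

    Under these assumptions the combined mesh comes out larger than the figures quoted in the post, which only widens the gap to instancing.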
     
  14. Deleted User

    Deleted User

    Guest

    One other thing I noticed with the old-school combined mesh method vs instancing is that although the number of draw calls is the same, the number of batches is different: 1 for old-school, 5 for instancing. Would that make each draw call with the old-school method cheaper, similarly to how dynamic batching works with particle systems? Thanks!
     
  15. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,236
    Honestly, ignore that. The batch counts are kind of buggy.
     
  16. Antony-Blackett

    Antony-Blackett

    Joined:
    Feb 15, 2011
    Posts:
    1,772
    I’ve never found them to be buggy. Sometimes it’s unclear where they come from but the frame debugger helps that a lot.


    As for optimising particle systems, I’ve always found that reducing the number of systems is the best approach. Move a single system around and call Emit(). On mobile it’s really the only option if you want performance on a wide range of hardware.
     
  17. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,236
    * with instancing

    A single particle system with a few thousand particles will show multiple "batches" in the stats, but it's unclear what this means since the actual GPU is rendering them as a single draw call, and there's no reason for it to be merging anything.
     
  18. DavidSWu

    DavidSWu

    Joined:
    Jun 20, 2016
    Posts:
    183
    Just to clarify, is it true that particle system gpu instancing does not work with URP?
    We have a massive slow down with high vertex mesh particle systems, and looking at the profiler, it seems like it is doing DrawDynamicParticleSystem.
    The source for the URP Particle Shaders seems to be SRP compatible (Which to my knowledge, has no effect when rendering using a particle system), and not instancing compatible.
    Is it worth trying to make them instancing compatible, or is that a known dead end with URP?

    Thanks,
     
  19. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,239
    Correct. The URP team have it on their roadmap to support GPU instancing for particles, but it's not there yet.

    There's no technical reason why it won't work - just the work hasn't been done.
    It will mostly be a case of transplanting the code from the built-in Particle System shaders.
     
  20. DavidSWu

    DavidSWu

    Joined:
    Jun 20, 2016
    Posts:
    183
    Thank you for the information.
    I tried a few random permutations and somehow got it working with URP 7.21.
    I have not had luck with a project using LWRP 6.x. Do you know if there are differences related to the RP or is it more likely that my random permutation of hacks needs to be a different random permutation of hacks for a different project?
     
  21. DonCornholio

    DonCornholio

    Joined:
    Feb 27, 2017
    Posts:
    86
    So I saw on this board that GPU instancing is now available in URP10 / 2020.2 beta. Is this correct, and will the feature be added to 2019.4 LTS, which supports only URP7 (afaik)?
     
  22. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,239
    Yep :)

    Nope :(
     
  23. DonCornholio

    DonCornholio

    Joined:
    Feb 27, 2017
    Posts:
    86
    Is it possible to port particle mesh GPU instancing to URP7.x, or does that depend on engine/editor code that the user doesn't have any access to? I wonder if it is even worth trying. I already tried to copy over the respective shader code but it didn't seem to work (although I might have missed something).
     
  24. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,239
    Yeah it should definitely be possible. We deliberately backported the tiny bit of required engine code, to make this possible, but backporting all the URP changes was deemed too high risk.

    Would you mind starting a new thread for this (because this thread is a little derailed) :)

    If you say what problems you hit when doing it, and tag me in the post, I’ll take a look and try to help you out.
     
  25. DonCornholio

    DonCornholio

    Joined:
    Feb 27, 2017
    Posts:
    86
    That's great news ! Thank you very much for being so helpful :)
     
  26. DonCornholio

    DonCornholio

    Joined:
    Feb 27, 2017
    Posts:
    86
    I've come pretty far rather quickly!
    I opened up another thread for this - https://forum.unity.com/threads/particle-gpu-instancing-in-urp7-2019-4.996626/
    If you have the time to take a look, I'd be glad @richardkettlewell !
     