Best way to render meshes with data from ComputeBuffers

Discussion in 'General Graphics' started by Mese96, May 27, 2019.

  1. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    Hello everyone,


    I am currently working on a project where hundreds of meshes (about 650 at the moment) need to be rendered, all of which get their vertex data from ComputeBuffers. Right now there is one buffer per object, plus some data that is shared between them and some that is “per instance”. All share the same material.

    Neither the number of meshes nor the number of vertices/triangles can be known at build time.


    What is the most efficient way to render them?
    Is there some way to instance or batch them?

    Or would it be a good idea to combine them into one very big mesh and add extra vertex data to “assign” each vertex to a mesh?



    Thanks for your input
     
    deus0 likes this.
  2. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
  3. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    When the models have different vertex counts, and that is the case for most of them, I would still need to call it once per object, right? Would there then be any difference from using meshes (except a little less overhead)?

    DrawProceduralIndirect would probably work well with one or more multi-mesh buffers, as I can specify the vertex offset. But there is no way to specify a range of indices to render...
     
    deus0 likes this.
  4. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    Indeed the Vertex Count for these methods is shared between instances, so it is most effective for drawing many copies of the same mesh data.

    Some options:

    - dispatch the largest vert count of all your meshes, then discard the vertices in the vertex shader (by setting them to 0,0,0) for all verts that fall outside the range of the currently drawn mesh, i.e. DrawProcedural(max(vertCountOfEachMesh), meshCount). The GPU will have to spend a bit of time on vertices you will throw away, but that cost ought to be negligible unless you have massively varying vertex counts. (Profile and check)
    - if your mesh data can be thought of as one big "polygon soup" i.e. just one big pile of triangles, submit the vertex count of all vertices of all meshes, and an instance count of 1. It's harder to apply your per-mesh properties with this approach though, as a vert would need to figure out which mesh it belongs to
    - call DrawProcedural once per mesh. It means quite a lot of draw calls, which may have a noticeable CPU cost, but solves all your other issues

    Assigning the results to a mesh (or meshes) is definitely possible. It requires reading the data back to the CPU (slow), building a mesh (also slow) and then uploading the mesh (semi fast), so it depends on how often you want to do this operation. Once it's done you're going to get 1 draw call per mesh, so it will be somewhat similar to my 3rd suggestion above.
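
    The first option above can be modelled on the CPU to see what each vertex invocation has to do. This is a minimal Python sketch of the index logic only, not Unity or shader code. The names `vertex_stage`, `draw_instanced` and `wasted_invocations` are hypothetical, and each mesh is simply assumed to be a list of positions.

```python
ORIGIN = (0.0, 0.0, 0.0)

def vertex_stage(vertex_id, mesh_vertices):
    """Model of one vertex invocation: fetch the vertex, or collapse it to the origin."""
    if vertex_id >= len(mesh_vertices):
        # Outside this mesh's range: all such vertices coincide, so the
        # triangles they form are degenerate and cover no pixels.
        return ORIGIN
    return mesh_vertices[vertex_id]

def draw_instanced(meshes):
    """Model of one draw with max(vertex count) vertices and len(meshes) instances."""
    max_count = max(len(m) for m in meshes)
    return [[vertex_stage(v, m) for v in range(max_count)] for m in meshes]

def wasted_invocations(meshes):
    """Number of vertex invocations spent on vertices that are thrown away."""
    max_count = max(len(m) for m in meshes)
    return sum(max_count - len(m) for m in meshes)

# Example: a 3-vertex mesh and a 6-vertex mesh drawn as two instances.
small = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
large = [(float(i), 1.0, 0.0) for i in range(6)]
output = draw_instanced([small, large])
```

    In this example the small mesh pays for three discarded vertices, which is the cost referred to above; it grows with the spread between the smallest and largest vertex counts.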
     
    deus0 and Mese96 like this.
  5. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    Essentially my mesh data is the big “polygon soup”. The compute buffer route is used because I need multiple sets of positions (and other data) to interpolate between, and it is very slow to interpolate on the CPU and assign millions of vertices every frame.
    So there is no read back from the GPU anyway. Using meshes or DrawProcedural per mesh should not make such a big difference. (Currently I am using a MeshRenderer approach.)
    As I started to think about it, my goal was to reduce draw calls…
    I would go for option two if I found a straight-forward way to apply per-mesh properties.

    Option 1 is a really good idea though.

    Maybe not with the same count for every mesh (some have a few hundred vertices, some have 100k),
    but I could group them by similar vertex counts.
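
    One possible way to form such groups, sketched in Python as an illustration only: sort the vertex counts and start a new group whenever a count exceeds the smallest count in the current group by more than a chosen factor. The function names and the `max_ratio` parameter are assumptions made for this sketch and do not come from the thread; the best grouping would have to be found by profiling.

```python
def group_by_similar_count(vertex_counts, max_ratio=2.0):
    """Greedy grouping: a group holds counts within max_ratio of its smallest member."""
    groups = []
    for count in sorted(vertex_counts):
        if groups and count <= groups[-1][0] * max_ratio:
            groups[-1].append(count)
        else:
            groups.append([count])
    return groups

def padded_vertices(groups):
    """Vertex invocations issued when every group is drawn with its largest count."""
    return sum(max(group) * len(group) for group in groups)

# Example with counts ranging from a few hundred to 100k and more.
counts = [300, 100_000, 450, 80_000, 500, 120_000]
groups = group_by_similar_count(counts)
one_draw = padded_vertices([counts])   # everything padded to the largest mesh
grouped = padded_vertices(groups)      # one draw per group
```

    With these numbers a single padded draw issues 720,000 vertex invocations for 301,250 real vertices, while two groups issue 361,500, at the price of one extra draw call.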

    I will look into both options this week.
    Thanks for the ideas.
     
    richardkettlewell likes this.
  6. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    Yes that could work really well :)

    Good luck!
     
  7. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    Last edited: May 29, 2019
  8. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    deus0 likes this.
  9. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    Ehm, yeah, that was stupid, I could have seen that *is a bit ashamed*
    Maybe I should not ask questions when I'm not awake.
     
    richardkettlewell likes this.
  10. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    *bravely asks another question while not awake*

    On my first attempt at instancing, Unity complained that I can't have ComputeBuffers as instanced properties.
    This would make Option 1 kinda useless, as I would not be able to assign individual vertex sets.
    Or am I mistaken?
     
    Last edited: May 29, 2019
  11. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    Take a look at https://docs.unity3d.com/ScriptReference/Graphics.DrawMeshInstancedIndirect.html for how to assign ComputeBuffers to instanced shaders. The key is to use Procedural Instancing, instead of our "built-in" instancing. This gives you full control over how your instance data is populated/loaded.

    (Even though that's not the API you are using, the concept is the same: shader.setBuffer("MyComputeBuffer", buf) etc.)
     
    deus0 likes this.
  12. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    How to assign them to shaders is not the problem here.
    The example uses one buffer and the instance ID to select one element.
    It would need some kind of Array of Buffers and use the instanceID to select one Buffer.
    (Probably the way to go is one buffer and offset the index accordingly)
     
  13. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    Oh sorry yes I understand now.
    Yes, you can’t have arrays of compute buffers in a shader; you need one big buffer and per-instance offsets. A second ComputeBuffer of ints, holding the start index of each instance in the big compute buffer, could work, for example.
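
    The layout described here can be sketched on the CPU as follows. This is plain Python standing in for the buffer setup, not Unity code; `pack_meshes` and `fetch_vertex` are hypothetical helper names, and strings are used in place of vertex structs to keep the example readable.

```python
def pack_meshes(meshes):
    """Concatenate per-mesh vertex lists and record where each mesh starts."""
    big_buffer = []
    start_indices = []
    for mesh in meshes:
        start_indices.append(len(big_buffer))
        big_buffer.extend(mesh)
    return big_buffer, start_indices

def fetch_vertex(big_buffer, start_indices, instance_id, vertex_id):
    """Lookup a shader would do: offset of the instance plus the local vertex index."""
    return big_buffer[start_indices[instance_id] + vertex_id]

# Three meshes with 3, 6 and 3 vertices.
meshes = [
    ["a0", "a1", "a2"],
    ["b0", "b1", "b2", "b3", "b4", "b5"],
    ["c0", "c1", "c2"],
]
big_buffer, start_indices = pack_meshes(meshes)
```

    Here `start_indices` plays the role of the second buffer of ints: instance 1, vertex 4 resolves to element 3 + 4 = 7 of the big buffer.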
     
    Mese96 likes this.
  14. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    Got the basic procedural code working, also the instance offset etc.
    But I am stuck in getting the unity_InstanceID as it is an undeclared identifier.
     
  15. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    That is part of our automatic instancing stuff. Possibly if you add #pragma multi_compile_instancing it will appear. However, I think you're best off just declaring a vertex shader input: uint instanceID : SV_InstanceID
     
    deus0 likes this.
  16. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    I already had the pragma, but the uint instanceID : SV_InstanceID worked well.
    Thanks again.

    Now I need to find an ideal grouping count (and grouping method). A count of 10 leads to 100% GPU usage; 20 gives 70% and a good deal more FPS than with meshes. I will report back once I have cleaned up the remaining mess.
     
    richardkettlewell likes this.
  17. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
  18. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    It is a polygon soup insofar as what I need to render is all individual mesh data, and not "we have this exact same geometry 100 times", and each part can have per-part properties.

    So mostly this.
     
    deus0 likes this.
  19. Mese96

    Mese96

    Joined:
    Jul 23, 2013
    Posts:
    40
    While working on it I noticed that I have this "figure the mesh out" problem anyway, so I went for the "one mesh to render them all" approach.
    One question though:
    Why is there no option for ComputeBuffer.SetData() to set only one element?
    For now, each piece of per-renderer data has its own one-element array.
    (Generating a new array every time I set data generates too much garbage.)
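
    For the "figure the mesh out" step mentioned at the start of this post, one possible approach (an assumption of this sketch, not something confirmed in the thread) is a binary search over the per-mesh start indices. The Python below only models that lookup on the CPU; `mesh_index_for_vertex` and the example property list are hypothetical.

```python
from bisect import bisect_right

def mesh_index_for_vertex(start_indices, vertex_id):
    """Index of the mesh whose range contains vertex_id.

    start_indices must be sorted ascending and begin with 0.
    """
    return bisect_right(start_indices, vertex_id) - 1

# Soup of 12 vertices made of three meshes starting at 0, 3 and 9.
start_indices = [0, 3, 9]
per_mesh_color = ["red", "green", "blue"]  # stand-in for per-mesh properties

owners = [mesh_index_for_vertex(start_indices, v) for v in range(12)]
colors = [per_mesh_color[m] for m in owners]
```

    An alternative with the same result is to store the mesh index directly in every vertex, which trades the search for extra memory per vertex.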
     
    kyriew likes this.
  20. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    Indeed it would be nice.. simply put: because no one added support for it yet :(
     
    deus0 likes this.
  21. deus0

    deus0

    Joined:
    May 12, 2015
    Posts:
    256
    This topic still interests me a lot. I'm not sure about best practices. Polygon soup might reduce draw calls to one (by batching it all together), but it seems less flexible in terms of adding to or removing from the soup. It would be good if we could store a two-dimensional array, so we could store the vertices per mesh as a set of instance data and use it that way in the shader. That would be the easier solution.
    An example of the pitfalls: if I have 1000 characters of different sizes and I update one mesh, I'll need to update the entire batch again. This might be slower for my procedural game.
     
  22. xotonic

    xotonic

    Joined:
    Feb 21, 2016
    Posts:
    11
    Agree. That's the kind of knowledge that should be attached to DrawProceduralIndirect documentation imo. It's not so obvious that you need to create polygon soup. Managing instance offsets is even more tricky.
    Also, there is a performance problem with bounds. As I understand, an "ubermesh" is either rendered or not. So there are no effective frustum culling optimizations if it's big enough.
     
    Last edited: Mar 14, 2021
    deus0 likes this.
  23. JJRivers

    JJRivers

    Joined:
    Oct 16, 2018
    Posts:
    137
    Apologies for necroing this, but couldn't you cull the polygon soup in a compute shader into another final buffer you draw from? I.e. you'd have PolygonSoupBuffer => ComputeShaderCulling => CulledPolygonSoupBuffer => draw call with CulledPolygonSoupBuffer?
    I'm no expert yet, but am I missing something obvious here?
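
    The proposed pipeline can be modelled on the CPU like this. It is a Python sketch of the compaction idea only, not a compute shader. As a loud simplification, an axis-aligned box stands in for the view frustum, and the names `triangle_visible` and `cull_soup` are hypothetical.

```python
def triangle_visible(tri, box_min, box_max):
    """Conservative test: keep the triangle if its bounds overlap the view box."""
    for axis in range(3):
        lo = min(v[axis] for v in tri)
        hi = max(v[axis] for v in tri)
        if hi < box_min[axis] or lo > box_max[axis]:
            return False
    return True

def cull_soup(soup, box_min, box_max):
    """Copy every potentially visible triangle into a second, compacted buffer."""
    culled = []
    for i in range(0, len(soup) - len(soup) % 3, 3):
        tri = soup[i:i + 3]
        if triangle_visible(tri, box_min, box_max):
            culled.extend(tri)
    return culled

# Two triangles: one near the origin, one far away along +x.
soup = [
    (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0),
    (5.0, 0.0, 0.0), (6.0, 0.0, 0.0), (5.0, 1.0, 0.0),
]
near_view = cull_soup(soup, (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
wide_view = cull_soup(soup, (-10.0, -10.0, -10.0), (10.0, 10.0, 10.0))
```

    With the narrow view only the first triangle survives; with the wide view the culled buffer is as large as the original, so the second buffer has to be sized for the whole soup.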
     
  24. xotonic

    xotonic

    Joined:
    Feb 21, 2016
    Posts:
    11
    I don't see a problem here, except maybe that it doubles the number of primitives in memory, since in theory the camera can "see" the entire soup, in which case CulledPolygonSoupBuffer would be equal to PolygonSoupBuffer.
     
  25. deus0

    deus0

    Joined:
    May 12, 2015
    Posts:
    256
    On this point, you could maybe just mark faces and cull them using raycasting during this step. I'm just wondering, does Unity already do culling on this? If so, how is it done? If I knew it didn't, I would implement it. It's better to have large draw distances for marketing haha.
    I saw this example the other day on Shadertoy:
    https://www.shadertoy.com/view/ssGSzw
    It uses raycasting in the shader to choose whether to draw something or not.
     
  26. JJRivers

    JJRivers

    Joined:
    Oct 16, 2018
    Posts:
    137
    It would certainly double the reserved space whether it's visible or not, but so long as the full mesh isn't visible, at least you wouldn't be running the vert and frag shaders on anything that isn't in the view frustum.