Best way to render meshes with data from ComputeBuffers

Mese96 · May 27, 2019

Hello everyone,

I am currently working on a project where hundreds of meshes (about 650 at the moment) need to be rendered, all of which get their vertex data from ComputeBuffers. Right now there is one buffer per object, plus some data that is shared between them and some that is “per instance”. All of them share the same material.

Neither the number of meshes nor the number of vertices/triangles can be known at build time.

What is the most efficient way to render them?
Is there some way to instance or batch them?

Or would it be a good idea to combine them into one very big mesh and add extra vertex data to “assign” each vertex to a mesh?

Thanks for your input

richardkettlewell · May 27, 2019

I would look at https://docs.unity3d.com/ScriptReference/Graphics.DrawProcedural.html / https://docs.unity3d.com/ScriptReference/Graphics.DrawProceduralIndirect.html

Mese96 · May 28, 2019

When the models have different vertex counts, which is the case for most of them, I would still need to call it once per object, right? Would there then be any difference compared to using meshes (apart from slightly less overhead)?

DrawProceduralIndirect would probably work well with one or more multi-mesh buffers, as I can specify the vertex offset. But there is no way to specify a range of indices to render...

richardkettlewell · May 28, 2019

Indeed the Vertex Count for these methods is shared between instances, so it is most effective for drawing many copies of the same mesh data.

Some options:

- dispatch the largest vert count of all your meshes, then discard the vertices in the vertex shader (by setting them to 0,0,0) for all verts that fall outside the range of the currently drawn mesh, i.e. DrawProcedural(max(vertCountOfEachMesh), meshCount). The GPU will have to spend a bit of time on vertices you will throw away, but that cost ought to be negligible unless you have massively varying vertex counts. (Profile and check)
- if your mesh data can be thought of as one big "polygon soup" i.e. just one big pile of triangles, submit the vertex count of all vertices of all meshes, and an instance count of 1. It's harder to apply your per-mesh properties with this approach though, as a vert would need to figure out which mesh it belongs to
- call DrawProcedural once per mesh. It means quite a lot of draw calls, which may have a noticeable CPU cost, but solves all your other issues
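
As a rough illustration of the first option, here is a small CPU-side simulation in Python. It is only a sketch of the indexing logic, not Unity or shader code, and every name in it (`vertex_stage`, `positions`, `start`, `count`) is made up for the example. It assumes all meshes live back to back in one shared position buffer.

```python
# CPU-side simulation of option 1: one instanced draw with a padded vertex count.
# All names are hypothetical; this only mirrors what a vertex shader would do.

DEGENERATE = (0.0, 0.0, 0.0)  # every padded vertex collapses to this point


def vertex_stage(vertex_id, instance_id, positions, start, count):
    """Mimic a vertex shader reading from one shared position buffer."""
    if vertex_id >= count[instance_id]:
        # Outside this mesh's range: emit a degenerate vertex so the
        # resulting triangle has zero area and produces no pixels.
        return DEGENERATE
    return positions[start[instance_id] + vertex_id]


# Two meshes packed back to back: a triangle (3 verts) and a quad (6 verts).
positions = [(float(i), 1.0, 0.0) for i in range(9)]
start = [0, 3]  # first vertex of each mesh in the shared buffer
count = [3, 6]  # vertex count of each mesh

padded_count = max(count)  # the vertex count submitted for every instance
output = [
    [vertex_stage(v, i, positions, start, count) for v in range(padded_count)]
    for i in range(len(count))
]

# Vertices the GPU would process only to throw away.
wasted = sum(padded_count - c for c in count)
```

The `wasted` figure is the cost mentioned in the first bullet: it stays small while the vertex counts are similar and grows quickly when they are not.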

Assigning the results to a mesh (or meshes) is definitely possible. It requires reading the data back to the CPU (slow), building a mesh (also slow), and then uploading the mesh (semi-fast), so it depends on how often you want to do this operation. Once it's done you're gonna get 1 draw call per mesh, so it's going to be somewhat similar to my 3rd suggestion above.

Mese96 · May 28, 2019

Essentially my mesh data is the big “polygon soup”. The compute buffer route is used because I need multiple sets of positions (and other data) to interpolate between, and it is very slow to interpolate on the CPU and assign millions of vertices every frame.
So there is no readback from the GPU anyway. Using meshes versus calling DrawProcedural per mesh should not make much of a difference. (I am currently using a MeshRenderer approach.)
As I started to think about it, my goal was to reduce draw calls…
I would go for option two if I found a straightforward way to apply per-mesh properties.

Option 1 is a really good idea though.

Maybe not the same count for every mesh (some have a few hundred vertices, some have 100k),
but I could group them by similar vertex counts.
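
One possible way to do such a grouping, sketched in Python purely to show the arithmetic: bucket the meshes by their vertex count rounded up to the next power of two, then compare how many padding vertices a single padded draw would process against one padded draw per bucket. The power-of-two rule and the sample counts are assumptions made for this example, not something taken from the project.

```python
from collections import defaultdict


def group_by_vertex_count(counts):
    """Bucket mesh indices by vertex count rounded up to a power of two."""
    groups = defaultdict(list)
    for mesh_id, n in enumerate(counts):
        bucket = 1 << max(n - 1, 0).bit_length()
        groups[bucket].append(mesh_id)
    return dict(groups)


def wasted_vertices(counts, groups):
    """Padding vertices processed when each group draws its largest count."""
    total = 0
    for members in groups.values():
        largest = max(counts[m] for m in members)
        total += sum(largest - counts[m] for m in members)
    return total


# Hypothetical vertex counts: a few small meshes and two large ones.
counts = [300, 450, 100_000, 90_000, 500]

groups = group_by_vertex_count(counts)
single_draw_waste = wasted_vertices(counts, {0: list(range(len(counts)))})
grouped_waste = wasted_vertices(counts, groups)
```

With these sample numbers, two groups cut the padding from several hundred thousand vertices to roughly ten thousand, at the price of one extra draw call. Finer buckets reduce padding further but add draw calls, so the right granularity is something to profile.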

I will look into both options this week.
Thanks for the ideas.

richardkettlewell · May 29, 2019

Mese96 said: ↑

I could group them by similar vertex counts.

Yes that could work really well

Good luck!

Mese96 · May 29, 2019

richardkettlewell said: ↑

Good luck!

Thanks

Out of interest, is there anything new regarding this: https://forum.unity.com/threads/draw-procedural-indexed.396957/ ?

Until now I was not really aware that I could draw the mesh without indices, so for now I will change it to be non-indexed.

richardkettlewell · May 29, 2019

Mese96 said: ↑

is there anything new regarding this

Yeah, we actually added this very recently, in Unity 2019.1. Check out the second method here: https://docs.unity3d.com/ScriptReference/Graphics.DrawProcedural.html

Mese96 · May 29, 2019

Ehm, yeah, that was stupid, I could have seen that *is a bit ashamed*
Maybe I should not ask questions when I'm not awake.

Mese96 · May 29, 2019

*bravely asks another question while not awake*

On my first attempt at instancing, Unity complained that I can't have ComputeBuffers as instanced properties.
This would make option 1 kinda useless, as I would not be able to assign individual vertex sets.
Or am I mistaken?

richardkettlewell · May 29, 2019

Take a look at https://docs.unity3d.com/ScriptReference/Graphics.DrawMeshInstancedIndirect.html for how to assign ComputeBuffers to instanced shaders. The key is to use Procedural Instancing, instead of our "built-in" instancing. This gives you full control over how your instance data is populated/loaded.

(Even though that's not the API you are using, the concept is the same: material.SetBuffer("MyComputeBuffer", buf) etc.)

Mese96 · May 29, 2019

How to assign them to shaders is not the problem here.
The example uses one buffer and the instance ID to select one element.
It would need some kind of Array of Buffers and use the instanceID to select one Buffer.
(Probably the way to go is one buffer and offset the index accordingly)

richardkettlewell · May 29, 2019

Oh sorry yes I understand now.
Yes you can’t have arrays of compute buffers in a shader, you need one big buffer and per instance offsets. A second ComputeBuffer of ints, with the start index for each instance into the big compute buffer, could work, for example.
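
To make the "one big buffer plus per-instance offsets" idea concrete, here is a minimal Python sketch of the bookkeeping. It is a CPU-side illustration only; `pack_meshes`, `fetch`, `big_buffer` and `start_index` are hypothetical names, and the strings stand in for vertex data.

```python
def pack_meshes(meshes):
    """Concatenate per-mesh vertex lists and record each mesh's start index."""
    big_buffer = []
    start_index = []
    for verts in meshes:
        start_index.append(len(big_buffer))  # where this mesh begins
        big_buffer.extend(verts)
    return big_buffer, start_index


def fetch(big_buffer, start_index, instance_id, vertex_id):
    """What the shader lookup amounts to: offset by the instance's start."""
    return big_buffer[start_index[instance_id] + vertex_id]


# Three meshes of different sizes; strings stand in for vertex data.
meshes = [
    ["a0", "a1", "a2"],
    ["b0", "b1", "b2", "b3", "b4", "b5"],
    ["c0", "c1", "c2"],
]
big_buffer, start_index = pack_meshes(meshes)
```

In this picture `big_buffer` would be the large ComputeBuffer and `start_index` the second ComputeBuffer of ints, indexed by instance ID.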

Mese96 · May 31, 2019

Got the basic procedural code working, including the instance offset etc.
But I am stuck on getting unity_InstanceID, as it is reported as an undeclared identifier.

richardkettlewell · May 31, 2019

That is part of our automatic instancing stuff. Possibly if you add #pragma multi_compile_instancing it will appear. However, I think you're best just declaring a vertex shader input: uint instanceID : SV_InstanceID

Mese96 · May 31, 2019

I already had the pragma, but the uint instanceID : SV_InstanceID worked well.
Thanks again.

Now I need to find an ideal grouping count (and grouping method). A count of 10 leads to 100% GPU usage, while 20 gives 70% and a good deal more FPS than with meshes. I will report back once I have cleaned up the remaining mess.

bgolus · Jun 1, 2019

If it really is polygon soup, it doesn't seem like "instancing" is really even necessary or all that helpful. Why not use the compute shader to combine them into one big mesh and draw it with DrawProceduralIndirect?
https://docs.unity3d.com/ScriptReference/Graphics.DrawProceduralIndirect.html

Mese96 · Jun 1, 2019

bgolus said: ↑

Why not use the compute shader to combine them into one big mesh and draw it with DrawProceduralIndirect?

It is a polygon soup insofar as what I need to render is all individual mesh data, with no "we have this exact same geometry 100 times", and each part can have per-part properties.

richardkettlewell said: ↑

It's harder to apply your per-mesh properties with this approach though, as a vert would need to figure out which mesh it belongs to

So mostly this.
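
For what it's worth, the "which mesh does this vertex belong to" step can be sketched as a binary search over the sorted start indices of each mesh in the combined buffer. The Python below is a CPU-side illustration under that assumption; `mesh_of_vertex`, `start_index` and `per_mesh_color` are hypothetical names.

```python
from bisect import bisect_right


def mesh_of_vertex(start_index, vertex_id):
    """Return the index of the mesh whose range contains a flat vertex id.

    start_index must be sorted ascending and begin with 0.
    """
    return bisect_right(start_index, vertex_id) - 1


# Three meshes occupying vertices 0..2, 3..8 and 9..11 of the soup.
start_index = [0, 3, 9]
per_mesh_color = ["red", "green", "blue"]  # stand-in for per-mesh properties

colors = [per_mesh_color[mesh_of_vertex(start_index, v)] for v in range(12)]
```

An alternative that trades memory for a cheaper lookup is to store the mesh index alongside each vertex, which is the "extra vertex data" idea from the original question.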

Mese96 · Jun 4, 2019

While working on it I noticed that I have this "figure out the mesh" problem anyway, so I went for the "one mesh to render them all" approach.
One question though:
Why is there no option for ComputeBuffer.SetData() to set only one element?
Right now each piece of per-renderer data has its own one-element array.
(Creating a new array every time I set data generates too much garbage.)

richardkettlewell · Jun 4, 2019

Mese96 said: ↑

Why is there no option for ComputeBuffer.SetData()

Indeed it would be nice.. simply put: because no one added support for it yet

deus0 · Feb 3, 2021

This topic still interests me a lot. I'm not sure about best practices. Polygon soup might reduce draw calls to one (by batching it all together), but it seems less flexible in terms of adding to or removing from the soup. It would be good if we could store a 2-dimensional array, so we could store the vertices per mesh as a set of instance data and use it that way in the shader. That would be the easier solution.
An example of the pitfalls: if I have 1000 characters of different sizes and I update one mesh, I'll need to update the entire batch again. This might be slower for my procedural game.

xotonic · Mar 14, 2021

Agree. That's the kind of knowledge that should be attached to DrawProceduralIndirect documentation imo. It's not so obvious that you need to create polygon soup. Managing instance offsets is even more tricky.
Also, there is a performance problem with bounds. As I understand, an "ubermesh" is either rendered or not. So there are no effective frustum culling optimizations if it's big enough.

JJRivers · Oct 21, 2021

Apologies for necroing this, but couldn't you cull the polygon soup in a compute shader into another final buffer you draw from? I.e. you'd have PolygonSoupBuffer => ComputeShaderCulling => CulledPolygonSoupBuffer => draw call with CulledPolygonSoupBuffer?
I'm no expert yet, but am I missing something obvious here?
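
The pipeline described here can be mimicked on the CPU to show the data flow. The Python below is only a sketch: the visibility test is a toy stand-in for a real frustum test, and `cull_triangles`, `in_front_of_plane` and `soup` are hypothetical names. On the GPU the filtering would happen in a compute shader, and the resulting count would presumably feed the arguments of an indirect draw such as DrawProceduralIndirect.

```python
def cull_triangles(triangles, is_visible):
    """Copy visible triangles into a compacted list.

    Returns the compacted list and the vertex count to draw.
    """
    culled = [tri for tri in triangles if is_visible(tri)]
    return culled, 3 * len(culled)


def in_front_of_plane(tri, min_x=0.0):
    """Toy visibility test: keep a triangle if any vertex has x >= min_x."""
    return any(v[0] >= min_x for v in tri)


# PolygonSoupBuffer: three triangles, one of them entirely out of view.
soup = [
    ((-3.0, 0.0, 0.0), (-2.0, 0.0, 0.0), (-2.5, 1.0, 0.0)),  # culled
    ((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),    # straddles, kept
    ((2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.5, 1.0, 0.0)),     # kept
]

# CulledPolygonSoupBuffer and the vertex count for the draw call.
culled, vertex_count = cull_triangles(soup, in_front_of_plane)
```

The output list can be as long as the input when nothing is culled, so the second buffer has to be sized for that worst case.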

xotonic · Oct 21, 2021

I don't see a problem here, except maybe that it doubles the number of primitives in memory, since in theory the camera can "see" the entire soup, in which case PolygonSoupBuffer will be equal to CulledPolygonSoupBuffer.

deus0 · Oct 21, 2021

On this point, you could maybe just mark faces and cull them using raycasting during this step. I'm just wondering, does Unity already do culling on this? If so, how is it done? If I knew they didn't, I would implement it. It's better to have large draw distances for marketing haha.
I saw this example the other day on Shadertoy:
https://www.shadertoy.com/view/ssGSzw
It uses raycasting in the shader to choose whether to draw something or not.

JJRivers · Oct 22, 2021

xotonic said: ↑

I don't see a problem here, except maybe that it doubles the number of primitives in memory, since in theory the camera can "see" the entire soup, in which case PolygonSoupBuffer will be equal to CulledPolygonSoupBuffer.

It would certainly double the reserved space whether it's visible or not, but as long as the full mesh isn't visible, at least you wouldn't be running the vert and frag shaders on anything that isn't in the view frustum.
