Indirect Instances Help

Discussion in 'Shaders' started by VictorKs, Jan 5, 2022.

  1. VictorKs

    VictorKs

    Joined:
    Jun 2, 2013
    Posts:
    242
    I implemented InstancingIndirect with a surface shader to render tens of thousands of meshes, but I have some questions.
    1) What is the best way to pass the transform matrix?
    2) I pass several pieces of data to the buffers (a Matrix4x4 for the transform and some Vector4s). Should I pass them in one buffer as a struct, or should I use several buffers?
    3) Can material properties be combined with this, or should I just use the buffers for passing data?
    4) How should I approach efficient GPU frustum culling?
     
  2. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,005
    1) With as little data as possible. If you just need position, just pass the float3 position. If you need position and rotation, pass the float3 position and a float3 euler rotation (or maybe a float4 quaternion). If you need position, rotation, and uniform scale, pass a float4 with position and scale and a float3 euler rotation (or float4 quaternion). If you need position, rotation, and non-uniform scale, pass just the first three rows of the matrix (12 floats), as the last row will always be float4(0,0,0,1).

    2) Either is fine. One buffer might be easier to manage. Multiple buffers might be needed if you're drawing a lot of instances and are hitting buffer size limits.

    3) If they're constant for all instances, use material properties. No need to waste space on redundant data.

    4)
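
    Going back to point 1), here is a minimal CPU-side sketch of the "drop the last row" idea, written in Python only to show the arithmetic. It assumes a row-major matrix with the translation in the last column; pack_3x4 and unpack_3x4 are hypothetical helper names, not Unity API. In a shader, the equivalent step would be rebuilding the full matrix from the three stored rows.

```python
import struct

def pack_3x4(m):
    """Flatten the first three rows of a row-major 4x4 matrix into 12 floats."""
    # Assumption: the matrix is affine, so the last row carries no information.
    assert m[3] == [0.0, 0.0, 0.0, 1.0], "only valid for affine transforms"
    return [v for row in m[:3] for v in row]

def unpack_3x4(f):
    """Rebuild the full 4x4 by re-appending the constant last row."""
    return [f[0:4], f[4:8], f[8:12], [0.0, 0.0, 0.0, 1.0]]

# Translate by (1, 2, 3) with a non-uniform scale of (2, 3, 4).
m = [[2.0, 0.0, 0.0, 1.0],
     [0.0, 3.0, 0.0, 2.0],
     [0.0, 0.0, 4.0, 3.0],
     [0.0, 0.0, 0.0, 1.0]]

packed = pack_3x4(m)
full_bytes = struct.calcsize("16f")    # per-instance size of a full 4x4
packed_bytes = struct.calcsize("12f")  # per-instance size of three rows
```

    With 4-byte floats this is 48 bytes per instance instead of 64, and the round trip through unpack_3x4 gives back the original matrix.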
     
    VictorKs likes this.
  3. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,005
    Some additional thoughts:

    You can get better performance on some GPUs if you ensure the data in your buffer is 128-bit (16-byte) aligned. That means if you have a struct with a float3 and a float4, using two float4 values or adding a dummy additional float can be faster, even though you're sending more data.
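
    As a rough illustration of the padding arithmetic, the sketch below uses Python's struct module to compare the two layouts. The field names (a float3 position and a float4 color) and the helper names are made up for the example; only the sizes matter.

```python
import struct

# Hypothetical per-instance layouts. "<" turns off Python's own padding,
# so the sizes below are exactly what the format string spells out.
UNPADDED = struct.Struct("<3f4f")   # float3 position + float4 color
PADDED = struct.Struct("<3f4x4f")   # same fields plus 4 dummy bytes after the float3

def is_128_bit_aligned(stride_bytes):
    """True if every element starts on a 16-byte boundary."""
    return stride_bytes % 16 == 0

def pack_instance(position, color):
    """Pack one instance using the padded layout."""
    return PADDED.pack(*position, *color)

blob = pack_instance((1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 1.0))
```

    The unpadded struct has a 28-byte stride, which is not a multiple of 16; the padded one is 32 bytes, so each float4 starts on a 16-byte boundary.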

    Ideally frustum culling should happen before the vertex shader runs, though you can still get some benefit from doing it in the vertex shader if most of your instances are off screen. It's usually better to do it in a compute shader, or even on the CPU, beforehand. You can either construct a new buffer that only has the data for the visible instances, or create a buffer that is a list of the visible indices (which is used to access the matrix buffer rather than the instance ID itself), or add some extra bit to the buffer data that you check in the vertex shader to exit quickly and output an all-zero vertex position. There isn't really any special trick to the actual frustum culling itself, though: test bounding spheres or boxes against the 6 frustum planes, which in itself is tricky. There are a ton of tutorials out there on the topic; try one and see if it's faster than not doing it at all.
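
    For reference, a minimal CPU-side sketch of the sphere-vs-planes test that produces the "list of visible indices" variant. It is Python purely to show the logic; a compute shader would do the same test per instance. Assumptions: each plane is (nx, ny, nz, d) with a unit normal pointing into the visible volume, and a cube stands in for the six real frustum planes. The function names are hypothetical.

```python
def sphere_visible(center, radius, planes):
    """True unless the sphere lies entirely behind at least one plane."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        # Signed distance from the sphere center to the plane.
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False
    return True

def visible_indices(spheres, planes):
    """Compact list of the instance indices that pass the test."""
    return [i for i, (c, r) in enumerate(spheres)
            if sphere_visible(c, r, planes)]

# Stand-in volume: the cube [-10, 10]^3 as six inward-facing planes.
planes = [(1, 0, 0, 10), (-1, 0, 0, 10),
          (0, 1, 0, 10), (0, -1, 0, 10),
          (0, 0, 1, 10), (0, 0, -1, 10)]

spheres = [((0, 0, 0), 1.0),      # fully inside
           ((10.5, 0, 0), 1.0),   # straddles the +x plane
           ((20, 0, 0), 1.0)]     # fully outside

indices = visible_indices(spheres, planes)
```

    Here indices ends up as [0, 1]. The test is conservative: a sphere just outside a corner of the volume can still pass, which only costs a few wasted instances.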
     
    VictorKs likes this.