Question Discarding Instances from being rendered

Discussion in 'Shaders' started by m506, Jul 26, 2021.

  1. m506
    Joined: Dec 21, 2015
    Posts: 93
    Hi all,

    Can someone let me know how I can prevent an instance from being rendered in an instanced shader?

    For example, I have a StructuredBuffer holding my object transforms (position, rotation etc) with 100 instances and I want index = 10 to be ignored. What would be the best approach to achieve it?

    Thank you in advance
     
  2. bgolus
    Joined: Dec 7, 2012
    Posts: 12,329
    There is no "best" way, just several different ways that each have their own benefits and drawbacks.

    If it really is just 100 instances, and you know on the CPU which index you want to hide, you could use a buffer with that instance removed. Updating an array of 100 elements isn't terribly expensive, either on the CPU or when re-uploading it to the GPU.
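A minimal sketch of that CPU-side option, assuming instance order does not matter: the hidden slot is overwritten with the last live element and the live count shrinks by one, so only the live range has to be uploaded and drawn. The names InstanceList and RemoveAtSwapBack are hypothetical, and the upload itself is left out.

```csharp
using System;

public static class InstanceList
{
    // Removes items[index] by moving the last live element into its slot.
    // Order is not preserved. Returns the new live count.
    public static int RemoveAtSwapBack<T>(T[] items, int count, int index)
    {
        if (index < 0 || index >= count)
            throw new ArgumentOutOfRangeException(nameof(index));

        int last = count - 1;
        items[index] = items[last];   // fill the hole with the last live element
        items[last] = default(T);     // clear the now-unused tail slot
        return last;
    }
}
```

For example, removing index 1 from { 0, 1, 2, 3, 4 } leaves 0, 4, 2, 3 in the first four slots and returns a count of 4, which would then be the instance count to draw.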

    If you really don't want to do that, and really are just removing one or a small number of instances, you could add another shader property that holds the index to skip. If the current instance ID equals that value, output the vertex positions at float4(0,0,0,0) and the GPU won't render them. You'll still be paying the cost of computing those vertices, and all other vertices rendered by the shader will be slightly more expensive, but it's an easier solution than updating the structured buffer every time.
     
  3. m506
    Joined: Dec 21, 2015
    Posts: 93
    Hi, thanks for replying. The issue is that I have thousands of objects and ComputeBuffer.SetData() is the bottleneck.

    I tried using an AppendStructuredBuffer to add records on demand (in a compute shader), but that means the full data needs to be uploaded initially, which takes a lot of memory and crashes the GPU.

    What I will try now to improve the performance of SetData() is to transfer less data. This is my shader struct:

    struct Buffer_Transform
    {
        fixed4 colors;      // maps to Color
        float textures;     // maps to float
        float4x4 obj2world; // maps to Matrix4x4
        float4x4 world2obj; // maps to Matrix4x4
    };

    Are there any techniques you know of to compress the data, or any other hacks to improve data transfer (especially for those Matrix4x4 values)?

    Thanks
     
  4. bgolus
    Joined: Dec 7, 2012
    Posts: 12,329
    fixed isn't a thing on modern GPUs anymore, desktop or mobile, so that fixed4 is a full 128-bit float4. The workaround is to use a single uint in the shader struct and decode the color values from the original Color32 in the C# struct.

    Neither of these need to be full 4x4 matrices. You can get away with a 3x4 matrix for both, or even just pass in the single object to world matrix and reconstruct the inverse in the shader!
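To illustrate why the inverse does not have to be uploaded, here is a minimal sketch of inverting a 3x4 affine matrix. It is written in plain C# so the arithmetic can be checked on the CPU; a shader would do the same math in HLSL. It assumes a row-major 3x4 layout with the translation in the last column and uses the general adjugate formula, which is not necessarily what Unity's own include does. The class and method names are hypothetical.

```csharp
using System;

public static class AffineInverse
{
    // m is a 3x4 affine matrix indexed as m[row, col].
    // Columns 0..2 hold the linear part (rotation/scale), column 3 the translation.
    public static float[,] Invert(float[,] m)
    {
        // Cofactors of the first row of the 3x3 linear part.
        float c00 = m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1];
        float c01 = m[1, 2] * m[2, 0] - m[1, 0] * m[2, 2];
        float c02 = m[1, 0] * m[2, 1] - m[1, 1] * m[2, 0];

        float det = m[0, 0] * c00 + m[0, 1] * c01 + m[0, 2] * c02;
        if (Math.Abs(det) < 1e-12f)
            throw new InvalidOperationException("Matrix is not invertible.");
        float invDet = 1f / det;

        var r = new float[3, 4];

        // Inverse of the linear part: transposed cofactor matrix divided by det.
        r[0, 0] = c00 * invDet;
        r[1, 0] = c01 * invDet;
        r[2, 0] = c02 * invDet;
        r[0, 1] = (m[0, 2] * m[2, 1] - m[0, 1] * m[2, 2]) * invDet;
        r[1, 1] = (m[0, 0] * m[2, 2] - m[0, 2] * m[2, 0]) * invDet;
        r[2, 1] = (m[0, 1] * m[2, 0] - m[0, 0] * m[2, 1]) * invDet;
        r[0, 2] = (m[0, 1] * m[1, 2] - m[0, 2] * m[1, 1]) * invDet;
        r[1, 2] = (m[0, 2] * m[1, 0] - m[0, 0] * m[1, 2]) * invDet;
        r[2, 2] = (m[0, 0] * m[1, 1] - m[0, 1] * m[1, 0]) * invDet;

        // Inverse translation: -(inverse linear part) * translation.
        for (int i = 0; i < 3; i++)
        {
            r[i, 3] = -(r[i, 0] * m[0, 3] + r[i, 1] * m[1, 3] + r[i, 2] * m[2, 3]);
        }
        return r;
    }
}
```

The trade-off is the one already described: 64 fewer bytes per instance to upload, in exchange for extra arithmetic on the GPU for every vertex.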

    Both this and the color unpacking are things Unity's instanced particle systems already do.
    https://github.com/TwoTailsGames/Un...ncludes/UnityStandardParticleInstancing.cginc

    Really the textures value should be a uint as well, both in C# and in the shader, but it doesn't really matter if it works for you.

    Code (csharp):
    // C#
    struct BufferData {
        public Color32 color;   // 4 bytes, read as a single uint in the shader
        public uint textures;
        // Note: an HLSL float3x4 holds 12 floats, so these three float3 (9 floats)
        // do not fill it; four float3 columns or three float4 rows would.
        public float3 objectToWorld0;
        public float3 objectToWorld1;
        public float3 objectToWorld2;
    }

    BufferData data = new BufferData();
    data.color = myColor; // yeah, that's it
    data.objectToWorld0 = transform.localToWorldMatrix.GetColumn(0); // might be GetRow()?
    data.objectToWorld1 = transform.localToWorldMatrix.GetColumn(1);
    data.objectToWorld2 = transform.localToWorldMatrix.GetColumn(2);

    // HLSL
    struct BufferData {
        uint color;
        uint textures;
        float3x4 transform;
    };
    // see the particle instancing cginc linked above for decoding

    One minor caveat to be aware of: the color value you're passing to the shader is in gamma space. There's a good chance the color value you were already passing was also in gamma space, which you may or may not have noticed looks funny. You can fix that by using myColor.linear in C#, but you don't want to do that when using this setup. Instead you'll want to do the gamma correction in the shader using the built-in GammaToLinearSpace function, if you're going to do it at all.
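As a rough CPU-side mirror of the decoding step, the sketch below packs four 8-bit channels into a uint, unpacks them to floats, and applies a gamma-to-linear conversion. It assumes red sits in the lowest byte, and it uses the standard sRGB decode formula, which is not necessarily the formula Unity's GammaToLinearSpace uses. The class and method names are hypothetical.

```csharp
using System;

public static class PackedColor
{
    // Packs four 8-bit channels with red in the lowest byte (0xAABBGGRR).
    public static uint Pack(byte r, byte g, byte b, byte a)
    {
        return (uint)r | ((uint)g << 8) | ((uint)b << 16) | ((uint)a << 24);
    }

    // Unpacks to floats in [0, 1], in the order r, g, b, a.
    public static float[] Unpack(uint c)
    {
        return new[]
        {
            (c & 0xFF) / 255f,
            ((c >> 8) & 0xFF) / 255f,
            ((c >> 16) & 0xFF) / 255f,
            ((c >> 24) & 0xFF) / 255f,
        };
    }

    // Standard sRGB decode for one color channel (alpha is left untouched).
    public static float SrgbToLinear(float c)
    {
        return c <= 0.04045f
            ? c / 12.92f
            : (float)Math.Pow((c + 0.055f) / 1.055f, 2.4);
    }
}
```

Doing the conversion after unpacking keeps the 8 bits per channel in gamma space, where they are spread more evenly across perceived brightness than they would be for linear values.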
     
  5. m506
    Joined: Dec 21, 2015
    Posts: 93
    Hi bgolus, thank you for your time explaining all this. Really appreciated!

    I managed to decrease the struct stride from 148 to 56 bytes per instance, so that's really an excellent improvement!
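Both stride figures can be checked by counting bytes. In the sketch below, PackedInstance is a hypothetical stand-in that uses plain fields instead of Unity types: two uints plus four float3 columns written out as twelve floats.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the packed per-instance struct.
[StructLayout(LayoutKind.Sequential)]
public struct PackedInstance
{
    public uint Color;      // packed RGBA, 4 bytes
    public uint Textures;   // 4 bytes
    public float C0X, C0Y, C0Z;   // matrix column 0
    public float C1X, C1Y, C1Z;   // matrix column 1
    public float C2X, C2Y, C2Z;   // matrix column 2
    public float C3X, C3Y, C3Z;   // matrix column 3 (translation)
}

public static class Strides
{
    // Original layout: float4 color + float + two float4x4 matrices.
    public const int OldStride = (4 + 1 + 16 + 16) * sizeof(float);

    // Packed layout, measured from the struct itself.
    public static int NewStride
    {
        get { return Marshal.SizeOf<PackedInstance>(); }
    }
}
```

OldStride works out to 148 bytes and NewStride to 56, matching the numbers above. The measured value is also what would be passed as the stride when creating the buffer.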

    Just so people have a reference in the future, below are two functions that are part of the conversion and were not covered in your comment:

    Code (CSharp):
    // Keeps the top three rows of the 4x4 matrix as four float3 columns (48 bytes).
    private float3x4 GetDecomposedMatrix(Matrix4x4 dataMatrix)
    {
        Vector4 c0 = dataMatrix.GetColumn(0);
        Vector4 c1 = dataMatrix.GetColumn(1);
        Vector4 c2 = dataMatrix.GetColumn(2);
        Vector4 c3 = dataMatrix.GetColumn(3);

        return new float3x4(
            new float3(c0.x, c0.y, c0.z),
            new float3(c1.x, c1.y, c1.z),
            new float3(c2.x, c2.y, c2.z),
            new float3(c3.x, c3.y, c3.z));
    }

    // Packs a Color into one uint, 8 bits per channel, red in the lowest byte.
    // Assumes channels are in [0, 1]; larger values would spill into the next channel's bits.
    private uint ColorToUint(Color color)
    {
        uint packedR = (uint)(color.r * 255);
        uint packedG = (uint)(color.g * 255) << 8;
        uint packedB = (uint)(color.b * 255) << 16;
        uint packedA = (uint)(color.a * 255) << 24;

        return packedR | packedG | packedB | packedA;
    }
    Also, in the struct I used a float3x4 (Mathematics) instead of 3 float3 like you described. It worked pretty well.

    One question I have is about the "[StructLayout(LayoutKind.Sequential)]" attribute. Does it make any difference in terms of performance here? Would you recommend adding it to the struct declaration?