
Question Sending Data to GPU, ComputeShader/Buffer, MaterialProperty and Hlsl Script

Discussion in 'Shaders' started by SuperFranTV, Nov 30, 2021.

  1. SuperFranTV

    SuperFranTV

    Joined:
    Oct 18, 2015
    Posts:
    140
    I currently have some questions about data structures: how can I send only the minimum I need to the GPU, and what about the other things? (I'm using URP.)

    1. I need to send 2 values each between 0 and 255.

    So there are small types (size in bytes):
    byte = 1
    short = 2
    int = 4
    uint = 4

    But a byte can't be sent to the GPU, right? So the minimum I can use is short?


    2. I'm using DrawMeshInstancedProcedural and have some questions about it.
    Mesh (a quad) and Material are all OK:

    SubMeshes: I don't need that. If I set them to 0, is that okay, or is there a way to remove them completely?

    Bounds: I changed them to zero or some other value, but nothing changed when I hit play?

    BufferCount: if I send a larger buffer that is only half filled, what is used?

    MaterialPropertyBlock: I only use it for SetBuffer. If I use one StructuredBuffer for all my things, is there a way to send it directly without a MaterialPropertyBlock?

    3. There are 2 files:
    An HLSL file (.hlsl), which can be used to bring custom code into Shader Graph, right?
    A ComputeShader (.compute), which lets the GPU calculate things, with kernels and numthreads.

    How do I get the perfect number of threads for a ComputeShader?

    Now my question is: is it useful to use an HLSL file and a ComputeShader together with Shader Graph or a shader?
    Is there a performance difference?

    Once I'm clear about all this, I can work out my best option from the ground up.
    I hope someone can explain some of these things to me.
    Thank you all :D
     
    Last edited: Nov 30, 2021
  2. SuperFranTV

    Edit:

    2. BufferCount: I found out that using a large ComputeBuffer but filling it with only some content impacts performance heavily.
     
  3. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,343
    It depends on how you're sending data. But generally you can't guarantee the GPU will recognize anything but 32 bit types: int, uint, float. HLSL has no concept of byte or short. There are fixed and half variable types, but they're defined as a signed floating point value that "can hold a value between -2 and 2 with a precision of at least 1/256" or is "at least 16 bits", and a 32 bit float fulfills the requirements for both, so most GPUs just use that.

    The easiest solution to passing two bytes to shaders is ... don't. Just pass two float values. You can define them as int or uint in the shader file, and there's even a material.SetInt() function, but it's a lie. Under the hood Unity casts that int to a float when you call SetInt(), then casts it back to an int or uint depending on what the shader wants.

    However, if you're passing a lot of values via a compute buffer, you can take advantage of the fact that C# does support the byte variable type, and that the compute buffer is passed to the GPU as raw bits that can be interpreted any way you want.
    Code (csharp):
    // c#
    // create compute buffer
    ComputeBuffer cb = new ComputeBuffer(numObjects, 2); // 2 bytes
    // important: numObjects needs to be an even number

    // struct of two bytes
    public struct TwoBytes
    {
        public byte a;
        public byte b;
    }

    // create array of bytes
    TwoBytes[] data = new TwoBytes[numObjects];

    // set the data in the array
    for (int i=0; i<numObjects; i++)
    {
        data[i].a = //object byte value A
        data[i].b = //object byte value B
    }

    // copy data in array into the compute buffer
    cb.SetData(data);

    // pass it to the shader calling SetBuffer() where appropriate
    Code (csharp):
    // shader code
    StructuredBuffer<uint> _Data; // yes, a 32 bit uint, not a struct, not bytes

    uint2 GetDataAtIndex(uint index)
    {
        // real index is half of input index because shader is working with 32 bit uints and not bytes
        // this means the two bytes per index are packed into the first 16 and last 16 bits of the 32 bit uint
        uint realIndex = index / 2;
        uint packedData = _Data[realIndex];

        // bit shift over 16 bits if we're trying to get the odd index
        if (index % 2 == 1)
            packedData = (packedData >> 16);

        // mask with 0xFF (8 bits) so the full 0-255 range of each byte survives
        return uint2(
            (packedData >> 0) & 0xFF, // extract the first byte
            (packedData >> 8) & 0xFF // extract the second byte
            );
    }
    If you're looking to pack this into an existing struct with other data in it, you're likely best off just padding out the struct so its size stays a multiple of 32 bits (4 bytes).
    Code (csharp):
    // c# struct
    public struct MyDataStruct
    {
        public Vector3 position;
        public byte a;
        public byte b;
        public short padding;
    } // sizeof(MyDataStruct) == 16

    // hlsl struct
    struct myDataStruct {
        float3 position;
        uint packedData;
    }; // "get data" function just uses the last 2 lines to unpack
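    The padded struct idea can be checked outside Unity. The following is only a sketch in Python (used here because it runs without the engine), assuming little-endian byte order and a tightly packed sequential layout on the C# side. The sample values and the name unpack_record are made up for illustration. It writes one record with the C# field layout and reads it back the way the HLSL struct would see it.

```python
import struct

# Format strings mirror the two structs above (little-endian, no implicit padding).
CSHARP_LAYOUT = "<3fBBh"   # Vector3 position, byte a, byte b, short padding
HLSL_LAYOUT = "<3fI"       # float3 position, uint packedData


def unpack_record(record):
    """Read one 16-byte record the way the HLSL struct would see it."""
    x, y, z, packed = struct.unpack(HLSL_LAYOUT, record)
    a = packed & 0xFF          # lowest byte of the uint
    b = (packed >> 8) & 0xFF   # second byte of the uint
    return (x, y, z), a, b


# Hypothetical sample values: position (1, 2, 3), a = 200, b = 17, padding = 0.
record = struct.pack(CSHARP_LAYOUT, 1.0, 2.0, 3.0, 200, 17, 0)
position, a, b = unpack_record(record)
print(len(record), position, a, b)
```

    The record is 16 bytes long, which is a multiple of 4, and both bytes come back unchanged from the low 16 bits of packedData.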

    Submeshes: You always need at least 1 submesh. A mesh with zero submeshes is a mesh with no data.

    Bounds: Bounds are used by MeshRenderer components for CPU side frustum and occlusion culling. When you use DrawMeshInstancedProcedural(), and several of the similar functions, you're telling Unity to skip all of that because you're handling it yourself, especially since the position data you're passing in might not even be known on the CPU side.

    Half-filled buffer: Junk data. Hopefully zeros, but I don't know if it's guaranteed.

    MaterialPropertyBlock: That's about as direct as you get. You could call SetBuffer() on the material directly, but if you're rendering multiple sets of meshes with the same material you'll want to use the property blocks.

    Perfect number of threads: I think we'd all wish we knew that answer.

    Shader Graph is a shader generator. It spits out HLSL shader code that is otherwise nearly identical to what you could write by hand when writing a vertex fragment shader. The advantage of Shader Graph is it "just works" with the lighting systems without you having to do anything.

    Writing a vertex fragment shader by hand may produce slightly more efficient / faster shader code as you can be very explicit about making sure the shader only does the things you need it to, but most of the time it won't be a significant difference.

    However, you can't use a compute shader with Shader Graph, not directly. You can run a compute shader to generate data that you store in a compute buffer, then use that buffer with a Shader Graph that has a Custom Function node pointing at an HLSL file that accesses that buffer to extract the relevant data. But you can't include a compute shader in a Shader Graph. And at this time you can't create compute shaders using Shader Graph.
     
  4. SuperFranTV

    First of all, I thank you for the detailed explanation.
    I tried something yesterday evening and at first tried to send only the int as an index to the GPU.
    As seen here, I got some errors and a strange problem:
    https://forum.unity.com/threads/strange-thing-inside-shader-textfile-hlsl.1205551/

    First I need to convert the index to a float4x4 for the mesh's matrix. Once that is finally done, I can switch from int to the byte script you posted here. Thank you for that, it is very useful in my case.

    But I already put a "0" into that field and everything works fine?
    Code (CSharp):
    Graphics.DrawMeshInstancedProcedural(mesh, 0, material, bounds, buffer.count, propertyBlock);
    So I can ignore the bounds, because I'm only sending the data that I want to see?

    I'm using only 1 type of mesh (a quad), so I'll test what the difference is. Thank you for that fact.

    I found this on this forum; I think it can be a start.

    Code (CSharp):
    Remember that the numbers you pass to Dispatch() are the amount of groups, not threads. If you want to process 4096 items and your kernel group size is (128, 1, 1), you need to call Dispatch(32, 1, 1).
    My way to do this is: create the data on the CPU, then the ComputeShader converts or does some math on that data, and that data is then used in the text file (.hlsl) inside Shader Graph to finally let the shader do its thing.
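    The Dispatch() rule quoted above comes down to a rounded-up division. A minimal sketch in Python (chosen only so it runs outside Unity), where group_count is a hypothetical helper name and the group size is assumed to match the kernel's numthreads x value:

```python
def group_count(item_count, group_size):
    """Number of thread groups needed to cover item_count items, rounded up."""
    return (item_count + group_size - 1) // group_size


print(group_count(4096, 128))  # 32, matching the quoted example
print(group_count(4100, 128))  # 33, the last group is only partly used
```

    When the item count is not a multiple of the group size, the last group has spare threads, so the kernel would normally compare its thread id against the item count and return early for the extras.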

    I'm a little smarter now than I was before.
     
  5. bgolus

    Ah! I misunderstood the question! I was thinking about the settings on the mesh you passed to DrawMeshInstancedProcedural(), not the actual parameters of that function!

    Let's do this again.
    submeshIndex: You want it to be 0 because that's the first submesh in the mesh. If you were using a mesh with multiple materials you'd need to call DrawMeshInstancedProcedural() multiple times, once for each submesh. The quad mesh just has one submesh.

    bounds: This does need to be a position that's in view of the camera. If the world origin is in view, a zero bounds will work. While the individual objects won't get frustum culled automatically, the entire draw mesh call might be.

    Property blocks: It's less about whether you're using one mesh and more about whether you're calling DrawMeshInstancedProcedural() multiple times per frame reusing the same material. Without property blocks you'd need a unique material per DrawMeshInstancedProcedural() call.
     
    Last edited: Dec 1, 2021
  6. SuperFranTV

    I implemented it in my code but I got this error:
    Code (CSharp):
    Invalid stride 2 for Compute Buffer - must be greater than 0, less or equal to 2048 and a multiple of 4.
    I didn't know that the stride of a ComputeBuffer has to be at least 4?
     
  7. bgolus

    Ah, yeah. I guess Unity "knows" that the GPU can only interpret 32 bit variables. I was trying to sidestep that by using a stride of 2 and having you make sure you use an even numObjects count.

    Just means you'll have to deal with some of the logistics of the values actually being packed on the C# side as well.

    You'd have to use a struct with 4 byte variables in it with "Object i+0" and "Object i+1" represented.
     
  8. bgolus

    Code (csharp):
    // c#
    // two objects share each 4 byte element, so round up for odd counts
    int bufferSize = Mathf.CeilToInt((float)numObjects / 2f);
    // create compute buffer
    ComputeBuffer cb = new ComputeBuffer(bufferSize, 4); // 4 bytes

    // struct of four bytes: two bytes each for two consecutive objects
    public struct TwoTwoBytes
    {
        public byte a0;
        public byte b0;

        public byte a1;
        public byte b1;
    }

    // create array of packed byte pairs
    TwoTwoBytes[] data = new TwoTwoBytes[bufferSize];

    // set the data in the array
    for (int i=0; i<numObjects; i+=2)
    {
        // objects i and i+1 share element i / 2
        data[i / 2].a0 = //object i byte value A
        data[i / 2].b0 = //object i byte value B

        data[i / 2].a1 = //object i+1 byte value A (leave 0 if i+1 == numObjects)
        data[i / 2].b1 = //object i+1 byte value B
    }
    The shader code would be unchanged.
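    To see where each byte ends up, here is a rough round-trip sketch in Python (only so it can be run outside Unity). It assumes little-endian byte order and that the shader masks each byte with 0xFF (8 bits). The names pack_objects and get_data_at_index are stand-ins for the C# fill loop and the shader function, not Unity API.

```python
import struct


def pack_objects(values):
    """values: list of (a, b) byte pairs, one per object. Returns the raw buffer bytes."""
    if len(values) % 2 == 1:
        values = values + [(0, 0)]        # pad so every 4 byte element is complete
    raw = bytearray()
    for a, b in values:
        raw += struct.pack("<BB", a, b)   # two bytes per object, two objects per element
    return bytes(raw)


def get_data_at_index(raw, index):
    """Mirror of the shader function: read the buffer as 32 bit uints."""
    real_index = index // 2
    (packed,) = struct.unpack_from("<I", raw, real_index * 4)
    if index % 2 == 1:
        packed >>= 16                     # odd objects sit in the upper 16 bits
    return packed & 0xFF, (packed >> 8) & 0xFF


objects = [(1, 2), (255, 128), (40, 41)]
raw = pack_objects(objects)
print([get_data_at_index(raw, i) for i in range(len(objects))])
```

    Under these assumptions a0 lands in bits 0-7 of the uint, b0 in bits 8-15, a1 in bits 16-23 and b1 in bits 24-31, so every pair comes back exactly as it was written.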
     
  9. SuperFranTV


    For now I get no errors, but the draw call isn't drawing anything?

    Code (CSharp):
    uint2 GetDataAtIndex(uint index) {
        // real index is half of input index because shader is working with 32 bit uints and not bytes
        // this means the two bytes per index are packed into the first 16 and last 16 bits of the 32 bit uint
        uint realIndex = index / 2;
        uint packedData = _Indexes[realIndex];

        // bit shift over 16 bits if we're trying to get the odd index
        if (index % 2 == 1)
            packedData = (packedData >> 16);

        return uint2(
            (packedData >> 0) & 0xFF, // extract the first byte
            (packedData >> 8) & 0xFF // extract the second byte
            );
    }

    void ConfigureProcedural () {
        #if defined(UNITY_PROCEDURAL_INSTANCING_ENABLED)
            uint2 i2 = GetDataAtIndex(unity_InstanceID);
            int i = i2.x;
            int y = i / (128 * 128);
            int x = (i - y * 128 * 128) / 128;
            int z = i - y * 128 * 128 - x * 128;
            int d = (i % 6);
            float3 v = float3(x, y, z) + DirectionVector[d];
            float3x4 m = float3x4(rot1[d], rot2[d], rot3[d], v);
            //unity_ObjectToWorld = m;

            //float3x4 m = _Matrices[unity_InstanceID];
            unity_ObjectToWorld._m00_m01_m02_m03 = m._m00_m01_m02_m03;
            unity_ObjectToWorld._m10_m11_m12_m13 = m._m10_m11_m12_m13;
            unity_ObjectToWorld._m20_m21_m22_m23 = m._m20_m21_m22_m23;
            unity_ObjectToWorld._m30_m31_m32_m33 = float4(0.0, 0.0, 0.0, 1.0);
        #endif
    }
    Is this correct? I added a TwoBytes struct of 4 (a, b, a1, b1) like you posted above.
    I'm currently leaving "b" and "a1, b1" out of the calculation. B is for the index into the texture atlas; that comes into play later.

    The buffer is now smaller than before, when I was sending a float3x4 with a stride of 48. That's great if I get it working xD

    If I want to use the a1 and b1 values later for other things, how should I extract them inside the shader? Because a was inside the first 16 bits and b is inside the last 16 bits, but where are a1 and b1?


    I got it working by changing the matrix.

    Update [SOLVED]: I got it working right, I had made a mistake in the matrix.
    Code (CSharp):
    unity_ObjectToWorld._m00_m01_m02 = rot1[d] / 1.0; //Size x
    unity_ObjectToWorld._m10_m11_m12 = rot2[d] / 1.0; //Size y
    unity_ObjectToWorld._m20_m21_m22 = rot3[d] / 1.0; //Rotation / Size z

    unity_ObjectToWorld._m03_m13_m23 = v; //Position
    unity_ObjectToWorld._m33 = 1.0; //Always 1.0 for Projection
    That's the correct Matrix now.
     
    Last edited: Dec 2, 2021
  10. SuperFranTV

    Some questions about the packing of bytes: if I have 6 bytes and want to send them to the GPU, where is each byte packed?
     
  11. bgolus

    I'm confused by the question, because you aren't giving enough information for me to be able to answer.

    But think about it this way. The actual layout of the structs on the C# and HLSL sides does not actually matter that much. They certainly don't need to match. The CPU is passing a stream of bits to the GPU, which is interpreted in whatever way you want. That's how the "TwoTwoBytes" example worked. It was passing an array of 32 bit elements that on the CPU was a struct of four byte values, and on the GPU was an array of uint values. You could pass an array of structs of six byte values, and then still interpret that as an array of uints, where 4 values are packed into one uint, and 2 more are packed into the start / end of the next / previous uint.

    Though depending on your data you might start looking at other ways of packing, especially if the values don't use the full 0-255 range a byte gives you. For example, you're only using 6 values for the orientation. Assuming you're looking to separate those out from the position, that only needs 3 bits to store. So you could directly bit pack a uint on the CPU and potentially get all 6 values you need in that (depending on the precision you need for each).
     
  12. SuperFranTV

    If this is my struct, each value can be 0 to 255.
    I currently need something like this for testing; I hope I can avoid it.

    Code (CSharp):
    struct MoreBytes {
        byte a, b, c, d, e, f;
    }
    My main problem is loading the world each frame when the player position updates; that impacts my performance too much.

    Generating and loading once gives me >300 FPS, but I need a fast way to get all voxels within render distance and put them in the buffer.

    In my case: 1 voxel has 6 quads, and only the quads whose neighbor is air are shown. I got it working to precalculate the quads once. Now all data is precalculated once.

    The best option for me might be a list of chunks where each chunk has an array of voxel data, but nested arrays are not allowed in Burst :/

    Is there a good way to get all positions (rounded, like 0,0,1 or 10,2,3) around the player location and do a .Copy inside BeginWrite to the ComputeBuffer?
     
  13. bgolus

    That sounds more like a data management problem rather than anything directly related to rendering.

    And no, there’s no way to efficiently get only positions around you to copy out of a basic array. This is why things like Morton Z ordering are a thing, as well as data partitioning. The simplest solution is to break up your world into larger chunks, with each chunk having a pre-calculated AABB bounds. Test against those bounds, and then either render each chunk separately, or copy them into a larger flat array only when you need to change what you’re rendering.
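    For reference, Morton (Z-order) indexing interleaves the bits of x, y and z so that cells that are close in 3D tend to sit close together in the flat array. The following is a simple, unoptimized sketch in Python (runnable outside Unity). The function names and the 10 bits per axis default are assumptions for the example.

```python
def morton3(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Z-order index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)       # x takes bit positions 0, 3, 6, ...
        code |= ((y >> i) & 1) << (3 * i + 1)   # y takes bit positions 1, 4, 7, ...
        code |= ((z >> i) & 1) << (3 * i + 2)   # z takes bit positions 2, 5, 8, ...
    return code


def demorton3(code, bits=10):
    """Inverse of morton3: recover (x, y, z) from a Z-order index."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


# Flatten a small 4x4x4 grid: slot morton3(x, y, z) holds the cell at (x, y, z).
size = 4
flat = [None] * (size ** 3)
for x in range(size):
    for y in range(size):
        for z in range(size):
            flat[morton3(x, y, z)] = (x, y, z)

print(flat[:8])  # the first eight slots form one 2x2x2 block
```

    With this ordering, any aligned 2x2x2 (or 4x4x4, and so on) block occupies one contiguous run of the array, which is what makes copying a spatial region cheaper than with a plain x/y/z flattening. The grid size per axis needs to be a power of two for the indices to be gap-free.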
     
  14. SuperFranTV

    This looks like my problem solver, but how can I create a flattened 3D array with Z ordering?
     
  15. SuperFranTV

    Another solution that could work for me would be:

    1. I need all positions that the player sees, like a "field of view"
    2. Then I convert them to the indexes for each position

    Now comes the point where I can't get any further.

    3. Copy individual values into the ComputeBuffer (like indexes, but never copy a byte or int that is equal to 0)

    Some pseudo code:
    Code (CSharp):
    NativeArray<T>.Copy(data, firstIndex, bufferData, 0, amount) where i > 0;

    If I get this working, I'm finished with all the generating and loading :D
     
  16. bgolus

    There isn't really any solution to frustum culling that isn't going through the list one by one and adding the entries that pass the visibility to a new list.
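    That "walk the list and keep what passes" approach could look roughly like the sketch below, written in Python so it runs outside Unity. It tests each chunk's axis-aligned bounding box against a set of planes. The plane format (nx, ny, nz, d), with the inside being where n.p + d >= 0, and the chunk tuples are assumptions made for the example. The test is conservative, so it can keep a few boxes that are actually just outside the frustum.

```python
def aabb_outside_plane(box_min, box_max, plane):
    """True if the whole box lies on the negative side of the plane."""
    nx, ny, nz, d = plane
    # Pick the box corner that lies furthest along the plane normal.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    return nx * px + ny * py + nz * pz + d < 0


def visible_chunks(chunks, planes):
    """Go through the chunks one by one and collect those not rejected by any plane."""
    visible = []
    for chunk_id, box_min, box_max in chunks:
        if not any(aabb_outside_plane(box_min, box_max, p) for p in planes):
            visible.append(chunk_id)
    return visible


# Two hypothetical planes that keep the slab 0 <= x <= 10.
planes = [(1, 0, 0, 0), (-1, 0, 0, 10)]
chunks = [
    ("A", (2, 0, 0), (4, 1, 1)),     # fully inside
    ("B", (20, 0, 0), (22, 1, 1)),   # beyond x = 10
    ("C", (-5, 0, 0), (-1, 1, 1)),   # below x = 0
    ("D", (9, 0, 0), (12, 1, 1)),    # straddles x = 10
]
print(visible_chunks(chunks, planes))
```

    Only the chunks that survive would then be copied into the buffer that gets drawn.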

    However you might want to look into doing it on the GPU instead of on the CPU.
     
  17. SuperFranTV


    I've got it working with a chunk system inside a NativeMultiHashMap<int, uint2>; the loading is very fast.
    My current problem is:

    I store the whole world inside the NativeMultiHashMap, but I get the message:

    "Attempted to operate on {size} bytes of memory: nonsensical"

    Where is the capacity limit?