Question Sending Data to GPU, ComputeShader/Buffer, MaterialProperty and Hlsl Script

SuperFranTV · Nov 30, 2021

I currently have some questions about data structures: how can I send only the minimum I need to the GPU, plus a few related things. (I'm using URP.)

1. I need to send 2 values, each between 0 and 255.

So there are these small types (size in bytes):
byte = 1
short = 2
int = 4
uint = 4

But a byte can't be sent to the GPU, right, so the minimum I can use is short?

2. I'm using DrawMeshInstancedProcedural and have some questions about its parameters.
The mesh (a quad) and the material are fine.

Submeshes: I don't need them. Is it okay to set the value to 0, or is there a way to remove them completely?

Bounds: I changed them to zero and to other values, but nothing changed when I hit play?

Buffer count: if I send a larger buffer that is only half filled, what is used for the rest?

MaterialPropertyBlock: I only use it for SetBuffer. If I use one StructuredBuffer for everything, is there a way to send it directly without a MaterialPropertyBlock?

3. There are 2 kinds of files:
An HLSL file (.hlsl), which can be used to bring custom code into Shader Graph, right?
A compute shader (.compute), which lets the GPU calculate things, with kernels and numthreads.

How do I get the perfect number of threads for a compute shader?

Now my question is: is it useful to use an HLSL file and a compute shader together with Shader Graph or a shader?
Is there a performance difference?

Once I'm clear about all this, I can build my best option from the ground up.
I hope someone can explain some of these things to me.
Thank you all

SuperFranTV · Nov 30, 2021

Edit:

2. BufferCount: I found out that using a large ComputeBuffer but filling it with only some content impacts performance very heavily.

bgolus · Nov 30, 2021

SuperFranTV said: ↑

1. I need to send 2 values, each between 0 and 255.
So there are these small types (size in bytes):
byte = 1
short = 2
int = 4
uint = 4
But a byte can't be sent to the GPU, right, so the minimum I can use is short?

It depends on how you're sending data. But generally you can't guarantee the GPU will recognize anything but 32 bit types: int, uint, float. HLSL has no concept of byte or short. There are fixed and half variable types, but they're defined as a signed floating point value that "can hold a value between -2 and 2 with a precision of at least 1/256" or is "at least 16 bits", and a 32 bit float fulfills the requirements for both, so most GPUs just use that.

The easiest solution to passing two bytes to shaders is ... don't. Just pass two float values. You can define them as int or uint in the shader file, and there's even a material.SetInt() function, but it's a lie. Under the hood Unity casts that int to a float when you call SetInt(), then casts it back to an int or uint depending on what the shader wants.

However if you're passing a lot of values via a compute buffer, you can take advantage of the fact that C# does support the byte variable type, and that the compute buffer is passed to the GPU as raw bits that can be interpreted any way you want.

Code (csharp):

// C#
// create compute buffer
ComputeBuffer cb = new ComputeBuffer(numObjects, 2); // 2 bytes
// important: numObjects needs to be an even number

// struct of two bytes
public struct TwoBytes
{
    public byte a;
    public byte b;
}

// create array of bytes
TwoBytes[] data = new TwoBytes[numObjects];

// set the data in the array
for (int i = 0; i < numObjects; i++)
{
    data[i].a = 0; // object byte value A goes here
    data[i].b = 0; // object byte value B goes here
}

// copy data in array into the compute buffer
cb.SetData(data);

// pass it to the shader calling SetBuffer() where appropriate

Code (csharp):

// shader code
StructuredBuffer<uint> _Data; // yes, a 32 bit uint, not a struct, not bytes

uint2 GetDataAtIndex(uint index)
{
    // real index is half of input index because shader is working with 32 bit uints and not bytes
    // this means the two bytes per index are packed into the first 16 and last 16 bits of the 32 bit uint
    uint realIndex = index / 2;
    uint packedData = _Data[realIndex];

    // bit shift over 16 bits if we're trying to get the odd index
    if (index % 2 == 1)
        packedData = (packedData >> 16);

    // a byte is 8 bits, so the mask has to be 0xFF (0xF would only keep 4 bits)
    return uint2(
        (packedData >> 0) & 0xFF, // extract the first byte
        (packedData >> 8) & 0xFF  // extract the second byte
    );
}

If you're looking to pack this into an existing struct with other data in it, you're likely best off just padding out the struct to keep it byte aligned to 32 bits.

Code (csharp):

// C# struct
public struct MyDataStruct
{
    public Vector3 position;
    public byte a;
    public byte b;
    public short padding;
} // sizeof(MyDataStruct) == 16

// hlsl struct
struct myDataStruct {
    float3 position;
    uint packedData;
}; // "get data" function just uses the last 2 lines to unpack
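A minimal C# sketch for checking the padded layout on the CPU before it reaches a shader. It assumes a little-endian machine and uses System.Numerics.Vector3 as a stand-in for UnityEngine.Vector3 so it can run outside Unity; the class PaddedStructDemo and the helper ReadPackedData are made-up names.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Same field order as the C# struct above. System.Numerics.Vector3 is an
// assumption standing in for UnityEngine.Vector3 (both hold three 32 bit floats).
[StructLayout(LayoutKind.Sequential)]
public struct MyDataStruct
{
    public Vector3 position; // bytes 0..11
    public byte a;           // byte 12
    public byte b;           // byte 13
    public short padding;    // bytes 14..15
}

public static class PaddedStructDemo
{
    // Hypothetical helper: returns the 4th 32 bit word of the struct,
    // which is what the HLSL side reads as "uint packedData".
    public static uint ReadPackedData(MyDataStruct value)
    {
        Span<MyDataStruct> one = new MyDataStruct[1];
        one[0] = value;
        Span<uint> words = MemoryMarshal.Cast<MyDataStruct, uint>(one);
        return words[3];
    }

    public static void Main()
    {
        var s = new MyDataStruct { position = new Vector3(1, 2, 3), a = 200, b = 17 };
        uint packed = ReadPackedData(s);

        // Size in bytes, i.e. the stride a buffer of these structs would need.
        Console.WriteLine(Marshal.SizeOf<MyDataStruct>());
        Console.WriteLine(packed & 0xFF);        // recovers a
        Console.WriteLine((packed >> 8) & 0xFF); // recovers b
    }
}
```

The two masked reads are the same operations the shader would apply to packedData.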

SuperFranTV said: ↑

Submeshes: I don't need them. Is it okay to set the value to 0, or is there a way to remove them completely?

You always need at least 1 submesh. A mesh with zero submeshes is a mesh with no data.

SuperFranTV said: ↑

Bounds: I changed them to zero and to other values, but nothing changed when I hit play?

Bounds are used by MeshRenderer components for CPU side frustum and occlusion culling. When you use DrawMeshInstancedProcedural(), and several of the similar functions, you're telling Unity to skip any of that and you're handling it yourself, especially since the position data you're passing in might not even ever be known on the CPU side.

SuperFranTV said: ↑

Buffer count: if I send a larger buffer that is only half filled, what is used for the rest?

Junk data. Hopefully zeros, but I don't know if it's guaranteed.

SuperFranTV said: ↑

MaterialPropertyBlock: I only use it for SetBuffer. If I use one StructuredBuffer for everything, is there a way to send it directly without a MaterialPropertyBlock?

That's about as direct as you get. You could call SetBuffer() on the material directly, but if you're rendering multiple sets of meshes with the same material you'll want to use the property blocks.

SuperFranTV said: ↑

How do I get the perfect number of threads for a compute shader?

I think we all wish we knew that answer.

SuperFranTV said: ↑

Now my question is: is it useful to use an HLSL file and a compute shader together with Shader Graph or a shader?
Is there a performance difference?

Shader Graph is a shader generator. It spits out HLSL shader code that is otherwise nearly identical to what you could write by hand when writing a vertex fragment shader. The advantage of Shader Graph is it "just works" with the lighting systems without you having to do anything.

Writing a vertex fragment shader by hand may produce slightly more efficient / faster shader code as you can be very explicit about making sure the shader only does the things you need it to, but most of the time it won't be a significant difference.

However you can't use a compute shader with Shader Graph, not directly. You can run a compute shader to generate data that you store in a compute buffer, then use that buffer with a Shader Graph that has a Custom Function node pointing at an HLSL file that accesses that buffer to extract the relevant data. But you can't include a compute shader in a Shader Graph. And at this time you can't create compute shaders using Shader Graph.

SuperFranTV · Dec 1, 2021

bgolus said: ↑

However if you're passing a lot of values via a compute buffer, you can take advantage of the fact that C# does support the byte variable type, and that the compute buffer is passed to the GPU as raw bits that can be interpreted any way you want.

First of all, thank you for the detailed explanation.
I tried something yesterday evening, starting with sending only the int as an index to the GPU.
As seen here, I got some errors and a strange problem:
https://forum.unity.com/threads/strange-thing-inside-shader-textfile-hlsl.1205551/

First I need to convert the index to a float4x4 for the mesh's matrix. Once that is done, I can switch from int to the byte script you posted here. Thank you for that, it is very useful in my case.

bgolus said: ↑

You always need at least 1 submesh. A mesh with zero submeshes is a mesh with no data.

But I already put a "0" into that field and everything works fine?

Code (CSharp):

Graphics.DrawMeshInstancedProcedural(mesh, 0, material, bounds, buffer.count, propertyBlock);

bgolus said: ↑

Bounds are used by MeshRenderer components for CPU side frustum and occlusion culling. When you use DrawMeshInstancedProcedural(), and several of the similar functions, you're telling Unity to skip any of that and you're handling it yourself, especially since the position data you're passing in might not even ever be known on the CPU side.

So I can ignore the bounds, because I'm only sending the data that I want to see?

bgolus said: ↑

That's about as direct as you get. You could call SetBuffer() on the material directly, but if you're rendering multiple sets of meshes with the same material you'll want to use the property blocks.

I'm using only 1 type of mesh (a quad), so I'll test what the difference is. Thank you for that fact.

bgolus said: ↑

I think we'd all wish we knew that answer.

I found this on the forum; I think it can be a starting point.

Code (CSharp):

Remember that the numbers you pass to Dispatch() are the amount of groups, not threads. If you want to process 4096 items and your kernel group size is (128, 1, 1), you need to call Dispatch(32, 1, 1).
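The arithmetic in that quote can be sketched in plain C# as follows. This only shows how a group count could be derived from an item count and the group size declared in the kernel's [numthreads]; the names DispatchMath and GroupCount are made up, and the result is what would be passed as the X argument of Dispatch().

```csharp
using System;

public static class DispatchMath
{
    // Number of thread groups needed to cover itemCount items when each
    // group runs groupSize threads (the X value of [numthreads] in the kernel).
    public static int GroupCount(int itemCount, int groupSize)
    {
        return (itemCount + groupSize - 1) / groupSize; // integer ceiling division
    }

    public static void Main()
    {
        Console.WriteLine(GroupCount(4096, 128)); // the example from the quote: 32 groups
        Console.WriteLine(GroupCount(4097, 128)); // one extra item needs one extra group: 33
    }
}
```

When the item count is not a multiple of the group size, the last group contains threads with nothing to process, so the kernel would typically compare its thread id against the item count and return early.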

bgolus said: ↑

However you can't use a compute shader with Shader Graph, not directly.

My approach is: create the data on the CPU, then have the compute shader convert or do some math on that data, and then use that data in the text file (.hlsl) inside Shader Graph to finally let the shader do its thing.

I'm a little smarter now than I was before.

bgolus · Dec 1, 2021

SuperFranTV said: ↑

But I already put a "0" into that field and everything works fine?

Ah! I misunderstood the question! I was thinking about the settings on the mesh you passed to DrawMeshInstancedProcedural(), not the actual parameters of that function!

Let's do this again.
submeshIndex: You want it to be 0 because that's the first submesh in the mesh. If you were using a mesh with multiple materials you'd need to call DrawMeshInstancedProcedural() multiple times, once for each submesh. The quad mesh just has one submesh.

bounds: This does need to be a position that's in view of the camera. If the world origin is in view, a zero bounds will work. While the individual objects won't get frustum culled automatically, the entire draw mesh call might be.

SuperFranTV said: ↑

I'm using only 1 type of mesh (a quad), so I'll test what the difference is. Thank you for that fact.

It's less about whether you're using one mesh and more about whether you're calling DrawMeshInstancedProcedural() multiple times per frame reusing the same material. Without property blocks, you'd need a unique material per DrawMeshInstancedProcedural() call.

SuperFranTV · Dec 1, 2021

bgolus said: ↑

ComputeBuffer cb = new ComputeBuffer(numObjects, 2); // 2 bytes

I implemented it in my code but I got this error:

Code (CSharp):

Invalid stride 2 for Compute Buffer - must be greater than 0, less or equal to 2048 and a multiple of 4.

I didn't know that the stride of a ComputeBuffer has to be at least 4?

bgolus · Dec 1, 2021

Ah, yeah. I guess Unity "knows" that the GPU can only interpret 32 bit variables. I was trying to sidestep that by using a stride of 2 and having you make sure you use an even numObjects count.

Just means you'll have to deal with some of the logistics of the values actually being packed on the C# side as well.

You'd have to use a struct with 4 byte variables in it with "Object i+0" and "Object i+1" represented.

bgolus · Dec 1, 2021

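Code (csharp):

// C#
int bufferSize = Mathf.CeilToInt((float)numObjects / 2f);

// create compute buffer
ComputeBuffer cb = new ComputeBuffer(bufferSize, 4); // 4 bytes

// struct of two pairs of bytes (two objects per struct)
public struct TwoTwoBytes
{
    public byte a0;
    public byte b0;
    public byte a1;
    public byte b1;
}

// create array of structs
TwoTwoBytes[] data = new TwoTwoBytes[bufferSize];

// set the data in the array
for (int i = 0; i < numObjects; i += 2)
{
    int slot = i / 2; // one struct holds objects i and i+1, so the array index is half of i
    data[slot].a0 = 0; // object i byte value A goes here
    data[slot].b0 = 0; // object i byte value B goes here
    data[slot].a1 = 0; // object i+1 byte value A goes here (unused if numObjects is odd)
    data[slot].b1 = 0; // object i+1 byte value B goes here (unused if numObjects is odd)
}

// copy data in array into the compute buffer, as before
cb.SetData(data);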

The shader code would be unchanged.
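To see where each byte ends up, here is a minimal C# sketch that reinterprets a TwoTwoBytes array as uints, the way the GPU would read the buffer, and mirrors GetDataAtIndex() on the CPU. It assumes a little-endian CPU and a full byte mask of 0xFF; the class name PackingDemo and the test values are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct TwoTwoBytes
{
    public byte a0, b0, a1, b1;
}

public static class PackingDemo
{
    // CPU mirror of the shader's GetDataAtIndex(), with a full byte mask.
    public static (uint a, uint b) GetDataAtIndex(ReadOnlySpan<uint> data, uint index)
    {
        uint packed = data[(int)(index / 2)];
        if (index % 2 == 1)
            packed >>= 16; // odd objects live in the upper 16 bits
        return (packed & 0xFF, (packed >> 8) & 0xFF);
    }

    public static void Main()
    {
        int numObjects = 3;                    // odd on purpose
        int bufferSize = (numObjects + 1) / 2; // two objects per struct
        var data = new TwoTwoBytes[bufferSize];

        for (int i = 0; i < numObjects; i++)
        {
            byte a = (byte)(10 + i), b = (byte)(200 + i); // made-up test values
            if (i % 2 == 0) { data[i / 2].a0 = a; data[i / 2].b0 = b; }
            else            { data[i / 2].a1 = a; data[i / 2].b1 = b; }
        }

        // Reinterpret the same memory as 32 bit uints, like StructuredBuffer<uint> does.
        Span<TwoTwoBytes> structs = data;
        ReadOnlySpan<uint> asUints = MemoryMarshal.Cast<TwoTwoBytes, uint>(structs);

        for (uint i = 0; i < numObjects; i++)
        {
            var (a, b) = GetDataAtIndex(asUints, i);
            Console.WriteLine($"object {i}: a={a} b={b}");
        }
    }
}
```

Under the little-endian assumption, a0 sits in bits 0 to 7, b0 in bits 8 to 15, a1 in bits 16 to 23 and b1 in bits 24 to 31 of each uint.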

SuperFranTV · Dec 2, 2021


For now I get no errors, but the DrawMesh call isn't drawing anything?

Code (CSharp):

uint2 GetDataAtIndex(uint index) {
    // real index is half of input index because shader is working with 32 bit uints and not bytes
    // this means the two bytes per index are packed into the first 16 and last 16 bits of the 32 bit uint
    uint realIndex = index / 2;
    uint packedData = _Indexes[realIndex];

    // bit shift over 16 bits if we're trying to get the odd index
    if (index % 2 == 1)
        packedData = (packedData >> 16);

    return uint2(
        (packedData >> 0) & 0xFF, // extract the first byte
        (packedData >> 8) & 0xFF  // extract the second byte
    );
}

void ConfigureProcedural () {
    #if defined(UNITY_PROCEDURAL_INSTANCING_ENABLED)
        uint2 i2 = GetDataAtIndex(unity_InstanceID);
        int i = i2.x;
        int y = i / (128 * 128);
        int x = (i - y * 128 * 128) / 128;
        int z = i - y * 128 * 128 - x * 128;
        int d = (i % 6);
        float3 v = float3(x, y, z) + DirectionVector[d];
        float3x4 m = float3x4(rot1[d], rot2[d], rot3[d], v);
        //unity_ObjectToWorld = m;
        //float3x4 m = _Matrices[unity_InstanceID];
        unity_ObjectToWorld._m00_m01_m02_m03 = m._m00_m01_m02_m03;
        unity_ObjectToWorld._m10_m11_m12_m13 = m._m10_m11_m12_m13;
        unity_ObjectToWorld._m20_m21_m22_m23 = m._m20_m21_m22_m23;
        unity_ObjectToWorld._m30_m31_m32_m33 = float4(0.0, 0.0, 0.0, 1.0);
    #endif
}

Is this correct? I added a TwoBytes struct with 4 values (a, b, a1, b1) like you posted above.
I'm currently leaving "b" and "a1, b1" out of the calculation. b is the index into the texture atlas; that comes into the game later.

The buffer is now smaller than before, when I was sending a float3x4 with a stride of 48. That's great if I get it working xD

If I want to use the b1 and a1 values later for other things, how should I extract them inside the shader? Because a was inside the first 16 bits and b is inside the last 16 bits, but where are a1 and b1?

I got it working by changing the matrix.

Update [SOLVED]: I got it working correctly; I had made a mistake in the matrix.

Code (CSharp):

unity_ObjectToWorld._m00_m01_m02 = rot1[d] / 1.0; // Size x
unity_ObjectToWorld._m10_m11_m12 = rot2[d] / 1.0; // Size y
unity_ObjectToWorld._m20_m21_m22 = rot3[d] / 1.0; // Rotation / Size z
unity_ObjectToWorld._m03_m13_m23 = v; // Position
unity_ObjectToWorld._m33 = 1.0; // Always 1.0 for projection

That's the correct matrix now.

SuperFranTV · Dec 20, 2021

bgolus said: ↑

If you're looking to pack this into an existing struct with other data in it, you're likely best off just padding out the struct to keep it byte aligned to 32 bits.

Some questions about the packing of bytes: if I have 6 bytes and want to send them to the GPU, where is each byte packed?

bgolus · Dec 20, 2021

I'm confused by the question, because you aren't giving enough information to be able to answer.

But think about it this way. The actual layout of the structs on the C# and HLSL sides do not actually matter that much. They certainly don't need to match. The CPU is passing a stream of bits to the GPU which is being interpreted in whatever way you want. That's how the "TwoTwoBytes" example worked. It was passing an array of 32 bit values that on the CPU was a struct of 4 byte values, and on the GPU was an array of uint values. You could pass an array of 6 byte value structs, and then interpret that as an array of uints still, where 4 values are packed into one uint, and 2 more are packed into the start / end of the next / previous uint. Though depending you might start looking at other ways of packing, especially if the values don't use the full 0-255 range a byte gives you. For example you're only using 6 values for the orientation. Assuming you're looking to separate those out from the position, that only needs 3 bits to store. So you could directly bit pack a uint on the CPU and potentially get all 6 values you need in that (depending on the precision you need for each).

SuperFranTV · Dec 21, 2021

bgolus said: ↑

I'm confused by the question, because you aren't giving enough information to be able to answer.

If this is my struct, each value can be 0 to 255.
I currently need something like this for testing; I hope I can avoid it later.

Code (CSharp):

struct MoreBytes {
    byte a, b, c, d, e, f;
}

My main problem is loading the world each frame when the player position updates; that impacts my performance too much.

Generating and loading once gives me >300 FPS, but I need a fast way to get all voxels within render distance and put them in the buffer.

In my case, 1 voxel has 6 quads, and only the quads whose neighbor is air are shown. I got it working to precalculate the quads once, so now all data is precalculated once.

The best option for me might be a list of chunks where each chunk has an array of voxel data, but nested arrays are not allowed in Burst :/

Is there a good way to get all positions (rounded, like 0,0,1 or 10,2,3) around the player's location and do a .Copy inside BeginWrite to the ComputeBuffer?

bgolus · Dec 22, 2021

That sounds more like a data management problem rather than anything directly related to rendering.

And no, there’s no way to efficiently get only positions around you to copy out of a basic array. This is why things like Morton Z ordering are a thing, as well as data partitioning. The simplest solution is to break up your world into larger chunks, with each chunk having a pre-calculated AABB bounds. Test against those bounds, and then either render each chunk separately, or copy them into a larger flat array only when you need to change what you’re rendering.
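As a rough illustration of Morton Z ordering for a flattened 3D array, here is a minimal C# sketch. It assumes a cubic grid whose side length is a power of two and at most 10 bits per axis; the names Morton and Encode are made up, and the loop is the simple bit-by-bit form rather than an optimized one.

```csharp
using System;

public static class Morton
{
    // Interleaves the low 10 bits of x, y and z into one 30 bit index:
    // bit i of x lands at bit 3*i, of y at 3*i+1, of z at 3*i+2.
    public static uint Encode(uint x, uint y, uint z)
    {
        uint code = 0;
        for (int i = 0; i < 10; i++)
        {
            code |= ((x >> i) & 1u) << (3 * i);
            code |= ((y >> i) & 1u) << (3 * i + 1);
            code |= ((z >> i) & 1u) << (3 * i + 2);
        }
        return code;
    }

    public static void Main()
    {
        // Flatten a 4x4x4 grid; for power-of-two cubes the codes fill 0..size^3-1.
        const uint size = 4;
        var flat = new byte[size * size * size];
        for (uint z = 0; z < size; z++)
            for (uint y = 0; y < size; y++)
                for (uint x = 0; x < size; x++)
                    flat[Encode(x, y, z)] = (byte)(x + y + z); // made-up voxel value

        Console.WriteLine(Encode(1, 0, 0)); // 1
        Console.WriteLine(Encode(0, 1, 0)); // 2
        Console.WriteLine(Encode(0, 0, 1)); // 4
        Console.WriteLine(Encode(1, 1, 1)); // 7
        Console.WriteLine(Encode(2, 0, 0)); // 8
    }
}
```

In this ordering every aligned 2x2x2 block of voxels occupies 8 consecutive array entries, and every aligned 4x4x4 block occupies 64, which is what keeps spatially close voxels close together in memory.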

SuperFranTV · Dec 22, 2021

bgolus said: ↑

Morton Z ordering

This looks like the solution to my problem, but how can I create a flattened 3D array with Z ordering?

SuperFranTV · Dec 22, 2021

bgolus said: ↑

That sounds more like a data management problem rather than anything directly related to rendering.

Another solution that could work for me would be:

1. I need all positions that the player sees, like a "field of view"
2. Then I convert them to the index of each position

Now comes the point where I can't get any further.

3. Copy individual values into the ComputeBuffer (like indexes, but never copy a byte or int that is equal to 0)

Some pseudo code:

Code (CSharp):

NativeArray<T>.Copy(data, firstIndex, bufferData, 0, amount) where i > 0;

If I get this working, I'm finished with all the generating and loading.

bgolus · Dec 24, 2021

There isn't really any solution to frustum culling that isn't going through the list one by one and adding the entries that pass the visibility to a new list.

However you might want to look into doing it on the GPU instead of on the CPU.
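The "go through the list one by one" approach could look roughly like this on the CPU. This is a minimal sketch using System.Numerics types so it runs outside Unity; it tests points rather than bounding boxes, assumes the planes are given with normals pointing into the visible volume, and the names VisibilityFilter and CollectVisible are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class VisibilityFilter
{
    // Returns the indices of all positions that lie on the inner side of every plane.
    public static List<int> CollectVisible(IReadOnlyList<Vector3> positions,
                                           IReadOnlyList<Plane> planes)
    {
        var visible = new List<int>(positions.Count);
        for (int i = 0; i < positions.Count; i++)
        {
            bool inside = true;
            foreach (Plane plane in planes)
            {
                // Signed distance from the plane; negative means the point is outside.
                if (Plane.DotCoordinate(plane, positions[i]) < 0f)
                {
                    inside = false;
                    break;
                }
            }
            if (inside)
                visible.Add(i);
        }
        return visible;
    }

    public static void Main()
    {
        // Two made-up planes that keep everything with 0 <= x <= 10.
        var planes = new[]
        {
            new Plane(new Vector3(1, 0, 0), 0f),   // x >= 0
            new Plane(new Vector3(-1, 0, 0), 10f), // x <= 10
        };
        var positions = new[]
        {
            new Vector3(5, 0, 0),  // inside
            new Vector3(-1, 0, 0), // outside
            new Vector3(11, 0, 0), // outside
        };

        foreach (int index in CollectVisible(positions, planes))
            Console.WriteLine(index);
    }
}
```

The resulting index list is what would then be copied into the buffer used for drawing.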

SuperFranTV · Dec 26, 2021

bgolus said: ↑

There isn't really any solution to frustum culling that isn't going through the list one by one and adding the entries that pass the visibility to a new list.
However you might want to look into doing it on the GPU instead of on the CPU.

I've got it working with a chunk system inside a NativeMultiHashMap<int, uint2>; the loading is very fast.
My current problem is:

I store the whole world inside the NativeMultiHashMap, but I get the message:

"Attempted to operate on {size} bytes of memory: nonsensical"

Where is the capacity limit?
