ComputeBuffer.BeginWrite.

sebas77 · Mar 14, 2022

https://docs.unity3d.com/2021.2/Documentation/ScriptReference/ComputeBuffer.BeginWrite.html

When I tested this almost 2 years ago, I found that it worked ONLY with the Vulkan renderer. Is this still the case?

joelv · Mar 15, 2022

No, it should be working on all platforms (Hybrid Renderer builds on this). It might take a slow path on some (not slower than SetData).
There is now also a variant for GraphicsBuffer (2022.1 and up IIRC).

There are a lot of caveats around this API though. You need to keep track of what the GPU is reading so you don't overwrite something by mistake. It might also place your buffer in write-combined memory or CPU-side visible memory, which is slower to read from the GPU. Use it carefully, and take care with how you read it on the GPU and write to it on the CPU.

sngdan · Mar 15, 2022

@joelv

It would really be nice if you could post an example and watch outs for this API.
I played with it around the same time @sebas77 did, and @eizenhorn must have too, as he gave me some tips here in the forum (unofficial watch outs, possibly outdated). It seems there have been further updates in 2022.1.

This was an interesting API, in particular with ECS/DOTS, and I wanted to download Unity again when/if Entities 0.50 comes out. A question I had back then was how you would update specific parts of the compute buffer. I.e. if you keep track of chunk changes in ECS (and ensure the order is not changing), how do you do the begin/end write: (1) you map the full buffer and only update the changed indices, or (2) you do X begin/end writes, one for each of the X changed chunks, and only update those slices…

joelv · Mar 16, 2022

sngdan said: ↑

@joelv
It would really be nice if you could post an example and watch outs for this API.

For the hybrid renderer we do this:

First we get a list of all chunks that have changed and calculate a set of sizes from that, which gives us an upper bound on how many upload buffers we need. We then do BeginWrite on these upload buffers on the main thread and kick off a series of upload jobs that copy GPU-visible components to our upload buffers. This is a simple memcpy operation that copies directly to GPU-visible memory, with no extra steps through other buffers as is needed when you do SetData.

After all the copy jobs are done we EndWrite on the main thread again and do a series of compute shader dispatches. These dispatches will execute on the GPU/Graphics timeline and patch up the big buffer where we store all component data. This has the upside that we do not keep track of different ranges in the big buffer per frame. It's always in a good state once the series of dispatches are done.
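
The pack-then-patch flow described above can be modelled in a few lines. This is only a sketch in Python, not Unity code and not the Hybrid Renderer's implementation: plain lists stand in for the upload buffer and the big buffer, and `CopyOp`, `pack_upload` and `patch_big_buffer` are hypothetical names invented for the illustration. It shows the CPU writing changed data back to back (linearly) into an upload buffer and a separate step scattering it to the right offsets.

```python
from dataclasses import dataclass


@dataclass
class CopyOp:
    src: int    # offset into the upload buffer
    dst: int    # offset into the big buffer
    count: int  # number of elements to copy


def pack_upload(changed):
    """Pack changed ranges back to back into one upload buffer.

    `changed` maps a destination offset in the big buffer to the new
    values for that range. The upload buffer is only ever appended to,
    so the CPU-side writes are strictly linear.
    """
    upload, ops = [], []
    for dst in sorted(changed):
        values = changed[dst]
        ops.append(CopyOp(src=len(upload), dst=dst, count=len(values)))
        upload.extend(values)
    return upload, ops


def patch_big_buffer(big, upload, ops):
    """Stand-in for the GPU dispatches that scatter uploads into place."""
    for op in ops:
        big[op.dst:op.dst + op.count] = upload[op.src:op.src + op.count]


# Hypothetical data: two changed ranges in a 10-element big buffer.
big = [0] * 10
upload, ops = pack_upload({6: [7, 8], 1: [5, 5, 5]})
patch_big_buffer(big, upload, ops)
```

In this model the big buffer is never mapped on the CPU; only the small upload buffer is written, and the list of copy operations is what a patch-up pass would consume.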

The only thing we need to ensure here is that the upload buffers are not reused too soon. Initially we did a fixed N frame multibuffering but it seems that is not to be trusted. So currently we keep track of where the GPU is using a tiny AsyncReadback per frame. It's certainly not nice but currently there is no exposed API to keep track of what frame is done on the GPU.
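
The reuse rule above (do not hand out an upload buffer again until the GPU is known to be done with the frame that used it) can be sketched as a small pool. This is an illustrative Python model, not Unity API: `UploadBufferPool` and its methods are hypothetical, buffers are represented by integer ids, and `on_gpu_frame_completed` stands in for whatever completion signal is available, such as the per-frame readback mentioned above.

```python
class UploadBufferPool:
    """Tracks which upload buffers are safe to reuse."""

    def __init__(self):
        self.free = []       # buffer ids ready for reuse
        self.in_flight = []  # (frame_submitted, buffer_id), oldest first
        self.next_id = 0

    def acquire(self, frame):
        """Return a buffer id for `frame`, allocating a new one if needed."""
        if self.free:
            buf = self.free.pop()
        else:
            buf = self.next_id
            self.next_id += 1
        self.in_flight.append((frame, buf))
        return buf

    def on_gpu_frame_completed(self, completed_frame):
        """Release every buffer submitted on or before `completed_frame`."""
        still_busy = []
        for frame, buf in self.in_flight:
            if frame <= completed_frame:
                self.free.append(buf)
            else:
                still_busy.append((frame, buf))
        self.in_flight = still_busy


pool = UploadBufferPool()
a = pool.acquire(frame=1)
b = pool.acquire(frame=2)
c = pool.acquire(frame=3)       # GPU has reported nothing yet: new buffer
pool.on_gpu_frame_completed(1)  # frame 1 is done on the GPU
d = pool.acquire(frame=4)       # buffer from frame 1 is reused
```

Unlike a fixed N-frame scheme, the pool grows when the GPU falls behind instead of overwriting a buffer that may still be in use.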

You can of course implement a ring buffer or do multiple begin/end writes instead of using multiple upload buffers, but it's not guaranteed to be that good because of some hoops we have to jump through to make this fast on some devices. DX11 notoriously does not allow persistently mapped buffers, and many devices have problems with doing this kind of thing from multiple threads (the graphics device usually lives on a separate thread).

What to watch out for is: Call BeginWrite only once per frame on a buffer, calling it multiple times can hit slow paths. Always write to allocated buffers linearly, never read on the CPU. Read only once from the buffer on the GPU.

Hope this clears at least some things up, but it's not simple at all sadly.

sngdan · Mar 16, 2022

Thank you very much. I had this issue of reusing buffers too soon and then used fixed N-frame multibuffering (based on a tip from Eizenhorn). I also had the idea of using async readback but never implemented it. Thank you for clearing this up and sharing what the current Unity process is. It would definitely be helpful if this gets some better API support in the future…

unity_fime5ey3qd · Mar 18, 2022

On DX11 it works fine only with StructuredBuffer (not RWStructuredBuffer).
On Vulkan it works fine with both StructuredBuffer and RWStructuredBuffer.

sebas77 · Mar 18, 2022

So it works fine only if I just read from the buffers? I think that's fine, as I have to write from the CPU and read from the GPU.

Tony_Max · Sep 23, 2023

joelv said: ↑

What to watch out for is: Call BeginWrite only once per frame on a buffer, calling it multiple times can hit slow paths. Always write to allocated buffers linearly, never read on the CPU. Read only once from the buffer on the GPU.

What does it mean to write to allocated buffers linearly?

Asking because I'm trying to find the source of graphical artifacts in my system. I use ComputeBuffer and call BeginWrite on it to write from parallel jobs. Jobs can write to arbitrary ranges of the buffer depending on chunk changes. Is this allowed? I read the values in the shader through a StructuredBuffer.

c0d3_m0nk3y · Sep 23, 2023

I think, joelv was referring to write-combined memory:
https://fgiesen.wordpress.com/2013/01/29/write-combining-is-not-your-friend/

Tony_Max · Sep 23, 2023

c0d3_m0nk3y said: ↑

I think, joelv was referring to write-combined memory:
https://fgiesen.wordpress.com/2013/01/29/write-combining-is-not-your-friend/

From what I understand from this page, there are two things I should do:
* avoid reading from the buffer, which I already do
* write to the whole range, even if some items aren't used

If this is correct, then the latter may be the source of the issue, but then how do I efficiently update only ranges of a ComputeBuffer? In my case I want to do that because I want to update data on the GPU only for changed chunks.

sngdan · Sep 23, 2023

If I recall correctly, I was keeping an index of changed chunks and tried to split the compute buffer writes into larger contiguous blocks (including some unchanged chunks, but avoiding large contiguous unchanged blocks)… can't remember if this was the old SetData API or begin/end write… or both…
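
The block-splitting idea above can be sketched as follows. This is a Python illustration under assumptions, not the code from the linked thread: `merge_changed_chunks` and the `max_gap` threshold are hypothetical names. Changed chunk indices are merged into one block whenever the run of unchanged chunks between them is short enough, so small gaps are uploaded along with the changes while large unchanged runs are skipped.

```python
def merge_changed_chunks(changed, max_gap):
    """Group changed chunk indices into inclusive (first, last) blocks.

    Two changed chunks end up in the same block when at most `max_gap`
    unchanged chunks lie between them.
    """
    blocks = []
    for index in sorted(set(changed)):
        if blocks and index - blocks[-1][1] - 1 <= max_gap:
            # Close enough to the previous block: extend it.
            blocks[-1] = (blocks[-1][0], index)
        else:
            # Gap too large: start a new block.
            blocks.append((index, index))
    return blocks


# Hypothetical input: chunk 2 is unchanged but gets swallowed into the
# first block; the long unchanged runs 4..9 and 12..19 are skipped.
blocks = merge_changed_chunks([0, 1, 3, 10, 11, 20], max_gap=2)
```

Each resulting block can then be written front to back, which keeps the writes within a block linear. Whether several blocks per frame beat one full-buffer write depends on the platform caveats discussed earlier in the thread.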

edit:
https://forum.unity.com/threads/200k-dynamic-animated-sprites-at-80fps.695809/page-2#post-5024381

in buffer system
value.y = chunk.Count;

in render system
UpdateRenderBuffer

But I see it's not the latest version. I had one with begin/end write, thanks to eizenhorn, I think, back in the day.

Tony_Max · Sep 23, 2023

sngdan said: ↑

If I recall correctly, I was keeping an index of changed chunks and tried to split the compute buffer writes into larger contiguous blocks (including some unchanged chunks, but avoiding large contiguous unchanged blocks)

As I understand it, this will lead to calling BeginWrite multiple times in the same frame, which also seems like a "bad thing".

joelv said: ↑

After all the copy jobs are done we EndWrite on the main thread again and do a series of compute shader dispatches. These dispatches will execute on the GPU/Graphics timeline and patch up the big buffer where we store all component data. This has the upside that we do not keep track of different ranges in the big buffer per frame. It's always in a good state once the series of dispatches are done.

The only thing we need to ensure here is that the upload buffers are not reused too soon. Initially we did a fixed N frame multibuffering but it seems that is not to be trusted. So currently we keep track of where the GPU is using a tiny AsyncReadback per frame. It's certainly not nice but currently there is no exposed API to keep track of what frame is done on the GPU.

Here joelv talks about some "upload" buffers and another "big buffer". To me this sounds like a way to write to the GPU buffer only for changed ranges without breaking things.

sngdan · Sep 24, 2023

Tony_Max said: ↑

As I understand it, this will lead to calling BeginWrite multiple times in the same frame, which also seems like a "bad thing".

Yes. I can't remember what worked best in the end. This was 4 years ago, and begin/end write and DOTS were new then. I tried many things, rendered a million+ sprites, and recall that this was the last thing to optimize (sending only changes to the GPU). I recall that I had to pool buffers for 4 wait frames x multiple writes per frame, but I don't remember whether this was better or worse than 1 full buffer per frame.

So there is still no good solution, and writing the whole buffer works best, is what you are saying?

edit: I just re-read the thread. Sorry, I have not been active for a long time and am forgetful. @joelv, you explained it well, but I still would not know how you patch up on the GPU. If I recall correctly, I was trying to write blocks to the same big buffer on the GPU by read/write to parts of it, based on changed chunk ranges (which seems not to be the right way). Effectively I had a copy of the big buffer in memory and then decided which contiguous blocks to upload via begin/end write per frame.

Tony_Max · Sep 24, 2023

sngdan said: ↑

So there is still no good solution, and writing the whole buffer works best, is what you are saying?

First of all, it works without any artifacts. Second, yes, this might be more optimized than waiting frames, and it will also be more honest.

But I'm more asking than saying.
