
ComputeBuffer.BeginWrite.

Discussion in 'General Graphics' started by sebas77, Mar 14, 2022.

  1. sebas77

  2. joelv

    Unity Technologies

    No, it should work on all platforms (the Hybrid Renderer builds on it). It might take a slow path on some, but that path is no slower than SetData.
    There is now also a variant for GraphicsBuffer (2022.1 and up, IIRC).

    There are a lot of caveats around this API, though. You need to keep track of what the GPU is reading so you don't overwrite something by mistake. It might also place your buffer in write-combined or CPU-visible memory that is slower for the GPU to read from. Use it carefully, and pay attention to how you read the buffer on the GPU and how you write to it on the CPU.
     
  3. sngdan

    @joelv

    It would be really nice if you could post an example and the things to watch out for with this API.
    I played with it at the same time @sebas77 did, and @eizenhorn must have too, as he gave me some tips here on the forum (unofficial caveats, possibly outdated). It seems there have been further updates in 2022.1.

    This was an interesting API, in particular with ECS/DOTS, and I wanted to download Unity again when/if Entities 0.50 comes out. A question I had back then was how you would update specific parts of the compute buffer. I.e., if you keep track of chunk changes in ECS (and ensure the order doesn't change), how do you do the begin/end write: (1) write the full buffer and only update the changed indices, or (2) do X begin/end writes, one for each of the X changed chunks, and only update those slices?
     
  4. joelv

    Unity Technologies

    For the Hybrid Renderer, we do this:

    First we get a list of all chunks that have changed and calculate a set of sizes from it, which gives us an upper bound on how many upload buffers we need. We then call BeginWrite on these upload buffers on the main thread and kick off a series of upload jobs that copy GPU-visible components into the upload buffers. This is a simple memcpy directly into GPU-visible memory, with none of the extra steps through other buffers that SetData needs.
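    To make the packing step concrete, here is a minimal Python sketch that only models the CPU-side bookkeeping, not Unity's API. It assumes each changed chunk is one contiguous byte range of the big buffer and that upload buffers have a fixed capacity; `UploadRecord`, `pack_changed_chunks` and `upload_capacity` are hypothetical names. A `bytearray` stands in for the memory returned by BeginWrite, and the copy loop stands in for the parallel upload jobs. Running offsets (a prefix sum over the chunk sizes) give every copy its own disjoint slice, so each job can write its slice linearly.

```python
from dataclasses import dataclass


@dataclass
class UploadRecord:
    buffer_index: int   # which upload buffer holds the data
    upload_offset: int  # byte offset inside that upload buffer
    dest_offset: int    # byte offset in the big GPU-side buffer
    size: int           # number of bytes to copy


def pack_changed_chunks(chunks, upload_capacity):
    """chunks: list of (dest_offset, data) for the changed chunks.

    Returns (upload_buffers, records). Hypothetical model only.
    """
    records = []
    used = []  # bytes used so far in each upload buffer
    for dest_offset, data in chunks:
        size = len(data)
        if size > upload_capacity:
            raise ValueError("chunk is larger than one upload buffer")
        # Open a new upload buffer when the current one cannot fit the chunk.
        if not used or used[-1] + size > upload_capacity:
            used.append(0)
        # The running offset is a prefix sum over the sizes placed so far.
        records.append(UploadRecord(len(used) - 1, used[-1], dest_offset, size))
        used[-1] += size
    # Stand-in for BeginWrite: one allocation per upload buffer.
    upload_buffers = [bytearray(n) for n in used]
    # Stand-in for the copy jobs: every record owns a disjoint slice,
    # so these copies could run in parallel without overlapping.
    for rec, (_, data) in zip(records, chunks):
        buf = upload_buffers[rec.buffer_index]
        buf[rec.upload_offset:rec.upload_offset + rec.size] = data
    return upload_buffers, records
```

    Each `UploadRecord` carries what a later patch step needs: where the bytes sit in an upload buffer and where they belong in the big buffer.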

    After all the copy jobs are done, we call EndWrite, again on the main thread, and issue a series of compute shader dispatches. These dispatches execute on the GPU/graphics timeline and patch up the big buffer where we store all component data. The upside is that we don't have to keep track of different ranges in the big buffer per frame: it's always in a good state once the series of dispatches is done.
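    The patch step is, in data terms, a scatter: each staged range is copied from an upload buffer to its final offset in the big buffer. This Python sketch models only that data movement; `patch_big_buffer` and the `(buffer_index, upload_offset, dest_offset, size)` record layout are hypothetical, and the loop stands in for the compute shader threads. The overlap check reflects why the scatter is safe to run in parallel: no two records may target the same bytes.

```python
def patch_big_buffer(big_buffer, upload_buffers, records):
    """Copy every staged range into its place in the big buffer.

    records: iterable of (buffer_index, upload_offset, dest_offset, size).
    Destination ranges must not overlap, so every copy is independent
    (which is what lets a GPU dispatch process them in parallel).
    """
    claimed = []  # destination ranges written so far, as (start, end)
    for buffer_index, upload_offset, dest_offset, size in records:
        end = dest_offset + size
        if end > len(big_buffer):
            raise IndexError("record writes past the end of the big buffer")
        for start, stop in claimed:
            if dest_offset < stop and start < end:
                raise ValueError("overlapping destination ranges")
        claimed.append((dest_offset, end))
        src = upload_buffers[buffer_index]
        big_buffer[dest_offset:end] = src[upload_offset:upload_offset + size]
    return big_buffer
```

    Because the big buffer is only ever modified by this scatter on the GPU timeline, the CPU never needs to map it, which matches the point above about not tracking per-frame ranges in the big buffer.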

    The only thing we need to ensure here is that the upload buffers are not reused too soon. Initially we used fixed N-frame multibuffering, but it seems that can't be trusted. So currently we track where the GPU is with a tiny AsyncReadback per frame. It's certainly not nice, but there is currently no exposed API to tell which frame the GPU has finished.
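    The reuse rule can be modelled as a pool that hands a buffer out again only after the GPU has reported the frame that last used it as finished. This is a Python sketch under assumptions: `UploadBufferPool`, `acquire` and `on_readback_complete` are hypothetical names, and wiring `on_readback_complete` to the completion callback of a small per-frame `AsyncGPUReadback` request is an assumption about how one might do it, not the Hybrid Renderer's actual code.

```python
class UploadBufferPool:
    """Hypothetical pool: a buffer is reused only once the GPU is
    known to have finished the frame that last read it."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.completed_frame = -1  # newest frame known to be done on the GPU
        self.in_flight = []        # (frame_used, buffer) pairs
        self.free = []             # buffers safe to write again
        self.allocations = 0       # how many buffers were ever created

    def on_readback_complete(self, frame):
        # Only ever move forward, even if a stale callback arrives.
        self.completed_frame = max(self.completed_frame, frame)
        still_busy = []
        for frame_used, buf in self.in_flight:
            if frame_used <= self.completed_frame:
                self.free.append(buf)
            else:
                still_busy.append((frame_used, buf))
        self.in_flight = still_busy

    def acquire(self, frame):
        # Reuse a finished buffer if there is one; otherwise grow the pool
        # instead of overwriting memory the GPU may still be reading.
        if self.free:
            buf = self.free.pop()
        else:
            buf = bytearray(self.buffer_size)
            self.allocations += 1
        self.in_flight.append((frame, buf))
        return buf
```

    Unlike a fixed N-frame rotation, this pool grows when the GPU falls behind rather than silently reusing a buffer that is still in flight.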

    You can of course implement a ring buffer, or do multiple begin/end writes instead of using multiple upload buffers, but that isn't guaranteed to be as good because of some hoops we have to jump through to make this fast on some devices. DX11 notoriously does not allow persistently mapped buffers, and many devices have problems doing this kind of thing from multiple threads (the graphics device usually lives on a separate thread).
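    For comparison, this is roughly the bookkeeping a ring buffer alternative needs. It is a Python sketch under stated assumptions: `UploadRing`, `allocate` and `retire` are hypothetical names, frame numbers only increase, and `retire` is called with the newest frame the GPU is known to have finished (for example from a per-frame readback like the one described above). It does nothing about the platform issues mentioned (no persistent mapping on DX11, mapping from worker threads), which is why it may not beat separate upload buffers in practice.

```python
from collections import deque


class UploadRing:
    """Hypothetical ring allocator over one large upload region.

    Allocations only move forward, so writes stay linear. If an
    allocation does not fit before the end, the remaining tail is
    skipped as padding and the allocation starts at offset 0.
    Space is returned frame by frame once the GPU has finished it.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0          # next byte to hand out
        self.used = 0          # bytes (including padding) still in flight
        self.frames = deque()  # [frame, bytes] in allocation order

    def allocate(self, size, frame):
        """Return a byte offset for `size` bytes, or None if the ring is full."""
        if size > self.capacity:
            raise ValueError("allocation larger than the ring")
        wraps = self.head + size > self.capacity
        padding = self.capacity - self.head if wraps else 0
        if self.used + padding + size > self.capacity:
            return None  # caller must wait for the GPU or fall back
        offset = 0 if wraps else self.head
        self.head = (offset + size) % self.capacity
        self.used += padding + size
        # Charge the bytes (and any padding) to the frame that uses them.
        if self.frames and self.frames[-1][0] == frame:
            self.frames[-1][1] += padding + size
        else:
            self.frames.append([frame, padding + size])
        return offset

    def retire(self, completed_frame):
        """Release the space of every frame the GPU has finished with."""
        while self.frames and self.frames[0][0] <= completed_frame:
            self.used -= self.frames.popleft()[1]
```

    Returning `None` instead of blocking keeps the decision (wait, or fall back to another upload path) with the caller.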

    What to watch out for: call BeginWrite only once per frame on a buffer, since calling it multiple times can hit slow paths. Always write to allocated buffers linearly, and never read from them on the CPU. Read from the buffer only once on the GPU.
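    Those rules are easy to break without noticing, so one option while debugging is a wrapper that checks them. This is a hypothetical Python model, not a Unity API: `WriteOnlyMapping` and its methods are illustrative names, and "linear" is interpreted here as ascending offsets within one mapping (forward gaps allowed). It models a single writer; with parallel jobs, each job would need its own disjoint slice written in ascending order.

```python
class WriteOnlyMapping:
    """Hypothetical debug wrapper enforcing: one begin_write per frame,
    ascending (linear) writes, and no CPU reads of mapped memory."""

    def __init__(self, size):
        self._memory = bytearray(size)
        self._last_begin_frame = None
        self._cursor = None  # end of the last write; None when not mapped

    def begin_write(self, frame):
        if self._last_begin_frame == frame:
            raise RuntimeError("begin_write called twice in the same frame")
        self._last_begin_frame = frame
        self._cursor = 0

    def write(self, offset, data):
        if self._cursor is None:
            raise RuntimeError("write outside begin_write/end_write")
        if offset < self._cursor:
            raise RuntimeError("non-linear write: offset moved backwards")
        if offset + len(data) > len(self._memory):
            raise IndexError("write past the end of the mapping")
        self._memory[offset:offset + len(data)] = data
        self._cursor = offset + len(data)

    def end_write(self):
        self._cursor = None

    def __getitem__(self, index):
        # Write-combined memory is slow to read back; treat any read as a bug.
        raise RuntimeError("never read mapped upload memory on the CPU")
```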

    Hope this clears up at least some things, but sadly it's not simple at all.
     
  5. sngdan

    Thank you very much. I had this issue of using buffers too soon and then used fixed N-frame multibuffering (based on a tip from Eizenhorn). I also had the idea of an async readback but never implemented it. Thank you for clearing this up and sharing Unity's current process. It would definitely be helpful if this got better API support in the future…
     
  6. unity_fime5ey3qd

    On DX11 it only works with (non-RW) StructuredBuffer.
    On Vulkan it works with both (non-RW) StructuredBuffer and RWStructuredBuffer.
     
  7. sebas77

    So it only works if I read from the buffers? I think that's fine, as I only need to write from the CPU and read from the GPU.
     
  8. Tony_Max

    What does it mean to write to allocated buffers linearly?

    I'm asking because I'm trying to find the source of graphical artifacts in my system. I use a ComputeBuffer and call BeginWrite on it to write from parallel jobs. Jobs can write to arbitrary ranges of the buffer depending on chunk changes; is this allowed? I read the values in the shader through a StructuredBuffer.
     
  9. c0d3_m0nk3y

  10. Tony_Max

    From what I understand from this page, there are two things I should avoid:
    * reading from the buffer, which I don't do
    * writing to the whole range even if items aren't used

    If this is correct, then the latter may be the source of the issue. But then how do I efficiently update only some ranges of a ComputeBuffer? In my case I want to do that because I only want to update data on the GPU for changed chunks.
     
  11. sngdan

    If I recall correctly, I was keeping an index of changed chunks and tried to split the compute buffer writes into larger contiguous blocks (including some unchanged chunks, but avoiding large contiguous unchanged blocks)… I can’t remember if this was the old SetData API or begin/end write… or both…
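    The block-merging idea can be sketched as follows, assuming changed chunks are identified by their index in the buffer and all have the same size. `coalesce_chunks` and `max_gap` are hypothetical names; `max_gap` is the longest run of unchanged chunks you are willing to re-upload in order to merge two neighbouring blocks.

```python
def coalesce_chunks(changed, max_gap):
    """Group changed chunk indices into inclusive (first, last) runs.

    Runs separated by at most `max_gap` unchanged chunks are merged,
    trading some redundant upload for fewer, larger contiguous writes.
    """
    runs = []
    for index in sorted(set(changed)):
        # Number of unchanged chunks between this index and the current run.
        if runs and index - runs[-1][1] - 1 <= max_gap:
            runs[-1][1] = index
        else:
            runs.append([index, index])
    return [tuple(run) for run in runs]
```

    Each `(first, last)` run maps to one contiguous element range to upload. A larger `max_gap` gives fewer, bigger writes at the cost of re-uploading some unchanged data, which matters if each range costs its own begin/end write.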

    edit:
    https://forum.unity.com/threads/200k-dynamic-animated-sprites-at-80fps.695809/page-2#post-5024381

    in the buffer system:
    value.y = chunk.Count;

    in the render system:
    UpdateRenderBuffer

    but I see it’s not the latest version. I had one with begin/end write, thanks to eizenhorn, back in the day, I think…
     
    Last edited: Sep 23, 2023
  12. Tony_Max

    As I understand it, that would mean calling BeginWrite multiple times in the same frame, which also seems like a "bad thing".

    Here joelv talks about some "upload" buffers and another "big buffer". To me that sounds like a way to write to the GPU buffer only for changed ranges without breaking things.
     
  13. sngdan

    Yes. I can’t remember what worked best in the end. This was 4 years ago, and begin/end write and DOTS were new then… I tried many things… and rendered a million+ sprites… and I recall that this was the last thing to optimize (uploading only changes to the GPU)… I recall that I needed a pool of 4 (wait frames) × multiple writes per frame… but I don’t remember whether this was better or worse than 1 full buffer per frame…

    So what you are saying is that there is still no good solution and the whole buffer works best?

    edit: just re-read the thread. Sorry… long time inactive and forgetful… @joelv… you explained it well, but I still wouldn’t know how you patch things up on the GPU. If I recall correctly, I was trying to write blocks to the same big buffer on the GPU by reading/writing parts of it, based on changed chunk ranges (which seems not to be the right way). Effectively, I had a copy of the big buffer in memory and then decided which contiguous blocks to upload via begin/end write each frame.
     
    Last edited: Sep 24, 2023
  14. Tony_Max

    First of all, it works without any artifacts. Second, yes, this might be more optimized than waiting frames, and it is also more honest.

    But I'm asking more than saying :)