
Feedback [Feature Request] Vertex Buffer with LockBufferForWrite when possible.

Discussion in '2022.2 Beta' started by nishikinohojo, Jun 12, 2022.

  1. nishikinohojo


    Aug 31, 2014
    I don't know the internal details of these things, but I believe we should be able to use the vertex buffer directly in a Burst job, at least on some platforms.
    Currently, the GraphicsBuffer obtained via Mesh.GetVertexBuffer does not have the LockBufferForWrite usage flag.

    I couldn't test this with 2022.2a because I'm hitting a lot of crashes. But as far as I can tell from the release notes, nothing has changed from 2022.1 in this area, so I'm requesting it here.

    If possible, it would be really nice!
    I have made a custom implementation of bone animation and skinning with Burst, because I wanted to take work away from the GPU and give it to the CPU. (It runs much faster than the built-in Animator and CPU skinning, of course.)
    Currently the only proper way to feed data into a vertex buffer from the CPU is to create a new GraphicsBuffer with LockBufferForWrite enabled, use it for the (Burst) calculation, and then copy it into the vertex buffer with Graphics.CopyBuffer.
    It already runs bloody fast, but I would like to avoid Graphics.CopyBuffer whenever possible.
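    For reference, the workaround described above can be sketched roughly as follows. This is a minimal sketch under strong assumptions: a position-only vertex layout (12-byte float3 stride) and a trivial placeholder job named SkinningJob; the real skinning math, bone data, and vertex layout are omitted.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;

public class CpuSkinningUpload : MonoBehaviour
{
    [BurstCompile]
    struct SkinningJob : IJobParallelFor
    {
        [ReadOnly] public NativeArray<float3> bindPositions;
        [WriteOnly] public NativeArray<float3> skinned;

        public void Execute(int i)
        {
            skinned[i] = bindPositions[i]; // placeholder for real bone-matrix skinning
        }
    }

    Mesh mesh;
    GraphicsBuffer staging;      // CPU-writable: created with LockBufferForWrite
    GraphicsBuffer vertexBuffer; // the mesh's real vertex buffer: lacks that flag
    NativeArray<float3> bindPositions;

    void Start()
    {
        mesh = GetComponent<MeshFilter>().mesh;
        mesh.vertexBufferTarget |= GraphicsBuffer.Target.Raw; // allow GetVertexBuffer
        vertexBuffer = mesh.GetVertexBuffer(0);

        bindPositions = new NativeArray<float3>(mesh.vertexCount, Allocator.Persistent);
        // ... fill bindPositions from the mesh here ...

        staging = new GraphicsBuffer(
            GraphicsBuffer.Target.Structured,
            GraphicsBuffer.UsageFlags.LockBufferForWrite,
            mesh.vertexCount, 12 /* assumed position-only stride */);
    }

    void Update()
    {
        // 1. Lock the staging buffer and skin into it with a Burst job.
        NativeArray<float3> dst = staging.LockBufferForWrite<float3>(0, mesh.vertexCount);
        new SkinningJob { bindPositions = bindPositions, skinned = dst }
            .Schedule(mesh.vertexCount, 64)
            .Complete();
        staging.UnlockBufferAfterWrite<float3>(mesh.vertexCount);

        // 2. The extra GPU-side copy this thread wants to avoid.
        Graphics.CopyBuffer(staging, vertexBuffer);
    }

    void OnDestroy()
    {
        if (bindPositions.IsCreated) bindPositions.Dispose();
        staging?.Dispose();
        vertexBuffer?.Dispose();
    }
}
```

    The staging buffer is the extra allocation, and Graphics.CopyBuffer is the extra GPU-side pass, that the feature request aims to eliminate.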

    I'm posting a thread for the first time so if I'm doing anything wrong, please let me know.
    Last edited: Jun 12, 2022
  2. TJHeuvel-net


    Jul 31, 2012
    What you want to avoid is sending data from the CPU to the GPU, because it forces the GPU to wait for the transfer.

    As long as you do your work on the CPU, e.g. in Burst, you *have* to do this. It is unavoidable.
  3. nishikinohojo


    Aug 31, 2014
    Thank you for your reply.
    On a dGPU with dedicated VRAM, I know there's not much I can do.
    But specifically on unified-memory devices like mobile phones, I don't think we need multiple buffers in this scenario. If I'm wrong, sorry! (Or was my English so bad that I couldn't express what I was implying in the first comment? I used the words "on some platforms" to mean iOS. Sorry! Seems like I've learnt something new today.)

    But now I'm thinking: even if it is possible, this kind of platform-specific optimization may end up as a notoriously complicated API. I'm sure I can handle whatever it turns out to be, but it might not be appropriate for a game engine.
    At this point my implementation of skinning runs fast enough, so I think I can retract this topic.
    Last edited: Jun 27, 2022
  4. nishikinohojo


    Aug 31, 2014
    I am reviving this thread.

    After a few months, I rethought this and concluded there is no reason for the vertex buffer not to support LockBufferForWrite.
    I mean, this is what is currently happening in my environment with DX12.


    EndGraphicsJobs (24 ms) is the time spent waiting for the GraphicsBuffer upload to the GPU and for CopyBuffer.
    What I'm doing here is:

    1. I call LockBufferForWrite on a manually created GraphicsBuffer and schedule the skinning job in Burst.

    2. After the job completes, Graphics.CopyBuffer is invoked to copy the skinning result into the actual vertex buffer.

    By the way, this is my custom skinning and animation solution.

    Seriously, CopyBuffer should be avoidable in this situation, because I already have a CPU-side copy of the buffer; I am marking the mesh isReadable (though I don't think that matters much for LockBufferForWrite).
    On some platforms LockBufferForWrite seems to return a direct pointer to the GPU resource, but even if so, Burst can handle that, because that is the point of the API. (I am assuming this happens on unified memory architectures.)
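    In other words, the request is for the locked path to target the mesh's own vertex buffer, skipping the staging copy entirely. A hypothetical sketch of what that could look like; none of this works today, since GetVertexBuffer returns a buffer without the LockBufferForWrite usage flag, and SkinningJob is a stand-in name for a user-written Burst job:

```csharp
// Hypothetical: Mesh.GetVertexBuffer would need to return a buffer
// created with GraphicsBuffer.UsageFlags.LockBufferForWrite.
GraphicsBuffer vb = mesh.GetVertexBuffer(0);

// Lock the vertex buffer itself instead of a staging buffer.
// (Today this throws, because the usage flag is missing.)
NativeArray<float3> dst = vb.LockBufferForWrite<float3>(0, mesh.vertexCount);
new SkinningJob { bindPositions = bindPositions, skinned = dst }
    .Schedule(mesh.vertexCount, 64)
    .Complete();
vb.UnlockBufferAfterWrite<float3>(mesh.vertexCount);

// No Graphics.CopyBuffer: on unified memory, the job would have written
// directly into the memory the GPU reads the vertices from.
```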

    If I could use the vertex buffer directly in a job, this is the expected result. (I boldly emulated it just by removing Graphics.CopyBuffer.)
    EndGraphicsJobs finishes much faster.


    This makes a difference.
    I don't expect this for 2022.2, but in the future I think this will help somebody if it's implemented properly.
    Last edited: Aug 24, 2022