Unexpected and Inefficient ComputeBuffer behavior in OpenGL

Discussion in 'General Graphics' started by ResolveRyan, Jan 25, 2022.

  1. ResolveRyan

    Joined: Aug 24, 2021 | Posts: 2

    Hello folks!

    My team is encountering some unexpected behavior with Compute Buffers, specifically when using OpenGL. The same code works as expected when building for DirectX.

    Here is the specific issue as I understand it (I'm a Unity Engineer; this was discovered by our Graphics Engineer). For context, the buffer is used to render a large amount of mesh data with a ComputeBuffer and DrawProcedural.

    • We are using a single 256 MB buffer to store our mesh data, split into 64 KB chunks (pages).
    • Each frame we may write new data to some number of these pages in the buffer. This is currently done using the ComputeBuffer.SetData() function.
    • We are using the mesh data for two purposes each frame:
      • 1. As an input to a compute shader to perform culling.
      • 2. As an input to a vertex shader to actually render the mesh data.
    • On D3D11, this is working fine.
    • On OpenGL, the rendering doesn't work as expected.
    • In RenderDoc with OpenGL, we observe the following:
      • In our code, we only create one buffer to store mesh data, but in RenderDoc, it looks like a different buffer object can be used in different frames. This seems to suggest that Unity is creating multiple buffer objects.
      • We noticed a glUnmapBuffer() function call. The mapped range appears to be the entire length of the buffer, i.e. 256 MB. Nothing in our code tries to write data to the entire range of the buffer at once, so we're not sure what Unity is trying to do here.
      • After unmapping the buffer, the buffer appears to contain all zeros, i.e. no valid data.
    • Based on the above observations in RenderDoc, our Graphics Engineer speculates that Unity is trying to perform some sort of synchronization by creating multiple buffer objects and copying data between them (hence the glUnmapBuffer). Copying 256 MB of data would of course not be very efficient, and clearly it's not working either, as the buffer contains all zeros after unmapping.
    • A fix appears to be to use ComputeBufferMode.SubUpdates and ComputeBuffer.BeginWrite() rather than SetData(). However, this has the disadvantage that it is completely unsynchronized and so we would likely have to write manual sync code to ensure we avoid visual artifacts.
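The page layout in the list above comes down to simple offset arithmetic. A minimal sketch in Python (used here only as an illustration; the project's own code is Unity C#), assuming binary units, i.e. 1 MB = 1024 × 1024 bytes and 1 KB = 1024 bytes. The names `BUFFER_BYTES`, `PAGE_BYTES`, `PAGE_COUNT`, and `page_byte_range` are hypothetical and do not come from the post:

```python
BUFFER_BYTES = 256 * 1024 * 1024  # assumption: "256 MB" means 256 MiB
PAGE_BYTES = 64 * 1024            # assumption: "64 KB" means 64 KiB
PAGE_COUNT = BUFFER_BYTES // PAGE_BYTES


def page_byte_range(page_index):
    """Return the half-open byte range [start, end) covered by one page."""
    if not 0 <= page_index < PAGE_COUNT:
        raise IndexError(f"page {page_index} is outside 0..{PAGE_COUNT - 1}")
    start = page_index * PAGE_BYTES
    return start, start + PAGE_BYTES
```

Under these assumptions the buffer holds 4096 pages, so a frame that rewrites a handful of pages touches only a small fraction of the 256 MB range seen in the glUnmapBuffer() call.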
    This is the best description we have of the issue, so hopefully someone from Unity can chime in with an immediate answer. If not, it should be possible for us to isolate this behavior in a demo project, but it would take some time.

    Thanks!
     
  2. tvirolai

    Unity Technologies
    Joined: Jan 13, 2020 | Posts: 79

    Hi,

    There are a few things here. First of all, the zeroing of the contents is definitely a bug in the OpenGL backend, so if you could whip up a small repro project and file a bug, that would help get it fixed.

    Secondly, I heavily recommend just using the SubUpdates mode. It's meant specifically for this kind of thing (remember, it's write-only from the CPU). As for synchronization, you can use async readbacks, not necessarily from that buffer but from anywhere, to signal that the corresponding frame has passed on the GPU. Just go with the assumption that you're usually running maybe 2-3 frames ahead in C# compared to the GPU, and use async readback to make sure you don't move further ahead. The Hybrid Renderer that renders DOTS entities uses this mode for its buffers, which are used in a way similar to how you're using yours.
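The "don't run further ahead than a few frames" idea can be modelled without any graphics API. The following is a minimal sketch in Python, not Unity code: `FrameGate` and all of its members are hypothetical names, and `on_readback_complete` stands in for whatever callback signals that a frame has finished on the GPU. It also assumes the GPU retires frames in submission order.

```python
from collections import deque


class FrameGate:
    """Caps how many frames the CPU may submit ahead of the GPU."""

    def __init__(self, max_frames_in_flight=3):
        self.max_frames_in_flight = max_frames_in_flight
        self.pending = deque()     # submitted frames not yet signalled, oldest first
        self.last_completed = -1   # newest frame the GPU is known to have finished

    def can_begin_frame(self):
        return len(self.pending) < self.max_frames_in_flight

    def submit(self, frame_index):
        if not self.can_begin_frame():
            raise RuntimeError("CPU is too far ahead; wait for a readback")
        self.pending.append(frame_index)

    def on_readback_complete(self, frame_index):
        # Assumption: frames retire in order, so a signal for frame N
        # also retires every earlier pending frame.
        while self.pending and self.pending[0] <= frame_index:
            self.pending.popleft()
        self.last_completed = max(self.last_completed, frame_index)

    def is_safe_to_overwrite(self, last_used_frame):
        # A page may be rewritten once the last frame that read it has finished.
        return last_used_frame <= self.last_completed
```

In a Unity project the completion signal would presumably come from the callback of an async readback such as `AsyncGPUReadback.Request`, and each page would record the last frame in which it was read so that `is_safe_to_overwrite` can be checked before writing into it through BeginWrite().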

    Your Graphics Engineer was spot on; that's exactly what we're trying to do. We do resource versioning rather heavily in most of our backends, except in DX11, where the drivers handle that part. It allows things to run in parallel and means we don't need to break render passes when SetData is called repeatedly between draw calls. Basically, imagine a project that does Draw(buffer1);SetData(buffer1);Draw(buffer1);SetData(buffer1);... in a loop using the same buffer. Versioning keeps projects like this, of which there are many, working reasonably fast. In DX12 and Vulkan this is essentially mandatory to some degree; a similar thing is done in our OpenGL backend, mostly because certain drivers don't do this well enough internally.

    We do this tracking on a per-GPU-resource basis to save CPU cost. It works like a charm for small buffers, which is where it's used the vast majority of the time. It doesn't work well on large buffers, where tracking the ranges that are in use might work better, so it just does a full copy on them too. Therefore, even when the OpenGL backend bug is fixed, it will still be slower than it could be. This also applies on DX11: it will be slower than it could be with explicit synchronization.
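To make the cost of whole-resource versioning concrete, here is a toy model in Python. It is only a sketch of the behavior described in this post, not Unity's backend code: `VersionedBuffer` and its members are hypothetical, and the model assumes that every write to a buffer still referenced by a pending draw duplicates the entire buffer.

```python
class VersionedBuffer:
    """Toy model: writing to a buffer that a pending draw reads creates a full copy."""

    def __init__(self, size):
        self.size = size
        self.versions = [bytearray(size)]  # versions[-1] is the current backing store
        self.in_use_by_gpu = False         # True once a draw references the current version
        self.bytes_copied = 0              # running cost of versioning copies

    def draw(self):
        """Record a draw and return the index of the version it reads."""
        self.in_use_by_gpu = True
        return len(self.versions) - 1

    def set_data(self, offset, data):
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write is out of range")
        if self.in_use_by_gpu:
            # The whole buffer is duplicated, however small the write is.
            self.versions.append(bytearray(self.versions[-1]))
            self.bytes_copied += self.size
            self.in_use_by_gpu = False
        self.versions[-1][offset:offset + len(data)] = data
```

In this model a 2-byte write that follows a draw still costs a copy of the full buffer. For a small buffer that is cheap; for a 256 MB buffer updated one 64 KB page at a time, the same pattern copies far more data than it writes.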
     
  3. ResolveRyan

    Joined: Aug 24, 2021 | Posts: 2

    Thank you for your response!

    We will put together a small project for the OpenGL bug and submit a report.

    Thank you for the detail on the *why* of the situation as well. I can see how this functionality would work well for the average use case. We are definitely outside the bounds of the average use case, so it's understandable that we'd have to do some extra work in areas like this. The tip about async readback is helpful as well, so we'll investigate that route.

    Thanks again!
     
  4. Neto_Kokku

    Joined: Feb 15, 2018 | Posts: 1,751

    Interesting, I'm doing something similar (procedural drawing with geometry data being read from structured buffers in the vertex shader) and facing several issues on Switch (including GPU crashes), and this gives me some ideas.

    Unfortunately I'm on an older Unity version, so no BeginWrite for me. I'm thinking of refactoring so the CPU never directly writes into the buffers I need to modify. Instead, I would write the modified data to a smaller buffer and use a compute shader to place the data at the target indices.

    It's a shame there's no ComputeBuffer.CopyData(); it would make my life easier. I have some data produced by compute shaders that would benefit from being stored in constant buffers for other shaders to consume, and in DX11 you cannot create a buffer that is both constant and structured. Maybe a native plugin can work around that.