Question Entities Graphics on Mobile, Tile Binning cost is higher compared to not using Entities Graphics

Discussion in 'Graphics for ECS' started by ehww, May 16, 2023.

  1. ehww

    Joined:
    Jan 30, 2020
    Posts:
    8
    Is there anything that can be done (such as tweaking buffer sizes), or is there a known reason why Tile Binning costs would be higher when using Entities Graphics?

    I'm testing the same scene with a fixed camera view in two versions: the regular URP approach, and a version that has all the MeshRenderers in a Subscene so that it uses Entities Graphics. Both versions use Vulkan.

    With Entities Graphics, the DrawSRPBatcher cost and the overall RenderLoop cost are lower (~3.5ms), yet Perfetto shows that Tile Binning takes longer and the app no longer runs at framerate. (18 ms total)

    Without Entities Graphics, DrawSRPBatcher and the RenderLoop cost a bit more (~4ms), but Tile Binning costs less and the app runs at framerate. (12 ms total)
     
  2. JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    331
    Which hardware are you running on? When running with Entities Graphics, what kind of instance counts do you see in the Frame Debugger? What kinds of shaders do your objects have? If you are using URP/Lit, I suggest trying a Shader Graph instead in case it helps.
     
  3. ehww

    Joined:
    Jan 30, 2020
    Posts:
    8
    @JussiKnuuttila thanks for the response. This is on Quest 2.
    I'm using a basic Lit Shader Graph for all materials in the test scene.

    There are 3 Hybrid Batch Groups, 2 only have 1 draw call, but the 3rd main group has 150 DrawInstanced Calls & 150 instances, 664,225 vertices & 1,477,359 indices (492,453 triangles)

    From the Unity Profiler as well:
    SetPass Calls: 4
    DrawCalls: 153
    Batches: 150
    Triangles: 500.2k
    Vertices: 674.4k

    I realize it's a lot of geometry, but the scene is performant without BRG / Entities Graphics (around 10,000-11,000 APP_T), so we were doing a test to see how much more perf we could squeeze out of it with BRG.

    It definitely reduced the CPU cost of issuing the draw calls in the Unity Profiler. The only thing on the Unity Profiler side that went up in cost is EarlyUpdate.XRUpdate (from ~9ms without Entities Graphics to ~13ms with it). With BRG, the app runs at around 16,000 APP_T.

    So then I looked in Perfetto and noticed that tile binning seemed to cost more overall. I'm not sure if that is from BRG. Does it access meshes in a different way on the GPU?


     
  4. JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    331
    BRG does not access meshes in a different way, but it does access transform matrices differently (through an SSBO on Vulkan, instead of loading from a regular UBO). I don't know exactly how the tile binning works, but it probably needs to access the transform matrix so it could be affected by this difference.

    Trying out GLES might be a worthwhile experiment. BRG on GLES uses a UBO-based code path instead of SSBOs, so it could have different performance characteristics on some hardware, although it does instance-ID-based loads from the UBO, so the actual difference is usually small on the Android HW that I have tested.

    I see from the numbers that you have about 150 draw calls and about 150 instances, which sounds like your instances are being rendered one by one in single-instance draw calls. Unless this is expected (e.g. every instance has a unique mesh/material combination), it might be another worthwhile experiment to see whether those could be batched together into a smaller number of draw calls, which typically improves Android GPU performance, especially on Adreno-based hardware.
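
The batching check described above is simple arithmetic: divide the instance count by the number of instanced draw calls. A minimal Python sketch follows, using the figures reported for the main batch group earlier in the thread (150 DrawInstanced calls, 150 instances). The function name `instances_per_draw` is hypothetical and is not part of any Unity API.

```python
def instances_per_draw(instance_count: int, draw_call_count: int) -> float:
    """Average number of instances submitted per instanced draw call."""
    if draw_call_count <= 0:
        raise ValueError("draw_call_count must be positive")
    return instance_count / draw_call_count


# Figures reported for the main batch group in the Frame Debugger.
ratio = instances_per_draw(instance_count=150, draw_call_count=150)
print(f"{ratio:.2f} instances per draw call")

# A ratio close to 1 means the draws are effectively not instanced.
if ratio < 2:
    print("Instances appear to be drawn one by one.")
```

A ratio near 1 is consistent with the observation that each instance is going out in its own draw call. Higher ratios indicate that more instances share a mesh/material combination and are being batched.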
     
  5. ehww

    Joined:
    Jan 30, 2020
    Posts:
    8
    @JussiKnuuttila thanks again!
    I tried with GLES31 and it is at framerate (about 11,000 APP_T). What I noticed is that CPU-side rendering is slower, taking closer to 2ms for the DrawOpaques section (whereas on Vulkan without BRG it was about 1ms, and with BRG nearly half that).
    The profiler overall is also more "unstable" on GLES31, fluctuating above and below 13ms. On Vulkan it was much more stable and flat (both with and without BRG).

    I also tried a test without mesh-combined geometry, and it is still at framerate at 12,000-13,000 APP_T, but it is definitely stressing the GPU, as it's at 99% usage.

    When not using the mesh combined geometry, there are 13 Hybrid Batch Groups. There are 2 big groups:
    one with 257 DrawInstanced Calls & 1709 Instances, and another with 235 DrawInstanced Calls & 1568 Instances
    The rest are small with ~10 instances or less

    AFAIK Tile Binning runs the vertex shader to check which tiles the geometry covers, but I also don't know much about it. It seems reasonable that SSBOs might be slowing down that process on this hardware, but I don't have insight into why. As far as I could see from the Perfetto traces, it was just the binning process that seemed huge.

    Also, with BRG and GLES31 the binning was back at 12ms, which is what I was seeing on Vulkan without BRG.
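
To put the APP_T figures quoted in this thread side by side, the following Python sketch compares them against a per-frame budget. It rests on two assumptions the thread does not state: that APP_T is reported in microseconds, and that the target refresh rate is 72 Hz. The names `frame_budget_us` and `fits_budget` are hypothetical helpers, and ranges quoted in the thread are represented by their upper bound.

```python
REFRESH_HZ = 72  # assumption: the thread does not state the target refresh rate


def frame_budget_us(refresh_hz: float) -> float:
    """Time available per frame, in microseconds."""
    return 1_000_000 / refresh_hz


def fits_budget(app_time_us: float, refresh_hz: float) -> bool:
    """True if the measured app time fits inside one refresh interval."""
    return app_time_us <= frame_budget_us(refresh_hz)


# Approximate APP_T readings quoted in the thread (assumed to be microseconds).
readings = {
    "Vulkan, no BRG": 11_000,
    "Vulkan, BRG": 16_000,
    "GLES31, BRG": 11_000,
}

for label, app_t in readings.items():
    verdict = "within" if fits_budget(app_t, REFRESH_HZ) else "over"
    print(f"{label}: {app_t} us is {verdict} the frame budget")
```

Under those assumptions, only the Vulkan + BRG configuration exceeds the budget, which matches the report that it was the one configuration not running at framerate.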
     
  6. JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    331
    This definitely sounds like SSBO vs UBO could be the key difference in GPU performance here, which is a surprising result. Would it be possible for you to share a repro project?

    In scenes that have many copies of the same mesh (as you seem to have), combining meshes is likely not as profitable when using BRG, as it will increase mesh sizes and the amount of memory that the GPU has to load.

    I would also suggest checking how many Hybrid Per Instanced properties your Shader Graph has. On PC type hardware they don't have much extra cost, but on Android enabling this setting increases the GPU load of the shader. If there are unnecessary ones, disabling the setting might speed the shader up.
     
  7. ehww

    Joined:
    Jan 30, 2020
    Posts:
    8
    @JussiKnuuttila thank you, I will open a support ticket so that I can provide a repro project.