Bug: Performance bug? Hybrid Renderer V2 even slower than the GameObject renderer?

Discussion in 'Graphics for ECS' started by lic1227, Nov 10, 2021.

  1. lic1227

    Joined: Jul 14, 2015. Posts: 41.
    Recently I have been researching HRV2 + URP and trying to update the Megacity demo to test performance on a mid-range Android device. For some reason, performance is worse than the original Megacity demo, which uses Hybrid Renderer V1. I then ran a performance comparison and found that HRV2 is even slower than the GameObject renderer. Is this a bug, or did I set up my test project incorrectly?


    Here is the test scene:
    upload_2021-11-10_20-33-48.png


    Both run on Android (Mix 2S, Snapdragon 845): IL2CPP + ARM64 + Dev + Vulkan.
    Entities, URP, and Hybrid Renderer are all on the latest versions.

    Burst is on; Leak Detection is off.

    A. GameObject-Based

    CPU rendering code costs about 6.4 ms
    upload_2021-11-10_20-32-53.png

    B. Hybrid Renderer V2 + URP
    CPU rendering code costs about (3.91 + 4.9) ms
    upload_2021-11-10_20-32-42.png


    Additional comparison, in the Editor on a PC (i7-9700):
    HybridV2 + URP vs. HybridV1 + HDRP

    upload_2021-11-10_20-36-27.png


    Need some help, thanks very much!
    @SebastianAaltonen @arnaud-carre
     


  2. JussiKnuuttila

    Unity Technologies. Joined: Jun 7, 2019. Posts: 351.
    Hello!

    This is definitely an unexpected result. In our internal tests we have typically seen HRV2 outperform both GameObjects and HRV1, both in Megacity and in other benchmark tests. It's possible that this is a bug or that something has been configured in an unusual way.

    The HRV2 capture has two specific things of note:
    • There are large WaitForPresent / Present spikes in the frame, which suggests that the scene might be bottlenecked by GPU rendering instead of CPU. If possible, it would be good to take a GPU performance capture from both versions and look at the GPU side difference.
    • UpdateHybridChunksJob takes a lot of time. If you are not modifying any entities each frame, then this could indicate a very high chunk count.
    Looking at the HRV1/HRV2 comparison, we can see that in the HRV2 version the chunk utilization is extremely bad, and the vast majority of entities seem to be in single entity chunks. We also see the HybridBatchPartition component, which is used to force this situation in order to sort transparencies correctly. I would suggest checking two things:
    • Make sure that your Materials are not marked as transparent unnecessarily. Transparent materials cause Hybrid Renderer to add HybridBatchPartition in order to sort transparencies correctly, but it costs a lot of performance. Regular opaque materials should not have it. It is also possible to disable this behavior with the DISABLE_HYBRID_TRANSPARENCY_BATCH_PARTITIONING scripting define, in which case you might see transparency ordering issues (in HRV1 you will always get this behavior, it cannot render sorted transparencies correctly).
    • Try profiling with incremental conversion disabled. This has been known in some cases to increase chunk fragmentation.
    I attached an edited version of the HRV1/HRV2 comparison picture to highlight where the problem is visible.
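
To illustrate why single-entity chunks are costly for per-chunk work such as UpdateHybridChunksJob, here is a minimal Python sketch of chunk counts and utilization. It is a toy model, not Hybrid Renderer code: the entity count and chunk capacity are made-up numbers (real chunk capacity depends on the archetype), and the partitioned case assumes that every entity ends up with its own partition value.

```python
import math

def chunk_count(entities_per_group, chunk_capacity):
    """Chunks needed when every group must live in its own chunks."""
    return sum(math.ceil(n / chunk_capacity) for n in entities_per_group)

def utilization(entities_per_group, chunk_capacity):
    """Fraction of available chunk slots that actually hold an entity."""
    total_entities = sum(entities_per_group)
    total_slots = chunk_count(entities_per_group, chunk_capacity) * chunk_capacity
    return total_entities / total_slots

# Hypothetical numbers, chosen only for illustration.
ENTITIES = 10_000
CHUNK_CAPACITY = 100

packed = [ENTITIES]            # all entities may share chunks
partitioned = [1] * ENTITIES   # assumption: one partition value per entity

print(chunk_count(packed, CHUNK_CAPACITY), utilization(packed, CHUNK_CAPACITY))
print(chunk_count(partitioned, CHUNK_CAPACITY), utilization(partitioned, CHUNK_CAPACITY))
```

In this model the same entities need 100 times as many chunks once they are partitioned one per chunk, so any job whose cost scales with the number of chunks does roughly 100 times the work.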
     


  3. lic1227

    Thanks so much for the quick reply, @JussiKnuuttila!
    I have now added the DISABLE_HYBRID_TRANSPARENCY_BATCH_PARTITIONING scripting define and stopped using LiveLink (is that what disables incremental conversion?). The test result improved a lot: rendering cost is lower and chunk utilization is better, but overall it is still not good.

    Hybrid Renderer V2 + URP
    CPU rendering code costs about (2.73 + 5.38) ms; the first part improved by about 1.2 ms.
    Still slower (or at least not faster) than GameObject rendering.
    upload_2021-11-11_10-9-36.png
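
As a quick sanity check, the totals implied by the Android figures quoted in this thread can be added up as follows. The numbers are taken directly from the posts; the variable names are only illustrative, and the two HRV2 terms are left unlabeled because the posts do not say which part of the frame each one covers.

```python
# Timings in milliseconds, as quoted in the thread (Android, Snapdragon 845).
gameobject_ms = 6.4
hrv2_before_ms = 3.91 + 4.9    # first HRV2 capture
hrv2_after_ms = 2.73 + 5.38    # after the define and without LiveLink

def diff(a, b):
    """Difference a - b, rounded to hundredths of a millisecond."""
    return round(a - b, 2)

print("HRV2 before:", round(hrv2_before_ms, 2))
print("HRV2 after: ", round(hrv2_after_ms, 2))
print("Total improvement:", diff(hrv2_before_ms, hrv2_after_ms))
print("Remaining gap vs GameObject:", diff(hrv2_after_ms, gameobject_ms))
```

The first term dropped by about 1.2 ms, but the second term rose, so the combined figure falls only from 8.81 ms to 8.11 ms and stays about 1.7 ms above the GameObject measurement.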


    Editor on PC (i7-9700), HybridV2 + URP:
    1.45 ms -> 0.59 ms,
    but still slower than HRV1 (0.07 ms)
    upload_2021-11-11_10-11-25.png


    If necessary, I can file a bug report with the test repo. Thanks!

     
  4. JussiKnuuttila

    It is expected that HRV2 will always have some small baseline CPU cost, because it checks every chunk for changed data. HRV1 permanently classifies batches as static or dynamic, and has a very low CPU cost for static batches and a very high CPU cost for dynamic ones. In addition, adding new static batches also carries a high cost in HRV1. HRV2 does all of this automatically.

    When comparing against HRV1, it's a good idea to profile total frame times instead of just system update times, as HRV1 also has a significantly higher draw call setup cost compared to HRV2, because it reuploads instance data from CPU to GPU each frame, even for static batches. In 100% static scenes, we would expect HRV1 to be slightly faster on the DOTS system side (like in your capture), but slower on the main thread rendering and render thread side.

    We could consider adding some kind of feature (for example, a tag component) to mark entities as completely static and have Hybrid Renderer skip checking them for changed data to reduce this overhead. We will also continue to try to optimize the system update cost in general. If you want to test this kind of change yourself, you can try altering the query used by the Hybrid Renderer for UpdateHybridChunksJob to exclude entities with a certain tag component, and then add that tag component to your static entities.
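
The idea of skipping tagged static entities can be sketched with a small Python model. This is only an illustration of the concept, not the Hybrid Renderer's implementation: the `Chunk` class, the `is_static` flag (standing in for a hypothetical tag component), and the version counter are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    change_version: int       # stamp of the last write to this chunk
    is_static: bool = False   # hypothetical stand-in for a "static" tag component

def update_chunks(chunks, last_system_version, skip_static=False):
    """Return (visited, changed): chunks examined and chunks needing an update."""
    visited = 0
    changed = []
    for chunk in chunks:
        if skip_static and chunk.is_static:
            continue          # excluded by the query, never examined
        visited += 1
        if chunk.change_version > last_system_version:
            changed.append(chunk)
    return visited, changed

# Hypothetical scene: 990 static chunks and 10 that changed recently.
chunks = [Chunk(change_version=1, is_static=True) for _ in range(990)]
chunks += [Chunk(change_version=5) for _ in range(10)]

visited_all, changed_all = update_chunks(chunks, last_system_version=3)
visited_tagged, changed_tagged = update_chunks(chunks, 3, skip_static=True)

print(visited_all, len(changed_all))
print(visited_tagged, len(changed_tagged))
```

Both runs find the same 10 changed chunks, but the run that excludes tagged chunks examines 10 chunks instead of 1000, which is the baseline overhead such a tag would aim to remove.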
     
  5. lic1227

    Thanks very much, I will try it later. By the way, my capture shows HRV2 being slower than GameObject rendering in my case. Is that a performance bug, or is it by design?
     
  6. JussiKnuuttila

    It is definitely unexpected, so I guess we could say it's a performance bug. In the capture, HRV2 is faster at the actual rendering (DoRenderLoop), but it has added DOTS system overhead that is larger than the difference. If the DOTS system overhead is optimized further, HRV2 would be faster again.

    Possibly this overhead could be reduced by adding a way to opt out some entities from data update checking, e.g. via a tag component.
     
  7. lic1227

    Do you need me to submit a bug report with the repo so you can fix that?
     
  8. lic1227

  9. JussiKnuuttila

    I don't think this specific case (GameObject vs HRV2) requires a formal bug report. We will keep optimizing the Hybrid Renderer to try to make it faster. Thanks a lot for bringing this to our attention though!
     
  10. lic1227

    Thank you! I'm really looking forward to your update