
Question Optimized rendering of specific known GameObject

Discussion in 'Universal Render Pipeline' started by TheoKiloo, Aug 8, 2022.

  1. TheoKiloo

    Joined: Dec 5, 2016 | Posts: 10
    I am working on a custom render feature that will allow rendering 3D objects within UI-defined RectTransforms with per-object customization.

    The current setup is working, but I'd like to optimize the performance to improve scalability when using many images at the same time.


    In simple terms, my current setup is as follows:

    1. Find all active 3D images.
    2. Change all the model instances to be on a "NoRender" layer.
    3. For each of the images:
       a. Mask the area: draw a stencil-only pass of a quad that matches the RectTransform rect.
       b. Change the layer of the relevant model instance to a "CustomRender" layer.
       c. Apply custom settings.
       d. Cull the visible geometry (filtering for the "CustomRender" layer).
       e. Draw opaque renderers.
       f. Draw transparent renderers.
       g. Unmask the area: draw a stencil-zeroing pass using the quad from step a.
       h. Revert the layer modification, camera settings, etc.

    When profiling, I see that the culling takes up ~35% of the render time, which seems like an obvious candidate for optimization, especially since I already know which objects to render.

    I am currently using ScriptableRenderContext.DrawRenderers to render objects in steps e and f, as it seemed to be the simplest way of rendering the hierarchy. Unfortunately, it requires a CullingResults struct, which I haven't been able to customize with a known list of objects.

    Is there any other approach that I can use instead?
     
  2. DevDunk

    Joined: Feb 13, 2020 | Posts: 5,048
    Have you looked into URP's Render Objects feature with stencil overrides?
    It may be a bit less scalable (unless the viewing angle is always straight on, in which case you can use a single stencil override).
     
  3. TheoKiloo

    Joined: Dec 5, 2016 | Posts: 10
    Not entirely sure what kind of stencil setup you're thinking of. Could you elaborate a bit? My custom render feature mostly builds upon the RenderObjects code, and I do use the stencil already in order to draw only the masked area (stencil id 1, comp equals, pass keep).
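
    For reference, a stencil test like the one described (reference 1, compare Equal, pass Keep) could be expressed as a RenderStateBlock roughly as follows. This is only a sketch and not the actual feature code: the class name StencilMaskState and its Create method are hypothetical, and it assumes the block is passed to the DrawRenderers overload that accepts a RenderStateBlock.

```csharp
using UnityEngine.Rendering;

// Hypothetical helper; the name and shape are illustrative, not from the thread.
public static class StencilMaskState
{
    // Builds a state block that only lets fragments through where the
    // stencil buffer already equals `reference`, leaving the buffer unchanged.
    public static RenderStateBlock Create(int reference = 1)
    {
        // Override only the stencil state; everything else comes from the shader.
        var block = new RenderStateBlock(RenderStateMask.Stencil);
        block.stencilReference = reference;
        block.stencilState = new StencilState(
            enabled: true,
            compareFunction: CompareFunction.Equal, // "comp equals"
            passOperation: StencilOp.Keep,          // "pass keep"
            failOperation: StencilOp.Keep,
            zFailOperation: StencilOp.Keep);
        return block;
    }
}
```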

    The system would need to handle a highly variable number of 3D objects, so I would not be able to just add a separate layer for each image either.
     
  4. DevDunk

    Joined: Feb 13, 2020 | Posts: 5,048
    Stuff like this is how I made portals in my AR project. Seems like it would work here as well:

     
  5. TheoKiloo

    Joined: Dec 5, 2016 | Posts: 10
    So in essence: place the 3D objects in the right spot, write the stenciled area, and then do a single cull/render pass for all 3D objects?

    Unfortunately I don't think this approach will work. The rendered GameObjects would appear in other images' masked areas, since all masked areas would share the same stencil value. Each image should only be allowed to render its assigned GameObject.


    In addition to that, the camera can also be customized for each object, with each image having custom projection matrices (based on the center of the image, instead of the center of the screen).

    There might be a slight optimization possible by incrementing the stencil id per image instead of having an unmask pass, but that puts a limit on how many images can be on the screen at any given time, and it doesn't seem like the unmasking is what's making it expensive.
     
  6. TheoKiloo

    Joined: Dec 5, 2016 | Posts: 10
    I've found an optimization that has provided a very nice performance boost. It does introduce a limitation of 32 active 3D elements at a time, but that's good enough for my intended purpose.

    The change I made is to eliminate the per-image cull and instead perform a single cull operation where all the objects of the visible images are active at the same time. Before culling, I find all the child renderers for each individual image and assign an incrementing mask value to their renderingLayerMask (all objects in Image1 are assigned 1 << 0, Image2 gets 1 << 1, Image3 gets 1 << 2, and so on).

    Then, before calling context.DrawRenderers, I make sure to update FilteringSettings.renderingLayerMask to match the value of the image I'm rendering. This is a much cheaper way of rendering only certain objects than doing a full cull operation.

    After rendering the objects, I then set their renderingLayerMask back to 0, to ensure subsequent cameras don't catch the same objects.
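
    The assign, filter and reset steps described above might look roughly like this. It is a minimal sketch under assumptions, not the actual feature code: ImageLayerMasks and its method names are hypothetical, and the DrawingSettings, FilteringSettings and shared CullingResults are assumed to be set up elsewhere in the render pass.

```csharp
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper; class and method names are illustrative, not from the thread.
public static class ImageLayerMasks
{
    // renderingLayerMask is a 32-bit uint, hence the limit of 32 images.
    public const int MaxImages = 32;

    // Before the shared cull: tag every renderer under an image's model
    // with that image's bit (image 0 -> bit 0, image 1 -> bit 1, ...).
    public static uint Assign(GameObject modelRoot, int imageIndex, List<Renderer> renderers)
    {
        uint mask = 1u << imageIndex;
        // Fills `renderers` with the Renderer components found under modelRoot.
        modelRoot.GetComponentsInChildren(false, renderers);
        foreach (Renderer r in renderers)
            r.renderingLayerMask = mask;
        return mask;
    }

    // During the pass: reuse the single CullingResults and let the
    // filtering settings select only this image's renderers.
    public static void Draw(ScriptableRenderContext context, CullingResults sharedCull,
        ref DrawingSettings drawing, ref FilteringSettings filtering, uint mask)
    {
        filtering.renderingLayerMask = mask;
        context.DrawRenderers(sharedCull, ref drawing, ref filtering);
    }

    // After the pass: reset the masks so later cameras do not pick the objects up.
    public static void Clear(List<Renderer> renderers)
    {
        foreach (Renderer r in renderers)
            r.renderingLayerMask = 0u;
    }
}
```

    In a setup like the one in this thread, Draw would presumably be called twice per image, once with opaque and once with transparent filtering settings.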

    It's worth mentioning that I've also changed the recursive layer assignment to only affect GameObjects with renderers, but it doesn't seem like it had a big impact on performance.

    If more than 32 images are needed, I believe it should be possible to loop this entire operation so another culling operation would be performed, but I'll leave that problem for future me.
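
    One way such batching could be laid out is sketched below in plain C#, with no Unity dependency. It is an untested idea rather than something from the actual feature: MaskBatches and Assign are hypothetical names, and each batch is assumed to get its own cull and its own set of draw calls.

```csharp
using System.Collections.Generic;

// Hypothetical helper; names are illustrative, not from the thread.
public static class MaskBatches
{
    // One bit per image in a 32-bit renderingLayerMask.
    public const int BitsPerBatch = 32;

    // For each image index, returns which batch it belongs to and
    // which single-bit mask it would use inside that batch.
    public static List<(int batch, uint mask)> Assign(int imageCount)
    {
        var result = new List<(int batch, uint mask)>(imageCount);
        for (int i = 0; i < imageCount; i++)
        {
            int batch = i / BitsPerBatch;          // images 0-31 -> batch 0, 32-63 -> batch 1, ...
            uint mask = 1u << (i % BitsPerBatch);  // bit position restarts in every batch
            result.Add((batch, mask));
        }
        return result;
    }
}
```

    With this layout, images in different batches can share a bit because only one batch has its masks assigned at any given time.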
     