
Question: `afterCullingOutputReady.Invoke` takes 2x longer than rendering the entire scene

Discussion in 'General Graphics' started by NightElfik, Jan 22, 2021.

  1. NightElfik

    Joined: Oct 27, 2014
    Posts: 27
    I have a scene with a few hundred trees. While debugging rendering performance, I found that `afterCullingOutputReady.Invoke` takes the majority of the frame time, and the more trees are on screen, the longer it takes. I also found that only trees inside the view frustum contribute to this time; trees outside the view do not increase it.

    All trees are clones of 3 prefabs generated by SpeedTree, with their original materials and 3 LODs, where the last level is a billboard (all generated by SpeedTree).

    Is there any way to cut down this time? I had to cut the number of trees in the scene in half to get above 60 FPS.

    Here is the profiler output: CulllingSpeedTreeLods.PNG

    Thanks!
     
  2. NightElfik

    Does anyone have an idea what is causing the excessive time during culling here? I suspect it is the LOD Group on the trees resolving which level to show. Any ideas on how to optimize this?
     
  3. georgerh

    Joined: Feb 28, 2020
    Posts: 72
    @NightElfik Did you ever find out what the cause is? We are having the exact same problem (it also seems to be related to either SpeedTree or the LODGroup).

    We are on Unity 2020.3.33f1, by the way. I don't remember seeing this in 2019.3.

    (Attached image: speedtree.jpg)
     
  4. georgerh

    This is being caused by the Tree component that all SpeedTree trees have.

    I also reproduced it with a tree from the Asset Store, and tested it with 2019.4.15f1, 2022.2.0a18.2602, and SpeedTree7: same problem.

    I'll file a bug report.

    Update: Painting trees doesn't eliminate the problem, but it seems to cost less for the same number of trees than placing them as individual GameObjects. However, that only works if you don't need additional components on the trees, for example to make them harvestable.
     
    Last edited: Jul 5, 2022
  5. NightElfik

    Thanks for the reply. I haven't been able to solve this yet. I just updated to Unity 2021 LTS and it is still causing issues. I am strongly considering rewriting the tree rendering using instancing to avoid this penalty.
     
  6. NightElfik

    Actually, after upgrading from 2019 LTS to 2021 LTS, I see a significant performance drop caused by trees: FPS is 10-20% lower after the upgrade. When the trees are hidden or removed, there is no FPS drop.

    Sigh, we might need to stay on 2019 LTS until this is resolved or until we rewrite the tree rendering...
     
  7. georgerh

    I spent two weeks reverse-engineering the Tree component and replacing it with a custom one. That got rid of almost the entire afterCullingOutputReady cost.

    No response from Unity on the ticket yet.
     
  8. NightElfik

    That's cool. Would you mind sharing what you learned? Did you remove the overhead, or just move it somewhere else in your script? We simply removed the component entirely, losing the tree animations and smooth LOD transitions but gaining 30% more FPS.

    I was also investigating ways to make custom trees that are cheap to animate. Do you have any insight into how they do the animations? I was thinking of making a "normal" looped animation, baking it into a texture, and using instancing plus shader animation. These would not respond to wind direction and strength the way SpeedTree trees do.
     
  9. georgerh

    I removed the overhead entirely. The nice thing about how SpeedTree wind animation works is that you basically only have to set the wind parameters once and then plug in the current time, which is already available in shaders via the _Time.y variable. The only thing you lose is the ability to fade each variable in and out with a different graph.

    These are the parameters you have to set:
    _ST_WindVector.xyz = Wind direction; it will be normalized if it isn't. The vector length can't be used for wind strength
    _ST_WindVector.w = Amount of fine detail? 0-1, where 0=coarse and 1=fine but very subtle, default 0.5

    _ST_WindGlobal.x = _Time.y, SpeedTree8 automatically adds instance pos for variation
    _ST_WindGlobal.y = Amount of oscillation independent of wind direction? 0-20, default 10
    _ST_WindGlobal.z = How much height contributes to wind linearly?, 0.001-0.1 (must not be zero), default 0.015
    _ST_WindGlobal.w = How much height contributes to wind exponentially?, 1.0-10.0, default 1.75

    _ST_WindBranch.x = _Time.y
    _ST_WindBranch.y = How much branches are affected by wind, 0-1
    _ST_WindBranch.zw = unused

    _ST_WindBranchTwitch.x = Amount of branch twitching, 0-1, default 0.5
    _ST_WindBranchTwitch.y = Frequency scale factor for branch twitching, 0.01-5.0, default 1.0
    _ST_WindBranchTwitch.zw = unused

    _ST_WindBranchWhip.x = Amount of whipping for palm tree branches, 0-1, default 1.0
    _ST_WindBranchWhip.yzw = unused

    _ST_WindBranchAnchor.xyz = Anchor point for palm branches, must have positive adherence
    _ST_WindBranchAnchor.w = How much of an effect the anchor has on palm branches, 0-1, default 1.0

    _ST_WindBranchAdherences.x = How much the branches react to wind direction, 0-1, default 1.0
    _ST_WindBranchAdherences.y = How much frond branches react to wind direction in addition to the global value, 0-1, default 1.0
    _ST_WindBranchAdherences.zw = unused

    _ST_WindTurbulences.x = Time scale factor for branch turbulence, 0.01-100.0, default 1.0
    _ST_WindTurbulences.yzw = unused

    _ST_WindLeaf#Ripple.x = _Time.y
    _ST_WindLeaf#Ripple.y = Amount of ripple for leaves in group 1 or 2, 0-1, default 1.0
    _ST_WindLeaf#Ripple.zw = unused

    _ST_WindLeaf#Tumble.x = _Time.y
    _ST_WindLeaf#Tumble.y = Amount of lifting for leaves in group 1 or 2, 0-1, default 0.2
    _ST_WindLeaf#Tumble.z = Amount of twisting for leaves in group 1 or 2, 0-1, default 0.3
    _ST_WindLeaf#Tumble.w = Amount of rotation for leaves in group 1 or 2, 0-1, default 0.0 (the LeafTumble parameter is called fAdherence but I think this is rotation)

    _ST_WindLeaf#Twitch.x = Amount of twitching for leaves in group 1 or 2, 0-1, default 1.0
    _ST_WindLeaf#Twitch.y = Contribution power of leaf twitching in group 1 or 2, 0.01-100.0, default 1.0
    _ST_WindLeaf#Twitch.z = _Time.y
    _ST_WindLeaf#Twitch.w = unused

    _ST_WindFrondRipple.x = _Time.y
    _ST_WindFrondRipple.y = Amount of frond ripple, 0-1, default 1.0
    _ST_WindFrondRipple.z = Time scale factor for frond ripple, 0.0-100.0, default 1.0
    _ST_WindFrondRipple.w = How much lighting is adjusted for frond ripple, 0-1, default 1.0

    _ST_WindAnimation.x = _Time.y (just for palm tree turbulence)
    _ST_WindAnimation.yzw = unused
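    A minimal sketch of how the CPU side might pack a subset of these values into the float4 layout above. It is written in Python purely for illustration (in Unity this would be C#, uploaded with something like Shader.SetGlobalVector). The WindSettings class, its field names, and the 1.0 default for branch_amount (the post gives no default for _ST_WindBranch.y) are assumptions, not part of SpeedTree or Unity. The `time` argument stands in for _Time.y, which the modified shader would read itself.

```python
import math
from dataclasses import dataclass


@dataclass
class WindSettings:
    """Hypothetical container for a subset of the SpeedTree wind parameters.

    Defaults follow the values listed in the post; branch_amount has no
    default there, so 1.0 is an assumption.
    """
    fine_detail: float = 0.5        # _ST_WindVector.w, 0 = coarse, 1 = fine
    oscillation: float = 10.0       # _ST_WindGlobal.y, 0-20
    height_linear: float = 0.015    # _ST_WindGlobal.z, must not be zero
    height_exponent: float = 1.75   # _ST_WindGlobal.w, 1.0-10.0
    branch_amount: float = 1.0      # _ST_WindBranch.y, 0-1 (assumed default)
    twitch_amount: float = 0.5      # _ST_WindBranchTwitch.x
    twitch_frequency: float = 1.0   # _ST_WindBranchTwitch.y


def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("wind direction must be non-zero")
    return tuple(c / length for c in v)


def pack_wind_vectors(direction, s, time):
    """Pack settings into the float4 layout described in the post.

    `time` stands in for _Time.y; in the modified shader the time slots
    would be read from _Time.y directly instead of being uploaded.
    """
    if s.height_linear == 0.0:
        raise ValueError("_ST_WindGlobal.z must not be zero")
    dx, dy, dz = normalize(direction)
    return {
        "_ST_WindVector": (dx, dy, dz, s.fine_detail),
        "_ST_WindGlobal": (time, s.oscillation, s.height_linear, s.height_exponent),
        "_ST_WindBranch": (time, s.branch_amount, 0.0, 0.0),  # zw unused
        "_ST_WindBranchTwitch": (s.twitch_amount, s.twitch_frequency, 0.0, 0.0),
    }
```

    For example, pack_wind_vectors((0.0, 0.0, 2.0), WindSettings(), time=3.0)["_ST_WindVector"] gives (0.0, 0.0, 1.0, 0.5), because the direction is normalized before packing.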

    The following parameters must be multiplied by wind strength:
    _ST_WindBranch.y
    _ST_WindBranchAdherences.y
    _ST_WindLeaf#Ripple.y
    _ST_WindLeaf#Tumble.xyz
    _ST_WindLeaf#Twitch.x
    _ST_WindFrondRipple.y
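    The strength scaling can be sketched the same way, again in illustrative Python. The table reproduces the list above with '#' expanded to leaf groups 1 and 2; the concrete names such as _ST_WindLeaf1Ripple are an assumption about how '#' is meant, and apply_wind_strength is a hypothetical helper. Note that the list includes _ST_WindLeaf#Tumble.x, which is also described above as a _Time.y slot; the sketch reproduces the list as written.

```python
# Components scaled by wind strength (0 = x, 1 = y, 2 = z, 3 = w),
# reproduced from the list in the post.
STRENGTH_SCALED = {
    "_ST_WindBranch": (1,),
    "_ST_WindBranchAdherences": (1,),
    "_ST_WindFrondRipple": (1,),
}
# Expand '#' to leaf groups 1 and 2 (assumed naming).
for group in (1, 2):
    STRENGTH_SCALED[f"_ST_WindLeaf{group}Ripple"] = (1,)
    # Listed as .xyz in the post, although .x is also described as _Time.y.
    STRENGTH_SCALED[f"_ST_WindLeaf{group}Tumble"] = (0, 1, 2)
    STRENGTH_SCALED[f"_ST_WindLeaf{group}Twitch"] = (0,)


def apply_wind_strength(vectors, strength):
    """Return a copy of `vectors` (name -> 4-tuple) with the
    strength-dependent components multiplied by `strength`."""
    result = {}
    for name, vec in vectors.items():
        scaled = list(vec)
        for i in STRENGTH_SCALED.get(name, ()):
            scaled[i] *= strength
        result[name] = tuple(scaled)
    return result
```

    Keeping the scaling separate from the packing step means the unscaled values can stay as they are and only the strength-dependent components need recomputing when the wind strength changes.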

    Also, _WindEnabled must be set to a nonzero value (it is used as a boolean).

    I recommend scaling _Time.y by a random number near 1 to break the synchronization between trees.
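    One way to get such a factor, sketched in Python under the assumption that each tree has a stable integer id (hypothetical; in practice this could be an index assigned when the trees are placed, stored in an instanced property). Seeding a generator with the id keeps the factor constant from frame to frame, which matters because a factor re-rolled every frame would make the animation jump. The 0.1 spread is an arbitrary choice.

```python
import random


def per_tree_time_scale(tree_id, spread=0.1):
    """Deterministic factor in [1 - spread, 1 + spread] for one tree."""
    rng = random.Random(tree_id)  # same id -> same factor every frame
    return 1.0 + rng.uniform(-spread, spread)


def tree_time(global_time, scale):
    """Stand-in for the shader-side `_Time.y * scale`."""
    return global_time * scale
```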

    You may want to convert the original parameters to static globals so that you can set them in the shader. Make sure to put your own parameters into an instancing block. You can add some turbulence to the wind direction, but you have to do it per tree, because otherwise it looks choreographed.
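    A minimal sketch of per-tree turbulence on the wind direction, in illustrative Python: each tree rotates the shared wind direction around the world up axis (Y in Unity) by a small, time-varying angle with its own phase, so neighbouring trees do not swing in unison. The sine waveform, max_angle_deg, and frequency are assumptions; in the actual shader this would run per instance, with the phase coming from the instancing block.

```python
import math


def jittered_direction(base_dir, time, phase, max_angle_deg=15.0, frequency=0.3):
    """Rotate base_dir (x, y, z) about the +Y axis by a small angle that
    oscillates over time. `phase` is a per-tree offset in radians."""
    angle = math.radians(max_angle_deg) * math.sin(
        2.0 * math.pi * frequency * time + phase)
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = base_dir
    # Standard rotation matrix about Y; preserves the vector's length.
    return (c * x + s * z, y, -s * x + c * z)
```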

    You need an additional component on the same object as the WindZone component, because WindZone doesn't set any shader parameters. The wind direction is simply the forward vector of the object that has the WindZone component. The wind strength is windZone.windMain. You can also pass windZone.windTurbulence to the shaders.
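    For reference, that direction is what Unity exposes as windZone.transform.forward, i.e. the zone's rotation applied to (0, 0, 1). The Python sketch below spells out that quaternion math and bundles it with the strength and turbulence values; the function names and dictionary keys are hypothetical, and in a real project the helper component would simply read transform.forward, windMain, and windTurbulence.

```python
def forward_from_quaternion(qx, qy, qz, qw):
    """Rotate (0, 0, 1) by the unit quaternion (qx, qy, qz, qw).

    This is the third column of the quaternion's rotation matrix,
    i.e. the vector Transform.forward returns in Unity."""
    return (
        2.0 * (qx * qz + qw * qy),
        2.0 * (qy * qz - qw * qx),
        1.0 - 2.0 * (qx * qx + qy * qy),
    )


def wind_inputs(rotation, wind_main, wind_turbulence):
    """Hypothetical bundle of what the helper component would upload."""
    return {
        "direction": forward_from_quaternion(*rotation),
        "strength": wind_main,          # windZone.windMain
        "turbulence": wind_turbulence,  # windZone.windTurbulence
    }
```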

    Disclaimer: I found this by trial and error and by looking at SpeedTreeWind.cginc, so there is no guarantee that it is correct. Some of it is also documented here: https://docs.speedtree.com/doku.php?id=advancewind

    PS: This is what we did, but it's pretty crazy, and I'm not saying everybody should do it.
     
    Last edited: Jul 30, 2022