Help me optimize this IF out of my shader

Discussion in 'Shaders' started by Guedez, Jan 26, 2020.

  1. Guedez

    Joined: Jun 1, 2012 · Posts: 827
    As I understand it, GPUs really hate ifs. It's something like "a branch takes as long as running both the if and the else, regardless of which one is actually taken". But this one IF is pretty hard to figure out how to get rid of (for me at least; I hardly ever mess with shaders).
    Its purpose is to do a per-instance LOD before the LOD is done on the CPU, so that the grass thins out gradually rather than suddenly (see WEBM (https://imgur.com/a/d3Arnbh) of before/after). Also, if there is a way to get rid of a Graphics.DrawMeshInstancedIndirect instance before it reaches the surface shader, please tell me. My current 'culling' does make instances disappear, but it has no impact on performance. (I tested this by making the IF always true; performance actually got worse, and I got a "Curl error 56: Receiving data failed with unitytls error code 1048578".)
    Here is the IF:
    Code (HLSL):
    void setup()
    {
    #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
        // (lots of unrelated code)
        float dist = distance(_WorldSpaceCameraPos, pos) + 16;
        // random factor to make it smooth rather than an obvious circle of grass removal
        dist += (((unity_InstanceID * 8763) % 27817) / 27814.0) * sqrt(dist);

        float x = max(0, dist - 25) / 4;
        _closestPower = min(64, ceil(max(1, pow(2, ceil(log2(x))))));
        if (_closestPower > 4) {
            if ((_closestPower == 64 && id > dists2) ||
                (_closestPower == 32 && id > dists1.w) ||
                (_closestPower == 16 && id > dists1.z) ||
                (_closestPower == 8 && id > dists1.y) ||
                (_closestPower == 4 && id > dists1.x))
                _Length = 0;
        }

        _Width *= min(5, max(1, _closestPower / 32.0));
    #endif
    }
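    To experiment with the distance buckets outside Unity, the jitter and _closestPower math above can be mirrored on the CPU. The following is a minimal C++ sketch. It assumes unity_InstanceID behaves as a 32-bit unsigned integer, and it does not model pos and id from the elided code. The names jitter, jitteredDistance and closestPower are hypothetical helpers, not Unity functions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// CPU mirror of the per-instance jitter in setup():
//   ((unity_InstanceID * 8763) % 27817) / 27814.0
// Unsigned 32-bit arithmetic wraps on overflow (assumed to match a uint in HLSL).
double jitter(std::uint32_t instanceId) {
    return static_cast<double>((instanceId * 8763u) % 27817u) / 27814.0;
}

// dist = distance(camera, pos) + 16;  dist += jitter * sqrt(dist);
double jitteredDistance(double cameraDistance, std::uint32_t instanceId) {
    double dist = cameraDistance + 16.0;
    return dist + jitter(instanceId) * std::sqrt(dist);
}

// x = max(0, dist - 25) / 4;
// _closestPower = min(64, ceil(max(1, pow(2, ceil(log2(x))))));
// i.e. x rounded up to a power of two, clamped to the range [1, 64].
double closestPower(double dist) {
    double x = std::max(0.0, dist - 25.0) / 4.0;
    // For x == 0, log2 gives -inf and pow(2, -inf) gives 0, so max(1, ...) yields 1.
    double p = std::max(1.0, std::pow(2.0, std::ceil(std::log2(x))));
    return std::min(64.0, std::ceil(p));
}
```

    For example, closestPower(41.0) returns 4 while closestPower(42.0) returns 8. This shows how sharp the bucket edges would be without the jitter term.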
     
    Last edited: Jan 26, 2020
  2. Invertex

    Joined: Nov 7, 2013 · Posts: 1,539
    You're not really doing any extra math inside the branch, though. You're just setting _Length = 0, so there isn't much cost involved there on modern GPUs. Also, it's not always literally "all paths are evaluated": if no pixel in the warp takes the branch, that path can be skipped. The both-paths cost mainly applies when 1 pixel in the group of pixels that are rendered together takes the path. The rest of those pixels then have to wait on that 1 pixel before they can push their results out and free up processing for another group. (This also depends on whether it's a dynamic branch or not. A dynamic branch has its own minor cost, but it can be worth it if there's a decent number of costly instructions to avoid.)

    And if a branch depends on a value that is simply passed into the vertex or fragment program rather than calculated in it, the correct path can be taken preemptively. A value calculated in the vertex program and passed to the fragment program also counts as passed in.
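    The warp behaviour described above can be illustrated with a toy cost model. This C++ sketch assumes a simplified lockstep machine in which each side of an if/else is issued whenever at least one lane needs it. The name warpCost is hypothetical, and real GPU scheduling is more involved. A uniform condition, such as a material property, makes every lane agree, so only one side is paid.

```cpp
#include <algorithm>
#include <vector>

// Toy SIMT cost model, for illustration only: the lanes of one warp run in
// lockstep, so a side of an if/else is issued if at least one lane needs it.
int warpCost(const std::vector<bool>& laneTakesIf, int ifCost, int elseCost) {
    bool anyIf = std::any_of(laneTakesIf.begin(), laneTakesIf.end(),
                             [](bool takes) { return takes; });
    bool anyElse = std::any_of(laneTakesIf.begin(), laneTakesIf.end(),
                               [](bool takes) { return !takes; });
    return (anyIf ? ifCost : 0) + (anyElse ? elseCost : 0);
}
```

    Suppose the if-side is as cheap as the single assignment _Length = 0 and there is no else-side. In that case the divergent and uniform cases cost the same in this model.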

    Also, is there a reason you can't use the Crossfade option on Unity's LODGroup component? Then the switch won't be instant; it will do a dithered transition between the LODs.

    (Also... your first condition is > 4, yet you have a sub-condition that checks == 4, which can never be true in that case.)
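    For reference, the nested test can also be written without an if/else chain, using a >= 4 guard so the == 4 case is reachable. This C++ sketch covers the arithmetic only. It assumes _closestPower is always one of 1, 2, 4, ..., 64, as computed in setup(). Float4, lodThreshold and thinnedLength are hypothetical names. In HLSL the same selection could presumably use dot() over a float4 of the p == 4 ... p == 32 comparisons. As Invertex points out, though, the compiler may already turn the original into selects, so this form is not guaranteed to be faster.

```cpp
// Stand-in for the HLSL float4 dists1 (hypothetical CPU type).
struct Float4 { float x, y, z, w; };

// Select the id threshold for the current LOD level without an if/else chain.
// At most one weight is 1 (when p is 4, 8, 16, 32 or 64); otherwise all are 0.
float lodThreshold(float p, Float4 dists1, float dists2) {
    float w4 = p == 4.0f, w8 = p == 8.0f, w16 = p == 16.0f;
    float w32 = p == 32.0f, w64 = p == 64.0f;
    return w4 * dists1.x + w8 * dists1.y + w16 * dists1.z + w32 * dists1.w + w64 * dists2;
}

// Same result as
//   if (_closestPower >= 4) { if (... id > threshold ...) _Length = 0; }
// but expressed as a multiply by 0 or 1.
float thinnedLength(float length, float p, float id, Float4 dists1, float dists2) {
    float cull = float(p >= 4.0f) * float(id > lodThreshold(p, dists1, dists2));
    return length * (1.0f - cull);
}
```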
     
  3. Guedez

    Joined: Jun 1, 2012 · Posts: 827
    Glad to know. Whatever the branchless result would be, it would be pretty hard to read, so if it's unneeded, I will leave it as is.
    The condition is now
        if (_closestPower >= 4) {
    Thanks for pointing it out.
    I am not really using Unity's LOD system, since a LOD version of my grass tile is just the same tile drawing fewer instances. I am not familiar with Unity's LOD system, so I don't know if it would help me. In my system, I simply tell the tile to draw fewer blades at each distance level, usually half as many as the previous LOD level.
    The reason it drops blades evenly, rather than randomly with big holes and lines appearing everywhere, is that I sort the positions compute buffer in a certain pattern rather than accepting whatever order the compute shader originally generated it in.
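    The post doesn't say which sorting pattern is used. One pattern with the same property, offered as an assumption rather than the author's method, is a bit-reversal permutation: for 2^k blades in a row, every prefix of length 2^j of the reordered list is evenly spaced. The C++ sketch below covers only the 1-D case. The names reverseBits and bitReverseOrder are hypothetical helpers.

```cpp
#include <cstdint>
#include <vector>

// Reverse the lowest `bits` bits of i, e.g. with bits = 3: 1 (001) -> 4 (100).
std::uint32_t reverseBits(std::uint32_t i, unsigned bits) {
    std::uint32_t r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | ((i >> b) & 1u);
    }
    return r;
}

// Order slots 0 .. 2^bits - 1 so that every prefix of length 2^j is evenly spaced.
std::vector<std::uint32_t> bitReverseOrder(unsigned bits) {
    std::vector<std::uint32_t> order(1u << bits);
    for (std::uint32_t i = 0; i < order.size(); ++i) {
        order[i] = reverseBits(i, bits);
    }
    return order;
}
```

    For instance, bitReverseOrder(3) gives 0, 4, 2, 6, 1, 5, 3, 7. Drawing only the first 2 entries keeps slots 0 and 4, and the first 4 keep 0, 2, 4, 6. Each halving of the draw count therefore removes every other remaining blade.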
     
  4. Guedez

    Joined: Jun 1, 2012 · Posts: 827
    Meaning that this:
    Code (HLSL):
    if (_type == 0) {
        height = min(v.vertex.y, _Length);
        v.vertex.x -= ((v.texcoord.x - .5f) * pow(height, 1 + _LeafFormat)) * _WidthDecrease;
        v.vertex.x = lerp(0, v.vertex.x, _Width);
        // clamp(x, min, max): _Bend is the value being clamped
        v.vertex.y = lerp(lerp(height, sin(height * 2) / 2, _Gravity), 0, clamp(_Bend, 0, 0.85)) * _Lenght_Mult;
        v.vertex.z = lerp(lerp(0, .5f - cos(height * 2) / 2, _Gravity), height, _Bend) * _Lenght_Mult;
        v.texcoord.x = (v.vertex.x + 0.5) / 4 + _type * 0.5;
    }
    else if (_type == 1) {
        height = min(v.vertex.y, _Length) * 2;
        v.vertex.x -= ((v.texcoord.x - .5f) * pow(height, 1 + _LeafFormat)) * _WidthDecrease;
        v.vertex.x = lerp(0, v.vertex.x, _Width);
        v.vertex.y = lerp(lerp(height, sin(height * 2) / 2, _Gravity), 0, clamp(_Bend, 0, 0.85)) * _Lenght_Mult;
        v.vertex.z = lerp(lerp(0, .5f - cos(height * 2) / 2, _Gravity), height, _Bend) * _Lenght_Mult;
        v.texcoord.x = (v.vertex.x + 0.5) / 4 + _type * 0.5;
    }
    will not actually double the vertex shader processing time, because _type was set in the setup() part of the shader?
    I couldn't have read better news: if that's the case, I can make tons and tons of subtypes of grass and grass-like plants.
     
  5. bgolus

    Joined: Dec 7, 2012 · Posts: 12,329
    If by "the setup() part of the shader" you mean you're calculating _type from some other data (like something per vertex, or from sampling a texture, etc.), then it might "double the vertex shader processing time". Or it might not. GPUs are funny.

    If _type is a material property, then it might not. But it still might, depending on your platform. On modern desktop GPUs, using a material property to switch with an if, or even with an actual switch, can be nearly as fast as using shader variants (#if together with #pragma shader_feature or #pragma multi_compile lines). On mobile and WebGL ... it's a total crapshoot whether it'll be fast or not. Hopefully it will be, but there's no guarantee.
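    As a rough CPU analogy (not shader code), a shader variant versus a material-property branch resembles a compile-time template parameter versus a runtime argument. In this C++ sketch, bladeHeightVariant and bladeHeightUniform are hypothetical names, and the height formula is borrowed from the _type example. Both functions compute the same values; they differ only in when the branch is resolved.

```cpp
#include <algorithm>

// Analogy only. A template parameter is fixed at compile time, much like a
// shader_feature / multi_compile keyword: each value gets its own compiled
// function ("variant"), and an optimizing compiler can drop the dead path.
template <int Type>
float bladeHeightVariant(float vertexY, float length) {
    float height = std::min(vertexY, length);
    if (Type == 1) height *= 2.0f;  // Type is a compile-time constant here
    return height;
}

// A runtime argument is like a material property: one compiled function
// handles every value and picks the path while running. Since every vertex
// in a draw sees the same value, all lanes agree on the branch.
float bladeHeightUniform(int type, float vertexY, float length) {
    float height = std::min(vertexY, length);
    if (type == 1) height *= 2.0f;
    return height;
}
```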

    Another thing: in the above example for your _type switch, both branches are basically identical apart from the first line having a *2. Most shader compilers will notice that and rewrite the above example to be equivalent to:
    Code (HLSL):
    float height0 = min(v.vertex.y, _Length);
    float height1 = height0 * 2;
    height = _type == 1 ? height1 : height0;
    v.vertex.x -= ((v.texcoord.x - .5f) * pow(height, 1 + _LeafFormat)) * _WidthDecrease;
    v.vertex.x = lerp(0, v.vertex.x, _Width);
    v.vertex.y = lerp(lerp(height, sin(height * 2) / 2, _Gravity), 0, clamp(_Bend, 0, 0.85)) * _Lenght_Mult;
    v.vertex.z = lerp(lerp(0, .5f - cos(height * 2) / 2, _Gravity), height, _Bend) * _Lenght_Mult;
    v.texcoord.x = (v.vertex.x + 0.5) / 4 + _type * 0.5;
    Basically, it won't ever bother doing any kind of branch because there's no need. Shader compilers are really good at removing duplicate work.
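    The claim that the merged form computes the same thing as the two-branch form can be checked on the CPU. This C++ sketch uses hypothetical Params and Vert stand-ins for the shader inputs and computes height1 from height0. It also uses clamp(_Bend, 0, 0.85), following HLSL's clamp(x, min, max) argument order. Assuming clamp is evaluated as min(max(x, min), max), the posted clamp(0, 0.85, _Bend) gives the same result only when _Bend is non-negative. The check verifies equivalence for _type 0 and 1 only. It says nothing about what a given shader compiler actually emits.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical CPU stand-ins for the shader inputs; field names mirror the HLSL.
struct Params {
    float length, leafFormat, widthDecrease, width, gravity, bend, lengthMult;
};
struct Vert { float x, y, z, u; };  // u plays the role of v.texcoord.x

// HLSL-style lerp, and clamp(_Bend, 0, 0.85).
float hlslLerp(float a, float b, float t) { return a + (b - a) * t; }
float bendClamp(float bend) { return std::min(std::max(bend, 0.0f), 0.85f); }

// Two-branch version from the question: one full body per _type.
Vert branchy(Vert v, int type, const Params& p) {
    if (type == 0) {
        float height = std::min(v.y, p.length);
        v.x -= ((v.u - 0.5f) * std::pow(height, 1 + p.leafFormat)) * p.widthDecrease;
        v.x = hlslLerp(0, v.x, p.width);
        v.y = hlslLerp(hlslLerp(height, std::sin(height * 2) / 2, p.gravity), 0, bendClamp(p.bend)) * p.lengthMult;
        v.z = hlslLerp(hlslLerp(0, 0.5f - std::cos(height * 2) / 2, p.gravity), height, p.bend) * p.lengthMult;
        v.u = (v.x + 0.5f) / 4 + type * 0.5f;
    } else if (type == 1) {
        float height = std::min(v.y, p.length) * 2;
        v.x -= ((v.u - 0.5f) * std::pow(height, 1 + p.leafFormat)) * p.widthDecrease;
        v.x = hlslLerp(0, v.x, p.width);
        v.y = hlslLerp(hlslLerp(height, std::sin(height * 2) / 2, p.gravity), 0, bendClamp(p.bend)) * p.lengthMult;
        v.z = hlslLerp(hlslLerp(0, 0.5f - std::cos(height * 2) / 2, p.gravity), height, p.bend) * p.lengthMult;
        v.u = (v.x + 0.5f) / 4 + type * 0.5f;
    }
    return v;
}

// Merged version: only the height differs, so select it and run the body once.
Vert merged(Vert v, int type, const Params& p) {
    float height0 = std::min(v.y, p.length);
    float height1 = height0 * 2;
    float height = (type == 1) ? height1 : height0;
    v.x -= ((v.u - 0.5f) * std::pow(height, 1 + p.leafFormat)) * p.widthDecrease;
    v.x = hlslLerp(0, v.x, p.width);
    v.y = hlslLerp(hlslLerp(height, std::sin(height * 2) / 2, p.gravity), 0, bendClamp(p.bend)) * p.lengthMult;
    v.z = hlslLerp(hlslLerp(0, 0.5f - std::cos(height * 2) / 2, p.gravity), height, p.bend) * p.lengthMult;
    v.u = (v.x + 0.5f) / 4 + type * 0.5f;
    return v;
}
```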


    The short version is try things and see if it's faster or not.
     
  6. Guedez

    Guedez

    Joined:
    Jun 1, 2012
    Posts:
    827
    It's an instanced shader, so I find it pretty strange that it could render some instances with one shader path and others with another. Pretty cool if the GPU can actually do that.