How expensive is Smooth step in Shaders for Mobile?

Discussion in 'Shaders' started by resetme, Oct 26, 2017.

  1. resetme (joined Jun 27, 2012; 204 posts)

    Hi Masters.

    More shader questions. Right now, for optimization, I try to move everything I can into the vertex shader, use simple meshes, use a ramp texture for color lerps, and sometimes store sin/cos in a LUT. And of course as few floats as possible: half and fixed instead.

    Yesterday I sent a couple of shaders to the engineering team and their reaction was a bit *this is too much*.
    The issue was that I had 4 smoothsteps inside the frag shader. Is that really so expensive? And if it is, how could I avoid it?

    I'm using those smoothsteps to blur some shapes created from UV coords.

    Appreciate any input!
     
  2. Johannski (joined Jan 25, 2014; 823 posts)

    Hey there, just looked it up, this is what happens in smoothstep:
    Code (CSharp):
    float smoothstep(float edge0, float edge1, float x) {
        // remap x to 0..1 between the two edges, then clamp
        float t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0);
        // apply the cubic curve
        return t * t * (3.0 - 2.0 * t);
    }
    Source

    This doesn't sound too expensive, I think it shouldn't be that much of a problem. You could replace them with cheap lerps if the smoothness is not that important... or see if you can get rid of one or two. But I think in the end 4 smoothsteps won't kill the GPU.
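
To get a feel for what dropping the curve costs visually, here is a minimal CPU-side sketch in C (not shader code) that compares the smoothstep above with a plain clamped linear ramp. The names `smoothstep_ref`, `linearstep` and `max_abs_diff` are made up for this illustration. The two agree at both edges and at the midpoint, and the gap in between stays just under 0.1 of the output range.

```c
/* Clamp v to [0, 1], like saturate() in HLSL/Cg. */
float saturate(float v) {
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Reference smoothstep, following the definition quoted above. */
float smoothstep_ref(float edge0, float edge1, float x) {
    float t = saturate((x - edge0) / (edge1 - edge0));
    return t * t * (3.0f - 2.0f * t);
}

/* Hypothetical cheaper variant: same remap and clamp, no cubic curve. */
float linearstep(float edge0, float edge1, float x) {
    return saturate((x - edge0) / (edge1 - edge0));
}

/* Largest absolute difference between the two, sampled at n + 1
   evenly spaced points between edge0 and edge1. */
float max_abs_diff(float edge0, float edge1, int n) {
    float worst = 0.0f;
    for (int i = 0; i <= n; ++i) {
        float x = edge0 + (edge1 - edge0) * (float)i / (float)n;
        float d = smoothstep_ref(edge0, edge1, x) - linearstep(edge0, edge1, x);
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Whether that difference is visible depends on how wide the blurred edge is on screen, so it is worth checking on the actual shapes.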
     
  3. nat42 (joined Jun 10, 2017; 353 posts)

    Are look up tables for sin & cos really a win for mobile shaders?

    I think lerps are probably more than adequate for antialiasing shapes drawn in a fragment shader.

    I suspect it's hard to evaluate the efficiency without the shader. You might be targeting GLES2 hardware where the smoothstep varies the coord you sample, so you have dependent texture reads (if that's the case, perhaps you can take a non-dependent bilinear texture read and weight it to approximate the gradient?)
     
  4. hippocoder (Digital Ape; joined Apr 11, 2010; 29,723 posts)

    A lookup for sin and cos was slightly faster or broke even on the original iPhone 3G for me, presumably because memory access was roughly as slow as the math.

    For shaders, I do not think smoothstep (lerp probably done in silicon) is an issue for anyone. Worry when you get to pow and friends.

    Why not benchmark it?
     
  5. nat42 (joined Jun 10, 2017; 353 posts)

    Sorry, but I believe GLSL never sees the smoothstep because of the way Unity transpiles shaders. I think it becomes what Johannski posted, so GLES can't optimise it to anything better (not that I believe it would). So it's what, about 7 instructions (sub, rcp, mul, sub, mul, mul, mad)?

    Isn't it potentially more expensive than pow() (exp2 & log2 & mul)?

    EDIT: if there are 4 of them and you're targeting older vector hardware, perhaps try doing all four smoothsteps at once on a vector? It'll also look like less ;)
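
A rough sketch of the "all four at once" idea, written as plain C so it can be run on the CPU. The `float4` struct and `smoothstep4` are hypothetical names for this illustration. In an actual shader this would presumably be a single `smoothstep` call with four-component arguments, since the HLSL intrinsic accepts vectors. The loop here only shows how the four independent edge pairs and inputs get packed.

```c
/* Stand-in for a shader float4: four independent lanes. */
typedef struct { float v[4]; } float4;

/* Component-wise smoothstep: lane i uses edge0.v[i], edge1.v[i], x.v[i].
   On vector hardware each line in the loop body maps to one vector op
   instead of four scalar ones. */
float4 smoothstep4(float4 edge0, float4 edge1, float4 x) {
    float4 r;
    for (int i = 0; i < 4; ++i) {
        float t = (x.v[i] - edge0.v[i]) / (edge1.v[i] - edge0.v[i]);
        t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);  /* clamp to 0..1 */
        r.v[i] = t * t * (3.0f - 2.0f * t);           /* smoothstep curve */
    }
    return r;
}
```

Whether this is a win depends on the target: on scalar GPU architectures the compiler splits the vector back into four scalar sequences, so the gain is mostly in readability.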
     
  6. hippocoder (Digital Ape; joined Apr 11, 2010; 29,723 posts)

    Sorry, I meant to clarify that lerp is done in hardware. At least I thought it was! Sorry if I was mistaken.
     
  7. bgolus (joined Dec 7, 2012; 12,329 posts)

    Each smoothstep is about 7~8 instructions. If the first two inputs into the smoothsteps are constant values defined in the shader (ie: smoothstep(0.2, 0.5, x)) then this will be optimized down to about 3~5 instructions by the compiler (depending on the target). If all you need from the smoothstep is the range remapping & clamping, and you're using constant values, then it can be done in basically a single instruction or two (depending on the target) if reformed into a MAD (multiply add).

    So, for example smoothstep(0.2, 0.5, x) is:
    float t = clamp((x - 0.2) / (0.5 - 0.2), 0.0, 1.0); // remap and clamp
    x = t * t * (3.0 - 2.0 * t); // apply smoothstep curve


    If you remove the curve and look at just the first line without the clamp it looks like this:
    (x - 0.2) / (0.5 - 0.2)

    You can refactor that to this:
    x * (1.0 / (0.5 - 0.2)) - (0.2 / (0.5 - 0.2))

    Which looks way uglier, but it can be solved down to this:
    x * 3.3333 - 0.6667

    That's a MAD, which GPUs can do in a single step. Add back the clamp (or use the saturate function) and that'll be either one or two instructions on the GPU! Shader compilers are smart, but only so far. The original first line and the last line are identical in their results, but the compiler won't be smart enough to do that work for you. It can, however, take that full refactored middle line, plug in the fixed values, and solve it down to the last line.

    So, that means you can do this:

    Code (CSharp):
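    // half rather than fixed: the scale (about 3.33 here) is outside fixed's -2..2 range
    half remap(half a, half b, half x) {
        // with constant a and b, both divisions fold to constants, leaving one MAD
        return x * (1.0 / (b - a)) - (a / (b - a));
    }

    ...

    x = saturate(remap(0.2, 0.5, x)); // this is just 1 or 2 instructions!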
    However, if you're not using constant values defined in the shader, the original form is cheaper. :(
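
The refactoring above can be checked numerically. This is a small C sketch, not shader code, and the function names are invented for the example. `mad_scale` and `mad_offset` are the two values a compiler can precompute when `a` and `b` are constants, and `remap_mad` is the single multiply-add that remains.

```c
/* Original form: one subtract and one divide per evaluation. */
float remap_divide(float a, float b, float x) {
    return (x - a) / (b - a);
}

/* Scale term of the MAD form; a constant when a and b are constants. */
float mad_scale(float a, float b) {
    return 1.0f / (b - a);
}

/* Offset term of the MAD form; also a constant for constant a and b. */
float mad_offset(float a, float b) {
    return -a / (b - a);
}

/* Refactored form: x * scale + offset, i.e. a single multiply-add
   once scale and offset have been folded. */
float remap_mad(float a, float b, float x) {
    return x * mad_scale(a, b) + mad_offset(a, b);
}
```

For `a = 0.2` and `b = 0.5` the scale comes out at about 3.3333 and the offset at about -0.6667, and both forms give the same result up to floating-point rounding.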
     
  8. resetme (joined Jun 27, 2012; 204 posts)

    OMG, all the answers are so awesome. Today I will do some profiling with Snapdragon Profiler or Adreno.

    And sure, I will try the constant value solution.

    THANKS!
     
  9. resetme (joined Jun 27, 2012; 204 posts)

    So I changed my smoothstep to

    saturate((circleBig * -20) + 16 * _Width)

    It looks the same, and I'm still reducing it more. Actually I'm getting some cool effects.

    It has almost the same functionality as before:
    www.franfndz.com/share/AreaDamageSimple.mp4
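
For reading a MAD like the one above back as a pair of edges, here is a small C sketch. The helper names are hypothetical, and `_Width = 1.0` in the usage note is only an assumed value since the thread does not give one. For `saturate(scale * x + offset)` the ramp hits 0 at `-offset / scale` and 1 at `(1 - offset) / scale`.

```c
/* Clamp v to [0, 1], like saturate() in HLSL/Cg. */
float saturate(float v) {
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* The clamped linear ramp itself: saturate(scale * x + offset). */
float ramp(float scale, float offset, float x) {
    return saturate(scale * x + offset);
}

/* Input value at which the unclamped ramp equals 0. */
float ramp_zero_at(float scale, float offset) {
    return -offset / scale;
}

/* Input value at which the unclamped ramp equals 1. */
float ramp_one_at(float scale, float offset) {
    return (1.0f - offset) / scale;
}
```

With `scale = -20` and `offset = 16 * _Width`, the ramp is 1 up to `0.8 * _Width - 0.05` and falls to 0 at `0.8 * _Width`. For the assumed `_Width = 1.0` that is a 0.05-wide falloff between 0.75 and 0.8.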