correct lerp input parameters

Discussion in 'Shaders' started by Johannski, Apr 30, 2018.

  1. Johannski

    Joined:
    Jan 25, 2014
    Posts:
    816
    I just noticed that lerp expects the same input dimensions for a, b, and s (https://msdn.microsoft.com/en-us/library/windows/desktop/bb509618(v=vs.85).aspx).

    I'm now wondering which of the following is more performant, or whether it makes a difference at all:
    1.
    Code (CSharp):
    half3 a = tex2D(_MainTex, i.uv).rgb;
    half3 b = tex2D(_OtherTex, i.uv).rgb;
    half s = 0.5;
    result = lerp(a, b, s.xxx); // scalar replicated to a half3 via swizzle
    2.
    Code (CSharp):
    half3 a = tex2D(_MainTex, i.uv).rgb;
    half3 b = tex2D(_OtherTex, i.uv).rgb;
    half s = 0.5;
    result = lerp(a, b, s); // scalar passed directly
    3.
    Code (CSharp):
    half3 a = tex2D(_MainTex, i.uv).rgb;
    half3 b = tex2D(_OtherTex, i.uv).rgb;
    fixed s = 0.5;
    result = lerp(a, b, s); // scalar passed directly, at fixed precision
    The target platform is iOS, but I'm interested in the general answer as well.
     
  2. Kumo-Kairo

    Joined:
    Sep 2, 2013
    Posts:
    343
    There are two steps to finding out which one is faster.
    First, check the generated GLSL code for iOS / Android. There's actually no "lerp" in GLSL, only "mix", and it has quite a lot of overloads - https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/mix.xhtml
    As you can see there, it does support mixing vectors with a single float value. But Unity sometimes generates code that builds a vector from the single float and passes that as the third "mixing" parameter; this depends on the target platform and (I believe) the Unity version.
    Second, take that GLSL code and put it into the PVR Shader Editor, a tool that compiles it to actual assembly for different GPU architectures. It's a VERY powerful tool with almost instant iteration times: it displays the total number of clocks per shader and clocks per line, so you can tweak the GLSL code right in the editor and then port the changes back to Unity CG.
    One thing to note, though, is that Unity doesn't give you full control over the generated GLSL; it will still generate some things based on its own preferences. But Unity also allows writing GLSL directly, which is great if you can sacrifice some of the cross-platform friendliness of CG.

    Back to your case: the first option will be the slowest in most cases. I remember profiling something like that in the Shader Editor and on an actual device with the actual hardware compiler (it can spit out the assembly if you ask it to), and common mobile GPUs seem to dislike swizzles like ".xxx", ".xyx" etc. (as opposed to ".xyz" on a four-component vector), because they usually have to spend a few clocks on an intermediate vector copy.
    So I would go with the second case, as it boils down to the actual GLSL "mix" function with a scalar "mixer" value (just double-check that in the generated GLSL).
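
    For illustration, here is a minimal Cg/HLSL sketch of the two call styles being compared. The helper names BlendSwizzled and BlendScalar are hypothetical and not from the thread. Whether the two end up as different GLSL depends on the platform and Unity version, so treat this as something to feed into the generated-code check rather than as a measured result.

    ```hlsl
    // Hypothetical helpers for comparison only; plain Cg/HLSL, no Unity includes needed.

    // Option 1: the scalar is replicated into a three-component vector by a swizzle
    // before being passed to lerp.
    half3 BlendSwizzled(half3 a, half3 b, half s)
    {
        return lerp(a, b, s.xxx);
    }

    // Option 2: the scalar is passed as-is and the compiler promotes it.
    // The hope (to be verified in the generated GLSL) is that this maps to
    // mix(vec3, vec3, float) without an intermediate vector.
    half3 BlendScalar(half3 a, half3 b, half s)
    {
        return lerp(a, b, s);
    }
    ```

    Both functions compute the same value, a + s * (b - a) per component, so any difference is purely in the instructions the target compiler emits.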

    I would also suggest avoiding "fixed" in shaders on mobile, especially when returning a color from a fragment shader. Mobile GPUs don't usually support "lowp float", so an intermediate cast from "fixed" to "half" (mediump float) is required when returning from the function. This is currently the case for mid-range iPhones (their fragment shaders expect mediump float), and these costs are hard to profile because they don't usually show up anywhere in the profiling pipeline. Unity's examples also commonly use "fixed" for everything related to color. Just use "half" for everything instead; position / normals / tangents / uvs also don't really suffer from the reduced precision in most cases.
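
    As a rough sketch of that advice, assuming Unity's built-in render pipeline and UnityCG.cginc: the shader below blends the two textures from the question using "half" throughout and a scalar lerp factor. The shader name and the _Blend property are assumptions made up for this example; _MainTex and _OtherTex come from the question.

    ```hlsl
    // Sketch only: shader name and _Blend are hypothetical.
    Shader "Hidden/Example/BlendTwoTextures"
    {
        Properties
        {
            _MainTex ("Main Texture", 2D) = "white" {}
            _OtherTex ("Other Texture", 2D) = "black" {}
            _Blend ("Blend", Range(0, 1)) = 0.5
        }
        SubShader
        {
            Pass
            {
                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #include "UnityCG.cginc"

                sampler2D _MainTex;
                sampler2D _OtherTex;
                half _Blend; // scalar blend factor, half instead of fixed

                struct v2f
                {
                    float4 pos : SV_POSITION; // clip-space position stays float
                    half2 uv : TEXCOORD0;     // half per the advice above; use float2 if UV precision artifacts appear
                };

                v2f vert (appdata_img v)
                {
                    v2f o;
                    o.pos = UnityObjectToClipPos(v.vertex);
                    o.uv = v.texcoord;
                    return o;
                }

                // Returns half4 rather than fixed4, so no fixed-to-half
                // conversion is needed on the output color.
                half4 frag (v2f i) : SV_Target
                {
                    half3 a = tex2D(_MainTex, i.uv).rgb;
                    half3 b = tex2D(_OtherTex, i.uv).rgb;
                    return half4(lerp(a, b, _Blend), 1);
                }
                ENDCG
            }
        }
    }
    ```

    On desktop targets "half" and "fixed" are typically compiled as full floats, so any gain from this change would only be visible on mobile hardware.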
     
  3. Johannski

    Joined:
    Jan 25, 2014
    Posts:
    816
    Hey @Kumo-Kairo,
    Thanks a lot for the detailed answer! Really interesting input. I will definitely take a look at PVR Shader Editor. I'm using Intel GPA and Xcode for profiling right now, but the iteration and evaluation times are really long that way. Do you know whether conclusions drawn from PVR Shader Editor also apply to Metal shaders?

    Hm, very interesting! Thanks a lot for pointing out that .xxx might be less performant. I saw this technique in Unity's post-processing stack (and I think in Keijiro's projects as well), so I thought it might even give a performance boost.

    Fixed variables are also very common in the built-in shaders, so it's good to know that half is always the better option for Metal shaders :)
     
  4. Kumo-Kairo

    Joined:
    Sep 2, 2013
    Posts:
    343
    Unfortunately no, I haven't focused on Metal specifically; it's usually OpenGL ES 2.0 in my case.

    Yes, this is what bothers me. It's common to have a lot of code copy-pasted from the standard built-in shaders, with "float"s and "fixed"s all over the place. Getting the precision of the variables right alone won us a few milliseconds of render time on gles2/gles3.

    Again, I can only speak for OpenGL / GLSL; it's possible that Metal behaves differently.