Operations on UVs in fragment

Discussion in 'Shaders' started by opel_cobalt, May 18, 2021.

  1. opel_cobalt

    Joined:
    May 18, 2021
    Posts:
    26
    I'm aware that texture sampling with raw UVs from vertex output:
    half4 frag (vOutput i) : SV_Target { return tex2D(_Texture, i.uv); }

    runs much faster than sampling with UVs that were modified in fragment program:
    half4 frag (vOutput i) : SV_Target { return tex2D(_Texture, i.uv * 2); }

    but I'm not quite sure why. It would be nice to see a detailed explanation of this phenomenon.

    Also I'd like to know if
    half4 frag (vOutput i) : SV_Target { return tex2D(_Texture, i.uv.zw); }
    and
    half4 frag (vOutput i) : SV_Target { return tex2D(_Texture, float2(i.uv1.x, i.uv2.y)); }
    get optimized in the same way?
     
  2. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,329
    Except it's not.*

    * On modern GPUs in the general case.

    Try it; there'll be effectively no difference in performance between those two. A single additional multiply in the fragment shader isn't something you'll be able to benchmark unless you look at microsecond (1/1,000,000th of a second) timings, and even then the difference will be in the tiny single digits. This ceased to be a real thing over a decade ago. It's only really still true on the small handful of low-end OpenGL ES 2.0 Android devices still being made.

    The reason it used to be important is texture caching. When the fragment shader needs to sample a texture, the GPU has to copy that data from the GPU's slow memory to a fast cache used by the texture sampling hardware. On old GPUs this was really, really slow. In some cases explicit "GPU memory" may not even exist, with the GPU reading directly from CPU memory, which is even slower (still true for mobile and some integrated desktop GPUs). The GPU could hide that by pre-caching the data before the fragment shader even ran. But it could only do this if the UVs weren't modified in the fragment shader.
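
    On hardware where that restriction applies, the usual workaround is to do the UV math per vertex so the fragment shader samples with untouched interpolated UVs. A minimal sketch, assuming a standard Unity Cg pass (the code sits between CGPROGRAM and ENDCG); the vInput struct is a name made up for illustration, and vOutput and _Texture are taken from the question:

    ```hlsl
    #pragma vertex vert
    #pragma fragment frag
    #include "UnityCG.cginc"

    sampler2D _Texture;

    // Hypothetical input struct, named for illustration only.
    struct vInput
    {
        float4 vertex : POSITION;
        float2 uv     : TEXCOORD0;
    };

    struct vOutput
    {
        float4 pos : SV_POSITION;
        float2 uv  : TEXCOORD0;
    };

    vOutput vert (vInput v)
    {
        vOutput o;
        o.pos = UnityObjectToClipPos(v.vertex);
        // Scale once per vertex. Interpolation is linear in the attribute,
        // so this gives the same UVs as i.uv * 2 in the fragment shader.
        o.uv = v.uv * 2;
        return o;
    }

    half4 frag (vOutput i) : SV_Target
    {
        // UVs reach the sampler unmodified.
        return tex2D(_Texture, i.uv);
    }
    ```

    This only works for math that survives linear interpolation (scale, offset, rotation). Anything non-linear still has to be done per fragment.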

    Today GPU memory is much faster, and GPUs (generally) don't pre-cache before the fragment shader anymore. They still use a texture cache, mind you, but not based on UVs output by the vertex shader. Rather they cache based on screen space. GPUs work on multiple pixels at once, so if a triangle is being rendered, and one pixel needs a texture, it's likely the rest of the triangle can reuse the same cached section of that texture. I'm simplifying things greatly here, but that's close enough to how it works on modern GPUs.

    However, it is still relatively slow to sample a texture today. What happens now is that shader compilers try to reorganize the code to sample textures as early as possible, then do the work that doesn't require the texture values, letting it "hide" the time it takes to sample the texture. And cache reuse further helps hide the cost. If you have UVs that are based on the output of another texture sample, this can be slow because the second sample has to happen later in the shader, but it still might not be an issue if there's other code for the shader to be doing in the meantime to hide it.
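
    To make the "UVs based on another texture" case concrete, here is a rough sketch of a dependent texture read, again assuming a standard Unity Cg pass. _OffsetTex and the 0.1 strength are hypothetical, added only for illustration, and how the instructions actually get scheduled is up to the compiler and the GPU:

    ```hlsl
    #pragma vertex vert
    #pragma fragment frag
    #include "UnityCG.cginc"

    sampler2D _Texture;
    sampler2D _OffsetTex; // hypothetical offset/distortion texture

    struct vOutput
    {
        float4 pos : SV_POSITION;
        float2 uv  : TEXCOORD0;
    };

    vOutput vert (appdata_base v)
    {
        vOutput o;
        o.pos = UnityObjectToClipPos(v.vertex);
        o.uv = v.texcoord.xy;
        return o;
    }

    half4 frag (vOutput i) : SV_Target
    {
        // Independent read: the UVs are known as soon as the shader starts.
        half2 offset = tex2D(_OffsetTex, i.uv).rg;

        // Math that needs no texture data, so it can overlap with the fetch above.
        half3 tint = half3(i.uv, 1) * 0.5 + 0.5;

        // Dependent read: cannot be issued until the first sample has returned.
        half4 col = tex2D(_Texture, i.uv + offset * 0.1);

        return half4(col.rgb * tint, col.a);
    }
    ```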

    As for your second question: on modern GPUs there is a minor difference between those two, but it's about the same as the difference between multiplying the UVs by 2 or not. In the second example you're getting the x and y values from two different input vectors, so there'll be an extra instruction to move them into one float2. But it's basically meaningless in terms of a performance difference.

    On old GPUs where it mattered that you didn't modify the UVs in the fragment shader, yes, this would be much different. The second case would be the slow case. Depending on the GPU, the i.uv.zw might be treated as the slow case too! Some GPUs only supported texture reads using the .xy. It couldn't be .zw on some GPUs, but could on others. It definitely couldn't be .yx or any kind of swizzle on any of them, and certainly couldn't be from more than one texcoord.
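
    For reference, a sketch of the packed layout being discussed: two UV sets stored in one float4 interpolator, sampled with .xy and .zw. This assumes a standard Unity Cg pass and a mesh with a second UV channel; _DetailTex is a hypothetical second texture added for illustration:

    ```hlsl
    #pragma vertex vert
    #pragma fragment frag
    #include "UnityCG.cginc"

    sampler2D _Texture;
    sampler2D _DetailTex; // hypothetical second texture

    struct vOutput
    {
        float4 pos : SV_POSITION;
        float4 uv  : TEXCOORD0; // xy = first UV set, zw = second UV set
    };

    vOutput vert (appdata_full v)
    {
        vOutput o;
        o.pos = UnityObjectToClipPos(v.vertex);
        o.uv.xy = v.texcoord.xy;
        o.uv.zw = v.texcoord1.xy;
        return o;
    }

    half4 frag (vOutput i) : SV_Target
    {
        half4 baseCol = tex2D(_Texture, i.uv.xy);
        // Fine on modern GPUs; the .zw read is the one some old GPUs
        // would have treated as the slow case.
        half4 detail = tex2D(_DetailTex, i.uv.zw);
        return baseCol * detail;
    }
    ```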
     
  3. opel_cobalt

    Joined:
    May 18, 2021
    Posts:
    26
    Thanks for the detailed reply, @bgolus, that's exactly the kind of info I was hoping to get! Glad to hear that sampling isn't so restrictive nowadays.