
Question: Sampling data from several UV coordinates in the same texture

Discussion in 'Shaders' started by highFlylowFlow, Apr 29, 2021.

  1. highFlylowFlow

    Joined:
    Nov 27, 2019
    Posts:
    15
    Is there a pre-made function that does this, given user-defined offsets?
    As of now I use:
    Code (CSharp):
    float sampledZ = tex2Dgrad(tex, centerUV + float2(xOffset, yOffset), float2(0, 0), float2(0, 0)).r;
    Note that this is for a depth texture from a custom camera in orthographic mode, with frustum/near/far planes that exactly match the size of the stock Unity cube, so basically no transformations are needed (I checked it, and so far it works as intended).

    Since I'm taking several samples from surrounding nearby texels, I will repeat the above operation (see the rough sketch below), so I'm wondering:
    - if there is a function that does this in a single shot (filling up some kind of user-defined array of floats)?
    - if I'm going to end up using tex2Dgrad() several times per fragment in the frag() part of the shader, what is the performance impact?
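    Roughly, by "repeat the above operation" I mean something like this (offsets, texelSize and NUM_TAPS are just placeholder names I'd define myself, not a built-in API):
    Code (CSharp):
    // placeholder sketch of the repeated-sample loop, one tex2Dgrad() per tap
    #define NUM_TAPS 9
    static const float2 offsets[NUM_TAPS] = {
        float2(-1, -1), float2(0, -1), float2(1, -1),
        float2(-1,  0), float2(0,  0), float2(1,  0),
        float2(-1,  1), float2(0,  1), float2(1,  1)
    };
    
    float samples[NUM_TAPS];
    for (int i = 0; i < NUM_TAPS; i++)
    {
        samples[i] = tex2Dgrad(tex, centerUV + offsets[i] * texelSize.xy, float2(0, 0), float2(0, 0)).r;
    }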
     
  2. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    The answer is yes, and no.

    Is there a function you can pass an array of arbitrary positions or offsets to and get back an array of values?
    No.

    Is there a function you can get back more than one value from?
    Yes! But...

    HLSL has a function called Gather() which you can use to get back the value of 4 texels at once. But specifically it's 4 neighboring texels: the four texels used for bilinear filtering at that specific UV position. The other two important bits of information about Gather() are that it only returns a float4 with the red channel value of the four texels, and that it only works with Direct3D 11 or better. For something like sampling a depth texture, getting only the red channel is fine, because that's the only channel with data anyway. As for the Direct3D 11 minimum requirement, whether that's a problem or not depends on your use case.

    If those caveats all work for you, then you'll want your code to look a little like this:
    Code (csharp):
    // note, this is Texture2D and SamplerState, not a single sampler2D
    Texture2D _CameraDepthTexture;
    SamplerState sampler_CameraDepthTexture;
    float4 _CameraDepthTexture_TexelSize;
    
    // in the shader function
    float2 uv = // sample position somewhere in the middle of 4 pixels
    float4 depthValues = _CameraDepthTexture.Gather(sampler_CameraDepthTexture, uv);
    
    // Direct3D Gather() returns samples counter clockwise starting in the bottom left
    // but Unity renders Direct3D upside down, so the order for what shows on screen is
    // | x | y |
    // | w | z |
    // also note, it only ever samples the top mip
    So let's say you want to sample 9 positions: a center texel and the 8 around it. You can do that with just four Gather() calls instead of nine tex2Dlod() calls, or two Gather() and two tex2Dlod(), though I honestly haven't seen any difference in performance between those two options.
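    A rough sketch of what the four Gather() version might look like (assuming uv sits exactly on the center texel's center, and that _CameraDepthTexture_TexelSize.xy holds 1 / resolution):
    Code (csharp):
    // sketch: cover a 3x3 neighborhood with four Gather() calls
    float2 texel = _CameraDepthTexture_TexelSize.xy;
    
    // each Gather() returns the 2x2 block of texels around its UV, so four
    // half-texel diagonal offsets cover all 9 texels of the 3x3 block
    float4 gBL = _CameraDepthTexture.Gather(sampler_CameraDepthTexture, uv + float2(-0.5, -0.5) * texel);
    float4 gBR = _CameraDepthTexture.Gather(sampler_CameraDepthTexture, uv + float2( 0.5, -0.5) * texel);
    float4 gTL = _CameraDepthTexture.Gather(sampler_CameraDepthTexture, uv + float2(-0.5,  0.5) * texel);
    float4 gTR = _CameraDepthTexture.Gather(sampler_CameraDepthTexture, uv + float2( 0.5,  0.5) * texel);
    The center texel shows up in all four results and the edge texels in two, so there's some redundant data, but it's still only four texture instructions.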

    Also, note, if you don't use Gather(), you should probably be using tex2Dlod() instead of tex2Dgrad().
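    For example, a sketch of the line from the original post rewritten to use tex2Dlod():
    Code (csharp):
    // same sample as the tex2Dgrad() version, but explicitly reading mip 0
    float sampledZ = tex2Dlod(tex, float4(centerUV + float2(xOffset, yOffset), 0, 0)).r;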

    If you want to see an example implementation, see this shader:
    https://gist.github.com/bgolus/c3bc079a81c5b43e9830b98a0d7c32d6
     
  3. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    As for the last question:
    It depends. If all the samples are very close together in the texture, like the example of a single texel and the 8 surrounding values, the difference in performance vs a single sample is certainly measurable, though the difference between sampling only 4 of the surrounding values (either on the axis or diagonally) and all 8 is almost unmeasurable on modern GPUs. On lower end or mobile GPUs the impact may be more significant. Really, the answer is... it depends on what GPU you're using, what else is happening in the scene / shader being rendered, how hot the device is, etc. The only way to answer that question is to try it and find out yourself.
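    For reference, the two 4-tap patterns mentioned above, written out as texel offsets (the names here are just for illustration):
    Code (csharp):
    // "on the axis": a plus / cross shaped pattern of neighbors
    static const float2 crossTaps[4] = { float2(-1, 0), float2(1, 0), float2(0, -1), float2(0, 1) };
    // "diagonally": an x shaped pattern of neighbors
    static const float2 diagonalTaps[4] = { float2(-1, -1), float2(1, -1), float2(-1, 1), float2(1, 1) };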
     
    MaxEden and highFlylowFlow like this.