
Is tex2Dlod faster than tex2D?

Discussion in 'Shaders' started by tmcthee, May 20, 2020.

  1. tmcthee

    If I read from a mipmap level rather than the "full" texture, is it faster?

    For example, if I do

    col = tex2Dlod(_MainTex, float4(i.uv, 0, 1));

    is that faster than

    col = tex2D(_MainTex, i.uv);

    because tex2Dlod is reading from mip level 1, which is 1/4 the size of the full texture?
    Or does tex2Dlod have to read the whole texture anyway,
    or have I completely misunderstood the whole thing?
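    To put numbers on the "1/4 the size" part: each mip level halves both dimensions, so level 1 holds a quarter of the texels of level 0, and the whole chain adds roughly a third on top of mip 0. A minimal Python sketch of that arithmetic, assuming a 1024x1024 texture (mip_chain is a hypothetical helper, not a Unity API):

```python
def mip_chain(width, height):
    """Return (level, width, height, texel_count) for every mip level down to 1x1.
    Each level halves both dimensions, rounding down, with a minimum of 1."""
    levels = []
    level = 0
    while True:
        levels.append((level, width, height, width * height))
        if width == 1 and height == 1:
            break
        width, height = max(width // 2, 1), max(height // 2, 1)
        level += 1
    return levels


chain = mip_chain(1024, 1024)
total_texels = sum(entry[3] for entry in chain)

print(len(chain))                    # 11 levels: 1024x1024 down to 1x1
print(chain[1])                      # (1, 512, 512, 262144)
print(total_texels / chain[0][3])    # about 1.333, roughly 4/3 of mip 0
```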
     
  2. Invertex

    The size of the texture itself doesn't really impact sampling performance much. It's a bit like asking whether reading index 8 in a 100-element array is faster than reading index 8 in a 10,000-element array: either way you're just pointing to a location in memory.

    tex2Dlod can be faster because it doesn't need to compute which mip level to sample from using implicit derivatives. Mipmaps aren't really a sampling performance measure; they are a GPU memory saving feature when only the smaller mip levels need to be used (like with mip streaming, or when the user lowers texture quality in the graphics settings), and they offer a way to have a texture be filtered or look slightly different at different distances.
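    For reference, this is roughly the per-pixel calculation that tex2D does and tex2Dlod skips. It's a Python sketch of the isotropic level-of-detail formula found in the graphics API specs (log2 of the largest screen-space footprint measured in texels), assuming no anisotropic filtering and no LOD bias; real GPUs use approximations of this. The function mip_level and its parameters are hypothetical; duv_dx and duv_dy stand in for what ddx(uv) and ddy(uv) return in a shader.

```python
import math


def mip_level(duv_dx, duv_dy, tex_size, mip_count):
    """Approximate the mip level an implicit-derivative sample (like tex2D) picks.

    duv_dx, duv_dy: (du, dv) UV change to the neighbouring pixel in screen x / y,
                    i.e. what ddx(uv) and ddy(uv) would give in the shader.
    tex_size:       (width, height) of mip 0 in texels.
    mip_count:      number of levels in the mip chain.
    """
    width, height = tex_size
    # Scale the UV derivatives into texel units for mip 0.
    footprint_x = math.hypot(duv_dx[0] * width, duv_dx[1] * height)
    footprint_y = math.hypot(duv_dy[0] * width, duv_dy[1] * height)
    rho = max(footprint_x, footprint_y)
    # One texel per pixel -> level 0; each doubling of the footprint -> one level up.
    lod = math.log2(rho) if rho > 0 else 0.0
    # Magnification (rho < 1) clamps to mip 0; never go past the last level.
    return min(max(lod, 0.0), mip_count - 1)


size = (1024, 1024)
print(mip_level((4 / 1024, 0), (0, 4 / 1024), size, 11))        # 2.0: 4 texels per pixel
print(mip_level((0.25 / 1024, 0), (0, 0.25 / 1024), size, 11))  # 0.0: magnified
```

    A call like tex2Dlod(_MainTex, float4(i.uv, 0, 1)) passes the level directly in the w component (1 here), so none of this runs.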
     
    Last edited: May 20, 2020
  3. bgolus

    Like @Invertex said, using tex2Dlod instead of tex2D is theoretically slightly more efficient, because the GPU doesn't have to compute the mip level. If you're doing something like a full-screen, screen-space texture sample (like for a post-process effect) where the texture is only ever going to need one specific mip level, then yes, this could be faster.

    How much faster?

    Honestly, on most modern hardware you won't even be able to measure a difference. It'll likely be on the order of a few microseconds faster (millionths of a second), or less.


    Otherwise, if you're talking about a 3D scene with this shader applied to a variety of surfaces at different distances and angles to the camera, then tex2D is going to be way, way faster. Measurably so on desktop and console, and very likely visibly so on mobile if the texture is large enough.

    When a GPU displays a texture, it only loads small chunks of it at a time (usually called blocks, each some number of contiguous texels, like a 4x4 group) into small, fast memory (cache) that sits near the hardware that actually reads the textures (the Texture Mapping Unit). If multiple pixels next to each other all need the same few blocks, the GPU can reuse the blocks already in that fast cache. If a pixel needs a block that's not in the cache, the GPU has to fetch it from main memory, and that takes time: the shader waits longer to get that sample, so the pixel takes longer to render.
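    A toy model of that cache behaviour, as a Python sketch. The 4x4 block size, the 16-block capacity and least-recently-used eviction are all assumptions for illustration (real GPUs differ in block layout, cache size and replacement policy), and count_misses is a hypothetical helper.

```python
from collections import OrderedDict


def count_misses(texels, block_size=4, cache_blocks=16):
    """Count cache misses for a sequence of (x, y) texel reads.

    Texels are grouped into block_size x block_size blocks. The cache holds
    cache_blocks blocks and evicts the least recently used one when full.
    """
    cache = OrderedDict()
    misses = 0
    for x, y in texels:
        block = (x // block_size, y // block_size)
        if block in cache:
            cache.move_to_end(block)        # hit: mark as most recently used
        else:
            misses += 1                     # miss: fetch from main GPU memory
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict the least recently used block
    return misses


# 64 neighbouring pixels reading neighbouring texels share 4 blocks.
neighbours = [(x, y) for y in range(8) for x in range(8)]
# 64 pixels whose texels are 16 apart each land in their own block.
scattered = [(x * 16, 0) for x in range(64)]

print(count_misses(neighbours))  # 4
print(count_misses(scattered))   # 64
```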

    Let's imagine an infinite floor plane that stretches to the horizon. In the foreground, just below your feet, the texture might be displayed at a size that is larger than the source texture. Large portions of the screen might be reusing the same cache, so this is super fast to render, and tex2Dlod versus tex2D isn't going to matter much here. Sure, forcing a lower-resolution mip level might be slightly cheaper, but the cache is already getting reused a lot, so it isn't a major bottleneck.

    Now look into the distance. Far out at the horizon, every pixel is sampling a different part of the texture. With tex2Dlod you've locked it to a specific mip level, and now the block needed for each pixel is different. Suddenly the GPU needs to fetch a new set of blocks from main GPU memory for every single pixel being rendered. With tex2D, the GPU has already calculated that it only needs a smaller mip level, potentially one of the smallest mip levels, each of which is the size of a single block of memory or smaller. The blocks for those mip levels all fit into the cache at one time, so all of the pixels in the distance out to the horizon are able to reuse the data already in the cache.
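    The floor example can be mimicked with a crude Python sketch that counts how many distinct memory blocks an 8x8 group of neighbouring pixels touches; with a cold cache, each distinct block is one fetch from main memory. Everything here is an assumption for illustration: a 1024x1024 texture, 4x4 blocks, nearest-texel lookups, the same UV step on both screen axes, and the hypothetical helpers blocks_touched and auto_lod (auto_lod rounds the level down, where real hardware rounds or blends between levels).

```python
import math

TEX_SIZE = 1024   # assumed square texture: mip 0 is 1024 x 1024 texels
BLOCK = 4         # assumed 4x4 texel memory blocks


def blocks_touched(uv_step, lod, tile=8):
    """Distinct blocks read when a tile x tile group of neighbouring pixels samples
    mip `lod`, with UV advancing by `uv_step` per pixel on both screen axes."""
    size = max(TEX_SIZE >> lod, 1)              # mip N is TEX_SIZE / 2^N wide
    blocks = set()
    for py in range(tile):
        for px in range(tile):
            u = (px * uv_step) % 1.0            # wrap-mode UVs
            v = (py * uv_step) % 1.0
            tx, ty = int(u * size), int(v * size)   # nearest texel in this mip
            blocks.add((tx // BLOCK, ty // BLOCK))
    return len(blocks)


def auto_lod(uv_step):
    """Roughly what tex2D picks: log2 of texels per pixel, rounded down."""
    rho = uv_step * TEX_SIZE                    # texels covered by one pixel
    if rho <= 1.0:
        return 0                                # magnified: mip 0
    return min(int(math.log2(rho)), TEX_SIZE.bit_length() - 1)


far = 1 / 64      # distant: each pixel steps 16 texels at mip 0
near = 1 / 4096   # foreground: 4 pixels per texel at mip 0

print(blocks_touched(far, 1), blocks_touched(far, auto_lod(far)))    # 64 4
print(blocks_touched(near, 1), blocks_touched(near, auto_lod(near)))  # 1 1
```

    Forcing mip 1 (as in the question's tex2Dlod call) costs 64 block fetches for the distant tile versus 4 when the level is picked from the footprint, while in the magnified foreground case both come out at a single block.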
     