Feedback Texture Array Index, each requiring a sample

Discussion in 'Shader Graph' started by Horus_Sungod42, May 27, 2019.

  1. Horus_Sungod42

    Horus_Sungod42

    Joined:
    Oct 30, 2014
    Posts:
    99
    Hello,

    After talking with an industry veteran, I've taken a look at array textures, as a means to manage large amounts of data.

    I imagine that one of the main advantages of arrays is the ability to static batch more aggressively, since you rely on fewer materials in a scene.

    However, I was told that another advantage is to reduce the number of texture samplings in the material, for optimization and to bypass the 16 texture limit of shaders.

    However, here it seems that accessing a different index (layer) of an array "consumes" one sample, meaning that you'd run out of samples (and pay the performance overhead of each sample) just as you would with ordinary textures.


    Is this accurate? Can several indexes be accessed using a single sample? Is this a potential feature of the graph?

    Thank you.

    beep.jpg
     
  2. RoughSpaghetti3211

    RoughSpaghetti3211

    Joined:
    Aug 11, 2015
    Posts:
    1,709
    This doesn’t seem right. I’ve used over 50 textures in a texture array in Amplify. Are you using LWRP? My last project was still on the old renderer, so maybe something changed in SRP.
     
  3. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    Samplers. Each sampler unit is a physical part of the GPU that reads and filters texture data. Each shader invocation (the shader running on each pixel) gets access to 16 of them. You can use a single sampler as many times as you want, but if you use all 16, you can sample 16 different textures (or the same texture in multiple places) for roughly the same cost as a single sample, because they run in parallel, all at the same time. Using one sampler to sample 16 different textures would take roughly 16 times longer, because the reads are performed serially, one after another.

    On old GPUs, a sampler unit and a texture were bound together before the shader started to run, so you couldn’t change in the shader which sampler was used for which texture, but today you can on all but old GLES 2.0 devices. In the past, the workaround was texture atlases, since you could sample multiple locations on the atlas using only a single sampler unit. Texture arrays are conceptually similar in that they are a single “texture asset” bound to a single sampler unit, but layered, so you don’t have to worry about border padding, mips, and wrapping like you do with atlases.

    Each Sample Texture node without a sampler state input should be using the sampler state defined by the texture asset. Each sampler state is linked to a physical sampler unit, so you can only have 16 sampler states, but essentially unlimited textures. Unless there’s a bug in Shader Graph, you should be able to put 100 Sample Texture nodes all pointing at the same texture array asset without running into sampler limits. If you do hit the limit, you could create a Sampler State node and pipe that into the sampler state input on all of those nodes to work around it, but it should be reported as a bug.

    Now understand, sampling a texture array multiple times with a single sampler has exactly the same “cost” as sampling any other texture with a single sampler. It’s not free to sample a texture array multiple times; each index sampled from is another serial texture read. As mentioned above, it may actually be faster to sample from multiple textures, or more specifically multiple sampler units, than from a single atlas with a reused sampler state.

    So, for a single object, texture arrays are not inherently more efficient than multiple individual textures; they just allow an easy way to access a large number of textures without having to deal with atlases or per-shader texture asset limits (a separate limit that depends on the API and hardware, somewhere between something like 16 and 1024). They also allow you to batch multiple meshes together by storing the index in the mesh data, similar to how with an atlas you would map a mesh’s UVs to different parts of the atlas texture.


    Also of note, you’re using a version of Shader Graph that has the bug where the default inputs for a node still show when you have another node piped into that input. That is an old version of Shader Graph.
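
    To make the atlas-versus-array comparison above concrete, here is a minimal Python sketch of the coordinate math only. It is not shader code and not anything Shader Graph generates; the function names `atlas_uv` and `array_coord` and the row-major tile layout are assumptions made purely for illustration. With an atlas, a mesh's 0..1 UVs are remapped into one tile's sub-rectangle; with a texture array, the UVs stay untouched and a layer index (for example one stored in the mesh data) is passed along as a third coordinate.

```python
def atlas_uv(u, v, tile, tiles_per_row, tiles_per_col):
    """Remap a 0..1 UV into the sub-rectangle of one atlas tile.

    Assumes a hypothetical row-major grid of equally sized tiles.
    Padding, mip bleeding and wrapping are ignored here, which is
    exactly the bookkeeping a real atlas forces you to handle.
    """
    col = tile % tiles_per_row
    row = tile // tiles_per_row
    return ((col + u) / tiles_per_row, (row + v) / tiles_per_col)


def array_coord(u, v, layer):
    """Texture-array style coordinate: UV unchanged, layer as third value."""
    return (u, v, float(layer))


if __name__ == "__main__":
    # Same texel centre, addressed through a 4x4 atlas and through an array.
    print(atlas_uv(0.5, 0.5, tile=5, tiles_per_row=4, tiles_per_col=4))
    print(array_coord(0.5, 0.5, layer=5))
```

    Either way, each lookup is still one texture read; the array only removes the UV remapping and the padding concerns.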
     
  4. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,364
    I thought so too, but the numbers don't lie. Nintendo Switch, unlit master node, 1 full-screen plane: 3 texture samples added together > 1 texture sample added 3 times (65% vs 56% GPU usage). Why?
    Oh look at me: I just switched off aniso level and gained 10%, switched quality to disabling aniso and bam another 5%
    dat Tegra :D
     
    Last edited: Dec 21, 2020
  5. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    I did say "roughly" the same cost. On desktop you'll probably have a hard time measuring a difference. On mobile GPUs you absolutely will, because of memory bandwidth. Even Nvidia's Tegra line, which is much more desktop-like than most mobile GPUs, is still significantly more bandwidth limited. Not to mention mobile GPUs often also skimp on the number of physical sampler units. As best I can tell, the Tegra X1 in the Switch has 32 samplers total ... for the whole GPU, of which only 16 are available at a time, shared between groups of vertex and fragment shader invocations.
     
  6. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,364
    Yep, I had forgotten that Aniso was expensive.
    Do you think that having the CPU send a shader an array instead of a texture for a splat map could be faster, because it doesn't take up one of those precious samplers?
     
  7. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    Maybe, but probably not. Memory bandwidth is still being consumed, and the "samples" when accessing an array will be far less coherent unless you make sure they're in Morton Z-order. Also, shaders and texture samplers can be really good at hiding latency from delayed memory access, where directly accessing an array ... won't be.
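
    For readers unfamiliar with the Morton Z-order mentioned above, here is a minimal Python sketch of the idea: the bits of x and y are interleaved so that texels close together in 2D tend to land close together in a flat array. The function names `part1by1`, `morton2d` and `z_order_coords` are illustrative, coordinates are assumed to fit in 16 bits, and real GPUs use their own vendor-specific tiling layouts that may differ from this.

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton2d(x, y):
    """Interleave x (even bits) and y (odd bits) into one Z-order index."""
    return part1by1(x) | (part1by1(y) << 1)


def z_order_coords(width, height):
    """List all (x, y) coordinates of a grid, sorted by Morton index."""
    coords = [(x, y) for y in range(height) for x in range(width)]
    return sorted(coords, key=lambda p: morton2d(p[0], p[1]))


if __name__ == "__main__":
    # Each 2x2 block of texels ends up contiguous in the flat ordering.
    for x, y in z_order_coords(4, 4):
        print(morton2d(x, y), (x, y))
```

    A flat array indexed with `morton2d(x, y)` instead of `y * width + x` keeps each 2x2 block of neighbours adjacent in memory, which is the kind of coherence the post refers to.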
     
  8. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,364
    ok thanks Ben.
     
  9. brad_unity639

    brad_unity639

    Joined:
    Nov 23, 2020
    Posts:
    5
    Thanks for the detailed answer, bgolus. It's almost as if this kind of detailed information should be in some sort of documentation, somewhere, if only Unity had some .... :p
     
  10. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    Nothing here is Unity specific. It depends on the hardware (and sometimes the API) being used. Unity is mostly just providing generic shader code to the graphics API, conforming to whichever API you've asked it to target. The information about OpenGL vs Direct3D sampler counts is in the documentation for each of those APIs, and depends on which version you're targeting, or even which GPU you're running on. In the latter case there's often documentation for those specific GPUs, or in the case of OpenGL, several sites that try to aggregate a ton of information about different GPUs, or at least what GPUs report to the system as their spec (note: sometimes that information is a lie).

    Other things, like Morton Z-order or the fact that texture sample latency can be hidden in most shaders, aren't information most people need to know, and again aren't anything Unity specific, but depend on the hardware. It's entirely possible those two comments aren't true on some GPUs.