Question: Texture sampling performance in shaders (parallel sampling?)

Discussion in 'General Graphics' started by Azeew, Jun 11, 2022.

  1. Azeew

    I've been reading about the performance implications of texture sampling, but I'm still really confused.

    First of all, the most basic of questions: how much does it really matter nowadays?
    A simple terrain system with triplanar mapping would blend 4 "materials". Each one needs 3 samples for the triplanar effect, multiplied by the number of textures per material, let's say 3 (albedo, normals, ORM for occlusion/roughness/metallic). That's 4 × 3 × 3 = 36 texture fetches in total. That sounds like a lot. Is that a lot? I mean, it's definitely a lot, but what can you even do about it? Would this be unreasonable on anything that isn't super high-end hardware?
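    To make the arithmetic above concrete, here is a minimal C++ sketch that counts per-pixel fetches. It assumes that every blended layer samples every one of its maps on every projection, with no early-outs. The function name `FetchCount` is hypothetical. A raw fetch count is only a rough proxy: real cost also depends on texture cache behavior, memory bandwidth, and how well the GPU hides sampling latency, which is what the rest of this question is about.

```cpp
#include <cassert>

// Per-pixel texture fetches for a splat-blended, planar-projected shader,
// assuming every blended layer samples all of its maps on every projection
// (a simplification: no early-outs, no shared or skipped samples).
int FetchCount(int blendedLayers, int projections, int mapsPerLayer) {
    assert(blendedLayers >= 0 && projections >= 0 && mapsPerLayer >= 0);
    return blendedLayers * projections * mapsPerLayer;
}
```

    Under these assumptions, `FetchCount(4, 3, 3)` gives the 36 fetches above. Keeping only the two strongest layers gives `FetchCount(2, 3, 3)`, which is 18, and combining that with biplanar projection gives `FetchCount(2, 2, 3)`, which is 12.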

    But then I read multiple forum threads where bgolus mentions that sampling can be done in parallel. That seems to imply that sampling once or 16 times costs roughly the same, as long as the samples use different samplers and don't saturate the memory bandwidth. I trust bgolus's posts more than I trust the Bible, so there's definitely something to this, but I can't find any more information about it online.

    Please enlighten me. I'm working on a custom terrain shader for generic meshes that requires a very large number of texture fetches, and I really don't understand what its performance implications could be in the long run. I'm stressing over this and trying to optimize everything, but I'm not even sure it will matter, so I might be wasting my time.

    I'd highly appreciate any help. Thanks a lot in advance!
     
  2. Azeew

    I'm trying to work around this, but it feels like a silly hack that shouldn't work. I'm branching the code to select the appropriate texture to sample: the branches only do the math to pick the relevant texture, and a single sample happens at the end, instead of sampling every texture inside the branches. Does this actually work, though?

    For the example of a splatmap + triplanar terrain shader, I'm selecting the two textures with the highest splat values (in my case, stored in the vertex color) and only sampling those two. This makes sense to me, but I have no idea what the performance impact of so much branching could be. And it leads to code that looks pretty dumb to me:

    Code (HLSL):
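    // R
    if (vertexColor.r > valueA)
    {
        // New strongest layer. In the G/B/A blocks, first copy the current
        // layerA/valueA into layerB/valueB so the previous best layer is
        // demoted to second place instead of being discarded.
        albedo_layerA = albedo_r;
        normal_layerA = normal_r;
        extra_layerA  = extra_r;

        valueA = vertexColor.r;
    }
    else if (vertexColor.r > valueB)
    {
        // Not the strongest, but stronger than the current second layer.
        albedo_layerB = albedo_r;
        normal_layerB = normal_r;
        extra_layerB  = extra_r;

        valueB = vertexColor.r;
    }

    // G
    ///...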
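    The if/else chain above is a top-two selection over the splat weights. As a hedged sketch, here is the same selection written as a loop in C++, modeling the shader logic on the CPU. `SelectTopTwo` and `TopTwo` are hypothetical names. The loop explicitly demotes the old best layer to slot B whenever a new weight beats it. The returned indices assume the layers can be addressed by number, for example as slices of a texture array (`Texture2DArray` in HLSL), which is one common way to avoid branching over separate texture objects. Whether that fits this terrain setup is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The two strongest splat layers. Indices point into the weight array; in a
// shader they could serve as texture-array slice indices (hypothetical setup).
struct TopTwo {
    std::size_t indexA = 0;  // strongest layer
    std::size_t indexB = 0;  // second strongest layer
    float weightA = 0.0f;    // renormalized so that weightA + weightB == 1
    float weightB = 0.0f;
};

// Loop form of the if/else chain: when a weight beats the current best,
// the previous best is demoted to B before A is overwritten.
template <std::size_t N>
TopTwo SelectTopTwo(const std::array<float, N>& weights) {
    static_assert(N >= 2, "need at least two layers");
    float valueA = -1.0f, valueB = -1.0f;  // sentinels below any valid weight
    std::size_t indexA = 0, indexB = 1;
    for (std::size_t i = 0; i < N; ++i) {
        const float w = weights[i];
        assert(w >= 0.0f && "splat weights are expected to be non-negative");
        if (w > valueA) {
            valueB = valueA;  // demote the previous best layer
            indexB = indexA;
            valueA = w;
            indexA = i;
        } else if (w > valueB) {
            valueB = w;
            indexB = i;
        }
    }
    TopTwo result;
    result.indexA = indexA;
    result.indexB = indexB;
    const float sum = valueA + valueB;
    if (sum > 0.0f) {
        // Renormalize so the two kept samples still blend to full intensity.
        result.weightA = valueA / sum;
        result.weightB = valueB / sum;
    } else {
        // All weights are zero: fall back to the first layer alone.
        result.weightA = 1.0f;
        result.weightB = 0.0f;
    }
    return result;
}
```

    For weights `{0.1, 0.6, 0.05, 0.25}`, the sketch picks layer 1 as A and layer 3 as B, with weights renormalized to about 0.71 and 0.29.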

    This feels extra awkward because I can't just iterate over an array or pack properties into a struct. By the way, I'm doing the same thing to convert triplanar into biplanar mapping.
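    For the triplanar-to-biplanar conversion, the same idea applies to projection axes instead of splat layers: keep the two projections most aligned with the surface normal and drop the third. The C++ sketch below shows a simplified version of that selection. `PickBiplanar` and its `sharpness` exponent are hypothetical, and the weighting is an illustrative choice rather than the poster's method.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Which two of the three world-axis projections to sample, and how much.
struct BiplanarPick {
    int axisMajor = 0;         // 0 = X (YZ plane), 1 = Y (XZ plane), 2 = Z (XY plane)
    int axisMedian = 1;
    float weightMajor = 1.0f;  // the two weights sum to 1
    float weightMedian = 0.0f;
};

// Simplified biplanar selection: drop the projection whose axis is least
// aligned with the normal, then blend the other two by |normal component|
// raised to a sharpness exponent (hypothetical tuning parameter).
BiplanarPick PickBiplanar(float nx, float ny, float nz, float sharpness = 4.0f) {
    const std::array<float, 3> a = {std::fabs(nx), std::fabs(ny), std::fabs(nz)};
    std::array<int, 3> order = {0, 1, 2};
    // Sort axis indices by descending alignment; stable so ties keep X, Y, Z order.
    std::stable_sort(order.begin(), order.end(),
                     [&a](int i, int j) { return a[i] > a[j]; });
    BiplanarPick p;
    p.axisMajor = order[0];
    p.axisMedian = order[1];
    const float wMajor = std::pow(a[order[0]], sharpness);
    const float wMedian = std::pow(a[order[1]], sharpness);
    const float sum = wMajor + wMedian;
    assert(sum > 0.0f && "normal must be non-zero");
    p.weightMajor = wMajor / sum;
    p.weightMedian = wMedian / sum;
    return p;
}
```

    For a normal of (0.2, 0.9, 0.3), this selects Y as the major axis and Z as the median axis. Near directions where the median and minor axes swap, this simplified version can show a visible transition. Inigo Quilez's biplanar mapping, for example, remaps the weights so the second projection fades out before that swap.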

    Does anyone know if there's a better way to handle these situations? Thanks!