Question Takes 5 times longer to sample large rendertexture vs sampling large texture3d in Compute Shader?

Discussion in 'Shaders' started by Pjbomb2, Jan 18, 2023.

  1. Pjbomb2

    Pjbomb2

    Joined:
    Jan 29, 2021
    Posts:
    39
    First, I am on Unity 2021.3.11f1 using DX12.
    This is driving me nuts, and at this point I'm just looking for an explanation.
    The situation is this: I have a large 3D RenderTexture and a large Texture3D with the same data, same format, same dimensions, and same file size as the RenderTexture.
    I have a compute shader that samples from these many times (volume rendering).
    If I use the Texture3D, the compute shader takes 50 ms to run.
    If I use the RenderTexture, the compute shader takes 170 ms to run.
    Why does this happen? It also happens with large 2D textures.
    I would just say whatever and use the Texture2D/Texture3D, but that creates large lag spikes when I need to copy from the RenderTexture to the Texture2D (I use another compute shader to write the data to the texture). That is a big problem because the 2D texture gets recreated frequently (dynamic atlas).
    What can I do, and why is this happening in the first place?
    Thanks!
    Here's how I am accessing the texture in the 2D case:
    float2 Uv = fmod(BaseUv + 100.0f, float2(1.0f, 1.0f)) * (_Materials[MaterialIndex].AlbedoTex.xy - _Materials[MaterialIndex].AlbedoTex.zw) + _Materials[MaterialIndex].AlbedoTex.zw;
    float4 BaseColor = _TextureAtlas.SampleLevel(my_point_clamp_sampler, Uv, 0);
    The materials store the UV offsets that map a UV into an atlas.
    I do this several times for several atlases in one compute shader (one for metallic, one for roughness, one for emission, one for alpha).
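    The atlas remap above can be sketched as a standalone helper. This is only an illustration: the name RemapToAtlas and the parameter atlasRect are hypothetical, and the corner convention (zw = atlas corner at tiled UV 0, xy = atlas corner at tiled UV 1) is inferred from how AlbedoTex is used in the snippet. It uses frac instead of fmod(BaseUv + 100.0f, 1.0f), so treat it as an alternative, not the original code.

```hlsl
// Hypothetical helper: wraps a UV into [0, 1) and maps it into a sub-rectangle of an atlas.
// atlasRect.zw = atlas UV at tiled UV 0, atlasRect.xy = atlas UV at tiled UV 1
// (assumed convention, inferred from the thread's AlbedoTex usage).
float2 RemapToAtlas(float2 baseUv, float4 atlasRect)
{
    // frac(x) = x - floor(x), so negative inputs also wrap into [0, 1).
    float2 tiled = frac(baseUv);

    // Linear map from the unit square to the sub-rectangle.
    return tiled * (atlasRect.xy - atlasRect.zw) + atlasRect.zw;
}
```

    With this helper the lookup would read: float2 Uv = RemapToAtlas(BaseUv, _Materials[MaterialIndex].AlbedoTex);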

    Code: https://paste.myst.rs/j7du7mqf
     


    Last edited: Jan 21, 2023
  2. burningmime

    burningmime

    Joined:
    Jan 25, 2014
    Posts:
    845
    Are the textures the same format? For example, Unity by default compresses textures on disk with alpha channels to BC3 which is 4 times smaller than RGBA32.
     
  3. Pjbomb2

    Yes, both are the same format, and the same file size
     
  4. kripto289

    kripto289

    Joined:
    Feb 21, 2013
    Posts:
    501
  5. Pjbomb2

  6. burningmime

    What is the format?

    Render Textures don't have a file size; they exist only in memory.
     
  7. Pjbomb2

    RFloat for both, and taking up 0.92 gigs of memory
     
  8. kripto289

    3D textures use more pixels for interpolation, and they probably put more data in the local cache.

    https://www.sciencedirect.com/science/article/pii/S2468502X1730027X

    For example, performance can differ by 4x if the sampled pixels are farther apart, since that means more cache misses.

    [Attached image: upload_2023-1-19_12-7-0.png]
     
  9. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    The GPU could be storing the 3D texture pixels in memory using a swizzled layout that packs neighboring pixels along all three axes together, improving cache hits across depth.
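    To illustrate the idea only: actual GPU tiling formats are vendor-specific and not something this thread documents, so the sketch below uses Morton (Z-order) indexing as a stand-in for "a swizzled layout". The function names are hypothetical. It compares where a texel lands in a plain linear layout versus a Morton layout.

```hlsl
// Hypothetical illustration, not any vendor's real layout.
// Spreads the low 10 bits of v so two zero bits sit between each original bit.
uint Part1By2(uint v)
{
    v &= 0x000003ff;
    v = (v ^ (v << 16)) & 0xff0000ff;
    v = (v ^ (v << 8))  & 0x0300f00f;
    v = (v ^ (v << 4))  & 0x030c30c3;
    v = (v ^ (v << 2))  & 0x09249249;
    return v;
}

// Morton (Z-order) index: interleaves the bits of x, y and z.
// Valid for coordinates below 1024 on each axis.
uint MortonIndex3D(uint3 p)
{
    return Part1By2(p.x) | (Part1By2(p.y) << 1) | (Part1By2(p.z) << 2);
}

// Plain row-major index for comparison.
uint LinearIndex3D(uint3 p, uint3 size)
{
    return p.x + size.x * (p.y + size.y * p.z);
}
```

    In the linear layout, stepping one texel along z moves size.x * size.y elements away. In the Morton layout, the 8 texels of an aligned 2x2x2 block occupy 8 consecutive indices, which is the kind of locality the post is describing.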
     
  10. Pjbomb2

    heck ok
    So is there any way for me to make the rendertexture more performant, or am I out of luck here?
     
  11. Pjbomb2

  12. Pjbomb2

    So then why doesn't it do that for RenderTextures as well? And why would it produce the same performance drop between large 2D RenderTextures and Texture2Ds?
     
  13. kripto289

    No one knows how the cache works except NVIDIA/AMD developers.
     
  14. Pjbomb2

    darn ok...
    thanks
     
  15. burningmime

    EDIT: Are you saying that a 2D texture loaded from disk will be as fast as a 3D texture loaded from disk, and the only slow case is the render texture?

    If so, you should probably make a new topic and be clear since most of the responses here have been about 2D vs 3D.
     
    Last edited: Jan 19, 2023
  16. Pjbomb2

    No, sorry, I am saying that a 3D RenderTexture is 5 times slower to sample from in a compute shader than a Texture3D of the same size,
    and that this behavior also happens for large 2D RenderTextures and Texture2Ds.
     
  17. Invertex

    Invertex

    Joined:
    Nov 7, 2013
    Posts:
    1,539
    If you're talking about RWTexture#D vs Texture#D I would imagine the compiler can't make as many performance optimizations due to the writeable nature of the RWTexture.
     
  18. burningmime

    My gut is telling me it might be mipmaps. If you're using Sample on them instead of Load then it will be loading 4x as much data, and if one texture has mips but the other doesn't, then it needs to fetch every sample from main memory, which could explain a 4x difference very easily.

    Try using LOAD_TEXTURE2D instead of SAMPLE_TEXTURE2D and see if there's still any difference.

    However, I would make a separate topic, and also post your actual code (or at least a snippet). Most of the answers here are about 2D vs 3D, but the difference is render texture vs non render texture.
     
  19. Pjbomb2

    OK, I will try! In the compute shader I do use SampleLevel...
     
  20. Pjbomb2

    Unfortunately it doesn't really seem to help at all to directly sample the RenderTexture vs. using SampleLevel.
    Neither has mipmaps, btw.
     
  21. burningmime

    SampleLevel is not the same as Load.

    You should post your code (at least a snippet) so we can see how you're actually accessing the texture.
     
  22. Pjbomb2

    Added a small bit of how I access it for the 2D case
     
  23. burningmime

    I'd still try using Load instead of Sample. I don't think it will make a difference, but it's worth a shot. E.g. replace...

    Code (CSharp):
    float4 BaseColor = _TextureAtlas.SampleLevel(my_point_clamp_sampler, Uv, 0);
    With...

    Code (CSharp):
    uint3 coords = uint3((uint2) (saturate(Uv) * _TextureAtlas_TexelSize.zw), 0);
    float4 BaseColor = _TextureAtlas.Load(coords);
    Load does not use the sampler at all; it directly loads the data given a pixel location: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-to-load
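    For reference, a minimal self-contained compute kernel using Load might look like the sketch below. It is an assumption-laden example, not the original poster's shader: the kernel name CSMain, the output _Result, and the per-thread UV are hypothetical. It queries the size with GetDimensions instead of relying on a _TexelSize variable, and clamps the texel so that a UV of exactly 1.0 does not index one past the edge.

```hlsl
#pragma kernel CSMain

Texture2D<float4> _TextureAtlas;   // name taken from the thread
RWTexture2D<float4> _Result;       // hypothetical output target

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    // Query the texture size directly from the resource.
    uint width, height;
    _TextureAtlas.GetDimensions(width, height);
    if (id.x >= width || id.y >= height)
        return;

    // Hypothetical UV: the center of this thread's texel.
    float2 uv = (float2(id.xy) + 0.5f) / float2(width, height);

    // Convert the normalized UV to integer texel coordinates and clamp to the last texel.
    uint2 texel = (uint2)(saturate(uv) * float2(width, height));
    texel = min(texel, uint2(width - 1, height - 1));

    // Load takes integer coordinates plus a mip level; no sampler is involved.
    _Result[id.xy] = _TextureAtlas.Load(int3(texel, 0));
}
```

    Because Load never filters, this matches a point sampler at mip 0 only; it is not a replacement for bilinear sampling.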

    If that doesn't help, make a separate topic about the RT vs non-RT cases. As you can see, most of the responses here have been about 3D vs 2D, which is not the problem you're facing.
     
  24. Pjbomb2

    Yeah, I'll create a new topic.
    However, .Load doesn't work for textures in compute shaders.
     
  25. Pjbomb2

    What should I put in the new topic to make my question clearer?
     
  26. burningmime

    Huh? Load works just fine in compute shaders. It's the recommended way to access textures everywhere except pixel shaders unless you need filtering.
     
  27. Pjbomb2

    [Attached image: upload_2023-1-21_10-44-38.png]
    And since I can't disable D3D11, it refuses to compile the D3D12 variant.
    [Attached image: upload_2023-1-21_10-45-6.png]
     


  28. Pjbomb2


    And since I can't disable D3D11, it refuses to compile the D3D12 variant.
     
  29. burningmime

    Just say "render texture 5x slower to access than normal texture"
     
  30. burningmime

  31. Pjbomb2

    Sorry, I do declare it as that.
    It's Texture2D<float4>.
     
  32. burningmime

    Go ahead and post the full shader. The .Load member function absolutely works; I have done it many times. For example (copy-pasted from the project I'm working on):

    Code (CSharp):
    Texture2D<float> _sourceDepth;
    float2 sampleMinMaxDepth(uint2 pixel)
    {
        float s = getDepthToDownsample(LOAD_TEXTURE2D(_sourceDepth, pixel));
        return float2(s, s);
    }
    So something else must be wrong.
     
  33. Pjbomb2

    Tried doing what you did:
    https://paste.myst.rs/j7du7mqf
    [Attached image: upload_2023-1-21_11-43-34.png]
     
  34. burningmime

    In BIRP,

    Code (CSharp):
    #include "UnityCG.cginc"
    In HDRP/URP/SRP:

    Code (CSharp):
    #include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl"
     
  35. burningmime

    By the way, if it is a float, as you said above, you should be using Texture2D<float> and not Texture2D<float4>.

    But seriously you gotta post all your code if you're having a problem since it's really hard to help without full information.
     
  36. kripto289

    No, it works.
    Also, in compute shaders you can use this:
    Code (CSharp):
    float4 res = MyTexture[id.xy]; // xy range [0, textureSize - 1]
    instead of
    Code (CSharp):
    float4 res = MyTexture.Load(uint3(id.xy, 0));
    It's the same thing.
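    For the 3D, single-channel (RFloat) case discussed earlier in the thread, the same two access forms could look like the sketch below. The kernel name ReadVolume and the resources _Volume and _VolumeOut are hypothetical, and the resource is declared as Texture3D<float> on the assumption that it really holds one float per texel.

```hlsl
#pragma kernel ReadVolume

Texture3D<float> _Volume;        // hypothetical name; one channel to match RFloat
RWTexture3D<float> _VolumeOut;   // hypothetical output target

[numthreads(4, 4, 4)]
void ReadVolume(uint3 id : SV_DispatchThreadID)
{
    uint w, h, d;
    _Volume.GetDimensions(w, h, d);
    if (any(id >= uint3(w, h, d)))
        return;

    // Subscript form: integer texel coordinates, mip 0.
    float viaIndex = _Volume[id];

    // Load form: for a Texture3D the fourth component is the mip level.
    float viaLoad = _Volume.Load(int4(id, 0));

    // Both reads address the same texel; average them just so both are used.
    _VolumeOut[id] = 0.5f * (viaIndex + viaLoad);
}
```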
     
  37. Pjbomb2

    Dang, unfortunately that doesn't help with performance either.
     
  38. Pjbomb2

    I think I'll start putting stuff on the new thread, since that should be more focused on the problem and I kind of polluted this post with unclear references between Texture2D and Texture3D. But I don't know what code I should post, as the sampling is pretty straightforward, and I don't see how the creation of the textures would help much (and that code can be up to a couple hundred lines).
     
    Last edited: Jan 21, 2023
  39. kripto289

    I don't think the problem is really in the cache. There are probably some additional calculations that you missed.
     
  40. Pjbomb2

    But I change nothing on the compute shader side, only the texture I feed into it: Texture2D or RenderTexture.