what's the most efficient texture format to send a splat map of floats to the gpu?

Discussion in 'Shaders' started by laurentlavigne, Dec 14, 2020.

  1. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,225
    I need to pick a few brains.
    I'm currently using an R8 texture to store an array of floats as an influence map, then sending it to the GPU with this:
    upload_2020-12-13_20-29-47.png
    VFX Graph also uses this map, so instead of SetGlobal I'm using an R16_sfloat render texture (based on tests I ran, VFX Graph doesn't access the shaders' global textures).
    Since it's on mobile (Switch) I would like to minimize CPU->GPU bandwidth and shader ops; a 2D texture map needs 2 modulos and 4 divs to unpack a particle ID into UV coordinates.
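
    As a rough illustration of the index-to-UV unpacking mentioned above, here is a minimal C# sketch of the arithmetic. It assumes a square texture stored row by row; the class and method names are hypothetical, and the real work would happen in the shader rather than in C#.

    ```csharp
    using UnityEngine;

    public static class SplatIndex
    {
        // Maps a linear particle ID to the UV at the centre of its texel,
        // assuming a square texWidth x texWidth texture stored row by row.
        public static Vector2 IndexToUV(int id, int texWidth)
        {
            int x = id % texWidth;      // column
            int y = id / texWidth;      // row (integer division)
            float inv = 1f / texWidth;  // one float divide, reused for both axes
            return new Vector2((x + 0.5f) * inv, (y + 0.5f) * inv);
        }
    }
    ```

    In a shader, fetching the texel by integer coordinates (for example with HLSL's Load) would skip the conversion to UV entirely, which may be worth testing if the divides turn out to matter.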
     
  2. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,329
    There’s a lot of versions of “efficient”.
    Passing an R8 or A8 is certainly going to be efficient from a data minimizing standpoint, as long as you don’t need any more precision than the 8 bits per value that gives you. The Blit may or may not be significant depending on how big the texture is. The Switch is quite powerful for a “mobile” platform when it comes to shader performance, so converting from an index to UV shouldn’t be a significant cost.

    It’s possible just transferring the data as an R16 to begin with and using CopyTexture instead of Blit could be faster, as it avoids using the ROPs entirely.
     
  3. laurentlavigne

    Is there a way to do all these texture operations in a job? Each time tex.Apply is called I see a mini spike in the CPU profiler.

    R16 like this?
    tex = new Texture2D(texWidth, texWidth, TextureFormat.R16, false);

    It does something weird to the render texture after CopyTexture.
    The one that works fine with R8, R16_sfloat and Blit is this:
    upload_2020-12-15_13-45-2.png
    The one with R16, R16_sfloat and CopyTexture is this:
    upload_2020-12-15_13-45-17.png

    I think I'm doing it wrong so here is the script
    Code (CSharp):
    public void Awake()
    {
        _instance = this;
        int texWidth = HALF_GRID_SIZE * 2;
        newMap = new Color[texWidth * texWidth * 4];
        lastMap = new float[texWidth * texWidth * 4];
        for(int i = 0; i < lastMap.Length; i++)
        {
            lastMap[i] = 0.5f;
        }
        tex = new Texture2D(texWidth, texWidth, TextureFormat.R16, false);
        //set the bliss map to 0.5 which is neutral
        Color c = new Color(0.5f, 0.5f, 0.5f, 0.5f);
        for(int i = 0; i < newMap.Length; i++)
        {
            newMap[i] = c;
        }
        tex.SetPixels(newMap);
        tex.Apply(false);
        Graphics.CopyTexture(tex, blissRT);// Blit(tex, blissRT);
        //Shader.SetGlobalTexture("_BlissMap", tex);
    }
     
  4. bgolus

    CopyTexture requires both the source and destination to be the same format. R16 and R16_sfloat are different formats, so the copy is failing (or you're going to get junk data).
     
  5. laurentlavigne

    That must be the latter; same base format though.
    Which single-channel float formats are identical between Texture2D and RenderTexture?
     
  6. bgolus

    Same number of bits, totally different format.

    More specifically, when you say “R16_sfloat” that means you’re defining the render texture format using the GraphicsFormat enum. This is a replacement for both the old TextureFormat and RenderTextureFormat enums used in the past. TextureFormat.R16 actually matches GraphicsFormat.R16_UNorm, a normalized integer format. You probably want TextureFormat.RHalf, which matches GraphicsFormat.R16_SFloat.
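
    To make the format matching concrete, here is a minimal sketch that creates both textures as half-float and copies between them. The class name and the 128 size are hypothetical, and blissRT is created in code here only for illustration (in the thread it is an existing asset).

    ```csharp
    using UnityEngine;

    public class MatchingFormatsExample : MonoBehaviour
    {
        const int Size = 128; // hypothetical texture size

        Texture2D tex;
        RenderTexture blissRT;

        void Awake()
        {
            // Both sides use a single-channel 16-bit float format.
            tex = new Texture2D(Size, Size, TextureFormat.RHalf, false);
            blissRT = new RenderTexture(Size, Size, 0, RenderTextureFormat.RHalf);
            blissRT.Create();

            // Both should report the same GraphicsFormat; worth logging once to confirm.
            Debug.Log($"{tex.graphicsFormat} / {blissRT.graphicsFormat}");

            tex.Apply(false);                   // upload CPU-side data
            Graphics.CopyTexture(tex, blissRT); // GPU-side copy, formats must match
        }

        void OnDestroy()
        {
            if (blissRT != null) blissRT.Release();
        }
    }
    ```

    Logging the two graphicsFormat values is a cheap way to catch a mismatch before CopyTexture produces junk data.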
     
  7. laurentlavigne

    I see, thanks for explaining all that low level stuff. One doesn't bother looking into it until building on hardware that needs more careful work. I hope that new GraphicsFormat will leave experimental soon, unified enum is nice.

    Something you said about blitting using the ROPs got me thinking: I'm putting tons of emphasis on moving stuff to the GPU, but the CPU is what's weak on the Switch. Is there a more performant way to build a texture on the CPU? I tried using GetRawTextureData the other day and couldn't make it work; I'm not even sure it's better.
     
  8. Invertex

    Invertex

    Joined:
    Nov 7, 2013
    Posts:
    1,539
    GetRawTextureData is better, since you avoid having to do the memory copy of SetPixels().

    var nativeTex = tex.GetRawTextureData<Color>(); would get you a direct view into the CPU side texture data, assuming you're using a 32bpc RGBA format. Or if 8bpc you'd just use <Color32> so your struct alignment matches. So for R16_SFloat it should in theory be GetRawTextureData<half>(), though there isn't a native half float format in C# (until the most recent .NET 5)... You would have to use a custom struct with the right alignment.

    After you are done modifying the values in the texture, you can then call tex.Apply() to upload to the GPU when ready.

    edit: Maybe Unity.Mathematics' half format will work... not sure if it works outside burst.
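
    A minimal sketch of that idea, assuming the Unity.Mathematics package is installed and its half struct is used as the element type. The class name and the 64x64 size are hypothetical; the 0.5 fill value is the "neutral" value from earlier in the thread.

    ```csharp
    using Unity.Collections;
    using Unity.Mathematics;
    using UnityEngine;

    public class RawHalfFill : MonoBehaviour
    {
        Texture2D tex;

        void Start()
        {
            // RHalf is 2 bytes per texel, the same size as Unity.Mathematics.half.
            tex = new Texture2D(64, 64, TextureFormat.RHalf, false);

            // Direct view into the texture's CPU-side memory, no intermediate Color[] copy.
            NativeArray<half> texels = tex.GetRawTextureData<half>();

            half neutral = (half)0.5f; // convert once, outside the loop
            for (int i = 0; i < texels.Length; i++)
                texels[i] = neutral;

            tex.Apply(false); // upload to the GPU without regenerating mipmaps
        }
    }
    ```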
     
    Last edited: Dec 16, 2020
  9. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    Seconding @Invertex's suggestion. I had good results with GetRawTextureData<>() and a struct that aligned to the texture pixel format on Switch (two bytes, one short, and a Color32, 8 bytes in total, mapped into an R32G32_SFloat) and could manipulate it using jobs and Burst.

    I actually kept the NativeArray around: I would treat it as a normal CPU-side array and its data was even serialized into the save files. Whenever the CPU changed something and it was time to update the GPU, I called Apply and called GetRawTextureData() again to grab the new pointer.

    This kept the memory usage at the minimum possible, which is two times the texture size (due to read-write being enabled). Having a persistent array would add a 3rd copy of the data and require extra copying to update the texture.

    The texture was 250x260, BTW, and updating it barely registered in the profiler.
     
  10. laurentlavigne

    I use that, it works
    confirmed
    left is GetRawTextureData, the cliff is when i switch to SetPixels.
    upload_2020-12-15_17-50-57.png
    that's native array set kicking in (left is setpixels)
    upload_2020-12-15_18-4-29.png
    Blit vs CopyTexture doesn't show up in the CPU profiler, and the GPU profiler is dead on the NX add-on, so I'll trust bgolus and leave it as CopyTexture.

    here is my test, feel free to sanity check it
    Code (CSharp):
    if (TEST_GetRawTextureData)
        newTextureMapHalf = tex.GetPixelData<half>(0);
    for (int u = changeUMin; u < changeUMax; u++)
    {
        for (int v = changeVMin; v < changeVMax; v++)
        {
            int coord = u + v * texWidth;
            int bliss = GetBoost(coord);
            half newf = bliss == -1 ? BAD : bliss == 0 ? NEUTRAL : GOOD;
            if (TEST_GetRawTextureData)
                newTextureMapHalf[coord] = newf;
            else
                newTextureMapColor[coord].r = newf;
            lastMap[coord] = newf;
        }
    }
    if (TEST_GetRawTextureData == false)
        tex.SetPixels(newTextureMapColor);
    tex.Apply(false);
    if (TEST_BLIT)
        Graphics.Blit(tex, blissRT);
    else
        Graphics.CopyTexture(tex, blissRT);
    //Shader.SetGlobalTexture("_BlissMap", tex);
    btw: I fixed 4 bugs in the time since I started this post; having to publicize code somehow makes me more vigilant, neat trick.

    btw2: I think using GetRawTextureData and native arrays makes it possible to make the changes in a job and push the new texture from that job too. Can anyone comment on that?
    Cool, care to share a code sample of how you do that?

    100x100 and it definitely registers in deep profiler, maybe due to half2float conversion
    upload_2020-12-15_18-2-58.png
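
    On the "btw2" question about jobs, here is a minimal sketch of one way it could look, assuming the Burst, Jobs and Mathematics packages are installed. The names FillJob and boosts and the 0 / 0.5 / 1 output values are hypothetical stand-ins for GetBoost and BAD / NEUTRAL / GOOD. The job only writes into the texture's NativeArray; Apply is still called from the main thread, since Unity's texture API is not callable from jobs.

    ```csharp
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;
    using UnityEngine;

    public class BlissMapJobExample : MonoBehaviour
    {
        [BurstCompile]
        struct FillJob : IJobParallelFor
        {
            [ReadOnly] public NativeArray<int> boosts; // hypothetical input: -1, 0 or 1 per texel
            public NativeArray<half> texels;           // view into the texture's CPU-side data

            public void Execute(int i)
            {
                // Assumed mapping: bad = 0, neutral = 0.5, good = 1.
                float v = boosts[i] < 0 ? 0f : boosts[i] == 0 ? 0.5f : 1f;
                texels[i] = (half)v; // float-to-half conversion runs inside Burst
            }
        }

        const int Width = 76; // update size mentioned in the thread

        Texture2D tex;
        NativeArray<int> boosts;

        void Start()
        {
            tex = new Texture2D(Width, Width, TextureFormat.RHalf, false);
            boosts = new NativeArray<int>(Width * Width, Allocator.Persistent);
        }

        void Update()
        {
            var job = new FillJob
            {
                boosts = boosts,
                texels = tex.GetRawTextureData<half>()
            };
            job.Schedule(boosts.Length, 64).Complete(); // wait before touching the texture

            tex.Apply(false); // main thread only
        }

        void OnDestroy()
        {
            if (boosts.IsCreated) boosts.Dispose();
        }
    }
    ```

    Completing the job immediately keeps the sketch simple; scheduling early in the frame and completing later would hide more of the cost.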
     
    Last edited: Dec 16, 2020
  11. adamgolden

    adamgolden

    Joined:
    Jun 17, 2019
    Posts:
    1,549
    Is there any overhead to GetRawTextureData? Should we assign it to something like private NativeArray<Color32> rawTextureData; and use that, or is it just as performant calling .GetRawTextureData again whenever?
     
  12. Neto_Kokku

    To be safe I call GetRawTextureData() after calling Apply(), but you can keep it around between updates. In our game updates don't happen every frame and we hold onto the array for a very long time (because some game logic actually needs to read from it) until we need to call Apply() again. I was in a hurry and didn't confirm if we could do with never calling GetRawTextureData again.

    Here's some pseudo code:

    Code (CSharp):
    using Unity.Collections;
    using UnityEngine;

    public class DataTextureExample : MonoBehaviour
    {
        // The struct: 8 bytes per texel, matching a two-channel 32-bit float texture
        private struct DataPacket
        {
            public byte x;
            public byte y;
            public short v;
            public Color32 c;
        }

        // The array field: a view into the texture's CPU-side memory
        private NativeArray<DataPacket> dataArray;

        // The texture (assumed to be created elsewhere, with read/write enabled)
        private Texture2D dataTexture;

        private void UpdateData()
        {
            // Initialization
            dataArray = dataTexture.GetRawTextureData<DataPacket>();
            // Update stuff in data array
            dataArray[0] = new DataPacket {
                x = 1, y = 2, v = 1000, c = default
            };
            // After everything is done, send to GPU and grab the texture data again
            dataTexture.Apply();
            dataArray = dataTexture.GetRawTextureData<DataPacket>();
        }
    }
    25.  
    The shader receives a Texture2D<float2> and uses asuint() and bitwise operations to unpack the data.

    The biggest advantage of GetRawTextureData is that there is no data conversion at all when sending the data to the GPU: the memory content of the NativeArray is sent verbatim to the GPU. This can be problematic with F16 textures, since native CPU-side half support is spotty. I think this is why you're having a hard time, @laurentlavigne

    In my experience the Unity.Mathematics types are quite slow when used outside of burst'ed code. If you're manipulating a large amount of indices per frame, you definitely should use bursted jobs.

    Also, is there any reason you are copying from tex to blissRT, instead of using tex directly?
     
  13. laurentlavigne

    I keep that nativearray around so maybe I don't need to GetRawTextureData, it's a pointer isn't it? It should stay around as I did a persistent allocation.

    Interesting point regarding half support being spotty; all the tests I've run show Mathematics is very slow indeed. I could run a test with F32. It's just double the bandwidth, and on Switch memory is shared anyway, so you don't pay the price of a PCIe transfer; it's just a pointer, if memory serves.

    Bursted jobs is tomorrow, maybe. I still think I'm not doing it right and it's only 76x76 update at 10Hz, maybe I could just do the good old way of a coroutine splice. All that for 1ms, I'll be happy when I return to PC target next year.

    and send it as Shader.SetGlobal?
    That's because I'm trying to use VFX Graph, which doesn't work with SetGlobal, so I use the RT for both the material shader and VFX Graph. Is that a slow path? Do alternatives exist? Oh, maybe vfx.SetTexture... but perhaps this does 2 transfers to the GPU instead of sharing a pointer to the texture in GPU memory, not sure.

    PS: I think part of the overhead is set nativearray in each loop iteration. I think poking at the nativearray has a cost, maybe because those live in c++ land so each time there is a conversion from managed to native that must be done. I wonder if it's faster to use a [] and then do a bulk conversion with ToNative(), last year this used to be a copy mem which is super fast.
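
    One way to test the bulk-copy idea from the PS is NativeArray's CopyFrom, which copies a managed array of the same length in a single call. This is only a sketch: the class, field and method names are hypothetical, and whether it beats per-element writes would need profiling on the device. It also brings back the extra copy of the data that Neto_Kokku mentioned avoiding.

    ```csharp
    using Unity.Collections;
    using Unity.Mathematics;
    using UnityEngine;

    public class BulkCopyExample : MonoBehaviour
    {
        const int Width = 76; // update size mentioned in the thread

        Texture2D tex;
        half[] managedMap; // hypothetical managed staging array

        void Start()
        {
            tex = new Texture2D(Width, Width, TextureFormat.RHalf, false);
            managedMap = new half[Width * Width];
        }

        public void UploadMap()
        {
            // ... write new values into managedMap here ...

            NativeArray<half> texels = tex.GetRawTextureData<half>();
            texels.CopyFrom(managedMap); // bulk copy; lengths must match
            tex.Apply(false);            // upload to the GPU
        }
    }
    ```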
     


    Last edited: Dec 16, 2020
  14. laurentlavigne

    Partial result: switching from half to float results in:
    an increase in bandwidth usage from 2.5 GB/s to 3 GB/s, and GPU utilization up by 5 percentage points
    a decrease in IL2CPP usage from 0.65 ms to 0.5 ms
    So I will stick with half and move that to a Burst job.

    More testing: switching from render texture + CopyTexture to SetGlobalTexture has zero impact.
     
    Last edited: Dec 16, 2020
  15. Neto_Kokku

    Sounds like burst is the way to go then, the GPU bandwidth saving is worth it.