Question: Dithering Algorithms Lag on Metal; Not Others

Discussion in 'Shaders' started by lukamoon, Sep 6, 2021.

  1. lukamoon

    lukamoon

    Joined:
    Nov 15, 2018
    Posts:
    15
    Hi all! A bit of a strange issue. I've been working on a Unity project on both a Windows machine and a macOS machine, and both are pretty capable. The Windows machine has the better GPU, but the Mac puts up a good fight and can, for example, handle pretty expensive blurs. However, I've run into the weirdest problem: dithering the blurs to save on GPU cost actually pegs my Mac's GPU at 100% (and ONLY on macOS), for seemingly no reason.

    I've tried two algorithms (see the sketch below for how I'm calling the first one):
    - return frac(dot(float3(Pos.xy, FrameIndexMod4), uint3(2, 7, 23) / 17.0f)); (from https://developer.oculus.com/blog/tech-note-shader-snippets-for-efficient-2d-dithering/)
    - the Shader Graph Dither node, both 4x4 and 8x8 (https://docs.unity3d.com/Packages/com.unity.shadergraph@6.9/manual/Dither-Node.html)
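    For reference, here's roughly how I'm wiring up the first one (a sketch of my setup; the function name, the _FrameIndexMod4 property, and the jitter usage are just how I happen to call it, not verbatim from the Oculus post):
    Code (CSharp):
    // Oculus-style screen-space dither: returns a value in [0, 1) that
    // varies per pixel and per frame (_FrameIndexMod4 cycles 0..3).
    float ScreenSpaceDither(float2 pixelPos, float frameIndexMod4)
    {
        // uint3(2, 7, 23) / 17.0f folds to a float3 constant at compile time.
        return frac(dot(float3(pixelPos, frameIndexMod4), uint3(2, 7, 23) / 17.0f));
    }

    // In the blur pass, the dither jitters each tap's offset:
    float dither = ScreenSpaceDither(input.positionCS.xy, _FrameIndexMod4);
    float2 jitteredUV = uv + sampleOffset * dither;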

    I genuinely have no clue why dithering, of all things, would perform so poorly on Metal, but I'd really like to find a fix! I've narrowed it down to these pieces of code, so I'm 100% sure something about them is the cause.

    Thanks all :)
     
  2. BattleAngelAlita

    BattleAngelAlita

    Joined:
    Nov 20, 2016
    Posts:
    400
    Maybe the Mac GPU is bad at uints? Did you try changing the uints to floats?
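    Something like this (a sketch; just the same math with the uint constants pre-converted to a float3):
    Code (CSharp):
    // Same dither, but with the uint constants replaced by a float3,
    // so the shader does no integer math or int-to-float conversion.
    return frac(dot(float3(Pos.xy, FrameIndexMod4),
                    float3(2.0 / 17.0, 7.0 / 17.0, 23.0 / 17.0)));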
     
  3. lukamoon

    lukamoon

    Joined:
    Nov 15, 2018
    Posts:
    15
    Yeah, I thought that might be the case. I tried replacing all the uints with both regular integers and floats, and even tried pre-calculating some of the math (at the cost of some accuracy) to see if that was the issue. No luck with any of it.
     
  4. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    Have you tried using a dither texture to see if that's any better? My theory is that the noise function itself is a red herring, and the real cost comes from the change in memory access patterns: the noise is presumably being used to jitter the sample pattern, and that jitter is what's hurting performance. Mobile GPUs are generally very bandwidth-limited, and Apple's GPUs are no different. There might be some fancy prefetching or prediction Apple's GPUs do that a noisy sample pattern trips up.
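    Something along these lines (a sketch, assuming a small repeating dither texture bound as _DitherTex; all the names here are placeholders):
    Code (CSharp):
    // Sample a small tiling dither texture (e.g. 16x16) instead of
    // computing the pattern in ALU. Point filtering, repeat wrap.
    Texture2D _DitherTex;
    SamplerState sampler_Point_Repeat; // Unity inline sampler state: filter/wrap come from the name
    float4 _DitherTex_TexelSize;       // Unity convention: x = 1/width, y = 1/height

    float TextureDither(float2 pixelPos)
    {
        // Wrap the screen pixel position into the texture's tile.
        float2 uv = pixelPos * _DitherTex_TexelSize.xy;
        return _DitherTex.Sample(sampler_Point_Repeat, uv).r;
    }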
     
    lukamoon likes this.
  5. lukamoon

    lukamoon

    Joined:
    Nov 15, 2018
    Posts:
    15
    It seems like using a dither map works! I tried both a 16x16 and a 32x32 map and didn't run into the same issue. Nice catch, and thanks a lot for the suggestion. Just for my own knowledge and future reference, is there any way to "bypass" this prediction model on Apple's GPUs? I'm not too concerned about mobile, but I feel like modern Apple laptops/desktops are more than capable of handling pseudo-random dithering. For reference, this is what I primarily want to get working (from Oculus's dev blog on optimized dithering in VR):
    Code (CSharp):
    frac(dot(seed, uint2(2, 7) / 17.0));
    Much thanks either way! :)
     
  6. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    Well, if the noise texture worked, then my theory about the cause was wrong... and I have no idea what the issue actually is. ;)
     
    lukamoon likes this.
  7. lukamoon

    lukamoon

    Joined:
    Nov 15, 2018
    Posts:
    15
    My bad, let me rephrase: I used a dithering texture, not a noise texture. Something like an Obra Dinn-style dither; it was patterned to look like a dither from the start. I didn't use a pseudo-random texture. Sorry for the confusion :oops:
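    Concretely, the texture is just an ordered (Bayer) threshold pattern baked into pixels. If I'm remembering the Shader Graph Dither node right, the 4x4 version is equivalent to this (a sketch):
    Code (CSharp):
    // The 4x4 ordered (Bayer) thresholds the Shader Graph Dither node uses;
    // the texture stores these same values as pixels instead of ALU math.
    static const float bayer4x4[16] =
    {
         1.0 / 17.0,  9.0 / 17.0,  3.0 / 17.0, 11.0 / 17.0,
        13.0 / 17.0,  5.0 / 17.0, 15.0 / 17.0,  7.0 / 17.0,
         4.0 / 17.0, 12.0 / 17.0,  2.0 / 17.0, 10.0 / 17.0,
        16.0 / 17.0,  8.0 / 17.0, 14.0 / 17.0,  6.0 / 17.0
    };

    float OrderedDither(float2 pixelPos)
    {
        // Tile the pattern across the screen in 4x4 blocks.
        uint2 p = uint2(pixelPos) % 4;
        return bayer4x4[p.y * 4 + p.x];
    }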
     
  8. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    Dither or noise, it doesn't really matter. The fact that using a texture removed the perf problem versus a very simple bit of math is very confusing.