About fixed precision in shaders and performance on mobile.

Discussion in 'Shaders' started by RockSPb, Mar 14, 2020.

  1. RockSPb

    Joined:
    Feb 6, 2015
    Posts:
    112
    Hello! I have a few questions about developing shaders for mobile GPUs (Adreno and Mali).

    First, the effect of vertex buffer size on performance.

    For example, I have several sets of colors.
    I can store each color directly as a float3 (filling all the UV channels with float4 values), or pack each RGB triple into a single float and save 2/3 of the size at the cost of extra math in the vertex shader for unpacking.
    Which is better?
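
To make the size trade-off concrete, here is a small sketch of the per-vertex arithmetic. The count of four colors and the helper name `bytes_per_vertex` are assumptions for illustration only; the question itself just says "several sets".

```python
FLOAT_BYTES = 4  # size of one 32-bit float in a vertex buffer


def bytes_per_vertex(num_colors, packed):
    """Bytes used per vertex for the color data alone.

    Unpacked: each color is a float3 (3 floats).
    Packed:   each color is squeezed into a single float.
    """
    floats_per_color = 1 if packed else 3
    return num_colors * floats_per_color * FLOAT_BYTES


# Assumed example: four colors per vertex.
unpacked_size = bytes_per_vertex(4, packed=False)  # 48 bytes
packed_size = bytes_per_vertex(4, packed=True)     # 16 bytes
saving = 1 - packed_size / unpacked_size           # fraction of bytes saved
```

The saving comes out to 2/3, matching the figure in the question. Whether it translates into frame time is a separate matter that only profiling can answer.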

    The next question is about fixed-precision computing.
    Is there any benefit to using fixed-precision interpolators (COLOR instead of TEXCOORD) on modern mobile GPUs?
    Is using fixed precision for textures and half for vectors and normals still relevant?
    Moreover, do mobile GPUs still do fixed-precision computation at all (in the vertex and pixel shaders)?
    Or are fixed values now converted to half by design?
     
  2. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,329
    *shrug* You’d have to test, and it’ll probably depend on the GPU you’re using. Vertex buffer size used to have a significant impact on perf, but it doesn’t seem to be a big deal anymore, plus Unity already seems to align the per-vertex data in an optimal way, which can often be a bigger factor than the overall size. On modern GPUs, mobile or otherwise, I would expect it to be faster though.

    Very, very few GPUs support fixed precision anymore. I’m actually not sure if any new GPU from the last few years does at all. (New as in new design, they still build new GPUs using 10+ year old designs.) Looking at Unity’s shader code it appears that all fixed precision values are compiled as half precision and fixed is completely ignored.
     
    RockSPb likes this.
  3. aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    2,983
    That applies to GLES3.
    Fixed is still in the output for ES2, exactly for those 10+ year old GPUs.

    Memory bandwidth is usually the most limiting factor on mobile, but, as @bgolus said, you need to profile to see if this will help or not. In general, I would say it's worth packing.

    Any modern mobile GPU will have half and float support, but not fixed.
    AFAIK there is no difference between using COLOR and TEXCOORD<X> semantics.
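
One practical consequence for the packing idea is worth checking: a float that carries three packed 8-bit values needs about 24 bits of integer precision. The sketch below uses Python's `struct` module to round numbers to 16-bit and 32-bit floats. It assumes `half` maps to a 16-bit IEEE float, which is common but not guaranteed on every GPU. The helper names `to_half` and `to_float32` are made up for this example.

```python
import struct


def to_half(x):
    """Round x to the nearest 16-bit IEEE float (assumed stand-in for `half`)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x):
    """Round x to the nearest 32-bit IEEE float (stand-in for `float`)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# A 16-bit float has an 11-bit significand: integers are exact only up to 2**11.
half_keeps_2048 = to_half(2048.0) == 2048.0
half_keeps_2049 = to_half(2049.0) == 2049.0          # False: 2049 is not representable

# A 32-bit float has a 24-bit significand: integers are exact up to 2**24.
float_keeps_max_packed = to_float32(16777215.0) == 16777215.0  # 255*65536 + 255*256 + 255
float_keeps_2p24_plus_1 = to_float32(16777217.0) == 16777217.0  # False
```

Under that assumption, a packed value should travel through the vertex shader as `float`, not `half`, until it has been unpacked.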
     
    bgolus and RockSPb like this.
  4. RockSPb

    Joined:
    Feb 6, 2015
    Posts:
    112
    Thank you for your answers! Really helpful. Now is the time for tests. I will post the results later.
     
  5. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,329
    Ah, yeah, I misread the code. It sends the HLSL shader code with the fixed / half / float variable types unmodified directly to the HLSL2GLSL translator, which converts them to float / vec# types and adds the lowp / mediump / highp qualifiers.
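
As a rough illustration of the mapping described above, here is a toy translator for type names. It is not the real HLSL2GLSL or HLSLcc logic. The function name `to_glsl_type` and the restriction to scalar and 2/3/4-component vector types are assumptions made for this sketch.

```python
import re

# HLSL precision type -> GLSL ES precision qualifier.
QUALIFIER = {"fixed": "lowp", "half": "mediump", "float": "highp"}


def to_glsl_type(hlsl_type):
    """Translate e.g. 'fixed4' -> 'lowp vec4' (toy model, scalars and vectors only)."""
    match = re.fullmatch(r"(fixed|half|float)([234]?)", hlsl_type)
    if match is None:
        raise ValueError(f"unsupported type: {hlsl_type!r}")
    base, size = match.groups()
    glsl_type = f"vec{size}" if size else "float"
    return f"{QUALIFIER[base]} {glsl_type}"


examples = [to_glsl_type(t) for t in ("fixed4", "half3", "float")]
# examples == ["lowp vec4", "mediump vec3", "highp float"]
```

How the driver treats each qualifier is up to the GPU. A `lowp` request may well be executed at a higher precision.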
     
    RockSPb likes this.
  6. aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    2,983
    Well, the same applies to HLSLcc :)
    When it produces GLSL for ES2, it keeps the fixed type.
     
    RockSPb likes this.
  7. RockSPb

    Joined:
    Feb 6, 2015
    Posts:
    112
    @bgolus @aleksandrk
    I finally had some time to take measurements. We are working from home now, so I can't test on different devices.
    So far I have tested only on an Adreno 509.

    I have 4 float3 values:
    2 indices & a blend mask for reflection probes.
    Box-projection params for them (probe pos + min/max bounds).

    Unpacking code:
    Code (CSharp):
    // Unpacks three values in the 0-255 range that were stored in one float
    // as x * 65536 + y * 256 + z.
    float3 unpackVector(float f)
    {
        float3 v;

        v.x = floor(f / 65536.0);                          // highest 8 bits
        v.y = floor((f - v.x * 65536.0) / 256.0);          // middle 8 bits
        v.z = floor(f - v.x * 65536.0 - v.y * 256.0);      // lowest 8 bits

        return v;
    }
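
The post shows only the unpacking side. A matching packer might look like the sketch below, written in Python purely to check the arithmetic. The name `pack_vector` and the assumption that each component is an integer in the 0-255 range are mine; in a Unity project the packing would presumably live in the C# mesh-building code.

```python
import math


def pack_vector(x, y, z):
    """Pack three integers in 0..255 into one number: x * 65536 + y * 256 + z.

    The largest result is 16777215, which is below 2**24, so a 32-bit float
    holds every packed value exactly.
    """
    for component in (x, y, z):
        if not (isinstance(component, int) and 0 <= component <= 255):
            raise ValueError("each component must be an integer in 0..255")
    return float(x * 65536 + y * 256 + z)


def unpack_vector(f):
    """Mirror of the shader's unpackVector, step for step."""
    x = math.floor(f / 65536.0)
    y = math.floor((f - x * 65536.0) / 256.0)
    z = math.floor(f - x * 65536.0 - y * 256.0)
    return (x, y, z)


# Round trip: unpacking a packed triple gives the original values back.
round_trip = unpack_vector(pack_vector(12, 34, 56))  # (12, 34, 56)
```

Values that are not small integers, such as positions or bounds, would first need to be quantized to this range, which costs precision.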

    Two camera views (35k tris & 120k tris in frame):
    upload_2020-3-27_18-21-27.png
    upload_2020-3-27_18-21-48.png

    It's curious that the results are almost the same. Rendering with compression is slightly slower (around 0.1 ms on both cameras).

    P.S. I'll run more tests on different GPUs when I get the chance.