Reducing redundant code vs. optimizations

Discussion in 'Shaders' started by ArachnidAnimal, Nov 25, 2017.

  1. ArachnidAnimal

    ArachnidAnimal

    Joined:
    Mar 3, 2015
    Posts:
    1,727
    I'm using code from an existing shader that has multiple passes. Each pass uses the same "GRABXYPIXEL" code. So I figure this is "redundant" code.

    Code (csharp):
    Pass
    {
        // Samples the grab texture at an offset of (kernelx, kernely) texels.
        #define GRABXYPIXEL(kernelx, kernely) tex2Dgrad(_GrabTexture, UNITY_PROJ_COORD(float4(i.uvgrab.x + _GrabTexture_TexelSize.x * kernelx, i.uvgrab.y + _GrabTexture_TexelSize.y * kernely, i.uvgrab.z, i.uvgrab.w))/i.uvgrab.w, 0, 0)

        //...
        // Loop counter is named n so it does not shadow the fragment input "i" used by the macro.
        for (int n = 0; n < 10; n++)
        {
            sum += GRABXYPIXEL(0,0);
            sum += GRABXYPIXEL(0,1);
            sum += GRABXYPIXEL(1,0);
            sum += GRABXYPIXEL(1,1);
        }
        //...
    }

    Pass
    {
        // Same macro, repeated verbatim in the second pass.
        #define GRABXYPIXEL(kernelx, kernely) tex2Dgrad(_GrabTexture, UNITY_PROJ_COORD(float4(i.uvgrab.x + _GrabTexture_TexelSize.x * kernelx, i.uvgrab.y + _GrabTexture_TexelSize.y * kernely, i.uvgrab.z, i.uvgrab.w))/i.uvgrab.w, 0, 0)

        //...
        for (int n = 0; n < 10; n++)
        {
            sum += GRABXYPIXEL(0,0);
            sum += GRABXYPIXEL(0,-1);
            sum += GRABXYPIXEL(-1,0);
            sum += GRABXYPIXEL(-1,-1);
            //...
        }
    }
    I'm trying to create a .cginc file that defines a function for GRABXYPIXEL instead of using the "#define GRABXYPIXEL" macro, so I can call the function and reduce redundant code:

    Code (csharp):
    inline half4 GRABXYPIXELfunc(float kernelx, float kernely, float4 uvgrab,
        sampler2D _GrabTexture, float2 _GrabTexture_TexelSize)
    {
        // Offset the grab UV by (kernelx, kernely) texels, then sample with zero gradients.
        half4 result = tex2Dgrad(_GrabTexture, UNITY_PROJ_COORD(float4(uvgrab.x + _GrabTexture_TexelSize.x * kernelx, uvgrab.y + _GrabTexture_TexelSize.y * kernely, uvgrab.z, uvgrab.w))/uvgrab.w, 0, 0);
        return result;
    }
    It works, but I'm wondering whether I'm shooting myself in the foot: could replacing the macro with a function call like this ultimately make the shader run slower?
    https://forum.unity.com/threads/branches-how-expensive-are-they.152411/

    [Additionally, I can't check this with the GPU profiler because Unity keeps crashing when I try to use it (this is a separate issue).]

    Can someone tell me whether the above approach is a good idea?
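    For reference, a minimal sketch of how the function above might be shared between the two passes. The file name GrabBlur.cginc, the include guard name, and the shortened parameter names grabTex and texelSize are assumptions for illustration, not part of the original shader.

    ```hlsl
    // GrabBlur.cginc (hypothetical file name)
    #ifndef GRAB_BLUR_INCLUDED
    #define GRAB_BLUR_INCLUDED

    #include "UnityCG.cginc" // provides UNITY_PROJ_COORD

    // Function form of the GRABXYPIXEL macro: samples the grab texture
    // at an offset of (kernelx, kernely) texels with zero gradients.
    inline half4 GRABXYPIXELfunc(float kernelx, float kernely, float4 uvgrab,
        sampler2D grabTex, float2 texelSize)
    {
        float4 coord = float4(uvgrab.xy + texelSize * float2(kernelx, kernely), uvgrab.zw);
        return tex2Dgrad(grabTex, UNITY_PROJ_COORD(coord).xy / uvgrab.w, 0, 0);
    }

    #endif // GRAB_BLUR_INCLUDED

    // In each Pass, between CGPROGRAM and ENDCG:
    //     #include "GrabBlur.cginc"
    //     ...
    //     sum += GRABXYPIXELfunc(0, 1, i.uvgrab, _GrabTexture, _GrabTexture_TexelSize.xy);
    ```

    The include guard keeps the function from being defined twice if the file ends up included more than once within the same program.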
     
  2. StevenGerrard

    StevenGerrard

    Joined:
    Jun 1, 2015
    Posts:
    97
    The PVR SDK has a tool that can compile a shader to GPU instructions, so you can see how many instructions the shader needs. More instructions generally cost more GPU resources. I'd suggest using it to profile your shader.
     
  3. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,245
    It might be slightly slower to compile the shader when you use a .cginc file (because it now has to load an additional file to compile the shader), but ultimately there's no difference between a #define "function", an inline function, a "regular" function, or the same code just written out in the shader. In the end during shader compilation it all gets inlined into one long shader anyway.

    All of these will compile to identical shader code.
    define:
    Code (csharp):
    #define MYFUNC(x, y) (x + y)
    ...
    float z = MYFUNC(1.0, 2.0);
    inline function:
    Code (csharp):
    inline float MyFunc(float x, float y) { return x + y; }
    ...
    float z = MyFunc(1.0, 2.0);
    standard function:
    Code (csharp):
    float MyFunc(float x, float y) { return x + y; }
    ...
    float z = MyFunc(1.0, 2.0);
    inline code:
    Code (csharp):
    float z = 1.0 + 2.0;
    About inline functions vs "regular functions", there isn't actually a difference. See this from the HLSL documentation.
    https://msdn.microsoft.com/en-us/library/windows/desktop/bb509607(v=vs.85).aspx
    The same is also true for GLSL and Cg.

    Additionally, shader compilers can be quite good at dealing with redundant code, so the three examples below may compile into identical shaders.
    Code (csharp):
    float MyFunc(float2 uv, float2 scale, float2 offset) { return tex2D(_MyTex, uv * scale + offset).a; }
    ...
    sum += MyFunc(i.uv, _UVScale, float2(0,0));
    sum += MyFunc(i.uv, _UVScale, float2(1,0));
    sum += MyFunc(i.uv, _UVScale, float2(0,1));
    sum += MyFunc(i.uv, _UVScale, float2(1,1));
    Code (csharp):
    float MyFunc(float2 uv, float2 offset) { return tex2D(_MyTex, uv + offset).a; }
    ...
    float2 uv = i.uv * _UVScale;
    sum += MyFunc(uv, float2(0,0));
    sum += MyFunc(uv, float2(1,0));
    sum += MyFunc(uv, float2(0,1));
    sum += MyFunc(uv, float2(1,1));
    Code (csharp):
    float2 uv = i.uv * _UVScale;
    float2 uvA = uv + float2(0,0);
    float2 uvB = uv + float2(1,0);
    float2 uvC = uv + float2(0,1);
    float2 uvD = uv + float2(1,1);
    sum += tex2D(_MyTex, uvA).a;
    sum += tex2D(_MyTex, uvB).a;
    sum += tex2D(_MyTex, uvC).a;
    sum += tex2D(_MyTex, uvD).a;
    I say "may" instead of "will" because different shader compilers are, well, different.
     
  4. ArachnidAnimal

    ArachnidAnimal

    Joined:
    Mar 3, 2015
    Posts:
    1,727
    It sounds like these shader compilers are very intelligent and do a lot of optimization at compile time.
    I guess I'll keep making the effort to remove redundant code when it makes sense, without worrying so much that the shader might end up slower.
     
  5. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,245
    Removing redundant code also makes good sense for code cleanliness and maintenance.
     
  6. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    5,445
    Shaders have a tendency to become unmanageable, as they often contain a lot of conditional compilation and turn into a "tower of macros" (see the lighting model in Unity's standard shader for a good example of how unwieldy this can become). I would suggest liberally using functions and structs to keep your code clean, and to avoid heavy refactoring when you want to add or remove some data that is threaded through a complex shader. The low-level language these are compiled into doesn't have functions, pointers, or structures at all, so these things tend to compile out completely. Further, I try to limit macro use to things which can ONLY be done with macros, otherwise the tower will grow large and unstable.
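
    As a rough sketch of the struct approach applied to the grab-texture sampling from earlier in the thread: the struct GrabSampleInput and the function SampleGrabOffset are hypothetical names, and the UNITY_PROJ_COORD wrapper from the original macro is left out for brevity.

    ```hlsl
    // Hypothetical struct bundling the data every grab sample needs.
    // Adding a field later only touches the struct and the helper,
    // not every call site.
    struct GrabSampleInput
    {
        float4 uvgrab;     // projected grab-texture coordinate
        float2 texelSize;  // size of one texel in UV units
    };

    // Samples the grab texture at an offset of "kernel" texels with zero gradients.
    half4 SampleGrabOffset(sampler2D grabTex, GrabSampleInput input, float2 kernel)
    {
        float2 uv = input.uvgrab.xy + input.texelSize * kernel;
        return tex2Dgrad(grabTex, uv / input.uvgrab.w, 0, 0);
    }

    // Usage inside a fragment function:
    //     GrabSampleInput input;
    //     input.uvgrab = i.uvgrab;
    //     input.texelSize = _GrabTexture_TexelSize.xy;
    //     sum += SampleGrabOffset(_GrabTexture, input, float2(1, 0));
    ```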

    I actually find optimizing shaders fairly easy and fun (though the results are often not what you'd expect), especially if you have tools like Instruments' GPU profiler, which can show you how much time is spent in each line of your code. I rarely find that the tools I use to keep code clean have anything to do with speed, and in most cases keeping my code super clean helps me reason about the shaders much more easily.
     
  7. ArachnidAnimal

    ArachnidAnimal

    Joined:
    Mar 3, 2015
    Posts:
    1,727
    Honestly, trying to follow all of Unity's .cginc files is a nightmare for me as I'm just starting to learn shaders, especially because I don't have an IDE to view Cg programs at the moment.


    This is very interesting to play around with. I tried to create some cases in a shader, then compile the code to see what the compiled code looks like.

    Here is how this compiles:
    Code (csharp):
    half4 result = half4(0,0,0,0);
    for (int i = 0; i <= 10; i++)
    {
        result.x = i;
    }
    return result;
    compiles to:
    Code (csharp):
    dcl_output o0.xyzw
     0: mov o0.xyzw, l(10.000000,0,0,0)
     1: ret
    So it's 100% optimized. The loop is completely optimized out. It simply returns 10.

    This makes me wonder whether a for loop whose test expression depends on a shader property (i <= _SomeShaderProperty) should be avoided when you already know what value _SomeShaderProperty is going to have.
    Here is what I mean: I'm creating a blur shader. There is a shader property called "NumberOfLoops" which controls how many iterations a for loop runs. After experimenting with the shader, I figured out it should be about 10. So should I hard-code the value "10", as in the code above, to gain any possible optimizations? Or leave it as a property to allow for flexibility?
    (These are mainly just questions to myself at the moment. I don't expect an answer, because these are probably not easy questions to answer without knowing much more about the shader.)
     
    Last edited: Nov 27, 2017
  8. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    5,445
    If a value is known at compile time, the compiler has a lot more it can do. In many cases, it will unroll the loop for you. Also, be aware that there are restrictions when sampling a texture inside a loop or branch that can have adverse effects on performance because texture fetches in a fragment shader are designed to share samples with their neighboring pixels to increase throughput.
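    A minimal sketch of the two cases, assuming hypothetical names (_BlurTex, _BlurTex_TexelSize, _NumberOfLoops, BLUR_TAPS) and a simple horizontal tap pattern that is not taken from the blur shader discussed above. The [unroll] and [loop] attributes are standard HLSL hints; whether a given compiler honors them, and what it emits, varies by platform.

    ```hlsl
    sampler2D _BlurTex;          // hypothetical texture
    float4 _BlurTex_TexelSize;   // Unity fills this as (1/width, 1/height, width, height)
    int _NumberOfLoops;          // hypothetical material property, only known at run time

    #define BLUR_TAPS 10         // hypothetical compile-time constant

    // Trip count known at compile time: the compiler can unroll the loop
    // into straight-line code with ten texture fetches.
    half4 BlurFixed(float2 uv)
    {
        half4 sum = 0;
        [unroll]
        for (int n = 0; n < BLUR_TAPS; n++)
        {
            sum += tex2D(_BlurTex, uv + float2(_BlurTex_TexelSize.x * n, 0));
        }
        return sum / BLUR_TAPS;
    }

    // Trip count read from a property: a real loop remains in the compiled
    // shader. Sampling with an explicit mip level (tex2Dlod) avoids relying
    // on screen-space derivatives inside the dynamic loop.
    half4 BlurDynamic(float2 uv)
    {
        half4 sum = 0;
        [loop]
        for (int n = 0; n < _NumberOfLoops; n++)
        {
            float2 tapUV = uv + float2(_BlurTex_TexelSize.x * n, 0);
            sum += tex2Dlod(_BlurTex, float4(tapUV, 0, 0));
        }
        return sum / max(_NumberOfLoops, 1); // guard against a zero loop count
    }
    ```

    Comparing the compiled output of both variants, as done earlier in the thread, is probably the most direct way to see what a particular compiler does with each.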
     
  9. ArachnidAnimal

    ArachnidAnimal

    Joined:
    Mar 3, 2015
    Posts:
    1,727