
is there any performance benefit related to declaration placement?

Discussion in 'Shaders' started by tswalk, May 8, 2014.

  1. tswalk

    Joined:
    Jul 27, 2013
    Posts:
    1,109
    Is there any performance benefit in vertex/fragment shaders if I declare a variable as a uniform, versus placing its instantiation within the fragment portion of the shader?

    My programming background says to me that type definition (either static or variable) should be declared once instead of making declarations within an iterative loop (like vert/frag)...

    for example to use:

    Code (csharp):
    uniform float4 diffuse = (0,0,0,1);
    // ...
    float4 frag(vo i) : COLOR
    {
       // ... do stuff ...
       diffuse.rgb = something.rgb;
       return diffuse;
    }

    versus:

    Code (csharp):
    float4 frag(vo i) : COLOR
    {
       // ... do stuff ...
       float4 diffuse = (0,0,0,1);
       diffuse.rgb = something.rgb;
       return diffuse;
    }
     
  2. Daniel_Brauer

    Unity Technologies

    Joined:
    Aug 11, 2006
    Posts:
    3,355
    There should be zero difference between those two arrangements, whether there are loops involved or not. That's true for CPU code as well as GPU code.
     
  3. RC-1290

    Joined:
    Jul 2, 2012
    Posts:
    639
    The best way to figure it out is to test it ;).

    By the way, if you run the first one through the HLSL compiler (DX11 mode), you might not even get the values you expect, unless you mark it as static.
     
  4. Daniel_Brauer

    Unity Technologies

    Joined:
    Aug 11, 2006
    Posts:
    3,355
    Can you explain that?
     
  5. RC-1290

    Joined:
    Jul 2, 2012
    Posts:
    639
    To be honest, it's in one of my notes from about a year ago, so I had to look it up and do a quick test.

    MSDN calls them storage class modifiers, and this is the description:
    So Unity (the 'application') cannot see or modify the variable, and you get the value that it's initialized with.
    I use it for pixel offset values that I want to use across multiple compute shaders in the same file ('id + leftOffset' is a lot easier to debug than 'id + int3(-1, 0, 0)'). But it turns out that it works with regular shaders too.

    Even with regular Cg, in the regular (non-DX11) editor mode, you need to mark it as static, otherwise it will show up black (same thing with the uniform keyword).
    If you remove the float4 from the initializer, like in the example above, it will show up as white if it's marked as static; otherwise the compiler will complain about "non constant expression in initialization".

    The following shader uses the initialized diffuse variable as the output of the fragment shader; it outputs red:
    Code (csharp):
    Shader "Custom/TestingStaticStorageClass" {
        Properties {
            _MainTex ("Main Texture", 2D) = "black" {}
        }
        SubShader {
            Pass {
                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #pragma target 3.0

                #include "UnityCG.cginc"

                sampler2D _MainTex;
                float4 _MainTex_ST;// Required for TRANSFORM_TEX

                static float4 diffuse = float4(1,0,0,1);

                struct fragmentInput{
                    float4 position : SV_POSITION;
                    float2 uv : TEXCOORD0;
                };

                void vert(appdata_base v, out fragmentInput o){
                    o.position = mul (UNITY_MATRIX_MVP, v.vertex);
                    o.uv = TRANSFORM_TEX (v.texcoord, _MainTex);// Apply Scale and Bias settings
                }
                float4 frag(fragmentInput i) : COLOR {
                    return diffuse;
                }
                ENDCG
            }
        }
    }
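
    For contrast, here is a minimal sketch of the three declaration styles discussed in this thread, side by side. The names (tintUniform, tintStatic, tintLocal and the three frag functions) are hypothetical and not from any shader posted here; the colours noted in the comments are the ones reported in this post, not separately verified results.

    ```hlsl
    // Non-static global: treated as a uniform. The shader reads whatever the
    // application (Unity) supplies, so the initializer is not what ends up
    // on screen (reported above as black).
    uniform float4 tintUniform = float4(1, 0, 0, 1);

    // static global: not visible to the application, so it keeps the value
    // it is initialized with (reported above as red).
    static float4 tintStatic = float4(1, 0, 0, 1);

    float4 fragUniform() : COLOR { return tintUniform; }
    float4 fragStatic()  : COLOR { return tintStatic; }

    float4 fragLocal() : COLOR
    {
        // Local variable: initialized inside the fragment function itself.
        float4 tintLocal = float4(1, 0, 0, 1);
        return tintLocal;
    }
    ```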
     
  6. tswalk

    Joined:
    Jul 27, 2013
    Posts:
    1,109
    I think I'll need to run some tests on massive geometry to find out whether the repeated declaration makes even a minuscule difference.. but, what the hell... my curiosity drives my learning, so I'll probably do it. So far, using optimized PRT, I'm getting incredible draw call performance in fragment on a scene with over 38,000 tris on Tegra 3. I figure I'll have to push that up a bit and do a lot of captures to get an average...

    but I do notice this difference with Unity generated byte code:

    pre-declaration: ConstBuffer "$Globals" 112 // 80 used size, 8 vars

    inner declaration: ConstBuffer "$Globals" 96 // 48 used size, 7 vars
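
    One possible reading of those two lines, as a sketch: a float4 is four 32-bit floats, i.e. 16 bytes, which matches the 112 - 96 difference in declared buffer size and the 8 versus 7 variable count. The names diffuseGlobal and diffuseLocal below are hypothetical stand-ins for the two placements, and this does not account for the difference in "used size".

    ```hlsl
    // Pre-declaration: a non-static global becomes one more entry in the
    // implicit "$Globals" constant buffer.
    uniform float4 diffuseGlobal;   // 4 floats * 4 bytes = 16 bytes, +1 variable

    float4 frag() : COLOR
    {
        // Inner declaration: a local is not part of "$Globals" at all,
        // so the buffer has one variable fewer.
        float4 diffuseLocal = float4(0, 0, 0, 1);
        return diffuseLocal;
    }
    ```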
     
    Last edited: May 8, 2014
  7. tswalk

    Joined:
    Jul 27, 2013
    Posts:
    1,109
    I guess as mentioned [and suggested, thanks RC-1290 :) ] here also at the end:

    http://http.developer.nvidia.com/CgTutorial/cg_tutorial_chapter10.html

    I can see, though, that if I had to create something for vs_1_1, I'd perhaps run into problems with the shader I currently have if I bumped the uniform/constants buffer above 96 by doing the pre-declaration for that diffuse value. But I'm targeting something pretty specific, so I'll do some tests later this evening and post some results.

    thanks for the feedback!
     
  8. tswalk

    Joined:
    Jul 27, 2013
    Posts:
    1,109
    well, I would rather get confirmation on my findings before stating anything as fact, but one should definitely check this out. I know there are so many factors, and I have an extremely small sampling I've gathered.. but there is a difference.


    I think I'll revisit this after I've completed my game...


    but I'll post these, I didn't have time to build suitable geometry to sample.. so this will just have to do for me for now:

    scene view:
    $ff_sample_NMPRT.jpg

    data sample charted:
    $ff_metrics_NMPRT.jpg


    I believe that, in overall average performance, I am seeing a difference of about 12% (compounded draw call times for each method, averaged, and deviation)... plus draw call times per frame that appear closer to predictable.

    A variance of about +/- 0.38% on low gap measures (differences on lowest draw call time in all frames) to +/- 9.97% on high gap measures (differences between highest draw call times).... if that makes sense, although I'm probably speaking my own language there.
     
  9. metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Shader programs are a different kind of beast compared to normal CPU programming. "Uniform variables" are constant for all vertices/fragments throughout the shader program execution. Now that execution may happen 60+ times per second, and each time the uniform may be set to a new value by your CPU code. But from the shader code's point of view, it's a "constant variable". If you declare what we may call a "local variable" in a code block that is run in a vertex or fragment program, its value may be "variable"/unique for each vertex/fragment, or AGAIN may be identical (uniform!) during this one program execution for all vertices/fragments -- depending entirely on your code logic! Shader compilers nowadays should be smart enough to detect whether this is so or not even before the code is ever run, hence Daniel's answer "there should be zero difference".

    Which means, in this particular case it's really pointless to measure. Uniforms have a specific purpose, providing data to the shader that will be uniform for its execution across all fragments/vertices. Local variables have the classical "variable purpose".

    Even if you have a local variable whose value is derived entirely by only constant and uniform inputs, again the shader compiler should turn this automagically into a "quasi-uniform" (perf-wise) with nothing to worry about.
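
    A minimal sketch of that last case, assuming a hypothetical uniform named _Tint (not from any shader in this thread). Whether a given compiler or driver really treats the second local as uniform-like is an assumption here; only the folding of pure literals is something compilers can be broadly expected to do.

    ```hlsl
    uniform float4 _Tint;   // hypothetical uniform, set by the application

    float4 frag() : COLOR
    {
        // Built only from literals: a compiler can fold this into one constant.
        float4 base = float4(0.25, 0.25, 0.25, 0.5) * 2.0;

        // Built only from a constant and a uniform: the result is the same for
        // every fragment in the draw call, even though it is written as a local.
        float4 tinted = base * _Tint;

        return tinted;
    }
    ```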
     
  10. RC-1290

    Joined:
    Jul 2, 2012
    Posts:
    639
    So did you use uniform or static for those tests?

    Your post is even more interesting after tswalk posted his test results, because there does appear to be a difference.
     
  11. metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Well correct me if I'm wrong but I always thought the general shader coding heuristic should be "if you can express it in a uniform, do it; if not, well then you can't". So again, what's there to measure then ;)
     
  12. tswalk

    Joined:
    Jul 27, 2013
    Posts:
    1,109
    I used a uniform declaration because the value for diffuse changes within the iteration. I did not try static because, from my understanding, it is equivalent to a constant declaration, basically making it a read-only buffer value (?).. I couldn't find anything yet to substantiate that.. I guess I could test that real quick and find out :D
     
  13. tswalk

    Joined:
    Jul 27, 2013
    Posts:
    1,109
    that's what I was initially thinking.. what difference could this make? But I am guessing the GPU works just like a CPU in that there is some small amount of work to be done each time you make a declaration. However small that may be, if it is within an iteration like fragment, then we know this happens quite often each frame.
     
  14. tswalk

    Joined:
    Jul 27, 2013
    Posts:
    1,109
    I ran a few small samples this morning, just a short frame series capture... it takes a while to do. I wouldn't say this is conclusive, but the results are interesting.

    The only change I've made in the shader is with regards to how the final diffuse (float4) variable is declared, either as a uniform, static, or within the fragment (inner declaration).

    results:
    $ff_metrics_NMPRT_capture2.jpg


    again, it seems that if the declaration is done outside the fragment loop.. the overall draw call timing appears more stable and in general performs better. I was surprised to be able to make it static, but it worked. It did seem to change the ConstBuffer "$Globals" values in byte code, though I'm not sure what to make of that.

    I may come back to this when I'm closer to release and have more time to dedicate to squeezing out optimizations.
     
  15. metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Again.. I'm not sure how your shader use-case is different from 99.9% of cases but generally:

    1. if you CAN declare something as a const (I guess that's what DX calls "static") that's best

    2. failing that, if you CAN declare it as uniform, that's next best

    3. failing that, if your shading logic dictates no other choice but to use a local var, that's what you gotta do.

    Am I wrong in the above? :D
     
  16. RC-1290

    Joined:
    Jul 2, 2012
    Posts:
    639
    const is separate from static. Apparently it's a type modifier, where static is a storage class.
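
    A short sketch of how the two can be combined, assuming HLSL semantics as described earlier in the thread; the names scratch, kRed and gain are hypothetical.

    ```hlsl
    // static alone (storage class): hidden from the application, initialized
    // in the shader, and still writable from shader code.
    static float4 scratch = float4(0, 0, 0, 1);

    // static const: hidden from the application AND read-only.
    static const float4 kRed = float4(1, 0, 0, 1);

    float4 frag() : COLOR
    {
        // const alone (type modifier) on a local: read-only after initialization.
        const float gain = 0.5;

        scratch.rgb = kRed.rgb * gain;   // allowed: scratch is static but not const
        // kRed.r = 0;                   // would not compile: kRed is const
        return scratch;
    }
    ```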
     
  17. hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Does it use branching? AFAIK DX11 shaders will have step optimisations where possible so this might be something to look at.
     
  18. tswalk

    Joined:
    Jul 27, 2013
    Posts:
    1,109
    no branching, no conditionals.. just straight calculations. The only differences between those three samples are how I declared the diffuse variable. 8 instructions in the fragment, with 40 in vertex... primarily vertex lit, some alterations for fragment obviously :)