Select a color component to use?

Discussion in 'Shaders' started by imaginaryhuman, Nov 9, 2015.

  1. imaginaryhuman

    imaginaryhuman

    Joined:
    Mar 21, 2010
    Posts:
    5,834
    Input to my shader is an RGBA texture. In the fragment shader I wish to output ONE of the 4 color components based on a modulo of the UV2.x coordinate (which has been input as pixel/window coordinates), i.e. a pixel in column 0 outputs Alpha, a pixel in column 1 outputs Red, a pixel in column 2 outputs Green, and a pixel in column 3 outputs Blue.

    I can do this with a simple modulo like fmod(), but then it seems I have to use a set of nested IF statements to choose: if the modulo is 0 do this, else if the modulo is 1 do that, and so on. That seems inefficient.

    Is there a faster way to do this? Some kind of swizzle indexing of the component based on a number? Converting the rgba components to an array then using an integer index?
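
    For reference, the selection rule described above can be written down as a small CPU-side sketch. This is Python rather than shader code, purely to pin down the expected mapping; the name select_component and the (r, g, b, a) tuple layout are assumptions made for illustration, not part of the original shader.

```python
def select_component(rgba, column):
    """Return the component the question asks for at a given pixel column.

    rgba is assumed to be an (r, g, b, a) tuple; column is an integer pixel x.
    """
    r, g, b, a = rgba
    m = column % 4
    if m == 0:
        return a      # column 0 -> alpha
    elif m == 1:
        return r      # column 1 -> red
    elif m == 2:
        return g      # column 2 -> green
    else:
        return b      # column 3 -> blue


if __name__ == "__main__":
    texel = (0.1, 0.2, 0.3, 0.4)
    print([select_component(texel, x) for x in range(8)])
```

    Any branch-free variant should reproduce this mapping for every column.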
     
  2. Fuegan

    Fuegan

    Joined:
    Dec 5, 2012
    Posts:
    20
    Maybe something like this would work :
    - Let's say X is the result of your modulo (as an integer)
    - _source is your source color
    - col is the output

    Code (CG):
    col.a = (1 - X) * _source.a;
    col.r = (1 - abs(X - 1)) * _source.r;
    col.g = (1 - abs(X - 2)) * _source.g;
    col.b = (1 - abs(X - 3)) * _source.b;
    Assuming negative values in col are OK (I'm not sure about that, but in some tests I did a while ago they caused no problem and behaved like 0), you should get the result you want.
    I think it's better than if statements, but there may be better solutions.
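
    A quick way to see what those weights do is to evaluate them on the CPU. The sketch below is Python, not shader code, and weights and clamped_weights are hypothetical helper names. It suggests the unclamped weights are not only 0 or 1: for X = 0 the green and blue weights come out as -1 and -2, so whether the result is acceptable depends on how negative outputs are handled. Clamping each weight to [0, 1], as saturate() does in Cg/HLSL, leaves exactly one weight at 1.

```python
def weights(x):
    """Per-channel weights from the snippet above, in (a, r, g, b) order."""
    return (1 - x, 1 - abs(x - 1), 1 - abs(x - 2), 1 - abs(x - 3))


def clamped_weights(x):
    """Same weights clamped to [0, 1], like saturate() in Cg/HLSL."""
    return tuple(min(max(w, 0), 1) for w in weights(x))


if __name__ == "__main__":
    for x in range(4):
        print(x, weights(x), clamped_weights(x))
```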
     
  3. imaginaryhuman

    imaginaryhuman

    Joined:
    Mar 21, 2010
    Posts:
    5,834
    That seems like it involves even more operations than before though.
     
  4. imaginaryhuman

    imaginaryhuman

    Joined:
    Mar 21, 2010
    Posts:
    5,834
    I suppose one thing I could do is split the geometry into thin 1-pixel columns and assign 4 separate shaders, one for each modulo, then each shader could just be written to directly output a given color component without any other branching or modulo testing needed. I think that should probably be faster unless this produces significant overhead from all the extra triangles and separate shaders?
     
  5. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,248
    Splitting the geometry up into single pixel slices with 4 shaders, or using if statements on the mod of the pixel coordinate are likely both slower than just doing all of the math all of the time and lerping.

    4 shaders means context switching and a lot more polygons.
    If statements mean conditional branching, which is pretty fast on modern hardware, but a branch whose outcome changes on every pixel is the worst case for it.

    So, unless you're doing a lot of math that is uniquely different per pixel, just do all the math all the time.
     
  6. imaginaryhuman

    imaginaryhuman

    Joined:
    Mar 21, 2010
    Posts:
    5,834
    Ok I'll try it and see how the speed is. Thanks
     
  7. imaginaryhuman

    imaginaryhuman

    Joined:
    Mar 21, 2010
    Posts:
    5,834
    Is it possible to convert an RGBA float4 into like an array of 4 floats, so that you can index it with an integer?
     
  8. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,248
    No need to convert, vectors are already arrays.
    Code (CG):
    float4 foo = float4(1.1, 2.2, 3.3, 4.4);
    float a = foo.x;  // a == 1.1 using .xyzw
    float b = foo.g;  // b == 2.2 using .rgba
    float c = foo.p;  // c == 3.3 using .stpq
    float d = foo[3]; // d == 4.4 using [n]
    xyzw are the default vector component accessors, intended for positions and generic variables.
    rgba are the most commonly used alternatives, intended for use with texture or other color values.
    stpq are the almost completely unknown alternatives to the above, intended for use with uv coordinates. These exist in GLSL and Cg, but not HLSL. Since most people use HLSL documentation as a stand-in for Cg, they tend to miss that these exist unless they come from a GLSL background.

    In most cases the above options are the best because they allow easy swizzling (.xxx, .zyx, etc.). Internally they all remap back to xyzw for the compiled shader, but they're nice shorthand for showing the different types of information sources even if they're really all the same data. Note you cannot mix the different accessor sets in the same swizzle; .xgp will fail! You can use the different accessors for assignments or comparisons; foo.xyz = bar.rgb or foo.xyz == foo.rgb.

    [n] allows direct access to each value, and is most commonly seen with explicit arrays (float foo[4] = {1.1, 2.2, 3.3, 4.4};), but it can also be used with vectors and matrices. You cannot swizzle or access multiple components in-line with it, though. foo[0][1] will result in an error for float foo[4] or float4 foo; that form is for accessing multi-dimensional data types: matrices like float4x4 foo, vector arrays like float4 foo[4], or nested arrays like float foo[4][4].
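
    Applied to the original question, integer indexing could look like the sketch below. It is Python standing in for shader code, and select_by_index is a hypothetical name. It assumes the components are stored in (r, g, b, a) order, so index 3 is alpha; because the question wants column 0 to give alpha and columns 1 to 3 to give red, green, and blue, the index is rotated by one before the lookup.

```python
def select_by_index(rgba, column):
    """Pick one component by integer index, as foo[n] does for a float4.

    rgba is assumed to be stored in (r, g, b, a) order, so index 3 is alpha.
    The question wants column 0 -> alpha, 1 -> red, 2 -> green, 3 -> blue,
    which is the rgba order rotated by one, hence the +3 before the modulo.
    """
    return rgba[(column + 3) % 4]


if __name__ == "__main__":
    texel = (0.1, 0.2, 0.3, 0.4)
    print([select_by_index(texel, x) for x in range(4)])
```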
     
    Last edited: Nov 12, 2015
  9. imaginaryhuman

    imaginaryhuman

    Joined:
    Mar 21, 2010
    Posts:
    5,834
    Thanks for the help. It turns out that e.g. foo[3] runs quite a bit slower than the way I was doing it with IF statements.
     
  10. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,248
    Something like the below code might end up faster than the if statements, especially on some older hardware.
    Code (CG):
    fixed4 data = tex2D(_DataTex, UV.xy);    // fetch the RGBA texel
    half channel = floor(fmod(UV2.x, 4.0));  // 0, 1, 2, or 3
    fixed output = dot(data, saturate(-abs(channel - half4(0.0, 1.0, 2.0, 3.0)) + 1.0));  // mask keeps one component
    A little explanation:
    The first line is self explanatory, get the texture data.
    The second line gets us a value of 0, 1, 2, or 3 by flooring the fmod.
    The third line is the magic.

    A dot product is an easy way to add a bunch of values together, as it's highly optimized on GPUs.
    foo.x + foo.y + foo.z + foo.w
    is slower than
    dot(foo, float4(1.0,1.0,1.0,1.0))
    A dot product on modern hardware can be done in a single cycle, whereas the adds take a single cycle each. Even on older hardware the dot product is probably going to be two cycles and not three.

    So now the saturate(-abs(channel - half4(0.0, 1.0, 2.0, 3.0)) + 1.0) part. This can probably best be explained by a wolfram alpha link.
    http://www.wolframalpha.com/input/?i=-abs(x+-+(0.0,+1.0,+2.0,+3.0))+++1.0+with+x+=+0+to+4
    Basically it's taking the values and mapping them into the 0 to 1 range (plus some negatives, which the saturate removes). Because channel is floored, the values you get back are actually only zero or one, but adding floor to the wolfram alpha link makes it more difficult to understand. So now a channel value of 0 will result in a half4(1.0, 0.0, 0.0, 0.0) and a channel value of 1 will result in a half4(0.0, 1.0, 0.0, 0.0) etc.

    The saturate and abs are both "free", so that entire line is just 3 cycles even with the dot product. The second line of just floor(fmod(UV2.x, 4.0)) might be slower!
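
    The mask arithmetic above can be checked on the CPU with a minimal sketch. This is Python emulating the shader intrinsics, not shader code; saturate, dot, channel_mask, and select are stand-in names, and x is assumed to be a non-negative pixel coordinate. Note that with this mask channel 0 picks the first component (x, i.e. red), so getting alpha in column 0 as in the original question would presumably need the components reordered first, for example with a swizzle such as data.argb.

```python
import math


def saturate(v):
    """Clamp a scalar to [0, 1], mirroring saturate() in Cg/HLSL."""
    return min(max(v, 0.0), 1.0)


def dot(a, b):
    """Sum of component-wise products of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


def channel_mask(x):
    """One-hot mask for floor(fmod(x, 4)), assuming x >= 0."""
    channel = math.floor(math.fmod(x, 4.0))
    return tuple(saturate(1.0 - abs(channel - i)) for i in range(4))


def select(data, x):
    """data is assumed to be a 4-tuple in the texture's xyzw/rgba order."""
    return dot(data, channel_mask(x))


if __name__ == "__main__":
    texel = (0.1, 0.2, 0.3, 0.4)
    for x in (0.5, 1.5, 2.5, 3.5):
        print(x, channel_mask(x), select(texel, x))
```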
     
    Last edited: Nov 12, 2015
  11. imaginaryhuman

    imaginaryhuman

    Joined:
    Mar 21, 2010
    Posts:
    5,834
    Thanks for taking the time to write and explain this. I did a test of this code on my main computer vs my original IF-based code, and somehow the two still run at the same speed. I presume the operations are just getting optimized so much that it's more or less dependent on texture fetches. Your code does work, btw, which is cool. I might have to try it on a slower computer to see if there's any difference there. No matter what else I've tried, the IF statements somehow end up being the fastest solution already.