Help with looping bottleneck in shader

Discussion in 'Shaders' started by AzeExMachina, Feb 25, 2019.

  1. AzeExMachina

    Hi everyone,

    I am creating a heatmap for my application and visualizing it over a world map, such as this example here

    I render it with a shader applied to a quad as big as my camera frustum, and that part works.

    The issue is that, for this shader to work, every single pixel of the quad has to loop over all the points I want to visualize. That is a big bottleneck: it can be something like 100x100x300 loop iterations, which obviously weighs too heavily on the app's performance. Does anyone know how to avoid this?

    One thing I tried was dividing my quad into many small quads, each with its own material, positioned in a grid that covers the same area as the original quad. This improves performance, but it gets confusing when a point is shared between quads.

    My shader code is below. To pass the points I'm simply using Material.SetVectorArray() and similar calls.
    Code (CSharp):
    struct vertInput
    {
        float4 pos : POSITION;
    };

    struct vertOutput
    {
        float4 pos : SV_POSITION;
        float3 worldPos : TEXCOORD1; // float3: fixed lacks the range and precision for world-space positions
    };

    // _Points, _Properties, _Points_Length and _HeatTex are declared elsewhere in the shader.
    vertOutput vert(vertInput input)
    {
        vertOutput o;
        o.pos = UnityObjectToClipPos(input.pos);
        o.worldPos = mul(unity_ObjectToWorld, input.pos).xyz;
        return o;
    }

    half4 frag(vertOutput output) : SV_Target
    {
        half h = 0;
        for (int i = 0; i < _Points_Length; i++)
        {
            float dist = distance(output.worldPos, _Points[i].xyz); // _Points[i].xyz is the point passed to the shader
            half radi = _Properties[i].x; // radius of the area around the point
            half hi = 1 - saturate(dist / radi); // linear falloff from the point's center
            h += hi * _Properties[i].y; // _Properties[i].y is just an intensity modifier
        }
        h = saturate(h);
        half4 color = tex2D(_HeatTex, half2(h, 0.5)); // look up the heat color ramp
        return color;
    }
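As a rough illustration of the cost described above, the fragment loop can be mirrored on the CPU. This is only a sketch in Python, not part of the original shader: `heat_at` and `loop_iterations` are hypothetical helper names, and the point and property layouts are assumed to match `_Points[i].xyz` and `_Properties[i].xy`.

```python
import math

def heat_at(world_pos, points, properties):
    """CPU mirror of the fragment loop: sum linear falloffs, then clamp to [0, 1]."""
    h = 0.0
    for point, (radius, intensity) in zip(points, properties):
        dist = math.dist(world_pos, point)
        falloff = 1.0 - min(max(dist / radius, 0.0), 1.0)  # 1 - saturate(dist / radi)
        h += falloff * intensity
    return min(max(h, 0.0), 1.0)  # saturate(h)

def loop_iterations(width, height, point_count):
    """Distance evaluations per frame when every pixel visits every point."""
    return width * height * point_count
```

With the figures from the post, `loop_iterations(100, 100, 300)` gives 3,000,000 distance evaluations per frame. A reference like `heat_at` can also be handy for checking that a bucketed version still produces the same values.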
     
  2. bgolus

    Yeah, this is a good approach for what you're attempting. You're roughly trying to do what so many tiled / clustered lighting systems do, which is to break down the number of objects per tile. Note that unless you're careful each tile will still be iterating over "300" points even if you break it down into individual quads! When using WebGL or GLES you're kind of limited as to what you can do, since many of the dynamic branching / iterating options that desktop GPUs have access to aren't available. Specifically, WebGL can't do dynamic iterators, so the loop is unrolled to a fixed length and does all of the work regardless of what _Points_Length is set to. The real shader then looks a bit like this:

    Code (csharp):
    // calcH() stands in for the body of the original loop
    float h = 0;
    float temp0 = calcH(_Points[0]);
    if (0 < _Points_Length)
        h += temp0;
    float temp1 = calcH(_Points[1]);
    if (1 < _Points_Length)
        h += temp1;
    float temp2 = calcH(_Points[2]);
    if (2 < _Points_Length)
        h += temp2;
    // ...
    float temp299 = calcH(_Points[299]);
    if (299 < _Points_Length)
        h += temp299;
    You might be better off creating a version with a hard-coded limit of 10 or 20 items, and then binning your points by position and rendering scaled quads that cover the area those points will affect, rendering only those 10 or 20 at a time. Render that into an R8 format render texture and read from that when drawing the heat map.
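The binning step suggested above could be sketched on the CPU roughly as follows. This is an illustrative Python sketch rather than Unity code: `bin_points`, `cell_size` and `max_per_batch` are hypothetical names, points are assumed to be 2D `(x, y, radius, intensity)` tuples on the map plane, and the grid layout is just one possible way to bin by position.

```python
import math
from collections import defaultdict

def bin_points(points, cell_size, max_per_batch=20):
    """Group (x, y, radius, intensity) tuples by grid cell, then split into capped batches.

    Returns a list of (rect, batch) pairs, where rect = (min_x, min_y, max_x, max_y)
    is the area the batch can affect, i.e. where its scaled quad would be drawn.
    """
    cells = defaultdict(list)
    for p in points:
        key = (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))
        cells[key].append(p)

    batches = []
    for cell_points in cells.values():
        # Never hand the shader more than max_per_batch points at once.
        for start in range(0, len(cell_points), max_per_batch):
            batch = cell_points[start:start + max_per_batch]
            rect = (
                min(x - r for x, _, r, _ in batch),
                min(y - r for _, y, r, _ in batch),
                max(x + r for x, _, r, _ in batch),
                max(y + r for _, y, r, _ in batch),
            )
            batches.append((rect, batch))
    return batches
```

Because each rectangle is grown by the point radii, a point near a cell border still gets its full footprint drawn, and rectangles from neighbouring cells may overlap. This sketch assumes the batches are accumulated additively in the render texture so that overlapping contributions add up the way the single loop did.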

    You could even do it with a max count of, say, 50 and then draw to the RT over a few frames when the data changes rather than doing it all at once. Then you don't have to calculate it every update, as the heatmap will be cached.
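The "draw over a few frames, then reuse the cached result" idea could be organised along these lines. This is a minimal Python sketch of the bookkeeping only: `HeatmapCache`, `batches_per_frame`, `clear` and `draw_batch` are hypothetical names, and the two callbacks stand in for whatever actually clears and draws into the render texture.

```python
from collections import deque

class HeatmapCache:
    """Tracks which batches still need drawing into the cached render target."""

    def __init__(self, batches_per_frame=2):
        self.batches_per_frame = batches_per_frame
        self.pending = deque()
        self.needs_clear = False

    def set_data(self, batches):
        """Call when the points change: restart the redraw from a cleared target."""
        self.pending = deque(batches)
        self.needs_clear = True

    def update(self, clear, draw_batch):
        """Call once per frame; does no drawing once the cache is up to date."""
        if self.needs_clear:
            clear()
            self.needs_clear = False
        for _ in range(min(self.batches_per_frame, len(self.pending))):
            draw_batch(self.pending.popleft())
        return not self.pending  # True when the heatmap is fully rebuilt
```

Usage would be to call `set_data` whenever the points change and `update` every frame. One trade-off of this sketch is that the texture is visibly incomplete while it is being rebuilt; drawing into a second texture and swapping once `update` returns True is one possible way around that.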
     
  3. AzeExMachina

    Hey there, thank you for your answer, glad I was already on the right track. In the end I did just that: I divided all the points into buckets and also wrote the output to a RenderTexture so that it only updates when needed. Thanks for your help!