Compute shaders white screen crash

Discussion in 'Shaders' started by psomgeorg, Oct 31, 2019.

  1. psomgeorg
    Joined: Mar 16, 2019 | Posts: 99
    I am trying to fill a volume texture with noise values using a compute shader. The thread group size is 8,8,8 and the volume texture resolution is 128,64,64, so the idea is to dispatch 16,8,8 groups, making each thread responsible for one voxel of the texture. However, it seems I can't do that with a single Dispatch call: I get a black screen and then Unity goes fully white. I am forced to use a for loop in a C# script that calls Dispatch 4 times (so 16,8,2 per iteration), changing only the offset added to id.z to reach the next chunk of the texture. I also tried a for loop inside the compute shader instead, so that a single 16,8,2 dispatch runs 3 extra iterations per thread, but I get the same issue. It seems that if I put too much work into each thread, it crashes, and I don't understand why. By the way, the GPU is a 2060, in case I am using more threads than the device allows.
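The group counts above follow from dividing each texture dimension by the thread group size and rounding up. A minimal sketch of that arithmetic, written in Python rather than C# so it runs standalone; `dispatch_groups` is a hypothetical helper name, not a Unity API:

```python
import math

def dispatch_groups(resolution, threads_per_group):
    """Thread groups needed per axis so that every voxel gets one thread."""
    return tuple(math.ceil(r / t) for r, t in zip(resolution, threads_per_group))

resolution = (128, 64, 64)       # volume texture size from the post
threads_per_group = (8, 8, 8)    # numthreads from the post

groups = dispatch_groups(resolution, threads_per_group)
total_threads = math.prod(g * t for g, t in zip(groups, threads_per_group))

print(groups)         # (16, 8, 8)
print(total_threads)  # 524288
```

Rounding up matters only when a dimension is not a multiple of the group size; in that case the shader would also need a bounds check on `id`.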
     
  2. customphase
    Joined: Aug 19, 2012 | Posts: 176
    That's definitely not the source of the issue. I would need to see the code to say anything more.
     
  3. psomgeorg
    Joined: Mar 16, 2019 | Posts: 99
    Code (HLSL):
    // Distance from samplePos to the nearest point in the buffer, clamped to 2.
    float worley1(StructuredBuffer<float3> points, float3 samplePos)
    {
        // Loop-invariant, so computed once instead of on every iteration.
        float3 posWorld = float3(samplePos.x * (-90.0), samplePos.y * 40.0, samplePos.z * (-60.0));
        float minDist = 1000000.0;
        for (int i = 0; i < 216000; i++)
        {
            // Skip points whose x is above 1.0.
            if (points[i].x > 1.0)
                continue;
            float3 diff = posWorld - points[i];
            float dist = sqrt(dot(diff, diff));
            minDist = min(dist, minDist);
        }
        //return minDist;
        return min(abs(minDist), 2);
    }

    [numthreads(numThreads, numThreads, numThreads)]
    void CSWorley(uint3 id : SV_DispatchThreadID)
    {
        //float3 index = (float3)(id.x, id.y, id.z * (i+1));
        // Normalized voxel position; offset shifts this dispatch to its z-chunk.
        float3 pos = float3(id.x / (float)resolution.x, id.y / (float)resolution.y, (id.z + offset) / (float)resolution.z);

        // Brute force: every thread scans all 16 point buffers in full.
        float noiseSum = 0;
        noiseSum += worley1(points1, pos);
        noiseSum += worley1(points2, pos);
        noiseSum += worley1(points3, pos);
        noiseSum += worley1(points4, pos);
        noiseSum += worley1(points5, pos);
        noiseSum += worley1(points6, pos);
        noiseSum += worley1(points7, pos);
        noiseSum += worley1(points8, pos);
        noiseSum += worley1(points9, pos);
        noiseSum += worley1(points10, pos);
        noiseSum += worley1(points11, pos);
        noiseSum += worley1(points12, pos);
        noiseSum += worley1(points13, pos);
        noiseSum += worley1(points14, pos);
        noiseSum += worley1(points15, pos);
        noiseSum += worley1(points16, pos);

        // Each call returns at most 2, so 16 calls sum to at most 32.
        float maxVal = 32.0;
        noiseSum /= maxVal;

        if (invertNoise) {
            noiseSum = 1 - noiseSum;
        }

        // keep track of min max (using int to support atomic operation)
        int val = (int)(noiseSum * minMaxAccuracy);
        InterlockedMin(minMax[0], val);
        InterlockedMax(minMax[1], val);

        // Store result in specified channel of texture
        Result[uint3(id.x, id.y, id.z + offset)] = Result[uint3(id.x, id.y, id.z + offset)] * (1 - channelMask) + noiseSum * channelMask;
    }
    The points# buffers store the positions of the points in 3D space that the distance is calculated against (Worley noise).
    You can also ignore the min and max: I commented them out and it didn't fix the crash. They are used by another kernel to normalize the values, but I don't dispatch that kernel, so don't bother with it.
    Code (CSharp):
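    for (int j = 0; j < 8; j++)
    {
        // Each iteration covers 8 z-slices (one thread group in z), selected via offset.
        noiseCompute.SetInt("offset", 8 * j);
        noiseCompute.Dispatch(0, 16, 8, 1); // I wanted a single Dispatch(0, 16, 8, 8), but it crashes
        // GetData blocks until the GPU has finished, so each dispatch completes before the next is issued.
        minMaxBuffer.GetData(minMax);
    }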
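As a sanity check on the chunking, the loop above can be emulated on the CPU to confirm that the offsets write every z-slice exactly once. This is a Python sketch, not Unity code; `covered_z` is a hypothetical helper that mimics `id.z + offset` from the shader:

```python
THREADS_Z = 8       # numthreads z-size from the first post
RESOLUTION_Z = 64   # z-resolution of the volume texture

def covered_z(num_dispatches, groups_z, offset_step):
    """z indices written by the C# loop, emulating id.z + offset in the shader."""
    written = []
    for j in range(num_dispatches):
        offset = offset_step * j
        for id_z in range(groups_z * THREADS_Z):
            written.append(id_z + offset)
    return written

z = covered_z(num_dispatches=8, groups_z=1, offset_step=8)
print(sorted(z) == list(range(RESOLUTION_Z)))  # True
```

The variant from the first post, 4 dispatches of 16,8,2, covers 16 slices per dispatch, so it would need an offset step of 16: `covered_z(4, 2, 16)`.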
     
  4. customphase
    Joined: Aug 19, 2012 | Posts: 176
    Um. Holy hell, no wonder it crashes the driver, that's way too much work. You need some kind of acceleration structure; you can't just brute-force check every individual pair of points. Running 216000*16 loop iterations on each thread, you're losing all the benefits of running in parallel. Check Sebastian Lague's tutorial on rendering clouds; he covers generating Worley noise in parallel there.
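To put a number on "way too much work", here is a rough count of distance evaluations. It uses only the figures from the posted shader (216000 loop iterations, 16 point buffers) and the 128x64x64 texture. It is plain arithmetic in Python, not a measurement, and it ignores the early `continue` for skipped points:

```python
voxels = 128 * 64 * 64                 # one thread per voxel
iterations_per_thread = 216000 * 16    # loop length times number of point buffers
total_iterations = voxels * iterations_per_thread
per_dispatch = total_iterations // 8   # one of the 8 Dispatch(0, 16, 8, 1) calls

print(iterations_per_thread)  # 3456000
print(total_iterations)       # 1811939328000
print(per_dispatch)           # 226492416000
```

That is roughly 1.8 trillion loop iterations for the whole texture, which is why a single dispatch runs for so long.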

     
  5. Olmi
    Joined: Nov 29, 2012 | Posts: 814
    Instead of trying to write your own Worley noise (unless you're specifically learning how to write noise on the GPU), I would recommend using an existing implementation. There are quite a few out there in GLSL and some in HLSL too, and it's pretty easy to port the GLSL ones to HLSL anyway. Those will most likely perform quite well.
     
  6. psomgeorg
    Joined: Mar 16, 2019 | Posts: 99
    I have already seen his implementation; he uses the neighboring cells. However, he has a fixed cell size and knows beforehand how to find the neighbors by simply adding to or subtracting from the index of the current voxel. In my case I don't have a fixed size. Maybe I can organize the points and accelerate it. I just did not know that you can't put in too much work; I thought you could submit any load you want and would just wait longer. That still doesn't explain why it works with 8 calls of Dispatch(16,8,1) and not with a single Dispatch(16,8,8).
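One way to organize points that have no fixed layout is to bucket them into a uniform grid and search outward ring by ring from the query's cell, stopping once no farther cell can beat the best distance found so far. Below is a minimal CPU-side sketch in Python, not shader code. The names `build_grid` and `nearest_distance`, the cell size and the random test points are all assumptions made for illustration:

```python
import math
import random
from collections import defaultdict

def build_grid(points, cell_size):
    """Bucket points by the integer cell that contains them."""
    grid = defaultdict(list)
    for p in points:
        grid[tuple(math.floor(c / cell_size) for c in p)].append(p)
    return grid

def nearest_distance(grid, cell_size, q, max_rings):
    """Distance from q to its nearest point, visiting cells ring by ring."""
    cx, cy, cz = (math.floor(c / cell_size) for c in q)
    best = math.inf
    for ring in range(max_rings + 1):
        # Any point in ring r is more than (r - 1) * cell_size away from q.
        if (ring - 1) * cell_size > best:
            break
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                for dz in range(-ring, ring + 1):
                    if max(abs(dx), abs(dy), abs(dz)) != ring:
                        continue  # interior cells were visited in earlier rings
                    for p in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        best = min(best, math.dist(p, q))
    return best

random.seed(0)
points = [tuple(random.random() for _ in range(3)) for _ in range(500)]
grid = build_grid(points, cell_size=0.1)
query = (0.5, 0.5, 0.5)
fast = nearest_distance(grid, 0.1, query, max_rings=10)
brute = min(math.dist(p, query) for p in points)
assert math.isclose(fast, brute)
```

The result is exact only if `max_rings` is large enough to reach every occupied cell. On the GPU the buckets would typically be flattened into a sorted buffer plus per-cell start offsets, which this sketch does not attempt.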
     
  7. customphase
    Joined: Aug 19, 2012 | Posts: 176
    It will wait longer, but only up to a point. On Windows there is something called Timeout Detection and Recovery (TDR): if a piece of GPU work takes too long (over 2 seconds by default, I think), the system assumes the GPU is stuck in an infinite loop and resets the driver to recover. That is presumably why the 8 smaller dispatches survive while the single big one does not.
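If each dispatch has to finish inside the timeout, the number of z-groups per Dispatch can be derived from a measured time per z-group. This is a small planning sketch in Python. `plan_dispatches` is a hypothetical helper, and the timings in the usage lines are made-up placeholders, not measurements of this shader:

```python
import math

def plan_dispatches(total_z_groups, seconds_per_z_group, budget_seconds):
    """Return (z-groups per dispatch, number of dispatches) under a time budget."""
    # At least one z-group per dispatch; if even one exceeds the budget,
    # the work has to be split along another axis or made cheaper.
    per_dispatch = max(1, math.floor(budget_seconds / seconds_per_z_group))
    per_dispatch = min(per_dispatch, total_z_groups)
    return per_dispatch, math.ceil(total_z_groups / per_dispatch)

# Placeholder timings: 0.75 s and 1.5 s per z-group, 2 s budget.
print(plan_dispatches(8, 0.75, 2.0))  # (2, 4)
print(plan_dispatches(8, 1.5, 2.0))   # (1, 8)
```

In practice the budget should sit well below the actual timeout, since the time per z-group varies with GPU load.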
     
  8. psomgeorg
    Joined: Mar 16, 2019 | Posts: 99
    OK, I didn't know that. Is there any way to bypass it, or should I not even bother changing it?
     
  9. psomgeorg
    Joined: Mar 16, 2019 | Posts: 99
    I am now working on how to distribute the points for the Worley noise. They are not totally chaotic or random; they are based on some data, so the compute shader might eventually have to do much less work than the 16x216000 iterations per thread.
     