
Question Compute shader stops working with large ComputeBuffer

Discussion in 'General Graphics' started by BrandonK, May 31, 2020.

  1. BrandonK


    Joined:
    Sep 18, 2015
    Posts:
    41
    Hi everyone

    I am doing some work involving procedural texture generation. The first part of my application runs on the CPU in parallel and writes data to multiple 2D arrays of a custom Pixel struct. This is already very performant and I want to keep it on the CPU as is.

    Now I simply need to combine all of the data from the separate threads to create a single image. I am working with very large images (roughly 10000x10000 pixels). Combining the pixel arrays and writing to a Texture2D on the CPU is very slow, so I am trying to speed this part up with a compute shader. I have got something basic working with low-res images, but anything above 1024x1024 pixels just returns a black texture. I have a 980 Ti (6 GB VRAM), so I am sure it can do better than this.

    Please see the relevant code below. It's a bit messy and I've been playing around with it, but what I've got at the moment simply draws a white pixel where there is data. Any help will be much appreciated.

    Code (CSharp):
    /// Draw to texture ///
    Stopwatch textureStopwatch = new Stopwatch();
    textureStopwatch.Start();

    int computeKernel = computeShader.FindKernel("CSMain");

    int[] tmpPixels = new int[renderJobsOutput[0].pixels.GetLength(0) * renderJobsOutput[0].pixels.GetLength(1)];
    for (int x = 0; x < resolution; x++) {
        for (int y = 0; y < resolution; y++) {
            tmpPixels[(y * resolution) + x] = renderJobsOutput[0].pixels[x, y].frequency;
        }
    }

    ComputeBuffer buffer = new ComputeBuffer(tmpPixels.Length, 4);
    buffer.SetData(tmpPixels);
    computeShader.SetBuffer(computeKernel, "dataBuffer", buffer);

    computeShader.SetTexture(computeKernel, "tex", outputTexture);
    computeShader.Dispatch(computeKernel, resolution / 16, resolution / 16, 1);

    buffer.Dispose();
    Code (CSharp):
    public struct Pixel {
        public int frequency;
        public Color col;

        public Pixel(int _frequency, Color _col) {
            frequency = _frequency;
            col = _col;
        }
    }
    Code (CSharp):
    1. #pragma kernel CSMain
    2.  
    3. RWTexture2D<float4> tex;
    4.  
    5. struct GPUPixel {
    6.     int frequency;
    7.     float3 col;
    8. };
    9.  
    10. StructuredBuffer<int> dataBuffer;
    11.  
    12. [numthreads(16,16,1)]
    13. void CSMain (uint3 id : SV_DispatchThreadID)
    14. {
    15.     int bufferID = id.x + id.y * 1024;
    16.     float val = ((float)dataBuffer[bufferID]) / 1;
    17.     tex[id.xy] = float4(val, val, val, 1);
    18. }
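
    The `Dispatch` call in the script uses `resolution / 16`, which is integer division and rounds down. The resolutions mentioned in this post (1024, 2048, 10000) are all multiples of 16, so this is probably not the cause of the black texture, but for other sizes the last partial group would be skipped. A minimal sketch of rounding the group count up instead, where `GroupCount` is a hypothetical helper and not part of the original code:

    ```csharp
    using System;

    public static class DispatchMath
    {
        // Number of thread groups needed to cover `size` items when each
        // group handles `groupSize` items (integer division, rounded up).
        public static int GroupCount(int size, int groupSize)
        {
            return (size + groupSize - 1) / groupSize;
        }

        public static void Main()
        {
            int[] resolutions = { 1024, 2048, 10000, 1000 };
            foreach (int r in resolutions)
            {
                Console.WriteLine($"{r}: truncated={r / 16}, rounded up={GroupCount(r, 16)}");
            }
        }
    }
    ```

    If the count is rounded up, the kernel would also need a bounds check on `id.x` and `id.y`, since the extra threads fall outside the image.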
     
  2. AleksiUnity


    Unity Technologies

    Joined:
    Mar 31, 2020
    Posts:
    6
    Hi Brandon

    Just to be sure, you did change that constant 1024 in the compute shader when testing with bigger buffers?
    I'm sure you did, just double checking as I can't see any obvious problems in the code.

    Have you tried taking a RenderDoc capture and
    - checking what the compute buffer contains?
    - checking what gets written to the output texture?
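
    To illustrate why the hard-coded 1024 matters, here is a small sketch of the index math only, runnable outside Unity. It assumes the C# side writes each pixel at `y * resolution + x` and the kernel reads `id.x + id.y * 1024`, as in the first post; the class and method names are hypothetical.

    ```csharp
    using System;

    public static class StrideCheck
    {
        // Index the CPU code writes for pixel (x, y): row-major with the real width.
        public static int WriteIndex(int x, int y, int resolution) => y * resolution + x;

        // Index the kernel reads for thread (x, y) with a hard-coded row stride.
        public static int ReadIndex(int x, int y, int shaderStride) => x + y * shaderStride;

        public static void Main()
        {
            int resolution = 2048;
            int x = 100, y = 3;
            int read = ReadIndex(x, y, 1024); // stale constant left at 1024

            // Decode which pixel that buffer element actually belongs to.
            Console.WriteLine($"thread ({x},{y}) reads pixel ({read % resolution},{read / resolution})");
            // prints "thread (100,3) reads pixel (1124,1)"
        }
    }
    ```

    One way to avoid editing the shader by hand would be to declare an `int` uniform in the compute shader and set it from C# with `ComputeShader.SetInt`; the uniform name is up to you.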
     
  3. BrandonK


    Joined:
    Sep 18, 2015
    Posts:
    41
    Hi Aleksi

    Thanks for your help. I did change the resolution in the shader (Line 15) as well.

    This is actually the first time I have heard of RenderDoc, but I will definitely try it out!

    After a lot of research, I found that apparently the maximum size of an array you can send to the GPU is 1023 elements, and that it is also limited to 64 KB of data. I am not sure whether this is correct; perhaps you know more about the limits on buffer size? I also thought that compute buffers were different and could handle more.
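
    As a sanity check on those figures, a rough sketch of the buffer sizes involved in this thread, assuming one element per pixel with a stride of 4 bytes (plain `int`) or 20 bytes (`int` plus four floats). `BufferSize.Bytes` is a hypothetical helper.

    ```csharp
    using System;

    public static class BufferSize
    {
        // Total bytes for a square image with one `stride`-byte element per pixel.
        // Uses long so that large resolutions cannot overflow a 32-bit int.
        public static long Bytes(int resolution, int stride)
        {
            return (long)resolution * resolution * stride;
        }

        public static void Main()
        {
            int[] resolutions = { 1024, 2048, 4096, 10000 };
            foreach (int r in resolutions)
            {
                double intMiB = Bytes(r, 4) / (1024.0 * 1024.0);
                double structMiB = Bytes(r, 20) / (1024.0 * 1024.0);
                Console.WriteLine($"{r}x{r}: int buffer {intMiB:F0} MiB, 20-byte struct buffer {structMiB:F0} MiB");
            }
        }
    }
    ```

    The 1024x1024 int buffer that already works is 4,194,304 bytes (4 MiB), far more than 64 KB, so a 64 KB limit cannot be what applies to a `ComputeBuffer`. At 10000x10000 with 20-byte elements the buffer would be 2,000,000,000 bytes (about 1.9 GiB), and whether an allocation that large succeeds will depend on the platform and driver.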

    I spent a lot of time on this and eventually just gave up on the compute shader approach. It might have been the better option, but I am not good enough with shaders to do this at the moment. I am now using two separate render textures, Graphics.Blit and a fragment shader. I am uploading data to the GPU the same way, using a compute buffer (strangely, it works with larger buffer sizes). However, I am now doing multiple passes and only writing a portion of the pixels at a time.

    Below is my new code (which is working, but I am still making some improvements). Hopefully it'll help someone else out in the future.
    Code (CSharp):
    /// Draw to texture ///
    Stopwatch textureStopwatch = new Stopwatch();
    textureStopwatch.Start();

    if (computeBuffer != null) {
        computeBuffer.Release();
    }

    renderMat.SetInt("imgWidth", resolution);
    renderMat.SetInt("imgHeight", resolution);
    renderMat.SetVector("backgroundCol", new Vector4(backgroundColour.r, backgroundColour.g, backgroundColour.b, 1));

    RenderTexture renderTexA = new RenderTexture(resolution, resolution, 24);
    RenderTexture renderTexB = new RenderTexture(resolution, resolution, 24);
    renderTexA.enableRandomWrite = true;
    renderTexB.enableRandomWrite = true;
    renderTexA.autoGenerateMips = false;
    renderTexB.autoGenerateMips = false;
    renderTexA.useMipMap = true;
    renderTexB.useMipMap = true;
    renderMat.SetTexture("_MainTex", renderTexA);

    computeBuffer = new ComputeBuffer(resolution * resolution * THREAD_COUNT, 20);
    Pixel[] linearPixel = new Pixel[resolution * resolution * THREAD_COUNT];
    // Merge the per-thread results: sum the frequencies, average the colours
    for (int x = 0; x < resolution; x++) {
        for (int y = 0; y < resolution; y++) {
            int f = 0;
            float r = 0;
            float g = 0;
            float b = 0;
            for (int i = 0; i < THREAD_COUNT; i++) {
                try {
                    Pixel cPixel = renderJobsOutput[i].pixels[x, y];
                    f += cPixel.frequency;
                    r += cPixel.col.r;
                    g += cPixel.col.g;
                    b += cPixel.col.b;
                } catch {

                }
            }
            r /= THREAD_COUNT;
            g /= THREAD_COUNT;
            b /= THREAD_COUNT;
            linearPixel[x * resolution + y] = new Pixel(f, new Color(r, g, b));
        }
    }

    RenderTexture activeRenderTexture = renderTexA;
    RenderTexture otherRenderTex;

    int copyIndex = 0;
    // Update size in shader as well
    // 2048 x 2048
    int transferSize = 4194304;
    if (transferSize > resolution * resolution)
        transferSize = resolution * resolution;
    Pixel[] subArray = new Pixel[transferSize];
    while (copyIndex < resolution * resolution) {
        int cLength = transferSize;
        // Clamp the final chunk so it ends exactly at the last pixel
        // (the earlier "- 1" version dropped the final pixel)
        if (copyIndex + cLength > resolution * resolution) {
            cLength = (resolution * resolution) - copyIndex;
        }

        for (int i = 0; i < cLength; i++) {
            subArray[i] = linearPixel[copyIndex + i];
        }

        computeBuffer.SetData(subArray);
        renderMat.SetBuffer("pixels", computeBuffer);

        renderMat.SetInt("startIndex", copyIndex);

        // Ping-pong between the two render textures
        otherRenderTex = (activeRenderTexture == renderTexA) ? renderTexB : renderTexA;

        Graphics.Blit(activeRenderTexture, otherRenderTex, renderMat);
        copyIndex += transferSize;

        renderMat.SetTexture("_MainTex", otherRenderTex);
        activeRenderTexture = otherRenderTex;

        outputImage.texture = activeRenderTexture;

        yield return null;
    }
    textureStopwatch.Stop();
    ///
    Code (CSharp):
    Shader "Custom/RenderToTexture"
    {
        Properties
        {
            _MainTex("InputTex", 2D) = "white" {}
        }
        SubShader
        {
            Pass
            {
                CGPROGRAM
                #pragma target 3.5

                #pragma vertex VSMain
                #pragma fragment PSMain

                struct GPUPixel {
                    int frequency;
                    float4 col;
                };

                sampler2D _MainTex;

                StructuredBuffer<GPUPixel> pixels;
                int imgWidth;
                int imgHeight;
                float4 backgroundCol;

                int startIndex;

                void VSMain(inout float4 vertex:POSITION,inout float2 uv : TEXCOORD0)
                {
                    vertex = UnityObjectToClipPos(vertex);
                }

                float4 PSMain(float4 vertex:POSITION,float2 uv : TEXCOORD0) : SV_TARGET
                {
                    int x = int(floor(uv.x*imgWidth));
                    int y = int(floor(uv.y*imgHeight));
                    int index = x * imgWidth + y;
                    if (index >= startIndex && index < startIndex + 4194304) {
                        float4 colOutput = pixels[x*imgWidth + y - startIndex].col;
                        float frequency = pixels[x*imgWidth + y - startIndex].frequency;
                        colOutput = lerp(backgroundCol, colOutput, clamp(frequency / 1, 0, 1));
                        return colOutput;
                    } else {
                        return tex2Dlod(_MainTex, float4(uv.xy, 0, 0));
                    }
                }
                ENDCG
            }
        }
    }
     
  4. Olmi


    Joined:
    Nov 29, 2012
    Posts:
    1,553
    Array size limit is not the same as the Structured Buffer limit (afaik). You can have millions of data items in a Structured Buffer.

    I don't have time to read your code but there's probably something sideways if you got it all black. Some size or Dispatch doesn't match.
     
  5. BrandonK


    Joined:
    Sep 18, 2015
    Posts:
    41
    Thanks Olmi. Good to know that it is at least possible with a compute shader.

    Like I said, I'm quite new to compute shaders. I really have tried doing a lot of research but just can't work it out. I would really appreciate it if someone could point out where I went wrong.
     
  6. Olmi


    Joined:
    Nov 29, 2012
    Posts:
    1,553
    @BrandonK I can write a quick example of a similar case a bit later, after I get back to my workstation.
     
  7. AleksiUnity


    Unity Technologies

    Joined:
    Mar 31, 2020
    Posts:
    6
    I tested quickly that this works.

    I'm not sure how you use the textures and/or what formats you use.
    But this one uses an integer structured buffer and it works correctly.
    I modified the values so they can actually be visualized on the screen.
    I tested buffer sizes up to 8192x8192. Seems to work correctly.

    Code (CSharp):
    #pragma kernel CSMain
    RWTexture2D<float4> tex;
    struct GPUPixel {
        int frequency;
        float3 col;
    };
    StructuredBuffer<int> dataBuffer;
    [numthreads(16,16,1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        int bufferID = id.x + id.y * 4096;
        int intVal = dataBuffer[bufferID];
        float x = (float)(intVal >> 16);
        float y = (float)(intVal & 0xffff);
        tex[id.xy] = float4(x  / 4096.0f, y / 4096.0f, 0.0f, 1);
    }
    Code (CSharp):
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;
    using System.Diagnostics;
    using UnityEngine.Experimental.Rendering;

    public class TestScript : MonoBehaviour
    {
        public ComputeShader computeShader;
        public RenderTexture outputTexture;
        private RenderTexture renderTexture;

        public struct Pixel {
            public int frequency;
            public Color col;
            public Pixel(int _frequency, Color _col) {
                frequency = _frequency;
                col = _col;
            }
        }

        public const int resolution = 4096;
        public Pixel[,] pixels = new Pixel[resolution, resolution];

        // Start is called before the first frame update
        void Start()
        {
            renderTexture = new RenderTexture(resolution, resolution, 1, GraphicsFormat.R8G8B8A8_UNorm);
            renderTexture.enableRandomWrite = true;
            renderTexture.Create();
        }

        // Update is called once per frame
        void Update()
        {
            /// Draw to texture ///
            Stopwatch textureStopwatch = new Stopwatch();
            textureStopwatch.Start();
            int computeKernel = computeShader.FindKernel("CSMain");
            int[] tmpPixels = new int[pixels.GetLength(0) * pixels.GetLength(1)];
            for (int x = 0; x < resolution; x++) {
                for (int y = 0; y < resolution; y++)
                {
                    // Test pattern: x in the high 16 bits, y in the low 16 bits
                    tmpPixels[(y * resolution) + x] = (x << 16) | y; // pixels[x, y].frequency;
                }
            }
            ComputeBuffer buffer = new ComputeBuffer(tmpPixels.Length, 4);
            buffer.SetData(tmpPixels);
            computeShader.SetBuffer(computeKernel, "dataBuffer", buffer);
            computeShader.SetTexture(computeKernel, "tex", renderTexture);
            computeShader.Dispatch(computeKernel, resolution / 16, resolution / 16, 1);
            buffer.Dispose();

            Graphics.Blit(renderTexture, outputTexture);
        }
    }
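
    The test pattern in this script packs each pixel's coordinates into one `int`, and the kernel unpacks them with a shift and a mask. A small round-trip sketch of that packing, runnable outside Unity; the class and method names are hypothetical, and it assumes `0 <= x < 32768` and `0 <= y < 65536` so the value stays a non-negative 32-bit integer:

    ```csharp
    using System;

    public static class CoordPacking
    {
        // x goes into the high 16 bits, y into the low 16 bits.
        public static int Pack(int x, int y) => (x << 16) | y;

        // Mirrors the kernel's unpacking: shift for x, mask for y.
        public static int UnpackX(int packed) => packed >> 16;
        public static int UnpackY(int packed) => packed & 0xffff;

        public static void Main()
        {
            int packed = Pack(4095, 17);
            Console.WriteLine($"{UnpackX(packed)}, {UnpackY(packed)}"); // prints "4095, 17"
        }
    }
    ```

    Because every element encodes its own position, a pixel that shows the wrong colour gradient points directly at an indexing mistake rather than at bad input data.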
     


  8. BrandonK


    Joined:
    Sep 18, 2015
    Posts:
    41
    Thank you @AleksiUnity so much for your efforts.

    Very clever how you pass the xy coordinate to the compute shader! I have tried your code, it works fine on my computer as well. The problems seem to start when I actually sample from the 2D array. Perhaps it is an issue with how my data type is passed to the GPU - Pixel (cs) and GPUPixel (shader) ?
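
    One thing that can be checked on the C# side is whether the struct size matches the stride passed to the `ComputeBuffer`. The sketch below runs outside Unity, so `Color4` is a stand-in for `UnityEngine.Color` (four floats: r, g, b, a), and `Pixel` simply repeats the field order used in this thread.

    ```csharp
    using System;
    using System.Runtime.InteropServices;

    // Stand-in for UnityEngine.Color so the sketch runs outside Unity.
    public struct Color4
    {
        public float r, g, b, a;
    }

    // Same field order as the Pixel struct in this thread.
    public struct Pixel
    {
        public int frequency;
        public Color4 col;
    }

    public static class StrideSize
    {
        public static void Main()
        {
            // 4 bytes (int) + 16 bytes (four floats)
            Console.WriteLine(Marshal.SizeOf<Pixel>()); // prints 20
        }
    }
    ```

    That agrees with a shader-side struct of `int` plus `float4` (4 + 16 bytes). A shader struct declared with `float3` instead would be 16 bytes and would not line up with this layout. Passing `Marshal.SizeOf(typeof(Pixel))` as the stride instead of a literal is one way to keep the two in sync.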

    I have spent the whole morning trying to fix this and still can't get it working. Once again it works with all resolutions up to and including 1024, but with anything above that I just get a black image. This is extremely frustrating, and I am starting to think that perhaps it is a system issue. I have even tried changing registry settings at (Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers) to increase TdrDelay.

    I just can't understand what changes between resolutions 1024x1024 and 2048x2048 that causes this issue. It will still draw fine at 2048x2048 on the CPU code, so it must be an issue with the shader or buffer.

    Unfortunately, I cannot share my entire project, but I will share the relevant code again. renderJobsOutput[0].pixels is a 2D array of type Pixel. I can guarantee that it is correctly populated with data for all resolutions, the frequency variable creates something like a heightmap. I have attached a screenshot of a mask of the frequency variable.

    Code (CSharp):
    /// Draw to texture ///
    Stopwatch textureStopwatch = new Stopwatch();
    textureStopwatch.Start();

    // Create new rendertexture if resolution is different
    if (resolution != lastRenderTextureResolution) {
        lastRenderTextureResolution = resolution;
        outputTexture = new RenderTexture(resolution, resolution, 24);
        outputTexture.enableRandomWrite = true;
        outputTexture.Create();
    }

    int computeKernel = computeShader.FindKernel("CSMain");

    // Pixel data
    ComputeBuffer dataBuffer = new ComputeBuffer(resolution * resolution, 20);
    dataBuffer.SetData(renderJobsOutput[0].pixels);
    computeShader.SetBuffer(computeKernel, "dataBuffer", dataBuffer);

    computeShader.SetTexture(computeKernel, "tex", outputTexture);
    computeShader.Dispatch(computeKernel, resolution / 16, resolution / 16, 1);

    dataBuffer.Dispose();
    Code (CSharp):
    #pragma kernel CSMain

    RWTexture2D<float4> tex;

    struct GPUPixel {
        int frequency;
        float4 col;
    };

    StructuredBuffer<GPUPixel> dataBuffer;

    [numthreads(16,16,1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        // Change number below with resolution in C# script
        int bufferID = id.x + id.y * 1024;
        // Test output with simple mask of frequency
        float val = ((float)dataBuffer[bufferID].frequency) / 1;
        tex[id.xy] = float4(val, val, val, 1);
    }
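
    One difference from the earlier int version is that the 2D array is now handed to `SetData` directly instead of being flattened by hand. A C# rectangular array stores element `[x, y]` at linear position `x * height + y`, whereas the kernel reads `id.x + id.y * width`. Assuming `SetData` copies the array in its in-memory order, the image would come out transposed; that alone would not explain a black texture, but it is worth knowing when comparing against the CPU result. A small sketch with plain ints, runnable outside Unity, where `Flatten` is a hypothetical helper:

    ```csharp
    using System;

    public static class FlattenOrder
    {
        // Copy a rectangular int array into a 1D array in its in-memory order.
        public static int[] Flatten(int[,] source)
        {
            int[] flat = new int[source.Length];
            Buffer.BlockCopy(source, 0, flat, 0, source.Length * sizeof(int));
            return flat;
        }

        public static void Main()
        {
            int width = 4, height = 4;
            int[,] grid = new int[width, height];
            grid[1, 2] = 7; // x = 1, y = 2

            int[] flat = Flatten(grid);
            Console.WriteLine(Array.IndexOf(flat, 7)); // prints 6, i.e. x * height + y
        }
    }
    ```

    With the kernel's indexing, thread (1, 2) would instead look at position `1 + 2 * 4 = 9`, which holds `grid[2, 1]`.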
    Capture.JPG
     
  9. AleksiUnity


    Unity Technologies

    Joined:
    Mar 31, 2020
    Posts:
    6
    Hi

    I would still suggest taking that RenderDoc capture. It might sound intimidating, but it's really easy, and you can then verify all your assumptions about what happens on the GPU side.

    https://docs.unity3d.com/Manual/RenderDocIntegration.html

    The capture icon has changed to camera in my version at least. So I suppose we need to update the manual.
    If you don't have RenderDoc installed, you can find the latest version here: https://renderdoc.org/

    Once you have the capture, you can find your compute shader dispatch near the beginning of the capture.

    So check what you have in the structured buffer at sizes larger than 1024.
    And check what it writes to the RenderTexture.

    This way you can at least narrow the issue down to a CPU or GPU problem, and also tell whether it's corrupt data in the structured buffer or a problem with writing to the RenderTexture.
     
