
Check if a ComputeShader.Dispatch() command is completed on GPU before doing second kernel dispatch

Discussion in 'General Graphics' started by joergzdarsky, Nov 23, 2015.

  1. joergzdarsky

    joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    Hi,

    I am trying to move from my CPU-based procedural planet generation approach to a GPU-based one (when it comes to plane calculation and rendering).
    Being fairly new to shader programming, I am now at the stage where I can hand over generation constants in a buffer to a compute shader, precalculate the plane in the compute shader (vertex positions, normals, based on noise) and hand them over via the buffer for rendering (replacing the vertex positions of a prototype mesh with the ones from the buffer).



    But right now I haven't been able to render more than one plane at once. I guess this is due to some dependency I am not aware of (maybe the order of creating the buffers, materials, and DrawMesh calls), or maybe I need one GameObject per DrawMesh call after all?

    Any hint on what I might be doing wrong would be very helpful and appreciated.

    So right now my (not working) approach is: I moved the buffers into the **QuadtreeTerrain** class (a quadtree node), as well as the material (not sure if individual materials are necessary).

    Code (CSharp):

    class QuadtreeTerrain {
        // Quadtree classes
        public QuadtreeTerrain parentNode; // The parent quadtree node
        public QuadtreeTerrain childNode1; // A child quadtree node
        public QuadtreeTerrain childNode2; // A child quadtree node
        public QuadtreeTerrain childNode3; // A child quadtree node
        public QuadtreeTerrain childNode4; // A child quadtree node
        // Buffers
        public ComputeBuffer generationConstantsBuffer;
        public ComputeBuffer patchGeneratedDataBuffer;
        // Material
        public Material material;
        ....
    }

    In the **SpaceObjectProceduralPlanet** script, applied to a single game object, I then hold six instances of QuadtreeTerrain.

    Code (CSharp):

    public class SpaceObjectProceduralPlanet : MonoBehaviour {
        ....
        // QuadtreeTerrain
        private QuadtreeTerrain quadtreeTerrain1;
        private QuadtreeTerrain quadtreeTerrain2;
        private QuadtreeTerrain quadtreeTerrain3;
        private QuadtreeTerrain quadtreeTerrain4;
        private QuadtreeTerrain quadtreeTerrain5;
        private QuadtreeTerrain quadtreeTerrain6;

        // We initialize the buffers and the material used to draw.
        void Start()
        {
            ...
            // QuadtreeTerrain
            this.quadtreeTerrain1 = new QuadtreeTerrain(0, edgeVector1, edgeVector2, edgeVector3, edgeVector4, quadtreeTerrainParameter1);
            this.quadtreeTerrain2 = new QuadtreeTerrain(0, edgeVector2, edgeVector5, edgeVector4, edgeVector7, quadtreeTerrainParameter2);
            this.quadtreeTerrain3 = new QuadtreeTerrain(0, edgeVector5, edgeVector6, edgeVector7, edgeVector8, quadtreeTerrainParameter3);
            this.quadtreeTerrain4 = new QuadtreeTerrain(0, edgeVector6, edgeVector1, edgeVector8, edgeVector3, quadtreeTerrainParameter4);
            this.quadtreeTerrain5 = new QuadtreeTerrain(0, edgeVector6, edgeVector5, edgeVector1, edgeVector2, quadtreeTerrainParameter5);
            this.quadtreeTerrain6 = new QuadtreeTerrain(0, edgeVector3, edgeVector4, edgeVector8, edgeVector7, quadtreeTerrainParameter6);
            CreateBuffers(this.quadtreeTerrain1);
            CreateBuffers(this.quadtreeTerrain2);
            CreateBuffers(this.quadtreeTerrain3);
            CreateBuffers(this.quadtreeTerrain4);
            CreateBuffers(this.quadtreeTerrain5);
            CreateBuffers(this.quadtreeTerrain6);
            CreateMaterial(this.quadtreeTerrain1);
            CreateMaterial(this.quadtreeTerrain2);
            CreateMaterial(this.quadtreeTerrain3);
            CreateMaterial(this.quadtreeTerrain4);
            CreateMaterial(this.quadtreeTerrain5);
            CreateMaterial(this.quadtreeTerrain6);
            Dispatch(this.quadtreeTerrain1);
            Dispatch(this.quadtreeTerrain2);
            Dispatch(this.quadtreeTerrain3);
            Dispatch(this.quadtreeTerrain4);
            Dispatch(this.quadtreeTerrain5);
            Dispatch(this.quadtreeTerrain6);
        }

        // We create the buffers.
        void CreateBuffers(QuadtreeTerrain quadtreeTerrain)
        {
            .... preparing generation constants
            quadtreeTerrain.generationConstantsBuffer.SetData(generationConstants);
            // Buffer Output
            quadtreeTerrain.patchGeneratedDataBuffer = new ComputeBuffer(nVerts, 16 + 12 + 4 + 12);
        }

        // We create the material.
        void CreateMaterial(QuadtreeTerrain quadtreeTerrain)
        {
            Material material = new Material(shader);
            material.SetTexture("_MainTex", this.texture);
            material.SetFloat("_Metallic", 0);
            material.SetFloat("_Glossiness", 0);
            quadtreeTerrain.material = material;
        }

        // We dispatch threads of our CSMain1 and CSMain2 kernels.
        void Dispatch(QuadtreeTerrain quadtreeTerrain)
        {
            // Set Buffers
            computeShader.SetBuffer(_kernel, "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
            computeShader.SetBuffer(_kernel, "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
            // Dispatch first kernel
            _kernel = computeShader.FindKernel("CSMain1");
            computeShader.Dispatch(_kernel, THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
            // Dispatch second kernel
            _kernel = computeShader.FindKernel("CSMain2");
            computeShader.Dispatch(_kernel, THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
        }

        // We set the material before drawing and call DrawMesh in OnRenderObject.
        void OnRenderObject()
        {
            this.quadtreeTerrain1.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain1.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain1.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);

            this.quadtreeTerrain2.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain2.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain2.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);

            this.quadtreeTerrain3.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain3.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain3.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);

            this.quadtreeTerrain4.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain4.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain4.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);

            this.quadtreeTerrain5.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain5.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain5.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);

            this.quadtreeTerrain6.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain6.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain6.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
        }

        // When this GameObject is disabled we must release the buffers.
        private void OnDisable()
        {
            ReleaseBuffer();
        }

        // Release buffers and destroy the materials when play has been stopped.
        void ReleaseBuffer()
        {
            // Destroy everything recursively in the quadtrees.
            this.quadtreeTerrain1.generationConstantsBuffer.Release();
            this.quadtreeTerrain1.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain2.generationConstantsBuffer.Release();
            this.quadtreeTerrain2.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain3.generationConstantsBuffer.Release();
            this.quadtreeTerrain3.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain4.generationConstantsBuffer.Release();
            this.quadtreeTerrain4.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain5.generationConstantsBuffer.Release();
            this.quadtreeTerrain5.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain6.generationConstantsBuffer.Release();
            this.quadtreeTerrain6.patchGeneratedDataBuffer.Release();
            DestroyImmediate(this.quadtreeTerrain1.material);
            DestroyImmediate(this.quadtreeTerrain2.material);
            DestroyImmediate(this.quadtreeTerrain3.material);
            DestroyImmediate(this.quadtreeTerrain4.material);
            DestroyImmediate(this.quadtreeTerrain5.material);
            DestroyImmediate(this.quadtreeTerrain6.material);
        }

        void Update() {
            // Do nothing
        }

    }

    Of course this is very brute-force, but it should work before I proceed, as I need to figure out how to handle the buffers and draw calls and where to put them.
     
    bb8_1 likes this.
  2. joergzdarsky

    joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    I tried to get closer to the problem. It seems to depend on which Dispatch() call I do first to precalculate a compute buffer in one compute shader before it is sent to the vertex buffer.
    It seems the first ComputeShader.Dispatch() call overrides all following ones, as only the results from the first call are drawn, although to my understanding I am using different buffers.
    Edit: To be more precise: both meshes are drawn, but they seem to share the same locations and probably the same buffer. I noticed that because the rendered triangles doubled with each Graphics.DrawMesh added.

    Each "QuadtreeTerrain" class has two compute buffer references.

    Code (CSharp):

    class QuadtreeTerrain {
        public ComputeBuffer generationConstantsBuffer;
        public ComputeBuffer patchGeneratedDataBuffer;
    }

    In SpaceObjectProceduralPlanet I initialize the buffers by calling CreateBuffers(QuadtreeTerrain) in Start(); inside that function the buffers of each object are created ("new ComputeBuffer()"). Afterwards I dispatch for each object by calling Dispatch(QuadtreeTerrain), also in Start().
    Then, in OnRenderObject(), I send the buffers to the renderer.
    For ease of reading I reduced the setup to two quadtree terrains and their buffers.
    Any hint why the first Dispatch() call overrides all others is very much appreciated.

    Code (CSharp):

    using UnityEngine;
    using System.Collections;
    using System.Threading;
    using System.Collections.Generic;

    [RequireComponent(typeof(GameObject))]
    public class SpaceObjectProceduralPlanet : MonoBehaviour {

        public int seed;
        public Position position;
        public string name;
        public float radius;
        public float diameter;
        public Transform m_Transform;
        private int LOD;
        // Primitive
        private AbstractPrimitive primitive;
        private enum PrimitiveState { IN_PRECALCULATION, PRECALCULATED, DONE };
        private PrimitiveState primitiveState;
        // QuadtreeTerrain
        private QuadtreeTerrain quadtreeTerrain1;
        private QuadtreeTerrain quadtreeTerrain2;
        // Plane Template
        public Mesh prototypeMesh;
        public Mesh prototypeMesh2;
        public Plane plane;
        public Texture2D texture;
        // ComputeShader
        public Shader shader;
        public ComputeShader computeShader;
        private ComputeBuffer generationConstantsBuffer;
        private ComputeBuffer patchGeneratedDataBuffer;
        private int _kernel;
        // Constants
        public static int nVertsPerEdge { get { return 224; } }     // Should be a multiple of 32
        public static int nVerts { get { return nVertsPerEdge * nVertsPerEdge; } }
        public int THREADS_PER_GROUP_X { get { return 32; } }
        public int THREADS_PER_GROUP_Y { get { return 32; } }
        public int THREADGROUP_SIZE_X { get { return nVertsPerEdge / THREADS_PER_GROUP_X; } }
        public int THREADGROUP_SIZE_Y { get { return nVertsPerEdge / THREADS_PER_GROUP_Y; } }
        public int THREADGROUP_SIZE_Z { get { return 1; } }

        struct PatchGenerationConstantsStruct
        {
            public int nVertsPerEdge;
            public float scale;
            public float spacing;
            public Vector3 patchCubeCenter;
            public Vector3 cubeFaceEastDirection;
            public Vector3 cubeFaceNorthDirection;
            public float planetRadius;
            public float terrainMaxHeight;
            public float noiseSeaLevel;
            public float noiseSnowLevel;
        }

        struct patchGeneratedDataStruct
        {
            public Vector4 position;
            public Vector3 normal;
            public float noise;
            public Vector3 patchCenter;
        }

        // Initial call. We set up the shaders and prototype meshes here.
        void Awake () {
            // Transform
            m_Transform = transform;

            // Mesh prototype
            this.prototypeMesh = MeshServiceProvider.setupNavyFishDummyMesh(nVertsPerEdge);
            this.prototypeMesh2 = MeshServiceProvider.setupNavyFishDummyMesh(nVertsPerEdge);
            // Plane Template (not used right now as we have the prototype mesh)
            this.plane = new Plane(nVertsPerEdge, Vector3.back);
            // Shader
            this.shader = Shader.Find("Custom/ProceduralPatch3");
            // ComputeShader
            this.computeShader = (ComputeShader)Resources.Load("Shaders/Space/Planet/Custom/ProceduralPatchCompute3");
            // Texture
            this.texture = (Texture2D)Resources.Load("Textures/space/planets/seamless/QuadtreeTerrainTexture.MugDry_1024") as Texture2D;
        }

        // We initialize the buffers and the material used to draw.
        void Start()
        {
            // Edge coordinates for initialization
            Vector3 edgeVector1 = new Vector3(-1, +1, -1);
            Vector3 edgeVector2 = new Vector3(+1, +1, -1);
            Vector3 edgeVector3 = new Vector3(-1, -1, -1);
            Vector3 edgeVector4 = new Vector3(+1, -1, -1);
            Vector3 edgeVector5 = new Vector3(+1, +1, +1);
            Vector3 edgeVector6 = new Vector3(-1, +1, +1);
            Vector3 edgeVector7 = new Vector3(+1, -1, +1);
            Vector3 edgeVector8 = new Vector3(-1, -1, +1);
            // Parameters
            QuadtreeTerrainParameter parameter = new QuadtreeTerrainParameter();
            parameter.nVertsPerEdge = nVertsPerEdge;
            parameter.scale = 2.0f / nVertsPerEdge;
            parameter.spacing = 2.0f / nVertsPerEdge;
            parameter.planetRadius = 6371.0f; // 6371000.0f; = earth
            parameter.terrainMaxHeight = 15.0f;
            parameter.noiseSeaLevel = 0.0f;
            parameter.noiseSnowLevel = 0.8f;
            QuadtreeTerrainParameter quadtreeTerrainParameter1 = parameter.clone();
            quadtreeTerrainParameter1.cubeFaceEastDirection = new Vector3(1, 0, 0);
            quadtreeTerrainParameter1.cubeFaceNorthDirection = new Vector3(0, 1, 0);
            QuadtreeTerrainParameter quadtreeTerrainParameter2 = parameter.clone();
            quadtreeTerrainParameter2.cubeFaceEastDirection = new Vector3(0, 0, 1);
            quadtreeTerrainParameter2.cubeFaceNorthDirection = new Vector3(0, 1, 0);
            // QuadtreeTerrain
            this.quadtreeTerrain1 = new QuadtreeTerrain(0, edgeVector1, edgeVector2, edgeVector3, edgeVector4, quadtreeTerrainParameter1);
            this.quadtreeTerrain2 = new QuadtreeTerrain(0, edgeVector2, edgeVector5, edgeVector4, edgeVector7, quadtreeTerrainParameter2);
            CreateBuffers(this.quadtreeTerrain1);
            CreateBuffers(this.quadtreeTerrain2);
            CreateMaterial(this.quadtreeTerrain1);
            CreateMaterial(this.quadtreeTerrain2);

            // Only the mesh of the first Dispatch(..) call is drawn. E.g. if the first call is commented out, the second mesh (quadtreeTerrain2) is drawn.
            //Dispatch(this.quadtreeTerrain1);
            Dispatch(this.quadtreeTerrain2);
        }

        void Update()
        {

        }

        // We create the buffers.
        void CreateBuffers(QuadtreeTerrain quadtreeTerrain)
        {
            // Buffer Patch Generation Constants
            quadtreeTerrain.generationConstantsBuffer = new ComputeBuffer(4, // 1x int (4 bytes) for one index, index = 0
                4 +     // nVertsPerEdge (int = 4 bytes),
                4 +     // scale (float = 4 bytes),
                4 +     // spacing (float = 4 bytes),
                12 +    // patchCubeCenter (float3 = 12 bytes),
                12 +    // cubeFaceEastDirection (float3 = 12 bytes),
                12 +    // cubeFaceNorthDirection (float3 = 12 bytes),
                4 +     // planetRadius (float = 4 bytes),
                4 +     // terrainMaxHeight (float = 4 bytes),
                4 +     // noiseSeaLevel (float = 4 bytes),
                4);     // noiseSnowLevel (float = 4 bytes)
            PatchGenerationConstantsStruct[] generationConstants = new PatchGenerationConstantsStruct[1];
            generationConstants[0].nVertsPerEdge = quadtreeTerrain.parameters.nVertsPerEdge;
            generationConstants[0].scale = quadtreeTerrain.parameters.scale;
            generationConstants[0].spacing = quadtreeTerrain.parameters.spacing;
            generationConstants[0].patchCubeCenter = quadtreeTerrain.centerVector;
            generationConstants[0].cubeFaceEastDirection = quadtreeTerrain.parameters.cubeFaceEastDirection;
            generationConstants[0].cubeFaceNorthDirection = quadtreeTerrain.parameters.cubeFaceNorthDirection;
            generationConstants[0].planetRadius = quadtreeTerrain.parameters.planetRadius;
            generationConstants[0].terrainMaxHeight = quadtreeTerrain.parameters.terrainMaxHeight;
            generationConstants[0].noiseSeaLevel = quadtreeTerrain.parameters.noiseSeaLevel;
            generationConstants[0].noiseSnowLevel = quadtreeTerrain.parameters.noiseSnowLevel;
            quadtreeTerrain.generationConstantsBuffer.SetData(generationConstants);
            // Buffer Output
            quadtreeTerrain.patchGeneratedDataBuffer = new ComputeBuffer(nVerts, 16 + 12 + 4 + 12); // Output buffer contains vertex position (float4 = 16 bytes),
                                                                                                    // normal (float3 = 12 bytes),
                                                                                                    // noise (float = 4 bytes),
                                                                                                    // patchCenter (float3 = 12 bytes)
        }

        // We create the material.
        void CreateMaterial(QuadtreeTerrain quadtreeTerrain)
        {
            quadtreeTerrain.material = new Material(shader);
            quadtreeTerrain.material.SetTexture("_MainTex", this.texture);
            quadtreeTerrain.material.SetFloat("_Metallic", 0);
            quadtreeTerrain.material.SetFloat("_Glossiness", 0);
        }

        // The meat of this script, it sets the buffers for the compute shader.
        // We then dispatch threads of our CSMain1 and CSMain2 kernels.
        void Dispatch(QuadtreeTerrain quadtreeTerrain)
        {
            // Set Buffers
            computeShader.SetBuffer(_kernel, "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
            computeShader.SetBuffer(_kernel, "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
            // Dispatch first kernel
            _kernel = computeShader.FindKernel("CSMain1");
            computeShader.Dispatch(_kernel, THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
            // Dispatch second kernel
            _kernel = computeShader.FindKernel("CSMain2");
            computeShader.Dispatch(_kernel, THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
        }

        // After all rendering is complete we dispatch the compute shader and then set the material before drawing.
        void OnRenderObject()
        {
            this.quadtreeTerrain1.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain1.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain1.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
            this.quadtreeTerrain2.material.SetBuffer("patchGeneratedDataBuffer", this.quadtreeTerrain2.patchGeneratedDataBuffer);
            Graphics.DrawMesh(this.prototypeMesh, transform.localToWorldMatrix, this.quadtreeTerrain2.material, LayerMask.NameToLayer(GlobalVariablesManager.Instance.layerLocalSpaceName), null, 0, null, true, true);
        }

        // When this GameObject is disabled we must release the buffers.
        private void OnDisable()
        {
            ReleaseBuffer();
        }

        // Release buffers and destroy the materials when play has been stopped.
        void ReleaseBuffer()
        {
            // Destroy everything recursively in the quadtrees.
            this.quadtreeTerrain1.generationConstantsBuffer.Release();
            this.quadtreeTerrain1.patchGeneratedDataBuffer.Release();
            this.quadtreeTerrain2.generationConstantsBuffer.Release();
            this.quadtreeTerrain2.patchGeneratedDataBuffer.Release();
            DestroyImmediate(this.quadtreeTerrain1.material);
            DestroyImmediate(this.quadtreeTerrain2.material);
        }

    }
     
    Last edited: Nov 25, 2015
    bb8_1 likes this.
  3. joergzdarsky

    joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    The problem seems to center on the second Dispatch() call to the second kernel, which I do immediately after the first one.

    In CSMain1 I initially calculate the position of a vertex based on some noise.
    In CSMain2 I want to calculate the normals and some other things (terrain type etc.).

    My problem:
    I am not sure when I can do the second Dispatch() call to the second kernel.
    If I use the following lines of code, the planes (calculated in the first kernel, CSMain1) do not show up correctly.

    // Set Buffers CSMain1
    computeShader.SetBuffer(_kernel[0], "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
    computeShader.SetBuffer(_kernel[0], "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
    // Dispatch first kernel CSMain1
    computeShader.Dispatch(_kernel[0], THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
    // Set Buffers CSMain2
    computeShader.SetBuffer(_kernel[1], "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
    computeShader.SetBuffer(_kernel[1], "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
    // Dispatch second kernel CSMain2
    computeShader.Dispatch(_kernel[1], THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);

    It works when I comment the second Dispatch() call out.

    // Set Buffers CSMain1
    computeShader.SetBuffer(_kernel[0], "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
    computeShader.SetBuffer(_kernel[0], "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
    // Dispatch first kernel CSMain1
    computeShader.Dispatch(_kernel[0], THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);
    // Set Buffers CSMain2
    //computeShader.SetBuffer(_kernel[1], "generationConstantsBuffer", quadtreeTerrain.generationConstantsBuffer);
    //computeShader.SetBuffer(_kernel[1], "patchGeneratedDataBuffer", quadtreeTerrain.patchGeneratedDataBuffer);
    // Dispatch second kernel CSMain2
    //computeShader.Dispatch(_kernel[1], THREADGROUP_SIZE_X, THREADGROUP_SIZE_Y, THREADGROUP_SIZE_Z);

    I guess the problem is that the second C# Dispatch() call to the second kernel is issued (although nothing happens in the second kernel at the moment) while the first kernel is still being executed.

    How do you determine and orchestrate the Dispatch() calls of two or more kernels on the CPU in C# code in Unity?
     
  4. joergzdarsky

    joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    Both kernels, and especially the second stage / second kernel, are now correctly invoked. For the next person who comes across this problem:
    The error was that in the compute shader I had both pragma definitions at the top, and both functions after them. Like:

    Code (CSharp):

    #pragma kernel CSMain1
    #pragma kernel CSMain2

    [numthreads(threadsPerGroup_X,threadsPerGroup_Y,1)]
    void CSMain1 (uint3 id : SV_DispatchThreadID)
    {
        // code
    }

    void CSMain2 (uint3 id : SV_DispatchThreadID)
    {
        // code
    }
    Things started to work when I put the code in a different order:

    Code (CSharp):

    #pragma kernel CSMain1

    [numthreads(threadsPerGroup_X,threadsPerGroup_Y,1)]
    void CSMain1 (uint3 id : SV_DispatchThreadID)
    {
        // code
    }

    #pragma kernel CSMain2

    [numthreads(threadsPerGroup_X,threadsPerGroup_Y,1)]
    void CSMain2 (uint3 id : SV_DispatchThreadID)
    {
        // code
    }
    There are a few (of the only few) compute shader tutorials around which describe my initial implementation above, which didn't work. So anyone who has the same problem as me might try changing the order of the code lines as shown above.
     
    bb8_1 likes this.
  5. sirshelley

    sirshelley

    Joined:
    Aug 14, 2014
    Posts:
    26
    Hello there! I have had similar issues, and here is my solution:
    Don't use GetData in real time! (It took me 2 months to find the reason, which I can explain another time if you like.)
    Instead:
    Make an array of compute buffers, a minimum of 2: one for read, one for write.
    Declare a RW structured buffer in the compute shader, filled with junk data that gets overwritten by the compute.
    On the next frame (this is important; I suggest yielding on WaitForEndOfFrame), copy the contents of the write buffer into the read buffer.
    Then use these cloned buffers for whatever you need; GetData does work on static buffers in this case.
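    A minimal C# sketch of that double-buffering idea (all names here are placeholders, not from this thread; it assumes a kernel with numthreads(64,1,1) that writes one float4 per element into "resultBuffer"). Instead of copying the write buffer into the read buffer each frame, this version simply swaps the two references, which gives the same separation between the buffer being written and the buffer being read:

    Code (CSharp):

    using UnityEngine;

    public class PingPongBufferExample : MonoBehaviour
    {
        public ComputeShader compute;   // assumed to have a "CSMain" kernel writing to "resultBuffer"
        public Material material;       // assumed to read "resultBuffer"

        const int count = 1024;
        ComputeBuffer readBuffer, writeBuffer;
        int kernel;

        void OnEnable()
        {
            readBuffer  = new ComputeBuffer(count, sizeof(float) * 4);
            writeBuffer = new ComputeBuffer(count, sizeof(float) * 4);
            kernel = compute.FindKernel("CSMain");
        }

        void Update()
        {
            // The GPU writes into the "write" buffer this frame...
            compute.SetBuffer(kernel, "resultBuffer", writeBuffer);
            compute.Dispatch(kernel, count / 64, 1, 1);   // assumes numthreads(64,1,1)

            // ...while rendering (or a debug GetData) only ever touches last frame's results.
            material.SetBuffer("resultBuffer", readBuffer);

            // Swap roles for the next frame instead of copying the data across.
            var tmp = readBuffer;
            readBuffer = writeBuffer;
            writeBuffer = tmp;
        }

        void OnDisable()
        {
            readBuffer.Release();
            writeBuffer.Release();
        }
    }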
     
  6. ModLunar

    ModLunar

    Joined:
    Oct 16, 2016
    Posts:
    374
    What do you mean, don't use GetData(...) in realtime? Is there a way to get the data without calling GetData(...) that we should use instead?
     
  7. djarcas

    djarcas

    Joined:
    Nov 15, 2012
    Posts:
    246
    What happens if the ComputeShader takes more than 2 frames...?
     
  8. Cambesa

    Cambesa

    Joined:
    Jun 6, 2011
    Posts:
    119
    Can today be another time? I'm also trying to run a shader multiple times after each other, but I cannot find a way to check whether a shader is done. Does GetData() wait for the shader to complete? I'm also trying to run this in the editor, so I cannot use yield return new WaitForEndOfFrame().
     
  9. sirshelley

    sirshelley

    Joined:
    Aug 14, 2014
    Posts:
    26
    GetData simply overwrites the CPU buffer with whatever is available. The best way to ensure you have completion is a loop and an extra variable for a checksum, then continue.
     
  10. Darren-R

    Darren-R

    Joined:
    Feb 11, 2014
    Posts:
    66
    Hey sirshelley, I am trying to figure this out but I'm really new to compute shaders. How can I make the checksum variable and extract it?

    Cheers!
     
  11. ModLunar

    ModLunar

    Joined:
    Oct 16, 2016
    Posts:
    374
    'Makes you really wonder where people find this information out...
     
  12. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    Yes it does.

    No it doesn't.

    Let's try and dispel some myths here.... :)

    If you dispatch a ComputeShader, then call ComputeBuffer.GetData on a buffer written by the shader, you should see the data that the ComputeShader wrote. There is no "it's still in progress" or anything like that. It should be the data as it is after the ComputeShader ran. No exceptions. Anything different is a bug, which should be reported.

    FWIW, you shouldn't use ComputeBuffer.GetData for anything other than debugging purposes, because reading data back from the GPU is slow, and doing it immediately after dispatching the ComputeShader is making the CPU wait for the GPU to finish doing something. Graphics pipelines are not designed to send data back to the CPU quickly. GPUs like to be told what to do by CPUs, and then left to get on with it. The exception to this guideline is if you use the AsyncGPUReadback API. This API lets you ask the GPU to send something back, without waiting for it to happen. Then, it's up to you to ask if it's done yet, and not block the CPU if the data isn't ready yet. The GPU will usually send it back in something like 1-3 frames. If you're in a situation where you feel like you need the data back immediately, it may be time to rethink what you're doing and do some redesigning of your algorithm.
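    A rough sketch of that AsyncGPUReadback pattern (the buffer, kernel and property names are made up for illustration, and it assumes a kernel with numthreads(64,1,1)): dispatch the compute shader, request the readback without blocking, then poll the request on later frames instead of calling GetData.

    Code (CSharp):

    using Unity.Collections;
    using UnityEngine;
    using UnityEngine.Rendering;

    public class AsyncReadbackExample : MonoBehaviour
    {
        public ComputeShader compute;   // assumed to have a "CSMain" kernel writing to "resultBuffer"

        ComputeBuffer resultBuffer;
        AsyncGPUReadbackRequest request;
        bool requestPending;

        void Start()
        {
            resultBuffer = new ComputeBuffer(1024, sizeof(float));
            int kernel = compute.FindKernel("CSMain");
            compute.SetBuffer(kernel, "resultBuffer", resultBuffer);
            compute.Dispatch(kernel, 1024 / 64, 1, 1);   // assumes numthreads(64,1,1)

            // Ask for the data without blocking; the GPU delivers it a frame or a few later.
            request = AsyncGPUReadback.Request(resultBuffer);
            requestPending = true;
        }

        void Update()
        {
            if (requestPending && request.done)
            {
                requestPending = false;
                if (!request.hasError)
                {
                    NativeArray<float> data = request.GetData<float>();
                    Debug.Log("Readback arrived, first value: " + data[0]);
                }
            }
        }

        void OnDestroy()
        {
            resultBuffer.Release();
        }
    }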
     
  13. ModLunar

    ModLunar

    Joined:
    Oct 16, 2016
    Posts:
    374
    @richardkettlewell Apologies if that came off sounding rude, thanks so much for your explanation, that helps a lot! :)

    This would be really great info for the docs page on ComputeBuffer.GetData if possible, I'll leave a suggestion on that page as well.
     
    Opeth001 and richardkettlewell like this.
  14. Darren-R

    Darren-R

    Joined:
    Feb 11, 2014
    Posts:
    66
    Thanks for the reply @richardkettlewell, I've found it extremely difficult to find info on compute shaders, it's like researching the holy grail :), so thank you Richard!
     
    richardkettlewell and ModLunar like this.
  15. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    I'm sorting out getting this added to the docs :)
     
    sstrong and ModLunar like this.
  16. talofen

    talofen

    Joined:
    Jan 1, 2019
    Posts:
    40
    Is there a way to know whether the ComputeShader kernel has completed, without issuing a GetData() call?
    I'm asking because I have a compute shader that needs to be called thousands of times per second, so I'm calling Dispatch() in FixedUpdate(), inside a loop:

    Code (CSharp):

    for (int i = 0; i < numberOfStepsPerFixedUpdate; i++)
    {
        shader.Dispatch(khUpdateSimulation, Mathf.CeilToInt((float)vertices.Length / THREADGROUP_SIZE), 1, 1);
    }
    The problem is that I want to run as many loop iterations as possible by changing numberOfStepsPerFixedUpdate dynamically, but I need to check that I'm not issuing more iterations than the GPU is able to process. I tried many workarounds, but none looks perfect:

    1. Adding a dummy.GetData() call to retrieve a dummy buffer I don't actually need, and timing how long the loop has run for. If too long, reduce the iterations; if too little, increase them.
    2. Monitoring the frame rate, and lowering the loop count if the frame rate goes down.

    But idea number 1 adds a very costly (performance-wise) GetData() call that I really don't need... not a good solution.
    Idea number 2 does not really work well, because if the frame rate goes down for whatever reason (CPU?) I get an unwanted reduction of the loop count.

    I would need something that tells me "okay, now your dispatch calls have completed" without a performance hit...
    Any ideas?
    Thank you

    I forgot: at this time, the target platform is Windows and the API is DirectX11.
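    One possible workaround, sketched here purely as an illustration (it is not from this thread, and the kernel and buffer names below are placeholders): after queuing a batch of dispatches, request an async readback of just the first 4 bytes of a buffer the kernel writes. Since GPU commands complete in order, once that tiny readback reports done, the dispatches queued before it have finished too, and only 4 bytes ever travel back to the CPU.

    Code (CSharp):

    using UnityEngine;
    using UnityEngine.Rendering;

    public class DispatchThrottleExample : MonoBehaviour
    {
        public ComputeShader shader;              // assumed to contain an "UpdateSimulation" kernel
        public int stepsPerFixedUpdate = 8;

        ComputeBuffer simulationBuffer;            // placeholder for whatever the kernel writes
        AsyncGPUReadbackRequest fenceRequest;
        bool waiting;
        int kernel;

        void OnEnable()
        {
            simulationBuffer = new ComputeBuffer(1024, sizeof(float));
            kernel = shader.FindKernel("UpdateSimulation");
            shader.SetBuffer(kernel, "simulationBuffer", simulationBuffer);
        }

        void FixedUpdate()
        {
            // If the previous batch has not been reported as finished yet,
            // skip this step instead of piling even more work onto the GPU queue.
            if (waiting && !fenceRequest.done)
                return;
            waiting = false;

            for (int i = 0; i < stepsPerFixedUpdate; i++)
                shader.Dispatch(kernel, 1024 / 64, 1, 1);   // assumes numthreads(64,1,1)

            // Read back only the first 4 bytes, purely as a completion marker:
            // when this request is done, the dispatches queued before it are done too.
            fenceRequest = AsyncGPUReadback.Request(simulationBuffer, 4, 0);
            waiting = true;
        }

        void OnDisable()
        {
            simulationBuffer.Release();
        }
    }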
     
    Last edited: Sep 2, 2020
  17. methusalah999

    methusalah999

    Joined:
    May 22, 2017
    Posts:
    643
    I don't know of any way to know whether a dispatch has been completed on the GPU. GetData won't let you know that either, because it is slow.

    I don't know exactly what you are trying to achieve, but if you try to maximize the dispatch iterations while keeping a target fps, you can run a growing number of iterations and stop growing it when the frame rate gets to the target, including a margin. It will only work if your compute shader threads have little execution divergence, that is. Also, your graphics card will melt your computer ^^
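    A minimal sketch of that grow-until-the-target idea (the kernel name, group counts and frame budget below are made up): increase the per-frame iteration count while the frame time stays under budget, and back off when it goes over.

    Code (CSharp):

    using UnityEngine;

    public class AdaptiveDispatchCount : MonoBehaviour
    {
        public ComputeShader shader;          // assumed to contain a "CSMain" kernel
        public float frameBudgetMs = 20f;     // target frame time, including a margin

        int iterations = 1;
        int kernel;

        void Start()
        {
            kernel = shader.FindKernel("CSMain");
        }

        void Update()
        {
            // Grow the batch while we are under budget, back off when we are over it.
            float lastFrameMs = Time.unscaledDeltaTime * 1000f;
            if (lastFrameMs < frameBudgetMs)
                iterations++;
            else
                iterations = Mathf.Max(1, iterations - 1);

            for (int i = 0; i < iterations; i++)
                shader.Dispatch(kernel, 64, 1, 1);
        }
    }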
     
  18. kadd11

    kadd11

    Joined:
    Mar 11, 2018
    Posts:
    33
    Sorry to necro, but a somewhat related question:

    @richardkettlewell , would you mind answering/confirming a couple of GPU-side ordering questions/hypotheses?
    1. If I call ComputeShader.Dispatch and use a buffer written to by that compute shader in a standard rendering shader (invoked by Graphics.DrawMeshX), that compute is guaranteed to finish before the draw call happens, is that correct? (Basing this on the GraphicsFence docs, specifically "GPUFences do not need to be used to synchronise a GPU task writing to a resource that will be read as an input by another".)
    2. If I dispatch a compute shader kernel several times in a row via CommandBuffer.Dispatch, and all of the dispatches write to the same AppendStructuredBuffer, is each dispatch guaranteed to finish before the next one runs?
     
    ModLunar likes this.
  19. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    1. Yes
    2. Yes

    :)
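    For illustration, a small sketch of the second scenario (kernel, buffer and property names are placeholders, not from this thread): several dispatches recorded into one CommandBuffer, followed by a draw that consumes the append buffer they all write to. Because the GPU executes the recorded commands in order, the draw sees the output of every dispatch.

    Code (CSharp):

    using UnityEngine;
    using UnityEngine.Rendering;

    public class OrderedDispatchExample : MonoBehaviour
    {
        public ComputeShader compute;   // assumed to have a "CSMain" kernel appending to "appendBuffer"
        public Material material;       // assumed to read "appendBuffer" in its shader
        public Mesh mesh;

        ComputeBuffer appendBuffer;

        void Start()
        {
            appendBuffer = new ComputeBuffer(4096, sizeof(float) * 3, ComputeBufferType.Append);
            appendBuffer.SetCounterValue(0);
            int kernel = compute.FindKernel("CSMain");
            material.SetBuffer("appendBuffer", appendBuffer);

            var cmd = new CommandBuffer { name = "Ordered dispatches" };
            for (int i = 0; i < 4; i++)
            {
                cmd.SetComputeBufferParam(compute, kernel, "appendBuffer", appendBuffer);
                cmd.DispatchCompute(compute, kernel, 8, 8, 1);   // each dispatch completes before the next starts
            }
            cmd.DrawMesh(mesh, Matrix4x4.identity, material);     // sees everything the dispatches appended
            Graphics.ExecuteCommandBuffer(cmd);
        }

        void OnDestroy()
        {
            appendBuffer.Release();
        }
    }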
     
    Subcreation, ModLunar and kadd11 like this.
  20. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    ModLunar likes this.
  21. kadd11

    kadd11

    Joined:
    Mar 11, 2018
    Posts:
    33
    Short and sweet, thanks!
     
    ModLunar and richardkettlewell like this.
  22. kadd11

    kadd11

    Joined:
    Mar 11, 2018
    Posts:
    33
    Actually, one quick follow-up @richardkettlewell related to render and compute shader order: Is it kosher to schedule a compute shader mid-render-pipeline (for example using a command buffer with CameraEvent.BeforeForwardAlpha)? And does that answer change for tile-based GPUs?

    Asking out of general curiosity because I'd like to better understand the relationship between the rendering and compute queue/hardware, but also admittedly I've had a day full of fun debugging that exact scenario on a Quest 2. Specifically, scheduling a compute shader in that way causes issues with the content rendered after the camera event, even if the following draw calls have no dependency on the compute shader output (i.e., a simple unlit color shader). Either a) the following draw calls don't render at all (which seems to happen in single pass) or b) the tiles flicker in and out randomly (which happens in multi pass).

    If I take a capture with RenderDoc (even Oculus' fork of it), everything looks good in the capture, and it also works fine if I use Oculus Link. But running on the Quest directly has these issues, and I'm not sure if interleaving draw calls and compute shaders just isn't safe to do but happens to be handled well on the few other platforms I've tested on, or if it's a bug somewhere down the stack.
     
  23. DominiqueSandoz

    DominiqueSandoz

    Joined:
    Aug 29, 2017
    Posts:
    25
    I found this simple answer clarifying as hell for my own question. To further clarify: with 2. you are saying that the order of dispatches is guaranteed. Is this only guaranteed when using CommandBuffers, or also when simply issuing separate Dispatch calls? Because it seems so, although I am not sure why.

    Consider the following code:
    Code (csharp):

    void Run(Texture3D inTex, RenderTexture outTex, ComputeShader compute)
    {
        var kernelA = compute.FindKernel("Reset");
        compute.SetTexture(kernelA, "ReadTexture", inTex);
        compute.SetTexture(kernelA, "WriteTexture", outTex);
        compute.Dispatch(kernelA, 64, 64, 64);
        Graphics.CopyTexture(outTex, inTex);

        var kernelB = compute.FindKernel("Iterative");

        for (int i = 0; i < 10; i++)
        {
            compute.SetTexture(kernelB, "ReadTexture", inTex);
            compute.SetTexture(kernelB, "WriteTexture", outTex);
            compute.Dispatch(kernelB, 64, 64, 64);
            Graphics.CopyTexture(outTex, inTex);
        }
    }
    It "seems" to run fine - can I rely in this situation that all Invocations work with the results from the previous invocation? If yes, why?
     
  24. methusalah999

    methusalah999

    Joined:
    May 22, 2017
    Posts:
    643
    Basically, everything that is asked of the GPU is added to a queue and the GPU will execute the queue in order. That is true for any get, set, dispatch, copy, render, etc. I don't know of any GPU racing situation with the Unity API (which is great).
     
    Subcreation likes this.
  25. kadd11

    kadd11

    Joined:
    Mar 11, 2018
    Posts:
    33
    I want to add clarification as well, because the statement that Unity handles the dependencies and everything runs in the order specified tripped me up for a while on tile based GPUs. I don't think it's Unity's fault, I think it's just a result of how tile GPUs work (but I could be wrong). My scenario was:

    - Draw call which writes to an AppendStructuredBuffer
    - Compute dispatch: consumes the AppendStructuredBuffer from the previous draw call and outputs some data into another structured buffer
    - Draw call consumes the structured buffer from the compute shader

    While this worked on my discrete GPU, it did not work on the tile-based GPUs that I tried. I had misunderstood how tile-based GPUs work. I thought they went through each draw call and rendered it tile by tile. Instead, they go through each tile and render all draw calls for that tile before moving on to the next tile. Which means, without explicitly adding a break somehow (like saving off the results of the first draw call to a render texture, running the compute, and then continuing the subsequent draw calls), I don't think there's a way to interleave draw calls and compute dispatches on tile-based GPUs.

    Maybe this is obvious to those who know, but wanted to mention it in case anyone else hits a similar issue. Also, feel free to correct me if I'm wrong.
     
    KrabbyQ likes this.
  26. DominiqueSandoz

    DominiqueSandoz

    Joined:
    Aug 29, 2017
    Posts:
    25
    Thank you so much. If this is the case, it is quite awesome. From what I understand now, there is one queue on the GPU where all commands get added and are executed in order.

    Does that mean then:

    1. There is exactly one command at a time running on the GPU, and the next only starts once this one has finished completely?
    2. If that is true, why does a long-running compute shader not interrupt the scene rendering?

    For 2., I get the feeling from my tests that running compute shaders is somehow "free", as it doesn't seem to impact my scene FPS at all and completes silently after a few frames (using AsyncGPUReadback). Is this me hallucinating?
     
  27. methusalah999

    methusalah999

    Joined:
    May 22, 2017
    Posts:
    643
    I wouldn't say that the GPU executes everything in order on its side, because I simply don't know. But as far as I know, you just can't produce a race condition. Data modified by your compute shader can't be read by the rendering shader before the compute shader finishes execution, if you dispatch it before the rendering.

    The rendering will "wait" for your dispatch to finish. It is not exactly waiting, because the rendering and the compute shader are using the same computational resources, so it is more accurate to say that the rendering will be delayed by your compute shader execution. To my knowledge, there is no way to run a compute shader "asynchronously". Unintuitively, a GPU does not have multiple pipelines of execution like a CPU and is not multi-threaded in the same way. A GPU executes a batch of the same computation at once and waits for it to finish before getting to the next batch.

    So, when you observe that the result of your compute shader is available only a few frames after its dispatch, there are multiple things to consider:
    • if the compute shader is dispatched during frame "n" (in Update, LateUpdate, PreCull, etc.), the rendering of frame "n" won't occur before it finishes.
    • if your GPU runs at 30 fps because of long rendering or compute shaders, and the CPU runs at 60 fps, then the CPU code will wait for the GPU to finish. In the Profiler, you will see a line "GFXWaitForRenderThread" that lowers your CPU fps and makes it wait for the GPU.
    • when the CPU waits for the GPU, it is one logic frame ahead. The CPU will only wait for the GPU to finish the rendering of the previous image, so it can run the logic of frame "n+1" while the GPU is rendering and drawing the image of frame "n" on the screen. If you think about it, the CPU "could" compute multiple logic frames ahead in this situation, but many computations need to know the duration of the previous frame (Time.deltaTime), so it has to wait.
    • when you use AsyncGPUReadback, you ask the GPU to send data back to CPU memory after the compute shader is finished. This is useful to avoid waiting for the GPU in the middle of your logic, and to keep the CPU one logic frame ahead if it is fast enough. So the data of frame "n" will most likely be computed by the GPU at some point during the "n+1" CPU logic frame.
    • most importantly with AsyncGPUReadback, the buffer will be written asynchronously on the CPU side. The hardware data bus between the CPU and the GPU is very fast for downloading (CPU => GPU) and very slow for uploading (GPU => CPU). This is due to the fact that, by design, a GPU only consumes data from the game logic and sends the result straight to the screen. GPUs are now often used for computations whose results need to go back to the logic (GPGPU), but the download direction is still the more important one in most situations. So, depending on the size of the data, the transfer may take several frames.
    All of this is only my personal knowledge and should be taken as such ^^
     
    bb8_1 and Subcreation like this.
  28. DominiqueSandoz

    DominiqueSandoz

    Joined:
    Aug 29, 2017
    Posts:
    25
    Holy. I had _no_ idea... and it also means that GPUs are freakishly fast (at suitable tasks), but also that we're effectively dealing with a single core in terms of parallelism of tasks when looking at the GPU. Really unintuitive.

    I thank you very much for these insights, extremely valuable. Following your logic, I would also conclude that compute shaders, no matter how big, are guaranteed to be done by the next frame, while transferring the data to the CPU can span several frames?
     
  29. methusalah999

    methusalah999

    Joined:
    May 22, 2017
    Posts:
    643
    Compute shader kernels that are dispatched during frame "n" are guaranteed to be executed before the rendering of frame "n", which is fortunate because the frame to render generally depends on the compute shader result. Of course, at that moment, the result of the kernels is only available in buffers stored in GPU memory, not on the CPU side.

    If your kernels are too slow, then of course you will delay the rendering and impact your FPS.

    As for the transfer of data from GPU memory to CPU memory: if you ask for it asynchronously, then it can be ready any number of frames in the future, depending on the volume of data to transfer. If you ask for the data synchronously, with buffer.GetData for example, then you will suffer a double delay. First, your CPU code will halt and wait until the GPU has executed its whole queue up to this point. Second, the CPU code will have to wait again for the data to be uploaded from GPU memory, which is quite slow.

    If you must do that, you should at least do it as late as possible in the CPU frame logic (LateUpdate, PreCull and such, not in Update), so there are as few tasks as possible remaining in the GPU queue to wait for.
     
  30. JJRivers

    JJRivers

    Joined:
    Oct 16, 2018
    Posts:
    137
    That's one way to think of it. They are SIMD processors, which is an entirely different beast from a modern CPU architecture; if you want a deeper understanding, googling something like "CPU vs SIMD" can help you a lot.

    So yes, it's essentially a hugely wide single core, but only in terms of the wavefront, which is processed in hardware-vendor-specific sizes (commonly 32 on NVIDIA and 64 on AMD). Generally, when you're not doing wildly random access of memory, it's best to keep numthreads as even groupings of those sizes where possible (not just multiples, as is often implied).

    The parallelism comes from the fact that there are thousands of blocks of these wavefronts (also known as warps), and the GPU tries to schedule them in a manner that maintains maximum occupancy.

    Example with hypothetical numbers: the GPU has 128 warps available. You could schedule 64 of those for one compute shader, and at the same time it could schedule 64 other compute shaders of one warp each, provided they do not rely on the first, 64-warp one as a dependency. (At the low-level API level you could disobey that, but that's really bad mojo, and it's why Unity not only guarantees but enforces that Dispatch() calls for two dependent shaders are executed in order.)

    I'm fairly junior with compute shaders too, but I believe at least most of what I said here is correct. If you do spot an error, please correct me soonest!
     
    DominiqueSandoz likes this.
  31. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,364
    Thanks for that!
    Is this still gospel with all the new graphics APIs of 2021+?
     
  32. BoltScripts

    BoltScripts

    Joined:
    Feb 12, 2015
    Posts:
    20
    One thing that slightly confuses me is, in the situation of dispatching a shader several times in a loop, how does changing a variable in the shader work with that?
    Is something like SetInt simply queued the same way as a Dispatch call?
     
  33. methusalah999

    methusalah999

    Joined:
    May 22, 2017
    Posts:
    643
    Everything is queued altogether indeed, the setting and the dispatching.
     
    Subcreation likes this.
  34. BoltScripts

    BoltScripts

    Joined:
    Feb 12, 2015
    Posts:
    20
    Alright, sick. Everything makes sense and is as it should be.
    And with that, I think this is the definitive thread to answer all the questions I had about how gpu compute scheduling works. :)
     
    Subcreation and methusalah999 like this.
  35. BoltScripts

    BoltScripts

    Joined:
    Feb 12, 2015
    Posts:
    20
    Hate to be back here, but I do seem to have encountered an issue; it looks like a bug.
    Code (CSharp):

    ComputeBuffer countBuff = new ComputeBuffer(gMesh.chunks.Length, sizeof(int));

    for (int i = 0; i < gMesh.chunks.Length; i++) {
        var chunk = gMesh.chunks[i];
        int dispatchCount = chunk.terrainTriCount * gMesh.instanceCount;

        compute.SetVector(_chunkPosID, chunk.chunkPos);
        compute.SetInt(chunkID, i);
        compute.SetInt(dispatchCountID, dispatchCount);

        posKernel.DispatchByCount(dispatchCount);

        ComputeBuffer.CopyCount(posBuffer, countBuff, i * sizeof(int));
    }
    In this I just need to get the actual count of appended instances per chunk. This code works fine on my PC, but if I try it on my phone, it seems to be massively out of sync or something and produces results like this:
    [Attached screenshot of the logged per-chunk counts]

    It works fine on mobile as well if I use GetData after every dispatch, but that obviously is not ideal.
    Am I missing something big here?
    One weird thing that makes me feel like it's my fault somehow is that the values are consistent between runs: it's always 157496 and it always switches at index 74.
    For now I've just solved this by doing an interlocked add to get the counts, which works without needing CopyCount, but it still seems like this was a bug.
     
    Last edited: Feb 24, 2023