Question Performance issue with DrawMeshInstancedProcedural()

Discussion in 'General Graphics' started by safinean, Sep 22, 2021.

  1. safinean

    safinean

    Joined:
    Nov 23, 2016
    Posts:
    2
    Hi everyone,

    I am very new to Unity and gamedev (I'm coming from the webdev world), so I apologize in advance if my question has already been addressed or if this thread is not posted in the proper place.

    Following a tutorial, I've been trying to render a simple animated sine wave in 3D using a compute shader. Basically, I have a C# script that uses the compute shader to calculate the position of each point of the wave (they share a compute buffer), and then draws each point using Graphics.DrawMeshInstancedProcedural().

    When the wave is made of 100 points, the script runs at roughly 30 fps.
    When the wave is made of 2500 points, the frame rate drops to 5 fps...
    So there is obviously a performance issue here. I tried to understand what was going on using the profiler, but I am a bit stuck.

    Here is what it looks like:
    [Screenshot: Capture d’écran 2021-09-22 à 12.59.24.png]

    [Screenshot: Capture d’écran 2021-09-22 à 13.01.36.png]
    My understanding is that the "Editor loop" is at fault here. However, I built and ran the app, and the issue persists outside of the editor...
    I do not have both the "Scene" and the "Game" tabs visible at the same time.

    To be precise, I'm using Unity 2021.1.21f1 on a MacBook Pro (macOS 11.5.2).

    I hope I have been clear and complete enough.
    Any idea what I've been doing wrong?

    Thanks.
     


  2. mgear

    mgear

    Joined:
    Aug 3, 2010
    Posts:
    9,448
  3. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    This means you are overloading your GPU. Since 2500 elements is nothing even for integrated GPUs, there's probably something wrong with the way you coded your algorithm. Please show your compute shader and the code you're using to dispatch it.
     
    richardkettlewell likes this.
  4. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    To add to the above: what mesh are you using for each point? If each "point" is a very high-resolution mesh, that may explain the poor performance.
     
  5. safinean

    safinean

    Joined:
    Nov 23, 2016
    Posts:
    2
    Thank you all for your help.

    @Neto_Kokku Here is the compute shader in question:

    Code (CSharp):
    #pragma kernel WaveKernel
    #define PI 3.14159265358979323846

    RWStructuredBuffer<float3> _Positions;
    uint _Resolution;
    float _Step;
    float _Time;

    float2 GetUV (uint3 id) {
        return (id.xy + 0.5) * _Step - 1.0;
    }

    void SetPosition(uint3 id, float3 position) {
        if (id.x < _Resolution && id.y < _Resolution) {
            _Positions[id.x + id.y * _Resolution] = position;
        }
    }

    float3 Wave(float u, float v, float t)
    {
        float3 p;
        p.x = u;
        p.y = sin(PI * (u + v + t));
        p.z = v;
        return p;
    }

    [numthreads(8, 8, 1)]
    void WaveKernel (uint3 id: SV_DispatchThreadID) {
        float2 uv = GetUV(id);
        SetPosition(id, Wave(uv.x, uv.y, _Time));
    }
    As you can see, I'm rendering the function f(x, z) = sin(PI * (x + z + t)), where x and z belong to the interval [-1, 1].
    _Resolution is the number of points to be rendered along each axis.
    _Step is the size of a single point: each point is a 1x1x1 cube scaled by _Step, with _Step = 2f / _Resolution.
    _Time is the elapsed time t that animates the wave.
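
    For reference, here is a minimal sketch of how a C# script might set these parameters, dispatch the kernel, and draw the points. It is not the script used in this thread (that one was not posted). The class name `GpuWave`, its serialized fields, and the `MaxResolution` cap are hypothetical, and the sketch assumes the material's shader reads `_Positions` and `_Step`.

```csharp
using UnityEngine;

// Hypothetical driver script, for illustration only.
public class GpuWave : MonoBehaviour
{
    [SerializeField] ComputeShader computeShader; // assumed to contain WaveKernel
    [SerializeField] Material material;           // assumed to use a procedural-instancing shader
    [SerializeField] Mesh mesh;                   // e.g. Unity's built-in cube
    [SerializeField, Range(10, 400)] int resolution = 50;

    const int MaxResolution = 400; // assumed upper bound, matches the Range above

    static readonly int PositionsId = Shader.PropertyToID("_Positions");
    static readonly int ResolutionId = Shader.PropertyToID("_Resolution");
    static readonly int StepId = Shader.PropertyToID("_Step");
    static readonly int TimeId = Shader.PropertyToID("_Time");

    ComputeBuffer positionsBuffer;
    int kernel;

    void OnEnable()
    {
        // One float3 (3 * 4 bytes) per point, allocated once for the largest grid
        // so the buffer is not recreated every frame.
        positionsBuffer = new ComputeBuffer(MaxResolution * MaxResolution, 3 * sizeof(float));
        kernel = computeShader.FindKernel("WaveKernel");
    }

    void OnDisable()
    {
        positionsBuffer.Release();
        positionsBuffer = null;
    }

    void Update()
    {
        float step = 2f / resolution; // the domain [-1, 1] split into `resolution` cells

        computeShader.SetInt(ResolutionId, resolution);
        computeShader.SetFloat(StepId, step);
        computeShader.SetFloat(TimeId, Time.time);
        computeShader.SetBuffer(kernel, PositionsId, positionsBuffer);

        // The kernel declares numthreads(8, 8, 1), so round the group count up;
        // the bounds check in SetPosition discards the extra threads.
        int groups = Mathf.CeilToInt(resolution / 8f);
        computeShader.Dispatch(kernel, groups, groups, 1);

        material.SetBuffer(PositionsId, positionsBuffer);
        material.SetFloat(StepId, step);

        // Bounds cover the [-1, 1] cube plus one cube width, used for culling.
        var bounds = new Bounds(Vector3.zero, Vector3.one * (2f + step));
        Graphics.DrawMeshInstancedProcedural(mesh, 0, material, bounds, resolution * resolution);
    }
}
```

    With this layout, a resolution of 50 gives the 2500 points mentioned above, drawn in a single call with 7 x 7 thread groups dispatched per frame.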

    @richardkettlewell I'm using a simple cube mesh, but I think I found the root of the issue.

    The tutorial has us create a Standard Surface Shader so that the color of each cube depends on its position in space.
    Here is the full shader:
    Code (CSharp):
    Shader "Graph/Point Surface GPU" {
       Properties {
          _Smoothness("Smoothness", Range(0,1)) = 0.5
       }

       SubShader {
          CGPROGRAM
          #pragma surface ConfigureSurface Standard fullforwardshadows addshadow
          #pragma instancing_options assumeuniformscaling procedural:ConfigureProcedural
          #pragma target 4.5

          struct Input {
             float3 worldPos;
          };

          float _Smoothness;
          float _Step; // Scale of each cube; read by ConfigureProcedural.

          #if defined(UNITY_PROCEDURAL_INSTANCING_ENABLED)
          StructuredBuffer<float3> _Positions;
          #endif

          // Builds the per-instance object-to-world matrix from the position buffer.
          void ConfigureProcedural () {
             #if defined(UNITY_PROCEDURAL_INSTANCING_ENABLED)
             float3 position = _Positions[unity_InstanceID];

             unity_ObjectToWorld = 0.0;
             unity_ObjectToWorld._m03_m13_m23_m33 = float4(position, 1.0);
             unity_ObjectToWorld._m00_m11_m22 = _Step;
             #endif
          }

          // Colors each cube from its world position, remapped from [-1, 1] to [0, 1].
          void ConfigureSurface (Input input, inout SurfaceOutputStandard surface) {
             surface.Smoothness = _Smoothness;
             surface.Albedo = saturate((input.worldPos * 0.5) + 0.5);
          }
          ENDCG
       }

       FallBack "Diffuse"
    }
    It turns out I have its equivalent for the URP (a Lit Shader Graph), which runs quite well: 30 fps for around 160,000 points!
    The built-in render pipeline (BRP) version, on the other hand, is awfully slow...
     
    Last edited: Sep 23, 2021
    richardkettlewell likes this.
  6. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    That looks pretty straightforward. What does your rendering C# code look like?

    I just realized you're using DrawMeshInstancedProcedural, not the indirect variant (which is the one I'm used to), so maybe there's something funky going on with that on the Unity side. I suggest using RenderDoc to capture a frame and see what is really going on on the GPU side. It's a very useful tool when working with compute shaders and more custom rendering.
     
  7. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    2,285
    It should be pretty much the same as the indirect one. It's actually faster if your script knows the draw counts, because you don't have to waste time assigning them to the argsBuffer, like you do for the indirect version.
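
    To illustrate the difference, here is a rough sketch of the two calls side by side. The class name `DrawCallComparison`, its fields, and the bounds size are hypothetical, and the material is assumed to use a shader set up for procedural instancing. The indirect path fills the usual five-uint arguments buffer, while the procedural path passes the instance count directly.

```csharp
using UnityEngine;

// Hypothetical comparison script, for illustration only.
public class DrawCallComparison : MonoBehaviour
{
    [SerializeField] Mesh mesh;
    [SerializeField] Material material;   // assumed to support procedural instancing
    [SerializeField] int instanceCount = 2500;
    [SerializeField] bool useIndirect;

    ComputeBuffer argsBuffer;
    readonly uint[] args = new uint[5];

    void OnEnable()
    {
        // Indirect draws read their arguments from a GPU buffer of five uints.
        argsBuffer = new ComputeBuffer(1, args.Length * sizeof(uint),
                                       ComputeBufferType.IndirectArguments);
    }

    void OnDisable()
    {
        argsBuffer.Release();
        argsBuffer = null;
    }

    void Update()
    {
        // Assumed bounds, large enough to contain every instance.
        var bounds = new Bounds(Vector3.zero, Vector3.one * 3f);

        if (useIndirect)
        {
            // Extra work on the indirect path: write the draw arguments to the buffer.
            args[0] = mesh.GetIndexCount(0);   // index count per instance
            args[1] = (uint)instanceCount;     // instance count
            args[2] = mesh.GetIndexStart(0);   // start index location
            args[3] = mesh.GetBaseVertex(0);   // base vertex location
            args[4] = 0;                       // start instance location
            argsBuffer.SetData(args);

            Graphics.DrawMeshInstancedIndirect(mesh, 0, material, bounds, argsBuffer);
        }
        else
        {
            // The script already knows the count, so it is passed directly.
            Graphics.DrawMeshInstancedProcedural(mesh, 0, material, bounds, instanceCount);
        }
    }
}
```

    The indirect variant mainly pays off when the instance count is produced on the GPU (for example by a culling compute shader) and never needs to be read back by the script.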
     
    safinean likes this.