Question GPU instancing is much slower than normal instantiating.

Discussion in 'General Graphics' started by JonteBoi, Aug 7, 2023.

  1. JonteBoi

    JonteBoi

    Joined:
    Aug 4, 2023
    Posts:
    1
    I've been trying to create grass for a while now and learned about GPU instancing. However, when I tried to implement it and created ~100k grass objects, the game ran at about 20 fps, whereas just using Instantiate I got about 40 fps. I think it might be that my GPU is a lot worse than my CPU (a Ryzen 5 5600 with a GT 1030), but it shouldn't make THAT big of a difference, right? I'm very new to GPU instancing, so it could also be something wrong with my code.


    Creating grass using instantiate (~40fps):
    Code (CSharp):
    RaycastHit hit;

    void Start()
    {
        spawnGrass();
    }

    void spawnGrass()
    {
        // Walk the chunk grid row by row (Z), then chunk by chunk (X).
        for (int minValZ = 0 - chunkSize, maxValZ = 0, rowsGenerated = 0, a = 0; rowsGenerated < chunkAmount; rowsGenerated++, maxValZ += chunkSize, minValZ += chunkSize, a += chunkAmount)
        {
            for (int minValX = 0 - chunkSize, maxValX = 0, chunksGenerated = 0, grassCounter = 0; chunksGenerated < chunkAmount; maxValX += chunkSize, minValX += chunkSize, chunksGenerated++, grassCounter += grassDensity)
            {
                for (int i = 0; i < Instances; i++)  // Instances = 10
                {
                    // Pick a random point inside the chunk and raycast straight down.
                    Vector3 grassPos = new Vector3(UnityEngine.Random.Range(chunkStart.x + maxValX, chunkStart.x + minValX), 35, UnityEngine.Random.Range(chunkStart.y + maxValZ, chunkStart.y + minValZ));
                    Vector3 dir = new Vector3(0, -1, 0);

                    if (Physics.Raycast(grassPos, dir, out hit))
                    {
                        // Only place grass above the grassRange height.
                        if (hit.point.y > grassRange)
                        {
                            GameObject grassClone = Instantiate(GrassLOD1, hit.point, Quaternion.Euler(0, UnityEngine.Random.Range(0, 360), UnityEngine.Random.Range(87, 93)));
                            grassClone.transform.localScale = scale;
                        }
                    }
                }
            }
        }
    }


    GPU instanced grass (~20fps):

    Code (CSharp):
    private List<List<Matrix4x4>> Batches = new List<List<Matrix4x4>>();
    RaycastHit hit;

    void Start()
    {
        spawnGrass();
    }

    private void Update()
    {
        RenderBatches();
    }

    void spawnGrass()
    {
        for (int minValZ = 0 - chunkSize, maxValZ = 0, rowsGenerated = 0, a = 0; rowsGenerated < chunkAmount; rowsGenerated++, maxValZ += chunkSize, minValZ += chunkSize, a += chunkAmount)
        {
            for (int minValX = 0 - chunkSize, maxValX = 0, chunksGenerated = 0, grassCounter = 0; chunksGenerated < chunkAmount; maxValX += chunkSize, minValX += chunkSize, chunksGenerated++, grassCounter += grassDensity)
            {
                for (int i = 0; i < Instances; i++)
                {
                    Vector3 grassPos = new Vector3(UnityEngine.Random.Range(chunkStart.x + maxValX, chunkStart.x + minValX), 35, UnityEngine.Random.Range(chunkStart.y + maxValZ, chunkStart.y + minValZ));
                    Vector3 dir = new Vector3(0, -1, 0);

                    if (Physics.Raycast(grassPos, dir, out hit))
                    {
                        if (hit.point.y > grassRange)
                        {
                            int addedMatrices = 0;
                            Batches.Add(new List<Matrix4x4>());
                            if (addedMatrices < 1000)
                            {
                                Batches[Batches.Count - 1].Add(Matrix4x4.TRS(hit.point, Quaternion.Euler(rotation), scale));
                                addedMatrices += 1;
                            }
                            else
                            {
                                Batches.Add(new List<Matrix4x4>());
                                addedMatrices = 0;
                            }
                        }
                    }
                }
            }
        }
    }

    private void RenderBatches()
    {
        foreach (var Batch in Batches)
        {
            for (int i = 0; i < mesh.subMeshCount; i++)
            {
                Graphics.DrawMeshInstanced(mesh, i, Materials[i], Batch);
            }
        }
    }
     
  2. DevDunk

    DevDunk

    Joined:
    Feb 13, 2020
    Posts:
    4,528
  3. nasos_333

    nasos_333

    Joined:
    Feb 13, 2013
    Posts:
    13,027
    Directly instantiating millions of grass blades is overkill and only works for tiny areas.

    Use a combination of pre-batched models and instantiation to get the best performance.
     
  4. kenamis

    kenamis

    Joined:
    Feb 5, 2015
    Posts:
    386
    Setting aside what the optimal grass rendering implementation would be, here is some advice on your comparison of Instantiate vs. GPU instancing:

    First, to clarify some terminology: even regular instantiated objects could be using GPU instancing. If they have a compatible shader, like the Standard shaders, and you toggle on GPU instancing on the material, then the native Unity rendering pipeline will attempt to GPU instance those objects too. Sometimes they're broken into different batches for various reasons. Also, with URP or HDRP, if you're using the SRP Batcher, it will take precedence over GPU instancing.
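
    As a minimal sketch of what "toggling on GPU instancing" looks like from script: the material checkbox corresponds to the Material.enableInstancing property. The component name and the grassPrefab field below are hypothetical, and this assumes the prefab's shader supports instancing.

```csharp
using UnityEngine;

// Hypothetical helper, not from the thread: enables GPU instancing on the
// materials of a prefab before it is instantiated many times.
public class EnableGrassInstancing : MonoBehaviour
{
    public GameObject grassPrefab; // assumed to be assigned in the Inspector

    void Awake()
    {
        foreach (Renderer r in grassPrefab.GetComponentsInChildren<Renderer>(true))
        {
            // sharedMaterials avoids creating per-renderer material copies,
            // which would stop the clones from being batched together.
            foreach (Material m in r.sharedMaterials)
            {
                if (m != null)
                {
                    // Same effect as ticking "Enable GPU Instancing" on the material.
                    m.enableInstancing = true;
                }
            }
        }
    }
}
```

    In the editor this changes the material asset itself, so ticking the checkbox once in the Inspector is usually the simpler route.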

    So, Graphics API calls such as DrawMeshInstanced bypass a lot of the native Unity rendering, most notably culling. That's where I think you should look first to see the difference. In your DrawMeshInstanced example, it will always try to draw all of those instances on the GPU, no matter where the camera is. The instantiated objects, on the other hand, are automatically frustum culled when they are not in view. So, look in the Frame Debugger to compare what's actually being sent to the GPU in each of your scenarios.
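
    To illustrate the culling point, here is a rough sketch (not a definitive implementation) that skips batches lying outside the camera frustum before calling DrawMeshInstanced. It also fills each batch until it is full. In the posted spawnGrass, addedMatrices is declared and a new list is added inside the innermost if, so each list appears to end up holding a single matrix, which would mean one draw call per blade. The class name, the Batch type, AddInstance, and the cam field are assumptions made up for this example.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Illustrative sketch; the names below are hypothetical, not from the thread.
public class CulledGrassRenderer : MonoBehaviour
{
    // Graphics.DrawMeshInstanced draws at most 1023 instances per call.
    const int MaxInstancesPerBatch = 1023;

    class Batch
    {
        public readonly List<Matrix4x4> matrices = new List<Matrix4x4>();
        public Bounds bounds; // world-space box around the instance positions
    }

    public Mesh mesh;
    public Material[] materials; // one per sub-mesh, with GPU instancing enabled
    public Camera cam;           // assumed to be assigned in the Inspector

    readonly List<Batch> batches = new List<Batch>();
    readonly Plane[] frustumPlanes = new Plane[6];

    // Call once per placed blade, e.g. from the raycast loop after a valid hit.
    public void AddInstance(Vector3 position, Quaternion rotation, Vector3 scale)
    {
        // Start a new batch only when there is none yet or the last one is full.
        if (batches.Count == 0 ||
            batches[batches.Count - 1].matrices.Count >= MaxInstancesPerBatch)
        {
            batches.Add(new Batch { bounds = new Bounds(position, Vector3.zero) });
        }

        Batch current = batches[batches.Count - 1];
        current.matrices.Add(Matrix4x4.TRS(position, rotation, scale));

        // Grow the box to contain this position. It ignores the mesh extents,
        // so some padding may be needed to avoid popping at screen edges.
        current.bounds.Encapsulate(position);
    }

    void Update()
    {
        GeometryUtility.CalculateFrustumPlanes(cam, frustumPlanes);

        foreach (Batch batch in batches)
        {
            // Skip batches whose bounding box is entirely outside the view.
            if (!GeometryUtility.TestPlanesAABB(frustumPlanes, batch.bounds))
            {
                continue;
            }

            for (int i = 0; i < mesh.subMeshCount; i++)
            {
                Graphics.DrawMeshInstanced(mesh, i, materials[i], batch.matrices);
            }
        }
    }
}
```

    The culling here is per batch, so it only helps if the instances in a batch are close together, for example by filling batches chunk by chunk as the spawn loops already do. The Frame Debugger should show whether the draw call count actually drops.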