Search Unity

  1. Unity 6 Preview is now available. To find out what's new, have a look at our Unity 6 Preview blog post.
    Dismiss Notice
  2. Unity is excited to announce that we will be collaborating with TheXPlace for a summer game jam from June 13 - June 19. Learn more.
    Dismiss Notice

Bug Graphics.RenderMeshInstanced generates GC and is slower than Graphics.DrawMeshInstanced?

Discussion in 'General Graphics' started by funkyCoty, Nov 30, 2022.

  1. funkyCoty

    funkyCoty

    Joined:
    May 22, 2018
    Posts:
    738
    Hey there. I have two snippets of code, both doing the same rendering. For the first, I'm using
    Graphics.RenderMeshInstanced(), because it has support for passing in NativeArray<Matrix4x4>, which is convenient for my use case (heavy job system usage). Unfortunately, I've noticed that doing it this way generates garbage every frame and takes over twice as long as the 'old' method??

    Code (CSharp):
    1. // 9.3 KB garbage per frame, ~0.25ms
    2. for (var splitIndex = 0; splitIndex < trsData.Length; ++splitIndex)
    3. {
    4.     //..
    5.    
    6.     var renderParams = new RenderParams(material);
    7.         renderParams.matProps = propertyBlock;
    8.         renderParams.lightProbeUsage = unityLightProbeUsage;
    9.         renderParams.shadowCastingMode = shadowsMode;
    10.    
    11.     Graphics.RenderMeshInstanced(renderParams, mesh, 0, renderGroup, renderGroup.Length, 0);
    12. }
    Here's an example of using Graphics.DrawMeshInstanced(), with a cached copy of the NativeArray<Matrix4x4> copied into a Matrix4x4. It's much faster and does not generate garbage.

    Code (CSharp):
    1. // 0 KB garbage per frame, ~0.10ms
    2. for (var splitIndex = 0; splitIndex < trsDataFallback.Length; ++splitIndex)
    3. {
    4.     //..
    5.  
    6.     Graphics.DrawMeshInstanced(mesh, 0, material, renderGroup, renderGroup.Length,
    7.         propertyBlock, shadowsMode, true, layer, null, unityLightProbeUsage);
    8. }
    Surely, this is a bug in the new API? If anyone has any ideas as to what is going wrong here, please let me know.
     
    arkano22 likes this.
  2. vectorized-runner

    vectorized-runner

    Joined:
    Jan 22, 2018
    Posts:
    398
    Yes, it's generating a lot of GC because they've made the method generic and they're trying to Parse your struct layout on each call using Marshal.OffsetOf (reason for GC), who knows why. Everybody wanted a NativeArray overload of the DrawMeshInstanced since 2018 but yet we get this. No replies from graphics team in years. We're using the old DrawMeshInstanced so still have to pay the cost of copying matrices to temporary array.
     
    Unifikation likes this.
  3. funkyCoty

    funkyCoty

    Joined:
    May 22, 2018
    Posts:
    738
    Whoever is responsible for this needs to be held accountable.
     
    Pr0x1d, Unifikation and tommox86 like this.
  4. burningmime

    burningmime

    Joined:
    Jan 25, 2014
    Posts:
    845
    It explains why in the docs: https://docs.unity3d.com/ScriptReference/Graphics.RenderMeshInstanced.html

    It doesn't explain why they need to do it every call instead of caching it per type.

    BTW, if you want to write to a managed array from a job, you can pin it. I'm doing this because CommandBuffer's DrawMeshInstanced still only accepts a
    Matrix4x4[]
    . To do this, set up a list of pins for each frame in a manager class (eg your render pipeline if doing a custom SRP):

    Code (CSharp):
    1. private readonly List<GCHandle> _arrayHandles = new();
    2. public T* pin<T>(T[] array) where T : unmanaged
    3. {
    4.     if(array == null || array.Length == 0)
    5.         throw new Exception("Array must be non-null and non-empty");
    6.     GCHandle gcHandle = GCHandle.Alloc(array, GCHandleType.Pinned);
    7.     _arrayHandles.Add(gcHandle);
    8.     return (T*) gcHandle.AddrOfPinnedObject();
    9. }
    When scheduling a job, pin it with that method and put the pointer into the job like this

    Code (CSharp):
    1. private unsafe struct ConfigureInstanceDataJob : IJob {
    2.     [NativeDisableUnsafePtrRestriction] private Matrix4x4* pointMatrices;
    And then at the end of frame (don't forget!!!)

    Code (CSharp):
    1. foreach(GCHandle pinned in _arrayHandles)
    2.     pinned.Free();
    3. _arrayHandles.Clear();
     
    Last edited: Dec 4, 2022
    Sluggy likes this.
  5. funkyCoty

    funkyCoty

    Joined:
    May 22, 2018
    Posts:
    738
    I'm aware of this method, but it has two drawbacks:
    • requires "unsafe", which I think makes asset store buyers wary, and adds an extra step in documentation for any plugins using this method
    • cannot be Burst compiled (correct me if I'm wrong)
     
  6. burningmime

    burningmime

    Joined:
    Jan 25, 2014
    Posts:
    845
    It can be burst compiled with pointers. But, yes, it requires unsafe code.
     
  7. vectorized-runner

    vectorized-runner

    Joined:
    Jan 22, 2018
    Posts:
    398
    The problem is you can't convert Pointers or Spans or NativeArrays back to regular C# array without allocation. The DrawMeshInstanced only accepts that.
    Even if they added a startIndex parameter that would solve it.
     
  8. tommox86

    tommox86

    Joined:
    Apr 30, 2015
    Posts:
    88
    yes can confirm garbage collection for RenderMeshInstanced. however it seems to run faster than drawmeshinstanced on my tests. fps spikes down due to gc. is there any indication for when this will be fixed?
    UPDATE
    drawmeshinstanced runs at the same speed the issue was converting my nativearray to matrix4x4 array. drawmeshinstanced has zero gc and runs at same speed
     
    Last edited: Dec 10, 2022
  9. sebas77

    sebas77

    Joined:
    Nov 4, 2011
    Posts:
    1,652
    After I converted my code to use RenderMeshInstanced I had to revert everything, the GC work is so intense it doubles the execution time.

    I wish I could rely on unity stuff :(
     
    Unifikation likes this.
  10. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    418
    The first thing I noticed is you are creating params every loop. If you take this out of your loop, I should think the GC will be zero.


    Code (CSharp):
    1. void Start()
    2. {
    3.       renderParams = new RenderParams(material);
    4. }
    5.  
    6. ...
    7.  
    8. for (var splitIndex = 0; splitIndex < trsData.Length; ++splitIndex)
    9. {
    10.     //..
    11.  
    12.         renderParams.matProps = propertyBlock;
    13.         renderParams.lightProbeUsage = unityLightProbeUsage;
    14.         renderParams.shadowCastingMode = shadowsMode;
    15.  
    16.     Graphics.RenderMeshInstanced(renderParams, mesh, 0, renderGroup, renderGroup.Length, 0);
    17. }
    Every time you use new for an object instance that exists, you replace it and therefore the GC needs to clean up the old object from memory.
     
  11. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    418
    Further to that, if every render is going to use the same values, then you can take that out of the loop too. Anything which stays the same for every loop, doesn't need to be in the loop, and can be initialised just before.

    Code (CSharp):
    1. void Start()
    2. {
    3.       renderParams = new RenderParams(material);
    4. }
    5. ...
    6.  
    7. renderParams.matProps = propertyBlock;
    8. renderParams.lightProbeUsage = unityLightProbeUsage;
    9. renderParams.shadowCastingMode = shadowsMode;
    10.  
    11. for (var splitIndex = 0; splitIndex < trsData.Length; ++splitIndex)
    12. {
    13.     //..
    14.  
    15.     Graphics.RenderMeshInstanced(renderParams, mesh, 0, renderGroup, renderGroup.Length, 0);
    16. }
     
  12. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    418
    Garbage Collection, hope this helps

    myString = new string "Bob"

    Mem 001-003 = "Bob"


    myString = new string "Cat" (you wouldn't do this, but just to explain new)

    Mem 001-003 Cleanup
    Mem 004-006 = "Cat" 'new' is saying you want new memory allocated, even though it's the same size


    myString = "Bob"

    Mem 004-006 = "Bob" Same size no need to cleanup


    myString = "Bobby"

    Mem 004-006 Cleanup
    Mem 007-011 = "Bobby" As Bobby is 5 letters, a new location is required, hence GC on the old
     
  13. sebas77

    sebas77

    Joined:
    Nov 4, 2011
    Posts:
    1,652
    are you sure this will make it 0 GC? I don't want to test it again and find out it's still allocating. Also the material changes, so unless the RenderMeshInstanced will make a copy of the params, I cannot do what you are suggesting.
     
  14. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    418
    You would just need to add the material change in the loop if it changes. The problem of allocation is when you allocate new memory. I would hope that it grabs the RenderParams by reference in the function
     
  15. sebas77

    sebas77

    Joined:
    Nov 4, 2011
    Posts:
    1,652
    Sorry I want just to be clear here. You are not saying this because I do new RenderParams right? Because RenderParams is a struct, not a class and therefore new RenderParams doesn't allocate by itself.
     
  16. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    418
    *When you create a struct object using the new operator, it gets created and the appropriate constructor is called. Unlike classes, structs can be instantiated without using the new operator. If you do not use new, the fields will remain unassigned and the object cannot be used until all of the fields are initialized.

    When you are using new RenderParams, you don't have to fill in all the other values, they are set as a default. But it is still 'creating' a new struct, meaning the old one is marked for deletion. If you are using a struct persistently, then create that outside of where you use it, Class level: RenderParams myRP (allocated, uninitialised, not usable), Start: myRP = new RenderParams (allocate new, set as defaults) , Loop: renderParams.matProps = propertyBlock; (change a property)
    When you change a part of the struct, you are not allocating new memory because it already exists from the class level, stored in memory. I hope I'm right anyway!

    Here's a test I made ages ago, 1million blades of grass, 170+fps, zero allocation. This is part of the construct, although I'm using RenderMeshIndirect here which is a little complex to setup, but showing the setup I use (cut out all the extra code)


    Code (CSharp):
    1. public class DrawGrass2 : MonoBehaviour
    2. {
    3.     RenderParams renderParams;
    4.  
    5. public void Start()
    6. {
    7.         // Init with a material
    8.         renderParams = new RenderParams(material);
    9.         // Set receive shadows
    10.         renderParams.receiveShadows = true;
    11.         // Set shadow casting mode
    12.         renderParams.shadowCastingMode = UnityEngine.Rendering.ShadowCastingMode.Off;
    13.         // MaterialPropertyBlock so you can access the custom StructuredBuffer in the shader
    14.         renderParams.matProps = new MaterialPropertyBlock();
    15.         // Send PositionsBuffer to the GPU ShaderBuffer
    16.         renderParams.matProps.SetBuffer("_PerInstanceData", GB_positionsBuffer);
    17. }
    18.  
    19.     public void Update()
    20.     {
    21.         if (meshRenderer.isVisible)
    22.         {
    23.             // Instances to draw
    24.             GB_commandData[0].instanceCount = (uint)drawInstances;
    25.  
    26.             GB_commandBuffer.SetData(GB_commandData);
    27.      
    28.             Graphics.RenderMeshIndirect(in renderParams, mesh, GB_commandBuffer);
    29.         }
    30.     }
    31. }
    upload_2023-1-6_19-42-20.png
     
  17. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    418
    Ps. you can't see the count/verts etc. in Statistics, because it's Indirect. Good thing with Indirect is no limit on the number of instances
     
  18. sebas77

    sebas77

    Joined:
    Nov 4, 2011
    Posts:
    1,652
    Graphics.RenderMeshIndirect behaves differently than RenderMeshInstanced though. It's possible that the problem is linked only to RenderMeshInstanced
     
  19. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    418
    And you might be right about it not allocating if you use new, it may just reset the struct since it calls the default constructor. But the fact is in your code, you are creating a new struct by saying RenderParams/var renderParams = new RenderParams(); You are saying a new one exists, in every loop, so it would be marked for deletion at the end of every loop.

    It makes more sense if you think of it as:

    allocate new RenderParams renderParams = default RenderParams();
     
  20. Trindenberg

    Trindenberg

    Joined:
    Dec 3, 2017
    Posts:
    418
    I have used it somewhere but can't remember where. And I might have encountered this issue too, reading the comments above
     
  21. AdrienVR

    AdrienVR

    Joined:
    Apr 6, 2016
    Posts:
    17
    I don't know if someone already did, so I just reported a bug to Unity for this.
    Looking at the Unity's code in RenderInstancedDataLayout it seems that it could be very easily fixed by also using
    if (!(t == typeof (Matrix4x4))) for "prevObjectToWorld" and "renderingLayerMask" as it is already done for "objectToWorld".
    It is strange that they do it in 2 different ways to do the same operation. Looks like a mistake.

    Code (CSharp):
    1. // Assembly location: C:\Program Files\2022.2.12f1\Editor\Data\Managed\UnityEngine\UnityEngine.CoreModule.dll
    2. using System;
    3. using System.Runtime.InteropServices;
    4.  
    5. namespace UnityEngine
    6. {
    7.   internal readonly struct RenderInstancedDataLayout
    8.   {
    9.     public RenderInstancedDataLayout(System.Type t)
    10.     {
    11.       this.size = Marshal.SizeOf(t);
    12.       IntPtr num1;
    13.       int num2;
    14.       if (!(t == typeof (Matrix4x4)))
    15.       {
    16.         num1 = Marshal.OffsetOf(t, "objectToWorld");
    17.         num2 = num1.ToInt32();
    18.       }
    19.       else
    20.         num2 = 0;
    21.       this.offsetObjectToWorld = num2;
    22.       try
    23.       {
    24.         num1 = Marshal.OffsetOf(t, "prevObjectToWorld");
    25.         this.offsetPrevObjectToWorld = num1.ToInt32();
    26.       }
    27.       catch (ArgumentException ex)
    28.       {
    29.         this.offsetPrevObjectToWorld = -1;
    30.       }
    31.       try
    32.       {
    33.         this.offsetRenderingLayerMask = Marshal.OffsetOf(t, "renderingLayerMask").ToInt32();
    34.       }
    35.       catch (ArgumentException ex)
    36.       {
    37.         this.offsetRenderingLayerMask = -1;
    38.       }
    39.     }
    40.  
    41.     public int size { get; }
    42.  
    43.     public int offsetObjectToWorld { get; }
    44.  
    45.     public int offsetPrevObjectToWorld { get; }
    46.  
    47.     public int offsetRenderingLayerMask { get; }
    48.   }
    49. }
     
  22. Spiraseven

    Spiraseven

    Joined:
    May 2, 2014
    Posts:
    25
    Do you have the link to the bug report? I'd like to use the fixed version of this when it happens!
     
  23. AdrienVR

    AdrienVR

    Joined:
    Apr 6, 2016
    Posts:
    17
    Sure, I will post it when Unity will have confirmed the bug.
     
    Spiraseven likes this.
  24. AdrienVR

    AdrienVR

    Joined:
    Apr 6, 2016
    Posts:
    17
  25. JarkkoUnity

    JarkkoUnity

    Unity Technologies

    Joined:
    Feb 1, 2021
    Posts:
    17
    This issue was already fixed in Sep 2022, but for some reason the fix didn't land to trunk and fell through the cracks without proper tracking, so now it's reworked and should land soon *fingers crossed*. Heads up: the fix will cache the layout.
     
    abegue and AdrienVR like this.
  26. XaneFeather

    XaneFeather

    Joined:
    Sep 4, 2013
    Posts:
    101
    Encountered the exact same problem. Glad I found this thread!

    Thank you. Will wait for a fix, until then it I'll stick with DrawMeshInstanced for now.
    Will it be backported to 2021 or just 2022?
     
  27. Leonidas85

    Leonidas85

    Joined:
    Mar 11, 2015
    Posts:
    14
    Also checking here for a potential backport to 2022.
    Any news on that @JarkkoUnity?

    Also wondering what kind of speed up we could expect.
    I'm currently calling this 1000+ times per frame, rendering 35k+ instances in ~500 batches, but with a ~17ms CPU cost, which is obviously not tenable.

    I'm trying to stream a procedural world made up of fairly small modular pieces, so lots of instancing opportunities.
    Using default instancing takes too much time on CPU to create the batches, so I've written my own instance batch creation algorithm. It's super fast, but now I'm bottlenecking on actually submitting the batches using Graphics.RenderMeshInstanced()...

    If there could be a variant of Graphics.RenderMeshInstanced() that could be used multithreaded from a (burstable) job, that would certainly help things. Are there any plans or options for that?
     
  28. DylanF

    DylanF

    Joined:
    Jun 25, 2013
    Posts:
    55
    Leonidas85 and burningmime like this.
  29. burningmime

    burningmime

    Joined:
    Jan 25, 2014
    Posts:
    845
    Leonidas85 likes this.
  30. JarkkoUnity

    JarkkoUnity

    Unity Technologies

    Joined:
    Feb 1, 2021
    Posts:
    17
    @XaneFeather @Leonidas85 Yes, it's in my todo to backport to 2022 and 2021 this week.

    The performance should be on par with DrawMeshInstanced
     
    Last edited: May 2, 2023
  31. mrc_akka

    mrc_akka

    Joined:
    Mar 26, 2020
    Posts:
    24
    Last edited: Jul 11, 2023
  32. mrc_akka

    mrc_akka

    Joined:
    Mar 26, 2020
    Posts:
    24
  33. vectorized-runner

    vectorized-runner

    Joined:
    Jan 22, 2018
    Posts:
    398
  34. mrc_akka

    mrc_akka

    Joined:
    Mar 26, 2020
    Posts:
    24
    BTW using commandbuffer API seems better in performance terms.