Turning off culling of RenderMeshSystemV2

Discussion in 'Graphics for ECS' started by Anibers2, Apr 1, 2019.

  1. Anibers2


    Mar 30, 2019
    Hi, I'm updating from an older version of RenderMeshSystem to the latest and find that RenderMeshSystemV2 spends ~80ms culling all the objects because I have a lot of them. I understand that in MegaCity culling is a must because the meshes are complicated. But in my situation the meshes are quite simple, so hardware culling is enough and culling on the CPU is not worth it. Is there a way to turn off the software culling, or are there tags/components that make entities draw directly?
    Last edited: Apr 1, 2019
  2. digitaliliad


    Jul 1, 2018
    I, too, am curious about this. I've been trying to use Disabled tags to get RenderMeshSystemV2 to ignore some of my entities, but this doesn't seem to have much of an impact on performance.
  3. Radu392


    Jan 6, 2016
    Anyone figured this out? I know there's the FrozenRenderSceneTag component, but that only works for entities that don't move. I'd like to never cull even moving entities, because just like OP, my meshes are simple.
    Last edited: Oct 31, 2019
  4. charleshendry


    Jan 7, 2018
    My understanding was that any entity with a WorldRenderBounds component would be checked for culling, so removing this would solve it.
  5. Radu392


    Jan 6, 2016
    Removing either WorldRenderBounds or ChunkWorldRenderBounds has no effect, it's automatically added back in by some backend system. Only by removing LocalToWorld do those two components go away, but if I remove that, then the entity will no longer be rendered.

    I took a look at the CacheMeshBatchRendererGroup() method inside the RenderMeshSystemV2 class. That's the method that's causing this huge culling lag. I tried modifying it every which way for about an hour before giving up with no good results.

    All I want is to have no culling... It's the sole reason why RenderMesh is performing so badly at a medium scale. Surely rendering 20k entities with a simple quad is faster than doing this time consuming culling operation on the main thread.
  6. tertle


    Jan 25, 2011
    Why are we trying to hack this? Can't you just disable the system if you don't want it to run?

    this.World.GetOrCreateSystem<RenderBoundsUpdateSystem>().Enabled = false;
  7. Radu392


    Jan 6, 2016
    That system is not the problem, but I tried your suggestion nonetheless. Nothing changed.

    Like I said, the method killing the CPU is CacheMeshBatchRendererGroup() inside the RenderMeshSystemV2 class. More specifically, this code inside that method:

    Code (CSharp):
    Profiler.BeginSample("Add New Batches");
    {
        var sortedChunkIndex = 0;
        for (int i = 0; i < sharedRenderCount; i++)
        {
            var startSortedChunkIndex = sortedChunkIndex;
            var endSortedChunkIndex = startSortedChunkIndex + sharedRendererCounts[i];

            while (sortedChunkIndex < endSortedChunkIndex)
            {
                var chunkIndex = sortedChunkIndices[sortedChunkIndex];
                var chunk = chunks[chunkIndex];
                var rendererSharedComponentIndex = chunk.GetSharedComponentIndex(RenderMeshType);

                var editorRenderDataIndex = chunk.GetSharedComponentIndex(editorRenderDataType);
                var editorRenderData = m_DefaultEditorRenderData;
                if (editorRenderDataIndex != -1)
                    editorRenderData = EntityManager.GetSharedComponentData<EditorRenderData>(editorRenderDataIndex);

                var remainingEntitySlots = 1023;
                var flippedWinding = chunk.Has(meshInstanceFlippedTagType);
                int instanceCount = chunk.Count;
                int startSortedIndex = sortedChunkIndex;
                int batchChunkCount = 1;

                remainingEntitySlots -= chunk.Count;
                sortedChunkIndex++;

                while (remainingEntitySlots > 0)
                {
                    if (sortedChunkIndex >= endSortedChunkIndex)
                        break;

                    var nextChunkIndex = sortedChunkIndices[sortedChunkIndex];
                    var nextChunk = chunks[nextChunkIndex];
                    if (nextChunk.Count > remainingEntitySlots)
                        break;

                    var nextFlippedWinding = nextChunk.Has(meshInstanceFlippedTagType);
                    if (nextFlippedWinding != flippedWinding)
                        break;

    #if UNITY_EDITOR
                    if (editorRenderDataIndex != nextChunk.GetSharedComponentIndex(editorRenderDataType))
                        break;
    #endif

                    remainingEntitySlots -= nextChunk.Count;
                    instanceCount += nextChunk.Count;
                    batchChunkCount++;
                    sortedChunkIndex++;
                }

                m_InstancedRenderMeshBatchGroup.AddBatch(tag, rendererSharedComponentIndex, instanceCount, chunks, sortedChunkIndices, startSortedIndex, batchChunkCount, flippedWinding, editorRenderData);
            }
        }
    }
    Profiler.EndSample();
    Edit: I'm not personally using RenderMesh since I have my own rendering system now, but man, it would be cool if this wasn't a bottleneck, because my solution only works for a very specific use case: a 2D game where similar entities cannot change material properties.
    Last edited: Nov 1, 2019
  8. tertle


    Jan 25, 2011
    That backend system is RenderBoundsUpdateSystem so I was just responding to that. Without RenderBoundsUpdateSystem WorldRenderBounds will not be added to entities.

    Anyway have you benchmarked this at runtime out of interest?
  9. Radu392


    Jan 6, 2016
    Ah okay gotcha.

    Anyway yes, but not too in depth. I noticed a few things:

    1. Batching only occurs every frame if you have a job that runs every frame and accesses the translation component without marking it ReadOnly. Even if you don't actually change that translation component on any entity, simply not marking it ReadOnly will make the batching run. I'm not 100% certain about this, but I strongly think that is the case. I think that's why no matter how many building entities I create (10k, 20k, 50k), the render system doesn't care about them: they can only be moved on user input or destroyed via an event, so the jobs that can change their translation don't run every frame. The RenderMesh system takes almost 0 ms on them, so I'm fine with using it for those entities. However, that can't apply to unit entities, because their translation component can change every frame.

    2. I noticed that turning off the RenderBoundsUpdateSystem before creating entities does not render them and also stops the batching like you said it would but if you turn it off after they were created it does nothing. Batching still occurs. Toggling it off/on appears to have no effect. I have yet to try another thing: Manually update the RenderBoundsUpdateSystem myself instead of every frame through a combination of calling it myself and add/remove those troublesome components when needed. I doubt very much this would work, so I won't attempt it as I'm satisfied with my own solution. My own solution takes about 1/4th of the time with batching.

    3. Nothing changes between Editor and Build. Batching takes almost as long in a build as it does in the Editor.

    Earlier when I was attempting to hack my way through the system, I was trying to just say 'ok don't batch, just please render all chunks no matter where they are' because that's what my own implementation does anyway.
    Last edited: Nov 1, 2019
  10. DreamingImLatios


    Jun 3, 2017
    I decided to take a deep dive into this code to figure out how it could be optimized. This isn't an issue that affects me personally since my moving entities use GPU animation instancing, but since I figured out a means to optimize it, I figured I may as well describe it (don't care enough personally to test it). As it turns out, most of this culprit code is actually totally Burst-able, which I suspect would drastically reduce this performance issue.

    Inside the Add New Batches sample, there are three loops, forA, whileB, and whileC, where whileC is nested inside whileB. Inside whileB, there are two managed calls that we want to get out of these loops.

    The first is EntityManager.GetSharedComponentData<EditorRenderData>(editorRenderDataIndex). The result is only used in the second managed call, so we can just store that index somewhere else and make that call later.

    The second is m_InstancedRenderMeshBatchGroup.AddBatch. This function requires two arrays that are defined outside our loops and not modified inside them. It requires the shared component, which we know how to get by index (an index of -1 means the default). tag is constant in our loops as well. That just leaves us with 4 ints and a bool for arguments. Those are all blittable, which means we can pack them into a struct and store that in a NativeList.

    Now all our loop code can be dropped into an IJob with a writable NativeList and a few ReadOnly NativeArrays. The EntityManager call can be removed. And the AddBatch call can be replaced with adding to the NativeList. Tack on [BurstCompile] to this job, create an instance where these loops used to be, and call Run() on it.

    After the job, you can loop through the list, call GetSharedComponentData for each entry (using the default when the index is -1), and then call AddBatch. Lastly, don't forget to dispose the NativeList.
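To make the record-now, replay-later idea concrete, here is a minimal sketch in plain C# with no Unity types, so it is not the package's code. `BatchArgs`, `DeferredBatching`, `Collect` and `Replay` are made-up names for illustration. It assumes a single render-mesh group and leaves out the flipped-winding and editor-data checks. In the real system the collecting loop would presumably sit in the [BurstCompile] job and append to the NativeList, and the replay step is where AddBatch would be called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the "4 ints and a bool" that AddBatch needs.
public struct BatchArgs
{
    public int RendererSharedComponentIndex;
    public int StartSortedIndex;
    public int InstanceCount;
    public int BatchChunkCount;
    public bool FlippedWinding;
}

public static class DeferredBatching
{
    // Same per-batch instance limit as the loop in RenderMeshSystemV2.
    const int MaxInstancesPerBatch = 1023;

    // Phase 1: pure data work. Nothing here touches managed engine objects,
    // which is what makes the equivalent loop a candidate for a Burst job.
    public static List<BatchArgs> Collect(int[] chunkCounts, int rendererIndex)
    {
        var recorded = new List<BatchArgs>();
        int i = 0;
        while (i < chunkCounts.Length)
        {
            int start = i;
            int instances = chunkCounts[i];
            int remaining = MaxInstancesPerBatch - chunkCounts[i];
            int chunksInBatch = 1;
            i++;

            // Greedily append following chunks while they still fit.
            while (remaining > 0 && i < chunkCounts.Length && chunkCounts[i] <= remaining)
            {
                remaining -= chunkCounts[i];
                instances += chunkCounts[i];
                chunksInBatch++;
                i++;
            }

            // Record the arguments instead of calling AddBatch right here.
            recorded.Add(new BatchArgs
            {
                RendererSharedComponentIndex = rendererIndex,
                StartSortedIndex = start,
                InstanceCount = instances,
                BatchChunkCount = chunksInBatch,
                FlippedWinding = false, // winding check omitted in this sketch
            });
        }
        return recorded;
    }

    // Phase 2: replay on the main thread, where managed calls are allowed.
    public static void Replay(List<BatchArgs> recorded, Action<BatchArgs> addBatch)
    {
        foreach (var args in recorded)
            addBatch(args);
    }
}
```

With chunk counts of 500, 500, 100 and 1000, `Collect` records three batches: the first two chunks merge into one batch of 1000 instances, and the other two chunks each get a batch of their own.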

    If anyone with this performance issue is brave enough to try this out, I would love to see what kind of performance gains you get, if anything.
  11. Cell-i-Zenit


    Mar 11, 2016
    I would try it out if you could tell me how I can change such a system (or even just tell me which system it is :p)
  12. DreamingImLatios


    Jun 3, 2017
    I forget the details about embedding a package as it has been a while since I have had to do it, but it essentially involves copying the package out of the Library/PackageCache and pasting it into your packages folder in your project and then updating your manifest.json. I remember someone wrote a script to automatically embed a package for you in the PackageManager forums.

    As for editing the code, the system is RenderMeshSystemV2.cs. Just search for Profiler.BeginSample("Add New Batches"); and that will take you directly to the spot.
  13. Cell-i-Zenit


    Mar 11, 2016
    OK, I just got it "installed" and added a simple Debug.Log statement to check that it works:

    1. remove the render package from manifest.json
    2. copy the package from the lib cache folder somewhere else
    3. add debug statement
    4. add package by hand via packagemanager -> + -> local package
    5. done

    Now I am trying to get it into a job. I will post updates here on whether it works, and maybe you can all help me with that.
  14. Cell-i-Zenit


    Mar 11, 2016
    Ok here it is.

    So either I am doing something wrong and the code is S*** (which it is), or the whole thing is not doing anything at all. I would say it is ~10 fps slower than the old solution. I expected more of an improvement :( .

    One big thing I have problems with is setting the array size. I just "randomly" went with 500, but I don't know how I could calculate this better.

    So can you help me here and improve the code a bit? Just point me in the right direction and I will improve on it.

    Here is the job:

    Code (CSharp):
    using System;
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;
    using Unity.Mathematics;
    using Unity.Rendering;
    using UnityEngine;

    [BurstCompile]
    public struct BatchingJob : IJob, IDisposable
    {
        // Outputs: one entry per batch, read back on the main thread after the job.
        [WriteOnly] public NativeArray<bool> NativeFlipped;
        [WriteOnly] public NativeArray<int> NativeEditorRenderDataIndex;
        // x = rendererSharedComponentIndex, y = startSortedIndex,
        // z = instanceCount, w = batchChunkCount
        [WriteOnly] public NativeArray<int4> NativeDataArray1;

        // Single-element array holding the number of batches written.
        public NativeArray<int> NativeCount;

        // Inputs, copied from the locals in CacheMeshBatchRendererGroup().
        [ReadOnly] public NativeArray<int> SortedChunkIndices;
        [ReadOnly] public int SharedRenderCount;
        [ReadOnly] public ArchetypeChunkComponentType<RenderMeshFlippedWindingTag> MeshInstanceFlippedTagType;
        [ReadOnly] public ArchetypeChunkSharedComponentType<EditorRenderData> EditorRenderDataType;
        [ReadOnly] public NativeArray<int> SharedRendererCounts;
        [ReadOnly] public NativeArray<ArchetypeChunk> Chunks;
        [ReadOnly] public ArchetypeChunkSharedComponentType<RenderMesh> RenderMeshType;

        public void Execute()
        {
            var sortedChunkIndex = 0;
            var index = 0; // next free slot in the output arrays

            for (int i = 0; i < SharedRenderCount; i++)
            {
                var startSortedChunkIndex = sortedChunkIndex;
                var endSortedChunkIndex = startSortedChunkIndex + SharedRendererCounts[i];

                while (sortedChunkIndex < endSortedChunkIndex)
                {
                    var chunkIndex = SortedChunkIndices[sortedChunkIndex];
                    var chunk = Chunks[chunkIndex];
                    var rendererSharedComponentIndex = chunk.GetSharedComponentIndex(RenderMeshType);

                    var editorRenderDataIndex = chunk.GetSharedComponentIndex(EditorRenderDataType);

                    var remainingEntitySlots = 1023;
                    var flippedWinding = chunk.Has(MeshInstanceFlippedTagType);

                    int instanceCount = chunk.Count;
                    int startSortedIndex = sortedChunkIndex;
                    int batchChunkCount = 1;

                    remainingEntitySlots -= chunk.Count;
                    sortedChunkIndex++;

                    // Merge following chunks into this batch while they fit.
                    while (remainingEntitySlots > 0)
                    {
                        if (sortedChunkIndex >= endSortedChunkIndex) break;

                        var nextChunkIndex = SortedChunkIndices[sortedChunkIndex];
                        var nextChunk = Chunks[nextChunkIndex];
                        if (nextChunk.Count > remainingEntitySlots) break;

                        var nextFlippedWinding = nextChunk.Has(MeshInstanceFlippedTagType);
                        if (nextFlippedWinding != flippedWinding) break;

    #if UNITY_EDITOR
                        if (editorRenderDataIndex !=
                            nextChunk.GetSharedComponentIndex(EditorRenderDataType))
                            break;
    #endif

                        remainingEntitySlots -= nextChunk.Count;
                        instanceCount += nextChunk.Count;
                        batchChunkCount++;
                        sortedChunkIndex++;
                    }

                    // Record the AddBatch arguments instead of calling AddBatch here.
                    NativeFlipped[index] = flippedWinding;

                    NativeDataArray1[index] = new int4(rendererSharedComponentIndex,
                        startSortedIndex, instanceCount, batchChunkCount);

                    NativeEditorRenderDataIndex[index] = editorRenderDataIndex;

                    index++;
                }
            }

            NativeCount[0] = index;
        }

        public void Dispose()
        {
            NativeFlipped.Dispose();
            NativeEditorRenderDataIndex.Dispose();
            NativeDataArray1.Dispose();
            NativeCount.Dispose();
        }
    }
    And here is how the code is called:

    Code (CSharp):
    Profiler.BeginSample("Add New Batches");
    {
        // Hardcoded capacity of the output arrays: more than 500 batches will overflow them.
        var length = 500;
        var job = new BatchingJob()
        {
            NativeCount = new NativeArray<int>(1, Allocator.TempJob),
            NativeEditorRenderDataIndex = new NativeArray<int>(length, Allocator.TempJob),
            NativeDataArray1 = new NativeArray<int4>(length, Allocator.TempJob),
            NativeFlipped = new NativeArray<bool>(length, Allocator.TempJob),
            SharedRenderCount = sharedRenderCount,
            SharedRendererCounts = sharedRendererCounts,
            SortedChunkIndices = sortedChunkIndices,
            Chunks = chunks,
            MeshInstanceFlippedTagType = meshInstanceFlippedTagType,
            EditorRenderDataType = editorRenderDataType,
            RenderMeshType = RenderMeshType
        };

        // Run() executes the job immediately on the main thread.
        job.Run();

        // Replay the recorded batches; the managed calls happen here, outside the job.
        for (int i = 0; i < job.NativeCount[0]; i++)
        {
            var editorDataIndex = job.NativeEditorRenderDataIndex[i];
            EditorRenderData editorRenderData = m_DefaultEditorRenderData;

            if (editorDataIndex != -1)
            {
                editorRenderData = EntityManager.GetSharedComponentData<EditorRenderData>(editorDataIndex);
            }

            var data1 = job.NativeDataArray1[i];

            m_InstancedRenderMeshBatchGroup.AddBatch(tag, data1.x,
                data1.z, chunks, sortedChunkIndices, data1.y,
                data1.w, job.NativeFlipped[i],
                editorRenderData);
        }

        job.Dispose();
    }
    Profiler.EndSample();
  15. DreamingImLatios


    Jun 3, 2017
    A few things:
    1) Please share timeline profile screenshots with and without the modifications. Without this, it is impossible for me to know what is going on.
    2) Please share your jobs settings. Specifically whether or not leak detection and safety checks are enabled or disabled.
    3) Is there a reason you are using NativeArray instead of NativeList? I don't think this will make a huge difference other than removing that 500 hardcoded limit.
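On the array size question: as far as I can tell from the batching loop, every pass through the outer while consumes at least one chunk and emits exactly one batch, so the number of batches can never exceed the number of chunks. That would make something like `Chunks.Length` a safe size for a fixed array in place of the hardcoded 500. The plain C# sketch below only checks that reasoning on made-up chunk counts. `BatchBound` and `CountBatches` are hypothetical names, and the winding and editor-data checks are ignored (they can only split batches further, never past one batch per chunk).

```csharp
public static class BatchBound
{
    // Counts how many batches the greedy packing loop would emit for one
    // render-mesh group, given the entity count of each chunk in sorted order.
    public static int CountBatches(int[] chunkCounts, int maxPerBatch = 1023)
    {
        int batches = 0;
        int i = 0;
        while (i < chunkCounts.Length)
        {
            // The first chunk of a batch is always accepted.
            int remaining = maxPerBatch - chunkCounts[i];
            i++;

            // Following chunks join the batch only while they still fit.
            while (remaining > 0 && i < chunkCounts.Length && chunkCounts[i] <= remaining)
            {
                remaining -= chunkCounts[i];
                i++;
            }

            batches++;
        }
        return batches;
    }
}
```

Three chunks of 600 entities cannot share a batch, so they give three batches, which is the worst case of one batch per chunk. Three chunks of 100 entities fit into a single batch.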

    Lastly, if you are really struggling, if you have a test project you would be willing to share with me, I would be happy to try my hand at it this weekend.
  16. Cell-i-Zenit


    Mar 11, 2016
    Last edited: Dec 9, 2019
  17. DreamingImLatios


    Jun 3, 2017
    I will try to take a look sometime this week!

    In the meantime, it would be really helpful if you could share screenshots of your timeline recordings using each method, like what you posted in the third image here: That way I can make sure I am seeing the same issues you are seeing.

    In that timeline, you'll notice that in the latter half of "Add New Batches" you got this sort of comb-looking thing in the profiler. It's the gaps between the teeth of this comb that we are trying to get rid of.
  18. Cell-i-Zenit


    Mar 11, 2016

    After some tests it is actually a little bit faster, but definitely not significantly. Maybe 10 fps, and it feels much more "stable".
  19. DreamingImLatios


    Jun 3, 2017
    So I did some digging, and wow did I dig up some stuff I was not expecting.

    So the first thing to note is that those "AddBatch" samples that were acting like the teeth of the comb? Yeah...
    Those were only capturing the very first part of the AddBatch method. It turns out it was capturing the native API AddBatch call only. Why? I have no idea. It's relatively cheap.

    Do you see that little gray text block before the first AddBatch? That's the job we optimized, except in this profile capture I had Burst disabled on that job. Burst does help it out a little, but obviously this part is not the problem.

    So what is going on in these AddBatch routines? Well first is the native AddBatch call and some setup, which represents the gaps in my profile capture.

    Second is a loop through all the properties of the shader to find which ones are properties declared for instancing in ECS components. In our case, we only have one, which I suspect is the LocalToWorld matrix (or maybe color?). But I didn't bother to investigate that far. Once it finds the matching property, it gets the property array pointer and initializes that property's array to the default property value. Anyways, this array initialization work is probably the bulk of this loop's processing, right?

    1 millisecond of our precious frametime is spent by Unity trying to find the damn properties!

    For anyone who is brave enough to try and fix this, the optimization seems approachable. You'll need a Dictionary<Shader, int2> and a List<int>. The list contains the ECS TypeIndices of the properties the shaders reference, and the int2 is a start offset and length of the typeIndices for the given shader. If the shader is not in the Dictionary, you have to do that extra work of adding it and finding the typeIndices. But if it is in the dictionary, what used to be a bunch of string comparisons now becomes some integer lookups directly to what matters.
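A rough sketch of that cache in plain C#, so it can be tried outside Unity. This is not the package's code: `ShaderPropertyCache`, `GetRange` and the `findTypeIndices` callback are hypothetical names, the generic `TShader` stands in for UnityEngine.Shader, and a value tuple stands in for the int2 of start offset and length.

```csharp
using System;
using System.Collections.Generic;

// TShader is a stand-in for UnityEngine.Shader so the sketch compiles without Unity.
public sealed class ShaderPropertyCache<TShader>
{
    // shader -> (start offset, length) into the flat typeIndices list.
    private readonly Dictionary<TShader, (int Start, int Length)> ranges =
        new Dictionary<TShader, (int Start, int Length)>();

    // ECS type indices for all cached shaders, stored back to back.
    private readonly List<int> typeIndices = new List<int>();

    // How often the expensive property search actually ran.
    public int SlowLookups { get; private set; }

    public (int Start, int Length) GetRange(
        TShader shader, Func<TShader, IEnumerable<int>> findTypeIndices)
    {
        // Fast path: the shader was seen before, so no string comparisons are needed.
        if (ranges.TryGetValue(shader, out var range))
            return range;

        // Slow path: run the property search once and remember the result.
        SlowLookups++;
        int start = typeIndices.Count;
        typeIndices.AddRange(findTypeIndices(shader));
        range = (start, typeIndices.Count - start);
        ranges.Add(shader, range);
        return range;
    }

    public int TypeIndexAt(int i) => typeIndices[i];
}
```

The callback is where the existing loop over shader properties would go. It only runs the first time a given shader is seen; later calls for the same shader return the stored range directly.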

    Alright, so that's half the mystery. Now to the other half, the chunkCount loop.

    Here is where it does some gymnastics with the chunks and eventually copies the instanced properties into the shaderProperty arrays. And those copies are what are taking so much time, right?
    Again, no.

    The matrix copy is quite tiny, and the little guy to the right is the shader property (which I now suspect is color) copy. Which means it must be the code before that matrix copy that is slow.

    But why?

    I haven't figured that out yet. There's no interaction with Shaders or the graphics API at all in this part of the code. It's just accessing some array pointers from the chunks and then there's an Add to a NativeMultiHashMap. If anyone is willing to dig deeper, I would love to hear what you find!

    But all in all, it seems like there's room for an order of magnitude of improvement in this code path. I need to get some sleep now. Thanks for reading!
  20. Cell-i-Zenit


    Mar 11, 2016
    Thanks for your digging!

    Just a first idea after reading your text:

    Maybe the MultiHashMap is getting rebalanced multiple times and this is making it slower.
  21. laurentlavigne


    Aug 16, 2012
    Thanks for digging into that, @GilCat sent me over here.