Official New BatchRendererGroup API for 2022.1

Discussion in 'Graphics for ECS' started by joelv, Jan 26, 2022.

  1. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    Hi Everyone!

    We in the Hybrid Renderer team are excited to share some news on what we have been working on for the last year.

    Background
    The Hybrid Renderer is a way for Unity to use Entities and associated data for rendering without requiring you to round trip to GameObjects. Since its initial implementation, the Unity Engine side of this, the BatchRendererGroup (BRG), has not been very easy to use unless you know a lot of the implementation details.

    As the BRG provides a way to potentially issue more draw calls at much lower CPU cost, there have naturally been requests from you on how to use this interface. So as of Unity 2022.1, the BRG has been fully rewritten, which we hope brings better flexibility as well as ease of use.

    Where the BRG previously consisted of a set of pre-built batches with the same mesh and material, it now allows you to build your own draw commands in the OnPerformCulling callback or in (Burst compiled) jobs. Each draw command is like a DrawMeshInstanced call and can draw multiple instances with the same material and mesh, with the difference that they are much more efficient and flexible.

    Performance example
    Draw commands using the BRG are considerably cheaper than traditional GameObject rendering with regard to actual draw submission cost on the main and render threads. Direct comparison is tricky because responsibility for some work (culling, draw setup, and so on) is moved to you as the user.



    As an example, this URP test scene of roughly 24k GameObjects with varying meshes and materials has a baseline RenderCameraStack profiler marker of about 17 ms on an AMD 3970X. This is all serial main-thread time, and the render thread takes about the same because the main thread can't feed it quickly enough.



    Compared to a scene where the GameObjects have been converted to draws in a BRG, the same profiler marker takes about 0.8 ms on the main thread and about 0.2 ms on the render thread.



    Previously you could do this with Graphics.DrawMeshInstanced, of course, but it would require you to upload a lot of matrices every frame, as well as write custom shaders for any per-instance data you would like to override. With the new BRG it is possible to write shaders that support both this path and regular GameObjects, and we support URP/Lit and HDRP/Lit (among others) out of the box. Take these measurements with a grain of salt, but they do show that it's possible to render much more efficiently with the new API.

    Getting Started
    First, your project needs to use a Scriptable Render Pipeline (URP/HDRP or custom) as the new BRG interface is fully built upon the SRP Batcher. You then need to disable stripping of DOTS Instancing variants by setting “Build-time stripping of BatchRendererGroup variants” (under “Project Settings”, “Graphics”, “Shader Stripping”) to “Keep all”. The project also needs to have “Allow unsafe code” enabled (under “Project Settings”, “Script Compilation”).

    Currently OpenGL, GLES and WebGL are not supported.

    The following code is adapted from the SimpleExample test scene and script available in the URP BRG test project in the Graphics repository.

    BatchRendererGroup Object
    Next up you need to create the BatchRendererGroup and provide an OnPerformCulling callback method.

    Code (CSharp):
    void Start()
    {
        m_BRG = new BatchRendererGroup(OnPerformCulling, IntPtr.Zero);
        // Register resources
        // Create batch
    }
    This callback is the main entry for the BatchRendererGroup and will be called by Unity internals whenever visible objects are to be culled.

    Code (CSharp):
    public unsafe JobHandle OnPerformCulling(
        BatchRendererGroup rendererGroup,   // The BRG associated with this callback
        BatchCullingContext cullingContext, // Immutable input data for the operation
        BatchCullingOutput cullingOutput,   // Mutable output data (draw commands)
        IntPtr userContext)                 // Context the user can pass (IntPtr.Zero here)
    {
        // Do draw command generation work here
        return new JobHandle(); // A job handle saying when the output actually is ready
    }

    Register Resources
    The data output from the culling callback consists entirely of unmanaged types. This means that you can't directly reference objects such as instances of Mesh and Material; instead, you need to register them with the BRG before you use them. In a real-world project you should do this dynamically, but this example uses Start:

    Code (CSharp):
    m_MeshID = m_BRG.RegisterMesh(mesh);
    m_MaterialID = m_BRG.RegisterMaterial(material);
    The BRG object holds meshes and materials and updates its internal representation once per frame (after it processes the culling callback).

    Create Batch
    Apart from meshes and materials, the only way to add data to a specific draw call is through a batch and its associated metadata. The SRP shader code looks at the metadata for specific properties, determines whether the property lives in the batch data or in the material constant buffer (high bit set or not), and loads the data for the instance from the correct location. We provide a utility (UNITY_ACCESS_DOTS_INSTANCED_PROP_WITH_DEFAULT) in the render pipelines core package for this. For more information on how this works, please see the linked manual draft.

    Note: the layout and implementation here is just one way to do it, and the way we have implemented for the Hybrid Renderer. You can use the metadata values in any way you find suitable with custom shaders or a modified URP/HDRP.

    First you need to create a buffer big enough to hold on to all the data you intend to store. If you outgrow this size later there is an API to update the buffer for a specific batch.

    Code (CSharp):
    m_InstanceData = new GraphicsBuffer(GraphicsBuffer.Target.Raw,
                                        kBufferSizeInInts,
                                        sizeof(int));
    // Fill data to buffer (see attached example code)
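    For illustration, here is one possible way to compute the byteAddress offsets that the metadata entries use. This is a sketch under assumptions (three instances, 4x3 packed matrices, a zero-filled 64-byte header), and may differ from the attached example:

    ```csharp
    // Hypothetical layout (an assumption, not the attached example verbatim):
    // per-instance data packed back to back in the raw buffer.
    static class BufferLayout
    {
        public const uint kNumInstances = 3;
        public const uint kSizeOfPackedMatrix = sizeof(float) * 4 * 3; // 4x3 packed transform = 48 bytes
        public const uint kSizeOfFloat4 = sizeof(float) * 4;           // one color = 16 bytes

        // Leaving the first 64 bytes zeroed means properties without the high
        // bit read zeroes from the start of the buffer as a "default" value.
        public const uint byteAddressObjectToWorld = 64;
        public const uint byteAddressWorldToObject = byteAddressObjectToWorld + kSizeOfPackedMatrix * kNumInstances;
        public const uint byteAddressColor = byteAddressWorldToObject + kSizeOfPackedMatrix * kNumInstances;

        // The buffer must cover the header plus all three per-instance arrays.
        public const int kBufferSizeInInts =
            (int)((byteAddressColor + kSizeOfFloat4 * kNumInstances) / sizeof(int));
    }
    ```

    With this layout, filling the buffer is a matter of GraphicsBuffer.SetData calls at each of these offsets.
    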
    Next up you need to set up the metadata for the batch you want to create. In this example, three shader properties are stored in the buffer: ObjectToWorld, WorldToObject, and BaseColor. The high bit (0x80000000) is set for these so they read per-instance data.

    Code (CSharp):
    var metadata = new NativeArray<MetadataValue>(3, Allocator.Temp);
    metadata[0] = new MetadataValue { NameID = Shader.PropertyToID("unity_ObjectToWorld"), Value = 0x80000000 | byteAddressObjectToWorld };
    metadata[1] = new MetadataValue { NameID = Shader.PropertyToID("unity_WorldToObject"), Value = 0x80000000 | byteAddressWorldToObject };
    metadata[2] = new MetadataValue { NameID = Shader.PropertyToID("_BaseColor"), Value = 0x80000000 | byteAddressColor };
    And finally you can create this batch.

    Code (CSharp):
    m_BatchID = m_BRG.AddBatch(metadata, m_InstanceData.bufferHandle);
    As with mesh and material registration, you should create these batches when needed, but this example uses Start.

    Culling Callback
    Now that everything is set up it is finally time to fill in the draw command generation part in the culling callback.

    First up you need to allocate memory for the output. This is necessary because you, as the user, are the one who knows how many commands and ranges you will output. In this example, that means the draw commands, the draw ranges, and the visible instances. Always allocate this memory using Allocator.TempJob, as Unity later frees it on the backend, potentially on another thread.

    Code (CSharp):
    var drawCommands = (BatchCullingOutputDrawCommands*)cullingOutput.drawCommands.GetUnsafePtr();

    drawCommands->drawCommands = (BatchDrawCommand*)UnsafeUtility.Malloc(UnsafeUtility.SizeOf<BatchDrawCommand>(), alignment, Allocator.TempJob);
    drawCommands->drawRanges = (BatchDrawRange*)UnsafeUtility.Malloc(UnsafeUtility.SizeOf<BatchDrawRange>(), alignment, Allocator.TempJob);
    drawCommands->visibleInstances = (int*)UnsafeUtility.Malloc(kNumInstances * sizeof(int), alignment, Allocator.TempJob);
    drawCommands->drawCommandPickingInstanceIDs = null; // Picking is not handled
    This example creates a single draw command, in a single range, that renders three instances. It doesn't set up any sorting positions, but you can use these in your own implementation to sort draw commands.

    Code (CSharp):
    drawCommands->drawCommandCount = 1;
    drawCommands->drawRangeCount = 1;
    drawCommands->visibleInstanceCount = kNumInstances;

    drawCommands->instanceSortingPositions = null; // No sorting
    drawCommands->instanceSortingPositionFloatCount = 0;
    You need to set up each draw call with a contiguous range of instance indices in the visible-instances array. This example does just one draw call and writes out the values 0, 1, and 2. Each value is the index a specific instance uses to look up its data in a specific batch, even though the array is shared between many draw calls referencing multiple batches.

    Code (CSharp):
    for (int i = 0; i < kNumInstances; ++i)
        drawCommands->visibleInstances[i] = i;
    Draw commands contain the information Unity needs to render with the appropriate mesh, material, and batch. The code below sets up the offset and count of the visible instances, as well as some other data Unity needs for rendering.

    Code (CSharp):
    drawCommands->drawCommands[0].visibleOffset = 0;
    drawCommands->drawCommands[0].visibleCount = kNumInstances;
    drawCommands->drawCommands[0].batchID = m_BatchID;
    drawCommands->drawCommands[0].materialID = m_MaterialID;
    drawCommands->drawCommands[0].meshID = m_MeshID;
    drawCommands->drawCommands[0].submeshIndex = 0;
    drawCommands->drawCommands[0].splitVisibilityMask = 0xff;
    drawCommands->drawCommands[0].flags = 0;
    drawCommands->drawCommands[0].sortingPosition = 0;
    Finally, you can set up the draw range; there is just one in this case. A real project would use commands with different filter settings to split the instances into different ranges.

    Code (CSharp):
    drawCommands->drawRanges[0].drawCommandsBegin = 0;
    drawCommands->drawRanges[0].drawCommandsCount = 1;
    drawCommands->drawRanges[0].filterSettings = new BatchFilterSettings { renderingLayerMask = 0xffffffff };
    This simple example returns a default JobHandle, with all draw command generation happening on the main thread. To optimize performance, most of the work in this function should be done in Burst-compiled jobs. If you use Burst jobs, Unity builds a job chain from the job handle returned here to filter, prepare, and execute draws matching the filter settings of the draw-renderers and draw-shadows commands in the SRP.
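    As a rough sketch of what moving this work into a Burst job could look like (the job type and its name are illustrative, not part of the example code):

    ```csharp
    using Unity.Burst;
    using Unity.Collections.LowLevel.Unsafe;
    using Unity.Jobs;

    // Illustrative: fill the visible-instance indices in a Burst-compiled job
    // and return its handle from OnPerformCulling so Unity can chain off it.
    [BurstCompile]
    unsafe struct FillVisibleInstancesJob : IJob
    {
        [NativeDisableUnsafePtrRestriction]
        public int* visibleInstances; // points into the TempJob allocation above
        public int instanceCount;

        public void Execute()
        {
            for (int i = 0; i < instanceCount; ++i)
                visibleInstances[i] = i;
        }
    }

    // Inside OnPerformCulling, after the Malloc calls:
    // var job = new FillVisibleInstancesJob
    // {
    //     visibleInstances = drawCommands->visibleInstances,
    //     instanceCount = kNumInstances,
    // };
    // return job.Schedule();
    ```

    A real implementation would do culling and draw-command generation in the job as well, not just index filling.
    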

    Additional user responsibilities
    The downside of a rawer and more flexible draw submission system is that you have to set up a lot more data yourself. The example above only sets up matrices and colors, but for more complex scenes it will be your responsibility to set up any global illumination data such as light probes and lightmaps.
    Unity can't do this for you, since it no longer has any information about which instance is where; it's all provided through an opaque GPU buffer.

    Known issues
    Some mobile devices might have a GPU performance regression compared to game objects. This can be due to the SSBO data loading path used in these examples (and by the current Hybrid Renderer). We are investigating different approaches to solve this without an explosion of shader variants.

    Resources
    Our test projects for URP and HDRP live in the Graphics repo.

    The Unity Manual page for BatchRendererGroup

    Future work
    In the coming year, the Hybrid Renderer team will land improvements to the Hybrid Renderer package to use this interface, as well as work on interface improvements and bug fixes on the core Unity side of things.

    Support for GLES3.1 and higher is on our roadmap, and we are also looking at supporting equivalents to Graphics.DrawProcedural and Graphics.DrawProceduralIndirect.
     

    Attached Files:

    Last edited: Jun 9, 2022
  2. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Please give me a real-world example of why I need to re-register these? I probably just need more peripheral information on why I do these things, not that I should do these things...

    I'm guessing just to alter the rendered mesh or material whenever it might be needed? For example with grass: never, in my case?
     
    bb8_1 likes this.
  3. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    I will try to clarify this in the post later, but the main reason is that Mesh and Material are managed types, and this whole interface is written to be Burst compatible, which rules out managed types.
    Registering also allows us to sidestep some very slow per-frame setup and teardown costs and to generate a more usable view of the Mesh and Material up front.

    I agree that ideally Mesh and Material would be directly usable but we are not there yet.
     
    joshcamas, NotaNaN, bb8_1 and 4 others like this.
  4. JesOb

    JesOb

    Joined:
    Sep 3, 2012
    Posts:
    1,109
    Hi @joelv
    Good step forward thanks :)

    This new API looks very similar to MultiDrawIndirect (MDI):
    https://docs.nvidia.com/gameworks/c...es/opengl_samples/multidrawindirectsample.htm

    Can you clarify the similarities and differences between MDI, mesh shaders, and the Unity BRG API?
    - Can we issue one draw call to render many instances of different submeshes of one mesh in different locations?
    - Can we use modern techniques like culling in a compute shader → generate a draw-commands buffer and then issue one BRG call to render from that buffer?
    - Is this API a step toward mesh shaders, or do you plan to rewrite the BRG once again to support them?
    - Can we use a meshlet approach through the BRG?

    - When can we expect Shader Graph support for this?

    Thanks :)
     
    Walter_Hulsebos likes this.
  5. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    Glad you like it.

    So to answer your questions: yes, it is a bit like MDI, but it is still a CPU draw loop. It allows you to switch Mesh (including submesh) and Material between the draws in this loop (each draw command).

    But currently this does not interact nicely with GPU-generated draws; the draw commands exposed by the BRG are generated and consumed on the CPU. We are looking into supporting an equivalent to DrawProcedural, which could mean at least almost full MDI support, depending on what the underlying graphics API supports. In our prototype this becomes a new type of draw command that allows you to provide the compute buffers needed.

    Mesh shaders do not really fit into our roadmap as it is now, but if/once the shader pipeline and device backends for them are planned out we will be sure to support them.

    And finally, yes, this works with Shader Graph, at least if used in an HDRP/URP context where we have written the required code generation.

    Hope this answers your questions. We can't commit to how the API will evolve but we will continue to improve it.
     
  6. VincentBreysse

    VincentBreysse

    Unity Technologies

    Joined:
    May 31, 2021
    Posts:
    27
    Also, in a real world project you would likely have new objects created dynamically during the game.
    For example, let's say you want to instantiate a projectile when shooting at something. If the projectile uses some materials or meshes which were unknown on startup, then you would need to register them dynamically to the BatchRendererGroup to be able to perform the rendering.
    Currently you can't render anything with a BRG using materials or meshes which haven't been registered beforehand.
    In the same way, deregistration should also happen dynamically so that the data associated with unused registered materials/meshes is freed up. The details regarding when and how you register/deregister the resources are up to you.

    In this post, we talk about a simple example scene in which all the materials and meshes are known on startup. So technically you can just gather all of them in the Start() function, register everything, and be done with it. But in real-world projects it usually won't be that simple. The example in this post is mainly for illustration purposes.
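    One way to sketch this dynamic registration is a small cache that registers lazily and deregisters when a mesh is no longer needed. The MeshRegistry helper below is made up for illustration; only RegisterMesh/UnregisterMesh and BatchMeshID are real BRG API:

    ```csharp
    using System.Collections.Generic;
    using UnityEngine;
    using UnityEngine.Rendering;

    // Illustrative helper (not part of the API): caches BatchMeshIDs so each
    // Mesh is registered with the BRG only once, and freed when unused.
    class MeshRegistry
    {
        readonly BatchRendererGroup m_BRG;
        readonly Dictionary<Mesh, BatchMeshID> m_Ids = new Dictionary<Mesh, BatchMeshID>();

        public MeshRegistry(BatchRendererGroup brg) { m_BRG = brg; }

        public BatchMeshID GetOrRegister(Mesh mesh)
        {
            if (!m_Ids.TryGetValue(mesh, out var id))
            {
                id = m_BRG.RegisterMesh(mesh); // register the first time we see it
                m_Ids.Add(mesh, id);
            }
            return id;
        }

        public void Unregister(Mesh mesh)
        {
            if (m_Ids.TryGetValue(mesh, out var id))
            {
                m_Ids.Remove(mesh);
                m_BRG.UnregisterMesh(id); // free the BRG-side data when no longer needed
            }
        }
    }
    ```

    The same pattern would apply to materials with RegisterMaterial/UnregisterMaterial and BatchMaterialID.
    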
     
    Last edited: Apr 20, 2022
    bb8_1, NotaNaN, Krajca and 1 other person like this.
  7. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Thanks, as you can imagine the performance gains here don't really make this optional for any serious published game, or for use in VR (Vulkan at the moment on Quest 2).

    I use URP currently.

    My projects are currently MonoBehaviour- and GameObject-based, but my need for the BRG API is for rendering static level geometry, details, and things that I need a lot of but can't afford the overhead for (they have predictable characteristics, so they don't need to be GameObjects). So if you could keep it in mind for future examples (accelerating common static level rendering tasks), I would really appreciate it and probably learn a lot.

    Ideally I'd like to replace all my rendering with this if I can, and am willing to build some functions to make life easier. Anything along those lines like a strategy or decent setup to build on top for dev QOL that you could advise would be great.

    I aim to try and get the most out of this valuable performance gain!

    And:
    An example for that too if possible. I sound awfully selfish but this feature is a big deal, especially if like me, you're not doing an ECS project but a regular one with a heck of a lot of draw calls (large open world rendering on rubbish hardware).

    Thanks regardless as this was a much needed feature.
     
  8. YuriyPopov

    YuriyPopov

    Joined:
    Sep 5, 2017
    Posts:
    237
    I'm wondering if we can get a decent HDRP example with helpful comments like the URP example. I too, like hippocoder, would like to replace all my rendering with this asap.
     
    hippocoder likes this.
  9. JesOb

    JesOb

    Joined:
    Sep 3, 2012
    Posts:
    1,109
    Thanks

    Another questions:

    - Can we expect a simple for loop of draw calls (without state changes) when we have the same material and mesh, just a different submesh and per-instance data offset in the buffer?
    All the DrawIndexed commands would be equal, just with different parameters, so it should be super fast.

    - Unity has DrawProcedural, which is the equivalent of glDrawElementsIndirect.
    When/will Unity expose a MultiDrawProcedural that would be the equivalent of glMultiDrawElementsIndirect?
     
    laurentlavigne, dotmos and hippocoder like this.
  10. Lymdun

    Lymdun

    Joined:
    Jan 1, 2017
    Posts:
    46
    Hi!
    Do you plan to work on a new renderer using this interface for GameObjects/MeshRenderer without any dependency to DOTS, as seen in https://twitter.com/SebAaltonen/status/1407661348197175299 ?
     
    Peter77, Shaderic, PutridEx and 2 others like this.
  11. kite3h

    kite3h

    Joined:
    Aug 27, 2012
    Posts:
    197
    It is very difficult to raise Draw, which is originally in the low-level area, to the script level, but the reason for entrusting this to the front-end stage is not to leave Occlusion Culling to the developer's discretion.
     
  12. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    Currently I do think mesh switching is a bit costly, even submesh switching. It means some additional data encoded per draw in the backend and some buffers to be bound. We are looking into optimizing this.

    I am not aware of any plans for exposing that API. However it might happen in context of the BatchRendererGroup at some point.
     
    JesOb likes this.
  13. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    The test scripts in the repository linked in the original post actually originate from Sebastian's hack week project. We are maintaining them, but currently they are not feature complete (missing light probe and lightmap support, missing static batching support).

    Our team's focus is the Hybrid Renderer and we can't spread ourselves too thin at the moment. I would love to be able to reimplement full GameObject rendering on top of the BRG, though, but it's nothing I can promise =)
     
  14. Tim-C

    Tim-C

    Unity Technologies

    Joined:
    Feb 6, 2010
    Posts:
    2,225
    We have a branch here where we have been experimenting with allowing static objects to render via the BRG. This is super, super experimental, but it can serve as a bigger example of how to put a lot more things into the BRG. It also uses some engine APIs that have not made it into a release yet, so it will likely not just work or even compile for you, but the entry point is the MonoBehaviour here.

    It's likely a pretty okay starting point if you want to make your own 'fast BRG for my static objects'. Note we also do some things for objects that are moving but rigid, so they can also work through the BRG. This just leaves things like skinned meshes and effects going through the GameObject path. Note: don't look into the deferred material stuff; that's even more experimental, builds on top of the BRG GameObject rendering, and is even less stable than the super-experimental static objects in BRG.
     
    NotaNaN, TerraUnity, fherbst and 7 others like this.
  15. Tim-C

    Tim-C

    Unity Technologies

    Joined:
    Feb 6, 2010
    Posts:
    2,225
    hippocoder likes this.
  16. Oh boy, this will be epic if it ever makes it into prod. +1 from me, I pretty much would like to see this developed properly.
     
    Gametyme, PutridEx and JesOb like this.
  17. Krajca

    Krajca

    Joined:
    May 6, 2014
    Posts:
    347
    Is this similar to Vulkan multithreaded rendering?
     
  18. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    In one way you can see it like that: multiple threads can build draw commands. However, the commands you provide here are CPU draw commands. They need to be issued to the GPU using some low-level API calls later on. These can happen from multiple threads as well if you have graphics jobs enabled.
     
    hippocoder and JesOb like this.
  19. YuriyPopov

    YuriyPopov

    Joined:
    Sep 5, 2017
    Posts:
    237
    You should consider giving this a much higher priority, if you ask me. The tech seems amazing and would allow us to do more. Getting something into the hands of people sooner would be better.
     
    Gametyme, saskenergy, NotaNaN and 6 others like this.
  20. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
  21. mgear

    mgear

    Joined:
    Aug 3, 2010
    Posts:
    9,411
    URP, 10 million default cubes; are these expected results? (using that sample script)
    upload_2022-1-27_17-40-42.png
     
    Kjelle69 and hippocoder like this.
  22. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    The sample script does not use Burst to keep the sample simple. If you improve it and use Burst jobs to fill the large arrays (I think in the sample it's just the visible instance indices), you should see much better performance.

    Alternatively, if you just want to test best case performance, you can make a cached copy of the array and use UnsafeUtility.MemCpy to copy it into place. This is not what a real game would do, since it would assume static visibility, but for a simple test like this it could be OK.
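    The cached-copy idea could look roughly like this. The class and field names are made up for illustration; only NativeArray and UnsafeUtility.MemCpy are real APIs:

    ```csharp
    using Unity.Collections;
    using Unity.Collections.LowLevel.Unsafe;

    // Illustrative sketch: fill the visible-index array once, then MemCpy it
    // into the culling output each frame. Assumes static visibility, so this
    // is only suitable for a benchmark, not a real game.
    unsafe class CachedVisibility
    {
        NativeArray<int> m_Cached; // persistent, filled once at startup

        public void Init(int instanceCount)
        {
            m_Cached = new NativeArray<int>(instanceCount, Allocator.Persistent);
            for (int i = 0; i < instanceCount; ++i)
                m_Cached[i] = i;
        }

        public void CopyTo(int* visibleInstances)
        {
            UnsafeUtility.MemCpy(visibleInstances,
                                 m_Cached.GetUnsafeReadOnlyPtr(),
                                 m_Cached.Length * sizeof(int));
        }

        public void Dispose() => m_Cached.Dispose();
    }
    ```
    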
     
  23. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Oh now you have done it! This is just the ticket and I think will help a major ton of devs! Thank you, and I hope URP would get some kind of new static path like this for classic projects. So many would get a free boost from it. Until then I am going to poke around with gratitude!

    Thank you!

    :)
     
  24. tatoforever

    tatoforever

    Joined:
    Apr 16, 2009
    Posts:
    4,368
    Just to be 100% sure I got it right: when we upload data (or register it) using the API, does it remain on the GPU until we unload it?
     
  25. CodeRoadOne

    CodeRoadOne

    Joined:
    Mar 9, 2014
    Posts:
    52
    The new code seems to solve some issues and I think it will work correctly with multiple lights.

    @mgear I also tried your test using GPU culling, but my GPU (GeForce GTX 1050 Ti) can't handle that much data, unfortunately.
    I did a test with around 2 million cubes (127x127x127); here are my results:
    upload_2022-1-27_17-45-32.png
    CPU time to emit the visibility is 0.034 ms and to emit the draw calls is 0.017 ms.
    Total time on the CPU, including updating the GPU data, is 0.11 ms.
    I'm sure as soon as I fix the GPU occlusion culling it will run even faster.

    Of course my version doesn't correctly handle all lights and per-instance lightmap UVs, so getting speed improvements inside Unity will always be better. I'm waiting for the release of DOTS so I can fully use these new features.

    Thank you for sharing!
     
    Last edited: Jan 27, 2022
    ThatDan123, JesOb and Lars-Steenhoff like this.
  26. Grimreaper358

    Grimreaper358

    Joined:
    Apr 8, 2013
    Posts:
    789
    Will there be default BRG optimizations enabled? I don't know much about graphics programming, so I'm just checking whether we will only have an API to use or whether this will also have a default implementation in the engine.

    Could I just add my assets to the scene as usual, maybe follow certain rules, and have it work?
     
  27. YuriyPopov

    YuriyPopov

    Joined:
    Sep 5, 2017
    Posts:
    237
    I just have to ask: what method did you use to implement GPU occlusion culling, and how?
     
  28. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    For GraphicsBuffers you create and upload data to (matrices, overridden properties and so on), the data is persistent and it is up to you to update and manage it.

    Meshes and Materials are a bit more complicated. Once you register them they will be recognized by the system, but if some code deletes them the BRG will tag them as deleted and simply stop drawing any draw commands referencing the deleted mesh/material.

    The data you provide in the culling callback is transient. It's only used for one frame and then freed.
     
  29. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    We are not replacing any Unity part with this as it is now. It's the foundation for the Hybrid Renderer to be able to render entities, and it is usable if you want to write a custom renderer. The scripts linked in this thread are just examples of what is possible, and we may use something like this in the future to speed up general rendering.
     
  30. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Awesome, hope this happens!
     
    Gametyme and Grimreaper358 like this.
  31. CodeRoadOne

    CodeRoadOne

    Joined:
    Mar 9, 2014
    Posts:
    52
    Hi @YuriyPopov,
    The occlusion culling follows the standard approach from Nanite: use the previous frame's depth to do the culling for the current frame.
    Render all visible instances, then, using the new depth, do another visibility pass and render the rest of the objects.
    Right now I have an issue with the algorithm and it is not working as expected; I need to debug it.
    But the current implementation does frustum culling, size culling, and LOD switching, all on the GPU. Still one draw call per material, but it doesn't care if you have different meshes.
    If I can have more than one light it will be usable, so that's why I'm waiting for improvements in DOTS :)
    I hope I answered your question.
     
  32. YuriyPopov

    YuriyPopov

    Joined:
    Sep 5, 2017
    Posts:
    237
    I don't get how you do GPU culling with this API at all. Do you first gather data, feed a compute shader, then read the buffer back on the CPU and emit the draw commands?
     
  33. CodeRoadOne

    CodeRoadOne

    Joined:
    Mar 9, 2014
    Posts:
    52
    Just for clarification, the code I wrote does not use this API (only URP), but I think it can be adjusted. The basic idea is to store all the data in a persistent buffer on the GPU, execute a compute shader that generates a list of instances that need to be rendered, then use a draw indirect emitted from the CPU for each different material and let the compute shader generate the actual data for the draw call.
    I've sent you a private message with more info about this if you would like to know more (I don't want to pollute this thread more than I already have).
     
    Last edited: Jan 28, 2022
    unisip likes this.
  34. JesOb

    JesOb

    Joined:
    Sep 3, 2012
    Posts:
    1,109
    I have found that Unity already exposes a new API for this, Graphics.RenderMeshIndirect, and it will support the multi-draw paradigm in the future.

    Please make the BRG interface like this API so it can easily enable use of a low-level multi-draw API in the future.
    Maybe expose a few different ways to store draw commands, like in the new Graphics.RenderX method family, so we can provide draw commands from a GraphicsBuffer and other sources.

    The goal is to efficiently draw many different mesh instances with the same material, like:
    - draw all the different props in a level in one go
    - draw all chunks of a voxel world in one go
    - draw one district of a level (one constructor) in one go
    - perform culling and LOD selection on the GPU side, write commands, and render fast batches from a GraphicsBuffer
    - ...

    upload_2022-2-2_3-5-6.png
    upload_2022-2-2_3-10-49.png
    upload_2022-2-2_3-11-21.png
     
    dotmos and bb8_1 like this.
  35. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    We have an experimental implementation of almost all of that (regular mesh draw, direct procedural, indirect procedural) in a branch. On DX11, multiple indirect draws are emulated as a loop on the CPU side, and of course there is no way to provide a late command count as a buffer, so all commands in the range will always be executed.

    How and when this will actually land is still not decided. We want to get the interface right and ensure it actually covers everything. We'll get back to you once it hits some future Unity version beta, but it won't happen during the Unity 22.X stream.
     
    Zarbuz, NotaNaN, bb8_1 and 5 others like this.
  36. linfuqing

    linfuqing

    Joined:
    May 11, 2015
    Posts:
    166
    Will this API be used for the Hybrid Renderer in ECS 0.5?
    And will it support point lights in the next version?
     
  37. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    This API is not used by the version in 0.5, and is used by the next version.

    Point light support depends on URP, the Hybrid Renderer requires a screen space technique for local lights, such as deferred or forward+.
     
  38. DDKH

    DDKH

    Joined:
    Jun 13, 2013
    Posts:
    25
    Will this API support GLES 3.1 in ECS 1.0?
     
  39. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    We are aiming for this, and are currently working on it.

    However, it is possible that, due to technical reasons, the GLES 3.1 version might work slightly differently and have different performance characteristics.
     
  40. Thermos

    Thermos

    Joined:
    Feb 23, 2015
    Posts:
    148
    The Hybrid Renderer in ECS 0.17 is already pretty fast. Can we expect better performance by adopting this API when ECS reaches 1.0?
     
  41. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    One of the goals of the new Hybrid Renderer is to also provide good performance in situations where the current Hybrid Renderer has trouble due to technical limitations. Examples include transparencies (which require depth sorting) and scenes with a lot of Mesh/Material variety.
     
  42. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    Does that mean only phones that support GLES 3.1 will work properly? What about phones with only GLES 3.0?
     
  43. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    We are trying to make it possible for the BatchRendererGroup API itself to support GLES3.0, but since it doesn't have support for compute shaders or SSBOs, users will be more limited in the ways they can upload instance data on GLES3.0 only devices.

    The Hybrid Renderer uses compute shaders and will require GLES 3.1.
     
  44. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    I see, but for the Hybrid Renderer, how do you handle GLES 3.1 mobile phones that don't support compute shaders?
     
  45. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Isn't compute shader support required for OpenGL ES 3.1?
     
  46. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    Yes, 3.1 requires compute, and that is the target for the Hybrid Renderer for now.
    Supporting non-compute-capable devices would require a rewrite of the data model. It's doable, but it would take quite some work and would lead to worse CPU performance on those devices.
     
  47. joshuacwilde

    joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    727
    Just want to say I am very happy to see Unity embracing the latest technologies (ES 3), as ES 2 is basically non-existent these days, especially for mobile gamers who care at all about gaming.

    Looking forward to seeing this trend continue. Maybe in the future it will be worth it for us to switch to URP. Every mobile dev knows that every bit of performance matters on this platform.
     
  48. YuriyPopov

    YuriyPopov

    Joined:
    Sep 5, 2017
    Posts:
    237
    @joelv So I finally got around to doing some tests on this and I'm getting very mixed results.
    I'm using an exact copy of the RenderBRG script from the URP test project. I'm on Unity 2022.1.0b8.2747 and URP 13.1.5 with forward rendering and XR single pass instanced enabled.
    In my test scene I have about 7k renderers, 1 directional light with shadows, and 3 point lights with shadows. Naturally the point lights don't work with the BRG path. I'm getting my frame timings from SteamVR. With the BRG enabled I get about 5.8ms on average; however, when I toggle it off and restore the normal renderers I get about 5.4ms on average.
    Any insights on why this is happening and what I could be doing wrong?
     
  49. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    The RenderBRG script in the URP test project is a relatively simple test script, and does not have very sophisticated culling, especially for shadow maps. My first guess would be that the RenderBRG version is rendering more things in the shadow maps than the GameObject version is. You can check whether this is the case e.g. by using RenderDoc.
     
  50. VincentBreysse

    VincentBreysse

    Unity Technologies

    Joined:
    May 31, 2021
    Posts:
    27
    @YuriyPopov There is a known issue in the 22.1 RenderBRG.cs script related to picking that causes huge performance issues. You can find a throwaway branch with a fix here. It has already landed on master, but that's for 2022.2. Alternatively, you can probably just grab the latest RenderBRG.cs script from Graphics master.
     
    Last edited: Feb 21, 2022