Official New BatchRendererGroup API for 2022.1

Discussion in 'Graphics for ECS' started by joelv, Jan 26, 2022.

  1. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    Hi Everyone!

    We in the Hybrid Renderer team are excited to share some news on what we have been working on for the last year.

    Background
    The Hybrid Renderer is a way for Unity to use Entities and associated data for rendering without requiring you to round trip to GameObjects. Since its initial implementation, the Unity Engine side of this, the BatchRendererGroup (BRG), has not been very easy to use unless you know a lot of the implementation details.

    As the BRG provides a way to potentially issue more draw calls at much lower CPU cost, there have naturally been requests from you on how to use this interface. So as of Unity 2022.1, the BRG has been fully rewritten, which we hope brings better flexibility as well as ease of use.

    Where the BRG previously consisted of a set of pre-built batches with the same mesh and material, it now allows you to build your own draw commands in the OnPerformCulling callback or in (Burst compiled) jobs. Each draw command is like a DrawMeshInstanced call and can draw multiple instances with the same material and mesh, with the difference that they are much more efficient and flexible.

    Performance example
    Draw commands using the BRG are considerably cheaper than traditional GameObject rendering with regard to actual draw submission cost on the main and render threads. Direct comparison is tricky because responsibility for some work (culling, draw setup, and so on) is moved to you as the user.



    As an example, this URP test scene of roughly 24k GameObjects with varying meshes and materials has a baseline RenderCameraStack profiler marker of about 17 ms on an AMD 3970X. This is all serial main-thread time, and the render thread takes about the same because the main thread can't feed it quickly enough.



    Compared to a scene where the GameObjects have been converted to draws in a BRG, the same profiler marker takes about 0.8 ms on the main thread and about 0.2 ms on the render thread.



    Previously you could do this with Graphics.DrawMeshInstanced, of course, but it would require you to upload a lot of matrices every frame, as well as write custom shaders for any per-instance data you would like to override. With the new BRG it is possible to write shaders that support both this path and regular GameObjects, and we support URP/Lit and HDRP/Lit (among others) out of the box. Take these measurements with a grain of salt, but they do show that it's possible to render much more efficiently with the new API.

    Getting Started
    First, your project needs to use a Scriptable Render Pipeline (URP/HDRP or custom) as the new BRG interface is fully built upon the SRP Batcher. You then need to disable stripping of DOTS Instancing variants by setting “Build-time stripping of BatchRendererGroup variants” (under “Project Settings”, “Graphics”, “Shader Stripping”) to “Keep all”. The project also needs to have “Allow unsafe code” enabled (under “Project Settings”, “Script Compilation”).

    Currently OpenGL, GLES and WebGL are not supported.

    The following code is adapted from the SimpleExample test scene and script available in the URP BRG test project in the Graphics repository.

    BatchRendererGroup Object
    Next up you need to create the BatchRendererGroup and provide an OnPerformCulling callback method.

    Code (CSharp):
    void Start()
    {
        m_BRG = new BatchRendererGroup(OnPerformCulling, IntPtr.Zero);
        // Register resources
        // Create batch
    }
    This callback is the main entry for the BatchRendererGroup and will be called by Unity internals whenever visible objects are to be culled.

    Code (CSharp):
    public unsafe JobHandle OnPerformCulling(
        BatchRendererGroup rendererGroup,   // The BRG associated with this callback
        BatchCullingContext cullingContext, // Immutable input data for the operation
        BatchCullingOutput cullingOutput,   // Mutable output data (draw commands)
        IntPtr userContext)                 // Context the user can pass (IntPtr.Zero here)
    {
        // Do draw command generation work here
        return new JobHandle(); // A job handle saying when the output actually is ready
    }

    Register Resources
    The data output from the culling callback consists entirely of unmanaged types. This means that you can't directly reference objects such as instances of Mesh and Material; instead, you need to register them with the BRG before you use them. In a real-world project you should do this dynamically, but this example uses Start:

    Code (CSharp):
    m_MeshID = m_BRG.RegisterMesh(mesh);
    m_MaterialID = m_BRG.RegisterMaterial(material);
    The BRG object holds meshes and materials and updates its internal representation once per frame (after it processes the culling callback).

    Create Batch
    Apart from meshes and materials, the only way to add data to a specific draw call is through a batch and its associated metadata. The SRP shader code looks at the metadata for specific properties, determines whether the property lives in the batch data or in the material constant buffer (high bit set or not), and loads the data for the instance from the correct location. We provide a utility (UNITY_ACCESS_DOTS_INSTANCED_PROP_WITH_DEFAULT) in the render pipelines core package for this. For more information on how this works, please see the linked manual draft.

    Note: the layout and implementation here is just one way to do it, and the way we have implemented for the Hybrid Renderer. You can use the metadata values in any way you find suitable with custom shaders or a modified URP/HDRP.

    First you need to create a buffer big enough to hold on to all the data you intend to store. If you outgrow this size later there is an API to update the buffer for a specific batch.

    Code (CSharp):
    m_InstanceData = new GraphicsBuffer(GraphicsBuffer.Target.Raw,
                                        kBufferSizeInInts,
                                        sizeof(int));
    // Fill data to buffer (see attached example code)
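    For illustration, here is one possible way to compute the byteAddress offsets that the metadata entries use. This is a sketch under assumptions (three instances, 4x3 packed matrices, a zero-filled 64-byte header), and may differ from the attached example:

    ```csharp
    // Hypothetical layout (an assumption, not the attached example verbatim):
    // per-instance data packed back to back in the raw buffer.
    static class BufferLayout
    {
        public const uint kNumInstances = 3;
        public const uint kSizeOfPackedMatrix = sizeof(float) * 4 * 3; // 4x3 packed transform = 48 bytes
        public const uint kSizeOfFloat4 = sizeof(float) * 4;           // one color = 16 bytes

        // Leaving the first 64 bytes zeroed means properties without the high
        // bit read zeroes from the start of the buffer as a "default" value.
        public const uint byteAddressObjectToWorld = 64;
        public const uint byteAddressWorldToObject = byteAddressObjectToWorld + kSizeOfPackedMatrix * kNumInstances;
        public const uint byteAddressColor = byteAddressWorldToObject + kSizeOfPackedMatrix * kNumInstances;

        // The buffer must cover the header plus all three per-instance arrays.
        public const int kBufferSizeInInts =
            (int)((byteAddressColor + kSizeOfFloat4 * kNumInstances) / sizeof(int));
    }
    ```

    With this layout, filling the buffer is a matter of GraphicsBuffer.SetData calls at each of these offsets.
    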
    Next up you need to set up the metadata for the batch you want to create. In this example, three shader properties are stored in the buffer: ObjectToWorld, WorldToObject, and BaseColor. The high bit (0x80000000) is set for these so they read per-instance data.

    Code (CSharp):
    var metadata = new NativeArray<MetadataValue>(3, Allocator.Temp);
    metadata[0] = new MetadataValue { NameID = Shader.PropertyToID("unity_ObjectToWorld"), Value = 0x80000000 | byteAddressObjectToWorld };
    metadata[1] = new MetadataValue { NameID = Shader.PropertyToID("unity_WorldToObject"), Value = 0x80000000 | byteAddressWorldToObject };
    metadata[2] = new MetadataValue { NameID = Shader.PropertyToID("_BaseColor"), Value = 0x80000000 | byteAddressColor };
    And finally you can create this batch.

    Code (CSharp):
    m_BatchID = m_BRG.AddBatch(metadata, m_InstanceData.bufferHandle);
    As with mesh and material registration, you should create these batches when needed, but this example uses Start.

    Culling Callback
    Now that everything is set up it is finally time to fill in the draw command generation part in the culling callback.

    First up you need to allocate memory for the output. This is necessary because you, as the user, are the one who knows how many commands and ranges you will output. In this example, that means the draw commands, the draw ranges, and the visible instances. Always allocate this memory using Allocator.TempJob, as Unity later frees it on the backend, potentially on another thread.

    Code (CSharp):
    var drawCommands = (BatchCullingOutputDrawCommands*)cullingOutput.drawCommands.GetUnsafePtr();

    drawCommands->drawCommands = (BatchDrawCommand*)UnsafeUtility.Malloc(UnsafeUtility.SizeOf<BatchDrawCommand>(), alignment, Allocator.TempJob);
    drawCommands->drawRanges = (BatchDrawRange*)UnsafeUtility.Malloc(UnsafeUtility.SizeOf<BatchDrawRange>(), alignment, Allocator.TempJob);
    drawCommands->visibleInstances = (int*)UnsafeUtility.Malloc(kNumInstances * sizeof(int), alignment, Allocator.TempJob);
    drawCommands->drawCommandPickingInstanceIDs = null; // Picking is not handled
    This example creates a single draw command, in a single range, that renders three instances. It doesn't set up any sorting positions, but you can use these in your own implementation to sort draw commands.

    Code (CSharp):
    drawCommands->drawCommandCount = 1;
    drawCommands->drawRangeCount = 1;
    drawCommands->visibleInstanceCount = kNumInstances;

    drawCommands->instanceSortingPositions = null; // No sorting
    drawCommands->instanceSortingPositionFloatCount = 0;
    You need to set up each draw call with a contiguous range of instance indices in the visible-instances array. This example does just one draw call and writes out the values 0, 1, and 2. Each value is the index a specific instance uses to look up its data in a specific batch, even though the array is shared between many draw calls referencing multiple batches.

    Code (CSharp):
    for (int i = 0; i < kNumInstances; ++i)
        drawCommands->visibleInstances[i] = i;
    Draw commands contain the information Unity needs to render with the appropriate mesh, material, and batch. The code below sets up the offset and count of the visible instances, as well as some other data Unity needs for rendering.

    Code (CSharp):
    drawCommands->drawCommands[0].visibleOffset = 0;
    drawCommands->drawCommands[0].visibleCount = kNumInstances;
    drawCommands->drawCommands[0].batchID = m_BatchID;
    drawCommands->drawCommands[0].materialID = m_MaterialID;
    drawCommands->drawCommands[0].meshID = m_MeshID;
    drawCommands->drawCommands[0].submeshIndex = 0;
    drawCommands->drawCommands[0].splitVisibilityMask = 0xff;
    drawCommands->drawCommands[0].flags = 0;
    drawCommands->drawCommands[0].sortingPosition = 0;
    Finally, you can set up the draw range; there is just one in this case. A real project would use commands with different filter settings to split the instances into different ranges.

    Code (CSharp):
    drawCommands->drawRanges[0].drawCommandsBegin = 0;
    drawCommands->drawRanges[0].drawCommandsCount = 1;
    drawCommands->drawRanges[0].filterSettings = new BatchFilterSettings { renderingLayerMask = 0xffffffff };
    This simple example returns a default JobHandle, with all draw command generation happening on the main thread. To optimize performance, most of the work in this function should be done in Burst-compiled jobs. If you use Burst jobs, Unity builds a job chain from the job handle returned here to filter, prepare, and execute draws matching the filter settings of the draw-renderers and draw-shadows commands in the SRP.
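    As a rough sketch of what moving this work into a Burst job could look like (the job type and its name are illustrative, not part of the example code):

    ```csharp
    using Unity.Burst;
    using Unity.Collections.LowLevel.Unsafe;
    using Unity.Jobs;

    // Illustrative: fill the visible-instance indices in a Burst-compiled job
    // and return its handle from OnPerformCulling so Unity can chain off it.
    [BurstCompile]
    unsafe struct FillVisibleInstancesJob : IJob
    {
        [NativeDisableUnsafePtrRestriction]
        public int* visibleInstances; // points into the TempJob allocation above
        public int instanceCount;

        public void Execute()
        {
            for (int i = 0; i < instanceCount; ++i)
                visibleInstances[i] = i;
        }
    }

    // Inside OnPerformCulling, after the Malloc calls:
    // var job = new FillVisibleInstancesJob
    // {
    //     visibleInstances = drawCommands->visibleInstances,
    //     instanceCount = kNumInstances,
    // };
    // return job.Schedule();
    ```

    A real implementation would do culling and draw-command generation in the job as well, not just index filling.
    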

    Additional user responsibilities
    The downside of a rawer and more flexible draw submission system is that you have to set up a lot more data yourself. The example above only sets up matrices and colors, but for more complex scenes it will be your responsibility to set up any global illumination data such as light probes and lightmaps.
    Unity can't do this for you, since it no longer has any information about which instance is where; it's all provided through an opaque GPU buffer.

    Known issues
    Some mobile devices might have a GPU performance regression compared to game objects. This can be due to the SSBO data loading path used in these examples (and by the current Hybrid Renderer). We are investigating different approaches to solve this without an explosion of shader variants.

    Resources
    Our test projects for URP and HDRP live in the Graphics repo.

    The Unity Manual page for BatchRendererGroup

    Future work
    In the coming year, the Hybrid Renderer team will land improvements to the Hybrid Renderer package to use this interface, as well as work on interface improvements and bug fixes on the core Unity side of things.

    Support for GLES3.1 and higher is on our roadmap, and we are also looking at supporting equivalents to Graphics.DrawProcedural and Graphics.DrawProceduralIndirect.
     

    Attached Files:

    Last edited: Jun 9, 2022
  2. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Please give me a real-world example of why I need to re-register these? I probably just need more peripheral information on why I do these things, not that I should do these things...

    I'm guessing just to alter the rendered mesh or material whenever it might be needed? For example with grass: never, in my case?
     
    bb8_1 likes this.
  3. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    I will try to clarify this in the post later, but the main reason is that Mesh and Material are managed types, and this whole interface is written to be Burst compatible, which rules out managed types.
    Registering also allows us to sidestep some very slow per-frame setup and teardown costs and to generate a more usable view of the Mesh and Material up front.

    I agree that ideally Mesh and Material would be directly usable but we are not there yet.
     
    joshcamas, NotaNaN, bb8_1 and 4 others like this.
  4. JesOb

    JesOb

    Joined:
    Sep 3, 2012
    Posts:
    1,109
    Hi @joelv
    Good step forward thanks :)

    This new API looks very similar to MultiDrawIndirect (MDI):
    https://docs.nvidia.com/gameworks/c...es/opengl_samples/multidrawindirectsample.htm

    Can you clarify the similarities and differences between MDI, mesh shaders, and the Unity BRG API?
    - Can we issue one draw call to render many instances of different submeshes of one mesh in different locations?
    - Can we use modern techniques like culling in a compute shader → generate a draw-commands buffer and then issue one BRG call to render from that buffer?
    - Is this API a step toward mesh shaders, or do you plan to rewrite the BRG once again to support them?
    - Can we use a meshlet approach through the BRG?

    - When can we expect Shader Graph support for this?

    Thanks :)
     
    Walter_Hulsebos likes this.
  5. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    Glad you like it.

    So to answer your questions: yes, it is a bit like MDI, but it is still a CPU draw loop. It allows you to switch Mesh (including submesh) and Material between the draws in this loop (each draw command).

    But currently this does not interact nicely with GPU-generated draws; the draw commands exposed by the BRG are generated and consumed on the CPU. We are looking into supporting an equivalent to DrawProcedural, which could mean at least almost full MDI support, depending on what the underlying graphics API supports. In our prototype this becomes a new type of draw command that allows you to provide the compute buffers needed.

    Mesh shaders do not really fit into our roadmap as it is now, but if/once the shader pipeline and device backends for them are planned out we will be sure to support them.

    And finally, yes, this works with Shader Graph, at least if used in an HDRP/URP context where we have written the required code generation.

    Hope this answers your questions. We can't commit to how the API will evolve but we will continue to improve it.
     
  6. VincentBreysse

    VincentBreysse

    Unity Technologies

    Joined:
    May 31, 2021
    Posts:
    27
    Also, in a real world project you would likely have new objects created dynamically during the game.
    For example, let's say you want to instantiate a projectile when shooting at something. If the projectile uses some materials or meshes which were unknown on startup, then you would need to register them dynamically to the BatchRendererGroup to be able to perform the rendering.
    Currently you can't render anything with a BRG using materials or meshes which haven't been registered beforehand.
    In the same way, deregistration should also happen dynamically so that the data associated with unused registered materials/meshes is freed up. The details regarding when and how you register/deregister the resources are up to you.

    In this post, we talk about a simple example scene in which all the materials and meshes are known on startup. So technically you can just gather all of them in the Start() function, register everything, and be done with it. But in real-world projects it usually won't be that simple. The example in this post is mainly for illustration purposes.
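    One way to sketch this dynamic registration is a small cache that registers lazily and deregisters when a mesh is no longer needed. The MeshRegistry helper below is made up for illustration; only RegisterMesh/UnregisterMesh and BatchMeshID are real BRG API:

    ```csharp
    using System.Collections.Generic;
    using UnityEngine;
    using UnityEngine.Rendering;

    // Illustrative helper (not part of the API): caches BatchMeshIDs so each
    // Mesh is registered with the BRG only once, and freed when unused.
    class MeshRegistry
    {
        readonly BatchRendererGroup m_BRG;
        readonly Dictionary<Mesh, BatchMeshID> m_Ids = new Dictionary<Mesh, BatchMeshID>();

        public MeshRegistry(BatchRendererGroup brg) { m_BRG = brg; }

        public BatchMeshID GetOrRegister(Mesh mesh)
        {
            if (!m_Ids.TryGetValue(mesh, out var id))
            {
                id = m_BRG.RegisterMesh(mesh); // register the first time we see it
                m_Ids.Add(mesh, id);
            }
            return id;
        }

        public void Unregister(Mesh mesh)
        {
            if (m_Ids.TryGetValue(mesh, out var id))
            {
                m_Ids.Remove(mesh);
                m_BRG.UnregisterMesh(id); // free the BRG-side data when no longer needed
            }
        }
    }
    ```

    The same pattern would apply to materials with RegisterMaterial/UnregisterMaterial and BatchMaterialID.
    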
     
    Last edited: Apr 20, 2022
    bb8_1, NotaNaN, Krajca and 1 other person like this.
  7. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Thanks, as you can imagine the performance gains here don't really make this optional for any serious published game, or for use in VR (Vulkan at the moment on Quest 2).

    I use URP currently.

    My projects are currently MonoBehaviour- and GameObject-based, but my need for the BRG API is for rendering static level geometry, details, and things that I need a lot of but can't afford the overhead for (they have predictable characteristics, so they don't need to be GameObjects). So if you could keep it in mind for future examples (accelerating common static level rendering tasks), I would really appreciate it and probably learn a lot.

    Ideally I'd like to replace all my rendering with this if I can, and am willing to build some functions to make life easier. Anything along those lines like a strategy or decent setup to build on top for dev QOL that you could advise would be great.

    I aim to try and get the most out of this valuable performance gain!

    And:
    An example for that too if possible. I sound awfully selfish but this feature is a big deal, especially if like me, you're not doing an ECS project but a regular one with a heck of a lot of draw calls (large open world rendering on rubbish hardware).

    Thanks regardless as this was a much needed feature.
     
  8. YuriyPopov

    YuriyPopov

    Joined:
    Sep 5, 2017
    Posts:
    237
    I'm wondering if we can get a decent HDRP example with helpful comments like the URP example. I too, like hippocoder, would like to replace all my rendering with this asap.
     
    hippocoder likes this.
  9. JesOb

    JesOb

    Joined:
    Sep 3, 2012
    Posts:
    1,109
    Thanks

    Another questions:

    - Can we expect a simple for loop of draw calls (without state changes) when we have the same material and mesh, just a different submesh and per-instance data offset in the buffer?
    All the DrawIndexed commands would be equal, just with different parameters, so it should be super fast.

    - Unity has DrawProcedural, which is the equivalent of glDrawElementsIndirect.
    When/will Unity expose a MultiDrawProcedural that would be the equivalent of glMultiDrawElementsIndirect?
     
    laurentlavigne, dotmos and hippocoder like this.
  10. Lymdun

    Lymdun

    Joined:
    Jan 1, 2017
    Posts:
    46
    Hi!
    Do you plan to work on a new renderer using this interface for GameObjects/MeshRenderer without any dependency to DOTS, as seen in https://twitter.com/SebAaltonen/status/1407661348197175299 ?
     
    Peter77, Shaderic, PutridEx and 2 others like this.
  11. kite3h

    kite3h

    Joined:
    Aug 27, 2012
    Posts:
    197
    It is very difficult to raise Draw, which is originally in the low-level area, to the script level, but the reason for entrusting this to the front-end stage is not to leave Occlusion Culling to the developer's discretion.
     
  12. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    Currently I do think mesh switching is a bit costly, even submesh switching. It means some additional data encoded per draw in the backend and some buffers to be bound. We are looking into optimizing this.

    I am not aware of any plans for exposing that API. However it might happen in context of the BatchRendererGroup at some point.
     
    JesOb likes this.
  13. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    The test scripts in the repository linked in the original post actually originate from Sebastian's hack week project. We are maintaining them, but currently they are not feature complete (missing light probe and lightmap support, missing static batching support).

    Our team's focus is the Hybrid Renderer and we can't spread ourselves too thin at the moment. I would love to be able to reimplement full GameObject rendering on top of the BRG, though, but it's nothing I can promise =)
     
  14. Tim-C

    Tim-C

    Unity Technologies

    Joined:
    Feb 6, 2010
    Posts:
    2,225
    We have a branch here where we have been experimenting with allowing static objects to render via the BRG. This is super, super experimental, but it can serve as a bigger example of how to put a lot more things into the BRG. It also uses some engine APIs that have not made it into a release yet, so it will likely not just work or even compile for you, but the entry point is the MonoBehaviour here.

    It's likely a pretty okay starting point if you want to make your own 'fast BRG for my static objects'. Note we also do some things for objects that are moving but rigid, so they can also work through the BRG. This just leaves things like skinned meshes and effects going through the GameObject path. Note: don't look into the deferred material stuff; that's even more experimental, builds on top of the BRG GameObject rendering, and is even less stable than the super-experimental static objects in BRG.
     
    NotaNaN, TerraUnity, fherbst and 7 others like this.
  15. Tim-C

    Tim-C

    Unity Technologies

    Joined:
    Feb 6, 2010
    Posts:
    2,225
    hippocoder likes this.
  16. Oh boy, this will be epic if it ever makes it into prod. +1 from me, I pretty much would like to see this developed properly.
     
    Gametyme, PutridEx and JesOb like this.
  17. Krajca

    Krajca

    Joined:
    May 6, 2014
    Posts:
    347
    Is this similar to Vulkan multithreaded rendering?
     
  18. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    In one way you can see it like that: multiple threads can build draw commands. However, the commands you provide here are CPU draw commands. They need to be issued to the GPU using some low-level API calls later on. These can happen from multiple threads as well if you have graphics jobs enabled.
     
    hippocoder and JesOb like this.
  19. YuriyPopov

    YuriyPopov

    Joined:
    Sep 5, 2017
    Posts:
    237
    You should consider giving this a much higher priority, if you ask me. The tech seems amazing and would allow us to do more. Getting something into the hands of people sooner would be better.
     
    Gametyme, saskenergy, NotaNaN and 6 others like this.
  20. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
  21. mgear

    mgear

    Joined:
    Aug 3, 2010
    Posts:
    9,411
    URP, 10 million default cubes; are these expected results? (using that sample script)
    upload_2022-1-27_17-40-42.png
     
    Kjelle69 and hippocoder like this.
  22. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    The sample script does not use Burst to keep the sample simple. If you improve it and use Burst jobs to fill the large arrays (I think in the sample it's just the visible instance indices), you should see much better performance.

    Alternatively, if you just want to test best case performance, you can make a cached copy of the array and use UnsafeUtility.MemCpy to copy it into place. This is not what a real game would do, since it would assume static visibility, but for a simple test like this it could be OK.
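    The cached-copy idea could look roughly like this. The class and field names are made up for illustration; only NativeArray and UnsafeUtility.MemCpy are real APIs:

    ```csharp
    using Unity.Collections;
    using Unity.Collections.LowLevel.Unsafe;

    // Illustrative sketch: fill the visible-index array once, then MemCpy it
    // into the culling output each frame. Assumes static visibility, so this
    // is only suitable for a benchmark, not a real game.
    unsafe class CachedVisibility
    {
        NativeArray<int> m_Cached; // persistent, filled once at startup

        public void Init(int instanceCount)
        {
            m_Cached = new NativeArray<int>(instanceCount, Allocator.Persistent);
            for (int i = 0; i < instanceCount; ++i)
                m_Cached[i] = i;
        }

        public void CopyTo(int* visibleInstances)
        {
            UnsafeUtility.MemCpy(visibleInstances,
                                 m_Cached.GetUnsafeReadOnlyPtr(),
                                 m_Cached.Length * sizeof(int));
        }

        public void Dispose() => m_Cached.Dispose();
    }
    ```
    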
     
  23. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Oh now you have done it! This is just the ticket and I think will help a major ton of devs! Thank you, and I hope URP would get some kind of new static path like this for classic projects. So many would get a free boost from it. Until then I am going to poke around with gratitude!

    Thank you!

    :)
     
  24. tatoforever

    tatoforever

    Joined:
    Apr 16, 2009
    Posts:
    4,368
    Just to be 100% sure I got it right: when we upload data (or register it) using the API, does it remain on the GPU until we unload it?
     
  25. CodeRoadOne

    CodeRoadOne

    Joined:
    Mar 9, 2014
    Posts:
    52
    The new code seems to solve some issues and I think it will work correctly with multiple lights.

    @mgear I also tried your test using GPU culling, but my GPU (GeForce GTX 1050 Ti) can't handle that much data, unfortunately.
    I did a test with around 2 million cubes (127x127x127); here are my results:
    upload_2022-1-27_17-45-32.png
    CPU time to emit the visibility is 0.034 ms and to emit the draw calls is 0.017 ms.
    Total time on the CPU, including updating the GPU data, is 0.11 ms.
    I'm sure as soon as I fix the GPU occlusion culling it will run even faster.

    Of course my version doesn't correctly handle all lights and per-instance lightmap UVs, so getting speed improvements inside Unity will always be better. I'm waiting for the release of DOTS so I can fully use these new features.

    Thank you for sharing!
     
    Last edited: Jan 27, 2022
    ThatDan123, JesOb and Lars-Steenhoff like this.
  26. Grimreaper358

    Grimreaper358

    Joined:
    Apr 8, 2013
    Posts:
    789
    Will there be default BRG optimizations enabled? I don't know much about graphics programming, so I'm just checking whether we will only have an API to use or whether this will also have a default implementation in the engine.

    Could I just add my assets to the scene as usual, maybe follow certain rules, and have it work?
     
  27. YuriyPopov

    YuriyPopov

    Joined:
    Sep 5, 2017
    Posts:
    237
    I just have to ask: what method did you use to implement GPU occlusion culling, and how?
     
  28. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    For GraphicsBuffers you create and upload data to (matrices, overridden properties and so on), the data is persistent and it is up to you to update and manage it.

    Meshes and Materials are a bit more complicated. Once you register them they will be recognized by the system, but if some code deletes them the BRG will tag them as deleted and simply stop drawing any draw commands referencing the deleted mesh/material.

    The data you provide in the culling callback is transient. It's only used for one frame and then freed.
     
  29. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    We are not replacing any Unity part with this as it is now. It's the foundation for the Hybrid Renderer to be able to render entities, and it is usable if you want to write a custom renderer. The scripts linked in this thread are just examples of what is possible, and we may use something like this in the future to speed up general rendering.
     
  30. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Awesome, hope this happens!
     
    Gametyme and Grimreaper358 like this.
  31. CodeRoadOne

    CodeRoadOne

    Joined:
    Mar 9, 2014
    Posts:
    52
    Hi @YuriyPopov,
    The occlusion culling follows the standard approach from Nanite: use the previous frame's depth to do the culling for the current frame.
    Render all visible instances, then, using the new depth, do another visibility pass and render the rest of the objects.
    Right now I have an issue with the algorithm and it is not working as expected; I need to debug it.
    But the current implementation does frustum culling, size culling, and LOD switching, all on the GPU. Still one draw call per material, but it doesn't care if you have different meshes.
    If I can have more than one light it will be usable, so that's why I'm waiting for improvements in DOTS :)
    I hope I answered your question.
     
  32. YuriyPopov

    YuriyPopov

    Joined:
    Sep 5, 2017
    Posts:
    237
    I don't get how you do GPU culling with this API at all. Do you first gather data, feed a compute shader, then read the buffer back on the CPU and emit the draw commands?
     
  33. CodeRoadOne

    CodeRoadOne

    Joined:
    Mar 9, 2014
    Posts:
    52
    Just for clarification, the code I wrote does not use this API (only URP), but I think it can be adjusted. The basic idea is to store all the data in a persistent buffer on the GPU, execute a compute shader that generates a list of instances that need to be rendered, then use a draw indirect emitted from the CPU for each different material and let the compute shader generate the actual data for the draw call.
    I've sent you a private message with more info about this if you would like to know more (I don't want to pollute this thread more than I already have).
     
    Last edited: Jan 28, 2022
    unisip likes this.
  34. JesOb

    JesOb

    Joined:
    Sep 3, 2012
    Posts:
    1,109
    I have found that Unity already exposes a new API for this, Graphics.RenderMeshIndirect, and it will support the multi-draw paradigm in the future.

    Please make the BRG interface like this API so it can easily enable use of a low-level multi-draw API in the future.
    Maybe expose a few different ways to store draw commands, like in the new Graphics.RenderX method family, so we can provide draw commands from a GraphicsBuffer and other sources.

    The goal is to efficiently draw many different mesh instances with the same material, like:
    - draw all the different props in a level in one go
    - draw all chunks of a voxel world in one go
    - draw one district of a level (one constructor) in one go
    - perform culling and LOD selection on the GPU side, write commands, and render fast batches from a GraphicsBuffer
    - ...

    upload_2022-2-2_3-5-6.png
    upload_2022-2-2_3-10-49.png
    upload_2022-2-2_3-11-21.png
     
    dotmos and bb8_1 like this.
  35. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    We have an experimental implementation of almost all of that (regular mesh draw, direct procedural, indirect procedural) in a branch. On DX11, multiple indirect draws are emulated as a loop on the CPU side, and of course there is no way to provide a late command count as a buffer, so all commands in the range will always be executed.

    How and when this will actually land is still not decided. We want to get the interface right and ensure it actually covers everything. We'll get back to you once it hits some future Unity version beta, but it won't happen during the Unity 22.X stream.
     
    Zarbuz, NotaNaN, bb8_1 and 5 others like this.
  36. linfuqing

    linfuqing

    Joined:
    May 11, 2015
    Posts:
    166
    Will this API be used for the Hybrid Renderer in ECS 0.5?
    And will it support point lights in the next version?
     
  37. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    This API is not used by the version in 0.5, and is used by the next version.

    Point light support depends on URP, the Hybrid Renderer requires a screen space technique for local lights, such as deferred or forward+.
     
  38. DDKH

    DDKH

    Joined:
    Jun 13, 2013
    Posts:
    25
    Will this API support GLES 3.1 in ECS 1.0?
     
  39. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    We are aiming for this, and are currently working on it.

    However, it is possible that, due to technical reasons, the GLES 3.1 version might work slightly differently and have different performance characteristics.
     
  40. Thermos

    Thermos

    Joined:
    Feb 23, 2015
    Posts:
    148
    The Hybrid Renderer in ECS 0.17 is already pretty fast. Can we expect better performance by adopting this API when ECS reaches 1.0?
     
  41. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    One of the goals of the new Hybrid Renderer is to also provide good performance in situations where the current Hybrid Renderer has trouble due to technical limitations. Examples include transparencies (which require depth sorting) and scenes with a lot of Mesh/Material variety.
     
  42. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    Does that mean only phones that support GLES 3.1 will work properly? What about phones with only GLES 3.0?
     
  43. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    We are trying to make it possible for the BatchRendererGroup API itself to support GLES3.0, but since it doesn't have support for compute shaders or SSBOs, users will be more limited in the ways they can upload instance data on GLES3.0 only devices.

    The Hybrid Renderer uses compute shaders and will require GLES 3.1.
     
  44. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    I see, but for the Hybrid Renderer, how do you handle GLES 3.1 mobile phones that don't support compute shaders?
     
  45. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Isn't compute shader support required for OpenGL ES 3.1?
     
  46. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    Yes, 3.1 requires compute, and that is the target for the Hybrid Renderer for now.
    Supporting non-compute-capable devices would require a rewrite of the data model. It's doable, but it would take quite some work and would lead to worse CPU performance on those devices.
     
  47. joshuacwilde

    joshuacwilde

    Joined:
    Feb 4, 2018
    Posts:
    727
    Just want to say I am very happy to see Unity embracing the latest technologies (ES 3), as ES 2 is basically non-existent these days, especially for mobile gamers who care at all about gaming.

    Looking forward to seeing this trend continue. Maybe in the future it will be worth it for us to switch to URP. Every mobile dev knows that every bit of performance matters on this platform.
     
  48. YuriyPopov

    YuriyPopov

    Joined:
    Sep 5, 2017
    Posts:
    237
    @joelv So I finally got around to doing some tests on this and I'm getting very mixed results.
    I'm using an exact copy of the RenderBRG script from the URP test project. I'm on Unity 2022.1.0b8.2747 and URP 13.1.5 with forward rendering and XR single pass instanced enabled.
    In my test scene I have about 7k renderers, 1 directional light with shadows, and 3 point lights with shadows. Naturally the point lights don't work with the BRG path. I'm getting my frame timings from SteamVR. With the BRG enabled I get about 5.8ms on average; however, when I toggle it off and restore the normal renderers I get about 5.4ms on average.
    Any insights on why this is happening and what I could be doing wrong?
     
  49. JussiKnuuttila

    JussiKnuuttila

    Unity Technologies

    Joined:
    Jun 7, 2019
    Posts:
    351
    The RenderBRG script in the URP test project is a relatively simple test script, and does not have very sophisticated culling, especially for shadow maps. My first guess would be that the RenderBRG version is rendering more things in the shadow maps than the GameObject version is. You can check whether this is the case e.g. by using RenderDoc.
     
  50. VincentBreysse

    VincentBreysse

    Unity Technologies

    Joined:
    May 31, 2021
    Posts:
    27
    @YuriyPopov There is a known issue in the 22.1 RenderBRG.cs script related to picking that causes huge performance issues. You can find a throwaway branch with a fix here. It has already landed on master, but that's for 2022.2. Alternatively, you can probably just grab the latest RenderBRG.cs script from Graphics master.
     
    Last edited: Feb 21, 2022