Question How Hybrid Rendering v2 is passing material override properties to the shader

Discussion in 'Graphics for ECS' started by Opeth001, Jan 7, 2021.

  1. Opeth001

    Opeth001

    Joined:
    Jan 28, 2017
    Posts:
    1,117
    Hello Everyone,

    I'm making my own hybrid renderer and I would like to get an idea of how HRV2 passes material property overrides to the shader.
    Any explanation or link to an example/doc will be highly appreciated.

    Thanks!
     
    deus0 likes this.
  2. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,770
    May I ask why?
     
  3. Opeth001

    Opeth001

    Joined:
    Jan 28, 2017
    Posts:
    1,117
    The Hybrid Renderer V2 is very unstable:
    1) Invisible entities bug.
    2) Not compatible with the SRP Batcher.
    3) Takes too long to load big subscenes on mobile (more than 1 minute).
    4) Does not work with the URP SimpleLit shader (free performance drop).
    5) Has a very weird bug where the whole screen gets a disco effect on some devices.

    Plus, it does not support some features our game requires:
    1) Support for terrains.
    2) Lightmaps.
    3) Material property overrides do not work correctly on mid-range devices (OpenGL ES 3.1+).
     
    apkdev likes this.
  4. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    I suspect the issues you are encountering are happening at the BatchRendererGroup layer, so I won't go into how that works (you can find a description at the bottom of HR's docs page). The other piece is that they use a struct containing 128 instances of DynamicComponentTypeHandle which are used for fetching the instanced properties which then get uploaded to compute buffers dictated by the BatchRendererGroup API.
     
    Opeth001 likes this.
  5. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    Yeah the Hybrid Renderer V2 has a "GPU ECS" storage buffer which we update each frame with changed override data.
    When drawing we have an index and meta data buffer so that we can fetch out the separate properties (indices are stored in the batch renderer group and passed on from there when drawing).

    We are working towards addressing all of these points. Hybrid V2 is fully built on top of the SRP batcher, lightmaps should now be supported as well in the latest release. Terrain is still to be road mapped and we cannot guarantee GLES 3.1 support as it is now, but it's still under discussion.

    Anything else you have listed above sounds like bugs we would really like repro cases for, regardless of whether you decide to go for your own implementation or not. I'm especially interested in the perf drop you seem to have encountered in the URP shaders.
     
    apkdev, Opeth001 and DreamingImLatios like this.
  6. deus0

    deus0

    Joined:
    May 12, 2015
    Posts:
    256
    I'm very interested in how we can add our own custom properties to our custom rendering solutions.
    My chunk renderer looks like this:
    protected override void OnUpdate()
    {
        var entities = chunkMeshes.ToEntityArray(Allocator.TempJob);
        var positionData = chunkMeshes.ToComponentDataArray<ChunkPosition>(Allocator.TempJob);
        for (var i = 0; i < entities.Length; i++)
        {
            float3 position = positionData[i].GetVoxelPosition(voxelDimensions).ToFloat3();
            var positionFloat4x4 = float4x4.TRS(position, identityRotation, scale);
            Matrix4x4 positionMatrix = positionFloat4x4;
            var entity = entities[i];
            var renderMesh = EntityManager.GetSharedComponentData<ChunkMeshLink>(entity);
            Graphics.DrawMesh(renderMesh.mesh, positionMatrix, renderMesh.material, 0);
        }
        positionData.Dispose();
        entities.Dispose();
    }

    By having a simpler system to render our entities, we have more control and can optimize the data more anyway. There are a lot more benefits than with a solution that has to fit all scenarios. I am just wondering how we can pass data into the shaders, such as per-mesh colors. I could easily grab the data with an EntityQuery; I'm just wondering how we can pass it into the Graphics utility.
    I believe originally the timing was around 8-12 ms per frame for the normal rendering systems, whereas the custom one for chunks went down to less than 1 ms. (Not exact numbers, just from memory, but there was a significant speedup.)
     
    vanyfilatov likes this.
  7. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Steal one of the 0's in the positionMatrix to instead represent an index into a compute buffer that holds your unique properties. Have the vertex shader parse the index from the matrix and replace it with the zero. If using a custom shader, sample the property in the vertex shader and pass it through an interpolator. If using shader graph, parse the index again in the fragment shader nodepath too and feed it to a custom function node.
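
    A minimal sketch of the CPU side of this idea, assuming the matrix is a plain TRS matrix (so its bottom row is 0, 0, 0, 1) and that a custom shader reads the index back out of unity_ObjectToWorld and restores the 0 before transforming vertices. The class name, the method names and the choice of m30 are hypothetical and do not come from this thread:

    ```csharp
    using UnityEngine;

    public static class MatrixIndexPacking
    {
        // A 32-bit float represents integers exactly only up to 2^24,
        // so larger indices would not survive the round trip.
        public const int MaxIndex = 1 << 24;

        // Returns a copy of a TRS matrix with the index stored in m30
        // (row 3, column 0), which is 0 for any translation/rotation/scale matrix.
        public static Matrix4x4 Pack(Matrix4x4 trs, int index)
        {
            Debug.Assert(index >= 0 && index <= MaxIndex);
            trs.m30 = index;
            return trs;
        }

        // Mirrors what the shader would have to do: read the index,
        // then put the 0 back so the matrix is a valid TRS matrix again.
        public static int Unpack(ref Matrix4x4 packed)
        {
            int index = (int)packed.m30;
            packed.m30 = 0f;
            return index;
        }
    }
    ```

    With the loop from the earlier snippet this would be used as Graphics.DrawMesh(mesh, MatrixIndexPacking.Pack(positionMatrix, i), material, 0). Anything Unity derives from the matrix on the CPU side (the inverse matrix, for example) would see the modified value, so this would need testing on the target platform.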
     
  8. deus0

    deus0

    Joined:
    May 12, 2015
    Posts:
    256
    Hi DreamingImLatios, I didn't fully understand what you wrote, but it did something good. Somehow your comment triggered my brain neurons to finally solve the problem. I also double checked the unity docs on Graphics DrawMesh functions.
    If anyone has any ideas to optimize it, please let me know haha. I could really use something like 'GetSharedComponentDataFromEntity' from Unity's side. Early ECS had a query to get a list of shared components, but it wasn't added to the newer ECS packages.
    This was the solution for inserting custom properties into a custom render system.
    Code (CSharp):
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Mathematics;
    using UnityEngine;

    public class MinivoxRenderSystem : SystemBase
    {
        private EntityQuery chunkMeshes;
        private MaterialPropertyBlock materialPropertyBlock;
        private UnityEngine.Color setColor;

        protected override void OnCreate()
        {
            chunkMeshes = GetEntityQuery(ComponentType.ReadOnly<RenderChunkMesh>(), ComponentType.ReadOnly<MinivoxChunkRender>(),
                ComponentType.ReadOnly<MaterialBaseColor>());
            materialPropertyBlock = new MaterialPropertyBlock();
        }

        protected override void OnUpdate()
        {
            //var entities = chunkMeshes.ToEntityArrayAsync(Allocator.TempJob, out var jobHandle);
            //Dependency = JobHandle.CombineDependencies(Dependency, jobHandle);
            var entities = chunkMeshes.ToEntityArray(Allocator.TempJob);
            var positions = GetComponentDataFromEntity<MinivoxChunkRender>(true);
            var colors = GetComponentDataFromEntity<MaterialBaseColor>(true);
            var scale = new float3(1, 1, 1);
            var identityRotation = quaternion.identity;
            for (var i = 0; i < entities.Length; i++)
            {
                var e = entities[i];
                var position = positions[e].position;
                var positionFloat4x4 = float4x4.TRS(position, identityRotation, scale);
                Matrix4x4 positionMatrix = positionFloat4x4;
                // Shared components cannot be read through ComponentDataFromEntity,
                // so they are fetched per entity through the EntityManager.
                var renderChunkMesh = EntityManager.GetSharedComponentData<RenderChunkMesh>(e);
                // Copy the float4 color into the property block before each draw.
                var color = colors[e].Value;
                setColor.r = color.x;
                setColor.g = color.y;
                setColor.b = color.z;
                setColor.a = color.w;
                materialPropertyBlock.SetColor("_BaseColor", setColor);
                Graphics.DrawMesh(renderChunkMesh.mesh, positionMatrix, renderChunkMesh.material, 0, null, 0, materialPropertyBlock);
                // Graphics.DrawMesh(renderChunkMesh.mesh, positionMatrix, renderChunkMesh.material, 0);
            }
            entities.Dispose();
        }
    }
     
  9. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    1,154
    deus0 likes this.
  10. deus0

    deus0

    Joined:
    May 12, 2015
    Posts:
    256
    I did check the thread, but I wasn't able to see any useful information. They mentioned Graphics.DrawMeshInstancedProcedural but I am using unique meshes, so that does not apply to me I believe.
    Oh and something about copying directly to the compute buffer.. sounded useful? Would that replace MaterialBlocks I wonder.. Just not sure if that was what you meant?
     
  11. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    1,154
    Sorry, I did not read very carefully, if you strictly have unique meshes, much of the thread I copied does not apply, apologies.
    You can use Graphics.DrawMeshInstancedIndirect and use compute buffers instead of a MaterialPropertyBlock. As usual, DreamingImLatios's advice is good...
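
    A rough sketch of what such a call can look like, assuming a shader that reads its per-instance data from a StructuredBuffer indexed by the instance ID. The class name and the "_InstanceData" property name are made up for illustration; the five-uint argument layout follows the Unity scripting reference for DrawMeshInstancedIndirect:

    ```csharp
    using UnityEngine;

    public sealed class IndirectDrawer : System.IDisposable
    {
        // Indirect arguments: index count per instance, instance count,
        // start index location, base vertex location, start instance location.
        private readonly uint[] args = new uint[5];
        private readonly ComputeBuffer argsBuffer;

        public IndirectDrawer()
        {
            argsBuffer = new ComputeBuffer(1, args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
        }

        public void Draw(Mesh mesh, Material material, ComputeBuffer perInstanceData, int instanceCount, Bounds bounds)
        {
            args[0] = mesh.GetIndexCount(0);
            args[1] = (uint)instanceCount;
            args[2] = mesh.GetIndexStart(0);
            args[3] = mesh.GetBaseVertex(0);
            args[4] = 0;
            argsBuffer.SetData(args);

            // "_InstanceData" is a hypothetical buffer name that the shader would have to declare.
            material.SetBuffer("_InstanceData", perInstanceData);
            Graphics.DrawMeshInstancedIndirect(mesh, 0, material, bounds, argsBuffer);
        }

        public void Dispose()
        {
            argsBuffer.Release();
        }
    }
    ```

    Each call draws a single mesh, so with strictly unique meshes this ends up as one call (and one set of arguments) per mesh.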
     
    deus0 likes this.
  12. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    I'm not familiar enough with the performance characteristics of DrawMesh and MaterialPropertyBlock, nor do I know enough about your target platforms, so I would have to see actual profiling results before I could fully recommend swapping MaterialPropertyBlock for a compute buffer, especially if you aren't familiar with the technique.

    But a different question, is there a reason you are not using Entities.ForEach in your most recent code snippet?
     
    deus0 likes this.
  13. deus0

    deus0

    Joined:
    May 12, 2015
    Posts:
    256
    Ahh there was no good reason... I do remember grabbing this code from the earliest ECS days and haven't looked at it enough since. I knew something was wrong. I apologize in advance for a long post.

    As for the compute buffers, they should be quicker than using material property blocks. I'm just a bit unfamiliar with them. I just checked the documentation:
    https://docs.unity3d.com/ScriptReference/ComputeBuffer.SetData.html
    I think this function is the best to use with ECS:
    public void SetData(NativeArray<T> data);
    Not sure how to implement it when it comes to using it, but I found a tutorial that will surely help:
    https://catlikecoding.com/unity/tutorials/basics/compute-shaders/

    I was able to get the timing for 250 unique grass models down to 0.8ms when setting the material property block, and 0.5ms without setting the material base color (commented out). I expect it to improve further if I implement the ComputeBuffer using a NativeArray, update the NativeArray in a parallel job, and only recreate the NativeArray when the size changes. From experience, it should get close to 0.01ms for that amount of data being moved to parallel systems.
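
    A minimal sketch of that SetData(NativeArray<T>) idea, assuming one float4 color per mesh and a shader that declares a matching structured buffer. The class name and the "_Colors" property are hypothetical:

    ```csharp
    using Unity.Collections;
    using Unity.Mathematics;
    using UnityEngine;

    public sealed class ColorBufferUploader : System.IDisposable
    {
        private ComputeBuffer buffer;

        // Uploads one float4 per mesh, recreating the GPU buffer only when the count changes.
        public void Upload(NativeArray<float4> colors, Material material)
        {
            if (colors.Length == 0)
            {
                return; // A ComputeBuffer cannot be created with zero elements.
            }

            if (buffer == null || buffer.count != colors.Length)
            {
                buffer?.Release();
                buffer = new ComputeBuffer(colors.Length, sizeof(float) * 4);
            }

            buffer.SetData(colors);
            // "_Colors" is an assumed name; the shader has to declare the same buffer.
            material.SetBuffer("_Colors", buffer);
        }

        public void Dispose()
        {
            buffer?.Release();
            buffer = null;
        }
    }
    ```

    The NativeArray itself can be filled in a parallel job beforehand; only the SetData call has to happen on the main thread.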

    The catlike tutorial also went over how to set up compute shaders to work with shadergraph by using a custom function node. Seems a little complicated, but essentially we are just declaring data into the GPU using the ComputeBuffer class, and then using the Shader to read from it. So this lets us do a number of awesome things. I could essentially move any of my game logic to the GPU by setting up the data properly! *a whole new world* This will help me with my other roadblock of not being able to implement vertex skinning deformations. https://forum.unity.com/threads/dots-animation-skinning.1039747/

    Using DrawMeshInstancedProcedural or DrawMeshInstancedIndirect should do wonders for my custom particle systems though. I am already eyeing those results. The Sprite Renderer project linked was very impressive at showcasing the technology.

    Code (CSharp):
    // todo: use attribute to make this work in editor
    [UpdateInGroup(typeof(PresentationSystemGroup))]
    public class MinivoxRenderSystem : SystemBase
    {
        private UnityEngine.MaterialPropertyBlock materialPropertyBlock;
        private UnityEngine.Color setColor;

        protected override void OnCreate()
        {
            materialPropertyBlock = new UnityEngine.MaterialPropertyBlock();
        }

        protected override void OnUpdate()
        {
            Entities.ForEach((Entity e, in RenderChunkMesh renderChunkMesh, in MinivoxChunkRender minivoxChunkRender, in MaterialBaseColor materialBaseColor) =>
            {
                var color = materialBaseColor.Value;
                setColor.r = color.x;
                setColor.g = color.y;
                setColor.b = color.z;
                setColor.a = color.w;
                materialPropertyBlock.SetColor("_BaseColor", setColor);
                UnityEngine.Graphics.DrawMesh(renderChunkMesh.mesh, minivoxChunkRender.positionMatrix, renderChunkMesh.material, 0, null, 0, materialPropertyBlock);
            }).WithoutBurst().Run();
        }
    }
    (I'm also not sure how we can get this to draw in editor when pausing the game, after selecting a GameObject it disappears, I was hoping there was an attribute we can add to the system, tried many so far)
     
    Last edited: Jan 17, 2021
  14. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    2,683
    Nope (it's useful, but not the best) :) BeginWrite/EndWrite (if used properly, according to the rules) is the fastest and best possible approach in Unity for writing into compute buffers. Depending on the hardware, it will be faster than or (in very rare cases) equal to SetData, but never slower.
     
    deus0 and Opeth001 like this.
  15. Opeth001

    Opeth001

    Joined:
    Jan 28, 2017
    Posts:
    1,117
    What is the correct way to write data to compute buffers using BeginWrite/EndWrite?
    The ComputeBuffer.BeginWrite API returns a NativeArray. Does that mean we can write to it in Bursted jobs?
    If so, what is the best approach to access the NativeArrays of multiple ComputeBuffers in parallel jobs?

    Thanks!
     
    deus0 likes this.
  16. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    3,356
    Snippet from our instanced indirect buffer handling. You could, I guess, complicate this by doing the copy in a job, but really it's just a memcpy under the hood, so it's quite cheap. And if you needed to optimize buffer handling, you would likely aggregate or use some other approach based on context, and for reasons other than trying to optimize out a memcpy.

    Code (csharp):
    NativeArray<IndirectInstanceData> bufferData = SourceBuffer.BeginWrite<IndirectInstanceData>(0, instanceCount);
    NativeArray<IndirectInstanceData>.Copy(Matrices, 0, bufferData, 0, instanceCount);
    SourceBuffer.EndWrite<IndirectInstanceData>(instanceCount);
     
    deus0 and Opeth001 like this.
  17. Opeth001

    Opeth001

    Joined:
    Jan 28, 2017
    Posts:
    1,117
    I wanted to do it in parallel because I thought sending data from the CPU to the GPU was very slow. Also, in my case this process doesn't happen every frame, only when map regions are enabled/disabled, but I am copying to a high number of compute buffers (~100).
     
    Last edited: Jan 18, 2021
    deus0 likes this.
  18. Opeth001

    Opeth001

    Joined:
    Jan 28, 2017
    Posts:
    1,117
    Is there a limit on the number of compute buffers used?
     
    deus0 likes this.
  19. deus0

    deus0

    Joined:
    May 12, 2015
    Posts:
    256
    I'm in the same situation, dealing with thousands of possible updates. Having it all in a parallel job would be the best! I will probably try to work on this on the weekend, and hopefully I can get it to work that way! Even in my code above, I wonder if anyone knows a way (a Unity/systematic way) to check whether a component was updated? That would optimize it a bit anyway.
    I wonder if BeginWrite/EndWrite is faster than Unity's mesh functions for setting mesh vertex/uv/tri/normal data :O
     
    Opeth001 likes this.
  20. Opeth001

    Opeth001

    Joined:
    Jan 28, 2017
    Posts:
    1,117
    Yes, you can execute your system only when the version of one of the relevant component data (CD) types has changed.
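
    One way this can look with Entities.ForEach is a change filter. This is only a sketch: ExampleBaseColor is a stand-in component invented for this example, and the system name is hypothetical:

    ```csharp
    using Unity.Entities;
    using Unity.Mathematics;

    // Hypothetical stand-in for a per-entity color component.
    public struct ExampleBaseColor : IComponentData
    {
        public float4 Value;
    }

    public class ColorChangedSystem : SystemBase
    {
        protected override void OnUpdate()
        {
            Entities
                .WithChangeFilter<ExampleBaseColor>()
                .ForEach((Entity e, in ExampleBaseColor color) =>
                {
                    // Only entities in chunks whose ExampleBaseColor data was accessed
                    // with write permission since this system last ran are visited here.
                    // This is where the changed value would be re-uploaded.
                    UnityEngine.Debug.Log($"{e} changed to {color.Value}");
                })
                .WithoutBurst()
                .Run();
        }
    }
    ```

    The filter works per chunk and reacts to write access rather than to an actual difference in value, so it can report entities whose data did not really change.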

    I'm sorry, I can't help you with this; I'm not using procedural mesh rendering. GL :)
     
    deus0 likes this.
  21. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    2,683
    https://forum.unity.com/threads/nativearrays-and-drawmeshinstanced.522888/#post-6727681
    And yes you can do it inside bursted job.
     
    Jakky27, Opeth001 and deus0 like this.
  22. joelv

    joelv

    Unity Technologies

    Joined:
    Mar 20, 2015
    Posts:
    203
    Hybrid Renderer V2 SparseUploader code for the 0.10 or the upcoming 0.11 package is probably a good place to look at for Begin/EndWrite usage. It can be improved somewhat depending on what API you run on.

    We keep a pool of compute buffers around, and each frame we get as many buffers as we need from the pool and call BeginWrite on them. We can then use the returned native arrays on burst jobs and will later call EndWrite on the buffers.
    These are main-thread APIs, sadly, so there is some room for improvement (EndWrite on job fence completion or similar).
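
    The BeginWrite, job, EndWrite sequence described above could be sketched roughly as follows. This is not the SparseUploader code; the job and class names are hypothetical, there is no pooling, and the buffer is assumed to have been created with ComputeBufferMode.SubUpdates:

    ```csharp
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;
    using UnityEngine;

    [BurstCompile]
    public struct CopyColorsJob : IJobParallelFor
    {
        [ReadOnly] public NativeArray<float4> Source;
        [WriteOnly] public NativeArray<float4> Destination;

        public void Execute(int index)
        {
            Destination[index] = Source[index];
        }
    }

    public static class SubUpdateUpload
    {
        // The buffer is assumed to have been created like this:
        // new ComputeBuffer(count, 16, ComputeBufferType.Structured, ComputeBufferMode.SubUpdates)
        public static void Upload(ComputeBuffer buffer, NativeArray<float4> colors)
        {
            // Main thread: get a writable view of the buffer memory.
            NativeArray<float4> mapped = buffer.BeginWrite<float4>(0, colors.Length);

            // Worker threads: fill the mapped memory from a Burst job.
            var job = new CopyColorsJob { Source = colors, Destination = mapped };
            job.Schedule(colors.Length, 64).Complete();

            // Main thread again: the mapped array must not be touched after this call.
            buffer.EndWrite<float4>(colors.Length);
        }
    }
    ```

    Completing the job right away keeps the sketch short. In practice the EndWrite call would be deferred until the job handle has completed later in the frame.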

    In 0.11 we also use a small pool of compute buffers where we do minimal async readbacks which we use to test for GPU liveness. We plan to implement a real interface so that we can track where the GPU is so reusing buffers is easier.

    EDIT: A side note here is that currently there are no guarantees about the memory where SubUpdates compute buffers are placed, since they need to be CPU-writable and GPU-readable. Different GPUs and APIs have different setups.
    It might be that they are placed in a way that makes them slightly slower to read from the GPU depending on access patterns, so please be aware of that. This is also something that is tricky to unify between all the APIs and GPUs we support.
     
  23. deus0

    deus0

    Joined:
    May 12, 2015
    Posts:
    256
    I ended up creating a ComputeBuffer for every unique grass mesh. Originally I tested it by batching them, but the function can't draw unique meshes unless there is one call (UnityEngine.Graphics.DrawMeshInstancedIndirect) per mesh. So one instance per buffer... It seems very inefficient. I read a forum post on polygon soup, but I don't think that's a good solution? It seems better if we could store a two-dimensional buffer on the GPU, with a set of mesh data per instance, and then draw that with one call.
    I used the same approach, but with more buffers, for GPU skinning. I just had to update the bone matrices every frame. But using GPUs to calculate parallel stuff... who would have known how useful...
    I feel like there should be a better way to store data per unique mesh, for meshes that can change a lot during gameplay, but perhaps not? Perhaps that isn't the norm haha.
    I am looking forward to new GPU buffer tools and more job usage. Thanks for the hard work.