
DrawMeshInstancedIndirect Example Comments and Questions

Discussion in '5.6 Beta' started by Noisecrime, Dec 14, 2016.

  1. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
    Hi,

Just tested out the example provided for the new DrawMeshInstancedIndirect method from here and it works really well. However it took a little while to sort out, as it's just the scripts, and I made a few changes regarding the surface shader that might be useful to incorporate back into the example.

Firstly, some quick notes.
Setting up a new project/scene in Unity will likely default to soft shadows, and depending on the quality setting this may include 2 to 4 cascades. This can heavily impact performance, since each cascade requires rendering all the instances again. Unfortunately I was testing in an existing project set to 4 cascades, so performance was roughly a quarter of what it was with no shadows.

I recommend initially setting the quality level to no cascades, and maybe even testing with shadows disabled on the directional light.
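For quick A/B testing these can also be toggled from script; a minimal sketch using the standard Unity APIs (the light reference is an assumption):

```csharp
using UnityEngine;

public class ShadowTestSetup : MonoBehaviour
{
    // Assumed reference to the scene's directional light.
    public Light directionalLight;

    void Start()
    {
        // One cascade, so each shadowed frame renders the instances once, not 2-4 times.
        QualitySettings.shadowCascades = 1;

        // Or rule shadows out entirely for a baseline measurement.
        if (directionalLight != null)
            directionalLight.shadows = LightShadows.None;
    }
}
```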

It's also best to run in deferred rendering mode, as forward mode appears to take an extra hit from additional passes, such as the depth pass needed for shadows or when using more than one light source. Mind you, I've not tested with multiple lights; I did try with DrawMeshInstanced and found they weren't rendered at all in forward.

I also noticed that, unlike DrawMeshInstanced, Unity is unable to report the total tris/verts in the stats overlay. Unsure if that is a bug or simply not possible?


    Example Code
    Made a couple of changes here.

Firstly I moved the SetBuffer() call from Update() to the end of UpdateBuffers(), since the data doesn't change unless you change the instance count. Having it in Update() didn't seem to affect performance, but it seems odd having it there.

Secondly I added a conditional check in UpdateBuffers() for the instance count being 0, as that will cause errors.
    e.g.
    Code (CSharp):
if ( instanceCount < 1 ) instanceCount = 1;

    Surface Shader
In order to correctly render shadows you need to add additional pragma directives, specifically 'addshadow'.
I imagine for forward shadows you would need to add 'fullforwardshadows' too, but I didn't test that.
    e.g.
    Code (CSharp):
#pragma surface surf Standard addshadow
    Finally in setup() I changed _Time.x to _Time.y to speed up the rotation of the assigned mesh that is instanced, otherwise it can be quite hard to see the movement.

On my GTX970 it was able to render 2 million individually rotating cubes at approx 35 fps with shadows disabled, dropping to 18 fps with non-cascaded shadows in deferred rendering, which is to be expected.


    So now the Questions
    I assume that for surface shaders the setup() function is required to derive the ObjectToWorld and WorldToObject matrices as those are no longer being generated or passed in by Unity?
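For reference, that is exactly what the docs example does: setup() reads the per-instance data and writes the instance matrices itself, since Unity no longer supplies them. A trimmed sketch based on the example (buffer name from the docs; note the inverse shortcut only holds for this translate + uniform-scale case):

```hlsl
#ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
StructuredBuffer<float4> positionBuffer;   // xyz = position, w = uniform scale
#endif

void setup()
{
#ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
    float4 data = positionBuffer[unity_InstanceID];

    // Build objectToWorld ourselves from the per-instance data.
    unity_ObjectToWorld._11_21_31_41 = float4(data.w, 0, 0, 0);
    unity_ObjectToWorld._12_22_32_42 = float4(0, data.w, 0, 0);
    unity_ObjectToWorld._13_23_33_43 = float4(0, 0, data.w, 0);
    unity_ObjectToWorld._14_24_34_44 = float4(data.xyz, 1);

    // Cheap inverse; only valid for translate + uniform scale.
    unity_WorldToObject = unity_ObjectToWorld;
    unity_WorldToObject._14_24_34 *= -1;
    unity_WorldToObject._11_22_33 = 1.0f / unity_WorldToObject._11_22_33;
#endif
}
```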

    I was able to add a custom vertex method to the surface shader that used 'unity_InstanceID' to grab specific instance data in order to modify the vertex positions, but have been unable to do the same inside the Surf function.

    For example this code
    Code (CSharp):
void surf (Input IN, inout SurfaceOutputStandard o)
{
#ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
    float4 col = colorBuffer[unity_InstanceID];
#else
    float4 col = float4(0,0,0,1);
#endif

    // Albedo comes from a texture tinted by color
    fixed4 c = tex2D (_MainTex, IN.uv_MainTex) * col;
    o.Albedo = c.rgb;
    // Metallic and smoothness come from slider variables
    o.Metallic = _Metallic;
    o.Smoothness = _Glossiness;
    o.Alpha = c.a;
}
Results in the instances being rendered as 'black' (not completely, due to ambient lighting etc.). This would suggest that UNITY_PROCEDURAL_INSTANCING_ENABLED is not defined for the surf method. If I remove the conditional check then I get the error
    undefined variable "colorBuffer" undefined variable "unity_InstanceID"

    Not sure what I need to do to resolve this?


Finally, at some point I want to investigate per-instance culling, using a compute shader to calculate which instances are within the frustum and then fill the per-instance TRS matrices into a ComputeBuffer. I think from memory that is possible using one of the ComputeBuffer types and reading back the count value or something. Would this approach work?
     
  2. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    1,625
    Hey,

    Great to see you getting such good results with this new API! And glad our example code got you up and running quickly!

    Correct. It's possible in theory, but our current tech relies on knowing the instance count on the CPU, which isn't necessarily true here. While the demo script does know it, this API lets you populate the count on the GPU. Reading back this number to the CPU would be very slow. We may look at solutions for this in the future though.

    Thanks for the fix. I'll update our docs!

    Good idea! Although I think I'll just change the GUI slider range in the docs, to fix this in the example :)

    Thanks again - added to the docs example!

    Good idea - updated the docs

    Correct. You can set them up however you like. Maybe your instance data buffer will contain full matrices. Maybe you have no need for rotation. Maybe your instance data only contains a theta, to spin the meshes around 1 axis. Notice this is where we configure the custom rotation, in our example. If you didn't need rotation, you could set them up far more efficiently, and probably get even more cubes rendering at a good fps :)

    I was able to add this code to the example, in the surf function, and it works fine for me:

    Code (CSharp):
#ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
    c.gb = (float)(unity_InstanceID % 256) / 255.0f;
#else
    c.gb = 0.0f;
#endif
    This makes the cubes appear more/less red based on their instance id. Does that code work for you?

Yes, this is possible, and is a great use for this tech! If you have your raw instance data in a ComputeBuffer, you can dispatch a ComputeShader to inspect every item, and add each visible instance to an Append Buffer. This Append Buffer would then become the input to DrawMeshInstancedIndirect. The ComputeShader would simply need to load the instance position, and test that against the camera frustum (factoring in the size of the instance too, i.e. its bounding sphere or AABB). Then, you can use ComputeBuffer.CopyCount to copy the culled instance count into the Indirect Args Buffer used for DrawMeshInstancedIndirect.
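The CPU-side wiring for this flow might look roughly like the following sketch (the kernel and buffer names are placeholders, not part of any Unity API):

```csharp
using UnityEngine;

public class InstanceCuller : MonoBehaviour
{
    public ComputeShader cullShader;      // the "CullingComputeShader"
    public Mesh instanceMesh;
    public Material instanceMaterial;
    public int instanceCount;

    ComputeBuffer instanceData;           // raw per-instance data
    ComputeBuffer visibleInstances;       // append buffer the compute shader fills
    ComputeBuffer argsBuffer;             // 5-uint indirect args

    void RenderCulled()
    {
        // Reset the append buffer's internal counter, then cull on the GPU.
        visibleInstances.SetCounterValue(0);
        int kernel = cullShader.FindKernel("CSMain");
        cullShader.SetBuffer(kernel, "instanceData", instanceData);
        cullShader.SetBuffer(kernel, "visibleInstances", visibleInstances);
        cullShader.Dispatch(kernel, Mathf.CeilToInt(instanceCount / 64f), 1, 1);

        // Copy the surviving count into args[1] (instance count); the offset is in bytes.
        ComputeBuffer.CopyCount(visibleInstances, argsBuffer, 1 * sizeof(uint));

        instanceMaterial.SetBuffer("visibleInstances", visibleInstances);
        Graphics.DrawMeshInstancedIndirect(instanceMesh, 0, instanceMaterial,
            new Bounds(Vector3.zero, Vector3.one * 1000f), argsBuffer);
    }
}
```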

    You could even implement hierarchical culling etc, if you wanted. It's totally up to you how you filter your instances in your "CullingComputeShader".



    Hope this all helps, good luck, and thanks again for the great feedback!
     

    Attached Files:

• red.png (960.1 KB)
  3. XaneFeather

    XaneFeather

    Joined:
    Sep 4, 2013
    Posts:
    90
I apologize in advance for hijacking your thread like this, but I spent some time with DrawMeshInstancedIndirect myself yesterday and found various issues, and I felt it would be best to merge all related issues into one thread rather than opening a new one.

First off, I made very similar changes to the script and moved SetBuffer() down to the UpdateBuffers() method, among other miscellaneous changes. That said, my primary goal was using the new DrawMeshInstancedIndirect API to render rich patches of vegetation, both ground cover and trees. Vegetation assets tend to have multiple submeshes, so I made various changes to support rendering a dynamic number of submeshes, with an option to turn off specific submeshes for debugging purposes. My steps were as follows:
• Created a new argument ComputeBuffer for each submesh, analogous to the existing argsBuffer, and fed it the correct index count for the respective submesh. I figured that, alternatively, I could reuse the same buffer for all submeshes and use the argsOffset parameter to point at the right entry, but I wanted to make sure there weren't any issues tied to that approach.
• Created and assigned new materials used by each submesh and fed each the positionBuffer.
• Rendered each submesh using DrawMeshInstancedIndirect with the correct material and argument buffer.
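The per-submesh draw step described above might look something like this sketch (the field names are my own assumptions, not from the original code):

```csharp
using UnityEngine;

public class SubmeshRenderer : MonoBehaviour
{
    public Mesh instanceMesh;
    public Material[] submeshMaterials;   // one material per submesh
    public bool[] submeshEnabled;         // per-submesh debug toggles
    ComputeBuffer[] argsBuffers;          // one 5-uint args buffer per submesh
    Bounds renderBounds = new Bounds(Vector3.zero, Vector3.one * 1000f);

    void Update()
    {
        // One indirect draw per submesh, each with its own args buffer and material.
        for (int sub = 0; sub < instanceMesh.subMeshCount; sub++)
        {
            if (!submeshEnabled[sub])
                continue;

            Graphics.DrawMeshInstancedIndirect(
                instanceMesh, sub, submeshMaterials[sub],
                renderBounds, argsBuffers[sub]);
        }
    }
}
```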
In my tests I noticed that performance differs greatly depending on the mesh used. Singular meshes with no submeshes generally seemed to perform better, and some meshes with submeshes even managed to crash Unity occasionally. I couldn't find a reliable way to reproduce the crash, but I'll keep experimenting.

I also couldn't find an explanation as to why submeshes seem to perform worse, even horribly at times. There seems to be a huge overhead associated with rendering meshes with submeshes, with the frametime occasionally spiking to a multiple of its normal rendering time. Here's a screencap of the profiler showcasing one of the mentioned occasions:

I couldn't manage to break Gfx.WaitForPresent up into more atomic ops, even with the Deep Profiler turned on, and I made sure VSync was turned off as well. I can definitely tie this overhead to the DrawMeshInstancedIndirect method, as turning it off at runtime got rid of any time spent in WaitForPresent.

    Another issue I noticed was that DrawMeshInstancedIndirect seems to have issues rendering submeshes properly. In my tests, I rendered the same mesh + 2 submeshes via the default MeshRenderer and DrawMeshInstancedIndirect using the same material. Here is the screencap of the model rendered by the default MeshRenderer:

And here is the same mesh rendered via DrawMeshInstancedIndirect, while correctly iterating through all its submeshes:

The tree's first two submeshes rendered just fine, while the last submesh appeared to be a subset of the first. It's rather odd. I managed to reproduce the issue with every mesh I had that featured submeshes, and it was always the last submesh that did not render properly.

    I will spend more time experimenting with the new instancing API and report back whenever I find new oddities. The issues reported in this post will be filed in a bug report later today with the project I prepared.

    I'm wondering if anyone else can confirm my sightings.
     
    Noisecrime likes this.
  4. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
    Hey, thanks for the help.

    So this is weird. I tried your example and it does work, so I went back over my version and was still unable to get it to work until I explicitly set the else condition to be non-black!

    e.g.
    This results in all black cubes
    Code (CSharp):
float4 col = 1.0f;

#ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
    col = colorBuffer[unity_InstanceID];
#else
    col = float4(0, 0, 0, 1);
#endif

fixed4 c = tex2D(_MainTex, IN.uv_MainTex) * col;
Yet this, where I simply make the else color blue, results in all differently colored cubes
    Code (CSharp):
float4 col = 1.0f;

#ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
    col = colorBuffer[unity_InstanceID];
#else
    col = float4(0, 0, 1, 1);
#endif

fixed4 c = tex2D(_MainTex, IN.uv_MainTex) * col;
This is based on passing a StructuredBuffer<float4> into the shader, using simple C# code to make a random color
    Code (CSharp):
colors[i] = new Vector4( Random.value, Random.value, Random.value, 1f );

<snip>

colorBuffer.SetData(colors);
instanceMaterial.SetBuffer("colorBuffer", colorBuffer);
I can only assume some weird shader-compiler edge case is happening here?

Don't forget to update the pad input code too then, since it's clamped to 0 as well.
     
    Last edited: Dec 14, 2016
  5. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
Can't help with the submesh issue as I've avoided using them; in fact I had assumed until recently that they wouldn't be supported with instancing, though I'm not sure why I thought so.

As for Gfx.WaitForPresent, I wouldn't worry about it. Unless I'm wrong (please someone correct me if I am), that normally suggests you are GPU bound, which would be expected when using instancing. As long as your overall framerate is above your desired value (e.g. 60 fps), it should just mean you have CPU time to burn for whatever purposes you'd like. If you are below the target framerate then you are simply pushing the GPU too hard and will have to cut back.

For example, most of my R&D with this is towards extreme numbers of fully skinned meshes, each with a unique animation (yeah, been playing too much Total War: Warhammer). While developing with DrawMeshInstanced I noticed my GPU fans spin up immediately, and checking the GPU I could see it jump instantly to 100% usage, something I have rarely if ever seen in any of my other Unity projects, or even in many games. Of course this is mostly happening due to disabling v-sync, so the GPU simply pushes as hard as it can. And as instancing is so performant, pushing millions of verts/tris per frame (I think I was pushing 35 million tris / 35 million verts, all matrix-palette skinned, per frame at 80 fps on my GTX970), the GPU is for once fully utilised, and thus takes longer than normal (or more specifically, longer than the CPU) to render a frame.
     
  6. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    1,625
    Yeah that is weird!
     
  7. zeroyao

    zeroyao

    Unity Technologies

    Joined:
    Mar 28, 2013
    Posts:
    168
It seems that we have a bug in the DrawMeshInstancedIndirect code that results in the incorrect submesh being rendered. Will fix asap.
     
    richardkettlewell likes this.
  8. XaneFeather

    XaneFeather

    Joined:
    Sep 4, 2013
    Posts:
    90
    I don't mean to push, but any ETA on this? Couldn't find anything in the changelogs so far that would address this as fixed.
     
  9. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
Came across a few more bugs.

    862862 ComputeBuffer CopyCount Broken

As far as I can tell, CopyCount into an argument buffer for an AppendBuffer appears broken, or I'm using it very wrongly. This is awkward for DrawMeshInstancedIndirect, since the whole point is to avoid any GPU readbacks.

Current workaround is to use a readback: CopyCount into a new ComputeBuffer, then GetData() on it, and finally put the value (the instance count) into the real argument array and copy that to the argument buffer.
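That workaround, sketched (`appendBuffer`, `args` and `argsBuffer` are assumed to exist as in the docs example):

```csharp
// Staging buffer that holds just the append buffer's counter.
uint[] countData = new uint[1];
ComputeBuffer countBuffer = new ComputeBuffer(1, sizeof(uint), ComputeBufferType.Raw);

// Read the count back to the CPU (this stalls the pipeline - workaround only).
ComputeBuffer.CopyCount(appendBuffer, countBuffer, 0);
countBuffer.GetData(countData);

// Write it into slot 1 (the instance count) of the real args and re-upload.
args[1] = countData[0];
argsBuffer.SetData(args);
```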


864476 DrawMeshInstancedIndirect: Shadow Passes Broken
This is a bit of a weird edge case, where it would appear from investigation that there are situations where Unity's generated shadow-pass draw calls are being passed the wrong ComputeBuffer data.

Specifically, I have one compute shader that determines the indices of instances that are within the frustum and places each index into one of four AppendBuffers. This way I can make four calls to DrawMeshInstancedIndirect(), each providing a specific mesh and a specific 'valid instance indices' lookup buffer, to draw all four mesh LODs. This works fine for rendering the instances, but enable shadows and you get the results in the attachment below.

What appears to be happening is that each Unity shadow-pass draw call is using the same indices AppendBuffer, instead of the correct one. It's almost as if the last value assigned in the normal draw calls is used instead.

Again, I was able to work around this by adding a fifth AppendBuffer that stores all indices within the frustum and then draws them all with a single LOD mesh.


863817 ComputeShaders: No SetVectors() or SetBools()
I noticed that for compute shaders there are no SetVectors() or SetBools() methods. Also there isn't a uint type, but I'm unsure if that is required or if simply passing an int will automatically get cast.


Finally, I ran into a number of issues when switching from making one DrawMeshInstancedIndirect call to four, one for each LOD. Initially I just updated the code to run in a loop, but it didn't work. It seems Unity didn't keep the changes I made to the material's buffers between each cached draw call. This sounds logical, so I figured MaterialPropertyBlocks would be the way to go. However, using one block had the same problem. I guess maybe you have to use multiple MPBs? In the end I went simple and just created duplicates of the material and assigned the buffers to each copy. I need to look at this again as I can't help feeling I'm missing something.
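The duplicate-material approach described might look like this sketch (the mesh/buffer array names and the "visibleIndices" property are assumptions for illustration):

```csharp
using UnityEngine;

public class LodRenderer : MonoBehaviour
{
    public Material instanceMaterial;
    public Mesh[] lodMeshes;              // one mesh per LOD
    ComputeBuffer[] lodIndexBuffers;      // per-LOD append buffers of visible indices
    ComputeBuffer[] lodArgsBuffers;       // per-LOD 5-uint indirect args
    Material[] lodMaterials;
    Bounds bounds = new Bounds(Vector3.zero, Vector3.one * 1000f);

    void Start()
    {
        // One material copy per LOD, so each draw keeps its own buffer bindings.
        lodMaterials = new Material[lodMeshes.Length];
        for (int i = 0; i < lodMaterials.Length; i++)
        {
            lodMaterials[i] = new Material(instanceMaterial);
            lodMaterials[i].SetBuffer("visibleIndices", lodIndexBuffers[i]);
        }
    }

    void Update()
    {
        // One indirect draw per LOD mesh.
        for (int i = 0; i < lodMaterials.Length; i++)
            Graphics.DrawMeshInstancedIndirect(
                lodMeshes[i], 0, lodMaterials[i], bounds, lodArgsBuffers[i]);
    }
}
```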
     

    Attached Files:

    richardkettlewell likes this.
  10. Kaneleka

    Kaneleka

    Joined:
    Sep 23, 2013
    Posts:
    18
Did you use the example code under Start() for creating your argument buffer? The example just looks odd to me, in particular the second ComputeBuffer argument, the stride.
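For reference, the args buffer in the docs example is five uints packed into a single element, which is why the stride is `args.Length * sizeof(uint)` for a count of 1. A sketch of that setup (`instanceMesh` and `instanceCount` assumed to exist as in the example):

```csharp
// Five uints, matching D3D11's indexed indirect-draw arguments.
uint[] args = new uint[5]
{
    instanceMesh.GetIndexCount(0), // index count per instance (submesh 0)
    (uint)instanceCount,           // instance count
    0,                             // start index location
    0,                             // base vertex location
    0                              // start instance location
};

// One element of 20 bytes, not five elements of 4 bytes.
ComputeBuffer argsBuffer = new ComputeBuffer(
    1, args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
argsBuffer.SetData(args);
```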
     
  11. Kaneleka

    Kaneleka

    Joined:
    Sep 23, 2013
    Posts:
    18
    Finally got around to installing this beta and implementing DrawMeshInstancedIndirect into my own scene to test out the performance gains. It did lower my CPU load by 1.5ms or so over DrawMeshInstanced. I'm pleased with that!

I noticed everyone in this thread relocated the SetBuffer() call from Update() to the end of UpdateBuffers(), and even the documentation was updated to reflect that. However, I noticed that without the SetBuffer() call in Update(), my objects would get cleared from the screen randomly, or any time I took window focus away from Unity and back again.
I'm sure someone from Unity is thinking -- "Now I remember, that's why it was in Update()"...
I'd love to know the real technical reason why this happens, or if there's another workaround.

I'm filling the buffer with a position Matrix4x4 for each instance (with nonuniform scaling) and was wondering if there is a "proper" way of getting the inverse of this for unity_WorldToObject within setup()? I did find a function graciously posted on the forums that works fine; however, I'm curious whether anyone knows the correct or most optimal way of obtaining this.
     
  12. XaneFeather

    XaneFeather

    Joined:
    Sep 4, 2013
    Posts:
    90
I think what we also need is a function to create a 4x4 TRS matrix inside a shader, allowing us to populate the buffers with only raw transform data to reduce their byte size. I've written my own shader function to achieve this for now, but it only allows for uniform scaling. Also, I honestly think there's no need to re-invent the wheel here - exposing such a function in UnityCG.cginc would benefit everyone.
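A minimal sketch of such a function, for the position + single-axis rotation + uniform-scale case (my own function, not a Unity built-in):

```hlsl
// Build objectToWorld from compact per-instance data:
// pos = position, theta = rotation about Y in radians, s = uniform scale.
float4x4 TRSMatrix(float3 pos, float theta, float s)
{
    float c = cos(theta);
    float n = sin(theta);
    // Row-major: rotation*scale in the upper 3x3, translation in the last column.
    return float4x4(
        s * c,  0,  s * n,  pos.x,
        0,      s,  0,      pos.y,
       -s * n,  0,  s * c,  pos.z,
        0,      0,  0,      1);
}
```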
     
  13. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
Quick update on the assumed CopyCount bug. It wasn't a bug, but silly user error: I had got confused between the argument array and the argument buffer, so I had used the dstOffset as an element index, instead of an offset in bytes. I.e. I had used a value of 1 instead of 1 * sizeof(uint)!

Damn silly mistake that I just never caught, despite looking at the line over and over. On the plus side, due to the bug report Unity will amend the method declaration to state dstOffsetBytes, improve the documentation, and possibly add a check to ensure the offset is a multiple of 4.
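i.e. the corrected call, with the destination offset given in bytes (`appendBuffer`/`argsBuffer` assumed as before):

```csharp
// args[1] holds the instance count, so the byte offset is 1 * sizeof(uint), not 1.
ComputeBuffer.CopyCount(appendBuffer, argsBuffer, 1 * sizeof(uint));
```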

I've applied the fix and it did provide a performance increase, though not quite as much as I had hoped. I need to profile further, as I was sure that using the readback method to set the count was stalling the GPU significantly.


Oh, and for reference, for anyone else wondering about this, here is the reply to my question about the lack of SetVectors and SetBools. Seems to make sense.

     
    Last edited: Feb 2, 2017
    DrBlort and richardkettlewell like this.
  14. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
Not sure about the randomly-getting-cleared-from-screen bit, but on my machine I've found that when using buffers, if I alt-tab out of the Unity editor, or if I'm running windowed and switch to another application, I often lose the effect of the shader, as if the buffer has been lost. I wonder if that is the same situation here. The weird thing is that on a colleague's machine with a different GPU of the same brand (Nvidia) this doesn't happen.

I'm really unclear as to the responsibility of the developer in this case. Should we be constantly resetting the buffer every frame or not? It seems silly to think that we should, but I believe there are cases where we currently might have to. Perhaps this is an area Unity can look into more.
     
    Last edited: Feb 2, 2017
  15. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
This might be a good idea; I'm pretty sure I did some work on this too, as well as playing around with whether or not the inverse was needed for specific shaders.

However, I don't think it's a good idea in general to generate a proper inverse in the shader, since (I assume) it must create quite a large overhead, as it will need to be regenerated for every vertex, and generating a true inverse is a costly function to begin with.

Obviously for some specific cases there is no choice, it has to be generated in the shader, but I think wherever possible it would be more advantageous to generate it on the C# side. Perhaps for best overall efficiency do it in a compute shader, so we gain the speed of GPU calculation while avoiding having to pass the data from CPU to GPU, and of course it's only done per matrix instead of per vertex as it would be in a vertex shader.
     
    Last edited: Feb 2, 2017
  16. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    1,625
    It's a trade-off, so it's probably not possible to say what's best in different use-cases. eg:
    - a low-poly mesh vs a high poly mesh
    - the cost of cpu matrix inversions vs. gpu (including bandwidth to upload the constant data)

    I'd be inclined to try generating the inverses in a compute shader, so you can keep the logic/data on the GPU, while still doing the inversion once per instance, instead of per vert. But this still increases memory bandwidth usage, compared to the pure ALU solution of doing it in the vertex shader.... so many variables to consider :)

    If it helps, here is a vertex-shader solution I've been using during prototyping for something else.. it may contain bugs, and may not be as fast as it could be. It does show that there is quite a lot of ALU required to do this task though :(

    Code (CSharp):
// transform matrix
unity_ObjectToWorld._11_21_31_41 = float4(data.transform._11_21_31, 0.0f);
unity_ObjectToWorld._12_22_32_42 = float4(data.transform._12_22_32, 0.0f);
unity_ObjectToWorld._13_23_33_43 = float4(data.transform._13_23_33, 0.0f);
unity_ObjectToWorld._14_24_34_44 = float4(data.transform._14_24_34, 1.0f);

// inverse transform matrix
float3x3 w2oRotation;
w2oRotation[0] = unity_ObjectToWorld[1].yzx * unity_ObjectToWorld[2].zxy - unity_ObjectToWorld[1].zxy * unity_ObjectToWorld[2].yzx;
w2oRotation[1] = unity_ObjectToWorld[0].zxy * unity_ObjectToWorld[2].yzx - unity_ObjectToWorld[0].yzx * unity_ObjectToWorld[2].zxy;
w2oRotation[2] = unity_ObjectToWorld[0].yzx * unity_ObjectToWorld[1].zxy - unity_ObjectToWorld[0].zxy * unity_ObjectToWorld[1].yzx;

float det = dot(unity_ObjectToWorld[0], w2oRotation[0]);

w2oRotation = transpose(w2oRotation);

w2oRotation *= rcp(det);

float3 w2oPosition = mul(w2oRotation, -unity_ObjectToWorld._14_24_34);

unity_WorldToObject._11_21_31_41 = float4(w2oRotation._11_21_31, 0.0f);
unity_WorldToObject._12_22_32_42 = float4(w2oRotation._12_22_32, 0.0f);
unity_WorldToObject._13_23_33_43 = float4(w2oRotation._13_23_33, 0.0f);
unity_WorldToObject._14_24_34_44 = float4(w2oPosition, 1.0f);
    The world matrix is uploaded as a float3x4 transform;
     
    Noisecrime likes this.
  17. Kaneleka

    Kaneleka

    Joined:
    Sep 23, 2013
    Posts:
    18
    I believe this is the same issue I'm encountering and also suspected it may be dependent upon the GPU used. In my case it happens consistently on an AMD R9 series GPU.
     
    Noisecrime likes this.
  18. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
    Do you happen to have a simple example project you could log as a bug with Unity? The only project I have where I noticed this is rather large and complex so not a good case study. Even if its not an explicit bug, reporting it might get some official stance on dealing with it.
     
    LeonhardP likes this.
  19. Kaneleka

    Kaneleka

    Joined:
    Sep 23, 2013
    Posts:
    18
    I unfortunately don't have a simple example project with this bug either. Like you my main project which I'm seeing this occur in is way too large and complex to submit. Which is likely a determining factor...
    What's interesting is that within this main project a new simple scene with just a camera and Unity's example script/shader in it I'll get the same disappearing act. However, in a completely new project I'm unable to duplicate the issue even with the same scene and project settings configured.
    I'm afraid I may have to resort to removing assets/plugins one by one in my main project in hopes of tracking this down, or just leave my workaround in place for now and hope it magically fixes itself later.
     
    Noisecrime likes this.
  20. jason-fisher

    jason-fisher

    Joined:
    Mar 19, 2014
    Posts:
    133
I just hit on an interesting issue using 5.6.0b4 with the example project: https://github.com/noisecrime/Unity-InstancedIndirectExamples --

    In the editor with 1.1m instances, vsync off and the profiler recording (deep mode disabled) I get 2.7ms CPU times for about 340fps. As soon as I disable recording in the profiler, CPU time doubles to 5ms and framerate drops to ~180fps. Maybe that is reproducible/related to some of the other issues we are seeing?
     
  21. DrBlort

    DrBlort

    Joined:
    Nov 14, 2012
    Posts:
    69
I just tested that in 5.6.0b6 and the same happens here. Not the same times (10ms to 8ms, approx) but I have an older GPU (GTX 770).
     
    jason-fisher likes this.
  22. Skolstvo

    Skolstvo

    Joined:
    Dec 21, 2015
    Posts:
    107
Is there a reason the manual's example surface shader has object-to-world matrix conversions, whereas the custom shader has none? These shaders perform the same transform manipulation. Sorry if this has been answered in this thread already.

I'm not too familiar with matrix transformations and couldn't find much in the Unity manual about those values.

    Surface shader example
    Code (csharp):
unity_ObjectToWorld._11_21_31_41 = float4(data.w, 0, 0, 0);
unity_ObjectToWorld._12_22_32_42 = float4(0, data.w, 0, 0);
unity_ObjectToWorld._13_23_33_43 = float4(0, 0, data.w, 0);
unity_ObjectToWorld._14_24_34_44 = float4(data.xyz, 1);
unity_WorldToObject = unity_ObjectToWorld;
unity_WorldToObject._14_24_34 *= -1;
unity_WorldToObject._11_22_33 = 1.0f / unity_WorldToObject._11_22_33;
    Custom shader example
    Code (csharp):
o.pos = mul(UNITY_MATRIX_VP, float4(worldPosition, 1.0f));
     
  23. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
The ObjectToWorld and WorldToObject matrices are often used beyond simple transforms of the vertex data, for example converting the view direction from object space to world space or vice versa.


    The first surface shader is a 'standard' surface shader, meaning it is part of the physically based shading (PBS) system.

I always thought naming the Unity 5 physically based shaders 'Standard' was a poor choice, as it doesn't really convey what the shader is doing. It is a complex uber-shader that can be tailored to generate code for different features/requirements (e.g. it can have or not have normal maps) based on defines. This means the majority of the code is not found in the shader itself but in the various Unity shader cginc files (download the built-in Unity shaders from the download/beta page to see them).

Anyway, the PBS method requires the WorldToObject matrix for some of its code paths, for example converting the light direction or camera/view direction into object space. If you do a search through the Unity cginc files you'll find methods that reference WorldToObject or ObjectToWorld, and you can then search for those methods to see which features call them and why they are required. So if the Standard shader requires the parallax mapping feature, it uses the ObjSpaceViewDir method, which itself requires the WorldToObject matrix.

The second shader is much simpler: it performs all its lighting calculations within that code sample, and as you can see its features do not explicitly require WorldToObject conversions.
     
    Last edited: Feb 5, 2017
    JoeStrout, LeonhardP and DrBlort like this.
  24. Skolstvo

    Skolstvo

    Joined:
    Dec 21, 2015
    Posts:
    107
Thanks for clearing that up. So I have to look inside the .cginc files and ignore the documentation. They are confusing to wade through. I'm just figuring out what I need to do if I want a different translation while still using the Standard shader with indirect instancing.

    Thank you very much for your help Noisecrime.
     
  25. buzzardsledge

    buzzardsledge

    Joined:
    Feb 26, 2017
    Posts:
    8
    I too have just started experimenting with this feature in 5.6.

I'm trying to see if a DX11 workflow using DrawIndexedInstancedIndirect and heavy use of instancing can be successfully implemented in Unity. (I should say that I'm not very familiar with Unity, and that the workflow is AEC-related and probably not a common gaming scenario.)

After a few experiments (and the example was very helpful in getting me started) I discovered I was trying to reverse engineer a mapping between DrawMeshInstancedIndirect and DrawIndexedInstancedIndirect.

Being lazy, and not being able to find a definitive answer in the beta docs, maybe somebody has insights on the following questions:

1) What, in Unity terms, is the meaning of start index location, base vertex location, and start instance location in the args buffer? I guess they map to the DX args, but just in case, what are the units for the locations? (byte, word, genuine index)
2) Is it OK in Unity to define two independent objects in a Mesh? I guess it must be, otherwise start index location and base vertex location have no meaning. In defining the mesh, does each index get added to the base vertex location, or is it an absolute into the vertex array?
3) How does start instance location relate to unity_InstanceID in a surface shader? I might expect that it should be added to unity_InstanceID, but that doesn't appear to be the case? If not, how do you get the value in the shader? This is needed so that you can define your own instance data buffers.
4) On point 3 - is it actually intended that you must use a MaterialPropertyBlock for this to work correctly? If so, is this still a constant buffer, which would severely restrict the number of instances that could be supported?
5) The 64k(ish) vertex limit / 16-bit index array will limit the number of instances that can be processed with a given material without binding new mesh vertex/instance arrays - I know this one has been going for years, but are there any plans to relax this and allow 32-bit indices?

    Maybe I'm trying to push the feature too far but it seems tantalisingly close to what I need.
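
    For reference while working through these questions, the five uints Unity consumes per draw mirror D3D11's DrawIndexedInstancedIndirect arguments, where (per the DX documentation) the three location fields are counted in elements — indices, vertices and instances — not bytes. A minimal sketch of filling one such block (the helper name is illustrative, not part of Unity's API):

    ```csharp
    using UnityEngine;

    // Sketch: one 5-uint block per draw, matching the
    // D3D11 DrawIndexedInstancedIndirect argument layout.
    public static class IndirectArgsSketch
    {
        public static uint[] Build(Mesh mesh, int subMesh, uint instanceCount,
                                   uint startIndex, uint baseVertex, uint startInstance)
        {
            return new uint[5]
            {
                mesh.GetIndexCount(subMesh), // IndexCountPerInstance
                instanceCount,               // InstanceCount
                startIndex,                  // StartIndexLocation (in indices)
                baseVertex,                  // BaseVertexLocation (in vertices)
                startInstance                // StartInstanceLocation (in instances)
            };
        }
    }
    ```

    The resulting array would be written into a ComputeBuffer created with ComputeBufferType.IndirectArguments via SetData.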
     
  26. DrBlort

    DrBlort

    Joined:
    Nov 14, 2012
    Posts:
    69
    I can answer #4: I used thousands of instances (i.e. more than 1023) with just null instead of an MPB, and used my own (computed) buffers with the instance information. No problems so far.
     
    richardkettlewell likes this.
  27. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    1,625
    1) same as DX
    2) do you mean submeshes? The api only draws one submesh per call (submesh index is a parameter)
    3) added to unity_instanceID
    4) answered in previous reply. Manual page also has an example with no MPB
    5) The 64k vert limit has nothing to do with how many instances you can submit; each instance may not have more than 64k verts. I used the example on the manual page to render 3 million cubes, each with 24 verts / 36 indices, giving a total of 72m vertices / 108m indices.

    https://docs.unity3d.com/560/Documentation/ScriptReference/Graphics.DrawMeshInstancedIndirect.html and https://msdn.microsoft.com/en-us/library/windows/desktop/ff476410(v=vs.85).aspx also have lots more info.
     
  28. buzzardsledge

    buzzardsledge

    Joined:
    Feb 26, 2017
    Posts:
    8
    Thanks for the replies all. A few comments below.

     
  29. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    1,625
    More replies :)

    2) Sounds like you just want to store a big polygon soup of vertices and indices, and then draw different regions of them. Should be fine as long as you set IndexCountPerInstance, StartIndexLocation and BaseVertexLocation accordingly.
    3) Might be a bug.. if you can prove it with a repro project, report a bug and we will fix it
    5) My memory is a bit hazy on this one, but I'm inclined to think that, as long as no individual region has more than 64K vertices, you can use BaseVertexLocation (etc) to read outside the usual 64K limits. E.g:

    If you have 100,003 vertices in a mesh, and you want to draw the last triangle, I think you want the data to look like this:
    - Index Buffer Contents: [0,1,2]
    - BaseVertexLocation [100000]
    - IndexCountPerInstance [3]

    If you're also packing lots of objects into the same index buffer too, I believe you can set StartIndexLocation to point to the start of each region. The indices themselves will all still be zero-based for each region.

    Hope that makes sense.
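
    That worked example can be written out as indirect args directly; a hedged C# sketch (assuming argsBuffer is an existing ComputeBufferType.IndirectArguments buffer and the mesh's index buffer holds just [0,1,2]):

    ```csharp
    // Args for drawing the last triangle of a 100,003-vertex buffer:
    // indices are zero-based within the region, and BaseVertexLocation
    // shifts them onto the real vertices at draw time.
    uint[] args = new uint[5]
    {
        3,      // IndexCountPerInstance
        1,      // InstanceCount
        0,      // StartIndexLocation - start of this region in the index buffer
        100000, // BaseVertexLocation - added to each index when fetching vertices
        0       // StartInstanceLocation
    };
    argsBuffer.SetData(args);
    ```

    With several regions packed into one index buffer, StartIndexLocation would instead point at the first index of each region, with each region's indices still zero-based.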
     
  30. buzzardsledge

    buzzardsledge

    Joined:
    Feb 26, 2017
    Posts:
    8
    Makes perfect sense, thanks.

    On the last point, unfortunately the Mesh object will throw an error if you try to add more than 65000 vertices. I wonder whether this check rests on the assumption of only one object per mesh with 16-bit indices, or whether there is something more fundamental.

    If it's the former, now with this instancing capability it would be useful if the test could be relaxed.
     
    Noisecrime likes this.
  31. buzzardsledge

    buzzardsledge

    Joined:
    Feb 26, 2017
    Posts:
    8
    On point 3, I think my tests still indicate that unity_InstanceID runs from 0 for each DrawMeshInstancedIndirect call, regardless of the value of start instance location. Before I work this up into a simple repro, has anyone else tried this and can confirm, or otherwise, that it works as intended?
     
  32. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
    I've not used this feature of instancing yet, but it is something I wanted to explore at some point.

    Having reported a number of bugs for this feature, and being rather embarrassed by several of them due to silly typos/misunderstandings (e.g. supplying an element offset instead of an offset in bytes), it may be worth posting your code here first to see if anyone can spot anything obviously wrong with it, before taking the time to write up a bug report.

    If you do report it as a bug, please keep the thread informed as to the results.
     
  33. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
    Yeah, I think that is a limitation we're sadly going to be stuck with for some time. I know Unity have mentioned that they want to improve the whole Mesh concept in Unity, making it easier to use, and I would hope that would include lifting the 16-bit index limitation, though I'm not sure if it's ever been mentioned.
     
    Last edited: Feb 28, 2017
  34. buzzardsledge

    buzzardsledge

    Joined:
    Feb 26, 2017
    Posts:
    8
    I think this demonstrates the point. Can anyone spot any obvious mistakes and/or is it reproducible on other systems?

    It follows the pattern of the example if you want to try it for yourselves.

    With iArgOffset set to 0 I expect to see a green and a red mesh, and with it set to 1 a green and a cyan mesh if unity_InstanceID is behaving as expected, but I actually get green and red in both cases.

    TestBehaviourScript.cs
    Code (CSharp):
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;

    public class TestBehaviourScript : MonoBehaviour {
        public Mesh instanceMesh;
        public Material instanceMaterial;
        public int iArgOffset = 0;

        private ComputeBuffer argsBuffer;
        private uint[] args = new uint[5] { 0, 0, 0, 0, 0 };

        // Use this for initialization
        void Start () {
            // Room for two sets of indirect args in one buffer.
            argsBuffer = new ComputeBuffer(2, args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
            UpdateBuffers();
        }

        // Update is called once per frame
        void Update () {
            // Byte offset into the args buffer: 5 uints per draw.
            int argsOffset = iArgOffset * 5 * sizeof(uint);
            Graphics.DrawMeshInstancedIndirect(instanceMesh, 0, instanceMaterial, new Bounds(Vector3.zero, new Vector3(10000.0f, 10000.0f, 10000.0f)), argsBuffer, argsOffset);
        }

        void UpdateBuffers()
        {
            uint[] argssum = new uint[2 * 5];

            // Indirect args: first draw starts at instance location 0,
            // second draw at instance location 1.
            uint numIndices = (instanceMesh != null) ? instanceMesh.GetIndexCount(0) : 0;
            argssum[0] = numIndices; // index count per instance
            argssum[1] = 2;          // instance count
            argssum[2] = 0;          // start index location
            argssum[3] = 0;          // base vertex location
            argssum[4] = 0;          // start instance location
            argssum[5] = numIndices;
            argssum[6] = 2;
            argssum[7] = 0;
            argssum[8] = 0;
            argssum[9] = 1;          // start instance location for the second draw

            argsBuffer.SetData(argssum);
        }

        void OnDisable()
        {
            if (argsBuffer != null)
                argsBuffer.Release();
            argsBuffer = null;
        }
    }
    TestInstancedSurfaceShader.shader
    Code (CSharp):
    Shader "Instanced/TestInstancedSurfaceShader" {
        Properties {
            _MainTex("Albedo (RGB)", 2D) = "white" {}
            _Glossiness("Smoothness", Range(0,1)) = 0.5
            _Metallic("Metallic", Range(0,1)) = 0.0
        }
        SubShader {
            Tags { "RenderType" = "Opaque" }
            LOD 200

            CGPROGRAM
            // Physically based Standard lighting model
            #pragma surface surf Standard addshadow
            #pragma multi_compile_instancing
            #pragma instancing_options procedural:setup

            sampler2D _MainTex;
            float4 col;

            struct Input {
                float2 uv_MainTex;
            };

            void setup()
            {
            #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
                // Per-instance colour and translation, keyed off unity_InstanceID.
                col = float4(0, 0, 1, 1.0);

                if (unity_InstanceID == 0)
                {
                    col = float4(1, 0, 0, 1.0);
                    unity_ObjectToWorld._14_24_34_44 = float4(0, 1, 4, 1.0f);
                }
                else if (unity_InstanceID == 1)
                {
                    col = float4(0, 1, 0, 1.0);
                    unity_ObjectToWorld._14_24_34_44 = float4(1.5, 1, 4, 1.0f);
                }
                else
                {
                    col = float4(0, 1, 1, 1.0);
                    unity_ObjectToWorld._14_24_34_44 = float4(-1.5, 1, 4, 1.0f);
                }

                unity_ObjectToWorld._11_21_31_41 = float4(1, 0, 0, 0.0f);
                unity_ObjectToWorld._12_22_32_42 = float4(0, 1, 0, 0.0f);
                unity_ObjectToWorld._13_23_33_43 = float4(0, 0, 1, 0.0f);

                // Invert unity_ObjectToWorld to build unity_WorldToObject.
                float3x3 w2oRotation;
                w2oRotation[0] = unity_ObjectToWorld[1].yzx * unity_ObjectToWorld[2].zxy - unity_ObjectToWorld[1].zxy * unity_ObjectToWorld[2].yzx;
                w2oRotation[1] = unity_ObjectToWorld[0].zxy * unity_ObjectToWorld[2].yzx - unity_ObjectToWorld[0].yzx * unity_ObjectToWorld[2].zxy;
                w2oRotation[2] = unity_ObjectToWorld[0].yzx * unity_ObjectToWorld[1].zxy - unity_ObjectToWorld[0].zxy * unity_ObjectToWorld[1].yzx;
                float det = dot(unity_ObjectToWorld[0], w2oRotation[0]);
                w2oRotation = transpose(w2oRotation);
                w2oRotation *= rcp(det);

                float3 w2oPosition = mul(w2oRotation, -unity_ObjectToWorld._14_24_34);

                unity_WorldToObject._11_21_31_41 = float4(w2oRotation._11_21_31, 0.0f);
                unity_WorldToObject._12_22_32_42 = float4(w2oRotation._12_22_32, 0.0f);
                unity_WorldToObject._13_23_33_43 = float4(w2oRotation._13_23_33, 0.0f);
                unity_WorldToObject._14_24_34_44 = float4(w2oPosition, 1.0f);
            #endif
            }

            half _Glossiness;
            half _Metallic;

            void surf(Input IN, inout SurfaceOutputStandard o) {
                // Albedo comes from the per-instance colour set in setup().
                fixed4 c = tex2D(_MainTex, IN.uv_MainTex);
                o.Albedo = col.rgb;
                o.Metallic = _Metallic;
                o.Smoothness = _Glossiness;
                o.Alpha = 1.0f;
            }
            ENDCG
        }
        FallBack "Diffuse"
    }
     
    Noisecrime and richardkettlewell like this.
  35. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    1,625
    Looks sensible to me.. sounds like a possible bug!
     
  36. DrBlort

    DrBlort

    Joined:
    Nov 14, 2012
    Posts:
    69
    Tried it on my system (5.6 b7) and yes, it seems the start instance location is being ignored.

    I actually didn't understand what to use that parameter for, and made my shaders assuming that unity_InstanceID started at zero every time. Go figure.
     
  37. buzzardsledge

    buzzardsledge

    Joined:
    Feb 26, 2017
    Posts:
    8
    OK - I've submitted a bug report.

    On the 16-bit index issue: I could live with 16-bit indices (albeit with a bit of complaining) if Meshes actually worked as Richard suggested. What's the right way to discover whether the 65000-vertex check is a fundamental limitation or just an assumption that indices are 16-bit?
     
  38. richardkettlewell

    richardkettlewell

    Unity Technologies

    Joined:
    Sep 9, 2015
    Posts:
    1,625
    From a quick glance at our mesh code, it looks like there is no way right now for you to avoid this error when using our Mesh API.

    It looks like an assumption that indices are 16-bit, rather than a limitation, but simply removing it isn't an option, as it helps inform the other 99% of users about meshes that are too big ;)

    We are working towards a more flexible mesh API, but don't have any information yet about when it might be ready.

    Sorry I can't offer you any workarounds for this limitation :(
     
  39. tengel

    tengel

    Joined:
    Aug 27, 2014
    Posts:
    4
    Hello.

    I've been playing around with DrawMeshInstancedIndirect for a while, and have added a few more examples to the ones written by Noisecrime. It might be useful to somebody: https://github.com/tiiago11/Unity-InstancedIndirectExamples.

    I came across a case where I'd like to use the vertex shader to apply other transformations to the vertices. These transform values would be per-instance. Is there any way to pass per-instance data to the vertex shader?
    Thanks.
     
    Bamfax, bzor and Noisecrime like this.
  40. buzzardsledge

    buzzardsledge

    Joined:
    Feb 26, 2017
    Posts:
    8
    So, yet another question. I want to render many instances of a number of shapes (these are genuine single meshes not packed) using a single material so I might expect something like this in Update to work:

    Code (CSharp):
    for (int i = startMeshes; i < startMeshes + meshCount; i++)
    {
        instanceMaterial.SetBuffer("transformBuffer", meshes[i].transforms);
        instanceMaterial.SetBuffer("colorBuffer", meshes[i].colors);
        Graphics.DrawMeshInstancedIndirect(meshes[i].mesh,
                                           0,
                                           instanceMaterial,
                                           meshes[i].bounds,
                                           meshes[i].drawArgs,
                                           0);
    }
    The actual result is some apparently random combination of one mesh, transform and colour buffer.

    I think this is the same problem that Noisecrime described on Dec 29 -

    Do we think this is expected behaviour? Or are we both missing something?

    I'm inclined to think it's a bug, or at least that some mechanism is required to force the transformBuffer/colorBuffer to rebind between calls to DrawMeshInstancedIndirect (which may already exist). Otherwise each instanced mesh would require its own material.

    Any other opinions/workarounds/pointers to how it should work?
     
  41. eagle555

    eagle555

    Joined:
    Aug 28, 2011
    Posts:
    2,671
    Yes, you can pass basically any info with a ComputeBuffer; you can use a struct. E.g.:
    struct TransformBuffer
    {
        float3 position;
        float4 rotation;
        float3 scale;
        float3 color;
    };

    For rotation you can use this formula in the vertex shader, where q is a quaternion:
    v.vertex.xyz = v.vertex.xyz + 2.0 * cross(q.xyz, cross(q.xyz, v.vertex.xyz) + q.w * v.vertex.xyz);
    You need to rotate the normal and tangent as well; tangent.w shouldn't be changed.
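
    That identity can be wrapped in a small helper and applied in a surface shader vertex modifier; a hedged HLSL sketch, assuming q is a unit quaternion fetched per instance and the modifier is wired up with vertex:vert on the surface pragma (names are illustrative):

    ```hlsl
    // Rotate a vector by unit quaternion q = (x, y, z, w) without building a matrix:
    // v' = v + 2 * cross(q.xyz, cross(q.xyz, v) + q.w * v)
    float3 RotateByQuat(float3 v, float4 q)
    {
        return v + 2.0 * cross(q.xyz, cross(q.xyz, v) + q.w * v);
    }

    void vert(inout appdata_full v)
    {
        float4 q = float4(0, 0, 0, 1); // per-instance rotation, e.g. read from a StructuredBuffer
        v.vertex.xyz  = RotateByQuat(v.vertex.xyz, q);
        v.normal      = RotateByQuat(v.normal, q);
        v.tangent.xyz = RotateByQuat(v.tangent.xyz, q); // leave tangent.w untouched
    }
    ```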

    Submeshes still don't seem to render with DrawMeshInstancedIndirect in Unity 5.6 beta 10...

    Nathaniel
     
    tengel likes this.
  42. tengel

    tengel

    Joined:
    Aug 27, 2014
    Posts:
    4
    Thank you for the quick reply.
    Yes, that is one option, but I was aiming for something that doesn't require any buffer. Currently I'm drawing grass, and in the setup step I generate the position and scale of each blade (based on unity_InstanceID, without any buffers). However, I want to apply the rotation in the vertex shader as you mentioned, and I can't figure out a way to pass this info (which is calculated from the instance position, rather than the vertex position) from setup to the VS.

    Currently I re-calculate the blade position in the VS (also using unity_InstanceID) and use it as a seed to generate a rotation quaternion. However, being able to pass values from setup would save work, and would be useful in other applications as well.
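
    One way to keep that re-derivation cheap is a small integer hash of unity_InstanceID, evaluated identically in both setup() and the vertex shader so the two stages agree without any buffer; a sketch with arbitrary hash constants (requires integer ops, i.e. shader model 4+):

    ```hlsl
    // Deterministic per-instance pseudo-random value in [0,1),
    // derived from the instance ID alone.
    float HashInstance(uint n)
    {
        n = (n << 13u) ^ n;
        n = n * (n * n * 15731u + 789221u) + 1376312589u;
        return float(n & 0x7fffffffu) / 2147483648.0;
    }

    // e.g. in vert(): float angle = HashInstance(unity_InstanceID) * 6.2831853;
    ```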

    Thanks.
     
  43. buzzardsledge

    buzzardsledge

    Joined:
    Feb 26, 2017
    Posts:
    8
    So the unity_InstanceID bug I reported (post #37) is now in the system as Issue 886294.

    Somehow the description says OpenGL, even though I reported it on DX; presumably Unity QA did their repro on OpenGL. Apparently it works as we expected on Metal, though.

    As a follow up to post #40, I've been doing some experiments with shadowing and instancing too. It appears that shadows don't work correctly if you have more than one call to DrawMeshInstancedIndirect in a frame.

    Has anyone else tried working with shadows and can corroborate, or otherwise, this observation before I try to work up a repro?

    I'm beginning to suspect that either I'm missing something obvious or there is some interaction between instancing and Unity's internal batching that isn't quite connected up.
     
  44. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    1,566
    Yeah, I've seen this too, or something similar; I reported the bug on 24.12.16.

    It was confirmed as reproducible, but not heard anything more.

    In my case I was issuing DrawMeshInstancedIndirect many times a frame to support multiple LODs; however, only the last of the ComputeBuffers containing instancing data was being used for rendering shadows, even though each call was meant to have its own unique buffer.
     
    buzzardsledge likes this.
  45. bzor

    bzor

    Joined:
    May 28, 2013
    Posts:
    31
    these examples are awesome, just what I needed.. thank you!!!
     
  46. tengel

    tengel

    Joined:
    Aug 27, 2014
    Posts:
    4
    @bzor thanks!!

    So I have now tested shadows: same issue. The problem appears when issuing more than one draw call. In the following example four calls are issued, and only one actually draws shadows (in no apparent order; sometimes it's another batch).

    Couple of things tested:
    • Tested with different materials and same materials.
    • Different bounds settings.
    • Same args buffer (for different draw calls with same mesh).
    • Different args buffers (with same mesh, and with different meshes).

    upload_2017-3-16_11-26-42.png

    I posted the demo here ( https://github.com/tiiago11/Unity-InstancedIndirectExamples ). Hopefully somebody else can reproduce it.
    The issue is still open and can be voted here https://issuetracker.unity3d.com/is...r-being-passsed-to-consecutive-drawmesh-calls

    The shadows are being rendered, only in the wrong position. It seems Unity is mixing up the materials somehow. The following image shows the result of two draw calls (each with its own material); the shadow pass of the second batch seems to be drawn using the first one's material.
    upload_2017-3-16_12-47-9.png
     
    Last edited: Mar 16, 2017
  47. tengel

    tengel

    Joined:
    Aug 27, 2014
    Posts:
    4
    Damn Unity. I just found a workaround for the shadows issue.
    1. If you duplicate the shader file, rename it and add each duplicate to a different material, it works. This indicates that Unity is having trouble differentiating between the materials, and is trying to batch them somehow.
    2. So I had to find a way to force Unity to disable whatever it was that was failing to separate the materials. Creating the material from code (using the shader or the original material) did not work.
    3. The uniforms I was passing with the materials were different from each other, but that didn't do it either.
    4. The only thing left was the MaterialPropertyBlock (which I wasn't using). Setting an empty MPB per draw call also didn't work.
    5. What did work was setting a unique dummy variable per MPB, so that Unity has to issue a separate call. The variable doesn't even have to be used in the shader.
    for (int i = 0; i < meshes.Length; i++)
    {
        materials[i].SetFloat("_Dim", gridDim);
        .......
        /// this is the magic line. Uncomment this for shadows!!
        mpbs[i].SetFloat("_Bla", (float)i);

        Graphics.DrawMeshInstancedIndirect(meshes[i], 0, materials[i], meshes[i].bounds, argsBuffers[i], 0, mpbs[i], castShadows, receiveShadows);
    }

    It is not ideal, but it works!! Image with 4 draw calls (one per different group).
    upload_2017-3-16_13-40-10.png


    The workaround can be found here. https://github.com/tiiago11/Unity-InstancedIndirectExamples
    Hope it can help somebody.
     
    R0man, RomBinDaHouse, Shinao and 4 others like this.
  48. nquetriths

    nquetriths

    Joined:
    Oct 5, 2014
    Posts:
    1
    Tengel you're the BEST! I've been having the shadows issue and been trying to fix it for the past hour! God damn, that was painful...
     