
Graphics.DrawMesh/DrawMeshInstanced fundamentally bottlenecked by the CPU

Discussion in 'General Graphics' started by jbooth, Sep 2, 2016.

  1. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    5,461
    Graphics.DrawMesh and DrawMeshInstanced functions internally add an ImmediateRenderer node to a render queue used each frame, then clear out that node at the end of the frame. This means that each Graphics.DrawMesh call needs to happen every frame.

    In our test case (52251), which has been sitting in Unity enterprise support for several months, we submitted a change to the internal code which allows you to cache these calls between frames. In the example scene provided, this removes 38ms of CPU time per frame. We also tested a change similar to the one made in Graphics.DrawMeshInstanced, where you pass an array of matrices instead of a single matrix, allowing you to reduce C#->C++ call overhead, but our results show that this accounts for only a small fraction of the CPU time used. The majority of the savings comes from having some way to not resubmit the calls each frame. The test case submitted allows you to A/B these results and see them for yourself.

    Enterprise support recently closed our ticket as "your change has been included into Unity 5.5". This is clearly not the case, and since we have been unable to get a response on this issue after months of going through the official channels, I'm hoping that someone responsible for the DrawMeshInstanced changes monitors these forums and can give us some feedback on this issue. Being able to use DrawMesh with instancing is nice and all, but it's not very useful if it's just going to leave you bottlenecked on the CPU. Additionally, DrawMeshInstanced only works on high-end devices, whereas being able to cache the submit calls and clear them later allows any device to use Graphics.DrawMesh in an extremely efficient way.
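
For readers unfamiliar with the pattern being criticized, here is a minimal sketch of per-frame submission with Graphics.DrawMesh. The component name, the grid layout, and the instance count are assumptions made for illustration; this is not the code from test case 52251.

```csharp
using UnityEngine;

// Hypothetical component: draws one mesh at many positions without GameObjects.
public class DrawMeshEveryFrame : MonoBehaviour
{
    public Mesh mesh;          // assigned in the Inspector
    public Material material;  // assigned in the Inspector
    public int count = 1000;   // illustrative instance count

    Matrix4x4[] matrices;

    void Start()
    {
        // The transforms never change, so they are computed once.
        matrices = new Matrix4x4[count];
        for (int i = 0; i < count; i++)
        {
            Vector3 position = new Vector3(i % 100, 0f, i / 100);
            matrices[i] = Matrix4x4.TRS(position, Quaternion.identity, Vector3.one);
        }
    }

    void Update()
    {
        // Each call queues the mesh for the current frame only, so every
        // instance has to be resubmitted on every frame even though nothing moved.
        for (int i = 0; i < matrices.Length; i++)
        {
            Graphics.DrawMesh(mesh, matrices[i], material, 0);
        }
    }
}
```

The loop in Update is the cost under discussion: the data is static, yet the submission work repeats every frame.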
     
  2. LeonhardP

    LeonhardP

    Unity Technologies

    Joined:
    Jul 4, 2016
    Posts:
    3,136
    Hi jbooth,
    The proper authorities have been informed. It might take a while, though, given the upcoming weekend.
     
    LennartJohansen likes this.
  3. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    5,461
    Thanks LeonhardP!
     
  4. LennartJohansen

    LennartJohansen

    Joined:
    Dec 1, 2014
    Posts:
    2,394
    I have a procedural game world with plants, trees, rocks etc that is not using the Unity terrain to render.
    Would these changes allow me to skip using normal GameObjects with a MeshRenderer and just pass in an array of matrices to render multiple instances, skipping the extra CPU work of traversing the object hierarchy every frame?

    At what time does culling of objects happen? Will off-screen meshes in the list still be processed?
     
  5. Assembler-Maze

    Assembler-Maze

    Joined:
    Jan 6, 2016
    Posts:
    630
    That would be awesome for the grass rendering system in the game we're working on.

    By the way if you're interested:
    http://forum.unity3d.com/threads/speed-tree-optimisations.317585/

    It would be awesome if you wouldn't have to submit them each frame. I really hope 'DrawMeshInstanced' will solve some of the performance issues.

    But by the way, 'high end devices'? You mean that for mobile devices, right? Instancing has existed for at least 12 years on desktop PCs... since DX9, right?
     
  6. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    5,461
    When you call Graphics.DrawMesh, it just inserts the relevant data structure used for rendering into the Queue and clears it out before the next frame; culling/batching/etc still happen as normal.

    You can do that now in 5.5 using Graphics.DrawMeshInstanced (but not Graphics.DrawMesh), the only difference is that you have to do this every frame. For our use case, this was prohibitive (38ms), and wasn't available on our platforms (mobile).
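
A minimal sketch of the array-based call described above, with a fallback for devices that do not support instancing. The component name and positions are assumptions for illustration. To the best of my knowledge a single DrawMeshInstanced call accepts at most 1023 instances, and the shader must support instancing; newer Unity versions also require GPU instancing to be enabled on the material.

```csharp
using UnityEngine;

// Hypothetical component: submits one instanced draw per frame,
// falling back to per-mesh DrawMesh where instancing is unavailable.
public class InstancedArrayDraw : MonoBehaviour
{
    public Mesh mesh;
    public Material material;   // its shader is assumed to support instancing

    // Assumed per-call limit; kept as a constant so it is easy to adjust.
    const int InstanceCount = 1023;
    readonly Matrix4x4[] matrices = new Matrix4x4[InstanceCount];

    void Start()
    {
        for (int i = 0; i < InstanceCount; i++)
        {
            Vector3 position = new Vector3(i % 32, 0f, i / 32);
            matrices[i] = Matrix4x4.TRS(position, Quaternion.identity, Vector3.one);
        }
    }

    void Update()
    {
        if (SystemInfo.supportsInstancing)
        {
            // One managed-to-native call for the whole array,
            // but it still has to be issued every frame.
            Graphics.DrawMeshInstanced(mesh, 0, material, matrices, InstanceCount);
        }
        else
        {
            // Fallback: one call per instance.
            for (int i = 0; i < InstanceCount; i++)
            {
                Graphics.DrawMesh(mesh, matrices[i], material, 0);
            }
        }
    }
}
```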
     
    deus0 likes this.
  7. LennartJohansen

    LennartJohansen

    Joined:
    Dec 1, 2014
    Posts:
    2,394
    [QUOTE]You can do that now in 5.5 using Graphics.DrawMeshInstanced (but not Graphics.DrawMesh), the only difference is that you have to do this every frame. For our use case, this was prohibitive (38ms), and wasn't available on our platforms (mobile).[/QUOTE]

    I understand. It would help a lot to have a system to "keep" this in the render loop.
     
  8. Assembler-Maze

    Assembler-Maze

    Joined:
    Jan 6, 2016
    Posts:
    630
    Do you think that this prohibitive cost is caused by the submission itself, or by the fact that on every call to 'DrawMesh' the data passed in is copied instead of being used directly? For example, if you send a MaterialPropertyBlock with every call to 'DrawMesh', all the data in it is copied instead of being used as-is.

    Do you think it would be faster if it didn't copy the data? Or do you think the submission itself is the bottleneck?
     
    deus0 likes this.
  9. zeroyao

    zeroyao

    Unity Technologies

    Joined:
    Mar 28, 2013
    Posts:
    169
    Hey @jbooth ,

    I'll talk to the team to see if it's okay to add DrawMeshPersistent and DrawMeshInstancedPersistent.
     
    AshwinMods, Pr0x1d, WonkeeKim and 7 others like this.
  10. iivo_k

    iivo_k

    Joined:
    Jan 28, 2013
    Posts:
    314
    So am I correct to assume that the persistent calls would help with objects that don't move between frames and with moving (animated) objects DrawMesh would still be slower than using GameObjects?
     
  11. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    5,461
    @zeroyao - I think a command-buffer-like approach would likely make for a more Unity-like API. An int-based ID system is fine for our uses but feels a little un-Unity-like; it was simply far easier for us to implement (since we're not as familiar with the source) than changing the command buffer system to work for this use case. Either way, something that solves the use case at similar performance would be amazing.

    @iivo_k: Pretty much, yes.
     
  12. zeroyao

    zeroyao

    Unity Technologies

    Joined:
    Mar 28, 2013
    Posts:
    169
    @jbooth Have you tried Rendering.CommandBuffer.DrawMesh?
     
    deus0 likes this.
  13. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    5,461
    @zeroyao:

    Yeah, not viable: Doesn't dynamically batch (1 draw call per mesh rendered), and you can't insert it into the normal rendering pathways (depth, shadow, drawing, etc), only after or before a given operation.
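
To make the limitation concrete, here is a minimal sketch of the CommandBuffer route. The buffer is recorded once and persists across frames, but it runs at a fixed CameraEvent rather than inside the normal depth, shadow, and opaque passes. The component name, instance count, and choice of CameraEvent.AfterForwardOpaque are assumptions for illustration.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical component: records draws once and replays them every frame.
[RequireComponent(typeof(Camera))]
public class CommandBufferDraw : MonoBehaviour
{
    public Mesh mesh;
    public Material material;

    CommandBuffer buffer;
    Camera cam;

    void OnEnable()
    {
        cam = GetComponent<Camera>();
        buffer = new CommandBuffer { name = "Persistent mesh draws" };

        for (int i = 0; i < 100; i++)
        {
            Matrix4x4 matrix = Matrix4x4.TRS(new Vector3(i, 0f, 0f), Quaternion.identity, Vector3.one);
            // Each recorded DrawMesh is issued as its own draw call.
            buffer.DrawMesh(mesh, matrix, material);
        }

        // The buffer executes at this one point in the camera's rendering,
        // not as part of the regular render queue.
        cam.AddCommandBuffer(CameraEvent.AfterForwardOpaque, buffer);
    }

    void OnDisable()
    {
        cam.RemoveCommandBuffer(CameraEvent.AfterForwardOpaque, buffer);
        buffer.Release();
    }
}
```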
     
    Assembler-Maze likes this.
  14. any_user

    any_user

    Joined:
    Oct 19, 2008
    Posts:
    374
    I agree we should have a more performant way to draw meshes repeatedly with dynamic batching. And if anybody will look into the Graphics.DrawMesh code anyway, it might also be interesting to finally get a way to set the sorting order when submitting a mesh.
     
  15. Prodigga

    Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    Replying to follow this! It would be great if the persistent drawmesh calls could 'remember' the result of the dynamic batching operation too, and reuse the result. There is CPU overhead for dynamic batching, but if our meshes are persistent and we have a thousand small meshes to draw, it would be awesome if it didn't have to recompute the dynamic batching every frame for these persistent meshes! Maybe it will already be doing this, I can't say since this has not been released yet, but just thought I'd voice my suggestion anyway. :)
     
    deus0 and Assembler-Maze like this.
  16. moure

    moure

    Joined:
    Aug 18, 2013
    Posts:
    184
    I have been doing some tests with billboards and instancing and I was wondering the same thing. As you can see in the images, the number of saved draw calls is huge, but I am still wondering what the CPU overhead is. Btw, the billboards are objects in the scene in Unity 5.4.0f3 (not using the Graphics.DrawMesh API).
    InstancingBillboards_NoImageEffects.PNG
    InstancingBillboards.PNG
     
  17. Roni92pl

    Roni92pl

    Joined:
    Jun 2, 2015
    Posts:
    396
    If you could show screenshot with profiler view with extended rendering that would be helpful.
     
    primaerfunktion likes this.
  18. moure

    moure

    Joined:
    Aug 18, 2013
    Posts:
    184
    Sure, here they are (cpu profiler and frame debugger):
    InstancingBillboards_Profiler.PNG
    InstancingBillboards_FrameDebugger.PNG
    I am just curious whether there is a "too many batched calls" issue, and where the sweet spot lies between more instances of lower-poly objects and fewer instances of higher-poly objects.
    Either way, I can't wait to test the new DrawMeshInstancedPersistent ;)
     
  19. Prodigga

    Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    I am curious too. Thinking about it further, it's probably impossible to "remember the result", because objects could move in and out of the view frustum and be culled entirely. You wouldn't want those rendered, so you'd have to recompute dynamic batching each frame for each camera. Just speculating; we should discuss dynamic batching related issues elsewhere, I guess, so we don't derail this thread.
     
  20. Assembler-Maze

    Assembler-Maze

    Joined:
    Jan 6, 2016
    Posts:
    630
    Nice looking stuff, but if you plan to draw grass, I highly recommend Unity 5.5 with its 'DrawMeshInstanced' method. It improved my grass rendering by a factor of 5 to 10 (from 30FPS to 300FPS).
     
    moure likes this.
  21. mgear

    mgear

    Joined:
    Aug 3, 2010
    Posts:
    9,408
    Just noticed in 5.4.1p1:
    - Graphics: Slight optimisation of CommandBuffer.DrawMesh and CommandBuffer.DrawRenderer.
     
  22. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,609
    I believe the technique described here could be used for grass as well. It's not really using instancing, but still able to display quite some animated billboards, which could perhaps also be grass-blades.
     
  23. Prodigga

    Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    Hey @zeroyao , any update on those persistent draw mesh calls? :)
     
    deus0 likes this.
  24. Assembler-Maze

    Assembler-Maze

    Joined:
    Jan 6, 2016
    Posts:
    630
    Hey!

    Yes, that is possible, but billboard grass looks so bad... Trust me, if you want good grass you can't really billboard it. And I've used a technique similar to yours for optimizing the SpeedTree tree drawing routines, in case you're curious:
    http://forum.unity3d.com/threads/speed-tree-optimisations.317585/
    at the bottom of the page.

    Of course, with transition and stuff, going back from the billboard to the SpeedTree billboard when you're closer to them, but it's the same technique you mentioned.

    Awesome for trees, bad for grass in my opinion.
     
  25. jesta

    jesta

    Joined:
    Jun 19, 2010
    Posts:
    294
    +1 for this feature. I can't even express how painful it is for us to have such a limited drawing API. Had to develop our own engine to ship the game because of this...
     
    jason-fisher likes this.
  26. Prodigga

    Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    Just bumpin' so it doesn't fade away forever! Hoping for an update :)
     
    Assembler-Maze likes this.
  27. Assembler-Maze

    Assembler-Maze

    Joined:
    Jan 6, 2016
    Posts:
    630
    This is why other engines are open-source so that you don't have to pm months and months for something tiny like this.

    And take GeometryUtility.CalculateFrustumPlanes: it allocates 136 bytes on each call. If you call it once per frame at 60FPS, that's about 8 KB of garbage per second! That is not very cool on mobile, for example. And we need to wait months again just to have a non-alloc version...
     
    CyRaid and StaffanEk like this.
  28. grizzly

    grizzly

    Joined:
    Dec 5, 2012
    Posts:
    357
    IMO it would just be better to increase exposure to all lower level stuff. As "developers" we need freedom and control over core elements and not this half-baked API stuff. How many DrawMesh[insert idea/fix here] will we end up with?
     
  29. grizzly

    grizzly

    Joined:
    Dec 5, 2012
    Posts:
    357
    Oddly, there is a non-allocating version, but it's private. o_O Reflection can be used to expose this method. It's ugly, but it works:
    Code (CSharp):
    // Requires: using System; using System.Reflection; using UnityEngine;
    // Look up the private static method once and bind it to a delegate so later calls avoid reflection overhead.
    MethodInfo info = typeof(GeometryUtility).GetMethod("Internal_ExtractPlanes", BindingFlags.Static | BindingFlags.NonPublic, null, new Type[] { typeof(Plane[]), typeof(Matrix4x4) }, null);
    Action<Plane[], Matrix4x4> ExtractPlanes = Delegate.CreateDelegate(typeof(Action<Plane[], Matrix4x4>), info) as Action<Plane[], Matrix4x4>;

    PS: You'll need to calculate and pass the world-to-projection matrix yourself when you invoke the method:
    Code (CSharp):
    // Allocate the plane array once and reuse it on every call.
    Plane[] planes = new Plane[6];
    ExtractPlanes(planes, camera.projectionMatrix * camera.worldToCameraMatrix);
     
    Last edited: Oct 10, 2016
    Deleted User and jason-fisher like this.
  30. Prodigga

    Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    Strange, no idea why that method is private. It looks like CalculateFrustumPlanes is just a wrapper around Internal_ExtractPlanes that allocates a new array of 6 planes every time. It feels like Internal_ExtractPlanes should be made public and renamed CalculateFrustumPlanesNonAlloc (similar to the new physics methods that have a NonAlloc variant, e.g. RaycastNonAlloc).
     
    StaffanEk likes this.
  31. spraycanmansam

    spraycanmansam

    Joined:
    Nov 22, 2012
    Posts:
    254
    Yep, doing a similar thing here except we cache the reflected method in a helper utility to save a bit of performance :)

    @zeroyao Could we request a DrawMeshInstanced overload that takes a List<Matrix4x4> instead of an array? At this point we can't use DrawMeshInstanced for our dynamic systems because the conversion from List to array is spewing out garbage...
     
    AshwinMods likes this.
  32. zeroyao

    zeroyao

    Unity Technologies

    Joined:
    Mar 28, 2013
    Posts:
    169
    Yes will do!
     
    spraycanmansam likes this.
  33. spraycanmansam

    spraycanmansam

    Joined:
    Nov 22, 2012
    Posts:
    254
    Much appreciated!
     
    Prodigga likes this.
  34. Assembler-Maze

    Assembler-Maze

    Joined:
    Jan 6, 2016
    Posts:
    630
    An extra matrix multiplication. Good but not perfect.
     
  35. Assembler-Maze

    Assembler-Maze

    Joined:
    Jan 6, 2016
    Posts:
    630
    Or we could have the engine open-source so that we don't have to wait months for a tiny fix like this?
     
  36. grizzly

    grizzly

    Joined:
    Dec 5, 2012
    Posts:
    357
    There's no extra multiplication. The matrix would have been computed inside the CalculateFrustumPlanes method anyway; since we're bypassing that method, we simply compute and pass it manually.
     
    spraycanmansam likes this.
  37. Assembler-Maze

    Assembler-Maze

    Joined:
    Jan 6, 2016
    Posts:
    630
    Yes good point.

    But isn't using reflection and looking through the code illegal, as long as the engine isn't open source? Just wondering...
     
    deus0 likes this.
  38. Prodigga

    Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    I don't think it is illegal, just unsupported, i.e. Unity may change the name or signature of the internal method and that'll break your code.

    Anyway this is really off topic, maybe a new thread or a feature request? Let's not derail this entirely, the thread is about persistent DrawMesh methods!
     
    deus0 and grizzly like this.
  39. Prodigga

    Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    @zeroyao regarding the persistent methods..
    I have been thinking about how persistent DrawMesh methods could improve my performance and have a question.

    There are some situations where I want to draw a lot of meshes semi-persistently. Say, for example, I am making a city building game, and there are a handful of different building meshes. I have a huge city with hundreds of these buildings in it. It costs me one draw call per building mesh, so if there were 20 variations of a building, I could render the entire city in 20 draw calls, instancing each type of building. So far so good. However, at some point I will need to either add a new building or delete existing buildings.

    In this example, let's say the city currently has a hundred instances of each building type (2000 buildings in the city). If the player deletes a building, I am going to have to remove that building's matrix from my matrix array/list and resubmit my array of matrices (99 matrices).

    The opposite situation is if the player constructs a new building. I have to add that matrix to my list and resubmit 101 matrices.

    There is a similar issue with MaterialPropertyBlock arrays. If each structure had some custom property, we would have to resubmit all the matrices and the giant MaterialPropertyBlock.

    I don't know how common this scenario is, but maybe it's worth considering? This is just one example, but I can think of a couple of others.

    It could be cool to have some way of submitting just the changes, instead of resubmitting all the data, i.e. a Remove and an Add method, similar to the C# List class, but where the actual removal or addition is handled engine side. Continuing with the city example, if we wanted to remove some buildings, maybe we would submit a list of indices to remove from the existing DrawMeshPersistent operation; if we wanted to add new buildings, we would submit a new matrix list and a new MaterialPropertyBlock with only the data that needs to be added, and this would be merged into the existing data engine side. No idea how any of this works under the hood, so just spitballing ideas!
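
No persistent API of this kind is shown in the thread, so the engine-side merge cannot be illustrated. What can be sketched is the script-side bookkeeping: removing one entry from the matrix list in constant time by overwriting it with the last entry. The helper name is hypothetical, and the approach assumes instance order does not matter (any parallel per-instance property list would need the same swap applied).

```csharp
using System.Collections.Generic;

// Hypothetical helper for maintaining a list of per-instance data
// (for example a List<Matrix4x4> of building transforms).
public static class InstanceListUtil
{
    // Removes the element at 'index' in O(1) by moving the last element
    // into its slot. Element order is not preserved.
    public static void RemoveAtSwapBack<T>(List<T> list, int index)
    {
        int last = list.Count - 1;
        list[index] = list[last];
        list.RemoveAt(last);
    }
}
```

Deleting building 1 out of 100 then touches two list slots on the script side; the full list is still what gets passed to the draw call each frame.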
     
  40. AlkisFortuneFish

    AlkisFortuneFish

    Joined:
    Apr 26, 2013
    Posts:
    972
    Now, ain't that code familiar... We do the same, only wrapping the delegate in an extension method for Camera.
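
A minimal sketch of what such a wrapper might look like, combining the cached delegate with an extension method on Camera. The class and method names are hypothetical, and Internal_ExtractPlanes is a private Unity method that may be renamed or removed in any release, so the sketch falls back to the public allocating call when the lookup fails.

```csharp
using System;
using System.Reflection;
using UnityEngine;

public static class CameraFrustumExtensions
{
    // Resolved once when the class is first used; reflection lookups
    // are too slow to repeat every frame.
    static readonly Action<Plane[], Matrix4x4> ExtractPlanes = CreateExtractPlanes();

    static Action<Plane[], Matrix4x4> CreateExtractPlanes()
    {
        MethodInfo info = typeof(GeometryUtility).GetMethod(
            "Internal_ExtractPlanes",
            BindingFlags.Static | BindingFlags.NonPublic,
            null,
            new Type[] { typeof(Plane[]), typeof(Matrix4x4) },
            null);

        // Null means the private method is not present in this Unity version.
        if (info == null) return null;

        return (Action<Plane[], Matrix4x4>)Delegate.CreateDelegate(
            typeof(Action<Plane[], Matrix4x4>), info);
    }

    // Fills a caller-owned array of six planes.
    public static void GetFrustumPlanesNonAlloc(this Camera camera, Plane[] planes)
    {
        if (ExtractPlanes != null)
        {
            ExtractPlanes(planes, camera.projectionMatrix * camera.worldToCameraMatrix);
        }
        else
        {
            // Fallback: the public API, which allocates a new array per call.
            Plane[] fresh = GeometryUtility.CalculateFrustumPlanes(camera);
            Array.Copy(fresh, planes, 6);
        }
    }
}
```

Usage would be along the lines of allocating `Plane[] planes = new Plane[6];` once and calling `Camera.main.GetFrustumPlanesNonAlloc(planes);` each frame.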
     
  41. spraycanmansam

    spraycanmansam

    Joined:
    Nov 22, 2012
    Posts:
    254
    Hey @zeroyao, did this ever make it into the new betas? I've been scouring patch notes but haven't had a chance to dl the latest and try.
     
  42. zeroyao

    zeroyao

    Unity Technologies

    Joined:
    Mar 28, 2013
    Posts:
    169
    Hey, it will land in 5.6 soon. Then I'll graft the changes to 5.5.
     
  43. zeroyao

    zeroyao

    Unity Technologies

    Joined:
    Mar 28, 2013
    Posts:
    169
    Hey,

    The new List<T> API for DrawMeshInstanced, along with non-alloc array property setters and getters for Material, MaterialPropertyBlock, Shader and CommandBuffer are in 5.5b11.
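
A minimal sketch of how the List-based overload might be used for a dynamic set of instances, so that no List-to-array conversion is needed per frame. The component and method names are assumptions for illustration, as is the per-call limit of 1023 instances.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical component: instances can be added at runtime and are
// drawn straight from the list, with no ToArray() call per frame.
public class InstancedListDraw : MonoBehaviour
{
    public Mesh mesh;
    public Material material;   // its shader is assumed to support instancing

    readonly List<Matrix4x4> matrices = new List<Matrix4x4>();

    public void AddInstance(Vector3 position)
    {
        // Assumed limit: keep a single call at or below 1023 instances.
        if (matrices.Count >= 1023) return;
        matrices.Add(Matrix4x4.TRS(position, Quaternion.identity, Vector3.one));
    }

    void Update()
    {
        if (matrices.Count == 0) return;
        Graphics.DrawMeshInstanced(mesh, 0, material, matrices);
    }
}
```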
     
    AshwinMods, m4d, jason-fisher and 4 others like this.
  44. Assembler-Maze

    Assembler-Maze

    Joined:
    Jan 6, 2016
    Posts:
    630
    OMG OMG OMG!

    I see a huge performance boost for some of my systems :)
     
  45. spraycanmansam

    spraycanmansam

    Joined:
    Nov 22, 2012
    Posts:
    254
    Fantastic!
     
  46. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    5,461
    [q] The new List<T> API for DrawMeshInstanced, along with non-alloc array property setters and getters for Material, MaterialPropertyBlock, Shader and CommandBuffer are in 5.5b11. [/q]

    While we're on the subject, a way to have shader arrays be saved with the material would be incredibly useful. I use Texture Arrays in a shader and want to have uniform data associated with each entry in the texture array (a float4[], essentially). What I'm doing to work around this for now is baking these values into a texture and sampling them; but this is obviously not optimal and limits those values to a 0-1 range and 8 bits of storage. Ideally, we'd be able to set the values on the material and have them saved with the material (a full property editor is not required for this, only some way to have the values saved in the material).
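
Until something like this is saved with the material itself, one possible stopgap is to serialize the values on a component and push them to the material when it is enabled. This is a sketch under assumptions: the component name and the shader property name `_LayerParams` are hypothetical, and the shader is assumed to declare a matching float4 array.

```csharp
using UnityEngine;

// Hypothetical component: stores one Vector4 per texture-array layer
// and uploads them to the material as a shader array.
[ExecuteInEditMode]
public class TextureArrayParams : MonoBehaviour
{
    public Material material;
    public Vector4[] perLayerParams;   // serialized with the scene or prefab

    void OnEnable()
    {
        Apply();
    }

    public void Apply()
    {
        if (material == null || perLayerParams == null || perLayerParams.Length == 0) return;

        // Shader arrays are set at runtime and are not saved with the
        // material asset, so they are re-applied here.
        material.SetVectorArray("_LayerParams", perLayerParams);
    }
}
```

Unlike the baked-texture workaround, the values keep full float precision and are not limited to the 0-1 range, at the cost of tying the data to a scene object rather than to the material.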
     
    jason-fisher and Peter77 like this.
  47. DanMeyer009

    DanMeyer009

    Joined:
    Sep 22, 2013
    Posts:
    23
    I have to admit that my programming skill-fu is weak. Will there be a proper guide write-up on how to use these in the near future? I've searched through the upgrade docs and manuals, but they really only mention how to add instancing to existing shaders. As it is now, I've only been using a custom surface shader that I've added instancing support to for the vegetation in my game, but if I'm understanding this right, I will have to group my objects into a list and instantiate at runtime? As none of my vegetation transforms move, it should work with the non-alloc list DrawMeshInstanced, correct? I apologize for the noob questions in advance. Any advice on how to properly use these added APIs will be greatly appreciated.
     
  48. spraycanmansam

    spraycanmansam

    Joined:
    Nov 22, 2012
    Posts:
    254
    If you use normal gameobjects with a renderer component using your custom surface shader with instancing support then those objects will be instanced :) The DrawMeshInstanced methods are for drawing lots of meshes manually without needing gameobjects with components and the overhead that comes with that.
     
    PutridEx likes this.
  49. Assembler-Maze

    Assembler-Maze

    Joined:
    Jan 6, 2016
    Posts:
    630
    They are 'theoretically' instanced; it's just like dynamic batching. It might work when you need it, but it might as well not work :). For some stuff (trees or vegetation, perhaps) I'd say that manual instancing is a must for decent performance.
     
    spraycanmansam likes this.
  50. DanMeyer009

    DanMeyer009

    Joined:
    Sep 22, 2013
    Posts:
    23
    Yes, I do have all my vegetation as separate gameobjects, but I really don't need them to be. I just import them from my modeling app. I'll try importing just empty positions and manually instancing with the new list function. I'll report back if I get a speed-up. I'm interested to see how far they push this for the new terrain system, as I've already seen a large speed-up, especially for rendering my grass meshes.
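
A minimal sketch of that plan for static vegetation: read the transforms of the imported empty placeholders once, group them into batches, and draw each batch every frame. The component name, the `placeholders` field, and the batch size of 1023 are assumptions for illustration.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical component: draws one vegetation mesh at every placeholder
// transform without a renderer GameObject per plant.
public class VegetationInstancer : MonoBehaviour
{
    const int MaxPerBatch = 1023;   // assumed per-call instance limit

    public Mesh mesh;
    public Material material;        // its shader is assumed to support instancing
    public Transform[] placeholders; // empty objects imported from the modeling app

    readonly List<List<Matrix4x4>> batches = new List<List<Matrix4x4>>();

    void Start()
    {
        // The plants never move, so the batches are built once.
        List<Matrix4x4> current = null;
        foreach (Transform placeholder in placeholders)
        {
            if (current == null || current.Count == MaxPerBatch)
            {
                current = new List<Matrix4x4>(MaxPerBatch);
                batches.Add(current);
            }
            current.Add(placeholder.localToWorldMatrix);
        }
    }

    void Update()
    {
        // The draw calls themselves still have to be issued every frame.
        for (int i = 0; i < batches.Count; i++)
        {
            Graphics.DrawMeshInstanced(mesh, 0, material, batches[i]);
        }
    }
}
```

Once the matrices are captured, the placeholder objects could be destroyed or disabled, since only the cached matrices are used for drawing.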