Submesh performance impact?

Discussion in 'General Graphics' started by Vince_H, Dec 8, 2016.

  1. Vince_H

    So, I am creating something that requires me to generate a metric ton of meshes. Each of my mesh filters had 6 submeshes making up the entire 3D object. However, only 2 materials were used on that object.

    Because of the large number of objects in the scene (just sitting there, with no code running on them yet), the frame drops were significant. I tracked them down to simply having 2000 mesh filters/game objects around, each with 6 submeshes.

    So the recommendation was: "Oh, submeshes generate overhead! You can optimize the meshes by using 2 submeshes instead of 6, and then just work on the vertices directly if you need to turn a submesh off. It should be much quicker."

    So I got around to doing that, and it turns out there is no measurable difference.

    So now my question: do more submeshes actually impact performance?
    I also upgraded Unity in the meantime. Could it be that the statement above is no longer true between Unity 5.3 and 5.5?
    And if this doesn't gain me performance, what will? I heard an idea about making it all 1 big mesh with a huge number of vertices, and then managing them ALL manually (doing all the separate object movement by hand, in code, and reimplementing all of the transform code and such). That did not sound appealing to me at all, nor does it sound like it would be faster than what Unity should already be doing.

    Any advice on this matter would be greatly appreciated.
     
  2. jvo3dc

    By submeshes you mean different materials on one mesh. Yes, that impacts performance, because it requires more state changes during rendering. If some of the materials are the same, Unity might batch things, which could reduce the performance loss.

    A single mesh with a huge number of vertices isn't really possible, since Unity splits meshes at 65k vertices anyway.
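A toy model of the "more state changes" point: if each piece is drawn submesh by submesh, the renderer has to switch material whenever two consecutive draws use different materials. The sketch below is plain Python, not Unity's renderer. It assumes submesh 1 uses a "front" material and submeshes 2 to 6 a "back" material (later in the thread materials 2 to 6 are said to be identical), and counts draws and material switches in submission order versus grouped by material.

```python
from typing import List, Optional, Tuple

# A draw is (piece_id, material_name). Toy model only, not Unity internals.
Draw = Tuple[int, str]


def count_material_switches(draws: List[Draw]) -> int:
    """Count how often the material changes between consecutive draws."""
    switches = 0
    current: Optional[str] = None
    for _, material in draws:
        if material != current:
            switches += 1
            current = material
    return switches


def draws_for_pieces(num_pieces: int, submesh_materials: List[str]) -> List[Draw]:
    """One draw per submesh per piece, submitted piece by piece."""
    return [(piece, mat) for piece in range(num_pieces) for mat in submesh_materials]


# Assumed layout: submesh 1 uses "front", submeshes 2-6 use "back".
draws = draws_for_pieces(2000, ["front"] + ["back"] * 5)
print(len(draws), count_material_switches(draws))  # → 12000 4000

grouped = sorted(draws, key=lambda d: d[1])  # all "back" draws, then all "front" draws
print(len(grouped), count_material_switches(grouped))  # → 12000 2
```

Grouping by material cuts the switches to two but leaves all 12000 draws; reducing the draws themselves is what batching, or fewer submeshes per piece, is for.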
     
  3. Eric5h5

    Use the profiler to see where the performance issues are coming from.

    --Eric
     
  4. Vince_H

    Obviously I tried the profiler. Here is what I got:

    [Profiler screenshot not preserved]

    Maybe I am expecting a bit too much from Unity here. But I have no shadows and 4x anti-aliasing (turning that down has no effect). Each piece has 2 submeshes, which use 2 shared materials: one for the back and one for the front of the piece. Both materials use the Mobile/Diffuse shader as well (the default shader changes nothing here either; it doesn't even get worse, and Unlit/Texture does not make it better either).

    So I am using the lightest settings I know of: only 1 light source, everything else at minimum, no frame rate cap, and no vsync, obviously. It still struggles this much with just 1.7 million vertices.

    Could it be that something in Unity is set wrong? Like some sort of batching not liking what I am trying to do?

    EDIT:
    Oh snap, I think I found at least part of it. Fixing it gets rid of those warnings and a ton of triangles.
    When I went from 6 submeshes to 2, I somehow forgot that I no longer needed to assign a materials array of 6. Simply having 4 useless materials on the mesh renderer, even though they were not used (materials 2 to 6 were all the same material), somehow created this huge performance impact. The screenshot showed 7.5 ms; it is now down to 5.1 ms. This might seem like little, but it goes from 90ish FPS to 150ish FPS, so it makes a world of difference on any device. Maybe I should just make it 1 material with a huge texture, maybe combining them at runtime, because this is ridiculously much better just from having a few fewer materials.
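The EDIT above is consistent with how Unity's Mesh Renderer treats a materials array that is longer than the mesh's submesh count: according to the Unity manual, the last submesh is rendered again with each remaining material (intended for multi-pass effects), which would explain the extra triangles. A sketch of that arithmetic, with made-up per-submesh triangle counts:

```python
from typing import List


def rendered_triangles(submesh_triangles: List[int], material_count: int) -> int:
    """Triangles drawn for one renderer, for material_count >= number of submeshes.

    Models the documented behaviour that each extra material re-renders the last submesh.
    """
    if material_count < len(submesh_triangles):
        raise ValueError("sketch only covers material_count >= number of submeshes")
    extra_materials = material_count - len(submesh_triangles)
    return sum(submesh_triangles) + extra_materials * submesh_triangles[-1]


# Hypothetical piece: 2 submeshes (front, back) with 100 triangles each.
print(rendered_triangles([100, 100], 2))  # → 200
print(rendered_triangles([100, 100], 6))  # → 600: the back submesh is drawn 4 extra times
```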
     
    Last edited: Dec 10, 2016
  5. daxiongmao

    Looking at the profiler, you are CPU bound, so combining textures would not help much.

    It looks like your meshes are small, which is causing Unity to batch them. Look at that number: 14000 saved by batching. I don't remember when Unity evaluates dynamic batching, but my guess is every frame.
    So every frame it's reading 14000 meshes, combining them into a set of larger meshes, and then drawing them.
    This is going to take some CPU time. I don't know if it's threaded or not.
    Then it has to upload that new mesh to draw it.
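Roughly, the per-frame work described here looks like the toy model below: each small mesh's vertices are transformed into world space on the CPU and appended to one shared buffer, with triangle indices offset so they still point at the right vertices. This is a simplified illustration of CPU-side mesh combining, not Unity's actual dynamic batching code; the point is that the cost grows with the total vertex count, every frame.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = Sequence[Sequence[float]]  # row-major 4x4 affine matrix


def transform_point(m: Mat4, p: Vec3) -> Vec3:
    """Apply an affine 4x4 matrix to a point (w = 1)."""
    x, y, z = p
    return (
        m[0][0] * x + m[0][1] * y + m[0][2] * z + m[0][3],
        m[1][0] * x + m[1][1] * y + m[1][2] * z + m[1][3],
        m[2][0] * x + m[2][1] * y + m[2][2] * z + m[2][3],
    )


def combine(meshes: List[Tuple[List[Vec3], List[int], Mat4]]) -> Tuple[List[Vec3], List[int]]:
    """Bake each mesh into world space and merge everything into one vertex/index buffer."""
    vertices: List[Vec3] = []
    indices: List[int] = []
    for local_vertices, local_indices, local_to_world in meshes:
        base = len(vertices)  # index offset for this mesh
        vertices.extend(transform_point(local_to_world, v) for v in local_vertices)
        indices.extend(base + i for i in local_indices)
    return vertices, indices


def translation(tx: float, ty: float, tz: float) -> List[List[float]]:
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


tri_verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tri_idx = [0, 1, 2]
verts, idx = combine([(tri_verts, tri_idx, translation(0, 0, 0)),
                      (tri_verts, tri_idx, translation(5, 0, 0))])
print(len(verts), idx)  # → 6 [0, 1, 2, 3, 4, 5]
```

With 14000 small meshes, every vertex of every mesh goes through `transform_point` each frame, which is why this can dominate CPU time when the meshes are many and small.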

    If your meshes don't move, you could try static batching. Then it should only do the batching once.
    You could also turn off dynamic batching, but that would probably be worse, because you would then be submitting 14000 draw calls with very few triangles in each, if those are cubes.

    You're really looking at a worst-case rendering scenario here: lots of small meshes dynamically changing every frame.

    I think you would need to do some of your own batching, making it smart about what has moved.
    Or try other techniques: you could do your own skinning-style rendering using a vertex shader.
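One way to be "smart about what has moved" is a dirty flag per piece: keep each piece's world-space vertices cached in the combined buffer and re-bake only the pieces whose transform changed since the last frame. A minimal sketch; the `DirtyBatch` class and its methods are hypothetical, and pieces are assumed to only translate (rotation would work the same way with a full matrix).

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


class DirtyBatch:
    """Hypothetical combined vertex buffer that only rewrites pieces that moved.

    Pieces are assumed to only translate, to keep the sketch short.
    """

    def __init__(self, pieces: List[List[Vec3]]):
        self.local = pieces
        self.offsets: List[int] = []  # where each piece starts in the buffer
        self.buffer: List[Vec3] = []
        for verts in pieces:
            self.offsets.append(len(self.buffer))
            self.buffer.extend(verts)  # every piece starts at the origin
        self.dirty: Dict[int, Vec3] = {}

    def move(self, piece: int, position: Vec3) -> None:
        """Record a new position; the buffer is only updated in flush()."""
        self.dirty[piece] = position

    def flush(self) -> int:
        """Re-bake only the dirty pieces; returns the number of vertices rewritten."""
        rewritten = 0
        for piece, (px, py, pz) in self.dirty.items():
            start = self.offsets[piece]
            for i, (x, y, z) in enumerate(self.local[piece]):
                self.buffer[start + i] = (x + px, y + py, z + pz)
            rewritten += len(self.local[piece])
        self.dirty.clear()
        return rewritten


# 2000 cube-like pieces of 8 vertices each (made-up sizes).
cube = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
batch = DirtyBatch([cube] * 2000)
batch.move(42, (1.0, 2.0, 3.0))
print(batch.flush(), len(batch.buffer))  # → 8 16000
```

The updated buffer still has to be uploaded to the GPU, so this only pays off when most pieces are at rest (for example, sleeping rigidbodies); if everything moves every frame it degenerates into full re-baking.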

    The best would probably be instanced rendering. I think Unity 5.5 supports it now, but the supported platforms may be limited.

    You might also be able to fake it with mesh particles, emitting them manually.
     
  6. Vince_H

    I think it is CPU bound; my CPU isn't great compared to my GPU. However, I am looking at a worst-case kind of scenario. I can't use static batching because these pieces are indeed going to be movable (not right now, but they will be), so I am trying to squeeze out every bit of performance I can.

    My aim is 90 FPS in VR on a GTX 980, with 2000 pieces. Then I would be happy with the results.
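For reference, the frame-time budget implied by an FPS target is 1000 ms divided by the frame rate, and vice versa. A quick check of the numbers mentioned in this thread:

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps


def fps_for_frame_time(ms: float) -> float:
    """Frame rate achievable if a whole frame takes `ms` milliseconds."""
    return 1000.0 / ms


print(round(frame_budget_ms(90), 1))      # → 11.1 (ms per frame at 90 FPS)
print(round(fps_for_frame_time(7.5), 1))  # → 133.3
print(round(fps_for_frame_time(5.1), 1))  # → 196.1
```

If the 7.5 ms and 5.1 ms profiler figures were the whole frame, they would correspond to roughly 133 and 196 FPS; the lower frame rates reported earlier suggest those figures cover only part of each frame.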

    As for dynamic batching, I am not sure whether any sort of custom dynamic batching is a thing. Someone once told me to make the whole thing 1 giant mesh and then just "magically" recalculate whatever needs recalculating every time something has to move. To me that sounds like it would cost so much CPU, doing things Unity does on the C++ side, that I am not sure it would outperform the current dynamic batching, and I am not going to write all that just to test whether that is true.
    Might there be a middle way? Some way to give Unity hints about what I know will move? The pieces will have rigidbodies later on, so could I make any sleeping rigidbody part of a "static" batch until it wakes up again? (Also, the rigidbodies will not help my performance, hence wanting each and every ms I can get! But let's not discuss that now.)

    So even though I am CPU limited, taking away 1 render pass seemed to clear up a lot of the CPU-side issues. So if I can get it down to 1 single render pass by combining the back and front and redoing my UV calculations, it should technically be a micro-optimization, right?
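Merging the front and back materials into one usually means packing both textures into one atlas and remapping each piece's UVs into its region. A sketch assuming a hypothetical atlas with the front texture in the left half and the back texture in the right half; the layout table and function name are illustrative. In Unity, Texture2D.PackTextures can build such an atlas at runtime and returns a rectangle per input texture, which would replace the hard-coded table.

```python
from typing import Dict, List, Tuple

UV = Tuple[float, float]

# Hypothetical atlas layout: (u_offset, v_offset, u_scale, v_scale) per source texture.
ATLAS_RECTS: Dict[str, Tuple[float, float, float, float]] = {
    "front": (0.0, 0.0, 0.5, 1.0),  # left half of the atlas
    "back": (0.5, 0.0, 0.5, 1.0),   # right half of the atlas
}


def remap_uvs(uvs: List[UV], texture: str) -> List[UV]:
    """Map UVs in [0, 1] of one source texture into its rectangle in the atlas."""
    u0, v0, su, sv = ATLAS_RECTS[texture]
    return [(u0 + u * su, v0 + v * sv) for u, v in uvs]


print(remap_uvs([(0.0, 0.0), (1.0, 1.0)], "back"))  # → [(0.5, 0.0), (1.0, 1.0)]
```

Two practical caveats: UVs that tile outside [0, 1] no longer work once the texture lives in an atlas, and a few pixels of padding between regions help avoid bleeding with filtering and mipmaps.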

    A vertex shader is probably out of the question, since it sounds like I can't make it do what I want, and I am bad with shaders in general. What exactly is instanced rendering, and does it help me? I can't find much documentation on it.

    Also, to make matters worse, each and every piece you see is unique. None of them are duplicates, and they may even have different numbers of vertices, which makes it even harder to use any kind of duplicate-object tricks one might use.
     
  7. hippocoder

    But combining textures is a CPU improvement, due to fewer batches / SetPass calls.
     
  8. Eric5h5

    That was me, and I'd still suggest that. A skinned mesh with bones would be simplest.

    I can't imagine you'd be using much CPU unless you're intending on all the pieces moving around at once.

    --Eric
     
  9. Vince_H

    As a matter of fact, it was you :)

    As for the skinned mesh thing, I really have no clue how to make that work.

    In the worst-case scenario I could be moving 1999 out of those 2000 pieces. I do not think I can outperform Unity's Transform by writing one in C# myself. Otherwise, what the hell has Unity been doing to their Transform that I could outperform it by reimplementing it in C#?

    Don't get me wrong, but I just don't see it happening, let alone making it work properly with physics and all. Maybe I am looking at this from the wrong angle?
     
  10. Eric5h5

    Use Mesh.bindposes and Mesh.boneWeights. The docs have example code. A skinned mesh is definitely faster than many separate meshes, especially in this case where each vertex needs just 1 bone for the blend weights.

    --Eric
     
  11. Vince_H

    I am reading that now, but it makes no sense to me in this context. These are the bones we all know and love from animations, right? Like the fixed animations you give your playable character to walk and such.

    How exactly does this work then? I have 2000 rigidbodies without their visuals, and then I make the "bones" sync up with their related rigidbodies every frame?

    It's not like I can pre-animate them, and I am not sure that setting a new animation frame every single frame will go over well with 2000 Vector3s and 2000 Quaternions, right?

    Are there any examples out there that you know of that use this approach for something similar? I would like to see how this works in action.
     
  12. Eric5h5

    No, the bones are the rigidbodies, or rather their transforms. You make a bunch of empty GameObjects with rigidbodies; those are the bones, and they are bound to the appropriate vertices in the mesh. You don't do anything manually after the skinned mesh is set up. I did something along these lines years ago... I don't seem to have the code now; maybe it was for someone else's project.

    --Eric
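A way to convince yourself that the vertices simply follow the rigidbodies: with rigid skinning (weight 1 on a single bone), a vertex ends up at boneLocalToWorld * bindpose * vertex, where the bind pose is the inverse of the bone's pose at the time the mesh was bound (the Mesh.bindposes documentation example builds it as the bone's worldToLocalMatrix times the mesh's localToWorldMatrix). The plain-Python sketch below assumes translation-only bones and a mesh at the world origin to keep the numbers readable. In Unity you would not run this yourself: SkinnedMeshRenderer does it once its bones array holds the rigidbody transforms and each vertex's BoneWeight has boneIndex0 set and weight0 = 1.

```python
from typing import List, Tuple

Mat4 = List[List[float]]
Vec3 = Tuple[float, float, float]


def matmul(a: Mat4, b: Mat4) -> Mat4:
    """4x4 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def apply(m: Mat4, p: Vec3) -> Vec3:
    """Transform a point (w = 1) by an affine 4x4 matrix."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))


def translation(tx: float, ty: float, tz: float) -> Mat4:
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def skin_rigid(vertices: List[Vec3], bone_indices: List[int],
               bone_to_world: List[Mat4], bindposes: List[Mat4]) -> List[Vec3]:
    """Rigid skinning: every vertex is driven by exactly one bone with weight 1."""
    out = []
    for v, b in zip(vertices, bone_indices):
        skin = matmul(bone_to_world[b], bindposes[b])
        out.append(apply(skin, v))
    return out


# Two pieces, one bone each. At bind time bone 0 sat at x=0 and bone 1 at x=10.
bindposes = [translation(0.0, 0.0, 0.0), translation(-10.0, 0.0, 0.0)]  # inverse bind-time poses
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
bone_indices = [0, 0, 1, 1]

# Physics moves bone 1 (its rigidbody) up by 2; bone 0 stays where it was.
bones_now = [translation(0.0, 0.0, 0.0), translation(10.0, 2.0, 0.0)]
print(skin_rigid(vertices, bone_indices, bones_now, bindposes))
# → [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 2.0, 0.0), (11.0, 2.0, 0.0)]
```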
     
  13. Vince_H

    Aaaah, so you set the bone weight to effectively 1 for the rigidbody/bone/transform that each vertex belongs to, and 0 for all the other bones. It is a one-time, probably somewhat heavy process, but from that point onward you effectively have 1 mesh and it is free.

    Well... maybe I need to split it every 65K verts. But I get the idea.
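Splitting "every 65K verts" is easiest if whole pieces are never split across meshes. A greedy sketch, assuming a 65,535-vertex limit per mesh (16-bit indices) and per-piece vertex counts known up front; the 850-vertex pieces below are made up so that 2000 of them add up to the 1.7M vertices mentioned earlier.

```python
from typing import List

MAX_VERTICES = 65535  # assumed per-mesh limit with 16-bit index buffers


def pack_pieces(vertex_counts: List[int], max_vertices: int = MAX_VERTICES) -> List[List[int]]:
    """Greedily group piece indices into meshes without exceeding max_vertices."""
    meshes: List[List[int]] = []
    current: List[int] = []
    used = 0
    for piece, count in enumerate(vertex_counts):
        if count > max_vertices:
            raise ValueError(f"piece {piece} alone exceeds {max_vertices} vertices")
        if used + count > max_vertices:  # start a new mesh
            meshes.append(current)
            current, used = [], 0
        current.append(piece)
        used += count
    if current:
        meshes.append(current)
    return meshes


groups = pack_pieces([850] * 2000)
print(len(groups), len(groups[0]), len(groups[-1]))  # → 26 77 75
```

Greedy packing in input order is simple but not optimal for very uneven piece sizes; sorting pieces by size first, or grouping pieces that are spatially close, are possible refinements.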

    That is genius; how did you find out about this specific thing? The only drawback is that it doesn't allow for any sort of culling, but then again I can probably rarely cull anything in my scene anyway. This really gives me a lot of hope.
    I hope it gives the improvement I am looking for; it looks like it would reduce the whole puzzle to 10 draw calls or maybe even fewer. And it can be made more efficient by giving the pieces fewer vertices when running on lower-end devices.

    Thanks for your patience in explaining this!
     
  14. daxiongmao

    You're right, this is true, but in this case he was already batching 14000 objects. I said this because I think his CPU problems came from having to manipulate 1.7M vertices every frame, not from submitting the draw calls.

    I don't know how efficient Unity's dynamic batching is. And since it's your code, you should be able to be smarter about batching based on what has changed and what has not.
    But if everything changes every frame, then yes, you probably can't do better than Unity.

    Like I mentioned and Eric explained, this is how I would do it, since Unity has lacked instancing support. And offloading work to the GPU, especially this kind of vector/matrix math, is usually a win.

    Construct your meshes with an extra vertex attribute for the bone index. You shouldn't even need a weight, because the weights would all be 1 for that bone, and you can handle that in the shader. I can think of other possible optimizations, but I would have to know more about your mesh structures.
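Building such a combined mesh mostly means concatenating the pieces, keeping their vertices in piece-local space, and tagging every vertex with the index of the piece (bone) it belongs to. A plain-Python sketch with hypothetical names; how the index is actually stored on a Unity mesh (for example in a spare UV channel) is an implementation choice not shown here.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def build_combined(pieces: List[Tuple[List[Vec3], List[int]]]) -> Tuple[List[Vec3], List[int], List[int]]:
    """Concatenate pieces; vertices stay in piece-local space, tagged with a bone index."""
    positions: List[Vec3] = []
    bone_index: List[int] = []  # the one extra per-vertex attribute; no weights needed
    triangles: List[int] = []
    for bone, (verts, tris) in enumerate(pieces):
        base = len(positions)
        positions.extend(verts)
        bone_index.extend([bone] * len(verts))
        triangles.extend(base + t for t in tris)
    return positions, bone_index, triangles


tri = ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0, 1, 2])
positions, bone_index, triangles = build_combined([tri, tri])
print(bone_index, triangles)  # → [0, 0, 0, 1, 1, 1] [0, 1, 2, 3, 4, 5]
```

Unlike CPU-side batching, this runs once when the mesh is created; afterwards only the per-bone matrices change from frame to frame.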

    You might be able to make use of Unity's SkinnedMeshRenderer, but I think you should just write your own; vertex shaders are a powerful tool to have.

    You will have to be careful of the 65k vertex limit. You will also be limited by the number of registers (that is, transforms/matrices/bones) you can support in the shader. You will probably need a full matrix, probably a float4x4, and will have to see how big an array you can have.
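To estimate how many pieces fit in one draw, divide the constant space available for the matrix array by the size of one matrix. A float4x4 occupies four float4 registers; the register budget below is a made-up placeholder, since the real limit depends on the shader model, the platform, and whatever other uniforms the shader uses.

```python
import math

REGISTERS_PER_MATRIX = 4        # a float4x4 occupies four float4 registers
ASSUMED_REGISTER_BUDGET = 1000  # hypothetical: float4 registers left for the bone array


def bones_per_batch(register_budget: int = ASSUMED_REGISTER_BUDGET) -> int:
    """How many bone matrices fit in the assumed register budget."""
    return register_budget // REGISTERS_PER_MATRIX


def batches_needed(pieces: int, register_budget: int = ASSUMED_REGISTER_BUDGET) -> int:
    """Minimum number of draws if each piece needs its own bone matrix."""
    return math.ceil(pieces / bones_per_batch(register_budget))


print(bones_per_batch(), batches_needed(2000))  # → 250 8
```

Each batch must also stay under the vertex limit, so the actual number of draws is whichever of the two constraints demands more.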

    Then every frame you can capture the transform matrices of all your rigidbody transforms and pass this array to the GPU using Material.SetMatrixArray(). Be smart about the memory allocations here.
    Then, in the vertex shader, you look up the matrix in the array based on the vertex's bone index and transform the vertex position by it.
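The per-frame part can be modelled as: refill one preallocated flat array of matrices in place (so nothing is allocated per frame), then let the vertex stage pick a matrix by the vertex's bone index. Below is a Python stand-in for what the C# upload loop and the vertex shader would do; in Unity the array would be passed with Material.SetMatrixArray, while the `MatrixPalette` class, its flat row-major layout, and its method names are illustrative assumptions.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
FLOATS_PER_MATRIX = 16  # row-major 4x4 in this sketch


class MatrixPalette:
    """Preallocated storage for one batch's bone matrices, refilled every frame."""

    def __init__(self, bone_count: int):
        self.data = [0.0] * (bone_count * FLOATS_PER_MATRIX)  # allocated once

    def set_matrix(self, bone: int, m: List[List[float]]) -> None:
        """CPU side: copy one rigidbody's localToWorld matrix into its slot, in place."""
        base = bone * FLOATS_PER_MATRIX
        for r in range(4):
            for c in range(4):
                self.data[base + r * 4 + c] = m[r][c]

    def transform(self, bone: int, p: Vec3) -> Vec3:
        """Vertex-shader stand-in: look up matrix[bone] and apply it to the position."""
        d, base = self.data, bone * FLOATS_PER_MATRIX
        x, y, z = p
        return tuple(
            d[base + r * 4] * x + d[base + r * 4 + 1] * y + d[base + r * 4 + 2] * z + d[base + r * 4 + 3]
            for r in range(3)
        )


palette = MatrixPalette(bone_count=2)
palette.set_matrix(1, [[1, 0, 0, 3], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])  # bone 1 moved +3 in x
print(palette.transform(1, (1.0, 1.0, 0.0)))  # → (4.0, 1.0, 0.0)
```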

    You will have to evaluate whether it's better to set the matrix to just the localToWorld transform or to do the full local-to-projection transform on the CPU.
    My guess is that it will be better to do the extra matrix multiply on the GPU, but you will have to see, depending on how many bones vs. vertices you have.
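The trade-off can be put in rough numbers. Because matrix multiplication is associative, (VP * M) * v equals VP * (M * v), so both options give the same result; only where the extra work happens differs. Pre-multiplying on the CPU costs one 4x4 by 4x4 product (64 multiplies) per bone per frame, while leaving it to the GPU costs one extra 4x4 by vector product (16 multiplies) per vertex. A count using the thread's figures, ignoring everything else:

```python
from typing import Dict


def extra_multiplies(bones: int, vertices: int) -> Dict[str, int]:
    """Multiply counts for the extra view-projection step, per frame."""
    return {
        "cpu_premultiply": bones * 64,    # VP * M once per bone (4x4 * 4x4)
        "gpu_per_vertex": vertices * 16,  # VP * (M * v) per vertex (4x4 * vec4)
    }


counts = extra_multiplies(bones=2000, vertices=1_700_000)
print(counts)  # → {'cpu_premultiply': 128000, 'gpu_per_vertex': 27200000}
```

The GPU spreads its share across many vertices in parallel while the CPU work sits on the already busy main thread, so the raw counts alone do not settle the question; measuring on the target hardware still does.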

    Once you have the data setup the shader should be relatively easy.

    Good luck.