How to optimize performance for large armies?

Discussion in 'General Graphics' started by RakNet, Aug 16, 2019.

  1. RakNet (Joined: Oct 9, 2013; Posts: 214)
    I'm writing a game where you fight large armies on the battlefield, up to 100 units at a time. However, even with 35 units on screen, Camera.Render becomes quite expensive. Older games such as Mount & Blade could handle 100 units at a time years ago on much worse hardware, so I know that, at least in principle, this is possible. I don't have occlusion culling turned on in this case, as it wouldn't do any good considering all 35 enemies are in front of me.

    I've already exhaustively optimized combat and physics, so rendering is the last bottleneck.

    Processor: AMD Ryzen Threadripper 2970WX (24 cores at 3 GHz, 64 MB L3 cache)
    Video card: NVIDIA GeForce RTX 2080, 8 GB GDDR6 (VR-Ready)
    Motherboard: ASUS ROG STRIX X399-E Gaming RGB
    RAM: 32 GB


    [Screenshots: upload_2019-8-15_21-3-8.png, upload_2019-8-15_21-3-36.png, upload_2019-8-15_21-4-20.png, upload_2019-8-15_21-5-40.png, upload_2019-8-15_21-8-0.png]
     
  2. MartinTilo, Unity Technologies (Joined: Aug 16, 2017; Posts: 887)
    Hi there,
    You're right that the technology should allow for this, but it is going to take quite some effort to keep everything streamlined toward that goal. The critical numbers in what you're showing so far are the batches and SetPass calls in the stats window of your first screenshot.

    You need to make aggressive use of dynamic batching (read this carefully), have optimized shaders, as few materials as possible and highly optimized AI/Pathfinding and game logic to achieve the type of game you're aiming for.
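    To make the batching point concrete, here is a rough, engine-agnostic Python sketch of how sharing a mesh and material collapses draw calls. It is not Unity's actual batching logic: it assumes each unique (mesh, material) pair can be drawn with instancing in chunks of at most 1023 instances (the per-call limit documented for `Graphics.DrawMeshInstanced`), and `estimate_draw_calls` is a hypothetical helper.

```python
from collections import Counter
from math import ceil

# Per-call instance limit documented for Unity's Graphics.DrawMeshInstanced.
MAX_INSTANCES_PER_BATCH = 1023

def estimate_draw_calls(renderers, instanced=True):
    """Rough draw-call estimate for a list of (mesh_name, material_name) pairs.

    Hypothetical helper: ignores shadow passes, multi-pass shaders and sub-meshes.
    """
    if not instanced:
        # Without batching/instancing every renderer is its own draw call.
        return len(renderers)
    groups = Counter(renderers)  # renderers sharing BOTH mesh and material
    return sum(ceil(count / MAX_INSTANCES_PER_BATCH) for count in groups.values())

army = [("orc", "orc_mat")] * 30 + [("troll", "troll_mat")] * 3 + [("wolf", "wolf_mat")] * 2
print(estimate_draw_calls(army, instanced=False), estimate_draw_calls(army))
```

    One practical caveat: accessing `renderer.material` from a script creates a per-renderer copy of the material, which breaks this kind of grouping; `renderer.sharedMaterial` does not.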

    I'd suggest you check out DOTS and its demo projects. This solution, as well as the projects, is very closely aligned with what you want to achieve.
     
  3. Antypodish (Joined: Apr 29, 2014; Posts: 7,344)
    Probably higher focus on LOD is needed.
     
  4. RakNet (Joined: Oct 9, 2013; Posts: 214)
    "The critical numbers in what you're showing so far are the batches and SetPass calls in the stats window of your first screenshot."

    There are only 3 kinds of monsters in the screenshot. If I have 30 of the same monster, then by definition they have the same material. Does that mean they will be batched, or do I need to change some setting on the material?

    "I'd suggest you check out DOTS and its demo projects. This solution, as well as the projects, is very closely aligned with what you want to achieve."

    I haven't looked much into this, but doesn't it mean I'd have to rewrite my project?

    "Probably higher focus on LOD is needed."
    Can you explain how that is relevant to this situation? From the profiler, it looks like the issue is Camera.Render and not my video card.
     
  5. RakNet (Joined: Oct 9, 2013; Posts: 214)
  6. Antypodish (Joined: Apr 29, 2014; Posts: 7,344)
    You have a high number of vertices and triangles, plus a far view distance. LOD on its own could help performance here, not only on the GPU but also on the CPU. And LOD isn't only about meshes; it applies to textures and shaders as well, where applicable.
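    For context on how LOD picks a level: Unity's LODGroup compares an object's screen-relative height against per-level thresholds. The Python sketch below approximates that idea; the 60-degree vertical field of view, the threshold values and the helper names are assumptions, and Unity's own calculation also factors in the LOD Bias quality setting.

```python
import math

def screen_relative_height(object_height, distance, vertical_fov_deg):
    """Fraction of the screen height covered by an object of the given world height."""
    half_fov = math.radians(vertical_fov_deg) / 2
    return object_height / (2 * distance * math.tan(half_fov))

def select_lod(relative_height, thresholds):
    """Pick a LOD index from descending cut-offs such as [0.5, 0.2, 0.05, 0.01].

    Returns len(thresholds) when the object is below the last cut-off,
    i.e. it would be culled entirely.
    """
    for lod, cutoff in enumerate(thresholds):
        if relative_height >= cutoff:
            return lod
    return len(thresholds)

# Hypothetical cut-offs for a 2 m tall unit seen through a 60-degree vertical FOV.
THRESHOLDS = [0.5, 0.2, 0.05, 0.01]
for d in (2, 10, 40, 150, 400):
    h = screen_relative_height(2.0, d, 60.0)
    print(f"distance {d:>3} m: {h:.3f} of screen height -> LOD {select_lod(h, THRESHOLDS)}")
```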

    Btw, do you use GPU instancing on your materials?
     
  7. RakNet (Joined: Oct 9, 2013; Posts: 214)
    I don't think GPU instancing is supported for SkinnedMeshRenderer, and I'm already marking everything that doesn't move as static. I will, however, turn it on for MeshRenderers that do move.
     
  8. Antypodish (Joined: Apr 29, 2014; Posts: 7,344)
    Ah, that may be a good point regarding skinned meshes.
    However, it's worth confirming whether that is 100% the case.
     
  9. Peter77, QA Jesus (Joined: Jun 12, 2013; Posts: 5,101)
    Make sure to profile the game in a build as well, not only running it inside the editor:
     
  10. McDev02 (Joined: Nov 22, 2010; Posts: 643)
    Many things can be improved. You can easily have 500 characters on screen without fancy-pants solutions. With the specs you have, I can only assume that your game is heavily unoptimized. It should run at 200 FPS at least; based on the stats window, nothing dramatic is going on (for your specs).

    The very first thing you should investigate is the setup of your character models. Skinned mesh renderers are expensive. But first, enable GPU skinning in the Player Settings if you haven't; that can boost FPS a lot. Then disable as many skinned mesh renderers as possible (although 40 isn't too many already). To do that, add LODs to your characters and switch between a skinned mesh and a regular renderer. That might not be the perfect solution, but honestly I'd rather have static models in the distance than an unplayable frame rate. It's a quick solution that could be replaced by something else if you really need 1,000 animated objects one day.
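    A minimal sketch of "skinned up close, static further away", assuming you budget a fixed number of animated units per frame and hand them to the units nearest the camera. `MAX_SKINNED` and `assign_render_modes` are hypothetical names; in Unity the resulting flag would decide which renderer each character enables.

```python
import heapq
import math

MAX_SKINNED = 40  # hypothetical budget of animated (skinned) units per frame

def assign_render_modes(camera_pos, unit_positions, max_skinned=MAX_SKINNED):
    """Return one 'skinned' / 'static' flag per unit.

    The max_skinned units nearest the camera keep their skinned renderer;
    all others would switch to a static, non-skinned mesh.
    """
    nearest = heapq.nsmallest(
        max_skinned,
        range(len(unit_positions)),
        key=lambda i: math.dist(camera_pos, unit_positions[i]),
    )
    skinned = set(nearest)
    return ["skinned" if i in skinned else "static" for i in range(len(unit_positions))]

units = [(0.0, 0.0, float(z)) for z in range(100)]  # a column of 100 units
modes = assign_render_modes((0.0, 0.0, -5.0), units, max_skinned=40)
print(modes.count("skinned"), modes[0], modes[99])
```

    In practice you would probably add some hysteresis (only swapping a unit once it is clearly inside or outside the budget) so units near the cut-off don't flicker between modes.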

    Try to reduce the bones-per-vertex count in the Quality settings as low as possible. Reducing shadow casters can also help. I made a custom LOD solution for myself that lets me disable shadow casting per object in the distance; with Unity's LOD system you would have to switch to another mesh renderer. Shadows of small objects matter less than those of, for example, trees, and skinned mesh renderer shadows are more expensive.
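    The per-object shadow idea can be sketched as a size-over-distance test, so large objects such as trees keep their shadows longer than small units. The cut-off ratio and the stricter factor for skinned meshes are assumed tuning values; in Unity the result would set each renderer's `shadowCastingMode`.

```python
SHADOW_SIZE_TO_DISTANCE = 0.03  # hypothetical: cast shadows while size/distance >= 3%
SKINNED_PENALTY = 2.0           # hypothetical: skinned shadows cost more, so cut them earlier

def should_cast_shadow(object_size, distance, skinned=False):
    """Decide per object whether it is worth drawing into the shadow map."""
    if distance <= 0:
        return True
    cutoff = SHADOW_SIZE_TO_DISTANCE * (SKINNED_PENALTY if skinned else 1.0)
    return object_size / distance >= cutoff

print(should_cast_shadow(15.0, 300.0))              # large tree, far away
print(should_cast_shadow(2.0, 50.0, skinned=True))  # skinned unit, mid range
```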

    Also, I would investigate the scripts: you have 18 ms of script execution time, which is way too much.
     
    Last edited: Aug 19, 2019
  11. MartinTilo, Unity Technologies (Joined: Aug 16, 2017; Posts: 887)
    Good point. That and your physics time delay the point in the frame at which rendering commands can finally be sent to the render thread and then the GPU. The Timeline view should be able to show you the impact of that.

    That could also be a good reason to look into DOTS. Depending on which systems are taking too much time, you might be able to take only the heaviest of them out of the GameObject-Component world and "just" jobify those, so you won't have to rewrite everything. Just being able to sprint through something like AI or physics calculations on all Job System worker threads in parallel at an early point in your frame can leave you in a much better spot in terms of rendering.

    Sure, that won't be the easiest solution to your problem, but depending on how much development time you still expect to have ahead of you, it might be worth making that investment and pivoting now, rather than making small adjustments now and realizing later that they won't be enough as you add more content and code. As you build systems upon systems, pivoting becomes more expensive with every step you take.

    So, if you're looking at 1-2 years or more of remaining dev time, my gut tells me you might be better off looking into DOTS Hybrid, at least for the critical systems.

    Of course, you can try to optimize what you have right now: profile it deeply, set frame-time budgets for your AI, physics, rendering etc., and extrapolate any additional load you'd want to throw at it later. You should also know your lowest-end device and all target platforms, and test and profile on those, not just in the editor. Since you mentioned that instanced SkinnedMeshRenderer might not yet be available on mobile (I didn't check whether the blog post is outdated): the GPUs there are fundamentally different from PC and console hardware, and thermal throttling is a pitfall almost exclusive to mobile. So if you want to keep mobile as a possible release platform, you should get your hands on some representative devices and check the feasibility of that.
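    The budgeting-and-extrapolation step can be as simple as the arithmetic below. It assumes each system's cost scales linearly with unit count (optimistic for pairwise work such as collision checks), and the measured timings are placeholders, not numbers from the screenshots.

```python
TARGET_FPS = 60
FRAME_BUDGET_MS = 1000.0 / TARGET_FPS  # about 16.67 ms per frame

# Hypothetical per-system budgets that add up to the frame budget.
BUDGETS_MS = {"ai": 3.0, "physics": 3.0, "animation": 3.0, "rendering": 6.0, "other": 1.67}

def extrapolate(measured_ms, measured_units, target_units):
    """Linearly scale per-system timings measured at one unit count to another."""
    scale = target_units / measured_units
    return {system: ms * scale for system, ms in measured_ms.items()}

def over_budget(projected_ms, budgets_ms=BUDGETS_MS):
    """Return the systems whose projected cost exceeds their budget."""
    return sorted(s for s, ms in projected_ms.items() if ms > budgets_ms.get(s, 0.0))

# Placeholder measurements taken with 35 units on screen.
measured = {"ai": 1.2, "physics": 2.0, "animation": 1.0, "rendering": 4.0}
projected = extrapolate(measured, 35, 100)
print(over_budget(projected))
```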

    Also, try setting up a load test with your targeted max number of enemies. You can fake missing uniqueness by just mocking up other textures, cloning animations and meshes to create placeholder prefabs etc...

    And since you've already put up a screenshot of the Frame Debugger, did you check where those 1385 Opaque draw calls came from and if everything you'd expect to get batched actually got batched?
     
  12. MadeFromPolygons (Joined: Oct 5, 2013; Posts: 2,544)
    I do a lot of optimisation on unity projects professionally as part of my job as Senior Engineer. I have used the following list of things, you could use some or all of these (I recommend all) and note that these are not in order of importance:

    1. Skinned Mesh Batching
    You can bake animations into textures, as in the original GPU Gems article. This is difficult, but it allows you to batch skinned meshes.
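    The idea behind baking animation into a texture: store every vertex's position for every sampled frame, with rows as frames and columns as vertices, and have the vertex shader read its row for the current time instead of doing skinning. Because every unit then shares one mesh and one material and differs only by a time offset, the units can be instanced. This plain-Python sketch uses nested lists as a stand-in for the floating-point texture; the function names are hypothetical.

```python
def bake_vertex_animation(frames):
    """Bake per-frame vertex positions into a (frame x vertex) grid.

    frames: list of frames, each a list of (x, y, z) positions sampled from
    the skinned mesh at a fixed rate. Row = frame, column = vertex index,
    like the pixels of a vertex animation texture.
    """
    vertex_count = len(frames[0])
    if any(len(frame) != vertex_count for frame in frames):
        raise ValueError("every frame must have the same vertex count")
    return [list(frame) for frame in frames]

def sample_vertex(texture, vertex_index, time, fps):
    """Shader-side lookup: blend the two nearest frames, looping the clip."""
    frame_count = len(texture)
    t = (time * fps) % frame_count
    f0 = int(t)
    f1 = (f0 + 1) % frame_count
    w = t - f0
    a, b = texture[f0][vertex_index], texture[f1][vertex_index]
    return tuple(pa + (pb - pa) * w for pa, pb in zip(a, b))

# Two-vertex "mesh" bobbing up and down over four frames.
frames = [[(0.0, y, 0.0), (1.0, y, 0.0)] for y in (0.0, 1.0, 2.0, 1.0)]
vat = bake_vertex_animation(frames)
print(sample_vertex(vat, 1, time=0.125, fps=4))
```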

    1.b General Batching
    You also need to look into static batching, dynamic batching and possibly GPU instancing for everything in the scene, and try and keep batches to a bare minimum.

    1.c GPU Skinning
    Enable GPU skinning if you have not, and Graphics Jobs as well. Never turn these off unless you are sure they are unsupported on the target platform or are definitely not needed. In 90% of the cases I come across, these provide a perf gain, and I have yet to see them cause a perf loss in any project I've used them in.

    2. Imposters
    You can use imposters (non-animated billboards) for the very, very distant units where animation can't be made out. You could even write a shader for the imposters that moves a few pixels around over time so they seem to be moving slightly, which should be enough for far-distance units.
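    An imposter is typically an atlas of the unit pre-rendered from several angles, and at runtime you show the frame closest to the current viewing direction. This sketch assumes 8 views captured at equal yaw steps, with frame 0 being the unit's front and +z being yaw 0; all of these conventions are assumptions.

```python
import math

def imposter_frame(unit_pos, unit_yaw_deg, camera_pos, frame_count=8):
    """Pick which pre-rendered view (0..frame_count-1) a billboard imposter shows.

    Positions are (x, z) on the ground plane. Frames are assumed to be captured
    at equal yaw steps around the unit, frame 0 = seen from the unit's front.
    """
    dx = camera_pos[0] - unit_pos[0]
    dz = camera_pos[1] - unit_pos[1]
    # World-space yaw of the unit-to-camera direction (0 degrees = +z).
    view_yaw = math.degrees(math.atan2(dx, dz))
    # Make it relative to the direction the unit is facing.
    relative = (view_yaw - unit_yaw_deg) % 360.0
    step = 360.0 / frame_count
    # Round to the nearest captured view, wrapping back to frame 0.
    return int((relative + step / 2) // step) % frame_count

print(imposter_frame((0.0, 0.0), 0.0, (10.0, 0.0)))
```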

    3. Occlusion culling
    You can use occlusion culling to help with performance. You can bake this in the editor, or write your own custom culling using compute shaders and something like an octree.

    You said "I don't have occlusion culling turned on in this case, as it wouldn't do any good considering all 35 enemies are in front of me." That's not how occlusion culling works; it will more than likely still give you a small perf improvement. The Profiler will of course be the true test, but turn it on rather than assuming you don't need it, as it stops anything that isn't actually visible to you from rendering.

    4. Baked Lighting
    Bake lighting wherever possible, and try to fake shadows for units. You don't need proper shadows on units except for the ones very, very close to the camera; the rest can have blob shadows.

    5. Optimized geometry and textures
    Ensure your geometry is as simplified as possible, same goes for textures. Textures should be set to lowest resolution that you can without sacrificing too much visual quality.
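    To see how much texture resolution matters, here is the memory arithmetic for uncompressed RGBA32 textures (4 bytes per pixel), where a full mip chain adds roughly a third. Compressed formats are much smaller, so treat the absolute numbers as illustrative; the ratio between sizes is the point.

```python
def texture_bytes(width, height, bytes_per_pixel=4, mipmaps=True):
    """Exact size of an uncompressed texture, summing every mip level down to 1x1."""
    total = 0
    w, h = width, height
    while True:
        total += w * h * bytes_per_pixel
        if not mipmaps or (w == 1 and h == 1):
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total

for size in (2048, 1024, 512):
    mib = texture_bytes(size, size) / (1024 * 1024)
    print(f"{size}x{size} RGBA32 with mips: {mib:.2f} MiB")
```

    Halving the resolution quarters the memory, so the 512x512 version needs 1/16 of what the 2048x2048 one does.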

    6. Custom Collisions
    You will never get collision handling to work with that many units using rigidbodies and colliders, and that's not really down to Unity but to the nature of real-time physics. You will need a custom solution using a compute shader and an octree to get high-performance physics that also works with a great number of dynamic objects. Unity have already started writing their own, which is available as a package. Not sure how theirs works, but compute shader + octree is how I would write one.
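    The post suggests a compute shader plus an octree. As a simpler CPU-side illustration of the same broad-phase idea, this sketch swaps in a uniform spatial hash grid (a different structure from an octree): units are bucketed by cell, and only units in neighbouring cells are tested against each other instead of all N² pairs. Treating units as circles of one shared radius is an assumption.

```python
import math
from collections import defaultdict
from itertools import product

def build_grid(positions, cell_size):
    """Bucket unit indices by the 2D grid cell their (x, z) position falls in."""
    grid = defaultdict(list)
    for i, (x, z) in enumerate(positions):
        grid[(math.floor(x / cell_size), math.floor(z / cell_size))].append(i)
    return grid

def overlapping_pairs(positions, radius, cell_size=None):
    """Return sorted index pairs (i, j), i < j, of units whose circles overlap.

    With cell_size >= 2 * radius, any overlapping pair sits in the same or an
    adjacent cell, so only the 3x3 neighbourhood of each cell is checked.
    """
    if cell_size is None:
        cell_size = 2 * radius
    if cell_size < 2 * radius:
        raise ValueError("cell_size must be at least 2 * radius")
    grid = build_grid(positions, cell_size)
    pairs = set()
    for (cx, cz), members in grid.items():
        for dx, dz in product((-1, 0, 1), repeat=2):
            for j in grid.get((cx + dx, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(positions[i], positions[j]) < 2 * radius:
                        pairs.add((i, j))
    return sorted(pairs)

units = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.4, 10.2), (50.0, 50.0)]
print(overlapping_pairs(units, radius=0.5))
```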

    7. Data Oriented Design
    You could use ECS + the Job System to ensure that the code side of the application is as optimised as possible. ECS is built for the kind of numbers you're talking about, and the Megacity demo actually implements a lot of what I have mentioned in this list already.

    7.b Job System
    If you can't use ECS + Jobs, at the very least jobify all of your code that can be jobified (pathfinding, movement, physics checks, etc.).
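    Unity's `IJobParallelFor` runs one function over an index range split into batches (the `innerloopBatchCount` argument to `Schedule`) on worker threads. The Python sketch below mirrors only that structure with `concurrent.futures`: each batch reads shared input and writes only its own slice of the output, so no locking is needed. In CPython the GIL means threads won't actually speed up pure-Python maths, and the movement rule and names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def move_towards_batch(positions, targets, speed, dt, out, start, end):
    """Process units [start, end): step each 2D position towards its target."""
    for i in range(start, end):
        (px, pz), (tx, tz) = positions[i], targets[i]
        dx, dz = tx - px, tz - pz
        dist = (dx * dx + dz * dz) ** 0.5
        step = min(speed * dt, dist)
        # Each batch writes only its own indices, so batches never conflict.
        out[i] = (px + dx / dist * step, pz + dz / dist * step) if dist > 0 else (px, pz)

def parallel_move(positions, targets, speed, dt, batch_size=64, workers=4):
    """Split the unit range into fixed-size batches, like innerloopBatchCount."""
    out = [None] * len(positions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(move_towards_batch, positions, targets, speed, dt, out,
                        start, min(start + batch_size, len(positions)))
            for start in range(0, len(positions), batch_size)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a batch
    return out

positions = [(float(i), 0.0) for i in range(1000)]
targets = [(float(i), 10.0) for i in range(1000)]
new_positions = parallel_move(positions, targets, speed=5.0, dt=0.1)
print(new_positions[0], new_positions[999])
```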

    8. Optimised Shaders
    Ideally you want not full PBR but "approximated PBR", where you edit or write a PBR shader but remove a lot of the heavy calculations and replace them with less accurate but faster ones. Anything not being used, or that doesn't make much visual difference, should be stripped from the shader. Definitely do not use the built-in Standard shader, or any standard shaders, as they will never match the performance of a custom-built solution.

    9. Clever Instancing / Shader-Fakery
    You can write instanced properties for your shader so that as much as possible you can get away with having different visual features on what underneath is actually same mesh same material. You can do all sorts of transformations, color changes, even special normal mapping as instanced things, but there is a limit.
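    With instancing, every unit shares one mesh and material, and variety comes from small per-instance values (in Unity, for example via `MaterialPropertyBlock` arrays or a `ComputeBuffer` read by the shader). This sketch packs per-unit tint and scale into one flat float32 buffer with a fixed stride, the kind of layout such a buffer might hold; the five-float layout is an assumption.

```python
from array import array

STRIDE = 5  # hypothetical layout per instance: r, g, b, a, uniform scale

def pack_instance_data(instances):
    """Pack (rgba, scale) tuples into one contiguous float32 buffer."""
    buf = array("f")
    for (r, g, b, a), scale in instances:
        buf.extend((r, g, b, a, scale))
    return buf

def read_instance(buf, index):
    """What the shader would do: fetch instance `index` by offsetting with STRIDE."""
    base = index * STRIDE
    return tuple(buf[base:base + 4]), buf[base + 4]

# Three soldiers sharing one mesh and material, differing only in team colour and size.
soldiers = [((1.0, 0.0, 0.0, 1.0), 1.0), ((0.0, 0.0, 1.0, 1.0), 1.25), ((1.0, 0.0, 0.0, 1.0), 0.5)]
buf = pack_instance_data(soldiers)
print(len(buf), read_instance(buf, 1))
```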

    10. Optimized Audio
    Everyone always forgets to optimise audio, both in terms of memory and file size and in terms of processing: research how audio affects the scene, know how much processing audio is taking up, and adjust accordingly. For instance, if an audio clip can be reused, reuse it. If an audio source can be reused instead of having a ton of them, reuse it.
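    Reusing audio sources often comes down to voice limiting: with 100 units fighting, only a handful of sounds need a real source. This sketch gives a fixed pool of voices to the requests that are loudest at the listener; the voice count and the simple inverse-distance rolloff are assumptions (Unity's `AudioSource` has its own rolloff settings).

```python
import math

MAX_VOICES = 8  # hypothetical number of pooled audio sources

def audible_volume(volume, distance, min_distance=1.0):
    """Simple inverse-distance rolloff, clamped inside min_distance."""
    return volume * min_distance / max(distance, min_distance)

def pick_voices(requests, listener_pos, max_voices=MAX_VOICES):
    """requests: list of (clip_name, position, volume).

    Returns the requests that get one of the pooled sources, loudest first;
    the rest are simply not played this frame.
    """
    scored = sorted(
        requests,
        key=lambda r: audible_volume(r[2], math.dist(listener_pos, r[1])),
        reverse=True,
    )
    return scored[:max_voices]

# Thirty identical clips requested along a line; one shared clip, few sources.
hits = [("sword_hit", (float(x), 0.0, 0.0), 1.0) for x in range(1, 31)]
chosen = pick_voices(hits, (0.0, 0.0, 0.0))
print(len(chosen), chosen[0][1])
```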

    11. Direct Drawing
    Instead of using gameobjects and relying on the engine to determine when and how to draw, use DrawMeshInstancedIndirect to draw them yourself for a big boost to performance. https://docs.unity3d.com/ScriptReference/Graphics.DrawMeshInstancedIndirect.html
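    For reference, the arguments buffer read by `DrawMeshInstancedIndirect` holds five 32-bit unsigned integers: index count per instance, instance count, start index location, base vertex location and start instance location. Unity's documentation example fills it from the mesh's index count, index start and base vertex. This Python sketch packs the same layout to make its 20-byte size explicit; the mesh numbers are made up, and in practice a GPU culling pass would typically write the instance count.

```python
import struct

ARGS_FORMAT = "<5I"  # five little-endian uint32 values, 20 bytes in total

def make_indirect_args(index_count, instance_count, start_index=0, base_vertex=0, start_instance=0):
    """Pack the argument block for an indexed, instanced indirect draw."""
    return struct.pack(ARGS_FORMAT, index_count, instance_count,
                       start_index, base_vertex, start_instance)

def instance_count_from_args(args):
    """Read back the instance count (second uint), e.g. after a culling pass wrote it."""
    return struct.unpack(ARGS_FORMAT, args)[1]

# Hypothetical mesh: 3,000 triangles -> 9,000 indices, drawn for 100 units.
args = make_indirect_args(index_count=9000, instance_count=100)
print(len(args), instance_count_from_args(args))
```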

    12. Pooling
    Ensure all objects that are poolable are. This is a basic concept dealt with in a lot of tutorials available here on the unity website.
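    A minimal pooling sketch, illustrated in Python: released objects go back on a free list and are handed out again instead of being recreated. The names are hypothetical; newer Unity versions (2021.1 and later) also ship `UnityEngine.Pool.ObjectPool<T>`.

```python
class Pool:
    """Minimal object pool: reuse released objects instead of creating new ones."""

    def __init__(self, factory, reset=None, prewarm=0):
        self._factory = factory  # creates a new object when the free list is empty
        self._reset = reset      # optional hook that clears state on release
        self._free = [factory() for _ in range(prewarm)]
        self.created = prewarm

    def get(self):
        if self._free:
            return self._free.pop()
        self.created += 1
        return self._factory()

    def release(self, obj):
        if self._reset is not None:
            self._reset(obj)
        self._free.append(obj)

# Dicts stand in for arrow objects here.
arrows = Pool(factory=dict, reset=dict.clear, prewarm=10)
for _ in range(3):  # three volleys of 10 arrows each
    volley = [arrows.get() for _ in range(10)]
    for arrow in volley:
        arrows.release(arrow)
print(arrows.created)
```

    After three volleys the pool has still only created the 10 prewarmed arrows.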

    13. LOD
    A lot of people have already mentioned this, but you will need some aggressive LOD levels. I would personally have at least 5 LOD levels for what you describe, with the furthest LOD being literally a billboard imposter, or even a couple of pixels drawn as a post-processing effect if it is very, very far away.

    14. Quality Settings if all else fails
    If all else fails, you will have to decrease your quality settings. I really would always do everything else including a complete redesign before this step if visual quality is important and you have art assets already created. This includes removing the costly post processing effects you seem to be using based on your screenshot.






    All of this requires you to profile using profiler, memory profiler, physics profiler, UI profiler, frame debugger etc and have a very very strong understanding of the performance of your app and exactly what is taking up how much of your rendering budget, and what is taking up your processing budget and by how much. Without that none of this is possible or even plausible.

    Note this list is in no way comprehensive, there are 1000s of things you can do to optimise but these are my starter list that I always check off first.

    @RakNet also something to consider: you say rendering is your bottleneck, but looking at the profiler your scripts look unoptimised (based on the very little info we have other than your screenshots).

    That's a lot of ms to be eaten up by scripts, no matter how you rationalise it.

    I recommend going back through and optimising, as more can definitely be done. I know you say you optimised physics and combat, but without knowing what you have done and what it was like before vs. now, it's hard to tell whether you really have done all that is possible.

    TLDR: It appears you can optimise both the rendering and the logic side a lot. Using the above list you should be able to get ideas for both. ~1500 batches is by no means optimised or a sensible amount to have, and many of the things in my list above will bring this number right down to a very small amount. You will never be able to get this sort of thing down to fewer than 10-20 batches, however, and that's without factoring in UI.
     
    Last edited: Aug 20, 2019
  13. MadeFromPolygons (Joined: Oct 5, 2013; Posts: 2,544)
    @RakNet how did you get on in the end? I recently checked on your project and it seems to have come a long way since you posted this, so I am assuming you managed to optimise enough in the end?
     
  14. RakNet (Joined: Oct 9, 2013; Posts: 214)
    @GameDevCouple Thanks for asking. The answer is pretty complex, so I made a blog post about the problems I found, and the solutions, over the last two years of optimization.
     
  15. MadeFromPolygons (Joined: Oct 5, 2013; Posts: 2,544)
    That's fantastic, mate. Will have a read through tonight, but it looks like a great post!
     