
Instantiate/LoadAsync Performance on SSDs vs HDDs, RAM Disks, Large Amounts of RAM, etc.

Discussion in 'General Discussion' started by PhoenixAdvanced, Jun 7, 2020.

  1. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    Hi,

    I am working on a complex virtual world simulation, and I am also planning to build a new PC sometime before the end of the year (once the 3080 Tis and the next generation of Ryzen come out).

    One of the main bottlenecks with my current project is instantiate and loadasync. I have a lot of paging in and out of complex scenes, etc, and I am looking for ways to alleviate this.

    Obviously, the main thing would be a good CPU, so the successor to the 3950x (4950x?) coming out this year would be a good start (Even though instantiate is single-threaded, the 3950x has similar single core performance to a comparable Intel CPU, but has far better multi-threading support).

    However, I have some other questions about how instantiate and loadsceneasync actually work with different types of hardware.

    For example, will these two functions be faster on a fast SSD compared to a slow SSD, or an HDD? The Ryzen 3950x has 20 usable PCIe lanes, so I could use 16 for the graphics card and 4 for a dedicated SSD. Would this speed up asset loading and instantiation, or would the CPU be the bottleneck?

    Secondly, can Unity support 128GB of RAM? Could I load a large number of complex assets into RAM at load time, to prevent having to instantiate them at all, or is there some hard limit in Unity that would prevent it from doing this?

    I know there is a "bug" or known issue with a single scene being over 4GB (apparently due to a 32-bit size limit), but splitting each scene into smaller ones (under 4GB) and loading them one at a time seems to solve this. However, is there some other issue with loading assets into RAM, or is the amount of RAM on the system the only limiting factor?

    Finally, what about RAM disks? Would Instantiate/LoadSceneAsync be noticeably faster if the build was running from a RAM disk, or again, would the CPU be the limiting factor?

    Just to clarify, I am not planning to release a game that requires 128GB of RAM or a RAM disk to play. I somehow doubt the majority of gamers have this kind of hardware; this is a concept project.

    Thanks for any advice!
     
  2. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
    Once you step into the territory of building three-thousand-dollar-plus machines, a Threadripper is no longer an outrageous consideration. A 3960X is typically around twice the cost of the 3950X, but it has 50% more cores, twice the cache, twice the memory channels, can address 256GB of RAM, and has 64 PCIe lanes.

    I don't know how you determine memory bandwidth limitations but it wouldn't surprise me if you were running into them.
     
  3. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    That's a good point! Budget really isn't the limiting factor for this build, I'm mainly concerned about performance.

    However, what is the single-threaded performance of the 3960X? Those CPUs are really designed for workstation use, so I'm not sure if they would be optimised for game development or playing games.

    Unity is still largely single threaded, so single-threaded performance is important.
     
  4. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
    The 3950X has a base clock of 3.5GHz with a boost of 4.7GHz. The 3960X has a base clock of 3.8GHz with a boost of 4.5GHz. That's a difference of about 4% between boosts (in the 3950X's favor) and about 9% between bases (in the 3960X's favor). Gamers Nexus has a benchmark and the difference is a few frames.
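
    The percentages can be checked with a few lines of arithmetic. This is only a sketch in plain Python using the clock speeds quoted in the post; the helper name `percent_change` is made up for illustration, and clock speed is at best a rough proxy for single-threaded performance.

```python
# Clock speeds in GHz, as quoted in the post.
CLOCKS = {
    "3950X": {"base": 3.5, "boost": 4.7},
    "3960X": {"base": 3.8, "boost": 4.5},
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Change when going from the 3950X to the 3960X.
base_delta = percent_change(CLOCKS["3950X"]["base"], CLOCKS["3960X"]["base"])
boost_delta = percent_change(CLOCKS["3950X"]["boost"], CLOCKS["3960X"]["boost"])

print(f"base:  {base_delta:+.1f}%")   # base:  +8.6%
print(f"boost: {boost_delta:+.1f}%")  # boost: -4.3%
```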

     
    Last edited: Jun 7, 2020
  5. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    That's interesting, the single-core performance is a lot better than I thought. Maybe that could be a viable option; 256GB of RAM would be a huge boost too!
     
  6. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,620
    Are these things also a bottleneck in your builds? If so, what kind of hardware are you targeting?
     
  7. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    I haven't looked heavily at build optimisation yet (I am aware, and hoping, that most of the issues are with the editor), but I am noticing heavy performance hits creating dynamic forests with many objects.

    I am looking into procedurally generating trees (L-systems, etc, etc) instead of using instantiate, but I will still likely need to use instantiate somewhere.

    The project I am working on probably won't be released for at least 2 years, so even cutting edge hardware today would be starting to become somewhat more common by then. Plus, when the project is done it will of course be heavily optimised, whereas in development it wouldn't be.
     
  8. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    3,023
    Try to avoid using instantiate as much as possible. If you need to render thousands of the same thing, use instancing. Look at the DrawMeshInstanced method.
    https://docs.unity3d.com/ScriptReference/Graphics.DrawMeshInstanced.html

    If your project will take a couple years to complete, start with Unity 2020.1 beta right now. That version is the first version to support more than 4GB of content per scene. So if the 4GB per scene limit was what you were looking for a way around, just go with Unity 2020.1.
     
  9. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    Wait, so the 4GB limit is gone in 2020.1?? That's fantastic, I didn't know that! I will probably wait until it comes out to use it, but I will definitely use that feature, that will help a lot.

    I wasn't aware of DrawMeshInstanced, I will certainly look into that, thank you! The problem is that with my forests, I have maybe... 9-10 different types of trees?

    So each tree type might have, say, 50 instances, for a total of several hundred trees.

    This is why I can't (easily) use things like object pooling. I could create 9-10 different pools of 50 objects each, but at that point you might be losing a lot of the potential benefit.

    Would DrawMeshInstanced work here? Could I call it 10 times for 50 instances each, and still get a benefit over Instantiate?
     
  10. Ukounu

    Ukounu

    Joined:
    Nov 2, 2019
    Posts:
    208
    This thread has some relevant information:

    https://forum.unity.com/threads/upg...-32-core-decreased-fps-from-130-to-50.798966/

    Somebody else has assumed already that switching to Threadripper will help them with game dev, and reported the results.
     
  11. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,620
    That is exactly the kind of thing that DrawMeshInstanced is excellent at. Basically, instead of telling the GPU "Draw this tree" for each of 500 trees, you'll be saying "Draw this tree in these 50 spots" for each of your 10 different trees.

    I can't predict what results for your trees will be, because it depends on whether your game is CPU or GPU bound. If your game is CPU bound then it'll help drastically. I've used it to draw thousands of objects with performance indistinguishable from not drawing those objects at all.

    This isn't the type of problem that pooling is useful for. Pooling is useful for when you've got lots of similar objects with short life cycles.
     
  12. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
    Correction. They thought that switching would automatically provide better results but it definitely requires you design your application with it in mind. Unity's DOTS framework can scale up with high core counts but there are other routes such as separating the world state from the actual engine and handling rendering yourself with DrawMeshInstanced.

    Below is a statement by Joachim that they've tried as high as 32 cores with DOTS.

    https://forum.unity.com/threads/ent...t-on-threadripper-and-threadripper-v2.534939/

    As well as a tweet by a developer that DOTS scaled very well on their Threadripper.

    https://twitter.com/SebAaltonen/status/1030325568255012864
     
    Last edited: Jun 7, 2020
  13. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    That sounds fantastic! I will definitely check that out. My project does seem to be CPU bound at the moment (Although I have an older computer, so I'm guessing the graphics card is probably holding it back too) so that should help a lot.
     
  14. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,570
    Pooling is used to reduce the CPU cost of spawning and despawning objects, not rendering times.

    It is also possible to create a single pool for all the objects you're using. (A Dictionary mapping each prefab GameObject to a list or set of its existing instances would do it.)
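
    As a rough illustration of the single-pool idea, here is a minimal sketch in plain Python rather than Unity C#. The class `PrefabPool`, the `make_instance` factory, and the dictionary-based "instances" are all hypothetical stand-ins; in Unity the keys would be prefab GameObjects and the factory would call Instantiate.

```python
from collections import defaultdict


class PrefabPool:
    """One pool for every prefab type: prefab key -> list of idle instances."""

    def __init__(self, factory):
        self._factory = factory          # builds a new instance for a prefab key
        self._idle = defaultdict(list)   # prefab key -> instances waiting for reuse
        self.created = 0                 # how many instances were ever built

    def spawn(self, prefab, position):
        """Reuse an idle instance of this prefab if one exists, else build one."""
        idle = self._idle[prefab]
        if idle:
            instance = idle.pop()
        else:
            instance = self._factory(prefab)
            self.created += 1
        instance["position"] = position
        instance["active"] = True
        return instance

    def despawn(self, instance):
        """Deactivate the instance and return it to its prefab's idle list."""
        instance["active"] = False
        self._idle[instance["prefab"]].append(instance)


def make_instance(prefab):
    """Hypothetical stand-in for the expensive 'instantiate a prefab' step."""
    return {"prefab": prefab, "position": None, "active": False}


pool = PrefabPool(make_instance)
oak = pool.spawn("oak", (0, 0, 0))
pine = pool.spawn("pine", (5, 0, 2))
pool.despawn(oak)                          # the oak goes out of range
oak_again = pool.spawn("oak", (40, 0, 8))  # reused and moved, not rebuilt
print(pool.created)                        # 2
```

    Because the idle lists are keyed by prefab, 10 tree types share one pool object and one code path instead of needing 10 separately managed pools.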
     
  15. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,620
    Consider what kind of computer your audience is likely to have when they're playing your game, too. You're right that buying nice hardware for development helps with things that aren't optimised yet, and with the overhead that comes with running an editor on top of your game. Still, you want to make sure that the game itself runs nicely on hardware that the majority of your audience has, and contrary to popular belief not all optimisation can be left until towards the end.

    For instance, switching to DrawMeshInstanced can make rendering of repeated objects much more efficient, but it only handles the rendering. If those GameObjects have any other functionality - collision, audio, etc - then you'll want to consider early on how to handle that, too. (And pooling could come in handy there.)
     
  16. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    3,023
    According to Unity, the 4GB limit per scene was fixed in Unity 2020.1.
    https://forum.unity.com/threads/bug...n-standalone-build.441116/page-5#post-5300130

    DrawMeshInstanced is a great method to use if you want to render lots of things. If you have 10 different types of trees with 50 trees each, then you would call DrawMeshInstanced 10 times. Each call to DrawMeshInstanced accepts an array of matrices that holds the location and rotation of every instance of a given type of tree. With DrawMeshInstanced, you could draw your 500 trees using 10 draw calls instead of 500 draw calls. It will be a huge improvement for you.
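
    The draw-call arithmetic can be sketched as a grouping step. This is plain Python, not Unity code: `build_batches` and the tree tuples are hypothetical, and each batch simply stands for one instanced draw call. The 1023 cap reflects Unity's documented per-call instance limit for DrawMeshInstanced, so a type with more instances than that needs more than one call.

```python
from collections import defaultdict

MAX_PER_CALL = 1023  # per-call instance cap documented for DrawMeshInstanced


def build_batches(trees, max_per_call=MAX_PER_CALL):
    """Group (tree_type, position) pairs into per-type batches.

    Each returned batch stands for one instanced draw call.
    """
    by_type = defaultdict(list)
    for tree_type, position in trees:
        by_type[tree_type].append(position)

    batches = []
    for tree_type, positions in by_type.items():
        # Split a type into several calls only if it exceeds the cap.
        for start in range(0, len(positions), max_per_call):
            batches.append((tree_type, positions[start:start + max_per_call]))
    return batches


# 10 tree types with 50 placements each, as in the thread.
trees = [(f"tree_{t}", (float(i), 0.0, float(t))) for t in range(10) for i in range(50)]
batches = build_batches(trees)
print(len(trees), len(batches))  # 500 10
```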

    Here is a link to my instancing pool code, which uses DrawMeshInstanced to draw lots of laser projectiles.
    https://github.com/ShilohGames/InstancingPoolDemo

    That code won't work directly for trees, but you could use it as an example to get an idea of how the method works.
     
  17. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    How is this done? I have never heard of "Dictionary Mapping" before!

    Do you mean creating a list of all instances of all gameobjects, and then instead of instantiating and deleting them, I would move them around the world, to reuse them?
     
  18. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    That's a good point. This is why I would tend to avoid CPUs like the Threadripper, since no gamer is likely to have a 32-core CPU for a long time, whereas a gaming or "pro-sumer" CPU like the 3950x might become more common in 2-3 years or so.

    I mean a 1080 for example is now a very common graphics card, a few years ago it would have been very high-end and available only to enthusiasts, etc.
     
  19. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    Thank you very much for that, that's very helpful!

    The only concern I have with DrawMeshInstanced is that it's a single call, right?

    With Instantiate, I can put it in a coroutine and create, say, 50 objects over the course of a few seconds. With DrawMeshInstanced I would have to draw all of the objects at once. Could that be slower?
     
  20. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    3,023
    With Instantiate, you dynamically create more game objects in your scene based on a prefab you made. If you add lots of game objects to your scene, you bog your game down.

    With DrawMeshInstanced, you do not add game objects to your scene. What you do is directly add GPU based rendering commands to each frame. You call DrawMeshInstanced once per frame in an Update method, and you completely skip all of the game object steps.

    I have worked with DrawMeshInstanced and Instantiate. When you are dealing with lots of units, DrawMeshInstanced wins by a mile. With laser projectiles, DrawMeshInstanced is about 6-10 times as fast. DrawMeshInstanced is about 6 times as fast as a really well designed object pool, and DrawMeshInstanced is about 10 times as fast as simply using Instantiate.

    In my 3D space games, I can deliver 100-300 FPS using DrawMeshInstanced. Using Instantiate, the same hardware would get about 10-30 FPS. The performance difference is not even close.
     
  21. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,187
    There is a niche audience out there though that wants games that push the boundaries of what is currently being designed and created for games. Once upon a time Dwarf Fortress was legitimately a demanding game that could very easily take hours to generate your worlds on very expensive hardware.

    If your game pushes world simulation to a completely different level from modern open world games it's very possible that you will find an audience willing to live with having to purchase high-core, high-memory hardware. Eventually hardware will catch up.
     
  22. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    3,023
    If you have never worked with DrawMeshInstanced, I recommend downloading my GitHub project for what I call Instancing Pools. This is a laser projectile solution that uses GPU Instancing with the DrawMeshInstanced method. Don't add my code to your current project. Just open it as a separate project so you can play around with it.

    When my code is running in the Unity editor, notice how it is not spawning any new game objects. Look at the frame rate and setpass calls. Notice how lightweight the solution is. You cannot match that level of performance using Instantiate, because Instantiate would create a separate game object for each laser in the scene. If you are still not convinced, try replicating my demo using Instantiate instead of DrawMeshInstanced, and then measure the performance difference.

    Anyway, you can apply the same concept to any situation with lots of units including trees. Using DrawMeshInstanced to draw trees will mean no wasted game objects and no wasted setpass calls.
     
  23. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    I will definitely do this, thank you!

    I completely misunderstood how DrawMeshInstanced actually works. It does sound incredibly powerful.

    I think I will definitely need to do some experimenting with it though.

    So basically, instead of creating one object at a time using Instantiate, I am updating an array with all of the objects I want to draw every frame?

    So, as the player moves, and I page out distant objects and page in new ones, I actually just update the data in the DrawMeshInstanced array?

    I would have to rework some of my creation code, but it sounds like it would certainly be worth it.

    Does collision work with DrawMeshInstanced?

    This is the kind of thing I am doing, build a complex game that appeals to a niche audience now, and hope it gets more mainstream as the hardware becomes more commonplace. I am a little less afraid to develop on a very high-end system now, it seems my logic might have been sound after all!
     
  24. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    3,023
    DrawMeshInstanced needs to get called each Update frame to draw all of the instances. It is extremely lightweight and does not create any game objects in the scene.

    If you look at my source code, you will see that I use an array of a custom struct to hold the data about the laser projectiles. Then I have an array of Matrix4x4 that I actually feed into the DrawMeshInstanced method.
    https://github.com/ShilohGames/Inst...ngPool/Scripts/InstancePoolLasersBaseClass.cs
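
    The pattern described here (an array of per-instance data, turned into an array of matrices and submitted every frame) can be sketched as follows. This is a plain Python stand-in, not the linked C# source: `Projectile`, `translation_matrix`, `draw_instanced`, and `update` are hypothetical names, and `draw_instanced` only records the request where Unity code would call DrawMeshInstanced with a Matrix4x4 array.

```python
from dataclasses import dataclass


@dataclass
class Projectile:
    """Per-instance simulation data (hypothetical fields for illustration)."""
    position: tuple
    velocity: tuple


def translation_matrix(x, y, z):
    """4x4 row-major matrix that only translates (no rotation or scale)."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


draw_log = []


def draw_instanced(mesh_name, matrices):
    """Stand-in for the engine's instanced draw; it just records the request."""
    draw_log.append((mesh_name, len(matrices)))


def update(projectiles, dt):
    """One frame: advance the data, rebuild the matrices, submit one draw."""
    for p in projectiles:
        p.position = tuple(c + v * dt for c, v in zip(p.position, p.velocity))
    matrices = [translation_matrix(*p.position) for p in projectiles]
    draw_instanced("laser", matrices)
    return matrices


lasers = [Projectile((0.0, 0.0, float(i)), (10.0, 0.0, 0.0)) for i in range(3)]
for _ in range(2):  # two simulated frames
    matrices = update(lasers, dt=0.5)
```

    Nothing persists between frames except the data array, which is why the draw has to be resubmitted every frame and why no objects appear in the scene.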

    DrawMeshInstanced is only for rendering. It does not do anything with physics. So if you need colliders, you have to implement a solution for managing colliders. If you are dealing with hundreds or thousands of trees dynamically, you probably only need to manage colliders for the trees closest to the player. And you may be able to design those colliders as just the lower portion of the tree that the player interacts with instead of the entire tree. Possibly try setting up a simple capsule collider for the trunks of nearby trees.
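
    One way to decide which trees get a collider is a simple nearest-within-radius query, sketched below in plain Python. The function `nearest_trees`, the radius, and the cap of 4 colliders are assumptions for illustration; in Unity the returned indices would drive a small set of reusable capsule colliders.

```python
import math


def nearest_trees(player, tree_positions, radius, max_colliders):
    """Return indices of trees within radius of the player, nearest first,
    capped at max_colliders (the size of a hypothetical collider pool)."""
    px, _, pz = player
    in_range = []
    for index, (x, _, z) in enumerate(tree_positions):
        distance = math.hypot(x - px, z - pz)  # horizontal distance only
        if distance <= radius:
            in_range.append((distance, index))
    in_range.sort()
    return [index for _, index in in_range[:max_colliders]]


# Ten trees in a row at x = 0, 10, ..., 90.
trees = [(float(x), 0.0, 0.0) for x in range(0, 100, 10)]
active = nearest_trees(player=(22.0, 0.0, 0.0), tree_positions=trees,
                       radius=25.0, max_colliders=4)
print(active)  # [2, 3, 1, 4]
```

    A linear scan like this is fine for a few hundred trees; with many thousands, a grid or other spatial index would likely be a better fit.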
     
  25. PhoenixAdvanced

    PhoenixAdvanced

    Joined:
    Sep 30, 2016
    Posts:
    316
    Ah, I think I get it now. I can certainly see how that would speed things up substantially!

    I could easily rig up a system to use Instantiate/Destroy only for objects very close to the player, and use DrawMeshInstanced for everything else. I also have scripts, etc., on these objects that I need, but I only need them on objects that are within a short distance of the player, so this looks very promising!
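
    The hybrid approach boils down to tracking which objects are near the player and reacting only to changes in that set. The sketch below is plain Python with hypothetical names (`near_set`, `page`); in Unity, `to_spawn` would be instantiated (or taken from a pool) as full GameObjects, `to_despawn` would be destroyed or returned, and everything outside the near set would stay in the instanced draw data.

```python
def near_set(player, positions, radius):
    """Ids of objects whose distance to the player is within radius."""
    r2 = radius * radius
    return {
        obj_id
        for obj_id, pos in positions.items()
        if sum((a - b) ** 2 for a, b in zip(pos, player)) <= r2
    }


def page(previous_near, current_near):
    """Work out which objects to promote to full objects and which to demote."""
    to_spawn = current_near - previous_near    # just came into range
    to_despawn = previous_near - current_near  # just left range
    return to_spawn, to_despawn


positions = {"a": (0, 0, 0), "b": (10, 0, 0), "c": (20, 0, 0), "d": (30, 0, 0)}

before = near_set((5, 0, 0), positions, radius=8)   # player near a and b
after = near_set((15, 0, 0), positions, radius=8)   # player moved toward c
to_spawn, to_despawn = page(before, after)
instanced_only = set(positions) - after             # drawn without GameObjects

print(sorted(to_spawn), sorted(to_despawn), sorted(instanced_only))
# ['c'] ['a'] ['a', 'd']
```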

    Thanks a lot for your advice, and sample project!
     
  26. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    3,023
    Exactly. That is how you can scale it.
     