
Other Mega runtime Performance tips thread (unity & HDRP) - Guide to better runtime unity performance

Discussion in 'High Definition Render Pipeline' started by PutridEx, Sep 14, 2021.

  1. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    This thread is HDRP focused, covering HDRP-specific optimizations like shadow caching.
    Most tips here apply to all render pipelines though.


    I'll start with some of the obvious.
    - CPU: Draw calls/batches. The SRP Batcher makes materials/SetPass calls less expensive, but they still eat performance. Even in the built-in render pipeline, you get much better performance when you manually combine meshes than with Unity's automatic static or dynamic batching. Cell-based mesh combining is recommended, so culling and LODs still work. (If you need a lot of materials, read the SRP Batcher section further down; it's really good.)
    The SRP Batcher is a draw call optimization that significantly improves performance for applications that use an SRP. It reduces the CPU time Unity requires to prepare and dispatch draw calls for materials that use the same shader
    variant. Note that it's not just the same shader; it also needs to be the same shader variant.
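    A minimal sketch of the manual mesh-combining idea using Unity's Mesh.CombineMeshes (the component name is made up, and it naively merges everything under one parent that shares a single material -- a real setup would group meshes per grid cell so frustum culling and LODs keep working):

    Code (CSharp):
    using UnityEngine;

    // Combines all child MeshFilters into one mesh at startup.
    // Illustrative only: assumes all children share the same material.
    public class SimpleMeshCombiner : MonoBehaviour
    {
        void Start()
        {
            MeshFilter[] filters = GetComponentsInChildren<MeshFilter>();
            var combine = new CombineInstance[filters.Length];

            for (int i = 0; i < filters.Length; i++)
            {
                combine[i].mesh = filters[i].sharedMesh;
                combine[i].transform = filters[i].transform.localToWorldMatrix;
                filters[i].gameObject.SetActive(false); // hide the originals
            }

            var combinedMesh = new Mesh();
            combinedMesh.indexFormat = UnityEngine.Rendering.IndexFormat.UInt32; // allow > 65k verts
            combinedMesh.CombineMeshes(combine);

            gameObject.AddComponent<MeshFilter>().sharedMesh = combinedMesh;
            gameObject.AddComponent<MeshRenderer>(); // assign the shared material in the inspector
        }
    }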

    - Both: real-time shadows! Since 2021.1, HDRP supports shadow caching.
    Unfortunately it is frustum-view based* (edit: only for the directional light, read the first reply), but you can still get *massive* gains if you render shadows for static objects as "On Demand". To avoid shadows popping in, don't wait too long between updates; every half a second is usually good. For lights other than the directional light you probably don't need to update as often, since their caching isn't frustum based. Experiment and find the right interval for your project. Not all lights need shadows; only enable them on the ones that really need them.
    Rendering shadows twice per 60 frames instead of every frame is a big CPU win :)

    NOTE: for directional lights (the only light type where shadows are cached based on camera frustum view) you need to update quite a bit more often than usual, otherwise there will be popping issues.
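    A minimal sketch of refreshing an "On Demand" cached shadow map on a timer (the component name and the 0.5s interval are just the example values from above; assumes the light's shadow Update Mode is set to On Demand):

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Rendering.HighDefinition;

    // Requests a re-render of this light's cached shadow map every `interval` seconds.
    public class OnDemandShadowUpdater : MonoBehaviour
    {
        public HDAdditionalLightData lightData; // the light with Update Mode = On Demand
        public float interval = 0.5f;

        float timer;

        void Reset() => lightData = GetComponent<HDAdditionalLightData>();

        void Update()
        {
            timer += Time.deltaTime;
            if (timer >= interval)
            {
                timer = 0f;
                lightData.RequestShadowMapRendering(); // refresh the cached shadow map now
            }
        }
    }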


    - CPU: IL2CPP! Everyone should use IL2CPP for production if your project works with it. And if it doesn't, try to make it work if it's not too much effort. Usually it works out of the box unless you rely on one of the few things IL2CPP doesn't support. It's a big performance gain for little work on your part. It even speeds up the C# side of rendering, not just your scripts.

    - Both: Find your bottleneck! Is it the CPU? The GPU? Make a development build with autoconnect profiler, is the CPU waiting for the GPU? If it is, you're GPU bottlenecked.

    - Both: Profiler! Use the profiler. It also has a GPU module, so you can use it to see what's eating your CPU and GPU performance. It's not a direct performance tip, but it is so easy to use that it's a no-brainer. Seriously, use it.
    It is strongly recommended you read the following thread to learn the profiler: Other - Guide to unity profiler: HDRP version (And how to read profiler data) - Unity Forum

    - GPU: VOLUMETRICS. They're enabled by default for each light, disable them and only enable them on the lights you need volumetrics for. Otherwise, it's wasted GPU performance. Nothing is free, they're expensive.

    - GPU: Dynamic resolution! The good options are only available on 2021.2 and beyond (FSR/DLSS/TAAU). If someone is playing on anything higher than 1080p, especially 4k, it's a no-brainer. Maybe even enable it by default in that case, with a sensible screen % of course.
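    A minimal sketch of driving dynamic resolution from script via the core DynamicResolutionHandler (assumes dynamic resolution is enabled in the active HDRP asset and on the camera's frame settings; the fixed 75% value is just an illustration -- you could instead compute it from GPU frame time):

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Rendering;

    // Feeds a screen-percentage value to HDRP's dynamic resolution system.
    public class FixedDynamicResolution : MonoBehaviour
    {
        [Range(50f, 100f)] public float screenPercentage = 75f;

        void Start()
        {
            DynamicResolutionHandler.SetDynamicResScaler(
                () => screenPercentage,
                DynamicResScalePolicyType.ReturnsPercentage);
        }
    }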

    - Both: Occlusion culling, although at some point, if your scene is big enough, it might be better to do manual occlusion culling.
    TIP: Use the asset "Perfect Culling" from the Unity Asset Store. It's superior to Unity's built-in Umbra occlusion culling in every way: it has little to no overhead, so no matter how big your project is, it's worth it. It also does a much better job in the culling process itself.

    - Both: Camera far/near clipping plane. Set it to a sensible value based on your testing.

    - Both: Dynamic render pass culling. Only 2021.2 and up.
    This skips rendering passes based on what's visible for the camera. Find it at the bottom of HDRP global settings.

    - Both: Be careful with which features you use; don't just enable things without consideration.
    For example, realtime reflection probes are far too expensive and are only really viable for testing and demos.
    And maybe your artistic blur at 0.001 intensity needs to take a backseat and get disabled :)

    - Both: Forward/Deferred. Does your project use a lot of realtime lights? You should probably use deferred then.

    - Both: many features cost quite a bit more at higher settings with little to no visual difference. Going from medium to high is often a large performance cost for a very hard to notice visual upgrade. Medium is good :)

    - CPU: SRP Batcher: If your scene is heavy and you can't just reduce draw calls because you need them for your scene, then the SRP Batcher (enabled by default) is really good. Good CPU gains and a big improvement to draw call performance. When creating your level you can get away with using as many materials as you need, as long as they use the same shader variant. In general, try to use the smallest number of shaders you can get away with. Check the SRP Batcher documentation page; it has a lot of helpful info.


    Edit 2:

    - Both: This is less useful with HDRP since we have shadow caching, but still useful.
    You can use shadow proxies, which are just objects/cubes/planes with the mesh set as Shadows Only, and use those to set up your shadows. The benefit is that you use little to no realtime shadows on the real geometry. The idea is to disable "shadow casting" on most of your objects and replace it with a few planes/cubes that create the shadows for your world. Obviously, it won't be as good or accurate.

    - VRAM: In your camera frame settings (+global settings and active HDRP asset settings), check for features you aren't using and disable them. Also helps with game size/VRAM.

    - VRAM:
    In 2021.2 and before, all your HDRP assets assigned in 'quality' are using VRAM as if they're all active. If you're only using one of them, remove the others.

    - Both/Terrain: Are you using a big terrain? If you are, check out the 'Pixel error' parameter in the terrain settings. When your terrain is textured, it's pretty hard to notice the difference. You can most likely reduce polygons by 10-35% or more while retaining the same visuals. Experiment and choose the right value for your project.
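    For reference, the same parameter can be set from script; a tiny sketch (the value 15 is just an example -- the default is 5, and higher means fewer terrain polygons):

    Code (CSharp):
    using UnityEngine;

    // Raises the pixel error on every active terrain to trade terrain polygons for minor visual error.
    public class TerrainPixelError : MonoBehaviour
    {
        public float pixelError = 15f;

        void Start()
        {
            foreach (Terrain terrain in Terrain.activeTerrains)
                terrain.heightmapPixelError = pixelError;
        }
    }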

    - VRAM(experimental, possibly not a good idea): HDRP has texture streaming, in case you need it -- for those who didn't know :D

    - CPU: Keep an eye on how much garbage you generate each frame from your own scripts and try to keep it as low as possible. Also worth checking incremental garbage collection, found in the "Player" settings -- enabled by default in recent versions, and usually it's beneficial to keep it on.
    HDRP rendering code should generate no garbage; if it does, report it as a bug.
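    A small sketch of two common ways to avoid per-frame garbage (the class and the 10m radius are made up; the point is reusing buffers and preferring the NonAlloc physics queries):

    Code (CSharp):
    using System.Collections.Generic;
    using UnityEngine;

    public class LowGarbageExample : MonoBehaviour
    {
        readonly List<GameObject> targets = new List<GameObject>(64); // allocated once, reused
        readonly Collider[] hits = new Collider[32];                  // reused result buffer

        void Update()
        {
            targets.Clear(); // reuse instead of new List<>() every frame

            // Writes into the existing buffer instead of allocating a new array each call.
            int count = Physics.OverlapSphereNonAlloc(transform.position, 10f, hits);
            for (int i = 0; i < count; i++)
                targets.Add(hits[i].gameObject);
        }
    }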

    - CPU: There are things you don't really need to do every single frame, even if they live in Update. For anything that has to run regularly but is expensive, consider running it only every now and then, depending on your project of course.
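    A minimal sketch of that throttling idea (UpdateTargeting and the 0.25s interval are hypothetical placeholders for whatever expensive work you have):

    Code (CSharp):
    using UnityEngine;

    // Runs expensive logic a few times per second instead of every frame.
    public class ThrottledUpdate : MonoBehaviour
    {
        public float interval = 0.25f;
        float nextTime;

        void Update()
        {
            if (Time.time < nextTime) return;
            nextTime = Time.time + interval;

            UpdateTargeting(); // now runs ~4 times/sec instead of every frame
        }

        void UpdateTargeting()
        {
            // ... expensive scan / AI / pathfinding work here ...
        }
    }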

    - Both: If you're creating your own shaders, keep an eye on their complexity and performance costs, usually this isn't an issue but it's good to remember.

    - Both: Check out global settings; it usually has a bunch of post processing enabled by default. Keep what you want and disable the rest -- in particular it has motion blur, which is quite expensive relative to the others.

    - Both:
    In 2021.2 HDRP has volumetric clouds -- keep the num primary steps and light steps as low as possible while retaining good visuals -- primary steps in particular. A very low amount could lead to a lot of noise in the clouds, so find a good balance.

    Edit (3) For foliage:

    - Both: If you're using a high density of grass/trees, make sure they're instanced. HDRP supports terrain grass (which uses indirect instancing which is good for perf) starting from 2021.2.

    - Both: One idea that might help, depending on your needs, is to combine a few grass meshes into one prefab and use that prefab in the terrain. That way you can place fewer grass instances but still have a very high density.

    - Both: I mentioned this above but if you have a big terrain, remember to play with the "pixel error" parameter. It can massively reduce terrain polygons with little to no noticeable changes. Find a good value based on your project.

    - Both: Don't forget tree LODs and billboards!

    - Both: Split up your TERRAIN/SCENES! If you have a big world, you probably need to stream it in and out; otherwise your terrain will have a big cost. The terrain object in Unity does its own culling and it can get really expensive. You can split up your terrain into 5, 10, 20 pieces, depending on how big your world is. There are assets on the store to do that, or you can do it manually. One basic way is to split your world into scenes: rather than having one massive scene, you'll have 2, 5, 10, etc. scenes based on your needs, and load them in using Unity's SceneManager.LoadSceneAsync API (a sketch is below). This will also help with your VRAM usage. Don't forget to unload scenes you're not using anymore. This doesn't just apply to your terrain; depending on how you do it, you can include props, objects, other meshes, etc. in each scene.
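    A minimal sketch of distance-based additive streaming with that API (the scene name and distances are hypothetical; a real system would track many cells and wait for the async operations):

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.SceneManagement;

    // Loads one world cell additively when the player gets close, and unloads it when they leave.
    public class SceneStreamer : MonoBehaviour
    {
        public Transform player;
        public string sceneName = "Terrain_Cell_01";
        public float loadDistance = 300f;

        bool loaded;

        void Update()
        {
            float dist = Vector3.Distance(player.position, transform.position);

            if (!loaded && dist < loadDistance)
            {
                SceneManager.LoadSceneAsync(sceneName, LoadSceneMode.Additive);
                loaded = true;
            }
            else if (loaded && dist > loadDistance * 1.2f) // hysteresis to avoid load/unload thrashing
            {
                SceneManager.UnloadSceneAsync(sceneName);
                loaded = false;
            }
        }
    }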

    Edit 4:

    - Both: Consider level design changes if all else fails. If you're struggling with world streaming or performance, you might have to change your map in a way that allows occlusion culling (tip: use Perfect Culling from the Asset Store, I'm not affiliated, it's just that good compared to Umbra) to do a better job. For streaming, you can create a buffer zone between point A and B that allows you to stream your levels in/out while the player is transitioning through it. It's not ideal, but if you can't see another way, this can save your game.

    - Both: Remember to provide a graphics settings menu for PC. This makes it possible for your game to scale down, and in general it's a nice thing to find in games.

    - Both: Keep an eye on your VRAM usage. If a player has 8GB of VRAM but your game requires 10GB, that extra 2GB will be very expensive, since textures have to be shuffled between main memory and video memory, and main memory is much slower than VRAM. Performance will degrade massively, most of the time to unplayable levels.

    - VRAM: Build your game, then close everything, including the Unity editor, browser, Steam, videos, etc.
    Look at how much VRAM your computer is using, then run your game and look at how much VRAM is in use. You get a rough idea of how much your game is using by subtracting the previous idle amount from the current amount.
    If your VRAM usage seems to be an issue, get the Memory Profiler package and look at what exactly is consuming your VRAM: how much of it comes from your project, and how much from the render pipeline. From there, work on reducing it.

    - VRAM: To lower VRAM usage, you can use texture streaming (in HDRP texture streaming is still experimental, so possibly not a good idea to use yet), lower texture resolution, or split your world into multiple scenes and stream it, unloading scenes when appropriate. HDRP itself can sometimes also be an issue, so you may have to work around that and make it consume less VRAM by changing settings and following the other tips here.

    - Both: Mesh LODs. They make a massive difference depending on your scene and detail.
    Imagine you're using 200 very detailed rock meshes. After a certain distance, most of the time they're not even visible, or if they are, the detail is completely lost on them; they're an extremely small part of the screen. This applies to all meshes: cliffs, buildings, props, etc.
    LODs also make it possible for you to make your objects high poly up close without completely sacrificing performance, since most objects in your world aren't very close to the player.
    You can also completely cull objects after a certain distance.
    LODs are quite important, but at the same time don't overdo it! Your small rock doesn't need 5 LODs! If it's small, you might be able to get away with just 2 LODs: the first is the base, the second is completely culled (see the sketch below).
    Remember to judge your object's size and polycount when deciding how many LODs to use. LOD transitions have a cost. Search around if you're not sure how many LODs your mesh should have.
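    A minimal sketch of that "base LOD + culled" setup from script (the 2% screen-height threshold is illustrative; in Unity, anything below the last LOD's transition height is culled automatically):

    Code (CSharp):
    using UnityEngine;

    // Builds a LODGroup with a single rendered LOD that gets culled below 2% of screen height.
    public class TwoLevelLod : MonoBehaviour
    {
        void Start()
        {
            var lodGroup = gameObject.AddComponent<LODGroup>();
            Renderer[] renderers = GetComponentsInChildren<Renderer>();

            var lods = new LOD[1];
            lods[0] = new LOD(0.02f, renderers); // drawn above 2% screen height, culled below

            lodGroup.SetLODs(lods);
            lodGroup.RecalculateBounds();
        }
    }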

    - Both: Polygons. You'd be surprised how little difference they sometimes make in quality compared to the same mesh with fewer polygons. Depending on your project, textures do a lot of the heavy lifting in making your objects look detailed.
    Again, this depends on your type of project, but unless you have a specific reason not to, keep your polycount in mind and make a budget for your game based on your own testing. (Remember players with weaker hardware than yours if you're still targeting them.)

    edit 5:

    - Both: Objects that cast shadows will be considered shadow casters even if they receive zero direct lighting; even though they produce no visible shadows, shadow calculations are still running for them!
    This can have a massive GPU cost. You can find the "Cast Shadows" parameter in the inspector for your object and set it to "Off", "On" or "Shadows Only" -- if you know your object never receives direct lighting, set it to Off. Also, if it's a small object with LODs, consider setting shadow casting to Off on the second or third LOD. You decide based on the object and its size.
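    The same toggle can be flipped from script if you have many objects to change; a small sketch:

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Rendering;

    // Turns off shadow casting on this object and all of its children.
    public class DisableShadowCasting : MonoBehaviour
    {
        void Start()
        {
            foreach (Renderer r in GetComponentsInChildren<Renderer>())
                r.shadowCastingMode = ShadowCastingMode.Off; // other options: On, ShadowsOnly, TwoSided
        }
    }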

    To give you an idea on how important this is, consider the following pics:

    With shadow casting "on":


    With shadow casting "off":


    Over 4ms on the GPU! On an RTX 2070 Super. And they provide nothing at all; none of these barrels are casting any visible shadows.

    The scene for the above profiling:


    The barrels are clusters, each "mesh" is around 6-7 barrels.
    With the above tests, these barrels were switched from shadow casting on and off.
    There's only one light in the scene (directional light), with shadows enabled.

    The barrels in the picture are 525 objects (some are on top of each other).

    - GPU: Volumetric fog in HDRP can get _extremely_ expensive. If you only care about volumetric fog from the directional light, lower the quality. You can do that in the Fog override: set it to "Low", or even switch to manual control and lower it further.
    It can get crazy real fast.
    Also, reprojection denoising is very expensive for volumetric fog. Consider using Gaussian instead.

    If you care about volumetric fog for point/spot lights, then unfortunately you might have to live with some noise and flickering, unless you're willing to give up half or more of your GPU budget. Also consider lowering the volumetric fog distance if you're using punctual-light volumetric fog, to fade it off at a distance; otherwise volumetric fog will look very pixelated far away.

    In my opinion, HDRP needs to improve volumetric fog performance so we can get better visuals at better performance. Currently you'll have to struggle with massive punctual-light volumetric fog flickering and noise, especially if Anisotropy is over 0 -- the higher it is, the more problematic the noise/flickering.

    edit 6:

    Both: Texture Atlasing - although HDRP has the SRP Batcher, it's still faster to have fewer materials to begin with. Atlases are ideal for meshes that are scattered around at high densities. They also give you the ability to combine all meshes that use the same material.
    Ideally, you'd combine them intelligently based on a cell system, or manually: close meshes combine into one, and so on. Otherwise, if all of them become a single mesh, you lose frustum culling & occlusion culling. (You can use Mesh Combine Studio for cell-based mesh combining with LOD support *not affiliated)
    Note: Texture atlasing might increase your VRAM usage, so keep an eye on it.

    GPU: Shadow filtering quality - HDRP has different shadow filtering quality options that control how soft the shadows are. "High" is the most expensive. The issue with high quality shadow filtering is with smaller lights (point/spot): these lights with high filtering have a large performance cost, much larger than the directional light. Worst of all, that cost can't be cached, so it's paid every frame as long as the shadows are on.
    The increased cost from high quality shadow filtering shows up under "deferred lighting" in Unity's profiler with the GPU module enabled.

    If your project utilizes point/spot light shadows, consider switching to medium.
    HDRP developers should separate the option into two: one affecting the directional light & another for point/spot lights. The cost of high shadow filtering with the directional light isn't as crazy as with point lights.

    Both: Baked lighting - You can set lights to baked; that way there is zero runtime cost for those lights, and for their shadows too if you want. You will lose specular lighting though; only real-time lights are capable of that without custom shaders.
    The cost you'll pay is baking times and fighting Unity's lightmapper.

    Both: Managing GPU usage for PC and console games | Unity - Good post from unity with a lot of helpful information to improve performance, some already mentioned here, but it's a good read.

    Both: Configuring your Unity project for stronger performance | Unity - Another good read from unity focused on performance.

    CPU: Make sure "graphics jobs" is enabled. You can find the option in project settings > player settings. This will multithread rendering code, leading to a big improvement as the CPU graphics overhead on the main thread will be massively reduced. This is enabled by default but check to be safe.

    Both: Shadow cascades - In HDRP, just like all other render pipelines, you have shadow cascades. To understand what they are, and their benefits check this page: Click Here
    Shadow cascades are a technique used in real-time 3D graphics to improve the quality and accuracy of shadows in a scene. The basic idea is to divide the viewable area of the scene (the "frustum") into multiple sections, or "cascades," and render the shadows separately for each cascade. Each cascade covers a larger area and is rendered with less detail and lower resolution than the previous cascade, with the closest cascade having the highest detail and resolution.

    This approach allows for a balance between performance and quality, as the GPU can spend more resources on the cascades that are closest to the camera, where the shadows are most visible, while still providing adequate shadow detail for the rest of the scene. This technique is commonly used in video game engines and other real-time graphics applications to improve the visual quality of shadows and reduce the performance impact of rendering them.

    Now, one thing you must know is that shadow cascades aren't free and will cost you precious GPU and CPU cycles. To understand how many cascades you need, you have to consider your directional light shadow resolution, scene/map size, and shadow render distance.
    In the end, test it for yourself -- set it to different amounts and see if you like the result.
    Shadow cascades should be a part of your game's graphics settings page, as they're a good way to scale down for weaker hardware.

    The fewer cascades you use, the faster shadow rendering will be on your GPU and CPU.

    Check this performance example of a middle-sized environment with many objects:
    (This is only showing GPU performance, but it also affects CPU performance)

    4 cascades:


    3 cascades:


    2 cascades:



    Both - SRP batcher & efficiency: There's an issue that few people know about when using the SRP Batcher, which is that it will separate draw calls if they have different shader keywords, forcing a new draw call.

    With the SRP Batcher, as long as you use the same shader (even with different materials), it will improve your performance by reducing the cost of draw calls, but different keywords per material can reduce its efficiency.

    To find out more, you need to use the Frame Debugger. From it you can tell why some calls are separated and weren't combined with the previous SRP Batcher call; usually the cause is a completely different shader, or a difference in shader keywords (imagine two materials using the same shader, but one material has Receive SSR enabled and the other has it disabled).

    Also, your materials will keep keywords from previous shaders even though they're useless, and this messes with the SRP Batcher even more.

    To remove them, follow this video:


    If you have any "invalid keywords", click on the "-" to remove.
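    If you'd rather inspect keywords from script than click through materials, here's a small editor-only sketch (the menu path and class name are made up) that logs the keywords on the selected materials using Material.shaderKeywords:

    Code (CSharp):
    using UnityEngine;

    public static class MaterialKeywordDebug
    {
    #if UNITY_EDITOR
        [UnityEditor.MenuItem("Tools/Log Selected Material Keywords")]
        static void LogKeywords()
        {
            foreach (Object obj in UnityEditor.Selection.objects)
            {
                if (obj is Material mat)
                    Debug.Log(mat.name + ": " + string.Join(", ", mat.shaderKeywords));
            }
        }
    #endif
        // Note: clearing keywords from script (mat.shaderKeywords = new string[0]) is possible,
        // but verify the material still looks correct afterwards.
    }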

    To see the difference this can make, same scene, same camera view:

    25 steps to render Gbuffer, 171 overall for the frame.

    After some keywords changes:


    Down to 8(!) steps for the Gbuffer, and 135 total.


    Also, you can see in the pictures "Batch cause, SRP: node use different shader keywords."
    or "different shader"
    this gives you the reason why it had to do its own draw call and wasn't a part of the previous SRP batch, and so on.

    In this scene, SRP batcher can be improved even further with more keyword changes & shader changes.



    -----
    misc info:

    combined meshes: When you combine meshes, you can get improved CPU performance due to reduced draw calls if done on the right objects, but it might slightly increase your GPU cost. Remember to always use Unity's profiler (CPU & GPU modules) to know the exact numbers and whether it's worth it for your particular scene/project.

    General quality: Your game quality settings should have hardcoded settings for general features, such as SSAO, SSR, Shadow distance, etc. (You can also expose them in an advanced graphical settings page)
    For example, you can customize how many samples SSAO does, full or quarter resolution, etc. - This goes for the majority of settings in HDRP.
    With HDRP, you get access to a lot more options. This will help scale down performance as required, or push visuals to the extreme depending on the user's hardware.

    Profiler & frame debugger: Improving performance has one extremely important requirement: knowing your bottleneck and knowing the performance cost of your scene.
    It is absolutely essential that you learn to use the Unity profiler, both the CPU and GPU modules, and to a lesser degree the Frame Debugger, which shows exactly how many steps are taken to render a single frame of your project. The Frame Debugger can help you debug the SRP Batcher and thus improve its efficiency, leading to improved performance (see "SRP batcher & efficiency" above).

    How to read profiler data, Meaning of GPU metrics (HDRP):
    1. "ForwardOpaque": The cost of your opaque objects (the majority/all of your objects placed in the scene. This is decided by poly count, MSAA cost also goes here, and maybe drawcalls)
    2. "RenderShadowMaps": The cost of rendering all shadows in your scene, this is effected by the amount of objects that cast shadows, their polycount, shadow render distance, and the amount of shadow cascades you're using. (doesn't include Contact shadows if you have it enabled).
    3. "Volumetric Lighting": This is the cost of your volumetric fog, decided by the quality options chosen in the fog post process override, also affected by the number of lights with "volumetrics" enabled. The denoiser selected in the fog override has a cost as well.
    4. "Volumetric Clouds": Cost of using Volumetric clouds, effected by (num of primary steps) and (Num of light steps) selected in volumetric cloud post process override.
    5. "Post Processing": This is the cost for some of the post processing available in HDRP, like Bloom, Exposure, motion blur, etc.
    6. "ForwardDepthPrepass": This is the cost of doing a DepthPrepass in forward mode.
    A depth pre-pass eliminates or significantly reduces geometry rendering overdraw. In other words, any following color pass can reuse this depth buffer to have one fragment shader invocation per pixel. This is because a pre-populated depth buffer contains the depths of opaque geometries closest to the camera. The subsequent passes will shade only the fragments passing the z test with matching depths and avoid expensive overdraws.
    7. "Contact Shadows": Cost of doing contact shadows, decided by quality options in it's post process override.
    8. "Ambient Occlusion": Cost of doing SSAO, decided by it's post process override quality options.
    9. "ObjectsMotionVector": Cost of object motion vectors, decided by the amount of meshes with object motion vector (like animated grass).
    10. "ColorPyramid": Not 100% sure, but I believe this is decided by the "color buffer format" and/or "Buffer Format" in your HDRP asset.
    11. "BuildListList": cost of building a light list in your scene, decided by the amount of active realtime lights in your scene and possibly their range.
    12. "OpaqueAtmosphericScattering": This cost comes from your fog override. (HDRP).
    13. "CopyDepthBuffer": copies depth buffer :D


    Deferred GPU metrics are very similar with some changes:
    ForwardOpaque is split into multiple metrics in deferred mode:
    1. "Deferred Lighting": Which handles lighting costs, this is affected by the amount of realtime lights you have, and most importantly their range. Range makes a big difference in deferred, you can have many lights with very little performance cost as long as their range is small. The bigger it is, the more expensive.
    2. GBuffer: Cost of your rendered objects, affected by polygon count.

    To learn more on how to use the profiler, check this thread: Other - Guide to unity profiler: HDRP version (And how to read profiler data) - Unity Forum
     
    Last edited: Mar 10, 2023
  2. francescoc_unity

    francescoc_unity

    Unity Technologies

    Joined:
    Sep 19, 2018
    Posts:
    193
    Quick note about cached shadows: only the directional light ones are view dependent.

    The culling for cached shadows ignores the view frustum and uses only the area of influence of the light.
     
    PutridEx likes this.
  3. francescoc_unity

    francescoc_unity

    Unity Technologies

    Joined:
    Sep 19, 2018
    Posts:
    193
    PutridEx likes this.
  4. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    What do you mean by "area of influence of the light"?
    I tested it just now and the cached shadows are rendered on demand even when outside the light's range. I assume my understanding is incorrect though.

    In my test, shadow rendering onDemand every half a second worked great for non-directional shadows.
    Excellent :D
     
  5. bonickhausen

    bonickhausen

    Joined:
    Jan 20, 2014
    Posts:
    115
    Hi there!

    I have a realtime shadow casting light. This is what it looks like:
    upload_2021-11-12_14-56-45.png

    This is my sample scene:
    upload_2021-11-12_14-57-5.png

    By moving the set of chairs to the side, I get this:

    upload_2021-11-12_14-58-8.png

    I get a duplicate shadow, from where the chairs used to be.

    They are not marked as static. Their mesh renderers do NOT have "Static Shadow Caster" toggled on.

    What am I doing wrong? Shouldn't I be getting mixed cached shadows here?
     
  6. Ruchir

    Ruchir

    Joined:
    May 26, 2015
    Posts:
    934
    Maybe they were baked into shadowmask at some point and you haven't rebaked since then. Try checking it in the debug view once.
     
  7. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    I think this is a problem with shadow caching in scene view. In scene view it's problematic sometimes, weird issues etc.

    I say:
    Go in-game, look at the chairs using the in-game camera, and move the chairs' position. Is it working as intended?
    If it is, then it's a scene view issue; tick "Always Refresh" in scene view, maybe it helps.
    But it's also maybe worth filing a bug about it. Menu: Help > Report a Bug

    also, it could be what Ruchir said.
     
  8. Dessix-Machina

    Dessix-Machina

    Joined:
    Aug 5, 2013
    Posts:
    6
    "On Demand" is described as:

    It does not mean that it will always update when contextually appropriate- it just allows you to schedule when to update the map.

    You could, perhaps, toggle the mapping method between the two based on player proximity, and have it do less-frequent but manual updates when the player is further away? This is where a better real-time occlusion culling system would save us a lot of performance.
     
    ftejada likes this.
  9. koirat

    koirat

    Joined:
    Jul 7, 2012
    Posts:
    2,073
    Is it the number of lights that causes the fps drop, or the amount of volumetric effect on the screen?
    Like one big light on the screen with a volumetric effect vs multiple small lights that take up only a small part of the screen.
     
    Gasimo likes this.
  10. UnityLighting

    UnityLighting

    Joined:
    Mar 31, 2015
    Posts:
    3,874
    The first thing behind performance drops is the poly count (a solution is Simplygon).
    Then the shadows (solutions: baking lights, or culling shadows for small objects or far-distance objects).
     
  11. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951
    I've done everything here that works for my project but still can't reach 1080p60 without DLSS. Are there any examples of HDRP nature scenes (without baked lighting) that get decent performance? Book of the Dead no longer runs in current versions, and anything I get from the asset store has similar S*** performance.
     
    Last edited: Feb 17, 2022
  12. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951
    The amount of volumetrics shouldn't matter, but the resolution might if you're low on GPU memory. I'd say number of lights has the biggest impact.

    You should tweak the amount of local fog in the HDRP asset. The default setting is ludicrously high and you can easily save a gigabyte of memory with this.
     
  13. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    What's your hardware?
    Make sure all your grass/trees are instanced, ideally with indirect instancing -- which I think HDRP supports starting from 2021.2 when using terrain grass.

    What's your bottleneck, CPU or GPU? Use the profiler to find out (ideally in a build).
     
  14. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951
    RTX 3060. Bottleneck is GPU so instancing shouldn't matter.

    Here's what stands out in Nsight:
    - 3ms on BuildLightList, seems to include shadows? Two directional lights, one renders shadows at 2048 res.
    - 3ms for SSR seems excessive
    - 3ms for volumetric lighting (one directional light)
    - 1ms for volumetric clouds

    EDIT:
    I even got a frame with over 8ms for SSR. Is it supposed to be that heavy?
     
    Last edited: Feb 17, 2022
  15. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    SSR cost seems a bit crazy, not sure why it would be so high. Are you on DX11 or 12? Is it set to High?
    - One thing you can try is caching shadows for the directional light. The thing is, with the directional light, shadow caching is based on the camera frustum, so you'll still have to update it often or there will be popping.

    - You can try reducing volumetric light quality. You can switch volumetric fog quality to manual and set the resolution and other settings yourself. As you probably know this will lead to noise and less quality but maybe you can find a good balance.

    - Why 2 directional lights? For time of day system? That might be making a big difference. Try only using one and see if there's a good perf boost or not.

    - What about shadow distance? Assuming your project is an open world, maybe reducing the shadow distance & using baked shadows after a certain distance would be worth considering.
     
  16. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951
    I'm on DirectX12. I have dynamic time of day so any kind of baking is a no-go. SSR is set to medium (32 ray steps).

    Shadow caching is just going to lead to lag spikes.

    Reducing fog quality seemed to have some impact at least.
     
    Last edited: Feb 17, 2022
  17. UnityLighting

    UnityLighting

    Joined:
    Mar 31, 2015
    Posts:
    3,874
  18. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    As UnityLighting said, DX12 in unity is indeed quite a bit slower than DX11. Worth checking out the perf differences when using DX11.
     
    Gasimo likes this.
  19. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951
    Interesting. I've seen very big performance gains for DX12 on low-end hardware but here DX11 does indeed seem to be much faster. Seems to break DLSS though.

    I've tried Vulkan in the past but that just leads to graphical artifacts and crashing.
     
    Gasimo likes this.
  20. dgoyette

    dgoyette

    Joined:
    Jul 1, 2016
    Posts:
    4,195
    I was curious if there was any good documentation on what kind of performance gains we're talking about? So far, all I've found are very specific, small comparisons on the performance of specific functions that absolutely don't represent a meaningful fraction of my project's performance.
     
  21. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951
    The gains can be massive when you're using P/Invoke, which is used by a lot of the Unity codebase. I've done texture operations in C code, and the speedup was just immense. Talking 1000x improvement.
     
  22. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    Another edit, added level design tip & some VRAM considerations
    If anyone has other tips or just info to consider when it comes to performance, especially when it comes to HDRP, do give :)
     
    koirat likes this.
  23. Deleted User

    Deleted User

    Guest

    I generally keep my shadow distance low (for the directional light)... If you are using real-world scale, I keep the shadow distance to 80-200m and keep the resolution at 512 for 80m and 1024 for greater distances... For distant shadows I use contact shadows only, with the distance set to more than 10km.. and to be honest there is no difference at all + shadows of very distant objects are also visible.... thanks to the contact shadows!!.... I have tested these settings only in a forest scene... don't know how they play in yours.... And for grass, disable shadows and again just use contact shadows on it... Use a tri count for your large trees between 4k and 10k (keep it as low as possible in the given range, especially for open worlds) if covering old gen like PS4 (if next gen you can use the range of about 10k-25k).... Limit the use of too many textures and materials and use atlases and virtual texturing... Use simplified proxy shadow casters for your vegetation (most open-world games on PS4 like Horizon Zero Dawn and Ghost of Tsushima did it)... Reuse your vegetation assets a lot and just use two-three variations of a tree type (look at RDR 2, lots of reused assets across the world)... use the new instanced terrain tools for performance
     
    blueivy, Wolfos and PutridEx like this.
  24. Qleenie

    Qleenie

    Joined:
    Jan 27, 2019
    Posts:
    868
    How do you change the resolution in respect to shadow distance? Never saw such a setting in HDRP.
     
    olavrv likes this.
  25. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951
    In the pipeline settings under Lighting > Volumetrics, the max local fog on screen is set to 64 by default. At high resolution this takes 710MB of VRAM. You can probably set this to a more sane value like 3 or something and save hundreds of megabytes.
     
    PutridEx likes this.
  26. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951
    It does this by default. They're called cascades.
     
  27. Qleenie

    Qleenie

    Joined:
    Jan 27, 2019
    Posts:
    868
    I am using cascades, but I understood that all cascades use the same resolution for the shadow map. Or is this configurable? If yes, where?
     
  28. Deleted User

    Deleted User

    Guest

    Think twice when using any screen space effects in Unity, especially SSGI and SSR, as they can make your game crawl!!... These things are extremely heavy and, quality wise, not that great even at ultra... SSGI produces lots of noise and some artifacts in low light conditions like a cloudy day or dim-lit rooms, even at full resolution, and again it's extremely heavy. SSR isn't usable at all when set to medium or low because of some horrible lining artifacts it produces; it gives expected results at ultra and high, but it's extremely heavy then and still not that great when compared to the legacy pipeline SSR!! I recommend using asset store alternatives to them for better quality and performance!! These things really need to be improved because they don't seem to be production ready at all!!
     
    ftejada and blueivy like this.
  29. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951
    I figured out how to improve SSGI in cloudy conditions by baking a single Enlighten probe with a "ground" cube below it. This means that it doesn't sample the skybox from below on ray miss.

    SSGI is heavy, yes, but it also adds a lot to my game. SSR isn't that heavy in my case? But I have noticed those artifacts.
     
  30. Deleted User

    Deleted User

    Guest

    Thanks!! Can you provide a video to show that SSGI isn't producing any noise in your environment... I had tried to set the SSGI fallback to only reflection probes but it didn't work (this is not the same as your technique, but wouldn't it do the same thing of not sampling the skybox from below??)
     
    Last edited by a moderator: May 15, 2022
  31. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951


    It's not quite free of artifacts of course, but if you go look for them you see artifacts in most games.
     
    blueivy, Deleted User and PutridEx like this.
  32. TheVirtualMunk

    TheVirtualMunk

    Joined:
    Sep 6, 2019
    Posts:
    150
    Hi there!
    Cool thread! Just wanted to add some notes (although mostly applicable to URP and mobile development).
    I've had quite a big success with using Amplify Impostors for mobile games.
    Traded around 400k tris for a couple of MB of texture memory and extra (instanced, SRP batched) draw calls.
    Highly recommended!

    Also for some mobile devices, it can be good to decrease the render scale below 1 as some have very high DPI.

    Moreover, keep an eye on the development of adaptive performance.
     
    Last edited: Jun 23, 2022
  33. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    Made an addition about "shadow casting" and its effect on performance + talked about volumetric fog cost.


    - Both: Objects that cast shadows will be considered shadow casters even if they receive zero direct lighting; even though they produce no visible shadows, shadow calculations are still running for them!
    This can have a massive GPU cost. You can find the "Cast Shadows" parameter in the inspector for your object and set it to "Off", "On" or "Shadows Only" -- if you know your object never receives direct lighting, set it to Off. Also, if it's a small object with LODs, consider setting shadow casting to Off on the second or third LOD. You decide based on the object and its size.

    To give you an idea on how important this is, consider the following pics:

    With shadow casting "on":


    With shadow casting "off":


    Over 4ms on the GPU! On an RTX 2070 super.

    The scene for the above profiling:


    The barrels are clusters, each "mesh" is around 6-7 barrels.
    With the above tests, these barrels were switched from shadow casting on and off.

    The barrels in the picture are 525 objects (some are on top of each other).
     
    Last edited: Nov 22, 2022
    Deleted User and tmonestudio like this.
  34. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    Worth mentioning: Modern GPUs can render tens of millions of polygons pretty fast!
    The above barrels are all LOD0 with all other LODs deleted.
    With shadow casting off, they're about 15M tri and 29M vert!

    They're all rendered in 3.5ms with forward mode, and about the same with deferred.
    GPUs can handle even more, but the biggest performance cost here for these props might not be the crazy amount of polygons, but small triangles. They're all LOD0 and fairly small, so far away barrels end up with very small triangles that increase GPU cost quite a bit.

    So, you might not need 2-4 LODs; you can get away with a single LOD1 and then cull after that. The purpose of LOD1 is to reduce the complexity of objects at a distance. HDRP actually has debug modes for this called "vertex density" and "quad overdraw". You can also add an impostor as the final LOD if you don't want to cull it.
    In the end it depends on object size, your map, etc.

    upload_2022-11-22_20-21-9.png
    (This is vertex density debug mode in HDRP)
    Notice how all the barrels from a distance are completely red, while the big cliffs and nearby barrels are fine.
    The cliffs are huge and not polygon dense for their size (originally a small cliff scaled up), but the barrels are small and very high detail.
    You want to avoid that red as much as you can with LODs/imposters/culling.


    (Quad Overdraw)
    this displays small/thin triangles

    edit: This doesn't mean LODs aren't important, or you should only have one. The addition of LODs will make a massive difference when it comes to performance. Don't misunderstand this post :D (don't be afraid to have 3-4 LODs, just don't go crazy, and consider the object you're making LODs for and its size in the world)
     
    Last edited: Jan 3, 2023
    Rewaken, apkdev, timmehhhhhhh and 5 others like this.
  35. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    Added 3 new tips: (Texture atlasing, HDRP shadow filtering quality, Baked lighting)

    ---

    Both: Texture Atlasing - although HDRP has the SRP Batcher, it's still faster to have fewer materials to begin with. Atlases are ideal for meshes that are scattered around at high densities. They also give you the ability to combine all meshes that use the same material.
    Ideally, you'd combine them intelligently based on a cell system, or manually: close meshes combine into one, and so on. Otherwise, if all of them become a single mesh, you lose frustum culling & occlusion culling. (You can use Mesh Combine Studio for cell-based mesh combining with LOD support *not affiliated)
    Note: Texture atlasing might increase your VRAM usage, so keep an eye on it.

    GPU: Shadow filtering quality - HDRP has different shadow filtering quality options that control how soft the shadows are. "High" is the most expensive. The issue with high quality shadow filtering is with smaller lights (point/spot): these lights with high filtering have a large performance cost, much larger than the directional light. Worst of all, that cost can't be cached, so it's paid every frame as long as the shadows are on.
    The increased cost from high quality shadow filtering shows up under "deferred lighting" in Unity's profiler with the GPU module enabled.

    If your project utilizes point/spot light shadows, consider switching to medium.
    HDRP developers should separate the option into two: one affecting the directional light & another for point/spot lights. The cost of high shadow filtering with the directional light isn't as crazy as with point lights.

    Both: Baked lighting - You can set lights to baked, that way there will be zero performance cost of said lights, their shadows too if you want. You will lose specular lighting though; only real-time & mixed lights are capable of that without custom shaders.
    The cost you'll pay is baking times and fighting unity's lightmapper (difficulty: impossible)
     
    Last edited: Jan 3, 2023
  36. pwka

    pwka

    Joined:
    Sep 19, 2012
    Posts:
    49
    You can use Bakery. At the cost of 3 extra textures (regular png, not hdr) you will get speculars (not perfect though). It's also easier to bake (gpu based), works with prefabs and so on... but has its own quirks and problems. Still, it's worth a try.

    btw awesome thread! Thanks for sharing.
     
    PutridEx likes this.
  37. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    Last edited: Jan 24, 2023
    Genebris, jiraphatK and AcidArrow like this.
  38. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    edit:
    added shadow cascades and performance examples between different cascades
    & Added more info
     
    Last edited: Jan 25, 2023
  39. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    A poem about game performance: don't ask why...

    Performance, performance, ever so dear,
    GPU and CPU, always near.
    Frames per second, smooth and fast,
    A game's true measure, to make it last.

    Latency low, inputs precise,
    A player's experience, truly nice.
    Optimization and tuning, a never-ending quest,
    To make the game, truly the best.

    Memory and bandwidth, always in mind,
    To avoid stutters, of any kind.
    Performance, the key to a great game,
    A true obsession, it will always remain.

    So let us strive, to make it shine,
    Performance, the heart of the design.
    For a game that runs well, is truly divine,
    And leaves players, forever entwined.




    - ChatGPT
     
    Last edited: Jan 25, 2023
  40. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    added (2):
    Both: There's an issue that few people notice when using the SRP Batcher, which is that it will separate draw calls if they have different shader keywords, forcing a new draw call.

    With the SRP Batcher, as long as you use the same shader (even with different materials), it will improve your performance, but different keywords per material can reduce its efficiency.

    To find out more, you need to use the Frame Debugger. From it you can tell why some calls are separated and weren't combined with the previous SRP Batcher call; usually the cause is a completely different shader, or a difference in shader keywords (imagine two materials using the same shader, but one material has Receive SSR enabled and the other has it disabled).

    Also, your materials will keep keywords from previous shaders even though they're useless, and this messes with the SRP Batcher even more.

    To remove them, follow this video:


    If you have any "invalid keywords", click on the "-" to remove.

    To see the difference this can make, same scene, same camera view:
    Unity_s7qiMVAKaE.png
    25 steps to render Gbuffer, 171 overall for the frame.

    After some keywords changes:


    Down to 8 steps for the Gbuffer, and 135 total.


    Also, you can see in the pictures "Batch cause, SRP: node use different shader keywords." or "different shader"
    this gives you the reason why it had to do its own draw call and wasn't a part of the previous SRP batch, and so on.

    In this scene, SRP batcher can be improved even further with more keyword changes & shader changes.

    & added to misc info:
    Profiler & frame debugger: Improving performance has one extremely important requirement: knowing your bottleneck and knowing the performance cost of your scene.
    It is absolutely essential that you learn to use the Unity profiler, both the CPU and GPU modules, and to a lesser degree the Frame Debugger, which shows exactly how many steps are taken to render a single frame of your project. The Frame Debugger can help you debug the SRP Batcher and thus improve its efficiency, leading to improved performance (see "SRP batcher & efficiency" above).
     

    Attached Files:

    Last edited: Feb 2, 2023
    Gasimo likes this.
  41. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    edit:
    Added "meaning of GPU profiler metrics, forward mode, HDRP":


    1. "ForwardOpaque": The cost of your opaque objects (the majority/all of your objects placed in the scene. This is decided by poly count, MSAA cost also goes here, and maybe drawcalls)
    2. "RenderShadowMaps": The cost of rendering all shadows in your scene, this is effected by the amount of objects that cast shadows, their polycount, shadow render distance, and the amount of shadow cascades you're using. (doesn't include Contact shadows if you have it enabled).
    3. "Volumetric Lighting": This is the cost of your volumetric fog, decided by the quality options chosen in the fog post process override, also affected by the number of lights with "volumetrics" enabled. The denoiser selected in the fog override has a cost as well.
    4. "Volumetric Clouds": Cost of using Volumetric clouds, effected by (num of primary steps) and (Num of light steps) selected in volumetric cloud post process override.
    5. "Post Processing": This is the cost for some of the post processing available in HDRP, like Bloom, Exposure, motion blur, etc.
    6. "ForwardDepthPrepass": This is the cost of doing a DepthPrepass in forward mode.
    A depth pre-pass eliminates or significantly reduces geometry rendering overdraw. In other words, any following color pass can reuse this depth buffer to have one fragment shader invocation per pixel. This is because a pre-populated depth buffer contains the depths of opaque geometries closest to the camera. The subsequent passes will shade only the fragments passing the z test with matching depths and avoid expensive overdraws.
    7. "Contact Shadows": Cost of doing contact shadows, decided by quality options in it's post process override.
    8. "Ambient Occlusion": Cost of doing SSAO, decided by it's post process override quality options.
    9. "ObjectsMotionVector": Cost of object motion vectors, decided by the amount of meshes with object motion vector (like animated grass).
    10. "ColorPyramid": Not 100% sure, but I believe this is decided by the "color buffer format" and/or "Buffer Format" in your HDRP asset.
    11. "BuildListList": cost of building a light list in your scene, decided by the amount of active realtime lights in your scene and possibly their range.
    12. "OpaqueAtmosphericScattering": This cost comes from your fog override. (HDRP).
    13. "CopyDepthBuffer": copies depth buffer :D


    Deferred is very similar with some changes:

    ForwardOpaque is split into multiple metrics in deferred mode:
    1. "Deferred Lighting": Which handles lighting costs, this is affected by the amount of realtime lights you have, and most importantly their range. Range makes a big difference in deferred, you can have many lights with very little performance cost as long as their range is small. The bigger it is, the more expensive.
    2. GBuffer: Cost of your rendered objects, affected by polygon count.
     
    Last edited: Mar 7, 2023
    tmonestudio, Ruchir and blueivy like this.
  42. Stepan_33

    Stepan_33

    Joined:
    Nov 21, 2022
    Posts:
    26



    I have a scene like this in Unity, and it consumes 80% of my video memory. How can I reduce the load on the video card? I'm on a laptop; my video card is an (NVIDIA MX 450)
    (i5 1135 G7)
    (12GB RAM)
     
  43. HIBIKI_entertainment

    HIBIKI_entertainment

    Joined:
    Dec 4, 2018
    Posts:
    595

    You'd need a lot of time planning your scope and your hardware targets in HDRP in general; in your case, you may have to be extra vigilant with it.

    Open-world vegetation with volumetrics like in your post takes a lot of processing power, either as an upfront cost or a runtime cost, so on a laptop
    think about how baking and minimising the cost of high-cost features can help you IF they're a must for your project scope.

    I would first focus on the scope of your project and hardware.
    What do you NEED, vs what's nice to have?
    Understanding the costs of the features you're implementing, and where they can be improved or even completely removed, is an important part of HDRP especially.

    For your hardware, you might be looking at GTX 1660-1070 level performance if you had to do a desktop comparison.

    you could look at games as a study around 2016, since typically this is where the architecture was most used then.

    AAA games are okay for studying, but in terms of optimisations, you'll never really find anything, so look for similar games made in Unity around that time, it might just give you some additional ideas to help you create a scope and set your self-limits and budgets, for HDRP features on a lower end card.

    PutridEX has gone to great lengths and studies over the past year or so, to create this massive list of information.
    Read it, and try your best to understand the engine better and the costs behind HDRPs features.

    Have that help you build your limits and see if you can build your ideas from there. It's super important to set yourself limits; engines can't do everything, even if you have the best hardware.
    Our imaginations need some guidance because we're all incredible thinkers.

    we can only really give you answers to explore in situations like this, as your project and your working environment and variables are going to be different.
     
  44. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    Everything @HIBIKI_entertainment said is on point.
    But also, to be completely honest, your GPU is ancient; it might not be old, but in terms of performance it might as well be. 2GB of VRAM is hard to work with.

    You can only do so much with so little. At that point you should look into fully baked lighting, otherwise it'll be difficult to get good performance with HDRP on that GPU.
     
  45. Stepan_33

    Stepan_33

    Joined:
    Nov 21, 2022
    Posts:
    26
    My video card pulls that fragment of the world at 60 fps without any problems, but I'm not sure that it will pull the entire island that I planned. So I think it would be right to reduce the size of the game location for more convenient operation, and maybe even rework the concept of the game.
     
  46. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,296
    Lots of good advice. A few more extras:
    Use Bakery to bake lightmaps. Forget Unity's lightmapper; it will explode on any large level.
    Use an RTX card to boost baking speed to the max (~5-10 times faster baking than plain GPU baking. CPU baking has been dead for a long time).

    Baking lightmaps allows you to "stack" lots of lights with shadows in the same place, where with runtime lights HDRP would just explode into squares (at best, crash at worst).

    Pay attention to VRAM consumption in HDRP specifically.
    When baking lightmaps on large maps, they may eat up your entire budget.

    The best way to handle this is to reduce the directional lightmap max size to at least half, and adjust the rest of the maps accordingly depending on the desired fidelity of that particular portion of the level. In most cases even 1k lightmaps may be "good enough", which will save you GBs of build size as well as GBs of VRAM at runtime.

    Each object's scale in the lightmap can be adjusted separately.
    For extremely large objects, use a smaller lightmap scale (via the Renderer settings).

    If you're getting the "cannot excess buffer at offset" exception, which results in a black screen -- you're out of VRAM.
     
  47. x1alphaz1

    x1alphaz1

    Joined:
    Dec 19, 2020
    Posts:
    23
    do you know if Bakery allows baking at runtime ?
     
  48. Wolfos

    Wolfos

    Joined:
    Mar 17, 2011
    Posts:
    951
    Even if it does, the fact that it only works on Nvidia GPUs on Windows makes that a pretty bad idea.
     
  49. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,296
    Its Editor Only.
     
  50. mgear

    mgear

    Joined:
    Aug 3, 2010
    Posts:
    9,411