
PSA: A general rant on model LODs.

Discussion in 'World Building' started by jbooth, Apr 2, 2020.

  1. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,806
    A public service announcement about modeling mesh LODs.

    You are:
    • Usually making too many LODs
    • Usually introducing pop-in with them
    • Usually wasting tons of memory with them
    • Usually making batching and other optimizations less efficient with them

    And a lot of the lore around "polygon counts" dates back to software rendering in 1995.

    A modern screen at 1080p has 2 million pixels. Rendering in deferred, you will fill 4 buffers of this size, then resolve that to the main screen. If you're doing post processing, then you can expect several more passes over that data. And that's not including overdraw, transparency, etc. Any reasonably modern game easily processes tens of millions of pixels per frame, and the computations used per pixel are usually much more complex than the ones used for a vertex. In most cases, all a vertex shader does is convert the vertex from object to screen space, and perhaps compute a few things to pass over to the other shader stages.

    So, pixels are expensive, vertices are cheap, and "polygon counts" are largely irrelevant, since a polygon count isn't really a thing the hardware measures anyway.
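    The ratio above is easy to put rough numbers on. A quick sketch (the pass counts here are illustrative assumptions, not measurements from any particular engine):

```python
# Back-of-envelope: per-frame pixel work vs. the vertices one LOD saves.
# Pass counts are illustrative assumptions, not measured from any engine.
width, height = 1920, 1080
screen_pixels = width * height               # ~2.07 million pixels

gbuffer_targets = 4                          # deferred G-buffer fill
lighting_resolve = 1                         # resolve to the main screen
post_passes = 3                              # assumed post-processing chain

pixel_invocations = screen_pixels * (gbuffer_targets + lighting_resolve + post_passes)
verts_saved_by_mid_lod = 750                 # a 1000-vertex rock dropped to 250

print(screen_pixels)        # 2073600
print(pixel_invocations)    # 16588800
print(pixel_invocations // verts_saved_by_mid_lod)  # ~22000x more pixel work
```

    Even with conservative pass counts, the per-frame pixel work dwarfs the vertex work a mid LOD saves by four orders of magnitude.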

    Often you'll see a rock with LODs, which goes from 1000 vertices, to 750, to 500, to 250. In this case, the only LOD which will likely help is the last LOD. Why?

    • An extra 750 vertices being transformed is a microscopic amount of work when you're drawing 20 million pixels per frame.
    • If your scene has thousands of rocks, most will be in the last LOD. Very few will be in the middle LODs. So the last LOD is the only one which actually reduces vertex counts by any reasonable amount. Those mid LODs might be saving a few thousand vertices, while the last one is saving you 900 × 750 = 675,000 vertices.
    • Your two mid-range LODs consume as much memory as the original mesh and final LOD combined.
    • MicroTriangles are the real killer
    What is a MicroTriangle? Well, on most modern hardware, when a triangle gets small enough, it starts to incur a larger cost. This is because GPUs rasterize in 2x2 pixel chunks, which allows them to share texture lookups between those pixels and compute proper mip maps. However, when some of those pixels fall on another triangle (or outside the triangle entirely), their results cannot be used; the GPU still shades the whole 2x2 grouping and throws the unused pixels away. As triangles get smaller on screen, you have more and more edges, and thus more and more wasted work. And when triangles get smaller than a pixel, most of each 2x2 block may get thrown out.
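    The 2x2-quad waste can be demonstrated with a toy software rasterizer (a deliberately simplified model: pixel-center coverage only, none of the real hardware's fill rules):

```python
# Toy model of 2x2-quad shading waste: coverage-test pixel centers against
# a triangle, then count how many 2x2 quads the GPU would have to shade.
def quad_efficiency(v0, v1, v2, size=64):
    def edge(a, b, px, py):
        return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])
    covered = set()
    for y in range(size):
        for x in range(size):
            px, py = x + 0.5, y + 0.5  # pixel center
            w0 = edge(v1, v2, px, py)
            w1 = edge(v2, v0, px, py)
            w2 = edge(v0, v1, px, py)
            # inside (either winding) or on an edge
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.add((x, y))
    if not covered:
        return 0.0
    quads = {(x // 2, y // 2) for (x, y) in covered}
    # 4 shader invocations per quad; only the covered pixels are useful
    return len(covered) / (4 * len(quads))

big = quad_efficiency((0, 0), (60, 0), (0, 60))   # large on-screen triangle
tiny = quad_efficiency((0, 0), (3, 0), (0, 3))    # few-pixel triangle
print(round(big, 2), round(tiny, 2))              # 0.98 0.5
```

    For the small triangle, half the shader invocations are wasted helper lanes; real hardware behaves worse still once triangles shrink below a pixel.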

    Some timings on MicroTriangle throughput can be found here.

    Generally speaking, the critical point for this is when triangles are about 10x10 pixels in size; below that, the wasted work per pixel grows rapidly. This is all view dependent, of course, so you're always going to have some of these cases, but when your mesh gets small enough on screen that most of its triangles are this small, that is the time to consider LODs.

    A good way to figure out when you need an LOD is to turn on shaded wireframe mode and zoom out until the mesh is mostly wireframe, with little of the shaded object coming through between the lines. That is the distance you should design your LOD for. Anything sooner is just introducing extra pops, extra memory for the meshes, etc. And often at that distance, especially for things like rocks, an imposter, which can be a single triangle, is just fine.
     
    Last edited: Apr 2, 2020
    Ruchir, Runemark, TipTop3D and 69 others like this.
  2. JoNax97

    JoNax97

    Joined:
    Feb 4, 2016
    Posts:
    474
    Mister, I'm becoming more of a fan of yours by the day.

    Thanks for the insight! :D
     
  3. jamespaterson

    jamespaterson

    Joined:
    Jun 19, 2018
    Posts:
    362
    Very good advice as always, I am sure. Just to ask: does this apply, in your view, across all modern platforms, including e.g. mobile GPUs?
     
  4. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,806
    Yeah, there's some variance between GPUs as to how badly they handle microtriangle cases, but the fundamentals have been the same across all GPUs for a very long time.
     
    Cynicat and jamespaterson like this.
  5. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    10,216
    It applies to almost literally every raster based GPU made in the last 25 years. The PS1 and Saturn are the only ones I know of that it doesn't apply to.

    Basically if the hardware supports mipmapping, it has problems with micro triangles.
     
  6. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,806
    Or maybe "If the hardware supports mipmapping, then it's optimized to handle pixels in 2x2 blocks, which breaks down when you get microtriangles". Triangles smaller than a pixel would still be a problem on those other GPUs, just not as much of one, because they would run a lot slower anyway.

    What has happened since then is that the architecture has become more general: I believe at one time some cards had processing for vertex transformations separate from pixel ones (a long time ago, when these were fixed-function hardware stages), so the cost wasn't the same. Now, everything is just some CU doing some work, pixel or vertex be damned.

    Either way, it's been the case for almost as long as people have been stressing over "poly counts".
     
  7. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    10,216
    Roughly speaking, any GPU that supported D3D10 (or GLES 3.0 for mobile) has a "unified shader model", which means the same computational hardware is reused for both vertex and pixel operations. That's 2006 for Nvidia & Intel, 2007 for AMD, and most if not all mobile GPUs from 2012 on, though many older GLES 2.0 GPUs were already using the unified shader model as far back as 2009 or so.

    Fixed function GPUs that came before that had separated hardware for vertex and pixel operations. This meant that "vertex count" was a real thing as there was an upper bound for the number of vertices and triangles you could render that was separate from the render resolution. To put it into console terms, since the Xbox 360 "triangle count" hasn't really been a real thing that could be definitively measured.
     
  8. Iron-Warrior

    Iron-Warrior

    Joined:
    Nov 3, 2009
    Posts:
    776
    Really interesting thread, I'd definitely recommend posting this far and wide (Graphics forum, maybe on Twitter for #UnityTips).

    This seems like a really succinct summary that I've actually never read before. If a given frag and vertex shader have the same number of operations, how broadly could we say they are effectively the same performance hit?
     
  9. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    10,216
    Ooph.

    On some basic level, sure. But there are so many more factors that go into how "expensive" something is on a GPU that saying "20 instructions in the vertex shader = 20 instructions in the fragment shader" is probably not actually ever true. Vertex shaders generally output a lot more data than fragment shaders, and don't have to deal with the limitation of pixel quads, but do have to deal with the limitation of warps / waves (which fragment shaders also deal with), and with how well laid out the vertex data is for memory access coherence. Then there's the fixed function hardware for rasterizing the triangles, which is probably also a factor in why microtriangles are so slow, even ignoring the pixel quads.
     
    Iron-Warrior likes this.
  10. abbaswasim

    abbaswasim

    Joined:
    Apr 3, 2020
    Posts:
    2
    Although this is generally true, on mobile platforms bandwidth is a big factor that generally argues for fewer triangles.
    - On unified architectures, cycles spent on unnecessary vertex shading could have been spent on fragment shading, so every little helps
    - Most mobile platforms are tile-based deferred renderers. After vertex shading, all intermediate state is written out to main memory. That is a real bandwidth hit, so reducing it can again help a lot, though agreed, it's not as big a cost as your G-buffer
    - But then, deferred shading is not recommended on mobile platforms anyway

    I do agree that having the right number of LODs, or just the right amount of geometry to start with, is still good advice.

    edit: Of course the best course of action is to always measure.
     
    Last edited: Apr 3, 2020
  11. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,806
    It's really no different on mobile than on desktop. If you were to shave 10 pixels off your screen resolution, you'd likely save more work than all of your vertex transformations cost, even in a complex scene. The ratio of screen pixels to vertices is roughly the same, since mobile often has very high-res screens, and the tiled architecture makes no difference here.
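    That claim is easy to sanity-check (the pass count below is an assumed figure for illustration):

```python
# Pixels saved per frame by shrinking the render target from
# 1920x1080 to 1910x1070 (10 pixels off each dimension).
full = 1920 * 1080
shaved = 1910 * 1070
saved_per_pass = full - shaved
passes = 8  # assumed: G-buffer + lighting + a few post passes

print(saved_per_pass)           # 29900 pixels per pass
print(saved_per_pass * passes)  # 239200 shader invocations saved, roughly
                                # a whole scene's worth of vertex transforms
```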
     
    hippocoder likes this.
  12. abbaswasim

    abbaswasim

    Joined:
    Apr 3, 2020
    Posts:
    2
    How would you do that? You mean render at a lower resolution and upscale, etc.? You could do that, remove redundant render passes, reduce texture resolution, and use compression, but all of the pixels/fragments are generated from triangles. The fewer triangles you start with, the fewer fragments you will generate. So don't underestimate how far you can get by removing what triangles you can, and that's completely ignoring the bandwidth and other benefits (less to animate, etc.).
    So again, I wish it were that simple: you really have to measure your specific use case and decide how much geometry reduction is going to help you.
     
  13. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    712
    There's still a cost for processing vertices, and some of it goes through fixed function hardware still (input assembly, interpolators, etc). That's why some high-end games are doing fancy stuff like occlusion and frustum culling at triangle level using compute shaders, and consoles have features where they can run a position-only variant of the vertex shader for culling before running the full thing.

    On mobile tile-based renderers, geometry is more expensive as well, since it needs to be stored into buffers post transform and binned into the tiles.
     
  14. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    26,915
    I decided to sticky this thread. Thanks @jbooth and @bgolus

    I get where Jason is coming from: the poly count is irrelevant in a scenario where you are already dealing with MicroTriangles. By optimising for the MicroTriangle case, you already have the ideal poly count.
     
    mariandev likes this.
  15. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,806
    I think people are starting to argue semantics which, while having truth in them, are kind of beside the point. Shaving vertices off by culling is not something an artist making a model can do; this post is about LOD generation, and is fueled by simply how many times I go into someone's project and notice it being done completely wrong.

    This is one of the things I kind of hate about optimization discussions: while the answer is always "it depends", the guidelines that 99% of the people doing the actual work of making the art operate under are usually wrong, and the actual answer is far too nuanced for them to absorb (not that they can't, but it's a lot to ask when it's not your area of expertise). And so instead of fixing those guidelines we all start talking about execution cases, niggly things like how to format code to help the compiler use MULADD instructions, instead of the bigger-picture stuff, like why are you fetching memory from everywhere and destroying the cache.

    If you are at the point where tweaking instruction generation is your number one source of performance gains, you're either in a very, very tight inner loop of extremely well optimized code, or you have better places to look, like a more performant architecture. Likewise, using my original example:

    1000 rocks in scene
    900 in last LOD (250 * 900) = 225,000
    30 in second to last LOD (30 * 500) = 15,000
    30 in first LOD (30 * 750) = 22,500
    40 in no LOD (40 * 1000) = 40,000

    Those two mid LODs which consume more memory than the first and last and cause all kinds of ugly popping? They are saving you about 13% of total vertices. Meanwhile, if you're getting bad microtriangle issues, they could be costing you an order of magnitude of your potential performance. Do you want your artists focused on a potential win of 13%, or on avoiding a potential loss to 1/10th the speed? Should I maybe, as a coder, put in an imposter system and save them from having to make any LODs for those models, while also cutting 75% of my vertex count in the process? Probably.

    So you can argue about vertex counts and MULADDs all you want, pull out all the algorithmic optimizations, talk about how great OO is, and get gains in the 10% kind of range from that stuff. I'll be organizing my memory into cache-coherent structures, designing efficient pixel sampling and culling systems for my shaders, and seeing gains in the 300-3000% range. After that point, I might care about MULADDs and shaving off 13% of my vertices, but only once everything else is blazingly fast. Usually you never get to that point, and if your artists are spending a ton of time trying to help you save that 13% of vertices at the cost of twice as much memory and a worse looking game, you're wasting their time.
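    Running the numbers from the list above (my arithmetic, following Jason's counts):

```python
# Vertex budget for the 1000-rock example: {lod: (instances, verts each)}
rocks = {
    "LOD0": (40, 1000),
    "LOD1": (30, 750),
    "LOD2": (30, 500),
    "LOD3": (900, 250),
}
totals = {lod: n * v for lod, (n, v) in rocks.items()}
total = sum(totals.values())
mid = totals["LOD1"] + totals["LOD2"]  # the two mid-range LODs

print(total)                     # 302500 vertices per frame
print(round(100 * mid / total))  # mid LODs are ~12% of the total
```

    The two mid LODs account for roughly the ~13% of vertices mentioned above, while the last LOD does almost all of the actual reduction.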
     
    Last edited: Apr 3, 2020
    mick129, GliderGuy, ftejada and 6 others like this.
  16. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    10,216
    On AMD hardware at least, interpolators are handled in the pixel shader directly, not through fixed function hardware. Has been that way for GCN from the start. I think the input assembly is also handled directly by the vertex shader too. There is still some fixed function hardware for rasterization, depth rejection, and frame buffer blend operations. I think some mobile GPUs might do the blend op in the pixel shader though.

    Removing triangles removes vertices, which gives you more computation for pixels if it can be done early enough, potentially reduces memory bandwidth usage, and removes / reduces overdraw, the latter of which is often the main reason to explore those savings. Those are also games that are often pushing many millions of polygons and letting the GPU do most of the culling of very large single-piece models.

    We're not saying there's no advantage to lowering the poly count, just that the vertex or poly count isn't usually the right metric to focus on.
     
    ron-bohn, neoshaman and jbooth like this.
  17. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    712
    Yeah, sorry, you're right. Two or three LODs are enough for most things, depending on intended usage. An LOD that doesn't reduce the polygon density by at least 50% of the previous one is worthless IMO. It's a shame the Simplygon Unity integration is DOA (Unity's highly laborious LOD setup doesn't help), because in my experience LODs generated with perceptual-based algorithms work for the vast majority of cases.

    (Spent two hours generating, importing, and setting up LODGroups for 30 or so models in Unity today, whereas in UE4 it would be done in under 5 minutes.)

    Optimizing GPU loads nowadays is very "arcane". If you've ever had to dance with one of the console platforms' GPU profilers, or NVidia/AMD/Intel's vendor-specific profilers, you know they never give you a straight answer. Even the time it takes for one draw call doesn't tell the entire story, since it's likely to run in parallel with other calls, so now we have to think in terms of overall workloads.
     
    Runemark likes this.
  18. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,806
    God I love Simplygon. The Unity integration was crap, but I was working on a large open world game at Harmonix years ago for a next-gen console, and the artists had just gone mad with detail and were scrambling to get things under control, but we had demos coming up. So another coder and I broke the world into 64-meter blocks, rendered all the triangles of each model into unique textures with albedo/normal/roughness/etc. in them, and constructed a new mesh with these thousands of little textures on it. The whole thing went through the Simplygon API and came back as one model with one material and one set of textures. We then combined those into 128m blocks, and so on, until we could draw the whole world as a bunch of cubes with very predictable performance. The original art was only shown up close.
     
    Last edited: Apr 15, 2021
    neoshaman and Neto_Kokku like this.
  19. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    712
    When you enable re-meshing and material baking, it's delicious.

    It's a shame about the integration, because all it did was run the command line API behind the scenes. It's mostly a UX problem due to the terrible LODGroup workflow.

    I actually got it to load and work mostly fine in 2019.2 quite recently, but it just imports the LODs into randomly named folders, might as well just use the Simplygon UI.
     
  20. Noogy

    Noogy

    Joined:
    May 18, 2014
    Posts:
    94
    Thanks Jason, I'm in the process of figuring out my LOD-creation workflow and this is exactly what I needed to read.
     
  21. jRocket

    jRocket

    Joined:
    Jul 12, 2012
    Posts:
    569
    What about characters? It is helpful to have multiple LODs for skinned characters, as they tend to need to scale from ultra-high-detail close-ups all the way out to low detail. Plus there is a cost, either GPU or CPU, for skinning, which can be lowered by having fewer verts/bones overall. Furthermore, unique and expensive materials such as eyes or hair can be replaced and baked into the main material at the more distant LODs, saving draw calls.
     
  22. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    10,216
    The complaint here isn’t that you should never use mesh LODs. The complaint is you shouldn’t use them on everything indiscriminately.

    Skinned mesh LODs if you have very complex high detail hero meshes are an excellent reason for using LODs, as long as they’re used within reason.

    If you have character meshes with hundreds of thousands of polygons with detailed facial animation and multiple materials, yeah, absolutely have an LOD or two to drop down to so you’re only using the full detail mesh when it’s needed for cutscene closeups and the like. Drop to a lower polygon model if your face mesh is extremely dense. Have a single mesh / material version with only the minimal facial rigging for far shots. Maybe have special star fish rigs for distant crowds. These are legitimate use cases for LODs.

    Just don’t have 5 LOD levels for your 6k vertex, 20 bone toon characters.
     
    GliderGuy, hippocoder and jbooth like this.
  23. chingwa

    chingwa

    Joined:
    Dec 4, 2009
    Posts:
    3,642
    Well I never liked those LODs anyway... :D
    These days in most cases I just go straight from LOD0 to an imposter. Works well for most static objects, particularly well for trees and foliage. Characters are a different matter of course and I find it still mandatory for LOD geometry there.
     
    stain2319 and zeropointblack like this.
  24. Abended

    Abended

    Joined:
    Oct 9, 2018
    Posts:
    109
    I have been messing around with my LODs at runtime, and I was able to cut my draw calls in half by removing my LOD1 and LOD2s. I have been hemming and hawing for a few days now thinking that I should not attempt to remove my LODs as it goes against good design. I also have a procedural shader running on those meshes, and have been jumping through hoops to get the LODs to play nice. I appreciate those of you that take the time to spell it out for us newbs. Feeling a lot more confident looking into imposters as an alternative! Thanks!
     
  25. Skotrap7

    Skotrap7

    Joined:
    May 24, 2018
    Posts:
    90
    Just wanted to add another thank you to this post. I was struggling with figuring out a good setup for my LOD distances. This post helped a lot. I have most of my objects set with only 2 LODs where LOD0 is the high asset. Generate an imposter for LOD1 transitioning at 20% with animated crossfade and cull around 2%.
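    For anyone tuning those percentages: LODGroup thresholds are relative screen heights, and you can convert one into a rough world-space switch distance with a little trig. A sketch (`fov_deg` is the camera's vertical FOV; this ignores bounding-volume details, so treat it as a planning aid, not an exact Unity formula):

```python
import math

def lod_switch_distance(object_height, screen_fraction, fov_deg=60.0):
    """Distance at which an object of the given world-space height fills
    `screen_fraction` of the screen height, for a vertical FOV in degrees."""
    visible_height_at_1m = 2.0 * math.tan(math.radians(fov_deg) / 2.0)
    return object_height / (visible_height_at_1m * screen_fraction)

# A 2m rock switching to an imposter at 20% screen height:
print(round(lod_switch_distance(2.0, 0.20), 1))  # ~8.7m
# ...and culling at 2%:
print(round(lod_switch_distance(2.0, 0.02), 1))  # ~86.6m
```

    The exact distances depend on object size and FOV, which is why percentage thresholds are the more portable way to express the transitions.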
     
  26. nyanpath

    nyanpath

    Joined:
    Feb 9, 2018
    Posts:
    77
    How about LODs to reduce or even remove overdraw? That's my intended use case for them on certain distant objects.
     
  27. Grimreaper358

    Grimreaper358

    Joined:
    Apr 8, 2013
    Posts:
    743
    Just adding this bit of info for anyone using, or who will be using, HDRP 10.x and forward (currently the 2020.2 beta, or HDRP 11 in the 2021.1 alpha). There's now a Vertex Density debug mode that shows you your mesh density and colors things to let you know what to optimize, basically showing you microtriangles.

    Vertex Density (screenshot)

    Moved back with the same mesh density, so now it's showing, in red, that the mesh is too dense at that distance. (screenshot)
     
  28. argosy_ops

    argosy_ops

    Joined:
    Dec 27, 2012
    Posts:
    49
    Very cool, I hadn't seen this before. Very useful information across this thread. I wasn't aware of the microtriangle issue, but it's very handy information for scene optimization. Actually, a lot of the environment assets I'm getting from the Asset Store have several LODs configured at draw distances that cause tremendous draw call overhead, since they essentially make static batching kind of inefficient. It's cool to have pre-built prefabs to use right away, but taking this into consideration would make asset creators even more valuable, since you wouldn't have to edit every single prefab for optimization in the end :)
     
  29. zeropointblack

    zeropointblack

    Joined:
    Jun 8, 2020
    Posts:
    147
    I love making LODs. I have a great sense of accomplishment when I'm done.
     
    chingwa likes this.
  30. Runemark

    Runemark

    Joined:
    May 23, 2013
    Posts:
    229
    I mostly do environment pieces (ruins, I love ruins) and props. Right now I'm working on a model pack with security cameras, traps and stuff. All my models have less than 5000 tris, and I always do 3 LODs for props (100%, 50% and 5%). If the prop is very small, for example a USB drive, I only do 2 LODs, one for close-ups and one with the minimum vertices possible. In addition, I made a skeletal rig for the security camera so it can be moved with a script to follow the player, for example, but I'm starting to think it is not a very good idea for a prop. Any suggestions on that?

    I would love to learn how to do 3D imposters, because those 2-triangle billboards like the ones for trees creep me the hell out. Anyone have good reading material or a video on the topic?
     
    Ruchir likes this.
  31. jRocket

    jRocket

    Joined:
    Jul 12, 2012
    Posts:
    569
    https://shaderbits.com/blog/octahedral-impostors/

    Impostors have been instrumental for me in making large open environments. I'm not sure if the savings come from vertex count, overdraw, or the microtriangle issue, but using impostors for the last LOD gained me a lot of performance. As usual, Unity is not interested in implementing such useful features as impostors (they don't even have proper foliage painting tools yet), but there are implementations on the Asset Store or GitHub.
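    The core of the octahedral mapping those impostors use is only a few lines. A sketch in Python (the shader versions work on float2/float3, but the math is the same):

```python
import math

def oct_encode(x, y, z):
    """Unit direction -> octahedral UV in [-1, 1]^2."""
    s = abs(x) + abs(y) + abs(z)
    u, v = x / s, y / s
    if z < 0.0:  # fold the lower hemisphere out toward the square's corners
        u, v = ((1.0 - abs(v)) * math.copysign(1.0, u),
                (1.0 - abs(u)) * math.copysign(1.0, v))
    return u, v

def oct_decode(u, v):
    """Octahedral UV -> unit direction."""
    z = 1.0 - abs(u) - abs(v)
    x, y = u, v
    if z < 0.0:  # unfold the lower hemisphere
        x, y = ((1.0 - abs(v)) * math.copysign(1.0, u),
                (1.0 - abs(u)) * math.copysign(1.0, v))
    length = math.sqrt(x * x + y * y + z * z)
    return x / length, y / length, z / length

d = oct_decode(*oct_encode(0.6, 0.0, 0.8))  # round-trips to (0.6, 0.0, 0.8)
```

    Each cell of the impostor atlas stores the object rendered from the direction its UV decodes to; at runtime the shader encodes the view direction and samples the nearest frames.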
     
    Recon03 and JoNax97 like this.
  32. TheSniperFan

    TheSniperFan

    Joined:
    Jul 18, 2013
    Posts:
    696
    Just a heads up for those of you who don't use HDRP:
    You can get the same information (& much more) with RenderDoc, which is free and comes with a Unity integration. It's comparable to Unity's own frame debugger, except much more powerful and it should work with all pipelines in- and outside of the editor.
     
  33. Recon03

    Recon03

    Joined:
    Aug 5, 2013
    Posts:
    586
    I have stated this for too long, and so many Unity users will just say you're wrong... I've been an artist and programmer for decades, and it bugs me how many people just don't listen... Jason, this is a great post that should be plastered somewhere. I get exhausted hearing how many LODs people use, or polygon counts based on 1990s rules... In Unreal 5 we won't need any LODs or normal maps anymore; wait until these people hear that... They will be like, NO!!!! you are wrong!!
     
    Ruchir likes this.
  34. Recon03

    Recon03

    Joined:
    Aug 5, 2013
    Posts:
    586

    That is an understatement. Unity didn't have a decal tool for years... a DECAL tool... which always blew my mind. Nothing for shaders, no tool for placing, nothing... I made one many years ago... and the one they have NOW still sucks, too... So some of these basic tools that most engines have, like impostors and proper procedural foliage tools... we shouldn't need some asset for that, or have to build something ourselves, for this BASIC stuff. In 20 years of using Unreal, I own ZERO assets... in 15+ years of Unity, I made nearly 16 assets and bought probably 300... sad, and about 290 of them are JUNK...

    But to keep it on topic, this should be posted everywhere. It's exhausting telling people this about LODs and polygon counts... every day, it seems...
     
    Ruchir likes this.