
PSA: A general rant on model LODs.

Discussion in 'World Building' started by jbooth, Apr 2, 2020.

  1. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,337
    A public service announcement about modeling mesh LODs.

    You are:
    • Usually making too many LODs
    • Usually introducing pop-in with them
    • Usually wasting tons of memory with them
    • Usually making batching and other optimizations less efficient with them
    • Probably relying on lore about "polygon counts" that dates back to software rendering in 1995

    A modern screen at 1080p has 2 million pixels. Rendering in deferred, you will fill 4 buffers of this size, then resolve that to the main screen. If you're doing post processing, then you can expect several more passes over that data. And that's not including overdraw, transparency, etc. Any reasonably modern game easily processes tens of millions of pixels per frame, and the computations used per pixel are usually much more complex than the ones used for a vertex. In most cases, all a vertex shader does is convert the vertex from object to screen space, and perhaps compute a few things to pass over to the other shader stages.
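    To put rough numbers on the pixel-versus-vertex ratio described above, here is a back-of-envelope sketch. All the specific figures (overdraw factor, number of post passes, scene vertex count) are hypothetical placeholders, not measurements:

```python
# Back-of-envelope sketch (hypothetical numbers) comparing per-frame pixel
# work against per-frame vertex work for a deferred renderer at 1080p.

WIDTH, HEIGHT = 1920, 1080
GBUFFER_TARGETS = 4          # typical deferred G-buffer layout
POST_PASSES = 3              # assumed number of full-screen post passes
OVERDRAW = 1.5               # assumed average overdraw factor

screen_pixels = WIDTH * HEIGHT                       # 2,073,600 pixels
pixel_ops = screen_pixels * (GBUFFER_TARGETS * OVERDRAW + 1 + POST_PASSES)

scene_vertices = 2_000_000   # assumed: a fairly heavy scene

print(f"pixels shaded per frame : {pixel_ops:,.0f}")
print(f"vertices per frame      : {scene_vertices:,}")
print(f"ratio                   : {pixel_ops / scene_vertices:.1f}x")
```

    Even with a deliberately heavy 2-million-vertex scene, the pixel work dominates by an order of magnitude, and per-pixel shading is usually far more expensive per operation than a vertex transform.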

    So, pixels are expensive, vertices are cheap, and "polygon counts" are truly irrelevant since they aren't really a thing anyway.

    Often you'll see a rock with LODs, which goes from 1000 vertices, to 750, to 500, to 250. In this case, the only LOD which will likely help is the last LOD. Why?

    • An extra 750 vertices being transformed is a microscopic amount of work when you're drawing 20 million pixels per frame.
    • If your scene has thousands of rocks, most will be in the last LOD. Very few will be in the middle LODs. So the last LOD is the only one which actually reduces vertex counts by any reasonable amount. Those mid LODs might be saving a few thousand vertices, while the last one is saving you 900 × 750 = 675,000 vertices.
    • Your two mid range LODs are consuming more memory than the original mesh and final LOD.
    • MicroTriangles are the real killer
    What is a MicroTriangle? Well, on most modern hardware, when a triangle gets small enough, it starts to incur a disproportionately large cost. This is because GPUs rasterize in 2x2 pixel chunks, which lets them share texture lookups between those pixels and compute proper mip levels. However, when one of those pixels lands on another triangle, it cannot share that data; the GPU still shades the whole 2x2 group and throws the unused results away. As triangles get smaller on screen, you have more and more edges, and thus more and more wasted work. And when triangles get smaller than a pixel, the whole 2x2 block may get thrown out.
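    A tiny sketch of the 2x2 quad waste described above (the coverage numbers are illustrative, not measured):

```python
# Hypothetical sketch of 2x2 "pixel quad" waste: a triangle covering N pixels
# still shades 4 pixels for every 2x2 quad it touches, so small triangles
# waste proportionally more shading work at their edges.

def quad_shading_cost(covered_pixels: int, quads_touched: int) -> float:
    """Ratio of pixels actually shaded (4 per touched quad) to pixels kept."""
    shaded = quads_touched * 4
    return shaded / covered_pixels

# A big triangle: 10,000 covered pixels over ~2,600 quads -> ~4% overhead.
big = quad_shading_cost(10_000, 2_600)

# A 2-pixel sliver still lights up 2 full quads -> 4x the necessary work.
sliver = quad_shading_cost(2, 2)
```

    The edge quads are a fixed tax per triangle boundary, which is why the waste explodes as triangle area shrinks toward a pixel.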

    Some timings on MicroTriangle throughput can be found here.

    Generally speaking, the critical point is when triangles are about 10x10 pixels in size; below that, the wasted work per pixel grows rapidly. This is all view dependent, of course, so you're always going to have some of these cases, but when your mesh gets small enough that most of its triangles are this small, that is the time to consider LODs.
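    One way to sanity-check a mesh against that ~10x10-pixel threshold is to estimate its average on-screen triangle area. This is a rough illustrative sketch, not an exact coverage calculation:

```python
# Hypothetical sketch: estimate average on-screen triangle size for a mesh,
# to spot when it drops below the ~10x10-pixel (100 px) danger zone.

def avg_pixels_per_triangle(screen_coverage_px: float, triangle_count: int) -> float:
    """screen_coverage_px: rough pixel area the mesh occupies on screen."""
    return screen_coverage_px / triangle_count

# A 1,000-triangle rock filling a 400x400-pixel region: 160 px per triangle,
# comfortably above the ~100 px threshold.
near = avg_pixels_per_triangle(400 * 400, 1000)

# The same rock shrunk to 100x100 pixels: 10 px per triangle, deep into
# microtriangle territory -- time for a lower LOD or an imposter.
far = avg_pixels_per_triangle(100 * 100, 1000)
```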

    A good way to figure out when you need an LOD is to turn on shaded wireframe mode and zoom out until the mesh is mostly wireframe, with little of the shaded object showing through between the lines. That is the distance you should design your LOD for. Anything sooner just introduces extra pops, extra memory for the meshes, etc. And often at that distance, especially for things like rocks, an imposter, which can be a single triangle, is just fine.
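    The wireframe eyeball test above can also be approximated numerically: solve for the camera distance at which the mesh's average triangle falls below a target on-screen area. Everything here (the 60° FOV, the 100 px target, the square-coverage approximation, the 2 m / 1,000-triangle rock) is an illustrative assumption:

```python
import math

# Hypothetical sketch: camera distance at which a mesh's average triangle
# drops below a target on-screen area, as a starting point for placing an
# LOD (or imposter) switch.

def lod_switch_distance(object_size_m, triangle_count,
                        screen_height_px=1080, vfov_deg=60.0,
                        target_px_per_tri=100.0):
    # Projected height in pixels at distance d (pinhole camera model):
    #   h_px = object_size_m * screen_height_px / (2 * d * tan(vfov/2))
    # Approximate on-screen area as h_px^2 (assumes a roughly square
    # silhouette) and solve h_px^2 / triangle_count = target_px_per_tri.
    h_px = math.sqrt(target_px_per_tri * triangle_count)
    k = screen_height_px / (2.0 * math.tan(math.radians(vfov_deg) / 2.0))
    return object_size_m * k / h_px

# A 2 m rock with 1,000 triangles at 1080p: roughly 6 m away, its average
# triangle is already down to ~100 px.
d = lod_switch_distance(2.0, 1000)
```

    Treat the result as a starting guess to refine visually, not a rule; silhouette shape, FOV, and resolution all move the number.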
     
    Last edited: Apr 2, 2020
  2. JoNax97

    JoNax97

    Joined:
    Feb 4, 2016
    Posts:
    323
    Mister, I'm becoming more of a fan of yours by the day.

    Thanks for the insight! :D
     
  3. jamespaterson

    jamespaterson

    Joined:
    Jun 19, 2018
    Posts:
    239
    Very good advice as always, I'm sure. Just to ask: in your view, does this apply across all modern platforms, including e.g. mobile GPUs?
     
  4. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,337
    Yeah, there's some variance between GPUs as to how badly they handle microtriangle cases, but the fundamentals have been the same across all GPUs for a very long time.
     
    Cynicat and jamespaterson like this.
  5. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    8,975
    It applies to almost literally every raster based GPU made in the last 25 years. The PS1 and Saturn are the only ones I know of that it doesn't apply to.

    Basically if the hardware supports mipmapping, it has problems with micro triangles.
     
  6. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,337
    Or maybe "If the hardware supports mipmapping, then it's optimized to handle pixels in 2x2 blocks, which breaks when you get microtriangles". Triangles smaller than a pixel would still be a problem on those other GPUs, just not as much of one, because they would run a lot slower anyway.

    What has happened since then is that the architecture has become more general. I believe at one time some cards had separate processing hardware for vertex transformations and pixel operations (a long time ago, when these were fixed-function stages), so the costs weren't the same. Now, everything is just some CU doing some work, pixel or vertex be damned.

    Either way, it's been the case for almost as long as people have been stressing over "poly counts".
     
    Cynicat, jamespaterson and bgolus like this.
  7. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    8,975
    Roughly speaking, any GPU that supported D3D10 (or GLES 3.0 for mobile) has a "unified shader model", which means the same computational hardware is reused for both vertex and pixel operations. That's 2006 for Nvidia & Intel, 2007 for AMD, and most if not all mobile GPUs from 2012 on, though many older GLES 2.0 GPUs were already using the unified shader model as far back as 2009 or so.

    Fixed-function GPUs that came before that had separate hardware for vertex and pixel operations. This meant that "vertex count" was a real thing, as there was an upper bound on the number of vertices and triangles you could render that was separate from the render resolution. To put it into console terms: since the Xbox 360, "triangle count" hasn't really been a thing that could be definitively measured.
     
  8. Iron-Warrior

    Iron-Warrior

    Joined:
    Nov 3, 2009
    Posts:
    740
    Really interesting thread, I'd definitely recommend posting this far and wide (Graphics forum, maybe on Twitter for #UnityTips).

    This seems like a really succinct summary that I've actually never read before. If a given fragment and vertex shader have the same number of operations, how broadly could we say they incur effectively the same performance hit?
     
  9. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    8,975
    Ooph.

    On some basic level, sure. But there are so many more factors that go into how "expensive" something is on a GPU that "20 instructions in the vertex shader = 20 instructions in the fragment shader" is probably not ever actually true. Vertex shaders generally output a lot more data than fragment shaders, and don't have to deal with the limitation of pixel quads, but do have to deal with the limitation of warps / waves (which fragment shaders also deal with) and with how well laid out the vertex data is for memory-access coherence. Then there's the fixed-function hardware for rasterizing the triangles, which is probably also a factor in why microtriangles are so slow, even ignoring the pixel quads.
     
    Iron-Warrior likes this.
  10. abbaswasim

    abbaswasim

    Joined:
    Apr 3, 2020
    Posts:
    2
    Although this is generally true, on mobile platforms bandwidth is a big factor that argues for using fewer triangles.
    - On unified architectures, cycles you don't spend on unnecessary vertex shading can be used for fragment shading, so every little bit helps
    - Most mobile platforms are tile-based deferred renderers. After vertex shading, all the post-transform state is written out to main memory. That is a real bandwidth hit, so reducing geometry can again help, though agreed, not as much as your G-buffer
    - But then, deferred shading is not recommended on mobile platforms anyway

    I do agree that having the right number of LODs, or just the right amount of geometry to start with, is still good advice.

    edit: Of course the best course of action is to always measure.
     
    Last edited: Apr 3, 2020
  11. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,337
    It's really no different on mobile than on desktop. If you were to shave 10 pixels off your screen size, you'd likely save more work than all of your vertex transformations put together, even in a complex scene. The ratio of screen pixels to vertices is roughly the same, with mobile often having very high-res screens, and the tiled architecture makes no difference here.
     
    hippocoder likes this.
  12. abbaswasim

    abbaswasim

    Joined:
    Apr 3, 2020
    Posts:
    2
    How would you do that? You mean render at a lower resolution and upscale? You could do that, remove redundant render passes, reduce texture resolution, and use compression, but all of the pixels/fragments are generated from triangles. The fewer triangles you start with, the fewer fragments you generate. So don't underestimate how much further you can get by removing the triangles you can. And that's ignoring the bandwidth and other benefits (less to animate, etc.).
    So again, I wish it were that simple. You will really have to measure your specific use case and decide how much geometry reduction is going to help you.
     
  13. KokkuHub

    KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    346
    There's still a cost for processing vertices, and some of it still goes through fixed-function hardware (input assembly, interpolators, etc.). That's why some high-end games are doing fancy stuff like occlusion and frustum culling at the triangle level using compute shaders, and consoles have features where they can run a position-only variant of the vertex shader for culling before running the full thing.

    On mobile tile-based renderers, geometry is more expensive as well, since it needs to be stored into buffers post transform and binned into the tiles.
     
  14. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    26,727
    I decided to sticky this thread. Thanks @jbooth and @bgolus

    I get where Jason is coming from: the poly count is irrelevant given a scenario where you already are dealing with MicroTriangles. By optimising for the MicroTriangle case, you already have the ideal poly count.
     
  15. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,337
    I think people are starting to argue semantics, which, while having truth in them, are kind of beside the point. Shaving vertices off by culling is not something an artist making a model can do; this post is about LOD generation, and it is fueled simply by how many times I go into someone's project and see it done completely wrong.

    This is one of the things I kind of hate about optimization discussions: while the answer is always "it depends", the guidelines that 99% of the people doing the actual work of making the art operate under are usually wrong, and the actual answer is far too nuanced for them (not that they couldn't understand it, but it's a lot to ask when it's not your area of expertise). And so instead of fixing those guidelines, we all start talking about execution details, niggly cases like how to format code to help the compiler emit MULADD instructions, instead of the bigger-picture stuff, like why you are fetching memory from everywhere and destroying the cache.

    If you are at the point where tweaking instruction generation is your number one source of performance gains, you're either in a very, very tight inner loop of extremely well-optimized code, or you have better places to look, like a more performant architecture. Likewise, using my original example:

    1000 rocks in the scene:
    900 in the last LOD (900 × 250 = 225,000 vertices)
    30 in the second-to-last LOD (30 × 500 = 15,000)
    30 in the first LOD (30 × 750 = 22,500)
    40 at full detail, no LOD (40 × 1000 = 40,000)

    Those two mid LODs, which consume more memory than the first and last and cause all kinds of ugly popping? They are saving you about 13% of total vertices. Meanwhile, if you're getting bad microtriangle issues, those could be costing you an order of magnitude of your potential performance. Do you want your artists focused on a potential win of 13%, or on that potential drop to 1/10th the speed? Should I maybe, as a coder, put in an imposter system and save them from having to make any LODs for those models, while also saving 75% of my vertex count in the process? Probably.
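    The arithmetic above can be reproduced directly. The counts come from the post; the reading of the "13%" figure (the two mid LODs' share of all rendered vertices) is my interpretation:

```python
# Reproducing the rock-scene vertex arithmetic from the post.
# Instance counts and per-instance vertex counts are taken from the example.

lods = {  # name: (instance_count, vertices_per_instance)
    "no LOD":   (40, 1000),
    "LOD1":     (30, 750),
    "LOD2":     (30, 500),
    "last LOD": (900, 250),
}

totals = {name: n * v for name, (n, v) in lods.items()}
total_verts = sum(totals.values())              # 302,500 rendered vertices
mid_verts = totals["LOD1"] + totals["LOD2"]     # 37,500 from the mid LODs

print(f"total rendered vertices : {total_verts:,}")
print(f"mid-LOD share           : {mid_verts / total_verts:.0%}")
```

    The mid LODs come out to roughly 12-13% of rendered vertices, while the last LOD alone removes 675,000 of the 1,000,000 vertices the scene would need at full detail.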

    So you can argue about vertex counts and MULADDs all you want, pull out all the algorithmic optimizations, talk about how great OO is, and get gains in the 10% range from that stuff. I'll be organizing my memory into cache-coherent structures, designing efficient pixel-sampling culling systems for my shaders, and seeing gains in the 300-3000% range. After that point, I might care about MULADDs and shaving off 13% of my vertices, but only once everything else is blazingly fast. Usually you never get to that point, and if your artists are spending a ton of time trying to help you save that 13% of vertices at the cost of twice as much memory and a worse-looking game, you're wasting their time.
     
    Last edited: Apr 3, 2020
    ron-bohn, Ofx360, transat and 3 others like this.
  16. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    8,975
    On AMD hardware at least, interpolators are handled in the pixel shader directly, not through fixed function hardware. Has been that way for GCN from the start. I think the input assembly is also handled directly by the vertex shader too. There is still some fixed function hardware for rasterization, depth rejection, and frame buffer blend operations. I think some mobile GPUs might do the blend op in the pixel shader though.

    Removing triangles removes vertices, which gives you more computation time for pixels if it can be done early enough, potentially reduces memory bandwidth usage, and removes or reduces overdraw, the latter of which is often the main reason to explore those savings. Those are also games that are often pushing many millions of polygons and letting the GPU do most of the culling of very large single-piece models.


    We're not saying there's no advantage to lowering the poly count, just that the vertex or poly count isn't usually the right metric to focus on.
     
    ron-bohn, neoshaman and jbooth like this.
  17. KokkuHub

    KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    346
    Yeah, sorry, you're right. Two or three LODs are enough for most things, depending on intended usage. An LOD that doesn't reduce the polygon density by at least 50% versus the previous one is worthless, IMO. It's a shame the Simplygon Unity integration is DOA (Unity's highly laborious LOD setup doesn't help), because in my experience LODs generated with perceptual-based algorithms work for the vast majority of cases.

    (Spent two hours generating, importing, and setting up LODGroups for 30 or so models in Unity today, whereas in UE4 it would be done in under 5 minutes).

    Optimizing GPU loads nowadays is very "arcane". If you've ever had to dance with one of the console platforms' GPU profilers, or NVidia/AMD/Intel's vendor-specific profilers, you know they never give you a straight answer. Even the time it takes for one draw call doesn't tell the entire story, since it's likely to run in parallel with other calls, so now we have to reason in terms of overall workloads.
     
  18. jbooth

    jbooth

    Joined:
    Jan 6, 2014
    Posts:
    4,337
    God, I love Simplygon. The Unity integration was crap, but years ago I was working on a large open-world game at Harmonix for a next-gen console, and the artists had gone mad with detail and were scrambling to get things under control, but we had demos coming up. So another coder and I broke the world into 64-meter blocks, rendered all the triangles of each model into unique textures with albedo/normal/roughness/etc. in them, and constructed a new mesh with these thousands of little textures on it. The whole thing went through the Simplygon API and came back as one model with one material and one set of textures. We then combined 128 m blocks, etc., until we could draw the whole world as a bunch of cubes with very predictable performance. The original art was only shown up close.
     
    neoshaman and KokkuHub like this.
  19. KokkuHub

    KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    346
    When you enable re-meshing and material baking, it's delicious.

    It's a shame about the integration, because all it did was run the command-line API behind the scenes. It's mostly a UX problem, due to the terrible LODGroup workflow.

    I actually got it to load and work mostly fine in 2019.2 quite recently, but it just imports the LODs into randomly named folders, so you might as well use the Simplygon UI directly.
     
  20. Noogy

    Noogy

    Joined:
    May 18, 2014
    Posts:
    77
    Thanks Jason, I'm in the process of figuring out my LOD-creation workflow and this is exactly what I needed to read.
     
  21. jRocket

    jRocket

    Joined:
    Jul 12, 2012
    Posts:
    538
    What about characters? It is helpful to have multiple LODs for skinned characters, as they tend to need to scale from ultra-high-detail close-ups all the way out to low detail. Plus there is a cost, either GPU or CPU, for skinning, which can be lowered by having fewer verts and bones overall. Furthermore, unique and expensive materials such as eyes or hair can be replaced and baked into the main material at lower-detail LODs, saving draw calls.
     
  22. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    8,975
    The complaint here isn’t that you should never use mesh LODs. The complaint is you shouldn’t use them on everything indiscriminately.

    Skinned mesh LODs if you have very complex high detail hero meshes are an excellent reason for using LODs, as long as they’re used within reason.

    If you have character meshes with hundreds of thousands of polygons, detailed facial animation, and multiple materials, then yeah, absolutely have an LOD or two to drop down to, so you're only using the full-detail mesh when it's needed for cutscene close-ups and the like. Drop to a lower-polygon model if your face mesh is extremely dense. Have a single-mesh, single-material version with only minimal facial rigging for far shots. Maybe have special starfish rigs for distant crowds. These are legitimate use cases for LODs.

    Just don’t have 5 LOD levels for your 6k vertex, 20 bone toon characters.
     
    hippocoder and jbooth like this.
  23. chingwa

    chingwa

    Joined:
    Dec 4, 2009
    Posts:
    3,539
    Well I never liked those LODs anyway... :D
    These days, in most cases, I just go straight from LOD0 to an imposter. It works well for most static objects, and particularly well for trees and foliage. Characters are a different matter, of course, and I find LOD geometry still mandatory there.
     