
Potential consequences when using texture atlasing and very large textures on mobile?

Discussion in 'General Graphics' started by hungrybelome, Aug 19, 2019.

  1. hungrybelome

    Joined: Dec 31, 2014
    Posts: 336

    Hi, right now I am using 8k texture atlases, each packing the 1k textures of 64 different meshes. I'm targeting mobile. While this reduces my SetPass call count to 1, I'm wondering what the downsides might be.

    For example, how does mip map selection work when a mesh uses only a small portion of the texture atlas? And how does using such a large source atlas texture affect the GPU's texture caches? I've read that using too large a texture relative to the mesh's size results in a lot of GPU cache misses, but this is an area I don't really understand.
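To make the mip question concrete, here is a minimal Python sketch. It assumes a hypothetical row-major 8x8 grid layout for the atlas (tile 0 at UV (0, 0)) and uses the standard isotropic LOD formula from the graphics API specs; real GPUs may approximate it. The helper names `atlas_uv` and `mip_level` are made up for illustration.

```python
import math

ATLAS_SIZE = 8192                        # 8k atlas, as in the question
TILE_SIZE = 1024                         # each mesh's original 1k texture
TILES_PER_ROW = ATLAS_SIZE // TILE_SIZE  # 8 tiles per row, 64 tiles in total


def atlas_uv(tile_index, u, v):
    """Remap a mesh's own 0..1 UV into its tile (hypothetical row-major layout)."""
    col = tile_index % TILES_PER_ROW
    row = tile_index // TILES_PER_ROW
    scale = TILE_SIZE / ATLAS_SIZE       # each tile spans 1/8 of the atlas UV range
    return (col + u) * scale, (row + v) * scale


def mip_level(dudx, dvdx, dudy, dvdy, tex_size):
    """Isotropic LOD: log2 of the larger screen-space footprint, in texels.

    Inputs are UV derivatives per screen pixel; a square texture is assumed.
    Footprints of 1 texel or less (magnification) clamp to mip 0.
    """
    rho_x = math.hypot(dudx * tex_size, dvdx * tex_size)
    rho_y = math.hypot(dudy * tex_size, dvdy * tex_size)
    return math.log2(max(rho_x, rho_y, 1.0))


# A mesh where one screen pixel spans 1/256 of its own UV range.
d = 1 / 256
standalone = mip_level(d, 0.0, 0.0, d, TILE_SIZE)                # 4 texels per pixel: mip 2
scale = TILE_SIZE / ATLAS_SIZE
atlased = mip_level(d * scale, 0.0, 0.0, d * scale, ATLAS_SIZE)  # also mip 2
```

Because the atlas shrinks each tile's UV range by 8x while the texture itself is 8x larger, the two factors cancel and the selected mip level comes out the same. The sketch does not model filtering across tile borders.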

    Similarly, I've read that using a larger texture with stronger compression is better than a smaller texture with weaker compression. So if the 'appropriate' texture size for a mesh is 1k with ASTC 4x4, I can use a 2k texture with ASTC 8x8 to get the same memory footprint, with supposedly better image quality. But are there any considerable performance hits when doing so? Again, I'm targeting modern mobile devices (late 2016 and newer, OpenGL ES 3.2).
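The memory math in this question can be checked with a short Python sketch. It relies only on the fact that every ASTC block is 128 bits (16 bytes) regardless of its footprint; `astc_size_bytes` is a hypothetical helper, and it covers a single mip level only (a full mip chain adds roughly another third).

```python
import math

ASTC_BLOCK_BYTES = 16  # 128 bits per block, for every ASTC block footprint


def astc_size_bytes(width, height, block_w, block_h):
    """Bytes for one mip level: blocks needed per axis (rounded up) times 16 bytes."""
    blocks_x = math.ceil(width / block_w)
    blocks_y = math.ceil(height / block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BYTES


one_k_4x4 = astc_size_bytes(1024, 1024, 4, 4)     # 256 x 256 blocks
two_k_8x8 = astc_size_bytes(2048, 2048, 8, 8)     # also 256 x 256 blocks
atlas_8k_4x4 = astc_size_bytes(8192, 8192, 4, 4)  # 2048 x 2048 blocks
```

Both `one_k_4x4` and `two_k_8x8` come to 1,048,576 bytes (1 MiB), and the 8k 4x4 atlas is exactly 64 times that, the same as 64 separate 1k 4x4 textures.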

    Thanks!
     
  2. bgolus

    Joined: Dec 7, 2012
    Posts: 12,352

    For a single scene using one 8k texture vs. 64 1k textures, where all of the textures are in use at the same time, there’s not a huge difference.

    Multiple 1k textures allow more granular loading and unloading via manual asset management. With texture mip streaming, it’s also possible for the GPU to use less memory. But if all textures are in view and you’re Android focused, neither of those matters much.

    GPUs only load the parts of a texture that are actually needed. For an ASTC texture, that would be the block(s) needed to do bilinear sampling. For example, a 4x4 ASTC texture has blocks that are 4x4 texels in size. To do bilinear sampling from a position inside one block, only that single block is needed. If you’re sampling from a position on an edge or corner of a block, you may need 2 or 4 blocks. A 4x4 ASTC texture uses 8 bits per texel, or 128 bits per block. For trilinear sampling you need at least one block from each of two different mip levels*, or up to 8 blocks, which is 128 bytes. Those same 8 blocks (128 bytes) of data can be effectively shared across multiple pixels if they all sample from a similar area. When first rendering an object, or whenever new blocks are needed, they have to be fetched from RAM. That’s the memory bandwidth being used.
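A minimal Python sketch of the block counting described above, assuming a 1k texture with clamp-to-edge addressing and the usual convention that texel centers sit at integer + 0.5. The function name `bilinear_blocks` is hypothetical; it only reports which blocks a 2x2 bilinear footprint touches, not how the hardware actually schedules fetches.

```python
import math


def bilinear_blocks(x, y, block_w=4, block_h=4, width=1024, height=1024):
    """Set of (block_x, block_y) coordinates touched by one bilinear sample.

    (x, y) is in texel units. The 2x2 footprint starts at floor(coord - 0.5)
    because texel centers sit at integer + 0.5. Clamp-to-edge is assumed.
    """
    x0 = math.floor(x - 0.5)
    y0 = math.floor(y - 0.5)
    blocks = set()
    for tx in (x0, x0 + 1):
        for ty in (y0, y0 + 1):
            cx = min(max(tx, 0), width - 1)   # clamp to the texture edge
            cy = min(max(ty, 0), height - 1)
            blocks.add((cx // block_w, cy // block_h))
    return blocks


inside = bilinear_blocks(2.0, 2.0)   # footprint texels 1..2: one 4x4 block
edge = bilinear_blocks(4.0, 2.0)     # straddles a vertical block edge: two blocks
corner = bilinear_blocks(4.0, 4.0)   # straddles a block corner: four blocks
```

A trilinear sample repeats this on two mip levels, so at most 8 blocks, or 8 x 16 = 128 bytes. With `block_w=8, block_h=8`, the same corner sample at (4.0, 4.0) stays inside a single block, which illustrates why a larger footprint crosses block boundaries less often.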

    There are some additional details: for example, reading multiple blocks of data that are stored consecutively in RAM is faster than reading blocks that are scattered. Texture data is stored in GPU RAM in Morton order, which tries to keep parts of the image that are spatially close also close together in linear memory, increasing the chance that the data needed will be consecutive.
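The Morton (Z-order) index mentioned above interleaves the bits of the x and y coordinates. Here is a minimal Python sketch applied at block granularity; `part1by1`, `morton_index`, and `row_major_index` are illustrative names, and real GPUs use vendor-specific tiled or swizzled layouts that may only resemble this textbook scheme.

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_index(x, y):
    """Z-order index: x bits in even positions, y bits in odd positions."""
    return part1by1(x) | (part1by1(y) << 1)


def row_major_index(x, y, width):
    """Plain scanline order, for comparison."""
    return y * width + x


# A 1k ASTC 4x4 texture is a 256 x 256 grid of blocks.
BLOCKS_PER_ROW = 256
# Distance in memory between block (0, 0) and its vertical neighbor (0, 1):
row_major_gap = row_major_index(0, 1, BLOCKS_PER_ROW) - row_major_index(0, 0, BLOCKS_PER_ROW)
morton_gap = morton_index(0, 1) - morton_index(0, 0)
```

In row-major order the vertical neighbor is 256 blocks (4 KiB) away, while in Morton order it is 2 blocks away, and an aligned 2x2 group of blocks (like a bilinear corner fetch) maps to indices 0 to 3, one contiguous 64-byte run.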

    The end result is that one 8k texture vs. 64 1k textures likely isn’t any different in use.

    * Some GPUs optimize this by sampling from only a single mip level over a wider area, and only blending between two levels near the transition.
     
  3. hungrybelome

    Hi @bgolus, thanks for the amazing write-up as always!

    So if I understand correctly, you are saying that from the GPU's perspective, the performance and mip behavior are about the same for one 8k atlas as for 64 1k textures. So there aren’t any downsides, aside from not being able to unload unused texture data like I could with 64 1k textures. But from the CPU's perspective, the atlas is very different in use, since I get the performance benefit of having only 1 SetPass call, which means only needing to update the shader once for all of the meshes. Is this correct?

    Super interesting about the ASTC info. Does that mean that an 8x8 ASTC block needs 512 bits? So the memory bandwidth cost of fetching one 8x8 block is 4x greater than for a 4x4 block? But in return, the greater area of the 8x8 block means it gets reused by more pixels, with less chance of needing to load adjacent blocks for bilinear filtering? Or on second thought, maybe the 8x8 ASTC block still only needs 128 bits like the 4x4 block? Which is why a 2k 8x8 texture uses the same memory as a 1k 4x4 texture, despite having 4x the pixels?

    Also, prior to your response I had assumed that there were 2 loading steps for sampling a texture: first loading/locating the whole texture, and then actually deciding where to sample from it. But from what I understand now, all of the textures loaded in RAM live in a kind of ‘flat’ structure, so there is no extra loading step per texture, and the cost of sampling from different textures is about the same (aside from the consecutively placed blocks you mentioned)? And the GPU’s ‘cache for textures’ holds whichever blocks are currently loaded, and acts like a CPU L1 cache?

    Sorry for so many questions, just trying to piece together a mental model. Thanks again for all your help!
     
  4. bgolus

    Your second guess is right: ASTC blocks are always 128 bits. Those bits might represent anything from a 4x4 single-channel block to a 12x12 HDR RGBA block, with a lot of options in between. The block's texel footprint is fixed for an entire image, but each block can choose a different color format. Most commonly this means a texture with some transparency will use RGB blocks in areas with no transparency, and RGBA or RGB+A blocks in areas with transparency. Either way, they're 128 bits. More channels or more texels per block means an overall lower-quality representation of the image for that block.
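Since every block is 128 bits, the bit rate per texel follows directly from the block footprint. A minimal Python sketch over the 2D block footprints the ASTC format defines (`bits_per_texel` is an illustrative helper, not part of any API):

```python
ASTC_BLOCK_BITS = 128  # fixed for every ASTC block, whatever it encodes

# The 2D block footprints (width x height in texels) defined by ASTC.
FOOTPRINTS = [(4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
              (8, 8), (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12)]


def bits_per_texel(block_w, block_h):
    """Average bits per texel: the fixed 128 bits spread over the footprint."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


if __name__ == "__main__":
    for w, h in FOOTPRINTS:
        print(f"{w}x{h}: {bits_per_texel(w, h):.2f} bits per texel")
```

4x4 works out to 8 bits per texel and 8x8 to 2, so a 2k 8x8 texture has 4x the texels of a 1k 4x4 texture at a quarter of the bit rate, for the same total size. Fetching one 8x8 block also costs the same 128 bits of bandwidth as fetching one 4x4 block.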

    It's exactly like an L1 cache, because it is an L1 cache. It's just an L1 cache for the TMU (Texture Mapping Unit), the physical bit of hardware on the GPU that decodes texture data and outputs the color values the shader gets when calling tex2D().
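To build intuition for that cache, here is a toy Python model, not a description of any real GPU: it assumes an LRU policy, a made-up capacity counted in whole ASTC blocks, point sampling, and a screen tile mapped 1:1 onto texels and shaded row by row. Real texture caches are vendor-specific and organized differently. `BlockCache` and `shade_tile` are hypothetical names.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU cache holding whole compressed blocks (capacity is an assumption)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()
        self.misses = 0

    def fetch(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)      # hit: mark as most recently used
            return
        self.misses += 1                        # miss: block is read from RAM
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used block


def shade_tile(pixels, block_w, block_h, capacity_blocks=64):
    """Point-sample a pixels x pixels tile mapped 1:1 onto texels; return block misses."""
    cache = BlockCache(capacity_blocks)
    for y in range(pixels):
        for x in range(pixels):
            cache.fetch((x // block_w, y // block_h))
    return cache.misses


misses_4x4 = shade_tile(64, 4, 4)
misses_8x8 = shade_tile(64, 8, 8)
thrashing = shade_tile(64, 4, 4, capacity_blocks=8)
```

With room for 64 blocks, each block is fetched exactly once: 256 misses (4 KiB) for 4x4 blocks versus 64 misses (1 KiB) for 8x8 blocks, since every miss costs the same 16 bytes. Shrinking the toy cache to 8 blocks, fewer than the 16 blocks in one row, makes every block get re-fetched on each of the 4 pixel rows that use it (1,024 misses), the kind of cache thrashing the original question asked about.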
     
  5. hungrybelome

    Thanks a ton @bgolus! Your answers have cleared up a lot!