Do Crunched Textures lower performance?

Discussion in 'General Graphics' started by Horus_Sungod42, May 2, 2019.

  1. Horus_Sungod42

    Horus_Sungod42

    Joined:
    Oct 30, 2014
    Posts:
    99
    Hello,

I'm running tests in a new project with 16 8k textures (each with alpha) to try out crunched textures (at quality 22). Then I make builds.

    - The crunched build is 4x smaller on disk, which is great.
    - However, loading my test scene goes from under 1 second to a much longer 4-5 seconds. Oh no!

    I've read that loading should be the same speed, or maybe a bit faster, when using crunched textures (since the data is already in a GPU-friendly format).

    Is this a known problem? I'm testing this on a PC with a 950 nvidia card, and the same result happened with a 1080.

    Thank you
     
  2. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    It's supposed to be faster, but there are some big assumptions behind that. The Crunch texture format is not actually a GPU friendly format. Rather, it is a modified lz4 compression format with a decompressor that writes directly out to a GPU friendly format.

    GPU friendly formats like DXT1, DXT5, and ETC aren't themselves all that compressed compared to similar PNG or JPG files, due to their requirement of being constant rate formats. When the GPU needs to read a single pixel from a texture, it needs to know exactly what region of the texture data to read to get that pixel. PNG, JPG, GIF, really any other compressed format you might be familiar with, use variable rate compression, meaning the content determines the size of the file. In that case there's no way to know a specific pixel's color without decompressing a large portion of the image, potentially the entire image in some formats.
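    The constant-rate property can be sketched with a bit of arithmetic (an illustrative helper, not a Unity or graphics API): because DXT1 packs every 4x4 pixel block into exactly 8 bytes, locating any pixel's data is pure math, with no decompression of the rest of the image.

    ```python
    # Sketch of constant-rate addressing: DXT1 stores each 4x4 pixel
    # block in exactly 8 bytes, so the byte offset of the block holding
    # any pixel is simple arithmetic. (Hypothetical helper for illustration.)

    BYTES_PER_DXT1_BLOCK = 8  # two 16-bit endpoint colors + 16 two-bit indices

    def dxt1_block_offset(x, y, width):
        """Byte offset of the 8-byte DXT1 block containing pixel (x, y)."""
        blocks_per_row = width // 4
        block_x, block_y = x // 4, y // 4
        return (block_y * blocks_per_row + block_x) * BYTES_PER_DXT1_BLOCK

    # Pixel (100, 37) in a 2048-wide texture:
    # block (25, 9) -> 9 * 512 + 25 = 4633 blocks in -> 37064 bytes
    print(dxt1_block_offset(100, 37, 2048))  # 37064
    ```

    A variable-rate format like PNG or JPG has no such closed-form lookup, which is why GPUs can't sample from them directly.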

    GPU friendly formats also don't compress much further when using zip, rar, lz4, or other general purpose compression techniques. So taking a DXT5 texture and putting it into a .rar by itself may not produce a file that's significantly smaller; it may even produce one that's slightly larger due to the rar file's overhead. Packing a bunch of textures into a single archive and compressing that may yield a ~15% smaller file, and a lot of that is probably coming from the Unity data packaged alongside the textures being highly compressible.

    What Crunch does is take a common GPU friendly compression format, modify the data a bit (sometimes by reducing the image quality) to make it more compressible, and then use lz4 compression on that. The lz4 algorithm itself doesn't compress all that well compared to some other options, but this can still produce files similar in size to a decent quality .jpg, and it decompresses really, really fast. The decompressor is also modified so that the file that comes out is in the GPU friendly format, with no further modifications needed.


    A 2048x2048 DXT1 with mip maps will always produce a ~2.7 MB image. The .png of that same image might be anywhere between a few KB and >12 MB depending on the content, but let's say it's ~2 MB. That's not much smaller, but it's also completely lossless, unlike the DXT1. The .jpg is some amount smaller than that, again depending on the content and the quality selected, but let's say it's ~0.4 MB at a relatively high quality setting. That's 1/5th the PNG, and while the image quality is reduced, it's overall about on par with the DXT1 image (each has its strengths), an image that's >6.5 times larger. The crunched file is also basically the same ~0.4 MB as the JPG, give or take a few hundred KB depending on the content and quality setting. Crunch supposedly has a better than average DXTC compressor, so at high quality settings it may actually produce better quality images than the default DXT1 compression, but as you slide that quality setting down it'll get worse and worse, just like a JPG.
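    A quick back-of-the-envelope check of the sizes quoted above (assuming DXT1's 4 bits per pixel and ignoring block-padding subtleties at the tiniest mip levels):

    ```python
    # Approximate DXT1 size including the full mip chain.
    # DXT1 is 4 bits per pixel (8 bytes per 4x4 block); each mip level
    # below the base adds a quarter of the previous level's pixels.

    def dxt1_size_with_mips(width, height):
        """Approximate size in bytes of a DXT1 texture with all mips."""
        total = 0
        while width >= 1 and height >= 1:
            # dimensions are rounded up to a whole 4x4 block
            total += max(width, 4) * max(height, 4) // 2
            width //= 2
            height //= 2
        return total

    size = dxt1_size_with_mips(2048, 2048)
    print(size / 2**20)  # ~2.67 -- the "~2.7 MB" figure above
    ```

    The mip chain is what pushes the 2 MB base level up to roughly 2 MB x 4/3.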


    So why not use a JPG to begin with? Because JPGs are really slow to decode. Decoding speed wasn't really a major factor considered when JPG was created; it just needed to be fast enough to decode the image as it was delivered to a computer over a 14.4k modem. Sure, computers (and internet connections) are a lot faster now, but data access from hard drives or SSDs is way faster than that. So modern CPUs take longer to decode a JPG than it takes to read the data from the disk. Plus, if you want to use that image on a GPU, you probably want to turn it into a DXT1, which takes some time too, even using real-time compression. This is still (potentially) going to be faster than reading an uncompressed BMP or TGA image from the disk, but the win over a pre-compressed DXT1 is less obvious.


    The big idea behind Crunch, and why that lz4 compression is important, is that it only has to read the same amount of data as the JPG, but decompressing it is likely much faster, and there's no additional recompression step. So the overall time to get an image from the disk to the GPU is lower than using a JPG, while having similar on-disk file size benefits.

    It's also supposed to be faster to load and decompress a crunched image than to load the original DXT1 image, but that makes a lot of assumptions about file read times and CPU performance. On a slower CPU paired with a decently fast 7200 RPM HDD or an SSD, the DXT1 may still be faster.

    edit: Note, this would have to be a really slow CPU for crunch to load slower. See my next post.
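    That trade-off can be sketched as a toy timing model (all rates below are made-up assumptions for illustration, not measurements):

    ```python
    # Toy model: crunched wins when the time saved reading fewer bytes
    # exceeds the extra CPU time spent decompressing them.
    # All rates are assumed example figures, not benchmarks.

    def load_time_s(bytes_on_disk, disk_mb_s, decompress_mb_s=None):
        """Seconds to read a file, plus optional CPU decompression time."""
        t = bytes_on_disk / (disk_mb_s * 2**20)
        if decompress_mb_s:
            t += bytes_on_disk / (decompress_mb_s * 2**20)
        return t

    dxt1 = 2.7 * 2**20       # plain DXT1, read straight in
    crunched = 0.4 * 2**20   # crunched file, must also be decompressed
    hdd = 100                # assumed MB/s for a 7200 RPM HDD

    print(load_time_s(dxt1, hdd))            # raw read only
    print(load_time_s(crunched, hdd, 2000))  # smaller read + fast decompress
    ```

    With these example numbers the crunched texture wins easily; only a very slow decompressor (i.e. a very slow CPU) would tip the balance the other way, which matches the edit above.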
     
    Last edited: May 7, 2019
    Knottt14, Mr-Fierce, jdtec and 20 others like this.
  3. CortiWins

    CortiWins

    Joined:
    Sep 24, 2018
    Posts:
    150
    @bgolus Just stopped by to say that I appreciate it when people take the time to explain things like you did.
     
  4. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    I was trying to think through the theoretical numbers here. Each 8k DXT5 is going to be 64 MB without mips, ~85 MB with. 16 8k textures is just over 1.3 GB(!) of data. Loading that much data from even a fast HDD shouldn't take less than 12 seconds under the most ideal conditions. Even a really fast SATA SSD would take 3 seconds. It makes no sense to be able to load all those images in under a second. So I wonder if Unity, in the "normal" DXT5 case, is only loading the smaller mips to start with and streaming in the higher mips as needed after the fact.

    Crunch can't do that, possibly because it's a compressed format with all the mips stored together, so it has to decompress the entire "image" to get at the mips. Or Unity just didn't add support for streaming crunched textures. So it's not a fair fight. On the other hand, it shows what Crunch can do: it loaded 1.3 GB of texture data in 4-5 seconds off a HDD. That's probably ~4.5 seconds to read the ~340 MB from the disk, and less than half a second expanding that to 1.3 GB.
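    The arithmetic in the post above, spelled out (DXT5 is 1 byte per pixel; a full mip chain adds roughly a third on top):

    ```python
    # Checking the figures above: one 8k DXT5, its mip chain, and the
    # total for 16 such textures. Pure arithmetic, no assumptions beyond
    # DXT5's fixed rate of 1 byte per pixel.

    base = 8192 * 8192         # bytes for one 8k DXT5, no mips
    with_mips = base * 4 // 3  # mip chain adds ~1/3
    total = 16 * with_mips     # all 16 textures

    print(base / 2**20)        # 64.0  (MB per texture, no mips)
    print(with_mips / 2**20)   # ~85.3 (MB per texture, with mips)
    print(total / 2**30)       # ~1.33 (GB for all 16)
    ```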
     
    Last edited: May 3, 2019
    DimHoly likes this.
  5. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,028
    afaik really really fast SSDs can read up to ~2 Gb/s (a single SSD, no RAID0 stuff), so it's possible to have it under a second.
     
  6. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    2 Gb/s would actually be a fairly slow SATA SSD speed. 2 Gb/s (Gibibits per second) is equivalent to 256 MB/s (Mebibytes per second). That's the kind of speeds you'd see out of the super cheap Toshiba or SanDisk SSDs from 4 years ago. That would be just over 5 seconds to load 1.3 GB (Gibibytes) of data.

    (Note: Gibi and Mebi are the proper terms for what most people casually call Giga and Mega. A Mebibyte is 1,048,576 bytes (1024 x 1024), whereas a Megabyte is 1,000,000 bytes. This is part of why your "1 TB" hard drive shows as "~930 GB" in Windows: the box is counting in Terabytes, or 1000 Gigabytes, while Windows is counting in Gibibytes.)

    But I'm guessing you meant 2 GB/s. Something like a Samsung 970 Pro can hit almost 3.5 GB/s in synthetics, which translates to around 2 GB/s for some real world cases. Most NVMe SSDs will do over 1 GB/s, so yeah. I guess most modern NVMe SSDs can load that in ~1 second. But for a PC with a Nvidia GTX 950, I'm guessing it's unlikely to have an NVMe or PCIe SSD. ;)
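    The unit conversions above, spelled out (pure arithmetic using only the figures quoted in the thread):

    ```python
    # Decimal (SI) vs binary (IEC) prefixes, and bits vs bytes.

    GIBI = 2**30
    MEBI = 2**20

    # 2 gibibits per second -> mebibytes per second
    rate_mib_s = 2 * GIBI / 8 / MEBI
    print(rate_mib_s)  # 256.0

    # Time to load the thread's 1.3 GiB of texture data at that rate
    print(1.3 * GIBI / (rate_mib_s * MEBI))  # 5.2 seconds

    # Why a "1 TB" drive shows as ~931 GB: the label counts 10^12 bytes,
    # the OS divides by 2^30
    print(1e12 / GIBI)  # ~931.3
    ```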
     
    Horus_Sungod42 likes this.
  7. Horus_Sungod42

    Horus_Sungod42

    Joined:
    Oct 30, 2014
    Posts:
    99
    Haven't had the time to read through this, but man, you're like a superhero!

    I'll get to it soon
     
  8. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,028
    Correct :)
    Fair enough, but the initial post said "the same result happened with a 1080"
     
    bgolus likes this.
  9. Horus_Sungod42

    Horus_Sungod42

    Joined:
    Oct 30, 2014
    Posts:
    99
    Very interesting. I'll make a version where all the mips are disabled, and bring the default spheres with the textures very close to the camera (to be sure).

    I'll write down the results here for others to benefit from the knowledge.
     
  10. Horus_Sungod42

    Horus_Sungod42

    Joined:
    Oct 30, 2014
    Posts:
    99
    Made some tests: without mip maps, the scene with crunched textures loads at the same speed as the one with ordinary texture compression.

    Re-enabling the mip maps (all settings at their defaults) brings the loading time back up from 1 second to 4 seconds.

    This was tested with a bunch of spheres with different materials/textures, closer to the camera.
     
  11. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    Could it be that disk load times aren't your bottleneck, and it's the GPU sync time? That doesn't make a ton of sense either, though, since the connection between the CPU and GPU is pretty fast, and increasing the data by 33% shouldn't increase the load time by 4x.

    I'm still going with my guess that it's something to do with how mip maps are handled between crunched and uncrunched textures. Either way, I'd say report a bug.
     
    R0man and aleksandrk like this.
  12. Horus_Sungod42

    Horus_Sungod42

    Joined:
    Oct 30, 2014
    Posts:
    99
    Good idea. When the work dies down a bit, I'll file a bug through Unity's bug reporter.
     
    R0man likes this.