

[CFH] Shader variants memory footprint

Discussion in 'Shaders' started by Altair4Ru, Feb 25, 2020.

  1. Altair4Ru
    Joined: Aug 21, 2018 · Posts: 2
    Hi there.

    I've been struggling for a while with our shaders taking up too much memory on iOS (though not exclusively on iOS).
    At some point I realized that I lack the instrumentation to see what is really going on. So I want to summarize everything I know and ask for help filling in the blind spots.

    We use custom "uber" shaders that have lots of variants, and collecting those variants is a tough task. Unity can automatically collect the variants used while playing the game in the Editor, but for us this method costs several man-hours (or even days) per build. We have shader LODs with different sets of variants, dynamic gameplay-specific variants and so on. The game has to be replayed from start to finish several times with different quality settings to collect all the possible variants, and some of them could still be missed...

    As an alternative, I gather all of the variants that can possibly be used from the build itself, using the IPreprocessShaders interface. I build the game without any ShaderVariantCollections (SVCs), collect those variants, generate an SVC from them, add it to the build (specifically, into an asset bundle) and run the build process once more. Doubling the build time is the price for getting rid of possible shader duplicates across asset bundles. In the end I get shader assets containing all the variants used throughout the game. This set is indeed much smaller than the set of all possible variants, but it is still somewhat bigger than I would have expected. If I understand correctly, Unity is a bit greedy here and generates some strange variants to cover every possible case.
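    A rough model of the collection step, for illustration only. In the editor the data comes from the C# callback IPreprocessShaders.OnProcessShader(Shader shader, ShaderSnippetData snippet, IList<ShaderCompilerData> data), which as far as I know is invoked per snippet (roughly per pass and shader stage), so the same keyword set is reported more than once. The Python sketch below only models the deduplication; VariantRecord and build_collection are hypothetical names, not Unity API, and each resulting (pass type, keywords) entry stands for what would become a ShaderVariantCollection.ShaderVariant in the second build.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class VariantRecord(NamedTuple):
    """One logged variant (hypothetical model of a preprocess callback entry)."""
    shader: str
    pass_type: str
    keywords: Tuple[str, ...]


Entry = Tuple[str, Tuple[str, ...]]  # (pass type, sorted keywords)


def build_collection(records: Iterable[VariantRecord]) -> Dict[str, List[Entry]]:
    """Deduplicate logged variants per shader.

    Keyword order does not make a different variant, so keywords are
    normalized to a sorted tuple. Identical entries reported for several
    shader stages of the same pass collapse into a single entry.
    """
    seen: Dict[str, Set[Entry]] = defaultdict(set)
    for rec in records:
        seen[rec.shader].add((rec.pass_type, tuple(sorted(set(rec.keywords)))))
    # Deterministic output: shaders by name, entries sorted within each shader.
    return {shader: sorted(entries) for shader, entries in sorted(seen.items())}


if __name__ == "__main__":
    logged = [
        VariantRecord("Uber", "ForwardBase", ("FOG_ON", "LOD_HIGH")),  # vertex stage
        VariantRecord("Uber", "ForwardBase", ("LOD_HIGH", "FOG_ON")),  # fragment stage
        VariantRecord("Uber", "ShadowCaster", ()),
    ]
    print(build_collection(logged))
```

    Normalizing the keyword order is what lets the two ForwardBase records above collapse into one entry; keeping per-stage information is deliberately left out of this sketch.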

    Let's assume I've gathered all the required variants and put them into an SVC. If I then load any shader from this SVC, Unity loads its internal representation of that shader asset into system memory using malloc. This memory ends up dirty and adds to the application's memory footprint from iOS's point of view. The internal representation is the full representation of the shader asset, meaning all of the variants with all their properties and so on. This memory is what shows up under Unity Profiler/Memory/Detailed view/Other/Rendering/ShaderLab, isn't it?
    When I load a gameplay scene, ShaderLab memory consumption jumps to 160 MB. That is huge for a mobile project. That particular project doesn't have many asset bundles, so duplication is not really a problem, but I still expected to save a couple of megabytes by gathering all the shader assets into one bundle. In reality I see the opposite effect: the same scene loaded after the gathering takes up to 256 MB of ShaderLab memory. I assume that is because all of the variants collected from the build are loaded into memory simultaneously, in Unity's internal representation.

    The documentation says (I can't remember exactly where) that this internal representation is required to compile specific variants on the fly when the pipeline needs them. Once a variant is compiled, its source data is discarded and the compiled program stays in GPU memory for the app's lifetime.
    Knowing that, I tried warming up the SVC. This took an enormous amount of time, but in the end I got only 3.5 MB of ShaderLab memory and around 50 MB taken by shader assets under Unity Profiler/Memory/Detailed view/Assets/Shader. From my point of view that's a lot better than 256 MB of ShaderLab memory, but it is not shippable because of how long the WarmUp process takes. I can't make a player wait another 15 minutes before playing.
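    Putting the memory figures from this post side by side makes the trade-off explicit. A small Python sketch that only restates the quoted numbers for one gameplay scene; it assumes "around 50 MB" is exactly 50 MB, and since the Assets/Shader share for the two non-warmed cases was not quoted, it is left out of those totals.

```python
# Memory figures (MB) quoted in the post for one gameplay scene.
# Assumption: "around 50 Megs" under Assets/Shader is taken as 50.0.
# Only ShaderLab was quoted for the first two cases, so their
# Assets/Shader share is unknown and not included.
SCENARIOS = {
    "separate bundles": {"ShaderLab": 160.0},
    "gathered bundle": {"ShaderLab": 256.0},
    "gathered bundle + WarmUp": {"ShaderLab": 3.5, "Assets/Shader": 50.0},
}


def total_mb(name: str) -> float:
    """Sum of the quoted memory categories for one scenario."""
    return sum(SCENARIOS[name].values())


def main() -> None:
    before = total_mb("separate bundles")
    gathered = total_mb("gathered bundle")
    warmed = total_mb("gathered bundle + WarmUp")
    growth = gathered - before
    print(f"gathering added {growth:.1f} MB ({growth / before:.0%})")
    print(f"warm-up total {warmed:.1f} MB, {gathered - warmed:.1f} MB below {gathered:.1f} MB")


if __name__ == "__main__":
    main()
```

    Gathering adds 96 MB (60%) over the per-bundle layout, while the warmed-up state is 202.5 MB below the gathered one; the catch, as described above, is the warm-up time.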

    I think this time consumption should be treated as a bug, because 80% of the CPU time is spent in Shader::SRPBatcherInfoSetup(), even though we use neither the SRP Batcher nor an SRP at all.

    And finally, here are the questions:
    1. What can we or Unity do to reduce ShaderLab memory (other than minimizing the variant count)?
    2. If I strip certain graphics tiers from the build, would that only affect the build size? In other words, are the variants for all graphics tiers loaded into memory at runtime, or only those for the current tier? What happens if the tier is switched at runtime?
    3. Isn't there a better way to handle shader loading? For example, dump all of that internal representation into a file and mmap it, so that it becomes clean memory?
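    The general mechanism behind question 3 can be illustrated outside Unity: data written once to a file and then mapped read-only is file-backed, so the OS can drop those pages and reload them from disk instead of treating them as dirty memory. The Python sketch below only demonstrates that generic mmap pattern; the file name and blob contents are hypothetical, and it says nothing about whether Unity could do this for ShaderLab data.

```python
import mmap
import os
import tempfile


def dump_blob(data: bytes, path: str) -> None:
    """Write a serialized blob to disk once (e.g. at build or install time)."""
    with open(path, "wb") as f:
        f.write(data)


def map_blob_readonly(path: str) -> mmap.mmap:
    """Map the whole file read-only.

    Pages are file-backed and loaded lazily on first access; because they
    are never written, they stay clean and can be evicted by the OS.
    """
    with open(path, "rb") as f:
        # The mmap object keeps its own handle, so closing the file is fine.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shader_blob.bin")  # hypothetical file name
        dump_blob(b"variant-data-0123", path)
        blob = map_blob_readonly(path)
        print(len(blob), blob[:7])
        blob.close()
```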

    If some of my inferences here are wrong, please correct me. Let's make this topic a vault of useful information.

    Thank you for your patience with this long read.

    P.S. In case it matters, we're on Unity 2019.2.8 and 2019.2.19 (two projects on different versions). No specific packages in use. No SRP; built-in render pipeline only.
     
    Last edited: Feb 26, 2020
  2. richardkettlewell (Unity Technologies)
    Joined: Sep 9, 2015 · Posts: 1,501
    Hey, can you submit a bug report for this? I agree we really need to improve it. Feel free to reference this forum post in the bug report (and replying here with the case number will help too).

    Thanks!
     
  3. YuriGrachev
    Joined: Jul 12, 2016 · Posts: 4
    Hi @richardkettlewell,

    I've prepared a repro and submitted a bug (Case 1223610).

    Please take a look.

    P.S. My initial post was mistakenly made from a personal account instead of the corporate one.
     
    richardkettlewell likes this.
  4. YuriGrachev
    Joined: Jul 12, 2016 · Posts: 4
    While investigating further, I think I've noticed another, lower-level issue.

    Each variant is compiled at runtime for the exact platform/graphics API. For Metal, Unity uses newLibraryWithSource:options:error: from the MTLDevice protocol. This creates an object that conforms to the MTLLibrary protocol. I don't know the exact type of that object, but I'm sure it is allocated in dirty memory. It is also not tracked by Unity: it is counted neither in GfxMemory nor in any other section of the Memory Profiler.

    Instruments has a template named Metal System Trace that includes Metal Shader Compiler activity among its tracks. That instrument shows that my test project creates a lot of MTLLibrary objects, along with a comparable number of shader compilations.

    Is there a reason for creating MTLLibrary objects so aggressively? AFAIK, each compiled shader is represented as an MTLFunction, and a single MTLLibrary can hold many MTLFunctions at once.

    Also, there's a noticeable pattern in the time spent on MTLLibrary creation when a large number of shaders are compiled in a row: the creations seem to be synced to the frame rate and/or vsync (see the screenshot).
    [Attached screenshot: Screenshot 2020-03-20 at 14.22.02.png]
    If Unity switched to using a single library, we could skip those slow, memory-consuming repeated library creations and gain both memory and performance.
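    To illustrate the single-library idea: instead of one newLibraryWithSource:options:error: call per variant, the Metal source of a batch of variants could be concatenated, compiled into one MTLLibrary, and each variant then fetched from it with newFunctionWithName:. The Python sketch below only builds the combined source string and counts library creations. It assumes every variant source declares a uniquely named entry point and starts with the same preamble, which is a hypothetical simplification (real generated sources can clash on helper and struct names); PREAMBLE and the function names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical shared preamble, assumed identical across all variant sources.
PREAMBLE = "#include <metal_stdlib>\nusing namespace metal;\n"


def strip_preamble(source: str) -> str:
    """Remove the shared preamble if a variant source starts with it."""
    return source[len(PREAMBLE):] if source.startswith(PREAMBLE) else source


def build_single_library_source(variants: Dict[str, str]) -> Tuple[str, List[str]]:
    """Concatenate per-variant MSL sources into one library source.

    `variants` maps each variant's (assumed unique) entry-point name to its
    source. Returns the combined source, to be compiled once, and the
    entry-point names to look up afterwards with newFunctionWithName:.
    """
    names = sorted(variants)
    bodies = [strip_preamble(variants[name]).strip() for name in names]
    combined = PREAMBLE + "\n\n".join(bodies) + "\n"
    return combined, names


def library_creations(variant_count: int, single_library: bool) -> int:
    """Library objects created for a batch: one in total vs one per variant."""
    if variant_count == 0:
        return 0
    return 1 if single_library else variant_count
```

    With 500 variants compiled in a row, the per-variant approach creates 500 libraries versus 1 for the batched one. The trade-off is that the whole batch has to be known before compiling, which fits a warm-up step better than on-demand compilation.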

    What do you think? @richardkettlewell @martonekler
     
  5. VictorChow_K
    Joined: Jan 16, 2019 · Posts: 2
    The ShaderLab memory optimization issue is listed as fixed in 2020.2.0a9 (or alpha 8) on 29 Apr 2020.
    Will this be backported to a 2019 release?

    I'm also interested in a response to the post above about excessive MTLLibrary creation (though perhaps that belongs in a different thread).
     
  6. richardkettlewell (Unity Technologies)
    Joined: Sep 9, 2015 · Posts: 1,501
    Hey, I just checked: we evaluated it for 2019 but decided not to proceed, because the code around the fix had changed significantly and the risk of breaking things was too high.
     
  7. VictorChow_K
    Joined: Jan 16, 2019 · Posts: 2
    Thank you for the prompt reply; sad to read it won't be in 2019. Our ShaderLab data grew from 250 MB to 400 MB going from 2018.4.14 to 2019.3.10, with no changes and no explanation. After aggressive shader variant stripping it is back down to ~250 MB, but this leads me to believe there is a hefty chunk of memory to reclaim in what is otherwise a black box.
     
    Peter77 likes this.
  8. richardkettlewell (Unity Technologies)
    Joined: Sep 9, 2015 · Posts: 1,501
    I've passed your feedback along to the folks involved.
     
    VictorChow_K and Peter77 like this.