Post-processing Post Processing V2 CPU Performance Concerns

Discussion in 'Image Effects' started by lasercannon, Jul 26, 2018.

  1. lasercannon

    Joined:
    Nov 29, 2012
    Posts:
    74
    Hello!

    I was investigating the performance of my PostProcessing 2.0 stack and noticed that the stack does a lot of work every frame. The biggest hits are that it recalculates the overrides every frame (even if nothing changes, and even if only ONE layer "overrides" a given value) and, even when I have no effects enabled, it spends a lot of time in PostProcessManager.ReplaceData (partially because it seems to be doing A TON of Spline calculations?).

    The whole PostProcessLayer.OnPreCull() process takes 1.8ms when both of my volumes are enabled, and 0.8ms when they're both disabled. This is on a recent i7 desktop CPU. This is equivalent to 5-10ms on modern consoles, since most consoles are built for multithreading more so than raw speed.
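    A hypothetical, stripped-down C# model of the per-frame pattern described above may help show why the cost doesn't vanish when nothing changes. It is not the PPv2 source; `EffectSettings`, `NaiveVolumeUpdate` and `RunFrame` are made-up names. Each frame it first copies defaults into every effect (the ReplaceData-like step) and then walks every parameter of every volume to apply overrides, so the work scales with the number of effects and parameters, plus another full pass per volume, even when every value is identical to the previous frame.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one effect's settings: flat float parameters plus
// per-parameter override flags. Not the real PostProcessEffectSettings type.
public sealed class EffectSettings
{
    public readonly float[] Values;
    public readonly bool[] Overridden;

    public EffectSettings(int parameterCount)
    {
        Values = new float[parameterCount];
        Overridden = new bool[parameterCount];
    }
}

public static class NaiveVolumeUpdate
{
    // Runs one "frame" of a reset-then-blend update and returns how many
    // parameters were visited, whether or not anything actually changed.
    public static int RunFrame(
        List<EffectSettings> defaults,
        List<EffectSettings> target,
        List<(List<EffectSettings> profile, float weight)> volumes)
    {
        int visited = 0;

        // Step 1 (ReplaceData-like): copy every default into every effect.
        for (int e = 0; e < defaults.Count; e++)
        {
            for (int p = 0; p < defaults[e].Values.Length; p++)
            {
                target[e].Values[p] = defaults[e].Values[p];
                visited++;
            }
        }

        // Step 2 (override-like): walk every parameter of every volume profile
        // and lerp in the ones flagged as overridden.
        foreach (var (profile, weight) in volumes)
        {
            for (int e = 0; e < profile.Count; e++)
            {
                for (int p = 0; p < profile[e].Values.Length; p++)
                {
                    visited++; // the flag is checked even when it is false
                    if (!profile[e].Overridden[p]) continue;
                    float from = target[e].Values[p];
                    target[e].Values[p] = from + (profile[e].Values[p] - from) * weight;
                }
            }
        }

        return visited;
    }
}
```

    With 20 effects of 10 parameters each, a frame with no volumes already visits 200 parameters, and a single volume doubles that to 400, regardless of how many values it actually overrides.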

    Are there any plans to significantly clean this up and/or allow these calculations to run asynchronously? Or is there, perhaps, something I'm doing inefficiently?

    Thanks!
    Bryant
     

  2. aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    1,704
    Hi!

    That's not what we've seen on a base (non-Pro) PS4; it's 1ms there.
    I would suggest profiling a release build with an external profiler and checking what the situation is there. Development mode is close to Release mode, but the profiler can add its own overhead.
     
  3. lasercannon

    Joined:
    Nov 29, 2012
    Posts:
    74
    Ah, true! I forgot that my results were captured with the Deep Profiler on. Ha. So my PS4 estimates aren't correct. Forgive my unreasonable alarm; it was the end of a long day.

    Either way, I'm mostly comparing to other parts of the frame time, and PostProcessLayer is about half as time-intensive as our entire BehaviourUpdate loop (3.0-3.5ms when under the same deep profiling conditions, ~1ms with normal profiling), which seems unnecessary considering we are splitting hairs trying to achieve a final 60fps result.

    My setup:
    • One camera w/ PostProcessLayer
    • Global PostProcessVolume with all effects enabled (many values are overridden)
    • Global PostProcessVolume that overrides certain effects to "Off" based on the quality settings (but the profile still has all of the components)

    Some more direct questions. Times are with deep profiling on, estimated normal profiling time in brackets, all on my 4.0GHz i7 6700K:
    • When I delete all PostProcessVolumes, I still have approximately 0.7ms [~0.25ms] of overhead from PostProcessManager.ReplaceData. Is there a way to remove this? Are these override values cached in the PostProcessLayer somehow?
      • Looking at some of the code, it looks like you're using reflection to loop through all of the TYPES at this stage, even if one isn't used anywhere. Perhaps it's possible to optimize in the future?
      • It would work really well to run this part of the calculation in a Job starting in Update! (I assume this isn't currently possible because it'd be considered unsafe? Now that I understand the code a bit more I'm going to see if I can get that to work...)
    • Within ReplaceData(), there are 8 Spline.Cache() calls, taking up a total of 0.4ms [~0.15ms] (of that 0.7ms). This amounts to 1024 calls to Spline.Evaluate(), i.e. 128 per curve. I know ColorGrading has color curves, but mine aren't changing at all (in fact, I think they're all currently set to linear). Does this need to be called every frame? Is there any way to remove these calls if I'm not using any Spline overrides?
    In short, 1ms on a PS4 CPU seems pretty significant, especially when a lot of CPU work seems to be getting wasted on updating values that don't need to be updated.

    By the way, despite all my nitpicking, I'm really loving the new tool so far! :)

    Thanks,
    Bryant
     
  4. larsbertram1

    Joined:
    Oct 7, 2008
    Posts:
    5,909
    1ms, really? That is 1/16 of the whole frame time at 60fps (16.7ms)... which is quite a lot imho.
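    For reference, the arithmetic behind these frame-budget figures as a trivial C# helper (the class name is just for illustration): at 60fps a frame lasts 1000/60 ≈ 16.7ms, so a fixed 1ms cost is 6% of it, and the 2.5ms Switch figure reported later in the thread is 15%.

```csharp
using System;

public static class FrameBudget
{
    // Frame time in milliseconds for a target frame rate.
    public static double FrameTimeMs(double fps) => 1000.0 / fps;

    // Fraction of that frame time consumed by a fixed per-frame cost.
    public static double Fraction(double costMs, double fps) => costMs / FrameTimeMs(fps);
}
```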
     
  5. aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    1,704
    AFAIK: (1) If you're interested in why things are the way they are, there are a lot of comments in the source code explaining this :)
    (2) If you're using the built-in pipeline, PostFX v2 is the only volume system, and it adds overhead for that. If you're using an SRP, this 1ms is distributed among several volume systems.
     
  6. lasercannon

    Joined:
    Nov 29, 2012
    Posts:
    74
    I understand that there's gonna be some overhead, I just see some places it might have some useful optimizations.

    Found this comment in the Spline.cs code, so I know I'm not completely off base here:

    Code (CSharp):
    1.         // Note: it would be nice to have a way to check if a curve has changed in any way, that
    2.         // would save quite a few CPU cycles instead of having to force cache it once per frame :/
    This change sounds like it would require checking for changes to AnimationCurve, or perhaps making assumptions about how often that specific SplineOverride needs to get cached.
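    One way to get that kind of change check, sketched in plain C# under stated assumptions: keep a copy of the curve's keyframes from the last time it was sampled, compare against it each frame, and only re-sample when they differ. Comparing a handful of keys is far cheaper than 128 evaluations per curve (the 128 is inferred from the 1024 evaluations over 8 curves reported above). `Key` and `CachedCurve` are hypothetical stand-ins, not Unity's `AnimationCurve`/`Keyframe` or PPv2's `Spline`, and the evaluator is simplified to linear interpolation.

```csharp
using System;
using System.Linq;

// Hypothetical keyframe (time/value only), standing in for UnityEngine.Keyframe.
public readonly record struct Key(float Time, float Value);

public sealed class CachedCurve
{
    public const int Resolution = 128; // samples per curve (assumed)
    public readonly float[] Samples = new float[Resolution];
    public int ResampleCount { get; private set; }

    private Key[] _snapshot = Array.Empty<Key>();
    private bool _valid;

    // Re-samples only when the keys differ from the last cached snapshot.
    public void Cache(Key[] keys)
    {
        if (_valid && keys.SequenceEqual(_snapshot)) return; // unchanged: skip sampling

        _snapshot = (Key[])keys.Clone();
        for (int i = 0; i < Resolution; i++)
            Samples[i] = Evaluate(keys, i / (float)(Resolution - 1));
        _valid = true;
        ResampleCount++;
    }

    // Piecewise-linear evaluation; Unity curves use Hermite tangents, linear keeps this short.
    private static float Evaluate(Key[] keys, float t)
    {
        if (keys.Length == 0) return t;                 // empty curve treated as identity
        if (t <= keys[0].Time) return keys[0].Value;
        for (int k = 1; k < keys.Length; k++)
        {
            if (t <= keys[k].Time)
            {
                Key a = keys[k - 1], b = keys[k];
                float u = (t - a.Time) / (b.Time - a.Time);
                return a.Value + (b.Value - a.Value) * u;
            }
        }
        return keys[^1].Value;
    }
}
```

    The trade-off is one keyframe snapshot per curve in exchange for skipping the whole sampling loop on frames where nothing changed.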

    Anyway, just consider this a request to get this particular spline code optimized in the near future. It'd definitely help us out. :)
     
  7. Arycama

    Joined:
    May 25, 2014
    Posts:
    132
    1ms on a modern platform is ridiculous, just to blend between some values and volumes, especially when you only have one active post process volume. (Especially when -none- of those values have changed. Surely you can just hash the values and compare those once a frame, instead of 1024 spline evaluations?)
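    The hashing idea can be sketched like this, assuming all inputs that feed the blend (volume weights and parameter values) can be gathered into a flat list of floats. `HashedUpdateGate` is a hypothetical name; `System.HashCode` is the standard .NET hashing helper, but check that your Unity version's scripting runtime provides it. Hashing still touches every value once, yet that is much cheaper than re-blending every parameter and re-sampling every curve.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dirty check: hash all blend inputs once per frame and run the
// expensive update only when the hash differs from the previous frame's.
public sealed class HashedUpdateGate
{
    private int? _lastHash;
    public int ExpensiveUpdates { get; private set; }

    public void Tick(IReadOnlyList<float> values, Action expensiveUpdate)
    {
        var hasher = new HashCode();
        foreach (float v in values) hasher.Add(v);
        int hash = hasher.ToHashCode();

        // Equal hashes almost certainly mean equal inputs, but a collision is
        // possible; compare the actual values too if that matters.
        if (_lastHash == hash) return;

        _lastHash = hash;
        expensiveUpdate();
        ExpensiveUpdates++;
    }
}
```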

    If you were aiming for 60 fps, that's 1/16th of your frame time gone for basically no reason. I could spend several hours trying to reduce draw calls and CPU logic just to reclaim that.

    The fact that you guys aren't planning to do any more work on it for non-SRP renderers is pretty annoying, especially because the SRPs still aren't production-ready for a lot of projects, particularly those already built heavily on the legacy rendering pipeline.
     
  8. GoGoGadget

    Joined:
    Sep 23, 2013
    Posts:
    717
    Just jumped in and had a look myself, did Unity get a volunteer intern to write their volume system for them? I added Post-process volume blending to my own asset in a few nights way back and it was nowhere near this convoluted and heavy. Seriously, if you're at least going to copy the concept (not that it's new or unique), copy an implementation that doesn't cycle through giant managed lists each frame!



    Why does it cost a post-processing "manager" 1ms to manage nothing? All of that CPU time (on a Ryzen Threadripper 2950x, mind you) is completely wasted with only one global manager in the scene. This seems to just be a ~1ms CPU cycle "tax" imposed by the design of the new post-processing system!

    I just can't put it in any other words, it's mind-boggling!
     
  9. peaj_metric

    Joined:
    Sep 15, 2014
    Posts:
    60
    We also have this problem. PostProcessLayer.OnPreCull is taking 1.28ms each frame on console.
    We only use 1 global and 1 local volume.
    Shouldn't it at least early out if the camera is far away from the local volume?
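    Such an early out could look like the following sketch, which treats a local volume as a sphere with a blend distance (PPv2's real volumes use colliders) and returns 0, meaning "skip this volume", when the camera is outside the blend range. `LocalVolume` and `VolumeCulling` are hypothetical names, `System.Numerics.Vector3` stands in for Unity's `Vector3`, and the linear falloff is an assumption.

```csharp
using System;
using System.Numerics;

// Hypothetical local volume approximated by a sphere; real PPv2 volumes use colliders.
public readonly record struct LocalVolume(Vector3 Center, float Radius, float BlendDistance, float Weight);

public static class VolumeCulling
{
    // Returns the blend factor in [0, 1], or 0 when the camera is too far away
    // to be affected, so the caller can skip the per-parameter blend entirely.
    public static float BlendFactor(LocalVolume v, Vector3 cameraPos)
    {
        // Distance from the camera to the volume's surface (0 when inside).
        float distToSurface = MathF.Max(0f, Vector3.Distance(cameraPos, v.Center) - v.Radius);
        if (distToSurface > v.BlendDistance) return 0f; // early out: out of range

        // Linear falloff across the blend distance (assumed); 1 when inside the volume.
        float falloff = v.BlendDistance <= 0f ? 1f : 1f - distToSurface / v.BlendDistance;
        return Math.Clamp(falloff * v.Weight, 0f, 1f);
    }
}
```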
     
  10. Lukas-Wendt

    Joined:
    Jun 1, 2014
    Posts:
    13
    We are also seeing 1.3 to 1.5ms each frame on console. We don't change parameters and we only use a global volume, but we still pay a hefty price for the ability to do so. We will try to skip ReplaceData in PostProcessManager if it has already been called. This seems to work, but we need to test what happens during scene changes.
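    A minimal sketch of that idea, assuming the reset step's inputs only change at well-defined moments (a scene load, a settings change): run it once, then skip it until something explicitly invalidates it. `RunOnceUntilInvalidated` is a hypothetical helper; in Unity, the invalidation could be wired to an event such as `SceneManager.sceneLoaded`, which is exactly the scene-change case that still needs testing.

```csharp
using System;

// Hypothetical guard: run a reset step once, then skip it until something
// (e.g. a scene change or a settings menu) explicitly invalidates it.
public sealed class RunOnceUntilInvalidated
{
    private bool _done;
    public int Runs { get; private set; }

    public void Run(Action step)
    {
        if (_done) return; // already up to date: skip the expensive step
        step();
        Runs++;
        _done = true;
    }

    // Call when the inputs may have changed, e.g. from a scene-loaded callback.
    public void Invalidate() => _done = false;
}
```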
     
  11. KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    513
    Does anyone have a fork without this nonsense? Thankfully, we at least have the source code now.
     
  12. spajus

    Joined:
    Jun 10, 2015
    Posts:
    16
    For those who want to get rid of that 1ms in PostProcessLayer.OnPreCull, here is how you can patch the source yourself.
    Warning: it breaks dynamic enabling/disabling of effects.

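    Code (CSharp):
        // In PostProcessLayer.OverrideSettings(): add the enabled check to the
        // loop's early-out so disabled effects are skipped, not just inactive ones.
        if (!settings.active || !settings.enabled) { continue; }

        // In PostProcessManager.ReplaceData(): skip disabled effects instead
        // of resetting all of their parameters every frame.
        foreach (var settings in m_BaseSettings)
        {
            if (!settings.enabled) { continue; }

            // ... rest of the original loop body, unchanged ...
        }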
    Exact locations:

    https://github.com/Unity-Technologi...Processing/Runtime/PostProcessManager.cs#L305
    https://github.com/Unity-Technologi...stProcessing/Runtime/PostProcessLayer.cs#L744

    Basically, this stops it from looping through 100 unused settings every frame just to check whether they are overridden.

    Oh, and you will have to add every single effect to your PostProcessProfile and uncheck its enabled flag, because, surprisingly, the default value of enabled is true.

    For the Unity team: it would be nice to have a way to completely exclude some effects from the stack, so they would not even be considered every frame.
    Another optimization would be to skip cycling dynamic parameters of completely disabled effects on each frame (the hack I did above, but probably in a cleaner way).
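    A sketch of the "exclude effects entirely" suggestion, assuming the set of enabled effects only changes when the profile is edited: build a list of enabled effects once, then iterate only that list each frame. `Effect` and `ActiveEffectList` are hypothetical names, not PPv2 types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical effect entry: name, enabled flag and number of blendable parameters.
public sealed record Effect(string Name, bool Enabled, int ParameterCount);

public sealed class ActiveEffectList
{
    private readonly List<Effect> _active = new();

    // Rebuild only when the profile changes, not every frame.
    public void Rebuild(IEnumerable<Effect> profile)
    {
        _active.Clear();
        _active.AddRange(profile.Where(e => e.Enabled));
    }

    public int ActiveCount => _active.Count;

    // Per-frame work visits only enabled effects; returns parameters visited.
    public int VisitParametersThisFrame() => _active.Sum(e => e.ParameterCount);
}
```

    With this structure a disabled effect costs nothing per frame; the price is remembering to call `Rebuild` whenever an effect is toggled, the same trade-off that makes the patch above break dynamic enabling/disabling.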
     
    Last edited: Jan 28, 2020
  13. KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    513
    Thanks! This stuff takes a whopping 2.5ms on Switch. That's about 15% of the frame-time budget for a 60fps game!

    Since PPV2 is on the deprecation chopping block, we have to fix this stuff ourselves. This looping of all settings to blend everything is very poor design.
     
  14. Jackless

    Joined:
    Jan 9, 2018
    Posts:
    19
    EDIT: OK, never mind, I found it. I had used PostProcessing from the Package Manager, and for some reason it didn't import some classes. I have now downloaded the GitHub project, and all the classes that were missing before are there.
    I wonder how it even worked ^^

    Hello, thanks for the information. Could you maybe point us to the exact location where this code needs to be pasted?
    I've searched through the entire PostProcessing folder in VS and couldn't find the method, unfortunately.
     
    Last edited: Jan 28, 2020
  15. spajus

    Joined:
    Jun 10, 2015
    Posts:
    16
    I updated my post with exact location links to GitHub.
     
  16. Jackless

    Joined:
    Jan 9, 2018
    Posts:
    19
    Hey thanks man, that was super quick! ;)
     
  17. Bordeaux_Fox

    Joined:
    Nov 14, 2018
    Posts:
    507
    And who pays for the additional people which are needed to maintain that many graphic pipelines? I think Unity should spend their time in the new tech.
     
  18. AlexisTB

    Joined:
    Jul 26, 2017
    Posts:
    14
    I made the change and did gain some performance from the fix, but not much. I'm guessing that since we actually use Bloom and Blur post-processes, I can't really optimise this further? It takes 1.7ms in BuildCommandBuffers on Switch.
     
  19. KokkuHub

    KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    513
    I managed to optimize it a little further, check the commits in our branch:
    https://github.com/KokkuGames/PostProcessing/commits/v2

    The sampling of all color grading spline curves on every frame was absurd! I don't see how those curves can change during runtime outside of the editor, so we changed it to cache forever, with the color grading editor clearing the cache to reflect changes.

    For bloom, reducing diffusion also helped, because it determines the number of times it will loop in the C# side.
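    To see why diffusion matters on the CPU side, here is a hypothetical model in which the number of bloom downsample/upsample iterations is derived from log2 of the larger render-target side plus the diffusion value, clamped to a maximum. This follows the general shape of a mip-pyramid bloom, but the exact formula, the offset of 10 and the cap of 16 are assumptions; check the Bloom implementation in the package source for the real numbers. `BloomPasses.Iterations` is a made-up name.

```csharp
using System;

public static class BloomPasses
{
    // Hypothetical model: iterations grow with log2 of the larger side plus
    // diffusion (capped at 10), minus a fixed offset, clamped to [1, maxIterations].
    public static int Iterations(int width, int height, float diffusion, int maxIterations = 16)
    {
        int s = Math.Max(width, height);
        double logs = Math.Log2(s) + Math.Min(diffusion, 10f) - 10.0;
        int iterations = (int)Math.Floor(logs);
        return Math.Clamp(iterations, 1, maxIterations);
    }
}
```

    Under this model, each whole step of diffusion removed drops one iteration, and with it one round of C#-side command recording per frame.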

    We still need to bring that down more, however.
     
  20. unisip

    Joined:
    Sep 15, 2010
    Posts:
    280
    This is insane!

    I looked into this PPv2 stack overhead problem about 18 months ago, did some quick fixes (not as clean as what you did; thanks for sharing, btw), and now I see that Unity never bothered to clean up their mess. The argument about it being « faster with URP » is a joke (how is that an excuse for wasting 1ms updating parameters that hardly ever change in most real-world use cases?).

    Don’t get me wrong, I love Unity, but I am REALLY disappointed that no one on their end seems to care about such performance nonsense. And don’t get me started on the lightmap packer that leaves 60% of UV space empty while creating new lightmaps, resulting in more VRAM usage, more draw calls, etc...

    The one thing that I like, though, is that they now at least give us the source code so we can fix it (please give me a hook so I can write my own lightmap UV packer). I still wonder how that PP stack v2 parameter management system could actually make it past internal code reviews. I guess it was trying to be all things to all people, ending up with bad performance in 99% of real world use cases.
     
  21. joshcamas

    Joined:
    Jun 16, 2017
    Posts:
    1,069
    Holy crap, I'm so glad I happened to find this thread. I've always been suspicious of PPv2's extremely high overhead, and I'm super glad that its source is accessible and therefore open for us to change. I wonder what else in Unity is designed like this, but not in a modifiable package...
     
  22. KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    513
    Not exactly CPU-related, but I added a compute-shader bloom to a branch I'm using for crazier optimizations/changes (which are not guaranteed to work well in all use cases as they are being tested on a specific project). Reduced our GPU time on Switch by over a whole millisecond compared to the original bloom. Check it out if you're adventurous:

    https://github.com/KokkuGames/PostProcessing/commit/1757e8e1ba5cbecf173f0ebf3f26b6f3d5f50f31

    It's a near verbatim port of the bloom from Microsoft's MiniEngine (https://github.com/microsoft/DirectX-Graphics-Samples/tree/master/MiniEngine), with minimal modifications.


     
    Last edited: Apr 5, 2020
  23. Raul_MadGoat

    Joined:
    Jan 10, 2015
    Posts:
    332
    I want to thank you for doing all these optimizations. Really saved my life with getting as much performance as possible from our game.

    Also, a side note: if anybody is using this and still needs to change settings at runtime (we have a graphics settings menu since we target PC), disabling and re-enabling the PostProcessLayer on your cameras seems to work for applying new settings, and it shouldn't be much of a performance issue since it only happens once when applying settings (in our case, at least).
     
  24. Captain_Pineapple

    Joined:
    Nov 9, 2016
    Posts:
    5
    Damn guys. Using this instead of the default postprocessing stack saves us more than a millisecond in a built version.
    Thank you so much for making this public!
     
  25. Raul_MadGoat

    Joined:
    Jan 10, 2015
    Posts:
    332
    Relatable :D
     
  26. joshcamas

    Joined:
    Jun 16, 2017
    Posts:
    1,069
    Is there a way to force enabling / disabling of effects? I do this when I switch graphic options : )
     
  27. Raul_MadGoat

    Joined:
    Jan 10, 2015
    Posts:
    332
    Does this not work for you?
     
  28. joshcamas

    Joined:
    Jun 16, 2017
    Posts:
    1,069
    Oh I didn't see that, sorry!
     
  29. Wolfgame

    Joined:
    Feb 7, 2014
    Posts:
    12
    Unity has left the chat.
     
  30. cassius

    Joined:
    Aug 5, 2012
    Posts:
    52
    Is this still an ongoing issue in 2019.4.5? This thread was started 2 years ago!
     
  31. Raul_MadGoat

    Joined:
    Jan 10, 2015
    Posts:
    332
    Post processing stack 2 hasn't changed much/at all since scriptable pipelines got released (except for compatibility updates and minor fixes), so performance probably still is and will be an issue.

    Haven't tested on 2019.4; we are still on 2019.3 with our project, as we are wary of updating right before our release, and the issue is still there. But I doubt they updated the PPv2 package for 2019.4 to fix this, given the focus is on the SRPs, both of which have their own, more performant volume post-processing system.
     
  32. laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    4,035
  33. KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    513
    Further CPU cost reduction would need a more drastic rewrite. A lot of time is spent rebuilding the command buffers for each effect from scratch every frame, so the obvious approach would be to break them into multiple command buffer blocks so that the ones that never change can be cached.

    That's quite the effort, since right now a single command buffer is passed around for each effect to write its commands into, and the architecture itself would need to change so that each effect provides one or more command buffers to be added to the camera. This way, each effect can keep command buffer re-creation to the bare minimum. Color grading, for example, wouldn't need to do any work at all if none of its values changed in the current frame.
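    The architecture described above can be sketched in plain C#, with strings standing in for command-buffer commands: each effect owns its own buffer and re-records it only when marked dirty, and the per-frame work reduces to collecting the cached buffers. `CachedEffectBuffer` and `CameraStack` are hypothetical names; in Unity the per-effect buffers would presumably be real `CommandBuffer` objects attached with `Camera.AddCommandBuffer`, and deciding when an effect is dirty is the hard part this sketch glosses over.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical effect that records its "commands" (strings standing in for
// CommandBuffer calls) into its own buffer and rebuilds only when marked dirty.
public sealed class CachedEffectBuffer
{
    private readonly Func<IEnumerable<string>> _record;
    private readonly List<string> _commands = new();
    private bool _dirty = true;

    public int Rebuilds { get; private set; }

    public CachedEffectBuffer(Func<IEnumerable<string>> record) => _record = record;

    public void MarkDirty() => _dirty = true;

    public IReadOnlyList<string> GetCommands()
    {
        if (_dirty)
        {
            _commands.Clear();
            _commands.AddRange(_record()); // re-record only when something changed
            _dirty = false;
            Rebuilds++;
        }
        return _commands;
    }
}

public static class CameraStack
{
    // Per frame, the camera just concatenates each effect's (possibly cached) buffer.
    public static List<string> BuildFrame(IEnumerable<CachedEffectBuffer> effects)
    {
        var frame = new List<string>();
        foreach (var effect in effects) frame.AddRange(effect.GetCommands());
        return frame;
    }
}
```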

    I'm not sure I'll have the time to go that far into this, but I'm leaving this here if anyone is wondering where to go next.
     
  34. joshcamas

    Joined:
    Jun 16, 2017
    Posts:
    1,069
    Interesting, I should have assumed PP2 would be dropped in favor of the new and "improved" SRP one. Sigh. I feel like 90% of the unity population (the ones using built-in still) are being ignored now, which is frustrating. I guess it just is what it is. Thank god these packages are open source!
     
  35. KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    513
    URP has its own PP stack, but it doesn't support custom post effects in an easy-to-plug way, so they made it possible to use PPv2 with it until they get that working. Other than that, no "new" work is supposed to be done on PPv2, except keeping it compatible with the latest Unity versions.

    I haven't read URP's PP code yet, but since it's an integrated stack it's supposed to be faster because the effects aren't completely isolated from each other (there are some assumptions and data re-use one can make when all your effects are jumbled together). However, I heard it was somewhat based on the PPv2 design so some inefficiencies could have carried over.
     
  36. cassius

    cassius

    Joined:
    Aug 5, 2012
    Posts:
    52
    Blush. All this time I thought PPv2 was the URP PP stack. Does URP's post-processing have its own package, or is it included some other way?

    Having leaped from version 5 up to 2019.2, I should have known I'd be misunderstanding how it all works now. So confusing.

    Also, @laurentlavigne I completely agree. I bought it all up.
     
  37. KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    513
    Thing is: PPv2 was URP's PP stack at first, back when it was called LWRP. Then they introduced an integrated PP (sometimes called "PPv3") and broke compatibility with PPv2. You'll find a lot of "how to custom post process in URP?" threads from that period, when Unity realized they dun goofed: yes, a lot of Unity developers do need to run custom post-processing in their games after all, and most were not finding their way around the sorely undocumented URP custom render features/passes, which can actually be used for post-processing.

    It's not a package, it's built into URP. If you install URP you can go right into enabling things like bloom, color grading, depth of field, etc, without anything extra.
     
  38. cassius

    Joined:
    Aug 5, 2012
    Posts:
    52
    I see. So, in actuality, I am already using it. It's just that I also have the PPv2 package installed, but unused. I guess between that and what you just wrote, it makes sense that I got confused.

    Do you know if "UberPostProcess", which keeps showing up in my Profiler, is a component of URP Post Processing then?
     
  39. KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    513
    I could be wrong, but I think it is. AFAIK it isn't too light on the CPU either, especially on ARM CPUs.
     
  40. KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    513
    PPv2 is the gift that keeps on giving. After some struggles with memory-related issues on PS4, we found out that the MultiScaleVO was leaking temporary render targets. Unity seems to eventually clean some of them up, but only the ones older than 16 frames. This means there are always 15 extra copies of all temporary render targets used by the MultiScaleVO, using almost 200MB more GPU memory at 1080p than it should. Yay!
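    The "15 extra copies" figure follows from the cleanup rule under a simple assumption. A toy C# model: each frame leaks one full set of MSVO temporary targets, and before the new frame allocates, a deferred cleanup reclaims any set that is 16 or more frames old. At steady state 16 sets are alive, the current one plus 15 leaked copies. `LeakModel` is a hypothetical name, and the exact age threshold Unity applies is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Toy model: every frame allocates a new set of temporary render targets that is
// never released explicitly; a deferred cleanup reclaims any set whose age is
// >= maxAge frames at the start of each frame.
public static class LeakModel
{
    public static int LiveSetsAfter(int frames, int maxAge)
    {
        var allocFrames = new Queue<int>();
        for (int frame = 0; frame < frames; frame++)
        {
            while (allocFrames.Count > 0 && frame - allocFrames.Peek() >= maxAge)
                allocFrames.Dequeue();  // deferred cleanup of old sets
            allocFrames.Enqueue(frame); // this frame's leaked allocation
        }
        return allocFrames.Count;
    }
}
```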

    The leak doesn't happen in 2019.4, so it was fixed somewhere along the way. But we are still stuck on 2019.2, so I had to find a workaround. Oddly, MSVO was the only effect leaking; all other effects behave correctly. Basically, the calls to CommandBuffer.ReleaseTemporaryRT() do nothing, but only for this post FX.

    After some experimentation, comparing to how other effects allocate/release their temp render targets, I found the culprit:

    https://github.com/KokkuGames/PostProcessing/commit/a925cf92ed8d33c2fbeb0bdf5eaf6220f4126a81

    Yup, that's it. Specifying mipcount in the descriptor fixes it somehow.
     
    Last edited: Aug 24, 2020
  41. jamespaterson

    Joined:
    Jun 19, 2018
    Posts:
    304
    hi all, especially @KokkuHub . I just want to say many thanks for this thread. I have switched out the standard PPV2 code for this version and am saving ~0.8ms per frame on a laptop PC (6700HQ i7). Every little helps!
     