
Post-processing Post Processing V2 CPU Performance Concerns

Discussion in 'Image Effects' started by lasercannon, Jul 26, 2018.

  1. lasercannon

    lasercannon

    Joined:
    Nov 29, 2012
    Posts:
    72
    Hello!

    I was investigating the performance of my PostProcessing 2.0 stack and noticed that the stack is doing a lot of work every frame. The biggest hits are that it calculates the overrides every frame (even if nothing changes, and even if only ONE layer "overrides" that value) and that, even when I have no effects enabled, it spends a lot of time in PostProcessManager.ReplaceData (partly because it seems to be doing A TON of Spline calculations?).

    The whole PostProcessLayer.OnPreCull() process takes 1.8ms when both of my volumes are enabled, and 0.8ms when they're both disabled. This is on a next-gen i7 desktop CPU. I'd estimate this is equivalent to 5-10ms on modern consoles, since most consoles are built for multithreading more so than raw speed.

    Are there any plans to significantly clean this up and/or allow these calculations to run asynchronously? Or is there, perhaps, something I'm doing inefficiently?

    Thanks!
    Bryant
     


  2. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    1,111
    Hi!

    That's not what we've seen on a base (non-pro) PS4, it's 1ms there.
    I would suggest profiling a release build with an external profiler and checking what the situation is there. Development mode is close to Release mode, but the profiler can add its own overhead.
     
  3. lasercannon

    lasercannon

    Joined:
    Nov 29, 2012
    Posts:
    72
    Ah, true! I seem to have forgotten that my results are with the deep profiler on. Ha. My PS4 estimates aren't correct. Forgive my unreasonable alarm. It was the end of a long day.

    Either way, I'm mostly comparing to other parts of the frame time, and PostProcessLayer is about half as time-intensive as our entire BehaviourUpdate loop (3.0-3.5ms when under the same deep profiling conditions, ~1ms with normal profiling), which seems unnecessary considering we are splitting hairs trying to achieve a final 60fps result.

    My setup:
    • One camera w/ PostProcessLayer
    • Global PostProcessVolume with all effects enabled (many values are overridden)
    • Global PostProcessVolume that overrides certain effects to "Off" based on the quality settings (but the profile still has all of the components)

    Some more direct questions. Times are with deep profiling on, estimated normal profiling time in brackets, all on my 4.0GHz i7 6700K:
    • When I delete all PostProcessVolumes, I still have approximately 0.7ms [~0.25ms] of overhead from PostProcessManager.ReplaceData. Is there a way to remove this? Are these override values cached in the PostProcessLayer somehow?
      • Looking at some of the code, it looks like you're using reflection to loop through all of the TYPES at this stage, even if one isn't used anywhere. Perhaps it's possible to optimize in the future?
      • It would work really well to run this part of the calculation in a Job starting in Update! (I assume this isn't currently possible because it'd be considered unsafe? Now that I understand the code a bit more I'm going to see if I can get that to work...)
    • Within ReplaceData(), there are 8 Spline.Cache() calls, taking up a total of 0.4ms [~0.15ms] (of that 0.7). This amounts to 1024 calls to Spline.Evaluate(). I know ColorGrading has color curves, but mine aren't changing at all (in fact, I think they're all set to linear currently). Does this need to be called every frame? Is there any way to remove these calls if I'm not using any Spline overrides?
    In short, 1ms on a PS4 CPU seems pretty significant, especially when a lot of CPU work seems to be getting wasted on updating values that don't need to be updated.

    By the way, despite all my nitpicking, I'm really loving the new tool so far! :)

    Thanks,
    Bryant
     
  4. larsbertram1

    larsbertram1

    Joined:
    Oct 7, 2008
    Posts:
    5,487
    1ms, really? that is 1/16 of the complete frame time... which is quite a lot imho.
     
    GoGoGadget and Arycama like this.
  5. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    1,111
    afaik: (1) If you're interested in why things are the way they are, there are a lot of comments in the source code explaining this :)
    (2) If you're using a built-in pipeline, postfx v2 is the only volume system, and it adds an overhead for that. If you're using SRPs, this 1ms is distributed between several volume systems.
     
  6. lasercannon

    lasercannon

    Joined:
    Nov 29, 2012
    Posts:
    72
    I understand that there's gonna be some overhead; I just see some places where it could benefit from optimization.

    Found this comment in the Spline.cs code, so I know I'm not completely off base here:

    Code (CSharp):
    // Note: it would be nice to have a way to check if a curve has changed in any way, that
    // would save quite a few CPU cycles instead of having to force cache it once per frame :/
    This change sounds like it would require checking for changes to AnimationCurve, or perhaps making assumptions about how often that specific SplineOverride needs to get cached.

    Anyway, just consider this a request to get this particular spline code optimized in the near future. It'd definitely help us out. :)
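
    The kind of change check that comment wishes for could be sketched roughly as below. This is a minimal, self-contained illustration and not the package's code: CurveKey, CachedCurve, GetTable and BakeCount are hypothetical names, evaluation is plain linear interpolation rather than whatever AnimationCurve does internally, and the table size of 128 is only inferred from the 8 Spline.Cache() calls adding up to 1024 evaluations mentioned earlier in the thread.

```csharp
using System;

// Hypothetical stand-in for a keyframe; this is not Unity's Keyframe struct.
public struct CurveKey
{
    public float Time;
    public float Value;
    public CurveKey(float time, float value) { Time = time; Value = value; }
}

// Hypothetical cache that re-bakes its lookup table only when the keys change.
public sealed class CachedCurve
{
    public const int Samples = 128; // assumed table size (1024 evaluations / 8 curves)
    private readonly float[] table = new float[Samples];
    private int lastHash;
    private bool baked;

    // Counts how many times the table was actually rebuilt.
    public int BakeCount { get; private set; }

    // Folds every key into one integer; identical key lists give identical hashes.
    private static int HashKeys(CurveKey[] keys)
    {
        unchecked
        {
            int hash = 17 * 31 + keys.Length;
            foreach (CurveKey k in keys)
            {
                hash = hash * 31 + k.Time.GetHashCode();
                hash = hash * 31 + k.Value.GetHashCode();
            }
            return hash;
        }
    }

    // Piecewise-linear evaluation, clamped at both ends (keys sorted by Time).
    private static float Evaluate(CurveKey[] keys, float t)
    {
        if (keys.Length == 0) return 0f;
        if (t <= keys[0].Time) return keys[0].Value;
        for (int i = 1; i < keys.Length; i++)
        {
            if (t <= keys[i].Time)
            {
                CurveKey a = keys[i - 1], b = keys[i];
                return a.Value + (b.Value - a.Value) * (t - a.Time) / (b.Time - a.Time);
            }
        }
        return keys[keys.Length - 1].Value;
    }

    // Call once per frame; re-samples the curve only if the key hash changed.
    public float[] GetTable(CurveKey[] keys)
    {
        int hash = HashKeys(keys);
        if (!baked || hash != lastHash)
        {
            for (int i = 0; i < Samples; i++)
                table[i] = Evaluate(keys, i / (float)(Samples - 1));
            lastHash = hash;
            baked = true;
            BakeCount++;
        }
        return table;
    }
}
```

    With this shape, an unchanged curve costs one hash over its keys per frame instead of 128 evaluations. Two caveats: a hash collision could hide a real change, and a real AnimationCurve key also carries tangents, which would have to be folded into the hash as well.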
     
    GoGoGadget and Garrettec like this.
  7. Arycama

    Arycama

    Joined:
    May 25, 2014
    Posts:
    101
    1ms on a modern platform is ridiculous, just to blend between some values and volumes, especially when you only have one active post process volume. (Especially when -none- of those values have changed. Surely you can just hash the values and compare those once a frame, instead of 1024 spline evaluations?)

    If you were aiming for 60 fps, that's 1/16th of your frame time gone for basically no reason. I could spend several hours trying to reduce draw calls and CPU logic just to reclaim that.
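
    For reference, the arithmetic behind the "1/16th" figure, assuming a 60 fps target and nothing else, works out as:

```latex
\[
t_{\text{frame}} = \frac{1000\ \text{ms}}{60} \approx 16.67\ \text{ms},
\qquad
\frac{1\ \text{ms}}{16.67\ \text{ms}} \approx 6\%
\]
```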

    The fact that you guys aren't planning to do any more work on it for non-SRP renderers is pretty annoying, especially because the SRPs still aren't usable in production for a lot of projects, particularly those already built heavily on the legacy rendering pipeline.
     
    GoGoGadget likes this.
  8. GoGoGadget

    GoGoGadget

    Joined:
    Sep 23, 2013
    Posts:
    689
    Just jumped in and had a look myself, did Unity get a volunteer intern to write their volume system for them? I added Post-process volume blending to my own asset in a few nights way back and it was nowhere near this convoluted and heavy. Seriously, if you're at least going to copy the concept (not that it's new or unique), copy an implementation that doesn't cycle through giant managed lists each frame!



    Why does it cost a post-processing "manager" 1ms to manage nothing? All of that CPU time (on a Ryzen Threadripper 2950x, mind you) is completely wasted with only one global manager in the scene. This seems to just be a ~1ms CPU cycle "tax" imposed by the design of the new post-processing system!

    I just can't put it in any other words, it's mind-boggling!
     
  9. peaj_metric

    peaj_metric

    Joined:
    Sep 15, 2014
    Posts:
    25
    We also have this problem. PostProcessLayer.OnPreCull is taking 1.28ms each frame on console.
    We only use 1 global and 1 local volume.
    Shouldn't it at least early out if the camera is far away from the local volume?
     
  10. Lukas-Wendt

    Lukas-Wendt

    Joined:
    Jun 1, 2014
    Posts:
    11
    We are also seeing 1.3 to 1.5ms each frame on console. We don't change parameters and we only use a global volume, but we still pay a hefty price for the ability to do so. We will try to skip ReplaceData in PostProcessManager if it has already been called. This seems to work, but we need to test what happens during scene changes.
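
    As a rough idea of what "skip it if it has already been called" can look like, here is a minimal dirty-flag sketch. It is an assumption-laden illustration, not the package's code: DefaultsResetter, Invalidate, GetWorkingValues and ResetCount are hypothetical names, and plain float arrays stand in for the real settings objects.

```csharp
using System;

// Hypothetical helper: copies default values into a working set only when flagged.
public sealed class DefaultsResetter
{
    private readonly float[] defaults;
    private readonly float[] working;
    private bool needsReset = true; // the first call always performs the copy

    // Counts how many times the defaults were actually copied.
    public int ResetCount { get; private set; }

    public DefaultsResetter(float[] defaults)
    {
        this.defaults = (float[])defaults.Clone();
        working = new float[defaults.Length];
    }

    // Call when something makes the working values stale, e.g. a scene change.
    public void Invalidate() { needsReset = true; }

    // Call once per frame; the copy is skipped unless Invalidate() was called.
    public float[] GetWorkingValues()
    {
        if (needsReset)
        {
            Array.Copy(defaults, working, defaults.Length);
            needsReset = false;
            ResetCount++;
        }
        return working;
    }
}
```

    The trade-off is the one mentioned above: anything that writes into the working values after the reset will persist across frames until Invalidate() is called, so events such as scene changes need to trigger it explicitly.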
     
  11. KokkuHub

    KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    95
    Does anyone have a fork without this nonsense? Thankfully we at least have source code now.
     
  12. spajus

    spajus

    Joined:
    Jun 10, 2015
    Posts:
    10
    For those who want to get rid of that 1ms in PostProcessLayer.OnPreCull, here is how you can patch the source yourself.
    Warning: it breaks dynamic enabling / disabling of effects.

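    Code (CSharp):
    // In PostProcessLayer#OverrideSettings: skip settings that are inactive or disabled.
    if (!settings.active || !settings.enabled) { continue; }

    // In PostProcessManager#ReplaceData: skip disabled settings at the top of the loop.
    foreach (var settings in m_BaseSettings)
    {
        if (!settings.enabled) { continue; }
        // ... the rest of the original loop body stays unchanged
    }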
    Exact locations:

    https://github.com/Unity-Technologi...Processing/Runtime/PostProcessManager.cs#L305
    https://github.com/Unity-Technologi...stProcessing/Runtime/PostProcessLayer.cs#L744

    Basically, with this patch it no longer loops through 100 unused settings on every frame to check whether they are overridden.

    Oh, and you will have to add every single effect to your PostProcessingProfile and uncheck its enabled flag, because, surprisingly, the default value for enabled is true.

    For Unity team, it would be nice to have a way to completely exclude some effects from the stack, so they would not even be considered on every frame.
    Another optimization would be to skip cycling dynamic parameters of completely disabled effects on each frame (the hack I did above, but probably in a cleaner way).
     
    Last edited: Jan 28, 2020
  13. KokkuHub

    KokkuHub

    Joined:
    Feb 15, 2018
    Posts:
    95
    Thanks! This stuff takes a whopping 2.5ms on Switch. That's 15% of the frame time budget for a 60fps game!

    Since PPV2 is on the deprecation chopping block, we have to fix this stuff ourselves. This looping of all settings to blend everything is very poor design.
     
    Arycama likes this.
  14. Jackless

    Jackless

    Joined:
    Jan 9, 2018
    Posts:
    13
    EDIT: OK, never mind, I found it. I used the PostProcessing package from the Package Manager and it didn't import some classes for some reason. I have now downloaded the GitHub project, and all the classes that were missing before are there.
    I wonder how it even worked ^^


    Hello, thanks for the information. Could you maybe point us to the exact location where this code needs to be pasted?
    I've searched through the entire PostProcessing folder in VS and unfortunately couldn't find the method.
     
    Last edited: Jan 28, 2020
    ihgyug likes this.
  15. spajus

    spajus

    Joined:
    Jun 10, 2015
    Posts:
    10
    I updated my post with exact location links to GitHub.
     
    Jackless likes this.
  16. Jackless

    Jackless

    Joined:
    Jan 9, 2018
    Posts:
    13
    Hey thanks man, that was super quick! ;)
     
  17. Bordeaux_Fox

    Bordeaux_Fox

    Joined:
    Nov 14, 2018
    Posts:
    214
    And who pays for the additional people who are needed to maintain that many graphics pipelines? I think Unity should spend their time on the new tech.
     
  18. AlexisTB

    AlexisTB

    Joined:
    Jul 26, 2017
    Posts:
    8
    I made the change and gained some performance from the fix, but it wasn't that much. I'm guessing that since we actually do use Bloom and Blur post-processing effects, I can't really optimise this further? It takes 1.7ms during BuildCommandBuffers on Switch.
     