
Feedback Incremental GC feedback thread

Discussion in 'Experimental Scripting Previews' started by jonas-echterhoff, Nov 26, 2018.

  1. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    Unity 19.1a10 has experimental support for incremental garbage collection. You can find more information about the feature in this blog post.

    I'm opening this forum thread as a place to discuss the feature and to collect feedback. We are very interested in hearing from anyone trying this on projects (especially projects that suffer from GC spikes) and in hearing how incremental GC affects those projects, but any other type of feedback is of course very welcome as well.
     
    Last edited: Nov 26, 2018
  2. codestage

    Joined:
    Jul 27, 2012
    Posts:
    1,194
    Looks really interesting, @jonas-echterhoff !

    Thanks for letting us try it at such an early stage.

    Here is a first simple experiment on Android:

    upload_2018-11-27_2-15-5.png

    upload_2018-11-27_2-16-4.png

    upload_2018-11-27_2-20-46.png

    The last screenshot reveals the nature of incremental GC: the Profiler shows GarbageCollector.CollectIncremental taking up all of the WaitForTargetFPS time (waiting for vsync), with GC.Collect running a portion of the work inside the CollectIncremental time.

    If I understand it correctly, this picture is exactly right: the incremental GC does some work to set up the slice boundaries, then runs a chunk of synchronous GC.Collect() work in that frame, and then does some additional work to prepare for the next frame.
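
    The time-slicing described above can be illustrated with a minimal Python sketch. This is a toy model, not Unity's or Boehm's code, and all names are hypothetical: the mark phase keeps a worklist of reached-but-unscanned objects, and each frame scans at most `budget_per_frame` of them, so one long pause becomes several short ones. Objects scanned per slice stand in for a time budget in nanoseconds.

```python
from collections import deque


class IncrementalMarker:
    """Toy incremental mark phase whose work is split into bounded slices.

    The heap is a dict mapping an object id to the ids it references.
    `budget` counts objects scanned per slice; it stands in for a time
    budget and is not the unit Unity or Boehm actually use.
    """

    def __init__(self, heap, roots):
        self.heap = heap
        self.marked = set(roots)   # reached so far (gray or black)
        self.gray = deque(roots)   # reached but not yet scanned

    def step(self, budget):
        """Scan at most `budget` gray objects; return True when marking is done."""
        while self.gray and budget > 0:
            obj = self.gray.popleft()
            for child in self.heap[obj]:
                if child not in self.marked:
                    self.marked.add(child)
                    self.gray.append(child)
            budget -= 1
        return not self.gray

    def garbage(self):
        """Objects never reached; a sweep would free these."""
        return set(self.heap) - self.marked


def collect_incrementally(heap, roots, budget_per_frame):
    """Run marking one slice per frame; return (frames used, garbage ids)."""
    if budget_per_frame <= 0:
        raise ValueError("budget_per_frame must be positive")
    marker = IncrementalMarker(heap, roots)
    frames = 1
    while not marker.step(budget_per_frame):
        frames += 1
    return frames, marker.garbage()
```

    With a budget of 1, the same four reachable objects take four frames instead of one: the total work is unchanged, only its per-frame share shrinks. The sketch ignores the program changing references between slices; a real incremental collector needs write barriers to handle that.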

    And this is much, much better than the single 9 ms spike I get with incremental GC turned off for the same scene:

    upload_2018-11-27_2-28-30.png

    I'm really happy to see this coming and being available in 19.1.

    Though I'm afraid this will relax developers' discipline about avoiding heap allocations and may increase ignorance of the GC allocation problem, leading to more GC issues in late project stages =D
     
  3. yoyobbi

    Joined:
    Nov 26, 2013
    Posts:
    15
    We see significant performance problems in any managed code that allocates memory, independent of garbage collection spikes: code that allocates simply runs more slowly. My theory is that with the Boehm GC approach, fresh allocations constantly spill into fresh cache lines, so code that allocates will almost always be hit by a performance-crippling cache miss.

    I had hoped that the rumoured "new garbage collector" would be a generational garbage collector with good cache utilization for short-lived allocations. Is there an initiative at Unity to support generational GC, or is incremental Boehm the best we can hope for? Reducing spikes is great, but if allocation continues to hurt performance then we will continue to avoid allocations as much as humanly possible.
     
  4. liiir1985

    Joined:
    Jul 30, 2014
    Posts:
    35
    This would be the benefit of a precise GC. Boehm is a conservative collector, which means it cannot tell the difference between a real pointer and an integer value. So compacting memory is not possible with Boehm, and neither is generational marking. Precise GCs (SGen, CoreCLR's GC, the JVM's GCs) compact memory, i.e. they move live objects together to eliminate fragmentation and improve cache locality.
    But currently it's very unlikely that Unity will adopt any precise GC, because none of those work with IL2CPP. It's difficult to get stack maps out of a C++ compiler, and they are crucial for a precise GC.
    Using a precise GC at this point would mean abandoning IL2CPP and switching to a code generation system built on a JIT backend, like Mono AOT or CoreRT. CoreRT is currently not production-ready and doesn't support iOS.
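
    A minimal Python sketch of the conservative-versus-precise distinction (simplified and hypothetical: real conservative collectors such as Boehm check whether a word falls inside the heap, including pointers into the middle of an object, rather than looking it up in an exact set). The integer 8192 happens to equal the address of the second object, so the conservative scan has to keep that object alive. And because the word might be a plain integer, the collector can never rewrite it to point at a moved object, which is why compaction is off the table.

```python
def conservative_roots(stack_words, object_addresses):
    """Conservative scan: any word equal to a known object address is kept.

    The collector cannot tell a real pointer from an integer that happens to
    have the same value, so it must treat both as references.
    """
    return {word for word in stack_words if word in object_addresses}


def precise_roots(stack_slots):
    """Precise scan: a stack map says which slots actually hold pointers."""
    return {value for value, is_pointer in stack_slots if is_pointer}


objects = {0x1000, 0x2000}                            # two heap objects
stack = [(0x1000, True), (8192, False), (42, False)]  # 8192 == 0x2000, just an int

print(sorted(conservative_roots([v for v, _ in stack], objects)))  # [4096, 8192]
print(sorted(precise_roots(stack)))                                # [4096]
```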
     
    Last edited: Nov 27, 2018
  5. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    Thanks for testing! From your screenshots, though, it looks like you don't actually have vsync enabled, so the player runs at >100 fps? If you enable vsync, the GC should have a better idea of how much time it can use. If you don't, try changing the value of GarbageCollector.incrementalTimeSliceNanoseconds.

    Yes, this is a concern I share: people might offset the better time distribution by writing less optimal code, and then not benefit in the end. Though you could argue there is still a benefit if you can get to a similar result with less hard optimization work.
     
  6. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    Right now, no. But as I wrote in the linked blog post, incremental Boehm seemed like the smallest (and thus safest) step to take towards a better GC, and it should help solve the biggest problem people seem to have (spikes). Once this is shipping and stable, we will be in a better position to switch to other GC solutions, since the write barriers needed by pretty much any modern GC will be solved by then. We will continue to listen to feedback and consider future steps based on that.

    That said, no solution is a silver bullet. Unity's requirements don't necessarily match those of other software, so what works well elsewhere might not work well for Unity. E.g., users have repeatedly asked about switching to SGen, which I have been testing with, and it did not give better overall performance results in Unity content.
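
    Why does an incremental collector need a write barrier at all? Here is a minimal tri-color Python sketch (a toy model, not Unity's implementation; the Dijkstra-style insertion barrier is one common choice, and Unity's actual barrier may differ). Between two marking slices, the program stores a reference to a not-yet-marked object C into an object A that the collector has already finished scanning, then removes the only other path to C. Without a barrier the collector never revisits A, so C is freed while still in use.

```python
class ToyHeap:
    """Tri-color incremental marking with an optional insertion write barrier."""

    def __init__(self, refs, roots, use_barrier):
        self.refs = {obj: list(children) for obj, children in refs.items()}
        self.roots = list(roots)
        self.use_barrier = use_barrier
        self.marking = False
        self.marked = set()   # gray or black
        self.gray = []        # marked but not yet scanned

    def start_marking(self):
        self.marking = True
        self.marked = set(self.roots)
        self.gray = list(self.roots)

    def mark_step(self, budget):
        """Scan at most `budget` gray objects; return True when marking is done."""
        while self.gray and budget > 0:
            obj = self.gray.pop()
            for child in self.refs[obj]:
                if child not in self.marked:
                    self.marked.add(child)
                    self.gray.append(child)
            budget -= 1
        if not self.gray:
            self.marking = False
        return not self.gray

    def write_ref(self, src, dst):
        """Program stores a reference src -> dst (runs between GC slices)."""
        self.refs[src].append(dst)
        if self.use_barrier and self.marking and dst not in self.marked:
            # Dijkstra-style barrier: shade the new target so it gets scanned.
            self.marked.add(dst)
            self.gray.append(dst)

    def remove_ref(self, src, dst):
        self.refs[src].remove(dst)


def objects_freed(use_barrier):
    heap = ToyHeap({"A": ["B"], "B": ["C"], "C": []}, ["A"], use_barrier)
    heap.start_marking()
    heap.mark_step(1)          # slice 1: A scanned (black), B still gray
    heap.write_ref("A", "C")   # program moves C under the already-scanned A...
    heap.remove_ref("B", "C")  # ...and drops the edge the GC would have followed
    while not heap.mark_step(1):
        pass
    return set(heap.refs) - heap.marked   # what a sweep would free
```

    Without the barrier, `objects_freed(False)` reports C even though A still references it; with the barrier, the store itself shades C, so marking picks it up and nothing live is freed.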
     
  7. codestage

    Joined:
    Jul 27, 2012
    Posts:
    1,194
    Thanks for your reply, Jonas!

    It was actually built with the Every V Blank setting:

    upload_2018-11-27_11-40-15.png

    Though I agree the CPU graph looks unusual for a player with vsync enabled.
     
  8. snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,107
    What's the logic for sweeping with this? Does it have anything resembling generations or other knobs we can tweak?
     
  9. dadude123

    Joined:
    Feb 26, 2014
    Posts:
    773
    The only knob to tweak is the maximum time spent scanning per frame.
    No big logic changes and no generational GC yet.

    As jonas-echterhoff explained in post #6, you can view it as a preparation stage for upcoming changes that already fixes the biggest issue we have with the GC (frame time spikes).
     
  10. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    I think the profiler graph may be wrong here. The reported total frame time of ~42 ms does not match the graphed frame rate of 100-200 fps. I think there were some bugs in profiler graph rendering in 19.1; I'll check with our profiler developers.
     
  11. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    Just to make sure I'm not overpromising: there are no specific "coming changes" planned after incremental GC. GC spikes are clearly the biggest user issue with GC today, so we are setting out to fix those. Once that has landed and is out of experimental, we will listen to feedback, evaluate what the most pressing issues are, and plan further steps based on that.
     
  12. codestage

    Joined:
    Jul 27, 2012
    Posts:
    1,194
  13. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
  14. yoyobbi

    Joined:
    Nov 26, 2013
    Posts:
    15
    Thanks for clarifying. Spike reduction is definitely a great step forward, so thank you for that.

    We will continue to avoid allocating memory in order to maintain decent cache performance. I guess the good news is that all the tricks we've learned and pooling mechanisms we've built aren't about to become obsolete after all. :)

    In the future, with ECS + jobs + Burst compilation (all premised on native arrays of value types), we should be writing more cache-friendly code with less allocation.
     
  15. nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    452
    In a managed environment they will never become obsolete, even with a generational GC. Even if Unity someday gets a modern GC, you will still have to pool almost everything.
     
  16. KillHour

    Joined:
    Oct 25, 2015
    Posts:
    14
    Any ideas why enabling incremental GC doesn't seem to do anything? Even in a brand new project on 2019.1.0a11, with the only changes being setting Scripting Runtime to 4.x and enabling incremental GC in Player Settings, plus a simple test script, I'm still seeing GC run in a single frame, and there is no GarbageCollector.CollectIncremental call in my profiler.
     
  17. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    Which platform are you testing this on?
     
  18. KillHour

    Joined:
    Oct 25, 2015
    Posts:
    14
    Windows.
     
  19. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    Testing in the editor or in a player? Incremental GC is only supported in players atm. Also, how long is your GC spike? If it is very short, there might be no point in spreading it over multiple frames.
     
  20. KillHour

    Joined:
    Oct 25, 2015
    Posts:
    14
    That explains it. I was testing in the editor.
     
  21. Peter77

    Joined:
    Jun 12, 2013
    Posts:
    3,480
    There is a great Unite talk that explains why profiling in the editor might not be the best option.
     
  22. hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    25,120
    Plus, don't even trust the hardware. Throttling after intensive development and testing is quite common on anything mobile, wired or not, and sometimes even on desktops.

    A little anecdote (I'll use FPS here to be relatable, as the hard data is no longer available): I pushed the Vita so hard that initially it ran at >60 fps, but in repeated play it dropped to <50 fps, which really wouldn't do.

    So I actually ended up giving it more to do on the CPU, and the GPU took a little rest, which brought the temperature down, and it ran at a steady 60 throughout.

    It's odd. I did more work and subtracted none, yet it ran faster because the GPU wasn't trying to take off. What do I know. Obviously it's not something you'll be able to use, but it's still a funny tale I thought I'd share.
     
  23. hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    25,120
    @jonas-echterhoff Hi, just wanted to run by you whether it's possible, or even desirable, to query whether the incremental GC is currently busy, as I have some wiggle room in when I instantiate things. Here's my use case:

    1. Player moves to a new area, and the scene is loaded async (streaming, so there are no pauses for loading). The scene contains only geometry and textures, so in this case it'll usually be loading just meshes plus associated things like colliders, navmesh data etc., and if I'm not mistaken, Unity handles this well with ring buffers these days...

    2. Once this is done, I'd like to wait for the incremental GC to stop being busy, then move on to instantiating the objects the area requires, nice and slowly, perhaps time-sliced, so that the objects have time to set themselves up without doing so while the incremental GC is running. I would include these objects in the scene rather than instantiating them myself, but that seems like it would just be slower, as it would bloat the scene with a lot of repetition.
    I don't really need all these objects ready for at least a few seconds, as there will be a few seconds of player travel time guaranteed, so I can delay object construction if the GC is busy sweating.

    Is this a good or bad scheme? Your thoughts are welcome, thank you.
     
  24. snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,107
    It's generally considered a horrible practice outside of games, because second guessing what a VM will do is nearly impossible in the generic case.

    That said most apps have a single steady state and don't have the hard sync points that game loops do. So while I still cringe at the idea of making runtime logic decisions based on GC stats, I think it has a lot of merit here.

    So I would propose exposing concrete stats that actually exist, and letting you define from those what "busy" means in your context. For example, basic stats on what is allocated, what the promotion/tenuring thresholds are, and how many objects are set to be promoted/tenured in the next pass.

    On top of that, maybe a callback after each GC pass signaling that the pass is over, to better coordinate with your code logic.

    I still cringe at the idea (making runtime decisions based on the stats, that is); I think it's a minefield no matter how good you are. The stats themselves would be super useful regardless.
     
  25. hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    25,120
    Absolutely! It's totally a hack, which is kind of OK in game dev, depending. Most people will ask "why aren't you using proper frame timings to predict this?", and that's fair for optional jobs like effects and so on.

    This is quite applicable to my use case, though: tasks that are mandatory but where you have some wiggle room in when they run. So it's just about knowing a bit more about what Unity is doing in order to defer some potentially expensive operations. An incremental GC quite happily throws off your average frame timings, so you probably won't be able to tell from them when it's running.

    Generally my choices are about consistent frame times, not peaks and pits. Maybe there are better ideas, and I'd love to hear them. I'm just after a deliciously smooth experience as levels are streamed in, and some object setup and some GC are going to be part of that.
     
  26. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    I have some doubts about whether such a setup would give you any real-life benefit. Assuming you have some frame time goal (either vsync or a target FPS), the incremental GC is designed to make an effort to fit into it. So if you do more work, the incremental GC will do less work per frame and just take a bit longer, which should not hurt. What you suggest would ideally just make the incremental GC finish a bit earlier (because delaying your other work gives the GC more time), but I'm not sure that would make a difference to the player in the end.

    If you want to try it, however: there is no direct way to query whether the GC "is currently busy". I think you can get that information indirectly using a recorder, like
    `UnityEngine.Profiling.Recorder.Get("GarbageCollector.CollectIncremental")`,
    which should let you query how much time the GC spent in the last frame (it should be 0 if the GC is not busy). I think this does not require a development build and is supported in release builds, but I'm not 100% sure about that.
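
    If you do want to try the recorder approach, the gating logic might look like this Python sketch. It models only the decision; all names are hypothetical, and reading the per-frame GC time in Unity (e.g. from the recorder's elapsed-time value) is left out. Queued setup tasks run at a fixed rate whenever the previous frame shows no incremental GC time, and a cap (120 frames, about 2 s at 60 fps, an arbitrary choice) makes sure the mandatory spawns never wait longer than the guaranteed travel time.

```python
from collections import deque


class DeferredSpawner:
    """Hypothetical helper: run queued setup work only while the GC looks idle.

    `gc_ns_last_frame` is whatever per-frame GC timing you can get (for
    example from a recorder on the "GarbageCollector.CollectIncremental"
    marker). A frame counts as "busy" when that time is non-zero. The wait
    is capped so mandatory work is never postponed indefinitely.
    """

    def __init__(self, tasks, per_frame=1, max_wait_frames=120):
        self.pending = deque(tasks)
        self.per_frame = per_frame
        self.max_wait_frames = max_wait_frames
        self.waited = 0

    def on_frame(self, gc_ns_last_frame):
        """Call once per frame; returns how many tasks were run."""
        if gc_ns_last_frame > 0 and self.waited < self.max_wait_frames:
            self.waited += 1
            return 0
        self.waited = 0
        ran = 0
        while self.pending and ran < self.per_frame:
            self.pending.popleft()()
            ran += 1
        return ran
```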
     
  27. hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    25,120
    Awesome, thank you. You're probably right though, will give it a whirl anyway.
     
  28. Arthur-LVGameDev

    Joined:
    Mar 14, 2016
    Posts:
    62
    Gave it a preliminary shot this evening. It looked positive at first glance, but upon loading a larger save game file into our app via JSON.NET, it crashed without any indication of the reason in the log (the log is clean, it just exits). Loading smaller save files worked fine, and the same code works on 2018.3.x, so I'm guessing it's a bug. It could potentially be unrelated to incremental GC, though; I'm unsure, since there's no stack trace or indication of why it's exiting.

    I haven't yet tried a 2019.x build without incremental GC; I will do so in the next day or two and report back. I'm running the editor on macOS, the build targets macOS with the Mono backend, and the runtime/API is set to 4.x, as I believe is required.

    Anything else I can/should check to debug further or get a lead? A "minimal" repro project, if one ends up being needed, will be not-so-minimal in our case, unfortunately.
     
  29. brendan-vance

    Joined:
    Jan 16, 2014
    Posts:
    36
    I've been starting to time slice some UI loading logic in order to keep a somewhat-playable framerate in between different screens, and in general it has been difficult to figure out how much 'spare time' is available after Unity finishes running its various processes for a particular frame! I dunno if the GC itself should report timing stats, but some kind of universal 'Time.realtimeRemainingInTargetFrame' (that makes an effort to include GPU, async load tasks etc...) would be interesting to play with.
     
  30. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    First, please check whether this is related to incremental GC at all (easy to confirm: just make a build without incremental GC). Then we will need to work on getting a repro case.
     
  31. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    I can see how it could be useful to schedule optional user tasks in the remaining time otherwise spent waiting at the end of the frame. However, a `Time.realtimeRemainingInTargetFrame` would not be enough to solve that: if you used it during an `Update()` call, you could not know how much more time Unity would need between your `Update()` call and the end of the frame. So in addition to a way to get a remaining-time estimate, you would need a new callback invoked just before waiting for the next frame. Depending on the platform and on what is used for frame timing (hardware vsync, or software timing, for example to implement Application.targetFrameRate), that can happen at different positions in the player loop.
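
    To illustrate the point, here is a Python sketch of what such an end-of-frame callback could do with a remaining-time estimate. Both the hook and every name here are hypothetical; no such Unity API is implied. A task only starts if its estimated cost still fits after subtracting a reserve for the engine's remaining work, and tasks that don't fit stay queued for a later frame.

```python
import time
from collections import deque


def run_optional_tasks(tasks, frame_start, target_frame_s, reserve_s,
                       est_task_s, clock=time.perf_counter):
    """Drain optional tasks while the estimated frame time left allows it.

    `tasks` is a collections.deque of callables. Intended for a hypothetical
    hook that runs just before the engine waits for the next frame.
    `reserve_s` is slack left for remaining engine work, and a task only
    starts if at least `est_task_s` would remain. Returns the number run.
    """
    ran = 0
    while tasks:
        remaining = target_frame_s - (clock() - frame_start) - reserve_s
        if remaining < est_task_s:
            break
        tasks.popleft()()
        ran += 1
    return ran
```

    The per-task estimate matters: checking only `remaining > 0` would let the last task run past the frame deadline.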

    I suggest filing a feature request on https://feedback.unity3d.com
     
  32. hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    25,120
    @jonas-echterhoff If memory serves, didn't dynamic resolution scaling require a much improved timing API? I recall Unity did a whole bunch of work for this and has an API I can't remember the name of. Seems like a great fit.
     
  33. Peter77

    Joined:
    Jun 12, 2013
    Posts:
    3,480
  34. hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    25,120
    That's it.
     
  35. Kruko

    Joined:
    Jan 26, 2016
    Posts:
    246
    Hi, is this available in 2019.1 Beta?
     
  36. Peter77

    Joined:
    Jun 12, 2013
    Posts:
    3,480
    If you're referring to the Incremental GC, then you might want to take a look at the very first post of this thread:
     
  37. RichSiegel

    Joined:
    Jun 13, 2014
    Posts:
    2
    Is there any timeline for when this will be supported on PS4? We are experiencing GC related spikes on PS4 and this sounds like a dream come true for us. We'll be shipping our title within the next 3 or 4 months, is there any hope for us?
     
  38. Kruko

    Joined:
    Jan 26, 2016
    Posts:
    246
    Yes. Judging by your answer, it's an alpha-only feature for now. I'm quite excited about the feature; hopefully it will reduce the need to pool a lot of stuff.
     
  39. goran_okomotive

    Joined:
    Apr 26, 2017
    Posts:
    24
    ETA for major consoles?
    And another question: will incremental GC also help to reduce spikes during an asynchronous Resources.UnloadUnusedAssets call?
     
  40. JJJohan

    Joined:
    Mar 18, 2016
    Posts:
    174
    I'd also be curious whether there is any sense of where WebGL support stands, since I believe Boehm is also used there. Not necessarily an ETA, but rather in which order support for additional build targets is coming. We'd definitely benefit from a reduction in GC spikes for our project.
     
  41. buFFalo94

    Joined:
    Sep 14, 2015
    Posts:
    181
    It's not an alpha-only feature. Did you try downloading the beta version to see if it's missing?
    His answer just means it was introduced in 2019.1 alpha 10; it will not disappear.
     
  42. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    19.2
     
  43. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    19.2.

    No, that's unrelated.
     
  44. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    Not sure. We have it working on WebGL, but we're currently unsure if/when it's going to ship. One problem is that on WebGL, GC can only happen after the frame has completed, due to missing support for the stack introspection needed to detect whether objects are currently referenced from the stack (see "Garbage Collection considerations" at https://docs.unity3d.com/Manual/webgl-memory.html). Now, when you combine this limitation with incremental GC, it means that you can never fall back to a full GC when you need the memory "right away". And since GC is only performed for a limited time between frames, you could end up running out of memory on slow devices while being fine on faster devices, which might be problematic to deal with.
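
    A back-of-the-envelope Python model of that last point, with made-up numbers and hypothetical names: the program allocates at the same rate on both devices, but the slow device gets through less marking work between frames, so each cycle spans more frames and the heap peaks higher before anything can be freed.

```python
import math


def peak_heap_mb(live_mb, alloc_mb_per_frame, mark_work, work_per_frame):
    """Toy model: heap size before an incremental cycle can free anything.

    GC may only run between frames and gets `work_per_frame` units of the
    `mark_work` a full cycle needs, so a slower device needs more frames per
    cycle. Allocation continues meanwhile, and there is no fallback to an
    immediate full collection when memory runs short.
    """
    frames_per_cycle = math.ceil(mark_work / work_per_frame)
    return live_mb + alloc_mb_per_frame * frames_per_cycle


HEAP_LIMIT_MB = 110  # hypothetical memory budget
for device, work_per_frame in (("fast", 50), ("slow", 5)):
    peak = peak_heap_mb(100, 1, 100, work_per_frame)
    print(device, peak, "OK" if peak <= HEAP_LIMIT_MB else "out of memory")
# fast 102 OK
# slow 120 out of memory
```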
     
  45. Ivan-Pestrikov

    Joined:
    Aug 8, 2014
    Posts:
    10
    I've just switched to 2019.1.0b3 and tried some performance stress tests of my project.
    "Use Incremental GC (Experimental)" is checked.
    Built the project for Windows 64 (development build).

    GarbageCollector.isIncremental returns true. QualitySettings.vSyncCount is 1.

    But the GC works in the same "spikes" way.

    upload_2019-2-16_14-12-40.png

    upload_2019-2-16_14-14-18.png

    What am I doing wrong?

    UPD: I tried manually invoking GarbageCollector.CollectIncremental with a time of ~10 million nanoseconds (10 ms): no effect at all. It doesn't show up in the profiler; only the usual spikes pop up every 20 seconds.

    UPD1: Garbage generation is already quite optimized in my project. The long GC run time is caused by a high count of managed objects (mostly small arrays of bytes, Vector3, etc.). I will reduce their number to the minimum and run the GC test again.
     


    Last edited: Feb 18, 2019
  46. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    I'm curious to take a look at this. Any chance of extracting a repro project you could file in a bug report?
     
  47. Ivan-Pestrikov

    Joined:
    Aug 8, 2014
    Posts:
    10
    I tried to extract a minimal repro, but it's not really possible: the main scene holds dependencies on almost the whole project.
    Maybe there are some tests I can run locally?
     
  48. jonas-echterhoff

    Unity Technologies

    Joined:
    Aug 18, 2005
    Posts:
    1,553
    I'd also take a big repro, if it's possible to submit it.
     
  49. Hagn

    Joined:
    Apr 1, 2013
    Posts:
    8
    On your screenshots you're profiling the editor instead of your standalone build. Were you really profiling your build or did you post the wrong screenshots?
     
  50. Ivan-Pestrikov

    Joined:
    Aug 8, 2014
    Posts:
    10
    Done, Case 1129037. Thank you for your time!

    It's confirmed that the spike scales linearly with the managed object count.
    1.4 million objects lead to a 150 ms spike on my CPU, so I guess the viable limit is somewhere around 200k.
    I'm now focusing on merging arrays and moving everything to structs.
    The obvious choice is to move to ECS, but I'm trying to postpone it as long as possible.
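
    Taking the reported linear scaling at face value, the numbers from this post (measured on this poster's CPU) work out to roughly 107 ns per object, a ~21 ms spike at the suggested 200k limit, and about 1.6 × 10^5 objects to fill an entire 60 fps frame:

```latex
\frac{150\,\text{ms}}{1.4\times10^{6}} \approx 107\,\text{ns per object},\qquad
2\times10^{5}\times 107\,\text{ns} \approx 21\,\text{ms},\qquad
\frac{16.7\,\text{ms}}{107\,\text{ns}} \approx 1.6\times10^{5}\ \text{objects}.
```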
     