Unable to take a memory sample from bigger scenes. (Infinity loop)

Discussion in 'Editor & General Support' started by GloriaVictis, Aug 14, 2018.

  1. GloriaVictis

    GloriaVictis

    Joined:
    Sep 1, 2016
    Posts:
    133
    Hello,

    For a long time we have been struggling with an issue: we cannot memory profile our server.

    We have also tried using this tool: https://bitbucket.org/Unity-Technologies/memoryprofiler/src but it doesn't work either, as it hangs while taking a profile sample.

    The issue is that it works for small scenes, but it seems that after hitting a certain number of objects it somehow gets into an infinite loop: it either works within seconds or hangs for 24 hours until it crashes from running out of memory. We are on 2017 LTS, but 2018.2 has the exact same issue on our project.

    Has anyone had issues like that before? What could we do?
     
  2. MartinTilo

    MartinTilo

    Unity Technologies

    Joined:
    Aug 16, 2017
    Posts:
    2,456
    Hello,

    I'd think that the memory snapshot process for the old Memory Profiler (the one in the Profiler Window's Memory view under detailed) should work. There used to be a lockup in the connection that was fixed earlier this year. So the question is in which part of the process it hangs. If you could use this API to request the snapshot by code and just see if you actually get a snapshot, that would clear up if it's an issue in the UI or in the backend.

    However, the focus is currently on the new Memory Profiler that's about to ship with 2018.3 so I'd wait for that if possible.
     
  3. GloriaVictis

    Honestly, that is hardly possible for us, for two reasons. First, we are using uLink networking, which no longer works on 2018.2, so the transition will be pretty costly; I spent about 3 days just hacking our project to work on 2018.2, only to see it still hang the same way. Also, the API you referred me to is not supported on the 2017 version. Do you think there's a chance that, if I submitted the entire project as a bug report, you could backport the Memory Profiler fix so we could actually use it? Second, we cannot wait any longer.

    We have an MMORPG game, and GC.Collect after launching the server takes around 300ms (which is already too much), but after 24 hours it takes nearly 2000ms due to some leak we have been unable to track down for months because the profiler is simply not working. This is tragic, especially as our entire game is taking a heavy hit in overall reviews because we cannot profile the issues with the engine we are using.
     
  4. MartinTilo

    The doc link was for 2018.2, and switching the version does not carry over the opened API page, but the API is available in 2017.4.

    With what you described, it sounds to me like you should be looking at the CPU profiler to find out where all those GC.Allocs are coming from and reduce them.
    Have you looked at this doc yet?

    How is your managed heap growing over time? What exact version of 2017.4 are you on? Have you had a look at this tool?

    Also yes, you could just file a bug about this.
     
  5. GloriaVictis

    Hi,

    Thanks for the link to the documentation on 2017.4, I will test it asap.

    Of course we have already looked at the doc you mentioned, but our server is handling about 200 concurrent players and a few thousand other networked objects such as NPCs. So basically, no matter how much garbage we cut, we won't prevent GC.Collect from running from time to time, which causes a S***load of a spike.

    Also, we cannot do proper tests of GC.Allocs, as deep profiling shows almost only false data over a remote connection, and while hosting the server in the editor it drops to about 1 fps when profiling due to the large number of GameObjects in our scene (another profiler issue).

    We are using the latest LTS version of 2017.4. I was not aware of the tool you just linked; I am going to check it right away, thanks!
     
  6. MartinTilo

    Hi again,

    going over your issues a bit more closely again, taking a memory snapshot might not be the only thing you can do to address this. It might not even help you all that much. What it would show you is: native objects (like textures or other assets) and managed objects.

    Sure, if you're leaking memory, then it would help you discover that. But I'm guessing you're not using the unsafe keyword in C# to allocate unmanaged memory? Maybe you're using RenderTextures or something like that which could leak native memory because it is not getting disposed, but I'd assume you'd crash out of memory way faster if that was the case. Maybe you're keeping static references to managed memory so stuff does not get cleaned up by the GC. However, this should also show up on a smaller scale. Maybe you're just fragmenting your Mono Memory by constantly allocating on the managed heap and letting the GC clean most of it up but not all of it. Then, the GC has to go over larger and more fragmented pieces of memory, with each iteration taking longer and longer.

    To check if you're actually leaking and if it is native or managed, you don't necessarily need to take a snapshot. You don't need deep profiling or potentially even attaching the profiler.

    In a debug build of your server, you could periodically (e.g. every 15 min or half hour) turn on Profiler.enableBinaryLog (check the linked docs for more detailed instructions) and specify a different log file location each time. Since the Profiler Window currently only allows for showing the last 300 frames of that capture, there is no need to capture more frames than that each time. Now, when loading these captures into the Profiler Window, you can check the stats in the Memory Profiler Chart of the Profiler Window. Do the numbers there go up or do they stay mostly constant over time? Also, if some go up, which ones?
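
    A minimal sketch of such a periodic capture, assuming a development build; the component name, the interval and the output folder are hypothetical choices for illustration, not something taken from the project in question:

    ```csharp
    using System.Collections;
    using System.IO;
    using UnityEngine;
    using UnityEngine.Profiling;

    // Hypothetical helper: every intervalSeconds it records a short binary
    // profiler log into a new file, then switches the profiler off again.
    public class PeriodicProfilerCapture : MonoBehaviour
    {
        public float intervalSeconds = 15f * 60f; // assumed: one capture every 15 minutes
        public int framesPerCapture = 300;        // the Profiler Window shows the last 300 frames

        IEnumerator Start()
        {
            int captureIndex = 0;
            while (true)
            {
                yield return new WaitForSecondsRealtime(intervalSeconds);

                // Use a different log file location for each capture.
                Profiler.logFile = Path.Combine(Application.persistentDataPath, "capture_" + captureIndex);
                Profiler.enableBinaryLog = true;
                Profiler.enabled = true;

                // Let the profiler record the requested number of frames.
                for (int frame = 0; frame < framesPerCapture; frame++)
                    yield return null;

                Profiler.enabled = false;
                Profiler.enableBinaryLog = false;
                captureIndex++;
            }
        }
    }
    ```

    The small allocations made once per capture (the path string, the wait object) should be negligible at this interval, but they will show up in the first frame of each log.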

    You can also use these methods to get an idea of your memory usage:

    // Reserved Mono Memory
    UnityEngine.Profiling.Profiler.GetMonoHeapSizeLong()

    // Used Mono Memory
    UnityEngine.Profiling.Profiler.GetMonoUsedSizeLong()

    // Reserved Native Memory
    UnityEngine.Profiling.Profiler.GetTotalReservedMemoryLong()

    // Used Native Memory
    UnityEngine.Profiling.Profiler.GetTotalAllocatedMemoryLong()

    Remember, if you use Debug.Log to output those, you're aggravating your GC.Alloc problem, so use that sparingly and maybe not while capturing the binary log. If your server is not running headless, maybe you can display it on the screen. However, converting numbers to text and changing the UI frequently is not usually a very performant and GC.Alloc-free operation either. There are suggestions for how to implement a StringBuilder that does this in a GC.Alloc-free way out there, e.g.:
    http://www.gavpugh.com/2010/03/23/xnac-stringbuilder-to-string-with-no-garbage/
    http://www.gavpugh.com/2010/04/01/xnac-avoiding-garbage-when-working-with-stringbuilder/
    http://www.gavpugh.com/2010/04/05/xnac-a-garbage-free-stringbuilder-format-method/

    Also, you might want to use TextMeshPro, since the default uGUI Text fields regenerate the text mesh every time, allocating garbage.
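
    One way to sidestep the text conversion while the server runs is to store the raw numbers and only write them out at shutdown. This is a rough sketch under that assumption; the class name, buffer size, interval and file name are hypothetical:

    ```csharp
    using System.IO;
    using System.Text;
    using UnityEngine;
    using UnityEngine.Profiling;

    // Hypothetical helper: samples the four memory counters into preallocated
    // arrays (no per-sample allocations) and writes a CSV when the app quits.
    public class MemoryTrendRecorder : MonoBehaviour
    {
        const int Capacity = 4096;                // assumed upper bound on samples
        public float sampleIntervalSeconds = 60f; // assumed: one sample per minute

        readonly int[] seconds = new int[Capacity];
        readonly long[] monoReserved = new long[Capacity];
        readonly long[] monoUsed = new long[Capacity];
        readonly long[] nativeReserved = new long[Capacity];
        readonly long[] nativeUsed = new long[Capacity];
        int count;
        float nextSampleTime;

        void Update()
        {
            float now = Time.realtimeSinceStartup;
            if (count >= Capacity || now < nextSampleTime)
                return;
            nextSampleTime = now + sampleIntervalSeconds;

            seconds[count] = (int)now;
            monoReserved[count] = Profiler.GetMonoHeapSizeLong();
            monoUsed[count] = Profiler.GetMonoUsedSizeLong();
            nativeReserved[count] = Profiler.GetTotalReservedMemoryLong();
            nativeUsed[count] = Profiler.GetTotalAllocatedMemoryLong();
            count++;
        }

        void OnApplicationQuit()
        {
            // String building happens only once, at shutdown.
            var csv = new StringBuilder("seconds,monoReserved,monoUsed,nativeReserved,nativeUsed\n");
            for (int i = 0; i < count; i++)
            {
                csv.Append(seconds[i]).Append(',')
                   .Append(monoReserved[i]).Append(',')
                   .Append(monoUsed[i]).Append(',')
                   .Append(nativeReserved[i]).Append(',')
                   .Append(nativeUsed[i]).Append('\n');
            }
            File.WriteAllText(Path.Combine(Application.persistentDataPath, "memory_trend.csv"), csv.ToString());
        }
    }
    ```

    Note that nothing is written if the process crashes or is killed, so for a server that dies from running out of memory you would have to flush the buffer periodically instead.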

    If you want to know in more detail where your GC.Allocs come from, but can't or don't want to rely on deep profiling, you can use a Divide and Conquer methodology to narrow it down. Check where most of the allocations happen in the profiles you captured with the Binary log method and then divide the methods that you can see there up with CustomSamplers. For example, you could put a CustomSampler into every bigger method you're calling from the one you can see in the profiler, basically selectively going deeper in the callstack for where you need to investigate. If that doesn't help narrowing it down, go even deeper and delete those Samplers again under which there were no allocations to reduce overhead. This is a bit more of a manual process than what we'd like it to be but at least it should help you get somewhere with the investigation.
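
    As an illustration of that divide-and-conquer step, here is a small sketch. The class and the two method names are made up for the example; only CustomSampler.Create, Begin and End are the actual API:

    ```csharp
    using UnityEngine;
    using UnityEngine.Profiling;

    // Hypothetical system whose Update shows up with GC.Alloc in the profiler.
    // Each bigger call gets its own sampler so the allocations can be attributed.
    public class MovementSystem : MonoBehaviour
    {
        CustomSampler moveSampler;
        CustomSampler rpcSampler;

        void Awake()
        {
            // Create the samplers once and reuse them every frame.
            moveSampler = CustomSampler.Create("MovementSystem.MoveCharacters");
            rpcSampler = CustomSampler.Create("MovementSystem.SendRpcs");
        }

        void Update()
        {
            moveSampler.Begin();
            MoveCharacters();
            moveSampler.End();

            rpcSampler.Begin();
            SendRpcs();
            rpcSampler.End();
        }

        // Placeholders for the project's real work.
        void MoveCharacters() { }
        void SendRpcs() { }
    }
    ```

    Once one of the samples turns out to contain the allocations, the same pattern can be repeated inside that method, and the samplers around the clean calls can be removed.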

    Maybe you can't reduce your GC.Allocations to zero. Maybe you need to consider periodically spinning up a new server instance of the game to take over with a clean memory slate. But reducing your allocations should help the server run for longer before it runs into problems.

    Also, if we can figure out a way for you to share the project with us, we still do want to fix the bug preventing you from capturing and maybe this will help figuring out what happens. What I'm suggesting are alternatives and things you could try in the meantime :)
     
  7. GloriaVictis

    Hi,

    Thank you for your reply,

    Actually, we do know where our GC allocations are, but due to the size of the project, with about 200-300 players and many more NPCs on the server at the same time, our GC allocators are things like .Move on CharacterController and sending RPCs, where the GC allocation happens at a very, very deep level.

    We know we could rework the entire controller and move to Rigidbody to lower it, but we cannot spend so much time fighting GC allocations when we still have some kind of leak we cannot find because we are unable to take a memory sample from our server.

    Here is a sample from a server that has been running for about an hour:
    https://i.gyazo.com/8bc96c6239de4878e40a050ed1b68ffb.png - GC.Collect takes about 100ms, which is not a problem at all; players do not even notice it thanks to lag compensation.

    Here is a sample from a server that has already been running for a few hours (before we restart it):
    https://i.gyazo.com/704a81cad99c37c50fd33560245872fa.png - GC.Collect takes about 600ms, which is actually killing gameplay performance.

    As you can see, the only real difference is in Mono, which we cannot profile properly, as I detailed here:
    https://forum.unity.com/threads/wip...analyzer-for-unity.527949/page-3#post-3817459

    By the way, do CustomSampler and/or Profiler.BeginSample add overhead when running in a Development build without the profiler attached?
     
    Last edited: Oct 25, 2018
  8. MartinTilo

    Right, that does look like you're leaking managed memory. I've replied to the other thread as well and am poking around a bit. Thanks for bearing with us as we're trying to figure this out.

    To your questions:
    There is overhead in development players as the code can only be parsed out with a recompile. So whether the profiler is attached or not, having those calls in your code does come with a bit of overhead. That said, when the profiler is disabled/not attached the overhead is reduced. The CustomSampler API comes with less overhead than Profiler.BeginSample, since (among other things) the latter has to emit the name as metadata and the former just creates a sample id.

    For release players, the overhead is very minimal as Profiler.BeginSample(), Profiler.EndSample(), CustomSampler.Begin() and CustomSampler.End() calls are compiled out as they are marked by the Conditional attribute. Creation in a release build is no more than the method call overhead for calling into the bindings layer, one call to native and an IntPtr compare.
    There is a small memory overhead of the CustomSampler API of one pointer size wherever a created sample is stored, plus the name string (which isn't stored in the CustomSampler in release).
     
  9. GloriaVictis

    Thank you, that information is very helpful, especially for the low-level optimizations we are doing.
    By the way, could you give us more insight into the overall performance difference between development and release builds? As we are doing lots of profiling, we have so far always built development builds, but if the performance difference is real, on top of things like the CustomSampler/Profiler.BeginSample() methods, we might consider actually using release builds most of the time.

    Thank you in advance!
     
  10. MartinTilo

    The performance difference is real though I can't tell you by how much. One other thing that can have an impact is the option to Attach for Script Debugging (When building a player, the debug build option "Script Debugging", for the Editor Preferences>External Tools>Editor Attaching) as this injects code that checks if an attempt to connect has been made and if so, accept it. For better benchmarking results when profiling a development player, that option should be turned off.
     
  11. GloriaVictis

    Thank you for your answer. By the way, I have been able to cut down the project and send it to your QA; they replied that they have a reproduction. Hopefully it will be fixed pretty soon! Fingers crossed!
     
  12. GloriaVictis

    Hello @MartinTilo, is there any news on that case by any chance? It has been a while already, so any update would be great.
     
  13. MartinTilo

    Hey there,
    at first we assumed the connection might time out, which would have needed a very involved fix, but it turns out that it doesn't.
    The status right now is threefold:
    • Regarding the crash: a fix for that is in the process of landing.
    • Regarding the infinite loop: we've been able to capture, which took 2 1/2 hours due to the sheer amount of managed objects and the connections between them.
    • Regarding the out of memory crash: we've managed to capture the above-mentioned snapshot on a laptop with 32 GB RAM. It might well be that a system with less RAM or a non-stripped version of the project might run out of memory.
    In 2018.3, we've changed the backend to be streaming the native memory capture to reduce the RAM usage during capture. Since capturing managed memory seems to be the biggest issue, that might not be enough. We'll be looking at reducing RAM usage even further by streaming the managed memory as well in future versions of the backend, as well as streaming to disk and nearly entirely avoiding RAM in especially dire circumstances. Sadly there is no way to backport these backend changes to 2017.4.
     
  14. GloriaVictis

    Hello,

    Thank you for your reply. I am a bit surprised you were able to take a sample, as we couldn't with 32 GB on the same project. Also, we thought it was an infinite loop because, in our tests, it either eats 4-6 GB of RAM at peak or eats all 32 GB and crashes.

    Do you know whether you ran those tests on 2018.3 or 2017.4? It looks like we will need to move to 2018.3 anyway, so it would be super cool if you could update us when a few more fixes are in place that would let us do proper profiling. The transition to 2018.3 is very, very painful because our network backend (uLink based) is so obsolete, so we would like to be sure it is worth it.

    Also, our only tactic is to strip features out of the scene, test them, find leaks and then run the next tests with the next features. It is Sisyphean work: we have fixed more than 100 places in code like that, but we cannot simulate proper gameplay when we need to cut about 95% of components to make it work. Is there any tip on how we could avoid those issues? For instance, are there certain things that contribute most to the problem with taking a profiler sample?
     
    Last edited: Dec 13, 2018
  15. alexrvn

    alexrvn

    Unity Technologies

    Joined:
    May 16, 2017
    Posts:
    53
    Hey!
    When running the tests (2017.4) I grabbed two snapshots: the first one took around 2 h 45 min and the second around 3 h+ (I didn't time that one). So the main offender here is the sheer number of managed and native objects. Part of the aforementioned time is spent crawling handles, which at least in my tests (with the repo you guys provided, which was stripped) took approx. 30-45 min. The largest amount, though, was spent just going through the liveness states of the native objects (think anything that inherits from UnityEngine.Object). We're planning to take a look at that for both 2018.4 and 2017.4, as improving it should speed up the capture on massive projects. On the memory overhead side of things, I doubt we can do anything to get the capture streaming into 2017.4; I'm hopeful for 2018.4.
    My advice for getting snapshots out of the current system would be chunking by areas and then by systems when testing (if possible). I'd also suggest trying the "Memory" tab of the Profiler and attempting to take memory samples (while somewhat limited in information compared to the memory profiler captures, it should provide at least some minimal amount of data to work with).
    As a side note, the main issue when taking a capture (where the application pushes the limits of the memory available on the device) is the copying of the managed heap (it gets duplicated and written to the snapshot object). This is hard to get around, since most allocations in your scripts are usually managed. On your stripped-down repo project this was not really an issue, as not that many managed allocations were in play; I suppose on the actual non-stripped project this is completely different.

    Last but not least, reading this thread made it sound like you guys deployed your server as a development build. That is not alright, as development builds add quite a bit of overhead versus release builds. I'd also suggest trying IL2CPP builds for the server, as that should give a performance boost (in general; I do not know the code base in question here, so it might not have too much of an impact).
     
  16. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,618
    My observations are that the connections array in a memory snapshot often contains an absurd amount of elements, like 509.117.918 elements and many of these connections seem wrong to me.

    For example, native mesh objects have 700 references to all sorts of different unrelated Components in one memory snapshot a user sent me. This really does not seem right to me, if a Mesh has a reference to an AudioSource for example.

    Were you able to observe and reproduce these connection issues too?

    I'm talking about this one for example:
    https://forum.unity.com/threads/wip...analyzer-for-unity.527949/page-2#post-3619123
     
  17. alexrvn

    Not as of yet, that sounds really weird ... having a mesh with references to [many] instead of being referenced by [many] :-/.

    What I'd expect to see in such a case would be:
    We have a MeshFilter. This MeshFilter is referenced by the GameObject it is attached to, and that GameObject is also referenced by, say, an AudioSource (on the GameObject). The pattern here is that each Component has a reference to its GameObject and each GameObject has an array of components. Now let's walk the hierarchy: the AudioSource has a reference to the GameObject it is attached to, which in turn references your MeshFilter, which finally references the Mesh.

    Edit:
    We will be improving the way we report references so that it is easier to understand.
     
    Last edited: Dec 14, 2018
  18. GloriaVictis

    Thank you very much for the reply. We can get a machine with any amount of RAM needed, but the project you received was already stripped as much as possible. We have tried running the memory sample capture for 24 hours and it crashed out of RAM at 32 GB (maybe we should try going even higher).

    We will look into IL2CPP in more detail, but it seems not to be available for Windows builds in 2017.4?

    What is stopping us from moving off the development build is mostly the lack of a scripts-only build option outside development builds, which we need for fast hotfixes. As our server takes almost an hour to build and the client almost two hours, we waste a lot of time that a scripts-only build without a development build would save.
     
  19. alexrvn

    https://docs.unity3d.com/2017.4/Documentation/Manual/IL2CPP-BuildingProject.html IL2CPP is available afaik, it needs to be installed when you install/change Unity
     
  20. GloriaVictis

    Seems like it is, but only for Android and Windows Store; I cannot find that option for Windows itself.
     
  21. GloriaVictis

    Hello,

    We have built a machine with 64 GB of RAM, and I tried to flatten the hierarchy by removing every possible parent and moving the children to a "null" parent. It seems we got a little bit "closer", but it still crashes on:

    0x000000014093B373 (Unity) DeserializeNativeObjects
    0x000000014093B708 (Unity) MemorySnapshots::DeserializeSnapshot
    0x0000000141214AC7 (Unity) EditorProfilerConnection::HandleMemorySnapshotDataMessage
    0x00000001407FCCD3 (Unity) GeneralConnection::poll
    0x00000001407FF48E (Unity) EditorConnection::pollWithCustomMessage
    0x00000001411F3636 (Unity) Application::TickTimer
    0x000000014141B35F (Unity) MainMessageLoop
    0x000000014141CC1C (Unity) WinMain
    0x0000000141E75618 (Unity) __tmainCRTStartup
    0x00000000777359CD (kernel32) BaseThreadInitThu

    I have also uploaded dump files. I don't know if they will be helpful, but we are sadly still stuck.

    Is there any tip on what we could try?
     

  22. GloriaVictis

    @alexrvn @MartinTilo, is there any chance you have used the 2018.3 profiler on a project such as ours? Is there a chance we would be able to take memory snapshots of really big "leaked" samples to find out what caused them?

    I am asking because moving to 2018.3 would mean weeks of switching for us. We will probably make that move with 2019.1 anyway, but having that knowledge would make us move as soon as possible.
     
  23. alexrvn

    @GloriaVictis We haven't used the 2018.3 memory profiler for anything with a memory footprint as large as yours. What I'd love to do is find out how it crashed on your end when de-serializing as that might be yet another issue with the old back-end. I would suggest moving to 2018.3 either way and switching to 2018.4 LTS would be way better than going to the 2019 tech stream as you already have a live game. The only question would probably be how painful would the migration be in either case.
     
    Last edited: Feb 21, 2019
  24. GloriaVictis

    2019 has a great feature in the new GC, which might actually be a remedy for the issues we are trying to profile memory for (as GC.Collect now takes 700ms after a while). Do you know if it will be introduced in 2018.4 LTS by any chance? @alexrvn
     
  25. MartinTilo

    I highly doubt it will get backported to 2017.4 as it was introduced as experimental. Bringing an experimental change into a stabilizing and released version would be too risky.

    Depending on how frequently you get GC.Collect calls, I also doubt that it would solve the problem. If you accumulate allocations faster than they can be processed by splitting the GC.Collect over multiple frames, it will just fall back to processing them all at once instead of time-sliced. The addition of the Incremental GC feature does not suddenly make all issues with managed allocations go away. You'd still want to reduce the amount you allocate frame over frame, ideally to 0.

    That said, if you are okay to be on the Tech Stream and update through to 2019.LTS, it might be worth the risk/a try.
     
  26. GloriaVictis

    @MartinTilo thank you for your reply! Actually, we have been able to cut almost all of the GC.Alloc. The biggest remaining bottleneck for both GC.Alloc and CPU is CharacterController.Move (around 0.1ms and 0.1KB per moving character per update, even in an empty project) when we have 300-400 players online on the server. Luckily, even then GC.Collect now runs only about every 30-60 seconds, which should probably let the new GC work properly, I hope at least.
     
  27. MartinTilo

    Yeah, I guess that should leave enough time. And yes, some APIs sadly make it really hard to get the amount of allocations down... :(