Need some guidance on finding source of memory heap issue

Discussion in 'Scripting' started by Schneider21, Nov 17, 2021.

  1. Schneider21

    Joined:
    Feb 6, 2014
    Posts:
    3,512
    We have an app that we recently discovered has an issue with memory. At first we thought it was a memory leak, which made sense because our entire application is made from tech debt. But after digging in with the Memory Profiler, it looks like the problem is actually that the heap is getting expanded constantly instead of re-using deallocated space. After 2 hours of normal usage, the heap is nearly 5GB with 4.34GB of that being empty fragmented space.



    I've done a number of profiling sessions to try to pinpoint the cause, and I have my suspicions about which systems might be responsible. But I'm stuck at this step: I can't measure definitively when the heap is being expanded, or why it expands instead of reusing an existing empty block that would fit.

    The process in question is a multi-threaded system (using Unity Jobs) that deserializes a bunch of JSON into Dictionaries of Entitas components. My reflexive reaction was to try creating class-level Dictionaries that the methods would reuse, but once I ran it, things fell apart, because the threaded tasks were now all trying to share the same collection. We need to deserialize and process these elements to display the data, and I can't think of another way to do it that doesn't allocate a ton of collections the way we do now.
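
    One direction I haven't actually tried yet: giving each worker thread its own long-lived Dictionary via ThreadLocal<T>, so the instance gets reused instead of reallocated every pass. A rough sketch of the idea (the names and capacity are made up, this isn't our real code):
    Code (CSharp):
    using System.Collections.Generic;
    using System.Threading;
    using Entitas;

    // Sketch only: each thread lazily creates one Dictionary and keeps it.
    // Clear() empties the entries but keeps the bucket storage, so repeated
    // passes reuse the same allocation instead of growing the heap.
    public static class ComponentScratch
    {
        private static readonly ThreadLocal<Dictionary<int, IComponent>> s_scratch =
            new ThreadLocal<Dictionary<int, IComponent>>(
                () => new Dictionary<int, IComponent>(256)); // guessed capacity

        public static Dictionary<int, IComponent> Get()
        {
            var dict = s_scratch.Value;
            dict.Clear();
            return dict;
        }
    }
    The catch is that a reused scratch Dictionary only works if its contents get copied or consumed before the next Get() on that thread, which doesn't fit how we currently hand the Dictionaries off in OutData.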

    Anyway, back to my actual question: how do I confirm what's causing the massive increase in fragmented heap space? And is there something I can do with these Dictionaries to force their allocations to be reused? Even if it means initializing them at a larger size or something... I'd rather take up a bunch of memory up front and have it reused than allocate the minimum each time and have that space end up fragmented and unusable the longer the app runs.
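
    To illustrate what I mean by "initializing them at a larger size" (the numbers here are invented):
    Code (CSharp):
    // A Dictionary that grows from its default size to ~32 entries reallocates
    // its internal arrays several times along the way, and each abandoned
    // array becomes garbage. Sizing it once up front avoids all of those
    // intermediate allocations.
    var components = new Dictionary<int, IComponent>(32); // capacity is a guess

    // Same idea for the nested coordinate lists:
    var ring = new List<double>(1024);                    // also a guess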

    I don't feel like this post makes any sense. Anyone with any pointers on how to solve fragmented heap issues is welcome to just throw words at me. :(
     
  2. Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    38,674
    This smells like maybe you're not allowing your past Dictionaries to be cleaned up... that can trivially happen if a script holds a reference to the Dictionary and something keeps that script alive forever, such as being subscribed to a UI delegate or added to an event.

    One common source of this is doing subscriptions to buttons or events in Awake() and then NOT unsubscribing.

    It's always best to subscribe in OnEnable() and unsub in OnDisable() to ensure stuff gets disconnected and is eligible for GC.
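
    In code, the pattern I mean looks something like this (illustrative names, obviously):
    Code (CSharp):
    using UnityEngine;
    using UnityEngine.UI;

    public class ScoreDisplay : MonoBehaviour
    {
        [SerializeField] private Button refreshButton;

        private void OnEnable()
        {
            // Subscribe when the object becomes active...
            refreshButton.onClick.AddListener(Refresh);
        }

        private void OnDisable()
        {
            // ...and ALWAYS unsubscribe when it deactivates, so the button's
            // listener list doesn't keep this script (and everything it
            // references) reachable forever.
            refreshButton.onClick.RemoveListener(Refresh);
        }

        private void Refresh()
        {
            // update the display here
        }
    }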

    Good old Awake() doesn't really have an opposite, and it only runs once in any case.
     
  3. Schneider21

    Joined:
    Feb 6, 2014
    Posts:
    3,512
    Hey, @Kurt-Dekker. Unfortunately for me, this situation is a bit more complex. We're using the Job system which I'm not overly familiar with (this code was written by someone prior to me joining the project). The main deserialization happens in here:
    Code (CSharp):
    public class DeserializeElementComponentsJob : ThreadedJob
    {
        public List<JToken> InData;
        public List<Dictionary<int, IComponent>> OutData;
        public DateTime startTime;

        protected override void ThreadFunction()
        {
            List<Dictionary<int, IComponent>> elementsInformation = new List<Dictionary<int, IComponent>>();
            foreach (JToken elementData in InData)
            {
                Dictionary<int, IComponent> components = CreateComponents(elementData);
                if (components != null)
                {
                    elementsInformation.Add(components);
                }

                if (elementData["subFeatures"].ToObject<JArray>().Count > 0 && elementData["uuid"] != null)
                {
                    foreach (JToken subFeature in elementData["subFeatures"])
                    {
                        subFeature["featureClassGroup"] = elementData["featureClassGroup"];

                        Dictionary<int, IComponent> subComponents = CreateComponents(subFeature, elementData["uuid"].ToString());
                        if (subComponents != null)
                        {
                            elementsInformation.Add(subComponents);
                        }
                    }
                }
            }

            OutData = elementsInformation;
        }

        // Other stuff omitted for brevity
    }
    CreateComponents has all sorts of allocations... The outer Dictionary<int, IComponent>, along with inner values that can be Dictionary<string, string>s or my favorite: List<List<List<double>>> for groups of polygon map coordinates.
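
    One refactor I've idly considered (not implemented, just thinking out loud) is flattening those nested coordinate lists so each element costs a couple of large allocations instead of hundreds of tiny ones:
    Code (CSharp):
    // Hypothetical replacement for List<List<List<double>>>: all coordinates
    // for an element packed into one array, with offsets marking where each
    // ring starts. Two allocations per element instead of one heap object
    // per ring per polygon, which should fragment the heap far less.
    public struct PackedCoordinates
    {
        public double[] Values;   // x0, y0, x1, y1, ... for every ring in order
        public int[] RingOffsets; // index into Values where each ring begins
    }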

    The whole thing gets kicked off from an Entitas system, which is basically just an event trigger that sets the job in motion:
    Code (CSharp):
    DeserializeElementComponentsJob job = new DeserializeElementComponentsJob
    {
        InData = elements,
        startTime = DateTime.Now
    };
    job.Start();
    It might be that the JSON returned from the API comes back in an inconsistent order, so that even when the same element is processed again after panning the map away and back to the same area (did I mention it's a map application?), the allocations don't line up in a way that lets even identical data reuse the heap space it occupied before. Or it could just be that the 5MB block it used before got partially filled by an item that only needed 1MB, and the remainder was too fragmented to be useful for anything.

    To be clear, I really don't expect anyone to solve this for me, especially given how little specific information I'm providing. I'm just running out of ideas for chasing this trail any further without rewriting our deserialization code just to see what happens.

    Thanks as always for the input.
     
  4. Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    38,674
    Of course you're welcome! I still stand by my prediction that there is something hanging onto your references. Here is my logic: C# is not a toy language. It powers the enterprise world today. I'm gonna guess it has a pretty robust GC system behind it.

    Now, TBF, a lot of that is web applications that spin up, do their thing, and terminate. But I'm gonna bet there are plenty of heavy-lifting, long-lived programs written in C# that chew up memory and live for months if not years.

    Can you extract (read: "tear apart") portions of your suspect jobbified code so that it runs and does all of its allocations, while you're very careful not to retain or use any of the output data? That may prove harder than it looks, but if you can strip the thing down so that you blast enough JSON into it to consume 4GB of RAM, release it all and do a GC, then do another blast of JSON and have it not go (much) above 4GB, that would prove that one of the things you stripped away was holding onto a reference.
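
    To actually measure that, something like this should do the trick (MeasurePass is just a helper I'm sketching here; you'd hand it your stripped-down deserialization run):
    Code (CSharp):
    using UnityEngine;

    public static class RetentionCheck
    {
        // Run one allocation-heavy pass, throw the results away, force a
        // full collection, and compare managed heap usage before and after.
        public static void MeasurePass(System.Action pass)
        {
            long before = System.GC.GetTotalMemory(true); // true = force a full GC first

            pass(); // allocate everything, keep nothing

            System.GC.Collect();
            System.GC.WaitForPendingFinalizers();
            System.GC.Collect();

            long after = System.GC.GetTotalMemory(true);
            Debug.Log($"Retained across pass: {(after - before) / (1024 * 1024)} MB");
        }
    }
    If the retained number stays near zero across repeated passes but the heap reservation still grows, that points at fragmentation rather than leaked references.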

    And who knows, there are probably heap crawlers out there that can give you better insight; I'm just not that deep in the C# language space, as I mostly use it as plain vanilla oldschool C with some classes added in.
     
  5. Schneider21

    Joined:
    Feb 6, 2014
    Posts:
    3,512
    Yeah, that's why I'm thinking it's gotta be something with the specifics of what we're doing that's wrong. Like, we're doing multithreading (using System.Threading... it's other parts of the app that use the Unity Jobs system) to do a bunch of JSON deserialization and allocate a ton of collections on the heap, plus all the entity/component creation stuff that Entitas is doing in the background... It might just be that that much memory allocation while multithreading inherently fragments things so badly that it can never be fully utilized again. More experienced enterprises -- which we are not -- know better than to do this kind of thing, perhaps.

    My day so far has been spent trying to strip the multithreading part out of our system and just run everything on the main thread to see if that changed the results, but that's turned out to be a bigger task than I expected. I'll pass what I've learned so far up the totem pole and see what the directive is.

    Thanks for being a sounding board!
     
    Kurt-Dekker likes this.