
Profiling background threads

Discussion in 'Scripting' started by bekatt, Mar 10, 2018.

  1. bekatt

    bekatt

    Joined:
    Jan 3, 2018
    Posts:
    3
    Hi all,

    I recently started using the built-in Unity Profiler to better understand the performance of my game. I really like the detailed information I get about code running on Unity's main thread (I'm using deep profile, and I like to inspect all the function times and memory allocations for functions I've written). It seems I don't really get any information about work running on my background threads though. The one exception is that I can manually call Profiler.BeginThreadProfiling() to log some timing data for background threads.

    Is there a way I can automatically get deep profile to provide the same level of detail about background threads as the Unity main thread? I'd like to have the ability to inspect all function timings and memory allocations. If this info isn't available in the Unity profiler, is there another recommended way of obtaining this data?

    Thanks!
    Ben
     
  2. dgoyette

    dgoyette

    Joined:
    Jul 1, 2016
    Posts:
    4,195
    There's a dropdown on the Profiler to switch the bottom display to Timeline view, which shows stats on worker threads:

    [Screenshot: the Profiler window with the lower pane switched to Timeline view]
     
  3. bekatt

    bekatt

    Joined:
    Jan 3, 2018
    Posts:
    3
    Thanks - I'm aware of the Timeline view. It seems like it only gives me a small fraction of the info available in the Hierarchy view though. Specifically, I didn't see any of the following:
    - memory allocations from background threads
    - nested timing info. For instance if I have something like the following:



    using System.Threading;
    using UnityEngine;
    using UnityEngine.Profiling;

    class SomeMonoBehaviour : MonoBehaviour {
        private CustomSampler sampler;

        private static void a() {
            b();
        }

        private static void b() {
            Thread.Sleep(5);
        }

        private void Start() {
            // CustomSampler instances are created via CustomSampler.Create()
            sampler = CustomSampler.Create("CustomSampler");
            var t = new Thread(() => {
                Profiler.BeginThreadProfiling("My threads", "My thread 1");
                sampler.Begin();
                a();
                sampler.End();
                Profiler.EndThreadProfiling();
            });
            t.Start();
        }
    }


    This is just a crude example to demonstrate my point - it seems that something like this will only tell me that a() took about 5 ms. It won't tell me that b() was invoked, or that the 5 ms was actually spent in the Sleep call. It also won't tell me whether a() or b() allocated any memory. Is there a way for deep profile to automatically tell me that b() or Sleep() was invoked?

    My use case is quite a bit more involved than this trivial example - I'm including an external library (in the form of a dll and pdb). The external library makes background threads which I'd like to profile in Unity.
     
    Last edited: Mar 10, 2018
  4. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    We also finally got around to trying thread profiling, but were immediately disappointed to see that the threads really only get added to the Timeline view (which, on top of that, is buggy: our thread was hidden behind the horizontal scrollbar most of the time).
    The Timeline view is nice for a quick overview, but what's really important is the hierarchy, especially, as said above, for the GC allocations. After all, execution time can (at least roughly) be measured manually; GC allocations cannot.

    Probably a question for @alexeyzakharov: Is there anything that we can/need to do to get more data out of this? If that's currently (2017.3) not possible, are there any plans to get that stuff done in an upcoming build?

    Regards,
    Chris
     
  5. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Hi @alexeyzakharov: Can we get any update on what the thread profiling is supposed to be like?

    Regards,
    Chris
     
  6. Nigey

    Nigey

    Joined:
    Sep 29, 2013
    Posts:
    1,129
    Hi bekatt,

    I might be jumping the gun by not reading everything, but you'll get much more detailed information from this: https://bitbucket.org/Unity-Technologies/memoryprofiler. I'm not sure whether you're already using it, but the basic Unity Profiler will not give you what you want, IMO. I'd try this and see whether it's a better fit.

    It is purely a memory profiler, so obviously you're still missing the threads' activity, but the memory allocation data might still be handy.
     
    Last edited: Apr 6, 2018
  7. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Hi!

    2018.2 will have improvements which should help with thread profiling:
    1. A scrollable Timeline view where you can properly see threads.
    2. A plugin API which you can use to dump the data to your own format and analyze it later (e.g. Chrome trace format), or forward it to other tools (e.g. VTune, PIX).

    Besides that (a bit later) we are enabling automatic profiling for Mono-spawned threads and building a better API to get profiler data in the Editor, to enable custom visualization possibilities.

    Unfortunately, in 2017.3 you can get thread data only through an undocumented Editor API, and the quality of that data is what you see in the Timeline view.
     
    Nigey and Alloc like this.
  8. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Hi @alexeyzakharov ,

    thanks a lot for the info, that's good news. Unfortunately the main question of this thread remains (though I didn't make it that explicit when I mentioned you in my previous post ;) ): is there currently any way to get the GC information for profiled background threads? If not, will there be one?

    Regards,
    Chris
     
  9. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    As I said, in 2017.3 the only way is to use the undocumented API:
    1. UnityEditorInternal.ProfilerFrameDataIterator to find "GC.Alloc" samples - faster, but without allocation sizes.
    2. UnityEditorInternal.ProfilerProperty (propertyName + GetColumnAsSingle()) - slower, but with all the data.
    This API allows you to enumerate all captured threads (see the sketch below).
    Note that custom C# threads must call Profiler.BeginThreadProfiling to enable data collection during profiling.
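    For illustration, here is a minimal Editor-only sketch of option 1. ProfilerFrameDataIterator lives in UnityEditorInternal and is undocumented, so the member names used here (GetThreadCount, SetRoot, Next, name) are assumptions based on this thread and Unity's reference source, and may change between versions.

    Code (CSharp):
    using UnityEditor;
    using UnityEditorInternal;
    using UnityEngine;

    // Hedged sketch: count "GC.Alloc" samples per thread in the last captured frame.
    public static class GcAllocScan {
        [MenuItem("Tools/Scan GC.Alloc samples (sketch)")]
        static void Scan() {
            int frame = ProfilerDriver.lastFrameIndex;   // most recent captured frame
            var iter = new ProfilerFrameDataIterator();
            int threadCount = iter.GetThreadCount(frame);
            for (int t = 0; t < threadCount; t++) {
                iter.SetRoot(frame, t);
                int count = 0;
                while (iter.Next(true))                  // depth-first over all samples
                    if (iter.name == "GC.Alloc")
                        count++;
                Debug.Log("Thread " + t + ": " + count + " GC.Alloc samples");
            }
        }
    }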
     
  10. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Unless I'm missing something, this doesn't seem to work though.
    I added the BeginThreadProfiling call to the start of the thread and even added a custom sampler for that thread, but after gathering all the data with ProfilerProperty there's no mention of the name passed to BeginThreadProfiling, the custom sampler name, or any of the functions called in the thread. I do see a lot of GC.Alloc samples, but those all seem to be regular main-thread allocations.
     
  11. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Hi @alexeyzakharov , wondering if you saw my reply :)
    I'm still not sure what we're doing wrong (or whether it's a Unity issue), but as I said, iterating the ProfilerProperty data still does not return anything that isn't main-thread related.
     
  12. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    567
    I faced exactly the same problem, but with Tasks.
     
    KAJed likes this.
  13. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Gotta bump this. If there is a way to get the thread's profiler data I'd really like to know how :)
     
    alexeyzakharov likes this.
  14. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Hi, sorry for the late reply, I was on holiday.

    Yes, my bad, ProfilerProperty can only get main-thread data. The only Editor API which can walk through profiler thread data is UnityEditorInternal.ProfilerFrameDataIterator. You can see a usage example at https://github.com/Unity-Technologi...or/ProfilerWindow/ProfilerTimelineGUI.cs#L679

    Besides that, if you are only interested in the fact that an allocation happened, the native plugin API might help in 2018.2 (draft doc about it here https://docs.google.com/document/d/...1-EWxU7CaX5Ik1v6Y/edit#heading=h.dpur7r7omyg0). It is not yet capturing "GC.Alloc" samples, but we hope to get that into 2018.2 soon, as it is an important feature.
     
    codestage likes this.
  15. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Yeah, ProfilerFrameDataIterator works fine indeed :)
    If I get this right, I will get one sample per call of a specific method (instead of the combined data per sample path I can get from ProfilerProperty)? If so, this wouldn't be hard to "fix" by adding up the data for identical sample paths.
    But what it's really missing is indeed the allocations. It would be really cool if that could be added.

    Even though allocations are one of the major concerns, the other data (execution time, number of calls) is still important too.
     
  16. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Also, currently we can't use Profiler.BeginSample on other threads. Will this be added so that we can mark areas of threads to give us a sample point?
     
  17. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Since 2018.1 there is an internal UnityEditorInternal.Profiling.FrameDataView class which is used in the Hierarchy view and allows you to get what you want on other threads. Usage examples are at https://github.com/Unity-Technologi...rEditor/ProfilerWindow/ProfilerWindow.cs#L721 and https://github.com/Unity-Technologi...filerWindow/ProfilerFrameDataTreeView.cs#L119
    We've planned to make it public, but don't have an exact timeline yet.

    It (Profiler.BeginSample) is marked [ThreadSafe] in 2018.2. Please file a bug for 2018.1 if you need it before then.
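    To make the shape of that API concrete, here is a hedged sketch of walking one thread's hierarchy with FrameDataView. The constructor arguments (view type, frame index, thread index, sort column, sort ascending) and the member names used below (GetRootItemID, GetItemChildren, GetItemFunctionName, GetItemColumnDataAsSingle) are assumptions pieced together from this thread; the class is internal and undocumented, so in practice you need reflection or a modified assembly to reach it, and the names may differ between Unity versions.

    Code (CSharp):
    using System.Collections.Generic;
    using UnityEditorInternal;
    using UnityEditorInternal.Profiling;
    using UnityEngine;

    // Hedged sketch: dump one thread's sample hierarchy, including accumulated GC bytes.
    static class ThreadHierarchyDump {
        public static void Dump(int frameIndex, int threadIndex) {
            var view = new FrameDataView(ProfilerViewType.Hierarchy, frameIndex,
                                         threadIndex, ProfilerColumn.TotalTime, false);
            try {
                DumpItem(view, view.GetRootItemID(), 0);   // root item id accessor is assumed
            } finally {
                view.Dispose();                            // frees the native buffers, per this thread
            }
        }

        static void DumpItem(FrameDataView view, int id, int depth) {
            float gcBytes = view.GetItemColumnDataAsSingle(id, ProfilerColumn.GCMemory);
            Debug.Log(new string(' ', depth * 2) + view.GetItemFunctionName(id) +
                      "  GC: " + gcBytes + " B");
            var children = new List<int>();
            view.GetItemChildren(id, children);            // overload taking a List, per this thread
            foreach (var child in children)
                DumpItem(view, child, depth + 1);
        }
    }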
     
  18. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Forgot to mention - CustomSampler can be used on threads without restrictions.
    It is the preferred API, as it can be used with Recorder and is more compact in the data stream, since we transfer only the named marker's id.
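    As an illustration of that combination (public API; the assumption here is that the Recorder picks up samples recorded on worker threads, as suggested above):

    Code (CSharp):
    using System.Threading;
    using UnityEngine;
    using UnityEngine.Profiling;

    // The sampler marks work on a worker thread; the Recorder reads the
    // accumulated timings back on the main thread on later frames.
    public class WorkerSamplerExample : MonoBehaviour {
        CustomSampler sampler;
        Recorder recorder;

        void Start() {
            sampler = CustomSampler.Create("MyWorkerSample");
            recorder = sampler.GetRecorder();
            recorder.enabled = true;

            new Thread(() => {
                Profiler.BeginThreadProfiling("My Threads", "Worker 1");
                sampler.Begin();
                Thread.Sleep(5);          // stand-in for real work
                sampler.End();
                Profiler.EndThreadProfiling();
            }).Start();
        }

        void Update() {
            if (recorder.sampleBlockCount > 0)
                Debug.Log("MyWorkerSample: " + recorder.elapsedNanoseconds / 1e6 + " ms");
        }
    }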
     
  19. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Just nitpicking here: Unity hard-crashes when calling GetItemPath (probably would with some others too) on the RootItemID.
    Code (CSharp):
    1.  
    2. 0x0000000141FF3CA6 (Unity) FrameDataView_CUSTOM_GetItemFunctionName
    3. 0x0000000052C2C915 (Mono JIT Code) (wrapper managed-to-native) UnityEditorInternal.Profiling.FrameDataView:GetItemFunctionName (int)
    4. 0x0000000052C2C58A (Mono JIT Code) [C:\buildslave\unity\build\Modules\Profiler\Editor\Public\ProfilerFrameDataView.bindings.cs:197] UnityEditorInternal.Profiling.FrameDataView:GetItemPath (int)
    Other than that, FrameDataView seems to work. Would it be possible to make the class public in the 2018.1 cycle though? It's not really nice having to modify UnityEditor.dll to use it (and reflection wouldn't exactly make the process faster ;) ).
     
    alexeyzakharov likes this.
  20. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Also, are the GC.Alloc samples basically what was previously attributed to the method that did the allocation?
    E.g. before, a method that created a new array could have had 40 bytes as its self allocation; now the method would instead have a child GC.Alloc entry with 40 bytes?
    If so, if we wanted the old behaviour back, we would just move that value from the GC.Alloc entries to their immediate parents?
     
  21. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Thanks for reporting this!

    I'm glad it works :)
    I've tried to make it public, but was short on time. I can try one more time, but no guarantee.

    The Hierarchy view (FrameDataView with the ProfilerViewType.[Raw]Hierarchy parameter) accumulates GC.Alloc over the hierarchy, so GetItemColumnDataAsSingle(id, ProfilerColumn.GCMemory) should return the accumulated value for all children.
    GC.Alloc has been made an explicit sample so that it is searchable and so that a callstack can be attached to it (at which point it might contain intermediate hints to e.g. boxing).
     
  22. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Great, thank you :)

    Another "issue" I just stumbled upon:
    Previously I was creating a new FrameDataView for each frameIndex (I can't seem to reuse one), gathering the data from the frame, then explicitly calling Dispose on it and also running GC.Collect. This seemed to work fine, but the GC.Collect takes "ages" (i.e. the total extraction of data for a simple test run took about 23 s, of which only about 400 ms were the actual data gathering; the rest was attributed to the GC.Collect call). So I tried removing it and immediately got this crash:
    Code (CSharp):
    1. DynamicHeapAllocator allocation probe 1 failed - Could not get memory for large allocation 23068672.
    2. DynamicHeapAllocator allocation probe 1 failed - Could not get memory for large allocation 23068672.
    3. DynamicHeapAllocator allocation probe 2 failed - Could not get memory for large allocation 23068672.
    4. DynamicHeapAllocator allocation probe 3 failed - Could not get memory for large allocation 23068672.
    5. DynamicHeapAllocator allocation probe 4 failed - Could not get memory for large allocation 23068672.
    6. DynamicHeapAllocator out of memory - Could not get memory for large allocation 23068672!
    7. Could not allocate memory: System out of memory!
    8. Trying to allocate: 23068672B with 16 alignment. MemoryLabel: Profiler
    9. Allocation happend at: Line:551 in C:\buildslave\unity\build\Runtime/Utilities/dynamic_array.h
    10. Memory overview
    11.  
    12. [ ALLOC_DEFAULT ] used: 260321708B | peak: 0B | reserved: 1226710508B
    13. [ ALLOC_TEMP_JOB ] used: 0B | peak: 0B | reserved: 40894464B
    14. [ ALLOC_TEMP_BACKGROUND_JOB ] used: 0B | peak: 0B | reserved: 1048576B
    15. [ ALLOC_GFX ] used: 43292261B | peak: 0B | reserved: 224922733B
    16. [ ALLOC_CACHEOBJECTS ] used: 47658176B | peak: 0B | reserved: 325058560B
    17. [ ALLOC_TYPETREE ] used: 2910640B | peak: 0B | reserved: 17825792B
    18. [ ALLOC_PROFILER ] used: 14342335304B | peak: 0B | reserved: 14414774272B
    19. [ ALLOC_TEMP_THREAD ] used: 32768B | peak: 0B | reserved: 19726336B
    20. Could not allocate memory: System out of memory!
    21. Trying to allocate: 23068672B with 16 alignment. MemoryLabel: Profiler
    22. Allocation happend at: Line:551 in C:\buildslave\unity\build\Runtime/Utilities/dynamic_array.h
    23. Memory overview
    24.  
    25. [ ALLOC_DEFAULT ] used: 260321708B | peak: 0B | reserved: 1226710508B
    26. [ ALLOC_TEMP_JOB ] used: 0B | peak: 0B | reserved: 40894464B
    27. [ ALLOC_TEMP_BACKGROUND_JOB ] used: 0B | peak: 0B | reserved: 1048576B
    28. [ ALLOC_GFX ] used: 43292261B | peak: 0B | reserved: 224922733B
    29. [ ALLOC_CACHEOBJECTS ] used: 47658176B | peak: 0B | reserved: 325058560B
    30. [ ALLOC_TYPETREE ] used: 2910640B | peak: 0B | reserved: 17825792B
    31. [ ALLOC_PROFILER ] used: 14342335304B | peak: 0B | reserved: 14414774272B
    32. [ ALLOC_TEMP_THREAD ] used: 32768B | peak: 0B | reserved: 19726336B
    33.  
    34. UnityEditorInternal.Profiling.FrameDataView:Internal_Create(ProfilerViewType, Int32, Int32, ProfilerColumn, Boolean)
    35. UnityEditorInternal.Profiling.FrameDataView:.ctor(ProfilerViewType, Int32, Int32, ProfilerColumn, Boolean) (at C:\buildslave\unity\build\Modules\Profiler\Editor\Public\ProfilerFrameDataView.bindings.cs:86)
    36. ProfilerExport:writeProfilerFrame(DictionaryKeyList`2, DictionaryKeyList`2, Int32, BinaryWriter) (at Assets\Editor\ProfilerExport.cs:138)
    37. ProfilerExport:ExportData() (at Assets\Editor\ProfilerExport.cs:49)
    38.  
    Other than that (and the "internal" FrameDataView ;) ) I'm currently quite happy. Thanks a lot for all the work on this stuff :)
     
  23. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    This is an out-of-memory fatal error.

    FrameDataView is 90% native and uses extra memory for hierarchy creation. Its Dispose frees all the native buffers, which are not subject to GC.
    I suspect the GC pressure is coming from strings created while crawling the data.
    In the Hierarchy view we resolve strings on demand, when a cell is actually visible, so this hasn't caused any issues so far.

    In your case, to avoid generating garbage I can suggest using the GetItemMarkerID() method, which returns an id of the marker name for the profiling session. Most samples are generated by persistent markers which have a unique runtime id.
    You can then build a hashtable with an id-to-string mapping and use the managed string stored in the hashtable (normally there are about 5K markers, and 30-100K with deep profiling). GetItemFunctionName generates a new string every time, and when traversing millions of samples that can cause GC.Collect issues (see the sketch below).

    Also, it might be useful to profile ProfilerExport:ExportData() with deep profiling and see where the allocations are coming from - profiling inception, kind of :)

    I'm glad you found this useful, thank you for the feedback!
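    A minimal sketch of that id-to-string cache, assuming the FrameDataView member names discussed in this thread (GetItemMarkerID taking an item id, GetItemFunctionName); as before, the class is internal and undocumented, so reaching it requires reflection in practice and the exact signatures may differ:

    Code (CSharp):
    using System.Collections.Generic;
    using UnityEditorInternal.Profiling;

    static class MarkerNameCache {
        static readonly Dictionary<int, string> names = new Dictionary<int, string>();

        public static string Resolve(FrameDataView view, int itemId) {
            int markerId = view.GetItemMarkerID(itemId);
            string name;
            if (!names.TryGetValue(markerId, out name)) {
                // Pay the managed string allocation only once per unique marker,
                // instead of on every sample via GetItemFunctionName.
                name = view.GetItemFunctionName(itemId);
                names[markerId] = name;
            }
            return name;
        }
    }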
     
  24. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Yeah, it's indeed mostly from GetItemPath. I actually did a profile of the export this weekend :)
    [Screenshot: deep profile of the export code]

    On the marker ids: are those *always* unique, or just sometimes? What exactly does "Most of samples are generated by persistent markers" mean? I.e. if only some of the samples get a unique marker id and some don't, how would I reliably get back to the names?
    Also, I assume there is no way to look up the name for a given marker id? So I'd have to check for each sample whether I already have its marker id in my cache and, if not, immediately get the name and add it to the cache, since there's no way to get the names for the ids later on?

    And lastly, I think it would help if the methods GetItemDescendants / GetItemMarkerIDPath / GetItemAncestors took a List as an argument, just like GetItemChildren does, so they don't allocate any memory themselves (other than possibly resizing the List) and the user controls how the memory is handled. For the path it's actually questionable whether it even needs the MarkerPath struct instead of just a plain List<int> like the others (I'd rather add a static CompareMarkerPaths method if really needed).
     
    alexeyzakharov likes this.
  25. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    PS: I'm wondering why Unity's Profiler window doesn't allow viewing threads in the Hierarchy view? It seems like the only thing it would require is another dropdown to select one of the available threads for the current frame?
     
  26. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Yes, right now (and I think it makes sense to keep it that way for FrameDataView in the future) the id corresponds to the sample name. Internally, at runtime it is used to reduce bandwidth by writing event ids instead of strings (the exception is the generic Profiler.BeginSample, where the string is always transferred as metadata). So you can rely on the fact that a FrameDataView marker id uniquely identifies a string.

    No, FrameDataView doesn't have such a mapping atm - we use direct pointers to get the name of a marker. You can create your own mapping. The id is exposed to solve exactly the issue you've experienced. And I think the managed hashtable lookup cost should be less than the cost of the subsequent GC.Collect.

    Yes, that makes total sense - we did that with GetItemChildren because it is a hot path during traversal (see the sketch below).
    For now I wouldn't recommend using GetItemPath or GetItemMarkerIDPath for the conversion - these are used in the Hierarchy view only to get the full selection path when you click on an item. Similarly, GetItemDescendantsThatHaveChildren/GetItemAncestors are used only when expanding/collapsing the full tree with Alt+Click.
    MarkerPath is a leftover from refactoring - it was used as a hashtable key for expanded-state tracking. I agree that a List would be enough now - it is only used for the selection path.

    Yes, we have all the pieces in place but were busy with the Timeline improvements. Now we need to experiment with the UI representation of threads for the Hierarchy view.
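    For illustration, a hedged sketch of that non-alloc traversal pattern: reuse a single List<int> (plus an explicit stack) so GetItemChildren is the only call in the hot path and no per-item arrays or strings are created. FrameDataView is internal and undocumented, so the GetItemChildren(id, list) signature is taken from this thread and may differ between versions.

    Code (CSharp):
    using System.Collections.Generic;
    using UnityEditorInternal.Profiling;

    static class NonAllocTraversal {
        static readonly List<int> childBuffer = new List<int>(64);
        static readonly Stack<int> pending = new Stack<int>(256);

        public static void Visit(FrameDataView view, int rootId, System.Action<int> onItem) {
            pending.Clear();
            pending.Push(rootId);
            while (pending.Count > 0) {
                int id = pending.Pop();
                onItem(id);                       // e.g. record marker id, GC column, ...
                childBuffer.Clear();
                view.GetItemChildren(id, childBuffer);
                for (int i = childBuffer.Count - 1; i >= 0; i--)
                    pending.Push(childBuffer[i]); // push in reverse to keep original order
            }
        }
    }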
     
  27. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    So what do I get when the profiling data includes Profiler.BeginSample entries? I tested this by checking whether the returned marker id is negative, but that never seems to be the case. I.e. how do I know whether I have a valid, unique marker id? Or do I also get unique marker ids for those samples, just that each time a BeginSample/EndSample pair is hit during profiling it creates a new marker id instead of reusing the same one?
     
  28. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    With FrameDataView you get the same data as for all other entries. The difference is at a lower level (and more relevant to the plugin API). During data stream deserialization we create a new unique marker id for Profiler.BeginSample entries and cache it in a hashtable, so subsequent Profiler.BeginSample events with the same name/category get the same marker id.
    You can just assume that the marker id is always valid, corresponds to a string, and that the mapping is unique.

    It is always unique and valid. There might be negative values, but they are also valid and correspond to some name.

    It reuses the same id.
     
  29. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Ok, that sounds good :)
    Just wondering about "So subsequent Profiler.BeginSample events with the same name/category get the same marker id.":
    does that mean that if two BeginSample calls in our code happened to use the same name, they would be mapped to the same marker id, even though they're completely independent calls? (We can and should obviously avoid that anyway, just curious :) )

    I'll still have to use the GetItemAncestors method, as I need it to keep compatibility with the old export format I used, but not calling GetItemPath and GetItemFunctionName all the time should already reduce GC pressure a lot.
     
  30. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Yes, such calls will have the same marker id in FrameDataView terms.

    It should definitely help - we followed a similar path when doing the Hierarchy view update.
    Returning a string or an array from native is very expensive and always allocates. So far it is faster to do the postprocessing in native (C++) and then cache the data in managed (C#) land.
     
  31. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Just wanted to say thanks again, as I finally updated my code to use the markers as much as possible. I only need GetItemFunctionName once per marker id and GetItemAncestors once per sample now. A lot better than the repeated GetItemFunctionName calls before (especially compared to GetItemPath, which internally calls GetItemFunctionName and builds strings ;) ).

    Current wishlist of possible improvements:
    - Change GetItemAncestors so it takes a List<int> instead of allocating arrays
    - Make FrameDataView public
    - Some way to "autostart" thread profiling, so that one can properly profile async handlers (in our case used for an HttpListener, but I think that applies to a lot of network code)

    My only major complaint left is the Editor not being able to keep up with heavy profiler traffic (especially when deep profiling, of course, but it seems to be the same issue when trying to create a snapshot for the memory profiler, which completely prevents us from using it).

    Either way, the improvements already made to the profiler over these past months are incredibly helpful. THANK YOU! :)
     
    alexeyzakharov likes this.
  32. bekatt

    bekatt

    Joined:
    Jan 3, 2018
    Posts:
    3
    Wow, thanks for all of the responses. I didn't get any emails that this thread had been updated until today. FWIW, I ended up profiling my background threads' task work by executing it on Unity's main thread. Obviously not something I could ship for performance reasons, but it at least told me where the GC alloc hotspots were.
     
    alexeyzakharov likes this.
  33. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Hi!
    I'm glad that it works for you - it seems that exposing the API is the right direction :)

    I'm exposing the FrameDataView API right now and cleaning it up. It will be renamed to HierarchyFrameDataView to reflect the fact that it does postprocessing to build and accumulate hierarchical data. And all array getters will be converted to take a List<> as a parameter.

    We've added (for 2018.3) auto-registration of C# threads on Mono.

    Yes, for the Memory Profiler a solution is on the way, and for the CPU Profiler one of the solutions has always been there (but not exposed) - I'm adding a bootconfig/cmdline/property parameter to control the buffer size - that should help with deep profiling the Editor.

    That works too. Another approach is to use BeginThreadProfiling and enable callstack collection for managed allocations. Then in the Timeline view you can spot the red samples, which are GC.Alloc. In the Editor you can get a symbolicated callstack by clicking on such a sample.

    To get an understanding of GC allocations per frame, besides the GC memory chart in the Profiler window, you can also (in development players) use the Profiling.Recorder API with the "GC.Alloc" marker name to get the total number of GC allocations on all threads (registered and not) for the previous frame, and with "GC.Collect" to get the GC time. That sometimes helps with diagnostics.
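    A small example of that Recorder approach (public UnityEngine.Profiling API; values refer to the previous frame):

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Profiling;

    public class GcStats : MonoBehaviour {
        Recorder allocRecorder;
        Recorder collectRecorder;

        void OnEnable() {
            allocRecorder = Recorder.Get("GC.Alloc");
            collectRecorder = Recorder.Get("GC.Collect");
            allocRecorder.enabled = true;
            collectRecorder.enabled = true;
        }

        void Update() {
            // Number of GC.Alloc samples and total GC.Collect time in the previous frame.
            Debug.Log("GC.Alloc count: " + allocRecorder.sampleBlockCount +
                      ", GC.Collect time: " + collectRecorder.elapsedNanoseconds / 1e6 + " ms");
        }
    }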
     
  34. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    That sounds great :)
    But you know, I just have to ask this ;)
    Other than the auto-registration feature targeted for 2018.3, can you say anything about which versions the other points are supposed to land in?

    Will there be some way of making sure the data gets through without relying on buffer sizes? At some point no buffer size will be enough, and some kind of waiting on the sender side might be more helpful in such situations :)
     
  35. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    When we collect data faster than we can write it (to disk or to a socket), there are 3 different ways we can handle it:
    1) Discard data.
    2) Accumulate it in memory.
    3) Wait until the data is sent.
    Each of these options is valid for a particular use case.

    The Memory Profiler is snapshot-based and pauses the game by its nature, and thus by default implements option 3).

    The CPU profiler continuously collects and sends data and has to minimize its impact on gameplay. For it:
    Options 1) and 2) I'm trying to land in 2018.3 with the API mentioned before.
    Option 3) I consider unsustainable for game profiling for now, because if that happens the data becomes skewed and irrelevant; the only solution then is to limit the profiler's instrumentation scope with an enable/disable API.
    If you have an example where this would help you, please share it with us :)
     
  36. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Two reasons I'd still love option 3: firstly, allocations are independent of performance, so when looking at that area I wouldn't mind at all if performance goes down.
    Secondly, when profiling a very big call hierarchy it could still be useful to see some kind of "relative" data, i.e. how different parts of the code compare to each other. As long as you know what you're looking at, such profiler results can still be useful.
    Also, couldn't it happen that data has to be dropped in frame x+1 simply because the data of frame x wasn't fully delivered yet? How about only blocking the player *between* frames, so the pauses don't affect the profiler information, only gameplay performance (which imho doesn't matter that much during profiling ;) )?

    But yeah, APIs to control the scope of profiling would definitely help a lot, especially for filtering out calls into framework libraries like mscorlib (though it would be nice to get an entry for the first level of those calls, so you'd still know that certain time/allocations come from e.g. string.Format, just without caring about what string.Format calls internally).

    Any chance of getting any of this ported to 2018.2, as 2018.3 is probably at least three more months away? :)

    Also, what about the public Hierarchy API and the List<> interfaces?
     
    alexeyzakharov likes this.
  37. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Yes, this is a good option to consider.

    I've mentioned above that we are exposing it as HierarchyFrameDataView with the non-alloc changes.

    Yes, profiling only game code is a feature we have been considering for quite a while. Now that the new Mono supports an instrumentation filter, we can add it. I believe it should be configured through a profiler settings menu - assemblies to include in or exclude from deep profiling - that way we can also align it with deep profiling in players in the future.

    What exactly do you want backported? Only bugfixes can be backported to a released version.
    (The API for memory control can be considered a bugfix.)
     
    Alloc likes this.
  38. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Yeah, but not which release this is planned for atm ;)

    Mostly anything that would help with profiling without running into the "receiver cannot keep up with the data" issue :)

    The other stuff is more on the "nice to have" level; my exporter code currently uses reflection to get everything, and the non-alloc APIs would be nice but aren't a real issue at the moment.

    Also, 2018.2 isn't out yet either ;)
    I assume it's already content-locked though?
     
  39. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    The current plan is to have it in 2019.1.

    Yes, only bugfixes are allowed.

    This fix will be backported - in the form of a runtime/bootconfig switch which lets you set the maximum memory used by the profiler to e.g. int.MaxValue, giving an (almost) unlimited buffer.
     
  40. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Hi @alexeyzakharov ,

    I have learned to really love the Allocation Callstacks feature. If you only want to find the causes of GC allocations it is a great help, since you don't even have to use deep profiling at all :)

    Two questions though:
    1. Is there any way this can be enabled in a standalone development build? It seems that with profiling to a file we can get both normal and deep profiling data, but I didn't find any API to enable the Allocation Callstacks feature in a build?

    2. When we have GC.Alloc entries in the profiler data with a time of X, does this time really only account for the actual allocation, or does it also include e.g. the time for gathering the callstack? I get a lot of these entries with incredibly high times, up to 10 or 20% of the parent, so I wonder whether the allocations can really be that bad at times or whether that's again somehow related to the profiling itself.

    Regards,
    Chris
     
  41. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Hi!

    No, it's an Editor-only feature atm. We have some ideas on how to extend it to players, but so far all of them have memory or performance overhead.
    Alternatively, you can use deep profiling in players (the -deepprofiling cmdline arg) to get a call hierarchy which resembles a callstack.

    The GC.Alloc sample wraps the callstack retrieval code, so yes, it includes the time for gathering the callstack, and that is profiling-only overhead.
     
  42. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Not as long as deep profiling doesn't work with more complex game code ;)

    Good to know, thank you :)
    So it's either GC observations or CPU time, as gathering the callstacks seems to heavily skew the CPU time results.
     
  43. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Is it because of significant overhead?
    (The issue of profiling stopping upon reaching the buffer limit should be fixed for the Editor in the next 2018.2 patch release - we've increased the buffer size to 256 MB. And for 2018.3 we've added a runtime API to control the buffer size.)
     
  44. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Yeah, it's the "receiver cannot keep up" issue. Though I must admit I'm not sure whether I tried deep profiling to a file instead of being directly connected to the Unity Editor Profiler.
    Glad to hear about the buffer limit increase, I'll try it out as soon as it's available :)
     
    alexeyzakharov likes this.
  45. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Hi @alexeyzakharov ,
    I was wondering whether this change is in yet, as I haven't noticed anything in the release notes so far.

    Also, would it be possible to add something like FrameDataView.GetAllocationSize(int id, int sampleIdx) to go along with ResolveItemCallstack, to see how much memory was actually allocated for one such resolved callstack? That would be really awesome :)

    Regards,
    Chris
     
  46. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Hi!

    The change is in 2018.2. Perhaps a larger buffer is required. The use case we tested was entering Play mode with deep profiling on. It worked quite well and allowed us to collect 10-15M samples on the tested projects.

    The allocation size can be obtained as a float with the FrameDataView.GetItemColumnDataAsSingle(id, ProfilerColumn.GCMemory) API for merged samples. If the view itself is not merged, you can get individual values per sample.
    If you can only use the merged view, we could add a FrameDataView.GetItemColumnDataAsSingles to get individual values for merged samples.

    Beware that for 2019.1 this API has been refactored - renamed and adapted to the no-alloc pattern.
     
  47. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Cool, but which one? ;) Or are you saying it has been in since the very first final build (2018.2.0)?

    Sure, but that will get the memory allocated across all allocations of the specific GC.Alloc entry in the sample hierarchy. Such an entry typically comes from a variety of different places in the code; e.g. if you do some string.Format operations you get allocations from a bunch of ToString calls hidden inside it, from Concat, etc., which all end up in one GC.Alloc entry in the hierarchy. What I want is to get the individual allocated memory for each of the resolved callstacks. E.g. we have a MonoBehaviour GUIFPS; in the hierarchy it shows up as ... / GUIFPS.Update / GC.Alloc. Now if I resolve all the callstacks in that GC.Alloc entry I get something like this:
    Code (csharp):
    1.  
    2. 0x00007FF87FBDA9C1 (mono) [object.c:4441] mono_array_new_specific
    3. 0x000000003D700AD6 ((<unknown>))
    4. 0x000000004158CA75 (Mono JIT Code) [BitConverter.cs:90] System.BitConverter:GetBytes (byte*,int)
    5. 0x000000004158C984 (Mono JIT Code) [BitConverter.cs:162] System.BitConverter:GetBytes (double)
    6. 0x000000004158C60F (Mono JIT Code) [BitConverter.cs:75] System.BitConverter:DoubleToInt64Bits (double)
    7. 0x000000004158BCF7 (Mono JIT Code) [NumberFormatter.cs:450] System.NumberFormatter:Init (string,double,int)
    8. 0x00000000415D8519 (Mono JIT Code) [NumberFormatter.cs:874] System.NumberFormatter:NumberToString (string,single,System.IFormatProvider)
    9. 0x00000000415D83FD (Mono JIT Code) [Single.cs:256] single:ToString (string,System.IFormatProvider)
    10. 0x00000000416691A8 (Mono JIT Code) [Extensions.cs:230] Extensions:ToCultureInvariantString (single,string)
    11. 0x000000007B414CF0 (Mono JIT Code) [GUIFPS.cs:70] GUIFPS:Update ()
    12. ...
    13.  
    14. 0x00007FF87FBD9578 (mono) [object.c:4515] mono_string_new_size
    15. 0x000000000BC161AE (Mono JIT Code) (wrapper managed-to-native) string:InternalAllocateStr (int)
    16. 0x000000000BC3281C (Mono JIT Code) [String.cs:2762] string:CreateString (char[],int,int)
    17. 0x000000000BC32485 (Mono JIT Code) (wrapper managed-to-managed) string:.ctor (char[],int,int)
    18. 0x00000000461A13FB (Mono JIT Code) [NumberFormatter.cs:1273] System.NumberFormatter:FormatFixedPoint (int,System.Globalization.NumberFormatInfo)
    19. 0x00000000415D89E3 (Mono JIT Code) [NumberFormatter.cs:1063] System.NumberFormatter:NumberToString (string,System.Globalization.NumberFormatInfo)
    20. 0x00000000415D8787 (Mono JIT Code) [NumberFormatter.cs:887] System.NumberFormatter:NumberToString (string,single,System.IFormatProvider)
    21. 0x00000000415D83FD (Mono JIT Code) [Single.cs:256] single:ToString (string,System.IFormatProvider)
    22. 0x00000000416691A8 (Mono JIT Code) [Extensions.cs:230] Extensions:ToCultureInvariantString (single,string)
    23. 0x000000007B414CF0 (Mono JIT Code) [GUIFPS.cs:70] GUIFPS:Update ()
    24. ...
    25.  
    26. 0x00007FF87FBD9578 (mono) [object.c:4515] mono_string_new_size
    27. 0x000000000BC161AE (Mono JIT Code) (wrapper managed-to-native) string:InternalAllocateStr (int)
    28. 0x000000000BC41701 (Mono JIT Code) [String.cs:2081] string:Concat (string,string)
    29. 0x000000007B414D12 (Mono JIT Code) [GUIFPS.cs:70] GUIFPS:Update ()
    30. ...
    31.  
    So the actual allocations come from three different places, and I'd like to see, for example, that System.BitConverter:GetBytes allocated e.g. 40 B, string:InternalAllocateStr allocated e.g. 50 B and the string.Concat part allocated e.g. 80 B. Currently we only get to know that all of these together allocated 170 B.

    Knowing the individual allocations would make these callstacks even more valuable, as deep profiling would then basically not be needed for analyzing allocations at all anymore :)
     
  48. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Sorry, the changeset with the fix is tagged in 2018.2.4f1.

    As for the per-sample allocation size, I could argue that you can still get what you want with a non-merged hierarchy by walking the samples manually (and merging manually). E.g. in your example, creating the FrameDataView without the merge flag would mean you get 3 separate GUIFPS:Update() samples instead of one with combined stats.
    But it is pretty easy to expose the functionality you've mentioned, so I can add that (hopefully for 2019.1).
     
  49. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    507
    Deep profiling would still be useful for Mono players - we don't have callstacks implemented anywhere but the Editor :)
     
  50. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    241
    Thank you.

    I don't think that would work, because in this example there's still only one call to GUIFPS.Update; it's just that this one call allocates in three different places inside it. I'm not sure, though, whether it would give me three instances of GC.Alloc. But being able to rely on Unity's own hierarchy merging makes things easier, and I suppose faster, than handcrafted merging in .NET land, and even better for the GC during the merge.

    Great, thanks :)
    Can't say how grateful I am for all the improvements you brought to the profiler.


    Indeed ... but I'm still hoping for this to become true:
    :)

    Deep profiling has its purpose, but as it is, it impacts performance far more than just enabling the allocation callstacks, so if we can get the same data without using deep profiling, that would be cool ;)