Search Unity

  1. Unity 2018.1 has arrived! Read about it here
    Dismiss Notice
  2. Scriptable Render Pipeline improvements, Texture Mipmap Streaming, and more! Check out what we have in store for you in the 2018.2 Beta.
    Dismiss Notice
  3. If you couldn't join the live stream, take a peek at what you missed.
    Dismiss Notice
  4. Improve your Unity skills with a certified instructor in a private, interactive classroom. Learn more.
    Dismiss Notice
  5. ARCore is out of developer preview! Read about it here.
    Dismiss Notice
  6. Magic Leap’s Lumin SDK Technical Preview for Unity lets you get started creating content for Magic Leap One™. Find more information on our blog!
    Dismiss Notice
  7. Want to see the most recent patch releases? Take a peek at the patch release page.
    Dismiss Notice

Profiling background threads

Discussion in 'Scripting' started by bekatt, Mar 10, 2018.

  1. bekatt

    bekatt

    Joined:
    Jan 3, 2018
    Posts:
    3
    Hi all,

    I recently started using the built in Unity profiler to better understand the perf of my game. I really like the detailed information I receive about code running on Unity's main thread (I'm using deep profile, and I like to inspect all the function times and memory allocations for functions I've written). It seems I don't really get any information about tasks running in my background threads though. The one exception is, I can manually call Profiler.BeginThreadProfiling() to log some timing data about stuff in background threads.

    Is there a way I can automatically get deep profile to provide the same level of detail about background threads as the Unity main thread? I'd like to have the ability to inspect all function timings and memory allocations. If this info isn't available in the Unity profiler, is there another recommended way of obtaining this data?

    Thanks!
    Ben
     
  2. dgoyette

    dgoyette

    Joined:
    Jul 1, 2016
    Posts:
    478
    There's a dropdown on the Profiler to switch the bottom display to Timeline view, which shows stats on worker threads:

    upload_2018-3-10_0-46-57.png
     
  3. bekatt

    bekatt

    Joined:
    Jan 3, 2018
    Posts:
    3
    Thanks - I'm aware of the Timeline view. It seems like it only gives me a small fraction of the info available in the Hierarchy view though. Specifically, I didn't see any of the following:
    - memory allocations from background threads
    - nested timing info. For instance if I have something like the following:



    class SomeMonoBehaviour : MonoBehaviour {
    private CustomSampler sampler;

    private static void a() {
    b();
    }

    private static void b() {
    Thread.Sleep(5);
    }

    private void Start() {
    sampler = new CustomSampler("CustomSampler");
    var t = new Thread(() => {
    Profiler.BeginThreadProfiling("My threads", "My thread 1");
    sampler.Begin();
    a();
    sampler.End();
    Profiler.EndThreadProfiling();
    });
    t.Start();
    }
    }


    This is just a crude example to demonstrate my point - it seems that something like this will only tell me that a() took about 5ms in duration. It won't tell me that b() was invoked, or that the 5ms was actually due to the Sleep call. It also won't tell me if a() or b() allocated any memory. Is there a way for deep profile to automatically tell me that b() or Sleep() was invoked?

    My use case is quite a bit more involved than this trivial example - I'm including an external library (in the form of a dll and pdb). The external library makes background threads which I'd like to profile in Unity.
     
    Last edited: Mar 10, 2018
  4. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    We also finally got around to trying the thread profiling but were immediately extremely disappointed when seeing that the threads really only get added to the timeline view (which on top is even bugging out as our thread was hidden behind the horizontal scrollbar most of the time).
    The timeline view is nice to get a quick overview but what's really important is the hierarchy, especially as said above for the GC allocations. After all execution time can (at least roughly) be measured manually, GC allocations not.

    Probably a question for @alexeyzakharov: Is there anything that we can/need to do to get more data out of this? If that's currently (2017.3) not possible, are there any plans to get that stuff done in an upcoming build?

    Regards,
    Chris
     
  5. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Hi @alexeyzakharov: Can we get any update on what the thread profiling is supposed to be like?

    Regards,
    Chris
     
  6. Nigey

    Nigey

    Joined:
    Sep 29, 2013
    Posts:
    932
    Hi bekatt,

    I might be jumping the gun not reading everything, but you'll get a lot more detailed information running from this: https://bitbucket.org/Unity-Technologies/memoryprofiler. I'm not sure whether you already are, but the basic Unity profiler will not give you what you want IMO. I'd try this and see whether this is more fitting.

    It is purely a memory profiler, and so obv you're still missing the thread's actions, however memory alloc might be handy still.
     
    Last edited: Apr 6, 2018
  7. alexeyzakharov

    alexeyzakharov

    Unity Technologies

    Joined:
    Jul 2, 2014
    Posts:
    124
    Hi!

    2018.2 will have an improvements which should help with threads profiling:
    1. Scrollable timeline view where you can properly see threads.
    2. Plugin API which you can use to dump the data to your own format and analyze it later (e.g. chrome trace format), or forward it to other tools (e.g. VTune, PIX).

    Besides that (a bit later) we are enabling auto profiling for mono-spawned threads and making better api to get profiler data in the editor to enable custom visualization possibilities.

    Unfortunately, in 2017.3 you can get thread data only using undocumented Editor API, and the quality of that data is what you see in Timeline View.
     
    Nigey and Alloc like this.
  8. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Hi @alexeyzakharov ,

    thanks a lot for the info. That's quite some good news. Unfortunately the main question of this thread remains (though I didn't make it that explicit in the mention in my previous post ;) ): Is there a way to get the GC information for background profiled threads in any way currently? If not, will there be one?

    Regards,
    Chris
     
  9. alexeyzakharov

    alexeyzakharov

    Unity Technologies

    Joined:
    Jul 2, 2014
    Posts:
    124
    As I said in 2017.3 the only way is to use undocumented API:
    1. UnityEditorInternal.ProfilerFrameDataIterator to find "GC.Alloc" samples - faster, but without allocation size.
    2. UnityEditorInternal.ProfilerProperty (propertyName + GetColumnAsSingle()) - slower, but with all data.
    This API allows to enumerate all captured threads.
    Note, that custom C# threads should use Profiler.BeginThreadProfiling to enable data collection during profiling.
     
  10. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Unless I'm missing something this doesn't seem to work though.
    Added the BeginThreadProfiling to the start of the thread, even added a custom sampler for that thread but there's no mention of either the name passed to the BeginThreadProfiling nor the custom sampler name nor any of the functions called in the thread after gathering all the data with the ProfilerProperty. I see a lot of GC.Alloc samples but those all seem to be regular main thread allocations.
     
  11. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Hi @alexeyzakharov , wonder if you saw my reply :)
    Still not sure what we do wrong (or if that's an Unity issue) but as I said iterating the ProfilerProperty stuff still does not return any non-main thread related data.
     
  12. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    273
    I faced absolutely the same problem, but with Tasks.
     
  13. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Gotta bump this. If there is a way to get the thread's profiler data I'd really like to know how :)
     
    alexeyzakharov likes this.
  14. alexeyzakharov

    alexeyzakharov

    Unity Technologies

    Joined:
    Jul 2, 2014
    Posts:
    124
    Hi, sorry for late reply, was on holidays.

    Yes, my bad, ProfilerProperty can get only main thread data. The only editor API which can walk trough profiler thread data is UnityEditorInternal.ProfilerFrameDataIterator. You can see the usage example at https://github.com/Unity-Technologi...or/ProfilerWindow/ProfilerTimelineGUI.cs#L679

    Besides that if you are interested only in allocation fact native plugin API might help in 2018.2 (Draft doc about it here https://docs.google.com/document/d/...1-EWxU7CaX5Ik1v6Y/edit#heading=h.dpur7r7omyg0). It is not yet capturing "GC.Alloc" samples, but we hope to make it to 2018.2 soon as it is important feature.
     
  15. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Yeah, ProfilerFrameDataIterator works fine indeed :)
    If I get this right I will get one sample per call of a specific method (instead of the combined data per sample path I can get from ProfilerProperty)? So this one wouldn't be hard to "fix" by adding up data for the same sample path.
    But what it's really missing is indeed the allocations. It would be really cool if that could be added.

    Even though allocations are one of the major concerns the other data (execution time, nr of calls) is also still important.
     
  16. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Also, currently we can't use Profiler.BeginSample on other threads. Will this be added so that we can mark areas of threads to give us a sample point?
     
  17. alexeyzakharov

    alexeyzakharov

    Unity Technologies

    Joined:
    Jul 2, 2014
    Posts:
    124
    Since 2018.1 there is an internal UnityEditorInternal.Profiling.FrameDataView class which is used in Hierarchy view and allows to get what you want on other threads. The usage example is at https://github.com/Unity-Technologi...rEditor/ProfilerWindow/ProfilerWindow.cs#L721 and https://github.com/Unity-Technologi...filerWindow/ProfilerFrameDataTreeView.cs#L119
    We've planned to make it public, but don't have exact timeline yet.

    It is enabled to be [ThreadSafe] in 2018.2. Please file a bug for 2018.1 if you need it before.
     
  18. alexeyzakharov

    alexeyzakharov

    Unity Technologies

    Joined:
    Jul 2, 2014
    Posts:
    124
    Forgot to mention - CustomSampler can be used on threads without restrictions.
    It is preferred API as it can be used with Recorder and more compact for the data stream as we transfer only named marker id.
     
  19. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Just nitpicking here: Unity hard-crashes when calling GetItemPath (probably would with some others too) on the RootItemID.
    Code (CSharp):
    1.  
    2. 0x0000000141FF3CA6 (Unity) FrameDataView_CUSTOM_GetItemFunctionName
    3. 0x0000000052C2C915 (Mono JIT Code) (wrapper managed-to-native) UnityEditorInternal.Profiling.FrameDataView:GetItemFunctionName (int)
    4. 0x0000000052C2C58A (Mono JIT Code) [C:\buildslave\unity\build\Modules\Profiler\Editor\Public\ProfilerFrameDataView.bindings.cs:197] UnityEditorInternal.Profiling.FrameDataView:GetItemPath (int)
    Other than that working with FrameDataView seems to work. Would it be possible to make that class public in the 2018.1 cycle though? Not really nice to have to modify the UnityEditor.dll to use it (and reflection wouldn't really make this process faster ;) ).
     
    alexeyzakharov likes this.
  20. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Also, the GC.Alloc samples basically are what previously was attributed to the method that had the allocation?
    E.g. before a method that created a new array could have had 40 Bytes as self allocation, now the method would have a child entry GC.Alloc with 40 Bytes instead?
    If so, if we wanted the old behaviour back, we would just move that value from GC.Alloc entries to their immediate parents?
     
  21. alexeyzakharov

    alexeyzakharov

    Unity Technologies

    Joined:
    Jul 2, 2014
    Posts:
    124
    Thanks for reporting this!

    I'm glad it works :)
    I've tried to make it public, but was short on time. I can try one more time, but no guarantee.

    Hierachy view (FrameDataView with ProfilerViewType.[Raw]Hierarchy parameter) accumulates GC.Alloc for the hierarchy, so GetItemColumnDataAsSingle(id, ProfilerColumn.GCMemory) should return accumulated value for all children.
    GC.Alloc has been made an explicit sample to be searchable and also to attach callstack to it (at that point it might contain intermediate hints to e.g. boxing).
     
  22. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Great, thank you :)

    Another "issue" I just stumbled upon:
    Before I was creating a new FrameDataView for each frameIndex (can't seem to reuse it), gathering the data from the frame then explicitly calling Dispose on it and also running GC.Collect. This seemed to work fine but the GC.Collect takes "ages" (i.e. the total extraction of data for a simple test run took about 23s, out of these only about 400ms were the actual data gathering, the rest attributed to the GC.Collect call. So I tested removing it and immediately got this crash:
    Code (CSharp):
    1. DynamicHeapAllocator allocation probe 1 failed - Could not get memory for large allocation 23068672.
    2. DynamicHeapAllocator allocation probe 1 failed - Could not get memory for large allocation 23068672.
    3. DynamicHeapAllocator allocation probe 2 failed - Could not get memory for large allocation 23068672.
    4. DynamicHeapAllocator allocation probe 3 failed - Could not get memory for large allocation 23068672.
    5. DynamicHeapAllocator allocation probe 4 failed - Could not get memory for large allocation 23068672.
    6. DynamicHeapAllocator out of memory - Could not get memory for large allocation 23068672!
    7. Could not allocate memory: System out of memory!
    8. Trying to allocate: 23068672B with 16 alignment. MemoryLabel: Profiler
    9. Allocation happend at: Line:551 in C:\buildslave\unity\build\Runtime/Utilities/dynamic_array.h
    10. Memory overview
    11.  
    12. [ ALLOC_DEFAULT ] used: 260321708B | peak: 0B | reserved: 1226710508B
    13. [ ALLOC_TEMP_JOB ] used: 0B | peak: 0B | reserved: 40894464B
    14. [ ALLOC_TEMP_BACKGROUND_JOB ] used: 0B | peak: 0B | reserved: 1048576B
    15. [ ALLOC_GFX ] used: 43292261B | peak: 0B | reserved: 224922733B
    16. [ ALLOC_CACHEOBJECTS ] used: 47658176B | peak: 0B | reserved: 325058560B
    17. [ ALLOC_TYPETREE ] used: 2910640B | peak: 0B | reserved: 17825792B
    18. [ ALLOC_PROFILER ] used: 14342335304B | peak: 0B | reserved: 14414774272B
    19. [ ALLOC_TEMP_THREAD ] used: 32768B | peak: 0B | reserved: 19726336B
    20. Could not allocate memory: System out of memory!
    21. Trying to allocate: 23068672B with 16 alignment. MemoryLabel: Profiler
    22. Allocation happend at: Line:551 in C:\buildslave\unity\build\Runtime/Utilities/dynamic_array.h
    23. Memory overview
    24.  
    25. [ ALLOC_DEFAULT ] used: 260321708B | peak: 0B | reserved: 1226710508B
    26. [ ALLOC_TEMP_JOB ] used: 0B | peak: 0B | reserved: 40894464B
    27. [ ALLOC_TEMP_BACKGROUND_JOB ] used: 0B | peak: 0B | reserved: 1048576B
    28. [ ALLOC_GFX ] used: 43292261B | peak: 0B | reserved: 224922733B
    29. [ ALLOC_CACHEOBJECTS ] used: 47658176B | peak: 0B | reserved: 325058560B
    30. [ ALLOC_TYPETREE ] used: 2910640B | peak: 0B | reserved: 17825792B
    31. [ ALLOC_PROFILER ] used: 14342335304B | peak: 0B | reserved: 14414774272B
    32. [ ALLOC_TEMP_THREAD ] used: 32768B | peak: 0B | reserved: 19726336B
    33.  
    34. UnityEditorInternal.Profiling.FrameDataView:Internal_Create(ProfilerViewType, Int32, Int32, ProfilerColumn, Boolean)
    35. UnityEditorInternal.Profiling.FrameDataView:.ctor(ProfilerViewType, Int32, Int32, ProfilerColumn, Boolean) (at C:\buildslave\unity\build\Modules\Profiler\Editor\Public\ProfilerFrameDataView.bindings.cs:86)
    36. ProfilerExport:writeProfilerFrame(DictionaryKeyList`2, DictionaryKeyList`2, Int32, BinaryWriter) (at Assets\Editor\ProfilerExport.cs:138)
    37. ProfilerExport:ExportData() (at Assets\Editor\ProfilerExport.cs:49)
    38.  
    Other than that (and the "internal" FrameDataViewer ;) ) I'm currently quite happy. Thanks a lot for all the work on that stuff :)
     
  23. alexeyzakharov

    alexeyzakharov

    Unity Technologies

    Joined:
    Jul 2, 2014
    Posts:
    124
    This is out of memory fatal error.

    FrameDataView is 90% native and uses extra memory for hierarchy creation. It's Dispose frees all native buffers which are not a subject to GC.
    I suppose that the GC pressure is coming from strings when crawling the data.
    In Hierarchy view we resolve strings on demand when the cell is actually visible so this didn't cause any issues so far.

    In you case to avoid GC generation I can suggest using GetItemMarkerID() method which returns an id of the marker name for the profiling session. Most of samples are generated by persistent markers which have a unique runtime id.
    You can build then a hashtable with id-to-string mapping and use the managed string stored in the hashtable (normally there are about 5K markers and 30-100K with deepprofiling). GetItemFunctionName generates a new string all the time and when traversing million of samples might cause a GC.Collect issues.

    Also it might be useful to profile ProfilerExport:ExportData() with DeepProfiling and see where are the allocations coming from - profiling inception kind of :)

    I'm glad you found this useful, thank you for the feedback!
     
  24. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Yeah, it's indeed mostly from GetItemPath. Actually did a profile of the export this weekend :)
    profilerexport.jpg

    On the marker ids: Are those *always* unique or just sometimes? Or what exactly does "Most of samples are generated by persistent markers" mean? I mean if just some of the samples get a unique marker id and some not how would I reliably get back to the names?
    Also, I assume there is no way to find the name for a marker id? So I'd have to check for each sample if I already have the marker id in my cache, if not immediately get the name and add it to the cache as there's no way to get the names for the ids later on?

    And lastly, I think it might help if the methods GetItemDescencants / GetItemMarkerIDPath / GetItemAncestors would take a List as argument, just like GetItemChildren does, so they don't allocate (other than possibly resizing the List) any memory themselves and let the user control how the memory is handled. For the path it's actually questionable if it even needs the MarkerPath struct instead of just a plain List<int> like the others (I'd rather add a static CompareMarkerPaths method if really needed).
     
    alexeyzakharov likes this.
  25. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    PS: Wondering why Unity's profiler window does not allow to view threads? Seems like the only thing it would require is another dropdown to select one of the available thread ids for the current frame?
     
  26. alexeyzakharov

    alexeyzakharov

    Unity Technologies

    Joined:
    Jul 2, 2014
    Posts:
    124
    Yes, right now (and I think it makes sense to keep it in the future for FrameDataView) id corresponds to the sample name value. Internally at runtime it is used to reduce bandwidth by writing event ids instead of strings (exception is generic Profiler.BeginSample, where the string is going to be always transferred as metadata). So you can rely on the fact that FrameDataView markerid uniquely identifies string.

    No, FrameDataView doesn't have a mapping atm - we use direct pointers to get the name of the marker. You can create your own mapping. Id is exposed to solve exactly the same issue you've experienced. And I think that managed hashtable lookup cost might be less than subsequent GC.Collect cost.

    Yes, that makes total sense - we did that with GetItemChildren because that is a hot path during traversal.
    For now I wouldn't recommend to use GetItemPath, GetItemMarkerIDPath for the conversion - these are used in hierarchy view only to get full selection path when you click on the item. Similarly GetItemDescendantsThatHaveChildren/GetItemAncestors are used only when expanding/collapsing full tree with Alt+Click.
    MarkerPath is a leftover after refactoring - it was used as a hash table key for expanded state tracking. I agree that List would be enough now - it is only for the selection path.

    Yes, we have all pieces in place and were busy with timeline improvements. Now we need to experimenting with UI representation of threads for Hierarchy View.
     
  27. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    So, what do I get when the profiling data includes Profiler.BeginSample entries? I tested it with checking if the returned marker id is negative but this never seems to be the case. I.e. how do I know if I have a valid unique marker id? Or do I also get unique marker ids for those samples, just that each time the BeginSample/EndSample pair is hit during the profiling it creates a new marker id instead of reusing the same one?
     
  28. alexeyzakharov

    alexeyzakharov

    Unity Technologies

    Joined:
    Jul 2, 2014
    Posts:
    124
    With FrameDataView you get the same data as for all other entries. The difference is at lower level (and more relevant to Plugin API). During data stream deserialization we create a new unique marker id for Profiler.BeginSample entries and cache it in a hashtable. So subsequent Profiler.BeginSample events with the same name/category get same marker id.
    You just assume that marker id is always valid and corresponds to a string and the mapping is unique.

    It is always unique and valid. There might be negative values, but they are also valid and correspond to some name.

    It reuses the same id.
     
  29. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Ok, that sounds good :)
    Just wondering: "So subsequent Profiler.BeginSample events with the same name/category get same marker id."
    Does that mean if it happened that two BeginSamples in our code would use the same name they would be mapped to the same marker id, even if they're completely independent calls? (We can and should obviously avoid that anyway, just curious :) )

    Will still have to use the GetAncestors method as I need that for keeping compatibility with the old export format I used, but not using GetItemPath and GetFunctionName all the time should already reduce GC pressure a lot.
     
  30. alexeyzakharov

    alexeyzakharov

    Unity Technologies

    Joined:
    Jul 2, 2014
    Posts:
    124
    Yes, such calls will have the same marker id in FrameDataView terms.

    It should definitely help - we were following similar path when doing Hierarchy view update.
    Returning string or array from native is very expensive and always allocates. So far it is faster to do postprocessing in native (C++) and then cache data in managed (C#) land.
     
  31. Alloc

    Alloc

    Joined:
    Jun 5, 2013
    Posts:
    163
    Just wanted to say thanks again as I finally updated my code to use the markers as much as possible. Only need GetItemFunctionName once per marker id and GetItemAncestors once per sample now. A looot better than the repeated GetItemFunctionName calls before (especially compared to GetItemPath, internally calling GetItemFunctionName and building strings ;) ).

    Current wishlist of possible improvements:
    - Change GetItemAncestors so it takes a List<int> instead of allocating arrays
    - Make FrameDataViewer public
    - Some way to "autostart" thread profiling so that one can properly profile async handlers (in our case used for the HttpListener but I think that's applying to a lot of network stuff)

    My only major complaint left though is the issue with the editor not being able to keep up with heavy profiler traffic (especially when deep profiling of course but seems like it's also the same issue when trying to create a memory snapshot for the memory profiler completely preventing us from using it).

    Either way, the improvements that are already made to the profiler of these past months are really incredibly helpful. THANK YOU! :)
     
  32. bekatt

    bekatt

    Joined:
    Jan 3, 2018
    Posts:
    3
    Wow, thanks for all of the responses. I didn't get any emails that this thread had been updated until today. FWIW, I ended up profiling my background threads task work by executing them on Unity's main thread. Obviously not something I could ship doing for performance reasons, but it atleast told me where the gc alloc hotspots were.