Search Unity

  1. Unity Asset Manager is now available in public beta. Try it out now and join the conversation here in the forums.
    Dismiss Notice

Question Profiler results captured on target hardware unusable because of profiler overhead?

Discussion in 'Profiler Previews' started by Peter77, Aug 17, 2020.

  1. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,618
    I'm currently having the problem that the Unity Profiler (allegedly) has so much overhead, that the profiling results are not accurate. I'm not speaking from profiling inside the editor, I'm speaking of profiling a player on the target hardware.

    I found that updating UI texts is expensive (link). According to @Stephan_B, it only appears to be expensive, because of the profiling overhead and the player being a development mode build (not using script debugging though). Stephan kindly went into more detail here.

    I would like to know how I can get reliable profiling data that accurately represent what's going on in my game? I need to understand where performance is being spent.

    The discrepancy of what I measure 1.5ms+ and the suggested 0.1ms real cost is huge and I would like to see/measure it in the profiler and not rely so much on opinions.

    I thought the Unity Profiler is capable for doing that?!

    Thanks for any help!
     
    Last edited: Aug 17, 2020
    alexeyzakharov likes this.
  2. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,618
    I would still like to know if profiling distorts the actual timings so much, that they basically become misleading, incorrect and ultimately useless?
     
    alexeyzakharov likes this.
  3. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    508
    Unity Profiler for sure impacts game performance and affects timings. It is instrumentation profiler which works via automatic or manual injection of a code which measures time and writes it out somewhere.
    Overhead of a single instrumentation call is very much platform dependent and mostly corresponds to timer latency. On Android it is usually 0.1us-2us per begin/end pair depending on the device. 1000 measurements contribute to the 0.1-2ms on the device. PC/Consoles are around 2-10ns per sample.

    Deep profiling increases significantly amount of measurements as every single C# method get profiled. That usually skews data significantly exaggerating timings of methods which have a lot of calls.
    As it was mentioned in other forums Deep Profiling usecase is more for understanding what is going on inside some perf critical code and if there is an opportunity to inline some methods or reduce logic complexity.

    Do you have an example of this outside of deepprofiling usecase? Are you talking about 1.5ms+ cumulative difference per frame or per call?

    Would it be helpful to display an estimated overhead of profiler measurements in the profiler window ui?
     
    Peter77 likes this.
  4. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,618
    Hey Alex, thanks for your reply!

    I'm aware of the deep-profiling overhead, that's not the issue I was trying to describe. I was doing a "non-deep" profiling from an Android device and the 1.5 ms are per call, or two calls actually as highlighted in the screenshot below.
    I posted my test-case here, where above screenshot is coming from as well:
    https://forum.unity.com/threads/textmeshprougui-rebuild-performance-on-mobile.837442/#post-6200717

    In the linked post I added two screenshots. The first one a non-deep profile from the target device, which is the one that is important to me. I want to understand why it should be 0.2ms (as mentioned by Unity dev), but is showing almost 1.5ms (1.28+0.16) actually.

    The second image is with deep profile and was only meant to show that the cost comes from that particular subsystem.

    I don't know. How good and reliable would be the estimated value?
     
    Last edited: Sep 22, 2020
  5. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    508
    Thanks Peter - this is very helpful.
    I can see a single Layout call which takes 1.10ms - the profiler overhead of that measurement would be around 0.5-1us. I guesstimate that because of the fact that CPU time is 17ms and typical profiler samples count is 1000-2000 for a frame. This can be verified by adding an empty Update or similar method and checking it's duration in Profiler.
    My feeling is that there is an optimization opportunity for the Layout method. Thanks for the project - it explains the red rectangles for me and I think could be used for the investigation.
     
    Peter77 likes this.
  6. alexeyzakharov

    alexeyzakharov

    Joined:
    Jul 2, 2014
    Posts:
    508
    Pretty reliable - we can estimate lower boundary by measuring timer latency and multiplying it by samples count in frame
     
    Peter77 and fherbst like this.
  7. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,618
    Hey Alex, thanks again for the reply!

    If I understand this correctly, the Profiler overhead of a single BeginSample/EndSample pair, such as the Layout call in above screenshot, can't be used to explain a performance difference of 1ms with/without profiling.

    BeginSample/EndSample pairs can sum of to a significant cost if called (very) often. This is when the profiler overhead can distort the actual results, such as when deep-profiling.


    If you're able to remove/subtract the overhead caused by profiling from the actual profiled results, that sounds very useful. This is meant for each BeginSample/EndSample and not only a single global "Profiler Overhead 1ms" label I assume?
     
    alexeyzakharov likes this.
  8. MartinTilo

    MartinTilo

    Unity Technologies

    Joined:
    Aug 16, 2017
    Posts:
    2,460
    We'd have to figure out how to display that. If we'd just hide the time from the samples, it'd not add up correctly to the frame time anymore and in timeline view, timings between threads would be confusing. Maybe on selection of a sample in timeline view, it'd show the time with, without, and just the instrumention overhead?
    In hierarchy maybe via a separate column?

    Anyways one thing I wanted to check was if you have compared:
    • Dev build with Deep Profiling support but Deep Profiling turned off
      • With profiler
      • With Recorder/ProfilerRecorder just recording that layout sample
    • Dev build without deep profiling support
      • With profiler
      • With Recorder/ProfilerRecorder just recording that layout sample