Is the profiler cheating?

Discussion in 'Entity Component System' started by georgeq, Sep 9, 2018.

  1. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    660
    I was doing some benchmarks on pure ECS, and I was impressed by what seemed to be a huge performance gain until I saw this:

    (attached screenshot: profilecheat.png)

    At first glance it seems the game is running at around 200 frames per second; however, it is actually running at less than 15.
     
  2. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,558
    Can you expand the tree to see what you've got there? Is there any synchronisation?

    (attached screenshot: upload_2018-9-9_4-44-5.png)
    Also, can you switch to the Timeline view, like in my screenshot, to see the threads?
     
    Tazadar66 likes this.
  3. julian-moschuering

    julian-moschuering

    Joined:
    Apr 15, 2014
    Posts:
    529
    Probably waiting for a job. If possible, post the code here so we can make sense of it.
     
  4. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,558
    That's what I expect. The reported time often includes the waiting period between jobs.
     
  5. CodeSmile

    CodeSmile

    Joined:
    Apr 10, 2014
    Posts:
    3,915
    Ignore the actual ms times in the Profiler. It is not an accurate frame time; the profiler itself adds overhead. You will likely see this when you switch from regular profiling to deep profiling: the ms time usually goes up noticeably.

    When you start optimizing at 100 ms per frame and keep optimizing until you get to 30 ms per frame, then you know you've made a significant improvement. But in terms of actual fps you will most likely not see an exact jump from, say, 10 fps to 30 fps. You have to measure actual fps in a build using an fps counter (built-in, Fraps, or the Nvidia overlay). Use the ms time only as a relative indicator of where you started when you began profiling.
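    For reference, a minimal do-it-yourself fps counter for a player build could look roughly like this (a sketch only, not Unity's built-in stats overlay):

    Code (CSharp):
        using UnityEngine;

        // Rough sketch of a standalone fps counter for a player build.
        public class FpsCounter : MonoBehaviour {
            float smoothedDelta;

            void Update() {
                // Smooth the frame time so the readout doesn't flicker.
                smoothedDelta = Mathf.Lerp(smoothedDelta, Time.unscaledDeltaTime, 0.1f);
            }

            void OnGUI() {
                float fps = 1f / Mathf.Max(smoothedDelta, 0.0001f);
                GUI.Label(new Rect(10, 10, 200, 20), fps.ToString("F0") + " FPS");
            }
        }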
     
  6. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,419
    Judging by the tinted UI, you seem to be profiling the game running in the editor. Unity Technologies recommends profiling a build on the target platform instead.
     
    Tazadar66 and Ryiah like this.
  7. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,558
    While this is true in general, I personally wouldn't advise everyone to build the game just to check how a script / system performs after every change.
    The editor profiler is good for having a reference point, to see whether the optimization is making any progress, or whether something has gone wrong, even if there is significant overhead from the editor's checks.

    But I would say: if fps drops significantly and there is no other way to improve things, or to prove what is wrong, then I would go for an in-build performance check.
     
  8. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,419
    The Unite talk I linked to explains that often people seem to hunt performance and memory problems that actually do not exist, because they profile the game running in the editor, rather than a build.

    I would at least profile a build before I start optimizing, and again afterwards.
     
  9. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,558
    Yep, you are definitely right.
    My point was, perhaps not stated precisely, that there is just no need to build every time a script is changed.
     
  10. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    660
    I don't think that's the case, as I have only one system; however, I'm posting the timeline for you to see:

    (attached screenshot: profiletimeline.png)


    Here it is... But I want to make two things clear before you read it: 1) I wrote this code for benchmark purposes only, with the sole intention of pushing the CPU as hard as possible, so don't try to make sense of it. 2) I'm just starting to explore the ECS paradigm, so DO expect to see crazy things.

    Code (CSharp):
        [BurstCompile]
        public class AISystemPure : JobComponentSystem {

            public struct AIJob : IJobProcessComponentData<AIParam> {

                public void Execute(ref AIParam driver) {
                    for (int i = 0; i < 10000; i++) {
                        driver.WayPoint = driver.WayRight;
                        driver.WayRight = driver.WayUp;
                        driver.WayForward = driver.WayPoint;
                        driver.WayDistance = (driver.WayPoint - driver.WayRight).magnitude;
                        driver.WayDistance = Vector3.Dot(driver.WayForward, driver.WayRight);
                        Vector3 p = Vector3.ProjectOnPlane(driver.WayForward, driver.WayUp);
                        driver.WayPoint = p.normalized;
                    }
                }
            }

            protected override JobHandle OnUpdate(JobHandle deps) {
                AIJob job = new AIJob();
                return job.Schedule(this, deps);
            }
        }
    I think you're all missing the point here. What I'm trying to point out is the fact that the graph does not match the numeric data. The numeric data is accurate and matches what I see in the Entity Debugger, but it has absolutely nothing to do with what I see on the graph!

    Totally agree.
     
    Last edited: Sep 10, 2018
  11. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,558
    This is not how you should execute an IJob in pure ECS.

    Where did you get that example reference from, if I may ask?
    Have you seen the Two Stick sample, to see how jobs are executed there?

    Because 10k iterations is really nothing for ECS.

    Do you use a class by any chance in the job?
    And the vectors should technically be float3 for best performance.
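    A rough sketch of what that could look like with Unity.Mathematics, assuming AIParam's fields mirror the names used in the snippet above (the OP's actual component isn't shown):

    Code (CSharp):
        using Unity.Entities;
        using Unity.Mathematics;

        // Assumed shape of the component, with float3 instead of Vector3.
        public struct AIParam : IComponentData {
            public float3 WayPoint;
            public float3 WayRight;
            public float3 WayUp;
            public float3 WayForward;
            public float  WayDistance;
        }

        // Inside the job, the Vector3 calls would then map to math.* equivalents, e.g.:
        // driver.WayDistance = math.length(driver.WayPoint - driver.WayRight);
        // driver.WayDistance = math.dot(driver.WayForward, driver.WayRight);
        // float3 p = driver.WayForward - math.project(driver.WayForward, driver.WayUp); // project onto plane
        // driver.WayPoint = math.normalize(p);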
     
  12. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    660
    Yes, I know, but like I said, this is just a benchmark to test an extreme case (that may never happen in real life); I just want to see how far I can push the boundaries without breaking the system. The intention of the for loop is not to process data but to simulate an extremely large set of instructions. I wouldn't write Vector3.ProjectOnPlane etc. 10,000 times by hand, which is why I'm using a for loop. People usually test system performance with a large number of entities doing very little processing; I'm trying to see what happens if you have a small number of entities with an extremely heavy process.

    But again, the original intention of this post was to point out a very specific mismatch in the profiler, not to argue about performance.
     
    Last edited: Sep 10, 2018
  13. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,558
    Fair enough.

    I think the reason you see almost no overhead for your script on the first graph is that the job is running on a worker thread, while the main graph shows what is going on on the main thread. It may be the case that the profiler didn't pick it up; this is my assumption. The timing appears to be more correct in both cases.

    That is why you get the separate worker-thread rows in the timeline profiler.
     
  14. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    660
    Sounds reasonable, as the profiler was designed before the job system. It's not a major issue; however, I think the graph should match what you see in the numbers by all means.
     
  15. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,558
    Well, not really. Your game is still responsive while a worker thread does its own work, for however long it needs to.
    Just because one thread is busy doesn't mean you get low FPS. And that is what the graph is indicating: the frame time corresponding to FPS.

    So you can still have 30, 60, 100 etc. FPS, irrespective of what the worker threads are doing.
     
  16. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    660
    Yes, you are right, it is impossible to have an exact match because you have two incompatible ways of measuring time; however, all the information needed to make a rough estimate is already on the timeline. If you look closely at the second image I posted, the profiler knows the worker thread spent 93.84 ms, so it shouldn't be that hard to reflect that on the graph.
     
  17. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,558
    I know what you are saying.
    But in my opinion that would be more confusing than it needs to be. It would require additional graph filters, which are not there.
    That is why you have the thread rows, which clearly display what you need on the timeline.
     
  18. B0dg3

    B0dg3

    Joined:
    Jun 1, 2018
    Posts:
    2
    You are not running Burst on your example. Move the [BurstCompile] attribute to the job, not the system.
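    A minimal sketch of that change, reusing the names from the snippet above (illustration only, not the OP's actual project):

    Code (CSharp):
        public class AISystemPure : JobComponentSystem {

            [BurstCompile] // Burst compiles the job struct, not the system class
            public struct AIJob : IJobProcessComponentData<AIParam> {
                public void Execute(ref AIParam driver) {
                    // ... same loop body as in the original snippet ...
                }
            }

            protected override JobHandle OnUpdate(JobHandle deps) {
                return new AIJob().Schedule(this, deps);
            }
        }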
     
  19. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,558
    I think the OP is aware of this by now.
    But as pointed out, that is not the main topic of this thread; the profiler itself is.
     
    Tazadar66 likes this.
  20. MartinTilo

    MartinTilo

    Unity Technologies

    Joined:
    Aug 16, 2017
    Posts:
    2,140
    Hi @georgeq
    Hierarchy view and the Charts should not show any time spent on threads; they just show the main thread, which also dictates the frame rate. If you sync fence on the main thread, then the time spent waiting should also show up in the Charts and Hierarchy as waiting time. If the job has multiple instances and some of them have not yet started on threads, a sync fence would also pull some of them onto the main thread. However, that should only show the timings for those parts of the job that run on the main thread in the Charts and in Hierarchy view. (You only seem to have one instance, so this shouldn't be the case.)

    Not knowing exactly what the scenario is in the screenshot with the Hierarchy view, this looks like a bug with either Hierarchy view or the charts.

    With the timeline screenshot, you can see that the next frame has a WaitForJobGroupID (i.e. a sync fence) on the main thread, and that it is waiting for the job to finish. The charts do show a spike, alas, one frame before the selected one. Is this spike unrelated? The charts should show a spike for the next frame, where the fencing seems to occur (unless it's even more broken and the timeline shows the end time of the frame before the frame actually ended, and the sync is happening in the selected frame).

    If you could please report a bug (or preferably 2 separate ones, one for Hierarchy and one for Timeline) with a repro project and/or a saved profiler capture, that would be grand :)
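    For anyone following along, one explicit way such a sync fence appears is when a job handle is completed on the main thread; a hypothetical variant of the OP's system (not his actual code) would look like:

    Code (CSharp):
        protected override JobHandle OnUpdate(JobHandle deps) {
            JobHandle handle = new AIJob().Schedule(this, deps);
            // Complete() blocks the main thread until the job has finished,
            // so the wait shows up on the main thread (and thus in Hierarchy/Charts).
            handle.Complete();
            return handle;
        }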
     
    Last edited: Sep 11, 2018
    optimise and Antypodish like this.
  21. MartinTilo

    MartinTilo

    Unity Technologies

    Joined:
    Aug 16, 2017
    Posts:
    2,140
    And yes, the profiler does not yet ideally highlight potential issues with threading/ECS. The CPU chart is technically the main thread graph. We're aware and, as a first step, Timeline was made scrollable, thread height resizable, and it's the new default view for CPU because it's easier to parse at a glance and now actually usable without a monitor in portrait mode. There are more changes coming: we're looking for ways to show job scheduling and dependencies, as well as semaphore waits and the like. Right now, Timeline is still the best place to look, even though you'll have to make the connection yourself for what influences things on other threads.

    If you have any other ideas on how we can help with this, please let us know :)
    Also, we're currently collecting feedback on Profiling ;)
     
    optimise likes this.
  22. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    660
    Case 1080435 (hierarchy)

    I'm not sure whether the Timeline behaviour is a bug or not, so I didn't file a bug report for that.
     
    MartinTilo likes this.
  23. MartinTilo

    MartinTilo

    Unity Technologies

    Joined:
    Aug 16, 2017
    Posts:
    2,140
    It's been a while but here is a bit of an update on this:
    1) I'm going to go on the assumption that the frame you had selected in the very first screenshot was the frame after the one in which you scheduled the job. So in terms of the timeline screenshot, it is the next frame to the right of the highlighted and selected one. That adds up, as the sample name is that of the system and not the job itself, so it is just the time spent waiting for the sync fence on that system.
    2) The time spent in the sync fence, as can be seen in the timeline screenshot, is spent in WaitForJobGroupID, which is colored grey/transparent because it is categorized as "internal".
    3) Internal is currently evaluated as "Other" in the Categories for the CPU Usage chart module.
    4) Calculation and display of the "Other" category, as well as all the labels in the CPU Usage chart module, were broken.
    -> This (#4) is fixed in 2019.2.0a7 and backported to 2019.1 and 2018.3
    There are other parts to this issue that still need to be looked at, like the fact that we're currently effectively hiding the Other category from the chart unless a sample containing time in this category is selected, and even then we only show you timings in that category related to that sample. Also, these internal wait samples should ideally be handled differently with regard to their category.

    Just wanted to leave that as an update and clarification here :)
     
    psuong, FROS7 and Timboc like this.
  24. georgeq

    georgeq

    Joined:
    Mar 5, 2014
    Posts:
    660
    Thanks for the update!