Question Very high standard deviation when performance testing Entities with Jobs

Discussion in 'Testing & Automation' started by matros, Nov 4, 2023.

  1. matros

    Hello,

    When I run performance tests on a system that uses jobs, I observe a very high standard deviation, often much larger than the median. This holds both in the editor and on the target platform (a PC in my case).

    A closer look at the samples shows that the results are highly dispersed. Some samples appear to have been taken incorrectly: every test has the same minimum value of 35 microseconds, as if a sample were recorded even when no work was being performed. (A separate question is why the system would be idle during a performance test run.)

    upload_2023-11-4_22-19-36.png

    I initially suspected that this dispersion might be related to work being done on worker threads. However, both the main thread implementation and the job.Run() implementation exhibit the same issue, especially when working with a high number of entities.
    upload_2023-11-4_22-33-59.png

    Is this level of dispersion expected, and what could be the reason behind it?

    For your reference, I am using the following dependencies:
    • "com.unity.test-framework.performance": "3.0.3"
    • "com.unity.entities": "1.0.16"

    Here is the system under test, which uses ScheduleParallel().
    And here is the full repo with a repro.
     
  2. CodeSmile

    I observed something similar a while back. There are many possible reasons. I didn't give it much thought at the time because I was only testing in the editor.

    My guess is that you'd have to measure this in a release build for the most accurate view. But the reason may simply be how modern hardware works: dynamic frequency switching, performance boost periods that last only a few seconds (or fractions of a second), thermal or power-saving throttling, the task scheduler's load balancing, memory and cache access and invalidation patterns, random pipeline stalls caused by background tasks, performance versus efficiency cores. All sorts of things come to mind.

    I came to ignore the (bad) outliers because what matters most is: how fast can it be?
    If a test completes in 100 ms most of the time but sometimes takes 500 ms, you can still be sure that you made an improvement if, after some code changes, the runs mostly complete within 80 ms, even though they will now sometimes take 600 ms.
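
To make that comparison concrete, here is a small Python sketch. The sample values are hypothetical, built only from the 100/500 ms and 80/600 ms figures in the example above, and `summarize` is an illustrative helper, not part of the Unity performance testing package. It suggests why the median can be a steadier basis for comparison than the mean or the standard deviation when a few slow outliers are present.

```python
import statistics

def summarize(samples_ms):
    """Return median, mean and sample standard deviation of timing samples."""
    return {
        "median": statistics.median(samples_ms),
        "mean": statistics.mean(samples_ms),
        "stdev": statistics.stdev(samples_ms),
    }

# Hypothetical runs: nine typical samples plus one slow outlier each.
before = [100] * 9 + [500]
after = [80] * 9 + [600]

before_stats = summarize(before)
after_stats = summarize(after)

print(before_stats)
print(after_stats)
```

With these made-up numbers the median drops from 100 ms to 80 ms, while the standard deviation is larger than the median in both runs, so a large standard deviation on its own does not hide the improvement.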
     
  3. matros

    @CodeSmile Thanks for your reply; ignoring outliers is a great idea. That said, if that's the case, I would prefer the performance testing package to handle it.
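
If the package does not do this for you, outliers could be filtered by hand before comparing runs. Below is a minimal Python sketch, assuming the raw sample values have been exported from the test run. `drop_outliers`, the 1.5 × IQR rule and the sample data are illustrative choices, not something the performance testing package is known to provide.

```python
import statistics

def drop_outliers(samples, k=1.5):
    """Keep samples within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if low <= s <= high]

# Hypothetical timings in ms: nine similar samples and one slow outlier.
samples_ms = [98, 101, 100, 99, 102, 100, 97, 103, 100, 500]
filtered = drop_outliers(samples_ms)

print(len(samples_ms) - len(filtered), "sample(s) dropped")
```

Whether 1.5 × IQR is the right threshold depends on the data; it is simply a common rule of thumb.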

    Anyway, if we look at performance testing outside of Entities, the results there are fine even when using jobs, and the dispersion is reasonable.
    upload_2023-11-5_13-13-56.png

    This sample is also available on GitHub.
    So far I have only seen the issue in tests against Unity ECS.