Explanation of difference in performance

Discussion in 'Entity Component System' started by Rupture13, Feb 19, 2021.

  1. Rupture13

    Rupture13

    Joined:
    Apr 12, 2016
    Posts:
    129
    Aloha,

    I have a bit of code that finds a sceneEntity based on two values in a component of that sceneEntity (the component was added to the sceneEntity during conversion). This code runs on the main thread using a for loop (variation 1).

    I was looking to improve performance by finding the entity in a ForEach instead (variation 2). However, the new solution seems to be slower (it takes roughly 2.5 times as long) and I was wondering why.

    (Both variations are shown here. I measured the difference in performance with the ProfilerMarker, commenting out one variation at a time.)
    Code (CSharp):
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;
    using Unity.Profiling;
    using Unity.Scenes;

    public class SomeSystem : SystemBase
    {
        private EntityQuery m_sceneSectionQuery;
        private SceneSystem m_sceneSystem;
        static ProfilerMarker s_PerfMarker = new ProfilerMarker("PerformanceMarker1");

        protected override void OnCreate()
        {
            base.OnCreate();

            m_sceneSystem = World.GetExistingSystem<SceneSystem>();

            m_sceneSectionQuery = GetEntityQuery(ComponentType.ReadOnly<SomeData>(), ComponentType.ReadOnly<SceneSectionData>());

            RequireForUpdate(m_sceneSectionQuery);
            RequireSingletonForUpdate<DesiredDataSingleton>();
        }

        protected override void OnUpdate()
        {
            var desiredData = GetSingleton<DesiredDataSingleton>();

            s_PerfMarker.Begin();

            //-------- Variation 1 --------
            // Main-thread search: copy the matching entities, then read SomeData per entity.
            NativeArray<Entity> availableSceneSections = m_sceneSectionQuery.ToEntityArray(Allocator.Temp);
            Entity desiredSceneSectionEntity = Entity.Null;
            for (int i = 0; i < availableSceneSections.Length; i++)
            {
                var sceneSectionEntity = availableSceneSections[i];
                var data = GetComponent<SomeData>(sceneSectionEntity);
                if (data.Foo == desiredData.desiredFoo && data.Bar == desiredData.desiredBar)
                {
                    desiredSceneSectionEntity = sceneSectionEntity;
                    break; // early exit on the first match
                }
            }
            availableSceneSections.Dispose();
            m_sceneSystem.LoadSceneAsync(desiredSceneSectionEntity);
            //-------- End of variation 1 --------

            //-------- Variation 2 --------
            // Job-based search: the result is passed back through a one-element NativeArray.
            var result = new NativeArray<Unity.Entities.Hash128>(1, Allocator.TempJob);
            JobHandle job = Entities
                .ForEach((in SomeData someData, in SceneSectionData sceneSectionData) =>
                {
                    if (someData.Foo == desiredData.desiredFoo && someData.Bar == desiredData.desiredBar)
                    {
                        result[0] = sceneSectionData.SceneGUID;
                    }
                }).ScheduleParallel(this.Dependency);

            // Completing right after scheduling blocks the main thread until the job finishes.
            job.Complete();
            var sceneGUID = result[0];
            result.Dispose();

            m_sceneSystem.LoadSceneAsync(sceneGUID);
            //-------- End of variation 2 --------

            s_PerfMarker.End();
        }
    }
    There are only 12 entities being processed by this in my scenario. Is the overhead of scheduling the job causing the extra time, so that it is not worth it for such a small number of entities?

    Or is there another reason why variation 2 is slower? I had expected it to be faster because it uses the ForEach instead of GetComponent calls.
     
    Last edited: Feb 19, 2021
  2. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,579
    Running a ForEach over 12 entities is a bit meaningless, to be honest; the sample is too small.
    Try it with 1,200 and 12k entities and you will see the difference.
    Yes, scheduling introduces a small overhead, provided there is a matching query.
    You can also add .WithReadOnly(desiredData) to the ForEach to see whether that improves anything.
    Also, try comparing against an IJob.
     
  3. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,292
    Using .ScheduleParallel is definitely not worth it with a low entity count.
    Try using .Run() instead. It Burst-compiles the code by default, so it should be faster.

    Also, scheduling a job and completing it immediately creates a sync point, which stalls the main thread. Try to avoid it.
     
  4. Rupture13

    Rupture13

    Joined:
    Apr 12, 2016
    Posts:
    129
    Thank you for your replies! The entity count is indeed very low. I now wonder at what number of entities these things usually start to become viable, though that of course also depends on what the job actually does...

    @Antypodish .WithReadOnly(desiredData) does not seem to work, because desiredData is a (singleton) component rather than a NativeContainer of any sort.

    @xVergilx using .Run() was indeed faster than either Schedule or ScheduleParallel, but still slower than the implementation without any ForEach. The .Run() version is now roughly one ms slower (at an average of 3.117ms) than the for loop (at an average of 2.083ms). I don't really know why it's slower at this point, but it also hardly matters given the small difference and the small number of entities.
    The code for the new implementation with .Run() is:
    Code (CSharp):
            //-------- Variation 2 --------
            var result = new Unity.Entities.Hash128();
            Entities
                .ForEach((in SomeData someData, in SceneSectionData sceneSectionData) =>
                {
                    if (someData.Foo == desiredData.desiredFoo && someData.Bar == desiredData.desiredBar)
                    {
                        result = sceneSectionData.SceneGUID;
                    }
                }).Run();

            m_sceneSystem.LoadSceneAsync(result);
     
  5. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    440
    Worth noting that there's more to consider than just the overhead of scheduling the job. By accessing a component on the main thread you're forcing any jobs that write to it to complete immediately.
     
  6. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,292
    Difference could come from multiple factors:
    1. If Burst compilation has not finished before the first run, methods can be slow while compiling (this happens in the editor only);
    2. The first one has an early exit; the second one doesn't.
    3. Not sure, but perhaps LoadSceneAsync takes a different amount of time in each test. Maybe it should be excluded from testing.
    4. The second test queries two components vs. one in the first.
    5. The one @Sarkahn mentioned.

    Note that in most cases ComponentDataFromEntity would be faster than copying whole array of data (this is what GetComponent does under the hood), but it depends on the use case.
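
For illustration only, here is a rough sketch of what a ComponentDataFromEntity lookup could look like in the main-thread for loop. The system name, the SomeData field types, and the placeholder comparison values are assumptions, not taken from the original project. If other jobs write SomeData, they would have to be complete before this main-thread read, and whether it actually beats GetComponent for a dozen entities would need profiling.

```csharp
using Unity.Collections;
using Unity.Entities;

// Hypothetical stand-in for the thread's SomeData; the int field types are assumed.
public struct SomeData : IComponentData
{
    public int Foo;
    public int Bar;
}

// Hypothetical system name, used only for this sketch.
public class LookupSketchSystem : SystemBase
{
    private EntityQuery m_query;

    protected override void OnCreate()
    {
        m_query = GetEntityQuery(ComponentType.ReadOnly<SomeData>());
    }

    protected override void OnUpdate()
    {
        // Fetch the read-only lookup once instead of calling GetComponent per entity.
        ComponentDataFromEntity<SomeData> lookup = GetComponentDataFromEntity<SomeData>(true);

        NativeArray<Entity> entities = m_query.ToEntityArray(Allocator.Temp);
        Entity match = Entity.Null;
        for (int i = 0; i < entities.Length; i++)
        {
            SomeData data = lookup[entities[i]];
            if (data.Foo == 1 && data.Bar == 2) // placeholder values, assumed
            {
                match = entities[i];
                break; // early exit, as in variation 1
            }
        }
        entities.Dispose();

        // 'match' is Entity.Null when nothing matched; use it as needed from here.
    }
}
```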
     
  7. Rupture13

    Rupture13

    Joined:
    Apr 12, 2016
    Posts:
    129
    Alrighty, I ran some new tests. I mitigated as many of the potential factors as I knew how to:
    1. I don't know what I can do to check this
    2. Made sure the desiredData is the last item the for-loop iterates over, so both implementations should go over each entity
    3. Took the LoadSceneAsync out of the test (I previously ran the test ten times to account for any variable timing issues like it)
    4. Both implementations query two components (SomeData and SceneSectionData), via the EntityQuery and the ForEach respectively. In both cases they are read-only.
    5. I believe this no longer applies to the new variation 2 implementation with .Run()

    The results of the test are still quite in favour of variation 1 (with the for-loop GetComponent):
     
  8. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,292
    Make sure that "Synchronous Compilation" is enabled in the Jobs -> Burst menu. Don't measure the first run (generally speaking, no test on a JIT-compiled platform should be measured on its first run).

    Try disabling Burst Safety Checks and Leak Detection to see if that makes any difference.

    Also, try profiling with Deep Profile enabled to see what actually takes the time.
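
To separate the two variations in the Profiler without commenting code in and out, one option is a ProfilerMarker per variation, sketched below. The system and marker names are made up for the example, and the lookup bodies are left as comments to be filled in from the code earlier in the thread.

```csharp
using Unity.Entities;
using Unity.Profiling;

// Hypothetical system name, used only for this sketch.
public class MarkerSketchSystem : SystemBase
{
    // One marker per variation so each shows up as its own Profiler entry.
    static readonly ProfilerMarker s_ForLoopMarker = new ProfilerMarker("Variation1.ForLoop");
    static readonly ProfilerMarker s_ForEachMarker = new ProfilerMarker("Variation2.ForEach");

    protected override void OnUpdate()
    {
        // Auto() returns a disposable scope that calls Begin now and End when disposed.
        using (s_ForLoopMarker.Auto())
        {
            // Variation 1 lookup goes here, without the LoadSceneAsync call.
        }

        using (s_ForEachMarker.Auto())
        {
            // Variation 2 lookup goes here, without the LoadSceneAsync call.
        }
    }
}
```

Keeping LoadSceneAsync outside both scopes means the markers time only the entity search itself.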