[SOLVED] EndSimulationEntityCommandBufferSystem huge overhead cause?

Discussion in 'Data Oriented Technology Stack' started by MichaelTwin, Aug 9, 2019.

  1. MichaelTwin

    MichaelTwin

    Joined:
    May 21, 2013
    Posts:
    8
    In one of my ECS systems, I provide a NativeArray of components for a job.
    This NativeArray field on the job is marked [DeallocateOnJobCompletion] [ReadOnly].

    If I don't access this array inside the job, the FPS is alright.
    If I iterate over it (a simple "for" loop) and access every element, the Entity Debugger shows that EndSimulationEntityCommandBufferSystem eats around 30-40 ms.
    This system doesn't even explicitly use an EntityCommandBuffer!

    What can cause this behavior?
     
  2. ilih

    ilih

    Joined:
    Aug 6, 2013
    Posts:
    701
    The jobs' execution time is included in the CommandBuffer systems' execution time, because the CommandBuffer systems wait for all previously scheduled jobs to complete before playing back their commands.
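
    One way to see this for yourself, as a rough sketch: calling Complete() on the handle inside your own OnUpdate forces the wait to happen there, so the time shows up under your system in the profiler rather than under the command buffer system (MySystemJob is just a placeholder for whatever job the system schedules):

    Code (CSharp):
    protected override JobHandle OnUpdate(JobHandle inputDependencies)
    {
        var jobHandle = new MySystemJob { /* ... */ }.Schedule(this, inputDependencies);

        // For measurement only: creates a sync point right here,
        // so the job's cost is attributed to this system in the profiler.
        jobHandle.Complete();

        return jobHandle;
    }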
     
    MichaelTwin likes this.
  3. tertle

    tertle

    Joined:
    Jan 25, 2011
    Posts:
    1,704
    Entity Debugger doesn't really show you the cost of each system and shouldn't be used for profiling. It mostly shows you where your sync points and main-thread usage are.

    EndSimulationEntityCommandBuffer is basically just triggering a sync point and waiting for the previous systems' jobs to end. It's your previous job that is taking a long time to complete.

    Use the Profiler to actually see how long each of your jobs takes.

    -edit-

    beaten by a second
     
    MichaelTwin likes this.
  4. MichaelTwin

    MichaelTwin

    Joined:
    May 21, 2013
    Posts:
    8
    Thanks everyone, I'll explore this situation through the normal Profiler later.
    I still don't understand why plain access to NativeArray elements in a job can be so expensive.
     
  5. MichaelTwin

    MichaelTwin

    Joined:
    May 21, 2013
    Posts:
    8
    The regular Profiler yielded the same results (the system runs slow!).
    I also checked another ECS test project with basically the same system. That project used Entities 0.1.0 and ran at around 70-80 fps. After updating to Entities 0.1.1, it runs at 15 fps.
    Now I have even more questions.
     
  6. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    489
    What are your Burst settings?
     
  7. MichaelTwin

    MichaelTwin

    Joined:
    May 21, 2013
    Posts:
    8
    @DreamingImLatios
    Leak Detection: On
    Burst:
    Enable Compilation: On
    Safety Checks: On
    Synchronous Compilation: On
    Show Timings: Off
     
  8. GilCat

    GilCat

    Joined:
    Sep 21, 2013
    Posts:
    432
    Can you share a screenshot of the profiler timeline?
     
  9. MichaelTwin

    MichaelTwin

    Joined:
    May 21, 2013
    Posts:
    8
    @GilCat sure. The giant 4000 ms spikes are from the same system, not from GC or anything.
    [screenshot: upload_2019-8-10_14-54-42.png]

    Here is the code for the generic parent class of this system.

    Code (CSharp):
    public abstract class NeighbourCounting<T1, T2> : JobComponentSystem
        where T1 : struct, ComponentWithValue
        where T2 : struct, IComponentData
    {
        [BurstCompile]
        struct NeighbourCountingJob : IJobForEach<Translation, T1>
        {
            [DeallocateOnJobCompletion] [ReadOnly] public NativeArray<Translation> targetComponentPositions;
            [ReadOnly] public float requiredDistanceSq;

            public void Execute([ReadOnly] ref Translation translation, ref T1 neighboursCounter)
            {
                // Brute-force pass: compare this entity's position against every target position.
                int neighbourCount = 0;
                for (int i = 0; i < targetComponentPositions.Length; i++)
                {
                    float distanceToTarget = math.distancesq(translation.Value, targetComponentPositions[i].Value);
                    if (distanceToTarget <= requiredDistanceSq)
                    {
                        neighbourCount++;
                    }
                }

                neighboursCounter.Value = neighbourCount;
            }
        }

        protected abstract float RequiredDistanceSq { get; }

        private EntityQuery targetNeighbourQuery;

        protected override void OnCreate()
        {
            base.OnCreate();
            targetNeighbourQuery = GetEntityQuery(typeof(T2), typeof(Translation));
        }

        protected override JobHandle OnUpdate(JobHandle inputDependencies)
        {
            // Gather the positions of all T2 entities into a temporary array for the job.
            NativeArray<Translation> targetComponentPositions = targetNeighbourQuery.ToComponentDataArray<Translation>(Allocator.TempJob);

            var job = new NeighbourCountingJob
            {
                targetComponentPositions = targetComponentPositions,
                requiredDistanceSq = RequiredDistanceSq
            };

            var jobHandle = job.Schedule(this, inputDependencies);
            return jobHandle;
        }
    }
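
    For context, a concrete subclass looks roughly like this (the component types below are made-up examples; ComponentWithValue is assumed to derive from IComponentData and expose a settable int Value):

    Code (CSharp):
    // Hypothetical counter component written by the job (T1).
    public struct PreyNeighbours : ComponentWithValue
    {
        public int Value { get; set; }
    }

    // Hypothetical marker component on the entities whose positions are gathered (T2).
    public struct Prey : IComponentData { }

    // For every entity with Translation + PreyNeighbours,
    // counts how many Prey entities sit within 5 units.
    public class PreyNeighbourCountingSystem : NeighbourCounting<PreyNeighbours, Prey>
    {
        protected override float RequiredDistanceSq => 5f * 5f;
    }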
     
  10. tertle

    tertle

    Joined:
    Jan 25, 2011
    Posts:
    1,704
    How many entities do you have? This job scales very poorly if you have any significant entity count.
     
  11. MichaelTwin

    MichaelTwin

    Joined:
    May 21, 2013
    Posts:
    8
    In this test there are around 22,500 entities.
    The way it scales isn't that surprising, but the 10x degradation in performance after upgrading from Entities 0.1.0 is.
     
  12. GilCat

    GilCat

    Joined:
    Sep 21, 2013
    Posts:
    432
    I don't see how it could have a 10x degradation from Entities 0.1.0 to 0.1.1, but like @tertle said, that job is not going to scale.
    You should consider other algorithms to reduce the number of entities you test the distance against.
    Right now you are testing all of your entities against each other (22500^2 checks each frame). Maybe a quadtree could help?
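
    A simpler alternative in the same spirit is a uniform-grid spatial hash: bucket every target position by its grid cell, then each entity only checks the 3x3x3 block of cells around it instead of all 22,500 positions. A rough sketch, assuming the cell size is at least the query radius (the NeighbourCount component and job name here are hypothetical):

    Code (CSharp):
    [BurstCompile]
    struct GridNeighbourCountingJob : IJobForEach<Translation, NeighbourCount>
    {
        // Built each frame before scheduling, e.g.:
        //   var grid = new NativeMultiHashMap<int, float3>(targetPositions.Length, Allocator.TempJob);
        //   for (int i = 0; i < targetPositions.Length; i++)
        //       grid.Add(GridKey(targetPositions[i].Value, cellSize), targetPositions[i].Value);
        [ReadOnly] public NativeMultiHashMap<int, float3> grid;
        public float cellSize;
        public float requiredDistanceSq;

        public static int GridKey(float3 position, float cellSize)
        {
            return (int)math.hash((int3)math.floor(position / cellSize));
        }

        public void Execute([ReadOnly] ref Translation translation, ref NeighbourCount counter)
        {
            int count = 0;
            int3 centre = (int3)math.floor(translation.Value / cellSize);

            // Only the 27 cells around this entity can contain in-range positions.
            for (int x = -1; x <= 1; x++)
            for (int y = -1; y <= 1; y++)
            for (int z = -1; z <= 1; z++)
            {
                int key = (int)math.hash(centre + new int3(x, y, z));
                if (grid.TryGetFirstValue(key, out float3 position, out var it))
                {
                    do
                    {
                        if (math.distancesq(translation.Value, position) <= requiredDistanceSq)
                            count++;
                    } while (grid.TryGetNextValue(out position, ref it));
                }
            }

            counter.Value = count;
        }
    }

    // Hypothetical counter component for the sketch.
    public struct NeighbourCount : IComponentData
    {
        public int Value;
    }

    Hash collisions only cost a few extra comparisons, since the distancesq check filters out false positives anyway.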
     
  13. MichaelTwin

    MichaelTwin

    Joined:
    May 21, 2013
    Posts:
    8
    @GilCat yeah, I completely understand the inefficiency of this job. O(n^2) is no good, yup.

    I'm trying to create a cellular automaton in discrete 3D space, and getting neighbors through a 3D array should be a simpler solution than quadtrees. Also, I don't know a thing about quadtrees.

    This 3D array of positions should probably be created as shared DynamicBuffer data?
    I've never tried those either, so I'm going into research mode...
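
    For reference, a minimal sketch of declaring a DynamicBuffer element (the type name and capacity below are made up):

    Code (CSharp):
    // Element type for a DynamicBuffer of positions; hypothetical example.
    [InternalBufferCapacity(64)] // up to this many elements are stored directly in the chunk
    public struct CellPosition : IBufferElementData
    {
        public float3 Value;

        // Implicit conversions let the buffer be used almost like a float3 array.
        public static implicit operator float3(CellPosition e) { return e.Value; }
        public static implicit operator CellPosition(float3 v) { return new CellPosition { Value = v }; }
    }

    // Usage, e.g. on a singleton "grid" entity:
    //   DynamicBuffer<CellPosition> buffer = EntityManager.AddBuffer<CellPosition>(gridEntity);
    //   buffer.Add(new float3(1f, 2f, 3f));
    // Inside a job, buffers can be read through GetBufferFromEntity<CellPosition>(true).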