TransformSystem performance preview 11

Discussion in 'Entity Component System' started by Chris-Herold, Aug 29, 2018.

  1. Chris-Herold

    Chris-Herold

    Joined:
    Nov 14, 2011
    Posts:
    115
    First off: I'm having a great time playing around with ECS/Burst/Jobs. Incredible stuff.
    Now to the problem:
    In preview 8 I added my own Scale component and system (it modifies TransformMatrix after the TransformSystem has run).
    Now in preview 11 (with my own Scale component and system removed) I find the new TransformSystem spending a lot of time doing its thing, although none of my entities are parented, nor will they ever need parenting.

    There are ~20,000 entities with position/rotation/scale in my world, and the preview 11 TransformSystem can take up to 3.5 milliseconds to compute LocalToWorld (there are only 2 archetypes). I'm rolling back to preview 8 for now, since it's so much faster (<0.1 ms).
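
For unparented entities, the per-entity work boils down to composing one matrix from position, rotation and scale. Here is a minimal sketch of that math, assuming plain System.Numerics types rather than Unity.Mathematics; `LocalToWorldSketch` and its methods are hypothetical names for illustration, not the TransformSystem's code:

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of Unity: illustrates the math only.
public static class LocalToWorldSketch
{
    // Composes scale, then rotation, then translation into one matrix.
    // System.Numerics uses row vectors, so the order reads left to right.
    public static Matrix4x4 Compose(Vector3 position, Quaternion rotation, Vector3 scale)
    {
        return Matrix4x4.CreateScale(scale)
             * Matrix4x4.CreateFromQuaternion(rotation)
             * Matrix4x4.CreateTranslation(position);
    }

    // Fills one matrix per entity from three parallel arrays.
    public static void ComposeAll(Vector3[] positions, Quaternion[] rotations,
                                  Vector3[] scales, Matrix4x4[] results)
    {
        for (int i = 0; i < results.Length; i++)
            results[i] = Compose(positions[i], rotations[i], scales[i]);
    }
}
```

With no parent to look up, each entity is independent of the others, which is why this kind of loop is normally cheap and easy to run in parallel.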

    I'm wondering if you guys are happy with the preview 11 TransformSystem, whether the trade of performance for parenting was really worth it, and what your plans are for this (and the instance renderer) in the next previews.
     
    Last edited: Aug 29, 2018
  2. julian-moschuering

    julian-moschuering

    Joined:
    Apr 15, 2014
    Posts:
    529
    You should check standalone performance and have a look at the 'Static' component.
     
  3. Chris-Herold

    Chris-Herold

    Joined:
    Nov 14, 2011
    Posts:
    115
    Standalone performance is actually much worse than in-editor (compared to preview 8, it dropped by more than half).
    Also, I can't use Static: it's an n-body gravity simulation where everything is in motion.

    I can't help but think that the new TransformSystem has a ton of overhead when there are lots of non-parented entities in motion.
    I realize I can write my own system (and I did for rendering with PerRenderData), but I'm wondering where this is going from here, since the drop in performance (for no gain in my case) is so huge. This should be easily reproducible using any ECS demo with tens of thousands of position/rotation/scale entities.
     
    Last edited: Aug 29, 2018
  4. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    3,356
    Yeah, I noticed the performance drop too. My assumption is that they favor correctness over performance at this point, and that they will work it out. Working on a real game, my approach has to be much more practical though, so I'm not jumping on new stuff nearly as fast anymore. Stuff that performs worse than not using ECS will have to be fixed in time; I don't see that standing.
     
  5. Gen_Scorpius

    Gen_Scorpius

    Joined:
    Nov 2, 2016
    Posts:
    65
    The new TransformSystem uses the ArchetypeChunk API, which is also new. Perhaps there is still plenty of codegen optimization work to be done.
     
  6. julian-moschuering

    julian-moschuering

    Joined:
    Apr 15, 2014
    Posts:
    529
    I did a quick test and performance is pretty good:
    300,000 entities having Position+Rotation+Scale
    Transform is changed for all objects every frame -> ChangedFiltering used by TransformSystem has no effect.
    Main Thread: 0.8ms
    Worker: 2.0ms in parallel on 8 threads

    I can confirm that standalone is much (50x) slower as Burst is not used for RootLocalToWorld for whatever reason. Other Jobs in the standalone do use Burst.

    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;
    using Unity.Mathematics;
    using Unity.Transforms;
    using UnityEngine;
    using Random = UnityEngine.Random;

    // Creates `count` entities with Position, Rotation and Scale and gives them random values.
    public class TransformSystemPerf : MonoBehaviour
    {
        public int count = 300000;

        NativeArray<Entity> entities;
        EntityManager em;

        void OnEnable()
        {
            em = World.Active.GetOrCreateManager<EntityManager>();
            var at = em.CreateArchetype(typeof(Position), typeof(Rotation), typeof(Scale));
            entities = new NativeArray<Entity>(count, Allocator.Persistent);
            em.CreateEntity(at, entities);

            SetRandomInitial();
        }

        private void OnDisable()
        {
            // Destroy the test entities and release the native array.
            em?.DestroyEntity(entities);
            entities.Dispose();
        }

        void SetRandomInitial()
        {
            for (int i = 0; i < entities.Length; i++)
            {
                em.SetComponentData(entities[i],
                    new Position {Value = new float3(Random.Range(0f, 1f), Random.Range(0f, 1f), Random.Range(0f, 1f))});
                em.SetComponentData(entities[i],
                    new Rotation {Value = quaternion.euler(Random.Range(0f, 1f), Random.Range(0f, 1f), Random.Range(0f, 1f))});
                em.SetComponentData(entities[i],
                    new Scale {Value = new float3(Random.Range(0f, 1f), Random.Range(0f, 1f), Random.Range(0f, 1f))});
            }
        }
    }

    // Modifies position, rotation and scale of every entity each frame.
    public class ChangeTransformSystem : JobComponentSystem
    {
        [BurstCompile]
        struct UpdatePosition : IJobProcessComponentData<Position, Rotation, Scale>
        {

            public void Execute(ref Position position, ref Rotation rotation, ref Scale scale)
            {
                position = new Position {Value = position.Value + 0.1f};
                rotation = new Rotation {Value = math.mul(rotation.Value, quaternion.rotateX(1f))};
                scale = new Scale {Value = scale.Value + 0.1f};
            }
        }

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            return new UpdatePosition().Schedule(this, inputDeps);
        }
    }
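
The note above that ChangedFiltering has no effect when every transform is written each frame can be illustrated with a toy model. This is only a sketch under assumptions: `ChunkSketch`, `ChangeFilterSketch` and the plain version comparison are hypothetical and are not Unity's actual types or logic. The idea is that each chunk remembers when it was last written, and a filtered pass skips chunks that nobody has written since the system last ran:

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a chunk: a batch of entities plus the
// version of the last update that wrote to it.
public sealed class ChunkSketch
{
    public uint ChangeVersion;
}

public static class ChangeFilterSketch
{
    // Returns how many chunks a change-filtered pass would actually process.
    public static int CountChunksToProcess(IEnumerable<ChunkSketch> chunks, uint lastSystemVersion)
    {
        int processed = 0;
        foreach (var chunk in chunks)
        {
            // Skip chunks that were not written since this system last ran.
            if (chunk.ChangeVersion <= lastSystemVersion)
                continue;
            processed++;
        }
        return processed;
    }
}
```

When a job writes every chunk every frame, every chunk's version is newer than the last run, so nothing is skipped and the filter saves no work.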
     
  7. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    5,203
    > I can confirm that standalone is much (50x) slower as Burst is not used for RootLocalToWorld for whatever reason. Other Jobs in the standalone do use Burst.
    Oh... Thanks for digging into it.

    We'll check it out.
     
  8. xoofx

    xoofx

    Unity Technologies

    Joined:
    Nov 5, 2016
    Posts:
    412
    Which platform are you running on, and which are you compiling for? Somewhere in the generated folder (the location depends on the platform) you should have a `lib_burst_generated.*` file. Is it there? Does your job `UpdatePosition` appear correctly in the Burst inspector in the Editor?
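
Since the location of the file depends on the platform, one way to check is to search the whole build output for it. A minimal sketch, assuming you pass in the root folder of your own build; `BurstLibFinder` is a hypothetical helper, not part of Unity or Burst:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: looks for Burst's generated library in a build folder.
public static class BurstLibFinder
{
    // Recursively lists files named lib_burst_generated.* under buildRoot.
    public static string[] Find(string buildRoot)
    {
        return Directory
            .EnumerateFiles(buildRoot, "lib_burst_generated.*", SearchOption.AllDirectories)
            .OrderBy(path => path, StringComparer.Ordinal)
            .ToArray();
    }
}
```

An empty result would suggest the player was built without any Burst-compiled code. A non-empty result only shows the library exists, not that a particular job made it in.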
     
  9. julian-moschuering

    julian-moschuering

    Joined:
    Apr 15, 2014
    Posts:
    529
    Windows x64. The profiler shows (Burst) for UpdatePosition but not for RootLocalToWorld. In the editor both are marked as '(Burst)'. lib_burst_generated.dll is there. I attached the txt file, which does not contain any TransformSystem stuff.

    Edit: attached the correct lib_burst_generated.txt; same problem.
     

    Last edited: Aug 30, 2018
  10. julian-moschuering

    julian-moschuering

    Joined:
    Apr 15, 2014
    Posts:
    529
    Attached the project.
     


  11. xoofx

    xoofx

    Unity Technologies

    Joined:
    Nov 5, 2016
    Posts:
    412
    Thanks, found the issue. We will push a fix in the next version of ECS.
     
  12. Chris-Herold

    Chris-Herold

    Joined:
    Nov 14, 2011
    Posts:
    115
    Here's an additional observation.

    Testing a non-changing set of entities (no entities are added or removed, no components added or removed) yields good performance (in-editor, as reported by @julian-moschuering), but when the set is changed frequently, performance drops considerably and is much worse than with the preview 8 TransformSystem.

    In my scenario I'm spawning 100 "attractors" and 16,000 "non-attractors".
    When two attractors collide, the smaller one changes archetype (removing a set of shared components) and 500 more non-attractors are instantiated. Non-attractors that collide with attractors also change archetype (adding a component).

    If the order of entities shown for the EndFrameTransformSystem (in the debugger) is any indication of processing order, my hunch is that component addition/removal makes entities change archetype and some part of the chunk iteration API turns into a bottleneck (just a hunch, I could be a gazillion miles off here...). I'm naively basing this on the fact that in the preview 8 TransformSystem the processing order of entities shown in the debugger is more or less coherent from frame to frame, while in the preview 11 TransformSystem the order appears to be completely different from frame to frame (in my test case with a frequently changed set of entities).
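
To make the hunch about ordering concrete, here is a toy model of how adding or removing a component could reshuffle entities. It is a sketch under assumptions, not a description of what preview 11 does: `ArchetypeSketch` and `StructuralChangeSketch` are hypothetical, and it assumes removal fills the freed slot with the last entity, a common strategy for dense storage:

```csharp
using System.Collections.Generic;

// Toy model: each archetype keeps its entities in one dense list.
public sealed class ArchetypeSketch
{
    public readonly List<int> Entities = new List<int>();

    // Removes an entity by moving the last one into the freed slot,
    // which keeps the list dense but changes the order.
    public bool RemoveSwapBack(int entityId)
    {
        int index = Entities.IndexOf(entityId);
        if (index < 0) return false;
        int last = Entities.Count - 1;
        Entities[index] = Entities[last];
        Entities.RemoveAt(last);
        return true;
    }
}

public static class StructuralChangeSketch
{
    // Adding or removing a component is modelled as a move between archetypes.
    public static void Move(int entityId, ArchetypeSketch from, ArchetypeSketch to)
    {
        if (from.RemoveSwapBack(entityId))
            to.Entities.Add(entityId);
    }
}
```

In this model, moving entity 1 out of an archetype holding 0, 1, 2, 3 leaves 0, 3, 2 behind, so frequent structural changes alone would be enough to make the order differ from frame to frame.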

    Thanks, Joachim and xoofx! (Big SharpDX fan, btw.)
     
    Last edited: Aug 31, 2018