Benchmark performance of ecs systems

Discussion in 'Data Oriented Technology Stack' started by jojolepro, Aug 25, 2018.

  1. jojolepro

    Joined: Sep 16, 2015
    Posts: 3
    Hi,

    I made a benchmark of Unity's ECS to compare its performance with the one found in the Amethyst game engine (which uses the "specs" ECS).

    I would like someone who knows the ECS in depth to look at my code and make sure my benchmark is valid and that I didn't accidentally make an error that would lower Unity's performance.

    The code is available here:
    https://github.com/jojolepro/unityecsbench/tree/master/Assets/Scenes

    How it works:
    I have 3 components: Comp1, Comp2, Comp3
    I create 3 million entities according to the following distribution:
    1 million: Comp1
    1 million: (Comp1 & Comp2)
    1 million: (Comp1 & Comp2 & Comp3)

    Then I iterate over all the entities using 3 different read-only systems, making sure to access the data of each.
    Sys1: Comp1 -> 3 million iter
    Sys2: (Comp1 & Comp2) -> 2 million iter
    Sys3: (Comp2 & Comp3) -> 1 million iter
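
    The iteration counts follow directly from the distribution. A minimal sketch of that arithmetic in plain C#, with no Unity dependency; the `Comps` enum and the `EntityCounts` helper are hypothetical names used only for illustration and are not taken from the benchmark repository:

```csharp
using System;

// Hypothetical bit flags standing in for the three benchmark components.
[Flags]
enum Comps { None = 0, Comp1 = 1, Comp2 = 2, Comp3 = 4 }

static class EntityCounts
{
    // The three entity groups described above: component set and entity count.
    public static (Comps, long)[] Benchmark() => new[]
    {
        (Comps.Comp1, 1_000_000L),
        (Comps.Comp1 | Comps.Comp2, 1_000_000L),
        (Comps.Comp1 | Comps.Comp2 | Comps.Comp3, 1_000_000L),
    };

    // Number of entities whose component set contains every component in `required`.
    public static long Matching((Comps comps, long count)[] groups, Comps required)
    {
        long total = 0;
        foreach (var (comps, count) in groups)
        {
            if ((comps & required) == required)
                total += count;
        }
        return total;
    }
}
```

    For example, `EntityCounts.Matching(EntityCounts.Benchmark(), Comps.Comp2 | Comps.Comp3)` counts only the third group, since it is the only one that carries both components.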

    Each update frame takes a total of 45 seconds to complete (which is REALLY slow for the number of entities, considering specs can do the same in 8 ms).
    Testing was done by building the game and running it from the command line with -nographics -batchmode.
    The benchmark results are written to the log file in your system's default log location.
    Compiling on Linux was impossible, so I could only test on Windows.

    A built exe of the game is available in the repo under the Build/ directory.

    Let me know your results and thanks for your time!
     
    Last edited: Aug 25, 2018
  2. tertle

    Joined: Jan 25, 2011
    Posts: 1,728
    You're not using jobs (let alone the Burst compiler), so you're not getting any of the performance benefits.
     
    Last edited: Aug 25, 2018
  3. tertle

    Joined: Jan 25, 2011
    Posts: 1,728
    Also, your entity counts are off.

    You actually created 3 million entities for the first system, 2 million entities for the second system and 1 million entities for the third system, not 1 million for each.

    But to expand on my original post: if I jobify this, I can run your exact example in under 3 milliseconds for all 3 systems.

    Source

    Code (CSharp):
    using Unity.Entities;
    using UnityEngine;
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;

    // Sys1 visits every entity that has Comp1.
    public class Sys1 : JobComponentSystem
    {
        [BurstCompile]
        private struct Job : IJobProcessComponentData<Comp1>
        {
            // Accumulator, so that the component data is actually read.
            public int C;

            public void Execute([ReadOnly] ref Comp1 data)
            {
                C += data.data;
            }
        }

        protected override JobHandle OnUpdate(JobHandle jobHandle)
        {
            // Schedule the job and hand its handle back as the system's dependency.
            return new Job().Schedule(this, jobHandle);
        }
    }

    // Sys2 visits every entity that has both Comp1 and Comp2.
    public class Sys2 : JobComponentSystem
    {
        [BurstCompile]
        private struct Job : IJobProcessComponentData<Comp1, Comp2>
        {
            public int C;

            public void Execute([ReadOnly] ref Comp1 data0, [ReadOnly] ref Comp2 data1)
            {
                C += data0.data;
            }
        }

        protected override JobHandle OnUpdate(JobHandle jobHandle)
        {
            return new Job().Schedule(this, jobHandle);
        }
    }

    // Sys3 visits every entity that has both Comp2 and Comp3.
    public class Sys3 : JobComponentSystem
    {
        // Not used by the job below.
        public struct CompGroup
        {
            public readonly ComponentDataArray<Comp2> comp2;
            public readonly ComponentDataArray<Comp3> comp3;
            public readonly int Length;
        }

        [BurstCompile]
        private struct Job : IJobProcessComponentData<Comp2, Comp3>
        {
            public int C;

            public void Execute([ReadOnly] ref Comp2 data1, [ReadOnly] ref Comp3 data2)
            {
                C += data1.data;
            }
        }

        protected override JobHandle OnUpdate(JobHandle jobHandle)
        {
            return new Job().Schedule(this, jobHandle);
        }
    }
    Results

    upload_2018-8-25_14-37-30.png

    Debugger

    upload_2018-8-25_14-39-19.png

    FPS

    upload_2018-8-25_14-38-9.png

    And this is all profiled in the editor. If I compiled it, it'd be even faster, but 3 ms is still orders of magnitude faster than 45 seconds.
     
  4. rizu

    Joined: Oct 8, 2013
    Posts: 1,177
    I'm not speaking on behalf of the original poster, but I thought this was more a question of what's taking all the time in ECS. You can run the job system with Burst even without ECS, so that would just be a different comparison (jobified non-ECS against jobified ECS).
     
  5. Joachim_Ante

    Unity Technologies
    Joined: Mar 16, 2005
    Posts: 4,721
    @tertle:
    Thank you. This is lovely! And yes, absolutely: if you use ECS because you want performance, then the expectation is that you run ALL your code via Burst.

    So there seems to be an issue somewhere. Generally we would like everyone who uses the ECS preview to jump on the performance-by-default train. For some reason newcomers don't write jobified code by default.

    @jojolepro I'd really like to understand what made main-thread code your first choice when writing ECS code. Was it something in the documentation that guided you down that path, or some other content we need to clarify?

    I guess the fact that the name ComponentSystem is shorter than JobComponentSystem also makes JobComponentSystem not the default choice...
     
  6. Life_Is_Good_

    Joined: Mar 4, 2013
    Posts: 25
    I'd guess that this is simply a result of culture & scale.
    Culture, because many game devs never touch the topic of multithreading.
    Scale, because the ECS & Job System are big changes, and you can separate the two from each other. In my opinion, the logical first step in adjusting to those changes is to start with the ECS.
     
  7. jooleanlogic

    Joined: Mar 1, 2018
    Posts: 332
    Yes, agree with goodlife on both points.
    When I started, it was enough just getting my head around ECS without the added complexity of jobs. It really is quite a conceptually difficult jump from OO.
    To my shame, I have yet to even look at the job system, as my focus is still on solving ECS design problems for my game, not performance. One day. :rolleyes:
     
  8. hippocoder

    Digital Ape Moderator
    Joined: Apr 11, 2010
    Posts: 25,635
    ECS + Jobs + Burst + standalone build = performance...

    Too often on forums or social media I see people omit one of the things in the list above and then get less performance than expected.
     
  9. jojolepro

    Joined: Sep 16, 2015
    Posts: 3
    So I changed my benchmark to use jobs and Burst as recommended by @tertle, and I got between 9 and 17 ms per frame. For comparison, the same code on the specs ECS runs between 8 and 10 ms. Are there more optimisations that can be turned on, or is that a sign that code translation has reached its limit?

    (release build + -batchmode + -nographics + burst + jobs)

    The reason I used ComponentSystem first is that I was reading the official examples on Unity's GitHub, and ComponentSystem was what looked the most like the code I'm used to writing with other ECSs.

    Is there still something I can do to make Unity's performance higher without modifying the code's behaviour?

    PS: @tertle my entity creation is fine. I do want 3 million entities in total, not 1 million per system.
     
    Last edited: Aug 25, 2018
  10. Antypodish

    Joined: Apr 29, 2014
    Posts: 5,841
    I think this was also the case for me when I first started. I had heard about the Job system at that point, but it wasn't clear to me how to approach it. Further down the line I got more curious about the Job implementation as I read the forum, which eventually led me to other sample code using Jobs and ParallelFor.
     
  11. jojolepro

    Joined: Sep 16, 2015
    Posts: 3
    Parallel for would actually be different behaviour in this case, though, since I'm testing serial side effects to prevent the compiler from optimizing the loop away.
     
  12. Antypodish

    Joined: Apr 29, 2014
    Posts: 5,841
    I understand, but parallel for is one of the advantages of ECS with jobs. So I think it is a valid case to include in the benchmark, along with the other methods.
     
  13. Joachim_Ante

    Unity Technologies
    Joined: Mar 16, 2005
    Posts: 4,721
    >Parallel for would actually be a different behaviour in this case though since I'm testing serial side-effects to prevent compiler optimizations of the loop.
    You can use .Run() instead of Schedule() to execute the Burst-compiled code entirely on the main thread.
     
  14. Joachim_Ante

    Unity Technologies
    Joined: Mar 16, 2005
    Posts: 4,721
    >So I changed my benchmark to use jobs and burst as recommended by @tertle, and I got between 9 and 17 ms per frame.
    If you are going by this:
    UnityEngine.Debug.Log("Update loop: "+updater.Elapsed.ToString());

    1. You have vsync enabled, so the maximum framerate will be 60 FPS.
    2. You are also measuring the overhead of running an empty scene. Naturally, with a complete game engine there is some base cost unless you disable different modules, starting with clearing the screen, culling / rendering, audio and physics. Even though there is nothing in those subsystems, they still have a baseline cost.

    I suggest running in a standalone player, then attaching the profiler and looking at the time spent in the jobs & system code in the timeline profiler.
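
    The effect of a frame cap on a whole-frame timer can be sketched outside Unity. The following plain C# simulation times the same work with a timer around the whole frame and with a timer scoped to the work alone. All names are hypothetical, and the wait loop only stands in for vsync and engine overhead; it is not how Unity paces frames:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class FrameTimingSketch
{
    // Runs `work` inside a simulated frame that cannot finish before
    // `frameBudgetMs` has elapsed. Returns the time spent in the work
    // alone and the time taken by the whole frame, both in milliseconds.
    public static (double workMs, double frameMs) RunFrame(Action work, double frameBudgetMs)
    {
        var frame = Stopwatch.StartNew();

        // Timer scoped to the work only.
        var scoped = Stopwatch.StartNew();
        work();
        scoped.Stop();

        // Simulated wait for the next display refresh.
        while (frame.Elapsed.TotalMilliseconds < frameBudgetMs)
            Thread.Sleep(1);
        frame.Stop();

        return (scoped.Elapsed.TotalMilliseconds, frame.Elapsed.TotalMilliseconds);
    }
}
```

    With a budget of about 16 ms, the frame time never drops below the budget however small the work time is, which is why a whole-frame timer says little about the systems themselves.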

    Also note that right now you are not using the parallel processing codepath. In the Schedule function, pass 32 for innerloopBatchCount.

    This tells the job scheduler to process the work in batches of 32 entities across multiple cores.
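
    As a rough illustration of what batching means for an accumulator like the one in the benchmark, here is a plain C# sketch that uses the .NET `Parallel.For` rather than Unity's job scheduler; `BatchedSum` and `SumInBatches` are hypothetical names. Each batch builds its own partial sum and the partial sums are combined atomically, so the total stays correct but the additions no longer happen in one serial order:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class BatchedSum
{
    // Sums `data` by splitting it into batches of `batchSize` elements
    // that may run on different cores.
    public static long SumInBatches(int[] data, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        // Round up so that a final, shorter batch is included.
        int batchCount = (data.Length + batchSize - 1) / batchSize;
        long total = 0;

        Parallel.For(0, batchCount, batch =>
        {
            int start = batch * batchSize;
            int end = Math.Min(start + batchSize, data.Length);

            // Per-batch partial sum: no shared state inside the inner loop.
            long partial = 0;
            for (int i = start; i < end; i++)
                partial += data[i];

            // Combine the partial result atomically.
            Interlocked.Add(ref total, partial);
        });

        return total;
    }
}
```

    A small batch size spreads the work more evenly across cores but adds scheduling overhead per batch; a large one does the opposite.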
     
  15. Joachim_Ante

    Unity Technologies
    Joined: Mar 16, 2005
    Posts: 4,721
    upload_2018-8-25_22-52-18.png

    It looks like this on my machine. This is a 3-year-old MacBook Pro.
     
  16. floboc

    Joined: Oct 31, 2017
    Posts: 89
    In my case, it immediately seemed to me from the documentation that Job code was more complex to write than simple systems... It also seems easier to start with Hybrid ECS, which is mostly implemented without jobs in the original samples, if I remember right. While I am used to writing complex multi-threaded code in other languages, this seemed to be the place to start before considering jobs.

    Now that I am much more comfortable with ECS in general, I am just starting to rewrite my code using Jobs for extra performance. As a side note, I didn't use ECS for performance at the start, but more for the ideology behind it and its potential, which is why I wasn't so interested in Jobs and Burst.
     
  17. starikcetin

    Joined: Dec 7, 2017
    Posts: 252
    I disagree.

    I think people don't get into Jobs because there is nowhere near enough documentation. If the job system and its integration with ECS were documented better, more people would begin using it.

    Figuring things out from the samples and the source code could have worked, but some people don't have the time and/or motivation to do that when they can simply use the ECS without Job system.

    When the ECS and Job System are released with proper documentation and plenty of samples, more and more people will start using them, I believe.
     
  18. tertle

    Joined: Jan 25, 2011
    Posts: 1,728
    One thing to note is as far as I'm aware the job system is officially released as of 2018.1
    https://blogs.unity3d.com/2018/05/02/2018-1-is-now-available/

    With proper documentation
    https://docs.unity3d.com/Manual/JobSystem.html

    It's just ECS that is still an early preview.

    To me it's a little weird to avoid using the fully released system yet focus on the preview package.
    I think you'd be better off learning to use jobs with MonoBehaviours than using ECS without jobs. Just my opinion, and I'm sure some disagree, but while I feel ECS lends itself to nicely organized code and logic, the current limitations of ECS mean that using it without jobs isn't really worth the extra effort at this stage.
     
  19. Antypodish

    Joined: Apr 29, 2014
    Posts: 5,841
    From my point of view, I think it is because neither Jobs nor ECS is "drag and drop" like the rest of Unity, so it is harder to prototype. Or classic OOP is too easy, so people get spoiled ;)

    On the other hand, Jobs is like an additional layer on top of what is already there. Most people won't need Jobs for simple games. Yes, they can benefit from it. But until it is drag-and-drop-like, or the default, I think there is still a long way to go before it gains major popularity. Mind you, a lot of Unity devs are just "integrators", putting blocks together with minimal programming expertise. Hence getting into deep programming, not to mention straying completely from OOP, might be a really big step for many.
     
  20. Gen_Scorpius

    Joined: Nov 2, 2016
    Posts: 38
    I think the Jobs documentation is quite sufficient to get started. Also the basic jobs are not complicated at all. However, learning and properly implementing ECS on the other hand is, given its currently incomplete and perhaps partly outdated documentation, definitely a very bumpy road. ;)

    In my case, I used the documentation, the package and the examples source code, the small information nuggets spread on many threads and of course lots of trial and error to piece together what I now know about ECS. Ideally, good documentation and perhaps a playlist of HOW-TO videos should provide enough information to get to the same knowledge level.

    It's understandable that the primary focus is to get the core of ECS working, but if the documentation is lacking and/or outdated, then it's no surprise if some people use what is already available in ways that were not intended. That said, ECS won't be a good fit for those "integrators" anyway until it matures enough to provide a developer experience similar to what MonoBehaviour/GameObject offers now.
     
  21. ComteJaner

    Joined: Jun 9, 2013
    Posts: 9
    From what I understood of the whole ECS/Job/Burst and performance-by-default approach, ECS is better than OOP even without Burst or multithreading, just because of the more cache-friendly linear data layout. This is how I understood it from the talks. So main-thread ECS should run faster than main-thread OOP.
     
  22. Joachim_Ante

    Unity Technologies
    Joined: Mar 16, 2005
    Posts: 4,721
    That is true, but the question is always: faster by how much? Here is the thing.

    Switching to ECS by converting OOP code to ECS code is quite a bit of work.

    * Once the code is converted, switching to Burst & Jobs is pretty simple.
    * Burst + Job code is where the majority of the gains come from. It's usually where the combined 10-20x speedups come from. Good data layout with Mono / IL2CPP on the main thread alone gets some speedup, but it's often not that big.
     
  23. ComteJaner

    Joined: Jun 9, 2013
    Posts: 9
    OK, understood! In terms of effort-to-result ratio, converting OOP to ECS takes a lot of work, but from there, going multi-threaded and using the hard constraints of pure ECS to take advantage of Burst is easy and potentially gives great performance.

    For the moment I am doing only pure ECS just because I want to learn and try to get used to ECS patterns. It is purely educational currently.
     
  24. Gen_Scorpius

    Joined: Nov 2, 2016
    Posts: 38
    I just finished adapting two of my (for now simple) test systems to the new ArchetypeChunk API.
    With a very small entity/GameObject count (=2): PlayerInputSystem is only a ComponentSystem, while PlayerMovementSystem and the built-in EndFrameTransformSystem are JobComponentSystems (using JobParallelFor + BurstCompile). The screenshot shows measurements taken in the Unity Editor.

    upload_2018-8-27_22-25-35.png

    The difference is significant.
     
  25. beezir

    Joined: Apr 3, 2009
    Posts: 133
    A comment on the OP: I think there are way too many variables at play here for any meaningful benchmark comparison between two different systems (rather than comparing within Unity itself: Jobs vs. no Jobs, pure ECS vs. no ECS, etc.). The Rust specs version, for instance, has an artificial 2 ms thread sleep in the main processing loop, which gives an unpredictable fluctuation because the OS won't guarantee a time slice that short. I got roughly 2 ms +/- a bit per update, vs. about 0.3 ms without the frame limit (in both debug and release modes, indicating that the work done is too small for any accurate benchmarking). Similarly, it's difficult to eliminate the effect of unrelated system overhead from other engine components when comparing vastly different systems. To get decently close, you'd need to create equivalent systems in both Unity and Amethyst/specs that do a heavy amount of work (preferably work that simulates a real-world game architecture), so that the vast majority of processing is within ECS for both, and make sure that neither is limiting frames or introducing any additional artificial thread sleeping.
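
    The jitter of a short sleep is easy to observe in isolation. A small C# sketch (the `SleepJitter` helper is a hypothetical name, and the numbers it produces depend entirely on the OS and its timer resolution) that requests a short sleep several times and records how long each call really took:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SleepJitter
{
    // Requests a sleep of `requestedMs` milliseconds `samples` times and
    // returns how long each call actually took, in milliseconds.
    public static double[] Measure(int requestedMs, int samples)
    {
        var actual = new double[samples];
        for (int i = 0; i < samples; i++)
        {
            var sw = Stopwatch.StartNew();
            Thread.Sleep(requestedMs);
            sw.Stop();
            actual[i] = sw.Elapsed.TotalMilliseconds;
        }
        return actual;
    }
}
```

    Calling `SleepJitter.Measure(2, 20)` and printing the minimum and maximum gives a feel for how much noise a 2 ms sleep adds to each measured update.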

    As for Burst/Jobs and ECS in Unity, I think the potential disconnect right now is due to the early stage of ECS, and that it's perhaps not immediately clear that the performance benefits of ECS only really manifest once you add in Jobs and Burst - yes, despite the current documentation indicating that all 3 are part of the whole. For anyone looking at ECS for the first time, it can take a while to change how you think about structuring game code. I would wager that the average Unity developer tends not to delve regularly into intentionally parallel code; after all, part of Unity's appeal is ease of use. I think as ECS matures and becomes more integrated with the editor features, and as the documentation and examples mature, it will become more natural for newcomers to connect everything together conceptually.
     
  26. Gen_Scorpius

    Joined: Nov 2, 2016
    Posts: 38
    FYI, for comparison: I modified my PlayerInputSystem to run an IJob, instead of being a regular ECS system, and the job is completed before moving on (since input needs to be collected on the main thread). I also removed a couple of Debug.Log messages. Again measured in Play Mode. For me, this small-scale test clearly demonstrates that as much game logic as possible should be in a job.

    upload_2018-8-27_23-30-38.png
     
  27. davenirline

    Joined: Jul 7, 2010
    Posts: 497
    I'd like to weigh in on this. The short answer is that jobs aren't intuitive. There are many different job types and each has gotchas. I couldn't imagine writing code in jobs right away, unless they are really simple. My process is to use ComponentSystem first. Make the feature work first. Once that is done, that's the only time I turn it into a job and put a [BurstCompile] on it. I think this is the more maintainable approach. I think of jobs and the burst compiler as optimization options. I'll only use them if the game needs it.
     
  28. Gen_Scorpius

    Joined: Nov 2, 2016
    Posts: 38
    You can use jobs in a ComponentSystem as well. Obviously, you would need to manage the dependencies and handles yourself. But in this case you can ensure that the job is running on the main thread (e.g. for input collection) or for debugging purposes.
     
  29. starikcetin

    Joined: Dec 7, 2017
    Posts: 252
    I think along the same lines as well:

    1. Make it work
    2. Make it fast

    That seems like the easiest (read: most frictionless) option.
     