[ReadOnly] is 7x slower than [ReadWrite] with many Systems

Discussion in 'Entity Component System' started by PublicEnumE, Apr 7, 2020.

  1. PublicEnumE

    I found something unexpected while trying to debug some poor IJobChunk performance:

    When many systems are scheduling IJobChunk jobs that work on the same Components, [ReadOnly] performance is significantly worse than [ReadWrite].

    In the test below, there are 1,000 entities with a DynamicBuffer<Foo>. 100 systems each schedule one IJobChunk job, which works on the same Foo buffer type. When that type is ReadWrite, each frame completes in ~4ms. If it's set to ReadOnly, each frame is ~28ms+.

    What is going on here?

    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;

    // Empty buffer element type shared by every test system.
    public struct Foo : IBufferElementData
    {
    }

    // Base system: schedules one IJobChunk per frame that declares read-only access to Foo buffers.
    public abstract class JobChunkTestSystem : SystemBase
    {
        EntityQuery query;

        protected override void OnCreate()
        {
            query = GetEntityQuery(ComponentType.ReadOnly<Foo>());
        }

        protected override void OnUpdate()
        {
            Dependency = new TestJobChunk
            {
                // true = read-only access
                fooType = EntityManager.GetArchetypeChunkBufferType<Foo>(true)
            }.ScheduleParallel(query, Dependency);
        }

        [BurstCompile]
        public struct TestJobChunk : IJobChunk
        {
            [ReadOnly]
            public ArchetypeChunkBufferType<Foo> fooType;

            // Intentionally empty: the test measures scheduling and safety-check overhead only.
            public void Execute(ArchetypeChunk chunk, int chunkIndex, int firstEntityIndex)
            {
            }
        }
    }

    // 100 concrete systems, each scheduling the same job type.
    public class Test01 : JobChunkTestSystem { }
    public class Test02 : JobChunkTestSystem { }
    public class Test03 : JobChunkTestSystem { }
    public class Test04 : JobChunkTestSystem { }
    public class Test05 : JobChunkTestSystem { }
    public class Test06 : JobChunkTestSystem { }
    public class Test07 : JobChunkTestSystem { }
    public class Test08 : JobChunkTestSystem { }
    public class Test09 : JobChunkTestSystem { }
    public class Test10 : JobChunkTestSystem { }
    public class Test11 : JobChunkTestSystem { }
    public class Test12 : JobChunkTestSystem { }
    public class Test13 : JobChunkTestSystem { }
    public class Test14 : JobChunkTestSystem { }
    public class Test15 : JobChunkTestSystem { }
    public class Test16 : JobChunkTestSystem { }
    public class Test17 : JobChunkTestSystem { }
    public class Test18 : JobChunkTestSystem { }
    public class Test19 : JobChunkTestSystem { }
    public class Test20 : JobChunkTestSystem { }
    public class Test21 : JobChunkTestSystem { }
    public class Test22 : JobChunkTestSystem { }
    public class Test23 : JobChunkTestSystem { }
    public class Test24 : JobChunkTestSystem { }
    public class Test25 : JobChunkTestSystem { }
    public class Test26 : JobChunkTestSystem { }
    public class Test27 : JobChunkTestSystem { }
    public class Test28 : JobChunkTestSystem { }
    public class Test29 : JobChunkTestSystem { }
    public class Test30 : JobChunkTestSystem { }
    public class Test31 : JobChunkTestSystem { }
    public class Test32 : JobChunkTestSystem { }
    public class Test33 : JobChunkTestSystem { }
    public class Test34 : JobChunkTestSystem { }
    public class Test35 : JobChunkTestSystem { }
    public class Test36 : JobChunkTestSystem { }
    public class Test37 : JobChunkTestSystem { }
    public class Test38 : JobChunkTestSystem { }
    public class Test39 : JobChunkTestSystem { }
    public class Test40 : JobChunkTestSystem { }
    public class Test41 : JobChunkTestSystem { }
    public class Test42 : JobChunkTestSystem { }
    public class Test43 : JobChunkTestSystem { }
    public class Test44 : JobChunkTestSystem { }
    public class Test45 : JobChunkTestSystem { }
    public class Test46 : JobChunkTestSystem { }
    public class Test47 : JobChunkTestSystem { }
    public class Test48 : JobChunkTestSystem { }
    public class Test49 : JobChunkTestSystem { }
    public class Test50 : JobChunkTestSystem { }
    public class Test51 : JobChunkTestSystem { }
    public class Test52 : JobChunkTestSystem { }
    public class Test53 : JobChunkTestSystem { }
    public class Test54 : JobChunkTestSystem { }
    public class Test55 : JobChunkTestSystem { }
    public class Test56 : JobChunkTestSystem { }
    public class Test57 : JobChunkTestSystem { }
    public class Test58 : JobChunkTestSystem { }
    public class Test59 : JobChunkTestSystem { }
    public class Test60 : JobChunkTestSystem { }
    public class Test61 : JobChunkTestSystem { }
    public class Test62 : JobChunkTestSystem { }
    public class Test63 : JobChunkTestSystem { }
    public class Test64 : JobChunkTestSystem { }
    public class Test65 : JobChunkTestSystem { }
    public class Test66 : JobChunkTestSystem { }
    public class Test67 : JobChunkTestSystem { }
    public class Test68 : JobChunkTestSystem { }
    public class Test69 : JobChunkTestSystem { }
    public class Test70 : JobChunkTestSystem { }
    public class Test71 : JobChunkTestSystem { }
    public class Test72 : JobChunkTestSystem { }
    public class Test73 : JobChunkTestSystem { }
    public class Test74 : JobChunkTestSystem { }
    public class Test75 : JobChunkTestSystem { }
    public class Test76 : JobChunkTestSystem { }
    public class Test77 : JobChunkTestSystem { }
    public class Test78 : JobChunkTestSystem { }
    public class Test79 : JobChunkTestSystem { }
    public class Test80 : JobChunkTestSystem { }
    public class Test81 : JobChunkTestSystem { }
    public class Test82 : JobChunkTestSystem { }
    public class Test83 : JobChunkTestSystem { }
    public class Test84 : JobChunkTestSystem { }
    public class Test85 : JobChunkTestSystem { }
    public class Test86 : JobChunkTestSystem { }
    public class Test87 : JobChunkTestSystem { }
    public class Test88 : JobChunkTestSystem { }
    public class Test89 : JobChunkTestSystem { }
    public class Test90 : JobChunkTestSystem { }
    public class Test91 : JobChunkTestSystem { }
    public class Test92 : JobChunkTestSystem { }
    public class Test93 : JobChunkTestSystem { }
    public class Test94 : JobChunkTestSystem { }
    public class Test95 : JobChunkTestSystem { }
    public class Test96 : JobChunkTestSystem { }
    public class Test97 : JobChunkTestSystem { }
    public class Test98 : JobChunkTestSystem { }
    public class Test99 : JobChunkTestSystem { }
    public class Test100 : JobChunkTestSystem { }

    Profiling shows that the culprit is safety checks, happening from within SystemBase.OnAfterUpdate(). Specifically, JobHandle.CheckFenceIsDependencyOrDidSyncFence_Injected().

    ...Which makes sense as something the system would be doing. I just didn't expect performance to take such a hit. Is this a DOTS worst case scenario, or am I making an error somewhere?
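
    For comparison, here is a sketch of what the ReadWrite variant of the test above presumably looks like. The thread only shows the ReadOnly version, so this is an assumption: it swaps ComponentType.ReadOnly for ComponentType.ReadWrite, passes false for the read-only flag, and drops the [ReadOnly] attribute. The class name JobChunkTestSystemReadWrite is hypothetical.

```csharp
using Unity.Burst;
using Unity.Entities;

// Same declaration as in the test above; omit it if Foo is already defined in the project.
public struct Foo : IBufferElementData
{
}

// Assumed ReadWrite counterpart of JobChunkTestSystem (hypothetical name, not from the thread).
public abstract class JobChunkTestSystemReadWrite : SystemBase
{
    EntityQuery query;

    protected override void OnCreate()
    {
        // Declare write access to Foo in the query.
        query = GetEntityQuery(ComponentType.ReadWrite<Foo>());
    }

    protected override void OnUpdate()
    {
        Dependency = new TestJobChunk
        {
            // false = read-write access
            fooType = EntityManager.GetArchetypeChunkBufferType<Foo>(false)
        }.ScheduleParallel(query, Dependency);
    }

    [BurstCompile]
    public struct TestJobChunk : IJobChunk
    {
        // No [ReadOnly] attribute: the job is treated as a writer of Foo buffers.
        public ArchetypeChunkBufferType<Foo> fooType;

        public void Execute(ArchetypeChunk chunk, int chunkIndex, int firstEntityIndex)
        {
        }
    }
}
```

    Because every such job writes to Foo, the jobs have to run one after another, whereas the ReadOnly jobs are free to overlap.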
     
    Last edited: Apr 7, 2020
  2. PublicEnumE

    UPDATE: If I change Foo from an IBufferElementData to an IComponentData, [ReadOnly] performance returns to 4ms/frame.

    This performance drop seems to be specific to IBufferElementData types.
     
    Last edited: Apr 8, 2020
  3. PublicEnumE

    I see similarly poor performance using BufferFromEntity. This gets ~30ms/frame with 1,000 entities and 100 systems:

    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;

    // Same test as before, but the job holds a read-only BufferFromEntity<Foo> instead of a chunk buffer type.
    // Foo is the empty IBufferElementData declared in the first test.
    public abstract class JobChunkTestSystem : SystemBase
    {
        EntityQuery query;

        protected override void OnCreate()
        {
            query = GetEntityQuery(ComponentType.ReadOnly<Foo>());
        }

        protected override void OnUpdate()
        {
            Dependency = new TestJobChunk
            {
                // true = read-only access
                fooBuffers = GetBufferFromEntity<Foo>(true)
            }.ScheduleParallel(query, Dependency);
        }

        [BurstCompile]
        public struct TestJobChunk : IJobChunk
        {
            [ReadOnly]
            public BufferFromEntity<Foo> fooBuffers;

            // Intentionally empty: only scheduling overhead is measured.
            public void Execute(ArchetypeChunk chunk, int chunkIndex, int firstEntityIndex)
            {
            }
        }
    }
     
    Last edited: Apr 7, 2020
  4. cort_of_unity

    Unity Technologies

    Thanks for the report; we'll look into it soon and let you know!
     
    MNNoxMortem likes this.
  5. PublicEnumE

    Thank you, sincerely!

    If it's not too much trouble, answering this question would unblock me:

    Is it safe to assume IBufferElementData should be as fast as IComponentData? Is there a chance this performance difference is actually by design?

    Many thanks.
     
    Last edited: Apr 8, 2020
  6. Joachim_Ante

    Unity Technologies

    It is not by design. One quick thing to check is if this is due to the jobs debugger by disabling it.
     
    cultureulterior likes this.
  7. PublicEnumE

    It's the job debugger. Turning it off eliminated the difference. Thank you for that tip. :)
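
    For anyone who wants to toggle this from code while profiling, here is a minimal sketch. The thread does not say how the debugger was turned off (most likely through the editor menu), so this is an assumption about one way to do it: it sets JobsUtility.JobDebuggerEnabled from a MonoBehaviour. The class name JobsDebuggerToggle and its field are hypothetical.

```csharp
using Unity.Jobs.LowLevel.Unsafe;
using UnityEngine;

// Hypothetical helper: attach to any GameObject in the test scene.
public class JobsDebuggerToggle : MonoBehaviour
{
    // Leave unchecked to profile without the jobs debugger's safety checks.
    [SerializeField] bool enableJobsDebugger = false;

    void OnEnable()
    {
        // Enables or disables the job debugger at runtime.
        JobsUtility.JobDebuggerEnabled = enableJobsDebugger;
    }
}
```

    Remember to turn the debugger back on afterwards, since it is what reports missing dependencies in the first place.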
     
  8. cort_of_unity

    Unity Technologies

    I'm digging into the issue now. First of all, I believe the IComponentData vs. IBufferElementData performance difference can be explained by the fact that the Foo component is empty. An empty IComponentData takes up no storage in the chunk, and we do a variety of optimizations to take advantage of the fact that no system can possibly read or write data that doesn't exist. The systems in this case believe they have no readers/writers to wait on, so the JobsDebugger has nothing to do.

    A zero-size IBufferElementData still requires chunk storage for the BufferHeader itself (which, if nothing else, needs to store how many empty elements are in each entity's buffer). The zero-size optimizations therefore can't be applied, and the safety system has to track things as normal. If I add a "public int Value" field to Foo, the IComponentData path can't use the zero-size optimizations either, and slows down to be comparable to the IBufferElementData path.
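
    To make the three cases concrete, here is a small sketch of the declarations being compared. The type names FooTag, FooValue and FooElement are hypothetical, chosen only to keep the variants apart; in the thread all three are simply called Foo.

```csharp
using Unity.Entities;

// Zero-size component: no fields, so it takes up no storage in the chunk
// and the zero-size optimizations described above can apply.
public struct FooTag : IComponentData
{
}

// Adding a field gives the component real storage, so the zero-size
// optimizations no longer apply and the safety system tracks it as normal.
public struct FooValue : IComponentData
{
    public int Value;
}

// Empty buffer element: each entity still needs a BufferHeader in the chunk,
// so it is tracked as normal even though the element itself has no fields.
public struct FooElement : IBufferElementData
{
}
```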

    I'll investigate the ReadOnly vs. ReadWrite discrepancy next, and let you know what I find.
     
  9. s_schoener

    Unity Technologies

    Since @cort_of_unity promised an explanation I wanted to deliver on that: The read-vs-write behavior is actually quite what you'd expect. Let me explain.
    The safety check performed in this case is a proactive one. For every type the system reads from or writes to, we check that the dependency manager holds a dependency on all readers and writers registered on the atomic safety handles associated with those component types. This check ensures we can tell you when your system schedules a job that writes to a component but you forgot to register that dependency (in the case of SystemBase, that means you went out of your way not to assign it to the Dependency property). Note that we check dependencies for _every_ reader and writer, since we are explicitly trying to catch the case where a dependency was forgotten. Naturally, there is only ever a single writer, because writers have to form a chain of jobs anyway: two parallel writers writing to the same component are not considered safe (there are cases where that is safe, but the system does not have the fine-grained resolution needed to distinguish them). Readers, on the other hand, can execute in parallel, so there are potentially many of them to check.
    (There are of course many different ways to make that faster. I mainly wanted to explain the problem, because that is what was promised :) )
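
    To illustrate why the number of checks differs, here is a toy model in plain C# (no Unity dependencies). It is not Unity's actual safety system, only a sketch under simplifying assumptions: each system schedules one job on the same component type, a writer replaces all previously registered readers and writers, and after each system updates, every currently registered reader and writer is checked once. The names DependencyCheckModel and CountChecks are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: counts how many registered jobs a per-system check would visit.
public static class DependencyCheckModel
{
    // Total checks in one frame when systemCount systems each schedule
    // one job on the same component type.
    public static long CountChecks(int systemCount, bool readOnly)
    {
        var readers = new List<int>(); // ids of reader jobs currently registered on the type
        int writer = -1;               // id of the single registered writer, or -1 if none
        long checks = 0;

        for (int job = 0; job < systemCount; job++)
        {
            if (readOnly)
            {
                // Readers can run in parallel, so they accumulate.
                readers.Add(job);
            }
            else
            {
                // Writers form a chain: the new writer supersedes everything before it.
                writer = job;
                readers.Clear();
            }

            // After the system updates, check every registered reader and the writer (if any).
            checks += readers.Count + (writer >= 0 ? 1 : 0);
        }

        return checks;
    }

    public static void Main()
    {
        Console.WriteLine(CountChecks(100, readOnly: true));  // prints 5050
        Console.WriteLine(CountChecks(100, readOnly: false)); // prints 100
    }
}
```

    In this model the read-only case grows quadratically with the number of systems (1 + 2 + ... + 100 checks), while the read-write case stays linear. It only counts checks and makes no attempt to reproduce the measured 7x frame-time difference.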
     
    Last edited: Jun 13, 2020