[Feedback] Move EndCommandBufferSystems to their own PlayerLoop system

Discussion in 'Entity Component System' started by mbaker, May 4, 2022.

  1. mbaker (joined Jan 9, 2013; 52 posts)

    I've been working through a concept where different locations in my project are each simulated in their own ECS World instance.

    While experimenting with this setup, I found an opportunity to improve performance for applications that run multiple Worlds: group the execution of each world's EndCommandBufferSystems into their own top-level PlayerLoopSystems.

    By default, each World's system groups are updated one after another within each PlayerLoopSystem, so the order of execution looks something like this:
    • Initialization
      • InitializationSystemGroup (Default World)
        • BeginInitializationEntityCommandBufferSystem
        • (... Misc other systems/groups)
        • EndInitializationEntityCommandBufferSystem
      • InitializationSystemGroup (World 1)
        • BeginInitializationEntityCommandBufferSystem
        • (... Misc other systems/groups)
        • EndInitializationEntityCommandBufferSystem
      • InitializationSystemGroup (World 2)
        • BeginInitializationEntityCommandBufferSystem
        • (... Misc other systems/groups)
        • EndInitializationEntityCommandBufferSystem
    • EarlyUpdate
    • FixedUpdate
    • PreUpdate
    • Update
      • SimulationSystemGroup (Default World)
        • BeginSimulationEntityCommandBufferSystem
        • (... Misc other systems/groups)
        • EndSimulationEntityCommandBufferSystem
      • SimulationSystemGroup (World 1)
        • BeginSimulationEntityCommandBufferSystem
        • (... Misc other systems/groups)
        • EndSimulationEntityCommandBufferSystem
      • SimulationSystemGroup (World 2)
        • BeginSimulationEntityCommandBufferSystem
        • (... Misc other systems/groups)
        • EndSimulationEntityCommandBufferSystem
    • ...
    This works, but the EndCommandBufferSystem at the end of each group forces a sync point on the main thread earlier than necessary: playing back the buffer has to wait for that world's jobs, and the next world's jobs cannot be scheduled until it finishes. If we move the EndCommandBufferSystems to their own top-level PlayerLoopSystem, we can schedule the jobs of every world first and then process each world's End command buffer while the jobs of the other worlds are still in flight.

    The updated PlayerLoop looks like this:
    • Initialization
      • InitializationSystemGroup (Default World)
        • BeginInitializationEntityCommandBufferSystem
        • (... Misc other systems/groups)
      • InitializationSystemGroup (World 1)
        • BeginInitializationEntityCommandBufferSystem
        • (... Misc other systems/groups)
      • InitializationSystemGroup (World 2)
        • BeginInitializationEntityCommandBufferSystem
        • (... Misc other systems/groups)
    • PostInitialization
      • PostInitializationSystemGroup (Default World)
        • EndInitializationEntityCommandBufferSystem
      • PostInitializationSystemGroup (World 1)
        • EndInitializationEntityCommandBufferSystem
      • PostInitializationSystemGroup (World 2)
        • EndInitializationEntityCommandBufferSystem
    • EarlyUpdate
    • FixedUpdate
    • PreUpdate
    • Update
      • SimulationSystemGroup (Default World)
        • BeginSimulationEntityCommandBufferSystem
        • (... Misc other systems/groups)
      • SimulationSystemGroup (World 1)
        • BeginSimulationEntityCommandBufferSystem
        • (... Misc other systems/groups)
      • SimulationSystemGroup (World 2)
        • BeginSimulationEntityCommandBufferSystem
        • (... Misc other systems/groups)
    • PostUpdate
      • PostSimulationSystemGroup (Default World)
        • EndSimulationEntityCommandBufferSystem
      • PostSimulationSystemGroup (World 1)
        • EndSimulationEntityCommandBufferSystem
      • PostSimulationSystemGroup (World 2)
        • EndSimulationEntityCommandBufferSystem
    • ...
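    To make the rearrangement concrete, here is a small Python data model of the two layouts above. It is a hypothetical sketch, not Unity's `PlayerLoopSystem` API (a real project would do this in C# against `UnityEngine.LowLevel.PlayerLoop`); `WorldGroup`, `split_end_buffers`, and `default_loop` are made-up names, and it assumes End buffer systems can be recognized by their `End` prefix.

```python
from dataclasses import dataclass

# Hypothetical plain-data model of the PlayerLoop layouts described above.
# This is NOT Unity's PlayerLoopSystem API; every name here is made up.


@dataclass
class WorldGroup:
    group: str          # e.g. "SimulationSystemGroup"
    world: str          # e.g. "World 1"
    systems: list[str]  # ordered system names inside the group


def split_end_buffers(loop, phase, post_phase, post_group):
    """Move each world's End*EntityCommandBufferSystem out of `phase` into a
    new top-level `post_phase` inserted immediately after it.
    Assumption: End buffer systems are the only names starting with "End"."""
    result = []
    for name, groups in loop:
        if name != phase:
            result.append((name, groups))
            continue
        kept, moved = [], []
        for g in groups:
            ends = [s for s in g.systems if s.startswith("End")]
            rest = [s for s in g.systems if not s.startswith("End")]
            kept.append(WorldGroup(g.group, g.world, rest))
            moved.append(WorldGroup(post_group, g.world, ends))
        result.append((name, kept))
        result.append((post_phase, moved))  # new phase right after `phase`
    return result


def default_loop(worlds):
    """Default layout: every world's Begin/misc/End systems share one phase."""
    def groups(kind, group):
        return [
            WorldGroup(group, w, [f"Begin{kind}EntityCommandBufferSystem",
                                  "(misc systems)",
                                  f"End{kind}EntityCommandBufferSystem"])
            for w in worlds
        ]
    return [
        ("Initialization", groups("Initialization", "InitializationSystemGroup")),
        ("EarlyUpdate", []),
        ("FixedUpdate", []),
        ("PreUpdate", []),
        ("Update", groups("Simulation", "SimulationSystemGroup")),
    ]


worlds = ["Default World", "World 1", "World 2"]
loop = split_end_buffers(default_loop(worlds), "Initialization",
                         "PostInitialization", "PostInitializationSystemGroup")
loop = split_end_buffers(loop, "Update", "PostUpdate", "PostSimulationSystemGroup")

for phase, groups in loop:
    print(phase)
    for g in groups:
        print(f"  {g.group} ({g.world}): {', '.join(g.systems)}")
```

    Applying the same split to both phases yields the PostInitialization and PostUpdate layout listed above, with every world's End buffer running only after all worlds have scheduled their work in the preceding phase.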

    In an example that highlights the issue, the profiler goes from this with the default PlayerLoop:
    [Image: profiler capture with the default PlayerLoop]

    to this with the suggested change:
    [Image: profiler capture with the suggested PlayerLoop]

    The example is a bit contrived, but it illustrates that jobs from the later worlds keep working while the earlier worlds play back their End command buffers on the main thread.
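    A rough way to see why the overlap helps is a toy timing model of one PlayerLoop phase. This is a hypothetical Python sketch, not a measurement or Unity's real scheduler: it assumes unlimited worker threads, independent jobs per world, and that a world's End buffer playback must wait for that world's jobs. `frame_time` and the per-world costs are made up for illustration.

```python
# Hypothetical toy timing model (not Unity's scheduler). Assumptions:
# unlimited worker threads, each world's jobs are independent, and playing
# back a world's End buffer on the main thread must wait for its jobs.

def frame_time(worlds, defer_end_buffers):
    """worlds: list of (schedule, job, end) costs per world, in arbitrary units.
    Returns the time at which the main thread finishes the phase."""
    t = 0             # main-thread clock
    job_done = []     # when each world's jobs finish on the workers
    for schedule, job, end in worlds:
        t += schedule                  # main thread schedules this world's jobs
        job_done.append(t + job)       # workers run them from this point
        if not defer_end_buffers:
            t = max(t, job_done[-1])   # sync point: wait for this world's jobs
            t += end                   # play back its End buffer immediately
    if defer_end_buffers:
        # Every world is scheduled; now play back each End buffer in order.
        for (_, _, end), done in zip(worlds, job_done):
            t = max(t, done)
            t += end
    return t


worlds = [(1, 4, 2)] * 3  # made-up costs: schedule=1, job=4, end=2
print(frame_time(worlds, defer_end_buffers=False))  # 21
print(frame_time(worlds, defer_end_buffers=True))   # 11
```

    With these made-up costs, deferring the End buffers finishes the phase in 11 units instead of 21, because the later worlds' jobs run while the earlier worlds play back on the main thread.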

    Why don't you move BeginCommandBuffers into their own group too?
    I don't think there would be a benefit in making that change. The goal is to get work scheduled ASAP and then defer requiring it to be complete (syncing to main) for as long as possible. Running through a BeginCommandBuffer "unlocks" a bunch of work that can be scheduled, so we should schedule that work before processing another world's BeginCommandBuffer. This also lets the first world's jobs run while the next world's BeginCommandBuffer is being processed.
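    The same kind of toy model can illustrate why the BeginCommandBuffers are better left where they are. Again a hypothetical sketch with made-up costs (unlimited workers, End buffers already deferred to their own phase in both variants): it compares playing back every world's Begin buffer up front against playing back one world's Begin buffer and scheduling that world's jobs immediately.

```python
# Hypothetical toy model (not Unity's scheduler): unlimited worker threads,
# independent jobs per world, End buffers deferred in both variants.

def frame_time(worlds, begin_buffers_first):
    """worlds: list of (begin, schedule, job, end) costs, in arbitrary units."""
    t = 0
    job_done = []
    if begin_buffers_first:
        for begin, _, _, _ in worlds:
            t += begin                 # play back every world's Begin buffer first
        for _, schedule, job, _ in worlds:
            t += schedule              # only then schedule each world's jobs
            job_done.append(t + job)
    else:
        for begin, schedule, job, _ in worlds:
            t += begin                 # play back this world's Begin buffer...
            t += schedule              # ...then schedule its jobs right away
            job_done.append(t + job)
    for (_, _, _, end), done in zip(worlds, job_done):
        t = max(t, done) + end         # deferred End buffer playback
    return t


worlds = [(1, 1, 4, 2)] * 3  # made-up costs: begin=1, schedule=1, job=4, end=2
print(frame_time(worlds, begin_buffers_first=False))  # 12
print(frame_time(worlds, begin_buffers_first=True))   # 14
```

    Scheduling each world right after its own Begin buffer finishes in 12 units versus 14 here, because the first world's jobs are already running while the later Begin buffers are processed.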

    What about EndFixedStepSimulationCommandBuffer?
    Maybe possible?
    I think a similar strategy could be taken with the fixed step, but it would be a bit more complicated, since the FixedStepSimulationSystemGroup is nested within SimulationSystemGroup and can be executed multiple times per frame.
    You'd have to devise a custom PlayerLoopSystem capable of calling individual subsystems repeatedly, as required. You'd also have to move all of the systems that run before FixedStepSimulationSystemGroup into their own PlayerLoopSystem.

    I'm not using FixedStep for anything so it's not high on my list of things to explore!
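    One possible shape for that custom fixed-step PlayerLoopSystem is the usual fixed-timestep accumulator loop, with each step split so that all worlds schedule their fixed-step work before any world plays back its End buffer. This is a hypothetical sketch, not Unity's rate manager: `run_fixed_steps` and the log strings are made up, it assumes the systems before FixedStepSimulationSystemGroup have already been moved out, and it omits any cap on catch-up steps a real implementation would probably want.

```python
# Hypothetical fixed-step driver (not Unity's API). Integer time units
# (e.g. ms) keep the accumulator free of floating-point drift.

def run_fixed_steps(accumulator, frame_delta, step, worlds, log):
    """Run as many fixed steps as the accumulated time allows.
    Returns the leftover accumulator to carry into the next frame."""
    accumulator += frame_delta
    while accumulator >= step:
        # Schedule every world's fixed-step work first...
        for w in worlds:
            log.append(f"{w}: FixedStepSimulationSystemGroup (without End buffer)")
        # ...then play back each world's End buffer while later worlds' jobs run.
        for w in worlds:
            log.append(f"{w}: EndFixedStepSimulationEntityCommandBufferSystem")
        accumulator -= step
    return accumulator


log = []
leftover = run_fixed_steps(0, 50, 20, ["World 1", "World 2"], log)
print(leftover)  # 10
print("\n".join(log))
```

    A 50-unit frame with a 20-unit step runs two fixed steps and carries 10 units over; each step repeats the schedule-all-then-play-back-all pattern from the main proposal.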

    Are there downsides to this approach?
    I don't think so.

    With this approach there is a greater incentive to use EndCommandBuffers over BeginCommandBuffers. If you really need to rely on BeginCommandBuffers, you don't get as much of a benefit.

    At the moment it's a bit awkward to set up, and I have to suppress some warnings from the group sorting logic because LateSimulationSystemGroup tries to order itself before EndSimulationEntityCommandBufferSystem, which is no longer in the group. Fortunately, LateSimulationSystemGroup still ends up at the right update position in the group, so this isn't an issue...for now.

    Are there downsides if you're only using one world?
    I don't think so. The same work happens just in the next PlayerLoopSystem.

    ---

    Let me know if I'm overlooking an issue with this approach or if you have suggestions!
    It would be great if the default PlayerLoop were set up this way.
  2. JesOb (joined Sep 3, 2012; 1,081 posts)

    Good test, thanks :)

    Maybe totally interleaving the systems of all worlds, with one instance of each group, would open the door to faster simulation in all cases, and FixedStep would just work because the fixed step is the same for all systems.
  3. tertle (joined Jan 25, 2011; 3,626 posts)

    We just replaced all our usages of EndCommandBufferSystem with BeginPresentationCommandBufferSystem (or BeginInitializationCommandBufferSystem for worlds that have no presentation) to get the same effect.
  4. mbaker (joined Jan 9, 2013; 52 posts)

    Completely interleaving the systems would suffer from the BeginCommandBufferSystem issue I mentioned above, where a world's work is ready to be scheduled but you're still waiting on the other worlds' BeginCommandBuffers. Maybe there's a hybrid approach, though, where you partially interleave the groups.

    I did spend some time thinking about a way to run a world's EndCommandBufferSystem and BeginCommandBufferSystem back to back, so you'd have a flow like World 1 InitializationEndCommandBuffer -> SimulationBeginCommandBuffer -> schedule work -> repeat for World 2. Then you could always have job work in flight. Unfortunately, I think there are too many problems with this approach in the default player loop, because there are lots of PlayerLoopSystems between Initialization and Update (Simulation), and again between Update and PreLateUpdate (Presentation).

    There's definitely room for more optimization in a fully custom player loop.

    That's a good point. If you can get away with putting all of your command buffers on a single BeginCommandBufferSystem for the world, you achieve a similar effect.