Question Mega-thread to help me learn to reason about NetCode

DreamingImLatios · Nov 29, 2022

This thread is a follow-up from this thread.

Background

For those who aren’t familiar with me, I’ll provide some quick background about myself. It will be relevant for this thread.

First, I’m the creator of the Latios Framework, which provides native audio and animation for DOTS-based projects (for free) among other things.

But while I am a professional in the field of software development, game development is just a hobby for me that helps me grow as a person and bring outside domain knowledge to the company I work for. I’ve built lots of prototypes, but I usually don’t publish them, usually due to lack of artwork. I’m quite familiar with data-oriented design and ECS architectures, having worked with them before Unity even started with DOTS. While I’m no expert, I know much more than the average developer about hardware, performance, correctness, safety, etc.

But when it comes to networking, my only real experience is with communication between embedded devices over either standardized or purpose-built electrical circuitry. I have never built a real game with networking. And while I have read several pieces of documentation and tutorials, and watched several videos, I never felt like I knew enough to make my own decisions. Either the critical pieces are abstracted away, or the tutorials don’t provide the backing insight into why they do things one way and not another. I’m not exactly the trial-and-error type. I know some people work fast and learn from their mistakes quickly. But for me, that’s an incredibly slow and inefficient way to learn what I am doing.

So my goal with this thread is to break out of this cycle, such that I can be independent when making decisions inside of a NetCode project. Hopefully, I get there, and can make the same impact on the NetCode side of the DOTS community as I have made for the rest of the DOTS community.

But be warned! My brain works different, and many will find I can be difficult to teach because of it. Also, I get extremely technical. I reason about problems at a technical level that is often much deeper than most devs ever need to even try to reason about. If I challenge what you are saying, that isn’t necessarily because I think you are wrong (even though how I respond will probably make it sound like it), it is simply because I had to try to interpret your words to the level of technicality I reason internally about, and after doing so, it didn’t make sense.

Also, as an aside, I’m pretty cursed when it comes to DevOps and anything involving stuff like virtual machines or bash scripts or cloud hosting or whatever. I somehow manage to run into the weirdest issues, and consequently, my knowledge in these spaces is limited. I’ll try to keep the discussion away from them, as I have other contacts who can help fill me in. But don’t be surprised if something you feel “anyone of my caliber should know” happens to be something I don’t know.

Sorry for the long wordy background. Just know that if you find the discussion frustrating, it is probably equally frustrating for me and it is no one’s fault. Brains are weird like that.

The Project

So for this discussion, I am going to propose a project, simply so that we have something concrete to reason about. The project I want to focus on is LSSS, which if you don’t want to open in the Editor, you can download a playable version here.

Don’t focus on the extreme simulation sizes or the fact that it uses the Latios Framework. The project wasn’t originally designed for NetCode, but rather to stress-test scalability, and it would be unreasonable for me to assume that it could be ported to NetCode.

Rather, focus on the mechanics of the game, and that those mechanics are implemented by a sequence of inter-dependent data transformations, which maps well to the ECS architecture.

It is a team-based shooter, where each ship can move, shoot, and turbo for extra mobility. Ships have health. Health depletes when hit by bullets, explosions, or other ships directly (collision detection only, no real physics simulation). When a ship loses all its health, it explodes. Each faction has a max number of ships allowed to be out in the battlefield at once, and the rest are reinforcements to be spawned later. The game ends once only one faction remains. Ships are spawned by “orbital spawners”, which are entities that invisibly fly around, pull from a spawn queue, and spawn when no other entity is within their vicinity for the frame.

My first goal is to rebuild these mechanics from scratch as a NetCode project. I have the original to copy-and-paste from if I learn doing so for specific pieces makes sense. But I also expect some things will have to be redesigned. And that’s cool too.

Roadblock #1 – A Game Loop of Data Transformations

One of the first things I do when starting a project, is I map out the game loop to work out categories of data transformations and their dependencies.

For LSSS, I grouped the data transformations into the following categories:

Movement (including bullets, invisible spawners, etc.)

Transform Hierarchy (TransformSystemGroup.Update())

Spatial Reasoning and Events (AI, firing guns, collision, health depletion, etc.)

Sync Point (spawning or destroying entities, either recorded from command buffers or as synchronous logic)

Presentation (material properties and rendering)

Next, I determined how these high-level concepts need to interact with each other. Both Presentation and the Spatial Reasoning and Events categories need up-to-date world transforms to function properly. Movement and Sync Point trash the world transforms. Movement requires up-to-date transforms after a Sync Point. The Sync Point must follow Events which produce command buffers. And for this game, I wanted entities spawned due to a collision (like an explosion) to be visible the same frame the renderer shows the entities in their collided state.

Therefore, I structured a frame as follows:

First, I have my sync point, which plays back command buffers and spawns ships.

Second, I perform a Transform Hierarchy update, so that child entities of newly spawned prefabs are positioned correctly.

Third, I perform Presentation and render everything, because now I have up-to-date spawns at their spawn positions.

Fourth, I perform Movement.

Fifth, I perform a second Transform Hierarchy update, because movement dirtied the transforms.

Sixth, I run all the Spatial Reasoning and Events logic (this is the largest batch of systems).

And finally, in the next frame, the sync point plays back the command buffers generated from the Events at the end of the previous frame and also spawns new reinforcements after the old ships were destroyed. Thus, the game loop has come full circle and all dependencies are satisfied.
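As a rough illustration of the frame order described above, here is a minimal sketch in plain Python rather than actual ECS code. Every name in it (`World`, `transforms_dirty`, `run_frame`, and the stage functions) is hypothetical; the assertions only encode the dependencies stated in this post, namely that Presentation, Movement, and Events need up-to-date transforms, and that commands recorded by Events are played back at the next frame's sync point.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    transforms_dirty: bool = True                          # world transforms are stale
    pending_commands: list = field(default_factory=list)   # recorded by Events
    entities: list = field(default_factory=list)
    trace: list = field(default_factory=list)              # order in which stages ran


def sync_point(w):
    # Play back commands recorded at the end of the previous frame.
    w.entities.extend(w.pending_commands)
    w.pending_commands.clear()
    w.transforms_dirty = True          # spawning trashes world transforms
    w.trace.append("sync")


def transform_hierarchy(w):
    w.transforms_dirty = False
    w.trace.append("transforms")


def presentation(w):
    assert not w.transforms_dirty, "rendering needs up-to-date transforms"
    w.trace.append("present")


def movement(w):
    assert not w.transforms_dirty, "movement needs up-to-date transforms"
    w.transforms_dirty = True          # movement trashes world transforms
    w.trace.append("move")


def events(w):
    assert not w.transforms_dirty, "spatial queries need up-to-date transforms"
    w.pending_commands.append("explosion")   # stand-in for a recorded spawn
    w.trace.append("events")


FRAME = [sync_point, transform_hierarchy, presentation,
         movement, transform_hierarchy, events]


def run_frame(w):
    for stage in FRAME:
        stage(w)
```

Running two frames in a row shows the loop closing: the "explosion" recorded by Events in the first frame becomes an entity at the second frame's sync point, before that frame renders.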

Now actual LSSS has a few more categories and special cases that are related to the scalability of the simulation, but I’ve excluded them for this discussion. Each category is typically represented by a ComponentSystemGroup inside one of the top-level groups.

So now the question:

How do I even begin going about breaking down the game loops for this game in a NetCode setting? I know there are new transformations and dependencies introduced with (de)serialization, rollback, and prediction. I also know that the server and client probably have different game loops.

What are the transformations that NetCode performs? When do each of those transformations happen in a frame? What requirements and assumptions do those systems make about the data when they are invoked? What state do they leave the data in after they happen?

Anyways, I’m going to leave it off here for initial responses. Thank you for reading this massive wall of text! No need to rush a reply. I have many other things to keep myself busy, and I don’t mind if it takes months for me to “break through”.

philsa-unity · Nov 29, 2022

I'd say much of network programming is about finding clever tricks to hide network latency & bandwidth problems under the rug. We have this inescapable limitation of network delays and bandwidth capacities, and we must do our best to find ways to make our players not notice that these problems exist

When thinking about the netcode strategy for a specific game, I like to ask myself these 3 questions:

Where's the best place to hide the latency in this game?

How can I make this work with the limited bandwidth that's available to us?

Would it matter if people cheat? And if so; how can I make it as-hard-as-possible for players to cheat even if they figure out my client/server messages protocol and are able to send any data they want

If the answer to question #3 is "no", then this will make things a lot simpler for you. But the answer is rarely "no" if you have any hopes of your game becoming popular, so we'll assume that preventing cheaters does matter here.

DreamingImLatios said: ↑

How do I even begin going about breaking down the game loops for this game in a NetCode setting? I know there are new transformations and dependencies introduced with (de)serialization, rollback, and prediction. I also know that the server and client probably have different game loops.

What are the transformations that NetCode performs? When do each of those transformations happen in a frame? What requirements and assumptions do those systems make about the data when they are invoked? What state do they leave the data in after they happen?

It might be necessary to go over why the Prediction loop exists before answering this. You might or might not already know why prediction is a thing, but I figure I'll just go over it anyway:

The cheater problem

Imagine you take LSSS in its current state, and make every spaceship position synchronized over network. Clients are responsible for sending their ship's position to the server, and the server is responsible for sending ALL ship positions to each client (except for the ship that these clients own). With this setup, your ship moves instantly when you press inputs, because you don't have to wait for the server to tell you what your position is; you just handle movement directly and then inform the server about your new position.

The problem with this is that if anyone figures out a "hack" to change their position, then they can start cheating (teleporting, making their ship move faster, etc...). The server can do some case-by-case checks to try and validate the data coming from clients, but this tends to be a flawed solution that doesn't always work and makes it complicated to implement new features to your game.

So what other solution is there? The answer is: make the clients have zero authority over the state of the game. Make sure the only thing clients can send is their inputs, and the only thing servers can interpret is client input; almost as if clients all just had a wireless gamepad connected to the server computer where the game runs, and then the server sends back the state of the game to their machines for display. The worst thing cheaters can do now is to develop a program that controls their gamepad for them (actually they can also do "local" cheats like seeing through walls, etc... but we won't get into this here).

But now we've got a new problem: you press forward input, and your ship only starts to move 80ms later (or whatever your ping is) because you have to wait for your input to do the whole round trip to and from the server before you see the result on screen. The game just feels terrible to control now.

The latency problem

So we just replaced our cheater problem with a latency problem. Can we also solve the latency problem?

What if we did this:

Clients only send inputs to the server

The server reads client input, applies it to game simulation, and returns the state to clients.

At the same time, clients ALSO apply their input to the game simulation, but only on the entities they control.

When a client receives their spaceship's state from the server, it'll be a state from the past due to latency. So if we simply applied that state directly, we'd have a spaceship that just keeps getting its position overwritten to a past state when it moves

In order to compensate for latency, when a client receives state from the server, it will:

Revert the state of game simulation to what it was at the timestamp of the state data we just received from server

Re-simulate every frame that happened between that time and the "present"

Hope that this re-simulation ends up exactly with the same result the client ended up with when it simulated its controlled entities locally

This is the "Prediction Loop" in a nutshell. The client tries to "predict" the state of the game even without having received state from the server at the current simulation tick, and it kinda just hopes it arrives at the same result as the server once it does receive state from the server. If there is a difference, then the state is determined by what the server says it should be. Your job is to make sure your client predictions will be accurate most of the time, so you don't get jitter every time the server corrects your predicted game state.
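The rollback-and-re-simulate steps above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not how Netcode for Entities implements it: the state is a single integer position, the shared simulation step is a hypothetical `step` function, and the names `PredictingClient`, `predict`, and `on_server_state` are invented for this example.

```python
from dataclasses import dataclass, field


def step(pos, move):
    """Deterministic simulation step shared by client and server."""
    return pos + move


@dataclass
class PredictingClient:
    pos: int = 0
    tick: int = 0
    inputs: dict = field(default_factory=dict)   # tick -> input applied on that tick

    def predict(self, move):
        # Apply local input immediately and remember it for later re-simulation.
        self.tick += 1
        self.inputs[self.tick] = move
        self.pos = step(self.pos, move)

    def on_server_state(self, server_tick, server_pos):
        # 1. Revert to the authoritative state at the server's (past) tick.
        pos = server_pos
        # 2. Re-simulate every tick between that state and the present.
        for t in range(server_tick + 1, self.tick + 1):
            pos = step(pos, self.inputs[t])
        # 3. The re-simulated result replaces the old prediction. If the
        #    prediction was accurate, nothing visibly changes.
        self.pos = pos
        # Inputs at or before the confirmed tick are no longer needed.
        for t in [t for t in self.inputs if t <= server_tick]:
            del self.inputs[t]
```

If the server confirms what the client already predicted, `pos` is unchanged after `on_server_state`. If the server disagrees (say the ship was pushed by something the client did not simulate), the correction is applied at the past tick and the remembered inputs are replayed on top of it.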

The takeaway here is that much of your "game simulation" needs to happen in an update group that can be triggered potentially multiple times in a frame: in Netcode, this is the PredictedSimulationSystemGroup. This systems group will be updated by netcode as many times as it needs to in order to do the "rollback and re-simulate to present" logic. So all of your game simulation systems that are involved in any way with anything that needs prediction will go in there.

But "what should actually be predicted" is the big question, and there is no general answer to that. Ship movement is almost definitely something that should be predicted, but whether or not it's a good idea to make weapon shooting be part of prediction really depends on your specific game. There are ways to make it work with or without prediction, and the best choice depends on what you are most willing to sacrifice in your specific game: responsiveness? bandwidth? accuracy? CPU time? etc... The best answer I can provide is, unfortunately, "become very familiar with how this whole thing works in order to make the best decisions with all factors involved"

The bandwidth problem

Obviously, you should always just send the data that really needs to be synced over network.

But when prediction rolls back the game state to the past and re-simulates to present, it can create an unexpected problem. Did your game simulation code rely on any kind of stateful "previous frame" data? If so, then that "previous frame" data must also be restored when we roll back the state to the past, which means it must be part of the state data sent from server to clients. This means you're paying an extra bandwidth cost.

For this reason, it is often a good idea to design your game simulation systems in a way where they are as stateless as possible. In other words; they rely on as little previous frame data as possible. This is what differentiates Unity.Physics and Havok for example. The end result is reduced bandwidth consumption, even if it can add more work for the CPU. Bandwidth can often become a bottleneck before your CPU does, but again; it depends on the game.

The netcode game loop

So with prediction, the netcode game loop is essentially a game loop within a game loop. The "game simulation" will be updated as part of the prediction group potentially multiple times per frame, but then things like presentation don't need to update as part of the rollback & re-simulation in any way.

In a frame, you'll get:

Initialization group

Simulation group

(...)

Receive state from server

Rollback game state

Run prediction group X times to catch up to present (this is the "game loop within a game loop" I mentioned earlier)

(...)

Presentation group

I wish I could come up with an easier and more straightforward way to explain all of this, but netcode is a problem that is made quite complex due to latency, bandwidth, and cheating problems. For every networked game feature, there are many valid ways to implement it and they're probably all very specific to your game

DreamingImLatios · Nov 29, 2022

Thank you for the awesome reply and starting point.

philsa-unity said: ↑

Would it matter if people cheat? And if so; how can I make it as-hard-as-possible for players to cheat even if they figure out my client/server messages protocol and are able to send any data they want

If the answer to question #3 is "no", then this will make things a lot simpler for you. But the answer is rarely "no" if you have any hopes of your game becoming popular, so we'll assume that preventing cheaters does matter here.

I guess a little bit of context. While LSSS mechanically would probably be a game where I would want to prevent cheating if I took it seriously, for the project ideas I'm actually serious about, cheating isn't that big of a concern. Instead, what I am concerned about is that players mostly have the illusion of a shared experience, and that some data race over the network doesn't put one client into some funky state that requires all the players to exit and rejoin to resolve. With that said, I still believe a server-authoritative model is going to help with a lot of that. So if "preventing cheaters" == "correctness" and "good experience", I'll gladly comply with your assumption.

philsa-unity said: ↑

It might be necessary to go over why the Prediction loop exists before answering this. You might or might not already know why prediction is a thing, but I figure I'll just go over it anyway:

philsa-unity said: ↑

The cheater problem

philsa-unity said: ↑

The latency problem

philsa-unity said: ↑

The bandwidth problem

So far, we are on the same page. Your summary is excellent. But this is the information that quite a few resources do a great job of explaining. It is the next step where I get stuck. How do I go from these concepts into inserting the right data transformations into the right places in the frame?

philsa-unity said: ↑

The netcode game loop

So with prediction, the netcode game loop is essentially a game loop within a game loop. The "game simulation" will be updated as part of the prediction group potentially multiple times per frame, but then things like presentation don't need to update as part of the rollback & re-simulation in any way.

In a frame, you'll get:

Initialization group

Simulation group

(...)

Receive state from server

Rollback game state

Run prediction group X times to catch up to present (this is the "game loop within a game loop" I mentioned earlier)

(...)

Presentation group


This is where I need you to get much more technical and specific about NetCode. I'm trying to map everything out with a top-down approach so that I can see the order of everything. Here are some specific mechanics I'm struggling to place, some of which may be nonsensical due to my lack of understanding:

Reading player inputs on the client

Processing predicted player inputs on the client

Processing player inputs on the server

Moving ships on the server

Moving predicted ships on the client

Spawning bullets and explosions on the server

Spawning predicted bullets on the client (the ones the player fired)

Spawning other bullets and explosions on the client

Moving bullets and explosions on the server

Moving predicted bullets on the client - I'm guessing I will need to experiment with whether it makes sense to predict the bullets the entire time or transition the bullets to only be server predicted

Interpolating and smoothing predicted entities for rendering on the client - How do I know how much I mispredicted by to help with smoothing?

Extrapolating non-predicted bullets, explosions, and maybe AI on the client - I'm assuming that these things will not be synchronized every frame to save bandwidth unlike something like the player's ship, so while I recognize the player is "playing against the past", I would still want some extrapolation on these deprioritized entities so that rendering remains smooth.

Destroying bullets and explosions on the server

Destroying bullets and explosions on the client - Client never predicts destruction, right?

Performing collision detection on the server - I know some games will check the bullet against the target's past position to hide latency. I'm not sure if this is necessary for LSSS as it is fairly slow paced. But I'm curious where this logic would lie in both cases.

Predicting collision detection on the client - Is this even something I should do? Collision detection only affects ship health in LSSS (and despawning bullets, but see my note about destroying bullets and explosions above).

I'm just trying to figure out where everything belongs along with some sort of intuition as to "why". Knowing the order of things is critical for me to understand dependencies and guarantees about the data, which allows me to solve the actual problems I am faced with rather than an imaginary harder one because I don't know details I could have known.

Thoughts?

philsa-unity · Nov 29, 2022

It may not answer all questions, but I'd give these general rules as a starting point:

Gathering input must happen in GhostInputSystemGroup

...because this updates at some point before the prediction group and also updates before the CommandSendSystemGroup which sends commands to the server

Applying input must happen in PredictedSimulationSystemGroup, because we need to use the input we remembered for the predicted tick

Note: input that's unrelated to "player commands", like input to toggle a UI thing on/off, can be just gathered/applied anywhere in the update loop

Any game update logic that affects anything that must be predicted must happen in PredictedSimulationSystemGroup. Predicted logic, by nature, should never care about whether it's on the client or server, because it must be identical for both

Ship movement is predicted, so it goes in PredictedSimulationSystemGroup

If there's an ability system in your game and some ability can affect your ship's speed, then the ability system update must be in PredictedSimulationSystemGroup as well, because it'll end up affecting the outcome of your ship movement, which is predicted

Any spawning/destroying of ghost prefabs happens in server-only logic, whether it's in a server-only system, or in a line of code under a "if(isServer)". Anywhere in the frame before the GhostSendSystem should do the trick. When the server spawns/destroys a ghost, that ghost is spawned/destroyed on clients as well

You might be under the impression that spawning things during the prediction group on the server would be a mistake, because rollback+resimulation might end up re-spawning a ghost multiple times, but.... in Netcode, you can always trust that the server world never rolls back. The server always simulates the prediction group once per tick, and it's the clients' job to always "stay ahead" of the server so that it doesn't have to roll back. If the client fails to stay ahead, it will see a little server correction happen and try to fast-forward its time in order to catch up

But sometimes, not everything that should be visible on all clients should be a ghost. Projectiles are a good example. In some games they might be ghosts, but in other games they might get spawned as non-ghost prefabs by clients based on some "uint shotsCounter" that's synced over network on a weapon ghost. It can be a way to sacrifice accuracy in favor of bandwidth savings. Explosions are another example: you can choose to make the server spawn explosion vfx ghosts when it detects that a ship got destroyed, or you could make clients spawn explosion vfx prefabs locally when it detects that a ship's "bool isDead" is true (or when the ship entity is destroyed).
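To make the "uint shotsCounter" idea concrete, here is a minimal client-side sketch in plain Python. The class `WeaponView` and its fields are hypothetical names for this example; the only thing taken from the post is that a counter on the weapon ghost is synced, and the client spawns non-ghost projectiles locally for each shot it has not yet seen. The modulo handles wraparound of a 32-bit unsigned counter, which is an assumption about how such a counter would behave.

```python
from dataclasses import dataclass, field

UINT32_RANGE = 2 ** 32


@dataclass
class WeaponView:
    """Client-side bookkeeping for one weapon ghost (hypothetical)."""
    last_seen_shots: int = 0
    local_projectiles: list = field(default_factory=list)

    def on_snapshot(self, shots_counter, muzzle_pos):
        # Number of shots fired since the last snapshot we processed,
        # tolerant of the counter wrapping around past 2**32 - 1.
        new_shots = (shots_counter - self.last_seen_shots) % UINT32_RANGE
        for _ in range(new_shots):
            # Spawn a purely local, non-ghost projectile for display.
            self.local_projectiles.append({"pos": muzzle_pos})
        self.last_seen_shots = shots_counter
        return new_shots
```

The trade-off is the one described above: only a single integer is replicated per weapon, but each client's projectiles start from whatever muzzle position it has at the time the snapshot arrives, so they are not guaranteed to match the server exactly.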

I'd say performing lag-compensated collision detection on the server sounds like a good idea for this game, although I agree things are not moving super fast so it's kinda on the fence. You can see an example of how this would work here. I don't think this should be done on the client though. Generally, it's the server's job to reconstruct the interpolated state of things that the clients are seeing, and this is what the lag-compensated server collision detection should accomplish
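One common way to do lag-compensated hit checks on the server is to keep a short history of target positions and test the shot against the position the shooter was actually seeing. The sketch below is a simplified one-dimensional illustration in plain Python, not the Netcode for Entities physics history API; `PositionHistory`, `lag_compensated_hit`, and `interpolation_delay_ticks` are names assumed for this example.

```python
from collections import deque


class PositionHistory:
    """Ring buffer of (tick, x) samples for one target, oldest first."""

    def __init__(self, capacity=32):
        self.samples = deque(maxlen=capacity)

    def record(self, tick, x):
        self.samples.append((tick, x))

    def position_at(self, tick):
        """Latest recorded position at or before `tick`, clamped to the oldest sample."""
        if not self.samples:
            raise ValueError("no history recorded")
        best = self.samples[0]
        for sample in self.samples:
            if sample[0] <= tick:
                best = sample
        return best[1]


def lag_compensated_hit(history, shot_x, hit_radius,
                        server_tick, interpolation_delay_ticks):
    # The shooter saw the target as it was some ticks in the past, so
    # rewind the target to that tick before testing for overlap.
    seen_tick = server_tick - interpolation_delay_ticks
    target_x = history.position_at(seen_tick)
    return abs(target_x - shot_x) <= hit_radius
```

With a target moving 2 units per tick, a shot aimed where the target appeared 3 ticks ago registers as a hit with compensation and as a miss without it (a delay of 0).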

DreamingImLatios · Nov 30, 2022

philsa-unity said: ↑

It may not answer all questions, but I'd give these general rules as a starting point:

This is a very bottom-up response. I know from our past conversations that top-down thinking makes your head explode, but for me this bottom-up approach feels like I received a bunch of screenshots of various points inside a maze and have to somehow figure out how they all relate to each other. It is quite dizzying. (This is feedback for the documentation team btw.)

Some questions I am going to backlog until after I get past this first roadblock:

Spawning on the server inside of prediction is valid, but what about when a player fires a bullet? Do I wait out the latency for the bullet to spawn on the server? (Bullets are destroyed when they hit enemy ships, which only the server really knows, so for that reason I think I would want to make them ghosts. Is that reasoning sound?)

What timing guarantees are present for knowing that a ship was destroyed if I am going to listen for that to spawn explosions? Explosions can damage the player, so I need to ensure this doesn't get too out of sync.

Anyways, I'm going to leave my original question open and hope that others can help fill in the picture and get all these patches of puzzle pieces into a coherent image.

philsa-unity · Nov 30, 2022

Would a breakdown of the major Netcode steps in a frame (receive data, spawn ghosts, gather input, rollback, prediction, send data, etc...) be the answer you're looking for?

Or are you looking for something that would help you build a general intuition (as opposed to case-by-case examples) for what should be predicted, what should be interpolated, what should be a ghost, what should be synced and how, what should update when, etc...? I think an answer like this would be more useful than my previous replies, but it can be tricky to come up with that answer

DreamingImLatios · Nov 30, 2022

philsa-unity said: ↑

Would a breakdown of the major Netcode steps in a frame (receive data, spawn ghosts, gather input, rollback, prediction, send data, etc...) be the answer you're looking for?

This is definitely what I am looking for!

philsa-unity said: ↑

Or are you looking for something that would help you build a general intuition (as opposed to case-by-case examples) for what should be predicted, what should be interpolated, what should be a ghost, what should be synced and how, what should update when, etc...? I think an answer like this would be more useful than my previous replies, but it can be tricky to come up with that answer

I feel like this is something that I will have to experiment with, as it is part of the "feel" of the game. While I don't mind hearing some tips by pros every once in a while, it isn't the thing that is blocking me. I'm much more interested in the technical details of the NetCode package, and the details that will help me ensure my code plays nice with it.

CMarastoni · Nov 30, 2022

DreamingImLatios said: ↑

This is a very bottom-up response. I know from our past conversations that top-down thinking makes your head explode, but for me this bottom-up approach feels like I received a bunch of screenshots of various points inside a maze and have to somehow figure out how they all relate to each other. It is quite dizzying. (This is feedback for the documentation team btw.)

Some questions I am going to backlog until after I get past this first roadblock:

Spawning on the server inside of prediction is valid, but what about when a player fires a bullet? Do I wait out the latency for the bullet to spawn on the server? (Bullets are destroyed when they hit enemy ships, which only the server really knows, so for that reason I think I would want to make them ghosts. Is that reasoning sound?)

What timing guarantees are present for knowing that a ship was destroyed if I am going to listen for that to spawn explosions? Explosions can damage the player, so I need to ensure this doesn't get too out of sync.

Anyways, I'm going to leave my original question open and hope that others can help fill in the picture and get all these patches of puzzle pieces into a coherent image.

- Bullets can use predicted spawning (there is documentation about it, and the Asteroids and HelloNetcode samples show how it works).

- For the confirmation of a destroyed ship, it depends on the latency. The client predicts the server ahead of time. So at the very least, the confirmation for a hit will be received after RTT x SimulationTickRate ticks, plus some ticks in case of packet loss. Usually the effect can be spawned and handled without problem.
The ship can't be destroyed on the client (the client MUST never destroy ghosts). At most, you can disable the entity.
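As a worked example of the "RTT x SimulationTickRate" estimate: with an 80 ms round trip and a 60 Hz simulation, 0.080 s x 60 ticks/s = 4.8, so the confirmation arrives no sooner than about 5 ticks after the client predicted the hit. The helper below is a hypothetical sketch of that arithmetic in plain Python; rounding up to a whole tick and the `extra_ticks` allowance for packet loss are assumptions of this example, not values taken from the package.

```python
def confirmation_delay_ticks(rtt_ms, simulation_tick_rate, extra_ticks=0):
    """Minimum whole ticks before a predicted hit can be confirmed.

    rtt_ms: round-trip time in milliseconds (integer).
    simulation_tick_rate: simulation ticks per second (integer).
    extra_ticks: assumed allowance for lost or delayed packets.
    """
    # Integer ceiling of rtt_ms * rate / 1000, avoiding float rounding.
    ticks = (rtt_ms * simulation_tick_rate + 999) // 1000
    return ticks + extra_ticks
```

For instance, `confirmation_delay_ticks(80, 60)` evaluates to 5, and allowing two extra ticks for a dropped snapshot gives 7.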

PolarTron · Nov 30, 2022

philsa-unity said: ↑

Would a breakdown of the major Netcode steps in a frame (receive data, spawn ghosts, gather input, rollback, prediction, send data, etc...) be the answer you're looking for?

I would love such an explanation. The closest thing I've found to a good enough explanation is the Overwatch GDC Netcode video where they show the prediction loop as a looping gif. But that's just the prediction loop. Some other parts I extracted from the FPSSample Deep Dive video but I felt I had to extrapolate from those too when I made my own networking package.

Something with the level of detail as the DOTS introduction videos would be nice. I love deep diving.

CMarastoni · Nov 30, 2022

PolarTron said: ↑

I would love such an explanation. The closest thing I've found to a good enough explanation is the Overwatch GDC Netcode video where they show the prediction loop as a looping gif. But that's just the prediction loop. Some other parts I extracted from the FPSSample Deep Dive video but I felt I had to extrapolate from those too when I made my own networking package.

Something with the level of detail as the DOTS introduction videos would be nice. I love deep diving.

We have some plans about that. So stay tuned

timjohansson · Dec 5, 2022

Here's a first attempt at mapping out what netcode is doing in a frame.

Initialization system group
There are no changes to this group itself. It runs before the simulation system group and runs once per frame. For this group, delta time and the netcode singletons are not updated yet, so be careful if using them.

The only netcode-specific thing in this group is the client-only NetworkTimeSystem, which calculates the target prediction tick and interpolation tick based on delta time, using the most recently received snapshot for the interpolation tick and the server-reported command age for the prediction tick. The calculations try to keep delta time close to “real time” and apply time scaling in order to keep in sync.

Presentation system group
This also runs once per frame after simulation as usual. The group only exists for client worlds; server worlds and thin client worlds do not present anything, so the presentation group is not used. Delta time and elapsed time are not adjusted to network time when this group runs, so it is not in sync with the server.

Simulation system group
I left this for last because this is where all the complexity is. This group will run once per frame on clients, but it will run once per simulation tick on servers. This means the simulation group on servers can run 0 times or more than 1 time in a frame. The default behaviour in 1.0 is that it only runs 0 or 1 time, if it needs more iterations it will run with a higher delta time instead of running multiple times (so once with delta time 33ms instead of twice with delta time 16.6ms). This is slightly less accurate, but it avoids the feedback loop where adding more iterations makes it take even longer so it just keeps running more and more iterations in order to keep up.
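The difference between running several fixed ticks and running one tick with a larger delta time can be sketched as follows. This is a plain-Python illustration of the idea in the paragraph above, not the package's actual tick logic; `server_steps` and its `batch` flag are hypothetical, and times are integer microseconds to keep the arithmetic exact.

```python
def server_steps(frame_dt_us, accumulator_us, tick_dt_us, batch=True):
    """Return (delta times to simulate this frame, leftover accumulated time).

    frame_dt_us: real time elapsed this frame, in microseconds.
    accumulator_us: unsimulated time carried over from earlier frames.
    tick_dt_us: fixed simulation tick length, in microseconds.
    batch: if True, run at most once with a scaled delta time.
    """
    accumulator_us += frame_dt_us
    ticks_due = accumulator_us // tick_dt_us
    accumulator_us -= ticks_due * tick_dt_us
    if ticks_due == 0:
        return [], accumulator_us                 # nothing to simulate yet
    if batch:
        # One simulation update covering all due ticks.
        return [ticks_due * tick_dt_us], accumulator_us
    # One update per due tick; costlier, and can snowball under load.
    return [tick_dt_us] * ticks_due, accumulator_us
```

With a 16,667 µs tick, a 40 ms frame yields one update of 33,334 µs when batching and two updates of 16,667 µs otherwise, while a 10 ms frame yields no update at all.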

Before this system group runs, Time.DeltaTime and Time.ElapsedTime are updated to the correct time for this simulation step, or the correct time for the prediction tick on the client. This will also update all singletons with target ticks etc.

The first thing that runs in the simulation system group, right after the BeginSimulationEntityCommandBufferSystem, is the client-only GhostSpawnSystemGroup. This is responsible for spawning newly received ghosts. All spawns received in snapshots are put in a queue so they can be spawned at the correct interpolation / prediction tick, and the system spawning them runs right after the ECB system because both of them need to wait for all jobs - and by having them close in time we reduce sync points.

The next thing to run is the NetworkReceiveSystemGroup. This group is responsible for checking for new packets received from the network and storing the data from them as entity data used by later systems. There is a server-only CommandReceiveSystemGroup which will decode received input commands and store the command data in the correct entity.

Next up is the GhostSimulationSystemGroup, which is where most of the networking code not related to packet send/receive or prediction itself goes. This includes things like despawning (using a queue similar to spawning and a begin simulation ECB), collection handling which updates the ghost prefab list based on generated code and loaded prefabs, prespawned ghost processing and a lot of other things.

This is also where - on clients - we decode snapshots in the GhostReceiveSystem, then copy the values from the correct snapshot to the target entity - with interpolation/extrapolation for interpolated ghosts - in the GhostUpdateSystem. This copy of values is what results in the rollback before prediction. We also run something called classification systems. These systems look at the new spawns received from the GhostReceiveSystem and determine whether each one should be predicted or interpolated, so they can place it in the correct spawn queue.
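As an illustration of the "interpolation/extrapolation for interpolated ghosts" step, here is a minimal sketch of sampling a value at a fractional interpolation tick from a short snapshot history. It is Python rather than C#, `sample_interpolated` is a hypothetical helper, and the linear blend and uncapped extrapolation are simplifying assumptions, not a description of what GhostUpdateSystem does internally.

```python
from bisect import bisect_right


def sample_interpolated(snapshots, tick):
    """Sample a value at a (possibly fractional) tick.

    snapshots is a list of (tick, value) pairs sorted by tick.
    """
    ticks = [t for t, _ in snapshots]
    i = bisect_right(ticks, tick)
    if i == 0:
        # Older than anything we have: clamp to the oldest snapshot.
        return snapshots[0][1]
    if i == len(snapshots):
        if len(snapshots) < 2:
            return snapshots[-1][1]
        # At or past the newest snapshot: extrapolate from the last two.
        (t0, v0), (t1, v1) = snapshots[-2], snapshots[-1]
    else:
        # Between two snapshots: interpolate.
        (t0, v0), (t1, v1) = snapshots[i - 1], snapshots[i]
    alpha = (tick - t0) / (t1 - t0)
    return v0 + (v1 - v0) * alpha
```

With snapshots at ticks 10 and 12 holding 0.0 and 4.0, sampling at tick 11 gives 2.0 and sampling at tick 13 extrapolates to 6.0. A real implementation would normally limit how far it is willing to extrapolate.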

The SimulationSystemGroup is also where - on clients - input sampling happens in the GhostInputSystemGroup. We do some processing of the inputs right after this in order to support IInputComponentData / InputEvent. After that processing we send the commands to the server. We want to do this as early as possible to reduce latency, but we need to decode the snapshots first since the input packet also contains snapshot acks.

Prediction
Everything up until this point is netcode internal stuff intended to run before your code. The first user-facing thing that runs is the PredictedSimulationSystemGroup. This is set up to mimic what SimulationSystemGroup would normally do in a single player game, but there are a few differences.

The prediction group will run multiple times on clients, but only once on servers because the full simulation group is fixed step there. The last step on clients can have a lower delta time which we call a partial tick - so it is not quite fixed step on clients. This partial tick is basically reusing the prediction code to do extrapolation so we can render at a higher frequency than the simulation tick rate.
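A minimal sketch of what "multiple times on clients, with a partial last tick" can look like. It is Python rather than C#, and `prediction_steps` with its parameters is a hypothetical helper. It assumes the client re-simulates from the tick after the last applied snapshot up to a target tick that carries a fraction.

```python
def prediction_steps(last_snapshot_tick, target_tick, target_fraction, tick_dt):
    """List the (tick, delta_time) steps the client would re-simulate.

    target_fraction is in (0, 1]; 1.0 means the target is a full tick.
    """
    steps = []
    # Full fixed-size steps for every tick before the target.
    for tick in range(last_snapshot_tick + 1, target_tick):
        steps.append((tick, tick_dt))
    # The final step may be a partial tick with a smaller delta time.
    if target_tick > last_snapshot_tick:
        steps.append((target_tick, tick_dt * target_fraction))
    return steps
```

For a last snapshot at tick 100, a target of tick 103 at fraction 0.5, and a 20ms tick, this yields two full 20ms steps for ticks 101 and 102 followed by a 10ms partial step for tick 103.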

The prediction group will set the enabled state of the Simulate component on all ghosts before it runs. This is important because ghosts can be rolled back to different snapshots which means only a subset are to be simulated in a given prediction step. All systems in this group writing to ghosts must use WithAll<Simulate>. You can always have that check - also in non-netcode single player systems which means you can have systems work in both netcode and non-netcode contexts.
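To illustrate why only a subset of ghosts is simulated in a given step, here is a minimal sketch in Python. `Ghost`, `applied_tick`, `set_simulate_flags` and `move_system` are hypothetical stand-ins, and the rule used here (simulate a ghost only for ticks newer than the snapshot it was rolled back to) is an assumption about the intent, not the actual netcode implementation.

```python
from dataclasses import dataclass


@dataclass
class Ghost:
    applied_tick: int       # tick of the snapshot this ghost was rolled back to
    position: float = 0.0
    simulate: bool = False  # stands in for the enabled state of Simulate


def set_simulate_flags(ghosts, prediction_tick):
    # A ghost already holds server state for applied_tick, so it should
    # only be stepped for ticks after that.
    for ghost in ghosts:
        ghost.simulate = ghost.applied_tick < prediction_tick


def move_system(ghosts, speed, dt):
    # Equivalent of filtering with WithAll<Simulate> in a real system.
    for ghost in ghosts:
        if ghost.simulate:
            ghost.position += speed * dt


def predict(ghosts, first_tick, last_tick, speed, dt):
    for tick in range(first_tick, last_tick + 1):
        set_simulate_flags(ghosts, tick)
        move_system(ghosts, speed, dt)
```

If one ghost was rolled back to tick 100 and another to tick 102, predicting ticks 101 to 103 steps the first ghost three times and the second only once. A system that ignored the flag would move the second ghost too far.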

On servers there are also some systems which apply the correct received commands to the IInputComponentData. On the client the prediction loop is the only place you should process inputs, since systems outside the prediction loop will not run for all ticks and will not handle server corrections.

The start of the prediction is the PredictedFixedStepSimulationSystemGroup. This is the same as FixedStepSimulationSystemGroup but for prediction, and it is where predicted physics runs.

After that your non-fixed step prediction code runs.

At the very end of the last full prediction step in a frame we will do a backup of the predicted values on the client. This is used for a few things. It is used to restore data in the GhostUpdateSystem if we did not receive new data from the server. It is also used for smoothing and validation. Next time we predict this exact tick after a rollback we compare the values to this backup to detect prediction errors and to smooth out the errors.
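A minimal sketch of the backup idea, in Python. `PredictionBackup` and `smooth` are hypothetical names, and the linear blend is just one possible smoothing choice, not the one netcode necessarily uses.

```python
class PredictionBackup:
    """Remembers predicted values per tick so later re-predictions can be compared."""

    def __init__(self):
        self._by_tick = {}

    def store(self, tick, value):
        self._by_tick[tick] = value

    def error_for(self, tick, repredicted):
        """Return repredicted minus the backup, or None without a backup."""
        if tick not in self._by_tick:
            return None
        return repredicted - self._by_tick[tick]


def smooth(repredicted, error, remaining):
    """Value to use while a correction is blended in.

    remaining goes from 1.0 (still showing the old prediction)
    down to 0.0 (fully corrected).
    """
    return repredicted - error * remaining
```

If tick 50 was first predicted as 10.0 and, after a rollback, is re-predicted as 10.4, the detected error is 0.4. Reducing `remaining` from 1.0 to 0.0 over a few frames moves the value from 10.0 to 10.4 without a visible snap.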

After prediction has run you have another game loop which is client or server only, the normal SimulationSystemGroup systems. This has the normal fixed step simulation followed by your systems. So you can really see it as two separate loops, first one for the core game state which can be predicted, then one for client or server specific logic.

Simulation system group end
The last thing that runs in the simulation system group - after EndSimulationEntityCommandBufferSystem - is GhostSendSystem on servers - which serializes the current world state and sends it - and the RpcSystem which serializes all new RPCs created this frame and sends them.

DreamingImLatios · Dec 5, 2022

timjohansson said: ↑

Here's a first attempt at mapping out what netcode is doing in a frame.

This is incredible! Thank you so much! The extra notes about time steps and interpolation really helped everything "click". I'm sure I will probably have a question or two once I start trying to implement things for real, but those will pale in comparison to the number of questions you just answered.

Roadblock #2 – From Lobbies to In-Game on a Sketchy Connection
I recognize that RPCs are the primary mechanism in the examples for facilitating lobbies, commanding scene changes, etc. But what I don't understand is how to effectively reason about and handle failure cases on bad connections. One example is a timed window to join the game, where a client may send the join RPC request before the timeout but the server may not see it until afterward. And what about the situation where no late joins are allowed, and the server needs to ensure that all clients have successfully switched to the next scene before starting the simulation? What tools are at my disposal for these sorts of things? How do I avoid weird edge cases that can get the client and server out of sync for these types of operations?

timjohansson · Dec 5, 2022

DreamingImLatios said: ↑

Roadblock #2 – From Lobbies to In-Game on a Sketchy Connection

RPCs are reliable as far as eventually arriving and arriving in order, but they can be processed out of order. All messages received in one frame are processed in undefined order even if they were sent on different frames - use your own sequence number if order is really important and you expect multiple messages in flight. Lower sequence numbers will be guaranteed to have arrived this frame or earlier as long as you send in order.

The two issues you can get are timing - you don't know exactly when the message you just sent will arrive - and flooding, where you consistently try to send more messages than the transport layer can handle so the queue keeps growing.
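A minimal sketch of the "use your own sequence number" advice, in Python rather than C#. `RpcSender` and `RpcInbox` are hypothetical names. Because lower sequence numbers are guaranteed to have arrived this frame or earlier, sorting each frame's batch is enough; the gap check is only a sanity check on that assumption.

```python
class RpcSender:
    """Stamps outgoing payloads with an increasing sequence number."""

    def __init__(self):
        self._next = 0

    def wrap(self, payload):
        message = (self._next, payload)
        self._next += 1
        return message


class RpcInbox:
    """Processes each frame's received messages in send order."""

    def __init__(self):
        self.next_expected = 0

    def process_frame(self, received, handler):
        # Messages within one frame arrive in undefined order: sort them.
        for seq, payload in sorted(received, key=lambda m: m[0]):
            if seq != self.next_expected:
                # Should not happen if the sender sends in order.
                raise RuntimeError(f"gap: expected {self.next_expected}, got {seq}")
            handler(payload)
            self.next_expected += 1
```

If the sender wraps "a", "b" and "c", and the receiver sees "b" and "a" in one frame and "c" in the next, the handler is still called with "a", "b", "c" in that order.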

The flooding issue is mostly a matter of keeping the number of RPCs low.

For timing, the model we are using is server authoritative, so I always apply that to RPCs too. I don't care when the client sent something, I only care about when the server received it. In my experience putting the responsibility of dealing with that on the client - which has a single delay to consider - is much more robust than trying to have the server reason about the different delays for each connected client.

There are many different ways to deal with that delay client side depending on context. If you have some countdown for joining you can for example try to adjust the countdown displayed on the client so it takes the expected time to send the message into account. The timer will not be perfectly aligned with the server anyway since time is not synchronized between client and server (neither predicted tick nor interpolated tick is the current server time).
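One way to do the countdown adjustment, as a minimal sketch in Python. `displayed_countdown` and its parameters are hypothetical, and it assumes the client has some round trip time estimate available. The reasoning: the remaining time the server reported was already one trip old when it arrived, and a join request sent now needs another trip to reach the server, so roughly one full round trip has to be subtracted.

```python
def displayed_countdown(server_remaining, seconds_since_update, rtt_estimate):
    """Seconds the player really has left to press "join".

    server_remaining: remaining time in the last message from the server.
    seconds_since_update: client time elapsed since that message arrived.
    rtt_estimate: estimated round trip time in seconds (an assumption).
    """
    remaining = server_remaining - seconds_since_update - rtt_estimate
    return max(0.0, remaining)
```

For example, if the server reported 10 seconds remaining, that message arrived 1 second ago and the round trip estimate is 200ms, the client would show about 8.8 seconds. The server still decides purely based on when it receives the request.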

For making sure everyone has loaded a map, that is mostly a matter of the server waiting for all clients to confirm they are done before proceeding. The flow would be something like:
* Server sets all clients to not in game
* Server broadcasts "load level"
* Server loads the level
* Clients receive the message and start loading the level
* Clients send "level loaded" to the server when they are done
* When the server has received "level loaded" from all clients and has loaded its own level, it starts the game again

There is additional complexity if a client joins in the middle of the transition, then you need to send "load level" to that client too, and wait for it - or send "you can't join" and close the connection if joining is not allowed.
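The flow above, including the late join case, can be sketched as a small piece of server-side bookkeeping. This is Python rather than C#, `LevelTransition` and its method names are hypothetical, and the disconnect handling is an addition the flow above does not mention but that a real server would likely need so it does not wait forever.

```python
class LevelTransition:
    """Tracks which clients the server is still waiting for."""

    def __init__(self, clients, allow_late_join=True):
        self.allow_late_join = allow_late_join
        self.server_loaded = False
        self.pending = set(clients)
        # (client, message) pairs the server wants to send.
        self.outbox = [(c, "load level") for c in sorted(self.pending)]

    def on_server_loaded(self):
        self.server_loaded = True

    def on_level_loaded(self, client):
        self.pending.discard(client)

    def on_client_joined(self, client):
        if self.allow_late_join:
            # Late joiner must load too, and the server waits for it.
            self.pending.add(client)
            self.outbox.append((client, "load level"))
        else:
            self.outbox.append((client, "you can't join"))

    def on_client_disconnected(self, client):
        # Assumption: a dropped client should not block the start.
        self.pending.discard(client)

    def can_start(self):
        return self.server_loaded and not self.pending
```

The server would call `can_start()` each frame and only put clients back in game once it returns true. Since the server is authoritative, it never needs to know when a client sent "level loaded", only that the message has arrived.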

DreamingImLatios · Dec 5, 2022

This will probably be an area I struggle with, but nonetheless, thank you for your exceptional answers and explanations! Once the Transform System stabilizes, I think I will be ready to give this a real shot without feeling completely overwhelmed.

Depending on how far I get, I may post again in this thread in a month or so, or I may create separate threads. To be honest, I'm kinda speechless. I've had this gap in understanding the relationship between the low-level ECS reasoning and NetCode's design for several years now. And just today, that gap got filled and now it is just the minute details in the code I need to learn by doing. I cannot thank you enough! This has opened up a whole new world of potential for me!
