Client-Server Lockstep Model

jpthek9 · Feb 3, 2015

Introduction
Hi, this is an interesting topic that I've found is not very well explored on easily reachable parts of the internet. I'm setting up my RTS game with a client-server lockstep model because of the many benefits of having a 'middleman' server facilitate communications. This is partly a guide and partly a discussion thread, since there are still some things I haven't figured out yet. I figured that with the additional brain power of the Unity Community, the system can be optimized and solidified into a resource for everyone to use. I'll keep updating this post with any developments. First of all, if you're unfamiliar with how the lockstep system for RTS games works, check this out. Our model will be based on the same principle of synchronous simulation as p2p but will have the following improvements:

Every player only has to communicate with the server, so much less data goes around, and total bandwidth increases linearly instead of quadratically with each additional player.

The server is 'omniscient' about everything, including players' confirmations, latency, etc. This means confirmations don't have to be replicated and sent to all players; the server sends just 1 'go ahead'.

Latency compensation can be decided by the server, which knows everyone's latency.

No NAT punch-through needed. This is because every client will communicate only with the omniscient server - not other clients.
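To make the bandwidth point concrete, here is a small sketch (in Python for brevity; a Unity project would port it to C#) that counts packets per lockstep turn. It assumes every player sends exactly one packet per turn and, in the p2p case, sends that packet to every other player; confirmations are ignored. The function name is hypothetical.

```python
def packets_per_turn(players: int) -> dict[str, int]:
    """Packets sent in one lockstep turn, assuming one packet per player per turn."""
    return {
        # p2p: every player sends its packet directly to every other player
        "p2p_total": players * (players - 1),
        "p2p_upload_per_player": players - 1,
        # client-server: one packet up to the server, one packet back down per player
        "cs_total": 2 * players,
        "cs_upload_per_player": 1,
    }

for n in (2, 4, 8, 16):
    print(n, packets_per_turn(n))
```

With 16 players, each player uploads 15 packets per turn under p2p versus 1 under client-server. For 2 players the client-server total is actually higher (4 versus 2); the gain comes from per-player upload and from scaling to larger rooms.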

I suspect that Starcraft 2 uses this model, but it must be a company-held secret because there's only speculation as to how they do it. I suspect this because there have been no mentions of opening ports or anything similar. Also, I can play on EU and Asia servers without others complaining about command delay. With p2p, my latency would affect other people quite significantly, but with a middleman server, commands would only be delayed by about half my round trip time (I'll go into this more later). Confirmations would only need to be sent to the central server instead of to every other player. As you can imagine, this system would be incredibly useful for games with many players in a room, but also for ones that just need a bit of a responsiveness boost. Please contribute any ideas or suggestions you have, and together we'll produce a system just as good as, if not better than, Starcraft 2's.

The Basics
To start, let's run through the basics of the setup:

The system must support up to 16 players, which is easily feasible with some central server magic.

I will be using Forge for my project because of its efficiency and flexibility. Any networking solution that allows you to (1) run custom server logic, (2) send varying amounts of data in a single packet, and (3) capture and interpret data will work.

The server is 'smart': it keeps track of player connections and packet receptions and tells players when they need to stop and get back in sync.

Both the client and server run at 10 steps per second, implemented in FixedUpdate() (Unity's default fixed timestep is 0.02 seconds, so Time.fixedDeltaTime has to be set to 0.1). This includes simulating the game and processing communications. This rate can be changed to suit your individual game's needs, but in general, try to keep it low for bandwidth efficiency.

Each command is packaged in a class that defines several optional attributes: (byte) Player ID, (byte) Command ID, (ushort) PosX, (ushort) PosY, and Target. An extension of this that describes the selected units can be communicated similarly, but for the sake of simplicity it will not be discussed here. For efficiency and determinism, PosX and PosY are ushorts: scale your map's minimum and maximum coordinates to fit in the ushort range (0 to 65535) and round.

To stay in sync and identify when things happen, each client keeps a 'primary' step count: the total number of simulation steps that have passed. If every command runs the same number of steps after the first command on every client, the simulations should be identical. There is also a secondary step count that keeps running when lag pauses the primary step count (explained further below).

All simulation will use doubles if doing so isn't too inconvenient. You can grab a great double-precision vector library here for positions and movement calculations. If you must use the physics engine, round upwards (or downwards) to the first decimal place and keep your positions under 5 digits before the decimal point to keep floating point imprecision to a minimum. A lockstep system would be pointless if different clients produced different simulations.

To keep simulations in sync, a simple boolean is used that stops the simulation from advancing while it is set.

Turns = steps = frames = a FixedUpdate(). I use them synonymously.
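A minimal sketch of the PosX/PosY quantization described above, assuming hypothetical map bounds of -500 to 500 world units (use your own map's extent). The key property is that only the sending client quantizes; every other client receives the same integer, so rounding never happens independently on different machines.

```python
# Hypothetical map bounds in world units; use your own map's extent.
MAP_MIN = -500.0
MAP_MAX = 500.0
USHORT_MAX = 0xFFFF  # 65535

def quantize(coord: float) -> int:
    """Map a world coordinate onto the ushort range [0, 65535] (sender side only)."""
    clamped = min(max(coord, MAP_MIN), MAP_MAX)
    t = (clamped - MAP_MIN) / (MAP_MAX - MAP_MIN)  # 0.0 .. 1.0
    return round(t * USHORT_MAX)

def dequantize(q: int) -> float:
    """Convert a received ushort back to world units; every client starts from the same integer."""
    return MAP_MIN + (q / USHORT_MAX) * (MAP_MAX - MAP_MIN)

q = quantize(123.4)
print(q, dequantize(q))
```

With these bounds, one ushort step is about 0.015 world units, which is the precision limit for any commanded position.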

The Core
As is customary in the p2p model, a packet will be sent every step even if no commands are issued. Each packet always contains a header identifying the step it's intended for (more on that later) and the player who sent it. Any commands or updates are inserted into these packets.
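A sketch of one possible byte layout for these per-step packets. It assumes little-endian encoding, a 4-byte step number and a 1-byte command count in the header, and a ushort Target (the post doesn't give Target's type). The layout and names are hypothetical; Forge or any other networking library would just carry the resulting bytes.

```python
import struct
from dataclasses import dataclass

HEADER = struct.Struct("<IBB")     # step (uint32), sender player ID, command count
COMMAND = struct.Struct("<BBHHH")  # player ID, command ID, PosX, PosY, Target (assumed ushort)

@dataclass(frozen=True)
class Command:
    player_id: int
    command_id: int
    pos_x: int = 0
    pos_y: int = 0
    target: int = 0

def encode_packet(step: int, player_id: int, commands: list[Command]) -> bytes:
    parts = [HEADER.pack(step, player_id, len(commands))]
    for c in commands:
        parts.append(COMMAND.pack(c.player_id, c.command_id, c.pos_x, c.pos_y, c.target))
    return b"".join(parts)

def decode_packet(data: bytes) -> tuple[int, int, list[Command]]:
    step, player_id, count = HEADER.unpack_from(data, 0)
    commands = []
    offset = HEADER.size
    for _ in range(count):
        commands.append(Command(*COMMAND.unpack_from(data, offset)))
        offset += COMMAND.size
    return step, player_id, commands

# JPtheK9 (player 7) orders an attack (command 4) on unit 37, designated for step 1026.
packet = encode_packet(1026, 7, [Command(7, 4, target=37)])
print(len(packet), decode_packet(packet))
```

An empty packet, which is still sent every step, costs 6 bytes with this layout; each command adds 8.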

As an example of how this system works, say Player JPtheK9 tells his unit to attack a base, which generates a command. JPtheK9's player ID is 7, the attack command ID is 4, and the base's unit ID is 37. No positions are needed, so PosX and PosY can just be set to 0; their value doesn't matter. Next, the packet header needs the designated step on which to run the command. This is the part I have only a sketchy grasp on. In the p2p model, commands go to every other player, and everyone sends back a confirmation to let you know that the command can be run synchronously. This usually takes roughly 100ms for the command to reach the other clients and another 100ms for the confirmation to come back. With a server in the middle, the same exchange would take 100ms -> server, 100ms -> other player, 100ms -> server, 100ms -> you, or 400ms in total, resulting in a bad lag day and very noticeable command delays. Fortunately, I think the solution lies in the server's omniscience. I haven't solidified this into code very much, but here is the general concept.
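The delay comparison above is just a sum of one-way hops. A trivial sketch with the example's 100ms per hop (a uniform hop latency is an assumption for illustration):

```python
HOP_MS = 100  # one-way latency per hop, as in the example

# p2p: command to the other client, confirmation straight back
p2p_hops = [HOP_MS, HOP_MS]
# naive server relay: you -> server -> other player, then other player -> server -> you
relay_hops = [HOP_MS, HOP_MS, HOP_MS, HOP_MS]

p2p_delay_ms = sum(p2p_hops)
relay_delay_ms = sum(relay_hops)
print(f"p2p: {p2p_delay_ms} ms, naive relay: {relay_delay_ms} ms")  # p2p: 200 ms, naive relay: 400 ms
```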

The designated step is decided in advance by the client that sends the command to the server. It is the time for the command to reach the server plus the time for the server to reach everyone (the slowest player's one-way trip time), which the server calculates and includes in every packet it sends. In our example, your one-way time is 1 turn (100ms) and the slowest player's is 2 turns (200ms), so the designated frame is your current frame + 3 turns to account for communication latency. For your command to be executed, you don't need confirmation from other players, only from the server. On the turn the server sends you the confirmation, it also sends the command to everyone else. The magic of this system is that it is self-correcting.
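A sketch of the designated-step calculation. It assumes a 100ms step, that the client measures its own one-way time to the server in milliseconds, and that the server supplies the slowest player's reach time already converted to whole steps. Rounding up is a choice made here so a command is never scheduled earlier than it can arrive; the function names are hypothetical.

```python
import math

STEP_MS = 100  # 10 lockstep steps per second

def ms_to_steps(one_way_ms: float) -> int:
    """Round a one-way latency up to whole steps, so we never schedule too early."""
    return math.ceil(one_way_ms / STEP_MS)

def designated_step(current_step: int, my_one_way_ms: float, slowest_reach_steps: int) -> int:
    """Step on which a command issued now should run on every client:
    time to reach the server + time for the server to reach the slowest player."""
    return current_step + ms_to_steps(my_one_way_ms) + slowest_reach_steps

# The example: issued on step 1023, 100ms to the server,
# slowest player 2 steps away from the server.
print(designated_step(1023, 100, 2))  # 1026
```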

Dealing with Possibilities
Back to the example: say JPtheK9 sends his command on step 1023. The packet the server sent for step 1023 says it takes 2 turns for a packet from the server to reach everyone. JPtheK9 knows that it takes 1 turn (based on ping) to reach the server and 2 turns to reach everyone else from the server, so the command is designated to run on turn 1023 + 3 = 1026. 3 Mississippis go by and it's now step 1026. JPtheK9 must examine the server's packet designated for frame 1026. There are several possibilities:

The server included a confirmation and everyone is happy

The server didn't include a confirmation because the packet to the server got lost

The server didn't include a confirmation because there was a lag spike and didn't receive the command in time

There is no packet from the server because it got lost

There is no packet from the server because there was a lag spike

When in doubt, consider it lost. #s 2 and 3 can thus be dealt with in the same way, and so can #s 4 and 5. If anything arrives too late, act as if it never arrived at all. This is a real time strategy game here! Packets have to be punctual to make the cut. As with all online games, consistent latency and packet delivery can never be guaranteed (though maybe one day computers will predict the future). The best you can do is estimate the latency and prepare for failure.

#1 is easy to take care of. Everything is in order. Carry on simulating.

#2 and 3 get a bit trickier but a simple thing that can be done is to just drop the command then resend it as a brand new command. The command never got to the server so nobody else knows about it. Everyone still has the same data to simulate.

#4 and 5 are the big essay of this post. If there is no packet at all, how do you even communicate? The answer is: don't. Trigger the aforementioned boolean and stop your simulation. Don't even send a packet to the server!
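The three outcomes boil down to a small decision per step. A sketch, assuming each outgoing command carries a hypothetical per-client sequence number (distinct from the Command ID, which identifies the command type) so the client can tell which of its commands the server confirmed, and assuming the caller passes None when the server's packet for this step is missing or arrived late:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StepResult:
    stall: bool  # cases 4 and 5: freeze the simulation and stop sending
    resend: list[int] = field(default_factory=list)  # cases 2 and 3: reissue as new commands

def resolve_step(confirmed: Optional[set[int]], expected: set[int]) -> StepResult:
    """
    confirmed: sequence numbers of our commands that the server confirmed in its
               packet for this step, or None if that packet is missing or late.
    expected:  sequence numbers of our own commands designated for this step.
    """
    if confirmed is None:
        return StepResult(stall=True)
    # Anything we expected but the server did not confirm never reached it in time,
    # so nobody else knows about it; simulate normally and resend it later.
    return StepResult(stall=False, resend=sorted(expected - confirmed))
```

Whatever ends up in `resend` gets a fresh designated step when reissued, so no client ever runs it twice.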

In the example, JPtheK9 rages and cuts his internet on turn 1024, which means he will not get the server's packets from turns 1024 and 1025, designated for 1026 and 1027. When he reaches turn 1026, he finds that there is no packet for 1026 (because he lost his connection when the server sent it). Now he halts all simulation and stops sending. That means he doesn't send a packet on turn 1026, so on the server's turn 1027 (current step + steps to reach the server), there is no packet from JPtheK9. On step 1027, when the server finds that there is no packet from JPtheK9, it freaks out and sends everyone connected a 'stop' command, telling them to freeze while it figures out what to do with JPtheK9. This stop packet is designated for frame 1029 (1027 + the slowest client's reach time). The server then keeps sending JPtheK9 the same stop command for turn 1029 until he receives one. Once JPtheK9 receives this stop command, he replies with the frame of the first packet he missed, which was 1026.
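On the server, this is a check run every step. A sketch, assuming the server records, per step, which player IDs it received a packet from, and knows the slowest client's reach time in steps; the names and the player IDs in the example are hypothetical:

```python
from typing import Optional

def check_for_stall(step: int,
                    received: dict[int, set[int]],
                    players: set[int],
                    slowest_reach_steps: int) -> Optional[tuple[list[int], int]]:
    """Return (players missing a packet for this step, step the 'stop' is designated for),
    or None if everyone's packet arrived."""
    missing = players - received.get(step, set())
    if not missing:
        return None
    return sorted(missing), step + slowest_reach_steps

# The example: on step 1027 every packet arrived except JPtheK9's (player 7).
players = {1, 2, 3, 7}
received = {1027: {1, 2, 3}}
print(check_for_stall(1027, received, players, 2))  # ([7], 1029)
```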

Now, we have to assume that there were no successful packet sends between frame 1026 and frame 1029. This means it is important for the server to cache at least as many steps of commands as the longest desync it should recover from, or even all of them if you want to save the commands to a database as a replay. The server sends out every packet since 1026. Now that JPtheK9 has all the information he needs, he can immediately catch up to the frame everyone else stopped at, but now we have to consider all the other poor, bored players paused at frame 1029. This is where the secondary step count comes in. While the primary step count, which identifies when each command runs, is paused, the secondary step count keeps going and serves as the true count of game time.
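A sketch of the server-side command cache, assuming steps are recorded in increasing order and that a bounded cache (max_steps) is acceptable; pass None to keep everything, for example to save a replay. If a client asks for history that has already been evicted, it cannot catch up by replaying commands, which this sketch only signals with an exception. CommandCache is a hypothetical name.

```python
from collections import OrderedDict
from typing import Any, Optional

class CommandCache:
    """Per-step command lists the server has broadcast, so a client that
    missed packets can be re-sent everything since the first step it missed."""

    def __init__(self, max_steps: Optional[int] = None):
        self.max_steps = max_steps
        self._steps = OrderedDict()  # step -> list of commands, oldest first

    def record(self, step: int, commands: list[Any]) -> None:
        self._steps[step] = list(commands)
        # Evict the oldest steps once the cache is over its bound.
        if self.max_steps is not None:
            while len(self._steps) > self.max_steps:
                self._steps.popitem(last=False)

    def since(self, first_missed_step: int) -> list[tuple[int, list[Any]]]:
        """All cached (step, commands) pairs from first_missed_step onward, oldest first."""
        if self._steps and first_missed_step < next(iter(self._steps)):
            raise KeyError(f"step {first_missed_step} has already been evicted")
        return [(s, c) for s, c in self._steps.items() if s >= first_missed_step]

# The example: JPtheK9 reports 1026 as the first packet he missed.
cache = CommandCache(max_steps=50)
for step in range(1020, 1030):
    cache.record(step, [])
print([s for s, _ in cache.since(1026)])  # [1026, 1027, 1028, 1029]
```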

The 'go' command tells everyone that JPtheK9 is back online so we can all continue playing. Because the primary step counts are paused, we use the secondary step count to identify the go command. Let's say 100 frames (10 seconds) go by before JPtheK9 repents and reconnects his internet (and instantly contacts the server). If the primary and secondary step counts were in sync before, they are now 100 steps apart: the primary step is still at 1029 and the secondary step is at 1129. 1129 + 2 = 1131 is the designated frame of the packet in which the server sends out the 'go' message, so when the secondary clock strikes 1131 on all clients, everyone continues playing the game.
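A sketch of the two step counts and the pause boolean. It assumes tick() is called once per fixed step on every client, and that resuming on the tick where the secondary count reaches the 'go' step is the rule all clients share (any rule works, as long as every client applies the same one). LockstepClock and its methods are hypothetical names.

```python
class LockstepClock:
    """primary: steps actually simulated; pauses while stalled.
    secondary: every fixed tick, paused or not; used to schedule the 'go'."""

    def __init__(self, step: int = 0):
        self.primary = step
        self.secondary = step
        self.stalled = False   # the simple boolean that blocks simulation
        self.resume_at = None  # secondary step on which a scheduled 'go' takes effect

    def stop(self) -> None:
        self.stalled = True

    def schedule_go(self, secondary_step: int) -> None:
        self.resume_at = secondary_step

    def tick(self) -> bool:
        """Advance one fixed step; returns True if the simulation advanced."""
        self.secondary += 1
        if self.stalled and self.resume_at is not None and self.secondary >= self.resume_at:
            self.stalled = False
            self.resume_at = None
        if self.stalled:
            return False
        self.primary += 1
        return True

clock = LockstepClock(1029)
clock.stop()                            # the 'stop' designated for 1029
for _ in range(100):                    # 10 seconds of waiting
    clock.tick()
print(clock.primary, clock.secondary)   # 1029 1129
clock.schedule_go(clock.secondary + 2)  # 'go' designated for 1131
clock.tick()
clock.tick()
print(clock.primary, clock.secondary)   # 1030 1131
```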

Final Words
These (I hope) are far from the final words; they're more of a final touch to wrap up this guide/discussion topic neatly. This system will work in theory, and I'll be hard at work bringing it to life. There is still a lot of room for optimizations and ideas, so please contribute them if you'd like. I'll be following this thread closely since it's directly related to my project. If my implementation goes well, I'm thinking of tidying it up a bit and releasing it as an add-on for Forge to give others the tools to make a great multiplayer lockstep RTS game in Unity.

Cranick · Feb 15, 2015

This is a very good guide for everyone to use. It's extremely useful, and I recommend it to everyone looking for an introduction to multiplayer lockstep RTS games in Unity; it makes the topic much easier to understand.

It gives basic information on what to expect when receiving and sending user input, a basic understanding of how packets are sent between a network of players, and how frames can be lost. Thank you Jpthek9, this definitely helped me understand this solution much better for when I develop my own RTS-type games, and gave me a good overall understanding of what is achievable and expected in high-intensity games.

gfoot · Feb 16, 2015

This is pretty much how I have implemented the networking in my RTS networking library - there are some differences but you have the basics laid out fine.

The main thing in your post that rang an alarm bell for me was the suggestion to use doubles and rounding. I would recommend against that - just avoid floating point numbers altogether. It is not hard to write your simulation logic without them, and far safer. My biggest concern with use of floats is the potential for different JIT compilers - especially the AOT used on iOS - to generate slightly different sequences of operations, with different rounding artifacts. Whatever rounding granularity you use, the moment a calculation splits on either side of a grain line on two clients, you get a desync, and that is pretty much the worst thing that can happen to this kind of simulation.
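One common way to follow this advice is fixed-point arithmetic: store every quantity as an integer scaled by a power of two. A minimal sketch, using a Q16.16 format chosen here purely for illustration (the post doesn't name a specific scheme). In C# you would keep intermediates in a long so products don't overflow; also note that C# integer division truncates toward zero while Python's // floors, so a port must pick one rule and use it everywhere.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # represents 1.0 in Q16.16

def fx_from_ratio(num: int, den: int) -> int:
    """Build a fixed-point value from an integer ratio, e.g. (1, 10) for ~0.1 (rounded down)."""
    return (num << FRAC_BITS) // den

def fx_mul(a: int, b: int) -> int:
    # The raw product carries 2 * FRAC_BITS fractional bits; shift back down.
    return (a * b) >> FRAC_BITS

def step_position(pos: int, velocity: int, dt: int) -> int:
    return pos + fx_mul(velocity, dt)

DT = fx_from_ratio(1, 10)  # one 100ms lockstep turn
pos = 0
velocity = 3 * ONE         # 3 units per second
for _ in range(10):        # simulate one second
    pos = step_position(pos, velocity, DT)
print(pos, pos / ONE)      # the float is for display only
```

The result, 196590/65536, is slightly under 3.0 because 0.1 isn't exactly representable, but every client computes the identical integer, which is what lockstep needs.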

I remember discussing similar things a long time ago - here are two rather old threads on the topic, you might have seen them before:

http://forum.unity3d.com/threads/lock-step-multiplayer-rts-question.142215/
http://forum.unity3d.com/threads/deterministic-rts-game-engine.121947/
