Will the job system improve networking and multiplayer speeds?

Discussion in 'Data Oriented Technology Stack' started by Arowx, Feb 15, 2018.

  1. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,396
    My point was more about potential and the current state of modern GCs. The 4ms was more to illustrate the rather large difference between what you can do with the JVM standard collectors currently and what you can do in .NET. The numbers I posted were actually from a server environment tested at around 50k messages per second into a full-blown realtime game server doing real work.
     
  2. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    Well, it looks like I need to write another benchmark, but for different GC solutions... :D
     
  3. Creepgin

    Creepgin

    Joined:
    Dec 14, 2010
    Posts:
    297
    @nxrighthere Do it. That'll be the best thing that comes out of this thread!
     
  4. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    All right. Luckily, my programming skills are not limited to just C#, so let's shake as many garbage collectors as possible, and see how old and modern technologies work today.

    It will take some time...
     
  5. Per

    Per

    Joined:
    Jun 25, 2009
    Posts:
    456
    And that's part of the problem. Most people don't have the time or want to take the time, hence they're using a game engine rather than rolling their own.

    Think of this less as an architecture and more as an enforced code style guide and it'll make more sense. It's there to help the vast majority of users get the best possible MP performance by utilizing what is essentially SIMD, a well-known and well-understood model for MP development, without them having to know or understand the concept itself.

    For most users, being forced to work this way will give them an easy speedup even in their worst-case scenario. That doesn't mean people can't write bad code with it that fails to scale well or is just plain slow, but a lot of the most common problems have been dealt with for the average user, and I think that's the point. Anyone who is gifted enough could go and write their own system or use other libraries in either C# or C++ and perhaps get superior performance; I don't see anything actually stopping them from doing that if they wanted to. This isn't for those people, though. This is for the average users who maybe understand the concepts of MP but perhaps don't know the pitfalls. It's a way to reduce the support load from people's dumb programming mistakes while giving a solid new feature of, well, speed as far as most users are concerned.
     
    nxrighthere likes this.
  6. interpol_kun

    interpol_kun

    Joined:
    Jul 28, 2016
    Posts:
    75
    Correct me if I am wrong: you're saying that the simulation runs at ~1100 fps on a MacBook Pro? My F***ing god, your tech demos from the autumn Unite talks were 10 times slower. Am I missing something?
     
  7. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    4,715
    It's not running at 1100 fps because it's heavily GPU bound at this point...

    The simulation speed has indeed gotten 10x faster.

    This is mostly Mike Acton optimizing the code:
    https://twitter.com/mike_acton
     
    hippocoder likes this.
  8. Ethan_VisualVocal

    Ethan_VisualVocal

    Joined:
    Mar 23, 2016
    Posts:
    143
    Wouldn't going with CoreCLR leave Unity without a unified platform-support solution that included iOS? (see: https://github.com/dotnet/coreclr/issues/14950)
     
  9. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    @Creepgin You are welcome.

    I'll add tests for more platforms when I have time for it. I especially want to see how the Zing and Immix GCs perform. By the way, Haskell/GHC can't pass this test on my machine due to a stack overflow error. The results of the test in Unity: light mode - 4 milliseconds and heavy mode - 1,11 minute.

    @Ethan_VisualVocal For console and mobile platforms Unity has IL2CPP which, as far as I know, is preferable to Mono, and it is the only scripting backend available when building for iOS. I'm not familiar with these platforms, and I have almost no experience with them. Therefore, I have no answers to questions in this area.
     
    Last edited: Mar 17, 2018
  10. Creepgin

    Creepgin

    Joined:
    Dec 14, 2010
    Posts:
    297
    @nxrighthere Nice!

    I tested both mono and dotnet using the C# file on my 2015 MacBook Pro.

    Mono Light mode: 1ms to 2ms (most of the time 1ms)
    Mono Heavy mode: ~10s
    (sgen or boehm didn't make much difference)

    Dotnet Light mode: 0ms to 1ms (most of the time 1ms)
    Dotnet Heavy mode: ~20s

    Also for fun, I tested the js file. Light mode 5ms to 6ms. Heavy mode ~22s. (Node v9.1.0)
     
  11. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    Did you check which GC you are running on using the mono --version command? How did you switch them? The difference should be significant.
    I guess it's Workstation GC? If yes, then what about Server GC?
     
    Last edited: Mar 15, 2018
  12. Creepgin

    Creepgin

    Joined:
    Dec 14, 2010
    Posts:
    297

    mono --version
    Mono JIT compiler version 5.8.0.129 (2017-10/835a98d8559 Fri Feb 23 21:09:37 EST 2018)
    Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
    TLS: normal
    SIGSEGV: altstack
    Notification: kqueue
    Architecture: amd64
    Disabled: none
    Misc: softdebug
    LLVM: yes(3.6.0svn-mono-master/8b1520c8aae)
    GC: sgen (concurrent by default)


    Switching using --gc=sgen and --gc=boehm.

    For dotnet, I'm not very familiar with the server gc. How do you enable it? Is the following enough (in the .csproj)?


    <PropertyGroup>
    <ServerGarbageCollection>true</ServerGarbageCollection>
    <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>
    <RetainVMGarbageCollection>true</RetainVMGarbageCollection>
    <ThreadPoolMinThreads>4</ThreadPoolMinThreads>
    <ThreadPoolMaxThreads>25</ThreadPoolMaxThreads>
    </PropertyGroup>

    It's not doing much on my end (still ~20s on heavy). So I may be missing something here.

    Update:
    Also tested mono 5.10 on my PC. Light mode is always 0ms. Heavy is ~8.7s. Again sgen vs boehm didn't differ much (only <30ms diff in heavy mode).

    .Net Core test on my PC is 0ms light and ~17s heavy. Tried different ways of turning on server gc, but I guess no luck so far. My app.config is below, but GCSettings.IsServerGC is always false.



    <?xml version="1.0"?>
    <configuration>
    <runtime>
    <gcServer enabled="true" />
    </runtime>
    </configuration>
     
    Last edited: Mar 15, 2018
  13. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    This command is not actually working, and if you take a look at the log you will find this:
    Error: --gc option not supported on this platform.
    So, you are running the tests always with SGen, and that's why there's no difference in the results.

    I've just compiled Mono with libmono instead of libmonosgen. I think this is the only way to switch them, unfortunately.

    You can simply run the test in Unity:
    Code (csharp):
    using UnityEngine;
    using System;
    using System.Diagnostics;

    public class GCLX : MonoBehaviour {
        // Toggle in the Inspector: false = light mode, true = heavy mode.
        public bool heavy = false;
        private int windowSize = 0;
        private int messageSize = 0;
        private int messageCount = 0;

        // Allocates a new message and fills every byte with the (truncated) id.
        private byte[] CreateMessage(int id) {
            byte[] data = new byte[messageSize];

            for (int i = 0; i < data.Length; i++) {
                data[i] = (byte)id;
            }

            return data;
        }

        // Overwrites the oldest slot in the window, turning the previous message into garbage.
        private void PushMessage(byte[][] array, int id) {
            array[id % windowSize] = CreateMessage(id);
        }

        private void Start() {
            if (!heavy) {
                windowSize = 100;
                messageSize = 128;
                messageCount = 5000;
            } else {
                windowSize = 200000;
                messageSize = 1024;
                messageCount = 10000000;
            }

            byte[][] array = new byte[windowSize][];

            print("Measuring GC latency...");

            Stopwatch elapsedTime = Stopwatch.StartNew();

            for (int i = 0; i < messageCount; i++) {
                PushMessage(array, i);
            }

            elapsedTime.Stop();

            print("Done!");

            // Same output format as before, split into parts for readability.
            TimeSpan elapsed = elapsedTime.Elapsed;
            string minutes = elapsed.Minutes > 0 ? elapsed.Minutes.ToString("0") + " m, " : "";
            string seconds = elapsed.Seconds > 0 ? elapsed.Seconds.ToString("0") + "," : "";
            string unit = elapsed.Seconds > 0 ? " s" : " ms";

            print("Elapsed time: " + minutes + seconds + elapsed.Milliseconds.ToString("0") + unit);
        }
    }

    Yea, I had the same issue a while ago with .NET Framework. It's not documented anywhere, so after some trials, I found that the configuration file must have exactly the same name as the application itself, including the extension, plus .config. For example, if you have Standalone.exe, the configuration file must be named Standalone.exe.config, and only then will it work.

    The content of the configuration file should be like this:
    Code (xml):
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <startup>
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
      </startup>
      <runtime>
        <gcServer enabled="true"/>
        <gcConcurrent enabled="true"/>
      </runtime>
    </configuration>
    With .NET Core, all you need to do is add the following options to the csproj:
    Code (csproj):
    <Project Sdk="Microsoft.NET.Sdk">
      <PropertyGroup>
        <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>
        <ServerGarbageCollection>true</ServerGarbageCollection>
      </PropertyGroup>
    </Project>
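
    To check whether the runtime actually picked up a server GC setting, one option is to print the GCSettings values at startup. This is only a minimal sketch: the class name GcModeReport and the Describe helper are made up for illustration, while GCSettings.IsServerGC and GCSettings.LatencyMode are the standard properties from System.Runtime.

```csharp
using System;
using System.Runtime;

public static class GcModeReport
{
    // Builds a one-line summary of the GC configuration the runtime is using.
    // GcModeReport and Describe are hypothetical names, not part of any library.
    public static string Describe()
    {
        return "Server GC: " + GCSettings.IsServerGC +
               ", latency mode: " + GCSettings.LatencyMode;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

    If the line still reports "Server GC: False", the configuration was probably not loaded, which would explain unchanged timings.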
     
    Last edited: Mar 15, 2018
    Creepgin likes this.
  14. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    4,715
    As far as I can see this benchmark doesn't really measure what is relevant.

    1) Allocating a bunch of byte[] arrays that are only referenced on the stack is not what game code actually looks like in practice. You have references to other entities. You have classes holding on to containers, etc.

    2) It's not unusual to have 100-200 MB of total GC memory when you are going all OO style on game code. The objects are in different generations: lots of temporary memory allocated every frame, a bunch of memory that was allocated at game startup, and a bunch of memory that lives for as long as a streamable scene is loaded. In actual game code many of these GC objects reference each other.

    3) I am not sure measuring the total time gives you relevant results. What's important is to measure the maximum additional time spent on garbage collection when it actually runs.

    This is what makes games different from server workloads. Game developers generally have one goal: hit the framerate consistently (every frame, without exception). 30FPS, 60FPS, 90FPS.

    The average cost of GC every frame is completely irrelevant. The only thing that matters is the maximum time spent on garbage collection in any single frame. Because in practice, if you set yourself a goal of hitting a consistent framerate, that number is exactly what you have to subtract from other systems and reserve "just in case garbage collection runs".
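
    As a rough sketch of what measuring the maximum instead of the total could look like, the loop below reuses the sliding-window allocation pattern from the benchmark but times every push separately and keeps the slowest one. PauseProbe and MeasureWorstPushMs are hypothetical names. The worst value also includes JIT warm-up and OS scheduling noise, so it is an upper bound on the GC pause, not a precise measurement of it.

```csharp
using System;
using System.Diagnostics;

public static class PauseProbe
{
    // Times each push individually and returns the slowest one in milliseconds.
    // Hypothetical helper; the parameters mirror the benchmark's window/message settings.
    public static double MeasureWorstPushMs(int windowSize, int messageSize, int messageCount)
    {
        byte[][] window = new byte[windowSize][];
        Stopwatch timer = new Stopwatch();
        double worstMs = 0.0;

        for (int i = 0; i < messageCount; i++)
        {
            timer.Restart();

            // Allocate and fill a message, replacing the oldest slot in the window.
            byte[] data = new byte[messageSize];
            for (int j = 0; j < data.Length; j++)
            {
                data[j] = (byte)i;
            }
            window[i % windowSize] = data;

            timer.Stop();

            double ms = timer.Elapsed.TotalMilliseconds;
            if (ms > worstMs)
            {
                worstMs = ms;
            }
        }

        GC.KeepAlive(window);
        return worstMs;
    }

    public static void Main()
    {
        // Light-mode parameters from the benchmark.
        Console.WriteLine("Worst push: " + MeasureWorstPushMs(100, 128, 5000) + " ms");
    }
}
```

    In a game, the comparable number would be the worst frame time against the frame budget (about 16.7 ms at 60FPS), not the sum over the whole run.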
     
    Ethan_VisualVocal and Creepgin like this.
  15. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    I mostly agree with you, and latency in such tests should be measured with something like PerfView. But as far as I know, Mono has no equivalent of ETW events. I would like to see an equivalent tracing system in it, but Mono itself is full of stubs where the functionality could be very useful, especially for those who care about performance.

    The methodology of this test matches what people do in Haskell, OCaml, Racket and so on. It's not perfect, but I can't find another method, unfortunately.
     
    Last edited: Mar 15, 2018
  16. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,396
    The problem with averages is they can be the same but hide massive differences in behavior. If you threw in a standard deviation it would start to provide at least some useful information.
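
    A small illustration of that point, with made-up frame times in milliseconds: both sample sets below average 4 ms, but only the standard deviation reveals that one of them contains a 13 ms spike. FrameStats and its methods are hypothetical names for this sketch.

```csharp
using System;

public static class FrameStats
{
    // Arithmetic mean of the samples.
    public static double Mean(double[] samples)
    {
        double sum = 0.0;
        foreach (double s in samples)
        {
            sum += s;
        }
        return sum / samples.Length;
    }

    // Population standard deviation of the samples.
    public static double StdDev(double[] samples)
    {
        double mean = Mean(samples);
        double sumSquares = 0.0;
        foreach (double s in samples)
        {
            sumSquares += (s - mean) * (s - mean);
        }
        return Math.Sqrt(sumSquares / samples.Length);
    }

    public static void Main()
    {
        // Invented numbers, not measurements from this thread.
        double[] steady = { 4.0, 4.0, 4.0, 4.0 };
        double[] spiky = { 1.0, 1.0, 1.0, 13.0 };

        Console.WriteLine("steady: mean " + Mean(steady) + ", stddev " + StdDev(steady));
        Console.WriteLine("spiky:  mean " + Mean(spiky) + ", stddev " + StdDev(spiky));
    }
}
```

    The steady set has a standard deviation of 0, the spiky one about 5.2, even though the averages are identical.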
     
  17. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,396
    If you want to drool, fire up VisualVM and look at the GC stats they have. You can sit there and watch graphs of every generation and see how the GC is working in relation to exactly what work your app is doing.

    In the end the only methodology that works is to profile a specific app, because it's all about what types of objects you are creating, i.e. short vs medium lifetime objects and how they are being promoted. Good tools to measure this stuff are invaluable; you start getting a better idea of how a specific GC works even if you lack direct control over it.
     
    nxrighthere likes this.
  18. nepoez

    nepoez

    Joined:
    Sep 11, 2011
    Posts:
    298
    I have an idea, probably a dumb one. But it would be cool if we could keep writing in OOP, and Unity would convert it at build time into DoD code. Since they are already converting our C# code to C++, why not add another layer of auto conversion? I'm probably just dreaming tho :(


     
  19. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,084
    That would be too unrealistic

    Not only that, but I also think OOP doesn't always scale very well. The ECS style of coding will probably result in much cleaner code and more robust architectures
     
  20. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    If OOP is done right, there are no issues with scalability. Most developers don't understand the problems of OO designs and, as a result, they write bad code (the Asset Store is a good example). Before jumping to DOD, they must learn and understand OO issues and how they can be solved. These are two things that collide with each other.
     
    Last edited: Mar 24, 2018
  21. angusmf

    angusmf

    Joined:
    Jan 19, 2015
    Posts:
    255
    A friend of mine and I are supposedly going to start developing a new API based on the MSF. The back end wouldn't be a Unity process, but will be written against .NET Standard so it can still run on Linux. I'm looking into using Microsoft Orleans for scalability. If anyone is interested and able to help with this project, lmk
     
  22. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    I would suggest you look at Proto.Actor instead of Orleans. It's a great Actor Model framework in my opinion. I'm currently using it in my scalable projects and I'm very happy.

    If you are going to work on an open-source project, I would like to help you.
     
    Last edited: Mar 24, 2018
    Ethan_VisualVocal likes this.
  23. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,396
    If you need any tips let me know, I've done a fair amount with Orleans for game servers. Main thing is async/await is really heavy and the built-in serialization sucks because it does a very expensive deep clone. To make Orleans suitable for realtime games, basically do the following.

    - Keep hot path stuff like position/location updates out of grains. I pass incoming hotpath messages directly from dotnetty to disruptor. Disruptor also works really well for passing messages from your networking layer to Orleans. Much more efficient and garbage friendly than using streams, which is the idiomatic approach to that.

    - Use protocol buffer serialization. It's built in; you just have to set up a couple of config lines to tell the system to use it.

    - Use Immutable everywhere for grain calls. Orleans serializes local grain calls. You want that in most apps, for realtime games you don't.
     
    Ethan_VisualVocal likes this.
  24. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,396
    Seen it, haven't used it, but I already know Akka so well that it would take something major for me to learn a new framework that similar.

    I wanted to give Orleans a serious try so I'm using it on my current game, which is a massive scale moba/mmo hybrid. I started out using idiomatic Orleans and ended up with a system that was creating garbage like crazy and eating up 30% of my CPU. That was for a zone with heavy combat going on.

    Once I moved the hotpath out of grains into something more suitable, I got down to 2% CPU utilization for the same load. Almost all of that difference came from, first, Orleans serialization and, second, async/await.
     
    Ethan_VisualVocal likes this.
  25. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    Yea, I've read your stuff about Akka and Orleans. I'm glad that you shared your experience, for me such information was very useful.

    By the way, here are the docs and examples which could be helpful in learning Proto.Actor.
     
    Ethan_VisualVocal likes this.
  26. angusmf

    angusmf

    Joined:
    Jan 19, 2015
    Posts:
    255
    Lol. Thanks guys. Posted to this thread unintentionally, but this seems to have worked out in my favor. Tips and assistance greatly appreciated. It will be an open source effort.
     
  27. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    @Joachim_Ante Is this a surprise for my tomorrow's birthday? :D And a new incremental compiler based on Roslyn...

    I can finally use Span, ArrayPool, Concurrent Collections, TPL Dataflow, and other neat stuff right in Unity without breaking my bones with back-porting. Thank you.

    I wonder, will hardware-based SIMD acceleration someday work in Unity for vector types from System.Numerics.Vectors?

    And then I can use it with the jemalloc.NET... And... Ah, I'm just dreaming!
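
    For anyone who hasn't used them yet, here is a rough sketch of how Span and ArrayPool could take the per-message allocation out of something like the benchmark's CreateMessage. It assumes a runtime where System.Buffers and Span<T> are available (.NET Core 2.1+, or the System.Memory package); PooledMessages and FillAndSum are hypothetical names, and the checksum only exists to give the buffer some work to do.

```csharp
using System;
using System.Buffers;

public static class PooledMessages
{
    // Rents a buffer from the shared pool instead of allocating a new array,
    // fills it with the (truncated) id, and returns the sum of its bytes.
    public static int FillAndSum(int id, int messageSize)
    {
        // Rent may hand back a larger array than requested.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(messageSize);
        try
        {
            // Work only on the slice that was actually asked for.
            Span<byte> message = buffer.AsSpan(0, messageSize);
            message.Fill((byte)id);

            int sum = 0;
            foreach (byte b in message)
            {
                sum += b;
            }
            return sum;
        }
        finally
        {
            // Hand the buffer back so the next caller can reuse it.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }

    public static void Main()
    {
        Console.WriteLine(FillAndSum(3, 128)); // 128 bytes of value 3 -> 384
    }
}
```

    Since the buffer goes back to the pool after each call, a steady stream of messages shouldn't produce a new garbage array per message the way new byte[messageSize] does.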
     
    Last edited: Mar 29, 2018
  28. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    Never mind, I just saw this today.
     
  29. nxrighthere

    nxrighthere

    Joined:
    Mar 2, 2014
    Posts:
    537
    Well, it seems that Mono guys woke up, thanks to @Aras for his experimental path tracer. Better late than never.
     
    Last edited: Apr 18, 2018