Questions about packet compression (Huffman Stream)

Discussion in 'FPS.Sample Game' started by Kichang-Kim, Nov 1, 2018.

  1. Kichang-Kim

    Joined:
    Oct 19, 2010
    Posts:
    1,011
    Hi, I have been looking through FPSSample. Thanks for this great sample.

    I have some questions about packet compression, especially HuffmanInputStream / HuffmanOutputStream.

    My basic understanding of Huffman encoding is as follows (a rough code sketch is included after the list):
    1. Calculate the symbol probability distribution of the input data
    2. Construct a binary tree and generate a symbol-code table from it
    3. Encode the input data using the symbol-code table
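
    For concreteness, here is a minimal standalone C# sketch of those three steps. It is my own toy illustration, not the FPSSample code, and all identifiers in it are made up:

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    static class HuffmanSketch
    {
        class Node { public byte Symbol; public int Count; public Node Left, Right; }

        // Returns a symbol -> bit-string table built from the input's own statistics.
        public static Dictionary<byte, string> BuildCodeTable(byte[] input)
        {
            // 1. Symbol probability distribution (raw counts are enough here).
            var nodes = input.GroupBy(b => b)
                             .Select(g => new Node { Symbol = g.Key, Count = g.Count() })
                             .ToList();

            // 2. Build the binary tree by repeatedly merging the two rarest nodes.
            while (nodes.Count > 1)
            {
                nodes.Sort((a, b) => a.Count - b.Count);
                var merged = new Node { Count = nodes[0].Count + nodes[1].Count,
                                        Left = nodes[0], Right = nodes[1] };
                nodes.RemoveRange(0, 2);
                nodes.Add(merged);
            }

            // 3. Walk the tree to produce the symbol-code table.
            var table = new Dictionary<byte, string>();
            void Walk(Node n, string code)
            {
                if (n.Left == null) { table[n.Symbol] = code.Length > 0 ? code : "0"; return; }
                Walk(n.Left, code + "0");
                Walk(n.Right, code + "1");
            }
            if (nodes.Count == 1) Walk(nodes[0], "");
            return table;
        }
    }
    ```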

    And here are my questions:
    1. Is it true that NetworkCompressionModel uses a pre-defined symbol-code table? It seems there is no code for calculating the table from the actual data.
    2. What are the roles of the following?
    NetworkCompressionConstants.k_BucketOffsets
    NetworkCompressionConstants.k_BucketSizes
    "int context" argument of every read/write method

    Thanks.
     
  2. stubbesaurus

    Unity Technologies

    Joined:
    Oct 7, 2016
    Posts:
    3
    Hi,

    Things got a little hectic up to Unite, but we definitely want to revisit this code and document it.

    It seems your basic understanding of Huffman is correct. Symbols are coded using a variable number of bits based on their expected frequency, to minimize the expected number of bits that need to be sent.

    1. The default is to use a pre-defined symbol-code table, but there is functionality to generate a model from statistics. NetworkCompressionCapture is a helper class that you can pipe network traffic into and then call AnalyzeAndGenerateModel on to obtain a serialized Huffman model, which you can use to construct a NetworkCompressionModel specifically for that type of traffic.
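
    Roughly, that workflow fits together like this. The constructor arguments and return types below are my assumptions based on the description above, so check the NetworkCompression sources in the sample for the real signatures:

    ```csharp
    // Assumed shape of the capture -> model workflow described above.
    var capture = new NetworkCompressionCapture(/* context count / baseline model: see the sample */);

    // ... pipe representative network traffic into `capture` while the game runs ...

    byte[] modelData = capture.AnalyzeAndGenerateModel();      // serialized Huffman model
    var tunedModel = new NetworkCompressionModel(modelData);   // model tuned for that traffic
    ```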

    This functionality is also exposed through the console: 'beginnetworkprofile' starts a recording session, 'endnetworkprofile myfile' ends the recording session and stores the model to a file, and finally 'loadcompressionmodel myfile' can be used on the server to load and use a model. Clients connecting after the model has been loaded will receive and use it as they connect.

    2. When you send something like a 32-bit integer over the network, it is impractical to create a Huffman tree that encompasses every possible value the integer can take, as it would require a tree with 2^32 leaves! To make this more practical we lump values into a manageable number of power-of-two-sized buckets, Huffman-code only the bucket index, and code the position within the bucket using a number of raw bits corresponding to the size of the bucket. The buckets are arranged so they are small around 0 and become progressively larger as you move away from zero. The rationale is that since most data consists of deltas against predictions, we expect values to be small, and we expect most of the redundancy to be in the size of the error rather than in exactly which value of that size we end up hitting.

    NetworkCompressionConstants.k_BucketSizes specifies the sizes of the buckets in bits, so a bucket of n bits has 2^n values.
    NetworkCompressionConstants.k_BucketOffsets specifies the starting position of each bucket.

    Bucket n starts at k_BucketOffsets[n] and ends at k_BucketOffsets[n] + (1 << k_BucketSizes[n]).
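
    To make that concrete, here is a small illustrative C# sketch of the split. The tiny tables are made up for the example; the real ones live in NetworkCompressionConstants:

    ```csharp
    static class BucketSketch
    {
        // Made-up example tables: bucket n covers
        // [k_BucketOffsets[n], k_BucketOffsets[n] + (1 << k_BucketSizes[n])).
        static readonly int[]  k_BucketSizes   = { 0, 1, 2, 4, 8 };   // raw bits per bucket
        static readonly uint[] k_BucketOffsets = { 0, 1, 3, 7, 23 };  // start of each bucket

        // Split a value into the part that gets Huffman-coded (the bucket index)
        // and the part that is written as raw bits (the offset inside the bucket).
        public static void SplitValue(uint value, out int bucket, out uint rawBits)
        {
            bucket = k_BucketSizes.Length - 1;
            for (int i = 0; i < k_BucketSizes.Length; i++)
            {
                if (value < k_BucketOffsets[i] + (1u << k_BucketSizes[i]))
                {
                    bucket = i;   // first bucket whose range still contains the value
                    break;
                }
            }
            rawBits = value - k_BucketOffsets[bucket];   // written with k_BucketSizes[bucket] raw bits
        }
    }
    ```

    With these example tables, the value 10 falls into bucket 3 (range 7..22), so it is sent as the Huffman code for "bucket 3" plus 4 raw bits encoding the offset 3.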

    The context can be thought of as a submodel that has its own statistics and uses its own Huffman tree. The context used to read and write a specific value must always be the same. The benefit of using multiple contexts is that it allows you to separate the statistics of things that have different expected distributions, which leads to more precise statistics, which in turn yields better compression. For example, the FPS Sample assigns unique contexts to all fields of all property sheets. Using more contexts does, however, come at the marginal cost of a slightly larger model.
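
    To illustrate the symmetry rule, usage looks something like the sketch below. The packed read/write method names and the context constants are just illustrative here, not necessarily the exact FPSSample API:

    ```csharp
    const int k_ContextHealth   = 10;   // one context per field keeps its statistics separate
    const int k_ContextPosition = 11;

    void WriteState(HuffmanOutputStream output, uint health, uint posX)
    {
        output.WritePackedUInt(health, k_ContextHealth);
        output.WritePackedUInt(posX,   k_ContextPosition);
    }

    void ReadState(HuffmanInputStream input, out uint health, out uint posX)
    {
        // Must read in the same order, with the same contexts, as the write side.
        health = input.ReadPackedUInt(k_ContextHealth);
        posX   = input.ReadPackedUInt(k_ContextPosition);
    }
    ```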

    I hope this is helpful.
     
  3. Kichang-Kim

    Joined:
    Oct 19, 2010
    Posts:
    1,011
    Thanks for the detailed explanation!
     
  4. CPlusSharp22

    Joined:
    Dec 1, 2012
    Posts:
    111
    Hello @stubbesaurus

    Do you have more information on this? Especially on the AnalyzeAndGenerateModel details. It's largely undocumented and somewhat of an enigma to me. Is there a resource these models were derived from, like a paper? I'm familiar with Huffman encoding, but not with these buckets or the way contexts are used.

    In the FPS Sample, there's `maxFieldsPerSchema = 128`, which contributes to the `maxcontexts` of the whole compression model. If I increase this to 256, it creates an order-of-magnitude bigger max contexts and many more buckets. I'm curious what this is actually doing and how I can perhaps use the profiling functions to improve it.

    In short, the reason I'm doing this is that I've added the ability to use arrays in the fields of a schema. However, each entry of an array is technically its own "field", which means a 1:1 context-to-field ratio. I'm thinking this is the wrong direction to take, and I may end up condensing the arrays to use only one context.