Question Editor serialization of large arrays

Discussion in 'Scripting' started by burningmime, Mar 18, 2023.

  1. burningmime

    I noticed in my YAML file that if I have a ScriptableObject (or other editor-serialized type) with a big array of a primitive type (float, byte, int, etc.), it'll be hex-encoded:

    Code (CSharp):
    indicesFace: 03000000020000000100000.....
    But if it's an array of Unity.Mathematics.float2, it'll be written verbosely:

    Code (CSharp):
    verticesFace:
      - x: 1
        y: 1
      - x: 0
        y: 1
      - x: 0
        y: 0.78497314
      - x: 0.0368042
        y: 0.78497314
      - x: 0.0869751
        y: 0.78497314
      - x: 0.5449219
        y: 0.78497314
    .......
    This bloats the YAML files, which makes them huge on disk. But, fine, I guess with enough compression, they won't end up much bigger than the binary representation (assuming the compression algorithm is good at picking up redundancy), so it's not a big deal for source control.

    However, I'm worried about load time. The editor is going to have to parse all those decimal strings, which is much slower than parsing the hex. But, fine, that's just the editor.

    However, I'm more worried about load times in the player. Will it use reflection or some other slow-ish mechanism to load each field of each struct instead of just blitting the bits? If you have a mesh with 30,000 vertices that suddenly becomes a (minor) concern. And load times are something I don't want to sacrifice.

    If I gotta pack all my vectors into float[] arrays, so be it. Just wondering if that's really the right way because it seems so pointless. It would make sense for complex structs, but for vectors and matrices, storing each field separately seems like a waste.

    EDIT: float[] arrays don't work either; they still get turned into decimal :-(. I really really don't want to do byte[], but I guess that's where we gotta go.
     
    Last edited: Mar 19, 2023
  2. Kurt-Dekker

    It's a good question... the other problem with decimal representations such as above is that they inherently lose accuracy going from base-2 to base-10 back and forth.

    You can actually use System.BitConverter to get the 4 bytes used in the IEEE single-precision representation and then store those 4 bytes however you like, and your floats will come back perfect-o.
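
    A minimal sketch of that round trip, assuming plain .NET with nothing Unity-specific. FloatBytes is a made-up helper name for illustration, not an existing API:

```csharp
using System;

// Hypothetical helper for illustration; not part of Unity or .NET.
public static class FloatBytes
{
    // Returns the 4 raw IEEE 754 bytes of a float, in the platform's byte order.
    public static byte[] Pack(float value) => BitConverter.GetBytes(value);

    // Rebuilds the float from the 4 bytes starting at the given offset.
    public static float Unpack(byte[] bytes, int offset = 0) => BitConverter.ToSingle(bytes, offset);
}
```

    For example, FloatBytes.Unpack(FloatBytes.Pack(0.78497314f)) gives back the same float, because the bits are copied rather than printed and re-parsed.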

    For a large number of things in the editor you might even wanna make your own custom inspector where you could do this packing, as well as improve the presentation and usability of the data.
     
    burningmime likes this.
  3. orionsyndrome

    Yes, as Kurt said, if you have issues with the speed of deserialization because your YAMLs are so huge, you ought to write your own deserialization algorithm. However, if you have this much data in your YAML, you're abusing the system anyway. Better to put custom binary data in its own separate files and stream it the old-school way.
     
    burningmime and Bunny83 like this.
  4. Bunny83

    Absolutely. The performance in a built game would probably not really matter, as the load time of the Unity engine as a whole is usually orders of magnitude larger than the loading of the serialized data. However, the performance of the editor would suffer when testing in the editor, since the Inspector would serialize your data every frame. So large serialized data arrays are never a good idea.
     
    orionsyndrome and burningmime like this.
  5. orionsyndrome

    It is relatively easy, btw, to roll your own serializer/deserializer that also zips data.

    I've made extremely fast bounded voxel containers with a fully featured editor. To make them responsive and scalable, I implemented the ISerializationCallbackReceiver interface and used System.IO.Compression.GZipStream. Then I built an entire editor around it, so that I can control the rate at which the callback receiver is actually called (which is only when the scene is opened or saved, and on play/stop).

    This project of mine was highly successful, and frankly it really wouldn't work otherwise, as the classic array approach would make my scene YAML enormous even for small containers. As you can imagine, these tend to grow cubically, yet the only interesting parts of them are the state changes between something and nothing. Nearly no change in performance (there is a hard upper bound on the chunk size anyway, because you're supposed to have multiple chunks for various runtime reasons), but the file is less than 1% of the raw size.

    So depending on what exactly you're doing, this can be a graceful and compatible solution, and truly fire & forget.

    Here's a working template
    Code (csharp):
    using System;
    using System.IO;
    using System.IO.Compression;
    using UnityEngine;

    [Serializable]
    public class Container : ISerializationCallbackReceiver {

      [NonSerialized] uint[] data;
      [SerializeField] Vector3Int size;
      [SerializeField] [HideInInspector] byte[] _compressed;

      public Container(Vector3Int size) {
        data = new uint[...];
        this.size = size;
      }

      // only data var is used for the actual API, but I've removed all accessors
      // and miscellaneous code for clarity; you don't expose _compressed

      // satisfies the interface directly
      void ISerializationCallbackReceiver.OnAfterDeserialize() {
        try {
          if(_compressed is null || _compressed.Length == 0) return;
          FromCompressedByteArray(_compressed);
        } catch(Exception ex) {
          Debug.LogError(ex.ToString());
          Debug.LogError("Couldn't deserialize data.");
        }
      }

      void ISerializationCallbackReceiver.OnBeforeSerialize()
        => SaveDataForSerialization();

      // I don't know why this is public, probably by mistake
      // or maybe I wrote some tests, can't remember
      public void FromCompressedByteArray(byte[] compressed) {
        byte[] streambuff = new byte[...]; // use size to infer the actual size of the buffer

        using(var ms = new MemoryStream(compressed)) {
          using(var gzs = new GZipStream(ms, CompressionMode.Decompress)) {
            // Stream.Read may return fewer bytes than requested,
            // so keep reading until the buffer is full or the stream ends
            int total = 0;
            while(total < streambuff.Length) {
              int read = gzs.Read(streambuff, total, streambuff.Length - total);
              if(read == 0) break;
              total += read;
            }
          }
        }

        data = new uint[...];

        // build data from streambuff bytes
        // my algorithm was a bit complicated because of the actual data structures involved
      }

      // similarly doesn't have to be public
      public void SaveDataForSerialization() {
    #if !UNITY_EDITOR
        if(Application.isPlaying) return;
    #endif

        if(data is null) {
          Debug.LogError("Data is null upon saving!");
          return;
        }

        try {
          _compressed = ToCompressedByteArray();

        } catch(Exception ex) {
          Debug.LogError(ex);

        }
      }

      public byte[] ToCompressedByteArray() {
        byte[] returnArray;

        // prepare your data for compression, I had to use shortbuff
        // to cut uints into bytes that are loaded into streambuff, ymmv
        byte[] streambuff = new byte[...];

        using(var ms = new MemoryStream()) {
          // the GZipStream has to be disposed before ms.ToArray()
          // so that the compressed data is fully flushed
          using(var gzs = new GZipStream(ms, CompressionMode.Compress)) {
            gzs.Write(streambuff, 0, streambuff.Length);
          }

          // this doesn't matter much if the result is highly compressed
          // besides, serialization is only for the editor
          returnArray = ms.ToArray();
        }

        return returnArray;
      }

    }
    This code is a custom variant of multiple examples I've found on github. I've experimented with some other stuff as well, but this ended up in my final code, and I'm happy to continue using this approach, because it's gloriously simple.
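
    The template leaves the buffer sizes and the uint-to-byte packing elided. As a rough, self-contained sketch of just the compress/decompress round trip, assuming a flat uint[] whose element count is known on load (UIntGzip and its method names are hypothetical, and the packing here is a plain Buffer.BlockCopy rather than the original's custom scheme):

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; names are not from the original post.
public static class UIntGzip
{
    public static byte[] Compress(uint[] data)
    {
        // Copy the uints into a flat byte buffer (platform byte order).
        byte[] raw = new byte[data.Length * sizeof(uint)];
        Buffer.BlockCopy(data, 0, raw, 0, raw.Length);

        using (var ms = new MemoryStream())
        {
            // Dispose the GZipStream before ToArray so all output is flushed.
            using (var gzs = new GZipStream(ms, CompressionMode.Compress))
                gzs.Write(raw, 0, raw.Length);
            return ms.ToArray();
        }
    }

    public static uint[] Decompress(byte[] compressed, int elementCount)
    {
        byte[] raw = new byte[elementCount * sizeof(uint)];
        using (var ms = new MemoryStream(compressed))
        using (var gzs = new GZipStream(ms, CompressionMode.Decompress))
        {
            // Read may return fewer bytes than requested, so loop until full.
            int total = 0;
            while (total < raw.Length)
            {
                int n = gzs.Read(raw, total, raw.Length - total);
                if (n == 0) throw new InvalidDataException("Compressed data ended early.");
                total += n;
            }
        }

        var data = new uint[elementCount];
        Buffer.BlockCopy(raw, 0, data, 0, raw.Length);
        return data;
    }
}
```

    In a container like the one above, the element count would presumably come from the serialized size field, so only the compressed bytes and the size need to be stored.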
     
    Last edited: Mar 19, 2023
    SisusCo and Bunny83 like this.
  6. Bunny83

    Yes, that's also an option. Though compression may not be necessary depending on the exact use case. In your case, of course, it totally makes sense. I once made this SerializableType, which is a System.Type wrapper that can serialize any System.Type (even the most complex generic types) as a byte stream. It does not compress anything, since most types would be just a few hundred bytes. Though you could implement compression as well. It's just a matter of an additional using line :)
     
    orionsyndrome likes this.
  7. burningmime

    All right, I put it into a byte array. Which trades GC pressure for serialization performance, so it's not a perfect solution. Unity only targets little-endian platforms, right? *waits for Sony to announce the PS6 uses z/Architecture*

    Code (CSharp):
    using System;
    using Unity.Collections.LowLevel.Unsafe;
    using Unity.Mathematics;
    using UnityEngine;

    [Serializable]
    public struct HoleGeometrySection : ISerializationCallbackReceiver
    {
        [NonSerialized] public float2[] vertices;
        public int[] indices;
        [SerializeField, HideInInspector] private byte[] _vertexData;

        // Copies the raw bytes of the vertices into _vertexData before Unity serializes it.
        unsafe void ISerializationCallbackReceiver.OnBeforeSerialize()
        {
            if(vertices != null && vertices.Length > 0)
            {
                long byteLength = sizeof(float2) * vertices.Length;
                if(_vertexData == null || _vertexData.Length != byteLength)
                    _vertexData = new byte[byteLength];
                fixed(void* pSrc = vertices)
                fixed(void* pDst = _vertexData)
                    UnsafeUtility.MemCpy(pDst, pSrc, byteLength);
            }
            else
            {
                _vertexData = Array.Empty<byte>();
            }
        }

        // Rebuilds the vertices from the raw bytes after Unity deserializes _vertexData.
        unsafe void ISerializationCallbackReceiver.OnAfterDeserialize()
        {
            if(_vertexData != null && _vertexData.Length >= sizeof(float2))
            {
                int vertexCount = _vertexData.Length / sizeof(float2);
                if(vertices == null || vertices.Length != vertexCount)
                    vertices = new float2[vertexCount];
                long byteLength = sizeof(float2) * vertexCount;
                fixed(void* pSrc = _vertexData)
                fixed(void* pDst = vertices)
                    UnsafeUtility.MemCpy(pDst, pSrc, byteLength);
            }
            else
            {
                vertices = Array.Empty<float2>();
            }

            // Reduce memory usage in the built player, and assume it'll never need to be re-serialized. In the editor,
            // keep the temp array around to avoid GC pressure.
            if(!Application.isEditor)
                _vertexData = null;
        }
    }
    Although Unity itself serializes things as big text files with decimal strings (e.g. save a Mesh as a .asset file, or see .animations), so maybe it's not that bad. I can't really notice/measure performance witho
     
    Last edited: Mar 19, 2023
  8. Kurt-Dekker

    While I certainly respect and appreciate your enthusiasm to dust off a good old memcpy(3) and get to ransacking some memory, I think you can bypass all your endian concerns with System.BitConverter, because it actually has the read-only .IsLittleEndian field you can check.
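
    One way that check could be used, sketched under the assumption that the stored bytes should always be little-endian regardless of platform. LittleEndianFloats is a hypothetical name, and this version favors clarity over speed (GetBytes allocates a small array per float):

```csharp
using System;

// Hypothetical helper for illustration; not an existing Unity or .NET API.
public static class LittleEndianFloats
{
    // Packs floats into bytes that are little-endian on every platform.
    public static byte[] Pack(float[] values)
    {
        var bytes = new byte[values.Length * sizeof(float)];
        for (int i = 0; i < values.Length; i++)
        {
            byte[] chunk = BitConverter.GetBytes(values[i]);
            // On a big-endian machine, flip to little-endian before storing.
            if (!BitConverter.IsLittleEndian) Array.Reverse(chunk);
            Buffer.BlockCopy(chunk, 0, bytes, i * sizeof(float), sizeof(float));
        }
        return bytes;
    }

    // Reads little-endian bytes back into floats on any platform.
    public static float[] Unpack(byte[] bytes)
    {
        var values = new float[bytes.Length / sizeof(float)];
        var chunk = new byte[sizeof(float)];
        for (int i = 0; i < values.Length; i++)
        {
            Buffer.BlockCopy(bytes, i * sizeof(float), chunk, 0, sizeof(float));
            // Flip back to the machine's byte order if it is big-endian.
            if (!BitConverter.IsLittleEndian) Array.Reverse(chunk);
            values[i] = BitConverter.ToSingle(chunk, 0);
        }
        return values;
    }
}
```

    With this, 1.0f is always stored as the bytes 00 00 80 3F, so data written on one platform reads back correctly on another.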

    Also, not sure if your solution would work on WebGL, and perhaps not even on the mobile AOT targets.

    But hey, your data, your project! Nice solution. I wouldn'ta reached for unsafe; I've actually never been unsafe in C#, except to call interop into C/C++ endpoints... that's where I do my most spectacular memory damage actually.
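
    For anyone who wants the bulk copy without the unsafe keyword, here is a sketch using Span and MemoryMarshal. It assumes a runtime where System.Memory is available (for example a .NET Standard 2.1 API profile); whether it behaves on WebGL or AOT targets is not something this thread tested. Float2 is a stand-in for Unity.Mathematics.float2 so the snippet compiles outside Unity, and Float2Bytes is a hypothetical name:

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-in for Unity.Mathematics.float2 so the sketch compiles without Unity.
[StructLayout(LayoutKind.Sequential)]
public struct Float2
{
    public float x, y;
    public Float2(float x, float y) { this.x = x; this.y = y; }
}

// Hypothetical helper for illustration; not an existing Unity API.
public static class Float2Bytes
{
    // Reinterprets the struct array as bytes and copies them into a new byte[].
    public static byte[] Pack(Float2[] vertices)
        => MemoryMarshal.AsBytes(vertices.AsSpan()).ToArray();

    // Reinterprets the bytes as Float2 values and copies them into a new array.
    // Trailing bytes that do not fill a whole Float2 are ignored.
    public static Float2[] Unpack(byte[] bytes)
        => MemoryMarshal.Cast<byte, Float2>(bytes.AsSpan()).ToArray();
}
```

    Like the memcpy version, this copies bytes in the platform's byte order, so the endian question stays the same.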
     
    orionsyndrome likes this.
  9. burningmime


    Just tested and it does work in WebGL


    I'd *hope* BitConverter in a loop would JIT compile down to be as fast as memcpy, but I don't really trust Mono ;-P.
     
    Last edited: Mar 19, 2023