
How to calculate rough file size of custom class?

Discussion in 'Scripting' started by leegod, Jun 3, 2021.

  1. leegod

    leegod

    Joined:
    May 5, 2010
    Posts:
    2,476
    I want to make a save function,

    and I'm thinking of using BinaryFormatter.

    Code (CSharp):
    byte[] data;
    using (MemoryStream stream = new MemoryStream(saveDataSize))
    {
        var binaryFormatter = new BinaryFormatter();
        binaryFormatter.Serialize(stream, MyCustomSaveClass);
        stream.Close();
        data = stream.GetBuffer();
    }
    MyCustomSaveClass is a collection of other classes, ints, strings, Lists, etc.

    Code (CSharp):
    public class MyCustomSaveClass {
        public int a;
        public string b;
        public OtherClass1 oc1;
        public List<OtherClass2> list = new List<OtherClass2>();
    }
    For that, I need to know the rough file size of a saved MyCustomSaveClass instance.

    How can I calculate it? A rough size (like ? MB) is OK.
     
  2. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,531
    I'd argue the complexity of determining that is not worth the effort. Instead... just serialize it to a memory stream, get its length, and throw out the stream. That's a way faster/easier way of accomplishing it, and it would be perfectly accurate. It would also stay accurate if you swapped the serialization out for JSON or another format.
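
    A minimal sketch of that measure-then-discard approach. `SaveSizeProbe` and `Measure` are hypothetical names, not from the thread; the serializer is passed in as a callback so the same helper works for any format, assuming the serializer leaves the stream open when it finishes.

```csharp
using System;
using System.IO;

public static class SaveSizeProbe
{
    // Runs any serializer against a throwaway MemoryStream and reports
    // how many bytes it wrote. The stream grows on demand, so no size
    // has to be guessed up front.
    public static long Measure(Action<Stream> serialize)
    {
        using (var stream = new MemoryStream())
        {
            serialize(stream);
            return stream.Length; // bytes written, not buffer capacity
        }
    }
}
```

    For the save class in the question this might be called as `SaveSizeProbe.Measure(s => new BinaryFormatter().Serialize(s, saveInstance))`, where `saveInstance` is assumed to be an instance of MyCustomSaveClass.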

    Though looking at your code you say:
    Code (csharp):
    using (MemoryStream stream = new MemoryStream(saveDataSize))
    Which implies to me you're attempting to instantiate a memory stream with the appropriate size beforehand... meaning my suggestion technically doesn't work, since my suggestion relies on creating the memory stream in the first place.

    But... why are you trying to determine the size beforehand?

    Just create a MemoryStream with the 0 parameter constructor. It works just fine.

    If this is some weird attempt at avoiding garbage collection... well I don't necessarily understand what the fuss is all about. Note that the GetBuffer array may be larger than the data you want... and what are you doing with that array? Are you writing it to disk since that's generally where you write save files... in which case... use a FileStream!
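
    To illustrate the GetBuffer point, here is a small sketch comparing the internal buffer with the bytes actually written. `BufferVsLength` and `Inspect` are hypothetical names used only for this illustration.

```csharp
using System.IO;

public static class BufferVsLength
{
    // Writes payload into a stream created with the given capacity and
    // reports the three sizes that are easy to mix up.
    public static (int bufferLength, long written, int copyLength) Inspect(int capacity, byte[] payload)
    {
        using (var stream = new MemoryStream(capacity))
        {
            stream.Write(payload, 0, payload.Length);
            byte[] raw = stream.GetBuffer(); // whole internal buffer, including the unused tail
            byte[] exact = stream.ToArray(); // copy trimmed to stream.Length
            return (raw.Length, stream.Length, exact.Length);
        }
    }
}
```

    With a capacity of 1024 and a 10-byte payload, the raw buffer is 1024 bytes long while `Length` and the `ToArray` copy are both 10, so writing the raw buffer to disk would store 1014 bytes of padding.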

    ...

    How about, instead of asking how to accomplish some bizarro task like this in an attempt to solve a problem you haven't told us about, you explain the problem you're attempting to solve, and maybe we can help come up with a method of accomplishing it that works best for your needs.

    Though I will say I have a sneaking suspicion you're chasing some "zero gc" perfect scenario that is well... pointless. In the end you're creating a stream of some sort which has a buffer, which means a byte array of SOME size exists. So in the end I point back to the FileStream, because at least then the buffer array can be compact and just writes to disk. Note a smaller buffer means more disk latency as it requires more frequent writes to said disk... but hey, you have to pay the performance price somewhere. Which is why I suggest that this suspected "zero gc" scenario is a futile goal.

    ...

    But I mean if you really... really really really truly are trying to avoid garbage and want to get this perfect scenario so that you can serialize your game state frequently with no GC hits. OK... create a MemoryStream at game start with a very large buffer size... something larger than you'll ever fill up. Then... recycle the MemoryStream every time you need to serialize your game state, clearing it before use every time. Done.

    I mean that's generally what avoiding GC boils down to. You have 2 options. 1) reduce the amount of memory you have to allocate or 2) recycle the memory you have to allocate.
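
    A rough sketch of the recycling option, assuming the capacity passed in is larger than any save will ever need (if it is not, the stream simply reallocates a bigger buffer). `ReusableSaveBuffer`, `Fill` and `RawBuffer` are hypothetical names.

```csharp
using System;
using System.IO;

public sealed class ReusableSaveBuffer : IDisposable
{
    private readonly MemoryStream stream;

    public ReusableSaveBuffer(int capacity)
    {
        stream = new MemoryStream(capacity); // allocated once, e.g. at game start
    }

    // Clears the previous contents, runs the serializer, returns bytes written.
    public int Fill(Action<Stream> serialize)
    {
        stream.SetLength(0); // resets Length and Position, keeps the allocated buffer
        serialize(stream);
        return (int)stream.Length;
    }

    // Internal buffer; only the first N bytes (N = last Fill result) are valid.
    public byte[] RawBuffer => stream.GetBuffer();

    public void Dispose() => stream.Dispose();
}
```

    Each call to `Fill` overwrites the same array, so repeated saves allocate no new buffer as long as the data stays within the initial capacity.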
     
    Last edited: Jun 3, 2021
  3. leegod

    leegod

    Joined:
    May 5, 2010
    Posts:
    2,476
    @lordofduct Actually, I don't know much about GC, and don't want to touch it...

    My purpose is this: the Nintendo Switch requires the save file size to be decided up front, when the game is made.
    So I have to enter [Save Data Size, Save Data Journal Size] in the Player publishing settings,

    and in the code too: private const int saveDataSize = 10485760;

    So once I know the actual size my save class needs (file size, memory size, whatever),
    I would put a value slightly larger than that into the Player publishing settings and the code.

    So you mean
    I should use this?

    https://docs.microsoft.com/ko-kr/dotnet/api/system.io.memorystream.length?view=net-5.0
     
    Last edited: Jun 3, 2021
  4. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    3,993
    Stay away from the BinaryFormatter. Please read the official security guide that Microsoft has put up. The BinaryFormatter has so many issues. Besides the general security implications the format is very verbose. The BinaryFormatter uses the remoting protocol which encodes the whole structure of the classes you serialize. So you can even reconstruct the whole class / types including the exact field types and names.

    Instead, if you really want to use a binary format, just roll your own by using the BinaryWriter / BinaryReader. Those can only write / read primitive types, but since your data seems pretty straightforward this should be trivial. We don't know what "OtherClass1" and "OtherClass2" actually contain, so that's the biggest question here.

    Rolling your own format has several advantages. First given a certain dataset you can calculate exactly how large the file gets. In case of version changes it's generally easier to deal with old save files (though it still has to be handled manually). Finally the overall file size is always smaller than what the BinaryFormatter produces.
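
    As a sketch of what such a hand-rolled format could look like: the fields `a` and `b` come from the question, while `SimpleSave`, `SimpleSaveIO`, the version number and the `List<int>` (a stand-in for the unknown `List<OtherClass2>`) are assumptions made for illustration only.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public class SimpleSave
{
    public int a;
    public string b = "";
    public List<int> values = new List<int>(); // stand-in for List<OtherClass2>
}

public static class SimpleSaveIO
{
    public static byte[] Write(SimpleSave save)
    {
        using (var stream = new MemoryStream())
        {
            // leaveOpen: true so the stream can still be read after the writer is disposed
            using (var writer = new BinaryWriter(stream, Encoding.UTF8, true))
            {
                writer.Write(1);                 // format version, 4 bytes
                writer.Write(save.a);            // 4 bytes
                writer.Write(save.b);            // length prefix + UTF-8 bytes
                writer.Write(save.values.Count); // 4 bytes
                foreach (int v in save.values) writer.Write(v); // 4 bytes each
            }
            return stream.ToArray();
        }
    }

    public static SimpleSave Read(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8))
        {
            int version = reader.ReadInt32();
            if (version != 1) throw new InvalidDataException("Unknown save version " + version);
            var save = new SimpleSave();
            save.a = reader.ReadInt32();
            save.b = reader.ReadString();
            int count = reader.ReadInt32();
            for (int i = 0; i < count; i++) save.values.Add(reader.ReadInt32());
            return save;
        }
    }
}
```

    Because every field has a known width, the size is predictable: with `b = "hi"` and three list entries the output is 4 + 4 + (1 + 2) + 4 + 3 × 4 = 27 bytes.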

    Note that you said you want to know the "rough" file size of a save file. However your data contains a dynamic List of other data. So there is no way to tell how large this "may" get without knowing what the limits are on the number of elements in those dynamic structures. Just for example, assume that an instance of your "OtherClass2" has a fixed size, say 200 bytes. If your save game contains 100 instances of that class, that means those instances alone require 20KB. However if a save game only contains 20 of those it's just 4KB. On the other hand if you have 10000 instances it requires 2MB. So if those dynamic elements can vary without bounds, there is no bound to the file size.
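
    The arithmetic above can be turned into a quick budget check. This is only a sketch: `SaveBudget` and `WorstCaseBytes` are hypothetical names, and the `fixedBytes` parameter (everything in the save that is not part of the list) is an assumption added for illustration.

```csharp
public static class SaveBudget
{
    // Upper bound for a save made of a fixed-size part plus a list
    // whose element count is capped at maxItems.
    public static long WorstCaseBytes(long fixedBytes, long bytesPerItem, long maxItems)
    {
        return fixedBytes + bytesPerItem * maxItems;
    }
}
```

    With the 200-byte example, `WorstCaseBytes(0, 200, 10000)` gives 2,000,000 bytes, which fits comfortably under the 10485760 bytes mentioned earlier in the thread; without a cap on `maxItems` no such bound can be stated.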

    So in order to answer your question we need to know the exact data you want / need to store and what are the upper limits for any dynamic structures. Keep in mind that strings also have a variable length. So those are the questions you should first answer before going any further.

    If against all recommendations you want to continue using the BinaryFormatter, be warned that the serialization behaviour does not scale linearly. There's a fixed overhead for every "type" you use since every type has to be specified in the stream. This is essentially metadata. So the more similar data you store, the better the ratio between metadata and actual data becomes. However there are some types / structures which scale horribly. Those are certain Dictionary combinations, all multidimensional arrays and a few other things.
     
  5. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,531
    Yes, that's it. That'll give you the size in bytes.

    As for what Bunny83 refers to, yes, BinaryFormatter has security issues in it.

    But in the end you can use a MemoryStream with most serializers and get the length afterward. Otherwise it creates a byte array or even a string (in the case of unity's json serializer), which are trivial to get the length of.
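
    One caveat when the serializer returns a string: `string.Length` counts UTF-16 characters, not bytes on disk. A small sketch, assuming the file will be written as UTF-8 (`JsonSize` and `Utf8Bytes` are hypothetical names):

```csharp
using System.Text;

public static class JsonSize
{
    // Number of bytes the text occupies once encoded as UTF-8 (without a BOM).
    public static int Utf8Bytes(string json)
    {
        return Encoding.UTF8.GetByteCount(json);
    }
}
```

    For pure ASCII text the two numbers match, but any non-ASCII character takes two or more bytes in UTF-8, so the byte count is the safer figure for sizing a save file.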
     
  6. leegod

    leegod

    Joined:
    May 5, 2010
    Posts:
    2,476
    I'll take convenience over security.

    I don't want to write BinaryWriter code for the hundreds of variables in my custom class, plus the enums and list values, writing and loading each of them one by one.

    And I'm not making anything security-sensitive, just a console game. Pirates will play it whatever I do.
     
  7. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,531
    It's not just a matter of security for you... your game can become a vector for malicious code to operate which hurts the user.
     
  8. Vryken

    Vryken

    Joined:
    Jan 23, 2018
    Posts:
    2,106
    Why not just write this logic once in some utility class, and re-use it everywhere else that's needed?
     
  9. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,531
    There are also other formats like JSON. Or if binary is so important to OP, use BSON (a binary version of JSON). Both are available through Newtonsoft Json.NET (well, you have to explicitly grab the Newtonsoft BSON lib along with the base Json.NET one).
     