
400 MB save files?

Discussion in 'General Discussion' started by Ne0mega, Feb 27, 2020.

  1. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    My save files are getting into the 400 MB range, and that is just a few maps (heightmap and generated terrain mesh data). I actually haven't played any modern games for a long time, but half a GB save file seems excessive. Maybe it is not. What would a normal save file size be for a game with a 4096 x 4096 randomly generated map texture and a 1024 x 1024 vertex mesh as terrain?

    Should I consider using byte and bit arrays instead?
    Everybody says don't prematurely optimize... ...but I kind of feel like I might have to.

    I am guessing they can be greatly compressed, since some of the data structures are, for example, a 4-million-element int[] that only ever stores values from 0 to 63. Am I going to have to write my own compression algorithms, since they are custom binary files?
     
  2. Baste

    Baste

    Joined:
    Jan 24, 2013
    Posts:
    6,204
    If the terrain is generated, you could set up your randomizer algorithm to always produce the same terrain from the same seed, and then only save the randomizer seed. That'd get you down from a couple of hundred megs of mesh to a single string.
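
    The seed idea can be sketched roughly like this. It is only an illustration: SeededMap and Generate are made-up names, and the random fill stands in for whatever the real generator does. The point is that every random decision comes from one seeded System.Random, so the save file only needs the seed.

```csharp
using System;

public static class SeededMap
{
    // Hypothetical stand-in for the real terrain generator: every value
    // is derived from the seeded Random, so the same seed and size
    // reproduce the same heightmap.
    public static float[] Generate(int seed, int size)
    {
        var rng = new Random(seed);
        var heights = new float[size * size];
        for (int i = 0; i < heights.Length; i++)
        {
            heights[i] = (float)rng.NextDouble(); // value in [0, 1)
        }
        return heights;
    }
}
```

    One caveat: this relies on System.Random producing the same sequence for the same seed everywhere the game runs. If saves have to survive runtime upgrades, a hand-written generator with a fixed algorithm is the safer assumption.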
     
  3. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    The generation of the map takes about 20 seconds on a pretty fast machine, and I am looking for more like 5 second transition times between maps, since on slower machines it could take minutes to generate a map.
     
  4. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    So a PNG can hold about 67,000,000 data points (4096 x 4096 x 4 channels) in about 20 - 32 megabytes. Perhaps I will store the map data in textures, if I can figure out how to pull them from video memory. But IIRC, textures need some kind of compression first to get down to that size, and raw ones are quite large. That will add some complexity to the save game file, but not much, I think.

    I could also trim the fat, I think, by regenerating the X and Z components of the vertices, since those are quick to generate, and saving only the Y values, since those are what take nearly all the calculation. Perhaps I could reduce it further to a byte array, although I worry that converting each byte back into a float for a vertex's Y value in a Vector3 might add a lot of load time. I doubt it, though.
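
    A rough sketch of that idea, assuming the vertices sit on a regular grid with a fixed spacing and the heights stay inside a known [minY, maxY] range. HeightCodec and its parameters are names invented for this example, and it uses 16-bit values instead of single bytes to keep more height precision.

```csharp
using System;

public static class HeightCodec
{
    // Quantize each height in [minY, maxY] to a 16-bit integer (2 bytes per vertex).
    public static ushort[] Encode(float[] heights, float minY, float maxY)
    {
        var packed = new ushort[heights.Length];
        float scale = ushort.MaxValue / (maxY - minY);
        for (int i = 0; i < heights.Length; i++)
        {
            float clamped = Math.Min(Math.Max(heights[i], minY), maxY);
            packed[i] = (ushort)Math.Round((clamped - minY) * scale);
        }
        return packed;
    }

    // Rebuild x, y, z triples for a size x size grid. X and Z are
    // recomputed from the vertex index; only Y comes from the save data.
    public static float[] Decode(ushort[] packed, int size, float spacing, float minY, float maxY)
    {
        var xyz = new float[packed.Length * 3];
        float step = (maxY - minY) / ushort.MaxValue;
        for (int i = 0; i < packed.Length; i++)
        {
            xyz[i * 3] = (i % size) * spacing;        // X from the column
            xyz[i * 3 + 1] = minY + packed[i] * step; // Y from the stored value
            xyz[i * 3 + 2] = (i / size) * spacing;    // Z from the row
        }
        return xyz;
    }
}
```

    For a 1024 x 1024 mesh this stores 2 MiB of heights instead of 12 MiB of three-float vertices, and decoding is one multiply and one add per vertex.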
     
    Last edited: Feb 27, 2020
  5. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,594
    Why are you focusing so much on storing the map?
    The random seed mentioned above brings the file size down significantly.
    And even if you modify something on the map, or have objects or items, you simply store their coordinates, item ID, and optionally some additional info.
    That probably gets you down to less than 100 MB, even with a populated world.
     
    MadeFromPolygons likes this.
  6. MadeFromPolygons

    MadeFromPolygons

    Joined:
    Oct 5, 2013
    Posts:
    3,886
    Yeah basically this. Seed the thing and use the seed to regenerate the same layout.

    For locations of objects you could pack all that data pretty performantly into a single RGBA texture.

    You can use any of the channels in the RGBA to hold the info about what the thing is. If the texture's pixels map exactly onto your map's grid, you could get a lot out of this packing:

    R - a number between 0 and 255 which could represent the item ID in a database or category
    G - a number between 0 and 255 which could be ID or category
    B - 0-255 any other id or info
    A - 0-255 any other id or info

    and then use those values to grab and spawn the correct prefabs. The location of the pixel is the location (on the grid) of the object.

    I've used texture channel packing like this for a range of things that I need to save and reload from a small file size.

    In addition to this you can also save additional data into byte arrays or any other structure. I usually save into a scriptable object which contains various packed textures as described above. And then I compress that and decompress on load.
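
    The channel packing described above might look something like the following outside of Unity. This is only a sketch: CellInfo, GridPacker and the field names are invented for the example, and the byte array uses a plain R, G, B, A layout with 4 bytes per grid cell in row-major order.

```csharp
using System;

// One grid cell: four 0-255 values, one per colour channel.
public struct CellInfo
{
    public byte ItemId, Category, ExtraA, ExtraB;
}

public static class GridPacker
{
    // Write every cell as 4 consecutive bytes (R, G, B, A), row by row.
    public static byte[] Pack(CellInfo[] cells)
    {
        var rgba = new byte[cells.Length * 4];
        for (int i = 0; i < cells.Length; i++)
        {
            rgba[i * 4] = cells[i].ItemId;       // R
            rgba[i * 4 + 1] = cells[i].Category; // G
            rgba[i * 4 + 2] = cells[i].ExtraA;   // B
            rgba[i * 4 + 3] = cells[i].ExtraB;   // A
        }
        return rgba;
    }

    // The pixel position is the grid position of the object.
    public static CellInfo Read(byte[] rgba, int width, int x, int y)
    {
        int i = (y * width + x) * 4;
        return new CellInfo
        {
            ItemId = rgba[i],
            Category = rgba[i + 1],
            ExtraA = rgba[i + 2],
            ExtraB = rgba[i + 3]
        };
    }
}
```

    Inside Unity, a buffer laid out like this should be usable with an RGBA32 texture through Texture2D.LoadRawTextureData and GetRawTextureData, but treat that pairing as an assumption to check against your texture format.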
     
    Last edited: Feb 27, 2020
  7. manutoo

    manutoo

    Joined:
    Jul 13, 2010
    Posts:
    466
    @Ne0mega ,
    use GZipStream to see where it lands you.

    eg:
    Code (CSharp):
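    using System.IO;
    using System.IO.Compression;

    // Compresses a byte array with GZip and returns the compressed bytes.
    public static byte[] CompressData(byte[] data)
    {
        using (var compressedStream = new MemoryStream())
        {
            // Dispose the GZipStream before reading the buffer, so the
            // final block and the GZip footer are flushed into it.
            using (var gzip = new GZipStream(compressedStream, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return compressedStream.ToArray();
        }
    }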
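
    Loading needs the matching decompression step. A minimal sketch of the full round trip, assuming the save data is already a byte[]; the SaveCompression class name is made up for this example:

```csharp
using System.IO;
using System.IO.Compression;

public static class SaveCompression
{
    // GZip-compress a buffer in memory.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // The GZipStream must be disposed before ToArray() so that
            // the last block and footer are written.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Reverse of Compress: inflate a GZip buffer back to the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

    How much this saves depends entirely on the data. Arrays with a small set of repeating values, like a biome map, should shrink a lot, while noisy float heightmaps will shrink much less.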
     
    Ne0mega likes this.
  8. MadeFromPolygons

    MadeFromPolygons

    Joined:
    Oct 5, 2013
    Posts:
    3,886
    Or even better, use what I said above to pack your data and then use what @manutoo has written here to compress that. You will end up with a save file that is kilobytes in size, maybe a few megabytes at most.
     
  9. superpig

    superpig

    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,615
    20 seconds to generate a map seems slow to me. Why does it take so long?
     
    MadeFromPolygons likes this.
  10. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    First it generates a 4096 x 4096 Perlin map of float values.
    Then it generates a lower-resolution (1024) map from that Perlin map to determine the vertices' heights, using an animation curve evaluation.
    Then it generates 4096 x 4096 heat and moisture float maps (not saved) and calculates from them a biome byte array with 64 zones.
    The 4096 Perlin map is converted to an array for serialization.
    Then it generates a 4096 texture from the biome byte[] to apply to the mesh.
    Then it jitters the trees and uses the 4096 Perlin map to determine their global Y.


    Most of the time is taken, I think, by using the animation curve to evaluate heightmap values, since those are decimal and therefore involve division.

    The best part is all the scriptable objects I use, so I can make all sorts of cool and different terrains. I learned it mostly from this awesome Unity conference talk about tyranny or something. You should check it out sometime.
     
    Last edited: Feb 27, 2020
  11. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,594
    I suggest considering Jobs and Burst for the heavy calculations, where applicable.
     
    MadeFromPolygons likes this.
  12. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    Because I am going to want to swap between 4 different strategic maps, as well as from the strategic map to tactical maps for battles. I want a battle to last 20 minutes at most, but more like 3 - 5, and it would be incredibly unfun if every transition from strategic to tactical took a long time because of map generation. I also still have to consider navmesh generation time.
     
  13. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    I am definitely thinking I might have to. I might wait until it is more official before diving in. HDRP in its current state is already draining my cutting-edge capabilities.

    I know jobs and ECS are in my future... ...I just think I can put those off until they are more official.
     
  14. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,594
    You may want to chunkify your maps. Load lower-res maps first, and generate at high res at runtime only what is needed (visible) up close.
     
  15. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    Navmesh complications are why I am avoiding that, although I am now also considering chunkifying the navmesh, since I am also going to have to store fog-of-war maps. No need to generate the entire navmesh if you would not be allowed to move too far into the unknown anyway.

    Chunkifying terrain also caused me texture headaches.
     
  16. Joe-Censored

    Joe-Censored

    Joined:
    Mar 26, 2013
    Posts:
    11,847
    Does the time-consuming work in your map generation use Unity APIs? If not, then just use regular old threads. You can probably create each of these maps simultaneously, as well as freeing up the main thread so you can keep the player from being so bored waiting. 6-core / 12-thread CPUs are the new standard low to mid range for a gaming PC, so it is not unreasonable to optimize your game for them.
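
    A small sketch of what that could look like with the .NET task library rather than raw threads. The names ParallelMaps, Fill and Generate are invented for this example, and the sin/cos fill is just a placeholder for real map maths. The assumption that matters is that the work touches no Unity API, so it can run off the main thread.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelMaps
{
    // Hypothetical pure-math generator. Rows are independent, so
    // Parallel.For can spread them across the available cores.
    static float[] Fill(int size, int seed)
    {
        var map = new float[size * size];
        Parallel.For(0, size, y =>
        {
            for (int x = 0; x < size; x++)
            {
                map[y * size + x] = (float)Math.Sin(seed + x * 0.01) * (float)Math.Cos(y * 0.01);
            }
        });
        return map;
    }

    // The two maps do not depend on each other, so build them at the same time.
    public static (float[] heat, float[] moisture) Generate(int size)
    {
        Task<float[]> heat = Task.Run(() => Fill(size, 1));
        Task<float[]> moisture = Task.Run(() => Fill(size, 2));
        Task.WaitAll(heat, moisture);
        return (heat.Result, moisture.Result);
    }
}
```

    In a real game you would probably not block on Task.WaitAll from the main thread but poll the tasks (or await them) while showing a loading screen, then hand the finished arrays to the mesh and texture code on the main thread.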
     
  17. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    All those weeks diving into serialization; I probably should have dived into threads instead. Oh well, I'll combine the two in the end for superfastness.
     
    Joe-Censored likes this.
  18. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,594
    Jobs have been part of Unity for a long time and are considered stable. You can use them already.
    ECS (part of DOTS), however, is still in preview.
     
  19. joshcamas

    joshcamas

    Joined:
    Jun 16, 2017
    Posts:
    1,268
    I would definitely consider chunkifying your world, and then generate as needed. The player could be doing whatever they're doing, and nearby chunks could be generating on another thread.
     
  20. SparrowGS

    SparrowGS

    Joined:
    Apr 6, 2017
    Posts:
    2,536
    How are you actually saving the data?

    I have a map generator thingy too, when i generate a map with a radius of 350 (a hex-shaped hex map) that comes out to 368,551 individual tiles, the save file is only 6,119 kb including faction-stuff (4X game)

    You probably have a bunch of stuff you can combine or cut. If you're saving an int that never goes over 255, you can save it as a byte. If you have 2 ints that never go over 15 (each), you can combine them into a single byte; 4 ints that never go over 3 fit in 1 byte, and so on. If you have a byte with a spare bit in it, shove a boolean in there; if you have 8 bools, you can store them all in a single byte.
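
    Bit packing along those lines could be sketched as below. BitPacking and its method names are invented for the example. It shows two 4-bit values sharing a byte, four 6-bit values (0 to 63, like the biome zones mentioned earlier in the thread) sharing 3 bytes, and 8 bools in one byte.

```csharp
using System;

public static class BitPacking
{
    // Two values in 0..15 (4 bits each) share one byte.
    public static byte PackNibbles(int high, int low)
    {
        return (byte)(((high & 0xF) << 4) | (low & 0xF));
    }

    // Four values in 0..63 (6 bits each) fill 24 bits = 3 bytes.
    public static void Pack6(int a, int b, int c, int d, byte[] dest, int offset)
    {
        int bits = (a & 63) | ((b & 63) << 6) | ((c & 63) << 12) | ((d & 63) << 18);
        dest[offset] = (byte)bits;
        dest[offset + 1] = (byte)(bits >> 8);
        dest[offset + 2] = (byte)(bits >> 16);
    }

    // Read value number index (0..3) back out of a 3-byte group.
    public static int Unpack6(byte[] src, int offset, int index)
    {
        int bits = src[offset] | (src[offset + 1] << 8) | (src[offset + 2] << 16);
        return (bits >> (6 * index)) & 63;
    }

    // Up to eight bools in one byte, flag i stored in bit i.
    public static byte PackBools(bool[] flags)
    {
        int bits = 0;
        for (int i = 0; i < flags.Length && i < 8; i++)
        {
            if (flags[i]) bits |= 1 << i;
        }
        return (byte)bits;
    }
}
```

    Compared with one 4-byte int per value, the 6-bit layout is roughly a 5x reduction before any general-purpose compression is applied on top.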
     
  21. superpig

    superpig

    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,615
    Well, the profiler can probably confirm or deny that for you... but it doesn't sound like it makes a lot of sense to me to be using animation curves for this, given that you're downsampling rather than upsampling. In practice every 4x4 block in your perlin map is becoming a 1x1 result in your low res map. I would generally expect something like that to just be the Frobenius inner product of two 4x4 matrices - using SSE (such as Burst can generate), it'd be something like 4 multiplies, 5 adds, and 2 shuffles, they're all cheap. I'd expect doing that for a 1024x1024 result ought to take a handful of milliseconds on one core of a 3GHz processor.

    I agree with @Antypodish that this could also be a good situation for the Job system. As noted it is not some prerelease preview package thing, it's been part of Unity core since 2018.1 and it's solid. That said I am not a believer in hiding slow code by just spreading it out over more cores... ;)

    I have no idea what you're talking about but it sounds like some really high quality content from somebody who must be very smart and handsome. :D
     
    Ne0mega likes this.
  22. BIGTIMEMASTER

    BIGTIMEMASTER

    Joined:
    Jun 1, 2017
    Posts:
    5,181
    Didn't read everything, but why a 4k map? I didn't think anybody was using such a high resolution for a heightmap. 2k is more than enough for a realistic 2 km tile of terrain for me, as an example.

    That high a resolution seems especially wasted if you are just using a Perlin noise heightmap. It's gonna look all blobby anyway. And how dense are the vertices? Would they show the detail difference between 4k and 2k?

    I'm sure you already know, but the difference between 4k and 2k isn't double, it's like 16x or something like that. A big difference.
     
    Last edited: Feb 28, 2020
  23. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    To get data from every pixel, since I am not going to be using a tile system, and also so I can generate tactical maps based on the strategic map. The vertices are 1024 x 1024 or possibly 512 x 512, because I want multiple planes a la Master of Magic's Arcanus and Myrror.

    The difference between 4k and 2k is 4x.
     
    Last edited: Feb 28, 2020
  24. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    The profiler tends to lock and burn Unity for me, which sucks, because I like it.
    The animation curves are there to give the terrain a less uniform, less Perlin-y look.

    And, yes, of course, the old Frobenius inner product trick using the inner product of a 4x4 matrix. I'll just have to look that up on Wikipedia to remind myself how I could overlook such a simple concept.

    Honestly, when I first started this a month ago, I thought Perlin was fractal, so I could always just zoom in and still get the same basic shape but with extra details. So when I found out it wasn't, I had to backflow the process and do a whole bunch of extra stuff I did not expect. Now I see lots of simplex scripts, and I hear that is fractal, but I have not confirmed that yet.

    If you say it should all take only milliseconds, that is good news indeed.
     
  25. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    That's probably because all you are generating is about 350 * 350 * 3.14 spawn points, with maybe a value for tile type and another for tile height, if it's not a flat map.
     
  26. superpig

    superpig

    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,615
    Heh. I just mean doing a component-wise multiply and add - a[0,0]*b[0,0] + a[0,1]*b[0,1] + a[0,2]*b[0,2] + ... + a[3,3] * b[3,3]. 'a' is a 4x4 matrix made of the values from a 4x4 block of the Perlin map, and 'b' is a 4x4 matrix of weights - if all cells in b are 1/16 then what you get is the average, if the middle four cells are 1/8 and the rest are 1/24 then you have something weighted more towards the middle, etc. I'm just guessing about how your animation curves are set up but I'm assuming that any given point in the 1024x1024 grid is not being influenced by keyframes drawn from the _entire_ 4096x4096 region...
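
    Written out as plain C#, the component-wise multiply and add described above might look like this. It is a scalar sketch only, with no SSE or Burst, and Downsampler, Downsample4x4 and UniformWeights are names made up for the example. The source map is assumed to be square, stored row-major, with a side length divisible by 4.

```csharp
using System;

public static class Downsampler
{
    // Each output cell is the weighted sum of one 4x4 block of the source:
    // a[0,0]*b[0,0] + a[0,1]*b[0,1] + ... + a[3,3]*b[3,3],
    // where a is the block and b is the 16-entry weight table.
    public static float[] Downsample4x4(float[] src, int srcSize, float[] weights)
    {
        int dstSize = srcSize / 4;
        var dst = new float[dstSize * dstSize];
        for (int dy = 0; dy < dstSize; dy++)
        {
            for (int dx = 0; dx < dstSize; dx++)
            {
                float sum = 0f;
                for (int by = 0; by < 4; by++)
                {
                    for (int bx = 0; bx < 4; bx++)
                    {
                        int srcIndex = (dy * 4 + by) * srcSize + (dx * 4 + bx);
                        sum += src[srcIndex] * weights[by * 4 + bx];
                    }
                }
                dst[dy * dstSize + dx] = sum;
            }
        }
        return dst;
    }

    // All weights 1/16: the result is the plain average of each block.
    public static float[] UniformWeights()
    {
        var w = new float[16];
        for (int i = 0; i < 16; i++) w[i] = 1f / 16f;
        return w;
    }
}
```

    Swapping in a different weight table (for instance 1/8 for the middle four cells and 1/24 for the rest) changes the filter without touching the loop, as long as the weights still sum to 1.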

    Simplex noise on its own isn't fractal either, but you can keep combining octaves of it at lower and lower amplitudes - I thought you could do the same with Perlin Noise, though...
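
    Combining octaves could be sketched like this. The base function here is a simple hashed value noise written just for the example, not Perlin or simplex, and FractalNoise, lacunarity and gain are names chosen for illustration. In Unity something like Mathf.PerlinNoise could presumably be dropped in as the base instead.

```csharp
using System;

public static class FractalNoise
{
    // Hash a lattice point to a repeatable value in [0, 1).
    static float Lattice(int x, int y, int seed)
    {
        unchecked
        {
            uint h = (uint)(x * 374761393 + y * 668265263 + seed * 1274126177);
            h = (h ^ (h >> 13)) * 1274126177u;
            h ^= h >> 16;
            return (h & 0xFFFFFFu) / 16777216f;
        }
    }

    static float Smooth(float t)
    {
        return t * t * (3f - 2f * t);
    }

    // Single-octave base noise: smooth interpolation between lattice values.
    public static float Base(float x, float y, int seed)
    {
        int x0 = (int)Math.Floor(x);
        int y0 = (int)Math.Floor(y);
        float tx = Smooth(x - x0);
        float ty = Smooth(y - y0);
        float a = Lattice(x0, y0, seed);
        float b = Lattice(x0 + 1, y0, seed);
        float c = Lattice(x0, y0 + 1, seed);
        float d = Lattice(x0 + 1, y0 + 1, seed);
        float top = a + (b - a) * tx;
        float bottom = c + (d - c) * tx;
        return top + (bottom - top) * ty;
    }

    // Sum several octaves: each one has a higher frequency and a lower
    // amplitude than the last, then normalize back to roughly [0, 1].
    public static float Octaves(float x, float y, int seed, int octaves,
                                float lacunarity = 2f, float gain = 0.5f)
    {
        float sum = 0f, norm = 0f, amplitude = 1f, frequency = 1f;
        for (int i = 0; i < octaves; i++)
        {
            sum += amplitude * Base(x * frequency, y * frequency, seed + i);
            norm += amplitude;
            amplitude *= gain;
            frequency *= lacunarity;
        }
        return sum / norm;
    }
}
```

    With this approach zooming in just means sampling the same function at finer coordinates with more octaves, so the large shapes stay put while extra detail appears.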

    Well, I'm making a lot of assumptions about what you are doing - but in general 20 seconds sounds like so many orders of magnitude above what I'd expect that there are probably some easy optimisations in there.