Fast loading of float or vectors

MadAboutPandas3 · Aug 15, 2019

Hi,

What is the fastest way to load 1,000,000 positions (x, y, z)?
What is the best way to compress the data?

The positions are all between 0 and 4,000 and only need a precision of 1 cm. This needs to run on all platforms, including mobile.

Currently we have a ScriptableObject with a byte[] array. On load it gets converted with System.BitConverter to a list of Vector4 for use. Saving works the other way around.

Kind Regards,
Chris

palex-nx · Aug 15, 2019

Create a union struct with a byte array and a bunch of vectors. Fill that array from a file stream, save the vectors, and repeat until the end of the file. For compression you could use SharpZipLib, for instance.
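To make the idea concrete, here is a minimal sketch of the union approach. It is written in Python with ctypes rather than C#, purely as an illustration; `Vec3`, `Chunk`, `COUNT` and `read_chunk` are made-up names, and it assumes the file holds 32-bit floats in the machine's native byte order.

```python
import ctypes
import io
import struct

COUNT = 4  # vectors per chunk (hypothetical; a real loader would use a much larger chunk)

class Vec3(ctypes.Structure):
    _fields_ = [("x", ctypes.c_float), ("y", ctypes.c_float), ("z", ctypes.c_float)]

class Chunk(ctypes.Union):
    # Both members overlay the same memory: bytes go in, vectors come out.
    _fields_ = [
        ("raw", ctypes.c_ubyte * (COUNT * ctypes.sizeof(Vec3))),
        ("vectors", Vec3 * COUNT),
    ]

def read_chunk(stream):
    """Fill the byte view from the stream and return the vectors that were read."""
    chunk = Chunk()
    n = stream.readinto(chunk.raw)          # no per-value conversion happens here
    complete = n // ctypes.sizeof(Vec3)     # ignore a trailing partial vector
    return [(v.x, v.y, v.z) for v in chunk.vectors[:complete]]

# Stand-in for a file: two vectors written as raw native floats.
demo = io.BytesIO(struct.pack("6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
print(read_chunk(demo))
```

Calling `read_chunk` in a loop until it returns an empty list would cover the "repeat until the end of the file" part.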

Yoreki · Aug 16, 2019

A Vector3 has 3 floats, each 32 bits = 4 bytes, so 12 bytes of data. A million of those means 12 MB of data. To the best of my knowledge, there is not a whole lot you can do to influence how fast a given volume of data is loaded, other than changing the hardware or, of course, decreasing the volume of data.

The way values are stored in floats is that you basically have a number of about 7 decimal digits (the 23-bit mantissa), and the rest of the bits are used to represent / move the decimal point. However, you use up a lot of this precision since your values go up to 4000. The finest precision you can get out of a float near that value would be something along the lines of 3999.001. That said, you technically only want a precision of 3999.01, which means you only need 6 digits instead of 7, and thus only ~20 instead of 23 bits of information for the digits. You could thus potentially save some data by writing your own format that stores only the relevant bits, but doing so for a 3/32 ~= 10% saving seems like a lot of effort.
(It's actually possible to save ~40% since you don't need the other bits making up the float, see the comment below by jvo3dc.)
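The spacing between neighbouring 32-bit floats near 4,000 can be checked directly. A small sketch in Python (used only for the arithmetic; `next_float32` and `spacing` are made-up helper names):

```python
import struct

def next_float32(value):
    """Return the next representable 32-bit float above a positive value."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

def spacing(value):
    """Gap between `value` (rounded to float32) and its upper neighbour."""
    as_f32 = struct.unpack("<f", struct.pack("<f", value))[0]
    return next_float32(value) - as_f32

print(spacing(3999.0))  # 2**-12, roughly 0.00024
print(spacing(1.0))     # 2**-23
```

So near the top of the range a float still resolves steps of roughly a quarter of a millimetre, which is far finer than the 1 cm that is needed.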

Other than that there is compression, which would presumably decrease the amount of data you have to write / read by quite a lot, but it also costs a decent amount of CPU time. Since you said you want it to run on PC (where compression wouldn't be a huge problem, but loading 12 MB shouldn't be either) and also on mobile (where decompressing plus loading may or may not take longer than simply loading), you'd have to measure what is better on the hardware you target.

Also, I'm not entirely sure whether there is any overhead in how you store the data. Storing only the actual bits you need for your floats should be possible. If your current format adds overhead, store the raw bits directly. You can check this by looking at how large the stored file is.
So as a summary: only save the bits you actually need, without overhead (= 12 MB); potentially decrease the number of bits based on the required precision (~= 10 MB); and if you have CPU power to spare, compress the data (= ? MB depending on the compression method; find a good trade-off between stored size and the CPU time required).
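A quick way to get a feel for the compression trade-off is to compress a sample of the raw float bytes and compare sizes. The sketch below uses Python's built-in zlib as a stand-in for whatever compression library you end up with, and random made-up coordinates, which compress worse than most real data would, so treat the numbers it prints as a pessimistic guide only.

```python
import random
import struct
import zlib

random.seed(0)
# Made-up sample: 10,000 coordinates in [0, 4000] stored as 32-bit floats.
values = [random.uniform(0.0, 4000.0) for _ in range(10_000)]
raw = struct.pack(f"<{len(values)}f", *values)

# Second argument is the compression level (1 = fastest, 9 = smallest output).
packed = zlib.compress(raw, 6)
restored = zlib.decompress(packed)

print(len(raw), len(packed))   # compare raw size against compressed size
```

Running the same comparison on a slice of the real data, and timing the decompress call on the target device, would answer the "is it worth it" question.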

That said, why are you storing a million positions anyway? If loading them is a problem you have to consider, then I have to ask: do you have to store them in the first place? With a million values I assume these are not hand-crafted, so you generated or got them from somewhere at some point. Can't you do that when you need them? Generating a million values should be faster than loading them, since it happens in RAM rather than reading from an HDD or SSD, which is way slower.

jvo3dc · Aug 15, 2019

You can save a little more than 10% I'd say. With a range of 0 to 4,000 in steps of 0.01, you have 400,001 distinct values. That comes down to 19 bits as integer. 16 bits would be easier to handle, but 19 bits is still doable. Would lead to about 7.2 MB of data without compression.
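The arithmetic behind those numbers, as a short Python sketch (the constants come from the original post; the variable names are just for illustration, and three components per position are assumed):

```python
import math

RANGE_MAX = 4000.0    # upper bound of each coordinate
STEP = 0.01           # required precision (1 cm)
COUNT = 1_000_000     # number of positions
COMPONENTS = 3        # x, y, z

distinct = round(RANGE_MAX / STEP) + 1                    # 400,001 values
bits = math.ceil(math.log2(distinct))                     # 19 bits per component
float_bytes = COUNT * COMPONENTS * 4                      # 12,000,000 bytes as floats
packed_bytes = math.ceil(COUNT * COMPONENTS * bits / 8)   # 7,125,000 bytes packed

print(distinct, bits, float_bytes, packed_bytes)
```

That is a bit over 7.1 MB, so in the same ballpark as the estimate above and roughly 40% less than the raw floats.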

Suddoha · Aug 15, 2019

MadAboutPandas3 said: ↑

Currently we have a scriptable object with an array byte[]. During load it gets converted with System.BitConverter to list of vector4 for usage. Saving the other way around.

Kind Regards,
Chris

Do you convert the values one by one?

Have you tested / profiled whether it is the loading or the conversion that takes the time?

For the conversion, you should be able to use unsafe code, which saves those millions of calls that convert value by value. The only thing you'll need to take care of is bitness on the various platforms. Or similarly, use the struct layout approach that was suggested earlier, which could even save the time spent building the vectors.
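The difference between converting value by value and reinterpreting the whole buffer at once can be sketched like this. It is Python rather than C#, only to show the shape of the two approaches; both function names are made up, and the data is assumed to be 32-bit floats in the machine's native byte order.

```python
import struct
from array import array

def convert_one_by_one(data):
    """One conversion call per float, similar in spirit to a BitConverter loop."""
    return [struct.unpack_from("=f", data, offset)[0]
            for offset in range(0, len(data), 4)]

def convert_bulk(data):
    """Reinterpret the whole buffer as floats in a single call."""
    floats = array("f")
    floats.frombytes(data)   # copies the bytes once, no per-value calls
    return floats

blob = struct.pack("=4f", 0.5, 1.5, 2.5, 4000.0)
print(convert_one_by_one(blob))
print(list(convert_bulk(blob)))
```

The bulk path only works when the byte layout of the file matches the platform; if it does not, the values would need a byte swap (`array.byteswap()` in this sketch) after loading.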

Though I'm wondering, why are you saving it as a byte array in your ScriptableObject when you need it to be Vector4 anyway? Why don't you just use a Vector4 array?

Joe-Censored · Aug 15, 2019

You might consider just moving this work to another thread and kicking it off as soon as the game launches. The data would most likely be ready as soon as it is needed, without any delay on the main thread.
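The general pattern looks something like the sketch below. It is Python, not Unity code, and `load_positions` and `blob` are made-up stand-ins; the assumption is that the worker only touches plain data and hands the finished result back to the main thread.

```python
import struct
from array import array
from concurrent.futures import ThreadPoolExecutor

def load_positions(data):
    """Stand-in for the expensive load and convert step."""
    floats = array("f")
    floats.frombytes(data)
    return floats

executor = ThreadPoolExecutor(max_workers=1)
blob = struct.pack("=3f", 1.0, 2.0, 3.0)        # stand-in for the file contents

future = executor.submit(load_positions, blob)  # kick off at startup
# ... the main thread carries on with other startup work here ...
positions = future.result()                     # blocks only if loading is not done yet
executor.shutdown()

print(list(positions))
```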

Yoreki · Aug 16, 2019

jvo3dc said: ↑

You can save a little more than 10% I'd say. With a range of 0 to 4,000 in steps of 0.01, you have 400,001 distinct values. That comes down to 19 bits as integer. 16 bits would be easier to handle, but 19 bits is still doable. Would lead to about 7.2 MB of data without compression.

You are right. I was fixated on storing the information as a float, but considering the fixed decimal precision, OP should not need the first 8 float bits at all. Simply saving the number "as an integer", and reconstructing the actual value by casting it to a float and dividing by 100 to get the coordinate, should be the smallest possible way to store the data. Nice idea!
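In code the round trip is tiny. A sketch in Python (just for the arithmetic; `quantize`, `dequantize` and `SCALE` are made-up names):

```python
SCALE = 100  # 1 cm steps: 1.00 is stored as 100

def quantize(value):
    """Coordinate in [0, 4000] -> integer in [0, 400000]."""
    return round(value * SCALE)

def dequantize(stored):
    """Integer back to a coordinate; the error is at most half a step."""
    return stored / SCALE

original = 1234.567
stored = quantize(original)
print(stored, dequantize(stored))
```

The stored integers stay at or below 400,000, so each one fits in 19 bits.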

Joe-Censored · Aug 16, 2019

Hmmm, I have a feeling all the work chopping up the binary data into 19 bit "integers", converting them into actual 32 bit integers, dividing by 100, then converting to float is not going to produce any actual performance gains over just reading 32 bit floats. I'd be interested in finding out if that is actually the case though.

Though it will certainly compress the data. The OP wants both compressed data and the fastest performance, which are often opposing goals. Might need to choose which is more important.
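For reference, the chopping-up step would look roughly like this. It is a plain Python sketch with made-up function names, not a tuned implementation, and it assumes every value is already an integer below 2^19.

```python
BITS = 19  # bits per stored value

def pack(values):
    """Pack integers (< 2**19) back to back into a byte string."""
    out = bytearray()
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in acc
    for v in values:
        acc |= v << nbits
        nbits += BITS
        while nbits >= 8:            # flush whole bytes
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                        # flush the final partial byte
        out.append(acc & 0xFF)
    return bytes(out)

def unpack(data, count):
    """Read `count` 19-bit integers back out of the byte string."""
    values = []
    acc = 0
    nbits = 0
    pos = 0
    mask = (1 << BITS) - 1
    for _ in range(count):
        while nbits < BITS:          # pull in bytes until one value is available
            acc |= data[pos] << nbits
            pos += 1
            nbits += 8
        values.append(acc & mask)
        acc >>= BITS
        nbits -= BITS
    return values

sample = [0, 400000, 123456]
print(len(pack(sample)), unpack(pack(sample), len(sample)))
```

Every value costs a handful of shifts and masks on top of the divide by 100, so timing `unpack` against a straight float read on the target device is the only way to settle which one wins.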

Yoreki · Aug 17, 2019

Joe-Censored said: ↑

Hmmm, I have a feeling all the work chopping up the binary data into 19 bit "integers", converting them into actual 32 bit integers, dividing by 100, then converting to float is not going to produce any actual performance gains over just reading 32 bit floats. I'd be interested in finding out if that is actually the case though.

True, but we have been given very little context to work with. Loading ~12 MB should not be that huge of a deal either way, so there is probably some underlying performance issue or constraint here that we are not aware of / can't work on without seeing the code or being given more information. However, technically the question was how to load a million values in the fastest way possible, and what's the best way to compress the data.
If nothing else, we got that covered now hehe. But yeah, for any practical use you'd have to test if it actually helps.
