How to calculate combined unique sum of array elements?

Antypodish · Nov 7, 2019

To be honest, I don't know how to properly title this problem in one short sentence.
But I will try to explain the challenge I would like resolve.

However, first an example of the concept, on binary number, to better understand the problem.
Lets say we have an int, made of 32 bits.
Each bit takes value of course 0, or 1.
If we set some bits and we want to get unique sum of element, we can get for example
1001 0000 1011 ...
For this example I take first 4 bits from the left 1001.
Unique value after conversion to decimals, is 1 + 8 = 9.
There is no other combination, which will give 9.
Similarly applies for all 32 bits.

Now, what I would like to find out, if there is some way, in which I could apply same unique sum of elements concept as in binary, for array of 125 elements, rather 32 and where each element can take not 0 to 1, but 0 to 1023 for example.

Obviously, I can not use same principle as with binary to decimal conversion, as I would have not enough numbers, to cover all combinations.

Then I was thinking, something like that, if considering flowing number:
8010 0103
Then I could multiply each element index by 10, for each corresponding index.
8+0*10 + 0+1*10 + 1+2*10 + 0+3*10 + 0+4*10 + 1+5*10 + 0+6*10 + 3+7*10
= 8 + 10 + 21 + 30 + 40 + 51 + 60 + 70
= 290
But that is 290. Of course, 290 can be achieved by different combination of values. Then when comparing both cases, I would have result that both sequences are equal. Which is not true.
So that is no go.

Only other way I can think of, is hash.
I loose this way ability, to compare by value.
But If I can guarantee, that given combination of array values, will it give always same hash of the array?
Would that be true? I think that is only valid per game session, however.

Probably selecting type of collection (array, List, NativeArray, DynamicBuffer, ... ), is less relevant is such application, as long I could get hash value.

neoshaman · Nov 7, 2019

What you want is universal "base conversion"
- Decimal use 10 numbers
- Binary use 2 numbers
- Hexadecimal use 16 numbers
- Octal use 8 numbers
- AntypodishBase use 1024 numbers
Nothing special work just like the other

Base two is like
number = sum(2^index that are not 0)
You can extrapolate to sum(base^index)
try with hexadecimal and decimal to see and validate
http://codeofthedamned.com/index.php/number-base-conversion

Antypodish · Nov 7, 2019

Yeah definitely good points.
I can now look int the problem with fresher mind, rather mind before bed time

JoeStrout · Nov 7, 2019

It really sounded to me like you were describing a hash.

But If I can guarantee, that given combination of array values, will it give always same hash of the array?
Click to expand...

Yes. That's the whole point of a hash — it always gives you the same output, given the same input. But those outputs are not necessarily unique. There are other outputs that map to the same input — which must obviously be the case, if your input is 32 10-bit numbers (a total of 320 bits of information) and your output (hash) is, say, a mere 64 bits. If there were a way to cram 320 bits of information into 64 bits, I think we'd be doing it already.

And note that hashes have nothing to do with the game session. They're just simple math on the string of numbers. So you could safely store these and use them across sessions, or even across platforms.

However... I suspect what you really want is a Dictionary. It uses a hash to get things started, but follows it with a check of the actual values, so it correctly handles the case of hash collisions (two input values that happen to hash to the same thing).

Antypodish · Nov 7, 2019

@JoeStrout thx for input.

Yeah. I realized now, that two separate array of data, may generate different hashes.
I should be more conscious about that.

That however leads me to a problem, which I will not resolve with hash. At least I don't think.
This is because, what I require, is to check, if two arrays are same (have same elements), without needing to iterate through its values. But with hash, here I can have many same arrays, with multiple hashes.

So if I would like use reference to an dictionary, I will end up with potential many duplicates. Which I would like avoid.

Just a little further explanation.
Imagine that object is generating array of values. I calculate hash. I find this has in dictionary. So I map reference from dictionary to object. This way, I don't need store array in the object.
Now ideally, multiple objects uses same array reference to dictionary.

For my use case, I think I will need abandon the idea of using hash for that reason.

Backing to @neoshaman suggestions of custom base.
M thought is, that works fine, as long I have relatively few elements.
For binary I can pack 32 elements in int data type, to store unique combinations, without case of duplicate.
That gives me 0 to 4,294,967,295 unique combinations.
If I consider hex, then I can pack only 8 hexes, of 4 bits each. Or 16 if I use long data type.
That's 4 bits * 8 values / 4 bits * 16 values.
Further I increase base size, then I loose number of elements in array, while to be able calculate unique combination.
Either way, I can not have more than 32 (int), or 64(long) elements, even if I just use binary.

My another thought is, if I use some key (like random seed), to generate hashing, like MD5, or relevant for example.
That I think, could allow me potentially generate always same hash, with same combination of elements in an array.
Still there is chance for collision, but at least I could validate that. Then probably using different key (maybe). And finally store set of elements in Dictionary, which I can reference safely in in my objects. Key also would need be tore in objects of course.

Edit:
I like GetHashCode(), as it returns int. But If I could pass key for hashing, that would be ideal.
Problem with MD5 and SHA type hashing, even using key, they do not return int, keeping short. Convenient with DOTS.

JoeStrout · Nov 7, 2019

I think you're way overthinking this. If your goal is just to efficiently tell whether two arrays are the same, it's not that hard to do.

In pseudocode, your arrays-equal function should look like this:

0. Compare the array references. If they are actually the same array object, they are equal.
1. Compare the array lengths. If not the same, the arrays are not equal.
2. Compare the array hashes (hopefully you can cache these so you don't have to recompute them every time). If not the same, the arrays are not equal.
3. Zip through the arrays and compare elements.

Yes, I know you said you didn't want to do step 3. But there's no avoiding it, in general. But the key is that in practice, assuming a decent hash function, you will almost never get to step 3. You will bail out on steps 0-2. Of course when you have two arrays that are separate objects but actually are equal, then you'll have to go all the way through step 3.

If that's going to happen a lot, then the only thing for it is to keep, in addition to the hash, some unique identifier that maps a pattern of array entries to a unique ID. That'll be a PITA to implement and keep updated, and I don't recommend it unless you've profiled your code and found that the time you're spending in step 3 is actually important.

Antypodish · Nov 7, 2019

JoeStrout said: ↑

I think you're way overthinking this.
Click to expand...

It may be, that why I am asking here for an assist

JoeStrout said: ↑

0. Compare the array references. If they are actually the same array object, they are equal.
Click to expand...

Yep that part I agree.

JoeStrout said: ↑

1. Compare the array lengths. If not the same, the arrays are not equal.
Click to expand...

This is also easy to check indeed, which is part of my check concept.

JoeStrout said: ↑

2. Compare the array hashes (hopefully you can cache these so you don't have to recompute them every time).
Click to expand...

Yep, they will be hashed at creation. Thing is, each object creates own array sequence. If sequence does not exists, creates reference to Dictionary. If exists, use reference from dictionary. But further I sense an issue.

JoeStrout said: ↑

If not the same, the arrays are not equal.
Click to expand...

As we disused earlier, using hash two generated sequences may return different hash. Hence, even sequence is the same, hash comparison will return false.

In meantime, I came up with custom hash algorithm, base on key.
Don't know however, how prone is to collisions.
So far it works. But I haven't tested all possibilities.
https://docs.google.com/spreadsheets/d/1Rq_SUuu0YUa-nP51Vh9rlHHCD0W-TYEnDZLxjE5C0ys/edit?usp=sharing

JoeStrout · Nov 7, 2019

Antypodish said: ↑

Yep, they will be hashed at creation. Thing is, each object creates own array sequence. If sequence does not exists, creates reference to Dictionary. If exists, use reference from dictionary.
Click to expand...

Wait. Are these arrays whose values never change after creation?!

If so, that's an extremely important point. In this case the whole thing is a non-issue. Your technique just described is perfect: get them from a factory function that always returns an existing array for a previously-seen sequence, and then simply compare object references. Nothing more is needed.

As we disused earlier, using hash two generated sequences may return different hash. Hence, even sequence is the same, hash comparison will return false.
Click to expand...

This is incorrect. If the input is the same, the output (hash) is the same. Hashes would be completely and utterly useless if this were not the case.

Antypodish · Nov 7, 2019

JoeStrout said: ↑

Wait. Are these arrays whose values never change after creation?!
Click to expand...

That is correct.

For the rest I am a bit confused now. Or maybe we talking about different things.
Just in case, I made this simple example, where I use 4 new arrays, of 10 elements each.

Code (CSharp):

int [] a0 = new int [10] ;

int [] a1 = new int [10] ;

int [] a2 = new int [10] ;

int [] a3 = new int [10] ;

Debug.Log ( "A0: " + a0.GetHashCode () ) ;

Debug.Log ( "A1: " + a1.GetHashCode () ) ;

Debug.Log ( "A2: " + a2.GetHashCode () ) ;

Debug.Log ( "A3: " + a3.GetHashCode () ) ;

Code (CSharp):

for ( int i = 0; i > a0.Length; i ++ ) { a0 [i] = 1 ; }

for ( int i = 0; i > a1.Length; i ++ ) { a1 [i] = 1 ; }

for ( int i = 0; i > a2.Length; i ++ ) { a2 [i] = 1 ; }

for ( int i = 0; i > a3.Length; i ++ ) { a3 [i] = 1 ; }

Debug.Log ( "A0 b: " + a0.GetHashCode () ) ;

Debug.Log ( "A1 b: " + a1.GetHashCode () ) ;

Debug.Log ( "A2 b: " + a2.GetHashCode () ) ;

Debug.Log ( "A3 b: " + a3.GetHashCode () ) ;

So obviously same arrays have same hash code, irrelevant what is the content. I.e. in first case a0 has 10x0, in second case a0 has 10x1 elements. Same for a1, a2 and a3.
But creating new array, returns new hash code. So all a0 to a3 are different.

If I want to compare against existing arrays' hash code, with matching content, this approach wont work.

The different story is for ints and vectors

Code (CSharp):

int3 i0 = 0 ;

int3 i1 = 0 ;

int3 i2 = 123 ;

Debug.Log ( i0.GetHashCode () ) ;

Debug.Log ( i1.GetHashCode () ) ;

Debug.Log ( i2.GetHashCode () ) ;

Code (CSharp):

Vector3 i3 = new Vector3 ( 1, 0, 0 ) ;

Vector3 i4 = new Vector3 ( 1, 0, 0 ) ;

Vector3 i5 = new Vector3 ( 123, 123, 123 ) ;

Debug.Log ( i3.GetHashCode () ) ;

Debug.Log ( i4.GetHashCode () ) ;

Debug.Log ( i5.GetHashCode () ) ;

Matching values of same data type, returning same hash code.

neoshaman · Nov 7, 2019

You probably need to create your own hash function then. You could also probably look at perfect hashing. just beware of https://en.wikipedia.org/wiki/Pigeonhole_principle

Also keep an eyes on kolmogorov complexity.

The truth is that teh best hash of an array is itself (concatenating the value), if you try to get to smaller range of value you are effectively doing compression.

Antypodish · Nov 8, 2019

Many thanks.
Looks like another challenge for me.

neoshaman said: ↑

The truth is that teh best hash of an array is itself (concatenating the value), if you try to get to smaller range of value you are effectively doing compression.
Click to expand...

Yep, I would like ideally compress it to int representation.
So for example if I have 1k elements of which each can take value 0 to 1k, that is 1mln minimum number of combinations that I need, but not greater than int itself.

Uniqueness is the major challenge I suppose.

neoshaman said: ↑

Also keep an eyes on kolmogorov complexity.
Click to expand...

This is interesting. Never heard of this term. Good example of fractal representation as algorithm, rather image, is on wiki.
https://en.wikipedia.org/wiki/Kolmogorov_complexity

neoshaman said: ↑

You probably need to create your own hash function then. You could also probably look at perfect hashing. just beware of https://en.wikipedia.org/wiki/Pigeonhole_principle
Click to expand...

I will look again into, but not tonight. I grasp concept in general, but I see more equations there. My brain is slowly shutting down already

JoeStrout · Nov 8, 2019

Antypodish said: ↑

That is correct.
Click to expand...

OK, then you should really just make a factory function that returns these, careful to always return the same object for the same set of values. And the rest of your code can just compare array references with ==.

For the rest I am a bit confused now. Or maybe we talking about different things.
Just in case, I made this simple example, where I use 4 new arrays, of 10 elements each.

Code (CSharp):

int [] a0 = new int [10] ;

int [] a1 = new int [10] ;

int [] a2 = new int [10] ;

int [] a3 = new int [10] ;

Debug.Log ( "A0: " + a0.GetHashCode () ) ;

Click to expand...

Ah, there's the problem. You're getting the hash of the array reference, not a hash of the array data. You might have thought arrays were smart enough to hash their data automatically, but they're not.

This SO question covers the same topic. Or you can just write your own hash function (to operate on the data, not the reference!).

Matching values of same data type, returning same hash code.
Click to expand...

That's true in all cases. Ints and vectors are value types. Array references are a reference type. Matching values of the same reference would return the same hash code... but you have different references, so they return different hash codes.

Antypodish · Nov 8, 2019

Thx @JoeStrout for a SO link.
My earlier conclusions from all post here are indeed, to create own hashing mechanics.

Great input from you guys.
Now I just need digest through all approaches and select best solution.

SparrowGS · Nov 8, 2019

Antypodish said: ↑

T
Now, what I would like to find out, if there is some way, in which I could apply same unique sum of elements concept as in binary, for array of 125 elements, rather 32 and where each element can take not 0 to 1, but 0 to 1023 for example.

Obviously, I can not use same principle as with binary to decimal conversion, as I would have not enough numbers, to cover all combinations.
Click to expand...

Why not? just use a 128 bit int and don't use 3 of the bits/use them for something else

edit: oh, i skipped over the more then 0/1 for value, nevermind me, lol.

Search Unity

Unity ID

Useful Searches

How to calculate combined unique sum of array elements?