Search Unity

Efficient way to look up non-primitive objects?

Discussion in 'Scripting' started by Ardenian, Nov 27, 2019.

  1. Ardenian

    Ardenian

    Joined:
    Dec 7, 2016
    Posts:
    313
    Background: In my project, I have a custom class called
    Tile
    representing a position in a grid. All these objects have a unique combination of coordinates, being
    X
    and
    Z
    .

    As part of my work, I now have a collection of tiles, objects of type
    Tile
    . For all these tiles, I have to efficiently index them somewhere to be able to check if a tile is present in the collection. The complexity of that should be better than the linear complexity O(n) because I have to use all of this in a very time-critical environment. Memory usage is secondary.

    Idea: From my studies, I know that trees are very efficient data structures for looking up something, as well as that hashing is something one can do to generate a pseudo-unique key, allowing to use something like
    Dictionary<TKey, TValue>
    .

    Problem: My problem is, however, that I don't know how to hash a complex object to get a pseudo-unique key. I have seen it plenty of times for value types like bool, int and string, but now I have to objects, a coordinate in
    X
    space and a coordinate in
    Z
    space. How do I efficiently (!) generate a key from that? Something like
    $"{tile.x}_{tile.y}"
    is not performant enough and thus not an option.

    My other problem concerns trees, if I do not use hashing and a dictionary. Since I have to coordinates, how do I index something into a binary search tree? I came up with something like
    MyBinaryTree<TValue, TNode>
    and
    MySimpleBinaryTree<TValue>
    and then combining these two into a field
    MyBinaryTree<int, MySimpleBinaryTree<int>> myLookUpStructure
    , with the first
    int
    being the
    X
    coordinate and the second
    int
    in the simple binary tree being the
    Z
    coordinate. However, not being very good with complexity, I think I would lose the O(log(n)) of a binary search tree if I do it like this.

    Has someone a suggestion regarding my problem and how I could solve this? Should I use hashing and a dictionary or a binary search tree? How could I solve the problems being mentioned above?
     
  2. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,536
    C#/runtime has a default hashing method that it uses for reference objects that creates a hash for you. You don't necessarily have to create your own algorithm if you only need "reference equality" level hashing.

    If you're doing a struct or ref type that has special uniqueness or overridden equality behaviour. Like how string is a ref type, but "abc" and "abc" should be equal and hash the same regardless of reference. Or like a vector of floats where you want fuzzy equality of their components to handle float error.

    Then you'll need to create your own method. Usually you'd make a hash out of what makes your object unique/equal. It sounds to me like it's your X and Z coordinates so you can do something like:

    Assuming X and Z are ints themselves.
    Code (csharp):
    1. return X ^ Z;
    (this one is fast, but not super unique if X and Z are always small numbers)

    Code (csharp):
    1. return (X << 16) | (Z & 65535);
    (slower, but this fills the high 16 bits with X, and the low 16 bits with Z, making it more unique)

    Both of these will be faster than creating a hash from a string of the 2.

    If they aren't ints, and instead are floats or some other numeric type... do something like this:
    Code (csharp):
    1. return X.GetHashCode() ^ Y.GetHashCode();
    And of course you can fill the high and lows with this again for even more uniqueness.

    Note, this does not deal with fuzzy equality for near floats. You'll have to add that layer if you need it.

    At the end of the day though you just need to fill an int with bit data that is repeatable for equal values.

    You're going to want to look into k-d tree's:
    https://en.wikipedia.org/wiki/K-d_tree
     
    ilmario likes this.
  3. AlexN2805

    AlexN2805

    Joined:
    Nov 27, 2019
    Posts:
    33
    I'd probably try this with a HashSet<Tile>. HashSet has the advantage that it provides high-performance operations. "Drawback" is that it can only store unique values, which in your case probably is what you need, so not much of a problem.

    The HashSet used to have the drawback that you had to iterate through the objets to find a specific item, but since Framework 4.7.2 it also supports the TryGetValue() function, so checking if a certain Tile is present in the set is an O(1) operation.

    EDIT: Added pseudocode example
    Code (CSharp):
    1. HashSet<Tile> tiles = new HashSet<Tile>();
    2.  
    3. // add tiles
    4. tiles.Add(new Tile(0,1));
    5. tiles.Add(new Tile(0,2));
    6. tiles.Add(new Tile(1,0));
    7. tiles.Add(new Tile(1,1));
    8.  
    9. // or add a collection of tiles at once to tiles
    10. Collection<Tiles> newTiles = new Collection<Tiles>();
    11. newTiles.Add(new Tile(2,0));
    12. newTiles.Add(new Tile(2,1));
    13. tiles.UnionWith(newTiles);
    14.  
    15. // check if tile is present
    16. Tile lookupTile = .... // probably some function that returns a tile.
    17. bool isPresent = tiles.TryGetValue(lookupTile);
     
    Last edited: Nov 27, 2019
    Ardenian likes this.
  4. Baste

    Baste

    Joined:
    Jan 24, 2013
    Posts:
    6,338
    That's not very helpful. I mean, yes, you should use a HashSet, but that uses the type's GetHashCode method, which is what OP is asking about how to implement.

    @lordofduct's got a pretty good explanation. If you want the easy version, your IDE probably has a function to generate a hash code function from your type's members. Those are generally pretty decent, and will generally incorporate a multiplication with a prime number to get a better hashing spread than the approach recommended by his lordship.
     
    ilmario, Ardenian and lordofduct like this.
  5. AlexN2805

    AlexN2805

    Joined:
    Nov 27, 2019
    Posts:
    33
    My understanding was that he doesn't know how to implement a hashcode for complex objects. By giving a HashSet he doesn't need to know how the hashcode for complex objects is implemented, because it's handled for him.

    I figured he doesn't want to know how to hash objects necessarily, he just wants them hashed. My bad if I'm wrong.
     
  6. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,536
    HashSet and Dictionary deal with hash's the same way. The only difference being the Dictionary associates the key with a value via the key's hash... and a HashSet is just a set of values by that value's hash.

    The automatically handles the hashing isn't done by the HashSet or the Dictionary. It's done by the framework, and that's why I said in my previous post:
    But if you need a custom hash where ref equality hashing isn't suitable. Then you have to write your own.

    That's why I continued with:
    We don't know OP's needs though.

    And as @Baste says:
    Yep, if you're in doubt about the quality of your algorithm, that'd probably be your best option.
     
    Ardenian likes this.
  7. Ardenian

    Ardenian

    Joined:
    Dec 7, 2016
    Posts:
    313
    Thank you for your responses, guys!
    I admit that I was a bit vague in my opening post, trying to reduce its content to only what is needed for my actual question.

    There are two parts, from my point of view. There is my general question about hashing, how you would do that for objects that aren't just primitive and distinguishable by their immediate value, which @lordofduct answered, thank you, as well as @Baste extended on, thank you as well.

    Then there is my other question which concerns basically "marking" something, optimzing a lookup for something in a collection, if something is present, if something is being a part of it. A naive rephrase of my question could be "How can I do this better:
    Dictionary<Tile, bool>
    ?" or something along these lines. This is where @AlexN2805 brought up HashSet, which appears to be exactly what I need, in addition to my knowledge gained from my first question, thank you as well.

    To conclude what I understand from your answers, please correct me if I am wrong, what I basically have to do is to override GetHashCode with my own implementation, which should be as unique as possible, using the tricks @lordofduct listed in his first response. Then, when checking for the existence of my object in a prticular collection or something around that, I use a HashSet, adding all my objects of that collection to it at one point and then use, for instance, TryGetValue to check if an object is in there, with great efficiency.

    Small question, off-topic, I browsed through the official documentation of HashSet and noted that it behaves a lot like a mathematical set, looking at the names of the different methods. Since I have to deal with mathematical sets in the near future, are HashSets a good data structure to realize that, to represent a collection of objects as a mathematical set? I basically need to do stuff like intersect and union, quite literal mathematical operations on mathematical sets as you know them from school.
     
  8. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,536
    You can override GetHashCode if that's way it should always hash.

    You can also create a IEqualityComparer and pass it into the HashSet:
    https://docs.microsoft.com/en-us/do...ric.iequalitycomparer-1?view=netframework-4.8

    The HashSet construtor that takes the comparer:
    https://docs.microsoft.com/en-us/do...em_Collections_Generic_IEqualityComparer__0__

    Cause at the end of the day HashSet (and dictionary) just use either the passed in equality comparer, OR the default equality coomparer which relies on the 'GetHashCode' override.

    Also if you want to just see if it is contained in the set, just use the 'Contains' method of HashSet.

    Though unrelated to that... I saw you talk about a Dictionary<Tile, bool>. This implies you have tile's that are true or false. Like... maybe it's "tile is walkable or not walkable" or something like that. This implies a set of all tiles regardless of that bool state, and you want to filter down to tiles that meet specific states.

    Why not just have a property on 'Tile' for that? Then you're not checking if "WalkableHashSet' contains, or Dictionary bool is true. You just check the property of the Tile? Cause at the end of the day you need a reference to the Tile to look it up in either the hashset or dictionary.

    I don't know your specific design, maybe you have a reason. I'm just wondering is all.

    Technically yes, HashSet can be used for similar behaviour like those mathematical operations.

    Though I don't know what you're doing those math operations on... so I can't actually say it's a perfect fit for your use case.

    Linq also has intersect and union and works on any ienumerable, rather than just HashSet.

    If you could be more specific about your needs (in both your OP and in relation to this math) we'd likely be able to give you a much better direction to head.
     
    Ardenian likes this.
  9. Ardenian

    Ardenian

    Joined:
    Dec 7, 2016
    Posts:
    313
    Thank you for coming back to me, @lordofduct!

    Thanks for bringing this to my attention, it looks like a better way in my situation. I have seen something like this, giving an object in the constructor, in code examples before, didn't believe that is actually something that is recommended to do, though it makes sense.

    I work on a random level generator, having a grid with tiles (imagine a chessboard). Each tile has a set of states that the tile can become (think of something like 'Sea', 'Mountain' and 'Forest'). When a tile is 'fixed', it selects a state of its set and becomes that state. Which state a tile can become, is defined over adjajency conditions (as in "A 'Mountain' tile may not be adjacent to a 'Sea' tile").

    When a tile becomes a fixed state, I recursively update all neighboring tiles, by calculating which states are still allowed since now a neighbor has a fixed state. If the set of states changes as a result of that, I again update all of those neighbors and so on, until no set of any tile changes anymore.

    However, this calculating and checking conditions is expensive, as it is called about 100k+ times per generation. Since it is already implicitely known what changes when a tile becomes a fixed state, I would like to mix mathematical sets and automatons. If I know that a 'Mountain' tile may not be adjacent to a 'Sea' tile, then I don't have to check my conditions to verify that again and again. Instead I create sets of states, with dependencies such as which neighboring tile with what fixed state of the orginal tile has to remove which states from its set of available states.
     
  10. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,536
    First things first I'd describe my world space. I'd likely do so as a 'Graph'... here is my simple IGraph interface:
    https://github.com/lordofduct/space....0/blob/master/SPPathfinding/Graphs/IGraph.cs

    In your case it's a GridGraph, here again is my simple grid graph implementation:
    https://github.com/lordofduct/space...blob/master/SPPathfinding/Graphs/GridGraph.cs

    I mainly suggest this for generalizing the access of the grid tile's and its neighbours. Also it means if you decide to ever change from a chessboard grid to say a hex grid, or even a generic graph of no distinct pattern; you just implement the IGraph interface.

    I'd then define a Tile which things like an enum for its state:

    Code (csharp):
    1.  
    2. public enum TileState
    3. {
    4.    Mountain,
    5.    Sea,
    6.    //etc...
    7. }
    8.  
    9. public class Tile
    10. {
    11.     public TileState State;
    12.     //other fields/properties that may be needed
    13. }
    14.  
    If this 'generation' is only done at load up we can ignore the garbage created by it since it's not frame rate dependent. Just clear your garbage after the level is generated (this is how I usually do it... avoiding garbage during level generation is near impossible anyways. Or rather just time consuming and not really necessary to waste your time on).

    So since we're not super concerned about garbage at this point... just use some linq.

    Code (csharp):
    1.  
    2. var allStates = GetAllKnownStates(); //all TileState's in a list that are available
    3. var availableStates = allStates.Except(mygraph.GetNeighbours(x,y).Select(o => o.State)).ToList(); //a list of states that aren't its neighbour's states
    4.  
     
  11. brigas

    brigas

    Joined:
    Oct 4, 2014
    Posts:
    522
    here is an implementation with a custom struct that is effective to use as a dictionary key

    Code (CSharp):
    1.  public struct KeyHashableVector2Int : IEquatable<KeyHashableVector2Int>
    2.     {
    3.  
    4.         public int X;
    5.         public int Y;
    6.  
    7.         public KeyHashableVector2Int(int x, int y)
    8.         {
    9.             X = x;
    10.             Y = y;
    11.         }
    12.  
    13.         public override string ToString() => nameof(KeyHashableVector2Int);
    14.  
    15.         public override bool Equals(object obj) => obj is KeyHashableVector2Int other && Equals(other);
    16.  
    17.         public bool Equals(KeyHashableVector2Int other)
    18.         {
    19.             return X == other.X &&
    20.                    Y == other.Y;
    21.         }
    22.  
    23.         public override int GetHashCode()
    24.         {
    25.             unchecked
    26.             {
    27.                 int hash = 17;
    28.                 hash = hash * 23 + X.GetHashCode();
    29.                 hash = hash * 23 + Y.GetHashCode();
    30.                 return hash;
    31.             }
    32.         }
    33.     }
    34.    
    to use this you create it like a vector 2 and then use it as a dictionary key like:
    Code (CSharp):
    1. KeyHashableVector2Int key = new KeyHashableVector2Int( 2,2 );
    2.  
    Code (CSharp):
    1. dictionary<KeyHashableVector2Int, node> dict;
    to my knowledge this is the best implementation for hashable structs, or close to it
     
  12. eses

    eses

    Joined:
    Feb 26, 2013
    Posts:
    2,637
    Hi @Ardenian

    I don't know what exactly you are after, but sounds like you only need to store and get tiles from a grid?

    "...having a grid with tiles (imagine a chessboard)"

    Why do you need a hash - could you just do something like this:

    Code (CSharp):
    1. public Dictionary<(int, int), TestTile> tiles = new Dictionary<(int, int), TestTile>();

    Then you can add tile at position:

    Code (CSharp):
    1. var newTile = new TestTile() { x = 2, y = 2 };
    2. tiles.Add((newTile.x, newTile.y), newTile);

    And you can also find them with their x and y coordinates:

    Code (CSharp):
    1. TestTile foundTile = null;
    2. tiles.TryGetValue((position.x, position.y), out foundTile);
    3. // if foundTile != null do stuff

    "When a tile becomes a fixed state, I recursively update all neighboring tiles"

    I think you usually only need to update 8/4 adjacent tiles, not recursively, if you are using bitmask kind of rules to match a set of tiles so that they connect somehow visually?

    I might be wrong.
     
    ilmario likes this.
  13. Ardenian

    Ardenian

    Joined:
    Dec 7, 2016
    Posts:
    313
    @lordofduct, thanks for the links, the framework looks very interesting. I am going to dig a bit through and see what I can take over, learning from it. Although I know the overwhelmingly usefulness of interfaces in C-Sharp, I have stayed away from them in Unity, since the vanilla editor in Unity does not support anything related to interfaces in the inspector, last time that I checked. There are some assets that introduce that functionality, but they usually require a sacrifice like using your own derived definition of MonoBehavior, which is not something that I do lightly because it adds a dependency to a third-party asset.

    I did this for some sub-states that I excluded from my problem descriptions in this thread, for stuff like tile rotation and tile symmetry, but decided against it for tile type state like "Mountain" or "Sea", to have scalability in my project. There is data related to each tile state (dunno, think of something like related prefab objects, text style and whatsoever) and since I do have to wrap that anyways, I decided to make tile states a
    ScriptableObject
    . Instead of a list with constant enumeration values, I have a list with references to tile states, being
    ScriptableObjects
    . Not sure if there is a drawback to that.
    Since this collection is a list of possible values, one could use an enumeration that is treated as a bit field instead, using FlagsAttribute. This is a highly human-readable solution for having a collection of possible values, instead of a list of constant enumeration values. Unfortunately, working with it is very slow, performance-wise, see for instance this thread on Stackoverflow: What is it that makes Enum.HasFlag so slow? I actually had to remove it because it was just too slow, up to 1500% slower than using a list of constant enumeration values, which is similar to the 1600% mentioned in that thread.
    Right now, it is editor-only, outside runtime and Play Mode, so garbage collection is indeed less of a problem, though this certainly will become a problem for me in the future, because I cause a lot of garbage there, never having optimized towards memory.

    Thank you, that is neat, I use an almost identical code integrated into my class definition, nice idea to seperate it this way, since it is resuable everywhere now and something like coordinates is likely to popping up again elsewhere.

    I am not very familiar with using Tuples, but doesn't a Dictionary<TKey, TValue> always hashes the key? Seeing the official article from Microsoft:
    So even when using a Tuple, you end up with something getting hashed. With your solution one can decouple a tile and its position, which could be helpful in some cases.

    My problem is not, however, that I don't need need to access all tiles of my grid, but very few of a selected true subset. Nonetheless your solution works there, however, Tuple don't look ver appealing to me, because you have to create a new object every time (the Tuple) when you try to access a tile in the dictionary and then get its hash code, don't you?
     
  14. condev1224

    condev1224

    Joined:
    Nov 20, 2019
    Posts:
    2
    I think you can use Vector2Int as the key. It has an implementation of GetHashCode which guarantess uniqueness (I think?)
     
  15. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,536
    Here is the GetHashCode for Vector2Int:
    Code (csharp):
    1.  
    2. public override int GetHashCode()
    3. {
    4. return x.GetHashCode() ^ (y.GetHashCode() << 2);
    5. }
    6.  
    https://github.com/Unity-Technologi...master/Runtime/Export/Math/Vector2Int.cs#L228

    No, it's not unique. The positions <1, 1> and <5,0> would generate the same hashcode. In this case 5.

    1 ^ (1 << 2) = 1 ^ 100 = 101 = 5
    101 ^ (0 << 2) = 101 ^ 0 = 101 = 5

    It's relatively easy to predict it's non-unique just from the fact that GetHashCode returns an 'int' which has 32-bits of information. While a Vector2Int consists of 2 ints, so 64-bits of information. You have to lose something to shrink 64 to 32, and in doing so you're going to create collisions.
     
    ilmario and Ardenian like this.