Question How to randomize a HashSet?

maxreee · Jan 29, 2023

so im making this game where the player has a category and the enemies/obstacles appear with a word that may or may not be in that category the player must run into the one that matches their category and earn points. and if they run into one that doesn't they lose a life or points

anywhay the way i was doing this is by having the enemy grab a word from a list
using this

Code (CSharp):

int setIndex = Random.Range(0, vocab.keywordSets.Count - 1);

int wordIndex = Random.Range(0, vocab.keywordSets[setIndex].Count - 1);

string randomWord = vocab.keywordSets[setIndex][wordIndex];

enemyText.text = (randomWord);

and i shuffled that list using this code

Code (CSharp):

public void Shuffle()

{

keywordset1.

for (int i = 0; i < keywordset1.Count; i++)

{

string temp = keywordset1[i];

int randomIndex = Random.Range(i, keywordset1.Count);

keywordset1[i] = keywordset1[randomIndex];

keywordset1[randomIndex] = temp;

}

for (int i = 0; i < keywordset2.Count; i++)

{

string temp = keywordset2[i];

int randomIndex = Random.Range(i, keywordset2.Count);

keywordset2[i] = keywordset2[randomIndex];

keywordset2[randomIndex] = temp;

}

for (int i = 0; i < keywordset3.Count; i++)

{

string temp = keywordset3[i];

int randomIndex = Random.Range(i, keywordset3.Count);

keywordset3[i] = keywordset3[randomIndex];

keywordset3[randomIndex] = temp;

}

for (int i = 0; i < keywordset4.Count; i++)

{

string temp = keywordset4[i];

int randomIndex = Random.Range(i, keywordset4.Count);

keywordset4[i] = keywordset4[randomIndex];

keywordset4[randomIndex] = temp;

}

for (int i = 0; i < keywordset5.Count; i++)

{

string temp = keywordset5[i];

int randomIndex = Random.Range(i, keywordset5.Count);

keywordset5[i] = keywordset5[randomIndex];

keywordset5[randomIndex] = temp;

}

for (int i = 0; i < keywordSets.Count; i++)

{

List<string> temp = keywordSets[i];

int randomIndex = Random.Range(i, keywordSets.Count);

keywordSets[i] = keywordSets[randomIndex];

keywordSets[randomIndex] = temp;

}

}

but then i got some advice that using a hashset would be more effecient for big list like the one i have
so i decided to switch to it
and then i got hit with a "Cannot apply indexing with [] to an expression of type 'Hashset<string>'

so i cant use the same methods for the hashset like i did with the list
so how do i remedy/shuffle a hashset?

spiney199 · Jan 29, 2023

You should read the C# docs on a Hashset:

The HashSet<T> class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
Click to expand...

Emphasis on the 'no particular order' part.

You should also just look at the docs of Hashset which shows you it's available methods: https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1?view=net-7.0

Long and short is, don't use a hash set as you'll be in way over your head for your experience level, and I don't think it's especially necessary here. This is also a case of premature optimisation: ergo, don't do it.

maxreee · Jan 29, 2023

spiney199 said: ↑

Long and short is, don't use a hash set as you'll be in way over your head for your experience level, and I don't think it's especially necessary here. This is also a case of premature optimization: ergo, don't do it.
Click to expand...

ah alright i guess i've gone a bit over my head with this one
maybe using list wont lag my game that much and i was just a bit too cautious
thanks for the pointer

halley · Jan 29, 2023

Every data structure has its own tradeoffs. They're usually awesome at one or two, but relatively suck at the others.

* how fast is adding elements?
* how fast is removing a particular element by index or value?
* how fast is finding a particular element by value?
* how fast is adding or removing convenient elements (first, or last)?
* do they keep elements in order of insertion, or order of value, or neither?

A List is O(1) for adding or removing elements at the end, O(n) for inserting or removing elements anywhere else, O(n) for finding an element by value, keeps things in a stable order as the caller adds or inserts them, and can have duplicate element values.

A HashSet is O(1) for adding or removing elements, O(1) for finding or removing elements by value, does not keep them in any particular defined order, and cannot have duplicate element values.

When the documentation says "in no particular order" it more accurately means "the order is not defined as far as the caller is concerned, the items are kept in whatever scheme is convenient internally." It does NOT mean they're in random order, it just means that they are not in an order you can rely on understanding.

If you really really really need the O(1) performance of a HashSet for adding and removing items, but you can put up with an O(n) performance for accessing or removing a randomly chosen one, then you can overcome that with your own pick-an-element code.

Pseudocode (not C# code):

index = Random Number in [0, set.Count);
while index > 0

element = set.Next
index--
set.Remove element

That surely sounds a lot like HashSet is faster than List, right? Well, not necessarily. It really depends a lot on the actual number of elements you have in the collections. A List might have better O(n) performance for lists below 100 elements, than a HashSet with its O(1) performance. It depends a lot on the compiler and the quality of the library code. The only way to know is to measure with your exact use case.

@spiney199 is correct: it's premature optimization most of the time. Don't focus on fixing the performance of List vs HashSet until you have MEASURED it with the profiling tools available, and have discovered that THIS EXACT ROUTINE is your worst performance headache.

lordofduct · Jan 29, 2023

HashSet - an unordered collection where every entry can exist only once (and null is not a valid entry)
List - an ordered collection where every entry exists at a specific index and can exist at multiple indices at once

The idea of a HashSet performing better than a List comes from the idea that a HashSet has more often an O(1) complexity to interact with.

HashSet is O(1) for:
Contains, Add, Remove

But is O(n) for:
Random Access

List on the other hand is O(n) for:
Contains, Remove, Insertion (note insertion doesn't make sense in HashSet since it's not ordered)

But is O(1) for:
Random Access, Add

What does O(1) and O(n) mean?

Well think of it like this... if you want to see if a list contains something. What do you think is the algorithm used? Well, you just loop over every entry in the list and you check if it matches what you're looking for. This means at most you'll need to compare EVERY element in the list (if it's the last entry). That's what the 'n' in O(n) is saying. It's saying its complexity is at most the size of the collection 'n'.

HashSet though, it doesn't have to loop everything. A HashSet, as the name implies, hashes objects when you add them. It basically calls "GetHashCode" on the object, and then uses that value to calculate a slot in itself. This means to test if it contains, it just hashes what you asked to see if it contains, and then looks in that slot. If it matches... then yes. This means it only has to do 1 task, and that's why it's O(1).

BUT... that doesn't mean these tasks are equal in calculation time. The act of calculating a hash takes time. If you have a list of 1 entry, and a HashSet of 1 entry. The List only has to compare 1 thing. The HashSet has to calculate the hash, find the slot, then compare. That's actually more work. HashSet's are slower for small collections. Lists are slower for large collections.

Personally... I don't consider the use of a HashSet in performance accept for extreme situations (like managing very large collections). When I used HashSet for small collections I'm using not for performance, but for a very specific feature of a HashSet. That being it only allows 1 entry of given identity.

What I mean is that a HashSet<int> can only have one value of '5' in it. A HashSet<GameObject> can only have my GameObject(Player) in it once. This is great if say I want to track some entities that enters a trigger box. An entity/rigidbody might be made up of multiple Colliders causing multiple OnTriggerEnter events for the same entity. By using a HashSet<T> of some type that represents my entity (say a Rigidbody, or an explicity Entity script). I can add it to the HashSet without need to check if it Contains it first which I would have to do in List since I can add the same entity twice.

Note this has NOTHING to do with performance. But rather the feature of how it behaves.

...

Also, in a tangent, back at your code. If we want to talk about performance/cleanliness of code.

2 things...

1) you're creating a new List every time you want to shuffle. That's wasteful. Why not reuse the existing List?
2) you've duplicated your code over and over, why not generalize that?

Here I'll use an extension class to generalize a shuffle method for List:

Code (csharp):

public static class ListExtensions

{

public static void Shuffle<T>(this IList<T> lst)

{

int j;

T temp;

int cnt = lst.Count;

for (int i = 0; i < cnt - 1; i++)

{

j = Random.Range(i, cnt);

temp = lst[j];

lst[j] = lst[i];

lst[i] = temp;

}

}

}

And your code above becomes:

Code (csharp):

public void Shuff()

{

keywordset1.Shuffle();

keywordset2.Shuffle();

keywordset3.Shuffle();

keywordset4.Shuffle();

keywordset5.Shuffle();

}

From what I'm reading of other people's response is that you're in the process of learning to code right now.

And I'd argue learning to write reusable code is probably more important at this point than prematurely optimizing your code.

And this example I just gave is a reusable code example.

maxreee · Jan 29, 2023

Thank you all for replying
ill keep what you said in mind

Search Unity

Unity ID

Useful Searches

Question How to randomize a HashSet?

maxreee

spiney199

maxreee

halley

lordofduct

maxreee