Search Unity

Animator StringToHash() performance, collisions?

Discussion in 'Scripting' started by Shushustorm, Feb 6, 2018.

  1. Shushustorm

    Shushustorm

    Joined:
    Jan 6, 2014
    Posts:
    1,084
    Hey everyone!

    I know there is StringToHash, but is there a way to access clips as a consistent array so that Unity doesn't have to compare potentially a lot of strings (or ints) when looking for animations? Or does StringToHash do exactly that by itself?

    Also, may there be two results of StringToHash that are the same? Should I check for collisions manually or does Unity check for this and return an alternate hash?

    Best wishes,
    Shu
     
  2. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    The term 'hash' in regards to the Animator is referring to a hash value associated with an entry in a hash table.

    Essentially an algorithm is defined for generating a hash value. Given an object (in this case a string/animation state) has a hash value calculated for it. Then when placed into the table, the index at which it is placed is determined by the hash (usually as the modulo of the hash with the length of the table, where said length is a prime number to reduce collisions). This allows for fast look up since the index can be determined from the hash (collisions can occur depending the algorithm, in which case buckets are used to determine equality... but the hash reduces the number of equality tests down significantly. Instead of testing all entries, you only test those whose modulo/index equals that of the modulo of another object's hash).

    A simple hash could be as basic as the index in said table/array. In which case buckets aren't even necessary.

    It could be more complex, such as the hash code generated by 'GetHashCode' on all .net objects. Which isn't guaranteed to be unique, and buckets are in use, such as with HashSet<T> and Dictionary<T1,T2>.

    In the end though... unity's algorithm is defined to be unique (as far as I know). It is speedier to look up by hash than by string. So what you should do is in your scripts, on 'Start' you get the hash for your string name of the animation state. And then you always call to play an animation by the hash.

    This will significantly reduce the overhead of looking up which animation you're looking for. Since the string would have to be either matched, or the hash calculated for the string. Where as if you have the hash pre-calculated, that overhead is gone.

    Here are articles about how hash tables work:
    https://en.wikipedia.org/wiki/Hash_table
    https://www.hackerearth.com/practice/data-structures/hash-tables/basics-of-hash-tables/tutorial/

    Note I'm basing my assumption of the hash value being unique on this video:
    https://unity3d.com/learn/tutorials/topics/animation/animator-scripting

    Dude man says it's unique around 3:45.
     
    Last edited: Feb 6, 2018
  3. Shushustorm

    Shushustorm

    Joined:
    Jan 6, 2014
    Posts:
    1,084
    @lordofduct
    Thanks for your detailed answer!
    Concerning performance, it does seem to be the way to go!

    About the hash being unique:
    What confuses me is that it's not mentioned in the docs for StringToHash() ( https://docs.unity3d.com/ScriptReference/Animator.StringToHash.html ), whereas the docs for Shader.PropertyToID() ( https://docs.unity3d.com/ScriptReference/Shader.PropertyToID.html ) say the ID is unique.
    Maybe I will run some tests to check for collision, just in case.
    Maybe like this?:

    CheckForCollisions.cs
    Code (CSharp):
    1. using UnityEngine;
    2. using System.Collections;
    3. using System.Collections.Generic;
    4.  
    5. public class CheckForCollisions : MonoBehaviour {
    6.  
    7. string[] alphabet = {
    8.     "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"
    9. };
    10. string[] randomNames = new string[10000];
    11. int[] randomNamesHashes = new int[10000];
    12.  
    13. IEnumerator Start () {
    14.     while (true) {
    15.         RandomNames_Setup();
    16.         SetHashes();
    17.         CheckForCollisions_();
    18.         Debug.Log("10000 done");
    19.         yield return new WaitForSeconds(0.1f);
    20.     }
    21. }
    22.  
    23. void RandomNames_Setup () {
    24.     for (int i = 0; i < randomNames.Length; i++) {randomNames[i] = "";};
    25.     int nameLength = 0;
    26.     int letterNumber = 0;
    27.     for (int i = 0; i < randomNames.Length; i++) {
    28.         nameLength = Random.Range(5,20);
    29.         for (int j = 0; j < nameLength; j++) {
    30.             letterNumber = Random.Range(0,alphabet.Length);
    31.             randomNames[i] += alphabet[letterNumber];
    32.         }
    33.     }
    34. }
    35.  
    36. void SetHashes () {
    37.     for (int i = 0; i < randomNamesHashes.Length; i++) {
    38.         randomNamesHashes[i] = Animator.StringToHash(randomNames[i]);
    39.     }
    40. }
    41.  
    42. void CheckForCollisions_ () {
    43.     for (int i = 0; i < randomNamesHashes.Length; i++) {
    44.         for (int j = 0; j < randomNamesHashes.Length; j++) {
    45.             if (randomNamesHashes[i] == randomNamesHashes[j]) {
    46.                 if (i != j) {
    47.                     Debug.Log("COLLISION");
    48.                 }
    49.             }
    50.         }
    51.     }
    52. }
    53.  
    54. }
    55.  
     
    Last edited: Feb 6, 2018
  4. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    Unity's docs can be vague some times.

    It gets really annoying.

    The thing is... unity makes StringToHash available and suggests you use it. If it caused collisions, they wouldn't make it available (not that HashSet and Dictionary from microsoft don't have a way to access by the hash... because they know it collides, that would be bad design).

    With that said... who knows... unity forgets to document stuff sometimes. And they've made bad api design chocies in the past. They might have made a dumb mistake this time around and offered up a feature that might cause collisions. And maybe those odds are just like 1 in 4 billion or something.

    Thing is... I know lots of people use StringToHash, and I've NEVER heard of a complaint about it colliding anywhere.

    You can run some tests... but if there are collisions, you may not even find them.
     
    iileychen, ModLunar and Shushustorm like this.
  5. Shushustorm

    Shushustorm

    Joined:
    Jan 6, 2014
    Posts:
    1,084
    Alright, I am making the script run right now. (I edited my last post, I am running the script there on an empty GameObject in a new scene.)
    So far, 100 iterations and, indeed, 2 collisions. Thats 1/500,000.
    However, I don't know if the script I wrote is the right way to check.
     
  6. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    The script you wrote can create identical strings multiple times, since you're using Random to create it. Random can create non-unique sequences all the time.

    Give me a sec and I'll create a script that would be more accurate.
     
    Shushustorm likes this.
  7. Shushustorm

    Shushustorm

    Joined:
    Jan 6, 2014
    Posts:
    1,084
    You're right! Would this be a solution?
    Code (CSharp):
    1. using UnityEngine;
    2. using System.Collections;
    3. using System.Collections.Generic;
    4.  
    5. public class CheckForCollisions : MonoBehaviour {
    6.  
    7. string[] alphabet = {
    8.     "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"
    9. };
    10. string[] randomNames = new string[10000];
    11. int[] randomNamesHashes = new int[10000];
    12.  
    13. IEnumerator Start () {
    14.     while (true) {
    15.         RandomNames_Setup();
    16.         SetHashes();
    17.         CheckForCollisions_();
    18.         Debug.Log("10000 done");
    19.         yield return null;
    20.     }
    21. }
    22.  
    23. void RandomNames_Setup () {
    24.     for (int i = 0; i < randomNames.Length; i++) {randomNames[i] = "";};
    25.     int nameLength = 0;
    26.     int letterNumber = 0;
    27.     for (int i = 0; i < randomNames.Length; i++) {
    28.         nameLength = Random.Range(5,20);
    29.         for (int j = 0; j < nameLength; j++) {
    30.             letterNumber = Random.Range(0,alphabet.Length);
    31.             randomNames[i] += alphabet[letterNumber];
    32.         }
    33.     }
    34.     for (int i = 0; i < randomNames.Length; i++) {
    35.         for (int j = 0; j < randomNames.Length; j++) {
    36.             if (randomNames[i] == randomNames[j]) {
    37.                 if (i != j) {
    38.                     Debug.Log("NAMES DOUBLE");
    39.                 }
    40.             }
    41.         }
    42.     }
    43. }
    44.  
    45. void SetHashes () {
    46.     for (int i = 0; i < randomNamesHashes.Length; i++) {
    47.         randomNamesHashes[i] = Animator.StringToHash(randomNames[i]);
    48.     }
    49. }
    50.  
    51. void CheckForCollisions_ () {
    52.     for (int i = 0; i < randomNamesHashes.Length; i++) {
    53.         for (int j = 0; j < randomNamesHashes.Length; j++) {
    54.             if (randomNamesHashes[i] == randomNamesHashes[j]) {
    55.                 if (i != j) {
    56.                     Debug.Log("COLLISION");
    57.                 }
    58.             }
    59.         }
    60.     }
    61. }
    62.  
    63. }
    64.  
    65.  
    If there are more "COLLISION" than "NAMES DOUBLE", it's a collision due to StringToHash()?
     
    Last edited: Feb 6, 2018
  8. methos5k

    methos5k

    Joined:
    Aug 3, 2015
    Posts:
    8,712
    Just adding my 2 cents -- as already mentioned by @lordofduct , it's quite possible that there are collisions, but they are handled in buckets. So, they are still resolved very quickly, especially if you have the hash already stored. It's not going to be bothering you :)
     
    Shushustorm likes this.
  9. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    Here you go:

    Code (csharp):
    1.  
    2. using System.Collections;
    3. using System.Collections.Generic;
    4. using UnityEngine;
    5.  
    6. public class zTest02 : MonoBehaviour {
    7.  
    8.  
    9.    
    10.     private Dictionary<int, string> _words = new Dictionary<int, string>();
    11.  
    12.     void Start ()
    13.     {
    14.         System.Threading.ThreadPool.QueueUserWorkItem(this.DoWork);
    15.     }
    16.  
    17.     private void DoWork(object token)
    18.     {
    19.         const int NUM_OF_WORDS = 10000000;
    20.  
    21.         int collisions = 0;
    22.         for(int i = 0; i < NUM_OF_WORDS; i++)
    23.         {
    24.             int j = i;
    25.             string word = string.Empty;
    26.             while(j > 62)
    27.             {
    28.                 word += IntToChar(j % 63);
    29.                 j /= 63;
    30.             }
    31.  
    32.             word += IntToChar(j);
    33.  
    34.             int hash = Animator.StringToHash(word);
    35.             if(_words.ContainsKey(hash))
    36.             {
    37.                 collisions++;
    38.                 Debug.Log("COLLISION: " + _words[hash] + " : " + word);
    39.             }
    40.             else
    41.             {
    42.                 _words[hash] = word;
    43.             }
    44.         }
    45.  
    46.         Debug.Log("COUNT: " + collisions.ToString() + " : " + NUM_OF_WORDS.ToString());
    47.         Debug.Log("ODDS: 1 : " + (collisions > 0 ? (NUM_OF_WORDS / collisions).ToString() : "inf"));
    48.         Debug.Log("RATIO: " + (collisions / NUM_OF_WORDS).ToString());
    49.     }
    50.  
    51.     public static char IntToChar(int i)
    52.     {
    53.         if(i < 26)
    54.         {
    55.             return (char)(i + 65); //upper
    56.         }
    57.         else if(i < 52)
    58.         {
    59.             return (char)(i + 71); //lower
    60.         }
    61.         else if(i < 62)
    62.         {
    63.             return (char)(i - 4); //numerics
    64.         }
    65.         else
    66.         {
    67.             return '.';
    68.         }
    69.     }
    70.  
    71. }
    72.  
    I just ran it at 10 million words, and I got 0 collisions.

    Note, I ran only alpha numeric values and the period '.'. This is because these are really the only characters that should show up as state/parameter names.

    Also note, when I change my script to include other characters... collisions start occurring. I did run it with out numerics and I got a few collisions on names that had repeated periods in a row; "....a" for example.

    I didn't run for more, because it takes a while. But to me it seems they try to avoid collisions within the allowed character set. I think the repeated periods caused issues because really you shouldn't have that (periods relate to nested states... so you should always have a name in between periods, not ....).

    This says to me there 'might be collisions. But odds are slim. As long as you stay within the approved naming schema.
     
  10. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    OK....

    I just ran just alpha-numeric (no periods, to rule out the .... clashing).

    I did 2,147,483,647 words (that's int.MaxValue).

    2 billion words.

    I'm getting collisions right now... it's still running (going to take a while. I'm currently at 10 gigs of memory consumed... DO NOT RUN THIS unless you have a lot of memory in your machine. I have 64 gigs of memory, so I can do it).

    They're still not super common though.

    When it's done I'll post the odds.
     
  11. methos5k

    methos5k

    Joined:
    Aug 3, 2015
    Posts:
    8,712
    lol
     
    ModLunar likes this.
  12. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    Were not talking about bucket collisions.

    We're talking about 2 identical hashes calculating the same for 2 different names from StringToHash.

    This sort of collision could cause an issue in Animator.

    What I hope to do is after running this, getting clashing string (they seem to calculate the same from run to run). And see what happens if you pass in the hash to Animator.
     
    BrainSlugs83 and Shushustorm like this.
  13. Shushustorm

    Shushustorm

    Joined:
    Jan 6, 2014
    Posts:
    1,084
    I just updated the script in my last post so that it actually runs.
    205 iterations of this, 10 NAMES DOUBLE, 14 COLLISION.
    So 4/2,050,000 chance of collision for this script?

    Thanks for testing as well! Interesting to see that it seems to depend on the values used for the names.
    Personally, I use "_" a lot. Something like "Throw_Main_Heavy1."
    If chances were at about 1/500000, and I use 200 animations, is the collision chance at 500000/200% = 1/2500 for each time starting the scene?

    Wouldn't Unity just use one of the two animations then?
     
    Last edited: Feb 6, 2018
  14. methos5k

    methos5k

    Joined:
    Aug 3, 2015
    Posts:
    8,712
    Yes, I stand corrected. I tried to google this, but came up short. Now I'm curious what will happen, too.
     
  15. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    We don't know that...

    Note, I plan to stick those names into an animation controller. There's several things that could happen.

    1) it just plays 1 of the 2, becuase it just finds the first match.

    2) unity could freak out if you put in a name that hashes the same as another. Causing either a warning message that your names are not valid, or even crashing unity, or who knows what else.

    3) unity's algorithm may rely on known state names. And by adding entries to an animation controller adjusts the algorithm, rectifying for the collision.

    We don't know... note, we've been calling 'StringToHash' on names that don't exist in known animation controllers. So... we gotta see what happens.

    ...

    Also, I'm making pulled pork right now... so I may be spurratic.

    I am interested in seeing where this goes too. Cause yeah, I googled early on as well and noticed no information on it.
     
  16. methos5k

    methos5k

    Joined:
    Aug 3, 2015
    Posts:
    8,712
    That's what I was thinking. I'm hoping it's #2 or #3 (with #2 being a warning not a crash lol). At least those are workable solutions. #1 doesn't sound great.
     
  17. Shushustorm

    Shushustorm

    Joined:
    Jan 6, 2014
    Posts:
    1,084
    I changed the script to test on 200 strings at once and added some characters to the alphabet, which is more like a realistic scenario.

    Code (CSharp):
    1. using UnityEngine;
    2. using System.Collections;
    3. using System.Collections.Generic;
    4.  
    5. public class CheckForCollisions : MonoBehaviour {
    6.  
    7. string[] alphabet = {
    8.     "a","b","c","d","e","f","g","h","i","j","k","l","m",
    9.     "n","o","p","q","r","s","t","u","v","w","x","y","z",
    10.     "0","1","2","3","4","5","6","7","8","9","_","."
    11. };
    12. string[] randomNames = new string[200];
    13. int[] randomNamesHashes = new int[200];
    14.  
    15. int iterations = 0;
    16. IEnumerator Start () {
    17.     StartCoroutine(UpdateState_Iterations());
    18.     while (true) {
    19.         RandomNames_Setup();
    20.         SetHashes();
    21.         CheckForCollisions_();
    22.         iterations++;
    23.         yield return null;
    24.     }
    25. }
    26.  
    27. IEnumerator UpdateState_Iterations () {
    28.     YieldInstruction wait1 = new WaitForSeconds(60.0f);
    29.     while (true) {
    30.         Debug.Log("200 done: " + iterations.ToString());
    31.         yield return wait1;
    32.     }
    33. }
    34.  
    35. void RandomNames_Setup () {
    36.     for (int i = 0; i < randomNames.Length; i++) {randomNames[i] = "";};
    37.     int nameLength = 0;
    38.     int letterNumber = 0;
    39.     for (int i = 0; i < randomNames.Length; i++) {
    40.         nameLength = Random.Range(5,20);
    41.         for (int j = 0; j < nameLength; j++) {
    42.             letterNumber = Random.Range(0,alphabet.Length);
    43.             randomNames[i] += alphabet[letterNumber];
    44.         }
    45.     }
    46.     for (int i = 0; i < randomNames.Length; i++) {
    47.         for (int j = 0; j < randomNames.Length; j++) {
    48.             if (randomNames[i] == randomNames[j]) {
    49.                 if (i != j) {
    50.                     Debug.Log("NAMES DOUBLE");
    51.                 }
    52.             }
    53.         }
    54.     }
    55. }
    56.  
    57. void SetHashes () {
    58.     for (int i = 0; i < randomNamesHashes.Length; i++) {
    59.         randomNamesHashes[i] = Animator.StringToHash(randomNames[i]);
    60.     }
    61. }
    62.  
    63. void CheckForCollisions_ () {
    64.     for (int i = 0; i < randomNamesHashes.Length; i++) {
    65.         for (int j = 0; j < randomNamesHashes.Length; j++) {
    66.             if (randomNamesHashes[i] == randomNamesHashes[j]) {
    67.                 if (i != j) {
    68.                     Debug.Log("COLLISION");
    69.                 }
    70.             }
    71.         }
    72.     }
    73. }
    74.  
    75. }
    I have to go now, but I will probably let this run tomorrow for some time.
    Maybe even for Shader.PropertyToID.
     
  18. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    So unfortunately it is case 1.

    Just tested it... and yeah... it just plays the first one with that hash.

    Not really cool I must say.

    But honestly... I'm not super surprised with Unity's track record.
     
  19. methos5k

    methos5k

    Joined:
    Aug 3, 2015
    Posts:
    8,712
    Well, your work has provided some insight, on the off chance someone ever gets stuck with an animation that never plays. :)
     
    BrainSlugs83 likes this.
  20. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    So last night I'm laying in bed.

    And I'm like, "wait, if they're using the hash to get the index in a table... that means calling 'Play' by the state name just calls StringToHash for us. Do they not have buckets and not check equality? Are they assuming uniqueness???"

    So this morning I tested.

    Named 2 states values that evaluate to the same hash.

    And sure enough.... it only plays the first of the 2 states when calling by string.

    So this problem technically effects even calling it by strings.

    So really... caching the StringToHash is still preferred because the issue still arises for both.

    Not cool.

    I'd consider this a bug.
     
  21. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    Also... pulled pork came out amazing!
     
  22. methos5k

    methos5k

    Joined:
    Aug 3, 2015
    Posts:
    8,712
    You can submit it as a bug, and see what they say? :)
     
  23. Baste

    Baste

    Joined:
    Jan 24, 2013
    Posts:
    6,332
    StringToHash is also used for SetBool, SetTrigger, SetFloat, etc. So if you want to save some milliseconds-fractions, you can make a global cache of the hash results and use that instead of the strings. With a lot of animators in the scene I've seen StringToHash be a not insignificant part of the frametime.

    Note that doing that causes error messages for missing parameters to show the hash value instead of the string, so it's got some downsides when developing.
     
    Shushustorm likes this.
  24. Shushustorm

    Shushustorm

    Joined:
    Jan 6, 2014
    Posts:
    1,084
    Collision seems to happen with Shader.PropertyToID as well, I will file a bug report.
     
  25. Shushustorm

    Shushustorm

    Joined:
    Jan 6, 2014
    Posts:
    1,084
    The developers say, for Animator.StringToHash(), this is "as intended", because CRC32 is used, which seems to collide.
    Shader.PropertyToID() is not supposed to collide, since it should use an incrementing value.
     
  26. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    Well "as intended" means that it's not usable since it means you can't actually reliably call into Animator via the hash to do anything.

    What's the point of StringToHash in that regard then?

    Sure it's intended in that that engineering choices made led to the fact that this would occur. But that's like saying the leaky dam leaks as intended because we used perforated metal to build the dam. Since perferated metal leaks, then the dam is leaking as intended. Sure... but it's not a very good dam... as intended?
     
  27. Baste

    Baste

    Joined:
    Jan 24, 2013
    Posts:
    6,332
    There are going to be collisions for any string-to-int hash. There are more strings than ints.

    So the bug isn't that there's collisions, it's that the Animator doesn't handle collisions. Adding collision handling would slow down the lookups, though. So unless the algorithm gives collisions for names that are used for animation names, it's not a problem?

    There should probably be an editor time warning/error that there are collisions in the parameter list, since that would be a free check.
     
    Shushustorm and DonLoquacious like this.
  28. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    Yes, there are chances for collisions as you say just due to the sheer numbers (though I can think of hashing algorithms that would always be unique upto N entries. Such as an incrementing index hash algorithm).

    Eitherway though, calling anything by hash is unreliable, which means it's basically useless.

    But worse, like my testing showed, even calling by name doesn't work either. If you have 2 names that when hashed collide, it will still match the first of the 2 every time, regardless, because calling by the name really just calls StringToHash and they match by hash and don't bother name-to-name testing.

    Basically:
    Code (csharp):
    1.  
    2. public void Play(string name)
    3. {
    4.     Play(StringToHash(name));
    5. }
    6.  
    7. public void Play(int hash)
    8. {
    9.     //does not know 'name', so therefore just matches by hash
    10. }
    11.  
    So this means there is NO reliable way to call Animator. String or hash. And there's no way to know you have a collision without comparing the hashes.

    In my book. That's a bug.

    IMO... you shouldn't be able to lookup by hash, and when passing in name it should compare against the name after looking up by hash (buckets). If they really want the hash speed-up, allow passing in both hash and name, so that it can still compare within that bucket.

    Of course, I don't use Mecanim, so... yay?
     
    Last edited: Feb 12, 2018
    Shushustorm likes this.
  29. DonLoquacious

    DonLoquacious

    Joined:
    Feb 24, 2013
    Posts:
    1,667
    Yeah, there may be collisions, but the great great majority of those collisions happen with comparisons like "4rFE^$!#743f" and "heewFR#$%1QW". No one's ever going to hit those. The number that actually collide and make some sort of sense as property names are probably absolutely tiny- so tiny as to be negligible. This is one of those problems in which the leaky dam is the optimal decision- any performance impact at all from accounting for such an infinitesimal number of collisions is too much to be worth it.

    Besides, this isn't like some case where you're rolling dice each time you make the function call- if it returns the wrong animation from the hash, that's going to be immediately noticeable and it'll do it 100% of the time until you change it. It'd be a pain in the ass to debug maybe, sure, but eventually you'll just change the name and it'll stop colliding and life resumes. It's not the kind of thing that's ever going to make it into a release on accident, so, who cares? This is far from "unusable". You would've gone your entire career using it just fine if you hadn't happened upon this thread- that's pretty great reliability IMO.
     
  30. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    I feel ya to an extent on the odds of collision.

    BUT:

    1) it's not documented, but instead by allowing you to access by hash it implies that it's unique (and even says it is in some tutorials like the one I linked)

    2) I've listed numerous fixes to it that have little or no overhead. I think the best one that would keep the efficiency as is, is to just have an editor time check that warns if any name pairs do collide.

    The fact they play it off like it is unique, and they offer zero warning/information for dealing with a collision makes this very buggy in my book.

    Unity has a history of designing interfaces that are very "novice" friendly, and presumes limited skills on the part of the programmers. It's very "beginner" friendly.

    If I was approaching this from a expert position, I'd say just having the 'access by hash code' methods is a giant code smell. But from a beginner perspective, sure, it sounds like some extra efficiency.

    Problem is... IF a collision occurs, a novice wouldn't exactly know WHY it's happening. They call:
    Code (csharp):
    1. Play("BLARGH")
    And yet the animation "POOPER" plays instead? What's going on? I don't know anything about these hashes... heck, I'm not even calling by hash... so I wouldn't even to think hashing has anythign to do with it. I'm a novice, I know very limited. And the documentation it says NOTHING on the topic.

    Note... I've used Dictionary<string, ...>, and it doesn't have this issue. So why is Animator doing this?

    I google... and there's nothing (well, maybe this thread will appear now that it exists... but prior to this thread, there's literally nothing about it out there).

    Sure I lordofduct will probably figure out what the problem is (well... I already did, clearly). But what about the huge numbers of novice users? Of course, the odds are slim this will even occur... but, since it can, and there's an easy way to check for it, and all it really needs is like a short paragraph in the documentation... I mean hell, the mere fact they have that code smell of an interface is just... well... as an expert when I saw it I just presumed the engineers behind it would have been smart enough to do the collision testing in the background in some manner. But nope... nothing... naughta... I mean come on. This isn't rocket science here.

    I mean hell... just having documentation would lend me to not call this a bug.

    But they didn't even do that.

    That's bad design in my book... and worse, it's bad design counter to the already bad design up to this point that they commit. They made tons of bad decisions to make life "easy" for beginners... but now make a bad design that doesn't even make life easy for beginners... this choice is just bad from all angles.

    This isn't really a gripe about the actual odds of the situation... but more a gripe of the little to no documentation, the code smell of an interface, and they act like this is "intended behaviour". OK... if it's intended, then at least explain that in the documentation. Something as simple as:

    That's all they needed to do. Editor time collision testing would just be icing on the cake.

    So ok... maybe not a 'bug' perse. But it's lazy.

    Hell, what does the documentation even say?
    They only phrase it in the context of Animator parameters. Not even state names or anything else. Which yes, they do not call 'parameter ids', but instead call them 'state name hash' in the documentation:
    https://docs.unity3d.com/ScriptReference/Animator.Play.html

    (and don't even ask me what 'changes the current state time' actually means...)
     
    Last edited: Feb 12, 2018
  31. Baste

    Baste

    Joined:
    Jan 24, 2013
    Posts:
    6,332
    Oh, yeah, the docs for the Animator is consistently lacking.

    I just... When we shipped our last game, there was in the ballpark of 250 different animator parameters used in the entire game. I made a global cache of Animator.StringToHash of all of them, so it's countable.

    Now, the probability of getting a duplicate in 250 draws when the probability of a collision is 1:10^52 is... probably lower than the wrong animation playing due to radiation from space flipping bits? Also, those 250 are spread over very many animators. I don't think a single controller runs higher than 25 - the parameter list UI somewhat breaks at about 15-20 elements anyways (did I mention that the Animator is bad?).

    Maybe if you have a very, very complex animated character it'll get closer to 50 parameters. The point is, you're really worrying about a collision that will not happen in the real world. You could introduce hash buckets, and every single game ever made with Unity would have exactly zero buckets with more than one element in them. So that's meaningless overhead.

    Unity's doing enough things wrong with regards to the animation system that we should't be up in arms about non-issues. The animator controller's really poorly made, and most of it's design decisions are dumb. The decision to assume that the probability of collisions being so low in real-world situations that the overhead of buckets should be skipped is actually one of the few sound decisions made in the entire process.



    Or, well, serializing a param-name-to-index dictionary, and using indices instead of hashes would probably be a better design. I guess they wanted to avoid some bookkeeping?
     
    laurentlavigne and ModLunar like this.
  32. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    The point of my last post is that I'm not worrying about that. As I said, I don't even use Mecanim (I still use legacy animation).

    My point is that they dropped the ball on documentation, they created an interface full of code smell that allows you to access by hash implying uniqueness, and they don't even check for collisions at editor time.

    The argument that it should just be known strings hashes will collide is really the crux of it. Yeah, I should know, and so should Unity. And maybe like stick that in the documentation???

    Yeah, that's why I just basically consider this just more evidence of what they're doing wrong.

    As I said though, documentation alone would have lowered my ire for the entire issue. That's all I would have asked for at the barest of minimum. But we didn't even know what algorithm was used until they responded to the bug issue.

    Note, I'm not trying to rail on about this. I'm not creating bug issues, or starting blog posts, I'm not even the one who submitted a bug. Hell my primary concern during this thread was how good my pulled pork was going to be, and not the actual issue.

    I just said, I consider this a bug. You guys said 'nope'. And I'm defending my stance. Cause it is a bug, and it is code smell, and it does have a very simple fix.

    Just document it!

    Done.

    That's my only request.

    Documentation.

    Especially since some documentation (actual videos from Unity) actually say the complete opposite (I sourced that material earlier).

    I guess you could argue this thread can serve as that documentation. But I think something more readily available like the official documentation is better.

    I'm sorry, I can't defend Unity in this regard. I'm not going to say "oh, this issue is a lesser issue than all the other issues I have with Unity so thusly I won't have a negative opinion." No... I have a negative opinion of it. I have even more negative opinions of other issues as well, that are probably more important. But if asked (this isn't my thread), yeah, I'm going to call this out as what it is.

    It's fine if you don't share my opinion of the issue.

    But that's all this is. It's my opinion, and you telling me that I shouldn't worry so much. Well I'm not worrying... I'm just doing what lordofduct does. Bitching about stupid S*** that has really simple fixes but the powers that be are too lazy to fix.
     
    Last edited: Feb 13, 2018
    Shushustorm, lclemens, SMHall and 2 others like this.
  33. HHess

    HHess

    Joined:
    Jan 8, 2019
    Posts:
    5
    I think your math is off. In the example picture below, (we are using an 32bit integers) the chance of 250 values to have a collision is somewhere around 1 in a 100000.


    Source: https://preshing.com/20110504/hash-collision-probabilities/

    Also, when I look at the implementation:
    Code (CSharp):
    1. [NativeMethod(IsThreadSafe = true, Name = "ScriptingStringToCRC32")]
    2. [MethodImpl(MethodImplOptions.InternalCall)]
    3. public static extern int StringToHash(string name);
    It looks like they use CRC32 as a hashing function.

    Source: https://web.archive.org/web/20120722074858/http://bretm.home.comcast.net/~bretm/hash/8.html

    With this in mind, the chance for a collision might be even higher.

    The funny thing is, that I think they can not easily change the implementation as it would randomly break games somewhere in the world...
     
    Last edited: Jan 14, 2019
  34. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    3,983
    Ok since this issue is mentioned several times I'd like to clear up some facts. Yes the documentation is often misleading / wrong. However in this case I can't really see anything wrongly documented. Note that there are two methods which look similar, but do something completely different. There's
    Animator.StringToHash
    and
    Shader.PropertyToID
    . They are not the same. Both methods do exactly what is described in the documentation.

    Animator.StringToHash like the name suggests calculates a hash value from the passed string. Hashes always can have collisions since they map a way larger set of input data to a way smaller output. The method is static and does not take any animations into account that are somewhere in the project.

    Shader.PropertyToID actually returns an ID. In this case the returned integer is not a hash but rather an ID. Unity builds up a lookup table as you are using this method. The IDs return are most the time continuously increasing. When you use the same string twice or more often it will always return the same ID. Of course it could only manage 2^32 different strings but that should be enough. As the documentation actually said, the returned IDs are unique during one sesstion / run but are not globally unique. So you don't get any collisions on one machine in one run, but the IDs can not be stored for usage on other machines or later sessions. They can be generated in Awake and used during the whole run of the game.

    I created a test script that actually uses a Dictionary<int,string> to detect collisions. I generated strings in a brute force manner starting with a single char and just going up. I ran several tests with different character set restrictions. First I simply used the whole 0-255 range. So once the single char reaches 255 it wraps back to 0 and we add another char. I perform a manual "carry" between the chars. I add all hashes into the dictionary if it's not in the dictionary yet. If we calculate a hash that is already in the dictionary, we have a collision and can even tell which two strings have an identical hash. The method "Shader.PropertyToID" did not generate any collisions within the tested 35 million samples. Animator.StringToHash did produce several collisions.


    Range | samples | collisions in percent
    -------------------------------------------------
    0 - 255 | ~35M | 0.36%
    32 - 126 | ~35M | 0.48%
    alphaNum | ~35M | 0.11%


    Of course those 35 million samples just cover strings up to about 3 - 4 characters. Here are a few examples from the second range:


    hash | str1 | str2
    -------------------------------
    1596111053 | RK. | &"!!
    1591792378 | SK. | &#!!
    295781437 | SL. | &#&!
    1177812517 | SR9 | &#86


    Here are a few examples from the alpha numeric range. This includes only a-zA-Z0-9 and space


    hash | str1 | str2
    -------------------------------
    1707915863 | usk2 | V9hab
    -1295671495 | 9vk2 | Vumab
    -17008217 | tIn3 | V8Rdc
    924056769 | zty3 | V6osc


    To sum up the results:

    Yes Animator.StringToHash is just a very basic hashing algorithm and can produce collisions. However the chance that two meaningful words get the same hash is quite small. Collisions between animation clip names and animation property also shouldn't really matter. Yes there is still a tiny chance that two animation names resolve to the same hash.

    I would simply recommend to check all your used animation names and properties with a simple tool like this:

    Code (CSharp):
    1.  
    2. using System.Collections.Generic;
    3. using UnityEngine;
    4. using UnityEditor;
    5. using System.Text;
    6.  
    7. public class CheckNameCollisions : EditorWindow
    8. {
    9.     string names = "";
    10.     string results = "";
    11.     Vector2 scrollPos;
    12.     bool showAllHashes;
    13.  
    14.     [MenuItem("Tools/AnimatorNameCollisions")]
    15.     static void Init()
    16.     {
    17.         GetWindow<CheckNameCollisions>();
    18.     }
    19.  
    20.     void Check()
    21.     {
    22.         Dictionary<int, List<string>> hashes = new Dictionary<int, List<string>>();
    23.         foreach (var str in names.Split('\n'))
    24.         {
    25.             var name = str.Replace("\n", "").Replace("\r", "");
    26.             int hash = Animator.StringToHash(name);
    27.             List<string> tmp;
    28.             if (!hashes.TryGetValue(hash, out tmp))
    29.             {
    30.                 tmp = new List<string>();
    31.                 hashes.Add(hash, tmp);
    32.             }
    33.             tmp.Add(name);
    34.         }
    35.         StringBuilder sb = new StringBuilder();
    36.         sb.AppendLine().AppendLine("--------");
    37.         int count = 0;
    38.         foreach(var kvp in hashes)
    39.         {
    40.             if (showAllHashes || kvp.Value.Count > 1)
    41.             {
    42.                 sb.Append("H: ").Append(kvp.Key.ToString("X8")).AppendLine(":");
    43.                 foreach (var name in kvp.Value)
    44.                     sb.Append("    ").AppendLine(name);
    45.                 sb.AppendLine();
    46.                 if (kvp.Value.Count > 1)
    47.                     count++;
    48.             }
    49.         }
    50.         if (count > 0)
    51.             sb.Insert(0, "Found collisions: " + count + ":");
    52.         else
    53.             sb.Insert(0, "No name collisions found");
    54.         results = sb.ToString();
    55.     }
    56.     void OnGUI()
    57.     {
    58.         if (results == "")
    59.         {
    60.             scrollPos = GUILayout.BeginScrollView(scrollPos,false, true);
    61.             names = GUILayout.TextArea(names, GUILayout.ExpandHeight(true));
    62.             GUILayout.EndScrollView();
    63.             GUILayout.BeginHorizontal();
    64.             GUILayout.FlexibleSpace();
    65.             showAllHashes = GUILayout.Toggle(showAllHashes,"show all hashes");
    66.             if (GUILayout.Button("Check"))
    67.             {
    68.                 Check();
    69.             }
    70.             GUILayout.Space(10);
    71.             GUILayout.EndHorizontal();
    72.             GUILayout.Space(5);
    73.         }
    74.         else
    75.         {
    76.             GUILayout.Label("Results:");
    77.             scrollPos = GUILayout.BeginScrollView(scrollPos);
    78.             results = GUILayout.TextArea(results);
    79.             GUILayout.EndScrollView();
    80.             if (GUILayout.Button("OK"))
    81.                 results = "";
    82.         }
    83.     }
    84. }
    85.  
    Just create a list with all your used names and put each name on a seperate line. The usage should be straight forward. Though I doubt that you will ever find a positive collision, even with hundreds of names. Just make sure you don't add the same name multiple times ^^. I could have filtered them out but I wanted to keep it simple (and also be able to test collisions).
     
    Shushustorm, iileychen and hippocoder like this.
  35. ch715t0

    ch715t0

    Joined:
    Jan 11, 2016
    Posts:
    4
    I'm late to the conversation, and might be a noob question, but why not just use a dictionary instead of stringtohash?
     
  36. Baste

    Baste

    Joined:
    Jan 24, 2013
    Posts:
    6,332
    That's essentially what this is about. A Dictionary/HashMap will calculate the hash code of the keys you look up. That takes some time, and it's a bit silly to calculate eg. the hash code of the string "Attack" a million times when it's going to be the same thing every time.

    So the Animator's implementation allows you to only do the hash code calculation once (by calling StringToHash), and then re-using that result.
     
    Bunny83, iileychen and ch715t0 like this.
  37. Arithmetica

    Arithmetica

    Joined:
    Feb 11, 2019
    Posts:
    43
    A programmer who doesn't care about hash collisions is a bad programmer without exception.
     
  38. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    I wouldn't go as far to say they're a bad programmer...

    They could be a novice, which isn't bad, that's just novice.

    They could have weighed the gains/losses and determined it suitable for their needs.

    If I need to keep coy-dogs out of my chicken coop and some scrap gets the job done rather than a new roll of chicken wire... well... that doesn't make me a bad farmer. And sure it might fail... but I weighed those consequences when I put it up.

    Though I still stand by my stance that it's bad design to not at the very least tell people via documentation that you did it this way since the API is intended for public use.
     
    Bunny83 likes this.
  39. kinglotrich

    kinglotrich

    Joined:
    Jun 2, 2023
    Posts:
    29
    hi, I wanted to ask someting can ı use "byte jump = Animator.StringToHash("Jump");" or do ı have to use int ?
     
  40. Baste

    Baste

    Joined:
    Jan 24, 2013
    Posts:
    6,332
    No.

    I mean you can, but you're essentially just dumping 3/4th of the calculated hash on the ground, so you can't use it to tell the Animator to play anything.

    Why would you want to?
     
    lordofduct likes this.
  41. kinglotrich

    kinglotrich

    Joined:
    Jun 2, 2023
    Posts:
    29
    for performance, ı check hash numbers are long dont fit in byte
     
  42. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    712
    Well Animator.StringToHash("Jump") already returns an int. That int is already allocated in memory. What you're doing is allocating another byte and copying the value of the int to it (if this is even possible, which is not, unless your explicitly cast it to byte)

    Also what kind of performance improvement are you expecting when using a byte instead of an int... At most it would be 0.00000001 milliseconds. In reality, about 0 milliseconds.

    Just use the int from Animator.StringToHash("Jump").
     
    Last edited: Sep 27, 2023
  43. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    It's not going to work because as Baste pointed out you're dumping 3/4th of the calculated hash.

    The only benefit is that you're storing the data in smaller space when you put it in whatever field/variable you put it in. (which we're talking about 3 bytes... if you have 100 hashes you're saving... 300 bytes. Are you running this on half a potato or something?)

    But you do so lossy. You lose information storing it this way. You can't reconstitute all that information again.

    If number returned from StringToHash is:
    11001101 111010011 01000111 11001001

    You then store:
    11001001

    And then you pass it in via 'Play' or something it converts back to int but converts back as:
    00000000 00000000 00000000 11001001

    Which is NOT the same number.

    Sure it works in that no errors are thrown. Just like driving your car off a bridge "works" in that the universe doesn't crash and delete from existence. But you still end up with negative results... your anim won't play (or worse the wrong anim plays if there's a hash collision for this new hash), and your car is totaled.
     
  44. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    712
    The thing is, he is implicitly converting the int to byte, for "performance".
     
  45. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    And the only performance would be memory space.

     
    Bunny83 and Nad_B like this.
  46. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    712
    And a compile time error, since he needs to explicitly convert an int to byte.
     
    lordofduct likes this.
  47. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    Ohhhh, I see what you're saying.

    You're referring to the lack of explicit cast which would result in a compiler error because C# doesn't allow implicit conversion to byte.

    My bad.
     
    Bunny83 and Nad_B like this.
  48. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    712
    No problem, I did notice this only after I read your reply, which is great btw. :)
     
  49. kinglotrich

    kinglotrich

    Joined:
    Jun 2, 2023
    Posts:
    29
    I dont know although ı carefully about performance ı get 80 fps on unity editor even in empty scene it give 120 fps should not it be like 1000+. I check my pc comopent they are at %3 usage, and when ı play game at backround ı get more fps in unity it is too weird
     
  50. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,527
    I have no idea what you're saying here, nor what it has to do with attempting to store the animator hashes as byte rather than the intended int.