Search Unity

Compare large list<string>

Discussion in 'Scripting' started by S3dition, Dec 28, 2014.

  1. S3dition

    S3dition

    Joined:
    Jan 6, 2013
    Posts:
    252
    I have a list of words about 110,000 long in a text file that I need to compare against. I can load them into a list nice as fast (less than a second to populate the list), but I can't find an efficient way to compare the strings. So far I've tried:

    Contains (always returns true, no matter what)
    for (would take hours, locks up unity)
    Equals(same as for)

    I've tried an array as well, In each case, either the result is always true or unity just hangs and starts gobbling up memory. I'd appreciate another set of eyes or a better way of doing this:

    Code (csharp):
    1.  
    2. public void CheckWord ()
    3.         {
    4.  
    5.         for (int i = 0; i < dictionary.wordList.Count; i++) {
    6.                 if(Equals(dictionary.wordList[i], wordString.ToLower())){
    7.                 ClearLetterTiles ();
    8.                 Debug.Log ("Word Found!");
    9.             } else {
    10.                 ClearLetterTiles ();
    11.                 Debug.Log ("Word not found!");
    12.             }
    13.                 }
    14.  
    15.         /*
    16.                 if (dictionary.wordList.Contains (wordString.ToLower ())) {
    17.  
    18.                         ClearLetterTiles ();
    19.                         Debug.Log ("Word Found!");
    20.                 } else {
    21.                         ClearLetterTiles ();
    22.                         Debug.Log ("Word not found!");
    23.                 }
    24.                 //} */
    Creating the list/array:

    Code (csharp):
    1.  
    2.         void Load ()
    3.         {
    4.         Debug.Log ("Loading Words");
    5.         while((txt = reader.ReadLine()) != null){
    6.             wordList.Add(txt);
    7.             //Debug.Log(wordList);
    8.  
    9.        
    10.         }
    11.  
    12.         //wordList2 = dictionaryFile.text.Split ('\n');
    13.         reader.Close ();
    14.         }
    15.  
     
  2. toreau

    toreau

    Joined:
    Feb 8, 2014
    Posts:
    203
  3. S3dition

    S3dition

    Joined:
    Jan 6, 2013
    Posts:
    252
    Still wrapping my head around dictionaries. Besides, isn't a dictionary overkill? Just wondering.
     
  4. CrazyLikeAFox

    CrazyLikeAFox

    Joined:
    Mar 8, 2011
    Posts:
    71
    You say "Contains" always returns true. This should not be the case. Could you please post a code example of you using list.Contains( item ). Contains is O(n) and using a Dictionary might be better, since its O(1), But ~100,000 items is relatively little.
     
  5. kietus

    kietus

    Joined:
    Jun 4, 2013
    Posts:
    54
    Hello,

    If you can map an int id to a word, you can use :
    Code (csharp):
    1.  
    2. Dictionnary<int,string> wordsById;
    3.  
    4.   if (wordsById.ContainsKey(myWordID){
    5.     // Found
    6.     myWord = myWordID[myWordID];
    7. }
    8.  
    You can also try to split the list in smaller ones.
    Dictionnary<string,List<string>> wordsByFirstLetter;

    Knowing the first letter of the word allow you to search
    List<string> wordsA = wordsByFirstLetter["A"];

    And then you can parse the list. Try to save the list in a local variable, that will be faster. You can also add a break; to exit the loop when the word is found.

    For comparaison, you can also use LINQ, that will allow you to order the list, or use cool search option. But can need smaller sample, usually, LINQ is a bit slower than a List parsing, and List parsing is a bit slower than array. If the list is static, i will suggest to use Array rather than List.
     
  6. S3dition

    S3dition

    Joined:
    Jan 6, 2013
    Posts:
    252
    It's in the second code block I posted - it's currently commented out though.

    This works and populates the list (wordList)
    Code (csharp):
    1.  
    2. void Load ()
    3.         {
    4.         Debug.Log ("Loading Words");
    5.         while((txt = reader.ReadLine()) != null){
    6.             wordList.Add(txt);
    7.             //Debug.Log(wordList);
    8.  
    9.        
    10.         }
    This is the code with Contains():

    Code (csharp):
    1.  
    2. public void CheckWord ()
    3.         {
    4. if (dictionary.wordList.Contains (wordString.ToLower ())) {
    5.  
    6.                         ClearLetterTiles ();
    7.                         Debug.Log ("Word Found!");
    8.                 } else {
    9.                         ClearLetterTiles ();
    10.                         Debug.Log ("Word not found!");
    11.                 }
    12. }
    13.  
     
  7. S3dition

    S3dition

    Joined:
    Jan 6, 2013
    Posts:
    252
    I tried Linq without success and moved on. I don't recall entirely what the issue was with it. I'd rather not break up the list (I considered breaking it down alphabetically, but 110k items shouldn't be a big deal. I'd rather do it right than do a work around). I may add or remove items from the list, so I don't want to use an array unless I absolutely have to. I do count up the number of words at the moment, so I can use an array more or less dynamically as a last resort.
     
  8. kietus

    kietus

    Joined:
    Jun 4, 2013
    Posts:
    54
    Yes, you right, 110K should not be a big deal. Dictionnary will be faster than seeking in a List, but i agree a Dictionnary<string,string> is a bit weird.

    There a strange thing in your current loop. The not found statement is executed for all items that don't match your comparaison. If i'm not mistaken, you send 110K Debug.Log (+ execute ClearLetterTiles() ) .
    Move the not found statement outside of the loop should help a lot.
     
    S3dition likes this.
  9. S3dition

    S3dition

    Joined:
    Jan 6, 2013
    Posts:
    252
    DO'H!

    You're entirely correct. I was telling it send 110k debug logs. No wonder Unity broke. I can't believe I missed that.

    Well, it's actually comparing nice and fast now. However, I'm back to the problem that Contains() had. It always returns true, which is odd. I guess that's my next debug step.
     
  10. S3dition

    S3dition

    Joined:
    Jan 6, 2013
    Posts:
    252
    I figured out what was happening. I was accidentally clearing the string the player creates and replacing it with "". There was a single break in my word list that was being loaded in. Word 40,501 was "" and therefore always returning true, no matter what.

    In coding, it's always the little things that get you.
     
  11. CrazyLikeAFox

    CrazyLikeAFox

    Joined:
    Mar 8, 2011
    Posts:
    71
    Glad you found your bug! Always those off-by-one errors that get ya :)
     
unityunity