Compare string with wordlist

Discussion in 'Scripting' started by veleno94, Dec 6, 2015.

  1. veleno94

    Hello everyone!
    I need a script that, when a button is clicked, compares a string against the words in a wordlist. If the string is present in the wordlist, it should be shown. How can I do this?
     
  2. jister

    Check out List<T>, and use List.Contains to check whether the element is in the list.
    You need to add using System.Collections.Generic; to be able to use a list.
    Something like:
    Code (CSharp):
    using System.Collections.Generic;

    public List<string> listOfStrings = new List<string>();

    // Returns true if s is one of the entries in the list.
    public bool ContainsString(string s)
    {
        return listOfStrings.Contains(s);
    }
     
  3. veleno94

    Can I also use this approach if my wordlist is in a .txt file?
     
  4. jister

    No, you would have to read your txt file into a List first...
    But if it's a txt file, there are simpler options.
    You can read the whole txt file into a single string (a string is a sequence of chars), then check whether that string contains your string.

    Code (CSharp):
    using UnityEngine;
    using System.Collections;
    using System.IO;

    public class Example : MonoBehaviour
    {
        public string filePath;
        string text;

        void Start()
        {
            // Read the whole wordlist file into one string.
            text = File.ReadAllText(filePath);
        }

        // Returns true if s occurs anywhere in the file's text.
        public bool ContainsString(string s)
        {
            return text.Contains(s);
        }
    }
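    One caveat with the approach above: string.Contains does a substring search, so a fragment of a word also counts as a match. A minimal sketch of the difference, assuming a space-delimited wordlist (the class and method names are hypothetical, and plain .NET is used instead of a MonoBehaviour so it can run outside Unity):

    ```csharp
    using System;

    public static class WordMatch
    {
        // Substring search: true if s appears anywhere in text, even inside a longer word.
        public static bool ContainsSubstring(string text, string s)
        {
            return text.Contains(s);
        }

        // Whole-word search: split on the (assumed) space delimiter and compare each entry exactly.
        public static bool ContainsWord(string text, string s)
        {
            string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            return Array.IndexOf(words, s) >= 0;
        }
    }
    ```

    For example, WordMatch.ContainsSubstring("casa gatto", "gat") is true, while WordMatch.ContainsWord("casa gatto", "gat") is false.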
     
  5. veleno94

    Ok, perfect. The only thing I haven't understood is how to get my txt file into the script. If I declare a public variable of type string (in your example "public string filePath"), I can't assign a txt file to it in the inspector, right?
     
  6. jister

  7. veleno94

    I tried to do this:
    Code (CSharp):
    void Start(){
        text = File.ReadAllText (C:\Users\MyPc\Documents\MyGame\Assets\words.italian.txt);
    }
    but it doesn't work. Maybe I misunderstood.
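    A likely reason the snippet above fails is that the path is not a string literal: File.ReadAllText expects a string, so the path has to be in quotes, and backslashes must either be escaped or written in a verbatim @"..." string. A sketch, using the path from the post as-is (the names WordFile and ReadWordFile are hypothetical):

    ```csharp
    using System.IO;

    public static class WordFile
    {
        // Verbatim string literal: backslashes are taken literally, so no escaping is needed.
        public const string ExamplePath = @"C:\Users\MyPc\Documents\MyGame\Assets\words.italian.txt";

        // Reads the whole file into one string; throws if the file cannot be found or read.
        public static string ReadWordFile(string path)
        {
            return File.ReadAllText(path);
        }
    }
    ```

    With this, WordFile.ReadWordFile(WordFile.ExamplePath) compiles; whether it succeeds at runtime still depends on the file existing at that location.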
     
  8. jister

    It worked fine when I tested it.
    Put Assets/words.italian.txt in the File Path slot in the inspector. Make sure your file is a .txt file and is in your Assets folder.
    You can test it like this:
    Code (CSharp):
    using UnityEngine;
    using System.Collections;
    using System.IO;

    public class ReadString : MonoBehaviour
    {
        public string filePath;
        string text;

        void Start()
        {
            text = File.ReadAllText(filePath);
        }

        public bool ContainsString(string s)
        {
            return text.Contains(s);
        }

        void Update()
        {
            // Quick test: logs every frame while the file contains "test".
            if (ContainsString("test"))
            {
                print("jipéé");
            }
        }
    }
     
  9. lordofduct

    The problem here is that the txt file won't exist in a build, since the asset isn't actually included anywhere, and the file path may be different on the install.

    A 'txt' file in the Assets folder can instead be referenced as a 'UnityEngine.TextAsset' object:
    http://docs.unity3d.com/ScriptReference/TextAsset.html

    So your script would be better as:
    Code (csharp):
    using UnityEngine;
    using System.Collections;
    using System.IO;

    public class ReadString : MonoBehaviour
    {
        // Assigned by dragging the .txt asset onto this field in the inspector.
        public TextAsset text;

        public bool ContainsString(string s)
        {
            return text.text.Contains(s);
        }

        void Update()
        {
            if (ContainsString("test"))
            {
                print("jipéé");
            }
        }
    }
    Then you drag the text asset into the 'text' property in the inspector for your script.

    Do note, though, that Contains performs a linear search. It scans the string index by index, so in the worst case the search loop repeats N times, where N is the length of the text. This is known as O(N) complexity.

    This can be slow for large dictionaries.

    A List would be somewhat faster, since it would repeat N times, where N is the number of words in the list rather than the number of characters. That is also O(N).

    But that too can still be slow for large dictionaries (say you were making a Scrabble game, where the list is hundreds of thousands of words).

    If the list is short enough that storing it in memory is OK, it would be better to put it in something like a HashSet<string>:
    https://msdn.microsoft.com/en-us/library/bb359438(v=vs.110).aspx

    There, the 'Contains' method is O(1) on average, since it hashes the string and only compares against entries with a matching hash, as opposed to O(N).

    For very large wordlists, though, this can have a significant memory cost. In that case the best method would be a structured datastore, such as a database, which can perform efficient queries on large lists. This is good for wordlists/dictionaries that run into the hundreds of thousands of entries.
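    To make the comparison above concrete, here is a minimal sketch that looks up the same word in a List<string> and in a HashSet<string>. Both give the same answer; the difference is that List.Contains walks the entries one by one, while HashSet.Contains hashes the string and checks a single bucket. The sample words and the class name are made up for illustration.

    ```csharp
    using System.Collections.Generic;

    public static class LookupDemo
    {
        // Hypothetical sample data standing in for a real wordlist.
        public static readonly string[] SampleWords = { "casa", "gatto", "pane", "sole" };

        // O(N): compares s against each entry until a match is found.
        public static bool InList(List<string> list, string s)
        {
            return list.Contains(s);
        }

        // O(1) on average: hashes s, then compares only against entries in the matching bucket.
        public static bool InSet(HashSet<string> set, string s)
        {
            return set.Contains(s);
        }
    }
    ```

    Both collections can be built directly from the array, e.g. new List<string>(LookupDemo.SampleWords) and new HashSet<string>(LookupDemo.SampleWords).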
     
  10. veleno94

    The txt file in which I need to check for the word is a list of all Italian words. So at this point I'll have to use a datastore. Could you help me with how to do that? I've never done anything like this.
     
  11. jister

    As in the whole Italian language? :)
    May I ask what you are trying to achieve? Maybe there is a better solution once the problem is better understood.
     
  12. veleno94

    The process should be this:
    - You type a word in the input field;
    - The script verifies that the word is present in the Italian vocabulary;
    - If the word is present, the program writes "word exists"; if not, it writes "word does not exist".
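    The steps above can be sketched as a small helper around a HashSet<string>, the structure suggested elsewhere in this thread. This is only an illustration: the names WordChecker, BuildVocabulary, and Check are hypothetical, and trimming the input and ignoring case are assumptions that go beyond what the post asks for.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class WordChecker
    {
        // Builds a case-insensitive set so "Casa" and "casa" are treated as the same word (an assumption).
        public static HashSet<string> BuildVocabulary(IEnumerable<string> words)
        {
            return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
        }

        // Returns the message to display for the word typed into the input field.
        public static string Check(HashSet<string> vocabulary, string typed)
        {
            // Trim stray whitespace from the input before the lookup; treat null as empty.
            string word = (typed ?? string.Empty).Trim();
            return vocabulary.Contains(word) ? "word exists" : "word does not exist";
        }
    }
    ```

    In a Unity scene, the button's click handler would pass the input field's text to Check and show the returned message.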
     
  13. Baste

    A HashSet should be fine. They allow for constant-time lookup of any hashable type, like string.

    Just create a HashSet<string>, and add all of your words to that. Check with the Contains method.
     
  14. lordofduct

    From research, the Italian language contains about 270,000 words.

    Let's estimate the average word length to be around 5 or 6. This works out to about 1.5 million characters, which is about 1.5 million bytes at one byte per char (in memory, a .NET char takes 2 bytes, so roughly double that).

    Loaded into a HashSet, the minimum space the characters will take up in memory (a HashSet also has some padding room for empty slots, plus per-string overhead) is on the order of 1.5 million bytes, or 1.5 million / 1024 / 1024 = 1.43 MB, and about 2.86 MB at 2 bytes per char.

    That's not much, actually. Just load that thing into memory.
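    The estimate above can be reproduced as a quick calculation. The sketch below uses the figures from the post (270,000 words, 5 to 6 characters on average, taken here as 5.5) and one fact the post does not mention: .NET stores a char as 2 bytes (UTF-16), so the in-memory character data is about twice the one-byte-per-char figure, before any per-string or HashSet overhead. The class and method names are hypothetical.

    ```csharp
    public static class MemoryEstimate
    {
        // Total characters for a wordlist of the given size and average word length.
        public static double TotalChars(int wordCount, double averageLength)
        {
            return wordCount * averageLength;
        }

        // Converts a byte count to megabytes using 1 MB = 1024 * 1024 bytes, as in the post.
        public static double ToMegabytes(double bytes)
        {
            return bytes / 1024.0 / 1024.0;
        }
    }
    ```

    TotalChars(270000, 5.5) gives 1,485,000 characters, which is about 1.42 MB at one byte per char and about 2.83 MB at two bytes per char. Either way it is small enough to keep in memory.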

    The wordlist text file is probably delimited in some way (comma delimited, space delimited, newline delimited, whatever). Just split on that delimiter, and build the set from that array:

    Code (csharp):
    using UnityEngine;
    using System.Collections;
    using System.Collections.Generic;
    using System.IO;

    public class ReadString : MonoBehaviour
    {
        public TextAsset text;

        private HashSet<string> _set;

        void Start()
        {
            // Splits a space-delimited wordlist; replace ' ' with whatever the delimiter is.
            var arr = text.text.Split(' ');
            _set = new HashSet<string>(arr);
        }

        public bool ContainsString(string s)
        {
            return _set.Contains(s);
        }

        void Update()
        {
            if (ContainsString("test"))
            {
                print("jipéé");
            }
        }
    }
    I'd suggest a space-, pipe-, or comma-delimited wordlist.

    Line breaks in a lot of editors (especially on Windows) insert a carriage return char as well as a line feed char, meaning you have to clean the string before splitting.
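    If the wordlist does use one word per line, one way to handle the carriage return issue described above is to split on both characters and discard the empty entries. A minimal sketch in plain .NET (the names WordListParser and ParseLines are hypothetical):

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class WordListParser
    {
        // Splits on both '\r' and '\n' and drops the empty entries that "\r\n" pairs
        // and trailing blank lines would otherwise produce.
        public static HashSet<string> ParseLines(string text)
        {
            string[] lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            return new HashSet<string>(lines);
        }
    }
    ```

    For example, ParseLines("casa\r\ngatto\r\n") yields a set of two words, "casa" and "gatto", with no stray '\r' attached to either.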
     
  15. veleno94

    Thanks a lot, everyone, for your help! :)
     
  16. alexchandriyaa

    It's been a while, but I have the same doubt.
    How can I write the condition that compares the string I typed against the text document? Below is my script:
    Code (CSharp):
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;
    using System.IO;

    public class GameController : MonoBehaviour {

        public static string currentWord;
        public Transform spellWord;
        public KeyCode RMB;
        //private string checkwordtype;
        public GameObject showPopup;
        public GameObject showPopup1;
        public string filePath = "Asset/words.txt";
        private string text;
        private int n = 0;

        public static List<string> firstSet = new List<string> (){"A","M","E","R","I","C","A" };

        public List<Transform> availableset1 = new List<Transform> ();
        public static int letternum = 0;
        // Use this for initialization
        void Start ()
        {
            text = File.ReadAllText(filepath);
            availableset1[0].GetComponent<TextMesh> ().text = firstSet[0];
            availableset1[1].GetComponent<TextMesh> ().text = firstSet[1];
            availableset1[2].GetComponent<TextMesh> ().text = firstSet[2];
            availableset1[3].GetComponent<TextMesh> ().text = firstSet[3];
            availableset1[4].GetComponent<TextMesh> ().text = firstSet[4];
            availableset1[5].GetComponent<TextMesh> ().text = firstSet[5];
            availableset1[6].GetComponent<TextMesh> ().text = firstSet[6];
        }
        public bool ContainsString(string s)
        {
            if(text.Contains(s))
                return true;
            else
                return false;
        }
        // Update is called once per frame
        public void Update ()
        {
            spellWord.GetComponent<TextMesh> ().text = currentWord;
        }
        public void Buttonfun()
        {
            if (ContainsString ("words"))
            {
                showPopup.SetActive (true);
                showPopup1.SetActive (false);
            }
            else
            {
                showPopup1.SetActive (false);
                showPopup1.SetActive (true);
            }
        }
    }
     
    Last edited: Feb 17, 2018
  17. jister
