
CSV parsing/searching

Discussion in 'Scripting' started by DP00707, Jan 22, 2015.

  1. DP00707

    DP00707

    Joined:
    Aug 13, 2014
    Posts:
    29
    So I'm looking into the best way to parse a CSV file to test words against. My CSV is very long, around 20k words. It is just words separated by commas, i.e. (apple, air, aardvark, bus, bench, etc...). I tried using this script (http://wiki.unity3d.com/index.php?title=CSVReader). It throws no errors, but it freezes Unity when I try to parse the CSV file.

    Looking around I found this: http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

    This isn't Unity specific and requires a .dll (LumenWorks.Framework.IO.Csv). If what the author is saying is true, this might be the fastest way to parse the CSV file (it reads a 45 MB CSV file in 1.5 seconds).

    I've seen that Unity can use 3rd party DLLs, with of course a warning that they might not all work. Apparently you can add them to the project's Plugins folder, at least for some of them. Has anyone tried this specific DLL in their projects?

    Then there is this option too, which I still need to test:
    http://answers.unity3d.com/questions/144200/are-there-any-csv-reader-for-unity3d-without-needi.html

    This is regex based, but according to the benchmarks in the Fast-CSV-Reader article above, it would be slower. Perhaps it would be enough for my purposes, but I'm wary.

    The goal here is to read it and test against it. Any thoughts on which way to go?
     
  2. Eric5h5

    Eric5h5

    Volunteer Moderator Moderator

    Joined:
    Jul 19, 2006
    Posts:
    32,401
    If it's really just words separated by commas (no returns or anything), then all you need is:

    Code (csharp):
    var words = System.IO.File.ReadAllText("somefile.txt").Split(',');
    Then you get a String[] array of all the words.
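
    A caveat worth sketching: if the file really looks like the sample in the first post ("apple, air, aardvark", with a space after each comma), a plain Split leaves that space on every word after the first, so an exact == comparison would miss. The helper below is a minimal sketch, not part of the original reply; the class name WordListParser and the method Parse are hypothetical, and it assumes stray line breaks and empty entries should simply be dropped.

```csharp
using System;
using System.Linq;

public static class WordListParser
{
    // Hypothetical helper: splits comma-separated text into trimmed, non-empty words.
    public static string[] Parse(string csvText)
    {
        return csvText
            // Treat commas and line breaks as separators and skip empty pieces.
            .Split(new[] { ',', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
            // Remove the whitespace left behind by separators like ", ".
            .Select(word => word.Trim())
            // Drop entries that were only whitespace.
            .Where(word => word.Length > 0)
            .ToArray();
    }
}
```

    It would be called as WordListParser.Parse(System.IO.File.ReadAllText("somefile.txt")), which returns the same kind of string[] as the one-liner above.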

    --Eric
     
  3. DP00707

    DP00707

    Joined:
    Aug 13, 2014
    Posts:
    29
    Yeah, but the problem with that is that it is slow, performance-wise.

    If I do
    Code (CSharp):
    foreach (string x in words) {
        if (x == wordtest) {
            Debug.Log("True");
        }
        else {
            Debug.Log("False");
        }
    }
    This chokes the game up. I'm not sure whether the problem is the reading of the file or the method I am using to compare values. Essentially I need to see whether that exact word is in the file, not whether it is contained within another word. Of course, with 20k entries to check against, this is a performance killer. I must be missing something. Any thoughts?

    One clue: if I let Unity hang and the word is in the list, I get 1 True and roughly 20k False. If the word is not in the list, I get all False, but again over 20k of them.
     
    Last edited: Jan 22, 2015
  4. Random_Civilian

    Random_Civilian

    Joined:
    Nov 5, 2014
    Posts:
    55
    Well, get rid of the debug statements. If I remember correctly, they cause performance issues.

    Just increment an int and log it after the operation completes.
     
  5. DP00707

    DP00707

    Joined:
    Aug 13, 2014
    Posts:
    29
    Hey, that is a good idea.

    I have changed it to:
    Code (CSharp):
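    // Inside the foreach loop: count matches instead of logging each comparison.
    if (x == wordtest) {
        tester++;
    }

    // After the loop: one or more matches means the word is in the list.
    if (tester > 0) {
        test = true;
    }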
    Now I just toggle a bool value. Performance issues solved! Awesome, thanks!
     
  6. jgodfrey

    jgodfrey

    Joined:
    Nov 14, 2009
    Posts:
    564
    I assume you've already loaded the file-based word list into a collection of words (using whatever method you choose). It's not clear to me what kind of a collection you're working with here (likely an Array or a List). If that's the case, both collections have built-in methods to determine whether the collection contains a particular value.

    Here are some quick examples:

    Code (CSharp):
    // Requires: using System; using System.Collections.Generic; using System.Linq;
    string csvString = "foo,bar,baz";
    string[] wordArray = csvString.Split(',');
    List<string> wordList = wordArray.ToList();
    if (Array.IndexOf(wordArray, "bar") >= 0)
    {
        // wordArray contains bar
    }

    if (wordList.Contains("bar"))
    {
        // wordList contains bar
    }
    You can certainly spin through the collection yourself to find the match. However, if you just want to know whether the word exists in the collection, I'm not sure why you're spinning through the entire collection and counting the number of occurrences. Why not just set a flag and quit searching when the first match is found? Something like:

    Code (CSharp):
    bool found = false;
    foreach (string word in wordList)
    {
        if (word == targetWord)
        {
            found = true;
            break;
        }
    }

    if (found)
    {
        // word was found
    }
    Jeff
     
  7. Eric5h5

    Eric5h5

    Volunteer Moderator Moderator

    Joined:
    Jul 19, 2006
    Posts:
    32,401
    You can also convert the array to a HashSet for much faster look-up time.
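
    For readers who have not used one, here is a minimal sketch of that suggestion. It is an illustration rather than code from this thread: the class WordLookup and the method BuildSet are hypothetical names, and the ordinal (case-sensitive) comparer is an assumption chosen to match the exact-word test described earlier.

```csharp
using System;
using System.Collections.Generic;

public static class WordLookup
{
    // Hypothetical helper: builds the set once so that each later lookup
    // is a hash probe instead of a scan over every word.
    public static HashSet<string> BuildSet(IEnumerable<string> words)
    {
        // Ordinal comparison: case-sensitive, exact-match semantics (an assumption).
        return new HashSet<string>(words, StringComparer.Ordinal);
    }
}
```

    Build the set once when the file is loaded, for example var wordSet = WordLookup.BuildSet(words); and then test each candidate with wordSet.Contains(wordtest). A lookup takes roughly constant time on average, whereas the foreach loop and List.Contains both scan the entries one by one.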

    --Eric