Regex: C# and Non Greedy

Discussion in 'Scripting' started by Sbizz, Jan 21, 2015.

  1. Sbizz

    Joined: Oct 2, 2014
    Posts: 250
    Hey.

    So, I'm trying to solve an issue I have in a search system. Hopefully someone has experience with Regex, because I don't understand why it doesn't work in C#...

    The system is simple: it opens script files and runs a regex on them to find what I want. End of story.

    Sample file:

    Code (CSharp):
    using UnityEngine;
    using UnityEngine.UI;
    using System.Collections;
    using System.Collections.Generic;
    using System.Xml.Linq;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;

    public class MyClass : MonoBehaviour {
        void Start() {
            MyClass.g("my first sentence");
            MyClass.g("this is red sparta", Color.red);
            MyClass.g("double test", MyClass.g("inside the double test"));
            Debug.Log(string.Format("{0} {1}", MyClass.g("format test"), MyClass.g("inside the format test")));
            Debug.Log(MyClass.g("debug power!"));
        }
    }
    When I use my Regex on this file, I should get this result:
    • my first sentence
    • this is red sparta
    • double test
    • inside the double test
    • format test
    • inside the format test
    • debug power!
    You will be able to find my Regex on regex101.com.

    As you can see, it works there. But when I use the same Regex in C#, I can't get "inside the double test" and "inside the format test"...

    My code:

    Code (CSharp):
    private void regexApply(string scriptFile) {
        Regex r = new Regex(@"MyClass\.g\(\""(.*)\"".*", RegexOptions.RightToLeft);
        string content = File.ReadAllText(scriptFile);

        MatchCollection matches = r.Matches(content);

        // Some occurrences are found in the file
        if (matches.Count > 0) {
            foreach (Match match in matches) {
                Debug.Log(match.Groups[0].Value); // whole match
                Debug.Log(match.Groups[1].Value); // text captured by (.*)
                Debug.Log(match.Groups[2].Value); // empty: the pattern has only one capture group
                Debug.Log("-----------------");
            }
        }
    }
    So if you have any clue...

    Thanks !
     
    Last edited: Jan 21, 2015
  2. lordofduct

    Joined: Oct 3, 2011
    Posts: 8,377
    Why is the word 'Language' in your regex?

    Also, personally, I find naming my groups makes pulling the value for that group much easier. This way if you change the regex at all... the index might change, but the name is the same.
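
    To illustrate the point about names versus indices, here is a minimal sketch written as a plain console program rather than a Unity script. The class name NamedGroupDemo, the group name "text" and the two patterns are made up for this example and are not from the original script. It relies on the fact that .NET numbers unnamed groups before named ones, so adding an unnamed group shifts the index of the quoted text while its name keeps working.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical input line, modelled on the sample file in the first post.
    public const string Input = "MyClass.g(\"my first sentence\");";

    // Version 1: the quoted text is the only group, so it is also group 1.
    public const string Before = @"MyClass\.g\(\""(?<text>[^""]*)\""";

    // Version 2: an unnamed group is added for the class name.
    // .NET numbers unnamed groups first, so the quoted text moves to index 2.
    public const string After = @"(\w+)\.g\(\""(?<text>[^""]*)\""";

    // Reads a group by its numeric index.
    public static string ByIndex(string pattern, int index)
    {
        return Regex.Match(Input, pattern).Groups[index].Value;
    }

    // Reads the group by its name, independent of its position.
    public static string ByName(string pattern)
    {
        return Regex.Match(Input, pattern).Groups["text"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(ByIndex(Before, 1)); // my first sentence
        Console.WriteLine(ByIndex(After, 1));  // MyClass
        Console.WriteLine(ByName(Before));     // my first sentence
        Console.WriteLine(ByName(After));      // my first sentence
    }
}
```

    Code that reads Groups[1] silently starts returning the class name after the change, while code that reads Groups["text"] is unaffected.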
     
  3. Sbizz

    Just a mistake when I copied and pasted my code here. Sorry!
     
  4. lordofduct

    That should fix your problem. When I remove it, it works.
     
  5. Sbizz

    What? Where did you try it? It doesn't work. I only changed Language to MyClass because my script is really big, so I simplified my code for the post.
     
  6. lordofduct

    Code (csharp):
    private void Start()
    {
        string sval = "void Start() {\nMyClass.g(\"my first sentence\");\nMyClass.g(\"this is red sparta\", Color.red);";

        var rs = @"\.g\(\""(.*)\"".*";
        Debug.Log(rs);
        Regex r = new Regex(rs, RegexOptions.RightToLeft);
        var matches = r.Matches(sval);

        Debug.Log(matches.Count);
        foreach (Match m in matches)
        {
            Debug.Log(m.Value);
        }
    }
     
  7. Sbizz

    I'm curious about your result, can you post the log?

    I fixed my problem. I read on Microsoft's website how to make a Regex "non-greedy": you add "?" after the quantifier, so ".*" becomes ".*?". The following Regex works:

    Code (CSharp):
    private string regexPatterns = @"Language\.g\(\""(.*?)\"".?[\),]";
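
    For anyone comparing the two behaviours, here is a minimal sketch of greedy ".*" versus lazy ".*?" on the "double test" line from the sample file. It is a plain console program, not the original script: the class name GreedyVsLazyDemo is made up, the patterns are trimmed to the part up to the closing quote, and RegexOptions.RightToLeft is left out on purpose so only the quantifier differs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // The "double test" line from the sample file: two calls on one line.
    public const string Line =
        "MyClass.g(\"double test\", MyClass.g(\"inside the double test\"));";

    // Greedy: (.*) grabs as much as it can, then backs up to the LAST quote.
    public const string Greedy = @"MyClass\.g\(\""(.*)\""";

    // Lazy: (.*?) grabs as little as it can, so it stops at the NEXT quote.
    public const string Lazy = @"MyClass\.g\(\""(.*?)\""";

    // Returns the text of capture group 1 for every match on the line.
    public static List<string> Captures(string pattern)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(Line, pattern))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        // One match: double test", MyClass.g("inside the double test
        foreach (string s in Captures(Greedy)) Console.WriteLine(s);

        Console.WriteLine("---");

        // Two matches: double test / inside the double test
        foreach (string s in Captures(Lazy)) Console.WriteLine(s);
    }
}
```

    The greedy version swallows the nested call into a single capture, which is why the inner strings never show up as results of their own.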
     
  8. lordofduct



    Not exactly sure what data you expect it to pull... but it is grabbing what I would expect.

    There is the group in there that would be the text inside the string like 'this is red sparta'... which if you named the group would be really easy to pull out (or in this example is group at index 1).
     
  9. Sbizz

    Okay, I just saw what you wrote; you put some "\n" in there. The issue appears when I'm looking for several double-quoted strings on the same line. So, if you try your code with the following sval:

    Code (CSharp):
    string sval = "void Start() {\nMyClass.g(\"my first sentence\"); MyClass.g(\"this is red sparta\", Color.red);";
    It's not gonna work; either you will get only "my first sentence" in group[1], or you'll get something like "my first sentence\"); MyClass.g(\"this is red sparta\", Color.red);".

    With the Regex I found, I can get both strings even if they are on the same line.

    Thanks anyway :D
     
  10. lordofduct

    I put the '\n's in there because I put the text inline in my code.

    \n is the C# string notation for a 'newline'. It's simulating the line break that would be in a regular text file.
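
    The newlines matter for the regex as well: in .NET, "." does not match "\n" unless RegexOptions.Singleline is set, so a greedy ".*" cannot run past the end of a line. Here is a minimal sketch of that effect, as a plain console program. The class name NewlineDemo and the shortened inputs are made up for illustration, and RegexOptions.RightToLeft is left out to keep the example small.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineDemo
{
    // Greedy pattern from the thread, without RegexOptions.RightToLeft.
    public const string Pattern = @"\.g\(\""(.*)\"".*";

    // Hypothetical inputs: the same two calls, on two lines and on one line.
    public const string TwoLines = "MyClass.g(\"a\");\nMyClass.g(\"b\", Color.red);";
    public const string OneLine = "MyClass.g(\"a\"); MyClass.g(\"b\", Color.red);";

    // Returns the text of capture group 1 for every match.
    public static string[] Captures(string input, RegexOptions options)
    {
        MatchCollection matches = Regex.Matches(input, Pattern, options);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Groups[1].Value;
        }
        return result;
    }

    public static void Main()
    {
        // Two lines: "." stops at "\n", so each call is matched separately.
        Console.WriteLine(Captures(TwoLines, RegexOptions.None).Length);      // 2

        // One line: the greedy group runs on to the last quote of the line.
        Console.WriteLine(Captures(OneLine, RegexOptions.None)[0]);           // a"); MyClass.g("b

        // Singleline lets "." match "\n", so the two-line input fails the same way.
        Console.WriteLine(Captures(TwoLines, RegexOptions.Singleline).Length); // 1
    }
}
```

    So a greedy pattern can look correct on a test string with one call per line and still fail on a file that has two calls on one line.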
     
  11. lordofduct

    What is your expected result?
     
  12. Sbizz

    This one:

    and it works with this code

    Code (CSharp):
    private void regexApply(string scriptFile) {
        Regex r = new Regex(@"MyClass\.g\(\""(.*?)\"".?[\),]");
        string directory = Path.Combine(Path.GetFullPath("."), "Assets");
        string fileToLoad = Path.Combine(directory, scriptFile);
        string content = File.ReadAllText(fileToLoad);
        MatchCollection matches = r.Matches(content);
        // Some occurrences are found in the file
        if (matches.Count > 0) {
            foreach (Match match in matches) {
                // Start at 1 to skip group 0, which is the whole match
                for (int i = 1; i < match.Groups.Count; i++) {
                    Debug.Log(match.Groups[i].Value);
                }
            }
        }
    }
    You can try this code: just pass the filename as the parameter (the file should be at the root of the Assets folder).
     
  13. lordofduct

    You can simplify that out a lot more. You keep writing a regex that grabs stuff AFTER the text you want. Just grab the text you want.

    And again, name that group.

    Note, I added a line to the text where there are two entries on the same line.

    Code (csharp):
    private void Start()
    {
        string sval = "void Start() {\nMyClass.g(\"my first sentence\");\nMyClass.g(\"this is red sparta\", Color.red);\nMyClass.g(\"double test\", MyClass.g(\"inside the double test\"));";

        var rs = @"\.g\(\""(?<quote>.*?)\""";
        Debug.Log(rs);
        Regex r = new Regex(rs);
        var matches = r.Matches(sval);

        Debug.Log(matches.Count);
        foreach (Match m in matches)
        {
            Debug.Log(m.Groups["quote"].Value);
        }
    }
     
  14. Sbizz

    Oh, got it. I still need this part: ".?[\),]", because if you have something like MyClass.g("this is \"a super\" test");, your Regex will cut the string at the first escaped quote ;)

    I'm gonna add the group name, it's much cooler like that :p
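
    As a possible follow-up (not something posted in the thread), here is a minimal sketch of another way to handle escaped quotes: match the string literal itself with the common "any non-quote, non-backslash character, or a backslash plus one character" idiom, instead of checking what follows the closing quote. The class name EscapedQuoteDemo, the group name "text" and the second input are made up for illustration. The captured text keeps its backslashes; nothing is unescaped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedQuoteDemo
{
    // Pattern from the thread: lazy capture, then a check on what follows the quote.
    public const string Heuristic = @"MyClass\.g\(\""(.*?)\"".?[\),]";

    // Assumed alternative: consume escape sequences (\x) as a unit inside the literal.
    public const string EscapeAware = @"MyClass\.g\(\""(?<text>(?:[^""\\]|\\.)*)\""";

    // Source text: MyClass.g("this is \"a super\" test");
    public const string Simple = "MyClass.g(\"this is \\\"a super\\\" test\");";

    // Hypothetical source text: MyClass.g("one\", two");
    public const string Tricky = "MyClass.g(\"one\\\", two\");";

    // Returns capture group 1 of the first match (the quoted text).
    public static string Capture(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups[1].Value;
    }

    public static void Main()
    {
        // Both patterns print: this is \"a super\" test
        Console.WriteLine(Capture(Heuristic, Simple));
        Console.WriteLine(Capture(EscapeAware, Simple));

        // An escaped quote followed by a comma fools the heuristic.
        Console.WriteLine(Capture(Heuristic, Tricky));   // one\
        Console.WriteLine(Capture(EscapeAware, Tricky)); // one\", two
    }
}
```

    The ".?[\),]" check is enough for the example above; the escape-aware form only starts to matter when an escaped quote is directly followed by ")" or ",".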