Best non-Regex way to scan a string for tokens and replace them?

Discussion in 'Scripting' started by MikeUpchat, Apr 11, 2021.

  1. MikeUpchat

    What would be the best non-Regex way to scan a string for tokens such as {playername} and replace them with another string? At the moment I just use the String.Replace method for each token, but I'm wondering if that will get slow if I end up with dozens of token types in long strings.
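    For comparison, here is a minimal sketch of the per-token String.Replace approach described above, assuming the tokens are stored in a Dictionary keyed by name (without braces). The names NaiveTokenReplacer and ReplaceEach are hypothetical. Each Replace call scans the whole string and may allocate a new one, so the cost grows roughly with string length times token count.

    Code (CSharp):
    ```csharp
    using System.Collections.Generic;

    // Hypothetical helper illustrating "one String.Replace call per token".
    public static class NaiveTokenReplacer
    {
        // Replaces every "{name}" in source with tokens[name].
        // Each Replace call scans the entire current string and may allocate a new string.
        public static string ReplaceEach(string source, Dictionary<string, string> tokens)
        {
            string result = source;
            foreach (KeyValuePair<string, string> pair in tokens)
                result = result.Replace("{" + pair.Key + "}", pair.Value);
            return result;
        }
    }
    ```

    One side effect of chaining Replace calls: if a replacement value itself contains a token such as {score}, a later iteration may replace it as well, depending on the dictionary's enumeration order. A single pass over the source string does not have this issue.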
     
  2. koirat

    At runtime you can split your string where the tokens are, and then rebuild the string from these smaller strings and the token values.
    Do all of this with a StringBuilder so you can initialize it with the proper capacity.
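    A minimal sketch of this idea, assuming the same template is rendered many times with different values: the source is split once into literal segments and token names, and each render pre-sizes a StringBuilder and appends the pieces. TokenTemplate and Render are hypothetical names, and leaving unknown tokens as {name} is an assumed policy, not something specified in this post.

    Code (CSharp):
    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text;

    // Hypothetical "pre-split" template: parsing happens once in the constructor.
    public sealed class TokenTemplate
    {
        // Even indices hold literal text, odd indices hold token names (without braces).
        private readonly List<string> _parts = new List<string>();
        private readonly int _literalLength;

        public TokenTemplate(string source)
        {
            int pos = 0;
            while (true)
            {
                int start = source.IndexOf('{', pos);
                if (start < 0) break;
                int end = source.IndexOf('}', start + 1);
                if (end < 0)
                    throw new FormatException("Unclosed token starting at index " + start);
                _parts.Add(source.Substring(pos, start - pos));          // literal before the token
                _parts.Add(source.Substring(start + 1, end - start - 1)); // token name
                pos = end + 1;
            }
            _parts.Add(source.Substring(pos)); // trailing literal (possibly empty)

            for (int i = 0; i < _parts.Count; i += 2)
                _literalLength += _parts[i].Length;
        }

        // Rebuilds the string; tokens missing from the dictionary are kept as "{name}".
        public string Render(IReadOnlyDictionary<string, string> values)
        {
            // Pre-size the builder: all literal text plus the length of every replacement.
            int capacity = _literalLength;
            for (int i = 1; i < _parts.Count; i += 2)
                capacity += values.TryGetValue(_parts[i], out string v) ? v.Length : _parts[i].Length + 2;

            var sb = new StringBuilder(capacity);
            for (int i = 0; i < _parts.Count; i++)
            {
                if (i % 2 == 0)
                    sb.Append(_parts[i]);
                else if (values.TryGetValue(_parts[i], out string value))
                    sb.Append(value);
                else
                    sb.Append('{').Append(_parts[i]).Append('}');
            }
            return sb.ToString();
        }
    }
    ```

    You would create the TokenTemplate once (for example when the text is loaded) and call Render whenever the values change, so the parsing cost is paid only once per template.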
     
  3. Bunny83

    Well, there are three factors you should consider here:

    - How "long" is your actual string? If it's just a couple of KB, it's probably not worth using something else; String.Replace will work just fine.
    - What do you consider "many" tokens? If it's under about 100, it's also not worth the hassle.
    - Finally, how often do you need to replace those tokens in that string? If it's only at load time or when the player changes their name, it's not worth searching for alternatives ^^.

    Of course, each String.Replace call creates a new string with the replaced content. As I said, if the string is just a couple of KB in size, even if you replace 100 tokens one after another, the memory that is temporarily allocated is only about lengthOfString * numberOfTokens characters. In other words, probably less than 1 MB.
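    As a rough worked example of that estimate, assuming a 2,048-character string, 100 tokens that each produce a new string, 2 bytes per char (.NET strings are UTF-16), and ignoring per-object overhead:

    ```latex
    2048\ \text{chars} \times 100\ \text{calls} \times 2\ \tfrac{\text{bytes}}{\text{char}} = 409{,}600\ \text{bytes} \approx 400\ \text{KB} < 1\ \text{MB}
    ```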

    If the number and kinds of tokens are known beforehand, you could simply use String.Format with "index based" tokens such as {0} and {1}.
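    A minimal sketch of index-based tokens with String.Format; the class and method names are hypothetical. Each {n} placeholder is filled from the n-th argument after the format string, and the same index can appear more than once.

    Code (CSharp):
    ```csharp
    // Hypothetical example of "index based" tokens filled positionally by string.Format.
    public static class IndexedTokensExample
    {
        public static string Greeting(string playerName, int coins)
        {
            // {0} is playerName (used twice), {1} is coins; no dictionary lookup is involved.
            return string.Format("{0} has {1} coins. Good luck, {0}!", playerName, coins);
        }
    }
    ```

    Note that String.Format treats braces as placeholders, so a literal brace has to be written as {{ or }}, and a named token such as {playername} left in the format string makes it throw a FormatException.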

    However, if your text is really long (say closer to 100 KB or 1 MB) and you have a lot of token instances that you want to replace, it makes more sense to replace the tokens manually. For this method the tokens need a clear start marker plus either a fixed length or an end marker, like the { and } in your case. The replacement itself is quite trivial. You create a StringBuilder that will store the result string. Then you simply iterate through the source string character by character and copy the characters over. When you find a start marker, you extract the token name up to the end marker, look up its replacement (probably with a Dictionary) and append the replacement text instead of the token.

    Essentially something like this:

    Code (CSharp):
    ```csharp
    using System.Collections.Generic;
    using System.Text;

    // Place this method inside any class.
    public static StringBuilder ReplaceTokens(string aSource, Dictionary<string, string> aTokens, StringBuilder aDest = null)
    {
        // if no destination string builder has been passed in, create one.
        if (aDest == null)
            aDest = new StringBuilder(aSource.Length);
        int sPos = aSource.IndexOf('{');
        int ePos = -1;
        // loop while we still have tokens
        while (sPos >= 0)
        {
            // copy everything between the last token and the next token to dest.
            aDest.Append(aSource, ePos + 1, sPos - ePos - 1);
            ePos = aSource.IndexOf('}', sPos);
            if (ePos < 0)
                throw new System.FormatException("ReplaceTokens: found token start at " + sPos + " but no token end was found");
            string tokenName = aSource.Substring(sPos + 1, ePos - sPos - 1);
            // if a replacement for the given token is found, add it.
            if (aTokens.TryGetValue(tokenName, out string replacement))
                aDest.Append(replacement);
            // search for the next token start
            sPos = aSource.IndexOf('{', ePos);
        }
        // copy remaining text. In case of no tokens, this will simply copy the whole string
        if (++ePos < aSource.Length)
            aDest.Append(aSource, ePos, aSource.Length - ePos);
        return aDest;
    }
    ```
    You can use this method like this:

    Code (CSharp):
    ```csharp
    string testStr = @"Good morning, and welcome to the {company} {transportSystem}.
    This {vehicle1} is provided for the security and convenience of the {company} {department} personnel.
    The time is {currentTime}. Current topside temperature is {temperature} degrees with an estimated high of {tempMax}.
    The {company} Compound is maintained at a pleasant {indoorTemp} degrees at all times.";
    var tokens = new Dictionary<string, string>() {
        { "company", "Black Mesa" },
        { "transportSystem", "Transit System" },
        { "vehicle1", "automated train" },
        { "department", "Research Facility" },
        { "currentTime", "8:47 A.M" },
        { "temperature", "93" },
        { "tempMax", "one hundred and five" },
        { "indoorTemp", "68" }
    };

    var res = ReplaceTokens(testStr, tokens);
    Debug.Log(res.ToString());
    ```
    I've written ReplaceTokens so that you can pass in an existing StringBuilder as an optional parameter. This can reduce memory allocations if you need to replace tokens in a lot of strings. You just have to set the Length property back to 0 before you reuse a StringBuilder instance; otherwise you would append to the end of the previous content.

    This example would essentially reproduce the first few seconds of the Half-Life 1 intro announcement :)
    Note that currently, if a token name does not exist in the dictionary, the token is simply removed. You could add an else branch to the dictionary lookup to handle that case, for example by adding the token back in. You could also throw an exception or write the token name to your own error log so you know that this token is missing.
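    A sketch of the else branch mentioned above combined with StringBuilder reuse, assuming you want unknown tokens written back unchanged as {name}. TokenBatch, AppendReplaced and ReplaceAll are hypothetical names; the scanning logic follows the ReplaceTokens method from this post, but the missing-token policy and the FormatException are choices made for this example. Setting sb.Length = 0 (StringBuilder.Clear() does the same) resets the builder between strings.

    Code (CSharp):
    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text;

    // Hypothetical variant of ReplaceTokens that keeps unknown tokens and reuses one builder.
    public static class TokenBatch
    {
        public static void AppendReplaced(string source, Dictionary<string, string> tokens, StringBuilder dest)
        {
            int copyFrom = 0; // start of the text not yet copied to dest
            int start = source.IndexOf('{');
            while (start >= 0)
            {
                int end = source.IndexOf('}', start + 1);
                if (end < 0)
                    throw new FormatException("Token start at " + start + " has no matching '}'");
                dest.Append(source, copyFrom, start - copyFrom);     // text before the token
                string name = source.Substring(start + 1, end - start - 1);
                if (tokens.TryGetValue(name, out string value))
                    dest.Append(value);
                else
                    dest.Append(source, start, end - start + 1);      // the else branch: keep "{name}"
                copyFrom = end + 1;
                start = source.IndexOf('{', copyFrom);
            }
            dest.Append(source, copyFrom, source.Length - copyFrom); // trailing text
        }

        // Processes many strings while allocating only a single StringBuilder.
        public static List<string> ReplaceAll(IEnumerable<string> sources, Dictionary<string, string> tokens)
        {
            var results = new List<string>();
            var sb = new StringBuilder();
            foreach (string source in sources)
            {
                sb.Length = 0; // reset before reuse, otherwise new text is appended to the old
                AppendReplaced(source, tokens, sb);
                results.Add(sb.ToString());
            }
            return results;
        }
    }
    ```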
     
  4. MikeUpchat

    Thanks for the great reply. Given the number and length of my strings, I guess I will stick with Replace, as most of the text is quite short. Thanks again.