Question Take a string and put spaces between words in that string

Discussion in 'Scripting' started by billygamesinc, Apr 3, 2024.

  1. billygamesinc

    billygamesinc

    Joined:
    Dec 5, 2020
    Posts:
    345
    I want to convert for example "ParseThisString" into "Parse This String". How would I be able to do that in the most efficient way possible? Is there some kind of string format command for this or would I have to split the string into different words first then join them together with spaces?
     
  2. zulo3d

    zulo3d

    Joined:
    Feb 18, 2023
    Posts:
    1,054
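    StackOverflow:
    Code (CSharp):
    // Requires: using System.Text.RegularExpressions;
    // Insert a space before every capital letter except one at the very start of the string.
    string s = Regex.Replace("ParseThisString", "(?<!^)([A-Z])", " $1", RegexOptions.Compiled); // "Parse This String"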
     
    Ryiah, Bunny83 and samana1407 like this.
  3. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,560
    @zulo3d gave you a great direct answer to your problem.

    I'd like to offer a UnityEditor-specific alternative, as I'm unsure of your exact use case (you didn't clarify whether you only want to support camel case; your example just implied that).

    Since this is Unity, you may be trying to 'nicify' your string the way Unity does internally (see: variable names shown in the editor). There is a function, available at editor time only, that does that for the various variable name formats:
    https://docs.unity3d.com/ScriptReference/ObjectNames.NicifyVariableName.html
     
    SisusCo, Bunny83 and Ryiah like this.
  4. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    4,103
    Of course, we should stress that this is an editor-only solution that cannot be used at runtime. Just to avoid disappointment in case the reader missed the UnityEditor-specific part :)

    edit
    ps: When you want to play around with regular expressions, you should check out regex101.com. It's a really great tool for setting up and testing regular expressions on the fly and seeing which groups and matches they generate for a given input.
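
    The same kind of inspection can also be done from code. The following is only a minimal sketch: the names RegexInspector and DescribeMatches are made up for illustration, and the only library call used is Regex.Matches from System.Text.RegularExpressions. It lists where each match starts and what group 1 captured.

    ```csharp
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    // Hypothetical helper for illustration; not part of Unity or .NET.
    public static class RegexInspector
    {
        // Returns one line per match: its start index and the text captured by group 1.
        public static List<string> DescribeMatches(string input, string pattern)
        {
            var lines = new List<string>();
            foreach (Match match in Regex.Matches(input, pattern))
            {
                lines.Add($"index {match.Index}: group 1 = \"{match.Groups[1].Value}\"");
            }
            return lines;
        }
    }
    ```

    Calling RegexInspector.DescribeMatches("ParseThisString", "([A-Z])") returns three entries, one for each capital letter, at indices 0, 5 and 9.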
     
    Sluggy and SisusCo like this.
  5. SisusCo

    SisusCo

    Joined:
    Jan 29, 2019
    Posts:
    1,343
    The simple regex that puts a space before every capital letter can't properly handle inputs such as "UIElement" (would return "U I Element") or "Array2D" (would return "Array2 D").

    This can handle such cases:
    Code (CSharp):
    // Requires: using System.Text; (for StringBuilder)
    public static string NicifyVariableName(string input)
    {
        int length = input.Length;
        switch(length)
        {
            case 0:
                return "";
            case 1:
                return input.ToUpper();
        }

        int index = 0;
        int stop = length;

        // skip past prefixes like "m_"
        if(length >= 3 && input[1] == '_')
        {
            index = 2;
        }
        // handle property backing field, e.g. "<Name>k__BackingField"
        else if(input[0] == '<' && input.EndsWith(">k__BackingField"))
        {
            index = 1;
            stop = length - 16; // 16 == ">k__BackingField".Length
        }
        // skip past "_" prefix
        else if(input[0] == '_')
        {
            index = 1;
        }

        var stringBuilder = new StringBuilder();

        // first letter should always be upper case
        stringBuilder.Append(char.ToUpper(input[index]));

        // skipping first letter which was already capitalized
        for(index++; index < stop; index++)
        {
            char @char = input[index];

            // If this character is a number...
            if(char.IsNumber(@char))
            {
                // ...and previous character is a letter...
                if(char.IsLetter(input[index - 1]))
                {
                    // ...add a space before this character.
                    stringBuilder.Append(' ');
                    // e.g. "Id1" => "Id 1", "FBI123" => "FBI 123", "Array2D" => "Array 2D"
                }
            }
            // If this character is an upper case letter...
            else if(char.IsUpper(@char))
            {
                // ...and previous character is a lower case letter...
                if(char.IsLower(input[index - 1])) // IsLower returns false for numbers, so no need to check && !IsNumber separately
                {
                    // ...add a space before it.
                    stringBuilder.Append(' ');
                    // e.g. "TestID" => "Test ID"
                }
                // ...or if the next character is a lower case letter
                // and previous character is not a "split point" character (space, slash, underscore etc.)
                else if(index + 1 < stop && char.IsLower(input[index + 1])) // IsLower returns false for numbers, so no need to check && !IsNumber separately
                {
                    switch(input[index - 1])
                    {
                        case ' ':
                        case '/':
                        case '\\':
                        case '_':
                        case '-':
                            break;
                        default:
                            // ...add a space before it.
                            stringBuilder.Append(' ');
                            // e.g. "FBIDatabase" => "FBI Database", "My3DFx" => "My 3D Fx"
                            break;
                    }
                }
            }
            // replace underscores with the space character...
            else if(@char == '_')
            {
                // ...unless previous character is a split point
                switch(input[index - 1])
                {
                    case ' ':
                    case '/':
                    case '\\':
                    case '_':
                    case '-':
                        break;
                    default:
                        stringBuilder.Append(' ');
                        break;
                }

                // the underscore itself is never copied to the output
                continue;
            }

            stringBuilder.Append(@char);
        }

        return stringBuilder.ToString();
    }
    Proof.
     
    zulo3d likes this.
  6. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    4,103
    Actually the SO question linked by zulo has a regex solution for that as well, in a different answer.
     
    Ryiah and SisusCo like this.
  7. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,142
    You can also check out this tutorial of mine. It contains a class called SimpleSanitizer, which does what you need. It's not regex but it's small (edit: I meant to say for a non-regex solution), customizable, and extendable. You can find it under a spoiler button.
    Edit:
    It scans the original string one character at a time, but also looks ahead one character, and then uses state machine logic to determine when to insert space, capitalize a letter, or substitute underscore in a new string it's producing. For capitalizing letters, the underlying state assumes a "shift" bit (like the shift key), so it 'cleverly' decides when to "press the shift" (which is then consumed after parsing a letter), and also won't separate digits from each other, but will separate letters from numbers. A very simple machine that does its job in one go without producing (too much) garbage*, making it suitable for heavy duty operation.

    * Though it could be made even more garbage-friendly if ScannerState was persistent between calls.
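
    To make the description above concrete, here is a rough sketch of a single-pass scanner with one character of lookahead and a "shift" flag. It is NOT the SimpleSanitizer class from the tutorial; the name SpacingScanner and all of its rules are assumptions made for illustration, and it only separates a letter from a following digit, not the other way round.

    ```csharp
    using System.Text;

    // Hypothetical sketch; not the tutorial's SimpleSanitizer.
    public static class SpacingScanner
    {
        public static string Nicify(string input)
        {
            var output = new StringBuilder(input.Length + 8);
            bool shift = true;     // pending "shift key": capitalize the next letter
            char previous = '\0';  // previous input character (underscores count as ' ')

            for (int i = 0; i < input.Length; i++)
            {
                char current = input[i];
                char next = i + 1 < input.Length ? input[i + 1] : '\0'; // one-character lookahead

                if (current == '_')
                {
                    // An underscore becomes a word break: emit at most one space and "press shift".
                    if (output.Length > 0 && previous != ' ')
                    {
                        output.Append(' ');
                    }
                    previous = ' ';
                    shift = true;
                    continue;
                }

                bool lowerToUpper = char.IsLower(previous) && char.IsUpper(current);
                bool newWordAfterRun = (char.IsUpper(previous) || char.IsDigit(previous))
                                       && char.IsUpper(current) && char.IsLower(next);
                bool letterToDigit = char.IsLetter(previous) && char.IsDigit(current);

                if (lowerToUpper || newWordAfterRun || letterToDigit)
                {
                    output.Append(' ');
                }

                if (shift && char.IsLetter(current))
                {
                    // The shift is consumed by the first letter that follows it.
                    output.Append(char.ToUpper(current));
                    shift = false;
                }
                else
                {
                    output.Append(current);
                }

                previous = current;
            }

            // A trailing underscore would otherwise leave a dangling space.
            return output.ToString().TrimEnd();
        }
    }
    ```

    With these rules, both "ParseThisString" and "parse_this_string" come out as "Parse This String", and "UIElement", "Array2D" and "ID1" become "UI Element", "Array 2D" and "ID 1". A null input throws, which a real implementation would probably want to guard against.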
     
    Last edited: Apr 5, 2024
    Bunny83 and SisusCo like this.
  8. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,450
    GPT-4 generated one that doesn't rely on LINQ.
    Code (csharp):
    // Requires: using System.Text.RegularExpressions;
    public class StringFormatter
    {
        public static string AddSpacesToSentence(string text, bool preserveAcronyms = true)
        {
            if (string.IsNullOrWhiteSpace(text))
                return string.Empty;

            // Both patterns are zero-width: they match positions rather than characters, so Replace only inserts spaces.
            // The basic pattern matches wherever a lowercase letter is followed by an uppercase letter.
            // With preserveAcronyms, it also matches before a digit that follows a lowercase letter, and before the
            // last capital of a run of capitals or digits when a lowercase letter follows (e.g. "UIElement" => "UI Element").
            string pattern = preserveAcronyms ? "(?<=[a-z])(?=[A-Z0-9])|(?<=[A-Z0-9])(?=[A-Z][a-z])" : "(?<=[a-z])(?=[A-Z])";
            return Regex.Replace(text, pattern, " ");
        }
    }
     
    billygamesinc and SisusCo like this.
  9. SisusCo

    SisusCo

    Joined:
    Jan 29, 2019
    Posts:
    1,343
    @Ryiah Great use case for ChatGPT!

    One small detail that I think is still missing from that is returning "ID 1" for the input of "ID1". But GPT-4 was able to quickly remedy that as well when I asked it politely:
    Code (CSharp):
    public static string AddSpacesToSentence(string text)
    {
        const string pattern = "(?<=[a-z])(?=[A-Z0-9])|(?<=[A-Z])(?=[0-9])|(?<=[A-Z0-9])(?=[A-Z][a-z])";
        return Regex.Replace(text, pattern, " ");
    }
    Note that processing such long regular expressions will probably be quite inefficient - but as long as it doesn't need to go through hundreds of lines of text in one go, I think it should suffice just fine.
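
    For anyone trying to read that pattern, it may help to see its three alternatives split out and named. This is just a sketch: the class name PatternBreakdown and the constant names are invented for illustration, and the combined pattern is the same string as in the code above.

    ```csharp
    using System.Text.RegularExpressions;

    // Hypothetical wrapper for illustration; the constants are the three alternatives of the pattern above.
    public static class PatternBreakdown
    {
        // Position between a lowercase letter and a capital or digit: "ParseThis" -> "Parse This".
        const string LowerThenUpperOrDigit = "(?<=[a-z])(?=[A-Z0-9])";

        // Position between a capital and a digit: "ID1" -> "ID 1".
        const string UpperThenDigit = "(?<=[A-Z])(?=[0-9])";

        // Position before the last capital of a run when a lowercase letter follows: "UIElement" -> "UI Element".
        const string RunThenWord = "(?<=[A-Z0-9])(?=[A-Z][a-z])";

        static readonly Regex Combined = new Regex(
            LowerThenUpperOrDigit + "|" + UpperThenDigit + "|" + RunThenWord);

        // Every alternative is zero-width, so Replace inserts spaces without removing any characters.
        public static string AddSpaces(string text) => Combined.Replace(text, " ");
    }
    ```

    PatternBreakdown.AddSpaces turns "ParseThisString" into "Parse This String", "UIElement" into "UI Element", "Array2D" into "Array 2D" and "ID1" into "ID 1".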
     
    Last edited: Apr 4, 2024
    Nad_B and Ryiah like this.
  10. Nad_B

    Nad_B

    Joined:
    Aug 1, 2021
    Posts:
    730
    Regex can be optimized by using the RegexOptions.Compiled flag, which makes the runtime generate IL code for the pattern instead of interpreting it:

    Code (CSharp):
    // Requires: using System.Text.RegularExpressions;
    private static readonly Regex SpacesRegex = new Regex("(?<=[a-z])(?=[A-Z0-9])|(?<=[A-Z])(?=[0-9])|(?<=[A-Z0-9])(?=[A-Z][a-z])", RegexOptions.Compiled);

    public static string AddSpacesToSentence(string text)
    {
        return SpacesRegex.Replace(text, " ");
    }
     
    SisusCo likes this.
  11. SisusCo

    SisusCo

    Joined:
    Jan 29, 2019
    Posts:
    1,343
    Even with this optimization, in my experience it can take a long time for complex regular expressions to process 1000+ lines of text, and manually going through the string character-by-character can give orders of magnitude better performance. It probably depends a lot on the particulars of the situation though.
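
    Since it depends on the situation, the safest approach is to measure with your own data. Below is a minimal timing sketch, not a rigorous benchmark: SpacingBenchmark and its members are hypothetical names, it only compares the simplest lowercase-to-uppercase rule, and it makes no claim about which variant wins.

    ```csharp
    using System;
    using System.Diagnostics;
    using System.Text;
    using System.Text.RegularExpressions;

    // Hypothetical timing harness for illustration only.
    public static class SpacingBenchmark
    {
        static readonly Regex LowerUpper = new Regex("(?<=[a-z])(?=[A-Z])", RegexOptions.Compiled);

        public static string WithRegex(string text) => LowerUpper.Replace(text, " ");

        // Hand-written equivalent: insert a space wherever a lowercase ASCII letter
        // is directly followed by an uppercase ASCII letter.
        public static string WithLoop(string text)
        {
            var builder = new StringBuilder(text.Length + 8);
            for (int i = 0; i < text.Length; i++)
            {
                char c = text[i];
                if (i > 0 && c >= 'A' && c <= 'Z' && text[i - 1] >= 'a' && text[i - 1] <= 'z')
                {
                    builder.Append(' ');
                }
                builder.Append(c);
            }
            return builder.ToString();
        }

        // Runs the function over every line, `iterations` times, and returns the elapsed milliseconds.
        public static long Measure(Func<string, string> function, string[] lines, int iterations)
        {
            var stopwatch = Stopwatch.StartNew();
            for (int n = 0; n < iterations; n++)
            {
                foreach (string line in lines)
                {
                    function(line);
                }
            }
            stopwatch.Stop();
            return stopwatch.ElapsedMilliseconds;
        }
    }
    ```

    Calling SpacingBenchmark.Measure(SpacingBenchmark.WithRegex, lines, 100) and then the same with SpacingBenchmark.WithLoop, on a representative set of lines, gives two numbers to compare on your target platform.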
     
    Nad_B, orionsyndrome and spiney199 like this.
  12. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,142
    Regex is definitely cool, but I think it's more suited for a collaborative environment, i.e. someone else develops it, tests it, and shares it with the engineer who is at the moment doing something much more concrete. This is because if a regex is complicated, taking it lightly is not the best idea, and yet development of it can take a whole day, and purge a programming mind of any other context.

    For this reason alone, I'm always more open toward simpler and more straightforward solutions (unless the regex itself is very simple or it is easy to find a working expression somewhere online, which is again similar to having a dedicated colleague who wasn't too lazy to test it thoroughly with a large amount of data, and then optimize it for better performance).

    And let's not talk about maintenance or a new programmer trying to understand the codebase. Regex is very cool, but perhaps too cool i.e. liquid nitrogen cool.
     
    Sluggy, Nad_B and SisusCo like this.