
Getting the text without tags

Discussion in 'UGUI & TextMesh Pro' started by Yerkisk, Mar 22, 2020.

  1. Yerkisk


    Jul 28, 2015

    I've seen dozens of posts on the subject, but none recent. For a multiplayer chat feature, I'm trying to remove all rich text tags from the TextMeshPro text value that a user types. I don't need any user-entered tags at all, so the point is really to clean the user-entered string. I can't simply disable rich text on the resulting text, as I will be adding tags server-side, so all I want is to clear all tags from the input text.

    Problem is, most of what I've seen online so far seems to blindly remove everything in between '<' and '>' without caring if it's an actual tag or not. And if we don't do that and remove only some tags, someone could write <<noparse>noparse> and get away with it, as only the inner <noparse> would be considered a tag and removing it leaves a new <noparse> behind. We also cannot just wrap the text in <noparse> … </noparse>, as any user could start their text with </noparse> and bypass this trick. So the tags have to be removed.

    I've been using some iterative search through the text to make sure that doesn't happen, but surely there must be an easier / more performant way to do this. Performance-wise, I doubt that iterating over and over and rebuilding the string is gonna work with many users on the chat. Right now I'm searching iteratively only for <noparse> and </noparse> and then appending my own noparse tags to clean the string...which feels dirty.
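For reference, the iterative approach described above can be sketched as a replace-until-stable loop. This is only a minimal sketch under stated assumptions: the class name `ChatSanitizer`, the method name `StripKnownTags`, and the tag list are hypothetical, and the list is an illustrative, incomplete subset of TextMesh Pro tags that would need extending to match whatever your TMP version actually parses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ChatSanitizer
{
    // ASSUMPTION: illustrative, incomplete subset of TextMesh Pro tag names.
    // Matches "<name>", "</name>", "<name=value>" and "<name attr=value>".
    private static readonly Regex KnownTag = new Regex(
        @"</?(?:noparse|b|i|u|s|color|size|alpha|align|font|link|mark|sprite|style|sub|sup)(?:[\s=][^<>]*)?>",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Removes known tags repeatedly until a pass removes nothing, so input like
    // "<<noparse>noparse>" cannot reassemble a tag out of the leftovers.
    public static string StripKnownTags(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        string previous;
        string current = input;
        do
        {
            previous = current;
            current = KnownTag.Replace(previous, string.Empty);
        } while (current.Length != previous.Length); // same length => nothing was removed

        return current;
    }
}
```

With this sketch, `ChatSanitizer.StripKnownTags("<<noparse>noparse>hi")` returns `"hi"`, while text such as `"I <3 you"` is left alone because `3 you` is not a known tag name. Each pass is linear, but deeply nested input can force many passes, so capping the message length before sanitizing is probably wise.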

    Can anyone recommend an asset that strips tags, or is there a new TextMesh Pro feature coming up that would let me get the text without the markup tags?

  2. MrLucid72


    Jan 12, 2016
    I'm also still looking for a solution. Did you find one? We are a social deduction game. People paste chat into their journal. Well, it pastes ALL THE TAGS! I have no idea of the best way to intercept this. I use a crude tag remover that "sort of" works, but it's not ideal:

    Code (CSharp):
    // Requires: using System.Text; and using UnityEngine; (for Debug.LogError)

    /// <summary>
    /// VERY SIMPLE! Drops everything between an opening and a closing angle
    /// bracket, whether or not it is a real rich text tag, so a literal
    /// opening bracket in normal text will mess this up (an unclosed one
    /// drops the rest of the string).
    /// Don't forget this strips tags COMPLETELY!
    /// </summary>
    /// <param name="richStr">Text that may contain rich text tags.</param>
    /// <returns>The text with every bracketed span removed, or "" if the input is null.</returns>
    public static string StripRichTagsFromStr(string richStr)
    {
        if (richStr == null)
        {
            Debug.LogError("[Common]**ERR @ StripRichTagsFromStr: input was null");
            return "";
        }

        StringBuilder sb = new StringBuilder(richStr.Length);
        bool insideTag = false;

        foreach (char c in richStr)
        {
            if (insideTag)
            {
                // Skip characters until the bracket closes.
                if (c == '>')
                {
                    insideTag = false;
                }
            }
            else if (c == '<')
            {
                insideTag = true;
            }
            else
            {
                sb.Append(c);
            }
        }

        return sb.ToString();
    }
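One possible refinement of the stripper above, offered only as a sketch: do the same single pass, but drop a bracketed span only when its name is a recognized tag, and keep stray brackets such as `<3`. The names `SinglePassTagStripper`, `Strip`, and `IsKnownTag` are hypothetical, and the tag list is an assumed, incomplete subset of TextMesh Pro tags. A stack of still-open `<` positions lets it catch tags that reassemble after an inner tag is removed without rescanning the string.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SinglePassTagStripper
{
    // ASSUMPTION: illustrative, incomplete subset of TextMesh Pro tag names.
    private static readonly HashSet<string> KnownTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "noparse", "b", "i", "u", "s", "color", "size", "alpha", "align",
            "font", "link", "mark", "sprite", "style", "sub", "sup"
        };

    public static string Strip(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        var output = new StringBuilder(input.Length);
        var openBrackets = new Stack<int>(); // positions in output of '<' that may still start a tag

        foreach (char c in input)
        {
            if (c == '<')
            {
                openBrackets.Push(output.Length);
                output.Append(c);
            }
            else if (c == '>' && openBrackets.Count > 0)
            {
                int start = openBrackets.Pop();
                if (IsKnownTag(output, start + 1))
                {
                    // Drop "<name..." from the output; the '>' is never appended.
                    output.Length = start;
                }
                else
                {
                    // A kept '>' means no earlier '<' can open a tag any more.
                    openBrackets.Clear();
                    output.Append(c);
                }
            }
            else
            {
                output.Append(c);
            }
        }

        return output.ToString();
    }

    // Reads an optional '/', then the name up to '=' or whitespace, and looks it up.
    private static bool IsKnownTag(StringBuilder buffer, int from)
    {
        int i = from;
        if (i < buffer.Length && buffer[i] == '/') i++;
        int nameStart = i;
        while (i < buffer.Length && buffer[i] != '=' && !char.IsWhiteSpace(buffer[i])) i++;
        return i > nameStart && KnownTags.Contains(buffer.ToString(nameStart, i - nameStart));
    }
}
```

Under these assumptions, `SinglePassTagStripper.Strip("<<<b>b>b>x")` returns `"x"` and `Strip("I <3 you > me")` returns the input unchanged. Whether the tag list matches what TextMesh Pro really parses is the part to verify before relying on it.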