
Question: Prevent TextMeshPro tags from being written in game

Discussion in 'UGUI & TextMesh Pro' started by RoeeHerzovich, Sep 19, 2020.

  1. RoeeHerzovich

    I have a TextMeshPro InputField that I want to use, and of course, it works.
    The problem is that people can type the following IN GAME:
    Hello <color=red>there</color> and the tags disappear and "there" becomes red.
    Now, I do want to use tags in my text, but I don't want players to be able to type whatever tags they like; I want tags to be added only from code. How can I do that? (I know how to add tags in code; I just don't know how to prevent players from typing them too.)
     
  2. Kurt-Dekker

  3. RoeeHerzovich

    Rich text does work if you just write the tag; the problem with that (I assume) is that it won't let you do it in code either:

    What I want is the following.
    I want to be able to do this:

    Code (CSharp):
    // in some script of mine
    inputField.text = "<color=red>Red Text</color>";
    But I don't want players to be able to do this (typing into the input field in game):
    <color=red>RedText</color>

    The first example should result in red text, because it was set via a script.
    The second example should result in the literal text "<color=red>RedText</color>", because it was typed in game and I don't want players to add their own rich text.
     
  4. Munchy2007

    I guess it would be handy if there were a tag pair that disabled parsing of other tags, something like <disabletags> text with tags in </disabletags>.

    Might be worth asking on the TMPro thread if it's possible to add that as a new feature.

    You could use the input validator that Kurt-Dekker mentioned to modify the entered text and add the new tags.

    But unless that ever becomes an option, you could possibly add a bad tag to the beginning of the text using the input validator; the rest of the tags then wouldn't make sense and would consequently be displayed as typed.
     
  5. RoeeHerzovich

    I think a clearer description of what I want is something like a code editor, where some words are automatically highlighted in color, but players can't make their own highlights, you know...
     
  6. Munchy2007

    As far as I'm aware, there's currently no way to pass text containing rich text tags to a TMP_Text component and have only some of them interpreted, which, as I understand the problem, is what you want.

    So, I still think the only option would be the addition of tag pairs to disable rich text formatting for portions of the text as I mentioned in my first post.

    Maybe someone else can suggest a different approach that eludes me.
     
  7. RoeeHerzovich

    I saw that solution online, and it really does look like the best solution in most cases. However, I am trying to make a syntax highlighter, meaning the entire text will have highlights (on the right keywords), yet I cannot allow players to add their own highlights.

    Either way, thank you for your help :) It's very much appreciated
     
  8. Kalita2127

    Can you share what you found? I'm also running into this problem in my project.
     
  9. Nad_B

    Well, Regex is your friend here:

    Code (CSharp):
    using System.Text.RegularExpressions;

    var sanitizedText = Regex.Replace(inputField.text, "<.*?>", string.Empty); // Remove all tags.
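To see what that pattern does and does not catch, here is the same regex wrapped in a hypothetical `TagStripper.StripTags` helper (plain C#, no Unity types, so it can be tried outside the editor):

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Same pattern as above: "<", then as few characters as possible, then ">".
    // "." does not match a newline, so a tag split across lines is not removed.
    public static string StripTags(string text) =>
        Regex.Replace(text, "<.*?>", string.Empty);
}
```

`StripTags("Hello <color=red>there</color>")` returns "Hello there", but `StripTags("a < b > c")` returns "a  c", so legitimate text between a "<" and a ">" is lost. Text written as backslash sequences such as `\u003ccolor=red\u003E` contains no "<" at all and passes through unchanged.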
     
  10. orionsyndrome

    Do what Nad_B suggests, or simply ban < and > altogether from the list of allowed characters.
    You can very easily check whether a string contains either, replace them, or notify the user with an "invalid characters" message. But in general, you need to make sure you sanitize your strings properly.
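A minimal sketch of banning "<" and ">" in plain C#; the class and method names are hypothetical. `ValidateChar` is written in the shape of TMP_InputField's `onValidateInput` callback (text, charIndex, addedChar) returning a char, assuming, as with Unity's own InputField, that returning '\0' rejects the typed character; check the TMP_InputField documentation for your version. A check like this runs on the client, so treat it as user feedback rather than protection against modified clients.

```csharp
using System;

public static class AngleBracketFilter
{
    private static readonly char[] Banned = { '<', '>' };

    // True if the text contains '<' or '>', e.g. to show an "invalid characters" message.
    public static bool HasBannedChars(string text) => text.IndexOfAny(Banned) >= 0;

    // Removes every '<' and '>' and keeps the rest of the text.
    public static string RemoveBannedChars(string text) => string.Concat(text.Split(Banned));

    // Per-character check shaped like a (text, charIndex, addedChar) -> char callback;
    // returning '\0' is assumed to mean "reject this character".
    public static char ValidateChar(string text, int charIndex, char addedChar) =>
        Array.IndexOf(Banned, addedChar) >= 0 ? '\0' : addedChar;
}
```

For example, `RemoveBannedChars("<b>bold</b>")` returns "bbold/b". In Unity it could be hooked up with something like `inputField.onValidateInput = AngleBracketFilter.ValidateChar;`, assuming the delegate signature above.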
     
  11. Nad_B

    This ^^^.

    Always sanitize the input you receive from users, especially if you send it to your servers or save it to a database. The rule here is to always assume bad intent from users.
     
  12. Rocksuit

    Use the <noparse> and </noparse> tags.
     
  13. Munchy2007

    Nice, I didn't know about those.
     
  14. karliss_coldwild

    Please don't (at least not without fully understanding the limitations). The obvious way of using Noparse is pretty much useless.

    Suppose you want to prevent insertion of formatting by changing "The user input is {0}" into "The user input is <noparse>{0}</noparse>".

    That doesn't really prevent the user from re-enabling formatting by inserting tags that close the noparse section:

    userInput="</noparse><color=red>red</color><noparse>"

    So the final string is
    <noparse></noparse><color=red>red</color><noparse></noparse>
    and you are back to square one. Noparse is only useful if you are making Unity example projects that include instructions about tag usage in the text, or other developer-written messages that contain a lot of < and > symbols. It is not suitable for dealing with untrusted user input.
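The breakout is plain string formatting, so it can be reproduced without Unity. `NoparseWrapper.Describe` is a hypothetical name for the naive approach described above:

```csharp
public static class NoparseWrapper
{
    // The "obvious" use of noparse: one pair around the whole user input.
    // Nothing stops the input itself from containing "</noparse>".
    public static string Describe(string userInput) =>
        string.Format("The user input is <noparse>{0}</noparse>", userInput);
}
```

`Describe("</noparse><color=red>red</color><noparse>")` returns "The user input is <noparse></noparse><color=red>red</color><noparse></noparse>", in which the color tag sits outside any noparse section.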

    In general there are 2 strategies for dealing with stuff like this:
    * Don't mix control data with actual data: either there is no formatting at all, or the formatting data is stored separately instead of inline with the text.
    * Escape sequences. All sane programming languages and text markup systems have something like this. For example, in C# string literals, to write a quotation mark you need to prefix it with a backslash, like this:
    Code (CSharp):
    string foo = "This is a quotation mark: \"";
    Since backslash is the special symbol that starts an escape sequence, you also need a way to input a backslash itself, which is done with a double backslash. Other systems, like HTML and XML-based formats, have more complex escape sequences that look like "&lt; &gt; &amp;" to represent the symbols < > &.
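To illustrate why the escape character must be escaped itself, here is a toy backslash scheme in C#. The class is hypothetical, handles only \" and \\, and is not what TMP does:

```csharp
using System.Text;

public static class BackslashEscaping
{
    // Escape the escape character first; otherwise a literal '\' in the data
    // would later be read as the start of a sequence.
    public static string Escape(string raw) =>
        raw.Replace("\\", "\\\\").Replace("\"", "\\\"");

    // Read left to right: a backslash means "take the next character literally".
    public static string Unescape(string escaped)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < escaped.Length; i++)
        {
            if (escaped[i] == '\\' && i + 1 < escaped.Length)
                i++; // skip the backslash and keep the character after it
            sb.Append(escaped[i]);
        }
        return sb.ToString();
    }
}
```

Swapping the two `Replace` calls in `Escape` would break the round trip, because the backslashes added in front of quotes would then be doubled as well.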

    For the first strategy, in TMP you can disable "Rich Text" on both the input and output TMP fields. You can still apply basic formatting to the whole text element. Unfortunately, you can't use the style sheet feature for more reusable style settings when the rich text option is disabled.

    For the second strategy, the situation in TMP is less clear. While TMP uses HTML-style markup, it doesn't seem to support "&lt;"-style escape sequences. I guess you can wrap each "<" character (not the whole user input) in noparse; that seems quite clunky, but I am not aware of a better way.

    Due to the way the TMP parser works, symbols written using Unicode escape sequences are also considered potential candidates for formatting markup. This means that normally "\u003ccolor=red\u003Eaaa\u003c/color\u003E" will be interpreted as red text, and your code needs to handle that as well, so the regex suggested by Nad_B is insufficient. When testing this, be careful not to mix up the backslash-based escaping done in C# string literals with the second layer done by TMP. There are quite a few possible variations of Unicode escape sequences (16-bit \uXXXX and 32-bit \UXXXXXXXX, plus arbitrary mixes of uppercase and lowercase letters in the hex digits), so I would probably target \ itself instead of specific backslash-based escape sequences.
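A minimal sketch of the per-"<" noparse idea, applied to the syntax-highlighter case from earlier in the thread. Assumptions, all worth testing in your TMP version: a lone "<" inside <noparse>...</noparse> is rendered literally; `color` comes from your own code, never from the player; backslash sequences are not handled (the Update at the end of this post says TMP only parses them for text set in the editor). `TmpSafeText` and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TmpSafeText
{
    // Wrap every '<' on its own, so the user's text can never form a tag,
    // including a "</noparse>" that tries to close a wrapper early.
    public static string EscapeForTmp(string text) =>
        text.Replace("<", "<noparse><</noparse>");

    // Escape the untrusted text, and add color tags only around keywords
    // that our own code found.
    public static string HighlightKeywords(string source, IList<string> keywords, string color)
    {
        if (keywords.Count == 0) return EscapeForTmp(source);

        var escapedWords = new List<string>();
        foreach (var k in keywords) escapedWords.Add(Regex.Escape(k));
        // \b on both sides so "if" does not match inside "gift".
        var pattern = @"\b(?:" + string.Join("|", escapedWords) + @")\b";

        var sb = new StringBuilder();
        int last = 0;
        foreach (Match m in Regex.Matches(source, pattern))
        {
            sb.Append(EscapeForTmp(source.Substring(last, m.Index - last)));
            sb.Append("<color=").Append(color).Append('>')
              .Append(EscapeForTmp(m.Value))
              .Append("</color>");
            last = m.Index + m.Length;
        }
        sb.Append(EscapeForTmp(source.Substring(last)));
        return sb.ToString();
    }
}
```

`HighlightKeywords("if a<b", new[] { "if" }, "#569CD6")` returns "<color=#569CD6>if</color> a<noparse><</noparse>b". Escaping each segment separately, instead of escaping the whole string first, keeps the keyword regex from ever matching text inside the inserted noparse tags.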

    One interesting observation: the Unicode escape sequences \uXXXX and \UXXXXXXXX get interpreted even when the "parse escape characters" option is disabled.



    As for the initial suggestion of using the input validator: I wouldn't use it to clean up untrusted input, especially in a multiplayer game. In general you want to do whatever escaping is required during output, not input. This is mainly because you might use the same data in different kinds of output, and different output methods require different escaping rules depending on context. You might want to display some of the data not only in game, but also on a website or in administration tools written using completely different technology and text formatting rules. Even within Unity you might switch to a different text component, or add a TextMesh Pro extension like Text Animator, which introduces additional formatting syntax.

    The other problem with the input validator in multiplayer games is that it's a client-side check. You don't want to rely purely on clients to self-check that they are not sending anything bad. Many fast-paced online games have a history of trusting clients excessively, meaning that modified client software is potentially a problem anyway. For gameplay, that broken design is at least somewhat excused by latency requirements. But for chat or other user-entered text, where timing isn't critical, there is no reason to do the wrong thing and trust the client unnecessarily. The input validator has its uses, mostly to prevent users from accidentally entering invalid text and to give them early feedback, but not as a way to deal with untrusted input.
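A minimal sketch of escaping at output rather than at input: the message is stored exactly as typed, and each output applies its own rule. `ChatMessage`, `ForTmp`, and `ForHtml` are hypothetical; `ForTmp` assumes the per-"<" noparse wrapping described in this post works in your TMP version, and `ForHtml` uses .NET's `System.Net.WebUtility.HtmlEncode`.

```csharp
using System.Net;

// Hypothetical chat message: the raw text is stored exactly as the user typed it.
public sealed class ChatMessage
{
    public string RawText { get; }

    public ChatMessage(string rawText) => RawText = rawText;

    // Escaping for a TMP rich-text label, applied only when displaying.
    public string ForTmp() => RawText.Replace("<", "<noparse><</noparse>");

    // Escaping for an HTML admin page uses a different rule set.
    public string ForHtml() => WebUtility.HtmlEncode(RawText);
}
```

For `new ChatMessage("<b>hi</b> & bye")`, `ForHtml()` returns "&lt;b&gt;hi&lt;/b&gt; &amp; bye", while `ForTmp()` leaves the "&" alone and only neutralizes the "<" characters, because the two outputs treat different characters as special.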


    Update
    My comment above about Unicode escape sequences was inaccurate. TextMesh Pro behaves differently depending on how the text is set. Backslash-based escape sequences are parsed by TMP only when setting text in the editor; this doesn't apply when setting text from code. Of course, the escaping rules of C# or any other data source still apply. This can result in slightly weird behavior, where a formatting tag set from code isn't interpreted, but making any change in the editor afterwards causes it to activate. Still, that's not as big a problem as \u always being interpreted.
     
    Last edited: Jan 2, 2023