Soft hyphenation not working when setting through script

Discussion in 'UGUI & TextMesh Pro' started by MSQTobi, Oct 20, 2020.

  1. MSQTobi

    Joined:
    Feb 12, 2018
    Posts:
    8
    I tried the soft hyphenation feature of TMP, and it works great in general.

    However, when I set the text of a TextMeshProUGUI component through script, it does not interpret the
    soft hyphen characters ("\u00AD") and instead simply displays them.
    I found a workaround by invoking "OnValidate", but this only works in the editor ;(

    I made a simple test script that can be used to reproduce the bug:
    Code (CSharp):
    using UnityEngine;

    public class HyphenationTest: MonoBehaviour
    {
        public TMPro.TextMeshProUGUI textComp;
        public bool setWithValidate;
        public bool setWithoutValidate;
        public bool clear;

        private const string testString = @"Spe\u00ADdi\u00ADtions\u00ADkauf\u00ADfrau/\u00ADmann";

        void Update()
        {
            if(setWithoutValidate || setWithValidate)
            {
                textComp.text = testString;
                if(setWithValidate)
                {
                    textComp.SendMessage("OnValidate");
                }
                setWithoutValidate = false;
                setWithValidate = false;
            }
            if(clear)
            {
                textComp.text = "";
                clear = false;
            }
        }
    }
    Check clear flag -> check setWithoutValidate results in:
    WithoutValidate.jpg
    Check clear flag -> check SetWithValidate results in:
    WithValidate.jpg

    Any ideas how to (hack-)fix this? I will report this as a bug ASAP, but I doubt I have the time to wait for an official fix.

    P.S:
    Unity Version 2019.4.9f
    TextMeshPro 2.1.1
     
  2. Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    My apologies for the late reply. I never saw this post until I was notified about the bug report today.

    The above behavior is simply the result of the "@" in front of the string, which makes it a C# verbatim string literal. In a verbatim literal, escape sequences are not interpreted, so \u00AD stays as exactly those six characters. If you remove the "@", \u00AD is treated as a C# Unicode escape sequence, which references a single UTF-16 character. \U000000AD is the corresponding C# escape sequence for referencing a UTF-32 character.

    If you change your string to the following, it will render correctly.
    Code (csharp):
    private const string testString = "Spe\u00ADdi\u00ADtions\u00ADkauf\u00ADfrau\u00ADmann";
    As for why calling OnValidate() makes it work: it fakes out the text object, which now thinks the text is coming from the Text Input Box. That path has special handling that lets users type Unicode escape sequences like \u00AD.
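
    As a small illustration of the difference, here is a standalone sketch (not part of the original test script; the class name EscapeDemo and the helper HasSoftHyphen are made up for this example) that defines the same text once as a verbatim literal and once as a regular literal:

```csharp
using System;

public static class EscapeDemo
{
    // Verbatim literal: the backslash is kept, so "\u00AD" stays six separate characters.
    public const string Verbatim = @"Spe\u00ADdi";

    // Regular literal: the compiler turns \u00AD into the single character U+00AD.
    public const string Regular = "Spe\u00ADdi";

    // Returns true if the string contains a real soft hyphen (U+00AD).
    // The char overload of IndexOf compares ordinally, so the soft hyphen is not ignored.
    public static bool HasSoftHyphen(string s)
    {
        return s.IndexOf('\u00AD') >= 0;
    }
}
```

    With these definitions, EscapeDemo.Verbatim.Length is 11 and EscapeDemo.Regular.Length is 6, and only the regular literal contains a real soft hyphen.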
     
    Last edited: Nov 18, 2020
  3. Xarbrough

    Joined:
    Dec 11, 2014
    Posts:
    1,188
    Sorry to hijack this thread, but I have a very similar issue. I'm also setting text containing \u00AD via script, and for me TMP doesn't update either unless I disable and re-enable the TMP component or toggle "parseEscapeCharacters" in the inspector.

    I tried calling SetVerticesDirty/SetLayoutDirty manually to force a rebuild, although setting the text property should already handle that, but alas it doesn't work. I don't have any @ symbol in my string.

    Might there be a different reason why my string is not parsed, yet works again once it gets validated by the inspector?

    I already tried using something like System.Uri.UnescapeDataString in hopes of converting "\u00AD" to a real unicode symbol, but this doesn't help either.

    I looked at the data I'm loading from disk, and it shows up as something like Polizei\\u00ADbericht, for example. This data was exported from Excel and serialized as a ScriptableObject by Unity. In all other cases (special symbols, umlauts, etc.) Unity just takes care of the encoding and everything works fine. But in this case, could the double backslash be the issue? The double backslash doesn't show when I debug my C# string, but things like that are hard to judge, because it could just be how the debugger displays the string.
     
    Last edited: Feb 11, 2021
  4. Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    That is the issue.

    The double backslash tells C# to treat these characters as literals. As such, the text becomes "\" + "u" + ... and is never converted into the specific Unicode character. If the text is coming from some external source, like XML, you will need to replace those double backslashes. There should be several posts about this around.

    It is important to also understand the following:

    string s = "\u200B"; // This gets converted to a single character, the Unicode code point for Zero Width Space.

    public string s = ""; // If you type "\u200B" in the inspector, it gets escaped to "\\u200B", which ends up being "\" + "u" + ...

    So be mindful of how and where you define the string content.

    When such a string is passed to TMP, it has no idea whether it comes from a string defined in code or from the same string whose value you changed in the inspector. As such, TMP has to parse the text exactly as it is.

    In the TMP Input Text Box, TMP knows the text is coming from its own input text box so it can parse the text differently based on the value of "parseEscapeCharacters".

    In terms of the text not updating, I would need more information and most likely some repro project to figure out why.
     
  5. Xarbrough

    Joined:
    Dec 11, 2014
    Posts:
    1,188
    Thanks, this clears up the TextMeshPro side for me! The issue was hard to debug, since the debugger also displays strings with escaping. The solution was to unescape the string before setting it via the text property. Once that was correct, the text also updated correctly.

    To unescape my string I used Regex.Unescape, which kind of sounds like a heavyweight method. System.Uri.UnescapeDataString sounds similar but didn't work in my case.
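
    For anyone who wants something narrower than Regex.Unescape, here is a minimal sketch that only decodes literal \uXXXX sequences and leaves every other backslash alone. The class name UnicodeEscapes and the method Decode are made up for this example and are not part of TMP or .NET. It assumes the string in memory contains a single backslash followed by "u" and four hex digits:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // Matches a literal backslash, 'u', and exactly four hex digits, e.g. \u00AD.
    private static readonly Regex EscapePattern = new Regex(@"\\u([0-9A-Fa-f]{4})");

    // Replaces each literal \uXXXX sequence with the single UTF-16 character it names.
    public static string Decode(string raw)
    {
        return EscapePattern.Replace(raw, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```

    The decoded string can then be assigned as usual, for example textComp.text = UnicodeEscapes.Decode(rawText); where rawText stands for whatever was loaded from disk.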
     
  6. horatiu665

    Joined:
    Nov 22, 2012
    Posts:
    24
    Thanks to Xarbrough's Regex.Unescape solution I also managed to fix the issue. Although I thought my data didn't have \\ double backslashes, it turns out it did. I just couldn't see them in the Unity console, nor even in the Visual Studio text debugger: sometimes the strings showed up with one backslash (\u00AD) and sometimes with two.

    It turns out I was pulling data from somewhere, using FileStream to write and StreamReader to read. But with FileStream I could not specify the encoding used to write the data, and the backslashes ended up as double backslashes. The internet says replacing FileStream with StreamWriter and StreamReader could fix the encoding issue, and this might bypass the need to do the escaping/unescaping, but that's a task for another time... In short, check your data, people!
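
    For reference, a minimal sketch of what the StreamWriter/StreamReader pairing might look like, with the encoding stated explicitly on both sides. The class and method names are made up for this example, and it is untested against the pipeline described above, so it does not confirm where the double backslashes actually came from:

```csharp
using System.IO;
using System.Text;

public static class TextFileRoundTrip
{
    // Writes the text as UTF-8 without a byte order mark, overwriting any existing file.
    // Characters such as U+00AD are encoded as bytes, not turned into escape sequences.
    public static void Write(string path, string text)
    {
        using (var writer = new StreamWriter(path, false, new UTF8Encoding(false)))
        {
            writer.Write(text);
        }
    }

    // Reads the whole file back, decoding it as UTF-8.
    public static string Read(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

    If the text written this way already contains a real soft hyphen, it should come back as that same single character, with no unescaping step needed.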