TextMesh Pro Lowercase i + SC flag fails to render with tr-TR culture.

Discussion in 'UGUI & TextMesh Pro' started by hurleybird, May 4, 2020.

  1. hurleybird

    Joined:
    Mar 4, 2013
    Posts:
    258
    TMPro fails the Turkey Test for lowercase 'i' characters when the SC flag is set on a TextMeshPro component. These characters show up as boxes, but otherwise i's seem to render properly.

    An easy way to test this is to copy:

    Code (CSharp):
    // Requires: using System.Globalization; using System.Threading;
    Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
    Thread.CurrentThread.CurrentUICulture = new CultureInfo("tr-TR");
    into an Awake function.
     
  2. Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    The box is the "Missing Character Glyph", which indicates that this character and glyph are missing from your font asset.

    If your font asset is dynamic, then make sure the atlas texture is not full. If the atlas is not full, then whatever font file was used to create this font asset is missing the character, which is the ultimate reason why the missing character glyph is shown.

    If your font asset is static then you will need to either regenerate it to add this character and glyph or add a dynamic fallback created from a font file that includes this character.

    P.S. You can select the font asset and look into the Character Table to see if the Unicode character representing this uppercase variant of "i" is present in the font asset.
     
    Last edited: May 4, 2020
  3. hurleybird

    Joined:
    Mar 4, 2013
    Posts:
    258
    That's not the issue, at least not directly. TMP is most likely trying to use a special character when it shouldn't. If the correct weird dotted i glyph were in the asset, then that would probably be displayed, but that would still be the wrong behaviour. Turning regular i's or I's into ï, 1, or whatever isn't correct. There's some good reading about Turkish i's on the Turkey Test page.

    To reiterate and expand a bit, I tested with numerous font assets, including the TMP defaults, which are all affected. Uppercase I's are fine. Lowercase i's are fine. SC uppercase I's are fine. Only SC lowercase i's are affected, and only with tr-TR localisation.
     
  4. Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    I'll take a closer look tomorrow.
     
  5. Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    @hurleybird

    Since I am not fluent in Turkish, what string of text are you using for testing this?

    Here are the results I am getting using the following characters "i I ı İ". Top row of each image is the input and bottom row the result.

    To Lowercase
    [image: upload_2020-5-6_16-12-27.png]

    To Uppercase
    [image: upload_2020-5-6_16-13-48.png]

    To Small Cap
    [image: upload_2020-5-6_16-14-2.png]

    Which of these behaviors is incorrect?
     
  6. hurleybird

    Joined:
    Mar 4, 2013
    Posts:
    258
    First, a quick mea culpa. When I said defaults were affected, that was testing the defaults in my project. It looks like the current defaults include the Turkish symbols.

    Also, "correct" here is probably a bit more relative than I first let on. I think in about 99% of cases you'd want i-ToUpper to yield I and not İ, and likewise I-ToLower to yield i and not ı. The notable exception is games made by Turkish studios in Turkish for consumption by people in Turkey.
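
    A minimal sketch of the two behaviours being contrasted here, in plain C# outside Unity. The class and method names are illustrative only, and the Turkish results assume a runtime that actually ships tr-TR culture data (a build using invariant globalization would behave differently):

    ```csharp
    using System.Globalization;

    // Illustrative helper (hypothetical name): case conversion with an explicit culture.
    public static class TurkishCasingSketch
    {
        // Upper-cases text using the casing rules of the given culture.
        public static string Upper(string text, CultureInfo culture) => text.ToUpper(culture);

        // Lower-cases text using the casing rules of the given culture.
        public static string Lower(string text, CultureInfo culture) => text.ToLower(culture);
    }
    ```

    With CultureInfo.InvariantCulture, Upper("i", ...) gives "I" and Lower("I", ...) gives "i". With new CultureInfo("tr-TR") on a runtime that has Turkish culture data, the same calls give "İ" (U+0130) and "ı" (U+0131), which is the pair of characters the font asset then has to contain.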

    You might be interested in the thread where this was brought to my attention. The reporter is Turkish, and is also a Unity developer, so there's some good conversation. Interestingly, he reports that with TMPro 1.0.54 + Unity 2017.4.35, TMPro seems to follow the above guidance for ToUpper and ToLower, i.e. not changing i or I into those specific Turkish characters but using the usual English glyphs instead.

    One interesting point is that TMPro reacts to CurrentThread.CurrentCulture, and *not* CurrentThread.CurrentUICulture. It would be nice if it were the latter, and I go over my reservations about enforcing the former in the linked thread, but I'm guessing there's some deep-rooted Unity limitation here that prevents it.

    So, the best workaround I can think of at the moment is just rendering new assets with the special Turkish characters, and then editing the resultant png so that those special characters display as the usual glyphs. A bit of a PITA, but could be worse.

    But I think that the best possible solution here would be if you could set a project-wide setting for TMPro regarding which culture to use for conversions, which could be either the default culture or overridden with a specific one.
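
    To make the suggestion concrete, here is a rough sketch of what such a setting could look like. This is purely hypothetical: TextMesh Pro does not ship a CaseConversionSettings class, and every name below is made up for illustration.

    ```csharp
    using System.Globalization;

    // Hypothetical project-wide setting; not part of TextMesh Pro.
    public static class CaseConversionSettings
    {
        // When null, conversions follow the current thread culture
        // (the behaviour described in this thread). When set, this
        // culture is used for every conversion instead.
        public static CultureInfo OverrideCulture { get; set; }

        // The culture that conversions actually use.
        public static CultureInfo Effective => OverrideCulture ?? CultureInfo.CurrentCulture;

        public static string ToUpper(string text) => text.ToUpper(Effective);

        public static string ToLower(string text) => text.ToLower(Effective);
    }
    ```

    A project that never wants Turkish casing could set CaseConversionSettings.OverrideCulture = CultureInfo.InvariantCulture once at startup, while a Turkish-language game could leave it unset or assign tr-TR explicitly.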
     
  7. Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    The character and glyph being displayed is the result of whatever Unicode character is referenced in the string or in the Editor text field. The language culture selection does appear to work correctly in strings, but selecting Turkish in the IME / Windows language doesn't appear to affect that in the Editor. This is something I can check with the Editor team as this is not related to TMP. I.e., if the correct Unicode character is referenced in the string, TMP will do the right thing.

    Editing the atlas PNG as described above is way too complicated and not necessary. In the Font Asset Character and Glyph tables, you can control what glyph index a character points to. As such, you can edit these values to point any character to any other glyph.

    For instance, when \u0130 is requested, this character points to glyph index 242, which is the Turkish uppercase "İ". If the character instead referenced glyph index 44, then you would get the same "I" regardless of whether the Latin "I" \u0049 or the Turkish \u0130 is referenced in the string.

    [image: upload_2020-5-6_19-6-3.png]
    Character Table showing the character for the Turkish \u0130 character, which points to glyph index 242.

    [image: upload_2020-5-6_19-6-46.png]
    The glyph metrics for the glyph at index 242, which is the visual representation used by the Turkish \u0130 Unicode character.
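
    The remapping idea can be sketched with a toy model of a character table. This is not TextMesh Pro's actual data structure or API, only a dictionary from Unicode code point to glyph index, using the indices 44 and 242 quoted above; the class name and methods are hypothetical.

    ```csharp
    using System.Collections.Generic;

    // Toy model of a character table: Unicode code point -> glyph index.
    // Hypothetical; TextMesh Pro's real tables are edited in the Inspector.
    public sealed class CharacterTableSketch
    {
        private readonly Dictionary<uint, uint> glyphIndexByUnicode = new Dictionary<uint, uint>();

        // Adds a character or repoints an existing one to another glyph.
        public void Map(uint unicode, uint glyphIndex) => glyphIndexByUnicode[unicode] = glyphIndex;

        // Returns the glyph index, or null when the character is missing
        // (the case where the "missing glyph" box would be shown).
        public uint? Lookup(uint unicode) =>
            glyphIndexByUnicode.TryGetValue(unicode, out uint index) ? index : (uint?)null;

        // Builds the example from the post: both I variants share glyph 44.
        public static CharacterTableSketch BuildRemappedExample()
        {
            var table = new CharacterTableSketch();
            table.Map(0x0049, 44);   // Latin "I"
            table.Map(0x0130, 242);  // Turkish "İ" as originally generated
            table.Map(0x0130, 44);   // repointed to the Latin "I" glyph
            return table;
        }
    }
    ```

    After the remap, looking up either \u0049 or \u0130 returns glyph index 44, while a code point that was never mapped returns null.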
     
    Last edited: May 7, 2020
  8. hurleybird

    Joined:
    Mar 4, 2013
    Posts:
    258
    Interestingly enough, the Turkish characters themselves did finally render in my project after re-importing essential resources and examples. Looks like TMPro is correctly falling back to another text asset, even though I've kept my fallback settings the same as before...

    So my existing text assets don't actually contain the Turkish glyphs and the character table doesn't include the Turkish characters? Is it possible to add a new character to the table? Is this what the glyph adjustment records are for? If so, not quite sure how that works. Or do I need to make a new asset first, and then re-assign those characters to the correct glyphs?
     
    Last edited: May 7, 2020
  9. Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    The reason you were initially getting the missing glyph "Square" character is that the previous version of the LiberationSans SDF font asset didn't have any dynamic fallback assigned to it. Since these Turkish characters were not included in the static LiberationSans SDF font asset either, you were getting the missing glyph character.

    To add these characters to your font assets, you could add a dynamic fallback created from the same source font file to them or regenerate those font assets to include those additional characters.

    BTW: Inspect the newly imported LiberationSans SDF and you will see the fallback assigned to it, which likely contains a few characters including those Turkish characters and glyphs.

    The following videos should provide you with a better understanding of the dynamic system, multi atlas texture support and fallback support. Although these videos focus on how to easily create font assets to support CJK, the process is the same for essentially all languages.



     
  10. Shefich

    Joined:
    May 23, 2013
    Posts:
    143
    Hi,
    I have the same issue with Unity 2020.
    The dotted i started becoming dotless after using "ToLower()". If I try to lowercase "I", it becomes "ı".
    I only see this in Unity 2020. In Unity 2018 everything is OK.
     
  11. Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Make sure you are running comparable versions of TMP in 2018.4 and 2020.x.

    The latest releases are:
    1.5.3 for Unity 2018.4
    2.1.3 for Unity 2019.4
    3.0.3 for Unity 2020 or newer
     
  12. Shefich

    Joined:
    May 23, 2013
    Posts:
    143
    I'm running TMP 3.0.3 and Unity 2020.2.2.
    But this issue isn't connected to TMP.
    It's just that some Android devices convert the dotted "I" to the small dotless "ı" after the "ToLower()" function.
    I am not seeing this in the UI, just in logs, so it's just a code issue.
    It started to appear after I switched from Unity 2018 to Unity 2020 (no code changes).
     
  13. Shefich

    Joined:
    May 23, 2013
    Posts:
    143
    This "I" is dotted. Its lowercase variant is "i". But the "ToLower()" function thinks it's dotless ("ı").
     
  14. WaaghMan

    Joined:
    Jan 27, 2014
    Posts:
    245
    Hi!

    Just wanted to add our workaround to this "issue", the issue being that char.ToUpper() and char.ToLower() give unexpected results for the 'i' and 'I' characters when the system culture is set to Turkish.

    Our workaround was to just change the usage of those methods in TextMeshPro to use ToUpperInvariant() and ToLowerInvariant() instead, which ignore system culture and just return what you would expect.

    Not sure if this can be considered as the perfect fix, as maybe in some situation you want the method to use current culture (maybe a setting for that would be preferred), but for us it's better to ignore system culture whenever possible. Turkish also caused savedata issues in the past because of this.
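
    For anyone who wants to see the idea in isolation, here is a small sketch of a culture-independent small-caps pass. It is not TextMesh Pro's source code, just an illustration of the same substitution (ToUpperInvariant instead of ToUpper), and the type and method names are made up.

    ```csharp
    using System.Collections.Generic;

    // Illustrative only; names are hypothetical and this is not TMP's code.
    public static class InvariantSmallCaps
    {
        // One output character plus whether it should be drawn at reduced size.
        public readonly struct SmallCapChar
        {
            public SmallCapChar(char character, bool scaledDown)
            {
                Character = character;
                ScaledDown = scaledDown;
            }

            public char Character { get; }
            public bool ScaledDown { get; }
        }

        // Lowercase letters become scaled-down uppercase letters;
        // everything else is passed through unchanged.
        public static List<SmallCapChar> Convert(string text)
        {
            var result = new List<SmallCapChar>(text.Length);
            foreach (char c in text)
            {
                bool wasLower = char.IsLower(c);
                // ToUpperInvariant ignores the thread culture, so 'i' maps to 'I'
                // even when CurrentCulture is tr-TR.
                result.Add(new SmallCapChar(wasLower ? char.ToUpperInvariant(c) : c, wasLower));
            }
            return result;
        }
    }
    ```

    The same reasoning applies to save data and other non-display strings: comparing or normalising keys with the invariant methods keeps the result independent of the device's language setting.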

    I wonder if there's any other language in the world where those methods return different values for the a-z characters...
     
  15. Kurjenpolvi

    Joined:
    Nov 15, 2016
    Posts:
    14
    WaaghMan, your solution worked well for us. Thank you!