
Question: Several Unicode characters cannot be shown using TMP

Discussion in 'UGUI & TextMesh Pro' started by goohhh111, Sep 15, 2020.

  1. goohhh111

    I found that several characters in the CJK Unified Ideographs Extension B block cannot be displayed in game.

    These characters look correct in the SDF font asset's character table and glyph table, but fail to appear in the Game view.

    The characters are completely empty in the Game view; they are not rendered as the missing-character symbol (□).

    List of failing characters:
    1. U+2200A
    2. U+23000
    3. U+22004
    4. U+22001
    Environment:

    Windows 10
    Unity 2019.4.10f1
    TextMesh Pro 2.1.1

    Thanks for the help!
     
  2. Stephan_B

    Make sure that you are using the correct notation for referencing UTF-16 and UTF-32 characters.

    UTF-16 is \uF0F0: a lowercase "u" followed by two hex pairs (four hex digits).

    UTF-32 is \UFF00FF00: an uppercase "U" followed by four hex pairs (eight hex digits).
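
The notation above can be sketched in a few lines of Python. This is only an illustration of the escape format, not TextMesh Pro code; the helper names `to_tmp_escape` and `from_tmp_escape` are made up for this example.

```python
def to_tmp_escape(code_point: int) -> str:
    """Format a code point as \\uXXXX (4 hex digits) or \\UXXXXXXXX (8 hex digits)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    if code_point <= 0xFFFF:
        # Fits in one UTF-16 code unit: lowercase "u" plus two hex pairs.
        return f"\\u{code_point:04X}"
    # Needs the full 32-bit form: uppercase "U" plus four hex pairs.
    return f"\\U{code_point:08X}"


def from_tmp_escape(text: str) -> int:
    """Parse a single \\uXXXX or \\UXXXXXXXX escape back into a code point."""
    if text.startswith("\\u") and len(text) == 6:
        return int(text[2:], 16)
    if text.startswith("\\U") and len(text) == 10:
        return int(text[2:], 16)
    raise ValueError(f"not a \\u or \\U escape: {text!r}")


if __name__ == "__main__":
    # The four code points reported in the first post.
    for cp in (0x2200A, 0x23000, 0x22004, 0x22001):
        print(f"U+{cp:04X} -> {to_tmp_escape(cp)}")
```

All four characters from the first post lie above U+FFFF, so each of them needs the uppercase `\U` form with eight hex digits.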

    If you are using the correct notation in the text, can you provide me with a link to the font file you are using and the specific text you are using to reference these characters?
     
  3. goohhh111

    The following files are the original font file and its SDF font asset, which includes the failing characters:

    https://drive.google.com/file/d/1bV2wPoBPXAmyRm_gv3aBx8KdLiauKUCt/view?usp=sharing

    The attached image shows these characters in the character table.

    Thanks!


     


  4. Stephan_B

    Thank you for providing the font asset for review.

    I can now confirm the issue, which is related to incorrect handling of UTF-32 characters with regard to Unicode categorization. More specifically, these characters are being treated as white space, which they are not.

    Will follow up as soon as I have a fix for this.
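
As a side observation (the thread does not say how the misclassification happens inside TMP, so treat this as a guess): the low 16 bits of each reported code point happen to coincide with a Unicode space character. The Python sketch below uses the standard `unicodedata` module to compare the category of each full code point with the category of its low 16 bits; `category_report` is a hypothetical helper written for this example.

```python
import unicodedata

# Code points reported as failing in the first post.
FAILING = (0x2200A, 0x23000, 0x22004, 0x22001)


def category_report(code_points):
    """Return (code point, its category, low 16 bits, category of the low 16 bits)."""
    rows = []
    for cp in code_points:
        low16 = cp & 0xFFFF  # what remains if the value is cut down to 16 bits
        rows.append(
            (cp, unicodedata.category(chr(cp)), low16, unicodedata.category(chr(low16)))
        )
    return rows


if __name__ == "__main__":
    for cp, cat, low16, low_cat in category_report(FAILING):
        print(f"U+{cp:05X} is {cat}; low 16 bits U+{low16:04X} is {low_cat}")
```

The full code points are categorized as letters ("Lo"), while the truncated values fall in the space separator category ("Zs"), which would be consistent with the characters rendering as blank rather than as the missing-character symbol.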
     
  5. goohhh111

    Thanks for the help! Look forward to the fix.

     
  6. SakuraBloom

    Is there any progress? I have a CJK Extension E character "" that Unity can't display, but I have to use it.

    When I try to put it in a UGUI Text component, I get this error:

    Error: UTF-16 to UTF-8 conversion failed because the input string is invalid
    UnityEngine.GUIUtility:processEvent(Int32, IntPtr)

    It seems that Unity only supports the common CJK block and Extension A. Is there any way to support Extensions B through E?
    I'd really appreciate anyone's help.
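
The thread does not establish what triggered this error. One common cause of a failed UTF-16 to UTF-8 conversion is an unpaired surrogate, which can appear when a character above U+FFFF is split or truncated. The Python sketch below illustrates that general failure mode only; it says nothing about Unity's internals, and `utf16_units_to_utf8` is a name invented for this example.

```python
def utf16_units_to_utf8(units):
    """Convert a sequence of UTF-16 code units to UTF-8 bytes.

    Raises UnicodeDecodeError if the sequence contains an unpaired surrogate.
    """
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    return raw.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__":
    # A complete surrogate pair (here the pair for U+2BF94) converts cleanly.
    print(utf16_units_to_utf8([0xD86F, 0xDF94]).hex())

    # A high surrogate followed by an ordinary character is invalid UTF-16.
    try:
        utf16_units_to_utf8([0xD86F, 0x0041])
    except UnicodeDecodeError as err:
        print("conversion failed:", err.reason)
```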
     
  7. Stephan_B

    The UI Text component is a legacy text component and is no longer in active development. For that reason, I would suggest switching over to TextMesh Pro, which is still in active development.

    The text input box of the TextMeshPro component supports UTF-16 and UTF-32 notation, so you can reference UTF-32 CJK characters by using, for instance, "\U0002BF94", which is a character in CJK Extension E.

    Referencing the same Unicode character via a string still works as before, using the same UTF-16 or UTF-32 convention.
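
For reference, a character such as U+2BF94 does not fit in a single UTF-16 code unit, so UTF-16 strings store it as a surrogate pair. The following Python sketch shows the standard arithmetic for converting between a code point and its surrogate pair; the function names are chosen for this example and are not part of any TMP or Unity API.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError(f"U+{code_point:04X} does not need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value to distribute
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a UTF-16 surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x2BF94)
    print(f"U+2BF94 -> \\u{high:04X} \\u{low:04X}")
    print(f"back to U+{from_surrogate_pair(high, low):05X}")
```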

    Below is an example of that glyph rendered using the PingFang.ttc font file, and more specifically the PingFang TC - Regular face contained in that TrueType Collection. So, assuming the font file you have selected contains glyphs in those Unicode ranges, TMP is able to display them.

    upload_2022-1-7_13-32-34.png

    P.S. Coincidentally, I was just updating the line breaking rules for CJK this morning, where I added these additional CJK ranges. These latest changes only affect line breaking rules, not the ability to display those characters.

    This is some internal code related to the above, which is based on the latest Unicode 14 standard:
    upload_2022-1-7_13-40-3.png
     
    Last edited: Jan 7, 2022
  8. SakuraBloom

    Thank you very much for your reply. This is the first time I've used TMP. I used the same font, PingFang TC - Regular. I tried everything I could think of, but nothing worked. Forgive my ignorance, but could you give me a working file or tell me how to configure it?
     
  9. Stephan_B

    I would suggest using a dynamic font asset. Make sure that you are using the correct face in PingFang, since it is a TrueType Collection.

    I would also suggest watching the following two videos, as they contain lots of relevant information about font assets and CJK handling.



     
  10. SakuraBloom

    I succeeded, thank you so much.