TextMesh Pro Table of General Standard Chinese Characters

Discussion in 'UGUI & TextMesh Pro' started by Stephan_B, Sep 23, 2018.

  1. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Back in 2013, the Chinese government published a list of essentially all the characters that you might expect to encounter in electronic communication. The Table of General Standard Chinese Characters contains a list of these characters.

    This list comprises 8,105 characters divided into 3 groups. The first group contains the 3,500 most frequently used characters. The second group contains 3,000 characters that are still common but much less frequently used, and the last group contains the remaining 1,605, which are considered rare.

    Attached to this post are lists of these characters as hex values, which you can copy and paste into the Font Asset Creator using the Unicode Range (Hex) option, as seen below.

    upload_2018-9-23_14-50-45.png
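
For anyone building their own list, here is a minimal Python sketch of how a set of characters could be turned into comma-separated hex values for the Unicode Range (Hex) field. The function name `to_hex_list` and the three sample characters are placeholders for illustration, not the attached lists, and the sketch assumes the field accepts comma-separated hex code points.

```python
def to_hex_list(characters: str) -> str:
    """Return the unique code points of `characters` as comma-separated hex values."""
    # Skip whitespace (e.g. newlines from a text file) and drop duplicates.
    code_points = sorted({ord(ch) for ch in characters if not ch.isspace()})
    return ",".join(f"{cp:X}" for cp in code_points)


if __name__ == "__main__":
    sample = "一二三"  # placeholder sample, not the real table
    print(to_hex_list(sample))  # 4E00,4E09,4E8C
```

The same function can be pointed at the contents of a text file that holds one of the three groups.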
     

    Attached Files:

  2. E2R_Ben

    E2R_Ben

    Joined:
    Oct 30, 2009
    Posts:
    143
    Hey, really useful thanks, but how did you paste a 3500 hex code string into the TMP window?
     
  3. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    You paste it in the Character Sequence portion of the Font Asset Creator as shown above.
     
  4. E2R_Ben

    E2R_Ben

    Joined:
    Oct 30, 2009
    Posts:
    143
    upload_2018-10-12_12-41-3.png

    Yeah it says the string is too long
     
  5. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    That is an editor error which will go away in Unity 2018.3 (I believe). This error is due to the 65,535 vertices limitation which also affects the editor. The good news is it doesn't prevent the Font Asset Creator from doing its job.
     
  6. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    @Stephan_B Is it possible to get a breakdown of Korean and Japanese characters like the one you have for Chinese above?
     
  7. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    I would assume there are similar lists available for Japanese but I am not personally aware of any. This is where it would be nice to get some insight from someone fluent in the language.

    With regard to Korean / Hangul, the script is made up of 11,172 characters.
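
The 11,172 figure matches the Unicode Hangul Syllables block, which runs from U+AC00 to U+D7A3. A small Python sketch of the arithmetic, with the constant names chosen here purely for illustration:

```python
HANGUL_FIRST = 0xAC00  # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable

hangul_count = HANGUL_LAST - HANGUL_FIRST + 1

# 19 initial consonants x 21 vowels x 28 finals (counting "no final" as one).
assert hangul_count == 19 * 21 * 28

# A single hex range covering the whole block.
unicode_range_hex = f"{HANGUL_FIRST:X}-{HANGUL_LAST:X}"
print(hangul_count, unicode_range_hex)  # 11172 AC00-D7A3
```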

    Having said all of that, once the next release of TMP is available with Dynamic SDF support, managing languages like CJK which have large character sets will get much easier.
     
  8. Fachmah

    Fachmah

    Joined:
    Jan 11, 2017
    Posts:
    5
    Is there an ETA on the next release of TMP? :--)

    I'm currently working on a project which needs support for all Unicode characters, so I'm really looking forward to the Dynamic SDF support!
     
    Last edited: Jan 10, 2019
  9. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
  10. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
  11. fgc0109

    fgc0109

    Joined:
    Feb 28, 2018
    Posts:
    1
    Er... I'm Chinese.
    I write the characters into a file and select the "Characters from File" option.
    I think it's easier to read and use.
    QQ截图20190124223655.png

    You can find character files here:
    https://gist.github.com/z-rui/4cb3431f2bcf26ea39cd569a58e66003/
    (It's not my repo, I just found it.)

    P.S. I think 2048*2048 is not enough for 3500 characters; the characters look a little blurred.
    But 4096 makes the file too large, about 17MB, which is a problem on mobile phones.
    (I used 4096*4096 for 3500 characters before, and now I use 4096*4096 for more than 7000 characters.)
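
As a rough check on the ~17MB figure: assuming the atlas is stored uncompressed with a single 8-bit channel per pixel (an assumption about the texture format, not something stated in this thread), the size follows directly from the resolution. `atlas_megabytes` is a hypothetical helper.

```python
def atlas_megabytes(width: int, height: int, bytes_per_pixel: int = 1) -> float:
    """Uncompressed texture size in megabytes (1 MB = 1,000,000 bytes)."""
    return width * height * bytes_per_pixel / 1_000_000


for size in (2048, 4096):
    # Doubling the side length quadruples the pixel count and the size.
    print(f"{size} x {size}: {atlas_megabytes(size, size):.1f} MB")
```

Under that assumption a 4096 x 4096 atlas is four times the size of a 2048 x 2048 one, which is why splitting characters across smaller atlases can help on mobile.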

    P.P.S. I still have a question: TMP seems to use different shaders on PC and mobile,
    but in some places I need the characters to look the same on both PC and mobile (titles, for example).
    What should I do? Avoid advanced effects like 'lighting' and 'glow'?
     
    Last edited: Jan 24, 2019
  12. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Split the characters between a Primary font asset that is 2048 x 2048 and one or more fallback font assets. This way, the Primary can contain the more frequently used characters sampled at a higher sampling point size, while the less frequently used characters contained in any fallback use a lower sampling quality.

    When using fallback font assets, the ratio of Sampling Point Size to Padding must be the same but texture size can be different. So for instance, if the Primary has a sampling point size of 80 with padding of 8, then any fallback could have a sampling point size of 60 with padding of 6 or point size of 50 with padding of 5.
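
The ratio rule above can be checked with a few lines of Python. This is only a sketch of the arithmetic; `same_ratio` is a hypothetical helper, and the last fallback in the loop is a deliberately mismatched example that is not from the post.

```python
from fractions import Fraction


def same_ratio(primary, fallback) -> bool:
    """Compare Sampling Point Size / Padding for two (point_size, padding) pairs."""
    (p_size, p_pad), (f_size, f_pad) = primary, fallback
    # Fraction avoids floating-point rounding when comparing the ratios.
    return Fraction(p_size, p_pad) == Fraction(f_size, f_pad)


primary = (80, 8)  # point size 80, padding 8, so the ratio is 10
for fallback in [(60, 6), (50, 5), (60, 8)]:
    print(fallback, same_ratio(primary, fallback))
```

The first two fallbacks share the 10:1 ratio of the Primary; the third (60 with padding 8) does not.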

    Having said that, please see the following post / video about the soon-to-be-released Dynamic SDF system, which will make this a lot simpler.

    The same shaders are used on all platforms. The mobile distance field shaders provide better performance mostly because of their reduced feature set.

    Shader selection never changes based on platform, so if you create a Material Preset that uses the mobile distance field shader because you only need outline + shadow for that style of text, then that same material and shader will be used on all platforms. The text will render exactly the same on all of them. If you create another Material Preset where you want to use bevel and similar features, then that shader will also be used on all platforms for any text using that Material Preset.

    The choice of shader is based on the features / visual design you want for the text using the given Material Preset and shader.
     
  13. cym_hellfire

    cym_hellfire

    Joined:
    Aug 18, 2017
    Posts:
    11
    Hi, Stephan. I'm trying to generate the Chinese character atlas with the Font Asset Creator. I followed the steps above, but the generation process often ends with a Fatal Error in GC that hangs the editor, and I have to kill the editor process with Windows Task Manager. By the way, I'm using Microsoft YaHei as the Source Font File (the font file is too large to upload).

    Do you have any idea why this happens?

    Any response is appreciated. Thank you.

    Environment:
    Unity 2018.3.6f1 (64bit)
    TextMesh Pro 1.3.0
     
    Last edited: Feb 27, 2019
  14. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Time to update to the latest release of TMP with Dynamic SDF support. This is version 1.4.0-preview.2a for Unity 2018.3.

    Please see the following thread / post and be sure to watch the video to know what to expect.
     
  15. cym_hellfire

    cym_hellfire

    Joined:
    Aug 18, 2017
    Posts:
    11
    It works for me. Thank you very much.
     
  16. Alex3333

    Alex3333

    Joined:
    Dec 29, 2014
    Posts:
    342
    I am using the latest version, but I can't get 3 languages added: Korean, Chinese and Arabic. I tried it as described here. I downloaded the file and generated the font asset, but the Chinese characters are not added. What is the problem?
     

    Attached Files:

    • p1.jpg (284.5 KB)
  17. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Does your font file contain those Chinese characters?

    Can you provide me with the font file and your text file that contains the list of characters you are trying to add? Once I get those I can test on my end to make sure it behaves as expected.
     
  18. Alex3333

    Alex3333

    Joined:
    Dec 29, 2014
    Posts:
    342
    Good day. I have attached them.
     

    Attached Files:

  19. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    LiberationSans.ttf does not contain any Chinese, Japanese, Korean or Arabic characters, which is why these characters do not show up.

    You will have to select some other font file that contains characters for the specific languages you want to support. Google Fonts is a good source for such font files.
     
  20. paatz04

    paatz04

    Joined:
    Aug 1, 2013
    Posts:
    38
    Is it safe to use a main font with Latin characters and fallback fonts for Chinese, Korean, Arabic, Japanese, Russian, etc.? Will this have any negative impact on performance? Thank you!
     
  21. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    I still continue to recommend having different sets of font assets for different language regions.

    For instance, for Latin languages, I would include the Extended ASCII in my Primary font asset and Cyrillic and perhaps Greek in two static fallbacks or perhaps have both handled by a Dynamic Fallback. I would include one additional fallback that contains all the required characters to display the language selection menu / ui in all the different languages.

    For Chinese, I would have a primary font asset that is static and includes all the known Chinese characters used in the project, and then one dynamic fallback to catch other characters. I would again add one additional fallback (the same one as previously) to display the language selection menu / UI.

    Repeat for Japanese, Korean and Arabic.

    These additional resources could be loaded via Asset Bundles when these languages / regions are selected.

    Again, this all depends on the amount of text in the project where you could go with a primary and few static fallbacks and then rely on the dynamic system for everything else.

    P.S. Performance improvements to the dynamic system are coming in the next release including Multi Atlas Support.
     
  22. paatz04

    paatz04

    Joined:
    Aug 1, 2013
    Posts:
    38
    Thanks for the prompt reply. Is there a way to override the default font used for all TMPro components?

    Is there an ETA for the next release?
     
  23. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    You can change the Default Font Asset in the TMP Settings.

    Hopefully within the next 10 to 14 days.
     
  24. paatz04

    paatz04

    Joined:
    Aug 1, 2013
    Posts:
    38
    That's just for the editor though? I assume the actual fontAsset in TMP instances has to be set manually at runtime?
     
  25. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    That should work at runtime as well, but I would not suggest handling localization / switching font assets for specific languages that way, since most of the time a language switch also involves changing other assets / sprites / point size, other settings, etc.

    Most of the time, localization is handled by some resources / language manager or localization tool like I2 Localization for instance that tracks what needs to be changed for any given language.
     
  26. paatz04

    paatz04

    Joined:
    Aug 1, 2013
    Posts:
    38
    We only have to localize actual strings, so I guess I just have to use the dynamic fallback for all languages then.
     
  27. Rich_A

    Rich_A

    Joined:
    Nov 22, 2016
    Posts:
    338
    I was finally able to get Chinese fonts working using the following method:

    1. Use the file the Chinese user linked to above, for the base 3500 character set

    2. Download fonts from Google Fonts (their selection is surprisingly good) - the fonts I'd been using to date were failing - maybe they were missing too many characters

    3. Select 4k*4k resolution.

    4. Generate file. It should only take a couple of minutes. Anything longer than that, and something is wrong.
     
  28. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    With the introduction and release of the Dynamic SDF system in Unity 2018.3 or newer, for most of the previous use cases, it should no longer be necessary to create font assets that include all of the characters in this table.

    The recommended workflow is to continue to create / use a primary font asset that is static and contains all the known / used characters in the project but to now rely on a dynamic fallback for the characters that are unknown and coming from user input.
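
To illustrate the split between known characters and characters that only show up at runtime, here is a minimal Python sketch. The strings are hypothetical examples and `characters_in` is an illustrative helper; it only computes the character sets and does not touch TMP itself.

```python
def characters_in(texts):
    """Collect the unique non-whitespace characters used across `texts`."""
    return {ch for text in texts for ch in text if not ch.isspace()}


# Hypothetical localized UI strings that ship with the project.
project_strings = ["开始游戏", "设置", "退出"]
static_set = characters_in(project_strings)

# Hypothetical text typed by a user at runtime.
user_input = "开始聊天"
needs_fallback = sorted(characters_in([user_input]) - static_set)

print(len(static_set), needs_fallback)
```

The characters in `static_set` are candidates for the static primary font asset, while anything in `needs_fallback` is what the dynamic fallback would be expected to supply.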
     
    Last edited: Aug 19, 2019
  29. Rich_A

    Rich_A

    Joined:
    Nov 22, 2016
    Posts:
    338
    Not available for Unity 2017.4 though, right? I ship in 10 days, there are 2000 wishlisters waiting, no time to update!
     
  30. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    You are correct.

    The Dynamic SDF system was introduced in the TMP package version 1.4.0 for Unity 2018.3 or newer.
     
  31. Rich_A

    Rich_A

    Joined:
    Nov 22, 2016
    Posts:
    338
    Just another tip for anyone using the above Chinese user's file: I'd suggest adding the Latin alphabet, punctuation, and numbers. That way you don't have to worry about a fallback system, and it's only about 1% more characters.
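
One way to do this is to prepend the basic Latin characters to the character file's contents before using it with "Characters from File". A minimal Python sketch, where `with_basic_latin` is a hypothetical helper and the sample string stands in for the real file contents:

```python
import string


def with_basic_latin(cjk_characters: str) -> str:
    """Prepend letters, digits, punctuation and space, dropping duplicates."""
    basic = string.ascii_letters + string.digits + string.punctuation + " "
    seen = set()
    merged = []
    for ch in basic + cjk_characters:
        # Keep the first occurrence of each character; ignore line breaks.
        if ch not in seen and ch not in "\r\n":
            seen.add(ch)
            merged.append(ch)
    return "".join(merged)


sample = "中文A中\n"  # placeholder for the contents of the character file
print(len(with_basic_latin(sample)))  # 95 basic characters + 2 new ones = 97
```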
     
  32. Yeisonlop10

    Yeisonlop10

    Joined:
    Jan 10, 2018
    Posts:
    19
    @Stephan_B I am successfully using fallback fonts for my project, but the problem is that as we add more languages, these fallback fonts for TMP are growing to the order of 30-40 MB, and that's bad for our mobile app. We need to offer this support because our app receives info in different languages from social media. Do the new updates handle this better than this approach? Thank you in advance.
     
  33. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Two new features were added over the past year / months which will significantly improve the workflow when working with languages such as CJK.

    The first feature is support for Dynamic SDF, which makes it possible to add glyphs and characters to font assets in the Editor and at Runtime. See the following post and video.

    The second feature is support for Multi Atlas Textures per font asset, which allows a font asset to add additional atlas textures as needed to handle the growing list of glyphs and characters being added to it. See the following post about this feature.

    In terms of how these features impact the workflow for handling CJK and localization, the recommendation remains to create and use Primary Static font assets that contain the known characters in the project for any given language or group of languages. Then assign, as a fallback to the static primary, a Dynamic font asset with multi atlas textures enabled to handle all unknown / other characters coming from user input or other sources.
     
  34. Yeisonlop10

    Yeisonlop10

    Joined:
    Jan 10, 2018
    Posts:
    19
    Thank you @Stephan_B for your quick response. I'll explore the options you are providing. So far we are developing with Unity 2018.2 and may not be able to upgrade in the immediate term. Is it possible to package these languages so that the user downloads them and Unity installs them in TextMesh Pro as fallbacks? As I mentioned before, we are reading info from posts, so we need to support all characters in different languages. Thank you.
     
  35. sandworm

    sandworm

    Joined:
    Aug 28, 2019
    Posts:
    13
    Sorry to revive a dead thread, but can anyone think of a way to convert those characters above to traditional?
     
  36. unity_zDqKcsNr0fjURA

    unity_zDqKcsNr0fjURA

    Joined:
    Dec 1, 2018
    Posts:
    2
  37. DavidZobrist

    DavidZobrist

    Joined:
    Sep 3, 2017
    Posts:
    234
    The dynamic mode seems to work almost perfectly.
    However, it seems not to find some characters even though they exist in the font.

    Example:

    Font:
    Noto Sans Simplified Chinese

    Unity 2021.2.6f1
    TextMesh Pro 3.0.6

    Test string:
    每24小时提供每24小时

    The string renders correctly when tested on the Google website, so the font contains all the symbols.
    But in the editor it looks like this (dynamic mode is used, and other characters are found):
    upload_2022-2-14_19-44-25.png


    Problem solved:
    The atlas resolution was too small at 512; with 1024 it works fine.
     
  38. User414322

    User414322

    Joined:
    Jul 9, 2019
    Posts:
    27
    It would be nice if by default there was an option in TMPro to generate a font in Chinese, which automatically handles creating the main and fallback fonts, all linked together and ready to go. I mainly say this because I keep somehow screwing up and making random characters not render.
     
  39. RunninglVlan

    RunninglVlan

    Joined:
    Nov 6, 2018
    Posts:
    182
    I used the list from the PDF on this Wiki page to generate a static SDF Font Asset for Traditional Chinese.
    Here's the TXT file with the characters I used in the Font Asset Creator (choosing the Characters from File option under Character Set): https://gist.github.com/RunninglVlan/013113b9d4e3f73eeb303acd1d7f805d