TextMesh Pro Character set for multiple languages

Discussion in 'Unity UI & TextMesh Pro' started by kabumere, Apr 1, 2018.

  1. kabumere

    kabumere

    Joined:
    Oct 2, 2016
    Posts:
    31
    Can anyone recommend a character set range for the Open Sans font that will include all of its supported characters? Its wiki page says it covers "Latin, Greek and Cyrillic alphabets with a wide range of diacritics". Using the Unicode Charts at unicode.org, it seems Cyrillic covers 0400 - 04FF, Greek covers 0370-03FF, and Latin 0000-007F.

    Would my range therefore be 0000-04FF? Would that be covering too much ground for a single asset? Are there any resource issues I should be cognizant of if trying to generate a font atlas for this many characters?

    EDIT: I went ahead and generated one for the Unicode range 0000-04FF and have attached a screenshot of the settings I used. Do any of the settings look wrong? Most of the characters that seemed to be missing were those with diacritics.

    The generated asset for this 1024x1024 atlas was only 2 MB, and I'm assuming the larger the atlas, the clearer the text? So should I increase it to 2048x2048?
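
    A rough sanity check on the ranges above can be sketched in Python. The block boundaries come from the unicode.org charts; the choice of which Latin blocks to list and the comma-separated hex output format are assumptions for illustration, so check them against what the Font Asset Creator accepts in your TMP version.

```python
# Unicode blocks (inclusive start/end) from the unicode.org charts.
# Which blocks a given font actually covers is NOT checked here.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-B": (0x0180, 0x024F),
    "Greek and Coptic": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
}


def block_size(start, end):
    """Number of code points in an inclusive range."""
    return end - start + 1


def to_hex_ranges(blocks):
    """Format blocks as comma-separated hex ranges, e.g. '0370-03FF'."""
    return ",".join(f"{start:04X}-{end:04X}" for start, end in blocks.values())


listed = sum(block_size(*r) for r in BLOCKS.values())
whole = block_size(0x0000, 0x04FF)
print("code points in listed blocks:", listed)
print("code points in 0000-04FF:", whole)
print("hex ranges:", to_hex_ranges(BLOCKS))
```

    The difference between the two counts is the blocks that sit between Latin Extended-B and Greek (IPA Extensions, Spacing Modifier Letters and Combining Diacritical Marks). The counts include control and unassigned code points, so the number of glyphs actually packed into the atlas will be lower.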
     

    Last edited: Apr 1, 2018
  2. Stephan_B

    Stephan_B

    Unity Technologies

    Joined:
    Feb 26, 2017
    Posts:
    2,290
    There are different ways your font asset(s) could be organized to handle multiple languages. How these will be structured kind of depends on language coverage, whether or not you will be dealing with user input (which typically requires increased glyph coverage), target platforms and to some extent personal preferences.

    Since in this case you are dealing with only Latin based languages which have a reduced character set (compared to Chinese, Japanese and Korean (CJK)), I would create a Primary Font Asset which includes the Extended ASCII set.

    When creating font assets, it is a good idea to think about how frequently characters from those font assets will be used. Assuming the Primary Font Asset which contains the Extended ASCII set will be used most often, you could use a higher sampling point size than the less frequently used characters from fallback font assets. So with this in mind, I created my Primary Font Asset with Extended ASCII with the following settings.

    upload_2018-3-31_18-55-52.png

    Then for the Cyrillic character set, I lowered the sampling point size while maintaining the same ratio of sampling point size to padding, which ensures a uniform visual appearance when falling back to these glyphs with different material presets. Lowering the sampling point size and padding also allows me to fit more glyphs in the same texture size. I included both Cyrillic and Cyrillic Supplement in the same font asset.

    upload_2018-3-31_18-59-21.png

    Then for Greek, I repeated the process once again lowering the sampling point size and padding (still maintaining the same ratio) to fit both the Greek and Greek Supplemental set in the same font asset. I am also assuming the use of Greek characters might be less frequent.

    upload_2018-3-31_19-2-27.png

    Now that my Primary Font Asset and two fallbacks have been created, I assigned both fallbacks (Cyrillic and Greek) to my Primary Font Asset.

    upload_2018-3-31_19-4-15.png

    Here is an example of a single text object using character(s) from each of those font assets.

    upload_2018-3-31_19-30-1.png


    In this example, I used 3 font assets resulting in (3) 1024 x 1024 font atlas textures. I could have used larger textures to perhaps combine Cyrillic with Greek or split up my font asset differently. Personally, I like using 1024 x 1024 textures as it is slightly more efficient on some devices than using 2048 x 2048 textures. I also like to keep my languages separated as this makes it easier if I need to make changes to any of them.

    Note that since these (2) fallback font assets are from NotoSans, it makes sense to assign them as fallback to the Primary NotoSans font asset. By contrast if I wanted to use symbols from the Font Awesome library and given I am likely to want to use symbols with any potential font asset, I would assign the Font Awesome font asset in the general fallback list in the TMP Settings file.
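
    To make the lookup order easier to picture, here is a simplified Python model of the idea: check the primary font asset, then its own fallbacks, then a general fallback list. This is only a sketch of the concept, not TMP's actual implementation; the `FontAsset` class, the `resolve` function and the character sets assigned to each asset are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FontAsset:
    """Hypothetical stand-in for a font asset: a name, its glyphs, its fallbacks."""
    name: str
    characters: set
    fallbacks: list = field(default_factory=list)


def _search(ch, asset, seen):
    """Depth-first search of an asset and its fallbacks, skipping repeats."""
    if id(asset) in seen:
        return None
    seen.add(id(asset))
    if ch in asset.characters:
        return asset
    for fallback in asset.fallbacks:
        found = _search(ch, fallback, seen)
        if found is not None:
            return found
    return None


def resolve(ch, primary, general_fallbacks=()):
    """Return the name of the first asset that contains ch, or None."""
    seen = set()
    for candidate in (primary, *general_fallbacks):
        found = _search(ch, candidate, seen)
        if found is not None:
            return found.name
    return None


# Illustrative character sets only; real coverage depends on what was baked in.
cyrillic = FontAsset("NotoSans Cyrillic", {chr(c) for c in range(0x0400, 0x0530)})
greek = FontAsset("NotoSans Greek", {chr(c) for c in range(0x0370, 0x0400)})
primary = FontAsset("NotoSans", {chr(c) for c in range(0x20, 0x100)}, [cyrillic, greek])
icons = FontAsset("Font Awesome", {"\uf000"})  # hypothetical icon code point

for ch in ["A", "\u0416", "\u03a9", "\uf000"]:
    print(f"U+{ord(ch):04X} ->", resolve(ch, primary, [icons]))
```

    In this model the per-asset fallbacks are searched before the general list, which is why language-specific fallbacks go on the primary font asset and shared symbols go in the general list.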

    Also keep in mind that Sprite Assets that contain sprites to which you have assigned Unicode values can also be used as fallbacks.

    Like I said above, there are many ways that you can structure / arrange your font assets and fallbacks. As long as you make sure the ratio of sampling point size to padding is the same between the font assets you expect to fall back to and from, this will work great.
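
    The ratio rule is simple arithmetic, and a small Python sketch can show it. The point sizes and padding values below are made up for illustration; they are not the settings from the screenshots.

```python
import math


def matching_padding(primary_size, primary_padding, fallback_size):
    """Padding that gives the fallback the same size-to-padding ratio as the primary."""
    return fallback_size * primary_padding / primary_size


def same_ratio(size_a, padding_a, size_b, padding_b):
    """True when both assets share the same sampling point size to padding ratio."""
    return math.isclose(padding_a * size_b, padding_b * size_a)


# Hypothetical primary asset: sampling point size 90 with padding 9 (ratio 10:1).
for fallback_size in (60, 50):
    padding = matching_padding(90, 9, fallback_size)
    print(f"fallback size {fallback_size}: padding {padding:g}")
```

    Since padding is entered as a whole number, it helps to pick fallback point sizes for which the computed padding comes out as (or very close to) an integer.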
     
    Last edited: Apr 1, 2018
  3. kabumere

    kabumere

    Joined:
    Oct 2, 2016
    Posts:
    31
    Your reply made me look into the differences between Noto Sans and Open Sans, and I see that Noto Sans supports more characters for the 3 writing systems above, as well as more languages overall, so I switched my font over. I followed your directions above and it's all looking good. I see at https://www.google.com/get/noto/help/cjk/ that Noto Sans CJK can support the Korean, Chinese and Japanese languages as well. Have you used any of those fonts in Unity before, and if so, which package would you recommend to support those languages?
     
    Last edited: Apr 1, 2018
  4. Stephan_B

    Stephan_B

    Unity Technologies

    Joined:
    Feb 26, 2017
    Posts:
    2,290
    Not sure what you mean by which package? Do you mean to handle localization?

    With regard to particular fonts, they can vary greatly from one to another. Some fonts may only include support for one language while others support more. Fonts also vary greatly in terms of design and visual appearance.

    In the end, you simply have to select font(s) which you feel will look good for your project and include the languages you want to support. Keep in mind that with the fallback system, you can use a specific font for Latin based languages and then a completely different one for CJK. Again it all comes down to what you feel will work best for your project / game.

    NotoSans and OpenSans are popular and widely used fonts but those are two out of thousands of fonts. When choosing a font you also have to be mindful of the licensing terms for the given font. Again, it comes down to picking what works best for you.

    There are lots of places to find fonts. Google Fonts is a pretty decent place.
     
  5. kabumere

    kabumere

    Joined:
    Oct 2, 2016
    Posts:
    31
    I was referring to the packages from the link I sent you. I assumed, perhaps incorrectly, that since you were using Noto Sans you might've been familiar with their CJK variants (and would thus know the best one on that page to use, specifically when targeting mobile across both iOS and Android).

    And thanks for the other info. I already use Google Fonts and only consider free, open source fonts.
     
  6. Stephan_B

    Stephan_B

    Unity Technologies

    Joined:
    Feb 26, 2017
    Posts:
    2,290
    I am familiar with the .otc version of this font. A lot of fonts that support CJK are packaged in the .otc format, which stands for OpenType Collection. These are basically a bunch of font files packed into a bigger one.

    Since in the case of TextMesh Pro you will be creating your own font asset and choosing what glyphs to include, what matters is whether or not that font file contains the glyph you care about. So whether those be .ttf, .otf or .otc, the TextMesh Pro Font Asset Creator can work with all of them.
     
  7. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    @Stephan_B Please let me know what I am doing wrong.
    I am trying to support the Korean language in my app but have failed so far. I followed these steps:
    1. Downloaded the NotoSans Korean font (OTF format) from https://www.google.com/get/noto/help/cjk/
    2. Copied the NotoSansCJKkr-Regular font into the asset/font folder.
    3. Launched the TMPro Font Asset Creator and generated a TMPro Font Asset with the font above as Font Source, Font Padding: 9, Character Set: ASCII Extended, Atlas Resolution: 1024 x 1024 (Pt / Size = 81).
    4. Saved the asset and referenced it in a TextMeshPro UI component.
    I typed some random Korean characters. All of them are shown as boxes.
    upload_2019-1-8_17-18-53.png


    This is how it looks
    upload_2019-1-8_17-22-5.png
     
  8. fffMalzbier

    fffMalzbier

    Joined:
    Jun 14, 2011
    Posts:
    2,730
    You have to specify which characters to include in your font asset.
    You selected "Extended ASCII", which does not include the Korean characters you need.
    If you need a lot of characters, you can select "Characters from File" under Character Set.
    Then you can give it a file that contains all the characters you need.
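
    One way to produce such a file is to collect every distinct character from the strings your project displays. Below is a minimal Python sketch of that idea; the function names, the sample strings and the output file name are hypothetical, and writing the file as UTF-8 is an assumption about what the Font Asset Creator expects.

```python
def collect_characters(strings):
    """Return every distinct printable character in the strings, sorted by code point."""
    chars = set()
    for text in strings:
        # Keep the space character but drop newlines, tabs and other control codes.
        chars.update(ch for ch in text if ch.isprintable())
    return "".join(sorted(chars))


def write_character_file(strings, path):
    """Write the distinct characters to a text file and return how many there are."""
    characters = collect_characters(strings)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(characters)
    return len(characters)


# Hypothetical localized strings; in practice these would come from your
# localization tables.
ui_strings = ["안녕하세요", "설정", "게임 시작\n"]
print(len(collect_characters(ui_strings)), "distinct characters")
# write_character_file(ui_strings, "korean_characters.txt")
```

    This only covers text known ahead of time. Characters typed by users at runtime still need broader coverage from another font asset.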
     
  9. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    @fffMalzbier Thanks. In my case, it's user input in Korean, Chinese or Japanese, so I have to include a bigger set of characters that the user may type.
    I'm looking for a way to split up Korean and Japanese so that I can break them into multiple fallback font assets. (I already have Chinese covered.)
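
    As a starting point for such a split, the Unicode blocks involved can be divided into equal hex ranges, one per fallback font asset. The Python sketch below assumes a plain even split by code point, which ignores how frequently each character is used; `split_range` is a hypothetical helper and the number of parts is arbitrary.

```python
# Inclusive block boundaries from the Unicode charts.
HANGUL_SYLLABLES = (0xAC00, 0xD7A3)
JAPANESE_BLOCKS = {
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),  # shared with Chinese
}


def split_range(start, end, parts):
    """Split an inclusive code point range into `parts` contiguous sub-ranges."""
    total = end - start + 1
    base, extra = divmod(total, parts)
    ranges = []
    cursor = start
    for index in range(parts):
        size = base + (1 if index < extra else 0)
        if size == 0:
            continue  # more parts than code points
        ranges.append((cursor, cursor + size - 1))
        cursor += size
    return ranges


for first, last in split_range(*HANGUL_SYLLABLES, 3):
    print(f"{first:04X}-{last:04X} ({last - first + 1} code points)")
```

    Each printed range could then be used for one fallback font asset. For Japanese, the kana blocks are small enough to share one asset, while the ideograph block is the part that would need splitting or a curated character list.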
     
  10. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
  11. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    @Stephan_B What is the memory and performance impact of using multiple fallback fonts? I have a 34 MB font asset for Korean characters. How will it behave in the application? Does this mean these assets will be loaded in RAM even when no Korean characters are being displayed? My application is sensitive to runtime memory and I am looking for ways to optimise it.
     
  12. Stephan_B

    Stephan_B

    Unity Technologies

    Joined:
    Feb 26, 2017
    Posts:
    2,290
    When the primary font asset to which you may have assigned additional fallbacks is loaded, these fallbacks will be loaded at the same time. Unfortunately, it is not currently possible to load these on demand.

    In theory, you should be able to fit (at reasonable quality) all 11,172 Hangul syllables in a single 16 MB 4096 x 4096 atlas. I would recommend breaking this up into 2048 x 2048 textures instead, but the total size should be less than the 34 MB you currently have.

    I am assuming here that your asset only contains the Korean characters and not other languages.
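
    The arithmetic behind those numbers can be sketched in Python. This assumes an uncompressed atlas with one byte per pixel (a single alpha channel), which is an assumption about the texture format, and it treats glyph packing as a perfect grid, which real packing will not achieve.

```python
import math


def atlas_bytes(side, bytes_per_pixel=1):
    """Uncompressed size in bytes of a square atlas, assuming one channel by default."""
    return side * side * bytes_per_pixel


def average_cell(side, glyph_count):
    """Side length in pixels of the square cell each glyph would get on average."""
    return math.sqrt(side * side / glyph_count)


MIB = 1024 * 1024
for side in (1024, 2048, 4096):
    print(f"{side} x {side}: {atlas_bytes(side) / MIB:g} MiB")

# 11,172 Hangul syllables packed into one 4096 x 4096 atlas:
print(f"average cell: {average_cell(4096, 11172):.1f} px, padding included")
```

    Four 2048 x 2048 atlases cover the same area as one 4096 x 4096 atlas, so splitting the glyphs across them keeps the total texture memory the same under these assumptions.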

    BTW: This is where the new Dynamic SDF system will be very nice. My recommendation will be to create a static font asset that contains all the known / used Hangul characters in the project (i.e. in menus, etc.), which should easily fit in a 2048 x 2048 atlas texture, and then to assign one dynamic fallback which will get populated at runtime to catch whatever other glyphs might be needed.
     
  13. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    Since this is being done solely for user input and I am not Korean, I have no idea of the character frequency, so I am keeping it as a single asset.
    Is there a way to have one primary and 1-2 fallback assets to support all three languages (CJK) together?
    @Stephan_B Thanks for your answer.
     
  14. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    Also, what benefit does breaking up the character set into different font assets provide if all of them are loaded along with the primary? @Stephan_B
     
  15. Stephan_B

    Stephan_B

    Unity Technologies

    Joined:
    Feb 26, 2017
    Posts:
    2,290
    Since older mobile devices are limited to using a maximum of 2048 x 2048 textures, you can only fit a finite number of characters in them.

    In addition, although some devices allow you to use larger textures, reading in larger textures is not as efficient as doing so in smaller textures.

    Besides the above points, being able to split your characters across several font assets, structured to support single languages or language groups, gives you more flexibility.

    Having said all of that, this will change with the pending release of the new Dynamic SDF system. You will mostly only have to create a Primary Font Asset that contains the known characters in the given project for a given language or group of languages, and then, where appropriate, assign a dynamic fallback font asset to catch any other character(s) coming from user input.

    In case you have not watched this video yet, I suggest you do so to get a better understanding of the new system.

     
  16. shacharoz

    shacharoz

    Joined:
    Jul 11, 2013
    Posts:
    9
    Can I use TextMesh Pro with languages like Hebrew or Arabic?
     
  17. Stephan_B

    Stephan_B

    Unity Technologies

    Joined:
    Feb 26, 2017
    Posts:
    2,290
    Yes. However, TMP only includes basic support for RTL so you will need to consider using the following free plugin in addition to TMP.

    For Hebrew text that makes use of diacritical marks, you will need to manually set up many of those adjustment pairs in the Glyph Adjustment Pair table.

    Full support for Diacritical Marks as well as many additional OpenType Font Features is planned and currently being worked on.