
TextMesh Pro Character set for multiple languages

Discussion in 'UGUI & TextMesh Pro' started by kabumere, Apr 1, 2018.

  1. kabumere

    kabumere

    Joined:
    Oct 2, 2016
    Posts:
    31
    Can anyone recommend a character set range for the Open Sans font that will include all of its supported characters? Its wiki page says it covers "Latin, Greek and Cyrillic alphabets with a wide range of diacritics". Using the Unicode Charts at unicode.org, it seems Cyrillic covers 0400 - 04FF, Greek covers 0370-03FF, and Latin 0000-007F.

    Would my range therefore be 0000-04FF? Would that be covering too much ground for a single asset? Are there any resource issues I should be cognizant of if trying to generate a font atlas for this many characters?

    EDIT: Went ahead and generated one for the Unicode range 0000-04FF and have attached a screenshot of the different settings I used with it. Do any of the settings look wrong? The majority of the characters that seemed to be missing were those with diacritics.

    The generated asset for this 1024x1024 atlas was only 2 MB, and I'm assuming the larger the atlas, the clearer the text? So should I up it to 2048x2048?
     


    Last edited: Apr 1, 2018
    Constantyne and mchts like this.
  2. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    There are different ways your font asset(s) could be organized to handle multiple languages. How these will be structured depends on language coverage, whether or not you will be dealing with user input (which typically requires increased glyph coverage), target platforms and, to some extent, personal preferences.

    Since in this case you are dealing with only Latin based languages which have a reduced character set (compared to Chinese, Japanese and Korean (CJK)), I would create a Primary Font Asset which includes the Extended ASCII set.

    When creating font assets, it is a good idea to think about how frequently characters from those font assets will be used. Assuming the Primary Font Asset which contains the Extended ASCII set will be used most often, you could use a higher sampling point size than the less frequently used characters from fallback font assets. So with this in mind, I created my Primary Font Asset with Extended ASCII with the following settings.

    upload_2018-3-31_18-55-52.png

    Then for the Cyrillic character set, I decided to lower the sampling point size while making sure to maintain the same ratio of sampling point size to padding to ensure uniform visual appearance when falling back to these while using different material presets. I also lowered the sampling point size and padding to allow me to fit more glyphs in the same texture size. I include both Cyrillic and Cyrillic Supplement in the same font asset.

    upload_2018-3-31_18-59-21.png

    Then for Greek, I repeated the process once again lowering the sampling point size and padding (still maintaining the same ratio) to fit both the Greek and Greek Supplemental set in the same font asset. I am also assuming the use of Greek characters might be less frequent.

    upload_2018-3-31_19-2-27.png

    Now that my Primary Font Asset and two fallbacks have been created, I assigned both fallbacks (Cyrillic and Greek) to my Primary Font Asset.

    upload_2018-3-31_19-4-15.png

    Here is an example of a single text object using character(s) from each of those font assets.

    upload_2018-3-31_19-30-1.png


    In this example, I used 3 font assets resulting in (3) 1024 x 1024 font atlas textures. I could have used larger textures to perhaps combine Cyrillic with Greek or split up my font asset differently. Personally, I like using 1024 x 1024 textures as it is slightly more efficient on some devices than using 2048 x 2048 textures. I also like to keep my languages separated as this makes it easier if I need to make changes to any of them.
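
    As a rough sanity check on a split like this, the number of code points per font asset can be tallied from the Unicode block boundaries. The sketch below is only illustrative: it approximates the Extended ASCII preset with the Basic Latin and Latin-1 Supplement blocks and takes "Greek Supplemental" to mean the Greek Extended block, both of which are assumptions, and the class and method names are made up for the example. A font will usually contain fewer glyphs than the block size, since not every code point is assigned or covered by the font.

```csharp
using System;

// Illustrative helper (hypothetical names): tally code points per Unicode block
// to get a feel for how many glyphs each font asset may need to hold.
public static class UnicodeBlockTally
{
    // Inclusive block boundaries taken from the Unicode charts.
    // Assumption: "Extended ASCII" is approximated by the first two blocks.
    public static readonly (string Name, int First, int Last)[] Blocks =
    {
        ("Basic Latin",         0x0000, 0x007F),
        ("Latin-1 Supplement",  0x0080, 0x00FF),
        ("Greek and Coptic",    0x0370, 0x03FF),
        ("Greek Extended",      0x1F00, 0x1FFF),
        ("Cyrillic",            0x0400, 0x04FF),
        ("Cyrillic Supplement", 0x0500, 0x052F),
    };

    // Number of code points in an inclusive range. This is an upper bound:
    // not every code point is assigned, and a font may cover only some of them.
    public static int Count(int first, int last) => last - first + 1;

    // Formats an inclusive range as four-digit hex, e.g. "0400-04FF".
    public static string ToHexRange(int first, int last) => $"{first:X4}-{last:X4}";
}
```

    For example, UnicodeBlockTally.Count(0x0400, 0x04FF) gives 256 for Cyrillic, and ToHexRange produces strings such as 0400-04FF that can be joined with commas when entering a hex range as the character set.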

    Note that since these (2) fallback font assets are from NotoSans, it makes sense to assign them as fallback to the Primary NotoSans font asset. By contrast if I wanted to use symbols from the Font Awesome library and given I am likely to want to use symbols with any potential font asset, I would assign the Font Awesome font asset in the general fallback list in the TMP Settings file.

    Also keep in mind that Sprite Assets that contain sprites to which you have assigned Unicode values can also be used as fallbacks.

    Like I said above, there are many ways that you can structure / arrange your font assets and fallbacks. As long as you make sure the ratio of sampling point size to padding is the same between the font assets you expect to fall back to and from, this will work great.
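
    To make the ratio rule concrete, here is a minimal sketch of the arithmetic. The numbers in the usage note are hypothetical and are not the values from the screenshots above; the class and method names are also invented for the example.

```csharp
using System;

// Illustrative helper (hypothetical names): keep the ratio of sampling point
// size to padding identical between a primary font asset and its fallbacks.
public static class SamplingRatio
{
    // Padding for a fallback font asset that preserves the primary's ratio.
    // The result is rounded, since padding is entered as a whole number.
    public static int MatchingPadding(int primaryPointSize, int primaryPadding, int fallbackPointSize)
    {
        if (primaryPointSize <= 0 || primaryPadding <= 0 || fallbackPointSize <= 0)
            throw new ArgumentException("All sizes must be positive.");

        double ratio = (double)primaryPointSize / primaryPadding;
        return (int)Math.Round(fallbackPointSize / ratio);
    }

    // True when both assets have exactly the same point-size-to-padding ratio.
    // Cross-multiplication avoids floating point comparison.
    public static bool RatiosMatch(int pointSizeA, int paddingA, int pointSizeB, int paddingB)
        => (long)pointSizeA * paddingB == (long)pointSizeB * paddingA;
}
```

    With a hypothetical primary at point size 90 and padding 9 (a 10:1 ratio), SamplingRatio.MatchingPadding(90, 9, 50) suggests a padding of 5 for a fallback sampled at point size 50.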
     
    Last edited: Apr 1, 2018
  3. kabumere

    kabumere

    Joined:
    Oct 2, 2016
    Posts:
    31
    Your reply made me look into the differences between Noto Sans and Open Sans, and I see that Noto Sans supports more characters for the 3 writing systems above, as well as more languages overall, so I switched my font over. I followed your directions above and it's all looking well. I see at https://www.google.com/get/noto/help/cjk/ that Noto Sans CJK can support the Korean, Chinese and Japanese languages as well. Have you used any of those fonts for Unity before, and if so which package would you recommend to include to support those languages?
     
    Last edited: Apr 1, 2018
  4. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Not sure what you mean by which package? Do you mean to handle localization?

    With regard to particular fonts: fonts can vary greatly from one to another. Some fonts may only include support for one language while others support more. Fonts also vary greatly in terms of design and visual appearance.

    In the end, you simply have to select font(s) which you feel will look good for your project and include the languages you want to support. Keep in mind that with the fallback system, you can use a specific font for Latin based languages and then a completely different one for CJK. Again it all comes down to what you feel will work best for your project / game.

    NotoSans and OpenSans are popular and widely used fonts but those are two out of thousands of fonts. When choosing a font you also have to be mindful of the licensing terms for the given font. Again, it comes down to picking what works best for you.

    There are lots of places to find fonts. Google Fonts is a pretty decent place.
     
  5. kabumere

    kabumere

    Joined:
    Oct 2, 2016
    Posts:
    31
    I was referring to the packages from the link I sent you. I assumed, perhaps incorrectly, that since you were using Noto Sans you might've been familiar with their CJK variants (and would thus know the best one on that page to use, specifically when targeting mobile across both iOS and Android).

    And thanks for the other info. I already use Google Fonts and only consider free, open source fonts when choosing.
     
  6. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    I am familiar with the .otc version of this font. A lot of fonts that support CJK are packaged in the otc format, which stands for OpenType Collection. So these are basically a bunch of font files packed into a bigger one.

    Since in the case of TextMesh Pro you will be creating your own font asset and choosing what glyphs to include, what matters is whether or not that font file contains the glyph you care about. So whether those be .ttf, .otf or .otc, the TextMesh Pro Font Asset Creator can work with all of them.
     
  7. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    @Stephan_B Please let me know what I am doing wrong.
    I was trying to support the Korean language inside my app but have failed so far. I followed these steps:
    1. Downloaded the NotoSans Korean font (otf format) from https://www.google.com/get/noto/help/cjk/
    2. Copied the NotoSansCJKkr-Regular font into the asset/font folder.
    3. Launched the TMPro Font Asset Creator and generated a TMPro Font Asset with Font Source set to the aforementioned font, Font Padding: 9, Character Set: ASCII Extended, Atlas Resolution: 1024x1024 (Pt Size = 81).
    4. Saved the asset and referenced it in a TextMeshPro UI component.
    I typed some random Korean characters. All of them are shown as boxes.
    upload_2019-1-8_17-18-53.png


    This is how it looks
    upload_2019-1-8_17-22-5.png
     
  8. fffMalzbier

    fffMalzbier

    Joined:
    Jun 14, 2011
    Posts:
    3,276
    You have to specify which characters you want to include in your font asset.
    You selected "Extended ASCII", which does not include the Korean characters you need.
    If you have a lot of characters, a good option is to select "Characters from File" under Character Set.
    Then you can give it a file that contains all the characters you need.
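
    One way such a file could be produced is to collect every distinct character from the project's known strings. This is only a sketch under a few assumptions: the class and method names are made up, the file is written as UTF-8 without a byte order mark, and it works on UTF-16 code units, so characters outside the Basic Multilingual Plane (surrogate pairs) are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Illustrative helper (hypothetical names): build the contents of a
// character file from the strings a project is known to display.
public static class CharacterFileBuilder
{
    // Returns each distinct non-control character exactly once,
    // sorted by UTF-16 code unit value. BMP characters only.
    public static string CollectCharacters(IEnumerable<string> texts)
    {
        var unique = new SortedSet<char>();
        foreach (string text in texts)
        {
            foreach (char c in text)
            {
                if (!char.IsControl(c))
                    unique.Add(c);
            }
        }
        return new string(unique.ToArray());
    }

    // Writes the collected characters to a UTF-8 text file (no BOM).
    public static void WriteCharacterFile(string path, IEnumerable<string> texts)
        => File.WriteAllText(path, CollectCharacters(texts), new UTF8Encoding(false));
}
```

    For instance, CharacterFileBuilder.CollectCharacters(new[] { "Hello", "World\n" }) returns "HWdelor": duplicates and the newline are dropped.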
     
  9. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    @fffMalzbier Thanks. In my case the text is user input in Korean, Chinese or Japanese, so I have to include a larger set of characters that the user may type.
    I'm looking for a way to split up Korean and Japanese so that I can break them into multiple fallback font assets. (Chinese I already have.)
     
  10. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    samatbeam, KatinkaMom and fffMalzbier like this.
  11. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    @Stephan_B What is the memory and performance impact of using multiple fallback fonts? I have a 34 MB font asset for Korean characters. How will it behave in the application? Does this mean that even when the application is not currently displaying any Korean characters, these assets will be loaded in RAM? My application is sensitive to runtime memory use and I am looking for better optimisation.
     
  12. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    When the primary font asset to which you may have assigned additional fallbacks is loaded, these fallbacks will be loaded at the same time. Unfortunately, it is not currently possible to load these on demand.

    In theory, you should be able to fit (at reasonable quality) all 11,172 characters in a single 16 MB 4096 x 4096 atlas. I would recommend breaking this up into 2048 x 2048 textures instead, but the total size should be less than the 34 MB you currently have.

    I am assuming here that your asset only contains the Korean characters and not other languages.
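
    The figures above can be checked with some back-of-the-envelope arithmetic. This sketch assumes the 11,172 characters are the Unicode Hangul Syllables block (U+AC00 to U+D7A3), a single-channel atlas at 1 byte per pixel, and an evenly divided atlas; real packing is not a uniform grid, so the cell size is only a rough guide. The names are hypothetical.

```csharp
using System;

// Illustrative arithmetic (hypothetical names): estimate atlas memory and
// the space available per glyph for the Hangul Syllables block.
public static class AtlasEstimate
{
    // Hangul Syllables block boundaries, inclusive.
    public const int HangulFirst = 0xAC00;
    public const int HangulLast = 0xD7A3;

    // Number of code points in the Hangul Syllables block.
    public static int HangulSyllableCount => HangulLast - HangulFirst + 1;

    // Size in bytes of an atlas, assuming one byte per pixel.
    public static long AtlasBytes(int width, int height) => (long)width * height;

    // Side length in pixels of the square cell each glyph would get
    // (padding included) if the atlas area were divided evenly.
    public static double CellSize(int width, int height, int glyphCount)
        => Math.Sqrt((double)width * height / glyphCount);
}
```

    Under these assumptions a 4096 x 4096 atlas takes 16,777,216 bytes (16 MB) and leaves a cell of a little under 39 pixels per side for each of the 11,172 syllables. Four 2048 x 2048 textures cover the same total area.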

    BTW: This is where the new Dynamic SDF system will be very nice where my recommendation will be to create a static font asset that contains all the known / used Hangul characters in the project (ie. in menu, etc...) which should easily fit in a 2048 x 2048 atlas texture and then to assign 1 dynamic fallback which will get populated at runtime to catch whatever other glyphs might be needed.
     
  13. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    Since it is being done solely for user input and I am not Korean, I have no idea of the character frequency. So I'm keeping it as a single asset.
    Is there a way to have one primary and 1-2 fallback assets for supporting all three languages (CJK) together?
    @Stephan_B Thanks for your answer.
     
  14. Shubham_16

    Shubham_16

    Joined:
    Sep 19, 2016
    Posts:
    33
    Also, what benefit does breaking up the character set into different font assets provide if all are loaded along with the primary? @Stephan_B
     
  15. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Since older mobile devices are limited to using a maximum of 2048 x 2048 textures, you can only fit a finite number of characters in them.

    In addition, although some devices allow you to use larger textures, reading in larger textures is not as efficient as doing so in smaller textures.

    Besides the above points, having the ability to structure / split your characters into several font assets, where these can be structured to support single languages or language groups, gives you more flexibility.

    Having said all of that, with the pending release of the new Dynamic SDF system, this will change as you will mostly only have to create a Primary Font Asset that contains the known characters in the given project and for a given language or groups of languages and then where appropriate assign a dynamic fallback font asset to catch any other character(s) coming from user input.

    In case you have not watched this video yet, I suggest you do so to get a better understanding of the new system.

     
    RunninglVlan and Rafael_CS like this.
  16. shacharoz

    shacharoz

    Joined:
    Jul 11, 2013
    Posts:
    98
    Can I use TextMesh Pro with languages like Hebrew or Arabic?
     
  17. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Yes. However, TMP only includes basic support for RTL so you will need to consider using the following free plugin in addition to TMP.

    For Hebrew text which makes use of diacritical marks, you will need to manually set up many of those adjustment pairs in the Glyph Adjustment Pair table.

    Full support for Diacritical Marks as well as many additional OpenType Font Features is planned and currently being worked on.
     
    tyapichu likes this.
  18. PandaArcade

    PandaArcade

    Joined:
    Jan 2, 2017
    Posts:
    128
    How is this going? Any form of ETA?

    I've managed to get Thai working OK thanks to this script. I was hoping to get Arabic in as well, but even with the help of the plugin you linked to, I don't think we can afford the time. Having to replace all TMP components with the plugin's own is a nuisance, and the change will likely have to be reverted in the future.

    @Stephan_B You have posted so much great info in various places(like in this thread) which has taken me a lot of time to find. It would be great to have a single place to find all the relevant info for using TMP for localization, ideally in the Unity manual.
     
  19. WookieWookie

    WookieWookie

    Joined:
    Mar 10, 2014
    Posts:
    35
    @Stephan_B We could use a better tutorial. Answers so far address the needs of Indie developers, not full-scale international launches of fully localized games. What is the process and strategy for supporting "everywhere"?

    I think you're assuming people understand the part about Unicode hex ranges. I just figured it out after a lot of searching around and realized that TMP doesn't have an option for "Include Every Single F'n Glyph in This Font", which would go great with an "Automatically make a new atlas when 2048 gets full" checkbox.

    I'd like to be able to point TMP to a font like Noto and have it automatically atlas the entire font. Looking up hex ranges for the various languages is easier said than done. I see CJK hexes spread out across the chart and have no idea which ones I'll need to include.
     
  20. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    That would be very inefficient as this font contains a massive amount of characters but see my reply to your previous post in the other thread and answer quoted below.

    "To handle CJK, create a dynamic font asset using the NotoSansCJK font and enable Multi Atlas support on it and done. That is not the most efficient way since everything is dynamic where it doesn't need to be but every single character in this font file will be displayed if it exists."


    I agree and I will do my best to update / create new videos to cover the newly added features like Dynamic system and the Multi Atlas support.
     
    Last edited: Mar 16, 2020
  21. fherbst

    fherbst

    Joined:
    Jun 24, 2012
    Posts:
    802
    @Stephan_B not sure if I understood that right –
    1. I created a new SDF Font Asset from Arial, which works great with Chinese characters in regular Unity UI
    2. I enabled "Dynamic" and "Multi Atlas"
    3. I assigned that to my TMPro texts
    4. I entered the following text: "Hello World" + some Chinese characters that I'm not allowed to post on the forum for whatever reason (see attached error screenshot for the characters)
    5. Expected result: those characters show up in the character table and render correctly
    6. Actual result: only the "HeloWrd" characters show up in the character table and are rendered.
    Am I doing something wrong? (this is on latest 2.1.0)
    Note: there's a lot of those now polluting the hierarchy (after upgrading from 2.0 to 2.1.0-preview.14 to 2.1.0)
    upload_2020-7-23_17-57-42.png



    Also: what the heck, Unity forum...
    upload_2020-7-23_17-58-23.png

    EDIT: Unity seems to be smarter with its default UI font (which is Arial) and falls back to system fonts that actually contain the characters; Arial does not in fact contain the characters mentioned above. Using a font that contains Chinese characters (e.g. Microsoft YaHei), the above process works.

    Is there a way to have a similar behaviour in TMPro (falling back to installed system fonts instead of having to manually move all font files into Unity and manually assign fallback orders)?
    Reason: applications we create usually have a) content management systems b) numerous translations that are not necessarily all known at app creation. Is that a no-go for TMPro or is there a way to make this work?

    Shipping a separate font file and SDF asset for each potentially supported language doesn't sound like a good idea; I understand if that's not "in scope" for what TMPro can do but was under the assumption that Unity is now pitching it as "better default" which would somehow imply feature parity.
     
    Last edited: Jul 23, 2020
    Gekigengar likes this.
  22. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    The Arial font file does not contain any Chinese, Japanese or Korean characters. The legacy text system does fall back to some system font, which varies from platform to platform. Although this may seem convenient, it hides from the developer what is actually happening and can also result in the selected OS font not really working well, from both a design and a metrics point of view, with the font you selected in the first place. This is usually uncovered and reported by end users back to the developer, which is not ideal.

    Selecting the right font file from a design and glyph coverage point of view to ensure it supports the languages your application is targeting is very important. This is not something that I believe should be left to some random selection of whatever OS font file might be available on the various platforms.

    There are pros and cons to including various font files in a project.

    On the pro side, it provides you with complete control over font file selection, visual and design consistency between these and on all platforms and quality assurance as well as better performance when using static font assets for the known text contained in the project while relying on dynamic font assets for the rest.

    The cons are mostly build size as you need to include these font files. Depending on the number of languages supported, it might require using AssetBundles or Addressables to split these payloads per language or language groups, which might not be an issue if you were already planning on offering downloadable content.

    In terms of font file selection, there are many font families with support for multiple languages such as the free to license NotoSans and NotoSerif from Google. There are many others available from various websites like Google Fonts where you can even search for font files based on language coverage.

    TMP makes it possible to use a combination of static and dynamic font assets. Dynamic font assets with Multi Atlas Texture enabled can handle every single character contained in a font file.

    It is also possible to create dynamic font assets at runtime from system / OS fonts. This requires knowledge of the fonts available on the targeted platforms, which is something you can query using the GetPathsToOSFonts function. I will continue to improve support for OS fonts as well.

    When a text object uses several font assets or sprite assets, sub text objects are created as needed. These sub text objects are no longer serialized nor saved with the scene. These could have been hidden in the hierarchy, but I decided to expose them for transparency.

    This process may seem complex at first but so does learning about good coding practices, or efficient modeling techniques or light baking or Nav Mesh, etc.

    In reality the process is actually pretty simple, as it mostly comes down to selecting the right font file for the targeted languages. Creating a dynamic font asset that can handle every single CJK character contained in a font file takes 15 seconds.

    Here are two videos that cover the entire process along with suggested workflows to handle localization.



     
    RunninglVlan likes this.
  23. fherbst

    fherbst

    Joined:
    Jun 24, 2012
    Posts:
    802
    Thanks for the detailed response. I have been using TMPro for quite some years, but for dynamic content always had to fall back to Unity UI and thought I'd give it another try.

    I'm sorry, but this is not a question of beliefs but usability. Every other platform (Unity, the web, every word processor, every Adobe software, every Affinity software, ...) disagrees, because a system-selected character is still better than a non-existent one, and the chance of me receiving bug reports is much higher for missing characters than for ones that don't perfectly fit the style. Adding the "right" font is certainly a gradual design improvement but shouldn't be necessary to "just render a character".

    File size is especially concerning on mobile and web platforms - I can't ship a 20MB font file in a WebGL build.

    That all being said, could you elaborate on and point me to the right docs for how to dynamically figure out which font the system would use for a character and how to load that into TMPro at runtime as dynamic multi atlas?
     
    Dreamotion and Gekigengar like this.
  24. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    I am not saying that we should not have support for OS fonts, but that the selection of the font should not be random and is something I would like to provide control over. And I agree that build size constraints are the primary reason for wanting to use OS fonts.

    To create a dynamic font asset from OS font, use the following

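    Code (csharp):
    // Requires: using UnityEngine; using TMPro;
    // Paths to the font files installed on the current OS.
    string[] fontPaths = Font.GetPathsToOSFonts();
    // 'index' selects the desired entry in fontPaths.
    Font osFont = new Font(fontPaths[index]);
    // Create a dynamic font asset from the OS font.
    TMP_FontAsset fontAsset = TMP_FontAsset.CreateFontAsset(osFont);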
    However, if you already know what fonts are available and their paths on those platforms, you can skip the GetPathsToOSFonts() call.

    I will be making a change soon so that we can skip having to create a Font object and instead just use the font file path to create the dynamic font asset.
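
    If the font names on the target platform are known, picking an entry from the returned paths could look something like the sketch below. It is plain C# with no Unity calls, so the paths are passed in; the class name, the method name and the font names in the usage note are hypothetical examples rather than anything from TMP.

```csharp
using System;
using System.IO;

// Illustrative helper (hypothetical names): choose a font file from a list
// of OS font paths using a prioritized list of preferred font names.
public static class OsFontPicker
{
    // Returns the first path whose file name (without extension) starts with
    // one of the preferred names, checked in priority order and ignoring case.
    // Returns null when no path matches.
    public static string PickFontPath(string[] osFontPaths, params string[] preferredNames)
    {
        foreach (string name in preferredNames)
        {
            foreach (string path in osFontPaths)
            {
                string fileName = Path.GetFileNameWithoutExtension(path);
                if (fileName.StartsWith(name, StringComparison.OrdinalIgnoreCase))
                    return path;
            }
        }
        return null;
    }
}
```

    The chosen path could then be passed to new Font(path) as in the snippet above, for example OsFontPicker.PickFontPath(Font.GetPathsToOSFonts(), "NotoSansCJK", "AppleSDGothicNeo"), with a null result meaning none of the preferred fonts is installed.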
     
    Last edited: Jul 24, 2020
    Gekigengar likes this.
  25. fherbst

    fherbst

    Joined:
    Jun 24, 2012
    Posts:
    802
    I agree that control over it is "nice to have" but not having a "default system" fallback at all seems to make TMPro currently impossible to use (for our usecase with dynamic content – I'd be very happy to be wrong here :)). Compare to e.g. how CSS handles this (there's a list of deliberately chosen system fonts as fallback, but after that it will still fall back to default system fonts when characters aren't found).

    The above looks good, thank you; however, how would I figure out that
    1. somehow a character is trying to render with no suitable font already loaded?
    2. which system font contains this character?
    3. Additionally, what platforms is "GetPathsToOSFonts" supported on? The Unity docs contain no information about that unfortunately. (It doesn't sound very mobile/web friendly, but that could be misleading given the method name – do you have more information about system compatibility?)
     
    Last edited: Aug 12, 2020
  26. tim12332000

    tim12332000

    Joined:
    Jun 15, 2017
    Posts:
    20
    I would also like to know how to use a default system font solution! If there are any updates, please let me know. Thanks :)
     
  27. fherbst

    fherbst

    Joined:
    Jun 24, 2012
    Posts:
    802
    @Stephan_B bump! Would be great to get answers to 1/2/3 above :)
     
  28. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    GetPathsToOSFonts works on all platforms with the exception of PS4 and Xbox One.

    Lists of the fonts available on each platform are available from the platforms themselves. For instance, here is a link to the fonts available on iOS. Many of the font names indicate what language they cover. Similar lists are available for Android.

    The GetPathsToOSFonts function would return the same list with some potential differences as fonts are sometimes replaced or added.

    As fonts come in all shapes and sizes, it is best for developers to select what fonts they wish to use for specific languages. This ensures design consistency between languages.

    For example, I selected the font Impact and then typed the word Hello in English and Korean.

    upload_2020-8-12_14-25-49.png

    As you can see, the font that Word selected for the Korean text doesn't match the design style of Impact and is much too thin.

    A better choice would have been to use another font such as Black Han Sans as seen below

    upload_2020-8-12_14-28-1.png
    Having the ability to create font assets at runtime from OS fonts is an important feature but this still requires the developer / designer to know what fonts are available on those platforms and to select the appropriate fonts for the designs.
     
  29. seoyeon222222

    seoyeon222222

    Joined:
    Nov 18, 2020
    Posts:
    186

    I watched the two videos. Did I understand them correctly? I also have a question.

    - With a dynamic font asset, I can use all the characters in the font file, but there is overhead at runtime.
    - If I know in advance which characters will be used, I can build them into a static font asset to avoid that overhead,
    but the build size increases.
    - For user-typed text, a dynamic font asset is the better choice.
    - Otherwise, a static font asset is better.

    What I'm curious about is:
    Does a dynamic font asset actually cause a significant performance problem?
    Is there a reason to avoid dynamic fonts on typical PC hardware?
    I know it is difficult to discuss performance or optimization in general, since it depends on the game and the platform,
    but will it be a problem if I use fonts in dynamic mode?
     
  30. khushalkhan

    khushalkhan

    Joined:
    Aug 6, 2016
    Posts:
    177
    I think using TextMesh Pro with localized text is a headache, so I switched to Unity's pixelated, blurry text.
     
    yaweronline likes this.