TextMesh Pro Creating asset from font with a lot of characters

Discussion in 'UGUI & TextMesh Pro' started by Necronomicron, Jul 16, 2018.

  1. Necronomicron

    Necronomicron

    Joined:
    Mar 4, 2015
    Posts:
    108
    I want to create a font asset from NotoSansCJKjp-Regular (42188 characters). I will use many of them, so I don't want to pick specific ones; I'd rather take them all. I also need them to be huge on screen (about half of the screen), so I want good character quality as well. What is the best approach? What settings should I use?

    I use Unity 2018.1.4f1 and TextMesh Pro 1.2.4.
     
    Last edited: Jul 16, 2018
  2. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Adding every single character from a font file is very inefficient as most of those characters will never be needed.

    The easiest way to handle localization (currently) is to create a Primary font asset that contains all the known characters used in the project for each given language or sets of languages.

    For Latin-based languages, since their character set is limited, you can create a Primary font asset that contains all of extended ASCII. Then create a few additional font assets that contain Cyrillic, Greek and other potential subsets, and assign those as Fallbacks to the Primary for Latin languages.

    For Latin languages (and depending on the font you select), a sampling point size of 72 with padding of 8 typically results in a nice-looking font.

    For CJK, since their character sets are much larger, you will create a Primary font asset for each language and then have several fallbacks for each of them as well.

    For Chinese, for example, your primary will contain all the Chinese characters known / contained in your project. Then, for unknown characters likely to come from user input, you will create 3 additional fallback font assets which together cover the 8105 characters defined in the Table of General Standard Chinese Characters, minus those already in your primary. The first fallback will contain the first 3500 characters from the list minus those already in your primary. The 2nd fallback will contain the next 3000, again minus those in the primary, and the third the remaining 1605 minus those in the primary.
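
    The tiered split described above can be sketched in a few lines. This is only an illustration: the function name, its parameters and the tiny sample data are hypothetical, and it assumes the character list is already ordered the way the standard table orders it.

```javascript
// Split an ordered character list into fallback tiers, skipping
// anything the primary font asset already covers.
// For the real table the tier sizes would be [3500, 3000, 1605].
function splitIntoFallbackTiers(orderedChars, primaryChars, tierSizes) {
  const primary = new Set(primaryChars);
  const tiers = [];
  let start = 0;
  for (const size of tierSizes) {
    const slice = orderedChars.slice(start, start + size);
    tiers.push(slice.filter((ch) => !primary.has(ch)));
    start += size;
  }
  return tiers;
}

// Stand-in data: six "standard" characters, two already in the primary.
const standardList = ["一", "二", "三", "四", "五", "六"];
const primaryList = ["二", "五"];
const tiers = splitIntoFallbackTiers(standardList, primaryList, [3, 2, 1]);
console.log(tiers.map((tier) => tier.join("")));
```

    Each resulting tier can then be used as the character list for one fallback font asset.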

    For East Asian characters, a sampling point size of 36 to 48 is actually pretty good with padding value of 4 to 5. Try to keep the padding at about 10% of sampling point size.

    When creating the Primary and Fallback font assets, keep in mind that they do not need to be using the same sampling point size and padding. So the Primary can be sampled at higher quality (since you know these characters are contained in the project / UI and menus) and then use a lower quality for the fallback since they will likely come from user input which is typically plain white text and smaller on screen where the higher quality won't be noticeable.

    The only important part is maintaining the same ratio of Sampling Point Size to Padding for the Primary and Fallbacks. For instance if the primary is using a sampling point size of 80 with padding of 8 then the fallbacks could be using sampling point size of 50 with padding of 5. Maintaining the same ratio will ensure the same visual appearance in regards to styling (outline, shadow, etc.) between the primary and fallbacks when using Material Presets.
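
    As a small worked example of the ratio rule, the padding for a fallback can be derived from the primary's settings. The helper name is hypothetical, and rounding to a whole number is an assumption (it can shift the ratio slightly when the result is not an integer).

```javascript
// Padding that keeps a fallback at the same padding-to-point-size
// ratio as the primary font asset.
function matchingPadding(primarySize, primaryPadding, fallbackSize) {
  return Math.round((primaryPadding * fallbackSize) / primarySize);
}

console.log(matchingPadding(80, 8, 50)); // 5
console.log(matchingPadding(100, 10, 60)); // 6
```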

    I certainly understand this is more involved than you wanted. But just as in many other aspects of game development, where we have to create efficient geometry / topology and UV mapping for models, or bake NavMeshes or Lightmaps, to achieve the visual results and performance we seek, the same is true for text, which, granted, isn't as cool as these other things but still remains important.

    Having said all of that, a hybrid dynamic SDF system is in the works and will make this process much simpler. The recommended workflow will still include creating primary font assets that contain all the known / used characters in the project for each language or set of languages, but you will be able to use fallback font assets (set to dynamic mode) where characters not covered in your primary or other fallbacks can be added to those font assets at runtime.

    The idea is to have the vast majority of characters already baked in your primary and existing fallbacks, thus providing the best quality and performance, while relying on the dynamic system for those few characters that were unknown and come from user input, where the performance impact is not noticeable by users since humans type slowly.
     
    ltlejeune, JohnKe, gracezhu and 2 others like this.
  3. Johannski

    Johannski

    Joined:
    Jan 25, 2014
    Posts:
    826
    Just a quick addition: if you're using an Excel sheet or Google Sheets for your translations, I made a small handy tool to get all the unique characters of a language: https://github.com/JohannesDeml/CsvCharacterExtractor
    That way it is really easy to just include the characters you really need.
     
    Wattosan, awallick-sd, Znol and 8 others like this.
  4. Necronomicron

    Necronomicron

    Joined:
    Mar 4, 2015
    Posts:
    108
    Well, then maybe I could pick only those I will use. It's 1000+ characters for now, and later I will probably extend this number to around 3000. I will use them all and at a huge size (1 character = 1 level, like this). And it's only hieroglyphs (no Latin, Cyrillic or anything else). Can I fit them all in one asset, or should I split them into parts somehow? Or maybe it will split them automatically? I'm asking because testing it myself would take an enormous amount of time. Yesterday I tried to create an asset of 100 hieroglyphs and it took something like 5 minutes...
     
    Last edited: Mar 19, 2020
  5. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    For handling about 3000 glyphs, depending on the sampling point size and padding that you need, you could likely fit all of those in one 2048 x 2048 font asset but I think it would be easier to simply split those between two or even 3 font assets if needed. Assuming you end up with 2 or 3 font assets, the primary should contain the known text (menu, UI, common text) and then the other font asset the remaining characters. These other font assets (fallbacks) should be assigned to the primary in the Fallback list.
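
    To get a feel for how sampling point size and padding trade off against glyph count, here is a rough capacity estimate. It is only a sketch under a simplifying assumption: every glyph is treated as a square cell of (point size + 2 x padding) pixels, which is a reasonable first guess for roughly square CJK glyphs but not how the actual atlas packer works, so real results will differ.

```javascript
// Rough estimate of how many glyphs fit in a square atlas.
// Assumption: each glyph occupies a square cell of
// (pointSize + 2 * padding) pixels; real packing is tighter or looser.
function estimateGlyphCapacity(atlasSize, pointSize, padding) {
  const cell = pointSize + 2 * padding;
  const perRow = Math.floor(atlasSize / cell);
  return perRow * perRow;
}

console.log(estimateGlyphCapacity(2048, 30, 3)); // 3136
console.log(estimateGlyphCapacity(2048, 48, 5)); // 1225
console.log(estimateGlyphCapacity(4096, 48, 5)); // 4900
```

    Under this estimate, about 3000 glyphs in a single 2048 x 2048 atlas implies a fairly small sampling size, which is one reason splitting across two or three font assets is attractive when the text is displayed very large.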

    In terms of the time it takes to create the font asset, it can take several minutes depending on the sampling point size, padding and number of characters. This only has to be done once in theory so figure out how you will split these characters and start baking the font assets. BTW: This process will be much faster in the next release of TMP :)
     
    Necronomicron likes this.
  6. Necronomicron

    Necronomicron

    Joined:
    Mar 4, 2015
    Posts:
    108
    What are the disadvantages of fallback assets? Can primary and fallback assets be of the same quality?
     
  7. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Since a font atlas texture's size / space is limited, you can only add so many characters at a certain point size per atlas. The Fallback system allows you to split the characters between multiple font assets / atlases, thus removing this limitation.

    When a certain character is requested, TMP will look in the primary (the font asset assigned to the text object). If the character is not available there, TMP will then look through the list of fallback font assets assigned to the primary, as well as their own fallbacks. If the character is still not found, TMP will look through the list of general fallbacks assigned in the TMP Settings file. If the character is still not found, TMP will look in the Default Font Asset assigned in the TMP Settings, and if the character is still not found after that, it will display the missing glyph character specified in the TMP Settings.
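
    The search order described above can be modelled as follows. This is not TMP code: the data model (`name`, `chars`, `fallbacks`), the `settings` object and both function names are hypothetical, and whether the Default Font Asset's own fallbacks are searched is an assumption of this sketch.

```javascript
// Hypothetical model: a font asset is
// { name: string, chars: Set<number>, fallbacks: FontAsset[] }.
// Depth-first search of an asset and its fallbacks; `visited`
// guards against assets that reference each other.
function findInAssetTree(asset, codePoint, visited) {
  if (!asset || visited.has(asset)) return null;
  visited.add(asset);
  if (asset.chars.has(codePoint)) return asset;
  for (const fallback of asset.fallbacks) {
    const hit = findInAssetTree(fallback, codePoint, visited);
    if (hit) return hit;
  }
  return null;
}

function resolveGlyph(primary, settings, codePoint) {
  const visited = new Set();
  // 1. Primary font asset, then its fallbacks (and their fallbacks).
  let hit = findInAssetTree(primary, codePoint, visited);
  if (hit) return hit.name;
  // 2. General fallbacks listed in the settings.
  for (const fallback of settings.fallbacks) {
    hit = findInAssetTree(fallback, codePoint, visited);
    if (hit) return hit.name;
  }
  // 3. Default font asset from the settings.
  hit = findInAssetTree(settings.defaultFontAsset, codePoint, visited);
  if (hit) return hit.name;
  // 4. Nothing found: the missing glyph character is shown instead.
  return "missing glyph";
}

const cyrillic = { name: "Cyrillic", chars: new Set([0x0416]), fallbacks: [] };
const latin = { name: "Latin", chars: new Set([0x41]), fallbacks: [cyrillic] };
const cjk = { name: "CJK", chars: new Set([0x4e00]), fallbacks: [] };
const settings = { fallbacks: [cjk], defaultFontAsset: latin };

console.log(resolveGlyph(latin, settings, 0x0416)); // "Cyrillic"
console.log(resolveGlyph(latin, settings, 0x4e00)); // "CJK"
console.log(resolveGlyph(latin, settings, 0x1f600)); // "missing glyph"
```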

    The system is even more flexible as sprite assets assigned to the text object and in the TMP Settings are also scanned as Sprites can now have unicode values assigned to them.

    The only downside to using fallbacks is the extra draw call that you get for using characters from the other atlas textures. So if you have a primary with 2 fallbacks and use characters from all of them, you get 3 draw calls instead of 1. Since there is no real measurable performance difference between 1 and 20 draw calls, this is fine even on old mobile devices.

    As for quality: they can be the same but do not have to be. The primary can be using a higher sampling quality and larger texture vs. the fallbacks. The only thing to be mindful of is maintaining the same ratio of sampling point size to padding between the two. If the Primary is using a sampling point size of 100 with padding of 10, then the fallbacks could be using a sampling point size of 60 with padding of 6, or anything matching this 10% ratio.

    Currently, font assets and their fallbacks are loaded when the primary is loaded. In the future, I want to make the fallbacks load on demand. However, even without on-demand loading, you would be loading either one much larger texture or several smaller textures, so there is no memory usage difference here. More importantly, it is faster to read from a smaller texture than from a larger texture on many mobile devices, so in that regard using several smaller textures is better than one big one.
     
    gracezhu and Necronomicron like this.
  8. Necronomicron

    Necronomicron

    Joined:
    Mar 4, 2015
    Posts:
    108
  9. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    I checked the font file and this glyph is present in it so unless it could not fit in the texture it should be included.

    I'll check into it later this afternoon since I am working on Font Asset Creator stuff anyway.
     
    Necronomicron likes this.
  10. Necronomicron

    Necronomicron

    Joined:
    Mar 4, 2015
    Posts:
    108
    I tried to create an asset of only this symbol and it was missing. In fact, there were 2 missing characters, and both of them were something else (D8 42 and DF 9F instead of D8 42 DF 9F). The problem may be that this symbol is outside the Unicode BMP and is interpreted as 2 other characters.
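
    For reference, D8 42 DF 9F reads as a UTF-16 surrogate pair, and the two halves combine into a single code point outside the BMP. The sketch below shows the standard UTF-16 arithmetic; the helper name is hypothetical, and it says nothing about how the Font Asset Creator itself parses its input.

```javascript
// Combine a UTF-16 surrogate pair into one Unicode code point.
function fromSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

const codePoint = fromSurrogatePair(0xd842, 0xdf9f);
console.log(codePoint.toString(16).toUpperCase()); // 20B9F

// The same character as a JavaScript string: two UTF-16 code units,
// but a single code point.
const ch = "\uD842\uDF9F";
console.log(ch.length); // 2
console.log(ch.codePointAt(0) === codePoint); // true
```

    A tool that treats each 16-bit unit as its own character would see U+D842 and U+DF9F, neither of which is a valid character on its own, which matches the two missing entries described above.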
     
    Last edited: Jul 21, 2018
  11. Kumo-Kairo

    Kumo-Kairo

    Joined:
    Sep 2, 2013
    Posts:
    343
    Does this new dynamic system involve GPU SDF calculations of some sort?
     
  12. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    It does not. I think most users would prefer we keep the GPU free to do other cooler stuff.

    Don't get me wrong I obviously think text is cool but I seem to be in the minority here ;)
     
  13. Kumo-Kairo

    Kumo-Kairo

    Joined:
    Sep 2, 2013
    Posts:
    343
    My point is - if we're loading our CPU for a pretty long time, causing one-long-frame stutter, and we can't really do anything on GPU either (as pipeline is stalling), why not make a distance calculation shader that does the same thing but quite a few times faster. I think that OpenGL ES 2.0 capabilities would be enough for it, no need for complex compute shaders. It will be used only during one frame in which we won't be able to do anything else anyway.
    Or am I missing something?

    As for the editor-based generation that takes some 5 minutes for Chinese characters - it would possibly benefit from this approach as well.

    I will dig into this problem myself and see if it's possible with gles2, and how much time it would take compared to a CPU-based approach (on a low-end mobile device).
     
  14. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    The idea is still to create font assets that include coverage for all the known characters used in a project, and then to rely on the dynamic system for unknown characters coming mostly from user input. In that case, we are talking about a few characters being rasterized and added to the font atlas at runtime, and given that this will happen mostly when a user is typing, the overhead of doing this should not be perceivable by any user.

    In other words, I think we'll be able to achieve the performance we need without having to tap the GPU resources. Should that not be the case, then I will most certainly explore alternative options including offloading the task to the GPU.
     
    Last edited: Jul 23, 2018
    Kumo-Kairo likes this.
  15. ALL-CAPS

    ALL-CAPS

    Joined:
    Jun 23, 2014
    Posts:
    9
    Is there any ETA on the release of the new hybrid font system? We're really looking forward to that feature.
     
  16. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
  17. Egil-Sandfeld

    Egil-Sandfeld

    Joined:
    Oct 8, 2012
    Posts:
    72
    I was looking for something like this! Nice.
    Quick learning from me was to arrange my csv like yours and save with unicode-UTF8.
     
  18. Johannski

    Johannski

    Joined:
    Jan 25, 2014
    Posts:
    826
    ProGameDevUser and missli93 like this.
  19. seltar_

    seltar_

    Joined:
    Apr 16, 2015
    Posts:
    15
    You could also extract the characters from a text with JavaScript.

    Code (JavaScript):
    // Collect the unique code points (as decimal values) found in a string.
    // for...of iterates by code point, so characters outside the BMP
    // are not split into surrogate halves.
    const getUniqueChars = (text) => {
      const chars = new Set();
      for (const ch of text) { chars.add(ch.codePointAt(0)); }
      return [...chars].sort((a, b) => a - b);
    };
    Usage:
    Code (JavaScript):
    const text = `abcdefghijklmnopqrstuvwxyz0123456789
    ABCDEFGHIJKLMNOPQRSTUVWXYZ-.,*=+`;

    const chars = getUniqueChars(text).join(",");

    console.log(chars);
    Then paste the result into the TextMesh Pro Font Asset Creator with the Character Set set to Custom Range.
     
  20. Catlard

    Catlard

    Joined:
    Sep 23, 2011
    Posts:
    3
    Hi there @Stephan_B ! I was very happy to find this thread. How's that dynamic font rendering system coming along? I'm sorely in need of it!
     
  21. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    It is coming along nicely.

    Mostly tweaking editors / clean up at this stage while trying to address as many reported issues as possible which helps me further test the new system and changes.

    Planning on creating a new video in the next few days to cover key changes to Font Assets, Sprite Assets and their Editors / Inspectors. This will give me another good opportunity to further test everything as issues tend to surface when I am about 10 minutes into recording. Murphy's Law always lurking around when recording videos or during demos / big presentations looking to trip you up ;)
     
    tvilarinhoAquiris likes this.
  22. Catlard

    Catlard

    Joined:
    Sep 23, 2011
    Posts:
    3
    What great news! It sounds like it will be out soonish (at least, before I try to release MY app in January).

    Quick question: after I download my translations.json file and compile my list of unique characters in it, I'd like to be able to put all those custom characters I want to render in one file, and then manually trigger the SDF to re-render with the same settings, and those characters. Is that going to be possible with this fancy dancy new dynamic system? I hope it will!
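
    The first half of this (compiling the unique characters from a translations file) can be sketched as below. The function name and the sample object are hypothetical; in practice the object would come from `JSON.parse` on the contents of translations.json, and the resulting string would be saved to a text file for the font creation step.

```javascript
// Collect the unique characters from every string value in a parsed
// JSON structure (objects and arrays are walked recursively).
function collectUniqueChars(value, out = new Set()) {
  if (typeof value === "string") {
    for (const ch of value) out.add(ch); // iterates by code point
  } else if (value && typeof value === "object") {
    for (const child of Object.values(value)) collectUniqueChars(child, out);
  }
  return out;
}

// Stand-in for JSON.parse(contents of translations.json).
const translations = {
  menu: { start: "開始", quit: "終了" },
  hints: ["開く", "了解"],
};

const unique = [...collectUniqueChars(translations)].join("");
console.log(unique); // 開始終了く解
```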
     
  23. Cromfeli

    Cromfeli

    Joined:
    Oct 30, 2014
    Posts:
    202
    For example:
    https://www.google.com/get/noto/#sans-hans

    Use these ranges:
    https://forum.unity.com/threads/table-of-general-standard-chinese-characters.559882/
     
    Last edited: Mar 21, 2019
  24. YoungXi

    YoungXi

    Joined:
    Jun 5, 2013
    Posts:
    63
    Did you find any reason that might cause this ? I'm using 1.2.2, but having the same problem: Characters exist in my font, but missing in the result.
     
  25. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    What specific characters and what font file?

    Are you running Unity 2017 or 2018 or possibly 2019?
     
  26. WookieWookie

    WookieWookie

    Joined:
    Mar 10, 2014
    Posts:
    35
    Sorry, the assertion that we know "all known characters" in a given game is silly. If you've worked on any large scale social mobile title, you know that ALL the characters will be used eventually. So I'll reiterate the OP's question:

    How do you deal with getting all the characters in a font atlased without being an expert on Unicode character ranges?

    Yet another reason Unity Text is better because of workflow, regardless of performance. TMP is a huge pain in my ass. I'd rather drop my fonts in a project and just USE them than deal with all this atlasing crap. I've spent HOURS today trying to get Japanese to show up on my screen.
     
  27. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    You should know all the known text used in a given project. By known text I mean all the text in your menus, UI, dialogues, etc., that is, everything with the exception of user input or dynamic data coming from some outside source.

    You should know all this text because at some point this text will need to be provided to someone for translation / localization. BTW: Good localization tools usually have functionality to extract all the known text for the purpose of making it easy to provide to localization / translators.

    Characters coming from user input or unknown external sources are handled by dynamic font assets, most of the time assigned as a local fallback to a primary static font asset or as a global fallback in the TMP Settings. There are many other variants / potential ways to structure this with the new Multi Atlas Texture feature, but the key is dynamic font assets.

    To handle CJK, create a dynamic font asset using the NotoSansCJK font and enable Multi Atlas*1 support on it and done. That is not the most efficient way since everything is dynamic where it doesn't need to be but every single character in this font file will be displayed if it exists.

    I realize that having to learn about Unicode or text may not be exciting but that is no different than having to learn about efficient modeling and how to create geometry or how to efficiently manage lighting / lightmaps, etc. or how to write good code. Text is another area where some expertise needs to be developed just like many other aspects of game development.

    Of course, I want to make the system more user friendly and more intuitive but it remains important to learn / understand the details at some point just like it does in terms of programming.

    *1 Multi Atlas support is available in version 1.5.0-preview.x for Unity 2018.4 or version 2.1.0-preview.x for Unity 2019.x or version 3.0.0-preview.x for Unity 2020.x
     
    Last edited: Mar 16, 2020
  28. Ruchmair

    Ruchmair

    Joined:
    Sep 20, 2015
    Posts:
    544
    It has now become that for TMP. Quite frankly, this workflow introduces unnecessary complexity. Everything is fantastic until it comes to fonts.

    Right now I have a 136 MB font asset just to support Japanese. I need ALL characters: all hiragana, katakana, and kanji characters. I managed to download a NotoSansJP.otf and use the Font Asset Creator to generate the monstrously large font asset after including the hex codes for punctuation, katakana and hiragana.

    I literally found no example of how to do this anywhere else. So I hope I did it incorrectly, because a 136 MB font asset is 0% usable in any game.

    The NotoSansCJK.otf is about 5 times larger than the NotoSansJP.otf, so the generated file will be way larger. Either way, I do not have Multi Atlasing in my current version, nor can I update right now. So I set my atlas resolution to 8192; this is the only way I can fit the characters in, and it still looks bad.

    Would Multi Atlasing reduce the file size?
     
  29. Ruchmair

    Ruchmair

    Joined:
    Sep 20, 2015
    Posts:
    544
    Yup, still 136 MB. This is incorrect, is it not?
     
  30. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    First, in order to minimize size, your primary font asset should only include the Japanese characters known ahead of time in the project, i.e. only those in menus, UI, dialogue, etc. Everything else coming from user input or other sources should be handled dynamically.

    This primary font asset could be static (best for performance), where you might be able to get away with a 1024 x 1024 or 2048 x 2048 atlas. That depends on the number of known characters, the sampling point size and the padding; for CJK the sampling point size should not exceed 90 and could still look good at 48, but you will have to test that. This primary should have a dynamic fallback with Multi Atlas enabled on it, which should enable it to handle everything else contained in the font file. This dynamic fallback (at shipping time) should be Reset (via the context menu) to make sure it is empty with its atlas texture at size zero.

    Note that changes to dynamic font assets in the Editor are persistent, but not in builds. As such, these dynamic font assets get reset back to their initial state (empty with atlas at size zero) for each play session. This ensures that even with multi atlas texture enabled, these dynamic font assets won't grow to some crazy size over time.

    The NotoSansJP-Regular.otf is 4MB. Your static primary, assuming 2048 x 2048, should be 4MB, and then there is the dynamic fallback with Multi Atlas (0kb). With this configuration, the contribution to the build size would be about 8MB, and it would be able to handle all the characters contained in this font file.
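
    The size arithmetic above can be checked quickly. The sketch assumes the atlas is stored as an uncompressed single-channel 8-bit texture (1 byte per pixel), which is what makes a 2048 x 2048 atlas come out at 4MB; the function name is hypothetical and mipmaps or other overhead are ignored.

```javascript
// Size of an atlas texture in MiB.
// Assumption: uncompressed, single channel, 1 byte per pixel.
function atlasSizeMiB(width, height, bytesPerPixel = 1) {
  return (width * height * bytesPerPixel) / (1024 * 1024);
}

const fontFileMiB = 4; // NotoSansJP-Regular.otf, as quoted above

console.log(atlasSizeMiB(2048, 2048)); // 4
console.log(atlasSizeMiB(2048, 2048) + fontFileMiB); // 8
console.log(atlasSizeMiB(8192, 8192)); // 64
```

    Under the same assumption, an 8192 x 8192 atlas like the one mentioned earlier in the thread takes 64 MiB for the texture alone, which shows how quickly a single static atlas grows.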
     
  31. Ruchmair

    Ruchmair

    Joined:
    Sep 20, 2015
    Posts:
    544
    So that is:

    1) Select the NotoSansJP-Regular.otf in the project window.
    2) Right click > Create > TextMeshPro > FontAsset (or Shift + Ctrl + F12).
    3) Use the generated font asset.

    A nice small 7kb file.
     
  32. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    Indeed. The key is making sure this dynamic fallback is Reset (if desired) before you create the final build, because it will get populated while you work with it in the Editor (which is good for testing).
     
    Ruchmair likes this.