TextMesh Pro FindIntersectingWord for Japanese texts

Discussion in 'UGUI & TextMesh Pro' started by Milten2222, Oct 21, 2020.

  1. Milten2222

    Milten2222

    Joined:
    Nov 14, 2017
    Posts:
    19
    Hey sup guys :)

    I want to add Japanese localisation to my game. I have a mechanic that lets the player highlight a word in the text and click on it to make a note. I am using the FindIntersectingWord method, but for Japanese it just returns the whole text. Is there a way around this? I assume this behaviour is caused by the fact that Japanese text does not have spaces. If so, would it be possible to add some kind of character that helps TextMeshPro split the text into words but is not visible to the player?
     
  2. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    I need to think about how to handle CJK and the lack of spaces which delineate words.

    In the meantime, and although I have not tested this, perhaps inserting a Zero Width Space, either <ZWSP> or \u200B, could do the trick.
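
The suggestion above can be sketched as plain string handling. This is a minimal, language-agnostic illustration in Python, not TextMesh Pro code: it assumes the localised text is already available as a list of words (for example, segmented by the translator), and the helper name `join_with_zwsp` is hypothetical. The same join would be a one-liner in C#.

```python
# U+200B ZERO WIDTH SPACE: an invisible character that marks a word boundary.
ZWSP = "\u200b"

def join_with_zwsp(words):
    """Join pre-segmented words with zero-width spaces.

    Assumption: `words` was segmented by hand or by an external tool;
    this helper does not decide where the word boundaries are.
    """
    return ZWSP.join(words)

# Example: three words become one string with two invisible separators.
sample = join_with_zwsp(["私", "は", "猫"])
```

The resulting string looks unchanged to the player, but it contains explicit boundaries between the words.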
     
  3. Milten2222

    Milten2222

    Joined:
    Nov 14, 2017
    Posts:
    19

    Thanks for your reply!

    I tried both <ZWSP> and \u200B and they both seem to work just fine, but I ended up using \u200B since I have some logic that dynamically adds tags to my tags, so extra "<" and ">" characters kinda break it a bit.

    Would it even be possible to parse CJK without any mistakes? I don't know about Chinese and Korean, but to me it seems like only having a built-in Japanese dictionary or something would help solve this problem. That's only my guess, of course.
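
To illustrate the dictionary idea raised above, here is a rough sketch of greedy longest-match segmentation in Python. It is not something TextMesh Pro provides: the function name `segment_longest_match` and the tiny word set are hypothetical, and a real Japanese segmenter would need a far larger dictionary and would still make mistakes on ambiguous text.

```python
def segment_longest_match(text, dictionary, max_len=None):
    """Split `text` into words by greedy forward longest match.

    At each position the longest dictionary entry that matches is taken;
    if nothing matches, a single character is emitted as its own word.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"私", "猫", "好き", "です"}
segments = segment_longest_match("私は猫が好きです", toy_dictionary)
marked = "\u200b".join(segments)  # insert zero-width spaces between words
```

Because the match is greedy, a long dictionary entry can swallow characters that belong to the next word, which is one reason error-free segmentation is hard without more context.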
     
  4. Stephan_B

    Stephan_B

    Joined:
    Feb 26, 2017
    Posts:
    6,595
    That is something that I will need to explore, i.e. some mechanism to identify words for CJK, given that words are not delineated by spaces as they are in Latin languages.