Bug: BlobString loses some Chinese characters

Discussion in 'Entity Component System' started by FengH, Aug 15, 2023.

  1. FengH

    [Screenshot: log output]
    This log should have printed two characters, but it only prints one. Here is the relevant code.
     
    Last edited: Aug 15, 2023
  2. FengH

    [Screenshot: code]

    I couldn't post my code as text because of forum restrictions, so I took a screenshot instead.
     
  3. Singtaa

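    It's due to `var utf8Capacity = value.Length * 2 + 1;` inside `AllocateString()`.

    The calculation `value.Length * 2 + 1` assumes at most two UTF-8 bytes per UTF-16 code unit (`char`), which breaks for non-ASCII text: most CJK characters take 3 bytes each in UTF-8. (Characters outside the Basic Multilingual Plane take 4 bytes, but they also occupy two `char`s, so the worst case is 3 bytes per `char`.) A two-character Chinese string therefore needs 6 bytes plus the null terminator, but only 5 are reserved, so the conversion runs out of room before the second character.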
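    To check the arithmetic outside Unity, here is a small plain .NET sketch. The class and method names are hypothetical, and it uses `System.Text.Encoding.UTF8.GetByteCount` rather than Unity's `Unicode` helpers; it only compares the capacity the old formula reserves with the bytes the string actually needs.

```csharp
using System.Text;

// Hypothetical helper for checking the AllocateString() capacity arithmetic in plain .NET.
public static class CapacityCheck
{
    // Capacity reserved by the old formula: 2 bytes per UTF-16 code unit + 1 for the null terminator.
    public static int OldCapacity(string value) => value.Length * 2 + 1;

    // Bytes actually needed: the UTF-8 encoding of the string + 1 for the null terminator.
    public static int RequiredBytes(string value) => Encoding.UTF8.GetByteCount(value) + 1;

    // True when the old formula reserves enough room for the whole string.
    public static bool Fits(string value) => RequiredBytes(value) <= OldCapacity(value);
}
```

    For "\u4F60\u597D" (你好) the old formula reserves 5 bytes while 7 are needed, so `Fits` is false. An emoji stored as a surrogate pair such as "\uD83D\uDE00" still fits (5 reserved, 5 needed), because its 4 UTF-8 bytes are spread over two `char`s.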

    This replacement extension method fixes it for the time being:
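```csharp
using System.Reflection;
using System.Text;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Entities;

public static class BlobStringExts {
    // BlobString.Data is internal, so locate it via reflection once and cache its byte offset.
    static readonly int DataFieldOffset = UnsafeUtility.GetFieldOffset(
        typeof(BlobString).GetField("Data", BindingFlags.NonPublic | BindingFlags.Instance));

    public static unsafe void AllocateStringEx(ref this BlobBuilder builder, ref BlobString blobStr, string value)
    {
        fixed (char* c = value) {
            // Worst case is 3 UTF-8 bytes per UTF-16 code unit; GetMaxByteCount also leaves room for the terminator.
            int utf8Capacity = Encoding.UTF8.GetMaxByteCount(value.Length);
            byte* b = (byte*)UnsafeUtility.Malloc(utf8Capacity, 1, Allocator.Temp);
            Unicode.Utf16ToUtf8(c, value.Length, b, out int utf8Length, utf8Capacity);
            b[utf8Length] = 0;

            // Reinterpret the internal Data field of the BlobString as a BlobArray<byte>.
            byte* dataFieldAddress = (byte*)UnsafeUtility.AddressOf(ref blobStr) + DataFieldOffset;
            ref BlobArray<byte> data = ref *(BlobArray<byte>*)dataFieldAddress;

            // Allocate exactly utf8Length + 1 bytes in the blob and copy the null-terminated UTF-8 string.
            var res = builder.Allocate<byte>(ref data, utf8Length + 1);
            UnsafeUtility.MemCpy(res.GetUnsafePtr(), b, utf8Length + 1);

            // Release the scratch buffer right away instead of waiting for the Temp allocator to reset.
            UnsafeUtility.Free(b, Allocator.Temp);
        }
    }
}
```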
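    The sizing strategy of `AllocateStringEx` can also be tried outside Unity. The sketch below is hypothetical and not part of Entities: it swaps `Unicode.Utf16ToUtf8`, `UnsafeUtility.Malloc` and `BlobBuilder` for `Encoding.UTF8.GetBytes` and plain `byte[]` buffers, but follows the same steps: over-allocate with `GetMaxByteCount`, encode, then keep exactly the encoded bytes plus a null terminator.

```csharp
using System;
using System.Text;

// Hypothetical managed mirror of AllocateStringEx's sizing logic (no Unity APIs).
public static class Utf8Sizing
{
    public static byte[] ToNullTerminatedUtf8(string value)
    {
        // Worst-case scratch buffer, as in the fix: GetMaxByteCount covers 3 bytes per char with room to spare.
        int capacity = Encoding.UTF8.GetMaxByteCount(value.Length);
        byte[] scratch = new byte[capacity];

        // Encode the whole string; the return value is the real UTF-8 length.
        int utf8Length = Encoding.UTF8.GetBytes(value, 0, value.Length, scratch, 0);

        // Keep exactly utf8Length bytes + 1; new arrays are zero-filled,
        // so the final byte is already the null terminator.
        byte[] result = new byte[utf8Length + 1];
        Array.Copy(scratch, result, utf8Length);
        return result;
    }
}
```

    `ToNullTerminatedUtf8("\u4F60\u597D")` returns 7 bytes (6 payload bytes and a trailing 0), the same `utf8Length + 1` that `AllocateStringEx` passes to `builder.Allocate`.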
     
    Last edited: Aug 16, 2023
  4. FengH

    It works, thank you!