Search Unity

  1. Welcome to the Unity Forums! Please take the time to read our Code of Conduct to familiarize yourself with the forum rules and how to post constructively.
  2. Dismiss Notice

Bug IN-40828 - Binary Catalog causes illegal alignment crashes on Android IL2CPP ARMv7

Discussion in 'Addressables' started by AlkisFortuneFish, May 11, 2023.

  1. AlkisFortuneFish

    AlkisFortuneFish

    Joined:
    Apr 26, 2013
    Posts:
    959
    We have run into a crash bug after enabling binary catalog support. The unaligned read happens in BinaryStorageBuffer.ReadValue<ObjectInitializationData.Serializer.Data>(uint id):

    Code (CSharp):
    1. public T ReadValue<T>(uint id) where T : unmanaged
    2. {
    3.     if (id == uint.MaxValue)
    4.         return default;
    5.     if (id >= m_Buffer.Length)
    6.         throw new Exception($"Data offset {id} is out of bounds of buffer with length of {m_Buffer.Length}.");
    7.  
    8.     fixed (byte *pData = m_Buffer)
    9.         return *(T*)&pData[id]; // Unaligned read happens here.
    10. }
    Looking at the actual crash in LLDB and the managed debugger, when it crashes id is 437, while pData is word aligned, the result of that base+offset is definitely not aligned to a word boundary.

    This does not repro on ARM64, and it also does not repro if the C++ project is compiled with the debug profile.
     
    Last edited: May 12, 2023
  2. AlkisFortuneFish

    AlkisFortuneFish

    Joined:
    Apr 26, 2013
    Posts:
    959
    Trying to see if I can work around this, the data it is deserializing is naturally unaligned, and we're too close to release to rebuild our asset bundles with binary catalogs disabled.
     
  3. AlkisFortuneFish

    AlkisFortuneFish

    Joined:
    Apr 26, 2013
    Posts:
    959
    Okay, we have worked around this for now, using a UnsafeUtility.MemCpy to do the unaligned read.
     
    Alan-Liu likes this.
  4. Alan-Liu

    Alan-Liu

    Joined:
    Jan 23, 2014
    Posts:
    345
  5. AlkisFortuneFish

    AlkisFortuneFish

    Joined:
    Apr 26, 2013
    Posts:
    959
    I did consider that, but we had to make the existing catalogue work, too close to release to rebuild fully tested bundles, so I rewrote the read to be:
    Code (CSharp):
    1. T t;
    2. UnsafeUtility.MemCpy(&t, pData + id, sizeof(T));
    3. return t;
    Interestingly, in our case that is not the case in the slightest. The json catalog is 35MB and binary is 16MB, that is one of the main reasons we need to switch over, deserialising that much json is slow.

    Yep, run into this one (and fixed it).

    Interesting, I'll have a look at that (although we don't use the location name so I'm not too bothered about that).

    Basically we have gone through two full QA cycles on iOS and Android ARM64 with these bundles and the binary catalogue and did not run into a showstopper until this crash was discovered when we moved to testing on more devices.
     
    Last edited: May 12, 2023
  6. Alan-Liu

    Alan-Liu

    Joined:
    Jan 23, 2014
    Posts:
    345
    For our project, the binary catalog is also smaller than json catalog, but it's not small enough. It handles some cases badly and has bug. For example, some prefixes of address don't get extracted and duplicate in the catalog. The bug report was created specifically for them. I did some project-specific customization to make it smaller.

    I don't use it directly either, it's used internally in Addressables. It causes a lot of same strings of "UnityEngine.AddressableAssets.ResourceLocators.ContentCatalogData+ResourceLocator+ResourceLocation" at runtime. Becasue each time ResourceLocation.ToString is called, it returns a new string object. I fixed it by defining ToString() method and let it return a string literal. It may be related to engine. I tested using Unity 2020.3.30 + IL2CPP + Android.
     
    Last edited: May 12, 2023
  7. AlkisFortuneFish

    AlkisFortuneFish

    Joined:
    Apr 26, 2013
    Posts:
    959
    Considering the amount of customisation we already have in addressables, I will very much consider looking into project specific optimisations, thanks.

    Hm, I'll have a look, thanks.
     
    Alan-Liu likes this.
  8. AlkisFortuneFish

    AlkisFortuneFish

    Joined:
    Apr 26, 2013
    Posts:
    959
    Bump! Sure, we have a workaround, but for some reason the (100% reproducible crash) bug hasn't even been confirmed by QA yet.
     
  9. AlkisFortuneFish

    AlkisFortuneFish

    Joined:
    Apr 26, 2013
    Posts:
    959
    Yep, run into that, and also run into a much worse one:
    ContentCatalogData.ResourceLocator.Locate(object key, Type type, out IList<IResourceLocation> locations), end of the method. Someone, who had the amazing idea that "locations" and "locs" are valid variable names for "filteredLocations" and "unfilteredLocations" has added this lovely little locations = locs at the end, which results in the filtered list being cached for the key-type combo, and then the unfiltered list being returned to the caller.

    It breaks loading sprites correctly. I have various components that load the same sprite, and kept getting "Addressables.Release was called on an object that Addressables was not previously aware of ", because that lovely little swap resulted in Locate returning the unfiltered locations list.

    AddressablesImpl.LoadAssetAsync<TObject>() takes the first location returned where a valid provider is returned for the type and location, but GetResourceProvider will happily return a cached provider for that location totally ignoring the type. As a result, we were ending up with the first sprite being returned being tied to a handle to the Texture2D, which is the first location in the unfiltered list for that key, and the rest to the Sprite, which was the only location in the cached and correctly filtered list..

    The returned sprite (because it did actually provide both with the right asset) is then added to the m_resultToHandle dictionary, pointing at the handle for the Texture2D location, and when stuff starts getting released this handle immediately hits zero and is released. That removes the sprite object from m_resultToHandle, orphaning it (if the caller did not keep hold of the handle, which we usually don't). Any further relase calls result in the error being printed.

    I have no idea how this did not cause bigger issues on the project considering the amount of testing we've done. ‍*shrugs*

    Commenting out ContentCatalogData.cs:1020 fixes the issue.

    How on earth do I even start to make a repro to report this? Scrub that, I just did. IN-41274
     
    Last edited: May 16, 2023