Bug Cache error when upgrading packages results in invalid scripts, which then causes asset corruption.

Discussion in 'Unity Accelerator' started by Tim2021, Nov 14, 2023.

  1. Tim2021

    Joined:
    Jan 19, 2021
    Posts:
    18
    I ran into a very weird error, and I think I have tracked it back to a caching error in Accelerator.

    It has taken me a week of debugging and experiments to get to this point, and I've essentially run out of leads to follow as well as run out of time to work on this so I'm going to dump everything I've discovered here in the hope that it helps someone.

    TL;DR: Updating the package manifest while Unity is closed (when you pull from source control) causes packages to import incorrectly from the Accelerator cache, breaking the project and then causing data corruption in serialised assets.

    Wall of Text:

    After upgrading the com.unity.addressables package from 1.21.17 to 1.21.19, it seemed to work fine on my machine. It was just a simple change to the package manifest.json, and I was able to build the game locally along with addressables. However, when I merged the change back into the main branch, our CI build started spitting out builds very fast. The builds were tiny compared to what they should have been, and on inspection it turned out they had no addressable/asset bundles.

    This then started happening randomly on other team members' machines.

    After some digging, what I found was that certain scripts within the addressables package had not imported correctly. The source code was there, and it was compiling correctly, but for some reason Unity's MonoImporter hadn't recognized them as containing valid classes for serialization.

    If I look at one of the broken scripts (BundledAssetGroupSchema.cs is the one that I first noticed was bad) in the inspector (in Debug mode), it looks like this:

    upload_2023-11-14_15-52-44.png

    Note the missing "Class Name" property and so on when compared to a working script that looks like this:

    upload_2023-11-14_15-53-28.png

    Because the serialization for these classes was now missing, all the serialized assets of that type were now missing too (sort of). Selecting one of these in the project shows an empty object. Interestingly, the script does show up as correct, not missing, but all the other contents of the asset are not there. All my BundledAssetGroupSchema scriptable objects were empty and looked like this:

    upload_2023-11-14_15-58-37.png


    Whereas they should look like this (full of lovely data):
    upload_2023-11-14_16-3-23.png

    More images and discussion in thread...
     


  2. Tim2021

    Joined:
    Jan 19, 2021
    Posts:
    18
    If I look at the Addressable groups that reference these schemas, they should have two entries in their schema set (one for the ContentUpdate schema, and one for the BundledAsset schema). However, we only see the one schema:

    upload_2023-11-14_16-9-42.png

    Interestingly, the serialized asset still has the guids for both schemas; it is just that one doesn't show up at all. It is also interesting that it is completely missing: it is not that the list is two entries long with one of them null or missing. In the inspector it looks like the list is only one element long, even though in the actual .asset on disk the list is two entries long:

    upload_2023-11-14_16-13-21.png

    These missing schemas were what was causing our builds to fail. There were probably other subtle errors caused by the other missing scripts, but this one was the first we noticed and is where I've been focusing my diagnostics.


    If I locally force reimport the Addressables package, everything does go back to working: the MonoScripts import correctly, the schema objects reappear, the lists of schemas (sometimes) go back to having two entries, and builds start working again.

    I do however see the following warnings in the log, which all correspond to the scripts that had had trouble importing.

    upload_2023-11-14_16-17-47.png



    The big trouble is that if something causes the addressable groups to reserialize while the project is in this weird inconsistent state, then anything referencing one of the broken assets (like the Addressable group schemas) will permanently serialize with either a null or a shortened list. It then stays broken even after the package is manually reimported, until we go back and manually recover the data from source control.

    Unpicking that mess has consumed a large amount of my time since this first hit us last week.
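
    A check along the following lines could catch group assets that were reserialized with a shortened or null schema list before they get checked in. This is only a minimal sketch: it assumes the group .asset files are text-serialized YAML and that the schema references sit in a list under a key named "m_Schemas" (the key name is an assumption, so adjust LIST_KEY to match what your files actually contain). The expected count of two comes from the ContentUpdate and BundledAsset schemas mentioned above.

```python
import re
from pathlib import Path

# Assumption: schema references are stored as a YAML list under this key.
LIST_KEY = "m_Schemas"
GUID_RE = re.compile(r"guid:\s*([0-9a-fA-F]{32})")


def schema_guids(asset_text, list_key=LIST_KEY):
    """Return one entry per list item under `list_key`.

    Each entry is the referenced guid, or None for a null reference
    such as "- {fileID: 0}".
    """
    guids = []
    in_list = False
    for line in asset_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(list_key + ":"):
            in_list = True
            continue
        if in_list:
            if stripped.startswith("- "):
                match = GUID_RE.search(stripped)
                guids.append(match.group(1) if match else None)
            else:
                # Any non-item line ends the list.
                in_list = False
    return guids


def find_short_groups(root, expected=2, list_key=LIST_KEY):
    """List (path, valid_count) for assets with fewer valid schema refs than expected."""
    short = []
    for path in sorted(Path(root).rglob("*.asset")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if list_key + ":" not in text:
            continue  # not a group asset under the assumed layout
        valid = [g for g in schema_guids(text, list_key) if g]
        if len(valid) < expected:
            short.append((str(path), len(valid)))
    return short
```

    Running find_short_groups over the folder that holds the group assets (for example as a pre-commit or CI step) would flag files to restore from source control. It only inspects the files on disk, so it cannot tell whether the editor itself is currently in the broken state.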
     
  3. Tim2021

    Joined:
    Jan 19, 2021
    Posts:
    18
    I've been working on recreating the issue in as minimal a project as possible in preparation for submitting a formal bug report. At this point, though, I think the actual issue is that at some point in the past our cache server ended up with corrupt data in it for some unknown reason, and all of the weird behaviour I described above is a symptom of that rather than the root cause.
    I can recreate the symptoms of the issue using the following steps:
    1. Start with the package manifest.json pointing to com.unity.addressables 1.21.19
    2. Force reimport the Addressables package so that everything is nominally correct.
    3. Edit manifest.json to point to addressables version 1.21.17. (This simulates switching branch in git to the main branch in preparation for merging the upgrade to main.) It doesn't matter if Unity is open or closed when this happens; it seems to work fine and the package successfully downgrades a version.
    4. Close Unity.
    5. Edit manifest.json to point to addressables version 1.21.19. (This simulates the git operation of someone merging the package update from main to their branch.)
    6. Open Unity and observe that the addressables package has failed to import as described above.
    Interestingly, the breakage only happens if Unity is closed when the manifest changes. If you have Unity open and change from 1.21.17 to 1.21.19 then the package imports correctly. I suspect that this is why the problem seemed to crop up on random people's PCs: it all depended on whether they had Unity open when they pulled the change.
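
    The manifest edits in steps 3 and 5 can be scripted so the sequence is repeated identically each time. A minimal sketch, assuming the standard Packages/manifest.json layout with a top-level "dependencies" map; the function name and the path in the usage note are illustrative, not part of any Unity tooling:

```python
import json
from pathlib import Path


def set_package_version(manifest_path, package, version):
    """Point `package` at `version` in a Unity manifest.json.

    Returns the version that was there before (or None if the
    package was not listed), so the caller can log the change.
    """
    path = Path(manifest_path)
    manifest = json.loads(path.read_text(encoding="utf-8"))
    dependencies = manifest.setdefault("dependencies", {})
    previous = dependencies.get(package)
    dependencies[package] = version
    # Write back with indentation so the diff stays readable in git.
    path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return previous
```

    For step 3 this would be called as set_package_version("Packages/manifest.json", "com.unity.addressables", "1.21.17"), and for step 5, with Unity closed, the same call with "1.21.19". Opening and closing the editor between the calls is still a manual step here.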

    I can recreate these symptoms in our full project, our project with ALL of the assets deleted and only the package manifest left, and in a completely empty, brand new project.

    The symptoms seem to go away if I do any of the following:
    1. Disable the cache server in the project settings
    2. Disable "Download" in the Cache server settings
    3. Change the "Namespace prefix" in the Cache server settings.
    This is why I now suspect that this is a caching bug and our cache server has somehow ended up with corrupt data in the original "default" namespace.

    I have not managed to recreate the circumstances that created the corruption in a new namespace, which is why I'm out of leads to follow at this point.

    Disabling downloading from the cache server is not a viable workaround as it completely negates the point of having a cache server.

    Changing the namespace prefix whenever we upgrade a package version is painful: it takes a couple of hours to completely reimport the project and rebuild the cache in the new namespace. It also isn't foolproof. If someone forgets to do it, we get back into the situation where objects are invalid and any reserialisation of data around them will result in semi-permanent lost data getting checked back in to source control.
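
    One way to take the "someone forgets" risk out of the namespace workaround would be to derive the prefix from the package versions themselves, so it changes whenever the manifest does. The following is a sketch under that assumption, not something Unity or Accelerator provides; the function name and the "default" base are illustrative, and how the resulting string gets applied to the editor's cache server settings is left out.

```python
import hashlib
import json
from pathlib import Path


def namespace_prefix(manifest_path, base="default"):
    """Build a cache namespace prefix that changes with the package set.

    The "dependencies" map is serialised in a canonical form (sorted
    keys, no extra whitespace) so that reordering entries in
    manifest.json does not change the result, but any version change does.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    dependencies = manifest.get("dependencies", {})
    canonical = json.dumps(dependencies, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return f"{base}-{digest}"
```

    The cost is unchanged: every package change still means a full reimport to populate the new namespace, so this only removes the manual step, not the couple of hours of rebuilding.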

    I have actually seen this problem once before. It occurred when we were last updating the Unity version (and a whole bunch of packages). That time it impacted the com.unity.inputsystem package, but I wrote it off as weirdness around the Unity upgrade, as there were a lot of upgrade issues we had to work through.

    I'm happy to submit this via the bug reporter, but if I just submit the empty project the bug probably won't be recreatable without also using the same cache server. Currently the cache server data is well over 100GB.


    All of this is in Unity 2022.3.7 (LTS) with Accelerator version v1.0.941+g6b39b61 both running on Windows.
     
  4. unity_Jonny

    Unity Technologies

    Joined:
    Feb 11, 2020
    Posts:
    22
    Firstly, thanks for the amazing write up! That's really helpful.

    There's a lot going on there, but yes, it does seem like somehow in the package upgrade process on launching Unity, the cache receives some invalid data.
    It is strange that it works if the Editor is open when the manifest is edited, but fails if it's closed. Perhaps there is some different code path related to package management on first startup of the Editor, and if the import happens in stages, a cache event could happen on partial or invalid imports.
    I wonder if the actual poisoned cache event happens in step 3 you described above - the Editor inadvertently caches invalid data during the downgrade, but the import eventually succeeds locally and does not update the cached data, then on the next launch of the Editor, it downloads the corrupted data?
    I'd like to monitor the download events that occur to find out when the local files get updated.

    If you're happy to submit a bug report that would be awesome. You don't need to supply a repro project if you have a set of repro steps that reliably show the issue on a fresh project. Just detail the steps as much as you can.
    If you use the Bug Reporter in the Editor it'll prompt for the details and it'll also grab the current Unity version etc.
    Thanks!
     
  5. Tim2021

    Joined:
    Jan 19, 2021
    Posts:
    18
    I can only recreate the issue in a clean project if I have that project pointed to our instance of the cache server, and the namespace prefix set to be the old "default" value that our main project was using when all this happened. If I'm running on a clean cache namespace everything works fine.

    Like I said somewhere in the wall of text, I suspect the actual bug happened a while ago, that cache namespace now contains bad data and what we are seeing as I go back and forth between versions is just a symptom of that original corruption that never seems to get flushed out.

    I don't know the recreation steps that corrupted the cache.

    I can probably zip up and include the cache server data folder so that you could set up a local instance with the same data, but at this point it is ~120GB. Will the bug reporter even cope with that?
     
  6. Tim2021

    Joined:
    Jan 19, 2021
    Posts:
    18
    Bug report submitted. The bug reporter really struggled with the size of the files I needed to attach so that you could replicate the cache server but I think we got there in the end.
     
  7. unity_Jonny

    Unity Technologies

    Joined:
    Feb 11, 2020
    Posts:
    22
    That's great, thanks.
     
  8. dKleinTriCAT

    Joined:
    Jul 2, 2019
    Posts:
    18
    @unity_Jonny @Tim2021
    Some information from our side, as I think we are suffering from the same issue.

    We believe by now that this issue is not caused by the Accelerator; it just gets worsened by it because the Accelerator accepts and spreads the corrupted files.

    We disabled the accelerator and deleted the library folder on all of our machines, but after a couple of package updates the problem keeps reappearing. It's always scripts that are not imported properly/corrupted and then cause all kinds of other things to fail. We are using many custom packages to modularize our internal framework and so we actually do several package updates a week. Things don't break with every package update we do, but often enough that we currently do a full reimport for every nightly build that we make. Also every 1-2 weeks our entire dev department has to stop what they're doing and reimport their library for 2 hours because things will be broken that prohibit even entering playmode.
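
    Since the breakage is tied to package updates, a build script could compare the manifest from the last good build with the current one and only force the full reimport when a package version actually changed. A minimal sketch, assuming both manifests use the standard top-level "dependencies" map; the function name is illustrative and triggering the reimport itself is not shown:

```python
import json


def changed_packages(old_manifest_text, new_manifest_text):
    """Return {package: (old_version, new_version)} for every difference.

    Added packages have old_version None; removed ones have new_version None.
    """
    old = json.loads(old_manifest_text).get("dependencies", {})
    new = json.loads(new_manifest_text).get("dependencies", {})
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes
```

    An empty result means no package versions changed between the two manifests. Whether that is a safe signal for skipping the reimport is an open question, given that the cause of the broken imports is not yet understood.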

    We have tried to create a small repro project and have managed to create those broken imports on package update even there once, but sadly it seems far more unreliable than in our actual project and we don't quite know exactly what is causing it.
     
  9. jamie_xr

    Joined:
    Feb 28, 2020
    Posts:
    65
    I'll add to this. We have indeed seen the same problems and have disabled accelerator in order to circumvent it.

    Package upgrades for us would more often than not require the entire cache server to be purged.
    @Tim2021 This would be similar to you changing the namespace (you essentially have cleared the cache there as you are pointing at a namespace with 0 cache). We however use a namespace prefix other than "default" as we have multiple projects using the cache server (well we did until this issue derailed the whole thing).

    We were also unable to repro it in a small project. Even in our project it was quite intermittent. I've wanted to supply a small project for a bug report for nearly a year now, but we've not been able to, and we've been unable to figure out what caused it.

    It's sad that we had to disable the cache server. Our projects get bigger and bigger, and as the team grows there is such a high velocity of changes that even in a few hours you can fall behind, and it's going to take at least an hour or more to import the newest stuff.

    @unity_Jonny Can you provide any update?
    @Tim2021 Can you link the bug report? I'd like to monitor progress and upvote it.