Bug Burst crash in IJobChunk Unity.Entities.ChunkDataUtility.GetIndexInTypeArray

Discussion in 'Entity Component System' started by slims, Feb 14, 2023.

  1. slims (Joined: Dec 31, 2013, Posts: 86)

    I have a persistent and hard to reliably reproduce bug in one of my IJobChunk jobs.

    The crash happens when the chunk's native array for the LocalTransform type is accessed:

    Code (CSharp):
    [ReadOnly] public ComponentTypeHandle<LocalTransform> LocalTransformType;

    public void Execute(in ArchetypeChunk chunk, int unfilteredChunkIndex, bool useEnabledMask,
      in v128 chunkEnabledMask)
    {
      // The crash happens on this call.
      var localTransforms = chunk.GetNativeArray(ref LocalTransformType);
      // ...
    }
    The crash occurs inside GetNativeArray, when the Unity.Entities.ChunkDataUtility.GetIndexInTypeArray method is called.

    The query for this job includes a .WithAll<LocalTransform>(), so every chunk it matches should definitely have a LocalTransform.

    From the crash dmp:

    lib_burst_generated!Unity.Entities.ChunkDataUtility.GetIndexInTypeArray [inlined in lib_burst_generated!Systems.Knn.KnnSystem/AssignQueryResultsJob::Systems.Knn.KnnSystem.AssignQueryResultsJob.Execute+0x317]:
    00007ffa`5f04da37 418bbc24a0000000 mov edi,dword ptr [r12+0A0h] ds:00000000`000000a0=????????
    Resetting default scope

    EXCEPTION_RECORD: (.exr -1)
    ExceptionAddress: 00007ffa5f04da37 (lib_burst_generated!Unity.Entities.ChunkDataUtility.GetIndexInTypeArray)
    ExceptionCode: c0000005 (Access violation)
    ExceptionFlags: 00000000
    NumberParameters: 2
    Parameter[0]: 0000000000000000
    Parameter[1]: 00000000000000a0
    Attempt to read from address 00000000000000a0

    This causes a lot of crashes in release builds for my playtesters, and it's super hard to tell what I could be doing wrong here. Any ideas or leads I could follow up on?
     
  2. tertle (Joined: Jan 25, 2011, Posts: 3,626)

    So this is nearly always caused by a separate issue that happened earlier; 95% of the time it is an EntityCommandBuffer or a Lookup writing data to an entity that no longer exists, which then corrupts the chunk.

    For example, you schedule these two commands in this order:
    ecb.DestroyEntity(entity);
    ecb.AddComponent<T>(entity);

    This might not crash at the playback of the command buffer; instead it can just write data to some random piece of memory and cause issues down the track, e.g. when doing a chunk lookup.
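
One way to reduce the risk from stale entity references on the Lookup side is to check the entity before writing through the lookup. The sketch below is only an illustration under assumptions: `ExampleHealth`, `ExampleApplyDamageJob`, and the `Targets` array are hypothetical names that do not appear in this thread. It relies on `ComponentLookup<T>.HasComponent`, which returns false for an entity that has been destroyed.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;

// Hypothetical component, for illustration only.
public struct ExampleHealth : IComponentData
{
    public float Value;
}

[BurstCompile]
public struct ExampleApplyDamageJob : IJob
{
    // Hypothetical list of entities gathered earlier in the frame;
    // some of them may have been destroyed since.
    [ReadOnly] public NativeArray<Entity> Targets;

    // Read/write lookup, so the indexer setter below is allowed.
    public ComponentLookup<ExampleHealth> HealthLookup;

    public void Execute()
    {
        for (int i = 0; i < Targets.Length; i++)
        {
            Entity target = Targets[i];

            // Skip entities that no longer exist or lack the component,
            // instead of writing through a stale reference.
            if (!HealthLookup.HasComponent(target))
                continue;

            ExampleHealth health = HealthLookup[target];
            health.Value -= 1f;
            HealthLookup[target] = health;
        }
    }
}
```

This does not help with the command buffer case above, where both commands are deferred and the entity still exists at record time; there the ordering of the recorded commands has to be fixed at the source.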

    This is nearly impossible to track down without safety enabled (and if you have the habit of turning off safety then good luck to you.)

    If you are having a hard time trying to repro it in the editor, your best chance is to enable UNITY_DOTS_DEBUG in the build, as it can catch a lot of these issues. It will still crash, since it throws an exception, but it should hopefully do so at the actual point of failure, with a hint to help you.
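
UNITY_DOTS_DEBUG is a scripting define symbol, so it can be added under Player Settings > Scripting Define Symbols. As a rough sketch, the same thing can be done from an editor script; the class name and menu path below are made up for illustration, and the Standalone target is an assumption about the build in question.

```csharp
#if UNITY_EDITOR
using System;
using UnityEditor;
using UnityEditor.Build;

// Hypothetical editor helper; the class name and menu path are illustrative only.
public static class ExampleDotsDebugDefine
{
    private const string Symbol = "UNITY_DOTS_DEBUG";

    [MenuItem("Tools/Example/Enable UNITY_DOTS_DEBUG (Standalone)")]
    public static void Enable()
    {
        // Assumption: the playtest builds are standalone desktop players.
        NamedBuildTarget target = NamedBuildTarget.Standalone;

        // Defines are stored as a single semicolon-separated string.
        string defines = PlayerSettings.GetScriptingDefineSymbols(target);
        string[] existing = defines.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

        // Only append the symbol if it is not already present.
        if (Array.IndexOf(existing, Symbol) < 0)
        {
            string updated = existing.Length == 0 ? Symbol : defines + ";" + Symbol;
            PlayerSettings.SetScriptingDefineSymbols(target, updated);
        }
    }
}
#endif
```

The symbol has to be set before the player is built; it has no effect on builds that already exist.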
     
  3. slims (Joined: Dec 31, 2013, Posts: 86)

    @tertle Appreciate the advice as always.

    I ran into the issue you're describing early in development of the game and architected things so I shouldn't ever run into it. Simplifying: entities (almost) never get deleted mid-frame; they get tagged, and a deletion system uses a command buffer that runs at the end of each frame to clean them up. Queries over entities always exclude anything tagged this way.
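
The tag-then-clean-up pattern described above could look roughly like the following. This is a simplified sketch, not the actual code from the game: `ExamplePendingDestroyTag` and both system names are hypothetical, and for brevity the cleanup system destroys the tagged entities directly through the EntityManager at the end of the simulation group rather than through a command buffer.

```csharp
using Unity.Entities;
using Unity.Transforms;

// Hypothetical tag: "destroy this entity at the end of the frame".
public struct ExamplePendingDestroyTag : IComponentData { }

// Gameplay systems exclude tagged entities from their queries.
public partial struct ExampleGameplaySystem : ISystem
{
    private EntityQuery _liveQuery;

    public void OnCreate(ref SystemState state)
    {
        _liveQuery = SystemAPI.QueryBuilder()
            .WithAll<LocalTransform>()
            .WithNone<ExamplePendingDestroyTag>()
            .Build();
    }

    public void OnUpdate(ref SystemState state)
    {
        // Schedule jobs against _liveQuery here; tagged entities never match.
    }

    public void OnDestroy(ref SystemState state) { }
}

// Runs late in the simulation group and removes everything that was tagged.
[UpdateInGroup(typeof(SimulationSystemGroup), OrderLast = true)]
public partial struct ExampleCleanupSystem : ISystem
{
    private EntityQuery _pendingQuery;

    public void OnCreate(ref SystemState state)
    {
        _pendingQuery = SystemAPI.QueryBuilder()
            .WithAll<ExamplePendingDestroyTag>()
            .Build();
    }

    public void OnUpdate(ref SystemState state)
    {
        // Structural change on the main thread: destroys every matching entity.
        state.EntityManager.DestroyEntity(_pendingQuery);
    }

    public void OnDestroy(ref SystemState state) { }
}
```

The point of the design is that there is exactly one place where entities are destroyed, so no other system can record commands against an entity that a different command has already removed.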

    Anyway, I fixed the problem, but it's not 100% conclusive what fixed it. I was using a deprecated method from before Entities 1.0, ToComponentDataArrayAsync. I replaced it with the updated one that uses a native list instead, ToComponentDataListAsync. This forced me to refactor things a bit, and after I did that the problem went away. So the cause might just be a bug in the deprecated method, or I changed something non-obvious to me in the system code that fixed it.
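
For reference, a minimal sketch of how the list-based method can be used. The system and job names are hypothetical and this is not the KnnSystem code from the thread. It assumes the `ToComponentDataListAsync<T>(allocator, out JobHandle)` overload and hands the list to the consuming job as a deferred array, because the list is only filled once the gather job has run.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;
using Unity.Transforms;

[BurstCompile]
public struct ExampleConsumeTransformsJob : IJob
{
    // Deferred view of the list; valid once the gather job has completed.
    [ReadOnly] public NativeArray<LocalTransform> Transforms;

    public void Execute()
    {
        for (int i = 0; i < Transforms.Length; i++)
        {
            // Read Transforms[i] here.
        }
    }
}

public partial struct ExampleGatherSystem : ISystem
{
    private EntityQuery _query;

    public void OnCreate(ref SystemState state)
    {
        _query = SystemAPI.QueryBuilder().WithAll<LocalTransform>().Build();
    }

    public void OnUpdate(ref SystemState state)
    {
        // Schedules a job that copies the component data into a list.
        // The list must not be read until gatherHandle has completed.
        NativeList<LocalTransform> transforms =
            _query.ToComponentDataListAsync<LocalTransform>(
                state.WorldUpdateAllocator, out JobHandle gatherHandle);

        var job = new ExampleConsumeTransformsJob
        {
            Transforms = transforms.AsDeferredJobArray()
        };

        // The consumer runs after both the gather job and the system's inputs.
        state.Dependency = job.Schedule(
            JobHandle.CombineDependencies(state.Dependency, gatherHandle));
    }

    public void OnDestroy(ref SystemState state) { }
}
```

Reading the list on the main thread, or capturing its length at schedule time, before the gather job has completed is the kind of mistake a refactor like this can remove without anyone noticing.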