Search Unity

Bug IL2CPP marshalling issue with an empty arrays of structs?

Discussion in 'Scripting' started by Andefob, Apr 10, 2023.

  1. Andefob

    Andefob

    Joined:
    Sep 17, 2018
    Posts:
    99
    I have a very rare crash issue related to querying for servers using FacePunch / Steam (Unity 2021.3.16f1)

    I submitted a bug to Facepunch, but while I debugged it, it looks like it could be an issue with IL2CPP or something else Unity related: https://github.com/Facepunch/Facepunch.Steamworks/issues/658

    What happens is that an empty array of structs is passed as a parameter to a native function. This works most of the time, but very rarely there is a crash that seems to be caused by something going wrong with the marshalling even before the actual call is done.

    More debug information in the comments in the link above.

    Any ideas on what to do? Unfortunately, reproducing this is very hard to all I can do is to make a fix attempt, make a new build, and wait for weeks to see if it still happens.
     
  2. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    6,930
    I'm not quite sure what the problem is there either. Do you know the C# code that uses marshaling that causes this issue? I don't see it in the Facepunch.Steamworks issue there. With that code I might be able to make a small test case and run it many times to see if something shakes loose.
     
  3. Andefob

    Andefob

    Joined:
    Sep 17, 2018
    Posts:
    99
    All related C# code below. Is this enough or do you need something else? (I don't know much about marshalling and native calls)

    This is code in my game:

    Code (CSharp):
    1.        public async Task<bool> GetDedicatedServersAsync()
    2.         {
    3.             try
    4.             {
    5.                 using (var list = new Steamworks.ServerList.Internet())
    6.                 {
    7.                     bool result = await list.RunQueryAsync(timeoutSeconds: 10.0f);
    8.                  ...
    9.  


    This is what Facepunch does when that is run:

    Code (CSharp):
    1.  
    2.         public virtual async Task<bool> RunQueryAsync( float timeoutSeconds = 10 )
    3.         {
    4.             var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    5.             Reset();
    6.             LaunchQuery();
    7. ...
    8.  
    And LaunchQuery does this for Internet:

    Code (CSharp):
    1.  
    2. namespace Steamworks.ServerList
    3. {
    4.     public class Internet : Base
    5.     {
    6.         internal override void LaunchQuery()
    7.         {
    8.             var filters = GetFilters();
    9.  
    10.             request = Internal.RequestInternetServerList( AppId.Value, ref filters, (uint)filters.Length, IntPtr.Zero );
    11.         }
    12.     }
    13. }
    GetFilters should just return an empty array that is created like this:

    Code (CSharp):
    1.         internal List<MatchMakingKeyValuePair> filters = new List<MatchMakingKeyValuePair>();
    2.         internal virtual MatchMakingKeyValuePair[] GetFilters() => filters.ToArray();
    MatchmakingKeyValuePair has been defined with these:

    Code (CSharp):
    1.     internal partial struct MatchMakingKeyValuePair
    2.     {
    3.         [DllImport( Platform.LibraryName, EntryPoint = "SteamAPI_MatchMakingKeyValuePair_t_Construct", CallingConvention = Platform.CC)]
    4.         internal static extern void InternalConstruct( ref MatchMakingKeyValuePair self );
    5.      
    6.     }
    7.  
    8.     [StructLayout( LayoutKind.Sequential, Pack = Platform.StructPackSize )]
    9.     internal partial struct MatchMakingKeyValuePair
    10.     {
    11.         [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 256)]
    12.         internal string Key;
    13.         [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 256)]
    14.         internal string Value;
    15.     }
    16.  
    This is how the actual native call is done in FacePunch:

    Code (CSharp):
    1.         #region FunctionMeta
    2.         [DllImport( Platform.LibraryName, EntryPoint = "SteamAPI_ISteamMatchmakingServers_RequestInternetServerList", CallingConvention = Platform.CC)]
    3.         private static extern HServerListRequest _RequestInternetServerList( IntPtr self, AppId iApp, [In,Out] ref MatchMakingKeyValuePair[]  ppchFilters, uint nFilters, IntPtr pRequestServersResponse );
    4.      
    5.         #endregion
    6.         internal HServerListRequest RequestInternetServerList( AppId iApp, [In,Out] ref MatchMakingKeyValuePair[]  ppchFilters, uint nFilters, IntPtr pRequestServersResponse )
    7.         {
    8.             var returnValue = _RequestInternetServerList( Self, iApp, ref ppchFilters, nFilters, pRequestServersResponse );
    9.             return returnValue;
    10.         }
     
  4. Andefob

    Andefob

    Joined:
    Sep 17, 2018
    Posts:
    99
    I wonder if there is any reason why the array returned by GetFilters() could become invalid before its reference is actually used? Could there be some special bug in the situation when the size/length is 0? It could explain the symptoms if very rarely the array is garbage collected before the reference is used. But I have no idea why that would happen and if it can happen, there might be other related issues, too.

    I could make a version that doesn't just create a temporary array but holds it in some permanent place and/or always makes sure there is something in the array. But as reproducing the issue is very hard, it is hard to know if it really helps (unless of course the crash happens again but it may take weeks or months).

    If you have any ideas how to possible make the issue reproduce more often, let me know.
     
  5. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    6,930
    Thanks for the code snippets. Nothing immediately jumps out at me as a cause. I do think you might be on to something with the possible GC issue. If you can write some additional code to make sure these objects are not collected, that sounds like a good approach to me.
     
  6. Andefob

    Andefob

    Joined:
    Sep 17, 2018
    Posts:
    99
    Do you have better ideas on how to do that except to store the array on a permanent place (maybe so that it is not replaced until the next call is made)?

    But of course, the problem with this is that as said, it can take months before I can perhaps say that it has probably fixed the issue, and still, the same issue could cause other issues in some other places.
     
  7. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    6,930
    I don't have a better idea now, sorry!
     
  8. StarBornMoonBeam

    StarBornMoonBeam

    Joined:
    Mar 26, 2023
    Posts:
    209
    I am no pro here so I could not answer many questions, but I know one thing

    If the array is corrupt, it will crash the GC when collected. If the array is corrupt and crashing the GC then not collecting will still result you on that rare occasion with a corrupt array that you don't dispose of.

    Why is the array corrupt?

    Maybe sometimes you didn't receive all of the data I couldn't comment.

    If I am a total idiot here ignore me

    But it may result that well, it's just on the rare occasion you have to just live with it. or confirm the array somehow*
     
  9. Andefob

    Andefob

    Joined:
    Sep 17, 2018
    Posts:
    99
    With C++, a common cause for random issues is that something is freed but some part of the code still keep using it. So, very often the program works just fine until the same memory address is used for something else. IF this issue is related to this, it could be that most of the time the issue is not noticed, only when the chunk of memory is used for something else. If it was possible to at runtime ask whether "filters" has been garbage collected too early, it might be possible to detect the cause of issue even when it is not crashing. But I don't know if there is any way to do it.

    Another possible cause for the issue could be that some other part of the code just has a buffer overflow or for some other reason writes a random memory address. This could in theory explain any bugs. Maybe it could be noticed by creating a dummy array just before creating filters and checking that it is intact after the call to RequestInternetServerList.