Search Unity

Resolved IL2CPP implementation of Thread.VolatileRead and Thread.VolatileWrite

Discussion in 'Scripting' started by Alan-Liu, Oct 20, 2021.

  1. Alan-Liu

    Alan-Liu

    Joined:
    Jan 23, 2014
    Posts:
    391
    I encountered a multi-thread issue recently, and tried to solve it by using Thread.VolatileRead/Write.But when I saw the IL2CPP implementation of these two functions, it seems that the implementation is wrong.
    For example, the IL2CPP implementation of 'int Thread.VolatileRead(ref int address)' can be found in Unity\Hub\Editor\2019.4.31f1\Editor\Data\il2cpp\libil2cpp\icalls\mscorlib\System.Threading\Thread.cpp:
    Code (CSharp):
    1. int32_t Thread::VolatileReadInt32(volatile void* address)
    2. {
    3.     il2cpp::os::Atomic::FullMemoryBarrier();
    4.     return *reinterpret_cast<volatile int32_t*>(address);
    5. }
    The 'MemoryBarrier' should be called after loading from 'address', here is the implementation of .Net Framework (https://referencesource.microsoft.com/#mscorlib/system/threading/thread.cs,e6226614b8f4abc8):
    Code (CSharp):
    1. public static int VolatileRead(ref int address)
    2. {
    3.     int ret = address;
    4.     MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
    5.     return ret;
    6. }
    So, is the IL2CPP implemetation is correct?
     
  2. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    3,981
    I'm certainly not an expert here, but I think it doesn't really make a difference. A memory barrier does not somehow lock threads or anything. They just ensure whatever code was before that instruction has been executed before any of the code after that barrier instruction. It's a simply ordering instructions which prevents CPUs from applying out of order optimisations which could result in wrong behaviour.

    So using the VolatileRead in code does simply read the current value but as a whole sets a marker in code to ensure the CPU does not mess with the ordering. As I said, I'm not an expert in that field. What exact issue do you have encountered and why do you thing a VolatileRead should solve it? It certainly does not solve any of the usual concurrency issues due to multithreading when two or more threads mess with the same variables. It just ensures that the ordering of instructions within the same thread is not messed up for optimisation purposes. A compiler does not know about other threads that may run at the same time. So within a single thread a CPU could reorder certain operations for performance improvements. For example imagine you have code like this

    Code (CSharp):
    1. a = 5;
    2. b = 6;
    3. c = 5;
    4. done = true;
    Logically we would assume that variable "a" is assigned first, then "b", then "c" and finally "done". However the CPU could, for what reason ever, actually assign "done" first, then "c" then "a" then "b" because for this thread it doesn't make any difference as all 4 variables are not used in between. It may group "a" and "c" since we assign the same value to it. "done" may be assigned first because it already has a value of "1" / true loaded in a register from code before it so the code becomes faster.

    However such an optimisation could cause issues when you don't have proper synchronisation between multiple thread and the other thread "assumes" that the other thread would execute those things in order. In a hypothetical scenario thread one may execute the "done = true" line and then gets interrupted. Now the other thread may watch the "done" variable to start its own processing and in turn reads the values of a, b and c. Of course those values have not yet been assigned. A memory fence before the "done = true" line would ensure that a, b and c are assigned before done is set to true.
     
    Last edited: Oct 20, 2021
    adamgolden likes this.
  3. Alan-Liu

    Alan-Liu

    Joined:
    Jan 23, 2014
    Posts:
    391
    Thank you for replying, Bunny83.

    I think I should spend some time to describe the issue I encountered. It's related to Newtonsoft.Json.

    Recently, I found our players encountered the issue mentioned in https://github.com/JamesNK/Newtonsoft.Json/issues/1753. Although it had been fixed in 2018, but our project used a old version of Newtonsoft.Json(https://assetstore.unity.com/packages/tools/input-management/json-net-for-unity-11347) which don't have the fix. When I saw the commit of this fix, I doubted whether it is correct on all platforms.

    Then I confirmed the fix is not working on Linux with ARM cpu by a reproducible program: https://github.com/JamesNK/Newtonsoft.Json/issues/2598. My fix is using Volatile.Read/Write to ensure hardware don't reorder the load/store of '_mask' and '_entries'. Because our project used Unity 2017 .Net 3.5 which don't have Volatile.Read/Write API, so I decided to use Thread.VolatileRead/Write.

    Back to my question, as far as I know, VolatileRead should implement the acquire semantic:
    For the implementation of Get method of DefaultJsonNameTable from Newtonsoft.Json, it's important that '_mask' should be read prior to '_entries'. IL2CPP implementation of Thread.VolatileRead uses MemoryBarrier before the load, so I think it can't guarantee that hardware don't reorder the two loads of '_mask' and '_entries'.
     
  4. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    3,981
    Thanks for that detailed reply. Though this seems to me like a misguided issue. The whole class doesn't look to be thread-safe to begin with and all the issues seems to come from the lack of proper synchronisation as far as I can tell. To me that test case you presented in the issue is already flawed in several ways. You have one thread adding items to the collection and 3 threads reading from it at the same time. You have a lock inside the writing thread which is pretty pointless if that's the only thread using the lock, so the lock is pointless. Locks only work if two or more threads use it.

    On the other hand, as I said the "DefaultJsonNameTable" class is not thread safe at all as both the Add as well as the Get method are not in any way atomic or suggest that they are. No amount of volatile of prevention of reordering would fix that fundamental flaw in that test case. Look inside the Grow method (which is the only real issue here) it will set a new mask and the new entities array, both belong together. The method first sets the entities array, then the mask. So even if the ordering is fine in both threads, this code would cause the wrong result in case of a race condition as it may use the old mask but the new entities array. So you would index into the wrong array element. Yes, the wrong ordering would cause an index out of bounds exception because using the new mask on the old entities array would cause that since after Grow the mask is twice as large but the old array has half the size the new mask would cover.

    So the issue here is that the class is not thread safe in the first place and having more than one thread accessing it simply won't work. The reordering issue is irrelevant when proper synchronisation between threads is used.

    So you're essentially trying to duct tape the index out of bounds exception away that is caused by your race condition. However that does not fix the fundamental flaw in the first place. So just as likely it is that you currently get an out of bounds exception, it's just as likely to that the get call returns null, even though the key does exist.

    While some race conditions could be catched afterwards, maybe even automatically because you have a guard clause that already checks if the value returned is null, depending on the usage you just get inconsistant behaviour which you want to avoid at all cost.
     
  5. Alan-Liu

    Alan-Liu

    Joined:
    Jan 23, 2014
    Posts:
    391
    You're correct that the DefaultJsonNameTable itself is not thread safe. But the Newtonsoft.Json's usage dosen't rely on it.

    The library takes a lock when it calls the Add method(https://github.com/JamesNK/Newtonso...Json/Serialization/DefaultContractResolver.cs Line 1382). The lock in my test program just simulates it, and I should admit it's pointless in the program. When the Get method returns null, the library will fallback to create a new string(https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/JsonTextReader.cs Line 1607). So, even the old '_mask' is loaded first and then the new '_entries' is loaded in the Get method, null will be returned and it will not cause any problem.

    I think the author implemented this way is due to performance. Provided that adding a new string is not frequently, the Get method can return the string from the cache without using any synchronization. In rare case that the Get method returns null even the string exists in the cache, creating a new string seems to be acceptable.
     
  6. Bunny83

    Bunny83

    Joined:
    Oct 18, 2010
    Posts:
    3,981
    Well, I haven't really looked that deep into the classes, but the most obvious fix would be to simply add a lock to every call of Get. Like the line you mentioned:

    Code (CSharp):
    1.     lock(PropertyNameTable)
    2.     {
    3.         propertyName = PropertyNameTable.Get(_stringReference.Chars, _stringReference.StartIndex, _stringReference.Length);
    4.     }
    5.  
    Looks like the only two cases where Get is actually called is in the reader in the file you already mentioned and in addition in the async extension of the partial JsonTextReader class.

    This should fix all issues.
     
  7. Alan-Liu

    Alan-Liu

    Joined:
    Jan 23, 2014
    Posts:
    391
    I did a simple test to compare the performance between using lock and Volatile.Read/Write. The test programs are built using Release configuration and .net5.

    Windows 10, using Volatile.Read/Write
    CPU Arch: X64, Processor count: 16
    100: 997ms
    200: 961ms
    300: 967ms
    400: 954ms
    500: 963ms

    Windows 10, using lock
    CPU Arch: X64, Processor count: 16
    100: 1267ms
    200: 1237ms
    300: 1235ms
    400: 1235ms
    500: 1232ms

    Raspberry Pi, using Volatile.Read/Write
    CPU Arch: Arm, Processor count: 4
    100: 6143ms
    200: 6069ms
    300: 6076ms
    400: 6162ms
    500: 6106ms

    Raspberry Pi, using lock
    CPU Arch: Arm, Processor count: 4
    100: 7527ms
    200: 7452ms
    300: 7391ms
    400: 7361ms
    500: 7365ms

    As you can see, the performance becomes worse when using lock.
     
    Last edited: Oct 21, 2021
  8. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    6,926
    I believe you are current in noticed that the IL2CPP implementation here is incorrect. However, the code is a bit misleading, as IL2CPP will not actually use this code - it will remap the managed function call to an intrinsic function named VolatileRead that is templated on the argument type. This method does have the proper implementation, and will be inlined, so should provide better performance.
     
  9. Alan-Liu

    Alan-Liu

    Joined:
    Jan 23, 2014
    Posts:
    391
    I saw the code genereated by IL2CPP, it's different from what you described.
    For the code below
    Code (CSharp):
    1. public class Test : MonoBehaviour
    2. {
    3.     public int Value;
    4.  
    5.     public void Test1()
    6.     {
    7.         Thread.VolatileRead(ref Value);
    8.     }
    9.  
    10.     public void Test2()
    11.     {
    12.         Volatile.Read(ref Value);
    13.     }
    14. }
    IL2CPP (Unity 2019.4.31) generated the following code for Test1 method:
    Code (CSharp):
    1. // System.Void Test::Test1()
    2. IL2CPP_EXTERN_C IL2CPP_METHOD_ATTR void Test_Test1_m7ED7AF9906542CFA0FD837DD35961C28E18761D6 (Test_tD59136436184CD9997A7B05E8FCAF0CB36B7193E * __this, const RuntimeMethod* method)
    3. {
    4.     {
    5.         // Thread.VolatileRead(ref Value);
    6.         int32_t* L_0 = __this->get_address_of_Value_4();
    7.         Thread_VolatileRead_mD75AEA4F8E4E9DCF1291BD984EAF16EBE17DBB35((int32_t*)L_0, /*hidden argument*/NULL);
    8.         // }
    9.         return;
    10.     }
    11. }
    12.  
    13. // System.Int32 System.Threading.Thread::VolatileRead(System.Int32&)
    14. IL2CPP_EXTERN_C IL2CPP_METHOD_ATTR int32_t Thread_VolatileRead_mD75AEA4F8E4E9DCF1291BD984EAF16EBE17DBB35 (int32_t* ___address0, const RuntimeMethod* method)
    15. {
    16.     typedef int32_t (*Thread_VolatileRead_mD75AEA4F8E4E9DCF1291BD984EAF16EBE17DBB35_ftn) (int32_t*);
    17.     using namespace il2cpp::icalls;
    18.     return  ((Thread_VolatileRead_mD75AEA4F8E4E9DCF1291BD984EAF16EBE17DBB35_ftn)mscorlib::System::Threading::Thread::VolatileReadInt32) (___address0);
    19. }
    20.  
    and the following code for Test2 method:
    Code (CSharp):
    1. // System.Void Test::Test2()
    2. IL2CPP_EXTERN_C IL2CPP_METHOD_ATTR void Test_Test2_mF3557C8DFB7F713B30B4CD4169E4F84A677D3D4C (Test_tD59136436184CD9997A7B05E8FCAF0CB36B7193E * __this, const RuntimeMethod* method)
    3. {
    4.     {
    5.         // Volatile.Read(ref Value);
    6.         int32_t* L_0 = __this->get_address_of_Value_4();
    7.         VolatileRead((int32_t*)L_0);
    8.         // }
    9.         return;
    10.     }
    11. }
    12.  
    As you can see, Thread.VolatileRead finally calls the C++ method I mentioned which I think the implementation is not correct, and Volatile.Read finally calls the function you mentioned:
    Code (CSharp):
    1. template<typename T>
    2. inline T VolatileRead(T* location)
    3. {
    4.     T result = *location;
    5.     il2cpp_codegen_memory_barrier();
    6.     return result;
    7. }
     
    PutridEx likes this.
  10. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    6,926
    Oh, that is a good point! Can you submit a bug report then? We should address this.

    https://unity3d.com/unity/qa/bug-reporting
     
  11. Alan-Liu

    Alan-Liu

    Joined:
    Jan 23, 2014
    Posts:
    391
    I submitted a bug report: Case 1373882.
     
    JoshPeterson likes this.