Huge jump on tgkill (signal 6) crashes after moving to IL2CPP.

Discussion in 'Android' started by Fesener, Jul 25, 2019.

  1. Fesener

    Fesener

    Joined:
    May 3, 2018
    Posts:
    33
    We have seen about a 500% increase in this crash after moving from 2017.4 to 2018.4 and switching to IL2CPP. Unfortunately, the stack trace doesn't tell us much.

    Pre-IL2CPP crash that occurred rarely:
    Code (CSharp):
    Caused by java.lang.Error: signal 6 (SIGABRT), code -6 (?), fault addr --------
    Build fingerprint: 'samsung/j6ltedx/j6lte:8.0.0/R16NW/J600GDXU3ARK5:user/release-keys'
    Revision: '2'
    pid: 8907, tid: 9048, name: UnityMain  >>> net.test.text <<<
        r0 00000000  r1 00002358  r2 00000006  r3 00000008
        r4 000022cb  r5 00002358  r6 cb97ec80  r7 0000010c
        r8 00000056  r9 cd8d6e00  sl cb97ee90  fp cb97ecf4
        ip 00000000  sp cb97ec70  lr f59e44d7  pc f5a1580c  cpsr 00000056

           at libc.tgkill + 12(tgkill:12)
           at libc.abort + 54(abort:54)
           at libc.__libc_fatal + 24(__libc_fatal:24)
           at libc.__pthread_internal_find(long) + 88(__pthread_internal_find:88)
           at libc.pthread_gettid_np + 2(pthread_gettid_np:2)
           at libc.pthread_kill + 18(pthread_kill:18)
           at libmono.0023fef0()
           at libmono.mono_thread_suspend_all_other_threads + 1424(mono_thread_suspend_all_other_threads:1424)
           at libmono.mono_unity_jit_cleanup + 28(mono_unity_jit_cleanup:28)
           at libunity.0072fba0()
           at libunity.00705764()
           at libunity.002df31c()
           at libunity.002e2148()
           at base.00116113()
    Post IL2CPP tgkill crashes that increased by a huge margin:

    Code (CSharp):
    Caused by java.lang.Error: signal 6 (SIGABRT), code -6 (?), fault addr --------
    Build fingerprint: 'samsung/grandppltedx/grandpplte:6.0.1/MMB29T/G532GDXU1ASA5:user/release-keys'
    Revision: '5'
    pid: 3179, tid: 3466, name: Thread-6859  >>> net.test.text <<<
        r0 00000000  r1 00000d8a  r2 00000006  r3 833ae978
        r4 833ae980  r5 833ae930  r6 00000000  r7 0000010c
        r8 000004d6  r9 b433de44  sl 00000008  fp b4323820
        ip 00000006  sp 833ae7d0  lr b6cdfde5  pc b6ce21d4  cpsr 833ae4e0

           at libc.tgkill + 12(tgkill:12)
           at libc.pthread_kill + 32(pthread_kill:32)
           at libc.raise + 10(raise:10)
           at libc.__libc_android_abort + 34(__libc_android_abort:34)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
    Any ideas?
     
    Last edited: Jul 25, 2019
  2. JoshPeterson

    JoshPeterson

    Unity Technologies

    Joined:
    Jul 21, 2014
    Posts:
    6,938
    Something seems wrong about this call stack. Nothing in libmono should be running for an IL2CPP build. Are you sure this is from an IL2CPP build of the app?
     
  3. Fesener

    My bad, the first log is from the pre-IL2CPP version, where the crash happened only rarely. The second log is the one whose crash reports increased by 500% after the switch to IL2CPP, and it's the one I'm concerned about right now.
     
  4. JoshPeterson

    Oops, sorry, I did not notice that before. I wonder, is it possible to symbolicate the second call stack? It looks like something is wrong there, as all of those frames are probably not in abort.
     
  5. Fesener

    I used to do some symbolication on Mono stacks, but I don't know how to symbolicate IL2CPP stacks. Could you guide me through it?
     
  6. JoshPeterson

  7. Fesener

    I have used that article to symbolicate Mono crashes before, but sadly it doesn't cover IL2CPP symbolication.

    I have tried

    Code (CSharp):
    ndk-stack -sym /Applications/Unity/Hub/Editor/2018.4.3f1/PlaybackEngines/AndroidPlayer/Variations/ -dump myCrashDump.txt
    but it didn't work.
     
  8. Tomas1856

    Tomas1856

    Unity Technologies

    Joined:
    Sep 21, 2012
    Posts:
    3,919
    Hey,

    out of curiosity, the logs you're showing, where are they from? Because it seems like there was already an attempt to symbolicate the stacktrace, and it didn't go well.

    I was hoping to see a format like this:
    Code (csharp):
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH: signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000094
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH: Build type 'Release', Scripting Backend 'mono', CPU 'armeabi-v7a'
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH: Build fingerprint: 'OnePlus/OnePlus3/OnePlus3:8.0.0/OPR1.170623.032/1812060016:user/release-keys'
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH: Revision: '0'
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH: pid: 30759, tid: 30803, name: Thread-88  >>> .............. <<<
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH:     r0 00000000  r1 ce3da200  r2 f3f00040  r3 00000080
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH:     r4 ca45ed50  r5 ca45e7f8  r6 ca45e7f8  r7 ca45ecf8
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH:     r8 dd8bca10  r9 dd8bca1c  sl ca45e918  fp dd8bca18
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH:     ip f28efd5c  sp ca45e7f0  lr cd86524c  pc cd5633fc  cpsr 00007853
    2019-05-17 12:00:58.823 30759-30803/? E/CRASH: backtrace:
    2019-05-17 12:00:58.830 30759-30803/? E/CRASH:     #00  pc 002983fc  /data/app/......../lib/arm/libunity.so
    2019-05-17 12:00:58.831 30759-30803/? E/CRASH:     #01  pc 002b3b3c  /data/app/......../lib/arm/libunity.so
    2019-05-17 12:00:58.831 30759-30803/? E/CRASH:     #02  pc 002b6ee4  /data/app/......../lib/arm/libunity.so
    2019-05-17 12:00:58.831 30759-30803/? E/CRASH:     #03  pc 002b6b00  /data/app/......../lib/arm/libunity.so
    2019-05-17 12:00:58.831 30759-30803/? E/CRASH:     #04  pc 002b6a40  /data/app/......../lib/arm/libunity.so
    2019-05-17 12:00:58.831 30759-30803/? E/CRASH:     #05  pc 002b6750  /data/app/......../lib/arm/libunity.so
    2019-05-17 12:00:58.831 30759-30803/? E/CRASH:     #06  pc 0029a280  /data/app/......../lib/arm/libunity.so
    2019-05-17 12:00:58.831 30759-30803/? E/CRASH:     #07  pc 00a01a60  /data/app/......../lib/arm/libunity.so
    2019-05-17 12:00:58.831 30759-30803/? E/CRASH:     #08  pc 0064abe0  /data/app/......../lib/arm/libunity.so
    2019-05-17 12:00:58.831 30759-30803/? E/CRASH:     #09  pc b3078ea4  <unknown/absolute>
     
  9. Fesener

    Hi,

    Both logs are from Fabric Crashlytics; as far as I know, we don't do anything manually to enhance or symbolicate them.

    Honestly, I have no idea what could have altered these crash dumps. Are you sure they are not what they are supposed to look like?
     
  10. Tomas1856

    Maybe that's how Fabric Crashlytics prints them. If you crash your app locally and check the logcat, the information should look like what I posted above.

    Do you perhaps have a logcat from your crash?
     
  11. Fesener

    Unfortunately, I haven't managed to reproduce this issue in any way, so I don't have access to any logcat details.
     
    Last edited: Jul 31, 2019
  12. Tomas1856

    Can you somehow instruct Fabric Crashlytics to give you the raw stack trace?
     
  13. Fesener

    Maybe I can find a way to attach the logcats to exceptions that are caught by Fabric; I will look into that.

    Meanwhile this attached log is all the information it provides to us about an occurrence of this crash. Maybe that helps you somehow?
     


  14. Tomas1856

    Code (csharp):
           at libc.tgkill + 12(tgkill:12)
           at libc.pthread_kill + 32(pthread_kill:32)
           at libc.raise + 10(raise:10)
           at libc.__libc_android_abort + 34(__libc_android_abort:34)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
           at libc.abort + 4(abort:4)
    looks incorrect: either it's memory corruption, or the stack trace was resolved incorrectly.
     
  15. Fesener

    Terrible news, so we actually have nothing to help us resolve this at all. What else would you suggest I do to get some information about this crash?
     
  16. Tomas1856

    The crash log says it crashes on a Samsung Galaxy J6 with Android 8.0.

    Did you try testing on that specific phone?
     
  17. Fesener

    Yeah, I have tried it on a J6 and a J7 but sadly couldn't reproduce the problem. The problem is not specific to those phones either; see the attached screenshots.
     


  18. Tomas1856

    I see, that's too bad... that makes things very difficult to fix. The only thing I can suggest is to try updating to the latest 2018 LTS; maybe it will contain a stability fix for your problem.
     
  19. Fesener

    Thanks for your time anyway. I couldn't find anything related to this issue in the 2018.4 and 2018.5 releases, and to be honest I'm kind of scared to upgrade: so far, 2018 LTS upgrades have introduced nothing but new crashes and bugs for us. But I guess I'll give it a try.
     
  20. Fesener

    I'm finally able to reproduce the problem. I did it in development mode; however, the log is exactly the same.

    Code (CSharp):
    08-09 14:47:46.956 14765 15279 F art     : art/runtime/thread.cc:1238] Native thread exited without calling DetachCurrentThread: Thread[26,tid=15279,Native,Thread*=0x6539c200,peer=0x138d20a0,"Thread-12925"]
    08-09 14:47:46.956 14765 15279 F art     : art/runtime/runtime.cc:368] Runtime aborting...
    08-09 14:47:46.956 14765 15279 F art     : art/runtime/runtime.cc:368]
    08-09 14:47:46.956 14765 15279 E CRASH   : signal 6 (SIGABRT), code -6 (?), fault addr --------
    ********** Crash dump: **********
    Build fingerprint: 'samsung/kltejv/klte:6.0.1/MMB29M/G900FQJVS1CSB1:user/release-keys'
    pid: 14765, tid: 15279, name: Thread-12925  >>> my.awesome.app <<<
    Stack frame #00  pc 00041ff8  /system/lib/libc.so (tgkill+12)
    Stack frame #01  pc 0003fc05  /system/lib/libc.so (pthread_kill+32)
    Stack frame #02  pc 0001c38b  /system/lib/libc.so (raise+10)
    Stack frame #03  pc 00019609  /system/lib/libc.so (__libc_android_abort+34)
    Stack frame #04  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #05  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #06  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #07  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #08  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #09  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #10  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #11  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #12  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #13  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #14  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #15  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #16  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #17  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #18  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #19  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #20  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #21  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #22  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #23  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #24  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #25  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #26  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #27  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #28  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #29  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #30  pc 0001755c  /system/lib/libc.so (abort+4)
    Stack frame #31  pc 0001755c  /system/lib/libc.so (abort+4)
    It seems like a threading problem, but I have no idea how to make progress based on this log; it doesn't say much besides the new "Native thread exited without calling DetachCurrentThread" part.

    I also have a whole bunch of these:

    Code (CSharp):
    08-09 14:49:18.656 14765 14765 E CRASH   : other thread is trapped; signum = 6
    08-09 14:49:18.656 14765 14765 E CRASH   : main thread is trapped; signum = 6
    08-09 14:49:18.656 14765 14765 E CRASH   : other thread is trapped; signum = 6
    08-09 14:49:18.656 14765 14765 E CRASH   : main thread is trapped; signum = 6
    08-09 14:49:18.656 14765 14765 E CRASH   : other thread is trapped; signum = 6
    08-09 14:49:18.666 14765 14765 E CRASH   : main thread is trapped; signum = 6
    08-09 14:49:18.666 14765 14765 E CRASH   : other thread is trapped; signum = 6
    08-09 14:49:18.666 14765 14765 E CRASH   : main thread is trapped; signum = 6
    08-09 14:49:18.666 14765 14765 E CRASH   : other thread is trapped; signum = 6
    08-09 14:49:18.666 14765 14765 E CRASH   : main thread is trapped; signum = 6
    08-09 14:49:18.666 14765 14765 E CRASH   : other thread is trapped; signum = 6
    How can I debug what's going wrong at runtime from this? Any ideas?
     
    Last edited: Aug 9, 2019
  21. Fesener

    I did more digging around and found this (it has been in the code base for years; it's not something we added after the 2018 switch):

    Code (CSharp):
    #region Local Notification
    public void ShowLocalNotification (string title, string message) {
        AndroidJNI.AttachCurrentThread();
        Facade.ShowLocalNotification(title, message);
    }
    #endregion
    Then I added

    Code (CSharp):
    AndroidJNI.DetachCurrentThread();
    after the native method call here, since this is the only place in the code base where JNI calls happen.

    I couldn't reproduce any freezes after this change, so maybe this fixed the problem. I will keep testing it.

    That doesn't explain why it would increase dramatically after the IL2CPP switch, though. Any ideas on that?
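
    For anyone reading along, here is a minimal sketch of the attach/detach pairing described above, written so that the detach still runs if the native call throws. JniThreadScope and Run are hypothetical names introduced only for illustration; AndroidJNI.AttachCurrentThread and AndroidJNI.DetachCurrentThread are the Unity calls already used in this thread. It assumes it is only called from a background thread you own, not from Unity's main thread.

    ```csharp
    using System;
    using UnityEngine;

    // Hypothetical helper (not from the original code base): runs JNI work on
    // the current background thread and always detaches afterwards.
    public static class JniThreadScope
    {
        public static void Run(Action jniWork)
        {
            // A thread must be attached to the Java VM before making JNI calls.
            AndroidJNI.AttachCurrentThread();
            try
            {
                jniWork();
            }
            finally
            {
                // Detach even if jniWork throws, so the thread never exits
                // while still attached ("Native thread exited without
                // calling DetachCurrentThread").
                AndroidJNI.DetachCurrentThread();
            }
        }
    }
    ```

    With a helper like this, the method above could become JniThreadScope.Run(() => Facade.ShowLocalNotification(title, message)); where Facade is the wrapper from the snippet earlier in this post.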
     
  22. Tomas1856

    This looks like a deadlock. Many things could have affected this, like different thread priorities, etc. With Mono, you were probably simply lucky.

    I assume Facade.ShowLocalNotification(title, message); doesn't exit until you close the notification?
     
  23. Fesener

    It does exit before closing the notification.

    Did you mean `until you display the notification` by any chance?
     
  24. Tomas1856

    I meant the function (ShowLocalNotification) itself: it doesn't exit while the notification is showing?
     
  25. Fesener

    It exits right after the notification is shown by the native code (native calls are blocking, afaik?), so it exits while the notification is showing and detaches correctly.
     
  26. Fesener

    After the fix I mentioned, the exceptions have almost disappeared: from 0.1% of all sessions to less than 0.01%.

    There are still some cases that cause this exception. There seems to be some kind of race condition with the network thread (the ShowLocalNotification method was also being called by the network thread), so I need to go through the network service to fix all the tgkill exceptions.

    Thanks for helping out @Tomas1856 and @JoshPeterson
     
  27. Voxel-Busters

    Voxel-Busters

    Joined:
    Feb 25, 2015
    Posts:
    1,967
    Do you mean DetachCurrentThread always needs to be in place when using IL2CPP?