Android ANR排查分析
-
常见的ANR类型
- KeyDispatchTimeOut:主要类型,按键或者触摸事件在特定时间内无响应(5s)
- BroadcastTimeOut(10s:BroadcastReceiver在特定的时间内无法完成处理
- ServiceTimeOut(20s 小概率事件):Service在特定的时间内无法完成处理
-
一些典型的ANR场景
- 最常见错误,UI线程在等待其他线程释放锁,导致UI线程无法处理用户输入事件
- 游戏中每帧动画都进行了大量的耗时计算,导致CPU负载
- Web应用中,网络状态不稳定,而界面在等待网络数据
- UI线程中进行了IO等耗时操作
- 手机被其他App占用着CPU,自己获取不到CPU片段
-
排查分析思路
-
通过ANR日志定位
当anr发生时,我们会通过logcat和traces文件的相关信息输出定位问题。
包含主要以下信息- 基本信息:包括进程名、进程号、包名、系统build号、ANR类型等
- CPU 使用信息,包括活跃进程的cpu 平均占用率、IO情况等
- 线程堆栈信息,所属进程包括发生ANR的进程、负进程、最近活动的三个进程等等
首先获取logcat异常信息,当anr产生时,logcat会打印出一段log,会输出类似下面的信息。
- 首先可以得到ANR所在进程的进程名、进程号及出错的控件
- 其中Reason主要描述了ANR产生的具体原因/分类
- CPU usage…ago主要记录了ANR发生前CPU的使用情况
- CPU usage…lager则记录了ANR发生之后CPU的使用情况
2019-05-31 14:58:36.207 1259-1355/? E/ANRManager: ANR in com.lazyqing.simulate (com.lazyqing.simulate/.MainActivity), time=70687906
Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 2. Wait queue head age: 8509.1ms.)
Load: 15.7 / 14.65 / 14.68
Android time :[2019-05-31 14:58:36.15] [70692.898]
CPU usage from 8339ms to 2313ms ago (2019-05-31 14:58:22.825 to 2019-05-31 14:58:28.851) with 99% awake:
16% 1259/system_server: 10% user + 5.9% kernel / faults: 241 minor 1 major
7.7% 4504/com.alibaba.android.rimet: 6.6% user + 1.1% kernel / faults: 241 minor
4.3% 17275/adbd: 1.4% user + 2.8% kernel / faults: 1098 minor
2.8% 359/surfaceflinger: 1.4% user + 1.3% kernel / faults: 24 minor
2.6% 620/akmd09911: 0.6% user + 1.9% kernel
2.1% 211/hps_main: 0% user + 2.1% kernel
1.9% 10243/com.amigo.keyguard:umengck: 1.4% user + 0.4% kernel / faults: 29 minor
1.6% 309/logd: 0.6% user + 0.9% kernel / faults: 4 minor
1.3% 11089/com.ss.android.ugc.aweme: 0.9% user + 0.3% kernel / faults: 9 minor
1.3% 15675/kworker/0:2: 0% user + 1.3% kernel
1.1% 17554/logcat: 0.4% user + 0.6% kernel / faults: 1 minor
0.8% 7146/kworker/0:0: 0% user + 0.8% kernel
0.8% 9872/com.android.systemui: 0.6% user + 0.1% kernel / faults: 98 minor
0.6% 176/teei_switch_thr: 0% user + 0.6% kernel
0.3% 4654/com.alibaba.android.rimet:channel: 0.1% user + 0.1% kernel / faults: 12 minor
0.6% 10049/com.amigo.keyguard:remote: 0.6% user + 0% kernel / faults: 31 minor
0.4% 8/rcu_sched: 0% user + 0.4% kernel
0.4% 2783/com.android.browser: 0.3% user + 0.1% kernel / faults: 4 minor
0.4% 13323/tx_thread: 0% user + 0.4% kernel
0.2% 13346/wpa_supplicant: 0.1% user + 0% kernel
0% 14984/air.tv.douyu.android:pushservice: 0% user + 0% kernel / faults: 97 minor
0.4% 15445/kworker/0:1: 0% user + 0.4% kernel
0.3% 69/dvfs_nfy: 0% user + 0.3% kernel
0.3% 249/mtk-tpd: 0% user + 0.3% kernel
0% 3/ksoftirqd/0: 0% user + 0% kernel
0.1% 7/rcu_preempt: 0% user + 0.1% kernel
0.1% 10/migration/0: 0% user + 0.1% kernel
0.1% 11/migration/1: 0% user + 0.1% kernel
0.1% 134/pbm: 0% user + 0.1% kernel
0% 214/exe_cq: 0% user + 0% kernel
0.1% 216/mmcqd/0: 0% user + 0.1% kernel
0.1% 357/mtkmal: 0.1% user + 0% kernel
0% 358/servicemanager: 0% user + 0% kernel
0% 463/mtk_agpsd: 0% user + 0% kernel
0% 1775/com.android.phone: 0% user + 0% kernel / faults: 3 minor
0% 2458/com.gionee.softmanager:remote: 0% user + 0% kernel / faults: 40 minor
0.1% 3732/com.tencent.mm: 0.1% user + 0% kernel / faults: 6 minor
0.1% 7467/kworker/u17:1: 0% user + 0.1% kernel
0% 10012/com.amigo.keyguard: 0% user + 0% kernel / faults: 27 minor
0.1% 11211/com.coolwind.weather:external: 0.1% user + 0% kernel / faults: 4 minor
0% 11578/com.ss.android.ugc.aweme:pushservice: 0% user + 0% kernel
0% 15840/kworker/u16:3: 0% user + 0% kernel
0% 16393/com.jianshu.haruki: 0% user + 0% kernel
0.1% 16658/kworker/u16:5: 0% user + 0.1% kernel
0% 19630/com.alibaba.android.rimet? 0% user + 0% kernel
0% 27419/com.tencent.mm:tools: 0% user + 0% kernel
0% 31630/com.gionee.cloud.gpe: 0% user + 0% kernel / faults: 23 minor
41% TOTAL: 21% user + 20% kernel + 0.2% softirq结合此部分信息进行分析可以初步得到基本原因以及排除是其他app引起的(如果发现ANR前后有某个进程占用大量的CPU,那么ANR产生的原因很肯是其他app引起的)。
ANR Traces
在知道了Cause reason之后,需要进一步的信息来确定ANR在代码中的位置,这个时候就要查看traces文件了,每次产生ANR后,系统都会向/data/anr/traces.txt中写入新的文件,内容如下:
- 介于 ----pid 10049 xxxx—与----end 10049 ----之间的为进程10049的所有线程堆栈信息。一般来说,发生ANR的进程信息会在文件头部
- “main” prio=5 tid=1 TIMED_WAIT分为线程名、线程优先级(默认值为5)、线程ID、线程状态;主线之后会接着打印进程中其他线程的信息
----- pid 10049 at 2019-05-31 13:56:42 -----
Cmd line: com.amigo.keyguard:remote
Build fingerprint: ‘GIONEE/S10/GIONEE_SW17G04:7.0/NRD90M/1521111777:user/release-keys’
ABI: ‘arm’
Build type: optimized
Zygote loaded classes=5807 post zygote classes=1573
Intern table: 50926 strong; 887 weak
JNI: CheckJNI is off; globals=580 (plus 82 weak)
Libraries: /system/app/WebViewGoogle/WebViewGoogle.apk!/lib/armeabi-v7a/libwebviewchromium.so /system/lib/libandroid.so /system/lib/libcompiler_rt.so /system/lib/libdcfdecoderjni.so /system/lib/libjavacrypto.so /system/lib/libjnigraphics.so /system/lib/libmedia_jni.so /system/lib/libwebviewchromium_loader.so /system/lib/libwebviewchromium_plat_support.so libjavacore.so libopenjdk.so (11)
Heap: 5% free, 7MB/8MB; 86497 objects
Dumping cumulative Gc timings
suspend all histogram: Sum: 108.372ms 99% C.I. 8us-3212.799us Avg: 172.019us Max: 7372us
DALVIK THREADS (78):
“Signal Catcher” daemon prio=5 tid=3 Runnable
| group=“system” sCount=0 dsCount=0 obj=0x2ac100d0 self=0xe63f1200
| sysTid=10055 nice=0 cgrp=default sched=0/0 handle=0xef2a0920
| state=R schedstat=( 62732383 1278694 30 ) utm=2 stm=3 core=0 HZ=100
| stack=0xef1a4000-0xef1a6000 stackSize=1014KB
| held mutexes= “mutator lock”(shared held)
native: #00 pc 0034f215 /system/lib/libart.so ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+128)
native: #01 pc 0032f981 /system/lib/libart.so ZNK3art6Thread9DumpStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEbP12BacktraceMap+308)
native: #02 pc 00341a35 /system/lib/libart.so
----- end 10049 ----- -
线程状态介绍
-
Thread.java中定义的状态 | Thread.cpp定义的状态 | 说明 |
---|---|---|
TERMINATED | ZOMBIE | 线程死亡,终止运行 |
Runnable | Runnable/Running | 线程可运行或正在运行 |
TIMD_WATING | TIMED_WAIT | 执行了带有超时参数的sleep/wait或join函数 |
BLOCKED | MONITOR | 线程阻塞,等待获取对象锁 |
NEW | INITIALIZING | 新建,正在初始化,为其分配资源 |
NEW | STARTING | 新建,正在启动 |
RUNNABLE | NATIVE | 正在执行JNI本地函数 |
WAITING | VMWAIT正在等待VM资源 | |
RUNNABLE | SUSPENDED | 线程暂停,通常是由于GC或debug被暂停 |
无 | UNKNOWN | 未知状态 |