安卓ANR问题最全解析


前言

ANR即Application Not Responding(应用程序无响应),一般在ANR的时候会弹出一个应用无响应对话框,同时会候产生一个日志文件trace.txt,位于/data/anr/文件夹下面,trace文件是Android Davik虚拟机在收到异常终止信号时产生的,最常见的一个触发条件就是Android应用中产生了FC(force close)。由于该文件的产生是在DVM中的,所以只有运行DVM实例的进程才能产生该文件,也就是说只有Java代码才能产生该文件,App应用的Native层(如Android Library、用c/c++编译的库)即使异常也不会产生ANR日志文件。我们可以通过ANR产生的traces日志文件分析应用在哪里产生了ANR,以此来有效解决应用中的ANR。

一、ANR产生原因

只用当应用程序的UI线程响应超时才会引起ANR,超时原因一般有两种:

  • 1.当前的事件没有机会得到处理,例如UI线程正在响应另一个事件,当前事件由于某种原因被阻塞了。
  • 2.当前的事件正在被处理,但是由于好事太长没有能够及时完成

根据ANR产生的原因不同,超时时间也不尽相同,从本质上讲,产生ANR的场景有四种,大致可以对应到Android中的(Activity、BroadcastReceive、Service、InputDispatch)

第一种:InputDispatching Timeout :最常见的一种类型,原因是View事件或者触摸事件在特定时间内无法得到响应
第二种:Broadcast Timeout:BroadcastReceiver的onReceive函数运行在主线程,并在特定的时间内无法完成处理
第三种:Service Timeout:Service的各个生命周期函数在特定时间内无法完成响应
第四种:ContentProvider Timeout :ContentProvider的操作在特定时间内无法完成响应。

二、Timeout时长

  • 对于前台服务,则超时为SERVICE_TIMEOUT=20S;
  • 对于后台服务,则超时为SERVICE BACKGROUND TIMEOUT=200S·
  • 对于前台广播,则超时为BROADCAST FG TIMEOUT=10S;
  • 对于后台广播,则超时为BROADCAST BG TIMEOUT=60s;.
  • ContentProvider超时为CONTENT PROVIDER PUBLISH TIMEOUT = 10S:.
  • InputDispatching Timeout: 输入事件分发超时5s,包括按键和触摸事件。

注意事项: Input的超时机制与其他的不同,对于input来说即便某次事件执行时间超过timeout时长,只要用户后续在没有再生成输入事件,则不会触发ANR


三、典型的ANR场景

  • 1.主线程频繁进行IO操作,比如读写文件或者数据库;
  • 2.硬件操作如进行调用照相机或者录音等操作;
  • 3.多线程操作的死锁,导致主线程等待超时;
  • 4.主线程操作调用join()方法、sleep()方法或者wait()方法;
  • 5.耗时动画/耗资源行为导致CPU负载过重
  • 6.system server中发生WatchDog ANR;
  • 7.service binder的数量达到上限

四、超时检测机制

1.Service超时检测机制:

  • 超过一定时间没有执行完相应操作来触发移除延时消息,则会触发anr;

2.BroadcastReceiver超时检测机制:

  • 有序广播的总执行时间超过 2receiver个数timeout时长,则会触发anr;
  • 有序广播的某一个receiver执行过程超过 timeout时长,则会触发anr;

3.另外:

  • 对于Service, Broadcast, Input发生ANR之后,最终都会调用AMS.appNotResponding;
  • 对于provider,在其进程启动时publish过程可能会出现ANR,则会直接杀进程以及清理相应信息,而不会弹出ANR的对话框

五、前台与后台ANR

  • 前台ANR:用户能感知,比如拥有前台可见的activity的进程,或者拥有前台通知的fg-service的进程,此时发生ANR对用户体验影响比较大,需要弹框让用户决定是否退出还是等待
  • 后台ANR:只抓取发生无响应进程的trace,也不会收集CPU信息,并且会在后台直接杀掉该无响应的进程,不会弹框提示用户

六、ANR分析流程

1. 前台ANR发生后,系统会马上去抓取现场的信息,用于调试分析,收集的信息如下:

  • 将am anr信息输出到EventLog,也就是说ANR触发的时间点最接近的就是EventLog中输出的am anr信息
  • 收集以下重要进程的各个线程调用栈trace信息,保存在data/anr/traces.txt文件
    ①当前发生ANR的进程,system server进程以及所有persistent进程
    ②audioserver,cameraserver, mediaserver, surfaceflinger等重要的native进程
    ③CPU使用率排名前5的进程
  • 将发生ANR的reason以及CPU使用情况信息输出到main log。
  • 将traces文件和CPU使用情况信息保存到dropbox,即data/system/dropbox目录。
  • 对用户可感知的进程则弹出ANR对话框告知用户,对用户不可感知的进程发生ANR则直接杀掉

2. 分析步骤

1.定位发生ANR时间点
2.查看trace信息
3.分析是否有耗时的message,binder调用,锁的竞争,CPU资源的抢占
4.结合具体的业务场景的上下文来分析


七、trace.txt文件解读

1.人为的收集trace.txt的命令

adb shell ki11-3 888 //可指定进程pid

执行完该命令后traces信息的结果保存到文件/data/anr/traces.txt

2.trace文件解读

----- pid 888 at 2016-11-11 22:22:22 -----
Cmd line:system server
ABI:arm
Build type: optimized
Zygote loaded classes=4113 post zygote classes=3239
Intern table:57550 strong;9315 weak
JNI:CheckINI is off;globals=2418(plus 115 weak)
Libraries:/system/lib/libandroid.so /system/lib/libandroid_servers.so
/system/lib/libaudioeffect jni.so /system/lib/libcompiler rt.so /system/lib/libjavacrypto.so 
/system/lib/libjnigraphics.so /system/lib/libmedia jni.so /system/lib/librs_jni.so 
/system/lib/libsechook.so /system/lib/libshell_jni.so /system/lib/libsoundpool.so 
/system/lib/libwebviewchromium_loader.so /system/lib/libwifi-service.so 
/vendor/lib/libalarmservice_jni.so /vendor/lib/liblocationservice.so libjavacore.so (16)
//已分配堆内存大小40MB,其中29M已用,总分配207772个对象
Heap:27%free,29MB/40MB;307772 objects
·.· //省略GC相关信息
//当前进程总99个线程
DALVIK THREADS (99):
//主线程调用栈
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x75bd9fb0 self=0x5573d4f770
| sysTid=12078 nice=-2 cgrp=default sched=0/0 handle=0x7fa75fafe8
| state=S schedstat=(5907843636 827600677 5112)utm=453 stm=137 core=0 HZ=100
| stack=0x7fd64ef000-0x7fd64f1000 stackSize=8MB
| held mutexes=
//内核栈
kernel:_switch_to+0x70/0x7c
Kernel:SyS_epoll_wait+0x2a0/0x324
kernel:SyS epoll_pwait+0xa4/0x120
kernel:cpu_switch_to+x48/0x4c
native:#00 pc 0000080000869be4 /system/lib64/libc.so(_epoll_pwait+8)
native:#01 pc 008000000001cca4 /system/lib64/libc.so(epoll_pwait+32)
native:#82 pc 888808008881ad74 /system/lib64/libutils.so(_ZN7android6Looper9pollInnerEi+144)
native:#03 pc 000000000001b154 /system/lib64/libutils.so
(_ZN7android6Looper8poll0nceEiPiS1 PPv+80)
native: #04 pc 00000000000d4bc0 /system/lib64/libandroid_runtime.so
(_ZN7android18NativeMessageQueue8poll0nceEP7 INIEnvP8 jobjecti+48)
native:#05 pc 000000000000082c /data/dalvik-cache/arm64/system@framework@boot.oat
(Java android os MessageQueue nativePollOnce_JI+144)
at android.os.MessageQueue.nativePoll0nce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:323)
at android.os.Looper.loop(Looper.java:135)
at com.android.server.SystemServer.run(SystemServer.java:290)
at com.android.server.SystemServer.main(SystemServer.java:175)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(zygoteInit.java:738)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:628)

"Binder 1"prio=5 tid=8 Native
| group="main" sCount=1 dsCount=0 obj=0x12c610a0 self=0x5573e5c750
| sysTid=12092 nice=0 cgrp=default sched=0/0 handle=0x7fa2743450
| state=S schedstat=(796240075 863170759 3586)utm=50 stm=29 core=1 HZ=100
| stack=0x7fa2647000-0x7fa2649000stackSize=1013KB
| held mutexes=
kernel:__switch_to+0x70/0x7c
kernel:binder_thread_read+0xd78/0xeb0
kernel:binder_ioctl_write_read+0x178/0x24c
kernel:binder_ioctl+0x2b0/0x5e0
kernel:do_vfs_ioct1+0x4a4/0x578
kernel:SyS_ioctl+x5c/0x88
kernel:cpu_switch_to+x48/0x4c
native:#00 pc 000000000069cd0/system/lib64/libc.so(__ioctl+4)
native:#01 pc 0000000000073cf4 /system/lib64/libc.so(ioctl+100)
native:#02 pc 000000000002d6e8 /system/lib64/libbinder.so
(_ZN7android14IPcThreadstate14talkWithDriverEb+164)
native:#03 pc 800000000002df3c /system/lib64/libbinder.so
(_ZN7android14IPCThreadstate20getAndExecuteCommandEv+24)
native:#04 pc 800000000002e114/system/lib64/libbinder.so
(_ZN7android14IPcThreadState14ioinThreadPoolEb+124)
native:#05 pc 800000800036c38 /system/lib64/libbinder.so(???)
native:#06 pc 0000000001579c/system/lib64/libutils.so
(_ZN7android6Thread11threadLoopEPv+208)
native:#07 pc 0000000000090598 /system/lib64/libandroid_runtime.so
(_ZN7android14AndroidRuntime15javaThreadShellEPv+96)
native:#08 pc 0000000000014fec /system/lib64/libutils.so(???)
native:#09 pc 8888000000067754 /system/lib64/libc.so(_ZL15_pthread_startPv+52)
native:#10 pc 000000000001c644 /system/lib64/libc.so(_start_thread+16)
(no managed stack frames)
.·//此处省略剩余的N个线程

3.trace参数解读

"Binder 1"prio=5 tid=8 Native
| group="main" sCount=1 dsCount=0 obi=0x12c610a0 self=0x5573e5c750
| sysTid=12092 nice=0 cgrp=default sched=0/0 handle=0x7fa2743450
| state=S schedstat=(796240075 863170759 3586)utm=50 stm=29 core=1 HZ=100
| stack=0x7fa2647000-0x7fa2649000 stackSize=1013KB
| held mutexes=
  • 第0行:
    线程名: Binder_1(如有daemon则代表守护线程)
    prio:线程优先级,优先级取值范围[1,10],详见Thread类:
    tid: 线程内部id
    线程状态: NATIVE。状态取值如下,详见Thread.State枚举类:

线程优先级

//最小取值
public final static int MIN_PRIORITY = 1;
//默认优先级
public final static int NORM_PRIORITY = 5;
//最大取值
public final static int MAX_PRIORITY = 10;

线程状态

NEW, //线程还没启动

RUNNABLE, //正在执行

BLOCKED, //等待获取锁

WAITING, //等待其他线程执行一个特定的动作,比如说调用Object.notify或Object.notifyAll()

TIMED_WAITING, //等待一定时间

TERMINATED //执行完毕
  • 第1行:
    group:线程所属的线程组
    sCount: 线程挂起次数
    dsCount: 用于调试的线程挂起次数
    obj: 当前线程关联的java线程对象
    self: 当前线程地址
  • 第2行:
    sysTid:线程真正意义上的tid
    nice: 调度有优先级
    cgrp:进程所属的进程调度组
    sched: 调度策略
    handle:函数处理地址
  • 第3行:
    state: 线程状态
    schedstat:CPU调度时间统计
    utm/stm: 用户态/内核态的CPU时间(单位是jiffies)
    core: 该线程的最后运行所在核
    HZ: 时钟频率
  • 第4行:
    stack:线程栈的地址区间
    stackSize:栈的大小
  • 第5行:
    mutex:所持有mutex类型,有独占锁exclusive和共享锁shared两类
  • schedstat含义说明:
    nice值越小则优先级越高。此处nice=-2,可见优先级还是比较高的;
    schedstat括号中的3个数字依次是Running、Runable、Switch,紧接着的是utm和stm
    Running时间:CPU运行的时间,单位ns
    Runable时间:RQ队列的等待时间,单位ns
    Switch次数:CPU调度切换次数
    utm:该线程在用户态所执行的时间,单位是jiffies,jiffies定义为sysconf(_SC_CLK_TCK),默认等于10ms
    stm:该线程在内核态所执行的时间,单位是jiffies,默认等于10ms

可见,该线程Running=796240075 ns,也约等于796ms。在CPU运行时间包括用户态(utm)和内核态(stm)。utm+stm=(50+29)x10 ms=790ms。

结论:utm+stm=schedstat 第一个参数值


八、ANR定位分析实例

8.1 Logcat分析

查询关键字(ANR|ActivityManager)

/system_process E/ActivityManager:
ANR in com.anrdemo (com.anrdemo/.MainActivity)
	PID: 12639
	Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  Wait queue length: 6.  Wait queue head age: 5841.3ms.)
	Load: 13.28 / 11.53 / 9.48
	CPU usage from 71228ms to 0ms ago (1970-03-01 00:08:47.864 to 1970-03-01 00:09:59.093):
	15% 920/system_server: 9.2% user + 5.9% kernel / faults: 10139 minor 285 major
	8.2% 13985/com.tencent.mm: 6.5% user + 1.6% kernel / faults: 35481 minor 1879 major
	0.1% 555/fingerprintd: 0% user + 0.1% kernel / faults: 51 minor 22 major
	3.5% 266/mmcqd/0: 0% user + 3.5% kernel
	3.2% 4424/com.tencent.android.qqdownloader:daemon: 1.7% user + 1.5% kernel / faults: 12007 minor 357 major
	3.2% 430/adbd: 0.4% user + 2.7% kernel / faults: 2002 minor
	3.1% 135/kswapd0: 0% user + 3.1% kernel
	1.8% 12271/com.tencent.mobileqq: 1.2% user + 0.5% kernel / faults: 8799 minor 580 major
	1.6% 11695/com.cleanmaster.mguard: 1.5% user + 0.1% kernel / faults: 9171 minor 72 major
	1.3% 13039/com.tencent.mm:push: 0.9% user + 0.3% kernel / faults: 9969 minor 192 major
	1.2% 4954/com.google.android.gms: 0.9% user + 0.2% kernel / faults: 7546 minor 209 major
	1.1% 342/logd: 0.3% user + 0.8% kernel / faults: 628 minor 3 major
	1.1% 1485/kworker/u13:3: 0% user + 1.1% kernel
	1.1% 10/rcu_preempt: 0% user + 1.1% kernel
	1.1% 12166/com.android.vending: 0.8% user + 0.3% kernel / faults: 9897 minor 380 major
	1% 3846/com.tencent.qqmusic:QQPlayerService: 0.6% user + 0.4% kernel / faults: 7354 minor 485 major
	1% 10486/kworker/u13:4: 0% user + 1% kernel
	1% 11251/kworker/u13:1: 0% user + 1% kernel
	1% 9498/com.cleanmaster.security:DefendService: 0.7% user + 0.3% kernel / faults: 11998 minor 287 major
	0.9% 1973/kworker/u13:5: 0% user + 0.9% kernel
	0.9% 14113/com.tencent.mm:exdevice: 0.7% user + 0.1% kernel / faults: 8802 minor 142 major
	0.8% 10814/com.tencent.qqmusic: 0.4% user + 0.4% kernel / faults: 9823 minor 221 major
	0.8% 2062/sogou.mobile.explorer: 0.5% user + 0.3% kernel / faults: 8532 minor 1288 major
	0.8% 528/netd: 0.1% user + 0.7% kernel / faults: 4203 minor 35 major
	0.8% 3925/com.google.android.gms.persistent: 0.4% user + 0.3% kernel / faults: 12578 minor 546 major
	0.7% 556/cnss_diag: 0.6% user + 0% kernel / faults: 121 minor 5 major
	0.6% 4971/com.cleanmaster.mguard:service: 0.4% user + 0.2% kernel / faults: 7468 minor 174 major
	0.5% 9180/com.tencent.android.qqdownloader: 0.2% user + 0.3% kernel / faults: 9571 minor 216 major
	0.5% 15/ksoftirqd/1: 0% user + 0.5% kernel
	0.4% 3280/com.android.phone: 0.2% user + 0.2% kernel / faults: 1857 minor 133 major
	0.4% 8489/kworker/0:1: 0% user + 0.4% kernel
	0.4% 3127/sdcard: 0% user + 0.4% kernel / faults: 92 minor 17 major
	0.4% 397/servicemanager: 0.1% user + 0.2% kernel / faults: 127 minor
	0.4% 11878/swift.space.cleaner.free:service: 0.2% user + 0.1% kernel / faults: 6087 minor 140 major
	0.4% 20/ksoftirqd/2: 0% user + 0.4% kernel
	0.4% 8761/com.netease.cloudmusic:play: 0.3% user + 0% kernel / faults: 4819 minor 977 major
	0.3% 6525/com.tencent.qqlive:services: 0.2% user + 0.1% kernel / faults: 7834 minor 483 major
	0.3% 12250/com.tencent.mobileqq:MSF: 0.2% user + 0.1% kernel / faults: 5368 minor 114 major
	0.3% 9717/com.speed.boost.booster:service: 0.1% user + 0.1% kernel / faults: 5526 minor 77 major
	0.3% 8524/sogou.mobile.explorer:push_service: 0.2% user + 0.1% kernel / faults: 6280 minor 42 major
	0.3% 427/irq/215-fc38800: 0% user + 0.3% kernel
	0.3% 8721/kworker/1:2: 0% user + 0.3% kernel
	0.3% 260/cfinteractive: 0% user + 0.3% kernel
	0.3% 294/msm-core:sampli: 0% user + 0.3% kernel
	0.1% 8756/kworker/u12:3: 0% user + 0.1% kernel
	0.1% 11204/kworker/2:0: 0% user + 0.1% kernel
	0.3% 3150/com.android.systemui: 0.1% user + 0.1% kernel / faults: 3318 minor 129 major
	0.2% 7244/kworker/u12:2: 0% user + 0.2% kernel
	0% 3084/VosMCThread: 0% user + 0% kerne

可以看到,Logcat日志信息中主要包含如下内容:

1)导致ANR的类名及所在包名:
com.anrdemo (com.anrdemo/.MainActivity)
2)导致ANR的进程名及ID:com.anrdemo,12639
3)ANR产生的原因(类型):Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5841.3ms.),明显属于KeyDispatchTimeout类型。
4)系统中CPU使用率的统计信息及某段时间内的变换情况
从trace.txt文件分析

ANR Logcat信息可以帮助我们分析引发ANR的具体类的信息以及ANR的类型,但是却无法帮助我们定位到具体引发问题的代码行,这时候就需要借助ANR发生过程中生成的堆栈信息文件data/anr/trace.txt

获取trace文件:

adb pull /data/anr/trace.txt

8.2 trace.txt文件分析

由于trace文件记录的是整个手机的ANR日志,所以我们需要根据进程名(包名)和ANR发生的时间找到对应日志,具体以pid 进程id开始,以pid进程id结束。由于阻塞始终会发生在主线程,因此我们需要关注线程名为main的线程状态。下面摘出一部分关键日志进行分析:

//关键日志的起始标记
----- pid 20678 at 2024-05-13 21:58:59 -----
//应用程序包名
Cmd line: com.anrdemo
Build fingerprint: 'Android/aosp_bullhead/bullhead:7.1.2/N2G48C/han-li03221443:userdebug/test-keys'

//手机的CPU架构
ABI: 'arm64'

//堆内存信息
Heap: 33% free, 4MB/6MB; 27144 objects
Dumping cumulative Gc timings

Total time spent in GC: 10.680ms
Mean GC size throughput: 68MB/s
Mean GC object throughput: 1.32996e+06 objects/s
Total number of allocations 41348
Total bytes allocated 5MB
Total bytes freed 753KB

//手机Memory内存信息
Free memory 2MB
Free memory until GC 2MB
Free memory until OOME 187MB
Total memory 6MB
Max memory 192MB
Zygote space size 1128KB
Total mutator paused time: 463us
Total time waiting for GC to complete: 938ns
Total GC count: 1
Total GC time: 10.680ms
Total blocking GC count: 0
Total blocking GC time: 0

//主线程日志分析
//基本信息:main-线程名称,prio-线程优先级,tid-线程id,Sleeping-线程状态
"main" prio=5 tid=1 Sleeping
  //详细信息:group-线程组名称,sCount-线程被挂起的次数,dsCount-线程被调试器刮起的次数,obj-线程的Java对象地址,self-线程本身的Native对象地址
  | group="main" sCount=1 dsCount=0 obj=0x75190ed0 self=0x7f72095a00
  //线程的调度信息:sysTid-linux系统中内核线程id(观察发现主线程的线程号和进程号相同),nice-线程调度的优先级,cgrp-线程调度组,sched-线程调度策略和优先级,handle-线程处理函数地址
  | sysTid=20678 nice=-10 cgrp=default sched=0/0 handle=0x7f75fe0a98
  //上下文信息:state-线程调度状态,schedstat-线程在CPU中的执行时间、线程等待时间、线程执行的时间片长度,utm-线程在用户状态中调度的时间值,core-最后执行这个线程的CPU核序号
  | state=S schedstat=( 274637713 30817705 229 ) utm=20 stm=6 core=1 HZ=100
  //堆栈信息:stack-堆栈地址,stackSize-堆栈大小,余下的为堆栈信息,也是我们分析引发ANR问题的关键
  | stack=0x7fc8318000-0x7fc831a000 stackSize=8MB
  | held mutexes=
  at java.lang.Thread.sleep!(Native method)
  - sleeping on <0x02f69763> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:371)
  - locked <0x02f69763> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:313)
  at com.anrdemo.MainActivity$1.onClick(MainActivity.java:24)
  at android.view.View.performClick(View.java:5637)
  at android.view.View$PerformClick.run(View.java:22429)
  at android.os.Handler.handleCallback(Handler.java:751)
  at android.os.Handler.dispatchMessage(Handler.java:95)
  at android.os.Looper.loop(Looper.java:154)
  at android.app.ActivityThread.main(ActivityThread.java:6121)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:889)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:779)

"Jit thread pool worker thread 0" daemon prio=5 tid=2 Native
  | group="main" sCount=1 dsCount=0 obj=0x12c64790 self=0x7f6a635000
  | sysTid=20683 nice=9 cgrp=default sched=0/0 handle=0x7f71701450
  | state=S schedstat=( 696978 5677 2 ) utm=0 stm=0 core=2 HZ=100
  | stack=0x7f71603000-0x7f71605000 stackSize=1021KB
  | held mutexes=
  kernel: __switch_to+0x8c/0x98
  kernel: futex_wait_queue_me+0xd4/0x130
  kernel: futex_wait+0xfc/0x210
  kernel: do_futex+0xe0/0x920
  kernel: SyS_futex+0x11c/0x1b0
  kernel: cpu_switch_to+0x48/0x4c
  native: #00 pc 000000000001bcec  /system/lib64/libc.so (syscall+28)
  native: #01 pc 00000000000e7e4c  /system/lib64/libart.so (_ZN3art17ConditionVariable16WaitHoldingLocksEPNS_6ThreadE+156)
  native: #02 pc 000000000046c910  /system/lib64/libart.so (_ZN3art10ThreadPool7GetTaskEPNS_6ThreadE+248)
  native: #03 pc 000000000046bdac  /system/lib64/libart.so (_ZN3art16ThreadPoolWorker3RunEv+124)
  native: #04 pc 000000000046b6d0  /system/lib64/libart.so (_ZN3art16ThreadPoolWorker8CallbackEPv+132)
  native: #05 pc 0000000000068734  /system/lib64/libc.so (_ZL15__pthread_startPv+208)
  native: #06 pc 000000000001da7c  /system/lib64/libc.so (__start_thread+16)
  (no managed stack frames)

//关键日志的结束标记
----- end 20678 -----

由于ANR只会发生在主线程,所以我们需要关注主线程的状态,从上 面日志中分析可以知道:发生ANR时主线程的状态为Sleep,且引起该状态的地方在MainActivity$1.onClick(MainActivity.java:24)


九、ANR源码分析

下面分析Service内onCreate发生ANR异常时源代码的执行情况

大家都知道Android中的四大组件启动时,都会通过跨进程的方式利用Handler消息机制最终将消息Push到ActivityThread中的内部类去处理,因此我先在ActivityThread中搜索service.onCreate调用,然后依次追溯:

ActivityThread中调用:

  private void handleCreateService(CreateServiceData data) {
            //调用service的onCreate
            service.onCreate();
    }

该方法由Activity的内部类H(继承与Handler)接收到Servic onCreate消息时调用,该消息由通过scheduleCreateService发出,该方法被ActiveService类调用

        public final void scheduleCreateService(IBinder token,
                ServiceInfo info, CompatibilityInfo compatInfo, int processState) {
            updateProcessState(processState, false);
            CreateServiceData s = new CreateServiceData();
            s.token = token;
            s.info = info;
            s.compatInfo = compatInfo;
            sendMessage(H.CREATE_SERVICE, s);
        }

ActiveServicde的realStartServiceLocked方法调用上面方法,这个方法比较关键需要认真分析:

private final void realStartServiceLocked(ServiceRecord r,
            ProcessRecord app, boolean execInFg) throws RemoteException {

        r.app = app;
        r.restartTime = r.lastActivity = SystemClock.uptimeMillis();

        final boolean newService = app.services.add(r);
        //发送delay消息(SERVICE_TIMEOUT_MSG)
        bumpServiceExecutingLocked(r, execInFg, "create");
        mAm.updateLruProcessLocked(app, false, null);
        updateServiceForegroundLocked(r.app, /* oomAdj= */ false);
        mAm.updateOomAdjLocked();

        boolean created = false;
        try {
            synchronized (r.stats.getBatteryStats()) {
                r.stats.startLaunchedLocked();
            }
            mAm.notifyPackageUse(r.serviceInfo.packageName,
                                 PackageManager.NOTIFY_PACKAGE_USE_SERVICE);
            app.forceProcessStateUpTo(ActivityManager.PROCESS_STATE_SERVICE);
            //最终执行服务的onCreate()方法
            app.thread.scheduleCreateService(r, r.serviceInfo,
                    mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
                    app.repProcState);
            r.postNotification();
            created = true;
        } catch (DeadObjectException e) {
            Slog.w(TAG, "Application dead when creating service " + r);
            mAm.appDiedLocked(app);
            throw e;
        } finally {
            if (!created) {
                // Keep the executeNesting count accurate.
                final boolean inDestroying = mDestroyingServices.contains(r);
                //当service启动完毕,则remove SERVICE_TIMEOUT_MSG消息
                serviceDoneExecutingLocked(r, inDestroying, inDestroying);

                // Cleanup.
                if (newService) {
                    app.services.remove(r);
                    r.app = null;
                }

                // Retry.
                if (!inDestroying) {
                    scheduleServiceRestartLocked(r, false);
                }
            }
        }

        if (r.whitelistManager) {
            app.whitelistManager = true;
        }

        requestServiceBindingsLocked(r, execInFg);

        updateServiceClientActivitiesLocked(app, null, true);

        // If the service is in the started state, and there are no
        // pending arguments, then fake up one so its onStartCommand() will
        // be called.
        if (r.startRequested && r.callStart && r.pendingStarts.size() == 0) {
            r.pendingStarts.add(new ServiceRecord.StartItem(r, false, r.makeNextStartId(),
                    null, null, 0));
        }

        sendServiceArgsLocked(r, execInFg, true);
    }

bumpServiceExecutingLocked方法内又调用了scheduleServiceTimeoutLocked方法:

   void scheduleServiceForegroundTransitionTimeoutLocked(ServiceRecord r) {
        if (r.app.executingServices.size() == 0 || r.app.thread == null) {
            return;
        }
        //指定id为SERVICE_TIMEOUT_MSG的消息
        Message msg = mAm.mHandler.obtainMessage(
                ActivityManagerService.SERVICE_FOREGROUND_TIMEOUT_MSG);
        msg.obj = r;
        r.fgWaiting = true;
        //前台Service和后台Service发送的Delay消息时间不同
        mAm.mHandler.sendMessageDelayed(msg, SERVICE_START_FOREGROUND_TIMEOUT);
    }

serviceDoneExecutingLocked方法

   private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
            boolean finishing) {
        r.executeNesting--;
        if (r.executeNesting <= 0) {
            if (r.app != null) {
                if (DEBUG_SERVICE) Slog.v(TAG_SERVICE,
                        "Nesting at 0 of " + r.shortName);
                r.app.execServicesFg = false;
                r.app.executingServices.remove(r);
                if (r.app.executingServices.size() == 0) {
                    if (DEBUG_SERVICE || DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
                            "No more executingServices of " + r.shortName);
                   //remove掉id为SERVICE_TIMEOUT_MSG的消息,从上面的分析可知,该方法是onCreate调用之前发出的一个Delay执行的消息
                   mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
                } else if (r.executeFg) {
                    // Need to re-evaluate whether the app still needs to be in the foreground.
                    for (int i=r.app.executingServices.size()-1; i>=0; i--) {
                        if (r.app.executingServices.valueAt(i).executeFg) {
                            r.app.execServicesFg = true;
                            break;
                        }
                    }
                }
                if (inDestroying) {
                    mDestroyingServices.remove(r);
                    r.bindings.clear();
                }
                mAm.updateOomAdjLocked(r.app, true);
            }
            r.executeFg = false;
            if (r.tracker != null) {
                r.tracker.setExecuting(false, mAm.mProcessStats.getMemFactorLocked(),
                        SystemClock.uptimeMillis());
                if (finishing) {
                    r.tracker.clearCurrentOwner(r, false);
                    r.tracker = null;
                }
            }
            if (finishing) {
                if (r.app != null && !r.app.persistent) {
                    r.app.services.remove(r);
                    if (r.whitelistManager) {
                        updateWhitelistManagerLocked(r.app);
                    }
                }
                r.app = null;
            }
        }
    }

小结:

Service启动调用onCreate方法之前send一个Delay执行的消息,当发生异常或者Service的onCreate方法执行完毕之后,remove掉之前send的消息,如果onCreate方法执行时间超过Delay的时间,那么该消息就会被处理,此时如果Delay的时间是我们设定的ANR时间,则发生ANR,系统作出处理,否则该消息被remove,不会被执行。以一种观察者模式的角度去实现。

id为SERVICE_TIMEOUT_MSG的消息被AMS中MainHandler接收

           case SERVICE_TIMEOUT_MSG: {
                if (mDidDexOpt) {
                    mDidDexOpt = false;
                    Message nmsg = mHandler.obtainMessage(SERVICE_TIMEOUT_MSG);
                    nmsg.obj = msg.obj;
                    mHandler.sendMessageDelayed(nmsg, ActiveServices.SERVICE_TIMEOUT);
                    return;
                }
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            } break;

ActiveService的serviceTimeout方法中调用AppErrors中的appNotResponding方法,很明显是ANR异常处理的方法:

 final void appNotResponding(ProcessRecord app, ActivityRecord activity,
            ActivityRecord parent, boolean aboveSystem, final String annotation) {
        ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
        SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);

        if (mService.mController != null) {
            try {
                // 0 == continue, -1 = kill process immediately
                int res = mService.mController.appEarlyNotResponding(
                        app.processName, app.pid, annotation);
                if (res < 0 && app.pid != MY_PID) {
                    app.kill("anr", true);
                }
            } catch (RemoteException e) {
                mService.mController = null;
                Watchdog.getInstance().setActivityController(null);
            }
        }

        long anrTime = SystemClock.uptimeMillis();
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            mService.updateCpuStatsNow();
        }

        // Unless configured otherwise, swallow ANRs in background processes & kill the process.
        boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
                Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;

        boolean isSilentANR;

        synchronized (mService) {
            // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
            if (mService.mShuttingDown) {
                Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation);
                return;
            } else if (app.notResponding) {
                Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation);
                return;
            } else if (app.crashing) {
                Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation);
                return;
            } else if (app.killedByAm) {
                Slog.i(TAG, "App already killed by AM skipping ANR: " + app + " " + annotation);
                return;
            } else if (app.killed) {
                Slog.i(TAG, "Skipping died app ANR: " + app + " " + annotation);
                return;
            }

            // In case we come through here for the same app before completing
            // this one, mark as anring now so we will bail out.
            app.notResponding = true;

            // Log the ANR to the event log.
            EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
                    app.processName, app.info.flags, annotation);

            // Dump thread traces as quickly as we can, starting with "interesting" processes.
            firstPids.add(app.pid);

            // Don't dump other PIDs if it's a background ANR
            isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
            if (!isSilentANR) {
                int parentPid = app.pid;
                if (parent != null && parent.app != null && parent.app.pid > 0) {
                    parentPid = parent.app.pid;
                }
                if (parentPid != app.pid) firstPids.add(parentPid);

                if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);

                for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
                    ProcessRecord r = mService.mLruProcesses.get(i);
                    if (r != null && r.thread != null) {
                        int pid = r.pid;
                        if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
                            if (r.persistent) {
                                firstPids.add(pid);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                            } else if (r.treatLikeActivity) {
                                firstPids.add(pid);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                            } else {
                                lastPids.put(pid, Boolean.TRUE);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                            }
                        }
                    }
                }
            }
        }

        //输出ANR异常Log

        // Log the ANR to the main log.
        StringBuilder info = new StringBuilder();
        info.setLength(0);
        //ANR发生时的进程Name
        info.append("ANR in ").append(app.processName);
        if (activity != null && activity.shortComponentName != null) {
            info.append(" (").append(activity.shortComponentName).append(")");
        }
        info.append("\n");
        //进程ID
        info.append("PID: ").append(app.pid).append("\n");
        //ANR发生的原因
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        if (parent != null && parent != activity) {
            info.append("Parent: ").append(parent.shortComponentName).append("\n");
        }

        ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

        // don't dump native PIDs for background ANRs unless it is the process of interest
        String[] nativeProcs = null;
        if (isSilentANR) {
            for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
                if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) {
                    nativeProcs = new String[] { app.processName };
                    break;
                }
            }
        } else {
            nativeProcs = NATIVE_STACKS_OF_INTEREST;
        }

        int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
        ArrayList<Integer> nativePids = null;

        if (pids != null) {
            nativePids = new ArrayList<Integer>(pids.length);
            for (int i : pids) {
                nativePids.add(i);
            }
        }

        // For background ANRs, don't pass the ProcessCpuTracker to
        // avoid spending 1/2 second collecting stats to rank lastPids.
        //dump栈信息
        File tracesFile = mService.dumpStackTraces(true, firstPids,
                                                   (isSilentANR) ? null : processCpuTracker,
                                                   (isSilentANR) ? null : lastPids,
                                                   nativePids);

        String cpuInfo = null;
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            mService.updateCpuStatsNow();
            synchronized (mService.mProcessCpuTracker) {
                //各进程使用CPU情况
                cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
            }
            //CPU当前负载
            info.append(processCpuTracker.printCurrentLoad());
            info.append(cpuInfo);
        }

        info.append(processCpuTracker.printCurrentState(anrTime));

        Slog.e(TAG, info.toString());
        if (tracesFile == null) {
            // There is no trace file, so dump (only) the alleged culprit's threads to the log
            Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
        }
        //将anr信息添加到dropbox
        mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
                cpuInfo, tracesFile, null);

        if (mService.mController != null) {
            try {
                // 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
                int res = mService.mController.appNotResponding(
                        app.processName, app.pid, info.toString());
                if (res != 0) {
                    if (res < 0 && app.pid != MY_PID) {
                        app.kill("anr", true);
                    } else {
                        synchronized (mService) {
                            mService.mServices.scheduleServiceTimeoutLocked(app);
                        }
                    }
                    return;
                }
            } catch (RemoteException e) {
                mService.mController = null;
                Watchdog.getInstance().setActivityController(null);
            }
        }

        synchronized (mService) {
            mService.mBatteryStatsService.noteProcessAnr(app.processName, app.uid);

            if (isSilentANR) {
                app.kill("bg anr", true);
                return;
            }

            // Set the app's notResponding state, and look up the errorReportReceiver
            makeAppNotRespondingLocked(app,
                    activity != null ? activity.shortComponentName : null,
                    annotation != null ? "ANR " + annotation : "ANR",
                    info.toString());

            通过Handler消息机制弹出ANR对话框
            // Bring up the infamous App Not Responding dialog
            Message msg = Message.obtain();
            HashMap<String, Object> map = new HashMap<String, Object>();
            msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
            msg.obj = map;
            msg.arg1 = aboveSystem ? 1 : 0;
            map.put("app", app);
            if (activity != null) {
                map.put("activity", activity);
            }

            mService.mUiHandler.sendMessage(msg);
        }
    }

Android中的ANR监测以Handler消息机制实现,使用观察者模式,当开始监测时注册消息(该消息在规定时间后执行),事件处理完之后移除消息,当该消息执行时说明事件处理超过了规定的时间,即ANR。


十、ANR实例及分析

10.1 应用在主线程上进行长时间的计算

//使用冒泡排序对一个大数组排序
private fun sortBigArray() {
    val currTime = System.currentTimeMillis()
    val random = IntArray(1000000)
    for (i in random.indices) {
        random[i] = (Math.random() * 10000000).toInt()
    }
    BubbleSort.sort(random)
    println("耗时" + (System.currentTimeMillis() - currTime) + "ms")
    for (i in random.indices) {
        println(random[i].toString())
    }
}

点击一个按钮调用sortBigArray()方法,内部调用BubbleSort类的sort()方法对一个大数组(100万)进行排序,然后点击几次返回键,然后就出现ANR了。

看一下Logcat日志输出

//debug级别日志
2020-06-03 21:20:24.209 com.example.android.jetpackdemo I/art: Wrote stack traces to '/data/anr/traces.txt'

//error级别日志
2020-06-03 21:20:28.048 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 15564
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.7 / 7.48 / 7.35
    CPU usage from 294322ms to 0ms ago (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139):
      4.1% 2001/system_server: 3.1% user + 0.9% kernel / faults: 64102 minor 6 major
      3.3% 29428/adbd: 0.8% user + 2.4% kernel / faults: 131259 minor
      1.1% 508/logd: 0.5% user + 0.6% kernel / faults: 18 minor
      0.7% 2661/com.android.systemui: 0.6% user + 0.1% kernel / faults: 1648 minor 1 major
      0.7% 607/surfaceflinger: 0.4% user + 0.3% kernel / faults: 21 minor
      0.7% 24463/com.huawei.hwid.persistent: 0.6% user + 0% kernel / faults: 4650 minor 1 major
      0.5% 4018/com.huawei.android.launcher: 0.4% user + 0% kernel / faults: 16025 minor 3 major
      0.5% 24301/fingerprint_log: 0% user + 0.5% kernel
      0.4% 28932/com.huawei.appmarket: 0.3% user + 0% kernel / faults: 2526 minor
     //...
   2020-06-03 21:20:28.048 ? E/ActivityManager: CPU usage from 1721ms to 2250ms later (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389):
      99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor
        99% 15564/oid.jetpackdemo: 99% user + 0% kernel
      7.5% 2001/system_server: 3.7% user + 3.7% kernel / faults: 5 minor
        5.6% 2014/ActivityManager: 1.8% user + 3.7% kernel
        1.8% 2813/Binder:2001_5: 1.8% user + 0% kernel
        1.8% 2862/Binder:2001_6: 0% user + 1.8% kernel
        1.8% 3089/Binder:2001_7: 1.8% user + 0% kernel
      5.3% 29428/adbd: 0% user + 5.3% kernel / faults: 480 minor
        3.5% 29430/->transport: 0% user + 3.5% kernel
        1.7% 29428/adbd: 0% user + 1.7% kernel
      1.3% 53/rcuop/6: 0% user + 1.3% kernel
    16% TOTAL: 14% user + 2.1% kernel + 0.2% irq + 0.2% softirq

上面的日志中输出了堆栈信息的保存在 /data/anr/traces.txt文件中

com.example.android.jetpackdemo I/dalvikvm: Wrote stack traces to '/data/anr/traces.txt'

发生ANR进程的包名信息,所在的类,进程id和ANR的类型

2020-06-03 21:20:28.048 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 15564
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)  

包名com.example.android.jetpackdemo,
具体的类com.example.android.jetpackdemo.StartActivity,
进程号是PID: 15564,
ANR的类型是Input dispatching timed out。

 CPU usage from 294322ms to 0ms ago (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139):
      4.1% 2001/system_server: 3.1% user + 0.9% kernel / faults: 64102 minor 6 major
      3.3% 29428/adbd: 0.8% user + 2.4% kernel / faults: 131259 minor
      1.1% 508/logd: 0.5% user + 0.6% kernel / faults: 18 minor
//...

2020-06-03 21:20:28.048 ? E/ActivityManager: CPU usage from 1721ms to 2250ms later (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389):
      99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor
        99% 15564/oid.jetpackdemo: 99% user + 0% kernel
      7.5% 2001/system_server: 3.7% user + 3.7% kernel / faults: 5 minor
        5.6% 2014/ActivityManager: 1.8% user + 3.7% kernel
        1.8% 2813/Binder:2001_5: 1.8% user + 0% kernel
//...

注意:
在ANR发生之前,2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139,这段时间CPU的使用并不高。

在ANR发生的时候,2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389,这段时间CPU的使用相当高,已经达到99%了。

99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel

99%:内存占用率
15564/com.example.android.jetpackdemo:进程id和进程名。

这两段CPU 信息分别代表ANR发生前和ANR时的CPU占用率,在输出的CPU使用信息中我们也可以看出一些端倪,注意到进程CPU的占用率比较高,说明我们的进程比较忙碌,这里需要说明一下,进程忙碌并不一定代表主线程忙碌,也可能是进程中的后台线程忙碌。

现在知道了ANR发生的所在的类,需要分析发生ANR的时候保存的traces.txt文件了。

导出traces文件
使用adb命令导出traces.txt文件

adb pull /data/anr/traces.txt traces_1.txt 
/data/anr/traces.txt: 1 file pulled, 0 skipped. 28.5 MB/s (701726 bytes in 0.023s)

traces.txt部分信息

----- pid 15564 at 2020-06-03 21:20:24 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
ABI: 'arm64'
//...

在traces.txt文件的最顶部,首先输出的是发生ANR的进程号和包名信息,然后在traces.txt中搜索我们的进程号或者包名。

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=R schedstat=( 22116939220 18299419 428 ) utm=2209 stm=2 core=5 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  at com.example.android.jetpackdemo.BubbleSort.sort(BubbleSort.java:45)
  at com.example.android.jetpackdemo.StartActivity.sortBigArray(StartActivity.kt:76)
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:47)
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
//...

看一下和线程相关的部分信息。

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=R schedstat=( 22116939220 18299419 428 ) utm=2209 stm=2 core=5 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)

线程基本信息:
线程名:main
线程优先级:prio=5,优先级取值范围[1,10],详见Thread类:

//最小取值
public final static int MIN_PRIORITY = 1;
//默认优先级
public final static int NORM_PRIORITY = 5;
//最大取值
public final static int MAX_PRIORITY = 10;

线程id: tid=1,1代表主线程
线程状态:Runnable,状态取值如下,详见Thread.State枚举类:

NEW, //线程还没启动

RUNNABLE, //正在执行

BLOCKED, //等待获取锁

WAITING, //等待其他线程执行一个特定的动作,比如说调用Object.notify或Object.notifyAll()

TIMED_WAITING, //等待一定时间

TERMINATED //执行完毕

线程组名称:group=“main”
线程被挂起的次数:sCount=0
线程被调试器挂起的次数:dsCount=0
线程的java的对象地址:obj= 0x77d21af8
线程本身的Native对象地址:self= 0x7fa2ea2a00

线程调度信息:
Linux系统中内核线程id: sysTid= 15564 与进程号相同
线程调度优先级:nice=-10
线程调度组:cgrp=default
线程调度策略和优先级:sched=0/0
线程处理函数地址:handle= 0x7fa6f4ba98

线程的堆栈信息:
堆栈地址和大小:stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
held mutexes:持有互斥锁信息

从上面traces.txt文件中这段信息可以看出,导致ANR的最终原因是在BubbleSort.java的第45行。

 at com.example.android.jetpackdemo.BubbleSort.sort(BubbleSort.java:45)
 at com.example.android.jetpackdemo.StartActivity.sortBigArray(StartActivity.kt:76)
 at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:47)
 at java.lang.reflect.Method.invoke!(Native method)

在这里插入图片描述

10.2 应用在主线程上执行耗时的I/O的操作

/**
 * 拷贝文件,注意要有读写权限
 */
private fun doIo() {
    val prePath = Environment.getExternalStorageDirectory().path
    val file = File("${prePath}/test/View.java")
    if (file.exists()) {
        Log.d(TAG, "doIo: ${file.length()}")

        val reader = FileReader(file)
        val fileWriter = FileWriter("${prePath}/test/ViewCopy.java", true)

        for (index in 0 until 5) {
            var count: Int
            while (reader.read().also { count = it } != -1) {
                fileWriter.write(count)
            }
            try {
                reader.reset()
            } catch (e: IOException) {
                Log.d(TAG, "doIo: error ${e.message}")
            }
        }
    }
}

调用doIo()方法以后,多次点击返回键,制造ANR。

Logcat日志输出

2020-06-04 21:05:24.462 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 16295
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.49 / 7.45 / 7.24
    CPU usage from 87491ms to 0ms ago (2020-06-04 21:03:53.035 to 2020-06-04 21:05:20.526):
      7.8% 2001/system_server: 6.1% user + 1.7% kernel / faults: 34095 minor 3 major
      4.2% 28932/com.huawei.appmarket: 3.7% user + 0.5% kernel / faults: 12314 minor 5 major
      2.8% 2661/com.android.systemui: 2.2% user + 0.5% kernel / faults: 4222 minor 1 major
      2% 412/msm-core:sampli: 0% user + 2% kernel
      1.7% 24463/com.huawei.hwid.persistent: 1.5% user + 0.1% kernel / faults: 3317 minor 1 major
      1.5% 607/surfaceflinger: 1% user + 0.5% kernel / faults: 24 minor
     //...    
2020-06-04 21:05:24.462 ? E/ActivityManager: CPU usage from 1696ms to 2226ms later (2020-06-04 21:05:22.222 to 2020-06-04 21:05:22.752):
      84% 16295/com.example.android.jetpackdemo: 84% user + 0% kernel / faults: 562 minor 1 major
        68% 16295/oid.jetpackdemo: 68% user + 0% kernel
        12% 16317/RenderThread: 12% user + 0% kernel
        1.8% 16307/HeapTaskDaemon: 1.8% user + 0% kernel
       +0% 16461/DeferredSaveThr: 0% user + 0% kernel
      9.1% 2001/system_server: 1.8% user + 7.3% kernel / faults: 7 minor
        7.3% 2014/ActivityManager: 0% user + 7.3% kernel
        3.6% 2536/Binder:2001_3: 3.6% user + 0% kernel
      5.5% 607/surfaceflinger: 2.7% user + 2.7% kernel
        1.3% 607/surfaceflinger: 1.3% user + 0% kernel
        1.3% 658/Binder:607_1: 0% user + 1.3% kernel
      4.3% 2661/com.android.systemui: 2.9% user + 1.4% kernel / faults: 26 minor
        4.3% 3614/RenderThread: 2.9% user + 1.4% kernel
        1.4% 2661/ndroid.systemui: 1.4% user + 0% kernel
      1.3% 25/rcuop/2: 0% user + 1.3% kernel
      1.3% 339/irq/171-tsens_i: 0% user + 1.3% kernel
      1.5% 11851/mdss_fb0: 0% user + 1.5% kernel
      1.6% 14246/kworker/u16:5: 0% user + 1.6% kernel
      1.6% 16318/kworker/u16:4: 0% user + 1.6% kernel
    15% TOTAL: 13% user + 1.8% kernel

从上面的日志信息中我们也看出来发生ANR的时候,进程com.example.android.jetpackdemoCPU占用率是比较高的,说明我们进程内存在比较忙碌的线程。然后我们继续看一下对应的traces.txt文件。
traces.txt部分信息

----- pid 16295 at 2020-06-04 21:05:20 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

通过进程号pid 16295搜索

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=16295 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=R schedstat=( 16406184130 12254163 407 ) utm=1630 stm=10 core=6 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 0000000000478088  /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+220)
  native: #01 pc 0000000000478084  /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+216)
  native: #02 pc 000000000044c604  /system/lib64/libart.so (_ZNK3art6Thread9DumpStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEbP12BacktraceMap+524)
  native: #03 pc 0000000000463f60  /system/lib64/libart.so (_ZN3art14DumpCheckpoint3RunEPNS_6ThreadE+820)
  native: #04 pc 000000000044d510  /system/lib64/libart.so (_ZN3art6Thread21RunCheckpointFunctionEv+192)
  native: #05 pc 00000000000ff870  /system/lib64/libart.so (_ZN3art27ScopedObjectAccessUncheckedD2Ev+576)
  native: #06 pc 000000000010a764  /system/lib64/libart.so (_ZN3art8CheckJNI23GetPrimitiveArrayRegionEPKcNS_9Primitive4TypeEP7_JNIEnvP7_jarrayiiPv+1164)
  native: #07 pc 0000000000022ee4  /system/lib64/libjavacore.so (???)
  native: #08 pc 00000000004747a8  /data/dalvik-cache/arm64/system@framework@boot-core-libart.oat (Java_libcore_icu_NativeConverter_encode__J_3CI_3BI_3IZ+244)
  at libcore.icu.NativeConverter.encode(Native method)
  at java.nio.charset.CharsetEncoderICU.encodeLoop(CharsetEncoderICU.java:169)
  at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:579)
  at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:271)
  at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
  - locked <0x05b5279d> (a java.io.FileWriter)
  at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:113)
  at java.io.OutputStreamWriter.write(OutputStreamWriter.java:194)
  at com.example.android.jetpackdemo.StartActivity.doIo(StartActivity.kt:116)
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:65)
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

重点看一下这段信息

at java.io.OutputStreamWriter.write(OutputStreamWriter.java:194)
at com.example.android.jetpackdemo.StartActivity.doIo(StartActivity.kt:116)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:65)

从上面这段信息可以看出,导致ANR的最终原因是在OutputStreamWriter.java的第194行。而我们的代码出问题的地方是StartActivity.kt的116行。
在这里插入图片描述
在这里插入图片描述

10.3 主线程处于阻塞状态,等待获取锁

//锁资源
val lockedResource: Any = Any()

fun onClick(v: View) {
    when (v.id) {
        R.id.btnWaitLockedResource -> {
            LockTask().execute(arrayListOf<Int>())
            Log.d(TAG, "onClick: 主线程先睡眠一会,避免先获取到锁")
            Thread.sleep(200)
            Log.d(TAG, "onClick: 主线程先睡眠结束,尝试获取锁")
            synchronized(lockedResource) {
                for (index in 0 until 10) {
                    Log.d(TAG, "onClick: 主线程获取到锁了$index")
                }
            }
        }
    }
}


//LockTask后台线程
inner class LockTask : AsyncTask<MutableList<Int>, Int, Unit>() {
    override fun doInBackground(vararg params: MutableList<Int>) =
        synchronized(lockedResource) {
            val list = params[0]
            for (i in 0 until 1000000) {
                list.add((Math.random() * 10000000).toInt())
            }
            list.forEach {
                Log.d(TAG, "doInBackground: for each element is $it")
            }
        }
}

调用onClick()方法以后,先让后台线程获取锁,然后主线程再尝试获取锁。然后多次点击返回键,制造ANR。

Logcat日志输出

2020-06-04 09:55:04.396 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 20008
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 8.27 / 7.73 / 7.37
    CPU usage from 83152ms to 0ms ago (2020-06-04 09:53:36.842 to 2020-06-04 09:54:59.995) with 99% awake:
      19% 508/logd: 15% user + 3.5% kernel / faults: 533 minor 1 major
      5.5% 2001/system_server: 3.9% user + 1.5% kernel / faults: 10843 minor 7 major
      4.9% 28932/com.huawei.appmarket: 4.3% user + 0.6% kernel / faults: 13003 minor 79 major
      2.6% 2661/com.android.systemui: 2.2% user + 0.3% kernel / faults: 7158 minor 2 major
      1.5% 607/surfaceflinger: 0.9% user + 0.6% kernel / faults: 190 minor 1 major
      1.2% 24307/logcat: 0.7% user + 0.4% kernel
      0.8% 11161/com.android.settings: 0.6% user + 0.1% kernel / faults: 9084 minor 20 major
      0.6% 24305/logcat: 0.2% user + 0.3% kernel
      0.4% 24301/fingerprint_log: 0% user + 0.4% kernel
      0.3% 15363/kworker/u16:10: 0% user + 0.3% kernel
      0.2% 6831/kworker/u16:5: 0% user + 0.2% kernel
      0.2% 837/imonitor: 0% user + 0.1% kernel
      //...
     2020-06-04 09:55:04.396 ? E/ActivityManager: CPU usage from 2211ms to 2742ms later (2020-06-04 09:55:02.206 to 2020-06-04 09:55:02.737):
      105% 20008/com.example.android.jetpackdemo: 92% user + 13% kernel / faults: 220 minor
        99% 20096/AsyncTask #1: 86% user + 13% kernel
        5.6% 20019/HeapTaskDaemon: 5.6% user + 0% kernel
      103% 508/logd: 99% user + 3.7% kernel / faults: 8 minor
        92% 24315/logd.reader.per: 92% user + 0% kernel
        7.5% 511/logd.writer: 5.6% user + 1.8% kernel
        3.7% 24314/logd.reader.per: 0% user + 3.7% kernel
        1.8% 24313/logd.reader.per: 0% user + 1.8% kernel
      11% 2661/com.android.systemui: 11% user + 0% kernel / faults: 52 minor
        9.3% 3614/RenderThread: 7.5% user + 1.8% kernel
        1.8% 2661/ndroid.systemui: 1.8% user + 0% kernel
      9.3% 607/surfaceflinger: 9.3% user + 0% kernel
        3.7% 607/surfaceflinger: 3.7% user + 0% kernel
        1.8% 2614/Binder:607_4: 0% user + 1.8% kernel
      5.6% 2001/system_server: 1.8% user + 3.7% kernel / faults: 2 minor
        5.6% 2014/ActivityManager: 0% user + 5.6% kernel
      3.3% 19794/adbd: 1.6% user + 1.6% kernel / faults: 147 minor
        1.6% 19794/adbd: 0% user + 1.6% kernel
        1.6% 19796/->transport: 0% user + 1.6% kernel
        1.6% 19797/<-transport: 0% user + 1.6% kernel
      3.4% 24307/logcat: 0% user + 3.4% kernel
      1.3% 624/mm-pp-dpps: 0% user + 1.3% kernel
        1.3% 717/ABA_THREAD: 1.3% user + 0% kernel
      1.6% 18971/kworker/0:2: 0% user + 1.6% kernel
      1.6% 18974/kworker/u16:0: 0% user + 1.6% kernel
      1.6% 19095/mdss_fb0: 0% user + 1.6% kernel
      1.7% 24301/fingerprint_log: 1.7% user + 0% kernel
      1.7% 24305/logcat: 1.7% user + 0% kernel
    31% TOTAL: 26% user + 4% kernel + 0.2% irq + 0.2% softirq

进程CPU占用率是比较高的,说明进程内存在比较忙碌的线程。然后继续看一下对应的traces.txt文件。

traces.txt部分信息

----- pid 20008 at 2020-06-04 09:55:00 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

通过进程号pid 20008搜索

"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=20008 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 278831875 7233747 156 ) utm=22 stm=5 core=0 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:61)
  - waiting to lock <0x0f8c80b0> (a java.lang.Object) held by thread 16
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

关键信息

at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:61)
- waiting to lock <0x0f8c80b0> (a java.lang.Object) held by thread 16

在StartActivity的61行,在等待一个锁对象<0x0f8c80b0>,该对象是一个Object对象
(a java.lang.Object),这个锁对象正在被线程id为16的线程持有。那么我们下面在traces.txt文件中搜索一下这个锁对象<0x0f8c80b0>。如下所示:

DALVIK THREADS (16):
"AsyncTask #1" prio=5 tid=16 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x12cd61f0 self=0x7f93187200
  | sysTid=20096 nice=10 cgrp=bg_non_interactive sched=0/0 handle=0x7f84346450
  | state=R schedstat=( 13814173056 6030204 1355 ) utm=1193 stm=188 core=3 HZ=100
  | stack=0x7f84244000-0x7f84246000 stackSize=1037KB
  | held mutexes= "mutator lock"(shared held)
  at java.lang.Integer.stringSize(Integer.java:414)
  at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:630)
  at java.lang.StringBuilder.append(StringBuilder.java:220)
  at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:107)
  - locked <0x0f8c80b0> (a java.lang.Object)
  at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:99)
  at android.os.AsyncTask$2.call(AsyncTask.java:316)
  at java.util.concurrent.FutureTask.run(FutureTask.java:237)
  at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:255)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
  at java.lang.Thread.run(Thread.java:776)

关键信息

 at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:107)
 - locked <0x0f8c80b0> (a java.lang.Object)

正是这个AsyncTask在107行持有锁对象0x0f8c80b0,导致主线程无法获取锁而阻塞,最终导致ANR。

在这里插入图片描述

10.4 主线程与其他线程之间发生死锁

val resourceFirst = "resourceFirst"
val resourceSecond = "resourceSecond"

private fun mockDeadLock() {
    //启动一个后台线程
    thread(start = false) {
        synchronized(resourceSecond) {
            Log.d(TAG, "工作线程获取了锁 resourceSecond")
            Thread.sleep(100)
            Log.d(TAG, "工作线程尝试获取锁 resourceFirst")
            synchronized(resourceFirst) {
                while (true) {
                    Log.d(TAG, "工作线程 mockDeadLock")
                }
            }
        }
    }.start()

    //主线程睡眠30ms后开始获取锁
    Thread.sleep(30)

    synchronized(resourceFirst) {
        Log.d(TAG, "主线程获取了锁 resourceFirst")

        Log.d(TAG, "主线程尝试获取锁 resourceSecond")
        synchronized(resourceSecond) {
            Log.d(TAG, "主线程获取了锁 resourceFirst")
            while (true) {
                Log.d(TAG, "主线程 mockDeadLock")
            }
        }
    }
}

上面这段代码逻辑:

  1. 工作线程先获取锁resourceSecond,然后睡眠100ms保证主线程能获取到锁resourceFirst
  2. 主线程睡眠30ms后先获取锁resourceFirst,然后再尝试获取锁resourceSecond,这时候是获取不到的,因为工作线程已经持有锁resourceSecond并且不释放。
  3. 工作线程睡眠结束以后尝试获取锁resourceFirst,这时候是获取不到的,因为主线程持有了锁resourceFirst并且不释放。
  4. 最终,造成死锁。

调用mockDeadLock()方法以后,多次点击返回键,制造ANR。

Logcat输出

2020-06-04 15:07:41.246 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 13626
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.53 / 6.81 / 6.4
    CPU usage from 177565ms to 0ms ago (2020-06-04 15:04:39.715 to 2020-06-04 15:07:37.281):
      11% 2001/system_server: 7.1% user + 4.4% kernel / faults: 68219 minor 37 major
      3.4% 2661/com.android.systemui: 2.8% user + 0.6% kernel / faults: 20555 minor 29 major
      2% 508/logd: 0.9% user + 1.1% kernel / faults: 76 minor
      1.8% 607/surfaceflinger: 1.1% user + 0.7% kernel / faults: 82 minor 1 major
      0% 24463/com.huawei.hwid.persistent: 0% user + 0% kernel / faults: 7819 minor 24 major
      0.9% 2823/com.huawei.systemmanager:service: 0.6% user + 0.2% kernel / faults: 13277 minor 12 major
    //...      
2020-06-04 15:07:41.246 ? E/ActivityManager: CPU usage from 1714ms to 2243ms later (2020-06-04 15:07:38.994 to 2020-06-04 15:07:39.523):
      12% 2001/system_server: 9% user + 3.6% kernel / faults: 8 minor
        10% 2014/ActivityManager: 5.4% user + 5.4% kernel
        1.8% 2399/UEventObserver: 1.8% user + 0% kernel
      1.5% 13652/kworker/u16:7: 0% user + 1.5% kernel
    2.3% TOTAL: 1.1% user + 1.1% kernel

上面的Logcat输出并没有关于我们进程的CUP信息,说明我们的进程CPU占用率很低。那么我们继续看一下traces.txt文件。

traces.txt部分信息

----- pid 13626 at 2020-06-04 15:07:37 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

通过进程号pid 13626搜索

"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=13626 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 288564792 6939269 224 ) utm=23 stm=5 core=0 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  at com.example.android.jetpackdemo.StartActivity.mockDeadLock(StartActivity.kt:142)
  - waiting to lock <0x0a43b5c8> (a java.lang.String) held by thread 17
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:70)
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

主线程状态是线程状态是Blocked,说明正在等待获取锁对象,等待获取的锁对象<0x0a43b5c8>是一个String对象(a java.lang.String),该对象被线程id为17的线程持有。然后我们搜索这个锁对象。

"Thread-2" prio=5 tid=17 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x12c89dc0 self=0x7f931cd000
  | sysTid=13763 nice=0 cgrp=default sched=0/0 handle=0x7f84344450
  | state=S schedstat=( 886406 280365 2 ) utm=0 stm=0 core=0 HZ=100
  | stack=0x7f84242000-0x7f84244000 stackSize=1037KB
  | held mutexes=
  at com.example.android.jetpackdemo.StartActivity$mockDeadLock$1.invoke(StartActivity.kt:127)
  - waiting to lock <0x0ec26674> (a java.lang.String) held by thread 1
  - locked <0x0a43b5c8> (a java.lang.String)
  at com.example.android.jetpackdemo.StartActivity$mockDeadLock$1.invoke(StartActivity.kt:21)
  at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

Thread-2,线程状态是Blocked,说明正在等待获取锁对象,等待获取的锁对象<0x0ec26674>是一个String对象(a java.lang.String),这个对象被线程id为1的线程(也就是主线程)持有。并且当前线程持有锁对象<0x0a43b5c8>。

最终,主线程和工作线程Thread-2造成死锁,导致应用无响应。

10.5 主线程对另一个进程进行同步Binder调用,后者需很长时间才能返回

代码实现是从客户端的两个EditText中获取两个数字,然后通过Binder调用服务端的方法计算两个数的和返回给客户端,然后客户端讲计算结果展示在界面上。

客户端部分代码

private IMyAidlInterface iMyAidlInterface;
    
private ServiceConnection conn = new ServiceConnection() {
    @Override
    public void onServiceConnected(ComponentName name, IBinder service) {
         //获取Binder对象
         iMyAidlInterface = IMyAidlInterface.Stub.asInterface(service);
    }
    //...
};
    
public void onClick(View view) {
    switch (view.getId()) {
        case R.id.btn_count:
            mNum1 = Integer.parseInt(etNum1.getText().toString());
            mNum2 = Integer.parseInt(etNum2.getText().toString());
            try {
                //在主线程进行同步binder调用
                mTotal = iMyAidlInterface.add(mNum1, mNum2);
            } catch (RemoteException e) {
                e.printStackTrace();
                Log.e(TAG, "onClick: " + e.getMessage());
            }
            editShowResult.setText("mTotal=" + mTotal);
            break;
       }
}

服务端部分代码

public class IRemoteService extends Service {

    private static final String TAG = "IRemoteService";

    private IBinder iBinder = new IMyAidlInterface.Stub() {
        @Override
        public int add(int num1, int num2) throws RemoteException {
            Log.d(TAG, "remote method add: start sleep thread id =" + Thread.currentThread().getId()+"," +
                    "thread name = "+Thread.currentThread().getName());
            try {
                //睡眠一段时间,然后才进行计算
                Thread.sleep(120000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            Log.d(TAG, "remote method add: finish sleep return calculate result");
            return num1 + num2;
        }
    };

    public IRemoteService() {
    }

    @Override
    public IBinder onBind(Intent intent) {
        return iBinder;
    }
}

注意:我们需要先把Binder服务端运行起来,然后再运行Binder客户端执行相应的方法。

Logcat输出

2020-06-04 15:49:47.006 2001-2014/? E/ActivityManager: ANR in com.hm.aidlclient (com.hm.aidlclient/.BaseKnowledgeActivity)
    PID: 18096
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.55 / 7.26 / 6.87
    CPU usage from 755516ms to 0ms ago (2020-06-04 15:37:07.545 to 2020-06-04 15:49:43.062) with 99% awake:
      5.1% 2001/system_server: 3.5% user + 1.5% kernel / faults: 139606 minor 17 major
      1.2% 508/logd: 0.5% user + 0.6% kernel / faults: 35 minor
      0.9% 2661/com.android.systemui: 0.7% user + 0.1% kernel / faults: 13039 minor 4 major
      0.8% 12442/adbd: 0.2% user + 0.6% kernel / faults: 23957 minor
      0.7% 607/surfaceflinger: 0.4% user + 0.3% kernel / faults: 183 minor 2 major
      0.6% 28932/com.huawei.appmarket: 0.5% user + 0.1% kernel / faults: 9311 minor 64 major
      0.4% 24463/com.huawei.hwid.persistent: 0.3% user + 0% kernel / faults: 11607 minor 6 major
      0.5% 24301/fingerprint_log: 0% user + 0.5% kernel
      0.3% 4128/com.google.android.gms: 0.2% user + 0% kernel / faults: 26970 minor 16 major
      //...
2020-06-04 15:49:47.006 2001-2014/? E/ActivityManager: CPU usage from 1701ms to 2232ms later (2020-06-04 15:49:44.762 to 2020-06-04 15:49:45.293):
      28% 2001/system_server: 21% user + 7.2% kernel / faults: 38 minor
        16% 2010/HeapTaskDaemon: 16% user + 0% kernel
        9% 2014/ActivityManager: 1.8% user + 7.2% kernel
        1.8% 2001/system_server: 0% user + 1.8% kernel
        1.8% 2540/NetdConnector: 1.8% user + 0% kernel
      9% 607/surfaceflinger: 9% user + 0% kernel
        5.4% 607/surfaceflinger: 5.4% user + 0% kernel
        1.8% 658/Binder:607_1: 0% user + 1.8% kernel
        1.8% 677/EventThread: 0% user + 1.8% kernel
      7.1% 2661/com.android.systemui: 5.3% user + 1.7% kernel / faults: 38 minor
        8.9% 3614/RenderThread: 7.1% user + 1.7% kernel
      1.3% 508/logd: 1.3% user + 0% kernel
      1.3% 624/mm-pp-dpps: 1.3% user + 0% kernel
        2.7% 717/ABA_THREAD: 1.3% user + 1.3% kernel
      1.5% 15978/mdss_fb0: 0% user + 1.5% kernel
      1.6% 18228/logcat: 0% user + 1.6% kernel
    8.3% TOTAL: 5.8% user + 2.5% kernel

并没有什么有价值的信息。继续看一下traces.txt文件。
traces.txt中客户端相关信息

----- pid 18096 at 2020-06-04 15:49:43 -----
Cmd line: com.hm.aidlclient
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

通过进程号pid 18096搜索

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=18096 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 464662186 22498334 359 ) utm=38 stm=8 core=0 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0x70/0x7c
  kernel: binder_thread_read+0x4cc/0x13f0
  kernel: binder_ioctl+0x53c/0xbcc
  kernel: do_vfs_ioctl+0x570/0x5a8
  kernel: SyS_ioctl+0x60/0x88
  kernel: el0_svc_naked+0x24/0x28
  native: #00 pc 000000000006ad6c  /system/lib64/libc.so (__ioctl+4)
  native: #01 pc 000000000001fa48  /system/lib64/libc.so (ioctl+144)
  native: #02 pc 00000000000555a4  /system/lib64/libbinder.so (_ZN7android14IPCThreadState14talkWithDriverEb+260)
  native: #03 pc 0000000000056388  /system/lib64/libbinder.so (_ZN7android14IPCThreadState15waitForResponseEPNS_6ParcelEPi+352)
  native: #04 pc 000000000004b250  /system/lib64/libbinder.so (_ZN7android8BpBinder8transactEjRKNS_6ParcelEPS1_j+72)
  native: #05 pc 0000000000103354  /system/lib64/libandroid_runtime.so (???)
  native: #06 pc 0000000000b36238  /data/dalvik-cache/arm64/system@framework@boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:617)
  at com.hm.aidlserver.IMyAidlInterface$Stub$Proxy.add(IMyAidlInterface.java:90)
  at com.hm.aidlclient.BaseKnowledgeActivity.onClick(BaseKnowledgeActivity.java:109)
  at com.hm.aidlclient.BaseKnowledgeActivity_ViewBinding$1.doClick(BaseKnowledgeActivity_ViewBinding.java:41)
  at butterknife.internal.DebouncingOnClickListener.onClick(DebouncingOnClickListener.java:22)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

看到Binder客户端主线程的状态是Native,这个状态是native线程的一个状态,对应java线程的RUNNABLE状态。更详细的对应关系可以参考VMThread.java。然后从上面的信息中我们只看到BinderProxy调用了transactNative()方法,这是一个本地方法,最终会调用服务端Binder对象的transact()方法,实现真正的跨进程通信。除了这些我们没有看到其他有用的信息了。那么我们接下来看一看服务端的一些信息,看看能不能找到一些线索。

traces.txt中服务端相关信息

----- pid 17773 at 2020-06-04 15:49:43 -----
Cmd line: com.hm.aidlserver
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

通过进程号pid 17773搜索

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=17773 nice=0 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 213791882 16481247 206 ) utm=18 stm=3 core=1 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0x70/0x7c
  kernel: SyS_epoll_wait+0x2d4/0x394
  kernel: SyS_epoll_pwait+0xc4/0x150
  kernel: el0_svc_naked+0x24/0x28
  native: #00 pc 000000000006ac80  /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 000000000001e21c  /system/lib64/libc.so (epoll_pwait+64)
  native: #02 pc 00000000000181d8  /vendor/lib64/libutils.so (_ZN7android6Looper9pollInnerEi+156)
  native: #03 pc 000000000001808c  /vendor/lib64/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+60)
  native: #04 pc 00000000000f66dc  /system/lib64/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+48)
  native: #05 pc 0000000000b91ec0  /data/dalvik-cache/arm64/system@framework@boot-framework.oat (Java_android_os_MessageQueue_nativePollOnce__JI+140)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:356)
  at android.os.Looper.loop(Looper.java:138)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

服务端的进程号是pid 17773,我们看到服务端的主线程中也没有什么线索,不要慌,这里我们似乎忘了一点什么,服务端的Binder对象是运行在服务端的Binder线程池中的
那我们怎么找到具体是Binder线程池中的哪个线程呢?其实在traces.txt文件中也是输出了的。

----- binder transactions -----
18096:18096(m.hm.aidlclient:m.hm.aidlclient) -> 17773:17788(m.hm.aidlserver:Binder:17773_2) code: 1
----- end binder transactions -----

上面这段信息的意思就是,我们是在进程id为18096,内核线程为18096的线程(就是主线程)向进程id为17773,内核线id为17788的线程发起跨进程通信。内核线程id为17788的线程的线程名称是Binder:17773_2。那么我们就搜索一下Binder:17773_2。搜索结果如下所示:

"Binder:17773_2" prio=5 tid=10 Sleeping
  | group="main" sCount=1 dsCount=0 obj=0x32c064c0 self=0x7f9a624800
  | sysTid=17788 nice=-10 cgrp=default sched=0/0 handle=0x7fa0fc3450
  | state=S schedstat=( 3077762 6086666 14 ) utm=0 stm=0 core=6 HZ=100
  | stack=0x7fa0ec9000-0x7fa0ecb000 stackSize=1005KB
  | held mutexes=
  at java.lang.Thread.sleep!(Native method)
  - sleeping on <0x05eea4a7> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:379)
  - locked <0x05eea4a7> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:321)
  at com.hm.aidlserver.IRemoteService$1.add(IRemoteService.java:18)
  at com.hm.aidlserver.IMyAidlInterface$Stub.onTransact(IMyAidlInterface.java:55)
  at android.os.Binder.execTransact(Binder.java:565)

这里我们终于发现了原因,我们看到Binder:17773_2状态是Sleeping,就是服务端的Binder对象的add()方法内部第18行调用了Thread.sleep方法造成长时间无法返回,从而使客户端方法执行无法结束,最终导致ANR。
在这里插入图片描述

十一、总结

当ANR发生时,先通过logcat定位ANR的类型,然后通过trace信息分析产生ANR的原因,需要重点关注主线程(main)的当前状态和CPU的使用情况!当然ANR还是以预防为主,切记不要在主线程做耗时操作,不管是主动发起还是调用系统方法都需要对耗时的地方进行评估!

  • 17
    点赞
  • 13
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值