SBA of anr/watchdog timeout
anr和wathcdog timeout的分析很相似,主要是在bugreport/anr history中分析call stack过程.
其区别就是:watchdog timeout是从进程system_server(线程不固定,可以为main,foregroud等)为切点,anr是从app进程main线程为切入点.
下以watchdog案例分析.
1.查看是否发生anr/watchdog timeout,及其时间点
1.1 anr
发生在app进程
(1)main log中搜索"am_anr"
(2)bugreport中搜索"ANR in"
这里能知道是“哪个APP”发生ANR
1.2 watchdog timeout
发生在system_server进程
(1)main log中搜索"watchdog"
(2)bugreport中搜索"WATCHDOG KILLING SYSTEM PROCESS"
这里知道system_server进程的"哪个线程"发生watchdog
2.查看"上层"阻塞在哪个线程及哪个函数
2.1 线程
(1) anr
一定是阻塞在APP的main线程
(2)watchdog timeout
从"WATCHDOG KILLING SYSTEM PROCESS"这里能得到system_server blocked的线程.比如main,foregroud线程
[
#3928-12.3-wt_logs\3928-12.3-wt_logs\log_20181203110024\bugreport.txt
12-03 10:58:16.034 1000 2934 3945 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in handler on main thread (main) >>>>这个刚好是main线程
]
2.2 函数
对于Video stability,一般是调用MediaPlayer/MediaRecorder Java API,然后进行binder IPC.
搜索Cmd line: system_server或者APP name,可能出现多个,找时间点附近的call stack
[
#3928-12.3-wt_logs\3928-12.3-wt_logs\log_20181203110024\anr\anr_2018-12-03-10-57-21-569
----- pid 2934 at 2018-12-03 10:57:21 -----
Cmd line: system_server
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x729aa710 self=0x73064c2a00
| sysTid=2934 nice=-2 cgrp=default sched=0/0 handle=0x738b2769a8
...
native: #00 pc 000000000006a50c /system/lib64/libc.so (__ioctl+4)
native: #01 pc 0000000000024478 /system/lib64/libc.so (ioctl+136)
native: #02 pc 00000000000549e8 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+256) >>>>
...
native: #07 pc 000000000009b738 /system/lib64/libmedia.so (android::MediaPlayer::setDataSource(int, long, long)+292) >>>>
native: #08 pc 00000000000401e0 /system/lib64/libmedia_jni.so (???)
native: #09 pc 000000000073da08 /system/framework/arm64/boot-framework.oat (Java_android_media_MediaPlayer__1setDataSource__Ljava_io_FileDescriptor_2JJ+168)
at android.media.MediaPlayer._setDataSource(Native method) >>>>
...
at android.media.Ringtone.setUri(Ringtone.java:307)
>>>>这个是调用MediaPlayer::setDataSource,之后看到IPCThreadState::talkWithDriver,就表示在做binder IPC.
]
3.查看"上层"binder IPC到哪个进程及哪个线程
查看binder transaction信息,搜索“from <process1>:<thread1>“ to <process2>:<thread2>
正常全面的log,应该是有binder transaction的,如下;但有些customer log中无此信息,那就得知道MediaPlayer或者MediaRecorder flow的进程关系.
<
#BinderTraces.txt
proc 2934
context binder
thread 2934: l 10 need_return 0 tr 0
outgoing transaction 90748022: ffffffc033ad7980 from 2934:2934 to 2366:1316 code 1 flags 10 pri 0:118 r1 >>>>
这里是<process2>:<thread2>为2366:1316
>
4.查看"第二层"进程阻塞在哪个线程及哪个函数(通常是mediaserver进程)
4.1 找哪个线程是等锁,哪个线程是持有锁 => Mutex::Autolock lock(mLock);
call stack中函数下一个就是lock,说明是等锁
call stack中函数调用别的函数,说明是持有锁
[
----- pid 2366 at 2018-12-03 10:57:27 -----
Cmd line: /system/bin/mediaserver
//(No)sysTid=1316 is waiting for lock which is hold by sysTid=25863
"Binder:2366_A" sysTid=1316
#00 pc 0001901c /system/lib/libc.so (syscall+28)
#01 pc 00048145 /system/lib/libc.so (__pthread_mutex_lock_with_timeout(pthread_mutex_internal_t*, bool, timespec const*)+164) >>>>等MediaPlayerService锁
#02 pc 0003e64f /system/lib/libmediaplayerservice.so (android::MediaPlayerService::create(android::sp<android::IMediaPlayerClient> const&, audio_session_t)+170) >>>>
...
]
//(No) sysTid=25863 is waiting for lock which is hold by sysTid=9520
"Binder:2366_B" sysTid=25863
#00 pc 0001901c /system/lib/libc.so (syscall+28)
#01 pc 00048145 /system/lib/libc.so (__pthread_mutex_lock_with_timeout(pthread_mutex_internal_t*, bool, timespec const*)+164)
#02 pc 0004c989 /system/lib/libmediaplayerservice.so (android::StagefrightRecorder: