Hang Detect 问题分析案例

此题是 上层发生SWT 喂狗超时 hang detect 主动call 重启 

我司测试反馈跑monkey发生过重启现象

LOG 中有明显的db.fatal.00.KE  DB文件, 解压DB, 在exp_main 里面可以看到:

Flavor Info: 'None'
Exception Log Time:[Tue Nov 19 18:12:25 CST 2019] [17.91493]

Exception Class: Kernel (KE)
PC is at [<ffffff80087db8b4>] hang_detect_thread+0x20c/0x338
LR is at [<ffffff80087db8ac>] hang_detect_thread+0x204/0x338

Current Executing Process:
[hang_detect, 196][kthreadd, 2]

Backtrace:
[<ffffff800808a798>] bug_handler.part.2+0x40/0x80      
[<ffffff800808a7fc>] bug_handler+0x24/0x30     
[<ffffff80080829e8>] brk_handler+0x88/0xd0     
[<ffffff8008081704>] do_debug_exception+0x94/0x148     
[<ffffff800808470c>] el1_dbg+0x18/0x8c 
[<ffffff80080c4014>] kthread+0xec/0x100
[<ffffff8008084fa0>] ret_from_fork+0x10/0x30   
[<ffffffffffffffff>] 0xffffffffffffffff

从exp_main log种可看出此题是hang detect 主动call bug, 即hang detect 检测到android 上层或底层已经卡住,主动重启手机。

查看SYS_KERNEL_LOG hang detect 信息:

11-19 18:11:58.461 <6>[12513.115747] (0)[196:hang_detect][Hang_Detect] hang_detect thread counts down -1:14, status 1.

hang_detect counts 为14 (tick 420s, system_server 喂狗间隔为30s)说明420s前上层可能卡主即有SWT发生,

查看mtklog 有看到mtklog\data_aee_exp\db.fatal.00.SWT  此题可以定性为上层卡主引起的重启。

那么接下来分析SWT db

SWT信息如下:

Flavor Info: 'None'
Exception Log Time:[Tue Nov 19 18:04:06 CST 2019] [12041.303324]

Exception Class: SWT
Exception Type: system_server_watchdog

Current Executing Process: 
system_server


Trigger time:[2019-11-19 18:04:06.294283] pid:855

Backtrace: 
Process: system_server
Subject: Blocked in monitor com.freeme.server.am.FreemeActivityManagerService on foreground thread (android.fg)
Build: DOOV//Q6401:8.1.0/O11019/1572577497:user/release-keys

打开SWT_JBT_TRACES文件 查看该时间点system_server的trace

查看main线程 被tid=123的线程blocked

"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x73a44798 self=0x6f598c3a00
  | sysTid=855 nice=-2 cgrp=default sched=0/0 handle=0x6fdee5f9a8
  | state=S schedstat=( 67160603057 21558724792 225928 ) utm=4409 stm=2306 core=2 HZ=100
  | stack=0x7ffca6c000-0x7ffca6e000 stackSize=8MB
  | held mutexes=
  at com.android.server.am.ActivityManagerService.broadcastIntent(ActivityManagerService.java:20350)
  - waiting to lock <0x0e37dba6> (a com.freeme.server.am.FreemeActivityManagerService) held by thread 123
  at android.app.ActivityManager.broadcastStickyIntent(ActivityManager.java:4186)
  at android.app.ActivityManager.broadcastStickyIntent(ActivityManager.java:4176)
  at com.android.server.BatteryService$12.run(BatteryService.java:822)
  at android.os.Handler.handleCallback(Handler.java:790)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:164)
  at com.android.server.SystemServer.run(SystemServer.java:450)
  at com.android.server.SystemServer.main(SystemServer.java:290)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:846)

继续看tid=123的线程是什么状态

"Binder:855_1A" prio=5 tid=123 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x14a017a0 self=0x6f3f7d4c00
  | sysTid=8128 nice=-10 cgrp=default sched=0/0 handle=0x6f350d24f0
  | state=S schedstat=( 6537215621 1714416972 12079 ) utm=498 stm=154 core=5 HZ=100
  | stack=0x6f34fd8000-0x6f34fda000 stackSize=1005KB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/8128/stack)
  native: #00 pc 000000000006a824  /system/lib64/libc.so (__ioctl+4)
  native: #01 pc 00000000000240a0  /system/lib64/libc.so (ioctl+136)
  native: #02 pc 0000000000054b50  /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+260)
  native: #03 pc 00000000000558dc  /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+360)
  native: #04 pc 0000000000055614  /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+224)
  native: #05 pc 000000000004c318  /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+72)
  native: #06 pc 000000000011f12c  /system/lib64/libandroid_runtime.so (???)
  native: #07 pc 000000000091b5b4  /system/framework/arm64/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:764)
  at android.app.IActivityController$Stub$Proxy.activityStarting(IActivityController.java:172)
  at com.android.server.am.ActivityStarter.startActivity(ActivityStarter.java:460)
  at com.android.server.am.ActivityStarter.startActivityLocked(ActivityStarter.java:289)
  at com.android.server.am.ActivityStarter.startActivityMayWait(ActivityStarter.java:900)
  - locked <0x0e37dba6> (a com.freeme.server.am.FreemeActivityManagerService)
  at com.android.server.am.ActivityManagerService.startActivityAsUser(ActivityManagerService.java:4791)
  at com.android.server.am.ActivityManagerService.startActivity(ActivityManagerService.java:4727)
  at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:121)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:3062)
  at android.os.Binder.execTransact(Binder.java:697)

该线程状态为native,并且callstack中含有

IPCThreadState::talkWithDriver

IPCThreadState::waitForResponse

说明binder的对端有卡住

查看SYS_BINDER_INFO 看下对端是什么进程

 incoming transaction 23437272: ffffffc01a730a00 from 855:8128 to 21019:21031 code 1 flags 10 pri 0:110 r1 node 20769437 size 440:0 data ffffff8016000008,对端为21019:21031

可以看出对端为21019:21031

查看SYS_PROCESSES_AND_THREADS 进程信息

u:r:shell:s0                   shell        21019 21031  5300 3708136  90744 wait_woken 7eff2fa890 S  29 -10      -   0  fg  64   0 com.android.commands.monkey Binder:21019_2

查看tid=21031的进程信息

"Binder:21019_2" prio=5 tid=9 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x12e00950 self=0x7e7c625600
  | sysTid=21031 nice=-10 cgrp=default sched=0/0 handle=0x7e736454f0
  | state=S schedstat=( 42917455 13946540 59 ) utm=2 stm=1 core=5 HZ=100
  | stack=0x7e7354b000-0x7e7354d000 stackSize=1005KB
  | held mutexes=
  kernel: __switch_to+0x8c/0xb0
  kernel: wait_woken+0x4c/0xa8
  kernel: n_tty_write+0x3a8/0x460
  kernel: tty_write+0x188/0x2b0
  kernel: __vfs_write+0x34/0xf8
  kernel: vfs_write+0x94/0x1a8
  kernel: SyS_write+0x5c/0xc8
  kernel: el0_svc_naked+0x34/0x38
  native: #00 pc 000000000006b890  /system/lib64/libc.so (write+8)
  native: #01 pc 0000000000032024  /system/lib64/libjavacore.so (???)
  native: #02 pc 00000000002371c4  /system/framework/arm64/boot-core-libart.oat (Java_libcore_io_Linux_writeBytes__Ljava_io_FileDescriptor_2Ljava_lang_Object_2II+196)
  at libcore.io.Linux.writeBytes(Native method)
  at libcore.io.Linux.write(Linux.java:288)
  at libcore.io.BlockGuardOs.write(BlockGuardOs.java:345)
  at libcore.io.IoBridge.write(IoBridge.java:553)
  at java.io.FileOutputStream.write(FileOutputStream.java:326)
  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
  - locked <0x0ffc3d7e> (a java.io.BufferedOutputStream)
  at java.io.PrintStream.write(PrintStream.java:503)
  - locked <0x04cd5032> (a java.io.PrintStream)
  at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
  at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
  at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
  - locked <0x0696c2df> (a java.io.OutputStreamWriter)
  at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
  at java.io.PrintStream.write(PrintStream.java:554)
  - locked <0x04cd5032> (a java.io.PrintStream)
  at java.io.PrintStream.print(PrintStream.java:698)
  at java.io.PrintStream.println(PrintStream.java:835)
  - locked <0x04cd5032> (a java.io.PrintStream)
  at com.android.commands.monkey.Logger$1.println(Logger.java:27)
  at com.android.commands.monkey.Monkey$ActivityController.activityStarting(Monkey.java:272)
  at android.app.IActivityController$Stub.onTransact(IActivityController.java:62)
  at android.os.Binder.execTransact(Binder.java:697)

如上对端为monkey,比较特殊,用互接触不到 可无需关注

若继续深追此题,要check 发生swt前30s的io流情况和cpu情况,为什么monkey写入数据慢

可通过MTK perlyzer 工具 解析SYS_BLOCKIO,查看发生SWT前30s的io情况和cpu情况

  • 0
    点赞
  • 15
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 4
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 4
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

UTF-8XD

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值