Android crash caused by a segmentation fault inside the TEE OS

Symptom
Touch the fingerprint sensor five times with the wrong finger so that fingerprint unlock gets locked out, press the power key to turn the screen off and back on, then unlock manually with the pattern: the device occasionally hangs at this point. The issue does not reproduce while a USB cable is plugged in.

Analysis
Exception Class: SWT
Exception Type: system_server_watchdog
Current Executing Process: 
system_server
Trigger time:[2018-01-02 12:51:47.93449] pid:631

Backtrace: 
Process: system_server
Subject: Blocked in handler on main thread (main)
Build: alps/full_k400/k400:8.1.0/O11019/1516598058:userdebug/dev-keys

main thread
This is the main thread of the system_server process; it is waiting for lock 0x0ae821b0:
"main" prio=5  tid=1  Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x70e1c9f8 self=0xabe2d000
  |  sysTid=631  nice=-2 cgrp=default sched=0/0 handle=0xb046e4a4
  | state=S schedstat=( 9786847713 6915040794 14170 ) utm=749 stm=229 core=0 HZ=100
  | stack=0xbe0c8000-0xbe0ca000 stackSize=8MB
  | held mutexes=
  at com.android.server.locksettings.LockSettingsService.havePattern(LockSettingsService.java:990)
  - waiting to lock <0x0ae821b0> (a com.android.server.locksettings.SyntheticPasswordManager) held by thread 111
  at com.android.internal.widget.LockPatternUtils.savedPatternExists(LockPatternUtils.java:536)
  at com.android.internal.widget.LockPatternUtils.isLockPatternEnabled(LockPatternUtils.java:1157)

Searching for 0x0ae821b0:
This is a binder thread of the system_server process, named Binder:631_19; its kernel pid (sysTid) is 4148.
"Binder: 631_19 " prio=5  tid=111  Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x1358ab20 self=0x94c09800
  | sysTid=4148  nice=0 cgrp=default sched=0/0 handle=0x8b836970
  | state=S schedstat=( 2006656770 1821935538 4272 ) utm=152 stm=48 core=2 HZ=100
  | stack=0x8b73c000-0x8b73e000 stackSize=1006KB
  | held mutexes=
  kernel: binder_thread_read+0x458/0x1054
  kernel: binder_ioctl+0x46c/0x9e8
  kernel: do_vfs_ioctl+0x9c/0x69c
  kernel: SyS_ioctl+0x54/0x78
  native: #00 pc 000493a4  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 0001df8f  /system/lib/libc.so (ioctl+38)
  native: #02 pc 0004242f  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+170)
  native: #03 pc 00042de9  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
  native: #04 pc 0003d2e5  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
  native: #05 pc 000bcdf5  /system/lib/libandroid_runtime.so (???)
  native: #06 pc 00298fb5  /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:764)
  at android.service.gatekeeper.IGateKeeperService$Stub$Proxy.verifyChallenge(IGateKeeperService.java:253)
  at com.android.server.locksettings.SyntheticPasswordManager.unwrapPasswordBasedSyntheticPassword(SyntheticPasswordManager.java:810)
  at com.android.server.locksettings.LockSettingsService.spBasedDoVerifyCredential(LockSettingsService.java:2105)
  - locked <0x0ae821b0> (a com.android.server.locksettings.SyntheticPasswordManager)
  at com.android.server.locksettings.LockSettingsService.doVerifyCredential(LockSettingsService.java:1553)
  at com.android.server.locksettings.LockSettingsService.checkCredential(LockSettingsService.java:1526)
  at com.android.internal.widget.ILockSettings$Stub.onTransact(ILockSettings.java:164)

A binder call is normally synchronous, and that is clearly the case here: the call stack shows that binder client thread 4148 has been waiting on a binder call for a long time and never receives a reply from the binder server. So which process is the binder server?
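Why can a synchronous binder call block forever? Below is a minimal, hedged sketch (simplified, not a complete binder client and not the framework's actual code) of the loop a client thread sits in: it keeps re-entering ioctl(BINDER_WRITE_READ), which is exactly the binder_thread_read/__ioctl frames in the stack above, until the driver hands it a BR_REPLY. If the server never replies, the thread never comes back.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/android/binder.h>   /* kernel uapi binder header */

/* Simplified sketch: wait for the reply to a transaction that has already
 * been written with BC_TRANSACTION. Real code also parses payloads and
 * handles many more return commands; only the blocking loop is shown. */
static int wait_for_reply(int binder_fd)
{
	uint32_t readbuf[32];
	struct binder_write_read bwr = {
		.read_size   = sizeof(readbuf),
		.read_buffer = (uintptr_t)readbuf,
	};

	for (;;) {
		/* Blocks inside binder_thread_read() in the kernel until the
		 * server process queues something for this thread. */
		if (ioctl(binder_fd, BINDER_WRITE_READ, &bwr) < 0)
			return -1;

		if (bwr.read_consumed >= sizeof(uint32_t) && readbuf[0] == BR_REPLY)
			return 0;	/* reply arrived, transact() can return */

		/* BR_NOOP, BR_TRANSACTION_COMPLETE, ...: keep waiting */
		bwr.read_consumed = 0;
	}
}

In this hang, gatekeeperd never produces that BR_REPLY, so thread 4148 stays in this wait while still holding the SyntheticPasswordManager monitor.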
Analyzing the binder proc information (searching for 4148):
proc 631
context binder
thread 4148: l 11 need_return 0 tr 0
outgoing transaction 386736: dde3c180 from 631:4148 to 314:314 code 3 flags 10 pri 0:120 r1
Clearly, the binder server is the process with pid 314. Searching for process 314:
Line 108828: <4>[ 1412.757750] -(2)[899:watchdog] gatekeeperd 314 273297.350405 292 120 273297.350405 131.026859 1306310.341713 /
So the binder thread with sysTid=4148 never got a reply from process 314, gatekeeperd. It sat in that transaction (still holding the SyntheticPasswordManager lock) for so long that the main thread of system_server blocked past the watchdog timeout, and the watchdog finally sent kill -9 to pid 631:
<7>[ 1414.774647] -(1)[899:watchdog][name:mtprof&][signal][899:watchdog] send death sig 9 to [631:system_server:R]
<6>[ 1415.102373] (2)[4300:kworker/u8:9]binder: release 631:4148 transaction 386736 out, still active
<6>[ 1415.102398] (2)[4300:kworker/u8:9]binder: release 631:4148 transaction 386652 in, still active

The next step is to work out why the gatekeeperd thread is stuck (in a deep sleep) and no longer making progress.
The kernel log contains no call stack for gatekeeperd. In principle the watchdog dumps the call stacks of every blocked process, so what is going on?
Going through the D-state processes, there is a call stack for gatekeeper@1.0-; the gatekeeperd process talks to the kernel through its HAL, gatekeeper@1.0-.
So if gatekeeper@1.0- is stuck, gatekeeperd is stuck as well.
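As a quick way to reproduce this step, the sketch below (an illustrative helper, not part of the original report) walks /proc and prints every task whose state field is 'D' (uninterruptible sleep), which is how gatekeeper@1.0- and, later, android.hardwar stand out.

#include <stdio.h>
#include <dirent.h>
#include <ctype.h>

/* List tasks in uninterruptible sleep: read "pid (comm) state" from
 * /proc/<pid>/stat and print the ones whose state is 'D'. */
int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;

	while (proc && (de = readdir(proc)) != NULL) {
		char path[64], comm[64] = "";
		char state = '?';
		FILE *f;

		if (!isdigit((unsigned char)de->d_name[0]))
			continue;
		snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (fscanf(f, "%*d (%63[^)]) %c", comm, &state) == 2 && state == 'D')
			printf("%s\t%s\tD\n", de->d_name, comm);
		fclose(f);
	}
	if (proc)
		closedir(proc);
	return 0;
}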

gatekeeper@1.0-
<6>[ 1412.710599]  (2)[899:watchdog] gatekeeper@1.0-   D  c0958ff8     0   261      1 0x00000000
<4>[ 1412.710618]  (2)[899:watchdog]Backtrace: 
<4>[ 1412.710638]  (2)[899:watchdog][<c0958c08>] (__schedule) from [<c0959398>] (schedule+0x90/0x9c)
<4>[ 1412.710647]  (2)[899:watchdog] r10:c0e1fe48 r9:00000007 r8:c0e1fe48 r7:dd975480
<4>[ 1412.710681]  (2)[899:watchdog][<c0959308>] (schedule) from [<c095bf44>] (schedule_timeout+0x2c/0x214)
<4>[ 1412.710698]  (2)[899:watchdog][<c095bf18>] (schedule_timeout) from [<c095b244>] (__down+0x78/0xcc)
<4>[ 1412.710707]  (2)[899:watchdog] r9:00000007 r8:00000002 r7:dd975480 r6:7fffffff
<4>[ 1412.710743]  (2)[899:watchdog][<c095b1cc>] (__down) from [<c01669e4>] (down+0x34/0x44)
<4>[ 1412.710752]  (2)[899:watchdog] r8:aa0b0000 r7:00005403 r6:c0e1fe48 r5:80000013
<4>[ 1412.710788]  (2)[899:watchdog][<c01669b0>] (down) from [<c05def3c>] (fp_ioctl+0x54/0x498)
<4>[ 1412.710797]  (2)[899:watchdog] r5:00005403 r4:aa0b0000
<4>[ 1412.710824]  (2)[899:watchdog][<c05deee8>] (fp_ioctl) from [<c023abdc>] (do_vfs_ioctl+0x9c/0x69c)
<4>[ 1412.710833]  (2)[899:watchdog] r9:00000007 r8:aa0b0000 r7:00005403 r6:c7509780
<4>[ 1412.710867]  (2)[899:watchdog][<c023ab40>] (do_vfs_ioctl) from [<c023b230>] (SyS_ioctl+0x54/0x78)
<4>[ 1412.710876]  (2)[899:watchdog] r10:00000000 r9:00000007 r8:aa0b0000 r7:00005403

Analyzing the fp_ioctl code, which lives in the kernel TEE driver:
static long fp_ioctl(struct file *filp, unsigned cmd, unsigned long arg)
{
	unsigned int args_len = 0;
	unsigned int fp_cid = 0xFF;
	unsigned int fp_fid = 0xFF;
	unsigned char args[16] = {0};
	unsigned int buff_len = 0;

	if (_IOC_TYPE(cmd) != TEEI_IOC_MAGIC)
		return -ENOTTY;
	if (_IOC_NR(cmd) > TEEI_IOC_MAXNR)
		return -ENOTTY;

	down(&fp_api_lock);
The code, combined with the call stack, shows that this thread is blocked because it cannot get the fp_api_lock semaphore. We now need to find out who is holding that lock, so keep going through the D-state processes.
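Before looking at the lock holder, here is a hedged reconstruction of the pattern at work (the vendor driver source is not available here, so the surrounding code and the send_fp_command() signature are assumptions): fp_ioctl() serializes every fingerprint command behind a single semaphore, so one caller that never returns keeps all later callers parked in down(&fp_api_lock).

#include <linux/semaphore.h>
#include <linux/fs.h>

static DEFINE_SEMAPHORE(fp_api_lock);	/* one fingerprint command in flight at a time */

static long send_fp_command(unsigned int cmd, unsigned long arg);	/* assumed signature */

static long fp_ioctl_sketch(struct file *filp, unsigned int cmd, unsigned long arg)
{
	long ret;

	down(&fp_api_lock);              /* gatekeeper@1.0- is asleep on this line */
	ret = send_fp_command(cmd, arg); /* android.hardwar is asleep inside here */
	up(&fp_api_lock);                /* never reached while the TEE does not answer */
	return ret;
}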

android.hardwar
Going through the D-state processes, android.hardwar looks suspicious. Putting its call stack next to the code above shows that it is the one holding the fp_api_lock semaphore: it entered send_fp_command and never returned, so it held the lock indefinitely, and gatekeeperd could never acquire it and got stuck.
<6>[ 1412.710976]  (2)[899:watchdog] android.hardwar D  c0958ff8     0   630      1 0x00000000
<4>[ 1412.710996]  (2)[899:watchdog]Backtrace: 
<4>[ 1412.711016]  (2)[899:watchdog][<c0958c08>] (__schedule) from [<c0959398>] (schedule+0x90/0x9c)
<4>[ 1412.711025]  (2)[899:watchdog] r10:c0e1fe48 r9:d7480000 r8:c0e1fe48 r7:dd1bb400
<4>[ 1412.711060]  (2)[899:watchdog][<c0959308>] (schedule) from [<c095bf44>] (schedule_timeout+0x2c/0x214)
<4>[ 1412.711076]  (2)[899:watchdog][<c095bf18>] (schedule_timeout) from [<c095b244>] (__down+0x78/0xcc)
<4>[ 1412.711085]  (2)[899:watchdog] r9:d7480000 r8:00000002 r7:dd1bb400 r6:7fffffff
<4>[ 1412.711120]  (2)[899:watchdog][<c095b1cc>] (__down) from [<c01669e4>] (down+0x34/0x44)
<4>[ 1412.711129]  (2)[899:watchdog] r8:00000051 r7:c100d2f4 r6:00000000 r5:60010013
<4>[ 1412.711167]  (2)[899:watchdog][<c01669b0>] (down) from [<c05d8974>] (send_fp_command+0xe8/0x194)
<4>[ 1412.711176]  (2)[899:watchdog] r5:c100d178 r4:c0e1fe48
<4>[ 1412.711202]  (2)[899:watchdog][<c05d888c>] (send_fp_command) from [<c05df158>] (fp_ioctl+0x270/0x498)
<4>[ 1412.711211]  (2)[899:watchdog] r6:c0e1fe48 r5:00002418 r4:b387f7a8
<4>[ 1412.711240]  (2)[899:watchdog][<c05deee8>] (fp_ioctl) from [<c023abdc>] (do_vfs_ioctl+0x9c/0x69c)
<4>[ 1412.711249]  (2)[899:watchdog] r9:00000005 r8:b387f7a8 r7:00005402 r6:db10b840

What this call stack amounts to is the Linux kernel using the SMC call instruction to communicate with the TEE OS (via EL3), but unfortunately no reply ever comes back.
So the next step is to work out why the TEE OS did not return.
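A hedged sketch of the wait pattern implied by the android.hardwar stack (the down() frame called from send_fp_command()): the driver hands the command to the thread that performs the SMC into the secure world and then sleeps on a second semaphore that is only released when the TEE answers. All names except down()/up() below are assumptions for illustration.

#include <linux/semaphore.h>

static struct semaphore fp_done_sem;	/* assumed: sema_init(&fp_done_sem, 0) at probe time,
					   released by the SMC return path */

/* Assumed helper: queue the buffer for teei_switch_thr, which issues the
 * SMC into the secure world and, on return, calls up(&fp_done_sem). */
void queue_cmd_for_smc(void *cmd_buf, unsigned int len);

static long send_fp_command_sketch(void *cmd_buf, unsigned int len)
{
	queue_cmd_for_smc(cmd_buf, len);
	/* This is the down() seen in the android.hardwar backtrace: if the
	 * TEE never comes back, up() is never called and the task sleeps in
	 * D state forever, still holding fp_api_lock. */
	down(&fp_done_sem);
	return 0;	/* result handling omitted */
}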

TZ_LOG
<7>[ 1278.054685] -(0)[52:cfinteractive][name:mt_freqhopping&](PLL_CON1): 0x9a000
<7>[ 1278.054733] -(0)[52:cfinteractive][name:mt_freqhopping&]PLL_CON1: 0x000c8000
<7>[ 1278.054768]  (0)[52:cfinteractive][name:mt_cpufreq&][Power/cpufreq] @_cpufreq_set_locked(): Vproc = 1200mv, freq = 1300000 KHz
<4>[ 1278.055076] -(0)[139:teei_switch_thr][TZ_LOG] [01;32mREEagent| [E][tee_listener.cc:110/smc_listener_list_driver_handle]____________________________ smc_listener_list_driver_handle : 110 _______________________________[0m
<4>[ 1278.055093] -(0)[139:teei_switch_thr][TZ_LOG] [01;32mREEagent| [0m
<4>[ 1278.055108] -(0)[139:teei_switch_thr][TZ_LOG] [01;32mREEagent| [E][tee_listener.cc:123/smc_listener_list_driver_handle]____________________________ smc_listener_list_driver_handle : 123 _______________________________[0m
<4>[ 1278.055119] -(0)[139:teei_switch_thr][TZ_LOG] [01;32mREEagent| [0m
<4>[ 1278.055130] -(0)[139:teei_switch_thr][TZ_LOG] [01;31mFP_S    | Doing SIGSEGV [0m
<4>[ 1278.055142] -(0)[139:teei_switch_thr][TZ_LOG] [01;31mFP_S    | Invalid user memory for sigframe ...[0m
<2>[ 1278.559001]  (0)[93:hps_main]Boot slave CPU

The log shows that a SIGSEGV occurred inside the TEE, so this needs to be handed over to the TEE vendor for analysis.

Root cause

When a CPU core was powered off, its D-cache was not flushed as the ARM architecture requires, which broke memory coherency.
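For reference, a hedged illustration of the expected behaviour (not the vendor's actual fix): on 32-bit ARM, the platform's CPU power-down hook is supposed to clean the dying core's data cache and leave SMP coherency before power is cut, for example with the v7_exit_coherency_flush() helper from arch/arm/include/asm/cacheflush.h. If that step is skipped, dirty cache lines (including memory shared with the secure world) can be lost or seen stale by the TEE, which is consistent with the SIGSEGV above. The hook name below is hypothetical.

#include <asm/cacheflush.h>

static void example_cpu_die(unsigned int cpu)	/* hypothetical hotplug hook */
{
	/*
	 * Clean/invalidate the L1 data cache to the Level of Unification
	 * Inner Shareable and drop out of SMP coherency, as the ARM rules
	 * for taking a core offline require.
	 */
	v7_exit_coherency_flush(louis);

	/* platform-specific: ask the power controller to cut power to this core */
}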

