FTM下执行full test,进入camera test卡住,稍后手机重启

遇到一个重启问题,比较简单,记录下分析思路,6580平台.
现象:
FTM (apk)下执行full test,到进入camera test的时候高概率出现卡顿死机现象。

第一感觉是camera test模块出了问题,拿到mtklog,查看aee_exp_backup目录,发现有SWI跟KE出现,使用最新版本GAT工具解开SWI和KE 的db.
首先看NE的db解析:
Exception Class: SWT
Exception Type: system_server_watchdog

Current Executing Process: 
system_server

Backtrace: 
Process: system_server
Subject: Blocked in handler on PowerManagerService (PowerManagerService)
Build: TINNO/k300/k300:7.0/NRD90M/660307543273:user/release-keys

上面很明显提示是PowerManagerService发生了block,查看当前该线程的backtrace
"PowerManagerService" prio=5 tid=22 Native
  | group="main" sCount=1 dsCount=0 obj=0x12e11100 self=0x91109500
  | sysTid=846 nice=0 cgrp=default sched=0/0 handle=0x90a8c920
  | state=D schedstat=( 1992630486 1878084775 7956 ) utm=104 stm=95 core=2 HZ=100
  | stack=0x9098a000-0x9098c000 stackSize=1038KB
  | held mutexes=
  at android.hardware.SystemSensorManager$BaseEventQueue.nativeDisableSensor(Native method)
  at android.hardware.SystemSensorManager$BaseEventQueue.disableSensor(SystemSensorManager.java:625)
  at android.hardware.SystemSensorManager$BaseEventQueue.removeAllSensors(SystemSensorManager.java:555)
  at android.hardware.SystemSensorManager.unregisterListenerImpl(SystemSensorManager.java:184)
  - locked <0x0881207f> (a java.util.HashMap)
  at android.hardware.SensorManager.unregisterListener(SensorManager.java:663)
  at com.android.server.display.AutomaticBrightnessController.setLightSensorEnabled(AutomaticBrightnessController.java:329)
  at com.android.server.display.AutomaticBrightnessController.configure(AutomaticBrightnessController.java:250)
  at com.android.server.display.DisplayPowerController.updatePowerState(DisplayPowerController.java:732)
  at com.android.server.display.DisplayPowerController.-wrap6(DisplayPowerController.java:-1)
  at com.android.server.display.DisplayPowerController$DisplayControllerHandler.handleMessage(DisplayPowerController.java:1410)
  at android.os.Handler.dispatchMessage(Handler.java:110)

是这里发生卡住了导致软件看门狗复位了么?那么在Android log中应该也有相关线索,搜索SYS_ANDROID_LOG中的watchdog:
01-01 08:14:02.861   807  1065 E Watchdog: **SWT happen **Blocked in handler on PowerManagerService (PowerManagerService)
果然,从上面PowerManagerService的call stack可以看出这里是自动调节背光亮度线程在做als sensor的disable操作,该调用属于同步调用,但是一直没有等回来调用的结果,所以导致PowerManagerService被blcok住,进而导致system_server发生SWI异常!
从里看出本题其实跟camera test没有什么直接关系,所以说眼睛时常容易骗人,需要具体分析才能得出比较靠谱的结论。

这里有一个明显的关键问题,为什么自动背光去操作disable sensor会出现等不到结果呢?按理说是完全没问题的才对啊,从操作系统的调度上讲,一个线程(到内核都是线程)不在运行了,通常是有这些状态:停止态、浅度睡眠态(柱塞)、深度睡眠态,从当前的案例来说,应该明显属于浅度睡眠态,什么样的情况会导致线程进入睡眠呢?最容易想到的就是要拿锁等资源的情况,这种竞争态就容易出现这样的情况。另外这里有一个细节,就是ftm apk 做full test的时候,先执行als/ps test再执行camera test,也就是说执行camera test之前已经执行过了als/ps test,而这个就很可能给自动背光的操作冲突!为了验证猜想,需要解开KE db做进一步分析。
/* 分析调用栈的目的是为了分析当前进程在做什么动作,发生了什么事情  */

解开KE db,当前调用栈显示是发生了und 未定义指令异常:
Exception Class: Kernel (KE)
PC is at [<c0745ab0>] hang_detect_thread+0x240/0x2e8
LR is at [<c0ab494c>] _raw_spin_unlock_irqrestore+0x30/0x5c

Current Executing Process:
[hang_detect, 119][kthreadd, 2]

Backtrace:
[<c01001cc>] do_undefinstr+0x1a4/0x1ec 
[<c010c94c>] __und_svc_finish+0x0/0x34 
[<c0745ab0>] hang_detect_thread+0x240/0x2e8    
[<c013d634>] kthread+0xe8/0xfc 
[<c01070a0>] ret_from_fork+0x14/0x34   
[<ffffffff>] 0xffffffff

und指令异常,看不出具体什么问题,继续查看异常栈的backtrace,发现发生了典型的 kernel panic :
[ 1538.718486] -(1)[119:hang_detect]Backtrace: 
[ 1538.718808] -(1)[119:hang_detect][<c0745870>] (hang_detect_thread) from [<c013d634>] (kthread+0xe8/0xfc)
[ 1538.718950] -(1)[119:hang_detect] r9:00000000 r8:00000000 r7:c0745870 r6:00000000 r5:00000000 r4:ddc7f500
[ 1538.719609] -(1)[119:hang_detect][<c013d54c>] (kthread) from [<c01070a0>] (ret_from_fork+0x14/0x34)
[ 1538.719751] -(1)[119:hang_detect] r7:00000000 r6:00000000 r5:c013d54c r4:ddc7f500
[ 1538.720282] -(1)[119:hang_detect]Code: e34c00de ebffe2ce e3070530 ebe8f9b2 (e7f001f2) 
[ 1538.720498] -(1)[119:hang_detect]---[ end trace 81ce5b8f89369f28 ]---
[ 1539.673347] -(1)[119:hang_detect]Kernel panic - not syncing: Fatal exception

从main log知道是在做disable als sensor的操作导致的,追查kernel log中sensor的log信息发现有明显的异常backtrace:
[ 1501.504953]  (1)[119:hang_detect]Backtrace: 
[ 1501.504972]  (1)[119:hang_detect][<c0ab0ad8>] (__schedule) from [<c0ab125c>] (schedule+0x38/0x84)
[ 1501.504980]  (1)[119:hang_detect] r10:00000001 r9:d433e018 r8:c11313c0 r7:c1019600 r6:0001d55f r5:d433fd9c
[ 1501.505007]  (1)[119:hang_detect] r4:c11313c0
[ 1501.505025]  (1)[119:hang_detect][<c0ab1224>] (schedule) from [<c0ab3c8c>] (schedule_timeout+0x12c/0x1f0)
[ 1501.505042]  (1)[119:hang_detect][<c0ab3b60>] (schedule_timeout) from [<c01841b0>] (msleep+0x34/0x40)
[ 1501.505050]  (1)[119:hang_detect] r9:c104a888 r8:c104a974 r7:c115d64c r6:c115d64c r5:00000002 r4:d433e000
[ 1501.505085]  (1)[119:hang_detect][<c018417c>] (msleep) from [<c042a7c0>] (ltr553_als_operate+0x134/0x260)
[ 1501.505093]  (1)[119:hang_detect] r5:ddef6474 r4:ddef6400
[ 1501.505115]  (1)[119:hang_detect][<c042a68c>] (ltr553_als_operate) from [<c0423670>] (hwmsen_work_func+0x1a0/0x50c)
[ 1501.505122]  (1)[119:hang_detect] r7:c042a68c r6:00000004 r5:00000000 r4:00000010
[ 1501.505151]  (1)[119:hang_detect][<c04234d0>] (hwmsen_work_func) from [<c01380f0>] (process_one_work+0x158/0x45c)
[ 1501.505159]  (1)[119:hang_detect] r10:00000008 r9:00000000 r8:defb2f00 r7:c109cf58 r6:defacd80 r5:cd060f80
[ 1501.505186]  (1)[119:hang_detect] r4:dde92f2c
[ 1501.505203]  (1)[119:hang_detect][<c0137f98>] (process_one_work) from [<c0138444>] (worker_thread+0x50/0x490)
[ 1501.505211]  (1)[119:hang_detect] r10:00000008 r9:cd060f80 r8:cd060f98 r7:c109ccb4 r6:d433e020 r5:defacd94
[ 1501.505237]  (1)[119:hang_detect] r4:defacd80
[ 1501.505255]  (1)[119:hang_detect][<c01383f4>] (worker_thread) from [<c013d634>] (kthread+0xe8/0xfc)
[ 1501.505263]  (1)[119:hang_detect] r10:00000000 r9:00000000 r8:00000000 r7:c01383f4 r6:cd060f80 r5:00000000
[ 1501.505289]  (1)[119:hang_detect] r4:d301d680
[ 1501.505308]  (1)[119:hang_detect][<c013d54c>] (kthread) from [<c01070a0>] (ret_from_fork+0x14/0x34)
[ 1501.505316]  (1)[119:hang_detect] r7:00000000 r6:00000000 r5:c013d54c r4:d301d680
[ 1501.505350]  (1)[119:hang_detect]PowerManagerSer D
[ 1501.505362]  (1)[119:hang_detect] c0ab0e1c [ 1501.505376]  (1)[119:hang_detect]    0  6355   6077 0x00000001
[ 1501.505385]  (1)[119:hang_detect]Backtrace: 
[ 1501.505404]  (1)[119:hang_detect][<c0ab0ad8>] (__schedule) from [<c0ab125c>] (schedule+0x38/0x84)
[ 1501.505412]  (1)[119:hang_detect] r10:c104b364 r9:ce9e4018 r8:00000002 r7:c7fa1200 r6:ffffffff r5:c104b360
[ 1501.505438]  (1)[119:hang_detect] r4:c104b35c
[ 1501.505457]  (1)[119:hang_detect][<c0ab1224>] (schedule) from [<c0ab12c0>] (schedule_preempt_disabled+0x18/0x24)
[ 1501.505473]  (1)[119:hang_detect][<c0ab12a8>] (schedule_preempt_disabled) from [<c0ab2b58>] (__mutex_lock_slowpath+0xb8/0x194)
[ 1501.505489]  (1)[119:hang_detect][<c0ab2aa0>] (__mutex_lock_slowpath) from [<c0ab2c88>] (mutex_lock+0x54/0x58)
[ 1501.505497]  (1)[119:hang_detect] r10:00000000 r9:00000001 r8:00000028 r7:dde92f00 r6:00000004 r5:00000000
[ 1501.505523]  (1)[119:hang_detect] r4:c104b35c
[ 1501.505541]  (1)[119:hang_detect][<c0ab2c34>] (mutex_lock) from [<c0422b04>] (hwmsen_enable+0x68/0x3e8)
[ 1501.505549]  (1)[119:hang_detect] r4:00000010 r3:dded66c0
[ 1501.505571]  (1)[119:hang_detect][<c0422a9c>] (hwmsen_enable) from [<c04232b8>] (hwmsen_unlocked_ioctl+0x434/0x64c)
[ 1501.505579]  (1)[119:hang_detect] r9:9d8abf54 r8:c4142338 r7:9d8abf54 r6:9d8abf54 r5:00000000 r4:c115d604
[ 1501.505613]  (1)[119:hang_detect][<c0422e84>] (hwmsen_unlocked_ioctl) from [<c02316f8>] (do_vfs_ioctl+0x3e4/0x5a8)
[ 1501.505621]  (1)[119:hang_detect] r10:00000000 r9:9d8abf54 r8:c4142338 r7:40049104 r6:9d8abf54 r5:c7eb0480
[ 1501.505647]  (1)[119:hang_detect] r4:c7eb0481
[ 1501.505665]  (1)[119:hang_detect][<c0231314>] (do_vfs_ioctl) from [<c0231930>] (SyS_ioctl+0x74/0x84)
[ 1501.505672]  (1)[119:hang_detect] r10:00000000 r9:0000004a r8:00000000 r7:40049104 r6:9d8abf54 r5:c7eb0480
[ 1501.505699]  (1)[119:hang_detect] r4:c7eb0481
[ 1501.505717]  (1)[119:hang_detect][<c02318bc>] (SyS_ioctl) from [<c0107000>] (ret_fast_syscall+0x0/0x38)
[ 1501.505725]  (1)[119:hang_detect] r9:ce9e4000 r8:c01071a4 r7:00000036 r6:00000000 r5:a981138c r4:a9811380
[ 1501.505874]  (1)[119:hang_detect][Hang_Detect] dump init all thread bt
[ 1501.505885]  (1)[119:hang_detect]init            S
[ 1501.505896]  (1)[119:hang_detect] c0ab0e1c [ 1501.505911]  (1)[119:hang_detect]    0     1      0 0x00000000

上面有明显的提示问题所在函数了,需要分析下驱动代码,调出当前调用栈的als/ps驱动代码 ltr553.c:
3171 int ltr553_als_operate(void* self, uint32_t command, void* buff_in, int size_in,
3172                        void* buff_out, int size_out, int* actualout)
3173 {
...
3238             static  int  count_flag = 0;
3239             while((ltr553_obj->als_enable == 0)&&(count_flag < 12))
3240             {
3241                msleep(100);  /* 执行als disable的时候会跑这里,这段代码设计好像不合理? */
3242                count_flag++ ;
3243                APS_ERR(" count_flag = %d \n " ,count_flag);
3244             }

到现在就可以撸一撸了:从上面的调用栈关系结合代码可以看到当前hwmsen_enable()(此处是自动背光disable操作)会要去拿mutex_lock,如果拿不到就会陷入阻塞(主动释放CPU资源进入浅度睡眠),而该mutex_lock目前可以判断已经被ltr553_als_operate()的线程(用户空间对应是ftm als test)拿住,且该线程已经进入了msleep(100)*12的睡眠等待,但是这个也才1.2s啊,也不会导致SWT啊,SWT需要30s? 这里有一个细节就是上面的这个count_flag是个static类型的变量,这说明它是属于线程间共享的内存,也就是说,如果两个线程同时操作这个代码就有可能导致这个while循环一直出不来,从而导致timeout,分析上下文后发现确实会有这种情况出现,所以导致来自powerManagerServicehwmsen_enable()迟迟拿不到互斥锁mutex_lock陷入睡眠,最终导致system_server进程触发SWT异常.

解决方案

找相关负责人评估修改als/ps驱动的disable逻辑部分上面的while部分代码,需要优化.

  • 0
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值