1、系统无法进入suspend
系统没有进入suspend,主要的原因是因为系统有锁导致.
锁一般分为:APP透过PowerManager拿锁,以及kernel wakelock.
分析上层持锁的问题:
目前PowerManagerService的log 默认不会打开,可以通过修改:
/frameworks/base/services/core/java/com/android/server/power/PowerManagerService.java
private static final boolean DEBUG = true; private static final boolean DEBUG_SPEW = DEBUG && false; 修改为: private static final boolean DEBUG = true; private static final boolean DEBUG_SPEW = true; 打开上层的log
通过syslog:搜索关键字:total_time=来确定持锁的时间.
PowerManagerService: releaseWakeLockInternal: lock=31602562 [*job*/DownloadManager:com.android.providers.downloads], flags=0x0, total_time=600051ms
或者通过正则表达式:total_time=[\d]{4,}ms 过滤出持锁时间比较长的锁.
PowerManagerService: releaseWakeLockInternal: lock=31602562 [*job*/DownloadManager:com.android.providers.downloads], flags=0x0, total_time=600051ms
PowerManagerService: releaseWakeLockInternal: lock=56317918 [*job*/DownloadManager:com.android.providers.downloads], flags=0x0, total_time=283062ms
PowerManagerService: releaseWakeLockInternal: lock=216012597 [AudioMix], flags=0x0, total_time=120003ms
PowerManagerService: releaseWakeLockInternal: lock=41036921 [AudioMix], flags=0x0, total_time=167984ms
PowerManagerService: releaseWakeLockInternal: lock=70859243 [GsmInboundSmsHandler], flags=0x0, total_time=3206ms
PowerManagerService: releaseWakeLockInternal: lock=242046348 [AudioMix], flags=0x0, total_time=122205ms
2 如何分析wakelock(wakeup source)持锁问题
锁一般分为:APP透过PowerManager拿锁,以及kernel wakelock.
分析上层持锁的问题:
目前PowerManagerService的log 默认不会打开,可以通过修改:
/frameworks/base/services/core/java/com/android/server/power/PowerManagerService.java
private static final boolean DEBUG = true; private static final boolean DEBUG_SPEW = DEBUG && false; 修改为: private static final boolean DEBUG = true; private static final boolean DEBUG_SPEW = true; 打开上层的log
通过syslog:搜索关键字:total_time=来确定持锁的时间.
PowerManagerService: releaseWakeLockInternal: lock=31602562 [*job*/DownloadManager:com.android.providers.downloads], flags=0x0, total_time=600051ms
或者通过正则表达式:total_time=[\d]{4,}ms 过滤出持锁时间比较长的锁.
PowerManagerService: releaseWakeLockInternal: lock=31602562 [*job*/DownloadManager:com.android.providers.downloads], flags=0x0, total_time=600051ms
PowerManagerService: releaseWakeLockInternal: lock=56317918 [*job*/DownloadManager:com.android.providers.downloads], flags=0x0, total_time=283062ms
PowerManagerService: releaseWakeLockInternal: lock=216012597 [AudioMix], flags=0x0, total_time=120003ms
PowerManagerService: releaseWakeLockInternal: lock=41036921 [AudioMix], flags=0x0, total_time=167984ms
PowerManagerService: releaseWakeLockInternal: lock=70859243 [GsmInboundSmsHandler], flags=0x0, total_time=3206ms
PowerManagerService: releaseWakeLockInternal: lock=242046348 [AudioMix], flags=0x0, total_time=122205ms
kernel的锁默认不会打印出来,一般是待机结束后通过节点来获取:
adb shell cat /sys/kernel/debug/wakeup_sources > wakeup_sources.log
active_count:对应wakeup source被激活的次数.
event_count:被信号唤醒的次数
wakeup_count:中止suspend的次数.
expire_count:对应wakeup source超时的次数.
active_since:上一次还活跃的时间点.时间单位跟kernel log前缀时间是一样(kernel单调递增时间).
total_time:对应wakeup source活跃的总时长.
max_time:对应的wakeup source持续活跃最长的一次时间.
last_change:上一次wakeup source变化的时间(从持锁到释放or释放到持锁),时间单位跟kernel log前缀时间是一样(kernel单调递增时间).
prevent_suspend_time:对应wakeup source阻止进入autosleep的总累加时间.
一般情况下:
如果是复现机,前面没有捉log,也没有dump log,只有一份wakeup_sources.log
可以看下prevent_suspend_time,一般时间越大越可能是阻止系统进入suspend的wakeup sources.
如果测试前后,都有捉 wakeup_sources.log 请对两份wakeup_sources.log的total time的差值.
差值时间跟灭屏的时间对得上,一般就是这个锁引起的问题.
把捉出来的wakeup_sources.log复制到excel表格中,比较好对齐,一个是比较好计算.
其中dispsys_wakelock total_time的时间有697614mS 也就是总共有697s.
或者在待机测试结束后通过命令:
adb bugreport > bugreport.txt
搜索关键:
底层的锁:
All kernel wake locks:
Kernel Wake lock ttyC0 : 1h 33m 15s 668ms (3856 times) realtime
Kernel Wake lock radio-interface: 1h 20m 56s 210ms (3995 times) realtime
Kernel Wake lock ccci3_at : 1h 9m 43s 491ms (2932 times) realtime
Kernel Wake lock ccci_fs : 1h 0m 52s 818ms (3432 times) realtime
Kernel Wake lock ccci3_at2 : 41m 16s 938ms (2465 times) realtime
上层的锁:
All partial wake locks:
Wake lock 1001 RILJ: 5m 29s 768ms (13118 times) realtime
Wake lock 1000 *alarm*: 4m 7s 823ms (2330 times) realtime
Wake lock 1000 ConnectivityService: 59s 513ms (1 times) realtime
Wake lock u0a111 *alarm*: 50s 334ms (751 times) realtime
Wake lock u0a111 WakerLock:999603354: 28s 655ms (125 times) realtime
Wake lock 1000 NetworkStats: 11s 434ms (569 times) realtime
3 如何看SPM的状态是否正确
待机被唤醒之后,确认:大部分debug_flag最后的bit位是不是f or ff, 说明系统还有模块咬住26M,系统并没有最后进入真正的suspend.
譬如下面的log都是有问题的:
<4>[ 250.874040] -(0)[1244:system_server][SPM] wake up byCONN2AP, timer_out = 8635, r13 = 0x20045038, debug_flag = 0x90
<4>[ 600.704307] -(0)[2722:system_server][SPM] wake up by R12_EINT_EVENT_B, timer_out = 81779, r13 = 0xe040000, debug_flag = 0x113f0
4
如何查找待机唤醒源
系统场景的唤醒源:EINT/CONN/CLDMA
EINT:
PMIC的唤醒.
a.Powerkey
唤醒后面的log会有pwrkey_int_handler
b. rtc alarm
唤醒后面的log会有alarm time is up
<2>[ 1145.475797]<3>-(0)[4497:kworker/u8:10][SPM] wake up by EINT, timer_out = 1406849, r13 = 0x41208, debug_flag = 0x9f
<7>[ 1145.475797]<3>-(0)[4497:kworker/u8:10]EINT 206 is pending
<5>[ 1145.476310]<3> (0)[53:pmic_thread]mtk_rtc_common: alarm time is up
<6>[ 1145.895970]<0> (0)[4497:kworker/u8:10]PM: suspend exit 2016-01-24 17:23:04.023386308 UTC
<2>[ 1146.231012]<3>-(0)[4497:kworker/u8:10][SPM] md_settle = 99, settle = 99
具体类型的唤醒包,可以确认:
从syslog里面搜索关键字wakeup alarm:
01-25 01:23:04.026 830 898 D AlarmManager: wakeup alarm = Alarm{3e671462 type 2 when 27213196 com.android.phone}; package = com.android.phoneneedGrouping = true
一般alarm的唤醒,除了第三方APK之外,有时会遇到类似android/phone APK的唤醒.
确认具体android的唤醒的原因,需要确定唤醒后,紧接着发下来的intent事件.
01-24 19:50:30.031 830 898 D AlarmManager: wakeup alarm = Alarm{9ba1b41 type 2 when 7259546 android}; package = androidneedGrouping = false
01-24 19:50:30.031 830 898 V ActivityManager: Broadcast: Intent { act=android.content.syncmanager.SYNC_ALARM flg=0x114 (has extras) } ordered=true userid=0 callerApp=null
搜索:android.content.syncmanager.SYNC_ALARM
可以定位到:/frameworks/base/services/core/java/com/android/server/content/SyncManager.java
进一步找对应owner确认.
phone apk的唤醒:数据网络的定时恢复.
c. others
kernel log有关键字:EINT.*is pending
序号:206
EINT 206 is pending
需要结合DCT跟cat /proc/interrupts
1.通过DCT:
2.cat /proc/interrupts :pmic-eint对应的序号是150.
289: 149 mt-eint 1 TOUCH_PANEL-eint
291: 0 mt-eint 3 11240000.msdc1 cd
294: 0 mt-eint 6 ALS-eint
295: 0 mt-eint 7 mrdump_ext_rst-eint
314: 73 mt-eint 26 irq_nfc-eint
332: 246 mt-eint 44
432: 0 mt-eint 144 iddig_eint
438: 341 mt-eint 150 pmic-eint
440: 0 mt-eint 152 spm_vcorefs_start_eint
441: 0 mt-eint 153 spm_vcorefs_end_eint
442: 0 mt-eint 154 spm_vcorefs_err_eint
<5>[30640.939329] -(0)[1191:system_server]EINT 150 is pending
......
<3>[30640.942131] (0)[69:pmic_thread]kpd: Power Key generate, pressed=1
<3>[30640.942189] (0)[69:pmic_thread]kpd: kpd: (pressed) HW keycode =116 using PMIC
CLDMA:
确认唤醒的channel ID,关键字:CLDMA_MD wakeup source
<2>[ 3072.796534]<2>-(0)[14284:kworker/u8:0][SPM] wake up by CLDMA_MD, timer_out = 5598687, r13 = 0x20001238, debug_flag = 0x9f
//唤醒点之后第一句CLDMA_MD wakeup source会打印出channel ID. 譬如该题是10
<5>[ 3072.810528]<2> (1)[23510:kworker/u9:0][ccci1/mcd]CLDMA_MD wakeup source:(1/10)
<6>[ 3072.872362]<3> (1)[14284:kworker/u8:0]PM: suspend exit 2016-02-05 11:28:18.030905923 UTC
<6>[ 3074.000405]<2> (0)[14284:kworker/u8:0]PM: suspend exit 2016-02-05 11:28:19.158954462 UTC
<2>[ 3075.262223]<3>-(0)[14284:kworker/u8:0][SPM] md_settle = 99, settle = 99
<2>[ 3075.262223]<3>-(0)[14284:kworker/u8:0][SPM] wake up by EINT, timer_out = 4159454, r13 = 0x41000, debug_flag = 0x9f
<6>[ 3075.341978]<3> (1)[14284:kworker/u8:0]PM: suspend exit 2016-02-05 11:30:28.031774693 UTC
<2>[ 3075.996184]<2>-(0)[14284:kworker/u8:0][SPM] md_settle = 99, settle = 99
[channel 10]:
需要搜索radio log确认是什么AT command唤醒了系统.
时间点:
2016-02-05 11:28:18.030905923 + 8h = 2016-02-05 19:28:18.030905923
以19:28:18搜索radio log,往下搜索,找到第一条AT<
02-05 19:28:18.032 20835 20856 D AT : AT< +CIREPI: 0
说明唤醒的AT command是AT< +CIREPI
针对AT command的唤醒,可以下载DCC文档 :AT_DOCUMENT_Modem
常见的AT command的唤醒:参考下面【常见AT 命令解析】
AT +ECOPS ------->PLMN信息变化.
搜网的次数 ------------> AT: CREG CGREG (一个是CS,一个是PS)
网络PDP状态变化的次数 ----------> AT:CGEV (PDN activate/deactivate)
VOLTE功能导致唤醒的次数 -----------> AT: CIREPI CIREPH CNEMS1 CIREG EIMS
LTE数据连接 ------------> AT: EDRBSTATE
[channel 14]
一般是跟channel10一起产生,modem小区消息变化,会记录小区的信息到nvram.
[channel20/24]
数据连接的唤醒.
<2>[10133.766886]<2>-0)[SPM] wake up by CLDMA_MD, timer_out = 1901490, r13 = 0x340a133c, debug_flag = 0x31f
<5>[10133.774491]<2> 2)[ccci1/mcd]CLDMA_MD wakeup source:(4/24)
<6>[10133.822549]<1> (0)[14053:kworker/u16:2]PM: suspend exit 2016-01-06 16:37:59.027987308 UTC
<2>[10134.994190]<0>-0)[SPM] md_settle = 99, settle = 99
<2>[10134.994190]<0>-0)[SPM] wake up by CLDMA_MD, timer_out = 1918768, r13 = 0x340a133c, debug_flag = 0x31f
<5>[10134.996421]<0> 0)[ccci1/mcd]CLDMA_MD wakeup source:(4/24)
<6>[10135.087194]<1> (0)[14053:kworker/u16:2]PM: suspend exit 2016-01-06 16:38:59.069914154 UTC
<2>[10136.211339]<1>-0)[SPM] md_settle = 99, settle = 99
使用winshark打开netlog:
main log里面搜索IP 地址:
112.17.251.148
01-06 21:47:58.060 313 25080 D libc-netbsd: res_queryN name = push.hexin.cn succeed
01-06 21:47:58.060 25056 25081 I AppStore.StorageManager: ab:e:150407-372=>in removeSpuriousFiles
01-06 21:47:58.060 21561 25079 D libc-netbsd: getaddrinfo: push.hexin.cn get result from proxy >>
01-06 21:47:58.060 21561 25079 I System.out: propertyValue:true
01-06 21:47:58.061 21561 25079 I System.out: [CDS]connect[push.hexin.cn/112.17.251.148:8887] tm:90
请参考【如何确认具体一次唤醒,是哪个APK引起的】:
[channel32]
modem SIM driver获取sim GPIO状态所造成的唤醒. 这部分情况比较少,确认review 贵司sim driver是否有相关的design.
[channel34]
WIFI 跟4G 有部分频段是重叠的,modem 需要把频段的信息通知WIFI所造成的唤醒. 这部分属于正常的design.
[channel55]
VOLTE 网络的唤醒.
[channel6/42]
这是打开modem log造成的,分析功耗问题除非发现跟modem有关系,否则捉log时,不要打开modem log
具体的channel 对应的定义可以参考:
N版本:
/kernel-4.4/drivers/misc/mediatek/eccci/ccci_core.h
M版本:
/kernel-3.18/drivers/misc/mediatek/include/mt-plat/mt_ccci_common.h
L版本:
/kernel-3.10/include/mach/mt_ccci_common.h
[channel 14与10]常见AT 命令解析,注意红色字体部分字段
命令2:AT< +COPS(运营商信息)
命令4: AT< +CGREG
命令6:AT+EGTYPE(attach状态)
命令8:AT+EI3GPPIRAT(C2K切网)
信号强度相关AT:
命令7:AT< +ECSQ(信号强度)
网络注册状态AT:
命令3:AT< +CREG(NEWORK REGISTER)
命令4: AT< +CGREG
网络VOLTE支持情况上报:
命令5:AT+CIREP (IMS网络支持情况)
详细解析:
命令1:AT+EDRBSTATE( 4G data link 的URC data shapping,状态变原因是data link状态是否发生变化)
+EDRBSTATE主动上报有三种情况:
(1)每次下AT+EDRB=1,会主动上报EDRBSTATE
(2)如果当前act的状态和之前记录的act的状态一个是7,一个非7,那么就会主动上报EDRBSTATE。需要继续check log中主动上报的CGREG并不多
(3)当主动上报的消息MSG_ID_L4C_EVAL_CONNECTED_DRB_IND中带的参数is_drb_existent与之前得到的不同时,会主动上报EDRBSTATE。
AT< +EDRBSTATE: 0,7
第一个参数:
0:DRB(data radio state)不存在 DRB OFF 表示現在可能是4G data link but AS IDLE/ 23G / 4G no data link
1:drb存在 DRB ON 表示 4G data link but AS CONNECTED
第二参数:act 同其他 可以同CREG一起理解
AT+EDRB:
MODE:0 :TURN OFF URC+EDRBSTATE 1:TURN ON URC+EDRBSTATE
命令2:AT< +COPS(运营商信息)
AT< +COPS: 0,2,"46001",2
第一个参数mode:0 自动,1自动
第二参数:format 对第三个参数的制式
第三个参数:oper 运营商string,46000 移动 46001联通
第四个参数:网络act
0:gsm
2:utran
7:lte
命令3:AT< +CREG(NEWORK REGISTER)
AT< +CREG: 3,1,"A80F","00CD5C14",6,0,0
第1个参数:0:disable 网络register urc 1: enable 网络register +cgreg 2:enable 网络cgreg+local info +cgreg stat+lac+ci 3:同2+error
第2个参数:0:not register 1:register home network 2:没有注册,但是try 3:register 拒绝 4:...
第3个参数:lac:location area code
第4个参数:ci:cell info
第5个参数:网络act
0:gsm
2:utran
3:gsm w/egprs
4:utran w/hsdpa (High Speed Downlink Packet Access)高速下行分组接入
5:utran w/hsupa(high speed uplink packet access)高速上行链路分组接入
6:utran w/hsdpa and hsupa
7:e-utran
命令4: AT< +CGREG(GPRS NETWORK):AT< +CEREG(EPS/LTE NETWORK) 这个是注册gprs网络和LTE网络的命令
AT< +CGREG: 3,1,"A80F","00CD5C14",6,"0F",0,0
第1个参数:0:disable 网络register urc 1: enable 网络register +cgreg 2:enable 网络cgreg+local info +cgreg stat+lac+ci 3:同2+error
第2个参数:0:not register 1:register home network 2:没有注册,但是try 3:register 拒绝 4:...
第3个参数:lac:location area code
第4个参数:ci:cell info
第5个参数:网络act
0:gsm
2:utran
3:gsm w/egprs
4:utran w/hsdpa
5:utran w/hsupa
6:utran w/hsdpa and hsupa
7:e-utran
第6个参数:rac:route area code
第7个参数:cause type
第8个参数:reject cause
命令3,命令4情况:registration status changed, modem report +CREG/+CGREG URC
命令5:AT+CIREP (IMS网络支持情况)
AT+ECSQ信号格相关
命令6:AT+EGTYPE(attach状态)
1: Always attached
0: attached when needed
命令7:AT< +ECSQ(信号强度)
AT< +ECSQ: 41,99,-279,1,1,1,1,0
主要解析第一个参数,第一个参数变化说明信号强度发生变化
命令8:AT+EI3GPPIRAT(C2K切网)
C2K切网的AT命令
+EI3GPPIRAT= <act>, <src_rat>, <dst_rat>
<act>: Integer
1 source RAT start
2 target RAT start
3 source RAT finish
4 target RAT finish
<src_rat>: Integer
1 1xRTT
2 HRPD
3 EHRPD
4 LTE
解释下EI3GPPIRAT的含义,
1(start),4(LTE),3(CDMA)
3(finishe),4(LTE),4(LTE)
命令9:modem开关
▪ AT+EPOF – used to power off modem
▪ AT+EPON – used to power on modem
▪ The first command trigger whole protocol stack: AT+CFUN - Set phone functionality
– 0: minimum functionality
– 1: full functionality
– 2: disable TX RF circuits
– 3: disable RX RF circuits
– 4: disable both TX and RX circuits
命令10:AT+COPS( turn on / off ECOPS) EOPS(ENhence operator selection) COPS(operator selection)
0:disable +EOPS URC
1:enable+EOPS URC
AT< +COPS +N +LIST plmn
NW->MS 第一个参数plmn count
第二个参数 plmn list
命令11:AT+ESMCI(switch modem cause info)
AT< +ESMCI: "46000",0,64
第一参数:plmn id
第二参数:plmn source
0:plmn search proc
1:cell reselection proc
255:invalid proc
第三参数:rat
0:RATE_NONE(DEFAULT)
1:GSM
2:UMTS
4:FDD_LTE
64:TD_LTE
68:TD_FDD_LTE
channel对应的表格:
CONN:
一般情况是因为打开了wifi 或者蓝牙,一般是wifi居多,APK如果建立起TCP 连接,网络包通过wifi回传给手机,都会这种类型的唤醒.
其中一次完整的唤醒有如下的log:
//第一句唤醒,wake up by CONN2AP,代表被connectivity wakeup,大部分情况是WIFI数据.
<2>[ 172.612907]<1>-(0)[142:kworker/u8:4][SPM] wake up by CONN2AP, timer_out = 19411, r13 = 0x20047238, debug_flag = 0x9c
//唤醒的时间点:
<6>[ 172.679674]<1> (3)[142:kworker/u8:4]PM: suspend exit 2015-12-27 03:09:38.023614769 UTC
<6>[ 172.948171]<1> (3)[142:kworker/u8:4]PM: suspend exit 2015-12-27 03:09:38.292113615 UTC
<6>[ 173.572666]<1> (0)[142:kworker/u8:4]PM: suspend exit 2015-12-27 03:09:38.916604538 UTC
<6>[ 173.989622]<3> (1)[142:kworker/u8:4]PM: suspend exit 2015-12-27 03:09:39.333548769 UTC
<6>[ 174.399687]<3> (3)[142:kworker/u8:4]PM: suspend exit 2015-12-27 03:09:39.743622615 UTC
//最后一句sleep的log. 只有出现md_settle才代表真正sleep.
<2>[ 176.028364]<3>-(0)[95:kworker/u8:2][SPM] md_settle = 99, settle = 99
2015-12-27 03:09:38.023614769对应的main log时间为:2015-12-27 11:09:38.023614769
确认引起唤醒的问题包:
在main log里面搜索关键字:Posix_connect Debug:
Posix_connect Debug的log,只有在应用第一次建立TCP的连接的时候才会打印,后期TCP的来回握手,并不会打印出来.
譬如下面的log,可以发现对应的时间点,是豌豆荚这个应用在发包
【下面的log代表是那些APK在发包,有发包就有收包】12-27 11:05:44.279 2895 3138 D Posix : [Posix_connect Debug]Process com. lava:pushservice :80
12-27 11:06:02.786 4715 4778 D Posix : [Posix_connect Debug]Process com.sina.weibo :443
12-27 11:06:07.779 4715 4798 D Posix : [Posix_connect Debug]Process com.sina.weibo :443
12-27 11:06:10.080 4827 4917 D Posix : [Posix_connect Debug]Process com.sina. weibo:remote :4828
12-27 11:06:15.465 4827 4917 D Posix : [Posix_connect Debug]Process com.sina. weibo:remote :4828
12-27 11:09:38 .313 3635 3670 D Posix : [Posix_connect Debug] Process com.wandoujia.phoenix2:update_service :443
12-27 11:09:42.347 4827 4917 D Posix : [Posix_connect Debug]Process com.sina. weibo:remote :4828
12-27 11:09:49.055 4827 4917 D Posix : [Posix_connect Debug]Process com.sina. weibo:remote :4829
12-27 11:09:49.111 4827 4942 D Posix : [Posix_connect Debug]Process com.sina. weibo:remote :80
12-27 11:09:51.199 3495 3570 D Posix : [Posix_connect Debug]Process com.wandoujia.phoenix2 :80