Android ANR-Thermal condition

版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/laisse/article/details/47258601

Log pattern
limit CPU frequency (QCT projects): If cpu’s temperature is over 60 degree, KTM thermal driver will start to limit cpu frequence.

// QCT: in this case, cpu0 temperature reaches 80℃, the max frequency has been constrained to 960000Hz
@device_log
I ThermalEngine: TM Id ‘CPU0_MONITOR’ Sensor ‘cpu0’ - alarm raised 4 at 80.0 degC
I ThermalEngine: ACTION: CPU - Setting CPU[0] to 960000

I PnPMgr : Write ‘/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq’ = ‘960000’

limit CPU core (QCT projects): If cpu’s temperature is over 80 degree, KTM thermal drive will start to off line CPU core number.

// QCT: in this case, CPU 2/3 be disabled due to temperature reach threshold 96℃
@kernel log
<6>[12426.253624] c0 20646 msm_thermal: tsens_tz_sensor1 reach temp threshold: 96
<6>[12426.253672] c0 20646 msm_thermal: tsens_tz_sensor1 reach temp threshold: 96
<6>[12426.259053] c2 0 CPU2: msm_pm_power_collapse: change clock rate (old rate = 0)
<6>[12426.264927] c3 0 CPU3: msm_pm_power_collapse: change clock rate (old rate = 0)
<4>[12426.272855] c0 1732 _cpu_up: attempt to bring up CPU 3 failed
<4>[12426.274905] c0 1732 _cpu_up: attempt to bring up CPU 2 failed

limit CPU frequency / core (MTK projects)

  • idx: the index of the thermal table
  • limited_max_ncpu: limit CPU core
  • limited_max_freq: limit CPU frequency (Hz)

// MTK: in this case, limit CPU core is 3, and frequency is 1183000Hz
@kernel log
<5>[229823.673401] (0)[1498:kworker/0:1][Power/cpufreq] _thermal_limited_verify(): idx = 43, limited_max_ncpu = 3, limited_max_freq = 1183000

current Temperature (MTK projects)

// cpu temperature is 47.3 degC. (if CPU temp. is over 59/60 degC will be treat as over hot)
// ap (pcb) temperature is 36 degC. (if pcb temp. is over 44 degC will be treat as over hot)
@kernel log

03-22 00:55:44 <6>[76338.649280] [32129:kworker/0:2][K][THERMAL] Sensor 0 ( cpu) = 47300
03-22 00:55:44 <6>[76338.649307][32129:kworker/0:2][K][THERMAL] Sensor 1 (  abb) = 40300
03-22 00:55:44 <6>[76338.649334] [32129:kworker/0:2][K][THERMAL] Sensor 2 ( pmic) = 44790
03-22 00:55:44 <6>[76338.649360][32129:kworker/0:2][K][THERMAL] Sensor 3 ( batt) =36000
03-22 00:55:44 <6>[76338.649386] [32129:kworker/0:2][K][THERMAL] Sensor 4 (md1) = 44000
03-22 00:55:44 <6>[76338.649413] [32129:kworker/0:2][K][THERMAL] Sensor 6 ( wifi) = 44000 
03-22 00:55:44 <6>[76338.649439] [32129:kworker/0:2][K][THERMAL] Sensor 8 ( buck) = 43790
03-22 00:55:44 <6>[76338.649464] [32129:kworker/0:2][K][THERMAL] Sensor 9 (   ap) = 36000

current CPU frequency / core (MTK projects)

// CPU freq. : the max cpufreq. is 1326000 KHz in this case
03-22 00:57:49 <6>[76463.841392] (2)[56:cfinteractive][Power/cpufreq] @_cpufreq_set_locked(): Vproc = 940mv, Vsram = 1031mv, freq = 1326000 KHz 
03-22 00:57:49 <6>[76463.900591] (2)[56:cfinteractive][Power/cpufreq] @_cpufreq_set_locked(): Vproc = 970mv, Vsram = 1062mv, freq = 1469000 KHz
03-22 00:57:49 <6>[76463.920676] (2)[56:cfinteractive][Power/cpufreq] @_cpufreq_set_locked(): Vproc = 900mv, Vsram = 1000mv, freq = 1183000 KHz

// CPU core : there are 8 cores in this case 
03-22 00:57:54 <6>[76469.004028] (0)[32129:kworker/0:2][K] CPU Usage    PID Name    CPU Time(Total: 7912) 
03-22 00:57:54 <6>[76469.004041](0)[32129:kworker/0:2][K]   7%  2282    aee_dumpstate   581 
03-22 00:57:55 <6>[76469.111962] (0)[32129:kworker/0:2][K]  3%   841    system_server      240
03-22 00:57:55 <6>[76469.111980](0)[32129:kworker/0:2][K]   1%  385 sdcard      143 
03-22 00:57:55<6>[76469.111989] (0)[32129:kworker/0:2][K]   1%  2647    librank     122
03-22 00:57:55 <6>[76469.111998] (0)[32129:kworker/0:2][K]  1%  2375    er.line.android    116
03-22 00:57:55 <6>[76469.112006] (0)[32129:kworker/0:2][K] CPU usage per core: [69%][52%][40%][31%][16%][6%][5%][3%]

Case Study


Case: A51_DTUL_K44_DESIRE_SENSE60 PQCS#3048
camera ANR happened at ‘Camera stability field trial testing’ in High temperature situation.
Issue description:

  • Can’t find the specific blocked root cause, but the overall performance is slow when ANR happened (according to the callstack at ANR timestamp).
  • Check kernel log, and found the CPU core number have be limited due to thermal condition protection mechanism.
  • Ask SSD power team to comment if any abnormal in thermal configuration.
Log analysis:
Default checking, and suspect this ANR may cause by the overall performance low. 

> @event log
// ANR happened at 14:25:48
10-17 14:25:48.008  1098  1130 I am_anr  : [0,2525,com.htc.camera,1083752005,Input dispatching timed out (Waiting because the touched window has not finished processing the input events that were previously delivered to it.)]

@device log // according to message-history, the last message spent 6163ms and still not finished 
10-17 14:25:48.008  2525  2536 E ActivityThread: MSG: Dump looper, process: com.htc.camera 
10-17 14:25:48.008  2525  2536 E ActivityThread: MSG: ===== Message History (Most recent to least recent) =====
10-17 14:25:48.008  2525  2536 E ActivityThread: MSG: Handler (com.htc.camera.HTCCamera$MainHandler) {41dfc4f0} null: 10200, Unfinished(6163ms), start=73773
10-1714:25:46.718  2525  2587 W ThreadMonitor: Stack trace for thread 'main' (tid = 2525) 
10-17 14:25:46.718  2525  2587 W ThreadMonitor:   -> android.graphics.BitmapFactory.nativeDecodeAsset (Native Method) 10-17 14:25:46.718  2525  2587 W ThreadMonitor:     -> android.graphics.BitmapFactory.decodeStream (BitmapFactory.java:708)

10-17 14:25:49.818  2525  2587 W ThreadMonitor: Thread 'main' (tid = 2525) is not responding, last response time = 04/27 12:05:11.221 (8.5492 sec ago)
10-17 14:25:49.818  2525  2587 W ThreadMonitor: Stack trace for thread 'main' (tid = 2525)
10-17 14:25:49.818  2525  2587 W ThreadMonitor:     -> android.os.BinderProxy.transact (Native Method)
10-17 14:25:49.818  2525  2587 W ThreadMonitor:     -> android.app.ActivityManagerProxy.getContentProvider (ActivityManagerNative.java:3036)
10-17 14:25:49.818  2525  2587 W ThreadMonitor:     -> android.app.ActivityThread.acquireProvider (ActivityThread.java:4999)

10-17 14:26:19.298  2525  2587 W ThreadMonitor: Thread 'Thumbnail Controller Worker Thread' (tid = 2716) is not responding, last response time = 04/30 11:16:52.147 (6.2412 sec ago)
10-17 14:26:19.348  2525  2587 W ThreadMonitor: Stack trace for thread 'Thumbnail Controller Worker Thread' (tid = 2716)
10-17 14:26:19.368  2525  2587 W ThreadMonitor:     -> android.media.MediaMetadataRetriever.setDataSource (Native Method)
10-17 14:26:19.368  2525  2587 W ThreadMonitor:     -> android.media.MediaMetadataRetriever.setDataSource (MediaMetadataRetriever.java:78)

// the CPU usage is litte high around that time
10-17 14:26:28.188  1098  1130 E ActivityManager: CPU usage from 25626ms to 39741ms later:
10-17 14:26:28.188  1098  1130 E ActivityManager:   44% 1098/system_server: 31% user + 12% kernel / faults: 8589 minor
10-17 14:26:28.188  1098  1130 E ActivityManager:     5.8% 1130/ActivityManager: 3.8% user + 2% kernel
10-17 14:26:28.188  1098  1130 E ActivityManager:     3.9% 1102/GC: 3.6% user + 0.2% kernel
10-17 14:26:28.188  1098  1130 E ActivityManager:     3.5% 1131/android.bg: 1.1% user + 2.3% kernel
10-17 14:26:28.188  1098  1130 E ActivityManager:     2.8% 1098/system_server: 1.3% user + 1.5% kernel
10-17 14:26:28.188  1098  1130 E ActivityManager:     1.9% 1103/Signal Catcher: 1.4% user + 0.5% kernel
10-17 14:26:28.188  1098  1130 E ActivityManager:     1.9% 1105/Compiler: 1.4% user + 0.5% kernel
10-17 14:26:28.188  1098  1130 E ActivityManager:     1.4% 1142/android.ui: 1.4% user + 0% kernel

found the ‘thermal condition’ log pattern in kernel log, and confirmed it caused by thermal.

@kernel log 
//the device keeps in high temperature and limit CPU frequency & core since boot-up
<6>[1.767858] c0    147 msm_thermal:do_core_control Set Offline: CPU2 Temp: 93
<3>[1.767885] c0    147 msm_thermal:do_core_control Error -16 offline core 2
<6>[1.767947] c0    147 msm_thermal:do_cluster_freq_ctrl Limiting CPU0 max frequency to 499200. Temp:93
<6>[1.767969] c0    147 msm_thermal:do_cluster_freq_ctrl Limiting CPU1 max frequency to 499200. Temp:93
<6>[1.767991] c0    147 msm_thermal:do_cluster_freq_ctrl Limiting CPU2 max frequency to 499200. Temp:93
<6>[1.768013] c0    147 msm_thermal:do_cluster_freq_ctrl Limiting CPU3 max frequency to 499200. Temp:93
...
<6>[69.912586] c0     35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<6>[69.912679] c0     35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<6>[69.912707] c0     35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<6>[69.912734] c0     35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<4>[70.565719] c0    457 _cpu_up: attempt to bring up CPU 1 failed
<4>[70.566178] c0    457 _cpu_up: attempt to bring up CPU 2 failed
<4>[70.566647] c0    457 _cpu_up: attempt to bring up CPU 3 failed

// ANR happened at 14:25:48.008, and there is only one CPU core atthat time 
Update: [2014/10/17 14:25:41 (73.432463930)] ... 
<6>[70.745164] c0     35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95 
<6>[70.745233] c0   35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<6>[70.745259] c0   35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95 
<6>[70.745285] c0    35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<4>[71.287859] c0    457 _cpu_up: attempt to bring up CPU 1 failed
<4>[71.288317] c0    457 _cpu_up: attempt to bring up CPU 2 failed
<4>[71.288725] c0    457 _cpu_up: attempt to bring up CPU 3 failed

Case: A3_UL_K44_DESIRE_SENSE60 #419

systemui ANR happened at Boot testing in High temp. situation.

Issue description:

[Boot test-Restart mode][High temperature] The device pop “Process com.android.systemui isn’t responding. DO you want to close it?”

Log analysis:

  • Some issue cases caused by system performance low due to CPU core disabled with thermal condition.
  • In normal case, all CPU core should still be enabled when device in high temp.
  • The log below indicates that the CPU core 2 and 3 are disabled. The C# is the CPU core index.If there are kernel log dumped by c2, it means core 2 is enabled.
@kernel log
<6>[   87.308709] c0     32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 100
<6>[   87.308739] c0     32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 100
<6>[   87.308778] c0     32 msm_thermal: tsens_tz_sensor5 reach temp threshold: 96
<6>[   87.321984] c1      0 CPU1: msm_pm_power_collapse: change clock rate (old rate = 0)
<4>[   87.329715] c2   1647 IRQ40 no longer affine to CPU2
<6>[   87.331735] c2      0 CPU2: msm_pm_power_collapse: change clock rate (old rate = 0)
<6>[   87.336939] c3      0 CPU3: msm_pm_power_collapse: change clock rate (old rate = 0)
<6>[   87.622214] c0     32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 87
<6>[   87.622269] c0     32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 87
<6>[   87.622339] c0     32 msm_thermal: tsens_tz_sensor5 reach temp threshold: 90
Update: [2014/02/22 09:48:17 (90.705365959)]
<6>[   87.935747] c0     32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 96
<6>[   87.935821] c0     32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 96
<6>[   87.948349] c2      0 CPU2: msm_pm_power_collapse: change clock rate (old rate = 0)
<4>[   87.952685] c0   1700 _cpu_up: attempt to bring up CPU 2 failed
<4>[   87.955305] c3   1725 IRQ40 no longer affine to CPU3
<6>[   87.957575] c3      0 CPU3: msm_pm_power_collapse: change clock rate (old rate = 0)
<6>[   88.249085] c0     32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 89
<6>[   88.249138] c0     32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 89
<6>[   88.562543] c0     32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 98
<6>[   88.562602] c0     32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 98
<6>[   88.562682] c0     32 msm_thermal: tsens_tz_sensor5 reach temp threshold: 95

@device log 
02-22 09:47:52.828   346   383 I ThermalEngine: TM Id 'CPU0-1_MONITOR' Sensor 'cpu0-1' - alarm cleared 4 at 87.0 degC 
02-22 09:47:52.828   346   383 I ThermalEngine: ACTION: CPU - Setting CPU[0] to 787200 
02-22 09:47:52.828   346   383 I ThermalEngine: ACTION: CPU - Setting CPU[1] to 787200 
02-22 09:47:52.828   353   410 I PnPMgr  : Write '/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq' ='787200' <Rule: Foreground Apps> 
02-22 09:47:52.828   353   410 I PnPMgr  : Write '/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq' = '787200' <Rule: thermal_c0>
...
02-22 09:48:04.518  1080  1080 I BroadcastQueue: Deliver order [FG] BF{44bb6d80 u0 RL{44bb6d20 1342 com.android.systemui/10051/u0 remote:44c5e840}}:BR@44c539d0
02-22 09:48:14.874  1080  1222 W BroadcastQueue: Timeout of broadcast BroadcastRecord{44c539d0 u-1 android.intent.action.TIME_TICK callingPid=-1 callingUid=1000} - receiver=android.os.BinderProxy@44c5e840
@events log
02-22 09:48:10.345  1080  1559 I binder_sample: [com.android.server.am.ActivityManagerService,29,1009,execTransact,100]
02-22 09:48:12.065  1080  1559 I binder_sample: [com.android.server.content.ContentService,2,1292,execTransact,100]
02-22 09:48:12.065  1080  1107 I binder_sample: [com.android.server.content.ContentService,2,1239,execTransact,100]
02-22 09:48:12.185  1471  1471 I binder_sample: [android.content.IContentService,2,1240,com.android.phone,100]
02-22 09:48:13.035  1080  1110 I binder_sample: [com.android.server.InputMethodManagerService,12,1846,execTransact,100]
02-22 09:48:13.045  1506  1506 I binder_sample: [com.android.internal.view.IInputMethodManager,12,2088,com.htc.launcher,100]
02-22 09:48:14.964  1080  1222 I am_anr  : [0,1342,com.android.systemui,1086897741,Broadcast of Intent { act=android.intent.action.TIME_TICK flg=0x50000014 (has extras) }]
阅读更多

没有更多推荐了,返回首页