Log pattern
limit CPU frequency (QCT projects): If cpu’s temperature is over 60 degree, KTM thermal driver will start to limit cpu frequence.
// QCT: in this case, cpu0 temperature reaches 80℃, the max frequency has been constrained to 960000Hz
@device_log
I ThermalEngine: TM Id ‘CPU0_MONITOR’ Sensor ‘cpu0’ - alarm raised 4 at 80.0 degC
I ThermalEngine: ACTION: CPU - Setting CPU[0] to 960000
…
I PnPMgr : Write ‘/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq’ = ‘960000’
limit CPU core (QCT projects): If cpu’s temperature is over 80 degree, KTM thermal drive will start to off line CPU core number.
// QCT: in this case, CPU 2/3 be disabled due to temperature reach threshold 96℃
@kernel log
<6>[12426.253624] c0 20646 msm_thermal: tsens_tz_sensor1 reach temp threshold: 96
<6>[12426.253672] c0 20646 msm_thermal: tsens_tz_sensor1 reach temp threshold: 96
<6>[12426.259053] c2 0 CPU2: msm_pm_power_collapse: change clock rate (old rate = 0)
<6>[12426.264927] c3 0 CPU3: msm_pm_power_collapse: change clock rate (old rate = 0)
<4>[12426.272855] c0 1732 _cpu_up: attempt to bring up CPU 3 failed
<4>[12426.274905] c0 1732 _cpu_up: attempt to bring up CPU 2 failed
limit CPU frequency / core (MTK projects)
- idx: the index of the thermal table
- limited_max_ncpu: limit CPU core
- limited_max_freq: limit CPU frequency (Hz)
// MTK: in this case, limit CPU core is 3, and frequency is 1183000Hz
@kernel log
<5>[229823.673401] (0)[1498:kworker/0:1][Power/cpufreq] _thermal_limited_verify(): idx = 43, limited_max_ncpu = 3, limited_max_freq = 1183000
current Temperature (MTK projects)
// cpu temperature is 47.3 degC. (if CPU temp. is over 59/60 degC will be treat as over hot)
// ap (pcb) temperature is 36 degC. (if pcb temp. is over 44 degC will be treat as over hot)
@kernel log
03-22 00:55:44 <6>[76338.649280] [32129:kworker/0:2][K][THERMAL] Sensor 0 ( cpu) = 47300
03-22 00:55:44 <6>[76338.649307][32129:kworker/0:2][K][THERMAL] Sensor 1 ( abb) = 40300
03-22 00:55:44 <6>[76338.649334] [32129:kworker/0:2][K][THERMAL] Sensor 2 ( pmic) = 44790
03-22 00:55:44 <6>[76338.649360][32129:kworker/0:2][K][THERMAL] Sensor 3 ( batt) =36000
03-22 00:55:44 <6>[76338.649386] [32129:kworker/0:2][K][THERMAL] Sensor 4 (md1) = 44000
03-22 00:55:44 <6>[76338.649413] [32129:kworker/0:2][K][THERMAL] Sensor 6 ( wifi) = 44000
03-22 00:55:44 <6>[76338.649439] [32129:kworker/0:2][K][THERMAL] Sensor 8 ( buck) = 43790
03-22 00:55:44 <6>[76338.649464] [32129:kworker/0:2][K][THERMAL] Sensor 9 ( ap) = 36000
current CPU frequency / core (MTK projects)
// CPU freq. : the max cpufreq. is 1326000 KHz in this case
03-22 00:57:49 <6>[76463.841392] (2)[56:cfinteractive][Power/cpufreq] @_cpufreq_set_locked(): Vproc = 940mv, Vsram = 1031mv, freq = 1326000 KHz
03-22 00:57:49 <6>[76463.900591] (2)[56:cfinteractive][Power/cpufreq] @_cpufreq_set_locked(): Vproc = 970mv, Vsram = 1062mv, freq = 1469000 KHz
03-22 00:57:49 <6>[76463.920676] (2)[56:cfinteractive][Power/cpufreq] @_cpufreq_set_locked(): Vproc = 900mv, Vsram = 1000mv, freq = 1183000 KHz
// CPU core : there are 8 cores in this case
03-22 00:57:54 <6>[76469.004028] (0)[32129:kworker/0:2][K] CPU Usage PID Name CPU Time(Total: 7912)
03-22 00:57:54 <6>[76469.004041](0)[32129:kworker/0:2][K] 7% 2282 aee_dumpstate 581
03-22 00:57:55 <6>[76469.111962] (0)[32129:kworker/0:2][K] 3% 841 system_server 240
03-22 00:57:55 <6>[76469.111980](0)[32129:kworker/0:2][K] 1% 385 sdcard 143
03-22 00:57:55<6>[76469.111989] (0)[32129:kworker/0:2][K] 1% 2647 librank 122
03-22 00:57:55 <6>[76469.111998] (0)[32129:kworker/0:2][K] 1% 2375 er.line.android 116
03-22 00:57:55 <6>[76469.112006] (0)[32129:kworker/0:2][K] CPU usage per core: [69%][52%][40%][31%][16%][6%][5%][3%]
Case Study
Case: A51_DTUL_K44_DESIRE_SENSE60 PQCS#3048
camera ANR happened at ‘Camera stability field trial testing’ in High temperature situation.
Issue description:
- Can’t find the specific blocked root cause, but the overall performance is slow when ANR happened (according to the callstack at ANR timestamp).
- Check kernel log, and found the CPU core number have be limited due to thermal condition protection mechanism.
- Ask SSD power team to comment if any abnormal in thermal configuration.
Log analysis:
Default checking, and suspect this ANR may cause by the overall performance low.
> @event log
// ANR happened at 14:25:48
10-17 14:25:48.008 1098 1130 I am_anr : [0,2525,com.htc.camera,1083752005,Input dispatching timed out (Waiting because the touched window has not finished processing the input events that were previously delivered to it.)]
@device log // according to message-history, the last message spent 6163ms and still not finished
10-17 14:25:48.008 2525 2536 E ActivityThread: MSG: Dump looper, process: com.htc.camera
10-17 14:25:48.008 2525 2536 E ActivityThread: MSG: ===== Message History (Most recent to least recent) =====
10-17 14:25:48.008 2525 2536 E ActivityThread: MSG: Handler (com.htc.camera.HTCCamera$MainHandler) {41dfc4f0} null: 10200, Unfinished(6163ms), start=73773
10-1714:25:46.718 2525 2587 W ThreadMonitor: Stack trace for thread 'main' (tid = 2525)
10-17 14:25:46.718 2525 2587 W ThreadMonitor: -> android.graphics.BitmapFactory.nativeDecodeAsset (Native Method) 10-17 14:25:46.718 2525 2587 W ThreadMonitor: -> android.graphics.BitmapFactory.decodeStream (BitmapFactory.java:708)
10-17 14:25:49.818 2525 2587 W ThreadMonitor: Thread 'main' (tid = 2525) is not responding, last response time = 04/27 12:05:11.221 (8.5492 sec ago)
10-17 14:25:49.818 2525 2587 W ThreadMonitor: Stack trace for thread 'main' (tid = 2525)
10-17 14:25:49.818 2525 2587 W ThreadMonitor: -> android.os.BinderProxy.transact (Native Method)
10-17 14:25:49.818 2525 2587 W ThreadMonitor: -> android.app.ActivityManagerProxy.getContentProvider (ActivityManagerNative.java:3036)
10-17 14:25:49.818 2525 2587 W ThreadMonitor: -> android.app.ActivityThread.acquireProvider (ActivityThread.java:4999)
10-17 14:26:19.298 2525 2587 W ThreadMonitor: Thread 'Thumbnail Controller Worker Thread' (tid = 2716) is not responding, last response time = 04/30 11:16:52.147 (6.2412 sec ago)
10-17 14:26:19.348 2525 2587 W ThreadMonitor: Stack trace for thread 'Thumbnail Controller Worker Thread' (tid = 2716)
10-17 14:26:19.368 2525 2587 W ThreadMonitor: -> android.media.MediaMetadataRetriever.setDataSource (Native Method)
10-17 14:26:19.368 2525 2587 W ThreadMonitor: -> android.media.MediaMetadataRetriever.setDataSource (MediaMetadataRetriever.java:78)
// the CPU usage is litte high around that time
10-17 14:26:28.188 1098 1130 E ActivityManager: CPU usage from 25626ms to 39741ms later:
10-17 14:26:28.188 1098 1130 E ActivityManager: 44% 1098/system_server: 31% user + 12% kernel / faults: 8589 minor
10-17 14:26:28.188 1098 1130 E ActivityManager: 5.8% 1130/ActivityManager: 3.8% user + 2% kernel
10-17 14:26:28.188 1098 1130 E ActivityManager: 3.9% 1102/GC: 3.6% user + 0.2% kernel
10-17 14:26:28.188 1098 1130 E ActivityManager: 3.5% 1131/android.bg: 1.1% user + 2.3% kernel
10-17 14:26:28.188 1098 1130 E ActivityManager: 2.8% 1098/system_server: 1.3% user + 1.5% kernel
10-17 14:26:28.188 1098 1130 E ActivityManager: 1.9% 1103/Signal Catcher: 1.4% user + 0.5% kernel
10-17 14:26:28.188 1098 1130 E ActivityManager: 1.9% 1105/Compiler: 1.4% user + 0.5% kernel
10-17 14:26:28.188 1098 1130 E ActivityManager: 1.4% 1142/android.ui: 1.4% user + 0% kernel
found the ‘thermal condition’ log pattern in kernel log, and confirmed it caused by thermal.
@kernel log
//the device keeps in high temperature and limit CPU frequency & core since boot-up
<6>[1.767858] c0 147 msm_thermal:do_core_control Set Offline: CPU2 Temp: 93
<3>[1.767885] c0 147 msm_thermal:do_core_control Error -16 offline core 2
<6>[1.767947] c0 147 msm_thermal:do_cluster_freq_ctrl Limiting CPU0 max frequency to 499200. Temp:93
<6>[1.767969] c0 147 msm_thermal:do_cluster_freq_ctrl Limiting CPU1 max frequency to 499200. Temp:93
<6>[1.767991] c0 147 msm_thermal:do_cluster_freq_ctrl Limiting CPU2 max frequency to 499200. Temp:93
<6>[1.768013] c0 147 msm_thermal:do_cluster_freq_ctrl Limiting CPU3 max frequency to 499200. Temp:93
...
<6>[69.912586] c0 35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<6>[69.912679] c0 35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<6>[69.912707] c0 35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<6>[69.912734] c0 35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<4>[70.565719] c0 457 _cpu_up: attempt to bring up CPU 1 failed
<4>[70.566178] c0 457 _cpu_up: attempt to bring up CPU 2 failed
<4>[70.566647] c0 457 _cpu_up: attempt to bring up CPU 3 failed
// ANR happened at 14:25:48.008, and there is only one CPU core atthat time
Update: [2014/10/17 14:25:41 (73.432463930)] ...
<6>[70.745164] c0 35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<6>[70.745233] c0 35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<6>[70.745259] c0 35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<6>[70.745285] c0 35 msm_thermal:hotplug_notify tsens_tz_sensor9 reach temp threshold: 95
<4>[71.287859] c0 457 _cpu_up: attempt to bring up CPU 1 failed
<4>[71.288317] c0 457 _cpu_up: attempt to bring up CPU 2 failed
<4>[71.288725] c0 457 _cpu_up: attempt to bring up CPU 3 failed
Case: A3_UL_K44_DESIRE_SENSE60 #419
systemui ANR happened at Boot testing in High temp. situation.
Issue description:
[Boot test-Restart mode][High temperature] The device pop “Process com.android.systemui isn’t responding. DO you want to close it?”
Log analysis:
- Some issue cases caused by system performance low due to CPU core disabled with thermal condition.
- In normal case, all CPU core should still be enabled when device in high temp.
- The log below indicates that the CPU core 2 and 3 are disabled. The C# is the CPU core index.If there are kernel log dumped by c2, it means core 2 is enabled.
@kernel log
<6>[ 87.308709] c0 32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 100
<6>[ 87.308739] c0 32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 100
<6>[ 87.308778] c0 32 msm_thermal: tsens_tz_sensor5 reach temp threshold: 96
<6>[ 87.321984] c1 0 CPU1: msm_pm_power_collapse: change clock rate (old rate = 0)
<4>[ 87.329715] c2 1647 IRQ40 no longer affine to CPU2
<6>[ 87.331735] c2 0 CPU2: msm_pm_power_collapse: change clock rate (old rate = 0)
<6>[ 87.336939] c3 0 CPU3: msm_pm_power_collapse: change clock rate (old rate = 0)
<6>[ 87.622214] c0 32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 87
<6>[ 87.622269] c0 32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 87
<6>[ 87.622339] c0 32 msm_thermal: tsens_tz_sensor5 reach temp threshold: 90
Update: [2014/02/22 09:48:17 (90.705365959)]
<6>[ 87.935747] c0 32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 96
<6>[ 87.935821] c0 32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 96
<6>[ 87.948349] c2 0 CPU2: msm_pm_power_collapse: change clock rate (old rate = 0)
<4>[ 87.952685] c0 1700 _cpu_up: attempt to bring up CPU 2 failed
<4>[ 87.955305] c3 1725 IRQ40 no longer affine to CPU3
<6>[ 87.957575] c3 0 CPU3: msm_pm_power_collapse: change clock rate (old rate = 0)
<6>[ 88.249085] c0 32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 89
<6>[ 88.249138] c0 32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 89
<6>[ 88.562543] c0 32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 98
<6>[ 88.562602] c0 32 msm_thermal: tsens_tz_sensor1 reach temp threshold: 98
<6>[ 88.562682] c0 32 msm_thermal: tsens_tz_sensor5 reach temp threshold: 95
@device log
02-22 09:47:52.828 346 383 I ThermalEngine: TM Id 'CPU0-1_MONITOR' Sensor 'cpu0-1' - alarm cleared 4 at 87.0 degC
02-22 09:47:52.828 346 383 I ThermalEngine: ACTION: CPU - Setting CPU[0] to 787200
02-22 09:47:52.828 346 383 I ThermalEngine: ACTION: CPU - Setting CPU[1] to 787200
02-22 09:47:52.828 353 410 I PnPMgr : Write '/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq' ='787200' <Rule: Foreground Apps>
02-22 09:47:52.828 353 410 I PnPMgr : Write '/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq' = '787200' <Rule: thermal_c0>
...
02-22 09:48:04.518 1080 1080 I BroadcastQueue: Deliver order [FG] BF{44bb6d80 u0 RL{44bb6d20 1342 com.android.systemui/10051/u0 remote:44c5e840}}:BR@44c539d0
02-22 09:48:14.874 1080 1222 W BroadcastQueue: Timeout of broadcast BroadcastRecord{44c539d0 u-1 android.intent.action.TIME_TICK callingPid=-1 callingUid=1000} - receiver=android.os.BinderProxy@44c5e840
@events log
02-22 09:48:10.345 1080 1559 I binder_sample: [com.android.server.am.ActivityManagerService,29,1009,execTransact,100]
02-22 09:48:12.065 1080 1559 I binder_sample: [com.android.server.content.ContentService,2,1292,execTransact,100]
02-22 09:48:12.065 1080 1107 I binder_sample: [com.android.server.content.ContentService,2,1239,execTransact,100]
02-22 09:48:12.185 1471 1471 I binder_sample: [android.content.IContentService,2,1240,com.android.phone,100]
02-22 09:48:13.035 1080 1110 I binder_sample: [com.android.server.InputMethodManagerService,12,1846,execTransact,100]
02-22 09:48:13.045 1506 1506 I binder_sample: [com.android.internal.view.IInputMethodManager,12,2088,com.htc.launcher,100]
02-22 09:48:14.964 1080 1222 I am_anr : [0,1342,com.android.systemui,1086897741,Broadcast of Intent { act=android.intent.action.TIME_TICK flg=0x50000014 (has extras) }]