Qualcomm Thermal Overview: thermal-engine

丛林野和尚

Category: power. Tags: Qualcomm, Thermal

Article link: https://blog.csdn.net/bs66702207/article/details/72782431

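References online are scarce, and the only blog post about thermal-engine that I could find is out of date. Drawing on that post and on the latest thermal-engine for the msm8953 platform, this article covers thermal-engine's configuration, algorithms, and loading.

Code location: vendor/qcom/proprietary/thermal-engine/

The default configuration file is thermal-engine.conf. The thermal control algorithms it currently uses are monitor, ss, and virtual.

1. Base format:

Strings	Description
{debug}	Optional. When present, debug output is enabled
sampling	Default sampling period, in ms
[<Algorithm instance label>]	Algorithm instance label
algo_type	Algorithm type; must be the first field of an algorithm instance
disable	Optional. Keeps the algorithm instance disabled by default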
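
To make the layout concrete, here is a minimal Python sketch of how a file in this format could be split into global settings and algorithm instances. It is not the parser used by thermal-engine: `parse_conf` and the dictionary layout are hypothetical, and it only understands the fields listed in the table above.

```python
def parse_conf(text):
    """Split thermal-engine.conf-style text into (global settings, instances).

    Hypothetical helper for illustration; not the real thermal-engine parser.
    """
    global_settings = {"debug": False}
    instances = {}
    current = None  # fields of the instance being read; None before the first label
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = {}
            instances[line[1:-1]] = current
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if current is None:
            if key == "debug":
                global_settings["debug"] = True
            else:
                global_settings[key] = value
        else:
            # The table above requires algo_type to be the first field.
            if not current and key != "algo_type":
                raise ValueError("algo_type must be the first field of an instance")
            # Flag-style fields such as 'disable' carry no value.
            current[key] = value if value else True
    return global_settings, instances


sample = """debug
sampling 2000

[PMIC_THERM_MONITOR]
algo_type monitor
sensor PMIC_THERM
sampling 5000
"""
settings, instances = parse_conf(sample)
print(settings)
print(instances["PMIC_THERM_MONITOR"]["sampling"])
```

Values are kept as strings here; converting them to numbers or lists is left to whichever algorithm consumes the instance.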

2. Algorithm types

inc/thermal_config.h
enum algo_type {
    UNKNOWN_ALGO_TYPE = -1,
    MONITOR_ALGO_TYPE = 0,
    PID_ALGO_TYPE,
    SS_ALGO_TYPE,
    TB_ALGO_TYPE,
    EQUILIBRIUM_TYPE,
    VIRTUAL_SENSOR_TYPE,
    BWLM_ALGO_TYPE,
    ALGO_IDX_MAX,
};

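■ monitor

When the temperature reaches a value in thresholds, the corresponding action (restriction) is triggered. When the temperature drops below the matching thresholds_clr value, the action (restriction) ends.

Strings	Description
algo_type	monitor
sensor	Sensor name
sampling	Sampling period, in ms
descending	Optional. Thresholds are in ascending order by default; with this field they are in descending order
thresholds	Threshold values, in mC or mA
thresholds_clr	Threshold clear values
actions	Actions taken when a threshold is reached; join multiple actions with '+'
action_info	Additional information for the actions; join multiple values with '+'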
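
The trigger and clear behaviour amounts to a small hysteresis state machine. The Python sketch below illustrates it for ascending thresholds. `monitor_level` is a hypothetical name, and the exact boundary handling (`>=` to trigger, `<` to clear) is an assumption taken from the wording above rather than from the thermal-engine source.

```python
def monitor_level(temp, level, thresholds, thresholds_clr):
    """Return the new active level (0 means no threshold is triggered).

    Illustration only; assumes ascending thresholds and that each clear
    value is lower than its threshold.
    """
    # Trigger: climb while the next threshold has been reached.
    while level < len(thresholds) and temp >= thresholds[level]:
        level += 1
    # Clear: drop while the reading is below the clear value of the current level.
    while level > 0 and temp < thresholds_clr[level - 1]:
        level -= 1
    return level


thresholds = [40200, 45000, 50000]      # mC
thresholds_clr = [38000, 43000, 48000]  # mC

level = 0
for temp in (39000, 41000, 46000, 44000, 42000, 39000, 37000):
    level = monitor_level(temp, level, thresholds, thresholds_clr)
    print(temp, "-> level", level)
```

Between a clear value and its threshold (for example 44000 mC once level 2 is active) the level does not change, which is what keeps the mitigation from flapping.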

■ pid

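PID stands for Proportional-Integral-Derivative control. When the temperature exceeds the maximum allowed value, the CPU has to be throttled, and the PID algorithm holds the temperature at set_point. The controller's output can be converted into a maximum allowed CPU frequency.

A simple way to picture PID: an operator controls the temperature of an electric furnace. The furnace has a set_point (the temperature to maintain) and displays the current temperature. Watching the display, the operator turns a potentiometer to change the current, which in turn changes the temperature. How strongly the current is raised or lowered in proportion to the error relates to P; the error that accumulates over time relates to I; the rate of change and the future trend relate to D.

Strings	Description
algo_type	pid
sampling	Sampling period, in ms
sensor	Sensor name
device	Device adjusted by the PID algorithm
set_point	Target value the PID algorithm regulates to, in mC or mA
set_point_clr	Value at which the PID algorithm stops adjusting
p_const	P constant of the PID algorithm
i_const	I constant of the PID algorithm
d_const	D constant of the PID algorithm
i_samples	Number of samples used for the integral error component
dev_units_per_calc	Units of device adjustment per PID calculation outcome
freq_scale	Frequency scaling factor for DUAL PID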
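
As a rough illustration of how these fields fit together, here is a textbook PID loop in Python. It is a sketch under stated assumptions, not thermal-engine's implementation: the class name `PidSketch`, expressing the error in degrees C, and limiting the integral to the last i_samples errors are all guesses made for the example.

```python
from collections import deque


class PidSketch:
    """Textbook PID loop that reuses the field names from the table above."""

    def __init__(self, set_point, p_const, i_const, d_const,
                 i_samples, dev_units_per_calc):
        self.set_point = set_point
        self.p_const = p_const
        self.i_const = i_const
        self.d_const = d_const
        self.dev_units_per_calc = dev_units_per_calc
        # Only the last i_samples errors feed the integral term (assumption).
        self.errors = deque(maxlen=i_samples)
        self.last_error = None

    def step(self, temp_mc, sampling_ms):
        """Return a device adjustment; negative means lower the frequency cap."""
        dt = sampling_ms / 1000.0
        # Assumption: the error is expressed in degrees C (mC / 1000).
        error = (self.set_point - temp_mc) / 1000.0
        self.errors.append(error)
        integral = sum(self.errors) * dt
        derivative = 0.0 if self.last_error is None else (error - self.last_error) / dt
        self.last_error = error
        outcome = (self.p_const * error + self.i_const * integral
                   + self.d_const * derivative)
        return outcome * self.dev_units_per_calc


pid = PidSketch(set_point=85000, p_const=1.0, i_const=1.0, d_const=1.0,
                i_samples=10, dev_units_per_calc=10000)
for temp in (87000, 86000):
    print(temp, pid.step(temp, sampling_ms=1000))
```

The returned value would be added to the current maximum allowed frequency (in kHz for a CPU device), so a negative result tightens the cap and a positive one relaxes it.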

■ ss

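ss is the DTM (Dynamic Thermal Management) strategy. The dynamic control algorithm has two modes: DTM (for frequency control) and PID (for holding a temperature). PID is described above. DTM raises or lowers the CPU's maximum allowed frequency based on a temperature set point. Each adjustment moves by the frequency steps available in the DCVS frequency table: while the temperature is above the set point, the maximum allowed frequency is lowered step by step; while it is below the set point, the maximum allowed frequency is raised step by step. This strategy can also be used to control GPU frequency.

Strings	Description
algo_type	ss
sampling	Sampling period, in ms
sensor	Sensor name
device	Device adjusted by the algorithm
set_point	Target value the algorithm regulates to, in mC or mA
set_point_clr	Value at which the algorithm stops adjusting
time_constant	Multiplier of sampling period for holding off adjustments when current and last error sample are equivalent
device_max_limit	Optional. Field to specify device performance mitigation. If it is defined, this instance mitigates the device up to the device_max_limit value. It expects a value in KHz for a cpu device and Hz for a gpu device
device_perf_floor	Optional. Field to specify device performance mitigation floor. If it is defined, this instance stops mitigating a device to a level with a corresponding perf_lvl at or above device_perf_floor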
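
The stepping behaviour can be sketched in a few lines of Python. This is an illustration only: `ss_step` is a hypothetical name, the frequency table values are invented, and time_constant, set_point_clr, device_max_limit, and device_perf_floor are not modelled.

```python
def ss_step(temp, set_point, index, freq_table):
    """Move at most one step through freq_table and return the new index.

    freq_table is sorted ascending; index points at the current maximum
    allowed frequency.
    """
    if temp > set_point and index > 0:
        index -= 1  # too hot: step the cap down
    elif temp < set_point and index < len(freq_table) - 1:
        index += 1  # cool enough: step the cap back up
    return index


freq_table = [400000, 800000, 1200000]  # kHz, made-up DCVS steps
index = len(freq_table) - 1
for temp in (96000, 97000, 94000):
    index = ss_step(temp, 95000, index, freq_table)
    print(temp, "->", freq_table[index], "kHz")
```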

■ tb

tb is the Token Bucket algorithm, newly supported on the 8998. The frequency output is controlled through feedback, similar in concept to the PID algorithm. When the temperature rises above set_point, the frequency is cut by a relatively large amount, and the temperature is then held by limiting the frequency.

Strings	Description
algo_type	tb
sampling	Sampling period, in ms
sensor	Sensor name
device	Device adjusted by the algorithm
set_point	Target value the algorithm regulates to, in mC or mA
set_point_clr	Value at which the algorithm stops adjusting
time_constant	Multiplier of sampling period for holding off adjustments when current and last error sample are equivalent
up_loop_gain	Gain multiplier for token bucket reward calculations
down_loop_gain	Gain multiplier for token bucket penalty calculations
auto_penalty	Extra penalty added when above set_point. Should be in units of temperature/temp_scale_factor
auto_reward	Extra reward added when below set_point. Should be in units of temperature/temp_scale_factor
temp_scale_factor	Temperature scaling factor. Divider for temperature to get it to units of Degree Celsius from the value returned by reading the sensor
freq_scale_factor	Frequency scaling factor. Multiplier for device frequency readings. Default device frequency unit is KHz, so set to 1 if device frequency is read in KHz, 1000 for Hz. This is necessary because GPU freq is often in Hz
quadratic_reward	Boolean (zero false, nonzero true) to use a quadratic reward system in the control loop
quadratic_penalty	Boolean (zero false, nonzero true) to use a quadratic penalty system in the control loop
use_timeout	Boolean (zero false, nonzero true) to use a timeout when exiting the active control zone for this client. This helps in smoothing handoff between LMH and thermal sensors
timeout	If use_timeout is true, then once the temperature is at or below set_point the algorithm waits for 'timeout' number of interrupts before exiting, feeding reward into the system of value 'set_point' - 'set_point_clr'
unified_rail	If 'unified_rail' is true for any configuration instance of the algorithm, the unified rail logic of token bucket is applied

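■ virtual

Two or more tsens sensors can be combined, with weights, into a single virtual sensor. (See the 8998 video at 15:20.)

Strings	Description
algo_type	virtual
trip_sensor	Sensor name
set_point	Above this temperature, the virtual sensor starts polling mode
set_point_clr	Below this temperature, the virtual sensor stops polling mode
sensors	Array of sensors used to compute the combined temperature
weights	Array of weight values
math	Optional. The default, 0, uses weights; 1 for minimum of sensors, 2 for max of sensors
sampling	Sampling period, in ms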
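
The `math` field selects how the individual readings are combined. A minimal Python sketch follows. `virtual_temp` is a hypothetical name, and dividing the weighted sum by the total weight (so that weights such as 40 and 60 behave like percentages) is an assumption, not something the table states.

```python
def virtual_temp(readings, weights, math=0):
    """Combine sensor readings (mC) into one virtual reading."""
    if math == 1:
        return min(readings)
    if math == 2:
        return max(readings)
    # math == 0: weighted average (normalisation by sum(weights) is an assumption).
    return sum(r * w for r, w in zip(readings, weights)) / sum(weights)


readings = [50000, 60000]  # mC, made-up values for two sensors
print(virtual_temp(readings, [40, 60]))
print(virtual_temp(readings, [40, 60], math=1))
print(virtual_temp(readings, [40, 60], math=2))
```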

3. Field values in the configuration file

● 'thresholds' / 'thresholds_clr' / 'actions' / 'action_info' each accept at most 8 space-separated values

● 'actions' field

actions	Description
none	-ACTION - Do nothing -ACTION_INFO - ignored
report	-ACTION - Report threshold crossing to UI -ACTION_INFO - ignored. Note: the threshold crossing information is sent over the abstract local socket "THERMALD_UI" as strings, one per line. A message is sent whenever a level is cleared or triggered, or when another action reaches that level. Parameters are sent in the following order: sensorname - Name of sensor reporting; temperature - Current temperature; current_threshold_level - current threshold level triggered or cleared; is_trigger - "true" on level trigger, "false" on level clearing
cpu	-ACTION - CPU frequency scaling -ACTION_INFO - Max CPU frequency in KHz
cpuN	-ACTION - CPU frequency scaling where N is the specific CPU core [0..MAX CORES] -ACTION_INFO - Max CPU frequency in KHz
clusterN	-ACTION - CLUSTER frequency scaling where N is the specific CLUSTER ID -ACTION_INFO - Max CLUSTER frequency in KHz
hotplug_N	-ACTION - Hotplug CPU core N -ACTION_INFO - 0 for online a core, or 1 to offline it
lcd	-ACTION - LCD brightness throttling -ACTION_INFO - 0-255 value for max LCD brightness
modem	-ACTION - Request throttling of modem functionality -ACTION_INFO - 0-3 throttling level for modem mitigation
fusion	-ACTION - Request throttling of fusion modem functionality -ACTION_INFO - 0-3 throttling level for fusion modem mitigation
battery	-ACTION - Battery charging current throttling -ACTION_INFO - 0-3 throttling level for battery charging current
gpu	-ACTION - GPU frequency scaling -ACTION_INFO - Max GPU frequency in Hz
wlan	-ACTION - WLAN throttling -ACTION_INFO - 0-4 throttling level for WLAN mitigation
shutdown	-ACTION - Shutdown target -ACTION_INFO - Shutdown delay in ms
vdd_restriction	-ACTION - Request voltage restriction of all vdd rails on SoC -ACTION_INFO - 1 for request for vdd_restriction, 0 for release vdd_restriction
camera	-ACTION - camera fps throttling and camera shutdown mitigation -ACTION_INFO - 0-3 throttling level for camera fps mitigation, 10 level for camera app shutdown
camcorder	-ACTION - camcorder fps throttling and camcorder shutdown mitigation -ACTION_INFO - 0-3 throttling level for camcorder fps mitigation, 10 level for camcorder app shutdown
mdp	-ACTION - Request throttling of MDP CX voting -ACTION_INFO - 0-3 throttling level for MDP mitigation
venus	-ACTION - Request throttling of VENUS CX voting -ACTION_INFO - 0-3 throttling level for VENUS mitigation
modem_cx	-ACTION - Request throttling of modem CX voting -ACTION_INFO - 0-3 throttling level for MODEM CX mitigation

● 'device' field

device	Description
cpu	-DEVICE - Dynamic CPU frequency scaling
cpuN	-DEVICE - Dynamic CPU frequency scaling where N is the specific CPU core [0..MAX CORES]
clusterN	-DEVICE - Dynamic CLUSTER frequency scaling where N is the specific CLUSTER ID
gpu	-DEVICE - Dynamic GPU frequency scaling

4. Configuration file examples

Example 1:

sampling 1000

[PMIC_THERM_MON]
algo_type monitor
sensor PMIC_THERM
sampling 5000
thresholds 40200 45000 50000
thresholds_clr 38000 43000 48000
actions cpu+report cpu cpu
action_info 1188000+0 368640 245760

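Description:

1) The default sampling period is 1 s; sensor PMIC_THERM sets its own sampling period of 5 s, which overrides the default.

2) When the temperature rises above 40.2 °C, threshold 1 triggers and the maximum allowed CPU frequency is set to 1188000 KHz. In this example that is already the maximum frequency, so nothing actually changes. A report is sent at the same time; the action_info value 0 is ignored.

3) Threshold 2 triggers when the temperature rises above 45 °C and clears when it falls below 43 °C. When it triggers, the maximum allowed CPU frequency is set to 368640 KHz.

4) When threshold 2 clears below 43 °C, the maximum allowed CPU frequency returns to 1188000 KHz.

5) When threshold 1 clears below 38 °C, a report for clearing threshold 1 is generated and all mitigation is reset.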
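
The way `actions` and `action_info` line up (space-separated per threshold level, '+'-joined within a level) can be shown with a short Python sketch. `pair_actions` is a hypothetical helper for illustration, not a function from thermal-engine.

```python
def pair_actions(actions, action_info):
    """Return, per threshold level, a list of (action, info) pairs."""
    levels = []
    for acts, infos in zip(actions.split(), action_info.split()):
        names = acts.split("+")
        values = [int(v) for v in infos.split("+")]
        if len(names) != len(values):
            raise ValueError("each action needs exactly one action_info value")
        levels.append(list(zip(names, values)))
    return levels


for level, pairs in enumerate(
        pair_actions("cpu+report cpu cpu", "1188000+0 368640 245760"), start=1):
    print("threshold", level, pairs)
```

With the values from Example 1, threshold 1 carries two pairs, `cpu` with 1188000 and `report` with the ignored value 0, while thresholds 2 and 3 each carry a single `cpu` pair.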

Example 2:

debug
sampling 2000

[PMIC_THERM_MONITOR]
algo_type monitor
sensor PMIC_THERM
sampling 5000
thresholds 40200 45000 50000
thresholds_clr 38000 43000 48000
actions cpu+report cpu report+shutdown
action_info 768000+0 368640 0+6000

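Description:

1) Debug logging output is enabled.

2) The default sampling period is 2 s; the sampling period of sensor PMIC_THERM is set to 5 s.

3) Threshold 1 triggers when the temperature rises above 40.2 °C and clears when it falls below 38 °C. When threshold 1 triggers, the maximum allowed CPU frequency is 768000 KHz and a report is generated (action_info value 0 is ignored).

4) Threshold 2 triggers when the temperature rises above 45 °C and clears when it falls below 43 °C. When threshold 2 triggers, the maximum allowed CPU frequency is 368640 KHz.

5) Threshold 3 triggers when the temperature rises above 50 °C and clears when it falls below 48 °C. When threshold 3 triggers, a report is generated and the device shuts down after 6 s.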

Example 3:

debug
sampling 2000

[bcl_monitor]
algo_type monitor
descending
sensor bcl
sampling 1000
thresholds 100 0
thresholds_clr 500 100
actions report report
action_info 0 0

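Description:

1) Debug output is enabled.

2) The default sampling period is 2 s; the sampling period of the battery current limit sensor 'bcl' is set to 1 s.

3) Threshold 1 triggers when ibat rises to (imax - 100 mA) and clears when it falls to (imax - 500 mA). When triggered, a report is generated (action_info value 0 is ignored).

4) Threshold 2 triggers when ibat rises to imax and clears when it falls to (imax - 100 mA). When triggered, a report is generated (action_info value 0 is ignored).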

Example 4:

debug

[TEST_PID]
algo_type pid
sensor tsens_tz_sensor0
device cpu
sampling 1000
set_point 85000
set_point_clr 65000
p_const 1.0
i_const 1.0
d_const 1.0
i_samples 10
dev_units_per_calc 10000

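Description:

1) Debug logging is enabled.

2) The PID algorithm instance label is TEST_PID.

3) The instance uses the tsens_tz_sensor0 sensor.

4) The device being adjusted is the CPU.

5) The sampling period is 1 s.

6) set_point is both the threshold at which PID regulation starts and the target value of the PID algorithm.

7) set_point_clr is the threshold at which PID regulation stops.

8) p_const, i_const, and d_const are the p, i, and d constants of the PID equation.

9) dev_units_per_calc of 10000 kHz (kHz because this is a CPU device) is multiplied with the PID calculation outcome to determine the adjustment on the cpu device.

Example 5: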

debug

[virtual-sensor-0]
algo_type virtual
trip_sensor tsens_tz_sensor8
set_point 35000
set_point_clr 30000
sensors tsens_tz_sensor1 tsens_tz_sensor5
weights 40 60
sampling 250

[Test-PID]
algo_type pid
sensor virtual-sensor-0
device cpu1
sampling 250
set_point 55000
set_point_clr 50000
p_const 1.25
i_const 0.8
d_const 0.5
i_samples 10
dev_units_per_calc 5000

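Description:

1) The PID instance Test-PID works on the result of virtual-sensor-0.

2) virtual-sensor-0 has to be defined by the user.

3) trip_sensor indicates when the virtual sensor enters polling mode.

4) trip_sensor must be a regular sensor; it cannot be another virtual sensor.

5) set_point is the trip sensor's threshold. Above this threshold the trip sensor switches from interrupt mode to polling mode; the polling period is set by the virtual sensor's sampling field.

6) set_point_clr is the trip sensor's clear threshold. Below this threshold the trip sensor stops polling and waits for the next threshold event.

7) sensors defines the array of regular sensors that take part in the weighted temperature calculation.

8) weights gives the weight for each sensor in that array.

9) The virtual sensor's set_point must be lower than the pid algorithm's set_point so that pid is notified when its set_point is reached. In addition, if the virtual sensor has not entered polling mode, pid cannot read its temperature.

10) If the trip sensor does not support switching from interrupt mode to polling mode, item 9 can be ignored. In that case the virtual sensor's sampling period should match the pid sampling period.

Example 6: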

debug

[bcm_monitor]
algo_type monitor
sensor bcm
sampling 1000
thresholds 70000 90000 // note: the unit is m%
thresholds_clr 69000 89000
actions cpu cpu
action_info 768000 384000

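Description:

1) Debug logging is enabled.

2) When the sampled current reaches 70% of imax, threshold 1 triggers and the maximum allowed CPU frequency is set to 768000 KHz.

3) When the sampled current reaches 90% of imax, threshold 2 triggers and the maximum allowed CPU frequency is set to 384000 KHz.

4) bcm supports only 2 threshold levels.

5) Valid threshold values are: 40000, 50000, 60000, 70000, 80000, 90000.

Example 7: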

debug

[TB-CPU4]
algo_type tb
sampling 10
sensor tsens_tz_sensor13
device cluster1
set_point 85000
set_point_clr 50000
time_constant 1
up_loop_gain 2
down_loop_gain 3
auto_penalty 1.0
auto_reward 0.0
temp_scale_factor 1000
freq_scale_factor 1
quadratic_reward 1
quadratic_penalty 1
use_timeout 0
timeout 1

1) Debug logging is enabled.

2) CPU4 is configured to be controlled by token bucket. Because the algorithm is tb, the sensor is tsens_tz_sensor13, and the device is cluster1, this instance monitors tsens_tz_sensor13 and controls the frequency of cluster1.

3) A temperature read and a frequency mitigation pass happen every 10 ms.

4) When the temperature rises to 85 °C, the activity in 3) is triggered; it is cleared when the temperature falls to 50 °C.

5) time_constant is currently unused but is reserved for future use.

6) up_loop_gain set to 2 means that for every degree that tsens_tz_sensor13 is below its set_point, the reward will increase by a factor of 2.

7) down_loop_gain set to 3 means that for every degree that tsens_tz_sensor13 overshoots its set_point, the penalty will increase by a factor of 3.

8) auto_penalty of 1.0 means that the number of degrees of overshoot used to calculate penalty will always be incremented by 1.0. This means that at 85C, the temperature overshoot will be 1C, 86C->2C, etc.

9) auto_reward of 0.0 means that the number of degrees of undershoot used to calculate the reward will not be augmented. This means that at 80C, the temperature undershoot will be 5C, 81C->4C, etc.

10) temp_scale_factor of 1000 means that the temperature readings from tsens_tz_sensor13 are received in mC and not C. This used to vary between tsens sensors and other types of sensors in the thermal-engine such as gen sensors.

11) freq_scale_factor of 1 means that the frequency readings from the device (cluster1) are received in kHz. This varies between devices, for instance the GPU frequency readings on 8994 are received in Hz, and therefore the freq_scale_factor for the 8994 GPU is 1000.

12) quadratic_reward of 1 means to square the amount of reward. Whereas reward would normally be (undershoot * up_loop_gain), if this value is nonzero then reward is calculated as (undershoot * up_loop_gain)^2.

13) quadratic_penalty of 1 means to square the amount of penalty. Whereas penalty would normally be (overshoot * down_loop_gain), if this value is nonzero then penalty is calculated as (overshoot * down_loop_gain)^2.

14) use_timeout of 0 means that when the readings on tsens_tz_sensor13 reach their set_point_clr, the algorithm should immediately stop polling and stop mitigating. If this value were nonzero, then the algorithm would wait for timeout number of sequential interrupts (i.e. timeout * sampling ms) in which the reading from tsens_tz_sensor13 was below set_point_clr before ceasing to poll and mitigate.
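
Items 6) to 13) describe how a single sample is turned into a reward or a penalty. The Python sketch below reproduces only that arithmetic, using the Example 7 values. `tb_tokens` is a hypothetical name, treating a reading exactly at set_point as overshoot follows item 8), and how the tokens are then mapped to a frequency limit is not covered by the description, so it is not modelled.

```python
def tb_tokens(temp_raw, set_point_raw, cfg):
    """Return (reward, penalty) for one temperature sample."""
    # Convert the raw difference to degrees C; positive means at or above set_point.
    delta = (temp_raw - set_point_raw) / cfg["temp_scale_factor"]
    if delta >= 0:
        overshoot = delta + cfg["auto_penalty"]
        penalty = overshoot * cfg["down_loop_gain"]
        if cfg["quadratic_penalty"]:
            penalty = penalty ** 2
        return 0.0, penalty
    undershoot = -delta + cfg["auto_reward"]
    reward = undershoot * cfg["up_loop_gain"]
    if cfg["quadratic_reward"]:
        reward = reward ** 2
    return reward, 0.0


cfg = {
    "temp_scale_factor": 1000,
    "up_loop_gain": 2,
    "down_loop_gain": 3,
    "auto_penalty": 1.0,
    "auto_reward": 0.0,
    "quadratic_reward": 1,
    "quadratic_penalty": 1,
}
for temp in (80000, 85000, 86000):
    print(temp, tb_tokens(temp, 85000, cfg))
```

At 86000 mC the overshoot is 2 C (1 C measured plus the auto_penalty of 1.0), so the penalty is (2 * 3)^2 = 36; at 80000 mC the undershoot is 5 C and the reward is (5 * 2)^2 = 100.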

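5. Loading

About speaker-cal: the following comes from 80-N9649-1_D, Thermal Calibration Procedure for Speaker Coil Protection.

The goal of thermal calibration for the speaker coil is to measure the speaker coil's temperature accurately from the temperature sensors.

This is done in three steps:

1. Convert resistance to temperature: find the linear relationship between them by actual measurement, b = T - mR

2. Determine the offset value in the thermal management algorithm: if logcat shows a minimum temperature of 32 °C while a multimeter measures 27 °C, then offset = 27 - 32 = -5 °C

3. Program offset into device: this is the speaker-cal content output by thermal-engine -o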

[SPEAKER-CAL]
sampling 30000 30000 10 1800000
sensor pm8953_tz
sensors tsens_tz_sensor1 tsens_tz_sensor2 tsens_tz_sensor3 tsens_tz_sensor14 tsens_tz_sensor15
temp_range 6000 10000 2000
max_temp 45000
offset -4000
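
Steps 1 and 2 of the calibration are simple arithmetic, sketched here in Python. The function names and the two (resistance, temperature) measurement points are made up for illustration; only the relation b = T - mR and the 27 - 32 offset example come from the text above.

```python
def fit_line(r1, t1, r2, t2):
    """Fit T = m * R + b through two measured (resistance, temperature) points."""
    m = (t2 - t1) / (r2 - r1)
    b = t1 - m * r1  # b = T - mR
    return m, b


def calibration_offset(reference_c, reported_c):
    """Offset that moves the reported temperature onto the reference reading."""
    return reference_c - reported_c


# Hypothetical measurements: 8.0 ohm at 25 C and 9.0 ohm at 55 C.
m, b = fit_line(8.0, 25.0, 9.0, 55.0)
print("m =", m, "b =", b)

# Values from step 2: logcat reports 32 C, the multimeter reads 27 C.
print("offset =", calibration_offset(27, 32))
```

The offset field in the conf file is written in mC, so an offset of -5 °C would be entered as -5000.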

Macro that is defined: ANDROID

Macro that is not defined: ENABLE_OLD_PARSER

thermal.c

/* Holds a linked list of setting_info entries with the basic data of each thermal algorithm */
struct thermal_setting_t thermal_settings;

int main(int argc, char **argv)
{
    …
    if (!config_file) {
        if ((config_file = get_target_default_thermal_config_file()))
            info("Using target config file '%s'\n", config_file);
        else /* this branch is taken: system/etc/thermal-engine.conf */
            info("No target config file, falling back to '%s'\n",
                 CONFIG_FILE_DEFAULT);
    }
    if (output_conf) { /* thermal-engine -o: dump config file of active settings */
        devices_manager_init();
        devices_init(minimum_mode);
        sensors_manager_init();
        ...
        return 0;
    }
    …
    devices_manager_init(); /* does nothing */
    /* Reads sysfs nodes to initialize gpufreq, cpufreq, clusterfreq,
     * thermal_ioctl, qmi_communication, the various devs, vdd*,
     * profile_switch, lcd, battery_mitigation, and so on */
    devices_init(minimum_mode);
    target_algo_init();

    /* Vote to keep kernel mitigation enabled until init is done */
    /* Reads the nodes under /sys/kernel/ and assembles them into a linked list */
    kernel_dev = devices_manager_reg_clnt("kernel");
    if (kernel_dev == NULL) {
        msg("%s Failed to create kernel device handle\n", __func__);
    }
    req.value = 1;
    device_clnt_request(kernel_dev, &req);

    sensors_manager_init(); /* does nothing */
    /* 1. Starts filling sensors[] from /sys/module/msm_thermal/sensor_info
     * 2. Continues filling sensors[] from /sys/devices/virtual/thermal/thermal_zone*
     * 3. Adds bcl to struct sensors_mgr_sensor_info *sensor_list */
    sensors_init(minimum_mode);
    if (thermal_algo_framework_init() != THERMAL_ALGO_SUCCESS) { /* thermal_threads = NULL; */
        info("%s: Error initializing thermal algorithm framework",
             __func__);
        return 1;
    }

    init_settings(&thermal_settings); /* zeroes the structure */
    /* Initializes the predefined pid data and adds it to thermal_settings.setting_info */
    pid_init_data(&thermal_settings);
    thermal_monitor_init_data(&thermal_settings); /* monitor */
    speaker_cal_init_data(&thermal_settings);
    ss_init_data(&thermal_settings);
    tb_init_data(&thermal_settings);

    virtual_sensors_init_data(&thermal_settings);
    /* Initializes the virtual sensor list and loads the config_file data into it */
    virtual_sensors_init(&thermal_settings, config_file);

    load_config(&thermal_settings, config_file, LOAD_ALL_FLAG); /* thermal_config_v2.c */

    thermal_server_init();
    /* Creates 4 server sockets and listens on them with select():
     * 1. sockfd_server_send: notifies registered clients of the current thermal level
     * 2. sockfd_server_recv: receives data from clients and invokes the matching
     *    callback to update; the callbacks are added during the pid, monitor,
     *    speaker, and ss initialization that follows
     * 3. sockfd_server_log: adds the new fd obtained from a client connection to
     *    socket_fd[MAX_SOCKET_FD]
     * 4. sockfd_server_recv_passive: flushes the buffer, clears the select descriptor
     *    set, and clears thermal_send_fds[], which stores the client fd of each new
     *    connection
     */
    pid_algo_init(&thermal_settings); /* initializes pid and starts the pid_algo_monitor thread */
    thermal_monitor(&thermal_settings);
    ss_algo_init(&thermal_settings);
    speaker_cal_init(&thermal_settings);
    if (tb_algo_init(&thermal_settings) != THERMAL_ALGO_SUCCESS) {
        info("%s: Error initializing token bucket", __func__);
        return 1;
    }
    if (kernel_dev)
        device_clnt_cancel_request(kernel_dev);
    while (1)
        pause();
    ... /* release resources */
}
