The Linux cpufreq subsystem
Preface
Revision history
Date | Author | Version | Changes |
---|---|---|---|
2023.08.15 | 枫潇潇 | V1.0.0 | Initial version |
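1 Linux cpufreq overview
The function of the cpufreq framework is called Dynamic Voltage/Frequency Scaling (DVFS). By adjusting the voltage and frequency of the CPU, the system can trade power consumption against performance: when high performance is not needed, it lowers voltage and frequency to save power; when high performance is needed, it raises them. Two questions are key:
- 1) how to control the voltage and frequency of the CPU core;
- 2) when to change the voltage and frequency of the CPU core.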
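CPU cores address these two key points in one of two ways.
- Implementation 1: the CPU core adjusts voltage and frequency on its own according to its load, with no OS-level software involved.
This keeps software complexity very low. Usually the OS only has to tell the CPU core the allowed adjustment range (expressed as frequencies, scaling_min_freq and scaling_max_freq, also called the policy), and the core adjusts itself within it. Therefore:
Key point 1 is handled by the CPU core itself;
Key point 2: the OS sets a frequency range based on the broad use case (for example a high-performance or a low-performance scenario), and the CPU core decides when to change.
Note 1: because software is barely involved, this implementation may save less power.
- Implementation 2: the CPU core takes no part in the decision logic; OS software adjusts voltage and frequency according to how the system is running.
Here software controls the DVFS behavior almost entirely:
Key point 1: the frequency and voltage of the CPU core are controlled through the interfaces of the clock framework and the regulator framework;
Key point 2: depending on the scenario, the adjustment is manual (user-initiated, e.g. a power-saving mode) or automatic (software-driven, e.g. HMP).
Note 2: for key point 2, if adjustments are frequent, the CPU core must switch between frequencies quickly enough; this is covered in detail later.
To meet these requirements, the cpufreq framework abstracts several software entities, including the cpufreq driver, the cpufreq policy, and the cpufreq governor.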
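To make the power/performance trade-off concrete, the sketch below uses the textbook approximation that dynamic power scales with C * V^2 * f. It is not part of the cpufreq framework; the function name and the sample operating points are assumptions chosen for illustration.

```c
/*
 * Relative dynamic power of an operating point compared with a reference
 * point, using the approximation P ~ C * V^2 * f. The switched
 * capacitance C cancels out in the ratio, so it is not a parameter.
 */
double relative_dynamic_power(double freq_khz, double volt_mv,
                              double ref_freq_khz, double ref_volt_mv)
{
    double f_ratio = freq_khz / ref_freq_khz;
    double v_ratio = volt_mv / ref_volt_mv;

    return f_ratio * v_ratio * v_ratio;
}
```

Under this model, halving the frequency while lowering the voltage from 1000 mV to 800 mV leaves roughly a third of the dynamic power, which is why frequency and voltage are scaled together rather than frequency alone.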
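2 Software architecture
Downward, the cpufreq framework builds on the cpu subsystem driver, OPP, the clock framework, the regulator framework and related modules to control the frequency and voltage of the CPU core. This part is mainly implemented by the cpufreq driver.
Upward, the cpufreq framework uses cpufreq core, the cpufreq governors, cpufreq stats and other modules to expose CPU frequency query and control interfaces to user space through sysfs. When the frequency changes, it also notifies interested drivers through notifiers.
Internally, the cpufreq framework consists of cpufreq core, the cpufreq driver, the cpufreq governors, cpufreq stats and other modules.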
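3 Functions of the software modules
3.1 Cpufreq core
cpufreq core is the central module of the cpufreq framework. Like other kernel frameworks, it does three kinds of work:
Upward, it exposes a unified interface to user space through sysfs and notifies other drivers of frequency changes through notifiers;
Downward, it provides a driver framework for controlling CPU core frequency and voltage, which simplifies low-level driver development, and a governor framework for implementing different scaling mechanisms;
Internally, it encapsulates the logic needed to do this. That logic revolves around three data structures, struct cpufreq_driver, struct cpufreq_policy and struct cpufreq_governor, which are analyzed below.
3.1.1 struct cpufreq_driver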
struct cpufreq_driver {
struct module *owner;
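char name[CPUFREQ_NAME_LEN]; /* must be unique */
u8 flags;
/* needed by all drivers */
int (*init) (struct cpufreq_policy *policy); /* mandatory: fills in the policy */
int (*verify) (struct cpufreq_policy *policy); /* mandatory: checks that the policy meets the hardware's requirements */
/* define one out of two */
/* sets the range (i.e. the policy) within which the CPU core scales its own frequency */
int (*setpolicy) (struct cpufreq_policy *policy);
/* sets the CPU to a specific frequency; this is the legacy interface */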
int (*target) (struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation);
/* should be defined, if possible */
unsigned int (*get) (unsigned int cpu);
/* optional */
unsigned int (*getavg) (struct cpufreq_policy *policy,
unsigned int cpu);
int (*bios_limit) (int cpu, unsigned int *limit);
int (*exit) (struct cpufreq_policy *policy);
int (*suspend) (struct cpufreq_policy *policy);
int (*resume) (struct cpufreq_policy *policy);
struct freq_attr **attr;
};
Related APIs:
int cpufreq_register_driver(struct cpufreq_driver *driver_data);
int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
const char *cpufreq_get_current_driver(void);
void *cpufreq_get_driver_data(void);
3.1.2 struct cpufreq_cpuinfo
struct cpufreq_cpuinfo {
unsigned int max_freq;
unsigned int min_freq;
/* in 10^(-9) s = nanoseconds */
unsigned int transition_latency;
};
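cpuinfo holds the fixed scaling information of the CPU: maximum frequency, minimum frequency, and transition latency. The maximum and minimum frequencies can be derived from the frequency table.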
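The derivation of the minimum and maximum frequency from a frequency table can be sketched in user space as follows. This is a simplified model and not the kernel helper itself: struct freq_entry and table_cpuinfo are hypothetical names, while the two sentinel values mirror the CPUFREQ_ENTRY_INVALID and CPUFREQ_TABLE_END definitions from the kernel header.

```c
#include <stddef.h>

#define CPUFREQ_ENTRY_INVALID (~0u)
#define CPUFREQ_TABLE_END     (~1u)

/* User-space stand-in for the kernel's struct cpufreq_frequency_table. */
struct freq_entry {
    unsigned int flags;
    unsigned int driver_data;
    unsigned int frequency;   /* kHz, not necessarily in ascending order */
};

/*
 * Walk a table terminated by CPUFREQ_TABLE_END and report the lowest and
 * highest valid frequency. Returns 0 on success, -1 if the table holds no
 * valid entry.
 */
int table_cpuinfo(const struct freq_entry *table,
                  unsigned int *min_freq, unsigned int *max_freq)
{
    unsigned int lo = 0, hi = 0;
    int found = 0;
    size_t i;

    for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
        unsigned int f = table[i].frequency;

        if (f == CPUFREQ_ENTRY_INVALID)
            continue;               /* skip disabled entries */
        if (!found || f < lo)
            lo = f;
        if (!found || f > hi)
            hi = f;
        found = 1;
    }
    if (!found)
        return -1;

    *min_freq = lo;
    *max_freq = hi;
    return 0;
}
```

The transition latency cannot be derived this way; it is a property of the clock and regulator hardware that the driver has to supply.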
3.1.3 struct cpufreq_policy
struct cpufreq_policy {
cpumask_var_t cpus; /* CPUs requiring sw coordination */
cpumask_var_t related_cpus; /* CPUs with any coordination */
unsigned int shared_type; /* ANY or ALL affected CPUs
should set cpufreq */
unsigned int cpu; /* cpu nr of registered CPU */
struct cpufreq_cpuinfo cpuinfo;/* see above */
unsigned int min; /* in kHz */
unsigned int max; /* in kHz */
unsigned int cur; /* in kHz, only needed if cpufreq governors are used */
unsigned int policy; /* see above */
/* for CPUs that cannot scale automatically, the governor sets the exact frequency */
struct cpufreq_governor *governor; /* see below */
struct work_struct update; /* if update_policy() needs to be called, but you're in IRQ context */
struct cpufreq_real_policy user_policy;
struct kobject kobj;
struct completion kobj_unregister;
};
3.1.4 struct cpufreq_governor
struct cpufreq_governor {
char name[CPUFREQ_NAME_LEN];
/* callback that drives the governor's state transitions */
int (*governor) (struct cpufreq_policy *policy, unsigned int event);
/* callbacks behind the sysfs scaling_setspeed attribute */
ssize_t (*show_setspeed) (struct cpufreq_policy *policy, char *buf);
int (*store_setspeed) (struct cpufreq_policy *policy, unsigned int freq);
/* the largest frequency transition latency this governor tolerates */
unsigned int max_transition_latency; /* HW must be able to switch to
next freq faster than this value in nano secs or we
will fallback to performance governor */
struct list_head governor_list;
struct module *owner;
};
3.2 cpufreq drivers
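3.2.1 Steps for writing a cpufreq driver
A cpufreq driver handles the platform-specific control of CPU frequency and voltage. Within the cpufreq framework it is a very simple module, written as follows:
- 1) Do the platform-specific initialization, including obtaining and initializing the clock and regulator of the CPU core.
- 2) Build the frequency table, the list of frequency/voltage pairs that the CPU core supports, and store it in the policy during initialization.
- 3) Define a struct cpufreq_driver variable, fill in the required fields, and implement its callbacks according to the platform.
- 4) Call cpufreq_register_driver to register the driver with the cpufreq framework.
- 5) When a CPU device is added, cpufreq core calls the driver's init callback, in which the driver must initialize the struct cpufreq_policy variable.
- 6) At run time, cpufreq core calls the driver's setpolicy or target/target_index callback as needed to set the CPU's scaling policy or frequency.
- 7) On system suspend, the CPU frequency is set to a specified value or the driver's suspend callback is called; on system resume, the driver's resume callback is called.
3.2.2 cpufreq driver APIs and their functions
3.2.2.1 frequency table
The frequency table is the set of frequency/voltage combinations at which the CPU core runs correctly. It is usually determined early in a project by trying out operating points until a set is found that meets the stability and generality requirements.
One reason the frequency table exists: it maps each frequency to exactly one voltage, so the cpufreq framework only has to deal with frequency, and every policy is called a "frequency scaling" policy. While changing the frequency, the cpufreq driver can look up the matching voltage in the table and change the CPU core voltage, which implements "voltage scaling".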
/* Special Values of .frequency field */
#define CPUFREQ_ENTRY_INVALID ~0u
#define CPUFREQ_TABLE_END ~1u
/* Special Values of .flags field */
#define CPUFREQ_BOOST_FREQ (1 << 0)
struct cpufreq_frequency_table {
unsigned int flags;
unsigned int driver_data; /* driver specific data, not used by core */
unsigned int frequency; /* kHz - doesn't need to be in ascending order */
};
flags: currently only one flag is defined, CPUFREQ_BOOST_FREQ, which marks the frequency as a boost frequency.
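The structure above carries no voltage field, so a driver has to keep the frequency-to-voltage pairing itself. A minimal user-space sketch of such a lookup follows; struct opp_entry, opp_table and opp_voltage_for are hypothetical names, and the voltages are made-up illustrative values, not data from any real platform.

```c
#include <stddef.h>

/* Hypothetical operating point: one frequency and the voltage it needs. */
struct opp_entry {
    unsigned int freq_khz;
    unsigned int volt_uv;
};

/* Illustrative values only; real tables come from silicon validation. */
static const struct opp_entry opp_table[] = {
    { 240000,  900000 },
    { 408000,  950000 },
    { 600000, 1000000 },
    { 756000, 1100000 },
};

/* Return the voltage paired with freq_khz, or 0 if the frequency is unknown. */
unsigned int opp_voltage_for(unsigned int freq_khz)
{
    size_t i;

    for (i = 0; i < sizeof(opp_table) / sizeof(opp_table[0]); i++)
        if (opp_table[i].freq_khz == freq_khz)
            return opp_table[i].volt_uv;
    return 0;
}
```

A driver's frequency-setting callback would perform this lookup and then program the regulator and the clock; the framework above it only ever sees the frequency.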
3.2.2.2 struct cpufreq_driver
struct cpufreq_driver {
char name[CPUFREQ_NAME_LEN];
u8 flags;
void *driver_data;
/* needed by all drivers */
int (*init)(struct cpufreq_policy *policy);
int (*verify)(struct cpufreq_policy *policy);
/* define one out of two */
int (*setpolicy)(struct cpufreq_policy *policy);
/*
* On failure, should always restore frequency to policy->restore_freq
* (i.e. old freq).
*/
int (*target)(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation); /* Deprecated */
int (*target_index)(struct cpufreq_policy *policy,
unsigned int index);
unsigned int (*fast_switch)(struct cpufreq_policy *policy,
unsigned int target_freq);
/*
* Caches and returns the lowest driver-supported frequency greater than
* or equal to the target frequency, subject to any driver limitations.
* Does not set the frequency. Only to be implemented for drivers with
* target().
*/
unsigned int (*resolve_freq)(struct cpufreq_policy *policy,
unsigned int target_freq);
/*
* Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION
* unset.
*
* get_intermediate should return a stable intermediate frequency
* platform wants to switch to and target_intermediate() should set CPU
 * to that frequency, before jumping to the frequency corresponding
* to 'index'. Core will take care of sending notifications and driver
* doesn't have to handle them in target_intermediate() or
* target_index().
*
* Drivers can return '0' from get_intermediate() in case they don't
* wish to switch to intermediate frequency for some target frequency.
* In that case core will directly call ->target_index().
*/
unsigned int (*get_intermediate)(struct cpufreq_policy *policy,
unsigned int index);
int (*target_intermediate)(struct cpufreq_policy *policy,
unsigned int index);
/* should be defined, if possible */
unsigned int (*get)(unsigned int cpu);
/* optional */
int (*bios_limit)(int cpu, unsigned int *limit);
int (*exit)(struct cpufreq_policy *policy);
void (*stop_cpu)(struct cpufreq_policy *policy);
int (*suspend)(struct cpufreq_policy *policy);
int (*resume)(struct cpufreq_policy *policy);
/* Will be called after the driver is fully initialized */
void (*ready)(struct cpufreq_policy *policy);
struct freq_attr **attr;
/* platform specific boost support code */
bool boost_enabled;
int (*set_boost)(int state);
};
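- 1) Implementing the init callback:
The init callback is the entry point of a cpufreq driver. cpufreq core calls it after a CPU device has been added, and its main job is to initialize the policy variable (think of it as the cpufreq device). A driver does not need to care much about the internals of struct cpufreq_policy (the cpufreq framework itself is working toward this goal, for example by wrapping the initialization in an API).
In init, the driver has to initialize the following members of the policy:
cpus: tells cpufreq core which CPUs the policy applies to. In most cases the frequency of all CPU cores in the system is controlled together by the same hardware logic, so a single policy can manage all of them.
clk: the clock pointer, which cpufreq core can use to read the actual current frequency.
cpuinfo: the fixed scaling information of the CPU, namely maximum frequency, minimum frequency, and transition latency. The maximum and minimum frequencies can be derived from the frequency table.
min, max: the minimum and maximum frequency of the scaling policy. At initialization they can equal min_freq and max_freq in cpuinfo.
freq_table: the corresponding frequency table.
Interface for initializing the policy: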
int cpufreq_generic_init(struct cpufreq_policy *policy,
struct cpufreq_frequency_table *table,
unsigned int transition_latency);
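- 2) The verify callback
When upper-layer software wants to set a new policy, the driver's verify callback is called to check that the policy is valid. cpufreq core provides the two helpers below for this purpose: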
int cpufreq_frequency_table_verify(struct cpufreq_policy *policy,
struct cpufreq_frequency_table *table);
int cpufreq_generic_frequency_table_verify(struct cpufreq_policy *policy);
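How they work:
cpufreq_frequency_table_verify checks the policy against the given frequency table. The logic is simple: does the policy range {min, max} exceed the range in policy->cpuinfo, and does it exceed the range of frequencies in the table?
cpufreq_generic_frequency_table_verify is simpler still: it calls cpufreq_frequency_table_verify with the frequency table stored in the policy (policy->freq_table).
This is a good place to mention the layers of "frequency" in the cpufreq framework.
1) At the bottom are the frequencies defined in the frequency table: a finite set of discrete values that represents what the CPU can do.
2) Above that is the range in policy->cpuinfo, a simple restriction on scaling. It may match the frequency table or be narrower. It must be given when the driver initializes and cannot be changed afterwards.
3) Above that is the frequency range of the policy, which represents the scaling policy. For CPUs that scale automatically, it is enough to tell the CPU this range, and the range is then the basic unit of scaling. For CPUs that cannot, it is a software-level limit. This range can also be changed through sysfs.
4) At the top is the frequency value in the policy. For CPUs that cannot scale automatically, this value is the frequency the CPU runs at.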
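- 3) The setpolicy callback
For CPUs that scale automatically, the driver must provide this callback to tell the CPU the scaling range.
- 4) The target_index callback
For CPUs that cannot scale automatically, this callback sets the running frequency of the CPU. index is an index into the frequency table.
- 5) get_intermediate and target_intermediate: used by drivers that implement target_index rather than target (see the comment in the structure above). The author recommends that readers of this article avoid them.
- 6) The get callback
Returns the frequency of the given CPU. Drivers should provide it whenever possible.
- 7) exit: the counterpart of init, called when the CPU device is removed.
- 8) stop_cpu: called when the CPU is stopped.
- 9) The suspend and resume callbacks
During system suspend, the clock and regulator drivers may themselves be suspended, so the CPU must be set to a known frequency beforehand. A driver can do this in its suspend callback or through the suspend_freq field of the policy (cpufreq core then switches automatically).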
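The verify and target_index items above can be modeled in user space as shown below. This is a simplified sketch and not the kernel implementation: struct limits, sketch_verify and sketch_target_index are hypothetical names, the table is a plain array of kHz values, and the "lowest at or above" and "highest at or below" selection modes stand in for the kernel's relation argument.

```c
#define ENTRY_INVALID (~0u)   /* same sentinel values as the kernel table */
#define TABLE_END     (~1u)

struct limits {
    unsigned int cpuinfo_min, cpuinfo_max;  /* fixed hardware range, kHz */
    unsigned int min, max;                  /* current policy range, kHz */
};

static unsigned int clamp_u(unsigned int v, unsigned int lo, unsigned int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Simplified verify: pull policy min/max into the cpuinfo range, then make
 * sure at least one table frequency lies inside [min, max]. If none does,
 * widen max to the smallest table frequency above it.
 */
int sketch_verify(struct limits *p, const unsigned int *table)
{
    unsigned int next_larger = ~0u;
    int found = 0, i;

    p->min = clamp_u(p->min, p->cpuinfo_min, p->cpuinfo_max);
    p->max = clamp_u(p->max, p->cpuinfo_min, p->cpuinfo_max);
    if (p->min > p->max)
        p->min = p->max;

    for (i = 0; table[i] != TABLE_END; i++) {
        unsigned int f = table[i];

        if (f == ENTRY_INVALID)
            continue;
        if (f >= p->min && f <= p->max)
            found = 1;
        else if (f > p->max && f < next_larger)
            next_larger = f;
    }
    if (!found) {
        if (next_larger == ~0u)
            return -1;          /* nothing usable in the table */
        p->max = next_larger;
    }
    return 0;
}

/*
 * Pick a table index for target_khz inside [p->min, p->max].
 * want_low != 0: lowest frequency at or above the target;
 * want_low == 0: highest frequency at or below the target.
 * Returns -1 when no entry qualifies.
 */
int sketch_target_index(const struct limits *p, const unsigned int *table,
                        unsigned int target_khz, int want_low)
{
    int best = -1, i;

    target_khz = clamp_u(target_khz, p->min, p->max);
    for (i = 0; table[i] != TABLE_END; i++) {
        unsigned int f = table[i];

        if (f == ENTRY_INVALID || f < p->min || f > p->max)
            continue;
        if (want_low) {
            if (f >= target_khz && (best < 0 || f < table[best]))
                best = i;
        } else {
            if (f <= target_khz && (best < 0 || f > table[best]))
                best = i;
        }
    }
    return best;
}
```

The index returned by the second function is what a driver's target_index callback would receive; the driver then only has to program the hardware for that table entry.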
3.2.2.3 cpufreq_driver flags
/* flags */
#define CPUFREQ_STICKY (1 << 0) /* driver isn't removed even if all ->init() calls failed */
/* frequency changes do not affect kernel "constants" such as loops_per_jiffy */
#define CPUFREQ_CONST_LOOPS (1 << 1) /* loops_per_jiffy or other kernel "constants" aren't affected by frequency transitions */
/* flag related to the suspend/resume path */
#define CPUFREQ_PM_NO_WARN (1 << 2) /* don't warn on suspend/resume speed mismatches */
/*
 * This should be set by platforms having multiple clock-domains, i.e.
 * supporting multiple policies. With this sysfs directories of governor would
 * be created in cpu/cpu<num>/cpufreq/ directory and so they can use the same
 * governor with different tunables for different clusters.
 */
/*
 * In other words: CPUs in different clock domains are controlled separately,
 * so cpufreq core creates the governor interface once per policy. Without
 * this flag, a single governor interface covers all CPUs.
 */
#define CPUFREQ_HAVE_GOVERNOR_PER_POLICY (1 << 3)
/*
* Driver will do POSTCHANGE notifications from outside of their ->target()
* routine and so must set cpufreq_driver->flags with this flag, so that core
* can handle them specially.
*/
#define CPUFREQ_ASYNC_NOTIFICATION (1 << 4)
/*
* Set by drivers which want cpufreq core to check if CPU is running at a
* frequency present in freq-table exposed by the driver. For these drivers if
* CPU is found running at an out of table freq, we will try to set it to a freq
* from the table. And if that fails, we will stop further boot process by
* issuing a BUG_ON().
*/
#define CPUFREQ_NEED_INITIAL_FREQ_CHECK (1 << 5)
3.2.2.4 cpufreq_register_driver
int cpufreq_register_driver(struct cpufreq_driver *driver_data);
This interface is simple: implement the mandatory members of cpufreq_driver plus whatever the use case requires, then register the driver.
4 cpufreq core
4.1 Interfaces for user space
The cpufreq framework exposes its interface to user space through sysfs, as follows:
/sys/devices/system/cpu/cpu0/cpufreq/
|-- affected_cpus
|-- cpuinfo_cur_freq //read-only, current frequency of the CPU core
|-- cpuinfo_max_freq //read-only
|-- cpuinfo_min_freq //read-only
|-- cpuinfo_transition_latency //read-only, frequency transition latency
|-- related_cpus
|-- scaling_available_frequencies
|-- scaling_available_governors
|-- scaling_cur_freq
|-- scaling_driver
|-- scaling_governor
|-- scaling_max_freq
|-- scaling_min_freq
|-- scaling_setspeed
`-- stats
    |-- time_in_state
    |-- total_trans
    `-- trans_table
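scaling_max_freq and scaling_min_freq are the maximum and minimum frequencies that the scaling policy allows. For CPUs that scale automatically, changing them is the final frequency adjustment.
For CPUs that cannot, the frequency has to be set actively by other means, which is the job of the governor in use. There is one special case:
With the "userspace" governor, the CPU frequency can be changed directly through the scaling_setspeed node.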
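4.2 Frequency adjustment steps
Before the analysis, here are the steps of a frequency adjustment, using the "userspace" governor as the example. It is the simplest of all governors and one that driver engineers use often, because it lets the CPU frequency be changed from user space. The procedure is as follows (given as a shell script for simplicity):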
cd /sys/devices/system/cpu/cpu0/cpufreq/
cat cpuinfo_max_freq; cat cpuinfo_min_freq #the "physical" frequency range
cat scaling_available_frequencies #the list of available frequencies
cat scaling_available_governors #the available governors
cat scaling_governor #the current governor
cat cpuinfo_cur_freq; cat scaling_cur_freq #the current frequency; compare the two to see whether they differ
cat scaling_max_freq; cat scaling_min_freq #the range allowed by the current scaling policy
#assume the CPU cannot scale its frequency automatically
echo userspace > scaling_governor #switch the governor to userspace
#the target frequency must be listed in scaling_available_frequencies and lie within cpuinfo_min_freq/cpuinfo_max_freq
#if the target frequency is outside scaling_min_freq/scaling_max_freq, change those two values first
echo xxx > scaling_max_freq; echo xxx > scaling_min_freq
#finally, set the frequency, in kHz
echo xxx > scaling_setspeed
4.3 Internal logic
4.3.1 Initialization
4.3.1.1 cpufreq_interface
When a cpufreq driver is registered, subsys_interface_register is called to register a subsystem interface, which is defined as follows:
/* drivers/cpufreq/cpufreq.c */
static struct subsys_interface cpufreq_interface = {
.name = "cpufreq",
.subsys = &cpu_subsys,
.add_dev = cpufreq_add_dev,
.remove_dev = cpufreq_remove_dev,
};
The subsys of this interface is cpu_subsys, that is, the cpu bus (struct bus_type cpu_subsys). The interface provides two callbacks, add_dev and remove_dev.
4.3.1.2 __cpufreq_add_dev
static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
{
unsigned int j, cpu = dev->id;
int ret = -ENOMEM;
struct cpufreq_policy *policy;
unsigned long flags;
bool recover_policy = cpufreq_suspended;
#ifdef CONFIG_HOTPLUG_CPU
struct cpufreq_policy *tpolicy;
#endif
/* return immediately if the CPU is offline */
if (cpu_is_offline(cpu))
return 0;
pr_debug("adding CPU %u\n", cpu);
#ifdef CONFIG_SMP /* handle multiple CPU cores sharing one cpufreq policy */
/* check whether a different CPU already registered this
* CPU because it is in the same boat. */
policy = cpufreq_cpu_get(cpu);
if (unlikely(policy)) { /* another CPU has already registered the policy for us */
cpufreq_cpu_put(policy);
return 0;
}
#endif
if (!down_read_trylock(&cpufreq_rwsem))
return 0;
#ifdef CONFIG_HOTPLUG_CPU
/* Check if this cpu was hot-unplugged earlier and has siblings */
read_lock_irqsave(&cpufreq_driver_lock, flags);
list_for_each_entry(tpolicy, &cpufreq_policy_list, policy_list) {
if (cpumask_test_cpu(cpu, tpolicy->related_cpus)) {
read_unlock_irqrestore(&cpufreq_driver_lock, flags);
ret = cpufreq_add_policy_cpu(tpolicy, cpu, dev);
up_read(&cpufreq_rwsem);
return ret;
}
}
read_unlock_irqrestore(&cpufreq_driver_lock, flags);
#endif
/* Allocate the cpufreq policy.
* Restore the saved policy when doing light-weight init and fall back
* to the full init if that fails.
*/
policy = recover_policy ? cpufreq_policy_restore(cpu) : NULL;
if (!policy) {
recover_policy = false;
policy = cpufreq_policy_alloc();
if (!policy)
goto nomem_out;
}
/*
* In the resume path, since we restore a saved policy, the assignment
* to policy->cpu is like an update of the existing policy, rather than
* the creation of a brand new one. So we need to perform this update
* by invoking update_policy_cpu().
*/
if (recover_policy && cpu != policy->cpu)
WARN_ON(update_policy_cpu(policy, cpu, dev));
else
policy->cpu = cpu;
/* policy->cpus is a cpumask recording the online CPUs controlled through this policy */
cpumask_copy(policy->cpus, cpumask_of(cpu));
init_completion(&policy->kobj_unregister);
INIT_WORK(&policy->update, handle_update);
/* call driver. From then on the cpufreq must be able
* to accept all calls to ->verify and ->setpolicy for this CPU
*/
/* call the cpufreq driver's init callback */
ret = cpufreq_driver->init(policy);
if (ret) {
pr_debug("initialization failed\n");
goto err_set_policy_cpu;
}
/* related cpus should atleast have policy->cpus */
cpumask_or(policy->related_cpus, policy->related_cpus, policy->cpus);
/*
* affected cpus must always be the one, which are online. We aren't
* managing offline cpus here.
*/
cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
if (!recover_policy) {
policy->user_policy.min = policy->min;
policy->user_policy.max = policy->max;
}
down_write(&policy->rwsem);
/* point cpufreq_cpu_data of every online CPU sharing this policy (policy->cpus) at the policy */
write_lock_irqsave(&cpufreq_driver_lock, flags);
for_each_cpu(j, policy->cpus)
per_cpu(cpufreq_cpu_data, j) = policy;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
policy->cur = cpufreq_driver->get(policy->cpu);
if (!policy->cur) {
pr_err("%s: ->get() failed\n", __func__);
goto err_get_freq;
}
}
/*
* Sometimes boot loaders set CPU frequency to a value outside of
* frequency table present with cpufreq core. In such cases CPU might be
* unstable if it has to run on that frequency for long duration of time
* and so its better to set it to a frequency which is specified in
* freq-table. This also makes cpufreq stats inconsistent as
* cpufreq-stats would fail to register because current frequency of CPU
* isn't found in freq-table.
*
* Because we don't want this change to effect boot process badly, we go
* for the next freq which is >= policy->cur ('cur' must be set by now,
* otherwise we will end up setting freq to lowest of the table as 'cur'
* is initialized to zero).
*
* We are passing target-freq as "policy->cur - 1" otherwise
* __cpufreq_driver_target() would simply fail, as policy->cur will be
* equal to target-freq.
*/
/* if the driver set CPUFREQ_NEED_INITIAL_FREQ_CHECK, check that cur is listed in the frequency table */
if ((cpufreq_driver->flags & CPUFREQ_NEED_INITIAL_FREQ_CHECK)
&& has_target()) {
/* Are we running at unknown frequency ? */
ret = cpufreq_frequency_table_get_index(policy, policy->cur);
if (ret == -EINVAL) {
/* Warn user and fix it */
pr_warn("%s: CPU%d: Running at unlisted freq: %u KHz\n",
__func__, policy->cpu, policy->cur);
ret = __cpufreq_driver_target(policy, policy->cur - 1,
CPUFREQ_RELATION_L); /* set the frequency again, to a table entry */
/*
* Reaching here after boot in a few seconds may not
* mean that system will remain stable at "unknown"
* frequency for longer duration. Hence, a BUG_ON().
*/
BUG_ON(ret);
pr_warn("%s: CPU%d: Unlisted initial frequency changed to: %u KHz\n",
__func__, policy->cpu, policy->cur);
}
}
blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
CPUFREQ_START, policy);
if (!recover_policy) {
ret = cpufreq_add_dev_interface(policy, dev);
if (ret)
goto err_out_unregister;
blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
CPUFREQ_CREATE_POLICY, policy);
}
write_lock_irqsave(&cpufreq_driver_lock, flags);
list_add(&policy->policy_list, &cpufreq_policy_list); /* add the policy to the global list */
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
/* assign a governor to the new policy and call cpufreq_set_policy() to give this CPU a default policy */
cpufreq_init_policy(policy);
if (!recover_policy) {
policy->user_policy.policy = policy->policy;
policy->user_policy.governor = policy->governor;
}
up_write(&policy->rwsem);
kobject_uevent(&policy->kobj, KOBJ_ADD);
up_read(&cpufreq_rwsem);
pr_debug("initialization complete\n");
return 0;
err_out_unregister:
err_get_freq:
write_lock_irqsave(&cpufreq_driver_lock, flags);
for_each_cpu(j, policy->cpus)
per_cpu(cpufreq_cpu_data, j) = NULL;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (cpufreq_driver->exit)
cpufreq_driver->exit(policy);
err_set_policy_cpu:
if (recover_policy) {
/* Do not leave stale fallback data behind. */
per_cpu(cpufreq_cpu_data_fallback, cpu) = NULL;
cpufreq_policy_put_kobj(policy);
}
cpufreq_policy_free(policy);
nomem_out:
up_read(&cpufreq_rwsem);
return ret;
}
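4.3.1.3 Multiple CPU cores sharing one cpufreq policy
On an SMP system, several CPU cores may be controlled by the same scaling mechanism (this is the case on most platforms), meaning that the frequency and voltage of all those cores are adjusted together. Only one cpufreq policy is then needed. The code involved:
a) When the primary CPU is enumerated, cpufreq_add_dev calls the cpufreq driver's init callback (cpufreq_driver->init). The driver sets policy->cpus according to the system, telling cpufreq core which CPUs share the same cpufreq policy.
b) cpufreq_add_dev for the primary CPU continues: it initializes policy->related_cpus and removes the offline CPUs from policy->cpus. See the code analysis above.
c) cpufreq_add_dev for the primary CPU continues: it creates the sysfs interface and creates symbolic links for the other CPUs in policy->cpus.
d) When the secondary CPUs are enumerated, cpufreq_add_dev finds that the primary CPU has already done the work and returns immediately.
e) For hotpluggable CPUs, the symbolic link must be created again on hotplug-in, either because the primary CPU never created it or because it was deleted on hotplug-out.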
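The mask handling in steps a), b) and d) can be sketched with plain bit masks. This is a user-space illustration only: struct policy_masks, settle_masks and covered_by_policy are hypothetical names, and a 32-bit integer with one bit per CPU stands in for the kernel's cpumask type.

```c
#include <stdint.h>

/* One bit per CPU; a user-space stand-in for the kernel's cpumask. */
struct policy_masks {
    uint32_t cpus;          /* online CPUs managed by this policy */
    uint32_t related_cpus;  /* online and offline CPUs sharing the policy */
};

/*
 * Mirror the two mask steps performed after ->init() has reported which
 * CPUs share the policy: related_cpus |= cpus, then cpus &= online.
 */
void settle_masks(struct policy_masks *p, uint32_t driver_cpus, uint32_t online)
{
    p->cpus = driver_cpus;
    p->related_cpus |= p->cpus;
    p->cpus &= online;
}

/* A secondary CPU (cpu < 32) needs no new policy if an existing one lists it. */
int covered_by_policy(const struct policy_masks *p, unsigned int cpu)
{
    return (int)((p->related_cpus >> cpu) & 1u);
}
```

Keeping related_cpus separate from cpus is what lets a CPU that was offline at registration time find its policy again when it is plugged back in.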
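4.3.2 Frequency adjustment
The frequency adjustment logic of the cpufreq framework can be summarized as follows:
The policy (struct cpufreq_policy) sets the broad direction, mainly a frequency range made up of min_freq and max_freq; the cpufreq governor then determines the final frequency.
4.3.2.1 cpufreq_set_policy
cpufreq_set_policy installs a new cpufreq policy. It is called:
a) At initialization (__cpufreq_add_dev->cpufreq_init_policy->cpufreq_set_policy), to put into effect the base policy supplied by cpufreq_driver->init.
b) When scaling_max_freq or scaling_min_freq is changed (store_one->cpufreq_set_policy), to put into effect the new range set from user space.
c) When the cpufreq governor is changed (scaling_governor->store_scaling_governor->cpufreq_set_policy), to update the governor.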
static int cpufreq_set_policy(struct cpufreq_policy *policy,
struct cpufreq_policy *new_policy)
{
struct cpufreq_governor *old_gov;
int ret;
pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
new_policy->cpu, new_policy->min, new_policy->max);
memcpy(&new_policy->cpuinfo, &policy->cpuinfo, sizeof(policy->cpuinfo));
if (new_policy->min > policy->max || new_policy->max < policy->min)
return -EINVAL;
/* call the driver's verify callback to check that the new policy is valid */
/* verify the cpu speed can be set within this limit */
ret = cpufreq_driver->verify(new_policy);
if (ret)
return ret;
/* adjust if necessary - all reasons */
blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
CPUFREQ_ADJUST, new_policy);
/* CPUFREQ_INCOMPATIBLE: callback through the notifier chain */
/* adjust if necessary - hardware incompatibility*/
blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
CPUFREQ_INCOMPATIBLE, new_policy);
/*
* verify the cpu speed can be set within this limit, which might be
* different to the first one
*/
ret = cpufreq_driver->verify(new_policy);
if (ret)
return ret;
/* notification of the new policy */
blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
CPUFREQ_NOTIFY, new_policy);
policy->min = new_policy->min;
policy->max = new_policy->max;
pr_debug("new min and max freqs are %u - %u kHz\n",
policy->min, policy->max);
/* the driver provides setpolicy: the CPU core scales itself within the given range */
if (cpufreq_driver->setpolicy) {
policy->policy = new_policy->policy;
pr_debug("setting range\n");
return cpufreq_driver->setpolicy(new_policy);
}
/* old and new governor are the same: skip the switch and only update the limits */
if (new_policy->governor == policy->governor)
goto out;
pr_debug("governor switch\n");
/* save old, working values */
old_gov = policy->governor;
/* end old governor */
/* if an old governor exists, stop it: CPUFREQ_GOV_STOP -> CPUFREQ_GOV_POLICY_EXIT */
if (old_gov) {
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
up_write(&policy->rwsem);
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
down_write(&policy->rwsem);
}
/* start new governor */
policy->governor = new_policy->governor;
if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) {
if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
goto out; /* new governor started: CPUFREQ_GOV_POLICY_INIT -> CPUFREQ_GOV_START -> CPUFREQ_GOV_LIMITS */
up_write(&policy->rwsem);
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
down_write(&policy->rwsem);
}
/* new governor failed, so re-start old one */
pr_debug("starting governor %s failed\n", policy->governor->name);
if (old_gov) {
policy->governor = old_gov;
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT);
__cpufreq_governor(policy, CPUFREQ_GOV_START);
}
return -EINVAL;
out:
pr_debug("governor: change or update limits\n");
return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
}
4.3.2.2 scaling_setspeed
static ssize_t store_scaling_setspeed(struct cpufreq_policy *policy,
const char *buf, size_t count)
{
unsigned int freq = 0;
unsigned int ret;
if (!policy->governor || !policy->governor->store_setspeed)
return -EINVAL;
ret = sscanf(buf, "%u", &freq);
if (ret != 1)
return -EINVAL;
policy->governor->store_setspeed(policy, freq);
return count;
}
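The policy only defines a range for frequency adjustment. If the driver does not support setpolicy, the cpufreq governor must determine the actual frequency and call the driver's target or target_index callback to change the CPU frequency.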
5 cpufreq governor
The cpufreq policy sets the broad range for CPU frequency scaling, while the actual running frequency of the CPU is decided by the cpufreq governor.
5.1 Implementation of the cpufreq governor
5.1.1 struct cpufreq_governor
/* include/linux/cpufreq.h */
struct cpufreq_governor {
char name[CPUFREQ_NAME_LEN]; /* unique identifier of the governor */
int initialized; /* set once the governor has been initialized */
int (*governor) (struct cpufreq_policy *policy,
unsigned int event);
ssize_t (*show_setspeed) (struct cpufreq_policy *policy,
char *buf);
int (*store_setspeed) (struct cpufreq_policy *policy,
unsigned int freq);
unsigned int max_transition_latency; /* HW must be able to switch to
next freq faster than this value in nano secs or we
will fallback to performance governor */
struct list_head governor_list;
struct module *owner;
};
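The show_setspeed and store_setspeed callbacks serve scaling_setspeed requests from user space.
governor: the main functionality of a cpufreq governor is implemented through this callback. Driven by different events, it works as a state machine that starts, stops, and otherwise controls the governor.
5.1.2 governor event
The kernel abstracts governor control into the five events below. At the appropriate time, cpufreq core sends an event (through the .governor callback) to make the governor perform the corresponding action.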
/* include/linux/cpufreq.h */
/* Governor Events */
#define CPUFREQ_GOV_START 1
#define CPUFREQ_GOV_STOP 2
#define CPUFREQ_GOV_LIMITS 3
#define CPUFREQ_GOV_POLICY_INIT 4
#define CPUFREQ_GOV_POLICY_EXIT 5
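CPUFREQ_GOV_POLICY_INIT: sent before a policy starts a new governor (usually when the cpufreq policy has just been created or the governor changes). On receiving it, the governor does its preparation, such as initializing the data structures it needs (a timer, for example). Not every governor needs this event.
CPUFREQ_GOV_START: starts the governor.
CPUFREQ_GOV_STOP and CPUFREQ_GOV_POLICY_EXIT: the opposites of the two events above.
CPUFREQ_GOV_LIMITS: usually sent after the governor has started; asks the governor to check the frequency and adjust it so that it lies within the valid range defined by the policy.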
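The event-driven state machine can be illustrated with a toy governor in user space. This is a sketch under stated assumptions: struct toy_policy and toy_governor are hypothetical, the event values copy the defines listed above, and the behavior imitates a performance-style governor that pins the frequency to the policy maximum. The extra state checks are for illustration and are not taken from the kernel.

```c
enum gov_event {                 /* same values as the kernel defines */
    GOV_START = 1,
    GOV_STOP = 2,
    GOV_LIMITS = 3,
    GOV_POLICY_INIT = 4,
    GOV_POLICY_EXIT = 5,
};

struct toy_policy {
    unsigned int min, max, cur;  /* kHz */
    int gov_initialized;
    int gov_running;
};

/* Performance-style toy governor: always runs at the policy maximum. */
int toy_governor(struct toy_policy *p, unsigned int event)
{
    switch (event) {
    case GOV_POLICY_INIT:        /* prepare private data */
        p->gov_initialized = 1;
        return 0;
    case GOV_START:              /* begin controlling the frequency */
        if (!p->gov_initialized)
            return -1;
        p->gov_running = 1;
        p->cur = p->max;
        return 0;
    case GOV_LIMITS:             /* policy range changed: pin to the new max */
        if (!p->gov_running)
            return -1;
        p->cur = p->max;
        return 0;
    case GOV_STOP:
        p->gov_running = 0;
        return 0;
    case GOV_POLICY_EXIT:
        p->gov_initialized = 0;
        return 0;
    default:
        return -1;
    }
}
```

A governor switch then amounts to sending STOP and POLICY_EXIT to the old governor, followed by POLICY_INIT, START and LIMITS to the new one.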
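5.1.3 governor register
All governors are registered with the kernel through cpufreq_register_governor. The function is simple: it checks whether a governor with the same name is already registered and, if not, adds the governor to a global list: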
int cpufreq_register_governor(struct cpufreq_governor *governor)
{
int err;
if (!governor)
return -EINVAL;
if (cpufreq_disabled())
return -ENODEV;
mutex_lock(&cpufreq_governor_mutex);
err = -EBUSY;
if (!find_governor(governor->name)) {
err = 0;
list_add(&governor->governor_list, &cpufreq_governor_list);
}
mutex_unlock(&cpufreq_governor_mutex);
return err;
}
EXPORT_SYMBOL_GPL(cpufreq_register_governor);
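The same "reject duplicates, otherwise link into a global list" pattern can be tried out in user space. The sketch below is an assumption-laden model, not kernel code: struct gov_node, toy_find and toy_register_governor are hypothetical names, a singly linked list replaces list_head, names are compared exactly, and locking is omitted because the example is single-threaded.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 16

struct gov_node {
    char name[NAME_LEN];
    struct gov_node *next;
};

/* Head of the global list of registered governors. */
static struct gov_node *governor_list;

/* Return the registered governor called name, or NULL. */
struct gov_node *toy_find(const char *name)
{
    struct gov_node *g;

    for (g = governor_list; g; g = g->next)
        if (strncmp(g->name, name, NAME_LEN) == 0)
            return g;
    return NULL;
}

/* Register gov unless a governor of the same name already exists. */
int toy_register_governor(struct gov_node *gov)
{
    if (!gov)
        return -EINVAL;
    if (toy_find(gov->name))
        return -EBUSY;

    gov->next = governor_list;
    governor_list = gov;
    return 0;
}
```

Because the name is the only key, it also has to be the string that user space writes to scaling_governor to select the governor.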
5.2 Governor call flows
5.2.1 Startup flow
When a cpufreq device is added, cpufreq_init_policy is called. Its main job is to assign and start a cpufreq governor for the current cpufreq policy:
static void cpufreq_init_policy(struct cpufreq_policy *policy)
{
struct cpufreq_governor *gov = NULL;
struct cpufreq_policy new_policy;
int ret = 0;
memcpy(&new_policy, policy, sizeof(*policy));
/* if a governor was in use before hotplug, reuse it */
/* Update governor of new_policy to the governor used before hotplug */
gov = __find_governor(per_cpu(cpufreq_cpu_governor, policy->cpu));
if (gov)
pr_debug("Restoring governor %s for cpu %d\n",
policy->governor->name, policy->cpu);
else /* otherwise use the default, which is selectable in the kernel config, e.g. performance */
gov = CPUFREQ_DEFAULT_GOVERNOR;
new_policy.governor = gov;
/* Use the default policy if its valid. */
if (cpufreq_driver->setpolicy)
cpufreq_parse_governor(gov->name, &new_policy.policy, NULL);
/* set default policy */
ret = cpufreq_set_policy(policy, &new_policy);
if (ret) {
pr_debug("setting policy failed\n");
if (cpufreq_driver->exit)
cpufreq_driver->exit(policy);
}
}
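The `if (cpufreq_driver->setpolicy)` branch: if the cpufreq driver provides the setpolicy callback, the CPU can choose its own running frequency within the range given by the policy, so no governor is needed to pick one. If the selected governor is performance or powersave, however, the cpufreq driver must be told, so that its setpolicy callback can set the frequency range appropriately. This is done through the policy field of struct cpufreq_policy (a rather confusing name), which takes one of two values: CPUFREQ_POLICY_PERFORMANCE or CPUFREQ_POLICY_POWERSAVE.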
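5.2.2 Frequency scaling flow
1) There are two kinds of CPU: one only needs a scaling range and picks its own running frequency within it; the other needs software to specify the exact running frequency.
2) For the first kind, the cpufreq policy specifies the range policy->{min, max}, which is put into effect through the setpolicy callback.
3) For the second kind, the cpufreq policy specifies the range and also names the governor to use. Once started, the governor sets the CPU frequency either dynamically (for example by starting a timer, monitoring the system, and adjusting to the load) or statically (by setting a suitable fixed frequency).
The kernel documentation explains this process in detail:
Documentation/cpu-freq/governors.txt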
CPU can be set to switch independently   |          CPU can only be set
      within specific "limits"           |        to specific frequencies

                  "CPUfreq policy"
        consists of frequency limits (policy->{min,max})
             and CPUfreq governor to be used
                     /                  \
                    /                    \
                   /                the cpufreq governor decides
                  /                 (dynamically or statically)
                 /                  what target_freq to set within
                /                   the limits of policy->{min,max}
               /                          \
              /                            \
 Using the ->setpolicy call,        Using the ->target/target_index call,
     the limits and the              the frequency closest
      "policy" is set.               to target_freq is set.
                                     It is assured that it
                                     is within policy->{min,max}
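5.3 Common governors
1) Performance
Performance first: sets the CPU frequency directly to the maximum of policy->{min,max}.
2) Powersave
Power first: sets the CPU frequency directly to the minimum of policy->{min,max}.
3) Userspace
A user-space program changes the frequency through the scaling_setspeed file.
4) Ondemand
Adjusts the CPU frequency dynamically according to the current CPU utilization.
5) Conservative
Similar to Ondemand, but changes the frequency more gradually instead of jumping straight to the maximum or the minimum.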
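How these governors differ in the frequency they choose can be sketched as a single selection function. This is a deliberately simplified model: pick_frequency, enum toy_gov and the 80 percent threshold are assumptions for illustration, the load-proportional rule only imitates the idea behind Ondemand rather than the kernel's exact algorithm, and Conservative is left out because it needs state across samples.

```c
enum toy_gov { TOY_PERFORMANCE, TOY_POWERSAVE, TOY_USERSPACE, TOY_ONDEMAND };

#define UP_THRESHOLD 80   /* percent; illustrative value */

/*
 * Choose a frequency in [min_khz, max_khz].
 * load_pct is only used by TOY_ONDEMAND, user_khz only by TOY_USERSPACE.
 */
unsigned int pick_frequency(enum toy_gov gov,
                            unsigned int min_khz, unsigned int max_khz,
                            unsigned int load_pct, unsigned int user_khz)
{
    switch (gov) {
    case TOY_PERFORMANCE:
        return max_khz;
    case TOY_POWERSAVE:
        return min_khz;
    case TOY_USERSPACE:
        /* the requested value is clamped into the policy range */
        if (user_khz < min_khz)
            return min_khz;
        if (user_khz > max_khz)
            return max_khz;
        return user_khz;
    case TOY_ONDEMAND:
        /* heavy load: jump to max; otherwise scale with the load */
        if (load_pct > UP_THRESHOLD)
            return max_khz;
        return min_khz + (unsigned int)
            ((unsigned long long)(max_khz - min_khz) * load_pct / 100);
    }
    return min_khz;
}
```

The value returned here is only a target; on a table-based driver it would still be resolved to the nearest supported table entry before the hardware is programmed.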
6 cpufreq scaling policies
The system currently supports the following scaling policies:
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors
ondemand powersave userspace performance
- ondemand: dynamic scaling mode
- powersave: power-saving mode
- userspace: user-controlled mode
- performance: performance mode
6.1 Switching the system mode
# echo powersave > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
powersave
#
# echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
performance
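6.2 Dynamic scaling policy
To have the system scale its frequency dynamically, consider the ondemand policy. Do not use this mode in mass-production projects.
The time spent at each frequency can then be inspected at any time: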
# cat /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state
12000 132779
38400 16944
76800 450
96000 980
128000 6482
192000 617
384000 3002
768000 47334
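Output in this form is easy to post-process. The sketch below parses the text of time_in_state and reports the frequency with the largest accumulated time together with its share of the total; busiest_state is a hypothetical helper, and it works on a string so that it does not depend on the sysfs file being present.

```c
#include <stdio.h>

/*
 * Parse time_in_state text ("<freq_khz> <time>" per line) and return the
 * frequency with the largest accumulated time. If share_pct is not NULL,
 * it receives that frequency's share of the total time, rounded down.
 * Returns 0 if nothing could be parsed.
 */
unsigned int busiest_state(const char *text, unsigned int *share_pct)
{
    unsigned long long total = 0, best_time = 0, t;
    unsigned int freq, best_freq = 0;
    int consumed;

    /* %n records how many characters were read, so the next pair can be found */
    while (sscanf(text, "%u %llu%n", &freq, &t, &consumed) == 2) {
        total += t;
        if (t > best_time) {
            best_time = t;
            best_freq = freq;
        }
        text += consumed;
    }
    if (share_pct)
        *share_pct = total ? (unsigned int)(best_time * 100 / total) : 0;
    return best_freq;
}
```

Applied to the listing above, the function reports 12000 kHz as the dominant state with a share of 63 percent, which suggests the system was mostly idle while the statistics were collected.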
6.3 Setting a specific frequency
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
12000 24000 30000 40000 60000 120000 240000 378000 408000 450000 480000 600000 612000 696000 708000 756000
#
# echo userspace > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
userspace
#
# echo 600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed
# cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq
600000
7 Q&A
7.1 How to view the frequency table
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
12000 24000 30000 40000 60000 120000 240000 378000 408000 450000 480000 600000 612000 696000 708000 756000
7.2 How to view the current frequency
# cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq
600000
7.3 How to view the current mode
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
userspace
7.4 How to switch the current mode
# echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
7.5 How to set a frequency directly
# echo 600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed