Linux cpufreq subsystem



Preface

Revision history

Date       | Author | Version | Description
2023.08.15 | 枫潇潇 | V1.0.0  | Initial version

1 Linux cpufreq overview

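The function provided by the cpufreq framework is called Dynamic Voltage/Frequency Scaling (DVFS). By adjusting the CPU voltage and frequency, it strikes a balance between power consumption and performance: when high performance is not needed, voltage and frequency are lowered to reduce power; when high performance is needed, they are raised to increase performance. Achieving this involves two key points: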

  • 1) How to control the voltage and frequency of a CPU core;
  • 2) When to change the voltage and frequency of a CPU core.

With respect to these two key points, CPU cores come in two kinds of implementations.

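  • Implementation 1: the CPU core adjusts its voltage and frequency by itself according to its own load, without OS-level software involvement.
    The software complexity of this approach is very low. Usually the software only needs to tell the CPU core the range within which voltage and frequency may vary (expressed as frequencies, scaling_min_freq and scaling_max_freq, also called the policy), and the CPU core handles the rest. Therefore:
    Key point 1 is handled by the CPU core itself;
    Key point 2: the OS sets a frequency range according to the broad usage scenario (for example, high-performance vs. low-performance), and the CPU core decides when to change frequency.
    Note 1: because software is barely involved, the power savings of this approach may be lower.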

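  • Implementation 2: the CPU core takes no part in the decision logic; OS software adjusts voltage and frequency according to how the system is running.
    In this approach, DVFS behavior is almost entirely under software control:
    Key point 1: the CPU core's frequency and voltage are controlled through the interfaces of the clock framework and the regulator framework;
    Key point 2: adjustments are made according to the usage scenario, either manually (user-initiated, e.g. a power-saving mode) or automatically (by software, e.g. HMP).
    Note 2: for key point 2, if adjustments are frequent, the CPU core must be able to switch between frequencies quickly enough; this is discussed in detail later.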
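As a rough, first-order illustration of why lowering voltage together with frequency saves power (this is the standard CMOS dynamic-power approximation, not something stated in the original text, and it ignores static leakage power):

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V^{2} f
```

Here α is the switching activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. For example, halving f while lowering V from 1.0 V to 0.8 V gives P'/P ≈ 0.8² × 0.5 = 0.32, roughly a two-thirds reduction in dynamic power, which is why DVFS lowers the voltage along with the frequency instead of the frequency alone.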

To meet these requirements, the cpufreq framework abstracts several software entities: the cpufreq driver, the cpufreq policy, and the cpufreq governor.

2 Software architecture


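Downward, the cpufreq framework builds on the cpu subsystem driver, OPP, the clock framework, the regulator framework and other modules to control CPU core frequency and voltage. This part is implemented mainly by the cpufreq driver.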

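Upward, the cpufreq framework uses the cpufreq core, cpufreq governors, cpufreq stats and other modules to expose interfaces for querying and controlling the CPU frequency to user space via sysfs. It also notifies interested drivers through notifiers when the frequency changes.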

Internally, the cpufreq framework consists of the cpufreq core, cpufreq drivers, cpufreq governors, cpufreq stats and other modules.

3 Functions of the software modules

3.1 Cpufreq core

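The cpufreq core is the central module of the cpufreq framework. Like other kernel frameworks, it mainly provides three kinds of functionality:

Upward: it provides a unified interface to user space through sysfs, and frequency-change notifications to other drivers through notifiers;

Downward: it provides a driver framework for controlling CPU core frequency and voltage, which simplifies low-level driver development; it also provides a governor framework for implementing different frequency-scaling mechanisms;

Internally: it encapsulates the logic needed to implement these functions. This logic revolves around three data structures, struct cpufreq_driver, struct cpufreq_policy and struct cpufreq_governor, which are analyzed below.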

3.1.1 struct cpufreq_driver
struct cpufreq_driver {
	struct module *owner;
	char			name[CPUFREQ_NAME_LEN];	// must be unique
	u8			flags;

	/* needed by all drivers */
	int	(*init)		(struct cpufreq_policy *policy);	// mandatory: fills in the policy
	int	(*verify)	(struct cpufreq_policy *policy);	// mandatory: checks the policy against hardware limits

	/* define one out of two */
	// sets the range within which the CPU core scales its frequency by itself (i.e. the policy)
	int	(*setpolicy)	(struct cpufreq_policy *policy);
	// sets a specific CPU frequency (legacy interface)
	int	(*target)	(struct cpufreq_policy *policy,
				 unsigned int target_freq,
				 unsigned int relation);
	
	/* should be defined, if possible */
	unsigned int	(*get)	(unsigned int cpu);
	
	/* optional */
	unsigned int (*getavg)	(struct cpufreq_policy *policy,
				 unsigned int cpu);
	int	(*bios_limit)	(int cpu, unsigned int *limit);
	
	int	(*exit)		(struct cpufreq_policy *policy);
	int	(*suspend)	(struct cpufreq_policy *policy);
	int	(*resume)	(struct cpufreq_policy *policy);
	struct freq_attr	**attr;
};

Related APIs:

int cpufreq_register_driver(struct cpufreq_driver *driver_data);
int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);

const char *cpufreq_get_current_driver(void);
void *cpufreq_get_driver_data(void);
3.1.2 struct cpufreq_cpuinfo
struct cpufreq_cpuinfo {
	unsigned int		max_freq;
	unsigned int		min_freq;

	/* in 10^(-9) s = nanoseconds */
	unsigned int		transition_latency;
};

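cpuinfo holds the fixed frequency-scaling information of the CPU: the maximum frequency, the minimum frequency and the transition latency. The maximum and minimum frequencies can be derived from the frequency table.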

3.1.3 struct cpufreq_policy
struct cpufreq_policy {
	cpumask_var_t		cpus;	/* CPUs requiring sw coordination */
	cpumask_var_t		related_cpus; /* CPUs with any coordination */
	unsigned int		    shared_type; /* ANY or ALL affected CPUs
						should set cpufreq */
	unsigned int		    cpu;    /* cpu nr of registered CPU */
	struct cpufreq_cpuinfo	cpuinfo;/* see above */

	unsigned int		min;    /* in kHz */
	unsigned int		max;    /* in kHz */
	unsigned int		cur;    /* in kHz, only needed if cpufreq governors are used */
	unsigned int		policy; /* see above */

	// for CPUs that cannot scale on their own, the governor sets the exact frequency
	struct cpufreq_governor	*governor; /* see below */

	struct work_struct	update; /* if update_policy() needs to be called, but you're in IRQ context */
	
	struct cpufreq_real_policy	user_policy;
	
	struct kobject		kobj;
	struct completion	kobj_unregister;
};
3.1.4 struct cpufreq_governor
struct cpufreq_governor {
	char	name[CPUFREQ_NAME_LEN];
	// callback that handles governor state-change events
	int	(*governor)	(struct cpufreq_policy *policy,  unsigned int event);

	// callbacks backing the sysfs scaling_setspeed attribute
	ssize_t	(*show_setspeed)	(struct cpufreq_policy *policy, char *buf);
	int	(*store_setspeed)	(struct cpufreq_policy *policy, unsigned int freq);
	// maximum frequency-transition latency this governor can tolerate
	unsigned int max_transition_latency; /* HW must be able to switch to
			next freq faster than this value in nano secs or we
			will fallback to performance governor */
	struct list_head	governor_list;
	struct module		*owner;
};

3.2 cpufreq drivers

3.2.1 Steps for writing a cpufreq driver

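A cpufreq driver handles the platform-specific control of CPU frequency and voltage. It is a very simple module within the cpufreq framework, and writing one involves the following steps:

  • 1) Platform-specific initialization, including obtaining and initializing the CPU core's clock and regulator.
  • 2) Build the frequency table, i.e. the list of frequencies/voltages the CPU core supports, and save it in the policy during initialization.
  • 3) Define a struct cpufreq_driver variable, fill in the required fields, and implement its callbacks according to the platform's characteristics.
  • 4) Call cpufreq_register_driver() to register the driver with the cpufreq framework.
  • 5) When a CPU device is added, the cpufreq core calls the driver's init callback, in which the driver must initialize the struct cpufreq_policy variable.
  • 6) While the system is running, the cpufreq core calls the driver's setpolicy or target/target_index callbacks as needed to set the CPU's scaling policy or frequency.
  • 7) During system suspend, the CPU frequency is set to a specified value, or the driver's suspend callback is called; on resume, the driver's resume callback is called.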
3.2.2 cpufreq driver APIs and their functions
3.2.2.1 frequency table

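The frequency table is the set of frequency/voltage combinations at which the CPU core runs correctly. It is usually determined early in a project by trying candidate operating points and keeping those that meet the stability and generality requirements.

One reason the frequency table exists: because the table pairs each frequency with exactly one voltage, the cpufreq framework only needs to care about frequency, and all of its policies are called frequency-scaling policies. The cpufreq driver, while changing the frequency, can look up the matching voltage in the table and adjust the CPU core voltage accordingly, which implements voltage scaling.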

/* Special Values of .frequency field */
#define CPUFREQ_ENTRY_INVALID   ~0u
#define CPUFREQ_TABLE_END       ~1u
/* Special Values of .flags field */
#define CPUFREQ_BOOST_FREQ      (1 << 0)
 
struct cpufreq_frequency_table {
	unsigned int    flags;
	unsigned int    driver_data; /* driver specific data, not used by core */
	unsigned int    frequency;   /* kHz - doesn't need to be in ascending order */
};

flags: currently only CPUFREQ_BOOST_FREQ is defined; it marks the frequency as a boost frequency.
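To illustrate how a driver can scale voltage while the framework only deals in frequencies, here is a minimal userspace sketch. It assumes, hypothetically, that the driver keeps an index into its own voltage array in driver_data; volt_uv, demo_table and volt_for_freq are illustrative names, not kernel APIs, and many real drivers obtain voltages from OPP instead.

```c
#include <assert.h>

#define CPUFREQ_ENTRY_INVALID (~0u)
#define CPUFREQ_TABLE_END     (~1u)
#define CPUFREQ_BOOST_FREQ    (1u << 0)

/* Same layout as the kernel structure quoted above. */
struct cpufreq_frequency_table {
	unsigned int flags;
	unsigned int driver_data; /* here: index into volt_uv[] (hypothetical) */
	unsigned int frequency;   /* kHz */
};

/* Hypothetical operating-point voltages in microvolts. */
static const unsigned int volt_uv[] = { 900000, 1000000, 1100000, 1200000 };

static const struct cpufreq_frequency_table demo_table[] = {
	{ 0,                  0, 408000 },
	{ 0,                  1, 600000 },
	{ 0,                  2, 756000 },
	{ CPUFREQ_BOOST_FREQ, 3, 1008000 },
	{ 0,                  0, CPUFREQ_TABLE_END },
};

/* Return the voltage paired with freq_khz, or 0 if the frequency is not usable. */
static unsigned int volt_for_freq(const struct cpufreq_frequency_table *t,
				  unsigned int freq_khz, int boost_enabled)
{
	for (; t->frequency != CPUFREQ_TABLE_END; t++) {
		if (t->frequency == CPUFREQ_ENTRY_INVALID)
			continue;                       /* skipped slot */
		if ((t->flags & CPUFREQ_BOOST_FREQ) && !boost_enabled)
			continue;                       /* boost point not allowed */
		if (t->frequency == freq_khz)
			return volt_uv[t->driver_data];
	}
	return 0;
}
```

A driver's frequency-change path would then set the voltage before raising the frequency, or after lowering it, using the value this lookup returns.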

3.2.2.2 struct cpufreq_driver
struct cpufreq_driver {
	char		name[CPUFREQ_NAME_LEN];
	u8		flags;
	void		*driver_data;

	/* needed by all drivers */
	int		(*init)(struct cpufreq_policy *policy);
	int		(*verify)(struct cpufreq_policy *policy);

	/* define one out of two */
	int		(*setpolicy)(struct cpufreq_policy *policy);

	/*
	 * On failure, should always restore frequency to policy->restore_freq
	 * (i.e. old freq).
	 */
	int		(*target)(struct cpufreq_policy *policy,
				  unsigned int target_freq,
				  unsigned int relation);	/* Deprecated */
	int		(*target_index)(struct cpufreq_policy *policy,
					unsigned int index);
	unsigned int	(*fast_switch)(struct cpufreq_policy *policy,
				       unsigned int target_freq);

	/*
	 * Caches and returns the lowest driver-supported frequency greater than
	 * or equal to the target frequency, subject to any driver limitations.
	 * Does not set the frequency. Only to be implemented for drivers with
	 * target().
	 */
	unsigned int	(*resolve_freq)(struct cpufreq_policy *policy,
					unsigned int target_freq);

	/*
	 * Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION
	 * unset.
	 *
	 * get_intermediate should return a stable intermediate frequency
	 * platform wants to switch to and target_intermediate() should set CPU
	 * to to that frequency, before jumping to the frequency corresponding
	 * to 'index'. Core will take care of sending notifications and driver
	 * doesn't have to handle them in target_intermediate() or
	 * target_index().
	 *
	 * Drivers can return '0' from get_intermediate() in case they don't
	 * wish to switch to intermediate frequency for some target frequency.
	 * In that case core will directly call ->target_index().
	 */
	unsigned int	(*get_intermediate)(struct cpufreq_policy *policy,
					    unsigned int index);
	int		(*target_intermediate)(struct cpufreq_policy *policy,
					       unsigned int index);

	/* should be defined, if possible */
	unsigned int	(*get)(unsigned int cpu);

	/* optional */
	int		(*bios_limit)(int cpu, unsigned int *limit);

	int		(*exit)(struct cpufreq_policy *policy);
	void		(*stop_cpu)(struct cpufreq_policy *policy);
	int		(*suspend)(struct cpufreq_policy *policy);
	int		(*resume)(struct cpufreq_policy *policy);

	/* Will be called after the driver is fully initialized */
	void		(*ready)(struct cpufreq_policy *policy);

	struct freq_attr **attr;

	/* platform specific boost support code */
	bool		boost_enabled;
	int		(*set_boost)(int state);
};

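  • 1) Implementing init:
    The init callback is the entry point of a cpufreq driver. The cpufreq core calls it after a CPU device has been added; its main job is to initialize the policy variable (think of it as the "cpufreq device").

    A driver does not need to care much about the internals of struct cpufreq_policy (the cpufreq framework is in fact working toward this goal, for example by wrapping the relevant initialization into an API).

    In init, the driver must initialize the following policy members:
    cpus: tells the cpufreq core which CPUs this policy applies to. In most systems all CPU cores share the same hardware logic that controls the CPU frequency, so a single policy can manage all CPU cores.

    clk: the clock pointer, which the cpufreq core can use to obtain the current actual frequency.

    cpuinfo: fixed frequency-scaling information for the CPU, including the maximum frequency, the minimum frequency and the transition latency; the maximum and minimum can be derived from the frequency table.

    min, max: the minimum and maximum frequencies of the scaling policy; at initialization they can be the same as the min and max in cpuinfo.

    freq_table: the corresponding frequency table.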

Helper for initializing the policy:

int cpufreq_generic_init(struct cpufreq_policy *policy,
                   struct cpufreq_frequency_table *table,
                	unsigned int transition_latency);
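The following userspace sketch models what init has to fill in, using a trimmed-down, hypothetical struct demo_policy (a bitmask instead of cpumask_var_t, a plain array of kHz values instead of struct cpufreq_frequency_table). It is not the kernel's cpufreq_generic_init(); in the kernel, deriving cpuinfo and min/max from the table is done by the core's frequency-table helpers.

```c
#include <assert.h>

#define CPUFREQ_TABLE_END (~1u)
#define NR_CPUS_DEMO      4      /* hypothetical CPU count */

/* Hypothetical, trimmed-down policy: a bitmask replaces cpumask_var_t. */
struct demo_policy {
	unsigned int cpus;                      /* CPUs sharing this policy */
	const unsigned int *freq_table;         /* kHz, CPUFREQ_TABLE_END-terminated */
	unsigned int cpuinfo_min, cpuinfo_max;  /* hardware limits, kHz */
	unsigned int transition_latency;        /* ns */
	unsigned int min, max;                  /* policy range, kHz */
};

static void demo_init_policy(struct demo_policy *p, const unsigned int *table,
			     unsigned int latency_ns)
{
	unsigned int lo = ~0u, hi = 0;
	const unsigned int *f;

	for (f = table; *f != CPUFREQ_TABLE_END; f++) {
		if (*f < lo)
			lo = *f;
		if (*f > hi)
			hi = *f;
	}

	p->cpus = (1u << NR_CPUS_DEMO) - 1;  /* one clock domain: all CPUs */
	p->freq_table = table;
	p->cpuinfo_min = lo;                 /* derived from the table */
	p->cpuinfo_max = hi;
	p->transition_latency = latency_ns;
	p->min = lo;                         /* start with the full hardware range */
	p->max = hi;
}
```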
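  • 2) The verify callback
    When upper-layer software needs to set a new policy, the driver's verify callback is called to check whether the policy is valid. The cpufreq core provides the following two helpers for this: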
int cpufreq_frequency_table_verify(struct cpufreq_policy *policy,
                               struct cpufreq_frequency_table *table);
int cpufreq_generic_frequency_table_verify(struct cpufreq_policy *policy);

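How they work:
cpufreq_frequency_table_verify() checks the policy against the given frequency table. The logic is simple: it checks whether the policy's frequency range {min, max} goes beyond the range in policy->cpuinfo and beyond the range covered by the frequency table. Out-of-range values are clamped rather than rejected: min/max are first limited to policy->cpuinfo, and if no table frequency falls inside {min, max}, max is raised to the next higher table frequency.

cpufreq_generic_frequency_table_verify() is even simpler: it calls cpufreq_frequency_table_verify() with the frequency table stored in the policy (policy->freq_table).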
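A userspace sketch of the verify logic described above, assuming the policy can be reduced to four kHz values (struct demo_policy and the demo_* functions are hypothetical) and assuming the same clamping order as cpufreq_verify_within_cpu_limits(): max is limited to the cpuinfo range first, then min is limited to [cpuinfo_min, max].

```c
#include <assert.h>

#define CPUFREQ_TABLE_END (~1u)

/* Hypothetical, reduced policy: all values in kHz. */
struct demo_policy {
	unsigned int min, max;                  /* requested range */
	unsigned int cpuinfo_min, cpuinfo_max;  /* hardware range */
};

/* Clamp max into the hardware range, then min into [cpuinfo_min, max]. */
static void demo_verify_within_cpu_limits(struct demo_policy *p)
{
	if (p->max > p->cpuinfo_max)
		p->max = p->cpuinfo_max;
	if (p->max < p->cpuinfo_min)
		p->max = p->cpuinfo_min;
	if (p->min < p->cpuinfo_min)
		p->min = p->cpuinfo_min;
	if (p->min > p->max)
		p->min = p->max;
}

/*
 * Clamp to cpuinfo, then make sure at least one table frequency lies in
 * [min, max]; if none does, raise max to the next higher table frequency.
 */
static int demo_table_verify(struct demo_policy *p, const unsigned int *table)
{
	unsigned int next_larger = ~0u;
	const unsigned int *f;

	demo_verify_within_cpu_limits(p);
	for (f = table; *f != CPUFREQ_TABLE_END; f++) {
		if (*f >= p->min && *f <= p->max)
			return 0;               /* range already holds a table entry */
		if (*f > p->max && *f < next_larger)
			next_larger = *f;
	}
	p->max = next_larger;                   /* ~0u is clamped back below */
	demo_verify_within_cpu_limits(p);
	return 0;
}
```

For example, a request of 420000..500000 kHz against a table containing 408000 and 600000 keeps min and raises max to 600000, so the governor always has at least one usable operating point.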

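Here is a first look at the layers of "frequency" in the cpufreq framework:
1) At the bottom are the frequencies defined in the frequency table: a finite set of discrete frequencies representing what the CPU is capable of.
2) Above that is the frequency range in policy->cpuinfo, a simple restriction on frequency scaling. It can match the frequency table or be narrower than the table's range. It must be given at driver initialization and cannot be changed afterwards.
3) Above that is the policy's frequency range {min, max}, which represents the scaling policy. For CPUs that can scale their frequency autonomously, this range only needs to be passed to the CPU, and it is then the basic unit of scaling. For CPUs that cannot scale autonomously, it is a software-level limit. This range can also be modified through sysfs.
4) At the top is the frequency value in the policy (policy->cur); for CPUs that cannot scale autonomously, this value is the CPU's running frequency.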

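  • 3) The setpolicy callback
    For CPUs that scale autonomously, the driver must provide this callback to pass the scaling range to the CPU.

  • 4) The target_index callback
    For CPUs that cannot scale autonomously, this callback sets the CPU's running frequency. index is an index into the frequency table.

  • 5) get_intermediate and target_intermediate: used only by drivers that implement target_index() (rather than target()) and do not set CPUFREQ_ASYNC_NOTIFICATION. The author recommends that engineers reading this article avoid them.

  • 6) The get callback
    Returns the frequency of the given CPU; a driver should provide it whenever possible.

  • 7) exit: the counterpart of init, called when the CPU device is removed.

  • 8) stop_cpu: called when the CPU is stopped.

  • 9) The suspend and resume callbacks
    During system suspend, drivers such as clock and regulator may already be suspended, so the CPU must be set to a known frequency before that happens. The driver can do this in its suspend callback, or through the suspend_freq field of the policy (the cpufreq core then switches automatically).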
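For a CPU that cannot scale autonomously, a governor's target frequency has to be turned into the index passed to target_index. The sketch below shows one way to do that mapping for an unsorted table, modelled on CPUFREQ_RELATION_L (lowest frequency at or above the target) and CPUFREQ_RELATION_H (highest frequency at or below the target). demo_resolve_index is hypothetical; the kernel's own lookup helpers differ in detail.

```c
#include <assert.h>

#define CPUFREQ_TABLE_END  (~1u)
#define CPUFREQ_RELATION_L 0   /* lowest frequency at or above target */
#define CPUFREQ_RELATION_H 1   /* highest frequency at or below target */

/*
 * Pick a table index for target_khz among entries inside [min, max].
 * If no entry satisfies the relation, fall back to the closest entry on
 * the other side of the target. Returns -1 if no entry is in range.
 */
static int demo_resolve_index(const unsigned int *table, unsigned int min,
			      unsigned int max, unsigned int target_khz,
			      unsigned int relation)
{
	int best = -1, fallback = -1;
	int i;

	if (target_khz < min)           /* never leave policy->{min,max} */
		target_khz = min;
	if (target_khz > max)
		target_khz = max;

	for (i = 0; table[i] != CPUFREQ_TABLE_END; i++) {
		unsigned int f = table[i];

		if (f < min || f > max)
			continue;
		if (relation == CPUFREQ_RELATION_L) {
			if (f >= target_khz && (best < 0 || f < table[best]))
				best = i;
			else if (f < target_khz && (fallback < 0 || f > table[fallback]))
				fallback = i;
		} else {
			if (f <= target_khz && (best < 0 || f > table[best]))
				best = i;
			else if (f > target_khz && (fallback < 0 || f < table[fallback]))
				fallback = i;
		}
	}
	return best >= 0 ? best : fallback;
}
```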

3.2.2.3 cpufreq_driver flags
/* flags */
#define CPUFREQ_STICKY          (1 << 0)        /* driver isn't removed even if all ->init() calls failed */
   
// frequency changes do not affect loops_per_jiffy or other kernel "constants"
#define CPUFREQ_CONST_LOOPS     (1 << 1)        /* loops_per_jiffy or other kernel "constants" aren't affected by frequency transitions */
   
// flag related to the suspend/resume process
#define CPUFREQ_PM_NO_WARN      (1 << 2)        /* don't warn on suspend/resume speed mismatches */
    
/*
 * This should be set by platforms having multiple clock-domains, i.e.
 * supporting multiple policies. With this sysfs directories of governor would
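 * be created in cpu/cpu<num>/cpufreq/ directory and so they can use the same
 * governor with different tunables for different clusters.
 */
// Set when the platform has several clock domains (several policies): the
// cpufreq core then creates the governor tunables in each policy's own
// directory, so each cluster can run the same governor with its own tunables.
// Otherwise a single global set of governor tunables is shared by all policies.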
#define CPUFREQ_HAVE_GOVERNOR_PER_POLICY (1 << 3)
 
/*
 * Driver will do POSTCHANGE notifications from outside of their ->target()
 * routine and so must set cpufreq_driver->flags with this flag, so that core
 * can handle them specially.
 */
#define CPUFREQ_ASYNC_NOTIFICATION  (1 << 4)
  
/*
 * Set by drivers which want cpufreq core to check if CPU is running at a
 * frequency present in freq-table exposed by the driver. For these drivers if
 * CPU is found running at an out of table freq, we will try to set it to a freq
 * from the table. And if that fails, we will stop further boot process by
 * issuing a BUG_ON().
 */
#define CPUFREQ_NEED_INITIAL_FREQ_CHECK (1 << 5)
3.2.2.4 cpufreq_register_driver
int cpufreq_register_driver(struct cpufreq_driver *driver_data);

The interface is simple: a driver only needs to fill in the mandatory members of struct cpufreq_driver, plus the ones its use case requires.
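The core does reject some callback combinations at registration time. The sketch below reproduces the main checks as found in kernels of this era (init and verify are mandatory, exactly one of setpolicy or target/target_index, intermediate hooks in pairs) on a hypothetical struct demo_driver with simplified callback signatures; newer kernels add further checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver descriptor keeping only the callbacks checked here. */
struct demo_driver {
	int (*init)(void);
	int (*verify)(void);
	int (*setpolicy)(void);
	int (*target)(void);
	int (*target_index)(void);
	int (*get_intermediate)(void);
	int (*target_intermediate)(void);
};

/* Dummy callback used to fill the slots in the examples. */
static int demo_cb(void) { return 0; }

static int demo_check_driver(const struct demo_driver *d)
{
	if (!d || !d->init || !d->verify)
		return -EINVAL;                 /* init and verify are mandatory */
	if (!d->setpolicy && !d->target && !d->target_index)
		return -EINVAL;                 /* need some way to change frequency */
	if (d->setpolicy && (d->target || d->target_index))
		return -EINVAL;                 /* "define one out of two" */
	if (!d->get_intermediate != !d->target_intermediate)
		return -EINVAL;                 /* intermediate hooks come in pairs */
	return 0;
}
```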

4 cpufreq core

4.1 Interfaces exported to user space

The cpufreq framework exposes the following interfaces to user space through sysfs:

/sys/devices/system/cpu/cpu0/cpufreq/ 
|-- affected_cpus 
|-- cpuinfo_cur_freq            // read-only, current frequency of the CPU core (read from hardware)
|-- cpuinfo_max_freq            // read-only
|-- cpuinfo_min_freq            // read-only
|-- cpuinfo_transition_latency  // read-only, frequency transition latency (ns)
|-- related_cpus
|-- scaling_available_frequencies
|-- scaling_available_governors
|-- scaling_cur_freq
|-- scaling_driver
|-- scaling_governor
|-- scaling_max_freq
|-- scaling_min_freq
|-- scaling_setspeed
`-- stats
    |-- time_in_state
    |-- total_trans
    `-- trans_table

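scaling_max_freq and scaling_min_freq are the maximum and minimum frequencies allowed by the scaling policy. For CPUs that scale autonomously, modifying them is all that is needed to change the frequency.
For CPUs that cannot scale autonomously, the CPU frequency must be set actively by other means; this is done by the specific governor. There is one special case:
if the governor in use is the "userspace" governor, the CPU frequency can be modified directly through the scaling_setspeed node.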

4.2 Frequency adjustment steps

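Before starting the analysis, let's use the "userspace" governor as an example to walk through the frequency adjustment steps. The "userspace" governor is the simplest of all governors and also one that driver engineers use often; it allows the CPU frequency to be changed from user space as follows (given as shell commands for simplicity):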

cd /sys/devices/system/cpu/cpu0/cpufreq/

cat cpuinfo_max_freq; cat cpuinfo_min_freq            # get the "physical" frequency range
cat scaling_available_frequencies                     # list the available frequencies
cat scaling_available_governors                       # list the available governors
cat scaling_governor                                  # the current governor
cat cpuinfo_cur_freq; cat scaling_cur_freq            # current frequency (hardware vs. cpufreq view); compare them

cat scaling_max_freq; cat scaling_min_freq            # frequency range of the current scaling policy

# assume the CPU cannot scale its frequency by itself
echo userspace > scaling_governor                     # switch the governor to userspace

# the target frequency must be listed in scaling_available_frequencies and lie within cpuinfo_min_freq..cpuinfo_max_freq

# if the target frequency is outside scaling_min_freq..scaling_max_freq, adjust these two values
echo xxx > scaling_max_freq; echo xxx > scaling_min_freq

# finally, set the frequency (in kHz)
echo xxx > scaling_setspeed
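Before writing scaling_setspeed, a tool can check the preconditions listed in the shell comments. This C sketch parses the contents of scaling_available_frequencies (passed in as a string, so it can be tested without sysfs) and writes an attribute through stdio; freq_is_settable and write_attr are hypothetical helper names. Writing real cpufreq attributes normally requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Return 1 if freq_khz appears in the whitespace-separated list read from
 * scaling_available_frequencies and lies within [cpuinfo_min, cpuinfo_max].
 */
static int freq_is_settable(const char *available, unsigned int freq_khz,
			    unsigned int cpuinfo_min, unsigned int cpuinfo_max)
{
	const char *p = available;
	char *end;

	if (freq_khz < cpuinfo_min || freq_khz > cpuinfo_max)
		return 0;
	for (;;) {
		unsigned long v = strtoul(p, &end, 10);

		if (end == p)
			break;                  /* no more numbers */
		if (v == freq_khz)
			return 1;
		p = end;
	}
	return 0;
}

/* Write value to an attribute file; sysfs store() errors surface on flush. */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)             /* buffered data is actually written here */
		ok = 0;
	return ok ? 0 : -1;
}
```

Typical use: read scaling_available_frequencies into a buffer, call freq_is_settable() with the cpuinfo limits, and only then call write_attr() on .../cpufreq/policy0/scaling_setspeed. Checking the result of fclose() matters because stdio may not issue the write(2), and therefore not see the kernel's error, until the stream is flushed.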

4.3 Internal logic

4.3.1 Initialization
4.3.1.1 cpufreq_interface

When a cpufreq driver is registered, subsys_interface_register() is called to register a subsystem interface, defined as follows:

/* drivers/cpufreq/cpufreq.c */
static struct subsys_interface cpufreq_interface = {
	.name		= "cpufreq",
	.subsys		= &cpu_subsys,
	.add_dev	= cpufreq_add_dev,
	.remove_dev	= cpufreq_remove_dev,
};

The subsys of this interface is cpu_subsys, i.e. the cpu bus (struct bus_type cpu_subsys), and it provides two callbacks, add_dev and remove_dev.

4.3.1.2 __cpufreq_add_dev
static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
{
	unsigned int j, cpu = dev->id;
	int ret = -ENOMEM;
	struct cpufreq_policy *policy;
	unsigned long flags;
	bool recover_policy = cpufreq_suspended;
#ifdef CONFIG_HOTPLUG_CPU
	struct cpufreq_policy *tpolicy;
#endif
	// return immediately if the CPU is offline
	if (cpu_is_offline(cpu))
		return 0;

	pr_debug("adding CPU %u\n", cpu);

#ifdef CONFIG_SMP	// several CPU cores may share the same cpufreq policy
	/* check whether a different CPU already registered this
	 * CPU because it is in the same boat. */
	policy = cpufreq_cpu_get(cpu);
	if (unlikely(policy)) {	// another CPU already set up the policy for us
		cpufreq_cpu_put(policy);
		return 0;
	}
#endif

	if (!down_read_trylock(&cpufreq_rwsem))
		return 0;

#ifdef CONFIG_HOTPLUG_CPU
	/* Check if this cpu was hot-unplugged earlier and has siblings */
	read_lock_irqsave(&cpufreq_driver_lock, flags);
	list_for_each_entry(tpolicy, &cpufreq_policy_list, policy_list) {
		if (cpumask_test_cpu(cpu, tpolicy->related_cpus)) {
			read_unlock_irqrestore(&cpufreq_driver_lock, flags);
			ret = cpufreq_add_policy_cpu(tpolicy, cpu, dev);
			up_read(&cpufreq_rwsem);
			return ret;
		}
	}
	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
#endif

	/* Allocate the cpufreq policy.
	 * Restore the saved policy when doing light-weight init and fall back
	 * to the full init if that fails.
	 */
	policy = recover_policy ? cpufreq_policy_restore(cpu) : NULL;
	if (!policy) {
		recover_policy = false;
		policy = cpufreq_policy_alloc();
		if (!policy)
			goto nomem_out;
	}

	/*
	 * In the resume path, since we restore a saved policy, the assignment
	 * to policy->cpu is like an update of the existing policy, rather than
	 * the creation of a brand new one. So we need to perform this update
	 * by invoking update_policy_cpu().
	 */
	if (recover_policy && cpu != policy->cpu)
		WARN_ON(update_policy_cpu(policy, cpu, dev));
	else
		policy->cpu = cpu;

	// policy->cpus: cpumask of the online CPUs managed by this policy
	// (only this CPU for now; the driver's init may add more)
	cpumask_copy(policy->cpus, cpumask_of(cpu));

	init_completion(&policy->kobj_unregister);
	INIT_WORK(&policy->update, handle_update);

	/* call driver. From then on the cpufreq must be able
	 * to accept all calls to ->verify and ->setpolicy for this CPU
	 */
	// call the cpufreq driver's init callback
	ret = cpufreq_driver->init(policy);
	if (ret) {
		pr_debug("initialization failed\n");
		goto err_set_policy_cpu;
	}

	/* related cpus should atleast have policy->cpus */
	cpumask_or(policy->related_cpus, policy->related_cpus, policy->cpus);

	/*
	 * affected cpus must always be the one, which are online. We aren't
	 * managing offline cpus here.
	 */
	cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);

	if (!recover_policy) {
		policy->user_policy.min = policy->min;
		policy->user_policy.max = policy->max;
	}

	down_write(&policy->rwsem);
	// point the per-CPU cpufreq_cpu_data of every online CPU that shares
	// this policy (policy->cpus) at the policy
	write_lock_irqsave(&cpufreq_driver_lock, flags);
	for_each_cpu(j, policy->cpus)
		per_cpu(cpufreq_cpu_data, j) = policy;
	write_unlock_irqrestore(&cpufreq_driver_lock, flags);

	if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
		policy->cur = cpufreq_driver->get(policy->cpu);
		if (!policy->cur) {
			pr_err("%s: ->get() failed\n", __func__);
			goto err_get_freq;
		}
	}

	/*
	 * Sometimes boot loaders set CPU frequency to a value outside of
	 * frequency table present with cpufreq core. In such cases CPU might be
	 * unstable if it has to run on that frequency for long duration of time
	 * and so its better to set it to a frequency which is specified in
	 * freq-table. This also makes cpufreq stats inconsistent as
	 * cpufreq-stats would fail to register because current frequency of CPU
	 * isn't found in freq-table.
	 *
	 * Because we don't want this change to effect boot process badly, we go
	 * for the next freq which is >= policy->cur ('cur' must be set by now,
	 * otherwise we will end up setting freq to lowest of the table as 'cur'
	 * is initialized to zero).
	 *
	 * We are passing target-freq as "policy->cur - 1" otherwise
	 * __cpufreq_driver_target() would simply fail, as policy->cur will be
	 * equal to target-freq.
	 */
	// only if the driver sets CPUFREQ_NEED_INITIAL_FREQ_CHECK:
	// check whether policy->cur is a frequency listed in the table
	if ((cpufreq_driver->flags & CPUFREQ_NEED_INITIAL_FREQ_CHECK)
	    && has_target()) {
		/* Are we running at unknown frequency ? */
		ret = cpufreq_frequency_table_get_index(policy, policy->cur);
		if (ret == -EINVAL) {
			/* Warn user and fix it */
			pr_warn("%s: CPU%d: Running at unlisted freq: %u KHz\n",
				__func__, policy->cpu, policy->cur);
			// switch to the next table frequency >= policy->cur
			ret = __cpufreq_driver_target(policy, policy->cur - 1,
				CPUFREQ_RELATION_L);

			/*
			 * Reaching here after boot in a few seconds may not
			 * mean that system will remain stable at "unknown"
			 * frequency for longer duration. Hence, a BUG_ON().
			 */
			BUG_ON(ret);
			pr_warn("%s: CPU%d: Unlisted initial frequency changed to: %u KHz\n",
				__func__, policy->cpu, policy->cur);
		}
	}

	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
				     CPUFREQ_START, policy);

	if (!recover_policy) {
		ret = cpufreq_add_dev_interface(policy, dev);
		if (ret)
			goto err_out_unregister;
		blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
				CPUFREQ_CREATE_POLICY, policy);
	}

	write_lock_irqsave(&cpufreq_driver_lock, flags);
	list_add(&policy->policy_list, &cpufreq_policy_list);	// add the policy to the global list
	write_unlock_irqrestore(&cpufreq_driver_lock, flags);

	// assign a governor to the new policy and call cpufreq_set_policy()
	// to apply a default policy for this CPU
	cpufreq_init_policy(policy);
	if (!recover_policy) {
		policy->user_policy.policy = policy->policy;
		policy->user_policy.governor = policy->governor;
	}
	up_write(&policy->rwsem);

	kobject_uevent(&policy->kobj, KOBJ_ADD);
	up_read(&cpufreq_rwsem);

	pr_debug("initialization complete\n");

	return 0;

err_out_unregister:
err_get_freq:
	write_lock_irqsave(&cpufreq_driver_lock, flags);
	for_each_cpu(j, policy->cpus)
		per_cpu(cpufreq_cpu_data, j) = NULL;
	write_unlock_irqrestore(&cpufreq_driver_lock, flags);

	if (cpufreq_driver->exit)
		cpufreq_driver->exit(policy);
err_set_policy_cpu:
	if (recover_policy) {
		/* Do not leave stale fallback data behind. */
		per_cpu(cpufreq_cpu_data_fallback, cpu) = NULL;
		cpufreq_policy_put_kobj(policy);
	}
	cpufreq_policy_free(policy);

nomem_out:
	up_read(&cpufreq_rwsem);

	return ret;
}
4.3.1.3 Multiple CPU cores sharing one cpufreq policy

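In an SMP system, several CPU cores are often controlled by the same frequency-scaling mechanism (in fact this is true on most platforms), i.e. the frequencies and voltages of all those CPU cores change together. In this case only one cpufreq policy needs to be created. The code paths involved are:

a) When the primary CPU is enumerated, cpufreq_add_dev calls the cpufreq driver's init callback (cpufreq_driver->init). Based on the current system configuration, the driver sets policy->cpus to tell the cpufreq core which CPUs share the same cpufreq policy.

b) cpufreq_add_dev on the primary CPU continues, initializes policy->related_cpus, and removes offline CPUs from policy->cpus. See the code analysis above for details.

c) cpufreq_add_dev on the primary CPU continues, creates the sysfs interface, and creates symbolic links for the other CPUs in policy->cpus.

d) When the secondary CPUs are enumerated, cpufreq_add_dev finds that the primary CPU has already done the work and returns immediately.

e) For hotpluggable CPUs, the symbolic link must be recreated on hotplug-in, because either the primary CPU did not create it (the CPU was offline at the time) or it was removed on hotplug-out.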
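The a) to e) flow can be modelled in userspace with bitmasks in place of cpumasks. In this hypothetical sketch the "driver" reports that all four CPUs share one clock domain; demo_add_dev only mimics the policy bookkeeping of __cpufreq_add_dev (no sysfs links, locking or governors), and the related_cpus lookup stands in for the CONFIG_HOTPLUG_CPU path.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

struct demo_policy {
	unsigned int related_cpus;  /* every CPU that may ever use this policy */
	unsigned int cpus;          /* the online subset */
};

static struct demo_policy policies[DEMO_NR_CPUS];   /* storage for the demo */
static struct demo_policy *cpu_data[DEMO_NR_CPUS];  /* like per_cpu(cpufreq_cpu_data) */
static int nr_policies;

/* Stand-in for the driver's init: all CPUs share one clock domain. */
static unsigned int demo_driver_cpus(int cpu)
{
	(void)cpu;
	return (1u << DEMO_NR_CPUS) - 1;
}

/* Returns 1 if a new policy was created, 0 otherwise. */
static int demo_add_dev(int cpu, unsigned int online_mask)
{
	struct demo_policy *p;
	int j;

	if (!(online_mask & (1u << cpu)))
		return 0;                               /* offline: nothing to do */
	if (cpu_data[cpu])
		return 0;                               /* d) primary already did it */
	for (j = 0; j < nr_policies; j++) {
		if (policies[j].related_cpus & (1u << cpu)) {
			policies[j].cpus |= 1u << cpu;  /* e) hotplug-in: rejoin */
			cpu_data[cpu] = &policies[j];
			return 0;
		}
	}

	p = &policies[nr_policies++];
	p->cpus = demo_driver_cpus(cpu);                /* a) driver sets policy->cpus */
	p->related_cpus |= p->cpus;                     /* b) related_cpus covers cpus */
	p->cpus &= online_mask;                         /* b) drop offline CPUs */
	for (j = 0; j < DEMO_NR_CPUS; j++)
		if (p->cpus & (1u << j))
			cpu_data[j] = p;                /* every sharer points at it */
	return 1;
}
```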

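4.3.2 Frequency adjustment

The frequency adjustment logic of the cpufreq framework can be summarized as follows:
adjusting the policy (struct cpufreq_policy) sets the general direction of CPU frequency scaling, mainly a frequency range formed by min_freq and max_freq; the cpufreq governor then determines the final frequency value.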

4.3.2.1 cpufreq_set_policy

cpufreq_set_policy() sets a new cpufreq policy. It is called:
a) At initialization (__cpufreq_add_dev->cpufreq_init_policy->cpufreq_set_policy), to apply the base policy provided by cpufreq_driver->init.

b) When scaling_max_freq or scaling_min_freq is modified (store_one->cpufreq_set_policy), to apply the new frequency range set from user space.

c) When the cpufreq governor is changed (scaling_governor->store_scaling_governor->cpufreq_set_policy), to update the governor.

static int cpufreq_set_policy(struct cpufreq_policy *policy,
				struct cpufreq_policy *new_policy)
{
	struct cpufreq_governor *old_gov;
	int ret;

	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
		 new_policy->cpu, new_policy->min, new_policy->max);

	memcpy(&new_policy->cpuinfo, &policy->cpuinfo, sizeof(policy->cpuinfo));

	if (new_policy->min > policy->max || new_policy->max < policy->min)
		return -EINVAL;

	// call the driver's verify callback to check that the new policy is valid
	/* verify the cpu speed can be set within this limit */
	ret = cpufreq_driver->verify(new_policy);
	if (ret)
		return ret;

	/* adjust if necessary - all reasons */
	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
			CPUFREQ_ADJUST, new_policy);

	// CPUFREQ_INCOMPATIBLE: notifier callbacks may adjust the policy again
	/* adjust if necessary - hardware incompatibility*/
	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
			CPUFREQ_INCOMPATIBLE, new_policy);

	/*
	 * verify the cpu speed can be set within this limit, which might be
	 * different to the first one
	 */
	ret = cpufreq_driver->verify(new_policy);
	if (ret)
		return ret;

	/* notification of the new policy */
	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
			CPUFREQ_NOTIFY, new_policy);

	policy->min = new_policy->min;
	policy->max = new_policy->max;

	pr_debug("new min and max freqs are %u - %u kHz\n",
		 policy->min, policy->max);

	// the driver provides setpolicy: the CPU core scales on its own within the range
	if (cpufreq_driver->setpolicy) {
		policy->policy = new_policy->policy;
		pr_debug("setting range\n");
		return cpufreq_driver->setpolicy(new_policy);
	}

	// governor unchanged: skip the switch and only send CPUFREQ_GOV_LIMITS
	if (new_policy->governor == policy->governor)
		goto out;

	pr_debug("governor switch\n");

	/* save old, working values */
	old_gov = policy->governor;
	/* end old governor */
	// if an old governor exists, stop it:
	// CPUFREQ_GOV_STOP ----> CPUFREQ_GOV_POLICY_EXIT
	if (old_gov) {
		__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
		up_write(&policy->rwsem);
		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
		down_write(&policy->rwsem);
	}

	/* start new governor */
	// CPUFREQ_GOV_POLICY_INIT ----> CPUFREQ_GOV_START ----> CPUFREQ_GOV_LIMITS (at out:)
	policy->governor = new_policy->governor;
	if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) {
		if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
			goto out;

		up_write(&policy->rwsem);
		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
		down_write(&policy->rwsem);
	}

	/* new governor failed, so re-start old one */
	pr_debug("starting governor %s failed\n", policy->governor->name);
	if (old_gov) {
		policy->governor = old_gov;
		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT);
		__cpufreq_governor(policy, CPUFREQ_GOV_START);
	}

	return -EINVAL;

 out:
	pr_debug("governor: change or update limits\n");
	return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
}
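The governor hand-over in cpufreq_set_policy(), including the rollback when the new governor fails to start, can be traced with a small event log. struct demo_gov and its fail_start flag are hypothetical; locking, notifiers and the setpolicy branch are omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum gov_event { GOV_START = 1, GOV_STOP, GOV_LIMITS, GOV_POLICY_INIT, GOV_POLICY_EXIT };

struct demo_gov {
	const char *name;
	int fail_start;          /* hypothetical: make CPUFREQ_GOV_START fail */
};

static char event_log[256];      /* "name:EVENT " entries, in order */

static int send_event(const struct demo_gov *g, enum gov_event ev)
{
	static const char *const names[] = { "?", "START", "STOP", "LIMITS", "INIT", "EXIT" };
	size_t len = strlen(event_log);

	snprintf(event_log + len, sizeof(event_log) - len, "%s:%s ", g->name, names[ev]);
	return (ev == GOV_START && g->fail_start) ? -1 : 0;
}

/* Same ordering as the governor part of cpufreq_set_policy(). */
static int demo_switch_governor(const struct demo_gov **cur,
				const struct demo_gov *new_gov)
{
	const struct demo_gov *old = *cur;

	if (!new_gov)
		return -1;
	if (new_gov == old)
		goto out;                       /* unchanged: only refresh limits */

	if (old) {                              /* stop the old governor */
		send_event(old, GOV_STOP);
		send_event(old, GOV_POLICY_EXIT);
	}

	*cur = new_gov;                         /* start the new one */
	if (!send_event(new_gov, GOV_POLICY_INIT)) {
		if (!send_event(new_gov, GOV_START))
			goto out;
		send_event(new_gov, GOV_POLICY_EXIT);
	}

	if (old) {                              /* roll back to the old governor */
		*cur = old;
		send_event(old, GOV_POLICY_INIT);
		send_event(old, GOV_START);
	}
	return -1;
out:
	return send_event(*cur, GOV_LIMITS);
}
```

Note that, as in the kernel code above, the rollback path restarts the old governor but does not send CPUFREQ_GOV_LIMITS.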
4.3.2.2 scaling_setspeed
static ssize_t store_scaling_setspeed(struct cpufreq_policy *policy,
					const char *buf, size_t count)
{
	unsigned int freq = 0;
	unsigned int ret;

	if (!policy->governor || !policy->governor->store_setspeed)
		return -EINVAL;

	ret = sscanf(buf, "%u", &freq);
	if (ret != 1)
		return -EINVAL;

	policy->governor->store_setspeed(policy, freq);

	return count;
}

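The policy only defines a range for frequency scaling. If the driver does not support setpolicy, the cpufreq governor must determine the specific frequency and call the driver's target or target_index callback to change the CPU frequency.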
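A hedged sketch of what happens after store_scaling_setspeed() hands the parsed value to a userspace-style governor: the request is limited to policy->{min,max} (in the kernel this clamping is done by __cpufreq_driver_target()) and then passed to the driver. struct demo_policy and the demo_* functions are hypothetical; demo_driver_target stands in for the driver's target path and simply accepts any value.

```c
#include <assert.h>
#include <stdio.h>

struct demo_policy { unsigned int min, max, cur; };   /* kHz */

/* Hypothetical driver hook: pretend the hardware accepts any frequency. */
static int demo_driver_target(struct demo_policy *p, unsigned int khz)
{
	p->cur = khz;
	return 0;
}

/* The governor side: keep the request inside policy->{min,max}. */
static int demo_store_setspeed(struct demo_policy *p, unsigned int freq)
{
	if (freq < p->min)
		freq = p->min;
	if (freq > p->max)
		freq = p->max;
	return demo_driver_target(p, freq);
}

/* Same parsing as store_scaling_setspeed(): reject input that is not a number. */
static int demo_write_setspeed(struct demo_policy *p, const char *buf)
{
	unsigned int freq = 0;

	if (sscanf(buf, "%u", &freq) != 1)
		return -1;
	return demo_store_setspeed(p, freq);
}
```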

5 cpufreq governor

The cpufreq policy sets the general range for CPU frequency scaling, while the specific running frequency of the CPU is decided by the corresponding cpufreq governor.

5.1 Implementing a cpufreq governor

5.1.1 struct cpufreq_governor
/* include/linux/cpufreq.h */
struct cpufreq_governor {
    char    name[CPUFREQ_NAME_LEN]; // unique name of the governor
    int     initialized;	// whether the governor has been initialized
    int     (*governor)     (struct cpufreq_policy *policy,
                                 unsigned int event);
    ssize_t (*show_setspeed)        (struct cpufreq_policy *policy,
                                         char *buf);
    int     (*store_setspeed)       (struct cpufreq_policy *policy,
                                         unsigned int freq);
    unsigned int max_transition_latency; /* HW must be able to switch to
                             next freq faster than this value in nano secs or we
                             will fallback to performance governor */
    struct list_head        governor_list;
    struct module           *owner;
};

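show_setspeed and store_setspeed: these two callbacks serve scaling_setspeed requests from user space.
governor: the main functionality of a cpufreq governor is implemented through this callback, which uses different events to start, stop and otherwise control the governor as a state machine.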

5.1.2 governor event

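The kernel abstracts governor control into the following 5 events. At appropriate times, the cpufreq core sends these events (through the .governor callback) to make the governor perform the corresponding frequency-scaling actions.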

/* include/linux/cpufreq.h */

/* Governor Events */
#define CPUFREQ_GOV_START       1
#define CPUFREQ_GOV_STOP        2
#define CPUFREQ_GOV_LIMITS      3
#define CPUFREQ_GOV_POLICY_INIT 4
#define CPUFREQ_GOV_POLICY_EXIT 5

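CPUFREQ_GOV_POLICY_INIT: sent before the policy starts a new governor (typically when the cpufreq policy has just been created or the governor changes). On receiving it, the governor does preparatory work, such as initializing necessary data structures (e.g. a timer). Not every governor needs this event.
CPUFREQ_GOV_START: starts the governor.

CPUFREQ_GOV_STOP and CPUFREQ_GOV_POLICY_EXIT: the opposites of the two events above.
CPUFREQ_GOV_LIMITS: usually sent after the governor has started, asking the governor to check and adjust the frequency so that it lies within the valid range defined by the policy.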
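A performance-style governor shows how these events map to actions. This userspace sketch (struct demo_policy and demo_perf_governor are hypothetical) pins cur to policy->max on START and LIMITS, roughly as the kernel's performance governor did with the event-based interface quoted above, where it called __cpufreq_driver_target() instead of assigning cur.

```c
#include <assert.h>

#define CPUFREQ_GOV_START       1
#define CPUFREQ_GOV_STOP        2
#define CPUFREQ_GOV_LIMITS      3
#define CPUFREQ_GOV_POLICY_INIT 4
#define CPUFREQ_GOV_POLICY_EXIT 5

struct demo_policy { unsigned int min, max, cur; int running; };

/* A performance-style governor: pin the CPU to policy->max. */
static int demo_perf_governor(struct demo_policy *p, unsigned int event)
{
	switch (event) {
	case CPUFREQ_GOV_POLICY_INIT:
	case CPUFREQ_GOV_POLICY_EXIT:
		return 0;                 /* no private data to set up or free */
	case CPUFREQ_GOV_START:
		p->running = 1;
		/* fall through */
	case CPUFREQ_GOV_LIMITS:
		if (p->running)
			p->cur = p->max;  /* the kernel would call the driver here */
		return 0;
	case CPUFREQ_GOV_STOP:
		p->running = 0;
		return 0;
	default:
		return -1;                /* unknown event */
	}
}
```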

5.1.3 governor register

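All governors are registered with the kernel through cpufreq_register_governor(). The function is simple: it checks whether a governor with the same name is already registered and, if not, adds the governor to the global list: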

int cpufreq_register_governor(struct cpufreq_governor *governor)
{
	int err;

	if (!governor)
		return -EINVAL;

	if (cpufreq_disabled())
		return -ENODEV;

	mutex_lock(&cpufreq_governor_mutex);

	err = -EBUSY;
	if (!find_governor(governor->name)) {
		err = 0;
		list_add(&governor->governor_list, &cpufreq_governor_list);
	}

	mutex_unlock(&cpufreq_governor_mutex);
	return err;
}
EXPORT_SYMBOL_GPL(cpufreq_register_governor);
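The registration logic above reduces to a duplicate-name check plus insertion at the head of a list. The sketch below uses a hypothetical singly linked list instead of struct list_head and omits the mutex and the cpufreq_disabled() check; demo_find_governor and demo_register_governor are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct demo_governor {
	const char *name;
	struct demo_governor *next;   /* stands in for governor_list */
};

static struct demo_governor *gov_list;   /* like cpufreq_governor_list */

static struct demo_governor *demo_find_governor(const char *name)
{
	struct demo_governor *g;

	for (g = gov_list; g; g = g->next)
		if (strcmp(g->name, name) == 0)
			return g;
	return NULL;
}

/* Same shape as cpufreq_register_governor(), minus the locking. */
static int demo_register_governor(struct demo_governor *gov)
{
	if (!gov)
		return -EINVAL;
	if (demo_find_governor(gov->name))
		return -EBUSY;           /* name already taken */
	gov->next = gov_list;        /* list_add() also inserts at the head */
	gov_list = gov;
	return 0;
}
```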

5.2 Governor-related call flows

5.2.1 Startup flow

When a cpufreq device is added, cpufreq_init_policy() is called. Its main job is to assign and start a cpufreq governor for the current cpufreq policy:

static void cpufreq_init_policy(struct cpufreq_policy *policy)
{
	struct cpufreq_governor *gov = NULL;
	struct cpufreq_policy new_policy;
	int ret = 0;

	memcpy(&new_policy, policy, sizeof(*policy));

	// reuse the governor that was in use before hotplug, if any
	/* Update governor of new_policy to the governor used before hotplug */
	gov = __find_governor(per_cpu(cpufreq_cpu_governor, policy->cpu));
	if (gov)
		pr_debug("Restoring governor %s for cpu %d\n",
				gov->name, policy->cpu);
	else	// otherwise use the default governor chosen in the kernel config, e.g. performance
		gov = CPUFREQ_DEFAULT_GOVERNOR;

	new_policy.governor = gov;

	/* Use the default policy if its valid. */
	if (cpufreq_driver->setpolicy)
		cpufreq_parse_governor(gov->name, &new_policy.policy, NULL);

	/* set default policy */
	ret = cpufreq_set_policy(policy, &new_policy);
	if (ret) {
		pr_debug("setting policy failed\n");
		if (cpufreq_driver->exit)
			cpufreq_driver->exit(policy);
	}
}

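In the if (cpufreq_driver->setpolicy) branch: if the cpufreq driver provides a setpolicy callback, the CPU can choose its running frequency by itself within the valid range specified by the policy, so no governor is needed to determine the running frequency. However, if the governor at this point is performance or powersave, the cpufreq driver still needs to be told, so that its setpolicy callback can set the frequency range correctly. How? Through the policy member of struct cpufreq_policy (a rather confusing name!), which takes one of two values: CPUFREQ_POLICY_PERFORMANCE or CPUFREQ_POLICY_POWERSAVE.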
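For setpolicy drivers, only the governor name is translated into policy->policy. The sketch below shows that mapping; the constant values 1 and 2 are assumed to match the kernel headers, demo_parse_policy's "0 means not applicable" convention is invented, and demo_setpolicy_hint is a made-up stand-in for what a setpolicy driver might do with the value.

```c
#include <assert.h>
#include <string.h>

#define CPUFREQ_POLICY_POWERSAVE   (1)
#define CPUFREQ_POLICY_PERFORMANCE (2)

/* Map a governor name to a policy value; 0 means "not applicable" here. */
static unsigned int demo_parse_policy(const char *gov_name)
{
	if (strcmp(gov_name, "performance") == 0)
		return CPUFREQ_POLICY_PERFORMANCE;
	if (strcmp(gov_name, "powersave") == 0)
		return CPUFREQ_POLICY_POWERSAVE;
	return 0;
}

/* Hypothetical setpolicy driver: pick the frequency the hardware should favour. */
static unsigned int demo_setpolicy_hint(unsigned int policy, unsigned int min,
					unsigned int max)
{
	return policy == CPUFREQ_POLICY_PERFORMANCE ? max : min;
}
```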

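5.2.2 Frequency scaling flow

1) There are two types of CPU: one only needs to be given a scaling range and chooses its running frequency within that range by itself; the other needs software to specify the exact running frequency.

2) For the first type, the cpufreq policy specifies the frequency range policy->{min, max}, which is then applied through the setpolicy callback.

3) For the second type, the cpufreq policy specifies the frequency range and also the governor to use. Once started, the governor sets the CPU running frequency either dynamically (e.g. by starting a timer that monitors system activity and adjusts the frequency according to load) or statically (by setting it directly to a suitable value).

The kernel documentation explains this process in detail: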

Documentation/cpu-freq/governors.txt
CPU can be set to switch independently    |         CPU can only be set
      within specific "limits"            |       to specific frequencies
                                 "CPUfreq policy" 
                consists of frequency limits (policy->{min,max}) 
                     and CPUfreq governor to be used 
                         /                    \ 
                        /                      \ 
                       /                       the cpufreq governor decides 
                      /                        (dynamically or statically) 
                     /                         what target_freq to set within 
                    /                          the limits of policy->{min,max} 
                   /                                \ 
                  /                                  \ 
        Using the ->setpolicy call,              Using the ->target/target_index call, 
            the limits and the                    the frequency closest 
             "policy" is set.                     to target_freq is set. 
                                                  It is assured that it 
                                                  is within policy->{min,max}

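5.3 Common governors

1) Performance
A performance-first governor that sets the CPU frequency directly to the maximum of policy->{min,max}.

2) Powersave
A power-first governor that sets the CPU frequency directly to the minimum of policy->{min,max}.

3) Userspace
The frequency is changed by a user-space program through the scaling_setspeed file.

4) Ondemand
Adjusts the CPU frequency dynamically according to the current CPU utilization.

5) Conservative
Similar to Ondemand, but the frequency changes more smoothly instead of jumping abruptly to the maximum and then dropping abruptly to the minimum.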
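To make the difference between ondemand and conservative concrete, here is a simplified model of one sampling step of each. The thresholds (up 80%, down 20%) and the 5% step are the defaults as I recall them from the kernel sources, where they are exposed as tunables; treat them as assumptions. Real governors also account for the sampling rate, I/O wait and other tunables, and the function names are hypothetical.

```c
#include <assert.h>

#define UP_THRESHOLD   80   /* % load (assumed default) */
#define DOWN_THRESHOLD 20   /* % load, conservative only (assumed default) */
#define FREQ_STEP       5   /* % of max per step, conservative (assumed default) */

/* ondemand-style: jump to max on high load, otherwise scale with load. */
static unsigned int ondemand_next(unsigned int load, unsigned int min,
				  unsigned int max)
{
	if (load > UP_THRESHOLD)
		return max;
	return min + load * (max - min) / 100;
}

/* conservative-style: move at most one step per sampling period. */
static unsigned int conservative_next(unsigned int load, unsigned int cur,
				      unsigned int min, unsigned int max)
{
	unsigned int step = max * FREQ_STEP / 100;

	if (load > UP_THRESHOLD)
		return cur + step > max ? max : cur + step;
	if (load < DOWN_THRESHOLD)
		return cur < min + step ? min : cur - step;
	return cur;                     /* load in the dead band: stay put */
}
```

With a 120000..756000 kHz range and 50% load, ondemand_next() returns 438000 kHz at once, while conservative_next() starting from 408000 kHz under 90% load only moves up by one 37800 kHz step to 445800 kHz.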

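6 cpufreq scaling policies

The current system supports the following scaling policies (governors):

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors
ondemand powersave userspace performance
  • ondemand: dynamic frequency scaling mode
  • powersave: power-saving mode
  • userspace: application-controlled mode
  • performance: performance mode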

6.1 Switching the system mode

# echo powersave > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor 
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor 
powersave
#
# echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor 
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor 
performance

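6.2 Dynamic frequency scaling

To let the system scale its frequency dynamically, consider the ondemand policy; the author advises against using this mode in mass-production projects.
The time spent at each frequency can then be checked at any time (first column: frequency in kHz; second column: time spent at that frequency, in units of 10 ms):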

# cat /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state 
12000 132779
38400 16944
76800 450
96000 980
128000 6482
192000 617
384000 3002
768000 47334
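The time_in_state output above can be summarized programmatically. This sketch parses "<frequency> <time>" pairs from a string and reports the frequency with the most residency and its share of the total; busiest_freq is a hypothetical name. The share does not depend on the time unit.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the frequency with the largest time; store its share in *percent. */
static unsigned long busiest_freq(const char *text, unsigned int *percent)
{
	unsigned long best_f = 0, best_t = 0, total = 0;
	const char *p = text;
	char *end;

	for (;;) {
		unsigned long f = strtoul(p, &end, 10);

		if (end == p)
			break;                  /* end of input */
		p = end;
		unsigned long t = strtoul(p, &end, 10);
		if (end == p)
			break;                  /* malformed line: stop */
		p = end;
		total += t;
		if (t > best_t) {
			best_t = t;
			best_f = f;
		}
	}
	*percent = total ? (unsigned int)(best_t * 100 / total) : 0;
	return best_f;
}
```

Fed the eight lines shown above, it returns 12000 with a share of 63%, i.e. this board spent most of the sampled time at its lowest listed frequency.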

6.3 Setting a specific frequency

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
12000 24000 30000 40000 60000 120000 240000 378000 408000 450000 480000 600000 612000 696000 708000 756000 
#
# echo userspace > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
userspace
#
# echo 600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed
# cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq
600000

7 Q&A

7.1 How to view the frequency table

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
12000 24000 30000 40000 60000 120000 240000 378000 408000 450000 480000 600000 612000 696000 708000 756000 

7.2 How to view the current frequency

# cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq
600000

7.3 How to view the current mode

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
userspace

7.4 How to switch the current mode

# echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor

7.5 How to set the frequency manually

# echo 600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed