【Linux kernel】Linux电源管理(4)_Power Management Interface

1. 前言

Linux电源管理中,相当多的部分是在处理Hibernate、Suspend、Runtime PM等功能。而这些功能都基于一套相似的逻辑,即“Power management interface”。该Interface的代码实现于“include/linux/pm.h”、“drivers/base/power/main.c”等文件中。主要功能是:对下,定义Device PM相关的回调函数,让各个Driver实现;对上,实现统一的PM操作函数,供PM核心逻辑调用。

因此在对Hibernate、Suspend、Runtime PM等功能解析之前,有必要先熟悉一下PM Interface,这就是本文的主要目的。

2. device PM callbacks

在一个系统中,数量最多的是设备,耗电最多的也是设备,因此设备的电源管理是Linux电源管理的核心内容。而设备电源管理最核心的操作就是:在合适的时机(如不再使用,如暂停使用),将设备置为合理的状态(如关闭,如睡眠)。这就是device PM callbacks的目的:定义一套统一的方式,让设备在特定的时机,步调一致的进入类似的状态(可以想象一下军训时的“一二一”口令)。

在旧版本的内核中,这些PM callbacks分布在设备模型的大型数据结构中,如struct bus_type中的suspend、suspend_late、resume、resume_late,如struct device_driver/struct class/struct device_type中的suspend、resume。很显然这样不具备良好的封装特性,因为随着设备复杂度的增加,简单的suspend、resume已经不能满足电源管理的需求,就需要扩充PM callbacks,就会不可避免的改动这些数据结构。

于是新版本的内核,就将这些Callbacks统一封装为一个数据结构----struct dev_pm_ops,上层的数据结构只需要包含这个结构即可。这样如果需要增加或者修改PM callbacks,就不用改动上层结构了(这就是软件设计中抽象和封装的生动体现,像艺术一样优雅)。当然,内核为了兼容旧的设计,也保留了上述的suspend/resume类型的callbacks,只是已不建议使用,本文就不再介绍它们了。

相信每一个熟悉了旧版本内核的Linux工程师,看到struct dev_pm_ops时都会虎躯一震,这玩意也太复杂了吧!不信您请看:

/**
 * struct dev_pm_ops - device PM callbacks.
 *
 * @prepare: The principal role of this callback is to prevent new children of
 *	the device from being registered after it has returned (the driver's
 *	subsystem and generally the rest of the kernel is supposed to prevent
 *	new calls to the probe method from being made too once @prepare() has
 *	succeeded).  If @prepare() detects a situation it cannot handle (e.g.
 *	registration of a child already in progress), it may return -EAGAIN, so
 *	that the PM core can execute it once again (e.g. after a new child has
 *	been registered) to recover from the race condition.
 *	This method is executed for all kinds of suspend transitions and is
 *	followed by one of the suspend callbacks: @suspend(), @freeze(), or
 *	@poweroff().  If the transition is a suspend to memory or standby (that
 *	is, not related to hibernation), the return value of @prepare() may be
 *	used to indicate to the PM core to leave the device in runtime suspend
 *	if applicable.  Namely, if @prepare() returns a positive number, the PM
 *	core will understand that as a declaration that the device appears to be
 *	runtime-suspended and it may be left in that state during the entire
 *	transition and during the subsequent resume if all of its descendants
 *	are left in runtime suspend too.  If that happens, @complete() will be
 *	executed directly after @prepare() and it must ensure the proper
 *	functioning of the device after the system resume.
 *	The PM core executes subsystem-level @prepare() for all devices before
 *	starting to invoke suspend callbacks for any of them, so generally
 *	devices may be assumed to be functional or to respond to runtime resume
 *	requests while @prepare() is being executed.  However, device drivers
 *	may NOT assume anything about the availability of user space at that
 *	time and it is NOT valid to request firmware from within @prepare()
 *	(it's too late to do that).  It also is NOT valid to allocate
 *	substantial amounts of memory from @prepare() in the GFP_KERNEL mode.
 *	[To work around these limitations, drivers may register suspend and
 *	hibernation notifiers to be executed before the freezing of tasks.]
 *
 * @complete: Undo the changes made by @prepare().  This method is executed for
 *	all kinds of resume transitions, following one of the resume callbacks:
 *	@resume(), @thaw(), @restore().  Also called if the state transition
 *	fails before the driver's suspend callback: @suspend(), @freeze() or
 *	@poweroff(), can be executed (e.g. if the suspend callback fails for one
 *	of the other devices that the PM core has unsuccessfully attempted to
 *	suspend earlier).
 *	The PM core executes subsystem-level @complete() after it has executed
 *	the appropriate resume callbacks for all devices.  If the corresponding
 *	@prepare() at the beginning of the suspend transition returned a
 *	positive number and the device was left in runtime suspend (without
 *	executing any suspend and resume callbacks for it), @complete() will be
 *	the only callback executed for the device during resume.  In that case,
 *	@complete() must be prepared to do whatever is necessary to ensure the
 *	proper functioning of the device after the system resume.  To this end,
 *	@complete() can check the power.direct_complete flag of the device to
 *	learn whether (unset) or not (set) the previous suspend and resume
 *	callbacks have been executed for it.
 *
 * @suspend: Executed before putting the system into a sleep state in which the
 *	contents of main memory are preserved.  The exact action to perform
 *	depends on the device's subsystem (PM domain, device type, class or bus
 *	type), but generally the device must be quiescent after subsystem-level
 *	@suspend() has returned, so that it doesn't do any I/O or DMA.
 *	Subsystem-level @suspend() is executed for all devices after invoking
 *	subsystem-level @prepare() for all of them.
 *
 * @suspend_late: Continue operations started by @suspend().  For a number of
 *	devices @suspend_late() may point to the same callback routine as the
 *	runtime suspend callback.
 *
 * @resume: Executed after waking the system up from a sleep state in which the
 *	contents of main memory were preserved.  The exact action to perform
 *	depends on the device's subsystem, but generally the driver is expected
 *	to start working again, responding to hardware events and software
 *	requests (the device itself may be left in a low-power state, waiting
 *	for a runtime resume to occur).  The state of the device at the time its
 *	driver's @resume() callback is run depends on the platform and subsystem
 *	the device belongs to.  On most platforms, there are no restrictions on
 *	availability of resources like clocks during @resume().
 *	Subsystem-level @resume() is executed for all devices after invoking
 *	subsystem-level @resume_noirq() for all of them.
 *
 * @resume_early: Prepare to execute @resume().  For a number of devices
 *	@resume_early() may point to the same callback routine as the runtime
 *	resume callback.
 *
 * @freeze: Hibernation-specific, executed before creating a hibernation image.
 *	Analogous to @suspend(), but it should not enable the device to signal
 *	wakeup events or change its power state.  The majority of subsystems
 *	(with the notable exception of the PCI bus type) expect the driver-level
 *	@freeze() to save the device settings in memory to be used by @restore()
 *	during the subsequent resume from hibernation.
 *	Subsystem-level @freeze() is executed for all devices after invoking
 *	subsystem-level @prepare() for all of them.
 *
 * @freeze_late: Continue operations started by @freeze().  Analogous to
 *	@suspend_late(), but it should not enable the device to signal wakeup
 *	events or change its power state.
 *
 * @thaw: Hibernation-specific, executed after creating a hibernation image OR
 *	if the creation of an image has failed.  Also executed after a failing
 *	attempt to restore the contents of main memory from such an image.
 *	Undo the changes made by the preceding @freeze(), so the device can be
 *	operated in the same way as immediately before the call to @freeze().
 *	Subsystem-level @thaw() is executed for all devices after invoking
 *	subsystem-level @thaw_noirq() for all of them.  It also may be executed
 *	directly after @freeze() in case of a transition error.
 *
 * @thaw_early: Prepare to execute @thaw().  Undo the changes made by the
 *	preceding @freeze_late().
 *
 * @poweroff: Hibernation-specific, executed after saving a hibernation image.
 *	Analogous to @suspend(), but it need not save the device's settings in
 *	memory.
 *	Subsystem-level @poweroff() is executed for all devices after invoking
 *	subsystem-level @prepare() for all of them.
 *
 * @poweroff_late: Continue operations started by @poweroff().  Analogous to
 *	@suspend_late(), but it need not save the device's settings in memory.
 *
 * @restore: Hibernation-specific, executed after restoring the contents of main
 *	memory from a hibernation image, analogous to @resume().
 *
 * @restore_early: Prepare to execute @restore(), analogous to @resume_early().
 *
 * @suspend_noirq: Complete the actions started by @suspend().  Carry out any
 *	additional operations required for suspending the device that might be
 *	racing with its driver's interrupt handler, which is guaranteed not to
 *	run while @suspend_noirq() is being executed.
 *	It generally is expected that the device will be in a low-power state
 *	(appropriate for the target system sleep state) after subsystem-level
 *	@suspend_noirq() has returned successfully.  If the device can generate
 *	system wakeup signals and is enabled to wake up the system, it should be
 *	configured to do so at that time.  However, depending on the platform
 *	and device's subsystem, @suspend() or @suspend_late() may be allowed to
 *	put the device into the low-power state and configure it to generate
 *	wakeup signals, in which case it generally is not necessary to define
 *	@suspend_noirq().
 *
 * @resume_noirq: Prepare for the execution of @resume() by carrying out any
 *	operations required for resuming the device that might be racing with
 *	its driver's interrupt handler, which is guaranteed not to run while
 *	@resume_noirq() is being executed.
 *
 * @freeze_noirq: Complete the actions started by @freeze().  Carry out any
 *	additional operations required for freezing the device that might be
 *	racing with its driver's interrupt handler, which is guaranteed not to
 *	run while @freeze_noirq() is being executed.
 *	The power state of the device should not be changed by either @freeze(),
 *	or @freeze_late(), or @freeze_noirq() and it should not be configured to
 *	signal system wakeup by any of these callbacks.
 *
 * @thaw_noirq: Prepare for the execution of @thaw() by carrying out any
 *	operations required for thawing the device that might be racing with its
 *	driver's interrupt handler, which is guaranteed not to run while
 *	@thaw_noirq() is being executed.
 *
 * @poweroff_noirq: Complete the actions started by @poweroff().  Analogous to
 *	@suspend_noirq(), but it need not save the device's settings in memory.
 *
 * @restore_noirq: Prepare for the execution of @restore() by carrying out any
 *	operations required for thawing the device that might be racing with its
 *	driver's interrupt handler, which is guaranteed not to run while
 *	@restore_noirq() is being executed.  Analogous to @resume_noirq().
 *
 * @runtime_suspend: Prepare the device for a condition in which it won't be
 *	able to communicate with the CPU(s) and RAM due to power management.
 *	This need not mean that the device should be put into a low-power state.
 *	For example, if the device is behind a link which is about to be turned
 *	off, the device may remain at full power.  If the device does go to low
 *	power and is capable of generating runtime wakeup events, remote wakeup
 *	(i.e., a hardware mechanism allowing the device to request a change of
 *	its power state via an interrupt) should be enabled for it.
 *
 * @runtime_resume: Put the device into the fully active state in response to a
 *	wakeup event generated by hardware or at the request of software.  If
 *	necessary, put the device into the full-power state and restore its
 *	registers, so that it is fully operational.
 *
 * @runtime_idle: Device appears to be inactive and it might be put into a
 *	low-power state if all of the necessary conditions are satisfied.
 *	Check these conditions, and return 0 if it's appropriate to let the PM
 *	core queue a suspend request for the device.
 *
 * Several device power state transitions are externally visible, affecting
 * the state of pending I/O queues and (for drivers that touch hardware)
 * interrupts, wakeups, DMA, and other hardware state.  There may also be
 * internal transitions to various low-power modes which are transparent
 * to the rest of the driver stack (such as a driver that's ON gating off
 * clocks which are not in active use).
 *
 * The externally visible transitions are handled with the help of callbacks
 * included in this structure in such a way that, typically, two levels of
 * callbacks are involved.  First, the PM core executes callbacks provided by PM
 * domains, device types, classes and bus types.  They are the subsystem-level
 * callbacks expected to execute callbacks provided by device drivers, although
 * they may choose not to do that.  If the driver callbacks are executed, they
 * have to collaborate with the subsystem-level callbacks to achieve the goals
 * appropriate for the given system transition, given transition phase and the
 * subsystem the device belongs to.
 *
 * All of the above callbacks, except for @complete(), return error codes.
 * However, the error codes returned by @resume(), @thaw(), @restore(),
 * @resume_noirq(), @thaw_noirq(), and @restore_noirq(), do not cause the PM
 * core to abort the resume transition during which they are returned.  The
 * error codes returned in those cases are only printed to the system logs for
 * debugging purposes.  Still, it is recommended that drivers only return error
 * codes from their resume methods in case of an unrecoverable failure (i.e.
 * when the device being handled refuses to resume and becomes unusable) to
 * allow the PM core to be modified in the future, so that it can avoid
 * attempting to handle devices that failed to resume and their children.
 *
 * It is allowed to unregister devices while the above callbacks are being
 * executed.  However, a callback routine MUST NOT try to unregister the device
 * it was called for, although it may unregister children of that device (for
 * example, if it detects that a child was unplugged while the system was
 * asleep).
 *
 * There also are callbacks related to runtime power management of devices.
 * Again, as a rule these callbacks are executed by the PM core for subsystems
 * (PM domains, device types, classes and bus types) and the subsystem-level
 * callbacks are expected to invoke the driver callbacks.  Moreover, the exact
 * actions to be performed by a device driver's callbacks generally depend on
 * the platform and subsystem the device belongs to.
 *
 * Refer to Documentation/power/runtime_pm.rst for more information about the
 * role of the @runtime_suspend(), @runtime_resume() and @runtime_idle()
 * callbacks in device runtime power management.
 */
struct dev_pm_ops {
	int (*prepare)(struct device *dev);
	void (*complete)(struct device *dev);
	int (*suspend)(struct device *dev);
	int (*resume)(struct device *dev);
	int (*freeze)(struct device *dev);
	int (*thaw)(struct device *dev);
	int (*poweroff)(struct device *dev);
	int (*restore)(struct device *dev);
	int (*suspend_late)(struct device *dev);
	int (*resume_early)(struct device *dev);
	int (*freeze_late)(struct device *dev);
	int (*thaw_early)(struct device *dev);
	int (*poweroff_late)(struct device *dev);
	int (*restore_early)(struct device *dev);
	int (*suspend_noirq)(struct device *dev);
	int (*resume_noirq)(struct device *dev);
	int (*freeze_noirq)(struct device *dev);
	int (*thaw_noirq)(struct device *dev);
	int (*poweroff_noirq)(struct device *dev);
	int (*restore_noirq)(struct device *dev);
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);

	ANDROID_KABI_RESERVE(1);
};

从Linux PM Core的角度来说,这些callbacks并不复杂,因为PM Core要做的就是在特定的电源管理阶段,调用相应的callbacks,例如在suspend/resume的过程中,PM Core会依次调用“prepare—>suspend—>suspend_late—>suspend_noirq-------wakeup--------->resume_noirq—>resume_early—>resume–>complete”。(*_noirq阶段需要在IRQ被关闭的情况下执行(除非他们被IRQ_WAKEUP标记)。)

但由于这些callbacks需要由具体的设备Driver实现,这就要求驱动工程师在设计每个Driver时,清晰的知道这些callbacks的使用场景、是否需要实现、怎么实现,这才是struct dev_pm_ops的复杂之处。

Linux kernel对struct dev_pm_ops的注释已经非常详细了,但要弄清楚每个callback的使用场景、背后的思考,并不是一件容易的事情。因此蜗蜗不准备在本文对它们进行过多的解释,而打算结合具体的电源管理行为,基于具体的场景,再进行解释。

3. device PM callbacks在设备模型中的体现

我们在介绍“Linux设备模型”时,曾多次提及电源管理相关的内容,那时蜗蜗采取忽略的方式,暂不说明。现在是时候回过头再去看看了。

Linux设备模型中的很多数据结构,都会包含struct dev_pm_ops变量,具体如下:

   1: struct bus_type {
   2:         ...
   3:         const struct dev_pm_ops *pm;
   4:         ...
   5: };
   6:  
   7: struct device_driver {
   8:         ...
   9:         const struct dev_pm_ops *pm;
  10:         ...
  11: };
  12:  
  13: struct class {
  14:         ...
  15:         const struct dev_pm_ops *pm;
  16:         ...
  17: };
  18:  
  19: struct device_type {
  20:         ...
  21:         const struct dev_pm_ops *pm;
  22: };
  23:  
  24: struct device {
  25:         ...
  26:         struct dev_pm_info      power;
  27:         struct dev_pm_domain    *pm_domain;
  28:         ...
  29: };

bus_type、device_driver、class、device_type等结构中的pm指针,比较容易理解,和旧的suspend/resume callbacks类似。我们重点关注一下device结构中的power和pm_domain变量。

◆power变量

power是一个struct dev_pm_info类型的变量,也在“include/linux/pm.h”中定义。从蜗蜗一直工作于的Linux-2.6.23内核,到写这篇文章所用的Linux-3.10.29内核,这个数据结构可是一路发展壮大,从那时的只有4个字段,到现在有40多个字段,简直是想起来什么就放什么啊!

power变量主要保存PM相关的状态,如当前的power_state、是否可以被唤醒、是否已经prepare完成、是否已经suspend完成等等。由于涉及的内容非常多,我们在具体使用的时候,顺便说明。

◆pm_domain指针

在当前的内核中,struct dev_pm_domain结构只包含了一个struct dev_pm_ops ops。蜗蜗猜测这是从可扩展性方面考虑的,后续随着内核的进化,可能会在该结构中添加其他内容。

所谓的PM Domain(电源域),是针对“device”来说的。bus_type、device_driver、class、device_type等结构,本质上代表的是设备驱动,电源管理的操作,由设备驱动负责,是理所应当的。但在内核中,由于各种原因,是允许没有driver的device存在的,那么怎么处理这些设备的电源管理呢?就是通过设备的电源域实现的。

4. device PM callbacks的操作函数

内核在定义device PM callbacks数据结构的同时,为了方便使用该数据结构,也定义了大量的操作API,这些API分为两类。

◆通用的辅助性质的API,直接调用指定设备所绑定的driver的、pm指针的、相应的callback,如

extern int pm_generic_prepare(struct device *dev);
extern int pm_generic_suspend_late(struct device *dev);
extern int pm_generic_suspend_noirq(struct device *dev);
extern int pm_generic_suspend(struct device *dev);
extern int pm_generic_resume_early(struct device *dev);
extern int pm_generic_resume_noirq(struct device *dev);
extern int pm_generic_resume(struct device *dev);
extern int pm_generic_freeze_noirq(struct device *dev);
extern int pm_generic_freeze_late(struct device *dev);
extern int pm_generic_freeze(struct device *dev);
extern int pm_generic_thaw_noirq(struct device *dev);
extern int pm_generic_thaw_early(struct device *dev);
extern int pm_generic_thaw(struct device *dev);
extern int pm_generic_restore_noirq(struct device *dev);
extern int pm_generic_restore_early(struct device *dev);
extern int pm_generic_restore(struct device *dev);
extern int pm_generic_poweroff_noirq(struct device *dev);
extern int pm_generic_poweroff_late(struct device *dev);
extern int pm_generic_poweroff(struct device *dev);
extern void pm_generic_complete(struct device *dev);

以pm_generic_prepare为例,就是查看dev->driver->pm->prepare接口是否存在,如果存在,直接调用并返回结果。

◆和整体电源管理行为相关的API,目的是将各个独立的电源管理行为组合起来,组成一个较为简单的功能,如下

#ifdef CONFIG_PM_SLEEP
extern void device_pm_lock(void);
extern void dpm_resume_start(pm_message_t state);
extern void dpm_resume_end(pm_message_t state);
extern void dpm_resume_noirq(pm_message_t state);
extern void dpm_resume_early(pm_message_t state);
extern void dpm_resume(pm_message_t state);
extern void dpm_complete(pm_message_t state);

extern void device_pm_unlock(void);
extern int dpm_suspend_end(pm_message_t state);
extern int dpm_suspend_start(pm_message_t state);
extern int dpm_suspend_noirq(pm_message_t state);
extern int dpm_suspend_late(pm_message_t state);
extern int dpm_suspend(pm_message_t state);
extern int dpm_prepare(pm_message_t state);

这些API的功能和动作解析如下。

dpm_prepare,执行所有设备的“->prepare() callback(s)”,内部动作为:

1)遍历dpm_list,依次取出挂在该list中的device指针。
   【注1:设备模型在添加设备(device_add)时,会调用device_pm_add接口,将该设备添加到全局链表dpm_list中,以方便后续的遍历操作。】

2)调用内部接口device_prepare,执行实际的prepare动作。该接口会返回执行的结果。

3)如果执行失败,打印错误信息。

4)如果执行成功,将dev->power.is_prepared(就是上面我们提到的struct dev_pm_info类型的变量)设为TRUE,表示设备已经prepared了。同时,将该设备添加到dpm_prepared_list中(该链表保存了所有已经处于prepared状态的设备)。

 

内部接口device_prepare的执行动作为:

1)根据dev->power.syscore,断该设备是否为syscore设备。如果是,则直接返回(因为syscore设备会单独处理)。

2)在prepare时期,调用pm_runtime_get_noresume接口,关闭Runtime suspend功能。以避免由Runtime suspend造成的不能正常唤醒的Issue。该功能会在complete时被重新开启。
   【注2:pm_runtime_get_noresume的实现很简单,就是增加该设备power变量的引用计数(dev->power.usage_count),Runtime PM会根据该计数是否大于零,判断是否开启Runtime PM功能。】

3)调用device_may_wakeup接口,根据当前设备是否有wakeup source(dev->power.wakeup)以及是否允许wakeup(dev->power.can_wakeup),判定该设备是否是一个wakeup path(记录在dev->power.wakeup_path中)。
    【注3:设备的wake up功能,是指系统在低功耗状态下(如suspend、hibernate),某些设备具备唤醒系统的功能。这是电源管理过程的一部分。】

4)根据优先顺序,获得用于prepare的callback函数。由于设备模型有bus、driver、device等多个层级,而prepare接口可能由任意一个层级实现。这里的优先顺序是指,只要优先级高的层级注册了prepare,就会优先使用它,而不会使用优先级低的prepare。优先顺序为:dev->pm_domain->ops、dev->type->pm、dev->class->pm、dev->bus->pm、dev->driver->pm(这个优先顺序同样适用于其它callbacks)。

5)如果得到有限的prepare函数,调用并返回结果。

dpm_suspend,执行所有设备的“->suspend() callback(s)”,其内部动作和dpm_prepare类似:

1)遍历dpm_list,依次取出挂在该list中的device指针。

2)调用内部接口device_suspend,执行实际的prepare动作。该接口会返回执行的结果。

3)如果suspend失败,将该设备的信息记录在一个struct suspend_stats类型的数组中,并打印错误错误信息。

4)最后将设备从其它链表(如dpm_prepared_list),转移到dpm_suspended_list链表中。

内部接口device_suspend的动作和device_prepare类似,这里不再描述了。

dpm_suspend_start,依次执行dpm_prepare和dpm_suspend两个动作。

dpm_suspend_end,依次执行所有设备的“->suspend_late() callback(s)”以及所有设备的“->suspend_noirq() callback(s)”。动作和上面描述的类似,这里不再说明了。

dpm_resume、dpm_complete、dpm_resume_start、dpm_resume_end,是电源管理过程的唤醒动作,和dpm_suspend_xxx系列的接口类似。不再说明了。

Linux电源管理(4)_Power Management Interface

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值