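hrtimers are built on per-CPU clock event devices. On an SMP system that has only a global clock event device, high-resolution hrtimers cannot work: without per-CPU clock event devices, every timer interrupt would require the system to send IPIs (inter-processor interrupts) to tell the other CPUs to do their share of the work, and a large number of IPIs carries so much overhead that hrtimers would cost more than they are worth. High-resolution support is enabled by building the kernel with CONFIG_HIGH_RES_TIMERS=y.
hrtimers have two operating modes: low-resolution mode and high-resolution mode. Although the hrtimer subsystem was designed for high-resolution timers, the system may switch dynamically at run time between clock devices of different resolution, so hrtimers must be able to move freely between the two modes. Because low-resolution mode is implemented on top of the high-resolution machinery, part of the high-resolution code is compiled into the kernel even when the system supports only low-resolution mode.
Before looking at the implementation, here are the relevant data structures.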
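From user space, one rough way to see which mode a running system is in is to ask for the resolution of a POSIX clock. The sketch below uses the standard `clock_getres()` call; the helper names `clock_resolution_ns` and `looks_high_res` are hypothetical, and the rule "1 ns means high-resolution mode" is an assumption about typical Linux behavior rather than a guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Advertised resolution of clock `id` in nanoseconds, or -1 on error. */
long long clock_resolution_ns(clockid_t id)
{
    struct timespec res;

    if (clock_getres(id, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}

/*
 * Hypothetical heuristic: treat a 1 ns resolution as a sign that the
 * kernel is running hrtimers in high-resolution mode. A larger value
 * (for example one tick period) suggests low-resolution mode.
 */
int looks_high_res(long long res_ns)
{
    return res_ns == 1;
}

/* Prints the resolution of both clocks discussed in the text. */
void report_clock_resolutions(void)
{
    long long rt = clock_resolution_ns(CLOCK_REALTIME);
    long long mono = clock_resolution_ns(CLOCK_MONOTONIC);

    assert(rt != -1 && mono != -1);
    printf("CLOCK_REALTIME:  %lld ns\n", rt);
    printf("CLOCK_MONOTONIC: %lld ns (%s)\n", mono,
           looks_high_res(mono) ? "likely high-res" : "likely low-res");
}
```

Calling `report_clock_resolutions()` from a small test program is enough to compare machines; the exact numbers depend on the kernel configuration and the available clock hardware.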
Ø struct hrtimer
/**
 * struct hrtimer - the basic hrtimer structure
 * @node:         timerqueue node, which also manages node.expires,
 *                the absolute expiry time in the hrtimer's internal
 *                representation.
 * @_softexpires: the absolute earliest expiry time of the hrtimer.
 *                The time which was given as expiry time when the timer
 *                was armed.
 * @function:     timer expiry callback function
 * @base:         pointer to the timer base (per cpu and per clock)
 * @state:        state information
 * @start_pid:    timer statistics field to store the pid of the task
 *                which started the timer
 * @start_site:   timer statistics field to store the site where the timer
 *                was started
 * @start_comm:   timer statistics field to store the name of the process
 *                which started the timer
 *
 * The hrtimer structure must be initialized by hrtimer_init()
 */
struct hrtimer {
    struct timerqueue_node      node;
    ktime_t                     _softexpires;
    enum hrtimer_restart        (*function)(struct hrtimer *);
    struct hrtimer_clock_base   *base;
    unsigned long               state;
#ifdef CONFIG_TIMER_STATS
    int                         start_pid;
    void                        *start_site;
    char                        start_comm[16];
#endif
};
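The `function` callback receives only a pointer to the timer, so the timer is usually embedded in a larger object and the callback recovers that object from the pointer. The following is a user-space sketch of that pattern, not kernel code: `model_hrtimer`, `blinker`, `model_container_of` and `model_run_until_done` are hypothetical stand-ins, and only the `HRTIMER_NORESTART` / `HRTIMER_RESTART` return values mirror the kernel's `enum hrtimer_restart`.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the kernel's return values for the expiry callback. */
enum hrtimer_restart { HRTIMER_NORESTART, HRTIMER_RESTART };

/* Simplified stand-in for struct hrtimer (assumption, not the kernel type). */
struct model_hrtimer {
    long long expires_ns;   /* stands in for node.expires */
    enum hrtimer_restart (*function)(struct model_hrtimer *);
};

/* Hypothetical owner object with an embedded timer. */
struct blinker {
    int toggles_left;
    long long period_ns;
    struct model_hrtimer timer;
};

/* Recover the enclosing object from a pointer to one of its members. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Expiry callback: re-arms itself until the toggle budget is used up. */
enum hrtimer_restart blinker_expired(struct model_hrtimer *t)
{
    struct blinker *b = model_container_of(t, struct blinker, timer);

    if (--b->toggles_left <= 0)
        return HRTIMER_NORESTART;

    /* Push the expiry forward by one period before asking for a restart. */
    t->expires_ns += b->period_ns;
    return HRTIMER_RESTART;
}

/* Fires the timer until its callback declines a restart; returns the call count. */
int model_run_until_done(struct model_hrtimer *t)
{
    int calls = 0;

    assert(t->function != NULL);
    do {
        calls++;
    } while (t->function(t) == HRTIMER_RESTART);
    return calls;
}
```

In this model a `blinker` with `toggles_left = 3` has its callback invoked three times, with the expiry advanced by one period after each of the first two calls.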
Ø struct hrtimer_clock_base
/**
 * struct hrtimer_clock_base - the timer base for a specific clock
 * @cpu_base:     per cpu clock base
 * @index:        clock type index for per_cpu support when moving a
 *                timer to a base on another cpu.
 * @clockid:      clock id for per_cpu support
 * @active:       red black tree root node for the active timers
 * @resolution:   the resolution of the clock, in nanoseconds
 * @get_time:     function to retrieve the current time of the clock
 * @softirq_time: the time when running the hrtimer queue in the softirq
 * @offset:       offset of this clock to the monotonic base
 */
struct hrtimer_clock_base {
    struct hrtimer_cpu_base *cpu_base;
    int                     index;
    clockid_t               clockid;
    struct timerqueue_head  active;
    ktime_t                 resolution;
    ktime_t                 (*get_time)(void);
    ktime_t                 softirq_time;
    ktime_t                 offset;
};
Ø struct hrtimer_cpu_base
/*
 * struct hrtimer_cpu_base - the per cpu clock bases
 * @lock:          lock protecting the base and associated clock bases
 *                 and timers
 * @active_bases:  Bitfield to mark bases with active timers
 * @expires_next:  absolute time of the next event which was scheduled
 *                 via clock_set_next_event()
 * @hres_active:   State of high resolution mode
 * @hang_detected: The last hrtimer interrupt detected a hang
 * @nr_events:     Total number of hrtimer interrupt events
 * @nr_retries:    Total number of hrtimer interrupt retries
 * @nr_hangs:      Total number of hrtimer interrupt hangs
 * @max_hang_time: Maximum time spent in hrtimer_interrupt
 * @clock_base:    array of clock bases for this cpu
 */
struct hrtimer_cpu_base {
    raw_spinlock_t              lock;
    unsigned long               active_bases;
#ifdef CONFIG_HIGH_RES_TIMERS
    ktime_t                     expires_next;
    int                         hres_active;
    int                         hang_detected;
    unsigned long               nr_events;
    unsigned long               nr_retries;
    unsigned long               nr_hangs;
    ktime_t                     max_hang_time;
#endif
    struct hrtimer_clock_base   clock_base[HRTIMER_MAX_CLOCK_BASES];
};
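The `active_bases` field is described as a bitfield marking which clock bases currently hold timers. A minimal user-space sketch of that bookkeeping is shown below; the `model_` names and the per-base counter are assumptions made for illustration, since the kernel tracks queued timers through the tree itself rather than a counter.

```c
#include <assert.h>

#define MODEL_MAX_CLOCK_BASES 2   /* the text describes two bases per CPU */

/* Simplified stand-in for struct hrtimer_cpu_base (assumption). */
struct model_cpu_base {
    unsigned long active_bases;               /* bit i set: base i has timers */
    int nr_timers[MODEL_MAX_CLOCK_BASES];     /* hypothetical per-base count */
};

/* Queue one timer on base `index` and mark that base active. */
void model_enqueue(struct model_cpu_base *cb, int index)
{
    assert(index >= 0 && index < MODEL_MAX_CLOCK_BASES);
    cb->nr_timers[index]++;
    cb->active_bases |= 1UL << index;
}

/* Remove one timer from base `index`; clear the bit when the base empties. */
void model_dequeue(struct model_cpu_base *cb, int index)
{
    assert(index >= 0 && index < MODEL_MAX_CLOCK_BASES);
    assert(cb->nr_timers[index] > 0);
    if (--cb->nr_timers[index] == 0)
        cb->active_bases &= ~(1UL << index);
}

/* Count the bases that currently have at least one timer queued. */
int model_count_active(const struct model_cpu_base *cb)
{
    int i, n = 0;

    for (i = 0; i < MODEL_MAX_CLOCK_BASES; i++)
        if (cb->active_bases & (1UL << i))
            n++;
    return n;
}
```

The point of the bitfield is that code scanning the bases can skip empty ones with a single mask test instead of inspecting each queue.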
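hrtimer_bases, defined with DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases), manages all the hrtimers queued on each CPU. The per-CPU timer list no longer uses the multi-level linked lists of the timer wheel; it is managed with a red-black tree instead. hrtimer_bases is defined as follows: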
DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
{
    .clock_base =
    {
        {
            .index = CLOCK_REALTIME,
            .get_time = &ktime_get_real,
            .resolution = KTIME_LOW_RES,
        },
        {
            .index = CLOCK_MONOTONIC,
            .get_time = &ktime_get,
            .resolution = KTIME_LOW_RES,
        },
    }
};
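Figure 3 shows how hrtimers are managed through hrtimer_bases.
Each per-CPU hrtimer_bases instance contains two clock_base entries, one of type CLOCK_REALTIME and one of type CLOCK_MONOTONIC. An hrtimer picks one of them when setting its expiry time, which can therefore be expressed either as wall-clock time or as time relative to how long the system has been running. hrtimer_run_queues first uses hrtimer_bases to find the clock_base entries belonging to the CPU that is handling the current interrupt, then checks each clock_base in turn for timers that have expired. Because timers are inserted into a red-black tree when they are added to a clock_base, the timer that expires earliest always sits at the leftmost position of the tree, so finding expired timers is very fast, and every expired timer found is then processed one by one.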
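The scan described above can be sketched in user space as follows. This is a simplified model, not the kernel's hrtimer_run_queues: a sorted singly linked list is swapped in for the red-black tree (its head plays the role of the leftmost node), the current time of each clock is passed in as an array instead of being read through `get_time`, and all `sim_`/`SIM_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { SIM_REALTIME, SIM_MONOTONIC, SIM_NR_BASES };

struct sim_timer {
    long long expires_ns;
    int fired;                  /* set when the timer has been processed */
    struct sim_timer *next;
};

/* One queue per clock; `first` always points at the earliest expiry. */
struct sim_clock_base {
    struct sim_timer *first;
};

struct sim_cpu_base {
    struct sim_clock_base clock_base[SIM_NR_BASES];
};

/* Insert `t` so that the list stays ordered by expiry time. */
void sim_enqueue(struct sim_clock_base *base, struct sim_timer *t)
{
    struct sim_timer **link = &base->first;

    assert(t != NULL);
    while (*link && (*link)->expires_ns <= t->expires_ns)
        link = &(*link)->next;
    t->next = *link;
    *link = t;
}

/*
 * Walk every clock base of one CPU and process the timers whose expiry
 * is not later than that clock's current time. Returns how many expired.
 */
int sim_run_queues(struct sim_cpu_base *cpu, const long long now_ns[SIM_NR_BASES])
{
    int i, expired = 0;

    for (i = 0; i < SIM_NR_BASES; i++) {
        struct sim_clock_base *base = &cpu->clock_base[i];

        /* Only the head needs checking: everything behind it expires later. */
        while (base->first && base->first->expires_ns <= now_ns[i]) {
            struct sim_timer *t = base->first;

            base->first = t->next;
            t->next = NULL;
            t->fired = 1;
            expired++;
        }
    }
    return expired;
}
```

Because the queue is ordered, the loop stops at the first timer that has not yet expired, which is the property the leftmost node of the red-black tree provides in the real implementation.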