A Brief Analysis of RT Bandwidth Throttling

温暖的电波

Category: Linux kernel

Article link: https://blog.csdn.net/wennuanddianbo/article/details/70037415

(Based on linux-4.4.42)

1. Abstract

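Real-time tasks have higher priority than normal tasks. To keep a CPU-bound real-time task from monopolizing the CPU and "starving" everything else, the kernel uses bandwidth throttling to cap the run time of real-time tasks. The system organizes tasks hierarchically into task groups; all tasks in a group are treated as a single unit attached to one run queue, and bandwidth throttling is applied per group.
So what exactly is bandwidth throttling? In task scheduling, it is the maximum time the tasks on a queue may run within a given period. The kernel uses xxx_bandwidth structures to limit task run time; for real-time tasks the structure is: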

struct rt_bandwidth {
	/* nests inside the rq lock: */
	raw_spinlock_t		rt_runtime_lock;
	ktime_t			rt_period;		
	u64			rt_runtime;
	struct hrtimer		rt_period_timer;
	unsigned int		rt_period_active;
};

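On a single-CPU system, rt_bandwidth ensures that the real-time tasks on the CPU run for no more than rt_runtime within each rt_period. On a multi-CPU SMP system, rt_bandwidth ensures that the fraction of CPU time consumed by real-time tasks within each rt_period does not exceed rt_runtime/rt_period.
For example, on a 4-core SMP system with rt_runtime set to 950000 and rt_period set to 1000000, the CPU usage of all real-time tasks in the system (summed over the 4 cores) cannot exceed 95%.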
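The arithmetic behind this example can be written down as a small user-space C sketch. This is not kernel code: the helper names rt_share_permille() and rt_budget_us_total() are invented here for illustration, and the aggregate figure simply multiplies the per-CPU budget by the number of CPUs.

```c
#include <stdint.h>

/*
 * Hypothetical helper: share of each period that RT tasks may use,
 * in permille (950 means 95.0%). Assumes period_us is non-zero.
 */
static uint64_t rt_share_permille(uint64_t runtime_us, uint64_t period_us)
{
	return runtime_us * 1000 / period_us;
}

/*
 * Hypothetical helper: total RT CPU time (in microseconds) that all
 * CPUs together may consume within one period.
 */
static uint64_t rt_budget_us_total(uint64_t runtime_us, unsigned int ncpus)
{
	return runtime_us * ncpus;
}
```

With the values from the example, rt_share_permille(950000, 1000000) gives 950 (95%), and rt_budget_us_total(950000, 4) gives 3800000, i.e. at most 3.8 seconds of RT CPU time per 1-second period across the 4 cores.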

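When real-time group scheduling is enabled (CONFIG_RT_GROUP_SCHED), bandwidth throttling is per group: each task group has its own rt_bandwidth, which limits the run time of the real-time tasks in that group.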

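2. Implementation

2.1 Relevant data structures

The kernel uses static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq) to decide whether real-time tasks have exceeded their bandwidth limit. The core idea is to check whether the time consumed on the run queue rt_rq has exceeded its budget. Both the consumed time and the budget are stored in the run queue structure struct rt_rq; here are the relevant fields: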

/* Real-Time classes' related field in a runqueue: */
struct rt_rq {
	......	
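	int rt_throttled;			/* throttle flag; 1 means this rt_rq is throttled */
	u64 rt_time;				/* CPU time already consumed by the scheduling entities on this rt_rq */
	u64 rt_runtime;			/* maximum run time budget of this rt_rq within one period */
	/* Nests inside the rq lock: */
	......
	struct rq *rq;			/* the per-CPU run queue (rq) this rt_rq belongs to */
	struct task_group *tg;		/* the task group this rt_rq belongs to */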
};

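When a task group (task_group) is created, for example when a cpu-subsystem cgroup is created, the kernel creates one run queue (struct rt_rq) for the new task_group on every CPU, and every real-time scheduling entity is attached to its own rt_rq. The overall structure is: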

/* task group related information */
struct task_group {
	struct cgroup_subsys_state css;
	......
#ifdef CONFIG_RT_GROUP_SCHED
	struct sched_rt_entity **rt_se;	/* per CPU, one rt_se that represents this group at its level */
	struct rt_rq **rt_rq;		/* per CPU, one rt_rq holding the rt_se entities that belong to this group */

	struct rt_bandwidth rt_bandwidth;	/* bandwidth structure */
#endif
	......
	struct list_head list;

	struct task_group *parent;		
	......

	struct cfs_bandwidth cfs_bandwidth;
};

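2.2 Group scheduling hierarchy

With group scheduling, the objects being scheduled are scheduling entities. Each entity has a parent-level entity, which may in turn have its own parent, and so on, forming a tree. The entities that represent the same group on the different CPUs together make up the hierarchical task-group structure mentioned above, struct task_group. Below is a schematic of the task_group hierarchy:

Figure 1: Organization of task groups with three levels
    As Figure 1 shows, every task_group has an rt_se** and an rt_rq**, and a child level's entity rt_se[cpu] is attached to the parent level's rt_rq[cpu].
    The rt_se** and rt_rq** members of a task_group can be viewed as two pointer arrays, struct sched_rt_entity *rt_se[cpu] and struct rt_rq *rt_rq[cpu], with one element per CPU (every CPU in the cpu_possible_mask bitmap).
    The child-level entities are all attached to this level's rt_rq, and together they are treated as a single unit, namely this level's rt_se. Within one task_group the two are related as follows: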

tg.rt_se[cpu]->my_q == tg.rt_rq[cpu]; /* tg is some task_group; cpu is a specific CPU */
Everything queued on tg.rt_rq[cpu] is treated as one unit: tg.rt_se[cpu].
When an entity's my_q is NULL, the entity no longer represents a group; it is an actual task.
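The relationships above can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions: struct group, struct rt_entity, link_group(), is_group_entity() and depth_of() are simplified stand-ins invented for illustration, loosely following what the kernel does when it links a new group to its parent; they are not the kernel's definitions.

```c
#include <stddef.h>

#define NR_CPUS 2	/* assumption: a tiny 2-CPU system for illustration */

struct rt_rq;
struct group;

/* Simplified stand-in for struct sched_rt_entity. */
struct rt_entity {
	struct rt_entity *parent;	/* entity of the enclosing group, NULL at the top */
	struct rt_rq *rt_rq;		/* queue this entity is attached to */
	struct rt_rq *my_q;		/* queue this entity owns; NULL for a plain task */
};

/* Simplified stand-in for struct rt_rq. */
struct rt_rq {
	struct group *tg;		/* owning group */
};

/* Simplified stand-in for struct task_group. */
struct group {
	struct rt_entity *rt_se[NR_CPUS];
	struct rt_rq *rt_rq[NR_CPUS];
};

/* Tie a group's per-CPU entity to its queue and hang it on the parent's queue. */
static void link_group(struct group *tg, struct group *parent, int cpu,
		       struct rt_entity *se, struct rt_rq *rq)
{
	rq->tg = tg;
	tg->rt_rq[cpu] = rq;
	tg->rt_se[cpu] = se;
	se->my_q = rq;				/* tg.rt_se[cpu]->my_q == tg.rt_rq[cpu] */
	se->rt_rq = parent->rt_rq[cpu];		/* attached to the parent level's queue */
	se->parent = parent->rt_se[cpu];	/* NULL when the parent is the root group */
}

/* An entity with a queue of its own represents a group, otherwise a task. */
static int is_group_entity(const struct rt_entity *se)
{
	return se->my_q != NULL;
}

/* Number of entities from se up to the top of the hierarchy. */
static int depth_of(const struct rt_entity *se)
{
	int depth = 0;

	for (; se; se = se->parent)
		depth++;
	return depth;
}
```

In this model the root group has rt_se[cpu] == NULL, so walking the parent pointers from a task inside a first-level child group visits two entities: the task itself and the child group's entity.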

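    With these concepts in place, look again at a run queue's consumed time rt_rq->rt_time and its budget rt_rq->rt_runtime:
    rt_rq->rt_time is the run time of the scheduling entities that belong to this queue (this group), accumulated level by level through child and descendant levels down to the leaf tasks; rt_rq->rt_runtime is the budget for the entities that belong to this queue.
    When the scheduler checks whether an entity has overrun, it actually checks whether rt_rq[cpu]->rt_time of the entity's run queue exceeds rt_rq[cpu]->rt_runtime. On SMP systems with the RT_RUNTIME_SHARE feature enabled, a run queue that has overrun first tries to "borrow" time from the rt_rq queues on other CPUs to enlarge rt_rq[cpu]->rt_runtime.
    Either way, if rt_rq->rt_time still exceeds the budget and bandwidth throttling is enabled, the kernel sets the throttle flag: rt_rq->rt_throttled = 1. Once the flag is set, the rt_rq can no longer be scheduled until the throttle is lifted.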

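3. The life cycle of bandwidth throttling

3.1 Initializing bandwidth and run time

3.1.1 Initializing the root group's bandwidth

The top-level root group is the ancestor of all task_groups; the kernel defines it as the global variable struct task_group root_task_group. It is initialized by sched_init() early during boot: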

start_kernel()
 |sched_init()

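    Here is what sched_init() does:
    1) Bandwidth initialization
    For real-time tasks, two bandwidth structures are initialized here, the global struct rt_bandwidth def_rt_bandwidth and root_task_group.rt_bandwidth, and both get the same period and run time budget: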

def_rt_bandwidth.rt_period = sysctl_sched_rt_period;
def_rt_bandwidth.rt_runtime = sysctl_sched_rt_runtime;
root_task_group.rt_bandwidth.rt_period = sysctl_sched_rt_period;
root_task_group.rt_bandwidth.rt_runtime = sysctl_sched_rt_runtime;

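    sysctl_sched_rt_period and sysctl_sched_rt_runtime are global variables defined by the kernel; they can be read from /proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us respectively. Their default values are 1000000 and 950000, both in microseconds.
    After this initialization, def_rt_bandwidth and root_task_group.rt_bandwidth by default have a period of 1 second and a budget of 0.95 seconds.

    2) Scheduling entity and run queue initialization
    The struct sched_rt_entity **rt_se and struct rt_rq **rt_rq members of root_task_group are really two pointer arrays with one element per CPU. These entities and run queues must ultimately be tied to members of each CPU's run queue rq; the effect is shown in the following pseudo-code: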

rq[cpu]->rt.tg = &root_task_group;
root_task_group.rt_rq[cpu] = &rq[cpu]->rt;
root_task_group.rt_se[cpu] = NULL;
rq[cpu]->rt.rt_runtime = def_rt_bandwidth.rt_runtime;	/* set the top rt_rq's budget to 950000 microseconds */

Here rq->rt is the top-level rt_rq on a CPU's run queue; its budget rq->rt.rt_runtime is initialized to def_rt_bandwidth.rt_runtime, 0.95 seconds by default.

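3.1.2 Creating a new task_group

A task_group is created by sched_create_group(). The kernel calls sched_create_group() both when the setsid() system call is executed (through autogroup, when CONFIG_SCHED_AUTOGROUP is enabled) and when a cpu-subsystem control group is created.
Let us follow sched_create_group() to see how a task_group is created, how its rt_bandwidth is initialized, and how the budget of a new rt_rq is set.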

/* allocate runqueue etc for a new task group */
struct task_group *sched_create_group(struct task_group *parent)
{
	struct task_group *tg;

	tg = kzalloc(sizeof(*tg), GFP_KERNEL);		/* allocate the task_group structure */
	if (!tg)
		return ERR_PTR(-ENOMEM);

	if (!alloc_fair_sched_group(tg, parent))		/* fair (CFS) scheduling part */
		goto err;

	if (!alloc_rt_sched_group(tg, parent))		/* the real initialization for RT happens here */
		goto err;

	return tg;

err:
	sched_free_group(tg);
	return ERR_PTR(-ENOMEM);
}

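sched_create_group() has two main steps: first it allocates the task_group structure, then it allocates and initializes the scheduling entities and run queues inside it. The second step is done by alloc_fair_sched_group() for normal tasks and by alloc_rt_sched_group() for real-time tasks. We focus on the real-time case: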

int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
{
	struct rt_rq *rt_rq;
	struct sched_rt_entity *rt_se;
	int i;

	tg->rt_rq = kzalloc(sizeof(rt_rq) * nr_cpu_ids, GFP_KERNEL);		/* room for nr_cpu_ids rt_rq pointers */
	if (!tg->rt_rq)
		goto err;
	tg->rt_se = kzalloc(sizeof(rt_se) * nr_cpu_ids, GFP_KERNEL);		/* room for nr_cpu_ids rt_se pointers */
	if (!tg->rt_se)
		goto err;

	/*
	 * Initialize tg->rt_bandwidth.rt_period to def_rt_bandwidth.rt_period
	 * and tg->rt_bandwidth.rt_runtime to 0.
	 */
	init_rt_bandwidth(&tg->rt_bandwidth,
			ktime_to_ns(def_rt_bandwidth.rt_period), 0);

	for_each_possible_cpu(i) {
		rt_rq = kzalloc_node(sizeof(struct rt_rq),
				     GFP_KERNEL, cpu_to_node(i));
		if (!rt_rq)
			goto err;

		rt_se = kzalloc_node(sizeof(struct sched_rt_entity),
				     GFP_KERNEL, cpu_to_node(i));
		if (!rt_se)
			goto err_free_rq;

		init_rt_rq(rt_rq);						/* initialize the rt_rq members */
		rt_rq->rt_runtime = tg->rt_bandwidth.rt_runtime;	/* sets rt_rq->rt_runtime = 0 */
		init_tg_rt_entry(tg, rt_rq, rt_se, i, parent->rt_se[i]); /* link the new rt_rq and rt_se to tg and to the parent group */
	}

	return 1;

err_free_rq:
	kfree(rt_rq);
err:
	return 0;
}

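The main job of alloc_rt_sched_group() is to allocate, for every CPU, the rt_se and rt_rq instances pointed to by the tg->rt_rq and tg->rt_se pointer arrays, and to initialize them.
Note that for a newly created group, tg->rt_bandwidth.rt_runtime and rt_rq->rt_runtime on every CPU are all initialized to 0. This is easy to verify by creating a cpu-subsystem control group:

# mount -t cgroup cpu -o cpu /cgroup/cpu		# mount the cpu subsystem
# mkdir /cgroup/cpu/child0 				# create a child group named child0

At this point "cat /cgroup/cpu/child0/cpu.rt_runtime_us" prints 0, i.e. tg->rt_bandwidth.rt_runtime of the new child group is 0. It changes only when we write a value with "echo xxx > /cgroup/cpu/child0/cpu.rt_runtime_us"; then tg->rt_bandwidth.rt_runtime and the budget tg->rt_rq[cpu]->rt_runtime of the group's run queue on every CPU are all set from the value we wrote.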

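3.2 How the bandwidth limit is checked

At several key points the kernel updates the run time statistics of the task currently running on the local CPU; for real-time tasks this is done by update_curr_rt(). Updating the current task's run time is essential to scheduling: how long a task runs, when to pick the next task, which task to pick, CPU load balancing and so on all rely on this information being kept up to date.
For real-time scheduling, if bandwidth throttling is enabled (sysctl_sched_rt_runtime >= 0), the kernel also updates the consumed time of every run queue from the current task's run queue (the leaf) up to its ancestors (the root):

rt_rq->rt_time += delta_exec;

After updating each run queue, it calls

sched_rt_runtime_exceeded(rt_rq);

to check whether the queue's consumed time exceeds its budget. If so, it calls resched_curr(rq) to set the TIF_NEED_RESCHED flag on the current task of the run queue rq so that the task gets scheduled out.
Next, let us see how sched_rt_runtime_exceeded(rt_rq) decides whether a queue has overrun.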

static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
{
	u64 runtime = sched_rt_runtime(rt_rq);      /* runtime is the budget rt_rq->rt_runtime */

	if (rt_rq->rt_throttled)                      /* already throttled: return right away */
		return rt_rq_throttled(rt_rq);

	if (runtime >= sched_rt_period(rt_rq))        /* budget >= period: the queue can never overrun, return 0 */
		return 0;

	balance_runtime(rt_rq);                    /* "balance" the budget of rt_rq */
	runtime = sched_rt_runtime(rt_rq);          /* balancing may have changed the budget, so re-read rt_rq->rt_runtime */
	if (runtime == RUNTIME_INF)                 /* unlimited budget: return 0, no overrun */
		return 0;

	if (rt_rq->rt_time > runtime) {            /* consumed time on rt_rq exceeds the budget */
		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);

		/*
		 * Don't actually throttle groups that have no runtime assigned
		 * but accrue some time due to boosting.
		 */
		if (likely(rt_b->rt_runtime)) {    /* normally the bandwidth budget rt_b->rt_runtime is non-zero */
			rt_rq->rt_throttled = 1;    /* the queue overran its budget: set the throttle flag */
			printk_deferred_once("sched: RT throttling activated\n");
		} else {
			/*
			 * In case we did anyway, make it go away,
			 * replenishment is a joke, since it will replenish us
			 * with exactly 0 ns.
			 */
			rt_rq->rt_time = 0;
		}

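		if (rt_rq_throttled(rt_rq)) {     /* rt_rq overran and the throttle flag is set */
			/*
			 * Dequeue the entity that represents rt_rq, then requeue at the tail.
			 * Note: this dequeues the rt_se of this group together with all
			 * of its ancestor rt_se, then puts them back at the tail of
			 * their queues. __enqueue_rt_entity(rt_se) refuses to enqueue
			 * the rt_se of a throttled rt_rq, so this group's entity stays
			 * off the queue, which means it cannot be scheduled.
			 */
			sched_rt_rq_dequeue(rt_rq);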
			return 1;
		}
	}

	return 0;
}

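    To recap:
    1) sched_rt_runtime_exceeded(rt_rq) decides whether the time consumed on an rt_rq has exceeded its budget; it returns 1 if so and 0 otherwise;
    2) It first checks whether the rt_rq is already throttled and whether the budget is at least as large as the bandwidth period. In the first case it normally returns 1 (unless a task on the queue is priority-boosted, i.e. priority inheritance is in effect); in the second it returns 0;
    3) If the queue is not throttled and the budget is below the period, it first "balances" rt_rq->rt_runtime: on multi-core systems, if the rt_rq queues on other CPUs have time to spare, this rt_rq can "borrow" from them;
    4) After balancing, the budget of rt_rq may have grown, at most up to the bandwidth period; only then is the consumed time (rt_rq->rt_time) compared against the budget (rt_rq->rt_runtime);
    5) If the consumed time exceeds the budget and the bandwidth budget is non-zero, the throttle flag is set (rt_rq->rt_throttled = 1); the upper-level entities of rt_rq are moved to the tail of their queues, while the entity representing rt_rq itself is dequeued.

    Once a queue is throttled (rt_rq->rt_throttled is non-zero and no priority boosting is in effect), the consequences are:
    1) sched_rt_runtime_exceeded() returns 1, so TIF_NEED_RESCHED is set on the current task rq->curr, which is then scheduled out at the next rescheduling point, such as the return from an interrupt;
    2) While throttled, the enqueue paths refuse to enqueue the entity: both enqueue_top_rt_rq(rt_rq) and __enqueue_rt_entity(rt_se, head) skip throttled queues.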
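The accounting walk and the overrun check described in this section can be condensed into a user-space model. This is a sketch, not the kernel implementation: struct model_rt_rq, model_runtime_exceeded() and model_account() are names invented here, and the model deliberately leaves out locking, balance_runtime() borrowing, priority boosting and the actual dequeue.

```c
#include <stddef.h>
#include <stdint.h>

#define RUNTIME_INF UINT64_MAX	/* "unlimited" budget, named after the kernel constant */

/* Simplified stand-in for struct rt_rq; one queue per hierarchy level. */
struct model_rt_rq {
	struct model_rt_rq *parent;	/* queue of the parent group, NULL at the top */
	uint64_t rt_time;		/* time consumed in the current period (ns) */
	uint64_t rt_runtime;		/* budget per period (ns) */
	uint64_t rt_period;		/* period length (ns) */
	int rt_throttled;		/* 1 once the queue has been throttled */
};

/* Model of the overrun check: returns 1 if the queue is or becomes throttled. */
static int model_runtime_exceeded(struct model_rt_rq *q)
{
	if (q->rt_throttled)
		return 1;
	if (q->rt_runtime == RUNTIME_INF || q->rt_runtime >= q->rt_period)
		return 0;	/* such a queue can never overrun */
	if (q->rt_time > q->rt_runtime) {
		if (q->rt_runtime) {
			q->rt_throttled = 1;
			return 1;
		}
		q->rt_time = 0;	/* zero budget: nothing to replenish, drop the time */
	}
	return 0;
}

/*
 * Model of the accounting walk: charge delta_exec to the leaf queue and to
 * every ancestor queue. Returns 1 if any level asks for a reschedule.
 */
static int model_account(struct model_rt_rq *leaf, uint64_t delta_exec)
{
	int resched = 0;

	for (struct model_rt_rq *q = leaf; q; q = q->parent) {
		if (q->rt_runtime == RUNTIME_INF)
			continue;	/* unlimited queues are not accounted */
		q->rt_time += delta_exec;
		if (model_runtime_exceeded(q))
			resched = 1;
	}
	return resched;
}
```

For instance, with a top-level queue budgeted at 950 ms and a child queue budgeted at 300 ms per 1 s period, charging two slices of 200 ms to the child leaves both queues at 400 ms; the child is throttled after the second slice while the top-level queue is not.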

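3.3 Lifting the throttle

As seen above, a throttled queue gets no chance to run. But a task cannot stay throttled forever, or it would never execute again. Some mechanism must therefore check when the throttle has imposed its due "penalty" and then lift it, setting the queue free.
The kernel implements this with the high-resolution timer rt_period_timer in the rt_bandwidth member of struct task_group. The timer is initialized in the bandwidth initialization function init_rt_bandwidth():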

void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime)
{
	rt_b->rt_period = ns_to_ktime(period);
	rt_b->rt_runtime = runtime;

	raw_spin_lock_init(&rt_b->rt_runtime_lock);

	hrtimer_init(&rt_b->rt_period_timer,				/* initialize the bandwidth's high-resolution timer */
			CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	rt_b->rt_period_timer.function = sched_rt_period_timer;	/* set the expiry callback: sched_rt_period_timer */
}

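init_rt_bandwidth() is called when a task_group is created, so the high-resolution timer of its rt_bandwidth is initialized at that point as well. After initialization, the timer is armed with hrtimer_start_expires(). Every time __enqueue_rt_entity() enqueues an rt_se, it checks whether the timer on the rt_bandwidth of the rt_se's group is active, and arms it if not.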

__enqueue_rt_entity()
 |inc_rt_tasks(rt_se, rt_rq)
  |inc_rt_group(rt_se, rt_rq)
   |start_rt_bandwidth(&rt_rq->tg->rt_bandwidth)

start_rt_bandwidth() arms the high-resolution timer of rt_bandwidth, as shown below:

static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
{
	if (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF)
		return;

	raw_spin_lock(&rt_b->rt_runtime_lock);
	if (!rt_b->rt_period_active) {
		rt_b->rt_period_active = 1;
		hrtimer_forward_now(&rt_b->rt_period_timer, rt_b->rt_period);		/* move the expiry one bandwidth period ahead */
		hrtimer_start_expires(&rt_b->rt_period_timer, HRTIMER_MODE_ABS_PINNED); /* arm the timer */
	}
	raw_spin_unlock(&rt_b->rt_runtime_lock);
}

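After all this groundwork the timer is finally armed. It then runs until the expiry we set. What happens at expiry? The expiry callback rt_b->rt_period_timer.function is invoked; since it was set to sched_rt_period_timer() during bandwidth initialization, that is the function actually called when the timer expires.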

static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
{
	struct rt_bandwidth *rt_b =
		container_of(timer, struct rt_bandwidth, rt_period_timer);
	int idle = 0;
	int overrun;

	raw_spin_lock(&rt_b->rt_runtime_lock);
	for (;;) {
		overrun = hrtimer_forward_now(timer, rt_b->rt_period);		/* forward the timer; overrun is the number of periods that have elapsed */
		if (!overrun)
			break;

		raw_spin_unlock(&rt_b->rt_runtime_lock);
		idle = do_sched_rt_period_timer(rt_b, overrun);		/* the main handler */
		raw_spin_lock(&rt_b->rt_runtime_lock);
	}
	if (idle)
		rt_b->rt_period_active = 0;						/* idle == 1: nothing left to schedule in this task_group, mark the timer inactive */
	raw_spin_unlock(&rt_b->rt_runtime_lock);

	return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;
}
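The loop in sched_rt_period_timer() relies on the forwarding call returning how many periods have elapsed, and 0 once the expiry lies in the future. The following user-space sketch models that behavior; struct model_timer and model_forward() are names invented for illustration, times are plain nanosecond integers, and the interval is assumed to be positive. It is a simplification, not the kernel's hrtimer code.

```c
#include <stdint.h>

/* Simplified stand-in for a periodic timer; all times are in nanoseconds. */
struct model_timer {
	int64_t expires;	/* absolute expiry time */
};

/*
 * Model of forwarding a timer: push the expiry past "now" in whole
 * intervals and return how many intervals were added. Returns 0 and
 * leaves the timer untouched if it has not expired yet.
 */
static uint64_t model_forward(struct model_timer *t, int64_t now, int64_t interval)
{
	int64_t delta = now - t->expires;
	uint64_t overrun;

	if (delta < 0)
		return 0;	/* expiry is still in the future */

	overrun = (uint64_t)(delta / interval) + 1;
	t->expires += interval * (int64_t)overrun;
	return overrun;
}
```

With an expiry at 1000 and an interval of 1000, calling model_forward() at time 3500 returns 3 and moves the expiry to 4000; a second call at the same time returns 0, which is what makes the for (;;) loop in the callback terminate.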

Now let us look at this core handler:

static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
{
	int i, idle = 1, throttled = 0;
	const struct cpumask *span;

	span = sched_rt_period_mask();
#ifdef CONFIG_RT_GROUP_SCHED
	/*
	 * FIXME: isolated CPUs should really leave the root task group,
	 * whether they are isolcpus or were isolated via cpusets, lest
	 * the timer run on a CPU which does not service all runqueues,
	 * potentially leaving other CPUs indefinitely throttled.  If
	 * isolation is really required, the user will turn the throttle
	 * off to kill the perturbations it causes anyway.  Meanwhile,
	 * this maintains functionality for boot and/or troubleshooting.
	 */
	if (rt_b == &root_task_group.rt_bandwidth)
		span = cpu_online_mask;
#endif
	for_each_cpu(i, span) {		/* walk the rt_rq of this bandwidth's task_group on each CPU */
		int enqueue = 0;
		struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
		struct rq *rq = rq_of_rt_rq(rt_rq);

		raw_spin_lock(&rq->lock);
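		if (rt_rq->rt_time) {	/* rt_rq has consumed time; rt_time is only re-accounted
						 * when the rt_bandwidth high-resolution timer expires */
			u64 runtime;

			raw_spin_lock(&rt_rq->rt_runtime_lock);
			if (rt_rq->rt_throttled)
				balance_runtime(rt_rq);		/* if rt_rq is throttled, "balance" it to try to take time
									 * from the rt_rq on other CPUs; this is the second call site.
									 */
			runtime = rt_rq->rt_runtime;
			/*
			 * Subtract the budget of the elapsed periods.
			 * @overrun: number of periods that have elapsed;
			 * @runtime: the run queue's budget for one period.
			 * If rt_time is at most overrun*runtime it drops to 0;
			 * otherwise it keeps the excess beyond that budget.
			 */
			rt_rq->rt_time -= min(rt_rq->rt_time, overrun*runtime);
			/*
			 * If the queue is throttled and the remaining consumed time is
			 * below one period's budget, clear the throttle flag and set
			 * the enqueue flag.
			 */
			if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
				rt_rq->rt_throttled = 0;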
				enqueue = 1;

				/*
				 * When we're idle and a woken (rt) task is
				 * throttled check_preempt_curr() will set
				 * skip_update and the time between the wakeup
				 * and this unthrottle will get accounted as
				 * 'runtime'.
				 */
				if (rt_rq->rt_nr_running && rq->curr == rq->idle)
					rq_clock_skip_update(rq, false);
			}
			if (rt_rq->rt_time || rt_rq->rt_nr_running)
				idle = 0;
			raw_spin_unlock(&rt_rq->rt_runtime_lock);
		} else if (rt_rq->rt_nr_running) {
			/*
			 * rt_rq consumed no time this period but still has runnable
			 * tasks: if it is not throttled, set the enqueue flag.
			 */
			idle = 0;
			if (!rt_rq_throttled(rt_rq))
				enqueue = 1;
		}
		if (rt_rq->rt_throttled)
			throttled = 1;

		if (enqueue)
			sched_rt_rq_enqueue(rt_rq);	/* as seen in 3.2, an overrun rt_rq dequeued by sched_rt_rq_dequeue() cannot be enqueued again until the throttle is lifted here */
		raw_spin_unlock(&rq->lock);
	}

	if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
		return 1;

	return idle;			/* idle == 0: some CPU still has consumed time or runnable entities, so the timer keeps running */
}

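That concludes our quick tour of how the high-resolution timer of a task_group operates. do_sched_rt_period_timer(rt_b, overrun) also shows the condition for lifting a queue's throttle: when the timer expires, the consumed time of rt_rq is recomputed (giving the time still to be paid off); if the updated value is less than one period's budget, the throttle is lifted with rt_rq->rt_throttled = 0.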
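The replenishment rule can be tried out with a small user-space model. The names struct period_rq, min_u64() and model_replenish() are invented for this sketch; it keeps only the subtraction and the unthrottle test from do_sched_rt_period_timer() and ignores locking, balancing and enqueueing.

```c
#include <stdint.h>

/* Simplified stand-in for the fields of struct rt_rq used at period expiry. */
struct period_rq {
	uint64_t rt_time;	/* consumed time not yet paid off */
	uint64_t rt_runtime;	/* budget per period */
	int rt_throttled;	/* 1 while the queue is throttled */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Model of the per-period replenishment: pay off up to overrun periods'
 * worth of budget, then lift the throttle if what remains is below one
 * period's budget. Returns 1 if the queue was unthrottled by this call.
 */
static int model_replenish(struct period_rq *q, uint64_t overrun)
{
	q->rt_time -= min_u64(q->rt_time, overrun * q->rt_runtime);
	if (q->rt_throttled && q->rt_time < q->rt_runtime) {
		q->rt_throttled = 0;
		return 1;
	}
	return 0;
}
```

A queue that overran only slightly (say rt_time = 960 against a budget of 950) is released at the first expiry with rt_time = 10. A queue with rt_time = 2500 and a budget of 1000 stays throttled after one period (1500 remains) and is released after the second (500 remains).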

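4. Summary

    The sections above briefly analyzed how the kernel uses bandwidth throttling to keep real-time tasks from consuming CPU without limit.
    1) Bandwidth throttling applies to a group; the tasks in a group are attached to the same run queue rt_rq, so the throttle is ultimately enforced on the rt_rq;
    2) The kernel updates task statistics at several key points through update_curr_rt(), and after each update checks whether the group's run time exceeds its bandwidth limit;
    3) If an rt_rq exceeds its limit it is marked as throttled; its entity is then removed from the queue and cannot be enqueued again until the throttle is lifted;
    4) Every task group maintains a high-resolution timer that periodically (every rt_period) updates the consumed time on its rt_rq queues and lifts the throttle on any rt_rq that has "served its penalty".

The text mentions "balancing" the budget of a run queue rt_rq in two places. How does this balancing work? Stay tuned; that is the topic of the next article.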