linux命令ls -rt1,linux调度器（一）- RT

最新推荐文章于 2022-11-11 20:20:53 发布

海伦凯乐

最新推荐文章于 2022-11-11 20:20:53 发布

阅读量602

点赞数

文章标签： linux命令ls -rt1

我们其实都知道linux最著名的CFS调度器，但是CFS过于复杂。所以我想从RT scheduler来一步一步的揭开调度器的面纱。

其实linux内部有五种类型的scheduler class，可以认为是调度器等级，从stop开始、deadline、real time，fair最后是idle。这些class的优先级从高到低。最低的就是idle。

每一个scheduler class都有一个next指针用来指向比它低一等级的class。

可以看一段pick_next_task的代码来了解一下吧。

for_each_class(class) {

p = class->pick_next_task(rq, prev, cookie);

if (p) {

if (unlikely(p == RETRY_TASK))

goto again;

return p;

}

BUG(); /* the idle class will always have a runnable task */

其实我们接触的比较多的是real time跟fair。Scheduler的作用大体分为两部分：

(1)在CPU上，选择合适的task运行。

(2)为新创建或者唤醒的task选择合适的CPU。

当然第二点对于单核CPU不适用。Scheduler class的函数特别多，从后续的讲解中一步步的来了解每一个函数的用途。

主要从三个函数入手

2713

.pick_next_task =

pick_next_task_rt,

2714

.put_prev_task =

put_prev_task_rt,

2715

2716 #ifdef CONFIG_SMP

2717

.select_task_rq =

select_task_rq_rt,

其中pick_next_task用于选择即将运行的task，而select_task_rq用于为进程选择合适的CPU。

再次首先说明一下，在RT中，没有所谓的时间片(其实fair也没有了)的概念。都是高优先级的进程抢占低优先级的task，直到其主动放弃CPU使用权为止。而且RT task的优先级比正常的fair task要高。所以一般要求RT task运行时间尽可能的短。

RT task的优先级在0～ 99之间(包括0跟99)。

pick_next_task用于选择下一个运行的RT

task，但是情况在SMP下就稍微复杂了一下。因为我们追求的是整个系统的优先级高的RT task能够优先运行，如果当前进程的一个RT task放弃CPU使用权的话，假如当前CPU上的下一个RT task的优先级比其余某一个CPU上第二优先级的RT task要低呢？

static struct task_struct *

pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)

{

struct task_struct *p;

struct rt_rq *rt_rq = &rq->rt;

if (need_pull_rt_task(rq, prev)) {

在migrate里面有两种情况：pull是指当前CPU从别的CPU上拉一些进程过来运行。

Push是指将当前CPU上的一些进程migrate到别的CPU上运行。

* This is OK, because current is on_cpu, which avoids it being

* picked for load-balance and preemption/IRQs are still

* disabled avoiding further scheduler activity on it and we're

* being very careful to re-start the picking loop.

lockdep_unpin_lock(&rq->lock, cookie);

pull_rt_task(rq);

lockdep_repin_lock(&rq->lock, cookie);

* pull_rt_task() can drop (and re-acquire) rq->lock; this

* means a dl or stop task can slip in, in which case we need

* to re-start task selection.

if (unlikely((rq->stop && task_on_rq_queued(rq->stop)) ||

rq->dl.dl_nr_running)

如果出现了更高优先级class的task，那么就需要重新选择。

return RETRY_TASK;

}

其中有一个need_pull_rt_task的判断。

static inline bool need_pull_rt_task(struct

rq *rq, struct task_struct *prev)

{

* Try to pull RT tasks here if we lower this rq's prio and cpu is not

* isolated

return rq->rt.highest_prio.curr > prev->prio &&

!cpu_isolated(cpu_of(rq));

}

代码很简单，意思说就是下一个进程优先级要比prev的优先级低的话，那么就需要尝试从别的runqueue上pull task过来。

接着看pull_rt_task函数。

for_each_cpu(cpu, this_rq->rd->rto_mask) {

root domain是一个全局结构体，用来描述sched domain的一些系统全局的信息。

root domain的rto_mask_mask用于标记那些CPU是否rto(real time overload)。什么是rto呢？就是大于1个实时进程，而且存在可以迁移的TASK。什么是可以迁移的task ？就是tsk_nr_cpus_allowed 大于1.

if (this_cpu == cpu)

continue;

src_rq = cpu_rq(cpu);

* Don't bother taking the

src_rq->lock if the next highest

* task is known to be

lower-priority than our current task.

* This may look racy, but if

this value is about to go

* logically higher, the src_rq

will push this task away.

* And if its going logically

lower, we do not care

(src_rq->rt.highest_prio.next >=

this_rq->rt.highest_prio.curr)

continue;

//如果src_rq的下一个RT task的优先级比当前runqueue的优先级要低的话。就跳过src_rq，因为是想pull一个高优先级的RT task过来。

* We can pull only a task,

which is pushable

* on its rq, and no others.

p =

pick_highest_pushable_task(src_rq, this_cpu);

//从src rq上找一个不在running，而且可以跑在this cpu上的进程。

* Do we have an RT task that

preempts

* the to-be-scheduled task?

if (p && (p->prio

< this_rq->rt.highest_prio.curr)) {

//如果找到了一个进程，且该进程的优先级比当前进程的优先级要高的话。

WARN_ON(p ==

src_rq->curr);

WARN_ON(!task_on_rq_queued(p));

* There's a chance

that p is higher in priority

* than what's

currently running on its cpu.

* This is just that p

is wakeing up and hasn't

* had a chance to

schedule. We only pull

* p if it is lower in

priority than the

* current task on the

run queue

if (p->prio < src_rq->curr->prio)

goto skip;

resched = true;

deactivate_task(src_rq,

p, 0);

set_task_cpu(p,

this_cpu);

activate_task(this_rq, p,

0);

//将该进程从src rq上dequeue掉，设置新的run CPU，然后重新enqueue到新的队列里面来。

* We continue with the

search, just in

* case there's an even

higher prio task

* in another runqueue.

(low likelihood

* but possible)

}

skip:

double_unlock_balance(this_rq,

src_rq);

}

//这边好像有点问题，因为for循环会执行好几次，会可能pull 好几个RT task过来?但是由于已经将task enqueue到rt rq了，所以

pick_highest_pushable_task应该会阻止这种情况。

if (resched)

resched_curr(this_rq);

//pull了一个新的高优先级的task过来了，需要重新sched

再次回到pick_next_task_rt函数。

* We may dequeue prev's rt_rq in put_prev_task().

* So, we update time before rt_nr_running check.

if (prev->sched_class == &rt_sched_class)

update_curr_rt(rq);//在调用put_prev_task之前，调用update_curr_rt更新状态。

if (!rt_rq->rt_queued)

return NULL;

put_prev_task(rq, prev);

p = _pick_next_task_rt(rq);

/* The running task is never eligible

for pushing */

dequeue_pushable_task(rq, p);

//将task p从rq的pushable task中删除，因为task p已经准备运行了。

queue_push_tasks(rq);

//插入一个push task的call back给balance_callback(rq)。没太搞清楚意图？

相对而言put_prev_task_rt就简单的很多了。

update_curr_rt(rq);

* The previous task needs to be made

eligible for pushing

* if it is still active

if (on_rt_rq(&p->rt) &&

p->nr_cpus_allowed > 1)

enqueue_pushable_task(rq, p);

//将进程插入到当前rq的链表中(按优先级排序),更新rq的highest_prio.next

接下来是select_task_rq_rt函数。

if (curr &&

unlikely(rt_task(curr)) &&

(curr->nr_cpus_allowed < 2 ||

curr->prio <= p->prio)) {

//如果当前的task是RT，并且当前进程只能在当前CPU上运行。那么给进程p重新选择一个CPU，否则就在当前CPU上运行。

int target = find_lowest_rq(p);

* Don't bother moving it if

the destination CPU is

* not running a lower priority

task.

if (target != -1 &&

p->prio <

cpu_rq(target)->rt.highest_prio.curr)

cpu = target;

}

接着调用find_lowest_rq.来选择一个优先级最低的CPU。

static

int find_lowest_rq(struct task_struct *task)

{

struct sched_domain *sd;

struct cpumask *lowest_mask =

this_cpu_cpumask_var_ptr(local_cpu_mask);

int this_cpu = smp_processor_id();

int cpu = task_cpu(task);

/* Make sure the mask is initialized

first */

if (unlikely(!lowest_mask))

return -1;

if (task->nr_cpus_allowed == 1)

//如果task affinity绑定在一个CPU上不能migrate的话，就不需要选择rq了。

return -1; /* No other targets

possible */

(!cpupri_find(&task_rq(task)->rd->cpupri, task, lowest_mask))

return -1; /* No targets found */

//这里面最关键的就是cpupri_find函数了，通过该函数查找可以容纳该进程的CPU。

怎么算可以容纳呢？即进程p可以抢占CPU，也就是task的优先级比CPU上的优先级最高的进程都要高。

为此在rd->cpupri中维护了一个数组。

struct cpupri {

struct

cpupri_vec pri_to_cpu[CPUPRI_NR_PRIORITIES];

int *cpu_to_pri;

};

该数组大小为102，用于维护CPU的优先级最高的task。除了0～99一共100个RT优先级，还有两外两个IDLE 和NORMAL分别用于

在不存在RT task情况下，用于标记runqueue为idle或者fair tasks。

可以从函数cpupri_set以及cpupri_cleanup上看出使用方法。

Cpu_to_pri是一个NR_CPUS的数组，维护着每一个CPU上RT 的最高优先级。

* At this point we have built a mask

of cpus representing the

* lowest priority tasks in the

system. Now we want to elect

* the best one based on our affinity

and topology.

* We prioritize the last cpu that the

task executed on since

* it is most likely cache-hot in that

location.

if (cpumask_test_cpu(cpu, lowest_mask))

return cpu;

//如果prev cpu满足条件，就是用prev cpu

if (!cpumask_test_cpu(this_cpu,

lowest_mask))

this_cpu = -1; /* Skip this_cpu

opt if not among lowest */

rcu_read_lock();

for_each_domain(cpu, sd) {

if (sd->flags &

SD_WAKE_AFFINE) {

int best_cpu;

* "this_cpu"

is cheaper to preempt than a

* remote processor.

if (this_cpu != -1

cpumask_test_cpu(this_cpu, sched_domain_span(sd))) {

//如果当前CPU满足条件，并且所在sched domain满足SD_WAKE_AFFINE条件的话。使用当前CPU。

rcu_read_unlock();

return

this_cpu;

}

best_cpu =

cpumask_first_and(lowest_mask,

sched_domain_span(sd));

//否则从sched中随意找出第一个满足条件的CPU。

if (best_cpu <

nr_cpu_ids) {

rcu_read_unlock();

return best_cpu;

}

rcu_read_unlock();

1619 /*

1620 * And finally, if there were no

matches within the domains

1621 * just give the caller *something* to

work with from the compatible

1622 * locations.

1623 */

1624 if (this_cpu != -1)

1625 return this_cpu;

//如果当前CPU满足条件，就是用当前CPU，否则从满足条件的CPU中选择第一个。

1626

1627 cpu = cpumask_any(lowest_mask);

1628 if (cpu < nr_cpu_ids)

1629 return cpu;

个人认为上面三个是比较重要的函数，下面想看一下task的en/dequeue

runqueue维护了一个bitmap和per priority的list

133

struct rt_prio_array {

134

DECLARE_BITMAP(bitmap, MAX_RT_PRIO+1); /* include 1 bit for delimiter */

135

struct list_head queue[MAX_RT_PRIO];

136 };

同时队列还维护着一个pushable_tasks队列，什么叫pushable

task？也就是nr_cpu_allows >=2.也就是可以在2个及以上CPU运行的，即可以migrate的task。

下面直接开始看代码吧。

static

void

enqueue_task_rt(struct

rq *rq, struct task_struct *p, int flags)

{

struct sched_rt_entity *rt_se =

&p->rt;

if (flags & ENQUEUE_WAKEUP)

rt_se->timeout = 0;

enqueue_rt_entity(rt_se, flags &

ENQUEUE_HEAD);

if (!task_current(rq, p) &&

p->nr_cpus_allowed > 1)

enqueue_pushable_task(rq, p);

//如果进程的nr_cpus_allowed > 1,添加到pushable 队列中。

}

static

void enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)

{

struct rq *rq = rq_of_rt_se(rt_se);

dequeue_rt_stack(rt_se);

//这个函数看起来比较奇怪，rt stack ？！为什么呢。因为这个涉及到CGROUP相关的结构，我们的cgroup都是只有parent指针，没有child指针。但是比如添加一个较高优先级的rt task到一个cgroup的话。那么在cgroup的整体的优先级就会被提升，然后在list以及bitmap中的位置就会发生变化。就需要从上到下的来重新一个个插入到相应的list中去。

所以dequeue_rt_stack利用rt entity的back成员建立一个从上到下的层次结构，一边后续利用

for_each_sched_rt_entity(rt_se)

__enqueue_rt_entity(rt_se,

head);

//因为在上面的dequeue_rt_stack中将每一个层次的rt entity冲列表中删除。这里根据新的优先级重新插入。

enqueue_top_rt_rq(&rq->rt);

}

dequeue_task_rt函数大体跟enqueue_task_rt差不多。

海伦凯乐

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
linux命令ls -rt1,linux调度器（一）- RT

我们其实都知道linux最著名的CFS调度器，但是CFS过于复杂。所以我想从RT scheduler来一步一步的揭开调度器的面纱。其实linux内部有五种类型的scheduler class，可以认为是调度器等级，从stop开始、deadline、real time，fair最后是idle。这些class的优先级从高到低。最低的就是idle。每一个scheduler class都有一个next指针...
复制链接

扫一扫