Process Group Scheduling Mechanism

SoloLinux

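I ran into another odd process scheduling problem. During a system reboot the system hung and only reset about 30 s later. The reset was actually triggered by the hardware watchdog, not by the normal reboot path. Working back 30 s from the reset time recorded by the watchdog (that is, to the moment the watchdog stopped being fed) and checking the serial console log, I found a single line printed at that point: "sched: RT throttling activated".
In the linux-3.0.101-0.7.17 kernel source, this message is printed by sched_rt_runtime_exceeded. Under group scheduling, the scheduling of real-time processes is limited by rt_rq->rt_throttled. The rest of this post describes the Linux group scheduling mechanism that is involved.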

Process group scheduling mechanism

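Group scheduling is a cgroup concept: N processes are treated as a single unit that takes part in scheduling as a whole. For example, task A has 8 processes or threads, task B has 2, and other processes or threads exist as well. We want A to use no more than 40% of the CPU, B no more than 40%, and the remaining tasks to get no less than 20%. This is done by setting cgroup thresholds: cgroup A is set to 200, cgroup B to 200, and the other tasks are left at 100. The values are relative weights, so the resulting 2:2:1 split (40%/40%/20%) applies when all groups are competing for the CPU. (The numbers here are illustrative; the stock cpu.shares default is 1024.)
In the kernel, a process group is managed by struct task_group. Much of it belongs to the cgroup control mechanism, which I am covering in a separate write-up; here I focus only on the group scheduling part. See the comments in the listing below.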
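The arithmetic behind the 200/200/100 example can be sketched as follows. This is only an illustration: `share_percent` is a hypothetical helper, not a kernel or cgroup API, and it assumes every listed group is runnable at the same time.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): given the share weights of
 * sibling groups that are all runnable, return the CPU percentage that
 * group `idx` receives under full contention.
 */
static double share_percent(const unsigned long *shares, size_t n, size_t idx)
{
    unsigned long total = 0;

    assert(idx < n);
    for (size_t i = 0; i < n; i++)
        total += shares[i];
    assert(total > 0);

    /* A group's slice is its weight divided by the sum of all weights. */
    return 100.0 * (double)shares[idx] / (double)total;
}
```

Calling it with the weights {200, 200, 100} gives 40% for A, 40% for B and 20% for the rest. If a group is idle, its weight drops out of the sum and the others get proportionally more, which is why shares act as a weight rather than a hard cap.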

struct task_group {
	struct cgroup_subsys_state css;

	// The fields below are used for scheduling ordinary (CFS) processes
#ifdef CONFIG_FAIR_GROUP_SCHED
	/* schedulable entities of this group on each cpu */
	// Fair scheduling entities; they are called "entities" because what gets
	// scheduled may be a single process or a group of processes
	struct sched_entity **se;
	/* runqueue "owned" by this group on each cpu */
	// Fair (CFS) run queues
	struct cfs_rq **cfs_rq;
	// The control threshold from the example above
	unsigned long shares;

	atomic_t load_weight;
#endif

#ifdef CONFIG_RT_GROUP_SCHED
	// Real-time scheduling entities
	struct sched_rt_entity **rt_se;
	// Real-time run queues
	struct rt_rq **rt_rq;
	// Bandwidth (i.e. the proportion) of CPU time real-time processes may use
	struct rt_bandwidth rt_bandwidth;
#endif

	struct rcu_head rcu;
	struct list_head list;

	// task_groups are organized as a tree: a parent node, a sibling list and
	// a children list; the root node in the kernel is root_task_group
	struct task_group *parent;
	struct list_head siblings;
	struct list_head children;

#ifdef CONFIG_SCHED_AUTOGROUP
	struct autogroup *autogroup;
#endif

	struct cfs_bandwidth cfs_bandwidth;
};

There are two kinds of scheduling entities: ordinary (fair) scheduling entities and real-time scheduling entities.

struct sched_entity {
	struct load_weight load; /* for load-balancing */
	struct rb_node run_node;
	struct list_head group_node;
	unsigned int on_rq;

	u64 exec_start;
	u64 sum_exec_runtime;
	u64 vruntime;
	u64 prev_sum_exec_runtime;

	u64 nr_migrations;

#ifdef CONFIG_SCHEDSTATS
	struct sched_statistics statistics;
#endif

#ifdef CONFIG_FAIR_GROUP_SCHED
	// The parent scheduling entity this entity belongs to
	struct sched_entity *parent;
	/* rq on which this entity is (to be) queued: */
	// The run queue of the parent entity, i.e. the queue this entity is inserted into
	struct cfs_rq *cfs_rq;
	/* rq "owned" by this entity/group: */
	// The run queue owned by this entity, i.e. the queue that manages its child
	// entities. my_q is set only when the entity represents a task_group;
	// if the entity is a task, my_q is NULL
	struct cfs_rq *my_q;
#endif
	void *suse_kabi_padding;
};

struct sched_rt_entity {
	struct list_head run_list;
	unsigned long timeout;
	unsigned int time_slice;
	int nr_cpus_allowed;

	struct sched_rt_entity *back;
#ifdef CONFIG_RT_GROUP_SCHED
	// Real-time entities are managed much like ordinary ones; the three
	// fields below have the same meaning as in sched_entity
	struct sched_rt_entity *parent;
	/* rq on which this entity is (to be) queued: */
	struct rt_rq *rt_rq;
	/* rq "owned" by this entity/group: */
	struct rt_rq *my_q;
#endif
};
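The role of parent and my_q can be illustrated with a small userspace sketch. The toy_ types and functions are hypothetical stand-ins that keep only the three group-scheduling fields; the my_q test follows the same idea as the kernel's entity_is_task() macro, but nothing here is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a run queue; the contents do not matter here. */
struct toy_cfs_rq { int nr_running; };

/* Toy stand-in for sched_entity: only the group-scheduling fields. */
struct toy_sched_entity {
    struct toy_sched_entity *parent; /* owning group entity, NULL at the top level */
    struct toy_cfs_rq *cfs_rq;       /* queue this entity is placed on */
    struct toy_cfs_rq *my_q;         /* queue owned by a group entity, NULL for a task */
};

/* A task owns no queue of its own, so my_q == NULL identifies it. */
static int toy_entity_is_task(const struct toy_sched_entity *se)
{
    assert(se != NULL);
    return se->my_q == NULL;
}

/* Number of group levels above `se` (0 if it sits directly in the root queue). */
static int toy_entity_depth(const struct toy_sched_entity *se)
{
    int depth = 0;

    assert(se != NULL);
    for (se = se->parent; se != NULL; se = se->parent)
        depth++;
    return depth;
}
```

For a group entity queued on the root queue and a task queued on that group's my_q, toy_entity_is_task() returns 0 for the group and 1 for the task, and the task's depth is 1.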

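Now for the run queues. The fields worth explaining are much the same for the real-time and the ordinary queues, so take the real-time queue as the example: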

struct rt_rq {
	struct rt_prio_array active;
	unsigned long rt_nr_running;
#if defined CONFIG_SMP || defined CONFIG_RT_GROUP_SCHED
	struct {
		int curr; /* highest queued rt task prio */
#ifdef CONFIG_SMP
		int next; /* next highest */
#endif
	} highest_prio;
#endif
#ifdef CONFIG_SMP
	unsigned long rt_nr_migratory;
	unsigned long rt_nr_total;
	int overloaded;
	struct plist_head pushable_tasks;
#endif
	// Whether real-time scheduling on this queue is currently throttled
	int rt_throttled;
	// Accumulated run time of this queue
	u64 rt_time;
	// Maximum run time allowed for this queue
	u64 rt_runtime;
	/* Nests inside the rq lock: */
	raw_spinlock_t rt_runtime_lock;

#ifdef CONFIG_RT_GROUP_SCHED
	unsigned long rt_nr_boosted;
	// The per-CPU run queue (rq) this real-time queue belongs to
	struct rq *rq;
	struct list_head leaf_rt_rq_list;
	// The task_group this real-time queue belongs to
	struct task_group *tg;
#endif
};

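Putting the three kinds of structures above together (task group, scheduling entity, run queue) gives the following diagram:

[Figure: task_group]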

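As the diagram shows, a scheduling entity and its run queue together form one tree node, and these nodes make up a separate tree of their own. Note that a scheduling entity is placed on a run queue only when it contains a process in the TASK_RUNNING state.
Another point: before group scheduling existed, each CPU had a single run queue, which can be thought of as all processes belonging to one scheduling group. Now every scheduling group has its own run queue on every CPU. Scheduling used to pick a process to run; now it picks a scheduling entity. When scheduling happens, schedule() starts from root_task_group and looks for the scheduling entity chosen by the scheduling policy. If that entity is a task_group, it enters the task_group's run queue and picks a suitable entity there, and so on until it reaches a task entity. The whole procedure is a tree traversal: task_groups that contain TASK_RUNNING processes are the internal nodes, and task scheduling entities are the leaves.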
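The traversal described above can be sketched in userspace as a loop that keeps descending into my_q until it reaches a task. This is a toy model of the fair class only: all toy_ names are hypothetical, a small array replaces the kernel's red-black tree, and "smallest vruntime wins" stands in for the real selection policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTITIES 8

struct toy_entity;

/* A toy run queue: a small array instead of the kernel's red-black tree. */
struct toy_rq {
    struct toy_entity *queued[TOY_MAX_ENTITIES];
    size_t nr;
};

struct toy_entity {
    const char *name;
    uint64_t vruntime;   /* smaller value means "more deserving" */
    struct toy_rq *my_q; /* queue owned by a group, NULL for a task */
};

static void toy_enqueue(struct toy_rq *rq, struct toy_entity *se)
{
    assert(rq->nr < TOY_MAX_ENTITIES);
    rq->queued[rq->nr++] = se;
}

/* Entity with the smallest vruntime on this queue, or NULL if it is empty. */
static struct toy_entity *toy_pick_entity(const struct toy_rq *rq)
{
    struct toy_entity *best = NULL;

    for (size_t i = 0; i < rq->nr; i++)
        if (best == NULL || rq->queued[i]->vruntime < best->vruntime)
            best = rq->queued[i];
    return best;
}

/* Walk down from the root queue until the picked entity is a task. */
static struct toy_entity *toy_pick_next_task(const struct toy_rq *root)
{
    const struct toy_rq *rq = root;
    struct toy_entity *se = NULL;

    while (rq != NULL) {
        se = toy_pick_entity(rq);
        if (se == NULL)
            return NULL;  /* nothing runnable at this level */
        rq = se->my_q;    /* group: descend; task: my_q is NULL, loop ends */
    }
    return se;
}
```

With a root queue holding task t1 (vruntime 30) and a group (vruntime 10) whose own queue holds t2 (5) and t3 (9), the walk picks the group at the root level and then t2 inside it. The sketch relies on the rule stated above that a group entity is queued only while it has runnable members, so an empty group queue is not expected during the descent.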