Linux 中进程的调度算法

最新推荐文章于 2024-04-18 17:09:39 发布

Linux加油站

最新推荐文章于 2024-04-18 17:09:39 发布

阅读量1.1k

点赞数

文章标签： linux 进程管理 Linux内核

本文链接：https://blog.csdn.net/m0_74282605/article/details/129824061

版权

1 调度类

Linux内核实现了4种调度类，优先级从高到低分别是：

调度类	名称	优先级
stop_sched_class	停止类	-
rt_sched_class	实时类	0-99
fair_sched_class	完全公平调度类	100-139
idle_sched_class	空闲类	-

调度器先从停止类中挑选进程，如果停止类中没有挑选到可运行的进程，再从实时类挑选，依此类推。这可以从kernel/sched/core.c中pick_next_task函数看出来。

其它调度类都很简单，我们这里只讲完全公平调度类(CFS)。

2 调度延迟和调度最小粒度

完全公平调度类使用了一种动态时间片的算法，给每个进程分配CPU占用时间。调度延迟指的是任何一个可运行进程两次运行之间的时间间隔。比如调度延迟是20毫秒，那么每个进程可以执行10毫秒；如果是4个进程，可以执行5毫秒。调度延迟称为 sysctl_sched_latency，记录在/proc/sys/kernel/sched_latency_ns中，以纳秒为单位。

如果进程很多，那么可能每个进程每次运行的时间都很短，这浪费了大量的时间进行调度。所以引入了调度最小粒度的感念。除非进程进行了阻塞任务或者主动让出CPU，否则进程至少执行调度最小粒度的时间。调度最小粒度称为sysctl_sche_min_granularity，记录在/proc/sys/sched_min_granulariry_ns中，以纳秒为单位。

[ sched_nr_latency=\frac{sysctl_sched_latency}{sysctl_sched_min_granularity} ]

这个比值是一个调度延迟内允许的最大运行数目。如果可运行进程个数小于 sched_nr_latency，调度周期等于调度延迟。如果可运行进程超过了sched_nr_latency，系统就不去理会调度延迟，转而保证调度最小粒度，这种情况下，调度周期等于最小粒度乘可运行进程个数。这在kernel/sched/fair.c中计算调度周期的函数可以看出来。

/*
 * The idea is to set a period in which each task runs once.
 *
 * When there are too many tasks (sched_nr_latency) we have to stretch
 * this period because otherwise the slices get too small.
 *
 * p = (nr <= nl) ? l : l*nr/nl
 */
static u64 __sched_period(unsigned long nr_running)
{
        if (unlikely(nr_running > sched_nr_latency))
                return nr_running * sysctl_sched_min_granularity;
        else
                return sysctl_sched_latency;
}

3 进程权重

通过赋予进程权重weight，就可以计算出每个进程的运行时间： [ runtime=period \frac{weight}{sum of weight} ]

Linux下每个进程都有一个nice值，取值范围是[-20,19]，nice值越高，表示越友好，就越谦让，优先级越底。因为内核不能进行浮点运算，在kernel/sched/core.c定义了预先计算出的nice值和weight的对应关系。这样的对应关系遵从的公式已经在注释中给出了。

/*
 * Nice levels are multiplicative, with a gentle 10% change for every
 * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
 * nice 1, it will get ~10% less CPU time than another CPU-bound task
 * that remained on nice 0.
 *
 * The "10% effect" is relative and cumulative: from _any_ nice level,
 * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
 * it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
 * If a task goes up by ~10% and another task goes down by ~10% then
 * the relative distance between them is ~25%.)
 */
const int sched_prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

/*
 * Inverse (2^32/x) values of the sched_prio_to_weight[] array, precalculated.
 *
 * In cases where the weight does not change often, we can use the
 * precalculated inverse to speed up arithmetics by turning divisions
 * into multiplications:
 */
const u32 sched_prio_to_wmult[40] = {
 /* -20 */     48388,     59856,     76040,     92818,    118348,
 /* -15 */    147320,    184698,    229616,    287308,    360437,
 /* -10 */    449829,    563644,    704093,    875809,   1099582,
 /*  -5 */   1376151,   1717300,   2157191,   2708050,   3363326,
 /*   0 */   4194304,   5237765,   6557202,   8165337,  10153587,
 /*   5 */  12820798,  15790321,  19976592,  24970740,  31350126,
 /*  10 */  39045157,  49367440,  61356676,  76695844,  95443717,
 /*  15 */ 119304647, 148102320, 186737708, 238609294, 286331153,
};

内核资料直通车：Linux内核源码技术学习路线+视频教程代码资料

学习直通车：Linux内核源码/内存调优/文件系统/进程管理/设备驱动/网络协议栈

4 时间片和虚拟运行时间

在Linux中，每个CPU都拥有一个运行队列，如果队列中存在多个可执行状态的进程，如何选择哪个进程获得CPU呢？

完全公平调度的思想是尽可能使所有进程获得相同的运行时间。每次总是选取队列中已经运行时间最小的进程进行调度。由于引入了优先级的概念，Linux使用加权运行时间作标准。这个加权运行时间称为虚拟运行时间(vruntime)，而真实的运行时间称为 sum_exec_runtime。

[ vruntime = sum_exec_runtime\times \frac{NICE_0_LOAD}{weigh} ]

NICE_0_LOAD的值是nice值为0的进程权重，即1024。每次调度时总是选取vruntime最小的进程进行调度。

include/linux/sched.h中定义了调度实体，里面涉及了组调度的内容，这在后面会提到。

struct sched_entity {
        struct load_weight      load;           /* for load-balancing */
        struct rb_node          run_node;
        struct list_head        group_node;
        unsigned int            on_rq;

        u64                     exec_start;
        u64                     sum_exec_runtime;
        u64                     vruntime;
        u64                     prev_sum_exec_runtime;

        u64                     nr_migrations

最低0.47元/天解锁文章

Linux加油站

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
Linux 中进程的调度算法

在迁移后，再加上迁移后的CPU的运行队列中最小虚拟运行时间。注意到sysctl_sched_child_runs_first那一行，可以指定 /proc/sys/kernel/sched_child_runs_first为1使子进程优先获得调度，如果是0，则父进程优先获得调度。如果希望实时进程存在的情况下一般进程也可以消耗少量CPU时间，而不是等待实时进程全部结束后才能执行，可以修改两个控制项：kernel.sched_rt_period_us和 kernel.sched_rt_runtime_us。
复制链接

扫一扫