Linux Process/Thread Scheduling: A Detailed Look at the Scheduling Priority Rules for Processes and Threads in Linux

泡沫o0

Last modified 2023-03-31 23:13:35

First published 2023-02-24 18:09:15

Article link: https://blog.csdn.net/qq_21438461/article/details/129204639

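Contents

How the scheduler works
Querying the resource usage of the current process or thread
Scheduling priority basics
Three Linux scheduling policies
Linux thread priorities
Linux process/thread priority APIs
Code examples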

How the scheduler works

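Both processes and threads are scheduled by the kernel. The operating system kernel contains a scheduler that manages and schedules all processes and threads. Following a particular scheduling policy and set of priority rules, the scheduler decides which process or thread gets the CPU, which is what makes multitasking and concurrent execution possible.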

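Scheduling proceeds in the following steps:

Selection: the scheduler picks a candidate from the list of runnable processes and threads as the next unit of execution.
Context switch: the scheduler saves the state of the currently running process or thread (registers, program counter, and so on) into kernel data structures, then loads the chosen candidate's state onto the CPU.
Execution: the chosen process or thread starts running. It executes instructions in user mode until some event (its time slice expiring, a system call, an interrupt, etc.) requires the scheduler to run again.
Rescheduling: when a process or thread cannot continue (for example, because it has to wait for an I/O operation to complete), the scheduler again performs selection and a context switch, putting another process or thread on the CPU.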

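Throughout this process, a thread switches between user mode and kernel mode. When a thread uses up its time slice or hits an event that needs the kernel (a system call, interrupt handling, etc.), it enters kernel mode, where the scheduler can reassign the CPU. Because the scheduler runs in kernel mode, it retains control over system resources.
On multi-core systems the scheduler can run different processes or threads on different cores at the same time, giving true parallel execution. In short, processes and threads are both scheduled by the kernel, whose scheduler manages and allocates CPU time to achieve efficient multitasking and concurrency.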

Querying the resource usage of the current process or thread

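On Linux, the getrusage() system call reports the resource usage of the current process or thread, including the CPU time it has consumed so far. In addition, the scheduling statistics in the /proc//sched file can give indirect insight into how a process is being scheduled.
getrusage() fills in an rusage structure containing various resource usage figures. The structure is defined as follows (abridged):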
struct rusage {
   struct timeval ru_utime; /* CPU time spent in user mode */
   struct timeval ru_stime; /* CPU time spent in kernel mode */
   // ... other resource usage fields
};
The prototype of getrusage() is:
int getrusage(int who, struct rusage *usage);
The who argument can be one of the following:

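RUSAGE_SELF: usage of the calling process (summed over all of its threads).
RUSAGE_CHILDREN: usage of all children of the calling process that have terminated and been waited for.
RUSAGE_THREAD (Linux-specific): usage of the calling thread.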

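Adding ru_utime and ru_stime gives the total CPU time. Note that this total is cumulative since the process (or thread) started, not the length of the current scheduling slice; to measure the CPU time of a particular piece of work, take the difference between two getrusage() calls.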

Scheduling priority basics

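Priority is the order in which tasks are given the CPU.
A process with a higher priority has the right to run first.
Separately from priority, a process can be bound to specific CPUs (CPU affinity), which can improve overall system performance.
When you look at processes with top or ps -l you will see two fields, PR (PRI) and NI: NI is the nice value, a user-space concept, while PR is the process's actual priority as used by the kernel.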

Three Linux scheduling policies

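SCHED_OTHER: time-sharing scheduling policy
This is Linux's default time-sharing scheduling policy and the default policy for Linux threads. Threads under SCHED_OTHER always have a static priority (sched_priority) of 0; among them the scheduler chooses based on a dynamic priority that is derived from the nice value (the nice value can be set with nice(), setpriority() or sched_setattr()). The dynamic priority changes as the thread runs or waits, so that all SCHED_OTHER threads get a fair share of the CPU.
SCHED_FIFO: real-time scheduling policy, first come, first served.
Processes are scheduled by priority. Once a process gets the CPU it keeps running until it voluntarily gives it up or is preempted by a higher-priority process.
SCHED_RR: real-time scheduling policy, round-robin time slicing.
This adds time slices on top of SCHED_FIFO. After a process has held the CPU for one time slice, the scheduler moves it to the tail of the run queue for its priority level and picks another process of the same priority to run.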

Linux thread priorities

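Process type	Scheduling policy	Priority	Notes
Normal process	`SCHED_OTHER` or `SCHED_NORMAL`	100-139	Priorities in this range are also called static priorities. They do not change over time and the kernel does not modify them; they can only be changed through system calls such as nice(). The larger the static priority value, the lower the process's priority and the smaller its base time quantum. While real-time tasks are busy, normal processes get almost no CPU time (by default the kernel's real-time throttling leaves them only about 5% of CPU time). static priority = nice + 20 + MAX_RT_PRIO, with nice in [-20, 19] and MAX_RT_PRIO defaulting to 100; the benefit is that every real-time task ranks above every normal task.
Real-time process	`SCHED_FIFO` or `SCHED_RR`	0-99	A real-time process is replaced by another process only when one of the following occurs: 1. it is preempted by a process with a higher real-time priority; 2. it performs a blocking operation and goes to sleep; 3. it is stopped (in the TASK_STOPPED or TASK_TRACED state) or killed; 4. it voluntarily gives up the CPU by calling sched_yield(); 5. it is a round-robin real-time process (SCHED_RR) and has used up its time slice.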

[Figure: kernel priority scale, 0-99 for real-time tasks and 100-139 for normal tasks]

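The figure shows priorities as the kernel sees them, split into two ranges.
Values 0-99 are for real-time tasks and 100-139 for normal tasks.
A lower value means a higher priority.
That is priority from the kernel's point of view.

When we create a thread in an application, we set a priority value; that is priority from the application's point of view.
The kernel does not use this value directly, however: it applies a calculation to obtain the priority value it uses internally (0 ~ 139).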

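For real-time tasks

When creating a thread, the priority value can be set as follows (for SCHED_FIFO and SCHED_RR the valid range on Linux is 1 ~ 99; 0 is rejected):
struct sched_param param;
param.sched_priority = xxx;
When the thread-creation call reaches the kernel, the kernel computes the actual priority value with this formula:
kernel priority = 100 - 1 - param.sched_priority
If the application passes 1 (the minimum), the kernel priority is 98 (100 - 1 - 1 = 98), the lowest among real-time tasks. (A value of 0 would map to 99, but the kernel does not accept 0 for a real-time policy.)
If the application passes 99, the kernel priority is 0 (100 - 1 - 99 = 0), the highest among real-time tasks.
So from the application's point of view, a larger value means a higher thread priority and a smaller value a lower one.
This is exactly the opposite of the kernel's view!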

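For normal tasks

The priority of a normal task is adjusted through its nice value. The kernel likewise has a formula that converts the nice value passed in by the application into the kernel's priority value:
kernel priority = 100 + 20 + nice
Valid nice values are -20 ~ 19.
If the application sets a thread's nice value to -20, the kernel priority is 100 (100 + 20 + (-20) = 100), the highest among normal tasks.
If the application sets a thread's nice value to 19, the kernel priority is 139 (100 + 20 + 19 = 139), the lowest among normal tasks.
So from the application's point of view, a smaller value means a higher thread priority and a larger value a lower one.
This matches the kernel's view exactly!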

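PR and NI in top

	How top computes PR	Notes
Normal process	top_pri = static_priority - 100	static_priority ranges over [100,139], so top_pri ranges over [0,39]
Real-time process	top_pri = -1 - real_time_priority	The chrt command changes the priority of a real-time process. For example, chrt -p 93 12345 gives process 12345 priority 93, and top then shows its PR as -94. For some real-time processes top shows rt instead of a number.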

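Multi-core scheduling

On a multi-core processor, scheduling policy and priority have no effect between threads that are spread across different cores! (More precisely, a thread's scheduling policy and priority take effect on the CPU where that thread is running.)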

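Linux process/thread priority APIs

Setting Linux process priority

Summary of the sched interfaces

Linux provides the following system calls for controlling the CPU scheduling behavior, policy and priority of processes (or, more precisely, threads).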

   nice(2)
          Set a new nice value for the calling thread, and return
          the new nice value.

   getpriority(2)
          Return the nice value of a thread, a process group, or the
          set of threads owned by a specified user.

   setpriority(2)
          Set the nice value of a thread, a process group, or the
          set of threads owned by a specified user.

   sched_setscheduler(2)
          Set the scheduling policy and parameters of a specified
          thread.

   sched_getscheduler(2)
          Return the scheduling policy of a specified thread.

   sched_setparam(2)
          Set the scheduling parameters of a specified thread.

   sched_getparam(2)
          Fetch the scheduling parameters of a specified thread.

   sched_get_priority_max(2)
          Return the maximum priority available in a specified
          scheduling policy.

   sched_get_priority_min(2)
          Return the minimum priority available in a specified
          scheduling policy.

   sched_rr_get_interval(2)
          Fetch the quantum used for threads that are scheduled
          under the "round-robin" scheduling policy.

   sched_yield(2)
          Cause the caller to relinquish the CPU, so that some other
          thread be executed.

   sched_setaffinity(2)
          (Linux-specific) Set the CPU affinity of a specified
          thread.

   sched_getaffinity(2)
          (Linux-specific) Get the CPU affinity of a specified
          thread.

   sched_setattr(2)
          Set the scheduling policy and parameters of a specified
          thread.  This (Linux-specific) system call provides a
          superset of the functionality of sched_setscheduler(2) and
          sched_setparam(2).

   sched_getattr(2)
          Fetch the scheduling policy and parameters of a specified
          thread.  This (Linux-specific) system call provides a
          superset of the functionality of sched_getscheduler(2) and
          sched_getparam(2).

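Getting the static priority range

int sched_get_priority_max(int policy);

int sched_get_priority_min(int policy);
sched_get_priority_max() returns the maximum priority value that can be used with the scheduling algorithm identified by policy.
sched_get_priority_min() returns the minimum priority value that can be used with the scheduling algorithm identified by policy.
Supported policy values include SCHED_FIFO, SCHED_RR and SCHED_OTHER.
Return value
On success, sched_get_priority_max() and sched_get_priority_min() return the maximum/minimum priority value for the specified scheduling policy. On error, they return -1 and set errno appropriately.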

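Setting and getting the scheduling policy/parameters

int sched_setscheduler(pid_t pid, int policy, const struct sched_param *param);

int sched_getscheduler(pid_t pid);
Parameters
pid:
The sched_setscheduler() system call sets both the scheduling policy and the parameters of the thread whose ID is given in pid. If pid is zero, the scheduling policy and parameters of the calling thread are set.
policy:
Currently, Linux supports the following "normal" (i.e., non-real-time) scheduling policies as values for policy:

SCHED_OTHER: the standard round-robin time-sharing policy;
SCHED_BATCH: for "batch" style execution of processes; and
SCHED_IDLE: for running very low priority background jobs.

Various real-time policies are also supported, for special time-critical applications that need precise control over the way in which runnable threads are selected for execution.
For the rules governing when a process may use these policies, see sched(7). The real-time policies that may be specified in policy are:

SCHED_FIFO: a first-in, first-out policy; and
SCHED_RR: a round-robin policy.

param:
The scheduling priority is given in the following member of param (it must be 0 for SCHED_OTHER, SCHED_BATCH and SCHED_IDLE, and 1 to 99 for SCHED_FIFO and SCHED_RR on Linux):
struct sched_param {
  ...
  int sched_priority;
  ...
};
Return value
On success, sched_setscheduler() returns zero, and sched_getscheduler() returns the policy of the thread (a nonnegative integer).
On error, both calls return -1 and set errno to indicate the error.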

Setting Linux thread priority

Refer to the usage of the thread functions.

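Code examples

Querying the CPU time of the current process with getrusage().

// The time shown by this example is the cumulative CPU time from process start up to the getrusage() call, not the time of a single scheduling slice.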
#include <sys/resource.h>
#include <sys/time.h>
#include <stdio.h>

int main() {
    struct rusage usage;
    getrusage(RUSAGE_SELF, &usage);

    double utime = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1000000.0;
    double stime = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1000000.0;
    double total_time = utime + stime;

    printf("User time: %f\n", utime);
    printf("System time: %f\n", stime);
    printf("Total time: %f\n", total_time);

    return 0;
}

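Making the process real-time

#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <sched.h>

int main(void)
{
    struct sched_param param;
    param.sched_priority = sched_get_priority_max(SCHED_FIFO);   // SCHED_RR can be used as well

    // Make the calling thread real-time (pid 0 = caller; Linux applies this per thread).
    // Fails with EPERM unless the process has CAP_SYS_NICE or a large enough RLIMIT_RTPRIO.
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
        perror("sched_setscheduler");

    // pthread equivalent for the current thread; returns an error number instead of setting errno.
    int rs = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    if (rs != 0)
        fprintf(stderr, "pthread_setschedparam: %s\n", strerror(rs));
    return 0;
}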

Priority test code

#include <stdio.h>
#include <pthread.h>
#include <sched.h>
#include <assert.h>
 
static int get_thread_policy(pthread_attr_t *attr)
{
	int policy;
	int rs = pthread_attr_getschedpolicy(attr,&policy);
	assert(rs==0);
 
	switch(policy)
	{
	case SCHED_FIFO:
		printf("policy=SCHED_FIFO\n");
		break;

	case SCHED_RR:
		printf("policy=SCHED_RR\n");
		break;

	case SCHED_OTHER:
		printf("policy=SCHED_OTHER\n");
		break;

	default:
		printf("policy=UNKNOWN\n");
		break;
	}
	return policy;
}
 
static void show_thread_priority(pthread_attr_t *attr,int policy)
{
	int priority = sched_get_priority_max(policy);
	assert(priority != -1);
	printf("max_priority=%d\n",priority);
	 
	priority= sched_get_priority_min(policy);
	assert(priority != -1);
	printf("min_priority=%d\n",priority);
}
 
static int get_thread_priority(pthread_attr_t *attr)
{
	struct sched_param param;
	int rs = pthread_attr_getschedparam(attr,&param);
	assert(rs == 0);
	 
	printf("priority=%d\n",param.sched_priority);
	return param.sched_priority;
}
 
static void set_thread_policy(pthread_attr_t *attr,int policy)
{
	int rs = pthread_attr_setschedpolicy(attr,policy);
	assert(rs==0);
}
 
int main(void)
{
	pthread_attr_t attr;
	int rs;
	 
	rs = pthread_attr_init(&attr);
	assert(rs==0);
	 
	int policy = get_thread_policy(&attr);
	 
	printf("Show current configuration of priority\n");
	get_thread_policy(&attr);
	show_thread_priority(&attr,policy);
	 
	printf("show SCHED_FIFO of priority\n");
	show_thread_priority(&attr,SCHED_FIFO);
	 
	printf("show SCHED_RR of priority\n");
	show_thread_priority(&attr,SCHED_RR);
	 
	printf("show priority of current thread\n");
	get_thread_priority(&attr);
	 
	printf("Set thread policy\n");
	 
	printf("set SCHED_FIFO policy\n");
	set_thread_policy(&attr,SCHED_FIFO);
	get_thread_policy(&attr);
	get_thread_priority(&attr);
	 
	printf("set SCHED_RR policy\n");
	set_thread_policy(&attr,SCHED_RR);
	get_thread_policy(&attr);
	 
	printf("Restore current policy\n");
	set_thread_policy(&attr,policy);
	get_thread_priority(&attr);
	 
	rs = pthread_attr_destroy(&attr);
	assert(rs==0);
	 
	return 0;
}