Observing Priority Inversion and Priority Inheritance by Example

王燕龙笔记

Last modified: 2024-03-26 10:56:18

First published: 2024-02-05 16:44:25

Article link: https://blog.csdn.net/weixin_38184628/article/details/136033449

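This article explains priority inversion and priority inheritance in the Linux kernel, both without and with the real-time kernel patch, and uses sample code to show how the two phenomena appear when threads contend for spinlocks and mutexes.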

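1 Background

When two threads of different priority are both on the run queue, the expected scheduling order is that the higher-priority thread runs first and the lower-priority thread runs afterwards.

Priority inversion means that, with two such threads present, the high-priority thread is not scheduled while a lower-priority thread gets to run. This is the reverse of the expected order, hence the name.

Priority inversion and priority inheritance mostly come up in discussions of the Linux real-time patch (PREEMPT_RT). In a kernel without the real-time patch, preemption is disabled while a spinlock is held. A spinlock busy-waits for the lock and keeps occupying the CPU, so it is meant for very short critical sections. There is, however, no strict rule on how short: microseconds, milliseconds, or seconds, so nothing guarantees the order of magnitude of the time spent under a spinlock. Spinlocks are also used very widely in the kernel, both in core Linux infrastructure and in drivers. All of this makes spinlock hold times unpredictable. Because kernel preemption is disabled while the lock is held, a higher-priority thread that is woken during that time cannot preempt the current task, which makes scheduling unpredictable. In a kernel with the real-time patch, a task holding a spinlock can be preempted, and this is what introduces the priority inversion problem discussed in this article.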

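The following describes priority inversion.

There are three threads with priorities low, mid, and high. The low and high threads contend for the same lock.

① At t1, the low thread starts running and calls spin_lock() to acquire the lock.

② At t2, the mid thread is woken. Because mid has a higher priority than low, it preempts low and starts executing on the CPU.

③ At t3, the high thread is woken and also calls spin_lock(). The lock is held by the low thread, so the high thread can only sleep and wait for low to release it.

At this point the mid thread still has the CPU. The low thread, which holds the lock, cannot run and therefore cannot release the lock, so the high thread, which most deserves to run, cannot run either.

④ At t4, the mid thread finishes and the low thread can continue.

⑤ At t5, the low thread finishes and releases the lock. The high thread acquires the lock and can now run.

⑥ At t6, the high thread finishes its critical section and releases the spinlock.

In the figure below, the arrow is the time axis and the red regions mark when a thread is running. It shows that the high thread should have obtained the CPU soon after being woken, but because of the spinlock it has to wait until both mid and low have finished. This is priority inversion.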

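The following describes priority inheritance.

At t3, the high thread is woken and tries to take the spinlock. The lock is held by the low thread, but low cannot run because the mid thread occupies the CPU. What the Linux kernel does at t3 is raise the priority of the low thread to that of the high thread. The low thread can then preempt mid, finish its critical section quickly, and release the spinlock, which lets the high thread run. This is priority inheritance.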
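The two timelines can be compared with a small tick-based model of one CPU. This is only a sketch for illustration, not kernel code: the release ticks (0, 1, 2) and CPU demands (4, 5, 2 ticks) are assumed values, and the function name `high_finish_tick` is hypothetical. The priorities 5, 10, and 15 match the ones used later in this article.

```c
#include <assert.h>
#include <stdbool.h>

enum { LOW, MID, HIGH, NTASKS };

struct task {
    int prio;        /* base priority; larger is more urgent */
    int release;     /* tick at which the task is woken */
    int remaining;   /* ticks of CPU time still needed */
    bool needs_lock; /* whole body runs under the shared lock */
};

/* Simulate one CPU; return the tick at which the high task finishes, or -1. */
int high_finish_tick(bool inherit)
{
    /* Assumed workload, chosen only to make the effect visible. */
    struct task t[NTASKS] = {
        [LOW]  = { .prio = 5,  .release = 0, .remaining = 4, .needs_lock = true  },
        [MID]  = { .prio = 10, .release = 1, .remaining = 5, .needs_lock = false },
        [HIGH] = { .prio = 15, .release = 2, .remaining = 2, .needs_lock = true  },
    };
    int owner = -1; /* index of the lock holder, -1 if the lock is free */

    for (int now = 0; now < 100; now++) {
        int eff[NTASKS];
        int run = -1;

        /* Effective priority: base priority, with the lock owner boosted
         * to the highest priority among blocked waiters if inheritance is on. */
        for (int i = 0; i < NTASKS; i++)
            eff[i] = t[i].prio;
        if (inherit && owner >= 0) {
            for (int i = 0; i < NTASKS; i++) {
                bool waiting = i != owner && t[i].needs_lock &&
                               t[i].release <= now && t[i].remaining > 0;
                if (waiting && t[i].prio > eff[owner])
                    eff[owner] = t[i].prio;
            }
        }

        /* Pick the highest-priority task that is woken, unfinished, not blocked. */
        for (int i = 0; i < NTASKS; i++) {
            bool blocked = t[i].needs_lock && owner >= 0 && owner != i;
            if (t[i].release > now || t[i].remaining == 0 || blocked)
                continue;
            if (run < 0 || eff[i] > eff[run])
                run = i;
        }
        if (run < 0)
            continue; /* CPU idle this tick */

        if (t[run].needs_lock && owner < 0)
            owner = run; /* take the free lock */
        assert(t[run].remaining > 0);
        if (--t[run].remaining == 0) {
            if (owner == run)
                owner = -1; /* release the lock on completion */
            if (run == HIGH)
                return now + 1;
        }
    }
    return -1;
}
```

With these assumed numbers, `high_finish_tick(false)` returns 11 because the high task waits for both mid and low, while `high_finish_tick(true)` returns 7 because the boosted low task preempts mid as soon as high blocks on the lock.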

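2 Priority inversion and priority inheritance with kernel threads

2.1 Priority inversion

Priority inversion can be observed on a system without the real-time kernel patch.

The following kernel module creates three threads, all using the SCHED_FIFO policy and all bound to CPU 1, with priorities 5 (low), 10 (mid), and 15 (high). The low thread calls mutex_lock() and then spins in a loop; the mid thread only spins in a loop; the high thread calls mutex_lock(). The low thread is started first, then the mid thread, and finally the high thread.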

#include <linux/delay.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/mutex.h>

struct sched_attr {
        __u32 size;

        __u32 sched_policy;
        __u64 sched_flags;

        /* SCHED_NORMAL, SCHED_BATCH */
        __s32 sched_nice;

        /* SCHED_FIFO, SCHED_RR */
        __u32 sched_priority;

        /* SCHED_DEADLINE */
        __u64 sched_runtime;
        __u64 sched_deadline;
        __u64 sched_period;

        /* Utilization hints */
        __u32 sched_util_min;
        __u32 sched_util_max;

};

struct set_sched_attr_func  {
  int (*sched_setattr_nocheck)(struct task_struct *, const struct sched_attr *);
};

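// On some systems a module cannot call the kernel function
// sched_setattr_nocheck directly. Its address can be looked up in
// /proc/kallsyms and called through a function pointer as below.
// The address is specific to the running kernel (it differs between
// builds and, with KASLR, between boots), so replace it with your own.
// This approach is only suitable for test environments
// and is not recommended in production.
struct set_sched_attr_func sched_func = {
  .sched_setattr_nocheck = (void *)0xffffd515464e9d20UL
};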

struct mutex test_mutex;
static struct task_struct *init_thread;

static struct task_struct *thread_low;
static struct task_struct *thread_mid;
static struct task_struct *thread_high;

static volatile int exit_flag = 0;

static int thread_low_entry(void *data) {
  struct task_struct *task = current;
  printk("thread low start, tid: %d\n", task->pid);
  mutex_lock(&test_mutex);
  while (1) {
    if (exit_flag) {
      break;
    }
  }
  mutex_unlock(&test_mutex);
  printk("thread low return\n");
  return 0;
}

static int thread_mid_entry(void *data) {
  struct task_struct *task = current;
  printk("thread mid start, tid: %d\n", task->pid);
  while (1) {
    if (exit_flag) {
      break;
    }
  }
  printk("thread mid return\n");
  return 0;
}

static int thread_high_entry(void *data) {
  struct task_struct *task = current;
  printk("thread high start, tid: %d\n", task->pid);
  mutex_lock(&test_mutex);
  while (1) {
    if (exit_flag) {
      break;
    }
  }
  mutex_unlock(&test_mutex);  /* release the lock taken above before returning */
  printk("thread high return\n");
  return 0;
}

static int init_thread_entry(void *data) {
  printk("init thread start\n");

  mutex_init(&test_mutex);

  thread_low = kthread_create(thread_low_entry, NULL, "thread_low");
  if (IS_ERR(thread_low)) {
    printk("failed to create thread low\n");
    return -1;
  }
  struct sched_attr attr_low;
  memset(&attr_low, 0, sizeof(struct sched_attr));
  attr_low.sched_policy = SCHED_FIFO;
  attr_low.sched_priority = 5;
  sched_func.sched_setattr_nocheck(thread_low, &attr_low);

  kthread_bind(thread_low, 1);
  wake_up_process(thread_low);
  ssleep(2);

  thread_mid = kthread_create(thread_mid_entry, NULL, "thread_mid");
  if (IS_ERR(thread_mid)) {
    printk("failed to create thread mid\n");
    return -1;
  }
  struct sched_attr attr_mid;
  memset(&attr_mid, 0, sizeof(struct sched_attr));
  attr_mid.sched_policy = SCHED_FIFO;
  attr_mid.sched_priority = 10;
  sched_func.sched_setattr_nocheck(thread_mid, &attr_mid);

  kthread_bind(thread_mid, 1);
  wake_up_process(thread_mid);
  ssleep(30);

  thread_high = kthread_create(thread_high_entry, NULL, "thread_high");
  if (IS_ERR(thread_high)) {
    printk("failed to create thread high\n");
    return -1;
  }
  struct sched_attr attr_high;
  memset(&attr_high, 0, sizeof(struct sched_attr));
  attr_high.sched_policy = SCHED_FIFO;
  attr_high.sched_priority = 15;
  sched_func.sched_setattr_nocheck(thread_high, &attr_high);

  kthread_bind(thread_high, 1);
  wake_up_process(thread_high);

  return 0;
}

static int pi_init(void) {
  printk("pi init\n");
  init_thread = kthread_create(init_thread_entry, NULL, "init_thread");
  if (IS_ERR(init_thread)) {
    printk("failed to create init thread\n");
    return -1;
  }
  wake_up_process(init_thread);

  printk("pi inited\n");
  return 0;
}

static void pi_exit(void) {
  printk("pi exit\n");
  exit_flag = 1;
  ssleep(10);
  printk("pi exited\n");
}

module_init(pi_init);
module_exit(pi_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("wyl");
MODULE_DESCRIPTION("watch pi in rt kernel");
MODULE_VERSION("0.1");

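The screenshot below shows that the mid thread occupies the CPU the whole time, with CPU usage close to 100%, while neither the high thread nor the low thread gets to run; both show 0% CPU usage.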
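The listing above hard-codes the address of sched_setattr_nocheck as found in /proc/kallsyms. As a sketch of how that lookup could be scripted from user space, the helpers below parse lines of the form `<hex address> <type> <symbol>`. The function names `kallsyms_match` and `kallsyms_lookup_file` are hypothetical. Note that unprivileged readers typically see all-zero addresses in /proc/kallsyms, in which case this sketch reports the symbol as not found.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one kallsyms-style line ("<hex address> <type> <symbol> ...").
 * Returns the address if the symbol name matches exactly, 0 otherwise.
 */
unsigned long long kallsyms_match(const char *line, const char *symbol)
{
    unsigned long long addr;
    char type;
    char name[256];

    assert(line != NULL && symbol != NULL);
    if (sscanf(line, "%llx %c %255s", &addr, &type, name) != 3)
        return 0;
    return strcmp(name, symbol) == 0 ? addr : 0;
}

/*
 * Scan a kallsyms-style file for a symbol.
 * Returns 0 if the file cannot be read or the symbol is not found.
 */
unsigned long long kallsyms_lookup_file(const char *path, const char *symbol)
{
    char line[512];
    unsigned long long addr = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return 0;
    while (addr == 0 && fgets(line, sizeof(line), fp) != NULL)
        addr = kallsyms_match(line, symbol);
    fclose(fp);
    return addr;
}
```

Run as root, `kallsyms_lookup_file("/proc/kallsyms", "sched_setattr_nocheck")` would give the value to paste into the module for the currently running kernel. The value has to be looked up again after every reboot or kernel change.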

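2.2 Priority inheritance

Priority inheritance can be observed on a system with the real-time kernel patch.

The following kernel module creates three threads, all using the SCHED_FIFO policy and all bound to CPU 1, with priorities 5 (low), 10 (mid), and 15 (high). The low thread calls spin_lock() and then spins in a loop; the mid thread only spins in a loop; the high thread calls spin_lock(). The low thread is started first, then the mid thread, and finally the high thread.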

#include <linux/delay.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/spinlock.h>

struct sched_attr {
        __u32 size;

        __u32 sched_policy;
        __u64 sched_flags;

        /* SCHED_NORMAL, SCHED_BATCH */
        __s32 sched_nice;

        /* SCHED_FIFO, SCHED_RR */
        __u32 sched_priority;

        /* SCHED_DEADLINE */
        __u64 sched_runtime;
        __u64 sched_deadline;
        __u64 sched_period;

        /* Utilization hints */
        __u32 sched_util_min;
        __u32 sched_util_max;

};

struct set_sched_attr_func  {
  int (*sched_setattr_nocheck)(struct task_struct *, const struct sched_attr *);
};

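// Address of sched_setattr_nocheck taken from /proc/kallsyms on the
// test machine; it must be replaced with the value for your kernel.
struct set_sched_attr_func sched_func = {
  .sched_setattr_nocheck = (void *)0xffffb03161789d20UL
};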

static spinlock_t test_spinlock;

static struct task_struct *init_thread;

static struct task_struct *thread_low;
static struct task_struct *thread_mid;
static struct task_struct *thread_high;

static volatile int exit_flag = 0;

static int thread_low_entry(void *data) {
  struct task_struct *task = current;
  printk("thread low start, tid: %d\n", task->pid);
  spin_lock(&test_spinlock);
  while (1) {
    if (exit_flag) {
      break;
    }
  }
  spin_unlock(&test_spinlock);
  printk("thread low return\n");
  return 0;
}

static int thread_mid_entry(void *data) {
  struct task_struct *task = current;
  printk("thread mid start, tid: %d\n", task->pid);
  while (1) {
    if (exit_flag) {
      break;
    }
  }
  printk("thread mid return\n");
  return 0;
}

static int thread_high_entry(void *data) {
  struct task_struct *task = current;
  printk("thread high start, tid: %d\n", task->pid);
  spin_lock(&test_spinlock);
  while (1) {
    if (exit_flag) {
      break;
    }
  }
  spin_unlock(&test_spinlock);  /* release the lock taken above before returning */
  printk("thread high return\n");
  return 0;
}

static int init_thread_entry(void *data) {
  printk("init thread start\n");

  spin_lock_init(&test_spinlock);

  thread_low = kthread_create(thread_low_entry, NULL, "thread_low");
  if (IS_ERR(thread_low)) {
    printk("failed to create thread low\n");
    return -1;
  }
  struct sched_attr attr_low;
  memset(&attr_low, 0, sizeof(struct sched_attr));
  attr_low.sched_policy = SCHED_FIFO;
  attr_low.sched_priority = 5;
  sched_func.sched_setattr_nocheck(thread_low, &attr_low);

  kthread_bind(thread_low, 1);
  wake_up_process(thread_low);
  ssleep(2);

  thread_mid = kthread_create(thread_mid_entry, NULL, "thread_mid");
  if (IS_ERR(thread_mid)) {
    printk("failed to create thread mid\n");
    return -1;
  }
  struct sched_attr attr_mid;
  memset(&attr_mid, 0, sizeof(struct sched_attr));
  attr_mid.sched_policy = SCHED_FIFO;
  attr_mid.sched_priority = 10;
  sched_func.sched_setattr_nocheck(thread_mid, &attr_mid);

  kthread_bind(thread_mid, 1);
  wake_up_process(thread_mid);
  ssleep(30);

  thread_high = kthread_create(thread_high_entry, NULL, "thread_high");
  if (IS_ERR(thread_high)) {
    printk("failed to create thread high\n");
    return -1;
  }
  struct sched_attr attr_high;
  memset(&attr_high, 0, sizeof(struct sched_attr));
  attr_high.sched_policy = SCHED_FIFO;
  attr_high.sched_priority = 15;
  sched_func.sched_setattr_nocheck(thread_high, &attr_high);

  kthread_bind(thread_high, 1);
  wake_up_process(thread_high);

  return 0;
}

static int pi_init(void) {
  printk("pi init\n");
  init_thread = kthread_create(init_thread_entry, NULL, "init_thread");
  if (IS_ERR(init_thread)) {
    printk("failed to create init thread\n");
    return -1;
  }
  wake_up_process(init_thread);

  printk("pi inited\n");
  return 0;
}

static void pi_exit(void) {
  printk("pi exit\n");
  exit_flag = 1;
  ssleep(10);
  printk("pi exited\n");
}

module_init(pi_init);
module_exit(pi_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("wyl");
MODULE_DESCRIPTION("watch pi in rt kernel");
MODULE_VERSION("0.1");

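Before the high thread starts, the low thread's priority is displayed as -6 and its CPU usage is 0%.

After the high thread starts, the low thread's priority is raised to match the high thread's, changing from -6 to -16. As the figure below shows, the low thread now gets to run, with CPU usage close to 100%.

The high thread's priority is displayed as -16.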
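The displayed values -6 and -16 are consistent with how Linux reports real-time priorities in the priority field of /proc/<pid>/stat, which is the negated scheduling priority minus one. Assuming the screenshots came from a tool that shows this field (for example the PR column of top), the mapping can be sketched as follows; the function name `stat_priority_for_rt` is hypothetical.

```c
#include <assert.h>

/*
 * Priority value reported in /proc/<pid>/stat for a SCHED_FIFO or
 * SCHED_RR task with the given sched_priority (valid range 1..99).
 */
int stat_priority_for_rt(int sched_priority)
{
    assert(sched_priority >= 1 && sched_priority <= 99);
    return -sched_priority - 1;
}
```

For the three threads in this module, sched_priority 5, 10, and 15 map to -6, -11, and -16, so a low thread shown at -16 is running at the high thread's priority.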

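3 Priority inversion and priority inheritance with user-space threads

In user space, a pthread_mutex_t can be created with the PTHREAD_PRIO_INHERIT protocol attribute so that threads using the mutex get priority inheritance. The observed behavior is similar to the kernel-thread case.

In my tests, PTHREAD_PRIO_INHERIT does not require the real-time kernel patch; it also takes effect on an ordinary system.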

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <linux/types.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <pthread.h>

#define BIND_CPU_CORE 2

pthread_mutex_t mutex;
pthread_mutexattr_t mutex_attr;

int set_fifo(int prio) {
  struct sched_param sp = {.sched_priority = prio};
  int policy = SCHED_FIFO;
  return sched_setscheduler(0, policy, &sp);
}

int32_t set_affinity() {
  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  CPU_SET(BIND_CPU_CORE, &cpuset);

  if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) {
    printf("bind cpu error\n");
    return -1;
  }
  return 0;
}

void *fifo_low(void *data) {
  set_fifo(5);
  set_affinity();
  printf("fifo low\n");
  sleep(1);
  printf("fifo low, before lock\n");
  pthread_mutex_lock(&mutex);
  printf("fifo low, after lock\n");
  while (1)
    ;
}

void *fifo_mid(void *data) {
  set_fifo(10);
  set_affinity();
  printf("fifo mid\n");
  sleep(1);
  while (1)
    ;
}

void *fifo_high(void *data) {
  set_fifo(15);
  set_affinity();
  printf("fifo high\n");
  sleep(1);
  printf("fifo high, before lock\n");
  pthread_mutex_lock(&mutex);
  printf("fifo high, after lock\n");
  while (1)
    ;
}

int main() {
  pthread_t fifo_tid1;
  pthread_t fifo_tid2;
  pthread_t fifo_tid3;

  pthread_mutexattr_init(&mutex_attr);
  pthread_mutexattr_setprotocol(&mutex_attr, PTHREAD_PRIO_INHERIT);
  pthread_mutex_init(&mutex, &mutex_attr);
  sleep(5);

  pthread_create(&fifo_tid1, NULL, fifo_low, NULL);
  sleep(5);

  pthread_create(&fifo_tid2, NULL, fifo_mid, NULL);
  sleep(30);

  pthread_create(&fifo_tid3, NULL, fifo_high, NULL);
  sleep(1000);

  return 0;
}
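The program above ignores the return values of the pthread_mutexattr_* calls, so a failure to enable priority inheritance would go unnoticed. A minimal sketch of a checked initialization is shown below. The helper name `pi_mutex_init` is hypothetical, and the read-back with pthread_mutexattr_getprotocol() is an extra sanity check rather than something the original program does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Initialize *m as a priority-inheritance mutex.
 * Returns 0 on success, or the error number of the first failing call.
 */
int pi_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int protocol = PTHREAD_PRIO_NONE;
    int err;

    assert(m != NULL);
    err = pthread_mutexattr_init(&attr);
    if (err != 0)
        return err;

    /* Request priority inheritance, then read the setting back. */
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (err == 0)
        err = pthread_mutexattr_getprotocol(&attr, &protocol);
    if (err == 0 && protocol != PTHREAD_PRIO_INHERIT)
        err = EINVAL;

    /* Only create the mutex if the attribute was accepted. */
    if (err == 0)
        err = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return err;
}
```

In the test program, main() could call `pi_mutex_init(&mutex)` in place of the three separate calls and exit if the result is nonzero. To compare with the priority inversion case, the same experiment can be repeated with a mutex created using default attributes.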
