SMP Scheduling

On multiprocessor systems, the kernel has to take several additional issues into account:
- The CPU load must be shared as evenly as possible across all processors.
- The affinity of a process to certain processors in the system must be configurable: set the cpus_allowed member of task_struct.
- The kernel must be able to migrate a process from one CPU to another. Migration destroys cache locality and can hurt performance badly, so it must be used with care.
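The user-space counterpart of cpus_allowed is the sched_setaffinity(2) system call. The sketch below (helper names are invented for illustration) pins the calling process to a single CPU and reads the mask back; on the kernel side this updates task_struct->cpus_allowed.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process to a single CPU. Returns 0 on success. */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);             /* allow exactly one CPU */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return 1 if the current affinity mask contains exactly `cpu`. */
static int runs_only_on(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_ISSET(cpu, &set) && CPU_COUNT(&set) == 1;
}
```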
Data structure extensions

```c
struct sched_class {
#ifdef CONFIG_SMP
	unsigned long (*load_balance) (struct rq *this_rq, int this_cpu,
			struct rq *busiest, unsigned long max_load_move,
			struct sched_domain *sd, enum cpu_idle_type idle,
			int *all_pinned, int *this_best_prio);

	int (*move_one_task) (struct rq *this_rq, int this_cpu,
			      struct rq *busiest, struct sched_domain *sd,
			      enum cpu_idle_type idle);
#endif
};

struct rq {
#ifdef CONFIG_SMP
	struct sched_domain *sd;

	/* For active balancing */
	int active_balance;
	int push_cpu;
	/* cpu of this runqueue: */
	int cpu;

	struct task_struct *migration_thread;
	struct list_head migration_queue;
#endif
};
```
- load_balance: sets up an iterator over the schedulable entities of the run queue and hands it to balance_tasks (kernel/sched.c), which performs the actual load balancing.
- move_one_task: uses iter_move_one_task to move one process off the busiest run queue.
- Scheduling domains (sched_domain): group together CPUs that are physically adjacent or share a cache; processes are preferentially migrated between CPUs of the same domain.
- migration_thread: the migration thread; it is woken when regular load balancing fails. The active_balance flag is set, the requesting CPU is recorded in rq->push_cpu, and the thread then processes the migration requests stored in migration_queue.
Load balancing: on SMP systems the periodic scheduler (scheduler_tick) invokes trigger_load_balance, which raises the SCHED_SOFTIRQ softirq. The softirq ensures that run_rebalance_domains runs at a suitable time and ultimately calls rebalance_domains, which performs the actual load balancing.
- rebalance_domains(int cpu, enum cpu_idle_type idle)
    - struct rq *rq = cpu_rq(cpu);
    - unsigned long next_balance = jiffies + 60*HZ;
    - for_each_domain(cpu, sd)
        - if (time_after_eq(jiffies, sd->last_balance + interval))
            - load_balance(cpu, rq, sd, idle, &balance)
- load_balance(int this_cpu, struct rq *this_rq, struct sched_domain *sd, enum cpu_idle_type idle, int *balance)
    - group = find_busiest_group(sd, this_cpu, &imbalance, idle, &sd_idle, &cpus, balance)
    - busiest = find_busiest_queue(group, idle, imbalance, &cpus);
    - ld_moved = move_tasks(this_rq, this_cpu, busiest, imbalance, sd, idle, &all_pinned)
- move_tasks(struct rq *this_rq, int this_cpu, struct rq *busiest, unsigned long max_load_move)
    - const struct sched_class *class = sched_class_highest;
    - do
        - total_load_moved += class->load_balance(this_rq, this_cpu, busiest, max_load_move - total_load_moved, sd, idle, all_pinned, &this_best_prio)
    - while (class && max_load_move > total_load_moved)
- load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest, unsigned long max_load_move, struct sched_domain *sd, enum cpu_idle_type idle, int *all_pinned, int *this_best_prio)
    - cfs_rq_iterator.start = load_balance_start_fair
    - cfs_rq_iterator.next = load_balance_next_fair;
    - for_each_leaf_cfs_rq(busiest, busy_cfs_rq)
        - cfs_rq_iterator.arg = busy_cfs_rq
        - rem_load_move -= balance_tasks(this_rq, this_cpu, busiest, maxload, sd, idle, all_pinned, this_best_prio, &cfs_rq_iterator) // kernel/sched.c: uses the iterator to walk the scheduling entities of the queue
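The iterator hand-off from load_balance_fair to balance_tasks can be mimicked in plain C. The following is a hypothetical user-space sketch (all struct and function names are invented, not the kernel's): an iterator with start/next callbacks walks a linked list of "tasks", and a balance loop pulls load through the iterator until it has moved at least max_load_move.

```c
#include <stddef.h>

/* Hypothetical mini-task: a load contribution plus a next pointer. */
struct mini_task {
    unsigned long load;
    struct mini_task *next;
};

/* Iterator in the style of the kernel's rq_iterator: start()/next()
 * callbacks plus an opaque arg identifying the queue being walked. */
struct mini_iterator {
    void *arg;
    struct mini_task *(*start)(void *arg);
    struct mini_task *(*next)(void *arg);
};

struct mini_rq {
    struct mini_task *head;   /* singly linked run list */
    struct mini_task *cursor; /* iteration state */
};

static struct mini_task *mini_start(void *arg)
{
    struct mini_rq *rq = arg;

    rq->cursor = rq->head;
    return rq->cursor;
}

static struct mini_task *mini_next(void *arg)
{
    struct mini_rq *rq = arg;

    if (rq->cursor)
        rq->cursor = rq->cursor->next;
    return rq->cursor;
}

/* Sketch of the balance_tasks() idea: pull tasks via the iterator until
 * the moved load reaches max_load_move; returns the load actually moved. */
static unsigned long mini_balance_tasks(struct mini_iterator *it,
                                        unsigned long max_load_move)
{
    unsigned long moved = 0;
    struct mini_task *p;

    for (p = it->start(it->arg); p; p = it->next(it->arg)) {
        if (moved >= max_load_move)
            break;
        moved += p->load; /* "migrate" p: only its load is accounted here */
    }
    return moved;
}
```

The kernel follows the same shape: load_balance_fair only supplies the callbacks and arg, so balance_tasks stays generic across scheduling classes.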
Migration thread

- migration_thread
    - rq = cpu_rq(cpu)
    - while (!kthread_should_stop())
        - if (rq->active_balance)
            - active_load_balance(rq, cpu)
        - head = &rq->migration_queue
        - req = list_entry(head->next, struct migration_req, list)
        - __migrate_task(req->task, cpu, req->dest_cpu)
- active_load_balance(rq, cpu)
    - move_one_task (class->move_one_task): moves one process off the busiest run queue
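The queue-draining part of migration_thread can be sketched in user space. The code below is a hypothetical mock (the struct layout and all names besides migration_req are invented): pending requests form a singly linked list, and a drain loop pops and applies each one, standing in for __migrate_task.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's migration request: which task
 * should move, and to which CPU. */
struct migration_req {
    int task_id;
    int dest_cpu;
    struct migration_req *next;
};

struct mig_rq {
    struct migration_req *queue; /* head of pending requests */
};

/* Stand-in for __migrate_task(): record the task's new CPU in a table
 * indexed by task id. */
static void do_migrate(int *task_cpu, const struct migration_req *req)
{
    task_cpu[req->task_id] = req->dest_cpu;
}

/* Drain loop, playing the migration thread's role: pop each queued
 * request and carry out the migration; returns the number handled. */
static int drain_migration_queue(struct mig_rq *rq, int *task_cpu)
{
    int handled = 0;

    while (rq->queue) {
        struct migration_req *req = rq->queue;

        rq->queue = req->next;        /* unlink the head of the queue */
        do_migrate(task_cpu, req);    /* __migrate_task(req->task, ...) */
        handled++;
    }
    return handled;
}
```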
Changes to the core scheduler

- When a new process is started with the exec system call, this is a good opportunity for the scheduler to move the process to another CPU:

```c
new_cpu = sched_balance_self(this_cpu, SD_BALANCE_EXEC);
if (new_cpu != this_cpu)
	sched_migrate_task(current, new_cpu);
```
- The scheduling granularity of the completely fair scheduler scales with the number of CPUs: the more processors the system has, the larger the granularity that can be used. sysctl_sched_min_granularity and sysctl_sched_latency are both multiplied by the correction factor (1 + log(nr_cpus)), but the result may not exceed 200 ms.
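The scaling rule above can be sketched as follows. This is a simplified user-space reimplementation, not kernel code; the logarithm is taken as an integer base-2 log, and the 20 ms base value used in the test is an assumption modeled on the 2.6.24-era default of sysctl_sched_latency (values in nanoseconds).

```c
/* Integer base-2 logarithm (position of the highest set bit). */
static unsigned int ilog2_u(unsigned int n)
{
    unsigned int l = 0;

    while (n >>= 1)
        l++;
    return l;
}

/* Scale a granularity sysctl by 1 + log2(nr_cpus), clamped to 200 ms. */
static unsigned long scale_granularity(unsigned long base_ns,
                                       unsigned int nr_cpus)
{
    const unsigned long limit = 200000000UL;  /* 200 ms cap */
    unsigned long scaled = base_ns * (1 + ilog2_u(nr_cpus));

    return scaled > limit ? limit : scaled;
}
```

So on a single CPU the base value is used unchanged, on 4 CPUs it triples, and on very large machines the 200 ms cap takes over.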
Scheduling domains and control groups

- In the discussion so far, the scheduler did not interact with processes directly but with schedulable entities. This makes group scheduling possible: processes are placed into groups, and the scheduler first ensures fairness between these groups and then fairness among all processes within a group.
- The kernel additionally provides control groups, which allow arbitrary collections of processes to be created via the special cgroups filesystem.
Kernel preemption and low-latency efforts

Kernel preemption: a high-priority process may preempt a low-priority one; preemption can of course be disabled when necessary:
- preempt_disable/preempt_enable

The kernel checks the TIF_NEED_RESCHED flag to determine whether some process is waiting for the CPU:

```c
#define preempt_enable() \
do { \
	preempt_enable_no_resched(); \
	barrier(); \
	preempt_check_resched(); \
} while (0)

#define preempt_check_resched() \
do { \
	if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
		preempt_schedule(); \
} while (0)

asmlinkage void __sched preempt_schedule(void)
{
	struct thread_info *ti = current_thread_info();

	/* Do not preempt if preemption is disabled or interrupts are off. */
	if (likely(ti->preempt_count || irqs_disabled()))
		return;

	do {
		add_preempt_count(PREEMPT_ACTIVE);
		schedule();
		sub_preempt_count(PREEMPT_ACTIVE);
	} while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
}
```
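The interplay of preempt_count and the rescheduling check can be modeled in user space. The sketch below is a hypothetical mock (the mock_* names are invented, not kernel APIs): a counter tracks nested disable/enable pairs, and a pending "need resched" flag only triggers a reschedule once the counter drops back to zero.

```c
/* Hypothetical user-space model of preemption counting: rescheduling may
 * only happen when the nesting counter is zero. */
static int mock_preempt_count;
static int mock_need_resched;
static int mock_schedule_calls;

/* Stand-in for invoking the scheduler. */
static void mock_schedule(void)
{
    mock_need_resched = 0;
    mock_schedule_calls++;
}

static void mock_preempt_disable(void)
{
    mock_preempt_count++;
}

static void mock_preempt_enable(void)
{
    mock_preempt_count--;
    /* like preempt_check_resched(): reschedule only if the flag is set
     * and no disable section is still active */
    if (mock_need_resched && mock_preempt_count == 0)
        mock_schedule();
}
```

This mirrors why preempt_enable() must re-check TIF_NEED_RESCHED: a wakeup that arrived inside a disabled section would otherwise be serviced only at the next timer tick.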
After a hardware interrupt request has been handled, the rescheduling flag and the preemption state determine whether the scheduler is invoked.

Low latency: long-running operations in the kernel should check from time to time whether another process has become runnable and, if necessary, initiate a rescheduling (cond_resched).
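The cond_resched pattern can be illustrated with a hypothetical user-space mock (the cr_* names are invented): a lengthy loop offers the CPU at regular checkpoints instead of holding it for the entire run, and the offer only costs a reschedule when someone is actually waiting.

```c
/* Hypothetical model of cond_resched(): yield only if a waiter exists. */
static int cr_need_resched;
static int cr_yields;

/* Returns 1 if a reschedule happened, 0 if the check was a no-op. */
static int cr_cond_resched(void)
{
    if (cr_need_resched) {
        cr_need_resched = 0;
        cr_yields++;      /* stands in for calling the scheduler */
        return 1;
    }
    return 0;
}

/* A lengthy operation that checks for waiters every `stride` iterations,
 * the way long kernel loops sprinkle cond_resched() calls. */
static long cr_long_operation(long iterations, long stride)
{
    long sum = 0, i;

    for (i = 0; i < iterations; i++) {
        sum += i;
        if (i % stride == stride - 1)
            cr_cond_resched();
    }
    return sum;
}
```

The checkpoints bound the latency a runnable process can experience to roughly one stride of work, while keeping the common no-waiter case nearly free.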