SMP Scheduling

On multiprocessor systems, the kernel has to take several additional issues into account:
- The CPU load must be shared as evenly as possible across all processors.
- The affinity of a process to certain processors in the system must be configurable: set the cpus_allowed member of task_struct.
- The kernel must be able to migrate a process from one CPU to another. Migration destroys cache locality and can hurt performance badly, so it must be used with care.
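The user-space counterpart of cpus_allowed is the sched_setaffinity(2) system call. The sketch below (helper names are invented for illustration) pins the calling process to a single CPU and reads the mask back; on the kernel side this updates task_struct->cpus_allowed.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process to a single CPU. Returns 0 on success. */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);             /* allow exactly one CPU */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return 1 if the current affinity mask contains exactly `cpu`. */
static int runs_only_on(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_ISSET(cpu, &set) && CPU_COUNT(&set) == 1;
}
```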
Data structure extensions

```c
struct sched_class {
#ifdef CONFIG_SMP
	unsigned long (*load_balance) (struct rq *this_rq, int this_cpu,
			struct rq *busiest, unsigned long max_load_move,
			struct sched_domain *sd, enum cpu_idle_type idle,
			int *all_pinned, int *this_best_prio);

	int (*move_one_task) (struct rq *this_rq, int this_cpu,
			      struct rq *busiest, struct sched_domain *sd,
			      enum cpu_idle_type idle);
#endif
};

struct rq {
#ifdef CONFIG_SMP
	struct sched_domain *sd;

	/* For active balancing */
	int active_balance;
	int push_cpu;
	/* cpu of this runqueue: */
	int cpu;

	struct task_struct *migration_thread;
	struct list_head migration_queue;
#endif
};
```
- load_balance: sets up an iterator over the schedulable entities of the run queue and hands it to balance_tasks (kernel/sched.c), which performs the actual load balancing.
- move_one_task: uses iter_move_one_task to move one process off the busiest run queue.
- Scheduling domains (sched_domain): group together CPUs that are physically adjacent or share a cache; processes are preferentially migrated between CPUs of the same domain.
- migration_thread: the migration thread; it is woken when regular load balancing fails. The active_balance flag is set, the requesting CPU is recorded in rq->push_cpu, and the thread then processes the migration requests stored in migration_queue.
Load balancing: on SMP systems the periodic scheduler (scheduler_tick) invokes trigger_load_balance, which raises the SCHED_SOFTIRQ softirq. The softirq ensures that run_rebalance_domains runs at a suitable time and ultimately calls rebalance_domains, which performs the actual load balancing.
- rebalance_domains(int cpu, enum cpu_idle_type idle)
    - struct rq *rq = cpu_rq(cpu);
    - unsigned long next_balance = jiffies + 60*HZ;
    - for_each_domain(cpu, sd)
        - if (time_after_eq(jiffies, sd->last_balance + interval))
            - load_balance(cpu, rq, sd, idle, &balance)
- load_balance(int this_cpu, struct rq *this_rq, struct sched_domain *sd, enum cpu_idle_type idle, int *balance)
    - group = find_busiest_group(sd, this_cpu, &imbalance, idle, &sd_idle, &cpus, balance)
    - busiest = find_busiest_queue(group, idle, imbalance, &cpus);
    - ld_moved = move_tasks(this_rq, this_cpu, busiest, imbalance, sd, idle, &all_pinned)
- move_tasks(struct rq *this_rq, int this_cpu, struct rq *busiest, unsigned long max_load_move)
    - const struct sched_class *class = sched_class_highest;
    - do
        - total_load_moved += class->load_balance(this_rq, this_cpu, busiest, max_load_move - total_load_moved, sd, idle, all_pinned, &this_best_prio)
    - while (class && max_load_move > total_load_moved)
- load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest, unsigned long max_load_move, struct sched_domain *sd, enum cpu_idle_type idle, int *all_pinned, int *this_best_prio)
    - cfs_rq_iterator.start = load_balance_start_fair
    - cfs_rq_iterator.next = load_balance_next_fair;
    - for_each_leaf_cfs_rq(busiest, busy_cfs_rq)
        - cfs_rq_iterator.arg = busy_cfs_rq
        - rem_load_move -= balance_tasks(this_rq, this_cpu, busiest, maxload, sd, idle, all_pinned, this_best_prio, &cfs_rq_iterator) // kernel/sched.c: uses the iterator to walk the scheduling entities of the queue
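The iterator hand-off from load_balance_fair to balance_tasks can be mimicked in plain C. The following is a hypothetical user-space sketch (all struct and function names are invented, not the kernel's): an iterator with start/next callbacks walks a linked list of "tasks", and a balance loop pulls load through the iterator until it has moved at least max_load_move.

```c
#include <stddef.h>

/* Hypothetical mini-task: a load contribution plus a next pointer. */
struct mini_task {
    unsigned long load;
    struct mini_task *next;
};

/* Iterator in the style of the kernel's rq_iterator: start()/next()
 * callbacks plus an opaque arg identifying the queue being walked. */
struct mini_iterator {
    void *arg;
    struct mini_task *(*start)(void *arg);
    struct mini_task *(*next)(void *arg);
};

struct mini_rq {
    struct mini_task *head;   /* singly linked run list */
    struct mini_task *cursor; /* iteration state */
};

static struct mini_task *mini_start(void *arg)
{
    struct mini_rq *rq = arg;

    rq->cursor = rq->head;
    return rq->cursor;
}

static struct mini_task *mini_next(void *arg)
{
    struct mini_rq *rq = arg;

    if (rq->cursor)
        rq->cursor = rq->cursor->next;
    return rq->cursor;
}

/* Sketch of the balance_tasks() idea: pull tasks via the iterator until
 * the moved load reaches max_load_move; returns the load actually moved. */
static unsigned long mini_balance_tasks(struct mini_iterator *it,
                                        unsigned long max_load_move)
{
    unsigned long moved = 0;
    struct mini_task *p;

    for (p = it->start(it->arg); p; p = it->next(it->arg)) {
        if (moved >= max_load_move)
            break;
        moved += p->load; /* "migrate" p: only its load is accounted here */
    }
    return moved;
}
```

The kernel follows the same shape: load_balance_fair only supplies the callbacks and arg, so balance_tasks stays generic across scheduling classes.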
Migration thread

- migration_thread
    - rq = cpu_rq(cpu)
    - while (!kthread_should_stop())
        - if (rq->active_balance)
            - active_load_balance(rq, cpu)
        - head = &rq->migration_queue
        - req = list_entry(head->next, struct migration_req, list)
        - __migrate_task(req->task, cpu, req->dest_cpu)
- active_load_balance(rq, cpu)
    - move_one_task (class->move_one_task): moves one process off the busiest run queue
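The queue-draining part of migration_thread can be sketched in user space. The code below is a hypothetical mock (the struct layout and all names besides migration_req are invented): pending requests form a singly linked list, and a drain loop pops and applies each one, standing in for __migrate_task.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's migration request: which task
 * should move, and to which CPU. */
struct migration_req {
    int task_id;
    int dest_cpu;
    struct migration_req *next;
};

struct mig_rq {
    struct migration_req *queue; /* head of pending requests */
};

/* Stand-in for __migrate_task(): record the task's new CPU in a table
 * indexed by task id. */
static void do_migrate(int *task_cpu, const struct migration_req *req)
{
    task_cpu[req->task_id] = req->dest_cpu;
}

/* Drain loop, playing the migration thread's role: pop each queued
 * request and carry out the migration; returns the number handled. */
static int drain_migration_queue(struct mig_rq *rq, int *task_cpu)
{
    int handled = 0;

    while (rq->queue) {
        struct migration_req *req = rq->queue;

        rq->queue = req->next;        /* unlink the head of the queue */
        do_migrate(task_cpu, req);    /* __migrate_task(req->task, ...) */
        handled++;
    }
    return handled;
}
```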
Changes to the core scheduler

- When a new process is started with the exec system call, this is a good opportunity for the scheduler to move the process to another CPU:

```c
new_cpu = sched_balance_self(this_cpu, SD_BALANCE_EXEC);
if (new_cpu != this_cpu)
	sched_migrate_task(current, new_cpu);
```
- The scheduling granularity of the completely fair scheduler scales with the number of CPUs: the more processors the system has, the larger the granularity that can be used. sysctl_sched_min_granularity and sysctl_sched_latency are both multiplied by the correction factor (1 + log(nr_cpus)), but the result may not exceed 200 ms.
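The scaling rule above can be sketched as follows. This is a simplified user-space reimplementation, not kernel code; the logarithm is taken as an integer base-2 log, and the 20 ms base value used in the test is an assumption modeled on the 2.6.24-era default of sysctl_sched_latency (values in nanoseconds).

```c
/* Integer base-2 logarithm (position of the highest set bit). */
static unsigned int ilog2_u(unsigned int n)
{
    unsigned int l = 0;

    while (n >>= 1)
        l++;
    return l;
}

/* Scale a granularity sysctl by 1 + log2(nr_cpus), clamped to 200 ms. */
static unsigned long scale_granularity(unsigned long base_ns,
                                       unsigned int nr_cpus)
{
    const unsigned long limit = 200000000UL;  /* 200 ms cap */
    unsigned long scaled = base_ns * (1 + ilog2_u(nr_cpus));

    return scaled > limit ? limit : scaled;
}
```

So on a single CPU the base value is used unchanged, on 4 CPUs it triples, and on very large machines the 200 ms cap takes over.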
Scheduling domains and control groups

- In the discussion so far, the scheduler did not interact with processes directly but with schedulable entities. This makes group scheduling possible: processes are placed into groups, and the scheduler first ensures fairness between these groups and then fairness among all processes within a group.
- The kernel additionally provides control groups, which allow arbitrary collections of processes to be created via the special cgroups filesystem.
Kernel preemption and low-latency efforts

Kernel preemption: a high-priority process may preempt a low-priority one; preemption can of course be disabled when necessary:
- preempt_disable/preempt_enable

The kernel checks the TIF_NEED_RESCHED flag to determine whether some process is waiting for the CPU:

```c
#define preempt_enable() \
do { \
	preempt_enable_no_resched(); \
	barrier(); \
	preempt_check_resched(); \
} while (0)

#define preempt_check_resched() \
do { \
	if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
		preempt_schedule(); \
} while (0)

asmlinkage void __sched preempt_schedule(void)
{
	struct thread_info *ti = current_thread_info();

	/* Do not preempt if preemption is disabled or interrupts are off. */
	if (likely(ti->preempt_count || irqs_disabled()))
		return;

	do {
		add_preempt_count(PREEMPT_ACTIVE);
		schedule();
		sub_preempt_count(PREEMPT_ACTIVE);
	} while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
}
```
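The interplay of preempt_count and the rescheduling check can be modeled in user space. The sketch below is a hypothetical mock (the mock_* names are invented, not kernel APIs): a counter tracks nested disable/enable pairs, and a pending "need resched" flag only triggers a reschedule once the counter drops back to zero.

```c
/* Hypothetical user-space model of preemption counting: rescheduling may
 * only happen when the nesting counter is zero. */
static int mock_preempt_count;
static int mock_need_resched;
static int mock_schedule_calls;

/* Stand-in for invoking the scheduler. */
static void mock_schedule(void)
{
    mock_need_resched = 0;
    mock_schedule_calls++;
}

static void mock_preempt_disable(void)
{
    mock_preempt_count++;
}

static void mock_preempt_enable(void)
{
    mock_preempt_count--;
    /* like preempt_check_resched(): reschedule only if the flag is set
     * and no disable section is still active */
    if (mock_need_resched && mock_preempt_count == 0)
        mock_schedule();
}
```

This mirrors why preempt_enable() must re-check TIF_NEED_RESCHED: a wakeup that arrived inside a disabled section would otherwise be serviced only at the next timer tick.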
After a hardware interrupt request has been handled, the rescheduling flag and the preemption state determine whether the scheduler is invoked.

Low latency: long-running operations in the kernel should check from time to time whether another process has become runnable and, if necessary, initiate a rescheduling (cond_resched).
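The cond_resched pattern can be illustrated with a hypothetical user-space mock (the cr_* names are invented): a lengthy loop offers the CPU at regular checkpoints instead of holding it for the entire run, and the offer only costs a reschedule when someone is actually waiting.

```c
/* Hypothetical model of cond_resched(): yield only if a waiter exists. */
static int cr_need_resched;
static int cr_yields;

/* Returns 1 if a reschedule happened, 0 if the check was a no-op. */
static int cr_cond_resched(void)
{
    if (cr_need_resched) {
        cr_need_resched = 0;
        cr_yields++;      /* stands in for calling the scheduler */
        return 1;
    }
    return 0;
}

/* A lengthy operation that checks for waiters every `stride` iterations,
 * the way long kernel loops sprinkle cond_resched() calls. */
static long cr_long_operation(long iterations, long stride)
{
    long sum = 0, i;

    for (i = 0; i < iterations; i++) {
        sum += i;
        if (i % stride == stride - 1)
            cr_cond_resched();
    }
    return sum;
}
```

The checkpoints bound the latency a runnable process can experience to roughly one stride of work, while keeping the common no-waiter case nearly free.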