Linux Kernel Process Management 3.1 (Overview): The Scheduling System

Author: Harold Wang
http://blog.csdn.net/hero7935

Problems with earlier Linux schedulers

Before the 2.6 kernel, the scheduler had a significant limitation in its run queue: the scheduling algorithm was O(n), so its cost grew with the number of active tasks. Furthermore, there was only a single run queue and a single runqueue lock for all processors in an SMP system, so the act of choosing a task to execute locked every other processor out of the runqueue; idle processors sat waiting for the runqueue lock, and efficiency suffered.

Background of the Linux 2.6 scheduler

The 2.6 scheduler was designed and implemented by Ingo Molnar. Ingo has been involved in Linux kernel development since 1995. His motivation in working on the new scheduler was to create a completely O(1) scheduler for wakeup, context-switch, and timer interrupt overhead. One of the issues that triggered the need for a new scheduler was the use of Java™ virtual machines (JVMs). The Java programming model uses many threads of execution, which results in lots of overhead for scheduling in an O(n) scheduler. An O(1) scheduler doesn't suffer under high loads, so JVMs execute efficiently.

Major scheduling structures

Now, each CPU has its own runqueue, divided into two priority arrays: active and expired. There are 140 priority lists that are serviced in FIFO order, with the highest-priority task serviced first. To make this efficient, a bitmap plus a find-first-bit-set operation identifies which priority lists currently contain runnable tasks, so the time it takes to find a task to execute depends not on the number of active tasks but on the number of priorities. This makes the 2.6 scheduler O(1): the time to schedule is both fixed and deterministic regardless of the number of active tasks. Each task in a particular priority list receives a particular time slice; a task stays in the active array as long as its time slice has not been used up. As you might imagine, using it up means the task is moved to the expired array and its time slice is recalculated. When no tasks remain on the active runqueue for any priority, the pointers for the active and expired arrays are swapped, making the expired priority array the active one.
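As a minimal sketch of task selection (condensed from 2.6's kernel/sched.c: prio_array_t, sched_find_first_bit, and run_list are the kernel's own names, but pick_next_task here is a hypothetical helper; in the real sources this logic sits inline in schedule(), with locking and bookkeeping that are omitted below):

/* Simplified O(1) task selection (illustrative, not the exact kernel code):
 * swap the arrays if the active one is drained, find the highest-priority
 * non-empty list via the bitmap, and take the first task in that FIFO list. */
static task_t *pick_next_task(runqueue_t *rq)
{
	prio_array_t *array = rq->active;
	int idx;

	if (!array->nr_active) {
		/* Active array drained: swap active and expired in O(1). */
		rq->active = rq->expired;
		rq->expired = array;
		array = rq->active;
	}

	idx = sched_find_first_bit(array->bitmap);   /* O(1) bitmap search */
	return list_entry(array->queue[idx].next, task_t, run_list);
}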

scheduler_tick(): responsible for recalculating time slices.

This involves several case-specific mechanisms (see the sketch after this list):

  • A real-time task (SCHED_RR) that has used up its time slice simply gets a new time slice and is reinserted at the tail of the active queue.
  • An ordinary user task is removed from the active queue, has its priority and time slice recalculated, and is then inserted into the expired queue or the active queue, depending on whether it is treated as expired.
  • An interactive task is normally reinserted at the tail of the active queue even when its slice runs out, unless the expired queue is starving.
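A condensed sketch of this slice-expiry logic (modeled on scheduler_tick() in 2.6's kernel/sched.c; helper names such as dequeue_task, enqueue_task, task_timeslice, effective_prio, TASK_INTERACTIVE, and EXPIRED_STARVING are the kernel's own, but locking and the sleep-average bookkeeping are omitted):

/* Slice-expiry handling for the running task p, condensed for illustration. */
if (!--p->time_slice) {
	dequeue_task(p, rq->active);
	set_tsk_need_resched(p);
	p->time_slice = task_timeslice(p);            /* recalculate the slice */

	if (rt_task(p)) {
		/* SCHED_RR: straight back to the tail of the active array. */
		enqueue_task(p, rq->active);
	} else {
		p->prio = effective_prio(p);          /* recalculate priority */
		if (TASK_INTERACTIVE(p) && !EXPIRED_STARVING(rq))
			enqueue_task(p, rq->active);  /* interactive: stay active */
		else
			enqueue_task(p, rq->expired); /* ordinary: move to expired */
	}
}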


To better support SMP systems, the 2.6 scheduler doesn't use a single lock for scheduling; instead, it has a lock on each runqueue. This allows all CPUs to schedule tasks without contention from other CPUs. It does make synchronization and mutual exclusion somewhat more complex, so the code must be written more carefully. In addition, with a runqueue per processor, a task generally keeps affinity with one CPU and can better utilize that CPU's hot cache. Last but not least, the 2.6 scheduler gives us the task preemption support we want.
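One example of that extra care: when two runqueues must be locked at once (as during task migration), they are always taken in a fixed order so two CPUs can never deadlock waiting on each other. The 2.6 sources do this in double_rq_lock(); lightly abridged:

/* Lock two runqueues in address order to avoid ABBA deadlock
 * (abridged from double_rq_lock() in 2.6's kernel/sched.c). */
static void double_rq_lock(runqueue_t *rq1, runqueue_t *rq2)
{
	if (rq1 == rq2) {
		spin_lock(&rq1->lock);
	} else if (rq1 < rq2) {
		spin_lock(&rq1->lock);
		spin_lock(&rq2->lock);
	} else {
		spin_lock(&rq2->lock);
		spin_lock(&rq1->lock);
	}
}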

Inside the new structure: runqueue

1) prio_array_t *active, *expired, arrays[2]

struct prio_array {
	int nr_active;                     /* number of runnable tasks in this array */
	struct list_head queue[MAX_PRIO];  /* one FIFO list per priority level (MAX_PRIO = 140) */
	unsigned long bitmap[BITMAP_SIZE]; /* one bit per priority level; set if its list is non-empty */
};
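Enqueueing a task into a prio_array shows why every operation stays O(1); this is lightly abridged from the 2.6 sources:

/* Add a task at the tail of its priority's FIFO list and mark that
 * priority as occupied in the bitmap (abridged from kernel/sched.c). */
static void enqueue_task(struct task_struct *p, prio_array_t *array)
{
	list_add_tail(&p->run_list, array->queue + p->prio);
	__set_bit(p->prio, array->bitmap);
	array->nr_active++;
	p->array = array;
}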

2) spinlock_t lock: protects only this CPU's runqueue, so contention stays local.

3) task_t *curr: the task currently running on this CPU.

4) task_t *idle: this CPU's idle task.

5) int best_expired_prio: the highest priority among tasks in the expired queue, recorded as each task enters that queue.

6) unsigned long expired_timestamp

This records the time the first task entered the expired queue, which tells us how long the longest-waiting expired task has gone without the CPU; it is used mainly by the macro EXPIRED_STARVING(rq). If a task has waited for the CPU for a very long time (beyond a specified threshold), or if the currently running task has a lower priority than the best task in the expired queue, the scheduler should empty the active queue as soon as possible so the arrays can be swapped.
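For reference, the starvation test looks roughly like this (modeled on the 2.6 sources; the exact threshold arithmetic varies between 2.6 releases):

/* True once the oldest expired task has waited longer than a limit
 * that scales with the number of runnable tasks (simplified). */
#define EXPIRED_STARVING(rq) \
	((rq)->expired_timestamp && \
	 (jiffies - (rq)->expired_timestamp >= \
	  STARVATION_LIMIT * ((rq)->nr_running) + 1))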

7) struct mm_struct *prev_mm

8) unsigned long nr_running

9) unsigned long nr_switches

10) unsigned long nr_uninterruptible

11) atomic_t nr_iowait

12) unsigned long timestamp_last_tick

13) int prev_cpu_load[NR_CPUS]

14) atomic_t *node_nr_running; int prev_node_load[MAX_NUMNODES]: present on NUMA systems only.

15) task_t *migration_thread

16) struct list_head migration_queue
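Putting these fields together, the per-CPU runqueue looks roughly like this (abridged from 2.6's kernel/sched.c; exact fields vary across 2.6 releases, and the NUMA and load-history fields are left out here):

/* Simplified per-CPU runqueue, abridged from 2.6's kernel/sched.c. */
struct runqueue {
	spinlock_t lock;                   /* protects this runqueue only */
	unsigned long nr_running;          /* runnable tasks on this CPU */
	unsigned long nr_switches;         /* context-switch count */
	unsigned long nr_uninterruptible;  /* tasks in uninterruptible sleep */
	unsigned long expired_timestamp;   /* when the first task expired */
	task_t *curr, *idle;               /* current task and the idle task */
	struct mm_struct *prev_mm;         /* previous task's mm (lazy TLB) */
	prio_array_t *active, *expired;    /* the two priority arrays... */
	prio_array_t arrays[2];            /* ...and their actual storage */
	int best_expired_prio;             /* best priority seen in expired */
	atomic_t nr_iowait;                /* tasks waiting on I/O */
	task_t *migration_thread;          /* per-CPU migration kernel thread */
	struct list_head migration_queue;  /* tasks queued for migration */
};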

Inside the improved structure: task_struct

see here.


Attachment: Wait Queue Structure


Dynamic task prioritization

Penalizing tasks that are CPU-bound and rewarding tasks that are I/O-bound is an interesting (and efficient) mechanism to prevent some tasks from monopolizing the CPU and starving the rest. Furthermore, an I/O-bound task commonly uses the CPU only to set up an I/O operation and then sleeps awaiting its completion, so the reward costs CPU-bound tasks very little: however much the I/O-bound task may seem to dominate the CPU, it only steals a very small slice of time. This mechanism applies only to user tasks, not to real-time tasks.
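The reward and penalty are folded into the task's dynamic priority. A sketch of that calculation, lightly abridged from effective_prio() in the 2.6 sources (the bonus, roughly plus or minus 5, is derived from the task's average sleep time):

/* Dynamic priority: static priority adjusted by a sleep-based bonus
 * (abridged from effective_prio() in 2.6's kernel/sched.c). */
static int effective_prio(task_t *p)
{
	int bonus, prio;

	if (rt_task(p))
		return p->prio;            /* real-time tasks are never adjusted */

	/* CURRENT_BONUS maps the task's sleep average onto 0..MAX_BONUS:
	 * sleepers (I/O-bound) get a boost, CPU hogs get a penalty. */
	bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;

	prio = p->static_prio - bonus;
	if (prio < MAX_RT_PRIO)
		prio = MAX_RT_PRIO;        /* clamp into the user-task range */
	if (prio > MAX_PRIO - 1)
		prio = MAX_PRIO - 1;
	return prio;
}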

I will discuss CFS and BFS thoroughly later on my blog: http://blog.csdn.net/hero7935

SMP load balancing

Just as your intuition suggests, the CPUs in an SMP system may not share the workload fairly: one may be too busy to rest while another sits almost completely idle. The Linux 2.6 scheduler addresses this with load balancing. Every 200ms, a processor checks whether the CPU loads are unbalanced; if they are, it performs a cross-CPU balancing of tasks. Unavoidably, a negative aspect of this process is that the new CPU's cache is cold for a migrated task (the task needs to pull its data into the cache).

load_balance(): the balancing check is driven from the timer tick (an idle CPU checks on nearly every tick, while a busy CPU checks roughly every 200ms).

pull: the scheduler uses the load_balance() function to pull tasks from an overloaded CPU to a relatively lightly loaded one.
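As a self-contained model of the pull decision (this is not kernel code; the 25%-imbalance threshold and the move-half-the-difference policy mirror the 2.6 balancer, but every name below is invented for illustration):

#include <stdio.h>

struct cpu {
	int id;
	int nr_running;            /* stands in for the runqueue length */
};

/* Pull tasks toward 'self' when some CPU is more than 25% busier,
 * moving half of the imbalance, as the 2.6 balancer does. */
static void pull_if_unbalanced(struct cpu *self, struct cpu *cpus, int ncpus)
{
	struct cpu *busiest = NULL;
	int i, moved;

	for (i = 0; i < ncpus; i++)
		if (&cpus[i] != self &&
		    (!busiest || cpus[i].nr_running > busiest->nr_running))
			busiest = &cpus[i];

	/* Balance only if the busiest queue exceeds ours by 25%. */
	if (!busiest || busiest->nr_running * 4 <= self->nr_running * 5)
		return;

	moved = (busiest->nr_running - self->nr_running) / 2;
	busiest->nr_running -= moved;
	self->nr_running += moved;
	printf("pulled %d tasks from CPU%d to CPU%d\n",
	       moved, busiest->id, self->id);
}

int main(void)
{
	struct cpu cpus[2] = { { 0, 10 }, { 1, 2 } };
	pull_if_unbalanced(&cpus[1], cpus, 2);   /* CPU1 pulls from CPU0 */
	return 0;
}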


push: the per-CPU migration_thread() kernel thread handles this side, pushing tasks queued on rq->migration_queue out to other CPUs.

Conclusion:

The Linux kernel scheduling system has improved a lot:

  • improved run queue -> per-CPU active/expired arrays, priority lists, and per-runqueue locks
  • fast lookup of the next task to run -> priority bitmap
  • boost/punish mechanism -> I/O-bound vs. CPU-bound tasks
  • kernel preemption
  • load balancing -> pull and push
  • scheduling algorithm -> O(1)

 


Reference:

1. M. Tim Jones. Inside the Linux Scheduler. http://www.ibm.com/developerworks/linux/library/l-scheduler/ (accessed 2011).

2. Robert Love. Linux Kernel Development, 3rd Edition. Addison-Wesley.

3. 杨沙洲. Linux 2.6 调度系统分析 (Analysis of the Linux 2.6 Scheduling System). http://www.ibm.com/developerworks/cn/linux/kernel/l-kn26sch/index.html (accessed 2011).

4. Lecture 02.1 slides (used in class; not convenient to upload, email me if you need them).
