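At the heart of process switching in Linux is the context-switch function context_switch(), defined in kernel/sched/core.c in the Linux kernel source tree. Its source is as follows: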
static __always_inline struct rq *
context_switch(struct rq *rq, struct task_struct *prev,
	       struct task_struct *next, struct rq_flags *rf)
{
	prepare_task_switch(rq, prev, next);

	/*
	 * For paravirt, this is coupled with an exit in switch_to to
	 * combine the page table reload and the switch backend into
	 * one hypercall.
	 */
	arch_start_context_switch(prev);

	/*
	 * kernel -> kernel   lazy + transfer active
	 *   user -> kernel   lazy + mmgrab() active
	 *
	 * kernel ->   user   switch + mmdrop() active
	 *   user ->   user   switch
	 */
	if (!next->mm) {                                // to kernel
		enter_lazy_tlb(prev->active_mm, next);

		next->active_mm = prev->active_mm;
		if (prev->mm)                           // from user
			mmgrab(prev->active_mm);
		else
			prev->active_mm = NULL;
	} else {                                        // to user
		membarrier_switch_mm(rq, prev->active_mm, next->mm);
		/*
		 * sys_membarrier() requires an smp_mb() between setting
		 * rq->curr / membarrier_switch_mm() and returning to userspace.
		 *
		 * The below provides this either through switch_mm(), or in
		 * case 'prev->active_mm == next->mm' through
		 * finish_task_switch()'s mmdrop().
		 */
		switch_mm_irqs_off(prev->active_mm, next->mm, next);

		if (!prev->mm) {                        // from kernel
			/* will mmdrop() in finish_task_switch(). */
			rq->prev_mm = prev->active_mm;
			prev->active_mm = NULL;
		}
	}

	rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);

	prepare_lock_switch(rq, next, rf);

	/* Here we just switch the register state and the stack. */
	switch_to(prev, next, prev);
	barrier();

	return finish_task_switch(prev);
}
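context_switch() takes four parameters: rq, prev, next, and rf.
rq is the run queue on which this switch takes place.
prev points to the task being switched out.
next points to the task being switched in.
rf points to the rq_flags structure that carries the run queue's lock state; it is passed on to prepare_lock_switch().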
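The switch proceeds as follows:
1. Before the switch, prepare_task_switch() is called so that the scheduler and the architecture code can do their pre-switch bookkeeping for prev and next.
2. arch_start_context_switch() is called. As the comment in the source notes, this is a paravirtualization hook: together with a matching exit in switch_to, it lets the page-table reload and the register switch be combined into a single hypercall.
3. The following code switches the address space: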
	if (!next->mm) {                                // to kernel
		enter_lazy_tlb(prev->active_mm, next);

		next->active_mm = prev->active_mm;
		if (prev->mm)                           // from user
			mmgrab(prev->active_mm);
		else
			prev->active_mm = NULL;
	} else {                                        // to user
		membarrier_switch_mm(rq, prev->active_mm, next->mm);
		/*
		 * sys_membarrier() requires an smp_mb() between setting
		 * rq->curr / membarrier_switch_mm() and returning to userspace.
		 *
		 * The below provides this either through switch_mm(), or in
		 * case 'prev->active_mm == next->mm' through
		 * finish_task_switch()'s mmdrop().
		 */
		switch_mm_irqs_off(prev->active_mm, next->mm, next);

		if (!prev->mm) {                        // from kernel
			/* will mmdrop() in finish_task_switch(). */
			rq->prev_mm = prev->active_mm;
			prev->active_mm = NULL;
		}
	}
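This code branches on whether next is a kernel thread (next->mm is NULL) or a user process (next->mm is set). The "to kernel" and "to user" comments describe the kind of task being switched to, not a CPU privilege mode.
If next->mm is NULL, next is a kernel thread with no address space of its own. enter_lazy_tlb() is called to enter lazy TLB mode, and next->active_mm is set to prev->active_mm so that next borrows the previous address space instead of switching page tables. If prev->mm is not NULL, prev is a user process, so mmgrab() takes an extra reference on the borrowed mm. Otherwise prev is itself a kernel thread, and setting prev->active_mm to NULL hands its borrowed reference over to next.
If next->mm is not NULL, next is a user process. membarrier_switch_mm() is called first to keep the run queue's membarrier state in step with next->mm. According to the comment in the source, the memory barrier that sys_membarrier() requires before returning to user space comes from switch_mm() or, when prev->active_mm == next->mm, from the mmdrop() in finish_task_switch(). switch_mm_irqs_off() then performs the actual address-space switch. If prev->mm is NULL, prev was a kernel thread: rq->prev_mm records the mm it had borrowed so that finish_task_switch() can drop the reference later, and prev->active_mm = NULL ends the borrow.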
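The reference-count bookkeeping described above can be traced with a small user-space model. This is only a sketch under stated assumptions: struct mm, struct task, struct rq, mmgrab_model(), mmdrop_model(), switch_mm_model(), finish_model() and demo_borrow_chain() are hypothetical stand-ins written for this illustration, not kernel code, and the model ignores TLB handling, membarrier and locking entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mm   { int mm_count; };                     /* models mm_struct.mm_count */
struct task { struct mm *mm; struct mm *active_mm; };
struct rq   { struct mm *prev_mm; };

static void mmgrab_model(struct mm *mm) { mm->mm_count++; }
static void mmdrop_model(struct mm *mm) { mm->mm_count--; }

/* Mirrors the branch structure of the address-space part of context_switch(). */
static void switch_mm_model(struct rq *rq, struct task *prev, struct task *next)
{
    if (!next->mm) {                        /* to kernel thread */
        next->active_mm = prev->active_mm;  /* borrow the previous mm */
        if (prev->mm)                       /* from user: take a reference */
            mmgrab_model(prev->active_mm);
        else                                /* from kernel: hand the reference over */
            prev->active_mm = NULL;
    } else {                                /* to user process */
        /* the real code calls switch_mm_irqs_off() at this point */
        if (!prev->mm) {                    /* from kernel: defer the drop */
            rq->prev_mm = prev->active_mm;
            prev->active_mm = NULL;
        }
    }
}

/* Models only the deferred mmdrop() done by finish_task_switch(). */
static void finish_model(struct rq *rq)
{
    struct mm *mm = rq->prev_mm;

    rq->prev_mm = NULL;
    if (mm)
        mmdrop_model(mm);
}

/* Runs user A -> kthread K1 -> kthread K2 -> user B and
 * returns the final reference count of A's mm. */
static int demo_borrow_chain(void)
{
    struct mm mm_a = { 1 }, mm_b = { 1 };
    struct task a = { &mm_a, &mm_a }, b = { &mm_b, &mm_b };
    struct task k1 = { NULL, NULL }, k2 = { NULL, NULL };
    struct rq rq = { NULL };

    switch_mm_model(&rq, &a, &k1);          /* user -> kernel: lazy + mmgrab */
    finish_model(&rq);
    assert(mm_a.mm_count == 2 && k1.active_mm == &mm_a);

    switch_mm_model(&rq, &k1, &k2);         /* kernel -> kernel: transfer */
    finish_model(&rq);
    assert(mm_a.mm_count == 2 && k1.active_mm == NULL && k2.active_mm == &mm_a);

    switch_mm_model(&rq, &k2, &b);          /* kernel -> user: switch + deferred drop */
    assert(rq.prev_mm == &mm_a && k2.active_mm == NULL);
    finish_model(&rq);

    return mm_a.mm_count;
}
```

Calling demo_borrow_chain() walks through three of the four cases listed in the kernel comment. The extra reference taken when the first kernel thread borrows A's mm is passed along unchanged to the second kernel thread and is released only after a user process is switched in, so the count ends where it started.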
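4. switch_to() is executed. On x86_64 it calls __switch_to_asm, which switches the CPU register state and the kernel stack between the two tasks: it pushes the callee-saved registers of the current task (prev) onto prev's stack, swaps the stack pointer, and pops the callee-saved registers of the new task (next) from next's stack. The x86_64 code is shown below: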
ENTRY(__switch_to_asm)
	UNWIND_HINT_FUNC
	/*
	 * Save callee-saved registers
	 * This must match the order in inactive_task_frame
	 */
	pushq	%rbp
	pushq	%rbx
	pushq	%r12
	pushq	%r13
	pushq	%r14
	pushq	%r15

	/* switch stack */
	movq	%rsp, TASK_threadsp(%rdi)	/* save the old task's stack pointer */
	movq	TASK_threadsp(%rsi), %rsp	/* load the new task's stack pointer */

	/* restore callee-saved registers */
	popq	%r15
	popq	%r14
	popq	%r13
	popq	%r12
	popq	%rbx
	popq	%rbp

	jmp	__switch_to
END(__switch_to_asm)
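The save-state, swap-stack, restore-state pattern of __switch_to_asm has a rough user-space analogue in the ucontext API (getcontext, makecontext, swapcontext) provided by glibc. The sketch below is an analogy rather than what the kernel does: swapcontext() also saves the signal mask and more registers than the six callee-saved ones, and the names worker(), demo_switch() and trace are made up for this example. It assumes a Linux system with glibc.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static int trace[4];     /* records the order in which code runs */
static int trace_len;

static void worker(void)
{
    trace[trace_len++] = 2;               /* running on the worker's own stack */
    swapcontext(&worker_ctx, &main_ctx);  /* save worker state, resume main */
    trace[trace_len++] = 4;               /* resumed exactly where it left off */
}

/* Switches main -> worker -> main -> worker -> main.
 * Returns 1 if the steps ran in the order 1, 2, 3, 4. */
static int demo_switch(void)
{
    static char stack[64 * 1024];         /* separate stack for the worker */

    trace_len = 0;
    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = stack;
    worker_ctx.uc_stack.ss_size = sizeof(stack);
    worker_ctx.uc_link = &main_ctx;       /* resume main when worker() returns */
    makecontext(&worker_ctx, worker, 0);

    trace[trace_len++] = 1;
    swapcontext(&main_ctx, &worker_ctx);  /* first switch: start the worker */
    trace[trace_len++] = 3;
    swapcontext(&main_ctx, &worker_ctx);  /* second switch: resume the worker */

    return trace_len == 4 && trace[0] == 1 && trace[1] == 2 &&
           trace[2] == 3 && trace[3] == 4;
}
```

As with switch_to(), each call to swapcontext() returns only when some other context later switches back, which is why the code after the call runs on the original stack with its local state intact.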
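5. Finally, finish_task_switch() is called to complete the switch. It is the counterpart of prepare_task_switch() from step 1 and performs the cleanup, including the mmdrop() of rq->prev_mm that step 3 deferred. By the time it runs, the CPU is already executing on next's stack.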