To control process execution, the operating system must be able to suspend a process that is currently running on the CPU and later resume a previously suspended one. This behavior is called process switching, task switching, or context switching.
Linux does not use the x86 CPU's hardware task-switching mechanism. In essence, a Linux context switch consists of a cr3 switch (the memory address-space switch, done in switch_mm) plus a register switch (EIP, ESP, and so on, done in switch_to).
We can locate the context_switch function (for example by searching the kernel source in VS Code). Its body is as follows:
/*
 * context_switch - switch to the new MM and the new thread's register state.
 */
static __always_inline struct rq *
context_switch(struct rq *rq, struct task_struct *prev,
               struct task_struct *next, struct rq_flags *rf)
{
        prepare_task_switch(rq, prev, next);

        /*
         * For paravirt, this is coupled with an exit in switch_to to
         * combine the page table reload and the switch backend into
         * one hypercall.
         */
        arch_start_context_switch(prev);

        /*
         * kernel -> kernel   lazy + transfer active
         *   user -> kernel   lazy + mmgrab() active
         *
         * kernel ->   user   switch + mmdrop() active
         *   user ->   user   switch
         */
        if (!next->mm) {                                // to kernel
                enter_lazy_tlb(prev->active_mm, next);

                next->active_mm = prev->active_mm;
                if (prev->mm)                           // from user
                        mmgrab(prev->active_mm);
                else
                        prev->active_mm = NULL;
        } else {                                        // to user
                membarrier_switch_mm(rq, prev->active_mm, next->mm);
                /*
                 * sys_membarrier() requires an smp_mb() between setting
                 * rq->curr / membarrier_switch_mm() and returning to userspace.
                 *
                 * The below provides this either through switch_mm(), or in
                 * case 'prev->active_mm == next->mm' through
                 * finish_task_switch()'s mmdrop().
                 */
                switch_mm_irqs_off(prev->active_mm, next->mm, next);

                if (!prev->mm) {                        // from kernel
                        /* will mmdrop() in finish_task_switch(). */
                        rq->prev_mm = prev->active_mm;
                        prev->active_mm = NULL;
                }
        }

        rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);

        prepare_lock_switch(rq, next, rf);

        /* Here we just switch the register state and the stack. */
        switch_to(prev, next, prev);
        barrier();

        return finish_task_switch(prev);
}
Let us analyze context_switch in detail.
context_switch first calls prepare_task_switch(), which completes the preparatory work before the switch.
It then calls arch_start_context_switch(), the hook that begins the architecture-specific part of the context switch; each processor architecture may define it. ARM64 does not define arch_start_context_switch, so the default definition is used, which is an empty macro. Next, the function checks whether the next process is a kernel thread; if it is, no user address-space switch is needed.
If the next process is a kernel thread (its mm member is a null pointer), it has no user virtual address space of its own, so it borrows the previous process's user address space. The borrowed address space is recorded in the active_mm member, and the kernel thread runs on top of it.
enter_lazy_tlb() notifies the processor architecture that no user address-space switch is required; this technique for speeding up process switches is called lazy TLB. On ARM64, enter_lazy_tlb() is defined as an empty function.
Then prepare_lock_switch() is executed, which hands the runqueue lock over to next across the switch (in older kernel versions this is also where next->on_cpu was set).
If the next process is a user process, switch_mm_irqs_off() is called to switch the user virtual address space.
The most critical piece is switch_to, a macro that takes three parameters:
prev: input parameter; the address of the old process's descriptor.
next: input parameter; the address of the new process's descriptor.
last: output parameter; records which process we switched from, i.e. the descriptor address of the process that occupied the CPU before the current one.
The macro's steps (on 32-bit x86) are roughly as follows:
1. Move the value of prev into eax and the value of next into edx.
2. Save prev's eflags and ebp registers; these are pushed onto prev's kernel stack.
3. Save the esp register into prev->thread.esp, i.e. save prev's kernel stack pointer.
4. Load next->thread.esp into the esp register, i.e. switch to next's kernel stack.
5. Store the address of label "1:" into prev->thread.eip (the "1" in the macro's "1:\t" line); this is the resume point used when prev runs again.
6. Push next->thread.eip onto next's kernel stack. This value is usually the address of that same label "1:".
7. Jump to __switch_to. When prev is eventually switched back in, execution resumes at label "1:": prev regains the CPU, and its ebp and eflags are restored.
8. Store the contents of eax into the last parameter (the original author notes this was not obvious to him; __switch_to returns prev, and that return value is placed in eax).
ENTRY(__switch_to_asm)
        UNWIND_HINT_FUNC
        /*
         * Save callee-saved registers
         * This must match the order in inactive_task_frame
         */
        pushq   %rbp
        pushq   %rbx
        pushq   %r12
        pushq   %r13
        pushq   %r14
        pushq   %r15

        /* switch stack */
        movq    %rsp, TASK_threadsp(%rdi)       // save the old task's stack pointer
        movq    TASK_threadsp(%rsi), %rsp       // load the new task's stack pointer

        /* restore callee-saved registers */
        popq    %r15
        popq    %r14
        popq    %r13
        popq    %r12
        popq    %rbx
        popq    %rbp

        jmp     __switch_to
END(__switch_to_asm)
Finally, finish_task_switch() runs; reading the code shows that this function performs the cleanup work after a process switch.
Having switched from process prev to process next, finish_task_switch cleans up on prev's behalf. For example, it checks whether the previous process's state is TASK_DEAD, meaning the process exited (voluntarily or not), in which case its resources must be released.
static struct rq *finish_task_switch(struct task_struct *prev)
        __releases(rq->lock)
{
        struct rq *rq = this_rq();
        struct mm_struct *mm = rq->prev_mm;
        long prev_state;

        /*
         * The previous task will have left us with a preempt_count of 2
         * because it left us after:
         *
         *      schedule()
         *        preempt_disable();                    // 1
         *        __schedule()
         *          raw_spin_lock_irq(&rq->lock)        // 2
         *
         * Also, see FORK_PREEMPT_COUNT.
         */
        if (WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
                      "corrupted preempt_count: %s/%d/0x%x\n",
                      current->comm, current->pid, preempt_count()))
                preempt_count_set(FORK_PREEMPT_COUNT);

        rq->prev_mm = NULL;

        /*
         * A task struct has one reference for the use as "current".
         * If a task dies, then it sets TASK_DEAD in tsk->state and calls
         * schedule one last time. The schedule call will never return, and
         * the scheduled task must drop that reference.
         *
         * We must observe prev->state before clearing prev->on_cpu (in
         * finish_task), otherwise a concurrent wakeup can get prev
         * running on another CPU and we could race with its RUNNING -> DEAD
         * transition, resulting in a double drop.
         */
        prev_state = prev->state;
        vtime_task_switch(prev);
        perf_event_task_sched_in(prev, current);
        finish_task(prev);
        finish_lock_switch(rq);
        finish_arch_post_lock_switch();
        kcov_finish_switch(current);

        fire_sched_in_preempt_notifiers(current);
        /*
         * When switching through a kernel thread, the loop in
         * membarrier_{private,global}_expedited() may have observed that
         * kernel thread and not issued an IPI. It is therefore possible to
         * schedule between user->kernel->user threads without passing though
         * switch_mm(). Membarrier requires a barrier after storing to
         * rq->curr, before returning to userspace, so provide them here:
         *
         * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
         *   provided by mmdrop(),
         * - a sync_core for SYNC_CORE.
         */
        if (mm) {
                membarrier_mm_sync_core_before_usermode(mm);
                mmdrop(mm);
        }
        if (unlikely(prev_state == TASK_DEAD)) {
                if (prev->sched_class->task_dead)
                        prev->sched_class->task_dead(prev);

                /*
                 * Remove function-return probe instances associated with this
                 * task and put them back on the free list.
                 */
                kprobe_flush_task(prev);

                /* Task is done with its stack. */
                put_task_stack(prev);

                put_task_struct_rcu_user(prev);
        }

        tick_nohz_task_switch();
        return rq;
}
With that, the process switch is complete.