User Mode and Kernel Mode

趁风人

Article link: https://blog.csdn.net/qq_51604252/article/details/110458156

title: User Mode and Kernel Mode
date: 2020-12-01 16:49:21
tags: Linux
categories: OS

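Preface

This article touches on interrupts, exceptions, and system calls. Some references classify interrupts and system calls as kinds of exception; others call exceptions software interrupts. This article follows the latter convention. The three share essentially the same underlying mechanism and handling, so it is unsurprising that definitions differ from source to source.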

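What is user mode? What is kernel mode?

Motivating question: why does an OS distinguish user mode from kernel mode?

Security: some instructions are dangerous (for example, those that access or modify the underlying hardware) and should be executed only by the trusted kernel.
Concurrency: without a kernel mode, how could a running program A be interrupted so that another program B can run? It could not: unless A voluntarily gave up the CPU, there would be no way to switch processes, and so no concurrency. Instead, an interrupt forces entry into kernel mode, where the kernel performs process scheduling.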

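User mode and kernel mode are abstractions. Before describing them in detail, it helps to understand CPU privilege levels.

CPU privilege levels:

Some CPU instructions are dangerous: ordinary user programs may not execute them freely, and only trusted kernel code may. In other words, kernel code has greater authority, that is, a higher privilege level, while ordinary user programs run at a lower one. The Intel x86 architecture provides four privilege levels, Ring 0 through Ring 3, with Ring 0 the most privileged. Linux on x86 uses only two of them: Ring 0 corresponds to kernel mode and Ring 3 to user mode.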
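
The privilege level described above can be observed from an ordinary program. The following is a minimal sketch, assuming an x86 build with a GCC-compatible compiler: on x86 the low two bits of the CS segment selector hold the current privilege level, and reading CS is not itself a privileged operation. The function names are illustrative, and on other architectures the helper simply reports -1.

```c
#include <assert.h>

/* Returns the current privilege level (0-3) on x86, or -1 on other targets. */
int current_privilege_level(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned short cs;
    /* Copy the CS selector into a general register; its low two bits are the CPL. */
    __asm__ volatile("mov %%cs, %0" : "=r"(cs));
    return cs & 0x3;
#else
    return -1; /* not an x86 build: there is no CS register to inspect */
#endif
}

/* A user-space process on x86 Linux is expected to observe level 3 (Ring 3). */
void expect_user_mode(void)
{
    int cpl = current_privilege_level();
    assert(cpl == 3 || cpl == -1);
}
```

Calling expect_user_mode() from any normal process should pass silently, since user programs never run at Ring 0.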

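User mode and kernel mode

While a program runs code that does not need to access hardware resources, it is in user mode. The CPU enters kernel mode when the program issues a system call to request such access, when an exception occurs, or when an external device raises an interrupt. In terms of CPU privilege levels, operations in user mode are restricted, while operations in kernel mode are not.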

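Ways to enter kernel mode from user mode

This section only briefly introduces system calls, exceptions, and interrupts from the perspective of entering kernel mode from user mode; consult other references for more detail.

System calls

How can a user program perform an operation that requires high privilege, such as creating a process or reading and writing a file (which ultimately requires access to hardware resources)? The operating system provides system calls as the interface for this. The user program deliberately executes a trap instruction, written here as "syscall n", where n is the number of the requested service. (This is a notational convenience: on x86-64 Linux, for example, the instruction is syscall and the number is passed in the rax register.) The instruction traps into kernel mode and transfers control to the handler for system call n. When the handler finishes, execution returns to the instruction following the syscall instruction.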
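
To make the "service number" idea concrete, here is a small sketch, assuming Linux with glibc. It invokes the getpid service by number through the generic syscall() wrapper instead of the usual getpid() library function. SYS_getpid and syscall() are real glibc names; raw_getpid is a name made up for this example.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> on glibc */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getpid: the number of the getpid service */
#include <sys/types.h>
#include <unistd.h>          /* syscall(), getpid() */

/* Request the getpid service by number; the wrapper executes the trap instruction. */
pid_t raw_getpid(void)
{
    long ret = syscall(SYS_getpid);
    assert(ret > 0);         /* getpid always succeeds and PIDs are positive */
    return (pid_t)ret;
}
```

The value returned by raw_getpid() should equal the one returned by getpid(), because both end up in the same kernel handler.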

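Exceptions

When a program does something it should not, such as dividing by zero or accessing memory it is not allowed to access, control transfers to the corresponding exception handler; that is, the CPU switches to kernel mode to handle it. When handling finishes, the kernel either re-executes the instruction that caused the exception or simply terminates the program.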
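
The effect of an exception can be seen from user space. The sketch below assumes Linux, where a fault on an inaccessible page is handled in the kernel and then reported to the process as a SIGSEGV signal. It maps one page with no access rights, writes to it, and recovers in a signal handler instead of being terminated. The names probe_fault, on_segv, and recover_point are illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;

/* Runs once the kernel has turned the memory fault into SIGSEGV. */
static void on_segv(int signo)
{
    (void)signo;
    siglongjmp(recover_point, 1);   /* resume past the faulting write */
}

/* Returns 1 if the illegal write raised SIGSEGV, 0 if it did not, -1 on setup failure. */
int probe_fault(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* One page that may be neither read nor written. */
    volatile char *p = mmap(NULL, (size_t)page, PROT_NONE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa = {0}, old;
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap((void *)p, (size_t)page);
        return -1;
    }

    volatile int caught;
    if (sigsetjmp(recover_point, 1) == 0) {
        p[0] = 1;                   /* faults: the CPU raises an exception here */
        caught = 0;
    } else {
        caught = 1;                 /* reached via siglongjmp from the handler */
    }

    sigaction(SIGSEGV, &old, NULL); /* restore the previous disposition */
    munmap((void *)p, (size_t)page);
    return caught;
}
```

Without the handler, the default action for SIGSEGV would terminate the process, which is the "terminate the program" outcome mentioned above.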

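Interrupts

When a peripheral device raises an interrupt signal, the CPU detects it (typically once the current instruction completes) and transfers control to the interrupt handler (interrupt service routine); that is, it enters kernel mode to handle the interrupt. When handling finishes, the interrupted task resumes.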

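How the switch between user mode and kernel mode works

Switching from user mode to kernel mode mainly involves changing the privilege level and switching from the user stack to the kernel stack.

The switch from the user stack to the kernel stack takes roughly the following steps:

First, find the base address and the top-of-stack pointer of the kernel stack.
Push the state of the current execution context onto the kernel stack.
Load the cs and ip of the interrupt handler, looked up earlier through the interrupt vector, into the corresponding registers and start executing the handler; at this point execution has moved to kernel-mode code. ("Interrupt" here is meant in the broad sense, including exceptions and system calls.)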
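
The three steps above can be sketched as a toy model in C. This is not how real hardware or Linux implements the switch: every type and function here (toy_cpu, toy_gate, toy_enter_kernel, toy_return_to_user) is hypothetical, the saved state is reduced to four values, and the kernel stack is just an array indexed from the top down.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_WORDS 64
#define NUM_VECTORS  256

/* Hypothetical, heavily simplified CPU state. */
struct toy_cpu {
    uint16_t cs;                    /* low two bits model the privilege level */
    uint32_t ip;
    uint32_t flags;
    uint32_t user_sp;               /* user-mode stack pointer */
    uint32_t kstack[KSTACK_WORDS];  /* kernel stack, grows toward index 0 */
    uint32_t ksp;                   /* index of the kernel stack top */
};

/* One entry of a toy interrupt vector table: where the handler lives. */
struct toy_gate {
    uint16_t cs;
    uint32_t ip;
};

static void kpush(struct toy_cpu *cpu, uint32_t value)
{
    assert(cpu->ksp > 0);           /* toy overflow check */
    cpu->kstack[--cpu->ksp] = value;
}

static uint32_t kpop(struct toy_cpu *cpu)
{
    assert(cpu->ksp < KSTACK_WORDS);
    return cpu->kstack[cpu->ksp++];
}

void toy_enter_kernel(struct toy_cpu *cpu, const struct toy_gate *idt, unsigned vector)
{
    assert(vector < NUM_VECTORS);

    /* Step 1: locate the kernel stack; it is empty, so top equals base. */
    cpu->ksp = KSTACK_WORDS;

    /* Step 2: save the interrupted context on the kernel stack. */
    kpush(cpu, cpu->user_sp);
    kpush(cpu, cpu->flags);
    kpush(cpu, cpu->cs);
    kpush(cpu, cpu->ip);

    /* Step 3: load the handler's cs:ip from the vector table. */
    cpu->cs = idt[vector].cs;
    cpu->ip = idt[vector].ip;
}

/* Undo the entry: pop the saved context in reverse order. */
void toy_return_to_user(struct toy_cpu *cpu)
{
    cpu->ip      = kpop(cpu);
    cpu->cs      = (uint16_t)kpop(cpu);
    cpu->flags   = kpop(cpu);
    cpu->user_sp = kpop(cpu);
}
```

After toy_enter_kernel the low two bits of cs are those of the handler's selector (0 for a Ring 0 handler), and toy_return_to_user restores the values that were saved.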

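How is this actually implemented? (What follows assumes some familiarity with the kernel.)

Finding the kernel stack base: in Linux, each process's task_struct is associated with a thread_info structure. In the classic layout, thread_info and the kernel stack are packed together in a thread_union, with thread_info at the lowest address. So the current process's task_struct leads to its thread_info, and the address of thread_info plus the size of thread_union (typically 8 KB, i.e. two page frames) is the kernel stack base. As the excerpts below show, when CONFIG_THREAD_INFO_IN_TASK is set, thread_info is instead embedded as the first member of task_struct and thread_union contains only the stack.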

// Excerpt from /linux-4.9.229/include/linux/sched.h, line 1487
struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
	/*
	 * For reasons of header soup (see current_thread_info()), this
	 * must be the first element of task_struct.
	 */
	struct thread_info thread_info;
#endif
...
};

// Excerpt from /linux-4.9.229/include/linux/sched.h
union thread_union {
#ifndef CONFIG_THREAD_INFO_IN_TASK
	struct thread_info thread_info;
#endif
	unsigned long stack[THREAD_SIZE/sizeof(long)];
};
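
The address arithmetic for the classic layout can be tried in user space. The sketch below is a mock, not kernel code: thread_info is a one-field stand-in, THREAD_SIZE is assumed to be 8 KB, and the helper names are invented. It shows both directions: computing the stack base from thread_info, and recovering thread_info from a stack pointer by masking off the low bits, which works because the union is aligned to its own size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192u   /* assumed: two 4 KiB page frames */

struct thread_info { int cpu; };    /* hypothetical stand-in for the real struct */

/* Classic layout: thread_info shares the low end of the stack area. */
union thread_union {
    struct thread_info thread_info;
    unsigned long stack[THREAD_SIZE / sizeof(long)];
};

/* The stack grows downward, so its base is the highest address of the union. */
uintptr_t kernel_stack_base(const union thread_union *u)
{
    return (uintptr_t)u + THREAD_SIZE;
}

/* Any stack pointer inside the union maps back to thread_info by masking. */
struct thread_info *thread_info_from_sp(uintptr_t sp)
{
    return (struct thread_info *)(sp & ~(uintptr_t)(THREAD_SIZE - 1));
}

/* Allocate a mock union aligned to THREAD_SIZE so the masking trick works. */
union thread_union *alloc_thread_union(void)
{
    union thread_union *u = aligned_alloc(THREAD_SIZE, sizeof *u);
    assert(u == NULL || (uintptr_t)u % THREAD_SIZE == 0);
    return u;
}
```

For a union obtained from alloc_thread_union(), a pointer a few bytes below kernel_stack_base(u) passed to thread_info_from_sp should yield &u->thread_info.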

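Finding the kernel stack top pointer (esp): Linux sets up one TSS (task state segment) for each CPU, and the tr register holds the selector for that segment. To make this easier to follow, first look at the tss_struct source, which carries explanatory comments.

// Excerpt from linux-4.9.229/arch/x86/include/asm/processor.h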
struct tss_struct {
	/*
	 * The hardware state:
	 */
	struct x86_hw_tss	x86_tss;

	/*
	 * The extra 1 is there because the CPU will access an
	 * additional byte beyond the end of the IO permission
	 * bitmap. The extra byte must be all 1 bits, and must
	 * be within the limit.
	 */
	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];

#ifdef CONFIG_X86_32
	/*
	 * Space for the temporary SYSENTER stack.
	 */
	unsigned long		SYSENTER_stack_canary;
	unsigned long		SYSENTER_stack[64];
#endif

} ____cacheline_aligned;

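Next, look at x86_hw_tss, the hardware state structure it contains. sp0 is the stack pointer for Ring 0, that is, the kernel stack top pointer. So the tr register leads to the TSS, whose hardware part is the x86_tss member, and the value of sp0 in x86_tss is the kernel stack top pointer.

// Excerpt from linux-4.9.229/arch/x86/include/asm/processor.h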
/* This is the TSS defined by the hardware. */
struct x86_hw_tss {
	unsigned short		back_link, __blh;
	unsigned long		sp0;
	unsigned short		ss0, __ss0h;
	unsigned long		sp1;

	/*
	 * We don't use ring 1, so ss1 is a convenient scratch space in
	 * the same cacheline as sp0.  We use ss1 to cache the value in
	 * MSR_IA32_SYSENTER_CS.  When we context switch
	 * MSR_IA32_SYSENTER_CS, we first check if the new value being
	 * written matches ss1, and, if it's not, then we wrmsr the new
	 * value and update ss1.
	 *
	 * The only reason we context switch MSR_IA32_SYSENTER_CS is
	 * that we set it to zero in vm86 tasks to avoid corrupting the
	 * stack if we were to go through the sysenter path from vm86
	 * mode.
	 */
	unsigned short		ss1;	/* MSR_IA32_SYSENTER_CS */

	unsigned short		__ss1h;
	unsigned long		sp2;
	unsigned short		ss2, __ss2h;
	unsigned long		__cr3;
	unsigned long		ip;
	unsigned long		flags;
	unsigned long		ax;
	unsigned long		cx;
	unsigned long		dx;
	unsigned long		bx;
	unsigned long		sp;
	unsigned long		bp;
	unsigned long		si;
	unsigned long		di;
	unsigned short		es, __esh;
	unsigned short		cs, __csh;
	unsigned short		ss, __ssh;
	unsigned short		ds, __dsh;
	unsigned short		fs, __fsh;
	unsigned short		gs, __gsh;
	unsigned short		ldt, __ldth;
	unsigned short		trace;
	unsigned short		io_bitmap_base;

} __attribute__((packed));
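
As a cross-check of the layout above, the structure can be mirrored in user space with fixed-width types. This sketch assumes the 32-bit case, where unsigned long is 4 bytes and unsigned short is 2 bytes; hw_tss32 and ring0_stack_top are names made up for the example, and the padding fields are renamed to avoid reserved identifiers. Under that assumption the structure is 104 bytes, with sp0 at offset 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the 32-bit hardware TSS, using fixed-width fields. */
struct hw_tss32 {
    uint16_t back_link, blh;
    uint32_t sp0;                   /* Ring 0 (kernel) stack pointer */
    uint16_t ss0, ss0h;
    uint32_t sp1;
    uint16_t ss1, ss1h;
    uint32_t sp2;
    uint16_t ss2, ss2h;
    uint32_t cr3, ip, flags;
    uint32_t ax, cx, dx, bx, sp, bp, si, di;
    uint16_t es, esh, cs, csh, ss, ssh, ds, dsh, fs, fsh, gs, gsh;
    uint16_t ldt, ldth;
    uint16_t trace, io_bitmap_base;
} __attribute__((packed));

/* Compile-time checks of the layout described in the text. */
_Static_assert(sizeof(struct hw_tss32) == 104, "32-bit TSS is 104 bytes");
_Static_assert(offsetof(struct hw_tss32, sp0) == 4, "sp0 follows back_link");

/* The kernel stack top is simply the sp0 field of the TSS. */
uint32_t ring0_stack_top(const struct hw_tss32 *tss)
{
    assert(tss != NULL);
    return tss->sp0;
}
```

In the real kernel the CPU reads sp0 from the TSS by itself on a privilege change; the helper here only illustrates which field supplies the value.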