Linux kernel process management: the process "kernel stack", the current macro, and the process descriptor

weixin_39618275

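Linux process kernel stack

Concepts

Over its lifetime, a process frequently traps into the kernel through system calls (SYSCALL). Once it has trapped into the kernel, the kernel code does not run on the original user-space stack; it runs on a separate stack in kernel space, known as the process's "kernel stack".

Each task's stack therefore has two parts, a user stack and a kernel stack. The kernel defines the process kernel stack as: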

union thread_union {

struct thread_info thread_info;

unsigned long stack[THREAD_SIZE/sizeof(long)];

};

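The kernel stack size of each task is THREAD_SIZE:

x86 (32-bit):

#define THREAD_SIZE_ORDER 1

#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)

so the stack is 8 KB.

x86_64:

#define THREAD_SIZE_ORDER (2 + KASAN_STACK_ORDER)

#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)

PAGE_SIZE defaults to 4 KB and KASAN_STACK_ORDER is 0 when KASAN is not enabled, so the stack is 16 KB.

ARM: 8 KB

ARM64: 16 KB

In short, the kernel stack is 8 KB on 32-bit systems and 16 KB on 64-bit systems.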

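What is thread_info for?

The main kernel data structures associated with a process are the process descriptor task_struct, thread_info, and mm_struct. The union thread_union above contains a thread_info. Everyone knows the process descriptor task_struct, so what is thread_info for?

In the Linux kernel, task_struct and thread_info both hold per-process information, that is, the process control block (PCB). Different architectures need to store different information, so Linux keeps the generic part in task_struct and the architecture-specific part in thread_info. This is why struct task_struct is defined in include/linux/sched.h, while thread_info is defined in architecture-specific headers under arch/.

How thread_info, the kernel stack, and task_struct relate

All three are key, closely related data structures that serve a process. Excerpts of their kernel definitions: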

struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
	struct thread_info thread_info;
#endif
	… …
	void *stack;
	… …
};

/* * */

union thread_union {

#ifndef CONFIG_ARCH_TASK_STRUCT_ON_STACK

struct task_struct task;

#endif

#ifndef CONFIG_THREAD_INFO_IN_TASK

struct thread_info thread_info;

#endif

unsigned long stack[THREAD_SIZE/sizeof(long)];

};

/* x86 */
struct thread_info {
	unsigned long flags;		/* low level flags */
	u32 status;			/* thread synchronous flags */
};

/* ARM */
struct thread_info {
	unsigned long flags;		/* low level flags */
	int preempt_count;		/* 0 => preemptable, <0 => bug */
	mm_segment_t addr_limit;	/* address limit */
	struct task_struct *task;	/* main task structure */
	… …
};

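Depending on whether the config option CONFIG_THREAD_INFO_IN_TASK is set, the three are linked in one of two ways:

(1) thread_info lives in the process kernel stack

When CONFIG_THREAD_INFO_IN_TASK = N, thread_info and the stack array sit in the same union thread_union and share one block of memory, so thread_info lives in the physical page frames that hold the stack.

The member "void *stack" of the process descriptor task_struct points to the kernel stack. On ARM, struct thread_info additionally has a member "struct task_struct *task" that points back to the process descriptor, while on x86 it does not. Early 3.x kernels did have a task_struct pointer in the x86 thread_info as well; later versions removed it, for reasons covered in the discussion of the current macro below.

The relationship among the three is then as follows (on x86 there is no info.task pointer): task_struct.stack points to the kernel stack, and thread_info, at the base of that stack, points back to the descriptor through info.task.

Because thread_info and stack form a union, the address of thread_info is the base address of the stack's page frames. So once we have the address held in the sp register while running on the current process's kernel stack, rounding it down to a THREAD_SIZE boundary gives the base address of thread_info (analyzed in detail in the section on the current macro).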

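(2) thread_info lives in the process descriptor (task_struct)

When CONFIG_THREAD_INFO_IN_TASK = Y, thread_info is the first member of struct task_struct. union thread_union then contains only the stack, so the stack and thread_info no longer share memory. task.stack still exists. The relationship among the three can be pictured as:

Figure 2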

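(3) One caveat: the task_struct.stack pointer in the process descriptor points to the base address of the stack memory region, that is, the base of the thread_union.stack array (its lowest address). It is neither the stack top nor the stack bottom: the stack top is held in the rsp register, and the stack bottom is task_struct.stack + THREAD_SIZE. Keep this in mind when using the pointer in code.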

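The current macro

Kernel code constantly uses the current macro to obtain the struct task_struct of the current process, and its implementation is tied to the process kernel stack. Having covered how thread_info, task_struct, and the kernel stack relate, we can now look at how current is implemented. Because the kernel stack is architecture-specific, this article examines source excerpts for ARM, ARM64, and x86 in turn:

1. ARM

In the ARM source, the CONFIG_THREAD_INFO_IN_TASK option mentioned earlier is off and is not exposed through Kconfig. In other words, on 32-bit ARM the thread_info structure is always in the process kernel stack. The following implementation of current applies to every architecture that keeps thread_info in the kernel stack: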

//arch/arm/include/asm/thread_info.h

static inline struct thread_info *current_thread_info(void)

{

return (struct thread_info *)

(current_stack_pointer & ~(THREAD_SIZE - 1));

}

//include/asm-generic/current.h

#define get_current() (current_thread_info()->task)

#define current get_current()

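The code first reads the current stack address from the stack pointer register sp, then masks off the low bits. Because the stack is THREAD_SIZE-aligned, this yields the address of the struct thread_info at the low end of the stack memory region, and info->task is the process descriptor of the current process.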

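2. ARM64

ARM64 has many more general-purpose registers, and passing the process descriptor in a register is clearly more efficient. On ARM64, therefore, the current macro no longer derives the descriptor address from a stack offset; it uses a dedicated register instead: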

//arch/arm64/include/asm/current.h

static __always_inline struct task_struct *get_current(void)

{

unsigned long sp_el0;

asm ("mrs %0, sp_el0" : "=r" (sp_el0));

return (struct task_struct *)sp_el0;

}

#define current get_current()

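ARM64 uses sp_el0 to hold the address of the process descriptor; it is updated on every process switch.

sp is the stack pointer register. An ARM64 CPU runs at one of four exception levels, el0, el1, el2, and el3; el0 is user space and el1 is kernel space. sp_el0 is the user-space stack pointer, which is not needed for stack accesses while the CPU executes in the kernel, so the kernel can borrow it. This article does not go into further detail; interested readers can consult the blog post《ARMv8学习》.

3. x86

In early kernels (2.x, 3.x), thread_info still contained a pointer to struct task_struct, so x86 could also use the same lookup as 32-bit ARM (when CONFIG_THREAD_INFO_IN_TASK = N). On x86, however, the Linux kernel has long used a different approach: a per-CPU variable, current_task, stores the process descriptor (struct task_struct) of the task currently running on that CPU. The source is: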

//arch/x86/include/asm/current.h

DECLARE_PER_CPU(struct task_struct *, current_task);

static __always_inline struct task_struct *get_current(void)

{

return this_cpu_read_stable(current_task);

}

#define current get_current()

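x86 has few general-purpose registers, so it cannot dedicate one to holding the address of the task_struct as ARM64 does. Because the per-CPU variable current_task records the task_struct of the running process, it must be updated on every process switch. The __switch_to function in arch/x86/kernel/process_64.c contains the following line to update this per-CPU variable: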

this_cpu_write(current_task, next_p);

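SYSCALL calling convention

To keep this short, only x86_64 is used to analyze the SYSCALL calling sequence and the kernel stack layout. Stack frames on the kernel stack have the same structure as in user space; see the earlier article《x86栈帧原理》.

Because a syscall is a special kind of procedure call that involves a stack switch, it differs from a user-space call in two ways:

(1) Besides the frames of kernel-space calls, the process kernel stack must also save the user-space stack information and return address, so that execution can resume in user space.

(2) The register calling convention is different. The user-space convention was covered in the previous article《x86通用寄存器》. The kernel's SYSCALL convention follows the C ABI except where noted, and is specified as follows: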

Registers on entry:

* rax system call number

* rcx return address

* r11 saved rflags (note: r11 is callee-clobbered register in C ABI)

* rdi arg0

* rsi arg1

* rdx arg2

* r10 arg3 (needs to be moved to rcx to conform to C ABI)

* r8 arg4

* r9 arg5

* (note: r12-r15, rbp, rbx are callee-preserved in C ABI)

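The main difference is that SYSCALL uses the rcx register to save the value of rip (the return address), so the fourth argument is passed in r10 instead. An example of argument use in the kernel:

Figure 3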

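x86_64 process stack switching

The long discussion above of how thread_info relates to the stack, and of the calling convention, was groundwork for the real subject of this article: the kernel stack. A process switches stacks when it traps into the kernel through syscall, and walking through that switch lets us build up the layout of the kernel stack step by step.

Since the process kernel stack is closely tied to the architecture, only x86_64 is analyzed here. First, an important data structure: struct pt_regs. The Linux kernel uses it to lay out the kernel stack: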

//arch/x86/include/asm/ptrace.h

struct pt_regs {

/*
 * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
 * unless syscall needs a complete, fully filled "struct pt_regs".
 */

unsigned long r15;

unsigned long r14;

unsigned long r13;

unsigned long r12;

unsigned long rbp;

unsigned long rbx;

/* These regs are callee-clobbered. Always saved on kernel entry. */

unsigned long r11;

unsigned long r10;

unsigned long r9;

unsigned long r8;

unsigned long ax;

unsigned long cx;

unsigned long dx;

unsigned long si;

unsigned long di;

unsigned long orig_ax;

/* Return frame for iretq */

unsigned long ip;

unsigned long cs;

unsigned long flags;

unsigned long sp;

unsigned long ss;

/* top of stack page */

};

The kernel stack stores the user-space values of the registers in exactly this order; the source walk-through below shows how.

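The kernel's SYSCALL entry code is in entry_64.S. To understand the stack layout, we need to see which stack operations the CPU performs after trapping into the kernel. Part of the entry assembly: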

//arch/x86/entry/entry_64.S

ENTRY(entry_SYSCALL_64)

UNWIND_HINT_EMPTY

/* Interrupts are off on entry. */

swapgs

	/* Save the user stack pointer in the per-CPU variable rsp_scratch */
	movq	%rsp, PER_CPU_VAR(rsp_scratch)

	/* Switch to the process kernel stack */
	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp

	/* Construct struct pt_regs on the stack, in reverse order */
	pushq	$__USER_DS			/* pt_regs->ss */
	pushq	PER_CPU_VAR(rsp_scratch)	/* pt_regs->sp */
	pushq	%r11				/* pt_regs->flags */
	pushq	$__USER_CS			/* pt_regs->cs */
	pushq	%rcx				/* pt_regs->ip */
GLOBAL(entry_SYSCALL_64_after_hwframe)
	/* rax holds the system call number */
	pushq	%rax				/* pt_regs->orig_ax */

	PUSH_AND_CLEAR_REGS rax=$-ENOSYS

	TRACE_IRQS_OFF

	/* Load the arguments into registers and call do_syscall_64 */
	movq	%rax, %rdi
	movq	%rsp, %rsi
	call	do_syscall_64		/* returns with IRQs disabled */

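(1) The instruction "movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp" loads the address of the process kernel stack into the stack pointer register, which completes the switch from the user stack to the process kernel stack.

(2) The user-space registers are then pushed one by one, matching the members of struct pt_regs above (the order is fixed and reversed). Three points to note:

1) %rcx is saved in the pt_regs->ip slot because, according to the Intel SDM, syscall stores the address of the next instruction (the current rip) in rcx and then loads rip from IA32_LSTAR. The next user-space instruction to execute is therefore taken from %rcx.

2) The system call number (the index into sys_call_table) is held in %rax.

3) The PUSH_AND_CLEAR_REGS macro contains the pushes for the remaining registers. It expands as follows: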

//arch/x86/entry/calling.h

.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0

.if \save_ret

pushq %rsi /* pt_regs->si */

movq 8(%rsp), %rsi /* temporarily store the return address in %rsi */

movq %rdi, 8(%rsp) /* pt_regs->di (overwriting original return address) */

.else

pushq %rdi /* pt_regs->di */

pushq %rsi /* pt_regs->si */

.endif

pushq \rdx /* pt_regs->dx */

xorl %edx, %edx /* nospec dx */

pushq %rcx /* pt_regs->cx */

xorl %ecx, %ecx /* nospec cx */

pushq \rax /* pt_regs->ax */

pushq %r8 /* pt_regs->r8 */

xorl %r8d, %r8d /* nospec r8 */

pushq %r9 /* pt_regs->r9 */

xorl %r9d, %r9d /* nospec r9 */

pushq %r10 /* pt_regs->r10 */

xorl %r10d, %r10d /* nospec r10 */

pushq %r11 /* pt_regs->r11 */

xorl %r11d, %r11d /* nospec r11*/

/* The registers below are callee-saved in the C ABI; older kernels could leave these slots unfilled */

pushq %rbx /* pt_regs->rbx */

xorl %ebx, %ebx /* nospec rbx*/

pushq %rbp /* pt_regs->rbp */

xorl %ebp, %ebp /* nospec rbp*/

pushq %r12 /* pt_regs->r12 */

xorl %r12d, %r12d /* nospec r12*/

pushq %r13 /* pt_regs->r13 */

xorl %r13d, %r13d /* nospec r13*/

pushq %r14 /* pt_regs->r14 */

xorl %r14d, %r14d /* nospec r14*/

pushq %r15 /* pt_regs->r15 */

xorl %r15d, %r15d /* nospec r15*/

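On x86_64, rbx, rbp, r12, r13, r14, and r15 do not strictly have to be saved on the kernel stack (their slots must still be reserved so that accesses stay in bounds) and could be saved only when needed; later Linux versions save all of them unconditionally.

(3) Unlike IA32, the x86_64 kernel stack reserves no padding at its top (TOP_OF_KERNEL_STACK_PADDING, which is 8 or 16 bytes on 32-bit). On x86_64 every register on the kernel stack during SYSCALL is pushed by software, so there is no case where the hardware might skip some pushes and padding is needed to keep accesses in bounds. The kernel's definition of TOP_OF_KERNEL_STACK_PADDING: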

/* x86_64 has a fixed-length stack frame */

#ifdef CONFIG_X86_32

# ifdef CONFIG_VM86

# define TOP_OF_KERNEL_STACK_PADDING 16

# else

# define TOP_OF_KERNEL_STACK_PADDING 8

# endif

#else

# define TOP_OF_KERNEL_STACK_PADDING 0

#endif

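On x86_64, the relationship among the Linux kernel stack, struct pt_regs, the current macro, and struct task_struct is summarized in the figure below:

Figure 4

Figure 4 as a whole shows the complete stack switch for a Linux SYSCALL on x86_64. The first column of the table in the figure lists the members of struct pt_regs in reverse order, the second column lists the registers pushed in turn after the stack switch, and the third column gives the kind of data each register holds.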

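References

All source code quoted in this article comes from Linux kernel 4.18.0.

Linux scheduling: the magical current

This article introduces the current macro, which appears throughout the Linux kernel, and analyzes both its generic implementation and its implementation on x86-64.

What current does

Inside the kernel, accessing a task usually requires a pointer to its struct task_struct. In fact, most kernel code that deals with processes works through struct task_struct, so being able to find the descriptor of the currently running process through the current macro is essential. The implementation of the macro depends on the hardware architecture. Some architectures can dedicate a register to holding the pointer to the current process's struct task_struct, which speeds up access. Others, such as x86, where registers are scarce, can only create a struct thread_info at the low end of the kernel stack and locate struct task_struct indirectly by computing an offset.

Generic implementation of current

From the value of the esp register and the size of the kernel stack, it is therefore easy to compute the lowest address of the kernel stack, which is also the address of the process's struct thread_info. The relevant code is: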

#ifndef __ASM_GENERIC_CURRENT_H

#define __ASM_GENERIC_CURRENT_H

#include

#define get_current() (current_thread_info()->task)

#define current get_current()

#endif /* __ASM_GENERIC_CURRENT_H */

/* how to get the current stack pointer from C */

/* how to get the thread information struct from C */

static inline struct thread_info *current_thread_info(void)

{

return (struct thread_info *)

(current_stack_pointer & ~(THREAD_SIZE - 1));

}

Implementation of current on x86

Building on the above, the x86 architecture implements the current macro in a further optimized way:

#ifndef _ASM_X86_CURRENT_H

#define _ASM_X86_CURRENT_H

#include

#ifndef __ASSEMBLY__

struct task_struct;

DECLARE_PER_CPU(struct task_struct *, current_task);

static __always_inline struct task_struct *get_current(void)

{

return this_cpu_read_stable(current_task);

}

#define current get_current()

#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_CURRENT_H */

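On x86, the per-CPU variable current_task stores the struct task_struct of the process currently running on that CPU. Because current_task records the task_struct of the running process, it has to be updated on every process switch.

The __switch_to function in arch/x86/kernel/process_64.c contains the following line: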

this_cpu_write(current_task, next_p);

Note: in earlier kernels, obtaining struct task_struct through current_thread_info()->task was also supported on x86. Recent kernels no longer support it, because thread_info there no longer has a task member:

struct thread_info {

unsigned long flags;

u32 status;

}

SIZE: 16

Experiment

Note: this example was run on a kernel where x86 still supports current_thread_info()->task.

x86 with support for current_thread_info()->task

#include <linux/init.h>

#include <linux/module.h>

#include <linux/sched.h>

static int __init test_thread_info_init(void)

{

struct thread_info *ti = NULL;

struct task_struct *head = NULL;

printk(KERN_ALERT "[Hello] test_thread_info \n");

ti = (struct thread_info*)((unsigned long)&ti & ~(THREAD_SIZE - 1));

head = ti->task;

printk("kernel stack size = %lx\n", THREAD_SIZE);

printk("name is %s\n", head->comm);

return 0;

}

static void __exit test_thread_info_exit(void)

{

printk(KERN_ALERT "[Goodbye] test_thread_info\n");

}

module_init(test_thread_info_init);

module_exit(test_thread_info_exit);

MODULE_LICENSE("GPL");

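In the module init code above, ti is a local variable and therefore lives on the kernel stack, so masking its address with ~(THREAD_SIZE - 1) yields the address of the struct thread_info.

After the module is inserted, it prints the process name insmod, which matches expectations.

Verifying the relationship between current_task and thread_info

Method:

(1) Start a stress process that keeps a CPU busy.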

# stress -c 1

(2) Find the PID of the stress process and use taskset to pin it to cpu1.

# ps aux | grep stress

root 3427 0.0 0.0 7308 424 pts/2 S+ 15:25 0:00 stress -c 1

root 3428 99.9 0.0 7308 100 pts/2 R+ 15:25 6:21 stress -c 1

root 3918 0.0 0.0 112708 968 pts/3 S+ 15:31 0:00 grep --color=auto stress

# taskset -p 02 3428

pid 3428's current affinity mask: f

pid 3428's new affinity mask: 2

We can now inspect how these pieces of data relate using crash:

crash> p current_task:1

per_cpu(current_task, 1) = $1 = (struct task_struct *) 0xffff95c498211fc0

crash> task_struct.comm 0xffff95c498211fc0

comm = "stress\000\000\060\000\000\000\000\000\000"

crash> task_struct.stack 0xffff95c498211fc0

stack = 0xffff95c407c28000

crash> thread_info.task 0xffff95c407c28000

task = 0xffff95c498211fc0

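The descriptor of the process running on cpu1 is at 0xffff95c498211fc0.

Its process name is stress, as expected.

From the descriptor's stack field, the lowest address of the process's stack is 0xffff95c407c28000, which is also the address of thread_info.

The task field of thread_info has the same value as current_task:1.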

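Linux scheduling: the process descriptor

On Linux, every process has its own process descriptor, represented by struct task_struct, which describes everything about a particular process. This article introduces the process descriptor in detail.

The process descriptor: task_struct

struct task_struct is relatively large, about 4.1 KB on 64-bit systems. Considering that it contains all the information the kernel needs to manage a process, though, that is fairly small.

The kernel needs a very efficient way to get a process's struct task_struct. In the kernel version examined here, a struct thread_info is created at the low end of the kernel stack (the end the stack grows toward, for a downward-growing stack):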

struct thread_info {

struct task_struct *task; /* main task structure */

struct exec_domain *exec_domain; /* execution domain */

__u32 flags; /* low level flags */

__u32 status; /* thread synchronous flags */

__u32 cpu; /* current CPU */

int preempt_count; /* 0 => preemptable,

<0 => BUG */

mm_segment_t addr_limit;

struct restart_block restart_block;

void __user *sysenter_return;

#ifdef CONFIG_X86_32

unsigned long previous_esp; /* ESP of the previous stack in

case of nested (IRQ) stacks */

__u8 supervisor_stack[0];

#endif

unsigned int sig_on_uaccess_error:1;

unsigned int uaccess_err:1; /* uaccess failed */

};

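The task field of this structure holds a pointer to the task's actual task_struct, while the stack field of struct task_struct points to the low end of the process's kernel stack (for a downward-growing stack).

The relationship between task_struct and the kernel stack is as follows: task_struct.stack points to the stack region, and thread_info.task, at the base of that region, points back to task_struct.

On x86-64, the process kernel stack is 16 KB and is represented by the following data structure: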

union thread_union {

struct thread_info thread_info;

unsigned long stack[THREAD_SIZE/sizeof(long)];

};

We can use the crash tool to examine how the stack field of struct task_struct relates to the process's thread_info. Here I looked at the init process (PID 1):

crash> union thread_union

union thread_union {

struct thread_info thread_info;

unsigned long stack[2048];

}

SIZE: 16384 // this shows that the kernel stack is 16 KB

crash> task -R stack 1

PID: 1 TASK: ffff95c499450000 CPU: 1 COMMAND: "systemd"

stack = 0xffff95c49944c000,

crash> thread_info.task 0xffff95c49944c000

task = 0xffff95c499450000

Author laoqinren

LastMod 2019-11-16
