Linux process task_struct

  • Learning the Linux task_struct

  The Linux kernel manages a process through the task_struct structure, which holds all the information a process needs. It is defined in include/linux/sched.h.

1. Process state

    include/linux/sched.h:
    68 /* Used in tsk->state: */
    69 #define TASK_RUNNING            0x0000
    70 #define TASK_INTERRUPTIBLE      0x0001
    71 #define TASK_UNINTERRUPTIBLE        0x0002                                                                                  
    72 #define __TASK_STOPPED          0x0004
    73 #define __TASK_TRACED           0x0008
    74 /* Used in tsk->exit_state: */
    75 #define EXIT_DEAD           0x0010
    76 #define EXIT_ZOMBIE         0x0020
    77 #define EXIT_TRACE          (EXIT_ZOMBIE | EXIT_DEAD)
    78 /* Used in tsk->state again: */
    79 #define TASK_PARKED         0x0040
    80 #define TASK_DEAD           0x0080
    81 #define TASK_WAKEKILL           0x0100
    82 #define TASK_WAKING         0x0200
    83 #define TASK_NOLOAD         0x0400
    84 #define TASK_NEW            0x0800
    85 #define TASK_STATE_MAX          0x1000
    86 
    87 /* Convenience macros for the sake of set_current_state: */
    88 #define TASK_KILLABLE           (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
    89 #define TASK_STOPPED            (TASK_WAKEKILL | __TASK_STOPPED)
    90 #define TASK_TRACED         (TASK_WAKEKILL | __TASK_TRACED)
    91 
    92 #define TASK_IDLE           (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

  The state field of a process takes one of the values listed above; every process in the system is always in exactly one of these states.
Three of them are sleep states:

  • TASK_INTERRUPTIBLE: interruptible sleep
  • TASK_UNINTERRUPTIBLE: uninterruptible sleep
  • TASK_KILLABLE: like TASK_UNINTERRUPTIBLE, except that it can respond to fatal signals (see the sketch after this list).
    • TASK_KILLABLE = TASK_UNINTERRUPTIBLE + TASK_WAKEKILL
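
  These states map onto the usual wait helpers. Below is a minimal, hypothetical kernel-style sketch (the wait queue and wake-up condition are made up for illustration, not taken from this article) showing how code typically enters each sleep state:

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/errno.h>

/* Hypothetical wait queue and condition, for illustration only. */
static DECLARE_WAIT_QUEUE_HEAD(demo_wq);
static int demo_cond;

static int demo_sleep(void)
{
        /* TASK_INTERRUPTIBLE: any signal wakes the task (non-zero return). */
        if (wait_event_interruptible(demo_wq, demo_cond))
                return -ERESTARTSYS;

        /* TASK_KILLABLE: only fatal signals (e.g. SIGKILL) wake the task. */
        if (wait_event_killable(demo_wq, demo_cond))
                return -ERESTARTSYS;

        /* TASK_UNINTERRUPTIBLE: no signal can wake the task. */
        wait_event(demo_wq, demo_cond);
        return 0;
}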

2. Process identifier (PID)

pid_t pid;  
pid_t tgid;  

2.1. Difference between pid, tid and tgid

  • pid: the process ID
  • tid: the thread ID
  • tgid: the thread group ID, i.e. the process ID of the thread group leader; for the leader it equals its pid

  In the kernel, each thread has its own ID, called a PID (although it would
arguably make more sense to call this a TID, or thread ID), and each thread also
has a TGID (thread group ID), which is the PID of the thread that started
the whole process.

  Simplistically, when a new process is created, it appears as a thread
where both the PID and TGID are the same (new) number. When a thread starts another thread, that started thread gets its own PID (so the scheduler can schedule it independently) but it inherits the TGID from the original thread.

  • From the user's point of view, a thread with tid 44 created inside pid 42 belongs to tgid 42 (the process ID of the thread group leader). With the default options of ps and top you cannot even see the tid 44 thread.
  • From the kernel's point of view, tid 42 and tid 44 are independent scheduling units; they can be regarded as "pid 42" and "pid 44".

Case study: pid, tid, tgid code example
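
  The linked case study is not reproduced here; as a stand-in, the following hypothetical user-space sketch (compile with gcc -pthread) prints the pid (tgid) and tid of the main thread and of a second thread:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>

static void *worker(void *arg)
{
        (void)arg;
        /* Same tgid as main(), but a different tid of its own. */
        printf("worker: pid(tgid)=%d tid=%ld\n",
               getpid(), (long)syscall(SYS_gettid));
        return NULL;
}

int main(void)
{
        pthread_t t;

        /* For the thread group leader, pid (tgid) and tid are equal. */
        printf("main:   pid(tgid)=%d tid=%ld\n",
               getpid(), (long)syscall(SYS_gettid));
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);
        return 0;
}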

2.2. Maximum number of PIDs in the kernel

  • PID_MAX_DEFAULT: with CONFIG_BASE_SMALL=0 (the default), the maximum PID is 32768
  • PID_MAX_LIMIT: the kernel's upper limit on PIDs, 4 million
  • PIDS_PER_CPU_DEFAULT: each CPU supports 1024 PIDs by default
  • PIDS_PER_CPU_MIN: each CPU supports at least 8 PIDs
 //include/linux/threads.h 
 28 #define PID_MAX_DEFAULT (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)
 29 
 30 /*
 31  * A maximum of 4 million PIDs should be enough for a while.
 32  * [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.]
 33  */
 34 #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \                                              
 35     (sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
 36 
 37 /*
 38  * Define a minimum number of pids per cpu.  Heuristically based
 39  * on original pid max of 32k for 32 cpus.  Also, increase the
 40  * minimum settable value for pid_max on the running system based
 41  * on similar defaults.  See kernel/pid.c:pidmap_init() for details.
 42  */
 43 #define PIDS_PER_CPU_DEFAULT    1024
 44 #define PIDS_PER_CPU_MIN    8

Implementation:

void __init pidmap_init(void)
{
        /* bump default and minimum pid_max based on number of cpus */
        pid_max = min(pid_max_max, max_t(int, pid_max,
                                PIDS_PER_CPU_DEFAULT * num_possible_cpus()));
        pid_max_min = max_t(int, pid_max_min,
                                PIDS_PER_CPU_MIN * num_possible_cpus());
        pr_info("pid_max: default: %u minimum: %u\n", pid_max, pid_max_min);

        init_pid_ns.pidmap[0].page = kzalloc(PAGE_SIZE, GFP_KERNEL);
        /* Reserve PID 0. We never call free_pidmap(0) */
        set_bit(0, init_pid_ns.pidmap[0].page);
        atomic_dec(&init_pid_ns.pidmap[0].nr_free);

        init_pid_ns.pid_cachep = KMEM_CACHE(pid,
                        SLAB_HWCACHE_ALIGN | SLAB_PANIC);
}

1. Pick the larger value for pid_max
max_t(int, pid_max, PIDS_PER_CPU_DEFAULT * num_possible_cpus())

1.1. The default pid_max
int pid_max = PID_MAX_DEFAULT;
#define PID_MAX_DEFAULT (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)
Note: if CONFIG_BASE_SMALL is not configured, pid_max is 0x8000, i.e. 32768.

1.2. Compute the default total number of PIDs based on the CPU count
#define PIDS_PER_CPU_DEFAULT    1024 // 1024 PIDs per CPU by default
#define num_possible_cpus()     cpumask_weight(cpu_possible_mask) // total number of CPUs
Total PIDs = PIDS_PER_CPU_DEFAULT * num_possible_cpus()

Note: the larger of these two values is chosen. Only when there are more than 32 CPUs (32768/1024) does the CPU-based total take effect; otherwise the default is used.

2. The system-wide PID limit, pid_max_max
int pid_max_max = PID_MAX_LIMIT;
#define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
        (sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))

Note: the final pid_max cannot exceed pid_max_max, i.e. the limit (with CONFIG_BASE_SMALL configured, PAGE_SIZE * 8 PIDs; without it, on 64-bit, 4 * 1024 * 1024 = 4194304 PIDs; otherwise the default PID_MAX_DEFAULT).
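
To make the two-step calculation concrete, here is a hedged user-space sketch that mirrors the max_t()/min() logic above for a 64-bit, non-CONFIG_BASE_SMALL configuration (constants copied from the macros; the CPU counts are made-up examples):

#include <stdio.h>

#define PID_MAX_DEFAULT       0x8000            /* 32768 */
#define PID_MAX_LIMIT         (4 * 1024 * 1024) /* 64-bit, !CONFIG_BASE_SMALL */
#define PIDS_PER_CPU_DEFAULT  1024

static int compute_pid_max(int ncpus)
{
        int pid_max = PID_MAX_DEFAULT;
        int by_cpus = PIDS_PER_CPU_DEFAULT * ncpus;

        /* max_t(int, pid_max, PIDS_PER_CPU_DEFAULT * num_possible_cpus()) */
        if (by_cpus > pid_max)
                pid_max = by_cpus;
        /* min(pid_max_max, ...) */
        if (pid_max > PID_MAX_LIMIT)
                pid_max = PID_MAX_LIMIT;
        return pid_max;
}

int main(void)
{
        /* 16 CPUs -> 32768 (default wins); 64 CPUs -> 65536 (CPU count wins) */
        printf("16 cpus: pid_max=%d\n", compute_pid_max(16));
        printf("64 cpus: pid_max=%d\n", compute_pid_max(64));
        return 0;
}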

3. Process kernel stack

  Every ordinary thread generally has two stacks: one in user space, used while the thread executes in user space, and one in kernel space, used when the thread makes a system call, takes a trap, or when the CPU handles an interrupt while running the thread.

  Every process in the system has its own user address space, but there is only one kernel address space, so the kernel stack is kept small. For the Linux kernel it is 8 KB on 32-bit and 16 KB on 64-bit.

  Why was the thread_info structure introduced?

  One major reason the kernel introduced thread_info is that it gives a direct way to find the task_struct pointer of a process (or thread). Because the thread_info structure sits at the low-address end of the kernel stack, knowing the start address of the kernel stack is enough to obtain the thread_info, and from it the task_struct.

  The Linux kernel-mode stack follows a special convention: it is always allocated aligned to its own size, and a thread_info structure is placed at the lowest address of the stack. The thread_info and the process's kernel-mode stack live together in a single memory region allocated for the process.

  Because this memory region holds both the thread_info and the stack, it is defined as a union. The relevant data structures are:

union thread_union {
    struct thread_info thread_info;
    unsigned long stack[THREAD_SIZE/sizeof(long)];
};

struct thread_info {
    unsigned long        flags;        /* low level flags */
    mm_segment_t        addr_limit;    /* address limit */
    struct task_struct    *task;        /* main task structure */
    int            preempt_count;    /* 0 => preemptable, <0 => bug */
    int            cpu;        /* cpu */
};
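
  With this layout, the task_struct can be recovered from nothing more than the stack pointer. The following is a simplified, illustrative sketch of the historical lookup path (the real pre-4.9 x86 code differs in detail):

/* The kernel stack is THREAD_SIZE-aligned, so masking the stack pointer
 * yields the thread_info at the bottom of the stack (illustrative only). */
static inline struct thread_info *stack_to_thread_info(unsigned long sp)
{
        return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

/* task_struct of the task owning this stack (simplified illustration). */
#define task_from_stack(sp)  (stack_to_thread_info(sp)->task)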

  THREAD_SIZE is the size of the entire kernel stack. A stack can grow downward (stack base at the high address) or upward (stack base at the low address); the analysis below assumes downward growth. The kernel stack can be divided into four parts, from the low address upward:

  • the thread_info structure
  • an overflow marker
  • the actual usable stack space, from the overflow marker up to kernel_stack; kernel_stack is a per-cpu variable through which the start address of the kernel stack can be found indirectly
  • a reserved area of length KERNEL_STACK_OFFSET, from kernel_stack to the stack base
  Starting with Linux 4.1, x86 removed kernel_stack and gradually simplified the thread_info structure; since Linux 4.9 the task_struct pointer is no longer obtained through thread_info at all, but is stored directly in the per-cpu current_task variable.
   //arch/x86/include/asm/current.h
   11 DECLARE_PER_CPU(struct task_struct *, current_task);
   12 
   13 static __always_inline struct task_struct *get_current(void)
   14 {
   15     return this_cpu_read_stable(current_task);                                                                             
   16 }
   17 #define current get_current()
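
  Kernel code can therefore use the current macro to reach the task_struct of the task it is running on behalf of. A minimal hypothetical usage example:

#include <linux/sched.h>
#include <linux/printk.h>

/* Hypothetical helper: log the identity of the calling task. */
static void demo_show_current(void)
{
        pr_info("comm=%s pid=%d tgid=%d\n",
                current->comm, current->pid, current->tgid);
}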

  The kernel stack is laid out with the stack pointer at the top of each stack (at the highest stack address), growing downward for each function call and stack allocation. The thread_info structure for a process is at the bottom of the stack. There is no physical mechanism to detect, at allocation time, if the stack pointer wanders into the thread_info area of the stack. Hence, if the stack overflows (the stack pointer goes into the thread_info area), the behavior of the system is undefined. Refer to the commit “arm64: split thread_info from task stack”.

4. Interrupt stack

  The setup of an interrupt handler’s stacks is a configuration option. Historically, interrupt handlers did not receive their own stacks. Instead, they would share the stack of the process that they interrupted. Note that a process is always running: when nothing else is schedulable, the idle task runs. Because they share that stack, interrupt handlers must be exceptionally frugal with the data they allocate there.

  All interrupts are handled by the kernel, each by the interrupt handler written for that particular interrupt. Interrupt handlers can have dedicated IRQ stacks, and whether these are set up is a configuration option. The size of the kernel stack might not always be enough for the kernel's work plus the space required by IRQ processing routines. Hence two additional stacks come into the picture:

  • Hardware IRQ Stack.
  • Software IRQ Stack.

  In contrast to the regular kernel stack that is allocated per process, the two additional stacks are allocated per CPU. Whenever a hardware interrupt occurs (or a softIRQ is processed), the kernel needs to switch to the appropriate stack.

  Historically, interrupt handlers did not receive their own stacks. Instead, interrupt handlers would share the stack of the running process that they interrupted. The kernel stack is two pages in size; typically, that is 8KB on 32-bit architectures and 16KB on 64-bit architectures. Because in this setup interrupt handlers share the stack, they must be exceptionally frugal with what data they allocate there. Of course, the kernel stack is limited to begin with, so all kernel code should be cautious.

4.1. x86_64 interrupt stack

  Like all other architectures, x86_64 has a kernel stack for every active thread. These thread stacks are THREAD_SIZE (2*PAGE_SIZE) big. These stacks contain useful data as long as a thread is alive or a zombie. While the thread is in user space the kernel stack is empty except for the thread_info structure at the bottom.

  In addition to the per thread stacks, there are specialized stacks associated with each CPU. These stacks are only used while the kernel is in control on that CPU; when a CPU returns to user space the specialized stacks contain no useful data. The main per-CPU stack is the interrupt stack.

  • Used for external hardware interrupts. If this is the first external
    hardware interrupt (i.e. not a nested hardware interrupt) then the
    kernel switches from the current task to the interrupt stack. Like
    the split thread and interrupt stacks on i386, this gives more room
    for kernel interrupt processing without having to increase the size
    of every per thread stack.

    The interrupt stack is also used when processing a softirq.

  Switching to the kernel interrupt stack is done by software based on a per CPU interrupt nest counter. This is needed because x86-64 “IST” hardware stacks cannot nest without races.

The x86_64 interrupt stack is 16 KB, or 32 KB when KASAN is enabled, as shown below:

   //arch/x86/include/asm/page_64_types.h:
    9 #ifdef CONFIG_KASAN
   10 #define KASAN_STACK_ORDER 1
   11 #else
   12 #define KASAN_STACK_ORDER 0
   13 #endif
   25 #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER)
   26 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
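
With a 4 KB PAGE_SIZE these macros evaluate to 16 KB without KASAN and 32 KB with it. A small user-space sketch of the same arithmetic (the page size is an assumption for illustration):

#include <stdio.h>

#define PAGE_SIZE 4096UL   /* assumed 4 KB page */

int main(void)
{
        unsigned long no_kasan   = PAGE_SIZE << (2 + 0);  /* KASAN_STACK_ORDER = 0 */
        unsigned long with_kasan = PAGE_SIZE << (2 + 1);  /* KASAN_STACK_ORDER = 1 */

        printf("IRQ stack: %lu KB (no KASAN), %lu KB (KASAN)\n",
               no_kasan / 1024, with_kasan / 1024);
        return 0;
}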

References:
https://lwn.net/Articles/84583/
https://elinux.org/Kernel_Small_Stacks
https://www.kernel.org/doc/Documentation/x86/kernel-stacks
