The current macro and task stack setup in the Linux kernel

Latest recommended article published on 2024-09-15 14:59:43

塵觴葉


Category: Miscellany. Article tags: linux kernel stack arm

Article link: https://blog.csdn.net/yeholmes/article/details/108288035



Getting the task handle in kernel context

In the Linux kernel, a running process or kernel thread is usually represented by a pointer to a task_struct structure, which holds almost all information related to that process or kernel thread. When a user-space application runs in kernel context, the kernel has to fetch its task_struct in order to access process-related data. The Linux kernel has an ingenious solution to this problem, which lets running code acquire a pointer to task_struct at lightning speed: a C macro named current. Taking Linux/ARMv7-a as an example, the related macros and functions are defined as:

current and other macro definitions
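
As a rough user-space illustration of the idea behind these definitions, the sketch below rounds an address inside an 8192-byte, 8192-byte-aligned block down to the block's base and reads a task pointer stored there. All names (demo_task, demo_thread_info, demo_thread_union, demo_current_from_sp) are hypothetical stand-ins, the structure layouts are simplified, and the stack pointer is passed in as a plain integer; this is not the kernel's own code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_THREAD_SIZE 8192   /* 2^13 bytes, the stack size discussed in this article */

static_assert((DEMO_THREAD_SIZE & (DEMO_THREAD_SIZE - 1)) == 0,
              "masking only works for a power-of-two stack size");

/* Hypothetical stand-in for task_struct. */
struct demo_task {
    int pid;
};

/* Hypothetical, simplified stand-in for thread_info. */
struct demo_thread_info {
    unsigned long flags;
    struct demo_task *task;     /* back-pointer to the owning task */
};

/* thread_info shares the low end of the memory block used as the stack. */
union demo_thread_union {
    struct demo_thread_info thread_info;
    unsigned char stack[DEMO_THREAD_SIZE];
};

/* A fake "kernel stack", aligned to its own size. */
static _Alignas(DEMO_THREAD_SIZE) union demo_thread_union demo_stack;

/* Clear the low 13 bits of an address inside the stack to find its base. */
static inline struct demo_thread_info *demo_thread_info_from_sp(uintptr_t sp)
{
    return (struct demo_thread_info *)(sp & ~(uintptr_t)(DEMO_THREAD_SIZE - 1));
}

/* Rough equivalent of what the current macro boils down to. */
static inline struct demo_task *demo_current_from_sp(uintptr_t sp)
{
    return demo_thread_info_from_sp(sp)->task;
}
```

Any address inside demo_stack.stack maps back to the same demo_thread_info, which is why the lookup needs only a mask and a single load.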

Wherever kernel code needs a pointer to the current task, simply writing current will do. On a Linux/ARMv7-a system, the lower 13 bits of the current stack pointer are cleared, and the result is treated as a pointer to a thread_info structure; thread_info has a task member which points to the task_struct of the current process or kernel thread, that is, the task handle. Let’s disassemble a system call, close, to find out how the macros and inline functions are translated into machine code:

close system call

We can verify from the disassembly above that the current stack pointer, sp, is indeed used to obtain the current task handle. An interesting fact is that a file descriptor is treated as an unsigned integer in the kernel. This works because a negative integer passed into the kernel becomes an unsigned integer so large that it cannot index into the process file table, so the invalid file descriptor is detected and EBADF is returned. Next, we have two questions to ask:

On a Linux/ARMv7-a system, when switching between user-space and kernel-space contexts, how does the kernel decide which stack to use? And are there two distinct stacks, used separately by the user-space and kernel-space contexts?
When creating a new user-space process or thread, how does the kernel set up the thread_info and task_struct pointers, so that kernel code can easily access the task_struct pointer by invoking the current macro?
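
Before turning to these questions, the unsigned file descriptor check mentioned above can be illustrated with a small sketch. The table size DEMO_MAX_FDS and the function demo_check_fd are hypothetical and stand in for the kernel's real file table lookup; only the unsigned comparison itself is the point.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define DEMO_MAX_FDS 64u   /* hypothetical size of the per-process file table */

/* A negative int converted to unsigned int wraps to a very large value. */
static_assert((unsigned int)-1 == UINT_MAX, "-1 converts to UINT_MAX");

/*
 * Because fd is unsigned, one comparison rejects both descriptors that
 * are too large and descriptors that were negative in user space.
 * Returns 0 if fd could index the table, -EBADF otherwise.
 */
static int demo_check_fd(unsigned int fd)
{
    if (fd >= DEMO_MAX_FDS)
        return -EBADF;
    return 0;
}
```

A caller that passes -1 therefore takes the same error path as one that passes an out-of-range positive number, with no separate sign test.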

The first question is easy to answer with a moment’s thought. User-space and kernel-space contexts have to use two distinct stacks: a user-space application cannot access kernel memory directly, so a different stack must be used after switching into the kernel. To answer the second question, a simple application was written which creates a sub-thread once it starts, so that we can debug the Linux kernel and deepen our understanding of it.

Creating a new kernel stack for a new task

The debugging session was carried out with QEMU, which loads a kernel zImage and runs it. After the user-space application was invoked, the Linux kernel stopped at a breakpoint placed at the very beginning of the clone system call:

User-space application requests the kernel to create a thread

Note that the clone system call can also be used to create a new process, so we double-checked that clone_flags instructs the kernel to create a new thread for the user-space application. Browsing the kernel sources, we can be certain that the newly created thread’s stack is assigned in kernel/fork.c, line 871, so another breakpoint is added there:

Find the newly created task_struct and its stack

Registers r6 and r8 hold the task_struct pointer and the newly created kernel stack pointer respectively: 0x9e63ae00 and 0x9df0a000. One more word about kernel stack allocation: on Linux/ARMv7-a systems, kernel stacks are usually 8192 bytes in size and 8192-byte aligned. The alignment is an interesting feature, but it exists for technical reasons. Recall that when acquiring the task_struct pointer, the lower 13 bits of the stack pointer are cleared; this only yields the base of the stack if the stack is 8192-byte aligned. We can infer from the assembly instruction str r8, [r6, #4] that the offset of the stack member in the task_struct structure is 4 bytes, which can be confirmed from the kernel source code:

/* include/linux/sched.h */
struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
    /*
     * For reasons of header soup (see current_thread_info()), this
     * must be the first element of task_struct.
     */
    struct thread_info      thread_info;
#endif
    /* -1 unrunnable, 0 runnable, >0 stopped: */
    volatile long           state;

    /*
     * This begins the randomizable portion of task_struct. Only
     * scheduling-critical items should be added above here.
     */
    randomized_struct_fields_start

    void                *stack; /* CONFIG_THREAD_INFO_IN_TASK is not defined, current offset is 4 bytes*/
    ...
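
The link between the str r8, [r6, #4] instruction and the structure layout can be checked with offsetof. The sketch below uses a hypothetical cut-down structure, task_layout_demo, that holds only the two leading members present when CONFIG_THREAD_INFO_IN_TASK is not defined; it is not the real task_struct. On a 32-bit ARM target long is 4 bytes, so stack lands at offset 4, while a 64-bit host reports 8 for the same code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down layout: only the members ahead of 'stack'. */
struct task_layout_demo {
    volatile long state;   /* -1 unrunnable, 0 runnable, >0 stopped */
    void *stack;           /* base of the task's kernel stack */
};

static_assert(offsetof(struct task_layout_demo, state) == 0,
              "state is the first member in this layout");

/* Byte offset a store to tsk->stack would use on the build target. */
static size_t demo_stack_offset(void)
{
    return offsetof(struct task_layout_demo, stack);
}
```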

From the kernel source code we can infer that the lower end of the newly created stack is actually treated as a thread_info structure, and that the task_struct pointer has to be stored in thread_info. Let’s find out where the store happens:

Use a watchpoint to track thread_info->task

The debugging results show that the new task_struct pointer, 0x9e63ae00, is stored at the beginning of the new stack, at an offset of 12 bytes. Here is the corresponding kernel source, an inline function defined in include/linux/sched/task_stack.h:

Store the task_struct pointer into thread_info

Now we know how the Linux kernel creates a new kernel stack for a new task, and how the two structures refer to each other (from the kernel source code, tsk->stack = stack and task_thread_info(p)->task = p). More importantly, the new kernel stack is 8192-byte aligned, so when the current macro expands, masking the stack pointer always yields the lower end of the new kernel stack, where the thread_info structure is stored.
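
The two assignments and the role of the alignment can be mimicked in user space. The following is a minimal sketch under stated assumptions: the sketch_ types and functions are hypothetical, aligned_alloc from C11 stands in for the kernel's stack allocator, and the stack pointer is simulated by an integer address inside the allocated block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_THREAD_SIZE 8192   /* stack size and alignment, as in the article */

static_assert((SKETCH_THREAD_SIZE & (SKETCH_THREAD_SIZE - 1)) == 0,
              "stack size must be a power of two for the mask to work");

struct sketch_task;

/* Hypothetical thread_info: lives at the lowest address of the stack. */
struct sketch_thread_info {
    struct sketch_task *task;
};

/* Hypothetical task_struct: remembers where its kernel stack starts. */
struct sketch_task {
    void *stack;
};

/* Allocate an aligned stack and link the two structures to each other.
 * Returns 0 on success, -1 if the allocation fails. */
static int sketch_setup_stack(struct sketch_task *tsk)
{
    void *stack = aligned_alloc(SKETCH_THREAD_SIZE, SKETCH_THREAD_SIZE);

    if (stack == NULL)
        return -1;
    tsk->stack = stack;                                /* tsk->stack = stack */
    ((struct sketch_thread_info *)stack)->task = tsk;  /* thread_info->task = tsk */
    return 0;
}

/* Recover the task from any address inside its stack. */
static struct sketch_task *sketch_current(uintptr_t sp)
{
    uintptr_t base = sp & ~(uintptr_t)(SKETCH_THREAD_SIZE - 1);

    return ((struct sketch_thread_info *)base)->task;
}
```

If the block were not aligned to its own size, the mask could land on an address below the allocation, which is the practical reason the alignment matters.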

Setting up contexts for a new task

The new kernel stack has now been allocated and the new task_struct brought into existence, but the new task cannot run immediately. Some architecture-specific setup has to be carried out before the new task is ready to run. We now turn our attention to a function named copy_thread(...), which is called whenever an application forks a child process or creates a sub-thread, or when the kernel creates a kernel thread. The main purpose of copy_thread(...) is to set up the entry function and the top of the stack for the newly created task, by writing the structures that represent the ARMv7-a core registers (notably struct pt_regs for the user-space context, and struct cpu_context_save for the kernel-space context):

copy_thread function

After careful calculation, the stack pointer of the new task is 0x9df0bfb0 when it starts executing at its entry point, which is in fact an assembly function defined in arch/arm/kernel/entry-common.S. Note that on Linux/ARMv7-a systems kernel stacks are usually 8192 bytes, the lower end of the stack stores the thread_info structure, and the stack grows down. The kernel stack is very small compared to a user-space stack, and as kernel developers we should always keep this in mind. For the new task's kernel-space context, registers are written to a struct cpu_context_save structure; this is distinct from struct pt_regs, which is used to store the registers from user-space. Lastly, by adding a breakpoint at the first machine instruction of the function ret_from_fork, we can verify our calculation:

The entry of new task
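
The calculation that leads to 0x9df0bfb0 can be reproduced with a few lines of arithmetic. The sketch below rests on two assumptions that do not appear in the text above and should be checked against the kernel tree in use: that the 32-bit ARM port starts the stack 8 bytes below the end of the stack area, and that struct pt_regs holds 18 32-bit words. The DEMO_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_THREAD_SIZE     8192u                    /* kernel stack size */
#define DEMO_THREAD_START_SP (DEMO_THREAD_SIZE - 8u)  /* assumption: 8 bytes kept free at the top */
#define DEMO_PT_REGS_SIZE    (18u * 4u)               /* assumption: 18 saved 32-bit registers */

static_assert(DEMO_PT_REGS_SIZE == 72u, "pt_regs is assumed to take 72 bytes");

/*
 * The saved user-space registers sit just below the start-of-stack
 * address, and the new task begins executing with its stack pointer
 * at the bottom of that register frame.
 */
static uint32_t demo_initial_sp(uint32_t stack_base)
{
    return stack_base + DEMO_THREAD_START_SP - DEMO_PT_REGS_SIZE;
}
```

With the stack base 0x9df0a000 observed earlier, 0x9df0a000 + 0x1ff8 - 0x48 gives 0x9df0bfb0, the value quoted for the new task.
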
The beautiful assembly code at ret_from_fork takes the newly created task to user-space, where, as mentioned earlier, it runs happily as a sub-thread of our test application. So far we have roughly followed the whole dance of the Linux kernel creating a new task and setting up its kernel stack, which is what allows the current macro to expand correctly and fetch the current task handle. However, how does the Linux kernel keep track of the kernel stack pointer while the application is running in user-space? The answer is that an ARMv7-a core has several stack pointer registers (R13), banked according to CPU execution mode. When an application switches from user-space into kernel-space, the user-space registers are pushed onto the kernel stack (and accessed via struct pt_regs), always near the top of the stack; when switching back to user-space, the saved registers are popped from the kernel stack, which keeps the kernel stack balanced.

Conclusion

For a thread of an application, running in user-space and in kernel-space requires two different stacks. The user-space stack can be chosen by the application (via the clone system call), but the kernel stack is allocated and freed by the Linux kernel.
The current macro in the Linux kernel deserves special attention: the lower end of the kernel stack stores the thread_info structure, which holds a pointer to the task_struct handle (on Linux/ARMv7-a systems, of course).
The entry of an application process or thread, and the entry of a kernel thread, is always ret_from_fork. Kernel stacks are small, and not all of that space is available for use.