This post offers an inside view of Linux processes and threads. In addition to explaining the relevant theory and ideas, it shows examples from a real system: specifically, Linux 2.6.28 on ARM11.
This post tries to answer:
1. What is a Linux process / thread?
2. How does Linux represent a process / thread?
3. The life-cycle of a Linux process / thread.
4. The relationships between processes / threads.
This post will NOT consider the task scheduler.
1. What is a Linux process / thread?
According to R. Love, a process is the executing program code as well as associated resources, such as open files, pending signals, a memory address space, and so forth.
Threads are objects of activity within the process.
From my point of view, the definition of a process is quite simple and clear: process = running program (code section) + environment.
Regarding threads, besides the above definition, we need to realize:
a. The Linux kernel schedules individual threads, not processes.
b. To Linux, a thread is just a special kind of process:
Processes provide two virtualizations: a virtualized processor and virtual memory. Threads of the same process share the virtual memory abstraction, whereas each thread receives its own virtualized processor.
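The difference in memory sharing can be observed from user space. The following is a minimal sketch, assuming a Linux system with POSIX threads; the names shared_value, value_after_thread() and value_after_fork() are made up for this illustration and do not come from the kernel. A write made by a thread is visible to its creator, while a write made by a forked child lands in the child's own copy of the address space.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical global used only for this demonstration. */
static int shared_value;

static void *thread_body(void *arg)
{
    (void)arg;
    shared_value = 1;       /* same address space as the creator */
    return NULL;
}

/* Returns what the creator sees after a thread wrote to the global. */
int value_after_thread(void)
{
    pthread_t t;

    shared_value = 0;
    if (pthread_create(&t, NULL, thread_body, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return shared_value;
}

/* Returns what the parent sees after a forked child wrote to the global. */
int value_after_fork(void)
{
    pid_t pid;

    shared_value = 0;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        shared_value = 1;   /* modifies the child's private copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return shared_value;
}
```

Calling value_after_thread() should yield 1 and value_after_fork() should yield 0, which matches the statement that threads share the virtual memory abstraction while processes each get their own.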
2. How does Linux represent a process / thread?
Linux uses struct task_struct (defined in include/linux/sched.h) to describe each task / thread in the system. The struct is quite large (around 1.7KB on a 32-bit machine) because it describes everything about a task: the task's memory space, its state, which files the task can access, which signals the task will handle, and much more. It would be tedious to list all the fields in this post. Instead, only the most important fields (from my point of view) are shown here.
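struct task_struct {
    /* task's state: -1 unrunnable, 0 runnable, >0 stopped */
    volatile long state;
    /* task's kernel stack */
    void *stack;
    ...
    struct list_head tasks;             /* links all task_structs in the system */
    struct mm_struct *mm, *active_mm;   /* address space (mm is NULL for kernel threads) */
    ...
    pid_t pid;                          /* ID of this individual task (thread) */
    pid_t tgid;                         /* thread group ID, shared by all threads of a process */
    ...
    struct task_struct *parent, *real_parent;
    ...
    struct task_struct *group_leader;   /* leader of this task's thread group */
    ...
    struct fs_struct *fs;               /* filesystem information */
    struct files_struct *files;         /* open file information */
};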
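The pid and tgid fields can be seen from user space as well. The sketch below assumes Linux, where getpid() reports the thread group ID and the gettid system call reports the ID of the individual task; struct ids and ids_of_new_thread() are hypothetical names introduced only for this example.

```c
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pair of IDs as seen by one thread. */
struct ids {
    pid_t tgid;   /* what getpid() reports: the thread group ID */
    pid_t tid;    /* what gettid reports: the ID of this task   */
};

static void *record_ids(void *arg)
{
    struct ids *out = arg;

    out->tgid = getpid();
    out->tid = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Starts one extra thread and returns the IDs that thread observed. */
struct ids ids_of_new_thread(void)
{
    struct ids result = { -1, -1 };
    pthread_t t;

    if (pthread_create(&t, NULL, record_ids, &result) == 0)
        pthread_join(t, NULL);
    return result;
}
```

The new thread should report the same tgid as the process that created it, but a tid of its own, which is consistent with the kernel keeping one task_struct per thread.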
Two points about task_struct deserve attention:
a. The structure is allocated via the slab allocator.
b. Inside the kernel, tasks are typically referenced directly by a pointer to their task_struct structure. Consequently, it is useful to be able to quickly look up the descriptor of the currently executing task. To do this, a new structure, struct thread_info, was created and placed at the bottom (the lowest address) of the process's kernel stack.
/* arch/arm/include/asm/thread_info.h */
struct thread_info {
    unsigned long flags;
    int preempt_count;
    mm_segment_t addr_limit;
    struct task_struct *task;
    struct exec_domain *exec_domain;
    ...
};
Note that the member task is a pointer to the task's actual task_struct.
Each process's kernel stack is 8KB (2 page frames) in size and aligned on an 8KB boundary, so we can get the task's thread_info by masking out the 13 lower bits of the stack pointer (8KB = 2^13 bytes). Then we can get the task's descriptor via thread_info->task.
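The masking trick can be tried out in user space. The following is only a model, not kernel code: the types task_model, thread_info_model and kstack_model, as well as both functions, are hypothetical names for this sketch, and an 8KB-aligned heap block stands in for the kernel stack.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192UL   /* 8KB kernel stack, as described in the post */

/* Hypothetical stand-ins for task_struct and thread_info. */
struct task_model { int pid; };
struct thread_info_model { struct task_model *task; };

/* thread_info sits at the lowest address of the stack area. */
union kstack_model {
    struct thread_info_model info;
    unsigned char stack[THREAD_SIZE];
};

/* Clear the 13 low bits of a stack address to find the base of the area. */
static struct thread_info_model *info_from_sp(const void *sp)
{
    return (struct thread_info_model *)
        ((uintptr_t)sp & ~(uintptr_t)(THREAD_SIZE - 1));
}

/* Pretend the stack pointer is `offset` bytes into the area (offset must be
 * smaller than THREAD_SIZE) and recover the task's pid through it. */
int pid_via_stack_pointer(size_t offset)
{
    static struct task_model task = { 42 };
    union kstack_model *ks = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
    int pid;

    if (ks == NULL)
        return -1;
    ks->info.task = &task;
    pid = info_from_sp(ks->stack + offset)->task->pid;
    free(ks);
    return pid;
}
```

Whatever offset inside the 8KB area is used as the "stack pointer", the mask leads back to the same thread_info_model, and from there to the task. For instance, an address such as 0xc7a3de48 would be masked down to 0xc7a3c000.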
3. The life-cycle of a Linux process / thread.
A. Process creation
Linux creates a process in two distinct steps:
1). fork(): creates the child's descriptor by copying its parent's task_struct.
fork() creates a child process that is a copy of the current task. It differs from the parent only in its PID, PPID, and certain resources and statistics, such as pending signals, which are not inherited.
2). exec(): loads and executes a program.
exec() loads a new executable into the address space and begins executing it.
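Seen from user space, the two steps look roughly like this. It is a minimal sketch, assuming /bin/sh exists on the system; run_shell() is a hypothetical helper written for this post, not a library function.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `cmd` through /bin/sh and returns its exit
 * status, or -1 if something went wrong. */
int run_shell(const char *cmd)
{
    int status;
    pid_t pid = fork();     /* step 1: create a copy of the current task */

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: step 2, replace the copied image with a new program. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);         /* only reached if exec failed */
    }
    /* Parent: fork() returned the child's PID, so wait for that child. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_shell("exit 7") should return 7: the child starts as a copy of the caller and then becomes the shell, and the parent collects the shell's exit status.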