So that Linux can manage the processes in the system, each process is represented by a task_struct data structure (task and process are terms which Linux uses interchangeably). The task vector is an array of pointers to every task_struct data structure in the system.
This means that the maximum number of processes in the system is limited by the size of the task vector; by default it has 512 entries. As processes are created, a new task_struct is allocated from system memory and added to the task vector. To make it easy to find, the currently running process is pointed to by the current pointer.
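To make the task vector concrete, here is a minimal user-space sketch in C. It models the vector as a fixed array of NR_TASKS pointers (512, the default size given above) plus a current pointer. The one-field task_struct, the helper add_task() and the use of calloc() in place of kernel memory allocation are all simplifying assumptions for illustration. This is not the kernel's actual code.

```c
#include <stddef.h>
#include <stdlib.h>

#define NR_TASKS 512 /* default size of the task vector, per the text */

/* A drastically simplified, hypothetical task_struct. */
struct task_struct {
    int pid;
};

/* The task vector: one pointer per possible process, NULL if the slot is free. */
static struct task_struct *task[NR_TASKS];

/* Points at the task_struct of the currently running process. */
static struct task_struct *current = NULL;

/* Hypothetical helper: allocate a new task_struct and store it in the first
 * free slot. Returns the new task, or NULL if the vector is full or the
 * allocation fails. */
static struct task_struct *add_task(int pid)
{
    for (size_t i = 0; i < NR_TASKS; i++) {
        if (task[i] == NULL) {
            struct task_struct *t = calloc(1, sizeof *t);
            if (t == NULL)
                return NULL;
            t->pid = pid;
            task[i] = t;
            return t;
        }
    }
    return NULL; /* no free slot: the process limit has been reached */
}
```

Because the array has a fixed size, the 513th call to add_task() fails. This is the process limit described above.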
As well as the normal type of process, Linux supports real time processes. These processes have to react very quickly to external events (hence the term "real time") and they are treated differently from normal user processes by the scheduler.
Although the task_struct data structure is quite large and complex, its fields can be divided into a number of functional areas:
State
As a process executes it changes state according to its circumstances. Linux processes have the following states:
- Running: The process is either running (it is the current process in the system) or it is ready to run (it is waiting to be assigned to one of the system's CPUs).
- Waiting: The process is waiting for an event or for a resource. Linux differentiates between two types of waiting process: interruptible and uninterruptible. Interruptible waiting processes can be interrupted by signals, whereas uninterruptible waiting processes are waiting directly on hardware conditions and cannot be interrupted under any circumstances.
- Stopped: The process has been stopped, usually by receiving a signal. A process that is being debugged can be in a stopped state.
- Zombie: This is a halted process which, for some reason, still has a task_struct data structure in the task vector. It is what it sounds like: a dead process.
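The key difference between the two kinds of waiting process can be shown with a small C sketch. The state names are modelled on the kernel's TASK_* style, but the enum values and the helpers signal_wake() and event_wake() are hypothetical simplifications. For example, the sketch ignores how signals affect stopped tasks.

```c
#include <stdbool.h>

/* Hypothetical state values for the states described in the text.
 * The numeric values here are arbitrary, for illustration only. */
enum task_state {
    TASK_RUNNING,         /* running, or ready to run */
    TASK_INTERRUPTIBLE,   /* waiting; a signal can wake it */
    TASK_UNINTERRUPTIBLE, /* waiting directly on hardware; signals cannot wake it */
    TASK_STOPPED,         /* stopped, e.g. by a signal or a debugger */
    TASK_ZOMBIE           /* dead, but its task_struct is still in the task vector */
};

struct task_struct {
    enum task_state state;
};

/* Deliver a signal: only an interruptible waiting task becomes runnable.
 * Returns true if the task was woken. */
static bool signal_wake(struct task_struct *t)
{
    if (t->state == TASK_INTERRUPTIBLE) {
        t->state = TASK_RUNNING;
        return true;
    }
    return false;
}

/* The awaited event or resource is available: either kind of waiting
 * task becomes runnable again. */
static void event_wake(struct task_struct *t)
{
    if (t->state == TASK_INTERRUPTIBLE || t->state == TASK_UNINTERRUPTIBLE)
        t->state = TASK_RUNNING;
}
```

An uninterruptible waiter therefore leaves its state only when the event it is waiting on occurs. A signal has no effect on it.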
Scheduling Information
The scheduler needs this information in order to fairly decide which process in the system most deserves to run.
Identifiers
Every process in the system has a process identifier. The process identifier is not an index into the task vector; it is simply a number. Each process also has user and group identifiers, which are used to control the process's access to the files and devices in the system.
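Because a process identifier is just a number and not an index into the task vector, finding a process by its identifier means searching the vector. The C sketch below shows that search. It also shows a simplified Unix-style read permission check driven by the process's user and group identifiers. The names find_task_by_pid() and may_read() and the three-field task_struct are hypothetical. The permission check ignores root and supplementary groups.

```c
#include <stddef.h>

#define NR_TASKS 512

/* Hypothetical, simplified task_struct holding only identifiers. */
struct task_struct {
    int pid; /* process identifier: just a number */
    int uid; /* user identifier */
    int gid; /* group identifier */
};

static struct task_struct *task[NR_TASKS];

/* A pid is not an index, so look it up by scanning every slot. */
static struct task_struct *find_task_by_pid(int pid)
{
    for (size_t i = 0; i < NR_TASKS; i++)
        if (task[i] != NULL && task[i]->pid == pid)
            return task[i];
    return NULL;
}

/* Simplified read check using the owner/group/other bits of a file
 * mode such as 0640. Ignores root and supplementary groups. */
static int may_read(const struct task_struct *t, int file_uid, int file_gid, int mode)
{
    if (t->uid == file_uid)
        return (mode & 0400) != 0;
    if (t->gid == file_gid)
        return (mode & 0040) != 0;
    return (mode & 0004) != 0;
}
```

For instance, a process with pid 404 can sit in slot 7 of the vector. Nothing ties the number to the slot.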
Inter-Process Communication
Linux supports the classic Unix IPC mechanisms of signals, pipes and semaphores and also the System V IPC mechanisms of shared memory, semaphores and message queues.
Links
In a Linux system no process is independent of any other process. Every process in the system, except the initial process, has a parent process. New processes are not created; they are copied, or rather cloned, from previous processes. Every task_struct representing a process keeps pointers to its parent process and to its siblings (those processes with the same parent process) as well as to its own child processes. You can see the family relationship between the running processes in a Linux system using the pstree command:

    init(1)-+-crond(98)
            |-emacs(387)
            |-gpm(146)
            |-inetd(110)
            |-kerneld(18)
            |-kflushd(2)
            |-klogd(87)
            |-kswapd(3)
            |-login(160)---bash(192)---emacs(225)
            |-lpd(121)
            |-mingetty(161)
            |-mingetty(162)
            |-mingetty(163)
            |-mingetty(164)
            |-login(403)---bash(404)---pstree(594)
            |-sendmail(134)
            |-syslogd(78)
            `-update(166)

Additionally, all of the processes in the system are held in a doubly linked list whose root is the init process's task_struct data structure. This list allows the Linux kernel to look at every process in the system, which it needs to do to support commands such as ps or kill.
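The two kinds of link described above can be sketched in C. One is the family pointers (parent, children, siblings). The other is the doubly linked list of all processes rooted at init. All field names and the helpers link_child(), count_children() and count_all_tasks() are hypothetical and chosen for readability. The real kernel's fields differ. In this model a new child becomes its parent's youngest child and is appended at the tail of the all-tasks list.

```c
#include <stddef.h>

/* Hypothetical, simplified task_struct with family and list links. */
struct task_struct {
    int pid;
    struct task_struct *parent;         /* process this one was cloned from */
    struct task_struct *youngest_child; /* most recently created child */
    struct task_struct *older_sibling;  /* next older child of the same parent */
    struct task_struct *next_task;      /* doubly linked list of all tasks */
    struct task_struct *prev_task;
};

/* init's task_struct is the root of the list; it starts linked to itself. */
static struct task_struct init_task = { 1, NULL, NULL, NULL, &init_task, &init_task };

/* Record a newly cloned child: link it into its parent's family and
 * insert it into the all-tasks list just before init (the tail). */
static void link_child(struct task_struct *parent, struct task_struct *child)
{
    child->parent = parent;
    child->youngest_child = NULL;
    child->older_sibling = parent->youngest_child;
    parent->youngest_child = child;

    child->next_task = &init_task;
    child->prev_task = init_task.prev_task;
    init_task.prev_task->next_task = child;
    init_task.prev_task = child;
}

/* Count a process's children by walking its sibling chain. */
static int count_children(const struct task_struct *p)
{
    int n = 0;
    for (const struct task_struct *c = p->youngest_child; c != NULL; c = c->older_sibling)
        n++;
    return n;
}

/* Visit every process, as ps or kill would need to, starting from init. */
static int count_all_tasks(void)
{
    int n = 0;
    const struct task_struct *p = &init_task;
    do {
        n++;
        p = p->next_task;
    } while (p != &init_task);
    return n;
}
```

Linking crond and login under init, then bash under login and pstree under bash, reproduces one branch of the pstree output. A full walk of the list from init visits all five tasks.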
Times and Timers
The kernel keeps track of a process's creation time as well as the CPU time that it consumes during its lifetime. Each clock tick, the kernel updates the amount of time, in jiffies, that the current process has spent in system and in user mode. Linux also supports process-specific interval timers: processes can use system calls to set up timers that send signals to themselves when the timers expire. These timers can be single-shot or periodic.
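Per-tick accounting and a single-shot or periodic timer can be modelled in a few lines of C. This is a simplified sketch under stated assumptions. The function account_tick() is hypothetical, and so are the field names. There is one timer, decremented on every tick the process is current. A counter stands in for actually sending a signal. Real Linux offers several kinds of interval timer, and this sketch models none of them exactly.

```c
#include <stdbool.h>

/* Hypothetical per-process time fields, counted in jiffies (clock ticks). */
struct task_struct {
    unsigned long start_time; /* jiffies value when the process was created */
    unsigned long utime;      /* ticks spent in user mode */
    unsigned long stime;      /* ticks spent in system (kernel) mode */
    /* A simple interval timer: expires after `remaining` ticks. If
     * `interval` is non-zero it is re-armed (periodic), else single-shot. */
    unsigned long remaining;
    unsigned long interval;
    int signals_sent;         /* stands in for delivering a signal */
};

/* Called once per clock tick for the current process. */
static void account_tick(struct task_struct *t, bool in_user_mode)
{
    if (in_user_mode)
        t->utime++;
    else
        t->stime++;

    if (t->remaining > 0 && --t->remaining == 0) {
        t->signals_sent++;          /* timer expired: "send" a signal */
        t->remaining = t->interval; /* re-arm if periodic; 0 leaves it disarmed */
    }
}
```

With remaining and interval both set to 3, ten ticks produce three expirations. With interval set to 0, the timer fires once and then stays disarmed.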
File system
Processes can open and close files as they wish, and the process's task_struct contains pointers to descriptors for each open file as well as pointers to two VFS inodes. The first is to the root directory of the process and the second is to its current, or pwd, directory. pwd is derived from the Unix command pwd, print working directory. These two VFS inodes have their count fields incremented to show that one or more processes are referencing them. This is why you cannot delete the directory that a process has set as its pwd directory or, for that matter, one of its sub-directories.
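The reference-counting argument can be sketched in C. The structures fs_state and inode below are hypothetical reductions to a count field and two pointers. The helpers change_pwd() and can_remove() are illustrative and are not VFS functions. The sketch models only the rule stated above, that a referenced directory cannot be removed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified VFS inode with only a reference count. */
struct inode {
    int count; /* number of references held, e.g. by processes */
};

/* Hypothetical per-process file system state: root and pwd inodes. */
struct fs_state {
    struct inode *root;
    struct inode *pwd;
};

/* Change the working directory: take a reference on the new inode
 * before dropping the one on the old inode. */
static void change_pwd(struct fs_state *fs, struct inode *dir)
{
    dir->count++;
    if (fs->pwd != NULL)
        fs->pwd->count--;
    fs->pwd = dir;
}

/* In this model a directory can be removed only when nothing references it. */
static bool can_remove(const struct inode *dir)
{
    return dir->count == 0;
}
```

Taking the new reference before dropping the old one keeps the count from briefly reaching zero when a process changes to the directory it is already in.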
Virtual memory
Most processes have some virtual memory (kernel threads and kernel daemons do not), and the Linux kernel must track how that virtual memory is mapped onto the system's physical memory.
Processor Specific Context
A process could be thought of as the sum total of the system's current state. Whenever a process is running it is using the processor's registers, stacks and so on. This is the process's context and, when a process is suspended, all of that CPU specific context must be saved in the task_struct for the process. When a process is restarted by the scheduler, its context is restored from there.
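Saving and restoring context can be illustrated with a simulated processor in C. This is a sketch under clear assumptions. The register set cpu_context, the global cpu and the helper switch_context() are hypothetical. Real context switching is architecture specific and written partly in assembly. Only the idea of copying live registers into the suspended task's task_struct and reloading the resumed task's saved registers carries over.

```c
#include <stdint.h>

/* Hypothetical CPU register set; a real one is architecture specific. */
struct cpu_context {
    uint64_t pc;      /* program counter */
    uint64_t sp;      /* stack pointer */
    uint64_t regs[4]; /* a few general-purpose registers */
};

/* The simulated processor's live registers. */
static struct cpu_context cpu;

struct task_struct {
    int pid;
    struct cpu_context saved; /* context saved while the task is suspended */
};

/* Suspend `prev` and resume `next`: save the live registers into prev's
 * task_struct, then load next's saved registers onto the CPU. */
static void switch_context(struct task_struct *prev, struct task_struct *next)
{
    prev->saved = cpu;
    cpu = next->saved;
}
```

Switching from one task to another and back leaves the first task's registers exactly as they were when it was suspended.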