An overview of Linux processes
A process is one of the most fundamental concepts of the Linux operating system. This article focuses on the basics of Linux processes.
Process
A process is an instance of a program running in Linux. This is the basic definition that you might have heard before. Though it is simple enough to understand, let's elaborate a bit for beginners. Let's quickly create a hello world program in C:

    #include <stdio.h>

    int main(void)
    {
        /* unsigned: avoids signed overflow when compared with 0xFFFFFFFF;
           volatile: stops the compiler from removing the empty loop */
        volatile unsigned int i;

        printf("\n Hello World\n");

        // Simulate a wait for some time (busy loop)
        for (i = 0; i < 0xFFFFFFFF; i++);

        return 0;
    }

Compile the code above:
    $ gcc -Wall hello_world.c -o hello_world

Run the executable:
    $ ./hello_world

The command above executes the hello world program. Since the program waits for some time, quickly switch to another terminal and check for a process named 'hello_world':
    $ ps -aef | grep hello_world
    himanshu  2260  2146 95 20:38 pts/0    00:00:13 ./hello_world

So we see that a process named 'hello_world' is running on the system. Now try running the same program in parallel from two or three terminals and run the command above again. I ran the program in parallel from three different terminals, and here is the output:
    $ ps -aef | grep hello_world
    himanshu  2320  2146 99 20:43 pts/0    00:00:03 ./hello_world
    himanshu  2321  2261 67 20:43 pts/1    00:00:02 ./hello_world
    himanshu  2322  2287 72 20:43 pts/2    00:00:00 ./hello_world

Each instance of the hello_world program created a separate process with its own process ID (2320, 2321 and 2322). Hence we say that a process is a running instance of a program.
Identifiers associated with a process
Each process has the following identifiers associated with it:
Process Identifier (PID)
Each process has a unique identifier associated with it, known as the process ID. No two processes that exist at the same time share a PID, although the kernel may reuse the PID of a process that has already terminated. For example, if you run the ps command on your Linux box, you will see something like:

    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 19:43 ?        00:00:00 /sbin/init
    root         2     0  0 19:43 ?        00:00:00 [kthreadd]
    root         3     2  0 19:43 ?        00:00:00 [migration/0]
    root         4     2  0 19:43 ?        00:00:00 [ksoftirqd/0]
    root         5     2  0 19:43 ?        00:00:00 [watchdog/0]
    root         6     2  0 19:43 ?        00:00:00 [migration/1]
    root         7     2  0 19:43 ?        00:00:00 [ksoftirqd/1]
    root         8     2  0 19:43 ?        00:00:00 [watchdog/1]
    root         9     2  0 19:43 ?        00:00:00 [events/0]
    root        10     2  0 19:43 ?        00:00:00 [events/1]
    ...

The above output is from my Linux box. The second column (PID) gives the process ID of the process described in that row. You may notice another, similar-looking column: PPID. This gives the parent process ID of the process. Every process in the Linux system has a parent; the only exceptions are the processes started by the kernel itself, such as init (PID 1) and kthreadd (PID 2), whose PPID is shown as 0.
User and group identifiers (UID and GID)
The next category of identifiers associated with a process is the user and group identifiers. These can be further classified into:
Real user ID and real group ID
These identifiers give information about the user and group to which a process belongs. Any process inherits these identifiers from its parent process.
Effective user ID, effective group ID and supplementary group IDs
Ever got an error like "Permission denied"? This common error usually occurs when a process does not have sufficient permissions to carry out a task. The effective user ID, the effective group ID and the supplementary group IDs are what the kernel uses to decide whether a process may perform an operation that requires special permissions, such as opening a protected file. Usually the effective user ID is the same as the real user ID. If it is different, the process is running with privileges other than the ones it has by default (i.e. the ones inherited from its parent); a set-user-ID program such as passwd is a typical example. If a process is running with effective user ID 0, it has special privileges. Processes with an effective user ID of 0 are known as privileged processes, because they run as the superuser (root). They bypass the permission checks that the kernel applies to unprivileged processes.
The init process
In Linux every process has a parent process. One might then ask where the chain starts, since some process has to be created first. That process is 'init', the first user-space process that the Linux kernel starts after the system boots. All other user processes are descendants of init, either directly or indirectly (kernel threads, as the ps output above shows, descend from kthreadd instead). The init process is special in that it cannot be killed: the kernel only delivers signals to init for which init has installed a handler, so even SIGKILL sent by root has no effect. The only time it terminates is when the Linux system is shut down. The init process always has process ID 1.
Zombie and orphan processes
Suppose there are two processes: a parent and its child. In practice, two scenarios can arise:
The parent dies or gets killed before the child.
In this scenario, the child becomes an orphan process (it has lost its parent). In Linux, the init process comes to the rescue of orphan processes and adopts them: once a child has lost its parent, init becomes its new parent process.
The child dies and the parent does not call wait() immediately.
When a child terminates, its termination status is made available to the parent through the wait() family of calls. The kernel therefore waits for the parent to retrieve the termination status of the child before it completely removes the child process. If the parent does not call wait() right away to fetch that status, the terminated child becomes a zombie process: a process that has finished but is waiting for its parent to collect its termination status. Although the kernel releases the resources the process was holding when it terminated, it still keeps some information, such as its termination status and its process ID. Once the parent performs the wait() operation, the kernel clears this information too.
Daemon process
A process that needs to run for a long period of time and does not require a controlling terminal is usually written so that it becomes a daemon process. For example, monitoring software such as a key-logger is usually written as a daemon. A daemon process has no controlling terminal.
Memory layout of a process
The memory of a process can broadly be divided into the following segments:
Stack
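The stack contains the data that is local to a function, such as local variables and pointers. Each function call gets its own region of the stack, called a stack frame (and each thread has its own stack). Stack memory is dynamic in the sense that it grows with each nested function call and shrinks again as calls return.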
Heap
The heap contains memory that a program requests dynamically at run time, for example with malloc().
Data
All global and static variables are part of this segment. Uninitialized ones are usually placed in a zero-filled sub-area called BSS.
Text
The program's machine instructions are part of this memory area, together with hard-coded strings and constant values, which are kept in read-only memory alongside them.
Suppose we extend the hello world program above like this:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int a;

    int main(void)
    {
        volatile unsigned int i;   /* same loop counter fix as in the first example */
        char *ptr = malloc(15);

        if (ptr == NULL)
            return 1;

        memset(ptr, 0, 15);
        memcpy(ptr, "Hello World", 11);
        printf("\n %s \n", ptr);

        // Simulate a wait for some time
        for (i = 0; i < 0xFFFFFFFF; i++);

        free(ptr);
        return 0;
    }

In the example above:
- The variable 'a' goes into the data segment.
Linux process environment
The environment in Linux is a list of 'variable=value' pairs that is used for a variety of purposes. Programs, scripts, shells etc. use this information for their smooth operation. For example, the home directory of the currently logged-in user can be accessed through the 'HOME' environment variable. The list of environment variables, along with their values, can be viewed using the 'env' command. For example, on my Linux box I could see the following output of the env command:

    ORBIT_SOCKETDIR=/tmp/orbit-himanshu
    SSH_AGENT_PID=1653
    TERM=xterm
    SHELL=/bin/bash
    XDG_SESSION_COOKIE=b8b52be9a0280f3c8b48fcf04d7ac5a3-1341925217.889152-1390765341
    WINDOWID=62917358
    GNOME_KEYRING_CONTROL=/tmp/keyring-6UEJQ4
    GTK_MODULES=canberra-gtk-module
    USER=himanshu
    SSH_AUTH_SOCK=/tmp/keyring-6UEJQ4/ssh
    DEFAULTS_PATH=/usr/share/gconf/gnome.default.path
    SESSION_MANAGER=local/himanshu-laptop:@/tmp/.ICE-unix/1619,unix/himanshu-laptop:/tmp/.ICE-unix/1619
    USERNAME=himanshu
    XDG_CONFIG_DIRS=/etc/xdg/xdg-gnome:/etc/xdg
    DESKTOP_SESSION=gnome
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
    PWD=/home/himanshu
    GDM_KEYBOARD_LAYOUT=us
    LANG=en_IN
    GNOME_KEYRING_PID=1601
    MANDATORY_PATH=/usr/share/gconf/gnome.mandatory.path
    GDM_LANG=en_IN
    GDMSESSION=gnome
    SPEECHD_PORT=7560
    SHLVL=1
    HOME=/home/himanshu
    GNOME_DESKTOP_SESSION_ID=this-is-deprecated
    LOGNAME=himanshu
    XDG_DATA_DIRS=/usr/share/gnome:/usr/local/share/:/usr/share/
    DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-AWvAHVEXeC,guid=62c39aae57aa4bfc10e80e444ffc2762
    DISPLAY=:0.0
    XAUTHORITY=/var/run/gdm/auth-for-himanshu-yxPNRW/database
    COLORTERM=gnome-terminal
    _=/usr/bin/env

So we can see that a wide range of environment variables is available. A user can add an environment variable from the shell using the 'export' command. In C, the external variable char **environ can be used to access this list in a program, and functions such as getenv(), setenv() and unsetenv() are available to read and modify the process environment.