This article introduces the basics of the process and the thread.
1 Why
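1.1 To consolidate resources
Originally, a CPU had to finish executing one complete task before it could start the next. Later, by dividing CPU time into slices, multiple tasks could be executed “seemingly” at the same time.
To better distinguish these “simultaneously” executing tasks and to group the resources belonging to each of them, the concept of the process was introduced. It is described as follows: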
A process is basically a program in execution… It is fundamentally a container that holds all the information needed to run a program.
Each process has the following resources of its own:
- address space: a list of memory locations from 0 to some maximum, which the process can read and write.
- resource: commonly including registers (including the program counter and stack pointer), a list of related processes, and all the other information needed to run the program.
Processes communicate with each other through IPC (inter-process communication).
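The isolation described above can be observed directly. The following is a minimal sketch, assuming a POSIX system where Python's "fork" start method is available; the names `counter`, `child_work`, and `run_isolation_demo` are made up for this illustration. The child process changes its own copy of a global variable, the parent's copy stays untouched, and the only way the value reaches the parent is through an IPC channel (here a `multiprocessing` queue).

```python
import multiprocessing as mp

counter = 0  # each process gets its own copy of this variable


def child_work(result_queue):
    """Runs in the child process and modifies the child's copy of counter."""
    global counter
    counter += 100
    result_queue.put(counter)  # send the value back to the parent via IPC


def run_isolation_demo():
    # Assumption: a POSIX system, where the "fork" start method is available.
    ctx = mp.get_context("fork")
    result_queue = ctx.Queue()
    child = ctx.Process(target=child_work, args=(result_queue,))
    child.start()
    value_from_child = result_queue.get()  # blocks until the child has sent its value
    child.join()
    # The parent's counter is unchanged: the child wrote to a separate address space.
    return counter, value_from_child


if __name__ == "__main__":
    parent_value, child_value = run_isolation_demo()
    print("parent still sees:", parent_value)
    print("child reported:", child_value)
```

The queue is used instead of a shared variable on purpose: without an explicit IPC mechanism the parent would never learn what the child computed.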
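1.2 To improve efficiency
Originally, a process had only one “thread of control” to carry out its work. It later became clear that allowing “multiple threads of control” within one process, sharing the process's resources and cooperating with each other, could greatly improve efficiency. This led to the concept of the thread.
Each thread has its own stack, which records its execution history.
As mentioned above, to improve efficiency, threads share the address space and resources of their process. Because the address space is shared, “thread A” can modify data belonging to “thread B” with almost no obstacle.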
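A minimal sketch of this sharing, using Python's standard `threading` module; the names `thread_b_data`, `thread_a_work`, and `thread_b_work` are hypothetical and exist only for this illustration. Thread A overwrites data that conceptually belongs to thread B, and nothing in the runtime prevents it.

```python
import threading

# Data that conceptually "belongs" to thread B, stored in the shared address space.
thread_b_data = {"value": 0}


def thread_a_work():
    # Thread A writes to thread B's data directly; nothing stops it.
    thread_b_data["value"] = 42


def thread_b_work(seen):
    # Thread B reads its data and observes the change made by thread A.
    seen.append(thread_b_data["value"])


def run_sharing_demo():
    seen = []
    thread_a = threading.Thread(target=thread_a_work)
    thread_a.start()
    thread_a.join()  # make sure A has finished before B reads
    thread_b = threading.Thread(target=thread_b_work, args=(seen,))
    thread_b.start()
    thread_b.join()
    return seen[0]


if __name__ == "__main__":
    print("thread B sees:", run_sharing_demo())
```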
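Why is there no protection between threads? (Multi-threaded designs still have to deal with synchronization and deadlock; “protection” here means other safeguards, such as memory isolation.)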
Unlike different processes, which may be from different users and which may be hostile to one another, a process is always owned by a single user, who has presumably created multiple threads so that they can cooperate, not fight.
Why cooperate using multiple threads (multi-threads) rather than multiple processes (multi-processes)?
…they are lighter weight than processes, they are easier (i.e., faster) to create and destroy than processes. In many systems, creating a thread goes 10-100 times faster than creating a process.
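Moreover, sharing resources and passing information between processes (via IPC) is less efficient than between threads, which share the address space and resources directly.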
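The creation-cost claim can be checked on a given machine with a rough measurement. The sketch below is an assumption-laden micro-benchmark, not a rigorous one: it assumes a POSIX system with the "fork" start method, the helper names are made up for this example, and the ratio it reports depends heavily on the operating system and the language runtime.

```python
import multiprocessing as mp
import threading
import time


def do_nothing():
    """Empty task, so that the timing is dominated by creation and teardown."""


def time_creation(make_worker, repeats=20):
    """Return the average number of seconds needed to start and join one worker."""
    start = time.perf_counter()
    for _ in range(repeats):
        worker = make_worker()
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / repeats


def compare_creation_cost(repeats=20):
    # Assumption: a POSIX system, where the "fork" start method is available.
    ctx = mp.get_context("fork")
    thread_time = time_creation(lambda: threading.Thread(target=do_nothing), repeats)
    process_time = time_creation(lambda: ctx.Process(target=do_nothing), repeats)
    return thread_time, process_time


if __name__ == "__main__":
    thread_time, process_time = compare_creation_cost()
    print(f"thread:  {thread_time * 1e6:.0f} microseconds per create+join")
    print(f"process: {process_time * 1e6:.0f} microseconds per create+join")
```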
1.3 Summary
The process model is built on two independent concepts, resource grouping and execution, explained as follows:
resource grouping: One way of looking at a process is that it is a way to group related resources together. A process has an address space containing program text and data, as well as other resources. These resources may include open files, child processes, pending alarms, signal handlers, accounting information, and more. By putting them together in the form of a process, they can be managed more easily.
execution: The other concept a process has is a thread of control, usually shortened to just thread. The thread has a program counter that keeps track of which instruction to execute next. It has registers, which hold its current working variables. It has a stack, which contains the execution history, with one frame for each procedure called but not yet returned from.
Although process and thread are closely related, conceptually they can be distinguished as follows:
processes are used to group resources together; threads are the entities scheduled for execution on the CPU.
2 program
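Some articles introduce program before process and thread, so this article also describes it briefly.
A program is the code written by a software engineer in an IDE, editor, or similar tool, that is, code that has not yet been loaded into the computer's memory.
Note: some Chinese-language books translate program as 程式.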
3 process
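A process (translated into Chinese as 程序 or 进程) is a program that has been loaded into memory and has started executing; any instruction of the program in the process may be executed by the CPU at any moment.
In practice, launching an application turns a program into a live process, which is why we can look up its PID (process ID), the identifier of the running process.
A few notes on process:
- Multiple processes of the same program can exist at the same time;
- A process is the running instance of a program on the computer;
- Every process is independent of the others;
- A process is not itself the basic unit of execution; it is the container for threads, which are;
- A process needs resources (such as CPU, memory, files, and I/O devices) to do its work;
- A multitasking operating system can run multiple processes at once, yet a single CPU core can execute only one process at a time (one motivation for multi-core processors). In a real operating system the number of runnable processes usually exceeds the number of CPU cores, and every process also occupies memory, so the operating system has to decide how to schedule these processes (scheduling).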
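The first notes above (one program, several processes, each with its own PID) can be illustrated with a short sketch. It launches the same tiny program twice through Python's standard `subprocess` module; `CHILD_CODE` and `launch_two_instances` are names invented for this example.

```python
import os
import subprocess
import sys

# The "program": a one-line script that prints the PID of the process running it.
CHILD_CODE = "import os; print(os.getpid())"


def launch_two_instances():
    """Start the same program twice and return the PID reported by each process."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(2)
    ]
    pids = []
    for proc in procs:
        output, _ = proc.communicate()  # wait for the process and read its output
        pids.append(int(output.strip()))
    return pids


if __name__ == "__main__":
    print("parent PID:", os.getpid())
    print("child PIDs:", launch_two_instances())
```

Both children run identical code, yet the operating system tracks them as two separate processes, distinct from each other and from the parent that launched them.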
4 thread
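thread is translated into Chinese as 执行绪 or 线程 (the latter is more common). As noted above, a process is a container for threads: a single process can contain many threads, each responsible for one piece of functionality.
Take a chat-room process as an example: with multiple threads, it can receive messages from the other party while sending its own messages at the same time.
If a process is viewed as a factory, the threads are its workers: they carry out the factory's functions, and every worker shares all the resources inside the factory.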
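The chat-room example can be sketched with two threads, one receiving and one sending. This is only an illustration under simplifying assumptions: two in-memory `queue.Queue` objects stand in for the network connection, and the names `receive_loop`, `send_loop`, and the `STOP` sentinel are hypothetical.

```python
import queue
import threading

STOP = None  # hypothetical sentinel that marks the end of a conversation


def receive_loop(from_peer, received):
    """Receiver thread: collect incoming messages until the sentinel arrives."""
    while True:
        message = from_peer.get()
        if message is STOP:
            break
        received.append(message)


def send_loop(to_peer, outgoing):
    """Sender thread: push our own messages to the peer."""
    for message in outgoing:
        to_peer.put(message)
    to_peer.put(STOP)


def run_chat_demo():
    from_peer = queue.Queue()  # stands in for the incoming side of the connection
    to_peer = queue.Queue()    # stands in for the outgoing side of the connection
    for message in ["hi", "how are you?", STOP]:
        from_peer.put(message)

    received = []
    receiver = threading.Thread(target=receive_loop, args=(from_peer, received))
    sender = threading.Thread(target=send_loop, args=(to_peer, ["hello", "fine, thanks"]))
    receiver.start()  # both threads are now active in the same process
    sender.start()
    receiver.join()
    sender.join()

    # Drain what the sender thread pushed, to see what the peer would have received.
    sent = []
    while True:
        message = to_peer.get()
        if message is STOP:
            break
        sent.append(message)
    return received, sent


if __name__ == "__main__":
    received, sent = run_chat_demo()
    print("received:", received)
    print("sent:", sent)
```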
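A few notes on thread:
- Multiple threads can exist in the same process at the same time;
- Threads in the same process share the process's resources, such as memory and variables, whereas different processes do not;
- In multi-threading, two threads that read or modify a global variable at the same time may cause synchronization problems. If threads compete with each other for resources, a deadlock may occur. Both situations need particular care when writing multi-threaded programs.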
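Both hazards in the last note have common remedies, sketched below with Python's standard `threading` module. The first half guards a shared global with a lock so that concurrent updates are not lost; the second half avoids one classic deadlock by always acquiring two locks in the same fixed order. All function names are made up for this example, and lock ordering is only one of several possible deadlock-avoidance strategies.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread at a time may update the counter
            counter += 1


def run_counter_demo(num_threads=4, times=10_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,)) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


def use_both(first, second, log):
    """Take two locks in one fixed global order, whatever order the caller used."""
    low, high = sorted((first, second), key=id)
    with low:
        with high:
            log.append(threading.current_thread().name)


def run_lock_order_demo():
    lock_a, lock_b = threading.Lock(), threading.Lock()
    log = []
    # The two threads name the locks in opposite order, the classic deadlock setup.
    t1 = threading.Thread(target=use_both, args=(lock_a, lock_b, log), name="t1")
    t2 = threading.Thread(target=use_both, args=(lock_b, lock_a, log), name="t2")
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return sorted(log)


if __name__ == "__main__":
    print("final counter:", run_counter_demo())
    print("threads that finished:", run_lock_order_demo())
```

If `use_both` acquired the locks in the order given by each caller, the two threads could each hold one lock while waiting forever for the other.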
5 Differences and connections
The figure below shows the relationship between process and thread:
5.1 Key difference
Thread and Process are two closely related terms in multi-threading. The main difference between the two terms is that the threads are a part of a process, i.e. a process may contain one or more threads, but a thread cannot contain a process.
5.2 process in detail
A process is an instance of a program that is being executed. It contains the program code and its current activity. Depending on the operating system, a process may be made up of multiple threads of execution that execute instructions concurrently. A program is a collection of instructions; a process is the actual execution of those instructions.
Note: compare this description with the concept of program introduced earlier.
A process has a self-contained execution environment. It has a complete set of private basic run-time resources; in particular, each process has its own memory space.
Processes are often considered similar to other programs or applications. However, the running of a single application may in fact be a set of cooperating processes.
To facilitate communication between processes, most operating systems provide inter-process communication (IPC) resources, such as pipes and sockets. IPC resources can also be used for communication between processes on different systems.
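As a small illustration of pipe-based IPC, the sketch below has a parent process send text to a child through the child's standard-input pipe and read the reply from its standard-output pipe, using Python's standard `subprocess` module. The child program (`CHILD_CODE`) and the function name `ask_child` are invented for this example.

```python
import subprocess
import sys

# The child program: read everything from stdin and write it back in upper case.
CHILD_CODE = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def ask_child(message):
    """Send message to a child process over a pipe and return its reply."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=message,        # written to the child's stdin pipe
        capture_output=True,  # the child's stdout and stderr are read through pipes
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(ask_child("hello from the parent"))
```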
Most applications in a virtual machine, such as the Java virtual machine, run as a single process. However, an application can create additional processes, for example with Java's ProcessBuilder object.
5.3 thread in detail
In computing, a thread is the smallest sequence of programmed instructions that can be managed independently by an operating system scheduler.
The implementation of threads and processes differs from one operating system to another. However, threads are created by and exist within a process; every process has at least one.
Multiple threads can also exist in a process and share resources, which helps in efficient communication between threads.
On a single processor, multitasking takes place as the processor switches between different threads; it is known as multi-threading. The switching happens so frequently that the threads or tasks are perceived to be running at the same time.
Threads can truly be concurrent on a multi-processor or multi-core system, with every processor or core executing the separate threads simultaneously.
5.4 Summary
In summary, threads may be considered lightweight processes: each is an independent sequence of instructions that runs within a larger process and shares its resources. Computers can run multiple threads and processes at the same time.
5.5 process vs. thread comparison
The table below compares process and thread in several respects.
| Aspect | process | thread |
| --- | --- | --- |
| Definition | An executing instance of a program is called a process. | A thread is a subset of the process. |
| Data segment | It has its own copy of the data segment of the parent process. | It has direct access to the data segment of its process. |
| Communication | Processes must use inter-process communication to communicate with sibling processes. | Threads can directly communicate with other threads of their process. |
| Overheads | Processes have considerable overhead. | Threads have much lower overhead. |
| Creation | New processes require duplication of the parent process (for example, with fork on Unix-like systems). | New threads are easily created. |
| Control | Processes can only exercise control over child processes. | Threads can exercise considerable control over threads of the same process. |
| Changes | Any change in the parent process does not affect child processes. | Any change in the main thread may affect the behavior of the other threads of the process. |
| Memory | Run in separate memory spaces. | Run in shared memory spaces. |
| File descriptors | Most file descriptors are not shared. | Threads share file descriptors. |
| File system | There is no sharing of file system context. | Threads share file system context. |
| Signal | Processes do not share signal handling. | Threads share signal handling. |
| Controlled by | Processes are controlled by the operating system. | Threads are controlled by the programmer in a program. |
| Dependence | Processes are independent. | Threads are dependent on one another. |