What Is a Thread?

keminlau

Excerpted from Wikipedia (http://en.wikipedia.org/wiki/Thread_(computer_science))

Abridged translation by 刘建文 (http://blog.csdn.net/keminlau)

Thread (computer science) From Wikipedia, the free encyclopedia

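In computer science, a thread of execution is a way for a program to split (fork) its execution into two or more paths that run concurrently. Operating systems differ somewhat in how they implement processes and threads, but in most cases a thread is contained inside a process. Threads within the same process can share resources such as memory, whereas separate processes do not share these resources.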
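As a minimal sketch of threads sharing memory inside one process, the Python snippet below starts a few threads that all write into the same dictionary. The names `run_workers`, `record`, and `shared` are illustrative and do not come from the original article.

```python
import threading

def run_workers(n=4):
    """Start n threads that all write into one dictionary owned by the process."""
    shared = {}  # lives in the process address space, visible to every thread

    def record(index):
        # Each thread writes only to its own key, so no locking is needed here.
        shared[index] = threading.current_thread().name

    threads = [
        threading.Thread(target=record, args=(i,), name=f"worker-{i}")
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return shared

result = run_workers()
print(sorted(result.items()))
```

No copying or message passing is involved: every thread sees the same `shared` object, which is exactly what separate processes would not get without an explicit sharing mechanism.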

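On a single-processor system, multithreading is generally implemented by time-division multiplexing: the processor switches between threads. The switching must be frequent and continuous enough that the user perceives the threads as running simultaneously. On multiprocessor or multi-core systems, threads really do run in parallel, with each processor or core executing a different thread.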

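Programming languages differ in their support for threads. A number of languages or language implementations limit threading, for example by not allowing more than one execution context to run at a time within a single program. Python is a commonly cited case: the CPython implementation does support threads, but its Global Interpreter Lock (GIL) restricts how far the runtime can execute them in parallel. Other languages support only user threads. User threads are invisible to the kernel and therefore cannot be scheduled in parallel across multiple processors; parallel execution on multiple processors requires kernel threads.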

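Many modern operating systems directly support both time-sliced and multiprocessor threading. The kernel exposes thread operations to programmers through system calls.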

Threads compared with processes

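Operating systems such as Windows NT and OS/2 are said to have "cheap" threads and "expensive" processes. In some other operating systems the difference is much smaller: the main extra cost of switching between processes rather than threads is the address-space switch.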

Multithreading: advantages and uses

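Multithreading is a popular programming and execution model. Multiple threads share the resources of their process yet execute independently, and this independence lets a program take advantage of parallel hardware: multiple processors, multiple cores, or a cluster of machines.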

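To keep threads independent, programmers must guard against race conditions and other non-intuitive behavior. In addition, for data to be manipulated correctly, threads must agree on the order in which they use it.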

To prevent shared data from being modified simultaneously, threads need atomic operations (often implemented with semaphores). Careless use of these primitives can lead to deadlocks.
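A deadlock typically appears when two threads each hold one lock and wait for the other's. One common remedy, shown here only as a hedged sketch, is to acquire locks in a fixed global order. The `Account` class, the `transfer` function, and the ordering by `ident` are hypothetical illustrations, not something the article prescribes.

```python
import threading

class Account:
    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always lock the account with the smaller ident first, so two opposite
    # transfers can never each hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda acct: acct.ident)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

def demo(rounds=1000):
    a, b = Account(1, 100), Account(2, 100)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    t1 = threading.Thread(target=a_to_b)
    t2 = threading.Thread(target=b_to_a)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance

print(demo())  # (100, 100)
```

If `transfer` instead locked `src` first and `dst` second, the two threads could each take one lock and wait forever for the other.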

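Another advantage of multithreading is that, even on a single-processor system, an application can stay responsive to user input, which makes it friendlier to use. In a single-threaded program, if the main thread has to complete a long-running task, the whole program appears frozen. With multithreading, the long-running task can be handed to a new worker thread while the main thread keeps interacting with the user.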
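The worker-thread pattern can be sketched in Python as follows. `long_task` is a hypothetical stand-in for any slow job, and the list of "events" merely imitates user input; neither is part of the original text.

```python
import queue
import threading

def long_task(n):
    """Stand-in for a slow computation (hypothetical)."""
    return sum(i * i for i in range(n))

def run(n=100_000):
    results = queue.Queue()  # thread-safe channel for the worker's result
    worker = threading.Thread(target=lambda: results.put(long_task(n)))
    worker.start()

    # The main thread does not wait here; it keeps handling "user input".
    handled = []
    for event in ["click", "key", "scroll"]:
        handled.append(f"handled {event}")

    worker.join()  # only now wait for the long task to finish
    return handled, results.get()

handled, total = run()
print(handled, total)
```

A real GUI program would process events in its toolkit's event loop rather than in a fixed `for` loop, but the division of labor is the same.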

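Operating systems schedule threads in one of two ways: preemptively or cooperatively. Preemptive multithreading is generally considered the better approach because it lets the operating system decide when a context switch should occur, so the operating system is in charge of scheduling. In cooperative multithreading the threads themselves decide when to give up control. Because a thread may block, for example on synchronous I/O, cooperative scheduling needs an extra mechanism to cope (see "I/O and scheduling" below).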

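The drawback of preemptive scheduling is that the system may switch threads at an inopportune moment, causing priority inversion or other negative effects that cooperative multithreading can avoid.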

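Traditional mainstream computer hardware offered little support for multithreading, because switching between threads was generally already faster than a full process context switch. Processors for embedded systems are different: embedded applications have strict real-time requirements and need thread switches to be as short as possible. Such processors may allocate a dedicated register file to each thread instead of saving and restoring one common register file.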

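In the late 1990s, the idea of executing instructions from several threads at the same time became reality under the name simultaneous multithreading. This is the feature the Pentium 4 (P4) markets as Hyper-Threading.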

Processes, kernel threads, user threads, and fibers

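A process is the "heaviest" unit of kernel scheduling. It owns resources allocated by the operating system, such as memory, files, sockets, devices, and windows. Processes do not share address spaces or file resources except through explicit mechanisms such as inheriting file handles, shared memory segments, or mapping the same file in a shared way. Processes are typically preemptively multitasked.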

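A kernel thread is the "lightest" unit of kernel scheduling. Every process has at least one kernel thread. If a process has several kernel threads, they share the resources of the process. When the operating system's scheduler is preemptive, kernel threads are preemptively multitasked as well.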

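Besides being implemented in the kernel, threads can also be implemented in user space, in which case they are called user threads. User threads are invisible to the kernel; they are managed and scheduled in user space. Some implementations build their user threads on top of several kernel threads in order to benefit from multiprocessor machines (the N:M model). In this article, "thread" without qualification means a kernel thread. User threads implemented by a virtual machine are also called green threads.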

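Fibers are a cooperatively scheduled unit that is even lighter than a kernel thread: a running fiber must explicitly yield before another fiber can run. This makes fibers easier to implement than either kernel threads or user threads. A fiber can be scheduled to run on any thread in the same process, so an application can manage task scheduling itself and tune it for efficiency.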
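Python has no built-in fiber type, but generators give a reasonable approximation of the explicit-yield behavior described above. The sketch below is an assumption-laden illustration: `fiber` and `run_round_robin` are made-up names, and the round-robin policy is just one possible choice.

```python
from collections import deque

def fiber(name, steps, log):
    """A cooperative task: runs one step, then explicitly yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # explicit hand-off; nothing preempts this task

def run_round_robin(fibers):
    ready = deque(fibers)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume until its next yield
            ready.append(current)  # still alive: back of the queue
        except StopIteration:
            pass                   # finished: drop it

log = []
run_round_robin([fiber("A", 2, log), fiber("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Because a task only loses control at its own `yield`, a task that never yields would monopolize the scheduler, which is the weakness of cooperative scheduling.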

Thread and fiber issues

Concurrency and data structures

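Threads in the same process share an address space, so two closely cooperating concurrent threads can exchange data easily, without the complexity and overhead of inter-process communication (IPC). Mutual exclusion is still a concern, however. If two threads use the same piece of data at the same time and the operation takes more than a single CPU instruction, a race hazard is likely. Bugs caused by races are hard to reproduce and isolate.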

To prevent this, threading APIs provide synchronization primitives, such as mutexes, for controlling concurrent access to resources.
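In Python the mutex primitive is `threading.Lock`. The following sketch guards a shared counter with it; the function name `count_with_lock` and its parameters are chosen for illustration only.

```python
import threading

def count_with_lock(num_threads=4, increments=10_000):
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            # "counter += 1" is a read-modify-write sequence, not one atomic
            # step, so the lock makes sure only one thread performs it at a time.
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(count_with_lock())  # 40000
```

Without the lock the final value may or may not come out short, depending on the interpreter and on timing, which is precisely why such bugs are hard to reproduce.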

I/O and scheduling

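User threads and fibers are generally implemented entirely in user space. Switching between them is very fast because it requires no interaction with the kernel: on a context switch, the CPU register values of the thread or fiber are saved and restored locally. Because scheduling happens in user space, the scheduling policy is also easier to tailor.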

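However, problems arise when a user thread or fiber makes a blocking system call, such as an I/O operation. Most programs perform I/O synchronously (KEMIN: code written in a high-level language is synchronous by default and runs sequentially; asynchronous operation requires an asynchronous environment or framework). Until the I/O system call returns, no other user thread or fiber in the process can run, so they are starved unnecessarily. (By contrast, kernel threads are managed by the kernel, so when one thread makes a blocking system call the kernel can schedule another thread of the same process.)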

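A common solution is for the thread library to offer a synchronous interface while using asynchronous mechanisms internally (KEMIN: details?). The same approach applies to other blocking system calls. Alternatively, the program can simply avoid blocking system calls as far as possible.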

NOTE: On asynchronous I/O

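Modern UNIX systems provide asynchronous I/O, a very useful facility because it lets a process carry out other tasks while waiting for an I/O operation to complete. It does, however, bring a complicated programming model. It is preferable to confine asynchrony to the operating-system level and give applications a synchronous programming environment. (Excerpted from Unix Internals: The New Frontiers, 1996, section 3.2.3)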
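To illustrate the idea of carrying on with other work while an I/O request is pending, here is a small sketch using Python's `asyncio`. Note the assumptions: `asyncio` is a user-level event loop rather than the UNIX asynchronous I/O facility the note refers to, and `fake_io` uses `asyncio.sleep` as a stand-in for a real I/O wait.

```python
import asyncio

async def fake_io(name, delay, log):
    """Stand-in for an I/O request; asyncio.sleep simulates the wait."""
    log.append(f"{name} started")
    await asyncio.sleep(delay)  # control returns to the event loop here
    log.append(f"{name} done")

async def main():
    log = []
    # Both requests are in flight at once; neither blocks the other.
    await asyncio.gather(
        fake_io("slow", 0.2, log),
        fake_io("fast", 0.05, log),
    )
    return log

print(asyncio.run(main()))
# ['slow started', 'fast started', 'fast done', 'slow done']
```

The `async`/`await` keywords also hint at the trade-off in the note: the code reads almost like sequential code, while the event loop handles the asynchrony underneath.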

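Kernel threads simplify user code because most of the thread-management code moves into the kernel. The program does not need to schedule threads itself or yield the CPU explicitly. User code can be written in a natural sequential style, including calls to blocking APIs, without starving other threads. The drawback is that the kernel may switch threads at any moment, which exposes race hazards. This can happen even on a single processor and is more likely on multiprocessors.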

Some terms used in this article

race condition

Anomalous behavior due to unexpected critical dependence on the relative timing of events.
For example, if one process writes to a file while another is reading from the same location then the data read may be the old contents, the new contents or some mixture of the two depending on the relative timing of the read and write operations.

A common remedy in this kind of race condition is file locking; a more cumbersome remedy is to reorganize the system such that a certain process (running a daemon or the like) is the only process that has access to the file, and all other processes that need to access the data in that file do so only via interprocess communication with that one process.

As an example of a more subtle kind of race condition, consider a distributed chat network like IRC, where a user is granted channel-operator privileges in any channel he starts. If two users on different servers, on different ends of the same network, try to start the same-named channel at the same time, each user's respective server will grant channel-operator privileges to each user, since neither will yet have received the other's signal that that channel has been started.

In this case of a race condition, the "shared resource" is the conception of the state of the network (what channels exist, as well as what users started them and therefore have what privileges), which each server is free to change as long as it signals the other servers on the network about the changes so that they can update their conception of the state of the network. However, the latency across the network makes possible the kind of race condition described. In this case, heading off race conditions by imposing a form of control over access to the shared resource -- say, appointing one server to be in charge of who holds what privileges -- would mean turning the distributed network into a centralized one (at least for that one part of the network operation). Where this is not acceptable, the more pragmatic solution is to have the system recognize when a race condition has occurred and to repair the ill effects.

priority inversion

<parallel> The state of a concurrent system where a high priority task is waiting for a low priority task which is waiting for a medium priority task. The system may become unstable and crash under these circumstances.
In an operating system that uses multiple tasks, each task (or context) may be given a priority. These priorities help the scheduler decide which task to run next. Consider tasks L, M, and H, with priorities Low, Medium, and High. M is running and H is blocked waiting for some resource that is held by L. So long as any task with a priority higher than L is runnable, it will prevent task L, and thus task H, from running.

Priority inversion is generally considered either as a high-level design failure or an implementation issue to be taken into account depending on who is talking. Most operating systems have methods in place to prevent or take inversion into account. Priority inheritance is one method.
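The L, M, H scenario can be reproduced with a toy tick-based simulation. This is a deliberately simplified model under stated assumptions, not a real scheduler: each task needs a fixed number of ticks, L holds the lock from the start and releases it only when it finishes, and the `simulate` function and its `inherit` flag are invented for this illustration.

```python
def simulate(inherit):
    """Tick-based toy scheduler; returns the order in which tasks finish."""
    # name -> [priority, remaining ticks of work, needs the lock?]
    tasks = {"L": [1, 2, True], "M": [2, 3, False], "H": [3, 1, True]}
    holder = "L"  # L already holds the lock when the simulation starts
    finished = []

    while tasks:
        def blocked(name):
            # A task is blocked if it needs the lock and someone else holds it.
            return tasks[name][2] and holder not in (None, name)

        def effective_priority(name):
            prio = tasks[name][0]
            if inherit and name == holder:
                # The lock holder inherits the priority of its highest waiter.
                waiting = [tasks[n][0] for n in tasks if blocked(n)]
                prio = max([prio] + waiting)
            return prio

        runnable = [n for n in tasks if not blocked(n)]
        current = max(runnable, key=effective_priority)
        if tasks[current][2] and holder is None:
            holder = current           # acquire the free lock
        tasks[current][1] -= 1         # run for one tick
        if tasks[current][1] == 0:
            if holder == current:
                holder = None          # release the lock on completion
            del tasks[current]
            finished.append(current)
    return finished

print(simulate(inherit=False))  # ['M', 'L', 'H']
print(simulate(inherit=True))   # ['L', 'H', 'M']
```

Without inheritance the high-priority task finishes last, behind both lower-priority tasks. With priority inheritance, L temporarily runs at H's priority, releases the lock quickly, and H completes before M.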

The most public instance of priority inversion is the repeated 'fail-safe' rebooting of the Mars Pathfinder base station ('Sagan Memorial Station').