Real-world Concurrency (Part 2): Translation & Notes

This article discusses the core principles of writing concurrent programs. A close reading of it paid off considerably, so I am recording it here as notes.

I. Translation

Continuing from the previous installment.

Some Historical Context

Before we discuss concurrency with respect to today’s applications, it would be helpful to explore the history of concurrent execution. Even by the 1960s—when the world was still wet with the morning dew of the computer age—it was becoming clear that a single central processing unit executing a single instruction stream would result in unnecessarily limited system performance. While computer designers experimented with different ideas to circumvent this limitation, it was the introduction of the Burroughs B5000 in 1961 that proffered the idea that ultimately proved to be the way forward: disjoint CPUs concurrently executing different instruction streams but sharing a common memory. In this regard (as in many) the B5000 was at least a decade ahead of its time. It was not until the 1980s that the need for multiprocessing became clear to a wider body of researchers, who over the course of the decade explored cache coherence protocols (e.g., the Xerox Dragon and DEC Firefly), prototyped parallel operating systems (e.g., multiprocessor Unix running on the AT&T 3B20A), and developed parallel databases (e.g., Gamma at the University of Wisconsin).

In the 1990s, the seeds planted by researchers in the 1980s bore the fruit of practical systems, with many computer companies (e.g., Sun, SGI, Sequent, Pyramid) placing big bets on symmetric multiprocessing. These bets on concurrent hardware necessitated corresponding bets on concurrent software—if an operating system cannot execute in parallel, neither can much else in the system—and these companies independently came to the realization that their operating systems would need to be rewritten around the notion of concurrent execution. These rewrites took place early in the 1990s, and the resulting systems were polished over the decade; much of the resulting technology can today be seen in open source operating systems such as OpenSolaris, FreeBSD, and Linux.

Just as several computer companies made big bets around multiprocessing, several database vendors made bets around highly parallel relational databases; upstarts including Oracle, Teradata, Tandem, Sybase, and Informix needed to use concurrency to achieve a performance advantage over the mainframes that had dominated transaction processing until that time. [2] As in operating systems, this work was conceived in the late 1980s and early 1990s, and incrementally improved over the course of the decade.

The upshot of these trends was that by the end of the 1990s, concurrent systems had displaced their uniprocessor forebears as high-performance computers: when the TOP500 list of supercomputers was first drawn up in 1993, the highest-performing uniprocessor in the world was just #34, and more than 80 percent of the top 500 were multiprocessors of one flavor or another. By 1997, uniprocessors were off the list entirely. Beyond the supercomputing world, many transaction-oriented applications scaled with CPU, allowing users to realize the dream of expanding a system without revisiting architecture.

The rise of concurrent systems in the 1990s coincided with another trend: while CPU clock rate continued to increase, the speed of main memory was not keeping up. To cope with this relatively slower memory, microprocessor architects incorporated deeper (and more complicated) pipelines, caches, and prediction units. Even then, the clock rates themselves were quickly becoming something of a fib: while the CPU might be able to execute at the advertised rate, only a slim fraction of code could actually achieve (let alone surpass) the rate of one cycle per instruction—most code was mired, spending three, four, five (or many more) cycles per instruction.

Many saw these two trends—the rise of concurrency and the futility of increasing clock rate—and came to the logical conclusion: instead of spending transistor budget on “faster” CPUs that weren’t actually yielding much in terms of performance gains (and had terrible costs in terms of power, heat, and area), why not take advantage of the rise of concurrent software and use transistors to effect multiple (simpler) cores per die?

That it was the success of concurrent software that contributed to the genesis of chip multiprocessing is an incredibly important historical point and bears reemphasis. There is a perception that microprocessor architects have—out of malice, cowardice, or despair—inflicted concurrency on software. [3] In reality, the opposite is the case: it was the maturity of concurrent software that led architects to consider concurrency on the die. (The reader is referred to one of the earliest chip multiprocessors—DEC’s Piranha—for a detailed discussion of this motivation. [4]) Were software not ready, these microprocessors would not be commercially viable today. If anything, the “free lunch” that some decry as being over is in fact, at long last, being served. One need only be hungry and know how to eat!

Concurrency is for Performance

The most important conclusion from this foray into history is that concurrency has always been employed for one purpose: to improve the performance of the system. This seems almost too obvious to make explicit—why else would we want concurrency if not to improve performance?—yet for all its obviousness, concurrency’s raison d’être is increasingly forgotten, as if the proliferation of concurrent hardware has awakened an anxiety that all software must use all available physical resources. Just as no programmer felt a moral obligation to eliminate pipeline stalls on a superscalar microprocessor, no software engineer should feel responsible for using concurrency simply because the hardware supports it. Rather, concurrency should be thought about and used for one reason and one reason only: because it is needed to yield an acceptably performing system.

Concurrent execution can improve performance in three fundamental ways: it can reduce latency (that is, make a unit of work execute faster); it can hide latency (that is, allow the system to continue doing work during a long-latency operation); or it can increase throughput (that is, make the system able to perform more work).

Using concurrency to reduce latency is highly problem-specific in that it requires a parallel algorithm for the task at hand. For some kinds of problems—especially those found in scientific computing—this is straightforward: work can be divided a priori, and multiple compute elements set on the task. Many of these problems, however, are often so parallelizable that they do not require the tight coupling of a shared memory—and they are often able to execute more economically on grids of small machines instead of a smaller number of highly concurrent ones. Further, using concurrency to reduce latency requires that a unit of work be long enough in its execution to amortize the substantial costs of coordinating multiple compute elements: one can envision using concurrency to parallelize a sort of 40 million elements—but a sort of a mere 40 elements is unlikely to take enough compute time to pay the overhead of parallelism. In short, the degree to which one can use concurrency to reduce latency depends much more on the problem than on those endeavoring to solve it—and many important problems are simply not amenable to it.

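To make the amortization point concrete, below is a minimal sketch (mine, not the authors'; it assumes POSIX threads) that reduces the latency of a sort by sorting the two halves of an array in parallel and then merging them. The pthread_create()/pthread_join() calls and the final merge are exactly the coordination costs described above: negligible against a 40-million-element sort, ruinous against a 40-element one.

```c
/*
 * Minimal latency-reduction sketch: sort each half of the array in a
 * separate thread, then merge.  Illustrative only; assumes POSIX threads.
 */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct chunk { int *base; size_t n; };

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static void *sort_chunk(void *arg) {
    struct chunk *c = arg;
    qsort(c->base, c->n, sizeof (int), cmp_int);
    return NULL;
}

/* Merge the two sorted halves of src (split at mid) into dst. */
static void merge_halves(int *dst, const int *src, size_t mid, size_t n) {
    size_t i = 0, j = mid, k = 0;
    while (i < mid && j < n)
        dst[k++] = src[i] <= src[j] ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];
    while (j < n)   dst[k++] = src[j++];
}

void parallel_sort(int *a, size_t n) {
    if (n < 2)
        return;

    size_t mid = n / 2;
    struct chunk lo = { a, mid }, hi = { a + mid, n - mid };
    pthread_t t;

    /* Coordination cost #1: creating and joining the helper thread. */
    if (pthread_create(&t, NULL, sort_chunk, &lo) != 0) {
        qsort(a, n, sizeof (int), cmp_int);  /* fall back to sequential */
        return;
    }
    sort_chunk(&hi);                         /* this thread takes the rest */
    (void) pthread_join(t, NULL);

    /* Coordination cost #2: the sequential merge of the two halves. */
    int *tmp = malloc(n * sizeof (int));
    if (tmp == NULL) {
        qsort(a, n, sizeof (int), cmp_int);  /* still correct, just slower */
        return;
    }
    merge_halves(tmp, a, mid, n);
    memcpy(a, tmp, n * sizeof (int));
    free(tmp);
}
```
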
For long-running operations that cannot be parallelized, concurrent execution can instead be used to perform useful work while the operation is pending; in this model, the latency of the operation is not reduced, but it is hidden by the progression of the system. Using concurrency to hide latency is particularly tempting when the operations themselves are likely to block on entities outside of the program—for example, a disk I/O operation or a DNS lookup. Tempting though it may be, one must be very careful when considering using concurrency merely to hide latency: having a parallel program can become a substantial complexity burden to bear just for improved responsiveness. Further, concurrent execution is not the only way to hide system-induced latencies: one can often achieve the same effect by employing nonblocking operations (e.g., asynchronous I/O) and an event loop (e.g., the poll()/select() calls found in Unix) in an otherwise sequential program. Programmers who wish to hide latency should therefore consider concurrent execution as an option, not as a foregone conclusion.

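As a sketch of that sequential alternative (again mine, not the authors'), the loop below hides I/O latency with nonblocking descriptors and poll(), the kind of Unix event loop the paragraph mentions; do_other_work() is a hypothetical stand-in for whatever the program can usefully do while its operations are pending.

```c
/*
 * Latency-hiding without threads: an otherwise sequential program
 * multiplexes nonblocking descriptors with poll(2).  Illustrative only;
 * error handling is abbreviated.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

#define MAXFDS 8

static void do_other_work(void) {
    /* Hypothetical: useful work done while I/O is still pending. */
}

static void set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    (void) fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

void event_loop(const int *fds, int nfds) {
    struct pollfd pfds[MAXFDS];
    char buf[4096];

    if (nfds > MAXFDS)
        nfds = MAXFDS;

    for (int i = 0; i < nfds; i++) {
        set_nonblocking(fds[i]);
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }

    for (;;) {
        /* Wait up to 10ms; the pending operations proceed in the kernel. */
        if (poll(pfds, nfds, 10) < 0)
            break;

        for (int i = 0; i < nfds; i++) {
            if (pfds[i].revents & POLLIN) {
                ssize_t n = read(pfds[i].fd, buf, sizeof buf);
                if (n > 0)
                    printf("fd %d: %zd bytes ready\n", pfds[i].fd, n);
            }
        }

        /* The latency is not reduced, merely hidden: the program keeps
         * making progress instead of blocking on any one descriptor. */
        do_other_work();
    }
}
```
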
When problems resist parallelization or have no appreciable latency to hide, the third way that concurrent execution can improve performance is to increase the throughput of the system. Instead of using parallel logic to make a single operation faster, one can employ multiple concurrent executions of sequential logic to accommodate more simultaneous work. Importantly, a system using concurrency to increase throughput need not consist exclusively (or even largely) of multithreaded code. Rather, those components of the system that share no state can be left entirely sequential, with the system executing multiple instances of these components concurrently. The sharing in the system can then be offloaded to components explicitly designed around parallel execution on shared state, which can ideally be reduced to those elements already known to operate well in concurrent environments: the database and/or the operating system.

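One minimal sketch of this structure (my own illustration, not code from the article) is a pre-forked server: several purely sequential worker processes accept connections from one listening socket, so the only shared state, the accept queue, is managed by the operating system. The port and worker count are arbitrary and error handling is abbreviated.

```c
/*
 * Throughput via concurrent instances of sequential logic: N worker
 * processes share nothing but a listening socket, whose queue the
 * operating system manages.  Illustrative only.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 4            /* arbitrary degree of concurrency */

/* Per-connection logic: purely sequential, no locks, no shared state. */
static void handle(int fd) {
    const char msg[] = "hello\n";
    (void) write(fd, msg, sizeof msg - 1);
    (void) close(fd);
}

static void worker(int lsock) {
    for (;;) {
        int fd = accept(lsock, NULL, NULL);
        if (fd >= 0)
            handle(fd);
    }
}

int main(void) {
    struct sockaddr_in sin;
    int lsock = socket(AF_INET, SOCK_STREAM, 0);

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(8080);          /* arbitrary example port */

    if (lsock < 0 ||
        bind(lsock, (struct sockaddr *)&sin, sizeof sin) < 0 ||
        listen(lsock, 128) < 0) {
        perror("socket/bind/listen");
        return 1;
    }

    for (int i = 0; i < NWORKERS; i++) {
        if (fork() == 0) {               /* child: a sequential worker */
            worker(lsock);
            _exit(0);
        }
    }
    for (;;)
        (void) wait(NULL);               /* parent only reaps children */
}
```

This is essentially the shape of classic pre-fork servers: throughput scales with the number of workers even though every line of application logic remains sequential.
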
To make this concrete, in a typical MVC (model-view-controller) application, the view (typically implemented in environments such as JavaScript, PHP, or Flash) and the controller (typically implemented in environments such as J2EE or Ruby on Rails) can consist purely of sequential logic and still achieve high levels of concurrency, provided that the model (typically implemented in terms of a database) allows for parallelism. Given that most don’t write their own databases (and virtually no one writes their own operating systems), it is possible to build (and indeed, many have built) highly concurrent, highly scalable MVC systems without explicitly creating a single thread or acquiring a single lock; it is concurrency by architecture instead of by implementation.
