分布式系统原理介绍_分布式系统的全面介绍

最新推荐文章于 2022-10-09 15:00:00 发布

cumian9828

最新推荐文章于 2022-10-09 15:00:00 发布

阅读量4.2k

点赞数

文章标签：数据库分布式大数据 hadoop 区块链

原文链接：https://www.freecodecamp.org/news/a-thorough-introduction-to-distributed-systems-3b91562c9b3c/

版权

本文概述了分布式系统的概念，解释了为何需要分布式系统以应对扩展性和容错性的需求。介绍了数据库的水平扩展、主副本复制策略及其一致性问题。探讨了分布式计算、分布式文件系统如HDFS和IPFS，以及区块链技术的应用。强调了分布式系统复杂性和需要权衡的CAP定理。总结中提示分布式系统的复杂性，提醒避免不必要的分布式设计。

摘要由CSDN通过智能技术生成

分布式系统原理介绍

by Stanislav Kozlovski

斯坦尼斯拉夫·科兹洛夫斯基(Stanislav Kozlovski)

分布式系统的全面介绍 (A Thorough Introduction to Distributed Systems)

什么是分布式系统，为什么这么复杂？ (What is a Distributed System and why is it so complicated?)

With the ever-growing technological expansion of the world, distributed systems are becoming more and more widespread. They are a vast and complex field of study in computer science.

随着世界技术的不断发展，分布式系统变得越来越广泛。它们是计算机科学中一个广阔而复杂的研究领域。

This article aims to introduce you to distributed systems in a basic manner, showing you a glimpse of the different categories of such systems while not diving deep into the details.

本文旨在以一种基本的方式向您介绍分布式系统，向您展示这些系统的不同类别，同时又不深入细节。

什么是分布式系统？ (What is a distributed system?)

A distributed system in its most simplest definition is a group of computers working together as to appear as a single computer to the end-user.

在最简单的定义中，分布式系统是一组计算机，这些计算机一起工作，对于最终用户而言，它们就像一台计算机。

These machines have a shared state, operate concurrently and can fail independently without affecting the whole system’s uptime.

这些机器具有共享状态，可以同时运行，并且可以独立发生故障，而不会影响整个系统的正常运行时间。

I propose we incrementally work through an example of distributing a system so that you can get a better sense of it all:

我建议我们通过一个分发系统的示例来逐步进行工作，以便您可以更好地了解这一切：

Let’s go with a database! Traditional databases are stored on the filesystem of one single machine, whenever you want to fetch/insert information in it — you talk to that machine directly.

让我们来看一个数据库！传统数据库存储在单台计算机的文件系统中，只要您想在其中获取/插入信息，就可以直接与该计算机通信。

For us to distribute this database system, we’d need to have this database run on multiple machines at the same time. The user must be able to talk to whichever machine he chooses and should not be able to tell that he is not talking to a single machine — if he inserts a record into node#1, node #3 must be able to return that record.

为了分发此数据库系统，我们需要使该数据库同时在多台计算机上运行。用户必须能够与他选择的任何一台计算机交谈，并且不能告诉自己他不在与单台计算机交谈。如果用户将记录插入节点＃1，则节点＃3必须能够返回该记录。

为什么要分配系统？ (Why distribute a system?)

Systems are always distributed by necessity. The truth of the matter is — managing distributed systems is a complex topic chock-full of pitfalls and landmines. It is a headache to deploy, maintain and debug distributed systems, so why go there at all?

系统始终根据需要进行分发。事实的真相是-管理分布式系统是一个复杂的话题，充满了陷阱和地雷。部署，维护和调试分布式系统令人头疼，那么为什么要去那里呢？

What a distributed system enables you to do is scale horizontally. Going back to our previous example of the single database server, the only way to handle more traffic would be to upgrade the hardware the database is running on. This is called scaling vertically.

分布式系统允许您执行的操作是水平缩放 。回到前面的单个数据库服务器示例，处理更多流量的唯一方法是升级运行数据库的硬件。这称为垂直缩放 。

Scaling vertically is all well and good while you can, but after a certain point you will see that even the best hardware is not sufficient for enough traffic, not to mention impractical to host.

可以进行垂直扩展，虽然这样可以做得很好，但是经过一定时间后，您会发现即使最好的硬件也不足以满足足够的流量，更不用说托管主机了。

Scaling horizontally simply means adding more computers rather than upgrading the hardware of a single one.

横向扩展仅意味着添加更多计算机，而不是升级单个计算机的硬件。

It is significantly cheaper than vertical scaling after a certain threshold but that is not its main case for preference.

在一定的阈值之后，它比垂直缩放便宜得多，但这不是首选的主要情况。

Vertical scaling can only bump your performance up to the latest hardware’s capabilities. These capabilities prove to be insufficient for technological companies with moderate to big workloads.

垂直扩展只能将您的性能提升到最新的硬件功能。对于中等或大工作量的技术公司，这些功能被证明是不够的。

The best thing about horizontal scaling is that you have no cap on how much you can scale — whenever performance degrades you simply add another machine, up to infinity potentially.

关于水平缩放的最好之处在于，您对可缩放的数量没有限制-每当性能降低时，您只需添加另一台机器，直到无穷大。

Easy scaling is not the only benefit you get from distributed systems. Fault tolerance and low latency are also equally as important.

轻松扩展并不是从分布式系统中获得的唯一好处。容错和低延迟也同样重要。

Fault Tolerance — a cluster of ten machines across two data centers is inherently more fault-tolerant than a single machine. Even if one data center catches on fire, your application would still work.

容错能力 –跨两个数据中心的十台机器组成的集群本质上比一台机器具有更高的容错能力。即使一个数据中心着火了，您的应用程序仍然可以运行。

Low Latency — The time for a network packet to travel the world is physically bounded by the speed of light. For example, the shortest possible time for a request‘s round-trip time (that is, go back and forth) in a fiber-optic cable between New York to Sydney is 160ms. Distributed systems allow you to have a node in both cities, allowing traffic to hit the node that is closest to it.

低延迟 -网络数据包周游世界的时间实际上受光速的限制。例如，在纽约到悉尼之间的光缆中，请求往返时间 (即来回)的最短时间为160ms 。分布式系统使您可以在两个城市中都有一个节点，从而使流量可以到达距离它最近的节点。

For a distributed system to work, though, you need the software running on those machines to be specifically designed for running on multiple computers at the same time and handling the problems that come along with it. This turns out to be no easy feat.

但是，要使分布式系统正常工作，您需要将在这些计算机上运行的软件专门设计为可同时在多台计算机上运行并处理随之而来的问题。事实证明这并非易事。

扩展我们的数据库 (Scaling our database)

Imagine that our web application got insanely popular. Imagine also that our database started getting twice as much queries per second as it can handle. Your application would immediately start to decline in performance and this would get noticed by your users.

想象一下，我们的Web应用程序异常流行。还要想象一下，我们的数据库开始每秒收到的查询量是其处理能力的两倍。您的应用程序性能将立即开始下降，这将引起用户的注意。

Let’s work together and make our database scale to meet our high demands.

让我们一起努力，使我们的数据库规模可以满足我们的高要求。

In a typical web application you normally read information much more frequently than you insert new information or modify old one.

在典型的Web应用程序中，与插入新信息或修改旧信息相比，通常阅读信息的频率要高得多。

There is a way to increase read performance and that is by the so-called Primary-Replica Replication strategy. Here, you create two new database servers which sync up with the main one. The catch is that you can only read from these new instances.

有一种方法可以提高读取性能，即所谓的“ 主副本复制”策略。在这里，您将创建两个新数据库服务器，它们与主要数据库服务器同步。问题是您只能从这些新实例中读取。

Whenever you insert or modify information — you talk to the primary database. It, in turn, asynchronously informs the replicas of the change and they save it as well.

每当您插入或修改信息时，您都在与主数据库对话。反过来，它以异步方式通知副本，并保存更改。

Congratulations, you can now execute 3x as much read queries! Isn’t this great?

恭喜，您现在可以执行3倍的读取查询！这不是很好吗？

陷阱 (Pitfall)

Gotcha! We immediately lost the C in our relational database’s ACID guarantees, which stands for Consistency.

知道了！ 我们立即在关系数据库的ACID保证(代表一致性)中丢失了C。

You see, there now exists a possibility in which we insert a new record into the database, immediately afterwards issue a read query for it and get nothing back, as if it didn’t exist!

您会发现，现在存在一种可能性，我们可以在数据库中插入一条新记录，然后立即对其发出读取查询，并且什么也没有得到，就好像它不存在一样！

Propagating the new information from the primary to the replica does not happen instantaneously. There actually exists a time window in which you can fetch stale information. If this were not the case, your write performance would suffer, as it would have to synchronously wait for the data to be propagated.

从主数据库向副本数据库传播新信息不会立即发生。实际上存在一个时间窗口，您可以在其中获取过时的信息。如果不是这种情况，您的写入性能将受到影响，因为它必须同步等待数据被传播。

Distributed systems come with a handful of trade-offs. This particular issue is one you will have to live with if you want to adequately scale.

分布式系统需要进行一些折衷。如果要充分扩展规模，则必须解决这一特定问题。

继续扩大规模 (Continuing to Scale)

Using the replica database approach, we can horizontally scale our read traffic up to some extent. That’s great but we’ve hit a wall in regards to our write traffic — it’s still all in one server!

使用副本数据库方法，我们可以在一定程度上横向扩展读取流量。太好了，但是我们在写入流量方面遇到了麻烦-它仍然全部在一个服务器中！

We’re not left with much options here. We simply need to split our write traffic into multiple servers as one is not able to handle it.

我们这里没有太多选择。我们只需要将写入流量拆分为多个服务器，因为一台服务器无法处理。

One way is to go with a multi-primary replication strategy. There, instead of replicas that you can only read from, you have multiple primary nodes which support reads and writes. Unfortunately, this gets complicated real quick as you now have the ability to create conflicts (e.g insert two records with same ID).

一种方法是采用多主复制策略。在这里，您有多个支持读取和写入的主节点，而不是只能从其中读取的副本。不幸的是，由于您现在能够创建冲突 (例如，插入两个具有相同ID的记录)，因此这很快变得非常复杂。

Let’s go with another technique called sharding (also called partitioning).

让我们来看看另一种称为分片的技术 (也称为分区 )。

With sharding you split your server into multiple smaller servers, called shards. These shards all hold different records — you create a rule as to what kind of records go into which shard. It is very important to create the rule such that the data gets spread in an uniform way.

通过分片，您可以将服务器拆分为多个较小的服务器，称为分片。这些分片都保存着不同的记录-您创建一条规则，确定哪种记录进入哪个分片。创建规则以使数据以统一的方式传播非常重要。

A possible approach to this is to define ranges according to some information about a record (e.g users with name A-D).

一种可能的方法是根据一些有关记录的信息(例如，名称为AD的用户)来定义范围。

This sharding key should be chosen very carefully, as the load is not always equal based on arbitrary columns. (e.g more people have a name starting with C rather than Z). A single shard that receives more requests than others is called a hot spot and must be avoided. Once split up, re-sharding data becomes incredibly expensive and can cause significant downtime, as was the case with FourSquare’s infamous 11 hour outage.

分片密钥应非常谨慎地选择，因为基于任意列，负载并不总是相等的。 (例如，更多人的名字以C而不是Z开头)。收到比其他请求更多请求的单个分片称为热点，必须避免。拆分后，重新共享数据变得非常昂贵，并且可能导致大量停机，就像FourSquare臭名昭著的11个小时停机一样。

To keep our example simple, assume our client (the Rails app) knows which database to use for each record. It is also worth noting that there are many strategies for sharding and this is a simple example to illustrate the concept.

为了简化示例，假设我们的客户端(Rails应用程序)知道每个记录要使用哪个数据库。还值得注意的是，有很多分片策略，这是一个简单的示例来说明这一概念。

We have won quite a lot right now — we can increase our write traffic N times where N is the number of shards. This practically gives us almost no limit — imagine how finely-grained we can get with this partitioning.

现在我们已经赢了很多钱-我们可以将写入流量增加N倍，其中N是分片数。这几乎没有给我们任何限制-想象一下通过这种划分我们可以获得多么细的粒度。

陷阱 (Pitfall)

Everything in Software Engineering is more or less a trade-off and this is no exception. Sharding is no simple feat and is best avoided until really needed.

软件工程中的所有内容或多或少都是一个折衷，这也不例外。分片并不是一件简单的事，最好在真正需要之前避免。

We have now made queries by keys other than the partitioned key incredibly inefficient (they need to go through all of the shards). SQL JOIN queries are even worse and complex ones become practically unusable.

现在我们已经通过按键进行查询 除了分区密钥效率极低之外(它们需要遍历所有分片)。 SQL JOIN查询甚至更糟，复杂的查询实际上变得不可用。

分散与分布式 (Decentralized vs Distributed)

Before we go any further I’d like to make a distinction between the two terms.

在继续之前，我想对两个术语进行区分。

Even though the words sound similar and can be concluded to mean the same logically, their difference makes a significant technological and political impact.

尽管这些单词听起来很相似，并且在逻辑上可以得出相同的结论，但它们之间的差异会产生重大的技术和政治影响。

Decentralized is still distributed in the technical sense, but the whole decentralized systems is not owned by one actor. No one company can own a decentralized system, otherwise it wouldn’t be decentralized anymore.

从技术意义上讲，去中心化仍然是分布式的，但是整个去中心化系统不是由一个参与者拥有的。没有一家公司可以拥有去中心化的系统，否则就不会再去中心化了。

This means that most systems we will go over today can be thought of as distributed centralized systems — and that is what they’re made to be.

这意味着我们今天将要讨论的大多数系统都可以被视为分布式集中式系统 ，这就是它们的本质。

If you think about it — it is harder to