For the Layman: What Is a Distributed System? (Part 1)

This article was originally published on my homepage — https://www.kislayverma.com/post/for-the-layman-ep-1-what-is-a-distributed-system

Hello folks!

I am starting a new series of articles called “For the Layman” to cover some frequently encountered software engineering concepts for non-developers or beginners. The articles will try to explain these concepts in simple terms with as little jargon as I can manage.

In this first article of the series, let’s understand distributed systems.

As the name suggests, a distributed system is a system whose components are distributed. Let’s look at both those words individually.

What Is a System?

A system is a set of parts working together to deliver certain functionality. A clock is a system of springs and gears that tells time reliably. A car is built up of many parts which allow us to be driven from one place to another. Let’s call each part a component.

Monoliths: not distributed systems

Most mechanical systems are not and cannot be distributed. Components in most hardware systems assume local availability of their partnering components. Piston rods expect to be attached to crankshafts, keyboards expect to be connected to processors, etc. A software system built along these lines is sometimes called a monolith.

Vertical scaling

A side effect of having to put all components next to each other is that to build a more powerful system, we need to fit more pieces on the same chassis (so to speak). If we want more power so our car can go faster, we need a larger engine with more cylinders. This in turn requires a larger car, and so on.

You can also think of trying to add more power to your laptop by adding more processors or more memory. It can be done but makes the laptop larger and larger — eventually, we end up with a desktop rather than a laptop. This process of adding more and more power to a single physical system is called vertical scaling. It is an important strategy (Moore’s law is essentially about vertical scaling) but comes with hard physical limits that are very difficult to surmount.

Location coupling

The components of a mechanical system also assume certain guarantees when collaborating with each other. No unreliability is expected between the turning of a gear and the turning of the clock hand (there may be errors of precision, but that is not the question here). A clock is designed with the assumption that certain things will cause certain other things to happen, and if they don’t, then the clock is considered broken. The functioning of the complete system depends strongly on all of the components always being physically present in a certain location at a certain time. This type of dependency is a form of tight coupling between components.

Globally consistent state

This hard dependency on all parts strictly working together has an interesting implication. It means that if we know what state (e.g., position, orientation) one component is in, we necessarily also know the states of all other components. If component A is not in the position that component B expects it to be, then we have a problem. As a result, our knowledge of the system at any point in time is complete and consistent. This is called global state.

As you may be able to see by now, it is difficult to build very large systems when everything must be co-located and must work all the time. A large system built using these principles is brittle — any small failure can cause a complete outage. It is also not scalable — as the size grows, not only do we have to keep fitting everything next to each other (imagine an engine with thousands of cylinders, all of which must be next to each other and must coordinate completely) but we also have to have complete knowledge of all components at all times to be able to understand if the system is working properly. The cognitive load such a system creates is tremendous and increases exponentially with every new component.

Modern software architecture is as much about handling unprecedented scale as it is about solving business problems. Whether it is the billions of people using Facebook or the surges of online shoppers on Singles Day, software systems today are expected to deliver tremendous performance and continue to function even when parts fail. To fulfil these requirements, distributed systems have emerged as an alternative paradigm for constructing systems built out of many components.

What Is a Distributed System?

Server: A computer connected to the internet or some other network.

Distributed systems are made up of “independent” components which are not necessarily located next to each other.

This seemingly simple definition of distributed systems has huge ramifications and gives these systems their unique strengths and weaknesses. Let’s cover some of these in detail.

Location transparency

Components in a distributed system communicate with each other via methods/protocols that don’t require the calling component to know where the called component is located. So the engine could be sitting at home even while you are driving your car. Somehow, when the accelerator is pressed, the engine generates more power, which is somehow transferred to the wheels. Another analogy is the remote working style we now see everywhere. Team members are not physically located together but still cooperate by performing their respective jobs toward a shared objective.

This is called location transparency, which is a form of loose coupling. (All developers get a dreamy-eyed look when they hear this term. Try it!)

The advantages of location transparency are obvious. If software components need not be co-located, then we can move them to different physical machines, each of which can then be vertically scaled. That is, we can buy powerful machines for each component separately instead of fitting all components on one machine. This directly leads to a more powerful system. Note that we are not mandating that components must be located on separate servers, just that it shouldn’t matter where they are located as long as there is a way to locate them.

How do these distributed components find each other? There are many ways to do this. One of the most popular mechanisms is DNS (Domain Name System) which maps names to IP addresses (unique identities of servers all over the internet). The backbone of all networking is the ability to locate one specific machine given its IP address or domain name. This seemingly small (but actually extremely complicated) technique has allowed software to eat the world.
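
To make the idea of finding a component by name more concrete, here is a minimal Python sketch. It does not use real DNS. The `NameRegistry` class, the component name "engine", and the addresses are all made up for illustration. The point is only that the caller asks for a name and never hard-codes where the component lives.

```python
# A toy, in-memory stand-in for a name service such as DNS.
# NameRegistry, "engine", and the addresses below are hypothetical.
class NameRegistry:
    def __init__(self):
        self._addresses = {}

    def register(self, name, address):
        """Record (or update) where the component called `name` lives."""
        self._addresses[name] = address

    def resolve(self, name):
        """Return the current address for `name`; raises KeyError if unknown."""
        return self._addresses[name]


registry = NameRegistry()
registry.register("engine", "10.0.0.5")      # the engine starts on one machine
first_lookup = registry.resolve("engine")

registry.register("engine", "10.0.0.9")      # the engine moves to another machine
second_lookup = registry.resolve("engine")   # the caller still only asks for "engine"

print(first_lookup, second_lookup)
```

The caller's code is identical before and after the move, which is the practical meaning of location transparency.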

Partial failure mode

A fallout of the distributed nature is that our failure mode is not all-or-nothing anymore. The card reader component may have failed but the account management component may be running. This means that some functionality related to account management may still be accessible even though our card swiping users are frustrated.
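
The card reader example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two functions stand in for separate components, and `ComponentDown` is a made-up error type. The sketch shows one component failing while the other keeps answering.

```python
class ComponentDown(Exception):
    """Hypothetical error raised when a component cannot be reached."""


def card_reader(user):
    # Simulate a component that has failed.
    raise ComponentDown("card reader is not responding")


def account_management(user):
    # Simulate a component that is still healthy.
    return f"balance for {user}: 100"


COMPONENTS = {"swipe": card_reader, "balance": account_management}


def handle(feature, user):
    """Call one component; a failure there does not take down the others."""
    try:
        return COMPONENTS[feature](user)
    except ComponentDown as error:
        return f"'{feature}' is temporarily unavailable ({error})"


swipe_result = handle("swipe", "alice")      # fails, but only this feature
balance_result = handle("balance", "alice")  # still works

print(swipe_result)
print(balance_result)
```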

Horizontal scalability

Yet another corollary of location transparency is that there need not be only one instance of a component. If it doesn’t matter where the engine is located, we can now add ten or more independent engines to add that much more power.

It is even possible to add and remove engines from the car as needed. This ability to add more instances of a component is called horizontal scaling and is the currently preferred mechanism for scaling software systems since it bypasses the physical limitations of how powerful a single server can be — we just add more low-power servers to compensate.
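
As a rough illustration of adding and removing engines on demand, the sketch below spreads work across a pool of interchangeable instances by taking turns (often called round-robin). The `EnginePool` class and the engine names are hypothetical, and real systems typically use a dedicated load balancer rather than a hand-written class like this.

```python
class EnginePool:
    """A toy pool that hands out interchangeable engines in turn."""

    def __init__(self, engines):
        self._engines = list(engines)
        self._next = 0

    def add(self, engine):
        self._engines.append(engine)

    def remove(self, engine):
        self._engines.remove(engine)

    def pick(self):
        """Return the next engine, cycling through whatever is in the pool."""
        engine = self._engines[self._next % len(self._engines)]
        self._next += 1
        return engine


pool = EnginePool(["engine-1", "engine-2"])
first_round = [pool.pick() for _ in range(4)]   # work alternates between two engines

pool.add("engine-3")                            # scale out: add another instance
second_round = [pool.pick() for _ in range(3)]  # the new engine now shares the work

print(first_round)
print(second_round)
```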

So we have a bunch of components living on different servers (machines) and communicating over a network (internet/LAN). Is that it?

Eventual consistency

There is one more interesting thing to understand here: The network is unreliable and slow.

If your Zoom call has ever hung mid-sentence or YouTube has buffered, you know what I’m talking about. Data is sent from one component to the other, but sometimes doesn’t reach it or reaches it after some noticeable time. Maybe the wire is cut, maybe the other component reads the data but crashes before it could do anything with it, maybe a lot of data was flowing over the wire and hence everything is stuck.
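
One common way software copes with messages that get lost is simply to try again. The article itself does not prescribe this, so treat the following as an illustrative sketch only. `FlakyNetwork` is a made-up stand-in for a real network that drops the first few messages, and `send_with_retry` is a hypothetical helper.

```python
class FlakyNetwork:
    """A pretend network that loses a fixed number of messages, then works."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.delivered = []

    def send(self, message):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("message lost on the wire")
        self.delivered.append(message)
        return "ack"


def send_with_retry(network, message, max_attempts=3):
    """Try to send; on failure try again, giving up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return network.send(message), attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do


network = FlakyNetwork(failures_before_success=2)
reply, attempts_used = send_with_retry(network, "hello")

print(reply, attempts_used)
```

Retrying is not free: if a message actually arrived and only the reply was lost, a retry delivers it twice, so the receiving component has to be prepared for duplicates.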

Whichever way this occurs, it results in different parts of the system not having complete information about each other. This happens all the time in the systems we encounter on the web: payment was taken but the order could not be placed (the payment component is not able to talk to the order component), or a money transfer is triggered but will only be reflected in 24 hours (the transfer component knows that there is to be a transfer, but the account component has not been told yet).

Another way of putting this is to say that the independent components of a distributed system understand their own state (e.g., position, orientation, amount of load) absolutely but may be out of sync (to a greater or lesser extent) with other components. The out-of-sync-ness is a result of the network acting as a queue of unshared knowledge between them.

This is called inconsistency and is the flip side of the global state we encountered in non-distributed systems. The remedy for this in distributed systems is eventual consistency, meaning we must implement mechanisms which will ensure that all components eventually agree with each other on what the overall state of the system is. Note that this is a catch-up game, and components are out of sync by design rather than by mistake. This gives us some buffer time in which we can perform knowledge transfer instead of making every component aware of everything instantaneously (a physical impossibility in the distributed world, given that nothing can travel faster than the speed of light).
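
The money transfer example from earlier can be simulated to show what "eventually" means. This is a toy model under stated assumptions: the `Component` class is hypothetical, and its `inbox` plays the role of the network holding knowledge that has not been shared yet.

```python
from collections import deque


class Component:
    """A toy component with its own state and a queue of undelivered updates."""

    def __init__(self, name):
        self.name = name
        self.balance = 0
        self.inbox = deque()   # updates still "in flight" on the network

    def apply_pending(self):
        """Catch up by applying every update that has arrived."""
        while self.inbox:
            self.balance += self.inbox.popleft()


transfer = Component("transfer")
account = Component("account")


def record_transfer(amount):
    transfer.balance += amount     # the transfer component knows immediately
    account.inbox.append(amount)   # the account component will learn later


record_transfer(50)
before_sync = (transfer.balance, account.balance)   # the two views disagree

account.apply_pending()                             # the update finally arrives
after_sync = (transfer.balance, account.balance)    # the two views agree again

print(before_sync, after_sync)
```

Between `record_transfer` and `apply_pending`, the two components give different answers to the same question. That window is the inconsistency the system is designed to tolerate.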

The bad parts

While distributed systems can be extremely resilient to failures and very responsive under high loads, building well-designed distributed systems is an extremely complicated undertaking.

The first problem is user experience. There is no way to hide the eventually consistent nature of the system from the users. With instant gratification being the increasingly accepted norm, it can sometimes take a lot of clever UX to keep the system distributed and the user happy.

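The second problem is complexity. A distributed system is necessarily much more complicated than a monolithic (everything in one place) design, because it has to cope with everything the “fallacies of distributed computing” warn about: tempting but false assumptions such as the network being reliable and latency being zero. A lot of new tools have evolved to help developers build reliable distributed systems, but this is still far from an easy or solved problem.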

I hope you got a basic understanding of distributed systems from this article. If you are a beginner and found that parts of the article were still too technical to understand, let me know in the comments and I will try to break things down in simpler terms.
