A Manifesto for Live Sharable Data

weixin_26721865
Original article: https://codeburst.io/a-manifesto-for-live-sharable-data-4869362dfea5

The ‘truth’ should be the data that is being used, not the data in distant storage.

Distribute the data automatically, with the guarantee that all of it will converge on the same ‘truth’.

Use a published open standard for encoding data with its meaning, and communicating changes to it.

Hi, I’m George. This year I left my day job as a software engineering leader and plunged into lockdown under a mountain of work, uncertainty, and risk. Last week, I pushed the button to launch the m-ld Developer Preview. The period between now and when lockdown began has been a mad journey filled with moments of creativity, anxiety, frustration, imposter syndrome, fight and flight urges, elation and time-dilation, and so! much! coffee!

The Why

As a data management app developer, I’ve used many ways to encode and store data. Frequently, they are combined in the same architecture with one of the locations becoming known as the ‘central truth’:

Figure: The software application is distributed, using many intermediate data representations, but the ‘truth’ is only in the database.

While the specific technologies vary, the overall pattern is very common. Motivations include properties of security, integrity, consistency, operational efficiency, and cost. However, there are some other peculiar properties that stand out:

The ‘truth’ is on the far right-hand side; but the data is being used throughout, with the particular value being realized on the left.
The software application is responsible for both distributing the data and for operating on it.
Every encoding syntax is specific to a technology and does not expose the data’s meaning enough to be independently understood.

The main consequence of these properties is application code complexity. The code has to keep careful track of how current (how close to the truth) each copy of the data is, operate on the data accordingly, and share that understanding with other components. This is hard and frequently goes awry, resulting in software bugs that are very hard to reproduce, let alone fix.

In this blog, I’ll argue that — with recent advances in computer science — we can make improvements to this for many applications. Applying our manifesto, we want our architecture to look more like this:

Figure: The ‘truth’ is now the objects in the software application code, and the database is just a backup.

The How

One thing to notice in the centralized data pattern is that we’re taking each encoding of the data and translating it into a new one so as to make it suitable for either computation, storage, added security, or for any other reason. At each translation, the complexity of keeping the new encoding up-to-date with the previous ones ramps up.

But what if we did away with the idea of re-encoding the current data, and instead transacted in changes? Humans do this naturally. When having a conversation about some information, we don’t re-state it every time we want to adjust it. We refine information by discussing the delta between the old and the new. And we naturally switch between re-statement and deltas as required.
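
To make the contrast between re-statement and deltas concrete, here is a minimal Python sketch. The record, the delta shape ({"set": ..., "unset": ...}) and the function name apply_delta are all assumptions made for illustration; they are not taken from m-ld or from any standard.

```python
from copy import deepcopy


def apply_delta(state, delta):
    """Return a new state with `delta` applied; the input state is not modified.

    The delta shape {"set": {...}, "unset": [...]} is a hypothetical
    convention chosen for this sketch.
    """
    new_state = deepcopy(state)
    for key in delta.get("unset", []):
        new_state.pop(key, None)
    new_state.update(delta.get("set", {}))
    return new_state


# Full re-statement: every consumer would receive the whole record again.
task_v1 = {"title": "Write manifesto", "status": "draft", "reviewer": "nobody yet"}

# Delta: only the difference between the old and the new.
delta = {"set": {"status": "published"}, "unset": ["reviewer"]}

task_v2 = apply_delta(task_v1, delta)
print(task_v2)
```

The delta is smaller than the record and says what changed, but every consumer now needs its own equivalent of apply_delta for its own encoding of the data.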

This concept is nothing new in software either: event-driven architectures have been a common paradigm since at least the mid-2000s. But consumers of ‘events’ have a new problem: they must apply each change to their own encoding of the current data. This spreads logically duplicate program code across every consumer, and the number of bugs grows at least linearly with lines of code. Even worse, event ordering is critical, so coordinating the totally ordered log of events becomes the new centralized ‘truth’ (and a literally bigger one).
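
A small Python sketch shows why ordering matters to event consumers. The event names (set_price, apply_discount_percent) and the price record are hypothetical, chosen only because the two operations do not commute.

```python
def apply_event(state, event):
    """Apply one (kind, value) event to a state dict and return the new state."""
    kind, value = event
    if kind == "set_price":
        return {**state, "price": value}
    if kind == "apply_discount_percent":
        # Integer arithmetic keeps the example exact.
        return {**state, "price": state["price"] * (100 - value) // 100}
    raise ValueError(f"unknown event kind: {kind}")


def replay(state, events):
    """Fold a sequence of events over an initial state."""
    for event in events:
        state = apply_event(state, event)
    return state


initial = {"price": 100}
events = [("set_price", 200), ("apply_discount_percent", 10)]

in_order = replay(initial, events)
reordered = replay(initial, reversed(events))

print(in_order, reordered)
```

The two consumers saw the same events and still ended up with different prices, which is why something has to own the ordering of the log.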

Let’s deal with the code duplication issue first. Being good engineers we take care not to repeat ourselves but this becomes hard to do when re-stating something in different languages. So, what if we had a common language for data? One that could express both state and changes to state? Since we’re here, let’s have one in which we can encode the meaning of the data, per our manifesto, that includes a natural way to identify data universally. And further, can we have one for which native, widely-available, battle-hardened database engines exist so that sometimes we don’t have to translate anything at all?
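
As a rough illustration of what such a common language could look like, the Python sketch below models data as a set of (subject, property, value) statements with globally unique identifiers, and models a change as statements to delete plus statements to insert. This triple-style model is an assumption made for the example, since the paragraph does not name the standard; the example.org identifiers and the "delete"/"insert" keys are hypothetical.

```python
# Identifiers are IRIs, so any party can tell what a statement is about.
TASK = "http://example.org/tasks/1"            # hypothetical identifier
NAME = "http://schema.org/name"
STATUS = "http://example.org/vocab#status"     # hypothetical vocabulary term

# State: a set of (subject, property, value) statements.
state = {
    (TASK, NAME, "Write manifesto"),
    (TASK, STATUS, "draft"),
}

# Change: expressed in the same vocabulary as the state itself.
change = {
    "delete": {(TASK, STATUS, "draft")},
    "insert": {(TASK, STATUS, "published")},
}


def apply_change(state, change):
    """Return the state with deleted statements removed and inserted ones added."""
    return (state - change["delete"]) | change["insert"]


new_state = apply_change(state, change)
print(sorted(new_state))
```

Because state and change share one representation, apply_change does not need to know anything about tasks; the same few lines would serve every consumer.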

Sounds like a big ask. Luckily, academia and industry have been working on it for some time. But let’s look at the other problem: change ordering.

Change Ordering

Imagine if you shared some information with a friend and then every thought you had about it couldn’t start until your friend finished whatever thought they were having about it. This is the strictest way that centralized data management systems maintain consistency.

To mitigate the impact of this on the fluency of data manipulation, there are various strategies available including fine-grained locking, optimistic locking, and a choice of transaction isolation levels. These have various merits, but each of them re-introduces some of the very distributed application complexity we were trying to reverse, and they still require the central ordered log. What if we went the other way, and just removed the ordered log entirely?
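
Optimistic locking is a good example of the complexity that comes back. In the Python sketch below, a hypothetical in-memory VersionedStore stands in for the central database; the class and method names are assumptions for illustration, not a real library.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """A single value guarded by a version counter (optimistic locking)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version):
        # Reject the write if someone else has written since we read.
        if expected_version != self.version:
            raise VersionConflict(
                f"expected version {expected_version}, store is at {self.version}"
            )
        self.value = new_value
        self.version += 1


store = VersionedStore("draft")

# Two clients read the same version concurrently.
_, version_seen_by_a = store.read()
_, version_seen_by_b = store.read()

store.write("published", version_seen_by_a)  # succeeds

try:
    store.write("archived", version_seen_by_b)  # based on stale data
    second_write_rejected = False
except VersionConflict:
    second_write_rejected = True

print(store.read(), second_write_rejected)
```

The store stays consistent, but the second client now has to re-read, re-decide and retry, and that logic lives in application code.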

There are two approaches to concurrency control that don’t need a total ordering of changes. One is called Conflict-free Replicated Data Types (CRDTs), and the other Operational Transformation (OT). These do provide the required guarantee that copies of the data will converge to the same ‘truth’. But they don’t remove the possibility that concurrent changes will disagree with each other and lead to a ‘truth’ that doesn’t make sense.
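
Both halves of that statement can be seen with one of the simplest CRDTs, a grow-only set whose merge operation is set union. The Python sketch below is an illustration under that assumption only; it is not the data type used by m-ld, and the task and assignee names are hypothetical.

```python
def merge(a, b):
    """Merge two grow-only sets. Union is commutative, associative and idempotent."""
    return a | b


# Two replicas start from the same empty state and change it concurrently.
base = frozenset()
replica_1 = base | {("task-1", "assignee", "alice")}
replica_2 = base | {("task-1", "assignee", "bob")}

# Each replica receives the other's state, in opposite orders.
merged_at_1 = merge(replica_1, replica_2)
merged_at_2 = merge(replica_2, replica_1)

# Convergence: no ordered log was needed to reach the same result.
converged = merged_at_1 == merged_at_2

# But suppose the application rule is "a task has exactly one assignee".
assignees = sorted(
    value
    for (subject, prop, value) in merged_at_1
    if subject == "task-1" and prop == "assignee"
)
violates_rule = len(assignees) > 1

print(converged, assignees, violates_rule)
```

Both replicas agree on the result, yet the result breaks a rule that neither change broke on its own. Convergence is guaranteed by the data type; making sense is still up to the application and its users.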

But wait, you and your friend had no trouble refining your shared information, with no deterministic coordination whatsoever. How? Humans employ myriad strategies for coordination. You withhold thoughts while someone else is talking. You undo and redo thoughts against new information, both before and after expressing them. You notice conflicts that corrupt the information or render it illogical, apply obvious resolutions, and negotiate others. You actively seek consensus or delegate decisions.

In the case of document editing, we can go further and notice that given a foundational level of concurrency control in the software — Google Docs uses OT — editing by multiple humans works fine and doesn’t require much explicit coordination at all. Research groups have found that this applies just as well to CRDTs.

Conclusion

There are many finer details to explore in practice but we have established that our manifesto can be met, in principle, with the application of current computer science. The approach that we’ve taken with m-ld is to provide a protocol, with implementing engines, for distributing data in a distributed application. Here are some of the points we covered in this article:

The ‘truth’ is the data exposed to the app by the engine.
The data is automatically distributed by the engine with the guarantee that all engines will converge on the same ‘truth’.
We use an open standard for encoding data with its meaning, and communicating changes to it.

For now, we’re proving out the tech and filling out the corners that we think are essential for collaboration and autonomy use-cases. But we think we’re onto something important to data architectures in general.

And we’d love to hear what you think. If you’re ready to try m-ld out, you can work with the Developer Preview right now. Let us know what you’re building!

Image credits: icons by xnimrodx from flaticon.com. Banner photo by Vienna Reyes on Unsplash.
