Adobe I/O Events: Building a Distributed Linked List on S3 (Part II)

In our previous post, we introduced the concept of a Distributed Linked List. We also talked about how Adobe I/O Events implemented it and why those implementations failed.

After pushing out our second implementation we were keen on rearchitecting our Journaling subsystem to be more performant, cost-effective, and most importantly scalable.

From the shortcomings of the first two implementations, we had already learned that utilizing an object store (such as AWS S3) to store the event payloads was the right idea, as long as we could batch our events together in a way that facilitated event reads.

We also learned that performing a DB query in the critical path of reading events would continue to prove to be the bottleneck. And, hence, to achieve a truly scalable system we would need to eliminate all database interactions in reading events.

Putting both those learnings together, and accounting for the fact that our API, to a great extent, resembled traversing a linked list, we started to explore the possibility of building a linked list of S3 objects, where each S3 object would not only contain a batch of events (the data) but also the location of the next S3 object in the list.
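
To make this concrete, here is a minimal sketch of what one node of such a list could look like. The record name and fields are illustrative assumptions for this post, not the actual Adobe I/O Events format:

```java
import java.util.List;

// Hypothetical layout of one list node, stored as a single S3 object:
// a batch of event payloads plus the S3 key of the next node in the list.
public record JournalNode(List<String> events, String nextObjectKey) {}
```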

And, if we could build such a list, reading events would be a simple list traversal that eliminates any kind of database lookups. Not only that, but the event reads would have the potential to scale as much as AWS S3 itself. And if we played our cards right, the cost of reading events could be as low as S3 GET requests ($4 per 10 million requests). Ka-ching!
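
As a rough illustration, the read path could then look like the sketch below. It assumes the JournalNode layout above, a JSON-serialized object body, and the AWS SDK for Java (v1 here); parseNode and deliver are hypothetical helpers, not our production code:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.InputStream;

public final class JournalReader {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    private final ObjectMapper mapper = new ObjectMapper();

    // Traverse the list: one S3 GET per node, no database lookups at all.
    public void traverse(String bucket, String startKey) throws Exception {
        String key = startKey;
        // The tail node points at an object that has not been written yet,
        // so traversal stops when the next object does not exist.
        while (key != null && s3.doesObjectExist(bucket, key)) {
            JournalNode node = parseNode(s3.getObject(bucket, key).getObjectContent());
            node.events().forEach(this::deliver);
            key = node.nextObjectKey(); // follow the embedded "next" pointer
        }
    }

    private JournalNode parseNode(InputStream body) throws Exception {
        return mapper.readValue(body, JournalNode.class); // assumes JSON bodies
    }

    private void deliver(String event) {
        System.out.println(event); // placeholder for handing the event to a consumer
    }
}
```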

The hard problem

Reading the events from the list was never the hard problem; the hard problem was writing those events into the list in the first place, in a manner that is scalable and works in near real-time.

For true scalability and availability, we were sure that we would ultimately need multiple compute instances (containers) writing to a list concurrently, because a single container could only be scaled up so much.

And, thus, what we really needed to solve was writing concurrently to a shared resource in a distributed environment. We also needed to ensure that our solution was "mathematically" correct and guaranteed to never fail, or else we might end up losing or corrupting customers' data.

Distributed Locking

At first glance, the problem of writing concurrently could seemingly be solved with a lock that guarantees mutual exclusion. But once we went further down this path, first with a Redis-based Distributed Lock (Redlock) and then with ZooKeeper, we learned the following:

  1. Even though correctly locking between two threads in a JVM, or between two processes on the same machine, is hard, it is a solved problem.

  2. However, achieving the same mutual-exclusion guarantee in a distributed environment is not a solved problem. We could guarantee either mutual exclusion or liveness, but not both, and we needed both. Martin Kleppmann has written an enlightening article analyzing Redlock on exactly this point.

Luckily, we also discussed our approach with experienced folks internally —

The difference between ‘a distributed lock’ and ‘a distributed lock that works in practice and is on the critical path for high throughput writes and magically is not the bottleneck’ is ~50 person years. [sic] — Michael Marth (Director, Engineering)

Lastly, there was one more consideration: we could not adopt an approach that was operationally heavy or that needed much human intervention.

No major cloud provider provides a managed ZooKeeper service. If you want us to run ZooKeeper ourselves, we will need another full time DevOps engineer — Sathyajith Bhat (Senior DevOps Engineer & AWS Community Hero)

All in all, the best thing we did with distributed locking was to eliminate it as an approach early on.

A Lock-free solution

After weeks of whiteboarding, we finally had a breakthrough. Re-evaluating the problem at hand, we realized that instead of acquiring a lock and writing to the linked list inside the critical section, we could break the whole process down into two steps —

  1. Order the writes by simply assigning each “write-task” a monotonically increasing number. And then,

  2. Perform the actual writes into the list in the determined order.

Both of the above steps could run concurrently on any number of containers without affecting the correctness of the algorithm and without corrupting any data. The only thing we had to ensure was that the second step, writing the events into the list, was idempotent.
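
A minimal sketch of this two-step write path might look like the following. Here sequencer(), keyFor(), putNode(), and batchOfEvents are hypothetical stand-ins for the real components described later in this post, not the actual implementation:

```java
// Illustrative sketch; all helper names are assumptions for this post.
void writeBatch(java.util.List<String> batchOfEvents) throws Exception {
    // Step 1: order the write. Any number of containers may race here; the
    // sequencer (backed by the database, as described below) hands out each
    // monotonically increasing number exactly once.
    long seq = sequencer();

    // Step 2: perform the write in the determined order. The target key is
    // derived only from seq, so a crashed-and-retried write lands on the same
    // key with the same bytes -- in other words, the step is idempotent.
    String key = keyFor(seq);
    String next = keyFor(seq + 1); // each node dictates where its successor must live
    putNode(key, new JournalNode(batchOfEvents, next));
}
```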

[Illustration: The Lock Free Solution in action]

Implementation: Technology choices

Ideas are a dime a dozen and the true test is always to implement and execute the vision. The new journaling system took two months to go from conceptualization to production — this was, quite frankly, impossible without the technologies available to us out-of-the-box.

This section calls out some of the technology choices we made and how they affected the design of our system, the API, and the algorithm itself.

AWS RDS MySQL

MySQL was our technology of choice. Not only did we heavily depend on the auto-increment id feature to order the event writes, but we were also only able to guarantee the correctness of our algorithm because we used database transactions in MySQL.
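
As a sketch of how an AUTO_INCREMENT id can order the writes, consider the JDBC snippet below. The write_task table and its columns are assumptions for illustration, not the production schema:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public final class WriteSequencer {
    // Assumed table, for illustration only:
    //   CREATE TABLE write_task (
    //     id          BIGINT AUTO_INCREMENT PRIMARY KEY,
    //     payload_ref VARCHAR(255) NOT NULL
    //   );
    public static long order(Connection db, String payloadRef) throws Exception {
        db.setAutoCommit(false);
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO write_task (payload_ref) VALUES (?)",
                Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, payloadRef);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                keys.next();
                long id = keys.getLong(1); // this task's position in the write order
                db.commit();               // the transaction makes ordering atomic
                return id;
            }
        } catch (Exception e) {
            db.rollback();
            throw e;
        }
    }
}
```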

Secondly, AWS RDS is a managed service, and there is no denying the fact that we love managed services! RDS provided us with high durability, lossless replication, and excellent uptime right out of the box, reducing our operational load tremendously.

Lastly, to write events at scale to the list, we knew that the database again could become the bottleneck. Hence, we introduced a time-based batching of event writes which made sure that no matter the ingestion load on our system, there was a practical limit to the number of database interactions we would make to process that load. This critical piece of the puzzle made sure that our database could always handle the incoming load.
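
A stripped-down sketch of such time-based batching is shown below; the interval, types, and persistBatch placeholder are assumptions, not the production values:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class TimeBasedBatcher {
    private final List<String> buffer = new ArrayList<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public TimeBasedBatcher(long flushIntervalMillis) {
        // One flush per interval means one database interaction per interval,
        // no matter how many events arrive in between.
        scheduler.scheduleAtFixedRate(
                this::flush, flushIntervalMillis, flushIntervalMillis, TimeUnit.MILLISECONDS);
    }

    public synchronized void add(String event) {
        buffer.add(event);
    }

    private void flush() {
        List<String> batch;
        synchronized (this) {
            if (buffer.isEmpty()) return;
            batch = new ArrayList<>(buffer);
            buffer.clear();
        }
        persistBatch(batch); // placeholder: one DB write for the whole batch
    }

    private void persistBatch(List<String> batch) {
        System.out.println("flushing " + batch.size() + " events in one DB interaction");
    }
}
```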

AWS S3

The only thing we were sure of when we started on this re-architecture journey was that we needed to use an object store such as S3. To simply say that S3 influenced us would be a gross understatement; the object store, quite literally, was and is the central piece of the journaling subsystem. Here are some of the ways it influenced the design:

  1. In a traditional linked list construction, a new node is appended by updating the next pointer on the tail node. However, because an object in S3 cannot be partially updated or appended to, the distributed linked list could not be constructed in the traditional sense. Instead, each S3 object in our list dictates where the next S3 object has to be (see the sketch after this list).

  2. In the previous implementation, we used to fetch multiple S3 objects to serve a single read request. Not only did this consume more resources, but, more importantly, it was very costly. This time we made a subtle change to our API semantics, which ensured that we never have to fetch more than a single S3 object for a single read request. Thus, the 20M read requests that we serve in production every day cost us only $8. I like the fact that our production system spends less money serving requests daily than I spent on coffee while I was whiteboarding it.

  3. To maximize S3’s performance we followed its recommendation on object key prefixes and used completely random strings as object keys, even though it meant that we could not list S3 objects meaningfully. Furthermore, we even implemented the capability to add more S3 buckets on the fly, enabling us to add more capacity to our system on demand.

  4. Lastly, to talk about S3 and not talk about eventual consistency would be almost cheating. Anticlimactically, however, we vigorously exploited S3's read-after-write consistency to transfer and process large amounts of event data. The only place where eventual consistency played a role was in the API that pulled the events. Here, the eventual-consistency window was easily absorbed by our near real-time guarantee.

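The sketch below shows one way to square points 1 and 3 above. This is our own assumption for illustration, not the documented key scheme: hashing a node's sequence number yields a key that looks uniformly random (good for S3 key-prefix distribution) yet is deterministic, so any writer can compute where node n and node n+1 must live without listing the bucket:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

public final class NodeKeys {
    // Deterministic but uniformly distributed: hash the sequence number.
    public static String keyFor(long sequence) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha.digest(
                Long.toString(sequence).getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest); // e.g. "9f86d0..." -- no meaningful prefix
    }
}
```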

What did we learn?

  1. Always understand the problem you’re trying to solve. First, find out how the system is intended to be used and only then design it. Also, find out the aspects in which the system is allowed to be less than perfect — for us, it was being near real-time. (Batching, eventual consistency)

  2. Do not design a system first and then try to fit various technologies into it; rather, design the architecture around the strengths and constraints of the underlying components you use. (S3, MySQL)

  3. Listen to experienced folks to cut losses early. (Distributed Locks, ZooKeeper)

Source: https://medium.com/adobetech/adobe-i-o-events-building-a-distributed-linked-list-on-s3-part-ii-1538aa824833
