In Pursuit of Perfect Locking

Original article: https://medium.com/expected-behavior/in-pursuit-of-perfect-locking-4c054ee32a61

By Nathan Acuff and Jason Gladish

Phil Karlton famously noted that there are only two hard problems in Computer Science: cache invalidation and naming things. To this list, many have added off-by-one errors. What happens when you decide to take on a problem that involves all three? You start looking for a good distributed locking library.

Sometimes, you really do need perfect locking — maybe you’re writing code that controls a critical piece of medical or military hardware. Other times, doing the same work isn’t that big of a deal — maybe your output is idempotent by default, or you’re counting how many people have smashed that like button, and a few extra likes per million isn’t a big deal. Most of the time, you’re somewhere in the middle.

It’s a question of trade-offs. Think of it in terms of reliability engineering — how many nines do you really need from your locking system? If using something like RedLock lets you add another nine without much additional overhead, is it worth it? Does using something like RedLock give you a false sense of security that might lead you to a poorer implementation when it comes to testing results and other side effects?

We’re big fans of Redis and have an existing locking mechanism built on top of it. In looking to switch to a more complete system of locking, we stumbled across a very interesting discussion from 2016. It starts with a post by Salvatore Sanfilippo, AKA Antirez (the creator of Redis), proposing RedLock, a Redis-based locking mechanism. This was followed by an analysis of Antirez’s proposal by Martin Kleppmann, a distributed systems expert and researcher. The discussion was mostly civil, but there was no agreement on whether RedLock would be a valuable addition to the distributed tools landscape.

Since it was interesting and we are in the process of making this decision right now, we thought we’d summarize the discussion for future investigators while we’re at it.

The Arguments

Before we get into the heart of it, it’s worth noting that all of their posts (and this post) are in the context of locks that expire. Most of the time, we’d rather execute twice (and hopefully have some safety on the other end of the execution) than not execute at all, so expiring locks are important.
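
To make the trade-off concrete, here is a minimal sketch of an expiring lock. It is an in-memory stand-in, not a Redis client: the names `ExpiringLockStore` and `FakeClock` are hypothetical and exist only for illustration. The clock is injected so the expiry behavior can be shown without sleeping.

```python
import uuid


class FakeClock:
    """Hypothetical manual clock so expiry can be demonstrated without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class ExpiringLockStore:
    """In-memory stand-in for a lock server (assumption: a single process)."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._locks = {}      # lock name -> (owner token, expiry time)

    def acquire(self, name, ttl):
        """Return a fresh owner token, or None if an unexpired lock exists."""
        now = self._clock()
        held = self._locks.get(name)
        if held is not None and held[1] > now:
            return None
        token = uuid.uuid4().hex
        self._locks[name] = (token, now + ttl)
        return token

    def release(self, name, token):
        """Release only if the caller still owns the lock."""
        held = self._locks.get(name)
        if held is not None and held[0] == token:
            del self._locks[name]
            return True
        return False


clock = FakeClock()
store = ExpiringLockStore(clock)
first = store.acquire("job", ttl=10)      # first worker takes the lock
blocked = store.acquire("job", ttl=10)    # second worker is refused
clock.now = 11                            # first worker stalls past its TTL
second = store.acquire("job", ttl=10)     # second worker now gets the lock
stale_release = store.release("job", first)
```

After the TTL passes, the second worker can proceed even though the first never released, which is the "execute twice rather than not at all" behavior. The stale release returns `False` because the owner token no longer matches.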

Throughout the related discussions, there were debates on several detailed points, but the majority of them seem to come down to theory vs practice. In theory, this means that they should agree, but in practice, they do not.

Kleppmann’s point is that RedLock isn’t fundamentally different from or better than a single Redis instance because they suffer from the same set of potential problems. Antirez counters that even if it is vulnerable to the same kinds of problems, the likelihood of them occurring can be greatly reduced or potentially eliminated.

Ultimately, they’re both right — a network partition at JUST the right time is fatal to both a naive single-server locking implementation AND to RedLock. The question is, how big of a target is the “right time”? There are two main categories of potential disaster:

Network Splits

If you lock with one Redis server and that server has issues, no one can do locked work.

If you use RedLock with five servers and need to acquire the lock on a majority of them (three of five), you can lose two servers without having issues. If the network has issues at just the right time during the locking process, you may still have problems, but the system should be more resilient in practice.
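
The majority rule can be sketched as follows. This is a simplified illustration, not the RedLock algorithm itself: it leaves out TTLs and all timing checks, and `Node` and `acquire_majority` are hypothetical names for in-memory stand-ins rather than Redis calls.

```python
class Node:
    """Hypothetical lock server: a dict plus an 'up' flag to simulate outages."""

    def __init__(self):
        self.up = True
        self.held = {}   # lock name -> owner token

    def try_lock(self, name, token):
        if not self.up:
            raise ConnectionError("node unreachable")
        if name in self.held:
            return False
        self.held[name] = token
        return True

    def unlock(self, name, token):
        # Only the owner may unlock, and only on reachable nodes.
        if self.up and self.held.get(name) == token:
            del self.held[name]


def acquire_majority(nodes, name, token):
    """Succeed only if more than half of the nodes grant the lock."""
    quorum = len(nodes) // 2 + 1
    granted = 0
    for node in nodes:
        try:
            if node.try_lock(name, token):
                granted += 1
        except ConnectionError:
            pass   # an unreachable node simply does not count
    if granted >= quorum:
        return True
    # No quorum: undo any partial grants so other clients are not blocked.
    for node in nodes:
        node.unlock(name, token)
    return False


five = [Node() for _ in range(5)]
five[0].up = False
five[1].up = False
two_down = acquire_majority(five, "job", "client-a")       # 3 of 5 grant

other = [Node() for _ in range(5)]
for node in other[:3]:
    node.up = False
three_down = acquire_majority(other, "job", "client-a")    # only 2 of 5 grant
```

With two of five nodes down the lock is still obtainable; with three down it is not, and the partial grants are rolled back.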

Process/Network Pausing

In the case where a process gets a lock, does some work, and acts on some other system, it’s always possible that acting on the other system gets delayed such that the effect is realized after the lock has expired.

Of course, even with a single-server lock implementation, it is possible to add some additional error-checking (fencing tokens, checking or extending the lock before writing, using monotonically increasing lock keys), but most of the discussion compares a naive implementation with RedLock or, rather, points out the flaws they share.
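
As a rough illustration of fencing tokens, the sketch below has the storage layer remember the highest token it has seen and refuse writes that carry an older one. `FencedStorage` is a hypothetical in-memory class, and the tokens are assumed to come from a lock service that hands out a strictly increasing number with each grant (simulated here with `itertools.count`).

```python
import itertools


class FencedStorage:
    """Hypothetical storage that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        if token < self.highest_token:
            return False   # an older lock holder woke up late; refuse its write
        self.highest_token = token
        self.value = value
        return True


# Assumption: the lock service issues a strictly increasing token per grant.
next_token = itertools.count(1)

storage = FencedStorage()
token_a = next(next_token)   # client A acquires the lock, then pauses
token_b = next(next_token)   # A's lock expires; client B acquires it
b_ok = storage.write(token_b, "written by B")
a_ok = storage.write(token_a, "written by A, too late")
```

Client A's delayed write is rejected because its token is lower than the one the storage has already accepted, so the late effect never lands. This only helps if the system being written to can perform the comparison itself.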

Assumptions

Kleppmann makes some very strong points about the unbounded nature of things you can’t control. Network and storage issues may cause pauses of unlimited length. Likewise, stop-the-world garbage collection may lead to multi-minute waits and concurrency nightmares. These are real things that really do happen, but other locking systems that are considered much safer also assume they won’t. For example, Zookeeper makes these assumptions:

1. Only a minority of servers in a deployment will fail. Failure in this context means a machine crash or some error in the network that partitions a server off from the majority.

2. Deployed machines operate correctly. To operate correctly means to execute code correctly, to have clocks that work properly, and to have storage and network components that perform consistently.

The issue of clocks, storage, and network components performing consistently is exactly the surface area that Kleppmann exposes as RedLock’s fatal flaw. Is Zookeeper less vulnerable than RedLock to these issues? Almost certainly. The question is, what’s the level of effort, and what’s the risk?

Conclusion

If you read Martin’s and Antirez’s posts and felt like they were talking past each other, you weren’t alone. They’re both right. In theory, RedLock could be a disaster waiting to happen. In practice, it is probably much more reliable than a naive single-server implementation. Reading their posts was interesting, and the points of agreement between the two provide some guidance, whatever system you choose:

Manage your server clock carefully
Pick your expiration TTLs wisely
If at all possible, use a monotonically increasing lock key
Use fencing tokens when writing to storage, and understand your storage layer’s consistency model
Don’t be afraid to check that you still have a lock and extend the TTL as you go
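
The last point can be sketched like this. It is a minimal in-memory illustration, not a real lock client: `Lock` and `run_steps` are hypothetical names, and time is a manually advanced value so the example runs without sleeping.

```python
class Lock:
    """Hypothetical single lock with an expiry and an injected clock."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, owner, ttl):
        now = self._clock()
        if self.owner is not None and self.expires_at > now:
            return False
        self.owner, self.expires_at = owner, now + ttl
        return True

    def extend(self, owner, ttl):
        """Push the expiry out, but only if we still hold an unexpired lock."""
        now = self._clock()
        if self.owner != owner or self.expires_at <= now:
            return False
        self.expires_at = now + ttl
        return True


def run_steps(lock, owner, steps, ttl):
    """Run each step only while the lock is still ours; return how many ran."""
    done = 0
    for step in steps:
        if not lock.extend(owner, ttl):
            break   # lock lost: stop rather than risk overlapping work
        step()
        done += 1
    return done


now = [0.0]


def advance(seconds):
    # Each "step" just advances the simulated clock by the given amount.
    def step():
        now[0] += seconds
    return step


lock = Lock(lambda: now[0])
got_lock = lock.acquire("worker-a", ttl=10)
# The third step takes longer than the TTL, so the fourth never runs.
completed = run_steps(lock, "worker-a", [advance(5), advance(5), advance(20), advance(1)], ttl=10)
```

Checking and extending narrows the window but does not close it: a pause between the `extend` call and the step itself can still outlast the TTL, which is why it sits alongside fencing tokens in the list rather than replacing them.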

If it seems like we didn’t reach a satisfying conclusion on what system to use, you’re right. Ultimately, if you’re in a system that already uses Zookeeper to manage Kafka or some other service, go ahead and use Zookeeper. If you’re comfortable with Redis and don’t want to maintain additional tooling, RedLock is probably a reasonable choice as long as you understand the potential issues. Distributed locking is hard — who knew?
