Contrasting NATS with Apache Kafka

郝ren

Original article: https://itnext.io/contrasting-nats-with-apache-kafka-1d3bdb9aa767

TL;DR: Kafka is an Event Streaming Platform, while NATS is closer to a conventional Message Queue. Kafka is optimised around the unique needs of emerging Event-Driven Architectures, which enrich the traditional pub-sub model with strong ordering and persistence semantics. Conversely, NATS is highly optimised around pub-sub topologies, and is an excellent platform for decoupling systems where message order and reliable delivery are non-issues.

I’ll preface this post by pointing out that there is another product, NATS Streaming, which is a different beast and is closer to Kafka. You may want to take a detour to NATS Streaming if you are after an alternative event streaming platform; otherwise, read on.

Subscriptions

At its core, NATS is about publishing and listening for messages. Both depend heavily on subjects, which scope messages into streams or topics. Consumers subscribe to topics either verbatim (matching the topic name precisely) or using wildcards. Below is an illustration of the publisher-subject-consumer relationship in NATS.
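
To make the wildcard subscriptions mentioned above concrete, here is a minimal sketch of subject matching. It assumes the usual NATS conventions: subjects are dot-separated tokens, `*` matches exactly one token, and `>` matches one or more trailing tokens. The function name `subject_matches` and the sample subjects are hypothetical, chosen only for illustration; this is not the server's implementation.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a dot-separated subject matches a subscription pattern."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the final token and needs at least
            # one remaining subject token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # the pattern is longer than the subject
        if p != "*" and p != s_tokens[i]:
            return False  # literal token mismatch
    # Without a trailing '>', both sides must have the same token count.
    return len(p_tokens) == len(s_tokens)


verbatim = subject_matches("orders.eu.created", "orders.eu.created")
single = subject_matches("orders.*.created", "orders.eu.created")
tail = subject_matches("orders.>", "orders.eu.created")
too_short = subject_matches("orders.*", "orders.eu.created")
```

Because matching of this kind happens on the broker side, a subscriber can narrow what it receives without any filtering code of its own.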

[Figure: publisher-subject-consumer relationship in NATS. Credit: https://nats.io/]

On the face of it, this isn’t diametrically opposed to Kafka, which also decouples producers and consumers by way of topics. However, their semantics differ. Kafka organises its topics into partitions: unbounded, totally ordered streams of records (Kafka’s substitute terminology for messages). A topic comprises one or more partitions, and exhibits partial order. (In other words, while records are totally ordered within their respective partition, their order across partitions is arbitrary.) This flexible arrangement makes Kafka highly suited to applications where order matters; for example, state machine replication, event sourcing, log shipping, log aggregation, SEDA (staged event-driven architecture) and CEP (complex event processing).

Speaking of topics, the equivalent NATS subject is a lightweight construct that is created automatically based on demand (subscriptions) and is pruned automatically when the demand ceases. NATS subjects are cheap to create, which makes them great for hierarchically organised data, allowing for a fine-grained subscription model. Anyone who’s used MQTT-style brokers (such as HiveMQ) should feel right at home with NATS. By comparison, Kafka’s topics are heavyweight entities that take time to spin up and lack the finesse that you get with NATS. Consequently, Kafka consumers must perform a lot of the requisite data filtering locally — consuming records from all assigned partitions and silently discarding those that are deemed irrelevant.

A Kafka partition is like an artery in a biological sense — drawing away from the source to feed downstream organs — consumers. A record has an offset, and it may have a key and a value; both are byte arrays and both are optional. A record’s key influences its ordering — records sharing a common key are guaranteed to occupy the same partition, and thus preserve their intrinsic order. The concept of partitions, records and offsets is illustrated below.
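
The relationship between keys, partitions and offsets can be sketched in a few lines. This is a toy in-memory model, not Kafka's client code: CRC32 stands in for the hash function a real partitioner would use (an assumption made purely for illustration), and the names `partition_for` and `append` are hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; equal keys always yield the same partition."""
    # CRC32 is a stand-in hash, used here only because it is deterministic.
    return zlib.crc32(key) % num_partitions


def append(partitions, key: bytes, value: bytes):
    """Append a record to the partition chosen by its key; return (partition, offset)."""
    p = partition_for(key, len(partitions))
    offset = len(partitions[p])  # offsets increase monotonically per partition
    partitions[p].append((offset, key, value))
    return p, offset


partitions = [[] for _ in range(3)]
placements = [
    append(partitions, b"alice", b"login"),
    append(partitions, b"bob", b"login"),
    append(partitions, b"alice", b"purchase"),
    append(partitions, b"alice", b"logout"),
]
alice_partitions = {p for (p, _), k in zip(placements, [b"alice", b"bob", b"alice", b"alice"]) if k == b"alice"}
```

All of the `alice` records land in one partition, in the order they were appended, which is the per-key ordering guarantee the paragraph describes. Nothing is implied about the relative order of `alice` and `bob` records if they fall into different partitions.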

Load balancing

To further crystallise the differences between the two platforms, let’s consider how NATS and Kafka address load balancing — an essential characteristic of any message-oriented middleware.

NATS optionally balances message delivery across a group of subscribers which can be used to provide application fault tolerance and scale workload processing. To create a queue subscription, subscribers register a queue name. All subscribers with the same queue name form the corresponding queue group. As messages on the registered subject are published, one member of the group is chosen randomly to receive the message. Although queue groups may have multiple subscribers, each message is consumed by only one. The diagram below illustrates this.
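
The queue group behaviour described above can be modelled with a small in-memory simulation. This is a sketch under stated assumptions, not the NATS client API: `ToyBroker`, its methods and the subject names are hypothetical, and a seeded random generator replaces whatever selection the real server performs.

```python
import random
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a pub-sub broker with optional queue groups."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._plain = defaultdict(list)  # subject -> callbacks
        self._groups = defaultdict(lambda: defaultdict(list))  # subject -> queue name -> callbacks

    def subscribe(self, subject, callback, queue=None):
        if queue is None:
            self._plain[subject].append(callback)
        else:
            self._groups[subject][queue].append(callback)

    def publish(self, subject, message):
        # Every plain subscriber receives every message.
        for callback in self._plain[subject]:
            callback(message)
        # Each queue group receives the message once, at a randomly chosen member.
        for members in self._groups[subject].values():
            self._rng.choice(members)(message)


broker = ToyBroker(seed=7)
worker_a, worker_b, auditor = [], [], []
broker.subscribe("jobs", worker_a.append, queue="workers")
broker.subscribe("jobs", worker_b.append, queue="workers")
broker.subscribe("jobs", auditor.append)  # not part of any queue group

for i in range(10):
    broker.publish("jobs", i)
```

After the loop, the two workers have split the ten messages between them with no overlap, while the plain subscriber has seen all ten. Which worker receives which message is arbitrary, so the processing order across workers is not defined.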

While NATS provides for fine-grained consumer scalability down to the message level, it does so at the expense of message ordering. Messages may be concurrently processed out-of-order at two or more disparate subscribers, making it unsuitable for order-sensitive applications. (Note: NATS Streaming addresses this, but as stated earlier, it is a different product in its own right.)

Kafka consumers subscribe to a topic as part of an encompassing consumer group. When the first consumer in a group joins the topic, it will receive all partitions in that topic. When a second consumer subsequently joins, it will get approximately half of the partitions, relieving the first consumer of half of its prior load. The process runs in reverse when consumers leave (by disconnecting or timing out): the remaining consumers will absorb a greater number of partitions. So a consumer group balances the partition load; the more consumers you add, the fewer partitions each consumer receives. Adding more consumers than partitions will leave some consumers in an idle state; Kafka will never assign a partition to multiple consumers in the same group. So, although Kafka’s load balancing scheme is more coarse-grained than that of NATS, it manages to preserve the order of records at the consumer nodes. The diagram below illustrates the relationship between producers, topics, partitions, consumers and consumer groups. Observe that consumer groups are logically isolated, from both a record flow and a load balancing perspective.
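
The partition-to-consumer arithmetic can be sketched as follows. This is a deliberately simplified round-robin model, assumed for illustration; Kafka's own assignment strategies are more involved, and the function name `assign` and the consumer names are hypothetical.

```python
def assign(partitions, consumers):
    """Spread partitions across the consumers of one group, one owner per partition."""
    if not consumers:
        return {}
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(p)
    return assignment


one = assign(range(4), ["c1"])
two = assign(range(4), ["c1", "c2"])
five = assign(range(4), ["c1", "c2", "c3", "c4", "c5"])
```

With one consumer, `c1` owns all four partitions; with two, each owns half; with five, one consumer is left with an empty list and sits idle. In every case a partition has exactly one owner within the group, which is what keeps per-partition order intact at the consumer.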

Delivery guarantees

Yet another significant differentiator is persistence. Kafka is a persistent datastore, offering at-least-once delivery semantics. The act of reading a record by a consumer does not delete the record — it merely advances an internal pointer to the next record in the partition. This is called committing an offset. Should a consumer crash before it successfully processes a record, Kafka will re-deliver the last set of records (for which the offsets have yet to be committed). Because records are persisted for some time (subject to the configurable retention policy), consumers have the luxury of processing records that were published long before their tenure.
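
A short simulation shows why committing offsets after processing yields at-least-once delivery. It is a toy model under stated assumptions: `PartitionLog`, `run_consumer` and the `crash_after` parameter are hypothetical names, and a single committed offset stands in for the state a real consumer group would keep.

```python
class PartitionLog:
    """A retained, append-only list of records plus one group's committed offset."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = 0  # offset of the next record the group should read


def run_consumer(log, handle, crash_after=None):
    """Process records from the committed offset onward, committing after each one."""
    offset = log.committed
    while offset < len(log.records):
        handle(log.records[offset])  # reading does not remove the record
        if offset == crash_after:
            raise RuntimeError("simulated crash after processing, before commit")
        log.committed = offset + 1  # commit only once processing has succeeded
        offset += 1


log = PartitionLog(["a", "b", "c"])
seen = []

try:
    run_consumer(log, seen.append, crash_after=1)
except RuntimeError:
    pass  # the consumer died; offset 1 was processed but never committed

run_consumer(log, seen.append)  # a restarted consumer resumes from the committed offset
```

After the restart, `seen` contains `b` twice: the record was handled before the crash and again afterwards because its offset had not been committed. Nothing is lost, but handlers need to tolerate duplicates.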

By contrast, NATS implements what is commonly referred to as at-most-once delivery. NATS strives to remain on and provide a ‘dial-tone’. However, if a subscriber drops out, it will not receive messages, as the basic NATS platform is a simple pub-sub transport system that offers only TCP-grade reliability. (This makes NATS a little different from more conventional MQ brokers, which tend to persist messages for the benefit of those subscribers that registered their interest prior to the point of publication, deleting messages after they have been delivered to all endpoints.) Simply stated, NATS isn’t designed to be used as a long-term event store; it is best used as a subscription-oriented message-centric transport layer, as opposed to a datastore.
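
For contrast, at-most-once delivery can be modelled as a bus that hands each message to whoever is connected at that instant and keeps nothing afterwards. This is a sketch with hypothetical names (`FireAndForgetBus` and its methods), not the NATS client API.

```python
class FireAndForgetBus:
    """Delivers each message to the currently connected subscribers, then forgets it."""

    def __init__(self):
        self._subscribers = {}  # subject -> list of callbacks

    def subscribe(self, subject, callback):
        self._subscribers.setdefault(subject, []).append(callback)

    def unsubscribe(self, subject, callback):
        self._subscribers[subject].remove(callback)

    def publish(self, subject, message):
        # No storage and no redelivery: absent subscribers simply miss the message.
        for callback in list(self._subscribers.get(subject, [])):
            callback(message)


bus = FireAndForgetBus()
received = []
on_message = received.append

bus.subscribe("alerts", on_message)
bus.publish("alerts", "m1")
bus.unsubscribe("alerts", on_message)  # the subscriber drops out
bus.publish("alerts", "m2")            # nobody is listening; the message is gone
bus.subscribe("alerts", on_message)    # the subscriber returns
bus.publish("alerts", "m3")
```

The subscriber ends up with `m1` and `m3` only. There is no offset to rewind to, so `m2` cannot be recovered once it has been published.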

Operational concerns

From an operational perspective, the differences are also pronounced. Kafka is a behemoth. Its deployment topology consists of a mixture of Broker and ZooKeeper nodes, with hundreds, if not thousands, of tuneable “knobs”, controlling all aspects of its behaviour. (And some are quite dangerous, in uneducated hands.)

A NATS cluster is much simpler in this regard, with a lot fewer parameters — unsurprising, given its lack of persistence.

Summary

So there you have it. The differences between the two should now be apparent. The points above do not aim to imply that one is better than the other; this is not an A vs B discussion. While it can be objectively stated that Kafka provides more overall flexibility by catering to a broader spectrum of messaging and eventing scenarios, it is also proportionally more complex to configure and maintain, and can be overkill in some scenarios. NATS is a simpler solution; it’s a lot easier to get started with and operationalise. And let’s not forget, the latter is hugely important. Use the simplest platform that meets your present and anticipated needs, and is well-aligned to the current skill-sets prevalent in your organisation.

Was this article useful to you? I’d love to hear your feedback, so don’t hold back. If you are interested in Kafka, Kubernetes, microservices, or event streaming, or just have any questions, follow me on Twitter. I’m also a maintainer of Kafdrop and the author of Effective Kafka.
