Kafka and Kinesis are message brokers designed as distributed logs. You can only append entries at the end of the log and read them back sequentially. You cannot remove or update entries, nor insert new ones in the middle of the log.

Kafka vs. Kinesis

This simple design allows distributed logs to have a really interesting set of characteristics. Because the reads and writes to the log are sequential, they have much better performance than other message brokers. And because the log is persistent, you can reprocess it as many times as needed.
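To make the idea concrete, here is a minimal in-memory sketch of a log that only supports appending and sequential reads. The `AppendOnlyLog` class and its method names are invented for illustration. They are not part of the Kafka or Kinesis APIs, and a real broker would also persist and partition the data.

```python
class AppendOnlyLog:
    """A toy in-memory log: entries can only be appended and read in order."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        """Write at the end of the log and return the new entry's offset."""
        self._entries.append(entry)
        return len(self._entries) - 1

    def read_from(self, offset=0):
        """Yield (offset, entry) pairs sequentially, starting at `offset`."""
        for position in range(offset, len(self._entries)):
            yield position, self._entries[position]


log = AppendOnlyLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

# Reading does not consume entries, so the log can be replayed at will.
first_pass = list(log.read_from(0))
second_pass = list(log.read_from(0))

# A consumer that already handled offset 0 simply resumes from offset 1.
resumed = list(log.read_from(1))
```

Note that there is deliberately no `update` or `delete` method: each consumer just remembers the offset it has reached.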


But even though Kafka and Kinesis are very similar (Kinesis was, to say the least, inspired by Kafka), they differ in many aspects.

Configuration & Features


Although both tools are conceptually simple and don't have many features, Kinesis couldn't be simpler. And that's a really good thing. The only options you can tune are the number of shards (throughput) and the number of days you want to keep the data (maximum 7).
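Since the shard count is the throughput knob, a rough sizing calculation might look like the sketch below. The per-shard write limits used here (about 1 MB/s and 1,000 records/s) are assumptions based on the limits AWS has documented for Kinesis streams, so check the current quotas before relying on them. The `shards_needed` helper is hypothetical.

```python
import math

# Assumed per-shard write limits; verify against the current AWS quotas.
SHARD_MAX_BYTES_PER_SECOND = 1_000_000
SHARD_MAX_RECORDS_PER_SECOND = 1_000


def shards_needed(records_per_second, average_record_bytes):
    """Estimate the shards required to absorb a steady write load."""
    by_records = math.ceil(records_per_second / SHARD_MAX_RECORDS_PER_SECOND)
    by_bytes = math.ceil(
        records_per_second * average_record_bytes / SHARD_MAX_BYTES_PER_SECOND
    )
    # A stream always has at least one shard; the tighter limit wins.
    return max(1, by_records, by_bytes)


small_records = shards_needed(records_per_second=5_000, average_record_bytes=100)
large_records = shards_needed(records_per_second=500, average_record_bytes=10_000)
```

The estimate ignores uneven partition keys, which can overload a single shard even when the total load fits.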


Kafka, simply because you have to host it yourself, requires more configuration. You'll need to configure each node, define where the data is stored, have a ZooKeeper cluster up and running, etc. The documentation is great, but you'll have to spend time understanding the details and implications of the options.



In terms of features, Kafka has a log compaction mode that effectively lets you update entries by key: you append a new entry with the same key, and older entries for that key are eventually discarded. This is useful when you want to keep the last value for each key and you don't care about the previous ones.
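The effect of compaction can be sketched in a few lines. This is a simplified model of the idea, not Kafka's implementation: the `compact` function and the sample keys are made up, and the real broker compacts in the background, segment by segment.

```python
def compact(log):
    """Keep only the most recent entry for each key, preserving log order."""
    last_offset = {}
    for offset, (key, _value) in enumerate(log):
        last_offset[key] = offset  # later entries overwrite earlier ones
    return [
        entry
        for offset, entry in enumerate(log)
        if last_offset[entry[0]] == offset
    ]


log = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example.com"),
    ("user-1", "alice@new.example"),  # supersedes the first entry
]

compacted = compact(log)
```

After compaction only one entry per key is left, so a consumer that replays the log still ends up with the latest state.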


Reliability

In Kafka you can configure, for each topic, the replication factor and how many replicas have to acknowledge a message before it is considered successful. The number of nodes and where you run them is also up to you. So you can definitely make it highly available, but you'll have to make sure that this is the case.
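As a rough way to reason about those two settings, the sketch below computes how many replicas can be down while fully acknowledged writes still succeed. It assumes the producer waits for all in-sync replicas (`acks=all`) and that the topic sets `min.insync.replicas`. The function name is hypothetical, and the model ignores details such as leader election.

```python
def tolerated_replica_failures(replication_factor, min_insync_replicas):
    """Replicas that may be unavailable while acks=all writes keep succeeding."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError(
            "min_insync_replicas must be between 1 and replication_factor"
        )
    return replication_factor - min_insync_replicas


# A common setup: three copies, at least two must confirm each write.
balanced = tolerated_replica_failures(replication_factor=3, min_insync_replicas=2)

# Requiring every replica to confirm leaves no room for failures.
strict = tolerated_replica_failures(replication_factor=3, min_insync_replicas=3)
```

The stricter the acknowledgement requirement, the less likely you are to lose data, but the easier it is for a single broker outage to block writes.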

In contrast, in Kinesis all messages are written synchronously to 3 different data centers (availability zones) before a write is considered successful. Amazon ensures that you won't lose data, but that comes with a performance cost.


Performance

There are several benchmarks online comparing Kafka and Kinesis, but the result is always the same: you'll have a hard time replicating Kafka's performance in Kinesis, at least for a reasonable price.


This is in part because Kafka is insanely fast, but also because Kinesis writes each message synchronously to three different availability zones. And this is quite costly in terms of latency and throughput.
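A toy model shows why waiting for every copy hurts latency. It assumes the copies are written in parallel, so a write completes once the required number of replicas has confirmed. The latency figures are invented for illustration and are not measurements of Kafka or Kinesis.

```python
def ack_latency_ms(replica_latencies_ms, required_acks):
    """Time until `required_acks` replicas have confirmed a parallel write."""
    if not 1 <= required_acks <= len(replica_latencies_ms):
        raise ValueError("required_acks must be between 1 and the replica count")
    # The write completes when the k-th fastest replica answers.
    return sorted(replica_latencies_ms)[required_acks - 1]


# Hypothetical latencies: one local copy and two in other data centers.
latencies_ms = [2.0, 5.0, 11.0]

one_ack = ack_latency_ms(latencies_ms, required_acks=1)
all_acks = ack_latency_ms(latencies_ms, required_acks=3)
```

With a single acknowledgement the fastest replica sets the pace. With all three, the slowest one does.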


Ecosystem

Kafka is one of the preferred options for the Apache stream processing frameworks, like Storm, Samza and Spark Streaming. And thanks to Kafka Connect it's also quite easy to add new connectors that move data into and out of Kafka.


Unsurprisingly, Kinesis is really well integrated with other AWS services. Using Kinesis Firehose you can automatically persist data from Kinesis into S3 or Redshift. It also has adaptors for Storm and Spark Streaming, as well as Amazon's Kinesis Client Library and Lambda to develop consumer applications.

Sum Up

Despite the similarities, it's clear that Kafka and Kinesis should be used in different contexts. The main selling points of Kafka are performance and the integration with other big data tools. The downside is that although Kafka is very stable, you'll need someone who knows the tool in depth in case something goes wrong. You shouldn't take risks with infrastructure that stores data, such as databases and distributed logs.


Kinesis' strengths are simplicity and built-in reliability. You just create a stream and let Amazon make sure that the data won't be lost. The downside is that streams can only store records for 7 days, so you'll have to copy them somewhere else if you want to keep them for longer. Also, throughput is not as good as Kafka's. Those disadvantages make Kinesis a poor choice if you want to build a Kappa architecture.


In a nutshell, Kafka is a better option if:

  • You have the in-house knowledge to maintain Kafka and ZooKeeper
  • You need to process more than a few thousand events/s
  • You don't want to integrate it with AWS services


Kinesis works best if:

  • You don't have the in-house knowledge to maintain Kafka
  • You process a few thousand events/s at most
  • You stream data into S3 or Redshift
  • You don't want to build a Kappa architecture


