kafka
Tags (space-separated): Kafka
I. Concepts
Kafka is used for building real-time data pipelines and streaming apps. Common use cases:
- Distributed messaging
- Website activity tracking
- Log aggregation
- Stream processing
- Data storage
- Event sourcing
- ...
Kafka terminology
1.Topics
Kafka maintains feeds of messages in categories called topics.
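Every message belongs to a category called a topic. Physically, messages of different topics are stored separately; logically, how a topic's messages are stored is transparent to users.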
2.Partitions
Topics are broken up into ordered commit logs called partitions
Each topic is divided into one or more partitions. Every message in a partition is assigned a sequential id, called the offset, and the retention time of the stored data is configurable.
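As a rough illustration of the offset idea, the sketch below models one partition as an append-only list whose index serves as the offset. `PartitionLog`, `append` and `read_from` are hypothetical names invented for this example, not Kafka APIs, and retention is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionLog:
    """Toy in-memory stand-in for one partition (not a Kafka class)."""
    records: List[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        """Append a record and return the sequential offset assigned to it."""
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[Tuple[int, bytes]]:
        """Return (offset, value) pairs starting at the given offset."""
        return [(i, v) for i, v in enumerate(self.records) if i >= offset]


log = PartitionLog()
first = log.append(b"a")
second = log.append(b"b")
third = log.append(b"c")

# A reader that remembers offset 1 can resume from exactly that position.
tail = log.read_from(1)
```

The point of the sketch is that a reader only needs to remember one integer per partition to know where to resume.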
3.Message Ordering
Messages are only guaranteed to be ordered within a single partition, so to read the data of a topic in order you need to:
- Group messages into a partition by key (on the producer side)
- Configure exactly one consumer instance per partition within a consumer group
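The first requirement can be sketched as follows: if the producer derives the partition from the message key, all messages with the same key land in the same partition and keep their relative order. The CRC32 hash and the `choose_partition` helper are assumptions made for this illustration; they are not the hash Kafka's own clients use.

```python
import zlib
from collections import defaultdict


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; equal keys always map to the same one."""
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3
events = [
    (b"user-1", "login"),
    (b"user-2", "login"),
    (b"user-1", "click"),
    (b"user-2", "logout"),
    (b"user-1", "logout"),
]

# Each partition is an ordered list, appended to in send order.
partitions = defaultdict(list)
for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))


def events_for(key: bytes) -> list:
    """Read back one key's events from the single partition that holds them."""
    partition = partitions[choose_partition(key, NUM_PARTITIONS)]
    return [v for k, v in partition if k == key]
```

Reading a single partition with a single consumer then returns each key's events in the order they were sent, even though there is no ordering across partitions.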
What Kafka guarantees:
- Messages sent by a producer to a particular topic partition will be appended in the order they are sent
- A consumer instance sees messages in the order they are stored in the log
- For a topic with replication factor N, Kafka can tolerate up to N-1 server failures without “losing” any messages committed to the log
4.Log
A partition corresponds to a logical log.
5.Replication
- Topics can (and should) be replicated
- The unit of replication is the partition
- Each partition in a topic has 1 leader and 0 or more follower replicas
A replica is deemed to be “in-sync” if:
- The replica can communicate with ZooKeeper
- The replica is not “too far” behind the leader (configurable)
The group of in-sync replicas for a partition is called the ISR (in-sync replicas)
- The Replication factor cannot be lowered
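The two in-sync conditions above can be sketched as a simple filter over a partition's replicas. The `Replica` fields and the `max_lag` threshold (measured here in messages) are hypothetical names chosen for this example; the real broker settings and bookkeeping differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    broker_id: int
    log_end_offset: int      # offset of the last message this replica holds
    zk_session_alive: bool   # whether it can still communicate with ZooKeeper


def in_sync_replicas(leader: Replica, followers: List[Replica], max_lag: int) -> List[int]:
    """Return broker ids in the ISR; the leader is counted as in-sync here."""
    isr = [leader.broker_id]
    for replica in followers:
        lag = leader.log_end_offset - replica.log_end_offset
        if replica.zk_session_alive and lag <= max_lag:
            isr.append(replica.broker_id)
    return isr


leader = Replica(broker_id=1, log_end_offset=100, zk_session_alive=True)
followers = [
    Replica(broker_id=2, log_end_offset=98, zk_session_alive=True),    # slightly behind
    Replica(broker_id=3, log_end_offset=40, zk_session_alive=True),    # too far behind
    Replica(broker_id=4, log_end_offset=100, zk_session_alive=False),  # lost its session
]
isr = in_sync_replicas(leader, followers, max_lag=10)
```

In this made-up scenario only brokers 1 and 2 remain in the ISR.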
6.Kafka durability
Durability can be configured with the producer configuration request.required.acks
- 0 : The producer never waits for an ack
- 1 : The producer gets an ack after the leader replica has received the data
- -1 : The producer gets an ack after all ISRs receive the data
A minimum ISR size (min.insync.replicas) can also be configured, so that an error is returned when too few replicas are available to replicate the data
So Kafka lets you choose a durability level and trade it against throughput
Durability | Behaviour | Per Event Latency | Required Acknowledgements (request.required.acks) |
---|---|---|---|
Highest | ACK after all ISRs have received the data | Highest | -1 |
Medium | ACK once the leader has received the data | Medium | 1 |
Lowest | No ACKs required | Lowest | 0 |
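The three acknowledgement levels can be illustrated with a small model of what the producer waits for. Both function names are hypothetical, and the rule that the minimum ISR check only applies when acks is -1 reflects my understanding of Kafka's behaviour rather than anything stated in these notes.

```python
def acks_to_wait_for(required_acks: int, isr_size: int) -> int:
    """Number of replica acknowledgements a producer waits for per write."""
    if required_acks == 0:
        return 0            # fire and forget
    if required_acks == 1:
        return 1            # leader only
    if required_acks == -1:
        return isr_size     # every replica currently in the ISR
    raise ValueError(f"unsupported acks value: {required_acks}")


def accept_write(required_acks: int, isr_size: int, min_insync_replicas: int) -> bool:
    """Model the minimum-ISR check: reject when the ISR has shrunk too far.

    Assumption: the check is only enforced for required_acks == -1.
    """
    if required_acks == -1 and isr_size < min_insync_replicas:
        return False
    return True
```

Waiting for more acknowledgements raises per-event latency, which is the trade-off the table describes.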
Likewise, Kafka can increase throughput by adding more brokers
A recommended configuration:
Property | Value |
---|---|
replication | 3 |
min.insync.replicas | 2 |
request.required.acks | -1 |
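To see what this configuration buys, the worked arithmetic below combines the N-1 guarantee from the ordering section with the minimum ISR rule. `tolerated_failures` is a hypothetical helper, and the second figure assumes that all replicas start out in the ISR.

```python
def tolerated_failures(replication_factor: int, min_insync_replicas: int) -> dict:
    """Broker failures a partition can absorb under acks = -1."""
    return {
        # At least one replica must survive to keep committed messages.
        "without_losing_committed_data": replication_factor - 1,
        # The ISR must stay at or above min.insync.replicas to accept writes.
        "while_still_accepting_writes": replication_factor - min_insync_replicas,
    }


recommended = tolerated_failures(replication_factor=3, min_insync_replicas=2)
```

With the values from the table, one broker can fail while writes continue, and a second failure stops writes but does not lose committed data.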
7.Broker
Kafka is run as a cluster compari