Kafka Fully Explained

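This article covers Kafka's core concepts, including topics, partitions, message ordering, and replication. It examines Kafka's distributed messaging, data storage, and message delivery semantics. It also discusses Kafka's high availability and zero-copy technique, and its use in real-time stream processing, log aggregation, and similar scenarios.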

Kafka

Tags: Kafka


I. Concepts

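Kafka is used for building real-time data pipelines and streaming applications. Typical uses include:

  • Distributed messaging
  • Website activity tracking
  • Log aggregation
  • Stream processing
  • Data storage
  • Event sourcing
  • …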


Kafka terminology

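1. Topics

Kafka maintains feeds of messages in categories called topics.
Every message belongs to a category called a topic. Physically, messages of different topics are stored separately. Logically, how a topic's messages are stored is transparent to its users.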

2. Partitions

Topics are broken up into ordered commit logs called partitions.
Each topic is divided into one or more partitions. Every message in a partition is assigned a sequential id called the offset. How long the stored data is retained is configurable.
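Here is a minimal Python sketch that makes offsets and retention concrete. It models one partition as an in-memory append-only list. The class and method names (`PartitionLog`, `append`, `expire`, `read_from`) and the `retention_seconds` parameter are hypothetical. Real Kafka stores partitions as segment files on disk and deletes whole segments according to settings such as `retention.ms`. The sketch only illustrates two points: offsets are assigned sequentially, and they are never reused after old data expires.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    offset: int
    timestamp: float
    value: bytes


@dataclass
class PartitionLog:
    """Toy in-memory model of one partition (illustration only)."""

    retention_seconds: float  # hypothetical time-based retention window
    records: List[Record] = field(default_factory=list)
    next_offset: int = 0  # offset the next appended record will receive

    def append(self, value: bytes, now: Optional[float] = None) -> int:
        """Append a record at the end of the log and return its offset."""
        ts = time.time() if now is None else now
        offset = self.next_offset
        self.records.append(Record(offset, ts, value))
        self.next_offset += 1
        return offset

    def expire(self, now: float) -> None:
        """Drop records older than the retention window; offsets are not reused."""
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r.timestamp >= cutoff]

    def read_from(self, offset: int) -> List[Record]:
        """Return retained records with offset >= `offset`, in log order."""
        return [r for r in self.records if r.offset >= offset]


log = PartitionLog(retention_seconds=60)
log.append(b"a", now=0)    # offset 0
log.append(b"b", now=30)   # offset 1
log.append(b"c", now=90)   # offset 2
log.expire(now=100)        # cutoff is 40, so offsets 0 and 1 are dropped
print([r.offset for r in log.read_from(0)])  # [2]
```

Note that after expiry the next append still gets offset 3, not 0. A consumer's stored offset therefore keeps pointing at the same logical position in the log.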

3. Message Ordering

Messages are ordered only within a single partition. To read a topic's data in order, you therefore need to:

  • Group messages in a partition by key (on the producer side)
  • Configure exactly one consumer instance per partition within a consumer group
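The two rules above can be sketched with a toy in-memory model. Records with the same key always go to the same partition, so their relative order is preserved. Each partition is owned by exactly one consumer in the group. The helpers `choose_partition`, `produce`, `values_for_key` and `assign` are hypothetical. `zlib.crc32` stands in for the hash: Kafka's Java client hashes record keys with murmur2, so real partition numbers will differ. Round-robin assignment is only one possible strategy.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (key, value)


def choose_partition(key: str, num_partitions: int) -> int:
    """Deterministically map a key to a partition (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events: List[Event], num_partitions: int) -> Dict[int, List[Event]]:
    """Append (key, value) events to their partitions in send order."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for key, value in events:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return partitions


def values_for_key(partitions: Dict[int, List[Event]], key: str, num_partitions: int) -> List[str]:
    """Read one key's values back from its partition, in log order."""
    return [v for k, v in partitions[choose_partition(key, num_partitions)] if k == key]


def assign(num_partitions: int, consumers: List[str]) -> Dict[int, str]:
    """Give every partition exactly one owner within the group (round-robin)."""
    return {p: consumers[p % len(consumers)] for p in range(num_partitions)}


events = [("user-1", "login"), ("user-2", "login"), ("user-1", "click"),
          ("user-2", "logout"), ("user-1", "logout")]
parts = produce(events, num_partitions=3)
print(values_for_key(parts, "user-1", 3))  # ['login', 'click', 'logout']
print(assign(3, ["c1", "c2"]))             # {0: 'c1', 1: 'c2', 2: 'c1'}
```

A consumer may own several partitions, but no partition has two consumers in the same group. This is what keeps each key's events in the order they were produced.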

What Kafka guarantees:

  • Messages sent by a producer to a particular topic partition are appended in the order they are sent
  • A consumer instance sees messages in the order they are stored in the log
  • For a topic with replication factor N, Kafka can tolerate up to N-1 server failures without “losing” any messages committed to the log

4. Log

A partition corresponds to a logical log.

5. Replication

  • Topics can (and should) be replicated
  • The unit of replication is the partition
  • Each partition in a topic has 1 leader and 0 or more replicas
  • A replica is deemed to be “in-sync” if

    • The replica can communicate with ZooKeeper
    • The replica is not “too far” behind the leader (configurable)
  • The group of in-sync replicas for a partition is called the ISR (In-Sync Replicas)

  • The replication factor cannot be lowered
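The in-sync rule above can be sketched as a filter over the partition's replicas. The `Replica` type, the `reachable` flag (standing in for "can communicate with ZooKeeper") and the offset-based `max_lag` threshold are all hypothetical. Recent Kafka versions decide "too far behind" by time, using `replica.lag.time.max.ms`, rather than by a message count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    broker_id: int
    reachable: bool      # stands in for "can communicate with ZooKeeper"
    log_end_offset: int  # offset of the next record this replica would write


def compute_isr(leader_end_offset: int, replicas: List[Replica], max_lag: int) -> List[int]:
    """Return the broker ids of replicas that count as in-sync.

    max_lag is a hypothetical offset-based threshold for "too far behind".
    """
    return [
        r.broker_id
        for r in replicas
        if r.reachable and leader_end_offset - r.log_end_offset <= max_lag
    ]


replicas = [
    Replica(1, True, 100),   # the leader itself, lag 0
    Replica(2, True, 95),    # lag 5: in sync
    Replica(3, True, 40),    # lag 60: too far behind
    Replica(4, False, 100),  # caught up but unreachable
]
print(compute_isr(100, replicas, max_lag=10))  # [1, 2]
```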
6. Kafka durability

Durability can be configured with the producer configuration request.required.acks:

  • 0: The producer never waits for an ack
  • 1: The producer gets an ack after the leader replica has received the data
  • -1: The producer gets an ack after all in-sync replicas (ISR) have received the data

A minimum number of in-sync replicas can also be configured, so that an error is returned if not enough replicas are available to replicate the data.
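A minimal sketch of how the acks setting and the minimum ISR size combine, assuming a simplified leader that already knows the current ISR. The function `replicas_to_wait_for` and the exception `NotEnoughReplicasError` are hypothetical names. In the newer Java producer the setting is called `acks`, where `all` is equivalent to -1. The minimum (`min.insync.replicas`) only matters for acks=-1.

```python
from typing import List, Set


class NotEnoughReplicasError(Exception):
    """Hypothetical stand-in for the error returned when the ISR is too small."""


def replicas_to_wait_for(acks: int, leader: int, isr: List[int], min_insync_replicas: int) -> Set[int]:
    """Return the brokers that must hold a record before the producer is acked."""
    if acks == 0:
        return set()           # fire-and-forget: no acknowledgement at all
    if acks == 1:
        return {leader}        # the leader's local write is enough
    if acks == -1:
        if len(isr) < min_insync_replicas:
            raise NotEnoughReplicasError(
                f"ISR size {len(isr)} < min.insync.replicas {min_insync_replicas}")
        return set(isr)        # every current in-sync replica must have it
    raise ValueError(f"unsupported acks value: {acks}")


print(replicas_to_wait_for(1, leader=1, isr=[1], min_insync_replicas=2))         # {1}
print(replicas_to_wait_for(-1, leader=1, isr=[1, 2, 3], min_insync_replicas=2))  # {1, 2, 3}
```

With acks=1 a shrunken ISR is not an error, which is why it trades durability for latency. With acks=-1 the write is refused rather than acknowledged on too few copies.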

Kafka therefore lets you trade durability for throughput by choosing a different acknowledgement level.

| Durability | Behaviour | Per-event latency | Required acknowledgements (request.required.acks) |
| --- | --- | --- | --- |
| Highest | ACK once all ISRs have received the data | Highest | -1 |
| Medium | ACK once the leader has received the data | Medium | 1 |
| Lowest | No ACKs required | Lowest | 0 |

In general, Kafka throughput can also be increased by adding more brokers.
A recommended configuration:

| Property | Value |
| --- | --- |
| replication factor | 3 |
| min.insync.replicas | 2 |
| request.required.acks | -1 |
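The recommended values can be checked with a little arithmetic. The hypothetical helper below classifies a partition after some of its replicas' brokers fail. It assumes every surviving replica stays in the ISR, one of them becomes leader, and producers use acks=-1.

```python
def partition_state(replication_factor: int, min_insync_replicas: int, failed_replicas: int) -> str:
    """Classify a partition after some of its replicas' brokers fail.

    Assumes every surviving replica stays in the ISR and one of them becomes leader.
    """
    remaining = replication_factor - failed_replicas
    if remaining <= 0:
        return "offline"
    if remaining >= min_insync_replicas:
        return "accepts acks=-1 writes"
    return "rejects acks=-1 writes"


for failed in range(4):
    print(failed, partition_state(3, 2, failed))
# 0 accepts acks=-1 writes
# 1 accepts acks=-1 writes
# 2 rejects acks=-1 writes
# 3 offline
```

With replication 3 and min.insync.replicas 2, one broker can be down (for example during a rolling restart) while acks=-1 writes keep succeeding. After a second failure, the leader refuses acks=-1 writes instead of acknowledging data held by a single copy.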
7. Broker

Kafka is run as a cluster compari
