Kafka getting-started materials
Official website
Some good blog posts
http://www.cnblogs.com/likehua/p/3999538.html
http://www.infoq.com/cn/articles/kafka-analysis-part-1
http://colobu.com/2015/03/12/kafka-in-practice/
http://www.jasongj.com/2015/01/02/Kafka%E6%B7%B1%E5%BA%A6%E8%A7%A3%E6%9E%90/
Kafka
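- A distributed, partitioned publish/subscribe messaging system coordinated through ZooKeeper
- Originally developed at LinkedIn, now an Apache open-source project
- Written in Scala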
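Core concepts
- Broker: a Kafka server; a cluster consists of one or more brokers
- Topic: a named category to which messages are published
- Partition: each topic is split into partitions, each an ordered, append-only log in which every message has a sequential offset
- Producer: publishes messages to topics
- Consumer: reads messages from topics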
Kafka cluster deployment
http://blog.csdn.net/wangjia184/article/details/37921183
replication factor
The replication factor controls how many servers will replicate each message that is written. If you have a replication factor of 3 then up to 2 servers can fail before you will lose access to your data. We recommend you use a replication factor of 2 or 3 so that you can transparently bounce machines without interrupting data consumption.
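A minimal sketch of the arithmetic in the quote above: with replication factor N, a partition's data stays reachable while at least one of its N replicas survives, so up to N - 1 brokers can fail. The model assumes every replica is fully in sync and ignores ISR membership and leader election, so it illustrates the rule of thumb rather than Kafka's exact availability behaviour. ReplicationSketch and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model: a partition stays readable while at least one broker
// holding one of its replicas is alive. Assumes all replicas are in sync;
// real Kafka also depends on ISR membership and leader election settings.
final class ReplicationSketch {

    // With replication factor N, up to N - 1 replica brokers may fail.
    static int maxTolerableFailures(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor - 1;
    }

    // True if some broker that holds a replica has not failed.
    static boolean isAvailable(List<Integer> replicaBrokerIds, Set<Integer> failedBrokerIds) {
        for (int brokerId : replicaBrokerIds) {
            if (!failedBrokerIds.contains(brokerId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> replicas = Arrays.asList(1, 2, 3); // replication factor 3
        System.out.println(maxTolerableFailures(replicas.size()));                        // 2
        System.out.println(isAvailable(replicas, new HashSet<>(Arrays.asList(1, 2))));    // true
        System.out.println(isAvailable(replicas, new HashSet<>(Arrays.asList(1, 2, 3)))); // false
    }
}
```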
Encoding: serializer
kafka.serializer.DefaultEncoder/DefaultDecoder
No-op: passes the byte[] through unchanged (used by Xcloud)
KeyedMessage<K,byte[]>(topic: String, key: K, message: byte[])
kafka.serializer.StringDecoder/StringEncoder
Encodes and decodes String objects as UTF-8
KeyedMessage<K,String>(topic: String, key: K, message: String)
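A Kafka-free sketch of what the two encoder/decoder pairs do to a message payload. CodecSketch and its methods are hypothetical stand-ins, not the kafka.serializer classes; the UTF-8 round trip itself is plain JDK behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helpers that mimic, without the Kafka jar, what each
// encoder/decoder pair does to a message payload.
final class CodecSketch {

    // DefaultEncoder/DefaultDecoder style: the payload is already a byte[],
    // so it is handed over unchanged.
    static byte[] passThrough(byte[] payload) {
        return payload;
    }

    // StringEncoder style: String -> UTF-8 bytes.
    static byte[] encodeUtf8(String message) {
        return message.getBytes(StandardCharsets.UTF_8);
    }

    // StringDecoder style: UTF-8 bytes -> String.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo";                        // 5 characters, one non-ASCII
        byte[] raw = encodeUtf8(text);
        System.out.println(raw.length);                     // 6: U+00E9 takes 2 bytes in UTF-8
        System.out.println(decodeUtf8(raw).equals(text));   // true
        System.out.println(Arrays.equals(passThrough(raw), raw)); // true
    }
}
```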
Partitioning: Partitioner
kafka.producer.DefaultPartitioner
// hash the key, then take it modulo the number of partitions
Utils.abs(key.hashCode) % numPartitions
kafka.producer.ByteArrayPartitioner
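A plain-Java sketch of the DefaultPartitioner rule. The sign-bit mask in toPositive is an assumption standing in for Kafka's Utils.abs, whose exact definition may differ, and the byte[] variant only illustrates an assumed rationale for a separate ByteArrayPartitioner (Java arrays hash by identity); it is not a description of that class's source. PartitionerSketch is hypothetical.

```java
import java.util.Arrays;

// Plain-Java sketch of hash-modulo partitioning, not Kafka's own code.
final class PartitionerSketch {

    // Clears the sign bit so the result is never negative. Assumption: this
    // approximates Kafka's Utils.abs, whose exact definition may differ.
    static int toPositive(int n) {
        return n & 0x7fffffff;
    }

    // DefaultPartitioner style: abs(key.hashCode) % numPartitions.
    static int partitionFor(Object key, int numPartitions) {
        return toPositive(key.hashCode()) % numPartitions;
    }

    // byte[] keys: Java arrays inherit identity-based hashCode(), so two arrays
    // with equal contents may map to different partitions. Hashing the
    // contents keeps equal keys on the same partition.
    static int partitionForBytes(byte[] key, int numPartitions) {
        return toPositive(Arrays.hashCode(key)) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("a", 3)); // 1, since "a".hashCode() == 97
        byte[] k1 = {1, 2, 3};
        byte[] k2 = {1, 2, 3};
        System.out.println(partitionForBytes(k1, 4) == partitionForBytes(k2, 4)); // true
    }
}
```

Because the partition depends only on the key's hash, all messages with the same key land in the same partition and therefore keep their relative order.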
API
Producer API
kafka.javaapi.producer.Producer
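Assuming Kafka 0.8 (the version of the official SimpleConsumer example referenced in these notes), a kafka.javaapi.producer.Producer is configured through java.util.Properties. This sketch only builds and prints that configuration so it runs without the Kafka jar; the broker addresses and the topic name are placeholders, and the Producer calls are left as comments.

```java
import java.util.Properties;

// Builds the settings a Kafka 0.8 kafka.javaapi.producer.Producer is created
// from. Only java.util.Properties is used, so this runs without the Kafka jar.
// Broker addresses and the topic name below are placeholders.
final class ProducerConfigSketch {

    static Properties producerProps(String brokerList) {
        Properties props = new Properties();
        // Brokers used to bootstrap topic metadata, e.g. "host1:9092,host2:9092".
        props.setProperty("metadata.broker.list", brokerList);
        // Encoder for message values: String <-> UTF-8 bytes.
        props.setProperty("serializer.class", "kafka.serializer.StringEncoder");
        // Encoder for message keys.
        props.setProperty("key.serializer.class", "kafka.serializer.StringEncoder");
        // Partitioner: hash of the key modulo the number of partitions.
        props.setProperty("partitioner.class", "kafka.producer.DefaultPartitioner");
        // 1 = wait for the partition leader to acknowledge each write.
        props.setProperty("request.required.acks", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps("broker1:9092,broker2:9092");
        props.list(System.out);
        // With the Kafka 0.8 jar on the classpath, the producer would then be:
        //   ProducerConfig config = new ProducerConfig(props);
        //   Producer<String, String> producer = new Producer<String, String>(config);
        //   producer.send(new KeyedMessage<String, String>("my-topic", "key", "value"));
        //   producer.close();
    }
}
```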
Consumer API
High Level Consumer API vs. Simple Consumer API
For most applications, the high level consumer API is good enough. Some applications want features not yet exposed by the high level consumer (e.g., setting the initial offset when restarting the consumer). They can instead use our low level SimpleConsumer API.
Using SimpleConsumer
Xcloud's Kafka consumer code is based on the following official example:
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
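What SimpleConsumer lets (and requires) the application do:
- Read the same message multiple times
- Consume only a subset of a topic's partitions
- Control offsets in the application
- Find the lead broker and handle leader changes in the application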
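A minimal sketch of application-managed offsets, the part SimpleConsumer leaves to the application: only explicitly assigned partitions are consumed, and rewinding a stored offset re-reads messages. OffsetStoreSketch is hypothetical and keeps offsets in memory; a real consumer would have to persist them (for example to a file or database) to resume after a restart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application-side offset store. With SimpleConsumer the
// application, not Kafka, decides which partitions to read and remembers
// the next offset for each one; rewinding that offset re-reads messages.
final class OffsetStoreSketch {

    private final Map<Integer, Long> nextOffsetByPartition = new HashMap<>();

    // Only partitions assigned here are consumed (a subset of the topic).
    void assign(int partition, long startOffset) {
        nextOffsetByPartition.put(partition, startOffset);
    }

    boolean isAssigned(int partition) {
        return nextOffsetByPartition.containsKey(partition);
    }

    long nextOffset(int partition) {
        Long offset = nextOffsetByPartition.get(partition);
        if (offset == null) {
            throw new IllegalStateException("partition " + partition + " is not assigned");
        }
        return offset;
    }

    // Record that the message at processedOffset is done; read the one after it next.
    void commit(int partition, long processedOffset) {
        nextOffset(partition); // fails fast on unassigned partitions
        nextOffsetByPartition.put(partition, processedOffset + 1);
    }

    // Move back to an earlier offset so those messages are delivered again.
    void rewind(int partition, long offset) {
        nextOffset(partition);
        nextOffsetByPartition.put(partition, offset);
    }

    public static void main(String[] args) {
        OffsetStoreSketch store = new OffsetStoreSketch();
        store.assign(0, 0L);                      // consume partition 0 only
        store.commit(0, 0L);
        store.commit(0, 1L);
        System.out.println(store.nextOffset(0));  // 2
        store.rewind(0, 0L);                      // re-read from the beginning
        System.out.println(store.nextOffset(0));  // 0
    }
}
```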
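Steps:
- Find the lead broker for the partition of the topic to be consumed
- Open a consumer connection (SimpleConsumer) to that broker
- Look up the starting offset with kafka.api.OffsetRequest.EarliestTime() (oldest available message) or kafka.api.OffsetRequest.LatestTime() (next message to be written)
- Build and send a fetch request, then read the returned messages
- Detect leader changes and recover from them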
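The steps above can be simulated in memory to show the control flow. FakeCluster, consume and the exception used to signal "not the leader" are hypothetical stand-ins for the metadata, offset and fetch requests, not the Kafka API. EARLIEST and LATEST reuse the sentinel values -2 and -1 that kafka.api.OffsetRequest.EarliestTime() and LatestTime() return in Kafka 0.8. In the official example a fetch error likewise leads to a fresh leader lookup before retrying.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory simulation of the SimpleConsumer steps (hypothetical, not Kafka's API).
final class SimpleConsumerSketch {

    static final long EARLIEST = -2L; // same value as OffsetRequest.EarliestTime() in 0.8
    static final long LATEST = -1L;   // same value as OffsetRequest.LatestTime() in 0.8

    // Hypothetical single-partition cluster whose leader moves after a number of fetches.
    static final class FakeCluster {
        private final List<String> log;
        private int leaderId;
        private final int failoverLeaderId;
        private int fetchesUntilFailover; // leader moves once this reaches 0

        FakeCluster(List<String> log, int leaderId, int failoverLeaderId, int fetchesUntilFailover) {
            this.log = log;
            this.leaderId = leaderId;
            this.failoverLeaderId = failoverLeaderId;
            this.fetchesUntilFailover = fetchesUntilFailover;
        }

        int findLeader() {
            return leaderId;
        }

        long resolveOffset(long which) {
            if (which == EARLIEST) return 0L;
            if (which == LATEST) return log.size();
            throw new IllegalArgumentException("unknown offset request " + which);
        }

        // Stands in for a fetch response carrying a "not leader" error code.
        List<String> fetch(int brokerId, long offset, int maxMessages) {
            if (brokerId != leaderId) {
                throw new IllegalStateException("broker " + brokerId + " is not the leader");
            }
            int from = (int) offset;
            int to = Math.min(log.size(), from + maxMessages);
            List<String> batch = new ArrayList<>(log.subList(from, to));
            if (--fetchesUntilFailover == 0) {
                leaderId = failoverLeaderId; // simulate a leader change
            }
            return batch;
        }
    }

    static List<String> consume(FakeCluster cluster, long startTime, int batchSize) {
        List<String> consumed = new ArrayList<>();
        int leader = cluster.findLeader();              // step 1: find the lead broker
        // step 2: the real example opens a SimpleConsumer connection to the leader here
        long offset = cluster.resolveOffset(startTime); // step 3: EARLIEST/LATEST -> offset
        int consecutiveErrors = 0;
        while (true) {
            List<String> batch;
            try {
                batch = cluster.fetch(leader, offset, batchSize); // step 4: fetch
            } catch (IllegalStateException notLeader) {
                if (++consecutiveErrors > 3) {
                    throw notLeader;                    // give up after repeated failures
                }
                leader = cluster.findLeader();          // step 5: leader changed, look it up again
                continue;
            }
            consecutiveErrors = 0;
            if (batch.isEmpty()) {
                return consumed;                        // caught up with the end of the log
            }
            consumed.addAll(batch);
            offset += batch.size();                     // real API: MessageAndOffset.nextOffset()
        }
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster(Arrays.asList("m0", "m1", "m2", "m3", "m4"), 1, 2, 1);
        System.out.println(consume(cluster, EARLIEST, 2)); // [m0, m1, m2, m3, m4]
    }
}
```

Here the leader moves from broker 1 to broker 2 after the first fetch; the second fetch fails, the consumer looks the leader up again and continues from the same offset, so no message is skipped or duplicated.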