1- Typical steps for using the Kafka simple API
- Find an active Broker and find out which Broker is the leader for your topic and partition
- Determine who the replica Brokers are for your topic and partition
- Build the request defining what data you are interested in
- Fetch the data
- Identify and recover from leader changes
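A minimal sketch of steps 4-5 with the Scala SimpleConsumer API (Kafka 0.8.x), assuming the partition's leader broker is already known from steps 1-3; the host, topic, and client id below are placeholders:
import kafka.api.FetchRequestBuilder
import kafka.consumer.SimpleConsumer

// Placeholders: use the real leader's host/port and your own topic/partition.
val consumer = new SimpleConsumer("leader-host", 9092, 10000, 64 * 1024, "demo-client")
var offset = 0L // the application, not Kafka, tracks this

val req = new FetchRequestBuilder()
  .clientId("demo-client")
  .addFetch("my-topic", 0, offset, 100000) // topic, partition, start offset, max bytes
  .build()
val resp = consumer.fetch(req)

if (!resp.hasError) {
  for (msgAndOffset <- resp.messageSet("my-topic", 0)) {
    val payload = msgAndOffset.message.payload // ByteBuffer holding the message body
    offset = msgAndOffset.nextOffset           // advance the manually tracked offset
  }
}
consumer.close()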
The main reason to use the Low Level Consumer (Simple Consumer) is to get tighter control over consumption than the Consumer Group allows, for example:
- Read the same message multiple times
- Read only some of the Partitions of a Topic
- Manage transactions so that each message is processed once and only once
Compared with the Consumer Group, the Low Level Consumer puts a lot of extra work on the user:
- The application must track offsets itself to know which message to consume next
- The application must find out programmatically which broker is the Leader of each Partition
- The application must handle Leader changes (see the error-handling sketch after this list)
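Continuing the fetch sketch above, a leader change typically surfaces as an error code on the fetch response; a hedged sketch of the recovery path:
import kafka.common.ErrorMapping

// After consumer.fetch(req) in the sketch above:
if (resp.hasError) {
  val code = resp.errorCode("my-topic", 0)
  if (code == ErrorMapping.NotLeaderForPartitionCode ||
      code == ErrorMapping.LeaderNotAvailableCode) {
    // The leader moved: re-issue a TopicMetadataRequest (see section 2),
    // reconnect a SimpleConsumer to the new leader, and retry the fetch.
  } else {
    throw ErrorMapping.exceptionFor(code)
  }
}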
For examples of using the low level consumer from the Java API, see:
http://www.cnblogs.com/fxjwind/p/3794255.html
http://zqhxuyuan.github.io/2016/02/20/Kafka-Consumer-New/
2- Source code analysis
2-1 Computing offsets
There are two entry points, one returning the latest offset and one returning the earliest, and both delegate to getLeaderOffsets. (The code in this section is from Spark Streaming's KafkaCluster helper, which wraps the simple consumer API.)
def getLatestLeaderOffsets(
    topicAndPartitions: Set[TopicAndPartition]
  ): Either[Err, Map[TopicAndPartition, LeaderOffset]] =
  getLeaderOffsets(topicAndPartitions, OffsetRequest.LatestTime)

def getEarliestLeaderOffsets(
    topicAndPartitions: Set[TopicAndPartition]
  ): Either[Err, Map[TopicAndPartition, LeaderOffset]] =
  getLeaderOffsets(topicAndPartitions, OffsetRequest.EarliestTime)

def getLeaderOffsets(
    topicAndPartitions: Set[TopicAndPartition],
    before: Long
  ): Either[Err, Map[TopicAndPartition, LeaderOffset]] = {
  getLeaderOffsets(topicAndPartitions, before, 1).right.map { r =>
    r.map { kv =>
      // mapValues isn't serializable, see SI-7005
      kv._1 -> kv._2.head
    }
  }
}
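A usage sketch of these entry points, assuming the class is accessible (KafkaCluster is private[spark] in some Spark 1.x releases, so the caller may need to sit in the org.apache.spark.streaming.kafka package); the broker address and topic are placeholders:
import kafka.common.TopicAndPartition
import org.apache.spark.streaming.kafka.KafkaCluster

val kc = new KafkaCluster(Map("metadata.broker.list" -> "broker1:9092"))
val tp = TopicAndPartition("my-topic", 0)

kc.getLatestLeaderOffsets(Set(tp)) match {
  case Right(offsets) => println(s"latest: ${offsets(tp).offset}") // LeaderOffset(host, port, offset)
  case Left(errs)     => errs.foreach(println)                     // Err is an ArrayBuffer[Throwable]
}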
The overloaded getLeaderOffsets method does the real work:
def getLeaderOffsets(
    topicAndPartitions: Set[TopicAndPartition],
    before: Long,
    maxNumOffsets: Int
  ): Either[Err, Map[TopicAndPartition, Seq[LeaderOffset]]] = {
  findLeaders(topicAndPartitions).right.flatMap { tpToLeader =>
    val leaderToTp: Map[(String, Int), Seq[TopicAndPartition]] = flip(tpToLeader)
    val leaders = leaderToTp.keys
    var result = Map[TopicAndPartition, Seq[LeaderOffset]]()
    val errs = new Err
    withBrokers(leaders, errs) { consumer =>
      // ask each leader broker for the offsets of the partitions it leads
      val partitionsToGetOffsets: Seq[TopicAndPartition] =
        leaderToTp((consumer.host, consumer.port))
      val reqMap = partitionsToGetOffsets.map { tp: TopicAndPartition =>
        tp -> PartitionOffsetRequestInfo(before, maxNumOffsets)
      }.toMap
      val req = OffsetRequest(reqMap)
      val resp = consumer.getOffsetsBefore(req)
      val respMap = resp.partitionErrorAndOffsets
      partitionsToGetOffsets.foreach { tp: TopicAndPartition =>
        respMap.get(tp).foreach { por: PartitionOffsetsResponse =>
          if (por.error == ErrorMapping.NoError) {
            if (por.offsets.nonEmpty) {
              result += tp -> por.offsets.map { off =>
                LeaderOffset(consumer.host, consumer.port, off)
              }
            } else {
              errs += new SparkException(
                s"Empty offsets for ${tp}, is ${before} before log beginning?")
            }
          } else {
            errs += ErrorMapping.exceptionFor(por.error)
          }
        }
      }
      if (result.keys.size == topicAndPartitions.size) {
        return Right(result)
      }
    }
    val missing = topicAndPartitions.diff(result.keySet)
    errs += new SparkException(s"Couldn't find leader offsets for ${missing}")
    Left(errs)
  }
}
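Stripped of the leader bookkeeping, the per-broker request above boils down to this standalone sketch against a single known leader (host and topic are placeholders):
import kafka.api.{OffsetRequest, PartitionOffsetRequestInfo}
import kafka.common.TopicAndPartition
import kafka.consumer.SimpleConsumer

val consumer = new SimpleConsumer("leader-host", 9092, 10000, 64 * 1024, "offset-demo")
val tp = TopicAndPartition("my-topic", 0)

// OffsetRequest.LatestTime (-1) and OffsetRequest.EarliestTime (-2) are the two
// special "before" timestamps used by getLatestLeaderOffsets/getEarliestLeaderOffsets.
val req = OffsetRequest(Map(tp -> PartitionOffsetRequestInfo(OffsetRequest.LatestTime, 1)))
val resp = consumer.getOffsetsBefore(req)
val offsets: Seq[Long] = resp.partitionErrorAndOffsets(tp).offsets // up to maxNumOffsets entries
consumer.close()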
The findLeaders method implements the first step listed above, "Find an active Broker and find out which Broker is the leader for your topic and partition":
def findLeaders(
    topicAndPartitions: Set[TopicAndPartition]
  ): Either[Err, Map[TopicAndPartition, (String, Int)]] = {
  val topics = topicAndPartitions.map(_.topic)
  // getPartitionMetadata returns a Set[TopicMetadata] for the requested topics
  val response = getPartitionMetadata(topics).right
  val answer = response.flatMap { tms: Set[TopicMetadata] =>
    // each TopicMetadata holds the topic name plus a PartitionMetadata
    // for every partition of that topic
    val leaderMap = tms.flatMap { tm: TopicMetadata =>
      tm.partitionsMetadata.flatMap { pm: PartitionMetadata =>
        val tp = TopicAndPartition(tm.topic, pm.partitionId)
        if (topicAndPartitions(tp)) {
          // this (topic, partition) is one we were asked about:
          // record the leader's host and port reported by the cluster
          pm.leader.map { l =>
            tp -> (l.host -> l.port)
          }
        } else {
          None
        }
      }
    }.toMap
    if (leaderMap.keys.size == topicAndPartitions.size) {
      Right(leaderMap)
    } else {
      val missing = topicAndPartitions.diff(leaderMap.keySet)
      val err = new Err
      err += new SparkException(s"Couldn't find leaders for ${missing}")
      Left(err)
    }
  }
  answer
}
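Calling it looks like this (same hypothetical setup as the offsets example above):
import kafka.common.TopicAndPartition
import org.apache.spark.streaming.kafka.KafkaCluster

val kc = new KafkaCluster(Map("metadata.broker.list" -> "broker1:9092"))
kc.findLeaders(Set(TopicAndPartition("my-topic", 0))) match {
  case Right(leaders) => leaders.foreach { case (tp, (host, port)) =>
    println(s"$tp is led by $host:$port")
  }
  case Left(errs) => errs.foreach(println)
}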
From the definition of TopicMetadata it is clear that a topic's metadata is essentially the topic name plus the metadata of all its partitions:
case class TopicMetadata(topic: String, partitionsMetadata: Seq[PartitionMetadata], errorCode: Short = ErrorMapping.NoError)
From the definition of PartitionMetadata, a partition's metadata consists of the partition id, the leader broker, the replica brokers, and the ISR (the in-sync replica set: the replicas that are fully caught up with the leader and therefore eligible to become leader if it fails). The sketch after the definition prints all of these fields:
case class PartitionMetadata(partitionId: Int,
                             val leader: Option[Broker],
                             replicas: Seq[Broker],
                             isr: Seq[Broker] = Seq.empty,
                             errorCode: Short = ErrorMapping.NoError)
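To see these fields on a live cluster, the metadata request that getPartitionMetadata issues internally can be sent directly with a SimpleConsumer; a sketch, with host and topic as placeholders:
import kafka.api.TopicMetadataRequest
import kafka.consumer.SimpleConsumer

// Any live broker can answer a metadata request, not just a leader.
val consumer = new SimpleConsumer("any-broker", 9092, 10000, 64 * 1024, "metadata-demo")
val req = TopicMetadataRequest(TopicMetadataRequest.CurrentVersion, 0, "metadata-demo", Seq("my-topic"))
val resp = consumer.send(req)

resp.topicsMetadata.foreach { tm =>
  tm.partitionsMetadata.foreach { pm =>
    println(s"partition ${pm.partitionId}: leader=${pm.leader.map(b => s"${b.host}:${b.port}")}, " +
      s"replicas=${pm.replicas.map(_.id)}, isr=${pm.isr.map(_.id)}")
  }
}
consumer.close()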
Reference:
http://www.infoq.com/cn/articles/kafka-analysis-part-4