DelayedFetch is the delayed operation for a FetchRequest, analogous to DelayedProduce. The flow is: a fetch request from a consumer or from another follower replica arrives and is handled by KafkaApis#handleFetchRequest, which calls ReplicaManager#fetchMessages to read messages from the corresponding Log and, when the request cannot be answered right away, wraps it in a DelayedFetch that is added to delayedFetchPurgatory for later completion.
Let's first look at the logic of ReplicaManager#fetchMessages:
def fetchMessages(timeout: Long,
                  replicaId: Int,
                  fetchMinBytes: Int,
                  fetchMaxBytes: Int,
                  hardMaxBytesLimit: Boolean,
                  fetchInfos: Seq[(TopicAndPartition, PartitionFetchInfo)],
                  quota: ReplicaQuota = UnboundedQuota,
                  responseCallback: Seq[(TopicAndPartition, FetchResponsePartitionData)] => Unit) {
  val isFromFollower = replicaId >= 0
  val fetchOnlyFromLeader: Boolean = replicaId != Request.DebuggingConsumerId
  val fetchOnlyCommitted: Boolean = !Request.isValidBrokerId(replicaId)

  // Read from the local log
  val logReadResults = readFromLocalLog(
    replicaId = replicaId,
    fetchOnlyFromLeader = fetchOnlyFromLeader,
    readOnlyCommitted = fetchOnlyCommitted,
    fetchMaxBytes = fetchMaxBytes,
    hardMaxBytesLimit = hardMaxBytesLimit,
    readPartitionInfo = fetchInfos,
    quota = quota)

  // If the fetch request comes from a follower, update that follower's LEO
  if (Request.isValidBrokerId(replicaId))
    /*
     * Main logic:
     * 1. The leader tracks the state of each follower replica; here the follower's state (e.g. its LEO) is updated
     * 2. Check whether the ISR needs to expand; if the ISR changes, record the change in ZooKeeper
     * 3. Check whether the HighWatermark can be advanced
     * 4. Check the DelayedProduce operations for the related keys in delayedProducePurgatory, and complete any that are now satisfied
     */
    updateFollowerLogReadResults(replicaId, logReadResults)

  // Collect the per-partition read results
  val logReadResultValues = logReadResults.map { case (_, v) => v }
  // Total number of bytes read
  val bytesReadable = logReadResultValues.map(_.info.messageSet.sizeInBytes).sum
  // Check whether any read hit an error
  val errorReadingData = logReadResultValues.foldLeft(false)((errorIncurred, readResult) =>
    errorIncurred || (readResult.errorCode != Errors.NONE.code))

  /*
   * The FetchResponse can be returned immediately if any of the following holds:
   * 1. The caller does not want to wait (timeout <= 0)
   * 2. The FetchRequest specifies no partitions to read
   * 3. Enough data has already been read (bytesReadable >= fetchMinBytes)
   * 4. An error occurred while reading (errorReadingData)
   */
  if (timeout <= 0 || fetchInfos.isEmpty || bytesReadable >= fetchMinBytes || errorReadingData) {
    val fetchPartitionData = logReadResults.map { case (tp, result) =>
      tp -> FetchResponsePartitionData(result.errorCode, result.hw, result.info.messageSet)
    }
    // Invoke the response callback directly
    responseCallback(fetchPartitionData)
  } else {
    // Wrap the read results as FetchPartitionStatus
    val fetchPartitionStatus = logReadResults.map { case (topicAndPartition, result) =>
      val fetchInfo = fetchInfos.collectFirst {
        case (tp, v) if tp == topicAndPartition => v
      }.getOrElse(sys.error(s"Partition $topicAndPartition not found in fetchInfos"))
      (topicAndPartition, FetchPartitionStatus(result.info.fetchOffsetMetadata, fetchInfo))
    }
    // Build the FetchMetadata object
    val fetchMetadata = FetchMetadata(fetchMinBytes, fetchMaxBytes, hardMaxBytesLimit, fetchOnlyFromLeader,
      fetchOnlyCommitted, isFromFollower, replicaId, fetchPartitionStatus)
    // Build a DelayedFetch object
    val delayedFetch = new DelayedFetch(timeout, fetchMetadata, this, quota, responseCallback)
    // Build a list of (topic, partition) keys for this delayed fetch operation
    val delayedFetchKeys = fetchPartitionStatus.map { case (tp, _) => new TopicPartitionOperationKey(tp) }
    // Try to complete the request immediately; otherwise put it into the purgatory
    delayedFetchPurgatory.tryCompleteElseWatch(delayedFetch, delayedFetchKeys)
  }
}
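The four early-return conditions can be condensed into a small predicate. This is a hypothetical standalone sketch (canRespondImmediately is not a real Kafka method), just to make the decision explicit:

```scala
// Sketch of the "respond now or build a DelayedFetch" decision in fetchMessages.
// Assumes bytesReadable and errorReadingData were computed as shown above.
def canRespondImmediately(timeout: Long,
                          numPartitions: Int,
                          bytesReadable: Int,
                          fetchMinBytes: Int,
                          errorReadingData: Boolean): Boolean =
  timeout <= 0 ||                   // the caller does not want to wait
  numPartitions == 0 ||             // no partitions were requested
  bytesReadable >= fetchMinBytes || // enough data is already available
  errorReadingData                  // an error occurred while reading
```

If none of these holds, the request must wait in the purgatory for up to `timeout` milliseconds.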
1 The relationship between DelayedProduce and DelayedFetch
While handling a ProduceRequest, appending data to the Log may advance the leader's log end offset; follower replicas may then be able to read enough data, so pending DelayedFetch operations are tried for completion.
While handling a FetchRequest from a follower, the HW may advance (consumer fetches do not move the HW), so pending DelayedProduce operations are tried for completion.
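A toy model of this mutual wake-up (every name below is invented for illustration; this is not the Kafka API): a produce advances the leader's LEO, which may satisfy a pending follower fetch; a follower catching up advances the HW, which may satisfy a pending acks=-1 produce.

```scala
// Toy model of the DelayedProduce / DelayedFetch interplay.
object CrossCompletion {
  var leo = 100L // leader log end offset
  var hw  = 90L  // high watermark (hw <= leo always)

  // A pending follower fetch is satisfiable once data exists past its fetch offset
  def fetchSatisfied(fetchOffset: Long): Boolean = leo > fetchOffset

  // A pending acks=-1 produce is satisfiable once the HW reaches its last offset
  def produceSatisfied(lastOffset: Long): Boolean = hw >= lastOffset

  // Appending to the log advances the LEO -> may complete pending fetches
  def append(records: Long): Unit = leo += records

  // A follower catching up lets the HW advance -> may complete pending produces
  def followerCaughtUpTo(offset: Long): Unit = hw = math.min(offset, leo)
}
```

In real Kafka this "poking" is done by calling checkAndComplete on the other purgatory after the log or HW changes.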
2 Core fields
delayMs: how long the delayed operation may wait before it expires
fetchMetadata: a FetchMetadata object recording the state of every partition involved in the FetchRequest; it is mainly used to decide whether the DelayedFetch satisfies its completion conditions
responseCallback: the callback invoked in DelayedFetch#onComplete when the operation is satisfied or expires; its main job is to build the FetchResponse and add it to the responseQueue of the RequestChannel
3 Key methods
DelayedFetch#tryComplete checks whether the DelayedFetch satisfies its completion conditions, and forces completion when any of the following holds:
# Leadership migrated, and this broker is no longer the leader replica of the partition
# This broker can no longer find the partition it needs to read from
# The fetch offset is no longer on the current activeSegment, which can happen after a log truncation or after the log rolled a new LogSegment
# The accumulated bytes reach the requested minimum (fetchMinBytes)
override def tryComplete(): Boolean = {
  var accumulatedSize = 0           // accumulated un-throttled bytes
  var accumulatedThrottledSize = 0  // accumulated throttled bytes
  // Iterate over the status of every partition in FetchMetadata
  fetchMetadata.fetchPartitionStatus.foreach {
    case (topicAndPartition, fetchStatus) =>
      // The position where the previous read ended
      val fetchOffset = fetchStatus.startOffsetMetadata
      try {
        if (fetchOffset != LogOffsetMetadata.UnknownOffsetMetadata) {
          // Look up the local leader replica of the partition
          val replica = replicaManager.getLeaderReplicaIfLocal(topicAndPartition.topic, topicAndPartition.partition)
          // The endOffset (upper bound of the read) depends on the origin of the FetchRequest:
          // for a consumer the endOffset is the HW, while for a follower it is the LEO
          val endOffset =
            if (fetchMetadata.fetchOnlyCommitted)
              replica.highWatermark
            else
              replica.logEndOffset
          // Check whether endOffset has moved since the last read. If it has not, there was not
          // enough data then and there still is not, so the operation remains unsatisfied; if it
          // has moved, run the checks below to see whether the operation can now complete
          if (endOffset.messageOffset != fetchOffset.messageOffset) {
            if (endOffset.onOlderSegment(fetchOffset)) {
              // endOffset moved backwards onto a segment with a smaller baseOffset,
              // which suggests the leader's log was truncated
              debug("Satisfying fetch %s since it is fetching later segments of partition %s.".format(fetchMetadata, topicAndPartition))
              return forceComplete()
            } else if (fetchOffset.onOlderSegment(endOffset)) {
              // fetchOffset is behind endOffset but sits on an older LogSegment: a new
              // activeSegment was rolled, and endOffset is on the new LogSegment
              debug("Satisfying fetch %s immediately since it is fetching older segments.".format(fetchMetadata))
              // We will not force complete the fetch request if a replica should be throttled.
              if (!replicaManager.shouldLeaderThrottle(quota, topicAndPartition, fetchMetadata.replicaId))
                return forceComplete()
            } else if (fetchOffset.messageOffset < endOffset.messageOffset) {
              // fetchOffset and endOffset are on the same LogSegment and endOffset moved
              // forward, so accumulate the available bytes
              val bytesAvailable = math.min(endOffset.positionDiff(fetchOffset), fetchStatus.fetchInfo.fetchSize)
              if (quota.isThrottled(topicAndPartition))
                accumulatedThrottledSize += bytesAvailable
              else
                accumulatedSize += bytesAvailable
            }
          }
        }
      } catch {
        case utpe: UnknownTopicOrPartitionException => // Case B
          debug("Broker no longer know of %s, satisfy %s immediately".format(topicAndPartition, fetchMetadata))
          return forceComplete()
        case nle: NotLeaderForPartitionException => // Case A
          debug("Broker is no longer the leader of %s, satisfy %s immediately".format(topicAndPartition, fetchMetadata))
          return forceComplete()
      }
  }
  // Enough bytes accumulated: call forceComplete
  if (accumulatedSize >= fetchMetadata.fetchMinBytes
    || ((accumulatedSize + accumulatedThrottledSize) >= fetchMetadata.fetchMinBytes && !quota.isQuotaExceeded()))
    forceComplete()
  else
    false
}
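The final check can be isolated as a small predicate (a hypothetical helper, not part of Kafka): the fetch completes when the un-throttled bytes alone reach fetchMinBytes, or when counting throttled bytes gets there and the quota is not currently exceeded.

```scala
// Sketch of the final min-bytes check at the end of tryComplete.
def enoughBytes(accumulatedSize: Int,
                accumulatedThrottledSize: Int,
                fetchMinBytes: Int,
                quotaExceeded: Boolean): Boolean =
  accumulatedSize >= fetchMinBytes ||
  ((accumulatedSize + accumulatedThrottledSize) >= fetchMinBytes && !quotaExceeded)
```

Note that throttled bytes only help satisfy the fetch while the quota is not exceeded; once it is, only un-throttled partitions count toward fetchMinBytes.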