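1 Core fields
brokerId: Int  The ID of the broker node that hosts this GroupCoordinator
groupConfig: GroupConfig  Holds the minimum and maximum session timeout allowed for consumers in a group
offsetConfig: OffsetConfig  Holds the OffsetMetadata-related configuration, for example the maximum allowed length of the metadata field and the replication factor of each Offsets Topic partition
groupManager: GroupMetadataManager  The component that manages consumer group metadata and the corresponding offset information
heartbeatPurgatory: DelayedOperationPurgatory[DelayedHeartbeat]  Manages DelayedHeartbeat delayed operations
joinPurgatory: DelayedOperationPurgatory[DelayedJoin]  Manages DelayedJoin delayed operations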
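2 GroupState analysis
The GroupState trait represents the state of a consumer group. Its five implementations (PreparingRebalance, AwaitingSync, Stable, Dead, and Empty) each represent one state. These states exist only on the broker side; they are not the state of the client-side consumer.
Transition diagram between the states: [figure: GroupState transition diagram]
When each transition happens: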
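# PreparingRebalance state
While a consumer group is in the PreparingRebalance state, the GroupCoordinator handles OffsetFetchRequest, ListGroupRequest, and OffsetCommitRequest normally. For HeartbeatRequest and SyncGroupRequest, the response carries the REBALANCE_IN_PROGRESS error code. When a JoinGroupRequest arrives, the response is deferred: it is sent only when the group's DelayedJoin operation completes.
PreparingRebalance -> AwaitingSync: the DelayedJoin times out, or all previous members of the group have rejoined
PreparingRebalance -> Empty: all consumers have left the group
PreparingRebalance -> Dead: the group is removed as part of a partition migration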
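# AwaitingSync state
The group is waiting for the SyncGroupRequest from the Group Leader. In this state the GroupCoordinator answers OffsetCommitRequest and HeartbeatRequest with the REBALANCE_IN_PROGRESS error code. A SyncGroupRequest from a follower is not answered right away: its callback is held until the Group Leader's SyncGroupRequest arrives (see doSyncGroup).
AwaitingSync -> Stable: the GroupCoordinator receives the SyncGroupRequest from the Group Leader
AwaitingSync -> PreparingRebalance: a consumer joins or leaves the group; a member's heartbeat times out; a known member rejoins with updated metadata
AwaitingSync -> Dead: the group is removed as part of a partition migration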
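# Stable state
In this state the GroupCoordinator can handle all requests, for example OffsetFetchRequest, HeartbeatRequest, OffsetCommitRequest, and JoinGroupRequest from followers.
Stable -> PreparingRebalance: a member's heartbeat times out; a member leaves voluntarily; the current Group Leader sends a JoinGroupRequest; a new consumer asks to join the group
Stable -> Dead: the group is removed as part of a partition migration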
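# Dead state
A group in this state has no members and its GroupMetadata is being removed. JoinGroup, SyncGroup, Heartbeat, LeaveGroup, and OffsetCommit requests are all answered with UNKNOWN_MEMBER_ID, as the Dead branches of the handlers below (including doCommitOffsets) show.
# Empty state
The group has no members left, but it is not removed until all of its offsets have expired. This state also covers groups that are used only for offset commits.
Empty -> Dead: the last offsets are removed; the group is removed on expiration; the group is removed as part of a partition migration
Empty -> PreparingRebalance: a new member joins by sending a JoinGroupRequest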
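The transitions listed in this section can be collected into a small lookup table. The following is only a minimal sketch built from the list above; `GroupTransitions`, `allowed`, and `canTransition` are hypothetical names for illustration and are not taken from the Kafka source.

```scala
// Illustrative model only: the state names mirror this section,
// everything else is a hypothetical helper, not Kafka code.
sealed trait GroupState
case object PreparingRebalance extends GroupState
case object AwaitingSync extends GroupState
case object Stable extends GroupState
case object Dead extends GroupState
case object Empty extends GroupState

object GroupTransitions {
  // Target states reachable from each state, as listed in the text above.
  val allowed: Map[GroupState, Set[GroupState]] = Map(
    (PreparingRebalance, Set[GroupState](AwaitingSync, Empty, Dead)),
    (AwaitingSync, Set[GroupState](Stable, PreparingRebalance, Dead)),
    (Stable, Set[GroupState](PreparingRebalance, Dead)),
    (Empty, Set[GroupState](PreparingRebalance, Dead)),
    (Dead, Set.empty[GroupState])
  )

  // True if the text lists a direct transition from `from` to `to`.
  def canTransition(from: GroupState, to: GroupState): Boolean =
    allowed.getOrElse(from, Set.empty[GroupState]).contains(to)
}
```

For example, `GroupTransitions.canTransition(Stable, AwaitingSync)` is false: a Stable group has to pass through PreparingRebalance first.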
3 JoinGroupRequest analysis
First, the handleJoinGroupRequest method of KafkaApis is called:
def handleJoinGroupRequest(request: RequestChannel.Request) {
  import JavaConversions._
  // Cast the request body to a JoinGroupRequest
  val joinGroupRequest = request.body.asInstanceOf[JoinGroupRequest]
  val responseHeader = new ResponseHeader(request.header.correlationId)
  // Callback that builds and sends the JoinGroupResponse
  def sendResponseCallback(joinResult: JoinGroupResult) {
    val members = joinResult.members map { case (memberId, metadataArray) => (memberId, ByteBuffer.wrap(metadataArray)) }
    val responseBody = new JoinGroupResponse(request.header.apiVersion, joinResult.errorCode, joinResult.generationId,
      joinResult.subProtocol, joinResult.memberId, joinResult.leaderId, members)
    trace("Sending join group response %s for correlation id %d to client %s."
      .format(responseBody, request.header.correlationId, request.header.clientId))
    requestChannel.sendResponse(new RequestChannel.Response(request, new ResponseSend(request.connectionId, responseHeader, responseBody)))
  }
  // Authorization failed: reply immediately with GROUP_AUTHORIZATION_FAILED
  if (!authorize(request.session, Read, new Resource(Group, joinGroupRequest.groupId()))) {
    val responseBody = new JoinGroupResponse(
      request.header.apiVersion,
      Errors.GROUP_AUTHORIZATION_FAILED.code,
      JoinGroupResponse.UNKNOWN_GENERATION_ID,
      JoinGroupResponse.UNKNOWN_PROTOCOL,
      JoinGroupResponse.UNKNOWN_MEMBER_ID, // memberId
      JoinGroupResponse.UNKNOWN_MEMBER_ID, // leaderId
      Map.empty[String, ByteBuffer])
    requestChannel.sendResponse(new RequestChannel.Response(request, new ResponseSend(request.connectionId, responseHeader, responseBody)))
  } else {
    // Delegate to GroupCoordinator#handleJoinGroup
    val protocols = joinGroupRequest.groupProtocols().map(protocol =>
      (protocol.name, Utils.toArray(protocol.metadata))).toList
    coordinator.handleJoinGroup(
      joinGroupRequest.groupId,
      joinGroupRequest.memberId,
      request.header.clientId,
      request.session.clientAddress.toString,
      joinGroupRequest.rebalanceTimeout,
      joinGroupRequest.sessionTimeout,
      joinGroupRequest.protocolType,
      protocols,
      sendResponseCallback)
  }
}
def handleJoinGroup(groupId: String, memberId: String, clientId: String, clientHost: String, rebalanceTimeoutMs: Int,
    sessionTimeoutMs: Int, protocolType: String, protocols: List[(String, Array[Byte])], responseCallback: JoinCallback) {
  // Run a series of checks first
  if (!isActive.get) {
    responseCallback(joinError(memberId, Errors.GROUP_COORDINATOR_NOT_AVAILABLE.code))
  } else if (!validGroupId(groupId)) { // validate the group id
    responseCallback(joinError(memberId, Errors.INVALID_GROUP_ID.code))
  } else if (!isCoordinatorForGroup(groupId)) { // does this GroupCoordinator manage the group?
    responseCallback(joinError(memberId, Errors.NOT_COORDINATOR_FOR_GROUP.code))
  } else if (isCoordinatorLoadingInProgress(groupId)) { // is the group's Offsets Topic partition still being loaded?
    responseCallback(joinError(memberId, Errors.GROUP_LOAD_IN_PROGRESS.code))
  } else if (sessionTimeoutMs < groupConfig.groupMinSessionTimeoutMs ||
      sessionTimeoutMs > groupConfig.groupMaxSessionTimeoutMs) { // is the consumer's session timeout within the allowed range?
    responseCallback(joinError(memberId, Errors.INVALID_SESSION_TIMEOUT.code))
  } else {
    // Look up the GroupMetadata by groupId
    groupManager.getGroup(groupId) match {
      case None =>
        // The group does not exist yet: a non-empty memberId cannot be valid, so return an error
        if (memberId != JoinGroupRequest.UNKNOWN_MEMBER_ID) {
          responseCallback(joinError(memberId, Errors.UNKNOWN_MEMBER_ID.code))
        } else {
          // Create the GroupMetadata for this groupId and cache it
          val group = groupManager.addGroup(new GroupMetadata(groupId))
          doJoinGroup(group, memberId, clientId, clientHost, rebalanceTimeoutMs, sessionTimeoutMs, protocolType, protocols, responseCallback)
        }
      case Some(group) =>
        doJoinGroup(group, memberId, clientId, clientHost, rebalanceTimeoutMs, sessionTimeoutMs, protocolType, protocols, responseCallback)
    }
  }
}
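The order of the pre-checks in handleJoinGroup matters, because the first failing check decides the error code. The following is a minimal sketch of that "first failure wins" chain. `JoinContext`, `JoinPreChecks`, and `firstError` are hypothetical names, the boolean flags stand in for the real coordinator lookups, and the error codes are modeled as plain strings.

```scala
// Hypothetical, simplified model of the pre-checks in handleJoinGroup.
// Only the field names of GroupConfig are taken from the code above.
final case class GroupConfig(groupMinSessionTimeoutMs: Int, groupMaxSessionTimeoutMs: Int)

final case class JoinContext(
  coordinatorActive: Boolean, // stands in for isActive.get
  groupIdValid: Boolean,      // stands in for validGroupId(groupId)
  ownsGroup: Boolean,         // stands in for isCoordinatorForGroup(groupId)
  loading: Boolean,           // stands in for isCoordinatorLoadingInProgress(groupId)
  sessionTimeoutMs: Int)

object JoinPreChecks {
  // Returns the name of the first failing check, or None if every check passes.
  def firstError(ctx: JoinContext, config: GroupConfig): Option[String] =
    if (!ctx.coordinatorActive) Some("GROUP_COORDINATOR_NOT_AVAILABLE")
    else if (!ctx.groupIdValid) Some("INVALID_GROUP_ID")
    else if (!ctx.ownsGroup) Some("NOT_COORDINATOR_FOR_GROUP")
    else if (ctx.loading) Some("GROUP_LOAD_IN_PROGRESS")
    else if (ctx.sessionTimeoutMs < config.groupMinSessionTimeoutMs ||
        ctx.sessionTimeoutMs > config.groupMaxSessionTimeoutMs) Some("INVALID_SESSION_TIMEOUT")
    else None
}
```

With an assumed range of 6000 to 300000 ms, a request with a 1000 ms session timeout is rejected only if all earlier checks pass; an inactive coordinator is reported first regardless of the timeout.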
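The doJoinGroup method mainly performs two checks. First, it checks the memberId: a JoinGroupRequest may come from a member the group already knows, in which case the request carries the memberId assigned earlier, and the coordinator has to verify that the GroupMetadata recognizes that memberId. Second, it checks the PartitionAssignors supported by the member: the set of PartitionAssignors supported by the joining consumer must intersect with the set of candidate PartitionAssignors in the GroupMetadata (candidateProtocols). Only then can a PartitionAssignor be chosen that every consumer supports.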
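The second check boils down to set intersection. Here is a minimal sketch under the assumption that each member is represented only by its list of supported assignor names; `ProtocolSelection` and its method signatures are hypothetical and only approximate what GroupMetadata does, and the assignor names in the usage note are illustrative.

```scala
object ProtocolSelection {
  // Protocols supported by every current member: the intersection of all members' sets.
  // `members` maps a memberId to the assignor names that member supports.
  def candidateProtocols(members: Map[String, List[String]]): Set[String] =
    if (members.isEmpty) Set.empty[String]
    else members.values.map(_.toSet).reduce(_ intersect _)

  // A joining member is compatible if the group has no members yet,
  // or if it shares at least one protocol with the current candidates.
  def supportsProtocols(members: Map[String, List[String]], joining: Set[String]): Boolean =
    members.isEmpty || candidateProtocols(members).intersect(joining).nonEmpty
}
```

With members supporting `List("range", "roundrobin")` and `List("roundrobin")`, the only candidate is `roundrobin`, so a consumer that offers only `range` would be rejected with INCONSISTENT_GROUP_PROTOCOL in the real handler.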
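After that, the request is handled according to the state of the consumer group:
# Dead
Return the UNKNOWN_MEMBER_ID error code directly.
# PreparingRebalance
If a known member is rejoining, update the member information recorded in the GroupMetadata. If an unknown new member is joining, create the member, assign it a memberId, and add it to the GroupMetadata.
# AwaitingSync
If an unknown new member is joining, create the member, assign it a memberId, and add it to the GroupMetadata. Then call maybePrepareRebalance to switch the state to PreparingRebalance.
If a known member is rejoining, check whether the PartitionAssignors it supports have changed. If they have not changed, respond with the current generation; the member list is included only when the requester is the Group Leader. If they have changed, update the member information and call maybePrepareRebalance to switch the state to PreparingRebalance.
# Stable
If an unknown new member is joining, create the member, assign it a memberId, add it to the GroupMetadata, and then call maybePrepareRebalance to switch the state to PreparingRebalance.
If a known member is rejoining, check whether the PartitionAssignors it supports have changed. If they have not changed, return the current state of the GroupMetadata; the consumer then sends a SyncGroupRequest to continue. If they have changed, or if the JoinGroupRequest comes from the Group Leader, update the member information and call maybePrepareRebalance to switch the state to PreparingRebalance.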
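The per-state rules above can be condensed into one decision function. This is a sketch only: `JoinAction`, `JoinDecision`, and `decide` are hypothetical names, and the group state is passed as a plain string to keep the example self-contained. The Empty case is grouped with Stable, which is how the doJoinGroup code below treats it.

```scala
// Hypothetical summary of the per-state rules; not the Kafka implementation.
sealed trait JoinAction
case object RejectUnknownMember extends JoinAction          // UNKNOWN_MEMBER_ID
case object AddMemberAndRebalance extends JoinAction        // new member
case object UpdateMemberAndRebalance extends JoinAction     // known member, rebalance needed
case object RespondWithCurrentGeneration extends JoinAction // known member, nothing changed

object JoinDecision {
  // state is one of "Dead", "PreparingRebalance", "AwaitingSync", "Stable", "Empty".
  def decide(state: String, isNewMember: Boolean, isLeader: Boolean, protocolsMatch: Boolean): JoinAction =
    state match {
      case "Dead" => RejectUnknownMember
      case _ if isNewMember => AddMemberAndRebalance
      case "PreparingRebalance" => UpdateMemberAndRebalance
      case "AwaitingSync" =>
        if (protocolsMatch) RespondWithCurrentGeneration else UpdateMemberAndRebalance
      case _ => // Stable or Empty
        if (isLeader || !protocolsMatch) UpdateMemberAndRebalance else RespondWithCurrentGeneration
    }
}
```

Note the asymmetry: in Stable a rejoining leader always triggers a rebalance, while in AwaitingSync only a protocol change does.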
private def doJoinGroup(group: GroupMetadata, memberId: String, clientId: String, clientHost: String, rebalanceTimeoutMs: Int,
    sessionTimeoutMs: Int, protocolType: String, protocols: List[(String, Array[Byte])], responseCallback: JoinCallback) {
  group synchronized {
    // Check the protocol type and the partition assignors supported by the consumer
    if (!group.is(Empty) && (group.protocolType != Some(protocolType) || !group.supportsProtocols(protocols.map(_._1).toSet))) {
      // Not compatible with the group: return an error
      responseCallback(joinError(memberId, Errors.INCONSISTENT_GROUP_PROTOCOL.code))
    } else if (memberId != JoinGroupRequest.UNKNOWN_MEMBER_ID && !group.has(memberId)) {
      // The memberId is not recognized by the group
      responseCallback(joinError(memberId, Errors.UNKNOWN_MEMBER_ID.code))
    } else {
      group.currentState match {
        case Dead =>
          // Return the UNKNOWN_MEMBER_ID error code directly
          responseCallback(joinError(memberId, Errors.UNKNOWN_MEMBER_ID.code))
        case PreparingRebalance =>
          // Unknown new member: create it, assign a memberId, and add it to the GroupMetadata
          if (memberId == JoinGroupRequest.UNKNOWN_MEMBER_ID) {
            addMemberAndRebalance(rebalanceTimeoutMs, sessionTimeoutMs, clientId, clientHost, protocolType, protocols, group, responseCallback)
          } else {
            // Known member rejoining: update the member information recorded in the GroupMetadata
            val member = group.get(memberId)
            updateMemberAndRebalance(group, member, protocols, responseCallback)
          }
        case AwaitingSync =>
          // Unknown new member: create it, assign a memberId, and add it to the GroupMetadata,
          // then call maybePrepareRebalance to switch the state to PreparingRebalance
          if (memberId == JoinGroupRequest.UNKNOWN_MEMBER_ID) {
            addMemberAndRebalance(rebalanceTimeoutMs, sessionTimeoutMs, clientId, clientHost, protocolType, protocols, group, responseCallback)
          } else {
            // Known member rejoining: check whether its supported PartitionAssignors have changed
            val member = group.get(memberId)
            // Unchanged: respond with the current generation; only the leader receives the member list
            if (member.matches(protocols)) {
              responseCallback(JoinGroupResult(
                members = if (memberId == group.leaderId) {
                  group.currentMemberMetadata
                } else {
                  Map.empty
                },
                memberId = memberId,
                generationId = group.generationId,
                subProtocol = group.protocol,
                leaderId = group.leaderId,
                errorCode = Errors.NONE.code))
            } else {
              // Changed: update the member and call maybePrepareRebalance to switch to PreparingRebalance
              updateMemberAndRebalance(group, member, protocols, responseCallback)
            }
          }
        case Empty | Stable =>
          if (memberId == JoinGroupRequest.UNKNOWN_MEMBER_ID) {
            // Unknown new member: create it, assign a memberId, add it to the GroupMetadata,
            // then call maybePrepareRebalance to switch the state to PreparingRebalance
            addMemberAndRebalance(rebalanceTimeoutMs, sessionTimeoutMs, clientId, clientHost, protocolType, protocols, group, responseCallback)
          } else {
            // Known member rejoining: check whether its supported PartitionAssignors have changed
            val member = group.get(memberId)
            // Changed, or the request comes from the Group Leader: update the member and call
            // maybePrepareRebalance to switch the state to PreparingRebalance
            if (memberId == group.leaderId || !member.matches(protocols)) {
              updateMemberAndRebalance(group, member, protocols, responseCallback)
            } else {
              // Unchanged: return the current state of the GroupMetadata; the consumer then sends a SyncGroupRequest
              responseCallback(JoinGroupResult(
                members = Map.empty,
                memberId = memberId,
                generationId = group.generationId,
                subProtocol = group.protocol,
                leaderId = group.leaderId,
                errorCode = Errors.NONE.code))
            }
          }
      }
      // Try to complete the related DelayedJoin
      if (group.is(PreparingRebalance))
        joinPurgatory.checkAndComplete(GroupKey(group.groupId))
    }
  }
}
private def addMemberAndRebalance(rebalanceTimeoutMs: Int, sessionTimeoutMs: Int, clientId: String,
    clientHost: String, protocolType: String, protocols: List[(String, Array[Byte])],
    group: GroupMetadata, callback: JoinCallback) = {
  // Build a memberId from the clientId and a generated suffix
  val memberId = clientId + "-" + group.generateMemberIdSuffix
  // Create the MemberMetadata object
  val member = new MemberMetadata(memberId, group.groupId, clientId, clientHost, rebalanceTimeoutMs,
    sessionTimeoutMs, protocolType, protocols)
  // Set awaitingJoinCallback; this is the sendResponseCallback defined in KafkaApis#handleJoinGroupRequest
  member.awaitingJoinCallback = callback
  // Add the member to the GroupMetadata
  group.add(member.memberId, member)
  // Try to switch the state to PreparingRebalance
  maybePrepareRebalance(group)
  member
}
private def prepareRebalance(group: GroupMetadata) {
  // If the group is in AwaitingSync, reset the MemberMetadata#assignment fields first
  if (group.is(AwaitingSync))
    resetAndPropagateAssignmentError(group, Errors.REBALANCE_IN_PROGRESS)
  // Switch the group to PreparingRebalance, meaning a rebalance is about to run
  group.transitionTo(PreparingRebalance)
  info("Preparing to restabilize group %s with old generation %s".format(group.groupId, group.generationId))
  // The DelayedJoin timeout is the maximum rebalance timeout among all members in the GroupMetadata
  val rebalanceTimeout = group.rebalanceTimeoutMs
  // Create the DelayedJoin object
  val delayedRebalance = new DelayedJoin(this, group, rebalanceTimeout)
  // Create the key for the DelayedJoin
  val groupKey = GroupKey(group.groupId)
  // Try to complete the DelayedJoin immediately; otherwise add it to joinPurgatory
  joinPurgatory.tryCompleteElseWatch(delayedRebalance, Seq(groupKey))
}
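The comment in prepareRebalance says that the DelayedJoin timeout is the maximum of the timeouts configured by the members. A minimal sketch of that calculation follows, assuming each member is reduced to its timeouts; `MemberTimeouts` and `RebalanceTimeout` are hypothetical names, and the real logic lives inside GroupMetadata.

```scala
// Hypothetical, trimmed-down stand-in for MemberMetadata.
final case class MemberTimeouts(memberId: String, rebalanceTimeoutMs: Int, sessionTimeoutMs: Int)

object RebalanceTimeout {
  // The group-wide rebalance timeout: the largest rebalanceTimeoutMs of any member, 0 for an empty group.
  def groupRebalanceTimeoutMs(members: Iterable[MemberTimeouts]): Int =
    members.foldLeft(0)((timeout, m) => math.max(timeout, m.rebalanceTimeoutMs))
}
```

Taking the maximum means the slowest member's budget decides how long the coordinator waits for everyone to rejoin.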
private def updateMemberAndRebalance(group: GroupMetadata, member: MemberMetadata,
    protocols: List[(String, Array[Byte])], callback: JoinCallback) {
  // Update the protocols supported by the MemberMetadata
  member.supportedProtocols = protocols
  // Set the awaitingJoinCallback callback
  member.awaitingJoinCallback = callback
  // Try to switch the group state
  maybePrepareRebalance(group)
}
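4 DelayedJoin analysis
As shown above, when a consumer group switches to the PreparingRebalance state, a DelayedJoin object is created and added to the GroupCoordinator's joinPurgatory. DelayedJoin is a delayed operation whose main job is to wait for all consumers in the group to send a JoinGroupRequest. Each time a newly received JoinGroupRequest has been processed, the coordinator checks whether the related DelayedJoin can complete; if it has not completed after the timeout elapses, the DelayedJoin expires and runs anyway.
coordinator: GroupCoordinator  The owning GroupCoordinator
group: GroupMetadata  The GroupMetadata this operation belongs to
rebalanceTimeout: Long  The time after which the DelayedJoin expires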
def tryCompleteJoin(group: GroupMetadata, forceComplete: () => Boolean) = {
  group synchronized {
    // Complete only if every known member has already rejoined
    if (group.notYetRejoinedMembers.isEmpty)
      forceComplete()
    else false
  }
}
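The completion condition in tryCompleteJoin depends on which members have not rejoined yet. The sketch below assumes that "has rejoined" is equivalent to "has a pending join callback", which matches how awaitingJoinCallback is set and cleared in the code in this article; `PendingMember` and `JoinCompletion` are hypothetical names, and the callback is reduced to a boolean flag.

```scala
// Hypothetical stand-in for MemberMetadata: hasJoinCallback models awaitingJoinCallback != null.
final case class PendingMember(memberId: String, hasJoinCallback: Boolean)

object JoinCompletion {
  // Members that are known to the group but have not sent a JoinGroupRequest in this round.
  def notYetRejoined(members: Iterable[PendingMember]): List[PendingMember] =
    members.filter(!_.hasJoinCallback).toList

  // The DelayedJoin can complete early once nobody is missing.
  def canComplete(members: Iterable[PendingMember]): Boolean =
    notYetRejoined(members).isEmpty
}
```

If the timeout fires first, the members still in `notYetRejoined` are the ones removed at the start of onCompleteJoin.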
def onCompleteJoin(group: GroupMetadata) {
  var delayedStore: Option[DelayedStore] = None
  group synchronized {
    // Remove the known members that have not rejoined from the GroupMetadata
    group.notYetRejoinedMembers.foreach { failedMember =>
      group.remove(failedMember.memberId)
    }
    // The group either still has members, or has none but its metadata has not been removed yet
    if (!group.is(Dead)) {
      // Increment the generationId and select the PartitionAssignor the group will finally use
      group.initNextGeneration()
      // The group has no members left, but its metadata has not been removed yet
      if (group.is(Empty)) {
        info(s"Group ${group.groupId} with generation ${group.generationId} is now empty")
        // Create the DelayedStore object
        delayedStore = groupManager.prepareStoreGroup(group, Map.empty, error => {
          if (error != Errors.NONE) {
            warn(s"Failed to write empty metadata for group ${group.groupId}: ${error.message}")
          }
        })
      } else {
        // The group still has members
        info(s"Stabilized group ${group.groupId} generation ${group.generationId}")
        // Send a JoinGroupResponse to every member in the GroupMetadata
        for (member <- group.allMemberMetadata) {
          assert(member.awaitingJoinCallback != null)
          // The response sent to the Group Leader differs from the one sent to followers
          val joinResult = JoinGroupResult(
            members = if (member.memberId == group.leaderId) { group.currentMemberMetadata } else { Map.empty },
            memberId = member.memberId,
            generationId = group.generationId,
            subProtocol = group.protocol,
            leaderId = group.leaderId,
            errorCode = Errors.NONE.code)
          member.awaitingJoinCallback(joinResult)
          member.awaitingJoinCallback = null
          completeAndScheduleNextHeartbeatExpiration(group, member)
        }
      }
    }
  }
  // Call the store method of GroupMetadataManager
  delayedStore.foreach(groupManager.store)
}
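The only difference between the response for the Group Leader and the responses for followers is the member list. A minimal sketch of that rule follows; `JoinResult` and `JoinResponses` are hypothetical names, and member metadata is modeled as a string instead of a byte array.

```scala
// Hypothetical, simplified response type: members maps memberId to its subscription metadata.
final case class JoinResult(memberId: String, leaderId: String, generationId: Int, members: Map[String, String])

object JoinResponses {
  // Builds one response per member; only the leader sees the full member metadata.
  def build(allMembers: Map[String, String], leaderId: String, generationId: Int): List[JoinResult] =
    allMembers.keys.toList.sorted.map { id =>
      val visible = if (id == leaderId) allMembers else Map.empty[String, String]
      JoinResult(id, leaderId, generationId, visible)
    }
}
```

The leader needs the full list because it computes the partition assignment on the client side and sends it back in its SyncGroupRequest.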
5 HeartbeatRequest analysis
Every consumer periodically sends a HeartbeatRequest to the GroupCoordinator to tell it that the consumer is still alive.
def handleHeartbeat(groupId: String, memberId: String, generationId: Int, responseCallback: Short => Unit) {
  // Run a series of checks first: is the coordinator active, does it manage this group,
  // and has it finished loading the corresponding Offsets Topic partition?
  if (!isActive.get) {
    responseCallback(Errors.GROUP_COORDINATOR_NOT_AVAILABLE.code)
  } else if (!isCoordinatorForGroup(groupId)) {
    responseCallback(Errors.NOT_COORDINATOR_FOR_GROUP.code)
  } else if (isCoordinatorLoadingInProgress(groupId)) {
    // While the group is still loading, the heartbeat is answered with NONE rather than an error
    responseCallback(Errors.NONE.code)
  } else {
    // Look up the group by groupId
    groupManager.getGroup(groupId) match {
      case None =>
        responseCallback(Errors.UNKNOWN_MEMBER_ID.code)
      case Some(group) =>
        group synchronized {
          // Check the group state
          group.currentState match {
            case Dead =>
              // Dead means another thread has already removed the group from the coordinator metadata;
              // the group may have migrated to another GroupCoordinator, or it is in a transient unstable phase
              responseCallback(Errors.UNKNOWN_MEMBER_ID.code)
            case Empty =>
              responseCallback(Errors.UNKNOWN_MEMBER_ID.code)
            // The group is in AwaitingSync
            case AwaitingSync =>
              if (!group.has(memberId))
                responseCallback(Errors.UNKNOWN_MEMBER_ID.code)
              else
                responseCallback(Errors.REBALANCE_IN_PROGRESS.code)
            // The group is in PreparingRebalance
            case PreparingRebalance =>
              if (!group.has(memberId)) {
                responseCallback(Errors.UNKNOWN_MEMBER_ID.code)
              } else if (generationId != group.generationId) {
                responseCallback(Errors.ILLEGAL_GENERATION.code)
              } else {
                val member = group.get(memberId)
                completeAndScheduleNextHeartbeatExpiration(group, member)
                responseCallback(Errors.REBALANCE_IN_PROGRESS.code)
              }
            // The group is in Stable
            case Stable =>
              if (!group.has(memberId)) {
                responseCallback(Errors.UNKNOWN_MEMBER_ID.code)
              } else if (generationId != group.generationId) {
                responseCallback(Errors.ILLEGAL_GENERATION.code)
              } else {
                val member = group.get(memberId)
                completeAndScheduleNextHeartbeatExpiration(group, member)
                responseCallback(Errors.NONE.code)
              }
          }
        }
    }
  }
}
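Once the group has been found, the error code of the heartbeat response depends only on the group state, on membership, and on the generation. The following sketch restates the match in handleHeartbeat as a pure function; `HeartbeatDecision` and `errorFor` are hypothetical names, and states and error codes are modeled as strings.

```scala
object HeartbeatDecision {
  // state is one of "Dead", "Empty", "AwaitingSync", "PreparingRebalance", "Stable".
  // Returns the name of the error code the handler would send for an existing group.
  def errorFor(state: String, isMember: Boolean, generationMatches: Boolean): String =
    state match {
      case "Dead" | "Empty" => "UNKNOWN_MEMBER_ID"
      case _ if !isMember => "UNKNOWN_MEMBER_ID"
      // AwaitingSync does not look at the generation at all
      case "AwaitingSync" => "REBALANCE_IN_PROGRESS"
      case _ if !generationMatches => "ILLEGAL_GENERATION"
      case "PreparingRebalance" => "REBALANCE_IN_PROGRESS"
      case _ => "NONE" // Stable
    }
}
```

A consumer that receives REBALANCE_IN_PROGRESS is expected to rejoin the group; only in PreparingRebalance and Stable does a valid heartbeat also reschedule the session expiration.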
private def completeAndScheduleNextHeartbeatExpiration(group: GroupMetadata, member: MemberMetadata) {
  // complete current heartbeat expectation
  member.latestHeartbeat = time.milliseconds()
  // Build the key of the DelayedHeartbeat
  val memberKey = MemberKey(member.groupId, member.memberId)
  // Try to complete the DelayedHeartbeat that was added earlier
  heartbeatPurgatory.checkAndComplete(memberKey)
  // Compute the deadline of the next heartbeat
  val newHeartbeatDeadline = member.latestHeartbeat + member.sessionTimeoutMs
  // Create a new DelayedHeartbeat and add it to heartbeatPurgatory
  val delayedHeartbeat = new DelayedHeartbeat(this, group, member, newHeartbeatDeadline, member.sessionTimeoutMs)
  heartbeatPurgatory.tryCompleteElseWatch(delayedHeartbeat, Seq(memberKey))
}
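The deadline arithmetic is simple enough to check by hand. The sketch below shows the calculation from completeAndScheduleNextHeartbeatExpiration together with an assumed expiry rule: a member counts as expired at a deadline if no heartbeat has moved its own deadline past it. `HeartbeatDeadline` and `isExpired` are hypothetical, and the real DelayedHeartbeat also considers pending join and sync callbacks, which this sketch leaves out.

```scala
object HeartbeatDeadline {
  // Deadline for the next heartbeat, as computed in completeAndScheduleNextHeartbeatExpiration.
  def nextDeadline(latestHeartbeatMs: Long, sessionTimeoutMs: Int): Long =
    latestHeartbeatMs + sessionTimeoutMs

  // Assumed rule for this sketch: the member has expired at deadlineMs
  // if its latest heartbeat does not push its own deadline beyond deadlineMs.
  def isExpired(latestHeartbeatMs: Long, sessionTimeoutMs: Int, deadlineMs: Long): Boolean =
    nextDeadline(latestHeartbeatMs, sessionTimeoutMs) <= deadlineMs
}
```

For a heartbeat at t = 1000 ms with a 30000 ms session timeout the deadline is 31000 ms; a second heartbeat at t = 20000 ms moves the member's own deadline to 50000 ms, so the operation scheduled for 31000 ms no longer expires it.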
6 SyncGroupRequest analysis
def handleSyncGroup(groupId: String, generation: Int, memberId: String,
    groupAssignment: Map[String, Array[Byte]], responseCallback: SyncCallback) {
  if (!isActive.get) {
    responseCallback(Array.empty, Errors.GROUP_COORDINATOR_NOT_AVAILABLE.code)
  } else if (!isCoordinatorForGroup(groupId)) { // does this GroupCoordinator manage the group?
    responseCallback(Array.empty, Errors.NOT_COORDINATOR_FOR_GROUP.code)
  } else {
    groupManager.getGroup(groupId) match {
      case None => responseCallback(Array.empty, Errors.UNKNOWN_MEMBER_ID.code)
      // Delegate to doSyncGroup
      case Some(group) => doSyncGroup(group, generation, memberId, groupAssignment, responseCallback)
    }
  }
}
private def doSyncGroup(group: GroupMetadata, generationId: Int, memberId: String,
    groupAssignment: Map[String, Array[Byte]], responseCallback: SyncCallback) {
  var delayedGroupStore: Option[DelayedStore] = None
  group synchronized {
    // Check that the member belongs to the group
    if (!group.has(memberId)) {
      responseCallback(Array.empty, Errors.UNKNOWN_MEMBER_ID.code)
    } else if (generationId != group.generationId) { // check that the generationId is valid
      responseCallback(Array.empty, Errors.ILLEGAL_GENERATION.code)
    } else {
      group.currentState match {
        case Empty | Dead =>
          // Return the UNKNOWN_MEMBER_ID error code directly
          responseCallback(Array.empty, Errors.UNKNOWN_MEMBER_ID.code)
        case PreparingRebalance =>
          // Return the REBALANCE_IN_PROGRESS error code directly
          responseCallback(Array.empty, Errors.REBALANCE_IN_PROGRESS.code)
        case AwaitingSync =>
          // Set awaitingSyncCallback; this is the sendResponseCallback defined in KafkaApis#handleSyncGroupRequest
          group.get(memberId).awaitingSyncCallback = responseCallback
          // The member that sent this request is the leader
          if (memberId == group.leaderId) {
            info(s"Assignment received from leader for group ${group.groupId} for generation ${group.generationId}")
            // Fill in an empty byte array for every member that received no assignment
            val missing = group.allMembers -- groupAssignment.keySet
            val assignment = groupAssignment ++ missing.map(_ -> Array.empty[Byte]).toMap
            // Have GroupMetadataManager turn the GroupMetadata into a message and write it to the Offsets Topic
            delayedGroupStore = groupManager.prepareStoreGroup(group, assignment, (error: Errors) => {
              group synchronized {
                if (group.is(AwaitingSync) && generationId == group.generationId) {
                  if (error != Errors.NONE) {
                    // Clear the assignment and send error responses
                    resetAndPropagateAssignmentError(group, error)
                    // Switch the state to PreparingRebalance
                    maybePrepareRebalance(group)
                  } else {
                    // Set the assignment and send normal SyncGroupResponses
                    setAndPropagateAssignment(group, assignment)
                    // Switch the state to Stable
                    group.transitionTo(Stable)
                  }
                }
              }
            })
          }
        case Stable =>
          // Return the partitions already assigned to this member
          val memberMetadata = group.get(memberId)
          responseCallback(memberMetadata.assignment, Errors.NONE.code)
          completeAndScheduleNextHeartbeatExpiration(group, group.get(memberId))
      }
    }
  }
  // Call the store method of GroupMetadataManager
  delayedGroupStore.foreach(groupManager.store)
}
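The two lines that compute `missing` and `assignment` in doSyncGroup make sure every member gets an entry, even if the leader assigned it nothing. The same step in isolation, as a minimal sketch with a hypothetical `AssignmentFill` helper:

```scala
object AssignmentFill {
  // Returns the leader's assignment, extended with an empty byte array
  // for every group member the leader did not mention.
  def complete(allMembers: Set[String], groupAssignment: Map[String, Array[Byte]]): Map[String, Array[Byte]] = {
    val missing = allMembers -- groupAssignment.keySet
    groupAssignment ++ missing.map(id => id -> Array.empty[Byte]).toMap
  }
}
```

Without this step, a member left out by the leader would have no assignment to return when its held SyncGroupRequest is finally answered.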
7 OffsetCommitRequest analysis
def handleCommitOffsets(groupId: String, memberId: String, generationId: Int,
    offsetMetadata: immutable.Map[TopicPartition, OffsetAndMetadata],
    responseCallback: immutable.Map[TopicPartition, Short] => Unit) {
  // Run a series of checks first
  if (!isActive.get) {
    responseCallback(offsetMetadata.mapValues(_ => Errors.GROUP_COORDINATOR_NOT_AVAILABLE.code))
  } else if (!isCoordinatorForGroup(groupId)) {
    responseCallback(offsetMetadata.mapValues(_ => Errors.NOT_COORDINATOR_FOR_GROUP.code))
  } else if (isCoordinatorLoadingInProgress(groupId)) {
    responseCallback(offsetMetadata.mapValues(_ => Errors.GROUP_LOAD_IN_PROGRESS.code))
  } else {
    groupManager.getGroup(groupId) match {
      // If the group does not exist and generationId < 0, the GroupCoordinator does not maintain
      // the partition assignment for this group and only records the committed offsets
      case None =>
        if (generationId < 0) {
          // the group is not relying on Kafka for group management, so allow the commit
          val group = groupManager.addGroup(new GroupMetadata(groupId))
          doCommitOffsets(group, memberId, generationId, offsetMetadata, responseCallback)
        } else {
          // or this is a request coming from an older generation. either way, reject the commit
          responseCallback(offsetMetadata.mapValues(_ => Errors.ILLEGAL_GENERATION.code))
        }
      case Some(group) =>
        doCommitOffsets(group, memberId, generationId, offsetMetadata, responseCallback)
    }
  }
}
def doCommitOffsets(group: GroupMetadata, memberId: String, generationId: Int,
    offsetMetadata: immutable.Map[TopicPartition, OffsetAndMetadata],
    responseCallback: immutable.Map[TopicPartition, Short] => Unit) {
  var delayedOffsetStore: Option[DelayedStore] = None
  group synchronized {
    // The group is Dead
    if (group.is(Dead)) {
      // Return the UNKNOWN_MEMBER_ID error code directly
      responseCallback(offsetMetadata.mapValues(_ => Errors.UNKNOWN_MEMBER_ID.code))
    } else if (generationId < 0 && group.is(Empty)) { // the group has no members and generationId < 0
      // The group is used only for storing offsets
      delayedOffsetStore = groupManager.prepareStoreOffsets(group, memberId, generationId,
        offsetMetadata, responseCallback)
    } else if (group.is(AwaitingSync)) { // the group is in AwaitingSync
      // Respond with REBALANCE_IN_PROGRESS directly
      responseCallback(offsetMetadata.mapValues(_ => Errors.REBALANCE_IN_PROGRESS.code))
    } else if (!group.has(memberId)) { // the sender is not a member of the group: return UNKNOWN_MEMBER_ID
      responseCallback(offsetMetadata.mapValues(_ => Errors.UNKNOWN_MEMBER_ID.code))
    } else if (generationId != group.generationId) { // the generationId is not valid: return ILLEGAL_GENERATION
      responseCallback(offsetMetadata.mapValues(_ => Errors.ILLEGAL_GENERATION.code))
    } else {
      // Look up the member
      val member = group.get(memberId)
      // An offset commit also counts as a heartbeat
      completeAndScheduleNextHeartbeatExpiration(group, member)
      // Create the DelayedStore for the offsets
      delayedOffsetStore = groupManager.prepareStoreOffsets(group, memberId, generationId,
        offsetMetadata, responseCallback)
    }
  }
  // Call the store method of GroupMetadataManager to persist the offsets
  delayedOffsetStore.foreach(groupManager.store)
}
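The branch order in doCommitOffsets decides which commits are stored and which are rejected. The following sketch restates it as a pure function; `CommitDecision` and `decide` are hypothetical names, "STORE" is a made-up marker for the two branches that call prepareStoreOffsets, and states and error codes are modeled as strings.

```scala
object CommitDecision {
  // Returns "STORE" when the offsets would be written, otherwise the name of the error code.
  def decide(state: String, generationId: Int, groupGenerationId: Int, isMember: Boolean): String =
    if (state == "Dead") "UNKNOWN_MEMBER_ID"
    else if (generationId < 0 && state == "Empty") "STORE" // group used only for offset storage
    else if (state == "AwaitingSync") "REBALANCE_IN_PROGRESS"
    else if (!isMember) "UNKNOWN_MEMBER_ID"
    else if (generationId != groupGenerationId) "ILLEGAL_GENERATION"
    else "STORE"
}
```

The second branch is what lets a consumer that manages its own partition assignment commit offsets without ever joining the group.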
8 LeaveGroupRequest analysis
When a consumer leaves its consumer group, for example by calling unsubscribe to cancel its topic subscriptions, it sends a LeaveGroupRequest to the GroupCoordinator.
def handleLeaveGroup(groupId: String, memberId: String, responseCallback: Short => Unit) {
  if (!isActive.get) {
    responseCallback(Errors.GROUP_COORDINATOR_NOT_AVAILABLE.code)
  } else if (!isCoordinatorForGroup(groupId)) {
    responseCallback(Errors.NOT_COORDINATOR_FOR_GROUP.code)
  } else if (isCoordinatorLoadingInProgress(groupId)) {
    responseCallback(Errors.GROUP_LOAD_IN_PROGRESS.code)
  } else {
    groupManager.getGroup(groupId) match {
      // The group was not found: return the UNKNOWN_MEMBER_ID error code
      case None =>
        responseCallback(Errors.UNKNOWN_MEMBER_ID.code)
      case Some(group) =>
        group synchronized {
          // The group is Dead (its metadata has been removed), or it does not contain this member
          if (group.is(Dead) || !group.has(memberId)) {
            // Return the UNKNOWN_MEMBER_ID error code
            responseCallback(Errors.UNKNOWN_MEMBER_ID.code)
          } else {
            // Look up the member
            val member = group.get(memberId)
            // Set isLeaving to true on the member and try to complete its DelayedHeartbeat
            removeHeartbeatForLeavingMember(group, member)
            // Remove the member from the group and switch the group state
            onMemberFailure(group, member)
            responseCallback(Errors.NONE.code)
          }
        }
    }
  }
}
private def removeHeartbeatForLeavingMember(group: GroupMetadata, member: MemberMetadata) {
  // Set isLeaving to true on the member
  member.isLeaving = true
  // Build the member key
  val memberKey = MemberKey(member.groupId, member.memberId)
  // Try to complete the corresponding DelayedHeartbeat
  heartbeatPurgatory.checkAndComplete(memberKey)
}
private def onMemberFailure(group: GroupMetadata, member: MemberMetadata) {
  trace("Member %s in group %s has failed".format(member.memberId, group.groupId))
  // Remove the member from the group
  group.remove(member.memberId)
  group.currentState match {
    // Dead or Empty: nothing to do
    case Dead | Empty =>
    // Stable or AwaitingSync: switch to PreparingRebalance
    case Stable | AwaitingSync => maybePrepareRebalance(group)
    // PreparingRebalance: try to complete the DelayedJoin
    case PreparingRebalance => joinPurgatory.checkAndComplete(GroupKey(group.groupId))
  }
}