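1. Introduction
As we know, Kafka has the concept of a consumer group:
- Every consumer belongs to a consumer group, and a consumer group can contain multiple consumers.
- A message published to a topic is consumed by exactly one consumer within each consumer group subscribed to that topic.
- Consumers in different consumer groups can consume the same messages of a topic.
But how does a consumer know which partitions of a topic to consume? How is the assignment between partitions and consumers determined? And when consumers join or leave, or the number of partitions changes, how is the assignment recomputed? This article answers these questions by walking through the consumer rebalance process.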
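1.1 Key concepts
GroupCoordinator: the server-side coordinator, responsible for communicating with clients. Every broker starts a GroupCoordinator service. A consumer group determines which broker's GroupCoordinator to use by hashing its group id modulo the number of partitions of __consumer_offsets; the broker that leads the resulting partition acts as the group's coordinator.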
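To make the modulo rule concrete, here is a minimal sketch of how a group id could be mapped to a partition of __consumer_offsets. It is a simplified model rather than the broker's own code: the class and method names are hypothetical, and the partition count of 50 is an assumption (it corresponds to the broker setting offsets.topic.num.partitions).

```java
// Simplified model, not Kafka's own code: maps a group id to a partition
// of __consumer_offsets by hashing the id and taking it modulo the
// partition count. Class and method names are hypothetical.
class CoordinatorPartitionSketch {
    static int partitionFor(String groupId, int offsetsTopicPartitions) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so map it to 0.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        // 50 is an assumed partition count for __consumer_offsets.
        int partition = partitionFor("order-service", 50);
        System.out.println("group 'order-service' -> partition " + partition);
    }
}
```

Every member of the same group computes the same partition, so they all end up talking to the same coordinator broker.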
ConsumerCoordinator: the client-side coordinator, responsible for communicating with the server.
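2. Overall flow
- GroupCoordinatorRequest (GCR, named FindCoordinatorRequest in the code below): locate the GroupCoordinator. The consumer sends the request to the node with the fewest in-flight requests, waits for that node to return the GroupCoordinator, and then tries to connect to it.
- JoinGroupRequest (JGR): pause the heartbeat and send a JGR. The GroupCoordinator receives the request, designates one consumer in the group as the leader, makes it responsible for assigning partitions, and returns the partition assignment strategy.
- SyncGroupRequest (SGR): send an SGR to synchronize the partition assignment; on success the heartbeat is re-enabled.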
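2.1 Trigger conditions
- A new consumer joins the consumer group
- A consumer crashes and goes offline
- A consumer voluntarily leaves the consumer group
- The number of partitions changes for any topic the consumer group subscribes to
- A consumer calls unsubscribe to cancel its subscription to a topic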
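2.2 The poll method of the KafkaConsumer class
First, look at how a consumer consumes messages. A consumer consumes through KafkaConsumer's poll() method, either after subscribing to topics or after choosing partitions manually with assign(). When a consumer in a group starts, the group leader assigns partitions to it, and the result is stored in the subscriptions field of the KafkaConsumer class. When fetching, the consumer reads the assigned partitions from subscriptions and fetches messages from them. The fetch path looks like this: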
private final SubscriptionState subscriptions; // Holds the partitions assigned by the leader; KafkaConsumer fetches directly from these partitions
private KafkaConsumer(ConsumerConfig config, Deserializer<K> keyDeserializer, Deserializer<V> valueDeserializer) {
...
this.subscriptions = new SubscriptionState(logContext, offsetResetStrategy);
// Pass subscriptions to the Fetcher; when fetching, the Fetcher reads the list of partitions to consume from its own subscriptions field.
this.fetcher = new Fetcher(logContext, this.client, config.getInt("fetch.min.bytes"), config.getInt("fetch.max.bytes"), config.getInt("fetch.max.wait.ms"), config.getInt("max.partition.fetch.bytes"), config.getInt("max.poll.records"), config.getBoolean("check.crcs"), config.getString("client.rack"), this.keyDeserializer, this.valueDeserializer, this.metadata, this.subscriptions, this.metrics, metricsRegistry, this.time, this.retryBackoffMs, this.requestTimeoutMs, isolationLevel, apiVersions);
...
}
The poll method:
private ConsumerRecords<K, V> poll(Timer timer, boolean includeMetadataInTimeout) {
this.acquireAndEnsureOpen();
try {
// Check the subscription type, one of NONE, AUTO_TOPICS, AUTO_PATTERN, USER_ASSIGNED; NONE raises an exception
if (this.subscriptions.hasNoSubscriptionOrUserAssignment()) {
throw new IllegalStateException("Consumer is not subscribed to any topics or assigned any partitions");
} else {
ConsumerRecords var3;
do {
this.client.maybeTriggerWakeup();
if (includeMetadataInTimeout) {
// Trigger a rebalance if one is needed; in effect this obtains the partitions assigned by the leader and stores them in the subscriptions field
if (!this.updateAssignmentMetadataIfNeeded(timer)) {
var3 = ConsumerRecords.empty();
return var3;
}
} else {
while(!this.updateAssignmentMetadataIfNeeded(this.time.timer(Long.MAX_VALUE))) {
this.log.warn("Still waiting for metadata");
}
}
// Fetch records
Map<TopicPartition, List<ConsumerRecord<K, V>>> records = this.pollForFetches(timer);
if (!records.isEmpty()) {
if (this.fetcher.sendFetches() > 0 || this.client.hasPendingRequests()) {
this.client.pollNoWakeup();
}
ConsumerRecords var4 = this.interceptors.onConsume(new ConsumerRecords(records));
return var4;
}
} while(timer.notExpired());
var3 = ConsumerRecords.empty();
return var3;
}
} finally {
this.release();
}
}
Next, look at the pollForFetches method:
private Map<TopicPartition, List<ConsumerRecord<K, V>>> pollForFetches(Timer timer) {
long pollTimeout = this.coordinator == null ? timer.remainingMs() : Math.min(this.coordinator.timeToNextPoll(timer.currentTimeMs()), timer.remainingMs());
Map<TopicPartition, List<ConsumerRecord<K, V>>> records = this.fetcher.fetchedRecords();
if (!records.isEmpty()) {
return records;
} else {
// Send fetch requests
this.fetcher.sendFetches();
if (!this.cachedSubscriptionHashAllFetchPositions && pollTimeout > this.retryBackoffMs) {
pollTimeout = this.retryBackoffMs;
}
Timer pollTimer = this.time.timer(pollTimeout);
this.client.poll(pollTimer, () -> {
return !this.fetcher.hasCompletedFetches();
});
timer.update(pollTimer.currentTimeMs());
return this.coordinator != null && this.coordinator.rejoinNeededOrPending() ? Collections.emptyMap() : this.fetcher.fetchedRecords();
}
}
public synchronized int sendFetches() {
// Update the assigned partitions
this.sensors.maybeUpdateAssignment(this.subscriptions);
// Read the assigned partitions, that is, the subscriptions field
Map<Node, FetchRequestData> fetchRequestMap = this.prepareFetchRequests();
Iterator var2 = fetchRequestMap.entrySet().iterator();
while(var2.hasNext()) {
Entry<Node, FetchRequestData> entry = (Entry)var2.next();
final Node fetchTarget = (Node)entry.getKey();
final FetchRequestData data = (FetchRequestData)entry.getValue();
Builder request = Builder.forConsumer(this.maxWaitMs, this.minBytes, data.toSend()).isolationLevel(this.isolationLevel).setMaxBytes(this.maxBytes).metadata(data.metadata()).toForget(data.toForget()).rackId(this.clientRackId);
if (this.log.isDebugEnabled()) {
this.log.debug("Sending {} {} to broker {}", new Object[]{this.isolationLevel, data.toString(), fetchTarget});
}
// fetchTarget is the node hosting the partitions to fetch from
this.client.send(fetchTarget, request).addListener(new RequestFutureListener<ClientResponse>() {...});
this.nodesWithPendingFetchRequests.add(((Node)entry.getKey()).id());
}
return fetchRequestMap.size();
}
2.3 The GCR request
boolean updateAssignmentMetadataIfNeeded(final Timer timer) {
// The consumer coordinator
if (coordinator != null && !coordinator.poll(timer)) {
return false;
}
return updateFetchPositions(timer);
}
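As the analysis above shows, the consumer ultimately fetches data from the fetchTarget node via this.client.send(fetchTarget, request). The consumer therefore already holds the consumer-to-partition mapping on the client side: each fetch only needs to read the partitions from the subscriptions field and fetch from them. A rebalance is the process of changing that assignment, and the newly assigned partitions are saved into the consumer's subscriptions field.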
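3. The GroupCoordinatorRequest process
It consists of the following steps:
- coordinatorUnknown(): checks whether the GroupCoordinator needs to be looked up.
- lookupCoordinator(): selects the node with the fewest requests, that is, the node with the fewest InFlightRequests.
- sendFindCoordinatorRequest(): sends a find-coordinator request to that least-loaded node, placing the GCR in the unsent queue.
- ConsumerNetworkClient.poll(): actually sends the GCR and runs its callbacks; on success the GroupCoordinator node is obtained.
- If the request fails with a retriable exception, metadata is refreshed and the lookup is retried; if the connection to the coordinator is down, the consumer sleeps for a backoff interval and retries.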
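The "fewest in-flight requests" rule from the steps above can be sketched as follows. This is a deliberately simplified model, not the logic of Kafka's network client (which also takes connection state into account); the class name, method name, and the map of counts are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model, not Kafka's NetworkClient: picks the node with the
// fewest in-flight requests. All names here are hypothetical.
class LeastLoadedNodeSketch {
    // Returns the id of the node with the fewest in-flight requests,
    // or null when no node is known. Ties go to the first node seen.
    static String leastLoadedNode(Map<String, Integer> inFlightByNode) {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : inFlightByNode.entrySet()) {
            if (entry.getValue() < bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> inFlight = new LinkedHashMap<>();
        inFlight.put("broker-0", 3);
        inFlight.put("broker-1", 0);
        inFlight.put("broker-2", 1);
        System.out.println(leastLoadedNode(inFlight)); // prints "broker-1"
    }
}
```

Returning null for an empty map mirrors the "no broker available" branch in lookupCoordinator().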
Taking the poll method of the ConsumerCoordinator class as the starting point, the following analyzes how a GCR obtains the GroupCoordinator:
public boolean poll(Timer timer) {
this.maybeUpdateSubscriptionMetadata();
this.invokeCompletedOffsetCommitCallbacks();
if (this.subscriptions.partitionsAutoAssigned()) {
this.pollHeartbeat(timer.currentTimeMs());
// Look up the GroupCoordinator
if (this.coordinatorUnknown() && !this.ensureCoordinatorReady(timer)) {
return false;
}
if (this.rejoinNeededOrPending()) {
if (this.subscriptions.hasPatternSubscription()) {
if (this.metadata.timeToAllowUpdate(timer.currentTimeMs()) == 0L) {
this.metadata.requestUpdate();
}
if (!this.client.ensureFreshMetadata(timer)) {
return false;
}
this.maybeUpdateSubscriptionMetadata();
}
if (!this.ensureActiveGroup(timer)) {
return false;
}
}
} else if (this.metadata.updateRequested() && !this.client.hasReadyNodes(timer.currentTimeMs())) {
this.client.awaitMetadataUpdate(timer);
}
this.maybeAutoCommitOffsetsAsync(timer.currentTimeMs());
return true;
}
coordinatorUnknown() in the AbstractCoordinator class
Checks whether the GroupCoordinator needs to be looked up:
public boolean coordinatorUnknown() {
// Check whether the coordinator is null
return this.checkAndGetCoordinator() == null;
}
protected synchronized Node checkAndGetCoordinator() {
// Check whether the network connection to the coordinator is usable
if (this.coordinator != null && this.client.isUnavailable(this.coordinator)) {
this.markCoordinatorUnknown(true);
return null;
} else {
return this.coordinator;
}
}
ensureCoordinatorReady() in the AbstractCoordinator class
This method obtains the GroupCoordinator:
protected synchronized boolean ensureCoordinatorReady(Timer timer) {
// The coordinator is already known, so there is nothing to do
if (!this.coordinatorUnknown()) {
return true;
} else {
do {
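// Pick a broker and add the GCR to the unsent queue
RequestFuture<Void> future = this.lookupCoordinator();
// Send everything in the unsent queue and run the listener callbacks for the broker's responses
this.client.poll(future, timer);
// The request did not complete before the timer expired, so leave the loop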
if (!future.isDone()) {
break;
}
// Failed: rethrow if the error is not retriable, otherwise refresh metadata and retry
if (future.failed()) {
if (!future.isRetriable()) {
throw future.exception();
}
this.log.debug("Coordinator discovery failed, refreshing metadata");
this.client.awaitMetadataUpdate(timer);
// The GroupCoordinator was found but the connection to its node is down; sleep for a while, then retry
} else if (this.coordinator != null && this.client.isUnavailable(this.coordinator)) {
this.markCoordinatorUnknown();
timer.sleep(this.retryBackoffMs);
}
} while(this.coordinatorUnknown() && timer.notExpired());
return !this.coordinatorUnknown();
}
}
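The retry loop in ensureCoordinatorReady() boils down to "try, back off, try again until the timer runs out". Below is a minimal sketch of that pattern under simplifying assumptions: the lookup is a plain boolean callback, every failure is treated as retriable, and all names are hypothetical rather than taken from Kafka.

```java
import java.util.function.BooleanSupplier;

// Simplified model, not Kafka's AbstractCoordinator: retries a lookup with a
// fixed backoff until it succeeds or the overall timeout elapses.
class CoordinatorRetrySketch {
    static boolean ensureReady(BooleanSupplier lookup, long timeoutMs, long backoffMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (lookup.getAsBoolean()) {
                return true; // coordinator found
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // timer expired before the coordinator was found
            }
            try {
                // Back off, but never sleep past the deadline.
                Thread.sleep(Math.min(backoffMs, remaining));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Hypothetical lookup that succeeds on the third attempt.
        boolean found = ensureReady(() -> ++attempts[0] >= 3, 1000, 10);
        System.out.println("found=" + found + " after " + attempts[0] + " attempts");
    }
}
```

As in the original method, the caller gets a boolean back and decides what to do when the coordinator is still unknown after the timeout.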
lookupCoordinator() in the AbstractCoordinator class
Picks the least-loaded node in the cluster and sends the GCR to it; the sendFindCoordinatorRequest() method sends the request:
protected synchronized RequestFuture<Void> lookupCoordinator() {
if (this.findCoordinatorFuture == null) {
// Pick the least-loaded node
Node node = this.client.leastLoadedNode();
if (node == null) {
this.log.debug("No broker available to send FindCoordinator request");
return RequestFuture.noBrokersAvailable();
}
// Send the GroupCoordinatorRequest
this.findCoordinatorFuture = this.sendFindCoordinatorRequest(node);
}
return this.findCoordinatorFuture;
}
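It calls the send method of the ConsumerNetworkClient class. This send() does not transmit the request directly; it places the request in the unsent queue and returns a RequestFuture, which is used for asynchronous callbacks. The result is handled asynchronously by attaching the FindCoordinatorResponseHandler listener: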
private RequestFuture<Void> sendFindCoordinatorRequest(Node node) {
this.log.debug("Sending FindCoordinator request to broker {}", node);
org.apache.kafka.common.requests.FindCoordinatorRequest.Builder requestBuilder = new org.apache.kafka.common.requests.FindCoordinatorRequest.Builder((new FindCoordinatorRequestData()).setKeyType(CoordinatorType.GROUP.id()).setKey(this.groupId));
// Call send and attach a listener to handle the response
return this.client.send(node, requestBuilder).compose(new AbstractCoordinator.FindCoordinatorResponseHandler());
}
public RequestFuture<ClientResponse> send(Node node, Builder<?> requestBuilder) {
return this.send(node, requestBuilder, this.requestTimeoutMs);
}
public RequestFuture<ClientResponse> send(Node node, Builder<?> requestBuilder, int requestTimeoutMs) {
long now = this.time.milliseconds();
ConsumerNetworkClient.RequestFutureCompletionHandler completionHandler = new ConsumerNetworkClient.RequestFutureCompletionHandler();
ClientRequest clientRequest = this.client.newClientRequest(node.idString(), requestBuilder, now, true, requestTimeoutMs, completionHandler);
// Put the request in the unsent queue; it is not transmitted here
this.unsent.put(node, clientRequest);
this.client.wakeup();
// Return the RequestFuture; listeners can be attached to it to handle the result
return completionHandler.future;
}
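The "queue now, transmit on poll" pattern used by send() can be illustrated with plain JDK types. The sketch below is an assumption-laden model, not ConsumerNetworkClient itself: it uses CompletableFuture in place of RequestFuture, fabricates the response string, and all class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Simplified model of "queue now, send on poll"; not Kafka's
// ConsumerNetworkClient. All names here are hypothetical.
class UnsentQueueSketch {
    private static final class Pending {
        final String request;
        final CompletableFuture<String> future = new CompletableFuture<>();

        Pending(String request) {
            this.request = request;
        }
    }

    private final Queue<Pending> unsent = new ArrayDeque<>();

    // Only enqueues the request; nothing goes on the wire yet.
    CompletableFuture<String> send(String request) {
        Pending pending = new Pending(request);
        unsent.add(pending);
        return pending.future;
    }

    // Drains the queue, "sends" each request, and completes its future,
    // which runs any callbacks registered on it. Returns the number sent.
    int poll() {
        int sent = 0;
        Pending pending;
        while ((pending = unsent.poll()) != null) {
            pending.future.complete("response to " + pending.request);
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        UnsentQueueSketch client = new UnsentQueueSketch();
        CompletableFuture<String> future = client.send("FindCoordinator");
        future.thenAccept(r -> System.out.println("callback got: " + r));
        System.out.println("done before poll? " + future.isDone());
        client.poll();
        System.out.println("done after poll? " + future.isDone());
    }
}
```

The future stays incomplete until poll() runs, which is why the coordinator code always follows a send() with a call to client.poll(future, timer).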
The FindCoordinatorResponseHandler listener attached to the RequestFuture handles the response asynchronously:
private class FindCoordinatorResponseHandler extends RequestFutureAdapter<ClientResponse, Void> {
private FindCoordinatorResponseHandler() {
}
// Success callback
public void onSuccess(ClientResponse resp, RequestFuture<Void> future) {
AbstractCoordinator.this.log.debug("Received FindCoordinator response {}", resp);
AbstractCoordinator.this.clearFindCoordinatorFuture();
FindCoordinatorResponse findCoordinatorResponse = (FindCoordinatorResponse)resp.responseBody();
Errors error = findCoordinatorResponse.error();
if (error == Errors.NONE) {
synchronized(AbstractCoordinator.this) {
int coordinatorConnectionId = 2147483647 - findCoordinatorResponse.data().nodeId();
// Record the node that hosts the GroupCoordinator
AbstractCoordinator.this.coordinator = new Node(coordinatorConnectionId, findCoordinatorResponse.data().host(), findCoordinatorResponse.data().port());
AbstractCoordinator.this.log.info("Discovered group coordinator {}", AbstractCoordinator.this.coordinator);
// Try to open a connection to the GroupCoordinator
AbstractCoordinator.this.client.tryConnect(AbstractCoordinator.this.coordinator);
// Reset the heartbeat session timeout
AbstractCoordinator.this.heartbeat.resetSessionTimeout();
}
future.complete((Object)null);
} else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
future.raise(new GroupAuthorizationException(AbstractCoordinator.this.groupId));
} else {
AbstractCoordinator.this.log.debug("Group coordinator lookup failed: {}", findCoordinatorResponse.data().errorMessage());
future.raise(error);
}
}
public void onFailure(RuntimeException e, RequestFuture<Void> future) {
AbstractCoordinator.this.clearFindCoordinatorFuture();
super.onFailure(e, future);
}
}
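One detail in the handler is the connection id 2147483647 - nodeId, which is Integer.MAX_VALUE - nodeId in the original source before decompilation. The article does not explain it; a reasonable reading, stated here as an assumption, is that it gives the coordinator an id distinct from the broker's ordinary node id, so coordinator traffic can use its own connection. The sketch below only demonstrates the arithmetic, with hypothetical names.

```java
// Illustrates the id arithmetic only; class and method names are hypothetical.
class CoordinatorConnectionIdSketch {
    // Derives the connection id used for coordinator traffic to a broker.
    static int coordinatorConnectionId(int brokerNodeId) {
        return Integer.MAX_VALUE - brokerNodeId;
    }

    // The mapping is its own inverse, so the broker id can be recovered.
    static int brokerNodeId(int coordinatorConnectionId) {
        return Integer.MAX_VALUE - coordinatorConnectionId;
    }

    public static void main(String[] args) {
        int connectionId = coordinatorConnectionId(2);
        System.out.println(connectionId);               // prints 2147483645
        System.out.println(brokerNodeId(connectionId)); // prints 2
    }
}
```

Because real broker ids are small non-negative integers, the derived id will not collide with any of them in practice.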
Back in AbstractCoordinator.ensureCoordinatorReady(): once the GCR is in the unsent queue, this.client.poll(future, timer) is called to send it and run the callbacks. The poll method of ConsumerNetworkClient is as follows:
public boolean poll(RequestFuture<?> future, Timer timer) {
do {
// Blocking poll with a timeout, until the timer expires or the current GCR completes
this.poll((Timer)timer, (ConsumerNetworkClient.PollCondition)future);
} while(!future.isDone() && timer.notExpired());
return future.isDone();
}
public void poll(Timer timer, ConsumerNetworkClient.PollCondition pollCondition) {
this.poll(timer, pollCondition, false);
}
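Reading on, this poll() performs the following steps: it completes all requests awaiting completion, handles requests to nodes that were disconnected asynchronously, calls send() to stage every request in the unsent queue, calls poll() to transmit them, handles requests to nodes whose connection failed, and removes expired requests from the unsent queue.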
public void poll(Timer timer, ConsumerNetworkClient.PollCondition pollCondition, boolean disableWakeup) {
// Complete the requests in the pendingCompletion queue and run their callbacks (pendingCompletion.fireCompletion())
this.firePendingCompletedRequests();
this.lock.lock();
try {
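// Handle asynchronous disconnects: take the nodes in pendingDisconnects, remove all of their requests from the unsent queue, and run those requests' callbacks
this.handlePendingDisconnects();
// Pass every request in the unsent queue to NetworkClient.send(), which stores it in the KafkaChannel's send field to await transmission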
long pollDelayMs = this.trySend(timer.currentTimeMs());
if (this.pendingCompletion.isEmpty() && (pollCondition == null || pollCondition.shouldBlock())) {
long pollTimeout = Math.min(timer.remainingMs(), pollDelayMs);
if (this.client.inFlightRequestCount() == 0) {
pollTimeout = Math.min(pollTimeout, this.retryBackoffMs);
}
// Blocking poll (up to pollTimeout): transmits the requests staged on the KafkaChannel
this.client.poll(pollTimeout, timer.currentTimeMs());
} else {
// Non-blocking poll: transmits the requests staged on the KafkaChannel without waiting
this.client.poll(0L, timer.currentTimeMs());
}
// Update the timer
timer.update();
// Handle nodes whose connection failed: remove all of their requests from the unsent queue and run those requests' callbacks
this.checkDisconnects(timer.currentTimeMs());
if (!disableWakeup) {
this.maybeTriggerWakeup();
}
this.maybeThrowInterruptException();
// After handling failed nodes, try again to stage requests for sending
this.trySend(timer.currentTimeMs());
// Remove expired requests from the unsent queue and run their callbacks
this.failExpiredRequests(timer.currentTimeMs());
this.unsent.clean();
} finally {
this.lock.unlock();
}
// Fire the pendingCompletion callbacks once more
this.firePendingCompletedRequests();
this.metadata.maybeThrowException();
}
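4. Analysis of JoinGroupRequest and SyncGroupRequest
The steps are:
- ensureCoordinatorReady(): look up the GroupCoordinator.
- rejoinNeededOrPending(): check whether the consumer needs to rejoin the group; initiateJoinGroup() then initializes the join, pauses the heartbeat, sets the REBALANCING state, and so on.
- sendJoinGroupRequest(): send the JGR. The GroupCoordinator designates the leader consumer and decides the assignment strategy; the leader performs the partition assignment.
- sendSyncGroupRequest(): send the SGR. All consumers synchronize the assignment result with the GroupCoordinator, then the heartbeat is resumed and the state is set to STABLE.
- onJoinComplete(): save the assigned partitions.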
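The state changes in these steps can be summarized as a small state machine. The sketch below is a simplified model built from the state names that appear in the code in this section (UNJOINED, REBALANCING, STABLE); the class and its methods are hypothetical, and the heartbeat is reduced to a boolean flag.

```java
// Simplified model of the member states named in this section; not Kafka's
// code. The class and its methods are hypothetical.
class MemberStateSketch {
    enum MemberState { UNJOINED, REBALANCING, STABLE }

    private MemberState state = MemberState.UNJOINED;
    private boolean heartbeatEnabled = false;

    // Mirrors initiateJoinGroup(): pause heartbeats, then mark the member
    // as rebalancing before the JGR is sent.
    void initiateJoin() {
        heartbeatEnabled = false;
        state = MemberState.REBALANCING;
    }

    // Mirrors the success callback: the SGR completed, so the member is
    // stable and heartbeats resume.
    void onSyncSuccess() {
        state = MemberState.STABLE;
        heartbeatEnabled = true;
    }

    // Mirrors the failure callback: the member falls back to UNJOINED.
    void onJoinFailure() {
        state = MemberState.UNJOINED;
    }

    MemberState state() {
        return state;
    }

    boolean heartbeatEnabled() {
        return heartbeatEnabled;
    }

    public static void main(String[] args) {
        MemberStateSketch member = new MemberStateSketch();
        member.initiateJoin();
        System.out.println(member.state() + " heartbeat=" + member.heartbeatEnabled());
        member.onSyncSuccess();
        System.out.println(member.state() + " heartbeat=" + member.heartbeatEnabled());
    }
}
```

Keeping the heartbeat paused while REBALANCING avoids sending heartbeats that carry a generation the group is about to replace.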
Again starting from ConsumerCoordinator.poll(), this.ensureActiveGroup(timer) issues the JGR and the SGR:
boolean ensureActiveGroup(Timer timer) {
// Look up the GroupCoordinator again
if (!this.ensureCoordinatorReady(timer)) {
return false;
} else {
// Start the heartbeat thread if it is not already running
this.startHeartbeatThreadIfNeeded();
// Join the consumer group
return this.joinGroupIfNeeded(timer);
}
}
boolean joinGroupIfNeeded(Timer timer) {
// Loop while a (re)join is needed
while(this.rejoinNeededOrPending()) {
// Look up the GroupCoordinator
if (!this.ensureCoordinatorReady(timer)) {
return false;
}
if (this.needsJoinPrepare) {
this.onJoinPrepare(this.generation.generationId, this.generation.memberId);
this.needsJoinPrepare = false;
}
// Send the JGR and the SGR
RequestFuture<ByteBuffer> future = this.initiateJoinGroup();
// Block until the JGR and SGR complete
this.client.poll(future, timer);
if (!future.isDone()) {
return false;
}
// Success
if (future.succeeded()) {
// Get the partition assignment result
ByteBuffer memberAssignment = ((ByteBuffer)future.value()).duplicate();
// Apply the partition assignment
this.onJoinComplete(this.generation.generationId, this.generation.memberId, this.generation.protocol, memberAssignment);
this.resetJoinGroupFuture();
this.needsJoinPrepare = true;
} else {
// Failure: rejoin the group
this.resetJoinGroupFuture();
RuntimeException exception = future.exception();
if (!(exception instanceof UnknownMemberIdException) && !(exception instanceof RebalanceInProgressException) && !(exception instanceof IllegalGenerationException) && !(exception instanceof MemberIdRequiredException)) {
if (!future.isRetriable()) {
throw exception;
}
timer.sleep(this.retryBackoffMs);
}
}
}
return true;
}
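AbstractCoordinator.initiateJoinGroup() initializes the join: it pauses the heartbeat, sends the JGR, and attaches a listener that fires once the SGR completes. When the SGR succeeds, the listener re-enables the heartbeat thread and updates the state.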
private synchronized RequestFuture<ByteBuffer> initiateJoinGroup() {
if (this.joinFuture == null) {
// Pause the heartbeat thread
this.disableHeartbeatThread();
this.state = AbstractCoordinator.MemberState.REBALANCING;
// Send the join-group request
this.joinFuture = this.sendJoinGroupRequest();
// Callback listener that fires when the SGR completes
this.joinFuture.addListener(new RequestFutureListener<ByteBuffer>() {
// Success callback
public void onSuccess(ByteBuffer value) {
synchronized(AbstractCoordinator.this) {
AbstractCoordinator.this.log.info("Successfully joined group with generation {}", AbstractCoordinator.this.generation.generationId);
AbstractCoordinator.this.state = AbstractCoordinator.MemberState.STABLE;
// No rejoin is needed
AbstractCoordinator.this.rejoinNeeded = false;
if (AbstractCoordinator.this.heartbeatThread != null) {
// Re-enable the heartbeat thread
AbstractCoordinator.this.heartbeatThread.enable();
}
}
}
// Failure callback
public void onFailure(RuntimeException e) {
synchronized(AbstractCoordinator.this) {
AbstractCoordinator.this.state = AbstractCoordinator.MemberState.UNJOINED;
}
}
});
}
return this.joinFuture;
}
Next, look at sendJoinGroupRequest(). The call this.client.send(this.coordinator, requestBuilder, joinGroupTimeoutMs) targets the broker node this.coordinator and attaches the JGR callback listener new AbstractCoordinator.JoinGroupResponseHandler():
RequestFuture<ByteBuffer> sendJoinGroupRequest() {
if (this.coordinatorUnknown()) {
return RequestFuture.coordinatorNotAvailable();
} else {
this.log.info("(Re-)joining group");
Builder requestBuilder = new Builder((new JoinGroupRequestData()).setGroupId(this.groupId).setSessionTimeoutMs(this.sessionTimeoutMs).setMemberId(this.generation.memberId).setGroupInstanceId((String)this.groupInstanceId.orElse((Object)null)).setProtocolType(this.protocolType()).setProtocols(this.metadata()).setRebalanceTimeoutMs(this.rebalanceTimeoutMs));
this.log.debug("Sending JoinGroup ({}) to coordinator {}", requestBuilder, this.coordinator);
int joinGroupTimeoutMs = Math.max(this.rebalanceTimeoutMs, this.rebalanceTimeoutMs + 5000);
return this.client.send(this.coordinator, requestBuilder, joinGroupTimeoutMs).compose(new AbstractCoordinator.JoinGroupResponseHandler());
}
}
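JoinGroupResponseHandler is the listener class. The GroupCoordinator returns the partition assignment strategy and also decides which consumer is the leader; the other consumers act as followers. The corresponding code is AbstractCoordinator.this.onJoinLeader(joinResponse).chain(future) and AbstractCoordinator.this.onJoinFollower().chain(future). Both paths finish by sending an SGR to synchronize the assignment result. The difference is that the leader assigns partitions according to the strategy returned by the GroupCoordinator and carries the result in its SGR, while a follower sends an empty assignment.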
private class JoinGroupResponseHandler extends AbstractCoordinator.CoordinatorResponseHandler<JoinGroupResponse, ByteBuffer> {
private JoinGroupResponseHandler() {
super();
}
public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> future) {
Errors error = joinResponse.error();
if (error == Errors.NONE) {
AbstractCoordinator.this.log.debug("Received successful JoinGroup response: {}", joinResponse);
AbstractCoordinator.this.sensors.joinLatency.record((double)this.response.requestLatencyMs());
synchronized(AbstractCoordinator.this) {
if (AbstractCoordinator.this.state != AbstractCoordinator.MemberState.REBALANCING) {
future.raise(new AbstractCoordinator.UnjoinedGroupException());
} else {
// The GroupCoordinator has chosen the assignment strategy and the leader consumer
AbstractCoordinator.this.generation = new AbstractCoordinator.Generation(joinResponse.data().generationId(), joinResponse.data().memberId(), joinResponse.data().protocolName());
if (joinResponse.isLeader()) {
// Assign partitions, send the SGR, and attach the SGR callback listener
AbstractCoordinator.this.onJoinLeader(joinResponse).chain(future);
} else {
// Send the SGR and attach the SGR callback listener
AbstractCoordinator.this.onJoinFollower().chain(future);
}
}
}
} else if (error == Errors.COORDINATOR_LOAD_IN_PROGRESS) {
AbstractCoordinator.this.log.debug("Attempt to join group rejected since coordinator {} is loading the group.", AbstractCoordinator.this.coordinator());
future.raise(error);
} else if (error == Errors.UNKNOWN_MEMBER_ID) {
AbstractCoordinator.this.resetGeneration();
AbstractCoordinator.this.log.debug("Attempt to join group failed due to unknown member id.");
future.raise(Errors.UNKNOWN_MEMBER_ID);
} else if (error != Errors.COORDINATOR_NOT_AVAILABLE && error != Errors.NOT_COORDINATOR) {
if (error == Errors.FENCED_INSTANCE_ID) {
AbstractCoordinator.this.log.error("Received fatal exception: group.instance.id gets fenced");
future.raise(error);
} else if (error != Errors.INCONSISTENT_GROUP_PROTOCOL && error != Errors.INVALID_SESSION_TIMEOUT && error != Errors.INVALID_GROUP_ID && error != Errors.GROUP_AUTHORIZATION_FAILED && error != Errors.GROUP_MAX_SIZE_REACHED) {
if (error == Errors.UNSUPPORTED_VERSION) {
AbstractCoordinator.this.log.error("Attempt to join group failed due to unsupported version error. Please unset field group.instance.id and retryto see if the problem resolves");
future.raise(error);
} else if (error == Errors.MEMBER_ID_REQUIRED) {
synchronized(AbstractCoordinator.this) {
AbstractCoordinator.this.generation = new AbstractCoordinator.Generation(-1, joinResponse.data().memberId(), (String)null);
AbstractCoordinator.this.rejoinNeeded = true;
AbstractCoordinator.this.state = AbstractCoordinator.MemberState.UNJOINED;
}
future.raise(Errors.MEMBER_ID_REQUIRED);
} else {
AbstractCoordinator.this.log.error("Attempt to join group failed due to unexpected error: {}", error.message());
future.raise(new KafkaException("Unexpected error in join group response: " + error.message()));
}
} else {
AbstractCoordinator.this.log.error("Attempt to join group failed due to fatal error: {}", error.message());
if (error == Errors.GROUP_MAX_SIZE_REACHED) {
future.raise(new GroupMaxSizeReachedException(AbstractCoordinator.this.groupId));
} else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
future.raise(new GroupAuthorizationException(AbstractCoordinator.this.groupId));
} else {
future.raise(error);
}
}
} else {
AbstractCoordinator.this.markCoordinatorUnknown();
AbstractCoordinator.this.log.debug("Attempt to join group failed due to obsolete coordinator information: {}", error.message());
future.raise(error);
}
}
}
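If the current consumer is the leader, it assigns partitions according to the assignment strategy and then sends an SGR carrying the result. The candidate strategies are set with the consumer (client-side) configuration parameter partition.assignment.strategy, not in the broker's server.properties; in the client version analyzed here the default is range (RangeAssignor).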
private RequestFuture<ByteBuffer> onJoinLeader(JoinGroupResponse joinResponse) {
try {
// The leader assigns partitions using the strategy selected by the GroupCoordinator
Map<String, ByteBuffer> groupAssignment = this.performAssignment(joinResponse.data().leader(), joinResponse.data().protocolName(), joinResponse.data().members());
List<SyncGroupRequestAssignment> groupAssignmentList = new ArrayList();
Iterator var4 = groupAssignment.entrySet().iterator();
while(var4.hasNext()) {
Entry<String, ByteBuffer> assignment = (Entry)var4.next();
groupAssignmentList.add((new SyncGroupRequestAssignment()).setMemberId((String)assignment.getKey()).setAssignment(Utils.toArray((ByteBuffer)assignment.getValue())));
}
// Send the SGR, synchronizing the assignment result to the GroupCoordinator
org.apache.kafka.common.requests.SyncGroupRequest.Builder requestBuilder = new org.apache.kafka.common.requests.SyncGroupRequest.Builder((new SyncGroupRequestData()).setGroupId(this.groupId).setMemberId(this.generation.memberId).setGroupInstanceId((String)this.groupInstanceId.orElse((Object)null)).setGenerationId(this.generation.generationId).setAssignments(groupAssignmentList));
this.log.debug("Sending leader SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder);
return this.sendSyncGroupRequest(requestBuilder);
} catch (RuntimeException var6) {
return RequestFuture.failure(var6);
}
}
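To give a feel for what the leader computes in performAssignment() under the range strategy, here is a minimal sketch for a single topic. It is an illustration rather than the RangeAssignor source: members are sorted, each receives a contiguous block of partitions, and the first few members take one extra partition when the count does not divide evenly. The class name, method name, and member ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Minimal single-topic sketch of range-style assignment; an illustration,
// not Kafka's RangeAssignor. All names here are hypothetical.
class RangeAssignmentSketch {
    static Map<String, List<Integer>> assign(int numPartitions, List<String> memberIds) {
        // Sort (and de-duplicate) member ids so every run gives the same result.
        List<String> sorted = new ArrayList<>(new TreeSet<>(memberIds));
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (sorted.isEmpty()) {
            return result;
        }
        int perMember = numPartitions / sorted.size();
        int membersWithExtra = numPartitions % sorted.size();
        for (int i = 0; i < sorted.size(); i++) {
            // Members before index membersWithExtra each hold one extra partition.
            int start = perMember * i + Math.min(i, membersWithExtra);
            int length = perMember + (i < membersWithExtra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int p = start; p < start + length; p++) {
                partitions.add(p);
            }
            result.put(sorted.get(i), partitions);
        }
        return result;
    }

    public static void main(String[] args) {
        // 7 partitions over 3 members: the first member gets the extra one.
        System.out.println(assign(7, Arrays.asList("c2", "c0", "c1")));
        // prints {c0=[0, 1, 2], c1=[3, 4], c2=[5, 6]}
    }
}
```

With more members than partitions, the trailing members simply receive an empty list, which is why adding consumers beyond the partition count does not increase parallelism.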
If the current consumer is a follower, it sends the SGR directly:
private RequestFuture<ByteBuffer> onJoinFollower() {
org.apache.kafka.common.requests.SyncGroupRequest.Builder requestBuilder = new org.apache.kafka.common.requests.SyncGroupRequest.Builder((new SyncGroupRequestData()).setGroupId(this.groupId).setMemberId(this.generation.memberId).setGroupInstanceId((String)this.groupInstanceId.orElse((Object)null)).setGenerationId(this.generation.generationId).setAssignments(Collections.emptyList()));
this.log.debug("Sending follower SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder);
return this.sendSyncGroupRequest(requestBuilder);
}
private RequestFuture<ByteBuffer> sendSyncGroupRequest(org.apache.kafka.common.requests.SyncGroupRequest.Builder requestBuilder) {
return this.coordinatorUnknown() ? RequestFuture.coordinatorNotAvailable() : this.client.send(this.coordinator, requestBuilder).compose(new AbstractCoordinator.SyncGroupResponseHandler());
}
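The leader and follower paths differ only in what they put into the SGR. The sketch below models the coordinator side of that exchange in a heavily simplified way: it assumes the leader's SGR arrives first, whereas a real coordinator would hold follower responses until it does. The class, method, and member ids are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Heavily simplified model of the SyncGroup exchange; not Kafka's
// GroupCoordinator. Assumes the leader's request arrives first.
class SyncGroupSketch {
    private final Map<String, List<Integer>> groupAssignment = new HashMap<>();

    // The leader passes the full assignment; followers pass an empty map.
    // Either way, the caller receives only its own share.
    List<Integer> syncGroup(String memberId, Map<String, List<Integer>> assignments) {
        if (!assignments.isEmpty()) {
            groupAssignment.clear();
            groupAssignment.putAll(assignments);
        }
        return groupAssignment.getOrDefault(memberId, Collections.emptyList());
    }

    public static void main(String[] args) {
        SyncGroupSketch coordinator = new SyncGroupSketch();

        Map<String, List<Integer>> fromLeader = new HashMap<>();
        fromLeader.put("c0", Arrays.asList(0, 1));
        fromLeader.put("c1", Arrays.asList(2));

        // Leader (c0) sends the whole assignment and gets its own partitions.
        System.out.println(coordinator.syncGroup("c0", fromLeader)); // prints [0, 1]
        // Follower (c1) sends nothing and still receives its partitions.
        System.out.println(coordinator.syncGroup("c1", Collections.emptyMap())); // prints [2]
    }
}
```

This is why the assignment logic can live entirely in the client: the coordinator only stores and relays the leader's result.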
The SGR listener: on an error it sets the rejoin flag via AbstractCoordinator.this.requestRejoin(); on success it notifies the other listeners, and ConsumerCoordinator.onJoinComplete() finally applies the assignment result.
private class SyncGroupResponseHandler extends AbstractCoordinator.CoordinatorResponseHandler<SyncGroupResponse, ByteBuffer> {
private SyncGroupResponseHandler() {
super();
}
public void handle(SyncGroupResponse syncResponse, RequestFuture<ByteBuffer> future) {
Errors error = syncResponse.error();
if (error == Errors.NONE) {
AbstractCoordinator.this.sensors.syncLatency.record((double)this.response.requestLatencyMs());
// Complete the future: the assignment result is stored as the future's value (set with a compare-and-set), which triggers the callbacks
future.complete(ByteBuffer.wrap(syncResponse.data.assignment()));
} else {
AbstractCoordinator.this.requestRejoin();
if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
future.raise(new GroupAuthorizationException(AbstractCoordinator.this.groupId));
} else if (error == Errors.REBALANCE_IN_PROGRESS) {
AbstractCoordinator.this.log.debug("SyncGroup failed because the group began another rebalance");
future.raise(error);
} else if (error == Errors.FENCED_INSTANCE_ID) {
AbstractCoordinator.this.log.error("Received fatal exception: group.instance.id gets fenced");
future.raise(error);
} else if (error != Errors.UNKNOWN_MEMBER_ID && error != Errors.ILLEGAL_GENERATION) {
if (error != Errors.COORDINATOR_NOT_AVAILABLE && error != Errors.NOT_COORDINATOR) {
future.raise(new KafkaException("Unexpected error from SyncGroup: " + error.message()));
} else {
AbstractCoordinator.this.log.debug("SyncGroup failed: {}", error.message());
AbstractCoordinator.this.markCoordinatorUnknown();
future.raise(error);
}
} else {
AbstractCoordinator.this.log.debug("SyncGroup failed: {}", error.message());
AbstractCoordinator.this.resetGeneration();
future.raise(error);
}
}
}
}
future.complete(ByteBuffer.wrap(syncResponse.data.assignment())) in effect invokes this listener:
private synchronized RequestFuture<ByteBuffer> initiateJoinGroup() {
if (this.joinFuture == null) {
this.disableHeartbeatThread();
this.state = AbstractCoordinator.MemberState.REBALANCING;
this.joinFuture = this.sendJoinGroupRequest();
// Callback listener that fires when the SGR completes
this.joinFuture.addListener(new RequestFutureListener<ByteBuffer>() {
public void onSuccess(ByteBuffer value) {
synchronized(AbstractCoordinator.this) {
AbstractCoordinator.this.log.info("Successfully joined group with generation {}", AbstractCoordinator.this.generation.generationId);
AbstractCoordinator.this.state = AbstractCoordinator.MemberState.STABLE;
AbstractCoordinator.this.rejoinNeeded = false;
if (AbstractCoordinator.this.heartbeatThread != null) {
AbstractCoordinator.this.heartbeatThread.enable();
}
}
}
public void onFailure(RuntimeException e) {
synchronized(AbstractCoordinator.this) {
AbstractCoordinator.this.state = AbstractCoordinator.MemberState.UNJOINED;
}
}
});
}
return this.joinFuture;
}
Back in AbstractCoordinator.joinGroupIfNeeded(): once the assignment result has been received from the GroupCoordinator, it is handled by onJoinComplete():
boolean joinGroupIfNeeded(Timer timer) {
// Loop while a (re)join is needed
while(this.rejoinNeededOrPending()) {
// Look up the GroupCoordinator
if (!this.ensureCoordinatorReady(timer)) {
return false;
}
if (this.needsJoinPrepare) {
this.onJoinPrepare(this.generation.generationId, this.generation.memberId);
this.needsJoinPrepare = false;
}
// Send the JGR and the SGR
RequestFuture<ByteBuffer> future = this.initiateJoinGroup();
// Block until the JGR and SGR complete
this.client.poll(future, timer);
if (!future.isDone()) {
return false;
}
// Success
if (future.succeeded()) {
// Get the partition assignment result
ByteBuffer memberAssignment = ((ByteBuffer)future.value()).duplicate();
// Apply the partition assignment
this.onJoinComplete(this.generation.generationId, this.generation.memberId, this.generation.protocol, memberAssignment);
this.resetJoinGroupFuture();
this.needsJoinPrepare = true;
} else {
// Failure: rejoin the group
this.resetJoinGroupFuture();
RuntimeException exception = future.exception();
if (!(exception instanceof UnknownMemberIdException) && !(exception instanceof RebalanceInProgressException) && !(exception instanceof IllegalGenerationException) && !(exception instanceof MemberIdRequiredException)) {
if (!future.isRetriable()) {
throw exception;
}
timer.sleep(this.retryBackoffMs);
}
}
}
return true;
}
onJoinComplete() is implemented in the ConsumerCoordinator class and updates the partitions assigned by the leader:
protected void onJoinComplete(int generation, String memberId, String assignmentStrategy, ByteBuffer assignmentBuffer) {
if (!this.isLeader) {
this.assignmentSnapshot = null;
}
// The assignment strategy
PartitionAssignor assignor = this.lookupAssignor(assignmentStrategy);
if (assignor == null) {
throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);
} else {
// The assignment result
Assignment assignment = ConsumerProtocol.deserializeAssignment(assignmentBuffer);
// Update the assigned partitions
if (!this.subscriptions.assignFromSubscribed(assignment.partitions())) {
this.handleAssignmentMismatch(assignment);
} else {
Set<TopicPartition> assignedPartitions = this.subscriptions.assignedPartitions();
this.maybeUpdateJoinedSubscription(assignedPartitions);
assignor.onAssignment(assignment, generation);
if (this.autoCommitEnabled) {
this.nextAutoCommitTimer.updateAndReset((long)this.autoCommitIntervalMs);
}
ConsumerRebalanceListener listener = this.subscriptions.rebalanceListener();
this.log.info("Setting newly assigned partitions: {}", Utils.join(assignedPartitions, ", "));
try {
listener.onPartitionsAssigned(assignedPartitions);
} catch (InterruptException | WakeupException var10) {
throw var10;
} catch (Exception var11) {
this.log.error("User provided listener {} failed on partition assignment", listener.getClass().getName(), var11);
}
}
}
}
This completes the analysis of the rebalance process.