Understanding Kafka in Depth (Part 2): Rebalance Source Code Analysis

1. Introduction

As we know, Kafka has the concept of a consumer group (Consumer Group):

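  • Each consumer belongs to one consumer group, and a consumer group can contain multiple consumers
  • A message published to a topic is consumed by only one consumer in each consumer group that subscribes to the topic
  • Consumers in different consumer groups can consume messages from the same topic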

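But how does a consumer (Consumer) know which partition of a topic to consume? How is the mapping between partitions and consumers determined? And when consumers join or leave, or the number of partitions changes, how is that mapping redistributed? This article answers these questions by analyzing the consumer rebalance process.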

1.1 Key Concepts

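GroupCoordinator: the server-side coordinator, responsible for communicating with clients. Every broker starts a GroupCoordinator service. Which broker's GroupCoordinator serves a given consumer group is determined by hashing the group ID modulo the number of partitions of __consumer_offsets; the broker that leads the resulting partition acts as the group's coordinator.
ConsumerCoordinator: the client-side coordinator, responsible for communicating with the server.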
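The group-to-coordinator mapping described above can be illustrated with a small sketch. It is a simplified illustration rather than broker code: it assumes the group ID is hashed with String.hashCode() and taken modulo the partition count of __consumer_offsets (offsets.topic.num.partitions, 50 by default), and the class and method names are hypothetical.

```java
/**
 * Hypothetical sketch: which __consumer_offsets partition (and therefore which
 * broker's GroupCoordinator) is responsible for a consumer group.
 */
final class CoordinatorLookup {
    private CoordinatorLookup() {}

    /** Non-negative absolute value; Integer.MIN_VALUE maps to 0 to avoid overflow. */
    static int safeAbs(int n) {
        return n == Integer.MIN_VALUE ? 0 : Math.abs(n);
    }

    /** Index of the __consumer_offsets partition that stores this group's metadata. */
    static int partitionFor(String groupId, int offsetsTopicPartitionCount) {
        if (offsetsTopicPartitionCount <= 0) {
            throw new IllegalArgumentException("partition count must be positive");
        }
        return safeAbs(groupId.hashCode()) % offsetsTopicPartitionCount;
    }
}
```

For example, partitionFor("a", 50) returns 47 because "a".hashCode() is 97; the broker leading partition 47 of __consumer_offsets would then coordinate group "a".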

2. Overall Flow

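  • GroupCoordinatorRequest (GCR): find the GroupCoordinator. The consumer sends the request to the node with the fewest outstanding requests, waits for that node to return the GroupCoordinator, and then tries to connect to it
  • JoinGroupRequest (JGR): disable the heartbeat and send the JGR. The GroupCoordinator receives the requests, designates one consumer in the group as the leader, which is responsible for assigning partitions, and returns the chosen partition assignment strategy
  • SyncGroupRequest (SGR): send the SGR to synchronize the partition assignment; on success, the heartbeat is re-enabled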

2.1 Trigger Conditions

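  • A new consumer joins the Consumer Group
  • A consumer crashes or goes offline
  • A consumer voluntarily leaves the Consumer Group
  • The number of partitions of any topic subscribed to by the Consumer Group changes
  • A consumer calls unsubscribe to cancel its subscription to a topic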
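The trigger conditions above all come down to a change in group membership or in the subscribed topics' partitions. The sketch below expresses that as a comparison of two hypothetical snapshots (member IDs, and partition counts per subscribed topic). The snapshot model and all names are assumptions for illustration; in practice the broker and client detect these changes in different places, for example through missed heartbeats, JoinGroup and LeaveGroup requests, and metadata updates.

```java
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: decide whether a change between two group snapshots
 * matches one of the rebalance trigger conditions.
 */
final class RebalanceTriggers {
    private RebalanceTriggers() {}

    static boolean needsRebalance(Set<String> membersBefore, Set<String> membersAfter,
                                  Map<String, Integer> partitionsBefore,
                                  Map<String, Integer> partitionsAfter) {
        // A consumer joined, crashed, or left: the member set differs
        boolean membershipChanged = !membersBefore.equals(membersAfter);
        // A subscribed topic's partition count changed, or a topic was (un)subscribed
        boolean topicsChanged = !partitionsBefore.equals(partitionsAfter);
        return membershipChanged || topicsChanged;
    }
}
```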

2.2 The poll Method of KafkaConsumer

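Let's first look at how a consumer consumes messages. A consumer consumes through KafkaConsumer's poll() method, with its partitions obtained either through group management (subscribe()) or manually through assign(). Under group management, when the consumer starts, the leader assigns partitions to it and they are stored in the subscriptions field of KafkaConsumer. When fetching, the consumer reads the subscriptions field to get the assigned partitions and fetches messages from them. The fetch path is analyzed below: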

	private final SubscriptionState subscriptions; // Holds the partitions assigned by the leader; KafkaConsumer fetches from these partitions directly
	private KafkaConsumer(ConsumerConfig config, Deserializer<K> keyDeserializer, Deserializer<V> valueDeserializer) {
		...
		this.subscriptions = new SubscriptionState(logContext, offsetResetStrategy);
		// Pass subscriptions to the fetcher; when fetching, the fetcher reads its own subscriptions field to get the list of partitions to consume
		this.fetcher = new Fetcher(logContext, this.client, config.getInt("fetch.min.bytes"), config.getInt("fetch.max.bytes"), config.getInt("fetch.max.wait.ms"), config.getInt("max.partition.fetch.bytes"), config.getInt("max.poll.records"), config.getBoolean("check.crcs"), config.getString("client.rack"), this.keyDeserializer, this.valueDeserializer, this.metadata, this.subscriptions, this.metrics, metricsRegistry, this.time, this.retryBackoffMs, this.requestTimeoutMs, isolationLevel, apiVersions);
		...
    }

The poll method:

	private ConsumerRecords<K, V> poll(Timer timer, boolean includeMetadataInTimeout) {
        this.acquireAndEnsureOpen();
		
        try {
            // Check the subscription type: NONE, AUTO_TOPICS, AUTO_PATTERN or USER_ASSIGNED; NONE throws an exception
            if (this.subscriptions.hasNoSubscriptionOrUserAssignment()) {
                throw new IllegalStateException("Consumer is not subscribed to any topics or assigned any partitions");
            } else {
                ConsumerRecords var3;
                do {
                    this.client.maybeTriggerWakeup();
                    if (includeMetadataInTimeout) {
                        // Trigger a rebalance if needed; in effect this obtains the partitions assigned by the leader and stores them in the subscriptions field
                        if (!this.updateAssignmentMetadataIfNeeded(timer)) {
                            var3 = ConsumerRecords.empty();
                            return var3;
                        }
                    } else {
                        while(!this.updateAssignmentMetadataIfNeeded(this.time.timer(9223372036854775807L))) {
                            this.log.warn("Still waiting for metadata");
                        }
                    }
                    // Fetch messages
                    Map<TopicPartition, List<ConsumerRecord<K, V>>> records = this.pollForFetches(timer);
                    if (!records.isEmpty()) {
                        if (this.fetcher.sendFetches() > 0 || this.client.hasPendingRequests()) {
                            this.client.pollNoWakeup();
                        }

                        ConsumerRecords var4 = this.interceptors.onConsume(new ConsumerRecords(records));
                        return var4;
                    }
                } while(timer.notExpired());

                var3 = ConsumerRecords.empty();
                return var3;
            }
        } finally {
            this.release();
        }
    }

Next, look at the pollForFetches method:

	private Map<TopicPartition, List<ConsumerRecord<K, V>>> pollForFetches(Timer timer) {
        long pollTimeout = this.coordinator == null ? timer.remainingMs() : Math.min(this.coordinator.timeToNextPoll(timer.currentTimeMs()), timer.remainingMs());
        Map<TopicPartition, List<ConsumerRecord<K, V>>> records = this.fetcher.fetchedRecords();
        if (!records.isEmpty()) {
            return records;
        } else {
            // Send fetch requests
            this.fetcher.sendFetches();
            if (!this.cachedSubscriptionHashAllFetchPositions && pollTimeout > this.retryBackoffMs) {
                pollTimeout = this.retryBackoffMs;
            }

            Timer pollTimer = this.time.timer(pollTimeout);
            this.client.poll(pollTimer, () -> {
                return !this.fetcher.hasCompletedFetches();
            });
            timer.update(pollTimer.currentTimeMs());
            return this.coordinator != null && this.coordinator.rejoinNeededOrPending() ? Collections.emptyMap() : this.fetcher.fetchedRecords();
        }
    }
	public synchronized int sendFetches() {
        // Refresh the per-partition fetch metrics if the assignment has changed
        this.sensors.maybeUpdateAssignment(this.subscriptions);
        // Build the fetch requests from the assigned partitions, i.e. the subscriptions field
        Map<Node, FetchRequestData> fetchRequestMap = this.prepareFetchRequests();
        Iterator var2 = fetchRequestMap.entrySet().iterator();

        while(var2.hasNext()) {
            Entry<Node, FetchRequestData> entry = (Entry)var2.next();
            final Node fetchTarget = (Node)entry.getKey();
            final FetchRequestData data = (FetchRequestData)entry.getValue();
            Builder request = Builder.forConsumer(this.maxWaitMs, this.minBytes, data.toSend()).isolationLevel(this.isolationLevel).setMaxBytes(this.maxBytes).metadata(data.metadata()).toForget(data.toForget()).rackId(this.clientRackId);
            if (this.log.isDebugEnabled()) {
                this.log.debug("Sending {} {} to broker {}", new Object[]{this.isolationLevel, data.toString(), fetchTarget});
            }
            // fetchTarget is the broker node to fetch these partitions from
            this.client.send(fetchTarget, request).addListener(new RequestFutureListener<ClientResponse>() {...});
            this.nodesWithPendingFetchRequests.add(((Node)entry.getKey()).id());
        }

        return fetchRequestMap.size();
	}
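
2.3 The GCR Request

The rebalance starts in updateAssignmentMetadataIfNeeded(), which poll() calls before fetching:

	boolean updateAssignmentMetadataIfNeeded(final Timer timer) {
        // Consumer coordinator: look up the GroupCoordinator and run the rebalance if needed
        if (coordinator != null && !coordinator.poll(timer)) {
            return false;
        }
        return updateFetchPositions(timer);
    }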

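From the analysis above, this.client.send(fetchTarget, request) shows that the consumer ultimately fetches data from the fetchTarget node. The consumer therefore already keeps its partition assignment on the client side: each fetch only reads the partitions in the subscriptions field and fetches from them. A rebalance is the process of changing this assignment; the reassigned partitions are stored back into the consumer's subscriptions field.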
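The relationship described above, where the fetch path only reads the assignment and a rebalance only replaces it, can be sketched as follows. AssignmentHolder and SimpleFetcher are hypothetical, heavily simplified stand-ins for SubscriptionState and Fetcher, and partitions are plain strings such as "orders-0" rather than TopicPartition objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for SubscriptionState: only stores the assigned partitions. */
final class AssignmentHolder {
    private Set<String> assigned = Collections.emptySet();

    /** Called when a rebalance completes: replace the whole assignment. */
    synchronized void assignFromRebalance(List<String> partitions) {
        assigned = Collections.unmodifiableSet(new LinkedHashSet<>(partitions));
    }

    /** Called on every fetch: read the current assignment. */
    synchronized Set<String> assignedPartitions() {
        return assigned;
    }
}

/** Hypothetical stand-in for Fetcher: builds one fetch target per assigned partition. */
final class SimpleFetcher {
    private final AssignmentHolder subscriptions;

    SimpleFetcher(AssignmentHolder subscriptions) {
        // Share the same holder, as KafkaConsumer passes its subscriptions field to the Fetcher
        this.subscriptions = subscriptions;
    }

    List<String> prepareFetchTargets() {
        List<String> targets = new ArrayList<>();
        for (String partition : subscriptions.assignedPartitions()) {
            targets.add("fetch:" + partition);
        }
        return targets;
    }
}
```

Because both objects share one holder, a rebalance changes what the next fetch targets without the fetcher knowing that a rebalance happened.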

3. The GroupCoordinatorRequest Process

The process consists of the following steps:

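  1. coordinatorUnknown() checks whether the GroupCoordinator needs to be looked up.
  2. lookupCoordinator() selects the node with the fewest requests, i.e. the node with the fewest InFlightRequests.
  3. sendFindCoordinatorRequest() creates the request for that least-loaded node and puts the GCR request into the unsent queue.
  4. ConsumerNetworkClient.poll() sends the GCR request and runs its callback; on success, the GroupCoordinator node is obtained.
  5. If the request fails with a retriable exception, the metadata is refreshed and the lookup is retried (non-retriable exceptions are thrown); if the connection to the discovered coordinator is unavailable, the consumer backs off for retryBackoffMs and retries.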
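Step 2 picks the "least-loaded" node. A minimal sketch of that selection, assuming only in-flight request counts matter: the real NetworkClient.leastLoadedNode() also takes connection state into account, which this sketch ignores. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: pick the node with the fewest in-flight requests. */
final class LeastLoadedNodePicker {
    private LeastLoadedNodePicker() {}

    /**
     * @param nodes    candidate node IDs, in the order they are considered
     * @param inFlight in-flight request count per node ID (a missing entry means 0)
     * @return the least-loaded node, or null when there is no node at all
     */
    static Integer leastLoaded(List<Integer> nodes, Map<Integer, Integer> inFlight) {
        Integer best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Integer node : nodes) {
            int count = inFlight.getOrDefault(node, 0);
            if (count < bestCount) { // strict '<' keeps the earlier node on ties
                best = node;
                bestCount = count;
            }
        }
        return best;
    }
}
```

Returning null mirrors the "No broker available to send FindCoordinator request" branch of lookupCoordinator().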

Taking ConsumerCoordinator's poll method as the entry point, let's analyze how the GCR obtains the GroupCoordinator:

	public boolean poll(Timer timer) {
        this.maybeUpdateSubscriptionMetadata();
        this.invokeCompletedOffsetCommitCallbacks();
        if (this.subscriptions.partitionsAutoAssigned()) {
            this.pollHeartbeat(timer.currentTimeMs());
            // Look up the GroupCoordinator
            if (this.coordinatorUnknown() && !this.ensureCoordinatorReady(timer)) {
                return false;
            }

            if (this.rejoinNeededOrPending()) {
                if (this.subscriptions.hasPatternSubscription()) {
                    if (this.metadata.timeToAllowUpdate(timer.currentTimeMs()) == 0L) {
                        this.metadata.requestUpdate();
                    }

                    if (!this.client.ensureFreshMetadata(timer)) {
                        return false;
                    }

                    this.maybeUpdateSubscriptionMetadata();
                }

                if (!this.ensureActiveGroup(timer)) {
                    return false;
                }
            }
        } else if (this.metadata.updateRequested() && !this.client.hasReadyNodes(timer.currentTimeMs())) {
            this.client.awaitMetadataUpdate(timer);
        }

        this.maybeAutoCommitOffsetsAsync(timer.currentTimeMs());
        return true;
    }

AbstractCoordinator.coordinatorUnknown() checks whether the GroupCoordinator needs to be looked up:

	public boolean coordinatorUnknown() {
        // The coordinator is unknown if checkAndGetCoordinator() returns null
        return this.checkAndGetCoordinator() == null;
    }
	protected synchronized Node checkAndGetCoordinator() {
        // If the connection to the known coordinator is unavailable, mark the coordinator unknown
        if (this.coordinator != null && this.client.isUnavailable(this.coordinator)) {
            this.markCoordinatorUnknown(true);
            return null;
        } else {
            return this.coordinator;
        }
    }

AbstractCoordinator.ensureCoordinatorReady() obtains the GroupCoordinator:

	protected synchronized boolean ensureCoordinatorReady(Timer timer) {
        // Nothing to do if the coordinator is already known
        if (!this.coordinatorUnknown()) {
            return true;
        } else {
            do {
                // Find a broker to ask and add the GCR request to the unsent queue
                RequestFuture<Void> future = this.lookupCoordinator();
                // Send all requests in the unsent queue and run the listener callbacks for the broker's responses
                this.client.poll(future, timer);
                // The GCR request did not complete before the timer expired: exit the loop
                if (!future.isDone()) {
                    break;
                }
                // Failed: throw if the error is not retriable, otherwise refresh the metadata and retry
                if (future.failed()) {
                    if (!future.isRetriable()) {
                        throw future.exception();
                    }
                    this.log.debug("Coordinator discovery failed, refreshing metadata");
                    this.client.awaitMetadataUpdate(timer);
                    // The GroupCoordinator was found but the connection to it is unavailable: back off, then retry
                } else if (this.coordinator != null && this.client.isUnavailable(this.coordinator)) {
                    this.markCoordinatorUnknown();
                    timer.sleep(this.retryBackoffMs);
                }
            } while(this.coordinatorUnknown() && timer.notExpired());
            return !this.coordinatorUnknown();
        }
    }

AbstractCoordinator.lookupCoordinator() picks the least-loaded node in the cluster and sends the GCR request to it through sendFindCoordinatorRequest():

	protected synchronized RequestFuture<Void> lookupCoordinator() {
        if (this.findCoordinatorFuture == null) {
            // Pick the least-loaded node
            Node node = this.client.leastLoadedNode();
            if (node == null) {
                this.log.debug("No broker available to send FindCoordinator request");
                return RequestFuture.noBrokersAvailable();
            }
            // Send the GroupCoordinatorRequest (FindCoordinator)
            this.findCoordinatorFuture = this.sendFindCoordinatorRequest(node);
        }
        return this.findCoordinatorFuture;
    }

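sendFindCoordinatorRequest() calls ConsumerNetworkClient.send(). This send() does not transmit the request directly; it puts the request into the unsent queue and returns a RequestFuture used for asynchronous callbacks. A FindCoordinatorResponseHandler is composed onto this future to process the response asynchronously: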

	private RequestFuture<Void> sendFindCoordinatorRequest(Node node) {
        this.log.debug("Sending FindCoordinator request to broker {}", node);
        org.apache.kafka.common.requests.FindCoordinatorRequest.Builder requestBuilder = new org.apache.kafka.common.requests.FindCoordinatorRequest.Builder((new FindCoordinatorRequestData()).setKeyType(CoordinatorType.GROUP.id()).setKey(this.groupId));
        // Call send() and compose a handler that processes the response
        return this.client.send(node, requestBuilder).compose(new AbstractCoordinator.FindCoordinatorResponseHandler());
    }
	public RequestFuture<ClientResponse> send(Node node, Builder<?> requestBuilder) {
        return this.send(node, requestBuilder, this.requestTimeoutMs);
    }

    public RequestFuture<ClientResponse> send(Node node, Builder<?> requestBuilder, int requestTimeoutMs) {
        long now = this.time.milliseconds();
        ConsumerNetworkClient.RequestFutureCompletionHandler completionHandler = new ConsumerNetworkClient.RequestFutureCompletionHandler();
        ClientRequest clientRequest = this.client.newClientRequest(node.idString(), requestBuilder, now, true, requestTimeoutMs, completionHandler);
        // Put the request into the unsent queue; it is not sent yet
        this.unsent.put(node, clientRequest);
        this.client.wakeup();
        // Return the RequestFuture; listeners can be attached to it to handle the response
        return completionHandler.future;
    }
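The enqueue-now, send-later behaviour of send() can be sketched with standard-library types. This is a simplification under stated assumptions: CompletableFuture plays the role of RequestFuture (thenApply roughly corresponds to compose()), a Function stands in for the network round trip, and the class is single-threaded, unlike ConsumerNetworkClient. DeferredSender is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch of the "enqueue in send(), transmit in poll()" pattern. */
final class DeferredSender {
    private static final class Pending {
        final String request;
        final CompletableFuture<String> future = new CompletableFuture<>();

        Pending(String request) {
            this.request = request;
        }
    }

    // Per-node queues of requests that have not been sent yet (the "unsent" queue)
    private final Map<String, Queue<Pending>> unsent = new LinkedHashMap<>();
    private final Function<String, String> transport;

    DeferredSender(Function<String, String> transport) {
        this.transport = transport;
    }

    /** Like send(): only queue the request and return a future for its response. */
    CompletableFuture<String> send(String node, String request) {
        Pending pending = new Pending(request);
        unsent.computeIfAbsent(node, n -> new ArrayDeque<>()).add(pending);
        return pending.future;
    }

    /** Like poll(): transmit every queued request and complete its future; returns how many were sent. */
    int poll() {
        int sent = 0;
        // Iterate over a snapshot so callbacks that call send() cannot break the iteration
        for (Queue<Pending> queue : new ArrayList<>(unsent.values())) {
            Pending pending;
            while ((pending = queue.poll()) != null) {
                try {
                    pending.future.complete(transport.apply(pending.request));
                } catch (RuntimeException e) {
                    pending.future.completeExceptionally(e);
                }
                sent++;
            }
        }
        return sent;
    }
}
```

Until poll() runs, the future returned by send() stays incomplete, which is why ensureCoordinatorReady() must call this.client.poll(future, timer) after lookupCoordinator().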

FindCoordinatorResponseHandler, composed onto the RequestFuture, handles the response asynchronously:

	private class FindCoordinatorResponseHandler extends RequestFutureAdapter<ClientResponse, Void> {
        private FindCoordinatorResponseHandler() {
        }
        // Success callback
        public void onSuccess(ClientResponse resp, RequestFuture<Void> future) {
            AbstractCoordinator.this.log.debug("Received FindCoordinator response {}", resp);
            AbstractCoordinator.this.clearFindCoordinatorFuture();
            FindCoordinatorResponse findCoordinatorResponse = (FindCoordinatorResponse)resp.responseBody();
            Errors error = findCoordinatorResponse.error();
            if (error == Errors.NONE) {
                synchronized(AbstractCoordinator.this) {
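                    // Use Integer.MAX_VALUE - nodeId as the connection ID so the coordinator gets a connection separate from regular traffic to the same broker
                    int coordinatorConnectionId = Integer.MAX_VALUE - findCoordinatorResponse.data().nodeId();
                    // Record the node that hosts the GroupCoordinator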
                    AbstractCoordinator.this.coordinator = new Node(coordinatorConnectionId, findCoordinatorResponse.data().host(), findCoordinatorResponse.data().port());
                    AbstractCoordinator.this.log.info("Discovered group coordinator {}", AbstractCoordinator.this.coordinator);
                    // Try to open a connection to the GroupCoordinator
                    AbstractCoordinator.this.client.tryConnect(AbstractCoordinator.this.coordinator);
                    // Reset the session timeout tracked by the heartbeat
                    AbstractCoordinator.this.heartbeat.resetSessionTimeout();
                }

                future.complete((Object)null);
            } else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
                future.raise(new GroupAuthorizationException(AbstractCoordinator.this.groupId));
            } else {
                AbstractCoordinator.this.log.debug("Group coordinator lookup failed: {}", findCoordinatorResponse.data().errorMessage());
                future.raise(error);
            }

        }

        public void onFailure(RuntimeException e, RequestFuture<Void> future) {
            AbstractCoordinator.this.clearFindCoordinatorFuture();
            super.onFailure(e, future);
        }
    }

Back in AbstractCoordinator.ensureCoordinatorReady(): once the GCR request is in the unsent queue, this.client.poll(future, timer) sends it and runs the callbacks. ConsumerNetworkClient's poll method is as follows:

	public boolean poll(RequestFuture<?> future, Timer timer) {
        do {
            // Blocking poll with a timeout: loop until the timer expires or the GCR request completes
            this.poll((Timer)timer, (ConsumerNetworkClient.PollCondition)future);
        } while(!future.isDone() && timer.notExpired());
        return future.isDone();
    }
    public void poll(Timer timer, ConsumerNetworkClient.PollCondition pollCondition) {
        this.poll(timer, pollCondition, false);
    }
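The loop in poll(future, timer) keeps running network poll rounds until either the future completes or the timer expires. A minimal sketch, assuming a caller-supplied clock and a Runnable that performs one poll round (both hypothetical stand-ins for Timer and the inner poll(timer, pollCondition)):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.LongSupplier;

/** Hypothetical sketch of poll(future, timer): poll until the future is done or the deadline passes. */
final class PollUntilDone {
    private PollUntilDone() {}

    /**
     * @param future    the request future to wait for
     * @param pollOnce  one network poll round (send queued requests, handle responses)
     * @param clockMs   current time in milliseconds
     * @param timeoutMs how long to keep polling
     * @return true if the future completed before the deadline
     */
    static boolean pollUntil(CompletableFuture<?> future, Runnable pollOnce,
                             LongSupplier clockMs, long timeoutMs) {
        long deadlineMs = clockMs.getAsLong() + timeoutMs;
        do {
            pollOnce.run(); // like the original do-while, at least one round always runs
        } while (!future.isDone() && clockMs.getAsLong() < deadlineMs);
        return future.isDone();
    }
}
```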

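Going deeper, this poll() performs the following steps: it fires the callbacks of completed requests, handles requests for nodes that disconnected asynchronously, calls send() to stage the requests in the unsent queue, calls poll() to actually send them, handles requests for nodes whose connections failed, and removes expired requests from the unsent queue.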

	public void poll(Timer timer, ConsumerNetworkClient.PollCondition pollCondition, boolean disableWakeup) {
        // Fire the callbacks of completed requests in the pendingCompletion queue (pendingCompletion.fireCompletion())
        this.firePendingCompletedRequests();
        this.lock.lock();

        try {
            // Handle asynchronous disconnects: for each node in pendingDisconnects, remove its requests from the unsent queue and fire their callbacks
            this.handlePendingDisconnects();
            // For each node that is ready, hand its requests in the unsent queue to NetworkClient.send(), which stages them in the KafkaChannel's send field
            long pollDelayMs = this.trySend(timer.currentTimeMs());
            if (this.pendingCompletion.isEmpty() && (pollCondition == null || pollCondition.shouldBlock())) {
                long pollTimeout = Math.min(timer.remainingMs(), pollDelayMs);
                if (this.client.inFlightRequestCount() == 0) {
                    pollTimeout = Math.min(pollTimeout, this.retryBackoffMs);
                }
                // Blocking poll (up to pollTimeout): write the staged sends from KafkaChannel to the network
                this.client.poll(pollTimeout, timer.currentTimeMs());
            } else {
                // Non-blocking poll: write the staged sends from KafkaChannel immediately
                this.client.poll(0L, timer.currentTimeMs());
            }
            // Update the timer
            timer.update();
            // Handle nodes whose connections failed: remove their requests from the unsent queue and fire those requests' callbacks
            this.checkDisconnects(timer.currentTimeMs());
            if (!disableWakeup) {
                this.maybeTriggerWakeup();
            }

            this.maybeThrowInterruptException();
            // After handling failed connections, try once more to stage pending sends
            this.trySend(timer.currentTimeMs());
            // Remove expired requests from the unsent queue and fire their callbacks
            this.failExpiredRequests(timer.currentTimeMs());
            this.unsent.clean();
        } finally {
            this.lock.unlock();
        }
        // Fire the pendingCompletion callbacks again
        this.firePendingCompletedRequests();
        this.metadata.maybeThrowException();
    }

4. JoinGroupRequest and SyncGroupRequest Analysis

The steps are as follows:

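  1. ensureCoordinatorReady() looks up the GroupCoordinator.
  2. rejoinNeededOrPending() checks whether the consumer needs to (re)join the group; initiateJoinGroup() then prepares the join: it stops the heartbeat, sets the REBALANCING state, and so on.
  3. sendJoinGroupRequest() sends the JGR; the GroupCoordinator designates the consumer leader and selects the assignment strategy, and the leader is responsible for assigning partitions.
  4. sendSyncGroupRequest() sends the SGR; all consumers synchronize the assignment result with the GroupCoordinator, the heartbeat restarts, and the state is set to STABLE.
  5. onJoinComplete() stores the assigned partitions.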

Again taking ConsumerCoordinator.poll() as the entry point, this.ensureActiveGroup(timer) initiates the JGR and SGR requests:

	boolean ensureActiveGroup(Timer timer) {
        // Make sure the GroupCoordinator is known (again)
        if (!this.ensureCoordinatorReady(timer)) {
            return false;
        } else {
            // Start the heartbeat thread if needed
            this.startHeartbeatThreadIfNeeded();
            // Join the consumer group
            return this.joinGroupIfNeeded(timer);
        }
    }
	boolean joinGroupIfNeeded(Timer timer) {
        // Loop while the consumer needs to (re)join the group
        while(this.rejoinNeededOrPending()) {
            // Make sure the GroupCoordinator is known
            if (!this.ensureCoordinatorReady(timer)) {
                return false;
            }

            if (this.needsJoinPrepare) {
                this.onJoinPrepare(this.generation.generationId, this.generation.memberId);
                this.needsJoinPrepare = false;
            }
            // Send the JGR; the SGR is chained onto its response
            RequestFuture<ByteBuffer> future = this.initiateJoinGroup();
            // Block until the JGR and SGR complete or the timer expires
            this.client.poll(future, timer);
            if (!future.isDone()) {
                return false;
            }
            // Success
            if (future.succeeded()) {
                // Get the partition assignment result
                ByteBuffer memberAssignment = ((ByteBuffer)future.value()).duplicate();
                // Apply the partition assignment
                this.onJoinComplete(this.generation.generationId, this.generation.memberId, this.generation.protocol, memberAssignment);
                this.resetJoinGroupFuture();
                this.needsJoinPrepare = true;
            } else {
                // Failure: the while loop tries to join again (see the exception handling below)
                this.resetJoinGroupFuture();
                RuntimeException exception = future.exception();
                if (!(exception instanceof UnknownMemberIdException) && !(exception instanceof RebalanceInProgressException) && !(exception instanceof IllegalGenerationException) && !(exception instanceof MemberIdRequiredException)) {
                    if (!future.isRetriable()) {
                        throw exception;
                    }

                    timer.sleep(this.retryBackoffMs);
                }
            }
        }

        return true;
    }
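The failure branch of joinGroupIfNeeded() distinguishes three outcomes. The sketch below restates that decision as a pure function; identifying the failure by its simple class name is a simplification for illustration, and the names JoinFailurePolicy and Action are hypothetical.

```java
/** Hypothetical sketch of how joinGroupIfNeeded() reacts to a failed join. */
final class JoinFailurePolicy {
    enum Action { REJOIN_IMMEDIATELY, BACKOFF_AND_RETRY, THROW }

    private JoinFailurePolicy() {}

    /**
     * @param errorName simple class name of the failure, e.g. "RebalanceInProgressException"
     * @param retriable whether the failure is retriable
     */
    static Action onJoinFailure(String errorName, boolean retriable) {
        switch (errorName) {
            case "UnknownMemberIdException":
            case "RebalanceInProgressException":
            case "IllegalGenerationException":
            case "MemberIdRequiredException":
                // Expected while a rebalance is in flux: loop again without sleeping
                return Action.REJOIN_IMMEDIATELY;
            default:
                // Anything else: give up if fatal, otherwise sleep retryBackoffMs and retry
                return retriable ? Action.BACKOFF_AND_RETRY : Action.THROW;
        }
    }
}
```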

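AbstractCoordinator.initiateJoinGroup() prepares the join: it disables the heartbeat thread, sets the REBALANCING state, sends the JGR, and registers a listener for the final SGR result. When the SGR succeeds, the listener re-enables the heartbeat thread and sets the STABLE state: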

	private synchronized RequestFuture<ByteBuffer> initiateJoinGroup() {
        if (this.joinFuture == null) {
            // Disable the heartbeat thread
            this.disableHeartbeatThread();
            this.state = AbstractCoordinator.MemberState.REBALANCING;
            // Send the JoinGroup request
            this.joinFuture = this.sendJoinGroupRequest();
            // Listener fired when the SGR chained onto the JGR completes
            this.joinFuture.addListener(new RequestFutureListener<ByteBuffer>() {
                // Success callback
                public void onSuccess(ByteBuffer value) {
                    synchronized(AbstractCoordinator.this) {
                        AbstractCoordinator.this.log.info("Successfully joined group with generation {}", AbstractCoordinator.this.generation.generationId);
                        AbstractCoordinator.this.state = AbstractCoordinator.MemberState.STABLE;
                        // No rejoin needed
                        AbstractCoordinator.this.rejoinNeeded = false;
                        if (AbstractCoordinator.this.heartbeatThread != null) {
                            // Re-enable the heartbeat thread
                            AbstractCoordinator.this.heartbeatThread.enable();
                        }
                    }
                }
                // Failure callback
                public void onFailure(RuntimeException e) {
                    synchronized(AbstractCoordinator.this) {
                        AbstractCoordinator.this.state = AbstractCoordinator.MemberState.UNJOINED;
                    }
                }
            });
        }

        return this.joinFuture;
    }
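The state changes made by initiateJoinGroup() and its listener form a small state machine. A sketch under the assumption that only the member state, a heartbeat flag, and rejoinNeeded matter; MemberStateMachine and its methods are hypothetical, and the real code also guards these fields with the coordinator's lock.

```java
/** Hypothetical sketch of the member-state transitions driven by initiateJoinGroup(). */
final class MemberStateMachine {
    enum MemberState { UNJOINED, REBALANCING, STABLE }

    private MemberState state = MemberState.UNJOINED;
    private boolean heartbeatEnabled = false;
    private boolean rejoinNeeded = true;

    /** Sending the JoinGroup request: pause heartbeats and mark the member as rebalancing. */
    void onJoinStarted() {
        heartbeatEnabled = false;
        state = MemberState.REBALANCING;
    }

    /** JoinGroup and SyncGroup both succeeded. */
    void onJoinSucceeded() {
        state = MemberState.STABLE;
        rejoinNeeded = false;
        heartbeatEnabled = true;
    }

    /** JoinGroup or SyncGroup failed: heartbeats stay paused and the join will be retried. */
    void onJoinFailed() {
        state = MemberState.UNJOINED;
    }

    MemberState state() { return state; }
    boolean heartbeatEnabled() { return heartbeatEnabled; }
    boolean rejoinNeeded() { return rejoinNeeded; }
}
```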

Next, sendJoinGroupRequest() calls this.client.send(this.coordinator, requestBuilder, joinGroupTimeoutMs), targeting the broker node this.coordinator, and composes the JGR response listener new AbstractCoordinator.JoinGroupResponseHandler():

	RequestFuture<ByteBuffer> sendJoinGroupRequest() {
        if (this.coordinatorUnknown()) {
            return RequestFuture.coordinatorNotAvailable();
        } else {
            this.log.info("(Re-)joining group");
            Builder requestBuilder = new Builder((new JoinGroupRequestData()).setGroupId(this.groupId).setSessionTimeoutMs(this.sessionTimeoutMs).setMemberId(this.generation.memberId).setGroupInstanceId((String)this.groupInstanceId.orElse((Object)null)).setProtocolType(this.protocolType()).setProtocols(this.metadata()).setRebalanceTimeoutMs(this.rebalanceTimeoutMs));
            this.log.debug("Sending JoinGroup ({}) to coordinator {}", requestBuilder, this.coordinator);
            int joinGroupTimeoutMs = Math.max(this.rebalanceTimeoutMs, this.rebalanceTimeoutMs + 5000);
            return this.client.send(this.coordinator, requestBuilder, joinGroupTimeoutMs).compose(new AbstractCoordinator.JoinGroupResponseHandler());
        }
    }
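The timeout Math.max(this.rebalanceTimeoutMs, this.rebalanceTimeoutMs + 5000) looks redundant, since x + 5000 is normally larger than x. The Math.max guards against int overflow: if rebalanceTimeoutMs is close to Integer.MAX_VALUE, the addition wraps to a negative number and the plain rebalance timeout is used instead. A worked example (for a consumer, rebalanceTimeoutMs is taken from max.poll.interval.ms, 300000 ms by default; the class name is hypothetical):

```java
/** Worked example of the JoinGroup request timeout computed in sendJoinGroupRequest(). */
final class JoinGroupTimeout {
    private JoinGroupTimeout() {}

    static int joinGroupTimeoutMs(int rebalanceTimeoutMs) {
        // Normally rebalanceTimeoutMs + 5000 wins; if the addition overflows to a
        // negative int, Math.max falls back to rebalanceTimeoutMs itself.
        return Math.max(rebalanceTimeoutMs, rebalanceTimeoutMs + 5000);
    }
}
```

With the default, the JoinGroup request may wait up to 305000 ms, leaving 5 seconds of slack on top of the rebalance timeout.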

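JoinGroupResponseHandler is the JGR response listener. The GroupCoordinator returns the chosen partition assignment strategy and designates the consumer leader; the other consumers become followers. The corresponding code is AbstractCoordinator.this.onJoinLeader(joinResponse).chain(future) and AbstractCoordinator.this.onJoinFollower().chain(future). Both roles then send an SGR to synchronize the assignment result, but only the leader assigns the partitions according to the strategy returned by the GroupCoordinator and includes the result in its SGR; followers send an empty assignment.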

	private class JoinGroupResponseHandler extends AbstractCoordinator.CoordinatorResponseHandler<JoinGroupResponse, ByteBuffer> {
        private JoinGroupResponseHandler() {
            super();
        }

        public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> future) {
            Errors error = joinResponse.error();
            if (error == Errors.NONE) {
                AbstractCoordinator.this.log.debug("Received successful JoinGroup response: {}", joinResponse);
                AbstractCoordinator.this.sensors.joinLatency.record((double)this.response.requestLatencyMs());
                synchronized(AbstractCoordinator.this) {
                    if (AbstractCoordinator.this.state != AbstractCoordinator.MemberState.REBALANCING) {
                        future.raise(new AbstractCoordinator.UnjoinedGroupException());
                    } else {
                        // The GroupCoordinator has chosen the assignment strategy (protocol) and the consumer leader
                        AbstractCoordinator.this.generation = new AbstractCoordinator.Generation(joinResponse.data().generationId(), joinResponse.data().memberId(), joinResponse.data().protocolName());
                        if (joinResponse.isLeader()) {
                            // Leader: assign partitions, send the SGR, and chain its result to this future
                            AbstractCoordinator.this.onJoinLeader(joinResponse).chain(future);
                        } else {
                            // Follower: send the SGR and chain its result to this future
                            AbstractCoordinator.this.onJoinFollower().chain(future);
                        }
                    }
                }
            } else if (error == Errors.COORDINATOR_LOAD_IN_PROGRESS) {
                AbstractCoordinator.this.log.debug("Attempt to join group rejected since coordinator {} is loading the group.", AbstractCoordinator.this.coordinator());
                future.raise(error);
            } else if (error == Errors.UNKNOWN_MEMBER_ID) {
                AbstractCoordinator.this.resetGeneration();
                AbstractCoordinator.this.log.debug("Attempt to join group failed due to unknown member id.");
                future.raise(Errors.UNKNOWN_MEMBER_ID);
            } else if (error != Errors.COORDINATOR_NOT_AVAILABLE && error != Errors.NOT_COORDINATOR) {
                if (error == Errors.FENCED_INSTANCE_ID) {
                    AbstractCoordinator.this.log.error("Received fatal exception: group.instance.id gets fenced");
                    future.raise(error);
                } else if (error != Errors.INCONSISTENT_GROUP_PROTOCOL && error != Errors.INVALID_SESSION_TIMEOUT && error != Errors.INVALID_GROUP_ID && error != Errors.GROUP_AUTHORIZATION_FAILED && error != Errors.GROUP_MAX_SIZE_REACHED) {
                    if (error == Errors.UNSUPPORTED_VERSION) {
                        AbstractCoordinator.this.log.error("Attempt to join group failed due to unsupported version error. Please unset field group.instance.id and retryto see if the problem resolves");
                        future.raise(error);
                    } else if (error == Errors.MEMBER_ID_REQUIRED) {
                        synchronized(AbstractCoordinator.this) {
                            AbstractCoordinator.this.generation = new AbstractCoordinator.Generation(-1, joinResponse.data().memberId(), (String)null);
                            AbstractCoordinator.this.rejoinNeeded = true;
                            AbstractCoordinator.this.state = AbstractCoordinator.MemberState.UNJOINED;
                        }

                        future.raise(Errors.MEMBER_ID_REQUIRED);
                    } else {
                        AbstractCoordinator.this.log.error("Attempt to join group failed due to unexpected error: {}", error.message());
                        future.raise(new KafkaException("Unexpected error in join group response: " + error.message()));
                    }
                } else {
                    AbstractCoordinator.this.log.error("Attempt to join group failed due to fatal error: {}", error.message());
                    if (error == Errors.GROUP_MAX_SIZE_REACHED) {
                        future.raise(new GroupMaxSizeReachedException(AbstractCoordinator.this.groupId));
                    } else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
                        future.raise(new GroupAuthorizationException(AbstractCoordinator.this.groupId));
                    } else {
                        future.raise(error);
                    }
                }
            } else {
                AbstractCoordinator.this.markCoordinatorUnknown();
                AbstractCoordinator.this.log.debug("Attempt to join group failed due to obsolete coordinator information: {}", error.message());
                future.raise(error);
            }
        }
    }

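If the current consumer is the leader, it assigns partitions according to the chosen strategy and then sends an SGR carrying the assignment result. The assignment strategy is configured on the consumer side through partition.assignment.strategy (not in the broker's server.properties); the default is range (RangeAssignor) in the client version analyzed here, and the GroupCoordinator picks a strategy that all members support: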

	private RequestFuture<ByteBuffer> onJoinLeader(JoinGroupResponse joinResponse) {
        try {
            // The leader assigns partitions using the strategy (protocol) the GroupCoordinator selected from those all members support
            Map<String, ByteBuffer> groupAssignment = this.performAssignment(joinResponse.data().leader(), joinResponse.data().protocolName(), joinResponse.data().members());
            List<SyncGroupRequestAssignment> groupAssignmentList = new ArrayList();
            Iterator var4 = groupAssignment.entrySet().iterator();

            while(var4.hasNext()) {
                Entry<String, ByteBuffer> assignment = (Entry)var4.next();
                groupAssignmentList.add((new SyncGroupRequestAssignment()).setMemberId((String)assignment.getKey()).setAssignment(Utils.toArray((ByteBuffer)assignment.getValue())));
            }
            // Send the SGR, carrying the assignment result to the GroupCoordinator
            org.apache.kafka.common.requests.SyncGroupRequest.Builder requestBuilder = new org.apache.kafka.common.requests.SyncGroupRequest.Builder((new SyncGroupRequestData()).setGroupId(this.groupId).setMemberId(this.generation.memberId).setGroupInstanceId((String)this.groupInstanceId.orElse((Object)null)).setGenerationId(this.generation.generationId).setAssignments(groupAssignmentList));
            this.log.debug("Sending leader SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder);
            return this.sendSyncGroupRequest(requestBuilder);
        } catch (RuntimeException var6) {
            return RequestFuture.failure(var6);
        }
    }
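Under the default range strategy, each topic's partitions are split into contiguous ranges over the members sorted by ID. The sketch below shows that calculation for a single topic; it is a simplified illustration, not Kafka's RangeAssignor class, and it works on plain member IDs and partition numbers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified sketch of range assignment for a single topic. */
final class RangeAssignmentSketch {
    private RangeAssignmentSketch() {}

    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (sorted.isEmpty()) {
            return result;
        }
        int perConsumer = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size(); // the first 'extra' consumers get one more
        for (int i = 0; i < sorted.size(); i++) {
            int start = perConsumer * i + Math.min(i, extra);
            int length = perConsumer + (i < extra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int p = start; p < start + length; p++) {
                partitions.add(p);
            }
            result.put(sorted.get(i), partitions);
        }
        return result;
    }
}
```

For 7 partitions and members c1, c2, c3, the result is c1 → [0, 1, 2], c2 → [3, 4], c3 → [5, 6]: every member gets 7 / 3 = 2 partitions, and the first 7 % 3 = 1 member gets one extra.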

If the current consumer is a follower, it sends the SGR directly, with an empty assignment:

 	private RequestFuture<ByteBuffer> onJoinFollower() {
        org.apache.kafka.common.requests.SyncGroupRequest.Builder requestBuilder = new org.apache.kafka.common.requests.SyncGroupRequest.Builder((new SyncGroupRequestData()).setGroupId(this.groupId).setMemberId(this.generation.memberId).setGroupInstanceId((String)this.groupInstanceId.orElse((Object)null)).setGenerationId(this.generation.generationId).setAssignments(Collections.emptyList()));
        this.log.debug("Sending follower SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder);
        return this.sendSyncGroupRequest(requestBuilder);
    }
	private RequestFuture<ByteBuffer> sendSyncGroupRequest(org.apache.kafka.common.requests.SyncGroupRequest.Builder requestBuilder) {
        return this.coordinatorUnknown() ? RequestFuture.coordinatorNotAvailable() : this.client.send(this.coordinator, requestBuilder).compose(new AbstractCoordinator.SyncGroupResponseHandler());
    }
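Only the leader's SGR carries assignments. On the broker side, the GroupCoordinator stores them and answers every member, leader or follower, with its own entry, holding followers' responses until the leader's SGR arrives. The sketch below models just the distribution step; SyncGroupSketch is hypothetical, and byte[] stands in for the serialized assignment.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of how the coordinator hands out the leader's assignment. */
final class SyncGroupSketch {
    private final Map<String, byte[]> assignmentByMember = new HashMap<>();

    /** The leader's SyncGroup carries the assignment for every member. */
    void onLeaderSync(Map<String, byte[]> assignments) {
        assignmentByMember.clear();
        assignmentByMember.putAll(assignments);
    }

    /** Every member receives only its own slice; an unknown member gets an empty assignment. */
    byte[] assignmentFor(String memberId) {
        return assignmentByMember.getOrDefault(memberId, new byte[0]);
    }
}
```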

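The SGR response listener: on error, it calls AbstractCoordinator.this.requestRejoin() to flag that the consumer must rejoin the group; on success, it notifies the other listeners, and ConsumerCoordinator.onJoinComplete() finally applies the assignment result: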

	private class SyncGroupResponseHandler extends AbstractCoordinator.CoordinatorResponseHandler<SyncGroupResponse, ByteBuffer> {
        private SyncGroupResponseHandler() {
            super();
        }

        public void handle(SyncGroupResponse syncResponse, RequestFuture<ByteBuffer> future) {
            Errors error = syncResponse.error();
            if (error == Errors.NONE) {
                AbstractCoordinator.this.sensors.syncLatency.record((double)this.response.requestLatencyMs());
                // Complete the future with the assignment: the value is set once via CAS (compareAndSet, no spin loop), then the success listeners fire
                future.complete(ByteBuffer.wrap(syncResponse.data.assignment()));
            } else {
                AbstractCoordinator.this.requestRejoin();
                if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
                    future.raise(new GroupAuthorizationException(AbstractCoordinator.this.groupId));
                } else if (error == Errors.REBALANCE_IN_PROGRESS) {
                    AbstractCoordinator.this.log.debug("SyncGroup failed because the group began another rebalance");
                    future.raise(error);
                } else if (error == Errors.FENCED_INSTANCE_ID) {
                    AbstractCoordinator.this.log.error("Received fatal exception: group.instance.id gets fenced");
                    future.raise(error);
                } else if (error != Errors.UNKNOWN_MEMBER_ID && error != Errors.ILLEGAL_GENERATION) {
                    if (error != Errors.COORDINATOR_NOT_AVAILABLE && error != Errors.NOT_COORDINATOR) {
                        future.raise(new KafkaException("Unexpected error from SyncGroup: " + error.message()));
                    } else {
                        AbstractCoordinator.this.log.debug("SyncGroup failed: {}", error.message());
                        AbstractCoordinator.this.markCoordinatorUnknown();
                        future.raise(error);
                    }
                } else {
                    AbstractCoordinator.this.log.debug("SyncGroup failed: {}", error.message());
                    AbstractCoordinator.this.resetGeneration();
                    future.raise(error);
                }
            }

        }
    }
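The branching in `handle()` is easier to read as a table. The sketch below restates it as a pure function. `SyncError`, `SyncAction` and `SyncErrorTable` are hypothetical names; the enum constants mirror the `Errors` values checked above, and `UNKNOWN_SERVER_ERROR` stands in for "any other error".

```java
// Hypothetical mirror of the Errors values that SyncGroupResponseHandler checks.
enum SyncError {
    NONE, GROUP_AUTHORIZATION_FAILED, REBALANCE_IN_PROGRESS, FENCED_INSTANCE_ID,
    UNKNOWN_MEMBER_ID, ILLEGAL_GENERATION, COORDINATOR_NOT_AVAILABLE, NOT_COORDINATOR,
    UNKNOWN_SERVER_ERROR
}

// What the handler does for each error (every non-NONE branch also requests a rejoin).
enum SyncAction {
    COMPLETE,                 // future.complete(assignment)
    FAIL_AUTHORIZATION,       // raise GroupAuthorizationException
    RAISE,                    // raise the error as-is
    RESET_GENERATION,         // resetGeneration(), then raise
    MARK_COORDINATOR_UNKNOWN, // markCoordinatorUnknown(), then raise
    FAIL_UNEXPECTED           // raise KafkaException("Unexpected error from SyncGroup: ...")
}

final class SyncErrorTable {
    private SyncErrorTable() {}

    static SyncAction classify(SyncError e) {
        switch (e) {
            case NONE:
                return SyncAction.COMPLETE;
            case GROUP_AUTHORIZATION_FAILED:
                return SyncAction.FAIL_AUTHORIZATION;
            case REBALANCE_IN_PROGRESS:
            case FENCED_INSTANCE_ID:
                return SyncAction.RAISE;
            case UNKNOWN_MEMBER_ID:
            case ILLEGAL_GENERATION:
                return SyncAction.RESET_GENERATION;
            case COORDINATOR_NOT_AVAILABLE:
            case NOT_COORDINATOR:
                return SyncAction.MARK_COORDINATOR_UNKNOWN;
            default:
                return SyncAction.FAIL_UNEXPECTED;
        }
    }

    // requestRejoin() is called before any error branch runs.
    static boolean rejoinRequested(SyncError e) {
        return e != SyncError.NONE;
    }
}
```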

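The call future.complete(ByteBuffer.wrap(syncResponse.data.assignment())) is what eventually triggers the listener registered in initiateJoinGroup(), because the JoinGroup future is chained to the SyncGroup result: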

	private synchronized RequestFuture<ByteBuffer> initiateJoinGroup() {
        if (this.joinFuture == null) {
            this.disableHeartbeatThread();
            this.state = AbstractCoordinator.MemberState.REBALANCING;
            this.joinFuture = this.sendJoinGroupRequest();
            // Listener fired once the SGR result completes joinFuture
            this.joinFuture.addListener(new RequestFutureListener<ByteBuffer>() {
                public void onSuccess(ByteBuffer value) {
                    synchronized(AbstractCoordinator.this) {
                        AbstractCoordinator.this.log.info("Successfully joined group with generation {}", AbstractCoordinator.this.generation.generationId);
                        AbstractCoordinator.this.state = AbstractCoordinator.MemberState.STABLE;
                        AbstractCoordinator.this.rejoinNeeded = false;
                        if (AbstractCoordinator.this.heartbeatThread != null) {
                            AbstractCoordinator.this.heartbeatThread.enable();
                        }
                    }
                }
                public void onFailure(RuntimeException e) {
                    synchronized(AbstractCoordinator.this) {
                        AbstractCoordinator.this.state = AbstractCoordinator.MemberState.UNJOINED;
                    }
                }
            });
        }
        return this.joinFuture;
    }
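The following sketch models how completing the future drives the member state that `initiateJoinGroup()` manages: a single `compareAndSet` lets only the first completion win, and the listener then moves the member from `REBALANCING` to `STABLE` (or back to `UNJOINED` on failure). `OnceFuture`, `JoinListener` and `JoinStateSketch` are hypothetical names; the heartbeat thread is reduced to a boolean flag, and in this simplified version listeners must be registered before the future completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical listener, shaped like kafka-clients' RequestFutureListener.
interface JoinListener<T> {
    void onSuccess(T value);
    void onFailure(RuntimeException e);
}

// Hypothetical future: the result slot is set exactly once with compareAndSet.
class OnceFuture<T> {
    private static final Object INCOMPLETE = new Object();
    private final AtomicReference<Object> result = new AtomicReference<>(INCOMPLETE);
    private final List<JoinListener<T>> listeners = new ArrayList<>();

    // Simplification: listeners must be added before the future completes.
    void addListener(JoinListener<T> l) { listeners.add(l); }

    void complete(T value) {
        if (!result.compareAndSet(INCOMPLETE, value))
            throw new IllegalStateException("future already completed");
        for (JoinListener<T> l : listeners) l.onSuccess(value);
    }

    void raise(RuntimeException e) {
        if (!result.compareAndSet(INCOMPLETE, e))
            throw new IllegalStateException("future already completed");
        for (JoinListener<T> l : listeners) l.onFailure(e);
    }
}

enum MemberState { UNJOINED, REBALANCING, STABLE }

class JoinStateSketch {
    MemberState state = MemberState.UNJOINED;
    boolean rejoinNeeded = true;
    boolean heartbeatEnabled = true;
    private OnceFuture<byte[]> joinFuture;

    // Same shape as initiateJoinGroup(): at most one join in flight.
    synchronized OnceFuture<byte[]> initiateJoinGroup() {
        if (joinFuture == null) {
            heartbeatEnabled = false;            // disableHeartbeatThread()
            state = MemberState.REBALANCING;
            joinFuture = new OnceFuture<>();     // stands in for sendJoinGroupRequest()
            joinFuture.addListener(new JoinListener<byte[]>() {
                @Override public void onSuccess(byte[] assignment) {
                    synchronized (JoinStateSketch.this) {
                        state = MemberState.STABLE;
                        rejoinNeeded = false;
                        heartbeatEnabled = true; // heartbeatThread.enable()
                    }
                }
                @Override public void onFailure(RuntimeException e) {
                    synchronized (JoinStateSketch.this) {
                        state = MemberState.UNJOINED;
                    }
                }
            });
        }
        return joinFuture;
    }
}
```

Calling `initiateJoinGroup()` again while a join is pending returns the same future, so repeated `poll()` calls do not send duplicate JoinGroup requests.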

Back in AbstractCoordinator.joinGroupIfNeeded(): once the partition assignment has been obtained from the GroupCoordinator, it is handed to onJoinComplete():

	boolean joinGroupIfNeeded(Timer timer) {
        // Loop while the consumer needs to (re)join the group
        while(this.rejoinNeededOrPending()) {
            // Locate the GroupCoordinator and make sure it is reachable (the GCR step)
            if (!this.ensureCoordinatorReady(timer)) {
                return false;
            }

            if (this.needsJoinPrepare) {
                this.onJoinPrepare(this.generation.generationId, this.generation.memberId);
                this.needsJoinPrepare = false;
            }
            // Send the JGR; on success, its response handler goes on to send the SGR
            RequestFuture<ByteBuffer> future = this.initiateJoinGroup();
            // Block (bounded by the timer) until both the JGR and the SGR have completed
            this.client.poll(future, timer);
            if (!future.isDone()) {
                return false;
            }
            // Success
            if (future.succeeded()) {
                // Get the partition assignment returned by the SGR
                ByteBuffer memberAssignment = ((ByteBuffer)future.value()).duplicate();
                // Apply the assignment
                this.onJoinComplete(this.generation.generationId, this.generation.memberId, this.generation.protocol, memberAssignment);
                this.resetJoinGroupFuture();
                this.needsJoinPrepare = true;
            } else {
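                // Failure: reset the future. UnknownMemberId, RebalanceInProgress, IllegalGeneration and
                // MemberIdRequired rejoin immediately; other retriable errors back off first; the rest are thrown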
                this.resetJoinGroupFuture();
                RuntimeException exception = future.exception();
                if (!(exception instanceof UnknownMemberIdException) && !(exception instanceof RebalanceInProgressException) && !(exception instanceof IllegalGenerationException) && !(exception instanceof MemberIdRequiredException)) {
                    if (!future.isRetriable()) {
                        throw exception;
                    }

                    timer.sleep(this.retryBackoffMs);
                }
            }
        }

        return true;
    }
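The decision made in the failure branch of `joinGroupIfNeeded()` can be simulated with a scripted sequence of join attempts. `JoinLoopSketch` and its `Result` enum are hypothetical; each `Result` value names an error the real loop distinguishes. The assumption that only `COORDINATOR_NOT_AVAILABLE` is retriable is made for illustration (in kafka-clients this is decided by whether the exception is a `RetriableException`).

```java
import java.util.Iterator;

final class JoinLoopSketch {
    enum Result {
        SUCCESS,
        UNKNOWN_MEMBER_ID, REBALANCE_IN_PROGRESS, ILLEGAL_GENERATION, MEMBER_ID_REQUIRED,
        COORDINATOR_NOT_AVAILABLE,
        GROUP_AUTHORIZATION_FAILED
    }

    int backoffs = 0; // how many times timer.sleep(retryBackoffMs) would run

    // Errors after which the real loop rejoins at once, without backing off.
    static boolean rejoinImmediately(Result r) {
        return r == Result.UNKNOWN_MEMBER_ID || r == Result.REBALANCE_IN_PROGRESS
                || r == Result.ILLEGAL_GENERATION || r == Result.MEMBER_ID_REQUIRED;
    }

    // Assumption for this sketch: only COORDINATOR_NOT_AVAILABLE is retriable.
    static boolean retriable(Result r) {
        return r == Result.COORDINATOR_NOT_AVAILABLE;
    }

    // Returns the number of rounds until SUCCESS, or -1 if the script runs out;
    // throws for a non-retriable error, like `throw exception` in the original.
    int run(Iterator<Result> attempts) {
        int rounds = 0;
        while (attempts.hasNext()) {        // stands in for rejoinNeededOrPending()
            rounds++;
            Result r = attempts.next();     // stands in for initiateJoinGroup() + client.poll()
            if (r == Result.SUCCESS) {
                return rounds;              // onJoinComplete(...) would run here
            }
            if (rejoinImmediately(r)) {
                continue;
            }
            if (!retriable(r)) {
                throw new IllegalStateException("fatal join error: " + r);
            }
            backoffs++;
        }
        return -1;
    }
}
```

Rebalance-related errors are cheap to recover from (the group is already moving to a new generation), so the loop retries them without sleeping, while coordinator problems get a backoff to avoid hammering the broker.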

onJoinComplete() is implemented in ConsumerCoordinator; it updates the set of partitions that the leader assigned to this consumer:

	protected void onJoinComplete(int generation, String memberId, String assignmentStrategy, ByteBuffer assignmentBuffer) {
        if (!this.isLeader) {
            this.assignmentSnapshot = null;
        }
        // Look up the assignor (partition assignment strategy) selected by the coordinator
        PartitionAssignor assignor = this.lookupAssignor(assignmentStrategy);
        if (assignor == null) {
            throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);
        } else {
            // Deserialize this member's assignment
            Assignment assignment = ConsumerProtocol.deserializeAssignment(assignmentBuffer);
            // Store the assigned partitions in subscriptions, where the Fetcher reads them
            if (!this.subscriptions.assignFromSubscribed(assignment.partitions())) {
                this.handleAssignmentMismatch(assignment);
            } else {
                Set<TopicPartition> assignedPartitions = this.subscriptions.assignedPartitions();
                this.maybeUpdateJoinedSubscription(assignedPartitions);
                assignor.onAssignment(assignment, generation);
                if (this.autoCommitEnabled) {
                    this.nextAutoCommitTimer.updateAndReset((long)this.autoCommitIntervalMs);
                }

                ConsumerRebalanceListener listener = this.subscriptions.rebalanceListener();
                this.log.info("Setting newly assigned partitions: {}", Utils.join(assignedPartitions, ", "));

                try {
                    listener.onPartitionsAssigned(assignedPartitions);
                } catch (InterruptException | WakeupException var10) {
                    throw var10;
                } catch (Exception var11) {
                    this.log.error("User provided listener {} failed on partition assignment", listener.getClass().getName(), var11);
                }

            }
        }
    }
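The core of `onJoinComplete()` can be modeled without any Kafka classes: validate the assignment against the subscription, store it, then call the user listener and swallow its failures. `AssignmentSketch` and `AssignmentListener` are hypothetical names; partitions are written as `topic-partition` strings such as `orders-0`. Because the sketch does not model what `handleAssignmentMismatch()` does internally, it only records that a rejoin is needed.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for ConsumerRebalanceListener.onPartitionsAssigned.
interface AssignmentListener {
    void onPartitionsAssigned(Collection<String> partitions);
}

class AssignmentSketch {
    private final Set<String> subscribedTopics;
    private Set<String> assigned = Collections.emptySet();
    boolean rejoinRequested = false; // set when the assignment does not match
    String lastError;                // message of a swallowed listener failure

    AssignmentSketch(Set<String> subscribedTopics) {
        this.subscribedTopics = subscribedTopics;
    }

    Set<String> assignedPartitions() {
        return assigned;
    }

    // "orders-0" -> "orders"; lastIndexOf keeps topics that contain '-' intact.
    private static String topicOf(String tp) {
        return tp.substring(0, tp.lastIndexOf('-'));
    }

    // Like assignFromSubscribed(): reject partitions of unsubscribed topics.
    boolean assignFromSubscribed(List<String> partitions) {
        for (String tp : partitions) {
            if (!subscribedTopics.contains(topicOf(tp))) return false;
        }
        assigned = new LinkedHashSet<>(partitions);
        return true;
    }

    void onJoinComplete(List<String> partitions, AssignmentListener listener) {
        if (!assignFromSubscribed(partitions)) {
            rejoinRequested = true; // the real code calls handleAssignmentMismatch(...)
            return;
        }
        try {
            listener.onPartitionsAssigned(assigned);
        } catch (RuntimeException e) {
            // The original logs and swallows listener errors (except
            // InterruptException/WakeupException, which it rethrows).
            lastError = e.getMessage();
        }
    }
}
```

Swallowing listener exceptions means a buggy user callback cannot leave the consumer half-joined: the new assignment is already stored before the listener runs.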

This concludes the analysis of the rebalance process.
