Kafka Consumer (6): Rebalance

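A rebalance is triggered when any of the following happens:
1. A new consumer joins the consumer group.
2. A consumer crashes or otherwise drops offline. For example, a long GC pause or network latency can keep a consumer from sending heartbeat requests to the GroupCoordinator for an extended period.
3. A consumer leaves the consumer group.
4. The number of partitions changes for any topic the consumer group subscribes to.
5. A consumer calls unsubscribe() to cancel its subscription.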
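
Condition 2 can be illustrated with a minimal sketch of heartbeat-based liveness tracking. This is not Kafka's GroupCoordinator code: the class SessionTimeoutSketch and its methods are hypothetical names, and the clock is passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of heartbeat-based member expiry (not Kafka's implementation). */
final class SessionTimeoutSketch {
    private final long sessionTimeoutMs;
    // memberId -> timestamp of the last heartbeat received from that member
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    SessionTimeoutSketch(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Records a heartbeat; returns true if the member was unknown, i.e. it just joined. */
    boolean heartbeat(String memberId, long nowMs) {
        return lastHeartbeatMs.put(memberId, nowMs) == null;
    }

    /** Removes and returns the members whose last heartbeat is older than the session timeout. */
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeatMs.entrySet()) {
            if (nowMs - e.getValue() > sessionTimeoutMs)
                expired.add(e.getKey());
        }
        lastHeartbeatMs.keySet().removeAll(expired);
        return expired;
    }
}
```

In this model, a true result from heartbeat() (condition 1) or a non-empty result from expire() (condition 2) is what would make a coordinator start a rebalance.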
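The rebalance operation is divided into three phases, described in turn below.
Phase 1
1. Find the GroupCoordinator. In this phase the consumer sends a GroupCoordinatorRequest to any broker in the Kafka cluster and receives a GroupCoordinatorResponse in return.
The consumer first checks whether it needs to look up the GroupCoordinator again, mainly by checking whether the coordinator field is null and whether the connection to the GroupCoordinator is healthy.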

public abstract class AbstractCoordinator implements Closeable {
    public boolean coordinatorUnknown() {
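        // Check whether the coordinator field is null
        if (coordinator == null)
            return true;
        // Check whether the network connection to the GroupCoordinator is healthy
        if (client.connectionFailed(coordinator)) {
            // Fail the requests queued in unsent for this node and set the coordinator field to null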
            coordinatorDead();
            return true;
        }

        return false;
    }
    
    protected void coordinatorDead() {
        if (this.coordinator != null) {
            log.info("Marking the coordinator {} dead for group {}", this.coordinator, groupId);
            client.failUnsentRequests(this.coordinator, GroupCoordinatorNotAvailableException.INSTANCE);
            this.coordinator = null;
        }
    }
}

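2. Pick the least loaded node in the cluster, create a GroupCoordinatorRequest, and call client.send() to place the request in unsent, where it waits to be sent. The call returns a RequestFuture<Void> that has been adapted through compose(), following the same principle as HeartbeatCompletionHandler.
3. Call ConsumerNetworkClient.poll(future) to send the request. The send is blocking: the call returns only once a GroupCoordinatorResponse arrives or the future completes exceptionally.
4. Check the state of the RequestFuture<Void>. If it failed with a RetriableException, call ConsumerNetworkClient.awaitMetadataUpdate() to block until the cluster metadata recorded in Metadata is refreshed, then go back to step 1. Any other exception is rethrown.
5. If the GroupCoordinator node was found but the network connection to it failed, clear the requests queued in unsent for that node, set coordinator to null, and go back to step 1 to look up the GroupCoordinator again.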
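
The compose() adaptation mentioned in step 2 can be sketched with a small stand-in future. SimpleFuture is a hypothetical, single-threaded class written for this illustration. It is not Kafka's RequestFuture, and it only shows the idea of turning a future of one type into a future of another type while passing failures through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical, single-threaded stand-in for a request future (not Kafka's RequestFuture). */
final class SimpleFuture<T> {
    private boolean done;
    private T value;
    private RuntimeException error;
    private final List<Consumer<SimpleFuture<T>>> listeners = new ArrayList<>();

    void complete(T v) { finish(v, null); }
    void raise(RuntimeException e) { finish(null, e); }

    private void finish(T v, RuntimeException e) {
        if (done) throw new IllegalStateException("future already completed");
        done = true;
        value = v;
        error = e;
        for (Consumer<SimpleFuture<T>> l : listeners) l.accept(this);
    }

    boolean succeeded() { return done && error == null; }
    boolean failed() { return done && error != null; }
    T value() { return value; }
    RuntimeException exception() { return error; }

    private void addListener(Consumer<SimpleFuture<T>> l) {
        if (done) l.accept(this); else listeners.add(l);
    }

    /** Returns a new future: on success the adapter decides how to complete it, failures pass through. */
    <S> SimpleFuture<S> compose(BiConsumer<T, SimpleFuture<S>> onSuccess) {
        SimpleFuture<S> adapted = new SimpleFuture<>();
        addListener(f -> {
            if (f.failed()) adapted.raise(f.exception());
            else onSuccess.accept(f.value(), adapted);
        });
        return adapted;
    }
}
```

The caller only ever sees the adapted future, so a raw response type never leaks out of the method that sent the request.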

public abstract class AbstractCoordinator implements Closeable {
    /**
     * Block until the coordinator for this group is known and is ready to receive requests.
     */
    public void ensureCoordinatorReady() {
        // 1. Check the GroupCoordinator state
        while (coordinatorUnknown()) {
            // 2. Create the request and queue it
            RequestFuture<Void> future = sendGroupCoordinatorRequest();
            // 3. Send the request, blocking until the response has been handled
            client.poll(future);

            if (future.failed()) {
                // 4. On a retriable failure, block until the cluster metadata recorded in Metadata is refreshed
                if (future.isRetriable())
                    client.awaitMetadataUpdate();
                else
                    throw future.exception();
            } else if (coordinator != null && client.connectionFailed(coordinator)) {
                // 5. The GroupCoordinator was found but cannot be reached: back off, then retry
                coordinatorDead();
                time.sleep(retryBackoffMs);
            }

        }
    }
    
    private RequestFuture<Void> sendGroupCoordinatorRequest() {
        // initiate the group metadata request
        // Pick the node with the fewest unacknowledged in-flight requests and treat it as the least loaded.
        Node node = this.client.leastLoadedNode();
        if (node == null) {
            // No node is available.
            return RequestFuture.noBrokersAvailable();
        } else {
            // create a group  metadata request
            log.debug("Sending coordinator request for group {} to broker {}", groupId, node);
            // Create the GroupCoordinatorRequest and send it
            GroupCoordinatorRequest metadataRequest = new GroupCoordinatorRequest(this.groupId);
            return client.send(node, ApiKeys.GROUP_COORDINATOR, metadataRequest)
                    .compose(new RequestFutureAdapter<ClientResponse, Void>() {
                        @Override
                        public void onSuccess(ClientResponse response, RequestFuture<Void> future) {
                            // Handle the group metadata response (the GroupCoordinatorResponse)
                            handleGroupMetadataResponse(response, future);
                        }
                    });
        }
    }
}
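
The idea behind leastLoadedNode() can be sketched as follows. This is a simplified illustration under stated assumptions, not the NetworkClient source: LeastLoadedSketch is a hypothetical class, the in-flight counts are supplied as a plain map, and the real client may weigh other factors such as connection state.

```java
import java.util.Map;

/** Hypothetical sketch of "least loaded node" selection (simplified; not NetworkClient's code). */
final class LeastLoadedSketch {
    /**
     * @param inFlightByNode node id -> number of requests sent but not yet acknowledged
     * @return the id of the node with the fewest in-flight requests, or null if there are no nodes
     */
    static Integer leastLoadedNode(Map<Integer, Integer> inFlightByNode) {
        Integer best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> e : inFlightByNode.entrySet()) {
            // Keep the first node seen, then replace it only by a strictly less loaded one
            if (best == null || e.getValue() < bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

A null result corresponds to the RequestFuture.noBrokersAvailable() branch in the code above.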

handleGroupMetadataResponse handles the response to a GroupCoordinatorRequest:

private void handleGroupMetadataResponse(ClientResponse resp, RequestFuture<Void> future) {
    log.debug("Received group coordinator response {}", resp);
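    // Call coordinatorUnknown() to check whether the GroupCoordinator has already been found and connected.
    // If so, ignore this GroupCoordinatorResponse: nothing prevents a GroupCoordinatorRequest from being
    // sent more than once, so several GroupCoordinatorResponses may arrive.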
    if (!coordinatorUnknown()) {
        // We already found the coordinator, so ignore the request
        future.complete(null);
    } else {
        // Parse the GroupCoordinatorResponse
        GroupCoordinatorResponse groupCoordinatorResponse = new GroupCoordinatorResponse(resp.responseBody());
        // use MAX_VALUE - node.id as the coordinator id to mimic separate connections
        // for the coordinator in the underlying network client layer
        // TODO: this needs to be better handled in KAFKA-1935
        Errors error = Errors.forCode(groupCoordinatorResponse.errorCode());
        if (error == Errors.NONE) {
            // Create the Node object that represents the GroupCoordinator
            this.coordinator = new Node(Integer.MAX_VALUE - groupCoordinatorResponse.node().id(),
                    groupCoordinatorResponse.node().host(),
                    groupCoordinatorResponse.node().port());

            log.info("Discovered coordinator {} for group {}.", coordinator, groupId);
            // Try to connect to the GroupCoordinator
            client.tryConnect(coordinator);

            // Restart the heartbeat task, but only if we already have a valid generation
            if (generation > 0)
                heartbeatTask.reset();
            // Complete the future so that the result is propagated to its listeners
            future.complete(null);
        } else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
            future.raise(new GroupAuthorizationException(groupId));
        } else {
            // Any other error code: propagate the exception; step 4 of ensureCoordinatorReady() eventually handles it.
            future.raise(error);
        }
    }
}
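
The id arithmetic used when the coordinator Node is created can be shown in isolation. The class and method names below are hypothetical. The expression itself is the one from the code above, and the stated purpose ("mimic separate connections") comes from the source comment.

```java
/** Sketch of the id mapping applied to the coordinator Node in handleGroupMetadataResponse. */
final class CoordinatorIdSketch {
    /** Mirrors the expression Integer.MAX_VALUE - node.id() from the code above. */
    static int coordinatorConnectionId(int brokerId) {
        return Integer.MAX_VALUE - brokerId;
    }
}
```

Because the resulting id differs from the broker's own id for any realistic broker id, a network layer that keys connections by node id treats coordinator traffic as a connection of its own, even though host and port are identical.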

Phase 2
Once the GroupCoordinator has been found, the consumer enters the Join Group phase: it sends a JoinGroupRequest to the GroupCoordinator and handles the response. The entry point is ensurePartitionAssignment().

public final class ConsumerCoordinator extends AbstractCoordinator {
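    // Note: ensureFreshMetadata() is the method invoked below as client.ensureFreshMetadata(); it is shown here for reference.
    public void ensureFreshMetadata() {
        // Refresh Metadata if an update has been requested (needUpdate is true) or the metadata has not been updated for a long time
        if (this.metadata.updateRequested() || this.metadata.timeToNextUpdate(time.milliseconds()) == 0)
            awaitMetadataUpdate(); // blocks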
    }
    public void ensurePartitionAssignment() {
        if (subscriptions.partitionsAutoAssigned()) {
            // Due to a race condition between the initial metadata fetch and the initial rebalance, we need to ensure that
            // the metadata is fresh before joining initially, and then request the metadata update. If metadata update arrives
            // while the rebalance is still pending (for example, when the join group is still inflight), then we will lose
            // track of the fact that we need to rebalance again to reflect the change to the topic subscription. Without
            // ensuring that the metadata is fresh, any metadata update that changes the topic subscriptions and arrives with a
            // rebalance in progress will essentially be ignored. See KAFKA-3949 for the complete description of the problem.
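            // The ConsumerCoordinator constructor registers a listener on Metadata. Whenever Metadata is updated, the listener
            // filters topics with the regex pattern held in SubscriptionState and updates the SubscriptionState accordingly.
            // Refreshing here avoids rebalancing with stale Metadata, which would lead to further rebalances.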
            if (subscriptions.hasPatternSubscription())
                client.ensureFreshMetadata();

            ensureActiveGroup();
        }
    }
    
    public void ensureActiveGroup() {
        // Check whether a JoinGroupRequest needs to be sent. needRejoin() is implemented as:
        /*
            // Only in AUTO_TOPICS or AUTO_PATTERN mode: check the rejoinNeeded and needsPartitionAssignment fields.
            return subscriptions.partitionsAutoAssigned() &&
                (super.needRejoin() || subscriptions.partitionAssignmentNeeded());
        */
        if (!needRejoin())
            return;

        if (needsJoinPrepare) {
            // Preparation before sending the JoinGroupRequest
            onJoinPrepare(generation, memberId);
            needsJoinPrepare = false;
        }

        while (needRejoin()) {
            // Check the GroupCoordinator state
            ensureCoordinatorReady();

            // If requests to the GroupCoordinator's node are still pending, block until they complete.
            // This avoids sending a duplicate JoinGroupRequest.
            if (client.pendingRequestCount(this.coordinator) > 0) {
                client.awaitPendingRequests(this.coordinator);
                continue;
            }
            // Create the request and queue it in unsent
            RequestFuture<ByteBuffer> future = sendJoinGroupRequest();
            // Register a listener
            future.addListener(new RequestFutureListener<ByteBuffer>() {
                @Override
                public void onSuccess(ByteBuffer value) {
                    // handle join completion in the callback so that the callback will be invoked
                    // even if the consumer is woken up before finishing the rebalance
                    onJoinComplete(generation, memberId, protocol, value);
                    needsJoinPrepare = true;
                    heartbeatTask.reset();
                }

                @Override
                public void onFailure(RuntimeException e) {
                    // we handle failures below after the request finishes. if the join completes
                    // after having been woken up, the exception is ignored and we will rejoin
                }
            });
            // Block until the JoinGroupRequest completes
            client.poll(future);

            if (future.failed()) {
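                // Error handling: rejoin at once for the three exceptions below, rethrow non-retriable ones, otherwise back off and retry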
                RuntimeException exception = future.exception();
                if (exception instanceof UnknownMemberIdException ||
                        exception instanceof RebalanceInProgressException ||
                        exception instanceof IllegalGenerationException)
                    continue;
                else if (!future.isRetriable())
                    throw exception;
                time.sleep(retryBackoffMs);
            }
        }
    }
    // Preparation before sending the JoinGroupRequest
    protected void onJoinPrepare(int generation, String memberId) {
        // If auto-commit is enabled, commit offsets once synchronously (blocking).
        maybeAutoCommitOffsetsSync();

        // Invoke the callback on the ConsumerRebalanceListener registered in SubscriptionState
        ConsumerRebalanceListener listener = subscriptions.listener();
        log.info("Revoking previously assigned partitions {} for group {}", subscriptions.assignedPartitions(), groupId);
        try {
            Set<TopicPartition> revoked = new HashSet<>(subscriptions.assignedPartitions());
            listener.onPartitionsRevoked(revoked);
        } catch (WakeupException e) {
            throw e;
        } catch (Exception e) {
            log.error("User provided listener {} for group {} failed on partition revocation",
                    listener.getClass().getName(), groupId, e);
        }

        assignmentSnapshot = null;
        // Set needsPartitionAssignment to true.
        subscriptions.needReassignment();
    }
}
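
The failure handling at the end of ensureActiveGroup() has three outcomes, which the sketch below separates into a small decision function. Everything here is hypothetical scaffolding for illustration: the nested exception classes are stand-ins for the Kafka exception types named in the code above, and Retriable plays the role of the check done by future.isRetriable().

```java
/** Hypothetical sketch of the failure handling at the end of ensureActiveGroup(). */
final class JoinRetrySketch {
    enum Action { REJOIN_IMMEDIATELY, BACKOFF_THEN_RETRY, RETHROW }

    // Stand-ins for the Kafka exception types used in the code above.
    static class UnknownMemberId extends RuntimeException {}
    static class RebalanceInProgress extends RuntimeException {}
    static class IllegalGeneration extends RuntimeException {}
    static class Retriable extends RuntimeException {}

    static Action classify(RuntimeException e) {
        // These three mean the group state moved on: loop again without sleeping
        if (e instanceof UnknownMemberId
                || e instanceof RebalanceInProgress
                || e instanceof IllegalGeneration)
            return Action.REJOIN_IMMEDIATELY;
        // Anything that is not retriable is surfaced to the caller
        if (!(e instanceof Retriable))
            return Action.RETHROW;
        // Retriable errors sleep for the retry backoff before the next attempt
        return Action.BACKOFF_THEN_RETRY;
    }
}
```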

After the JoinGroupRequest is sent, the response is handled as follows:

private class JoinGroupResponseHandler extends CoordinatorResponseHandler<JoinGroupResponse, ByteBuffer> {

    @Override
    public JoinGroupResponse parse(ClientResponse response) {
        return new JoinGroupResponse(response.responseBody());
    }

    @Override
    public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> future) {
        Errors error = Errors.forCode(joinResponse.errorCode());
        if (error == Errors.NONE) {
            // Parse the JoinGroupResponse and update local state
            log.debug("Received successful join group response for group {}: {}", groupId, joinResponse.toStruct());
            AbstractCoordinator.this.memberId = joinResponse.memberId();
            AbstractCoordinator.this.generation = joinResponse.generationId();
            // Clear the rejoin flag
            AbstractCoordinator.this.rejoinNeeded = false;
            AbstractCoordinator.this.protocol = joinResponse.groupProtocol();
            sensors.joinLatency.record(response.requestLatencyMs());
            // The consumer uses the leaderId to decide whether it is the leader:
            // the leader goes to onJoinLeader(), every other member to onJoinFollower().
            // onJoinFollower() does a subset of what onJoinLeader() does.
            if (joinResponse.isLeader()) {
                onJoinLeader(joinResponse).chain(future);
            } else {
                onJoinFollower().chain(future);
            }
        } else if (error == Errors.GROUP_LOAD_IN_PROGRESS) {
            log.debug("Attempt to join group {} rejected since coordinator {} is loading the group.", groupId,
                    coordinator);
            // backoff and retry
            future.raise(error);
        } else if (error == Errors.UNKNOWN_MEMBER_ID) {
            // reset the member id and retry immediately
            AbstractCoordinator.this.memberId = JoinGroupRequest.UNKNOWN_MEMBER_ID;
            log.debug("Attempt to join group {} failed due to unknown member id.", groupId);
            future.raise(Errors.UNKNOWN_MEMBER_ID);
        } else if (error == Errors.GROUP_COORDINATOR_NOT_AVAILABLE
                || error == Errors.NOT_COORDINATOR_FOR_GROUP) {
            // re-discover the coordinator and retry with backoff
            coordinatorDead();
            log.debug("Attempt to join group {} failed due to obsolete coordinator information: {}", groupId, error.message());
            future.raise(error);
        } else if (error == Errors.INCONSISTENT_GROUP_PROTOCOL
                || error == Errors.INVALID_SESSION_TIMEOUT
                || error == Errors.INVALID_GROUP_ID) {
            // log the error and re-throw the exception
            log.error("Attempt to join group {} failed due to fatal error: {}", groupId, error.message());
            future.raise(error);
        } else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
            future.raise(new GroupAuthorizationException(groupId));
        } else {
            // unexpected error, throw the exception
            future.raise(new KafkaException("Unexpected error in join group response: " + error.message()));
        }
    }
    // Implementation of onJoinLeader
    private RequestFuture<ByteBuffer> onJoinLeader(JoinGroupResponse joinResponse) {
        try {
            // Perform the partition assignment, whose result is sent to the GroupCoordinator. This is the only step onJoinFollower lacks.
            Map<String, ByteBuffer> groupAssignment = performAssignment(joinResponse.leaderId(), joinResponse.groupProtocol(),
                    joinResponse.members());
            // Create and send the SyncGroupRequest. Followers and the leader both do this, then each handles its own response.
            SyncGroupRequest request = new SyncGroupRequest(groupId, generation, memberId, groupAssignment);
            log.debug("Sending leader SyncGroup for group {} to coordinator {}: {}", groupId, this.coordinator, request);
            return sendSyncGroupRequest(request);
        } catch (RuntimeException e) {
            return RequestFuture.failure(e);
        }
    }
    
    protected Map<String, ByteBuffer> performAssignment(String leaderId,
                                                        String assignmentStrategy,
                                                        Map<String, ByteBuffer> allSubscriptions) {
        // Look up the PartitionAssignor used for partition assignment
        PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
        if (assignor == null)
            throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);
        // Deserialize each member's subscription
        Set<String> allSubscribedTopics = new HashSet<>();
        Map<String, Subscription> subscriptions = new HashMap<>();
        for (Map.Entry<String, ByteBuffer> subscriptionEntry : allSubscriptions.entrySet()) {
            Subscription subscription = ConsumerProtocol.deserializeSubscription(subscriptionEntry.getValue());
            subscriptions.put(subscriptionEntry.getKey(), subscription);
            allSubscribedTopics.addAll(subscription.topics());
        }

        // the leader will begin watching for changes to any of the topics the group is interested in,
        // which ensures that all metadata changes will eventually be seen
        // The leader must watch every topic subscribed to by any member of the group;
        // a follower only needs to watch the topics it subscribes to itself.
        this.subscriptions.groupSubscribe(allSubscribedTopics);
        metadata.setTopics(this.subscriptions.groupSubscription());

        // update metadata (if needed) and keep track of the metadata used for assignment so that
        // we can check after rebalance completion whether anything has changed
        // New topics may have appeared during the steps above, so refresh the metadata
        client.ensureFreshMetadata();
        // Record the snapshot
        assignmentSnapshot = metadataSnapshot;

        log.debug("Performing assignment for group {} using strategy {} with subscriptions {}",
                groupId, assignor.name(), subscriptions);
        // Perform the partition assignment
        Map<String, Assignment> assignment = assignor.assign(metadata.fetch(), subscriptions);

        log.debug("Finished assignment for group {}: {}", groupId, assignment);
        // Serialize the result into the map to return: the key is the consumer's memberId, the value is the serialized assignment as a ByteBuffer
        Map<String, ByteBuffer> groupAssignment = new HashMap<>();
        for (Map.Entry<String, Assignment> assignmentEntry : assignment.entrySet()) {
            ByteBuffer buffer = ConsumerProtocol.serializeAssignment(assignmentEntry.getValue());
            groupAssignment.put(assignmentEntry.getKey(), buffer);
        }

        return groupAssignment;
    }
}
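
To make the assignor.assign() step more concrete, here is a minimal range-style assignment for a single topic. It is a sketch written for this article, not the source of any Kafka PartitionAssignor: RangeAssignSketch is a hypothetical class, partitions are plain integers, and subscriptions are ignored (every member is assumed to subscribe to the one topic).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified range-style assignment for a single topic (a sketch, not Kafka's assignor source). */
final class RangeAssignSketch {
    static Map<String, List<Integer>> assign(List<String> memberIds, int numPartitions) {
        List<String> sorted = new ArrayList<>(memberIds);
        Collections.sort(sorted);                       // same order no matter who computes it
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (sorted.isEmpty())
            return result;
        int perMember = numPartitions / sorted.size();  // every member gets at least this many
        int extra = numPartitions % sorted.size();      // the first `extra` members get one more
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = perMember + (i < extra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int p = 0; p < count; p++)
                partitions.add(next++);
            result.put(sorted.get(i), partitions);
        }
        return result;
    }
}
```

With members c1, c2, c3 and 7 partitions, c1 receives partitions 0 to 2, c2 receives 3 and 4, and c3 receives 5 and 6. Only the leader runs this computation; the result then travels to the other members through the GroupCoordinator.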

Phase 3
Once partition assignment is complete, the consumer enters the Synchronizing Group State phase. The main logic is to send a SyncGroupRequest to the GroupCoordinator and handle the SyncGroupResponse.

private class SyncGroupResponseHandler extends CoordinatorResponseHandler<SyncGroupResponse, ByteBuffer> {

    @Override
    public SyncGroupResponse parse(ClientResponse response) {
        return new SyncGroupResponse(response.responseBody());
    }

    @Override
    public void handle(SyncGroupResponse syncResponse,
                       RequestFuture<ByteBuffer> future) {
        Errors error = Errors.forCode(syncResponse.errorCode());
        if (error == Errors.NONE) {
            log.info("Successfully joined group {} with generation {}", groupId, generation);
            sensors.syncLatency.record(response.requestLatencyMs());
            // Complete the future to propagate the partition assignment result
            future.complete(syncResponse.memberAssignment());
        } else {
            // An error occurred: set rejoinNeeded = true
            AbstractCoordinator.this.rejoinNeeded = true;
            if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
                future.raise(new GroupAuthorizationException(groupId));
            } else if (error == Errors.REBALANCE_IN_PROGRESS) {
                log.debug("SyncGroup for group {} failed due to coordinator rebalance", groupId);
                // Propagate the exception
                future.raise(error);
            } else if (error == Errors.UNKNOWN_MEMBER_ID
                    || error == Errors.ILLEGAL_GENERATION) {
                log.debug("SyncGroup for group {} failed due to {}", groupId, error);
                AbstractCoordinator.this.memberId = JoinGroupRequest.UNKNOWN_MEMBER_ID;
                future.raise(error);
            } else if (error == Errors.GROUP_COORDINATOR_NOT_AVAILABLE
                    || error == Errors.NOT_COORDINATOR_FOR_GROUP) {
                log.debug("SyncGroup for group {} failed due to {}", groupId, error);
                coordinatorDead();
                future.raise(error);
            } else {
                future.raise(new KafkaException("Unexpected error from SyncGroup: " + error.message()));
            }
        }
    }
}
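
The SyncGroup exchange can be modelled in memory as follows. This is a hypothetical sketch, not the broker's implementation. It assumes that followers send an empty assignment map while the leader sends the full one, and it represents each member's assignment as a plain string. A real coordinator is expected to hold a follower's response until the leader's assignment arrives, whereas this sketch simply returns null in that case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the SyncGroup exchange (not the broker's implementation). */
final class SyncGroupSketch {
    private final String leaderId;
    // memberId -> assignment; stays null until the leader has synced
    private Map<String, String> assignmentByMember = null;

    SyncGroupSketch(String leaderId) {
        this.leaderId = leaderId;
    }

    /**
     * Followers pass an empty map; only the leader's map is stored.
     * Returns the caller's own assignment, or null if the leader has not synced yet.
     */
    String sync(String memberId, Map<String, String> groupAssignment) {
        if (memberId.equals(leaderId))
            assignmentByMember = new HashMap<>(groupAssignment);
        return assignmentByMember == null ? null : assignmentByMember.get(memberId);
    }
}
```

Each member receives only its own slice of the leader's plan, which matches how the handler above completes the future with syncResponse.memberAssignment().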

The partition assignment carried in the SyncGroupResponse is ultimately handled by ConsumerCoordinator.onJoinComplete(), which is invoked by the RequestFutureListener registered in ensureActiveGroup() during phase 2.

protected void onJoinComplete(int generation,
                              String memberId,
                              String assignmentStrategy,
                              ByteBuffer assignmentBuffer) {
    // if we were the assignor, then we need to make sure that there have been no metadata updates
    // since the rebalance begin. Otherwise, we won't rebalance again until the next metadata change
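    // Before assigning partitions, the leader recorded a snapshot of Metadata in the assignmentSnapshot field.
    // Here the leader compares that snapshot with the latest metadata snapshot. If they differ, topics changed
    // during the assignment, so needReassignment is set to true and the rebalance is run again.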
    if (assignmentSnapshot != null && !assignmentSnapshot.equals(metadataSnapshot)) {
        subscriptions.needReassignment();
        return;
    }
    // Look up the partition assignment strategy in use
    PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
    if (assignor == null)
        throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);
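    // Deserialize the partitions assigned to this consumer and add them to the SubscriptionState.assignment collection,
    // which determines the partitions this consumer consumes from now on; needsPartitionAssignment is set to false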
    Assignment assignment = ConsumerProtocol.deserializeAssignment(assignmentBuffer);
    // Flag that the most recently committed offsets should be fetched from the server
    subscriptions.needRefreshCommits();
    // Populate the assignment collection
    subscriptions.assignFromSubscribed(assignment.partitions());

    // onAssignment callback: does nothing by default; a custom assignor can override it.
    assignor.onAssignment(assignment);

    // Schedule the periodic offset auto-commit task
    if (autoCommitEnabled)
        autoCommitTask.reschedule();

    // Invoke the ConsumerRebalanceListener callback
    ConsumerRebalanceListener listener = subscriptions.listener();
    log.info("Setting newly assigned partitions {} for group {}", subscriptions.assignedPartitions(), groupId);
    try {
        Set<TopicPartition> assigned = new HashSet<>(subscriptions.assignedPartitions());
        listener.onPartitionsAssigned(assigned);
    } catch (WakeupException e) {
        throw e;
    } catch (Exception e) {
        log.error("User provided listener {} for group {} failed on partition assignment",
                listener.getClass().getName(), groupId, e);
    }
    // Prepare for the next rebalance.
    needsJoinPrepare = true;
    // Restart the periodic heartbeat task
    heartbeatTask.reset();
}
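
The snapshot check at the top of onJoinComplete() can be sketched on its own. The class below is hypothetical, and it models a metadata snapshot as a map from topic name to partition count, which is an assumption made for this illustration rather than the structure Kafka uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the snapshot check at the top of onJoinComplete(). */
final class SnapshotCheckSketch {
    /** Captures topic -> partition count at a point in time (a defensive copy). */
    static Map<String, Integer> snapshot(Map<String, Integer> partitionCounts) {
        return new HashMap<>(partitionCounts);
    }

    /**
     * True when the leader assigned from metadata that has since changed.
     * Followers never record a snapshot (null), so they never trigger this path.
     */
    static boolean needsReassignment(Map<String, Integer> assignmentSnapshot,
                                     Map<String, Integer> current) {
        return assignmentSnapshot != null && !assignmentSnapshot.equals(current);
    }
}
```

If a topic gains a partition between the leader's assignment and the arrival of the SyncGroupResponse, the comparison fails and the consumer goes through another rebalance instead of using an assignment based on stale metadata.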

 
