Kafka Consumer Source Code Analysis: The Rebalance Process and Partition Assignment

Latest recommended article published on 2023-08-25 11:11:44

方片龙

Category: kafka. Tags: kafka, java, distributed systems, spring, spring boot

Article link: https://blog.csdn.net/qq_38245668/article/details/106108190


Preface
1. Analysis of the rebalance process
- 1.1 Process summary
- 1.2 Code analysis
2. Partition assignment for consumers

References for this article:
Reference 1: https://www.cnblogs.com/benfly/p/9605976.html

Preface

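After the listeners are registered and started, each KafkaListener starts several consumer threads to pull data. These consumers first join the corresponding Kafka consumer group, which triggers a rebalance. The consumer client then determines the partition assignment of every consumer, and only after that does consumption begin.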

1. Analysis of the rebalance process

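Rebalance is essentially a protocol that specifies how all consumers in a consumer group reach agreement on how to divide up the partitions of the topics they subscribe to. For example, suppose a group has 20 consumers and subscribes to a topic with 100 partitions. Under normal conditions each consumer is assigned 5 partitions on average. This assignment process is called a rebalance.
A rebalance is triggered when: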

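- Group membership changes: a new consumer joins the group, an existing consumer leaves voluntarily, or an existing consumer crashes.
- The set of subscribed topics changes. This does happen in practice: if you subscribe with a regular expression, creating a new topic that matches the expression triggers a rebalance.
- The number of partitions of a subscribed topic changes.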
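
As a rough illustration of the second trigger, the sketch below models a pattern subscription with `java.util.regex`. The class and method names are hypothetical and are not part of Kafka's API; the real client re-evaluates the pattern against the cluster metadata it fetches from the brokers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Toy model (not Kafka code): a pattern subscription is re-evaluated against the
// topic list, and a change in the matched set means the group has to rebalance.
class PatternSubscriptionSketch {
    // Collect the topics whose full name matches the subscription pattern.
    static Set<String> matchedTopics(Pattern pattern, List<String> clusterTopics) {
        Set<String> matched = new TreeSet<>();
        for (String topic : clusterTopics) {
            if (pattern.matcher(topic).matches()) {
                matched.add(topic);
            }
        }
        return matched;
    }

    // True when the matched topics differ between two metadata snapshots.
    static boolean subscriptionChanged(Pattern pattern, List<String> before, List<String> after) {
        return !matchedTopics(pattern, before).equals(matchedTopics(pattern, after));
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("orders-.*");
        List<String> before = Arrays.asList("orders-eu", "payments");
        List<String> after = Arrays.asList("orders-eu", "orders-us", "payments");
        System.out.println(subscriptionChanged(pattern, before, after)); // true
    }
}
```

Creating a topic that does not match the pattern (for example `payments-v2`) leaves the matched set unchanged, so in this model it would not count as a subscription change.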

This article analyzes the rebalance for the case where a new consumer joins the group.

1.1 Process summary

The rebalance process:

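1. Kafka detects that a new consumer has joined and triggers a rebalance.
2. Kafka answers the consumers' heartbeat requests with a REBALANCE_IN_PROGRESS response.
3. When a consumer's heartbeat receives REBALANCE_IN_PROGRESS, the consumer updates its state and signals its main consuming thread to rejoin the group.
4. The main consuming thread performs the join. At this point every consumer that needs to rejoin sends a JoinGroup request.
5. Until Kafka has collected the requests of all member consumers, it parks the requests it has already received in a place called the purgatory.
6. Once all consumers have sent their requests, Kafka picks one consumer as the group leader, which will perform the partition assignment. Kafka then puts the member list (with each member's subscription metadata), the leader's id, and the selected assignment protocol into the JoinGroup response. The partition metadata itself comes from the leader's own cluster metadata, as performAssignment shows later.
7. When the consumer client receives the JoinGroup response, it checks whether it is the leader. If it is, it computes the partition assignment and sends the result to Kafka; if it is not, it still sends a request to Kafka, but with an empty assignment.
8. After the consumers have sent these requests, Kafka puts each consumer's share of the assignment into that consumer's response. Each consumer saves the assignment it receives and uses it for subsequent fetches.
9. The rebalance is finished.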
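
The steps above can be sketched as a small in-memory model. This is a toy under stated assumptions, not Kafka code: `ToyGroupCoordinator` and all of its methods are hypothetical, the leader is simply the first member to join, and the real coordinator additionally delays each SyncGroup response until the leader's assignment has arrived.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the JoinGroup / SyncGroup exchange (not Kafka code).
class ToyGroupCoordinator {
    private final List<String> pendingJoins = new ArrayList<>(); // stands in for the purgatory
    private Map<String, List<Integer>> assignments = new LinkedHashMap<>();
    private int generation = 0;

    // A member's join request is parked until every expected member has joined.
    void join(String memberId) {
        pendingJoins.add(memberId);
    }

    // Once all joins are in, pick a leader (here: the first joiner) and bump the generation.
    String completeJoin() {
        generation++;
        return pendingJoins.get(0);
    }

    List<String> members() { return new ArrayList<>(pendingJoins); }

    int generation() { return generation; }

    // The leader's sync request carries the full assignment; followers send an empty map,
    // which this model ignores.
    void sync(String memberId, Map<String, List<Integer>> proposed) {
        if (!proposed.isEmpty()) {
            assignments = new LinkedHashMap<>(proposed);
        }
    }

    // Each member's sync response contains only that member's partitions.
    List<Integer> assignmentFor(String memberId) {
        return assignments.getOrDefault(memberId, new ArrayList<>());
    }

    // Leader-side assignment: deal the partitions out one at a time.
    static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String member : members) {
            result.put(member, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            result.get(members.get(p % members.size())).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        ToyGroupCoordinator coordinator = new ToyGroupCoordinator();
        coordinator.join("c1");
        coordinator.join("c2");
        String leader = coordinator.completeJoin();
        coordinator.sync(leader, assign(coordinator.members(), 4)); // leader sends the assignment
        coordinator.sync("c2", new LinkedHashMap<>());              // follower sends nothing
        System.out.println(coordinator.assignmentFor("c1")); // [0, 2]
        System.out.println(coordinator.assignmentFor("c2")); // [1, 3]
    }
}
```

The point of the model is the division of labour: the coordinator only collects members and relays data, while the assignment itself is computed on the client that was chosen as leader.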

1.2 Code analysis

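The previous article, "Kafka Consumer Source Code Analysis: Listener Registration and Startup", covered how consumers are registered and started. Once started, the consumers actively join the group, which triggers a rebalance.

After the rebalance is triggered, the heartbeat responses of all consumers return REBALANCE_IN_PROGRESS, and the clients begin to carry out the rebalance.

Looking at the heartbeat thread, the run method of HeartbeatThread calls sendHeartbeatRequest() to send the heartbeat, and the response is handled by HeartbeatResponseHandler. When the response is REBALANCE_IN_PROGRESS, the handler calls requestRejoin, which only marks the current consumer as needing to rejoin the group. It does not perform the join itself.

The consumer's main thread checks whether it needs to rejoin the group. The check is made in the poll method of ConsumerCoordinator, which calls rejoinNeededOrPending. If a rejoin is needed, poll calls ensureActiveGroup, which calls joinGroupIfNeeded (join the group only when required). That method calls initiateJoinGroup to start the join, and initiateJoinGroup calls sendJoinGroupRequest to send the JoinGroup request to Kafka. The response is handled by JoinGroupResponseHandler.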
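
The split between "the heartbeat thread marks, the main thread acts" can be sketched as follows. This is a toy model with hypothetical names (`RejoinFlagSketch`, `onHeartbeatResponse`, `poll`), not the code of AbstractCoordinator, and the join itself is reduced to a counter.

```java
// Toy model (not Kafka code) of how the heartbeat thread and the polling thread cooperate.
class RejoinFlagSketch {
    enum HeartbeatError { NONE, REBALANCE_IN_PROGRESS }

    private boolean rejoinNeeded = false;
    private int joinAttempts = 0;

    // Called from the heartbeat thread: only records that a rejoin is required.
    synchronized void onHeartbeatResponse(HeartbeatError error) {
        if (error == HeartbeatError.REBALANCE_IN_PROGRESS) {
            rejoinNeeded = true;
        }
    }

    // Called from the polling thread: performs the (simulated) join if the flag is set.
    synchronized boolean poll() {
        if (!rejoinNeeded) {
            return false;
        }
        joinAttempts++;       // stands in for sending JoinGroup and SyncGroup
        rejoinNeeded = false; // cleared once the simulated join has succeeded
        return true;
    }

    synchronized int joinAttempts() {
        return joinAttempts;
    }

    public static void main(String[] args) throws InterruptedException {
        RejoinFlagSketch coordinator = new RejoinFlagSketch();
        Thread heartbeat = new Thread(
                () -> coordinator.onHeartbeatResponse(HeartbeatError.REBALANCE_IN_PROGRESS));
        heartbeat.start();
        heartbeat.join();
        System.out.println(coordinator.poll()); // true: the flag was set, so the member rejoins
        System.out.println(coordinator.poll()); // false: nothing to do until the next rebalance
    }
}
```

Keeping the network round trip out of the heartbeat thread means the join always runs on the thread that calls poll, which is also the thread that later runs the user's rebalance callback.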
The JoinGroupResponseHandler implementation:

private class JoinGroupResponseHandler extends CoordinatorResponseHandler<JoinGroupResponse, ByteBuffer> {
    @Override
    public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> future) {
        Errors error = joinResponse.error();
        if (error == Errors.NONE) {
            log.debug("Received successful JoinGroup response: {}", joinResponse);
            sensors.joinLatency.record(response.requestLatencyMs());

            synchronized (AbstractCoordinator.this) {
                if (state != MemberState.REBALANCING) {
                    // if the consumer was woken up before a rebalance completes, we may have already left
                    // the group. In this case, we do not want to continue with the sync group.
                    future.raise(new UnjoinedGroupException());
                } else {
                    AbstractCoordinator.this.generation = new Generation(joinResponse.data().generationId(),
                            joinResponse.data().memberId(), joinResponse.data().protocolName());
                    // Check whether this consumer is the leader
                    if (joinResponse.isLeader()) {
                        onJoinLeader(joinResponse).chain(future);
                    } else {
                        onJoinFollower().chain(future);
                    }
                }
            }
        } else if (error == Errors.COORDINATOR_LOAD_IN_PROGRESS) {
            log.debug("Attempt to join group rejected since coordinator {} is loading the group.", coordinator());
            // backoff and retry
            future.raise(error);
        } else if (error == Errors.UNKNOWN_MEMBER_ID) {
            // reset the member id and retry immediately
            resetGeneration();
            log.debug("Attempt to join group failed due to unknown member id.");
            future.raise(Errors.UNKNOWN_MEMBER_ID);
        } else if (error == Errors.COORDINATOR_NOT_AVAILABLE
                || error == Errors.NOT_COORDINATOR) {
            // re-discover the coordinator and retry with backoff
            markCoordinatorUnknown();
            log.debug("Attempt to join group failed due to obsolete coordinator information: {}", error.message());
            future.raise(error);
        } else if (error == Errors.FENCED_INSTANCE_ID) {
            log.error("Received fatal exception: group.instance.id gets fenced");
            future.raise(error);
        } else if (error == Errors.INCONSISTENT_GROUP_PROTOCOL
                || error == Errors.INVALID_SESSION_TIMEOUT
                || error == Errors.INVALID_GROUP_ID
                || error == Errors.GROUP_AUTHORIZATION_FAILED
                || error == Errors.GROUP_MAX_SIZE_REACHED) {
            // log the error and re-throw the exception
            log.error("Attempt to join group failed due to fatal error: {}", error.message());
            if (error == Errors.GROUP_MAX_SIZE_REACHED) {
                future.raise(new GroupMaxSizeReachedException(groupId));
            } else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
                future.raise(new GroupAuthorizationException(groupId));
            } else {
                future.raise(error);
            }
        } else if (error == Errors.UNSUPPORTED_VERSION) {
            log.error("Attempt to join group failed due to unsupported version error. Please unset field group.instance.id and retry" +
                    "to see if the problem resolves");
            future.raise(error);
        } else if (error == Errors.MEMBER_ID_REQUIRED) {
            // Broker requires a concrete member id to be allowed to join the group. Update member id
            // and send another join group request in next cycle.
            synchronized (AbstractCoordinator.this) {
                AbstractCoordinator.this.generation = new Generation(OffsetCommitRequest.DEFAULT_GENERATION_ID,
                        joinResponse.data().memberId(), null);
                AbstractCoordinator.this.rejoinNeeded = true;
                AbstractCoordinator.this.state = MemberState.UNJOINED;
            }
            future.raise(Errors.MEMBER_ID_REQUIRED);
        } else {
            // unexpected error, throw the exception
            log.error("Attempt to join group failed due to unexpected error: {}", error.message());
            future.raise(new KafkaException("Unexpected error in join group response: " + error.message()));
        }
    }
}

Within this handler, the fragment

// Check whether this consumer is the leader
if (joinResponse.isLeader()) {
    onJoinLeader(joinResponse).chain(future);
} else {
    onJoinFollower().chain(future);
}

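inspects the response to decide whether this consumer is the leader. If it is, the consumer performs the partition assignment and then calls sendSyncGroupRequest to send the result to Kafka. If it is not, it calls sendSyncGroupRequest directly with an empty assignment.
The onJoinLeader implementation: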

private RequestFuture<ByteBuffer> onJoinLeader(JoinGroupResponse joinResponse) {
    try {
        // perform the leader synchronization and send back the assignment for the group
        // Compute the partition assignment
        Map<String, ByteBuffer> groupAssignment = performAssignment(joinResponse.data().leader(), joinResponse.data().protocolName(),
                joinResponse.data().members());

        // Convert the assignment into the request's data format
        List<SyncGroupRequestData.SyncGroupRequestAssignment> groupAssignmentList = new ArrayList<>();
        for (Map.Entry<String, ByteBuffer> assignment : groupAssignment.entrySet()) {
            groupAssignmentList.add(new SyncGroupRequestData.SyncGroupRequestAssignment()
                    .setMemberId(assignment.getKey())
                    .setAssignment(Utils.toArray(assignment.getValue()))
            );
        }
        // Assemble the converted assignment into a SyncGroupRequest
        SyncGroupRequest.Builder requestBuilder =
                new SyncGroupRequest.Builder(
                        new SyncGroupRequestData()
                                .setGroupId(groupId)
                                .setMemberId(generation.memberId)
                                .setGroupInstanceId(this.groupInstanceId.orElse(null))
                                .setGenerationId(generation.generationId)
                                .setAssignments(groupAssignmentList)
                );
        log.debug("Sending leader SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder);
        // Send the assignment to Kafka in the SyncGroup request
        return sendSyncGroupRequest(requestBuilder);
    } catch (RuntimeException e) {
        return RequestFuture.failure(e);
    }
}

The onJoinFollower() implementation:

private RequestFuture<ByteBuffer> onJoinFollower() {
    // send follower's sync group with an empty assignment
    SyncGroupRequest.Builder requestBuilder =
            new SyncGroupRequest.Builder(
                    new SyncGroupRequestData()
                            .setGroupId(groupId)
                            .setMemberId(generation.memberId)
                            .setGroupInstanceId(this.groupInstanceId.orElse(null))
                            .setGenerationId(generation.generationId)
                            .setAssignments(Collections.emptyList())
            );
    log.debug("Sending follower SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder);
    return sendSyncGroupRequest(requestBuilder);
}

In onJoinLeader, the call performAssignment(joinResponse.data().leader(), joinResponse.data().protocolName(), joinResponse.data().members()) is what assigns the partitions. The next section explains it.

2. Partition assignment for consumers

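In performAssignment(joinResponse.data().leader(), joinResponse.data().protocolName(), joinResponse.data().members()), joinResponse.data().protocolName() is the name of the partition assignment strategy, which is chosen on the Kafka side by the group coordinator.
Kafka ships three implementations of the partition assignment strategy interface, PartitionAssignor: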

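- RangeAssignor: works topic by topic. It divides the number of partitions in a topic by the number of consumers subscribed to that topic to get a span, then hands each consumer a contiguous block of that size. If the division leaves a remainder, the first consumers in sorted order each receive one extra partition.
- RoundRobinAssignor: sorts all consumers in the group and all partitions of all subscribed topics in lexicographic order, then assigns the partitions to the consumers one at a time in round-robin fashion. In other words, every consumer receives one partition per round, and the rounds continue until all partitions have been assigned.
- StickyAssignor: keeps the assignment as balanced as possible. Either the numbers of topic partitions assigned to consumers differ by at most one, or every consumer that has two or more fewer topic partitions than some other consumer cannot have any of those topic partitions transferred to it. When a reassignment happens, it preserves as much of the existing assignment as possible, which saves some of the overhead incurred when topic partitions move from one consumer to another.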

The purpose of partition assignment is to spread the partitions of a topic more evenly across the consumers, so that the consumption load is better balanced.
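
To make the difference between the first two strategies concrete, here is a simplified single-topic sketch. It is not Kafka's RangeAssignor or RoundRobinAssignor: `AssignorSketch` and its methods are hypothetical, partitions are plain integers, and the consumer list is assumed to be sorted already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified single-topic versions of the range and round-robin strategies (not Kafka's classes).
class AssignorSketch {
    // Range: each consumer gets a contiguous block; the first (partitions % consumers)
    // consumers get one extra partition.
    static Map<String, List<Integer>> range(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int span = partitions / consumers.size();
        int extra = partitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = span + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = next; p < next + count; p++) {
                owned.add(p);
            }
            result.put(consumers.get(i), owned);
            next += count;
        }
        return result;
    }

    // Round-robin: partitions are dealt out one at a time in consumer order.
    static Map<String, List<Integer>> roundRobin(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(consumers.get(p % consumers.size())).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> consumers = Arrays.asList("c0", "c1", "c2");
        System.out.println(range(consumers, 7));      // {c0=[0, 1, 2], c1=[3, 4], c2=[5, 6]}
        System.out.println(roundRobin(consumers, 7)); // {c0=[0, 3, 6], c1=[1, 4], c2=[2, 5]}
    }
}
```

With 20 consumers and 100 partitions, both methods give every consumer exactly 5 partitions, which matches the example at the start of section 1; the two strategies only differ in which partitions end up together.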

The performAssignment implementation:

protected Map<String, ByteBuffer> performAssignment(String leaderId,
                                                    String assignmentStrategy,
                                                    List<JoinGroupResponseData.JoinGroupResponseMember> allSubscriptions) {
    // Look up the PartitionAssignor implementation by the name of the assignment strategy
    PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
    if (assignor == null)
        throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);

    Set<String> allSubscribedTopics = new HashSet<>();
    Map<String, Subscription> subscriptions = new HashMap<>();
    // Deserialize the subscription metadata of every consumer member
    for (JoinGroupResponseData.JoinGroupResponseMember memberSubScription : allSubscriptions) {
        Subscription subscription = ConsumerProtocol.deserializeSubscription(ByteBuffer.wrap(memberSubScription.metadata()));
        subscriptions.put(memberSubScription.memberId(), subscription);
        allSubscribedTopics.addAll(subscription.topics());
    }

    // the leader will begin watching for changes to any of the topics the group is interested in,
    // which ensures that all metadata changes will eventually be seen
    updateGroupSubscription(allSubscribedTopics);

    isLeader = true;
    // Call assignor.assign(metadata.fetch(), subscriptions) to compute the partition assignment
    Map<String, Assignment> assignment = assignor.assign(metadata.fetch(), subscriptions);
    
    // The rest of the method checks the result against the subscribed topics, then serializes it for return
    Set<String> assignedTopics = new HashSet<>();
    for (Assignment assigned : assignment.values()) {
        for (TopicPartition tp : assigned.partitions())
            assignedTopics.add(tp.topic());
    }
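    if (!assignedTopics.containsAll(allSubscribedTopics)) {
        // Subscribed topics that received no partitions. The set is only computed in this excerpt;
        // the upstream source appears to log it, and that log call is not reproduced here.
        Set<String> notAssignedTopics = new HashSet<>(allSubscribedTopics);
        notAssignedTopics.removeAll(assignedTopics);
    }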

    if (!allSubscribedTopics.containsAll(assignedTopics)) {
        Set<String> newlyAddedTopics = new HashSet<>(assignedTopics);
        newlyAddedTopics.removeAll(allSubscribedTopics);

        allSubscribedTopics.addAll(assignedTopics);
        updateGroupSubscription(allSubscribedTopics);
    }
    assignmentSnapshot = metadataSnapshot;
    Map<String, ByteBuffer> groupAssignment = new HashMap<>();
    for (Map.Entry<String, Assignment> assignmentEntry : assignment.entrySet()) {
        ByteBuffer buffer = ConsumerProtocol.serializeAssignment(assignmentEntry.getValue());
        groupAssignment.put(assignmentEntry.getKey(), buffer);
    }
    // Return the serialized partition assignment
    return groupAssignment;
}

Once the leader has the partition assignment, it calls sendSyncGroupRequest to send the result to Kafka.
The sendSyncGroupRequest implementation:

private RequestFuture<ByteBuffer> sendSyncGroupRequest(SyncGroupRequest.Builder requestBuilder) {
    if (coordinatorUnknown())
        return RequestFuture.coordinatorNotAvailable();
    return client.send(coordinator, requestBuilder)
            .compose(new SyncGroupResponseHandler());
}
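
The way the sync step is chained onto the join step, so that the caller ends up with one future that completes with the assignment, can be illustrated with the JDK's CompletableFuture. This is an analogy only: Kafka's RequestFuture is a separate class with its own compose and chain methods, and `JoinThenSyncSketch` with its string payloads is hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Analogy only: CompletableFuture stands in for Kafka's RequestFuture to show the
// "chain the sync step onto the join step" shape.
class JoinThenSyncSketch {
    // Simulated JoinGroup round trip: completes with the member's generation info.
    static CompletableFuture<String> sendJoin(String memberId) {
        return CompletableFuture.completedFuture(memberId + "@generation-1");
    }

    // Simulated SyncGroup round trip: completes with the member's assignment.
    static CompletableFuture<String> sendSync(String joinResult) {
        return CompletableFuture.completedFuture("assignment-for-" + joinResult);
    }

    // The caller only sees the final future, which completes once the sync step is done.
    static CompletableFuture<String> joinGroup(String memberId) {
        return sendJoin(memberId).thenCompose(JoinThenSyncSketch::sendSync);
    }

    public static void main(String[] args) {
        System.out.println(joinGroup("c1").join()); // assignment-for-c1@generation-1
    }
}
```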

SyncGroupResponseHandler, which handles the SyncGroup response:

private class SyncGroupResponseHandler extends CoordinatorResponseHandler<SyncGroupResponse, ByteBuffer> {
    @Override
    public void handle(SyncGroupResponse syncResponse,
                       RequestFuture<ByteBuffer> future) {
        Errors error = syncResponse.error();
        if (error == Errors.NONE) {
            // The assignment was synchronized successfully
            sensors.syncLatency.record(response.requestLatencyMs());
            // Hand the partitions Kafka assigned to this consumer to the future's success path.
            // This future is the joinFuture held by AbstractCoordinator.initiateJoinGroup, which attaches listeners to it.
            // AbstractCoordinator.joinGroupIfNeeded also checks it for success and, if it succeeded, calls onJoinComplete.
            future.complete(ByteBuffer.wrap(syncResponse.data.assignment()));
        } else {
            // If the sync failed, request a rejoin
            requestRejoin();

            if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
                future.raise(new GroupAuthorizationException(groupId));
            } else if (error == Errors.REBALANCE_IN_PROGRESS) {
                log.debug("SyncGroup failed because the group began another rebalance");
                future.raise(error);
            } else if (error == Errors.FENCED_INSTANCE_ID) {
                log.error("Received fatal exception: group.instance.id gets fenced");
                future.raise(error);
            } else if (error == Errors.UNKNOWN_MEMBER_ID
                    || error == Errors.ILLEGAL_GENERATION) {
                log.debug("SyncGroup failed: {}", error.message());
                resetGeneration();
                future.raise(error);
            } else if (error == Errors.COORDINATOR_NOT_AVAILABLE
                    || error == Errors.NOT_COORDINATOR) {
                log.debug("SyncGroup failed: {}", error.message());
                markCoordinatorUnknown();
                future.raise(error);
            } else {
                future.raise(new KafkaException("Unexpected error from SyncGroup: " + error.message()));
            }
        }
    }
}

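After sendSyncGroupRequest receives a successful response, processing continues as follows:

1. The success listener that initiateJoinGroup in AbstractCoordinator attached to joinFuture runs. It updates the rejoinNeeded state and enables the heartbeat thread.
2. joinGroupIfNeeded in AbstractCoordinator checks the future for success and, if it succeeded, executes: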

if (future.succeeded()) {
    // Get the partition assignment Kafka returned for this consumer member
    ByteBuffer memberAssignment = future.value().duplicate();
    // Run the post-join processing
    onJoinComplete(generation.generationId, generation.memberId, generation.protocol, memberAssignment);

    // We reset the join group future only after the completion callback returns. This ensures
    // that if the callback is woken up, we will retry it on the next joinGroupIfNeeded.
    resetJoinGroupFuture();
    needsJoinPrepare = true;
}

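onJoinComplete applies the assignment that Kafka returned for this consumer. It deserializes the assignment, checks it against the consumer's subscription (handling a mismatch if there is one), updates the consumer's partition assignment state, and finally invokes the user's rebalance callback. A non-leader also discards its metadata snapshot, because only the leader watches for metadata changes. The implementation: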

protected void onJoinComplete(int generation,
                              String memberId,
                              String assignmentStrategy,
                              ByteBuffer assignmentBuffer) {
    // only the leader is responsible for monitoring for metadata changes (i.e. partition changes)
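    // The leader keeps the snapshot so that it can later compare the metadata used for the assignment with the
    // current metadata and request another round of assignment if they differ; other members drop the snapshot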
    if (!isLeader)
        assignmentSnapshot = null;

    PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
    if (assignor == null)
        throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);

    Assignment assignment = ConsumerProtocol.deserializeAssignment(assignmentBuffer);
    if (!subscriptions.assignFromSubscribed(assignment.partitions())) {
        handleAssignmentMismatch(assignment);
        return;
    }

    Set<TopicPartition> assignedPartitions = subscriptions.assignedPartitions();

    // The leader may have assigned partitions which match our subscription pattern, but which
    // were not explicitly requested, so we update the joined subscription here.
    maybeUpdateJoinedSubscription(assignedPartitions);

    // give the assignor a chance to update internal state based on the received assignment
    assignor.onAssignment(assignment, generation);

    // reschedule the auto commit starting from now
    if (autoCommitEnabled)
        this.nextAutoCommitTimer.updateAndReset(autoCommitIntervalMs);

    // execute the user's callback after rebalance
    ConsumerRebalanceListener listener = subscriptions.rebalanceListener();
    log.info("Setting newly assigned partitions: {}", Utils.join(assignedPartitions, ", "));
    try {
        listener.onPartitionsAssigned(assignedPartitions);
    } catch (WakeupException | InterruptException e) {
        throw e;
    } catch (Exception e) {
        log.error("User provided listener {} failed on partition assignment", listener.getClass().getName(), e);
    }
}
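
The try/catch around the callback at the end of onJoinComplete means that an ordinary exception thrown by the user's listener is logged rather than propagated. The sketch below shows that shape in isolation. `AssignedListener` is a hypothetical stand-in for ConsumerRebalanceListener reduced to a single method, partitions are plain strings, and the rethrow of WakeupException and InterruptException is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Toy model (not Kafka code) of invoking a user callback after a rebalance.
class ListenerCallbackSketch {
    // Hypothetical stand-in for ConsumerRebalanceListener, reduced to one callback.
    interface AssignedListener {
        void onPartitionsAssigned(Collection<String> partitions);
    }

    // A failing listener is recorded in the error list but does not abort the join.
    static boolean invokeListener(AssignedListener listener,
                                  Collection<String> partitions,
                                  List<String> errors) {
        try {
            listener.onPartitionsAssigned(partitions);
            return true;
        } catch (Exception e) {
            errors.add("listener failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        List<String> seen = new ArrayList<>();
        List<String> assigned = Arrays.asList("topic-0", "topic-1");

        invokeListener(seen::addAll, assigned, errors);
        invokeListener(p -> { throw new IllegalStateException("boom"); }, assigned, errors);

        System.out.println(seen);   // [topic-0, topic-1]
        System.out.println(errors); // [listener failed: boom]
    }
}
```

One consequence of this design is that a buggy listener cannot by itself keep a consumer out of the group, so any work in the callback that must succeed needs its own error handling.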
