Kafka-Consumer Source Code Analysis -- The Rebalance Process and Partition Assignment

References:
[1] https://www.cnblogs.com/benfly/p/9605976.html

Introduction

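    After a listener is registered and started, each KafkaListener starts a number of consumer threads to fetch data. Each of these consumers first joins its Kafka consumer group, which triggers a rebalance. The consumer clients then settle which partitions each consumer owns, and finally consumption begins.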

1. Analysis of the rebalance process

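    A rebalance is essentially a protocol that defines how all consumers in a consumer group agree on how to divide the partitions of the subscribed topics among themselves. For example, if a group has 20 consumers and they subscribe to a topic with 100 partitions, then normally each consumer is assigned 5 partitions (100 / 20). This process of distributing the partitions is called a rebalance.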
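A rebalance is triggered when:

  • group membership changes: a new consumer joins the group, an existing consumer leaves the group voluntarily, or an existing consumer crashes;
  • the number of subscribed topics changes: this can happen if you subscribe with a regular expression, because creating a new topic that matches the pattern triggers a rebalance;
  • the number of partitions of a subscribed topic changes.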
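The three conditions can be pictured as a comparison between two snapshots of what the group sees. The sketch below is a hypothetical model only: GroupSnapshot and RebalanceTriggers are not Kafka classes. In the real system, membership changes are detected by the coordinator on the broker, while topic and partition changes are noticed by the clients through metadata updates.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical snapshot of what a consumer group sees; not a Kafka class.
final class GroupSnapshot {
    final Set<String> memberIds;                // consumers currently in the group
    final Set<String> subscribedTopics;         // subscribed topics (e.g. resolved from a regex)
    final Map<String, Integer> partitionCounts; // topic -> number of partitions

    GroupSnapshot(Set<String> memberIds, Set<String> subscribedTopics, Map<String, Integer> partitionCounts) {
        this.memberIds = memberIds;
        this.subscribedTopics = subscribedTopics;
        this.partitionCounts = partitionCounts;
    }
}

final class RebalanceTriggers {
    // True if any of the three trigger conditions holds between two snapshots.
    static boolean needsRebalance(GroupSnapshot before, GroupSnapshot after) {
        if (!before.memberIds.equals(after.memberIds)) return true;               // join, leave or crash
        if (!before.subscribedTopics.equals(after.subscribedTopics)) return true; // topic set changed
        for (String topic : after.subscribedTopics) {                             // partition count changed
            if (!Objects.equals(before.partitionCounts.get(topic), after.partitionCounts.get(topic))) return true;
        }
        return false;
    }
}
```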

    This article analyzes the rebalance that is triggered when a new consumer joins the group.

1.1 Summary of the process

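The rebalance proceeds as follows:

  • Kafka (the group coordinator on the broker) detects that a new consumer has joined and starts a rebalance.
  • The coordinator answers the consumers' heartbeat requests with REBALANCE_IN_PROGRESS.
  • When a consumer's heartbeat receives REBALANCE_IN_PROGRESS, the consumer updates its state and signals the main consumer thread to rejoin the group.
  • The main consumer thread performs the join: every consumer that needs to rejoin sends a JoinGroup request.
  • Until it has collected the requests of all members, the coordinator parks the requests it has already received in a structure called purgatory.
  • Once all members have sent their requests, the coordinator elects one consumer as the group leader and answers every JoinGroup request. Each response carries the generation id, the chosen assignment strategy (protocol) and the leader's member id; only the leader's response also lists every member's subscription metadata, which the leader combines with its own cluster metadata (partition counts) to compute the assignment.
  • After receiving the JoinGroup response, each consumer checks whether it is the leader. The leader computes the partition assignment and sends it to the coordinator in a SyncGroup request; every other consumer (follower) sends a SyncGroup request with an empty assignment.
  • The coordinator then puts each member's share of the assignment into that member's SyncGroup response. Each consumer saves its assignment and uses it for subsequent fetches.
  • The rebalance is complete.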
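The client-side half of these steps can be sketched as a small state machine. The state names UNJOINED, REBALANCING and STABLE mirror AbstractCoordinator.MemberState in the Kafka version quoted in this article (worth checking against your version); the MemberSketch class and its methods are hypothetical.

```java
// State names mirror AbstractCoordinator.MemberState; everything else is a simplified model.
enum MemberState { UNJOINED, REBALANCING, STABLE }

final class MemberSketch {
    MemberState state = MemberState.UNJOINED;
    boolean rejoinNeeded = true;   // a fresh consumer must join
    String assignment = null;      // partitions received via SyncGroup, as a plain string for brevity

    // Heartbeat thread: coordinator answered REBALANCE_IN_PROGRESS, so only mark; do not join here.
    void onHeartbeatRebalanceInProgress() {
        rejoinNeeded = true;
    }

    // Main thread, when it sends JoinGroup.
    void sendJoinGroup() {
        if (!rejoinNeeded) throw new IllegalStateException("join not needed");
        state = MemberState.REBALANCING;
    }

    // Main thread, after a successful SyncGroup response.
    void onSyncGroupSuccess(String assignedPartitions) {
        if (state != MemberState.REBALANCING) throw new IllegalStateException("not rebalancing: " + state);
        assignment = assignedPartitions;
        state = MemberState.STABLE;
        rejoinNeeded = false;
    }
}
```

Note that the heartbeat signal leaves the state at STABLE; the member only moves to REBALANCING when the main thread actually sends JoinGroup.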

1.2 Code analysis

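    The previous article, Kafka-Consumer Source Code Analysis -- Listener Registration and Startup, explained how consumers are registered and started. After startup, each consumer actively joins its group, which triggers a rebalance.
    Once the rebalance is triggered, the heartbeat responses of all consumers return REBALANCE_IN_PROGRESS, and the clients begin the rebalance.
    The heartbeat thread's HeartbeatThread.run method sends heartbeats via sendHeartbeatRequest(), whose responses are processed by HeartbeatResponseHandler. When the response is REBALANCE_IN_PROGRESS, the handler calls requestRejoin, which only marks this consumer as needing to rejoin the group; it does not perform the join itself.
    The main consumer thread decides whether a rejoin is needed: ConsumerCoordinator.poll calls rejoinNeededOrPending. If a rejoin is needed, poll calls ensureActiveGroup, which calls joinGroupIfNeeded (join only if needed). That method calls initiateJoinGroup to start the join, and initiateJoinGroup calls sendJoinGroupRequest to send the JoinGroup request to Kafka. The response is handled by JoinGroupResponseHandler.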
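The key point is that the heartbeat thread only raises a flag and the main thread performs the actual join on its next poll. A minimal sketch of that hand-off, assuming a plain AtomicBoolean in place of Kafka's rejoinNeeded field (which is guarded by the coordinator's lock); RejoinHandoff and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the heartbeat-thread / main-thread hand-off; not Kafka code.
final class RejoinHandoff {
    private final AtomicBoolean rejoinNeeded = new AtomicBoolean(false);
    final AtomicInteger joins = new AtomicInteger(); // how many times the main thread actually rejoined

    // Heartbeat thread: the response was REBALANCE_IN_PROGRESS, so only raise the flag.
    void requestRejoin() {
        rejoinNeeded.set(true);
    }

    // Main thread: the join happens only here, standing in for ensureActiveGroup()/joinGroupIfNeeded().
    void poll() {
        if (rejoinNeeded.compareAndSet(true, false)) {
            joins.incrementAndGet();
        }
    }

    // Runs a heartbeat thread that flags a rebalance, then polls twice on the calling thread.
    static int demo() {
        RejoinHandoff coordinator = new RejoinHandoff();
        Thread heartbeat = new Thread(coordinator::requestRejoin);
        heartbeat.start();
        try {
            heartbeat.join(); // wait until the flag has been raised
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        coordinator.poll(); // sees the flag and rejoins once
        coordinator.poll(); // flag already cleared: nothing to do
        return coordinator.joins.get();
    }
}
```

compareAndSet clears the flag before joining, which is a lock-free simplification; Kafka itself resets rejoinNeeded only after the join has succeeded.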
JoinGroupResponseHandler implementation:

private class JoinGroupResponseHandler extends CoordinatorResponseHandler<JoinGroupResponse, ByteBuffer> {
    @Override
    public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> future) {
        Errors error = joinResponse.error();
        if (error == Errors.NONE) {
            log.debug("Received successful JoinGroup response: {}", joinResponse);
            sensors.joinLatency.record(response.requestLatencyMs());

            synchronized (AbstractCoordinator.this) {
                if (state != MemberState.REBALANCING) {
                    // if the consumer was woken up before a rebalance completes, we may have already left
                    // the group. In this case, we do not want to continue with the sync group.
                    future.raise(new UnjoinedGroupException());
                } else {
                    AbstractCoordinator.this.generation = new Generation(joinResponse.data().generationId(),
                            joinResponse.data().memberId(), joinResponse.data().protocolName());
                    // check whether this consumer is the group leader
                    if (joinResponse.isLeader()) {
                        onJoinLeader(joinResponse).chain(future);
                    } else {
                        onJoinFollower().chain(future);
                    }
                }
            }
        } else if (error == Errors.COORDINATOR_LOAD_IN_PROGRESS) {
            log.debug("Attempt to join group rejected since coordinator {} is loading the group.", coordinator());
            // backoff and retry
            future.raise(error);
        } else if (error == Errors.UNKNOWN_MEMBER_ID) {
            // reset the member id and retry immediately
            resetGeneration();
            log.debug("Attempt to join group failed due to unknown member id.");
            future.raise(Errors.UNKNOWN_MEMBER_ID);
        } else if (error == Errors.COORDINATOR_NOT_AVAILABLE
                || error == Errors.NOT_COORDINATOR) {
            // re-discover the coordinator and retry with backoff
            markCoordinatorUnknown();
            log.debug("Attempt to join group failed due to obsolete coordinator information: {}", error.message());
            future.raise(error);
        } else if (error == Errors.FENCED_INSTANCE_ID) {
            log.error("Received fatal exception: group.instance.id gets fenced");
            future.raise(error);
        } else if (error == Errors.INCONSISTENT_GROUP_PROTOCOL
                || error == Errors.INVALID_SESSION_TIMEOUT
                || error == Errors.INVALID_GROUP_ID
                || error == Errors.GROUP_AUTHORIZATION_FAILED
                || error == Errors.GROUP_MAX_SIZE_REACHED) {
            // log the error and re-throw the exception
            log.error("Attempt to join group failed due to fatal error: {}", error.message());
            if (error == Errors.GROUP_MAX_SIZE_REACHED) {
                future.raise(new GroupMaxSizeReachedException(groupId));
            } else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
                future.raise(new GroupAuthorizationException(groupId));
            } else {
                future.raise(error);
            }
        } else if (error == Errors.UNSUPPORTED_VERSION) {
            log.error("Attempt to join group failed due to unsupported version error. Please unset field group.instance.id and retry " +
                    "to see if the problem resolves");
            future.raise(error);
        } else if (error == Errors.MEMBER_ID_REQUIRED) {
            // Broker requires a concrete member id to be allowed to join the group. Update member id
            // and send another join group request in next cycle.
            synchronized (AbstractCoordinator.this) {
                AbstractCoordinator.this.generation = new Generation(OffsetCommitRequest.DEFAULT_GENERATION_ID,
                        joinResponse.data().memberId(), null);
                AbstractCoordinator.this.rejoinNeeded = true;
                AbstractCoordinator.this.state = MemberState.UNJOINED;
            }
            future.raise(Errors.MEMBER_ID_REQUIRED);
        } else {
            // unexpected error, throw the exception
            log.error("Attempt to join group failed due to unexpected error: {}", error.message());
            future.raise(new KafkaException("Unexpected error in join group response: " + error.message()));
        }
    }
}

The key branch in this handler is:

// check whether this consumer is the group leader
if (joinResponse.isLeader()) {
    onJoinLeader(joinResponse).chain(future);
} else {
    onJoinFollower().chain(future);
}

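Here the consumer uses the JoinGroup response to decide whether it is the leader. The leader performs the partition assignment and then calls sendSyncGroupRequest to send the result to Kafka; a follower calls sendSyncGroupRequest directly with an empty assignment.
onJoinLeader implementation: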

private RequestFuture<ByteBuffer> onJoinLeader(JoinGroupResponse joinResponse) {
    try {
        // perform the leader synchronization and send back the assignment for the group
        // compute the partition assignment
        Map<String, ByteBuffer> groupAssignment = performAssignment(joinResponse.data().leader(), joinResponse.data().protocolName(),
                joinResponse.data().members());

        // convert the assignment into SyncGroup request entries
        List<SyncGroupRequestData.SyncGroupRequestAssignment> groupAssignmentList = new ArrayList<>();
        for (Map.Entry<String, ByteBuffer> assignment : groupAssignment.entrySet()) {
            groupAssignmentList.add(new SyncGroupRequestData.SyncGroupRequestAssignment()
                    .setMemberId(assignment.getKey())
                    .setAssignment(Utils.toArray(assignment.getValue()))
            );
        }
        // build the SyncGroupRequest from the converted assignment
        SyncGroupRequest.Builder requestBuilder =
                new SyncGroupRequest.Builder(
                        new SyncGroupRequestData()
                                .setGroupId(groupId)
                                .setMemberId(generation.memberId)
                                .setGroupInstanceId(this.groupInstanceId.orElse(null))
                                .setGenerationId(generation.generationId)
                                .setAssignments(groupAssignmentList)
                );
        log.debug("Sending leader SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder);
        // send the assignment to the coordinator
        return sendSyncGroupRequest(requestBuilder);
    } catch (RuntimeException e) {
        return RequestFuture.failure(e);
    }
}

onJoinFollower() implementation:

private RequestFuture<ByteBuffer> onJoinFollower() {
    // send follower's sync group with an empty assignment
    SyncGroupRequest.Builder requestBuilder =
            new SyncGroupRequest.Builder(
                    new SyncGroupRequestData()
                            .setGroupId(groupId)
                            .setMemberId(generation.memberId)
                            .setGroupInstanceId(this.groupInstanceId.orElse(null))
                            .setGenerationId(generation.generationId)
                            .setAssignments(Collections.emptyList())
            );
    log.debug("Sending follower SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder);
    return sendSyncGroupRequest(requestBuilder);
}
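On the broker side, the coordinator cannot answer any SyncGroup request until the leader's request, the only one that carries assignments, has arrived. The following is a simplified, hypothetical model of that barrier, not the broker's GroupCoordinator code: followers that arrive early are parked, and once the leader syncs, every member receives only its own slice (an empty buffer if the leader assigned it nothing).

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the coordinator's SyncGroup barrier; not the broker's GroupCoordinator.
final class SyncGroupBarrier {
    private final String leaderId;
    private final Set<String> parked = new HashSet<>(); // members waiting for the leader's SyncGroup
    private Map<String, ByteBuffer> assignment;         // null until the leader has synced

    SyncGroupBarrier(String leaderId) {
        this.leaderId = leaderId;
    }

    // Handles one SyncGroup request; returns the responses (member -> assignment bytes) that can be sent now.
    Map<String, ByteBuffer> onSyncGroup(String memberId, Map<String, ByteBuffer> assignments) {
        Map<String, ByteBuffer> ready = new HashMap<>();
        if (memberId.equals(leaderId)) {
            assignment = new HashMap<>(assignments); // only the leader's request carries assignments
            parked.add(memberId);
            for (String member : parked) ready.put(member, sliceFor(member));
            parked.clear();
        } else if (assignment != null) {
            ready.put(memberId, sliceFor(memberId));  // leader already synced: answer at once
        } else {
            parked.add(memberId);                     // follower arrived first: park it
        }
        return ready;
    }

    // Each member gets only its own share; an empty buffer if the leader assigned it nothing.
    private ByteBuffer sliceFor(String memberId) {
        return assignment.getOrDefault(memberId, ByteBuffer.allocate(0)).duplicate();
    }
}
```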

In onJoinLeader, performAssignment(joinResponse.data().leader(), joinResponse.data().protocolName(), joinResponse.data().members()) computes the partition assignment; the next section explains it.

2. How each consumer's partitions are determined

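In performAssignment(joinResponse.data().leader(), joinResponse.data().protocolName(), joinResponse.data().members()), the argument joinResponse.data().protocolName() is the name of the partition assignment strategy. It is chosen by the coordinator on the broker from the strategies the members listed in their JoinGroup requests.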
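How the coordinator arrives at protocolName can be sketched as follows, assuming the voting scheme the broker uses: only strategies supported by every member are candidates, each member votes for its most preferred candidate, and the strategy with the most votes wins. ProtocolSelection is hypothetical, and the alphabetical tie-break is a choice made for this sketch. When the members share no strategy at all, the broker rejects the join, which corresponds to the INCONSISTENT_GROUP_PROTOCOL branch of JoinGroupResponseHandler.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of choosing the group's assignment strategy (protocolName) by member vote.
final class ProtocolSelection {
    // memberProtocols: member id -> strategy names in that member's preference order
    static String selectProtocol(Map<String, List<String>> memberProtocols) {
        // 1. Candidates are the strategies every member supports.
        Set<String> candidates = null;
        for (List<String> protocols : memberProtocols.values()) {
            if (candidates == null) candidates = new HashSet<>(protocols);
            else candidates.retainAll(protocols);
        }
        if (candidates == null || candidates.isEmpty())
            throw new IllegalStateException("members share no assignment strategy");

        // 2. Each member votes for its most preferred candidate.
        Map<String, Integer> votes = new TreeMap<>(); // sorted keys give an alphabetical tie-break (sketch choice)
        for (List<String> protocols : memberProtocols.values()) {
            for (String protocol : protocols) {
                if (candidates.contains(protocol)) {
                    votes.merge(protocol, 1, Integer::sum);
                    break;
                }
            }
        }

        // 3. The strategy with the most votes wins.
        String winner = null;
        for (Map.Entry<String, Integer> entry : votes.entrySet()) {
            if (winner == null || entry.getValue() > votes.get(winner)) winner = entry.getKey();
        }
        return winner;
    }
}
```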
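The partition assignment strategy interface, PartitionAssignor, comes with three built-in implementations:

  • RangeAssignor: works topic by topic. For each topic it sorts the consumers subscribed to it and divides the topic's partition count by the number of those consumers to get a span (partitions in the topic / consumers subscribing to the topic). Each consumer receives a contiguous range of that many partitions; if the division leaves a remainder, the first consumers each receive one extra partition.

  • RoundRobinAssignor: sorts all consumers in the group and all partitions of all subscribed topics in lexicographic order, then hands the partitions out one at a time in turn. In other words, every consumer receives one partition per round, and rounds continue until all partitions are assigned; a consumer that does not subscribe to a partition's topic is skipped.

  • StickyAssignor: keeps the assignment as balanced as possible, meaning that either the numbers of partitions assigned to the consumers differ by at most one, or each consumer that has at least 2 fewer partitions than some other consumer cannot have any of that consumer's partitions transferred to it (typically because it does not subscribe to those topics). When a reassignment happens, it also preserves as many existing assignments as possible, which saves processing overhead when partitions would otherwise move from one consumer to another.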
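To make the difference between Range and RoundRobin concrete, here is a simplified re-implementation of both, written without Kafka types: partitions are plain "topic-N" strings and AssignorSketch is hypothetical. The range method follows the per-topic span described above (including the 100 partitions / 20 consumers = 5 example from section 1); roundRobin deals sorted partitions to sorted consumers, skipping consumers that do not subscribe to the topic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical re-implementations for illustration; partitions are "topic-N" strings.
final class AssignorSketch {
    // Range: per topic, each subscriber gets partitions / subscribers; the first (partitions % subscribers) get one extra.
    static Map<String, List<String>> range(Map<String, Integer> partitionsPerTopic,
                                           Map<String, List<String>> subscriptions) {
        Map<String, List<String>> result = emptyResult(subscriptions);
        for (Map.Entry<String, Integer> topic : new TreeMap<>(partitionsPerTopic).entrySet()) {
            List<String> consumers = subscribersOf(topic.getKey(), subscriptions);
            if (consumers.isEmpty()) continue;
            int span = topic.getValue() / consumers.size();
            int extra = topic.getValue() % consumers.size();
            for (int i = 0; i < consumers.size(); i++) {
                int start = span * i + Math.min(i, extra);
                int length = span + (i < extra ? 1 : 0);
                for (int p = start; p < start + length; p++)
                    result.get(consumers.get(i)).add(topic.getKey() + "-" + p);
            }
        }
        return result;
    }

    // RoundRobin: walk all partitions in sorted order, handing each to the next consumer subscribed to its topic.
    static Map<String, List<String>> roundRobin(Map<String, Integer> partitionsPerTopic,
                                                Map<String, List<String>> subscriptions) {
        Map<String, List<String>> result = emptyResult(subscriptions);
        List<String> consumers = new ArrayList<>(result.keySet()); // sorted, because result is a TreeMap
        int next = 0;
        for (Map.Entry<String, Integer> topic : new TreeMap<>(partitionsPerTopic).entrySet()) {
            if (subscribersOf(topic.getKey(), subscriptions).isEmpty()) continue;
            for (int p = 0; p < topic.getValue(); p++) {
                while (!subscriptions.get(consumers.get(next)).contains(topic.getKey()))
                    next = (next + 1) % consumers.size(); // skip consumers not subscribed to this topic
                result.get(consumers.get(next)).add(topic.getKey() + "-" + p);
                next = (next + 1) % consumers.size();
            }
        }
        return result;
    }

    private static Map<String, List<String>> emptyResult(Map<String, List<String>> subscriptions) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String member : subscriptions.keySet()) result.put(member, new ArrayList<>());
        return result;
    }

    private static List<String> subscribersOf(String topic, Map<String, List<String>> subscriptions) {
        List<String> members = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : subscriptions.entrySet())
            if (e.getValue().contains(topic)) members.add(e.getKey());
        Collections.sort(members);
        return members;
    }
}
```

With two consumers c0 and c1 both subscribed to topics t0 and t1 (3 partitions each), range gives c0 four partitions (t0-0, t0-1, t1-0, t1-1) and c1 only two, because c0 takes the extra partition of every topic; roundRobin gives each consumer three.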

The purpose of partition assignment is to spread each topic's partitions evenly across the consumers, so that the consumption load is balanced.
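One way to make "evenly" precise is the balance criterion quoted for StickyAssignor. A minimal checker, hypothetical and stdlib-only (partitions are "topic-N" strings): an assignment counts as balanced if no consumer holding at least 2 fewer partitions than another could legally take over one of that consumer's partitions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical checker for the StickyAssignor balance criterion; not Kafka code.
final class BalanceCheck {
    // assignment: member -> partitions as "topic-N"; subscriptions: member -> subscribed topics
    static boolean isBalanced(Map<String, List<String>> assignment, Map<String, List<String>> subscriptions) {
        for (Map.Entry<String, List<String>> richer : assignment.entrySet()) {
            for (Map.Entry<String, List<String>> poorer : assignment.entrySet()) {
                if (richer.getValue().size() - poorer.getValue().size() < 2) continue;
                // poorer has 2+ fewer partitions: it must not be able to take any of richer's partitions
                for (String partition : richer.getValue()) {
                    String topic = partition.substring(0, partition.lastIndexOf('-'));
                    if (subscriptions.get(poorer.getKey()).contains(topic)) return false;
                }
            }
        }
        return true;
    }
}
```

Under this check, a 4-versus-2 split between two consumers that both subscribe to the same topics is not balanced, while a 3-versus-3 split is; a 3-versus-1 split is still balanced if the two consumers subscribe to disjoint topics.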

performAssignment implementation:

protected Map<String, ByteBuffer> performAssignment(String leaderId,
                                                    String assignmentStrategy,
                                                    List<JoinGroupResponseData.JoinGroupResponseMember> allSubscriptions) {
    // look up the PartitionAssignor implementation by the strategy name
    PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
    if (assignor == null)
        throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);

    Set<String> allSubscribedTopics = new HashSet<>();
    Map<String, Subscription> subscriptions = new HashMap<>();
    // deserialize the subscription metadata of each consumer member
    for (JoinGroupResponseData.JoinGroupResponseMember memberSubScription : allSubscriptions) {
        Subscription subscription = ConsumerProtocol.deserializeSubscription(ByteBuffer.wrap(memberSubScription.metadata()));
        subscriptions.put(memberSubScription.memberId(), subscription);
        allSubscribedTopics.addAll(subscription.topics());
    }

    // the leader will begin watching for changes to any of the topics the group is interested in,
    // which ensures that all metadata changes will eventually be seen
    updateGroupSubscription(allSubscribedTopics);

    isLeader = true;
    // run the partition assignment via assignor.assign(metadata.fetch(), subscriptions)
    Map<String, Assignment> assignment = assignor.assign(metadata.fetch(), subscriptions);
    
    // the rest checks topic coverage, then serializes the assignment and returns it
    Set<String> assignedTopics = new HashSet<>();
    for (Assignment assigned : assignment.values()) {
        for (TopicPartition tp : assigned.partitions())
            assignedTopics.add(tp.topic());
    }
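    if (!assignedTopics.containsAll(allSubscribedTopics)) {
        Set<String> notAssignedTopics = new HashSet<>(allSubscribedTopics);
        notAssignedTopics.removeAll(assignedTopics);
        // the upstream source logs notAssignedTopics (subscribed but assigned to no member) here; omitted in this excerpt
    }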

    if (!allSubscribedTopics.containsAll(assignedTopics)) {
        Set<String> newlyAddedTopics = new HashSet<>(assignedTopics);
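        newlyAddedTopics.removeAll(allSubscribedTopics);
        // the upstream source logs newlyAddedTopics (assigned but not subscribed) here; omitted in this excerpt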

        allSubscribedTopics.addAll(assignedTopics);
        updateGroupSubscription(allSubscribedTopics);
    }
    assignmentSnapshot = metadataSnapshot;
    Map<String, ByteBuffer> groupAssignment = new HashMap<>();
    for (Map.Entry<String, Assignment> assignmentEntry : assignment.entrySet()) {
        ByteBuffer buffer = ConsumerProtocol.serializeAssignment(assignmentEntry.getValue());
        groupAssignment.put(assignmentEntry.getKey(), buffer);
    }
    // return the serialized assignment, keyed by member id
    return groupAssignment;
}
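performAssignment ends by serializing each member's Assignment with ConsumerProtocol.serializeAssignment, because the coordinator treats assignments as opaque bytes and simply forwards them. The codec below illustrates the idea with a layout modeled loosely on the consumer protocol's version-0 Assignment (an int16 version, an array of topic/partition lists, nullable user data). AssignmentCodec is hypothetical and not guaranteed to be byte-compatible with Kafka, so real code should keep using ConsumerProtocol.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical codec for one member's assignment (topic -> partitions); illustration only.
final class AssignmentCodec {
    static ByteBuffer serialize(Map<String, List<Integer>> assignment) {
        int size = 2 + 4; // version + topic count
        for (Map.Entry<String, List<Integer>> e : assignment.entrySet())
            size += 2 + e.getKey().getBytes(StandardCharsets.UTF_8).length + 4 + 4 * e.getValue().size();
        size += 4; // user data length (-1 means null)
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putShort((short) 0);
        buf.putInt(assignment.size());
        for (Map.Entry<String, List<Integer>> e : assignment.entrySet()) {
            byte[] topic = e.getKey().getBytes(StandardCharsets.UTF_8);
            buf.putShort((short) topic.length).put(topic);
            buf.putInt(e.getValue().size());
            for (int p : e.getValue()) buf.putInt(p);
        }
        buf.putInt(-1); // no user data
        buf.flip();
        return buf;
    }

    static Map<String, List<Integer>> deserialize(ByteBuffer in) {
        ByteBuffer buf = in.duplicate(); // leave the caller's position untouched
        short version = buf.getShort();
        if (version != 0) throw new IllegalArgumentException("unsupported version " + version);
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int topics = buf.getInt();
        for (int t = 0; t < topics; t++) {
            byte[] name = new byte[buf.getShort()];
            buf.get(name);
            int count = buf.getInt();
            List<Integer> partitions = new ArrayList<>(count);
            for (int i = 0; i < count; i++) partitions.add(buf.getInt());
            result.put(new String(name, StandardCharsets.UTF_8), partitions);
        }
        buf.getInt(); // user data length, ignored in this sketch
        return result;
    }
}
```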

Once the leader has the assignment result, it calls sendSyncGroupRequest to send it to Kafka.
sendSyncGroupRequest implementation:

private RequestFuture<ByteBuffer> sendSyncGroupRequest(SyncGroupRequest.Builder requestBuilder) {
    if (coordinatorUnknown())
        return RequestFuture.coordinatorNotAvailable();
    return client.send(coordinator, requestBuilder)
            .compose(new SyncGroupResponseHandler());
}

SyncGroupResponseHandler, which handles the SyncGroup response:

private class SyncGroupResponseHandler extends CoordinatorResponseHandler<SyncGroupResponse, ByteBuffer> {
    @Override
    public void handle(SyncGroupResponse syncResponse,
                       RequestFuture<ByteBuffer> future) {
        Errors error = syncResponse.error();
        if (error == Errors.NONE) {
            // the assignment was synchronized successfully
            sensors.syncLatency.record(response.requestLatencyMs());
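            // hand the partitions Kafka assigned to this consumer to the future's onSuccess handling.
            // This future is returned to AbstractCoordinator.initiateJoinGroup, where it is held as joinFuture and a success listener is attached.
            // AbstractCoordinator.joinGroupIfNeeded also checks this future for success and, if it succeeded, calls onJoinComplete.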
            future.complete(ByteBuffer.wrap(syncResponse.data.assignment()));
        } else {
            // on any SyncGroup error, request a rejoin
            requestRejoin();

            if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
                future.raise(new GroupAuthorizationException(groupId));
            } else if (error == Errors.REBALANCE_IN_PROGRESS) {
                log.debug("SyncGroup failed because the group began another rebalance");
                future.raise(error);
            } else if (error == Errors.FENCED_INSTANCE_ID) {
                log.error("Received fatal exception: group.instance.id gets fenced");
                future.raise(error);
            } else if (error == Errors.UNKNOWN_MEMBER_ID
                    || error == Errors.ILLEGAL_GENERATION) {
                log.debug("SyncGroup failed: {}", error.message());
                resetGeneration();
                future.raise(error);
            } else if (error == Errors.COORDINATOR_NOT_AVAILABLE
                    || error == Errors.NOT_COORDINATOR) {
                log.debug("SyncGroup failed: {}", error.message());
                markCoordinatorUnknown();
                future.raise(error);
            } else {
                future.raise(new KafkaException("Unexpected error from SyncGroup: " + error.message()));
            }
        }
    }
}
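The error branches of SyncGroupResponseHandler reduce to a small table: every error first calls requestRejoin, and two groups of errors additionally reset the generation or mark the coordinator unknown. A compact, hypothetical summary (error names are plain strings so it compiles without kafka-clients; the Action enum is not a Kafka type):

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical summary of the SyncGroupResponseHandler branches above.
final class SyncGroupErrors {
    enum Action { REQUEST_REJOIN, RESET_GENERATION, MARK_COORDINATOR_UNKNOWN }

    static Set<Action> actionsFor(String error) {
        if (error.equals("NONE")) return EnumSet.noneOf(Action.class); // success: assignment goes to the future
        Set<Action> actions = EnumSet.of(Action.REQUEST_REJOIN);       // every failure calls requestRejoin()
        switch (error) {
            case "UNKNOWN_MEMBER_ID":
            case "ILLEGAL_GENERATION":
                actions.add(Action.RESET_GENERATION);
                break;
            case "COORDINATOR_NOT_AVAILABLE":
            case "NOT_COORDINATOR":
                actions.add(Action.MARK_COORDINATOR_UNKNOWN);
                break;
            default:
                break; // GROUP_AUTHORIZATION_FAILED, REBALANCE_IN_PROGRESS, FENCED_INSTANCE_ID, others: just raise
        }
        return actions;
    }
}
```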

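Processing after sendSyncGroupRequest receives a successful response:

  • The success listener that AbstractCoordinator.initiateJoinGroup attached to joinFuture runs: it marks the member as stable, resets rejoinNeeded to false and enables the heartbeat thread.
  • AbstractCoordinator.joinGroupIfNeeded checks whether the future succeeded and, if so, runs: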
if (future.succeeded()) {
    // the partition assignment Kafka returned for this consumer member
    ByteBuffer memberAssignment = future.value().duplicate();
    // run the post-join logic
    onJoinComplete(generation.generationId, generation.memberId, generation.protocol, memberAssignment);

    // We reset the join group future only after the completion callback returns. This ensures
    // that if the callback is woken up, we will retry it on the next joinGroupIfNeeded.
    resetJoinGroupFuture();
    needsJoinPrepare = true;
}
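Both success paths hang off the same future. A hedged sketch of their ordering, using java.util.concurrent.CompletableFuture in place of Kafka's RequestFuture and RequestFutureListener (JoinFlowSketch and its fields are hypothetical): the listener attached in initiateJoinGroup runs as soon as the future completes, and joinGroupIfNeeded then passes a duplicate of the assignment buffer to onJoinComplete.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of the two success paths; Kafka uses its own RequestFuture types.
final class JoinFlowSketch {
    volatile String state = "REBALANCING";
    volatile boolean rejoinNeeded = true;
    volatile int assignmentBytesSeen = -1;

    // Like initiateJoinGroup: attach a listener that runs as soon as SyncGroup succeeds.
    CompletableFuture<ByteBuffer> initiateJoinGroup() {
        CompletableFuture<ByteBuffer> joinFuture = new CompletableFuture<>();
        joinFuture.thenAccept(ignored -> {
            synchronized (this) {
                state = "STABLE";
                rejoinNeeded = false; // Kafka also enables the heartbeat thread at this point
            }
        });
        return joinFuture;
    }

    // Like joinGroupIfNeeded: once the future has succeeded, hand the assignment to onJoinComplete.
    void joinGroupIfNeeded(CompletableFuture<ByteBuffer> future) {
        ByteBuffer memberAssignment = future.join().duplicate(); // reading the copy keeps the original position
        onJoinComplete(memberAssignment);
    }

    private void onJoinComplete(ByteBuffer assignment) {
        assignmentBytesSeen = assignment.remaining();
    }
}
```

Calling duplicate() matters because a ByteBuffer has a mutable position: if the completion callback is interrupted and retried, as the resetJoinGroupFuture comment describes, the future's value can still be read from the start.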

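onJoinComplete applies the assignment returned for this consumer. Only the leader keeps assignmentSnapshot (followers clear it), so that the leader can later detect metadata changes. The method then deserializes the assignment and checks with subscriptions.assignFromSubscribed that every assigned partition belongs to the subscription (otherwise it calls handleAssignmentMismatch and returns). After that it updates the joined subscription, lets the assignor update its internal state, reschedules auto-commit, and finally invokes the user's ConsumerRebalanceListener.onPartitionsAssigned. Implementation: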

protected void onJoinComplete(int generation,
                              String memberId,
                              String assignmentStrategy,
                              ByteBuffer assignmentBuffer) {
    // only the leader is responsible for monitoring for metadata changes (i.e. partition changes)
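    // only the leader keeps assignmentSnapshot: rejoinNeededOrPending later compares it with the current
    // metadata snapshot and requests another rebalance if they differ; followers clear it here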
    if (!isLeader)
        assignmentSnapshot = null;

    PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
    if (assignor == null)
        throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);

    Assignment assignment = ConsumerProtocol.deserializeAssignment(assignmentBuffer);
    if (!subscriptions.assignFromSubscribed(assignment.partitions())) {
        handleAssignmentMismatch(assignment);
        return;
    }

    Set<TopicPartition> assignedPartitions = subscriptions.assignedPartitions();

    // The leader may have assigned partitions which match our subscription pattern, but which
    // were not explicitly requested, so we update the joined subscription here.
    maybeUpdateJoinedSubscription(assignedPartitions);

    // give the assignor a chance to update internal state based on the received assignment
    assignor.onAssignment(assignment, generation);

    // reschedule the auto commit starting from now
    if (autoCommitEnabled)
        this.nextAutoCommitTimer.updateAndReset(autoCommitIntervalMs);

    // execute the user's callback after rebalance
    ConsumerRebalanceListener listener = subscriptions.rebalanceListener();
    log.info("Setting newly assigned partitions: {}", Utils.join(assignedPartitions, ", "));
    try {
        listener.onPartitionsAssigned(assignedPartitions);
    } catch (WakeupException | InterruptException e) {
        throw e;
    } catch (Exception e) {
        log.error("User provided listener {} failed on partition assignment", listener.getClass().getName(), e);
    }
}
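Two checks around onJoinComplete can be sketched without Kafka types. Both methods are hypothetical simplifications: assignFromSubscribed mirrors the idea behind subscriptions.assignFromSubscribed (with a pattern subscription the topic must match the pattern, otherwise it must be in the subscribed set), and leaderMustRejoin mirrors the comparison that ConsumerCoordinator.rejoinNeededOrPending makes between assignmentSnapshot and the current metadata snapshot, which is why onJoinComplete clears assignmentSnapshot on followers.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical simplifications of two checks around onJoinComplete; not Kafka's SubscriptionState.
final class JoinCompleteChecks {
    // Every assigned topic must match the pattern (pattern subscription) or be explicitly subscribed.
    static boolean assignFromSubscribed(Set<String> subscribedTopics, Pattern subscribedPattern, List<String> assignedTopics) {
        for (String topic : assignedTopics) {
            boolean ok = subscribedPattern != null
                    ? subscribedPattern.matcher(topic).matches()
                    : subscribedTopics.contains(topic);
            if (!ok) return false; // onJoinComplete would call handleAssignmentMismatch and return
        }
        return true;
    }

    // Followers hold a null snapshot and never trigger this; the leader rejoins if metadata moved on.
    static boolean leaderMustRejoin(Map<String, Integer> assignmentSnapshot, Map<String, Integer> metadataSnapshot) {
        return assignmentSnapshot != null && !assignmentSnapshot.equals(metadataSnapshot);
    }
}
```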