Kafka source code analysis: consumer load-balancing management

License: CC 4.0 BY-SA

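This article analyzes consumer load-balancing management in Kafka 0.9.0, covering GroupCoordinator startup, metadata management, heartbeat handling, offset updates, and timeout handling. It walks through instance creation, what happens when the leader of a metadata partition comes online, offset cache management, and the consumer heartbeat and sync mechanisms.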


GroupCoordinator

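Overview: GroupCoordinator mainly handles consumer connection setup and offset updates. It manages all consumers and the groups they belong to, including each group's metadata and the offset updates committed by its consumers.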

Instance creation and startup

consumerCoordinator = GroupCoordinator.create(config, zkUtils, replicaManager)
consumerCoordinator.startup()

Creating the instance:

def create(config: KafkaConfig,
zkUtils: ZkUtils,
replicaManager: ReplicaManager): GroupCoordinator = {

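Read the configuration related to recording group offsets:

1. offset.metadata.max.bytes, default 4096: the maximum size of the metadata entry attached to an offset commit.

2. offsets.load.buffer.size, default 5 MB: the size of the read buffer used when loading offset records into the in-memory cache.

3. offsets.retention.minutes, default 24 hours: the maximum time a committed offset record is retained.

4. offsets.retention.check.interval.ms, default 600 seconds: how often expired offset records are checked for.

5. offsets.topic.num.partitions, default 50: the number of partitions of the topic that stores offsets.

6. offsets.topic.replication.factor, default 3: the number of replicas of each partition of the offsets topic.

7. offsets.commit.timeout.ms, default 5 seconds: the maximum time to wait for an offset commit.

8. offsets.commit.required.acks, default -1: the acks value required for an offset commit.

9. group.min.session.timeout.ms, default 6 seconds: the minimum session timeout a consumer may request.

10. group.max.session.timeout.ms, default 30 seconds: the maximum session timeout a consumer may request.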

  val offsetConfig = OffsetConfig(maxMetadataSize = config.offsetMetadataMaxSize,
    loadBufferSize = config.offsetsLoadBufferSize,
    offsetsRetentionMs = config.offsetsRetentionMinutes * 60 * 1000L,
    offsetsRetentionCheckIntervalMs = config.offsetsRetentionCheckIntervalMs,
    offsetsTopicNumPartitions = config.offsetsTopicPartitions,
    offsetsTopicReplicationFactor = config.offsetsTopicReplicationFactor,
    offsetCommitTimeoutMs = config.offsetCommitTimeoutMs,
    offsetCommitRequiredAcks = config.offsetCommitRequiredAcks)
  val groupConfig = GroupConfig(groupMinSessionTimeoutMs = config.groupMinSessionTimeoutMs,
    groupMaxSessionTimeoutMs = config.groupMaxSessionTimeoutMs)

  new GroupCoordinator(config.brokerId, groupConfig, offsetConfig, replicaManager, zkUtils)
}
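
To make the two session-timeout bounds and the minutes-to-milliseconds conversion concrete, here is a minimal sketch. GroupConfigSketch and both helper functions are hypothetical names invented for illustration, and the range check is an assumption about how the broker uses the two bounds rather than a copy of the broker code.

```scala
object SessionTimeoutSketch {
  // Hypothetical stand-in for the broker-side GroupConfig; defaults mirror the values listed above.
  final case class GroupConfigSketch(groupMinSessionTimeoutMs: Int = 6000,
                                     groupMaxSessionTimeoutMs: Int = 30000)

  // True when the session timeout requested by a consumer lies inside the allowed range (assumed check).
  def isSessionTimeoutAllowed(cfg: GroupConfigSketch, requestedMs: Int): Boolean =
    requestedMs >= cfg.groupMinSessionTimeoutMs && requestedMs <= cfg.groupMaxSessionTimeoutMs

  // offsets.retention.minutes converted to milliseconds, the same arithmetic as in create.
  def retentionMs(retentionMinutes: Int): Long = retentionMinutes * 60 * 1000L
}
```

With the defaults, a consumer asking for a 10-second session would pass the check, while a 1-second session would not.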

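Next, the topic-level configuration of this topic (the internal offsets topic) is overridden, mainly the log-cleanup settings.

The segment size of this topic is set to 100 MB per segment, whereas the default segment size for ordinary (non-internal) topics is 1 GB.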

def offsetsTopicConfigs: Properties = {
  val props = new Properties
  props.put(LogConfig.CleanupPolicyProp, LogConfig.Compact)
  props.put(LogConfig.SegmentBytesProp, offsetConfig.offsetsTopicSegmentBytes.toString)
  props.put(LogConfig.CompressionTypeProp, UncompressedCodec.name)
  props
}
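
For readers who know the topic-level settings by their property names, the same overrides can be written with literal keys. This is only a sketch: the object and function names are made up, and the literal keys are the topic-level property names I assume the LogConfig constants resolve to.

```scala
import java.util.Properties

object OffsetsTopicConfigSketch {
  // Builds the assumed topic-level overrides; 100 MB is the segment size mentioned above.
  def offsetsTopicProps(segmentBytes: Int = 100 * 1024 * 1024): Properties = {
    val props = new Properties
    props.put("cleanup.policy", "compact")         // keep only the latest record per key
    props.put("segment.bytes", segmentBytes.toString)
    props.put("compression.type", "uncompressed")  // store offset records uncompressed
    props
  }
}
```

Compaction fits this topic because only the most recent offset per (group, topic, partition) key matters.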

Create the GroupMetadataManager instance, the component GroupCoordinator uses to operate on offsets.

---------------------------

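Stores the committed offset for each partition consumed by each group.
private val offsetsCache = new Pool[GroupTopicPartition, OffsetAndMetadata]
Stores the metadata of all current consumer groups, such as the member clients consuming in each group.
private val groupsCache = new Pool[String, GroupMetadata]
Partitions of the offsets topic whose contents are currently being loaded and are not yet in the cache. The set holds the partition ids.
private val loadingPartitions: mutable.Set[Int] = mutable.Set()
Partitions of the offsets topic whose contents are already cached in memory, meaning the offsets of the groups stored in them can be read.
private val ownedPartitions: mutable.Set[Int] = mutable.Set()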

The partition count of the topic that records consumer group information is read from that topic's partition assignment in ZooKeeper.
/* number of partitions for the consumer metadata topic */
private val groupMetadataTopicPartitionCount = getOffsetsTopicPartitionCount
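
The partition count matters because every group is pinned to one partition of this topic. The sketch below assumes the mapping is a non-negative hash of the group id modulo the partition count; the object and function names are illustrative and the code is not copied from GroupMetadataManager.

```scala
object GroupPartitionSketch {
  // Non-negative variant of abs: Int.MinValue has no positive counterpart, so it is mapped to 0.
  def nonNegative(n: Int): Int = if (n == Int.MinValue) 0 else math.abs(n)

  // Assumed mapping from a group id to a partition of the offsets topic.
  def partitionFor(groupId: String, partitionCount: Int): Int =
    nonNegative(groupId.hashCode) % partitionCount
}
```

Under this assumption, the broker that leads the resulting partition acts as the coordinator for the group.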

/* Single-thread scheduler to handling offset/group metadata cache loading and unloading */
private val scheduler = new KafkaScheduler(threads = 1, threadNamePrefix = "group-metadata-manager-")

A periodic task runs the deleteExpiredOffsets function at the configured expired-offset check interval to delete offsets that have expired.
scheduler.startup()
scheduler.schedule(name = "delete-expired-consumer-offsets",
  fun = deleteExpiredOffsets,
  period = config.offsetsRetentionCheckIntervalMs,
  unit = TimeUnit.MILLISECONDS)
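
The idea behind the periodic task can be sketched with plain JDK classes. OffsetEntry, deleteExpired and startSweeper are hypothetical names, the cache is keyed by a plain string for brevity, and the "expired when expireTimestamp is earlier than now" rule is an assumption about what deleteExpiredOffsets does.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, ScheduledExecutorService, TimeUnit}

object ExpiredOffsetSweepSketch {
  final case class OffsetEntry(offset: Long, expireTimestamp: Long)

  // Remove every entry whose expiration time is earlier than nowMs; returns how many were removed.
  def deleteExpired(cache: ConcurrentHashMap[String, OffsetEntry], nowMs: Long): Int = {
    var removed = 0
    val it = cache.entrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      if (e.getValue.expireTimestamp < nowMs) { it.remove(); removed += 1 }
    }
    removed
  }

  // Run the sweep periodically on a single background thread, similar in spirit to the scheduler above.
  def startSweeper(cache: ConcurrentHashMap[String, OffsetEntry], periodMs: Long): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      def run(): Unit = { deleteExpired(cache, System.currentTimeMillis()); () }
    }, periodMs, periodMs, TimeUnit.MILLISECONDS)
    scheduler
  }
}
```

A caller would keep the returned scheduler and call shutdown() on it when the component stops.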

What startup does when the GroupCoordinator instance is started:

def startup() {
  info("Starting up.")

Create the purgatory that tracks heartbeat timeouts between clients and the group.
  heartbeatPurgatory = new DelayedOperationPurgatory[DelayedHeartbeat]("Heartbeat", brokerId)

Create the purgatory that tracks timeouts while members join a group.
  joinPurgatory = new DelayedOperationPurgatory[DelayedJoin]("Rebalance", brokerId)

Mark this coordinator instance as active.
  isActive.set(true)
  info("Startup complete.")
}

Bringing the leader of a group metadata partition online

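This operation is triggered on the broker that has just been elected leader of a partition of the group metadata topic, after the leader of that partition changes. It is also triggered when a broker starts up.

The onGroupLoaded function handles what happens after a group has been loaded. This callback starts heartbeat-timeout monitoring for every current member of the group by creating a DelayedHeartbeat instance for each member.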

private def onGroupLoaded(group: GroupMetadata) {
  group synchronized {
    info(s"Loading group metadata for ${group.groupId} with generation ${group.generationId}")
    assert(group.is(Stable))
    group.allMemberMetadata.foreach(completeAndScheduleNextHeartbeatExpiration(group, _))
  }
}
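
The heartbeat bookkeeping that the callback restarts for each member can be pictured with a very small model. MemberSketch and both functions are hypothetical and much simpler than the real member metadata and DelayedHeartbeat. The rule "deadline = last heartbeat + session timeout" is an assumption used for illustration.

```scala
object HeartbeatDeadlineSketch {
  // Hypothetical, simplified member record.
  final class MemberSketch(val memberId: String, val sessionTimeoutMs: Int) {
    var latestHeartbeat: Long = -1L
  }

  // Record a heartbeat at nowMs and return the deadline by which the next one must arrive.
  def recordHeartbeat(member: MemberSketch, nowMs: Long): Long = {
    member.latestHeartbeat = nowMs
    nowMs + member.sessionTimeoutMs
  }

  // A member counts as expired when no heartbeat arrived before its deadline.
  def isExpired(member: MemberSketch, nowMs: Long): Boolean =
    member.latestHeartbeat + member.sessionTimeoutMs <= nowMs
}
```

Restarting this clock for every member after a load gives members a full session timeout to find the new coordinator before they are considered dead.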

This function is triggered when the current broker is elected leader of a partition of the topic that stores group metadata and offsets.
def handleGroupImmigration(offsetTopicPartitionId: Int) {

It simply loads the partition through loadGroupsForPartition on groupManager.
  groupManager.loadGroupsForPartition(offsetTopicPartitionId, onGroupLoaded)
}

Next, let's look at how the loadGroupsForPartition function works:

/**
* Asynchronously read the partition from the offsets topic and populate the cache
*/
def loadGroupsForPartition(offsetsPartition: Int,
onGroupLoaded: GroupMetadata => Unit) {
  val topicPartition = TopicAndPartition(GroupCoordinator.GroupMetadataTopicName, offsetsPartition)

Schedule loadGroupsAndOffsets, a function nested inside loadGroupsForPartition, to load the data of this partition.
scheduler.schedule(topicPartition.toString, loadGroupsAndOffsets)

Next, the logic of the function that loads the partition data:
def loadGroupsAndOffsets() {
info("Loading offsets and group metadata from " + topicPartition)

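First, if the partition to load is already in the loadingPartitions set, a load of this partition is already in progress, so the function returns without doing anything. Otherwise the partition is added to loadingPartitions, the set of partitions currently being loaded.

Here offsetsPartition is a partition of the internal topic that stores group metadata and offsets.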
    loadingPartitions synchronized {
      if (loadingPartitions.contains(offsetsPartition)) {
        info("Offset load from %s already in progress.".format(topicPartition))
        return
      } else {
        loadingPartitions.add(offsetsPartition)
      }
    }
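
The guard above is a check-then-add performed under one lock, so two concurrent load requests for the same partition cannot both proceed. A standalone sketch of the same pattern, with hypothetical function names:

```scala
import scala.collection.mutable

object LoadGuardSketch {
  private val loadingPartitions: mutable.Set[Int] = mutable.Set()

  // Returns true if the caller may load the partition, false if a load is already in progress.
  def tryStartLoading(partition: Int): Boolean = loadingPartitions synchronized {
    // mutable.Set#add returns false when the element was already present
    loadingPartitions.add(partition)
  }

  // Called when a load ends so the partition can be loaded again later.
  def finishLoading(partition: Int): Unit = loadingPartitions synchronized {
    loadingPartitions.remove(partition)
    ()
  }
}
```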

    val startMs = SystemTime.milliseconds
    try {

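Get the Log instance for this partition from the LogManager.
      replicaManager.logManager.getLog(topicPartition) match {
        case Some(log) =>

If this broker hosts a replica of the partition, the Log instance must exist. Take the base offset of its first (oldest) segment as the starting offset.
          var currOffset = log.logSegments.head.baseOffset

Allocate a buffer sized to the amount of data read in each load step.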
          val buffer = ByteBuffer.allocate(config.loadBufferSize)
          inWriteLock(offsetExpireLock) {
            val loadedGroups = mutable.Map[String, GroupMetadata]()
            val removedGroups = mutable.Set[String]()

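Iterate over the messages in this partition's log until the offset reaches the partition's current high watermark. The high watermark is the offset up to which the replicas have been synchronized.

Each time a follower replica fetches from the leader, that replica's logEndOffset is updated. The high watermark is the smallest logEndOffset among the in-sync replicas.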
            while (currOffset < getHighWatermark(offsetsPartition) && !shuttingDown.get()) {
              buffer.clear()
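
The shape of this loop (read a bounded batch, advance the offset, stop at the high watermark) can be modelled without any Kafka classes. In this sketch the log is a Vector indexed by offset and batches are counted in records rather than bytes. Both are simplifications, and all names are hypothetical.

```scala
object BatchedLoadSketch {
  // High watermark taken as the smallest log end offset among the in-sync replicas.
  def highWatermark(isrLogEndOffsets: Seq[Long]): Long = isrLogEndOffsets.min

  // Read the log in batches of at most batchSize records, stopping at the high watermark.
  // Returns the batches in the order they were read.
  def loadUpTo(log: Vector[String], startOffset: Long, hw: Long, batchSize: Int): Vector[Vector[String]] = {
    var currOffset = startOffset
    var batches = Vector.empty[Vector[String]]
    while (currOffset < hw) {
      val end = math.min(currOffset + batchSize, hw)
      batches :+= log.slice(currOffset.toInt, end.toInt)
      currOffset = end // continue after the last record read
    }
    batches
  }
}
```

Records above the high watermark are not loaded, since they are not yet replicated to every in-sync replica.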

Read up to the configured amount of data and copy the messages into the buffer.
              val messages = log.read(currOffset, config.loadBufferSize).messageSet.asInstanceOf[FileMessageSet]
              messages.readInto(buffer, 0)

Build a message set over this buffer so the messages can be read.
val messageSet = new ByteBufferMessageSet(buffer)

Iterate over the message set and process each message that was read. This goes through the messageSet's iterator.
              messageSet.foreach { msgAndOffset =>
                require(msgAndOffset.message.key != null, "Offset entry key should not be null")

Parse the key of this message and branch on the key type.
                val baseKey = GroupMetadataManager.readMessageKey(msgAndOffset.message.key)

If the message is a record of an offset committed by a consumer:
                if (baseKey.isInstanceOf[OffsetKey]) {
                  // load offset
                  val key = baseKey.key.asInstanceOf[GroupTopicPartition]

Check whether the value (payload) of this offset record is null. A null payload is a tombstone: the offset has expired and been cleaned up, so its entry is removed from offsetsCache.
                  if (msgAndOffset.message.payload == null) {
                    if (offsetsCache.remove(key) != null)
                      trace("Removed offset for %s due to tombstone entry.".format(key))
                    else
                      trace("Ignoring redundant tombstone for %s.".format(key))
                  }

The else branch below handles a regular record: the committed consumer offset is written into offsetsCache. If the commit specified an expiration time, that time is used as is. Otherwise the expiration time is the commit timestamp plus the configured retention period.
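
The two branches (tombstone versus regular record) and the expiry rule can be sketched as a replay over an in-memory list of records. All types here are hypothetical simplifications, a tombstone is modelled as None instead of a null payload, and the sentinel value -1 for "no explicit expiration" is an assumption about DEFAULT_TIMESTAMP.

```scala
import scala.collection.mutable

object OffsetReplaySketch {
  val DefaultTimestamp: Long = -1L // assumed sentinel: the commit gave no explicit expiration

  final case class GroupTopicPartitionSketch(group: String, topic: String, partition: Int)
  final case class OffsetValue(offset: Long, commitTimestamp: Long, expireTimestamp: Long = DefaultTimestamp)

  // A replayed record; value == None models a tombstone.
  final case class Record(key: GroupTopicPartitionSketch, value: Option[OffsetValue])

  // Keep an explicit expiration, otherwise derive it from the commit time plus the retention period.
  def resolveExpiry(v: OffsetValue, retentionMs: Long): OffsetValue =
    if (v.expireTimestamp == DefaultTimestamp) v.copy(expireTimestamp = v.commitTimestamp + retentionMs)
    else v

  // Apply the records in log order: tombstones remove the entry, regular records overwrite it.
  def replay(records: Seq[Record], retentionMs: Long): Map[GroupTopicPartitionSketch, OffsetValue] = {
    val cache = mutable.Map[GroupTopicPartitionSketch, OffsetValue]()
    records.foreach {
      case Record(key, None)    => cache.remove(key)
      case Record(key, Some(v)) => cache.put(key, resolveExpiry(v, retentionMs))
    }
    cache.toMap
  }
}
```

Because records are applied in log order, the cache ends up holding only the latest surviving value per key, which is also what log compaction preserves on disk.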

                  else {
                    val value = GroupMetadataManager.readOffsetMessageValue(msgAndOffset.message.payload)
                    putOffset(key, value.copy (
                      expireTimestamp = {
                        if (value.expireTimestamp == org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP