ReplicaManager manages and operates on the replicas hosted on a broker in the cluster. It is defined as follows:
class ReplicaManager(val config: KafkaConfig,              // broker configuration
                     metrics: Metrics,                     // metrics registry
                     time: Time,                           // clock abstraction
                     val zkClient: KafkaZkClient,          // ZooKeeper client
                     scheduler: Scheduler,                 // Kafka scheduler
                     val logManager: LogManager,           // log manager
                     val isShuttingDown: AtomicBoolean,    // whether the broker is shutting down
                     quotaManagers: QuotaManagers,         // quota managers
                     val brokerTopicStats: BrokerTopicStats, // per-topic broker metrics
                     val metadataCache: MetadataCache,     // broker metadata cache: leader, ISR, etc. of the partitions in the cluster
                     logDirFailureChannel: LogDirFailureChannel, // channel for reporting log directory failures
                     val delayedProducePurgatory: DelayedOperationPurgatory[DelayedProduce], // purgatory for delayed PRODUCE requests
                     val delayedFetchPurgatory: DelayedOperationPurgatory[DelayedFetch], // purgatory for delayed FETCH requests
                     val delayedDeleteRecordsPurgatory: DelayedOperationPurgatory[DelayedDeleteRecords], // purgatory for delayed DELETE_RECORDS requests
                     val delayedElectLeaderPurgatory: DelayedOperationPurgatory[DelayedElectLeader], // purgatory for delayed ELECT_LEADERS requests
                     threadNamePrefix: Option[String]) extends Logging with KafkaMetricsGroup {
  /* epoch of the controller that last changed the leader */
  // Fences off requests from stale controllers: requests sent by an old controller must no longer be processed.
  // Holds the epoch of the controller that most recently changed a partition leader. It defaults to 0 and
  // is incremented by 1 every time the controller changes.
  @volatile var controllerEpoch: Int = KafkaController.InitialControllerEpoch
  /** ID of the local broker */
  private val localBrokerId = config.brokerId
  // All partitions hosted on this broker; valueFactory builds an Online HostedPartition for a missing key
  private val allPartitions = new Pool[TopicPartition, HostedPartition](
    valueFactory = Some(tp => HostedPartition.Online(Partition(tp, time, this)))
  )
  private val replicaStateChangeLock = new Object
  // The main job of replicaFetcherManager is to create ReplicaFetcherThread instances
  val replicaFetcherManager = createReplicaFetcherManager(metrics, time, threadNamePrefix, quotaManagers.follower)
  val replicaAlterLogDirsManager = createReplicaAlterLogDirsManager(quotaManagers.alterLogDirs, brokerTopicStats)
  /** Whether the highwatermark-checkpoint scheduled task has been started */
  private val highWatermarkCheckPointThreadStarted = new AtomicBoolean(false)
  /** Maps each log directory's absolute path to the checkpoint file that stores the HW of its topic partitions */
  @volatile var highWatermarkCheckpoints: Map[String, OffsetCheckpointFile] = logManager.liveLogDirs.map(dir =>
    (dir.getAbsolutePath, new OffsetCheckpointFile(new File(dir, ReplicaManager.HighWatermarkFilename), logDirFailureChannel))).toMap
  this.logIdent = s"[ReplicaManager broker=$localBrokerId] "
  private val stateChangeLogger = new StateChangeLogger(localBrokerId, inControllerContext = false, None)
  /** Topic partitions whose ISR has changed */
  private val isrChangeSet: mutable.Set[TopicPartition] = new mutable.HashSet[TopicPartition]()
  // Time of the most recent ISR change
  private val lastIsrChangeMs = new AtomicLong(System.currentTimeMillis())
  // Time of the most recent propagation of ISR changes
  private val lastIsrPropagationMs = new AtomicLong(System.currentTimeMillis())
  private var logDirFailureHandler: LogDirFailureHandler = null
}
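The controllerEpoch field fences off requests from stale controllers. Below is a minimal sketch of that check. It is not Kafka's actual code, and ControllerEpochFence and tryAccept are hypothetical names. The sketch assumes a request is rejected only when its epoch is strictly lower than the last accepted epoch. A request with an equal or higher epoch is accepted, and its epoch is recorded.

```scala
// Hypothetical sketch of controller-epoch fencing; not Kafka's implementation.
class ControllerEpochFence(initialEpoch: Int = 0) {
  // Epoch of the controller whose request was most recently accepted (starts at 0)
  @volatile var controllerEpoch: Int = initialEpoch

  // Returns false for a request from a stale controller (lower epoch);
  // otherwise records the request's epoch and returns true.
  def tryAccept(requestEpoch: Int): Boolean = synchronized {
    if (requestEpoch < controllerEpoch) false
    else {
      controllerEpoch = requestEpoch
      true
    }
  }
}
```

For example, suppose a request with epoch 3 has already been accepted. A delayed request from the old controller with epoch 2 is then rejected, while a later request with epoch 3 or higher still goes through.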
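One important field is replicaFetcherManager, whose main job is to create ReplicaFetcherThread instances. As introduced earlier, the main responsibility of ReplicaFetcherThread is to help a follower replica fetch messages from the leader replica and write them to the local log. The definition and main method of ReplicaFetcherManager are as follows: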
class ReplicaFetcherManager(brokerConfig: KafkaConfig,
                            protected val replicaManager: ReplicaManager,
                            metrics: Metrics,
                            time: Time,
                            threadNamePrefix: Option[String] = None,
                            quotaManager: ReplicationQuotaManager)
      extends AbstractFetcherManager[ReplicaFetcherThread](
        name = "ReplicaFetcherManager on broker " + brokerConfig.brokerId,
        clientId = "Replica",
        numFetchers = brokerConfig.numReplicaFetchers) {
  override def createFetcherThread(fetcherId: Int, sourceBroker: BrokerEndPoint): ReplicaFetcherThread = {
    val prefix = threadNamePrefix.map(tp => s"$tp:").getOrElse("")
    val threadName = s"${prefix}ReplicaFetcherThread-$fetcherId-${sourceBroker.id}"
    // Create and return a new ReplicaFetcherThread instance
    new ReplicaFetcherThread(threadName, fetcherId, sourceBroker, brokerConfig, failedPartitions, replicaManager,
      metrics, time, quotaManager)
  }
}
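With numFetchers = brokerConfig.numReplicaFetchers (the num.replica.fetchers setting), the manager does not start one thread per partition. Each fetcher thread serves one source broker, and its name encodes both fetcherId and the source broker id, as createFetcherThread shows.

The sketch below reproduces that naming format and groups partitions by (leader broker id, fetcherId). The hash used to pick fetcherId is an assumption modeled on AbstractFetcherManager#getFetcherId in Kafka 2.x. FetcherAssignmentSketch and its methods are hypothetical names.

```scala
// Hypothetical sketch of how partitions map to fetcher threads; not Kafka's code.
object FetcherAssignmentSketch {
  // Same format as createFetcherThread: "[prefix:]ReplicaFetcherThread-<fetcherId>-<sourceBrokerId>"
  def fetcherThreadName(threadNamePrefix: Option[String], fetcherId: Int, sourceBrokerId: Int): String = {
    val prefix = threadNamePrefix.map(tp => s"$tp:").getOrElse("")
    s"${prefix}ReplicaFetcherThread-$fetcherId-$sourceBrokerId"
  }

  // abs that maps Int.MinValue to 0 so the modulo below is never negative
  private def safeAbs(n: Int): Int = if (n == Int.MinValue) 0 else math.abs(n)

  // Assumption: hash of topic and partition, modulo the number of fetchers per broker
  def fetcherId(topic: String, partition: Int, numFetchers: Int): Int =
    safeAbs(31 * topic.hashCode + partition) % numFetchers

  // Input: (topic, partition, leaderBrokerId). Each (leaderBrokerId, fetcherId) key
  // corresponds to one fetcher thread that fetches all of the listed partitions.
  def assign(partitions: Seq[(String, Int, Int)], numFetchers: Int): Map[(Int, Int), Seq[(String, Int)]] =
    partitions
      .groupBy { case (topic, partition, leaderId) => (leaderId, fetcherId(topic, partition, numFetchers)) }
      .map { case (key, ps) => key -> ps.map { case (t, p, _) => (t, p) } }
}
```

Grouping by leader is what lets a single thread batch fetches for many partitions into one request to the same source broker.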
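When the Kafka server starts, it creates the ReplicaManager object and calls ReplicaManager#startup. This launches the scheduled tasks that ReplicaManager manages, chiefly isr-expiration and isr-change-propagation. startup also schedules shutdown-idle-replica-alter-log-dirs-thread and starts the LogDirFailureHandler thread: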
  def startup(): Unit = {
    // start ISR expiration thread
    // A follower can lag behind leader for up to config.replicaLagTimeMaxMs x 1.5 before it is removed from ISR
    // Periodically checks whether the ISR of each partition managed by this broker needs to shrink, and shrinks it if so:
    // ReplicaManager#maybeShrinkIsr runs every replicaLagTimeMaxMs / 2 ms, and the actual shrinking is done by Partition#maybeShrinkIsr
    scheduler.schedule("isr-expiration", maybeShrinkIsr _, period = config.replicaLagTimeMaxMs / 2, unit = TimeUnit.MILLISECONDS)
    // Periodically records the topic partitions whose ISR has changed in ZooKeeper (covered in detail later)
    scheduler.schedule("isr-change-propagation", maybePropagateIsrChanges _, period = 2500L, unit = TimeUnit.MILLISECONDS)
    scheduler.schedule("shutdown-idle-replica-alter-log-dirs-thread", shutdownIdleReplicaAlterLogDirsThread _, period = 10000L, unit = TimeUnit.MILLISECONDS)
    // If inter-broker protocol (IBP) < 1.0, the controller will send LeaderAndIsrRequest V0 which does not include isNew field.
    // In this case, the broker receiving the request cannot determine whether it is safe to create a partition if a log directory has failed.
    // Thus, we choose to halt the broker on any log directory failure if IBP < 1.0
    val haltBrokerOnFailure = config.interBrokerProtocolVersion < KAFKA_1_0_IV0
    logDirFailureHandler = new LogDirFailureHandler("LogDirFailureHandler", haltBrokerOnFailure)
    logDirFailureHandler.start()
  }
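The comment in startup says a follower can lag for up to 1.5 × replicaLagTimeMaxMs before it is removed from the ISR. The factor comes from the check period. A follower falls out of sync once its lag exceeds replicaLagTimeMaxMs, but this is only noticed at the next isr-expiration run, up to replicaLagTimeMaxMs / 2 ms later.

The simulation below illustrates this under several assumptions:
- the task first runs at time 0 and then once per period;
- a follower counts as lagging when now - lastCaughtUpMs is strictly greater than replicaLagTimeMaxMs.

IsrExpirationTiming, checkPeriodMs and removalTimeMs are hypothetical names.

```scala
// Hypothetical simulation of when the isr-expiration task removes a lagging follower.
object IsrExpirationTiming {
  // Same period as the isr-expiration task scheduled in startup()
  def checkPeriodMs(replicaLagTimeMaxMs: Long): Long = replicaLagTimeMaxMs / 2

  // Assumes checks run at t = 0, p, 2p, ... (p = period, requires replicaLagTimeMaxMs >= 2)
  // and lastCaughtUpMs >= 0. The follower is removed at the first check where
  // now - lastCaughtUpMs > replicaLagTimeMaxMs.
  def removalTimeMs(lastCaughtUpMs: Long, replicaLagTimeMaxMs: Long): Long = {
    val period = checkPeriodMs(replicaLagTimeMaxMs)
    val threshold = lastCaughtUpMs + replicaLagTimeMaxMs
    // Smallest multiple of period strictly greater than threshold
    (threshold / period + 1) * period
  }
}
```

Take replicaLagTimeMaxMs = 30000 and a follower that last caught up at t = 0. The run at t = 30000 sees a lag of exactly 30000 ms, which is not strictly greater, so the follower is removed at t = 45000. That is 1.5 × replicaLagTimeMaxMs, the worst case.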