"Kafka Core Source Code Analysis" Series, 4. Inside the Kafka Server (Part 1)

Author | Wu Xie. Four years in big data, currently at an internet company in Guangzhou, working on an in-house big data platform and on offline & real-time computing.

Editor | auroral-L

Full text: 8,401 characters; estimated reading time: 50 minutes.

Chapter 4: Inside the Kafka Server (Part 1)

 

1.  Kafka

2.  Server-Side Initialization

    2.1  ZkUtils Initialization

    2.2  LogManager Initialization

    2.3  SocketServer Initialization

    2.4  ReplicaManager Initialization

    2.5  KafkaController Initialization

3.  Receiving & Handling Upstream Requests

    3.1  KafkaRequestHandlerPool

    3.2  KafkaRequestHandler

4.  Summary

An aside: writing these articles has shown me just how much time and energy writing takes, especially writing something well. Creation is hard, so treasure it. I have always believed that sharing is a wonderful thing: by sharing with each other we improve together and put good things within everyone's reach; they belong to me, and to you too.

Kafka is a mainstream distributed message middleware and plays a very important role in the big data ecosystem. Comparable products include ActiveMQ, RocketMQ, and RabbitMQ, and more recently Pulsar has emerged claiming it will replace Kafka; a comparison of the two may follow in a later article. A complete Kafka pipeline consists of upstream producers, the Kafka cluster itself, and downstream consumers. Previous articles analyzed the producer's send flow, cluster metadata and message encapsulation, the sender thread that actually sends messages, and the critically important NetworkClient networking component along with the key classes it depends on. Now we turn to the Kafka server side.

1. Kafka

The Kafka server is the bridge between producers and consumers. We have already analyzed how producers send messages; for those messages to reach downstream consumers, the server itself must be healthy. That raises several key questions:

- Server-side initialization

- How upstream requests are received and handled

- How received data is stored

- Replica data synchronization

- Metadata updates

This article covers the first two, server-side initialization and receiving & handling upstream requests; the remaining three are left for the next article.

2. Server-Side Initialization

Anyone who has deployed a Kafka cluster knows that after deployment, the jps command shows a process named Kafka. So we start from the service itself: the source contains a class with the same name, Kafka.

The Kafka class is only a few dozen lines long and has just two methods, getPropsFromArgs() and main(); clearly main() is the one to focus on. As always, on a first pass through source code, follow the main line and don't spend time on side branches, or you will quickly lose your way; this is a trap many readers fall into.

def main(args: Array[String]): Unit = {
  try {
    // Parse the properties passed on the command line when starting the Kafka service
    val serverProps = getPropsFromArgs(args)
    val kafkaServerStartable = KafkaServerStartable.fromProps(serverProps)
    // attach shutdown handler to catch control-c
    Runtime.getRuntime().addShutdownHook(new Thread() {
      override def run() = {
        kafkaServerStartable.shutdown
      }
    })
    // Core: start the server, then block until shutdown
    kafkaServerStartable.startup
    kafkaServerStartable.awaitShutdown
  }
  catch {
    case e: Throwable =>
      fatal(e)
      System.exit(1)
  }
  System.exit(0)
}
startup() is called to start the service; let's follow it down.
def startup() {
  try {
    // Start the server
    server.startup()
  }
  // Exit on any fatal error
  catch {
    case e: Throwable =>
      fatal("Fatal error during KafkaServerStartable startup. Prepare to shutdown", e)
      // KafkaServer already calls shutdown() internally, so this is purely for logging & the exit code
      System.exit(1)
  }
}

Here the outer startup() simply delegates to another startup(), which greatly improves readability and keeps the complex business logic cleanly separated.

/**
 * Start up API for bringing up a single instance of the Kafka server.
 * Instantiates the LogManager, the SocketServer and the request handlers - KafkaRequestHandlers
 */
 // Starts the Kafka server and initializes the LogManager, SocketServer, and KafkaRequestHandlers
 
def startup() {
  try {
    info("starting")
    if(isShuttingDown.get)
      throw new IllegalStateException("Kafka server is still shutting down, cannot re-start!")
    if(startupComplete.get)
      return
    val canStartup = isStartingUp.compareAndSet(false, true)
    if (canStartup) {
      metrics = new Metrics(metricConfig, reporters, kafkaMetricsTime, true)
      quotaManagers = QuotaFactory.instantiate(config, metrics, time)
      brokerState.newState(Starting)
      /* start scheduler */
      kafkaScheduler.startup()
      // Core: initialize the ZooKeeper connection
      zkUtils = initZk()
      /* Get or create cluster_id */
      _clusterId = getOrGenerateClusterId(zkUtils)
      info(s"Cluster ID = $clusterId")
      notifyClusterListeners(kafkaMetricsReporters ++ reporters.asScala)
      // Instantiate the LogManager
      logManager = createLogManager(zkUtils.zkClient, brokerState)
      // Start the LogManager
      logManager.startup()
      /* generate brokerId */
      config.brokerId =  getBrokerId
      this.logIdent = "[Kafka Server " + config.brokerId + "], "
      metadataCache = new MetadataCache(config.brokerId)
 
      // Create the SocketServer
      socketServer = new SocketServer(config, metrics, kafkaMetricsTime)
      // Start the SocketServer
      socketServer.startup()
 
      // Instantiate the ReplicaManager; logManager is one of its key constructor arguments
      replicaManager = new ReplicaManager(config, metrics, time, kafkaMetricsTime, zkUtils, kafkaScheduler, logManager,
        isShuttingDown, quotaManagers.follower)
      // Start the ReplicaManager
      replicaManager.startup()
      // Instantiate the KafkaController
      kafkaController = new KafkaController(config, zkUtils, brokerState, kafkaMetricsTime, metrics, threadNamePrefix)
      // Start the KafkaController
      kafkaController.startup()
      adminManager = new AdminManager(config, metrics, metadataCache, zkUtils)
      /* start group coordinator */
      groupCoordinator = GroupCoordinator(config, zkUtils, replicaManager, kafkaMetricsTime)
      groupCoordinator.startup()
      /* Get the authorizer and initialize it if one is specified.*/
      authorizer = Option(config.authorizerClassName).filter(_.nonEmpty).map { authorizerClassName =>
        val authZ = CoreUtils.createObject[Authorizer](authorizerClassName)
        authZ.configure(config.originals())
        authZ
      }
      /* start processing requests */
      apis = new KafkaApis(socketServer.requestChannel, replicaManager, adminManager, groupCoordinator,
        kafkaController, zkUtils, config.brokerId, config, metadataCache, metrics, authorizer, quotaManagers, clusterId)
      // This pool is what processes the requests sitting in the request queue
      requestHandlerPool = new KafkaRequestHandlerPool(config.brokerId, socketServer.requestChannel, apis, config.numIoThreads)
      Mx4jLoader.maybeLoad()
      /* start dynamic config manager */
      dynamicConfigHandlers = Map[String, ConfigHandler](ConfigType.Topic -> new TopicConfigHandler(logManager, config, quotaManagers),
                                                         ConfigType.Client -> new ClientIdConfigHandler(quotaManagers),
                                                         ConfigType.User -> new UserConfigHandler(quotaManagers),
                                                         ConfigType.Broker -> new BrokerConfigHandler(config, quotaManagers))
      // Create the config manager. start listening to notifications
      dynamicConfigManager = new DynamicConfigManager(zkUtils, dynamicConfigHandlers)
      dynamicConfigManager.startup()
      /* tell everyone we are alive */
      val listeners = config.advertisedListeners.map {case(protocol, endpoint) =>
        if (endpoint.port == 0)
          (protocol, EndPoint(endpoint.host, socketServer.boundPort(protocol), endpoint.protocolType))
        else
          (protocol, endpoint)
      }
      // This is where each broker registers itself (in ZooKeeper)
      kafkaHealthcheck = new KafkaHealthcheck(config.brokerId, listeners, zkUtils, config.rack,
        config.interBrokerProtocolVersion)
      kafkaHealthcheck.startup()
      // Now that the broker id is successfully registered via KafkaHealthcheck, checkpoint it
      checkpointBrokerId(config.brokerId)
      /* register broker metrics */
      registerStats()
      brokerState.newState(RunningAsBroker)
      shutdownLatch = new CountDownLatch(1)
      startupComplete.set(true)
      isStartingUp.set(false)
      AppInfoParser.registerAppInfo(jmxPrefix, config.brokerId.toString)
      info("started")
    }
  }
  catch {
    case e: Throwable =>
      fatal("Fatal error during KafkaServer startup. Prepare to shutdown", e)
      isStartingUp.set(false)
      shutdown()
      throw e
  }
}

2.1 ZkUtils Initialization

initZk() ultimately returns a ZkUtils utility object, which reads the ZooKeeper settings from the Kafka configuration file and is used to operate on the ZooKeeper cluster.

object ZkUtils {
  val ConsumersPath = "/consumers"
  val ClusterIdPath = "/cluster/id"
  val BrokerIdsPath = "/brokers/ids"
  val BrokerTopicsPath = "/brokers/topics"
  val ControllerPath = "/controller"
  val ControllerEpochPath = "/controller_epoch"
  val ReassignPartitionsPath = "/admin/reassign_partitions"
  val DeleteTopicsPath = "/admin/delete_topics"
  val PreferredReplicaLeaderElectionPath = "/admin/preferred_replica_election"
  val BrokerSequenceIdPath = "/brokers/seqid"
  val IsrChangeNotificationPath = "/isr_change_notification"
  val EntityConfigPath = "/config"
  val EntityConfigChangesPath = "/config/changes"
  ......
  }

Initializing ZkUtils:

private def initZk(): ZkUtils = {
  info(s"Connecting to zookeeper on ${config.zkConnect}")
  // These values are all read from the broker configuration file
  val chrootIndex = config.zkConnect.indexOf("/")
  val chrootOption = {
    if (chrootIndex > 0) Some(config.zkConnect.substring(chrootIndex))
    else None
  }
  val secureAclsEnabled = config.zkEnableSecureAcls
  val isZkSecurityEnabled = JaasUtils.isZkSecurityEnabled()
  if (secureAclsEnabled && !isZkSecurityEnabled)
    throw new java.lang.SecurityException(s"${KafkaConfig.ZkEnableSecureAclsProp} is true, but the verification of the JAAS login file failed.")
  chrootOption.foreach { chroot =>
    val zkConnForChrootCreation = config.zkConnect.substring(0, chrootIndex)
    val zkClientForChrootCreation = ZkUtils(zkConnForChrootCreation,
                                            config.zkSessionTimeoutMs,
                                            config.zkConnectionTimeoutMs,
                                            secureAclsEnabled)
    zkClientForChrootCreation.makeSurePersistentPathExists(chroot)
    info(s"Created zookeeper path $chroot")
    zkClientForChrootCreation.zkClient.close()
  }
  // Instantiate the ZkUtils object
  val zkUtils = ZkUtils(config.zkConnect,
                        config.zkSessionTimeoutMs,
                        config.zkConnectionTimeoutMs,
                        secureAclsEnabled)
  zkUtils.setupCommonPaths()
  zkUtils
}
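
To make the chroot handling above concrete, here is a minimal sketch (my own illustration, not Kafka code) of how a chroot suffix in zookeeper.connect is split off with indexOf("/"), exactly as initZk() does:

object ChrootDemo {
  // Split "host1:2181,host2:2181/kafka" into the host list and an optional chroot path
  def splitChroot(zkConnect: String): (String, Option[String]) = {
    val idx = zkConnect.indexOf("/")
    if (idx > 0) (zkConnect.substring(0, idx), Some(zkConnect.substring(idx)))
    else (zkConnect, None)
  }

  def main(args: Array[String]): Unit = {
    println(splitChroot("zk1:2181,zk2:2181/kafka")) // (zk1:2181,zk2:2181,Some(/kafka))
    println(splitChroot("zk1:2181"))                // (zk1:2181,None)
  }
}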

2.2 LogManager Initialization

logManager = createLogManager(zkUtils.zkClient, brokerState)
logManager.startup()

First, the LogManager is instantiated.

private def createLogManager(zkClient: ZkClient, brokerState: BrokerState): LogManager = {
  // Parse the parameters from the configuration file
  val defaultProps = KafkaServer.copyKafkaConfigToLog(config)
  val defaultLogConfig = LogConfig(defaultProps)
 
  val configs = AdminUtils.fetchAllTopicConfigs(zkUtils).map { case (topic, configs) =>
    topic -> LogConfig.fromProps(defaultProps, configs)
  }
  // read the log configurations from zookeeper
  val cleanerConfig = CleanerConfig(numThreads = config.logCleanerThreads,
                                    dedupeBufferSize = config.logCleanerDedupeBufferSize,
                                    dedupeBufferLoadFactor = config.logCleanerDedupeBufferLoadFactor,
                                    ioBufferSize = config.logCleanerIoBufferSize,
                                    maxMessageSize = config.messageMaxBytes,
                                    maxIoBytesPerSecond = config.logCleanerIoMaxBytesPerSecond,
                                    backOffMs = config.logCleanerBackoffMs,
                                    enableCleaner = config.logCleanerEnable)
  // Create the LogManager object
  // In production, logDirs usually maps to multiple directories
  new LogManager(logDirs = config.logDirs.map(new File(_)).toArray,
                 topicConfigs = configs,
                 defaultConfig = defaultLogConfig,
                 cleanerConfig = cleanerConfig,
                 ioThreads = config.numRecoveryThreadsPerDataDir,
                 flushCheckMs = config.logFlushSchedulerIntervalMs,
                 flushCheckpointMs = config.logFlushOffsetCheckpointIntervalMs,
                 retentionCheckMs = config.logCleanupIntervalMs,
                 scheduler = kafkaScheduler,
                 brokerState = brokerState,
                 time = time)
}

Then the LogManager is started; scheduled tasks flush logs to disk and clean up expired logs.

/**
 *  Start the background threads to flush logs and do log cleanup
 */
 // Start the background threads that flush and clean up logs
def startup() {
  // Three tasks are scheduled periodically
  /* Schedule the cleanup task to delete old logs */
  if(scheduler != null) {
    info("Starting log cleanup with a period of %d ms.".format(retentionCheckMs))
    // 1) Periodically check log files and delete those past retention
    scheduler.schedule("kafka-log-retention",
                       cleanupLogs,
                       delay = InitialTaskDelayMs,
                       period = retentionCheckMs,
                       TimeUnit.MILLISECONDS)
    info("Starting log flusher with a default period of %d ms.".format(flushCheckMs))
    // 2) Periodically flush in-memory data to disk
    scheduler.schedule("kafka-log-flusher",
                       flushDirtyLogs,
                       delay = InitialTaskDelayMs,
                       period = flushCheckMs,
                       TimeUnit.MILLISECONDS)
    // 3) Periodically update the recovery-point checkpoint file, used to recover data when the broker restarts
    scheduler.schedule("kafka-recovery-point-checkpoint",
                       checkpointRecoveryPointOffsets,
                       delay = InitialTaskDelayMs,
                       period = flushCheckpointMs,
                       TimeUnit.MILLISECONDS)
  }
  if(cleanerConfig.enableCleaner)
    cleaner.startup()
}
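
As a rough illustration of this scheduling pattern, here is a minimal sketch using the plain JDK scheduler instead of Kafka's KafkaScheduler; the task bodies are placeholders and the intervals are illustrative, not the broker defaults:

import java.util.concurrent.{Executors, TimeUnit}

object LogManagerScheduleDemo {
  private val scheduler = Executors.newScheduledThreadPool(1)

  // Schedule a named task at a fixed period, mirroring scheduler.schedule(...) above
  def schedule(name: String, periodMs: Long)(task: => Unit): Unit = {
    val job = new Runnable { def run(): Unit = { println(s"running $name"); task } }
    scheduler.scheduleAtFixedRate(job, periodMs, periodMs, TimeUnit.MILLISECONDS)
  }

  def main(args: Array[String]): Unit = {
    schedule("kafka-log-retention", 300000L) { /* delete expired segments */ }
    schedule("kafka-log-flusher", 60000L) { /* flush dirty logs to disk */ }
    schedule("kafka-recovery-point-checkpoint", 60000L) { /* write recovery checkpoints */ }
  }
}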

2.3 SocketServer Initialization

// Instantiate the SocketServer
socketServer = new SocketServer(config, metrics, kafkaMetricsTime)
// Start the SocketServer
socketServer.startup()

After the LogManager is up, the SocketServer object is created. It is an NIO socket server whose threading model is built around the Acceptor: one Acceptor thread accepts all new connections, each Acceptor owns multiple Processor threads, and each Processor has its own Selector and is mainly responsible for reading requests from its connections and writing back responses.

/**
 * An NIO socket server. The threading model is
 *   1 Acceptor thread that handles new connections
 *   Acceptor has N Processor threads that each have their own selector and read requests from sockets
 *   M Handler threads that handle requests and produce responses back to the processor threads for writing.
 */
class SocketServer(val config: KafkaConfig, val metrics: Metrics, val time: Time) extends Logging with KafkaMetricsGroup {
 
  private val endpoints = config.listeners
  private val numProcessorThreads = config.numNetworkThreads
  private val maxQueuedRequests = config.queuedMaxRequests
  private val totalProcessorThreads = numProcessorThreads * endpoints.size
 
  private val maxConnectionsPerIp = config.maxConnectionsPerIp
  private val maxConnectionsPerIpOverrides = config.maxConnectionsPerIpOverrides
 
  this.logIdent = "[Socket Server on Broker " + config.brokerId + "], "
 
  val requestChannel = new RequestChannel(totalProcessorThreads, maxQueuedRequests)
  private val processors = new Array[Processor](totalProcessorThreads)
 
  private[network] val acceptors = mutable.Map[EndPoint, Acceptor]()
  private var connectionQuotas: ConnectionQuotas = _
 
  private val allMetricNames = (0 until totalProcessorThreads).map { i =>
    val tags = new util.HashMap[String, String]()
    tags.put("networkProcessor", i.toString)
    metrics.metricName("io-wait-ratio", "socket-server-metrics", tags)
  }

Then the SocketServer is started.

/**
 * Start the socket server
 */
def startup() {
  this.synchronized {
 
    connectionQuotas = new ConnectionQuotas(maxConnectionsPerIp, maxConnectionsPerIpOverrides)
    // Size of the socket send buffer
    val sendBufferSize = config.socketSendBufferBytes
    // Size of the socket receive buffer
    val recvBufferSize = config.socketReceiveBufferBytes
    // The id of the current broker
    val brokerId = config.brokerId
 
    var processorBeginIndex = 0
 
    // Iterate over the endpoints list,
    // e.g. listeners = PLAINTEXT://your.host.name:9092
    endpoints.values.foreach { endpoint =>
      val protocol = endpoint.protocolType
      // numProcessorThreads equals the num.network.threads config, default 3
      val processorEndIndex = processorBeginIndex + numProcessorThreads
      
      for (i <- processorBeginIndex until processorEndIndex)
        // Create the Processor instances for this endpoint (three by default)
        processors(i) = newProcessor(i, connectionQuotas, protocol)
      // Create the Acceptor, which also creates the threads for its Processors
      val acceptor = new Acceptor(endpoint, sendBufferSize, recvBufferSize, brokerId,
        processors.slice(processorBeginIndex, processorEndIndex), connectionQuotas)
        
      acceptors.put(endpoint, acceptor)
      // Create the Acceptor's own thread and start it
      Utils.newThread("kafka-socket-acceptor-%s-%d".format(protocol.toString, endpoint.port), acceptor, false).start()
      // The main thread blocks until the Acceptor thread has fully started
      acceptor.awaitStartup()
      // Advance processorBeginIndex for the next endpoint
      processorBeginIndex = processorEndIndex
    }
  }
 
  newGauge("NetworkProcessorAvgIdlePercent",
    new Gauge[Double] {
      def value = allMetricNames.map( metricName =>
        metrics.metrics().get(metricName).value()).sum / totalProcessorThreads
    }
  )
 
  info("Started " + acceptors.size + " acceptor threads")
}
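
To make the "one acceptor, N processors" hand-off concrete, here is a minimal sketch using plain blocking sockets (my own illustration; the real SocketServer uses NIO selectors and per-processor connection queues):

import java.net.{ServerSocket, Socket}
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

object AcceptorProcessorDemo {
  class Processor(id: Int) extends Runnable {
    // New connections handed over by the acceptor land in this queue
    val newConnections = new ArrayBlockingQueue[Socket](16)
    def run(): Unit = while (true) {
      val socket = newConnections.poll(300, TimeUnit.MILLISECONDS)
      if (socket != null) {
        // The real Processor registers the channel with its own Selector,
        // then reads requests and writes responses; we just close it here.
        socket.close()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val processors = (0 until 3).map(new Processor(_))
    processors.foreach(p => new Thread(p).start())
    val serverSocket = new ServerSocket(9092)   // demo port
    var i = 0
    while (true) {                      // the single acceptor loop
      val conn = serverSocket.accept()  // only accepts; never reads or writes
      processors(i % processors.size).newConnections.put(conn) // round-robin hand-off
      i += 1
    }
  }
}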

2.4 ReplicaManager Initialization

With the LogManager and SocketServer initialized and started, the next step is to initialize and start the ReplicaManager. As the name suggests, this class manages replicas.

replicaManager = new ReplicaManager(config, metrics, time, kafkaMetricsTime, zkUtils, kafkaScheduler, logManager,
  isShuttingDown, quotaManagers.follower)
replicaManager.startup()

This is a core class, responsible for handling produce requests, fetch requests, kicking off replica synchronization, and many other core flows. There is a great deal of material here; to stay on the main line of server startup, we won't expand on it separately.

/**
  * @param config
  * @param metrics
  * @param time
  * @param jTime
  * @param zkUtils
  * @param scheduler
  * @param logManager
  * @param isShuttingDown
  * @param quotaManager
  * @param threadNamePrefix
  */
class ReplicaManager(val config: KafkaConfig,
                     metrics: Metrics,
                     time: Time,
                     jTime: JTime,
                     val zkUtils: ZkUtils,
                     scheduler: Scheduler,
                     val logManager: LogManager,
                     val isShuttingDown: AtomicBoolean,
                     quotaManager: ReplicationQuotaManager,
                     threadNamePrefix: Option[String] = None) extends Logging with KafkaMetricsGroup {
/* epoch of the controller that last changed the leader */
@volatile var controllerEpoch: Int = KafkaController.InitialControllerEpoch - 1
private val localBrokerId = config.brokerId
//TODO
private val allPartitions = new Pool[(String, Int), Partition](valueFactory = Some { case (t, p) =>
  new Partition(t, p, time, this)
})
private val replicaStateChangeLock = new Object
// Manages how follower partitions fetch data from their leader partitions
val replicaFetcherManager = new ReplicaFetcherManager(config, this, metrics, jTime, threadNamePrefix, quotaManager)
private val highWatermarkCheckPointThreadStarted = new AtomicBoolean(false)
val highWatermarkCheckpoints = config.logDirs.map(dir => (new File(dir).getAbsolutePath, new OffsetCheckpoint(new File(dir, ReplicaManager.HighWatermarkFilename)))).toMap
private var hwThreadInitialized = false
this.logIdent = "[Replica Manager on Broker " + localBrokerId + "]: "
val stateChangeLogger = KafkaController.stateChangeLogger
private val isrChangeSet: mutable.Set[TopicAndPartition] = new mutable.HashSet[TopicAndPartition]()
private val lastIsrChangeMs = new AtomicLong(System.currentTimeMillis())
private val lastIsrPropagationMs = new AtomicLong(System.currentTimeMillis())

Starting the ReplicaManager schedules two periodic tasks that maintain the ISR (in-sync replica) set.

def startup() {
  // start ISR expiration thread
  // Periodically scheduled task:
  // check whether any replica in the ISR has fallen too far behind and must be removed
  scheduler.schedule("isr-expiration", maybeShrinkIsr, period = config.replicaLagTimeMaxMs, unit = TimeUnit.MILLISECONDS)
  // Periodically check whether any topic-partition's ISR has changed; if so, write it to ZK to notify the controller
  scheduler.schedule("isr-change-propagation", maybePropagateIsrChanges, period = 2500L, unit = TimeUnit.MILLISECONDS)
}
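
The shrink check is time-based: a follower that has not caught up with the leader within replica.lag.time.max.ms is dropped from the ISR. A minimal sketch of that rule, with hypothetical types of my own (not Kafka code):

object IsrShrinkDemo {
  case class Replica(brokerId: Int, lastCaughtUpTimeMs: Long)

  // Keep only replicas that caught up with the leader within the allowed lag window
  def shrinkIsr(isr: Set[Replica], nowMs: Long, maxLagMs: Long): Set[Replica] =
    isr.filter(r => nowMs - r.lastCaughtUpTimeMs <= maxLagMs)

  def main(args: Array[String]): Unit = {
    val isr = Set(Replica(1, lastCaughtUpTimeMs = 9000L), Replica(2, lastCaughtUpTimeMs = 1000L))
    println(shrinkIsr(isr, nowMs = 10000L, maxLagMs = 5000L)) // keeps broker 1 only
  }
}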

2.5 KafkaController Initialization

After the ReplicaManager has started, the broker is ready to join the cluster, which involves electing the controller node.

kafkaController = new KafkaController(config, zkUtils, brokerState, kafkaMetricsTime, metrics, threadNamePrefix)

// Startup here involves the election of the Kafka controller, which then serves the whole cluster
kafkaController.startup()

The KafkaController is responsible for the controller election and for managing the partition and replica state machines.

class KafkaController(val config : KafkaConfig, zkUtils: ZkUtils, val brokerState: BrokerState, time: Time, metrics: Metrics, threadNamePrefix: Option[String] = None) extends Logging with KafkaMetricsGroup {
  this.logIdent = "[Controller " + config.brokerId + "]: "
  private var isRunning = true
  private val stateChangeLogger = KafkaController.stateChangeLogger
  val controllerContext = new ControllerContext(zkUtils, config.zkSessionTimeoutMs)
  val partitionStateMachine = new PartitionStateMachine(this)
  val replicaStateMachine = new ReplicaStateMachine(this)
  private val controllerElector = new ZookeeperLeaderElector(controllerContext, ZkUtils.ControllerPath, onControllerFailover,
    onControllerResignation, config.brokerId)
  // have a separate scheduler for the controller to be able to start and stop independently of the
  // kafka server
  private val autoRebalanceScheduler = new KafkaScheduler(1)
  var deleteTopicManager: TopicDeletionManager = null
  val offlinePartitionSelector = new OfflinePartitionLeaderSelector(controllerContext, config)
  private val reassignedPartitionLeaderSelector = new ReassignedPartitionLeaderSelector(controllerContext)
  private val preferredReplicaPartitionLeaderSelector = new PreferredReplicaPartitionLeaderSelector(controllerContext)
  private val controlledShutdownPartitionLeaderSelector = new ControlledShutdownLeaderSelector(controllerContext)
  private val brokerRequestBatch = new ControllerBrokerRequestBatch(this)
 
  private val partitionReassignedListener = new PartitionsReassignedListener(this)
  private val preferredReplicaElectionListener = new PreferredReplicaElectionListener(this)
  private val isrChangeNotificationListener = new IsrChangeNotificationListener(this)
  ......
  }

Starting the KafkaController:

/**
 * Invoked when the controller module of a Kafka server is started up. This does not assume that the current broker
 * is the controller. It merely registers the session expiration listener and starts the controller leader
 * elector
 */
def startup() = {
  inLock(controllerContext.controllerLock) {
    info("Controller starting up")
    // Register the ZK connection state callback that handles session expiration
    registerSessionExpirationListener()
    isRunning = true
    // Start leader election and failover handling
    controllerElector.startup
    info("Controller startup complete")
  }
}
 
def startup {
  inLock(controllerContext.controllerLock) {
    // Watch for data changes on ZK's /controller path
    controllerContext.zkUtils.zkClient.subscribeDataChanges(electionPath, leaderChangeListener)
    // Run the election
    elect
  }
}
 
def elect: Boolean = {
  val timestamp = SystemTime.milliseconds.toString
  val electString = Json.encode(Map("version" -> 1, "brokerid" -> brokerId, "timestamp" -> timestamp))
  // Get the current controller id; right after initialization leaderId = -1
  leaderId = getControllerID
  /*
   * We can get here during the initial startup and the handleDeleted ZK callback. Because of the potential race condition,
   * it's possible that the controller has already been elected when we get here. This check will prevent the following
   * createEphemeralPath method from getting into an infinite loop if this broker is already the controller.
   */
  if(leaderId != -1) {
     debug("Broker %d has been elected as leader, so stopping the election process.".format(leaderId))
    // Reaching here means an election has already completed
     return amILeader
  }
  try {
    // Prepare the ephemeral node /controller
    val zkCheckedEphemeral = new ZKCheckedEphemeral(electionPath,
                                                    electString,
                                                    controllerContext.zkUtils.zkConnection.getZookeeper,
                                                    JaasUtils.isZkSecurityEnabled())
    // Create the node
    zkCheckedEphemeral.create()
    info(brokerId + " successfully elected as leader")
    // Record this broker's id as the leaderId
    leaderId = brokerId
    // Successfully became controller: run the callback to notify the other brokers and start the state machines
    onBecomingLeader()
  } catch {
    case e: ZkNodeExistsException =>
      // If someone else has written the path, then
      leaderId = getControllerID
      if (leaderId != -1)
        debug("Broker %d was elected as leader instead of broker %d".format(leaderId, brokerId))
      else
        warn("A leader has been elected but just resigned, this will result in another round of election")
    case e2: Throwable =>
      error("Error while electing or becoming leader on broker %d".format(brokerId), e2)
      resign()
  }
  amILeader
}
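
The pattern above is a classic ZooKeeper ephemeral-node election: whoever creates /controller first wins, and the node vanishes automatically if that broker's session dies, triggering re-election. Here is a minimal sketch with the raw ZooKeeper client (my own illustration; Kafka wraps this in ZKCheckedEphemeral, and the real payload also carries a timestamp):

import org.apache.zookeeper.{CreateMode, KeeperException, ZooDefs, ZooKeeper}

object ControllerElectionDemo {
  def tryElect(zk: ZooKeeper, brokerId: Int): Boolean = {
    val payload = s"""{"version":1,"brokerid":$brokerId}""".getBytes("UTF-8")
    try {
      // Only one creator can succeed; the node disappears with the session
      zk.create("/controller", payload, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL)
      true
    } catch {
      case _: KeeperException.NodeExistsException =>
        false // someone else is already the controller; watch the node to retry later
    }
  }

  def main(args: Array[String]): Unit = {
    val zk = new ZooKeeper("localhost:2181", 6000, null)
    println(s"elected: ${tryElect(zk, brokerId = 1)}")
    zk.close()
  }
}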

3. Receiving & Handling Upstream Requests

3.1 KafkaRequestHandlerPool

Once server initialization is complete and producers have established connections, the KafkaRequestHandlerPool comes into play: it is the IO thread pool whose threads are responsible for processing requests.

// The pool that processes requests from the request queue
requestHandlerPool = new KafkaRequestHandlerPool(config.brokerId, socketServer.requestChannel, apis, config.numIoThreads)
Let's look at the KafkaRequestHandlerPool class.
class KafkaRequestHandlerPool(val brokerId: Int,
                              val requestChannel: RequestChannel,
                              val apis: KafkaApis,
                              numThreads: Int) extends Logging with KafkaMetricsGroup {
 
  /* a meter to track the average free capacity of the request handlers */
  private val aggregateIdleMeter = newMeter("RequestHandlerAvgIdlePercent", "percent", TimeUnit.NANOSECONDS)
 
  this.logIdent = "[Kafka Request Handler on Broker " + brokerId + "], "
  val threads = new Array[Thread](numThreads)
  val runnables = new Array[KafkaRequestHandler](numThreads)
  // Starts 8 threads by default; change the count in the broker config as needed
  for(i <- 0 until numThreads) {
    // Create the handler runnable
    runnables(i) = new KafkaRequestHandler(i, brokerId, aggregateIdleMeter, numThreads, requestChannel, apis)
    threads(i) = Utils.daemonThread("kafka-request-handler-" + i, runnables(i))
    // Start the thread
    threads(i).start()
  }
 
  def shutdown() {
    info("shutting down")
    for(handler <- runnables)
      handler.shutdown
    for(thread <- threads)
      thread.join
    info("shut down completely")
  }
}

As you can see, KafkaRequestHandlerPool creates as many handler threads as the configured thread count, runs each as a daemon thread, and starts them all. Let's continue with KafkaRequestHandler's run() method; barring surprises, that is where the request-processing logic lives.

3.2 KafkaRequestHandler

/**
 * A thread that answers kafka requests.
 */
class KafkaRequestHandler(id: Int,
                          brokerId: Int,
                          val aggregateIdleMeter: Meter,
                          val totalHandlerThreads: Int,
                          val requestChannel: RequestChannel,
                          apis: KafkaApis) extends Runnable with Logging {
  this.logIdent = "[Kafka Request Handler " + id + " on Broker " + brokerId + "], "
 
  def run() {
    while(true) {
      try {
        var req : RequestChannel.Request = null
        while (req == null) {
          // We use a single meter for aggregate idle percentage for the thread pool.
          // Since meter is calculated as total_recorded_value / time_window and
          // time_window is independent of the number of threads, each recorded idle
          // time should be discounted by # threads.
          val startSelectTime = SystemTime.nanoseconds
          // Poll for a request object
          req = requestChannel.receiveRequest(300)
          val idleTime = SystemTime.nanoseconds - startSelectTime
          aggregateIdleMeter.mark(idleTime / totalHandlerThreads)
        }
 
        if(req eq RequestChannel.AllDone) {
          debug("Kafka request handler %d on broker %d received shut down command".format(
            id, brokerId))
          return
        }
        req.requestDequeueTimeMs = SystemTime.milliseconds
        trace("Kafka request handler %d on broker %d handling request %s".format(id, brokerId, req))
        // Hand the request to KafkaApis for the actual processing
        apis.handle(req)
      } catch {
        case e: Throwable => error("Exception when handling request", e)
      }
    }
  }
 
  def shutdown(): Unit = requestChannel.sendRequest(RequestChannel.AllDone)
}

The while(true) loop keeps the thread running, continuously pulling requests from the RequestChannel.

/** Get the next request or block until specified time has elapsed */
def receiveRequest(timeout: Long): RequestChannel.Request =
   // Take a Request from the queue, blocking up to the timeout
  requestQueue.poll(timeout, TimeUnit.MILLISECONDS)
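
The channel is essentially a bounded blocking queue sitting between the network threads and the IO threads. A minimal sketch of that structure (assumed shape, not the real RequestChannel):

import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

object RequestChannelDemo {
  case class Request(apiKey: Int, body: String)

  // Bounded like the real channel, whose capacity comes from queued.max.requests
  private val requestQueue = new ArrayBlockingQueue[Request](500)

  def sendRequest(req: Request): Unit = requestQueue.put(req)   // called by Processors
  def receiveRequest(timeoutMs: Long): Request =
    requestQueue.poll(timeoutMs, TimeUnit.MILLISECONDS)         // called by handler threads

  def main(args: Array[String]): Unit = {
    sendRequest(Request(0, "produce"))
    println(receiveRequest(300)) // Request(0,produce)
  }
}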

Finally, KafkaApis' handle() method performs the actual processing.

/**
 * Top-level method that handles all requests and multiplexes to the right api
 */
 // Dispatch the request according to its ApiKeys type
def handle(request: RequestChannel.Request) {
  try {
    trace("Handling request:%s from connection %s;securityProtocol:%s,principal:%s".
      format(request.requestDesc(true), request.connectionId, request.securityProtocol, request.session.principal))
    ApiKeys.forId(request.requestId) match {
      // Handle requests sent by producers
      case ApiKeys.PRODUCE => handleProducerRequest(request)
      // Handle fetch requests, e.g. followers pulling data to stay in sync
      case ApiKeys.FETCH => handleFetchRequest(request)
      case ApiKeys.LIST_OFFSETS => handleOffsetRequest(request)
      case ApiKeys.METADATA => handleTopicMetadataRequest(request)
      case ApiKeys.LEADER_AND_ISR => handleLeaderAndIsrRequest(request)
      case ApiKeys.STOP_REPLICA => handleStopReplicaRequest(request)
      // Handle metadata update requests
      case ApiKeys.UPDATE_METADATA_KEY => handleUpdateMetadataRequest(request)
      case ApiKeys.CONTROLLED_SHUTDOWN_KEY => handleControlledShutdownRequest(request)
      case ApiKeys.OFFSET_COMMIT => handleOffsetCommitRequest(request)
      case ApiKeys.OFFSET_FETCH => handleOffsetFetchRequest(request)
      case ApiKeys.GROUP_COORDINATOR => handleGroupCoordinatorRequest(request)
      case ApiKeys.JOIN_GROUP => handleJoinGroupRequest(request)
      case ApiKeys.HEARTBEAT => handleHeartbeatRequest(request)
      case ApiKeys.LEAVE_GROUP => handleLeaveGroupRequest(request)
      case ApiKeys.SYNC_GROUP => handleSyncGroupRequest(request)
      case ApiKeys.DESCRIBE_GROUPS => handleDescribeGroupRequest(request)
      case ApiKeys.LIST_GROUPS => handleListGroupsRequest(request)
      case ApiKeys.SASL_HANDSHAKE => handleSaslHandshakeRequest(request)
      case ApiKeys.API_VERSIONS => handleApiVersionsRequest(request)
      case ApiKeys.CREATE_TOPICS => handleCreateTopicsRequest(request)
      case ApiKeys.DELETE_TOPICS => handleDeleteTopicsRequest(request)
      case requestId => throw new KafkaException("Unknown api code " + requestId)
    }
  } catch {
    case e: Throwable =>
      if (request.requestObj != null) {
        request.requestObj.handleError(e, requestChannel, request)
        error("Error when handling request %s".format(request.requestObj), e)
      } else {
        val response = request.body.getErrorResponse(request.header.apiVersion, e)
        val respHeader = new ResponseHeader(request.header.correlationId)
 
        /* If request doesn't have a default error response, we just close the connection.
           For example, when produce request has acks set to 0 */
        if (response == null)
          requestChannel.closeConnection(request.processor, request)
        else
          requestChannel.sendResponse(new Response(request, new ResponseSend(request.connectionId, respHeader, response)))
 
        error("Error when handling request %s".format(request.body), e)
      }
  } finally
    request.apiLocalCompleteTimeMs = SystemTime.milliseconds
}
Handling producer requests: handleProducerRequest().
/**
 * Handle a produce request
 */
def handleProducerRequest(request: RequestChannel.Request) {
  // Cast RequestChannel.Request to ProduceRequest
  val produceRequest = request.body.asInstanceOf[ProduceRequest]
  // Compute the request size in bytes
  val numBytesAppended = request.header.sizeOf + produceRequest.sizeOf
  // Partition the records by topic existence and Describe permission
  val (existingAndAuthorizedForDescribeTopics, nonExistingOrUnauthorizedForDescribeTopics) = produceRequest.partitionRecords.asScala.partition {
    // Check authorization and whether the topic exists in the metadata cache
    case (topicPartition, _) => authorize(request.session, Describe, new Resource(auth.Topic, topicPartition.topic)) && metadataCache.contains(topicPartition.topic)
  }
  // Check Write permission
  val (authorizedRequestInfo, unauthorizedForWriteRequestInfo) = existingAndAuthorizedForDescribeTopics.partition {
    case (topicPartition, _) => authorize(request.session, Write, new Resource(auth.Topic, topicPartition.topic))
  }
 
  // Callback that sends the produce response; effectively DelayedProduce's responseCallback()
  // the callback for sending a produce response
  def sendResponseCallback(responseStatus: Map[TopicPartition, PartitionResponse]) {
 
    val mergedResponseStatus = responseStatus ++
      unauthorizedForWriteRequestInfo.mapValues(_ =>
         new PartitionResponse(Errors.TOPIC_AUTHORIZATION_FAILED.code, -1, Message.NoTimestamp)) ++
      nonExistingOrUnauthorizedForDescribeTopics.mapValues(_ =>
         new PartitionResponse(Errors.UNKNOWN_TOPIC_OR_PARTITION.code, -1, Message.NoTimestamp))
 
    var errorInResponse = false
 
    mergedResponseStatus.foreach { case (topicPartition, status) =>
      if (status.errorCode != Errors.NONE.code) {
        errorInResponse = true
        debug("Produce request with correlation id %d from client %s on partition %s failed due to %s".format(
          request.header.correlationId,
          request.header.clientId,
          topicPartition,
          Errors.forCode(status.errorCode).exceptionName))
      }
    }
 
    def produceResponseCallback(delayTimeMs: Int) {
      // Kafka acks semantics:
      // 0: fire and forget; the producer does not wait for any broker acknowledgement
      // 1: the producer waits until the leader has successfully written the message
      // -1 (all): the producer additionally waits for the followers' (ISR) acknowledgement
      // With acks = 0 there is no need to wait for a broker response
      if (produceRequest.acks == 0) {
        // no operation needed if producer request.required.acks = 0; however, if there is any error in handling
        // the request, since no response is expected by the producer, the server will close socket server so that
        // the producer client will know that some error has happened and will refresh its metadata
        if (errorInResponse) {
          val exceptionsSummary = mergedResponseStatus.map { case (topicPartition, status) =>
            topicPartition -> Errors.forCode(status.errorCode).exceptionName
          }.mkString(", ")
          info(
            s"Closing connection due to error during produce request with correlation id ${request.header.correlationId} " +
              s"from client id ${request.header.clientId} with ack=0\n" +
              s"Topic and partition to exceptions: $exceptionsSummary"
          )
          requestChannel.closeConnection(request.processor, request)
        } else {
          requestChannel.noOperation(request.processor, request)
        }
      } else { // acks != 0
        // Build the response header
        val respHeader = new ResponseHeader(request.header.correlationId)
        // Create the ProduceResponse that wraps the response body
        val respBody = request.header.apiVersion match {
          case 0 => new ProduceResponse(mergedResponseStatus.asJava)
          case version@(1 | 2) => new ProduceResponse(mergedResponseStatus.asJava, delayTimeMs, version)
          // This case shouldn't happen unless a new version of ProducerRequest is added without
          // updating this part of the code to handle it properly.
          case version => throw new IllegalArgumentException(s"Version `$version` of ProduceRequest is not handled. Code must be updated.")
        }
        // Send the response back to the producer
        requestChannel.sendResponse(new RequestChannel.Response(request, new ResponseSend(request.connectionId, respHeader, respBody)))
      }
    }
 
    // When this callback is triggered, the remote API call has completed
    request.apiRemoteCompleteTimeMs = SystemTime.milliseconds
 
    quotas.produce.recordAndMaybeThrottle(
      request.session.sanitizedUser,
      request.header.clientId,
      numBytesAppended,
      produceResponseCallback)
  }
 
  if (authorizedRequestInfo.isEmpty)
  // If no partition in the request is writable, invoke sendResponseCallback() directly with empty data
    sendResponseCallback(Map.empty)
  else {
    val internalTopicsAllowed = request.header.clientId == AdminUtils.AdminClientId
 
    // Convert ByteBuffer to ByteBufferMessageSet
    val authorizedMessagesPerPartition = authorizedRequestInfo.map {
      case (topicPartition, buffer) => (topicPartition, new ByteBufferMessageSet(buffer))
    }
 
    // call the replica manager to append messages to the replicas
    /**
      * Append the messages and persist them to disk
      */
    replicaManager.appendMessages(
      produceRequest.timeout.toLong,
      produceRequest.acks,
      internalTopicsAllowed,
      authorizedMessagesPerPartition,
      sendResponseCallback)
 
    // if the request is put into the purgatory, it will have a held reference
    // and hence cannot be garbage collected; hence we clear its data here in
    // order to let GC re-claim its memory since it is already appended to log
    // Clear the ProduceRequest's internal buffers so GC can reclaim them
     produceRequest.clearPartitionRecords()
  }
}
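
For reference, the acks setting discussed in produceResponseCallback() above is controlled on the client side. A short sketch of a producer configured with acks = 1 (broker address and topic are placeholders):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object AcksDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "1") // "0": fire and forget; "1": leader ack; "-1"/"all": full ISR ack
    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("demo-topic", "key", "value"))
    producer.close()
  }
}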

The above creates the request-handling thread instances; handle() dispatches each request by its ApiKeys type to the corresponding logic. Feel free to trace each branch yourself; here we only followed the producer-request path. The threads are then started.

threads(i) = Utils.daemonThread("kafka-request-handler-" + i, runnables(i))
// Start the thread
threads(i).start()

Worth noting: it's the familiar recipe again. KafkaRequestHandler gets one more layer of wrapping for flexibility and runs as a background (daemon) thread.

/**
 * Create a daemon thread
 * @param name The name of the thread
 * @param runnable The runnable to execute in the background
 * @return The unstarted thread
 */
public static Thread daemonThread(String name, Runnable runnable) {
    return newThread(name, runnable, true);
}
 
/**
 * Create a new thread
 * @param name The name of the thread
 * @param runnable The work for the thread to do
 * @param daemon Should the thread block JVM shutdown?
 * @return The unstarted thread
 */
public static Thread newThread(String name, Runnable runnable, boolean daemon) {
    Thread thread = new Thread(runnable, name);
    thread.setDaemon(daemon);
    thread.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
        public void uncaughtException(Thread t, Throwable e) {
            log.error("Uncaught exception in thread '" + t.getName() + "':", e);
        }
    });
    return thread;
}

4. Summary

This article looked at startup from the Kafka server's perspective: the initialization of each dependent component, which establishes network connectivity with upstream clients, and then how the server receives producer requests once connections are in place and how it dispatches and processes them. I hope it gives you some fresh insight; more in the next installment.
