当前,Controller 只会向 Broker 发送三类请求,分别是 LeaderAndIsrRequest、StopReplicaRequest 和 UpdateMetadataRequest。
- LeaderAndIsrRequest:告诉 Broker 相关主题各个分区的 Leader 副本位于哪台 Broker 上、ISR 中的副本都在哪些 Broker 上。
- StopReplicaRequest:告知指定 Broker 停止它上面的副本对象,该请求甚至还能删除副本底层的日志数据。这个请求主要的使用场景,是分区副本迁移和删除主题。在这两个场景下,都要涉及停掉 Broker 上的副本操作。
- UpdateMetadataRequest:顾名思义,该请求会更新 Broker 上的元数据缓存。集群上的所有元数据变更,都首先发生在 Controller 端,然后再经由这个请求广播给集群上的所有 Broker。
这三类请求 Java 类的定义就封装在 clients 中,它们的抽象基类是 AbstractControlRequest 类,看 AbstractControlRequest 类的主要代码。
public abstract class AbstractControlRequest extends AbstractRequest {
public static final long UNKNOWN_BROKER_EPOCH = -1L;
public static abstract class Builder<T extends AbstractRequest> extends AbstractRequest.Builder<T> {
protected final int controllerId; // Controller 所在的 Broker ID。
protected final int controllerEpoch; // Controller 的版本信息。
protected final long brokerEpoch; // 目标 Broker 的 Epoch。
protected Builder(ApiKeys api, short version, int controllerId, int controllerEpoch, long brokerEpoch) {
super(api, version);
this.controllerId = controllerId;
this.controllerEpoch = controllerEpoch;
this.brokerEpoch = brokerEpoch;
}
}
}
public class LeaderAndIsrRequest extends AbstractControlRequest { ...... }
public class StopReplicaRequest extends AbstractControlRequest { ...... }
public class UpdateMetadataRequest extends AbstractControlRequest { ...... }
RequestSendThread 线程
Controller 事件处理线程负责向这个队列写入待发送的请求,而一个名为 RequestSendThread 的线程负责执行真正的请求发送。
Controller 往阻塞队列上放的数据由源码中的 QueueItem 类定义的。
ps.这里的“<:”符号,它在 Scala 中表示上边界的意思,即字段 request 必须是 AbstractControlRequest 的子类,也就是上面说到的那三类请求。
case class QueueItem(apiKey: ApiKeys, request: AbstractControlRequest.Builder[_ <: AbstractControlRequest], callback: AbstractResponse => Unit, enqueueTimeMs: Long)
每个 QueueItem 的核心字段都是 AbstractControlRequest.Builder 对象。它就是阻塞队列上 AbstractControlRequest 类型。
RequestSendThread 类的定义
class RequestSendThread(val controllerId: Int, // Controller所在Broker的Id
val controllerContext: ControllerContext, // Controller元数据信息
val queue: BlockingQueue[QueueItem], // 请求阻塞队列
val networkClient: NetworkClient, // 用于执行发送的网络I/O类
val brokerNode: Node, // 目标Broker节点
val config: KafkaConfig, // Kafka配置信息
val time: Time,
val requestRateAndQueueTimeMetrics: Timer,
val stateChangeLogger: StateChangeLogger,
name: String)
extends ShutdownableThread(name = name) {
}
RequestSendThread 最重要的是它的 doWork 方法
override def doWork(): Unit = {
def backoff(): Unit = pause(100, TimeUnit.MILLISECONDS)
// 获取缓冲队列中的 QueueItem 对象,封装了请求类型、请求对象,以及响应回调函数
val QueueItem(apiKey, requestBuilder, callback, enqueueTimeMs) = queue.take()
// 更新指标
requestRateAndQueueTimeMetrics.update(time.milliseconds() - enqueueTimeMs, TimeUnit.MILLISECONDS)
var clientResponse: ClientResponse = null
try {
var isSendSuccessful = false // 标识请求是否发送成功
// 当 broker 节点宕机后,会触发 ZK 的监听器调用 removeBroker 方法停止当前线程,在停止前会一直尝试重试
while (isRunning && !isSendSuccessful) {
// if a broker goes down for a long time, then at some point the controller's zookeeper listener will trigger a
// removeBroker which will invoke shutdown() on this thread. At that point, we will stop retrying.
try {
// 如果没有创建与目标Broker的TCP连接,或连接暂时不可用
// 阻塞等待指定 broker 节点是否允许接收请求
if (!brokerReady()) {
isSendSuccessful = false
backoff() // 等待重试
}
else {
val clientRequest = networkClient.newClientRequest(brokerNode.idString, requestBuilder,
time.milliseconds(), true)
// 发送请求,等待接收Response
// sendAndReceive 方法在发送完请求之后,会原地进入阻塞状态,等待 Response 返回
clientResponse = NetworkClientUtils.sendAndReceive(networkClient, clientRequest, time)
// 标记发送成
isSendSuccessful = true
}
} catch {
case e: Throwable => // if the send was not successful, reconnect to broker and resend the message
warn(s"Controller $controllerId epoch ${controllerContext.epoch} fails to send request $requestBuilder " +
s"to broker $brokerNode. Reconnecting to broker.", e)
// 如果出现异常,关闭与对应Broker的连接
networkClient.close(brokerNode.idString)
isSendSuccessful = false
backoff()
}
}
// 如果接收到了Response,解析响应
if (clientResponse != null) {
val requestHeader = clientResponse.requestHeader
// 解析请求类型
val api = requestHeader.apiKey
// 此Response的请求类型必须是LeaderAndIsrRequest、StopReplicaRequest或UpdateMetadataRequest中的一种
if (api != ApiKeys.LEADER_AND_ISR && api != ApiKeys.STOP_REPLICA && api != ApiKeys.UPDATE_METADATA)
throw new KafkaException(s"Unexpected apiKey received: $apiKey")
// 执行响应回调函数
val response = clientResponse.responseBody
stateChangeLogger.withControllerEpoch(controllerContext.epoch).trace(s"Received response " +
s"${response.toString(requestHeader.apiVersion)} for request $api with correlation id " +
s"${requestHeader.correlationId} sent to broker $brokerNode")
if (callback != null) {
// 处理回调
callback(response)
}
}
} catch {
case e: Throwable =>
error(s"Controller $controllerId fails to send a request to broker $brokerNode", e)
// If there is any socket error (eg, socket timeout), the connection is no longer usable and needs to be recreated.
networkClient.close(brokerNode.idString)
}
}
doWork 的逻辑很直观。它的主要作用是从阻塞队列中取出待发送的请求,然后把它发送出去,之后等待 Response 的返回。在等待 Response 的过程中,线程将一直处于阻塞状态。当接收到 Response 之后,调用 callback 执行请求处理完成后的回调逻辑。
ControllerChannelManager
ControllerChannelManager有两个主要任务。
- 管理 Controller 与集群 Broker 之间的连接,并为每个 Broker 创建 RequestSendThread 线程实例;
- 将要发送的请求放入到指定 Broker 的阻塞队列中,等待 RequestSendThread 线程进行处理。
ControllerChannelManager 类最重要的数据结构是 brokerStateInfo
protected val brokerStateInfo = new HashMap[Int, ControllerBrokerStateInfo]
ControllerBrokerStateInfo的定义如下
case class ControllerBrokerStateInfo(networkClient: NetworkClient,
brokerNode: Node, // 目标 Broker 节点对象,里面封装了目标 Broker 的连接信息,比如主机名、端口号等。
messageQueue: BlockingQueue[QueueItem], // 请求消息阻塞队列。你可以发现,Controller 为每个目标 Broker 都创建了一个消息队列。
requestSendThread: RequestSendThread, // Controller 使用这个线程给目标 Broker 发送请求。
queueSizeGauge: Gauge[Int],
requestRateAndTimeMetrics: Timer,
reconfigurableChannelBuilder: Option[Reconfigurable])
它定义了 5 个 public 方法
- startup 方法:Controller 组件在启动时,会调用 ControllerChannelManager 的 startup 方法。该方法会从元数据信息中找到集群的 Broker 列表,然后依次为它们调用
- addBroker 方法,把它们加到 brokerStateInfo 变量中,最后再依次启动 brokerStateInfo 中的 RequestSendThread 线程。
- shutdown 方法:关闭所有 RequestSendThread 线程,并清空必要的资源。
- sendRequest 方法:发送请求,实际上就是把请求对象提交到请求队列。
- addBroker 方法:添加目标 Broker 到 brokerStateInfo 数据结构中,并创建必要的配套资源,如请求队列、RequestSendThread 线程对象等。最后,RequestSendThread 启动线程。
- removeBroker 方法:从 brokerStateInfo 移除目标 Broker 的相关数据。
重点说下addBroker,以及底层相关的私有方法 addNewBroker 和 startRequestSendThread 方法。
def addBroker(broker: Broker): Unit = {
// be careful here. Maybe the startup() API has already started the request send thread
brokerLock synchronized {
// 如果该Broker是新Broker的话
if (!brokerStateInfo.contains(broker.id)) {
// 将新Broker加入到Controller管理,并创建对应的RequestSendThread线程
addNewBroker(broker)
// 启动RequestSendThread线程
startRequestSendThread(broker.id)
}
}
}
addNewBroker 的关键在于,要为目标 Broker 创建一系列的配套资源,比如,NetworkClient 用于网络 I/O 操作、messageQueue 用于阻塞队列、requestThread 用于发送请求,等等。
private def addNewBroker(broker: Broker): Unit = {
// 为该Broker构造请求阻塞队列
val messageQueue = new LinkedBlockingQueue[QueueItem]
debug(s"Controller ${config.brokerId} trying to connect to broker ${broker.id}")
val controllerToBrokerListenerName = config.controlPlaneListenerName.getOrElse(config.interBrokerListenerName)
val controllerToBrokerSecurityProtocol = config.controlPlaneSecurityProtocol.getOrElse(config.interBrokerSecurityProtocol)
// 获取待连接Broker节点对象信息
val brokerNode = broker.node(controllerToBrokerListenerName)
val logContext = new LogContext(s"[Controller id=${config.brokerId}, targetBrokerId=${brokerNode.idString}] ")
// 创建网络连接客户端
val (networkClient, reconfigurableChannelBuilder) = {
val channelBuilder = ChannelBuilders.clientChannelBuilder(
controllerToBrokerSecurityProtocol,
JaasContext.Type.SERVER,
config,
controllerToBrokerListenerName,
config.saslMechanismInterBrokerProtocol,
time,
config.saslInterBrokerHandshakeRequestEnable
)
val reconfigurableChannelBuilder = channelBuilder match {
case reconfigurable: Reconfigurable =>
config.addReconfigurable(reconfigurable)
Some(reconfigurable)
case _ => None
}
// 创建NIO Selector实例用于网络数据传输
val selector = new Selector(
NetworkReceive.UNLIMITED,
Selector.NO_IDLE_TIMEOUT_MS,
metrics,
time,
"controller-channel",
Map("broker-id" -> brokerNode.idString).asJava,
false,
channelBuilder,
logContext
)
// 创建NetworkClient实例
// NetworkClient类是Kafka clients工程封装的顶层网络客户端API
// 提供了丰富的方法实现网络层IO数据传输
val networkClient = new NetworkClient(
selector,
new ManualMetadataUpdater(Seq(brokerNode).asJava),
config.brokerId.toString,
1,
0,
0,
Selectable.USE_DEFAULT_BUFFER_SIZE,
Selectable.USE_DEFAULT_BUFFER_SIZE,
config.requestTimeoutMs,
ClientDnsLookup.DEFAULT,
time,
false,
new ApiVersions,
logContext
)
(networkClient, reconfigurableChannelBuilder)
}
// 为这个RequestSendThread线程设置线程名称
val threadName = threadNamePrefix match {
case None => s"Controller-${config.brokerId}-to-broker-${broker.id}-send-thread"
case Some(name) => s"$name:Controller-${config.brokerId}-to-broker-${broker.id}-send-thread"
}
// 构造请求处理速率监控指标
val requestRateAndQueueTimeMetrics = newTimer(
RequestRateAndQueueTimeMetricName, TimeUnit.MILLISECONDS, TimeUnit.SECONDS, brokerMetricTags(broker.id)
)
// 创建RequestSendThread实例
val requestThread = new RequestSendThread(config.brokerId, controllerContext, messageQueue, networkClient,
brokerNode, config, time, requestRateAndQueueTimeMetrics, stateChangeLogger, threadName)
requestThread.setDaemon(false)
val queueSizeGauge = newGauge(
QueueSizeMetricName,
new Gauge[Int] {
def value: Int = messageQueue.size
},
brokerMetricTags(broker.id)
)
// 创建该Broker专属的ControllerBrokerStateInfo实例
// 并将其加入到brokerStateInfo统一管理
brokerStateInfo.put(broker.id, ControllerBrokerStateInfo(networkClient, brokerNode, messageQueue,
requestThread, queueSizeGauge, requestRateAndQueueTimeMetrics, reconfigurableChannelBuilder))
}
startRequestSendThread就比较简单了
protected def startRequestSendThread(brokerId: Int): Unit = {
// 获取指定Broker的专属RequestSendThread实例
val requestThread = brokerStateInfo(brokerId).requestSendThread
if (requestThread.getState == Thread.State.NEW)
// 启动线程
requestThread.start()
}