KafkaApis
KafkaApis is the central dispatch component for handling Kafka protocol requests. Constructing it depends on the following components:
apis = new KafkaApis(socketServer.requestChannel, replicaManager,
consumerCoordinator,
kafkaController, zkUtils, config.brokerId, config, metadataCache, metrics,
authorizer)
Its core processing is driven by the handle function in KafkaApis, which dispatches each request to the appropriate handler.
Request handler pool
After the KafkaApis instance is created, a KafkaRequestHandlerPool instance is created as well.
This pool actually processes Kafka requests and depends on the following components and configuration:
Configuration num.io.threads (default 8): the number of threads used to handle I/O operations.
requestHandlerPool = new KafkaRequestHandlerPool(config.brokerId,
socketServer.requestChannel, apis, config.numIoThreads)
Based on the number of I/O threads, a corresponding KafkaRequestHandler thread is created for each slot:
this.logIdent = "[Kafka Request Handler on Broker " + brokerId + "], "
val threads = new Array[Thread](numThreads)
val runnables = new Array[KafkaRequestHandler](numThreads)
for(i <- 0 until numThreads) {
runnables(i) = new KafkaRequestHandler(i, brokerId, aggregateIdleMeter, numThreads, requestChannel, apis)
threads(i) = Utils.daemonThread("kafka-request-handler-" + i, runnables(i))
threads(i).start()
}
Next, look at the KafkaRequestHandler thread:
def run() {
while(true) {
try {
// Take a request from the request queue and hand it directly to KafkaApis.
var req : RequestChannel.Request = null
while (req == null) {
// We use a single meter for aggregate idle percentage for the thread pool.
// Since meter is calculated as total_recorded_value / time_window and
// time_window is independent of the number of threads, each recorded idle
// time should be discounted by # threads.
val startSelectTime = SystemTime.nanoseconds
req = requestChannel.receiveRequest(300)
val idleTime = SystemTime.nanoseconds - startSelectTime
aggregateIdleMeter.mark(idleTime / totalHandlerThreads)
}
if(req eq RequestChannel.AllDone) {
debug("Kafka request handler %d on broker %d received shut down command"
  .format(id, brokerId))
return
}
req.requestDequeueTimeMs = SystemTime.milliseconds
trace("Kafka request handler %d on broker %d handling request %s".format(id,
brokerId, req))
apis.handle(req)
} catch {
case e: Throwable => error("Exception when handling request", e)
}
}
}
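The loop above is the standard worker-pool pattern: each handler polls a shared request channel with a timeout and exits when it dequeues the AllDone sentinel. Below is a minimal, hypothetical model of the same pattern; the names HandlerPoolSketch, runPool, Work, and AllDone are illustrative, not Kafka's.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object HandlerPoolSketch {
  sealed trait Request
  case class Work(id: Int) extends Request
  case object AllDone extends Request // poison pill, like RequestChannel.AllDone

  /** Runs `numThreads` handler threads over the given work items and returns
    * the number of requests handled once every thread has seen its sentinel. */
  def runPool(items: Seq[Int], numThreads: Int): Int = {
    val queue   = new LinkedBlockingQueue[Request]()
    val handled = new AtomicInteger(0)
    val done    = new CountDownLatch(numThreads)
    (0 until numThreads).foreach { _ =>
      val t = new Thread(() => {
        var running = true
        while (running) {
          // Poll with a timeout, mirroring requestChannel.receiveRequest(300).
          queue.poll(300, TimeUnit.MILLISECONDS) match {
            case null    => ()                        // timed out while idle; poll again
            case Work(_) => handled.incrementAndGet() // stands in for apis.handle(req)
            case AllDone => running = false; done.countDown()
          }
        }
      })
      t.setDaemon(true) // daemon threads, as with Utils.daemonThread
      t.start()
    }
    items.foreach(n => queue.put(Work(n)))
    (0 until numThreads).foreach(_ => queue.put(AllDone)) // one sentinel per thread
    done.await()
    handled.get()
  }
}
```

Because the queue is FIFO and the sentinels are enqueued last, every thread drains remaining work before stopping; enqueuing exactly one sentinel per handler mirrors how the real pool signals its handler threads to shut down.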
Handling network requests
This part is handled by the handle function in KafkaApis, which routes each request type to a different handler.
Handling metadata update requests
When a partition changes, an UpdateMetadataRequest is generated and sent to all brokers, so every live broker receives the metadata-change request and processes it. Update-metadata requests are issued when a partition's state changes, when partitions are reassigned, and when brokers start up or shut down.
The entry point is the handle function in KafkaApis:
case RequestKeys.UpdateMetadataKey => handleUpdateMetadataRequest(request)
Next, the processing flow of handleUpdateMetadataRequest:
def handleUpdateMetadataRequest(request: RequestChannel.Request) {
val updateMetadataRequest =
request.requestObj.asInstanceOf[UpdateMetadataRequest]
// First check that the current user has ClusterAction permission; if so, continue.
authorizeClusterAction(request)
// Apply the metadata updates from the request to metadataCache: broker
// additions and removals, partition state updates, and so on.
replicaManager.maybeUpdateMetadataCache(updateMetadataRequest, metadataCache)
val updateMetadataResponse = new UpdateMetadataResponse(
updateMetadataRequest.correlationId)
requestChannel.sendResponse(new Response(request,
new RequestOrResponseSend(request.connectionId, updateMetadataResponse)))
}
Now look at how ReplicaManager handles the metadata update request:
In the replica manager, the incoming request is passed straight to MetadataCache.updateCache, which refreshes this broker's cached cluster state.
The cache update flow:
1. Update the aliveBrokers set in the cache, which holds all live broker nodes.
2. Iterate over the partitions in the request whose state changed:
2.1. If the partition's leader is marked as -2, the partition is being deleted: remove it from its topic's partition map in the cache, and if the topic no longer contains any partitions, remove the topic from the cache as well.
2.2. Otherwise this is a partition state change (replica set, leader, and ISR information): update that partition's state entry in the topic's map in the cache.
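The two update rules above can be sketched with a toy cache. This is a hypothetical simplification (a partition maps only to its leader id, and MetadataCacheSketch/updateCache are illustrative names); the real cache stores a full PartitionStateInfo per partition.

```scala
import scala.collection.mutable

object MetadataCacheSketch {
  // Leader id -2 marks a partition that is being deleted.
  val DeletedLeader = -2
  // topic -> (partitionId -> leaderBrokerId)
  type Cache = mutable.Map[String, mutable.Map[Int, Int]]

  def updateCache(cache: Cache, updates: Map[(String, Int), Int]): Unit =
    updates.foreach { case ((topic, partition), leader) =>
      if (leader == DeletedLeader) {
        // Rule 2.1: drop the deleted partition, and the topic once it is empty.
        cache.get(topic).foreach { parts =>
          parts.remove(partition)
          if (parts.isEmpty) cache.remove(topic)
        }
      } else {
        // Rule 2.2: overwrite the partition's state entry.
        cache.getOrElseUpdate(topic, mutable.Map.empty).put(partition, leader)
      }
    }
}
```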
def maybeUpdateMetadataCache(updateMetadataRequest: UpdateMetadataRequest,
                             metadataCache: MetadataCache) {
  replicaStateChangeLock synchronized {
    if (updateMetadataRequest.controllerEpoch < controllerEpoch) {
      val stateControllerEpochErrorMessage =
        ("Broker %d received update metadata request with correlation id %d from an " +
         "old controller %d with epoch %d. Latest known controller epoch is %d")
          .format(localBrokerId, updateMetadataRequest.correlationId,
            updateMetadataRequest.controllerId, updateMetadataRequest.controllerEpoch,
            controllerEpoch)
      stateChangeLogger.warn(stateControllerEpochErrorMessage)
      throw new ControllerMovedException(stateControllerEpochErrorMessage)
    } else {
      metadataCache.updateCache(updateMetadataRequest, localBrokerId, stateChangeLogger)
      controllerEpoch = updateMetadataRequest.controllerEpoch
    }
  }
}
Handling the partition LeaderAndIsr request
This request handles changes to a partition's leader or ISR. It is only sent to brokers that host a replica of the affected partition.
case RequestKeys.LeaderAndIsrKey => handleLeaderAndIsrRequest(request)
Next, the processing flow of handleLeaderAndIsrRequest:
def handleLeaderAndIsrRequest(request: RequestChannel.Request) {
// Get the request body: for a LeaderAndIsr request this is a LeaderAndIsrRequest instance.
val leaderAndIsrRequest = request.requestObj.asInstanceOf[LeaderAndIsrRequest]
// Verify that the requesting user has ClusterAction permission.
authorizeClusterAction(request)
try {
// Called after a partition's ISR/leadership changes. For replicas becoming
// leaders or followers, check whether the partition belongs to the internal
// __consumer_offsets topic; if it does, the corresponding GroupMetadataManager
// functions bring that internal topic's partition online (leader) or
// offline (follower).
def onLeadershipChange(updatedLeaders: Iterable[Partition],
updatedFollowers: Iterable[Partition]) {
updatedLeaders.foreach { partition =>
if (partition.topic == GroupCoordinator.GroupMetadataTopicName)
coordinator.handleGroupImmigration(partition.partitionId)
}
updatedFollowers.foreach { partition =>
if (partition.topic == GroupCoordinator.GroupMetadataTopicName)
coordinator.handleGroupEmigration(partition.partitionId)
}
}
// Ask the replica manager to update the requested partitions to become
// leader or follower on this broker.
val result = replicaManager.becomeLeaderOrFollower(leaderAndIsrRequest,
metadataCache, onLeadershipChange)
val leaderAndIsrResponse = new LeaderAndIsrResponse(
leaderAndIsrRequest.correlationId,
result.responseMap, result.errorCode)
// Build the response for the successful operation and send it back to the requester.
requestChannel.sendResponse(new Response(request,
new RequestOrResponseSend(request.connectionId, leaderAndIsrResponse)))
} catch {
case e: KafkaStorageException =>
fatal("Disk error during leadership change.", e)
Runtime.getRuntime.halt(1)
}
}
The becomeLeaderOrFollower function in ReplicaManager:
This function decides, for each requested partition, whether this broker should become its leader or one of its followers.
def becomeLeaderOrFollower(leaderAndISRRequest: LeaderAndIsrRequest,
metadataCache: MetadataCache,
onLeadershipChange: (Iterable[Partition],
Iterable[Partition]) => Unit): BecomeLeaderOrFollowerResult = {
leaderAndISRRequest.partitionStateInfos.foreach { case (
(topic, partition), stateInfo) =>
stateChangeLogger.trace("…") // log message elided
}
replicaStateChangeLock synchronized {
val responseMap = new mutable.HashMap[(String, Int), Short]
// If the request's controller epoch is older than the current controllerEpoch,
// log a warning and return the StaleControllerEpochCode error.
if (leaderAndISRRequest.controllerEpoch < controllerEpoch) {
leaderAndISRRequest.partitionStateInfos.foreach {
case ((topic, partition), stateInfo) =>
stateChangeLogger.warn("…") // log message elided
}
BecomeLeaderOrFollowerResult(responseMap,
ErrorMapping.StaleControllerEpochCode)
} else {
// Adopt the newer controller epoch from the request as this broker's current epoch.
val controllerId = leaderAndISRRequest.controllerId
val correlationId = leaderAndISRRequest.correlationId
controllerEpoch = leaderAndISRRequest.controllerEpoch
// Iterate over all requested partitions and validate each one's state.
// First check partition's leader epoch
val partitionState = new mutable.HashMap[Partition, PartitionStateInfo]()
leaderAndISRRequest.partitionStateInfos.foreach {
case ((topic, partitionId), partitionStateInfo) =>
// getOrCreatePartition returns this partition's instance from the
// allPartitions collection, creating it if it does not yet exist.
val partition = getOrCreatePartition(topic, partitionId)
val partitionLeaderEpoch = partition.getLeaderEpoch()
// If the partition's current leaderEpoch is older than the requested epoch
// and this broker is in the partition's replica set, record the partition and
// its state in partitionState. If this broker holds no replica of the
// partition, log a warning and record UnknownTopicOrPartitionCode for it in
// responseMap. If the current leaderEpoch is greater than or equal to the
// requested epoch, log a warning and record StaleLeaderEpochCode instead.
if (partitionLeaderEpoch < partitionStateInfo.
leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch) {
if(partitionStateInfo.allReplicas.contains(config.brokerId))
partitionState.put(partition, partitionStateInfo)
else {
stateChangeLogger.warn("…") // log message elided
responseMap.put((topic, partitionId),
ErrorMapping.UnknownTopicOrPartitionCode)
}
} else {
// Otherwise record the error code in response
stateChangeLogger.warn("…") // log message elided
responseMap.put((topic, partitionId), ErrorMapping.StaleLeaderEpochCode)
}
}
// Among the partitions that host a replica on this broker, split out those
// whose requested leader is this broker and those whose leader is another
// broker (i.e. those this broker will follow).
val partitionsTobeLeader = partitionState.filter {
case (partition, partitionStateInfo) =>
partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leader ==
config.brokerId
}
val partitionsToBeFollower = (partitionState -- partitionsTobeLeader.keys)
// If some partitions should become leaders on this broker, run makeLeaders on
// them; the result is the set of partitions this broker became leader of.
val partitionsBecomeLeader = if (!partitionsTobeLeader.isEmpty)
makeLeaders(controllerId, controllerEpoch, partitionsTobeLeader,
leaderAndISRRequest.correlationId, responseMap)
else
Set.empty[Partition]
// If some partitions should become followers on this broker, run makeFollowers
// on them; the result is the set of partitions this broker became a follower of.
val partitionsBecomeFollower = if (!partitionsToBeFollower.isEmpty)
makeFollowers(controllerId, controllerEpoch, partitionsToBeFollower,
leaderAndISRRequest.correlationId, responseMap, metadataCache)
else
Set.empty[Partition]
// Start the thread that checkpoints each partition's high watermark (last
// offset) to the checkpoint file in the log directory, if not yet started.
if (!hwThreadInitialized) {
startHighWaterMarksCheckPointThread()
hwThreadInitialized = true
}
// Shut down fetcher threads that no longer serve any partition. Fetcher
// threads replicate messages from a partition's leader to its followers.
replicaFetcherManager.shutdownIdleFetcherThreads()
// For the partitions this broker just became leader or follower of, check
// whether the topic is __consumer_offsets, which records each consumer's
// committed offsets. If so, GroupMetadataManager brings the corresponding
// group metadata online (for new leaders) or offline (for new followers).
onLeadershipChange(partitionsBecomeLeader, partitionsBecomeFollower)
BecomeLeaderOrFollowerResult(responseMap, ErrorMapping.NoError)
}
}
}
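The leader/follower split inside becomeLeaderOrFollower reduces to a filter on the requested leader id. The following is a hypothetical, stripped-down model using plain maps keyed by a partition name instead of Partition and PartitionStateInfo; the names below are illustrative.

```scala
object LeaderFollowerSplitSketch {
  case class PartitionState(leader: Int, leaderEpoch: Int, replicas: Set[Int])

  /** Splits validated partitions into those this broker should lead and those
    * it should follow, mirroring partitionsTobeLeader / partitionsToBeFollower. */
  def split(brokerId: Int, partitionState: Map[String, PartitionState])
      : (Map[String, PartitionState], Map[String, PartitionState]) = {
    val toBeLeader   = partitionState.filter { case (_, s) => s.leader == brokerId }
    val toBeFollower = partitionState -- toBeLeader.keys
    (toBeLeader, toBeFollower)
  }
}
```

Computing the follower set as the complement of the leader set guarantees every validated partition lands in exactly one of the two makeLeaders/makeFollowers calls.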
Setting a partition's leader
When a LeaderAndIsr request arrives and some of the requested partitions name this broker as leader, ReplicaManager.makeLeaders is executed on that set of partitions:
private def makeLeaders(controllerId: Int,
epoch: Int,
partitionState: Map[Partition, PartitionStateInfo],
correlationId: Int,
responseMap: mutable.Map[(String, Int)