Overview
Distributed transactions: a producer may send messages to multiple topics and multiple partitions, and all of these messages need to be covered by a single transaction, which makes this a typical distributed transaction.
Transaction Coordinator: because the messages a producer sends may form a distributed transaction, the commonly used two-phase commit (2PC) protocol is introduced, and with it a Transaction Coordinator. The Transaction Coordinator is similar, in election and failover, to the Group Coordinator that was introduced earlier to solve the split-brain and thundering-herd problems.
transaction_state topic and the transaction log: a transaction log is indispensable for transaction management. Kafka stores its transaction log in an internal topic, consistent with the earlier design of storing consumer offsets in an internal topic. The transaction log persists the state managed by the Transaction Coordinator; since historical transaction states never need to be replayed, the log only has to keep the most recent state of each transaction.
Control message: a transaction can end in either commit or abort, and consumers have two isolation levels, read_committed and read_uncommitted, so the broker must be able to mark the outcome of a transaction in the log; these markers are called control messages.
TransactionalId: when a producer crashes and restarts, or moves to another machine, it must be able to pick up its previously unfinished transactions, so a unique identifier is needed to make that association; this is the TransactionalId. If one producer dies, another producer with the same TransactionalId can take over the unfinished state of its transaction.
Producer epoch: to guarantee that once a new producer starts, the old producer with the same TransactionalId is fenced immediately, every time a producer obtains a PID through its TransactionalId it also receives a monotonically increasing epoch. Since the old producer's epoch is smaller than the new producer's, Kafka can easily recognize it as the stale producer and reject its requests.
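From the client side, these concepts surface through the transactional producer API. Below is a minimal sketch, assuming the Java client (kafka-clients) is on the classpath; the broker address, TransactionalId and topic names are hypothetical. transactional.id is the TransactionalId discussed above, and initTransactions() is what triggers the FindCoordinator request analyzed in the next section.

import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object TransactionalProducerExample {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")          // hypothetical broker address
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    // transactional.id is the TransactionalId described above; a restarted producer
    // with the same id fences the previous instance through the producer epoch
    props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "order-service-txn-1")       // hypothetical id

    val producer = new KafkaProducer[String, String](props)
    producer.initTransactions()   // FindCoordinator + InitProducerId: obtains the PID and epoch
    try {
      producer.beginTransaction()
      // messages to different topics/partitions are committed or aborted as a unit
      producer.send(new ProducerRecord[String, String]("orders", "k1", "order-created"))      // hypothetical topic
      producer.send(new ProducerRecord[String, String]("payments", "k1", "payment-pending"))  // hypothetical topic
      producer.commitTransaction()
    } catch {
      case e: Exception =>
        // in a real application, fatal errors such as ProducerFencedException
        // require close() rather than abortTransaction()
        producer.abortTransaction()
        throw e
    } finally {
      producer.close()
    }
  }
}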
Server-side handling of the FindCoordinator request
def handleFindCoordinatorRequest(request: RequestChannel.Request) {
  val findCoordinatorRequest = request.body[FindCoordinatorRequest]

  if (findCoordinatorRequest.coordinatorType == FindCoordinatorRequest.CoordinatorType.GROUP &&
      !authorize(request.session, Describe, Resource(Group, findCoordinatorRequest.coordinatorKey, LITERAL)))
    sendErrorResponseMaybeThrottle(request, Errors.GROUP_AUTHORIZATION_FAILED.exception)
  else if (findCoordinatorRequest.coordinatorType == FindCoordinatorRequest.CoordinatorType.TRANSACTION &&
      !authorize(request.session, Describe, Resource(TransactionalId, findCoordinatorRequest.coordinatorKey, LITERAL)))
    sendErrorResponseMaybeThrottle(request, Errors.TRANSACTIONAL_ID_AUTHORIZATION_FAILED.exception)
  else {
    // get metadata (and create the topic if necessary)
    val (partition, topicMetadata) = findCoordinatorRequest.coordinatorType match {
      case FindCoordinatorRequest.CoordinatorType.GROUP =>
        val partition = groupCoordinator.partitionFor(findCoordinatorRequest.coordinatorKey)
        val metadata = getOrCreateInternalTopic(GROUP_METADATA_TOPIC_NAME, request.context.listenerName)
        (partition, metadata)

      // the coordinator being looked up is the transaction coordinator
      case FindCoordinatorRequest.CoordinatorType.TRANSACTION =>
        // take the transactionalId's hashCode modulo the number of transaction topic
        // partitions to get the partition id that holds this transaction's state
        val partition = txnCoordinator.partitionFor(findCoordinatorRequest.coordinatorKey)
        // create the transaction_state topic if necessary and fetch its metadata
        val metadata = getOrCreateInternalTopic(TRANSACTION_STATE_TOPIC_NAME, request.context.listenerName)
        (partition, metadata)

      case _ =>
        throw new InvalidRequestException("Unknown coordinator type in FindCoordinator request")
    }

    def createResponse(requestThrottleMs: Int): AbstractResponse = {
      val responseBody = if (topicMetadata.error != Errors.NONE) {
        new FindCoordinatorResponse(requestThrottleMs, Errors.COORDINATOR_NOT_AVAILABLE, Node.noNode)
      } else {
        // the coordinator is the leader broker of the partition computed above
        val coordinatorEndpoint = topicMetadata.partitionMetadata.asScala
          .find(_.partition == partition)
          .map(_.leader)
          .flatMap(p => Option(p))

        coordinatorEndpoint match {
          case Some(endpoint) if !endpoint.isEmpty =>
            new FindCoordinatorResponse(requestThrottleMs, Errors.NONE, endpoint)
          case _ =>
            new FindCoordinatorResponse(requestThrottleMs, Errors.COORDINATOR_NOT_AVAILABLE, Node.noNode)
        }
      }
      trace("Sending FindCoordinator response %s for correlation id %d to client %s."
        .format(responseBody, request.header.correlationId, request.header.clientId))
      responseBody
    }
    sendResponseMaybeThrottle(request, createResponse)
  }
}
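The partition lookup in the TRANSACTION branch is just the transactionalId's hashCode taken modulo the number of transaction_state partitions, as the comment above notes. A standalone sketch of that mapping follows; the object name and the default partition count of 50 (which corresponds to the broker setting transaction.state.log.num.partitions) are assumptions for illustration only.

object TxnCoordinatorPartition {
  // mask keeps the hash non-negative before taking the modulo
  def partitionFor(transactionalId: String, transactionTopicPartitionCount: Int = 50): Int =
    (transactionalId.hashCode & 0x7fffffff) % transactionTopicPartitionCount

  def main(args: Array[String]): Unit = {
    // the broker leading this partition of the transaction_state topic acts as the
    // Transaction Coordinator for producers using this transactionalId
    println(partitionFor("order-service-txn-1"))   // prints a value in [0, 50)
  }
}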
Creating the transaction_state topic
private def createInternalTopic(topic: String): MetadataResponse.TopicMetadata = {
  if (topic == null)
    throw new IllegalArgumentException("topic must not be null")

  val aliveBrokers = metadataCache.getAliveBrokers

  topic match {
    case GROUP_METADATA_TOPIC_NAME =>
      if (aliveBrokers.size < config.offsetsTopicReplicationFactor) {
        error(s"Number of alive brokers '${aliveBrokers.size}' does not meet the required replication factor " +
          s"'${config.offsetsTopicReplicationFactor}' for the offsets topic (configured via " +
          s"'${KafkaConfig.OffsetsTopicReplicationFactorProp}'). This error can be ignored if the cluster is starting up " +
          s"and not all brokers are up yet.")
        new MetadataResponse.TopicMetadata(Errors.COORDINATOR_NOT_AVAILABLE, topic, true, java.util.Collections.emptyList())
      } else {
        createTopic(topic, config.offsetsTopicPartitions, config.offsetsTopicReplicationFactor.toInt,
          groupCoordinator.offsetsTopicConfigs)
      }

    // the transaction_state topic
    case TRANSACTION_STATE_TOPIC_NAME =>
      if (aliveBrokers.size < config.transactionTopicReplicationFactor) {
        error(s"Number of alive brokers '${aliveBrokers.size}' does not meet the required replication factor " +
          s"'${config.transactionTopicReplicationFactor}' for the transactions state topic (configured via " +
          s"'${KafkaConfig.TransactionsTopicReplicationFactorProp}'). This error can be ignored if the cluster is starting up " +
          s"and not all brokers are up yet.")
        new MetadataResponse.TopicMetadata(Errors.COORDINATOR_NOT_AVAILABLE, topic, true, java.util.Collections.emptyList())
      } else {
        createTopic(topic, config.transactionTopicPartitions, config.transactionTopicReplicationFactor.toInt,
          txnCoordinator.transactionTopicConfigs)
      }

    case _ => throw new IllegalArgumentException(s"Unexpected internal topic name: $topic")
  }
}
createTopic creates the topic according to the transaction_state topic's configuration and returns the topic's metadata.
The topic metadata includes the topic name, whether it is an internal topic, and the metadata of every partition in the topic.
private def createTopic(topic: String,
                        numPartitions: Int,
                        replicationFactor: Int,
                        properties: Properties = new Properties()): MetadataResponse.TopicMetadata = {
  try {
    adminZkClient.createTopic(topic, numPartitions, replicationFactor, properties, RackAwareMode.Safe)
    info("Auto creation of topic %s with %d partitions and replication factor %d is successful"
      .format(topic, numPartitions, replicationFactor))
    // the topic metadata includes the topic name, whether it is an internal topic,
    // and the metadata of each partition in the topic
    new MetadataResponse.TopicMetadata(Errors.LEADER_NOT_AVAILABLE, topic, isInternal(topic),
      java.util.Collections.emptyList())
  } catch {
    case _: TopicExistsException => // let it go, possibly another broker created this topic
      new MetadataResponse.TopicMetadata(Errors.LEADER_NOT_AVAILABLE, topic, isInternal(topic),
        java.util.Collections.emptyList())
    case ex: Throwable => // Catch all to prevent unhandled errors
      new MetadataResponse.TopicMetadata(Errors.forException(ex), topic, isInternal(topic),
        java.util.Collections.emptyList())
  }
}
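Once created, the transaction_state topic behaves like any other topic and can be inspected from a client. The following is a hedged sketch that uses the Java AdminClient from Scala to list each partition and its leader; the leader of the partition selected by partitionFor(transactionalId) is the Transaction Coordinator returned by FindCoordinator. The bootstrap address and the topic name constant __transaction_state are assumptions of this sketch.

import java.util.{Collections, Properties}

import scala.collection.JavaConverters._

import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig}

object DescribeTransactionStateTopic {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")   // hypothetical address
    val admin = AdminClient.create(props)
    try {
      val description = admin
        .describeTopics(Collections.singleton("__transaction_state"))
        .all().get()
        .get("__transaction_state")
      // print each partition with its leader and replica set; the leader of the
      // partition chosen for a given transactionalId is that producer's coordinator
      description.partitions().asScala.foreach { p =>
        println(s"partition=${p.partition()} leader=${p.leader()} " +
          s"replicas=${p.replicas().asScala.map(_.id()).mkString(",")}")
      }
    } finally {
      admin.close()
    }
  }
}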