Kafka Source Code Series No.5 - Producer Message Sending

Chapter 1 Introduction

In the previous articles we covered the Kafka producer path of fetching metadata -> assigning partitions -> batching messages. This article walks through how the actual message send is implemented in the source code.

Chapter 2 Message Sending Steps

Kafka messages are sent by the Sender thread, so let's first review how the Sender thread is initialized.

2.1 Sender thread initialization

org.apache.kafka.clients.producer.KafkaProducer#KafkaProducer

//TODO Instantiate the Sender runnable
this.sender = newSender(logContext, kafkaClient, this.metadata);
//Thread name
String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
//TODO Start the Sender thread via KafkaThread
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
this.ioThread.start();
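KafkaThread (org.apache.kafka.common.utils.KafkaThread) is essentially a thin wrapper around java.lang.Thread that marks the thread as a daemon and logs uncaught exceptions. A minimal sketch of that daemon I/O thread pattern (illustrative only, not the actual KafkaThread source) could look like this:

// Sketch of the daemon I/O thread pattern used above (not the real KafkaThread class).
public class DaemonIoThread extends Thread {

	public DaemonIoThread(String name, Runnable runnable) {
		super(runnable, name);
		// Run as a daemon so the JVM can exit even if the I/O loop is still alive.
		setDaemon(true);
		// Log uncaught exceptions instead of letting the thread die silently.
		setUncaughtExceptionHandler((t, e) ->
				System.err.println("Uncaught exception in " + t.getName() + ": " + e));
	}
}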

2.2 Sender thread run method

org.apache.kafka.clients.producer.internals.Sender#run

@Override
public void run() {
	log.debug("Starting Kafka producer I/O thread.");

	// main loop, runs until close is called
	while (running) {
		try {
			runOnce();
		} catch (Exception e) {
			log.error("Uncaught error in kafka producer I/O thread: ", e);
		}
	}
	//...
}

2.3 Sender thread runOnce method

org.apache.kafka.clients.producer.internals.Sender#runOnce

void runOnce() {
	//TODO Transaction-related handling
	if (transactionManager != null) {
		try {
			transactionManager.maybeResolveSequences();

			// do not continue sending if the transaction manager is in a failed state
			if (transactionManager.hasFatalError()) {
				RuntimeException lastError = transactionManager.lastError();
				if (lastError != null)
					maybeAbortBatches(lastError);
				client.poll(retryBackoffMs, time.milliseconds());
				return;
			}

			// Check whether we need a new producerId. If so, we will enqueue an InitProducerId
			// request which will be sent below
			transactionManager.bumpIdempotentEpochAndResetIdIfNeeded();

			if (maybeSendAndPollTransactionalRequest()) {
				return;
			}
		} catch (AuthenticationException e) {
			// This is already logged as error, but propagated here to perform any clean ups.
			log.trace("Authentication exception while processing transactional request", e);
			transactionManager.authenticationFailed(e);
		}
	}

	long currentTimeMs = time.milliseconds();
	//TODO Prepare the data to send and establish connections to the brokers
	long pollTimeout = sendProducerData(currentTimeMs);
	//TODO Fetch metadata and send the data
	client.poll(pollTimeout, currentTimeMs);
}

Both the client.send call inside sendProducerData and the client.poll call above go through org.apache.kafka.clients.NetworkClient, which in turn uses the client's networking package org.apache.kafka.common.network to actually transmit the data.
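To make this division of labor concrete, here is a toy illustration of the "send() only queues, poll() does the real network I/O" model that NetworkClient follows. This is my own simplification, not Kafka's API:

import java.util.ArrayDeque;
import java.util.Queue;

// Toy illustration of the send()/poll() split (not Kafka code): send() only enqueues a
// request, poll() is where the actual network I/O would happen.
public class ToyClient {
	private final Queue<String> pending = new ArrayDeque<>();

	// Queue a request; nothing goes over the wire here.
	public void send(String request) {
		pending.offer(request);
	}

	// Drive the I/O: in Kafka this is where the NIO Selector is polled.
	public void poll() {
		String request;
		while ((request = pending.poll()) != null) {
			System.out.println("writing to socket: " + request);
		}
	}
}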

2.4 sendProducerData: prepare data and establish connections

org.apache.kafka.clients.producer.internals.Sender#sendProducerData

private long sendProducerData(long now) {
	//TODO Cluster metadata
	Cluster cluster = metadata.fetch();
	// get the list of partitions with data ready to send
	//TODO Get the leader nodes of the partitions that have data ready to send (data ultimately goes to the broker hosting the leader)
	RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);

	// if there are any partitions whose leaders are not known yet, force metadata update
	if (!result.unknownLeaderTopics.isEmpty()) {
		// The set of topics with unknown leader contains topics with leader election pending as well as
		// topics which may have expired. Add the topic again to metadata to ensure it is included
		// and request metadata update, since there are messages to send to the topic.
		for (String topic : result.unknownLeaderTopics)
			this.metadata.add(topic, now);

		log.debug("Requesting metadata update due to unknown leader topics from the batched records: {}",
			result.unknownLeaderTopics);
		this.metadata.requestUpdate();
	}

	// remove any nodes we aren't ready to send to
	//TODO Check connectivity and establish connections if needed
	Iterator<Node> iter = result.readyNodes.iterator();
	long notReadyTimeout = Long.MAX_VALUE;
	while (iter.hasNext()) {
		Node node = iter.next();
		if (!this.client.ready(node, now)) {
			iter.remove();
			notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
		}
	}

	// create produce requests
	//TODO Group by broker: partitions whose leaders live on the same broker are merged into one map entry
	Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);
	addToInflightBatches(batches);
	if (guaranteeMessageOrder) {
		// Mute all the partitions drained
		for (List<ProducerBatch> batchList : batches.values()) {
			for (ProducerBatch batch : batchList)
				this.accumulator.mutePartition(batch.topicPartition);
		}
	}

	accumulator.resetNextBatchExpiryTime();
	//TODO Expired batches
	List<ProducerBatch> expiredInflightBatches = getExpiredInflightBatches(now);
	List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
	expiredBatches.addAll(expiredInflightBatches);

	// Reset the producer id if an expired batch has previously been sent to the broker. Also update the metrics
	// for expired batches. see the documentation of @TransactionState.resetIdempotentProducerId to understand why
	// we need to reset the producer id here.
	if (!expiredBatches.isEmpty())
		log.trace("Expired {} batches in accumulator", expiredBatches.size());
	for (ProducerBatch expiredBatch : expiredBatches) {
		String errorMessage = "Expiring " + expiredBatch.recordCount + " record(s) for " + expiredBatch.topicPartition
			+ ":" + (now - expiredBatch.createdMs) + " ms has passed since batch creation";
		failBatch(expiredBatch, -1, NO_TIMESTAMP, new TimeoutException(errorMessage), false);
		if (transactionManager != null && expiredBatch.inRetry()) {
			// This ensures that no new batches are drained until the current in flight batches are fully resolved.
			transactionManager.markSequenceUnresolved(expiredBatch);
		}
	}
	sensors.updateProduceRequestMetrics(batches);

	// If we have any nodes that are ready to send + have sendable data, poll with 0 timeout so this can immediately
	// loop and try sending more data. Otherwise, the timeout will be the smaller value between next batch expiry
	// time, and the delay time for checking data availability. Note that the nodes may have data that isn't yet
	// sendable due to lingering, backing off, etc. This specifically does not include nodes with sendable data
	// that aren't ready to send since they would cause busy looping.
	long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
	pollTimeout = Math.min(pollTimeout, this.accumulator.nextExpiryTimeMs() - now);
	pollTimeout = Math.max(pollTimeout, 0);
	if (!result.readyNodes.isEmpty()) {
		log.trace("Nodes with data ready to send: {}", result.readyNodes);
		// if some partitions are already ready to be sent, the select time would be 0;
		// otherwise if some partition already has some data accumulated but not ready yet,
		// the select time will be the time difference between now and its linger expiry time;
		// otherwise the select time will be the time difference between now and the metadata expiry time;
		pollTimeout = 0;
	}
	//TODO Queue the produce requests; the actual network send happens in poll
	sendProduceRequests(batches, now);
	return pollTimeout;
}
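The behavior of accumulator.ready() and the batch expiry logic above is driven by producer configuration. As a quick reference (values below are only examples): batch.size and linger.ms determine when a batch is considered sendable, and delivery.timeout.ms bounds how long a batch may sit before expiredBatches() fails it with a TimeoutException.

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

// Example producer settings that feed into the logic above (values are illustrative).
Properties props = new Properties();
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);          // max bytes per batch
props.put(ProducerConfig.LINGER_MS_CONFIG, 5);                // wait up to 5 ms to fill a batch
props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000); // total time before a batch is expired
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);   // per-request timeout on the wire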

2.4.1 this.client.ready: establish the connection

org.apache.kafka.clients.NetworkClient#ready

@Override
public boolean ready(Node node, long now) {
	if (node.isEmpty())
		throw new IllegalArgumentException("Cannot connect to empty node " + node);

	//TODO Is this node ready to receive data?
	if (isReady(node, now))
		return true;

	//TODO Can we initiate a connection?
	if (connectionStates.canConnect(node.idString(), now))
		// if we are interested in sending to a node and we don't have a connection to it, initiate one
		//TODO Initiate the connection
		initiateConnect(node, now);

	return false;
}

initiateConnect: initiate the connection

private void initiateConnect(Node node, long now) {
	String nodeConnectionId = node.idString();
	try {
		connectionStates.connecting(nodeConnectionId, now, node.host(), clientDnsLookup);
		InetAddress address = connectionStates.currentAddress(nodeConnectionId);
		log.debug("Initiating connection to node {} using address {}", node, address);
		//TODO Attempt to establish the connection
		selector.connect(nodeConnectionId,
				new InetSocketAddress(address, node.port()),
				this.socketSendBuffer,
				this.socketReceiveBuffer);
	} catch (IOException e) {
		log.warn("Error connecting to node {}", node, e);
		// Attempt failed, we'll try again after the backoff
		connectionStates.disconnected(nodeConnectionId, now);
		// Notify metadata updater of the connection failure
		metadataUpdater.handleServerDisconnect(now, nodeConnectionId, Optional.empty());
	}
}

selector.connect goes to org.apache.kafka.common.network.Selector#connect, which establishes the network connection using Java NIO.
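Under the hood this boils down to a plain non-blocking NIO connect. A minimal sketch of that primitive (plain JDK NIO, not the Kafka Selector; host and port are placeholders) looks like this:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Minimal sketch of a non-blocking NIO connect, the primitive the Kafka Selector builds on.
static void connectNonBlocking() throws IOException {
	Selector selector = Selector.open();
	SocketChannel channel = SocketChannel.open();
	channel.configureBlocking(false);
	// connect() returns immediately in non-blocking mode; the handshake finishes asynchronously
	boolean connectedImmediately = channel.connect(new InetSocketAddress("broker-host", 9092));
	if (!connectedImmediately) {
		// register interest in OP_CONNECT so the selector reports when the connection completes
		channel.register(selector, SelectionKey.OP_CONNECT);
	}
}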

2.4.2 sendProduceRequests: queue the requests

org.apache.kafka.clients.producer.internals.Sender#sendProduceRequests

/**
 * Transfer the record batches into a list of produce requests on a per-node basis
 */
private void sendProduceRequests(Map<Integer, List<ProducerBatch>> collated, long now) {
	for (Map.Entry<Integer, List<ProducerBatch>> entry : collated.entrySet())
		sendProduceRequest(now, entry.getKey(), acks, requestTimeoutMs, entry.getValue());
}

/**
 * Create a produce request from the given record batches
 */
private void sendProduceRequest(long now, int destination, short acks, int timeout, List<ProducerBatch> batches) {
	if (batches.isEmpty())
		return;

	//TODO partition -> records
	Map<TopicPartition, MemoryRecords> produceRecordsByPartition = new HashMap<>(batches.size());
	final Map<TopicPartition, ProducerBatch> recordsByPartition = new HashMap<>(batches.size());

	// find the minimum magic version used when creating the record sets
	byte minUsedMagic = apiVersions.maxUsableProduceMagic();
	for (ProducerBatch batch : batches) {
		if (batch.magic() < minUsedMagic)
			minUsedMagic = batch.magic();
	}

	for (ProducerBatch batch : batches) {
		TopicPartition tp = batch.topicPartition;
		MemoryRecords records = batch.records();

		// down convert if necessary to the minimum magic used. In general, there can be a delay between the time
		// that the producer starts building the batch and the time that we send the request, and we may have
		// chosen the message format based on out-dated metadata. In the worst case, we optimistically chose to use
		// the new message format, but found that the broker didn't support it, so we need to down-convert on the
		// client before sending. This is intended to handle edge cases around cluster upgrades where brokers may
		// not all support the same message format version. For example, if a partition migrates from a broker
		// which is supporting the new magic version to one which doesn't, then we will need to convert.
		if (!records.hasMatchingMagic(minUsedMagic))
			records = batch.records().downConvert(minUsedMagic, 0, time).records();
		produceRecordsByPartition.put(tp, records);
		recordsByPartition.put(tp, batch);
	}

	String transactionalId = null;
	if (transactionManager != null && transactionManager.isTransactional()) {
		transactionalId = transactionManager.transactionalId();
	}
	ProduceRequest.Builder requestBuilder = ProduceRequest.Builder.forMagic(minUsedMagic, acks, timeout,
			produceRecordsByPartition, transactionalId);
	RequestCompletionHandler callback = response -> handleProduceResponse(response, recordsByPartition, time.milliseconds());

	String nodeId = Integer.toString(destination);
	//TODO Build the client request
	ClientRequest clientRequest = client.newClientRequest(nodeId, requestBuilder, now, acks != 0,
			requestTimeoutMs, callback);
	//TODO Queue the request for sending; the actual network send happens in poll
	client.send(clientRequest, now);
	log.trace("Sent produce request to {}: {}", nodeId, requestBuilder);
}
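Note the acks != 0 flag passed to newClientRequest: with acks=0 the producer does not expect a response at all, with acks=1 it waits for the partition leader, and with acks=all it waits for the full in-sync replica set. This comes straight from producer configuration, for example:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

// The acks value handed to sendProduceRequest above is set here.
Properties props = new Properties();
props.put(ProducerConfig.ACKS_CONFIG, "all");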

2.4.3 client.send: prepare the send

org.apache.kafka.clients.NetworkClient#send

which calls the method below ⬇

org.apache.kafka.clients.NetworkClient#doSend(org.apache.kafka.clients.ClientRequest, boolean, long)

which calls the method below ⬇

org.apache.kafka.clients.NetworkClient#doSend(org.apache.kafka.clients.ClientRequest, boolean, long, org.apache.kafka.common.requests.AbstractRequest)

private void doSend(ClientRequest clientRequest, boolean isInternalRequest, long now, AbstractRequest request) {
	String destination = clientRequest.destination();
	RequestHeader header = clientRequest.makeHeader(request.version());
	if (log.isDebugEnabled()) {
		log.debug("Sending {} request with header {} and timeout {} to node {}: {}",
			clientRequest.apiKey(), header, clientRequest.requestTimeoutMs(), destination, request);
	}
	Send send = request.toSend(destination, header);
	InFlightRequest inFlightRequest = new InFlightRequest(
			clientRequest,
			header,
			isInternalRequest,
			request,
			send,
			now);
	//TODO Requests that have been sent but not yet acknowledged (at most 5 in flight by default)
	this.inFlightRequests.add(inFlightRequest);
	selector.send(send);
}
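The "at most 5 requests in flight per connection" mentioned above is controlled by the max.in.flight.requests.per.connection producer setting (5 is the default; the idempotent producer requires it to be at most 5):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

// Controls how many unacknowledged requests may sit in inFlightRequests per broker connection.
Properties props = new Properties();
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);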

selector.send calls org.apache.kafka.common.network.Selector#send

which calls the method below ⬇

org.apache.kafka.common.network.KafkaChannel#setSend

public void setSend(Send send) {
	if (this.send != null)
		throw new IllegalStateException("Attempt to begin a send operation with prior send operation still in progress, connection id is " + id);
	this.send = send;
	//TODO Register interest in the OP_WRITE event
	this.transportLayer.addInterestOps(SelectionKey.OP_WRITE);
}
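Registering and later removing OP_WRITE is ordinary NIO interest-op arithmetic on the channel's SelectionKey. A sketch of what addInterestOps/removeInterestOps boil down to (illustrative; key is the channel's SelectionKey):

import java.nio.channels.SelectionKey;

// What addInterestOps / removeInterestOps amount to in plain NIO (sketch only).
void addInterestOps(SelectionKey key, int ops) {
	key.interestOps(key.interestOps() | ops);   // e.g. ops = SelectionKey.OP_WRITE
}

void removeInterestOps(SelectionKey key, int ops) {
	key.interestOps(key.interestOps() & ~ops);  // cleared again once the send completes
}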

2.5 client.poll: send the data

org.apache.kafka.clients.NetworkClient#poll

@Override
public List<ClientResponse> poll(long timeout, long now) {
	ensureActive();

	if (!abortedSends.isEmpty()) {
		// If there are aborted sends because of unsupported version exceptions or disconnects,
		// handle them immediately without waiting for Selector#poll.
		List<ClientResponse> responses = new ArrayList<>();
		handleAbortedSends(responses);
		completeResponses(responses);
		return responses;
	}
	//TODO Build a metadata request (MetadataRequest) if needed and return the metadata timeout
	long metadataTimeout = metadataUpdater.maybeUpdate(now);
	try {
		//TODO Perform the actual network I/O
		this.selector.poll(Utils.min(timeout, metadataTimeout, defaultRequestTimeoutMs));
	} catch (IOException e) {
		log.error("Unexpected error during I/O", e);
	}

	// process completed actions
	long updatedNow = this.time.milliseconds();
	List<ClientResponse> responses = new ArrayList<>();
	handleCompletedSends(responses, updatedNow);
	//TODO Process completed receives (this is where fetched metadata is handled)
	handleCompletedReceives(responses, updatedNow);
	handleDisconnections(responses, updatedNow);
	handleConnections();
	handleInitiateApiVersionRequests(updatedNow);
	handleTimedOutConnections(responses, updatedNow);
	handleTimedOutRequests(responses, updatedNow);
	completeResponses(responses);

	return responses;
}
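From the application's point of view, the responses collected and passed to completeResponses here are what eventually fire the Callback the caller handed to producer.send. A standard usage example (topic name, key and value are placeholders; producer is an already-configured KafkaProducer<String, String>):

import org.apache.kafka.clients.producer.ProducerRecord;

// The callback below is only invoked after NetworkClient.poll() has processed the produce
// response (handleCompletedReceives -> completeResponses).
producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
	if (exception != null) {
		exception.printStackTrace();
	} else {
		System.out.println("acked at offset " + metadata.offset());
	}
});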

org.apache.kafka.common.network.Selector#poll

which calls the method below ⬇

org.apache.kafka.common.network.Selector#pollSelectionKeys

void pollSelectionKeys(Set<SelectionKey> selectionKeys,
					   boolean isImmediatelyConnected,
					   long currentTimeNanos) {
			
			//...

			/* if channel is ready write to any sockets that have space in their buffer and for which we have data */

			long nowNanos = channelStartTimeNanos != 0 ? channelStartTimeNanos : currentTimeNanos;
			try {
				//TODO Write: send the data
				attemptWrite(key, channel, nowNanos);
			} catch (Exception e) {
				sendFailed = true;
				throw e;
			}
			
			//...

}

org.apache.kafka.common.network.Selector#attemptWrite

private void attemptWrite(SelectionKey key, KafkaChannel channel, long nowNanos) throws IOException {
	if (channel.hasSend()
			&& channel.ready()
			&& key.isWritable()
			&& !channel.maybeBeginClientReauthentication(() -> nowNanos)) {
		write(channel);
	}
}

org.apache.kafka.common.network.Selector#write

void write(KafkaChannel channel) throws IOException {
	String nodeId = channel.id();
	//TODO Write the data to the socket
	long bytesSent = channel.write();
	//TODO Remove the OP_WRITE interest once the send has completed
	Send send = channel.maybeCompleteSend();
	// We may complete the send with bytesSent < 1 if `TransportLayer.hasPendingWrites` was true and `channel.write()`
	// caused the pending writes to be written to the socket channel buffer
	if (bytesSent > 0 || send != null) {
		long currentTimeMs = time.milliseconds();
		if (bytesSent > 0)
			this.sensors.recordBytesSent(nodeId, bytesSent, currentTimeMs);
		if (send != null) {
			//TODO The send is complete
			this.completedSends.add(send);
			this.sensors.recordCompletedSend(nodeId, send.size(), currentTimeMs);
		}
	}
}
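Note that channel.write() may flush only part of the payload in a single call; a Send-like abstraction has to keep writing until its buffer is drained, and only then does maybeCompleteSend() (next section) succeed. In plain NIO that looks roughly like the sketch below (my own illustration, not the Kafka Send implementation):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Rough sketch of partial-write handling: a non-blocking write may flush only part of the
// buffer, so the channel stays interested in OP_WRITE until buffer.hasRemaining() is false.
long writeTo(SocketChannel socketChannel, ByteBuffer buffer) throws IOException {
	long written = socketChannel.write(buffer);  // may write fewer bytes than remaining
	// caller checks buffer.hasRemaining(); if true, keep OP_WRITE registered and retry later
	return written;
}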

2.5.1 maybeCompleteSend: remove the OP_WRITE interest

org.apache.kafka.common.network.KafkaChannel#maybeCompleteSend

public Send maybeCompleteSend() {
	if (send != null && send.completed()) {
		midWrite = false;
		//TODO Send complete, remove the OP_WRITE interest
		transportLayer.removeInterestOps(SelectionKey.OP_WRITE);
		Send result = send;
		send = null;
		return result;
	}
	return null;
}

At this point the Kafka producer has sent the messages to the broker. Notice that Kafka's data transport does not use a networking framework such as Netty; the network communication is implemented directly on top of Java NIO.
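To see the whole pattern end to end, here is a minimal, self-contained NIO write loop (my own sketch with placeholder host, port and payload, not Kafka code) that follows the same steps described above: register OP_WRITE, wait on the selector, write, and drop OP_WRITE once the data has been fully sent:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Minimal single-connection NIO write loop mirroring the producer's event-driven send path.
public class MiniNioWriter {
	public static void main(String[] args) throws Exception {
		Selector selector = Selector.open();
		SocketChannel channel = SocketChannel.open(new InetSocketAddress("localhost", 9092));
		channel.configureBlocking(false);
		ByteBuffer payload = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
		channel.register(selector, SelectionKey.OP_WRITE);

		while (payload.hasRemaining()) {
			selector.select();                       // block until the socket is writable
			Iterator<SelectionKey> it = selector.selectedKeys().iterator();
			while (it.hasNext()) {
				SelectionKey key = it.next();
				it.remove();
				if (key.isWritable()) {
					channel.write(payload);          // may be a partial write
					if (!payload.hasRemaining()) {
						// equivalent of maybeCompleteSend(): stop watching OP_WRITE
						key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
					}
				}
			}
		}
		channel.close();
		selector.close();
	}
}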

Today is September 1st, the first day of school. I'm getting this article out today; together with all my readers, let's study hard and keep improving!!
