Kafka Source Code Series No.1: The Design of the Producer's Message-Sending Flow

Chapter 1 Introduction

Why is Kafka used so widely around the world, and how does it achieve such high throughput? It can serve as a message queue in OLTP systems and as a staging layer for real-time big-data message streams in OLAP systems. For a framework this powerful, the source code is well worth studying!

This source-code series is written against Kafka v2.7.0. If anything is off, corrections in the comments are very welcome!

The Kafka source code is written in Java and Scala; the producer side is mainly Java. Below we first look at the overall flow, then dissect the details step by step.

 

Chapter 2 Source Code Layout

The Kafka source tree is laid out as follows: the producer and consumer live mainly under clients, while the broker-side code lives under core.

As with most frameworks, the source ships with a set of examples, and when first getting to know a code base, the examples are the best place to start.

(figure: Kafka source code directory structure)

Chapter 3 The Producer's Overall Flow

Starting from the producer in example and following a single message send, I traced the call flow through the producer source shown below; the detailed steps that follow walk through this flow together with the key code.

(figure: producer send call-flow diagram)

Chapter 4 Detailed Steps

4.1 The example producer

In the example producer code, the constructor's main job is to instantiate the Producer; stepping into it shows exactly what Kafka does when we new a producer.

public Producer(final String topic,
                final Boolean isAsync,
                final String transactionalId,
                final boolean enableIdempotency,
                final int numRecords,
                final int transactionTimeoutMs,
                final CountDownLatch latch) {
    // Configure the producer properties
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaProperties.KAFKA_SERVER_URL + ":" + KafkaProperties.KAFKA_SERVER_PORT);
    props.put(ProducerConfig.CLIENT_ID_CONFIG, "DemoProducer");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    if (transactionTimeoutMs > 0) {
        props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, transactionTimeoutMs);
    }
    if (transactionalId != null) {
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, transactionalId);
    }
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, enableIdempotency);
    //TODO Instantiate the Producer: this creates the RecordAccumulator, starts the sender thread (the thread that actually sends messages), and sets up the metadata
    producer = new KafkaProducer<>(props);
    this.topic = topic;
    this.isAsync = isAsync;
    this.numRecords = numRecords;
    this.latch = latch;
}

The run method calls the producer's send method to send messages:

@Override
public void run() {
    int messageKey = 0;
    int recordsSent = 0;
    while (recordsSent < numRecords) {
        String messageStr = "Message_" + messageKey;
        long startTime = System.currentTimeMillis();
        if (isAsync) { // Send asynchronously
            //TODO Asynchronous send, with DemoCallBack as the callback
            producer.send(new ProducerRecord<>(topic,
                messageKey,
                messageStr), new DemoCallBack(startTime, messageKey, messageStr));
        } else { // Send synchronously
            //TODO Synchronous send
            try {
                producer.send(new ProducerRecord<>(topic,
                    messageKey,
                    messageStr)).get();
                System.out.println("Sent message: (" + messageKey + ", " + messageStr + ")");
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        }
        messageKey += 2;
        recordsSent += 1;
    }
    System.out.println("Producer sent " + numRecords + " records successfully");
    latch.countDown();
}
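
One thing worth noting: the "synchronous" variant is simply the asynchronous send followed immediately by Future.get(), which blocks until the broker acknowledges the record. That limits the thread to one in-flight record at a time, so the callback-based asynchronous path is the one to use when throughput matters.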

4.2 Instantiating KafkaProducer

4.2.1 Instantiating the RecordAccumulator

The KafkaProducer constructor is rather long, so only the key code is excerpted here.

This is where a BufferPool is instantiated. To manage memory more effectively, Kafka implements its own memory pool; when data is sent, memory is allocated out of this BufferPool.

//TODO Instantiate the RecordAccumulator (which packages the outgoing records) and the BufferPool memory pool
this.accumulator = new RecordAccumulator(logContext,
		config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
		this.compressionType,
		lingerMs(config),
		retryBackoffMs,
		deliveryTimeoutMs,
		metrics,
		PRODUCER_METRIC_GROUP_NAME,
		time,
		apiVersions,
		transactionManager,
		new BufferPool(this.totalMemorySize, config.getInt(ProducerConfig.BATCH_SIZE_CONFIG), metrics, time, PRODUCER_METRIC_GROUP_NAME));
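
To get a feel for what such a memory pool does, here is a minimal sketch. This is not Kafka's actual BufferPool (org.apache.kafka.clients.producer.internals.BufferPool), which additionally lets callers block for up to max.block.ms when memory is exhausted; the sketch only shows the core idea of recycling fixed batch.size buffers:

import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal illustration of a pooled allocator: buffers of exactly batch.size
// are recycled; odd-sized requests draw on (and return to) raw free memory.
public class SimpleBufferPool {
    private final int poolableSize;                      // batch.size
    private long availableMemory;                        // unpooled part of buffer.memory
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    public SimpleBufferPool(long totalMemory, int poolableSize) {
        this.availableMemory = totalMemory;
        this.poolableSize = poolableSize;
    }

    public synchronized ByteBuffer allocate(int size) {
        if (size == poolableSize && !free.isEmpty())
            return free.pollFirst();                     // fast path: reuse a pooled buffer
        if (availableMemory + (long) free.size() * poolableSize < size)
            throw new IllegalStateException("out of memory (the real pool would block here)");
        while (availableMemory < size) {                 // reclaim pooled buffers until the request fits
            free.pollLast();
            availableMemory += poolableSize;
        }
        availableMemory -= size;
        return ByteBuffer.allocate(size);
    }

    public synchronized void deallocate(ByteBuffer buffer) {
        if (buffer.capacity() == poolableSize) {
            buffer.clear();
            free.addFirst(buffer);                       // back into the pool for reuse
        } else {
            availableMemory += buffer.capacity();        // give the memory back unpooled
        }
    }
}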

4.2.2 Instantiating the metadata

//TODO Instantiate the metadata
if (metadata != null) {
	this.metadata = metadata;
} else {
	this.metadata = new ProducerMetadata(retryBackoffMs,
			config.getLong(ProducerConfig.METADATA_MAX_AGE_CONFIG),
			config.getLong(ProducerConfig.METADATA_MAX_IDLE_CONFIG),
			logContext,
			clusterResourceListeners,
			Time.SYSTEM);
	this.metadata.bootstrap(addresses);
}
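
The two config values read here map to metadata.max.age.ms (force a refresh after a while even if nothing failed) and metadata.max.idle.ms (forget topics the producer has not written to recently). The refresh-due check itself is small; simplified, with illustrative field names (it omits the retry-backoff term of the real Metadata.timeToNextUpdate):

// Simplified: when is the next metadata refresh due?
// A result of 0 means "refresh on the sender's next poll".
long timeToNextUpdate(long nowMs) {
    return updateRequested()
            ? 0
            : Math.max(lastSuccessfulRefreshMs + metadataMaxAgeMs - nowMs, 0);
}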

4.2.3 Instantiating and starting the sender thread

//TODO Instantiate the sender thread, and start it
this.sender = newSender(logContext, kafkaClient, this.metadata);
// Thread name
String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
// Start the sender thread via KafkaThread
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
this.ioThread.start();
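
KafkaThread itself is just a thin convenience wrapper over java.lang.Thread. Slightly simplified (the real org.apache.kafka.common.utils.KafkaThread logs through slf4j rather than stderr), it amounts to:

// A named thread, marked as a daemon, that logs uncaught exceptions
// instead of letting the I/O thread die silently.
public class KafkaThread extends Thread {
    public KafkaThread(String name, Runnable runnable, boolean daemon) {
        super(runnable, name);
        setDaemon(daemon);   // daemon = true: the JVM will not wait for this thread on exit
        setUncaughtExceptionHandler((t, e) ->
                System.err.println("Uncaught exception in thread '" + t.getName() + "': " + e));
    }
}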

4.2.4 The sender thread

The sender thread is the thread through which the producer actually talks to the broker. Sender is a class implementing the Runnable interface, so let's look straight at its run method.

The main loop of run just keeps executing runOnce():

@Override
public void run() {
	log.debug("Starting Kafka producer I/O thread.");

	// main loop, runs until close is called
	while (running) {
		try {
			runOnce();
		} catch (Exception e) {
			log.error("Uncaught error in kafka producer I/O thread: ", e);
		}
	}
	......
}

The key work inside runOnce() is done by these two calls:

//TODO Prepare the data to be sent, and establish connections to the brokers
long pollTimeout = sendProducerData(currentTimeMs);
//TODO Fetch metadata and send the data
client.poll(pollTimeout, currentTimeMs);

client here is of the interface type KafkaClient; this poll is dispatched to NetworkClient, the network layer shared by all Kafka clients, which performs the actual message sending.
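
This split is the core of the design: application threads only append records into the RecordAccumulator, while all socket I/O to every broker is funneled through this one sender thread and its NIO Selector. The application side never touches the network directly, which is part of why the producer scales to many sending threads.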

 

4.3 Sending messages from the run method

When the run method calls the producer's send method, the message-sending logic proper begins.

Internally, send delegates to the doSend method:

@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
	// intercept the record, which can be potentially modified; this method does not throw exceptions
	ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);
	return doSend(interceptedRecord, callback);
}

4.3.1 Waiting for metadata

Excerpting the key code from doSend: the first step is to wait for the sender thread to fetch metadata, i.e., to wait for the poll call of the sender thread from section 4.2.4 to bring it in. The details of metadata fetching deserve an article of their own later.

ClusterAndWaitTime clusterAndWaitTime;
try {
	//TODO Fetch the metadata (synchronous: blocks and waits)
	clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), nowMs, maxBlockTimeMs);
} catch (KafkaException e) {
	if (metadata.isClosed())
		throw new KafkaException("Producer closed while send in progress", e);
	throw e;
}
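
Stripped of error handling, the blocking loop inside waitOnMetadata has roughly this shape (a simplified sketch, not the verbatim source):

// Request a metadata update, wake the sender (which performs the actual
// network fetch), then block until that version arrives or max.block.ms
// is used up. TimeoutException is org.apache.kafka.common.errors.TimeoutException.
long begin = time.milliseconds();
long remainingWaitMs = maxBlockTimeMs;
Cluster cluster = metadata.fetch();
while (cluster.partitionsForTopic(topic).isEmpty()) {
    int version = metadata.requestUpdateForTopic(topic); // mark an update as needed
    sender.wakeup();                                     // the sender's poll() fetches it
    metadata.awaitUpdate(version, remainingWaitMs);      // wait on the metadata lock
    long elapsed = time.milliseconds() - begin;
    if (elapsed >= maxBlockTimeMs)
        throw new TimeoutException("Topic " + topic + " not present in metadata after " + maxBlockTimeMs + " ms.");
    remainingWaitMs = maxBlockTimeMs - elapsed;
    cluster = metadata.fetch();
}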

4.3.2 Serialization

//TODO Serialize the record's key and value
byte[] serializedKey;
try {
	serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
} catch (ClassCastException cce) {
	throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
			" to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
			" specified in key.serializer", cce);
}
byte[] serializedValue;
try {
	serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
} catch (ClassCastException cce) {
	throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
			" to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
			" specified in value.serializer", cce);
}
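
Both serializers are pluggable through the org.apache.kafka.common.serialization.Serializer interface; anything set as key.serializer or value.serializer only has to turn an object into a byte[]. A minimal custom serializer (a made-up example, not part of Kafka):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Serializer;

// Upper-cases the value before encoding it as UTF-8 bytes.
public class UpperCaseStringSerializer implements Serializer<String> {
    @Override
    public byte[] serialize(String topic, String data) {
        if (data == null)
            return null;                // null stays null (e.g. tombstone records)
        return data.toUpperCase().getBytes(StandardCharsets.UTF_8);
    }
}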

4.3.3 Partition assignment

//TODO Based on the metadata, choose the partition this record should be sent to. Key topic: the producer's partition-assignment rules!
int partition = partition(record, serializedKey, serializedValue, cluster);
tp = new TopicPartition(record.topic(), partition);
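
For reference, the default rules (DefaultPartitioner in 2.7) are: use the partition if the record explicitly specifies one; otherwise, if the record has a key, hash it with murmur2 modulo the partition count; otherwise use the "sticky" strategy, reusing one partition per topic until its batch is sealed. The keyed branch is essentially one line built on utilities from org.apache.kafka.common.utils.Utils:

// The keyed case of the default partitioner: a deterministic murmur2 hash
// of the serialized key, mapped onto the number of partitions.
int partitionForKey(byte[] serializedKey, int numPartitions) {
    return Utils.toPositive(Utils.murmur2(serializedKey)) % numPartitions;
}

A custom strategy can be plugged in via the partitioner.class config.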

4.3.4 Validating the record size

If the check inside ensureValidRecordSize fails, it throws an exception.

//TODO Validate the record size; the default limit is 1 MB
int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(apiVersions.maxUsableProduceMagic(),
		compressionType, serializedKey, serializedValue, headers);
ensureValidRecordSize(serializedSize);
long timestamp = record.timestamp() == null ? nowMs : record.timestamp();
if (log.isTraceEnabled()) {
	log.trace("Attempting to append record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
}
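
The "1 MB" above is the default of max.request.size (1048576 bytes); the same method also rejects records larger than the entire buffer.memory pool. In essence (messages paraphrased):

// Fail fast with RecordTooLargeException before any memory is requested
// from the BufferPool.
private void ensureValidRecordSize(int size) {
    if (size > maxRequestSize)
        throw new RecordTooLargeException("The message is " + size +
                " bytes when serialized, which is larger than max.request.size (" + maxRequestSize + ").");
    if (size > totalMemorySize)
        throw new RecordTooLargeException("The message is " + size +
                " bytes when serialized, which is larger than buffer.memory (" + totalMemorySize + ").");
}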

4.3.5 Binding the callback

This binds the callback for the asynchronous send, the DemoCallBack from section 4.1:

//TODO Bind the callback to the record
Callback interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);

4.3.6 Packing the record into the RecordAccumulator

This is the heart of how the Kafka producer achieves its high send throughput; a dedicated article will cover it in detail later.

//TODO Pack the record into the RecordAccumulator: internally there is one Deque<ProducerBatch> queue per partition, with records batched inside each queue; memory comes from the internal BufferPool
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
		serializedValue, headers, interceptCallback, remainingWaitMs, true, nowMs);

if (result.abortForNewBatch) {
	int prevPartition = partition;
	partitioner.onNewBatch(record.topic(), cluster, prevPartition);
	partition = partition(record, serializedKey, serializedValue, cluster);
	tp = new TopicPartition(record.topic(), partition);
	if (log.isTraceEnabled()) {
		log.trace("Retrying append due to new batch creation for topic {} partition {}. The old partition was {}", record.topic(), partition, prevPartition);
	}
	// producer callback will make sure to call both 'callback' and interceptor callback
	interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);

	result = accumulator.append(tp, timestamp, serializedKey,
		serializedValue, headers, interceptCallback, remainingWaitMs, false, nowMs);
}
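
Two details are worth calling out here. First, the initial append passes true as the abortOnNewBatch argument: if the append would have to create a new batch, it returns early with result.abortForNewBatch set, giving the sticky partitioner's onNewBatch a chance to switch to a fresh partition before the record is appended for real (with false). Second, inside the accumulator each partition owns a Deque<ProducerBatch>; appending means trying the tail batch and only allocating a new batch from the BufferPool when the record does not fit. A rough sketch of that decision (illustrative; recordsBuilder(...) is a hypothetical stand-in for constructing the MemoryRecordsBuilder, and the real method's locking and retry details are omitted):

// One Deque<ProducerBatch> per partition: try the tail batch first, and
// allocate a new batch from the BufferPool only when the record won't fit.
Deque<ProducerBatch> dq = batches.computeIfAbsent(tp, k -> new ArrayDeque<>());
synchronized (dq) {
    ProducerBatch last = dq.peekLast();
    FutureRecordMetadata future = (last == null)
            ? null
            : last.tryAppend(timestamp, key, value, headers, callback, nowMs);
    if (future == null) {
        ByteBuffer buffer = bufferPool.allocate(batchSize);
        ProducerBatch batch = new ProducerBatch(tp, recordsBuilder(buffer), nowMs); // recordsBuilder: hypothetical helper
        future = batch.tryAppend(timestamp, key, value, headers, callback, nowMs);
        dq.addLast(batch);
    }
}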

4.3.7 Waking up the sender thread

Once the record is packed, the producer wakes the sender thread, and the sender performs the actual send, i.e., the code from section 4.2.4.

//TODO If the batch is already full, or a new batch was created, wake the sender thread to send the data
if (result.batchIsFull || result.newBatchCreated) {
	log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
	//TODO Wake up the sender thread to send the data
	this.sender.wakeup();
}
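
Note that send does not wake the sender for every record: only a full batch or the creation of a new batch triggers a wakeup. Records sitting in a batch that is neither full nor new simply wait until linger.ms expires or more records arrive, and that accumulation is precisely the batching that gives the producer its throughput.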

 

To sum up, that is the flow and logic of a Kafka producer send. Follow-up articles will go deeper step by step and cover each point in detail. Beyond the producer, source walkthroughs of the Kafka broker and consumer are also on the way. Thanks for following along!

 
