Version note
This post summarizes the general flow of sending a message, based on the Kafka 0.10 producer source code.
Source code analysis
@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record) {
    // Delegate to the overloaded send method
    return send(record, null);
}

@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    // intercept the record, which can be potentially modified; this method does not throw exceptions
    // 1. Pass the record through any user-defined interceptors, which hook in before the send and after the acknowledgement
    ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
    // Delegate to doSend, which holds all the core logic
    return doSend(interceptedRecord, callback);
}
/**
* Implementation of asynchronously send a record to a topic. Equivalent to <code>send(record, null)</code>.
* See {@link #send(ProducerRecord, Callback)} for details.
*/
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    TopicPartition tp = null;
    try {
        // first make sure the metadata for the topic is available
        // 2. Block until the metadata for this topic is fetched and cached; this retrieves the topic's
        //    partitions along with each partition's leader, ISR, and other metadata
        long waitedOnMetadataMs = waitOnMetadata(record.topic(), this.maxBlockTimeMs);
        long remainingWaitMs = Math.max(0, this.maxBlockTimeMs - waitedOnMetadataMs);
        byte[] serializedKey;
        try {
            // 3. Serialize the key and value into byte arrays so they can be sent to the broker
            serializedKey = keySerializer.serialize(record.topic(), record.key());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in key.serializer");
        }
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.value());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in value.serializer");
        }
        // 4. Determine the record's partition via the partitioner. The rules are:
        //    - if the user specified a partition, use that partition directly;
        //    - otherwise, if a key is present, hash the key and take it modulo the partition count;
        //    - if there is no key, pick a partition round-robin to balance the load
        int partition = partition(record, serializedKey, serializedValue, metadata.fetch());
        int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
        // 5. Check that the serialized record does not exceed the single-request and buffer limits
        ensureValidRecordSize(serializedSize);
        tp = new TopicPartition(record.topic(), partition);
        long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
        log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
        // producer callback will make sure to call both 'callback' and interceptor callback
        Callback interceptCallback = this.interceptors == null ? callback : new InterceptorCallback<>(callback, this.interceptors, tp);
        // 6. Append the serialized record to the accumulator (the send buffer)
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);
        // 7. Wake up the sender thread once the send conditions are met
        if (result.batchIsFull || result.newBatchCreated) {
            log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
            this.sender.wakeup();
        }
        // 8. Return the future of the asynchronous send
        return result.future;
        // handling exceptions and record the errors;
        // for API exceptions return them in the future,
        // for other exceptions throw directly
    } catch (ApiException e) {
        log.debug("Exception occurred during message send:", e);
        if (callback != null)
            callback.onCompletion(null, e);
        this.errors.record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        return new FutureFailure(e);
    } catch (InterruptedException e) {
        this.errors.record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw new InterruptException(e);
    } catch (BufferExhaustedException e) {
        this.errors.record();
        this.metrics.sensor("buffer-exhausted-records").record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw e;
    } catch (KafkaException e) {
        this.errors.record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw e;
    } catch (Exception e) {
        // we notify interceptor about all exceptions, since onSend is called before anything else in this method
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw e;
    }
}
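To tie the eight steps together, here is a minimal usage sketch against the 0.10 producer API. The bootstrap address, topic name, key, and value are placeholders; the point is that send() returns a Future as soon as the record is appended to the accumulator, while the callback fires only after the broker responds.

import java.util.Properties;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);

        // send() only appends the record to the accumulator (steps 1-8 above) and returns at once
        Future<RecordMetadata> future = producer.send(
                new ProducerRecord<>("demo-topic", "key-1", "hello kafka"),
                new Callback() {
                    @Override
                    public void onCompletion(RecordMetadata metadata, Exception exception) {
                        // invoked after the broker responds (or the send ultimately fails)
                        if (exception != null)
                            exception.printStackTrace();
                        else
                            System.out.printf("sent to %s-%d at offset %d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                    }
                });

        // blocking on the future turns the asynchronous send into a synchronous one
        future.get();

        producer.close();
    }
}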
Takeaways
From this walk through the producer source, the core design points of the send-side client of a high-throughput, highly concurrent messaging system are the following:
- On-demand / lazy metadata loading. Metadata is fetched only for the topic being sent to, not for every topic in the cluster, and it is not fetched when the KafkaProducer is constructed but at send time, when it is retrieved and refreshed as needed.
- Extensibility. Interceptors and callbacks leave hook points for users to extend the client (a minimal interceptor sketch follows this list).
- Serializers. Keys and values can be of arbitrary types, so they are serialized into plain byte arrays before going over the network. This is standard practice in distributed systems; without it the system would be bogged down handling every possible data type (a custom serializer sketch is given below).
- Partitioner. A standalone partitioner selects and routes each record to a partition, and leaves room for user extension (see the partitioner sketch below).
- Buffering. Outgoing records are cached in a buffer and sent in batches once the send conditions are met. This reduces network I/O and devotes most of the bandwidth to actual message data; many small requests would instead waste it on headers, checksums, and other overhead (the relevant settings appear in the config sketch after this list).
- Size checks on core data, to avoid exhausting memory.
- Asynchronous sends. A record first goes into the buffer, and the sender thread is woken up only when the send condition is triggered (otherwise it stays parked and uses no CPU), which improves CPU utilization and overall performance.
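To illustrate the interceptor extension point, below is a minimal ProducerInterceptor sketch. The class name, package, and the counting behaviour are purely illustrative; onSend runs before serialization and partitioning, onAcknowledgement runs when the broker acks or the send fails.

import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Counts successful and failed sends; close() prints the totals when the producer shuts down.
public class CountingInterceptor implements ProducerInterceptor<String, String> {

    private final AtomicLong sent = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        sent.incrementAndGet();
        // the record may also be replaced or modified here, e.g. to prepend a tag to the value
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        if (exception != null)
            failed.incrementAndGet();
    }

    @Override
    public void close() {
        System.out.printf("sent=%d failed=%d%n", sent.get(), failed.get());
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed
    }
}

It would be enabled with props.put("interceptor.classes", "com.example.CountingInterceptor"), where the package name is assumed.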
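The serializer point can be made concrete with a sketch of a custom Serializer for a hypothetical Order type. The type, its fields, and the simple text encoding are illustrative only; it would be plugged in through the value.serializer config.

import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.kafka.common.serialization.Serializer;

// Hypothetical payload type, used only for illustration.
class Order {
    final String id;
    final long amountCents;
    Order(String id, long amountCents) { this.id = id; this.amountCents = amountCents; }
}

// Turns an Order into a plain byte[] (here a simple "id:amount" string) before it goes over the wire.
public class OrderSerializer implements Serializer<Order> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // no configuration needed
    }

    @Override
    public byte[] serialize(String topic, Order data) {
        if (data == null)
            return null;
        return (data.id + ":" + data.amountCents).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}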
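Similarly, here is a custom Partitioner sketch following the rules described in step 4. An explicitly specified partition never reaches the partitioner (KafkaProducer uses it directly); keyed records are hashed modulo the partition count, and keyless records are spread round-robin. The hash below is a simplified stand-in for the murmur2 hash used by the built-in DefaultPartitioner, and the class name is illustrative.

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

public class SimplePartitioner implements Partitioner {

    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null)
            // keyless record: round-robin across all partitions
            return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
        // keyed record: hash of the key bytes modulo the partition count
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    @Override
    public void close() {
        // nothing to clean up
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed
    }
}

It would be registered with props.put("partitioner.class", "com.example.SimplePartitioner"), again with an assumed package name.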
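Finally, the lazy metadata fetch, the accumulator, and the size checks are all governed by a handful of producer settings. The values below are close to the 0.10 defaults and are only meant to show which knob maps to which step of doSend.

Properties props = new Properties();
// upper bound on how long send() may block waiting for metadata (step 2) or for buffer space
props.put("max.block.ms", "60000");
// total memory available to the accumulator that buffers unsent records (step 6)
props.put("buffer.memory", "33554432");
// a batch is sent once it reaches batch.size bytes or has waited linger.ms milliseconds (step 7)
props.put("batch.size", "16384");
props.put("linger.ms", "5");
// maximum size of a single request; ensureValidRecordSize() enforces this limit (step 5)
props.put("max.request.size", "1048576");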