ç
KafkaProduce向消息缓冲区发送消息的大致流程
- 如果有自定义拦截器会回调自定义的拦截器
- 同步阻塞获取topic的元数据
- 序列化key和value,将信息转化为byte数组的格式
- 检查消息是否超出请求最大请求大小以及内存缓冲最大的大小
- 将消息加入到缓冲区中
- 如果缓冲区的batch被填满了,则创建一个新的batch,这时会唤醒Sender线程,让他来工作,发送batch
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
// intercept the record, which can be potentially modified; this method does not throw exceptions
ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
return doSend(interceptedRecord, callback);
}
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
TopicPartition tp = null;
try {
// first make sure the metadata for the topic is available
long waitedOnMetadataMs = waitOnMetadata(record.topic(), this.maxBlockTimeMs);
long remainingWaitMs = Math.max(0, this.maxBlockTimeMs - waitedOnMetadataMs);
byte[] serializedKey;
try {
serializedKey = keySerializer.serialize(record.topic(), record.key());
} catch (ClassCastException cce) {
throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
" to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
" specified in key.serializer");
}
byte[] serializedValue;
try {
serializedValue = valueSerializer.serialize(record.topic(), record.value());
} catch (ClassCastException cce) {
throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
" to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
" specified in value.serializer");
}
int partition = partition(record, serializedKey, serializedValue, metadata.fetch());
int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
ensureValidRecordSize(serializedSize);
tp = new TopicPartition(record.topic(), partition);
long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
// producer callback will make sure to call both 'callback' and interceptor callback
Callback interceptCallback = this.interceptors == null ? callback : new InterceptorCallback<>(callback, this.interceptors, tp);
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);
if (result.batchIsFull || result.newBatchCreated) {
log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
this.sender.wakeup();
}
return result.future;
// handling exceptions and record the errors;
// for API exceptions return them in the future,
// for other exceptions throw directly
} catch (ApiException e) {
log.debug("Exception occurred during message send:", e);
if (callback != null)
callback.onCompletion(null, e);
this.errors.record();
if (this.interceptors != null)
this.interceptors.onSendError(record, tp, e);
return new FutureFailure(e);
} catch (InterruptedException e) {
this.errors.record();
if (this.interceptors != null)
this.interceptors.onSendError(record, tp, e);
throw new InterruptException(e);
} catch (BufferExhaustedException e) {
this.errors.record();
this.metrics.sensor("buffer-exhausted-records").record();
if (this.interceptors != null)
this.interceptors.onSendError(record, tp, e);
throw e;
} catch (KafkaException e) {
this.errors.record();
if (this.interceptors != null)
this.interceptors.onSendError(record, tp, e);
throw e;
} catch (Exception e) {
// we notify interceptor about all exceptions, since onSend is called before anything else in this method
if (this.interceptors != null)
this.interceptors.onSendError(record, tp, e);
throw e;
}
}
如何保证topic的元数据是可以使用的
在客户端的方法尝试等待获取topic元数据的过程中,核心的逻辑,就是说先必须唤醒Sender线程,然后呢就会通过一个while循环,直接去wait释放锁,尝试最多就是等待默认的60s的时间
topic元数据的拉取,是走的是异步的方式的,但是对异步的结果进行同步的阻塞的等待,他其实唤醒Sender线程,就是本质上就是在让那个Sender线程去从broker拉取对应的topic的元数据
如果拉取成功了,那么version版本号,集群元数据的版本号一定会累加,所以只要判断version版本号还没有累加,就说明此时Sender线程还没有成功的拉取元数据,此时就是在主线程里,就是要wait阻塞等待最多60s即可
接下来肯定是分为两种情况:
(1)Sender线程成功的在60s内把topic元数据加载到了,然后缓存到了Metadata里去,更新了version版本号,而且此时一定会尝试把wait阻塞等待的主线程给唤醒,让主线程直接返回阻塞等待的时长
(2)如果wait(60s)一直超时了,你的Sender线程都没加载成功元数据,此时人家在60s后自动醒来了,此时会直接超时抛异常
注意的是这个版本更新其实未必是对应的topic被拉取回来,也有可能是其他线程出发了更的操作,在获取版本之后还需要判断所需要的topic的元信息是否已经被更新,如果更新则成功,不存在则继续触发更新
KafkaProducer写入缓冲区的流程
KafkaProducer设计的理念就是多线程并发安全的,可以让多个线程并发的来调用KafkaProducer还保证数据不会错乱的
他会从内存缓冲区里获取一个分区对应的Deque,这个Deque里是一个队列,放了很多的Batch,就是这个分区对应的多个batch, 适合读多写少的场景
this.batches = new CopyOnWriteMap<>();
private final ConcurrentMap<TopicPartition, Deque<RecordBatch>> batches;
大量的主要还是在读取,就是去大量的从map里读取一个分区对应的Deque,最后高并发频繁更新的就是分区对应的那个Deque,读的时候基于快照来读即可,所以这种场景非常适合使用CopyOnWrite系列的数据结构
如果说此时还没有创建对应的batch,此时会导致放入Deque会失败
他会基于BufferPool给这个batch分配一块内存出来,之所以说是Pool,就是因为这个batch代表的内存空间是可以复用的,用完一块内存之后会放回去下次给别人来使用,复用内存,避免了频繁的使用内存,丢弃对象,垃圾回收
已经可以往Deque队列里写入消息了,已经有一个新分配的batch了(对应了BufferPool分配的一块内存空间)
public RecordAppendResult append(TopicPartition tp,
long timestamp,
byte[] key,
byte[] value,
Callback callback,
long maxTimeToBlock) throws InterruptedException {
// We keep track of the number of appending thread to make sure we do not miss batches in
// abortIncompleteBatches().
appendsInProgress.incrementAndGet();
try {
// check if we have an in-progress batch
Deque<RecordBatch> dq = getOrCreateDeque(tp);
synchronized (dq) {
if (closed)
throw new IllegalStateException("Cannot send after the producer is closed.");
RecordAppendResult appendResult = tryAppend(timestamp, key, value, callback, dq);
if (appendResult != null)
return appendResult;
}
// we don't have an in-progress record batch try to allocate a new batch
int size = Math.max(this.batchSize, Records.LOG_OVERHEAD + Record.recordSize(key, value));
log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());
ByteBuffer buffer = free.allocate(size, maxTimeToBlock);
synchronized (dq) {
// Need to check if producer is closed again after grabbing the dequeue lock.
if (closed)
throw new IllegalStateException("Cannot send after the producer is closed.");
RecordAppendResult appendResult = tryAppend(timestamp, key, value, callback, dq);
if (appendResult != null) {
// Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
free.deallocate(buffer);
return appendResult;
}
MemoryRecords records = MemoryRecords.emptyRecords(buffer, compression, this.batchSize);
RecordBatch batch = new RecordBatch(tp, records, time.milliseconds());
FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, callback, time.milliseconds()));
dq.addLast(batch);
incomplete.add(batch);
return new RecordAppendResult(future, dq.size() > 1 || batch.records.isFull(), true);
}
} finally {
appendsInProgress.decrementAndGet();
}
}
一条消息如何按照二进制协议写入batch的bytebuffer的
offset | size | crc | magic | attibutes | timestamp | key size | key | value size | value
不断申请内存空间导致内存消耗光了之后会做什么
会阻塞一段时间,maxBlockMs - 可能获取元数据耗费的时间,如果还是不行的话,就会抛异常了,但是这段时间里有可用内存腾出来了(有一些batch被发送出去了,获取到了响应,此时就可以释放那个batch底层对应的ByteBuffer,就会被放回到BufferPool里面去,此时就可以唤醒阻塞的线程,再次申请一个新的ByteBuffer构造一个Batch)