1 Preface
I've spent the last couple of days on this code and it turned out to be more complex than I expected. In hindsight that is understandable: it is designed for high performance and it must also be thread-safe, which makes it well worth a close read. Continuing from the send() method covered earlier: after a partition has been selected for a message, the message is staged into a queue inside RecordAccumulator, and the main thread can then return from send(). At this point the message has not actually been sent to Kafka yet. The business thread keeps appending messages to RecordAccumulator through KafkaProducer.send(), and once certain conditions are met, the Sender thread is woken up to send the messages accumulated in RecordAccumulator.
Since at least one business thread and one Sender thread operate on RecordAccumulator concurrently, it must be thread-safe. Below is the code where send() calls into RecordAccumulator (I have not looked at the transaction part yet).
tp = new TopicPartition(record.topic(), partition);
// make the message headers read-only
setReadOnly(record.headers());
Header[] headers = record.headers().toArray();
// estimate an upper bound on the serialized size of the message
int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(apiVersions.maxUsableProduceMagic(),
        compressionType, serializedKey, serializedValue, headers);
// check that the size does not exceed maxRequestSize or totalMemorySize
ensureValidRecordSize(serializedSize);
// resolve the message timestamp
long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
// producer callback will make sure to call both 'callback' and interceptor callback
// build an InterceptorCallback wrapping the user's callback
Callback interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);
// check whether an AddPartitionsToTxnRequest needs to be sent (haven't looked at transactions yet)
if (transactionManager != null && transactionManager.isTransactional())
    transactionManager.maybeAddPartitionToTransaction(tp);
// append the record to the accumulator (important, performance-critical);
// the Sender thread later reads it from there and sends it to the broker
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
        serializedValue, headers, interceptCallback, remainingWaitMs);
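The flow above, where the business thread stages a record and the Sender thread is later woken to drain it, can be sketched with a toy accumulator. All names here are illustrative, not Kafka's actual classes:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy accumulator illustrating the append-then-wake pattern
// (illustrative names, not Kafka's actual classes).
class MiniAccumulator {
    private final ArrayDeque<String> queue = new ArrayDeque<>();
    private final int batchSize;

    MiniAccumulator(int batchSize) {
        this.batchSize = batchSize;
    }

    // Called by the business thread; returns true when a full batch is
    // ready and the sender thread has been signalled.
    synchronized boolean append(String record) {
        queue.addLast(record);
        boolean full = queue.size() >= batchSize;
        if (full)
            notifyAll(); // wake the sender thread
        return full;
    }

    // Called by the sender thread; blocks until a full batch is ready.
    synchronized List<String> drain() {
        try {
            while (queue.size() < batchSize)
                wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        List<String> batch = new ArrayList<>(queue);
        queue.clear();
        return batch;
    }
}
```

The real accumulator wakes the Sender on more conditions (a batch is full, memory is exhausted, linger.ms expires), but the wait/notify shape is the same.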
RecordAccumulator holds a ConcurrentMap keyed by TopicPartition whose values are Deque&lt;ProducerBatch&gt;, shown in the line below. (This differs from the older Deque&lt;RecordBatch&gt; described in the book.) Each ProducerBatch holds a reference to a MemoryRecordsBuilder, which in turn references a MemoryRecords; MemoryRecords is where the messages ultimately live.
ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches;
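RecordAccumulator keeps the map lookup lock-free and serializes mutations by locking on the individual deque, so appends to different partitions never contend on the same lock. A minimal sketch of that pattern (illustrative names):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the "concurrent map of per-key deques" pattern: the map
// lookup is lock-free, and mutation of a deque is serialized by
// synchronizing on that deque alone.
class PerPartitionQueues {
    private final ConcurrentMap<String, Deque<String>> batches = new ConcurrentHashMap<>();

    // Returns the queue depth for the partition after the append.
    int append(String partition, String record) {
        // computeIfAbsent is atomic: at most one deque is created per key.
        Deque<String> dq = batches.computeIfAbsent(partition, k -> new ArrayDeque<>());
        synchronized (dq) { // fine-grained: contention is per partition
            dq.addLast(record);
            return dq.size();
        }
    }
}
```

This is why the map's value type is a plain ArrayDeque rather than a concurrent deque: the deque is only ever touched while its monitor is held.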
Let's look at these classes one by one.
2 MemoryRecords
MemoryRecords reads records from memory. The record package contains a lot of classes, more than 30, far too many to cover in one pass, so let's start from MemoryRecords and work outward.
public class MemoryRecords extends AbstractRecords {
MemoryRecords extends AbstractRecords, and AbstractRecords implements the Records interface.
public abstract class AbstractRecords implements Records {
    // lazily built record iterator
    private final Iterable<Record> records = this::recordsIterator;
    ...
    private Iterator<Record> recordsIterator() {
        // extends AbstractIterator<Record> and implements makeNext()
        return new AbstractIterator<Record>() {
            // iterates over the RecordBatches
            private final Iterator<? extends RecordBatch> batches = batches().iterator();
            // iterates over the Records within the current RecordBatch
            private Iterator<Record> records;

            @Override
            protected Record makeNext() {
                // first check whether the current batch still has records
                if (records != null && records.hasNext())
                    // if so, return the next one directly
                    return records.next();
                // otherwise, check whether there are more batches
                if (batches.hasNext()) {
                    // if so, point records at the next batch
                    records = batches.next().iterator();
                    // and recurse
                    return makeNext();
                }
                // nothing left in either iterator: we are done
                return allDone();
            }
        };
    }
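The same flattening trick, drain the current inner iterator and only then advance the outer one, can be reproduced stand-alone. The sketch below mirrors the batches-to-records iteration but is not Kafka's AbstractIterator:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Flattens an Iterator<List<T>> into an Iterator<T>, mirroring the
// batches -> records iteration above (not Kafka's actual code).
class FlatteningIterator<T> implements Iterator<T> {
    private final Iterator<List<T>> batches;
    private Iterator<T> records; // iterator over the current batch
    private T next;              // prefetched element, if any
    private boolean done;

    FlatteningIterator(Iterator<List<T>> batches) {
        this.batches = batches;
    }

    private void prefetch() {
        while (next == null && !done) {
            if (records != null && records.hasNext())
                next = records.next();               // current batch has more
            else if (batches.hasNext())
                records = batches.next().iterator(); // advance to next batch
            else
                done = true;                         // everything exhausted
        }
    }

    @Override
    public boolean hasNext() {
        prefetch();
        return next != null;
    }

    @Override
    public T next() {
        if (!hasNext())
            throw new NoSuchElementException();
        T result = next;
        next = null;
        return result;
    }

    static List<Integer> flatten(List<List<Integer>> nested) {
        List<Integer> out = new ArrayList<>();
        new FlatteningIterator<>(nested.iterator()).forEachRemaining(out::add);
        return out;
    }
}
```

Kafka's version uses makeNext()/allDone() instead of an explicit prefetch loop, but the state machine is the same: empty inner batches are skipped transparently.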
MemoryRecords represents records read from memory:
public class MemoryRecords extends AbstractRecords {
    private static final Logger log = LoggerFactory.getLogger(MemoryRecords.class);

    public static final MemoryRecords EMPTY = MemoryRecords.readableRecords(ByteBuffer.allocate(0));

    // the ByteBuffer that holds the message data
    private final ByteBuffer buffer;

    private final Iterable<MutableRecordBatch> batches = this::batchIterator;

    private int validBytes = -1;

    // Construct a writable memory records
    private MemoryRecords(ByteBuffer buffer) {
        Objects.requireNonNull(buffer, "buffer should not be null");
        this.buffer = buffer;
    }
Key field:
- buffer: the Java NIO ByteBuffer that holds the message data.
Note that the compressor and related classes from the older version covered in the book are nowhere to be seen in 2.1.
The other methods, which appear to be consumer-related, are not expanded on here.
3 MemoryRecordsBuilder
MemoryRecordsBuilder reassigns offsets to records, and those offsets are written into the messages. Its job is to append each message into the ByteBuffer.
The class has quite a few methods; we'll start with append().
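Before diving into append(), here is a toy version of the core idea: length-prefix each record, write it into a ByteBuffer, then flip the buffer to read everything back. This is a sketch only; Kafka's actual record format (magic byte, CRC, varint encodings, compression) is far richer:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy builder illustrating the idea behind MemoryRecordsBuilder:
// append length-prefixed records into a ByteBuffer, then "close" it
// for reading. This is not Kafka's wire format.
class ToyRecordsBuilder {
    private final ByteBuffer buffer;

    ToyRecordsBuilder(int capacity) {
        this.buffer = ByteBuffer.allocate(capacity);
    }

    void append(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        buffer.putInt(bytes.length); // length prefix
        buffer.put(bytes);           // payload
    }

    // Flip the buffer from writing to reading and decode every record.
    List<String> build() {
        buffer.flip();
        List<String> out = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            out.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

The flip() call is the toy equivalent of the builder's close(): writes stop, and the buffer becomes readable from position zero.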
4 ProducerBatch
To sort out the relationships in this area, a diagram helps understanding.
4.1 ProduceRequestResult
public final class ProduceRequestResult {
    // a CountDownLatch providing Future-like completion semantics
    private final CountDownLatch latch = new CountDownLatch(1);
    private final TopicPartition topicPartition;

    // the offset the server assigned to the first message in this batch;
    // combined with each message's relative offset it preserves order
    private volatile Long baseOffset = null;
    private volatile long logAppendTime = RecordBatch.NO_TIMESTAMP;
    private volatile RuntimeException error;
ProduceRequestResult does not implement java.util.concurrent.Future, but by holding a CountDownLatch with a count of 1 it provides Future-like behavior. (CountDownLatch is basic Java concurrency, essentially a countdown counter, and Future is likewise core JUC material, so neither is expanded on here.)
When all the messages in a ProducerBatch have been acknowledged, ProduceRequestResult.done() is called: it marks produceFuture as complete, with the ProduceRequestResult.error field distinguishing an exceptional completion from a normal one, and then calls countDown() on the CountDownLatch, waking any threads blocked in the latch's await().
/**
 * Mark this request as complete and unblock any threads waiting on its completion.
 * (Marks the request as processed and wakes threads blocked in await().)
 */
public void done() {
    if (baseOffset == null)
        throw new IllegalStateException("The method `set` must be invoked before this method.");
    this.latch.countDown();
}

/**
 * Await the completion of this request
 */
public void await() throws InterruptedException {
    latch.await();
}
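The latch trick is easy to reproduce: a CountDownLatch(1) plus volatile fields gives Future-like blocking semantics without implementing Future. A minimal stand-alone sketch (get() here converts interruption to an unchecked exception to keep the example short):

```java
import java.util.concurrent.CountDownLatch;

// Minimal latch-backed result mirroring ProduceRequestResult's trick:
// countDown() releases every thread blocked in await().
class LatchResult {
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile Long baseOffset;
    private volatile RuntimeException error;

    void set(long baseOffset, RuntimeException error) {
        this.baseOffset = baseOffset;
        this.error = error;
    }

    void done() {
        if (baseOffset == null)
            throw new IllegalStateException("set must be invoked before done");
        latch.countDown();
    }

    // Blocks until done() is called, then surfaces the outcome.
    long get() {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        if (error != null)
            throw error;
        return baseOffset;
    }
}
```

A caller blocked in get() on one thread is released the moment another thread calls set(...) followed by done(), exactly the handshake between the Sender's response handling and a user waiting on send().get().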
4.2 Thunk
/**
 * A callback and the associated FutureRecordMetadata argument to pass to it.
 */
final private static class Thunk {
    final Callback callback;
    final FutureRecordMetadata future;

    Thunk(Callback callback, FutureRecordMetadata future) {
        this.callback = callback;
        this.future = future;
    }
}
Thunk is an inner class of ProducerBatch; think of it as the per-message callback object held in a queue.
Its Callback field corresponds to the second argument of send(); the other field, future, is of type FutureRecordMetadata.
/**
 * The future result of a record send
 */
public final class FutureRecordMetadata implements Future<RecordMetadata> {
    // points to the produceFuture of the ProducerBatch containing this message
    private final ProduceRequestResult result;
    // the relative offset of this message within its ProducerBatch
    private final long relativeOffset;
    private final long createTimestamp;
    private final Long checksum;
    private final int serializedKeySize;
    private final int serializedValueSize;
    private volatile FutureRecordMetadata nextRecordMetadata = null;
FutureRecordMetadata implements java.util.concurrent.Future, but it delegates everything to the corresponding methods of the ProduceRequestResult it references.
Once the producer has received the response for a message, FutureRecordMetadata.get() returns a RecordMetadata object containing the message's offset within its partition and other metadata, for use by the user's Callback.
@Override
public RecordMetadata get() throws InterruptedException, ExecutionException {
    // block until the request completes
    this.result.await();
    if (nextRecordMetadata != null)
        return nextRecordMetadata.get();
    return valueOrError();
}
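The nextRecordMetadata field exists for batch splitting: when an oversized batch is split and re-enqueued, the original future is chained to the new batch's future, and get() follows the chain to the freshest result. The redirection can be sketched as (illustrative names, not Kafka's class):

```java
// Sketch of the nextRecordMetadata redirection: when a batch is
// split and retried, the original future is "chained" to the future
// of the new batch, and get() follows the chain.
class ChainedResult {
    private volatile ChainedResult next; // set when the batch is split
    private volatile long offset = -1L;

    void complete(long offset) {
        this.offset = offset;
    }

    void chain(ChainedResult successor) {
        this.next = successor;
    }

    long get() {
        if (next != null)
            return next.get(); // follow the redirection, possibly many hops
        return offset;
    }
}
```

A caller holding the original future never notices the split: its get() transparently resolves to whatever batch the message finally landed in.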
ProducerBatch.done() is called when the batch receives a successful response, when it times out, or when the producer is closed.
This part matters: as noted above, a thread blocked in await() stays waiting until someone wakes it. There is an article circulating online about a problem related to exactly this.
Let's look at done():
public boolean done(long baseOffset, long logAppendTime, RuntimeException exception) {
    final FinalState tryFinalState = (exception == null) ? FinalState.SUCCEEDED : FinalState.FAILED;

    if (tryFinalState == FinalState.SUCCEEDED) {
        log.trace("Successfully produced messages to {} with base offset {}.", topicPartition, baseOffset);
    } else {
        log.trace("Failed to produce messages to {} with base offset {}.", topicPartition, baseOffset, exception);
    }

    if (this.finalState.compareAndSet(null, tryFinalState)) {
        completeFutureAndFireCallbacks(baseOffset, logAppendTime, exception);
        return true;
    }

    if (this.finalState.get() != FinalState.SUCCEEDED) {
        if (tryFinalState == FinalState.SUCCEEDED) {
            // Log if a previously unsuccessful batch succeeded later on.
            log.debug("ProduceResponse returned {} for {} after batch with base offset {} had already been {}.",
                    tryFinalState, topicPartition, baseOffset, this.finalState.get());
        } else {
            // FAILED --> FAILED and ABORTED --> FAILED transitions are ignored.
            log.debug("Ignored state transition {} -> {} for {} batch with base offset {}",
                    this.finalState.get(), tryFinalState, topicPartition, baseOffset);
        }
    } else {
        // A SUCCESSFUL batch must not attempt another state change.
        throw new IllegalStateException("A " + this.finalState.get() + " batch must not attempt another state change to " + tryFinalState);
    }
    return false;
}
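The key line is finalState.compareAndSet(null, tryFinalState): no matter how many threads race into done() (a late response, a timeout, a close), exactly one wins the CAS and runs the completion logic. The idiom in isolation:

```java
import java.util.concurrent.atomic.AtomicReference;

// First-writer-wins completion via compareAndSet: only the thread
// that flips the state away from null may run the completion logic.
class OnceCompletable {
    enum FinalState { SUCCEEDED, FAILED }

    private final AtomicReference<FinalState> finalState = new AtomicReference<>(null);

    // Returns true only for the single caller that wins the CAS.
    boolean tryComplete(FinalState state) {
        return finalState.compareAndSet(null, state);
    }

    FinalState state() {
        return finalState.get();
    }
}
```

Every later caller gets false and merely logs, which is exactly why a batch's callbacks can never fire twice.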
The core is completeFutureAndFireCallbacks. Let's look at it:
private void completeFutureAndFireCallbacks(long baseOffset, long logAppendTime, RuntimeException exception) {
    // Set the future before invoking the callbacks as we rely on its state for the `onCompletion` call
    produceFuture.set(baseOffset, logAppendTime, exception);

    // execute callbacks (invoke each message's callback in turn)
    for (Thunk thunk : thunks) {
        try {
            if (exception == null) { // completed normally
                // fetch the metadata returned by the server
                RecordMetadata metadata = thunk.future.value();
                if (thunk.callback != null)
                    // invoke the user's callback for this message
                    thunk.callback.onCompletion(metadata, null);
            } else { // completed exceptionally
                if (thunk.callback != null) // pass the exception to the callback
                    thunk.callback.onCompletion(null, exception);
            }
        } catch (Exception e) {
            log.error("Error executing user-provided callback on message for topic-partition '{}'", topicPartition, e);
        }
    }

    // mark the whole batch as processed
    produceFuture.done();
}
The done() at the end is the ProduceRequestResult.done() shown earlier.
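Finally, note the per-thunk try/catch inside the callback loop: a user callback that throws must not prevent the remaining callbacks, or the final produceFuture.done(), from running. The isolation pattern in miniature (hypothetical names):

```java
import java.util.List;
import java.util.function.Consumer;

// Per-callback exception isolation, as in completeFutureAndFireCallbacks():
// one misbehaving user callback must not starve the others.
class CallbackRunner {
    // Invokes every callback with the metadata; returns how many succeeded.
    static int fireAll(List<Consumer<String>> callbacks, String metadata) {
        int fired = 0;
        for (Consumer<String> cb : callbacks) {
            try {
                cb.accept(metadata);
                fired++;
            } catch (Exception e) {
                // Log and continue; never let a user callback break the batch.
                System.err.println("user callback failed: " + e);
            }
        }
        return fired;
    }
}
```

Without this isolation, a single throwing callback would also skip the latch countDown, leaving every thread blocked in await() waiting forever, which is exactly the failure mode discussed above.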
All right. To keep this post a reasonable length I'll stop here and continue in the next one.