Sender thread
Thread model in the demo:
Continuing from KafkaProducer's doSend(): once the append to the RecordAccumulator succeeds, the sender thread is woken up. The trigger condition is that the deque's batch is full, or a new batch was just created.
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
serializedValue, headers, interceptCallback, remainingWaitMs);
if (result.batchIsFull || result.newBatchCreated) { // the batch is full, or a new batch was created
this.sender.wakeup(); // wake the sender thread; the main thread's work ends here
}
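wakeup() itself is tiny: it just unblocks the NIO selector that the sender thread may be parked on. The call chain, with method bodies abbreviated from the three classes involved:

public void wakeup() { this.client.wakeup(); }      // Sender
public void wakeup() { this.selector.wakeup(); }    // NetworkClient
public void wakeup() { this.nioSelector.wakeup(); } // Kafka Selector -> java.nio Selector.wakeup()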
What the Sender thread does: it sends requests to the Kafka cluster. To do that it must: 1. get the messages; 2. connect to the cluster; 3. know the cluster metadata.
Sender thread startup: it happens in KafkaProducer's constructor; the running flag is set to true by default in the Sender constructor.
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
this.ioThread.start(); // start the I/O thread
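KafkaThread is only a thin wrapper around Thread. A minimal sketch of what it adds (simplified; the real class in org.apache.kafka.common.utils also wires up a logger):

// simplified sketch, not the full class
public class KafkaThread extends Thread {
    public KafkaThread(String name, Runnable runnable, boolean daemon) {
        super(runnable, name);
        setDaemon(daemon); // passed as true above: the io thread will not keep the JVM alive
        setUncaughtExceptionHandler((t, e) ->
                System.err.println("Uncaught exception in thread '" + t.getName() + "': " + e));
    }
}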
The send flow:
1. Based on what is cached in the RecordAccumulator, select the nodes we can send messages to (the ready() method).
2. Filter those nodes by the connection state between the producer and each node (managed by NetworkClient).
3. Build the requests: exactly one request per node.
4. Call NetworkClient to send the requests.
I. Sender
If transactions are not in play, run() goes straight to sendProducerData():
void run(long now) {
if (transactionManager != null) {
...
}
long pollTimeout = sendProducerData(now); // [step in]
client.poll(pollTimeout, now);
}
1. sendProducerData()
private long sendProducerData(long now) {
    Cluster cluster = metadata.fetch(); // get the cluster metadata
    // 1. from the record accumulator, find the nodes that are ready to receive data
    RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);
    // if any partition's leader is unknown, force a metadata update
    if (!result.unknownLeaderTopics.isEmpty()) {
        for (String topic : result.unknownLeaderTopics)
            this.metadata.add(topic);
        this.metadata.requestUpdate();
    }
    // filter the nodes
    Iterator<Node> iter = result.readyNodes.iterator();
    long notReadyTimeout = Long.MAX_VALUE;
    while (iter.hasNext()) {
        Node node = iter.next();
        if (!this.client.ready(node, now)) { // drop nodes the network client has no usable connection to
            iter.remove();
            notReadyTimeout = Math.min(notReadyTimeout, this.client.connectionDelay(node, now));
        }
    }
    // 2. collect the batches to send for the ready nodes
    Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);
    if (guaranteeMessageOrder) {
        for (List<ProducerBatch> batchList : batches.values()) {
            for (ProducerBatch batch : batchList)
                this.accumulator.mutePartition(batch.topicPartition);
        }
    }
    List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(this.requestTimeout, now);
    sensors.updateProduceRequestMetrics(batches);
    long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
    // 3. create and send the requests
    sendProduceRequests(batches, now); // [step in]
    return pollTimeout;
}
1.1 RecordAccumulator: ready()
Finds the partitions in the RecordAccumulator that are ready to send, and resolves each partition's leader node.
public ReadyCheckResult ready(Cluster cluster, long nowMs) {
    Set<Node> readyNodes = new HashSet<>(); // records which nodes we can send messages to
    // delay until ready() needs to be called again
    long nextReadyCheckDelayMs = Long.MAX_VALUE;
    Set<String> unknownLeaderTopics = new HashSet<>();
    boolean exhausted = this.free.queued() > 0; // threads are waiting on the buffer pool
    // iterate over the batches map
    for (Map.Entry<TopicPartition, Deque<ProducerBatch>> entry : this.batches.entrySet()) {
        TopicPartition part = entry.getKey();
        Deque<ProducerBatch> deque = entry.getValue();
        Node leader = cluster.leaderFor(part); // the partition's leader node
        synchronized (deque) {
            if (leader == null && !deque.isEmpty()) {
                unknownLeaderTopics.add(part.topic()); // leader unknown but data is queued: collect the topic
            } else if (!readyNodes.contains(leader) && !muted.contains(part)) {
                ProducerBatch batch = deque.peekFirst(); // peek at the first batch, without removing it
                if (batch != null) { // evaluate the five sendable conditions
                    long waitedTimeMs = batch.waitedTimeMs(nowMs);
                    boolean backingOff = batch.attempts() > 0 && waitedTimeMs < retryBackoffMs;
                    long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;
                    long timeLeftMs = Math.max(timeToWaitMs - waitedTimeMs, 0);
                    boolean full = deque.size() > 1 || batch.isFull();
                    boolean expired = waitedTimeMs >= timeToWaitMs;
                    boolean sendable = full || expired || exhausted || closed || flushInProgress();
                    if (sendable && !backingOff) { // ready
                        readyNodes.add(leader);
                    } else { // not ready: remember when to check again
                        nextReadyCheckDelayMs = Math.min(timeLeftMs, nextReadyCheckDelayMs);
                    }
                }
            }
        }
    }
    return new ReadyCheckResult(readyNodes, nextReadyCheckDelayMs, unknownLeaderTopics);
}
Note: the readyNodes set holds leader nodes and has no direct link to any batch. That is why drain() exists: it maps from node back to the batches to send.
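The five conditions above map directly onto familiar producer configs. An illustrative snippet (the values are arbitrary examples, not recommendations):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

// each config feeds one of ready()'s conditions
Properties props = new Properties();
props.put(ProducerConfig.LINGER_MS_CONFIG, "5");            // lingerMs -> the "expired" condition
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");       // a batch at this size -> batch.isFull(), i.e. "full"
props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "100");   // retryBackoffMs -> the "backingOff" check
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432"); // waiters on this pool -> "exhausted"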
1.2 RecordAccumulator: drain()
Given the ready nodes, returns a Map<Integer, List<ProducerBatch>> keyed by node id.
public Map<Integer, List<ProducerBatch>> drain(...) {
    Map<Integer, List<ProducerBatch>> batches = new HashMap<>();
    for (Node node : nodes) { // iterate over the ready nodes
        int size = 0;
        // the partitions hosted on this node
        List<PartitionInfo> parts = cluster.partitionsForNode(node.id());
        List<ProducerBatch> ready = new ArrayList<>();
        // drainIndex records where the previous drain stopped; compute the start position here
        int start = drainIndex = drainIndex % parts.size();
        do {
            // partition details
            PartitionInfo part = parts.get(drainIndex);
            TopicPartition tp = new TopicPartition(part.topic(), part.partition());
            if (!muted.contains(tp)) { // skip muted partitions (muting preserves ordering)
                // the batch deque queued for this partition
                Deque<ProducerBatch> deque = getDeque(tp);
                if (deque != null) {
                    synchronized (deque) {
                        ProducerBatch first = deque.peekFirst(); // peek only, not removed yet
                        if (first != null) {
                            boolean backoff = first.attempts() > 0 && first.waitedTimeMs(now) < retryBackoffMs;
                            if (!backoff) {
                                if (size + first.estimatedSizeInBytes() > maxSize && !ready.isEmpty()) {
                                    // the request is full (maxSize is roughly one request's size); stop
                                    break;
                                } else {
                                    ProducerIdAndEpoch producerIdAndEpoch = null;
                                    boolean isTransactional = false;
                                    // ... (transactional handling elided)
                                    // now actually remove the batch and collect it
                                    ProducerBatch batch = deque.pollFirst();
                                    batch.close();
                                    size += batch.records().sizeInBytes();
                                    ready.add(batch); // save
                                    batch.drained(now);
                                }
                            }
                        }
                    }
                }
            }
            this.drainIndex = (this.drainIndex + 1) % parts.size();
        } while (start != drainIndex);
        batches.put(node.id(), ready);
    }
    return batches;
}
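The persisted drainIndex is what keeps draining fair: when a pass breaks off early because the request is full, the next pass resumes at the following partition instead of restarting at partition 0. A standalone toy demo of that scan (hypothetical class, not Kafka code; maxVisits stands in for the maxSize break):

import java.util.List;

class DrainOrderDemo {
    private int drainIndex = 0; // persists across calls, like the RecordAccumulator field

    void drainOnce(List<String> parts, int maxVisits) {
        int visited = 0;
        int start = drainIndex = drainIndex % parts.size();
        do {
            System.out.println("visit " + parts.get(drainIndex));
            drainIndex = (drainIndex + 1) % parts.size();
            if (++visited >= maxVisits)
                break; // like the maxSize break above: resume from here next time
        } while (start != drainIndex);
    }

    public static void main(String[] args) {
        DrainOrderDemo demo = new DrainOrderDemo();
        List<String> parts = List.of("p0", "p1", "p2");
        demo.drainOnce(parts, 2); // visits p0, p1
        demo.drainOnce(parts, 2); // visits p2, p0 -- no partition is starved
    }
}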
1.3 Building the request
Iterate over each node's batches:
private void sendProduceRequests(Map<Integer, List<ProducerBatch>> collated, long now) {
for (Map.Entry<Integer, List<ProducerBatch>> entry : collated.entrySet())
sendProduceRequest(now, entry.getKey(), acks, requestTimeout, entry.getValue());
}
sendProduceRequest() takes one node's batch list and builds a single request for that node:
private void sendProduceRequest(long now, int destination, short acks, int timeout, List<ProducerBatch> batches) {
    // two maps: MemoryRecords (a ByteBuffer wrapper) keyed by partition, and the batches themselves
    Map<TopicPartition, MemoryRecords> produceRecordsByPartition = new HashMap<>(batches.size());
    final Map<TopicPartition, ProducerBatch> recordsByPartition = new HashMap<>(batches.size());
    byte minUsedMagic = apiVersions.maxUsableProduceMagic();
    for (ProducerBatch batch : batches) { // find the minimum magic (record format) version across the batches
        if (batch.magic() < minUsedMagic)
            minUsedMagic = batch.magic();
    }
    // 1. iterate over the batches and fill both maps
    for (ProducerBatch batch : batches) {
        TopicPartition tp = batch.topicPartition; // the partition
        MemoryRecords records = batch.records(); // the ByteBuffer wrapper
        if (!records.hasMatchingMagic(minUsedMagic))
            records = batch.records().downConvert(minUsedMagic, 0, time).records(); // down-convert to the older format
        // store
        produceRecordsByPartition.put(tp, records);
        recordsByPartition.put(tp, batch);
    }
    // create the request builder, a thin wrapper around the records
    ProduceRequest.Builder requestBuilder = ProduceRequest.Builder.forMagic(minUsedMagic, acks, timeout,
            produceRecordsByPartition, transactionalId);
    // completion callback
    RequestCompletionHandler callback = new RequestCompletionHandler() {
        public void onComplete(ClientResponse response) {
            handleProduceResponse(response, recordsByPartition, time.milliseconds());
        }
    };
    String nodeId = Integer.toString(destination);
    // 2. create the ClientRequest [step in] ★
    ClientRequest clientRequest = client.newClientRequest(nodeId, requestBuilder, now, acks != 0, callback);
    // 3. send the request ★
    client.send(clientRequest, now);
}
In short: iterate over the given batches, pull out each MemoryRecords (down-converting where magic versions differ), build one request, and send it.
1.3.1 ProduceRequest.Builder (inner class)
The per-partition MemoryRecords map is wrapped inside the Builder:
public static class Builder extends AbstractRequest.Builder<ProduceRequest> {
private final short acks;
private final int timeout;
private final Map<TopicPartition, MemoryRecords> partitionRecords; // the payload: records keyed by partition
private final String transactionalId;
...
}
1.3.2 ClientRequest
public final class ClientRequest {
private final String destination; // target node id
private final AbstractRequest.Builder<?> requestBuilder;
private final int correlationId;
private final String clientId;
private final long createdTimeMs;
private final boolean expectResponse;
private final RequestCompletionHandler callback; // completion callback
...
}
The payload to be sent travels inside requestBuilder; that is what goes out on the wire.
Summary: one send pass pulls the first sendable batch from every ready partition, groups the MemoryRecords by node, and packs each node's records into a single request.
II. NetworkClient
Following client.send() from the Sender class, we land in NetworkClient.
2.1 send()
public void send(ClientRequest request, long now) {
doSend(request, false, now);
}
Inside NetworkClient, the builder is taken back out of the ClientRequest:
private void doSend(ClientRequest clientRequest, boolean isInternalRequest, long now) {
    String nodeId = clientRequest.destination();
    if (!isInternalRequest) { // check that we can actually send to this node
        if (!canSendRequest(nodeId))
            throw new IllegalStateException("Attempt to send a request to node " + nodeId + " which is not ready.");
    }
    AbstractRequest.Builder<?> builder = clientRequest.requestBuilder();
    try {
        NodeApiVersions versionInfo = apiVersions.get(nodeId);
        short version;
        if (versionInfo == null) {
            // no version info for this node yet: use the newest version this client supports
            version = builder.latestAllowedVersion();
        } else {
            // intersect our supported version range with the broker's
            version = versionInfo.latestUsableVersion(clientRequest.apiKey(), builder.oldestAllowedVersion(),
                    builder.latestAllowedVersion());
        }
        doSend(clientRequest, isInternalRequest, now, builder.build(version)); // [step in]
    } catch (UnsupportedVersionException e) {
        // no usable version: fail the request locally (handling elided)
        ...
    }
}
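latestUsableVersion() picks the newest version both sides understand. A simplified sketch of that intersection (assumed logic, not the actual NodeApiVersions internals; the real method throws UnsupportedVersionException when the ranges do not overlap):

// assumed simplification: usable = min of the two maxima, and it must sit inside both ranges
static short latestUsableVersion(short clientMin, short clientMax, short brokerMin, short brokerMax) {
    short usable = (short) Math.min(clientMax, brokerMax); // newest version both sides know
    if (usable < clientMin || usable < brokerMin)
        throw new IllegalStateException("no usable API version"); // real code: UnsupportedVersionException
    return usable;
}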
Then we step into the overloaded doSend().
Its main job is to build the InFlightRequest and put it into the in-flight collection:
private void doSend(ClientRequest clientRequest, boolean isInternalRequest, long now, AbstractRequest request) {
    String nodeId = clientRequest.destination();
    RequestHeader header = clientRequest.makeHeader(request.version());
    Send send = request.toSend(nodeId, header);
    // everything above prepares for this step: assemble the InFlightRequest
    InFlightRequest inFlightRequest = new InFlightRequest(...);
    this.inFlightRequests.add(inFlightRequest);
    selector.send(inFlightRequest.send); // [step in]
}
The request is now queued; before moving on to the Selector, two short asides.
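First, the in-flight bookkeeping. inFlightRequests caps how many unanswered requests one connection may carry (the max.in.flight.requests.per.connection config). A minimal sketch of the idea (hypothetical simplified class, not the real InFlightRequests):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class InFlightSketch<R> {
    private final int maxInFlightPerConnection; // config: max.in.flight.requests.per.connection
    private final Map<String, Deque<R>> requests = new HashMap<>();

    InFlightSketch(int maxInFlightPerConnection) { this.maxInFlightPerConnection = maxInFlightPerConnection; }

    boolean canSendMore(String nodeId) { // what canSendRequest() consults, among other checks
        Deque<R> q = requests.get(nodeId);
        return q == null || q.size() < maxInFlightPerConnection;
    }

    void add(String nodeId, R request) {
        requests.computeIfAbsent(nodeId, k -> new ArrayDeque<>()).addFirst(request);
    }

    R completeNext(String nodeId) { // responses come back in send order on one TCP connection
        return requests.get(nodeId).pollLast();
    }
}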
Second aside: Send's concrete class ByteBufferSend is essentially a ByteBuffer array plus the destination:
public class ByteBufferSend implements Send {
private final String destination;
private final int size;
protected final ByteBuffer[] buffers;
private int remaining;
private boolean pending = false;
...
}
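When poll() later finds the channel writable, the Send drains itself into the socket. A simplified version of that write path, modeled on ByteBufferSend.writeTo (error and pending-write handling elided; buffers and remaining are the fields above):

// a gathering write may send only part of the data, so the caller
// keeps calling until remaining reaches 0 (the send is then complete)
public long writeTo(GatheringByteChannel channel) throws IOException {
    long written = channel.write(buffers);
    remaining -= written;
    return written;
}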
2.2 Selector
This Selector is not java.nio's Selector; it wraps the NIO selector. It has many fields; the core ones:
public class Selector implements Selectable, AutoCloseable {
    private final java.nio.channels.Selector nioSelector; // monitors network I/O events
    private final Map<String, KafkaChannel> channels; // maps node id to KafkaChannel
    private final Set<KafkaChannel> explicitlyMutedChannels;
    private boolean outOfMemory;
    private final List<Send> completedSends; // requests fully written out; Send / NetworkReceive are the write / read buffers, backed by ByteBuffer underneath
    private final List<NetworkReceive> completedReceives; // responses fully read in
    private final Map<KafkaChannel, Deque<NetworkReceive>> stagedReceives; // everything read during one OP_READ event, later moved into completedReceives
    private final Set<SelectionKey> immediatelyConnectedKeys;
    private final Map<String, KafkaChannel> closingChannels;
    private Set<SelectionKey> keysWithBufferedRead;
    private final Map<String, ChannelState> disconnected; // connections found broken during one poll (connected holds the new ones)
    private final List<String> connected;
    private final List<String> failedSends; // nodes to which a send failed
    ...
}
Core method connect(): creates the KafkaChannel and stores it in the channels map.
public void connect(String id, InetSocketAddress address, int sendBufferSize, int receiveBufferSize) throws IOException {
    SocketChannel socketChannel = SocketChannel.open(); // create the SocketChannel
    socketChannel.configureBlocking(false); // non-blocking mode
    Socket socket = socketChannel.socket();
    socket.setKeepAlive(true); // keep the connection alive
    if (sendBufferSize != Selectable.USE_DEFAULT_BUFFER_SIZE)
        socket.setSendBufferSize(sendBufferSize); // SO_SNDBUF size
    if (receiveBufferSize != Selectable.USE_DEFAULT_BUFFER_SIZE)
        socket.setReceiveBufferSize(receiveBufferSize); // SO_RCVBUF size
    socket.setTcpNoDelay(true);
    boolean connected;
    connected = socketChannel.connect(address); // non-blocking connect: usually returns false here
    // register the channel with the nioSelector, interested in OP_CONNECT
    SelectionKey key = socketChannel.register(nioSelector, SelectionKey.OP_CONNECT);
    // create the KafkaChannel (it is what gets stored in the channels map)
    KafkaChannel channel = buildChannel(socketChannel, id, key);
}
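Because the channel is non-blocking, connect() above usually returns false immediately; the connection completes later, when OP_CONNECT fires. A plain-NIO sketch of what the poll loop does at that point (not Kafka code):

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// finish the pending connect, then switch interest from OP_CONNECT to OP_READ
static void handleConnect(SelectionKey key) throws IOException {
    SocketChannel ch = (SocketChannel) key.channel();
    if (ch.finishConnect()) {
        key.interestOps((key.interestOps() & ~SelectionKey.OP_CONNECT) | SelectionKey.OP_READ);
    }
}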
Back on the main path: Kafka's Selector.send() queues the request on the channel:
public void send(Send send) {
    String connectionId = send.destination();
    // find the KafkaChannel for this connection
    KafkaChannel channel = openOrClosingChannelOrFail(connectionId);
    if (closingChannels.containsKey(connectionId)) {
        this.failedSends.add(connectionId); // connection is closing: record the failure
    } else {
        channel.setSend(send); // stash the Send on the channel; poll() writes it out later
    }
}
The place where the Selector actually performs network I/O is poll(), which calls the NIO selector's select() method.
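A heavily simplified event loop in that spirit (plain NIO sketch; the real poll() additionally handles staged receives, disconnects, and memory pressure):

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.util.Iterator;

class PollSketch {
    private final java.nio.channels.Selector nioSelector;

    PollSketch(java.nio.channels.Selector nioSelector) { this.nioSelector = nioSelector; }

    void pollOnce(long timeoutMs) throws IOException {
        int ready = nioSelector.select(timeoutMs); // blocks until I/O, timeout, or wakeup()
        if (ready == 0) return;
        Iterator<SelectionKey> it = nioSelector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // selected keys must be removed by hand
            if (key.isConnectable()) { /* finishConnect(), then watch OP_READ (see above) */ }
            if (key.isWritable())    { /* write the queued Send, record it in completedSends */ }
            if (key.isReadable())    { /* read a NetworkReceive, stage it for completedReceives */ }
        }
    }
}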