How the Kafka Client Writes Messages into the Buffer

小北漂漂飘

Last modified 2024-04-13 20:54:20

Article tags: kafka

First published 2024-04-13 13:44:14

Article link: https://blog.csdn.net/smallmarks/article/details/137711031

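Overall flow of KafkaProducer sending a message to the message buffer
How to make sure the topic metadata is available
How KafkaProducer writes into the buffer
How a message is written into the batch's ByteBuffer following the binary protocol
What happens when repeated allocations exhaust the memory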

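Overall flow of KafkaProducer sending a message to the message buffer

If custom interceptors are configured, they are called back first
Block synchronously until the topic's metadata is available
Serialize the key and value into byte arrays
Compute the partition the record belongs to
Check that the message does not exceed the maximum request size or the total size of the memory buffer
Append the message to the buffer
If the batch is full or a new batch has just been created, wake up the Sender thread so that it can start sending batches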
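
The size check in the steps above can be sketched in isolation. This is a minimal stand-in, not the client's actual code: the class name RecordSizeCheck is hypothetical, the two limits are assumed to correspond to the max.request.size and buffer.memory settings, and IllegalArgumentException is used where the real client throws RecordTooLargeException.

```java
// Hypothetical, simplified version of the record size validation step.
final class RecordSizeCheck {
    private final int maxRequestSize;    // assumed to mirror max.request.size
    private final long totalMemorySize;  // assumed to mirror buffer.memory

    RecordSizeCheck(int maxRequestSize, long totalMemorySize) {
        this.maxRequestSize = maxRequestSize;
        this.totalMemorySize = totalMemorySize;
    }

    // Rejects a record that could never fit into one request or into the buffer.
    void ensureValidRecordSize(int serializedSize) {
        if (serializedSize > maxRequestSize)
            throw new IllegalArgumentException("Record of " + serializedSize
                    + " bytes exceeds the maximum request size of " + maxRequestSize);
        if (serializedSize > totalMemorySize)
            throw new IllegalArgumentException("Record of " + serializedSize
                    + " bytes exceeds the total buffer memory of " + totalMemorySize);
    }

    public static void main(String[] args) {
        // 1 MB per request and 32 MB of buffer memory are the producer defaults.
        RecordSizeCheck check = new RecordSizeCheck(1024 * 1024, 32L * 1024 * 1024);
        check.ensureValidRecordSize(512); // small record passes silently
        try {
            check.ensureValidRecordSize(2 * 1024 * 1024);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Doing this check before touching the accumulator means an oversized record fails fast instead of blocking on memory it can never obtain.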

public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
        // intercept the record, which can be potentially modified; this method does not throw exceptions
        ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
        return doSend(interceptedRecord, callback);
    }

private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
        TopicPartition tp = null;
        try {
            // first make sure the metadata for the topic is available
            long waitedOnMetadataMs = waitOnMetadata(record.topic(), this.maxBlockTimeMs);
            long remainingWaitMs = Math.max(0, this.maxBlockTimeMs - waitedOnMetadataMs);
            byte[] serializedKey;
            try {
                serializedKey = keySerializer.serialize(record.topic(), record.key());
            } catch (ClassCastException cce) {
                throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in key.serializer");
            }
            byte[] serializedValue;
            try {
                serializedValue = valueSerializer.serialize(record.topic(), record.value());
            } catch (ClassCastException cce) {
                throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in value.serializer");
            }
            int partition = partition(record, serializedKey, serializedValue, metadata.fetch());
            int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
            ensureValidRecordSize(serializedSize);
            tp = new TopicPartition(record.topic(), partition);
            long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
            log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
            // producer callback will make sure to call both 'callback' and interceptor callback
            Callback interceptCallback = this.interceptors == null ? callback : new InterceptorCallback<>(callback, this.interceptors, tp);
            RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);
            if (result.batchIsFull || result.newBatchCreated) {
                log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
                this.sender.wakeup();
            }
            return result.future;
            // handling exceptions and record the errors;
            // for API exceptions return them in the future,
            // for other exceptions throw directly
        } catch (ApiException e) {
            log.debug("Exception occurred during message send:", e);
            if (callback != null)
                callback.onCompletion(null, e);
            this.errors.record();
            if (this.interceptors != null)
                this.interceptors.onSendError(record, tp, e);
            return new FutureFailure(e);
        } catch (InterruptedException e) {
            this.errors.record();
            if (this.interceptors != null)
                this.interceptors.onSendError(record, tp, e);
            throw new InterruptException(e);
        } catch (BufferExhaustedException e) {
            this.errors.record();
            this.metrics.sensor("buffer-exhausted-records").record();
            if (this.interceptors != null)
                this.interceptors.onSendError(record, tp, e);
            throw e;
        } catch (KafkaException e) {
            this.errors.record();
            if (this.interceptors != null)
                this.interceptors.onSendError(record, tp, e);
            throw e;
        } catch (Exception e) {
            // we notify interceptor about all exceptions, since onSend is called before anything else in this method
            if (this.interceptors != null)
                this.interceptors.onSendError(record, tp, e);
            throw e;
        }
    }

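How to make sure the topic metadata is available

When the client waits for a topic's metadata, the core logic is: first wake up the Sender thread, then enter a while loop that calls wait() (which releases the lock) and blocks for at most max.block.ms, 60 s by default.
The metadata fetch itself is asynchronous, but the caller blocks synchronously on its result. Waking up the Sender thread is what actually makes it pull the topic's metadata from a broker.
When a fetch succeeds, the version number of the cluster metadata is always incremented. So as long as the version has not advanced, the Sender thread has not yet fetched the metadata, and the main thread keeps waiting, for up to 60 s in total.
Two outcomes are possible:
(1) The Sender thread loads the topic metadata within 60 s, caches it in Metadata, and bumps the version number. It then wakes up the main thread blocked in wait(), which returns the time it spent waiting.
(2) wait(60s) times out because the Sender thread never managed to load the metadata. The main thread wakes up on its own after 60 s and throws a timeout exception.
Note that a version bump does not necessarily mean the requested topic was fetched; another thread may have triggered the update. After seeing a new version, the client still has to check whether metadata for the required topic is present. If it is, the wait succeeds; if not, it triggers another update.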
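
The wait-on-version logic described above can be sketched with plain Java monitors. This is a simplified illustration under assumptions, not the client's Metadata class: VersionedMetadata, waitOnMetadata, update and demo are names made up for the sketch, the step that wakes the Sender thread is only a comment, and IllegalStateException stands in for the client's timeout exception.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: block until a topic shows up, using a version counter plus wait/notifyAll.
final class VersionedMetadata {
    private final Set<String> topics = new HashSet<>();
    private int version = 0;

    // Called by the background (sender-like) thread after a successful fetch.
    synchronized void update(String topic) {
        topics.add(topic);
        version++;      // every successful update bumps the version
        notifyAll();    // wake up any thread blocked in waitOnMetadata
    }

    // Blocks until the topic is known or maxWaitMs elapses; returns the time waited in ms.
    synchronized long waitOnMetadata(String topic, long maxWaitMs) throws InterruptedException {
        long begin = System.nanoTime();
        while (!topics.contains(topic)) {
            int lastVersion = version;
            // The real client would request an update and wake the Sender thread here.
            while (version <= lastVersion) {
                long remainingMs = maxWaitMs - elapsedMs(begin);
                if (remainingMs <= 0)
                    throw new IllegalStateException("Failed to update metadata after " + maxWaitMs + " ms.");
                wait(remainingMs); // releases the monitor so update() can run
            }
            // The version advanced, but maybe for another topic: the outer loop re-checks.
        }
        return elapsedMs(begin);
    }

    private static long elapsedMs(long beginNanos) {
        return (System.nanoTime() - beginNanos) / 1_000_000L;
    }

    // Small driver: a background thread publishes two topics while the caller waits for one.
    static boolean demo(String wanted, long maxWaitMs) {
        VersionedMetadata md = new VersionedMetadata();
        Thread sender = new Thread(() -> {
            md.update("other-topic"); // an unrelated update bumps the version first
            md.update("orders");
        });
        sender.start();
        try {
            md.waitOnMetadata(wanted, maxWaitMs);
            return true;
        } catch (IllegalStateException timeout) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("orders available: " + demo("orders", 2000));
        System.out.println("missing available: " + demo("missing", 100));
    }
}
```

The outer loop is the part that matters for the last note above: a newer version only proves that some update happened, so the topic is checked again before returning.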

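How KafkaProducer writes into the buffer

KafkaProducer is designed to be thread-safe: multiple threads can call the same KafkaProducer concurrently without corrupting data.

It first looks up the Deque for the target partition in the memory buffer. This Deque is a queue holding that partition's batches. The map from partition to Deque is a CopyOnWriteMap, which suits read-heavy, write-light workloads: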
this.batches = new CopyOnWriteMap<>();

private final ConcurrentMap<TopicPartition, Deque<RecordBatch>> batches;

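Most accesses are reads: looking up a partition's Deque in the map happens constantly, while a new entry is added only when a partition is seen for the first time. What gets updated frequently under high concurrency is the partition's Deque itself, not the map. Reads can work from a snapshot, so this access pattern is a good fit for copy-on-write data structures.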
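
To make the copy-on-write idea concrete, here is a minimal sketch. It is not Kafka's CopyOnWriteMap: the class CopyOnWriteBatches is hypothetical, partitions are plain strings, and records are strings instead of batches.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a copy-on-write map from partition to its queue.
final class CopyOnWriteBatches {
    // Readers always see an immutable snapshot; volatile publishes each replacement.
    private volatile Map<String, Deque<String>> map = Collections.emptyMap();

    // Lock-free read against the current snapshot.
    Deque<String> get(String partition) {
        return map.get(partition);
    }

    // Rare write path: copy the whole map, add the entry, publish the new snapshot.
    private synchronized Deque<String> putIfAbsent(String partition, Deque<String> dq) {
        Deque<String> existing = map.get(partition);
        if (existing != null)
            return existing; // another thread created it first
        Map<String, Deque<String>> copy = new HashMap<>(map);
        copy.put(partition, dq);
        map = Collections.unmodifiableMap(copy);
        return dq;
    }

    Deque<String> getOrCreateDeque(String partition) {
        Deque<String> dq = map.get(partition);
        return dq != null ? dq : putIfAbsent(partition, new ArrayDeque<>());
    }

    // The frequent updates hit the per-partition queue, guarded by its own lock.
    void append(String partition, String record) {
        Deque<String> dq = getOrCreateDeque(partition);
        synchronized (dq) {
            dq.addLast(record);
        }
    }

    public static void main(String[] args) {
        CopyOnWriteBatches batches = new CopyOnWriteBatches();
        batches.append("orders-0", "r1");
        batches.append("orders-0", "r2");
        System.out.println(batches.get("orders-0"));
    }
}
```

Copying the map on every insert would be expensive if partitions were added constantly, but a producer sees a new partition rarely and reads the map on every send, which is why the trade-off works here.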

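If no batch has been created for the partition yet, the first attempt to append to the Deque fails.

The accumulator then asks the BufferPool to allocate a block of memory for the batch. It is called a pool because the memory behind a batch is reusable: once a block is no longer needed, it is returned to the pool for the next caller. Reusing memory avoids repeatedly allocating buffers, discarding them, and triggering garbage collection.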
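
The reuse idea behind the pool can be sketched as follows. This is a stripped-down illustration, not Kafka's BufferPool: SimpleBufferPool is a made-up class, it does no memory accounting and no blocking, and it assumes that only buffers of one standard size (the batch size) are worth pooling.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: recycle fixed-size buffers instead of allocating a new one per batch.
final class SimpleBufferPool {
    private final int poolableSize; // assumed to play the role of batch.size
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    SimpleBufferPool(int poolableSize) {
        this.poolableSize = poolableSize;
    }

    synchronized ByteBuffer allocate(int size) {
        // Standard-size requests are served from the free list when possible.
        if (size == poolableSize && !free.isEmpty())
            return free.pollFirst();
        return ByteBuffer.allocate(size);
    }

    synchronized void deallocate(ByteBuffer buffer) {
        if (buffer.capacity() == poolableSize) {
            buffer.clear();        // reset position and limit before reuse
            free.addLast(buffer);
        }
        // Odd-sized buffers are not pooled; they are left to the garbage collector.
    }

    public static void main(String[] args) {
        SimpleBufferPool pool = new SimpleBufferPool(16 * 1024);
        ByteBuffer first = pool.allocate(16 * 1024);
        pool.deallocate(first);
        ByteBuffer second = pool.allocate(16 * 1024);
        System.out.println("buffer reused: " + (first == second));
    }
}
```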

At this point there is a newly allocated batch (backed by a block of memory from the BufferPool), and the message can be written into the Deque.

public RecordAppendResult append(TopicPartition tp,
                                     long timestamp,
                                     byte[] key,
                                     byte[] value,
                                     Callback callback,
                                     long maxTimeToBlock) throws InterruptedException {
        // We keep track of the number of appending thread to make sure we do not miss batches in
        // abortIncompleteBatches().
        appendsInProgress.incrementAndGet();
        try {
            // check if we have an in-progress batch
            Deque<RecordBatch> dq = getOrCreateDeque(tp);
            synchronized (dq) {
                if (closed)
                    throw new IllegalStateException("Cannot send after the producer is closed.");
                RecordAppendResult appendResult = tryAppend(timestamp, key, value, callback, dq);
                if (appendResult != null)
                    return appendResult;
            }

            // we don't have an in-progress record batch try to allocate a new batch
            int size = Math.max(this.batchSize, Records.LOG_OVERHEAD + Record.recordSize(key, value));
            log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());
            ByteBuffer buffer = free.allocate(size, maxTimeToBlock);
            synchronized (dq) {
                // Need to check if producer is closed again after grabbing the dequeue lock.
                if (closed)
                    throw new IllegalStateException("Cannot send after the producer is closed.");

                RecordAppendResult appendResult = tryAppend(timestamp, key, value, callback, dq);
                if (appendResult != null) {
                    // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
                    free.deallocate(buffer);
                    return appendResult;
                }
                MemoryRecords records = MemoryRecords.emptyRecords(buffer, compression, this.batchSize);
                RecordBatch batch = new RecordBatch(tp, records, time.milliseconds());
                FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, callback, time.milliseconds()));

                dq.addLast(batch);
                incomplete.add(batch);
                return new RecordAppendResult(future, dq.size() > 1 || batch.records.isFull(), true);
            }
        } finally {
            appendsInProgress.decrementAndGet();
        }
    }

How a message is written into the batch's ByteBuffer following the binary protocol
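
As a rough illustration of what such a write looks like, the sketch below lays out one uncompressed message in a ByteBuffer. It assumes the message format v1 used by 0.10-era clients (offset, size, CRC, magic, attributes, timestamp, key, value), which matches the Records.LOG_OVERHEAD + Record.recordSize(key, value) computation in the code above. The class MessageV1Layout and its methods are hypothetical and simplified; this is not the client's own writer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of the assumed v1 on-wire layout for a single uncompressed message.
final class MessageV1Layout {
    static final int LOG_OVERHEAD = 8 + 4;          // offset (8 bytes) + message size (4 bytes)
    static final int RECORD_HEADER = 4 + 1 + 1 + 8; // crc + magic + attributes + timestamp

    static int recordSize(byte[] key, byte[] value) {
        return RECORD_HEADER
                + 4 + (key == null ? 0 : key.length)      // key length + key bytes
                + 4 + (value == null ? 0 : value.length); // value length + value bytes
    }

    // Writes one message at the buffer's current position; expects a heap (array-backed) buffer.
    static void write(ByteBuffer buf, long offset, long timestamp, byte[] key, byte[] value) {
        buf.putLong(offset);
        buf.putInt(recordSize(key, value));
        int crcPos = buf.position();
        buf.putInt(0);                 // CRC placeholder, filled in at the end
        int bodyStart = buf.position();
        buf.put((byte) 1);             // magic: format version 1
        buf.put((byte) 0);             // attributes: no compression
        buf.putLong(timestamp);
        putBytes(buf, key);
        putBytes(buf, value);
        // The CRC covers everything after the CRC field itself.
        CRC32 crc = new CRC32();
        crc.update(buf.array(), buf.arrayOffset() + bodyStart, buf.position() - bodyStart);
        buf.putInt(crcPos, (int) crc.getValue());
    }

    // Length-prefixed bytes; a null array is encoded as length -1.
    private static void putBytes(ByteBuffer buf, byte[] bytes) {
        if (bytes == null) {
            buf.putInt(-1);
        } else {
            buf.putInt(bytes.length);
            buf.put(bytes);
        }
    }

    public static void main(String[] args) {
        byte[] key = "k".getBytes(StandardCharsets.UTF_8);
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(LOG_OVERHEAD + recordSize(key, value));
        write(buf, 0L, System.currentTimeMillis(), key, value);
        System.out.println("bytes written: " + buf.position());
    }
}
```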

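What happens when repeated allocations exhaust the memory

The calling thread blocks for a while, at most maxBlockMs minus whatever time was already spent fetching metadata. If memory is still unavailable after that, an exception is thrown. If memory is freed during the wait, the allocation succeeds instead: once a batch has been sent and its response received, the ByteBuffer behind that batch is released and returned to the BufferPool, the blocked thread is woken up, and it can request a new ByteBuffer and build a new batch.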
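
The blocking behaviour can be sketched with a lock and a condition. This is a simplified model under assumptions, not the actual BufferPool: BlockingMemoryBudget and tryAllocate are made-up names, the sketch only tracks a byte count, it wakes all waiters instead of queueing them in order, and IllegalStateException stands in for the client's timeout exception.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: block for memory up to a deadline, wake up when memory is returned.
final class BlockingMemoryBudget {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition memoryFreed = lock.newCondition();
    private long availableBytes;

    BlockingMemoryBudget(long totalBytes) {
        this.availableBytes = totalBytes;
    }

    void allocate(int size, long maxTimeToBlockMs) throws InterruptedException {
        long remainingNs = TimeUnit.MILLISECONDS.toNanos(maxTimeToBlockMs);
        lock.lock();
        try {
            while (availableBytes < size) {
                if (remainingNs <= 0)
                    throw new IllegalStateException(
                            "Failed to allocate memory within " + maxTimeToBlockMs + " ms.");
                // awaitNanos releases the lock while waiting and returns the time left.
                remainingNs = memoryFreed.awaitNanos(remainingNs);
            }
            availableBytes -= size;
        } finally {
            lock.unlock();
        }
    }

    // Called once a batch has been sent and acknowledged.
    void deallocate(int size) {
        lock.lock();
        try {
            availableBytes += size;
            memoryFreed.signalAll(); // let blocked allocators re-check
        } finally {
            lock.unlock();
        }
    }

    // Convenience wrapper that reports a timeout as false.
    static boolean tryAllocate(BlockingMemoryBudget pool, int size, long maxTimeToBlockMs) {
        try {
            pool.allocate(size, maxTimeToBlockMs);
            return true;
        } catch (IllegalStateException timeout) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingMemoryBudget pool = new BlockingMemoryBudget(100);
        System.out.println("first allocation: " + tryAllocate(pool, 100, 10));
        System.out.println("while exhausted: " + tryAllocate(pool, 1, 50));
        new Thread(() -> pool.deallocate(100)).start();
        System.out.println("after release: " + tryAllocate(pool, 100, 2000));
    }
}
```

The time budget passed to allocate corresponds to the remainingWaitMs value in doSend, so metadata waiting and memory waiting share one overall limit.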