3. Producer Design Analysis: RecordAccumulator
The previous article covered most of Kafka's high-level design. From here on we will gradually dig into the Kafka source code and look at the implementation details; if you spot any mistakes in the analysis, feel free to discuss them in the comments.
[Figure: class diagram]
[Figure: Kafka message sending flow]
We saw in the previous article that Kafka sends messages in either a synchronous or an asynchronous style. The key player on the sending path is the Sender thread, together with the shared RecordAccumulator: the Sender thread continuously pulls batches out of the accumulator and sends them to the brokers.
// In KafkaProducer
private final RecordAccumulator accumulator;
private final Sender sender;
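For reference, the producer starts the Sender on a dedicated I/O thread in the KafkaProducer constructor. The snippet below is paraphrased from that constructor (thread naming and field names may differ slightly across versions):

// The Sender is a Runnable; it loops, pulling ready batches out of the shared
// RecordAccumulator and sending them to the brokers over the network client.
String ioThreadName = "kafka-producer-network-thread" + " | " + clientId;
this.ioThread = new KafkaThread(ioThreadName, this.sender, true); // true = daemon thread
this.ioThread.start();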
RecordAccumulator overview:
It works like a buffering queue: messages are grouped by TopicPartition, each TopicPartition maps to its own deque, and a ProducerBatch represents one batch of messages. When the producer appends a message it always goes to the tail of the deque, while the Sender takes batches from the head, as shown in the flow chart above.
// The cache: a ConcurrentMap keyed by TopicPartition
private final ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches;
// Compression type: gzip/snappy/lz4/zstd
private final CompressionType compression;
// Buffer pool built on NIO ByteBuffers
private final BufferPool free;
// Batches that have not yet completed (sent but not acked, plus not yet sent); effectively a Set
private final IncompleteBatches incomplete;
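To see the consuming side of these deques, here is a deliberately simplified sketch (not the actual Kafka code) of what the Sender does: it takes batches from the head of each partition's deque, while append() adds to the tail. The real RecordAccumulator.ready()/drain() pair additionally groups batches by target broker and honors linger.ms and max.request.size.

// Simplified sketch only: drain at most one sendable batch per partition.
List<ProducerBatch> drainOneBatchPerPartition(Map<TopicPartition, Deque<ProducerBatch>> batches) {
    List<ProducerBatch> ready = new ArrayList<>();
    for (Deque<ProducerBatch> deque : batches.values()) {
        synchronized (deque) {                        // same lock used by append()
            ProducerBatch first = deque.peekFirst();  // the Sender reads from the HEAD
            if (first != null && first.isFull())      // (or the batch has lingered long enough)
                ready.add(deque.pollFirst());
        }
    }
    return ready;
}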
Let's start with the append() flow of RecordAccumulator:
public RecordAppendResult append(TopicPartition tp,
long timestamp,
byte[] key,
byte[] value,
Header[] headers,
Callback callback,
long maxTimeToBlock,
boolean abortOnNewBatch,
long nowMs) throws InterruptedException {
    // appendsInProgress tracks how many threads are currently appending, so that when the client calls KafkaProducer.close() and forcibly stops sending, the unfinished append requests can be abandoned and their resources released
appendsInProgress.incrementAndGet();
ByteBuffer buffer = null;
if (headers == null) headers = Record.EMPTY_HEADERS;
try {
        // Check whether a deque for this partition already exists in batches; reuse it if so, otherwise create one.
Deque<ProducerBatch> dq = getOrCreateDeque(tp);
        // Synchronize on the deque so the order in which messages are stored cannot change
synchronized (dq) {
            // Try to append the message: peek the ProducerBatch at the tail of the deque. Return null if there is no batch, or if the batch does not have enough room; otherwise write the message into the batch's buffer as byte[].
RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq, nowMs);
if (appendResult != null)
return appendResult;
}
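        // Estimate an upper bound for this record's size, take the larger of that and batch.size,
        // and allocate a buffer from the BufferPool. allocate() may block for up to maxTimeToBlock
        // (the remaining portion of max.block.ms) when buffer.memory is exhausted.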
byte maxUsableMagic = apiVersions.maxUsableProduceMagic();
int size = Math.max(this.batchSize, AbstractRecords.estimateSizeInBytesUpperBound(maxUsableMagic, compression, key, value, headers));
buffer = free.allocate(size, maxTimeToBlock);
// Update the current time in case the buffer allocation blocked above.
nowMs = time.milliseconds();
synchronized (dq) {
            // Same check as above. Why try a second time?
            // While we were allocating the buffer, another thread producing to the same TopicPartition may have created a new batch, or the Sender may have drained messages and freed up space, so an existing batch may now have room for this message. If the retry succeeds, the buffer we just allocated from free is released in the finally block.
RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq, nowMs);
if (appendResult != null) {
return appendResult;
}
            // Builder pattern: construct a MemoryRecords object. MemoryRecordsBuilder wraps the buffer operations and writes through NIO.
MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, nowMs);
            // At this point tryAppend() on an existing batch has definitely failed, so append the message into the freshly created ProducerBatch
FutureRecordMetadata future = Objects.requireNonNull(batch.tryAppend(timestamp, key, value, headers,
callback, nowMs));
dq.addLast(batch);
            // Add it to the set of incomplete batches so delivery can be tracked until it is acknowledged.
incomplete.add(batch);
            // Why set buffer to null here instead of releasing it uniformly in finally? The buffer now belongs to the new ProducerBatch (via its MemoryRecordsBuilder) and must not be returned to the pool; nulling the local variable makes the finally block skip deallocation.
buffer = null;
return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true, false);
}
} finally {
if (buffer != null)
            // Return the buffer to the pool
free.deallocate(buffer);
        // Decrement the count of threads currently appending
appendsInProgress.decrementAndGet();
}
}
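For completeness, the private tryAppend() helper called twice above looks roughly like the following (paraphrased from the Kafka 2.x source). It peeks at the batch at the tail of the deque and returns null when there is no batch or the batch has no room left, which is what pushes the caller down the buffer-allocation path:

private RecordAppendResult tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers,
                                     Callback callback, Deque<ProducerBatch> deque, long nowMs) {
    ProducerBatch last = deque.peekLast();
    if (last != null) {
        // Delegates to ProducerBatch.tryAppend(), which returns null when the batch has no room.
        FutureRecordMetadata future = last.tryAppend(timestamp, key, value, headers, callback, nowMs);
        if (future == null)
            last.closeForRecordAppends(); // no room: seal the batch so no further appends happen
        else
            return new RecordAppendResult(future, deque.size() > 1 || last.isFull(), false, false);
    }
    return null; // no usable batch at the tail; the caller will allocate a buffer and create one
}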
The walkthrough above shows what RecordAccumulator.append() does: it wraps the message, stores it in the deque for the corresponding TopicPartition, and writes the message bytes into a pooled buffer.
After RecordAccumulator.append() returns, batchIsFull and newBatchCreated decide whether the Sender thread should be woken up to send:
- If batchIsFull is true: a RecordBatch in the deque is full, so the sender thread can be woken up to send it.
- If newBatchCreated is true: the old RecordBatch was full or could not hold the new message and a new one was created, so the sender can also be woken up.
A few fields of RecordAppendResult are worth explaining:
public final static class RecordAppendResult {
    // Future used to support both the synchronous and the asynchronous handling of this message
public final FutureRecordMetadata future;
    // Whether the RecordBatch is full
public final boolean batchIsFull;
    // Whether a new RecordBatch was created for this append
public final boolean newBatchCreated;
    // Whether the append was aborted because a new RecordBatch would be needed (so the caller should retry; see abortOnNewBatch in doSend below)
public final boolean abortForNewBatch;
public RecordAppendResult(FutureRecordMetadata future, boolean batchIsFull, boolean newBatchCreated, boolean abortForNewBatch) {
this.future = future;
this.batchIsFull = batchIsFull;
this.newBatchCreated = newBatchCreated;
this.abortForNewBatch = abortForNewBatch;
}
}
Now let's look at the core of the send path, doSend():
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
TopicPartition tp = null;
try {
throwIfProducerClosed();
// first make sure the metadata for the topic is available
long nowMs = time.milliseconds();
ClusterAndWaitTime clusterAndWaitTime;
try {
            // Wait for the cluster metadata for this topic (and partition) to become available
clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), nowMs, maxBlockTimeMs);
} catch (KafkaException e) {
if (metadata.isClosed())
throw new KafkaException("Producer closed while send in progress", e);
throw e;
}
nowMs += clusterAndWaitTime.waitedOnMetadataMs;
long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
Cluster cluster = clusterAndWaitTime.cluster;
byte[] serializedKey;
try {
            // Serialize the key
serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
} catch (ClassCastException cce) {
            // the SerializationException thrown here is omitted (same pattern as the value-serializer catch below)
}
byte[] serializedValue;
try {
serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
} catch (ClassCastException cce) {
throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
" to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
" specified in value.serializer", cce);
}
        // Determine the target partition
int partition = partition(record, serializedKey, serializedValue, cluster);
tp = new TopicPartition(record.topic(), partition);
setReadOnly(record.headers());
Header[] headers = record.headers().toArray();
int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(apiVersions.maxUsableProduceMagic(),
compressionType, serializedKey, serializedValue, headers);
ensureValidRecordSize(serializedSize);
long timestamp = record.timestamp() == null ? nowMs : record.timestamp();
if (log.isTraceEnabled()) {
log.trace("Attempting to append record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
}
        // Wrap the callback with the interceptors; as covered earlier, interceptors can be configured on the send path
Callback interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);
if (transactionManager != null && transactionManager.isTransactional()) {
transactionManager.failIfNotReadyForSend();
}
        // The core of sending: hand the message over to the record accumulator
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
serializedValue, headers, interceptCallback, remainingWaitMs, true, nowMs);
        // The append was aborted because a new batch would have to be created (abortOnNewBatch was true)
if (result.abortForNewBatch) {
int prevPartition = partition;
            // Notify the partitioner that a new batch is about to be created by calling onNewBatch()
            // With a partitioner backed by StickyPartitionCache (e.g. DefaultPartitioner), this is where nextPartition() is typically invoked to pick a new sticky partition
partitioner.onNewBatch(record.topic(), cluster, prevPartition);
            // Re-compute the partition
partition = partition(record, serializedKey, serializedValue, cluster);
tp = new TopicPartition(record.topic(), partition);
if (log.isTraceEnabled()) {
log.trace("Retrying append due to new batch creation for topic {} partition {}. The old partition was {}", record.topic(), partition, prevPartition);
}
// producer callback will make sure to call both 'callback' and interceptor callback
interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);
            // Retry the append
result = accumulator.append(tp, timestamp, serializedKey,
serializedValue, headers, interceptCallback, remainingWaitMs, false, nowMs);
}
if (transactionManager != null && transactionManager.isTransactional())
transactionManager.maybeAddPartitionToTransaction(tp);
        // After appending a record to the accumulator, if a batch is now full or a new batch was created, wake up the Sender to send
if (result.batchIsFull || result.newBatchCreated) {
log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
this.sender.wakeup();
}
        // Return the future; the caller decides whether to block on it (synchronous) or not (asynchronous)
return result.future;
// handling exceptions and record the errors;
// for API exceptions return them in the future,
// for other exceptions throw directly
    } catch (ApiException e) {
        // exception handling omitted
    } catch (InterruptedException e) {
        // exception handling omitted
    } catch (KafkaException e) {
        // exception handling omitted
    } catch (Exception e) {
        // exception handling omitted
    }
}
doSend() is fairly long, but the logic is straightforward:
- Make sure the producer is still running:
throwIfProducerClosed();
- Make sure metadata for the topic is available (fetching/refreshing it if necessary);
- Serialize the key and the value;
- Compute the target topic partition;
- Wrap the callback (with the interceptors);
- Hand the message over to the accumulator;
- Handle exceptions and interruption.
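The future returned at the end of doSend() is what gives callers the choice between the synchronous and the asynchronous style. A minimal usage sketch (imports omitted; the broker address and topic name are placeholders):

public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
        ProducerRecord<String, String> record = new ProducerRecord<>("demo-topic", "key", "value");

        // Asynchronous: send() returns as soon as the record is handed to the accumulator;
        // the callback fires once the broker responds.
        producer.send(record, (metadata, exception) -> {
            if (exception != null)
                exception.printStackTrace();
            else
                System.out.printf("sent to %s-%d@%d%n", metadata.topic(), metadata.partition(), metadata.offset());
        });

        // Synchronous: blocking on the returned future waits for the broker's acknowledgement.
        RecordMetadata metadata = producer.send(record).get();
        System.out.println("sync send acked at offset " + metadata.offset());
    }
}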
A few details are worth a closer look.
Kafka's built-in serializers:

Serializer | Description
---|---
ByteArraySerializer | byte[] serialization (passed through as-is)
ByteBufferSerializer | ByteBuffer (NIO) serialization
BytesSerializer | Bytes serialization
DoubleSerializer | Double serialization
ExtendedSerializer | removed; the old extended serializer interface, now superseded by Serializer
FloatSerializer | Float serialization
IntegerSerializer | Integer serialization
LongSerializer | Long serialization
ShortSerializer | Short serialization
StringSerializer | String serialization
UUIDSerializer | UUID serialization; same approach as StringSerializer (serializes the string form)
VoidSerializer | always returns null
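Besides the built-in ones, a custom serializer only needs to implement org.apache.kafka.common.serialization.Serializer (configure() and close() have default implementations in recent client versions). A small sketch; the User class here is made up for illustration:

// Hypothetical example: serialize a simple User object as UTF-8 "id:name".
public class UserSerializer implements Serializer<User> {
    @Override
    public byte[] serialize(String topic, User user) {
        if (user == null)
            return null;
        return (user.getId() + ":" + user.getName()).getBytes(StandardCharsets.UTF_8);
    }
}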
Computing the partition value
Default implementation: DefaultPartitioner
private int partition(ProducerRecord<K, V> record, byte[] serializedKey, byte[] serializedValue, Cluster cluster) {
Integer partition = record.partition();
return partition != null ?
partition :
partitioner.partition(
record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
}
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster,
int numPartitions) {
if (keyBytes == null) {
return stickyPartitionCache.partition(topic, cluster);
}
// hash the keyBytes to choose a partition
return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}
The algorithm is:
- If the record explicitly specifies a partition, that partition is used.
- If no partition is specified:
  - but a key is present, the partition is chosen by hashing the key: murmur2(keyBytes) % numPartitions (see the sketch below);
  - and there is no key either, StickyPartitionCache.nextPartition() picks a partition (initially at random, then "sticking" to it until a new batch is created).
- These rules are ordered by priority: if both partition and key are specified, the explicit partition wins.
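To make the key-hashing branch concrete, the same computation can be reproduced with Kafka's Utils class; the key and the partition count below are made up:

byte[] keyBytes = "user-42".getBytes(StandardCharsets.UTF_8); // hypothetical key
int numPartitions = 6;                                        // hypothetical partition count of the topic
// Same formula as DefaultPartitioner: murmur2 hash, forced positive, modulo partition count.
int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
System.out.println("key 'user-42' -> partition " + partition);

The useful property here is that the same key always lands on the same partition, which preserves per-key ordering.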
Design for high concurrency
First look at how the batches map is created: this.batches = new CopyOnWriteMap<>();. CopyOnWriteMap is not a class from java.util.concurrent but a utility class that ships with Kafka (org.apache.kafka.common.utils.CopyOnWriteMap), built for read-heavy, write-light workloads. Why? Look at its put() and get() methods:
private volatile Map<K, V> map;
public synchronized V put(K k, V v) {
Map<K, V> copy = new HashMap<K, V>(this.map);
V prev = copy.put(k, v);
this.map = Collections.unmodifiableMap(copy);
return prev;
}
public V get(Object k) {
return map.get(k);
}
volatile should be familiar: it guarantees visibility of the map reference across threads. On put(), the current map is first copied, the new entry is put into the copy, and the copy then replaces the old map by swapping the reference; get() simply reads the current map without any locking. It is a map that purely trades space for time, which suits the accumulator well: the batches map is read on every single send, but a new entry is written only the first time a partition is seen.
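That read-mostly access pattern shows up clearly in getOrCreateDeque(), which append() calls on every record: it only falls into the synchronized, copying write path the first time a partition is seen (paraphrased from the Kafka source):

private Deque<ProducerBatch> getOrCreateDeque(TopicPartition tp) {
    Deque<ProducerBatch> d = this.batches.get(tp); // lock-free read; this path is hit almost every time
    if (d != null)
        return d;
    d = new ArrayDeque<>();
    // putIfAbsent is synchronized and copies the underlying map,
    // but it runs only once per new TopicPartition.
    Deque<ProducerBatch> previous = this.batches.putIfAbsent(tp, d);
    if (previous == null)
        return d;
    else
        return previous;
}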