A Deep Dive into Kafka Source Code 04: Producer Serialization, Partitions, RecordAccumulator, Map Structures under High Concurrency, and Memory Pool Allocation and Reclamation

pub.ryan

Category: Kafka and its source code

Article link: https://blog.csdn.net/qq_36269641/article/details/109965687

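1. send: serialization and deserialization

The key and value of every message the client sends end up as byte arrays. Kafka provides a Serializer interface for this, along with ready-made implementations for several common types.

To write a custom serializer, implement the Serializer interface: the configure method can read settings such as the encoding to use, and the serialize method converts the object into bytes with that encoding.

Details omitted here...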
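As a rough illustration of the idea above, a custom serializer might look like the following sketch. To stay self-contained it declares a local `SimpleSerializer` interface that only mirrors the shape of Kafka's `Serializer` (`configure`, `serialize`, `close`). The interface name, the `User` class, and the `user.serializer.encoding` config key are assumptions made up for this example, not part of Kafka.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in that mirrors the shape of Kafka's Serializer<T> interface.
interface SimpleSerializer<T> {
    void configure(Map<String, ?> configs, boolean isKey);
    byte[] serialize(String topic, T data);
    void close();
}

// Hypothetical value type used only for this example.
class User {
    final String name;
    User(String name) { this.name = name; }
}

class UserSerializer implements SimpleSerializer<User> {
    // Default encoding; configure() may override it.
    private Charset charset = StandardCharsets.UTF_8;

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // "user.serializer.encoding" is an assumed config key for this sketch.
        Object enc = configs.get("user.serializer.encoding");
        if (enc instanceof String) {
            charset = Charset.forName((String) enc);
        }
    }

    @Override
    public byte[] serialize(String topic, User data) {
        // A null record value stays null; otherwise encode the name as bytes.
        return data == null ? null : data.name.getBytes(charset);
    }

    @Override
    public void close() {
        // Nothing to release in this sketch.
    }
}
```

With the real library, a class like this would implement `org.apache.kafka.common.serialization.Serializer<User>` and be registered through the producer's `key.serializer` or `value.serializer` setting.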

2. send: Partitioner

Step into the partitioner:

    /**
     * computes partition for given record.
     * if the record has partition returns the value otherwise
     * calls configured partitioner class to compute the partition.
     */
    private int partition(ProducerRecord<K, V> record, byte[] serializedKey, byte[] serializedValue, Cluster cluster) {
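        // If the record already carries a partition number (set explicitly by the caller when building the ProducerRecord), use it as-is; most records do not specify one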
        Integer partition = record.partition();
        return partition != null ?
                partition :
                // Otherwise let the configured partitioner choose a suitable partition
                partitioner.partition(
                        record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
    }

The concrete partitioning strategy:

    /**
     * Compute the partition for the given record.
     *
     * @param topic The topic name
     * @param key The key to partition on (or null if no key)
     * @param keyBytes serialized key to partition on (or null if no key)
     * @param value The value to partition on or null
     * @param valueBytes serialized value to partition on or null
     * @param cluster The current cluster metadata
     */
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
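        // Get the partition info for the topic we are sending to
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        // Total number of partitions
        int numPartitions = partitions.size();
        // Strategy 1: the record was sent without a key
        if (keyBytes == null) {
            // Per-topic counter that is incremented on every call
            int nextValue = nextValue(topic);
            // Get the partitions that are currently available
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            // If at least one partition is available
            if (availablePartitions.size() > 0) {
                // Take the ever-changing counter modulo the number of available partitions: a round-robin that balances load
                int part = Utils.toPositive(nextValue) % availablePartitions.size();
                // Return the id of the chosen available partition
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return Utils.toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            // Strategy 2: the record has a key
            // Hash the key and take it modulo the total partition count, so the same key always maps to the same partition (as long as the partition count does not change)
            // To keep related messages in one partition, give them the same key (or set the partition explicitly on the record)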
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }
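The two strategies can be tried out in isolation with a small sketch. It is a simplification under stated assumptions: `PartitionSketch` and `choose` are hypothetical names, the counter starts at 0 and is shared across topics, availability of partitions is ignored, and `Arrays.hashCode` stands in for Kafka's `Utils.murmur2`. Only `toPositive` follows the same bit-masking idea as the original `Utils.toPositive`.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class PartitionSketch {
    // Ever-increasing counter that drives the round-robin for records without a key.
    private final AtomicInteger counter = new AtomicInteger(0);

    // Clear the sign bit so the result is never negative.
    static int toPositive(int n) {
        return n & 0x7fffffff;
    }

    int choose(byte[] keyBytes, int numPartitions) {
        if (keyBytes == null) {
            // Strategy 1: no key, rotate through the partitions.
            return toPositive(counter.getAndIncrement()) % numPartitions;
        }
        // Strategy 2: key present, hash it.
        // Assumption: Arrays.hashCode is only a stand-in for Kafka's murmur2 hash.
        return toPositive(Arrays.hashCode(keyBytes)) % numPartitions;
    }
}
```

Masking the sign bit rather than calling `Math.abs` matters because `Math.abs(Integer.MIN_VALUE)` is still negative, which would produce a negative partition number after the modulo.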

Steps four, five, and six are fairly simple and are omitted here...

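3. The message accumulator cache: RecordAccumulator

3.1 How the accumulator works

1) Asynchronous sending: messages are cached in the accumulator, and the sender is woken up to take them out in batches once a threshold is reached

Under the hood the Kafka Producer sends messages asynchronously. When the main thread calls KafkaProducer.send(), the message is first placed in the RecordAccumulator and the call returns; at that point the message has not actually been sent to Kafka, it is only cached in the RecordAccumulator. Business threads keep appending messages to the RecordAccumulator through KafkaProducer.send(), and the return value of the append tells the caller whether the RecordBatch is full. If it is, the sender thread is woken up to take that batch out of the RecordAccumulator and send it to the node that hosts the partition. If the batch is not full, it keeps waiting until enough messages have been collected or the linger.ms time expires.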

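2) Where messages live inside a RecordBatch: MemoryRecords

RecordAccumulator holds a ConcurrentMap keyed by TopicPartition, and each value is an ArrayDeque<RecordBatch>. The Deque caches the messages to be sent to that TopicPartition, and every RecordBatch holds a reference to a MemoryRecords. MemoryRecords is where the messages are ultimately stored.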
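The structure described above can be sketched in a few lines. This is a simplified model, not Kafka's implementation: `AccumulatorSketch` and `Batch` are hypothetical names, the key is a plain `String` instead of `TopicPartition`, a batch is limited by record count rather than by bytes, and a `List<String>` stands in for `MemoryRecords`. The return value of `append` follows the source's description: it signals that the sender should be woken up because the batch is full or a new batch was created.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AccumulatorSketch {
    // Hypothetical stand-in for RecordBatch: holds at most `capacity` records.
    static class Batch {
        final List<String> records = new ArrayList<>(); // stands in for MemoryRecords
        final int capacity;
        Batch(int capacity) { this.capacity = capacity; }

        boolean tryAppend(String record) {
            if (records.size() >= capacity) return false;
            records.add(record);
            return true;
        }

        boolean isFull() { return records.size() >= capacity; }
    }

    // One deque of batches per topic-partition (the key is simplified to a String).
    private final ConcurrentMap<String, Deque<Batch>> batches = new ConcurrentHashMap<>();
    private final int batchCapacity;

    AccumulatorSketch(int batchCapacity) { this.batchCapacity = batchCapacity; }

    // Returns true when the sender should be woken up:
    // the last batch became full, or a new batch had to be created.
    boolean append(String topicPartition, String record) {
        Deque<Batch> dq = batches.computeIfAbsent(topicPartition, k -> new ArrayDeque<>());
        synchronized (dq) { // lock only this partition's deque, not the whole map
            Batch last = dq.peekLast();
            if (last != null && last.tryAppend(record)) {
                return last.isFull();
            }
            Batch fresh = new Batch(batchCapacity);
            fresh.tryAppend(record);
            dq.addLast(fresh);
            return true;
        }
    }

    int batchCount(String topicPartition) {
        Deque<Batch> dq = batches.get(topicPartition);
        if (dq == null) return 0;
        synchronized (dq) {
            return dq.size();
        }
    }
}
```

Because the lock is taken on the individual deque, threads writing to different partitions do not block each other.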

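3.2 RecordAccumulator: the Deque is locked on every append to guarantee thread safety

A strength of this code: while collecting messages it locks in small segments rather than around the whole operation, which keeps throughput as high as possible while still guaranteeing thread safety.

batches: the mapping from a TopicPartition to its collection of RecordBatch objects. The field is declared as a ConcurrentMap and implemented by CopyOnWriteMap, a thread-safe map.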
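The copy-on-write idea behind such a map can be shown with a minimal sketch. `CopyOnWriteMapSketch` is a hypothetical class written for illustration, not Kafka's `CopyOnWriteMap` source: reads go to a volatile reference without any lock, while each write copies the current map, modifies the copy, and then swaps the reference.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class CopyOnWriteMapSketch<K, V> {
    // Readers always see a complete, immutable snapshot through this volatile reference.
    private volatile Map<K, V> map = Collections.emptyMap();

    // Lock-free read.
    V get(K key) {
        return map.get(key);
    }

    // Writers are serialized; each write publishes a fresh copy.
    synchronized V put(K key, V value) {
        Map<K, V> copy = new HashMap<>(map);
        V previous = copy.put(key, value);
        map = Collections.unmodifiableMap(copy);
        return previous;
    }

    // Returns the existing value if present, otherwise stores the new one and returns null.
    synchronized V putIfAbsent(K key, V value) {
        V existing = map.get(key);
        if (existing != null) {
            return existing;
        }
        put(key, value);
        return null;
    }

    int size() {
        return map.size();
    }
}
```

This trade-off suits a read-heavy, write-rare workload: in the accumulator a new entry is needed only the first time a partition is seen, whereas every send looks up the deque for its partition.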

    /**
     * Add a record to the accumulator, return the append result
     * <p>
     * The append result will contain the future metadata, and flag for whether the appended batch is full or a new batch is created
     * <p>
     *
     * @param tp The topic/partition to which this record is being sent
     * @param timestamp The timestamp of the record
     * @param key The key for the record
     * @param value The value for the record
     * @param headers the Headers for the record
     * @p
