Version note
This post summarizes the general flow of sending a message, based on the Kafka 0.10 producer source code.
Source code analysis
@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record) {
    // Delegate to the overloaded send method
    return send(record, null);
}

@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    // intercept the record, which can be potentially modified; this method does not throw exceptions
    // 1. Pass the record through any user-defined interceptors, which hook in before the send and after the acknowledgement
    ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
    // Delegate to doSend, which holds all the core logic
    return doSend(interceptedRecord, callback);
}
/**
* Implementation of asynchronously send a record to a topic. Equivalent to <code>send(record, null)</code>.
* See {@link #send(ProducerRecord, Callback)} for details.
*/
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    TopicPartition tp = null;
    try {
        // first make sure the metadata for the topic is available
        // 2. Block until the metadata for this topic is fetched and cached; this retrieves the topic's
        //    partitions along with each partition's leader, ISR, and other metadata
        long waitedOnMetadataMs = waitOnMetadata(record.topic(), this.maxBlockTimeMs);
        long remainingWaitMs = Math.max(0, this.maxBlockTimeMs - waitedOnMetadataMs);
        byte[] serializedKey;
        try {
            // 3. Serialize the key and value into byte arrays so they can be sent to the broker
            serializedKey = keySerializer.serialize(record.topic(), record.key());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in key.serializer");
        }
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.value());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in value.serializer");
        }
        // 4. Determine the record's partition via the partitioner. The rules are:
        //    - if the user specified a partition, use that partition directly;
        //    - otherwise, if a key is present, hash the key and take it modulo the partition count;
        //    - if there is no key, pick a partition round-robin to balance the load
        int partition = partition(record, serializedKey, serializedValue, metadata.fetch());
        int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
        // 5. Check that the serialized record does not exceed the single-request and buffer limits
        ensureValidRecordSize(serializedSize);
        tp = new TopicPartition(record.topic(), partition);
        long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
        log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
        // producer callback will make sure to call both 'callback' and interceptor callback
        Callback interceptCallback = this.interceptors == null ? callback : new InterceptorCallback<>(callback, this.interceptors, tp);
        // 6. Append the serialized record to the accumulator (the send buffer)
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);
        // 7. Wake up the sender thread once the send conditions are met
        if (result.batchIsFull || result.newBatchCreated) {
            log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
            this.sender.wakeup();
        }
        // 8. Return the future of the asynchronous send
        return result.future;
        // handling exceptions and record the errors;
        // for API exceptions return them in the future,
        // for other exceptions throw directly
    } catch (ApiException e) {
        log.debug("Exception occurred during message send:", e);
        if (callback != null)
            callback.onCompletion(null, e);
        this.errors.record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        return new FutureFailure(e);
    } catch (InterruptedException e) {
        this.errors.record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw new InterruptException(e);
    } catch (BufferExhaustedException e) {
        this.errors.record();
        this.metrics.sensor("buffer-exhausted-records").record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw e;
    } catch (KafkaException e) {
        this.errors.record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw e;
    } catch (Exception e) {
        // we notify interceptor about all exceptions, since onSend is called before anything else in this method
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw e;
    }
}
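To tie the eight steps together, here is a minimal usage sketch against the 0.10 producer API. The bootstrap address, topic name, key, and value are placeholders; the point is that send() returns a Future as soon as the record is appended to the accumulator, while the callback fires only after the broker responds.

import java.util.Properties;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);

        // send() only appends the record to the accumulator (steps 1-8 above) and returns at once
        Future<RecordMetadata> future = producer.send(
                new ProducerRecord<>("demo-topic", "key-1", "hello kafka"),
                new Callback() {
                    @Override
                    public void onCompletion(RecordMetadata metadata, Exception exception) {
                        // invoked after the broker responds (or the send ultimately fails)
                        if (exception != null)
                            exception.printStackTrace();
                        else
                            System.out.printf("sent to %s-%d at offset %d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                    }
                });

        // blocking on the future turns the asynchronous send into a synchronous one
        future.get();

        producer.close();
    }
}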
Takeaways
From this walk through the producer source, the core design points of the send-side client of a high-throughput, highly concurrent messaging system are the following:
- On-demand / lazy metadata loading. Metadata is fetched only for the topic being sent to, not for every topic in the cluster, and it is not fetched when the KafkaProducer is constructed but at send time, when it is retrieved and refreshed as needed.
- Extensibility. Interceptors and callbacks leave hook points for users to extend the client (a minimal interceptor sketch follows this list).
- Serializers. Keys and values can be of arbitrary types, so they are serialized into plain byte arrays before going over the network. This is standard practice in distributed systems; without it the system would be bogged down handling every possible data type (a custom serializer sketch is given below).
- Partitioner. A standalone partitioner selects and routes each record to a partition, and leaves room for user extension (see the partitioner sketch below).
- Buffering. Outgoing records are cached in a buffer and sent in batches once the send conditions are met. This reduces network I/O and devotes most of the bandwidth to actual message data; many small requests would instead waste it on headers, checksums, and other overhead (the relevant settings appear in the config sketch after this list).
- Size checks on core data, to avoid exhausting memory.
- Asynchronous sends. A record first goes into the buffer, and the sender thread is woken up only when the send condition is triggered (otherwise it stays parked and uses no CPU), which improves CPU utilization and overall performance.
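To illustrate the interceptor extension point, below is a minimal ProducerInterceptor sketch. The class name, package, and the counting behaviour are purely illustrative; onSend runs before serialization and partitioning, onAcknowledgement runs when the broker acks or the send fails.

import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Counts successful and failed sends; close() prints the totals when the producer shuts down.
public class CountingInterceptor implements ProducerInterceptor<String, String> {

    private final AtomicLong sent = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        sent.incrementAndGet();
        // the record may also be replaced or modified here, e.g. to prepend a tag to the value
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        if (exception != null)
            failed.incrementAndGet();
    }

    @Override
    public void close() {
        System.out.printf("sent=%d failed=%d%n", sent.get(), failed.get());
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed
    }
}

It would be enabled with props.put("interceptor.classes", "com.example.CountingInterceptor"), where the package name is assumed.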
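The serializer point can be made concrete with a sketch of a custom Serializer for a hypothetical Order type. The type, its fields, and the simple text encoding are illustrative only; it would be plugged in through the value.serializer config.

import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.kafka.common.serialization.Serializer;

// Hypothetical payload type, used only for illustration.
class Order {
    final String id;
    final long amountCents;
    Order(String id, long amountCents) { this.id = id; this.amountCents = amountCents; }
}

// Turns an Order into a plain byte[] (here a simple "id:amount" string) before it goes over the wire.
public class OrderSerializer implements Serializer<Order> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // no configuration needed
    }

    @Override
    public byte[] serialize(String topic, Order data) {
        if (data == null)
            return null;
        return (data.id + ":" + data.amountCents).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}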
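Similarly, here is a custom Partitioner sketch following the rules described in step 4. An explicitly specified partition never reaches the partitioner (KafkaProducer uses it directly); keyed records are hashed modulo the partition count, and keyless records are spread round-robin. The hash below is a simplified stand-in for the murmur2 hash used by the built-in DefaultPartitioner, and the class name is illustrative.

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

public class SimplePartitioner implements Partitioner {

    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null)
            // keyless record: round-robin across all partitions
            return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
        // keyed record: hash of the key bytes modulo the partition count
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    @Override
    public void close() {
        // nothing to clean up
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed
    }
}

It would be registered with props.put("partitioner.class", "com.example.SimplePartitioner"), again with an assumed package name.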
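Finally, the lazy metadata fetch, the accumulator, and the size checks are all governed by a handful of producer settings. The values below are close to the 0.10 defaults and are only meant to show which knob maps to which step of doSend.

Properties props = new Properties();
// upper bound on how long send() may block waiting for metadata (step 2) or for buffer space
props.put("max.block.ms", "60000");
// total memory available to the accumulator that buffers unsent records (step 6)
props.put("buffer.memory", "33554432");
// a batch is sent once it reaches batch.size bytes or has waited linger.ms milliseconds (step 7)
props.put("batch.size", "16384");
props.put("linger.ms", "5");
// maximum size of a single request; ensureValidRecordSize() enforces this limit (step 5)
props.put("max.request.size", "1048576");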