Kafka Producer Configuration

First, let's look at how the producer is configured and used. To start, you set the parameters the producer needs, such as the broker addresses and the serializers, plus options like a custom partitioning rule, the retry mechanism, and the retry backoff time. Here is the simplest client example:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

Properties props = new Properties();
// Bootstrap servers are used to fetch the Kafka cluster metadata
props.put("bootstrap.servers", "hadoop1:9092");
props.put("client.id", "DemoProducer");
// Serializer classes: keys and values are sent over the wire as bytes
props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// The consumer must use the matching deserializers when it reads the data
// Initialize the KafkaProducer
Producer<Integer, String> producer = new KafkaProducer<>(props);
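
Rounding this out, a minimal send-and-close sketch using the producer built above; the topic name "demo-topic" and the record contents are made up for illustration:

import org.apache.kafka.clients.producer.ProducerRecord;

// Asynchronously send one record and report the result in the callback.
producer.send(new ProducerRecord<>("demo-topic", 1, "hello kafka"), (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace();   // the send ultimately failed (after any retries)
    } else {
        System.out.println("sent to partition " + metadata.partition()
                + " at offset " + metadata.offset());
    }
});
producer.flush();   // push out anything still sitting in the buffer
producer.close();   // release sockets and threads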

Looking inside the KafkaProducer constructor, the Properties passed in are wrapped into a ProducerConfig:


ProducerConfig(Map<?, ?> props) {
    super(CONFIG, props);
}

CONFIG is a ConfigDef object:

CONFIG = new ConfigDef().define(BOOTSTRAP_SERVERS_CONFIG, Type.LIST, Importance.HIGH, CommonClientConfigs.BOOTSTRAP_SERVERS_DOC)
                                .define(BUFFER_MEMORY_CONFIG, Type.LONG, 32 * 1024 * 1024L, atLeast(0L), Importance.HIGH, BUFFER_MEMORY_DOC)
                                .define(RETRIES_CONFIG, Type.INT, 0, between(0, Integer.MAX_VALUE), Importance.HIGH, RETRIES_DOC)

Each define call records a config's name, the data type it expects, its default value, a validator, its importance level, and a doc string explaining it. For example, the partitioner:
.define(PARTITIONER_CLASS_CONFIG,
        Type.CLASS,
        DefaultPartitioner.class,
        Importance.MEDIUM,
        PARTITIONER_CLASS_DOC)
PARTITIONER_CLASS_CONFIG is "partitioner.class", which means the user may supply a custom value for this setting. Its type is Class, and if the user does not specify anything, the default DefaultPartitioner.class is used; in other words, DefaultPartitioner's logic decides which partition each key/value record goes to. PARTITIONER_CLASS_DOC is "Partitioner class that implements the Partitioner interface.", i.e. the value must be a class implementing the Partitioner interface; a sketch of such a class follows below. All other configs are described through define in the same way as the partitioner.
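
As a sketch only (the class name com.example.MyPartitioner and its null-key behavior are made up here, not taken from the Kafka source), a custom partitioner implementing that interface could look like this:

import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical partitioner: null keys go to partition 0, other keys are
// spread across the topic's partitions by their hashCode.
public class MyPartitioner implements Partitioner {

    @Override
    public void configure(Map<String, ?> configs) {
        // no extra settings needed for this sketch
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (key == null) {
            return 0;
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}

It would then be wired in with props.put("partitioner.class", "com.example.MyPartitioner");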

What the key basic configurations mean

  • ProducerConfig.RETRY_BACKOFF_MS_CONFIG
public static final String RETRY_BACKOFF_MS_CONFIG = "retry.backoff.ms";
    public static final String RETRY_BACKOFF_MS_DOC = "The amount of time to wait before attempting to retry a failed request to a given topic partition. This avoids repeatedly sending requests in a tight loop under some failure scenarios.";
     .define(RETRY_BACKOFF_MS_CONFIG, Type.LONG, 100L, atLeast(0L), Importance.LOW, CommonClientConfigs.RETRY_BACKOFF_MS_DOC)

Sending a message can fail or become unstable because of network issues and the like, so the producer retries; this value is how long it waits before retrying. Its purpose is to avoid hammering the broker with retries in a tight loop when things are already unstable. The default is 100 ms; a tuning sketch follows below.
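
A small sketch of tuning this together with the retry count (the concrete numbers are illustrative only, not recommendations from the source):

// Retry a failed send up to 3 times, waiting 300 ms between attempts.
props.put(ProducerConfig.RETRIES_CONFIG, 3);
props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 300);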

  • ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG
    The serializer for the key of each Kafka key/value record; the user must supply this.

  • ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG
    The serializer for the value of each Kafka key/value record; the user must supply this.

  • ProducerConfig.METADATA_MAX_AGE_CONFIG

public static final String METADATA_MAX_AGE_CONFIG = "metadata.max.age.ms";
    public static final String METADATA_MAX_AGE_DOC = "The period of time in milliseconds after which we force a refresh of metadata even if we haven't seen any partition leadership changes to proactively discover any new brokers or partitions.";


 .define(METADATA_MAX_AGE_CONFIG, Type.LONG, 5 * 60 * 1000, atLeast(0), Importance.LOW, METADATA_MAX_AGE_DOC)

This governs the metadata the producer pulls from the brokers: the default is 5 minutes, meaning the producer refreshes the cluster metadata at least that often, even if nothing appears to have changed.

  • ProducerConfig.MAX_REQUEST_SIZE_CONFIG
public static final String MAX_REQUEST_SIZE_CONFIG = "max.request.size";
    private static final String MAX_REQUEST_SIZE_DOC = "The maximum size of a request in bytes. This is also effectively a cap on the maximum record size. Note that the server "
                                                       + "has its own cap on record size which may be different from this. This setting will limit the number of record "
                                                       + "batches the producer will send in a single request to avoid sending huge requests.";

define(MAX_REQUEST_SIZE_CONFIG,
                                        Type.INT,
                                        1 * 1024 * 1024,
                                        atLeast(0),
                                        Importance.MEDIUM,
                                        MAX_REQUEST_SIZE_DOC)

This is the maximum size of a single request, which directly limits the batches the producer can send. The default is 1 MB, which is often too small for production, so tune it for your workload; from experience, 10 MB, 32 MB, or 64 MB are common choices. See the sketch below.
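
For example, raising the cap to 10 MB (an illustrative value; as the doc above notes, the broker has its own record-size limit, message.max.bytes, which may also need adjusting):

// Allow a single request of up to 10 MB (illustrative value).
props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 10 * 1024 * 1024);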

  • ProducerConfig.BUFFER_MEMORY_CONFIG
public static final String BUFFER_MEMORY_CONFIG = "buffer.memory";
    private static final String BUFFER_MEMORY_DOC = "The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are "
                                                    + "sent faster than they can be delivered to the server the producer will block for <code>" + MAX_BLOCK_MS_CONFIG + "</code> after which it will throw an exception."
                                                    + "<p>"
                                                    + "This setting should correspond roughly to the total memory the producer will use, but is not a hard bound since "
                                                    + "not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if "
                                                    + "compression is enabled) as well as for maintaining in-flight requests.";

 .define(BUFFER_MEMORY_CONFIG, Type.LONG, 32 * 1024 * 1024L, atLeast(0L), Importance.HIGH, BUFFER_MEMORY_DOC)

This is the total memory the client can use to buffer records waiting to be sent to the brokers. If the application produces faster than the data can be delivered, the producer blocks (for up to max.block.ms, after which it throws an exception, per the doc above). The default is 32 MB; adjust it to your workload, as in the sketch below.
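
A sketch of adjusting the buffer together with how long send() may block before throwing; both values are illustrative:

// Give the producer 64 MB of buffer and let send() block at most 60 s
// when the buffer is full before it throws an exception.
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64 * 1024 * 1024L);
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60000);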

  • ProducerConfig.COMPRESSION_TYPE_CONFIG
public static final String COMPRESSION_TYPE_CONFIG = "compression.type";
    private static final String COMPRESSION_TYPE_DOC = "The compression type for all data generated by the producer. The default is none (i.e. no compression). Valid "
                                                       + " values are <code>none</code>, <code>gzip</code>, <code>snappy</code>, or <code>lz4</code>. "
                                                       + "Compression is of full batches of data, so the efficacy of batching will also impact the compression ratio (more batching means better compression).";


 .define(COMPRESSION_TYPE_CONFIG, Type.STRING, "none", Importance.HIGH, COMPRESSION_TYPE_DOC)

The producer can compress data, which lets each request carry more records and raises throughput, at the cost of extra CPU. One pitfall to avoid: if the broker is also configured with a compression type and it differs from the producer's, every batch has to be decompressed and recompressed on the broker, costing extra time and CPU.
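
Enabling compression is a one-line change; lz4 here is just one of the valid values listed in the doc above:

// Compress whole batches before they leave the producer.
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");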

  • BATCH_SIZE_CONFIG

public static final String BATCH_SIZE_CONFIG = "batch.size";
    private static final String BATCH_SIZE_DOC = "The producer will attempt to batch records together into fewer requests whenever multiple records are being sent"
                                                 + " to the same partition. This helps performance on both the client and the server. This configuration controls the "
                                                 + "default batch size in bytes. "
                                                 + "<p>"
                                                 + "No attempt will be made to batch records larger than this size. "
                                                 + "<p>"
                                                 + "Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent. "
                                                 + "<p>"
                                                 + "A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable "
                                                 + "batching entirely). A very large batch size may use memory a bit more wastefully as we will always allocate a "
                                                 + "buffer of the specified batch size in anticipation of additional records.";

  .define(BATCH_SIZE_CONFIG, Type.INT, 16384, atLeast(0), Importance.MEDIUM, BATCH_SIZE_DOC)

This controls how many bytes of records destined for the same partition are packed together into one batch (a request can carry one batch per partition, per the doc above). It needs to be tuned for your workload: too small hurts throughput, too large wastes memory. The default is 16384 bytes (16 KB); see the sketch below.
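
Batch size is often tuned together with linger.ms (a related producer setting not covered above) so the producer waits briefly for a batch to fill; the values here are illustrative:

// Collect up to 32 KB per partition, and wait up to 10 ms for a batch to fill.
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);
props.put(ProducerConfig.LINGER_MS_CONFIG, 10);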

  • ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG
    public static final String CONNECTIONS_MAX_IDLE_MS_CONFIG = "connections.max.idle.ms";
    public static final String CONNECTIONS_MAX_IDLE_MS_DOC = "Close idle connections after the number of milliseconds specified by this config.";
    define(CONNECTIONS_MAX_IDLE_MS_CONFIG,
                                        Type.LONG,
                                        9 * 60 * 1000,
                                        Importance.MEDIUM,
                                        CommonClientConfigs.CONNECTIONS_MAX_IDLE_MS_DOC)

How long a network connection may stay idle before it is closed. The default is 9 minutes.

  • ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION
public static final String MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION = "max.in.flight.requests.per.connection";
    private static final String MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_DOC = "The maximum number of unacknowledged requests the client will send on a single connection before blocking."
                                                                            + " Note that if this setting is set to be greater than 1 and there are failed sends, there is a risk of"
                                                                            + " message re-ordering due to retries (i.e., if retries are enabled).";
define(MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION,
                                        Type.INT,
                                        5,
                                        atLeast(1),
                                        Importance.LOW,
                                        MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_DOC)

The producer keeps multiple network connections to the brokers; this setting is how many requests may be in flight on a single connection, i.e. sent but not yet acknowledged. If it is greater than 1 and a send fails, retries can reorder messages (Kafka's retry mechanism re-sends the failed batch), so if strict ordering matters, set it to 1, as in the sketch below. The default is 5.
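
If per-partition ordering matters more than pipelining, the usual one-line sketch is:

// At most one unacknowledged request per connection, so retries cannot reorder messages.
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);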

  • ProducerConfig.SEND_BUFFER_CONFIG
public static final String SEND_BUFFER_CONFIG = "send.buffer.bytes";
    public static final String SEND_BUFFER_DOC = "The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value is -1, the OS default will be used.";
.define(SEND_BUFFER_CONFIG, Type.INT, 128 * 1024, atLeast(-1), Importance.MEDIUM, CommonClientConfigs.SEND_BUFFER_DOC)

send.buffer.bytes: the size of the TCP send buffer (SO_SNDBUF) used by the socket; the default is 128 KB.

  • ProducerConfig.RECEIVE_BUFFER_CONFIG
public static final String RECEIVE_BUFFER_CONFIG = "receive.buffer.bytes";
    public static final String RECEIVE_BUFFER_DOC = "The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is -1, the OS default will be used.";

 .define(RECEIVE_BUFFER_CONFIG, Type.INT, 32 * 1024, atLeast(-1), Importance.MEDIUM, CommonClientConfigs.RECEIVE_BUFFER_DOC)

receive.buffer.bytes: the size of the TCP receive buffer (SO_RCVBUF) used by the socket; the default is 32 KB. A sketch covering both buffers follows below.
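
As the docs above mention, both socket buffers can be left to the operating system by passing -1; shown here only as an illustration:

// Use the OS defaults for the TCP send and receive buffers.
props.put(ProducerConfig.SEND_BUFFER_CONFIG, -1);
props.put(ProducerConfig.RECEIVE_BUFFER_CONFIG, -1);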

  • ProducerConfig.ACKS_CONFIG
public static final String ACKS_CONFIG = "acks";
    private static final String ACKS_DOC = "The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the "
                                           + " durability of records that are sent. The following settings are allowed: "
                                           + " <ul>"
                                           + " <li><code>acks=0</code> If set to zero then the producer will not wait for any acknowledgment from the"
                                           + " server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be"
                                           + " made that the server has received the record in this case, and the <code>retries</code> configuration will not"
                                           + " take effect (as the client won't generally know of any failures). The offset given back for each record will"
                                           + " always be set to -1."
                                           + " <li><code>acks=1</code> This will mean the leader will write the record to its local log but will respond"
                                           + " without awaiting full acknowledgement from all followers. In this case should the leader fail immediately after"
                                           + " acknowledging the record but before the followers have replicated it then the record will be lost."
                                           + " <li><code>acks=all</code> This means the leader will wait for the full set of in-sync replicas to"
                                           + " acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica"
                                           + " remains alive. This is the strongest available guarantee. This is equivalent to the acks=-1 setting.";
define(ACKS_CONFIG,
                                        Type.STRING,
                                        "1",
                                        in("all", "-1", "0", "1"),
                                        Importance.HIGH,
                                        ACKS_DOC)

acks accepts four values: "all" (equivalent to -1), "-1", "0", and "1". The default is "1", meaning the request is considered successful once the data has been written to the leader partition and a response is returned. The meanings are:
0: the producer sends the data and is done; no response comes back, and it never knows whether the write succeeded or failed.
1: the producer gets a response once the data has been written to the leader partition.
-1 / all: the producer gets a response only after the data has been written to the leader partition and replicated to all the in-sync follower partitions. This is really a trade-off between performance and data loss: not waiting at all (0) gives the best performance but may lose data, waiting for every replica (-1/all) gives the strongest guarantee but the lowest throughput, and 1 sits in between. The sketch below shows both ends of the trade-off.
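
A durability-first configuration sketch versus a throughput-first one (the retry count is illustrative):

// Durability first: wait for all in-sync replicas and allow retries.
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, 3);

// Throughput first (may lose data): fire and forget.
// props.put(ProducerConfig.ACKS_CONFIG, "0");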
