Failed to allocate memory within the configured max blocking time 60000 ms
This error appeared while a Kafka producer was writing messages.
The usual advice online is to set batch.size = 0, which generally works. Below I analyze where to set it and why the error occurs in the first place.
First, I thought about what had recently changed in the code.
We use Kafka to hold crawled news data. After the last deployment we started crawling more sites, and some of them are crawled from their full history, which means a lot of records. The business-level change, in short: the data volume grew.
The first article I found online had this one-line explanation:
Messages arrive much faster than they are sent.
In Kafka's asynchronous mode there is a buffer; once it fills up, no more data can get in, so a timeout becomes possible. In other words: the buffered data has not been sent out yet, new messages keep arriving, the buffer has no room left, so the new message has to wait — and if the wait exceeds the configured limit, you get the timeout.
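That mechanism can be sketched with plain JDK classes, no Kafka dependency needed. This is only an analogy, not the actual client code: the bounded queue stands in for the producer's accumulator, and the timed offer for max.block.ms.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BufferFullDemo {
    // Returns whether a third record fits into a 2-slot buffer within the wait limit.
    static boolean offerWithTimeout() throws InterruptedException {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(2);
        buffer.offer("msg-1");
        buffer.offer("msg-2"); // the accumulator is now full
        // Like send() under max.block.ms: wait up to 100 ms for space to free up.
        return buffer.offer("msg-3", 100, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Nothing drains the queue here, mimicking a sender that cannot keep up.
        System.out.println("third record accepted: " + offerWithTimeout()); // prints false
    }
}
```

Because nothing frees space within the wait window, the timed offer gives up — the same shape as the "failed to allocate memory within the configured max blocking time" error.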
The second article was a Stack Overflow answer suggesting batch.size=0. I use the Java client, so where is this set, and what does it actually do?
I found the Java code that sets the configuration:
@Bean
public Map<String, Object> producerConfig() {
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60000);
    return props;
}
There it is: MAX_BLOCK_MS_CONFIG is exactly 60000 ms — no coincidence. I've recently gotten into the habit of reading source code (a valuable skill), so I jumped straight to the definition of MAX_BLOCK_MS_CONFIG:
public static final String MAX_BLOCK_MS_CONFIG = "max.block.ms";
Then I searched for batch.size, and sure enough:
public static final String BATCH_SIZE_CONFIG = "batch.size";
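For reference, adding the setting to a config map like the producerConfig() bean above might look like this. The string keys are the literal values behind the ProducerConfig constants (so this compiles without kafka-clients on the classpath), and the broker address is a placeholder:

```java
import java.util.HashMap;
import java.util.Map;

public class BatchSizeConfig {
    // Same shape as the producerConfig() bean, with plain string keys
    // ("batch.size" is the value of ProducerConfig.BATCH_SIZE_CONFIG).
    public static Map<String, Object> producerConfig() {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("batch.size", 0);       // disable batching entirely
        props.put("max.block.ms", 60000); // leave the blocking limit as-is
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerConfig().get("batch.size")); // prints 0
    }
}
```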
In theory, setting it would be enough. But I wanted to get to the bottom of it.
After some digging: Kafka can send messages synchronously or asynchronously. Synchronously, we just send and wait for the response. Asynchronously, there is a queue — to improve throughput, records accumulate until they reach a certain size before being sent, and that size is batch.size. The source documents it:
private static final String BATCH_SIZE_DOC = "The producer will attempt to batch records together into fewer requests whenever multiple records are being sent"
+ " to the same partition. This helps performance on both the client and the server. This configuration controls the "
+ "default batch size in bytes. "
+ "<p>"
+ "No attempt will be made to batch records larger than this size. "
+ "<p>"
+ "Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent. "
+ "<p>"
+ "A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable "
+ "batching entirely). A very large batch size may use memory a bit more wastefully as we will always allocate a "
+ "buffer of the specified batch size in anticipation of additional records.";
So if batch.size is set too small, throughput drops; if it is set too large, memory is wasted. Looking at the source, the default is 16 KB:
.define(BATCH_SIZE_CONFIG, Type.INT, 16384, atLeast(0), Importance.MEDIUM, BATCH_SIZE_DOC)
With batch.size=0, records are no longer accumulated before sending. That is why batch.size=0 usually resolves the problem: each record is sent as soon as it arrives, the queue does not fill up, and no timeout occurs. That is the first fix.
Now imagine a scenario where records are very large — over 16 KB — so they still sit in the queue; or where data simply arrives too fast, so the actual send rate is lower than the arrival rate. The queue will slowly fill up either way, and then setting batch.size=0 is of no use.
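A small stdlib-only simulation of that scenario (the queue capacity and the arrival/drain ratio are made up for illustration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RateMismatchDemo {
    // Capacity 4, twelve arrivals, one drain per three arrivals: the arrival
    // rate is three times the send rate, so the buffer fills and stays full.
    static int simulate() {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(4);
        int rejected = 0;
        for (int i = 0; i < 12; i++) {
            if (i % 3 == 2) buffer.poll();    // the "sender" drains one record
            if (!buffer.offer(i)) rejected++; // full buffer: would block, then time out
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("records rejected once the buffer stayed full: " + simulate()); // prints 4
    }
}
```

Once the buffer reaches capacity, every extra arrival beyond what the sender drains is rejected — no per-record batching setting changes that arithmetic.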
So can we set the timeout longer, or even never time out and wait forever (setting it to Integer.MAX_VALUE effectively means never timing out)? Let's look at the source documentation for this timeout:
public static final String MAX_BLOCK_MS_CONFIG = "max.block.ms";
private static final String MAX_BLOCK_MS_DOC = "The configuration controls how long <code>KafkaProducer.send()</code> and <code>KafkaProducer.partitionsFor()</code> will block."
+ "These methods can be blocked either because the buffer is full or metadata unavailable."
+ "Blocking in the user-supplied serializers or partitioner will not be counted against this timeout.";
In theory, lengthening the timeout could solve the problem. In my case, batch.size=0 had not worked, and increasing the timeout did not work either — it just surfaced a different error:
Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers
Another option: can we enlarge the buffer? From the source:
/** <code>buffer.memory</code> */
public static final String BUFFER_MEMORY_CONFIG = "buffer.memory";
private static final String BUFFER_MEMORY_DOC = "The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are "
+ "sent faster than they can be delivered to the server the producer will block for <code>" + MAX_BLOCK_MS_CONFIG + "</code> after which it will throw an exception."
+ "<p>"
+ "This setting should correspond roughly to the total memory the producer will use, but is not a hard bound since "
+ "not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if "
+ "compression is enabled) as well as for maintaining in-flight requests.";
The default is 32 MB:
.define(BUFFER_MEMORY_CONFIG, Type.LONG, 32 * 1024 * 1024L, atLeast(0L), Importance.HIGH, BUFFER_MEMORY_DOC)
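Putting the two defaults together: 32 MB of buffer.memory divided into 16 KB batches means the accumulator holds roughly 2048 full-size batches before send() starts blocking:

```java
public class BufferCapacity {
    public static void main(String[] args) {
        long bufferMemory = 32L * 1024 * 1024; // buffer.memory default: 32 MB
        int batchSize = 16384;                 // batch.size default: 16 KB
        System.out.println(bufferMemory / batchSize); // prints 2048
    }
}
```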
Of course, in theory, enlarging this buffer could also work.
In the end, the real cause turned out to be a memory leak in the program calling Kafka: it created more connections than it destroyed, eventually exhausting Kafka's resources.
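That kind of leak is easy to reproduce in miniature. The sketch below uses a hypothetical FakeProducer (a stand-in for KafkaProducer, which likewise implements AutoCloseable) to show why closing producers via try-with-resources matters:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ProducerLeakDemo {
    // Hypothetical stand-in for KafkaProducer; counts instances left open.
    static class FakeProducer implements AutoCloseable {
        static final AtomicInteger open = new AtomicInteger();
        FakeProducer() { open.incrementAndGet(); }
        @Override public void close() { open.decrementAndGet(); }
    }

    // Leaky pattern: a new producer per message, never closed.
    static int leaked() {
        FakeProducer.open.set(0);
        for (int i = 0; i < 3; i++) new FakeProducer();
        return FakeProducer.open.get(); // 3 producers still open
    }

    // Safe pattern: try-with-resources guarantees close() runs every time.
    static int closedProperly() {
        FakeProducer.open.set(0);
        for (int i = 0; i < 3; i++) {
            try (FakeProducer p = new FakeProducer()) { /* p.send(...) would go here */ }
        }
        return FakeProducer.open.get(); // 0 left open
    }

    public static void main(String[] args) {
        System.out.println("leaked: " + leaked() + ", closed properly: " + closedProperly());
    }
}
```

With the real client the symptom is the same shape: producers created per request and never closed accumulate until the broker-side and client-side resources run out.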