Kafka Source Code Analysis, Part 6 - Consumer - Consumption Strategies

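Starting with this article, we move on to the Consumer. Like the Producer, the Consumer comes in an old Scala version and a new Java version; here we analyze only the new Java version.

Before the analysis, let's look at the basic usage of the Consumer: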

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

consumer.subscribe(Arrays.asList("foo", "bar"));  // Core function 1: subscribe to topics
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100); // Core function 2: long poll; one call can return multiple messages
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s", record.offset(), record.key(), record.value());
}
```

The Consumer Is Not Thread-Safe

As noted earlier, KafkaProducer is thread-safe: multiple threads can share one producer instance. The Consumer is not.

In almost every method of KafkaConsumer, we see this:

```java
public ConsumerRecords<K, V> poll(long timeout) {
    acquire();   // acquire/release is not a lock that makes multithreaded use safe. On the contrary, it guards against multithreaded calls: if one is detected, an exception is thrown
    ...
    release();
}
```
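
To make the acquire/release idea concrete, here is a minimal sketch of such a guard. It is not Kafka's source: the class name `SingleThreadGuard`, its fields, and the choice of `ConcurrentModificationException` are assumptions made for illustration. The guard remembers which thread is currently inside a call and rejects any other thread instead of blocking it.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical guard illustrating the acquire/release idea; not Kafka's actual code. */
final class SingleThreadGuard {
    private static final long NO_OWNER = -1L;
    private final AtomicLong ownerThreadId = new AtomicLong(NO_OWNER);
    private final AtomicInteger depth = new AtomicInteger(0); // lets the owner make nested calls

    /** Claims the guard for the calling thread, or throws if another thread holds it. */
    void acquire() {
        long me = Thread.currentThread().getId();
        if (me != ownerThreadId.get() && !ownerThreadId.compareAndSet(NO_OWNER, me)) {
            throw new ConcurrentModificationException("not safe for multi-threaded access");
        }
        depth.incrementAndGet();
    }

    /** Gives the guard up once the outermost call of the owning thread returns. */
    void release() {
        if (depth.decrementAndGet() == 0) {
            ownerThreadId.set(NO_OWNER);
        }
    }
}

final class SingleThreadGuardDemo {
    /** Returns true if a second thread is rejected while the caller holds the guard. */
    static boolean secondThreadIsRejected() {
        SingleThreadGuard guard = new SingleThreadGuard();
        AtomicInteger rejected = new AtomicInteger(0);
        guard.acquire(); // the calling thread is now "inside poll()"
        try {
            Thread intruder = new Thread(() -> {
                try {
                    guard.acquire();
                    guard.release();
                } catch (ConcurrentModificationException e) {
                    rejected.incrementAndGet();
                }
            });
            intruder.start();
            intruder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            guard.release();
        }
        return rejected.get() == 1;
    }

    public static void main(String[] args) {
        System.out.println("second thread rejected: " + secondThreadIsRejected());
    }
}
```

The design choice worth noting is that the second thread fails fast rather than waiting, so misuse shows up immediately as an exception instead of as silent contention.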

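Consumer Group - Load-Balancing Mode vs. Pub/Sub Mode

Every consumer instance must be given a group.id when it is initialized. The group.id determines whether multiple Consumers consuming the same topic share the messages among themselves or each receive all of them (broadcast).

Suppose several Consumers subscribe to the same topic, and the topic has multiple partitions.

Load-balancing mode: if the Consumers belong to the same group, the topic's partitions, and therefore their messages, are divided among these Consumers.

Pub/Sub mode: if the Consumers belong to different groups, every message of the topic is broadcast to each group.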
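
The difference between the two modes can be sketched with a small simulation. Everything here is an assumption for illustration: the class `GroupAssignmentSketch`, the consumer names, and especially the round-robin rule, since the real client's assignment strategy is pluggable and may distribute partitions differently. The point is only that partitions are split within one group, while a different group gets its own full set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of how partitions are shared inside one consumer group. */
final class GroupAssignmentSketch {
    /** Spreads partitions 0..partitionCount-1 over the members of ONE group, round-robin. */
    static Map<String, List<Integer>> assignWithinGroup(List<String> members, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = members.get(p % members.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        int partitions = 4;
        // Same group: c1 and c2 split the partitions between them (load balancing).
        System.out.println("group A: " + assignWithinGroup(Arrays.asList("c1", "c2"), partitions));
        // Different group: c3 is assigned independently and receives every partition (pub/sub).
        System.out.println("group B: " + assignWithinGroup(Arrays.asList("c3"), partitions));
    }
}
```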

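Partition - Automatic Assignment vs. Manual Assignment

In the load-balancing mode above, we call the subscribe function with only the topics and no partitions. In that case the partitions are distributed automatically among all consumers in the group that subscribe to the topic.

The other way is to specify explicitly which partitions of which topics a consumer consumes, using the assign function.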

```java
public void subscribe(List<String> topics) {
    subscribe(topics, new NoOpConsumerRebalanceListener());
}

public void assign(List<TopicPartition> partitions) {
    ...
}
```

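One key point: the two modes are mutually exclusive. Once you use subscribe you cannot use assign, and vice versa.

In the code, the two modes are stored in two separate variables: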

```java
public class SubscriptionState {
    ...
    private final Set<String> subscription;  // used by subscribe mode
    private final Set<TopicPartition> userAssignment; // used by assign mode
}
```

Likewise, the code performs a corresponding check whenever subscribe or assign is called. If the two modes are mixed, an exception is thrown.
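
As a rough illustration of that check, the sketch below keeps the two modes in two separate sets and refuses to enter one mode while the other is in use. It is a simplified stand-in, not Kafka's SubscriptionState: the class name `SubscriptionModeSketch`, the use of plain strings such as "foo-0" for partitions, and the exception message are all assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of the subscribe/assign mutual-exclusion check. */
final class SubscriptionModeSketch {
    private final Set<String> subscription = new HashSet<>();   // subscribe mode: topic names
    private final Set<String> userAssignment = new HashSet<>(); // assign mode: "topic-partition" strings

    /** Enters subscribe mode; fails if partitions were assigned manually before. */
    void subscribe(List<String> topics) {
        if (!userAssignment.isEmpty()) {
            throw new IllegalStateException("subscribe and assign are mutually exclusive");
        }
        subscription.clear();
        subscription.addAll(topics);
    }

    /** Enters assign mode; fails if topics were subscribed before. */
    void assign(List<String> topicPartitions) {
        if (!subscription.isEmpty()) {
            throw new IllegalStateException("subscribe and assign are mutually exclusive");
        }
        userAssignment.clear();
        userAssignment.addAll(topicPartitions);
    }

    public static void main(String[] args) {
        SubscriptionModeSketch state = new SubscriptionModeSketch();
        state.subscribe(Arrays.asList("foo", "bar"));
        try {
            state.assign(Arrays.asList("foo-0")); // mixing the modes is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```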

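Consumption Acknowledgment - consume offset vs. committed offset

As mentioned earlier, "consumption acknowledgment" is a problem every message middleware has to solve: after receiving a message and finishing processing it, the consumer sends an ack (or confirm) to the middleware.

This involves two consumption positions, that is, two offset values: the consume offset, where messages are currently being fetched, and the committed offset, which is fixed once processing is done and the ack has been sent.

Clearly, in asynchronous mode the committed offset lags behind the consume offset.

A key point here: if the consumer crashes and restarts, it resumes from the committed offset, not from the consume offset. This means messages may be consumed more than once.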
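
The gap between the two offsets, and the duplicate it causes after a crash, can be traced with a toy model. This is a sketch under stated assumptions: `OffsetLagSketch` is a hypothetical in-memory stand-in for one partition and one consumer, and its `poll`, `commit`, and `restart` methods only mimic the behavior described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory model of one partition with a consume offset and a committed offset. */
final class OffsetLagSketch {
    private final List<String> log; // one partition's messages; list index == offset
    private int position;           // consume offset: next message to fetch
    private int committed;          // committed offset: where a restarted consumer resumes

    OffsetLagSketch(List<String> log) {
        this.log = log;
    }

    /** Fetches up to max messages and advances only the consume offset. */
    List<String> poll(int max) {
        int end = Math.min(log.size(), position + max);
        List<String> batch = new ArrayList<>(log.subList(position, end));
        position = end;
        return batch;
    }

    /** Acknowledges everything fetched so far. */
    void commit() {
        committed = position;
    }

    /** Simulates a crash and restart: consumption resumes from the committed offset. */
    void restart() {
        position = committed;
    }

    /** Processes messages with a crash between processing m2 and committing it. */
    static List<String> processedAcrossCrash() {
        OffsetLagSketch consumer = new OffsetLagSketch(Arrays.asList("m0", "m1", "m2", "m3"));
        List<String> processed = new ArrayList<>();
        processed.addAll(consumer.poll(2)); // m0 and m1
        consumer.commit();                  // committed offset is now 2
        processed.addAll(consumer.poll(1)); // m2 is processed but not yet committed
        consumer.restart();                 // crash before the commit
        processed.addAll(consumer.poll(2)); // m2 is delivered again, followed by m3
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processedAcrossCrash());
    }
}
```

In this run m2 is processed twice, which is exactly the at-least-once behavior the text describes.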

The 0.9 client offers three ack strategies:
Strategy 1: automatic, periodic ack. This is what the demo above uses:

```java
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
```

Strategy 2: consumer.commitSync() // call commitSync for a manual, synchronous ack, for example once after each message is processed

Strategy 3: consumer.commitAsync() // manual, asynchronous ack

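Exactly Once - Storing the Offset Yourself

As discussed earlier, Kafka only guarantees that messages are not lost, i.e. at least once; it does not guarantee that messages are not duplicated.

Duplicate sends: the client cannot solve this. The server would have to detect duplicates, which is too costly.

Duplicate consumption: with commitSync() above, we can call commitSync each time we finish processing a message (or, as in the code below, a batch). Does that solve duplicate consumption? Consider the following code: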

```java
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        insertIntoDb(buffer);    // process the messages: store them in the db
        consumer.commitSync();   // send the ack synchronously
        buffer.clear();
    }
}
```

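The answer is no! insertIntoDb and commitSync cannot be made into one atomic operation: if the consumer crashes after the data has been processed but before commitSync completes, the messages will still be consumed again after the restart.

So how can this problem be solved?

The answer is to store the committed offset yourself instead of relying on the Kafka cluster to store it, and to make processing the messages and saving the offset a single atomic operation.

The official Kafka documentation lists the following two scenarios for storing offsets yourself: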

```text
// Relational database: results and offset are written in one transaction. If the consumer crashes and restarts, messages are not consumed twice
If the results of the consumption are being stored in a relational database, storing the offset in the database as well can allow committing both the results and offset in a single transaction. Thus either the transaction will succeed and the offset will be updated based on what was consumed or the result will not be stored and the offset won't be updated.

// Search engine: build the offset into the index together with the data
If the results are being stored in a local store it may be possible to store the offset there as well. For example a search index could be built by subscribing to a particular partition and storing both the offset and the indexed data together. If this is done in a way that is atomic, it is often possible to have it be the case that even if a crash occurs that causes unsync'd data to be lost, whatever is left has the corresponding offset stored as well. This means that in this case the indexing process that comes back having lost recent updates just resumes indexing from what it has ensuring that no updates are lost.
```

The documentation also says that to store offsets yourself, you need to do the following:

```text
Configure enable.auto.commit=false   // disable automatic ack
Use the offset provided with each ConsumerRecord to save your position. // each time a message is fetched, save its offset
On restart restore the position of the consumer using seek(TopicPartition, long). // after a restart, call consumer.seek to jump to the saved offset and resume consuming from there
```

With this approach we achieve "Exactly Once" on the consumer side: messages are neither lost nor duplicated.
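
The effect of saving results and offset together can be sketched without a real database. The code below is a simulation under stated assumptions: `AtomicOffsetStore` is a hypothetical in-memory stand-in whose single `synchronized` method plays the role of a database transaction, and reading `nextOffset()` at startup plays the role of the seek step. A crash before the save leaves both rows and offset untouched, so the restarted run neither skips nor repeats messages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory stand-in for a transactional store holding results and offset. */
final class AtomicOffsetStore {
    private final List<String> rows = new ArrayList<>();
    private int nextOffset = 0;

    /** Saves the batch and the resume offset in one step, standing in for one transaction. */
    synchronized void saveAtomically(List<String> batch, int newNextOffset) {
        rows.addAll(batch);
        nextOffset = newNextOffset;
    }

    synchronized int nextOffset() {
        return nextOffset;
    }

    synchronized List<String> rows() {
        return new ArrayList<>(rows);
    }
}

final class ExactlyOnceSketch {
    /** One consumer run: restore the position, process a batch, then save unless it crashes. */
    static void consume(List<String> log, AtomicOffsetStore store, int limit, boolean crashBeforeSave) {
        int from = store.nextOffset();                               // restore position (the seek step)
        int to = Math.min(log.size(), from + limit);
        List<String> batch = new ArrayList<>(log.subList(from, to)); // "process" the messages
        if (crashBeforeSave) {
            return; // crash: neither the rows nor the offset were written
        }
        store.saveAtomically(batch, to);
    }

    static List<String> runWithCrash() {
        List<String> log = Arrays.asList("m0", "m1", "m2", "m3");
        AtomicOffsetStore store = new AtomicOffsetStore();
        consume(log, store, 2, false); // stores m0, m1 and offset 2
        consume(log, store, 2, true);  // processes m2, m3 but crashes before saving
        consume(log, store, 2, false); // restart: resumes at offset 2 and stores m2, m3 once
        return store.rows();
    }

    public static void main(String[] args) {
        System.out.println(runWithCrash());
    }
}
```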

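Going one step further and considering producer + consumer together: with Exactly Once on the consumer side plus duplicate detection in the DB, even "duplicate sends" from the producer are no longer a problem.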
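
One way to picture the DB-side duplicate detection is an insert that is keyed by a unique message id, so a message sent twice is written only once. This is a sketch, not a prescribed design: the class `DedupTableSketch`, the in-memory map standing in for a table with a unique key, and the assumption that each message carries a unique business id are all illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory table with a unique key on the message id. */
final class DedupTableSketch {
    private final Map<String, String> rowsById = new LinkedHashMap<>();

    /** Writes the row only if the id is new; returns true when a row was actually written. */
    boolean insertIfAbsent(String messageId, String payload) {
        return rowsById.putIfAbsent(messageId, payload) == null;
    }

    int rowCount() {
        return rowsById.size();
    }

    public static void main(String[] args) {
        DedupTableSketch table = new DedupTableSketch();
        System.out.println(table.insertIfAbsent("order-1", "paid")); // first delivery is stored
        System.out.println(table.insertIfAbsent("order-1", "paid")); // duplicate send is ignored
        System.out.println(table.rowCount());
    }
}
```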


Original article: http://blog.csdn.net/chunlongyu/article/details/52663090
