Kafka Programming Guide, Part 3: The Producer API in Practice


The previous article exercised Kafka's message production and consumption with console commands; this one writes producer and consumer programs in Java.
A producer first builds a ProducerRecord, which carries the topic, (optionally) the partition, the key, and the value, and then hands it off with the send() method. Because records travel over the network, keys and values must be serialized. You can also plug in a custom partitioner, as the examples below show. Note that records are sent in batches: until a batch is ready, nothing actually goes out on the wire.
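
As a quick reference, ProducerRecord has constructor overloads covering these fields; a sketch (the topic name and values are made up):

ProducerRecord<String, String> r1 = new ProducerRecord<>("javatopic", "value only");          // no key: partitions assigned round-robin
ProducerRecord<String, String> r2 = new ProducerRecord<>("javatopic", "key1", "keyed value"); // key: partition chosen by hashing the key
ProducerRecord<String, String> r3 = new ProducerRecord<>("javatopic", 0, "key1", "pinned");   // explicit partition overrides the partitioner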

Setting Up the Environment

With a Kafka cluster running as described in the earlier articles, create a Maven project named kafkatest in IDEA. The following pom.xml pulls in the dependencies automatically.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>kafkatest</groupId>
    <artifactId>kafkatest</artifactId>
    <version>1.0-SNAPSHOT</version>
    <name>kafkatest</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>
</project>

Writing the Producer

When sending messages, weigh the application's requirements:
messages must be neither lost nor duplicated (e.g. financial transactions);
losing a few messages, or adding some latency, is acceptable in exchange for high throughput (e.g. user-behavior logging).
Either way, the producer calls send() to hand the record to the broker.

Three Ways to Call send()

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put("bootstrap.servers", "bigdata01:9092,bigdata02:9092,bigdata03:9092");
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++)
    producer.send(new ProducerRecord<String, String>("javatopic", Integer.toString(i), "message:" + i));
producer.close(); // sends anything still buffered, even if no batch reached its target size

send() is asynchronous: it appends the record to a buffer and returns immediately, which is what makes batching possible.
The producer maintains a buffer of unsent records for each partition; the batch size is configurable (props.put("batch.size", 16384);).
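
batch.size works together with a couple of related settings; a sketch with illustrative values, not tuning advice:

props.put("batch.size", "16384");       // maximum bytes per partition batch
props.put("linger.ms", "5");            // wait up to 5 ms for more records before shipping a batch
props.put("buffer.memory", "33554432"); // total memory for unsent records across all partitions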

Fire-and-forget

Delivery is not guaranteed: the producer retries automatically, but messages can still be lost. The example above is fire-and-forget.

Synchronous Send

send() returns a Future<RecordMetadata>; calling get() on it blocks until the broker responds:

try {
    ProducerRecord<String, String> record =
            new ProducerRecord<>("javatopic", "key1", "sync message"); // example record
    Future<RecordMetadata> futurerm = producer.send(record);
    RecordMetadata rm = futurerm.get(); // blocks until the send completes
    long offset = rm.offset();
    int partition = rm.partition();
    String topic = rm.topic();
    System.out.println("topic:" + topic + ",partition:" + partition + ",offset:" + offset);
    producer.close();
} catch (Exception e) {
    e.printStackTrace();
}

Asynchronous Send

Pass a callback to send():

for (int i = 0; i < 100; i++) {
    try {
        producer.send(new ProducerRecord<String, String>("javatopic", Integer.toString(i), "messagecallback:" + i), new MyCallback());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
producer.close();

The callback class:

package producer;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;

public class MyCallback implements Callback {

    @Override
    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
        if (e != null) {
            // the send failed; handle or log the exception
            e.printStackTrace();
        } else {
            long offset = recordMetadata.offset();
            int partition = recordMetadata.partition();
            String topic = recordMetadata.topic();
            System.out.println("topic:" + topic + ",partition:" + partition + ",offset:" + offset);
        }
    }
}

Multi-threaded Producers

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MultiProducer extends Thread {
    private KafkaProducer<String, String> producer;
    private String topicName;

    public MultiProducer(String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "bigdata01:9092,bigdata02:9092,bigdata03:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        producer = new KafkaProducer<>(props);
        topicName = topic;
    }

    @Override
    public void run() {
        int messageCount = 0;
        while (messageCount < 100) {
            producer.send(new ProducerRecord<String, String>(topicName, Integer.toString(messageCount), "Multimessage:" + messageCount), new MyCallback());
            messageCount++;
            producer.flush(); // forces each record out immediately, at the cost of throughput
        }
        producer.close();
    }

    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 20; i++) {
            es.execute(new MultiProducer("multijavatopic"));
        }
        es.shutdown();
    }
}
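
Since KafkaProducer is thread-safe, an alternative design shares a single instance across threads, which the official javadoc notes is generally faster than running several instances. A minimal sketch (thread and message counts mirror the example above):

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SharedProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "bigdata01:9092,bigdata02:9092,bigdata03:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // one producer shared by every task; KafkaProducer is thread-safe
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        ExecutorService es = Executors.newFixedThreadPool(3);
        for (int t = 0; t < 20; t++) {
            final int taskId = t;
            es.execute(() -> {
                for (int i = 0; i < 100; i++) {
                    producer.send(new ProducerRecord<>("multijavatopic",
                            Integer.toString(i), "shared:" + taskId + ":" + i), new MyCallback());
                }
            });
        }
        es.shutdown();
        es.awaitTermination(1, TimeUnit.MINUTES); // let the tasks finish
        producer.close(); // flushes any remaining batches
    }
}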

Custom Partitioners

If no partitioner is specified, the default partitioner hashes the record's key to choose a partition; records with a null key are spread round-robin across the partitions. Be aware that if the partition count later increases, the same key is no longer guaranteed to map to the same partition, so create enough partitions up front rather than adding them afterwards.
A custom partitioner implements the Partitioner interface:

public interface Partitioner extends Configurable, Closeable {

    /**
     * Compute the partition for the given record.
     *
     * @param topic The topic name
     * @param key The key to partition on (or null if no key)
     * @param keyBytes The serialized key to partition on( or null if no key)
     * @param value The value to partition on or null
     * @param valueBytes The serialized value to partition on or null
     * @param cluster The current cluster metadata
     */
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster);

    /**
     * This is called when partitioner is closed.
     */
    public void close();
}

Kafka's built-in DefaultPartitioner implements this interface, hashing the key and taking the result modulo the partition count:

...
// hash the keyBytes to choose a partition
return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
// murmur2: generates a 32-bit murmur2 hash from the byte array
// toPositive: returns number & 0x7fffffff, a single AND that maps negative values to positive
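
Both helpers are public in org.apache.kafka.common.utils.Utils, so the default mapping is easy to reproduce; a small sketch (the key and partition count are made up):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class HashDemo {
    public static void main(String[] args) {
        byte[] keyBytes = "user-42".getBytes(StandardCharsets.UTF_8); // hypothetical key
        int numPartitions = 3;                                        // hypothetical partition count
        // the same computation the DefaultPartitioner performs
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        System.out.println("key user-42 -> partition " + partition);
    }
}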

An example custom partitioner:

package producer;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.record.InvalidRecordException; // location in the 1.0.x client
import org.apache.kafka.common.utils.Utils;

public class MyPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            throw new InvalidRecordException("null key is not allowed");
        }
        if (key.equals("1")) {
            System.out.println("My Partitioner for key 1");
            return numPartitions - 1; // route key "1" to the last partition
        }
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}

Register the partitioner class in the producer configuration:

props.put("partitioner.class","producer.MyPartitioner");

Configuration Parameters

The official documentation describes every setting: open http://kafka.apache.org, click DOCUMENTATION in the left-hand menu, and search for "producer configs" to jump to the parameter reference.

acks: when does a write count as successful?

acks determines when a produce request is considered complete: it is the number of acknowledgments the producer requires the leader to have received before considering a request complete.
acks takes one of four values: 0, 1, all, or -1 (all and -1 are synonyms, so the strictness order is 0 < 1 < all/-1). With 0, the producer treats a record as sent the moment it goes out and never waits for a broker reply, giving the highest throughput. With 1, a send succeeds once the leader has written the record and responded; throughput then depends on whether you send synchronously or asynchronously. With all/-1, the leader waits until all in-sync replicas have the record before acknowledging. This is the strictest and safest mode: even if a broker crashes, the replicas keep the message from being lost.
In the Java code above it is set as follows; note the value is a String, not an int.

props.put("acks", "1");

Message Retention

These are set in the broker's server.properties configuration file.

offsets.retention.minutes

offsets.retention.minutes is an int, default 10080 (version 2.0.0 raised the default offset retention from 1 day to 7 days, i.e. 10080 minutes). It controls how long committed consumer offsets are kept, which directly affects where consumers resume.

log.retention.minutes

How long topic data is kept. Set offsets.retention.minutes longer than log.retention.minutes; otherwise the offsets can expire while the topic data has not, and the topic can no longer be consumed from the last recorded position.
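
For example, in server.properties (values illustrative, chosen so the offsets outlive the log data):

# keep committed offsets for 7 days (the 2.0.0 default)
offsets.retention.minutes=10080
# keep topic data for 3 days, shorter than the offset retention
log.retention.minutes=4320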

How do I change the maximum message size?

The maximum size of a produce request is set on the producer by max.request.size, default 1048576 bytes (max.partition.fetch.bytes, with the same default, is the consumer-side counterpart). To raise the limit:

props.put("max.request.size", "10485760");

Send retry count

props.put("retries", "3"); // the legacy Scala producer called this message.send.max.retries

Records per batch

The Java producer batches by bytes (batch.size) and time (linger.ms) rather than by a record count; batch.num.messages was a legacy Scala-producer setting.

props.put("batch.size", "16384");

Source Code Walkthrough

The send() method

public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
    return this.doSend(interceptedRecord, callback);
}

It runs the record through any interceptors and delegates to doSend():

private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    // TopicPartition holds the topic and partition and implements hashCode, equals and toString
    TopicPartition tp = null;
    try {
        // first make sure the metadata for the topic is available
        ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
        long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
        Cluster cluster = clusterAndWaitTime.cluster;
        byte[] serializedKey;
        try { // serialize the key
            serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
        } catch (ClassCastException cce) { ... }
        byte[] serializedValue;
        try { // serialize the value
            serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
        } catch (ClassCastException cce) { ... }
        // work out which partition the record goes to
        int partition = this.partition(record, serializedKey, serializedValue, cluster);
        tp = new TopicPartition(record.topic(), partition); // the resolved topic and partition
        this.setReadOnly(record.headers());
        Header[] headers = record.headers().toArray();
        int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(this.apiVersions.maxUsableProduceMagic(), this.compressionType, serializedKey, serializedValue, headers);
        this.ensureValidRecordSize(serializedSize);
        // use the record's timestamp if present, the current time otherwise
        long timestamp = record.timestamp() == null ? this.time.milliseconds() : record.timestamp().longValue();
        this.log.trace("Sending record {} with callback {} to topic {} partition {}", new Object[]{record, callback, record.topic(), Integer.valueOf(partition)});
        // wrap the user callback so interceptors are notified as well
        Callback interceptCallback = this.interceptors == null ? callback : new KafkaProducer.InterceptorCallback(callback, this.interceptors, tp);
        if (this.transactionManager != null && this.transactionManager.isTransactional()) {
            this.transactionManager.maybeAddPartitionToTransaction(tp);
        }
        // the core step: append the record to the accumulator's batch for this partition
        RecordAppendResult result = this.accumulator.append(tp, timestamp, serializedKey, serializedValue, headers, (Callback) interceptCallback, remainingWaitMs);
        if (result.batchIsFull || result.newBatchCreated) {
            this.log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), Integer.valueOf(partition));
            this.sender.wakeup();
        }
        return result.future;
    } catch ( ... ) {
        ... // the various exception handlers are omitted
    }
}

accumulator.append() packs the record into a batch; each partition's batches are held in a double-ended queue:

Deque<ProducerBatch> dq = this.getOrCreateDeque(tp);

getOrCreateDeque() looks up, or lazily creates, the ArrayDeque for a TopicPartition:

private Deque<ProducerBatch> getOrCreateDeque(TopicPartition tp) {
    Deque<ProducerBatch> d = (Deque) this.batches.get(tp);
    if (d != null) {
        return d;
    } else {
        d = new ArrayDeque();
        Deque<ProducerBatch> previous = (Deque) this.batches.putIfAbsent(tp, d);
        return (Deque) (previous == null ? d : previous);
    }
}

Here this.batches is a ConcurrentMap<TopicPartition, Deque<ProducerBatch>> keyed by TopicPartition; the get(tp) lookup means records for the same topic and partition share a single deque. When a batch fills up, a new one is created and appended to that deque:

ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, this.time.milliseconds());
...
dq.addLast(batch);

References:
http://kafka.apache.org/documentation/#producerconfigs
http://kafka.apache.org/21/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
