Kafka Java API Operations


小哪吒的BD

Categories: Big Data, kafka. Article tags: kafka

Article link: https://blog.csdn.net/Mr_Yang888/article/details/105013821


1. Create a Maven project and add the dependencies

Create a Maven project and add the following dependency coordinates to pom.xml:

<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>1.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams</artifactId>
        <version>1.0.0</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <!-- Java compiler plugin -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>

2. Producer code

2.1 Producing data with a producer

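import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/**
 * Producer for order messages.
 */
public class OrderProducer {
    public static void main(String[] args) throws InterruptedException {
        /* 1. Connect to the cluster using configuration properties.
         * 2. Send data: topic "order" plus a value.
         */
        Properties props = new Properties();
        // Kafka broker address
        props.put("bootstrap.servers", "node01:9092");
        // Acknowledgement mode: wait for all in-sync replicas
        props.put("acks", "all");
        // Number of retries after a failed send
        props.put("retries", 0);
        // Batch size in bytes
        props.put("batch.size", 16384);
        // How long (ms) to wait for more records before sending a batch
        props.put("linger.ms", 1);
        // Total memory (bytes) for buffering records that have not been sent yet
        props.put("buffer.memory", 33554432);
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(props);
        for (int i = 0; i < 1000; i++) {
            // send() needs a ProducerRecord; the smallest constructor takes (String topic, V value)
            kafkaProducer.send(new ProducerRecord<String, String>("order", "Order information! " + i));
            Thread.sleep(100);
        }
        // Flush buffered records and release resources
        kafkaProducer.close();
    }
}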

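2.2 Data partitioning in Kafka

Messages sent by a Kafka producer are all stored on the brokers. We can define our own partitioning rule to decide which partition a message is written to.
The source code of the ProducerRecord class documents Kafka's different partitioning strategies.
Kafka supports the following four ways of partitioning data: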

// Strategy 1: neither a partition number nor a key is given, so records are spread evenly across the partitions in round-robin fashion
//ProducerRecord<String, String> producerRecord1 = new ProducerRecord<>("mypartition", "mymessage" + i);
//kafkaProducer.send(producerRecord1);

// Strategy 2: no partition number, but a key is given. The partition is hash(key) % numPartitions
// (the default partitioner hashes the serialized key bytes with murmur2 rather than calling key.hashCode())
// Note: if the key never changes, hash(key) % numPartitions is a fixed value and every record lands in the same partition
//ProducerRecord<String, String> producerRecord2 = new ProducerRecord<>("mypartition", "mykey", "mymessage" + i);
//kafkaProducer.send(producerRecord2);

// Strategy 3: a partition number is given, so the record is written directly to that partition
//ProducerRecord<String, String> producerRecord3 = new ProducerRecord<>("mypartition", 0, "mykey", "mymessage" + i);
//kafkaProducer.send(producerRecord3);

// Strategy 4: a custom partitioner. Without one, keyless records are spread evenly across the partitions in round-robin fashion
//kafkaProducer.send(new ProducerRecord<String, String>("mypartition", "mymessage" + i));
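
The first three strategies can be sketched with the standard library alone. This is only an illustration of the decision order, not Kafka's code: the class name PartitionChoiceSketch, its choose method, and the round-robin counter parameter are made up for this example, and Arrays.hashCode stands in for the murmur2 hash that Kafka applies to the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Stdlib-only illustration of how a record is mapped to a partition (not Kafka's code). */
final class PartitionChoiceSketch {

    /**
     * @param explicitPartition partition set on the record, or null
     * @param key               record key, or null
     * @param roundRobinCounter counter that the caller increments per keyless record
     * @param numPartitions     number of partitions of the topic
     */
    static int choose(Integer explicitPartition, String key, int roundRobinCounter, int numPartitions) {
        if (explicitPartition != null) {
            // Strategy 3: an explicit partition number always wins.
            return explicitPartition;
        }
        if (key != null) {
            // Strategy 2: hash of the key bytes modulo the partition count.
            // Assumption: Arrays.hashCode is a stand-in for Kafka's murmur2 hash.
            int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
            return (hash & 0x7fffffff) % numPartitions; // mask keeps the value non-negative
        }
        // Strategy 1: no partition and no key, so rotate through the partitions.
        return (roundRobinCounter & 0x7fffffff) % numPartitions;
    }
}
```

Because the result for a keyed record depends only on the key and the partition count, a constant key always maps to the same partition, which is the skew that the note under strategy 2 warns about.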

2.2.1 Custom partitioning strategy

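// The package must match the class name configured in partitioner.class
package cn.itcast.kafka.partitioner;

import java.util.List;
import java.util.Map;
import java.util.Random;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

public class KafkaCustomPartitioner implements Partitioner {
    private final Random random = new Random();

    @Override
    public void configure(Map<String, ?> configs) {
        // No extra configuration is needed
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        // Custom partitioning logic goes here; this example picks a random partition of the topic
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int partitionNum = partitions.size();
        return random.nextInt(partitionNum);
    }

    @Override
    public void close() {
        // Nothing to release
    }
}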
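
Random assignment is only a placeholder for real business logic. As a hedged sketch of what such logic might look like, the stdlib-only class below routes keys with a "vip-" prefix to partition 0 and hashes everything else across the remaining partitions. The class name PrefixRoutingSketch, the partitionFor method, and the "vip-" rule are all hypothetical; inside a real Partitioner the partition count would come from cluster.partitionsForTopic(topic).size().

```java
/** Hypothetical routing rule that could be called from Partitioner.partition(...). */
final class PrefixRoutingSketch {

    /** Returns the partition for a key, given the topic's partition count. */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Assumed business rule: "vip-" keys get a dedicated partition 0.
        if (numPartitions == 1 || (key != null && key.startsWith("vip-"))) {
            return 0;
        }
        // All other keys are hashed across partitions 1 .. numPartitions - 1.
        int hash = (key == null) ? 0 : key.hashCode();
        return 1 + (hash & 0x7fffffff) % (numPartitions - 1);
    }
}
```

Keeping the rule in a small static method like this makes it easy to unit test without a running cluster.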

2.2.2 Add the configuration in the main code

@Test
public void kafkaProducer() throws Exception {
    // 1. Prepare the configuration
    Properties props = new Properties();
    props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
    props.put("acks", "all");
    props.put("retries", 0);
    props.put("batch.size", 16384);
    props.put("linger.ms", 1);
    props.put("buffer.memory", 33554432);
    // Register the custom partitioner class
    props.put("partitioner.class", "cn.itcast.kafka.partitioner.KafkaCustomPartitioner");

    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    // 2. Create the KafkaProducer
    KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(props);
    for (int i = 0; i < 100; i++) {
        // 3. Send data
        kafkaProducer.send(new ProducerRecord<String, String>("testpart", "0", "value" + i));
    }

    kafkaProducer.close();
}

3. Consumer code

Prerequisites for consuming
To consume data from a Kafka cluster, a consumer needs the following four things:

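# 1. Address
bootstrap.servers=node01:9092
# 2. Deserializers (a consumer deserializes, so it uses deserializer classes rather than serializers)
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
# 3. Topic: name the specific topic to consume (for example, order).
# 4. Consumer group
group.id=test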
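
A quick way to catch a missing setting before creating the consumer is to check the Properties object first. The helper below is a stdlib-only sketch, and its name ConsumerConfigCheck and method missingKeys are invented for illustration. It checks the three settings that are passed as properties (note that a consumer uses key.deserializer and value.deserializer); the topic is not a property and is supplied later through subscribe or assign.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for the consumer settings listed above. */
final class ConsumerConfigCheck {

    static final String[] REQUIRED = {
        "bootstrap.servers", "key.deserializer", "value.deserializer", "group.id"
    };

    /** Returns the required keys that are absent or blank in props. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

Calling missingKeys on an empty Properties object returns all four names, which makes a typo such as key.serializer on the consumer side easy to spot.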

3.1 Automatic offset commits

The offset is committed automatically after the data has been consumed.

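import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

/**
 * Consumes order data (javabean.toJson).
 */
public class OrderConsumer {
    public static void main(String[] args) {
        // 1. Connect to the cluster
        Properties props = new Properties();
        // Kafka broker address
        props.put("bootstrap.servers", "hadoop-01:9092");
        // Consumer group
        props.put("group.id", "test");

        // The next two settings make the consumer commit offsets automatically
        props.put("enable.auto.commit", "true");
        // Auto-commit interval in milliseconds
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<String, String>(props);
        // 2. Receive data: first subscribe to the topic to consume (order)
        kafkaConsumer.subscribe(Arrays.asList("order"));
        while (true) {
            // poll(100) waits up to 100 ms for records. For comparison: a JDK Queue uses offer to insert
            // and poll to retrieve; a BlockingQueue uses put to insert and take to retrieve.
            ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(100);
            for (ConsumerRecord<String, String> record : consumerRecords) {
                System.out.println("Consumed data: " + record.value());
            }
        }
    }
}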

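3.2 Manual offset commits

If the consumer has to process the data after fetching it and should confirm the offset only once processing has finished, the program must control the offset commit itself. To do this, turn off the auto-commit option.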

Turn off auto commit:
    props.put("enable.auto.commit", "false");
Commit the offset manually:
    kafkaConsumer.commitSync();
The complete code is as follows:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

/**
 * Consumes order data (javabean.toJson) and commits offsets manually.
 */
public class OrderConsumer {
    public static void main(String[] args) {
        // 1. Connect to the cluster
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test");
        // Turn off auto commit
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("test"));
        final int minBatchSize = 200;
        List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                buffer.add(record);
            }
            if (buffer.size() >= minBatchSize) {
                insertIntoDb(buffer);
                // Commit the offset manually, only after the batch has been stored
                consumer.commitSync();
                buffer.clear();
            }
        }
    }

    // Placeholder, not part of the original listing: replace with the real database write
    private static void insertIntoDb(List<ConsumerRecord<String, String>> records) {
        System.out.println("Inserting " + records.size() + " records");
    }
}
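
To see when the batching loop above actually commits, the buffering rule can be simulated without a broker. The sketch below is a stdlib-only model, with the made-up names BatchCommitSketch and commitPoints: each entry of pollSizes is the number of records returned by one poll, and the result lists the offset that would be committed after each flush, assuming a single partition that starts at offset 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Stdlib-only model of "buffer until minBatchSize, then store and commit". */
final class BatchCommitSketch {

    /** Returns the committed offset after each flush (single partition, starting at offset 0). */
    static List<Long> commitPoints(List<Integer> pollSizes, int minBatchSize) {
        List<Long> commits = new ArrayList<>();
        long position = 0; // offset of the next record to fetch
        int buffered = 0;  // records fetched but not yet stored
        for (int size : pollSizes) {
            position += size;
            buffered += size;
            if (buffered >= minBatchSize) {
                // In the real loop, insertIntoDb(buffer) runs here, followed by commitSync().
                commits.add(position);
                buffered = 0;
            }
        }
        return commits;
    }
}
```

With polls of 120, 120, 50 and 300 records and a minimum batch of 200, the model commits at offsets 240 and 590. Records that are buffered but still below the threshold stay uncommitted, so they would be fetched again if the process restarted at that point.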

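3.3 Committing offsets manually after consuming each partition

The example above uses commitSync to mark all received records as committed. In some cases you may want finer control over which records have been committed by specifying the offset explicitly. In the example below, we commit the offset after we finish processing the records of each partition.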

// Fragment: consumer (a subscribed KafkaConsumer<String, String>) and the boolean flag running are defined elsewhere
try {
    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(Long.MAX_VALUE);
        for (TopicPartition partition : records.partitions()) {
            List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
            for (ConsumerRecord<String, String> record : partitionRecords) {
                System.out.println(record.offset() + ": " + record.value());
            }
            long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
            // Commit the offset of the next record to read in this partition
            consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
        }
    }
} finally {
    consumer.close();
}

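Note:
The committed offset should always be the offset of the next message the application will read. So when calling commitSync(offsets), add one to the offset of the last message processed.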
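
The "last processed offset plus one" rule is easy to get wrong by one, so here is a small stdlib-only sketch of the arithmetic. The class CommitOffsetSketch and its method offsetsToCommit are illustrative names, and partitions are represented by plain strings instead of TopicPartition objects.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Computes, per partition, the offset to commit after processing a list of records. */
final class CommitOffsetSketch {

    /**
     * @param processed partition name mapped to the offsets processed, in order
     * @return partition name mapped to the offset to commit (last processed + 1)
     */
    static Map<String, Long> offsetsToCommit(Map<String, List<Long>> processed) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, List<Long>> entry : processed.entrySet()) {
            List<Long> offsets = entry.getValue();
            if (offsets.isEmpty()) {
                continue; // nothing processed, so nothing to commit for this partition
            }
            long lastProcessed = offsets.get(offsets.size() - 1);
            result.put(entry.getKey(), lastProcessed + 1); // offset of the next message to read
        }
        return result;
    }
}
```

For a partition whose processed offsets are 5, 6 and 7, the value to commit is 8. Committing 7 instead would cause the record at offset 7 to be delivered again after a restart.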

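3.4 Consuming from manually assigned partitions

Sometimes the consumer needs finer control over exactly which partitions it reads. For example:
1. If the process maintains some kind of local state associated with a partition (such as a key-value store on local disk), it should only fetch records for the partitions it maintains on disk.
2. If the process itself is highly available and will be restarted after a failure (perhaps by a cluster management framework such as YARN, Mesos, or AWS facilities, or as part of a stream processing framework), Kafka does not need to detect the failure and reassign the partitions, because the consuming process will be restarted on another machine.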

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
//consumer.subscribe(Arrays.asList("foo", "bar"));

// Manually assign the partitions to consume: start
String topic = "foo";
TopicPartition partition0 = new TopicPartition(topic, 0);
TopicPartition partition1 = new TopicPartition(topic, 1);
consumer.assign(Arrays.asList(partition0, partition1));
// Manually assign the partitions to consume: end
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    }
}

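3.5 Duplicate consumption and data loss

Notes:
1. Once data has been consumed, Kafka updates the offset recorded for the consumer group. When does that update happen? With automatic commits, the consumer commits periodically (every auto.commit.interval.ms) as it polls, for example after the records have been printed to the console.
2. Committing moves the group's offset to the position of the next message.
3. Suppose the data is fetched and then stored in HBase or MySQL. If HBase or MySQL cannot be reached at that moment, an exception is thrown. If the offset was already committed before the data was processed, the offset in Kafka has moved on while HBase or MySQL holds no data, so the data is lost.
4. When should the offset be committed? After the consumer has finished processing the data. Offsets are committed automatically by default, so this needs to be changed to manual commits.
5. If processing succeeds but the commit request fails (the connection to Kafka is lost or some other fault occurs), the offset is not updated. The next read of the same partition starts again from the offset that was already processed, so HBase or MySQL ends up with two identical rows, that is, the data is duplicated.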
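
The two failure modes in points 3 and 5 can be reproduced with a tiny in-memory simulation. This is a sketch under simplifying assumptions, not Kafka code: the partition is a plain list, the sink list stands in for HBase or MySQL, exactly one failure is injected at failAtOffset, and the names DeliverySemanticsSketch and run are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory model of "commit first" (may lose data) versus "process first" (may duplicate data). */
final class DeliverySemanticsSketch {

    /**
     * @param log                    messages of one partition, indexed by offset
     * @param commitBeforeProcessing true: commit, then write; false: write, then commit
     * @param failAtOffset           offset at which one failure is injected (-1 for none)
     * @return what ends up in the sink
     */
    static List<String> run(List<String> log, boolean commitBeforeProcessing, int failAtOffset) {
        List<String> sink = new ArrayList<>(); // stands in for HBase or MySQL
        int committed = 0;                     // committed offset of the consumer group
        boolean failedOnce = false;
        while (committed < log.size()) {
            int offset = committed;            // after a failure, reading resumes from the committed offset
            String message = log.get(offset);
            boolean failNow = offset == failAtOffset && !failedOnce;
            if (commitBeforeProcessing) {
                committed = offset + 1;        // offset is committed first
                if (failNow) {                 // the write to the sink fails: the message is lost
                    failedOnce = true;
                    continue;
                }
                sink.add(message);
            } else {
                sink.add(message);             // the write to the sink succeeds first
                if (failNow) {                 // the commit fails: the message will be read again
                    failedOnce = true;
                    continue;
                }
                committed = offset + 1;
            }
        }
        return sink;
    }
}
```

For the log [a, b, c] with a failure at offset 1, committing first yields [a, c] (b is lost), while processing first yields [a, b, b, c] (b is duplicated). Writing to the sink idempotently, for example by keying rows on topic, partition and offset, is one common way to make the duplicate harmless.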

3.6 How a consumer consumes data


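Description of the flow
The consumer connects to the leader broker of the given topic partition and pulls messages from the Kafka logs. Depending on the consumption mode, the offset is stored in different places.
The official site has an introduction to the high-level API and the low-level API:
(link address)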

4. Kafka Streams API development

Requirement: use the Streams API to read the data in the topic test, convert all of it to upper case, and write it to the topic test2.

Step 1: Create a topic

cd /export/servers/kafka_2.11-1.0.0/
bin/kafka-topics.sh --create  --partitions 3 --replication-factor 2 --topic test2 --zookeeper node01:2181,node02:2181,node03:2181

Step 2: Develop the Streams API program

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import java.util.Properties;

public class Stream {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Unique identifier of the application
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
        // Kafka cluster
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "node01:9092");
        // Default serializers and deserializers (serdes)
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Create the builder for the processing logic
        StreamsBuilder streamsBuilder = new StreamsBuilder();
        // Define the logic: read from test, upper-case each value, write to test2
        streamsBuilder.stream("test").mapValues(line -> line.toString().toUpperCase()).to("test2");
        // Build the Topology object (the processing graph)
        final Topology topology = streamsBuilder.build();
        // Create the Kafka Streams instance
        KafkaStreams streams = new KafkaStreams(topology, props);
        // Start stream processing
        streams.start();
    }
}
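
The per-record transformation in the topology is just a function applied to every value. As a rough analogy that runs without a broker, the same mapping can be tried on an in-memory list with java.util.stream. The class UpperCaseSketch and its mapValues method are invented for this illustration and are not part of Kafka Streams; Locale.ROOT is used here so the result does not depend on the machine's default locale.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** In-memory analogy of mapValues(value -> value.toUpperCase()). */
final class UpperCaseSketch {

    /** Applies the upper-case mapping to every value, preserving order. */
    static List<String> mapValues(List<String> values) {
        return values.stream()
                .map(value -> value.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }
}
```

Testing the mapping function this way before wiring it into the topology keeps the feedback loop short; the Kafka Streams version differs in that it processes records continuously as they arrive in the input topic.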

Step 3: Produce data
On node01, run the following commands to produce data to the topic test:

cd /export/servers/kafka_2.11-1.0.0
bin/kafka-console-producer.sh --broker-list node01:9092,node02:9092,node03:9092 --topic test

Step 4: Consume data
On node02, run the following commands to consume the data in the topic test2:

cd /export/servers/kafka_2.11-1.0.0
bin/kafka-console-consumer.sh --from-beginning  --topic test2 --zookeeper node01:2181,node02:2181,node03:2181

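That's all for this chapter. Your likes, favorites and shares are what keep me going.
--------------------------------------------------------------------------------------------------------------------------------------
I'm 小哪吒, a jack-of-all-trades in the internet industry (a brick that gets moved wherever it is needed)... haha

If you never try, every opportunity will pass you by.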
