Background
While running a project we often need to verify that each stage of the pipeline is working correctly. For the Kafka stage, that means checking whether the latest messages have actually arrived and what they contain, i.e. viewing the most recent n messages of a given topic (dumping all messages in a topic is slow and unnecessary). We could of course use third-party Kafka tools for this, but they are cumbersome to use and hard to integrate into a project.
Note: "third-party tools" here includes the Kafka command-line tools as well as other external tools.
Disclaimer: all code in this post was written or quoted to illustrate the topic. It is sample code, not production-ready code, and is provided for reference only.
The Basic Idea
With the KafkaConsumer class alone, a consumer either starts from the oldest message in the topic or from the newest. Here is the documentation's explanation of auto.offset.reset:
The default is “latest,” which means that lacking a valid offset,
the consumer will start reading from the newest records (records that were written after the consumer started running).
The alternative is “earliest,” which means that lacking a valid offset, the consumer will read all the data in the partition,
starting from the very beginning.
So a plain KafkaConsumer will not do. However, we can first commit the desired offsets for a given consumer group, and then have a KafkaConsumer read with that same group. The consumer will then start reading from the offsets we committed.
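Distilled to a single partition, the trick looks like the sketch below. This is only a sketch under a few assumptions: the group id peek-group is a placeholder, the group must have no active members when the commit is made, and the polling loop is omitted. The complete program in the next section generalizes it to every partition of the topic.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;
import java.util.Properties;

public class PeekIdeaSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "ubuntu:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "peek-group"); // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("topic01", 0); // one partition, for brevity
            long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
            long begin = consumer.beginningOffsets(Collections.singletonList(tp)).get(tp);
            // a committed offset is the position of the NEXT record the group will read,
            // so committing (end - 30) makes the group start 30 records before the end
            long start = Math.max(begin, end - 30);
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(start)));
            // a consumer that subscribes with this same group.id now resumes at 'start'
            consumer.subscribe(Collections.singletonList("topic01"));
        }
    }
}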
The Code
The complete code is here; feel free to star and fork it. Thanks!
package com.yq.consumer;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.DescribeTopicsResult;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.KafkaFuture;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.TopicPartitionInfo;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
/**
* This example reads the most recent 30 messages from each partition of the topic
* (if a partition holds fewer than 30 messages, it reads all that are there).
* className: ReceiveLatestMessageMain
*
* @author EricYang
* @version 2019/01/10 11:30
*/
@Slf4j
public class ReceiveLatestMessageMain {
    private static final int COUNT = 30;

    public static void main(String... args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "ubuntu:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "yq-consumer12");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 20);
        // offsets are committed explicitly below, so auto commit is disabled
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        // only takes effect when the group has no valid committed offset
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");

        System.out.println("create KafkaConsumer");
        final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        AdminClient adminClient = AdminClient.create(props);
        String topic = "topic01";
        try {
            DescribeTopicsResult topicResult = adminClient.describeTopics(Arrays.asList(topic));
            Map<String, KafkaFuture<TopicDescription>> descMap = topicResult.values();
            for (Map.Entry<String, KafkaFuture<TopicDescription>> entry : descMap.entrySet()) {
                System.out.println("key: " + entry.getKey());
                List<TopicPartitionInfo> topicPartitionInfoList = entry.getValue().get().partitions();
                topicPartitionInfoList.forEach((e) -> {
                    int partitionId = e.partition();
                    Node node = e.leader();
                    TopicPartition topicPartition = new TopicPartition(topic, partitionId);
                    // beginningOffsets/endOffsets return one entry per requested partition;
                    // we only asked for one partition, so look the value up directly
                    long beginOffset = consumer.beginningOffsets(Arrays.asList(topicPartition))
                            .get(topicPartition);
                    long lastOffset = consumer.endOffsets(Arrays.asList(topicPartition))
                            .get(topicPartition);
                    long expectedOffSet = lastOffset - COUNT;
                    // if the partition holds fewer than COUNT records, fall back to 1
                    // so that (1 - 1) = 0 is committed and the partition is read from the start
                    expectedOffSet = expectedOffSet > 0 ? expectedOffSet : 1;
                    System.out.println("Leader of partitionId: " + partitionId + " is " + node
                            + ". expectedOffSet:" + expectedOffSet
                            + ", beginOffset:" + beginOffset + ", lastOffset:" + lastOffset);
                    // a committed offset is where the group resumes reading;
                    // committing (expectedOffSet - 1) makes the next poll start there
                    consumer.commitSync(Collections.singletonMap(topicPartition,
                            new OffsetAndMetadata(expectedOffSet - 1)));
                });
            }
            // subscribing with the same group.id picks up the offsets committed above
            consumer.subscribe(Arrays.asList(topic));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("read offset=%d, key=%s, value=%s, partition=%d%n",
                            record.offset(), record.key(), record.value(), record.partition());
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
            System.out.println("Error while reading from Kafka: " + ex.getMessage());
        } finally {
            adminClient.close();
            consumer.close();
        }
    }
}
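A note on the design: committing offsets for the group only succeeds while the group has no active members, and it permanently moves the group's position. If the goal is just to peek at recent messages without disturbing any group state, the same read can be done with assign() and seek(), which bypass group management entirely. Below is a sketch of that variant, reusing the broker and topic names from the listing; it is a sketch, not a drop-in replacement.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class PeekLatestWithSeek {
    private static final int COUNT = 30;

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "ubuntu:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // no group.id: assign() bypasses group management, and nothing is committed

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            String topic = "topic01";
            List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
                    .map(p -> new TopicPartition(topic, p.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);

            // position each partition COUNT records before its end, but not before its beginning
            Map<TopicPartition, Long> begins = consumer.beginningOffsets(partitions);
            Map<TopicPartition, Long> ends = consumer.endOffsets(partitions);
            for (TopicPartition tp : partitions) {
                consumer.seek(tp, Math.max(begins.get(tp), ends.get(tp) - COUNT));
            }

            // one poll may not return every record; a real tool would loop
            // until it has reached the end offsets captured above
            ConsumerRecords<String, String> records = consumer.poll(1000);
            records.forEach(r -> System.out.printf("offset=%d, key=%s, value=%s, partition=%d%n",
                    r.offset(), r.key(), r.value(), r.partition()));
        }
    }
}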
Results
What the output shows:
We read only the most recent 30 messages from each partition of topic01; if a partition has fewer than 30, we read them all. Partition 0 holds 39 messages, so expectedOffSet = 39 - 30 = 9, the code commits 9 - 1 = 8, and reading starts at offset 8.
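To confirm programmatically where the committed offsets landed after a run, KafkaConsumer.committed can be queried per partition. A small sketch, reusing the group id and topic from the listing above and assuming the main program is not running at the same time:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Properties;

public class ShowCommittedOffsets {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "ubuntu:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "yq-consumer12");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            String topic = "topic01";
            // committed() asks the broker for the group's last committed offset
            consumer.partitionsFor(topic).forEach(pi -> {
                TopicPartition tp = new TopicPartition(topic, pi.partition());
                OffsetAndMetadata om = consumer.committed(tp);
                System.out.println(tp + " -> " + (om == null ? "no commit" : om.offset()));
            });
        }
    }
}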