- Kafka consumer: consume a topic starting from a timestamp
Seeking straight to the target offset, much like the command-line consumer does, starts almost immediately, so this is the recommended approach; the Spark Streaming variant (second example below) is noticeably slower to start.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.*;
public class ConsumerKafka {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // broker list as host:port
        props.put("group.id", "test");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("deserializer.encoding", "utf-8");
        props.put("auto.offset.reset", "latest");
        // SASL credentials; placeholders here, supply real values in practice
        String username = "user";
        String pwd = "password";
        // args[0] selects the auth mode: "1" = PLAIN, "2" = SCRAM-SHA-256, anything else = no SASL
        if (args.length > 0 && args[0].equals("1")) {
            props.put("sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"" + username + "\" password=\"" + pwd + "\";");
            props.put("security.protocol", "SASL_PLAINTEXT");
            props.put("sasl.mechanism", "PLAIN");
        } else if (args.length > 0 && args[0].equals("2")) {
            props.put("sasl.jaas.config", "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"" + username + "\" password=\"" + pwd + "\";");
            props.put("security.protocol", "SASL_PLAINTEXT");
            props.put("sasl.mechanism", "SCRAM-SHA-256");
        }
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(props);
        String topics = "msg"; // comma-separated list of topics
        kafkaConsumer.subscribe(Arrays.asList(topics.split(",")));
        // poll() until the group coordinator has actually assigned partitions to this consumer
        Set<TopicPartition> assignment = new HashSet<>();
        while (assignment.isEmpty()) {
            kafkaConsumer.poll(Duration.ofMillis(100));
            assignment = kafkaConsumer.assignment();
        }
        // Target timestamp: every assigned partition is rewound to the first offset
        // whose message timestamp is >= this instant
        SimpleDateFormat sf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date date = new Date();
        try {
            date = sf.parse("2021-06-16 12:38:34");
        } catch (ParseException e) {
            e.printStackTrace();
        }
        Map<TopicPartition, Long> map = new HashMap<>();
        for (TopicPartition tp : assignment) {
            map.put(tp, date.getTime());
        }
        // Resolve, per partition, the earliest offset with timestamp >= the target
        Map<TopicPartition, OffsetAndTimestamp> offsets = kafkaConsumer.offsetsForTimes(map);
        for (TopicPartition topicPartition : offsets.keySet()) {
            OffsetAndTimestamp offsetAndTimestamp = offsets.get(topicPartition);
            // null means the partition has no message at or after the timestamp;
            // such partitions are simply left alone here (see the sketch after the class)
            if (offsetAndTimestamp != null) {
                kafkaConsumer.seek(topicPartition, offsetAndTimestamp.offset());
            }
        }
        while (true) {
            ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                System.out.println(consumerRecord.value());
            }
        }
    }
}
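A caveat on the seek loop: offsetsForTimes() returns null for any partition that has no message at or after the requested timestamp, and the code above just leaves such partitions wherever auto.offset.reset puts them. A minimal sketch of an explicit fallback that seeks those partitions to their end instead (the helper class and method names are hypothetical, not part of the original):
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

// Hypothetical helper: seek each partition to the offset resolved for its
// timestamp, or to the partition's end when offsetsForTimes() found nothing.
public class SeekHelper {
    static void seekToResolvedOffsets(KafkaConsumer<String, String> consumer,
                                      Map<TopicPartition, OffsetAndTimestamp> offsets) {
        for (Map.Entry<TopicPartition, OffsetAndTimestamp> e : offsets.entrySet()) {
            if (e.getValue() != null) {
                consumer.seek(e.getKey(), e.getValue().offset());
            } else {
                // No message at/after the timestamp: start from the end of the partition
                consumer.seekToEnd(Collections.singletonList(e.getKey()));
            }
        }
    }
}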
- Spark Streaming: start consuming a Kafka topic from a given timestamp
import scala.collection.JavaConverters._
import scala.collection.mutable
import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.joda.time.format.DateTimeFormat

// Resolve, for every partition of the topic, the earliest offset whose
// message timestamp is >= the given "yyyy-MM-dd HH:mm:ss" time
def getOffsetByTimestamp(kafkaParams: collection.Map[String, Object], time: String, topic: String): mutable.HashMap[TopicPartition, Long] = {
  val consumer = new KafkaConsumer[String, String](new java.util.HashMap[String, Object](kafkaParams.asJava))
  val fetchTime = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").parseMillis(time)
  val timestampToSearch: java.util.Map[TopicPartition, java.lang.Long] = new java.util.HashMap[TopicPartition, java.lang.Long]()
  val partitionOffset = new mutable.HashMap[TopicPartition, Long]
  val partitionInfos = consumer.partitionsFor(topic)
  for (partitionInfo <- partitionInfos.asScala) {
    val tp = new TopicPartition(partitionInfo.topic(), partitionInfo.partition())
    timestampToSearch.put(tp, fetchTime)
  }
  val topicPartitionToOffsetAndTimestamp = consumer.offsetsForTimes(timestampToSearch)
  for ((tp, offsetAndTimestamp) <- topicPartitionToOffsetAndTimestamp.asScala) {
    // offsetsForTimes() maps a partition to null when it has no message
    // at or after the timestamp; skip those to avoid a NullPointerException
    if (offsetAndTimestamp != null) {
      partitionOffset += tp -> offsetAndTimestamp.offset()
    }
  }
  consumer.close()
  partitionOffset
}
// Hypothetical enclosing method (the original snippet showed only its body):
// build the direct stream with every partition positioned at the offset
// resolved for startTime
def createStreamFromTimestamp(ssc: StreamingContext, topics: Iterable[String], topic: String,
                              kafkaParams: collection.Map[String, Object],
                              startTime: String): InputDStream[ConsumerRecord[String, String]] = {
  val messages: InputDStream[ConsumerRecord[String, String]] = KafkaUtils.createDirectStream[String, String](
    ssc,
    PreferConsistent,
    Subscribe[String, String](topics, kafkaParams, getOffsetByTimestamp(kafkaParams, startTime, topic)))
  messages
}
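For completeness, a hypothetical driver showing how the two helpers above might be wired together; the broker address, batch interval, group id, and object name are all assumptions, not part of the original:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.kafka.common.serialization.StringDeserializer

object TimestampConsumerApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("consume-from-timestamp").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5)) // 5s batches, an arbitrary choice
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "test",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean))
    val topic = "msg"
    val messages = createStreamFromTimestamp(ssc, Seq(topic), topic, kafkaParams, "2021-06-16 12:38:34")
    messages.map(_.value()).print() // just print the message payloads
    ssc.start()
    ssc.awaitTermination()
  }
}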