Kafka: fetching records by timestamp in a Java consumer loop

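I am using a Kafka 0.10.2.1 cluster. I use Kafka's offsetsForTimes API to seek to a specific offset, and I want to break out of the loop once the end timestamp is reached.

My code looks like this: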

//package kafka.ex.test;

import java.util.*;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class ConsumerGroup {

    // Looks up the earliest offset in the partition whose timestamp is >= startTime.
    public static OffsetAndTimestamp fetchOffsetByTime(KafkaConsumer<String, String> consumer, TopicPartition partition, long startTime) {
        Map<TopicPartition, Long> query = new HashMap<>();
        query.put(partition, startTime);

        final Map<TopicPartition, OffsetAndTimestamp> offsetResult = consumer.offsetsForTimes(query);
        if (offsetResult == null || offsetResult.isEmpty()) {
            System.out.println(" No Offset to Fetch ");
            return null;
        }

        // The map holds a null value when the partition has no record at or after startTime.
        final OffsetAndTimestamp offsetTimestamp = offsetResult.get(partition);
        if (offsetTimestamp == null) {
            System.out.println("No Offset Found for partition : " + partition.partition());
        }
        return offsetTimestamp;
    }

    // Assigns every partition of the topic and seeks each one to the first offset at or after startTime.
    public static KafkaConsumer<String, String> assignOffsetToConsumer(KafkaConsumer<String, String> consumer, String topic, long startTime) {
        final List<PartitionInfo> partitionInfoList = consumer.partitionsFor(topic);
        System.out.println("Number of Partitions : " + partitionInfoList.size());

        final List<TopicPartition> topicPartitions = new ArrayList<>();
        for (PartitionInfo pInfo : partitionInfoList) {
            TopicPartition partition = new TopicPartition(topic, pInfo.partition());
            topicPartitions.add(partition);
        }
        consumer.assign(topicPartitions);

        for (TopicPartition partition : topicPartitions) {
            OffsetAndTimestamp offSetTs = fetchOffsetByTime(consumer, partition, startTime);
            if (offSetTs == null) {
                // Nothing at or after startTime: skip to the end of this partition.
                System.out.println("No Offset Found for partition : " + partition.partition());
                consumer.seekToEnd(Arrays.asList(partition));
            } else {
                System.out.println(" Offset Found for partition : " + offSetTs.offset() + " " + partition.partition());
                System.out.println("FETCH offset success" +
                        " Offset " + offSetTs.offset() +
                        " offSetTs " + offSetTs);
                consumer.seek(partition, offSetTs.offset());
            }
        }
        return consumer;
    }

    public static void main(String[] args) throws Exception {
        // Arguments: topic, group id, bootstrap servers, start timestamp, end timestamp
        String topic = args[0];
        String group = args[1];
        String bootstrapServers = args[2];
        long start_time_Stamp = Long.parseLong(args[3]);
        long end_time_Stamp = Long.parseLong(args[4]);

        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", group);
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        assignOffsetToConsumer(consumer, topic, start_time_Stamp);
        System.out.println("Subscribed to topic " + topic);

        boolean reachedEnd = false;
        int arr[] = {0, 0, 0, 0, 0}; // per-partition "seen" flags; assumes at most 5 partitions

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(6000);
            int count = 0;
            for (ConsumerRecord<String, String> record : records) {
                count++;
                if (arr[record.partition()] == 0) {
                    arr[record.partition()] = 1;
                }
                // Stops at the first record at or past the end timestamp, whichever partition it came from.
                if (record.timestamp() >= end_time_Stamp) {
                    reachedEnd = true;
                    break;
                }
                System.out.println("record=>" + " offset="
                        + record.offset()
                        + " timestamp=" + record.timestamp()
                        + " :" + record);
                System.out.println("recordcount = " + count + " bitmap" + Arrays.toString(arr));
            }
            if (reachedEnd) break;
            if (records == null || records.isEmpty()) break; // don't wait for records
        }
        consumer.close();
    }
}

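I am facing the following problems:

consumer.poll returns nothing even with a timeout of 1000 milliseconds. With 1000 milliseconds I had to poll several times in a loop, so I now use a very large value. But given that I have already seeked to the relevant offset within each partition, how can I reliably set the poll timeout so that data is returned immediately?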

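My observation is that when data is returned, it does not always come from all partitions. Even when data is returned from all partitions, not all records are returned. The topic holds more than 1000 records, but the number actually fetched and printed in the loop is smaller (about 200). Is there anything wrong with how I am currently using the Kafka API?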
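As far as I understand it, this is consistent with how poll behaves: each call returns only what has already been fetched, capped by max.poll.records, and a single batch may cover only some of the assigned partitions. An empty or short batch is therefore not proof that the range is exhausted. Below is a minimal sketch of tolerating a few consecutive empty polls before giving up. It is plain Java with no Kafka dependency; `EmptyPollTolerance`, `drain`, and the `Supplier` that stands in for `consumer.poll(timeout)` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class EmptyPollTolerance {

    /**
     * Calls poll repeatedly and collects everything it returns.
     * Stops only after maxEmptyPolls consecutive empty batches,
     * so a single empty batch does not end the loop.
     */
    static <T> List<T> drain(Supplier<List<T>> poll, int maxEmptyPolls) {
        List<T> collected = new ArrayList<>();
        int emptyInARow = 0;
        while (emptyInARow < maxEmptyPolls) {
            List<T> batch = poll.get(); // stand-in for consumer.poll(timeout)
            if (batch.isEmpty()) {
                emptyInARow++;
            } else {
                emptyInARow = 0; // data arrived, so reset the counter
                collected.addAll(batch);
            }
        }
        return collected;
    }
}
```

In the consumer loop this would amount to replacing the unconditional break on an empty batch with a counter. The value of `maxEmptyPolls` is a judgment call that depends on the poll timeout and broker latency.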

How can I reliably break out of the loop once all the data between the start and end timestamps has been fetched, and not prematurely?
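One possible approach is to decide completion per partition rather than for the consumer as a whole: work out an exclusive stop offset for each partition before polling, and finish only when every partition has reached its stop offset. With the real client, the stop offset could presumably come from offsetsForTimes queried with the end timestamp, falling back to endOffsets when a partition has no record at or after that timestamp (to my knowledge both exist in the 0.10.1+ consumer). The sketch below models only the bookkeeping in plain Java. `PartitionRangeTracker` and its methods are hypothetical, and partitions are identified by their integer id.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PartitionRangeTracker {
    // partition id -> first offset that must NOT be consumed (exclusive bound)
    private final Map<Integer, Long> stopOffsets = new HashMap<>();
    private final Set<Integer> finished = new HashSet<>();

    /** Registers a partition with its starting position and exclusive stop offset. */
    void register(int partition, long startOffset, long stopOffset) {
        stopOffsets.put(partition, stopOffset);
        finished.remove(partition);
        updatePosition(partition, startOffset); // an empty window finishes immediately
    }

    /** True if a record at this offset lies inside the requested window. */
    boolean inRange(int partition, long offset) {
        Long stop = stopOffsets.get(partition);
        return stop != null && offset < stop;
    }

    /**
     * Records the next offset the consumer will fetch for the partition and
     * marks the partition finished once that position reaches the stop offset.
     */
    void updatePosition(int partition, long nextOffset) {
        Long stop = stopOffsets.get(partition);
        if (stop != null && nextOffset >= stop) {
            finished.add(partition);
        }
    }

    /** True once every registered partition has reached its stop offset. */
    boolean allFinished() {
        return finished.size() == stopOffsets.size();
    }
}
```

In a poll loop, a record would be processed only if `inRange` returns true, `updatePosition` would be fed the value of `consumer.position(partition)` after each batch, and the loop would exit when `allFinished` returns true. Comparing positions rather than waiting for a particular record also covers partitions that deliver no record at or after the end timestamp.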
