First, here is the DataStream API function for consuming from Kafka. The following points are then described in detail:
-
Choosing the Kafka starting read position
-
(to be added)
public static KafkaSource<String> getKafkaSource(String topic, String groupId, String[] args) {
    ParameterTool parameterTool = ParameterTool.fromArgs(args);
    // Command-line arguments override the defaults (KAFKA_SERVER is a constant defined elsewhere)
    String bootstrapServers = parameterTool.get("bootstrap", KAFKA_SERVER);
    topic = parameterTool.get("topic", topic);
    if (topic == null) {
        throw new IllegalArgumentException("Topic name must not be empty: no command-line argument was given and there is no default value!");
    }
    KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers(bootstrapServers)
            .setTopics(topic)
            .setGroupId(groupId)
            // Start from the committed offsets; fall back to the latest offset if none are found
            .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.LATEST))
            // Note: SimpleStringSchema cannot handle null messages (e.g. Kafka tombstone records),
            // so a custom deserialization schema is used here
            .setValueOnlyDeserializer(new DeserializationSchema<String>() {
                @Override
                public String deserialize(byte[] message) throws IOException {
                    if (message != null) {
                        return new String(message, StandardCharsets.UTF_8);
                    }
                    return null;
                }

                @Override
                public boolean isEndOfStream(String nextElement) {
                    return false;
                }

                @Override
                public TypeInformation<String> getProducedType() {
                    return TypeInformation.of(String.class);
                }
            })
            .build();
    return source;
}
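The null check in the custom deserializer matters because Kafka tombstone records carry a null value, which SimpleStringSchema cannot handle. The same logic can be sketched as plain Java with no Flink dependency (the class and helper names here are ours, for illustration only):

```java
import java.nio.charset.StandardCharsets;

public class NullSafeDeserializeDemo {
    // Same logic as the anonymous DeserializationSchema above:
    // map a null payload (Kafka tombstone) to null instead of throwing
    static String deserialize(byte[] message) {
        if (message != null) {
            return new String(message, StandardCharsets.UTF_8);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(deserialize("hello".getBytes(StandardCharsets.UTF_8))); // hello
        System.out.println(deserialize(null)); // null
    }
}
```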
StartingOffsets
Earliest offset (Earliest)
Start consuming every partition from its earliest (smallest) available offset, regardless of any committed offsets:
KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("kafka:9092")
        .setTopics("topic")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .setStartingOffsets(OffsetsInitializer.earliest())
        .build();
Latest offset (Latest)
Start consuming every partition from its latest (largest) offset, i.e. read only records produced after the job starts. For example:
KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("kafka:9092")
        .setTopics("topic")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .setStartingOffsets(OffsetsInitializer.latest())
        .build();
已提交的偏移(Committed)
如果消费者组中的所有分区都找到了已提交的偏移量,那么将从这些位置开始消费。如果任一分区没有找到已提交的偏移量,那么将根据 OffsetResetStrategy
参数(可以是 earliest
或 latest
)来确定开始消费的位置
KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("kafka:9092")
        .setTopics("topic")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.LATEST))
        .build();
Specific offsets (Specific Offsets)
You can also specify an offset for each topic partition and start consuming from those offsets (note that the partitions in the map must belong to the subscribed topic):
Map<TopicPartition, Long> specificStartOffsets = new HashMap<>();
specificStartOffsets.put(new TopicPartition("topic", 0), 23L);
KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("kafka:9092")
        .setTopics("topic")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .setStartingOffsets(OffsetsInitializer.offsets(specificStartOffsets))
        .build();
Timestamp (Timestamp)
You can specify a timestamp in milliseconds since the Unix epoch; each partition starts from the first record whose timestamp is greater than or equal to it:
KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("kafka:9092")
        .setTopics("topic")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .setStartingOffsets(OffsetsInitializer.timestamp(...))
        .build();
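The argument to the timestamp initializer is an epoch timestamp in milliseconds. A minimal sketch of computing such a value with java.time (the class and helper names are ours; "one hour ago" is just an example choice):

```java
import java.time.Duration;
import java.time.Instant;

public class StartTimestampDemo {
    // Compute an epoch-millisecond timestamp one hour before the given instant,
    // suitable as the long argument to a timestamp-based offsets initializer
    static long oneHourAgoMillis(Instant now) {
        return now.minus(Duration.ofHours(1)).toEpochMilli();
    }

    public static void main(String[] args) {
        long ts = oneHourAgoMillis(Instant.now());
        System.out.println(ts);
    }
}
```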