1. Flink Kafka read strategy
The available partition assignment strategies for reading from Kafka are:
- org.apache.kafka.clients.consumer.RangeAssignor
- org.apache.kafka.clients.consumer.RoundRobinAssignor
- org.apache.kafka.clients.consumer.StickyAssignor
- org.apache.kafka.clients.consumer.CooperativeStickyAssignor
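The Kafka client default is RangeAssignor. In Flink SQL it can be changed as follows:
'properties.partition.assignment.strategy' = 'org.apache.kafka.clients.consumer.RoundRobinAssignor'
Note: the Flink Kafka connector appears to assign partitions to its subtasks itself (manual assignment) rather than through Kafka consumer-group rebalancing, so check whether this setting has any effect in your Flink version.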
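2. Flink Kafka write strategy
2.1 Default constructor
In the common case, the constructor
FlinkKafkaProducer(String topicId,SerializationSchema<IN> serializationSchema,Properties producerConfig)
is used, and the partitioner defaults to FlinkFixedPartitioner. Its formula is parallelInstanceId % partitions.length, so each sink subtask always writes to one fixed partition determined by its subtask index. For example, with 5 Flink sink subtasks and 3 Kafka partitions (both numbered from 0), the mapping is 0 -> 0, 1 -> 1, 2 -> 2, 3 -> 0, 4 -> 1.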
The code is as follows:
// targetTopic is the name of the target topic, partitions holds the Kafka partition ids of that topic,
// and parallelInstanceId is the index of the current sink subtask within the sink parallelism
@Override
public int partition(T record, byte[] key, byte[] value, String targetTopic, int[] partitions) {
    Preconditions.checkArgument(
        partitions != null && partitions.length > 0,
        "Partitions of the target topic is empty.");
    return partitions[parallelInstanceId % partitions.length];
}
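Question: under this strategy, if the number of sink subtasks is smaller than the number of partitions of the topic, will some partitions receive no data?
Yes. By the formula, a sink with N subtasks only ever writes to the first N entries of partitions, so with N < partitions.length the remaining partitions stay empty. If that is not acceptable, raise the sink parallelism or use a different partitioner.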
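The mapping can be checked with a small stand-alone sketch. It only reproduces the formula quoted above and has no Flink dependency; the class name FixedPartitionSketch and the helper writersPerPartition are hypothetical names introduced for illustration.

```java
import java.util.Arrays;

/** Stand-alone sketch of the FlinkFixedPartitioner formula (hypothetical helper, not Flink code). */
final class FixedPartitionSketch {

    /** Same formula as the partition method quoted above. */
    static int partitionFor(int parallelInstanceId, int[] partitions) {
        if (partitions == null || partitions.length == 0) {
            throw new IllegalArgumentException("Partitions of the target topic is empty.");
        }
        return partitions[parallelInstanceId % partitions.length];
    }

    /** For each position in partitions, counts how many sink subtasks write to it. */
    static int[] writersPerPartition(int parallelism, int[] partitions) {
        int[] writers = new int[partitions.length];
        for (int subtask = 0; subtask < parallelism; subtask++) {
            writers[subtask % partitions.length]++;
        }
        return writers;
    }

    public static void main(String[] args) {
        // 5 subtasks, 3 partitions: every partition has at least one writer.
        System.out.println(Arrays.toString(writersPerPartition(5, new int[] {0, 1, 2})));
        // 2 subtasks, 4 partitions: the last two partitions have no writer.
        System.out.println(Arrays.toString(writersPerPartition(2, new int[] {0, 1, 2, 3})));
    }
}
```

With 2 subtasks and 4 partitions, the count for the last two partitions is 0, which is the "empty partition" case asked about above.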
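2.2 Custom Kafka schema
When a custom KafkaSerializationSchema is used together with the constructor
FlinkKafkaProducer(String defaultTopic,KafkaSerializationSchema<IN> serializationSchema,Properties producerConfig,FlinkKafkaProducer.Semantic semantic)
the Flink partitioner is null.
In that case the following calls are made in sequence:
record = kafkaSchema.serialize(next, context.timestamp());
transaction.producer.send(record, callback);
send is the Kafka producer's own send method. If the record returned by the schema does not specify a partition, Kafka's DefaultPartitioner chooses one. Its code is as follows: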
/**
* The default partitioning strategy:
* <ul>
* <li>If a partition is specified in the record, use it
* <li>If no partition is specified but a key is present choose a partition based on a hash of the key
* <li>If no partition or key is present choose the sticky partition that changes when the batch is full.
*
* See KIP-480 for details about sticky partitioning.
*/
public class DefaultPartitioner implements Partitioner {
private final StickyPartitionCache stickyPartitionCache = new StickyPartitionCache();
public void configure(Map<String, ?> configs) {}
/**
* Compute the partition for the given record.
*
* @param topic The topic name
* @param key The key to partition on (or null if no key)
* @param keyBytes serialized key to partition on (or null if no key)
* @param value The value to partition on or null
* @param valueBytes serialized value to partition on or null
* @param cluster The current cluster metadata
*/
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
if (keyBytes == null) {
return stickyPartitionCache.partition(topic, cluster);
}
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
int numPartitions = partitions.size();
// hash the keyBytes to choose a partition
return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}
public void close() {}
/**
* If a batch completed for the current sticky partition, change the sticky partition.
* Alternately, if no sticky partition has been determined, set one.
*/
public void onNewBatch(String topic, Cluster cluster, int prevPartition) {
stickyPartitionCache.nextPartition(topic, cluster, prevPartition);
}
}
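That is:
- If a partition is specified in the record, the record is written to that partition
- If no partition is specified but a key is present, the partition is computed from a hash of the key
- If neither a partition nor a key is present, sticky partitioning is used: records go to one randomly chosen partition until the batch is full, then a new partition is chosen, which balances the load over time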
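The three rules can be sketched in a few lines. This is an illustration under stated assumptions, not Kafka's implementation: PartitionChoiceSketch and choose are hypothetical names, Arrays.hashCode stands in for Utils.murmur2 (so the partition numbers will differ from real Kafka), and the sticky partition is simply passed in instead of being read from StickyPartitionCache. In the real client the explicit-partition check is made before the partitioner is consulted.

```java
import java.util.Arrays;

/** Illustrative sketch of the three partition-selection rules (not Kafka code). */
final class PartitionChoiceSketch {

    /**
     * @param recordPartition partition set on the record, or null if none
     * @param keyBytes        serialized key, or null if the record has no key
     * @param numPartitions   number of partitions of the topic
     * @param stickyPartition current sticky partition (assumed to be tracked elsewhere)
     */
    static int choose(Integer recordPartition, byte[] keyBytes, int numPartitions, int stickyPartition) {
        if (recordPartition != null) {
            return recordPartition;               // rule 1: explicit partition wins
        }
        if (keyBytes != null) {
            int hash = Arrays.hashCode(keyBytes); // stand-in for Utils.murmur2(keyBytes)
            return (hash & 0x7fffffff) % numPartitions; // clear the sign bit, then take the modulus
        }
        return stickyPartition;                   // rule 3: keep using the sticky partition
    }

    public static void main(String[] args) {
        byte[] key = "order-42".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println(choose(2, key, 6, 4));     // explicit partition
        System.out.println(choose(null, key, 6, 4));  // hash of the key
        System.out.println(choose(null, null, 6, 4)); // sticky partition
    }
}
```

The property that matters for keyed records is that the same key always maps to the same partition as long as the partition count does not change.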
The code of StickyPartitionCache.nextPartition is as follows:
public int nextPartition(String topic, Cluster cluster, int prevPartition) {
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
Integer oldPart = indexCache.get(topic);
Integer newPart = oldPart;
// Check that the current sticky partition for the topic is either not set or that the partition that
// triggered the new batch matches the sticky partition that needs to be changed.
if (oldPart == null || oldPart == prevPartition) {
List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
if (availablePartitions.size() < 1) {
Integer random = Utils.toPositive(ThreadLocalRandom.current().nextInt());
newPart = random % partitions.size();
} else if (availablePartitions.size() == 1) {
newPart = availablePartitions.get(0).partition();
} else {
while (newPart == null || newPart.equals(oldPart)) {
Integer random = Utils.toPositive(ThreadLocalRandom.current().nextInt());
newPart = availablePartitions.get(random % availablePartitions.size()).partition();
}
}
// Only change the sticky partition if it is null or prevPartition matches the current sticky partition.
if (oldPart == null) {
indexCache.putIfAbsent(topic, newPart);
} else {
indexCache.replace(topic, prevPartition, newPart);
}
return indexCache.get(topic);
}
return indexCache.get(topic);
}
For details on the sticky partitioner, see "Apache Kafka Producer Improvements: Sticky Partitioner": https://www.confluent.io/blog/apache-kafka-producer-improvements-sticky-partitioner/
To be organized: load balancing, see "kafka的Rebalance问题分析" (analysis of Kafka rebalance problems) on 大叶子不小's CSDN blog.
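3. Flink committing Kafka offsets
3.1 Offset commit rules
The Flink Kafka Consumer allows configuring how offsets are committed back to the Kafka brokers. The Flink Kafka Consumer does not rely on committed offsets for its fault tolerance guarantees. Committed offsets are only a means of exposing the consumer's progress for monitoring.
How the offset commit behavior is configured differs depending on whether checkpointing is enabled for the job.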
- Checkpointing disabled: the Flink Kafka Consumer relies on the automatic periodic offset commit of the internally used Kafka client. To enable or disable committing, set enable.auto.commit (default true) and auto.commit.interval.ms (default 5000) in the provided Properties. See the Kafka documentation or the org.apache.kafka.clients.consumer.ConsumerConfig class for details on both keys.
- Checkpointing enabled: when a checkpoint completes, the Flink Kafka Consumer commits the offsets stored in the checkpointed state. This keeps the offsets committed to the Kafka brokers consistent with the offsets in the checkpointed state. Committing can be enabled or disabled by calling setCommitOffsetsOnCheckpoints(boolean) on the consumer (the default is true). Note that in this scenario the automatic periodic offset commit settings in Properties are completely ignored.
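For the case without checkpointing, the two keys can be set on the Properties object that is handed to the consumer. The following is a minimal sketch using only java.util.Properties; the class name AutoCommitPropsSketch, the broker address and the group id are placeholders chosen for illustration.

```java
import java.util.Properties;

/** Builds consumer Properties with explicit auto-commit settings (illustrative sketch). */
final class AutoCommitPropsSketch {

    static Properties autoCommitProperties(boolean enabled, long intervalMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "example-group");           // placeholder group id
        props.setProperty("enable.auto.commit", Boolean.toString(enabled));
        props.setProperty("auto.commit.interval.ms", Long.toString(intervalMs));
        return props;
    }

    public static void main(String[] args) {
        // Commit every 5 seconds, which matches the Kafka client default.
        Properties props = autoCommitProperties(true, 5000L);
        System.out.println(props.getProperty("enable.auto.commit"));
        System.out.println(props.getProperty("auto.commit.interval.ms"));
    }
}
```

These Properties would be passed to the Flink Kafka Consumer constructor. With checkpointing enabled the two auto-commit keys have no effect, and setCommitOffsetsOnCheckpoints(boolean) is the switch to use instead.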
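3.2 Last Committed Offset
The Last Committed Offset is the offset that the consumer group has already committed. It records the current consumption position and is used to locate the starting offset the next time consumption begins.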
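3.3 Lag monitoring
If checkpointing is enabled with an interval of 10 minutes (so offsets are committed once every 10 minutes), monitoring Kafka lag through the Last Committed Offset is clearly misleading, because the committed offset can be up to one interval behind what has actually been consumed. Kafka provides metrics for this purpose, such as records-lag-max, the maximum lag over the consumed partitions. The lag of a partition is its Log End Offset (LEO) minus the consumer's Current Position Offset. In Flink this metric can be reported to Prometheus for monitoring, and Flink also exposes some metrics of its own that can be used.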
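The difference between the two ways of measuring lag can be shown with plain arithmetic. The sketch below uses made-up offsets for a single partition; LagSketch and its method are hypothetical names, and no Kafka or Flink API is involved.

```java
/** Compares lag computed from the current position with lag computed from the committed offset. */
final class LagSketch {

    /** Lag in records: log end offset minus the given consumer offset. */
    static long lag(long logEndOffset, long consumerOffset) {
        return logEndOffset - consumerOffset;
    }

    public static void main(String[] args) {
        long logEndOffset = 10_000L;    // hypothetical LEO of the partition
        long currentPosition = 9_950L;  // hypothetical position the consumer has read up to
        long lastCommitted = 4_000L;    // hypothetical offset committed at the last checkpoint

        // Lag as reported by position-based metrics such as records-lag.
        System.out.println(lag(logEndOffset, currentPosition));
        // Lag as seen by a tool that only looks at committed offsets.
        System.out.println(lag(logEndOffset, lastCommitted));
    }
}
```

With these numbers the consumer is 50 records behind, while a monitor based on the committed offset would report 6000, which is why the position-based metric is the better signal when commits only happen at checkpoints.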