Kafka

最新推荐文章于 2022-06-28 14:34:30 发布

山野闲者

最新推荐文章于 2022-06-28 14:34:30 发布

阅读量225

点赞数

分类专栏： Java 文章标签： kafka

本文链接：https://blog.csdn.net/lantuxin/article/details/110114376

版权

Java 专栏收录该内容

5 篇文章 0 订阅

订阅专栏

1.kafka入门

参考网站：https://kafka.apachecn.org/documentation.html

参考书籍：Kafka：The Definitive Guide

最主要的需要理解kafka的本质：一个消息中间件，或者就是一个消息队列，分布式流处理平台。

组成：生产者、消费者、topic、broker、partition

简单说明：

生产者可以向topic(一个主题，这是一个概念，也可以叫类别)中生产消息。生产者只会向一个topic的其中一个partition生产消息，但是一个partition中会有leader和follower两种角色。leader负责存储消息，follower负责备份同一个topic下的其他消息。

broker的概念更简单（一个broker就是kafka集群中的一台服务器节点）。

消费者从指定topic中进行消费，一个消费者组中的N个消费者(所有)不会重复消费一个topic下的内容，这N个消费者不会同时消费同一个partition中的消息（具体可参考：kafka-the-definitive书籍第4章）。即可理解为，一个消费者组是分工完成一个topic下的消费。如果需要多个消费者消费同样的数据，就定义多个消费者组。

1.1 kafka生产者

首先需要创建一个ProducerRecord，ProducerRecord中包含了topic以及发送的消息，也可以设置partition，不设置的话会根据key进行分配（kafka自带的分区器去做），ProducerRecord需要序列化之后才能传入kafka(这个传入网络都需要做序列化)。整个过程描述如下：

We start producing messages to Kafka by creating a ProducerRecord, which must include the topic we want to send the record to and a value. Optionally, we can also specify a key and/or a partition. Once we send the ProducerRecord, the first thing the producer will do is serialize the key and value objects to ByteArrays so they can be sent over the network.

Next, the data is sent to a partitioner. If we specified a partition in the ProducerRecord, the partitioner doesn’t do anything and simply returns the partition we specified. If we didn’t, the partitioner will choose a partition for us, usually based on the ProducerRecord key. Once a partition is selected, the producer knows which topic and partition the record will go to. It then adds the record to a batch of records that will also be sent to the same topic and partition. A separate thread is responsible for sending those batches of records to the appropriate Kafka brokers.

When the broker receives the messages, it sends back a response. If the messages were successfully written to Kafka, it will return a RecordMetadata object with the topic, partition, and the offset of the record within the partition. If the broker failed to write the messages, it will return an error. When the producer receives an error, it may retry sending the message a few more times before giving up and returning an error.

1.2 kafka消费者

消费者消费分多种情况

A.指定topic下有4个partition

①一个消费者组仅一个消费者，包揽指定topic下的所有partition的消费任务；

②一个消费者组两个消费者，两个消费者各自分工消费其中2个partition（具体消费哪两个是可以在程序中指定的），不重合。

③一个消费者组四个消费者，各消费一个partition

④一个消费者组超过4个消费者，会有一个空闲的消费者

B.需要重复消费

这里下面应该是Consumer Group 2，两个消费者组都可以消费到相同的数据。即重复消费。

C.交换partition对不同的consumer的所有权

参考网站：https://www.cnblogs.com/sodawoods-blogs/p/8969774.html

这种情况会存在前一个消费者已经消费了，后一个消费者如何知道这个partition是否还需要消费上一条消息？

The way consumers maintain membership in a consumer group and ownership of the partitions assigned to them is by sending heartbeats to a Kafka broker designated as the group coordinator (this broker can be different for different consumer groups). As long as the consumer is sending heartbeats at regular intervals, it is assumed to be alive, well, and processing messages from its partitions. Heartbeats are sent when the consumer polls (i.e., retrieves records) and when it commits records it has consumed.

①If the consumer stops sending heartbeats for long enough, its session will time out and the group coordinator will consider it dead and trigger a rebalance.

②If a consumer crashed and stopped processing messages, it will take the group coordinator a few seconds without heartbeats to decide it is dead and trigger the rebalance. During those seconds, no messages will be processed from the partitions owned by the dead consumer. When closing a consumer cleanly, the consumer will notify the group coordinator that it is leaving, and the group coordinator will trigger a rebalance immediately, reducing the gap in processing.

参考网站给了详细分析

2.指定时间戳消费

参考网站1：https://blog.csdn.net/creepcheck/article/details/105941293

参考网站2：https://blog.csdn.net/qq_39839075/article/details/105522855

最主要的思路就是需要使用offsetsForTimes指定对应时间点的偏移量，然后seek指定分区(可以是所有分区)到偏移量的位置，最后使用poll将消息拉下来。

Q:排查offsetsForTimes分区value为null的情况？

A:通过使用endOffsets与beginningOffsets发现对应的value相同，推测应该是该topic下没有数据，因此往topic下生产一条数据，发现endOffsets与beginningOffsets不同了。

package com.cmb.util.comsumer;

import org.apache.commons.lang3.time.DateUtils;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Duration;
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;

/**
 * 思路：
 * 1.给定历史时间段，将startTime通过offsetsForTimes解析成指定topic下，每个分区(partition)的偏移量(offset);
 * 2.使用seek，将每个分区(partition)的偏移量都设置为1中解析的offset；
 * 3.使用poll消费，直到msg的记录超过endTime(如果没有超过的记录，则通过tempOffset记录最新msg的时间对应的offset，
 * 第二次tempOffset没有更新时，也消费结束)
 *
 * @author
 * @date 2020/11/24 19:23
 */
public class TimeSlotConsumer {
    private static final Logger log = LogManager.getLogger(TimeSlotConsumer.class);
    private static KafkaConsumer<String, String> timeRangeConsumer;
    private static final String TIME_PATTERN = "yyyy-MM-dd HH:mm:ss";

    public static void setTimeSlotConsumer(KafkaConsumer<String, String> consumer) {
        TimeSlotConsumer.timeRangeConsumer = consumer;
    }

    public static Map<String, String> startConsumer(String topics, String startTime, String endTime) throws ParseException {
        long startTime1 = DateUtils.parseDate(startTime, TIME_PATTERN).getTime();
        long endTime1 = DateUtils.parseDate(endTime, TIME_PATTERN).getTime();

        ArrayList<TopicPartition> topicPartitionLists = new ArrayList<>();
        Map<String, String> historyMsg = new HashMap<>();

        /**获取topic中所有的分区信息*/
        log.info("获取topic({})所有的分区...", topics);
        List<PartitionInfo> partitionInfoList = timeRangeConsumer.partitionsFor(topics);
        /**将开始时间startTime1存入一个map*/
        Map<TopicPartition, Long> topicPartitionTimeMap = setPartitionWithTimeMap(partitionInfoList, startTime1);
        /**assign表明consumer不属于任何消费者组，subscribe使用了group管理*/
        timeRangeConsumer.subscribe(Arrays.asList(topics));
        timeRangeConsumer.poll(0);
        timeRangeConsumer.assignment();
        /**带时间戳的偏移量*/
        log.info("获取topic({})下各个分区，时间戳({})对应的偏移量...", topics, startTime);
        topicPartitionLists.forEach(tp -> topicPartitionTimeMap.computeIfAbsent(tp, tp1 -> startTime1));
        Map<TopicPartition, OffsetAndTimestamp> startOffsets = timeRangeConsumer.offsetsForTimes(topicPartitionTimeMap);
        AtomicReference<Boolean> isExistHistoryMsg = new AtomicReference<>(false);
        startOffsets.forEach((topicPartition, offsetAndTimestamp) -> {
            if (offsetAndTimestamp != null) {
                isExistHistoryMsg.set(true);
                log.info("msg开始时间：{}之后有历史消息", startTime);
            }
        });
        if (isExistHistoryMsg.get() == false) {
            return historyMsg;
        }
        seek2TempOffsets(startOffsets);

        log.info("开始消费，msg开始时间：{}", startTime);
        Long tempTime = startTime1;
        /**poll中的参数决定了数据不可用的阻塞时间间隔，超过时间限制，partition就会被分配给其他consumer*/
        while (true) {
            ConsumerRecords<String, String> consumerRecords = timeRangeConsumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : consumerRecords) {
                if (record.timestamp() > endTime1) {
                    log.info("消费结束，msg结束时间：{}", timeFormat(record.timestamp()));
                    break;
                }
                historyMsg.put(timeFormat(record.timestamp()), record.value());
                tempTime = record.timestamp();
            }
            /**根据consumerRecords中的最后一条记录时间，获取偏移量*/
            Map<TopicPartition, Long> topicPartitionLongMap = setPartitionWithTimeMap(partitionInfoList, tempTime);
            Map<TopicPartition, OffsetAndTimestamp> tempOffsets = timeRangeConsumer.offsetsForTimes(topicPartitionLongMap);
            /**用一个tempOffsets过度，如果tempOffsets与startOffsets相同，则没有新消息消费，结束；*/
            if (tempOffsets.equals(startOffsets)) {
                break;
            } else {
                startOffsets = tempOffsets;
            }
            seek2TempOffsets(tempOffsets);
        }
        return historyMsg;
    }

    /**
     * 将每个分区seek到当前时间对应偏移量
     *
     * @param offsets
     */
    public static void seek2TempOffsets(Map<TopicPartition, OffsetAndTimestamp> offsets) {
        for (TopicPartition topicPartition : offsets.keySet()) {
            OffsetAndTimestamp offsetAndTimestamp = offsets.get(topicPartition);
            if (offsetAndTimestamp != null) {
                /**找到startTime1时间点的，所有topic下的partition的offset*/
                timeRangeConsumer.seek(topicPartition, offsetAndTimestamp.offset());
            }
        }
    }

    /**
     * 将时间time与partition封装成一个map
     *
     * @param partitionInfoList
     * @param time
     * @return
     */
    public static Map<TopicPartition, Long> setPartitionWithTimeMap(List<PartitionInfo> partitionInfoList, Long time) {
        Map<TopicPartition, Long> topicPartitionTimeMap = new HashMap<>();
        for (PartitionInfo partitionInfo : partitionInfoList) {
            TopicPartition topicPartition = new TopicPartition(partitionInfo.topic(), partitionInfo.partition());
            topicPartitionTimeMap.put(topicPartition, Long.valueOf(time));
        }
        return topicPartitionTimeMap;
    }

    public static String timeFormat(Long time) {
        Date date = new Date(time);
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat(TIME_PATTERN);
        return simpleDateFormat.format(date);
    }

    public static void main(String[] args) {
        //1.setTimeSlotConsumer();
        //2.startConsumer();
    }
}

3.多消费者消费多个topic

参考网站1：https://ask.csdn.net/questions/701345

参考网站2：https://blog.csdn.net/shmily_lsl/article/details/81877447

重平衡（rebalance）问题：当新的消费者加入消费组，它会消费一个或多个分区，而这些分区之前是由其他消费者负责的；另外，当消费者离开消费组（比如重启、宕机等）时，它所消费的分区会分配给其他分区。

往一个消费者组添加消费者时，会触发rebalance。

4.kafa命令行操作

4.1 查看kafka中的topic

./bin/kafka-topics.sh --zookeeper 127.0.0.1:2181 --list

4.2 查看指定topic下面的分区

./bin/kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic test_topic0 --describe

这里是test_topic0下面的情况：

Topic:test_topic0       PartitionCount:2        ReplicationFactor:2     Configs:
        Topic: test_topic0      Partition: 0    Leader: 7       Replicas: 7,1   Isr: 7,1
        Topic: test_topic0      Partition: 1    Leader: 8       Replicas: 8,2   Isr: 8,2

4.3 查看所有的groupId

./bin/kafka-consumer-groups.sh --zookeeper 127.0.0.1:2181 --list

4.4 创建topic

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test_topic0

4.5 生产者发送

./bin/kafka-console-producer.sh --broker-list 0.0.0.0:9092 --topic test_topic0

4.6 消费者接收

./bin/kafka-console-consumer.sh --new-consumer --bootstrap-server 0.0.0.0:9092 --topic test_topic0 --from-beginning

山野闲者

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

专栏目录