1. Kafka Overview
1.1 Kafka Architecture
(1) Point-to-point messaging
(2) Publish-subscribe messaging
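The difference between the two models can be sketched in plain Java (no Kafka involved, just toy data structures): in a point-to-point queue each message is delivered to exactly one of the competing consumers, while in publish-subscribe every subscriber receives its own copy of every message.

```java
import java.util.ArrayList;
import java.util.List;

public class MessagingModels {

    // Point-to-point: consumers compete, each message goes to exactly one of them.
    static List<List<String>> pointToPoint(List<String> messages, int consumers) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < consumers; i++) inboxes.add(new ArrayList<>());
        int next = 0;
        for (String m : messages) {
            inboxes.get(next++ % consumers).add(m); // one receiver per message
        }
        return inboxes; // total deliveries == messages.size()
    }

    // Publish-subscribe: every subscriber gets its own copy of every message.
    static List<List<String>> publishSubscribe(List<String> messages, int subscribers) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < subscribers; i++) {
            inboxes.add(new ArrayList<>(messages)); // broadcast a copy to each subscriber
        }
        return inboxes; // total deliveries == messages.size() * subscribers
    }

    public static void main(String[] args) {
        List<String> msgs = List.of("m1", "m2", "m3");
        System.out.println(pointToPoint(msgs, 2));
        System.out.println(publishSubscribe(msgs, 2));
    }
}
```

Kafka combines both: within one consumer group messages behave point-to-point, while different groups each get the full stream, pub-sub style.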
1.2 Kafka Components
1. Producer: the message producer, i.e. the client that sends messages to a Kafka broker.
2. Consumer: the message consumer, i.e. the client that pulls messages from a Kafka broker.
3. Topic: every message published to a Kafka cluster has a category, called its Topic. (Physically, messages of different Topics are stored separately; logically, a Topic's messages may live on one or more brokers, but users only need to specify the Topic to produce or consume data, without caring where it is stored.)
4. Broker: a Kafka cluster consists of one or more servers; each such server is called a broker.
5. Partition: a physical concept; each Topic contains one or more Partitions. Every message in a partition is assigned an ordered id, the offset. Kafka only guarantees delivery order to a Consumer within a single partition; it does not guarantee order across a topic as a whole (i.e. across partitions).
6. Consumer Group: every Consumer belongs to a specific Consumer Group (a group name can be set per Consumer; if none is given, the Consumer joins the default group).
7. Offset: Kafka's log segment files are named after the base offset of the first message they contain, which makes lookup easy. For example, to find offset 1025 you only need to locate the segment file whose name starts at 1024; the very first file is 00000000000000000000.log.
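The naming scheme in point 7 is exactly what makes offset lookup cheap: since segment files are named by their base offset, the segment holding any given offset is the one with the greatest base offset not exceeding it. A minimal sketch of that lookup, using made-up base offsets:

```java
import java.util.TreeMap;

public class SegmentLookup {

    // base offset of each segment -> segment file name, mimicking Kafka's
    // convention of naming log segments after their first offset (zero-padded to 20 digits)
    static final TreeMap<Long, String> SEGMENTS = new TreeMap<>();
    static {
        for (long base : new long[] {0L, 1024L, 2048L}) {
            SEGMENTS.put(base, String.format("%020d.log", base));
        }
    }

    // The segment holding `offset` is the one with the greatest base offset <= offset.
    static String segmentFor(long offset) {
        return SEGMENTS.floorEntry(offset).getValue();
    }

    public static void main(String[] args) {
        System.out.println(segmentFor(1025)); // lives in the segment starting at 1024
        System.out.println(segmentFor(5));    // lives in the first segment (base 0)
    }
}
```

TreeMap.floorEntry does the "greatest key not exceeding" search in O(log n), which is the same shape of lookup Kafka performs over its sorted segment files.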
1.3 What is a consumer group?
In a word, a consumer group is the scalable, fault-tolerant consumer mechanism Kafka provides. Being a group, it can contain multiple consumers or consumer instances that share a common ID, the group ID. All consumers in the group coordinate to consume all partitions of the subscribed topics. Of course, each partition can only be consumed by one consumer within the same group.
To understand consumer groups, remember these three properties:
- group.id is a string that uniquely identifies a consumer group.
- Each partition of a topic the group subscribes to is assigned to exactly one consumer within that group (the same partition can of course also be assigned to consumers in other groups).
- A consumer group contains one or more consumer instances; an instance can be a process or a thread.
That is still fairly formal; in plainer terms:
- group.id is the group's unique identifier.
- The partitions of the topics a group subscribes to are distributed among that group's consumers; other groups can consume the same partitions independently, but within one group each partition is assigned only once.
- A group can have multiple consumers, and an application can participate in multiple groups by creating separate consumer instances, but each instance belongs to exactly one group.
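The invariant above ("each partition assigned to exactly one consumer per group") can be illustrated with a toy round-robin assignment. This is not Kafka's actual assignor (the real one is pluggable: range, round-robin, sticky, ...); it only demonstrates the invariant:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupAssignment {

    // Spread partition ids 0..partitions-1 over the group's consumers round-robin.
    // Invariant: every partition maps to exactly one consumer of this group.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitions) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 2 consumers in one group, a topic with 5 partitions
        System.out.println(assign(List.of("c1", "c2"), 5));
        // a second group gets its own, independent assignment of the same partitions
        System.out.println(assign(List.of("other-c1"), 5));
    }
}
```

Note that each call to assign models one group: running it again for a different consumer list shows that another group receives all partitions again, independently of the first.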
2. Demo
pom.xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-configuration-processor</artifactId>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka-test</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
</dependency>
application.yml
kafka:
  producer:
    batch-size: 16785
    buffer-memory: 33554432
    linger: 1
    retries: 3
    key-serializer: org.apache.kafka.common.serialization.StringSerializer
    value-serializer: org.apache.kafka.common.serialization.StringSerializer
  consumer:
    auto-commit-interval: 100
    enable-auto-commit: true
    max-poll-records: 5
    session-timeout: 20000
    key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
    value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
  bootstrap-server: xxx.xxx.xxx.xxx
  topic:
    group-id: aliyunGroup
    topic-name:
      - dev.kafka.topic1
      - dev.kafka.topic2
    batch-topic:
      - dev.kafka.batch.topic1
      - dev.kafka.batch.topic2
The Kafka configuration classes:
KafkaTopicConfiguration.java
@Bean
public KafkaAdmin kafkaAdmin() {
    Map<String, Object> props = new HashMap<>();
    // connection address of the Kafka cluster
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaServer);
    return new KafkaAdmin(props);
}

@Bean
public AdminClient adminClient() {
    return AdminClient.create(kafkaAdmin().getConfig());
}
KafkaProducerConfig.java
@Bean
public Map<String, Object> producerConfigs() {
    Map<String, Object> props = new HashMap<>();
    // Kafka broker addresses
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    // number of retries on send failure
    props.put(ProducerConfig.RETRIES_CONFIG, retries);
    // maximum size of a batch, in bytes
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, batchSize);
    // how long to wait for more records before sending a batch
    props.put(ProducerConfig.LINGER_MS_CONFIG, linger);
    // total memory for buffering unsent records
    props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, bufferMemory);
    // key serializer
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class);
    // value serializer
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    return props;
}

@Bean
public ProducerFactory<Integer, String> producerFactory() {
    return new DefaultKafkaProducerFactory<>(producerConfigs());
}

@Bean
public KafkaTemplate<Integer, String> kafkaTemplate() {
    KafkaTemplate<Integer, String> kafkaTemplate = new KafkaTemplate<>(producerFactory());
    kafkaTemplate.setProducerListener(producerListener);
    return kafkaTemplate;
}
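Of the producer settings above, batch-size and linger work as a pair: a batch is sent as soon as it is full or has lingered long enough, whichever comes first. A toy simulation (deliberately not real Kafka code, with logical time instead of a clock) of that decision:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {

    // Records accumulate until either the batch is full (batch.size) or the
    // batch has waited linger.ms; then the whole batch is "sent".
    static List<List<String>> send(List<String> records, int batchSize,
                                   long lingerMs, long gapMsBetweenRecords) {
        List<List<String>> sentBatches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long waited = 0;
        for (String r : records) {
            current.add(r);
            waited += gapMsBetweenRecords;
            if (current.size() >= batchSize || waited >= lingerMs) {
                sentBatches.add(current); // one of the two triggers fired
                current = new ArrayList<>();
                waited = 0;
            }
        }
        if (!current.isEmpty()) sentBatches.add(current); // flush leftovers on close()
        return sentBatches;
    }

    public static void main(String[] args) {
        // batch of 3, linger 10 ms, records arriving every 1 ms: size triggers first
        System.out.println(send(List.of("a", "b", "c", "d"), 3, 10, 1));
        // same records arriving every 10 ms: linger triggers on every record
        System.out.println(send(List.of("a", "b", "c", "d"), 3, 10, 10));
    }
}
```

This is why a small linger (here 1 ms in the YAML) trades a tiny bit of latency for much better throughput under load: full batches amortize the per-request overhead.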
KafkaConsumerConfig.java
@Autowired(required = false)
public void setDefaultRecordFilterStrategy(DefaultRecordFilterStrategy defaultRecordFilterStrategy) {
    if (defaultRecordFilterStrategy == null) {
        defaultRecordFilterStrategy = new DefaultRecordFilterStrategy();
    }
    this.defaultRecordFilterStrategy = defaultRecordFilterStrategy;
}

@Bean
public ConcurrentKafkaListenerContainerFactory<Integer, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<Integer, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setAckDiscarded(true);
    factory.setRecordFilterStrategy(defaultRecordFilterStrategy);
    return factory;
}

@Bean
public ConsumerFactory<Integer, String> consumerFactory() {
    return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}

// base consumer configuration
@Bean
public Map<String, Object> consumerConfigs() {
    if (bootstrapServers.equals("aliyun")) {
        bootstrapServers = CommonParam.aliyunServer;
    } else if (bootstrapServers.equals("dev")) {
        bootstrapServers = CommonParam.DEVServer;
    } else {
        bootstrapServers = CommonParam.aliyunServer;
    }
    Map<String, Object> props = new HashMap<>(10);
    // offset auto-commit interval
    props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, autoCommitInterval);
    // broker addresses
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    // whether to auto-commit offsets
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, autoCommit);
    // session timeout
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, sessionTimeout);
    // key deserializer
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, IntegerDeserializer.class);
    // value deserializer
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return props;
}
With Kafka, most of the work is configuration. Once configured, sending a message takes a single API call:

kafkaTemplate.send(topic, new GsonBuilder().create().toJson(message)).get();

(The trailing .get() blocks on the returned future, making the send effectively synchronous; omit it for a fire-and-forget asynchronous send.)

A single annotation sets up a listener:

@KafkaListener(topics = "#{kafkaTopicName}", groupId = "#{topicGroupId}", errorHandler = "consumerAwareErrorHandler")

Here errorHandler handles errors raised while listening and can be omitted.
3. Testing
[20190809 14:39:02:166\ INFO ] WindowsSystemLog.asyncSend(43) -
params#dev.kafka.topic2#this is async test
[20190809 14:39:02:167\ INFO ] WindowsSystemLog.asyncSend(48) -
async send successful!!!
[20190809 14:39:02:211\ INFO ] WindowsSystemLog.onSuccess(20) -
Message sendTo topic success : ProducerRecord(topic=dev.kafka.topic2, partition=null, headers=RecordHeaders(headers = [], isReadOnly = true), key=null, value={"id":"ae25972710d34d8baa238dcad80db18e","msg":"this is async test","topic":"dev.kafka.topic2","sendTime":"Aug 9, 2019 2:39:02 PM"}, timestamp=null)
[20190809 14:39:02:213\ INFO ] WindowsSystemLog.filter(24) -
consumerRecord is : ConsumerRecord(topic = dev.kafka.topic2, partition = 0, offset = 9, CreateTime = 1565332742167, serialized key size = -1, serialized value size = 131, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = {"id":"ae25972710d34d8baa238dcad80db18e","msg":"this is async test","topic":"dev.kafka.topic2","sendTime":"Aug 9, 2019 2:39:02 PM"})
[20190809 14:39:02:227\ INFO ] WindowsSystemLog.defaultListen(32) -
params#{"id":"ae25972710d34d8baa238dcad80db18e","msg":"this is async test","topic":"dev.kafka.topic2","sendTime":"Aug 9, 2019 2:39:02 PM"}
[20190809 14:39:02:239\ INFO ] WindowsSystemLog.doController(24) -
consumer successful:Message(id=ae25972710d34d8baa238dcad80db18e, msg=this is async test, topic=dev.kafka.topic2, sendTime=Fri Aug 09 14:39:02 CST 2019)
[20190809 14:39:02:239\ INFO ] WindowsSystemLog.ack(17) -
自定义的 consumer successful:Message(id=ae25972710d34d8baa238dcad80db18e, msg=this is async test, topic=dev.kafka.topic2, sendTime=Fri Aug 09 14:39:02 CST 2019)
In addition, I added a callback for successful sends, a filter supporting custom filtering rules, an error handler for listener exceptions, and a callback for successful consumption; I will share these in the next post. Feedback is welcome.