Introduction
This article focuses on what happens after data has been read from Kafka: asynchronous processing, batch offset commits, and other operations aimed at improving performance. Kafka Consumer, Kafka Consumer Group, and consumer configuration are not explained in depth here; for background on those topics, see Kafka消费者-从Kafka读取数据, a write-up I found quite detailed.
gradle
implementation "org.jetbrains.kotlin:kotlin-stdlib" // Generate optimized JSON (de)serializers
implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-jdk8:1.3.7'
// kafka
implementation 'org.apache.kafka:kafka-clients:2.0.0'
code
1. Kafka consumer configuration
data class KafkaConfig(
    val host: String,                    // bootstrap.servers
    val topic: String,
    val consumer: KafkaConsumerConfig
)
data class KafkaConsumerConfig(
    val groupId: String,                 // group.id
    val keyDeserializer: String,         // key.deserializer, fully qualified class name
    val valueDeserializer: String,       // value.deserializer, fully qualified class name
    val maxPoll: Int = 500,              // max.poll.records
    val timeoutRequest: Int = 3000,      // request.timeout.ms
    val timeoutSession: Int = 3000,      // session.timeout.ms
    val pollDuration: Long = 500,        // timeout (ms) passed to consumer.poll()
    val autoOffsetReset: String,         // auto.offset.reset
    val pollInterval: Int = 500          // max.poll.interval.ms
)
Here the Kafka consumer configuration is defined as data classes; it could just as well be a plain map loaded straight from a configuration file.
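For reference, a minimal sketch of building the configuration in code; the broker address, topic, group id, and auto.offset.reset values below are placeholders, and the defaults of KafkaConsumerConfig are left as-is:
val kafkaConfig = KafkaConfig(
    host = "localhost:9092",      // placeholder broker address
    topic = "demo-topic",         // placeholder topic
    consumer = KafkaConsumerConfig(
        groupId = "demo-group",   // placeholder consumer group
        keyDeserializer = "org.apache.kafka.common.serialization.StringDeserializer",
        valueDeserializer = "org.apache.kafka.common.serialization.StringDeserializer",
        autoOffsetReset = "earliest"
    )
)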
2. Reading data from Kafka
private typealias Record = ConsumerRecord<String, String>
private typealias RecordConsumer = KafkaConsumer<String, String>
class KafkaConsumer(
    // Business service that processes each Kafka record
    private val service: Services,
    private val kafkaConfig: KafkaConfig
) {
    private val kafkaConsumerConfig = kafkaConfig.consumer
    // Create the Kafka consumer
    private val consumer: RecordConsumer = createConsumer()
    // Timeout for each call to consumer.poll()
    private val pollTimeout = Duration.ofMillis(kafkaConsumerConfig.pollDuration)
    // Buffer up to 1000 records before blocking
    private val recordsChannel = Channel<Record>(MAX_UNCOMMITTED_ITEMS)
    // Latest offset read so far for each partition
    private val offsets = mutableMapOf<TopicPartition, OffsetAndMetadata>()
    // This function needs to be an extension of the coroutine scope since we're launching a background job.
    private fun CoroutineScope.forwardRecords(from: RecordConsumer, to: Channel<Record>) =
        launch(Dispatchers.IO) {
            while (true) {
                val records = from.poll(pollTimeout)
                records.forEach { to.send(it) }
            }
        }
    private fun CoroutineScope.consumeRecords() =
        launch(Dispatchers.IO) {
            recordsChannel.receiveAsFlow().collectIndexed { index, record ->
                // Hand the record over to the business service
                service.set(record)
                // Remember the next offset to commit for this partition
                offsets[TopicPartition(record.topic(), record.partition())] =
                    OffsetAndMetadata(record.offset() + 1, "")
                commitIfNeeded(index)
            }
        }
    private fun createConsumer() = RecordConsumer(
        mapOf(
            "bootstrap.servers" to kafkaConfig.host,
            "group.id" to kafkaConsumerConfig.groupId,
            "key.deserializer" to Class.forName(kafkaConsumerConfig.keyDeserializer),
            "value.deserializer" to Class.forName(kafkaConsumerConfig.valueDeserializer),
            "max.poll.interval.ms" to kafkaConsumerConfig.pollInterval,
            "session.timeout.ms" to kafkaConsumerConfig.timeoutSession,
            "auto.offset.reset" to kafkaConsumerConfig.autoOffsetReset,
            "enable.auto.commit" to false, // Always commit manually
            "max.poll.records" to kafkaConsumerConfig.maxPoll,
            "request.timeout.ms" to kafkaConsumerConfig.timeoutRequest
        )
    )
    suspend fun fetchData() {
        consumer.use { consumer ->
            consumer.subscribe(listOf(kafkaConfig.topic))
            coroutineScope {
                forwardRecords(consumer, recordsChannel)
                consumeRecords()
            }
        }
    }
    /**
     * Tries to commit the collected offsets, retrying up to 3 times when a
     * TimeoutException or CommitFailedException is thrown.
     */
    private fun tryCommit(): Boolean {
        repeat(3) {
            try {
                consumer.commitSync(offsets)
                return true
            } catch (e: CommitFailedException) {
                logger.warn("Offset commit failed, retrying", e)
            } catch (e: TimeoutException) {
                logger.warn("Offset commit timed out, retrying", e)
            }
        }
        return false
    }
    private fun commitIfNeeded(index: Int) {
        if (index % MAX_UNCOMMITTED_ITEMS == MAX_UNCOMMITTED_ITEMS - 1) {
            tryCommit()
        }
    }
    // Will be called by a shutdown hook
    fun close() {
        logger.info("Closing kafka consumer.")
        recordsChannel.close()
        consumer.wakeup()
    }
    companion object {
        const val MAX_UNCOMMITTED_ITEMS = 1000
        // slf4j logger; slf4j-api is pulled in transitively by kafka-clients
        private val logger = LoggerFactory.getLogger("KafkaConsumer")
    }
}
This example launches two coroutines, forwardRecords and consumeRecords, which asynchronously read data from Kafka and consume the fetched records respectively, linked together by a channel. Combined with asynchronous handling in the business service, the whole processing pipeline becomes asynchronous, further improving performance. The offsets map keeps track of the latest offset read from each partition, and commitIfNeeded commits them in batches.
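To put it all together, here is a minimal sketch of starting the consumer and registering the shutdown hook mentioned above; MyServices is a hypothetical implementation of the Services interface, and kafkaConfig is the configuration instance built in section 1:
fun main() = runBlocking {
    // MyServices is a hypothetical Services implementation; kafkaConfig comes from section 1
    val kafkaConsumer = KafkaConsumer(MyServices(), kafkaConfig)
    // Interrupt poll() and close the channel when the JVM shuts down
    Runtime.getRuntime().addShutdownHook(Thread { kafkaConsumer.close() })
    kafkaConsumer.fetchData()
}
Since fetchData only returns once the consumer is shut down or fails, main effectively blocks here for the lifetime of the application.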