Pulling data from Kafka with Python: making a kafka-python consumer resume reading from its offset (automatically)

I'm trying to build an application with kafka-python where a consumer reads data from a range of topics. It is extremely important that the consumer never reads the same message twice, but also never misses a message.

Everything seems to be working fine, except when I turn off the consumer (e.g. a failure) and try to resume reading from the last offset. I can either read all the messages from the topic again (which creates duplicate reads) or listen for new messages only (and miss messages that were emitted during the downtime). I don't encounter this problem when merely pausing the consumer.

I created an isolated simulation in order to try to solve the problem.

Here is the generic producer:

from time import sleep
from json import dumps
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

x = 0  # set manually to avoid duplicates

for e in range(1000):
    if e <= x:
        pass
    else:
        data = dumps({'number': e}).encode('utf-8')
        producer.send('numtest', value=data)
        print(e, ' sent.')
        sleep(5)
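For reference, each message body is a UTF-8 encoded JSON object, and the consumer below reverses the encoding step by step (the value 7 is just an arbitrary example):

```python
from json import dumps, loads

# The producer serializes each counter value like this:
payload = dumps({'number': 7}).encode('utf-8')

# ...and the consumer reverses it before inserting into MongoDB:
record = loads(payload.decode('utf-8'))
print(record)  # {'number': 7}
```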

And the consumer. If auto_offset_reset is set to 'earliest', all the messages will be read again. If auto_offset_reset is set to 'latest', no messages during down-time will be read.

from kafka import KafkaConsumer
from pymongo import MongoClient
from json import loads

## Retrieve data from kafka (WHAT ABOUT MISSED MESSAGES?)
consumer = KafkaConsumer('numtest', bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest', enable_auto_commit=True,
                         auto_commit_interval_ms=1000)

## Connect to database
client = MongoClient('localhost:27017')
collection = client.counttest.counttest

# Store incoming messages
for message in consumer:
    message = loads(message.value.decode('utf-8'))
    collection.insert_one(message)
    print('{} added to {}'.format(message, collection))

I feel like the auto-commit isn't working properly.

I know that this question is similar to this one, but I would like a specific solution.

Thanks for helping me out.

Solution

You are getting this behavior because your consumer is not using a Consumer Group. With a Consumer Group, the consumer will regularly commit (save) its position to Kafka. That way if it's restarted it will pick up from its last committed position.
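The resume-from-committed-offset mechanics can be sketched without a broker. In the toy model below, the `committed` dict stands in for Kafka's internal `__consumer_offsets` log; the names (`run_consumer`, `log`, `batch`) are made up for illustration only:

```python
# Broker-free sketch of consumer-group offset commits.
log = list(range(10))   # messages 0..9 in a single partition
committed = {}          # group_id -> next offset to read (stand-in for __consumer_offsets)

def run_consumer(group_id, batch):
    """Read up to `batch` messages, then commit the new position."""
    start = committed.get(group_id, 0)        # resume from last committed offset
    read = log[start:start + batch]
    committed[group_id] = start + len(read)   # commit the new position
    return read

first = run_consumer('my-group', 4)   # fresh group: reads offsets 0..3
second = run_consumer('my-group', 4)  # "restarted" consumer: resumes at offset 4
print(first, second)
```

Without a `group_id` there is no entry to look up, which is why a restarted consumer can only fall back to `auto_offset_reset` (reread everything, or jump to the latest message).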

To make your consumer use a Consumer Group, you need to set group_id when constructing it.

See group_id description from the docs:

The name of the consumer group to join for dynamic partition assignment (if enabled), and to use for fetching and committing offsets. If None, auto-partition assignment (via group coordinator) and offset commits are disabled. Default: None

For example:

consumer = KafkaConsumer('numtest', bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest', enable_auto_commit=True,
                         auto_commit_interval_ms=1000, group_id='my-group')
