1. Kafka offset commit strategies:
auto commit
synchronous commit
asynchronous commit
asynchronous plus synchronous commit
Pros, cons, and code examples for each approach follow.
1.1 Auto commit
- You can configure an auto-commit interval: every fixed interval the consumer commits the largest offset consumed so far. However, there is no guarantee any given commit succeeds, and you get no feedback on processing state.
#!/usr/bin/env python
# -*- coding:UTF-8 -*-
from kafka import KafkaConsumer
import logging
from datetime import datetime

# Add logging.basicConfig if you want the client's logs written to a file
logging.basicConfig(level=logging.DEBUG,   # log level
                    filename='consumer.log',
                    filemode='a',          # 'w' rewrites the log on every run, overwriting the old one;
                                           # 'a' appends, and is the default if omitted
                    format='%(asctime)s - %(pathname)s[line:%(lineno)d] - %(levelname)s: %(message)s')

consumer = KafkaConsumer('test_pp01',
                         bootstrap_servers='192.168.211.110:9092',
                         group_id='test_001',
                         auto_offset_reset='earliest')

with open('5', 'w') as f:
    for msg in consumer:
        print('{} {}'.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f'), msg.value))
        f.write('{} {}\n'.format(msg.partition, msg.offset))
        f.write('{} {}\n'.format(msg.key, msg.value))
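The interval mentioned above is controlled by `auto_commit_interval_ms` (5000 ms by default in kafka-python). A sketch of the consumer configuration with the interval made explicit; the standalone dict is just for illustration, these are the keyword arguments `KafkaConsumer` accepts:

```python
# Hypothetical config dict: in real code you would pass KafkaConsumer(**config).
config = dict(
    bootstrap_servers='192.168.211.110:9092',
    group_id='test_001',
    auto_offset_reset='earliest',
    enable_auto_commit=True,        # the kafka-python default, shown for clarity
    auto_commit_interval_ms=1000,   # commit every 1 s instead of the 5 s default
)
print(config['auto_commit_interval_ms'])   # → 1000
```

Shortening the interval narrows the window of messages that can be re-delivered after a crash, at the cost of more frequent commit traffic.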
1.2 Synchronous commit
- With synchronous commit you control how many records each poll pulls and can inspect metadata such as the fetched offsets. commit() blocks until the commit succeeds, so you can measure each batch's consumption time and track consumption state: a batch is fully processed before the next one starts, and exceptions can be caught and handled (e.g. logged) yourself.
#!/usr/bin/env python
# -*- coding:UTF-8 -*-
from kafka import KafkaConsumer
import time

broker = '192.168.211.110:9092'
topicname = 'test_0613'

consumer = KafkaConsumer(bootstrap_servers=broker,
                         group_id='test_001',
                         auto_offset_reset='earliest',
                         enable_auto_commit=False)
consumer.subscribe([topicname])

try:
    while True:
        start_time = time.time()
        # Pull at most 2 records per batch
        records = consumer.poll(timeout_ms=100, max_records=2)
        if records:
            num = 0
            for tp, msgs in records.items():
                for r in msgs:
                    num += 1
                    print('topic={} partition={} offset={} key={} value={}'.format(
                        r.topic, r.partition, r.offset, r.key, r.value))
            print('messages in this batch: %s' % num)
            try:
                consumer.commit()   # blocks until the commit succeeds
                time_cost = time.time() - start_time
                print('this batch took %s s' % time_cost)
            except Exception as e:
                print('commit failed %s' % str(e))
except Exception as e:
    print(e)
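commit() with no arguments commits the offsets returned by the last poll(); kafka-python also accepts an explicit offset map. The bookkeeping behind that map (the committed offset is the *next* offset to read, i.e. last processed offset + 1) can be sketched without a broker, here with a hypothetical namedtuple standing in for poll() results:

```python
from collections import namedtuple

# Hypothetical stand-in for the records returned by consumer.poll()
Record = namedtuple('Record', ['topic', 'partition', 'offset', 'key', 'value'])

def offsets_to_commit(batch):
    """Given processed records, return {(topic, partition): next_offset_to_read}.

    Kafka commit semantics: commit last_processed_offset + 1, and keep only
    the highest value seen per partition.
    """
    out = {}
    for r in batch:
        tp = (r.topic, r.partition)
        nxt = r.offset + 1
        if nxt > out.get(tp, 0):
            out[tp] = nxt
    return out

batch = [Record('test_0613', 0, 41, None, b'a'),
         Record('test_0613', 0, 42, None, b'b'),
         Record('test_0613', 1, 7, None, b'c')]
print(offsets_to_commit(batch))   # → {('test_0613', 0): 43, ('test_0613', 1): 8}
```

In real code you would wrap each entry in `TopicPartition`/`OffsetAndMetadata` before passing the map to `consumer.commit()`.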
1.3 Asynchronous commit with a callback
- Asynchronous commit submits offsets from a separate thread; it may succeed or fail, and a callback tells you the result. If an earlier (smaller) offset fails to commit but a later (larger) offset commits successfully, the earlier failure does no harm.
#!/usr/bin/env python
# -*- coding:UTF-8 -*-
from kafka import KafkaConsumer
import logging

logging.basicConfig(level=logging.DEBUG,   # log level
                    filename='consumer.log',
                    filemode='a',          # 'a' appends (the default); 'w' would overwrite on each run
                    format='%(asctime)s - %(pathname)s[line:%(lineno)d] - %(levelname)s: %(message)s')

broker = '192.168.211.110:9092'
topicname = 'test_0613'

consumer = KafkaConsumer(bootstrap_servers=broker,
                         group_id='test_001',
                         auto_offset_reset='earliest',
                         enable_auto_commit=False)
consumer.subscribe([topicname])

def _on_send_response(*args, **kwargs):
    # commit_async() invokes the callback with (offsets, response_or_exception)
    if isinstance(args[1], Exception):
        print('offset commit failed {}'.format(args[1]))
    else:
        print('offset commit succeeded {}'.format(args))

try:
    while True:
        records = consumer.poll(timeout_ms=100, max_records=2)
        if records:
            num = 0
            for tp, msgs in records.items():
                for r in msgs:
                    num += 1
                    print('topic={} partition={} offset={} key={} value={}'.format(
                        r.topic, r.partition, r.offset, r.key, r.value))
            print('messages in this batch: %s' % num)
            try:
                consumer.commit_async(callback=_on_send_response)
            except Exception as e:
                print('commit failed %s' % str(e))
except Exception as e:
    print(e)
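The callback contract can be exercised without a broker: the callback receives the offsets as its first argument and either a response or an exception as its second. A small sketch of that dispatch, returning strings instead of printing so both paths are easy to check:

```python
def on_commit_result(*args, **kwargs):
    """Mimics the callback above: args = (offsets, response_or_exception)."""
    if isinstance(args[1], Exception):
        return 'commit failed: {}'.format(args[1])
    return 'commit succeeded: {}'.format(args[0])

# Simulate the two outcomes commit_async() can report back:
print(on_commit_result({'test_0613-0': 43}, None))                    # success path
print(on_commit_result({'test_0613-0': 43}, RuntimeError('timeout'))) # failure path
```

Note the callback runs on the consumer's I/O thread, so keep it cheap; heavy failure handling should be queued elsewhere.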
1.4 Asynchronous plus synchronous commit
- Asynchronous commit alone is unreliable: there is no guarantee the current offset actually lands, and if the consumer shuts down abruptly before a commit succeeds, the messages are consumed again, duplicating data. A final synchronous commit compensates: just before the consumer thread closes, commit once manually to make sure the offsets are committed.
#!/usr/bin/env python
# -*- coding:utf-8 -*-
from kafka import KafkaConsumer
import time

broker = '192.168.211.110:9092'
topic = 'test_0613'

consumer = KafkaConsumer(bootstrap_servers=broker,
                         group_id='test_001',
                         enable_auto_commit=False,
                         auto_offset_reset='earliest')
consumer.subscribe([topic])

def on_consumer_response(*args, **kwargs):
    if isinstance(args[1], Exception):
        print('offset commit failed {}'.format(args))
    else:
        print('offset commit succeeded {}'.format(args))

try:
    while True:
        start_time = time.time()
        records = consumer.poll(timeout_ms=100, max_records=3)
        if records:
            num = 0
            for tp, msgs in records.items():
                for i in msgs:
                    num += 1
                    print(tp)
                    print('topic={},partition={},offset={},key={},value={}'.format(
                        i.topic, i.partition, i.offset, i.key, i.value))
            print('consumed {} messages this batch'.format(num))
            consumer.commit_async(callback=on_consumer_response)
            end_time = time.time()
            print('cost time {}s'.format(end_time - start_time))
except Exception as e:
    print(e)
finally:
    try:
        consumer.commit()   # final synchronous commit before shutdown
        print('synchronous commit succeeded')
    except Exception as e:
        print('synchronous commit failed {}'.format(e))
Comparing the approaches above:
- Synchronous commit blocks until the commit succeeds before the next poll, which guarantees the commit and largely avoids reconsumption; recording offsets locally can eliminate reconsumption entirely. It is the most common choice in production when data reliability matters.
- Auto commit fires at a fixed, configurable interval, but there is no guarantee any given commit succeeds, so it is not very reliable.
- Asynchronous commit reports the result through a callback, but a failed commit cannot simply be retried: if a later, larger offset has already been committed, re-committing the smaller one would move the group backwards. A failed commit of a smaller offset is therefore often harmless, while a failure on the largest offset does matter; adding a synchronous commit before an abnormal shutdown ensures the final offsets land.
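The "record offsets locally" idea above can be sketched as a toy file-based store; the storage path and JSON key format are assumptions for illustration only. On restart you would `seek()` each partition to the stored offset instead of relying on the group's committed offsets:

```python
import json
import os
import tempfile

class LocalOffsetStore:
    """Toy local offset store: persist {"topic-partition": next_offset} as JSON."""

    def __init__(self, path):
        self.path = path

    def save(self, offsets):
        # Write to a temp file, then rename atomically, so a crash
        # never leaves a half-written offsets file behind.
        tmp = self.path + '.tmp'
        with open(tmp, 'w') as f:
            json.dump(offsets, f)
        os.replace(tmp, self.path)

    def load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

store = LocalOffsetStore(os.path.join(tempfile.gettempdir(), 'offsets.json'))
store.save({'test_0613-0': 43, 'test_0613-1': 8})
print(store.load())   # → {'test_0613-0': 43, 'test_0613-1': 8}
```

Saving the offsets in the same transaction (or atomic write) as the processed results is what makes this stronger than committing to Kafka alone.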