Requirements:
1. Use Kafka for data transport
2. Use multiple processes for producing and consuming
Difficulties:
1. Setting up the Kafka runtime environment
2. Learning the Python Kafka client API
3. Understanding Python multiprocessing
Setting up the Kafka runtime environment is not covered here. We go straight to the Python side: create 30 producers and 30 consumers, then start the program to produce and consume.
Consumer code:
from kafka import KafkaConsumer
from concurrent.futures import ProcessPoolExecutor

def kafka_consumer(topic):
    # Each worker process runs its own consumer, bound to a single topic.
    consumer = KafkaConsumer(topic,
                             auto_offset_reset='earliest',
                             bootstrap_servers='localhost:9092')
    for msg in consumer:
        print(msg.timestamp, msg.value.decode(), msg.topic)

def main_control():
    # One process per topic: 30 topics, 30 consumer processes.
    process = ProcessPoolExecutor(max_workers=30)
    topic_list = ['test_%d' % i for i in range(30)]
    for topic in topic_list:
        process.submit(kafka_consumer, topic)

if __name__ == '__main__':
    main_control()
Producer code:
import time
from kafka import KafkaProducer
from concurrent.futures import ProcessPoolExecutor

def kafka_producer(topic, msg):
    producer = KafkaProducer(bootstrap_servers='localhost:9092')
    producer.send(topic, msg.encode())
    producer.flush()  # send() is asynchronous; flush before the task returns

if __name__ == '__main__':
    process = ProcessPoolExecutor(max_workers=30)
    topic_list = ['test_%d' % i for i in range(30)]
    for i in range(100):
        for j, topic in enumerate(topic_list):
            process.submit(kafka_producer, topic, 'hello_%d' % j)
        time.sleep(3)
In Python multiprocessing there is one idea worth emphasizing: task slicing. Treat each task as a parameter passed into the function to be executed. When the number of processes and tasks grows large, telling individual tasks apart after the fact is difficult, so it is better to handle this at the source: write a single worker function, and let each task's identity travel in through its arguments and back out through its return value.
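The task-slicing idea can be sketched without Kafka at all: one worker function whose only distinguishing input is the task parameter, fanned out over a process pool. The names `worker` and `run_all` here are hypothetical, chosen for illustration only.

```python
from concurrent.futures import ProcessPoolExecutor

def worker(task_id):
    # The task's identity arrives as a parameter and leaves in the result,
    # so individual tasks stay distinguishable without extra bookkeeping.
    return 'task_%d done' % task_id

def run_all(n_tasks, n_workers=4):
    # One worker function, many tasks: each submit() call carries one slice.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(worker, i) for i in range(n_tasks)]
        return [f.result() for f in futures]

if __name__ == '__main__':
    print(run_all(5))
```

This is the same shape as the producer and consumer above: `kafka_consumer(topic)` and `kafka_producer(topic, msg)` are single worker functions, and the topic name is the task parameter that slices the work.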