Python: Multiple Producers and Consumers for Reading and Writing Files

陌上阳光

Category: python. Tags: multiprocessing, python

Article link: https://blog.csdn.net/weixin_42831564/article/details/107639134

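0. Preface

The task: read content from a file, extract the data needed, build URLs from it and send requests, then save the response data.
Reading and writing files is fast, but the network requests are slow. Because the original code ran all of these steps serially and was tightly coupled, the slow requests held up everything else. To speed things up, I used a pipeline: producer [read file] -> queue -> consumer (which is also the producer for a second queue) [network request] -> queue 2 -> consumer [write file].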

1. Read the file and put the wanted data on a queue

from multiprocessing import JoinableQueue, Process
import time

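# Replace the for loop with code that reads your own file.
def producer(q, name):
    # Open the log file once instead of reopening it for every item;
    # it is flushed when the with block closes it.
    with open('info.txt', 'a') as f:
        for i in range(200000):
            info = name + ' ' + str(i)
            f.write(info + '\n')
            q.put(info)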
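
As the comment says, the for loop is only a stand-in for reading a real input file. A minimal sketch of that variant, assuming an input file with one record per line and assuming blank lines should be skipped. The function name `file_producer` and the file format are my own assumptions, not part of the original code:

```python
def file_producer(q, path):
    """Read `path` line by line and put each non-empty line on the queue.

    Returns the number of items queued. The "one record per line"
    format is an assumption made for this sketch.
    """
    count = 0
    with open(path, encoding='utf-8') as f:
        for raw in f:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            q.put(line)
            count += 1
    return count
```

It takes any object with a put() method, so it can be started with Process(target=file_producer, args=(q, 'your_file.txt')) in place of producer. Because q is bounded, put() blocks when the consumers fall behind, which keeps a large file from being loaded into memory all at once.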

2. Take data from the queue, process it, and put the result on a new queue

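def consumer(q, q_file):
    while True:
        line = q.get()
        if line is None:
            # Sentinel: no more input. Acknowledge it and stop.
            # Testing for None explicitly keeps an empty string from
            # being mistaken for the end of the data.
            q.task_done()
            break
        file_content = 'hello ni hao ' + line + '\n'
        q_file.put(file_content)
        q.task_done()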
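
The consumer above only builds a string. In the real task described in the preface it builds a URL and sends a request. A rough sketch of what that could look like, assuming a made-up endpoint (`BASE_URL` is hypothetical) and a tab-separated output format of my own choosing. The `fetch` parameter is also my addition, so the network call can be swapped out:

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = 'https://example.com/api?key='  # hypothetical endpoint


def fetch_url(url, timeout=10):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode('utf-8', errors='replace')


def request_consumer(q, q_file, fetch=fetch_url):
    """Take keys from q, request BASE_URL + key, put the result on q_file."""
    while True:
        line = q.get()
        if line is None:
            q.task_done()  # acknowledge the sentinel, then stop
            break
        try:
            body = fetch(BASE_URL + quote(line))
            q_file.put(line + '\t' + body + '\n')
        except OSError as exc:
            # URLError and socket timeouts are OSError subclasses;
            # record the failure and keep going.
            q_file.put(line + '\tERROR ' + str(exc) + '\n')
        finally:
            q.task_done()  # exactly one task_done() per item taken
```

Catching the error inside the loop matters here: if a consumer died on a failed request without calling task_done(), q.join() in the main process would never return. A real version would also need to decide how to store response bodies that contain newlines.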

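3. Take data from the queue and save it to a file

I first tried writing the output file directly inside the consumers, but some data was lost: sometimes a whole line, sometimes part of a line. I could not find a fix for that, so I tried adding a second queue with a single writer process, and that turned out to work. If you know another approach, feel free to share it in the comments ^_^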
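
One other approach that might work, which I have not benchmarked against the second queue: share a multiprocessing.Lock between the consumers and hold it for the whole open-write-close sequence, so that no two processes ever write at the same time. The names `locked_append`, `worker` and `hello_locked.txt` are made up for this sketch:

```python
from multiprocessing import Lock, Process


def locked_append(lock, path, text):
    """Append text to path while holding lock, so writers never interleave."""
    with lock:
        # The file is closed (and therefore flushed) before the lock is released.
        with open(path, 'a', encoding='utf-8') as f:
            f.write(text)


def worker(lock, path, name, n):
    """Write n lines of the form '<name> <i>' through locked_append."""
    for i in range(n):
        locked_append(lock, path, name + ' ' + str(i) + '\n')


if __name__ == '__main__':
    lock = Lock()
    procs = [Process(target=worker, args=(lock, 'hello_locked.txt', 'w' + str(k), 100))
             for k in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

The trade-off is that every consumer has to wait for the lock and reopen the file on each write, whereas with the second queue the consumers hand off the result and go straight back to their requests.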

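def deal_file(q_file):
    # Single writer process: open the output file once and append each item.
    with open('hello.txt', 'a') as f:
        while True:
            info = q_file.get()
            if info is None:
                # Sentinel: nothing more to write.
                q_file.task_done()
                break
            f.write(info)
            f.flush()  # when writing files from multiple processes, flush the buffer promptly
            q_file.task_done()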

4. Create the processes and run the code

if __name__ == '__main__':
    t = time.time()
    q = JoinableQueue(10)
    q_file = JoinableQueue(50)
    p_pro = Process(target=producer, args=(q, 'producer'))
    p_con = Process(target=consumer, args=(q, q_file))
    p_con2 = Process(target=consumer, args=(q, q_file))
    p_con3 = Process(target=consumer, args=(q, q_file))
    p_con4 = Process(target=deal_file, args=(q_file,))
    p_pro.start()
    p_con.start()
    p_con2.start()
    p_con3.start()
    p_con4.start()

    p_pro.join()
    q.join()
    # I tried joining the consumer processes here, but that deadlocked:
    # they are still blocked in q.get() waiting for None, so join() would
    # wait forever. Join them only after the sentinels have been sent.
    q_file.join()

    # One None per consumer process, plus one for the writer process.
    q.put(None)
    q.put(None)
    q.put(None)
    q_file.put(None)

    # Now the worker processes can exit, so joining them is safe.
    p_con.join()
    p_con2.join()
    p_con3.join()
    p_con4.join()

    print(time.time() - t)

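Notes:

1. flush()

Important: when writing files from multiple processes, flush as soon as possible, otherwise data may be lost. Buffered data that has not been flushed is lost if the process is killed before the file is closed. Leaving a with block closes the file, which also flushes it, so the explicit flush() matters most when the file is kept open across many writes.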

https://www.cnblogs.com/mahailuo/p/11460739.html
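
To make the buffering visible, here is a small sketch that compares the size of the file on disk before and after flush(). It assumes a small write and Python's default buffering, and the function name is made up for the example:

```python
import os
import tempfile


def sizes_before_and_after_flush(text='hello'):
    """Return the on-disk file size before and after flush() for a small write."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)
            before = os.path.getsize(path)  # data is still in Python's buffer
            f.flush()
            after = os.path.getsize(path)   # data has been handed to the OS
        return before, after
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(sizes_before_and_after_flush())
```

Until flush() or close() runs, the text exists only in the writing process's memory. Another process reading the file will not see it, and it is gone if the writer is stopped with terminate().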

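2. q.put(None)

None is a sentinel meaning that no more data will arrive, so the process that receives it should stop. Put one None for each consumer process. Otherwise one consumer takes the None and exits while the others keep waiting in get() forever, and the program never ends.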

https://www.cnblogs.com/mike-liu/p/9279313.html

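3. q.join()

q.join() blocks until the consumers have called q.task_done() once for every item that was put on the queue. In other words, it returns only when every item has been taken and fully processed, not merely when the queue is empty.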

https://www.cnblogs.com/mike-liu/p/9279313.html
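
A small sketch of that behaviour. To keep it short it uses a thread and queue.Queue, which has the same task_done()/join() contract as JoinableQueue. The names `slow_worker` and `run_demo` are made up:

```python
import queue
import threading
import time


def slow_worker(q, done):
    """Take items until None arrives; record each one, then acknowledge it."""
    while True:
        item = q.get()
        if item is None:
            q.task_done()
            break
        time.sleep(0.01)   # simulate slow work
        done.append(item)
        q.task_done()      # only now does the item count as finished


def run_demo(n=5):
    """Return the list of items that were finished when q.join() returned."""
    q = queue.Queue()
    done = []
    t = threading.Thread(target=slow_worker, args=(q, done))
    t.start()
    for i in range(n):
        q.put(i)
    q.join()               # blocks until task_done() has been called n times
    finished_at_join = list(done)
    q.put(None)            # sentinel, sent only after all the work is done
    t.join()
    return finished_at_join


if __name__ == '__main__':
    print(run_demo())
```

When join() returns, every item is already in the finished list, even though the worker is slow. If the worker forgot to call task_done() for even one item, join() would block forever.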

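4. Deadlock

I tried acquiring and releasing a lock around the queue reads, but in this code that raised "RuntimeError: release unlocked lock", which I did not resolve. What fixed the hang was removing the consumers' join() calls that ran before the None sentinels were sent: at that point the consumers were still blocked in q.get(), so they never exited and join() waited forever. If you understand the lock error, please leave a comment below.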

https://www.cnblogs.com/dplearning/p/6947213.html
http://www.cocoachina.com/articles/477979
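
The shutdown order that avoids this hang can be sketched as a helper. This is my own arrangement of the steps rather than code from the original program, and `stage`, `sink`, `shutdown` and `shutdown_demo.txt` are made-up names:

```python
from multiprocessing import JoinableQueue, Process


def stage(q_in, q_out):
    """Middle stage: upper-case each item; stop on None."""
    while True:
        item = q_in.get()
        if item is None:
            q_in.task_done()
            break
        q_out.put(item.upper())
        q_in.task_done()


def sink(q_in, path):
    """Final stage: append each item to path; stop on None."""
    with open(path, 'a', encoding='utf-8') as f:
        while True:
            item = q_in.get()
            if item is None:
                q_in.task_done()
                break
            f.write(item + '\n')
            f.flush()
            q_in.task_done()


def shutdown(q, q_file, consumers, writer):
    """Stop the pipeline without deadlocking. Call after the producer is done."""
    q.join()                 # every input item has been processed
    for _ in consumers:
        q.put(None)          # one sentinel per consumer
    for c in consumers:
        c.join()             # safe now: each consumer will receive a None
    q_file.join()            # every result has been written
    q_file.put(None)
    writer.join()


if __name__ == '__main__':
    q, q_file = JoinableQueue(10), JoinableQueue(50)
    consumers = [Process(target=stage, args=(q, q_file)) for _ in range(3)]
    writer = Process(target=sink, args=(q_file, 'shutdown_demo.txt'))
    for p in consumers + [writer]:
        p.start()
    for word in ['a', 'b', 'c']:
        q.put(word)
    shutdown(q, q_file, consumers, writer)
```

The rule of thumb: wait on the queue first, send the sentinels second, and join the processes last. Joining a process before it has any way to learn that the data is finished is what causes the hang.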

Complete code

from multiprocessing import JoinableQueue, Process
import time


def deal_file(q_file):
    # Single writer process: open the output file once and append each item.
    with open('hello.txt', 'a') as f:
        while True:
            info = q_file.get()
            if info is None:
                # Sentinel: nothing more to write.
                q_file.task_done()
                break
            f.write(info)
            f.flush()
            q_file.task_done()

def producer(q, name):
    # Replace the for loop with code that reads your own file.
    with open('info.txt', 'a') as f:
        for i in range(200000):
            info = name + ' ' + str(i)
            f.write(info + '\n')
            q.put(info)

def consumer(q, q_file):
    while True:
        line = q.get()
        if line is None:
            # Sentinel: no more input. Acknowledge it and stop.
            q.task_done()
            break
        file_content = 'hello ni hao ' + line + '\n'
        q_file.put(file_content)
        q.task_done()


if __name__ == '__main__':
    t = time.time()
    q = JoinableQueue(10)
    q_file = JoinableQueue(50)
    p_pro = Process(target=producer, args=(q, 'producer'))
    p_con = Process(target=consumer, args=(q, q_file))
    p_con2 = Process(target=consumer, args=(q, q_file))
    p_con3 = Process(target=consumer, args=(q, q_file))
    p_con4 = Process(target=deal_file, args=(q_file,))
    p_pro.start()
    p_con.start()
    p_con2.start()
    p_con3.start()
    p_con4.start()

    p_pro.join()
    q.join()
    q_file.join()

    # One None per consumer process, plus one for the writer process.
    q.put(None)
    q.put(None)
    q.put(None)
    q_file.put(None)

    # The workers can exit now, so joining them is safe.
    p_con.join()
    p_con2.join()
    p_con3.join()
    p_con4.join()

    print(time.time() - t)
