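Problem description
Following the multiprocessing code in someone else's project, I wrote multiprocessing code for my own project today, and the program got stuck at runtime. The minimal code below reproduces the problem: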
import queue
from torch.multiprocessing import Process


def prefetch_data(queue):
    i = 0
    while True:
        queue.put(i)
        i += 1


def main():
    training_queue = queue.Queue(5)
    task = Process(target=prefetch_data, args=(training_queue, ))
    # Daemon process: the flag must be set before start(). A daemon process
    # cannot have child processes and is terminated when the main process exits.
    task.daemon = True
    task.start()
    training_data = training_queue.get(block=True)
    print(training_data)
    task.terminate()


if __name__ == '__main__':
    main()
The error is raised at task.start(): TypeError: can't pickle _thread.lock objects. The traceback points to ForkingPickler(file, protocol).dump(obj).
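Solution
Looking back at the reference code, I remembered that it used two kinds of queue: torch.multiprocessing.Queue and queue.Queue. The former is meant for multiprocessing and is passed as an argument to a process's target; the latter is meant for multithreading and is passed as an argument to a thread's target. After replacing queue.Queue with torch.multiprocessing.Queue in the code, the program ran normally.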
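The fix described above can be sketched as follows. This is a minimal sketch, not the original project's code: it uses the standard-library multiprocessing module so that it runs without PyTorch, on the assumption that torch.multiprocessing exposes the same Process and Queue API. The function name fetch_first_items and the 10-second timeout are illustrative choices.

```python
import multiprocessing as mp  # assumption: torch.multiprocessing can be swapped in here


def prefetch_data(data_queue):
    """Producer: keep putting consecutive integers on the queue."""
    i = 0
    while True:
        data_queue.put(i)  # blocks while the bounded queue is full
        i += 1


def fetch_first_items(count=3):
    """Start the producer process and read `count` items from the queue."""
    # A multiprocessing queue (not queue.Queue) can be handed to a child process.
    training_queue = mp.Queue(5)
    task = mp.Process(target=prefetch_data, args=(training_queue,))
    task.daemon = True  # must be set before start()
    task.start()
    # The timeout is an illustrative safeguard so a failure does not hang forever.
    items = [training_queue.get(block=True, timeout=10) for _ in range(count)]
    task.terminate()
    task.join()
    return items


if __name__ == '__main__':
    print(fetch_first_items())
```

The parameter is named data_queue rather than queue so that it does not shadow the queue module, which the original snippet imports under the same name.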
In-depth analysis
Why can't queue.Queue be used for multiprocessing? Here is the class definition:
class Queue:
    '''Create a queue object with a given maximum size.

    If maxsize is <= 0, the queue size is infinite.
    '''

    def __init__(self, maxsize=0):
        self.maxsize = maxsize
        self._init(maxsize)

        # mutex must be held whenever the queue is mutating.  All methods
        # that acquire mutex must release it before returning.  mutex
        # is shared between the three conditions, so acquiring and
        # releasing the conditions also acquires and releases mutex.
        self.mutex = threading.Lock()

        # Notify not_empty whenever an item is added to the queue; a
        # thread waiting to get is notified then.
        self.not_empty = threading.Condition(self.mutex)

        # Notify not_full whenever an item is removed from the queue;
        # a thread waiting to put is notified then.
        self.not_full = threading.Condition(self.mutex)

        # Notify all_tasks_done whenever the number of unfinished tasks
        # drops to zero; thread waiting to join() is notified to resume
        self.all_tasks_done = threading.Condition(self.mutex)
        self.unfinished_tasks = 0
    ...
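The class relies on a thread lock. Specifically, the lock is needed whenever the queue goes from empty to non-empty, from full to non-full, or when all tasks are done, because the three condition variables share the same mutex. Given the earlier error message, TypeError: can't pickle _thread.lock objects, my guess was that threading.Lock() makes pickling fail, which in turn makes the process fail at start(). After I removed the code related to threading.Lock(), the error disappeared, which supports this guess. Therefore, queue.Queue cannot be used for multiprocessing.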
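The guess above can also be checked without starting any process. The sketch below is illustrative only (the helper name pickling_error is hypothetical): it calls pickle.dumps directly on a thread lock and on a queue.Queue instance. It rests on the assumption that the failing start() call was pickling the Process arguments for the child, as the ForkingPickler line in the traceback suggests and as happens with start methods such as spawn.

```python
import pickle
import queue
import threading


def pickling_error(obj):
    """Return the exception raised while pickling obj, or None if it pickles."""
    try:
        pickle.dumps(obj)
    except Exception as exc:  # broad on purpose: we only want to report the failure
        return exc
    return None


if __name__ == '__main__':
    candidates = ([0, 1, 2], threading.Lock(), queue.Queue(5))
    for candidate in candidates:
        err = pickling_error(candidate)
        status = 'picklable' if err is None else repr(err)
        print(type(candidate).__name__, '->', status)
```

A plain list pickles without trouble, while both the lock and the queue.Queue that holds it raise TypeError. The exact wording of the message differs between Python versions.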