I have this script to process some URLs in parallel:
import multiprocessing
import time

# Build the ordered list of URLs: pages 1 through 999
list_of_urls = []
for i in range(1, 1000):
    list_of_urls.append('http://example.com/page=' + str(i))

def process_url(url):
    page_processed = url.split('=')[1]
    print('Processing page %s' % page_processed)
    time.sleep(5)  # stand-in for the real work

pool = multiprocessing.Pool(processes=4)
pool.map(process_url, list_of_urls)
The list is ordered, but when I run the script it does not pick URLs from the list in order:
Processing page 1
Processing page 64
Processing page 127
Processing page 190
Processing page 65
Processing page 2
Processing page 128
Processing page 191
Instead, I would like it to process pages 1, 2, 3, 4 first and then continue in the order of the list. Is there an option to do this?
Solution:
If you do not pass the chunksize argument, map computes the chunk size with this algorithm:
chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
if extra:
    chunksize += 1
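To see where the page numbers in the question come from, the formula can be worked through for the question's own numbers (999 URLs, 4 processes). This is only a sketch: default_chunksize and chunk_starts are hypothetical helper names introduced here for illustration, not part of multiprocessing.

```python
def default_chunksize(n_items, n_workers):
    """Apply the divmod-based formula quoted above."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def chunk_starts(items, chunksize, how_many):
    """Return the first element of each of the first `how_many` chunks."""
    return [items[i] for i in range(0, len(items), chunksize)][:how_many]


pages = list(range(1, 1000))           # pages 1..999, as in the question
size = default_chunksize(len(pages), 4)
print(size)                            # 63
print(chunk_starts(pages, size, 4))    # [1, 64, 127, 190]
```

The chunk starts match the first four lines of the output in the question. By the same arithmetic, a short input such as 10 items with 3 processes already gets a chunk size of 1, so the effect only becomes visible on longer lists.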
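map cuts your iterable into task batches of that size and runs each batch in a separate process. That is why the URLs are not processed in list order: each worker walks through its own batch. The solution is to set chunksize equal to 1, so that workers take items from the list one at a time. Tasks then start in list order, although lines printed by tasks running at the same moment can still swap places.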
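import multiprocessing
import time

list_test = range(10)

def proces(task):
    print('task: %s' % task)
    time.sleep(1)

# The guard keeps worker processes from re-running the pool setup on import
if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
    # chunksize=1 hands out one item at a time, in list order
    pool.map(proces, list_test, chunksize=1)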
task: 0
task: 1
task: 2
task: 3
task: 4
task: 5
task: 6
task: 7
task: 8
task: 9
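The effect of chunksize on ordering can be approximated with a small simulation. It is a simplified model, not how Pool is actually implemented: it assumes every task takes the same time and that workers are served round-robin. The name simulated_start_order is hypothetical.

```python
from collections import deque


def simulated_start_order(items, n_workers, chunksize):
    """Order in which items would start under the simplified model.

    Assumptions (not from the original post): all tasks take equal time,
    and an idle worker always claims the next unclaimed chunk.
    """
    chunks = deque(items[i:i + chunksize]
                   for i in range(0, len(items), chunksize))
    workers = [deque() for _ in range(n_workers)]
    order = []
    while chunks or any(workers):
        for queue in workers:
            if not queue and chunks:
                queue.extend(chunks.popleft())  # claim the next chunk
            if queue:
                order.append(queue.popleft())   # start the next item
    return order


pages = list(range(1, 1000))
print(simulated_start_order(pages, 4, 63)[:8])  # [1, 64, 127, 190, 2, 65, 128, 191]
print(simulated_start_order(pages, 4, 1)[:8])   # [1, 2, 3, 4, 5, 6, 7, 8]
```

In a real run the lines within each round of four can swap places, as in the question's output where page 65 prints before page 2, because the processes are not synchronized with each other.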
Source: https://codeday.me/bug/20190627/1309220.html