I think your best bet here is to use two pools:
from multiprocessing import Pool

# import parsers here

parsers = {
    'parser1': parser1.process,
    'parser2': parser2.process,
    'parser3': parser3.process,
    'parser4': parser4.process,
    'parser5': parser5.process,
    'parser6': parser6.process,
    'parser7': parser7.process,
}

# Sets that define which items can use high parallelism,
# and which must use low
high_par = {"parser1", "parser3", "parser4", "parser6", "parser7"}
low_par = {"parser2", "parser5"}

def process_items(item):
    # Dispatch the item to the right parser based on its key.
    key, value = item
    return parsers[key](value)

def run_pool(func, items, num_procs, check_set):
    # Run func over just the items whose key is in check_set,
    # using a pool of num_procs worker processes.
    pool = Pool(num_procs)
    out = pool.map(func, (item for item in items if item[0] in check_set))
    pool.close()
    pool.join()
    return out

if __name__ == "__main__":
    items = [('parser2', x), ...]  # Your list of tuples
    # Process with high parallelism
    high_results = run_pool(process_items, items, 4, high_par)
    # Process with low parallelism
    low_results = run_pool(process_items, items, 2, low_par)
With some clever use of synchronization primitives you could try to do this in a single pool, but I don't think it would end up looking any cleaner than this. It could also end up running less efficiently, because at times your pool would have to wait for work to finish before it could process a low-parallelism item, even though high-parallelism items were sitting behind it in the queue.
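Just for illustration, here is roughly what a single-pool version might look like. This is my own sketch, not part of the solution above; it reuses parsers, low_par, and items from the code above, and caps low-parallelism items at 2 concurrent workers with a shared Semaphore passed to each worker through the pool's initializer. Notice that a worker blocked on the semaphore still occupies one of the 4 pool slots, which is exactly the inefficiency described above.

from multiprocessing import Pool, Semaphore

low_sem = None  # set in each worker by init_worker

def init_worker(sem):
    # Store the shared semaphore in this worker process.
    global low_sem
    low_sem = sem

def process_item_one_pool(item):
    key, value = item
    if key in low_par:
        # Low-parallelism items must first grab one of the 2 semaphore slots.
        with low_sem:
            return parsers[key](value)
    return parsers[key](value)

if __name__ == "__main__":
    sem = Semaphore(2)
    pool = Pool(4, initializer=init_worker, initargs=(sem,))
    results = pool.map(process_item_one_pool, items)
    pool.close()
    pool.join()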
Things get a bit more complicated if you need the results of each process_items call in the same order as the original iterable, since the results from the two Pools would then need to be merged, but based on your example I don't think that's a requirement. Let me know if it is, and I'll adjust my answer accordingly.
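If order did turn out to matter, one possible approach (again my own sketch, assuming the run_pool code above and that process_items returns the parser's result) is to walk the original items list and interleave the two partial result lists. pool.map already preserves input order within each pool, so each partial list only needs to be consumed in sequence:

def merge_in_order(items, high_par, high_results, low_results):
    # Reassemble a single result list in the original item order.
    high_iter = iter(high_results)
    low_iter = iter(low_results)
    merged = []
    for key, _ in items:
        # Each partial list is already in the original relative order,
        # so just pull from the matching iterator as we go.
        if key in high_par:
            merged.append(next(high_iter))
        else:
            merged.append(next(low_iter))
    return merged

# merged_results = merge_in_order(items, high_par, high_results, low_results)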