1) What am I missing here; why shouldn’t a Pool be shared between processes?
Not all objects/instances are picklable (serializable), and in this case the pool uses a thread.lock, which cannot be pickled:
```
>>> import threading, pickle
>>> pickle.dumps(threading.Lock())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  [...]
  File "/Users/rafael/dev/venvs/general/bin/../lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle lock objects
```
Or, better:
```
>>> import threading, pickle
>>> from concurrent.futures import ThreadPoolExecutor
>>> pickle.dumps(ThreadPoolExecutor(1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  [...]
  File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/Users/rafael/dev/venvs/general/bin/../lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle lock objects
```
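The sessions above come from Python 2.7. A rough Python 3 version of the same experiment might look like the sketch below. Here is_picklable is a hypothetical helper, not a library function, and it catches exceptions broadly because the exact exception type differs between objects and Python versions. multiprocessing.pool.ThreadPool is the thread-backed variant of multiprocessing.Pool, used so the check does not need to start child processes.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool


def is_picklable(obj):
    """Hypothetical helper: report whether pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except Exception as exc:  # exception type varies by object and Python version
        print("%s: %s: %s" % (type(obj).__name__, type(exc).__name__, exc))
        return False
    return True


print(is_picklable(threading.Lock()))  # False

executor = ThreadPoolExecutor(max_workers=1)
print(is_picklable(executor))  # False
executor.shutdown()

pool = ThreadPool(1)  # thread-backed variant of multiprocessing.Pool
print(is_picklable(pool))  # False
pool.close()
pool.join()
```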
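If you think about it, this makes sense: a lock is a semaphore primitive managed by the operating system (Python uses native threads). Pickling that object and saving its state inside the Python runtime would not accomplish anything meaningful, because its real state is held by the OS.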
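A common consequence of this, shown here as a minimal sketch with a hypothetical Counter class: if an object that owns a lock must be pickled, it can leave the lock out of its pickled state and create a fresh one when it is unpickled, using the standard __getstate__/__setstate__ hooks.

```python
import pickle
import threading


class Counter:
    """Hypothetical example: a counter that guards its value with a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

    def __getstate__(self):
        # Copy the attributes but leave out the lock: its real state lives
        # in the OS, so there is nothing meaningful to serialize.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes and create a fresh, unlocked lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()


original = Counter()
original.increment()
restored = pickle.loads(pickle.dumps(original))
print(restored.value)  # 1
```

The restored lock always starts out unlocked, whatever state the original was in, which is exactly why there is no point in trying to serialize it.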
2) What is a pattern for implementing nested parallelism in Python? If possible, maintaining a recursive structure, and not trading it for iteration
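Now, for the record, none of the above really applies to your example, because you are using threads (ThreadPoolExecutor) rather than processes (ProcessPoolExecutor), so no data is shared across processes.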
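To illustrate the point about threads, here is a minimal sketch (append_under_lock and shared are hypothetical names): a ThreadPoolExecutor worker receives the very same Lock object the caller passed in, because all threads share the process's memory and nothing is pickled. A ProcessPoolExecutor, by contrast, has to pickle the arguments to send them to another process, which is where a lock would fail.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()
shared = []


def append_under_lock(lock, item):
    # The worker gets the caller's own Lock object; threads share memory,
    # so the arguments are passed by reference, not pickled.
    with lock:
        shared.append(item)
    return lock


with ThreadPoolExecutor(max_workers=2) as executor:
    returned = executor.submit(append_under_lock, lock, "x").result()

print(returned is lock)  # True
print(shared)            # ['x']
```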
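Your Java example seems more efficient because the thread pool you use there (CachedThreadPool) creates new threads as needed, whereas the Python executor implementation is bounded and requires an explicit maximum number of threads (max_workers). There are also some syntax differences between the languages that seem to be throwing you off (static instances in Python are essentially anything that is not explicitly scoped), but basically both examples will end up creating exactly the same number of threads. For example, here is a version that uses a fairly naive CachedThreadPoolExecutor implementation in Python: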
```python
from concurrent.futures import ThreadPoolExecutor


class CachedThreadPoolExecutor(ThreadPoolExecutor):
    # Naive: relies on the private _work_queue and _max_workers attributes
    def __init__(self):
        super(CachedThreadPoolExecutor, self).__init__(max_workers=1)

    def submit(self, fn, *args, **extra):
        # If work is already waiting in the queue, allow one more worker thread
        if self._work_queue.qsize() > 0:
            print('increasing pool size from %d to %d' % (self._max_workers, self._max_workers + 1))
            self._max_workers += 1
        return super(CachedThreadPoolExecutor, self).submit(fn, *args, **extra)


pool = CachedThreadPoolExecutor()


def fibonacci(n):
    print(n)
    if n < 2:
        return n
    # Submit both subproblems, then block this thread until both finish
    a = pool.submit(fibonacci, n - 1)
    b = pool.submit(fibonacci, n - 2)
    return a.result() + b.result()


print(fibonacci(10))
```
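Because every non-leaf task blocks its worker thread on a.result(), a bounded pool that is too small can deadlock once all workers are waiting on futures that are still queued. The sketch below runs the same recursion on a plain ThreadPoolExecutor, assuming the pool is simply sized to the total number of submitted tasks so it can never run out of workers; fib_calls is a hypothetical helper that counts those tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def fib_calls(n):
    """Hypothetical helper: how many times naive fibonacci(n) gets called."""
    if n < 2:
        return 1
    return 1 + fib_calls(n - 1) + fib_calls(n - 2)


N = 10
# Every call except the top-level one becomes a submitted task. Giving the
# bounded pool one worker per task means a task blocked on a.result() can
# never starve a child that is still waiting in the queue.
pool = ThreadPoolExecutor(max_workers=fib_calls(N) - 1)


def fibonacci(n):
    if n < 2:
        return n
    a = pool.submit(fibonacci, n - 1)
    b = pool.submit(fibonacci, n - 2)
    return a.result() + b.result()


result = fibonacci(N)
pool.shutdown()
print(fib_calls(N) - 1, result)  # 176 55
```

For n = 10 that already means up to 176 threads, which shows how quickly thread-per-task recursion grows, whichever pool implementation hands out the threads.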
Performance tuning:
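I strongly recommend looking at gevent, because it gives you high concurrency without the thread overhead. That is not always the case, but your code is actually the poster child for gevent usage. Here is an example: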
```python
import gevent


def fibonacci(n):
    print(n)
    if n < 2:
        return n
    # Each subproblem runs in its own greenlet; get() waits for its result
    a = gevent.spawn(fibonacci, n - 1)
    b = gevent.spawn(fibonacci, n - 2)
    return a.get() + b.get()


print(fibonacci(10))
```
Completely unscientific, but on my machine the code above runs 9 times faster than its threaded equivalent.
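If installing gevent is not an option, the standard library's asyncio offers the same kind of cooperative, single-threaded concurrency. This is a swapped-in technique rather than part of the original suggestion, and no timing is claimed for it: a minimal sketch, assuming Python 3.7+ for asyncio.run and asyncio.create_task.

```python
import asyncio


async def fibonacci(n):
    if n < 2:
        return n
    # Like gevent.spawn, create_task schedules each subproblem on the same
    # event loop; awaiting the tasks plays the role of greenlet.get().
    a = asyncio.create_task(fibonacci(n - 1))
    b = asyncio.create_task(fibonacci(n - 2))
    return await a + await b


print(asyncio.run(fibonacci(10)))  # 55
```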
I hope this helps.