Well, here's an interesting note. If you run the following:
import numpy
from multiprocessing import Pool
a = numpy.arange(1000000)
pool = Pool(processes = 5)
result = pool.map(numpy.sin, a)
UnpicklingError: NEWOBJ class argument has NULL tp_new
Wasn't expecting that. So what's going on? Well:
>>> help(numpy.sin)
Help on ufunc object:
sin = class ufunc(__builtin__.object)
| Functions that operate element by element on whole arrays.
|
| To see the documentation for a specific ufunc, use np.info(). For
| example, np.info(np.sin). Because ufuncs are written in C
| (for speed) and linked into Python with NumPy's ufunc facility,
| Python's help() function finds this page whenever help() is called
| on a ufunc.
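Yep, numpy.sin is a ufunc implemented in C. pool.map has to pickle the function it sends to the worker processes, and this ufunc object does not survive that round trip, so you can't really use it directly with multiprocessing.
So we have to wrap it in another function.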
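To see ahead of time whether a callable can be handed to pool.map, you can try the pickle round trip yourself. This is only a minimal sketch for Python 3: `survives_pickle` is a helper name I made up, and whether numpy.sin itself passes may depend on your NumPy version, so no output is shown for it.

```python
import pickle

import numpy as np


def numpy_sin(value):
    # Plain module-level wrapper around the ufunc, as in the answer.
    return np.sin(value)


def survives_pickle(func):
    """Hypothetical helper: True if func can be pickled and unpickled."""
    try:
        pickle.loads(pickle.dumps(func))
        return True
    except Exception:
        return False


if __name__ == "__main__":
    # Module-level functions are pickled by reference (module + name).
    print("numpy_sin:", survives_pickle(numpy_sin))
    # The result for the raw ufunc may vary between NumPy versions.
    print("numpy.sin:", survives_pickle(np.sin))
    # Lambdas cannot be pickled by the standard pickle module.
    print("lambda:   ", survives_pickle(lambda x: x))
```

If the check fails for your function, wrapping it in a module-level def like numpy_sin is the usual workaround.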
Performance:
import time
import numpy
from multiprocessing import Pool
def numpy_sin(value):
    return numpy.sin(value)
a = numpy.arange(1000000)
pool = Pool(processes = 5)
start = time.time()
result = numpy.sin(a)
end = time.time()
print 'Singled threaded %f' % (end - start)
start = time.time()
result = pool.map(numpy_sin, a)
pool.close()
pool.join()
end = time.time()
print 'Multithreaded %f' % (end - start)
$ python perf.py
Singled threaded 0.032201
Multithreaded 10.550432
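Wow, wasn't expecting that either. There are a couple of issues. For starters, we are calling a Python function once per element, even if it's just a wrapper, instead of making one call into pure C. There's also the overhead of copying the values: multiprocessing doesn't share data by default, so each value has to be pickled and copied back and forth between processes.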
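To get a rough feel for the copying overhead, you can compare how many bytes pickle produces for the values one at a time versus the array as a whole. This is only an illustration, assuming Python 3: the two helper names are mine, and pool.map batches items internally, so these numbers are not exactly what it sends over the wire.

```python
import pickle

import numpy as np


def pickled_size_per_item(values):
    """Total bytes when every element is pickled on its own."""
    return sum(len(pickle.dumps(v)) for v in values)


def pickled_size_whole(values):
    """Bytes when the whole array is pickled in one go."""
    return len(pickle.dumps(values))


if __name__ == "__main__":
    a = np.arange(10000)
    print("per item:", pickled_size_per_item(a))
    print("whole:   ", pickled_size_whole(a))
```

Each NumPy scalar carries its own type information when pickled, so the per-item total comes out several times larger than the single array.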
Note, however, what happens if the data is split up properly:
import time
import numpy
from multiprocessing import Pool
def numpy_sin(value):
    return numpy.sin(value)
a = [numpy.arange(100000) for _ in xrange(10)]
pool = Pool(processes = 5)
start = time.time()
result = numpy.sin(a)
end = time.time()
print 'Singled threaded %f' % (end - start)
start = time.time()
result = pool.map(numpy_sin, a)
pool.close()
pool.join()
end = time.time()
print 'Multithreaded %f' % (end - start)
$ python perf.py
Singled threaded 0.150192
Multithreaded 0.055083
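For anyone on Python 3, the chunked version above could look roughly like this. It's a sketch rather than a drop-in replacement: `sin_chunk`, `split_into_chunks` and `parallel_sin` are names I made up, it uses numpy.array_split to do the splitting, and the multiprocess timing here also includes starting the pool, so your numbers will differ from mine.

```python
import time
from multiprocessing import Pool

import numpy as np


def sin_chunk(chunk):
    # One vectorized call per chunk instead of one call per element.
    return np.sin(chunk)


def split_into_chunks(values, n_chunks):
    # array_split tolerates lengths that do not divide evenly.
    return np.array_split(values, n_chunks)


def parallel_sin(values, processes=5, n_chunks=10):
    chunks = split_into_chunks(values, n_chunks)
    with Pool(processes=processes) as pool:
        parts = pool.map(sin_chunk, chunks)
    # Stitch the per-chunk results back into one array.
    return np.concatenate(parts)


if __name__ == "__main__":
    a = np.arange(1000000)

    start = time.perf_counter()
    serial = np.sin(a)
    print("Single process %f" % (time.perf_counter() - start))

    start = time.perf_counter()
    parallel = parallel_sin(a)  # includes pool start-up time
    print("Multiprocess %f" % (time.perf_counter() - start))

    print("Same result:", np.allclose(serial, parallel))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the script.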
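So what can we take from this? Multiprocessing is great, but we should always test and compare: sometimes it's faster and sometimes it's slower, depending on how it's used...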
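Granted, the function you pass to pool.map is not the one I used here, but I'd recommend you first verify with your own function that multiprocessing really does speed up the computation. The overhead of copying values back and forth may be what's hurting you.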
Either way, I do also believe that using pool.map is the best and safest way to parallelize code.
I hope this helps.