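So, as a first step, I want to run a simple linear regression and measure the performance difference between serial and parallel computation. The data are 10,000,000 points (4 features and 1 target variable) randomly generated from a normal distribution.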
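For reference, each call to `linreg` fits an ordinary least-squares model with `sm.OLS(y_vals, X_vals).fit()`. Because `X_vals` is passed without `sm.add_constant`, that model has no intercept. The following is a minimal NumPy sketch of the same coefficient estimate. It assumes a much smaller sample of 1,000 rows, the helper name `ols_coefficients` is hypothetical, and it reproduces only the coefficients, not the full `summary()` output.

```python
import numpy as np


def ols_coefficients(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for y ~ X with no intercept term.

    This matches what sm.OLS(y, X) estimates when X has no constant column.
    (Hypothetical helper, for illustration only.)
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Same layout as the question: column 0 is the target, columns 1-4 are features.
# 1,000 rows instead of 10,000,000 so the sketch runs instantly.
rng = np.random.default_rng(42)
data = rng.normal(loc=3, scale=100, size=(1_000, 5))
y = data[:, 0]
X = data[:, 1:]

beta = ols_coefficients(X, y)
print(beta.shape)  # one coefficient per feature: (4,)
```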
Here is my code:

```python
import pandas as pd
import numpy as np
import random
from scoop import futures
import statsmodels.api as sm
from time import time

def linreg(vals):
    global model
    model = sm.OLS(y_vals, X_vals).fit()
    return model
    print(model.summary())

if __name__ == '__main__':
    random.seed(42)
    vals = pd.DataFrame(np.random.normal(loc=3, scale=100, size=(10000000, 5)))
    vals.columns = ['dep', 'ind1', 'ind2', 'ind3', 'ind4']
    y_vals = vals['dep']
    X_vals = vals[['ind1', 'ind2', 'ind3', 'ind4']]

    # Serial run with the built-in map
    bt = time()
    model_vals = list(map(linreg, [1, 2, 3]))
    mval = model_vals[0]
    print(mval.summary())
    serial_time = time() - bt

    # Parallel run with SCOOP's futures.map
    bt1 = time()
    model_vals_1 = list(futures.map(linreg, [1, 2, 3]))
    mval_1 = model_vals_1[0]
    print(mval_1.summary())
    parallel_time = time() - bt1

    print(serial_time, parallel_time)
```
However, after the regression summary is indeed produced serially by Python's built-in `map` function, the following error is raised:

```
Traceback (most recent call last):
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\bootstrap\__main__.py", line 302, in <module>
    b.main()
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\bootstrap\__main__.py", line 92, in main
    self.run()
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\bootstrap\__main__.py", line 290, in run
    futures_startup()
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\bootstrap\__main__.py", line 271, in futures_startup
    run_name="__main__"
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\futures.py", line 64, in _startup
    result = _controller.switch(rootFuture, *args, **kargs)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\_control.py", line 253, in runController
    raise future.exceptionValue
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\_control.py", line 127, in runFuture
    future.resultValue = future.callable(*future.args, **future.kargs)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "Scoop_map_linear_regression1.py", line 33, in <module>
    model_vals_1 = list(futures.map(linreg, [1,2,3]))
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\futures.py", line 102, in _mapGenerator
    for future in _waitAll(*futures):
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\futures.py", line 358, in _waitAll
    for f in _waitAny(future):
  File "C:\Users\niccolo.gentile\AppData\Local\Continuum\anaconda3\envs\tensorenviron\lib\site-packages\scoop\futures.py", line 335, in _waitAny
    raise childFuture.exceptionValue
NameError: name 'y_vals' is not defined
```

This means the code stops at `model_vals_1 = list(futures.map(linreg, [1,2,3]))`.
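A likely cause: when the script is started with `-m scoop`, the workers load the module without executing its `if __name__ == '__main__':` block. `y_vals` and `X_vals` are created only inside that block, so they never exist in the namespace where a worker runs `linreg`. The built-in `map` runs everything in the launching process, where those globals do exist, which would explain why the serial run succeeds. The following is a minimal sketch of one common fix. It assumes the data can be passed to the mapped function explicitly instead of read from globals. The names `fit_ols` and `make_tasks` are hypothetical, and NumPy least squares stands in for `statsmodels`.

```python
import numpy as np


def fit_ols(task):
    """Fit one no-intercept OLS model from an (X, y) pair passed in explicitly.

    The data arrive as the argument, so the function no longer depends on
    module globals that exist only when the script runs as __main__.
    """
    X, y = task
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def make_tasks(n_rows, n_tasks, seed=42):
    """Build n_tasks identical (X, y) pairs (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = rng.normal(loc=3, scale=100, size=(n_rows, 5))
    y, X = data[:, 0], data[:, 1:]
    return [(X, y)] * n_tasks


if __name__ == "__main__":
    tasks = make_tasks(n_rows=10_000, n_tasks=3)
    # Built-in map: everything runs in this process.
    serial = list(map(fit_ols, tasks))
    # Under `python -m scoop`, the parallel equivalent would be
    # list(futures.map(fit_ols, tasks)); each (X, y) pair is then
    # pickled and sent to a worker along with the call.
    print([b.shape for b in serial])
```

Passing the data explicitly replaces the `NameError` with serialization cost. The full 10,000,000 × 5 float64 frame is about 400 MB (10,000,000 × 5 × 8 bytes), and each task likely carries its own pickled copy, so transfer time may consume much of the parallel gain. An alternative is to build the data at module level, outside the `__main__` guard, so that each worker creates its own copy. Note that `random.seed(42)` seeds Python's `random` module, not NumPy. For every worker to see the same data, the generation step would need a seeded NumPy generator.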
I also tried running it twice with the built-in `map`, and in that case no error occurred.
I should also point out that I launch the script correctly, with the `-m scoop` flag, from the Anaconda Prompt command line.
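If I launch it without the `-m scoop` flag, it is not parallelized. It runs to completion, but it simply calls Python's built-in `map` function twice, as the warning reports. In other words, when the script is started without `-m scoop`, `futures.map` is replaced by the built-in `map`.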
My goal is to run `futures.map` in parallel and measure the performance improvement.
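One way to make the serial and parallel measurements comparable is to time only the mapping step with the same helper and report the ratio. The following is a minimal sketch under that assumption. The helper names `time_map`, `speedup`, `efficiency` and the toy workload `busy` are hypothetical. `time.perf_counter` is used instead of `time.time` because it is a monotonic, high-resolution clock intended for measuring intervals.

```python
import time
from typing import Callable, Iterable


def busy(n: int) -> int:
    """Toy CPU-bound workload (hypothetical); module-level so it can be pickled."""
    return sum(i * i for i in range(n))


def time_map(mapper: Callable, func: Callable, tasks: Iterable) -> float:
    """Wall-clock seconds needed to fully consume mapper(func, tasks).

    list() forces every result to be computed, because both the built-in
    map and SCOOP's futures.map hand back iterators.
    """
    start = time.perf_counter()
    list(mapper(func, tasks))
    return time.perf_counter() - start


def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Speedup ratio S = T_serial / T_parallel."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds: float, parallel_seconds: float, n_workers: int) -> float:
    """Speedup divided by the number of workers; 1.0 would be perfect scaling."""
    return speedup(serial_seconds, parallel_seconds) / n_workers


if __name__ == "__main__":
    t_serial = time_map(map, busy, [200_000] * 3)
    print(f"serial: {t_serial:.3f} s")
```

Under `python -m scoop`, the same helper could time both runs of the question's function, for example `time_map(map, linreg, [1, 2, 3])` and `time_map(futures.map, linreg, [1, 2, 3])`, once the `NameError` is resolved. This keeps data generation and `summary()` printing out of the measured interval.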
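I am stating all this explicitly to avoid answers that repeat the points above, and to keep the question from being put on hold as a result.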
Any comments are highly appreciated and welcome.