1. pandarallel (pip install )
For a simple use case with a pandas DataFrame df and a function func to apply, just replace the classic apply with parallel_apply.
from pandarallel import pandarallel
# Initialization
pandarallel.initialize()
# Standard pandas apply
df.apply(func)
# Parallel apply
df.parallel_apply(func)
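Note that you can still use the classic apply method if you do not want to parallelize the computation.
You can also display one progress bar per worker CPU by passing progress_bar=True to the initialize function.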
2. joblib (pip install )
# Embarrassingly parallel helper: to make it easy to write readable parallel code and debug it quickly
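import time
from math import sqrt
from joblib import Parallel, delayed

def test():
    start = time.time()
    # Serial baseline: n_jobs=1 runs every task in the calling process
    result1 = Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10000))
    end = time.time()
    print(end-start)
    # Eight workers: for tasks as small as sqrt, dispatch overhead can outweigh the gain
    result2 = Parallel(n_jobs=8)(delayed(sqrt)(i**2) for i in range(10000))
    end2 = time.time()
    print(end2-end)

if __name__ == "__main__":
    test()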
-------Output----------
0.4434356689453125
0.6346755027770996
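In the output above, the run with n_jobs=8 takes longer than the run with n_jobs=1. A likely explanation is that each sqrt call is so cheap that the cost of dispatching tasks to workers dominates. One common way to reduce that cost is to hand each worker a larger chunk of work. The sketch below assumes this chunked approach; sqrt_chunk, run_chunked, and chunk_size are hypothetical names that do not appear in the original, and whether this actually beats the serial run depends on the machine and the task.

```python
from math import sqrt

from joblib import Parallel, delayed


def sqrt_chunk(start, stop):
    # Hypothetical helper: process a whole range inside one task,
    # so the dispatch cost is paid once per chunk, not once per number.
    return [sqrt(i ** 2) for i in range(start, stop)]


def run_chunked(n=10000, n_jobs=8, chunk_size=1250):
    # Split range(n) into consecutive chunks and submit one task per chunk.
    chunks = Parallel(n_jobs=n_jobs)(
        delayed(sqrt_chunk)(start, min(start + chunk_size, n))
        for start in range(0, n, chunk_size)
    )
    # Parallel returns results in submission order, so flattening
    # the chunks restores the original ordering.
    return [value for chunk in chunks for value in chunk]


if __name__ == "__main__":
    results = run_chunked()
    print(len(results))
```

With the defaults this submits 8 tasks instead of 10000, so the comparison against n_jobs=1 measures the computation more than the scheduling.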
3. multiprocessing
import multiprocessing as mp
with mp.Pool(mp.cpu_count()) as pool: