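The difference between concurrency and parallelism
- Concurrency: several tasks make progress over the same period of time. A single CPU with multiprogramming (switching between tasks) is enough to achieve it.
- Parallelism: several tasks run at the same instant, which is only possible with more than one CPU.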
Use cases
- CPU-bound (compute-intensive)
- I/O-bound: multithreading
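
As a rough illustration of the I/O-bound case, the sketch below runs a simulated blocking call on a thread pool. `fake_io`, its 0.05 s sleep, and the two `run_*` helpers are invented for this example and are not part of the tests that follow.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id):
    """Hypothetical I/O-bound task: sleeping stands in for a blocking read."""
    time.sleep(0.05)
    return task_id * 2


def run_sequential(task_ids):
    # Each call waits for the previous one to finish.
    return [fake_io(t) for t in task_ids]


def run_threaded(task_ids, workers=4):
    # Threads overlap the waiting time, so wall-clock time drops for I/O-bound work.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_io, task_ids))
```

For CPU-bound work in CPython, the global interpreter lock keeps threads from executing Python bytecode in parallel, which is why the tests below rely on processes instead.
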
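Testing several approaches
Important notes
- The timings below vary from run to run, but they are still indicative at the order-of-magnitude level.
- Test environment
  - OS: 64-bit Windows 10
  - Anaconda3: 1915, 64-bit
  - Python: 3.7.3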
- Conclusion

Method | Elapsed time (s) |
---|---|
for & map | 0.9460015296936035 |
[] & map | 0.8980069160461426 |
numba.jit & for & map | 1.0460188388824463 |
numba.jit & [] & map | 0.9310059547424316 |
concurrent.futures.ProcessPoolExecutor | 1.4520056247711182 |
multiprocessing.Pool | 1.6059844493865967 |
multiprocessing.Process | 0.6759865283966064 |
pp (single machine) | 0.004994630813598633 |
joblib | 1.2500150203704834 |

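
Because single measurements fluctuate, one way to make such comparisons steadier is to repeat each call and report the minimum and the median. The helper below is only a sketch: `time_call` and its `repeat` parameter are hypothetical names, and the way the figures in the table were actually measured is not shown in these notes.

```python
import statistics
import time


def time_call(func, *args, repeat=5, **kwargs):
    """Call func `repeat` times and return (min, median) elapsed seconds."""
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples)
```

A call such as `time_call(read_csv_open, file)` would then give two numbers per method instead of one.
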
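Base functions under test

    import pandas as pd


    def read_csv_pd(file):
        """Read with pandas; return the 'OC NO' values of rows whose status is < 5."""
        df = pd.read_csv(file)
        df = df.dropna()  # drop rows with missing values
        df = df[df['status'] < 5]
        return df['OC NO'].tolist()


    def read_csv_open(file):
        """
        Read the file and return the IDs whose status is non-empty and less than 5.

        :param file: path to the CSV file
        :return: set of IDs (first column)
        """
        set_id = set()
        with open(file, encoding='utf8') as f:
            lines = f.readlines()
        for num, line in enumerate(lines):
            if num == 0:  # skip the header row
                continue
            fields = line.rstrip('\n').split(',')
            # Guard against short rows and a status that is only a newline.
            if len(fields) > 1 and len(fields[1]) > 0 and int(fields[1]) < 5:
                set_id.add(fields[0])
        return set_id
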
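
To make the expected input concrete, the sketch below writes a tiny CSV and runs a compact variant of `read_csv_open` on it. The layout (ID in the first column, status in the second) follows from the two functions; the sample rows and the helpers `write_sample_csv` and `demo` are invented for illustration.

```python
import os
import tempfile


def read_csv_open(file):
    """Compact variant of the function above: IDs with a non-empty status < 5."""
    set_id = set()
    with open(file, encoding='utf8') as f:
        next(f, None)  # skip the header row
        for line in f:
            fields = line.rstrip('\n').split(',')
            if len(fields) > 1 and fields[1] and int(fields[1]) < 5:
                set_id.add(fields[0])
    return set_id


def write_sample_csv(path):
    # Invented rows: one status too large, one empty status, one duplicate ID.
    rows = ["OC NO,status", "A001,1", "A002,7", "A003,", "A004,4", "A001,2"]
    with open(path, "w", encoding="utf8") as f:
        f.write("\n".join(rows) + "\n")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        write_sample_csv(path)
        return read_csv_open(path)
```

Here `demo()` returns `{"A001", "A004"}`: `A002` is filtered out by its status, `A003` has no status, and the duplicate `A001` collapses because the result is a set.
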
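Test 1: `pd.read_csv` vs. `open` inside a `for` loop

    # func is either read_csv_pd or read_csv_open; files is the list of CSV paths
    set_id_2 = set()
    for file in files:
        mid_set = func(file=file)
        set_id_2.update(mid_set)
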
Results

Method | Elapsed time (s) |
---|---|
read_csv_pd | 2.1082966327667236 |
read_csv_open | 0.9119999408721924 |

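Test 2: `map` driven by a `for` loop vs. by a list comprehension

    set_id_1, set_id_2 = set(), set()

    # Variant "for": explicit loop over map
    for mid_set in map(read_csv_open, files):
        set_id_1.update(mid_set)

    # Variant "[]": list comprehension over map, used only for its side effect
    [set_id_2.update(mid_set) for mid_set in map(read_csv_open, files)]
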
Results

Method | Elapsed time (s) |
---|---|
for | 0.9460015296936035 |
[] | 0.8980069160461426 |

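
The list-comprehension variant is used only for its side effect and builds a throwaway list of `None` values. A possible alternative, shown here purely as a sketch, is to let `set.union` merge the per-file results. `fake_reader` is a hypothetical stand-in for `read_csv_open` so that the example runs without any files; this variant was not part of the measurements above.

```python
def fake_reader(name):
    """Hypothetical stand-in for read_csv_open: a small set per 'file'."""
    return {name + "-1", name + "-2"}


def merge_with_loop(reader, files):
    merged = set()
    for mid_set in map(reader, files):
        merged.update(mid_set)
    return merged


def merge_with_union(reader, files):
    # Unpacks all per-file sets and merges them in one call; no list of None is built.
    return set().union(*map(reader, files))
```

Both helpers return the same set, so the choice between them is mostly a matter of readability.
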
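Test 3: `concurrent.futures.ProcessPoolExecutor`

    from concurrent.futures import ProcessPoolExecutor

    set_id = set()
    with ProcessPoolExecutor(3) as pool:  # 3 worker processes
        for mid_set in pool.map(read_csv_open, files):
            set_id.update(mid_set)

- Elapsed time: 1.4520056247711182 s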
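
On Windows, which is the test environment here, child processes are started by spawning a fresh interpreter that re-imports the main module, so the pool has to be created behind an `if __name__ == "__main__":` guard. The sketch below shows one way to arrange a complete script. `read_first_column` is a hypothetical stand-in for `read_csv_open`, and `merge_sets` and `collect_parallel` are names made up for this example.

```python
import sys
from concurrent.futures import ProcessPoolExecutor


def read_first_column(file):
    """Hypothetical stand-in for read_csv_open: first field of each data row."""
    with open(file, encoding="utf8") as f:
        next(f, None)  # skip the header row
        return {line.split(",")[0].strip() for line in f if line.strip()}


def merge_sets(sets):
    merged = set()
    for s in sets:
        merged.update(s)
    return merged


def collect_parallel(files, workers=3):
    # The worker must be a module-level function so it can be pickled for the children.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return merge_sets(pool.map(read_first_column, files))


if __name__ == "__main__":
    # Guard needed on Windows: spawned children re-import this module.
    csv_files = sys.argv[1:]
    if csv_files:
        print(len(collect_parallel(csv_files)))
```

With only a handful of quick files, process start-up and the pickling of results can outweigh the gain, which may be part of the reason the pooled runs here are slower than the plain loop.
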
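Test 4: `multiprocessing.Pool`

    import multiprocessing

    # cores: number of worker processes (its value is not given in these notes)
    with multiprocessing.Pool(cores) as pool:
        rs = pool.map(read_csv_open, files)  # list with one set per file

- Elapsed time: 1.6059844493865967 s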
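Test 5: `multiprocessing.Process`

    # q, process_arr and logger are created beforehand (not shown in these notes);
    # read_csv_open_test is presumably a variant of read_csv_open that puts its result on q
    for file in files:
        logger.info(file)
        t = multiprocessing.Process(target=read_csv_open_test, kwargs={'file': file, 'q': q})
        process_arr.append(t)
        t.start()

- Elapsed time: 0.6759865283966064 s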
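
The loop above only starts the processes. A complete run also has to collect the results and wait for the children, and a timing that stops before that point would understate the cost. The following is a minimal sketch of the full pattern, assuming the results travel through a `multiprocessing.Queue`. `worker` is a hypothetical counterpart of `read_csv_open_test`, which is not shown in these notes, and `run_processes` is a name invented here.

```python
import multiprocessing


def worker(file, q):
    """Hypothetical counterpart of read_csv_open_test: put one result set on q."""
    with open(file, encoding="utf8") as f:
        next(f, None)  # skip the header row
        q.put({line.split(",")[0].strip() for line in f if line.strip()})


def run_processes(files):
    q = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=worker, kwargs={"file": f, "q": q})
        for f in files
    ]
    for p in procs:
        p.start()
    merged = set()
    # Drain the queue before join(): a child may block on exit until its queued data is read.
    for _ in procs:
        merged.update(q.get())
    for p in procs:
        p.join()
    return merged
```

As with the pool-based tests, on Windows the call to `run_processes` belongs under an `if __name__ == "__main__":` guard. The sketch also assumes every worker succeeds; a crashed worker would leave `q.get()` waiting.
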
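Test 6: `pp` on a single machine

    # submit(func, args, depfuncs, modules)
    job = job_server.submit(pp_test, (files,), (read_csv_open,), ())

- Elapsed time: 0.004994630813598633 s. `submit` returns a job handle immediately and the result only becomes available once `job()` is called, so this figure probably reflects job submission alone rather than the full read.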
Test 7: `joblib`

    import joblib

    # 4 workers; rs is a list with one set per file
    rs = joblib.Parallel(n_jobs=4)(joblib.delayed(read_csv_open)(file) for file in files)

- Elapsed time: 1.2500150203704834 s