Solution 2: Python multiprocessing
Time may seem long, but it slips away all the same; doing things faster, better, and more economically is what really counts.
The program has to do two things: (1) open the files; (2) merge the opened files together. Let's first run a short piece of code to measure how long each step takes.
import time
import pandas as pd

start = time.time()
df1 = pd.read_excel(r'c:/python/aSourseFiles/bi/bi (1).xlsx')
df2 = pd.read_excel(r'c:/python/aSourseFiles/bi/bi (2).xlsx')
df3 = pd.read_excel(r'c:/python/aSourseFiles/bi/bi (3).xlsx')
median = time.time()
df = pd.concat([df1, df2, df3])
end = time.time()
use1 = median - start  # time spent opening the files
use2 = end - median    # time spent concatenating the tables
print(use1)
print(use2)
The output is as follows:
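The same read-then-concat timing can be sketched in a self-contained way. This is an assumption-laden stand-in, not the original experiment: it generates a few small CSV files in a temporary directory (CSV instead of .xlsx so no Excel engine is needed) and uses time.perf_counter(), which is better suited than time.time() for measuring short intervals.

```python
import os
import tempfile
import time

import pandas as pd

# Hypothetical stand-in data: three small CSV parts in a temp directory.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmpdir, 'part%d.csv' % (i + 1))
    pd.DataFrame({'a': range(100), 'b': range(100)}).to_csv(p, index=False)
    paths.append(p)

start = time.perf_counter()
frames = [pd.read_csv(p) for p in paths]   # step 1: open the files
mid = time.perf_counter()
df = pd.concat(frames, ignore_index=True)  # step 2: merge them
end = time.perf_counter()

print('read: %.4f s, concat: %.4f s' % (mid - start, end - mid))
print(df.shape)
```

With real .xlsx files the read step dominates even more, since parsing Excel is far slower than parsing CSV.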
The analysis shows that the time is mostly spent opening the files, so using multiple processes to open several at once should, in theory, save time. Let's get to it:
import pandas as pd
import time
from multiprocessing import Pool

filelist = ['c:/python/aSourseFiles/bi/bi (' + str(i + 1) + ').xlsx' for i in range(40)]

def read_excel(path):
    temp = pd.read_excel(path)
    print('running')
    return temp

def merge_excel(temp):
    # callbacks run in the parent process, so updating the global df is safe
    global df
    df = pd.concat([df, temp])

if __name__ == '__main__':
    start = time.time()
    df = pd.read_excel(filelist[-1])  # use the last file as the base for merging
    p1 = Pool(8)  # a pool of 8 worker processes
    for i in range(39):
        # submit each file-read to the pool; when a task finishes,
        # merge_excel is called back with the result to do the merge
        p1.apply_async(read_excel, args=(filelist[i],), callback=merge_excel)
    p1.close()
    p1.join()
    print(df.shape)
    end = time.time()
    run_time = end - start
    print('This run took %.2f s' % run_time)