pandas 批量读取excel_pandas 批量读取excel_将Excel文件读取到Pandas DataFrame的更快方法...

这篇博客探讨了如何加速读取大量数据的Excel文件。作者分享了一个使用Pandas库读取14MB Excel文件的例子,该操作耗时9分钟。为了解决这个问题,作者提出了一种解决方案:首先通过VBScript将Excel转换为CSV,然后使用Pandas的read_csv函数读取CSV文件,这种方法将时间缩短到30秒。此外,还提供了创建VBScript的Python代码片段。
摘要由CSDN通过智能技术生成

41518

I have a 14MB Excel file with five worksheets that I'm reading into a Pandas dataframe, and although the code below works, it takes 9 minutes!

Does anyone have suggestions for speeding it up?

import pandas as pd

def OTT_read(xl,site_name):

df = pd.read_excel(xl.io,site_name,skiprows=2,parse_dates=0,index_col=0,

usecols=[0,1,2],header=None,

names=['date_time','%s_depth'%site_name,'%s_temp'%site_name])

return df

def make_OTT_df(FILEDIR,OTT_FILE):

xl = pd.ExcelFile(FILEDIR + OTT_FILE)

site_names = xl.sheet_names

df_list = [OTT_read(xl,site_name) for site_name in site_names]

return site_names,df_list

FILEDIR='c:/downloads/'

OTT_FILE='OTT_Data_All_stations.xlsx'

site_names_OTT,df_list_OTT = make_OTT_df(FILEDIR,OTT_FILE)

解决方案

As others have suggested, csv reading is faster. So if you are on windows and have Excel, you could call a vbscript to convert the Excel to csv and then read the csv. I tried the script below and it took about 30 seconds.

# create a list with sheet numbers you want to process

sheets = map(str,range(1,6))

# convert each sheet to csv and then read it using read_csv

df={}

from subprocess import call

excel='C:\\Users\\rsignell\\OTT_Data_All_stations.xlsx'

for sheet in sheets:

csv = 'C:\\Users\\rsignell\\test' + sheet + '.csv'

call(['cscript.exe', 'C:\\Users\\rsignell\\ExcelToCsv.vbs', excel, csv, sheet])

df[sheet]=pd.read_csv(csv)

Here's a little snippet of python to create the ExcelToCsv.vbs script:

#write vbscript to file

vbscript="""if WScript.Arguments.Count < 3 Then

WScript.Echo "Please specify the source and the destination files. Usage: ExcelToCsv "

Wscript.Quit

End If

csv_format = 6

Set objFSO = CreateObject("Scripting.FileSystemObject")

src_file = objFSO.GetAbsolutePathName(Wscript.Arguments.Item(0))

dest_file = objFSO.GetAbsolutePathName(WScript.Arguments.Item(1))

worksheet_number = CInt(WScript.Arguments.Item(2))

Dim oExcel

Set oExcel = CreateObject("Excel.Application")

Dim oBook

Set oBook = oExcel.Workbooks.Open(src_file)

oBook.Worksheets(worksheet_number).Activate

oBook.SaveAs dest_file, csv_format

oBook.Close False

oExcel.Quit

""";

f = open('ExcelToCsv.vbs','w')

f.write(vbscript.encode('utf-8'))

f.close()

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值