openpyxl is slow on large files: reading an Excel file with openpyxl is much slower than with xlrd

weixin_39881859

Published 2021-01-28 10:12:50

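Tags: openpyxl slow on large files

Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. Please include a link to the original source and this notice when reposting.

Article link: https://blog.csdn.net/weixin_39881859/article/details/113470237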

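I have an Excel spreadsheet that I need to import into SQL Server every day. The spreadsheet will contain about 250,000 rows across about 50 columns. I tested both openpyxl and xlrd using nearly identical code.

Here is the code I am using (minus the debug statements):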

    import xlrd
    import openpyxl


    def UseXlrd(file_name):
        # on_demand=True loads a sheet only when it is requested
        workbook = xlrd.open_workbook(file_name, on_demand=True)
        worksheet = workbook.sheet_by_index(0)

        # The first row holds the column headers
        first_row = []
        for col in range(worksheet.ncols):
            first_row.append(worksheet.cell_value(0, col))

        # Build one dict per data row, stripping whitespace from strings
        data = []
        for row in range(1, worksheet.nrows):
            record = {}
            for col in range(worksheet.ncols):
                if isinstance(worksheet.cell_value(row, col), str):
                    record[first_row[col]] = worksheet.cell_value(row, col).strip()
                else:
                    record[first_row[col]] = worksheet.cell_value(row, col)
            data.append(record)
        return data


    def UseOpenpyxl(file_name):
        wb = openpyxl.load_workbook(file_name, read_only=True)
        sheet = wb.active

        # The first row holds the column headers (openpyxl indices are 1-based)
        first_row = []
        for col in range(1, sheet.max_column + 1):
            first_row.append(sheet.cell(row=1, column=col).value)

        # Build one dict per data row, looking up each cell individually
        data = []
        for r in range(2, sheet.max_row + 1):
            record = {}
            for col in range(sheet.max_column):
                if isinstance(sheet.cell(row=r, column=col + 1).value, str):
                    record[first_row[col]] = sheet.cell(row=r, column=col + 1).value.strip()
                else:
                    record[first_row[col]] = sheet.cell(row=r, column=col + 1).value
            data.append(record)
        return data


    xlrd_results = UseXlrd('foo.xls')
    # Note: openpyxl does not open the legacy .xls format, so this call
    # presumably ran against an .xlsx copy of the same data.
    openpyxl_results = UseOpenpyxl('foo.xls')

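Passing in the same Excel file containing 3,500 rows produces drastically different run times. With xlrd I can read the entire file into a list of dictionaries in under 2 seconds. With openpyxl I get the following results: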

    Reading Excel File...
    Read 100 lines in 114.14509415626526 seconds
    Read 200 lines in 471.43183994293213 seconds
    Read 300 lines in 982.5288782119751 seconds
    Read 400 lines in 1729.3348784446716 seconds
    Read 500 lines in 2774.886833190918 seconds
    Read 600 lines in 4384.074863195419 seconds
    Read 700 lines in 6396.7723388671875 seconds
    Read 800 lines in 7998.775000572205 seconds
    Read 900 lines in 11018.460735321045 seconds

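While I could use xlrd in the final script, I would have to hard-code a lot of formatting because of various issues (for example, int values are read as float, dates as int, and datetimes as float). Since I need to reuse this code for several more imports, it makes no sense to hard-code specific columns to format them correctly and then maintain similar code in 4 different scripts.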
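To illustrate the kind of fix-up involved: xlrd reports a type code for every cell, so in principle the conversion could key off the cell type instead of off specific columns. The following is only a minimal sketch under that assumption. The names `normalize_cell`, `excel_serial_to_datetime`, and `read_row` are hypothetical, the two constants mirror the values of `xlrd.XL_CELL_NUMBER` and `xlrd.XL_CELL_DATE`, and treating every whole-valued number as an int is an assumption that may not suit every column.

```python
import datetime

# Same values as xlrd.XL_CELL_NUMBER and xlrd.XL_CELL_DATE, repeated here
# so the conversion logic can be tried without xlrd installed.
XL_CELL_NUMBER = 2
XL_CELL_DATE = 3


def excel_serial_to_datetime(serial, datemode=0):
    """Convert an Excel date serial number to a datetime.

    datemode 0 is the 1900 date system, 1 is the 1904 date system
    (xlrd exposes the workbook's setting as workbook.datemode).
    """
    if datemode:
        epoch = datetime.datetime(1904, 1, 1)
    elif serial < 60:
        # Before Excel's fictitious 1900-02-29 the offset is one day smaller.
        epoch = datetime.datetime(1899, 12, 31)
    else:
        epoch = datetime.datetime(1899, 12, 30)
    return epoch + datetime.timedelta(days=serial)


def normalize_cell(value, ctype, datemode=0):
    """Return a cell value converted according to its xlrd type code."""
    if ctype == XL_CELL_DATE:
        return excel_serial_to_datetime(value, datemode)
    if ctype == XL_CELL_NUMBER and float(value).is_integer():
        # Assumption: a whole-valued float was meant to be an int.
        return int(value)
    if isinstance(value, str):
        return value.strip()
    return value


def read_row(worksheet, row, datemode):
    """Hypothetical helper: normalized values for one row of an xlrd sheet."""
    return [
        normalize_cell(
            worksheet.cell_value(row, col),
            worksheet.cell_type(row, col),
            datemode,
        )
        for col in range(worksheet.ncols)
    ]
```

xlrd also ships `xlrd.xldate_as_datetime(value, workbook.datemode)`, which could replace the hand-written date conversion here. Even so, this only reduces the hard-coding, and I would still rather not maintain it in several scripts.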

Any suggestions on how to proceed?
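One direction that seems worth trying: as far as I understand, a read-only openpyxl worksheet is streamed from the file, so each `sheet.cell(row=..., column=...)` lookup has to scan the sheet again from the start. That would explain why the timings above grow much faster than the number of rows. Iterating with `sheet.iter_rows()` walks the sheet once instead. The following is a minimal sketch under that assumption, not a measured fix. `rows_to_records` and `use_openpyxl_streaming` are hypothetical names, and `values_only=True` needs a reasonably recent openpyxl release.

```python
def rows_to_records(rows):
    """Turn an iterable of row tuples (header row first) into a list of dicts."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return []
    records = []
    for row in rows:
        record = {}
        for name, value in zip(header, row):
            # Same clean-up as the original code: strip strings, keep the rest.
            record[name] = value.strip() if isinstance(value, str) else value
        records.append(record)
    return records


def use_openpyxl_streaming(file_name):
    """Read the active sheet in a single pass instead of cell-by-cell lookups."""
    # Imported here so rows_to_records can be tried without openpyxl installed.
    import openpyxl

    wb = openpyxl.load_workbook(file_name, read_only=True)
    try:
        sheet = wb.active
        # iter_rows walks the sheet once, front to back; values_only=True
        # yields plain tuples of cell values instead of cell objects.
        return rows_to_records(sheet.iter_rows(values_only=True))
    finally:
        # Read-only workbooks keep the file open until closed explicitly.
        wb.close()
```

Called as `use_openpyxl_streaming('foo.xlsx')`, this should return the same list of dictionaries as `UseOpenpyxl`, and openpyxl would still hand back the cell types it infers, which is the reason for preferring it over xlrd here.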
