python怎么读excelsheet,Python:使用Openpyxl读取大型Excel工作表

I have an Excel file containing about 400 worksheets, 375 of which I need to save out as CSV files. I've tried a VBA solution, but Excel has issues just opening this workbook.

I've created a python script to do just that. However, it rapidly consumes all available memory and pretty much stops working after 25 sheets are exported. Does anybody have a suggestion on how I might improve this code?

import openpyxl

import csv

import time

print(time.ctime())

importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", data_only = True, keep_vba = False)

tabnames = importedfile.get_sheet_names()

substring = "Keyword"

for num in tabnames:

if num.find(substring) > -1:

sheet=importedfile.get_sheet_by_name(num)

name = "C:/Users/User/Desktop/Test/" + num + ".csv"

with open(name, 'w', newline='') as file:

savefile = csv.writer(file)

for i in sheet.rows:

savefile.writerow([cell.value for cell in i])

file.close()

print(time.ctime())

Any help would be appreciated.

Thanks

EDIT: I'm using windows 7 and python 3.4.3. I'm also open to solutions in R, VBA, or SPSS.

解决方案

Try using the read_only=True property for load_workbook() class, this causes the worksheets you get to be IterableWroksheet , meaning you can only iterate over them, you cannot directly use column/row numbers to access cell values in it. This would provide near constant memory consumption according to documentation .

Also, you do not need to close the file, with statement would handle that for you.

Example -

import openpyxl

import csv

import time

print(time.ctime())

importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", read_only = True, keep_vba = False)

tabnames = importedfile.get_sheet_names()

substring = "Keyword"

for num in tabnames:

if num.find(substring) > -1:

sheet=importedfile.get_sheet_by_name(num)

name = "C:/Users/User/Desktop/Test/" + num + ".csv"

with open(name, 'w', newline='') as file:

savefile = csv.writer(file)

for i in sheet.rows:

savefile.writerow([cell.value for cell in i])

print(time.ctime())

Sometimes, you will need to open or write extremely large XLSX files, and the common routines in openpyxl won’t be able to handle that load. Fortunately, there are two modes that enable you to read and write unlimited amounts of data with (near) constant memory consumption.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值