Python: Reading Large Excel Worksheets with Openpyxl

Latest recommended article published 2024-07-18 10:10:14

weixin_39901685


I have an Excel file containing about 400 worksheets, 375 of which I need to save out as CSV files. I've tried a VBA solution, but Excel has issues just opening this workbook.

I've created a Python script to do just that. However, it rapidly consumes all available memory and more or less stops working after 25 sheets have been exported. Does anybody have a suggestion on how I might improve this code?

import openpyxl
import csv
import time

print(time.ctime())

importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", data_only = True, keep_vba = False)

tabnames = importedfile.get_sheet_names()
substring = "Keyword"

for num in tabnames:
    if num.find(substring) > -1:
        sheet=importedfile.get_sheet_by_name(num)
        name = "C:/Users/User/Desktop/Test/" + num + ".csv"
        with open(name, 'w', newline='') as file:
            savefile = csv.writer(file)
            for i in sheet.rows:
                savefile.writerow([cell.value for cell in i])
        file.close()

print(time.ctime())

Any help would be appreciated.

Thanks

EDIT: I'm using Windows 7 and Python 3.4.3. I'm also open to solutions in R, VBA, or SPSS.

Solution

Try passing read_only=True to load_workbook(). The worksheets you get back are then of type IterableWorksheet, which means you can only iterate over them; you cannot use row/column numbers to access cell values directly. According to the documentation, this gives near-constant memory consumption.

Also, you do not need to close the file yourself; the with statement handles that for you.

Example:

import openpyxl
import csv
import time

print(time.ctime())

importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", read_only = True, keep_vba = False)

tabnames = importedfile.get_sheet_names()
substring = "Keyword"

for num in tabnames:
    if num.find(substring) > -1:
        sheet=importedfile.get_sheet_by_name(num)
        name = "C:/Users/User/Desktop/Test/" + num + ".csv"
        with open(name, 'w', newline='') as file:
            savefile = csv.writer(file)
            for i in sheet.rows:
                savefile.writerow([cell.value for cell in i])

print(time.ctime())

The openpyxl documentation describes these modes as follows: "Sometimes, you will need to open or write extremely large XLSX files, and the common routines in openpyxl won't be able to handle that load. Fortunately, there are two modes that enable you to read and write unlimited amounts of data with (near) constant memory consumption."
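For readers on a recent openpyxl release, the same read-only approach can be sketched with the non-deprecated accessors: `wb.sheetnames` and `wb[name]` instead of `get_sheet_names()` and `get_sheet_by_name()`, and `iter_rows(values_only=True)` to stream plain values. This is only a sketch under some assumptions. The function names (`matching_names`, `write_rows`, `export_matching_sheets`) are hypothetical, `data_only=True` is carried over from the question so that cached values rather than formulas are written, and the UTF-8 output encoding is a choice made here, not something the original answer specifies.

```python
import csv
import os


def matching_names(sheet_names, substring):
    """Return the sheet names that contain `substring` (hypothetical helper)."""
    return [name for name in sheet_names if substring in name]


def write_rows(rows, csv_path):
    """Stream an iterable of row tuples to a CSV file and return the row count."""
    count = 0
    # The with statement closes the file; no explicit close() is needed.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def export_matching_sheets(workbook_path, out_dir, substring):
    """Export every sheet whose title contains `substring` to its own CSV file."""
    # Imported lazily so the helpers above can be used without openpyxl installed.
    from openpyxl import load_workbook

    # read_only=True streams rows instead of loading whole sheets into memory.
    # data_only=True (assumption, taken from the question) reads cached values.
    wb = load_workbook(workbook_path, read_only=True, data_only=True)
    try:
        exported = []
        for name in matching_names(wb.sheetnames, substring):
            sheet = wb[name]
            target = os.path.join(out_dir, name + ".csv")
            # values_only=True yields tuples of values rather than cell objects.
            write_rows(sheet.iter_rows(values_only=True), target)
            exported.append(target)
        return exported
    finally:
        # Read-only workbooks keep the source file open until closed explicitly.
        wb.close()
```

With the paths from the question, a call might look like `export_matching_sheets("C:/Users/User/Desktop/Giant Workbook.xlsm", "C:/Users/User/Desktop/Test", "Keyword")`. Splitting the CSV writing into `write_rows` keeps only one row in memory at a time, which is the point of read-only mode.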