I have an Excel file containing about 400 worksheets, 375 of which I need to save out as CSV files. I've tried a VBA solution, but Excel has issues just opening this workbook.
I've created a python script to do just that. However, it rapidly consumes all available memory and pretty much stops working after 25 sheets are exported. Does anybody have a suggestion on how I might improve this code?
import openpyxl
import csv
import time
print(time.ctime())
importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", data_only = True, keep_vba = False)
tabnames = importedfile.get_sheet_names()
substring = "Keyword"
for num in tabnames:
if num.find(substring) > -1:
sheet=importedfile.get_sheet_by_name(num)
name = "C:/Users/User/Desktop/Test/" + num + ".csv"
with open(name, 'w', newline='') as file:
savefile = csv.writer(file)
for i in sheet.rows:
savefile.writerow([cell.value for cell in i])
file.close()
print(time.ctime())
Any help would be appreciated.
Thanks
EDIT: I'm using windows 7 and python 3.4.3. I'm also open to solutions in R, VBA, or SPSS.
解决方案
Try using the read_only=True property for load_workbook() class, this causes the worksheets you get to be IterableWroksheet , meaning you can only iterate over them, you cannot directly use column/row numbers to access cell values in it. This would provide near constant memory consumption according to documentation .
Also, you do not need to close the file, with statement would handle that for you.
Example -
import openpyxl
import csv
import time
print(time.ctime())
importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", read_only = True, keep_vba = False)
tabnames = importedfile.get_sheet_names()
substring = "Keyword"
for num in tabnames:
if num.find(substring) > -1:
sheet=importedfile.get_sheet_by_name(num)
name = "C:/Users/User/Desktop/Test/" + num + ".csv"
with open(name, 'w', newline='') as file:
savefile = csv.writer(file)
for i in sheet.rows:
savefile.writerow([cell.value for cell in i])
print(time.ctime())
Sometimes, you will need to open or write extremely large XLSX files, and the common routines in openpyxl won’t be able to handle that load. Fortunately, there are two modes that enable you to read and write unlimited amounts of data with (near) constant memory consumption.