日常操作,会遇到多个excel文件需要合并操作,即:几十个excel需要合并成一个excel, 写了一个python脚本操作一下。
1、将待合并的excel放在一个文件夹下,例如:path1
2、脚本分解
(1)生成新excel,用于存放合并后的内容
def creat_excel(path, newFileName, sheetName):
workbook = xlw.Workbook(path + '/' + newFileName)
sheet = workbook.add_worksheet(sheetName)
workbook.close()
(2)将重复文件写入到新生成excel中(fileListPath 为 多excel文档所在路径, outFilePath 为合并后excel所在路径),若将两个文件放在同一目录中,循环写入文件时,最好加上判断, 例如:if(file_name.__contains__("report")):
def read_excel(fileListPath, outFilePath, newFileName, sheetName):
# 打开文件
# 写入文件
rows = 1
filePath = os.walk(fileListPath)
# 读取文件夹中所有文件,依次进行文件处理
for path, dir_list, file_list in filePath:
for file_name in file_list:
data = pd.DataFrame(pd.read_excel(fileListPath + '/' +file_name))
book = openpyxl.load_workbook(outFilePath + '/' + newFileName)
writer = pd.ExcelWriter(outFilePath + '/' + newFileName, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
#获得有差异文件的行数
row = data.shape[0]
headerBool = False
data.to_excel(writer, sheet_name=sheetName, index_label=True, header=headerBool, startrow=rows, startcol=0)
rows = rows + row
print(rows)
writer.save()
(3)main 函数中用 tkinter 获取“多excel路径” 即path1。 合并后的excel默认在脚本所在路径
def main():
root = tk.Tk()
root.withdraw()
# 选取 excel文件列表 文件夹路径
fileListPath = filedialog.askdirectory()
# 脚本所在路径
outFilePath = os.path.split(os.path.realpath(__file__))[0]
# 新excel文件名称
newFileName = 'outFile.xlsx'
# sheet name
sheetName = 'sheet1'
creat_excel(outFilePath, newFileName, sheetName)
read_excel(fileListPath, outFilePath, newFileName, sheetName)
3、完整脚本如下:
import xlrd
import xlsxwriter as xlw
import shutil
import openpyxl
import os
import json
import numpy as np
import xlwings as ws
import pandas as pd
import tkinter as tk
from tkinter import filedialog
def read_excel(fileListPath, outFilePath, newFileName, sheetName):
# 打开文件
# 写入文件
rows = 1
filePath = os.walk(fileListPath)
# 读取文件夹中所有文件,依次进行文件处理
for path, dir_list, file_list in filePath:
for file_name in file_list:
# if (file_name.__contains__("report")):
data = pd.DataFrame(pd.read_excel(fileListPath + '/' +file_name))
book = openpyxl.load_workbook(outFilePath + '/' + newFileName)
writer = pd.ExcelWriter(outFilePath + '/' + newFileName, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
# 如需对excel中某一列进行筛选, 可以用data.loc实现, 例如: 将excel中Different列中所有为NO的去除,下面注释代码可实现
# data = data.loc[(data['Different'] != 'No')]
#获得有差异文件的行数
row = data.shape[0]
headerBool = False
data.to_excel(writer, sheet_name=sheetName, index_label=True, header=headerBool, startrow=rows, startcol=0)
rows = rows + row
print(rows)
writer.save()
def creat_excel(path, newFileName, sheetName):
workbook = xlw.Workbook(path + '/' + newFileName)
sheet = workbook.add_worksheet(sheetName)
workbook.close()
def main():
root = tk.Tk()
root.withdraw()
# 选取 excel文件列表 文件夹路径
fileListPath = filedialog.askdirectory()
# 脚本所在路径
outFilePath = os.path.split(os.path.realpath(__file__))[0]
# 新excel文件名称
newFileName = 'outFile.xlsx'
# sheet name
sheetName = 'sheet1'
creat_excel(outFilePath, newFileName, sheetName)
read_excel(fileListPath, outFilePath, newFileName, sheetName)
if __name__ == '__main__':
main()