前言
问题:按照指定行数切分excel到多个表格中
解决:pandas.DataFrame.iloc
DataFrame.iloc
Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
“基于整数位置的索引,用于按位置进行选择。”
一、df.iloc
示例:
scores = np.random.randint(50, 101, (5, 3))
names = ('张三', '李四', '王五', '黄韬', '彦霖')
courses = ('语文', '数学', '英语')
df = pd.DataFrame(data=scores, columns=courses, index=names)
df
切片获得第2行和第3行:
df.iloc[1:3]
二、代码
1.功能实现
"""
按行数分割表格函数
1.按照指定的行数分割表格
2.分割出来的表格按照序号命名
"""
import pandas as pd
import os
def splitExcel(excel_file, save_dir, num):
if not os.path.exists(save_dir):
os.mkdir(save_dir)
df = pd.DataFrame(pd.read_excel(excel_file, sheet_name=0))
row_num = int(df.shape[0]) # 获取行数
if num >= row_num: # 如果分割行数大于总行数,报错
raise Exception('too much!!')
row_list = []
try:
for i in list(range(num,row_num,num)):
row_list.append(i)
row_list.append(row_num) # 得到完整列表
except Exception as e:
print(e)
(name,ext) = os.path.splitext(excel_file) # 获取文件名
n = 1
for m in row_list:
filename = os.path.join(save_dir, name.split("/")[-1] + '-' + str(n) + '.xlsx')
# print(filename)
if m < row_num:
df_handle = df.iloc[m-num:m] # 获取n行之前
writer = pd.ExcelWriter(filename, engine='xlsxwriter', options={'strings_to_urls': False})
df_handle.to_excel(writer,index=False)
writer.save()
elif m == int(row_num):
remainder = int(int(row_num) % num) # 余数
df_handle = df.iloc[m-remainder:m] # 获取最后不能整除的行
writer = pd.ExcelWriter(filename, engine='xlsxwriter', options={'strings_to_urls': False})
df_handle.to_excel(writer,index=False)
writer.save()
n = n + 1
2.使用
if __name__ == '__main__':
excel_file = r"filename.xlsx" # 需拆分excel文件夹路径
save_dir = "split_result" # 创建目录
plitExcel(excel_file, save_dir, num=1000) # 按数量Num拆分
print("拆分完成")
总结
- 切片时只需切割某些列,使用iloc[a:b, c:d]
- 可使用lambda匿名函数实现多种功能:如使用
df.iloc[lambda x: x.index % 2 == 0]
获取索引标签为偶数的行 - 也可传入布尔值列表以获取满足条件的值
df.iloc[[True, False, True, …]]
- 使用xlrd、xlwt等库实现