.xlsb 格式是Binary格式存储的excel文件,比普通的xlsx文件的体积要小很多,在数据量极大的场景比较多用。
目前有两种方式读取 .xlsb 格式的文件
Pandas
官方文档:Binary Excel (.xlsb) files
The read_excel() method can also read binary Excel files using the pyxlsb module. The semantics and features for reading binary Excel files mostly match what can be done for Excel files using engine=‘pyxlsb’. pyxlsb does not recognize datetime types in files and will return floats instead.
Currently pandas only supports reading binary Excel files. Writing is not implemented.
import pandas as pd
df_excel = pd.read_excel('test.xlsb', engine='pyxlsb',sheet_name='详细数据')
但如果表中有合并的单元格,则在处理dataframe的时候会出错,可以用pyxlsb库逐行读取数据
pyxlsb
import pyxlsb as px
import pandas as pd
path = "test.xlsb"
list_row=[]
list_accumulate= []
# 逐行读取
with px.open_workbook(path) as wb:
sheets = wb.sheets
for sheet in sheets:
row_generator = wb.get_sheet(sheet).rows()
for row in row_generator:
for cell in row:
list_row.append(cell.v)
list_accumulate.append(list_row)
list_row=[]
df_excel = pd.DataFrame(list_accumulate)