Does Python load all the data into memory when reading Excel? How to read a large Excel file in chunks with Python

weixin_39945792

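If, as the original poster found, the data is still being read in one go, you can modify the code from the first reply as follows so that the rows are read in chunks instead of in a single call.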

In the code, give the sheet name explicitly instead of looking it up through `pd.ExcelFile`; the function below can then be run as-is.
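If hard-coding a name such as `"sheet1"` is inconvenient, `pd.read_excel` also accepts a 0-based integer position for `sheet_name`, so the first worksheet can be selected without `pd.ExcelFile` and without knowing its name. A minimal sketch; `read_columns` is a hypothetical helper name, and depending on the engine the workbook file may still be opened in full even though at most one data row is returned:

```python
import pandas as pd


def read_columns(file_path, sheet_name=0):
    """Return the column names of one worksheet (hypothetical helper).

    sheet_name=0 selects the first worksheet by position (0-indexed);
    pass a string such as "sheet1" to select it by name instead.
    nrows=1 returns at most one data row; only the header names are kept.
    """
    df = pd.read_excel(file_path, sheet_name=sheet_name, nrows=1)
    return df.columns.tolist()
```

For example, `read_columns("claims.xlsx")` and `read_columns("claims.xlsx", sheet_name="sheet1")` should return the same list when `sheet1` is the first sheet (the file name here is only illustrative).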

```python
import os

import pandas as pd

HERE = os.path.abspath(os.path.dirname(__file__))
DATA_DIR = os.path.abspath(os.path.join(HERE, '..', 'data'))


def make_df_from_excel(file_name, nrows):
    """Read from an Excel file in chunks and make a single DataFrame.

    Parameters
    ----------
    file_name : str
    nrows : int
        Number of rows to read at a time. These Excel files are too big,
        so we can't read all rows in one go.
    """
    file_path = os.path.abspath(os.path.join(DATA_DIR, file_name))

    # Comment out this part of the original code:
    # xl = pd.ExcelFile(file_path)
    # # In this case, there was only a single Worksheet in the Workbook.
    # sheetname = xl.sheet_names[0]

    # Give the sheet name explicitly instead:
    sheetname = "sheet1"

    # Read the header outside of the loop, so all chunk reads are
    # consistent across all loop iterations. Only the column names are
    # kept: the data row that nrows=1 also returns is read again in the loop,
    # so keeping it here would duplicate the first data row.
    # Note: current pandas uses `sheet_name`; the old `sheetname` keyword was removed.
    columns = pd.read_excel(file_path, sheet_name=sheetname, nrows=1).columns

    print(f"Excel file: {file_name} (worksheet: {sheetname})")

    chunks = []
    i_chunk = 0
    # The first row is the header. We have already read it, so we skip it.
    skiprows = 1
    while True:
        df_chunk = pd.read_excel(
            file_path, sheet_name=sheetname,
            nrows=nrows, skiprows=skiprows, header=None)
        skiprows += nrows
        # When there is no data, we know we can break out of the loop.
        if not df_chunk.shape[0]:
            break
        print(f" - chunk {i_chunk} ({df_chunk.shape[0]} rows)")
        chunks.append(df_chunk)
        i_chunk += 1

    if not chunks:
        # The sheet has a header but no data rows.
        return pd.DataFrame(columns=columns)

    # Concatenate the chunks, then rename the integer column labels
    # (0, 1, ...) produced by header=None to the header's names.
    df = pd.concat(chunks, ignore_index=True)
    df = df.rename(columns=dict(enumerate(columns)))
    return df


if __name__ == '__main__':
    df = make_df_from_excel('claims-2002-2006_0.xls', nrows=10000)
```
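The function above still concatenates every chunk into one DataFrame, so the final result holds all rows in memory, and each `pd.read_excel` call reopens the file and reads it from the beginning to reach the requested rows. If what you actually need is bounded memory, one alternative (swapped in here, not part of the original answer) is openpyxl's read-only mode, which streams rows lazily. A minimal sketch, assuming an `.xlsx` file whose first row holds the column names; openpyxl cannot read legacy `.xls` files such as the one in the example above, and `iter_excel_chunks` is a hypothetical helper name:

```python
from itertools import islice
from typing import Iterator, Optional

import pandas as pd
from openpyxl import load_workbook


def iter_excel_chunks(file_path: str, nrows: int,
                      sheet_name: Optional[str] = None) -> Iterator[pd.DataFrame]:
    """Yield DataFrames of at most `nrows` rows from one .xlsx worksheet.

    Hypothetical helper; assumes the first row of the sheet is the header.
    """
    # read_only=True makes openpyxl stream rows instead of building the
    # whole worksheet in memory; data_only=True returns cached formula values.
    wb = load_workbook(file_path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name is not None else wb.worksheets[0]
        rows = ws.iter_rows(values_only=True)  # tuples of cell values
        header = next(rows, None)
        if header is None:  # empty sheet: nothing to yield
            return
        while True:
            # Pull the next `nrows` rows from the stream.
            batch = list(islice(rows, nrows))
            if not batch:
                break
            yield pd.DataFrame(batch, columns=list(header))
    finally:
        # Read-only workbooks keep the file open until closed explicitly.
        wb.close()
```

The memory benefit only holds if each chunk is processed and then dropped inside the loop, e.g. `for chunk in iter_excel_chunks(path, 10000): total += chunk["amount"].sum()` (where `"amount"` is a hypothetical column); collecting the chunks into a list and calling `pd.concat` brings you back to holding every row at once.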
