Does Python load all the data into memory when reading Excel? How to read a large Excel file in chunks with Python

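If the asker's code still reads the whole file in one go, try modifying the code from the first answer as follows, so that it no longer reads everything at once.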

Specify the sheet name explicitly in the code instead of looking it up through pd.ExcelFile, and the function below will run.

import os

import pandas as pd

HERE = os.path.abspath(os.path.dirname(__file__))
DATA_DIR = os.path.abspath(os.path.join(HERE, '..', 'data'))


def make_df_from_excel(file_name, nrows):
    """Read from an Excel file in chunks and make a single DataFrame.

    Parameters
    ----------
    file_name : str
    nrows : int
        Number of rows to read at a time. These Excel files are too big,
        so we can't read all rows in one go.
    """
    file_path = os.path.abspath(os.path.join(DATA_DIR, file_name))

    # The original code looked up the sheet name through pd.ExcelFile;
    # that part is commented out here.
    # xl = pd.ExcelFile(file_path)
    # # In this case, there was only a single Worksheet in the Workbook.
    # sheetname = xl.sheet_names[0]

    # Give the sheet name explicitly instead.
    sheetname = "sheet1"

    # Read the header outside of the loop, so all chunk reads are
    # consistent across all loop iterations.
    # The keyword is sheet_name; current pandas no longer accepts the
    # older spelling sheetname.
    df_header = pd.read_excel(file_path, sheet_name=sheetname, nrows=1)
    print(f"Excel file: {file_name} (worksheet: {sheetname})")

    chunks = []
    i_chunk = 0
    # The first row is the header. We have already read it, so we skip it.
    skiprows = 1
    while True:
        df_chunk = pd.read_excel(
            file_path, sheet_name=sheetname,
            nrows=nrows, skiprows=skiprows, header=None)
        skiprows += nrows
        # When there is no data, we know we can break out of the loop.
        if not df_chunk.shape[0]:
            break
        else:
            print(f" - chunk {i_chunk} ({df_chunk.shape[0]} rows)")
            chunks.append(df_chunk)
        i_chunk += 1

    df_chunks = pd.concat(chunks, ignore_index=True)
    # Rename the positional columns (0, 1, ...) to the header names.
    columns = {i: col for i, col in enumerate(df_header.columns.tolist())}
    df_chunks.rename(columns=columns, inplace=True)
    # df_header was read with nrows=1, so its single data row is already
    # part of the first chunk. Return df_chunks directly; concatenating
    # df_header in front would duplicate that row.
    return df_chunks


if __name__ == '__main__':
    df = make_df_from_excel('claims-2002-2006_0.xls', nrows=10000)
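The function above still ends by concatenating every chunk, so the returned DataFrame holds all rows at once. If each chunk can be processed independently, the same skiprows/nrows loop can be written as a generator that hands out one chunk at a time. The following is only a minimal sketch under that assumption: `iter_row_chunks` and its `read_rows` callback are hypothetical names introduced for illustration, and the reader is passed in as a function so the loop logic can be tried without an Excel file.

```python
from typing import Callable, Iterator, List, Sequence

Row = Sequence[object]


def iter_row_chunks(
    read_rows: Callable[[int, int], List[Row]],
    nrows: int,
    skiprows: int = 1,
) -> Iterator[List[Row]]:
    """Yield successive chunks of at most `nrows` rows.

    `read_rows(skiprows, nrows)` is a hypothetical callback that returns
    up to `nrows` rows after skipping `skiprows` rows from the top.
    The default skiprows=1 skips a single header row.
    """
    if nrows <= 0:
        raise ValueError("nrows must be a positive integer")
    while True:
        chunk = read_rows(skiprows, nrows)
        # An empty chunk means we have read past the last row.
        if not chunk:
            break
        yield chunk
        # Advance the window by the requested chunk size.
        skiprows += nrows


if __name__ == "__main__":
    # Stand-in for a worksheet: one header row plus 24 data rows.
    sheet = [["value"]] + [[i] for i in range(24)]

    def fake_reader(skip: int, n: int) -> List[Row]:
        return sheet[skip:skip + n]

    for number, rows in enumerate(iter_row_chunks(fake_reader, nrows=10)):
        print(f" - chunk {number} ({len(rows)} rows)")
```

With pandas, `read_rows` could be a small wrapper around the call already used in the loop above, for example `lambda skip, n: pd.read_excel(file_path, sheet_name=sheetname, skiprows=skip, nrows=n, header=None).values.tolist()`. Whether this lowers peak memory in practice also depends on how the underlying Excel engine parses the file.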
