Reading Large Datasets in Chunks with pandas

无忧→捕获一只程序员

Article link: https://blog.csdn.net/qq_41793928/article/details/107528046


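# -*- coding: utf-8 -*-
__author__ = '未昔/angelfate'
__date__ = '2019/7/2 1:30'

import time
from functools import wraps

import pandas as pd


# The original post uses @timeit without showing its definition.
# This is an assumed minimal timing decorator, not the author's own.
def timeit(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f'{func.__name__} took {time.perf_counter() - start:.2f} s')
        return result
    return wrapper


path = r'E:\python\Study\BiGData\new_data.csv'
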
@timeit
def test_1():
    print('test_1')
    df = pd.read_csv(path, engine='python', encoding='gbk')


@timeit
def test_2():
    print('test_2')
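    # iterator=True returns a reader instead of a DataFrame; the file is then
    # read one chunk at a time and the chunks are concatenated at the end.
    df = pd.read_csv(path, engine='python', encoding='gbk', iterator=True)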

    loop = True
    chunkSize = 10000
    chunks = []
    while loop:
        try:
            chunk = df.get_chunk(chunkSize)
            chunks.append(chunk)
        except StopIteration:
            loop = False
            print("Iteration is stopped.")
    df = pd.concat(chunks, ignore_index=True)

test_1()
test_2()
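
The manual get_chunk loop above can also be written with the chunksize parameter of pd.read_csv, which returns an iterable reader that stops on its own at the end of the file. The following is a minimal sketch under assumptions, not the author's code: the helper names read_in_chunks and count_rows_in_chunks and the in-memory sample CSV are hypothetical stand-ins for the real file at path.

```python
import io

import pandas as pd


def read_in_chunks(source, chunk_size=10000, **read_csv_kwargs):
    """Read a CSV in fixed-size chunks and concatenate them into one DataFrame."""
    # With chunksize set, read_csv returns an iterable reader, not a DataFrame.
    reader = pd.read_csv(source, chunksize=chunk_size, **read_csv_kwargs)
    return pd.concat(list(reader), ignore_index=True)


def count_rows_in_chunks(source, chunk_size=10000, **read_csv_kwargs):
    """Count rows while holding only one chunk in memory at a time."""
    total = 0
    for chunk in pd.read_csv(source, chunksize=chunk_size, **read_csv_kwargs):
        total += len(chunk)
    return total


# Hypothetical in-memory CSV standing in for the large file on disk.
sample_csv = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(25))

df = read_in_chunks(io.StringIO(sample_csv), chunk_size=10)
row_count = count_rows_in_chunks(io.StringIO(sample_csv), chunk_size=10)
```

Concatenating every chunk still ends with the whole dataset in memory, so it mainly limits the peak cost of parsing. When the full DataFrame is not needed, processing each chunk and keeping only a running result, as count_rows_in_chunks does, is usually the more memory-friendly choice.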