Dask: A Better Way to Work with Large CSV Files in Python

Latest recommended article published 2024-07-02 09:00:00

cumei1658


Tags: python, java, linux, data analysis, big data

Original article: https://www.pybloggers.com/2016/11/dask-a-better-way-to-work-with-large-csv-files-in-python/


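This article describes how to use Dask dataframes to work with large CSV files without loading the entire file into memory. Dask supports out-of-core analysis, and its API is similar to that of pandas, which keeps the analysis simple. Dask also lets you rename columns and apply filters without loading all of the data.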


In a recent post titled Working with Large CSV files in Python, I shared an approach I use when I have very large CSV files (and other file types) that are too large to load into memory. While that approach works well, it can be tedious to first load the data into SQLite (or any other database) and then query that database to analyze it. I just found a better approach using Dask.


While looking around the web to learn about some parallel processing capabilities, I ran across a Python module named Dask, which describes itself as:


…is a flexible parallel computing library for analytic computing.


When I saw that, I was intrigued. There’s a lot that can be done with a library like that, and I plan to introduce Dask into my various data analytics tool sets.


While reading the docs, I ran across the ‘
