Data Preprocessing with Python

Latest recommended article published on 2024-06-21 13:19:16

柚子一只


Article link: https://blog.csdn.net/qq_32572085/article/details/87979333


Importing data into the Python environment: http://pandas.pydata.org/pandas-docs/stable/io.html (English documentation)

IO Tools (Text, CSV, HDF5, ...)

The pandas I/O API is a set of top level reader functions accessed like pd.read_csv() that generally return a pandas object.

read_csv

read_excel

read_hdf

read_sql

read_json

read_msgpack (experimental)

read_html

read_gbq (experimental)

read_stata

read_sas

read_clipboard

read_pickle

The corresponding writer functions are object methods that are accessed like df.to_csv()

to_csv

to_excel

to_hdf

to_sql

to_json

to_msgpack (experimental)

to_html

to_gbq (experimental)

to_stata

to_clipboard

to_pickle
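The pairing of readers and writers listed above can be sketched as a small round trip. This is only a minimal illustration: the sample DataFrame and its column names are made up for the example, and only to_csv() and read_csv() are exercised.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.5]})

# Writer: with no path argument, to_csv() returns the CSV text as a string.
csv_text = df.to_csv(index=False)

# Reader: read_csv() accepts any object with a read() method, such as StringIO.
restored = pd.read_csv(StringIO(csv_text))

print(restored)
```

The other reader/writer pairs follow the same pattern, although several of them need optional dependencies to be installed.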

Here is an informal performance comparison for some of these IO methods.

Note

For examples that use the StringIO class, make sure you import it according to your Python version, i.e. from StringIO import StringIO for Python 2 and from io import StringIO for Python 3.
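One common way to follow this note in code that has to run on both interpreters is a guarded import. The sketch below assumes nothing beyond the two import paths named in the note; the CSV text is invented for the example.

```python
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

import pandas as pd

# Hypothetical in-memory CSV text standing in for a file on disk.
data = "a,b\n1,2\n3,4\n"

# StringIO wraps the string in a file-like object that read_csv() can consume.
df = pd.read_csv(StringIO(data))
print(df)
```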

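read_csv parameters explained (Chinese-language references):

pandas.read_csv parameters in detail

pandas.read_csv parameter summary

Reads a CSV (comma-separated) file into a DataFrame.

It also supports importing only part of a file and, optionally, iterating over the file in chunks.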
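Partial import and iteration can be sketched with the nrows, usecols and chunksize parameters of read_csv. The data below is generated just for this example, and the chunk size of 4 is an arbitrary choice.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV with ten data rows: id = 0..9, value = id * 10.
csv_text = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(10))

# Partial import: only the first three data rows.
first_rows = pd.read_csv(StringIO(csv_text), nrows=3)

# Partial import: only one of the columns.
only_value = pd.read_csv(StringIO(csv_text), usecols=["value"])

# Iteration: chunksize makes read_csv return an iterator of DataFrames,
# so the whole file never has to be held in memory at once.
total = 0
for chunk in pd.read_csv(StringIO(csv_text), chunksize=4):
    total += int(chunk["value"].sum())

print(len(first_rows), list(only_value.columns), total)
```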

Parameters:

filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)

The string may be a URL. Valid URL schemes include http, ftp, s3 and file. For file URLs, a host is expected.

Example of a local file URL: file://localhost/path/to/table.csv
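The accepted input types can be illustrated with a temporary file. This is a sketch only: the file name table.csv and its contents are invented, and the URL forms are left out because they need network access or a configured remote store.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file created just for this example.
    csv_path = Path(tmp) / "table.csv"
    csv_path.write_text("x,y\n1,2\n3,4\n", encoding="utf-8")

    # 1. A plain string path.
    from_str = pd.read_csv(str(csv_path))

    # 2. A pathlib.Path object.
    from_path = pd.read_csv(csv_path)

    # 3. An open file handle (any object with a read() method).
    with open(csv_path, newline="", encoding="utf-8") as handle:
        from_handle = pd.read_csv(handle)

print(from_str.equals(from_path) and from_path.equals(from_handle))
```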

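sep : str, default ','

The delimiter to use. If the parameter is not given, a comma is used. A separator longer than one character and different from '\s+' is interpreted as a regular expression and forces the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'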
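A short sketch of the three cases described for sep: a single-character separator, a multi-character separator, and a regular expression. The input strings are made up, and engine="python" is passed explicitly in the last two calls as a choice of this example, so that pandas does not have to fall back to the Python engine with a warning.

```python
from io import StringIO

import pandas as pd

# Single-character separator: handled by the default (C) engine.
semicolon = pd.read_csv(StringIO("a;b\n1;2\n"), sep=";")

# Separator longer than one character: treated as a regular expression,
# which requires the Python parsing engine.
double_colon = pd.read_csv(StringIO("a::b\n1::2\n"), sep="::", engine="python")

# A genuine regex: a comma or semicolon with optional surrounding whitespace.
mixed = pd.read_csv(
    StringIO("a, b;c\n1,2 ;3\n"),
    sep=r"\s*[,;]\s*",
    engine="python",
)

print(semicolon, double_colon, mixed, sep="\n\n")
```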

delimiter : str, default None

Alternative argument name for sep (if this parameter is given, it is used in place of sep).
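Because delimiter is an alternative name for sep, the two calls in this sketch should parse the same text identically. The pipe-separated sample text is invented for the example, and only one of the two parameters is passed per call.

```python
from io import StringIO

import pandas as pd

# Hypothetical pipe-separated text.
text = "a|b\n1|2\n3|4\n"

with_sep = pd.read_csv(StringIO(text), sep="|")
with_delimiter = pd.read_csv(StringIO(text), delimiter="|")

print(with_sep.equals(with_delimiter))
```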

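delim_whitespace : boolean, default False

Specifies whether whitespace (e.g. ' ' or '\t') is used as the separator; equivalent to setting sep='\s+'. If this parameter is set to True, the delimiter parameter should not be passed.

Supported by the Python parser as of version 0.18.1.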
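Whitespace-delimited input can be sketched as follows. Since the description states that delim_whitespace=True is equivalent to sep='\s+', the example uses the sep form, which also avoids the deprecation of delim_whitespace in recent pandas releases. The sample text, which mixes runs of spaces and a tab, is made up.

```python
from io import StringIO

import pandas as pd

# Hypothetical text whose columns are separated by spaces or tabs.
text = "name  score\nalice 90\nbob\t85\n"

# r"\s+" matches any run of whitespace, so spaces and tabs both split columns.
df = pd.read_csv(StringIO(text), sep=r"\s+")

print(df)
```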

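header : int or list of ints, default 'infer'

Row number(s) to use as the column names, and the start of the data. If no names are passed, the default behaves like header=0; if names are passed, it behaves like header=None. Pass header=0 explicitly to be able to replace column names that already exist in the file. header may also be a list of integers such as [0,1,3], which specifies the rows to use as a multi-level column header (each column then carries several labels). Intervening rows that are not listed are skipped: in this example row 2 is skipped, so lines 1, 2 and 4 of the file become the multi-level header, the 3rd line is discarded, and the DataFrame data starts at the 5th line.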
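The header variants described above can be sketched like this. All sample text and the replacement column names x and y are invented for the example.

```python
from io import StringIO

import pandas as pd

text = "c1,c2\n1,2\n3,4\n"

# Default ('infer' without names): the first row supplies the column names.
default = pd.read_csv(StringIO(text))

# header=None: every row is data and the columns are numbered 0, 1, ...
no_header = pd.read_csv(StringIO(text), header=None)

# header=0 together with names: the existing names are replaced.
renamed = pd.read_csv(StringIO(text), header=0, names=["x", "y"])

# header=[0, 1, 3]: rows 0, 1 and 3 form a three-level column header,
# row 2 is skipped, and the data starts at row 4 (the 5th line).
multi_text = "a,b\nq,r\nskipped,skipped\nu,v\n1,2\n3,4\n"
multi = pd.read_csv(StringIO(multi_text), header=[0, 1, 3])

print(default, no_header, renamed, multi, sep="\n\n")
```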

Note: if skip_
