Data Loading, Storage, and File Formats in Python

Latest recommended article published on 2024-02-27 16:15:25

VIP article by hustqb

Article link: https://blog.csdn.net/hustqb/article/details/54343610

Excerpted from "Python for Data Analysis" by Wes McKinney.

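Parsing functions in pandas:

Function	Description
read_csv	Loads delimited data from a file, URL, or file-like object. The default delimiter is a comma
read_table	Loads delimited data from a file, URL, or file-like object. The default delimiter is a tab ("\t")
read_fwf	Reads data in fixed-width column format (that is, with no delimiters)
read_clipboard	Reads data from the clipboard; it can be seen as the clipboard version of read_table. Useful for turning tables on web pages into DataFrames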
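
To make the differences between these functions concrete, here is a minimal sketch that reads the same small table in three layouts. The sample data and column names are made up for illustration, and `io.StringIO` stands in for a real file. `read_clipboard` is left out because it depends on whatever is currently on the system clipboard.

```python
import io

import pandas as pd

# Hypothetical sample data: the same two-row table in three text layouts.
csv_text = "name,score\nalice,90\nbob,85\n"
tsv_text = "name\tscore\nalice\t90\nbob\t85\n"

# Build the fixed-width text programmatically: a 6-character column
# followed by a 5-character column, with no delimiter between them.
rows = [("name", "score"), ("alice", "90"), ("bob", "85")]
fwf_text = "".join(f"{left:<6}{right:>5}\n" for left, right in rows)

# read_csv splits on commas by default.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table splits on tabs by default.
df_tsv = pd.read_table(io.StringIO(tsv_text))

# read_fwf slices each line by column width instead of by delimiter.
df_fwf = pd.read_fwf(io.StringIO(fwf_text), widths=[6, 5])

for frame in (df_csv, df_tsv, df_fwf):
    print(frame)
```

All three calls should produce a DataFrame with the columns `name` and `score`; only the way the text is split into fields differs.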

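I will give a rough overview of the techniques these functions use to convert text data into a DataFrame. Their options fall into the following broad categories:

For the commonly used pd.read_csv(), the official documentation groups the parameters as follows: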

Category	Parameters
Begin	filepath_or_buffer, sep=',', delimiter
Column and Index Location and Names	header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols
General Parsing Configuration	dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows
NA and Missing Data Handling	na_values, keep_default_na, na_filter, verbose, skip_blank_lines
Datetime Handling	parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst
Iteration	iterator, chunksize
Quoting, Compression, and File Format	compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols
Error Handling	error_bad_lines, warn_bad_lines
Internal	skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision
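
As a rough illustration of how parameters from several of these categories combine, the sketch below parses a small semicolon-delimited text. The data, the column names, and the "NULL" missing-value marker are invented for the example. It deliberately uses only parameters that have stayed stable across pandas releases, since some of the names in the table (for example `squeeze`, `error_bad_lines`, and `as_recarray`) have been deprecated or removed in later versions.

```python
import io

import pandas as pd

# Hypothetical input: a comment line, a header row, and four data rows.
raw = (
    "# exported by a hypothetical logger\n"
    "id;date;city;temp\n"
    "1;2017-01-01;Wuhan;3.5\n"
    "2;2017-01-02;Wuhan;NULL\n"
    "3;2017-01-03;Beijing;-1.0\n"
    "4;2017-01-04;Beijing;0.5\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                         # field delimiter
    comment="#",                     # quoting/file format: ignore comment lines
    header=0,                        # first non-comment line holds column names
    index_col="id",                  # column location: use "id" as the index
    usecols=["id", "date", "temp"],  # column location: keep a subset of columns
    dtype={"temp": "float64"},       # general parsing: force a column type
    na_values=["NULL"],              # missing data: treat "NULL" as NaN
    parse_dates=["date"],            # datetime handling: parse "date" as datetime
)
print(df)
print(df.dtypes)

# Iteration: with chunksize, read_csv returns an iterator of DataFrames
# instead of loading everything at once.
chunks = pd.read_csv(io.StringIO(raw), sep=";", comment="#", chunksize=2)
rows_per_chunk = [len(chunk) for chunk in chunks]
print(rows_per_chunk)
```

Reading in chunks is mainly useful for files too large to fit comfortably in memory; each chunk can be filtered or aggregated before the next one is read.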