[Python] pandas: basic data structures (Series and DataFrame), creating a Series or DataFrame, and saving them to or loading them from a file

yannan20190313

Category: Python. Tags: python, pandas, programming languages

Article link: https://blog.csdn.net/yannan20190313/article/details/140584588

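pandas is a third-party extension library for Python that provides high-performance, easy-to-use data structures and data analysis tools.

Official pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/

Help: use help(...) to view a function's documentation (for a function from a third-party library, import the library first). For example: help(pd.DataFrame)

Installation

Install pandas with pip: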

pip install pandas

If the installation fails with "Read timed out", increase the timeout:

pip install pandas --default-timeout=1000

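Some pandas features need additional dependencies. For example, working with Excel files requires openpyxl (pip install openpyxl).

Official pandas installation guide: https://pandas.pydata.org/docs/getting_started/install.html

Import

In Python code, import pandas: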

import pandas as pd

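Basic data structures in pandas: Series and DataFrame

Series: a one-dimensional labeled array. It holds one row or one column of data (of any data type).

DataFrame: two-dimensional data. It holds multiple rows and columns, or an entire table.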
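To make the relationship between the two structures concrete, here is a minimal sketch. The variable names and sample values are made up for illustration; the point is only that picking one column or one row out of a DataFrame yields a Series.

```python
import pandas as pd

# Hypothetical sample table with two rows and two columns
table = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

column = table["age"]  # a single column is a Series
row = table.loc[0]     # a single row is also a Series

print(type(table).__name__)
print(type(column).__name__)
print(type(row).__name__)
```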

Creating a Series

Series(data=None, index=None, dtype: 'Dtype | None' = None, name=None, copy: 'bool' = False, fastpath: 'bool' = False)

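pd.Series(scalar, index=labels): the scalar is repeated, so the Series has as many elements as there are index labels.
pd.Series(list): the default index runs from 0 to (number of list elements - 1).
pd.Series(list, index=labels, name=series_name): the number of index labels must equal the number of list elements; name sets the name of the Series.
pd.Series(1-D array): the default index runs from 0 to (number of array elements - 1).
pd.Series(1-D array, index=labels): the number of index labels must equal the number of array elements.
pd.Series(dict): the dict keys become the Series index.
pd.Series(dict, index=labels): each index label takes the value stored under the same key in the dict; a label that is not a key in the dict gets NaN.
Note: copy defaults to False. If the first argument is a list, modifying the Series does not modify the original list. If the first argument is a one-dimensional array, modifying the Series also modifies the original array (this applies when Copy-on-Write is not enabled; with Copy-on-Write the array is copied). Pass copy=True to always get an independent copy.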
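The constructor forms listed above can be sketched as follows. All variable names and values are illustrative only. The array example passes copy=True explicitly so that the result does not depend on the pandas version or on whether Copy-on-Write is enabled.

```python
import numpy as np
import pandas as pd

# Scalar repeated once per index label
s_scalar = pd.Series(5, index=["a", "b", "c"])

# List with the default index 0..n-1
s_list = pd.Series([10, 20, 30])

# List with an explicit index and a name
s_named = pd.Series([10, 20, 30], index=["x", "y", "z"], name="score")

# 1-D NumPy array; copy=True requests an independent copy of the data
arr = np.array([1.0, 2.0, 3.0])
s_arr = pd.Series(arr, index=["a", "b", "c"], copy=True)
s_arr.iloc[0] = 99.0  # arr is left unchanged because of copy=True

# Dict: keys become the index
s_dict = pd.Series({"a": 1, "b": 2})

# Dict plus index: labels that are not dict keys get NaN
s_reindexed = pd.Series({"a": 1, "b": 2}, index=["b", "c"])

print(s_reindexed)
```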

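Use the name attribute to get the name of a Series.

Use the index attribute to get the index of a Series.

Use the dtype attribute to get the data type of the values in a Series.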
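A short sketch of reading these three attributes; the Series contents are invented for the example.

```python
import pandas as pd

# Hypothetical Series with a name and string labels
s = pd.Series([1.5, 2.5], index=["a", "b"], name="price")

print(s.name)   # price
print(s.index)  # the index object holding the labels 'a' and 'b'
print(s.dtype)  # float64
```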

Creating a DataFrame

DataFrame(data=None, index: 'Axes | None' = None, columns: 'Axes | None' = None, dtype: 'Dtype | None' = None, copy: 'bool | None' = None)

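pd.DataFrame(2-D array): the default row index runs from 0 to (number of array rows - 1) and the default column labels run from 0 to (number of array columns - 1). When the first argument is a two-dimensional array, copy defaults to False: the array is not copied, so modifying the DataFrame also modifies the array (with Copy-on-Write enabled, the array is copied instead).
pd.DataFrame(2-D array, index=row_labels, columns=column_names): sets the row index and the column names. The number of row labels and column names must match the shape of the array, otherwise a ValueError is raised. To keep only some columns, select them from the array or from the resulting DataFrame.
pd.DataFrame(dict): the dict keys become the column names, and the default row index runs from 0 to (length of each value list - 1). When the first argument is a dict, copy defaults to True: the dict data is copied, so modifying the DataFrame does not modify the dict.
pd.DataFrame(dict, index=row_labels): the dict keys become the column names, with the given row index.
pd.DataFrame(Series1, Series2): Series1 supplies the data and the values of Series2 are used as the row index. Series1 is aligned to that index by label, so any label not found in the index of Series1 gets NaN.
pd.DataFrame([Series1, Series2]): the first argument is a list containing two Series; Series1 and Series2 each become one row.
pd.DataFrame({Series1.name: Series1, Series2.name: Series2}): the first argument is a dict whose keys (the Series names) become the column names; the row index is the union of the indexes of the Series.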
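The most common of the forms above can be sketched like this. The data, labels, and variable names are all made up for the example.

```python
import numpy as np
import pandas as pd

data = np.arange(6).reshape(2, 3)  # 2 rows, 3 columns

# 2-D array: default integer row and column labels
df_arr = pd.DataFrame(data)

# 2-D array with explicit labels (3 column names for 3 array columns)
df_labeled = pd.DataFrame(data, index=["r1", "r2"], columns=["a", "b", "c"])

# Dict of equal-length lists: keys become column names
df_dict = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

# Dict with an explicit row index
df_dict_idx = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [30, 25]}, index=["u1", "u2"]
)

s1 = pd.Series([1, 2], index=["a", "b"], name="first")
s2 = pd.Series([3, 4], index=["a", "b"], name="second")

# List of Series: each Series becomes one row, labeled by its name
df_rows = pd.DataFrame([s1, s2])

# Dict of Series: each Series becomes one column
df_cols = pd.DataFrame({s1.name: s1, s2.name: s2})

print(df_rows)
print(df_cols)
```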

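Use the index attribute to get the row index of a DataFrame.

Use the columns attribute to get the column names of a DataFrame.

Use the dtypes attribute to get the data type of each column of a DataFrame.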
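As a quick illustration of these attributes, assuming a small made-up table:

```python
import pandas as pd

# Hypothetical table with string row labels
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [30, 25]}, index=["u1", "u2"]
)

print(df.index)    # row labels
print(df.columns)  # column names
print(df.dtypes)   # one dtype per column
```

Note that dtypes is plural here because every column can have its own type, whereas a Series has a single dtype.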

Saving a Series or DataFrame to a file

Official documentation: IO tools (text, CSV, HDF5, …), pandas 2.2.2 documentation (pydata.org)

Taking a csv/text file as the example:

to_csv(self, path_or_buf: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None, sep: 'str' = ',', na_rep: 'str' = '', float_format: 'str | None' = None, columns: 'Sequence[Hashable] | None' = None, header: 'bool_t | list[str]' = True, index: 'bool_t' = True, index_label: 'IndexLabel | None' = None, mode: 'str' = 'w', encoding: 'str | None' = None, compression: 'CompressionOptions' = 'infer', quoting: 'int | None' = None, quotechar: 'str' = '"', line_terminator: 'str | None' = None, chunksize: 'int | None' = None, date_format: 'str | None' = None, doublequote: 'bool_t' = True, escapechar: 'str | None' = None, decimal: 'str' = '.', errors: 'str' = 'strict', storage_options: 'StorageOptions' = None) -> 'str | None'

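Series.to_csv("path/filename") or DataFrame.to_csv("path/filename"): saves the Series or DataFrame to a csv/text file. The default separator is a comma. By default the row index and the column names are written.
Series.to_csv("path/filename", sep=separator, header=False, index=False) or the same call on a DataFrame: saves the Series or DataFrame to a csv/text file with the given separator (a single character), without the column names and without the row index.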
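The two calls above might look like this in practice. The file names and the sample table are hypothetical, and a temporary directory is used so the sketch does not leave files behind in the working directory.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

out_dir = tempfile.mkdtemp()
default_path = os.path.join(out_dir, "people.csv")  # hypothetical file name
plain_path = os.path.join(out_dir, "people.txt")    # hypothetical file name

# Defaults: comma separator, column names and row index included
df.to_csv(default_path)

# Tab separator, no column names, no row index
df.to_csv(plain_path, sep="\t", header=False, index=False)

with open(default_path, encoding="utf-8") as f:
    default_text = f.read()
with open(plain_path, encoding="utf-8") as f:
    plain_text = f.read()

print(default_text)
print(plain_text)
```

In the first file the header line starts with an empty field, because the row index has no name.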

Loading a Series or DataFrame from a file

Taking a csv/text file as the example:

Official documentation for read_csv: pandas.read_csv, pandas 2.2.2 documentation

read_csv(filepath_or_buffer, *, sep=_NoDefault.no_default, delimiter=None, header='infer', names=_NoDefault.no_default, index_col=None, usecols=None, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=_NoDefault.no_default, skip_blank_lines=True, parse_dates=None, infer_datetime_format=_NoDefault.no_default, keep_date_col=_NoDefault.no_default, date_parser=_NoDefault.no_default, date_format=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, encoding_errors='strict', dialect=None, on_bad_lines='error', delim_whitespace=_NoDefault.no_default, low_memory=True, memory_map=False, float_precision=None, storage_options=None, dtype_backend=_NoDefault.no_default)

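pd.read_csv("file to read"): reads data from a csv/text file. The default separator is a comma.
pd.read_csv("file to read", index_col=index column, header=row to use as column names, usecols=columns to load, skiprows=rows to skip): reads data from a csv/text file. The default separator is a comma. index_col selects the column used as the row index, header selects the row used as the column names, usecols selects which columns to load (given as a list), and skiprows selects which rows to skip (given as a list). Row and column positions both start at 0: the first row is row 0 and the first column is column 0.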
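Here is a minimal sketch of both calls. To stay self-contained it reads from an in-memory io.StringIO buffer instead of a file on disk (read_csv accepts either), and the CSV contents are invented for the example.

```python
import io

import pandas as pd

# Hypothetical CSV contents: line 0 is the header, lines 1-3 are data
csv_text = (
    "id,name,age,city\n"
    "1,Ann,30,Oslo\n"
    "2,Bob,25,Rome\n"
    "3,Cat,41,Lima\n"
)

# Defaults: comma separator, first line used as column names
df_all = pd.read_csv(io.StringIO(csv_text))

df_part = pd.read_csv(
    io.StringIO(csv_text),
    index_col="id",                 # use the "id" column as the row index
    header=0,                       # first remaining line holds the column names
    usecols=["id", "name", "age"],  # load only these columns
    skiprows=[2],                   # skip file line 2 (0-based), the "Bob" row
)

print(df_all)
print(df_part)
```

The numbers in skiprows count lines of the file starting from 0, so the header line is line 0 and the first data line is line 1.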