python pandas

JasonKQLin

Last modified 2023-07-30 11:31:11

Category: python. Tags: python, pandas, programming languages

First published 2020-08-03 22:53:48

Article link: https://blog.csdn.net/linkequa/article/details/107774462


1. Introduction to pandas

The name Pandas is derived from the econometrics term Panel Data. Pandas incorporates two additional data structures into Python, namely Pandas Series and Pandas DataFrame.

pandas is built on top of NumPy and can be used together with it.

A single DataFrame may hold several different data types, one per column.

2. pd.Series()

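1. A Series is very similar to a NumPy array: it has a single dtype. Unlike a NumPy array, it also takes an index parameter that labels each element (NumPy functions also work on a Series).

2. Elements can be accessed with loc (by label) and iloc (by integer position).

3. Deleting data: without inplace=True, the original Series (groceries in the author's example) is left unchanged.
4. A Series of strings can be multiplied by 2, which repeats each string (NumPy string arrays do not allow this).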
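
The four points above can be sketched as follows. The original screenshot is not available, so the contents of `groceries` and every other value here are made up for illustration, and `drop()` is assumed to be the deletion method the author had in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the labels and counts are illustrative only.
groceries = pd.Series([30, 6, 12], index=["eggs", "apples", "milk"])

# loc selects by index label, iloc by integer position.
eggs_by_label = groceries.loc["eggs"]
eggs_by_position = groceries.iloc[0]

# NumPy functions apply element-wise and preserve the index.
roots = np.sqrt(groceries)

# drop() returns a new Series; groceries itself still contains "apples".
without_apples = groceries.drop("apples")

# Multiplying a Series of strings by 2 repeats each string.
words = pd.Series(["ab", "cd"])
doubled = words * 2

print(without_apples)
print(doubled.tolist())
```

Assigning the result of `drop()` to a new name, as done here, avoids mutating the original and is usually easier to follow than `inplace=True`.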

3. pd.DataFrame()

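1. Different columns may have different data types.

2. Creating a DataFrame.
3. The index parameter sets the row labels and the columns parameter sets the column labels; rows and columns are then accessed through those labels.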

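4. Handling NaN values
# Drop the rows (axis=0) or columns (axis=1) that contain NaN
x.dropna(axis=0) or x.dropna(axis=1)

# Replace every NaN with 0
x.fillna(0)

# Replace each NaN with the value from the previous row (axis=0) or previous column (axis=1)
# (the older spelling x.fillna(method='ffill', ...) is deprecated in recent pandas releases)
x.ffill(axis=0) or x.ffill(axis=1)

# Replace each NaN with the value from the next row (axis=0) or next column (axis=1)
# (the older spelling x.fillna(method='backfill', ...) is deprecated in recent pandas releases)
x.bfill(axis=0) or x.bfill(axis=1)

# Replace each NaN by linear interpolation between the neighbouring values in the same column (axis=0) or the same row (axis=1); for a single gap this is the mean of the values before and after it
x.interpolate(method='linear', axis=0) or x.interpolate(method='linear', axis=1)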
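
To see what each of these calls does, here is a minimal sketch on a small frame. The frame `x`, its labels, and its values are assumptions chosen for illustration; only `axis=0` is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN in each column.
x = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]},
    index=["r1", "r2", "r3"],
)

rows_kept = x.dropna(axis=0)      # keeps only rows without NaN
cols_kept = x.dropna(axis=1)      # keeps only columns without NaN (none here)
zero_filled = x.fillna(0)         # every NaN becomes 0
forward = x.ffill(axis=0)         # NaN takes the value from the row above
backward = x.bfill(axis=0)        # NaN takes the value from the row below
interpolated = x.interpolate(method="linear", axis=0)

print(rows_kept)
print(forward)
print(interpolated)
```

None of these calls modifies `x`; each returns a new DataFrame. Note that a back fill cannot fill a NaN in the last row, because there is no later value to copy.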

4. Creating a pd.DataFrame from a list, dict, pd.Series, or NumPy ndarray

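4.1 list
4.2 dict (each key-value pair becomes one column; if the values are lists of different lengths, pandas raises an error)

4.3 pd.Series
A pd.Series is a labelled one-dimensional array that can hold integers, floats, strings, Python objects, and other types of data.

4.4 NumPy ndarrays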
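
The four construction routes can be sketched as below. The original screenshots are not available, so all column names, labels, and values are hypothetical.

```python
import numpy as np
import pandas as pd

# 4.1 From a list of rows; columns and index supply the labels.
from_list = pd.DataFrame(
    [[1, "a"], [2, "b"]], columns=["num", "letter"], index=["row1", "row2"]
)

# 4.2 From a dict of equal-length lists: each key becomes a column.
from_dict = pd.DataFrame({"num": [1, 2], "letter": ["a", "b"]})

# Lists of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b"]})
    unequal_lengths_rejected = False
except ValueError:
    unequal_lengths_rejected = True

# 4.3 From Series: rows are aligned on the index, gaps become NaN.
s1 = pd.Series([1, 2], index=["x", "y"])
s2 = pd.Series([10, 20, 30], index=["x", "y", "z"])
from_series = pd.DataFrame({"s1": s1, "s2": s2})

# 4.4 From a 2-D NumPy array.
from_array = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["c1", "c2", "c3"])

print(from_series)
```

Unlike plain lists, Series of different lengths do not raise an error, because pandas aligns them by index label and fills the missing positions with NaN.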

5. Handy pd.DataFrame operations

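5.1 To access a column, index with single square brackets: df['name']. To access a row, use loc (by label) or iloc (by position) with single square brackets: df.loc['row1'] or df.iloc[1].

5.2 Use df.head(), df.tail(), and df.describe() to view the first rows, the last rows, and quick summary statistics.
5.3 To inspect the dimensions, data types, and general information, use df.shape, df.ndim, df.info(), and df.dtypes; for the data type of a single column, use df['c1'].dtype.

5.4 To make a deep copy of a DataFrame, use df.copy().

5.5 Adding a column works much as in R: simply assign df['new'] = value.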
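
The operations in this section can be tried on a small frame. This is a sketch only: `df` and its contents are hypothetical, and the column names `name`, `c1`, and `new` and the label `row1` are reused from the examples above.

```python
import pandas as pd

# Hypothetical frame with labelled rows.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "c1": [1.5, 2.5, 3.5]},
    index=["row1", "row2", "row3"],
)

name_column = df["name"]      # one column, returned as a Series
first_row = df.loc["row1"]    # row selected by label
second_row = df.iloc[1]       # row selected by integer position

print(df.head(2))             # first two rows
print(df.tail(1))             # last row
print(df.describe())          # summary statistics of numeric columns
print(df.shape, df.ndim)      # dimensions
print(df.dtypes)              # dtype of every column
print(df["c1"].dtype)         # dtype of a single column
df.info()                     # prints index, dtypes, and non-null counts

backup = df.copy()            # deep copy by default
df["new"] = 0                 # adds a column to df only

print(list(backup.columns))
```

Because `backup` is an independent copy, adding the `new` column to `df` afterwards leaves `backup` unchanged. A plain assignment such as `backup = df` would only create a second name for the same object.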
