pandas notes 25

henry_dx

Published 2023-09-10 15:15:24


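Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/henry_dx/article/details/132791413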


1. Show the installed version

pd.__version__

pd.show_versions()

2. Create an example DataFrame
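The source gives no code for this step. A minimal sketch, assuming a small two-column frame whose names match the 'col one' and 'col two' columns renamed in step 3 (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical example data. The column names contain spaces on purpose,
# so that a later renaming step has something to clean up.
df = pd.DataFrame({"col one": [100, 200], "col two": [300, 400]})
print(df)
```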

3. Rename columns

df = df.rename({'col one': 'col_one', 'col two': 'col_two'}, axis='columns')

df.columns = ['col_one', 'col_two']

To add a prefix or a suffix to every column name, use add_prefix() or add_suffix().
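As a small illustration of add_prefix() and add_suffix(), assuming a made-up frame and arbitrary "X_" and "_Y" strings:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"col_one": [100, 200], "col_two": [300, 400]})

prefixed = df.add_prefix("X_")  # every column name gets "X_" in front
suffixed = df.add_suffix("_Y")  # every column name gets "_Y" at the end

print(list(prefixed.columns))
print(list(suffixed.columns))
```

Both methods return a new DataFrame and leave df unchanged.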

4. Reverse row order

drinks.loc[::-1].reset_index(drop=True).head()

5. Reverse column order

drinks.loc[:, ::-1].head()

6. Select columns by data type

Select the numeric columns: drinks.select_dtypes(include='number').head()

Select the non-numeric columns: drinks.select_dtypes(exclude='number').head()

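7. Convert strings to numbers

df.astype({'col_one': 'float', 'col_two': 'float'})

astype() raises an error if a value cannot be parsed. The to_numeric() function with errors='coerce' converts invalid values to NaN instead:

pd.to_numeric(df.col, errors='coerce').fillna(0)

df.apply(pd.to_numeric, errors='coerce').fillna(0)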
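A runnable sketch of the coercion, assuming a made-up frame in which one cell holds "-" instead of a number:

```python
import pandas as pd

# Made-up data: every value is a string, and "-" is not a valid number.
df = pd.DataFrame({"col_one": ["1.1", "2.2"], "col_two": ["4.4", "-"]})

# errors="coerce" turns values that cannot be parsed into NaN.
converted = df.apply(pd.to_numeric, errors="coerce")

# Replace the resulting NaN values with 0.
filled = converted.fillna(0)
print(filled)
```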

8. Reduce DataFrame memory usage
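The source lists this step without code. One possible sketch, assuming the goal is to keep only the columns you need and to store low-cardinality string columns as the category dtype (the data and column names are invented):

```python
import pandas as pd

# Invented data: 1000 rows with a repetitive string column.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"] * 250,
    "continent": ["Asia", "Europe", "Africa", "Europe"] * 250,
    "beer": list(range(1000)),
})
before = df.memory_usage(deep=True).sum()

# Keep only the needed columns, then store the repetitive strings as categories.
small = df[["continent", "beer"]].copy()
small["continent"] = small["continent"].astype("category")
after = small.memory_usage(deep=True).sum()

print(before, after)
```

When reading from a file, the same idea can be applied up front with the usecols and dtype parameters of pd.read_csv().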

9. Build a DataFrame from multiple files (row-wise)

from glob import glob

files = sorted(glob('stock*.csv'))

pd.concat((pd.read_csv(file) for file in files), ignore_index=True)

10. Build a DataFrame from multiple files (column-wise)

pd.concat((pd.read_csv(file) for file in files), axis='columns')

11. Create a DataFrame from the clipboard

pd.read_clipboard()

12. Split a DataFrame into two random subsets

movies_1 = movies.sample(frac=0.75, random_state=1234)

movies_2 = movies.drop(movies_1.index)
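A self-contained sketch of the split, assuming a made-up movies frame with eight rows and a default (unique) index:

```python
import pandas as pd

# Made-up data with a default RangeIndex, so every index label is unique.
movies = pd.DataFrame({"title": [f"movie {i}" for i in range(8)]})

# Draw 75% of the rows at random, then take whatever is left.
movies_1 = movies.sample(frac=0.75, random_state=1234)
movies_2 = movies.drop(movies_1.index)

print(len(movies_1), len(movies_2))
```

This relies on the index being unique. With duplicate index labels, drop() would remove more rows than were sampled.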

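13. Filter a DataFrame by multiple categories

movies.genre.unique()

You can chain several conditions with the "or" operator (|), or express the same filter more concisely with the isin() function.

To invert the filter, put a tilde (~) in front of the condition.

In pandas, the tilde performs an element-wise "not" on a boolean Series.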
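The three filtering styles can be sketched as follows, assuming a made-up movies frame and an arbitrary choice of genres:

```python
import pandas as pd

# Made-up data for illustration only.
movies = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "genre": ["Action", "Drama", "Western", "Comedy"],
})
wanted = ["Action", "Drama", "Western"]

# Several conditions joined with the "or" operator.
with_or = movies[
    (movies.genre == "Action") | (movies.genre == "Drama") | (movies.genre == "Western")
]

# The same filter written with isin().
with_isin = movies[movies.genre.isin(wanted)]

# The inverted filter: rows whose genre is NOT in the list.
inverted = movies[~movies.genre.isin(wanted)]
print(inverted)
```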

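14. Filter a DataFrame by its largest categories

Call value_counts() and save the result as counts (a Series), then use that Series' nlargest() method to get the most frequent categories.

counts = movies.genre.value_counts()

movies[movies.genre.isin(counts.nlargest(3).index)]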

15. Handle missing values

Use the isna() function to flag missing values; chaining sum() counts them per column.

ufo.isna().sum()

16. Split a string into multiple columns

df[['first', 'middle', 'last']] = df.name.str.split(' ', expand=True)

df['city'] = df.location.str.split(',', expand=True)[0]

17. Expand a Series of lists into a DataFrame

The second column holds Python lists of integers. To expand that column into a DataFrame, call apply() on it and pass the Series constructor:

df_new = df.col_two.apply(pd.Series)

pd.concat([df, df_new], axis='columns')
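A runnable version of this step, assuming a made-up frame whose second column holds two-element lists:

```python
import pandas as pd

# Made-up data: col_two holds Python lists of integers.
df = pd.DataFrame({
    "col_one": ["a", "b", "c"],
    "col_two": [[10, 40], [20, 50], [30, 60]],
})

# Each list becomes one row of a new DataFrame with columns 0 and 1.
df_new = df.col_two.apply(pd.Series)

# Put the expanded columns next to the original ones.
result = pd.concat([df, df_new], axis="columns")
print(result)
```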

18. Aggregate by multiple functions

order.groupby('id').price.agg(['sum', 'count'])

19. Combine aggregated output with a DataFrame

order.groupby('id').price.transform('sum')
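The point of transform() is that its result has the same length as the original frame, so it can be stored as a new column. A sketch with made-up orders (the total_price and percent_of_total column names are illustrative, not from the source):

```python
import pandas as pd

# Made-up data: two orders with several line items each.
order = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "price": [2.0, 3.0, 1.0, 4.0, 5.0],
})

# One value per row: the total price of the order that row belongs to.
order["total_price"] = order.groupby("id").price.transform("sum")

# With the total on every row, per-row shares are a simple division.
order["percent_of_total"] = order.price / order.total_price
print(order)
```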

20. Select a slice of rows and columns
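No code is given for this step in the source. A minimal sketch, assuming the intent is label-based slicing with loc and position-based slicing with iloc (the frame is invented):

```python
import pandas as pd

# Invented data with string row labels.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]},
    index=["w", "x", "y", "z"],
)

# Label-based slices include both endpoints.
by_label = df.loc["x":"y", "a":"b"]

# Position-based slices exclude the stop position.
by_position = df.iloc[1:3, 0:2]
print(by_label)
```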

21. Reshape a MultiIndexed Series

Use the unstack() function.
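As a sketch of what unstack() does, assume a made-up titanic-style frame grouped by two keys, which yields a Series with a two-level index:

```python
import pandas as pd

# Made-up data for illustration only.
titanic = pd.DataFrame({
    "sex": ["female", "female", "male", "male"],
    "pclass": [1, 2, 1, 2],
    "survived": [1, 1, 0, 1],
})

# Grouping by two keys gives a Series indexed by (sex, pclass).
rates = titanic.groupby(["sex", "pclass"]).survived.mean()

# unstack() moves the inner index level (pclass) into the columns.
wide = rates.unstack()
print(wide)
```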

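22. Create a pivot table

Use the pivot_table() function. To build a pivot table, specify the index, the columns, the values, and the aggregation function. Another benefit of a pivot table is that setting margins=True adds an "All" row and an "All" column, computed with the same aggregation function over the whole row or column.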
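A sketch of the four arguments plus margins=True, assuming a made-up titanic-style frame:

```python
import pandas as pd

# Made-up data for illustration only.
titanic = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male"],
    "pclass": [1, 2, 1, 2, 2],
    "survived": [1, 1, 0, 1, 0],
})

table = titanic.pivot_table(
    index="sex",        # row labels
    columns="pclass",   # column labels
    values="survived",  # what to aggregate
    aggfunc="mean",     # how to aggregate
    margins=True,       # add the "All" row and column
)
print(table)
```

Because the aggregation function here is the mean, the "All" entries are overall means rather than sums.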

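23. Convert continuous data into categorical data

One solution is to label the age ranges, for example "adult", "young adult", and "child". A convenient way to do this is the cut() function.

pd.cut(titanic.age, bins=[0, 18, 30, 99], labels=['a', 'b', 'c'])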
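To make the bin edges concrete, here is a sketch with a made-up ages Series and the descriptive labels mentioned above in place of 'a', 'b', 'c':

```python
import pandas as pd

# Made-up ages for illustration only.
ages = pd.Series([5, 18, 25, 40])

# By default each bin includes its right edge: (0, 18], (18, 30], (30, 99].
groups = pd.cut(
    ages,
    bins=[0, 18, 30, 99],
    labels=["child", "young adult", "adult"],
)
print(groups.tolist())
```

Note that an age of exactly 18 falls into the first bin, because the intervals are closed on the right.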

24. Change display options

pd.set_option('display.float_format', '{:.2f}'.format)

25. Style a DataFrame

You can create a dictionary of format strings, one per column, and pass it to the DataFrame's style.format() function.

format_dict = {'date': '{:%m/%d/%y}', 'close': '{:.2f}'}

stocks.style.format(format_dict)
