Handy but Lesser-Known Pandas Functions, Part 1


Varalpha


Category: Pandas. Tags: python, pandas, data analysis

Article link: https://blog.csdn.net/Varalpha/article/details/105935991


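I came across a few rarely used Pandas functions. I don't need them right now, but I'm noting them down because they will probably come in handy someday.
Index

Collection of lesser-known Pandas functions

The examples need some data loaded into Pandas.
I recently worked with stock option data, which covers a good range of data types and is easy to fetch, so I use it here.

Using Tushare, pull an arbitrary dataset.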

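import pandas as pd
import tushare as tus

# pro_api() needs a Tushare token, set beforehand with tus.set_token() or passed in directly
pro = tus.pro_api()
# Basic information for options listed on the Shanghai Stock Exchange (SSE)
df = pro.opt_basic(exchange='SSE')
df.head(3)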

	ts_code	exchange	name	per_unit	opt_code	opt_type	call_put	exercise_type	exercise_price	s_month	maturity_date	list_price	list_date	delist_date	last_edate	last_ddate	quote_unit	min_price_chg
0	10000579.SH	SSE	华夏上证50ETF期权1604认购2.15	10000.0	OP510050.SH	ETF期权	C	欧式	2.15	201604	20160427	0.0412	20160225	20160427	20160427	20160428	人民币元	0.0001
1	10000108.SH	SSE	华夏上证50ETF期权1505认购2.65	10000.0	OP510050.SH	ETF期权	C	欧式	2.65	201505	20150527	0.1006	20150326	20150527	20150527	20150528	人民币元	0.0001
2	10000111.SH	SSE	华夏上证50ETF期权1505认沽2.55	10000.0	OP510050.SH	ETF期权	P	欧式	2.55	201505	20150527	0.0906	20150326	20150527	20150527	20150528	人民币元	0.0001

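cut(): split values into equal-width bins

cut() splits numeric values into a given number of bins of equal width (the intervals are equally wide; the number of values in each bin may differ).

# The strike price column has dtype float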
df['exercise_price'].dtype

dtype('float64')

pd.cut(df['exercise_price'],5)

0       (1.797, 2.36]
1        (2.36, 2.92]
2        (2.36, 2.92]
3        (2.92, 3.48]
4        (2.92, 3.48]
            ...      
2517     (2.92, 3.48]
2518     (2.92, 3.48]
2519     (2.92, 3.48]
2520     (2.92, 3.48]
2521     (2.92, 3.48]
Name: exercise_price, Length: 2522, dtype: category
Categories (5, interval[float64]): [(1.797, 2.36] < (2.36, 2.92] < (2.92, 3.48] < (3.48, 4.04] < (4.04, 4.6]]

As the output above shows, the function splits the strike prices into five bins, each about 0.56 wide:
0. $(1.797, 2.36]$
1. $(2.36, 2.92]$
2. $(2.92, 3.48]$
3. $(3.48, 4.04]$
4. $(4.04, 4.6]$
To get only the bin numbers, pass labels=False:

pd.cut(df['exercise_price'],5,labels=False)

0       0
1       1
2       1
3       2
4       2
       ..
2517    2
2518    2
2519    2
2520    2
2521    2
Name: exercise_price, Length: 2522, dtype: int64

pd.cut(df['exercise_price'],5,labels=False).unique()

array([0, 1, 2, 3, 4], dtype=int64)
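
Since the Tushare data needs an account, here is a minimal self-contained sketch of the same idea. The `prices` values are made up for illustration and are not taken from the option data above. It also shows `retbins=True`, which returns the bin edges, and `pd.qcut`, which is not covered in the original text but is the function to reach for when the bins should hold roughly equal numbers of values instead of having equal width.

```python
import pandas as pd

# Hypothetical strike prices standing in for df['exercise_price']
prices = pd.Series([1.8, 2.0, 2.5, 3.0, 3.5, 4.0, 4.6])

# Five equal-width bins; labels=False returns the bin numbers,
# retbins=True additionally returns the array of bin edges
codes, edges = pd.cut(prices, 5, labels=False, retbins=True)
print(codes.tolist())
print(edges)

# qcut bins by quantiles, so each bin holds roughly the same number of values
quantile_codes = pd.qcut(prices, 5, labels=False)
print(quantile_codes.tolist())
```

Note that the lowest edge comes out slightly below the minimum value (as with 1.797 in the output above), so that the minimum itself falls inside the first right-closed interval.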

idxmax() & idxmin(): index of the first maximum (minimum) value

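The names say what these two functions do: they return the index of the maximum and of the minimum value.
If that value occurs more than once, the index of its first occurrence is returned.
Without them, we would typically get the first index of the minimum or maximum like this:

# The usual way to get the first index of the minimum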
df[df['exercise_price']==df['exercise_price'].min()].index[0]

With idxmin():

# idxmin() returns the first index of the minimum
df['exercise_price'].idxmin()

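idxmax() works the same way as idxmin().
Because they return only the first minimum (maximum), while in practice we usually want the indices of all minimum (maximum) values, there are honestly not many occasions to use these functions, even though they look more concise and efficient.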
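
To make the difference concrete, here is a small sketch on made-up data (the `prices` values and index labels are hypothetical). It compares `idxmin()` with a boolean mask that collects the index labels of every minimum.

```python
import pandas as pd

# Hypothetical strike prices with a repeated minimum; index labels are arbitrary
prices = pd.Series([2.0, 1.8, 2.5, 1.8, 3.0], index=[10, 11, 12, 13, 14])

# idxmin() / idxmax() return only the label of the first occurrence
first_min = prices.idxmin()
first_max = prices.idxmax()

# A boolean mask on the index returns the labels of all minima
all_min = prices.index[prices == prices.min()].tolist()

print(first_min, first_max, all_min)
```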

nsmallest() & nlargest(): the n smallest (largest) values

Return the n smallest (largest) values.

df['exercise_price'].nsmallest(100)

146     1.8
302     1.8
381     1.8
639     1.8
763     1.8
       ... 
1256    2.0
1471    2.0
1505    2.0
1541    2.0
1577    2.0
Name: exercise_price, Length: 100, dtype: float64

df[['exercise_price','ts_code']].nsmallest(5,'exercise_price')

	exercise_price	ts_code
146	1.8	10000384.SH
302	1.8	10000568.SH
381	1.8	10000396.SH
639	1.8	10000393.SH
763	1.8	10000567.SH
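
In the output above, many rows share the smallest strike price of 1.8, so which rows make the cut depends on how ties are handled. The sketch below uses a made-up DataFrame (`df_demo` and its placeholder codes are hypothetical) to show the `keep` parameter: the default `keep='first'` returns exactly n rows, while `keep='all'` also returns every row tied with the last value kept.

```python
import pandas as pd

# Hypothetical data with three rows tied at the smallest strike price
df_demo = pd.DataFrame({
    'exercise_price': [1.8, 1.8, 1.8, 1.85, 1.9],
    'ts_code': ['A', 'B', 'C', 'D', 'E'],  # placeholder codes, not real contracts
})

# Default keep='first': exactly two rows, in order of appearance
first_two = df_demo.nsmallest(2, 'exercise_price')

# keep='all': rows tied with the last kept value are included as well
with_ties = df_demo.nsmallest(2, 'exercise_price', keep='all')

# nlargest() mirrors nsmallest() and also works directly on a Series
top_two = df_demo['exercise_price'].nlargest(2)

print(first_two, with_ties, top_two, sep='\n\n')
```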

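pivot_table: pivot tables

pivot_table creates a spreadsheet-style pivot table.
For example, we can compute the mean listing price (list_price) for each strike price, with calls and puts in separate columns: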

df.pivot_table(index='exercise_price',columns='call_put',values='list_price',aggfunc='mean')

call_put	C	P
exercise_price
1.800	0.190171	0.163243
1.850	0.162160	0.158540
1.900	0.133750	0.181680
1.908	0.178900	0.080900
1.950	0.165933	0.136525
...	...	...
4.200	0.096738	0.291938
4.300	0.068188	0.350062
4.400	0.057617	0.375683
4.500	0.049117	0.426150
4.600	0.053167	0.456633
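
The same call can be tried on a tiny made-up table where the result is easy to check by hand. `df_demo` and its values are hypothetical, with column names borrowed from the option data. The sketch also shows that, for this simple case, the pivot table gives the same numbers as a `groupby` followed by `unstack`.

```python
import pandas as pd

# Hypothetical listing prices for two strike prices, calls (C) and puts (P)
df_demo = pd.DataFrame({
    'exercise_price': [2.0, 2.0, 2.0, 2.5, 2.5],
    'call_put': ['C', 'C', 'P', 'C', 'P'],
    'list_price': [0.10, 0.20, 0.05, 0.08, 0.12],
})

# Rows: strike price; columns: call/put; cells: mean listing price
table = df_demo.pivot_table(index='exercise_price', columns='call_put',
                            values='list_price', aggfunc='mean')

# Equivalent computation with groupby + unstack
same = df_demo.groupby(['exercise_price', 'call_put'])['list_price'].mean().unstack()

print(table)
```

The two calls at strike 2.0 average to 0.15, which is the value that appears in the C column for that row.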