3.3 DataFrame运算
- 3.3.1算数运算
- 算数运算符
- 算数运算函数
- 3.3.2 逻辑运算
- 逻辑运算符
- 逻辑运算函数
- 3.3.3 统计运算
- 3.3.4 自定义运算
3.3.1 算数运算
- 使用算术运算符
- 使用算数运算函数
- add(other)——加
- sub(other)——减
1. 使用算数运算符
data['open'].head()
trade_date
20200313 222
20200312 100
20200311 100
20200310 100
20200309 100
Name: open, dtype: int64
(data['open'] + 3).head()
trade_date
20200313 225
20200312 103
20200311 103
20200310 103
20200309 103
Name: open, dtype: int64
(data + 10).head()
| close | open | high | low | change | pct_chg | vol | amount |
---|
trade_date | | | | | | | | |
---|
20200313 | 2897.4265 | 232 | 2920.8812 | 2809.9841 | -26.0591 | 8.7666 | 366450446.0 | 393019675.2 |
---|
20200312 | 2933.4856 | 110 | 2954.4651 | 2916.2838 | -35.0318 | 8.4830 | 307778467.0 | 328209212.4 |
---|
20200311 | 2978.5174 | 110 | 3020.0286 | 2978.5174 | -18.2444 | 9.0575 | 352470980.0 | 378766629.0 |
---|
20200310 | 3006.7618 | 110 | 3010.2963 | 2914.7989 | 63.4711 | 11.8167 | 393296658.0 | 425017194.8 |
---|
20200309 | 2953.2907 | 110 | 2999.2051 | 2950.7138 | -81.2206 | 6.9939 | 414560746.0 | 438143864.6 |
---|
(data['close'] - data['open']).head()
trade_date
20200313 2665.4265
20200312 2823.4856
20200311 2868.5174
20200310 2896.7618
20200309 2843.2907
dtype: float64
2. 使用算数运算函数
data['open'].add(5).head()
trade_date
20200313 227
20200312 105
20200311 105
20200310 105
20200309 105
Name: open, dtype: int64
data.head()
| close | open | high | low | change | pct_chg | vol | amount |
---|
trade_date | | | | | | | | |
---|
20200313 | 2887.4265 | 222 | 2910.8812 | 2799.9841 | -36.0591 | -1.2334 | 366450436.0 | 393019665.2 |
---|
20200312 | 2923.4856 | 100 | 2944.4651 | 2906.2838 | -45.0318 | -1.5170 | 307778457.0 | 328209202.4 |
---|
20200311 | 2968.5174 | 100 | 3010.0286 | 2968.5174 | -28.2444 | -0.9425 | 352470970.0 | 378766619.0 |
---|
20200310 | 2996.7618 | 100 | 3000.2963 | 2904.7989 | 53.4711 | 1.8167 | 393296648.0 | 425017184.8 |
---|
20200309 | 2943.2907 | 100 | 2989.2051 | 2940.7138 | -91.2206 | -3.0061 | 414560736.0 | 438143854.6 |
---|
data.sub(100).head()
| close | open | high | low | change | pct_chg | vol | amount |
---|
trade_date | | | | | | | | |
---|
20200313 | 2787.4265 | 122 | 2810.8812 | 2699.9841 | -136.0591 | -101.2334 | 366450336.0 | 393019565.2 |
---|
20200312 | 2823.4856 | 0 | 2844.4651 | 2806.2838 | -145.0318 | -101.5170 | 307778357.0 | 328209102.4 |
---|
20200311 | 2868.5174 | 0 | 2910.0286 | 2868.5174 | -128.2444 | -100.9425 | 352470870.0 | 378766519.0 |
---|
20200310 | 2896.7618 | 0 | 2900.2963 | 2804.7989 | -46.5289 | -98.1833 | 393296548.0 | 425017084.8 |
---|
20200309 | 2843.2907 | 0 | 2889.2051 | 2840.7138 | -191.2206 | -103.0061 | 414560636.0 | 438143754.6 |
---|
data['close'].sub(data['open']).head()
trade_date
20200313 2665.4265
20200312 2823.4856
20200311 2868.5174
20200310 2896.7618
20200309 2843.2907
dtype: float64
3.3.2 逻辑运算
1. 逻辑运算符 <、>、|、&
- 例如筛选pct_chg > 1 的日期数据
- data[‘pct_chg’] > 1 返回布尔值组成的Series
data['pct_chg'] > 1
trade_date
20200313 False
20200312 False
20200311 False
20200310 True
20200309 False
...
19910719 False
19910718 True
19910717 False
19910716 False
19910715 False
Name: pct_chg, Length: 7002, dtype: bool
布尔索引
data[data['pct_chg'] > 2].head()
| close | open | high | low | change | pct_chg | vol | amount |
---|
trade_date | | | | | | | | |
---|
20200302 | 2970.9312 | 100 | 2982.5068 | 2899.3100 | 90.6274 | 3.1465 | 367333369.0 | 397244201.2 |
---|
20200217 | 2983.6224 | 100 | 2983.6371 | 2924.9913 | 66.6147 | 2.2837 | 313198007.0 | 367014340.1 |
---|
20190819 | 2883.0960 | 100 | 2883.0960 | 2829.8542 | 59.2722 | 2.0990 | 214546668.0 | 247092216.3 |
---|
20190701 | 3044.9028 | 100 | 3045.3669 | 3014.6871 | 66.0244 | 2.2164 | 250840433.0 | 266541056.9 |
---|
20190620 | 2987.1186 | 100 | 2997.3888 | 2915.0895 | 69.3157 | 2.3756 | 291011537.0 | 288296546.4 |
---|
data[(data['pct_chg'] > 2) & (data['low'] > 2000)].head()
| close | open | high | low | change | pct_chg | vol | amount |
---|
trade_date | | | | | | | | |
---|
20200302 | 2970.9312 | 100 | 2982.5068 | 2899.3100 | 90.6274 | 3.1465 | 367333369.0 | 397244201.2 |
---|
20200217 | 2983.6224 | 100 | 2983.6371 | 2924.9913 | 66.6147 | 2.2837 | 313198007.0 | 367014340.1 |
---|
20190819 | 2883.0960 | 100 | 2883.0960 | 2829.8542 | 59.2722 | 2.0990 | 214546668.0 | 247092216.3 |
---|
20190701 | 3044.9028 | 100 | 3045.3669 | 3014.6871 | 66.0244 | 2.2164 | 250840433.0 | 266541056.9 |
---|
20190620 | 2987.1186 | 100 | 2997.3888 | 2915.0895 | 69.3157 | 2.3756 | 291011537.0 | 288296546.4 |
---|
(data['pct_chg'] > 2) & (data['low'] > 2000)
trade_date
20200313 False
20200312 False
20200311 False
20200310 False
20200309 False
...
19910719 False
19910718 False
19910717 False
19910716 False
19910715 False
Length: 7002, dtype: bool
2. 逻辑运算函数
- query(expression)
- isin(values)
# 通过使用query来简化布尔索引
data.query('pct_chg > 2 & low > 2000').head()
| close | open | high | low | change | pct_chg | vol | amount |
---|
trade_date | | | | | | | | |
---|
20200302 | 2970.9312 | 100 | 2982.5068 | 2899.3100 | 90.6274 | 3.1465 | 367333369.0 | 397244201.2 |
---|
20200217 | 2983.6224 | 100 | 2983.6371 | 2924.9913 | 66.6147 | 2.2837 | 313198007.0 | 367014340.1 |
---|
20190819 | 2883.0960 | 100 | 2883.0960 | 2829.8542 | 59.2722 | 2.0990 | 214546668.0 | 247092216.3 |
---|
20190701 | 3044.9028 | 100 | 3045.3669 | 3014.6871 | 66.0244 | 2.2164 | 250840433.0 | 266541056.9 |
---|
20190620 | 2987.1186 | 100 | 2997.3888 | 2915.0895 | 69.3157 | 2.3756 | 291011537.0 | 288296546.4 |
---|
用isin()判断open列是否为100或200
data['open'].isin([100, 200]).head()
trade_date
20200313 False
20200312 True
20200311 True
20200310 True
20200309 True
Name: open, dtype: bool
data[data['open'].isin([100, 200])].head()
| close | open | high | low | change | pct_chg | vol | amount |
---|
trade_date | | | | | | | | |
---|
20200312 | 2923.4856 | 100 | 2944.4651 | 2906.2838 | -45.0318 | -1.5170 | 307778457.0 | 328209202.4 |
---|
20200311 | 2968.5174 | 100 | 3010.0286 | 2968.5174 | -28.2444 | -0.9425 | 352470970.0 | 378766619.0 |
---|
20200310 | 2996.7618 | 100 | 3000.2963 | 2904.7989 | 53.4711 | 1.8167 | 393296648.0 | 425017184.8 |
---|
20200309 | 2943.2907 | 100 | 2989.2051 | 2940.7138 | -91.2206 | -3.0061 | 414560736.0 | 438143854.6 |
---|
20200306 | 3034.5113 | 100 | 3052.4439 | 3029.4632 | -37.1658 | -1.2100 | 362061533.0 | 377388542.7 |
---|
3.3.3 统计运算
- describe()
- 统计函数
- count – 非空值的数目
- sum – 求和
- mean – 平均值
- median – 中位数
- min – 最小值
- max – 最大值
- mode – Mode
- abs – 绝对值
- prod – Product of values
- std – 标准差
- var – 方差
- idxmax – 最大值所在的位置
- idxmin – 最小值所在的位置
**对单个函数进行统计的时候,坐标轴还是按照这些默认为columns(axis=0, default),如果要对index进行统计,则要指明(axis=1)**这里的axis取值与axis=0对应index,axis=1对应columns不同
- 使用0值表示沿着每一列或行标签\索引值向下执行方法
- 使用1值表示沿着每一行或者列标签模向执行对应的方法
1. describe()——统计count、mean、std、min、max、百分位数
data.describe()
| close | open | high | low | change | pct_chg | vol | amount |
---|
count | 7002.000000 | 7002.000000 | 7002.000000 | 7002.000000 | 7002.000000 | 7002.000000 | 7.002000e+03 | 7.002000e+03 |
---|
mean | 1995.338756 | 100.017424 | 2013.067255 | 1973.510802 | 0.393486 | 0.071241 | 7.701220e+07 | 8.319068e+07 |
---|
std | 1048.414818 | 1.457971 | 1057.291376 | 1035.686578 | 41.442881 | 2.484322 | 1.068379e+08 | 1.309678e+08 |
---|
min | 133.140000 | 100.000000 | 134.100000 | 131.870000 | -354.684000 | -16.393700 | 2.500000e+02 | 3.149740e+02 |
---|
25% | 1191.735250 | 100.000000 | 1203.126250 | 1179.751250 | -12.965025 | -0.756450 | 5.149394e+06 | 4.277775e+06 |
---|
50% | 1898.920000 | 100.000000 | 1914.407000 | 1883.537500 | 0.934000 | 0.069500 | 2.656829e+07 | 1.763237e+07 |
---|
75% | 2838.853250 | 100.000000 | 2865.606500 | 2807.608000 | 14.439000 | 0.863650 | 1.204594e+08 | 1.274506e+08 |
---|
max | 6092.057000 | 222.000000 | 6124.044000 | 6040.713000 | 649.500000 | 105.269100 | 8.571328e+08 | 1.309925e+09 |
---|
data.max()
close 6.092057e+03
open 2.220000e+02
high 6.124044e+03
low 6.040713e+03
change 6.495000e+02
pct_chg 1.052691e+02
vol 8.571328e+08
amount 1.309925e+09
dtype: float64
data.max(axis=1)
trade_date
20200313 393019665.2
20200312 328209202.4
20200311 378766619.0
20200310 425017184.8
20200309 438143854.6
...
19910719 10823.0
19910718 847.0
19910717 660.0
19910716 2796.0
19910715 11938.0
Length: 7002, dtype: float64
data.idxmax()
close 20071016
open 20200313
high 20071016
low 20071016
change 19920521
pct_chg 19920521
vol 20150420
amount 20150608
dtype: int64
data.idxmin()
close 19910715
open 20200312
high 19910715
low 19910715
change 20080122
pct_chg 19950523
vol 19920117
amount 19920115
dtype: int64
3.3.4 累计统计函数
函数 | 作用 |
---|
cumsum | 计算前n个数的和 |
cummax | 计算前n个数的最大值 |
cummin | 计算前n个数的最小值 |
cumprod | 计算前n个数的积 |
data['pct_chg'].cumsum()
trade_date
20200313 -1.2334
20200312 -2.7504
20200311 -3.6929
20200310 -1.8762
20200309 -4.8823
...
19910719 495.5773
19910718 496.5787
19910717 497.5752
19910716 498.5741
19910715 498.8301
Name: pct_chg, Length: 7002, dtype: float64
data['pct_chg'].sort_index().cumsum().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x176fb7d9788>
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-pq38Ylww-1586741471283)(output_129_1.png)]
3.3.5 自定义运算
- apply(func, axis=0)
- func – 自定义函数
- axis=0 – 默认是列(按行标签方向执行方法),axis=1为对行进行运算(按列标签方向执行方法)
- 定义一个对列求max-min的函数
data.head()
| close | open | high | low | change | pct_chg | vol | amount |
---|
trade_date | | | | | | | | |
---|
20200313 | 2887.4265 | 222 | 2910.8812 | 2799.9841 | -36.0591 | -1.2334 | 366450436.0 | 393019665.2 |
---|
20200312 | 2923.4856 | 100 | 2944.4651 | 2906.2838 | -45.0318 | -1.5170 | 307778457.0 | 328209202.4 |
---|
20200311 | 2968.5174 | 100 | 3010.0286 | 2968.5174 | -28.2444 | -0.9425 | 352470970.0 | 378766619.0 |
---|
20200310 | 2996.7618 | 100 | 3000.2963 | 2904.7989 | 53.4711 | 1.8167 | 393296648.0 | 425017184.8 |
---|
20200309 | 2943.2907 | 100 | 2989.2051 | 2940.7138 | -91.2206 | -3.0061 | 414560736.0 | 438143854.6 |
---|
data.apply(lambda x: x.max() - x.min())
close 5.958917e+03
open 1.220000e+02
high 5.989944e+03
low 5.908843e+03
change 1.004184e+03
pct_chg 1.216628e+02
vol 8.571326e+08
amount 1.309924e+09
dtype: float64
data['vol'].max() - data['vol'].min()
857132557.0