Data Analysis, Part 3: DataFrame Operations in Pandas (Arithmetic, Logical, Statistical, and Custom Operations)

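3.3 DataFrame Operations

  • 3.3.1 Arithmetic operations
    1. Arithmetic operators
    2. Arithmetic functions
  • 3.3.2 Logical operations
    1. Logical operators
      • Boolean indexing
    2. Logical functions
  • 3.3.3 Statistical operations
  • 3.3.4 Cumulative statistical functions
  • 3.3.5 Custom operations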

3.3.1 Arithmetic Operations

  1. Using arithmetic operators
  2. Using arithmetic functions
    • add(other): addition
    • sub(other): subtraction
1. Using arithmetic operators
data['open'].head()
trade_date
20200313    222
20200312    100
20200311    100
20200310    100
20200309    100
Name: open, dtype: int64
(data['open'] + 3).head()
trade_date
20200313    225
20200312    103
20200311    103
20200310    103
20200309    103
Name: open, dtype: int64
(data + 10).head()
                close  open       high        low    change  pct_chg          vol       amount
trade_date
20200313    2897.4265   232  2920.8812  2809.9841  -26.0591   8.7666  366450446.0  393019675.2
20200312    2933.4856   110  2954.4651  2916.2838  -35.0318   8.4830  307778467.0  328209212.4
20200311    2978.5174   110  3020.0286  2978.5174  -18.2444   9.0575  352470980.0  378766629.0
20200310    3006.7618   110  3010.2963  2914.7989   63.4711  11.8167  393296658.0  425017194.8
20200309    2953.2907   110  2999.2051  2950.7138  -81.2206   6.9939  414560746.0  438143864.6
(data['close'] - data['open']).head()
trade_date
20200313    2665.4265
20200312    2823.4856
20200311    2868.5174
20200310    2896.7618
20200309    2843.2907
dtype: float64
2. Using arithmetic functions
data['open'].add(5).head()
trade_date
20200313    227
20200312    105
20200311    105
20200310    105
20200309    105
Name: open, dtype: int64
data.head()
                close  open       high        low    change  pct_chg          vol       amount
trade_date
20200313    2887.4265   222  2910.8812  2799.9841  -36.0591  -1.2334  366450436.0  393019665.2
20200312    2923.4856   100  2944.4651  2906.2838  -45.0318  -1.5170  307778457.0  328209202.4
20200311    2968.5174   100  3010.0286  2968.5174  -28.2444  -0.9425  352470970.0  378766619.0
20200310    2996.7618   100  3000.2963  2904.7989   53.4711   1.8167  393296648.0  425017184.8
20200309    2943.2907   100  2989.2051  2940.7138  -91.2206  -3.0061  414560736.0  438143854.6
data.sub(100).head()
                close  open       high        low     change    pct_chg          vol       amount
trade_date
20200313    2787.4265   122  2810.8812  2699.9841  -136.0591  -101.2334  366450336.0  393019565.2
20200312    2823.4856     0  2844.4651  2806.2838  -145.0318  -101.5170  307778357.0  328209102.4
20200311    2868.5174     0  2910.0286  2868.5174  -128.2444  -100.9425  352470870.0  378766519.0
20200310    2896.7618     0  2900.2963  2804.7989   -46.5289   -98.1833  393296548.0  425017084.8
20200309    2843.2907     0  2889.2051  2840.7138  -191.2206  -103.0061  414560636.0  438143754.6
data['close'].sub(data['open']).head()
trade_date
20200313    2665.4265
20200312    2823.4856
20200311    2868.5174
20200310    2896.7618
20200309    2843.2907
dtype: float64
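
The operator form and the method form shown above compute the same thing. As a minimal sketch, the snippet below rebuilds two columns of the five rows printed by data.head() as a stand-in for the article's data (the full dataset is not reproduced here) and checks that equivalence. It also shows fill_value, a parameter that pandas accepts on add() and sub() but that has no operator counterpart. The Series named partial is purely illustrative.

```python
import pandas as pd

# Stand-in for the article's `data`: two columns of the rows shown by data.head().
dates = pd.Index([20200313, 20200312, 20200311, 20200310, 20200309], name="trade_date")
data = pd.DataFrame(
    {
        "close": [2887.4265, 2923.4856, 2968.5174, 2996.7618, 2943.2907],
        "open": [222, 100, 100, 100, 100],
    },
    index=dates,
)

# The operator and the method produce identical results.
by_operator = data["close"] - data["open"]
by_method = data["close"].sub(data["open"])
same = by_operator.equals(by_method)

# Illustrative Series that covers only one of the five dates.
partial = pd.Series([1.0], index=pd.Index([20200313], name="trade_date"))

# With the operator, labels missing from `partial` produce NaN.
unfilled = data["open"] + partial

# With the method, fill_value substitutes 0 for the missing side first.
filled = data["open"].add(partial, fill_value=0)
```

The method form is worth reaching for when the two operands do not share exactly the same labels; for plain scalar arithmetic the operator is usually easier to read.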

3.3.2 Logical Operations

1. Logical operators: <, >, |, &
  • Example: select the dates where pct_chg > 1
    • data['pct_chg'] > 1 returns a Series of boolean values
data['pct_chg'] > 1
trade_date
20200313    False
20200312    False
20200311    False
20200310     True
20200309    False
            ...  
19910719    False
19910718     True
19910717    False
19910716    False
19910715    False
Name: pct_chg, Length: 7002, dtype: bool
Boolean indexing
# Boolean indexing with a single condition
data[data['pct_chg'] > 2].head()
                close  open       high        low   change  pct_chg          vol       amount
trade_date
20200302    2970.9312   100  2982.5068  2899.3100  90.6274   3.1465  367333369.0  397244201.2
20200217    2983.6224   100  2983.6371  2924.9913  66.6147   2.2837  313198007.0  367014340.1
20190819    2883.0960   100  2883.0960  2829.8542  59.2722   2.0990  214546668.0  247092216.3
20190701    3044.9028   100  3045.3669  3014.6871  66.0244   2.2164  250840433.0  266541056.9
20190620    2987.1186   100  2997.3888  2915.0895  69.3157   2.3756  291011537.0  288296546.4
# Boolean indexing with a compound condition
data[(data['pct_chg'] > 2) & (data['low'] > 2000)].head()
                close  open       high        low   change  pct_chg          vol       amount
trade_date
20200302    2970.9312   100  2982.5068  2899.3100  90.6274   3.1465  367333369.0  397244201.2
20200217    2983.6224   100  2983.6371  2924.9913  66.6147   2.2837  313198007.0  367014340.1
20190819    2883.0960   100  2883.0960  2829.8542  59.2722   2.0990  214546668.0  247092216.3
20190701    3044.9028   100  3045.3669  3014.6871  66.0244   2.2164  250840433.0  266541056.9
20190620    2987.1186   100  2997.3888  2915.0895  69.3157   2.3756  291011537.0  288296546.4
(data['pct_chg'] > 2) & (data['low'] > 2000)
trade_date
20200313    False
20200312    False
20200311    False
20200310    False
20200309    False
            ...  
19910719    False
19910718    False
19910717    False
19910716    False
19910715    False
Length: 7002, dtype: bool
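
The compound condition above wraps each comparison in parentheses. In Python, & and | bind more tightly than < and >, so without the parentheses the expression is not evaluated as intended and typically raises an error. The sketch below combines two masks with &, | and ~ (negation) on a stand-in built from the five rows shown by data.head(). The thresholds 1 and 2900 are chosen only so that this small sample gives different results for each combination.

```python
import pandas as pd

# Stand-in for the article's `data`: two columns of the rows shown by data.head().
dates = pd.Index([20200313, 20200312, 20200311, 20200310, 20200309], name="trade_date")
data = pd.DataFrame(
    {
        "low": [2799.9841, 2906.2838, 2968.5174, 2904.7989, 2940.7138],
        "pct_chg": [-1.2334, -1.5170, -0.9425, 1.8167, -3.0061],
    },
    index=dates,
)

# Naming each mask keeps the combined condition readable.
rose = data["pct_chg"] > 1
low_above_2900 = data["low"] > 2900

both = data[rose & low_above_2900]         # rows where both conditions hold
either = data[rose | low_above_2900]       # rows where at least one holds
neither = data[~(rose | low_above_2900)]   # rows where neither holds
```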
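2. Logical functions
  • query(expression)
    • expression: the query string
  • isin(values)
    • checks whether each value in the column appears in the given collection of values
# Use query to simplify boolean indexing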
data.query('pct_chg > 2 & low > 2000').head()
                close  open       high        low   change  pct_chg          vol       amount
trade_date
20200302    2970.9312   100  2982.5068  2899.3100  90.6274   3.1465  367333369.0  397244201.2
20200217    2983.6224   100  2983.6371  2924.9913  66.6147   2.2837  313198007.0  367014340.1
20190819    2883.0960   100  2883.0960  2829.8542  59.2722   2.0990  214546668.0  247092216.3
20190701    3044.9028   100  3045.3669  3014.6871  66.0244   2.2164  250840433.0  266541056.9
20190620    2987.1186   100  2997.3888  2915.0895  69.3157   2.3756  291011537.0  288296546.4
Use isin() to check whether each value in the open column is 100 or 200
data['open'].isin([100, 200]).head()
trade_date
20200313    False
20200312     True
20200311     True
20200310     True
20200309     True
Name: open, dtype: bool
data[data['open'].isin([100, 200])].head()
                close  open       high        low    change  pct_chg          vol       amount
trade_date
20200312    2923.4856   100  2944.4651  2906.2838  -45.0318  -1.5170  307778457.0  328209202.4
20200311    2968.5174   100  3010.0286  2968.5174  -28.2444  -0.9425  352470970.0  378766619.0
20200310    2996.7618   100  3000.2963  2904.7989   53.4711   1.8167  393296648.0  425017184.8
20200309    2943.2907   100  2989.2051  2940.7138  -91.2206  -3.0061  414560736.0  438143854.6
20200306    3034.5113   100  3052.4439  3029.4632  -37.1658  -1.2100  362061533.0  377388542.7
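
Two small extensions of query() and isin() may be useful here. query() can read a local Python variable when its name is prefixed with @, and an isin() mask can be negated with ~ to keep the rows that are not in the list. The sketch below uses a stand-in for the five rows shown by data.head(); the variable threshold and its value are assumptions made for illustration.

```python
import pandas as pd

# Stand-in for the article's `data`: three columns of the rows shown by data.head().
dates = pd.Index([20200313, 20200312, 20200311, 20200310, 20200309], name="trade_date")
data = pd.DataFrame(
    {
        "open": [222, 100, 100, 100, 100],
        "low": [2799.9841, 2906.2838, 2968.5174, 2904.7989, 2940.7138],
        "pct_chg": [-1.2334, -1.5170, -0.9425, 1.8167, -3.0061],
    },
    index=dates,
)

# Hypothetical cutoff held in a local variable; @ makes it visible to query().
threshold = 1.0
by_query = data.query("pct_chg > @threshold and low > 2000")

# The equivalent boolean-index form, for comparison.
by_mask = data[(data["pct_chg"] > threshold) & (data["low"] > 2000)]

# Negating the isin() mask keeps rows whose open is neither 100 nor 200.
not_100_or_200 = data[~data["open"].isin([100, 200])]
```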

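3.3.3 Statistical Operations

  1. describe()
  2. Statistical functions
    • count – number of non-null values
    • sum – sum
    • mean – mean
    • median – median
    • min – minimum
    • max – maximum
    • mode – mode (most frequent value)
    • abs – absolute value
    • prod – product of values
    • std – standard deviation
    • var – variance
    • idxmax – index label of the maximum
    • idxmin – index label of the minimum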

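**A statistical function uses axis=0 by default, which returns one result per column. To get one result per row instead, pass axis=1.** The axis argument names the direction the computation runs along, not the labels that appear in the result:
- axis=0 runs the method downward along the row labels (the index), producing one value for each column
- axis=1 runs the method across the column labels, producing one value for each row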
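
The axis convention is easiest to verify on a frame small enough to add up by hand. The frame df below is a made-up example, not part of the article's data.

```python
import pandas as pd

# Tiny illustrative frame: two columns, three labelled rows.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])

# axis=0 (the default): sum down the rows, one value per column.
per_column = df.sum()

# axis=1: sum across the columns, one value per row.
per_row = df.sum(axis=1)

print(per_column.to_dict())  # {'a': 6, 'b': 60}
print(per_row.to_dict())     # {'x': 11, 'y': 22, 'z': 33}
```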

1. describe(): reports count, mean, std, min, max, and percentiles
data.describe()
             close         open         high          low       change      pct_chg           vol        amount
count  7002.000000  7002.000000  7002.000000  7002.000000  7002.000000  7002.000000  7.002000e+03  7.002000e+03
mean   1995.338756   100.017424  2013.067255  1973.510802     0.393486     0.071241  7.701220e+07  8.319068e+07
std    1048.414818     1.457971  1057.291376  1035.686578    41.442881     2.484322  1.068379e+08  1.309678e+08
min     133.140000   100.000000   134.100000   131.870000  -354.684000   -16.393700  2.500000e+02  3.149740e+02
25%    1191.735250   100.000000  1203.126250  1179.751250   -12.965025    -0.756450  5.149394e+06  4.277775e+06
50%    1898.920000   100.000000  1914.407000  1883.537500     0.934000     0.069500  2.656829e+07  1.763237e+07
75%    2838.853250   100.000000  2865.606500  2807.608000    14.439000     0.863650  1.204594e+08  1.274506e+08
max    6092.057000   222.000000  6124.044000  6040.713000   649.500000   105.269100  8.571328e+08  1.309925e+09
data.max()
close      6.092057e+03
open       2.220000e+02
high       6.124044e+03
low        6.040713e+03
change     6.495000e+02
pct_chg    1.052691e+02
vol        8.571328e+08
amount     1.309925e+09
dtype: float64
data.max(axis=1)
trade_date
20200313    393019665.2
20200312    328209202.4
20200311    378766619.0
20200310    425017184.8
20200309    438143854.6
               ...     
19910719        10823.0
19910718          847.0
19910717          660.0
19910716         2796.0
19910715        11938.0
Length: 7002, dtype: float64
data.idxmax()
close      20071016
open       20200313
high       20071016
low        20071016
change     19920521
pct_chg    19920521
vol        20150420
amount     20150608
dtype: int64
data.idxmin()
close      19910715
open       20200312
high       19910715
low        19910715
change     20080122
pct_chg    19950523
vol        19920117
amount     19920115
dtype: int64
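
Because idxmax() and idxmin() return index labels rather than positions, the result can be passed straight to .loc to retrieve the matching row. The sketch below does this on a stand-in for the five rows shown by data.head(), so the dates it finds differ from the full-dataset results printed above.

```python
import pandas as pd

# Stand-in for the article's `data`: two columns of the rows shown by data.head().
dates = pd.Index([20200313, 20200312, 20200311, 20200310, 20200309], name="trade_date")
data = pd.DataFrame(
    {
        "close": [2887.4265, 2923.4856, 2968.5174, 2996.7618, 2943.2907],
        "pct_chg": [-1.2334, -1.5170, -0.9425, 1.8167, -3.0061],
    },
    index=dates,
)

# idxmax/idxmin give the trade_date label of the extreme value.
best_day = data["pct_chg"].idxmax()
worst_day = data["pct_chg"].idxmin()

# The label can be used directly to look up other columns for that day.
close_on_best_day = data.loc[best_day, "close"]

# The other statistical functions are called the same way.
median_pct_chg = data["pct_chg"].median()
```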

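3.3.4 Cumulative Statistical Functions

Function  Description
cumsum    sum of the first n values
cummax    maximum of the first n values
cummin    minimum of the first n values
cumprod   product of the first n values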
data['pct_chg'].cumsum()
trade_date
20200313     -1.2334
20200312     -2.7504
20200311     -3.6929
20200310     -1.8762
20200309     -4.8823
              ...   
19910719    495.5773
19910718    496.5787
19910717    497.5752
19910716    498.5741
19910715    498.8301
Name: pct_chg, Length: 7002, dtype: float64
data['pct_chg'].sort_index().cumsum().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x176fb7d9788>

[Figure: plot of data['pct_chg'].sort_index().cumsum() (output_129_1.png; image unavailable)]
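
The four cumulative functions are easy to check on a short Series, and the same check shows why sort_index() is called before cumsum() when plotting. Cumulative functions run in row order, and the article's data is stored newest first, so sorting the index puts the running total in chronological order. The Series s is a made-up example; pct reuses the first three pct_chg values shown earlier.

```python
import pandas as pd

# Made-up values that are easy to verify by hand.
s = pd.Series([1, 3, 2, 4])
running_sum = s.cumsum()    # 1, 4, 6, 10
running_max = s.cummax()    # 1, 3, 3, 4
running_min = s.cummin()    # 1, 1, 1, 1
running_prod = s.cumprod()  # 1, 3, 6, 24

# First three pct_chg values from the article, stored newest first.
pct = pd.Series(
    [-1.2334, -1.5170, -0.9425],
    index=pd.Index([20200313, 20200312, 20200311], name="trade_date"),
)

# Accumulates from the newest date backwards.
newest_first = pct.cumsum()

# Accumulates from the oldest date forwards, which is what a time plot needs.
chronological = pct.sort_index().cumsum()
```

Both orderings end at the same grand total; only the intermediate values, and therefore the shape of the plotted curve, differ.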

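3.3.5 Custom Operations

  • apply(func, axis=0)
    • func – the user-defined function
    • axis=0 – the default: func receives each column in turn (the method runs along the row labels); with axis=1, func receives each row (the method runs along the column labels)
  • Define a function that computes max - min for each column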
data.head()
                close  open       high        low    change  pct_chg          vol       amount
trade_date
20200313    2887.4265   222  2910.8812  2799.9841  -36.0591  -1.2334  366450436.0  393019665.2
20200312    2923.4856   100  2944.4651  2906.2838  -45.0318  -1.5170  307778457.0  328209202.4
20200311    2968.5174   100  3010.0286  2968.5174  -28.2444  -0.9425  352470970.0  378766619.0
20200310    2996.7618   100  3000.2963  2904.7989   53.4711   1.8167  393296648.0  425017184.8
20200309    2943.2907   100  2989.2051  2940.7138  -91.2206  -3.0061  414560736.0  438143854.6
data.apply(lambda x: x.max() - x.min())  # lambda defines an anonymous function: x is the argument, and the expression after the colon is the return value
close      5.958917e+03
open       1.220000e+02
high       5.989944e+03
low        5.908843e+03
change     1.004184e+03
pct_chg    1.216628e+02
vol        8.571326e+08
amount     1.309924e+09
dtype: float64
data['vol'].max() - data['vol'].min()
857132557.0
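
The same max - min function can also be applied row by row by passing axis=1. As a sketch, the stand-in below keeps only the high and low columns of the five rows shown by data.head(), so the per-row result is that day's trading range. The helper name value_range is an assumption introduced for this example.

```python
import pandas as pd

# Stand-in for the article's `data`: high and low from the rows shown by data.head().
dates = pd.Index([20200313, 20200312, 20200311, 20200310, 20200309], name="trade_date")
data = pd.DataFrame(
    {
        "high": [2910.8812, 2944.4651, 3010.0286, 3000.2963, 2989.2051],
        "low": [2799.9841, 2906.2838, 2968.5174, 2904.7989, 2940.7138],
    },
    index=dates,
)


def value_range(x):
    """Return the spread between the largest and smallest value in x."""
    return x.max() - x.min()


# axis=0 (default): x is each column, so the result has one value per column.
col_range = data.apply(value_range)

# axis=1: x is each row, so the result has one value per trading day.
row_range = data.apply(value_range, axis=1)
```

For a calculation this simple, data['high'] - data['low'] gives the same per-row result without apply and is usually faster; apply is most useful when the logic cannot be written as plain column arithmetic.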