Pandas4_DataFrame Operations

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
date1 = pd.date_range('2020-7-1', '2020-7-20')
col = ['open', 'high', 'close', 'low', 'volume',
       'price_change', 'p_change', 'turnover']
data = pd.DataFrame(np.random.randn(20, 8), index=date1, columns=col)
data
                open      high     close       low    volume  price_change  p_change  turnover
2020-07-01 -0.322513  0.267763 -1.093117  0.027041  0.509584      0.984490 -0.190861  0.517801
2020-07-02  2.089734 -0.983815 -1.046229 -0.036239 -0.256708      0.356689 -0.252771 -0.497880
2020-07-03 -2.133904  0.679809  1.807282 -0.566753 -1.837010      0.040434 -1.059297  0.555953
2020-07-04 -0.386285  0.956894  2.315036 -1.346728 -0.896410     -0.009206  0.474799 -0.496008
2020-07-05  0.751076 -1.149339  0.197884 -0.831805 -0.496330     -0.952271 -1.532102  0.425145
2020-07-06  1.756448 -0.564414 -0.750158 -1.222868  0.757163     -0.606473  0.166400 -0.793353
2020-07-07 -1.470281  0.624951 -1.186256 -0.688941 -0.746321     -1.896055 -1.641859  1.192342
2020-07-08  0.128879 -0.416039 -0.657877 -1.313704  0.945771     -2.131931  0.356988 -2.770012
2020-07-09 -1.380173  0.650889  0.033093  1.108484  2.143283     -1.000914  1.117865  0.440897
2020-07-10  0.273836 -0.454474 -0.287727  0.992828 -0.411735     -1.074596 -1.815277 -0.248398
2020-07-11  0.841820 -0.119436  0.446986 -0.290631  1.690281     -0.706691  1.628919  0.941540
2020-07-12  0.281178  1.508803 -2.302384  0.501423 -0.071525      0.060684  0.842450  1.369522
2020-07-13 -0.155110 -0.459593 -1.896780  0.860048 -0.023542     -0.524829 -0.672771  0.486148
2020-07-14 -0.687383  0.611564  0.170420 -2.491876 -0.971182     -0.705079 -1.247554  1.782681
2020-07-15  0.684804  0.765028  1.264321  1.250718 -0.833911      0.332943  1.901829 -0.623410
2020-07-16 -0.758880 -1.139305 -0.064042 -0.591721 -1.318660     -1.263623  1.448318 -0.238409
2020-07-17 -0.628691  0.481645 -0.221320  0.719116  0.459349     -0.842811  0.739307 -1.204500
2020-07-18 -1.424206  0.536644  0.513619  0.211149 -0.232468      0.230522  0.873776 -2.385903
2020-07-19  0.313115  2.312191  0.211485  0.320764  0.772857     -0.019631 -2.276235  0.401747
2020-07-20  0.372503  0.236578  0.916577 -1.189288 -0.210505     -0.370926  1.478682 -0.399276

Arithmetic operations

  • add(other)
  • sub(other)
data['open'].add(10).head()
2020-07-01     9.677487
2020-07-02    12.089734
2020-07-03     7.866096
2020-07-04     9.613715
2020-07-05    10.751076
Freq: D, Name: open, dtype: float64
data['open'].sub(10).head()
2020-07-01   -10.322513
2020-07-02    -7.910266
2020-07-03   -12.133904
2020-07-04   -10.386285
2020-07-05    -9.248924
Freq: D, Name: open, dtype: float64
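
As a minimal sketch on a small hand-made frame (the name `prices` and its values are illustrative and not taken from the data above), the method forms give the same result as the `+` and `-` operators, and they also accept another Series, in which case the operands are aligned on the index:

```python
import pandas as pd

# Hypothetical toy data, chosen so the results are easy to check by hand
prices = pd.DataFrame({"open": [1.0, 2.0, 3.0], "close": [1.5, 1.0, 4.0]})

# add() with a scalar is equivalent to the + operator
plus_ten = prices["open"].add(10)
assert plus_ten.equals(prices["open"] + 10)

# sub() also accepts another Series; values are matched by index label
diff = prices["close"].sub(prices["open"])
print(diff.tolist())  # [0.5, -1.0, 1.0]
```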

Logical operations

Comparison and logical operators: > < | &

# For example, find the dates on which data["open"] > 0
data["open"] > 0
2020-07-01    False
2020-07-02     True
2020-07-03    False
2020-07-04    False
2020-07-05     True
2020-07-06     True
2020-07-07    False
2020-07-08     True
2020-07-09    False
2020-07-10     True
2020-07-11     True
2020-07-12     True
2020-07-13    False
2020-07-14    False
2020-07-15     True
2020-07-16    False
2020-07-17    False
2020-07-18    False
2020-07-19     True
2020-07-20     True
Freq: D, Name: open, dtype: bool
# The boolean result can be used as a mask to filter rows
data[data["open"] > 0].head()
                open      high     close       low    volume  price_change  p_change  turnover
2020-07-02  2.089734 -0.983815 -1.046229 -0.036239 -0.256708      0.356689 -0.252771 -0.497880
2020-07-05  0.751076 -1.149339  0.197884 -0.831805 -0.496330     -0.952271 -1.532102  0.425145
2020-07-06  1.756448 -0.564414 -0.750158 -1.222868  0.757163     -0.606473  0.166400 -0.793353
2020-07-08  0.128879 -0.416039 -0.657877 -1.313704  0.945771     -2.131931  0.356988 -2.770012
2020-07-10  0.273836 -0.454474 -0.287727  0.992828 -0.411735     -1.074596 -1.815277 -0.248398
# & has higher precedence than > and <, so each condition must be wrapped in ()
data[(data["high"]>0) & (data["high"]<1)].head()
                open      high     close       low    volume  price_change  p_change  turnover
2020-07-01 -0.322513  0.267763 -1.093117  0.027041  0.509584      0.984490 -0.190861  0.517801
2020-07-03 -2.133904  0.679809  1.807282 -0.566753 -1.837010      0.040434 -1.059297  0.555953
2020-07-04 -0.386285  0.956894  2.315036 -1.346728 -0.896410     -0.009206  0.474799 -0.496008
2020-07-07 -1.470281  0.624951 -1.186256 -0.688941 -0.746321     -1.896055 -1.641859  1.192342
2020-07-09 -1.380173  0.650889  0.033093  1.108484  2.143283     -1.000914  1.117865  0.440897
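
The same pattern extends to `|` (or) and `~` (not). The following is a small sketch on made-up values (the frame `df` is hypothetical) that shows the complementary filters side by side:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"high": [-0.5, 0.3, 0.8, 1.7]})

# Each comparison is parenthesised because & and | bind tighter than > and <
inside = df[(df["high"] > 0) & (df["high"] < 1)]
outside = df[(df["high"] <= 0) | (df["high"] >= 1)]

# ~ negates a boolean mask, giving the same rows as `outside`
also_outside = df[~((df["high"] > 0) & (df["high"] < 1))]

print(inside["high"].tolist())   # [0.3, 0.8]
print(outside["high"].tolist())  # [-0.5, 1.7]
```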

Logical operation functions

  • query(expr)
  • expr: the query string
# query makes the filter above shorter and easier to read
data.query('high>0 & high<1').head()
                open      high     close       low    volume  price_change  p_change  turnover
2020-07-01 -0.322513  0.267763 -1.093117  0.027041  0.509584      0.984490 -0.190861  0.517801
2020-07-03 -2.133904  0.679809  1.807282 -0.566753 -1.837010      0.040434 -1.059297  0.555953
2020-07-04 -0.386285  0.956894  2.315036 -1.346728 -0.896410     -0.009206  0.474799 -0.496008
2020-07-07 -1.470281  0.624951 -1.186256 -0.688941 -0.746321     -1.896055 -1.641859  1.192342
2020-07-09 -1.380173  0.650889  0.033093  1.108484  2.143283     -1.000914  1.117865  0.440897
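
A query string can also refer to Python variables with the `@` prefix and compare columns with each other. The sketch below uses made-up data; the names `df`, `lower`, and `upper` are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"high": [-0.5, 0.3, 0.8, 1.7], "low": [-1.0, 0.1, 0.9, 0.2]})

lower, upper = 0, 1  # hypothetical thresholds

# @name looks up a Python variable from the calling scope
by_query = df.query("high > @lower and high < @upper")
by_mask = df[(df["high"] > lower) & (df["high"] < upper)]
assert by_query.equals(by_mask)

# Columns can be compared with each other inside the expression
print(df.query("low > high").index.tolist())  # [2]
```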
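  • isin(list name or [values])
# Tests whether each value of the selected data (here data['open']) appears in the given list
data['open'] = 1  # overwrite the whole open column with 1 so that the example has matches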
data[data['open'].isin([1,0,5])].head()
            open      high     close       low    volume  price_change  p_change  turnover
2020-07-01     1  0.267763 -1.093117  0.027041  0.509584      0.984490 -0.190861  0.517801
2020-07-02     1 -0.983815 -1.046229 -0.036239 -0.256708      0.356689 -0.252771 -0.497880
2020-07-03     1  0.679809  1.807282 -0.566753 -1.837010      0.040434 -1.059297  0.555953
2020-07-04     1  0.956894  2.315036 -1.346728 -0.896410     -0.009206  0.474799 -0.496008
2020-07-05     1 -1.149339  0.197884 -0.831805 -0.496330     -0.952271 -1.532102  0.425145
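
Because every value in the open column is now 1, the example above keeps all rows. A sketch on made-up data (the column names `code` and `volume` and the list `wanted` are hypothetical) shows isin actually separating rows:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"code": ["A", "B", "C", "A"], "volume": [10, 20, 30, 40]})

wanted = ["A", "C"]  # hypothetical list of values to keep
mask = df["code"].isin(wanted)

print(mask.tolist())                 # [True, False, True, True]
print(df[mask]["volume"].tolist())   # [10, 30, 40]
print(df[~mask]["volume"].tolist())  # [20]
```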

Statistical operations

describe

  • Summary analysis: returns many statistics at once, including count, mean, std, min, and max
data.describe()
       open      high     close       low    volume  price_change  p_change  turnover
count  20.0 20.000000 20.000000 20.000000 20.000000     20.000000 20.000000 20.000000
mean    1.0  0.217317 -0.081459 -0.228949 -0.051401     -0.504964  0.017030 -0.077169
std     0.0  0.888158  1.152481  0.999190  0.990270      0.776446  1.276293  1.151633
min     1.0 -1.149339 -2.302384 -2.491876 -1.837010     -2.131931 -2.276235 -2.770012
25%     1.0 -0.455754 -0.824176 -0.921176 -0.768218     -0.964431 -1.106361 -0.529262
50%     1.0  0.374704 -0.015474 -0.163435 -0.221487     -0.565651  0.261694  0.081669
75%     1.0  0.658119  0.463645  0.555847  0.571479      0.045497  0.934798  0.527339
max     1.0  2.312191  2.315036  1.250718  2.143283      0.984490  1.901829  1.782681
# The 25%, 50%, and 75% rows are the quartiles
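
To make the rows of the summary concrete, the sketch below (using a hypothetical one-column frame) checks that each describe row agrees with the corresponding single statistic:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
summary = df.describe()

# Each row of the summary matches the corresponding single statistic
assert summary.loc["mean", "x"] == df["x"].mean()         # 3.0
assert summary.loc["25%", "x"] == df["x"].quantile(0.25)  # 2.0
assert summary.loc["50%", "x"] == df["x"].median()        # 3.0
assert summary.loc["75%", "x"] == df["x"].quantile(0.75)  # 4.0
```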

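Statistical functions

  • count sum mean median min max mode (most frequent value) prod (product)

  • abs std (standard deviation) var (variance) idxmax (index label of the maximum) idxmin (index label of the minimum)

  • Called on a DataFrame, the aggregating functions work down the rows by default and return one value per column (axis=0, the default).

  • To get one value per row instead, pass axis=1.

  • max() min()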

data.max()
data.min()  # only the last expression is displayed, so the output below is data.min()
open            1.000000
high           -1.149339
close          -2.302384
low            -2.491876
volume         -1.837010
price_change   -2.131931
p_change       -2.276235
turnover       -2.770012
dtype: float64
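
The difference between the two axes is easiest to see on a tiny frame. This is a sketch with made-up values (the frame `df` is hypothetical):

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})

col_max = df.max()        # axis=0 (default): one result per column
row_max = df.max(axis=1)  # axis=1: one result per row

print(col_max.to_dict())  # {'a': 5, 'b': 6}
print(row_max.tolist())   # [4, 5, 6]
```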
  • std var
data.std()
open            0.000000
high            0.888158
close           1.152481
low             0.999190
volume          0.990270
price_change    0.776446
p_change        1.276293
turnover        1.151633
dtype: float64
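
By default pandas computes the sample standard deviation and variance (ddof=1), and the variance is the square of the standard deviation. A short sketch on a hypothetical Series illustrates both points:

```python
import pandas as pd

# Hypothetical toy data
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

sample_var = s.var()            # ddof=1 by default: divides by n - 1
population_var = s.var(ddof=0)  # divides by n

print(sample_var)      # 2.5
print(population_var)  # 2.0

# The variance is the square of the standard deviation (up to rounding)
assert abs(s.std() ** 2 - sample_var) < 1e-12
```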
  • idxmax(), idxmin()
data.idxmax()
open           2020-07-01
high           2020-07-19
close          2020-07-04
low            2020-07-15
volume         2020-07-09
price_change   2020-07-01
p_change       2020-07-15
turnover       2020-07-14
dtype: datetime64[ns]
data.idxmin()
open           2020-07-01
high           2020-07-05
close          2020-07-12
low            2020-07-14
volume         2020-07-03
price_change   2020-07-08
p_change       2020-07-19
turnover       2020-07-08
dtype: datetime64[ns]
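
idxmax and idxmin return index labels rather than values, so the result can be passed straight to loc. A minimal sketch, assuming a small date-indexed frame with made-up values:

```python
import pandas as pd

# Hypothetical toy data with a date index
idx = pd.date_range("2020-07-01", periods=3)
df = pd.DataFrame({"high": [0.2, 1.5, 0.7], "low": [-0.4, -0.1, -2.0]}, index=idx)

peak_day = df["high"].idxmax()   # label of the row holding the maximum
trough_day = df["low"].idxmin()  # label of the row holding the minimum

# The label can be used to look the value up again
assert df.loc[peak_day, "high"] == df["high"].max()
print(peak_day.date(), trough_day.date())  # 2020-07-02 2020-07-03
```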

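Cumulative statistical functions

  • cumsum: running sum of the first 1/2/3/…/n values
  • cummax: running maximum of the first 1/2/3/…/n values
  • cummin: running minimum of the first 1/2/3/…/n values
  • cumprod: running product of the first 1/2/3/…/n values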
data = data.sort_index()
data2 = data['low']
data2.cumsum().plot()
plt.show()

[Figure: line plot of the cumulative sum of the low column over the date index]
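
The four cumulative functions can be compared on a short hypothetical Series whose results are easy to verify by hand:

```python
import pandas as pd

# Hypothetical toy data
s = pd.Series([3, 1, 4, 2])

print(s.cumsum().tolist())   # [3, 4, 8, 10]
print(s.cummax().tolist())   # [3, 3, 4, 4]
print(s.cummin().tolist())   # [3, 1, 1, 1]
print(s.cumprod().tolist())  # [3, 3, 12, 24]
```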

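Custom operations

  • apply(func, axis=0)
  • func: a user-defined function
  • axis=0: the default, func receives each column; with axis=1, func receives each row
# Anonymous function that returns the maximum minus the minimum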
data[['close']].apply(lambda x: x.max() - x.min(), axis=0)
close    4.61742
dtype: float64
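
The same range function can be applied along either axis. This sketch uses a named function instead of a lambda; the helper `value_range` and the frame `df` are hypothetical:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"high": [3.0, 5.0, 4.0], "low": [1.0, 2.0, 3.5]})

def value_range(x):
    """Return the spread (max minus min) of a Series."""
    return x.max() - x.min()

per_column = df.apply(value_range, axis=0)  # func receives each column
per_row = df.apply(value_range, axis=1)     # func receives each row

print(per_column.to_dict())  # {'high': 2.0, 'low': 2.5}
print(per_row.tolist())      # [2.0, 3.0, 0.5]
```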