量化交易系列【1】常用的Pandas数据统计及计算相关函数
1. pd.read_csv读取CSV文件,查看数据
示例中使用的‘000001.XSHE.csv’文件,以上传至csdn资源中,可直接下载
import pandas as pd
df = pd.read_csv('./000001.XSHE.csv')
df.head(5)
| date | open | close | high | low | volume | money |
---|
0 | 2015/1/5 | 9.98 | 10.00 | 10.17 | 9.74 | 458099037 | 4.565388e+09 |
---|
1 | 2015/1/6 | 9.90 | 9.85 | 10.23 | 9.71 | 346952496 | 3.453446e+09 |
---|
2 | 2015/1/7 | 9.72 | 9.67 | 9.88 | 9.55 | 272274401 | 2.634796e+09 |
---|
3 | 2015/1/8 | 9.68 | 9.34 | 9.72 | 9.30 | 225445502 | 2.128003e+09 |
---|
4 | 2015/1/9 | 9.30 | 9.42 | 9.91 | 9.19 | 401736419 | 3.835378e+09 |
---|
df.tail(5)
| date | open | close | high | low | volume | money |
---|
1873 | 2022/9/13 | 12.88 | 12.95 | 13.02 | 12.66 | 172268989 | 2.223707e+09 |
---|
1874 | 2022/9/14 | 12.75 | 12.73 | 12.87 | 12.70 | 85803584 | 1.095055e+09 |
---|
1875 | 2022/9/15 | 12.80 | 13.00 | 13.10 | 12.77 | 184101788 | 2.393239e+09 |
---|
1876 | 2022/9/16 | 12.92 | 12.56 | 12.95 | 12.56 | 135744781 | 1.719253e+09 |
---|
1877 | 2022/9/19 | 12.54 | 12.57 | 12.67 | 12.48 | 63212104 | 7.934913e+08 |
---|
df.shape
(1878, 7)
df.describe()
| open | close | high | low | volume | money |
---|
count | 1878.000000 | 1878.000000 | 1878.000000 | 1878.000000 | 1.878000e+03 | 1.878000e+03 |
---|
mean | 12.466289 | 12.477007 | 12.650703 | 12.291688 | 1.165366e+08 | 1.467976e+09 |
---|
std | 4.054604 | 4.057595 | 4.141896 | 3.962608 | 8.200638e+07 | 1.067033e+09 |
---|
min | 7.260000 | 7.150000 | 7.380000 | 7.030000 | 2.259368e+07 | 1.755915e+08 |
---|
25% | 8.792500 | 8.800000 | 8.930000 | 8.700000 | 6.478725e+07 | 6.991674e+08 |
---|
50% | 11.875000 | 11.865000 | 12.050000 | 11.665000 | 9.268907e+07 | 1.240380e+09 |
---|
75% | 14.717500 | 14.737500 | 14.917500 | 14.540000 | 1.390155e+08 | 1.890645e+09 |
---|
max | 24.470000 | 24.570000 | 24.720000 | 24.090000 | 6.728781e+08 | 8.596942e+09 |
---|
2. 统计函数:mean,max,min等
df['close'].mean()
12.477007454739086
df['close'].max()
24.57
df['close'].min()
7.15
df['close'].std()
4.057594705064531
df['close'].median()
11.865
df['close'].quantile(0.25)
8.8
3. shift函数:行移动指定位数
Pandas的shift函数可以将DataFrame移动指定的位数
df1 =df.head(5)
df1
| date | open | close | high | low | volume | money |
---|
0 | 2015/1/5 | 9.98 | 10.00 | 10.17 | 9.74 | 458099037 | 4.565388e+09 |
---|
1 | 2015/1/6 | 9.90 | 9.85 | 10.23 | 9.71 | 346952496 | 3.453446e+09 |
---|
2 | 2015/1/7 | 9.72 | 9.67 | 9.88 | 9.55 | 272274401 | 2.634796e+09 |
---|
3 | 2015/1/8 | 9.68 | 9.34 | 9.72 | 9.30 | 225445502 | 2.128003e+09 |
---|
4 | 2015/1/9 | 9.30 | 9.42 | 9.91 | 9.19 | 401736419 | 3.835378e+09 |
---|
df1['昨天的收盘价'] = df1['close'].shift(1)
df1[['date','close','昨天的收盘价']]
| date | close | 昨天的收盘价 |
---|
0 | 2015/1/5 | 10.00 | NaN |
---|
1 | 2015/1/6 | 9.85 | 10.00 |
---|
2 | 2015/1/7 | 9.67 | 9.85 |
---|
3 | 2015/1/8 | 9.34 | 9.67 |
---|
4 | 2015/1/9 | 9.42 | 9.34 |
---|
4. diff函数:求两行差
df1['涨跌值'] = df1['close'].diff(1)
df1[['date','close','涨跌值']]
| date | close | 涨跌值 |
---|
0 | 2015/1/5 | 10.00 | NaN |
---|
1 | 2015/1/6 | 9.85 | -0.15 |
---|
2 | 2015/1/7 | 9.67 | -0.18 |
---|
3 | 2015/1/8 | 9.34 | -0.33 |
---|
4 | 2015/1/9 | 9.42 | 0.08 |
---|
5. pct_change函数:求变化比例
求两个数差值的比例, 与diff类似。但是diff求的是差值,pct_change求的是变化比例
df1['涨跌幅_百分比'] = df1['close'].pct_change()
df1[['date','close','涨跌幅_百分比']]
| date | close | 涨跌幅_百分比 |
---|
0 | 2015/1/5 | 10.00 | NaN |
---|
1 | 2015/1/6 | 9.85 | -0.015000 |
---|
2 | 2015/1/7 | 9.67 | -0.018274 |
---|
3 | 2015/1/8 | 9.34 | -0.034126 |
---|
4 | 2015/1/9 | 9.42 | 0.008565 |
---|
6. cum函数:累加
df1['成交量_cum'] = df1['volume'].cumsum()
df1[['date','volume','成交量_cum']]
| date | volume | 成交量_cum |
---|
0 | 2015/1/5 | 458099037 | 458099037 |
---|
1 | 2015/1/6 | 346952496 | 805051533 |
---|
2 | 2015/1/7 | 272274401 | 1077325934 |
---|
3 | 2015/1/8 | 225445502 | 1302771436 |
---|
4 | 2015/1/9 | 401736419 | 1704507855 |
---|
7. cumprod函数:累乘
df1['涨跌幅变化曲线'] =(df1['涨跌幅_百分比'] + 1).cumprod()
df1[['date','close','涨跌幅_百分比','涨跌幅变化曲线']]
| date | close | 涨跌幅_百分比 | 涨跌幅变化曲线 |
---|
0 | 2015/1/5 | 10.00 | NaN | NaN |
---|
1 | 2015/1/6 | 9.85 | -0.015000 | 0.985 |
---|
2 | 2015/1/7 | 9.67 | -0.018274 | 0.967 |
---|
3 | 2015/1/8 | 9.34 | -0.034126 | 0.934 |
---|
4 | 2015/1/9 | 9.42 | 0.008565 | 0.942 |
---|
8. rank函数:用于计算排名
df1['收盘价_排名'] = df1['close'].rank(ascending=True,pct=False)
df1[['date','close','收盘价_排名']]
| date | close | 收盘价_排名 |
---|
0 | 2015/1/5 | 10.00 | 5.0 |
---|
1 | 2015/1/6 | 9.85 | 4.0 |
---|
2 | 2015/1/7 | 9.67 | 3.0 |
---|
3 | 2015/1/8 | 9.34 | 1.0 |
---|
4 | 2015/1/9 | 9.42 | 2.0 |
---|
9. value_counts函数:统计出现次数
统计该列中每个元素出现的次数,返回Series
df['股票代码'].value_counts()