Pandas Statistical Functions

import numpy as np
import pandas as pd

n_rows = 5
n_cols = 2
# Standard-normal random values, so the numbers change on every run
df = pd.DataFrame(np.random.randn(n_rows, n_cols),
                  index=pd.date_range('1/1/2000', periods=n_rows),
                  columns=['A', 'B'])
# Scale by 10 and truncate toward zero to get small integers
df = (df * 10).astype(int)
df
             A  B
2000-01-01 -18  3
2000-01-02   5 -4
2000-01-03  -2  8
2000-01-04   0  1
2000-01-05 -18  3

pct_change

## pct_change() computes the fractional change relative to the value a given number of periods earlier (the result is not multiplied by 100)
df.pct_change(periods=1)  # b{t}=(a{t}-a{t-1})/a{t-1}
                   A         B
2000-01-01       NaN       NaN
2000-01-02 -1.277778 -2.333333
2000-01-03 -1.400000 -3.000000
2000-01-04 -1.000000 -0.875000
2000-01-05      -inf  2.000000
df.pct_change(periods=2)  # b{t}=(a{t}-a{t-2})/a{t-2}
                   A         B
2000-01-01       NaN       NaN
2000-01-02       NaN       NaN
2000-01-03 -0.888889  1.666667
2000-01-04 -1.000000 -1.250000
2000-01-05  8.000000 -0.625000
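
The formula in the comments can be checked by hand with shift(). The sketch below assumes the example frame is rebuilt from the integer values shown above (the original was random), and it also shows where the -inf in column A comes from.

```python
import numpy as np
import pandas as pd

# The example frame from above, typed in by hand so the numbers are reproducible
df = pd.DataFrame(
    {"A": [-18, 5, -2, 0, -18], "B": [3, -4, 8, 1, 3]},
    index=pd.date_range("2000-01-01", periods=5),
)

periods = 1
previous = df.shift(periods)             # value `periods` rows earlier
manual = (df - previous) / previous      # the formula from the comments
builtin = df.pct_change(periods=periods)

# Same numbers up to floating-point rounding, NaN and inf positions included
print(np.allclose(manual.to_numpy(), builtin.to_numpy(), equal_nan=True))

# A is 0 on 2000-01-04, so the change on 2000-01-05 divides by zero
print(builtin["A"].iloc[-1])
```

The first `periods` rows are NaN because there is no earlier value to compare against.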

Covariance

df.cov()
        A      B
A  114.80 -17.85
B  -17.85  18.70
df.A.cov(df.B)
-17.849999999999998
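
As a rough check of what cov() returns, the sample covariance (divisor n - 1) can be computed directly. This is a sketch on the example values typed in by hand, not the way pandas implements it internally.

```python
import pandas as pd

# The example frame from above, typed in by hand
df = pd.DataFrame(
    {"A": [-18, 5, -2, 0, -18], "B": [3, -4, 8, 1, 3]},
    index=pd.date_range("2000-01-01", periods=5),
)

a, b = df["A"], df["B"]
n = len(df)

# Sample covariance: sum of products of deviations, divided by n - 1
manual_cov = ((a - a.mean()) * (b - b.mean())).sum() / (n - 1)

print(manual_cov)   # off-diagonal entry of df.cov()
print(a.cov(b))     # same value from pandas
print(a.var())      # diagonal entry for A: covariance of A with itself
```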

Correlation

df.corr()
          A         B
A  1.000000 -0.385253
B -0.385253  1.000000
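
corr() uses the Pearson coefficient by default, which is the covariance divided by the product of the two sample standard deviations. A minimal sketch on the example values (typed in by hand) that reproduces the off-diagonal entry:

```python
import pandas as pd

# The example frame from above, typed in by hand
df = pd.DataFrame(
    {"A": [-18, 5, -2, 0, -18], "B": [3, -4, 8, 1, 3]},
    index=pd.date_range("2000-01-01", periods=5),
)

a, b = df["A"], df["B"]

# Pearson correlation = cov(A, B) / (std(A) * std(B)), all with divisor n - 1
pearson = a.cov(b) / (a.std() * b.std())

print(pearson)
print(a.corr(b))   # same value from pandas
```

The method argument of corr() also accepts 'kendall' and 'spearman' for rank-based coefficients.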

Data ranking

df.rank()
              A    B
2000-01-01  1.5  3.5
2000-01-02  5.0  1.0
2000-01-03  3.0  5.0
2000-01-04  4.0  2.0
2000-01-05  1.5  3.5
df.rank(axis=1)
              A    B
2000-01-01  1.0  2.0
2000-01-02  2.0  1.0
2000-01-03  1.0  2.0
2000-01-04  1.0  2.0
2000-01-05  1.0  2.0
method parameter (how tied values are ranked):
average : average rank of the tied group (the default)
min : lowest rank in the group
max : highest rank in the group
first : ranks assigned in the order the values appear in the array
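
To see how the tie-breaking methods differ, the sketch below ranks column A (which contains -18 twice) once per method. It also includes 'dense', an additional option pandas supports that is not in the list above.

```python
import pandas as pd

# Column A of the example frame; -18 appears twice, so there is one tie
s = pd.Series([-18, 5, -2, 0, -18])

methods = ["average", "min", "max", "first", "dense"]
# One column of ranks per tie-breaking method
ranks = pd.DataFrame({m: s.rank(method=m) for m in methods})
print(ranks)
```

Only the two tied rows change between average, min, max and first; with dense, ranks increase by exactly 1 between distinct values, so the largest rank is 4 rather than 5.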

Window Functions

cumsum
df
             A  B
2000-01-01 -18  3
2000-01-02   5 -4
2000-01-03  -2  8
2000-01-04   0  1
2000-01-05 -18  3
df.cumsum()
             A   B
2000-01-01 -18   3
2000-01-02 -13  -1
2000-01-03 -15   7
2000-01-04 -15   8
2000-01-05 -33  11
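
cumsum() is a running total down each column. As a small sketch on the example values (typed in by hand), it can be compared with itertools.accumulate, and axis=1 accumulates across columns instead.

```python
from itertools import accumulate

import pandas as pd

# The example frame from above, typed in by hand
df = pd.DataFrame(
    {"A": [-18, 5, -2, 0, -18], "B": [3, -4, 8, 1, 3]},
    index=pd.date_range("2000-01-01", periods=5),
)

# Running total of column A in plain Python
running_a = list(accumulate(df["A"].tolist()))
print(running_a)
print(df["A"].cumsum().tolist())

# axis=1 accumulates left to right within each row: B becomes A + B
across = df.cumsum(axis=1)
print(across)
```
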
rolling
df
             A  B
2000-01-01 -18  3
2000-01-02   5 -4
2000-01-03  -2  8
2000-01-04   0  1
2000-01-05 -18  3
r = df.rolling(window=2)
r.mean()
              A    B
2000-01-01  NaN  NaN
2000-01-02 -6.5 -0.5
2000-01-03  1.5  2.0
2000-01-04 -1.0  4.5
2000-01-05 -9.0  2.0
r.count()
              A    B
2000-01-01  1.0  1.0
2000-01-02  2.0  2.0
2000-01-03  2.0  2.0
2000-01-04  2.0  2.0
2000-01-05  2.0  2.0
r.max()
              A    B
2000-01-01  NaN  NaN
2000-01-02  5.0  3.0
2000-01-03  5.0  8.0
2000-01-04  0.0  8.0
2000-01-05  0.0  3.0
Method      Description
count()     Number of non-null observations
sum()       Sum of values
mean()      Mean of values
median()    Arithmetic median of values
min()       Minimum
max()       Maximum
std()       Bessel-corrected sample standard deviation
var()       Unbiased variance
skew()      Sample skewness (3rd moment)
kurt()      Sample kurtosis (4th moment)
quantile()  Sample quantile (value at %)
apply()     Generic apply
cov()       Unbiased covariance (binary)
corr()      Correlation (binary)
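
apply() covers statistics that have no dedicated method. As an illustration (the "spread" statistic here is only an example, not something from the table), the sketch below computes max minus min per window and compares it with the built-in methods.

```python
import numpy as np
import pandas as pd

# The example frame from above, typed in by hand
df = pd.DataFrame(
    {"A": [-18, 5, -2, 0, -18], "B": [3, -4, 8, 1, 3]},
    index=pd.date_range("2000-01-01", periods=5),
)

r = df.rolling(window=2)

# raw=True passes each window to the function as a NumPy array
spread = r.apply(lambda w: w.max() - w.min(), raw=True)

# The same statistic built from the dedicated methods
same = r.max() - r.min()

print(spread)
print(np.allclose(spread.to_numpy(), same.to_numpy(), equal_nan=True))
```

The dedicated methods are usually much faster than apply(), so apply() is best kept for cases they do not cover.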

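The win_type parameter selects a weighting window (for example 'triang' or 'gaussian', which require SciPy) instead of weighting all points in the window equally.
For a DataFrame, the 'on' parameter names a datetime-like column to roll over, rather than the default of the index.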
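
A minimal sketch of the 'on' parameter, assuming the example frame's dates are moved out of the index into a column (the column name "date" is chosen here for illustration):

```python
import pandas as pd

# The example frame from above, typed in by hand
df = pd.DataFrame(
    {"A": [-18, 5, -2, 0, -18], "B": [3, -4, 8, 1, 3]},
    index=pd.date_range("2000-01-01", periods=5),
)

# Move the dates into an ordinary column; the index becomes 0..4
flat = df.rename_axis("date").reset_index()

# Roll over a 3-day window measured on the "date" column, not the index
out = flat.rolling("3D", on="date", min_periods=3).sum()
print(out)
```

The "date" column is carried through to the result, and only A and B are summed.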

df
             A  B
2000-01-01 -18  3
2000-01-02   5 -4
2000-01-03  -2  8
2000-01-04   0  1
2000-01-05 -18  3
df.rolling(window='3D', min_periods=3).sum()   ## sum over the most recent three days
               A     B
2000-01-01   NaN   NaN
2000-01-02   NaN   NaN
2000-01-03 -15.0   7.0
2000-01-04   3.0   5.0
2000-01-05 -20.0  12.0
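
With daily data, a 3-day window and a 3-row window cover the same rows, so the difference only shows up when dates are missing. The sketch below uses a made-up series with a gap (values and dates are hypothetical) to contrast a fixed-count window with a time-based one.

```python
import pandas as pd

# Hypothetical irregular series: nothing recorded on Jan 3 and Jan 4
idx = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-05", "2000-01-06"])
s = pd.Series([1.0, 2.0, 4.0, 8.0], index=idx)

# Fixed-count window: always the last 2 rows, however far apart they are
by_count = s.rolling(window=2).sum()

# Time-based window: only rows within the last 2 days of each timestamp
by_time = s.rolling(window="2D").sum()

print(pd.DataFrame({"by_count": by_count, "by_time": by_time}))
```

On 2000-01-05 the count-based sum still includes the value from 2000-01-02, while the time-based sum does not. Time-based windows default to min_periods=1, which is why the example above sets min_periods=3 explicitly to get NaN for the first two rows.
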
expanding
df
             A  B
2000-01-01 -18  3
2000-01-02   5 -4
2000-01-03  -2  8
2000-01-04   0  1
2000-01-05 -18  3
df.expanding().mean()  ## statistic with all data up to a point in time
                A         B
2000-01-01 -18.00  3.000000
2000-01-02  -6.50 -0.500000
2000-01-03  -5.00  2.333333
2000-01-04  -3.75  2.000000
2000-01-05  -6.60  2.200000
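
One way to read the expanding mean is as the running total divided by the running count. The sketch below, on the example values typed in by hand, checks that reading and also shows the equivalent rolling call.

```python
import numpy as np
import pandas as pd

# The example frame from above, typed in by hand
df = pd.DataFrame(
    {"A": [-18, 5, -2, 0, -18], "B": [3, -4, 8, 1, 3]},
    index=pd.date_range("2000-01-01", periods=5),
)

builtin = df.expanding().mean()

# Running total divided by the number of observations seen so far
manual = df.cumsum() / df.expanding().count()

# A rolling window as long as the data, allowed to start with a single row
as_rolling = df.rolling(window=len(df), min_periods=1).mean()

print(np.allclose(manual.to_numpy(), builtin.to_numpy()))
print(np.allclose(as_rolling.to_numpy(), builtin.to_numpy()))
```
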
Exponentially Weighted Windows (ewm)
df
             A  B
2000-01-01 -18  3
2000-01-02   5 -4
2000-01-03  -2  8
2000-01-04   0  1
2000-01-05 -18  3
df.ewm(alpha=0.9).mean()
                    A         B
2000-01-01 -18.000000  3.000000
2000-01-02   2.909091 -3.363636
2000-01-03  -1.513514  6.873874
2000-01-04  -0.151215  1.586859
2000-01-05 -16.215282  2.858699
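
With the default adjust=True, each output is a weighted average in which the value i steps back gets weight (1 - alpha)**i. The helper functions below are illustrative names, not pandas API; they are a sketch that reproduces column A by hand and also show the recursive form used when adjust=False.

```python
import numpy as np
import pandas as pd

def ewm_mean_adjusted(values, alpha):
    """Weighted average with weights (1 - alpha)**i, newest value at i = 0."""
    out = []
    for t in range(len(values)):
        # Exponents t, t-1, ..., 0 so the oldest value gets the smallest weight
        weights = (1 - alpha) ** np.arange(t, -1, -1)
        out.append(np.dot(weights, values[: t + 1]) / weights.sum())
    return np.array(out)

def ewm_mean_recursive(values, alpha):
    """adjust=False form: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t]."""
    out = [values[0]]
    for x in values[1:]:
        out.append((1 - alpha) * out[-1] + alpha * x)
    return np.array(out)

a = pd.Series([-18, 5, -2, 0, -18], dtype=float)   # column A of the example
alpha = 0.9

adjusted = ewm_mean_adjusted(a.to_numpy(), alpha)
recursive = ewm_mean_recursive(a.to_numpy(), alpha)

print(np.allclose(adjusted, a.ewm(alpha=alpha).mean().to_numpy()))
print(np.allclose(recursive, a.ewm(alpha=alpha, adjust=False).mean().to_numpy()))
```

For example, the second value of A is (5 + 0.1 * (-18)) / (1 + 0.1) = 2.909091, matching the table above.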

Reposted from: https://www.cnblogs.com/sandy-t/p/10511712.html
