Time Series Features


Feature metrics

Mostly taken from tsfresh. In the code, data is a 1-D numpy array.

abs_energy

Absolute energy of the time series (sum of squares)

Formula

E = \sum_{i=1,\ldots,n} x_i^2

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

import tsfresh.feature_extraction.feature_calculators as fca
fca.abs_energy(data)
144.29829740211147

numpy

import numpy as np
np.sum(data**2)
144.2982974021115

absolute_sum_of_changes

Sum of absolute first differences

Formula

\sum_{i=1,\ldots,n-1} \mid x_{i+1} - x_i \mid

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

fca.absolute_sum_of_changes(data)
118.54546095276635

numpy

np.sum(np.abs(np.diff(data)))
118.54546095276635

agg_autocorrelation

Aggregate statistic over the autocorrelation coefficients at different lags

Formula

R(l) = \frac{1}{(n-l)\sigma^2} \sum_{t=1}^{n-l} (X_t - \mu)(X_{t+l} - \mu)

f_{agg}\left(R(1), \ldots, R(m)\right) \quad \text{for} \quad m = \max(n, maxlag).

Judging from the code, it actually looks like this, where maxlag_1, ..., maxlag_n are the maxlag values of the n parameter dicts:
f_{agg}\left(R(1), \ldots, R(m)\right) \quad \text{for} \quad m = \max(maxlag_1, maxlag_2, \ldots, maxlag_n).
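My reading of tsfresh's source is that R(l) is computed once up to m = max(maxlag_i), after which each parameter dict aggregates only its own prefix R(1), ..., R(maxlag_i). The plain-numpy sketch below illustrates that reading; agg_autocorrelation_sketch is a hypothetical name, not a tsfresh function, and the series is assumed to be non-constant so the variance is non-zero.

```python
import numpy as np


def agg_autocorrelation_sketch(x, param):
    """Hypothetical helper: aggregate R(1..maxlag) for each parameter dict.

    Assumes x is a non-constant 1-D series (non-zero variance).
    """
    x = np.asarray(x, dtype=float)
    n, mu, var = len(x), x.mean(), x.var()
    # m = max(maxlag_1, ..., maxlag_n): R(l) is computed once up to m
    m = max(cfg["maxlag"] for cfg in param)
    r = np.array([np.sum((x[:n - l] - mu) * (x[l:] - mu)) / ((n - l) * var)
                  for l in range(1, m + 1)])
    # each dict aggregates only its own prefix R(1), ..., R(maxlag_i)
    return [(f'f_agg_"{cfg["f_agg"]}"__maxlag_{cfg["maxlag"]}',
             getattr(np, cfg["f_agg"])(r[:cfg["maxlag"]]))
            for cfg in param]


# usage on a toy series (not the article's data)
x = np.sin(0.3 * np.arange(200))
print(agg_autocorrelation_sketch(x, [{"f_agg": "mean", "maxlag": 2},
                                     {"f_agg": "max", "maxlag": 5}]))
```

Computing R(l) once for the largest lag avoids recomputing the shared prefix for every parameter dict.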

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

?fca.agg_autocorrelation
param = [{'f_agg': 'mean', 'maxlag': 2}]
fca.agg_autocorrelation(data, param)
[('f_agg_"mean"__maxlag_2', -0.00025668394852565446)]

numpy

n = len(data)
var = np.var(data)
u = np.mean(data)
maxlag = 2
rs = []
for l in range(1, maxlag+1):
    r = np.sum(np.multiply((data[:(n-l)]-u), (data[l:]-u)))/((n-l)*(var))
    print(r)
    rs.append(r)
np.array(rs).mean()
0.1755044291314198
-0.1760177970284712
-0.0002566839485256961

statsmodels

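from statsmodels.tsa.stattools import acf
# adjusted=True uses the 1/(n-l) normalization; older statsmodels versions called this argument unbiased
a = acf(data, adjusted=True, fft=n > 1250, nlags=maxlag)[1:]  # drop R(0) = 1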
a
array([ 0.17550443, -0.1760178 ])
np.mean(a)
-0.00025668394852565446

agg_linear_trend

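Linear regression (OLS) on the aggregated values of consecutive chunks of the series

Formula

1. Split the series, in order, into chunks and compute an aggregate (e.g. min, mean) for each chunk.
2. Fit a linear regression to the resulting sequence of aggregates.
3. Return one of the regression attributes: "pvalue", "rvalue", "intercept", "slope", "stderr".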
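The three steps above can be sketched in plain numpy. This is an illustration under assumptions, not tsfresh's code: agg_linear_trend_sketch is a hypothetical name, the aggregates are regressed against the chunk index 0, 1, 2, ..., np.polyfit stands in for the OLS fit, and only "slope" and "intercept" are returned (tsfresh also offers "pvalue", "rvalue" and "stderr").

```python
import numpy as np


def agg_linear_trend_sketch(x, chunk_len, f_agg="mean"):
    """Hypothetical helper: (slope, intercept) of an OLS line through chunk aggregates."""
    x = np.asarray(x, dtype=float)
    n_chunks = int(np.ceil(len(x) / chunk_len))
    # step 1: aggregate consecutive chunks in order; the last chunk may be shorter
    agg = np.array([getattr(np, f_agg)(x[i * chunk_len:(i + 1) * chunk_len])
                    for i in range(n_chunks)])
    # step 2: OLS fit of the aggregates against the chunk index 0, 1, 2, ...
    slope, intercept = np.polyfit(np.arange(n_chunks), agg, 1)
    # step 3: return two of the regression attributes
    return slope, intercept


# usage: the line 1, 3, 5, ..., 19 averaged over chunks of 2 gives 2, 6, 10, 14, 18
x = 2 * np.arange(10) + 1
print(agg_linear_trend_sketch(x, chunk_len=2))  # slope 4, intercept 2 (up to rounding)
```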

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

param = [{'f_agg': 'mean', 'attr': 'slope', 'chunk_len': 2}]
list(fca.agg_linear_trend(data, param))
[('attr_"slope"__chunk_len_2__f_agg_"mean"', -0.0032399538649093094)]

The slope is essentially 0, consistent with a sine wave, which has no trend.

approximate_entropy

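Approximate entropy

Measures the periodicity, unpredictability, and volatility of a time series.

Formula

TODO; the formula is given in the reference links.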
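For reference, the standard definition: for a template length k, let C_i^k(r) be the fraction of length-k windows whose maximum absolute difference from window i is at most r (window i itself included), and let \Phi_k = mean_i \ln C_i^k(r). Then ApEn(m, r) = \Phi_m - \Phi_{m+1}. The numpy sketch below follows that definition with r as an absolute tolerance; approximate_entropy_sketch is a hypothetical name, and tsfresh's approximate_entropy(x, m, r) may rescale r internally (for example by the standard deviation), so check its source before expecting identical numbers. The pairwise distance matrix costs O(N^2) memory, so this is only for short series.

```python
import numpy as np


def approximate_entropy_sketch(x, m, r):
    """Hypothetical helper: ApEn(m, r) = Phi_m - Phi_{m+1}, r an absolute tolerance."""
    x = np.asarray(x, dtype=float)
    N = len(x)

    def phi(k):
        # all overlapping templates of length k, shape (N - k + 1, k)
        templates = np.array([x[i:i + k] for i in range(N - k + 1)])
        # Chebyshev (max-abs) distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # C_i^k(r): fraction of templates within r of template i (self-match included)
        C = np.mean(dist <= r, axis=1)
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)


# usage: a perfectly regular series scores near 0, white noise scores much higher
rng = np.random.default_rng(0)
regular = np.tile([0.0, 1.0], 50)
noise = rng.normal(size=100)
print(approximate_entropy_sketch(regular, 2, 0.2), approximate_entropy_sketch(noise, 2, 0.2))
```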

References

https://zhuanlan.zhihu.com/p/39105270
https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

Yentes et al. (2012), The Appropriate Use of Approximate Entropy and Sample Entropy with Short Data Sets
Richman & Moorman (2000), Physiological time-series analysis using approximate entropy and sample entropy

code

tsfresh

?fca.approximate_entropy
fca.approximate_entropy(data, 10, 0.1)
0.011049836186585615

ar_coefficient

Autoregression coefficients

Formula

X_t = \varphi_0 + \sum_{i=1}^{k} \varphi_i X_{t-i} + \varepsilon_t
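A minimal sketch of estimating the coefficients in the equation above, assuming an ordinary least-squares fit conditional on the first k values. tsfresh delegates this to a statsmodels autoregression model, so its numbers may differ slightly from this sketch. ar_coefficients_sketch is a hypothetical name; coef[i] corresponds to "coeff": i and k to the "k" parameter.

```python
import numpy as np


def ar_coefficients_sketch(x, k):
    """Hypothetical helper: [phi_0, phi_1, ..., phi_k] from an OLS fit of
    X_t = phi_0 + sum_{i=1}^{k} phi_i X_{t-i} + eps_t."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # column i-1 holds X_{t-i} for t = k, ..., n-1
    lagged = np.column_stack([x[k - i:n - i] for i in range(1, k + 1)])
    # prepend a column of ones for the intercept phi_0
    design = np.column_stack([np.ones(n - k), lagged])
    coef, *_ = np.linalg.lstsq(design, x[k:], rcond=None)
    return coef


# usage: simulate X_t = 1 + 0.6 X_{t-1} + eps_t and recover the coefficients
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 1 + 0.6 * x[t - 1] + rng.normal()
print(ar_coefficients_sketch(x, 1))  # roughly [1, 0.6]
```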

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

"coeff" is the index i of the coefficient \varphi_i to return, and "k" is the order of the autoregression.

param = [{'coeff': 1, 'k': 10}]
fca.ar_coefficient(data, param)
[('coeff_1__k_10', 0.05455773898727232)]

augmented_dickey_fuller

Augmented Dickey-Fuller test (ADF test)
Returns a statistic of the ADF test (float)

Formula
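The original leaves the formula blank. For reference, this is the standard ADF regression in its constant-only form, which I believe is the default (regression="c") of statsmodels' adfuller that tsfresh calls; treat that as an assumption. p is the number of lagged differences (reported as "usedlag", chosen by AIC when autolag is "AIC"), the test statistic is the t-ratio of \gamma, and the null hypothesis is a unit root. A strongly negative teststat, such as the -9.35 in the example below, is evidence against a unit root, i.e. for stationarity.

```latex
\Delta X_t = \alpha + \gamma X_{t-1} + \sum_{i=1}^{p} \delta_i \Delta X_{t-i} + \varepsilon_t,
\qquad
\text{teststat} = \frac{\hat{\gamma}}{\operatorname{SE}(\hat{\gamma})},
\qquad
H_0:\ \gamma = 0
```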

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

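Tests whether an autoregressive model has a unit root, which measures the stationarity of the time series.

Parameters:
param (list) of dicts {"attr": x}, where x is one of the strings "teststat", "pvalue", or "usedlag"
Return value: the requested ADF result (float); for "teststat" this is the ADF test statistic

param = [{'attr': 'teststat'}]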
fca.augmented_dickey_fuller(data, param)
[('attr_"teststat"__autolag_"AIC"', -9.347028928948198)]

autocorrelation

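Autocorrelation at lag l

Correlation describes how closely the trends of two series move together; autocorrelation applies this to a series and a copy of itself shifted by l steps.

Formula

R(l) = \frac{1}{(n-l)\sigma^2} \sum_{t=1}^{n-l} (X_t - \mu)(X_{t+l} - \mu)

The autocorrelation coefficient is sensitive to extreme values (outliers).

References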

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

fca.autocorrelation(data, 2)
-0.1760177970284712

numpy

l=2
n=len(data)
u = data.mean()
var = data.var()
np.sum(np.multiply((data[:n-l]-u), (data[l:]-u)))/((n-l)*var)
-0.1760177970284712

benford_correlation

Correlation coefficient with the Benford distribution

According to the tsfresh documentation, this feature is useful for anomaly detection.

Formula

benford_distribution

P(d) = \log_{10}\left(1 + \frac{1}{d}\right)

d \in \{1, 2, 3, 4, 5, 6, 7, 8, 9\}

Procedure
1. Take the leading digit of each value: x = np.array([int(str(np.format_float_scientific(i))[:1]) for i in np.abs(np.nan_to_num(x))])
2. Compute the frequency of each leading digit: data_distribution = np.array([(x == n).mean() for n in range(1, 10)])
3. Compute the Pearson correlation coefficient between data_distribution and benford_distribution.
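The three steps can be combined into a single function. This numpy sketch follows the steps as written; benford_correlation_sketch is a hypothetical name, and zeros (whose leading digit comes out as 0) fall into none of the nine bins.

```python
import numpy as np


def benford_correlation_sketch(x):
    """Hypothetical helper: Pearson correlation between the leading-digit
    frequencies of x and Benford's law."""
    x = np.abs(np.nan_to_num(np.asarray(x, dtype=float)))
    # step 1: leading digit from scientific notation, e.g. 0.0345 -> '3.45e-02' -> 3
    digits = np.array([int(np.format_float_scientific(v)[0]) for v in x])
    # step 2: empirical frequency of each leading digit 1..9
    data_distribution = np.array([(digits == d).mean() for d in range(1, 10)])
    # Benford's law: P(d) = log10(1 + 1/d)
    benford_distribution = np.log10(1 + 1 / np.arange(1, 10))
    # step 3: Pearson correlation between the two 9-element vectors
    return np.corrcoef(benford_distribution, data_distribution)[0, 1]


# usage: powers of 2 are a classic Benford-like sequence
print(benford_correlation_sketch(2.0 ** np.arange(1, 1001)))  # close to 1
```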

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

[1] A Statistical Derivation of the Significant-Digit Law, Theodore P. Hill, Statistical Science, 1995
[2] The significant-digit phenomenon, Theodore P. Hill, The American Mathematical Monthly, 1995
[3] The law of anomalous numbers, Frank Benford, Proceedings of the American philosophical society, 1938
[4] Note on the frequency of use of the different digits in natural numbers, Simon Newcomb, American Journal of mathematics, 1881

code

tsfresh

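?fca.benford_correlation
Object `fca.benford_correlation` not found.

benford_correlation is not available in the tsfresh version used here, so it is computed with numpy instead.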

numpy

x = np.array([int(str(np.format_float_scientific(i))[:1]) for i in np.abs(np.nan_to_num(data))])
x
array([6, 9, 4, 1, 2, 7, 7, 1, 4, 6, 9, 1, 2, 5, 1, 8, 1, 1, 1, 2, 6, 5,
       3, 1, 3, 2, 1, 8, 1, 3, 1, 3, 2, 2, 6, 8, 7, 1, 1, 1, 1, 5, 1, 9,
       7, 9, 1, 6, 1, 1, 2, 4, 1, 1, 1, 3, 6, 1, 1, 1, 8, 1, 7, 4, 1, 1,
       6, 6, 6, 4, 5, 9, 4, 1, 1, 1, 2, 6, 6, 1, 9, 7, 7, 