Time Series Features

双层

Tags: machine learning, algorithms, data mining

Article link: https://blog.csdn.net/y1163307648/article/details/109338126

Feature metrics (each entry has the subsections Formula, References and code)
- abs_energy
- absolute_sum_of_changes
- agg_autocorrelation
- agg_linear_trend
- approximate_entropy
- ar_coefficient
- augmented_dickey_fuller
- autocorrelation
- benford_correlation
- binned_entropy
- c3
- change_quantiles
- cid_ce
- count_above
- count_above_mean
- count_below
- count_below_mean
- cwt_coefficients
- energy_ratio_by_chunks
- fft_aggregated
- fft_coefficient
- first_location_of_maximum
- first_location_of_minimum
- fourier_entropy
- friedrich_coefficients
- has_duplicate
- has_duplicate_max
- has_duplicate_min
- index_mass_quantile
- kurtosis
- large_standard_deviation
- last_location_of_maximum
- last_location_of_minimum
- lempel_ziv_complexity
- length
- linear_trend
- linear_trend_timewise
- longest_strike_above_mean
- longest_strike_below_mean
- max_langevin_fixed_point
- maximum
- mean
- mean_abs_change
- mean_change
- mean_second_derivative_central
- median
- minimum
- number_crossing_m
- number_cwt_peaks
- number_peaks
- partial_autocorrelation
- percentage_of_reoccurring_datapoints_to_all_datapoints
- percentage_of_reoccurring_values_to_all_values
- permutation_entropy
- quantile
- range_count
- ratio_beyond_r_sigma
- ratio_value_number_to_time_series_length
- sample_entropy
- skewness
- spkt_welch_density
- standard_deviation
- sum_of_reoccurring_data_points
- sum_of_reoccurring_values
- sum_values
- symmetry_looking
- time_reversal_asymmetry_statistic
- value_count
- variance
- variance_larger_than_standard_deviation
- variation_coefficient
Time series processing methods

Feature metrics

Most of these features come from tsfresh.
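The snippets in this article use `fca` and `data` without showing how they are defined. A minimal setup sketch, assuming that `fca` is tsfresh's `feature_calculators` module and that `data` is a 1-D NumPy array resembling a noisy sine wave. The sample series below is hypothetical, so its feature values will not match the outputs quoted in this article.

```python
import numpy as np

# Hypothetical sample series: the article does not show how `data` was built.
rng = np.random.default_rng(0)
t = np.linspace(0, 20 * np.pi, 200)
data = np.sin(t) + 0.5 * rng.standard_normal(t.size)

# Assumed meaning of `fca`; tsfresh is optional for the NumPy-only snippets.
try:
    from tsfresh.feature_extraction import feature_calculators as fca
except ImportError:
    fca = None
```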

abs_energy

The absolute energy of the time series, i.e. the sum of its squared values.

Formula

$E = \sum_{i=1,\ldots, n} x_i^2$

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

fca.abs_energy(data)

144.29829740211147

numpy

np.sum(data**2)

144.2982974021115

absolute_sum_of_changes

Sum of the absolute values of the first-order differences of the series.

Formula

$\sum_{i=1, \ldots, n-1} \mid x_{i+1}- x_i \mid$

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

fca.absolute_sum_of_changes(data)

118.54546095276635

numpy

np.sum(np.abs(np.diff(data)))

118.54546095276635

agg_autocorrelation

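An aggregate statistic (e.g. mean, variance, median) of the autocorrelation coefficients over lags 1 through maxlag.

Formula

The autocorrelation at lag $l$ is

$R(l) = \frac{1}{(n-l)\sigma^2} \sum_{t=1}^{n-l}(X_{t}-\mu )(X_{t+l}-\mu)$

where $n$ is the length of the series, $\mu$ its mean and $\sigma^2$ its variance. The tsfresh documentation defines the feature as

$f_{agg} \left( R(1), \ldots, R(m)\right) \quad \text{for} \quad m = max(n, maxlag).$

Judging from the code, however, it seems to work like this: with $k$ parameter dictionaries, the autocorrelations are computed up to the largest maxlag among them,
$f_{agg} \left( R(1), \ldots, R(m)\right) \quad \text{for} \quad m = max(maxlag_{1}, maxlag_{2}, \ldots, maxlag_{k}).$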
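A NumPy sketch of that reading of the code may make it concrete. It assumes the autocorrelations are computed once up to the largest maxlag, and that each parameter dictionary then aggregates only its own first maxlag values. The function below is an illustration, not tsfresh's implementation, and it does not handle a zero-variance series.

```python
import numpy as np

def agg_autocorrelation_sketch(x, param):
    """Aggregate R(1..maxlag) for each parameter dictionary in `param`."""
    x = np.asarray(x, dtype=float)
    n, mu, var = x.size, x.mean(), x.var()
    # Compute the autocorrelations once, up to the largest requested lag.
    max_maxlag = max(cfg["maxlag"] for cfg in param)
    r = np.array([
        np.sum((x[: n - l] - mu) * (x[l:] - mu)) / ((n - l) * var)
        for l in range(1, max_maxlag + 1)
    ])
    # Each configuration aggregates only its own first `maxlag` values.
    return [
        (f'f_agg_"{cfg["f_agg"]}"__maxlag_{cfg["maxlag"]}',
         getattr(np, cfg["f_agg"])(r[: cfg["maxlag"]]))
        for cfg in param
    ]

# Alternating series: R(1) = -1 and R(2) = 1.
result = agg_autocorrelation_sketch(
    np.tile([1.0, -1.0], 5),
    [{"f_agg": "mean", "maxlag": 1}, {"f_agg": "mean", "maxlag": 2}],
)
```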

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

?fca.agg_autocorrelation

param = [{'f_agg': 'mean', 'maxlag': 2}]

fca.agg_autocorrelation(data, param)

[('f_agg_"mean"__maxlag_2', -0.00025668394852565446)]

numpy

n = len(data)
var = np.var(data)
u = np.mean(data)
maxlag = 2

rs = []
for l in range(1, maxlag+1):
    r = np.sum(np.multiply((data[:(n-l)]-u), (data[l:]-u)))/((n-l)*(var))
    print(r)
    rs.append(r)
np.array(rs).mean()

0.1755044291314198
-0.1760177970284712

-0.0002566839485256961

statsmodels

from statsmodels.tsa.stattools import acf

# `unbiased` was renamed to `adjusted` in statsmodels 0.12 (use unbiased=True on older versions)
a = acf(data, adjusted=True, fft=n > 1250, nlags=maxlag)[1:]
a

array([ 0.17550443, -0.1760178 ])

np.mean(a)

-0.00025668394852565446

agg_linear_trend

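Linear least-squares (OLS) regression on the values of the series aggregated over consecutive chunks.

Formula

1. Split the time series, in order, into chunks of chunk_len values and compute an aggregate (e.g. min, mean) of each chunk.
2. Fit a linear regression to the sequence of aggregated values.
3. Return the requested regression attribute: "pvalue", "rvalue", "intercept", "slope" or "stderr".

References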

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

param = [{'f_agg': 'mean', 'attr': 'slope', 'chunk_len': 2}]

list(fca.agg_linear_trend(data, param))

[('attr_"slope"__chunk_len_2__f_agg_"mean"', -0.0032399538649093094)]

The slope is essentially 0, consistent with the sine-based data.
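numpy

The chunk-then-regress steps of agg_linear_trend can be sketched with NumPy alone. This is an illustrative version that returns only the slope and intercept via `np.polyfit`. The function name is hypothetical, and the other attributes (pvalue, rvalue, stderr) would need a fuller regression routine such as `scipy.stats.linregress`.

```python
import numpy as np

def agg_linear_trend_sketch(x, chunk_len, f_agg="mean"):
    """Return (slope, intercept) of a line fitted to chunk-wise aggregates."""
    x = np.asarray(x, dtype=float)
    n_chunks = int(np.ceil(x.size / chunk_len))
    # Step 1: aggregate each consecutive chunk (the last one may be shorter).
    agg = np.array([
        getattr(np, f_agg)(x[i * chunk_len:(i + 1) * chunk_len])
        for i in range(n_chunks)
    ])
    # Step 2: least-squares line through (0, agg[0]), (1, agg[1]), ...
    slope, intercept = np.polyfit(np.arange(n_chunks), agg, deg=1)
    return slope, intercept

# 0..9 in chunks of 2 has chunk means 0.5, 2.5, 4.5, 6.5, 8.5.
slope, intercept = agg_linear_trend_sketch(np.arange(10.0), chunk_len=2)
```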

approximate_entropy

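Approximate entropy

Measures the regularity (periodicity), unpredictability and fluctuation of a time series.

Formula

TODO; the formula is given in the reference links.

References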

https://zhuanlan.zhihu.com/p/39105270
https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

Yentes et al. (2012) - The Appropriate Use of Approximate Entropy and Sample Entropy with Short Data Sets
Richman & Moorman (2000) - Physiological time-series analysis using approximate entropy and sample entropy

code

tsfresh

?fca.approximate_entropy

fca.approximate_entropy(data, 10, 0.1)

0.011049836186585615
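numpy

Since the formula is left as a TODO, here is a sketch of the usual definition of approximate entropy: the change in the average log-frequency of matching windows when the window length grows from m to m + 1. It makes two assumptions that I believe match tsfresh's behaviour: the tolerance is r times the standard deviation of the series, and the absolute value of the difference is returned. The function name is hypothetical, and the pairwise distance matrix makes it suitable only for short series.

```python
import numpy as np

def approximate_entropy_sketch(x, m, r):
    """Approximate entropy with window length m and tolerance r * std(x)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n <= m + 1:
        return 0.0
    tol = r * np.std(x)

    def phi(k):
        # All overlapping windows of length k, shape (n - k + 1, k).
        windows = np.lib.stride_tricks.sliding_window_view(x, k)
        # Chebyshev (max-abs) distance between every pair of windows.
        dist = np.max(np.abs(windows[:, None, :] - windows[None, :, :]), axis=2)
        # Fraction of windows within tolerance of each window (self-match included).
        c = np.mean(dist <= tol, axis=1)
        return np.mean(np.log(c))

    return abs(phi(m) - phi(m + 1))

periodic = np.tile([1.0, 2.0], 10)                      # perfectly regular
noise = np.random.default_rng(0).standard_normal(200)   # irregular
apen_periodic = approximate_entropy_sketch(periodic, 2, 0.2)
apen_noise = approximate_entropy_sketch(noise, 2, 0.2)
```

A regular series should score close to 0, while white noise should score clearly higher.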

ar_coefficient

Autoregressive coefficients

Formula

$X_{t}=\varphi_0 +\sum _{i=1}^{k}\varphi_{i}X_{t-i}+\varepsilon_{t}$

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

"coeff" is the index of the coefficient to return from the fitted autoregression, and "k" is the order of the autoregression.

param = [{'coeff': 1, 'k': 10}]
fca.ar_coefficient(data, param)

[('coeff_1__k_10', 0.05455773898727232)]
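numpy

To show what the coefficients mean, the AR(k) model above can be fitted by ordinary least squares on lagged copies of the series. This is a sketch under that assumption. tsfresh delegates the fit to statsmodels, whose estimator may differ, so the numbers need not match exactly. The function name is hypothetical.

```python
import numpy as np

def ar_coefficients_sketch(x, k):
    """Least-squares estimates [phi_0, phi_1, ..., phi_k] of an AR(k) model."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Column i holds X_{t-i} for t = k, ..., n-1.
    lagged = np.column_stack([x[k - i:n - i] for i in range(1, k + 1)])
    design = np.column_stack([np.ones(n - k), lagged])
    coeffs, *_ = np.linalg.lstsq(design, x[k:], rcond=None)
    return coeffs

# Noise-free AR(1) series X_t = 1 + 0.5 * X_{t-1}, starting at 10.
series = [10.0]
for _ in range(20):
    series.append(1.0 + 0.5 * series[-1])
phi = ar_coefficients_sketch(series, k=1)
```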

augmented_dickey_fuller

Augmented Dickey-Fuller (ADF) test
Returns the ADF test statistic (a float)

Formula
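For orientation, a sketch of the standard ADF test regression in its constant-only form. That form is an assumption here: it is the default of statsmodels' `adfuller`, which tsfresh appears to call. The null hypothesis of a unit root is $\gamma = 0$, $p$ is the number of lagged differences, and the test statistic is the t-ratio of $\hat{\gamma}$. Strongly negative values speak against a unit root.

```latex
\Delta X_t = \alpha + \gamma X_{t-1} + \sum_{i=1}^{p} \delta_i \, \Delta X_{t-i} + \varepsilon_t,
\qquad
\text{teststat} = \frac{\hat{\gamma}}{\operatorname{SE}(\hat{\gamma})}
```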

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

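Tests whether a unit root is present in an autoregressive model of the series, as a measure of the stationarity of the time series.

Parameters:
param (list): dictionaries of the form {"attr": x}, where x is one of the strings "teststat", "pvalue" or "usedlag"
Return value: the ADF test statistic (a float) when attr is "teststat"

param = [{'attr': 'teststat'}]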
fca.augmented_dickey_fuller(data, param)

[('attr_"teststat"__autolag_"AIC"', -9.347028928948198)]

autocorrelation

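Autocorrelation at a given lag

Correlation describes how closely the trends of two series follow each other.

Formula

$R(l) = \frac{1}{(n-l)\sigma^{2}} \sum_{t=1}^{n-l}(X_{t}-\mu )(X_{t+l}-\mu)$

The autocorrelation coefficient is sensitive to extreme values.

References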

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

code

tsfresh

fca.autocorrelation(data, 2)

-0.1760177970284712

numpy

l = 2             # lag
n = len(data)
u = data.mean()   # sample mean
var = data.var()  # population variance (ddof=0)

np.sum(np.multiply((data[:n - l] - u), (data[l:] - u))) / ((n - l) * var)

-0.1760177970284712

benford_correlation

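Correlation between the distribution of the first digits of the series and the Benford distribution.

According to the tsfresh documentation, this feature is well suited to anomaly detection.

Formula

Benford distribution

$P(d)=\log_{10}\left(1+\frac{1}{d}\right)$

$d \in \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$

Computation steps
1. Extract the first significant digit of each value: x = np.array([int(str(np.format_float_scientific(i))[:1]) for i in np.abs(np.nan_to_num(x))])
2. Compute the observed frequency of each digit: data_distribution = np.array([(x == n).mean() for n in range(1, 10)])
3. Compute the Pearson correlation coefficient between data_distribution and benford_distribution.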
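The three steps can be combined into one small function. This is a sketch that follows the steps listed above. The function name is hypothetical, and `np.corrcoef` is used for the Pearson correlation. Powers of two are a classic example of a sequence whose leading digits follow Benford's law closely, so they make a convenient check.

```python
import numpy as np

def benford_correlation_sketch(x):
    """Pearson correlation between first-digit frequencies and Benford's law."""
    x = np.abs(np.nan_to_num(np.asarray(x, dtype=float)))
    # Step 1: leading digit via scientific notation, e.g. 0.0042 -> "4.2e-03" -> 4.
    first_digits = np.array([int(np.format_float_scientific(v)[0]) for v in x])
    digits = np.arange(1, 10)
    benford_distribution = np.log10(1 + 1 / digits)
    # Step 2: observed relative frequency of each digit 1..9.
    data_distribution = np.array([(first_digits == d).mean() for d in digits])
    # Step 3: Pearson correlation between the two distributions.
    return np.corrcoef(benford_distribution, data_distribution)[0, 1]

corr_powers_of_two = benford_correlation_sketch(2.0 ** np.arange(100))
```

If every digit occurs equally often, the observed distribution has zero variance and the correlation is undefined (NaN).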

References

https://www.jianshu.com/p/de2f7d333b9f
https://tsfresh.readthedocs.io/en/latest/index.html

[1] A Statistical Derivation of the Significant-Digit Law, Theodore P. Hill, Statistical Science, 1995
[2] The significant-digit phenomenon, Theodore P. Hill, The American Mathematical Monthly, 1995
[3] The law of anomalous numbers, Frank Benford, Proceedings of the American Philosophical Society, 1938
[4] Note on the frequency of use of the different digits in natural numbers, Simon Newcomb, American Journal of Mathematics, 1881

code

tsfresh

?fca.benford_correlation

Object `fca.benford_correlation` not found.

numpy

x = np.array([int(str(np.format_float_scientific(i))[:1]) for i in np.abs(np.nan_to_num(data))])
x

array([6, 9, 4, 1, 2, 7, 7, 1, 4, 6, 9, 1, 2, 5, 1, 8, 1, 1, 1, 2, 6, 5,
       3, 1, 3, 2, 1, 8, 1, 3, 1, 3, 2, 2, 6, 8, 7, 1, 1, 1, 1, 5, 1, 9,
       7, 9, 1, 6, 1, 1, 2, 4, 1, 1, 1, 3, 6, 1, 1, 1, 8, 1, 7, 4, 1, 1,
       6, 6, 6, 4, 5, 9, 4, 1, 1, 1, 2, 6, 6, 1, 9, 7, 7,
