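Python tsfresh features explained in detail (work in progress)

tsfresh is an open-source Python package for extracting features from time series data. It can extract more than 64 kinds of features, which makes it something of a Swiss Army knife for time series feature extraction. I have been reading through it recently because I needed it. There is no Chinese documentation yet and some of the features are hard to understand, so I am posting the part I have figured out here; for the ones I have not, I only list the heading and will add notes once I understand them. => Thanks to the author of that post; I have added some content on top of the original.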

Original post: https://blog.csdn.net/xindoo/article/details/79177378

https://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_extraction.html

tsfresh.feature_extraction.feature_calculators.abs_energy(x)
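Sum of squares of the time series

Parameters: x (pandas.Series) – the time series to calculate the feature of
Returns: the feature value
Return type: float
Function type: simple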

tsfresh.feature_extraction.feature_calculators.absolute_sum_of_changes(x)
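Returns the sum of the absolute values of the consecutive changes in the series x

Parameters: x (pandas.Series) – the time series to calculate the feature of
Returns: the feature value
Return type: float
Function type: simple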
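
To make the two definitions above concrete, here is a minimal pure-Python sketch of the formulas as described (sum of squares, and sum of absolute consecutive differences). The function names mirror the tsfresh names only for readability; this is an illustration, not tsfresh's own implementation.

```python
def abs_energy(x):
    """Sum of the squared values of the series."""
    return sum(v * v for v in x)


def absolute_sum_of_changes(x):
    """Sum of |x[i+1] - x[i]| over all consecutive pairs."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


series = [1, 3, 2, 5]
print(abs_energy(series))               # 1 + 9 + 4 + 25 = 39
print(absolute_sum_of_changes(series))  # 2 + 1 + 3 = 6
```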

tsfresh.feature_extraction.feature_calculators.agg_autocorrelation(x, param)
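Calculates the value of an aggregation function f_agg (e.g. the variance or the mean) over the autocorrelation R(l) for different lags. To some extent this measures the periodicity of the data: l denotes the lag, and a large value for some l indicates that the series has a periodic component of period l.

R(l) = \frac{1}{(n-l)\sigma^{2}} \sum_{t=1}^{n-l}(X_{t}-\mu )(X_{t+l}-\mu)

n is the length of the time series X_{i}, \sigma^{2} is its variance and \mu its mean
Parameters: x (pandas.Series) – the time series to calculate the feature of
Returns: the feature value
Return type: float
Function type: simple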

tsfresh.feature_extraction.feature_calculators.agg_linear_trend(x, param)
Splits the series into chunks, aggregates each chunk (max, min, mean, median), then fits a linear regression to the aggregated values and returns the pvalue, rvalue (correlation coefficient), intercept, slope and stderr (standard error of the estimated slope)
Parameters:    x (pandas.Series) – the time series to calculate the feature of
param (list) – contains dictionaries {"attr": x, "chunk_len": l, "f_agg": f} with x, f being strings and l an int
Returns:    the different feature values
Return type:    pandas.Series
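
The chunk-then-regress idea can be sketched in a few lines of plain Python. The helper names `chunk_aggregate` and `linear_fit` are hypothetical, the fit only returns slope and intercept (not pvalue, rvalue or stderr), and it assumes at least two chunks; treat it as an illustration of the idea rather than the library's code.

```python
from statistics import mean


def chunk_aggregate(x, chunk_len, f_agg):
    """Aggregate consecutive chunks of length chunk_len (last one may be shorter)."""
    return [f_agg(x[i:i + chunk_len]) for i in range(0, len(x), chunk_len)]


def linear_fit(y):
    """Least-squares line through the points (0, y[0]), (1, y[1]), ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = mean(y)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
aggregated = chunk_aggregate(x, 3, max)   # [3, 6, 9]
print(linear_fit(aggregated))             # slope 3, intercept 3
```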

tsfresh.feature_extraction.feature_calculators.approximate_entropy(x, m, r)
Approximate entropy, which quantifies the regularity of a time series and the unpredictability of its fluctuations

tsfresh.feature_extraction.feature_calculators.ar_coefficient(x, param)
Coefficients of an autoregressive AR(k) model fitted by maximum likelihood; the parameter k is the maximum lag of the process

 

tsfresh.feature_extraction.feature_calculators.augmented_dickey_fuller(x, param)

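The Augmented Dickey-Fuller (ADF) test is used in time series analysis to check whether a unit root is present in a sample of a time series; this feature returns the value of the test statistic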


tsfresh.feature_extraction.feature_calculators.autocorrelation(x, lag)
Autocorrelation coefficient of the series at the given lag
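
A minimal sketch of a lag-l autocorrelation, assuming the common estimator 1/((n-l)σ²) Σ (X_t - μ)(X_{t+l} - μ) with the population variance σ². It assumes a non-constant series (σ² > 0) and is meant only to show what the number measures.

```python
def autocorrelation(x, lag):
    """Autocorrelation of x at the given lag (population variance)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    cov = sum((x[t] - mu) * (x[t + lag] - mu) for t in range(n - lag))
    return cov / ((n - lag) * var)


x = [1, 2, 1, 2, 1, 2, 1, 2]      # period 2
print(autocorrelation(x, 1))      # -1.0: neighbours move in opposite directions
print(autocorrelation(x, 2))      # 1.0: values repeat every 2 steps
```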


tsfresh.feature_extraction.feature_calculators.binned_entropy(x, max_bins)
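Splits the value range of the series into max_bins equal-width bins, assigns every value to its bin, and then computes the entropy of the bin frequencies:

- \sum_{k=1}^{max\_bins} p_{k} \log(p_{k})

p_{k} is the fraction of the samples that fall into the k-th bin (empty bins contribute nothing to the sum). This feature measures how evenly the sample values are distributed.
Parameters: x (pandas.Series) – the time series to calculate the feature of
   max_bins (int) – the number of bins
Returns: the feature value
Return type: float
Function type: simple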
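
The binning and entropy steps can be sketched as follows. This is a hand-rolled equal-width histogram using the natural logarithm; values that land exactly on a bin edge may be assigned differently than by a library histogram routine, so take it as an approximation of the idea.

```python
import math


def binned_entropy(x, max_bins):
    """Entropy of the equal-width histogram of x with max_bins bins."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / max_bins
    counts = [0] * max_bins
    for v in x:
        # The maximum value belongs to the last bin; a constant series uses bin 0.
        k = min(int((v - lo) / width), max_bins - 1) if width > 0 else 0
        counts[k] += 1
    n = len(x)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


print(binned_entropy([0, 1, 2, 3], 2))   # two equally filled bins: log(2)
print(binned_entropy([5, 5, 5], 10))     # all values in one bin: zero entropy
```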

tsfresh.feature_extraction.feature_calculators.c3(x, lag)

\frac{1}{n-2lag} \sum_{i=1}^{n-2lag} x_{i + 2 \cdot lag} \cdot x_{i + lag} \cdot x_{i}

which is

\mathbb{E}[L^2(X) \cdot L(X) \cdot X]

where L is the lag operator. It measures the non-linearity of the time series
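
A small sketch of the c3 statistic as the mean of the triple product x_{i+2·lag} · x_{i+lag} · x_i. Returning 0 when the series is too short for the lag is an assumption made here to keep the function total.

```python
def c3(x, lag):
    """Mean of x[i + 2*lag] * x[i + lag] * x[i] over all valid i."""
    m = len(x) - 2 * lag
    if m <= 0:
        return 0.0  # assumption: too few points for this lag
    return sum(x[i + 2 * lag] * x[i + lag] * x[i] for i in range(m)) / m


print(c3([1, 2, 3, 4, 5], 1))   # (3*2*1 + 4*3*2 + 5*4*3) / 3 = 30.0
```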

tsfresh.feature_extraction.feature_calculators.change_quantiles(x, ql, qh, isabs, f_agg)
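First uses the quantiles ql and qh of the distribution of x to fix a corridor, then aggregates (with f_agg, e.g. the mean) the consecutive changes of the series inside this corridor, optionally taking their absolute values.

Parameters:    
x (pandas.Series) – the time series
ql (float) – the lower quantile of the corridor
qh (float) – the upper quantile of the corridor
isabs (bool) – whether to use the absolute values of the changes
f_agg (str, name of a numpy function (e.g. mean, var, std, median)) – the numpy aggregation function (mean, variance, standard deviation, median) applied to the changes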

tsfresh.feature_extraction.feature_calculators.cid_ce(x, normalize)
An estimate of the complexity of the time series: a more complex series has more peaks and valleys.
 \sqrt{ \sum_{i=1}^{n-1} ( x_{i} - x_{i+1})^2 }
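
A minimal sketch of this complexity estimate. The handling of `normalize` (z-scoring the series first, and returning 0 for a constant series) is an assumption about what the flag does, stated here for illustration.

```python
import math


def cid_ce(x, normalize=False):
    """Square root of the sum of squared consecutive differences."""
    if normalize:
        n = len(x)
        mu = sum(x) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
        if sd == 0:
            return 0.0  # assumption: a constant series has zero complexity
        x = [(v - mu) / sd for v in x]
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:])))


print(cid_ce([0, 3, 3, 7]))                          # sqrt(9 + 0 + 16) = 5.0
print(cid_ce([0, 1, 0, 1]) > cid_ce([0, 0, 1, 1]))   # the zig-zag is more complex
```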

tsfresh.feature_extraction.feature_calculators.count_above_mean(x)
Number of values greater than the mean

tsfresh.feature_extraction.feature_calculators.count_below_mean(x)
Number of values smaller than the mean

tsfresh.feature_extraction.feature_calculators.cwt_coefficients(x, param)
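Calculates a continuous wavelet transform (cwt) with the Ricker wavelet, also known as the "Mexican hat wavelet"
\frac{2}{\sqrt{3a} \pi^{\frac{1}{4}}} (1 - \frac{x^2}{a^2}) \exp(-\frac{x^2}{2a^2})
where a is the width of the wavelet. The calculator takes all the different width arrays and computes the cwt once for each of them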
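
To get a feeling for the wavelet itself, the formula above can be evaluated directly. This sketch only evaluates the Ricker function at a point; it does not perform the wavelet transform.

```python
import math


def ricker(x, a):
    """Value of the Ricker ("Mexican hat") wavelet of width a at position x."""
    norm = 2 / (math.sqrt(3 * a) * math.pi ** 0.25)
    return norm * (1 - x ** 2 / a ** 2) * math.exp(-x ** 2 / (2 * a ** 2))


a = 2.0
print(ricker(0.0, a))   # positive peak in the centre
print(ricker(a, a))     # zero crossing at x = a
print(ricker(3.0, a))   # negative lobe beyond the zero crossing
```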

 

tsfresh.feature_extraction.feature_calculators.energy_ratio_by_chunks(x, param)
Calculates the sum of squares of chunk i out of N chunks expressed as a ratio with the sum of squares over the whole series.

Takes as input parameters the number num_segments of segments to divide the series into and segment_focus which is the segment number (starting at zero) to return a feature on.

If the length of the time series is not a multiple of the number of segments, the remaining data points are distributed on the bins starting from the first. For example, if your time series consists of 8 entries and is split into 3 segments, the first two bins will contain 3 values and the last one 2 values, e.g. [ 0., 1., 2.], [ 3., 4., 5.] and [ 6., 7.].

Note that the answer for num_segments = 1 is a trivial “1” but we handle this scenario in case somebody calls it. Sum of the ratios should be 1.0.

Parameters:
x (numpy.ndarray) – the time series to calculate the feature of
param – contains dictionaries {“num_segments”: N, “segment_focus”: i} with N, i both ints

Returns:
the feature values

Return type:
list of tuples (index, data)
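
The splitting rule and the ratio can be sketched as below. The helper `split_into_segments` is hypothetical and simply reproduces the rule described above (leftover points go to the first bins); the sketch assumes the series is not all zeros.

```python
def split_into_segments(x, num_segments):
    """Split x into num_segments chunks; the first chunks absorb the remainder."""
    base, extra = divmod(len(x), num_segments)
    segments, start = [], 0
    for k in range(num_segments):
        size = base + (1 if k < extra else 0)
        segments.append(x[start:start + size])
        start += size
    return segments


def energy_ratio_by_chunks(x, num_segments, segment_focus):
    """Sum of squares of one chunk divided by the sum of squares of the series."""
    total = sum(v * v for v in x)  # assumed to be non-zero
    chunk = split_into_segments(x, num_segments)[segment_focus]
    return sum(v * v for v in chunk) / total


x = [0, 1, 2, 3, 4, 5, 6, 7]
print(split_into_segments(x, 3))                            # [[0, 1, 2], [3, 4, 5], [6, 7]]
print([energy_ratio_by_chunks(x, 3, i) for i in range(3)])  # the ratios add up to 1
```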

tsfresh.feature_extraction.feature_calculators.fft_aggregated(x, param)
Returns the spectral centroid (mean), variance, skew, and kurtosis of the absolute fourier transform spectrum.

Parameters:
x (numpy.ndarray) – the time series to calculate the feature of
param (list) – contains dictionaries {“aggtype”: s} where s str and in [“centroid”, “variance”, “skew”, “kurtosis”]

Returns:
the different feature values

Return type:
pandas.Series

This function is of type: combiner

tsfresh.feature_extraction.feature_calculators.fft_coefficient(x, param)
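Calculates the Fourier coefficients of the one-dimensional discrete Fourier transform using the fast Fourier transform (FFT) algorithm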

A_k =  \sum_{m=0}^{n-1} a_m \exp \left \{ -2 \pi i \frac{m k}{n} \right \}, \qquad k = 0, \ldots , n-1.
​ 

The resulting coefficients will be complex; this feature calculator can return the real part (attr=="real"), the imaginary part (attr=="imag"), the absolute value (attr=="abs") and the angle in degrees (attr=="angle").

Parameters:
x (numpy.ndarray) – the time series to calculate the feature of
param (list) – contains dictionaries {“coeff”: x, “attr”: s} with x int and x >= 0, s str and in [“real”, “imag”, “abs”, “angle”]

Returns:
the different feature values

Return type:
pandas.Series

This function is of type: combiner
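
For intuition, a single coefficient A_k can be computed directly from the defining sum (in O(n) per coefficient, without the fast algorithm). The function names below are illustrative; a real FFT gives the same numbers up to rounding.

```python
import cmath
import math


def dft_coefficient(a, k):
    """A_k = sum_m a[m] * exp(-2*pi*i*m*k/n), evaluated term by term."""
    n = len(a)
    return sum(a[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))


def fft_coefficient(a, k, attr):
    """Pick one real-valued attribute of the complex coefficient A_k."""
    c = dft_coefficient(a, k)
    if attr == "real":
        return c.real
    if attr == "imag":
        return c.imag
    if attr == "abs":
        return abs(c)
    if attr == "angle":
        return math.degrees(cmath.phase(c))
    raise ValueError("attr must be one of real, imag, abs, angle")


signal = [1, 0, -1, 0]                     # one cosine period over 4 samples
print(fft_coefficient(signal, 1, "abs"))   # approximately 2
print(fft_coefficient(signal, 0, "abs"))   # approximately 0 (zero mean)
```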

tsfresh.feature_extraction.feature_calculators.first_location_of_maximum(x)
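Position of the first occurrence of the maximum value (relative to the length of x)

tsfresh.feature_extraction.feature_calculators.first_location_of_minimum(x)
Position of the first occurrence of the minimum value (relative to the length of x)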

tsfresh.feature_extraction.feature_calculators.friedrich_coefficients(x, param)
Coefficients of the polynomial h(x) that has been fitted to the deterministic dynamics of the Langevin model

\dot{x}(t) = h(x(t)) + \mathcal{N}(0,R)

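as described by Friedrich et al. (2000): Physics Letters A 271, p. 217-222, Extracting model equations from experimental data.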

For short time-series this method is highly dependent on the parameters.

Parameters:
x (numpy.ndarray) – the time series to calculate the feature of
param (list) – contains dictionaries {"m": x, "r": y, "coeff": z} with x being a positive integer, the order of the polynomial to fit for estimating fixed points of dynamics, y a positive float, the number of quantiles to use for averaging, and finally z, a positive integer corresponding to the returned coefficient

Returns:
the different feature values

Return type:
pandas.Series

tsfresh.feature_extraction.feature_calculators.has_duplicate(x)
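Whether any value occurs more than once (bool)

tsfresh.feature_extraction.feature_calculators.has_duplicate_max(x)
Whether the maximum value occurs more than once (bool)

tsfresh.feature_extraction.feature_calculators.has_duplicate_min(x)
Whether the minimum value occurs more than once (bool)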

tsfresh.feature_extraction.feature_calculators.index_mass_quantile(x, param)

Calculates the relative index i such that q% of the mass of the time series x lies to the left of i
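
A minimal sketch of this "mass quantile" idea, assuming that mass means the absolute value of each point, that q is given as a fraction between 0 and 1, and that an all-zero series yields NaN. These choices are assumptions made for the illustration.

```python
def index_mass_quantile(x, q):
    """Relative index at which the cumulative |x| first reaches a fraction q."""
    abs_x = [abs(v) for v in x]
    total = sum(abs_x)
    if total == 0:
        return float("nan")  # assumption: no mass, no meaningful index
    cumulative = 0.0
    for i, v in enumerate(abs_x):
        cumulative += v
        if cumulative / total >= q:
            return (i + 1) / len(x)


print(index_mass_quantile([1, 1, 1, 1], 0.5))    # 0.5: mass is spread evenly
print(index_mass_quantile([0, 0, 0, 10], 0.5))   # 1.0: all mass sits at the end
```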


tsfresh.feature_extraction.feature_calculators.kurtosis(x)

Returns the kurtosis of x (calculated with the adjusted Fisher-Pearson standardized moment coefficient G2)


tsfresh.feature_extraction.feature_calculators.large_standard_deviation(x, r)
Whether the standard deviation of x is larger than r times the range (maximum minus minimum) of x
std(x) > r * (max(x) - min(x))
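
The inequality above translates almost literally into code. This sketch assumes the population standard deviation (dividing by n).

```python
import math


def large_standard_deviation(x, r):
    """True if std(x) > r * (max(x) - min(x)), using the population std."""
    n = len(x)
    mu = sum(x) / n
    std = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return std > r * (max(x) - min(x))


x = [0, 0, 10, 10]                        # std = 5, range = 10
print(large_standard_deviation(x, 0.4))   # True:  5 > 4
print(large_standard_deviation(x, 0.5))   # False: 5 > 5 does not hold
```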

tsfresh.feature_extraction.feature_calculators.last_location_of_maximum(x)
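Position of the last occurrence of the maximum value (relative to the length of x)

tsfresh.feature_extraction.feature_calculators.last_location_of_minimum(x)
Position of the last occurrence of the minimum value (relative to the length of x)

tsfresh.feature_extraction.feature_calculators.length(x)

Length of x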

 

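tsfresh.feature_extraction.feature_calculators.linear_trend(x, param)

Calculates a linear least-squares regression of the values of the time series against the sequence from 0 to the length of the time series minus 1. This feature assumes the signal is uniformly sampled and does not use the timestamps to fit the model. The parameters control which attributes are returned. Possible attributes: pvalue, rvalue, intercept, slope, stderr.

tsfresh.feature_extraction.feature_calculators.linear_trend_timewise(x, param)

The same regression and attributes as linear_trend, except that the model is fitted using the time index of the series.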

tsfresh.feature_extraction.feature_calculators.longest_strike_above_mean(x)
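Length of the longest consecutive subsequence of values greater than the mean

tsfresh.feature_extraction.feature_calculators.longest_strike_below_mean(x)
Length of the longest consecutive subsequence of values smaller than the mean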

tsfresh.feature_extraction.feature_calculators.max_langevin_fixed_point(x, r, m) (not understood yet)
Largest fixed point of dynamics \operatorname{argmax}_x \{h(x)=0\} estimated from the polynomial h(x), which has been fitted to the deterministic dynamics of the Langevin model
\dot{x}(t) = h(x(t)) + R \mathcal{N}(0,1)

as described by

Friedrich et al. (2000): Physics Letters A 271, p. 217-222 Extracting model equations from experimental data
For short time-series this method is highly dependent on the parameters.

Parameters:
x (numpy.ndarray) – the time series to calculate the feature of
m (int) – order of the polynomial to fit for estimating fixed points of dynamics
r (float) – number of quantiles to use for averaging

Returns:
Largest fixed point of deterministic dynamics

Return type:
float

tsfresh.feature_extraction.feature_calculators.maximum(x)
Maximum value

tsfresh.feature_extraction.feature_calculators.mean(x)
Mean value

tsfresh.feature_extraction.feature_calculators.mean_abs_change(x)
Mean of the absolute values of the consecutive changes
\frac{1}{n-1} \sum_{i=1,\ldots, n-1} | x_{i+1} - x_{i}|

tsfresh.feature_extraction.feature_calculators.mean_change(x)
Mean of the consecutive changes
\frac{1}{n-1} \sum_{i=1,\ldots, n-1} (x_{i+1} - x_{i}) = \frac{1}{n-1} (x_{n} - x_{1})
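
Both change-based means can be sketched from the n-1 consecutive differences. Note that the plain mean of the changes telescopes to (last - first) / (n - 1), which the example below illustrates. The sketch assumes at least two data points.

```python
def mean_abs_change(x):
    """Mean of |x[i+1] - x[i]| over the n-1 consecutive pairs."""
    diffs = [abs(b - a) for a, b in zip(x, x[1:])]
    return sum(diffs) / len(diffs)


def mean_change(x):
    """Mean of x[i+1] - x[i]; equal to (x[-1] - x[0]) / (n - 1)."""
    diffs = [b - a for a, b in zip(x, x[1:])]
    return sum(diffs) / len(diffs)


x = [1, 4, 2, 8]
print(mean_abs_change(x))   # (3 + 2 + 6) / 3
print(mean_change(x))       # (3 - 2 + 6) / 3, i.e. (8 - 1) / 3
```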

tsfresh.feature_extraction.feature_calculators.mean_second_derivative_central(x)

Returns the mean value of a central approximation of the second derivative

\frac{1}{n-2} \sum_{i=1,\ldots, n-2}  \frac{1}{2} (x_{i+2} - 2 \cdot x_{i+1} + x_i)

tsfresh.feature_extraction.feature_calculators.median(x)
Median

tsfresh.feature_extraction.feature_calculators.minimum(x)
Minimum value

tsfresh.feature_extraction.feature_calculators.number_crossing_m(x, m)
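Calculates the number of crossings of x on m. A crossing is defined as two sequential values where the first value is lower than m and the next is greater, or vice versa. If you set m to zero, you get the number of zero crossings.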

Parameters:
x (numpy.ndarray) – the time series to calculate the feature of
m (float) – the threshold for the crossing

Returns:
the value of this feature

Return type:
int
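
A crossing count can be sketched by comparing, for each pair of neighbours, whether they lie on the same side of m. Treating a value exactly equal to m as "not above" is an assumption of this sketch.

```python
def number_crossing_m(x, m):
    """Count neighbouring pairs that lie on different sides of the threshold m."""
    above = [v > m for v in x]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)


print(number_crossing_m([-1, 1, -1, 1], 0))   # 3 zero crossings
print(number_crossing_m([1, 2, 3], 0))        # 0: the series never crosses 0
```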

tsfresh.feature_extraction.feature_calculators.number_cwt_peaks(x, n)
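Searches for different peaks in x. To do so, x is smoothed by a Ricker wavelet for widths ranging from 1 to n. The feature returns the number of peaks that occur at enough width scales and with a sufficiently high signal-to-noise ratio (SNR).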

Parameters:
x (numpy.ndarray) – the time series to calculate the feature of
n (int) – maximum width to consider

Returns:
the value of this feature

Return type:
int

tsfresh.feature_extraction.feature_calculators.number_peaks(x, n)
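Number of peaks of at least support n in x, where a peak of support n is a value that is larger than its n neighbours to the left and its n neighbours to the right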
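
A minimal sketch of peak counting, assuming the definition that a peak of support n is a value strictly larger than each of its n neighbours on both sides (so the first and last n points can never be peaks).

```python
def number_peaks(x, n):
    """Count values larger than their n left and n right neighbours."""
    count = 0
    for i in range(n, len(x) - n):
        if all(x[i] > x[i - j] and x[i] > x[i + j] for j in range(1, n + 1)):
            count += 1
    return count


x = [3, 0, 0, 4, 0, 0, 13]
print(number_peaks(x, 1))   # 1: the value 4 beats its direct neighbours
print(number_peaks(x, 2))   # 1: it also beats two neighbours on each side
print(number_peaks(x, 3))   # 0: 13 is its third neighbour to the right
```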

tsfresh.feature_extraction.feature_calculators.partial_autocorrelation(x, param)
Value of the partial autocorrelation function at the given lag (lag: k)

tsfresh.feature_extraction.feature_calculators.percentage_of_reoccurring_datapoints_to_all_datapoints(x)
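len(different values occurring more than once) / len(different values)
Number of distinct values that occur more than once divided by the number of distinct values (each repeated value is counted only once). The percentage is normalized to the number of unique values, in contrast to percentage_of_reoccurring_values_to_all_values(x)

tsfresh.feature_extraction.feature_calculators.percentage_of_reoccurring_values_to_all_values(x)
Number of values that occur more than once divided by the total number of values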

tsfresh.feature_extraction.feature_calculators.quantile(x, q)
Returns the q quantile of x, i.e. the value that is greater than q% of the ordered values of x.

tsfresh.feature_extraction.feature_calculators.range_count(x, min, max)
Number of values in x that lie between min and max

tsfresh.feature_extraction.feature_calculators.ratio_beyond_r_sigma(x, r)
Ratio of values that are more than r times the standard deviation away from the mean of x
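
As a sketch, the ratio counts the points whose distance from the mean exceeds r standard deviations. The population standard deviation and the strict comparison are assumptions of this illustration.

```python
import math


def ratio_beyond_r_sigma(x, r):
    """Fraction of values with |x - mean| > r * std (population std)."""
    n = len(x)
    mu = sum(x) / n
    std = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return sum(1 for v in x if abs(v - mu) > r * std) / n


x = [0, 0, 0, 0, 10]                # mean = 2, std = 4
print(ratio_beyond_r_sigma(x, 1))   # 0.2: only the value 10 is beyond 1 sigma
print(ratio_beyond_r_sigma(x, 2))   # 0.0: nothing is strictly beyond 2 sigma
```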

tsfresh.feature_extraction.feature_calculators.ratio_value_number_to_time_series_length(x)
Number of unique values in x divided by the original length of x: len(set(x))/len(x)

tsfresh.feature_extraction.feature_calculators.sample_entropy(x)

tsfresh.feature_extraction.feature_calculators.set_property(key, value)

This method returns a decorator that sets the property key of the function to value

tsfresh.feature_extraction.feature_calculators.skewness(x)

Returns the skewness of x (calculated with the adjusted Fisher-Pearson standardized moment coefficient G1)

tsfresh.feature_extraction.feature_calculators.spkt_welch_density(x, param) (not understood yet)

This feature calculator estimates the cross power spectral density of the time series x at different frequencies. To do so, the time series is first shifted from the time domain to the frequency domain.


tsfresh.feature_extraction.feature_calculators.standard_deviation(x)
Standard deviation

tsfresh.feature_extraction.feature_calculators.sum_of_reoccurring_data_points(x)
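Sum of all data points that occur more than once (every occurrence is added)

tsfresh.feature_extraction.feature_calculators.sum_of_reoccurring_values(x)
Sum of the values that occur more than once (each distinct value is added once)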

tsfresh.feature_extraction.feature_calculators.sum_values(x)
Sum of all values

tsfresh.feature_extraction.feature_calculators.symmetry_looking(x, param)
Whether the distribution of x looks symmetric: | mean(X)-median(X)| < r * (max(X)-min(X))
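
The symmetry criterion can be checked directly from the inequality. In this sketch r is passed as a plain argument rather than through a param list, purely to keep the example short.

```python
from statistics import mean, median


def symmetry_looking(x, r):
    """True if |mean(x) - median(x)| < r * (max(x) - min(x))."""
    return abs(mean(x) - median(x)) < r * (max(x) - min(x))


x = [1, 2, 3, 4, 100]             # mean = 22, median = 3, range = 99
print(symmetry_looking(x, 0.1))   # False: 19 is not below 9.9
print(symmetry_looking(x, 0.5))   # True:  19 is below 49.5
```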

tsfresh.feature_extraction.feature_calculators.time_reversal_asymmetry_statistic(x, lag)
 

\frac{1}{n-2lag} \sum_{i=1}^{n-2lag} x_{i + 2 \cdot lag}^2 \cdot x_{i + lag} - x_{i + lag} \cdot  x_{i}^2

which is

\mathbb{E}[L^2(X)^2 \cdot L(X) - L(X) \cdot X^2]
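
A small sketch of the statistic above as the mean of x_{i+2·lag}² · x_{i+lag} - x_{i+lag} · x_i². Returning 0 when the series is too short for the lag is an assumption of this example.

```python
def time_reversal_asymmetry_statistic(x, lag):
    """Mean of x[i+2*lag]**2 * x[i+lag] - x[i+lag] * x[i]**2 over valid i."""
    m = len(x) - 2 * lag
    if m <= 0:
        return 0.0  # assumption: too few points for this lag
    total = sum(
        x[i + 2 * lag] ** 2 * x[i + lag] - x[i + lag] * x[i] ** 2
        for i in range(m)
    )
    return total / m


print(time_reversal_asymmetry_statistic([1, 2, 3, 4, 5], 1))   # (16 + 36 + 64) / 3
print(time_reversal_asymmetry_statistic([2, 2, 2, 2], 1))      # 0.0 for a constant series
```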

tsfresh.feature_extraction.feature_calculators.value_count(x, value)
Number of occurrences of value in x

tsfresh.feature_extraction.feature_calculators.variance(x)
Variance

tsfresh.feature_extraction.feature_calculators.variance_larger_than_standard_deviation(x)
Whether the variance is larger than the standard deviation
 
