Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles

1 INTRODUCTION

Our second hypothesis was that we can improve TSC performance through ensembling.
Although the value of ensembling is well known, our approach is unusual in that we inject diversity by adopting a heterogeneous ensemble rather than by using resampling schemes with weak learners. Our approach is in fact a meta-ensemble, since two of the components (random forest and rotation forest) are themselves ensembles.

In total, 35 classifiers are used (here "we" refers to the paper's authors). This approach is the most accurate of those compared, but the least interpretable.
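COTE combines the outputs of its constituents with a vote weighted in proportion to each classifier's cross-validated training accuracy. A minimal sketch of that proportional voting scheme (the function name and the example accuracies are illustrative, not from the paper):

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine one prediction per classifier into a single label,
    weighting each vote by that classifier's (cross-validated)
    training accuracy."""
    scores = defaultdict(float)
    for pred, w in zip(predictions, weights):
        scores[pred] += w
    # Return the label with the highest total weight.
    return max(scores, key=scores.get)

# Hypothetical example: three classifiers with training accuracies
# 0.9, 0.6 and 0.55 vote on one test case.
label = weighted_vote(["a", "b", "b"], [0.9, 0.6, 0.55])
print(label)  # "b": 0.6 + 0.55 outweighs 0.9
```

Two weaker classifiers can thus overrule a single strong one, which is exactly how a heterogeneous ensemble exploits diversity.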

We investigate ways of forming hierarchical ensembles through choosing subsets of data representations to use based on training set performance.

2 TIME SERIES CLASSIFICATION BACKGROUND

We have a set of n time series,
T = {T1, T2, ..., Tn},
where each time series has m ordered real-valued observations,
Ti = <ti1, ti2, ..., tim>,
and a class value ci.
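In code, this layout is just an n × m array of observations plus one class value per series; a minimal sketch (the values are made up for illustration):

```python
import numpy as np

# n = 3 series, each with m = 4 ordered real-valued observations.
T = np.array([[0.1, 0.5, 0.9, 0.4],
              [0.2, 0.4, 0.8, 0.5],
              [0.9, 0.1, 0.2, 0.1]])   # shape (n, m)

# One class value c_i per series T_i.
c = np.array([0, 0, 1])
```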

What distinguishes TSC from general classification problems is that the ordering of the attributes matters.

The best discriminatory features for classification might be masked by the length of the series, confounded by noise in the phase of the series or embedded in the interaction of observations.

Hence, TSC usually requires techniques tailored to the nature of the problem.

The alternative approaches to TSC are best understood by considering how the data is represented or, equivalently, how similarity between series is quantified.

Similarity between series can be based on several discriminatory criteria, for example: similarity in time, in spectral or autocorrelation structure; global or local similarity; and data-driven or model-based similarity.

2.1 Similarity in the Time Domain

1-NN DTW, with the warping window size set through cross-validation on the training data, is surprisingly hard to beat.
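A minimal sketch of DTW constrained to a Sakoe-Chiba band (the `window` parameter is what the baseline tunes by cross-validation; this implementation is illustrative, not the paper's code):

```python
import numpy as np

def dtw(a, b, window):
    """Dynamic time warping distance between two equal-length series,
    restricted to a warping window of the given width."""
    m = len(a)
    D = np.full((m + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        # Only cells within `window` of the diagonal are reachable.
        lo, hi = max(1, i - window), min(m, i + window)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[m, m]
```

With `window = 0` this reduces to the squared Euclidean distance; a 1-NN classifier then simply predicts the label of the training series with the smallest `dtw` value.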
Shapelets: we discuss recent shapelet research in more detail in Section 3.1.

The second popular localised approach involves deriving features from varying size intervals of the series.
Examples include BoP, SAX, and methods that build distances or trees over such discretised representations.

Baydogan et al. [3] describe a bag-of-features approach that combines interval and frequency features: the time series bag-of-features representation (TSBF).

3 DATA TRANSFORMATIONS

  1. Shapelet Transform
  2. Frequency Domain: Periodogram Transform
  3. Autocorrelation-Based Transform (ACF etc.)
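Sketches of transforms 2 and 3, a power spectrum via the FFT and the sample autocorrelation function (the exact normalisations used in the paper are an assumption):

```python
import numpy as np

def periodogram(x):
    """Power spectrum of a series via the real FFT --
    the frequency-domain (periodogram) transform."""
    f = np.fft.rfft(x - np.mean(x))
    return (f.real ** 2 + f.imag ** 2) / len(x)

def acf(x, max_lag):
    """Sample autocorrelation at lags 1..max_lag --
    the autocorrelation-based transform."""
    x = x - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])
```

Each transform maps a length-m series to a new feature vector on which the standard classifiers of Section 4 can be trained directly.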

4 CLASSIFIERS

4.1 Heterogeneous Ensemble

The classifiers used are the WEKA [24] implementations of

  • k Nearest Neighbour (where k is set through cross validation),
  • Naive Bayes,
  • C4.5 decision tree,
  • SVM with linear basis function kernel,
  • SVM with quadratic basis function kernel,
  • Random Forest (with 100 trees),
  • Rotation Forest (with 10 trees), and
  • Bayesian network.
4.2 Elastic Ensemble

We use the heterogeneous ensemble of eight classifiers for datasets in the frequency, change, and shapelet transformation domains; in the time domain we instead use the Elastic Ensemble (EE), an ensemble of nearest-neighbour classifiers with elastic distance measures.

The 11 classifiers in EE are

  • 1-NN with Euclidean distance (ED),
  • full dynamic time warping (DTW),
  • DTW with window size set through cross-validation (DTWCV),
  • derivative DTW with full window and with window set through cross-validation (DDTW and DDTWCV),
  • weighted DTW (WDTW) and derivative weighted DTW (WDDTW),
  • longest common subsequence (LCSS),
  • Edit Distance with Real Penalty (ERP) [29],
  • Time Warp Edit distance (TWE) [9],
  • and the Move-Split-Merge (MSM) distance metric [4].
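The derivative variants (DDTW, WDDTW) apply their distance to an estimated derivative of the series rather than the raw values. A sketch of the Keogh-style derivative estimate commonly used for DDTW (that EE uses exactly this estimate is an assumption here):

```python
import numpy as np

def derivative(x):
    """Derivative estimate for DDTW: the average of the left
    difference and half the central difference, with the end
    values padded by repetition to keep the original length."""
    d = ((x[1:-1] - x[:-2]) + (x[2:] - x[:-2]) / 2.0) / 2.0
    return np.concatenate(([d[0]], d, [d[-1]]))
```

DDTW is then just DTW computed on `derivative(a)` and `derivative(b)`, which makes the distance sensitive to shape rather than to absolute level.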

Results comparing the two ensembles in the time domain:

[Figure: pairwise comparison of EE and the time-domain heterogeneous ensemble]

EE is clearly superior to the time-domain heterogeneous ensemble, winning on 46 datasets, losing on 23, and tying on 3.

5 DATASETS

That's all for now.
[To be continued...]
