SAX

ABSTRACT

The parallel explosions of interest in streaming data and in data mining of time series have had surprisingly little intersection. This is in spite of the fact that time series data are typically streaming data. The main reason for this apparent paradox is that the vast body of work on streaming data explicitly assumes that the data is discrete, whereas the vast majority of time series data is real valued.

Many researchers have also considered transforming real valued time series into symbolic representations. They note that such representations would potentially allow researchers to draw on the wealth of data structures and algorithms from the text processing and bioinformatics communities. Such representations would also allow formerly “batch-only” problems to be tackled by the streaming community. While many symbolic representations of time series have been introduced over the past decades, they all suffer from three fatal flaws:

1) Firstly, the dimensionality of the symbolic representation is the same as that of the original data, and virtually all data mining algorithms scale poorly with dimensionality.

2) Secondly, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series.

3) Finally, most of these symbolic approaches require one to have access to all the data before creating the symbolic representation. This last feature explicitly thwarts efforts to use the representations with streaming algorithms.

In this work we introduce a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction. It also allows distance measures to be defined on the symbolic approach that lower bound the corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting. It allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation while producing identical results to the algorithms that operate on the original data. Finally, our representation allows the real valued data to be converted in a streaming fashion, with only an infinitesimal time and space overhead.
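The abstract does not spell out how the symbols are produced. The sketch below follows the SAX recipe as it is commonly described, under stated assumptions:

- each series is z-normalized;
- it is reduced to `w` segment means with Piecewise Aggregate Approximation (PAA);
- each mean is mapped to one of `a` letters using cut points that split a standard normal distribution into equiprobable regions;
- two words are compared with a symbolic distance (MINDIST) that lower bounds the Euclidean distance between the normalized originals.

All function names (`znormalize`, `paa`, `breakpoints`, `sax_word`, `mindist`, `euclidean`) are hypothetical. The sketch also assumes the series length `n` is divisible by `w`.

```python
import math
from statistics import NormalDist


def znormalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std < 1e-12:  # a flat series has no shape; map it to all zeros
        return [0.0] * n
    return [(x - mean) / std for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal frames.

    Assumption: len(series) is divisible by w, to keep the sketch short.
    """
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes len(series) % w == 0")
    size = n // w
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(w)]


def breakpoints(a):
    """The a - 1 cut points that split N(0, 1) into a equiprobable regions."""
    nd = NormalDist()  # standard normal distribution
    return [nd.inv_cdf(i / a) for i in range(1, a)]


def sax_word(series, w, a):
    """Convert a real valued series into a w-letter word over an a-letter alphabet."""
    cuts = breakpoints(a)
    letters = []
    for value in paa(znormalize(series), w):
        region = sum(1 for c in cuts if value >= c)  # region index 0 .. a-1
        letters.append(chr(ord("a") + region))
    return "".join(letters)


def mindist(word1, word2, n, a):
    """Symbolic distance that lower bounds the Euclidean distance between
    the z-normalized original series of length n."""
    cuts = breakpoints(a)

    def cell(r, c):
        i, j = ord(r) - ord("a"), ord(c) - ord("a")
        if abs(i - j) <= 1:  # equal or adjacent symbols: no guaranteed gap
            return 0.0
        # smallest possible gap between the two symbols' value ranges
        return cuts[max(i, j) - 1] - cuts[min(i, j)]

    w = len(word1)
    total = sum(cell(r, c) ** 2 for r, c in zip(word1, word2))
    return math.sqrt(n / w) * math.sqrt(total)


def euclidean(x, y):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


if __name__ == "__main__":
    x = [math.sin(i / 4) for i in range(64)]
    y = [math.cos(i / 5) + i / 64 for i in range(64)]
    wx, wy = sax_word(x, 8, 4), sax_word(y, 8, 4)
    lower = mindist(wx, wy, 64, 4)
    exact = euclidean(znormalize(x), znormalize(y))
    print(wx, wy, lower, exact)
    assert lower <= exact + 1e-9  # the lower-bounding property
```

As a usage example, `sax_word(list(range(64)), 8, 4)` turns a 64-point ramp into the 8-letter word `"aabbccdd"`. PAA is what provides the dimensionality reduction (64 values become 8 symbols). The Gaussian cut points rest on the assumption that z-normalized series have roughly normal value distributions, which makes each letter about equally likely.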

We will demonstrate the utility of our representation on the classic data mining tasks of clustering, classification, query by content and anomaly detection.

The most prominent problems arise from the high dimensionality of time series data.

Data representation.

1) How can the fundamental shape characteristics of a time series be represented?

2) What invariance properties should the representation satisfy?

3) A representation technique should derive the notion of shape by reducing the dimensionality of the data while retaining its essential characteristics.
