Wasserstein distance vs Dynamic Time Warping

Latest recommended article published on 2023-12-22 17:44:15

Utterly Bonkers

Tags: optimal transport, time series, time series analysis, dynamic time warping, dynamic programming

Article link: https://blog.csdn.net/utterly_bonkers/article/details/89411140

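Summary (generated automatically by CSDN): This post explores the use of Dynamic Time Warping (DTW) and the Wasserstein distance in time series analysis. DTW is a commonly used time series matching algorithm, while the Wasserstein distance is widely used in machine learning, especially for comparing probability distributions. Although both find similarities between series by warping coordinates, DTW does not satisfy the triangle inequality, so it is not a true distance metric, and it is stricter about preserving temporal order. Experiments show that DTW and Wasserstein give highly correlated results when comparing body motion data, but the two behave differently when the temporal order changes.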

This post is also published on my WordPress blog.

During my internship at the UCSF Neuroscape lab, I faced an important question: is there any difference between Dynamic Time Warping and the Wasserstein metric when both are applied to one-dimensional (time series) data? The end goal here is to find an algorithm that can reliably determine how similar the body motion of an older adult is to that of young adults, but the question of DTW vs. Wasserstein applies to all time series comparison problems.

Background

Time series data is everywhere: stock prices, temperature recorded over hours, and video are all time series, since each is a sequence of data points linked in time order. For stocks, the price points of each day are linked; the temperature readings of each hour are strung together; individual frames are linked chronologically to form videos.

Dynamic Time Warping, an algorithm based on dynamic programming, has been a leading time series analysis algorithm for several decades and is used in a wide array of applications. As the name suggests, DTW "warps" the time coordinates of the series in order to find similarities even when the two series are not entirely aligned - a very useful trait.

However, another family of algorithms has recently risen to prominence: Optimal Transport. In recent years, OT, and especially its Wasserstein distance, has become incredibly popular in machine learning, where it is employed in roles ranging from image search to the discriminator of Generative Adversarial Networks.

Although the Wasserstein distance (also called Earth Mover's Distance, or EMD) is mentioned almost exclusively in the context of ML topics, when applied to time series data it has an effect remarkably similar to that of DTW: it also warps coordinates to find similarities between series.
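To make the "warping" intuition concrete, here is a minimal sketch of a one-dimensional Wasserstein-1 distance. It rests on assumptions that this section does not state: each series is non-negative, is normalized to total mass 1, and is treated as a histogram over unit-spaced time indices. Under those assumptions, the distance equals the summed absolute difference between the two cumulative sums. The function name `wasserstein_1d` is hypothetical, and this is not necessarily how the comparison in this post was set up.

```python
from itertools import accumulate


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two series on the same unit-spaced grid.

    Assumption: both series are non-negative and are normalized here to
    total mass 1, i.e. treated as histograms over time indices.
    """
    if len(a) != len(b):
        raise ValueError("series must share the same time grid")
    total_a, total_b = sum(a), sum(b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each series needs positive total mass")
    # Cumulative distribution of mass along the time axis.
    cdf_a = accumulate(x / total_a for x in a)
    cdf_b = accumulate(x / total_b for x in b)
    # In one dimension, W1 is the area between the two cumulative curves.
    return sum(abs(p - q) for p, q in zip(cdf_a, cdf_b))


bump = [0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0]  # same bump, two time steps later
print(wasserstein_1d(bump, shifted))
```

In this sketch, the same bump moved two time steps later costs exactly two units of "mass times distance", so the value grows with the size of the shift instead of treating the two series as unrelated.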

All of this raises a question - exactly how similar are Dynamic Time Warping and ML's Wasserstein metric?

By examining the two algorithms and testing them on real-life data, we find that DTW is nearly a one-dimensional special case of the Wasserstein metric, but differs from it in two ways.

Dynamic Time Warping

Dynamic Time Warping is a wonderfully simple algorithm that uses a single two-dimensional array for the entire computation. Each cell holds the local cost of the current pair of points plus the minimum over three possible predecessor cases (advance, insertion, or deletion), and the final cell of the array is the answer. Python code for DTW is incredibly simple, as seen below:
[Figure: DTW code. The line containing the "min" operation is the crucial state update.]
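The recurrence described above can be sketched in a few lines of Python. This is a minimal sketch and not necessarily the listing from the original post: the function name `dtw_distance` and the use of the absolute difference as the local cost are assumptions, and which of the two single-step moves is called "insertion" or "deletion" is only a naming convention.

```python
def dtw_distance(s, t):
    """Classic O(len(s) * len(t)) DTW between two numeric sequences.

    Assumption: the local cost of pairing two points is |a - b|.
    """
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of s
    # with the first j points of t.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            # The crucial state update: local cost plus the best predecessor.
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both series
                cost[i - 1][j],      # advance in s only ("insertion")
                cost[i][j - 1],      # advance in t only ("deletion")
            )
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # the repeated 2 is absorbed by warping
print(dtw_distance([0, 0], [1, 1]))
```

In the first call the extra `2` in the second series is matched against the same point of the first series, so the warped alignment has zero cost even though the two series have different lengths.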