190721 | Summary of the two papers I read this week

听我蒋蒋

Tags: time series, DTW

Article link: https://blog.csdn.net/weixin_41267423/article/details/96841026


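The two papers:

1. Elastic bands across the path – A new framework method to lower bound DTW: proposes a new lower bound for DTW. For an implementation, see the fourth method in Section 2.2.
2. A Fast Semi-supervised Clustering Framework for Large-scale Time Series Data: proposes STSC, a semi-supervised time series clustering framework. Its two main components are a faster distance measure and a propagation strategy for pairwise constraints. Two algorithms, fss-kmeans and fss-DBSCAN, are built on it and evaluated experimentally on 11 datasets.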

1 Clustering evaluation methods

1.1 purity

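Purity assigns each cluster to the true class that occurs most often in it, then reports the fraction of all samples covered by those majority classes:

purity = (1/N) * sum over clusters k of max over classes j of |cluster_k ∩ class_j|

where N is the number of samples.

Python implementation: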

import numpy as np
from collections import Counter

def Purity(y_pred,labels):
    """
    :param y_pred: predicted cluster labels
    :param labels: true class labels
    :return: purity, a value in (0, 1]
    """
    length = len(labels)
    classIndex = np.unique(y_pred)
    classNum = len(classIndex)
    maxSum = 0
    for i in range(classNum):
        currIndex = classIndex[i]
        # collect the true labels of all samples assigned to this cluster
        currTrueLabels = []
        for j in range(length):
            if y_pred[j] == currIndex:
                currTrueLabels.append(labels[j])
        # size of the majority true class inside this cluster
        currMax = Counter(currTrueLabels).most_common(1)[0][1]
        maxSum = maxSum + currMax
    return maxSum / length
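
As a quick sanity check of the definition, here is a minimal, self-contained sketch with a small worked example. The name `purity_score` and the toy labels are only illustrative assumptions and are not taken from either paper.

```python
from collections import Counter

def purity_score(y_pred, labels):
    """Fraction of samples that belong to the majority true class of their cluster."""
    # Group the true labels by predicted cluster id.
    clusters = {}
    for cluster_id, true_label in zip(y_pred, labels):
        clusters.setdefault(cluster_id, []).append(true_label)
    # For every cluster, count how many members carry its most frequent true label.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels)

# Toy data (hypothetical): two clusters of three samples each.
y_pred = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "a"]
print(purity_score(y_pred, labels))
```

Each cluster here has two samples of its majority class, so the purity is (2 + 2) / 6. Note that purity reaches 1 trivially when every sample is its own cluster, so it is best compared between results with the same number of clusters.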

1.2 RandIndex

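RandIndex = (TP+TN)/(TP+FP+FN+TN)

The counts are taken over all pairs of samples:

TP: the pair truly belongs to the same class and is placed in the same cluster
TN: the pair truly belongs to different classes and is placed in different clusters
FP: the pair belongs to different classes but is placed in the same cluster
FN: the pair belongs to the same class but is placed in different clusters

Python implementation: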

def RandIndex(y_pred,labels):
    """
    :param y_pred: predicted label
    :param labels: true label
    :return: Rand index, a value in [0, 1]
    """
    length = len(labels)
    TP,TN,FP,FN = 0,0,0,0
    # visit every unordered pair (k1, k2) once
    for k1 in range(length-1):
        for k2 in range(k1+1,length):
            if y_pred[k1] == y_pred[k2] and labels[k1] == labels[k2]:
                # same class, placed in the same cluster
                TP = TP + 1
            elif y_pred[k1] != y_pred[k2] and labels[k1] != labels[k2]:
                # different classes, placed in different clusters
                TN = TN + 1
            elif y_pred[k1] == y_pred[k2] and labels[k1] != labels[k2]:
                # different classes, placed in the same cluster
                FP = FP + 1
            elif y_pred[k1] != y_pred[k2] and labels[k1] == labels[k2]:
                # same class, placed in different clusters
                FN = FN + 1
    return (TP+TN)/(TP+FP+FN+TN)
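
Since TP and TN are exactly the pairs on which the clustering and the ground truth agree, the Rand index can also be read as "fraction of agreeing pairs". The sketch below illustrates this on a toy example. The name `rand_index_pairs` and the data are illustrative assumptions.

```python
from itertools import combinations

def rand_index_pairs(y_pred, labels):
    """Rand index as the fraction of sample pairs on which both labelings agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = y_pred[i] == y_pred[j]
        same_class = labels[i] == labels[j]
        # TP: both True, TN: both False; FP and FN are the disagreements.
        if same_cluster == same_class:
            agree += 1
        total += 1
    return agree / total

# Cluster ids are arbitrary names, so a relabeled but identical partition scores 1.
print(rand_index_pairs([1, 1, 0, 0], [0, 0, 1, 1]))
# A partition that cuts across the true classes scores much lower.
print(rand_index_pairs([0, 0, 1, 1], [0, 1, 0, 1]))
```

In the second call, only 2 of the 6 pairs are treated consistently (both are TN), which gives 2/6.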

1.3 NMI

This article explains it in detail: https://blog.csdn.net/chengwenyao18/article/details/45913891
For code, see: https://blog.csdn.net/qq_34807908/article/details/86522731 (note: the inputs A and B must be numpy.array; testing with plain Python lists raises an error)
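
For readers who prefer not to follow the links, the following is a minimal sketch of one common NMI variant, NMI = 2*I(A;B) / (H(A) + H(B)). It is not the code from the linked posts, and the choice of normalization is an assumption, since other variants divide by the geometric mean or the maximum of the two entropies. The function names are illustrative. Inputs are converted with `np.asarray`, so lists work as well.

```python
import numpy as np

def label_entropy(x):
    """Shannon entropy (natural log) of a label vector."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def nmi(a, b):
    """Normalized mutual information, 2*I(A;B) / (H(A)+H(B)) (assumed variant)."""
    a = np.asarray(a)
    b = np.asarray(b)
    n = len(a)
    mutual_info = 0.0
    for ca in np.unique(a):
        for cb in np.unique(b):
            p_ab = np.sum((a == ca) & (b == cb)) / n  # joint probability
            if p_ab > 0:
                p_a = np.sum(a == ca) / n
                p_b = np.sum(b == cb) / n
                mutual_info += p_ab * np.log(p_ab / (p_a * p_b))
    denom = label_entropy(a) + label_entropy(b)
    # Convention assumed here: two single-cluster labelings count as identical.
    return 2.0 * mutual_info / denom if denom > 0 else 1.0

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, different names
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # statistically independent partitions
```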

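2 Similarity distance measures

Applied to time series, these measures tell how similar two time series are.

2.1 Euclidean distance (ED)

This one is simple and needs little explanation. Python implementation: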

import numpy as np
def distEclud(vecA, vecB):
    '''
    Input: vectors A and B (numpy arrays of equal length)
    Output: Euclidean distance between A and B
    '''
    return np.sqrt(np.sum(np.power(vecA - vecB, 2)))

2.2 DTW

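ED can only measure the similarity of time series of equal length, whereas DTW can also compare series of different lengths.

Reference: https://nbviewer.jupyter.org/github/alexminnaar/time-series-classification-and-clustering/blob/master/Time Series Classification and Clustering.ipynb

Below are Python implementations of DTW and of several successive improvements: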

import numpy as np

# 1. Plain DTW, O(m*n) time: minimum cumulative distance over all warping paths
def DWTDistance(s1,s2):
    DWT = {}
    # border cells: index -1 stands for "before the start of the series"
    for i in range(len(s1)):
        DWT[(i,-1)] = float('inf')
    for i in range(len(s2)):
        DWT[(-1,i)] = float('inf')
    DWT[(-1,-1)] = 0
    for i in range(len(s1)):
        for j in range(len(s2)):
            dist = (s1[i] - s2[j])**2
            DWT[(i,j)] = dist + min(DWT[(i-1,j)],DWT[(i,j-1)],DWT[(i-1,j-1)])
    return np.sqrt(DWT[(len(s1)-1,len(s2)-1)])

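# 2. Add a window w to speed things up
# Cells where i and j are far apart are skipped

def DWTDistance_window(s1,s2,w):
    DWT = {}
    m = len(s1)
    n = len(s2)
    # the window must be at least |m-n| wide, otherwise the end cell is unreachable
    w = max(w,abs(m-n))
    for i in range(-1,m):
        for j in range(-1,n):
            DWT[(i,j)] = float('inf')
    DWT[(-1,-1)] = 0

    for i in range(m):
        # i+w+1 so that j = i+w is included; without it the cell (m-1,n-1)
        # stays at inf when n - m equals w
        for j in range(max(0,i-w),min(n,i+w+1)):
            dist = (s1[i]-s2[j])**2
            DWT[(i,j)] = dist + min(DWT[(i-1,j)],DWT[(i,j-1)],DWT[(i-1,j-1)])
    return np.sqrt(DWT[(m-1,n-1)])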

# 3. Lower-bound method (LB_Keogh), O(n) time for a fixed window r
# Assumes s1 and s2 have the same length
def LB_Keogh(s1,s2,r):
    LB_sum = 0
    n = len(s2)
    for ind,i in enumerate(s1):
        # r is the half-width of the envelope window around position ind
        # (ind+r+1 so that position ind+r itself is inside the envelope)
        window = s2[max(0,ind-r):min(n,ind+r+1)]
        lower_bound = min(window)
        upper_bound = max(window)
        if i>upper_bound:
            LB_sum += (i-upper_bound)**2
        elif i<lower_bound:
            LB_sum += (i-lower_bound)**2
    return np.sqrt(LB_sum)

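# 4. LB_Enhanced, O(n) time, better than the previous bound
# The method proposed in the first paper read this week, Elastic bands across the path -- A new framework method to lower bound DTW
def LB_Enhanced(A,B,W,V,D):
    """
    a new method to lower bound DTW (works on squared distances)
    :param A: time series
    :param B: time series, same length as A
    :param W: warping window, e.g. W = 0.1 x L = 6, where L is the series length
    :param V: speed-tightness parameter, e.g. 20
    :param D: current (squared) distance to NN
    :return: lower bound on the squared DTW distance, or inf if the bands alone reach D
    """
    L = len(A)
    res = (A[0]-B[0])**2 + (A[-1]-B[-1])**2
    n_bands = min(L//2,V)
    # 1. left (L) and right (R) bands
    for i in range(2,n_bands+1):
        minL = (A[i-1]-B[i-1])**2
        minR = (A[L-i]-B[L-i])**2
        # j runs up to i-1 inclusive (1-based indices)
        for j in range(max(1,i-W),i):
            minL = min(minL,(A[i-1]-B[j-1])**2)
            minL = min(minL,(A[j-1]-B[i-1])**2)
            minR = min(minR,(A[L-i]-B[L-j])**2)
            minR = min(minR,(A[L-j]-B[L-i])**2)
        res = res + minL + minR
    if res >= D:
        return float('inf')
    # 2. LB_Keogh on the middle part, using the envelope of the full series B
    #    and squared terms so the units match the band costs above
    for i in range(n_bands,L-n_bands):
        window = B[max(0,i-W):min(L,i+W+1)]
        upper, lower = max(window), min(window)
        if A[i] > upper:
            res += (A[i]-upper)**2
        elif A[i] < lower:
            res += (A[i]-lower)**2
    return res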
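# 5. Fast similarity measure
# The faster computation proposed in paper 2, A Fast Semi-supervised Clustering Framework for Large-scale Time Series Data
# Only part of the code is shown; the helpers lb_keogh and ub_ED are not included here
def aDTW_calculate(C,D,beta):
    """
    :param C: 2-D matrix; cluster centers, one per row
    :param D: 2-D matrix; training data, one series per row
    :param beta: interpolation weight between the lower and upper bound
    :return: aDTW matrix of shape (k, n)
    """
    k = C.shape[0]
    n = D.shape[0]
    aDTW = np.zeros((k,n))
    for i in range(k):
        for j in range(n):
            LB = lb_keogh(C[i],D[j]) # lower bound
            UB = ub_ED(C[i],D[j]) # upper bound
            aDTW[i,j] = LB + beta*(UB-LB)
    return aDTW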
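
To see how the pieces above relate, here is a minimal, self-contained sketch that checks, for two equal-length series, that LB_Keogh <= windowed DTW <= ED, and then builds the interpolated value LB + beta*(UB-LB) used in aDTW_calculate. It is a sketch under assumptions rather than the papers' code: the lowercase helper names are illustrative, `ub_ed` is assumed to be the plain Euclidean distance, and the same window is used for the envelope and for DTW.

```python
import numpy as np

def dtw_window(s1, s2, w):
    """Windowed DTW on a (m+1) x (n+1) cost matrix; row/column 0 is the border."""
    m, n = len(s1), len(s2)
    w = max(w, abs(m - n))
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(max(1, i - w), min(n, i + w) + 1):
            d = (s1[i - 1] - s2[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[m, n]))

def lb_keogh(s1, s2, r):
    """LB_Keogh: distance from s1 to the [min, max] envelope of s2 within +-r."""
    n = len(s2)
    total = 0.0
    for i, x in enumerate(s1):
        window = s2[max(0, i - r):min(n, i + r + 1)]
        lo, hi = min(window), max(window)
        if x > hi:
            total += (x - hi) ** 2
        elif x < lo:
            total += (x - lo) ** 2
    return float(np.sqrt(total))

def ub_ed(s1, s2):
    """Upper bound (assumed): Euclidean distance, i.e. the cost of the diagonal path."""
    return float(np.sqrt(np.sum((np.asarray(s1) - np.asarray(s2)) ** 2)))

def adtw(s1, s2, r, beta):
    """Approximate DTW as a point between the lower and the upper bound."""
    lb = lb_keogh(s1, s2, r)
    ub = ub_ed(s1, s2)
    return lb + beta * (ub - lb)

# Toy data (hypothetical): the same bump, shifted by one time step.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]
print(lb_keogh(a, b, 1), dtw_window(a, b, 1), ub_ed(a, b), adtw(a, b, 1, 0.5))
```

With beta = 0 the approximation equals the lower bound and with beta = 1 it equals the upper bound, so beta controls where in that interval the estimate sits. Only the two cheap bounds are computed, which is what makes it faster than exact DTW.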

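3 Baseline clustering algorithms

3.1 k-means (ED)

Core idea: set one center series per cluster. In every iteration, compute the similarity distance between each training series and each of the k center series (there are as many centers as classes), and assign each training series to the cluster whose center is nearest. ED, the Euclidean distance, is the simplest and most common similarity distance.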

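(Setting the cluster centers:

Without any label information, the centers are initialized by drawing k samples at random from all the data, and are then updated after each iteration (as the mean of the data currently assigned to each cluster).
With labeled data, the center series of each class is the mean of the labeled data of that class.
)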
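
The procedure described above can be sketched as follows. This is a minimal illustration under assumptions, not the baseline code used in the paper: the function names, the convergence test, and the handling of empty clusters (keep the old center) are all choices made for this example.

```python
import numpy as np

def assign_to_nearest(X, centers):
    """Index of the nearest center (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def centers_from_labels(X_labeled, y_labeled):
    """With labeled data: one center per class, the mean of that class."""
    X_labeled = np.asarray(X_labeled, dtype=float)
    y_labeled = np.asarray(y_labeled)
    return np.array([X_labeled[y_labeled == c].mean(axis=0) for c in np.unique(y_labeled)])

def kmeans_ed(X, k, init_centers=None, n_iter=100, seed=0):
    """k-means with Euclidean distance on equal-length series (rows of X)."""
    X = np.asarray(X, dtype=float)
    if init_centers is None:
        # Without labels: draw k distinct samples at random as initial centers.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        assign = assign_to_nearest(X, centers)
        new_centers = centers.copy()
        for c in range(k):
            members = X[assign == c]
            if len(members) > 0:  # an empty cluster keeps its previous center
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign_to_nearest(X, centers), centers

# Toy data (hypothetical): two well-separated groups of short series.
X = [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [5, 5, 5], [5.1, 5, 5], [5, 5.1, 5]]
assignment, centers = kmeans_ed(X, k=2)
print(assignment)
```

When a few labeled series are available, `kmeans_ed(X, k, init_centers=centers_from_labels(X_labeled, y_labeled))` starts from the class means instead of random samples.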

3.2 DBSCAN (ED)

For the principle and a Python implementation, see: https://www.cnblogs.com/tiaozistudy/p/dbscan_algorithm.html
