Time Series Forecasting Series, Part 4

untilyouydc

Category: Time Series Forecasting

Article link: https://blog.csdn.net/qq_40774175/article/details/108076025

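Overview

This article covers the implementation of a dual-stage attention model, that is, a model that applies attention along two dimensions, temporal and spatial. For the details of the model itself, see the paper below.
Link: https://pan.baidu.com/s/1UePtoCnYcKXbb7eTwqvNiw
Extraction code: j1c6

The focus here is on the code, so the paper's content is not explained in depth. Fully understanding the paper takes several readings.

This is probably the hardest part of the time series forecasting series, so expect to spend some time reading both the code and the paper.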

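Core Idea of the Paper

Spatial attention: each stock's daily data contains several factors (open, close, high, low, ...). Our earlier experiments treated every factor as having the same influence on the prediction, but in practice their influence should differ. An attention mechanism is therefore applied over the factors, assigning each one a different weight.
In addition, different stocks influence one another. For example, to predict the closing price of the CSI 300 index, you can use the closing prices of some of its constituent stocks as driving series and give each constituent its own weight coefficient, since each constituent affects the index differently.
Temporal attention: as mentioned in Part 3, there are T time steps of information, but each time step should influence the final prediction differently, so attention is applied along the time axis as well.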

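Dataset

This experiment predicts the closing price of the CSI 300 index. The file tg.csv stores its closing prices. The date for each row has been lost, but each row is one day of historical data, and the values themselves are correct.
The file hs300.csv stores the historical closing prices of 67 constituent stocks, one day per row, and its rows are guaranteed to align with those in tg.csv. Both files can be used directly.

In other words, this article does not consider factors such as the high, the low, or the trading volume. The purpose of spatial attention here is to assign different weights to different constituent stocks.
Link: https://pan.baidu.com/s/1qrr5PxWVNedukv4r6Nx6Kw
Extraction code: je8h

Link: https://pan.baidu.com/s/1ydKUSQFL072j0mG4Z1V3GA
Extraction code: c6rq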
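
The article does not show how the two series are cut into training samples. The sketch below is one plausible layout, assumed here rather than taken from the author's code: each sample holds T days of the driving series, the first T-1 days of the target as history, and the target on day T as the label. The function name make_windows and the random stand-in data are hypothetical.

```python
import numpy as np

def make_windows(driving, target, T):
    """Cut aligned series into sliding windows (assumed layout, not the author's).

    driving: (N, n) array, e.g. closing prices of the constituent stocks
    target:  (N,) array, e.g. closing prices of the index
    Returns X (num, T, n), Y_hist (num, T-1, 1) and y (num,).
    """
    num = len(target) - T + 1
    X = np.stack([driving[i:i + T] for i in range(num)])          # driving windows
    Y_hist = np.stack([target[i:i + T - 1] for i in range(num)])  # target history
    y = target[T - 1:]                                            # value to predict
    return X, Y_hist[..., None], y

# Random stand-in data with the same number of columns as hs300.csv.
rng = np.random.default_rng(0)
driving = rng.normal(size=(100, 67))
target = rng.normal(size=100)

X, Y_hist, y = make_windows(driving, target, T=10)
print(X.shape, Y_hist.shape, y.shape)
```

With this layout the decoder can read Y_hist[:, t, :] for t from 0 to T-2, which matches how the decoder code later in the article indexes Y.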

Data Preprocessing

As in the previous parts, the data only needs to be standardized.

import numpy as np

# Standardize each column to zero mean and unit variance.
# Note: float_data is modified in place and also returned.
def normal(float_data):
    print(float_data.shape)
    mean = float_data.mean(axis=0)
    print(mean)
    float_data -= mean
    std = float_data.std(axis=0)
    print(std)
    float_data /= std
    return float_data

def get_data(data_path):
    # Use a context manager so the file is always closed.
    with open(data_path) as f:
        data = f.read()
    lines = data.split('\n')
    header = lines[0].split(',')
    lines = lines[1:]
    print(len(header))
    float_data = np.zeros((len(lines), len(header)))
    for i, line in enumerate(lines):
        # The last element is skipped; it is assumed to be the empty
        # string left after the trailing newline.
        if i == len(lines) - 1:
            break
        # Rows containing 'None' are left as zeros, which keeps the row
        # indices aligned between files.
        if 'None' in line.split(','):
            continue
        float_data[i] = [float(x) for x in line.split(',')]
    return float_data
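
The normal function standardizes in place and does not return the mean and standard deviation, which are needed to map a predicted value back to an actual price. The following is a minimal sketch of a variant that keeps them. The names standardize and destandardize are hypothetical and not part of the original code, and the price data is random stand-in data.

```python
import numpy as np

def standardize(data):
    """Return a standardized copy plus the per-column mean and std."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    return (data - mean) / std, mean, std

def destandardize(scaled, mean, std):
    """Invert standardize, e.g. to turn model outputs back into prices."""
    return scaled * std + mean

# Random stand-in for a single column of closing prices.
rng = np.random.default_rng(0)
prices = rng.normal(loc=3000.0, scale=150.0, size=(200, 1))

scaled, mean, std = standardize(prices)
restored = destandardize(scaled, mean, std)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Unlike normal, this version leaves the input array untouched, so the raw prices remain available for comparison.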

Encoder

Spatial attention is implemented in the encoder, in two steps: first compute attention_weight, then apply the weights.

def one_encoder_attention_step(h_prev,s_prev,X):
    '''
    :param h_prev: previous hidden state of the encoder LSTM
    :param s_prev: previous cell state (s in the paper)
    :param X: (T,n) input window; T is the number of time steps, n is the number of driving series at each step
    :return: attention weights for x_t; n values that sum to 1
    '''
    concat = Concatenate()([h_prev,s_prev])  # (None,2m): concatenate along the last axis, m+m=2m
    result1 = en_densor_We(concat)   # (None,T): dense layer, equivalent to We*[h_prev;s_prev]
    result1 = RepeatVector(X.shape[2],)(result1)  # (None,n,T): repeat n times
    X_temp = My_Transpose(axis=(0,2,1))(X) # (None,n,T): swap the last two axes; Permute((2,1))(X) also works
    result2 = Dense_no_bias(T)(X_temp)  # (None,n,T): X_temp times Ue, where Ue is T*T
    result3 = Add()([result1,result2])  # (None,n,T): We*[h_prev;s_prev] + Ue*X
    result4 = Activation(activation='tanh')(result3)  # (None,n,T)

    result5 = Dense_no_bias(1)(result4) # (None,n,1): Ve applied to each of the n rows
    result5 = My_Transpose(axis=(0,2,1))(result5) # (None,1,n)
    print('result5 ',result5)
    alphas = Activation(activation='softmax')(result5) # softmax over the n series

    return alphas

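The weights above are computed additively (two projections are summed, passed through tanh, then mapped to a score) rather than with a dot product. The paper does not use dot-product scoring, presumably to keep the scores from becoming too large or too small.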
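
To make the additive scoring concrete, here is a minimal NumPy sketch of what one_encoder_attention_step computes for a single sample. It follows the shapes in the code comments (We, Ue, Ve), but the function name, the random parameters, and the sizes T=10, n=67, m=20 are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def additive_input_attention(h_prev, s_prev, X, W_e, U_e, v_e):
    """Additive attention over the n driving series for one sample.

    h_prev, s_prev: (m,) encoder hidden and cell state
    X: (T, n) input window; column k is the k-th driving series
    W_e: (T, 2m), U_e: (T, T), v_e: (T,)
    Returns n weights that sum to 1.
    """
    state_term = W_e @ np.concatenate([h_prev, s_prev])        # (T,)
    series_term = U_e @ X                                      # (T, n)
    scores = v_e @ np.tanh(state_term[:, None] + series_term)  # (n,)
    return softmax(scores)

rng = np.random.default_rng(0)
T, n, m = 10, 67, 20
alphas = additive_input_attention(
    rng.normal(size=m), rng.normal(size=m), rng.normal(size=(T, n)),
    rng.normal(size=(T, 2 * m)), rng.normal(size=(T, T)), rng.normal(size=T),
)
print(alphas.shape, alphas.sum())
```

Because tanh bounds every term to [-1, 1] before the final projection, the scores stay in a limited range regardless of the scale of the inputs, which is consistent with the motivation suggested above.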

def encoder_attention(T,X,s0,h0):

    s = s0
    h = h0
    print('s:', s)
    # Accumulates the per-step attention weights, one row per time step.
    attention_weight_t = None
    for t in range(T):
        print('X:', X)
        context = one_encoder_attention_step(h,s,X)  # (None,1,n): attention weights at step t
        print('context:',context)
        # Take the data at time step t; t is bound as a default argument
        # so the lambda does not depend on the loop variable later.
        x = Lambda(lambda z, t=t: z[:,t,:])(X)
        x = Reshape((1,x.shape[1]))(x) # reshape to (None,1,n)
        print('x:',x)
        h, _ , s = en_LSTM_cell(x, initial_state=[h, s])
        if t!=0:
            print('attention_weight_t:',attention_weight_t)
            # Name the final concatenation so the weights can be retrieved from the model.
            # In recent Keras versions, Concatenate(axis=1)([attention_weight_t, context]) also works.
            if t==T-1:
                attention_weight_t  = Lambda(lambda x:  K.concatenate([x[0], x[1]], axis=1),name='attention_weight_local')([attention_weight_t, context])
            else:
                attention_weight_t  = Lambda(lambda x:  K.concatenate([x[0], x[1]], axis=1))([attention_weight_t, context])
            print(attention_weight_t)
        else:
            attention_weight_t = context
        print('h:', h)
        print('_:', _)
        print('s:', s)
        print('t', t)

    X_ = Multiply()([attention_weight_t,X]) # element-wise weighting, (None,T,n)
    print('return X:',X_)
    return X_

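The code above is straightforward: each sample has T time steps, and each time step holds the closing prices of the 67 constituent stocks, so the result is a T×67 weight matrix, which is then multiplied element-wise with the original data X (also T×67).

I recommend reading the code alongside the formulas in the paper, paying particular attention to the dimensions of each matrix.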
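
The weighting step can be checked on a single sample with NumPy. This is only a shape illustration using random stand-in scores, not the trained model: each row of the weight matrix is a softmax over the 67 stocks, and the result has the same shape as X.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 10, 67

# Stand-in for the stacked per-step attention weights: one softmax row per time step.
scores = rng.normal(size=(T, n))
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
weights = exp_scores / exp_scores.sum(axis=1, keepdims=True)  # (T, n), rows sum to 1

X = rng.normal(size=(T, n))  # one standardized input window
X_weighted = weights * X     # same operation as Multiply() in the Keras code

# Index of the stock with the largest average weight across the window.
top_stock = int(weights.mean(axis=0).argmax())
print(X_weighted.shape, top_stock)
```

Averaging the weights over time, as in the last lines, is one simple way to see which constituent the model attends to most.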

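Decoder

The decoder reads in the CSI 300 closing prices and, at the same time, computes weight coefficients over the T time steps of information coming from the encoder.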

def one_decoder_attention_step(h_de_prev,s_de_prev,h_en_all,t):
    '''
    :param h_de_prev: previous decoder hidden state
    :param s_de_prev: previous decoder cell state
    :param h_en_all: (None,T,m) encoder hidden states; m is the hidden size, T is the number of time steps
    :param t: index of the current decoder step, used only to name one softmax layer
    :return: context vector, the encoder hidden states combined with T attention weights that sum to 1
    '''
    print('h_en_all:',h_en_all)
    concat = Concatenate()([h_de_prev,s_de_prev])  # (None,2p)
    result1 = de_densor_We(concat)   # (None,m)
    result1 = RepeatVector(T)(result1)  # (None,T,m)
    result2 = Dense_no_bias(m)(h_en_all) # (None,T,m): an m*m matrix applied to each of the T states; or simply Dense(m)(h_en_all)
    print('result2:',result2)
    print('result1:',result1)
    result3 = Add()([result1,result2])  # (None,T,m)
    result4 = Activation(activation='tanh')(result3)  # (None,T,m)
    result5 = Dense_no_bias(1)(result4) # (None,T,1): one score per time step
    print('result5, ',result5.shape)
    result5 = Reshape((1,result5.shape[1]))(result5) # (None,1,T)
    print('result5_new ',result5)
    # Name the softmax of the last loop iteration (t == T-2) so the temporal weights can be retrieved from the model.
    if t==T-2:
        beta = Activation(activation='softmax' ,name = 'attention_weight_time')(result5)
    else:
        beta = Activation(activation='softmax')(result5)
    beta = Reshape((beta.shape[2],1))(beta) # (None,T,1)
    context = Dot(axes = 1)([beta,h_en_all])  # (None,1,m): weighted sum of the T hidden states
    return context

def decoder_attention(T,h_en_all,Y,s0,h0):
    s = s0
    h = h0
    for t in range(T-1):
        print(t)
        # Y is the input history of the target series; t is bound as a default
        # argument so the lambda does not depend on the loop variable later.
        y_prev = Lambda(lambda z, t=t: z[:, t, :])(Y)
        y_prev = Reshape((1, y_prev.shape[1]))(y_prev)   # (None,1,1)
        print('y_prev:',y_prev)
        context = one_decoder_attention_step(h,s,h_en_all,t)  # (None,1,m), with m=20 here
        y_prev = Concatenate(axis=2)([y_prev,context])   # (None,1,m+1), i.e. 21
        print('y_prev:',y_prev)
        # Note: calling Dense(1) here creates a new layer at every step,
        # so its weights are not shared across time steps.
        y_prev = Dense(1)(y_prev)       # (None,1,1): w*[y;c]
        print('y_prev:',y_prev)
        h, _, s = de_LSTM_cell(y_prev, initial_state=[h, s])
        print('h:', h)
        print('_:', _)
        print('s:', s)

    context = one_decoder_attention_step(h, s, h_en_all,T-1) # final context vector, (None,1,m)
    return h,context # h is the last decoder hidden state
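
As with the encoder, the temporal attention step can be sketched in NumPy for a single sample. The structure mirrors one_decoder_attention_step (additive scores, softmax over the T steps, weighted sum of encoder states), but the function name, the parameter names W_d, U_d, v_d, the random values, and the decoder hidden size p=16 are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def temporal_attention(d_prev, s_prev, H, W_d, U_d, v_d):
    """Additive attention over the T encoder hidden states for one sample.

    d_prev, s_prev: (p,) decoder hidden and cell state
    H: (T, m) encoder hidden states
    W_d: (m, 2p), U_d: (m, m), v_d: (m,)
    Returns beta (T,), which sums to 1, and the context vector (m,).
    """
    state_term = W_d @ np.concatenate([d_prev, s_prev])  # (m,)
    scores = np.tanh(state_term + H @ U_d.T) @ v_d       # (T,)
    beta = softmax(scores)
    context = beta @ H                                   # weighted sum of rows of H
    return beta, context

rng = np.random.default_rng(0)
T, m, p = 10, 20, 16
H = rng.normal(size=(T, m))
beta, context = temporal_attention(
    rng.normal(size=p), rng.normal(size=p), H,
    rng.normal(size=(m, 2 * p)), rng.normal(size=(m, m)), rng.normal(size=m),
)
print(beta.shape, context.shape)
```

The context vector has the same size as one encoder hidden state, which is why the decoder can concatenate it with the previous target value at each step.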

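Overall, I found this paper very impressive at first, but once I really understood it, I came to feel it was probably written mainly for the sake of getting a paper published.

This article is somewhat brief, because much of the analysis is hard to put into words; you need to work through it with the paper in hand.
If you find any problems, please leave a comment.
The complete code is provided below in two formats. The .ipynb file also shows the run results.
Link: https://pan.baidu.com/s/1QxSclUxTW1T0FRoo5Rz3hg
Extraction code: hlpn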
