The Most Down-to-Earth Explanation and Implementation of RNNs: The Sine Sequence

LoveMIss-Y

Article link: https://blog.csdn.net/qq_27825451/article/details/102457054

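Note: this article is intended for readers who already have some background in neural networks.

I. RNN Data Preprocessing

Data preprocessing is indispensable for deep neural networks. However, because this article uses a sine curve mainly to explain how an RNN works and how it is implemented, the usual preprocessing steps are skipped; the focus is on how the data should be organized.

1.1 The Preprocessing Procedure

1.1.1 The Raw Time Series

Start with a sine sequence of 12,100 points: the first 11,000 points serve as the training set and the last 1,100 points as the test set, as shown in the figure below:

[Figure: plot of the sine sequence, training portion followed by test portion]

The blue part at the front is the training data and the orange part at the back is the test data. Noise could of course be added to each point, but none is added in this article. The first few values of the data are: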

[0. 0.00909161 0.01818247 0.02727182 0.03635893 0.04544302 0.05452336 0.0635992 0.07266977 0.08173434 0.09079216 0.09984246 0.10888452 0.11791757 0.12694089 0.1359537 ...]

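Three of these values were highlighted in green in the original post: the 11th, 12th, and 13th (0.09079216, 0.09984246, and 0.10888452). The reason is explained below.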

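1.1.2 Organizing the Data

Before a recurrent neural network can be trained, the data has to be converted into a specific structure. Anyone who has used Keras will have seen examples that require the input to have the following format:

X: [batch_size, time_steps, input_size]

Y: [batch_size, 1]

Why is that, and what does each dimension mean?

batch_size is the number of samples.

time_steps is the time span, and it is the core idea of the whole RNN. For example, time_steps=10 means that the current state depends on the previous 10 states; exactly what that dependency looks like is what the RNN learns during training.

input_size is the dimensionality of each data point, also called the number of features (what a "feature" means in deep learning is not covered here). In this article each data point is a single number, so input_size is 1.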

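Tutorials and references describe this layout with slightly different names. Some common variants are:

[batch, time_steps, input_size]

[examples, time_steps, vector_size]

[batch, time_steps, features]

[batch, max_time, features]

[batch, time_length, features]

These all mean the same thing; the key dimension is time_steps.

In this example, the reorganized data has the following shapes:

Shapes of the reorganized training and test data

(10990, 10, 1) # training data

(1090, 10, 1) # test data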
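To make the [batch_size, time_steps, input_size] layout concrete, here is a minimal NumPy sketch. The toy values and the variable name `windows` are made up for illustration and are not taken from the article's data.

```python
import numpy as np

# Hypothetical toy data: 4 samples, each a window of 3 consecutive scalar readings.
windows = np.array([[0.0, 0.1, 0.2],
                    [0.1, 0.2, 0.3],
                    [0.2, 0.3, 0.4],
                    [0.3, 0.4, 0.5]], dtype=np.float32)

# Add a trailing feature axis so the layout is [batch_size, time_steps, input_size].
X = windows.reshape(-1, 3, 1)
batch_size, time_steps, input_size = X.shape
print(batch_size, time_steps, input_size)  # 4 3 1

# X[i, t] is the feature vector (here a single number) of sample i at time step t.
print(X[1, 2])
```

Whatever a tutorial calls the three axes, the first indexes the sample, the second the position in time, and the third the feature.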


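Why 10990 and 1090 rather than 11000 and 1100? Because reorganizing the data groups every time_steps consecutive elements together. What the network tries to learn is:

the functional relationship between x11 and x10, x9, x8, x7, x6, x5, x4, x3, x2, x1

the functional relationship between x12 and x11, x10, x9, x8, x7, x6, x5, x4, x3, x2

the functional relationship between x13 and x12, x11, x10, x9, x8, x7, x6, x5, x4, x3, and so on, up to

the relationship between x11000 and x10999, x10998, x10997, x10996, x10995, x10994, x10993, x10992, x10991, x10990.

This is why three numbers were highlighted in green earlier: they are the output values of the first three samples. It also explains why only 10990 and 1090 samples remain after reorganization: the first time_steps = 10 points of each sequence have no full window of predecessors, so they cannot be used as targets. The reorganized samples are not printed here because there is too much data; the code below prints the first three.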
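The windowing described above can be checked on a tiny sequence. This is a minimal sketch, not the article's own function: `make_windows` is a hypothetical helper, and the integers 1 to 15 stand in for x1 to x15.

```python
import numpy as np

def make_windows(seq, time_steps):
    """Return (X, Y): X[i] holds seq[i : i + time_steps], Y[i] is the value right after it."""
    n_samples = len(seq) - time_steps
    X = np.stack([seq[i:i + time_steps] for i in range(n_samples)])
    Y = np.array([seq[i + time_steps] for i in range(n_samples)])
    # Shape X as [n_samples, time_steps, 1] and Y as [n_samples, 1].
    return X.reshape(n_samples, time_steps, 1), Y.reshape(n_samples, 1)

# Toy sequence 1..15 standing in for x1..x15 (illustrative values, not the sine data).
seq = np.arange(1, 16, dtype=np.float32)
X, Y = make_windows(seq, time_steps=10)

print(X.shape, Y.shape)       # (5, 10, 1) (5, 1)
print(X[0, :, 0])             # the window x1..x10
print(Y[:3, 0])               # x11, x12, x13: targets of the first three samples
print(11000 - 10, 1100 - 10)  # 10990 1090
```

A sequence of length N always yields N - time_steps samples, which is where 10990 and 1090 come from.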

The code implementing the two steps above is as follows:

# Note: this code targets TensorFlow 1.x (tf.contrib was removed in TensorFlow 2.x)
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn
import matplotlib.pyplot as plt


time_steps=10
batch_size=128
cell_units=5
learning_rate=0.001
epoch=150

train_examples=11000
test_examples=1100

#------------------------------------Generate the data-----------------------------------------------#
def generate(seq):
    # Each sample is a window of time_steps values; its target is the value right after the window
    X=[]
    Y=[]
    for i in range(len(seq)-time_steps):
        X.append([seq[i:i+time_steps]])
        Y.append([seq[i+time_steps]])
    return np.array(X,dtype=np.float32),np.array(Y,dtype=np.float32)

print('The raw time series:')
seq_train=np.sin(np.linspace(start=0,stop=100,num=train_examples,dtype=np.float32))
print(seq_train.shape)
print(seq_train[0:50])
seq_test=np.sin(np.linspace(start=100,stop=110,num=test_examples,dtype=np.float32))
print(seq_test.shape)

# plt.plot(np.linspace(start=0,stop=100,num=11000,dtype=np.float32),seq_train)
# plt.plot(np.linspace(start=100,stop=110,num=1100,dtype=np.float32),seq_test)
# plt.show()

X_train,Y_train=generate(seq_train)
print('Shape of the training data:')
print(X_train.shape,Y_train.shape,sep='          ')
X_test,Y_test=generate(seq_test)
print('Shape of the test data:')
print(X_test.shape,Y_test.shape,sep='          ')

# Reshape to (batch, time_steps, input_size)
print('Shapes of the reorganized training and test data')
X_train=np.reshape(X_train,newshape=(-1,time_steps,1))
print(X_train.shape)
X_test=np.reshape(X_test,newshape=(-1,time_steps,1))
print(X_test.shape)

# Print the first three samples and their targets
print(X_train[0:3,:,:])
print('===============================================')
print(Y_train[0:3,:])

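The output is:

The raw time series:
(11000,)
(1100,)
Shape of the training data:
(10990, 1, 10) (10990, 1)
Shape of the test data:
(1090, 1, 10) (1090, 1)
Shapes of the reorganized training and test data
(10990, 10, 1)
(1090, 10, 1)
The first three samples
**** omitted here ****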

II. Building the RNN Structure

2.1 Construction Steps

2.1.1 Implementation Code

#------------------------------------Build the RNN graph------------------------------------------#
# Input and output placeholders, plus the weight matrix and bias of the output layer
XX=tf.placeholder(dtype=tf.float32,shape=(None,time_steps,1),name="input_placeholder")
YY=tf.placeholder(dtype=tf.float32,shape=(None,1),name="pred_placeholder")
W=tf.Variable(tf.random_normal(shape=[cell_units,1],stddev=0.01))
b=tf.Variable(tf.random_normal(shape=[1],stddev=0.01))

# The most basic recurrent cell
cell=rnn.BasicRNNCell(num_units=cell_units)

# Initial state, with shape [batch_size, cell_units]
init_state=cell.zero_state(batch_size=batch_size,dtype=tf.float32)

# dynamic_rnn with the default time_major=False, so XX has shape [batch_size, time_steps, input_dim]
'''
   Note: outputs has shape [batch_size, time_steps, cell_units]; see the explanation in the article.
         states has shape [batch_size, cell_units]
'''
outputs,states=tf.nn.dynamic_rnn(cell=cell,inputs=XX,initial_state=init_state,dtype=tf.float32)

# Output layer with a single node. Its input comes from the RNN, but outputs is 3-D, so take the
# last time step outputs[:,-1,:] (2-D, [batch_size, cell_units]) before multiplying by W
Y_pred=tf.matmul(outputs[:,-1,:],W)+b
#--------------------------------------------------------------------------------------------#
#---------------------------------Define the loss and the optimizer----------------------------------#
loss=tf.losses.mean_squared_error(labels=YY,predictions=Y_pred) # loss function
optimizer=tf.train.AdamOptimizer(learning_rate).minimize(loss=loss) # optimizer

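To keep things simple, this article uses a single recurrent layer. Networks with several recurrent layers follow roughly the same steps:

Step 1: Define the input X, the output Y, and any weight matrices that must be initialized by hand. The shared matrices U, W, and V of the RNN do not need to be initialized manually; they are encapsulated and created automatically.

Step 2: Choose the recurrent layer and its number of nodes; it can be an RNN layer, an LSTM layer, a GRU layer, and so on.

Step 3: Initialize the initial state of the recurrent layer.

Step 4: Run the layer over time. There are several ways to do this: write the loop yourself, use the static_rnn function, or use the dynamic_rnn function, as discussed later.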
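For reference, the computation these steps set up can be written out as follows. This assumes the common textbook convention in which U maps the input to the hidden state, W maps the previous hidden state to the current one, and V maps the final hidden state to the prediction; the bias symbols b_h and b_y are introduced here for illustration. In this article's code, the role of V and b_y is played by the hand-initialized W and b of the output layer, and tanh is the default activation of BasicRNNCell.

```latex
\begin{aligned}
h_t &= \tanh\left(U x_t + W h_{t-1} + b_h\right), \qquad t = 1, \dots, T \\
\hat{y} &= V h_T + b_y
\end{aligned}
```

Here T corresponds to time_steps, h_0 is the initial state, and the same U and W are reused at every time step.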

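2.2 Explanation of the Structure

2.2.1 What Is a Cell

Put simply, a cell is a recurrent layer, not just a single node. When creating a recurrent layer you pass in a num_units parameter, which is the number of nodes in that layer. Many people get confused here, because whether in slides, textbooks, papers, or blog posts, the RNN diagram we see most often is the one below.

[Figure: the commonly seen unrolled RNN diagram]

So we assume that the number of nodes in the recurrent layer must match the dimensionality of the input sample X: if each input Xt is a single number, doesn't each "circle" in the diagram represent a single node?

That understanding is completely wrong. The diagram itself is not incorrect, but it is so abstract that it is easy to misread. It is a compact summary showing a "dynamic computation process", not the actual network structure. So what does the network structure really look like?

Here is a borrowed figure that gives a rough picture:

[Figure: borrowed diagram of the recurrent layer structure]

This one is much easier to read, and it is what a recurrent layer actually looks like. Below are two rough sketches I drew myself (drawing proper figures is slow and time was short, so sketches will have to do).

[Figure: the author's two hand-drawn sketches of the recurrent layer]

The sketches also list some important conclusions, so they are not repeated here in text.

Section summary

As the analysis above shows, num_units is the number of nodes in the recurrent layer, and it can be chosen freely.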
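To illustrate that num_units is the width of the recurrent layer rather than something tied to the input dimension, here is a minimal NumPy sketch of a single time step of a vanilla RNN layer. The weight names `U`, `W`, `b` and the helper `rnn_cell_step` are assumptions for illustration; a framework cell creates and trains such weights internally.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, input_size, num_units = 4, 1, 5   # illustrative sizes

# Hypothetical weights; a framework cell would create and train these internally.
U = rng.normal(scale=0.1, size=(input_size, num_units))   # input  -> hidden
W = rng.normal(scale=0.1, size=(num_units, num_units))    # hidden -> hidden
b = np.zeros(num_units)

def rnn_cell_step(x_t, h_prev):
    """One time step of a vanilla RNN layer: every sample gets num_units activations."""
    return np.tanh(x_t @ U + h_prev @ W + b)

x_t = rng.normal(size=(batch_size, input_size))   # one scalar per sample at time t
h_prev = np.zeros((batch_size, num_units))        # zero initial state
h_t = rnn_cell_step(x_t, h_prev)
print(h_t.shape)  # (4, 5)
```

Even though each input is a single number, the layer produces num_units = 5 values per sample at every time step.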

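2.2.2 Some Important Data Dimensions

The code above involves three important quantities:

initial_state

outputs

states

initial_state: the initial state of the recurrent layer. It can be defined in two ways.

First:

cell.zero_state(batch_size=batch_size,dtype=tf.float32)

Second:

initial_state=tf.zeros(shape=[batch_size,cell.state_size],name='state')

Here cell.state_size is the user-defined number of nodes, i.e. num_units.

So initial_state has shape [batch_size, num_units], even though the sketches show the state of the recurrent layer with shape (num_units,).

The reason is that the batch_size of each training step has to be fixed when the network is defined, so the state for one batch of batch_size samples has shape [batch_size, num_units].

outputs: its shape is [batch_size, time_steps, num_units].

As before, the output at each time t does have shape (num_units,), but there are time_steps time steps and batch_size samples.

states: the state after the update, so its shape is the same as that of initial_state.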
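These shapes can be reproduced with a hand-written loop. The following is a NumPy sketch under assumed random weights (`U`, `W`, `b`, and the output matrix `V` are illustrative names); it mimics the shapes involved, not TensorFlow's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, time_steps, input_size, num_units = 128, 10, 1, 5

# Hypothetical weights of a vanilla RNN layer.
U = rng.normal(scale=0.1, size=(input_size, num_units))
W = rng.normal(scale=0.1, size=(num_units, num_units))
b = np.zeros(num_units)

X = rng.normal(size=(batch_size, time_steps, input_size))
initial_state = np.zeros((batch_size, num_units))

state = initial_state
per_step = []
for t in range(time_steps):
    # Same weights at every step; only the state changes.
    state = np.tanh(X[:, t, :] @ U + state @ W + b)
    per_step.append(state)

outputs = np.stack(per_step, axis=1)
print(initial_state.shape)  # (128, 5)
print(outputs.shape)        # (128, 10, 5)
print(state.shape)          # (128, 5)

# For a plain RNN the final state equals the output at the last time step.
print(np.allclose(outputs[:, -1, :], state))  # True

# Output layer: last-step output -> one predicted value per sample.
V = rng.normal(scale=0.01, size=(num_units, 1))
y_pred = outputs[:, -1, :] @ V
print(y_pred.shape)  # (128, 1)
```

This also shows why the article's graph slices outputs[:, -1, :] before the final matrix multiplication: only the last time step feeds the single output node.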

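2.2.3 Dynamic Training

Because the cell implements the __call__ method, the cell object can be called directly. However, a single call only computes one time step: in terms of the hand-drawn sketches above, it only performs the first one. To keep iterating you would have to write the loop yourself, which is inconvenient. TensorFlow provides two functions that do this for you, static_rnn and dynamic_rnn:

1. static_rnn()

· tf.contrib.rnn.static_rnn # the form used in version 1.9

· tf.nn.static_rnn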

tf.nn.static_rnn(

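Args:

· cell: An instance of RNNCell.

· inputs: The inputs must be converted to the time-major layout [time_steps, batch_size, input_size]; more precisely, static_rnn takes a list of length time_steps whose elements are tensors of shape [batch_size, input_size]. Since the raw inputs usually come as [batch_size, time_steps, input_size], the dimensions have to be rearranged first.

Why is this conversion needed here? It has to do with how static_rnn is implemented internally; dynamic_rnn, discussed later, does not require it.

This ties in with the unrolled RNN diagram shown earlier.

· initial_state: shape [batch_size, cell_units].

Explanation: for each individual sample, the initial state has as many values as there are nodes in the recurrent layer; with 5 nodes, the state is 5 numbers. But because 128 samples are trained at once, each sample has its own initial state, giving 128 groups of 5 numbers, i.e. [128, 5].

Returns:

· outputs: a list of length time_steps, each element of which is a tensor of shape [batch_size, cell_units]. It can therefore be viewed as a tensor of shape [time_steps, batch_size, cell_units].

· state: the same as the initial state, still [batch_size, cell_units], with the same meaning.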
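The rearrangement that static_rnn needs can be illustrated with NumPy. This is only a sketch of the axis manipulation on a placeholder array; in TensorFlow 1.x the analogous calls would typically be tf.transpose(XX, perm=[1, 0, 2]) or tf.unstack(XX, axis=1), and it is an assumption that one of these matches what the original article used.

```python
import numpy as np

batch_size, time_steps, input_size = 128, 10, 1
# Placeholder batch in the usual [batch_size, time_steps, input_size] layout.
X = np.zeros((batch_size, time_steps, input_size), dtype=np.float32)

# Option 1: swap the first two axes -> one [time_steps, batch_size, input_size] array.
X_time_major = np.transpose(X, (1, 0, 2))
print(X_time_major.shape)  # (10, 128, 1)

# Option 2: split along the time axis -> a list of time_steps arrays,
# each shaped [batch_size, input_size], one per time step.
per_step_inputs = [X[:, t, :] for t in range(time_steps)]
print(len(per_step_inputs), per_step_inputs[0].shape)  # 10 (128, 1)
```

The list form makes the unrolling explicit: the layer consumes one [batch_size, input_size] slice per time step.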

2. dynamic_rnn

tf.nn.dynamic_rnn(

It is defined in tensorflow/python/ops/rnn.py, the same rnn.py module in which static_rnn is defined.

@tf_export("nn.dynamic_rnn")
def dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None,
                dtype=None, parallel_iterations=None, swap_memory=False,
                time_major=False, scope=None):

@tf_export("nn.static_rnn")
def static_rnn(cell,
               inputs,
               initial_state=None,
               dtype=None,
               sequence_length=None,
               scope=None):

As the definitions above show, both functions can be called as tf.nn.xxx_rnn(), or, after

from tensorflow.python.ops import rnn

as rnn.xxx_rnn(). The two are equivalent.

Args:

· cell: An instance of RNNCell.

· inputs: The RNN inputs.

If time_major == False (default), this must be a Tensor of shape [batch_size, max_time, ...]. time_major is a keyword argument that already defaults to False.

If time_major == True, this must be a Tensor of shape [max_time, batch_size, ...].

So in practice we can simply use the [batch_size, max_time, vector_size] layout directly.

· initial_state: shape [batch_size, cell.state_size], which is the same as the [batch_size, cell_units] above, because cell.state_size is cell_units.

· time_major: The shape format of the inputs and outputs Tensors. If true, these Tensors must be shaped [max_time, batch_size, depth]. If false, these Tensors must be shaped [batch_size, max_time, depth]. Using time_major = True is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form. (This is the key point.)

Returns:

A pair (outputs, state) where:

· outputs: The RNN output Tensor.

If time_major == False (default), this will be a Tensor shaped [batch_size, max_time, cell.output_size]. Note: for the BasicRNNCell used here, cell.output_size is cell_units.

If time_major == True, this will be a Tensor shaped [max_time, batch_size, cell.output_size]. Note: for the BasicRNNCell used here, cell.output_size is cell_units.

· state: The final state. If cell.state_size is an int, this will be shaped [batch_size, cell.state_size]. Note: cell.state_size is cell_units.

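Section summary

Why use these two functions? Without static_rnn or dynamic_rnn, calling the cell directly computes only a single time step, i.e. time_steps is 1. You could write your own loop and carry the state forward to handle time_steps greater than 1, but TensorFlow already provides this directly, so the built-in functions are recommended. Note also that dynamic_rnn and static_rnn expect differently shaped input X: static_rnn is closer to the essence of the RNN computation, while dynamic_rnn is more encapsulated, more widely used, simpler, and easier to understand. When dynamic_rnn is called with time_major=True, the input X uses the same [time_steps, batch_size, input_size] ordering as static_rnn.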
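The effect of a time_major switch can be sketched in NumPy. `run_rnn` below is a hypothetical helper written for this illustration, not TensorFlow's dynamic_rnn; it only shows that the two layouts carry the same computation and differ by a transpose.

```python
import numpy as np

rng = np.random.default_rng(2)
batch_size, time_steps, input_size, num_units = 3, 4, 1, 5
U = rng.normal(scale=0.1, size=(input_size, num_units))   # hypothetical input weights
W = rng.normal(scale=0.1, size=(num_units, num_units))    # hypothetical recurrent weights

def run_rnn(inputs, time_major=False):
    """Hypothetical helper: unroll a vanilla RNN, mimicking a time_major switch."""
    if not time_major:
        inputs = np.transpose(inputs, (1, 0, 2))     # -> [time, batch, features]
    state = np.zeros((inputs.shape[1], num_units))
    steps = []
    for x_t in inputs:                               # iterate over time steps
        state = np.tanh(x_t @ U + state @ W)
        steps.append(state)
    outputs = np.stack(steps)                        # [time, batch, num_units]
    if not time_major:
        outputs = np.transpose(outputs, (1, 0, 2))   # back to [batch, time, num_units]
    return outputs, state

X = rng.normal(size=(batch_size, time_steps, input_size))
out_bm, state_bm = run_rnn(X)
out_tm, state_tm = run_rnn(np.transpose(X, (1, 0, 2)), time_major=True)
print(out_bm.shape, out_tm.shape)  # (3, 4, 5) (4, 3, 5)
print(np.allclose(out_bm, np.transpose(out_tm, (1, 0, 2))))  # True
```

The batch-major path simply pays for two extra transposes, which is the efficiency remark made in the time_major description.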

III. Final RNN Implementation

3.1 Final Implementation

The code is as follows:

#---------------------------------Define the loss and the optimizer----------------------------------#
loss=tf.losses.mean_squared_error(labels=YY,predictions=Y_pred) # loss function
optimizer=tf.train.AdamOptimizer(learning_rate).minimize(loss=loss) # optimizer

#---------------------------------Define the TensorBoard summaries----------------------------------#
tf.summary.scalar('loss',loss)
tf.summary.histogram('weight',W)
merge=tf.summary.merge_all()

#--------------------------------------Create the session---------------------------------------#
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    tf.summary.FileWriter('sin_rnn_summary_01',graph=sess.graph)
    # Use a separate loop variable so the epoch count defined earlier is not overwritten
    for ep in range(1,epoch+1):
        results = np.zeros(shape=(test_examples, 1))
        train_losses=[]
        test_losses=[]
        # Train on full batches only; the trailing partial batch is skipped
        for j in range(train_examples//batch_size):
            opti,summary,train_loss=sess.run(
                    fetches=(optimizer,merge,loss),
                    feed_dict={
                            XX:X_train[j*batch_size:(j+1)*batch_size],
                            YY:Y_train[j*batch_size:(j+1)*batch_size]
                        }
            )
            train_losses.append(train_loss)
        print(f"Epoch {ep}, average training loss: {sum(train_losses) / len(train_losses)}")

        # Note: after each training epoch, predict on the test set once, so that the
        # fit on the test set can be observed as training progresses
        for j in range(test_examples//batch_size):
            result,test_loss=sess.run(fetches=(Y_pred,loss),
                     feed_dict={XX:X_test[j*batch_size:(j+1)*batch_size],
                                YY:Y_test[j*batch_size:(j+1)*batch_size]
                                  }
                                )
            results[j*batch_size:(j+1)*batch_size]=result
            test_losses.append(test_loss)
        print(f"Epoch {ep}, average test loss: {sum(test_losses) / len(test_losses)}")
        # Plot the first 1000 test predictions of this epoch
        plt.plot(range(1000),results[:1000,0])
    plt.show()

The training run prints:

Epoch 1, average training loss: 0.39805943054311416
Epoch 1, average test loss: 0.2694476251490414
Epoch 2, average training loss: 0.15971132924451548
Epoch 2, average test loss: 0.08498280285857618
Epoch 3, average training loss: 0.05957840400583604
Epoch 3, average test loss: 0.043691962491720915
Epoch 4, average training loss: 0.03500954824335435
Epoch 4, average test loss: 0.026649791980162263
Epoch 5, average training loss: 0.01823293665862259
Epoch 5, average test loss: 0.00988568615866825
Epoch 6, average training loss: 0.005666293896844282
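The batch loops in the training code use integer division, so it is worth checking how many samples each epoch actually touches. The arithmetic below uses only the constants defined in the article's code; the variable names `n_train`, `n_test`, `train_batches`, and `test_batches` are introduced here for illustration.

```python
time_steps, batch_size = 10, 128
train_examples, test_examples = 11000, 1100

# Number of windowed samples actually available after reorganization.
n_train = train_examples - time_steps   # 10990
n_test = test_examples - time_steps     # 1090

# Number of full batches the loops iterate over.
train_batches = train_examples // batch_size
test_batches = test_examples // batch_size

# Batches, samples covered, samples left unused.
print(train_batches, train_batches * batch_size, n_train - train_batches * batch_size)
# 85 10880 110
print(test_batches, test_batches * batch_size, n_test - test_batches * batch_size)
# 8 1024 66
```

Because the initial state is created with a fixed batch_size of 128, every fed batch must contain exactly 128 samples, which is consistent with the loops skipping the remainder. Only the first 1024 rows of results are filled with predictions; the plot shows the first 1000 of them.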