CNTK - Recurrent Neural Network

This article describes how to build and understand a recurrent neural network (RNN) in CNTK, covering the basic concepts, uses and working of RNNs as well as long short-term memory (LSTM) and gated recurrent units (GRU). It shows how RNNs handle time-series data to predict a single output or a sequence, and how to create and train an RNN model in CNTK.

Now, let us understand how to construct a Recurrent Neural Network (RNN) in CNTK.


Introduction

We have learned how to classify images with a neural network, one of the iconic jobs in deep learning. But another area where neural networks excel, and where a lot of research is happening, is Recurrent Neural Networks (RNNs). Here we are going to learn what an RNN is and how it can be used in scenarios where we need to deal with time-series data.


What is a Recurrent Neural Network?

Recurrent neural networks (RNNs) may be defined as a special breed of neural networks that are capable of reasoning over time. RNNs are mainly used in scenarios where we need to deal with values that change over time, i.e. time-series data. To understand this better, let's make a small comparison between regular neural networks and recurrent neural networks −


  • As we know, in a regular neural network we can provide only one input. This limits it to producing only one prediction. To give you an example, we could do a text-translation job by using regular neural networks.


  • On the other hand, in recurrent neural networks we can provide a sequence of samples that results in a single prediction. In other words, using RNNs we can predict an output sequence based on an input sequence. For example, there have been quite a few successful experiments with RNNs in translation tasks.


Uses of Recurrent Neural Networks

RNNs can be used in several ways. Some of them are as follows −


Predicting a single output

Before diving into the steps of how an RNN can predict a single output based on a sequence, let's see what a basic RNN looks like −


[Diagram: a basic RNN with a loopback connection, producing a single output from an input sequence]

As we can see in the above diagram, an RNN contains a loopback connection to the input, and whenever we feed it a sequence of values it processes each element of the sequence as a time step.


Moreover, because of the loopback connection, an RNN can combine the generated output with the input for the next element in the sequence. In this way, the RNN builds a memory over the whole sequence which can be used to make a prediction.


In order to make a prediction with an RNN, we can perform the following steps −


  • First, to create an initial hidden state, we need to feed the first element of the input sequence.


  • After that, to produce an updated hidden state, we need to take the initial hidden state and combine it with the second element in the input sequence.


  • At last, to produce the final hidden state and to predict the output for the RNN, we need to take the final element in the input sequence.


In this way, with the help of this loopback connection, we can teach an RNN to recognize patterns that happen over time.

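To make the above procedure concrete, here is a minimal sketch in plain numpy rather than CNTK; the tanh step function, the weight matrices and the sequence length are illustrative assumptions, since a real recurrent layer learns its weights during training −

import numpy as np

rng = np.random.default_rng(0)
W_in = rng.standard_normal((3, 4))    # input-to-hidden weights (3 features, 4 hidden units); illustrative only
W_rec = rng.standard_normal((4, 4))   # hidden-to-hidden (loopback) weights
W_out = rng.standard_normal((4, 1))   # hidden-to-output weights

def rnn_step(x, h_prev):
   # Combine the current element with the previous hidden state.
   return np.tanh(x @ W_in + h_prev @ W_rec)

inputs = rng.standard_normal((5, 3))   # an input sequence of 5 elements with 3 features each
h = np.zeros(4)                        # empty memory before the first element
for x in inputs:                       # each element is processed as a time step
   h = rnn_step(x, h)                  # the hidden state accumulates information over the sequence

prediction = h @ W_out                 # a single output based on the final hidden state
print(prediction.shape)                # (1,)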

Predicting a sequence

The basic RNN model discussed above can be extended to other use cases as well. For example, we can use it to predict a sequence of values based on a single input. In this scenario, to make a prediction with an RNN, we can perform the following steps −


  • First, to create an initial hidden state and predict the first element in the output sequence, we need to feed an input sample into the neural network.


  • After that, to produce an updated hidden state and the second element in the output sequence, we need to combine the initial hidden state with the same sample.


  • At last, to update the hidden state one more time and predict the final element in the output sequence, we feed the sample one more time.

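As a rough numpy sketch of this one-to-many procedure (with the same illustrative assumptions as the previous sketch), the only change is that the same input sample is fed at every step while one output element is produced each time −

import numpy as np

rng = np.random.default_rng(1)
W_in, W_rec, W_out = (rng.standard_normal(s) for s in [(3, 4), (4, 4), (4, 1)])   # illustrative weights

x = rng.standard_normal(3)   # the single input sample
h = np.zeros(4)
outputs = []
for _ in range(3):                      # predict an output sequence of 3 elements
   h = np.tanh(x @ W_in + h @ W_rec)    # feed the same sample again and update the hidden state
   outputs.append(h @ W_out)            # emit the next element of the output sequence

print(len(outputs))   # 3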

Predicting sequences

We have seen how to predict a single value based on a sequence and how to predict a sequence based on a single value. Now let's see how we can predict a sequence from a sequence. In this scenario, to make a prediction with an RNN, we can perform the following steps −


  • First, to create an initial hidden state and predict the first element in the output sequence, we need to take the first element in the input sequence.


  • After that, to update the hidden state and predict the second element in the output sequence, we need to combine the initial hidden state with the second element in the input sequence.


  • At last, to predict the final element in the output sequence, we need to take the updated hidden state and the final element in the input sequence.

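A corresponding numpy sketch for this sequence-to-sequence case (same illustrative assumptions as before): every input element updates the hidden state and produces one output element −

import numpy as np

rng = np.random.default_rng(2)
W_in, W_rec, W_out = (rng.standard_normal(s) for s in [(3, 4), (4, 4), (4, 1)])   # illustrative weights

inputs = rng.standard_normal((5, 3))   # an input sequence of 5 elements
h = np.zeros(4)
outputs = []
for x in inputs:
   h = np.tanh(x @ W_in + h @ W_rec)   # combine the element with the previous hidden state
   outputs.append(h @ W_out)           # one output element per input element

print(len(outputs))   # 5, the same length as the input sequence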

Working of RNNs

To understand the working of recurrent neural networks (RNNs), we first need to understand how the recurrent layers in the network work. So let's first discuss how we can predict the output with a standard recurrent layer.


Predicting output with a standard RNN layer

As we discussed earlier, a basic recurrent layer is quite different from a regular layer in a neural network. In the previous section, we also showed the basic architecture of an RNN in a diagram. To update the hidden state for the first time step in the sequence, we can use the following formula −


[Formula image: hidden-state update for the first time step]

In the above equation, we calculate the new hidden state by taking the dot product between the initial hidden state and a set of weights.


Now for the next step, the hidden state for the current time step is used as the initial hidden state for the next time step in the sequence. That's why, to update the hidden state for the second time step, we can repeat the calculation performed in the first time step, as follows −


[Formula image: hidden-state update for the second time step]

Next, we can repeat the process of updating the hidden state for the third and final step in the sequence as below −


[Formula image: hidden-state update for the third and final time step]

And when we have processed all the above steps in the sequence, we can calculate the output as follows −


[Formula image: computing the output from the final hidden state]

For the above formula, we have used a third set of weights and the hidden state from the final time step.

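The four formulas above appeared as images in the original article. A standard way of writing the same computation is shown below; the symbols (weights W_ih, W_hh, W_ho and activation functions f and g) are my notation and are assumptions, not a reproduction of the original images −

\begin{aligned}
h_1 &= f(W_{ih}\,x_1 + W_{hh}\,h_0) && \text{first time step: combine } x_1 \text{ with the initial hidden state } h_0 \\
h_2 &= f(W_{ih}\,x_2 + W_{hh}\,h_1) && \text{the hidden state of step 1 becomes the initial state of step 2} \\
h_3 &= f(W_{ih}\,x_3 + W_{hh}\,h_2) && \text{third and final time step} \\
y   &= g(W_{ho}\,h_3)               && \text{output from a third set of weights and the final hidden state}
\end{aligned}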

Advanced Recurrent Units

The main issue with the basic recurrent layer is the vanishing gradient problem, which makes it poor at learning long-term correlations. In simple words, the basic recurrent layer does not handle long sequences very well. That is why there are other recurrent layer types that are much better suited to working with longer sequences, such as the following −


Long Short-Term Memory (LSTM)

[Diagram: LSTM architecture with input neurons, memory cells, and output neurons]

Long short-term memory (LSTM) networks were introduced by Hochreiter & Schmidhuber. They solve the problem of getting a basic recurrent layer to remember things for a long time. The architecture of an LSTM is shown in the diagram above. As we can see, it has input neurons, memory cells, and output neurons. To combat the vanishing gradient problem, LSTM networks use an explicit memory cell (which stores the previous values) and the following gates −


  • Forget gate − As the name implies, it tells the memory cell to forget the previous values. The memory cell stores the values until the gate, i.e. the 'forget gate', tells it to forget them.


  • Input gate − As the name implies, it adds new information to the cell.


  • Output gate − As the name implies, the output gate decides when to pass the vectors from the cell along to the next hidden state.

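To make the role of these three gates concrete, here is a minimal numpy sketch of a single LSTM step following the common textbook formulation; the weights are illustrative assumptions, and CNTK's LSTM layer may parameterize the cell slightly differently −

import numpy as np

def sigmoid(x):
   return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
   z = np.concatenate([x, h_prev])   # current input together with the previous hidden state
   f = sigmoid(W['f'] @ z)           # forget gate: what to drop from the memory cell
   i = sigmoid(W['i'] @ z)           # input gate: how much new information to add
   c_tilde = np.tanh(W['c'] @ z)     # candidate values for the memory cell
   c = f * c_prev + i * c_tilde      # explicit memory cell keeps (or forgets) the previous values
   o = sigmoid(W['o'] @ z)           # output gate: what to pass along to the next hidden state
   h = o * np.tanh(c)
   return h, c

rng = np.random.default_rng(3)
W = {k: rng.standard_normal((4, 4 + 3)) for k in 'fico'}   # hidden size 4, input size 3; illustrative weights
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W)
print(h.shape, c.shape)   # (4,) (4,)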

Gated Recurrent Units (GRUs)

[Diagram: GRU architecture with input neurons, gated memory cells, and output neurons]

Gated recurrent units (GRUs) are a slight variation of the LSTM network. They have one gate fewer and are wired slightly differently than LSTMs. The architecture is shown in the diagram above. It has input neurons, gated memory cells, and output neurons. A gated recurrent unit network has the following two gates −


  • Update gate − It determines the following two things −


    • How much of the information from the last state should be kept?


    • How much of the information from the previous layer should be let in?


  • Reset gate − The functionality of the reset gate is much like that of the forget gate in an LSTM network. The only difference is that it is located slightly differently.


In contrast to long short-term memory networks, gated recurrent unit networks are slightly faster and easier to run.

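Similarly, a minimal numpy sketch of a single GRU step, using the common formulation with an update gate and a reset gate; the weights are illustrative assumptions, and CNTK's GRU layer may differ in its exact parameterization −

import numpy as np

def sigmoid(x):
   return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W):
   z = sigmoid(W['z'] @ np.concatenate([x, h_prev]))            # update gate: how much of the old state to keep
   r = sigmoid(W['r'] @ np.concatenate([x, h_prev]))            # reset gate: plays a role similar to the LSTM forget gate
   h_tilde = np.tanh(W['h'] @ np.concatenate([x, r * h_prev]))  # candidate hidden state
   return (1 - z) * h_prev + z * h_tilde                        # blend the old state with the candidate

rng = np.random.default_rng(4)
W = {k: rng.standard_normal((4, 4 + 3)) for k in 'zrh'}   # hidden size 4, input size 3; illustrative weights
h = gru_step(rng.standard_normal(3), np.zeros(4), W)
print(h.shape)   # (4,)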

Creating the RNN structure

Before we can start making predictions about the output from any of our data sources, we first need to construct the RNN. Constructing an RNN is much the same as building a regular neural network in the previous section. The following code imports the required components and defines the training settings −



from cntk.losses import squared_error
from cntk.io import CTFDeserializer, MinibatchSource, INFINITELY_REPEAT, StreamDefs, StreamDef
from cntk.learners import adam
from cntk.logging import ProgressPrinter
from cntk.train import TestConfig

BATCH_SIZE = 14 * 10   # number of samples per minibatch
EPOCH_SIZE = 12434     # number of samples that make up one epoch
EPOCHS = 10            # number of passes over the training data

Stacking multiple layers

We can also stack multiple recurrent layers in CNTK. For example, we can use the following combination of layers−



from cntk import sequence, default_options, input_variable
from cntk.layers import Recurrence, LSTM, Dropout, Dense, Sequential, Fold

features = sequence.input_variable(1)   # sequence input with one feature value per time step
with default_options(initial_state = 0.1):
   model = Sequential([
      Fold(LSTM(15)),   # LSTM with 15 hidden units; Fold keeps only the final output of the sequence
      Dense(1)          # map the final hidden state to a single prediction
   ])(features)
target = input_variable(1, dynamic_axes=model.dynamic_axes)   # target with the same dynamic axes as the model output

As we can see in the above code, there are two ways in which we can model an RNN in CNTK −


  • First, if we only want the final output of a recurrent layer, we can use the Fold layer in combination with a recurrent layer, such as GRU, LSTM, or even RNNStep.


  • Second, as an alternative, we can also use the Recurrence block, which returns an output for every element of the sequence (see the sketch after this list).

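As mentioned in the second point above, here is a sketch of actually stacking two recurrent layers as an alternative to the single-LSTM model: the inner Recurrence layer returns an output for every time step, which Fold then reduces to a single value for the Dense layer. The layer sizes are arbitrary assumptions −

from cntk import sequence, default_options
from cntk.layers import Recurrence, LSTM, GRU, Dense, Sequential, Fold

stacked_features = sequence.input_variable(1)
with default_options(initial_state = 0.1):
   stacked_model = Sequential([
      Recurrence(LSTM(25)),   # first recurrent layer: produces an output at every time step
      Fold(GRU(15)),          # second recurrent layer: keeps only the final output
      Dense(1)
   ])(stacked_features)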

Training the RNN with time-series data

Once we have built the model, let's see how we can train the RNN in CNTK −



from cntk import Function

@Function
def criterion_factory(z, t):
   loss = squared_error(z, t)     # training loss: squared error between prediction and target
   metric = squared_error(z, t)   # report the same measure as the evaluation metric
   return loss, metric

loss = criterion_factory(model, target)
learner = adam(model.parameters, lr=0.005, momentum=0.9)   # Adam learner for the model parameters

Now, to load the data into the training process, we have to deserialize sequences from a set of CTF files. The following code defines the create_datasource function, a useful utility for creating both the training and the test data source.



def create_datasource(filename, sweeps=INFINITELY_REPEAT):
   # Each CTF file provides a 'features' stream and a 'target' stream with one value per time step.
   target_stream = StreamDef(field='target', shape=1, is_sparse=False)
   features_stream = StreamDef(field='features', shape=1, is_sparse=False)
   deserializer = CTFDeserializer(filename, StreamDefs(features=features_stream, target=target_stream))
   datasource = MinibatchSource(deserializer, randomize=True, max_sweeps=sweeps)
   return datasource

train_datasource = create_datasource('Training data filename.ctf')   # location of the training file created from our dataset
test_datasource = create_datasource('Test filename.ctf', sweeps=1)   # location of the test file; one sweep is enough for testing

Now that we have set up the data sources, the model, and the loss function, we can start the training process. It is quite similar to what we did with basic neural networks in previous sections.



progress_writer = ProgressPrinter(0)        # print training progress to the console
test_config = TestConfig(test_datasource)   # evaluate on the test data source when training finishes
input_map = {
   features: train_datasource.streams.features,
   target: train_datasource.streams.target
}
history = loss.train(
   train_datasource,
   epoch_size=EPOCH_SIZE,
   parameter_learners=[learner],
   model_inputs_to_streams=input_map,
   callbacks=[progress_writer, test_config],
   minibatch_size=BATCH_SIZE,
   max_epochs=EPOCHS
)

We will get output similar to the following −


Output −


average    since    average    since    examples
   loss     last     metric     last
------------------------------------------------------
Learning rate per minibatch: 0.005
    0.4      0.4        0.4      0.4          19
    0.4      0.4        0.4      0.4          59
  0.452    0.495      0.452    0.495         129
[…]

Validating the model

Actually, predicting with an RNN is quite similar to making predictions with any other CNTK model. The only difference is that we need to provide sequences rather than single samples.


Now that our RNN has finished training, we can validate the model by testing it with a few sample sequences, as follows −



import pickle

with open('test_samples.pkl', 'rb') as test_file:
   test_samples = pickle.load(test_file)

# NORMALIZE is assumed to be the scaling constant applied to the target values during
# preprocessing; multiplying the predictions by it restores the original scale.
model(test_samples) * NORMALIZE

Output −


array([[ 8081.7905],
[16597.693 ],
[13335.17 ],
...,
[11275.804 ],
[15621.697 ],
[16875.555 ]], dtype=float32)

Translated from: https://www.tutorialspoint.com/microsoft_cognitive_toolkit/microsoft_cognitive_toolkit_recurrent_neural_network.htm
