The Ultimate Guide to Recurrent Neural Networks in Python

Recurrent neural networks are deep learning models that are typically used to solve time series problems. They are used in self-driving cars, high-frequency trading algorithms, and other real-world applications.

This tutorial will teach you the fundamentals of recurrent neural networks. You'll also build your own recurrent neural network that predicts tomorrow's stock price for Facebook (FB).

The Intuition of Recurrent Neural Networks

Recurrent neural networks are an example of the broader field of neural networks. Other examples include artificial neural networks and convolutional neural networks.

This article will be focused on recurrent neural networks.

This tutorial will begin our discussion of recurrent neural networks by discussing the intuition behind recurrent neural networks.

The Types of Problems Solved By Recurrent Neural Networks

Although we have not explicitly discussed it yet, there are generally broad swathes of problems that each type of neural network is designed to solve:

In the case of recurrent neural networks, they are typically used to solve time series analysis problems.

Each of these three types of neural networks (artificial, convolutional, and recurrent) are used to solve supervised machine learning problems.

Mapping Neural Networks to Parts of the Human Brain

As you’ll recall, neural networks were designed to mimic the human brain. This is true for both their construction (both the brain and neural networks are composed of neurons) and their function (they are both used to make decisions and predictions).

The three main parts of the brain are:

  • The cerebrum
  • The brainstem
  • The cerebellum

Arguably the most important part of the brain is the cerebrum. It contains four lobes:

  • The frontal lobe
  • The parietal lobe
  • The temporal lobe
  • The occipital lobe

The main innovation that neural networks contain is the idea of weights.

Said differently, the most important characteristic of the brain that neural networks have mimicked is the ability to learn from other neurons.

The ability of a neural network to change its weights through each epoch of its training stage is similar to the long-term memory that is seen in humans (and other animals).

The temporal lobe is the part of the brain that is associated with long-term memory. Separately, the artificial neural network was the first type of neural network that had this long-term memory property. In this sense, many researchers have compared artificial neural networks with the temporal lobe of the human brain.

Similarly, the occipital lobe is the component of the brain that powers our vision. Since convolutional neural networks are typically used to solve computer vision problems, you could say that they are equivalent to the occipital lobe in the brain.

As mentioned, recurrent neural networks are used to solve time series problems. They can learn from events that have happened in recent previous iterations of their training stage. In this way, they are often compared to the frontal lobe of the brain – which powers our short-term memory.

To summarize, researchers often pair each of the three neural nets with the following parts of the brain:

  • Artificial neural networks: the temporal lobe
  • Convolutional neural networks: the occipital lobe
  • Recurrent neural networks: the frontal lobe

The Composition of a Recurrent Neural Network

Let’s now discuss the composition of a recurrent neural network. First, recall that the composition of a basic neural network has the following appearance:

The first modification that needs to be made to this neural network is that each layer of the network should be squashed together, like this:

Then, three more modifications need to be made:

  • The neural network’s neuron synapses need to be simplified to a single line
  • The entire neural network needs to be rotated 90 degrees
  • A loop needs to be generated around the hidden layer of the neural net

The neural network will now have the following appearance:

That line that circles the hidden layer of the recurrent neural network is called the temporal loop. It is used to indicate that the hidden layer not only generates an output, but that output is fed back as the input into the same layer.

A visualization is helpful in understanding this. As you can see in the following image, the hidden layer used on a specific observation of a data set is not only used to generate an output for that observation, but it is also used to train the hidden layer of the next observation.

This property of one observation helping to train the next observation is why recurrent neural networks are so useful in solving time series analysis problems.
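
To make the temporal loop concrete, here is a minimal NumPy sketch of the kind of recurrence it describes. The weight matrices W_x and W_h, the tanh activation, and the toy sizes are illustrative assumptions rather than the exact network we build later in this tutorial; the point is simply that each hidden state is computed from both the current observation and the previous hidden state.

import numpy as np

np.random.seed(0)
n_inputs, n_hidden = 1, 4
W_x = np.random.randn(n_hidden, n_inputs) * 0.1  # input-to-hidden weights (assumed)
W_h = np.random.randn(n_hidden, n_hidden) * 0.1  # hidden-to-hidden weights: the "temporal loop"
h = np.zeros(n_hidden)                           # initial hidden state

for x_t in [np.array([0.5]), np.array([0.6]), np.array([0.7])]:  # a toy sequence of observations
    h = np.tanh(W_x @ x_t + W_h @ h)  # each new hidden state reuses the previous hidden state
    print(h)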

Summary – The Intuition of Recurrent Neural Networks

In this tutorial, you had your first introduction to recurrent neural networks. More specifically, we discussed the intuition behind recurrent neural networks.

Here is a brief summary of what we discussed in this tutorial:

  • The types of problems solved by recurrent neural networks
  • The relationships between the different parts of the brain and the different neural networks we’ve studied in this course
  • The composition of a recurrent neural network and how each hidden layer can be used to help train the hidden layer from the next observation in the data set

The Vanishing Gradient Problem in Recurrent Neural Networks

The vanishing gradient problem has historically been one of the largest barriers to the success of recurrent neural networks.

Because of this, having an understanding of the vanishing gradient problem is important before you build your first RNN.

This section will explain the vanishing gradient problem in plain English, including a discussion of the most useful solutions to this interesting problem.

What Is The Vanishing Gradient Problem?

Before we dig in to the details of the vanishing gradient problem, it’s helpful to have some understanding of how the problem was initially discovered.

The vanishing gradient problem was discovered by Sepp Hochreiter, a German computer scientist who has had an influential role in the development of recurrent neural networks in deep learning.

Now let’s explore the vanishing gradient problem in detail. As its name implies, the vanishing gradient problem is related to deep learning gradient descent algorithms. Recall that a gradient descent algorithm looks something like this:
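
In its most common form, gradient descent repeatedly nudges each weight w in the direction that reduces the cost function J, scaled by a learning rate:

w := w - learning_rate * dJ/dw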

This gradient descent algorithm is then combined with a backpropagation algorithm to update the synapse weights throughout the neural network.

Recurrent neural networks behave slightly differently because the hidden layer of one observation is used to train the hidden layer of the next observation.

This means that the cost function of the neural net is calculated for each observation in the data set. These cost function values are depicted at the top of the following image:

The vanishing gradient problem occurs when the backpropagation algorithm moves back through all of the neurons of the neural net to update their weights. The nature of recurrent neural networks means that the cost function computed at a deep layer of the neural net will be used to change the weights of neurons at shallower layers.

The mathematics that computes this change is multiplicative, which means that the gradient calculated in a step that is deep in the neural network will be multiplied back through the weights earlier in the network. Said differently, the gradient calculated deep in the network is “diluted” as it moves back through the net, which can cause the gradient to vanish – giving the name to the vanishing gradient problem!
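
A quick numeric illustration of this multiplicative effect (the specific numbers are arbitrary):

print(0.9 ** 50)  # roughly 0.005 - a factor below 1 shrinks toward zero (vanishing gradient)
print(1.1 ** 50)  # roughly 117 - a factor above 1 blows up (exploding gradient)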

The actual factor that is multiplied through a recurrent neural network in the backpropagation algorithm is referred to by the mathematical variable Wrec. It poses two problems:

  • When Wrec is small, you experience a vanishing gradient problem
  • When Wrec is large, you experience an exploding gradient problem

Note that both of these problems are generally referred to by the simpler name of the “vanishing gradient problem”.

To summarize, the vanishing gradient problem is caused by the multiplicative nature of the backpropagation algorithm. It means that gradients calculated at a deep stage of the recurrent neural network either have too small of an impact (in a vanishing gradient problem) or too large of an impact (in an exploding gradient problem) on the weights of neurons that are shallower in the neural net.

How to Solve The Vanishing Gradient Problem

There are a number of strategies that can be used to solve the vanishing gradient problem. We will explore strategies for both the vanishing gradient and exploding gradient problems separately. Let’s start with the latter.

Solving the Exploding Gradient Problem

For exploding gradients, it is possible to use a modified version of the backpropagation algorithm called truncated backpropagation. The truncated backpropagation algorithm limits the number of timesteps that backpropagation will be performed on, stopping the algorithm before the exploding gradient problem occurs.

You can also introduce penalties, which are hard-coded techniques for reducing a backpropagation’s impact as it moves through shallower layers in a neural network.

Lastly, you could introduce gradient clipping, which introduces an artificial ceiling that limits how large the gradient can become in a backpropagation algorithm.
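
As one concrete, entirely optional illustration, TensorFlow's Keras optimizers expose gradient clipping through the clipnorm and clipvalue arguments; this snippet is only an example and is not part of the model we build later in this tutorial:

from tensorflow.keras.optimizers import Adam

clipped_optimizer = Adam(clipnorm=1.0)  # rescales gradients whose overall norm exceeds 1.0
# Alternatively, Adam(clipvalue=0.5) would cap each individual gradient component at +/- 0.5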

Solving the Vanishing Gradient Problem

Weight initialization is one technique that can be used to solve the vanishing gradient problem. It involves artificially creating an initial value for weights in a neural network to prevent the backpropagation algorithm from assigning weights that are unrealistically small.
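
As a small illustration, Keras layers let you choose how their weights are initialized via the kernel_initializer argument; the Glorot (Xavier) scheme shown here is one common choice and is used purely as an example, not something we tune later in this tutorial:

from tensorflow.keras.layers import Dense

layer = Dense(units=45, kernel_initializer='glorot_uniform')  # start weights at sensibly scaled values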

You could also use echo state networks, which is a specific type of neural network designed to avoid the vanishing gradient problem. Echo state networks are outside the scope of this course. Having knowledge of their existence is sufficient for now.

The most important solution to the vanishing gradient problem is a specific type of neural network called Long Short-Term Memory Networks (LSTMs), which were pioneered by Sepp Hochreiter and Jürgen Schmidhuber. Recall that Mr. Hochreiter was the scientist who originally discovered the vanishing gradient problem.

LSTMs are used in problems primarily related to speech recognition, with one of the most notable examples being Google using an LSTM for speech recognition in 2015 and experiencing a 49% decrease in transcription errors.

LSTMs are considered to be the go-to neural net for scientists interested in implementing recurrent neural networks. We will be largely focusing on LSTMs through the remainder of this course.

Summary – The Vanishing Gradient Problem

In this section, you learned about the vanishing gradient problem of recurrent neural networks.

Here is a brief summary of what we discussed:

  • That Sepp Hochreiter was the first scientist to discover the vanishing gradient problem in recurrent neural networks
  • What the vanishing gradient problem (and its cousin, the exploding gradient problem) involves
  • The role of Wrec in vanishing gradient problems and exploding gradient problems
  • How vanishing gradient problems and exploding gradient problems are solved
  • The role of LSTMs as the most common solution to the vanishing gradient problem

Long Short-Term Memory Networks (LSTMs)

Long short-term memory networks (LSTMs) are a type of recurrent neural network used to solve the vanishing gradient problem.

They differ from “regular” recurrent neural networks in important ways.

This tutorial will introduce you to LSTMs. Later in this course, we will build and train an LSTM from scratch.

Table of Contents

You can skip to a specific section of this LSTM tutorial using the table of contents below:

The History of LSTMs

As we alluded to in the last section, the two most important figures in the field of LSTMs are Sepp Hochreiter and Jürgen Schmidhuber.

The latter was the former’s PhD supervisor at the Technical University of Munich in Germany.

Hochreiter’s PhD thesis introduced LSTMs to the world for the first time.

How LSTMs Solve The Vanishing Gradient Problem

In the last tutorial, we learned how the Wrec term in the backpropagation algorithm can lead to either a vanishing gradient problem or an exploding gradient problem.

We explored various possible solutions for this problem, including penalties, gradient clipping, and even echo state networks. LSTMs are the best solution.

So how do LSTMs work? They simply change the value of Wrec.

In our explanation of the vanishing gradient problem, you learned that:

  • When Wrec is small, you experience a vanishing gradient problem
  • When Wrec is large, you experience an exploding gradient problem

We can actually be much more specific:

  • When Wrec < 1, you experience a vanishing gradient problem
  • When Wrec > 1, you experience an exploding gradient problem

This makes sense if you think about the multiplicative nature of the backpropagation algorithm.

If you have a number that is smaller than 1 and you multiply it against itself over and over again, you’ll end up with a number that vanishes. Similarly, multiplying a number greater than 1 against itself many times results in a very large number.

To solve this problem, LSTMs set Wrec = 1. There is certainly more to LSTMs than setting Wrec = 1, but this is definitely the most important change that this specification of recurrent neural networks makes.

How LSTMs Work

This section will explain how LSTMs work. Before proceeding, it’s worth mentioning that I will be using images from Christopher Olah’s blog post Understanding LSTMs, which was published in August 2015 and has some of the best LSTM visualizations that I have ever seen.

To start, let’s consider the basic version of a recurrent neural network:

This neural network has neurons and synapses that transmit the weighted sums of the outputs from one layer as the inputs of the next layer. A backpropagation algorithm will move backwards through this network and update the weights of each neuron in response to the cost function computed at each epoch of its training stage.

By contrast, here is what an LSTM looks like:

As you can see, an LSTM has far more embedded complexity than a regular recurrent neural network. My goal is to allow you to fully understand this image by the time you’ve finished this tutorial.

First, let’s get comfortable with the notation used in the image above:

Now that you have a sense of the notation we’ll be using in this LSTM tutorial, we can start examining the functionality of a layer within an LSTM neural net. Each layer has the following appearance:

Before we dig into the functionality of nodes within an LSTM neural network, it’s worth noting that every input and output of these deep learning models is a vector. In Python, this is generally represented by a NumPy array or another one-dimensional data structure.

The first thing that happens within an LSTM is the activation function of the forget gate layer. It looks at the inputs of the layer (labelled xt for the observation and ht-1 for the output from the previous timestep) and outputs a number between 0 and 1 for every number in the cell state from the previous timestep (labelled Ct-1).

Here’s a visualization of the activation of the forget gate layer:

We have not discussed cell state yet, so let’s do that now. Cell state is represented in our diagram by the long horizontal line that runs through the top of the diagram. As an example, here is the cell state in our visualizations:

The cell state’s purpose is to decide what information to carry forward from the different observations that a recurrent neural network is trained on. The decision of whether or not to carry information forward is made by gates - of which the forget gate is a prime example. Each gate within an LSTM will have the following appearance:

The σ character within these gates refers to the Sigmoid function, which you have probably seen used in logistic regression machine learning models. The sigmoid function is used as a type of activation function in LSTMs that determines what information is passed through a gate to affect the network’s cell state.

By definition, the Sigmoid function can only output numbers between 0 and 1. It’s often used to calculate probabilities because of this. In the case of LSTM models, it specifies what proportion of each output should be allowed to influence the cell state.
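
A minimal sketch of the sigmoid function itself, just to show that its outputs always fall strictly between 0 and 1:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # squashes any real number into the range (0, 1)

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # approximately [0.00005, 0.5, 0.99995]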

The next two steps of an LSTM model are closely related: the input gate layer and the tanh layer. These layers work together to determine how to update the cell state. At the same time, the output of the forget gate layer is applied, which allows the cell to determine what to discard from the previous cell state.

Here is a visualization of this process:

The last step of an LSTM determines the output for this observation (denoted ht). This step runs through both a sigmoid function and a hyperbolic tangent function. It can be visualized as follows:

That concludes the process of training a single layer of an LSTM model. As you might imagine, there is plenty of mathematics under the surface that we have glossed over. The point of this section is to broadly explain how LSTMs work, not for you to deeply understand each operation in the process.
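
For readers who want a taste of that underlying mathematics, here is a sketch of a single LSTM step using the standard gate equations (following the notation from Olah's post). The weight matrices, biases, and tiny sizes are made-up illustrations; TensorFlow handles all of this internally in the model we build later.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_hidden, n_input = 3, 2
rng = np.random.default_rng(0)
# One weight matrix per gate; each acts on [h_prev, x_t] concatenated (shapes are assumptions).
W_f, W_i, W_c, W_o = (rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_input)) for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(n_hidden)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)        # forget gate: what to discard from the old cell state
    i = sigmoid(W_i @ z + b_i)        # input gate: how much new information to admit
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate values that could be added to the cell state
    c = f * c_prev + i * c_tilde      # updated cell state
    o = sigmoid(W_o @ z + b_o)        # output gate
    h = o * np.tanh(c)                # new hidden state, i.e. the output for this observation
    return h, c

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in [np.array([0.5, 1.0]), np.array([0.6, 0.9])]:
    h, c = lstm_step(x_t, h, c)
print(h)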

Variations of LSTM Architectures

I wanted to conclude this tutorial by discussing a few different variations of LSTM architecture that are slightly different from the basic LSTM that we’ve discussed so far.

As a quick recap, here is what a generalized node of an LSTM looks like:

The Peephole Variation

Perhaps the most important variation of the LSTM architecture is the peephole variant, which allows the gate layers to read data from the cell state.

Here is a visualization of what the peephole variant might look like:

Note that while this diagram adds a peephole to every gate in the recurrent neural network, you could also add peepholes to some gates and not other gates.

The Coupled Gate Variation

There is another variation of the LSTM architecture where the model makes the decisions of what to forget and what new information to add together. In the original LSTM model, these decisions were made separately.

Here is a visualization of what this architecture looks like:

Other LSTM Variations

These are only two examples of variants to the LSTM architecture. There are many more. A few are listed below:

Summary - Long Short-Term Memory Networks

In this tutorial, you had your first exposure to long short-term memory networks (LSTMs).

Here is a brief summary of what you learned:

  • A (very) brief history of LSTMs and the role that Sepp Hochreiter and Jürgen Schmidhuber played in their development
  • How LSTMs solve the vanishing gradient problem
  • How LSTMs work
  • The role of gates, sigmoid functions, and the hyperbolic tangent function in LSTMs
  • A few of the most popular variations of the LSTM architecture

How To Build And Train A Recurrent Neural Network

So far in our discussion of recurrent neural networks, you have learned:

It’s now time to build your first recurrent neural network! More specifically, this tutorial will teach you how to build and train an LSTM to predict the stock price of Facebook (FB).

Table of Contents

You can skip to a specific section of this Python recurrent neural network tutorial using the table of contents below:

Downloading the Data Set For This Tutorial

To proceed through this tutorial, you will need to download two data sets:

  • A set of training data that contains information on Facebook’s stock price from the start of 2015 to the end of 2019
  • A set of test data that contains information on Facebook’s stock price during the first month of 2020

Our recurrent neural network will be trained on the 2015-2019 data and will be used to predict the data from January 2020.

You can download the training data and test data using the links below:

Each of these data sets is simply an export from Yahoo! Finance. They look like this (when opened in Microsoft Excel):

Once the files are downloaded, move them to the directory you’d like to work in and open a Jupyter Notebook.

Importing The Libraries You’ll Need For This Tutorial

This tutorial will depend on a number of open-source Python libraries, including NumPy, pandas, and matplotlib.

Let’s start our Python script by importing some of these libraries:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

Importing Our Training Set Into The Python Script

The next task that needs to be completed is to import our data set into the Python script.

We will initially import the data set as a pandas DataFrame using the read_csv method. However, since the keras module of TensorFlow only accepts NumPy arrays as parameters, the data structure will need to be transformed post-import.

Let’s start by importing the entire .csv file as a DataFrame:

training_data = pd.read_csv('FB_training_data.csv')

You will notice in looking at the DataFrame that it contains numerous different ways of measuring Facebook’s stock price, including opening price, closing price, high and low prices, and volume information:

We will need to select a specific type of stock price before proceeding. Let’s use Open, Facebook’s opening price for each trading day, since that is the column our code selects below and the column we concatenate with the test data later in this tutorial.

Now we need to select that column of the DataFrame and store it in a NumPy array. Here is the command to do this:

training_data = training_data.iloc[:, 1].values

Note that this command overwrites the existing training_data variable that we had created previously.

You can now verify that our training_data variable is indeed a NumPy array by running type(training_data), which should return:

numpy.ndarray

Applying Feature Scaling To Our Data Set

Let’s now take some time to apply some feature scaling to our data set.

As a quick refresher, there are two main ways that you can apply feature scaling to your data set:

  • Standardization
  • Normalization

We’ll be using normalization to build our recurrent neural network, which involves subtracting the minimum value of the data set and then dividing by the range of the data set.

Here is the normalization function defined mathematically:
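
x_scaled = (x - min(x)) / (max(x) - min(x))

Every observation is therefore mapped into the range from 0 to 1; this is the same min-max formula that scikit-learn's MinMaxScaler applies below.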

Fortunately, scikit-learn makes it very easy to apply normalization to a dataset using its MinMaxScaler class.

Let’s start by importing this class into our Python script. The MinMaxScaler class lives within the preprocessing module of scikit-learn, so the command to import the class is:

from sklearn.preprocessing import MinMaxScaler

Next we need to create an instance of this class. We will assign the newly-created object to a variable called scaler. We will be using the default parameters for this class, so we do not need to pass anything in:

scaler = MinMaxScaler()

Since we haven’t specified any non-default parameters, this will scale our data set so that every observation is between 0 and 1.

We have created our scaler object but our training_data data set has not yet been scaled. We need to use the fit_transform method to modify the original data set. Here’s the statement to do this:

training_data = scaler.fit_transform(training_data.reshape(-1, 1))

Specifying The Number Of Timesteps For Our Recurrent Neural Network

The next thing we need to do is to specify our number of timesteps. Timesteps specify how many previous observations should be considered when the recurrent neural network makes a prediction about the current observation.

We will use 40 timesteps in this tutorial. This means that for every day that the neural network predicts, it will consider the previous 40 days of stock prices to determine its output. Note that since there are only ~20 trading days in a given month, using 40 timesteps means we’re relying on stock price data from the previous 2 months.

So how do we actually specify the number of timesteps within our Python script?

It’s done through creating two special data structures:

  • One data structure that we’ll call x_training_data that contains the 40 previous stock price observations for each day in the data set. This is the data that the recurrent neural network will use to make predictions.
  • One data structure that we’ll call y_training_data that contains the stock price for the next trading day. This is the data point that the recurrent neural network is trying to predict.

To start, let’s initialize each of these data structures as an empty Python list:

x_training_data = []

y_training_data =[]

Now we will use a for loop to populate the actual data into each of these Python lists. Here’s the code (with further explanation of the code after the code block):

for i in range(40, len(training_data)):

    x_training_data.append(training_data[i-40:i, 0])

    y_training_data.append(training_data[i, 0])

Let’s unpack the components of this code block:

  • The range(40, len(training_data)) function causes the for loop to iterate from 40 to the last index of the training data.
  • The x_training_data.append(training_data[i-40:i, 0]) line causes the loop to append the 40 preceding stock prices to x_training_data with each iteration of the loop.
  • Similarly, the y_training_data.append(training_data[i, 0]) causes the loop to append the next day’s stock price to y_training_data with each iteration of the loop.

Finalizing Our Data Sets By Transforming Them Into NumPy Arrays

TensorFlow is designed to work primarily with NumPy arrays. Because of this, the last thing we need to do is transform the two Python lists we just created into NumPy arrays.

Fortunately, this is simple. You simply need to wrap the Python lists in the np.array function. Here’s the code:

x_training_data = np.array(x_training_data)

y_training_data = np.array(y_training_data)

One important way that you can make sure your script is running as intended is to verify the shape of both NumPy arrays.

The x_training_data array should be a two-dimensional NumPy array with one dimension being len(training_data) - 40 (which evaluates to 1218 in our case) and the second dimension being 40 (the number of timesteps).

Similarly, the y_training_data object should be a one-dimensional NumPy array of length 1218 (which, again, is len(training_data) - 40).

You can verify the shape of the arrays by printing their shape attribute, like this:

print(x_training_data.shape)

print(y_training_data.shape)

This prints:

(1218, 40)

(1218,)

Both arrays have the dimensions you’d expect. However, we need to reshape our x_training_data object one more time before proceeding to build our recurrent neural network.

The reason for this is that the recurrent neural network layer available in TensorFlow only accepts data in a very specific format. You can read the TensorFlow documentation on this topic here.

To reshape the x_training_data object, I will use the np.reshape method. Here’s the code to do this:

x_training_data = np.reshape(x_training_data, (x_training_data.shape[0], 

                                               x_training_data.shape[1], 

                                               1))

Now let’s print the shape of x_training_data once again:

print(x_training_data.shape)

This outputs:

(1218, 40, 1)

Our arrays have the desired shape, so we can proceed to building our recurrent neural network.

Importing Our TensorFlow Libraries

Before we can begin building our recurrent neural network, we’ll need to import a number of classes from TensorFlow. Here are the statements you should run before proceeding:

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense

from tensorflow.keras.layers import LSTM

from tensorflow.keras.layers import Dropout

Building Our Recurrent Neural Network

It’s now time to build our recurrent neural network.

The first thing that needs to be done is initializing an object from TensorFlow’s Sequential class. As its name implies, the Sequential class is designed to build neural networks by adding sequences of layers over time.

Here’s the code to initialize our recurrent neural network:

rnn = Sequential()

As with our artificial neural networks and convolutional neural networks, we can add more layers to this recurrent neural network using the add method.

Adding Our First LSTM Layer

The first layer that we will add is an LSTM layer. To do this, pass an invocation of the LSTM class (that we just imported) into the add method.

The LSTM class accepts several parameters. More precisely, we will specify three arguments:

  • The number of LSTM neurons that you’d like to include in this layer. Increasing the number of neurons is one method for increasing the dimensionality of your recurrent neural network. In our case, we will specify units = 45.
  • return_sequences = True - this must always be specified if you plan on including another LSTM layer after the one you’re adding. You should specify return_sequences = False for the last LSTM layer in your recurrent neural network.
  • input_shape: the number of timesteps and the number of predictors in our training data. In our case, we are using 40 timesteps and only 1 predictor (stock price), so we will add input_shape = (x_training_data.shape[1], 1).

Here is the full add method:

rnn.add(LSTM(units = 45, return_sequences = True, input_shape = (x_training_data.shape[1], 1)))

Note that I used x_training_data.shape[1] instead of a hardcoded value in case we decide to train the recurrent neural network with a different number of timesteps at a later date.

Adding Some Dropout Regularization

Dropout regularization is a technique used to avoid overfitting when training neural networks.

It involves randomly excluding - or “dropping out” - certain layer outputs during the training stage.

TensorFlow makes it easy to implement dropout regularization using the Dropout class that we imported earlier in our Python script. The Dropout class accepts a single parameter: the dropout rate.

The dropout rate indicates how many neurons should be dropped in a specific layer of the neural network. It is common to use a dropout rate of 20%. We will follow this convention in our recurrent neural network.

Here’s how you can instruct TensorFlow to drop 20% of the LSTM layer’s neurons during each iteration of the training stage:

rnn.add(Dropout(0.2))

Adding Three More LSTM Layers With Dropout Regularization

We will now add three more LSTM layers (with dropout regularization) to our recurrent neural network. You will see that after specifying the first LSTM layer, adding more is trivial.

To add more layers, all that needs to be done is copying the first two add methods with one small change. Namely, we should remove the input_shape argument from the LSTM class.

We will keep the number of neurons (or units) and the dropout rate the same in each of the LSTM class invocations. Since the third LSTM layer we’re adding in this section will be our last LSTM layer, we can remove the return_sequences = True parameter as mentioned earlier. Removing the parameter sets return_sequences to its default value of False.

Here’s the full code to add our next three LSTM layers:

rnn.add(LSTM(units = 45, return_sequences = True))

rnn.add(Dropout(0.2))

rnn.add(LSTM(units = 45, return_sequences = True))

rnn.add(Dropout(0.2))

rnn.add(LSTM(units = 45))

rnn.add(Dropout(0.2))

This code is very repetitive and violates the DRY (Don’t repeat yourself) principle of software development. Let’s nest it in a loop instead:

for i in [True, True, False]:

    rnn.add(LSTM(units = 45, return_sequences = i))

    rnn.add(Dropout(0.2))

Adding The Output Layer To Our Recurrent Neural Network

Let’s finish architecting our recurrent neural network by adding our output layer.

The output layer will be an instance of the Dense class, which is the same class we used to create the full connection layer of our convolutional neural network earlier in this course.

The only parameter we need to specify is units, which is the desired number of dimensions that the output layer should generate. Since we want to output the next day’s stock price (a single value), we’ll specify units = 1.

Here’s the code to create our output layer:

rnn.add(Dense(units = 1))

Compiling Our Recurrent Neural Network

As you’ll recall from the tutorials on artificial neural networks and convolutional neural networks, the compilation step of building a neural network is where we specify the neural net’s optimizer and loss function.

TensorFlow allows us to compile a neural network using the aptly-named compile method. It accepts two arguments: optimizer and loss. Let’s start by creating an empty compile function:

rnn.compile(optimizer = '', loss = '')

We now need to specify the optimizer and loss parameters.

Let’s start by discussing the optimizer parameter. Recurrent neural networks typically use the RMSProp optimizer in their compilation stage. With that said, we will use the Adam optimizer (as before). The Adam optimizer is a workhorse optimizer that is useful in a wide variety of neural network architectures.

The loss parameter is fairly simple. Since we’re predicting a continuous variable, we can use mean squared error - just like you would when measuring the performance of a linear regression machine learning model. This means we can specify loss = 'mean_squared_error'.

Here’s the final compile method:

rnn.compile(optimizer = 'adam', loss = 'mean_squared_error')

Fitting The Recurrent Neural Network On The Training Set

It’s now time to train our recurrent network on our training data.

To do this, we use the fit method. The fit method accepts four arguments in this case:

  • The training data: in our case, this will be x_training_data and y_training_data
  • Epochs: the number of iterations you’d like the recurrent neural network to be trained on. We will specify epochs = 100 in this case.
  • The batch size: the size of batches that the network will be trained in through each epoch.

Here is the code to train this recurrent neural network according to our specifications:

rnn.fit(x_training_data, y_training_data, epochs = 100, batch_size = 32)

Your Jupyter Notebook will now generate a number of printed outputs for every epoch in the training algorithm. They look like this:

As you can see, every output shows how long the epoch took to compute as well as the computed loss function at that epoch.

You should see the loss function’s value slowly decline as the recurrent neural network is fitted to the training data over time. In my case, the loss function’s value declined from 0.0504 in the first iteration to 0.0017 in the last iteration.

Making Predictions With Our Recurrent Neural Network

We have built our recurrent neural network and trained it on data of Facebook’s stock price over the last 5 years. It’s now time to make some predictions!

Importing Our Test Data

To start, let’s import the actual stock price data for the first month of 2020. This will give us something to compare our predicted values to.

Here’s the code to do this. Note that it is very similar to the code that we used to import our training data at the start of our Python script:

test_data = pd.read_csv('FB_test_data.csv')

test_data = test_data.iloc[:, 1].values

If you run the statement print(test_data.shape), it will return (21,). This shows that our test data is a one-dimensional NumPy array with 21 entries - which means there were 21 stock market trading days in January 2020.

You can also generate a quick plot of the data using plt.plot(test_data). This should generate the following Python visualization:

With any luck, our predicted values should follow the same distribution.

Building The Test Data Set We Need To Make Predictions

Before we can actually make predictions for Facebook’s stock price in January 2020, we first need to make some changes to our data set.

The reason for this is that to predict each of the 21 observations in January, we will need the 40 previous trading days. Some of these trading days will come from the test set while the remainder will come from the training set. Because of this, some concatenation is necessary.

Unfortunately, you can’t just concatenate the NumPy arrays immediately. This is because we’ve already applied feature scaling to the training data but haven’t applied any feature scaling to the test data.

To fix this, we need to re-import the original, unscaled training data under a new variable name called unscaled_x_training_data. For consistency, we will also re-import the test data as a DataFrame called unscaled_test_data:

unscaled_x_training_data = pd.read_csv('FB_training_data.csv')

unscaled_test_data = pd.read_csv('FB_test_data.csv')

Now we can concatenate together the Open column from each DataFrame with the following statement:

all_data = pd.concat((unscaled_x_training_data['Open'], unscaled_test_data['Open']), axis = 0)

This all_data object is a pandas Series of length 1279.

Now we need to create an array of all the stock prices from January 2020 and the 40 trading days prior to January. We will call this object x_test_data since it contains the x values that we’ll use to make stock price predictions for January 2020.

The first thing you need to do is find the index of the first trading day in January within our all_data object. The statement len(all_data) - len(test_data) identifies this index for us.

This represents the upper bound of the first item in the array. To get the lower bound, just subtract 40 from this number. Said differently, the lower bound is len(all_data) - len(test_data) - 40.
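
Using the numbers from this tutorial, that lower bound works out to len(all_data) - len(test_data) - 40 = 1279 - 21 - 40 = 1218.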

The upper bound of the entire x_test_data array will be the last item in the data set. Accordingly, we can create this NumPy array with the following statement:

x_test_data = all_data[len(all_data) - len(test_data) - 40:].values

You can check whether or not this object has been created as desired by printing len(x_test_data), which has a value of 61. This makes sense - it should contain the 21 values for January 2020 as well as the 40 values prior.

The last step of this section is to quickly reshape our NumPy array into the two-dimensional shape that the scaler expects in the next section:

x_test_data = np.reshape(x_test_data, (-1, 1))

Note that if you neglected to do this step, scikit-learn would print a handy error message that explains exactly how you’d need to transform your data.

Scaling Our Test Data

Our recurrent neural network was trained on scaled data. Because of this, we need to scale our x_test_data variable before we can use the model to make predictions.

x_test_data = scaler.transform(x_test_data)

Note that we used the transform method here instead of the fit_transform method (like before). This is because we want to transform the test data according to the fit generated from the entire training data set.

This means that the transformation that is applied to the test data will be the same as the one applied to the training data - which is necessary for our recurrent neural network to make accurate predictions.

Grouping Our Test Data

The last thing we need to do is group our test data into 21 arrays of size 40. Said differently, we’ll now create an array where each entry corresponds to a date in January and contains the stock prices of the 40 previous trading days.

The code to do this is similar to something we used earlier:

final_x_test_data = []

for i in range(40, len(x_test_data)):

    final_x_test_data.append(x_test_data[i-40:i, 0])

final_x_test_data = np.array(final_x_test_data)

Lastly, we need to reshape the final_x_test_data variable to meet TensorFlow standards.

We saw this previously, so the code should need no explanation:

final_x_test_data = np.reshape(final_x_test_data, (final_x_test_data.shape[0],
                                                   final_x_test_data.shape[1],
                                                   1))
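A quick shape check (optional) shows the (samples, timesteps, features) layout that the LSTM layers expect:

print(final_x_test_data.shape)   #(21, 40, 1)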

Actually Making Predictions

After an absurd amount of data preprocessing, we are now ready to make predictions using our test data!

This step is simple: just pass our final_x_test_data object into the predict method called on the rnn object. As an example, here is how you could generate these predictions and store them in an aptly-named variable called predictions:

predictions = rnn.predict(final_x_test_data)
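If you'd like to inspect the output before plotting, the predictions array should contain one scaled value per January trading day (an optional check):

print(predictions.shape)   #(21, 1) - one prediction per trading day, still on the 0-1 scale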

Let’s plot these predictions by running plt.plot(predictions) (note that you’ll need to run plt.clf() to clear your canvas first):

As you can see, the predicted values in this plot are all between 0 and 1. This is because our data set is still scaled! We need to un-scale it for the predictions to have any practical meaning.

The MinMaxScaler class that we used to originally scale our data set comes with a useful inverse_transform method to un-scale the data. Here’s how you could un-scale the data and generate a new plot:

unscaled_predictions = scaler.inverse_transform(predictions)

plt.clf() #This clears the first prediction plot from our canvas

plt.plot(unscaled_predictions)

This looks much better! Anyone who’s followed Facebook’s stock price for any length of time can see that this seems fairly close to where Facebook has actually traded.

Let’s generate a plot that compares our predicted stock prices with Facebook’s actual stock price:

plt.plot(unscaled_predictions, color = '#135485', label = "Predictions")

plt.plot(test_data, color = 'black', label = "Real Data")

plt.title('Facebook Stock Price Predictions')
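Since both series are given label arguments, you will probably also want to display the legend - and, if you are running this as a script rather than in a notebook, show the figure. A minimal addition (plt.legend and plt.show are standard matplotlib calls, not shown in the original snippet):

plt.legend()

plt.show()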

The Full Code For This Tutorial

You can view the full code for this tutorial in this GitHub repository. It is also pasted below for your reference:

#Import the necessary data science libraries

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

#Import the data set as a pandas DataFrame

training_data = pd.read_csv('FB_training_data.csv')

#Transform the data set into a NumPy array

training_data = training_data.iloc[:, 1].values

#Apply feature scaling to the data set

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

training_data = scaler.fit_transform(training_data.reshape(-1, 1))

#Initialize our x_training_data and y_training_data variables 

#as empty Python lists

x_training_data = []

y_training_data =[]

#Populate the Python lists using 40 timesteps

for i in range(40, len(training_data)):

    x_training_data.append(training_data[i-40:i, 0])

    y_training_data.append(training_data[i, 0])

    

#Transforming our lists into NumPy arrays

x_training_data = np.array(x_training_data)

y_training_data = np.array(y_training_data)

#Verifying the shape of the NumPy arrays

print(x_training_data.shape)

print(y_training_data.shape)

#Reshaping the NumPy array to meet TensorFlow standards

x_training_data = np.reshape(x_training_data, (x_training_data.shape[0], 

                                               x_training_data.shape[1], 

                                               1))

#Printing the new shape of x_training_data

print(x_training_data.shape)

#Importing our TensorFlow libraries

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense

from tensorflow.keras.layers import LSTM

from tensorflow.keras.layers import Dropout

#Initializing our recurrent neural network

rnn = Sequential()

#Adding our first LSTM layer

rnn.add(LSTM(units = 45, return_sequences = True, input_shape = (x_training_data.shape[1], 1)))

#Perform some dropout regularization

rnn.add(Dropout(0.2))

#Adding three more LSTM layers with dropout regularization

for i in [True, True, False]:

    rnn.add(LSTM(units = 45, return_sequences = i))

    rnn.add(Dropout(0.2))

#(Original code for the three additional LSTM layers)

# rnn.add(LSTM(units = 45, return_sequences = True))

# rnn.add(Dropout(0.2))

# rnn.add(LSTM(units = 45, return_sequences = True))

# rnn.add(Dropout(0.2))

# rnn.add(LSTM(units = 45))

# rnn.add(Dropout(0.2))

#Adding our output layer

rnn.add(Dense(units = 1))

#Compiling the recurrent neural network

rnn.compile(optimizer = 'adam', loss = 'mean_squared_error')

#Training the recurrent neural network

rnn.fit(x_training_data, y_training_data, epochs = 100, batch_size = 32)

#Import the test data set and transform it into a NumPy array

test_data = pd.read_csv('FB_test_data.csv')

test_data = test_data.iloc[:, 1].values

#Make sure the test data's shape makes sense

print(test_data.shape)

#Plot the test data

plt.plot(test_data)

#Create unscaled training data and test data objects

unscaled_training_data = pd.read_csv('FB_training_data.csv')

unscaled_test_data = pd.read_csv('FB_test_data.csv')

#Concatenate the unscaled data

all_data = pd.concat((unscaled_training_data['Open'], unscaled_test_data['Open']), axis = 0)

#Create our x_test_data object, which has each January day + the 40 prior days

x_test_data = all_data[len(all_data) - len(test_data) - 40:].values

x_test_data = np.reshape(x_test_data, (-1, 1))

#Scale the test data

x_test_data = scaler.transform(x_test_data)

#Grouping our test data

final_x_test_data = []

for i in range(40, len(x_test_data)):

    final_x_test_data.append(x_test_data[i-40:i, 0])

final_x_test_data = np.array(final_x_test_data)

#Reshaping the NumPy array to meet TensorFlow standards

final_x_test_data = np.reshape(final_x_test_data, (final_x_test_data.shape[0], 

                                               final_x_test_data.shape[1], 

                                               1))

#Generating our predicted values

predictions = rnn.predict(final_x_test_data)

#Plotting our predicted values

plt.clf() #This clears the old plot from our canvas

plt.plot(predictions)

#Unscaling the predicted values and re-plotting the data

unscaled_predictions = scaler.inverse_transform(predictions)

plt.clf() #This clears the first prediction plot from our canvas

plt.plot(unscaled_predictions)

#Plotting the predicted values against Facebook's actual stock price

plt.plot(unscaled_predictions, color = '#135485', label = "Predictions")

plt.plot(test_data, color = 'black', label = "Real Data")

plt.title('Facebook Stock Price Predictions')
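#Optional addition (not in the original listing): display the legend so the labels above appear on the plot

plt.legend()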

Summary - The Intuition Behind Recurrent Neural Networks

In this tutorial, you learned how to build and train a recurrent neural network.

Here is a brief summary of what you learned:

  • How to apply feature scaling to a data set that a recurrent neural network will be trained on

  • The role of timesteps in training a recurrent neural network

  • That TensorFlow primarily uses NumPy arrays as the data structure to train models with

  • How to add LSTM and Dropout layers to a recurrent neural network

  • Why dropout regularization is commonly used to avoid overfitting when training neural networks

  • That the Dense layer from TensorFlow is commonly used as the output layer of a recurrent neural network

  • That the compilation step of building a neural network involves specifying its optimizer and its loss function

  • How to make predictions using a recurrent neural network

  • That predictions made using a neural network trained on scaled data must be unscaled to be interpretable by humans

If you found this post helpful, check out my book Pragmatic Machine Learning for a project-based guide on deep learning models covered here and in my other articles.

Translated from: https://www.freecodecamp.org/news/the-ultimate-guide-to-recurrent-neural-networks-in-python/
