A Comprehensive Guide to Working with Recurrent Neural Networks in Keras

Recurrent Neural Networks are designed to handle sequential data by incorporating the essential dimension of time. This type of data appears everywhere from the prediction of stock prices to the modelling of language, so it's an essential skill set for anyone interested in getting into deep learning. This article will cover:

  • A demonstration of properly vectorizing data for a sequential model

  • A discussion on how to shape data for recurrent neural networks

  • Implementation of RNNs, LSTMs, GRUs, and Embeddings

  • Best practices and tips in building effective deep models


Let’s get into it!

We will be training a recurrent neural network to predict Amazon stock prices. We can collect this data with pandas_datareader. The stock data is stored in a DataFrame named df.

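One way this might look (the data source and date range here are assumptions; any provider supported by pandas_datareader works):

import pandas_datareader.data as web

# Daily Amazon stock data; the 'yahoo' source and the dates are placeholder choices
df = web.DataReader('AMZN', 'yahoo', start='2015-01-01', end='2020-01-01')
print(df.head())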

[Image: the first few rows of the Amazon stock DataFrame df]

We’ll predict the closing prices, which can be accessed in the Close column. These prices will be separated into a training set and a testing set.

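A sketch of the split; the 80/20 ratio is an arbitrary choice for illustration:

close = df['Close'].values

split = int(len(close) * 0.8)               # 80% for training, 20% for testing
train, test = close[:split], close[split:]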

Currently, train and test are long sequences of stock prices. We would like to convert these sequences into x and y sets, where x represents a sequence of prices and y is the next price.

Consider the following sequence:

[Image: an example sequence of prices]

We would want to generate the following sequences, in this case with a ‘window size’ of three.

[Image: the sequence split into windows of three prices (x) and the next price (y)]

This can be programmed using the following code:

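One possible version, where make_sequences is a hypothetical helper and the window size of 50 matches the sequence length used later:

def make_sequences(data, window_size=50):
    # Slide a window across the series: each x is window_size prices, y is the next price
    x, y = [], []
    for i in range(len(data) - window_size):
        x.append(data[i:i + window_size])
        y.append(data[i + window_size])
    return x, y

X_train, y_train = make_sequences(train)
X_test, y_test = make_sequences(test)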

Great! We've collected our training data. One more thing, though: currently the data is in list form. In order to run it through a Keras model, data must almost always be in array form. Additionally, X must be a three-dimensional array; in this case, we reshape X so that its third dimension has size 1.

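A sketch of that conversion, assuming the window size of 50 from above:

import numpy as np

# Convert the lists to arrays; X gets a third dimension of size 1 (one value per timestep)
X_train = np.array(X_train).reshape(-1, 50, 1)
X_test = np.array(X_test).reshape(-1, 50, 1)
y_train = np.array(y_train)
y_test = np.array(y_test)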

It’s worth exploring why RNNs require a three-dimensional input (at least, their implementations in Keras). In the case of stock prediction, at each time step, there is only one data point — the stock price.

However, consider an RNN learning to generate a sequence of movements up, down, right, and left; each time step has four values (a 1 or a 0 for each direction). For example, right might be represented as [0, 0, 1, 0]. If there are 5000 items in the training set and each has a sequence length of 10 movements, the shape of the data would be (5000, 10, 4).

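As a quick sketch of that shape (the names here are hypothetical):

import numpy as np

X_moves = np.zeros((5000, 10, 4))   # 5000 sequences, 10 timesteps each, 4 components per timestep
X_moves[0, 0] = [0, 0, 1, 0]        # the first timestep of the first sequence is 'right'
print(X_moves.shape)                # (5000, 10, 4)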

Similarly, in character-by-character text generation, each time step is a vector of at least 26 zeros and ones, one for each letter of the alphabet. Often, additional characters such as punctuation or spaces are included. This idea will be revisited later.

Now that the training data has been created, we can get started constructing the recurrent neural network. The simplest RNN has two layers: a standard recurrent layer and a standard dense layer, which will be connected through a Sequential model. We can go ahead and import these.

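Assuming the standalone keras package, the imports might look like this:

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense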

We can begin creating a recurrent neural network now. Although it's not entirely accurate, one can think of the 10 in SimpleRNN(10, …) as the layer having '10 neurons', much like a dense layer. Since we are predicting single values on a continuous scale, the last dense layer has one neuron and a linear activation.

Because our sequences have 50 elements and each can be represented using only one value (as opposed to something like 26 for the alphabet), the input shape of a sequence is (50, 1).

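A minimal sketch of such a model (the choice of 10 units is arbitrary):

model = Sequential()
model.add(SimpleRNN(10, input_shape=(50, 1)))   # recurrent layer with '10 neurons'
model.add(Dense(1, activation='linear'))        # single continuous output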

Lastly, the Keras model must be compiled with a loss (mean squared error is the default choice for regression), an optimizer (Adam is a sensible default), and optional metrics to track progress (mean absolute error here). We'll train the model on X_train and y_train for 500 epochs and save the training history to history.

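A sketch of the compile-and-fit step under those defaults:

model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['mean_absolute_error'])

history = model.fit(X_train, y_train, epochs=500)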

A recurrent layer can be thought of as parsing several inputs while taking sequential order into account; if it helps, it can be thought of as a derivative, finding a generalizable pattern across sequences. Stacking two recurrent layers, then, is like taking a double derivative, or the 'difference of the difference'. Much like stacking multiple convolutional layers, it allows for more complex relationships to be identified.

If one tries to stack two recurrent layers naively, it won’t work: a parameter, return_sequences=True, must be added. This returns the output as a sequence that can be inputted into another recurrent layer. We may decide to design a deep recurrent neural network with several stacked layers as follows:

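One possible stacked design (the layer sizes are arbitrary):

model = Sequential()
model.add(SimpleRNN(10, return_sequences=True, input_shape=(50, 1)))
model.add(SimpleRNN(10, return_sequences=True))
model.add(SimpleRNN(10))                  # the last recurrent layer does not return sequences
model.add(Dense(1, activation='linear'))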

LSTMs, or Long Short-Term Memory networks, are an improvement upon naïve recurrent neural networks because they can 'memorize' important information across long input sequences. The interface for using LSTMs is the same as for the SimpleRNN layer.

Like RNNs, LSTMs can be stacked to develop complexity and deeper understanding of patterns in the inputs. Note that because Long Short-Term Memory networks have a variety of memory mechanisms, they are more expensive to train than standard recurrent neural networks.

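A sketch of a stacked LSTM variant, again with arbitrary layer sizes:

from keras.layers import LSTM

model = Sequential()
model.add(LSTM(10, return_sequences=True, input_shape=(50, 1)))
model.add(LSTM(10))
model.add(Dense(1, activation='linear'))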

Similarly, the Gated Recurrent Unit is another recurrent layer. Like the LSTM, its goal is to retain important information over long sequences of inputs through gates, but it approaches the task in a different way. It is implemented just like the SimpleRNN and LSTM layers, at keras.layers.GRU.

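A minimal GRU version might look like this:

from keras.layers import GRU

model = Sequential()
model.add(GRU(10, input_shape=(50, 1)))
model.add(Dense(1, activation='linear'))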

A note on our task of stock forecasting — it’s bad practice to have the network predict stocks in their raw values (e.g. $576, $598, $589, …) because of extrapolation. The main idea is that stocks, especially those of a high-growth company like Amazon, are continually rising, and that it’s difficult for models to predict values in a range it hasn’t been trained on.

If one were to train a model on data from 2000 to 2020 where prices ranged from x to y, the model would have difficulty predicting prices above y or below x. Theoretically, a recurrent network should be able to pick up such base-level relationships by itself, but it's always good to reduce the work it needs to do.

With any forecasting task, difference the data beforehand. Differencing transforms the dataset such that each value is a change from the previous one, and can either be done absolutely (+3, -4, +2, +1) or on a percentage scale (95%, 103%, 105%). The benefit of differencing is that values lie on a more centered and stationary scale that is easier for a model to operate on.

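With pandas, both options are one-liners (assuming the closing prices are still in df['Close']):

diffed = df['Close'].diff().dropna()      # absolute differencing: change from the previous value
pct = df['Close'].pct_change().dropna()   # percentage differencing: relative change from the previous value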

Stock forecasting is a simpler use of recurrent neural networks: aside from differencing, there is little preprocessing to be done, since the data is inherently numerical. Other applications of recurrent neural networks may not be so clean, particularly text.

For example, if I were to train a recurrent neural network to predict the next character in a sequence, the vectorized input of 'abe' would look like this:

'a' = [1, 0, 0, 0, 0, 0, …]
'b' = [0, 1, 0, 0, 0, 0, …]
'e' = [0, 0, 0, 0, 1, 0, …]

To put some jargon in the example: a 'sequence' is a collection of timesteps, in this case ['a', 'b', 'e']. A timestep is a vector of fixed length; for instance, [1, 0, 0, 0, 0, …] for 'a'. The dataset is a collection of sequences, so the shape of a sequential dataset is (number of sequences, number of timesteps within a sequence, number of components in a timestep).

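A sketch of this vectorization; vocab, char_to_index, and vectorize are hypothetical names, and the vocabulary is limited to the 26 lowercase letters:

import numpy as np

vocab = 'abcdefghijklmnopqrstuvwxyz'                  # 26-letter vocabulary
char_to_index = {c: i for i, c in enumerate(vocab)}

def vectorize(sequence):
    # One sequence -> (number of timesteps, number of components in a timestep)
    x = np.zeros((len(sequence), len(vocab)))
    for t, char in enumerate(sequence):
        x[t, char_to_index[char]] = 1
    return x

print(vectorize('abe').shape)   # (3, 26)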

It's usually advisable to use embeddings with large input sequences; an embedding maps each token in a sequence to a point in an embedding space. This can improve performance and training time: in a properly developed embedding space, tokens (words) that have similar meanings in the context of the dataset are placed physically close together in the embedding.

When embeddings are being used, three parameters are required: the input dimensions, the output dimensions, and the input length, which are 1000, 64, and 50 in this case, respectively.

from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding

model = Sequential()
# Embedding: vocabulary size 1000, output dimension 64, sequences of length 50
model.add(Embedding(1000, 64, input_length=50))
# LSTM layer with 64 units; return_sequences=True allows another recurrent layer to follow
model.add(LSTM(64, return_sequences=True))
...
  • The input dimension refers to the size of the vocabulary. For instance, if there were 27 total unique characters in the vocabulary (the alphabet and the space), the inputted value would be 27. This is the third number in the tuple (number of sequences, number of timesteps within a sequence, number of components in a timestep).
  • The output dimension refers to the dimensionality of the output. As mentioned before, think of an embedding as a specialized form of dimensionality reduction.
  • The input length refers to the number of timesteps within each sequence, if it is fixed. This is the second number in the tuple (number of sequences, number of timesteps within a sequence, number of components in a timestep).

Afterwards, recurrent, LSTM, and GRU layers can be stacked on top of the embedding layer. It’s always wise to stack several Dense layers, along with the standard ANN shebang — batch norm, dropout, etc. Once you have a solid understanding of the dynamics of recurrent neural networks, they’re not hard at all to implement.

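A sketch of a fuller model along those lines; the layer sizes, dropout rate, and final layer are placeholders for whatever the task requires:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout, BatchNormalization

model = Sequential()
model.add(Embedding(1000, 64, input_length=50))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(64))
model.add(Dense(32, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.2))
model.add(Dense(1000, activation='softmax'))   # e.g. a softmax over the vocabulary for next-character prediction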

Thanks for reading!

All images except for the cover image were created by the author.

Translated from: https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f
