tensorflow RNN-LSTM-GRU

Lzj000lzj

Published 2019-06-03 17:02:32


RNN

[Figure 1: recurrent neural network]

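Figure 1 shows a recurrent neural network. At each time step, a node receives input from the previous node: it takes the current input x_i and the previous node's output h_{i-1}, performs a computation on them, and produces an output h_i. That output is then passed to the next node. The process continues until every time step has been evaluated.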
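
The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the Keras implementation: the function name `rnn_forward`, the weight names `W_x`, `W_h`, `b`, the tanh activation, and the toy sizes are all assumptions made for the example.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Run a simple tanh RNN over a sequence and return every hidden state."""
    h = np.zeros(W_h.shape[0])           # h_0: there is no previous output yet
    outputs = []
    for x in xs:                         # one iteration per time step
        # h_i depends on the current input x_i and the previous output h_{i-1}
        h = np.tanh(W_x @ x + W_h @ h + b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))             # 5 time steps, 3 features each
W_x = rng.normal(size=(4, 3)) * 0.1      # input-to-hidden weights
W_h = rng.normal(size=(4, 4)) * 0.1      # hidden-to-hidden weights
b = np.zeros(4)
hs = rnn_forward(xs, W_x, W_h, b)
print(hs.shape)  # (5, 4)
```

The same weights are reused at every time step; only the hidden state h changes as the sequence is consumed.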

LSTM

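The drawback of an RNN is that, as the number of time steps grows, it can no longer draw context from time steps far in the past: an RNN only retains short-term memory of the sequence.
An LSTM network has the same overall structure as an RNN, but its repeating module A performs more operations. This enhanced repeating module lets the LSTM network remember long-term dependencies. Let us break down each of the operations that help the network remember.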

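Forget gate operation

The concatenation of the current input and the previous time step's output is passed to a sigmoid function, which outputs a value f_t between 0 and 1. f_t is then multiplied element-wise with the previous cell state c_{t-1}: where f_t is 0 the corresponding entry is forgotten, and where it is 1 the entry is kept.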
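Update gate operation

The figure above shows the "update gate operation". We concatenate the value at the current time step with the representation learned at the previous time step. Passing the concatenated value through a tanh function produces the candidate values C_t~. The gate i_t is computed with the same formula as f_t but with its own parameters, and it scales the candidate, so the new cell state is c_t = f_t * c_{t-1} + i_t * C_t~.

Output gate operation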
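
Putting the gates together, one LSTM time step can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the output gate follows the standard formulation (o_t is a sigmoid of the same concatenated input, and h_t = o_t * tanh(c_t)), and the function name `lstm_step`, the stacked weight matrix `W`, and the gate ordering inside it are choices made for illustration, not the Keras internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps the concatenation [h_prev, x_t] to four
    stacked pre-activations: forget, update (input), candidate, output."""
    units = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0 * units:1 * units])      # forget gate
    i_t = sigmoid(z[1 * units:2 * units])      # update (input) gate
    c_tilde = np.tanh(z[2 * units:3 * units])  # candidate values C_t~
    o_t = sigmoid(z[3 * units:4 * units])      # output gate (assumed standard form)
    c_t = f_t * c_prev + i_t * c_tilde         # forget old entries, add new ones
    h_t = o_t * np.tanh(c_t)                   # expose part of the cell state
    return h_t, c_t

rng = np.random.default_rng(0)
units, features = 4, 3
W = rng.normal(size=(4 * units, units + features)) * 0.1
b = np.zeros(4 * units)
h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, features)):     # 5 time steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because the cell state c_t is updated additively and gated by f_t, information can persist across many steps when the forget gate stays close to 1.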

GRU

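With larger networks, training an LSTM takes noticeably longer than training a plain RNN. If you want to cut training time while still using a network that can remember long-term dependencies, there is an alternative to the LSTM: the gated recurrent unit (GRU).
A GRU uses an update gate and a reset gate. The update gate decides how much past information is let through, and the reset gate decides how much past information is discarded. In the figure above, z_t denotes the update gate operation: a sigmoid function decides which values pass through. r_t denotes the reset gate: when the candidate state is computed, r_t is multiplied element-wise with the previous time step's state before it is combined with the current input, which determines what is dropped from earlier time steps.
Although a GRU is computationally cheaper than an LSTM, it has fewer gates, and its accuracy can fall behind that of an LSTM on some tasks. A GRU is therefore a reasonable choice when faster training is needed and compute is limited.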
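
One GRU time step can be sketched in NumPy as below. This is an illustration under assumptions, not the Keras implementation: the names `gru_step`, `W_z`, `W_r`, `W_h` and the bias names are hypothetical, and the final mixing uses the convention in which z_t weights the previous state (some references swap the roles of z_t and 1 - z_t).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU time step on the concatenation [h_prev, x_t]."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat + b_z)              # update gate
    r_t = sigmoid(W_r @ concat + b_r)              # reset gate
    # The reset gate scales the previous state before the candidate is formed
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)
    # z_t close to 1 keeps the past; z_t close to 0 replaces it with the candidate
    return z_t * h_prev + (1.0 - z_t) * h_tilde

rng = np.random.default_rng(0)
units, features = 4, 3
shape = (units, units + features)
W_z, W_r, W_h = (rng.normal(size=shape) * 0.1 for _ in range(3))
b_z, b_r, b_h = np.zeros(units), np.zeros(units), np.zeros(units)
h = np.zeros(units)
for x_t in rng.normal(size=(5, features)):         # 5 time steps
    h = gru_step(x_t, h, W_z, W_r, W_h, b_z, b_r, b_h)
print(h.shape)  # (4,)
```

Compared with the LSTM there is no separate cell state and one gate fewer, which is where the saving in parameters and computation comes from.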

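A simple LSTM example

Data preparation

from tensorflow import keras
from tensorflow.keras import layers

num_words = 30000  # keep only the 30,000 most frequent words
maxlen = 200       # every review is padded or truncated to 200 tokens

# Load the IMDB reviews as lists of word indices with binary sentiment labels
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=num_words)
# Before padding, each review is a variable-length list
print(x_train.shape, ' ', y_train.shape)
print(x_test.shape, ' ', y_test.shape)

# Pad short reviews with zeros at the end so every sample has length maxlen
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen, padding='post')
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen, padding='post')

print(x_train.shape, ' ', y_train.shape)
print(x_test.shape, ' ', y_test.shape)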
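
To make the effect of `padding='post'` concrete, the sketch below imitates it in plain NumPy. `pad_post` is a hypothetical helper written for this illustration, not part of Keras; it assumes the Keras default `truncating='pre'`, under which a review longer than `maxlen` keeps its last `maxlen` tokens.

```python
import numpy as np

def pad_post(sequences, maxlen, value=0):
    """Hypothetical stand-in for pad_sequences(..., padding='post')
    with the default truncating='pre'."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]      # too long: keep the last maxlen tokens
        out[row, :len(kept)] = kept     # too short: zeros fill the tail
    return out

print(pad_post([[1, 2], [3, 4, 5, 6, 7]], maxlen=4))
# [[1 2 0 0]
#  [4 5 6 7]]
```

Note that padding is appended at the end while truncation removes tokens from the front, so the two settings are independent of each other.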

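LSTM model

def lstm_model():
    model = keras.Sequential([
        # Map each word index to a 32-dimensional vector
        layers.Embedding(input_dim=num_words, output_dim=32, input_length=maxlen),
        # The first LSTM returns its hidden state at every time step
        layers.LSTM(32, return_sequences=True),
        # The second LSTM has a single unit with a sigmoid activation;
        # its final output is used as the predicted probability
        layers.LSTM(1, activation='sigmoid', return_sequences=False)
    ])
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss=keras.losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    return model

model = lstm_model()
model.summary()

Output: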
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, 200, 32)           960000    
_________________________________________________________________
lstm (LSTM)                  (None, 200, 32)           8320      
_________________________________________________________________
lstm_1 (LSTM)                (None, 1)                 136       
=================================================================
Total params: 968,456
Trainable params: 968,456
Non-trainable params: 0
_________________________________________________________________
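
The parameter counts in the summary above can be checked by hand. The sketch below uses the usual formula for an LSTM layer: four sets of weights (three gates plus the candidate), each with input weights, recurrent weights, and a bias. The helper names `embedding_params` and `lstm_params` are made up for this example.

```python
def embedding_params(vocab_size, output_dim):
    # One output_dim-sized vector per word index
    return vocab_size * output_dim

def lstm_params(input_dim, units):
    # 4 weight sets, each: units x (input_dim + units) weights plus units biases
    return 4 * (units * (input_dim + units) + units)

print(embedding_params(30000, 32))  # 960000
print(lstm_params(32, 32))          # 8320
print(lstm_params(32, 1))           # 136
```

The three values add up to 968,456, matching the total reported by `model.summary()`.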

Model training

%%time
import matplotlib.pyplot as plt

# Hold out 10% of the training data for validation
history = model.fit(x_train, y_train, batch_size=64, epochs=5, validation_split=0.1)

# Plot training and validation accuracy per epoch
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['training', 'validation'], loc='upper left')
plt.show()

GRU

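Building the model

def gru_model():
    model = keras.Sequential([
        # Same embedding as in the LSTM model
        layers.Embedding(input_dim=num_words, output_dim=32, input_length=maxlen),
        # The first GRU returns its hidden state at every time step
        layers.GRU(32, return_sequences=True),
        # The second GRU has a single unit with a sigmoid activation;
        # its final output is used as the predicted probability
        layers.GRU(1, activation='sigmoid', return_sequences=False)
    ])
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss=keras.losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    return model

model = gru_model()
model.summary()

Output: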
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 200, 32)           960000    
_________________________________________________________________
gru (GRU)                    (None, 200, 32)           6336      
_________________________________________________________________
gru_1 (GRU)                  (None, 1)                 105       
=================================================================
Total params: 966,441
Trainable params: 966,441
Non-trainable params: 0
_________________________________________________________________
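
The GRU counts above can be reproduced the same way. A GRU has three sets of weights (update gate, reset gate, candidate). The numbers in the summary are consistent with the TensorFlow 2 default `reset_after=True`, which keeps two bias vectors per weight set; that reading of the summary is an inference from the arithmetic, and `gru_params` is a hypothetical helper written for this check.

```python
def gru_params(input_dim, units, reset_after=True):
    # reset_after=True keeps separate input and recurrent biases
    biases = 2 * units if reset_after else units
    # 3 weight sets, each: units x (input_dim + units) weights plus biases
    return 3 * (units * (input_dim + units) + biases)

print(gru_params(32, 32))  # 6336
print(gru_params(32, 1))   # 105
```

Together with the 960,000 embedding parameters this gives 966,441, about 2,000 fewer than the LSTM model, since each recurrent layer has three weight sets instead of four.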

Model training

%%time
# Hold out 10% of the training data for validation
history = model.fit(x_train, y_train, batch_size=64, epochs=5, validation_split=0.1)

# Plot training and validation accuracy per epoch
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['training', 'validation'], loc='upper left')
plt.show()