RNNs in Practice with TensorFlow 2.0 (Unidirectional RNN, Bidirectional RNN, Attention + RNN)

zhong_ddbb

Categories: Deep Learning, tensorflow2.0. Tags: tensorflow, LSTM, Attention, RNN, Bi-LSTM

Article link: https://blog.csdn.net/zhong_ddbb/article/details/108830276

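Basic structure

A common RNN structure looks like this:

[Figure omitted: RNN structure diagram]

The cell $A$ can be a fully connected (simple) RNN, an LSTM, or a GRU.

TensorFlow 2.0 wraps these three in the following layers: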

keras.layers.SimpleRNN

keras.layers.GRU

keras.layers.LSTM

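For sequence prediction such as machine translation, we need the outputs of all cells $A$, $(h_0, h_1, \cdots, h_t)$. For classification and regression, only the output of the last cell, $h_t$, is needed. The return_sequences parameter controls this: True returns every output, while False (the default) returns only the last one.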
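To make the effect of return_sequences concrete, the following minimal sketch compares the output shapes of an LSTM layer with and without it. The batch of 2 sequences, 5 timesteps, 3 features, and 4 units are arbitrary values chosen for illustration, not taken from the examples in this article.

```python
import numpy as np
import tensorflow as tf

# Toy batch: 2 sequences, 5 timesteps, 3 features per timestep (arbitrary sizes)
x = np.random.rand(2, 5, 3).astype(np.float32)

# return_sequences=True: one output h_i per timestep -> (batch, timesteps, units)
all_outputs = tf.keras.layers.LSTM(4, return_sequences=True)(x)

# Default return_sequences=False: only the last output h_t -> (batch, units)
last_output = tf.keras.layers.LSTM(4)(x)

print(tuple(all_outputs.shape))
print(tuple(last_output.shape))
```

The first shape keeps the time axis, which is what a per-word prediction needs; the second drops it, which is what a per-sequence classifier or regressor needs.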

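Bidirectional structures can also be built:

[Figure omitted: bidirectional RNN structure diagram]

An attention mechanism can be added as well.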

Hands-on: unidirectional RNN

Basic usage and common parameters:

tf.keras.layers.LSTM(
    units,
    activation="tanh",
    use_bias=True,
    dropout=0.0,
    return_sequences=False,
)

Note: SimpleRNN and GRU are used the same way.

For more details, see: Recurrent layers

All examples below use LSTM; SimpleRNN and GRU work similarly.
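Although the three layers share one calling convention, they differ in size: an LSTM holds four sets of weights (three gates plus the candidate state) where SimpleRNN holds one. The sketch below assumes a 10-dimensional input and 128 units, matching the sizes used in this article, builds each layer on a dummy batch, and reads its parameter count with count_params(). The dictionary name param_counts is just for this illustration.

```python
import numpy as np
from tensorflow.keras import layers

# Dummy batch: 1 sequence, 64 timesteps, 10 features (values are irrelevant here)
dummy = np.zeros((1, 64, 10), dtype=np.float32)

param_counts = {}
for cls in (layers.SimpleRNN, layers.GRU, layers.LSTM):
    layer = cls(128)
    layer(dummy)  # calling the layer once creates its weights
    param_counts[cls.__name__] = layer.count_params()

print(param_counts)
```

For SimpleRNN the count is 128 * (10 + 128 + 1) = 17792 (input kernel, recurrent kernel, bias), and the LSTM has four times as many.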

(1) Machine translation as an example

Import libraries

import numpy as np
import tensorflow.keras as keras
import tensorflow.keras.layers as layers

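Build the samples

x_train = np.random.randint(0, 65, (1000,64,10)).astype(np.float32)
y_train = np.random.randint(0, 65, (1000,64,1))

x_train: 1000 randomly generated sentences, each 64 words long, with every word embedded into 10 dimensions (in practice by word2vec, BERT, or a similar technique; random values stand in for the embeddings here). The labels follow the same sentence-by-word layout as the input.

y_train: the translation of x_train: 1000 sentences of 64 words each, where the last dimension of size 1 holds each word's index in the vocabulary.

Model definition and training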

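rnn_units = 128
vocab_size = 65

# Build the input; its shape is that of one sentence: (timesteps, embedding dimension)
_input = keras.Input(shape=(64,10))

# Build the LSTM layer; rnn_units is the dimension of each output h_i
# The layer steps over the 64 timesteps automatically (the same cell A applied 64 times), producing h_1~h_64
# return_sequences=True returns every h_i
x = layers.LSTM(rnn_units,return_sequences=True, recurrent_initializer='orthogonal')(_input)

# Stack another LSTM layer; each h_i now has dimension 90
x = layers.LSTM(90,return_sequences=True,recurrent_initializer='orthogonal')(x)

# Fully connected layer; vocab_size is the size of the vocabulary. Every timestep shares the same weights.
output = layers.Dense(vocab_size,activation='softmax')(x)


# Note the following alternative:
# lay = layers.Dense(vocab_size,activation='softmax')
# output = layers.TimeDistributed(lay)(x)
# TimeDistributed applies the same layer to every timestep, so the weights are shared here too;
# for a Dense layer on a 3D input the two forms are equivalent.

# Declare the model
model = keras.Model(_input, output)
# Compile the model
model.compile(loss='sparse_categorical_crossentropy', 
              optimizer = 'adam',
              metrics=['acc']
             )

# Train the model
model.fit(x_train,y_train,epochs=10,batch_size=16)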
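A Dense layer applied to a 3D tensor acts on the last axis only, so one weight matrix serves every timestep, and wrapping that same layer in TimeDistributed gives the same result. The following sketch checks this on a toy input; the sizes are arbitrary and the variable names are chosen for this illustration only.

```python
import numpy as np
from tensorflow.keras import layers

# Toy batch: 2 sequences, 5 timesteps, 3 features
x = np.random.rand(2, 5, 3).astype(np.float32)

dense = layers.Dense(4)

# Apply the layer directly to the 3D input
direct = dense(x)

# Apply the very same layer instance through TimeDistributed
wrapped = layers.TimeDistributed(dense)(x)

same = np.allclose(np.asarray(direct), np.asarray(wrapped), atol=1e-5)
kernel_shape = tuple(dense.kernel.shape)  # no timestep axis in the weights

print(same, kernel_shape)
```

The kernel has shape (features, units) with no time dimension, which is why both forms share parameters across timesteps.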

The model structure looks like this:

[Figure omitted: model structure diagram]

You can inspect one of the predictions:

np.argmax(model.predict(x_train)[0],axis=-1)

[Figure omitted: screenshot of the predicted word indices]
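Because the labels in this example are random, the model cannot do better than chance, so it helps to know what chance looks like. The sketch below computes sparse categorical cross-entropy by hand in NumPy for a uniform prediction over the 65 classes. The result equals ln(65), roughly 4.17, which is the loss level that training on such data should hover around. The helper name sparse_cce is made up for this illustration and is not a Keras function.

```python
import numpy as np

def sparse_cce(y_true, y_prob):
    """Mean sparse categorical cross-entropy.

    y_true: integer labels, shape (batch, timesteps, 1)
    y_prob: predicted probabilities, shape (batch, timesteps, vocab_size)
    """
    # Pick the predicted probability of the true class at every position
    picked = np.take_along_axis(y_prob, y_true, axis=-1)
    return float(-np.log(picked).mean())

vocab_size = 65
y_true = np.random.randint(0, vocab_size, (8, 64, 1))

# A model that knows nothing assigns probability 1/65 to every word
uniform = np.full((8, 64, vocab_size), 1.0 / vocab_size)

baseline_loss = sparse_cce(y_true, uniform)
print(baseline_loss, np.log(vocab_size))
```

Note how the label tensor keeps a trailing dimension of size 1 while the prediction tensor has one entry per vocabulary word, mirroring the shapes of y_train and the model output.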

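(2) Classification and regression

Here return_sequences=True is not needed; with the default return_sequences=False the layer returns only the last output.

(2-1) Regression

Required libraries and dataset construction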

import numpy as np
import tensorflow.keras as keras
import tensorflow.keras.layers as layers
import tensorflow as tf

x_train = np.random.rand(1000,64,1).astype(np.float32) 
y_train = np.random.rand(1000,1)

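Build and train the network

input_layer = keras.Input(shape=(64, 1))
lstm = layers.LSTM(100)(input_layer)
dense1 = layers.Dense(20, activation='relu')(lstm)
# The sigmoid output fits here only because the targets lie in [0, 1); use a linear output for unbounded targets
dense2 = layers.Dense(1, activation='sigmoid')(dense1)

model = keras.Model(input_layer,dense2)
# Regression uses a mean squared error loss rather than binary cross-entropy
model.compile("adam", loss='mean_squared_error',metrics=['mean_absolute_error'])
history = model.fit(x_train, y_train, epochs=20,batch_size=512)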
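The targets here are uniform random numbers unrelated to the inputs, so the best any model can do is predict their mean. As a reference point for reading the training log, this sketch estimates the mean squared error of that constant prediction, which for uniform targets on [0, 1) is the variance 1/12, about 0.083. The seed and sample size are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.random((100_000, 1))  # uniform targets in [0, 1)

# Best constant prediction: the mean of the targets
constant_pred = np.full_like(y, y.mean())
baseline_mse = float(np.mean((y - constant_pred) ** 2))

print(baseline_mse)
```

A mean squared error that settles near this value suggests the network has learned the mean of the targets and nothing more, which is the expected outcome on random data.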

(2-2) Classification

import numpy as np
import tensorflow.keras as keras
import tensorflow.keras.layers as layers
import tensorflow as tf


x_train = np.random.rand(1000,64,1).astype(np.float32) 
y_train = np.random.randint(0,2,(1000,1))

input_layer = keras.Input(shape=(64, 1))
lstm = layers.LSTM(100)(input_layer)
dense1 = layers.Dense(20, activation='relu')(lstm)
dense2 = layers.Dense(2, activation='softmax')(dense1)

model = keras.Model(input_layer,dense2)
model.compile("adam", loss='sparse_categorical_crossentropy',metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=2,batch_size=512)
print(model.predict(x_train))

[Figure omitted: screenshot of the predicted class probabilities]

Hands-on: bidirectional RNN

Basic usage

tf.keras.layers.Bidirectional(
    layer, merge_mode="concat"
)

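Parameters

Parameter	Description
layer	A recurrent layer instance such as LSTM, GRU, or SimpleRNN.
merge_mode	How the outputs of the forward and backward RNNs are combined. One of {'sum', 'mul', 'concat', 'ave', None}. If None, the outputs are not combined and are returned as a list. Defaults to 'concat'.
backward_layer	Optional recurrent layer instance used to process the input in the backward direction. If `backward_layer` is not provided, the instance passed as `layer` is used to generate the backward layer automatically. A provided `backward_layer` should match the `layer` argument in its attributes; in particular it should have the same values for `stateful`, `return_state`, `return_sequences`, and so on. In addition, `backward_layer` and `layer` should have different `go_backwards` values (one True, one False).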
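The effect of merge_mode is easiest to see in the output shapes. In this sketch (toy sizes and a hypothetical helper bi_lstm, both chosen only for illustration), 'concat' doubles the last dimension, 'sum' keeps it, and None returns the two directions separately.

```python
import numpy as np
from tensorflow.keras import layers

# Toy batch: 2 sequences, 5 timesteps, 3 features
x = np.random.rand(2, 5, 3).astype(np.float32)

def bi_lstm(merge_mode):
    # A fresh bidirectional LSTM with 4 units per direction
    return layers.Bidirectional(
        layers.LSTM(4, return_sequences=True), merge_mode=merge_mode
    )

concat_out = bi_lstm("concat")(x)             # last dimension 4 + 4
sum_out = bi_lstm("sum")(x)                   # element-wise sum, last dimension 4
forward_out, backward_out = bi_lstm(None)(x)  # two separate tensors

print(tuple(concat_out.shape), tuple(sum_out.shape), tuple(forward_out.shape))
```

With the default 'concat', any layer placed after a bidirectional layer sees twice the number of units per timestep.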

Below is an example that uses a bidirectional LSTM for machine translation.


import numpy as np
import tensorflow.keras as keras
import tensorflow.keras.layers as layers

# Note the shapes of x_train and y_train
x_train = np.random.randint(0, 65, (1000,64,10)).astype(np.float32)
y_train = np.random.randint(0, 65, (1000,64,1))

rnn_units = 128
vocab_size = 65

# Build the input; its shape is that of one sentence: (timesteps, embedding dimension)
_input = keras.Input(shape=(64,10))

# Build the LSTM layers; rnn_units is the dimension of each h_i in one direction
# The layer steps over the 64 timesteps automatically, producing h_1~h_64
# return_sequences=True returns every h_i; the default merge_mode="concat" joins the two directions
x = layers.Bidirectional(layers.LSTM(rnn_units,return_sequences=True, recurrent_initializer='orthogonal'))(_input)
x = layers.Bidirectional(layers.LSTM(90,return_sequences=True, recurrent_initializer='orthogonal'))(x)

# Fully connected layer; vocab_size is the size of the vocabulary
output = layers.Dense(vocab_size,activation='softmax')(x)
model = keras.Model(_input,output)

model.compile(loss='sparse_categorical_crossentropy', 
              optimizer = 'adam',
              metrics=['acc']
             )

model.fit(x_train,y_train,epochs=1,batch_size=16)

# Run a prediction with the model
print(np.argmax(model.predict(x_train)[0],-1))

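[Figure omitted: screenshot of the predicted word indices]

Below is a bidirectional RNN whose forward direction uses an LSTM and whose backward direction uses a GRU.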


import numpy as np
import tensorflow.keras as keras
import tensorflow.keras.layers as layers

# Note the shapes of x_train and y_train
x_train = np.random.randint(0, 65, (1000,64,10)).astype(np.float32)
y_train = np.random.randint(0, 65, (1000,64,1))

rnn_units = 128
vocab_size = 65

# Build the input; its shape is that of one sentence: (timesteps, embedding dimension)
_input = keras.Input(shape=(64,10))

# Forward direction: LSTM; backward direction: GRU with go_backwards=True
# Both layers must agree on return_sequences (True here, so every h_i is returned)
# rnn_units is the dimension of each h_i in one direction

forward_layer = layers.LSTM(rnn_units,return_sequences=True)
backward_layer = layers.GRU(rnn_units,activation='relu',return_sequences=True,go_backwards=True)
x = layers.Bidirectional(forward_layer,backward_layer=backward_layer)(_input)


# Fully connected layer; vocab_size is the size of the vocabulary
output = layers.Dense(vocab_size,activation='softmax')(x)
model = keras.Model(_input,output)

model.compile(loss='sparse_categorical_crossentropy', 
              optimizer = 'adam',
              metrics=['acc']
             )

model.fit(x_train,y_train,epochs=1,batch_size=16)
print(np.argmax(model.predict(x_train)[0],-1))

Classification and regression with a bidirectional RNN work as in the unidirectional case: simply set return_sequences=False (the default).
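As a minimal sketch of that case, the following builds a two-class classifier on top of a bidirectional LSTM. It reuses the input shape (64, 1) and the 100 units of the unidirectional classification example; the variable names are chosen for this illustration, and training data is left out.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64, 1))

# return_sequences defaults to False, so each direction contributes its final output;
# with merge_mode="concat" the result has 2 * 100 = 200 features
features = layers.Bidirectional(layers.LSTM(100))(inputs)
outputs = layers.Dense(2, activation="softmax")(features)

model = keras.Model(inputs, outputs)
model.compile("adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

print(tuple(features.shape), model.output_shape)
```

The model can then be trained with model.fit on inputs of shape (n, 64, 1) and integer labels of shape (n, 1), exactly as in the unidirectional example.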

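Hands-on: Attention + RNN

For the principles of the attention mechanism, see: Understanding the Core of BERT: Self-Attention and the Transformer

In TensorFlow 2.0 it is implemented as follows: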

tf.keras.layers.Attention(use_scale=False, **kwargs)

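inputs: a list of the following tensors (the third is optional):

query: Query tensor of shape [batch_size, Tq, dim].
value: Value tensor of shape [batch_size, Tv, dim].
key: Key tensor of shape [batch_size, Tv, dim]. Optional; if not given, value is used as the key as well, which is the most common case.

Output shape of Attention: [batch_size, Tq, dim]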
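To show where these shapes come from, here is a NumPy sketch of unscaled dot-product (Luong-style) attention, which is the computation the Keras documentation describes for layers.Attention with default arguments. The function name dot_product_attention and the toy sizes are assumptions made for this illustration; masking, dropout, and the learned scale of use_scale=True are left out.

```python
import numpy as np

def dot_product_attention(query, value, key=None):
    """query: [batch, Tq, dim]; value and key: [batch, Tv, dim]."""
    if key is None:
        key = value  # same fallback as described above
    # Similarity of every query position to every key position: [batch, Tq, Tv]
    scores = query @ key.transpose(0, 2, 1)
    # Softmax over the Tv axis (subtracting the max for numerical stability)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values: [batch, Tq, dim]
    return weights @ value, weights

q = np.random.rand(2, 4, 8)  # batch_size=2, Tq=4, dim=8
v = np.random.rand(2, 6, 8)  # batch_size=2, Tv=6, dim=8
attended, attn_weights = dot_product_attention(q, v)

print(attended.shape, attn_weights.shape)
```

Each query position receives one weighted average of the value vectors, so the output keeps the query's length Tq and the value's feature size dim.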

Below, still in the machine translation setting, we combine an LSTM with attention.

Import the required libraries and build the data

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

x_train = np.random.randint(0, 65, (10000,64,10)).astype(np.float32)
y_train = np.random.randint(0, 65, (10000,64,1))

Implementing multi-head self-attention

# Multi-head mechanism
def muti_head_attention(_input,d=8,n_attention_head=2):
    """
    A single layer of multi-head attention
    @param _input: input tensor (?, n_feats, n_dim)
    @param d: dimension of the projected Q and V (V also serves as K)
    @param n_attention_head: number of attention heads
    """
    attention_heads = []

    for i in range(n_attention_head):
        embed_q = layers.Dense(d)(_input)   # project into a separate space to get this head's Query
        embed_v = layers.Dense(d)(_input)   # project into a separate space to get this head's Value (also used as Key)
        attention_output  = layers.Attention()([embed_q,embed_v])  
        # Store the result of each head
        attention_heads.append(attention_output)
    
    # Concatenate multiple heads; return a single head directly
    if n_attention_head > 1:
        muti_attention_output = layers.Concatenate(axis=-1)(attention_heads)
    else:
        muti_attention_output = attention_output
    return muti_attention_output

Model

# Model input
_input = tf.keras.Input(shape=(64,10))
# LSTM layer
lstm_layer = layers.LSTM(64,return_sequences=True)(_input)
# Stack multi-head attention layers, feeding the LSTM output in directly
x = muti_head_attention(lstm_layer,8,1)
x = muti_head_attention(x,32,3)
# Output
output = layers.Dense(65,activation='softmax')(x)

model = tf.keras.Model(_input,output)

model.compile(loss='sparse_categorical_crossentropy', 
              optimizer = 'adam',
              metrics=['acc'])
# Train the model
model.fit(x_train,y_train,epochs=1,batch_size=256)
# Predicted word indices for the third sample
y_pred = np.argmax(model.predict(x_train)[2],-1)
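Later TensorFlow releases (2.4 and up, so not the 2.0 this article targets) ship a built-in layers.MultiHeadAttention that could stand in for the hand-written helper. The following is a sketch under that assumption, not a drop-in equivalent: unlike the helper, the built-in layer also projects the keys and projects the concatenated heads back to the query's feature size. The random array lstm_out is a stand-in for the LSTM output.

```python
import numpy as np
from tensorflow.keras import layers

# Stand-in for the LSTM output: batch of 2, 64 timesteps, 64 features
lstm_out = np.random.rand(2, 64, 64).astype(np.float32)

# Self-attention: the same tensor serves as query and value (and, by default, key)
mha = layers.MultiHeadAttention(num_heads=3, key_dim=32)
mha_out = mha(lstm_out, lstm_out)

print(tuple(mha_out.shape))
```

Because the output is projected back to the query's feature size, the result keeps the shape of the LSTM output, whereas the hand-written helper returns n_attention_head * d features per timestep.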
