Differences Between LSTM (GRU) in PyTorch and TensorFlow


Mr.Rang


Tags: python, artificial intelligence, neural networks

Article link: https://blog.csdn.net/shixianrang123/article/details/124021265


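PyTorch and TensorFlow are currently the two most popular neural network frameworks, and most practitioners build their models with one or the other. PyTorch is backed by Facebook AI Research (FAIR), and TensorFlow by Google's AI team, Google Brain.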

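Recurrent neural networks are as important and as worth learning as convolutional neural networks. They are generally used to process sequences of data points or time series, as in natural language processing, whereas convolutional neural networks are generally used for grid-structured data such as images. Of course, these are only the typical uses, and you are free to try something unconventional!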

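Two models are commonly used in recurrent neural networks: LSTM (long short-term memory) and GRU (gated recurrent unit). Both PyTorch and TensorFlow implement them as API classes. Let's look at how the two frameworks' implementations differ (using LSTM as the example; GRU is similar).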

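Note: TensorFlow has been updated to version 2.7 and Google no longer maintains the versions before TensorFlow 2.0, so everything in this article is based on TensorFlow 2.0 or later! Also, PyTorch's recurrent layers by default expect the batch as the second dimension of the input tensor, (seq_len, batch, input_size), unless batch_first=True is set, whereas in TensorFlow the first dimension of the tensor is the batch.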
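The layout difference in the note can be illustrated with a minimal sketch. It assumes PyTorch is installed; the tensor sizes and variable names are illustrative and not part of the original article. Setting batch_first=True makes the PyTorch layer accept the same (batch, seq_len, input_size) layout that Keras uses.

```python
import torch
import torch.nn as nn

# Default layout for PyTorch recurrent layers: (seq_len, batch, input_size)
seq_first = torch.randn(10, 32, 8)
lstm_seq_first = nn.LSTM(input_size=8, hidden_size=5)
out_seq_first, _ = lstm_seq_first(seq_first)

# With batch_first=True the layer expects (batch, seq_len, input_size)
batch_first = seq_first.transpose(0, 1).contiguous()  # (32, 10, 8)
lstm_batch_first = nn.LSTM(input_size=8, hidden_size=5, batch_first=True)
out_batch_first, (hn, cn) = lstm_batch_first(batch_first)

print(out_seq_first.shape)    # torch.Size([10, 32, 5])
print(out_batch_first.shape)  # torch.Size([32, 10, 5])
# batch_first only affects the input and output tensors, not the states
print(hn.shape)               # torch.Size([1, 32, 5])
```

Note that hn and cn keep the (num_layers*num_directions, batch, hidden_size) layout even when batch_first=True.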

LSTM in PyTorch

In PyTorch, LSTM is defined as follows. It is a class, like every neural network module in PyTorch, so it must be instantiated before use:

torch.nn.LSTM(*args, **kwargs)

The parameters I consider most important and easiest to get confused about are:

input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
bidirectional – If True, becomes a bidirectional LSTM. Default: False

Usage:

import torch
import torch.nn as nn

input_size = 8
hidden_size = 5
num_layers = 2
seq_len = 10  # time_step
batch = 32
num_directions = 1

rnn = nn.LSTM(input_size, hidden_size, num_layers)  # instantiate the LSTM class with (input_size, hidden_size, num_layers)
input = torch.randn(seq_len, batch, input_size)  # (seq_len, batch, input_size)
h0 = torch.randn(num_layers*num_directions, batch, hidden_size)  # (num_layers*num_directions, batch, hidden_size)
c0 = torch.randn(num_layers*num_directions, batch, hidden_size)  # (num_layers*num_directions, batch, hidden_size)
output, (hn, cn) = rnn(input, (h0, c0))  # run the forward pass with the instantiated object
print(output.shape)
print(hn.shape)
print(cn.shape)

Output:

torch.Size([10, 32, 5])
torch.Size([2, 32, 5])
torch.Size([2, 32, 5])
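To make these shapes easier to interpret, here is a small sketch (assuming PyTorch is installed; the variable names are illustrative). It checks that the last time step of output matches the top layer's final hidden state, and shows how the shapes change when num_directions becomes 2.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(10, 32, 8)  # (seq_len, batch, input_size)

# Unidirectional, two stacked layers (h0 and c0 default to zeros when omitted)
uni = nn.LSTM(input_size=8, hidden_size=5, num_layers=2)
out_uni, (hn_uni, cn_uni) = uni(x)
# output holds the top layer's hidden state at every time step,
# so its last step matches the top layer's final hidden state hn[-1]
print(torch.allclose(out_uni[-1], hn_uni[-1]))  # True

# Bidirectional: num_directions becomes 2
bi = nn.LSTM(input_size=8, hidden_size=5, num_layers=2, bidirectional=True)
out_bi, (hn_bi, cn_bi) = bi(x)
print(out_bi.shape)  # torch.Size([10, 32, 10]), both directions concatenated
print(hn_bi.shape)   # torch.Size([4, 32, 5]), num_layers * num_directions
```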

LSTM in TensorFlow

In TensorFlow, LSTM is defined as follows (it is also a class):

tf.keras.layers.LSTM(
    units, activation='tanh', recurrent_activation='sigmoid',
    use_bias=True, kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros', unit_forget_bias=True,
    kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None,
    activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None,
    bias_constraint=None, dropout=0.0, recurrent_dropout=0.0,
    return_sequences=False, return_state=False, go_backwards=False, stateful=False,
    time_major=False, unroll=False, **kwargs
)

The parameters I consider most important and easiest to get confused about are:

units – Positive integer, dimensionality of the output space.
return_sequences – Boolean. Whether to return the last output in the output sequence, or the full sequence. Default: False.

Usage:

import tensorflow as tf

inputs = tf.random.normal([32, 10, 8])   # (batch, seq_len, input_size)
lstm = tf.keras.layers.LSTM(4)  # (units,); units here is equivalent to hidden_size in PyTorch
output = lstm(inputs)
lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)  # (units, return_sequences, return_state)
whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
print(output.shape)
print(whole_seq_output.shape)
print(final_memory_state.shape)
print(final_carry_state.shape)

Output:

(32, 4)
(32, 10, 4)
(32, 4)
(32, 4)
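As a rough guide to reading these outputs: final_memory_state plays the role of PyTorch's hn and final_carry_state the role of cn, except that Keras returns them without a leading layer dimension. The sketch below (assuming TensorFlow 2.x and NumPy are installed; last_step is an illustrative name) checks that the last time step of the full sequence equals the final hidden state.

```python
import numpy as np
import tensorflow as tf

inputs = tf.random.normal([32, 10, 8])  # (batch, seq_len, input_size)
lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)

# The last time step of the full sequence is the final hidden state
last_step = whole_seq_output[:, -1, :]
print(last_step.shape)  # (32, 4)
print(np.allclose(last_step.numpy(), final_memory_state.numpy(), atol=1e-6))
```

With return_sequences=False, the layer returns only this last step, which is why the first output in the example above has shape (32, 4).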

As the analysis above shows, the differences between LSTM in PyTorch and TensorFlow lie mainly in the arguments required at instantiation. Specifically:

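The PyTorch LSTM needs to know the feature dimension of the input tensor (input_size) at instantiation, whereas the TensorFlow LSTM does not: it infers the input dimension from the input itself.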
The TensorFlow LSTM generally needs only one argument at instantiation, units, which means the same thing as hidden_size in the PyTorch LSTM: the dimensionality of the output computed by the LSTM.
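The num_layers parameter of the PyTorch LSTM sets how many LSTM layers are stacked in the instantiated object. TensorFlow has no such parameter: one tf.keras.layers.LSTM is always a single layer, equivalent to num_layers=1 in PyTorch. Setting num_layers=2 in PyTorch stacks two recurrent layers; to get the same in TensorFlow you have to create two recurrent layers yourself, with return_sequences=True on the first so that the second receives the full sequence. PyTorch makes stacking simpler, but it is less convenient for customizing individual layers, so I personally find the TensorFlow LSTM slightly better in this respect.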
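The PyTorch LSTM returns both the result computed for every element of the sequence and the final hidden and cell states (hn, cn). The TensorFlow LSTM needs return_sequences=True and return_state=True to do the same; both default to False, in which case only the result for the last element of the sequence is returned.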
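To get a bidirectional LSTM in PyTorch, you only need to set bidirectional=True. In TensorFlow, you need to wrap the layer in tf.keras.layers.Bidirectional, for example:
```
import tensorflow as tf
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(10, return_sequences=True), input_shape=(5, 10))
```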
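The stacking and bidirectional points above can be sketched in Keras as follows. This is a minimal illustration assuming TensorFlow 2.x; the layer sizes mirror the earlier PyTorch example, and the names layer1, layer2 and bi are illustrative. It is a rough counterpart of nn.LSTM(8, 5, num_layers=2), not an exact equivalent, since the two frameworks initialize weights differently.

```python
import tensorflow as tf

inputs = tf.random.normal([32, 10, 8])  # (batch, seq_len, input_size)

# Two stacked layers: the first must return the full sequence
# so that the second receives a 3-D (batch, seq_len, features) input
layer1 = tf.keras.layers.LSTM(5, return_sequences=True)
layer2 = tf.keras.layers.LSTM(5, return_sequences=True, return_state=True)

output, hn, cn = layer2(layer1(inputs))
print(output.shape)  # (32, 10, 5)
print(hn.shape)      # (32, 5), final hidden state of the top layer only

# Bidirectional wrapper: forward and backward outputs are concatenated by default
bi = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(5, return_sequences=True))
bi_out = bi(inputs)
print(bi_out.shape)  # (32, 10, 10)
```

Because each layer is a separate object, each can be given its own units, dropout, or regularizers, which is the kind of customization the comparison above refers to.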

GRU is similar to LSTM, so I won't go into detail here.
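For reference, a minimal GRU sketch in PyTorch, assuming the same sizes as the LSTM example earlier. The main visible difference is that a GRU has no cell state, so it takes and returns only a hidden state.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=5, num_layers=2)
x = torch.randn(10, 32, 8)   # (seq_len, batch, input_size)
h0 = torch.zeros(2, 32, 5)   # (num_layers*num_directions, batch, hidden_size)

# Unlike LSTM, there is no (h, c) tuple: only the hidden state
output, hn = gru(x, h0)
print(output.shape)  # torch.Size([10, 32, 5])
print(hn.shape)      # torch.Size([2, 32, 5])
```

Similarly, tf.keras.layers.GRU with return_state=True returns a single final state rather than two.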

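This short post gave a brief analysis of how the recurrent neural network modules in PyTorch and TensorFlow differ. I'll discuss recurrent neural networks in more depth when I get the chance.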

