RuntimeError: Expected hidden[0] size (x, x, x), got (x, x, x)

Author: 带鱼工作室

Columns: Pytorch, 神经网络 (Neural Networks) · Tags: deep learning, neural networks, pytorch

Article link: https://blog.csdn.net/liaoningxinmin/article/details/120613558

The error in the title (originally shown as a screenshot) was raised while training a BiLSTM network.

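Problem description: I defined initial hidden and cell states h0 and c0 for the BiLSTM and passed them into the network as its initial states, using the following code: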

```python
output, (hn, cn) = self.bilstm(input, (h0, c0))
```

The network is defined as follows:

```python
self.bilstm = nn.LSTM(
    input_size=self.input_size,
    hidden_size=self.hidden_size,
    num_layers=self.num_layers,
    bidirectional=True,
    bias=True,
    dropout=config.drop_out
)
```

Following the definitions in the official documentation, I initialized h0 and c0 with these shapes:

**h_0** of shape `(num_layers * num_directions, batch, hidden_size)`
**c_0** of shape `(num_layers * num_directions, batch, hidden_size)`

The BiLSTM parameters are:

- num_layers: 2
- num_directions: 2
- batch: 4
- seq_len: 10
- input_size: 300
- hidden_size: 100

So, according to the official documentation, h0 and c0 should have shape (2*2, 4, 100) = (4, 4, 100).
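The documented rule and the arithmetic above can be checked with a small pure-Python helper. This is only a sketch: `expected_state_shape` is a hypothetical helper written for this example (it is not part of PyTorch), and it assumes `proj_size=0`, the `nn.LSTM` default.

```python
def expected_state_shape(num_layers, bidirectional, batch, hidden_size):
    """Documented h0/c0 shape for nn.LSTM (assumes proj_size=0):
    (num_layers * num_directions, batch, hidden_size)."""
    num_directions = 2 if bidirectional else 1
    return (num_layers * num_directions, batch, hidden_size)


# Values used in this post
shape = expected_state_shape(num_layers=2, bidirectional=True, batch=4, hidden_size=100)
print(shape)  # (4, 4, 100)
```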

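However, the error message at the top said the initial hidden state was expected to have shape (4, 10, 100). Note that 10 is seq_len, not batch. This made me wonder whether the shape given in the official documentation was correct.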

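Of course, the official documentation was not wrong. Whenever I had used BiLSTM, RNN, or BiGRU before, the hidden-state shapes had always matched the documented ones, so at first I had no idea where to look.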

So I re-examined the network definition and found that I had left out an important parameter, batch_first. Here are all the arguments nn.LSTM accepts:

Args:
        input_size: The number of expected features in the input `x`
        hidden_size: The number of features in the hidden state `h`
        num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
            would mean stacking two LSTMs together to form a `stacked LSTM`,
            with the second LSTM taking in outputs of the first LSTM and
            computing the final results. Default: 1
        bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
            Default: ``True``
        batch_first: If ``True``, then the input and output tensors are provided
            as (batch, seq, feature). Default: ``False``
        dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
            LSTM layer except the last layer, with dropout probability equal to
            :attr:`dropout`. Default: 0
        bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False``

The batch_first parameter puts the batch dimension first, so the input has shape (batch size, seq len, embedding dim). Without batch_first=True, the input is expected to have shape (seq len, batch size, embedding dim).
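The sketch below shows how a 3-D input is split into (seq_len, batch, features) under each setting. It assumes the input tensor was built as (batch, seq_len, input_size) = (4, 10, 300), which is what the error message implies; `split_input_dims` is a hypothetical helper for illustration only, not a PyTorch function.

```python
def split_input_dims(input_shape, batch_first):
    """Return (seq_len, batch, features) as nn.LSTM reads a batched 3-D input."""
    if batch_first:
        batch, seq_len, features = input_shape
    else:
        seq_len, batch, features = input_shape
    return seq_len, batch, features


x_shape = (4, 10, 300)  # built as (batch, seq_len, input_size)
print(split_input_dims(x_shape, batch_first=True))   # (10, 4, 300)
print(split_input_dims(x_shape, batch_first=False))  # (4, 10, 300): batch is read as 10
```

With the default `batch_first=False`, the same tensor is read as 4 time steps of a batch of 10, which is exactly where the 10 in the error message comes from.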

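Because I had skipped my midday rest and was groggy, I forgot to add this important parameter. My input was laid out as (batch, seq_len, input_size) = (4, 10, 300), but with the default batch_first=False the LSTM read it as seq_len = 4 and batch = 10, so it expected initial states of shape (4, 10, 100) and rejected my (4, 4, 100) tensors. After adding batch_first=True, the code ran without problems.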
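The mismatch can be reproduced without PyTorch by mimicking the shape check. This is a simplified sketch under stated assumptions: `check_hidden` is a hypothetical helper, it checks only h0 (PyTorch also checks c0 as `hidden[1]`), it assumes `proj_size=0`, and its message only imitates the title; the exact wording and bracket style of the real error vary between PyTorch versions.

```python
def check_hidden(input_shape, h0_shape, num_layers, bidirectional, hidden_size, batch_first):
    """Simplified imitation of nn.LSTM's h0 shape check for a batched 3-D input."""
    # The batch size is taken from the input according to batch_first
    batch = input_shape[0] if batch_first else input_shape[1]
    num_directions = 2 if bidirectional else 1
    expected = (num_layers * num_directions, batch, hidden_size)
    if tuple(h0_shape) != expected:
        raise RuntimeError(f"Expected hidden[0] size {expected}, got {tuple(h0_shape)}")
    return expected


x_shape = (4, 10, 300)   # (batch, seq_len, input_size)
h0_shape = (4, 4, 100)   # built according to the documentation

try:
    check_hidden(x_shape, h0_shape, 2, True, 100, batch_first=False)
except RuntimeError as e:
    print(e)  # Expected hidden[0] size (4, 10, 100), got (4, 4, 100)

print(check_hidden(x_shape, h0_shape, 2, True, 100, batch_first=True))  # (4, 4, 100)
```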

The corrected network definition:

```python
self.bilstm = nn.LSTM(
    input_size=self.input_size,
    hidden_size=self.hidden_size,
    num_layers=self.num_layers,
    batch_first=True,
    bidirectional=True,
    bias=True,
    dropout=config.drop_out
)
```

Extension: when you pass initial states to an RNN or one of its variants, they must have the shape given in the official documentation:

(num_layers * num_directions, batch, hidden_size)

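Also make sure that batch_first matches how your input tensor is actually laid out. Keep in mind that batch_first changes only the layout of the input and output tensors: h0, c0, hn, and cn always have shape (num_layers * num_directions, batch, hidden_size), whether batch_first is True or False. Newer versions of the PyTorch documentation state this explicitly ("Note that this does not apply to hidden or cell states"). A mismatch like the one above means the batch size the LSTM read from your input differs from the batch size of your initial states, so be careful!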

Likewise, if hn (or cn, for an LSTM) comes out with an unexpected shape, check the batch_first setting. This applies to RNN, LSTM, and GRU alike.
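To make the distinction concrete, the sketch below lists the documented shapes of output, h_n, and c_n for the same logical data (batch 4, seq_len 10, input_size 300) in both layouts. `lstm_shapes` is a hypothetical pure-Python helper, not a PyTorch function, and it assumes a batched 3-D input and `proj_size=0`.

```python
def lstm_shapes(input_shape, num_layers, bidirectional, hidden_size, batch_first):
    """Documented nn.LSTM output/h_n/c_n shapes (batched input, proj_size=0)."""
    num_directions = 2 if bidirectional else 1
    if batch_first:
        batch, seq_len, _ = input_shape
        output = (batch, seq_len, num_directions * hidden_size)
    else:
        seq_len, batch, _ = input_shape
        output = (seq_len, batch, num_directions * hidden_size)
    # h_n and c_n do not depend on batch_first
    state = (num_layers * num_directions, batch, hidden_size)
    return {"output": output, "h_n": state, "c_n": state}


bf = lstm_shapes((4, 10, 300), 2, True, 100, batch_first=True)
sf = lstm_shapes((10, 4, 300), 2, True, 100, batch_first=False)
print(bf["output"], bf["h_n"])  # (4, 10, 200) (4, 4, 100)
print(sf["output"], sf["h_n"])  # (10, 4, 200) (4, 4, 100)
```

Only `output` changes layout between the two settings; the state shapes are identical, which is why the fix is to make batch_first agree with the input rather than to reshape h0 and c0.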
