Deep Learning in Practice (7): Fixing the hx is not contiguous error when training a GRU on multiple GPUs in PyTorch

icebird_craft

Column: PyTorch Deep Learning

Article link: https://blog.csdn.net/icestorm_rain/article/details/111414962

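Earlier articles in this hands-on series already covered how to build a bidirectional GRU.
Today we go one step further. If we have several GPUs, we naturally want to train on all of them: that gives us more total GPU memory to work with (for example, a larger batch per step) and faster training.
Here I use single-machine, multi-GPU training: nn.DataParallel puts the model on multiple cards and uses all visible GPUs by default.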

    Encoder = Roberta_Encoder(Roberta_model,gru_layers=2,batch_size=Config.batch_size)
    Decoder = Transformers_Decoder(embedding_dim, nhead, num_decoder_layers, dropout, ntokens)
    Encoder = torch.nn.DataParallel(Encoder)
    Decoder = torch.nn.DataParallel(Decoder)
    if torch.cuda.is_available():
        Encoder = Encoder.cuda()
        Decoder = Decoder.cuda()
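
As a small, hedged sketch of the wrapping step: nn.DataParallel also accepts an explicit device_ids list if you do not want every visible card. The helper name wrap_for_multi_gpu and the nn.Linear stand-in model are hypothetical and not part of the original project. With fewer than two GPUs the helper returns the plain module.

```python
import torch
import torch.nn as nn


def wrap_for_multi_gpu(module, device_ids=None):
    """Hypothetical helper: wrap `module` in DataParallel only when it can help.

    device_ids=None means "all visible GPUs", which is DataParallel's default.
    """
    n_gpus = torch.cuda.device_count()
    if n_gpus < 2:
        # Nothing to parallelize over: keep the plain module (on the GPU if there is one).
        return module.cuda() if n_gpus == 1 else module
    # The parameters must live on the first device in device_ids before forward runs.
    first = device_ids[0] if device_ids else 0
    return nn.DataParallel(module.cuda(first), device_ids=device_ids)


model = wrap_for_multi_gpu(nn.Linear(4, 2))  # nn.Linear is only a stand-in model
print(type(model).__name__, "on", next(model.parameters()).device)
```

Passing something like device_ids=[0, 1] restricts training to those two cards, assuming they exist on the machine.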

Handling the hx is not contiguous error

But multi-GPU training quickly ran into its first error:

hx is not contiguous
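The cause is that DataParallel splits every input tensor evenly across the GPUs along dimension 0, while our GRU needs an initial hidden state whose dimension 0 is 2 * self.n_layers, not the batch. I declared it as shown below in the earlier article. We obviously cannot let DataParallel cut it along the layer dimension, so I used transpose to move batch_size to dimension 0 and the layer dimension to dimension 1 before the forward pass, and then called transpose again inside forward to restore the original layout.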

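    def init_hidden(self):
        # This helper is a little misleading where it sits: it is not part of the model.
        # The first input has no preceding context, so we create an all-zero one here, nothing more.
        # Shape: (2 * n_layers, batch_size, hidden_size), with 2 directions and hidden size 768.
        # torch.autograd.Variable is deprecated; a plain tensor behaves the same.
        hidden = torch.zeros(2 * self.n_layers, self.batch_size, 768)
        return hidden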
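
To see why the transpose matters, here is a minimal CPU-only sketch. torch.chunk stands in for the way DataParallel splits inputs along dimension 0 (an approximation for illustration, not its actual internals), and the sizes n_layers = 2 and batch_size = 6 are made-up values.

```python
import torch

n_layers, batch_size, hidden_size = 2, 6, 768  # illustrative sizes, not from the article

# Layout the GRU expects: (2 * n_layers, batch, hidden).
hidden = torch.zeros(2 * n_layers, batch_size, hidden_size)

# Batch-first copy, so that splitting along dimension 0 splits the batch.
batch_first = hidden.transpose(0, 1).contiguous()   # (6, 4, 768)

# Stand-in for DataParallel's scatter over two devices.
shard = torch.chunk(batch_first, 2, dim=0)[0]       # (3, 4, 768)

# Inside forward: restore the layers-first layout. transpose only returns a view
# with swapped strides, so the memory is no longer laid out densely.
restored = shard.transpose(0, 1)                     # (4, 3, 768)
fixed = restored.contiguous()                        # dense copy, same values

print(restored.shape, restored.is_contiguous())      # torch.Size([4, 3, 768]) False
print(fixed.shape, fixed.is_contiguous())            # torch.Size([4, 3, 768]) True
```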

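So the problem comes from the transpose applied to gru_hidden inside forward: transpose returns a view that is no longer contiguous in memory. All we need to do is append contiguous() after it! Like this: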

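    def forward(self, input_ids, attention_mask, gru_hidden):
        # Restore (2 * n_layers, batch, hidden) and copy the view into contiguous memory.
        gru_hidden = gru_hidden.transpose(0, 1).contiguous()
        output = self.encoder(input_ids, attention_mask)[0]
        gru_output, _ = self.gru(output, gru_hidden)
        # Return the GRU output; returning `output` would discard the GRU result.
        return gru_output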
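
The whole round trip can be sketched end to end as follows. TinyEncoder is a hypothetical stand-in for the article's encoder (a bare bidirectional GRU without RoBERTa), its sizes are made up, and its init_hidden builds the batch-first layout directly instead of transposing afterwards. On a machine with fewer than two GPUs the sketch simply runs on the CPU.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for the article's encoder: just a bidirectional GRU."""

    def __init__(self, hidden_size=16, n_layers=2):
        super().__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers=n_layers,
                          bidirectional=True, batch_first=True)

    def init_hidden(self, batch_size):
        # Batch-first layout so that DataParallel can split it along dimension 0.
        return torch.zeros(batch_size, 2 * self.n_layers, self.hidden_size)

    def forward(self, features, gru_hidden):
        # Back to (2 * n_layers, batch, hidden); contiguous() turns the
        # transposed view into a dense tensor before it reaches the GRU.
        gru_hidden = gru_hidden.transpose(0, 1).contiguous()
        gru_output, _ = self.gru(features, gru_hidden)
        return gru_output


model = TinyEncoder()
features = torch.randn(4, 5, 16)          # (batch, seq_len, hidden)
hidden = model.init_hidden(batch_size=4)  # (batch, 2 * n_layers, hidden)

if torch.cuda.device_count() > 1:
    # DataParallel scatters both inputs along dimension 0, i.e. along the batch.
    model = nn.DataParallel(model).cuda()

with torch.no_grad():
    out = model(features, hidden)
print(out.shape)  # torch.Size([4, 5, 32]): hidden size doubled by the two directions
```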

Handling UserWarning: RNN module weights are not part of single contiguous chunk of memory

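At this point you will notice a warning. I am a bit obsessive about warnings, and this one goes on to say that it causes extra GPU overhead! That won't do. So I followed the hint in the warning and changed the code as below, which gets rid of the warning. Job done: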

    def forward(self, input_ids, attention_mask, gru_hidden):
        gru_hidden = gru_hidden.transpose(0, 1).contiguous()
        output = self.encoder(input_ids, attention_mask)[0]
        # Compact the GRU weights into one contiguous chunk, as the warning suggests.
        if not hasattr(self, '_flattened'):
            self.gru.flatten_parameters()
            self._flattened = True
        gru_output, _ = self.gru(output, gru_hidden)
        # Return the GRU output; returning `output` would discard the GRU result.
        return gru_output
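
A simpler variant that is often seen is to call flatten_parameters() unconditionally at the start of every forward pass. This is an alternative sketch, not the code above: as far as I understand, DataParallel builds fresh replicas for each forward pass, so a flag set on a replica may not carry over to the next call anyway. FlattenEveryCall and its sizes are hypothetical. On the CPU the call has nothing to compact, so the sketch only shows that it is safe to run.

```python
import torch
import torch.nn as nn


class FlattenEveryCall(nn.Module):
    """Hypothetical module: asks the GRU to re-compact its weights on every call."""

    def __init__(self, hidden_size=8):
        super().__init__()
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers=2,
                          bidirectional=True, batch_first=True)

    def forward(self, features):
        # Pack the GRU weights into one contiguous chunk before using them.
        self.gru.flatten_parameters()
        gru_output, _ = self.gru(features)
        return gru_output


torch.manual_seed(0)
module = FlattenEveryCall().eval()
features = torch.randn(3, 4, 8)  # (batch, seq_len, hidden)
with torch.no_grad():
    first = module(features)
    second = module(features)

# Flattening only changes the memory layout of the weights, not their values.
print(first.shape, torch.allclose(first, second))
```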
