PyTorch: out-of-memory error when executing loss.backward()

在路上的McCoff

Article link: https://blog.csdn.net/weixin_43953686/article/details/105897353

I ran into this problem while writing my own SurfNet network. After looking into it, I collected the fixes I found below.
Things to try:

Reduce the batch size, all the way down to 1 if necessary.
Move everything else to the CPU, leaving only the network on the GPU.
Remove the validation code and run only the training code.
Reduce the size of the network (a significant reduction may be needed).
Scale the loss being backpropagated down to a much smaller value.
During training, add the following after every step:

torch.cuda.empty_cache()
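
As a minimal sketch of where this call might sit in a training loop: the model, data, and hyperparameters below are hypothetical stand-ins, not the SurfNet code. Note that `torch.cuda.empty_cache()` only releases cached blocks that no live tensor is using, so it helps most after references to old tensors have been dropped.

```python
import torch
from torch import nn

# Hypothetical stand-in model and data; replace with your own network and loader.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(8, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

steps_run = 0
for step in range(3):
    inputs = torch.randn(4, 8, device=device)
    targets = torch.randn(4, 1, device=device)

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # Drop the reference to the loss so its memory can actually be released.
    del loss
    if torch.cuda.is_available():
        # Return unused cached blocks to the GPU driver after each step.
        torch.cuda.empty_cache()
    steps_run += 1
```

Calling it every step adds some overhead, so it may be worth trying at a lower frequency once the error is gone.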

During validation, wrap each step in a no-gradient context:

   with torch.no_grad():
       # run the validation step here
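
A small sketch of a validation loop under `torch.no_grad()`, assuming a hypothetical toy model and an in-memory list of batches in place of a real data loader. Inside the context no autograd graph is built, so intermediate activations are not kept for a backward pass.

```python
import torch
from torch import nn

# Hypothetical stand-in model and validation data.
model = nn.Linear(8, 1)
criterion = nn.MSELoss()
val_batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(3)]

model.eval()  # switch layers such as dropout/batch norm to inference behavior
val_loss = 0.0
with torch.no_grad():  # no autograd history is recorded inside this block
    for inputs, targets in val_batches:
        outputs = model(inputs)
        val_loss += criterion(outputs, targets).item()

# Outputs computed under no_grad do not require gradients.
outputs_track_grad = outputs.requires_grad
```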

Do not accumulate history across the training loop:

total_loss = 0
for i in range(10000):
    optimizer.zero_grad()
    output = model(input)
    loss = criterion(output)
    loss.backward()
    optimizer.step()
    total_loss += loss  # problem: loss still carries its autograd history

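Here total_loss accumulates history across the loop, because loss is a differentiable variable with autograd history, so the graph from every iteration stays alive. You can fix this by writing total_loss += float(loss) instead.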
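
A runnable sketch of the corrected loop, assuming a hypothetical toy model, random data, and a target tensor for the criterion (none of which come from the original code). It uses `loss.item()`, which, like `float(loss)`, yields a plain Python number with no autograd history.

```python
import torch
from torch import nn

# Hypothetical stand-in model and data.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

total_loss = 0.0
for i in range(5):
    inputs = torch.randn(4, 8)
    targets = torch.randn(4, 1)

    optimizer.zero_grad()
    output = model(inputs)
    loss = criterion(output, targets)
    loss.backward()
    optimizer.step()

    # .item() extracts a Python float, so the graph for this step can be freed.
    total_loss += loss.item()
```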

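In my case, the cause was that the feature map fed into the fully connected layer was too large once flattened into a 1-D vector. Adding a pooling layer, or otherwise shrinking the feature map output by the last convolutional layer, fixed it.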
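
To illustrate the effect, here is a sketch with assumed shapes: a 64-channel, 56x56 feature map and a 1024-unit fully connected layer. These are hypothetical numbers, not the actual SurfNet dimensions. Pooling before flattening shrinks the input to the fully connected layer, and with it the layer's weight matrix.

```python
import torch
from torch import nn

# Hypothetical output of the last convolutional layer: (batch, channels, H, W).
features = torch.randn(1, 64, 56, 56)
hidden = 1024  # hypothetical width of the fully connected layer

# Without pooling, the flattened vector has 64 * 56 * 56 elements.
flat_without_pool = features.flatten(1).shape[1]
# Parameter count of Linear(flat_without_pool, hidden), computed arithmetically
# (weights plus biases) to avoid allocating the large layer.
params_without_pool = flat_without_pool * hidden + hidden

# With adaptive average pooling down to 7x7, the vector has 64 * 7 * 7 elements.
pool = nn.AdaptiveAvgPool2d((7, 7))
flat_with_pool = pool(features).flatten(1).shape[1]
fc_small = nn.Linear(flat_with_pool, hidden)
params_with_pool = sum(p.numel() for p in fc_small.parameters())

print(flat_without_pool, params_without_pool)
print(flat_with_pool, params_with_pool)
```

Under these assumptions the fully connected layer shrinks from roughly 205 million parameters to about 3.2 million, and its gradients and optimizer state shrink in proportion.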
