Fix: RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 2.00 GiB total capacity; 1

蠕动的爬虫

Category: Problems and Solutions. Tags: neural networks, deep learning, pytorch

Article link: https://blog.csdn.net/weixin_43760844/article/details/113462431

1. Problem
2. Analysis
3. Solution

1. Problem

Training the model fails with:
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 2.00 GiB total capacity; 1.49 GiB already allocated; 57.03 MiB free; 6.95 MiB cached)

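2. Analysis

The error means the GPU has run out of memory: the card has 2.00 GiB in total, 1.49 GiB is already allocated, and only 57.03 MiB is free, which is less than the 128.00 MiB requested.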
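
The numbers in the message can be read off programmatically. The following is a minimal sketch that parses the error text quoted above and reports how far the request exceeds the free memory. The helper names `to_mib` and `parse_oom` are made up for this illustration, and the regular expressions assume the message layout shown in this post, which differs between PyTorch versions.

```python
import re

# The error message from this post, copied verbatim.
msg = ("RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB "
       "(GPU 0; 2.00 GiB total capacity; 1.49 GiB already allocated; "
       "57.03 MiB free; 6.95 MiB cached)")

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def to_mib(value, unit):
    """Convert a size such as ("1.49", "GiB") to MiB."""
    return float(value) * UNIT_TO_MIB[unit]


def parse_oom(message):
    """Extract the sizes in the message as a dict of MiB values (hypothetical helper)."""
    size = r"([\d.]+) (KiB|MiB|GiB)"
    patterns = {
        "requested": rf"Tried to allocate {size}",
        "total": rf"{size} total capacity",
        "allocated": rf"{size} already allocated",
        "free": rf"{size} free",
        "cached": rf"{size} cached",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            fields[name] = to_mib(*match.groups())
    return fields


fields = parse_oom(msg)
shortfall = fields["requested"] - fields["free"]
print(f"requested {fields['requested']:.2f} MiB, "
      f"free {fields['free']:.2f} MiB, short by {shortfall:.2f} MiB")
```

For the message above, the request is about 71 MiB larger than what is free, so the allocation cannot succeed.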

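3. Solution

Method 1: Reduce GPU memory usage by changing the code
(1) Skip gradient computation where it is not needed
Part of the training code: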

        for x, y in dataload:
            step += 1
            inputs = x.to(device)
            labels = y.to(device)
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()  # backpropagation: compute the gradients
            optimizer.step()

In a validation or test loop, where loss.backward() is never called, wrap the forward pass outputs = model(inputs) in with torch.no_grad():

            ...
            # forward pass for validation/test only (no loss.backward() afterwards)
            with torch.no_grad():  # add this line
                outputs = model(inputs)  # then indent this line
            loss = criterion(outputs, labels)
            ...

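Note: by default, operations on tensors that require gradients record the computation graph and keep intermediate results for backpropagation. torch.no_grad() turns this tracking off, which reduces GPU memory usage. Validation and testing need neither gradients nor backpropagation, so they can run inside with torch.no_grad():. Operations inside the block are not tracked, so no gradients can be computed through them.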
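
A minimal sketch of where torch.no_grad() fits, assuming a stand-in nn.Linear model, an MSE loss, and random tensors in place of the real network and data (none of these come from the original code). The evaluation loop runs under torch.no_grad(), while a training forward pass stays outside it so that loss.backward() has a graph to work with.

```python
import torch
from torch import nn

# Stand-ins for the real model, loss, and data (assumptions for illustration).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)
criterion = nn.MSELoss()
batches = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(3)]


def evaluate(model, batches):
    """Average loss over the batches without tracking gradients."""
    model.eval()
    total = 0.0
    with torch.no_grad():  # no graph is recorded inside this block
        for x, y in batches:
            outputs = model(x.to(device))
            total += criterion(outputs, y.to(device)).item()
    return total / len(batches)


val_loss = evaluate(model, batches)

# A training forward pass is tracked, so its output can be backpropagated.
x = batches[0][0].to(device)
train_requires_grad = model(x).requires_grad

# The same forward pass under no_grad produces an untracked output.
with torch.no_grad():
    eval_requires_grad = model(x).requires_grad

print(val_loss, train_requires_grad, eval_requires_grad)
```

The two flags show the difference directly: the output of the tracked forward pass requires gradients, the one computed under torch.no_grad() does not.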

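If the forward pass of the training loop is wrapped this way instead, outputs is computed without gradient tracking and has no grad_fn, and inputs does not require gradients by default either, so loss.backward() raises: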
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

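One way to get past this error is to wrap the tensors in Variable with requires_grad=True, which asks autograd to compute gradients for that tensor (the default is False). This workaround has limits: as long as the forward pass runs under torch.no_grad(), the model's parameters still receive no gradients, so optimizer.step() does not update the model. For a training loop, the forward pass has to stay outside torch.no_grad(). Only floating-point tensors can require gradients, so integer class labels cannot be wrapped this way. Since PyTorch 0.4, Variable is deprecated and tensors carry requires_grad themselves.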

            ...
            inputs = x.to(device)
            labels = y.to(device)
            # add the following two lines (both tensors must be floating point)
            inputs = Variable(inputs, requires_grad=True)
            labels = Variable(labels, requires_grad=True)
            ...

Also add from torch.autograd import Variable at the top of the file so that Variable can be imported.
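
Whether the model actually learns under this combination can be checked by looking at the parameter gradients after loss.backward(). The sketch below assumes a stand-in nn.Linear model, random floating-point tensors, and a hand-written squared-error loss (all illustrative, not from the original code), and uses plain tensors with requires_grad=True, which is the current equivalent of wrapping in Variable.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # stand-in for the real model
inputs = torch.randn(8, 4, requires_grad=True)
labels = torch.randn(8, 2, requires_grad=True)

# Forward pass under no_grad, as in the workaround.
with torch.no_grad():
    outputs = model(inputs)

loss = ((outputs - labels) ** 2).mean()
loss.backward()  # runs, but only because labels requires gradients

weight_grad_missing = model.weight.grad is None  # parameters got no gradient
labels_grad_present = labels.grad is not None    # only labels got one

# Tracked forward pass: the parameters now receive gradients.
model.zero_grad()
outputs = model(inputs.detach())
loss = ((outputs - labels.detach()) ** 2).mean()
loss.backward()
weight_grad_present = model.weight.grad is not None

print(weight_grad_missing, labels_grad_present, weight_grad_present)
```

In the first half, the error is gone but the optimizer would have nothing to apply to the model; in the second half, with the forward pass tracked, the weights get gradients as training requires.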
(2) Adjust the batch size

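Another option is to adjust the batch size, that is, the number of samples processed in one training step. The choice matters: it trades memory use against training speed, so it should be set with both the model's performance and the available memory in mind.
In convolutional neural networks, a larger batch size often lets training converge faster, but the memory needed for activations grows with the batch size, so a batch that is too large can exhaust GPU memory. If the error above appears, set the batch size somewhat smaller.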
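
One way to find a batch size that fits is to halve it until a training step succeeds. The sketch below is framework-agnostic: `shrink_until_fits` and `fake_step` are hypothetical names, and `fake_step` only imitates a training step by raising the same kind of RuntimeError once the batch exceeds an arbitrary limit of 8 samples.

```python
def is_cuda_oom(error):
    """True if the exception text looks like a CUDA out-of-memory error."""
    return "out of memory" in str(error)


def shrink_until_fits(run_step, start, minimum=1):
    """Halve the batch size until run_step(batch_size) stops running out of memory."""
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as error:
            if not is_cuda_oom(error):
                raise  # unrelated errors are not swallowed
            batch_size //= 2
    raise RuntimeError(f"even batch size {minimum} does not fit in memory")


def fake_step(batch_size):
    """Stand-in for one real training step (assumption: more than 8 samples do not fit)."""
    if batch_size > 8:
        raise RuntimeError("CUDA out of memory. Tried to allocate 128.00 MiB")


chosen = shrink_until_fits(fake_step, start=64)
print(chosen)
```

In a real training script, run_step would build a batch of the given size and run one forward and backward pass; calling torch.cuda.empty_cache() between attempts may help release cached blocks before retrying.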

Method 2: Switch to a GPU with higher performance and more memory

With enough GPU memory, the error no longer occurs.
