TensorFlow training error: ResourceExhaustedError: OOM when allocating tensor device:GPU:0 by allocator G


AiCharm


Article link: https://blog.csdn.net/muye_IT/article/details/124448901


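When training some larger models with TensorFlow, you may run out of memory. If the GPU build of TensorFlow (TensorFlow-GPU) is installed, training runs on the GPU by default, and because the GPUs in ordinary computers have relatively little memory, they overflow easily, producing an error like the following: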

```text
ResourceExhaustedError:  OOM when allocating tensor with shape[1024,728,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node model/block13_sepconv2/separable_conv2d (defined at <ipython-input-41-425b3e9b7078>:11) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_train_function_41706]

Function call stack:
train_function
```
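The shape and type in the first line of the message tell you how large the failing allocation was. A minimal sketch, assuming `type float` means float32 (4 bytes per element), that turns a shape into a byte count; the helper name `tensor_bytes` and the small dtype table are illustrative, not part of TensorFlow:

```python
from math import prod

# Bytes per element for a few dtype names as they appear in TensorFlow error messages
DTYPE_BYTES = {"float": 4, "half": 2, "double": 8, "int32": 4, "int64": 8, "bool": 1}

def tensor_bytes(shape, dtype="float"):
    """Return the number of bytes needed to store a dense tensor of this shape."""
    return prod(shape) * DTYPE_BYTES[dtype]

size = tensor_bytes([1024, 728, 1, 1], "float")
print(f"{size} bytes = {size / 2**20:.2f} MiB")  # 2981888 bytes = 2.84 MiB
```

Under that assumption the failing tensor needs only about 2.84 MiB, so when a request this small is refused, the GPU was most likely already almost full (weights, optimizer state, and activations of earlier layers). That is why the fixes below aim at reducing total memory use rather than this one tensor.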

Solutions:

1. Train on the CPU

Try training on the CPU instead by wrapping the model.fit() call as follows:

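```python
import tensorflow as tf

# Place training ops on the CPU; x_train, y_train and epochs are placeholders, replace them with your own arguments
with tf.device("/cpu:0"):
    history = model.fit(x_train, y_train, epochs=50)
```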

Output:

```text
Epoch 1/50
43/86 [===========>.............] - ETA: 16:08 - loss: 0.4574 - accuracy: 0.8438
```
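Note that `tf.device("/cpu:0")` only affects operations created inside the `with` block; if the model was built beforehand, its variables may already have been placed on the GPU. A common alternative, sketched below under the assumption that it runs at the very top of the script before TensorFlow is imported, is to hide all GPUs through the `CUDA_VISIBLE_DEVICES` environment variable (calling `tf.config.set_visible_devices([], "GPU")` before any GPU has been initialized has a similar effect). The helper name `force_cpu` is hypothetical.

```python
import os

def force_cpu():
    """Hide every CUDA GPU from this process.

    Must be called before TensorFlow is imported (or at least before it
    initializes the GPU); afterwards TensorFlow only sees the CPU.
    """
    # "-1" is not a valid GPU index, so CUDA exposes no devices at all
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

force_cpu()
# import tensorflow as tf  # import TensorFlow only after hiding the GPUs
```

Training this way uses system RAM instead of GPU memory, usually at a considerable cost in speed.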

2. Clear the session in Jupyter Notebook

```python
tf.keras.backend.clear_session()
```

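If you have run a lot of code in the notebook, the models and layers created along the way keep occupying memory. As its name suggests, tf.keras.backend.clear_session() clears the state left over from those earlier runs so that the space can be freed.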

3. Reduce batch_size

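If that still does not help, the remaining option is to change the code itself: reduce batch_size so that the model receives a smaller batch of data at each step. Activation memory grows roughly in proportion to the batch size, so a smaller batch directly shrinks that part of the footprint.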
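When the largest batch that fits is not known in advance, one option is to retry with a halved batch_size each time an OOM error is raised. The sketch below is framework-agnostic and hypothetical: `train_fn` stands for any callable that trains with a given batch size (for example a wrapper around `model.fit(..., batch_size=batch_size)`), and with TensorFlow you would pass `tf.errors.ResourceExhaustedError` as `oom_error`. The toy `fake_train` only simulates a memory limit.

```python
def fit_with_batch_backoff(train_fn, batch_size, oom_error=MemoryError, min_batch_size=1):
    """Call train_fn(batch_size), halving batch_size after every OOM error.

    Returns (result, batch_size_used). Re-raises the error if even
    min_batch_size does not fit.
    """
    while True:
        try:
            return train_fn(batch_size), batch_size
        except oom_error:
            if batch_size <= min_batch_size:
                raise  # nothing smaller left to try
            # Halve the batch, but never go below the minimum
            batch_size = max(min_batch_size, batch_size // 2)
            print(f"OOM, retrying with batch_size={batch_size}")

# Toy stand-in for a training run: pretend only batches of <= 16 fit in memory
def fake_train(batch_size):
    if batch_size > 16:
        raise MemoryError(f"batch_size={batch_size} does not fit")
    return "trained"

result, used = fit_with_batch_backoff(fake_train, batch_size=128)
print(result, used)  # trained 16
```

In real TensorFlow code it may help to rebuild the model (after tf.keras.backend.clear_session()) between attempts, since a failed step can leave memory allocated.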
