Solving the TensorFlow GPU error: ran out of memory (OOM)

Error cause and solutions

Error message:

Allocator (GPU_0_bfc) ran out of memory trying to allocate 200.00MiB (rounded to 209715200).  Current allocation summary follows.
<omitted>
Resource exhausted: OOM when allocating tensor with shape[51200,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc.
<omitted>
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[51200,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node word/word/bi-lstm-0/bi-lstm-0/bw/bw/while/lstm_cell/MatMul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[add/_77]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[51200,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node word/word/bi-lstm-0/bi-lstm-0/bw/bw/while/lstm_cell/MatMul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
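As the hint in the traceback suggests, you can ask TensorFlow to list the allocated tensors when the OOM occurs by passing RunOptions to the session. A minimal sketch for the TF 1.x session API (sess, fetches and feed_dict here stand in for your own session, graph outputs and inputs):

    import tensorflow as tf

    # Ask TensorFlow to dump current allocations if an OOM happens during this run
    run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)
    # fetches and feed_dict are placeholders for your own graph outputs and inputs
    sess.run(fetches, feed_dict=feed_dict, options=run_options)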

The error means the GPU has run out of memory. While the program is running, you can monitor GPU memory usage with the command watch -n 1 nvidia-smi (exit with Ctrl+C). Note that the middle column shows the memory usage as "memory used / total memory available"; the rightmost column is GPU utilization, which is a separate metric (in the same way that memory usage and CPU utilization are two different things).

[Figure: example nvidia-smi output showing the memory-usage and GPU-utilization columns]
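If you prefer to poll the same numbers from Python, here is a small sketch using nvidia-smi's query mode (it assumes nvidia-smi is on the PATH):

    import subprocess

    # Ask nvidia-smi for used memory, total memory and GPU utilization, one line per GPU
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    for i, line in enumerate(out.strip().splitlines()):
        used, total, util = (x.strip() for x in line.split(","))
        print("GPU %d: %s MiB / %s MiB used, %s%% GPU utilization" % (i, used, total, util))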

There are several solutions; choose according to your situation:

  1. If you have not yet allowed TensorFlow to grow its GPU memory allocation on demand, try that first and see whether using all of a single GPU's memory solves the problem. The code to enable memory growth is given below.
  2. If the network uses an RNN, the swap_memory=True option can reduce its GPU memory footprint; tf.nn.bidirectional_dynamic_rnn(), for example, accepts this argument. With it set, TensorFlow moves tensors that are produced during the RNN's forward pass but needed again during backpropagation from the GPU to the CPU (from GPU memory to host memory), usually with little or no performance penalty; see the sketch after this list.
  3. Reduce the batch size, or reduce the maximum sequence length (i.e. the number of time steps) of the RNN.
  4. Switch to a GPU with more memory.
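A minimal sketch of option 2 with the TF 1.x API (the cell sizes, input shape and placeholder names here are made up for illustration):

    import tensorflow as tf

    # [batch, time, features]; the feature size and cell size are arbitrary examples
    inputs = tf.placeholder(tf.float32, [None, None, 128])
    seq_len = tf.placeholder(tf.int32, [None])

    cell_fw = tf.nn.rnn_cell.LSTMCell(256)
    cell_bw = tf.nn.rnn_cell.LSTMCell(256)

    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, inputs,
        sequence_length=seq_len,
        dtype=tf.float32,
        swap_memory=True,  # move forward-pass tensors needed for backprop to host memory
    )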

P.S. Code to enable GPU memory growth:

    # TF 1.x: let TensorFlow grow its GPU memory allocation on demand
    import tensorflow as tf
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
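For reference, if you are on TensorFlow 2.x, the same on-demand behaviour is enabled per physical device (a sketch; it must run before any GPU memory is allocated):

    import tensorflow as tf

    # Enable memory growth on every visible GPU before the runtime initializes them
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)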

References

  1. Stack Overflow question: How can I solve 'ran out of gpu memory' in TensorFlow
  2. TensorFlow documentation: bidirectional_dynamic_rnn