TensorFlow GPU memory during training: training hangs, GPU utilization drops to 0, but GPU memory stays full

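The poster ran into a problem while training a model with TensorFlow: training suddenly hung, with GPU utilization shown as 0 while the GPU memory remained completely occupied. The log shows the time per step and the number of examples processed per second, but after a certain step it stops updating. The poster trained on two GTX 1080 cards and is asking how to fix the memory occupation and the hang.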

Thank you very much for your pse-tensorflow reproduction; it has helped me a lot. May I ask why my training hangs partway through?

INFO:root:Step 000000, model loss 0.9816, total loss 1.2604, 1.28 seconds/step, 12.52 examples/second

INFO:root:Step 000010, model loss 0.9747, total loss 1.2535, 1.46 seconds/step, 10.94 examples/second

INFO:root:Step 000020, model loss 0.9603, total loss 1.2391, 0.80 seconds/step, 20.00 examples/second

INFO:root:Step 000030, model loss 0.9513, total loss 1.2301, 0.79 seconds/step, 20.31 examples/second

INFO:root:Step 000040, model loss 0.9450, total loss 1.2237, 0.79 seconds/step, 20.15 examples/second

INFO:root:Step 000050, model loss 0.9209, total loss 1.1995, 0.79 seconds/step, 20.19 examples/second

INFO:root:Step 000060, model loss 0.8839, total loss 1.1626, 0.80 seconds/step, 20.06 examples/second

INFO:root:Step 000070, model loss 0.9407, total loss 1.2193, 0.81 seconds/step, 19.83 examples/second

INFO:root:Step 000080, model loss 0.7876, total loss 1.0662, 0.80 seconds/step, 19.97 examples/second

INFO:root:Step 000090, model loss 0.9840, total loss 1.2626, 0.81 seconds/step, 19.72 examples/second

INFO:root:Step 000100, model loss 0.8153, total loss 1.0938, 0.81 seconds/step, 19.82 examples/second

INFO:root:Step 000110, model loss 0.8064, total loss 1.0850, 0.87 seconds/step, 18.29 examples/second

INFO:root:Step 000120, model loss 0.8660, total loss 1.1446, 0.81 seconds/step, 19.79 examples/second

INFO:root:Step 000130, model loss 0.7714, total loss 1.0499, 0.80 seconds/step, 19.99 examples/second

INFO:root:Step 000140, model loss 0.9863, total loss 1.2648, 0.81 seconds/step, 19.66 examples/second

INFO:root:Step 000150, model loss 0.8436, total loss 1.1220, 0.81 seconds/step, 19.75 examples/second

INFO:root:Step 000160, model loss 0.9230, total loss 1.2014, 0.81 seconds/step, 19.67 examples/second

INFO:root:Step 000170, model loss 0.9442, total loss 1.2226, 0.81 seconds/step, 19.74 examples/second

INFO:root:Step 000180, model loss 0.7808, total loss 1.0592, 0.81 seconds/step, 19.85 examples/second

INFO:root:Step 000190, model loss 0.9916, total loss 1.2700, 0.82 seconds/step, 19.48 examples/second

INFO:root:Step 000200, model loss 0.9583, total loss 1.2367, 0.81 seconds/step, 19.71 examples/second

INFO:root:Step 000210, model loss 0.7617, total loss 1.0401, 0.89 seconds/step, 18.08 examples/second

INFO:root:Step 000220, model loss 0.8324, total loss 1.1107, 0.81 seconds/step, 19.83 examples/second

INFO:root:Step 000230, model loss 0.7749, total loss 1.0533, 0.81 seconds/step, 19.79 examples/second

INFO:root:Step 000240, model loss 0.7469, total loss 1.0252, 0.80 seconds/step, 20.05 examples/second

INFO:root:Step 000250, model loss 0.9720, total loss 1.2504, 0.80 seconds/step, 19.90 examples/second

INFO:root:Step 000260, model loss 0.7180, total loss 0.9963, 0.82 seconds/step, 19.59 examples/second

INFO:root:Step 000270, model loss 0.8716, total loss 1.1499, 0.82 seconds/step, 19.58 examples/second

INFO:root:Step 000280, model loss 0.8580, total loss 1.1363, 0.80 seconds/step, 19.95 examples/second

INFO:root:Step 000290, model loss 0.9351, total loss 1.2134, 0.80 seconds/step, 20.02 examples/second

INFO:root:Step 000300, model loss 0.7840, total loss 1.0623, 0.80 seconds/step, 19.92 examples/second

INFO:root:Step 000310, model loss 0.9569, total loss 1.2352, 0.89 seconds/step, 18.05 examples/second

INFO:root:Step 000320, model loss 0.6371, total loss 0.9154, 0.81 seconds/step, 19.66 examples/second

INFO:root:Step 000330, model loss 0.8040, total loss 1.0823, 0.85 seconds/step, 18.82 examples/second

INFO:root:Step 000340, model loss 0.8689, total loss 1.1471, 0.81 seconds/step, 19.81 examples/second

INFO:root:Step 000350, model loss 0.8724, total loss 1.1506, 0.84 seconds/step, 19.01 examples/second

INFO:root:Step 000360, model loss 0.8443, total loss 1.1225, 0.83 seconds/step, 19.26 examples/second

INFO:root:Step 000370, model loss 0.8604, total loss 1.1386, 0.80 seconds/step, 20.12 examples/second

INFO:root:Step 000380, model loss 0.8354, total loss 1.1136, 0.87 seconds/step, 18.46 examples/second

INFO:root:Step 000390, model loss 0.7982, total loss 1.0764, 0.82 seconds/step, 19.53 examples/second

INFO:root:Step 000400, model loss 0.8847, total loss 1.1629, 0.84 seconds/step, 19.08 examples/second

INFO:root:Step 000410, model loss 0.8517, total loss 1.1299, 0.91 seconds/step, 17.55 examples/second

INFO:root:Step 000420, model loss 0.7689, total loss 1.0471, 0.82 seconds/step, 19.42 examples/second

After this, training makes no further progress. What should I change so that it no longer hangs with the GPU memory full?

I am training on two GTX 1080 GPUs.

Any advice would be greatly appreciated.
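Each line of the log above reports both seconds/step and examples/second. Because examples/second is the number of examples divided by the elapsed time, their product gives a rough estimate of the examples processed per step (presumably the batch size summed over both GPUs, though that is an inference and not stated in the post). In this run the product stays at about 16 from step 0 through step 420, so the logged throughput looks steady right up to the last line before the hang. The following is a minimal sketch for parsing these lines. It assumes the exact line format shown above, and the helper names `parse_line` and `implied_batch_size` are hypothetical:

```python
import re

# Matches the INFO lines in the log above (format assumed from that log).
LOG_PATTERN = re.compile(
    r"Step (\d+), model loss ([\d.]+), total loss ([\d.]+), "
    r"([\d.]+) seconds/step, ([\d.]+) examples/second"
)


def parse_line(line):
    """Return (step, model_loss, total_loss, sec_per_step, ex_per_sec), or None if no match."""
    m = LOG_PATTERN.search(line)
    if m is None:
        return None
    step = int(m.group(1))
    model_loss, total_loss, sec_per_step, ex_per_sec = (float(g) for g in m.groups()[1:])
    return step, model_loss, total_loss, sec_per_step, ex_per_sec


def implied_batch_size(sec_per_step, ex_per_sec):
    # examples/second = examples_per_step / seconds_per_step,
    # so the product approximates examples per step (the logged values are rounded).
    return sec_per_step * ex_per_sec


if __name__ == "__main__":
    lines = [
        "INFO:root:Step 000000, model loss 0.9816, total loss 1.2604, 1.28 seconds/step, 12.52 examples/second",
        "INFO:root:Step 000420, model loss 0.7689, total loss 1.0471, 0.82 seconds/step, 19.42 examples/second",
    ]
    for line in lines:
        step, _, _, sps, eps = parse_line(line)
        print(step, round(implied_batch_size(sps, eps)))  # prints "0 16", then "420 16"
```

Since the printed timings are rounded to two decimals, the product is only approximate. Rounding it to the nearest integer is enough to check that the per-step workload did not change before training stopped.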
