Win10 + CUDA 9.2 + matching cuDNN + matching TensorFlow + SSD learning journey (problems everywhere, part 1)

Reference: source code download: https://github.com/balancap/SSD-Tensorflow

Reference blogs: https://blog.csdn.net/yexiaogu1104/article/details/77415990

https://blog.csdn.net/qq_36396104/article/details/82857533

The error that appeared during training:

2019-03-07 15:13:44.886676: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 771840000 totalling 736.08MiB
2019-03-07 15:13:44.886834: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:678] Sum Total of in-use chunks: 3.49GiB
2019-03-07 15:13:44.886986: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:680] Stats: 
Limit:                  6871947673
InUse:                  3751763456
MaxInUse:               6863138048
NumAllocs:                 1351554
MaxAllocSize:           1736667648

2019-03-07 15:13:44.887652: W c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:279] *********_______***********_____**___************____*_*****______***_***********************_______
2019-03-07 15:13:44.887867: W c:\users\user\source\repos\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at conv_ops.cc:693 : Resource exhausted: OOM when allocating tensor with shape[32,64,300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
INFO:tensorflow:Error reported to Coordinator: OOM when allocating tensor with shape[32,64,300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: ssd_300_vgg/conv1/conv1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ssd_300_vgg/conv1/conv1_1/Relu, ssd_300_vgg/conv1/conv1_2/weights/read/_115)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: ssd_300_vgg/block8/conv3x3/Relu/_229 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1177_ssd_300_vgg/block8/conv3x3/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Caused by op 'ssd_300_vgg/conv1/conv1_2/Conv2D', defined at:
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 390, in <module>
    tf.app.run()
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 291, in main
    clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
  File "D:\work\SSD-Tensorflow-master\deployment\model_deploy.py", line 196, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 275, in clone_fn
    ssd_net.net(b_image, is_training=True)
  File "D:\work\SSD-Tensorflow-master\nets\ssd_vgg_300.py", line 155, in net
    scope=scope)
  File "D:\work\SSD-Tensorflow-master\nets\ssd_vgg_300.py", line 452, in ssd_net
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 2607, in repeat
    outputs = layer(outputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1154, in convolution2d
    conv_dims=2)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1057, in convolution
    outputs = layer.apply(inputs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\base_layer.py", line 774, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\layers\base.py", line 329, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\base_layer.py", line 703, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\layers\convolutional.py", line 184, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\nn_ops.py", line 868, in __call__
    return self.conv_op(inp, filter)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\nn_ops.py", line 520, in __call__
    return self.call(inp, filter)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\nn_ops.py", line 204, in __call__
    name=self.name)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1042, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op
    op_def=op_def)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,64,300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: ssd_300_vgg/conv1/conv1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ssd_300_vgg/conv1/conv1_1/Relu, ssd_300_vgg/conv1/conv1_2/weights/read/_115)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: ssd_300_vgg/block8/conv3x3/Relu/_229 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1177_ssd_300_vgg/block8/conv3x3/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,64,300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: ssd_300_vgg/conv1/conv1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ssd_300_vgg/conv1/conv1_1/Relu, ssd_300_vgg/conv1/conv1_2/weights/read/_115)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: ssd_300_vgg/block8/conv3x3/Relu/_229 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1177_ssd_300_vgg/block8/conv3x3/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\coordinator.py", line 297, in stop_on_exception
    yield
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\coordinator.py", line 495, in run
    self.run_loop()
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\supervisor.py", line 1035, in run_loop
    self._sv.global_step])
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
    run_metadata)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,64,300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: ssd_300_vgg/conv1/conv1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ssd_300_vgg/conv1/conv1_1/Relu, ssd_300_vgg/conv1/conv1_2/weights/read/_115)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: ssd_300_vgg/block8/conv3x3/Relu/_229 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1177_ssd_300_vgg/block8/conv3x3/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Caused by op 'ssd_300_vgg/conv1/conv1_2/Conv2D', defined at:
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 390, in <module>
    tf.app.run()
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 291, in main
    clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
  File "D:\work\SSD-Tensorflow-master\deployment\model_deploy.py", line 196, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 275, in clone_fn
    ssd_net.net(b_image, is_training=True)
  File "D:\work\SSD-Tensorflow-master\nets\ssd_vgg_300.py", line 155, in net
    scope=scope)
  File "D:\work\SSD-Tensorflow-master\nets\ssd_vgg_300.py", line 452, in ssd_net
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 2607, in repeat
    outputs = layer(outputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1154, in convolution2d
    conv_dims=2)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1057, in convolution
    outputs = layer.apply(inputs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\base_layer.py", line 774, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\layers\base.py", line 329, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\base_layer.py", line 703, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\layers\convolutional.py", line 184, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\nn_ops.py", line 868, in __call__
    return self.conv_op(inp, filter)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\nn_ops.py", line 520, in __call__
    return self.call(inp, filter)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\nn_ops.py", line 204, in __call__
    name=self.name)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1042, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op
    op_def=op_def)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,64,300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: ssd_300_vgg/conv1/conv1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ssd_300_vgg/conv1/conv1_1/Relu, ssd_300_vgg/conv1/conv1_2/weights/read/_115)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: ssd_300_vgg/block8/conv3x3/Relu/_229 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1177_ssd_300_vgg/block8/conv3x3/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,64,300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: ssd_300_vgg/conv1/conv1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ssd_300_vgg/conv1/conv1_1/Relu, ssd_300_vgg/conv1/conv1_2/weights/read/_115)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: ssd_300_vgg/block8/conv3x3/Relu/_229 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1177_ssd_300_vgg/block8/conv3x3/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 390, in <module>
    tf.app.run()
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 386, in main
    sync_optimizer=None)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 785, in train
    ignore_live_threads=ignore_live_threads)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\supervisor.py", line 833, in stop
    ignore_live_threads=ignore_live_threads)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "D:\python\soft\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\coordinator.py", line 297, in stop_on_exception
    yield
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\coordinator.py", line 495, in run
    self.run_loop()
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\supervisor.py", line 1035, in run_loop
    self._sv.global_step])
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
    run_metadata)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,64,300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: ssd_300_vgg/conv1/conv1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ssd_300_vgg/conv1/conv1_1/Relu, ssd_300_vgg/conv1/conv1_2/weights/read/_115)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: ssd_300_vgg/block8/conv3x3/Relu/_229 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1177_ssd_300_vgg/block8/conv3x3/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Caused by op 'ssd_300_vgg/conv1/conv1_2/Conv2D', defined at:
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 390, in <module>
    tf.app.run()
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 291, in main
    clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
  File "D:\work\SSD-Tensorflow-master\deployment\model_deploy.py", line 196, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "D:/work/SSD-Tensorflow-master/train_ssd_network.py", line 275, in clone_fn
    ssd_net.net(b_image, is_training=True)
  File "D:\work\SSD-Tensorflow-master\nets\ssd_vgg_300.py", line 155, in net
    scope=scope)
  File "D:\work\SSD-Tensorflow-master\nets\ssd_vgg_300.py", line 452, in ssd_net
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 2607, in repeat
    outputs = layer(outputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1154, in convolution2d
    conv_dims=2)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1057, in convolution
    outputs = layer.apply(inputs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\base_layer.py", line 774, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\layers\base.py", line 329, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\base_layer.py", line 703, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\layers\convolutional.py", line 184, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\nn_ops.py", line 868, in __call__
    return self.conv_op(inp, filter)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\nn_ops.py", line 520, in __call__
    return self.call(inp, filter)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\nn_ops.py", line 204, in __call__
    name=self.name)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1042, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op
    op_def=op_def)
  File "C:\Users\11327\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,64,300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: ssd_300_vgg/conv1/conv1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ssd_300_vgg/conv1/conv1_1/Relu, ssd_300_vgg/conv1/conv1_2/weights/read/_115)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: ssd_300_vgg/block8/conv3x3/Relu/_229 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1177_ssd_300_vgg/block8/conv3x3/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
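Aside: the hint repeated throughout the log refers to the report_tensor_allocations_upon_oom field of tf.RunOptions. A minimal, hypothetical sketch of turning it on in plain TF 1.x is below (the loss tensor is only a placeholder so the snippet runs on its own; I have not checked how to thread these options through slim.learning.train, which this repo's training script uses):

import tensorflow as tf

# Placeholder graph so the snippet is self-contained; in practice the fetch would be
# the training op built by train_ssd_network.py.
loss = tf.reduce_mean(tf.random_normal([8, 64, 300, 300]))

# Ask the runtime to dump the list of live tensors if an OOM happens during this run,
# which is what the "add report_tensor_allocations_upon_oom to RunOptions" hint suggests.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(loss, options=run_options))
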
Summary:

1. First I tried changing the batch_size parameter for the training run from 32 to 16, just to see whether that would help; at that point there was no result yet.

2. After changing it to 16 the script runs and keeps training without the OOM error (a sketch of the change is shown right after this list).
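For reference, a hedged sketch of that change, assuming batch_size is defined with tf.app.flags in train_ssd_network.py the way the upstream repo does it (the description string is paraphrased from memory):

import tensorflow as tf

# In train_ssd_network.py the batch size is an ordinary TF flag; lowering it from 32
# to 16 halves the [32, 64, 300, 300] conv1 activation that triggered the OOM above.
tf.app.flags.DEFINE_integer(
    'batch_size', 16, 'The number of samples in each batch.')

# Alternatively, leave the file untouched and override the flag at launch time:
#   python train_ssd_network.py --batch_size=16  (plus the usual dataset/checkpoint flags)
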

Then a new problem (a newbie problem) came up:

The program just kept training. In theory, these few thousand images should have finished training within two or three days, but it kept running, and I kept waiting for a result. By the fourth day, with nothing better to do, I browsed the comments in the source code and happened to notice that the training script train_ssd_network.py has a setting for the maximum number of training steps. It is set to None, which means the training loop runs forever. For roughly 5,000 images, around 50,000 steps is usually enough. (As a complete beginner I really did not know this setting existed.) A hedged sketch of the change is shown below.
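Again this is only a sketch, assuming the tf.app.flags definition used by the upstream train_ssd_network.py (flag name max_number_of_steps, default None):

import tensorflow as tf

# In train_ssd_network.py the training length is governed by max_number_of_steps.
# The default None means slim.learning.train() never stops on its own; for my
# ~5,000 images, capping it at 50,000 steps was enough.
tf.app.flags.DEFINE_integer(
    'max_number_of_steps', 50000,
    'The maximum number of training steps.')

# Or override it when launching, without editing the file:
#   python train_ssd_network.py --max_number_of_steps=50000 ...
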

Reference blog: https://blog.csdn.net/weixin_39881922/article/details/80569803
