When using the TensorFlow Object Detection API, training launched with train.py failed partway through with an error.
The error output was as follows:
INFO:tensorflow:global step 110: loss = 0.3455 (0.875 sec/step)
INFO:tensorflow:global step 110: loss = 0.3455 (0.875 sec/step)
INFO:tensorflow:global step 111: loss = 0.3455 (0.859 sec/step)
INFO:tensorflow:global step 111: loss = 0.3455 (0.859 sec/step)
INFO:tensorflow:Error reported to Coordinator: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/tag, FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]]
[[Node: FeatureExtractor/MobilenetV1/Conv2d_13_depthwise/depthwise_weights/Regularizer/l2_regularizer/_335 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_1157_...egularizer", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Caused by op 'ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance', defined at:
File "E:/tensorflow_learn/my-traffic-sign-detection/TensorFlow--Models-master/research/object_detection/train.py", line 188, in <module>
tf.app.run()
File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
_sys.exit(main(argv))
File "E:/tensorflow_learn/my-traffic-sign-detection/TensorFlow--Models-master/research/object_detection/train.py", line 184, in main
graph_hook_fn=graph_rewriter_fn)
File "E:\tensorflow_learn\my-traffic-sign-detection\TensorFlow--Models-master\research\object_detection\trainer.py", line 352, in train
model_var.op.name, model_var))
File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\summary\summary.py", line 203, in histogram
tag=tag, values=values, name=scope)
File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_logging_ops.py", line 309, in histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
op_def=op_def)
File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/tag, FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]]
[[Node: FeatureExtractor/MobilenetV1/Conv2d_13_depthwise/depthwise_weights/Regularizer/l2_regularizer/_335 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_1157_...egularizer", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Traceback (most recent call last):
File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
return fn(*args)
File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/tag, FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]]
[[Node: FeatureExtractor/MobilenetV1/Conv2d_13_depthwise/depthwise_weights/Regularizer/l2_regularizer/_335 = _Recv[client_terminated=false, recv_device="/jo
To fix this, open the configuration file of the pretrained model (e.g. ssd_mobilenet_v1_coco.config) and change the following:
train_config {
batch_size: 1
data_augmentation_options {
random_horizontal_flip {
}
}
so that it reads:
train_config {
batch_size: 2 ## do not set this to 1
data_augmentation_options {
random_horizontal_flip {
}
}
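Because this setting is easy to overlook, the check can be scripted before launching train.py. The following is only a minimal sketch under stated assumptions: it treats the config as plain text rather than parsing it with the Object Detection API, the helper names find_batch_size and check_batch_size are hypothetical, and it assumes that "#" does not appear inside quoted strings in the file and that batch_size is the first such field after the opening of train_config.

```python
import re


def find_batch_size(config_text):
    """Return the batch_size declared in train_config, or None if absent.

    Hypothetical helper: a plain-text scan, not a real protobuf parser.
    """
    # Drop "#" comments so a commented-out batch_size is not picked up.
    # Assumption: "#" never occurs inside a quoted string in the config.
    text = re.sub(r"#.*", "", config_text)
    # Find the start of the train_config message, written either as
    # "train_config {" or "train_config: {".
    start = re.search(r"\btrain_config\s*:?\s*\{", text)
    if start is None:
        return None
    # Take the first batch_size field that follows the opening brace.
    match = re.search(r"\bbatch_size\s*:\s*(\d+)", text[start.end():])
    return int(match.group(1)) if match else None


def check_batch_size(config_text, minimum=2):
    """Raise ValueError if train_config's batch_size is missing or too small."""
    batch_size = find_batch_size(config_text)
    if batch_size is None:
        raise ValueError("no batch_size found in train_config")
    if batch_size < minimum:
        raise ValueError(
            "batch_size is %d; set it to at least %d" % (batch_size, minimum)
        )
    return batch_size
```

To use it, read the config file into a string (for example the contents of ssd_mobilenet_v1_coco.config) and pass that string to check_batch_size before starting training. With the original setting of batch_size: 1 it raises ValueError; with the modified setting it returns 2.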
Note: I ran into many errors while training on my own data with the TensorFlow Object Detection API, but this one plagued me for a long time, so I am recording it here.