train.py fails partway through training with the error: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv

When using the TensorFlow Object Detection API, running train.py fails partway through training.

The error output is as follows:

INFO:tensorflow:global step 110: loss = 0.3455 (0.875 sec/step)
INFO:tensorflow:global step 110: loss = 0.3455 (0.875 sec/step)
INFO:tensorflow:global step 111: loss = 0.3455 (0.859 sec/step)
INFO:tensorflow:global step 111: loss = 0.3455 (0.859 sec/step)
INFO:tensorflow:Error reported to Coordinator: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
	 [[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/tag, FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]]
	 [[Node: FeatureExtractor/MobilenetV1/Conv2d_13_depthwise/depthwise_weights/Regularizer/l2_regularizer/_335 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_1157_...egularizer", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance', defined at:
  File "E:/tensorflow_learn/my-traffic-sign-detection/TensorFlow--Models-master/research/object_detection/train.py", line 188, in <module>
    tf.app.run()
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "E:/tensorflow_learn/my-traffic-sign-detection/TensorFlow--Models-master/research/object_detection/train.py", line 184, in main
    graph_hook_fn=graph_rewriter_fn)
  File "E:\tensorflow_learn\my-traffic-sign-detection\TensorFlow--Models-master\research\object_detection\trainer.py", line 352, in train
    model_var.op.name, model_var))
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\summary\summary.py", line 203, in histogram
    tag=tag, values=values, name=scope)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_logging_ops.py", line 309, in histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
    op_def=op_def)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
	 [[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/tag, FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]]
	 [[Node: FeatureExtractor/MobilenetV1/Conv2d_13_depthwise/depthwise_weights/Regularizer/l2_regularizer/_335 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_1157_...egularizer", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Traceback (most recent call last):
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
	 [[Node: ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ModelVars/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/tag, FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance/read)]]
	 [[Node: FeatureExtractor/MobilenetV1/Conv2d_13_depthwise/depthwise_weights/Regularizer/l2_regularizer/_335 = _Recv[client_terminated=false, recv_device="/jo

In the pretrained model's config file (for example, ssd_mobilenet_v1_coco.config), change this:

train_config {
  batch_size: 1
  data_augmentation_options {
    random_horizontal_flip {
    }
  }

to this:

train_config {
  batch_size: 2  # do not set this to 1
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
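To catch this setting before launching a long training run, the config text can be checked up front. The following is only a minimal sketch: the function names find_train_batch_size and check_batch_size are hypothetical and not part of the Object Detection API, and the check is a plain regular-expression scan rather than a real protobuf parser. It assumes batch_size is the first such field after the opening of train_config.

```python
import re


def find_train_batch_size(config_text):
    """Return the batch_size declared after `train_config {`, or None if absent.

    Hypothetical helper: a regex scan, not a protobuf text-format parser.
    """
    # Locate the start of the train_config block.
    block = re.search(r"train_config\s*:?\s*\{", config_text)
    if block is None:
        return None
    # Assumption: the first batch_size after that point belongs to train_config.
    match = re.search(r"\bbatch_size\s*:\s*(\d+)", config_text[block.end():])
    return int(match.group(1)) if match else None


def check_batch_size(config_text):
    """Raise ValueError when the training batch size is 1, else return it."""
    batch_size = find_train_batch_size(config_text)
    if batch_size == 1:
        raise ValueError("train_config batch_size is 1; set it to 2 or more")
    return batch_size


# Example with an inline excerpt shaped like the config above.
sample = """
train_config {
  batch_size: 2
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
"""
print(check_batch_size(sample))
```

In practice the text would come from reading the pipeline config file from disk; the inline string above just keeps the sketch self-contained.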

Note: I ran into many errors while training on my own data with the TensorFlow Object Detection API, but this one plagued me for a long time, so I am recording it here.
