Problems encountered reproducing SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects, and their solutions
Problem 1: OutOfRangeError (see above for traceback)
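Possible causes: the input data is empty (or otherwise broken), or the local variables have not been initialized, in which case they need to be initialized.
In most cases this is not a code problem, so check the training data first: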
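1. Does the number of image files in the training data match the number of annotation files?
2. Does the training data contain corrupted images? (If there are many images, a simple loader written with PIL can find them.)
3. Do the width and height recorded in the annotation files match the actual image dimensions? (This is what solved the problem in my case.)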
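The three checks above can be sketched as follows. This is a minimal sketch, not SCRDet's own tooling: it assumes Pascal VOC-style XML annotations that store the image size in a `<size>` element with `<width>`/`<height>` children, images saved as `.png`, and matching file stems; the function names, directories, and extensions are hypothetical placeholders. Checks 2 and 3 need Pillow (PIL), which is imported only inside `check_images`.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def unmatched_files(image_dir, ann_dir, image_ext=".png", ann_ext=".xml"):
    """Check 1: file stems with an image but no annotation, and vice versa."""
    images = {p.stem for p in Path(image_dir).glob("*" + image_ext)}
    anns = {p.stem for p in Path(ann_dir).glob("*" + ann_ext)}
    return sorted(images - anns), sorted(anns - images)


def annotated_size(xml_path):
    """Read (width, height) from an assumed VOC-style <size> element."""
    size = ET.parse(xml_path).getroot().find("size")
    return int(size.find("width").text), int(size.find("height").text)


def check_images(image_dir, ann_dir, image_ext=".png", ann_ext=".xml"):
    """Checks 2 and 3: unreadable images and annotation/image size mismatches."""
    from PIL import Image  # Pillow is only needed for these two checks

    broken, mismatched = [], []
    for img_path in sorted(Path(image_dir).glob("*" + image_ext)):
        try:
            with Image.open(img_path) as im:
                im.load()  # decode fully so truncated files fail here
                actual = im.size  # (width, height)
        except (OSError, SyntaxError):
            broken.append(img_path.name)
            continue
        ann_path = Path(ann_dir) / (img_path.stem + ann_ext)
        if ann_path.exists():
            expected = annotated_size(ann_path)
            if expected != actual:
                mismatched.append((img_path.name, expected, actual))
    return broken, mismatched
```

Run `unmatched_files(image_dir, ann_dir)` first and fix any stray files; `check_images(image_dir, ann_dir)` then returns the names of unreadable images plus `(name, annotated size, actual size)` tuples for every mismatch.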
In my case, however, the tfrecord file is 99 GB, so the data is not empty.
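File size alone does not prove that the file holds complete records. A minimal stdlib sketch that counts them without TensorFlow, based on the TFRecord framing (uint64 little-endian length, uint32 CRC of the length, payload, uint32 CRC of the payload); it skips the CRCs rather than verifying them, and `count_tfrecords` is a hypothetical helper name. Because it seeks over each payload instead of reading it, it stays fast even on a 99 GB file.

```python
import os
import struct


def count_tfrecords(path):
    """Count the records in a TFRecord file by walking its framing.

    Each record is laid out as: uint64 payload length (little-endian),
    uint32 CRC of the length, the payload bytes, uint32 CRC of the payload.
    The CRCs are skipped, not verified.
    """
    size = os.path.getsize(path)
    count, pos = 0, 0
    with open(path, "rb") as f:
        while pos < size:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError(f"truncated length header after {count} records")
            (length,) = struct.unpack("<Q", header)
            pos += 8 + 4 + length + 4  # length, length CRC, payload, payload CRC
            if pos > size:
                raise ValueError(f"record {count} is truncated")
            f.seek(pos)  # jump over the payload without reading it
            count += 1
    return count
```

A count of 0, or a `ValueError` pointing at a truncated record, means the tfrecord file needs to be regenerated.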
The local variables are initialized as well:
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
sess.run(init_op)
Problem 2: training runs, but the loss becomes nan
The output looks like this:
rpn_loc_loss:0.017838023602962494 | rpn_cla_loss:1.3777527809143066 |
rpn_total_loss:1.3955907821655273 | fast_rcnn_loc_loss:nan |
fast_rcnn_cla_loss:nan | fast_rcnn_loc_rotate_loss:nan |
fast_rcnn_cla_rotate_loss:nan | fast_rcnn_total_loss:nan |
attention_loss:nan | total_loss:nan |
pre_cost_time:0.8750865459442139s
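In this log the RPN losses are still finite while every Fast R-CNN loss (and attention_loss, total_loss) is nan, which points at the second stage. When scanning long logs for the first nan, a small helper can do the sorting; this is a minimal sketch assuming the `name:value | name:value` format shown above, with hypothetical function names.

```python
import math


def parse_loss_line(line):
    """Turn 'name:value | name:value | ...' into a {name: float} dict of losses."""
    losses = {}
    for field in line.split("|"):
        name, sep, value = field.strip().partition(":")
        if sep and name.endswith("loss"):  # skips timing fields like pre_cost_time
            losses[name] = float(value)  # float("nan") parses to a NaN
    return losses


def nan_losses(losses):
    """Names of the losses whose value is NaN, sorted."""
    return sorted(name for name, value in losses.items() if math.isnan(value))
```

If the log line is wrapped across several lines, as above, join the pieces with `" ".join(...)` before parsing.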
Following the call chain train.py -> build_whole_networks.py, print the Fast R-CNN classification loss:
(libs/networks/build_whole_networks.py)
cls_loss_h = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=cls_score_h,
    labels=labels))  # because the samples were already drawn earlier
print(80 * 'l')
print(cls_loss_h)
print(80 * 'l')
llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
Tensor(“build_loss/FastRCNN_loss/Mean:0”, shape=(), dtype=float32)
llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
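The output above shows that `print()` in TF1 graph mode only displays the symbolic tensor, not its value; actual values have to be fetched with `sess.run`. One documented way for `tf.nn.sparse_softmax_cross_entropy_with_logits` to produce nan is an out-of-range label: its documentation requires every label to lie in [0, num_classes) and states that other values raise an error on CPU but return NaN loss rows on GPU. A minimal sketch for checking fetched label values, assuming `num_classes` also counts the background class; `invalid_labels` is a hypothetical helper:

```python
def invalid_labels(labels, num_classes):
    """Return (position, label) pairs that fall outside [0, num_classes)."""
    return [(i, int(label)) for i, label in enumerate(labels)
            if not 0 <= label < num_classes]
```

For example, after a hypothetical `labels_val = sess.run(labels)`, `invalid_labels(labels_val, num_classes)` should return an empty list; any entry it reports is a candidate source of the nan.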
(libs/losses/losses.py)
normalizer = tf.to_float(tf.shape(bbox_pred)[0])
print(80 * 'l')
print(normalizer)
print(80 * 'l')
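If `bbox_pred` ever has zero rows (for instance when no boxes are sampled for a batch), `normalizer` becomes 0, and if a summed loss is divided by it, TensorFlow evaluates the float 0 / 0 to nan instead of raising an error. Assuming losses.py divides a summed loss by this normalizer, one common guard is to clamp the divisor, e.g. `tf.maximum(normalizer, 1.0)`. The pure-Python sketch below (hypothetical helper name) shows the same idea:

```python
import math


def normalized_loss_sum(per_box_losses, min_normalizer=1.0):
    """Sum per-box losses and divide by the box count, clamped from below.

    Mirrors `reduce_sum(loss) / normalizer` with normalizer = number of rows in
    bbox_pred. Clamping the divisor keeps an empty batch at 0.0 instead of the
    0 / 0 that TensorFlow would evaluate to nan.
    """
    normalizer = max(float(len(per_box_losses)), min_normalizer)
    return math.fsum(per_box_losses) / normalizer
```

With at least one box the result is the ordinary mean, so the clamp only changes behaviour in the empty case.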
WARNING:tensorflow:From train.py:87: get_regularization_losses (from
tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and
will be removed after 2016-12-30. Instructions for updating: Use
tf.losses.get_regularization_losses instead.
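According to the warning, the function used to collect the regularization losses when defining the loss is deprecated and slated for removal after 2016-12-30, so in the file that triggers it, train.py, I replaced the loss_ops call with tf.losses.get_regularization_losses.
The same applies to the following warnings: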
WARNING:tensorflow:From …/libs/networks/resnet.py:183: calling
reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is
deprecated and will be removed in a future version. Instructions for
updating: keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From train.py:129: get_or_create_global_step (from
tensorflow.contrib.framework.python.ops.variables) is deprecated and
will be removed in a future version. Instructions for updating: Please
switch to tf.train.get_or_create_global_step
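Fixing the deprecated calls by hand works, but a small regex pass can apply all three replacements the warnings recommend. This is a minimal sketch: the left-hand spellings (in particular `slim.losses.get_regularization_losses` and `tf.contrib.framework.get_or_create_global_step`) are assumptions about how the repository calls these functions, and the `keep_dims=` pattern would also rename any unrelated variable with that name, so review the diff against your own train.py and resnet.py before writing anything back.

```python
import re

# Deprecated spelling -> replacement recommended by the warnings.
# The left-hand call forms are assumptions; adjust them to your code.
REPLACEMENTS = [
    (re.compile(r"\bkeep_dims(\s*=)"), r"keepdims\1"),
    (re.compile(r"\btf\.contrib\.framework\.get_or_create_global_step\b"),
     "tf.train.get_or_create_global_step"),
    (re.compile(r"\b(?:slim|tf\.contrib)\.losses\.get_regularization_losses\b"),
     "tf.losses.get_regularization_losses"),
]


def migrate(source):
    """Apply every deprecated-API replacement to a string of Python source."""
    for pattern, repl in REPLACEMENTS:
        source = pattern.sub(repl, source)
    return source
```

After backing up the files, something like `path.write_text(migrate(path.read_text()))` applies it to one file; running it twice changes nothing further.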
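After fixing these warnings, training still produced nan, and it resumed from the step where the previous run had stopped. How do you restart training from scratch?
Delete the files in /output/trained_weight.
Then, with the warnings fixed, train again from scratch; that usually solves the problem.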
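Training resumed from the earlier step presumably because train.py restores the latest checkpoint it finds in that directory. A minimal sketch that empties the directory, with a hypothetical helper name; the path is the one given here and may sit elsewhere (or under a version subfolder) in your checkout. If you want to keep the old weights, move the folder aside instead of deleting it.

```python
import shutil
from pathlib import Path


def clear_checkpoints(weights_dir):
    """Delete everything under weights_dir so the next run starts from step 0.

    Returns the sorted names of the removed entries; a missing directory is
    treated as already empty.
    """
    weights_dir = Path(weights_dir)
    removed = []
    if weights_dir.is_dir():
        for entry in weights_dir.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # e.g. a per-version checkpoint folder
            else:
                entry.unlink()  # checkpoint index/meta/data files, symlinks
            removed.append(entry.name)
    return sorted(removed)
```

For example, `clear_checkpoints("/output/trained_weight")` before launching train.py again.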