While training a model, the training loss starts increasing after some time, yet the model still detects objects well...

I encountered a strange problem while training a CNN to detect objects in my own dataset. I am using transfer learning, and at the beginning of training the loss value decreases (as expected). But after some time it gets higher and higher, and I have no idea why.

At the same time, when I look at the Images tab in TensorBoard to check how well the CNN predicts objects, I can see that it does very well; it does not look as if it is getting worse over time. The Precision and Recall charts also look good; only the Loss charts (especially classification_loss) show an increasing trend over time.

Here are some specific details:

  • I have 10 different classes of logos (such as DHL, BMW, FedEx, etc.)
  • Around 600 images per class
  • I use tensorflow-gpu on Ubuntu 18.04

I have tried multiple pre-trained models, the latest being faster_rcnn_resnet101_coco with this pipeline config:

model {
  faster_rcnn {
    num_classes: 10
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/models2/faster_rcnn_resnet101_coco/model.ckpt"
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/train.record"
  }
  label_map_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/label_map.pbtxt"
}

eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/test.record"
  }
  label_map_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

Here you can see the results I got after training for nearly 23 hours and reaching over 120k steps:

So, my question is: why is the loss value increasing over time? It should be getting smaller or staying more or less constant, but the increasing trend is clearly visible in the charts above.

I think everything is properly configured and my dataset is pretty decent (the .tfrecord files were also built correctly).
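One quick way to sanity-check the records is sketched below (a minimal sketch, assuming TensorFlow 1.x; the path is the train.record from the config above). It simply counts the serialized examples, which should match the number of annotated training images:

import tensorflow as tf  # TensorFlow 1.x

# Count the records in the training file to confirm it was built correctly.
path = "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/train.record"
num_records = sum(1 for _ in tf.python_io.tf_record_iterator(path))
print(num_records)  # should match the number of annotated training images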

To check whether the problem was on my side, I tried somebody else's dataset and configuration files. I used the raccoon dataset author's files (he provides all of the necessary files in his repo), downloaded them, and started training with no modifications to check whether I would get results similar to his.

Surprisingly, after 82k steps I got charts entirely different from the ones shown in the linked article (which were captured after 22k steps). Here you can see the comparison of our results:

Clearly, something works differently on my PC. I suspect it may be the same reason I get an increasing loss on my own dataset, which is why I mention it.

Solution

The totalLoss is the weighted sum of the four other losses (the RPN classification and regression losses, and the box classifier classification and regression losses), and they are all evaluation losses. On TensorBoard you can check or uncheck runs to see the results for training only or for evaluation only (for example, the following picture contains both a train summary and an evaluation summary).

[Screenshot 9i0OX.png: TensorBoard scalar charts with both train and evaluation summaries]
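To make the relationship concrete, here is a minimal sketch (not the Object Detection API's actual code; the loss values are hypothetical) of how totalLoss is assembled from the four component losses using the weights from the pipeline config above:

# Hypothetical per-step loss values, roughly as they would appear on TensorBoard.
rpn_localization_loss = 0.12        # RPN regression (localization) loss
rpn_objectness_loss = 0.05          # RPN classification (objectness) loss
box_localization_loss = 0.08        # box classifier localization loss
box_classification_loss = 0.30      # BoxClassifierLoss/classification_loss

# Weighted sum using the weights from the config above.
total_loss = (
    2.0 * rpn_localization_loss       # first_stage_localization_loss_weight
    + 1.0 * rpn_objectness_loss       # first_stage_objectness_loss_weight
    + 2.0 * box_localization_loss     # second_stage_localization_loss_weight
    + 1.0 * box_classification_loss   # second_stage_classification_loss_weight
)
print(total_loss)  # 0.75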

If the evaluation loss is increasing, this may suggest that the model is overfitting; in addition, the precision metrics dropped a little.

To get a better fine-tuning result, you can try adjusting the weights of the four losses; for example, you may increase the weight for BoxClassifierLoss/classification_loss so the model focuses more on that metric. In your config file, the loss weights second_stage_classification_loss_weight and first_stage_objectness_loss_weight are both 1, while the other two are both 2, so the model currently focuses a little more on the other two. An illustrative change is sketched below.
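For example (an illustrative tweak, not a prescribed value), the second-stage classification weight in the pipeline config could be raised to match the localization weight:

second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 2.0  # raised from 1.0 so classification contributes more to totalLoss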

There was an extra question about why loss_1 and loss_2 are the same. This can be explained by looking at the TensorFlow graph.

[Screenshot Jk0lc.png: TensorFlow graph showing the total_loss summary and the red-circled tf.identity node]

Here loss_2 is the summary for total_loss (note that this total_loss is not the same as totalLoss), and the red-circled node is a tf.identity node. That node outputs the same tensor as its input, so loss_1 is the same as loss_2. A minimal demonstration follows.
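A minimal sketch (assuming TensorFlow 1.x graph mode, with a made-up constant standing in for the loss) showing that tf.identity simply passes its input through, which is why both summaries plot identical curves:

import tensorflow as tf  # TensorFlow 1.x

total_loss = tf.constant(1.234, name="total_loss")
loss_copy = tf.identity(total_loss)  # outputs the same tensor value as its input

with tf.Session() as sess:
    print(sess.run([total_loss, loss_copy]))  # both values are identical: [1.234, 1.234]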
