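SSD Illustrated

SSD is short for Single Shot MultiBox Detector.

This article follows the SSD implementation in the object detection part of TensorFlow models. The feature extractor is MobileNetV1. See the configuration example in the appendix for details.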

Object detection training

ssd train

Input image: 300 x 300

Labels:

  1. gt_box : [batch_size, num_boxes, 4]
  2. gt_class : [batch_size, num_boxes, num_classes]

Preprocessing

  1. Horizontal flip
  2. Random crop
  3. Resize the input image to 300 x 300 using the BILINEAR method
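A horizontal flip also has to mirror the ground-truth boxes. The following is a minimal sketch of that step, assuming boxes are stored as normalized [ymin, xmin, ymax, xmax]; the helper name flip_boxes_horizontally is hypothetical and is not the library's API.

```python
def flip_boxes_horizontally(boxes):
    """Mirror normalized [ymin, xmin, ymax, xmax] boxes around x = 0.5."""
    flipped = []
    for ymin, xmin, ymax, xmax in boxes:
        # The new left edge is 1 - old right edge, and vice versa,
        # so xmin <= xmax still holds after the flip.
        flipped.append([ymin, 1.0 - xmax, ymax, 1.0 - xmin])
    return flipped


boxes = [[0.1, 0.2, 0.5, 0.6]]
print(flip_boxes_horizontally(boxes))
```

The y coordinates are untouched; only the two x coordinates swap roles.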

Feature extraction

  1. preprocessed_inputs is passed through MobileNetV1, and the feature map image_feature is taken from Conv2d_13_pointwise
  2. The feature maps are then produced in turn by
    Conv2d_11_pointwise,
    Conv2d_13_pointwise,
    Conv2d_13_pointwise_2_Conv2d_2_1x1_256, Conv2d_13_pointwise_2_Conv2d_2_3x3_s2_512,
    Conv2d_13_pointwise_3_Conv2d_2_1x1_128, Conv2d_13_pointwise_3_Conv2d_2_3x3_s2_256,
    Conv2d_13_pointwise_4_Conv2d_2_1x1_128, Conv2d_13_pointwise_4_Conv2d_2_3x3_s2_256,
    Conv2d_13_pointwise_4_Conv2d_2_1x1_64, Conv2d_13_pointwise_4_Conv2d_2_3x3_s2_128
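To get a feel for the six feature maps, their spatial sizes can be estimated from the strides. This is a rough sketch, assuming SAME padding throughout, a total stride of 16 at Conv2d_11_pointwise and 32 at Conv2d_13_pointwise, and one stride-2 3x3 convolution per extra layer; the function names are hypothetical.

```python
import math


def same_padding_output(size, stride):
    """Spatial size after a strided convolution with SAME padding."""
    return math.ceil(size / stride)


def ssd_feature_map_sizes(input_size=300):
    size = input_size
    for _ in range(4):            # four stride-2 steps: total stride 16
        size = same_padding_output(size, 2)
    sizes = [size]                # Conv2d_11_pointwise
    size = same_padding_output(size, 2)
    sizes.append(size)            # Conv2d_13_pointwise (total stride 32)
    for _ in range(4):            # four extra 3x3 stride-2 layers
        size = same_padding_output(size, 2)
        sizes.append(size)
    return sizes


print(ssd_feature_map_sizes(300))  # [19, 10, 5, 3, 2, 1]
```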

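IoU filtering

  1. Compute the IoU matrix match_quality_matrix between gt_box and anchors (a [num_boxes, num_anchor] matrix whose row index is the element index in groundtruth_boxeslist and whose column index is the element index in anchors)
  2. Record, for each column of match_quality_matrix, the row index of its maximum to get match1. match1 is indexed by column and stores row indices, so match1 can be used to look up the corresponding IoU value in match_quality_matrix.
  3. If the IoU that match1[i] points to is greater than 0.5, match1[i] is left unchanged; if it is less than 0.5, match1[i] is set to -1.
  4. Record, for each row of match_quality_matrix, the column index of its maximum to get match2. match2 is indexed by row and stores column indices, so match2 can be used to look up the corresponding IoU value in match_quality_matrix.
  5. matches is an array of num_anchor elements. For any i from 0 to num_anchor - 1, if i appears in match2, matches[i] is the index of i in match2; otherwise matches[i] = match1[i]. (Note: this is one of the most convoluted points of the whole implementation and deserves careful study.) At this point, elements of matches greater than -1 are matched and elements less than 0 are unmatched. In practice a negative element of matches can only be -1 or -2.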
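The five steps can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the library code: boxes are [ymin, xmin, ymax, xmax], a single threshold of 0.5 is used (so the -2 case never occurs), an IoU exactly equal to the threshold counts as matched, and when two rows force the same anchor the later row wins. The names iou and match are hypothetical.

```python
def iou(box_a, box_b):
    """IoU of two [ymin, xmin, ymax, xmax] boxes."""
    ymin, xmin = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ymax, xmax = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(gt_boxes, anchors, threshold=0.5):
    # Step 1: rows are ground-truth boxes, columns are anchors.
    quality = [[iou(gt, a) for a in anchors] for gt in gt_boxes]
    num_rows, num_cols = len(gt_boxes), len(anchors)

    # Steps 2-3: best row per column, -1 if its IoU is below the threshold.
    matches = []
    for col in range(num_cols):
        best_row = max(range(num_rows), key=lambda r: quality[r][col])
        matches.append(best_row if quality[best_row][col] >= threshold else -1)

    # Steps 4-5: the best anchor of every row is forced to match that row.
    for row in range(num_rows):
        best_col = max(range(num_cols), key=lambda c: quality[row][c])
        matches[best_col] = row
    return matches


gt_boxes = [[0.0, 0.0, 0.4, 0.4], [0.5, 0.5, 1.0, 1.0], [0.0, 0.6, 0.2, 0.8]]
anchors = [[0.0, 0.0, 0.5, 0.5], [0.6, 0.6, 1.0, 1.0],
           [0.0, 0.5, 0.4, 0.9], [0.6, 0.0, 0.9, 0.3]]
print(match(gt_boxes, anchors))  # [0, 1, 2, -1]
```

The third anchor only reaches an IoU of 0.25 with the third ground-truth box, so steps 2-3 alone would mark it -1; the forced match in steps 4-5 is what guarantees that every ground-truth box gets at least one anchor.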

Encoding

Input: anchors and gt_box

anchors are represented as [ycenter_a, xcenter_a, ha, wa]

gt_box is represented as [ycenter, xcenter, h, w]

tx = (xcenter - xcenter_a) / wa
ty = (ycenter - ycenter_a) / ha
tw = tf.log(w / wa)
th = tf.log(h / ha)

Output: [ty, tx, th, tw]
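The encoding formulas can be checked with a small standalone function. This is a sketch using math.log in place of tf.log; the scale_factors argument is an assumption modeled on the y_scale, x_scale, height_scale and width_scale fields of faster_rcnn_box_coder in the configuration example, and it defaults to 1.0 so that the result reproduces the formulas above unchanged.

```python
import math


def encode(gt_box, anchor, scale_factors=(1.0, 1.0, 1.0, 1.0)):
    """Encode a box against an anchor, both given as [ycenter, xcenter, h, w]."""
    ycenter, xcenter, h, w = gt_box
    ycenter_a, xcenter_a, ha, wa = anchor
    y_scale, x_scale, height_scale, width_scale = scale_factors

    # Center offsets are measured in units of the anchor size.
    ty = (ycenter - ycenter_a) / ha * y_scale
    tx = (xcenter - xcenter_a) / wa * x_scale
    # Sizes are encoded as log ratios, so equal sizes give 0.
    th = math.log(h / ha) * height_scale
    tw = math.log(w / wa) * width_scale
    return [ty, tx, th, tw]


anchor = [0.5, 0.5, 0.2, 0.2]
gt_box = [0.55, 0.45, 0.4, 0.1]
print(encode(gt_box, anchor))
print(encode(gt_box, anchor, scale_factors=(10.0, 10.0, 5.0, 5.0)))
```

A box identical to its anchor encodes to all zeros, which is why the regression targets stay small for well-matched anchors.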

Decoding

Input: anchors and box_encoding

anchors are represented as [ycenter_a, xcenter_a, ha, wa]

box_encoding is represented as [ty, tx, th, tw]

w = tf.exp(tw) * wa
h = tf.exp(th) * ha
ycenter = ty * ha + ycenter_a
xcenter = tx * wa + xcenter_a
ymin = ycenter - h / 2.
xmin = xcenter - w / 2.
ymax = ycenter + h / 2.
xmax = xcenter + w / 2.

Output: [ymin, xmin, ymax, xmax]
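Decoding is the inverse of encoding and can be sketched the same way. The function below uses math.exp in place of tf.exp; as an assumption, it accepts optional scale_factors (y_scale, x_scale, height_scale, width_scale) that are divided out first, with a default of 1.0 that matches the formulas above.

```python
import math


def decode(box_encoding, anchor, scale_factors=(1.0, 1.0, 1.0, 1.0)):
    """Turn [ty, tx, th, tw] plus an anchor into [ymin, xmin, ymax, xmax]."""
    ty, tx, th, tw = box_encoding
    ycenter_a, xcenter_a, ha, wa = anchor
    y_scale, x_scale, height_scale, width_scale = scale_factors

    # Undo the optional scaling applied at encoding time.
    ty, tx = ty / y_scale, tx / x_scale
    th, tw = th / height_scale, tw / width_scale

    h = math.exp(th) * ha
    w = math.exp(tw) * wa
    ycenter = ty * ha + ycenter_a
    xcenter = tx * wa + xcenter_a
    # Convert from center form to corner form.
    return [ycenter - h / 2.0, xcenter - w / 2.0,
            ycenter + h / 2.0, xcenter + w / 2.0]


anchor = [0.5, 0.5, 0.2, 0.4]
print(decode([0.0, 0.0, 0.0, 0.0], anchor))  # the anchor itself, in corner form
```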

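Loss function

ssd loss

Classification loss: tf.nn.sigmoid_cross_entropy_with_logits

Localization loss: smooth L1

The classification loss and the localization loss are each multiplied by their weight (classification_weight and localization_weight) and normalized by the number of matches (normalize_loss_by_num_matches: true in the configuration example).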
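The two per-element losses and the final weighting can be written out in a few lines. This is a simplified sketch: it works on plain Python floats, ignores hard example mining, and the helper names and the delta parameter are assumptions for illustration rather than the library's signatures.

```python
import math


def sigmoid_cross_entropy(logit, label):
    """Stable form of the loss: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def smooth_l1(diff, delta=1.0):
    """Quadratic near zero, linear for |diff| >= delta."""
    abs_diff = abs(diff)
    if abs_diff < delta:
        return 0.5 * diff * diff
    return delta * (abs_diff - 0.5 * delta)


def ssd_loss(cls_losses, loc_losses, num_matches,
             classification_weight=1.0, localization_weight=1.0):
    # Guard against images without any matched anchor.
    normalizer = max(float(num_matches), 1.0)
    cls = classification_weight * sum(cls_losses) / normalizer
    loc = localization_weight * sum(loc_losses) / normalizer
    return cls + loc


print(sigmoid_cross_entropy(0.0, 1.0))       # log(2)
print(smooth_l1(0.5), smooth_l1(2.0))        # 0.125 1.5
print(ssd_loss([1.0, 3.0], [2.0], num_matches=2))  # 3.0
```

The linear tail of smooth L1 keeps a single badly localized box from dominating the gradient, which a pure squared error would allow.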

Validation

ssd inference

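Preprocessing

  1. Resize the input image to 300 x 300 using the BILINEAR method (horizontal flip and random crop are data_augmentation_options of train_config and apply to training only)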

Feature extraction

  1. preprocessed_inputs is passed through MobileNetV1, and the feature map image_feature is taken from Conv2d_13_pointwise
  2. The feature maps are then produced in turn by
    Conv2d_11_pointwise,
    Conv2d_13_pointwise,
    Conv2d_13_pointwise_2_Conv2d_2_1x1_256, Conv2d_13_pointwise_2_Conv2d_2_3x3_s2_512,
    Conv2d_13_pointwise_3_Conv2d_2_1x1_128, Conv2d_13_pointwise_3_Conv2d_2_3x3_s2_256,
    Conv2d_13_pointwise_4_Conv2d_2_1x1_128, Conv2d_13_pointwise_4_Conv2d_2_3x3_s2_256,
    Conv2d_13_pointwise_4_Conv2d_2_1x1_64, Conv2d_13_pointwise_4_Conv2d_2_3x3_s2_128

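Appendix

Anchor generation

The 6 output feature maps generate 3, 6, 6, 6, 6, 6 anchors per grid cell respectively. With base_anchor_size = 256, the image is divided into feature_map[i] grid cells; anchor_stride is base_anchor_size / feature_map[i], and anchor_offset is the center of each grid cell.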

smin = 0.2
smax = 0.95
scales = [0.2, 0.35, 0.5, 0.65, 0.80, 0.95, 1.0]

# each element is (scale, aspect_ratio)
[
   [(0.1, 1.0), (0.2, 2.0), (0.2, 0.5)],
   [(0.35,1.0),(0.35,2.0),(0.35,3.0),(0.35,1.0/2),(0.35,1.0/3),(sqrt(0.35*0.50), 1.0)],   # sqrt ~ 0.418
   [(0.5, 1.0),(0.5, 2.0),(0.5, 3.0),(0.5, 1.0/2),(0.5, 1.0/3),(sqrt(0.50*0.65), 1.0)],   # sqrt ~ 0.570
   [(0.65,1.0),(0.65,2.0),(0.65,3.0),(0.65,1.0/2),(0.65,1.0/3),(sqrt(0.65*0.80), 1.0)],   # sqrt ~ 0.721
   [(0.80,1.0),(0.80,2.0),(0.80,3.0),(0.80,1.0/2),(0.80,1.0/3),(sqrt(0.80*0.95), 1.0)],   # sqrt ~ 0.872
   [(0.95,1.0),(0.95,2.0),(0.95,3.0),(0.95,1.0/2),(0.95,1.0/3),(sqrt(0.95*1.00), 1.0)]    # sqrt ~ 0.975
]
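The table above can be regenerated from smin, smax and the aspect ratios. The sketch below assumes linearly spaced scales, a reduced set of three anchors on the lowest layer, and one extra square anchor per higher layer at the geometric mean of adjacent scales; the function name and the feature map sizes (19, 10, 5, 3, 2, 1 for a 300 x 300 input) are assumptions for illustration.

```python
import math


def ssd_anchor_specs(num_layers=6, min_scale=0.2, max_scale=0.95,
                     aspect_ratios=(1.0, 2.0, 3.0, 1.0 / 2, 1.0 / 3)):
    """Return one list of (scale, aspect_ratio) pairs per feature map."""
    scales = [min_scale + (max_scale - min_scale) * i / (num_layers - 1)
              for i in range(num_layers)] + [1.0]
    specs = []
    for layer, (scale, scale_next) in enumerate(zip(scales[:-1], scales[1:])):
        if layer == 0:
            # The lowest layer uses a reduced set of three anchors.
            specs.append([(0.1, 1.0), (scale, 2.0), (scale, 0.5)])
        else:
            layer_specs = [(scale, ratio) for ratio in aspect_ratios]
            # Extra square anchor between this scale and the next one.
            layer_specs.append((math.sqrt(scale * scale_next), 1.0))
            specs.append(layer_specs)
    return specs


specs = ssd_anchor_specs()
feature_map_sizes = [19, 10, 5, 3, 2, 1]  # assumed for a 300 x 300 input
total = sum(size * size * len(layer)
            for size, layer in zip(feature_map_sizes, specs))
print([len(layer) for layer in specs], total)  # [3, 6, 6, 6, 6, 6] 1917
```

Under these assumptions the model predicts 1917 boxes per image before any score filtering or non-max suppression.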

Configuration example

samples/configs/ssd_mobilenet_v1_coco.config

    model {
      ssd {
        num_classes: 90
        box_coder {
          faster_rcnn_box_coder {
            y_scale: 10.0
            x_scale: 10.0
            height_scale: 5.0
            width_scale: 5.0
          }
        }
        matcher {
          argmax_matcher {
            matched_threshold: 0.5
            unmatched_threshold: 0.5
            ignore_thresholds: false
            negatives_lower_than_unmatched: true
            force_match_for_each_row: true
          }
        }
        similarity_calculator {
          iou_similarity {
          }
        }
        anchor_generator {
          ssd_anchor_generator {
            num_layers: 6
            min_scale: 0.2
            max_scale: 0.95
            aspect_ratios: 1.0
            aspect_ratios: 2.0
            aspect_ratios: 0.5
            aspect_ratios: 3.0
            aspect_ratios: 0.3333
          }
        }
        image_resizer {
          fixed_shape_resizer {
            height: 300
            width: 300
          }
        }
        box_predictor {
          convolutional_box_predictor {
            min_depth: 0
            max_depth: 0
            num_layers_before_predictor: 0
            use_dropout: false
            dropout_keep_probability: 0.8
            kernel_size: 1
            box_code_size: 4
            apply_sigmoid_to_scores: false
            conv_hyperparams {
              activation: RELU_6,
              regularizer {
                l2_regularizer {
                  weight: 0.00004
                }
              }
              initializer {
                truncated_normal_initializer {
                  stddev: 0.03
                  mean: 0.0
                }
              }
              batch_norm {
                train: true,
                scale: true,
                center: true,
                decay: 0.9997,
                epsilon: 0.001,
              }
            }
          }
        }
        feature_extractor {
          type: 'ssd_mobilenet_v1'
          min_depth: 16
          depth_multiplier: 1.0
          conv_hyperparams {
            activation: RELU_6,
            regularizer {
              l2_regularizer {
                weight: 0.00004
              }
            }
            initializer {
              truncated_normal_initializer {
                stddev: 0.03
                mean: 0.0
              }
            }
            batch_norm {
              train: true,
              scale: true,
              center: true,
              decay: 0.9997,
              epsilon: 0.001,
            }
          }
        }
        loss {
          classification_loss {
            weighted_sigmoid {
            }
          }
          localization_loss {
            weighted_smooth_l1 {
            }
          }
          hard_example_miner {
            num_hard_examples: 3000
            iou_threshold: 0.99
            loss_type: CLASSIFICATION
            max_negatives_per_positive: 3
            min_negatives_per_image: 0
          }
          classification_weight: 1.0
          localization_weight: 1.0
        }
        normalize_loss_by_num_matches: true
        post_processing {
          batch_non_max_suppression {
            score_threshold: 1e-8
            iou_threshold: 0.6
            max_detections_per_class: 100
            max_total_detections: 100
          }
          score_converter: SIGMOID
        }
      }
    }

    train_config: {
      batch_size: 24
      optimizer {
        rms_prop_optimizer: {
          learning_rate: {
            exponential_decay_learning_rate {
              initial_learning_rate: 0.004
              decay_steps: 800720
              decay_factor: 0.95
            }
          }
          momentum_optimizer_value: 0.9
          decay: 0.9
          epsilon: 1.0
        }
      }
      fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
      from_detection_checkpoint: true
      # Note: The below line limits the training process to 200K steps, which we
      # empirically found to be sufficient enough to train the pets dataset. This
      # effectively bypasses the learning rate schedule (the learning rate will
      # never decay). Remove the below line to train indefinitely.
      num_steps: 200000
      data_augmentation_options {
        random_horizontal_flip {
        }
      }
      data_augmentation_options {
        ssd_random_crop {
        }
      }
    }

    train_input_reader: {
      tf_record_input_reader {
        input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
      }
      label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
    }

    eval_config: {
      num_examples: 8000
      # Note: The below line limits the evaluation process to 10 evaluations.
      # Remove the below line to evaluate indefinitely.
      max_evals: 10
    }

    eval_input_reader: {
      tf_record_input_reader {
        input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
      }
      label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
      shuffle: false
      num_readers: 1
    }

