目标检测特殊层：multibox_loss_layer层详解

最新推荐文章于 2024-07-28 08:46:40 发布

BigCowPeking

最新推荐文章于 2024-07-28 08:46:40 发布

阅读量1.6k

点赞数

分类专栏：目标检测特殊层文章标签： multiloss

目标检测特殊层专栏收录该内容

12 篇文章 1 订阅

订阅专栏

SSD如何计算location loss function

SSD在计算损失函数的时候，用到了两项的加和，类别的confidence和对default box location的回归分别计算的损失值。
这里写图片描述
N是匹配的default boxes的个数，x表示匹配了的框是否属于类别p，取值{0,1};l是预测框predicted box，g是真实值ground truth box;c是指所框选目标属于类别p的置信度confidence。
只对Lloc位置的损失函数查看SSD的caffe源码怎么做的：

caffe源码

在caffe-ssd/jobs/VGGNet/VOC0712/SSD_300x300/train.prototxt中，查询loss，直接定位到了MultiBoxLoss层，里面包含了多个bottom layer，在此文件中向上查找可以看到前三个bottom层是由Concat层将多个层的数据组合到一起形成的数据层。这种多层结构选取default box的方式是SSD的特点所在，文中有一些引用来表明这一想法来源。
这里写图片描述
然后在src/caffe/layer找到相应的cpp—multibox_loss_layer.cpp，里面的函数LayerSerUp()是读取prototxt中该层的参数，Forward_cpu()函数是对这一层的数据处理过程，bottom[0]和bottom[3]分别对应loc layer数据和label 数据。

然后调用了函数EncodeLocPrediction（）来计算，找到源码位置在bbox_util.hpp（include/caffe/util/）中是这样定义该函数的：

// Encode the localization prediction and ground truth for each matched prior.
//    all_loc_preds: stores the location prediction, where each item contains
//      location prediction for an image.
//    all_gt_bboxes: stores ground truth bboxes for the batch.
//    all_match_indices: stores mapping between predictions and ground truth.
//    prior_bboxes: stores all the prior bboxes in the format of NormalizedBBox.
//    prior_variances: stores all the variances needed by prior bboxes.
//    multibox_loss_param: stores the parameters for MultiBoxLossLayer.
//    loc_pred_data: stores the location prediction results.
//    loc_gt_data: stores the encoded location ground truth.
template <typename Dtype>
void EncodeLocPrediction(const vector<LabelBBox>& all_loc_preds,
      const map<int, vector<NormalizedBBox> >& all_gt_bboxes,
      const vector<map<int, vector<int> > >& all_match_indices,
      const vector<NormalizedBBox>& prior_bboxes,
      const vector<vector<float> >& prior_variances,
      const MultiBoxLossParameter& multibox_loss_param,
      Dtype* loc_pred_data, Dtype* loc_gt_data);

可见SSD在实现的时候，是将所有的符合“匹配策略”的default box和 ground truth集合拿进来进行计算的。据此可以找到该函数调用的时候的参数来源，特别是FindMatches()是用来查找符合条件的集合，同样在bbox_util.hpp中，函数定义为：

// Find matches between prediction bboxes and ground truth bboxes.
//    all_loc_preds: stores the location prediction, where each item contains
//      location prediction for an image.
//    all_gt_bboxes: stores ground truth bboxes for the batch.
//    prior_bboxes: stores all the prior bboxes in the format of NormalizedBBox.
//    prior_variances: stores all the variances needed by prior bboxes.
//    multibox_loss_param: stores the parameters for MultiBoxLossLayer.
//    all_match_overlaps: stores jaccard overlaps between predictions and gt.
//    all_match_indices: stores mapping between predictions and ground truth.
void FindMatches(const vector<LabelBBox>& all_loc_preds,
      const map<int, vector<NormalizedBBox> >& all_gt_bboxes,
      const vector<NormalizedBBox>& prior_bboxes,
      const vector<vector<float> >& prior_variances,
      const MultiBoxLossParameter& multibox_loss_param,
      vector<map<int, vector<float> > >* all_match_overlaps,
      vector<map<int, vector<int> > >* all_match_indices);

jaccard overlap/ComputeLocLoss
在FindMatches可以看到jaccard overlap的处理，顺便看看源码怎么处理的overlap：函数调用了MatchBBox()（行584,bbox_util.cpp），然后又调用了JaccardOverlap()函数，它计算重叠区域时调用了IntersectBBox()。数据增强处理时SSD也会用到这一函数，不过还需要后续的判断。

在multibos_loss_layer.cpp后面调用了MineHardExamples（）用来选择正负样本达到1：3的效果，里面用到了jaccardOverlapLabel。并且在这里计算了confidence，函数ComputerConfLossGpu()（行900，bbox_util.cpp）。并且在这里面也计算了localization losses，有函数ComputeLocLoss()（行919，bbox_util.cpp），查看其头文件为

// Compute the localization loss per matched prior.
//    loc_pred: stores the location prediction results.
//    loc_gt: stores the encoded location ground truth.
//    all_match_indices: stores mapping between predictions and ground truth.
//    num: number of images in the batch.
//    num_priors: total number of priors.
//    loc_loss_type: type of localization loss, Smooth_L1 or L2.
//    all_loc_loss: stores the localization loss for all priors in a batch.
template <typename Dtype>
void ComputeLocLoss(const Blob<Dtype>& loc_pred, const Blob<Dtype>& loc_gt,
      const vector<map<int, vector<int> > >& all_match_indices,
      const int num, const int num_priors, const LocLossType loc_loss_type,
      vector<vector<float> >* all_loc_loss);

在multibos_loss_layer.cpp又紧接着调用了EncodeLocPrediction（）函数。

然后创建了loc_loss_layer进行forward计算，其中MultiBoxLossLayer继承了LossLayer，而LossLayer又继承了Layer，Layer定义了forward和backward函数，并调用了Forward_cpu和Forward_gpu虚函数，backward也相同。
conf_loss_layer有相似的结构。

由此可知，在原文计算L(loc)时的X(ij)是只选用了符合jaccard overlap限制要求的default box和ground boxes构建损失函数的，损失函数如下。

void DecodeBBox(
    const NormalizedBBox& prior_bbox, const vector<float>& prior_variance,
    const CodeType code_type, const bool variance_encoded_in_target,
    const bool clip_bbox, const NormalizedBBox& bbox,
    NormalizedBBox* decode_bbox) {
  if (code_type == PriorBoxParameter_CodeType_CORNER) {
    if (variance_encoded_in_target) {
      // variance is encoded in target, we simply need to add the offset
      // predictions.
      decode_bbox->set_xmin(prior_bbox.xmin() + bbox.xmin());
      decode_bbox->set_ymin(prior_bbox.ymin() + bbox.ymin());
      decode_bbox->set_xmax(prior_bbox.xmax() + bbox.xmax());
      decode_bbox->set_ymax(prior_bbox.ymax() + bbox.ymax());
    } else {
      // variance is encoded in bbox, we need to scale the offset accordingly.
      decode_bbox->set_xmin(
          prior_bbox.xmin() + prior_variance[0] * bbox.xmin());
      decode_bbox->set_ymin(
          prior_bbox.ymin() + prior_variance[1] * bbox.ymin());
      decode_bbox->set_xmax(
          prior_bbox.xmax() + prior_variance[2] * bbox.xmax());
      decode_bbox->set_ymax(
          prior_bbox.ymax() + prior_variance[3] * bbox.ymax());
    }
  } else if (code_type == PriorBoxParameter_CodeType_CENTER_SIZE) {
    float prior_width = prior_bbox.xmax() - prior_bbox.xmin();
    CHECK_GT(prior_width, 0);
    float prior_height = prior_bbox.ymax() - prior_bbox.ymin();
    CHECK_GT(prior_height, 0);
    float prior_center_x = (prior_bbox.xmin() + prior_bbox.xmax()) / 2.;
    float prior_center_y = (prior_bbox.ymin() + prior_bbox.ymax()) / 2.;

    float decode_bbox_center_x, decode_bbox_center_y;
    float decode_bbox_width, decode_bbox_height;
    if (variance_encoded_in_target) {
      // variance is encoded in target, we simply need to retore the offset
      // predictions.
      decode_bbox_center_x = bbox.xmin() * prior_width + prior_center_x;
      decode_bbox_center_y = bbox.ymin() * prior_height + prior_center_y;
      decode_bbox_width = exp(bbox.xmax()) * prior_width;
      decode_bbox_height = exp(bbox.ymax()) * prior_height;
    } else {
      // variance is encoded in bbox, we need to scale the offset accordingly.
      decode_bbox_center_x =
          prior_variance[0] * bbox.xmin() * prior_width + prior_center_x;
      decode_bbox_center_y =
          prior_variance[1] * bbox.ymin() * prior_height + prior_center_y;
      decode_bbox_width =
          exp(prior_variance[2] * bbox.xmax()) * prior_width;
      decode_bbox_height =
          exp(prior_variance[3] * bbox.ymax()) * prior_height;
    }

    decode_bbox->set_xmin(decode_bbox_center_x - decode_bbox_width / 2.);
    decode_bbox->set_ymin(decode_bbox_center_y - decode_bbox_height / 2.);
    decode_bbox->set_xmax(decode_bbox_center_x + decode_bbox_width / 2.);
    decode_bbox->set_ymax(decode_bbox_center_y + decode_bbox_height / 2.);
  } else if (code_type == PriorBoxParameter_CodeType_CORNER_SIZE) {
    float prior_width = prior_bbox.xmax() - prior_bbox.xmin();
    CHECK_GT(prior_width, 0);
    float prior_height = prior_bbox.ymax() - prior_bbox.ymin();
    CHECK_GT(prior_height, 0);
    if (variance_encoded_in_target) {
      // variance is encoded in target, we simply need to add the offset
      // predictions.
      decode_bbox->set_xmin(prior_bbox.xmin() + bbox.xmin() * prior_width);
      decode_bbox->set_ymin(prior_bbox.ymin() + bbox.ymin() * prior_height);
      decode_bbox->set_xmax(prior_bbox.xmax() + bbox.xmax() * prior_width);
      decode_bbox->set_ymax(prior_bbox.ymax() + bbox.ymax() * prior_height);
    } else {
      // variance is encoded in bbox, we need to scale the offset accordingly.
      decode_bbox->set_xmin(
          prior_bbox.xmin() + prior_variance[0] * bbox.xmin() * prior_width);
      decode_bbox->set_ymin(
          prior_bbox.ymin() + prior_variance[1] * bbox.ymin() * prior_height);
      decode_bbox->set_xmax(
          prior_bbox.xmax() + prior_variance[2] * bbox.xmax() * prior_width);
      decode_bbox->set_ymax(
          prior_bbox.ymax() + prior_variance[3] * bbox.ymax() * prior_height);
    }
  } else {
    LOG(FATAL) << "Unknown LocLossType.";
  }
  float bbox_size = BBoxSize(*decode_bbox);
  decode_bbox->set_size(bbox_size);
  if (clip_bbox) {
    ClipBBox(*decode_bbox, decode_bbox);
  }
}