How SSD computes the location loss function
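When computing its loss, SSD sums two terms that are computed separately: the loss on the class confidence and the regression loss on the default box locations.
N is the number of matched default boxes; x indicates whether a matched box belongs to class p and takes a value in {0,1}; l is the predicted box; g is the ground truth box; c is the confidence that the boxed object belongs to class p.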
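The formula these symbols belong to does not appear in the text. For reference, the overall objective as I understand it from the SSD paper is sketched below; the weight α on the localization term is a symbol taken from the paper, not from this post.

```latex
L(x, c, l, g) = \frac{1}{N}\left( L_{conf}(x, c) + \alpha \, L_{loc}(x, l, g) \right)
```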
Here we look only at the location loss Lloc and at how the SSD Caffe source implements it:
Caffe source
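In caffe-ssd/jobs/VGGNet/VOC0712/SSD_300x300/train.prototxt, searching for "loss" leads straight to the MultiBoxLoss layer, which takes several bottom layers. Searching upward in the same file shows that the first three bottoms are data layers produced by Concat layers, each of which combines the outputs of several layers. Picking default boxes from multiple layers in this way is what characterizes SSD; the paper cites several references for where the idea comes from.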
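Next, find the corresponding source file, multibox_loss_layer.cpp, in src/caffe/layers. Its LayerSetUp() function reads this layer's parameters from the prototxt, and Forward_cpu() holds the layer's data processing; bottom[0] and bottom[3] are the loc layer data and the label data respectively.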
It then calls EncodeLocPrediction() to do the computation. The function is declared in bbox_util.hpp (include/caffe/util/) as follows:
// Encode the localization prediction and ground truth for each matched prior.
// all_loc_preds: stores the location prediction, where each item contains
// location prediction for an image.
// all_gt_bboxes: stores ground truth bboxes for the batch.
// all_match_indices: stores mapping between predictions and ground truth.
// prior_bboxes: stores all the prior bboxes in the format of NormalizedBBox.
// prior_variances: stores all the variances needed by prior bboxes.
// multibox_loss_param: stores the parameters for MultiBoxLossLayer.
// loc_pred_data: stores the location prediction results.
// loc_gt_data: stores the encoded location ground truth.
template <typename Dtype>
void EncodeLocPrediction(const vector<LabelBBox>& all_loc_preds,
const map<int, vector<NormalizedBBox> >& all_gt_bboxes,
const vector<map<int, vector<int> > >& all_match_indices,
const vector<NormalizedBBox>& prior_bboxes,
const vector<vector<float> >& prior_variances,
const MultiBoxLossParameter& multibox_loss_param,
Dtype* loc_pred_data, Dtype* loc_gt_data);
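To make "encoded location ground truth" concrete, here is a minimal sketch of encoding one ground truth box against one prior in center-size form. It assumes the case where the variance is not already folded into the target; the Box struct and EncodeCenterSize() are hypothetical names for illustration, not part of the Caffe API, which works on NormalizedBBox messages.

```cpp
#include <array>
#include <cmath>

// Hypothetical minimal box type (the real code uses NormalizedBBox).
struct Box {
  float xmin, ymin, xmax, ymax;
};

// Encode a ground truth box relative to a prior box in center-size form.
// Each offset is divided by its variance; boxes are assumed to have
// positive width and height.
std::array<float, 4> EncodeCenterSize(const Box& prior, const Box& gt,
                                      const std::array<float, 4>& variance) {
  const float prior_w = prior.xmax - prior.xmin;
  const float prior_h = prior.ymax - prior.ymin;
  const float prior_cx = (prior.xmin + prior.xmax) / 2.f;
  const float prior_cy = (prior.ymin + prior.ymax) / 2.f;

  const float gt_w = gt.xmax - gt.xmin;
  const float gt_h = gt.ymax - gt.ymin;
  const float gt_cx = (gt.xmin + gt.xmax) / 2.f;
  const float gt_cy = (gt.ymin + gt.ymax) / 2.f;

  std::array<float, 4> encoded = {{
      (gt_cx - prior_cx) / prior_w / variance[0],  // center x offset
      (gt_cy - prior_cy) / prior_h / variance[1],  // center y offset
      std::log(gt_w / prior_w) / variance[2],      // log width ratio
      std::log(gt_h / prior_h) / variance[3]}};    // log height ratio
  return encoded;
}
```

When the ground truth coincides with the prior, all four encoded values are zero, which is the target a perfectly placed default box should regress to.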
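So in its implementation SSD takes in the whole set of default boxes and ground truth boxes that satisfy the "matching strategy" and computes on them. From here we can trace where the arguments of this call come from. In particular, FindMatches() is what finds the set that meets the condition; it is also in bbox_util.hpp, declared as: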
// Find matches between prediction bboxes and ground truth bboxes.
// all_loc_preds: stores the location prediction, where each item contains
// location prediction for an image.
// all_gt_bboxes: stores ground truth bboxes for the batch.
// prior_bboxes: stores all the prior bboxes in the format of NormalizedBBox.
// prior_variances: stores all the variances needed by prior bboxes.
// multibox_loss_param: stores the parameters for MultiBoxLossLayer.
// all_match_overlaps: stores jaccard overlaps between predictions and gt.
// all_match_indices: stores mapping between predictions and ground truth.
void FindMatches(const vector<LabelBBox>& all_loc_preds,
const map<int, vector<NormalizedBBox> >& all_gt_bboxes,
const vector<NormalizedBBox>& prior_bboxes,
const vector<vector<float> >& prior_variances,
const MultiBoxLossParameter& multibox_loss_param,
vector<map<int, vector<float> > >* all_match_overlaps,
vector<map<int, vector<int> > >* all_match_indices);
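The matching strategy can be sketched as two passes over a precomputed overlap table: first each ground truth box claims its best prior, then every remaining prior is matched to its best ground truth box if the overlap exceeds a threshold. This follows my reading of the SSD paper; MatchPriors() and the strict "greater than" comparison are assumptions for illustration and do not reproduce the actual FindMatches()/MatchBBox() code.

```cpp
#include <vector>

// overlaps[i][j] is the jaccard overlap between prior i and ground truth j.
// Returns, for every prior, the index of its matched ground truth or -1.
std::vector<int> MatchPriors(const std::vector<std::vector<float> >& overlaps,
                             float threshold) {
  const int num_priors = static_cast<int>(overlaps.size());
  const int num_gt =
      num_priors > 0 ? static_cast<int>(overlaps[0].size()) : 0;
  std::vector<int> match(num_priors, -1);
  std::vector<bool> gt_used(num_gt, false);

  // Pass 1 (bipartite): repeatedly take the best remaining (prior, gt) pair
  // so that every ground truth box gets at least one prior.
  for (int round = 0; round < num_gt; ++round) {
    int best_i = -1, best_j = -1;
    float best = 0.f;
    for (int i = 0; i < num_priors; ++i) {
      if (match[i] != -1) continue;
      for (int j = 0; j < num_gt; ++j) {
        if (gt_used[j]) continue;
        if (overlaps[i][j] > best) {
          best = overlaps[i][j];
          best_i = i;
          best_j = j;
        }
      }
    }
    if (best_i == -1) break;  // no positive overlap left
    match[best_i] = best_j;
    gt_used[best_j] = true;
  }

  // Pass 2 (per prediction): match each still-unmatched prior to its best
  // ground truth box when the overlap is above the threshold.
  for (int i = 0; i < num_priors; ++i) {
    if (match[i] != -1) continue;
    int best_j = -1;
    float best = threshold;
    for (int j = 0; j < num_gt; ++j) {
      if (overlaps[i][j] > best) {
        best = overlaps[i][j];
        best_j = j;
      }
    }
    match[i] = best_j;
  }
  return match;
}
```

Only priors whose entry is not -1 contribute to the location loss; the others are candidates for negatives in the confidence loss.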
jaccard overlap/ComputeLocLoss
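FindMatches is where the jaccard overlap is handled, so let us also look at how the source deals with the overlap: the function calls MatchBBox() (line 584, bbox_util.cpp), which in turn calls JaccardOverlap(), which calls IntersectBBox() to compute the overlapping region. SSD's data augmentation uses this function as well, although it needs further checks afterwards.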
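Later, multibox_loss_layer.cpp calls MineHardExamples(), which selects positive and negative samples so that their ratio comes out at 1:3; it uses jaccardOverlapLabel. The confidence loss is computed here too, by ComputeConfLossGPU() (line 900, bbox_util.cpp), and so are the localization losses, by ComputeLocLoss() (line 919, bbox_util.cpp), which is declared in the header as: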
// Compute the localization loss per matched prior.
// loc_pred: stores the location prediction results.
// loc_gt: stores the encoded location ground truth.
// all_match_indices: stores mapping between predictions and ground truth.
// num: number of images in the batch.
// num_priors: total number of priors.
// loc_loss_type: type of localization loss, Smooth_L1 or L2.
// all_loc_loss: stores the localization loss for all priors in a batch.
template <typename Dtype>
void ComputeLocLoss(const Blob<Dtype>& loc_pred, const Blob<Dtype>& loc_gt,
const vector<map<int, vector<int> > >& all_match_indices,
const int num, const int num_priors, const LocLossType loc_loss_type,
vector<vector<float> >* all_loc_loss);
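The header comment names Smooth_L1 as one of the two loss types. As a sketch of what that means per matched prior, the standard Smooth L1 definition can be applied to the four encoded offsets and summed; SmoothL1() and LocLossForPrior() are hypothetical helper names, not functions from bbox_util.cpp.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Standard Smooth L1: quadratic near zero, linear elsewhere.
float SmoothL1(float diff) {
  const float abs_diff = std::fabs(diff);
  return abs_diff < 1.f ? 0.5f * diff * diff : abs_diff - 0.5f;
}

// Sum of Smooth L1 over the encoded offsets of one matched prior.
// pred and gt are assumed to have the same length (4 in SSD).
float LocLossForPrior(const std::vector<float>& pred,
                      const std::vector<float>& gt) {
  float loss = 0.f;
  for (std::size_t k = 0; k < pred.size(); ++k) {
    loss += SmoothL1(pred[k] - gt[k]);
  }
  return loss;
}
```

Small errors are penalized quadratically while large errors grow only linearly, which keeps outlier boxes from dominating the gradient.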
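multibox_loss_layer.cpp then calls EncodeLocPrediction() right afterwards.
It then creates loc_loss_layer to run the forward computation. MultiBoxLossLayer inherits from LossLayer, which in turn inherits from Layer; Layer defines the Forward and Backward functions, which call the virtual functions Forward_cpu and Forward_gpu, and likewise for Backward.
conf_loss_layer has a similar structure.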
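The inheritance chain and the virtual dispatch can be illustrated with a stripped-down sketch. The class names mirror Caffe's, but everything else is a simplification assumed for illustration: the real Forward() takes bottom and top blobs and picks the device from the global Caffe mode rather than from a bool argument.

```cpp
#include <string>

// Simplified stand-in for caffe::Layer: only the dispatch pattern is kept.
class Layer {
 public:
  virtual ~Layer() {}
  // Public entry point that selects the device-specific implementation.
  std::string Forward(bool use_gpu) {
    return use_gpu ? Forward_gpu() : Forward_cpu();
  }

 protected:
  virtual std::string Forward_cpu() = 0;
  // Falls back to the CPU path unless a subclass overrides it.
  virtual std::string Forward_gpu() { return Forward_cpu(); }
};

// Intermediate base class shared by loss layers.
class LossLayer : public Layer {};

// Only the CPU path is implemented here, so the GPU call falls back to it.
class MultiBoxLossLayer : public LossLayer {
 protected:
  std::string Forward_cpu() override { return "MultiBoxLoss::Forward_cpu"; }
};
```

Calling Forward() on a MultiBoxLossLayer therefore ends up in its own Forward_cpu(), whichever device is requested in this sketch.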
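From this we can see that, when the paper computes L(loc), X(ij) builds the loss only from the default boxes and ground truth boxes that meet the jaccard overlap constraint. The box encodings that the offsets in this loss rely on can be read from DecodeBBox(), listed below.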
void DecodeBBox(
    const NormalizedBBox& prior_bbox, const vector<float>& prior_variance,
    const CodeType code_type, const bool variance_encoded_in_target,
    const bool clip_bbox, const NormalizedBBox& bbox,
    NormalizedBBox* decode_bbox) {
  if (code_type == PriorBoxParameter_CodeType_CORNER) {
    if (variance_encoded_in_target) {
      // variance is encoded in target, we simply need to add the offset
      // predictions.
      decode_bbox->set_xmin(prior_bbox.xmin() + bbox.xmin());
      decode_bbox->set_ymin(prior_bbox.ymin() + bbox.ymin());
      decode_bbox->set_xmax(prior_bbox.xmax() + bbox.xmax());
      decode_bbox->set_ymax(prior_bbox.ymax() + bbox.ymax());
    } else {
      // variance is encoded in bbox, we need to scale the offset accordingly.
      decode_bbox->set_xmin(
          prior_bbox.xmin() + prior_variance[0] * bbox.xmin());
      decode_bbox->set_ymin(
          prior_bbox.ymin() + prior_variance[1] * bbox.ymin());
      decode_bbox->set_xmax(
          prior_bbox.xmax() + prior_variance[2] * bbox.xmax());
      decode_bbox->set_ymax(
          prior_bbox.ymax() + prior_variance[3] * bbox.ymax());
    }
  } else if (code_type == PriorBoxParameter_CodeType_CENTER_SIZE) {
    float prior_width = prior_bbox.xmax() - prior_bbox.xmin();
    CHECK_GT(prior_width, 0);
    float prior_height = prior_bbox.ymax() - prior_bbox.ymin();
    CHECK_GT(prior_height, 0);
    float prior_center_x = (prior_bbox.xmin() + prior_bbox.xmax()) / 2.;
    float prior_center_y = (prior_bbox.ymin() + prior_bbox.ymax()) / 2.;
    float decode_bbox_center_x, decode_bbox_center_y;
    float decode_bbox_width, decode_bbox_height;
    if (variance_encoded_in_target) {
      // variance is encoded in target, we simply need to retore the offset
      // predictions.
      decode_bbox_center_x = bbox.xmin() * prior_width + prior_center_x;
      decode_bbox_center_y = bbox.ymin() * prior_height + prior_center_y;
      decode_bbox_width = exp(bbox.xmax()) * prior_width;
      decode_bbox_height = exp(bbox.ymax()) * prior_height;
    } else {
      // variance is encoded in bbox, we need to scale the offset accordingly.
      decode_bbox_center_x =
          prior_variance[0] * bbox.xmin() * prior_width + prior_center_x;
      decode_bbox_center_y =
          prior_variance[1] * bbox.ymin() * prior_height + prior_center_y;
      decode_bbox_width =
          exp(prior_variance[2] * bbox.xmax()) * prior_width;
      decode_bbox_height =
          exp(prior_variance[3] * bbox.ymax()) * prior_height;
    }
    decode_bbox->set_xmin(decode_bbox_center_x - decode_bbox_width / 2.);
    decode_bbox->set_ymin(decode_bbox_center_y - decode_bbox_height / 2.);
    decode_bbox->set_xmax(decode_bbox_center_x + decode_bbox_width / 2.);
    decode_bbox->set_ymax(decode_bbox_center_y + decode_bbox_height / 2.);
  } else if (code_type == PriorBoxParameter_CodeType_CORNER_SIZE) {
    float prior_width = prior_bbox.xmax() - prior_bbox.xmin();
    CHECK_GT(prior_width, 0);
    float prior_height = prior_bbox.ymax() - prior_bbox.ymin();
    CHECK_GT(prior_height, 0);
    if (variance_encoded_in_target) {
      // variance is encoded in target, we simply need to add the offset
      // predictions.
      decode_bbox->set_xmin(prior_bbox.xmin() + bbox.xmin() * prior_width);
      decode_bbox->set_ymin(prior_bbox.ymin() + bbox.ymin() * prior_height);
      decode_bbox->set_xmax(prior_bbox.xmax() + bbox.xmax() * prior_width);
      decode_bbox->set_ymax(prior_bbox.ymax() + bbox.ymax() * prior_height);
    } else {
      // variance is encoded in bbox, we need to scale the offset accordingly.
      decode_bbox->set_xmin(
          prior_bbox.xmin() + prior_variance[0] * bbox.xmin() * prior_width);
      decode_bbox->set_ymin(
          prior_bbox.ymin() + prior_variance[1] * bbox.ymin() * prior_height);
      decode_bbox->set_xmax(
          prior_bbox.xmax() + prior_variance[2] * bbox.xmax() * prior_width);
      decode_bbox->set_ymax(
          prior_bbox.ymax() + prior_variance[3] * bbox.ymax() * prior_height);
    }
  } else {
    LOG(FATAL) << "Unknown LocLossType.";
  }
  float bbox_size = BBoxSize(*decode_bbox);
  decode_bbox->set_size(bbox_size);
  if (clip_bbox) {
    ClipBBox(*decode_bbox, decode_bbox);
  }
}
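The CENTER_SIZE branch above inverts the box encoding that the SSD paper uses in its localization loss. For reference, that loss as I recall it from the paper is sketched below. It is written with the class index p used earlier in this post; d denotes the default box, a symbol from the paper that this post does not define, and the variances handled in the code are left out.

```latex
L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \; \sum_{m \in \{cx, cy, w, h\}}
    x_{ij}^{p} \, \mathrm{smooth}_{L1}\!\left( l_i^{m} - \hat{g}_j^{m} \right)

\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \qquad
\hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \qquad
\hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \qquad
\hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}
```

Reading the two side by side: the decoder multiplies the center offsets by the prior's width and height and exponentiates the size terms, which undoes the division and the logarithm in the encoded targets.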