最近要对faceboxes的网络结构进行TensorRT加速,发现同事使用的faceboxes中的priorBox layer和平常使用的priorBox layer在网络结构上好像不太一样,这就说明,我有可能需要在TensorRT中自己添加自己的priorBox层了.这里比较了常规的prior_box_param参数:
prior_box_param{
min_size:60.0 #有的包含max_size;
aspect_ratio:2.0
flip:true
clip:false
variance:0.1
variance:0.1
variance:0.2
variance:0.2
offset:0.5
}
而face_boxes中使用的prior_box_param是这样的:
prior_box_param{
fixed_size:512 #有的包含多个
density:1 #有的包含多个
step:128
variance:0.1
variance:0.1
variance:0.2
variance:0.2
offset:0.5
}
对比caffe中proto中的PriorBoxParameter中的参数发现,face_boxes的caffe工程(https://github.com/zeusees/FaceBoxes)中使用的代码增加了三个参数:
repeated float fixed_size = 14;
repeated float fixed_ratio = 15;
repeated float density = 16;
所以说较之前的代码实现而言,可能需要添加priorBox IPlugin;这就需要了解priorBox的代码原理了:
priorBox的原理我的理解是这样的:
人为地制作了一些不同比例的box框,priorbox层就是用于生成这些框;生成这些框有什么用呢?如果生成的框和ground truth重合或是大于某个设定的阈值,则满足条件的这个框就是正样本,否则这些框就是负样本了;同时,同一层的所有特征图共享一组默认框。
看源码:
首先是caffe.proto中的priorBox中的参数包括哪些(参考:SSD网络解析之PriorBox层_走的那么干脆的博客-CSDN博客_priorbox):
// Message that stores parameters used by PriorBoxLayer.
message PriorBoxParameter {
  // Encode/decode type.
  enum CodeType {
    CORNER = 1;
    CENTER_SIZE = 2;
    CORNER_SIZE = 3;
  }
  // Minimum box size (in pixels). Required!
  // Corresponds to s_k times the network input-image size (the data-layer
  // input) in Eq. (4), Sec. 2.2 of the SSD paper.
  repeated float min_size = 1;
  // Maximum box size (in pixels). Required!
  // Equals the min_size of the next feature map that generates default boxes.
  repeated float max_size = 2;
  // Various of aspect ratios. Duplicate ratios will be ignored.
  // If none is provided, we use default ratio 1.
  repeated float aspect_ratio = 3;
  // If true, will flip each aspect ratio.
  // For example, if there is aspect ratio "r",
  // we will generate aspect ratio "1.0/r" as well.
  optional bool flip = 4 [default = true];
  // If true, will clip the prior so that it is within [0, 1]
  // (i.e. keep the whole default box inside the network input image).
  optional bool clip = 5 [default = false];
  // Variance for adjusting the prior bboxes.
  repeated float variance = 6;
  // By default, we calculate img_height, img_width, step_x, step_y based on
  // bottom[0] (feat) and bottom[1] (img). Unless these values are explicitely
  // provided.
  // Explicitly provide the img_size.
  optional uint32 img_size = 7;
  // Either img_size or img_h/img_w should be specified; not both.
  optional uint32 img_h = 8;  // network input-image height (or user-set height)
  optional uint32 img_w = 9;  // network input-image width (or user-set width)
  // Explicitly provide the step size.
  optional float step = 10;
  // Either step or step_h/step_w should be specified; not both.
  // step_h / step_w: the distance, measured on the network input image,
  // between two vertically / horizontally adjacent feature-map cells.
  optional float step_h = 11;
  optional float step_w = 12;
  // Offset to the top left corner of each cell
  // (relative offset of the default-box center).
  optional float offset = 13 [default = 0.5];
  // FaceBoxes additions: anchor densification parameters.
  repeated float fixed_size = 14;
  repeated float fixed_ratio = 15;
  repeated float density = 16;
}
源码理解:
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>
#include <math.h>
#include "caffe/layers/prior_box_layer.hpp"
namespace caffe {
// LayerSetUp: reads the PriorBoxParameter from the layer prototxt, validates
// it, and caches everything Forward_cpu needs: min/max sizes, aspect ratios,
// the FaceBoxes-specific fixed sizes / fixed ratios / densities, variances,
// image size, step size and center offset. Also computes num_priors_, the
// number of default boxes generated at each feature-map location.
template <typename Dtype>
void PriorBoxLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
const PriorBoxParameter& prior_box_param =
this->layer_param_.prior_box_param();  // fetch this layer's parameters
// Original SSD check disabled: in this FaceBoxes variant min_size is
// optional because fixed_size can be used instead.
// CHECK_GT(prior_box_param.min_size_size(), 0) << "must provide min_size.";
if (prior_box_param.min_size_size()>0){
for (int i = 0; i < prior_box_param.min_size_size(); ++i) {
min_sizes_.push_back(prior_box_param.min_size(i));
CHECK_GT(min_sizes_.back(), 0) << "min_size must be positive.";  // CHECK_GT = "greater than"
}
}
// FaceBoxes extension: fixed_size anchors.
if (prior_box_param.fixed_size_size()>0){
for (int i = 0; i < prior_box_param.fixed_size_size(); ++i) {
fixed_sizes_.push_back(prior_box_param.fixed_size(i));
CHECK_GT(fixed_sizes_.back(), 0) << "fixed_size must be positive.";
// fixed_size and density must always appear together.
CHECK_GT(prior_box_param.density_size(),0) << "if use fixed_size then you must provide density";
}
}
// fixed_ratio and aspect_ratio are mutually exclusive.
if (prior_box_param.fixed_ratio_size()>0){
CHECK_EQ(0,prior_box_param.aspect_ratio_size()) << "can not provide fixed_ratio and aspect_ratio simultaneously.";
}
fixed_ratios_.clear();
for(int i=0; i < prior_box_param.fixed_ratio_size(); ++i){
float ar = prior_box_param.fixed_ratio(i);  // copy fixed_ratio into the member vector
fixed_ratios_.push_back(ar);
}
// end of FaceBoxes-specific parsing
aspect_ratios_.clear();
aspect_ratios_.push_back(1.);  // ratio 1 is always included by default
flip_ = prior_box_param.flip();  // flip=true also adds the reciprocal of each ratio (e.g. 2 -> 1/2)
// Deduplicate the configured aspect ratios (repeated values are ignored).
for (int i = 0; i < prior_box_param.aspect_ratio_size(); ++i) {
float ar = prior_box_param.aspect_ratio(i);
bool already_exist = false;
for (int j = 0; j < aspect_ratios_.size(); ++j) {
if (fabs(ar - aspect_ratios_[j]) < 1e-6) {
already_exist = true;
break;
}
}
if (!already_exist) {
aspect_ratios_.push_back(ar);  // keep each distinct ratio
if (flip_) {
aspect_ratios_.push_back(1./ar);  // and its reciprocal when flipping
}
}
}
if (min_sizes_.size()>0){
num_priors_ = aspect_ratios_.size() * min_sizes_.size();  // boxes per cell for the SSD path
}
// FaceBoxes path: fixed_size overrides the min_size-based count.
if (fixed_sizes_.size()>0){
num_priors_ = aspect_ratios_.size() * fixed_sizes_.size();
}
// Anchor densification: each density d adds (d^2 - 1) extra boxes per ratio.
if(prior_box_param.density_size() > 0) {
for(int i=0;i<prior_box_param.density_size();++i){
densitys_.push_back(prior_box_param.density(i));
CHECK_GT(densitys_.back(), 0) << "density must be positive.";
if (prior_box_param.fixed_ratio_size()>0){
num_priors_ += (fixed_ratios_.size()) * (pow(densitys_[i],2)-1);
}else{
num_priors_ += (aspect_ratios_.size()) * (pow(densitys_[i],2)-1);
}
}
}
// end of densification bookkeeping
if(prior_box_param.max_size_size() > 0) {
CHECK_EQ(prior_box_param.min_size_size(), prior_box_param.max_size_size());  // one max_size per min_size (CHECK_EQ = "equal")
for (int i = 0; i < prior_box_param.max_size_size(); ++i) {
max_sizes_.push_back(prior_box_param.max_size(i));
CHECK_GT(max_sizes_[i], min_sizes_[i])
<< "max_size must be greater than min_size.";
num_priors_ += 1;  // the extra ratio-1 box at scale sqrt(min*max)
}
}
clip_ = prior_box_param.clip();  // whether to clamp priors into [0, 1]
// Copy the variances into variance_ (users may set 0, 1 or exactly 4 values).
if (prior_box_param.variance_size() > 1) {
// Must and only provide 4 variance.
CHECK_EQ(prior_box_param.variance_size(), 4);
for (int i = 0; i < prior_box_param.variance_size(); ++i) {
CHECK_GT(prior_box_param.variance(i), 0);
variance_.push_back(prior_box_param.variance(i));
}
} else if (prior_box_param.variance_size() == 1) {
CHECK_GT(prior_box_param.variance(0), 0);  // a single shared variance
variance_.push_back(prior_box_param.variance(0));
} else {
// Set default to 0.1.
variance_.push_back(0.1);
}
// img_h/img_w/img_size are usually absent from the prototxt; then both are
// set to 0 and Forward_cpu reads the size from bottom[1] instead.
if (prior_box_param.has_img_h() || prior_box_param.has_img_w()) {
CHECK(!prior_box_param.has_img_size())
<< "Either img_size or img_h/img_w should be specified; not both.";
img_h_ = prior_box_param.img_h();
CHECK_GT(img_h_, 0) << "img_h should be larger than 0.";
img_w_ = prior_box_param.img_w();
CHECK_GT(img_w_, 0) << "img_w should be larger than 0.";
} else if (prior_box_param.has_img_size()) {
const int img_size = prior_box_param.img_size();
CHECK_GT(img_size, 0) << "img_size should be larger than 0.";
img_h_ = img_size;
img_w_ = img_size;
} else {
img_h_ = 0;  // 0 means "derive from bottom[1] at forward time"
img_w_ = 0;
}
// Same fallback scheme for the step: explicit step_h/step_w, a single step,
// or 0 meaning "derive from image/feature-map sizes at forward time".
if (prior_box_param.has_step_h() || prior_box_param.has_step_w()) {
CHECK(!prior_box_param.has_step())
<< "Either step or step_h/step_w should be specified; not both.";
step_h_ = prior_box_param.step_h();
CHECK_GT(step_h_, 0.) << "step_h should be larger than 0.";
step_w_ = prior_box_param.step_w();
CHECK_GT(step_w_, 0.) << "step_w should be larger than 0.";
} else if (prior_box_param.has_step()) {
const float step = prior_box_param.step();
CHECK_GT(step, 0) << "step should be larger than 0.";
step_h_ = step;
step_w_ = step;
} else {
step_h_ = 0;
step_w_ = 0;
}
offset_ = prior_box_param.offset();  // relative offset of box centers within a cell (default 0.5)
}
// Reshape: size top[0] as (1, 2, H*W*num_priors_*4) for the feature map in
// bottom[0]. All images in a batch share one set of priors, hence the
// leading 1. Channel 0 will hold the prior coordinates, channel 1 their
// variances; every prior contributes 4 values per channel.
template <typename Dtype>
void PriorBoxLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
const int feat_w = bottom[0]->width();   // feature-map width
const int feat_h = bottom[0]->height();  // feature-map height
vector<int> out_shape;
out_shape.push_back(1);  // priors are shared across the whole batch
out_shape.push_back(2);  // channel 0: coordinates, channel 1: variances
out_shape.push_back(feat_w * feat_h * num_priors_ * 4);
CHECK_GT(out_shape[2], 0);
top[0]->Reshape(out_shape);
}
// Forward_cpu: fill top[0] with the prior boxes.
// Channel 0: for every feature-map cell, num_priors_ boxes written as
// normalized (xmin, ymin, xmax, ymax) on the network input image.
// Channel 1: the matching variances.
// bottom[0] is the feature map; bottom[1] is the data blob (only its
// height/width are read, as the fallback image size).
template <typename Dtype>
void PriorBoxLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
LOG(INFO)<<this->layer_param().name().c_str();
const int layer_width = bottom[0]->width();  // e.g. 32x32 for inception3_priorbox
const int layer_height = bottom[0]->height();
int img_width, img_height;
if (img_h_ == 0 || img_w_ == 0) {
// Image size not fixed in the prototxt: take it from the data blob.
img_width = bottom[1]->width();   // e.g. 1024
img_height = bottom[1]->height(); // e.g. 1024
} else {
img_width = img_w_;
img_height = img_h_;
}
float step_w, step_h;
if (step_w_ == 0 || step_h_ == 0) {
// Step not given: each feature-map cell spans img/layer input pixels.
step_w = static_cast<float>(img_width) / layer_width;
step_h = static_cast<float>(img_height) / layer_height;
} else {
step_w = step_w_;  // e.g. 32
step_h = step_h_;  // e.g. 32
}
Dtype* top_data = top[0]->mutable_cpu_data();
int dim = layer_height * layer_width * num_priors_ * 4;  // floats per channel (e.g. num_priors_ = 21)
int idx = 0;  // running write cursor into channel 0
// Nested loops: visit every feature-map cell and emit its default boxes.
for (int h = 0; h < layer_height; ++h) {
for (int w = 0; w < layer_width; ++w) {
float center_x = (w + offset_) * step_w;  // box center x on the network input image
float center_y = (h + offset_) * step_h;  // box center y on the network input image
float box_width, box_height;
// ---- FaceBoxes path: fixed_size anchors with densification ----
for (int s = 0; s < fixed_sizes_.size(); ++s) {
int fixed_size_ = fixed_sizes_[s];
box_width = box_height = fixed_size_;
if(fixed_ratios_.size()>0){
for (int r = 0; r < fixed_ratios_.size(); ++r) {
float ar = fixed_ratios_[r];
int density_ = densitys_[s];
int shift = fixed_sizes_[s] / density_;  // spacing of the densified sub-grid
float box_width_ratio = fixed_sizes_[s] * sqrt(ar);
float box_height_ratio = fixed_sizes_[s] / sqrt(ar);
// NOTE(review): the inner `r` below shadows the ratio index `r`
// above (kept as in upstream); renaming would aid readability.
for (int r = 0 ; r < density_ ; ++r){
for (int c = 0 ; c < density_ ; ++c){
// density_ x density_ sub-centers spread inside the cell
float center_x_temp = center_x - fixed_size_ / 2 + shift/2. + c*shift;
float center_y_temp = center_y - fixed_size_ / 2 + shift/2. + r*shift;
// xmin, normalized to the input image and clamped at 0
top_data[idx++] = (center_x_temp - box_width_ratio / 2.) / img_width >=0 ? (center_x_temp - box_width_ratio / 2.) / img_width : 0 ;
// ymin
top_data[idx++] = (center_y_temp - box_height_ratio / 2.) / img_height >= 0 ? (center_y_temp - box_height_ratio / 2.) / img_height : 0;
// xmax, clamped at 1
top_data[idx++] = (center_x_temp + box_width_ratio / 2.) / img_width <= 1 ? (center_x_temp + box_width_ratio / 2.) / img_width : 1;
// ymax
top_data[idx++] = (center_y_temp + box_height_ratio / 2.) / img_height <= 1 ? (center_y_temp + box_height_ratio / 2.) / img_height : 1;
}
}
}
}
else {
// this code added by gaozhihua for density anchor box
if (densitys_.size() > 0) {
CHECK_EQ(fixed_sizes_.size(),densitys_.size());
int density_ = densitys_[s];
int shift = fixed_sizes_[s] / density_;
// Square (ratio-1) densified boxes of side fixed_size.
for (int r = 0 ; r < density_ ; ++r){
for (int c = 0 ; c < density_ ; ++c){
float center_x_temp = center_x - fixed_size_ / 2 + shift/2. + c*shift;
float center_y_temp = center_y - fixed_size_ / 2 + shift/2. + r*shift;
// xmin
top_data[idx++] = (center_x_temp - box_width / 2.) / img_width >=0 ? (center_x_temp - box_width / 2.) / img_width : 0 ;
// ymin
top_data[idx++] = (center_y_temp - box_height / 2.) / img_height >= 0 ? (center_y_temp - box_height / 2.) / img_height : 0;
// xmax
top_data[idx++] = (center_x_temp + box_width / 2.) / img_width <= 1 ? (center_x_temp + box_width / 2.) / img_width : 1;
// ymax
top_data[idx++] = (center_y_temp + box_height / 2.) / img_height <= 1 ? (center_y_temp + box_height / 2.) / img_height : 1;
}
}
}
// rest of priors: densified boxes for each non-1 aspect ratio
for (int r = 0; r < aspect_ratios_.size(); ++r) {
float ar = aspect_ratios_[r];
if (fabs(ar - 1.) < 1e-6) {
continue;  // ratio 1 was already emitted by the density block above
}
int density_ = densitys_[s];
int shift = fixed_sizes_[s] / density_;
float box_width_ratio = fixed_sizes_[s] * sqrt(ar);
float box_height_ratio = fixed_sizes_[s] / sqrt(ar);
// NOTE(review): inner `r` again shadows the ratio index `r`, as upstream.
for (int r = 0 ; r < density_ ; ++r){
for (int c = 0 ; c < density_ ; ++c){
float center_x_temp = center_x - fixed_size_ / 2 + shift/2. + c*shift;
float center_y_temp = center_y - fixed_size_ / 2 + shift/2. + r*shift;
// xmin
top_data[idx++] = (center_x_temp - box_width_ratio / 2.) / img_width >=0 ? (center_x_temp - box_width_ratio / 2.) / img_width : 0 ;
// ymin
top_data[idx++] = (center_y_temp - box_height_ratio / 2.) / img_height >= 0 ? (center_y_temp - box_height_ratio / 2.) / img_height : 0;
// xmax
top_data[idx++] = (center_x_temp + box_width_ratio / 2.) / img_width <= 1 ? (center_x_temp + box_width_ratio / 2.) / img_width : 1;
// ymax
top_data[idx++] = (center_y_temp + box_height_ratio / 2.) / img_height <= 1 ? (center_y_temp + box_height_ratio / 2.) / img_height : 1;
}
}
}
}
}
// ---- Original SSD path: min_size / max_size / aspect_ratio anchors ----
for (int s = 0; s < min_sizes_.size(); ++s) {
int min_size_ = min_sizes_[s];
// first prior: aspect_ratio = 1, size = min_size
box_width = box_height = min_size_;
// xmin (normalized: input-image coordinates mapped into [0, 1])
top_data[idx++] = (center_x - box_width / 2.) / img_width;
// ymin
top_data[idx++] = (center_y - box_height / 2.) / img_height;
// xmax
top_data[idx++] = (center_x + box_width / 2.) / img_width;
// ymax
top_data[idx++] = (center_y + box_height / 2.) / img_height;
if (max_sizes_.size() > 0) {
// The paper's extra ratio-1 box at the geometric-mean scale.
CHECK_EQ(min_sizes_.size(), max_sizes_.size());
int max_size_ = max_sizes_[s];
// second prior: aspect_ratio = 1, size = sqrt(min_size * max_size)
box_width = box_height = sqrt(min_size_ * max_size_);
// xmin
top_data[idx++] = (center_x - box_width / 2.) / img_width;
// ymin
top_data[idx++] = (center_y - box_height / 2.) / img_height;
// xmax
top_data[idx++] = (center_x + box_width / 2.) / img_width;
// ymax
top_data[idx++] = (center_y + box_height / 2.) / img_height;
}
// rest of priors: one box per remaining aspect ratio
for (int r = 0; r < aspect_ratios_.size(); ++r) {
float ar = aspect_ratios_[r];
if (fabs(ar - 1.) < 1e-6) {  // skip ratio 1 — already handled above
continue;
}
box_width = min_size_ * sqrt(ar);
box_height = min_size_ / sqrt(ar);
// xmin
top_data[idx++] = (center_x - box_width / 2.) / img_width;
// ymin
top_data[idx++] = (center_y - box_height / 2.) / img_height;
// xmax
top_data[idx++] = (center_x + box_width / 2.) / img_width;
// ymax
top_data[idx++] = (center_y + box_height / 2.) / img_height;
}
}
}
}
// clip the priors' coordinates so every box lies within [0, 1]
// (only when clip=true in the prototxt)
if (clip_) {
for (int d = 0; d < dim; ++d) {
top_data[d] = std::min<Dtype>(std::max<Dtype>(top_data[d], 0.), 1.);
}
}
// Fill channel 1 with the variances; dividing by the variance later scales
// the box-regression error, enlarging the loss/gradients during training.
top_data += top[0]->offset(0, 1);  // advance the pointer to the second channel
if (variance_.size() == 1) {
caffe_set<Dtype>(dim, Dtype(variance_[0]), top_data);  // one shared variance for every coordinate
} else {
int count = 0;
for (int h = 0; h < layer_height; ++h) {
for (int w = 0; w < layer_width; ++w) {
for (int i = 0; i < num_priors_; ++i) {
for (int j = 0; j < 4; ++j) {
top_data[count] = variance_[j];  // per-coordinate variance
++count;
}
}
}
}
}
}
// Instantiate the layer template for float and double, and register it with
// Caffe's layer factory under the type name "PriorBox".
INSTANTIATE_CLASS(PriorBoxLayer);
REGISTER_LAYER_CLASS(PriorBox);
} // namespace caffe
整个prior层以feature map和data层作为输入,为feature map每个点考虑num_prior个prior box,输出shape为(1,2,layer_height * layer_width * num_priors_ * 4),也就是2个channel,第一个channel存放每个prior box映射回原图的位置信息,第二个channel存放每个prior box的variance信息.其实prior box就和anchor差不多,只不过前者在多scale的feature map上获得且个数不为9.
盗用知乎上的一张图来帮助理解:先将feature map上的每个点对应到300*300的img上,作为中心点.依此中心点做出num_prior个prior box,再把各个超出边界的box拉回来(前提是clip=true).
参考:SSD网络解析之PriorBox层_走的那么干脆的博客-CSDN博客_priorbox
参考:目标检测:SSD目标检测中PriorBox代码解读_BigCowPeking的博客-CSDN博客_priorbox