SSD source code: a walkthrough of detection_output_layer

泥石流中的一股清流

Category: ssd. Tags: caffe, ssd, detection_output_layer

Article link: https://blog.csdn.net/qq_31261509/article/details/83377143

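This article walks through how the DetectionOutputLayer of the SSD model runs its forward pass under CUDA: box decoding in DecodeBBoxesGPU, data rearrangement in PermuteDataGPU, and non-maximum suppression in ApplyNMSFast. Together these functions decode the predicted boxes, prepare the class scores, and filter the final detections.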

Overview

The CPU version is too slow for real deployments, where in practice only a CUDA or OpenCL implementation is usable, so I will cover only the CUDA version.

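For the inputs of detection_output_layer, see the article "Caffe框架下SSD算法源码综述" (an overview of the SSD source code in Caffe). The layer is implemented across an hpp, a cpp and a cu file.
See DetectionOutputLayer::Forward_gpu(): the forward pass first decodes the predicted boxes with DecodeBBoxesGPU.
PermuteDataGPU then rearranges the class predictions. deploy.prototxt shows that conf has already gone through softmax before it is fed to detection_output_layer, so no softmax is needed inside the layer.
The two results are then combined: non-maximum suppression (NMS) is applied to each class separately, and the classes do not affect one another.
Finally, the surviving detections are written to the output blob.
The source also contains code for saving results to disk; it is optional, so I do not cover it.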
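To make the decoding step concrete, here is a minimal CPU sketch of the CENTER_SIZE decode for a single box. It is not the Caffe kernel itself: the `Box` alias and the `DecodeCenterSize` helper are hypothetical names introduced for illustration, and the sketch assumes the prior is in corner form with its four variances alongside it.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Corner-form box: {xmin, ymin, xmax, ymax}, normalized to [0, 1].
using Box = std::array<float, 4>;

// Hypothetical helper (not a Caffe function): decode one CENTER_SIZE prediction.
//   prior    : prior box in corner form
//   variance : the four variances stored with the prior
//   loc      : predicted offsets (dcx, dcy, dw, dh)
//   variance_encoded_in_target : when false, offsets are scaled by the variances
Box DecodeCenterSize(const Box& prior, const Box& variance, const Box& loc,
                     bool variance_encoded_in_target) {
  const float prior_w = prior[2] - prior[0];
  const float prior_h = prior[3] - prior[1];
  assert(prior_w > 0 && prior_h > 0);
  const float prior_cx = (prior[0] + prior[2]) / 2;
  const float prior_cy = (prior[1] + prior[3]) / 2;

  // If the variances are already baked into the targets, treat them as 1.
  Box v = variance;
  if (variance_encoded_in_target) v.fill(1.0f);

  // Center is shifted relative to the prior; size is scaled exponentially.
  const float cx = v[0] * loc[0] * prior_w + prior_cx;
  const float cy = v[1] * loc[1] * prior_h + prior_cy;
  const float w = std::exp(v[2] * loc[2]) * prior_w;
  const float h = std::exp(v[3] * loc[3]) * prior_h;

  // Convert back from center/size form to corner form.
  return Box{cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2};
}
```

With all offsets at zero the function returns the prior unchanged, which is a quick sanity check on the formula. The GPU version does the same arithmetic, but with one thread per output coordinate.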

Source code walkthrough

Like any regular layer, detection_output_layer is built around Forward and Backward, but Backward is not implemented.

The forward pass uses the following functions:

DecodeBBoxesGPU
PermuteDataGPU
ApplyNMSFast
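Of the three, ApplyNMSFast is the one that actually decides which boxes survive, so it is worth sketching before reading the layer code. The following is a simplified greedy NMS in the same spirit, not a copy of the Caffe function: `Box`, `JaccardOverlap` as written here, and `GreedyNms` are illustrative names, and the parameter order is my own assumption.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>
#include <vector>

using Box = std::array<float, 4>;  // {xmin, ymin, xmax, ymax}

// Intersection over union of two corner-form boxes.
float JaccardOverlap(const Box& a, const Box& b) {
  const float iw = std::min(a[2], b[2]) - std::max(a[0], b[0]);
  const float ih = std::min(a[3], b[3]) - std::max(a[1], b[1]);
  if (iw <= 0 || ih <= 0) return 0.0f;
  const float inter = iw * ih;
  const float area_a = (a[2] - a[0]) * (a[3] - a[1]);
  const float area_b = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (area_a + area_b - inter);
}

// Illustrative greedy NMS for one class; returns indices of the kept boxes.
std::vector<int> GreedyNms(const std::vector<Box>& boxes,
                           const std::vector<float>& scores,
                           float score_threshold, float nms_threshold,
                           float eta, int top_k) {
  assert(boxes.size() == scores.size());
  // 1. Keep only candidates above the score threshold, sorted by score.
  std::vector<std::pair<float, int> > candidates;
  for (int i = 0; i < static_cast<int>(scores.size()); ++i) {
    if (scores[i] > score_threshold) candidates.emplace_back(scores[i], i);
  }
  std::stable_sort(candidates.begin(), candidates.end(),
                   [](const std::pair<float, int>& x,
                      const std::pair<float, int>& y) {
                     return x.first > y.first;
                   });
  // 2. Optionally truncate to the top_k best candidates (-1 disables this).
  if (top_k > -1 && top_k < static_cast<int>(candidates.size())) {
    candidates.resize(top_k);
  }
  // 3. Greedily keep a box unless it overlaps an already kept box too much.
  float adaptive_threshold = nms_threshold;
  std::vector<int> kept;
  for (const std::pair<float, int>& cand : candidates) {
    bool keep = true;
    for (int k : kept) {
      if (JaccardOverlap(boxes[cand.second], boxes[k]) > adaptive_threshold) {
        keep = false;
        break;
      }
    }
    if (keep) {
      kept.push_back(cand.second);
      // eta < 1 gradually tightens the overlap threshold as boxes are kept.
      if (eta < 1 && adaptive_threshold > 0.5f) adaptive_threshold *= eta;
    }
  }
  return kept;
}
```

The layer calls the real function once per non-background class, which is why the suppression results of different classes never interact.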

Forward_gpu

template <typename Dtype>
void DetectionOutputLayer<Dtype>::Forward_gpu(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
    // loc_data: input blob holding the predicted box offsets
    const Dtype* loc_data = bottom[0]->gpu_data();
    // prior_data: input blob holding the prior boxes (and their variances)
    // that loc_data is decoded against
    const Dtype* prior_data = bottom[2]->gpu_data();
    // num: number of images in the batch
    const int num = bottom[0]->num();

    // In Caffe, a mutable_* accessor returns a pointer that will be written to;
    // the plain accessors are read-only
    Dtype* bbox_data = bbox_preds_.mutable_gpu_data();
    // loc_count: total number of location values over all images in the batch
    const int loc_count = bbox_preds_.count();
    // Whether to clip decoded boxes to the image (predictions can fall outside it)
    const bool clip_bbox = false;
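    /*
     * code_type_: CENTER_SIZE in the standard SSD configuration
     * variance_encoded_in_target_: defaults to false, meaning the variances are
     *   not baked into the regression targets, so decoding multiplies the
     *   predicted offsets by the prior variances
     * num_priors_: number of prior boxes per image
     * share_location_: defaults to true, meaning all classes share a single
     *   set of box predictions
     * num_loc_classes_: share_location_ ? 1 : num_classes_
     * background_label_id_: label id of the background class
     * clip_bbox: whether to clip decoded coordinates to [0, 1], i.e. to the image
    */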
    DecodeBBoxesGPU<Dtype>(loc_count, loc_data, prior_data, code_type_,
        variance_encoded_in_target_, num_priors_, share_location_,
        num_loc_classes_, background_label_id_, clip_bbox, bbox_data);
    // Retrieve all decoded location predictions.
    const Dtype* bbox_cpu_data;
    if (!share_location_) {
      Dtype* bbox_permute_data = bbox_permute_.mutable_gpu_data();
      PermuteDataGPU<Dtype>(loc_count, bbox_data, num_loc_classes_, num_priors_,
          4, bbox_permute_data);
      bbox_cpu_data = bbox_permute_.cpu_data();
    } else {
      bbox_cpu_data = bbox_preds_.cpu_data();
    }

    // Retrieve all confidences.
    Dtype* conf_permute_data = conf_permute_.mutable_gpu_data();
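    // bottom[1] holds the class confidences; num_classes_ is the number of
    // classes and num_priors_ is the number of prior boxes per image.
    // Permute conf from [num, d, c, dim] to [num, c, d, dim], where d indexes
    // priors and c indexes classes, so each class's scores become contiguous.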
    PermuteDataGPU<Dtype>(bottom[1]->count(), bottom[1]->gpu_data(),
        num_classes_, num_priors_, 1, conf_permute_data);
    const Dtype* conf_cpu_data = conf_permute_.cpu_data();

    int num_kept = 0;
    vector<map<int, vector<int> > > all_indices;
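    // i indexes the images in the batch
    for (int i = 0; i < num; ++i) {
      map<int, vector<int> > indices;
      int num_det = 0;
      // Offset of image i's confidences in the permuted conf array
      const int conf_idx = i * num_classes_ * num_priors_;
      int bbox_idx;
      if (share_location_) {
        // bbox_idx: starting offset of image i's boxes
        bbox_idx = i * num_priors_ * 4;
      } else {
        bbox_idx = conf_idx * 4;
      }
      // After the permute, conf is laid out with the batch index first and the
      // class index second, so the loop over images is followed by a loop over classes.
      // indices stores what survives NMS: the key is the class label and the
      // value is the list of kept prior indices for that class.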
      for (int c = 0; c < num_classes_; ++c) {
        if (c == background_label_id_) {
          // Ignore background class.
          continue;
        }
        /* Pointer to the first confidence of class c in image i.
         * conf_idx already points at image i, so reaching class c
         * only requires adding c * num_priors_.
        */
        const Dtype* cur_conf_data = conf_cpu_data + conf_idx + c * num_priors_;
        // Pointer to the first box of image i
        const Dtype* cur_bbox_data = bbox_cpu_data + bbox_idx;
        // Skipped when share_location_ is true (all classes use the same boxes)
        if (!share_location_) {
          cur_bbox_data += c * num_priors_ * 4;
        }
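        /*
         * Apply non-maximum suppression.
         * confidence_threshold_: boxes scoring at or below this are discarded
         * nms_threshold_: overlap (IoU) threshold above which a box is suppressed
         * eta_: factor for the adaptive NMS threshold; when eta_ < 1 the
         *   threshold shrinks as boxes are kept
         * top_k_: maximum number of highest-scoring candidates considered
        */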
        ApplyNMSFast(cur_bbox_data, cur_conf_data, num_priors_,
            confidence_threshold_, nms_threshold_, eta_, top_k_, &(indices[c]));
        // Accumulate the number of detections kept across all classes
        num_det += indices[c].size();