Freeing CUDA GPU memory: CenterNet Inference with Caffe C++/CUDA (Part 2)

VIP article by 你的脸红了耶

Article link: https://blog.csdn.net/weixin_29380121/article/details/112116065

This article continues directly from CenterNet Inference with Caffe C++/CUDA (Part 1).

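In the previous article we implemented CenterNet's post-processing in hand-written C++/CUDA, but the result was not very elegant: frequently allocating and freeing GPU memory can hurt the stability and throughput of inference. It is therefore worth running the post-processing as a Caffe layer instead.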
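To illustrate why repeated allocation is worth avoiding, here is a minimal sketch of a buffer that is allocated once and reused across calls. The class name ScratchBuffer is hypothetical, and std::malloc/std::free stand in for cudaMalloc/cudaFree so the example runs without a GPU. Caffe's own Blob behaves in a similar spirit (it only reallocates when the requested size grows), which is one reason a layer-based implementation can avoid per-frame allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: a scratch buffer that is allocated once and reused.
// std::malloc/std::free are stand-ins for cudaMalloc/cudaFree in this sketch.
class ScratchBuffer {
 public:
  ScratchBuffer() = default;
  ~ScratchBuffer() { std::free(data_); }
  ScratchBuffer(const ScratchBuffer&) = delete;
  ScratchBuffer& operator=(const ScratchBuffer&) = delete;

  // Returns a buffer of at least `bytes` bytes.
  // Memory is reallocated only when the request exceeds the current capacity.
  void* Reserve(std::size_t bytes) {
    if (bytes > capacity_) {
      std::free(data_);
      data_ = std::malloc(bytes);
      capacity_ = (data_ != nullptr) ? bytes : 0;
      ++allocations_;
    }
    return data_;
  }

  std::size_t capacity() const { return capacity_; }
  int allocations() const { return allocations_; }  // number of real allocations so far

 private:
  void* data_ = nullptr;
  std::size_t capacity_ = 0;
  int allocations_ = 0;
};
```

Calling Reserve(1024) once per frame for 100 frames performs a single allocation; only a larger request triggers a second one.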

Moving the post-processing into a Caffe layer

Moving it into Caffe amounts to adding a new layer of our own, and a new layer has to be defined with protobuf, so the first step is to modify caffe.proto.

Modifying caffe.proto

Here I define a CenternetOutput layer as CenterNet's post-processing stage. Add the following at the appropriate places in caffe.proto:

// Add inside `message LayerParameter` (use a field ID that is not already taken):
optional CenternetOutputParameter centernet_output_param = 209;

// Add at the top level, next to the other *Parameter messages.
// The message name must match the field type used above.
message CenternetOutputParameter {
  // Number of classes that are actually predicted. Required!
  optional uint32 num_classes = 1;
  optional uint32 kernel_size = 2 [default = 3];
  optional float vis_threshold = 3 [default = 0.3];
  optional bool apply_nms = 4 [default = false];
  optional uint32 feature_map_h = 5 [default = 0];
  optional uint32 feature_map_w = 6 [default = 0];
}
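As a rough illustration of how these fields and their defaults might be consumed, the sketch below mirrors the message in a plain C++ struct and adds a few sanity checks. The names CenternetOutputConfig and ValidateConfig are hypothetical, and the checks (odd kernel size, threshold in [0, 1]) are my assumptions rather than anything the proto enforces. In the real layer the values would presumably be read through the protobuf-generated accessors, for example via this->layer_param_.centernet_output_param().

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical plain-C++ mirror of CenternetOutputParameter.
// Default values are copied from the proto definition.
struct CenternetOutputConfig {
  std::uint32_t num_classes = 0;    // no default in the proto; 0 means "not set" here
  std::uint32_t kernel_size = 3;
  float vis_threshold = 0.3f;
  bool apply_nms = false;
  std::uint32_t feature_map_h = 0;
  std::uint32_t feature_map_w = 0;
};

// Assumed sanity checks, not part of the original article.
inline void ValidateConfig(const CenternetOutputConfig& cfg) {
  if (cfg.num_classes == 0) {
    throw std::invalid_argument("num_classes must be set");
  }
  if (cfg.kernel_size % 2 == 0) {
    throw std::invalid_argument("kernel_size is assumed to be odd");
  }
  if (cfg.vis_threshold < 0.0f || cfg.vis_threshold > 1.0f) {
    throw std::invalid_argument("vis_threshold is assumed to lie in [0, 1]");
  }
}
```

A default-constructed config fails validation because num_classes is unset, which matches the "Required!" comment in the proto.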

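Then append the following to the end of the res50.prototxt from the previous article. The three bottoms are CenterNet's three final outputs, respectively hm, wh, and reg: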

layer {
  name: "centernet_output"
  type: "CenternetOutput"
  bottom: "conv_blob55"
  bottom: "conv_blob57"
  bottom: "conv_blob59"
  top: "result_out"
  centernet_output_param {
    num_classes: 2
    kernel_size: 3
    vis_threshold: 0.3
  }
}
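To make the role of the three bottoms concrete, here is a CPU-only sketch of the usual CenterNet decoding step that such a layer performs: keep heatmap locations that are local maxima within a kernel_size window and score at least vis_threshold, then build a box from the offset (reg) and size (wh) maps. This is not the author's implementation. The function names, the single-image CHW memory layout, and the channel order (wh = width then height, reg = x offset then y offset) are assumptions; the heatmap is assumed to already hold sigmoid scores, and apply_nms is ignored. Box and Detection are redefined locally so the sketch is self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Box { float x1, y1, x2, y2; };
struct Detection { Box bbox; int classId; float prob; };

// Flat index into a CHW array.
inline std::size_t Offset(int c, int y, int x, int H, int W) {
  return (static_cast<std::size_t>(c) * static_cast<std::size_t>(H) +
          static_cast<std::size_t>(y)) * static_cast<std::size_t>(W) +
         static_cast<std::size_t>(x);
}

// True if hm[c][y][x] is the maximum of its kernel x kernel neighbourhood.
inline bool IsLocalPeak(const std::vector<float>& hm, int c, int y, int x,
                        int H, int W, int kernel) {
  const int r = kernel / 2;
  const float v = hm[Offset(c, y, x, H, W)];
  for (int dy = -r; dy <= r; ++dy) {
    for (int dx = -r; dx <= r; ++dx) {
      const int yy = y + dy, xx = x + dx;
      if (yy < 0 || yy >= H || xx < 0 || xx >= W) continue;  // skip out-of-bounds
      if (hm[Offset(c, yy, xx, H, W)] > v) return false;
    }
  }
  return true;
}

// Decodes detections in feature-map coordinates (assumed layout, see lead-in).
std::vector<Detection> DecodeCenterNet(const std::vector<float>& hm,
                                       const std::vector<float>& wh,
                                       const std::vector<float>& reg,
                                       int num_classes, int H, int W,
                                       int kernel_size, float vis_threshold) {
  std::vector<Detection> out;
  for (int c = 0; c < num_classes; ++c) {
    for (int y = 0; y < H; ++y) {
      for (int x = 0; x < W; ++x) {
        const float score = hm[Offset(c, y, x, H, W)];
        if (score < vis_threshold) continue;
        if (!IsLocalPeak(hm, c, y, x, H, W, kernel_size)) continue;
        const float cx = static_cast<float>(x) + reg[Offset(0, y, x, H, W)];
        const float cy = static_cast<float>(y) + reg[Offset(1, y, x, H, W)];
        const float w = wh[Offset(0, y, x, H, W)];
        const float h = wh[Offset(1, y, x, H, W)];
        Detection det;
        det.bbox = Box{cx - 0.5f * w, cy - 0.5f * h, cx + 0.5f * w, cy + 0.5f * h};
        det.classId = c;
        det.prob = score;
        out.push_back(det);
      }
    }
  }
  return out;
}
```

The boxes come out in feature-map units; to get input-image pixels they would still need to be scaled by the network's output stride (typically 4 for CenterNet). A GPU version would run the same per-location test in a kernel, one thread per heatmap element.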

After the prototxt is modified, the model's final layers end with CenternetOutput, the post-processing layer defined above.

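After changing caffe.proto, remember to regenerate caffe.pb.cc and caffe.pb.h; otherwise loading the model fails with Error parsing text-format caffe.NetParameter: 2715:26: Message type "caffe.LayerParameter" has no field named "centernet_output_param". It is best to run make clean and then rebuild.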

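A post-processing layer like this needs only the forward pass, not the backward pass, so the backward functions are simply declared as not implemented: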

  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
    NOT_IMPLEMENTED;
  }
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
    NOT_IMPLEMENTED;
  }

Our centernet_output_layer.hpp looks like this:

#ifndef CAFFE_CENTERNET_OUTPUT_LAYER_H
#define CAFFE_CENTERNET_OUTPUT_LAYER_H

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

struct Box {
  float x1;
  float y1;
  float x2;
  float y2;
};

struct Detection {
  // x1 y1 x2 y2
  Box bbox;
  int classId;
  float prob;
};

namespace caffe {

/**
 * @brief Combine CenterNet (hm|wh|reg) layers to BoxOutput
 *
 */
template <typename Dtype>
class CenternetOutputLayer : public Layer<Dtype> {
 public:
  explicit CenternetOutputLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  virtual
