Caffe (11): Implementing the YOLOv1 Detection Layer

零尾

Category: Deep Learning. Tags: caffe yolo detection

Article link: https://blog.csdn.net/lwplwf/article/details/82788376

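1. The YOLOv1 paper divides the image into a 7*7 grid, i.e. 49 cells.
2. For 3-class detection, each cell has classes+num*(coords+confidence)=3+2*(4+1)=13 parameters, where 3 is the number of classes (20 for VOC). One image therefore regresses 49*13=637 parameters: each cell predicts one set of class scores and 2 boxes (each box has 4 coordinates and 1 confidence), giving 49*2=98 boxes. The last fully connected layer has num_output=637.
3. For a training image, every cell carries a label of 1+3+4=8 parameters (confidence, class, x, y, w, h), so one image has 7*7*8=392 label parameters.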
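The arithmetic above can be checked with a few lines of C++. This is only a sketch: the struct `DetectionShape` and the function names are hypothetical and do not come from the original layer source; the fields mirror the parameters named in the text.

```cpp
#include <cassert>

// Hypothetical helper struct; field names mirror the parameters in the text.
struct DetectionShape {
  int side;     // the grid is side x side cells
  int classes;  // number of object classes
  int num;      // boxes predicted per cell
  int coords;   // x, y, w, h
};

// Number of grid cells ("locations" in the code below).
inline int Locations(const DetectionShape& s) { return s.side * s.side; }

// Values predicted per image: classes + num * (coords + 1 confidence) per cell.
inline int OutputSize(const DetectionShape& s) {
  assert(s.side > 0 && s.classes > 0 && s.num > 0 && s.coords > 0);
  return Locations(s) * (s.classes + s.num * (s.coords + 1));
}

// Label values per image: 1 confidence + classes + coords per cell.
inline int LabelSize(const DetectionShape& s) {
  return Locations(s) * (1 + s.classes + s.coords);
}
```

With `DetectionShape{7, 3, 2, 4}` this gives 637 outputs and 392 label values, matching the numbers above.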

YOLOv1 loss implementation

[Figure: YOLOv1 loss function]
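For reference, the loss as written in the YOLOv1 paper has five terms, and the term numbers used in the steps that follow appear to refer to them in this order. Here S^2 is the number of grid cells, B the number of boxes per cell, and a hat marks a labeled value. The paper fixes the weights of terms 3 and 5 to 1; in the code they show up as object_scale and class_scale.

```latex
\begin{aligned}
L ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
       \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
     &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
       \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
     &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
     &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
     &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```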
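(1) Term 4: confidence loss for boxes that contain no object center
To keep the code simple, all 7*7*2=98 boxes of the 7*7 grid are first treated as containing no object center, so this loss is computed for every box. A box with no object center needs no other loss; for a box that does contain one, this term is subtracted again at the end.

cost accumulates the loss of all boxes in one image.
Differentiating with respect to Ci gives 2*λnoobj*(Ci - Ci^), where Ci^ is the labeled confidence, here Ci^ = 0.
In code:

int Confidence_index = input_index + locations*classes + l*num + n; // offset of this box's confidence within the 637 outputs
cost += noobject_scale*pow(input[Confidence_index]-0, 2); // no object labeled here, so Ci^ is 0
delta[Confidence_index] = noobject_scale*(input[Confidence_index]-0);

The factor 2 from the derivative is omitted in the code because every loss term carries it.

avg_anyobj += input[Confidence_index]; // running sum of confidences, used only for log output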
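Step (1) can be sketched as a standalone function, assuming the output layout implied by the index arithmetic: all class scores first, then all confidences, then all box coordinates. The function name `NoObjectPass` and the use of `std::vector` are illustrative choices, not the original layer's interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical standalone version of step (1).
// Assumed layout per image:
//   [locations*classes class scores][locations*num confidences][locations*num*coords boxes]
float NoObjectPass(const std::vector<float>& input, int input_index,
                   int locations, int classes, int num,
                   float noobject_scale, std::vector<float>* delta) {
  assert(delta != nullptr && delta->size() == input.size());
  float cost = 0.0f;
  for (int l = 0; l < locations; ++l) {
    for (int n = 0; n < num; ++n) {
      const int idx = input_index + locations * classes + l * num + n;
      const float diff = input[idx] - 0.0f;  // target confidence is 0
      cost += noobject_scale * diff * diff;
      (*delta)[idx] = noobject_scale * diff;  // factor 2 dropped, as in the text
    }
  }
  return cost;
}
```

Every confidence entry is penalized here, including those of boxes that will later turn out to contain an object; step (4) undoes that for the responsible box.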

(2) Term 5: class probability loss
Each cell predicts one class distribution made up of classes values (3 here; 20 when using VOC, as YOLO does).
The loss and the gradient are computed for each of those values.

Differentiating with respect to Pi(c) gives 2*(Pi(c) - Pi(c)^).

int class_index = input_index + l*classes; // input_index is the start of each image's 637 outputs; class_index is the start of cell l's class scores, with l running over the 7*7=49 cells in the locations loop
for (int j = 0; j < classes; ++j) {
    cost += class_scale*pow(input[class_index + j] - truth[truth_index + 1 + j], 2); // truth_index is the start of this cell's label
    delta[class_index + j] = class_scale * (input[class_index + j] - truth[truth_index + 1 + j]);
    if (truth[truth_index + 1 + j]) avg_cat += input[class_index + j];
    avg_allcat += input[class_index + j];
}
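The class term for a single cell can be isolated as below. This is a sketch under the assumption that a cell's label is laid out as confidence, then one value per class, then x, y, w, h; `ClassLoss` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical standalone version of step (2) for one grid cell.
// truth[truth_index] is the cell's confidence; the class values follow it.
float ClassLoss(const std::vector<float>& input, int class_index,
                const std::vector<float>& truth, int truth_index,
                int classes, float class_scale, std::vector<float>* delta) {
  assert(delta != nullptr && delta->size() == input.size());
  float cost = 0.0f;
  for (int j = 0; j < classes; ++j) {
    const float diff = input[class_index + j] - truth[truth_index + 1 + j];
    cost += class_scale * diff * diff;
    (*delta)[class_index + j] = class_scale * diff;  // factor 2 dropped
  }
  return cost;
}
```

For predicted scores {0.5, 0.5, 0} against a one-hot label for the second class, the squared differences are 0.25, 0.25 and 0, so the cost is 0.5 with class_scale = 1.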

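(3) Terms 1 and 2: box position loss
Each cell predicts 2 boxes, but only one of them is used for the position loss and gradient: the box with the largest IOU against the ground truth or, when neither overlaps it, the box closest to it.
Differentiating with respect to xi gives 2*(xi - xi^)
Differentiating with respect to yi gives 2*(yi - yi^)
Differentiating with respect to sqrt(wi) gives 2*(sqrt(wi) - sqrt(wi^))
Differentiating with respect to sqrt(hi) gives 2*(sqrt(hi) - sqrt(hi^))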

vector<float> truth_box;
truth_box.push_back(float(truth[truth_index + 1 + classes] / side)); // x
truth_box.push_back(float(truth[truth_index + 1 + classes + 1] / side)); // y
truth_box.push_back(float(truth[truth_index + 1 + classes + 2])); // w
truth_box.push_back(float(truth[truth_index + 1 + classes + 3])); // h

// The initial values below are not shown in the original snippet; they are assumed here.
int best_index = 0;
float best_iou = 0;
float best_rmse = 1e30f;

for (int n = 0; n < num; ++n) {
    int box_index = input_index + locations*(classes+num) + (l*num+n)*coords; // coordinate offset of box n in cell l
    vector<float> out_box;

    out_box.push_back(float(input[box_index] / side));
    out_box.push_back(float(input[box_index + 1] / side));
    out_box.push_back(float(input[box_index + 2] * input[box_index + 2])); // the network outputs sqrt(w), so square it
    out_box.push_back(float(input[box_index + 3] * input[box_index + 3])); // the network outputs sqrt(h), so square it

    // Regress the box with the largest IOU;
    // if no box overlaps the ground truth, regress the closest one.
    float iou = box_iou(truth_box, out_box);
    // Despite the name, this is a sum of squared differences (no mean, no root).
    float rmse = pow(truth_box[0] - out_box[0], 2) + pow(truth_box[1] - out_box[1], 2) + pow(truth_box[2] - out_box[2], 2) + pow(truth_box[3] - out_box[3], 2);

    // Track the box with the largest IOU or, failing that, the smallest distance.
    if (best_iou > 0 || iou > 0) {
        if (iou > best_iou) {
            best_iou = iou;
            best_index = n;
        }
    }
    else {
        if (rmse < best_rmse) {
            best_rmse = rmse;
            best_index = n;
        }
    }
}
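The post calls box_iou without showing it. A minimal sketch of such a function follows, assuming each box is stored as {x, y, w, h} with (x, y) the box center, which is how truth_box and out_box are filled above. The names `BoxIou` and `Overlap` are stand-ins, not the original source.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Overlap length of two 1-D intervals, each given by its center and width.
// A non-positive result means the intervals do not overlap.
float Overlap(float x1, float w1, float x2, float w2) {
  const float left = std::max(x1 - w1 / 2, x2 - w2 / 2);
  const float right = std::min(x1 + w1 / 2, x2 + w2 / 2);
  return right - left;
}

// Intersection over union of two boxes stored as {center x, center y, w, h}.
float BoxIou(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == 4 && b.size() == 4);
  const float w = Overlap(a[0], a[2], b[0], b[2]);
  const float h = Overlap(a[1], a[3], b[1], b[3]);
  if (w <= 0 || h <= 0) return 0.0f;  // disjoint boxes
  const float inter = w * h;
  const float uni = a[2] * a[3] + b[2] * b[3] - inter;
  return inter / uni;
}
```

Returning 0 for disjoint boxes is what lets the selection loop fall back to the distance criterion when neither predicted box touches the ground truth.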

// The loop above picks the best box (index 0 or 1); now compute that box's offsets.
int box_index = input_index + locations * (classes + num) + (l * num + best_index) * coords; // offset of the predicted box
int tbox_index = truth_index + 1 + classes; // offset of the labeled box
avg_iou += best_iou;

// Position loss
cost += coord_scale*pow(input[box_index] - truth[tbox_index], 2); // loss on x
cost += coord_scale*pow(input[box_index + 1] - truth[tbox_index + 1], 2); // loss on y
cost += coord_scale*pow(input[box_index + 2] - std::sqrt(truth[tbox_index + 2]), 2); // loss on sqrt(w)
cost += coord_scale*pow(input[box_index + 3] - std::sqrt(truth[tbox_index + 3]), 2); // loss on sqrt(h)
// Gradients
delta[box_index] = coord_scale*(input[box_index] - truth[tbox_index]); // gradient for x
delta[box_index + 1] = coord_scale*(input[box_index + 1] - truth[tbox_index + 1]); // gradient for y
delta[box_index + 2] = coord_scale*(input[box_index + 2] - std::sqrt(truth[tbox_index + 2])); // gradient for sqrt(w)
delta[box_index + 3] = coord_scale*(input[box_index + 3] - std::sqrt(truth[tbox_index + 3])); // gradient for sqrt(h)
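The four cost lines and four gradient lines share one pattern, so they can be folded into a loop. The sketch below assumes, as the code above does, that the network outputs sqrt(w) and sqrt(h) directly while the label stores w and h; `CoordLoss` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical loop form of the position loss for the selected box.
// input holds x, y, sqrt(w), sqrt(h); truth holds x, y, w, h.
float CoordLoss(const std::vector<float>& input, int box_index,
                const std::vector<float>& truth, int tbox_index,
                float coord_scale, std::vector<float>* delta) {
  assert(delta != nullptr && delta->size() == input.size());
  const float target[4] = {
      truth[tbox_index],
      truth[tbox_index + 1],
      std::sqrt(truth[tbox_index + 2]),   // compare against sqrt(w)
      std::sqrt(truth[tbox_index + 3])};  // compare against sqrt(h)
  float cost = 0.0f;
  for (int k = 0; k < 4; ++k) {
    const float diff = input[box_index + k] - target[k];
    cost += coord_scale * diff * diff;
    (*delta)[box_index + k] = coord_scale * diff;  // factor 2 dropped
  }
  return cost;
}
```

Taking square roots of w and h makes an absolute error of the same size cost more on a small box than on a large one, which is the motivation the YOLOv1 paper gives for this term.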

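(4) Term 3: confidence loss for the box that contains an object center
Remember to subtract the loss that was added for this box in step (1).

int Confidence_index = input_index + locations * classes + l * num + best_index;
cost -= noobject_scale * pow(input[Confidence_index] - 0, 2); // undo the no-object loss added earlier
cost += object_scale * pow(input[Confidence_index] - 1, 2); // the target is 1 because the label says an object center is present
avg_obj += input[Confidence_index];
delta[Confidence_index] = object_scale * (input[Confidence_index] - 1); // overwrites the gradient set in step (1)
++count;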
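The subtract-then-add correction can be written as one small function that returns the net change in cost. This is an illustrative sketch; `ObjectConfidenceCorrection` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for step (4): returns the change to apply to cost
// and overwrites the gradient for the responsible box's confidence.
float ObjectConfidenceCorrection(const std::vector<float>& input, int conf_index,
                                 float object_scale, float noobject_scale,
                                 std::vector<float>* delta) {
  assert(delta != nullptr && delta->size() == input.size());
  const float c = input[conf_index];
  const float removed = noobject_scale * c * c;              // added in step (1)
  const float added = object_scale * (c - 1.0f) * (c - 1.0f);  // target is 1
  (*delta)[conf_index] = object_scale * (c - 1.0f);
  return added - removed;
}
```

For a predicted confidence of 0.5 with object_scale = 1 and noobject_scale = 0.5, step (1) contributed 0.125 and the object term is 0.25, so the net change is +0.125.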

All of this sits inside two for loops: the locations loop and, outermost, the batch loop.
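A skeleton of those two loops might look like the following. The truth_index formula is an assumption based on the 8-values-per-cell label layout described earlier (confidence first); the original layer may index its labels differently. `CountObjectCells` is a hypothetical function that only counts the cells for which steps (2) to (4) would run.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop skeleton: walks the batch and the grid cells and counts
// the cells whose label marks an object center.
int CountObjectCells(const std::vector<float>& truth, int batch,
                     int locations, int classes, int coords) {
  const int label_size = 1 + classes + coords;  // confidence, classes, box
  assert(static_cast<int>(truth.size()) == batch * locations * label_size);
  int count = 0;
  for (int b = 0; b < batch; ++b) {          // outermost: batch loop
    for (int l = 0; l < locations; ++l) {    // inner: locations loop
      const int truth_index = (b * locations + l) * label_size;  // assumed layout
      const bool is_obj = truth[truth_index] != 0.0f;
      if (!is_obj) continue;  // only the no-object term from step (1) applies
      ++count;                // steps (2), (3) and (4) would run here
    }
  }
  return count;
}
```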

Adding detect_layer to Caffe

1. Create detection_layer.hpp and detection_layer.cpp
Resource link: https://download.csdn.net/download/lwplwf/10712961
2. Put detection_layer.hpp under include/caffe/layers in the Caffe root directory
3. Put detection_layer.cpp under src/caffe/layers in the Caffe root directory
4. Edit src/caffe/proto/caffe.proto
(1) Inside message LayerParameter{}, add:

optional DetectionParameter detection_param = 150; // the field number is your choice but must not clash with an existing one

(2) Further down in the same file, add the parameter definition:

// yolo
message DetectionParameter {
  optional uint32 classes = 1 [default = 3]; // 3 classes by default
  optional uint32 coords = 2 [default = 4]; // x, y, w, h
  optional uint32 side = 3 [default = 7]; // number of grid cells per side
  optional float object_scale = 4 [default = 1.0];
  optional float noobject_scale = 5 [default = 0.5];
  optional float class_scale = 6 [default = 1.0];
  optional float coord_scale = 7 [default = 5];
  optional uint32 num = 8 [default = 2]; // boxes per cell
}

In practice, the layer appears in the prototxt file as:

layer{
name: "detect"
type: "Detect"
top: "loss"
bottom: "fc12"
bottom: "labels"
}

Illustrated walkthrough of how YOLO works, with part of the loss function code
[Two figures]
