Caffe Source Code Close Reading - 8 - Caffe Layers: yolov3_layer (the YOLOv3 loss layer)

赛先生.AI

Category: Caffe Source Code Close Reading. Tags: machine learning, neural networks, deep learning, caffe

Article link: https://blog.csdn.net/tecsai/article/details/107819908

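1. Overview

yolov3_layer is the layer in the YOLOv3 model that processes the feature map and computes the loss. It computes three terms: the localization loss, the bounding-box (bndbox) confidence loss, and the class confidence loss.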

2. dx_box_iou(vector<Dtype> pred, vector<Dtype> truth, IOU_LOSS iou_loss)

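dx_box_iou is a utility function that computes the gradient of the IOU.

Its inputs are the predicted box, the ground-truth box, and the type of IOU to compute. It returns the IOU gradient, i.e. the partial derivatives of the IOU with respect to x, y, w, and h.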

boxabs pred_tblr = to_tblr(pred); ///< convert from center x, y and w, h to top, bottom, left, right coordinates
///< fix up the coordinates so that left <= right and top <= bottom
float pred_t = fmin(pred_tblr.top, pred_tblr.bot);
float pred_b = fmax(pred_tblr.top, pred_tblr.bot);
float pred_l = fmin(pred_tblr.left, pred_tblr.right);
float pred_r = fmax(pred_tblr.left, pred_tblr.right);

First, to_tblr converts the prediction to tblr form (top, bottom, left, right), from which the four values pred_t, pred_b, pred_l, and pred_r are obtained.

The ground truth is converted to tblr form in the same way.
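The conversion can be sketched as follows. This is a minimal illustration, not the layer's actual code: the `BoxAbs` struct and the `ToTblr` / `Normalize` helpers are hypothetical names, and the sketch assumes the y axis grows downward so that top = y - h/2.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the boxabs struct used by the layer.
struct BoxAbs {
    float top, bot, left, right;
};

// Convert a center-format box {x, y, w, h} to edge format.
// Assumption: y grows downward, so top = y - h/2 and bot = y + h/2.
inline BoxAbs ToTblr(const std::vector<float>& box) {
    assert(box.size() == 4);
    const float half_w = box[2] / 2.0f;
    const float half_h = box[3] / 2.0f;
    return BoxAbs{box[1] - half_h, box[1] + half_h,
                  box[0] - half_w, box[0] + half_w};
}

// Reorder the edges so that top <= bot and left <= right,
// which is what the fmin/fmax lines do for the prediction.
inline BoxAbs Normalize(const BoxAbs& b) {
    return BoxAbs{std::min(b.top, b.bot), std::max(b.top, b.bot),
                  std::min(b.left, b.right), std::max(b.left, b.right)};
}
```

The normalization step matters only for predictions: a raw network output may yield a negative width or height, whereas a ground-truth box is expected to be well formed already.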

dxrep ddx = { 0 };
float X = (pred_b - pred_t) * (pred_r - pred_l); ///< prediction area: H*W
float Xhat = (truth_tblr.bot - truth_tblr.top) * (truth_tblr.right - truth_tblr.left); ///< ground truth (GT) area: H*W
float Ih = fmin(pred_b, truth_tblr.bot) - fmax(pred_t, truth_tblr.top); ///< intersection H
float Iw = fmin(pred_r, truth_tblr.right) - fmax(pred_l, truth_tblr.left); ///< intersection W
float I = Iw * Ih; ///< intersection area
float U = X + Xhat - I; ///< union area
float S = (pred[0]-truth[0])*(pred[0]-truth[0])+(pred[1]-truth[1])*(pred[1]-truth[1]); ///< squared distance between the two box centers
float giou_Cw = fmax(pred_r, truth_tblr.right) - fmin(pred_l, truth_tblr.left);
float giou_Ch = fmax(pred_b, truth_tblr.bot) - fmin(pred_t, truth_tblr.top);
float giou_C = giou_Cw * giou_Ch; ///< area of the smallest rectangle enclosing both boxes, used for GIOU

These quantities are the building blocks for computing IOU, GIOU, CIOU, and so on.
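As a rough illustration of how these terms combine, the sketch below computes IOU and GIOU for two edge-format boxes. The `Edges` and `IouTerms` types and the `ComputeIouTerms` function are hypothetical. Unlike the snippet above, the sketch clamps the intersection width and height at 0 so that disjoint boxes give I = 0; that clamp is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Edges { float top, bot, left, right; };   // hypothetical edge-format box
struct IouTerms { float I, U, C, iou, giou; };   // intersection, union, enclosing area

inline IouTerms ComputeIouTerms(const Edges& p, const Edges& t) {
    assert(p.top <= p.bot && p.left <= p.right);
    assert(t.top <= t.bot && t.left <= t.right);
    const float X = (p.bot - p.top) * (p.right - p.left);      // predicted area
    const float Xhat = (t.bot - t.top) * (t.right - t.left);   // ground-truth area
    // Assumption: clamp at 0 so non-overlapping boxes have zero intersection.
    const float Ih = std::max(0.0f, std::min(p.bot, t.bot) - std::max(p.top, t.top));
    const float Iw = std::max(0.0f, std::min(p.right, t.right) - std::max(p.left, t.left));
    const float I = Iw * Ih;
    const float U = X + Xhat - I;
    // Smallest rectangle enclosing both boxes.
    const float Cw = std::max(p.right, t.right) - std::min(p.left, t.left);
    const float Ch = std::max(p.bot, t.bot) - std::min(p.top, t.top);
    const float C = Cw * Ch;
    const float iou = U > 0.0f ? I / U : 0.0f;
    // GIOU = IOU - (C - U) / C
    const float giou = C > 0.0f ? iou - (C - U) / C : iou;
    return IouTerms{I, U, C, iou, giou};
}
```

For two 2x2 boxes offset by one unit in each direction, this gives I = 1, U = 7, and C = 9, so IOU = 1/7 and GIOU = 1/7 - 2/9.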

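To obtain the partial derivatives of the IOU with respect to t, b, l, r (and from them x, y, w, h), a few more terms are needed. Taking the IOU derivative as an example: with IOU = I / U and U = X + Xhat - I, the quotient rule gives, for each edge e in {t, b, l, r},

$$\frac{\partial\,\mathrm{IOU}}{\partial e} = \frac{U\,\dfrac{\partial I}{\partial e} - I\,\dfrac{\partial U}{\partial e}}{U^{2}}, \qquad \frac{\partial U}{\partial e} = \frac{\partial X}{\partial e} - \frac{\partial I}{\partial e}$$

The relevant code is as follows: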

float dX_wrt_t = -1 * (pred_r - pred_l); ///< -W
float dX_wrt_b = pred_r - pred_l; ///< W
float dX_wrt_l = -1 * (pred_b - pred_t); ///< -H
float dX_wrt_r = pred_b - pred_t; ///< H
// gradient of I min/max in IoU calc (prediction)
float dI_wrt_t = pred_t > truth_tblr.top ? (-1 * Iw) : 0;
float dI_wrt_b = pred_b < truth_tblr.bot ? Iw : 0;
float dI_wrt_l = pred_l > truth_tblr.left ? (-1 * Ih) : 0;
float dI_wrt_r = pred_r < truth_tblr.right ? Ih : 0;
// derivative of U with regard to x
float dU_wrt_t = dX_wrt_t - dI_wrt_t; ///< derivative of U with respect to t
float dU_wrt_b = dX_wrt_b - dI_wrt_b;
float dU_wrt_l = dX_wrt_l - dI_wrt_l;
float dU_wrt_r = dX_wrt_r - dI_wrt_r;
// gradient of C min/max in IoU calc (prediction)
float dC_wrt_t = pred_t < truth_tblr.top ? (-1 * giou_Cw) : 0;
float dC_wrt_b = pred_b > truth_tblr.bot ? giou_Cw : 0;
float dC_wrt_l = pred_l < truth_tblr.left ? (-1 * giou_Ch) : 0;
float dC_wrt_r = pred_r > truth_tblr.right ? giou_Ch : 0;
// Final IOU loss (prediction) (negative of IOU gradient, we want the negative loss)
float p_dt = 0;
float p_db = 0;
float p_dl = 0;
float p_dr = 0;
if (U > 0) { ///< partial derivatives of the IOU with respect to t, b, l, r
    p_dt = ((U * dI_wrt_t) - (I * dU_wrt_t)) / (U * U);
    p_db = ((U * dI_wrt_b) - (I * dU_wrt_b)) / (U * U);
    p_dl = ((U * dI_wrt_l) - (I * dU_wrt_l)) / (U * U);
    p_dr = ((U * dI_wrt_r) - (I * dU_wrt_r)) / (U * U);
}
// apply grad from prediction min/max for correct corner selection
p_dt = pred_tblr.top < pred_tblr.bot ? p_dt : p_db;
p_db = pred_tblr.top < pred_tblr.bot ? p_db : p_dt;
p_dl = pred_tblr.left < pred_tblr.right ? p_dl : p_dr;
p_dr = pred_tblr.left < pred_tblr.right ? p_dr : p_dl;

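What follows handles GIOU, DIOU, and CIOU, which I will not go through here. It is easiest to understand the theory behind IOU, GIOU, DIOU, and CIOU first and then read the code alongside it.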
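One way to convince yourself that the quotient-rule expressions are right is to compare them with a finite difference. The sketch below does this for the top edge only. The `Tblr` struct and the function names are hypothetical, double precision is used for the check, and the boxes are assumed to overlap so that U > 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Tblr { double top, bot, left, right; };  // hypothetical edge-format box

// Plain IOU of two overlapping boxes (no clamping, as in the snippet above).
inline double Iou(const Tblr& p, const Tblr& g) {
    const double X = (p.bot - p.top) * (p.right - p.left);
    const double Xhat = (g.bot - g.top) * (g.right - g.left);
    const double Ih = std::min(p.bot, g.bot) - std::max(p.top, g.top);
    const double Iw = std::min(p.right, g.right) - std::max(p.left, g.left);
    const double I = Iw * Ih;
    const double U = X + Xhat - I;
    assert(U > 0.0);
    return I / U;
}

// Analytic dIOU/dtop, following the dX, dI, dU terms and the quotient rule.
inline double DIouDTop(const Tblr& p, const Tblr& g) {
    const double X = (p.bot - p.top) * (p.right - p.left);
    const double Xhat = (g.bot - g.top) * (g.right - g.left);
    const double Ih = std::min(p.bot, g.bot) - std::max(p.top, g.top);
    const double Iw = std::min(p.right, g.right) - std::max(p.left, g.left);
    const double I = Iw * Ih;
    const double U = X + Xhat - I;
    const double dX = -(p.right - p.left);            // dX/dtop = -W
    const double dI = p.top > g.top ? -Iw : 0.0;      // I depends on top only if it is the inner edge
    const double dU = dX - dI;
    return (U * dI - I * dU) / (U * U);
}

// Central finite difference of the IOU with respect to the top edge.
inline double NumericDIouDTop(Tblr p, const Tblr& g, double eps = 1e-6) {
    Tblr lo = p, hi = p;
    lo.top -= eps;
    hi.top += eps;
    return (Iou(hi, g) - Iou(lo, g)) / (2.0 * eps);
}
```

The two values should agree closely as long as the perturbation does not move the top edge across the ground-truth top edge, where the min/max terms switch branches.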

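3. delta_region_class_v3

delta_region_class_v3 computes the class loss.

Reaching this function means a matching class has already been found, but before the actual computation the label is smoothed.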

if(label_smooth_eps){ ///< label smoothing
    y_true = y_true * (1 - label_smooth_eps) + 0.5*label_smooth_eps; ///< label * (1-eps) + (1/2)*eps
}

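Then the error is computed and diff_ is updated. Note that the input data has already gone through sigmoid activation, so the error can be computed directly.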

float result_delta = y_true - input_data[index + stride*class_label]; ///< class error
if(!isnan(result_delta) && !isinf(result_delta)){
    diff[index + stride*class_label] = (-1.0) * scale * result_delta; ///< update the error
}
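Put together, the smoothing and the error update might look like the sketch below. It is an illustration under assumptions: `SmoothLabel` and `ClassDelta` are hypothetical names, the loop over all classes (target 1 for the matched class, 0 for the rest) is assumed rather than shown in the article, and the focal-loss branch is left out.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Label smoothing as in the snippet above: y * (1 - eps) + 0.5 * eps.
inline float SmoothLabel(float y_true, float eps) {
    return y_true * (1.0f - eps) + 0.5f * eps;
}

// Writes the class error of one cell into diff.
// probs holds sigmoid-activated class scores; consecutive classes are
// `stride` elements apart, and `index` points at the first class score.
// Assumption: every class gets a target (1 for class_label, 0 otherwise).
inline void ClassDelta(const std::vector<float>& probs, std::vector<float>& diff,
                       int index, int class_label, int num_classes,
                       float scale, int stride, float eps) {
    assert(probs.size() == diff.size());
    assert(class_label >= 0 && class_label < num_classes);
    for (int n = 0; n < num_classes; ++n) {
        float y_true = (n == class_label) ? 1.0f : 0.0f;
        if (eps > 0.0f) {
            y_true = SmoothLabel(y_true, eps);
        }
        const float delta = y_true - probs[index + stride * n];
        if (std::isfinite(delta)) {  // same intent as the isnan/isinf guard
            diff[index + stride * n] = -scale * delta;
        }
    }
}
```

With eps = 0 the targets stay at exactly 0 and 1. With eps = 0.1 they become 0.05 and 0.95, which keeps the sigmoid outputs from being pushed all the way to saturation.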

After that, the error is adjusted depending on whether focal loss is enabled.

4. delta_region_box

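delta_region_box computes the localization loss.

It first decodes the box coordinates with get_region_box, then updates the localization loss from the IOU computation and returns the IOU.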
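The article does not show the body of get_region_box, so the following is only a sketch of the usual YOLOv3 decoding, written under assumptions: `Box` and `DecodeBox` are hypothetical names, the x and y entries are assumed to be sigmoid-activated already, the w and h entries are assumed to be raw outputs, and anchors are assumed to be given in input-image pixels.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Box { float x, y, w, h; };  // hypothetical: center and size relative to the image

// Decode one prediction into a relative box.
// data:    per-channel values; channels are `stride` elements apart
// biases:  anchor sizes as {w0, h0, w1, h1, ...} in input-image pixels (assumption)
// col,row: cell coordinates; lw,lh: feature map size; net_w,net_h: input image size
inline Box DecodeBox(const std::vector<float>& data, const std::vector<float>& biases,
                     int anchor, int index, int col, int row,
                     int lw, int lh, int net_w, int net_h, int stride) {
    assert(2 * anchor + 1 < static_cast<int>(biases.size()));
    Box b;
    b.x = (col + data[index + 0 * stride]) / lw;   // x offset already in (0, 1)
    b.y = (row + data[index + 1 * stride]) / lh;   // y offset already in (0, 1)
    b.w = std::exp(data[index + 2 * stride]) * biases[2 * anchor] / net_w;
    b.h = std::exp(data[index + 3 * stride]) * biases[2 * anchor + 1] / net_h;
    return b;
}
```

For a 13x13 feature map and a 416x416 input, a cell at column 6 with an x offset of 0.5 decodes to x = 6.5 / 13 = 0.5, and a raw width output of 0 returns the anchor width itself, scaled by 1/416.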

5. Forward_cpu

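Forward_cpu is the forward pass of the yolo layer; its main job is to compute the various errors.

It has two parts.

The first part finds the matching anchors and fills in the bounding-box confidence values produced by the convolution, without computing the loss yet.

The second part updates the localization loss and the class loss.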
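The way the two parts treat the confidence entry of diff can be summarized in one small function. This is a sketch, not the layer's code: `ObjectnessDelta` is a hypothetical name, `matched` stands for "this cell and anchor were assigned a GT in the second part", and the object_scale_ factor applied on one branch of the real code is left out.

```cpp
#include <cassert>

// Confidence entry of diff for one (cell, anchor) pair.
// conf:     sigmoid-activated confidence
// best_iou: highest IOU between this prediction and any GT
// thresh:   ignore threshold (thresh_ in the layer)
// matched:  true if the second part assigned a GT to this prediction
inline float ObjectnessDelta(float conf, float best_iou, float thresh, bool matched) {
    assert(conf >= 0.0f && conf <= 1.0f);
    if (matched) {
        return -(1.0f - conf);   // target 1: push the confidence up
    }
    if (best_iou > thresh) {
        return 0.0f;             // overlaps a GT well enough: no penalty
    }
    return conf;                 // -(0 - conf), target 0: push the confidence down
}
```

The middle case is what keeps a prediction that overlaps an object well, but is not the one responsible for it, from being punished as background.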

Let's look at it in detail.

First, get the W and H of the feature map:

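side_w_ = bottom[0]->width(); ///< feature map W
side_h_ = bottom[0]->height(); ///< feature map H

Then fetch the label data:

const Dtype* label_data = bottom[1]->cpu_data(); //[label,x,y,w,h] GT data

Get the diff pointer. diff has a shape like [13*13*3*85] and stores the localization error, the bounding-box confidence loss, and the class loss for the whole feature map. The purpose of the forward pass is to compute and update diff.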

Dtype* diff = diff_.mutable_cpu_data();

Get the input of the yolo layer, i.e. the output of the preceding convolution:

const Dtype* input_data = bottom[0]->cpu_data();

Get swap, a buffer that temporarily holds the input:

Dtype* swap_data = swap_.mutable_cpu_data();

Compute len, the number of values predicted for each cell: xywh, bndbox_conf, and cls.

int len = 4 + num_class_ + 1; ///< number of convolution outputs per cell

Compute stride, the span of one anchor layer:

int stride = side_w_*side_h_;
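With len and stride in hand, the flat position of any value in the input blob follows from the N x C x H x W layout. The sketch below spells this out. `EntryIndex` is a hypothetical helper, and it assumes the channel dimension equals the number of anchors times len, so that bottom[0]->count(1) is num_anchors * len * stride.

```cpp
#include <cassert>

// Flat index of channel c (0..len-1) for anchor n at cell s of image b.
// Assumption: the blob is laid out as N x (num_anchors * len) x H x W.
inline int EntryIndex(int b, int n, int c, int s,
                      int num_anchors, int num_classes, int side_w, int side_h) {
    const int len = 4 + num_classes + 1;            // x, y, w, h, conf, classes
    const int stride = side_w * side_h;             // one channel plane
    const int per_image = num_anchors * len * stride;  // plays the role of count(1)
    assert(n >= 0 && n < num_anchors);
    assert(c >= 0 && c < len);
    assert(s >= 0 && s < stride);
    return b * per_image + n * len * stride + c * stride + s;
}
```

For a 13x13 map with 3 anchors and 80 classes, len = 85 and stride = 169. The confidence of anchor 1 at cell 5 in the first image therefore sits at 1 * 85 * 169 + 4 * 169 + 5 = 15046, which has the same form as the `index + 4 * stride` expressions used in the layer.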

The following statement loops over each image in the batch:

for (int b = 0; b < bottom[0]->num(); b++)

From here on, just read the code comments:

    /**
     * The actual output has a shape like (13*13) * (3) * (85): a 13*13 feature map, 3 anchors, and 85 outputs per cell
     * The number of anchors handled by this layer is num_
     */
    for (int b = 0; b < bottom[0]->num(); b++) { ///< iterate over each image in the batch (batch_size)
        /**
         * Error between GT and the predicted output
         * For each prediction, find the best-matching GT and its class
         */
        for (int s = 0; s < stride; s++) { ///< iterate over each cell
            for (int n = 0; n < num_; n++) { ///< iterate over each anchor
                /// cell index across the different anchor layers
                int index = n*len*stride + s + (b * bottom[0]->count(1)); ///< cell index => b is the batch index   bottom[0]->count(1): C*H*W
                //LOG(INFO)<<index;
                vector<Dtype> pred; ///< decoded prediction
                float best_iou = 0; ///< best IOU so far
                int best_class = -1; ///< class of the best match
                vector<Dtype> best_truth; ///< best-matching ground truth
        #ifdef CPU_ONLY
                for (int c = 0; c < len; ++c) { ///< process each predicted output of the cell
                    int index2 = c*stride + index;
                    //LOG(INFO)<<index2;
                    if (c == 2 || c==3) { ///< W and H
                        swap_data[index2] = (input_data[index2 + 0]); ///< everything except W and H is activated
                    }
                    else {
                        swap_data[index2] = logistic_activate(input_data[index2 + 0]);
                    }
                }
        #endif
                int y2 = s / side_w_; ///< Y coordinate of the cell
                int x2 = s % side_w_; ///< X coordinate of the cell
                /// swap_data: holds the outputs of a cell
                /// biases_: anchor W and H
                /// side_w_*anchors_scale_: input image width
                /// side_h_*anchors_scale_: input image height
                get_region_box(pred, swap_data, biases_, mask_[n], index, x2, y2, side_w_, side_h_, side_w_*anchors_scale_, side_h_*anchors_scale_, stride);
                for (int t = 0; t < 300; ++t) { ///< at most 300 GT objects per image
                    /// GT coordinates
                    vector<Dtype> truth;
                    Dtype x = label_data[b * 300 * 5 + t * 5 + 1]; ///< ground-truth coordinates
                    Dtype y = label_data[b * 300 * 5 + t * 5 + 2];
                    Dtype w = label_data[b * 300 * 5 + t * 5 + 3];
                    Dtype h = label_data[b * 300 * 5 + t * 5 + 4];
                    if (!x) ///< fewer than 300 GTs: the remaining entries are all 0
                        break;
                    truth.push_back(x);
                    truth.push_back(y);
                    truth.push_back(w);
                    truth.push_back(h);
                    /// IOU between the prediction and the GT
                    float iou = box_iou(pred, truth, iou_loss_); ///< iou_loss_: type of IOU loss
                    /// keep the best-matching GT
                    if (iou > best_iou) {
                        best_class = label_data[b * 300 * 5 + t * 5]; ///< class of the best match
                        best_iou = iou; ///< highest IOU
                        best_truth = truth; ///< corresponding GT coordinates
                    }
                }
                avg_anyobj += swap_data[index + 4 * stride]; ///< accumulate the bounding-box confidence (probability that the box contains an object)
                ///< by default, penalize the confidence as if nothing matched (target 0)
                diff[index + 4 * stride] = (-1) * (0 - swap_data[index + 4 * stride]); ///< at this point this is just the raw bnd box confidence output (not yet the final error)
                //diff[index + 4 * stride] = (-1) * (0 - exp(input_data[index + 4 * stride]-exp(input_data[index + 4 * stride])));
                //diff[index + 4 * stride] = (-1) * noobject_scale_ * (0 - swap_data[index + 4 * stride]) *logistic_gradient(swap_data[index + 4 * stride]);
                /// if there is no match but the IOU exceeds the threshold, apply no penalty
                if (best_iou > thresh_) { ///< above the threshold: the error is 0
                    diff[index + 4 * stride] = 0; ///< bounding-box confidence error is 0
                }
                if (best_iou > 1) { ///< can the IOU exceed 1? A plain IOU lies in [0, 1], so this branch should never execute
                    LOG(INFO) << "best_iou > 1"; // plz tell me ..
                    diff[index + 4 * stride] = (-1) * (1 - swap_data[index + 4 * stride]); ///< the true confidence error
                    /// 5 * stride: skips the 4 box coordinates and the confidence to reach the first class score
                    delta_region_class_v3(swap_data, diff, index + 5 * stride, best_class, num_class_, class_scale_, &avg_cat, stride, use_focal_loss_,label_smooth_eps_);
                    delta_region_box(best_truth, swap_data, biases_, mask_[n], index, x2, y2, side_w_, side_h_,
                    side_w_*anchors_scale_, side_h_*anchors_scale_, diff, coord_scale_*(2 - best_truth[2] * best_truth[3]), stride,iou_loss_,iou_normalizer_,max_delta_,accumulate_);
                }
            }
        }

        /**
         * Error between anchors and GT
         * First determine which anchor layer is responsible (the feature map is split into layers by anchor)
         * Then use pos to locate the exact cell
         */
        //vector<Dtype> used;
        //used.clear();
        for (int t = 0; t < 300; ++t) { ///< iterate over the 300 GT slots
            /// GT coordinates
            vector<Dtype> truth;
            truth.clear();
            int class_label = label_data[t * 5 + b * 300 * 5 + 0];
            float x = label_data[t * 5 + b * 300 * 5 + 1]; ///< ground-truth coordinates
            float y = label_data[t * 5 + b * 300 * 5 + 2];
            float w = label_data[t * 5 + b * 300 * 5 + 3];
            float h = label_data[t * 5 + b * 300 * 5 + 4];
            if (!w)
                break;
            truth.push_back(x);
            truth.push_back(y);
            truth.push_back(w);
            truth.push_back(h);
            float best_iou = 0;
            int best_index = 0;
            int best_n = -1;
            int i = truth[0] * side_w_; ///< cell coordinates in the feature map
            int j = truth[1] * side_h_;
            int pos = j * side_w_ + i; ///< flat position within the feature map
            vector<Dtype> truth_shift;
            truth_shift.clear();
            truth_shift.push_back(0);
            truth_shift.push_back(0);
            truth_shift.push_back(w);
            truth_shift.push_back(h);
            //LOG(INFO) << j << "," << i << "," << anchors_scale_;
            /**
             * Overall error between GT and anchors
             */
            /// compute the IOU between this GT and every anchor
            /// find the most suitable anchor; record its index and IOU
            for (int n = 0; n < biases_size_; ++n) { ///< iterate over each anchor
                vector<Dtype> pred(4);
                pred[2] = biases_[2 * n] / (float)(side_w_*anchors_scale_); ///< relative anchor W (anchor width / input image width)
                pred[3] = biases_[2 * n + 1] / (float)(side_h_*anchors_scale_); ///< relative anchor H (anchor height / input image height)
                pred[0] = 0;
                pred[1] = 0;
                float iou = box_iou(pred, truth_shift,iou_loss_); 
                if (iou > best_iou) {
                    best_n = n; ///< which anchor layer
                    best_iou = iou;
                }
            }
            /// inputs: the mask array, the index of the best anchor, the number of anchors
            /// find the position of the best-matching anchor within mask_
            int mask_n = int_index(mask_, best_n, num_); ///< no overlap, IOU is 0 => best_n == -1 => mask_n == -1
            if (mask_n >= 0) {
                bool overlap = false;
                float iou;
                best_n = mask_n;
                //LOG(INFO) << best_n;
                best_index = best_n*len*stride + pos + b * bottom[0]->count(1); ///< index of the corresponding cell
                
                /// returns the iou and updates diff (the gradient)
                /// computed between the prediction (decoded with the anchor) and the GT
                /// LOSS => box loss
                iou = delta_region_box(truth, swap_data, biases_,mask_[best_n], best_index, i, j, side_w_, side_h_, side_w_*anchors_scale_, side_h_*anchors_scale_, 
                diff, coord_scale_*(2 - truth[2] * truth[3]), stride,iou_loss_,iou_normalizer_,max_delta_,accumulate_);
                if (iou > 0.5)
                    recall += 1;
                if (iou > 0.75)
                    recall75 += 1;
                avg_iou += iou;
                avg_iou_loss += (1 - iou);
                avg_obj += swap_data[best_index + 4 * stride]; ///< accumulate the box confidence
                if (use_logic_gradient_) { ///< use the logistic gradient?
                    /// LOSS => bnd box confidence loss (probability that an object is present)
                    diff[best_index + 4 * stride] = (-1.0) * (1 - swap_data[best_index + 4 * stride]) * object_scale_; ///< update the bnd box confidence
                }else {
                    diff[best_index + 4 * stride] = (-1.0) * (1 - swap_data[best_index + 4 * stride]); ///< compute the real bnd box loss (the first part only filled in the output value and did not compute the loss)
                    //diff[best_index + 4 * stride] = (-1) * (1 - exp(input_data[best_index + 4 * stride] - exp(input_data[best_index + 4 * stride])));
                }
                //diff[best_index + 4 * stride] = (-1.0) * (1 - swap_data[best_index + 4 * stride]) ;
                /**
                 * @brief update diff and accumulate avg_cat
                 * arg1: swap_data, the (sigmoid-activated) outputs of a cell
                 * arg2: gradient buffer (diff)
                 * arg3: start index of the class scores
                 * arg4: class label of the current GT
                 * arg5: num_class_, number of classes
                 * arg6: class_scale_, class loss coefficient
                 * arg7: &avg_cat, the accumulator mentioned in the brief
                 * arg8: stride, feature map W * feature map H
                 * arg9: use_focal_loss_, whether to use focal loss
                 * arg10: label_smooth_eps_, label smoothing coefficient (0 disables smoothing)
                 */
                /// LOSS => class loss
                delta_region_class_v3(swap_data, diff, best_index + 5 * stride, class_label, num_class_, class_scale_, &avg_cat, stride, use_focal_loss_, label_smooth_eps_); //softmax_tree_
                ++count;
                ++class_count_;
            }
            /**
             * Iterate over the other anchors; any whose IOU exceeds iou_thresh_ is treated as a secondary match, and lbox and lcls are computed for it
             */
            for (int n = 0; n < biases_size_; ++n) { ///< iterate over each anchor
                int mask_n = int_index(mask_, n, num_); ///< index of n within mask_
                /// anchors other than the best one; note that best_n was overwritten with mask_n above,
                /// so n != best_n compares an anchor index with a mask index
                if (mask_n >= 0 && n != best_n && iou_thresh_ < 1.0f) {
                    vector<Dtype> pred(4);
                    pred[2] = biases_[2 * n] / (float)(side_w_*anchors_scale_);
                    pred[3] = biases_[2 * n + 1] / (float)(side_h_*anchors_scale_);
                    pred[0] = 0;
                    pred[1] = 0;
                    float iou = box_iou(pred, truth_shift, iou_loss_); 
        
                    if (iou > iou_thresh_) {
                        bool overlap = false;
                        float iou;
                        //LOG(INFO) << best_n;
                        best_index = mask_n*len*stride + pos + b * bottom[0]->count(1);
                        
                        iou = delta_region_box(truth, swap_data, biases_,mask_[mask_n], best_index, i, j, side_w_, side_h_, side_w_*anchors_scale_, side_h_*anchors_scale_, 
                        diff, coord_scale_*(2 - truth[2] * truth[3]), stride,iou_loss_,iou_normalizer_,max_delta_,accumulate_);
                        if (iou > 0.5)
                            recall += 1;
                        if (iou > 0.75)
                            recall75 += 1;
                        avg_iou += iou;
                        avg_iou_loss += (1 - iou);
                        avg_obj += swap_data[best_index + 4 * stride];
                        if (use_logic_gradient_) {
                            diff[best_index + 4 * stride] = (-1.0) * (1 - swap_data[best_index + 4 * stride]) * object_scale_;
                        }else{
                            diff[best_index + 4 * stride] = (-1.0) * (1 - swap_data[best_index + 4 * stride]);
                            //diff[best_index + 4 * stride] = (-1) * (1 - exp(input_data[best_index + 4 * stride] - exp(input_data[best_index + 4 * stride])));
                        }
                        //diff[best_index + 4 * stride] = (-1.0) * (1 - swap_data[best_index + 4 * stride]) ;
                        delta_region_class_v3(swap_data, diff, best_index + 5 * stride, class_label, num_class_, class_scale_, &avg_cat, stride, use_focal_loss_,label_smooth_eps_); //softmax_tree_
                        ++count;
                        ++class_count_;
                    }
                }
            }
        }
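The anchor-matching step in the second part compares shapes only: both the GT and each anchor are centered at the origin, so the IOU depends on width and height alone. The sketch below isolates that step. `ShapeIou`, `BestAnchor`, and `IndexInMask` are hypothetical names (the last one mimics what int_index appears to do), plain IOU is assumed regardless of iou_loss_, and the anchor values used in the note after the code are the common YOLOv3 defaults, taken here purely as example input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// IOU of two boxes that share the same center: only the sizes matter.
inline float ShapeIou(float w1, float h1, float w2, float h2) {
    const float inter = std::min(w1, w2) * std::min(h1, h2);
    const float uni = w1 * h1 + w2 * h2 - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Index of the anchor whose shape best matches the GT, or -1 if none overlaps.
// biases: {w0, h0, w1, h1, ...} in input-image pixels; gt_w, gt_h are relative sizes.
inline int BestAnchor(const std::vector<float>& biases, float gt_w, float gt_h,
                      int net_w, int net_h) {
    assert(biases.size() % 2 == 0);
    int best_n = -1;
    float best_iou = 0.0f;
    for (std::size_t n = 0; n < biases.size() / 2; ++n) {
        const float aw = biases[2 * n] / net_w;
        const float ah = biases[2 * n + 1] / net_h;
        const float iou = ShapeIou(aw, ah, gt_w, gt_h);
        if (iou > best_iou) {
            best_iou = iou;
            best_n = static_cast<int>(n);
        }
    }
    return best_n;
}

// Position of anchor n within this layer's mask, or -1 if the layer does not own it.
inline int IndexInMask(const std::vector<int>& mask, int n) {
    for (std::size_t k = 0; k < mask.size(); ++k) {
        if (mask[k] == n) return static_cast<int>(k);
    }
    return -1;
}
```

With nine anchors and a GT of about 120x90 pixels on a 416x416 input, the best match is the 116x90 anchor (index 6). A layer whose mask is {6, 7, 8} takes responsibility for that GT, while a layer with mask {0, 1, 2} gets -1 and skips it.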

6. Backward_cpu

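Backward_cpu works much the same in every layer: it propagates top_diff backward to compute bottom_diff, with small layer-specific variations. For example, conv_layer also updates weight_diff and biases_diff.

For the yolo layer, this one line is all that matters: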

caffe_cpu_axpby(bottom[0]->count(), alpha, diff_.cpu_data(), Dtype(0), bottom[0]->mutable_cpu_diff());
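caffe_cpu_axpby computes Y = alpha * X + beta * Y element-wise. With beta set to 0, as in the call above, the bottom diff is simply the stored diff_ scaled by alpha. The plain loop below is a sketch of the same arithmetic; `Axpby` is a hypothetical stand-in, and since the article does not show how alpha is derived, the example treats it as a given scale factor.

```cpp
#include <cassert>

// Element-wise y = alpha * x + beta * y over n values.
template <typename Dtype>
void Axpby(int n, Dtype alpha, const Dtype* x, Dtype beta, Dtype* y) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + beta * y[i];
    }
}
```

Because the forward pass has already written the gradient of every output into diff_, the backward pass has nothing left to derive; it only needs this scaled copy.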
