Reading the darknet source: region_layer.c

beingod0

Category: Neural network learning. Tags: deep learning, C++

Article link: https://blog.csdn.net/beingod0/article/details/105256515

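An analysis of region_layer.c in the darknet source

I have recently been studying the yolov2 model, and the loss function is a real headache. I got stuck in the theory behind the multi-task objective, so I decided to work through it from the source code.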

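Abbreviations used

gt = ground truth
pred = prediction
tx ty tw th : the offsets output by the network
bx by bw bh : the box center and the box width and height, normalized to 0~1 relative to the original image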

Functions in the file

All macro definitions related to GPU compilation are skipped here.

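make_region_layer: returns a layer struct after initializing it. The main fields:
    w, h: width and height of the output feature map, e.g. 13 x 13
    coords: 4, for the box x, y, w, h
    biases: the anchors, of size 5*2
    outputs: the output size, e.g. 13*13*5*25 (yolov2)
    delta: a buffer that holds the differences between gt and pred
resize_region_layer: adjusts the input and output sizes of the layer
get_region_box: converts the tx, ty, tw, th used by yolov2 into bx, by, bw, bh
delta_region_box: computes the delta values for tx, ty, tw, th and returns the iou. Every delta is multiplied by scale, whose value is
    l.coord_scale * (2 - truth.w*truth.h)
so that smaller boxes contribute a larger loss
delta_region_mask: the delta for the extra mask entries; it only applies when coords is greater than 4
delta_region_class: computes the class delta. There are two cases, with and without a softmax tree, and in both a delta is computed for every class entry. yolov2.cfg uses softmax, so the computation is
    scale * (((n == class)?1 : 0) - output[index + stride*n])
that is, every class entry gets a delta multiplied by scale, and scale is 1 in yolov2
float logit(float x): computes the logit function (the inverse of the sigmoid)
float tisnan(float x): a NaN test, judging by the name
entry_index: a helper that computes an index into the output
forward_region_layer: the forward pass of the region layer. Its main jobs:
    compute l.cost, i.e. the loss
    print the line we see when training with darknet:
    Region Avg IOU: %f, Class: %f, Obj: %f, No Obj: %f, Avg Recall: %f, count: %d\n
    This function is analyzed in detail below
correct_region_boxes: corrects the output boxes through the detection pointer used by the function below; it mainly adjusts bx, by, bw, bh
get_region_detections: collects the detection results; this is the function that parses the network output
zero_objectness: sets the objectness outputs of the layer to 0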

The forward_region_layer function is shown below. I added comments and removed the GPU code and some parts that are unrelated to yolov2.

void forward_region_layer(const layer l, network net)
{
    int i,j,b,t,n;
    memcpy(l.output, net.input, l.outputs*l.batch*sizeof(float));

#ifndef GPU ...
#endif

    memset(l.delta, 0, l.outputs * l.batch * sizeof(float));
    if(!net.train) return;
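    // accumulates the iou of matched boxes, for the "Avg IOU" output
    float avg_iou = 0;
    // accumulates the recall count (matches with iou > .5)
    float recall = 0;
    // accumulates the class prediction score, for the "Class" output
    float avg_cat = 0;
    // accumulates the objectness confidence where an object is present
    float avg_obj = 0;
    // accumulates the objectness confidence over all predictions; printed as the mean "No Obj" value
    float avg_anyobj = 0;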
    int count = 0;
    int class_count = 0;
    *(l.cost) = 0;
    for (b = 0; b < l.batch; ++b) {
        if(l.softmax_tree){ // yolov2 does not define a softmax tree, so skip this
            ...
        }
        for (j = 0; j < l.h; ++j) {
            for (i = 0; i < l.w; ++i) {
                for (n = 0; n < l.n; ++n) {
                    // iterate over every output, e.g. 13 x 13 x 5 x 25
                    //get the box index
                    int box_index = entry_index(l, b, n*l.w*l.h + j*l.w + i, 0);
                    //get the box prediction
                    box pred = get_region_box(l.output, l.biases, n, box_index, i, j, l.w, l.h, l.w*l.h);
                    // best_iou is what the first term of the yolov2 loss looks for; it is compared with the threshold below
                    // and is not needed again after that comparison
                    float best_iou = 0;
                    // 30 gt box limit
                    for(t = 0; t < 30; ++t){
                        box truth = float_to_box(net.truth + t*(l.coords + 1) + b*l.truths, 1);
                        if(!truth.x) break; // no more gt boxes, stop
                        //find the best iou
                        float iou = box_iou(pred, truth);
                        if (iou > best_iou) {
                            best_iou = iou;
                        }
                    }
                    // find the objectness conf
                    int obj_index = entry_index(l, b, n*l.w*l.h + j*l.w + i, l.coords);
                    // first add the object conf
                    avg_anyobj += l.output[obj_index];
                    // define the object delta value, with background / no background divide
                    l.delta[obj_index] = l.noobject_scale * (0 - l.output[obj_index]); // this is the yolov2 case
                    if(l.background) l.delta[obj_index] = l.noobject_scale * (1 - l.output[obj_index]);
                    // when best_iou exceeds the threshold, clear the delta; this is where best_iou is used
                    if (best_iou > l.thresh) {
                        l.delta[obj_index] = 0;
                    }
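                    // while fewer than 12800 images have been seen, learn the anchors: this is the second (prior) term of the formula.
                    // Every prediction is pulled toward its cell center and its anchor shape; where a target exists, this delta is overwritten later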
                    if(*(net.seen) < 12800){
                        box truth = {0};
                        truth.x = (i + .5)/l.w;
                        truth.y = (j + .5)/l.h;
                        truth.w = l.biases[2*n]/l.w;
                        truth.h = l.biases[2*n+1]/l.h;
                        // calculate the yolov2 prior term
                        // note the quietly applied scale of 0.01
                        delta_region_box(truth, l.output, l.biases, n, box_index, i, j, l.w, l.h, l.delta, .01, l.w*l.h);
                    }
                }
            }
        }
        // the maximum number of gt boxes appears to be 30, so loop over those 30 slots
        // and match each annotation box in turn
        for(t = 0; t < 30; ++t){
            box truth = float_to_box(net.truth + t*(l.coords + 1) + b*l.truths, 1);

            if(!truth.x) break;
            float best_iou = 0;
            int best_n = 0;
            // map the gt center to the current feature-map size
            // this avoids searching the whole map: the gt directly gives the responsible cell
            i = (truth.x * l.w);
            j = (truth.y * l.h);
            box truth_shift = truth; // this is for iou calculation
            truth_shift.x = 0;
            truth_shift.y = 0;
            // compare with the anchors and keep best_n, the one with the highest iou
            for(n = 0; n < l.n; ++n){
                // find out the best iou box within the anchors
                int box_index = entry_index(l, b, n*l.w*l.h + j*l.w + i, 0);
                box pred = get_region_box(l.output, l.biases, n, box_index, i, j, l.w, l.h, l.w*l.h);
                if(l.bias_match){
                    pred.w = l.biases[2*n]/l.w;
                    pred.h = l.biases[2*n+1]/l.h;
                }
                pred.x = 0;
                pred.y = 0;
                float iou = box_iou(pred, truth_shift);
                if (iou > best_iou){
                    best_iou = iou;
                    best_n = n;
                }
            }
            // get the index of the best-matching box
            int box_index = entry_index(l, b, best_n*l.w*l.h + j*l.w + i, 0);
            // compute the box delta and the iou between gt and pred at this location
            float iou = delta_region_box(truth, l.output, l.biases, best_n, box_index, i, j, l.w, l.h, l.delta, l.coord_scale *  (2 - truth.w*truth.h), l.w*l.h);
            // with more than 4 coords, compute the mask delta
            // yolov2 has l.coords = 4, so this branch is skipped
            if(l.coords > 4){
                int mask_index = entry_index(l, b, best_n*l.w*l.h + j*l.w + i, 4);
                delta_region_mask(net.truth + t*(l.coords + 1) + b*l.truths + 5, l.output, l.coords - 4, mask_index, l.delta, l.w*l.h, l.mask_scale);
            }
            // a match with iou > .5 adds one to the recall count
            if(iou > .5) recall += 1;
            avg_iou += iou;
            // get the index of the objectness confidence
            int obj_index = entry_index(l, b, best_n*l.w*l.h + j*l.w + i, l.coords);
            avg_obj += l.output[obj_index];
            l.delta[obj_index] = l.object_scale * (1 - l.output[obj_index]);
            // yolov2 enables rescore, so this branch is taken and the target becomes the iou
            if (l.rescore) {
                // use the iou to replace 1, as the yolov2 formula shows
                l.delta[obj_index] = l.object_scale * (iou - l.output[obj_index]);
            }
            // not used by yolov2
            if(l.background){
                l.delta[obj_index] = l.object_scale * (0 - l.output[obj_index]);
            }
            // get the class
            int class = net.truth[t*(l.coords + 1) + b*l.truths + l.coords];
            if (l.map) class = l.map[class];
            // for every class, take (0/1 target - predicted probability) * scale and store it in that class's slot of l.delta
            int class_index = entry_index(l, b, best_n*l.w*l.h + j*l.w + i, l.coords + 1);
            delta_region_class(l.output, l.delta, class_index, class, l.classes, l.softmax_tree, l.class_scale, l.w*l.h, &avg_cat, !l.softmax);
            ++count;
            ++class_count;
        }
    }
    // this part turns the deltas into the squared-error cost
    // mag_array gives the square root of the sum of squared deltas; squaring it yields the sum of squared deltas
    *(l.cost) = pow(mag_array(l.delta, l.outputs * l.batch), 2);
    printf("Region Avg IOU: %f, Class: %f, Obj: %f, No Obj: %f, Avg Recall: %f,  count: %d\n", avg_iou/count, avg_cat/class_count, avg_obj/count, avg_anyobj/(l.w*l.h*l.n*l.batch), recall/count, count);
}
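
The three decisions that shape the loss in the function above are the anchor match, the objectness target, and the final cost. They can be sketched in isolation as follows. This is a simplified illustration, not darknet's code: shape_iou, best_anchor, noobj_delta, obj_delta and cost_from_deltas are hypothetical names, the anchor match assumes bias_match is on (so only anchor shapes are compared), and the l.background branch is left out.

```c
#include <math.h>

/* IoU of two boxes that share the same center, so only the shapes matter.
 * This mirrors setting x = y = 0 on both boxes before calling box_iou. */
static float shape_iou(float w1, float h1, float w2, float h2)
{
    float iw = w1 < w2 ? w1 : w2;
    float ih = h1 < h2 ? h1 : h2;
    float inter = iw * ih;
    float uni = w1 * h1 + w2 * h2 - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

/* Pick the anchor whose shape overlaps the gt box the most.
 * Anchors are given in cell units, the gt size (tw, th) in 0~1 units. */
static int best_anchor(const float *anchors, int n_anchors,
                       int w, int h, float tw, float th)
{
    int best_n = 0;
    float best_iou = 0.0f;
    for (int n = 0; n < n_anchors; ++n) {
        float iou = shape_iou(anchors[2 * n] / w, anchors[2 * n + 1] / h, tw, th);
        if (iou > best_iou) {
            best_iou = iou;
            best_n = n;
        }
    }
    return best_n;
}

/* Objectness delta for a prediction that is not matched to a gt box:
 * push the confidence toward 0, unless it already overlaps some gt well. */
static float noobj_delta(float conf, float best_iou, float thresh,
                         float noobject_scale)
{
    if (best_iou > thresh) return 0.0f;
    return noobject_scale * (0.0f - conf);
}

/* Objectness delta for the matched prediction: the target is the iou
 * when rescore is on, otherwise 1. */
static float obj_delta(float conf, float iou, int rescore, float object_scale)
{
    float target = rescore ? iou : 1.0f;
    return object_scale * (target - conf);
}

/* Cost as the sum of squared deltas, i.e. the square of the L2 norm. */
static float cost_from_deltas(const float *delta, int n)
{
    float sum = 0.0f;
    for (int k = 0; k < n; ++k) sum += delta[k] * delta[k];
    return sum;
}
```

For example, with deltas {3, 4} the cost is 25, and a matched prediction with confidence 0.4, iou 0.75 and object_scale 5 gets a delta of 1.75 with rescore on and 3.0 with rescore off.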
