Object Detection (2): RetinaNet Explained

礼拜天吃芋圆

Article link: https://blog.csdn.net/weixin_38636668/article/details/108000284

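1. Core idea:

One-stage methods: fast, but they score a very large number of candidate locations, only a few of which contain ground-truth objects, so positives and negatives are heavily imbalanced when the loss is computed.
Two-stage methods: reach high accuracy, but cannot meet speed requirements.

Question:
Is there a method that delivers both accuracy and speed?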

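2. Solution:

Focal Loss was proposed to fix the accuracy problem of one-stage detectors.
The root cause of their lower accuracy is class imbalance: when the loss is computed, the majority class (easy background examples) dominates it.

Focal loss is not aimed at handling outliers. Instead it reweights examples by how hard they are to classify: easy examples get a small weight, hard examples a large one.

(1) Cross-entropy definition, with p the predicted foreground probability and y the label:
$$\mathrm{CE}(p_t) = -\log(p_t), \qquad p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}$$
(2) Add a weight to balance positive and negative examples:
$$\mathrm{CE}(p_t) = -\alpha_t \log(p_t)$$

(3) Focal loss definition:
$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$
α: controls the relative weight of positive and negative examples (α_t = α for positives, 1 - α for negatives)
γ: controls how strongly easy examples are down-weighted (γ = 0 recovers the weighted cross entropy)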
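
To make the effect of the two parameters concrete, here is a minimal sketch of the binary focal loss for a single prediction, written in plain Python. The function names `focal_loss` and `cross_entropy` and the two sample predictions are illustrative assumptions, not code from the paper; the defaults α = 0.25 and γ = 2 follow the values the paper reports as working best.

```python
import math


def cross_entropy(p, y):
    """Plain binary cross entropy for one prediction.

    p: predicted foreground probability in (0, 1); y: 1 (foreground) or 0.
    """
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predictions: an easy background box and a hard foreground box.
easy_negative = (0.01, 0)  # background, confidently predicted as background
hard_positive = (0.10, 1)  # foreground, but the model only gives it 10%

for name, (p, y) in [("easy negative", easy_negative),
                     ("hard positive", hard_positive)]:
    print(name, "CE =", cross_entropy(p, y), "FL =", focal_loss(p, y))
```

The easy negative keeps almost none of its cross-entropy loss under focal loss, while the hard positive keeps a sizable share, so thousands of easy background boxes no longer drown out the few hard examples.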

3. Model

ResNet + FPN + two convolutional subnetworks: a classification subnet (over A anchors and K classes per location) and a bounding-box regression subnet.
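
As a rough illustration of what the two subnetworks output, the sketch below computes the output shape of each head at every pyramid level. It rests on several assumptions that are not stated in this section: K = 80 classes (as in COCO), A = 9 anchors per location, pyramid levels P3 to P7 with strides 8 to 128, and feature-map sizes obtained by rounding up. `head_output_shapes` is a hypothetical helper name.

```python
import math


def head_output_shapes(input_size, num_classes=80, num_anchors=9,
                       strides=(8, 16, 32, 64, 128)):
    """Return [(cls_shape, box_shape), ...] for P3 -> P7.

    Each shape is (channels, height, width). The classification head
    predicts K scores per anchor, the box head 4 offsets per anchor.
    """
    w, h = input_size
    shapes = []
    for stride in strides:
        fm_w = math.ceil(w / stride)  # assumed: sizes round up
        fm_h = math.ceil(h / stride)
        cls_shape = (num_classes * num_anchors, fm_h, fm_w)
        box_shape = (4 * num_anchors, fm_h, fm_w)
        shapes.append((cls_shape, box_shape))
    return shapes


for level, (cls_shape, box_shape) in enumerate(head_output_shapes((600, 600)), start=3):
    print(f"P{level}: cls {cls_shape}, box {box_shape}")
```

Both heads are shared across pyramid levels, so only the spatial size changes from level to level while the channel counts stay at K·A and 4·A.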

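4. Tricks

(1) Model initialization
Problem: with a plain initialization every class starts out equally likely (e.g. 0.5 / 0.5 for binary classification), so the loss from the far more numerous negatives dominates early training.
Fix: introduce a prior π for the foreground class probability and set π = 0.01.
(2) Network parameter initialization
w: N(0, 0.01), i.e. Gaussian with σ = 0.01
bias: -log((1-π)/π) with π = 0.01, for the final layer of the classification subnet
(3) Augmentation
Horizontal flipping only
(4) Focal Loss parameters
α: [0.25, 0.75]
γ: [2, 5] (the paper reports γ = 2 with α = 0.25 as working best)
(5) Anchor settings
Using 3 scales and 3 aspect ratios gives the best AP,
i.e. 3 × 3 = 9 anchors per location.
(6) Input scale
The larger the input, the higher the accuracy and the longer the runtime.
It is usually set to 600.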
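
The bias trick from item (2) can be checked numerically. With b = -log((1-π)/π), the sigmoid of the initial output equals π, so every anchor starts out predicted as foreground with probability 0.01. The sketch below is plain Python; `prior_bias` and `sigmoid` are illustrative helper names, not part of any library.

```python
import math


def prior_bias(pi=0.01):
    """Bias for the last classification layer so that sigmoid(bias) == pi."""
    return -math.log((1.0 - pi) / pi)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


b = prior_bias(0.01)
print("bias:", b)
print("initial foreground probability:", sigmoid(b))

# Cross-entropy loss contributed by a single background anchor at initialization.
loss_at_half = -math.log(1.0 - 0.5)          # output starts at p = 0.5
loss_with_prior = -math.log(1.0 - sigmoid(b))  # output starts at p = pi
print("per-negative loss at p=0.5:", loss_at_half)
print("per-negative loss at p=pi: ", loss_with_prior)
```

Each background anchor contributes far less loss at the start of training with the prior than with p = 0.5, which is why the negatives no longer swamp the first iterations.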

Paper translation (in Chinese):
https://blog.csdn.net/PPLLO_o/article/details/88952923
Paper:
https://arxiv.org/pdf/1708.02002.pdf

5. Anchor computation

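import math

import torch


def meshgrid(x, y):
    """Grid of (x, y) cell indices, sized [x*y, 2], with x varying fastest.

    Assumed helper: the post calls meshgrid() without showing its definition.
    """
    xs = torch.arange(x, dtype=torch.float32).repeat(y).view(-1, 1)
    ys = torch.arange(y, dtype=torch.float32).view(-1, 1).repeat(1, x).view(-1, 1)
    return torch.cat([xs, ys], 1)


class DataEncoder:
    def __init__(self):
        self.anchor_areas = [32 * 32., 64 * 64., 128 * 128., 256 * 256., 512 * 512.]  # p3 -> p7: base anchor area on each feature map
        self.aspect_ratios = [1 / 2., 1 / 1., 2 / 1.]  # aspect ratios (w / h)
        self.scale_ratios = [1., pow(2, 1 / 3.), pow(2, 2 / 3.)]  # scale ratios
        self.anchor_wh = self._get_anchor_wh()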

    def _get_anchor_wh(self):
        """Compute anchor width and height for each feature map.
        Sizes come from every (aspect ratio, scale ratio) pair at each level's base area.
        Returns:
          anchor_wh: (tensor) anchor wh, sized [#fm, #anchors_per_cell, 2].
        """
        anchor_wh = []
        for s in self.anchor_areas:
            for ar in self.aspect_ratios:  # w/h = ar
                h = math.sqrt(s / ar)
                w = ar * h
                for sr in self.scale_ratios:  # scale
                    anchor_h = h * sr
                    anchor_w = w * sr
                    anchor_wh.append([anchor_w, anchor_h])
        num_fms = len(self.anchor_areas)
        return torch.Tensor(anchor_wh).view(num_fms, -1, 2)

    def _get_anchor_boxes(self, input_size):
        """Compute anchor boxes for each feature map.
        Each box is centered on a grid cell of its feature map, in input-image coordinates.

        Args:
          input_size: (tensor) model input size of (w, h).

        Returns:
          boxes: (tensor) anchor boxes of all feature maps concatenated, sized [#anchors, 4]
                 as (x, y, w, h), where #anchors sums fmw * fmh * #anchors_per_cell over levels.
        """
        num_fms = len(self.anchor_areas)
        fm_sizes = [(input_size / pow(2., i + 3)).ceil() for i in range(num_fms)]  # p3 -> p7 feature map sizes
        # e.g. for a 448x448 input, fm_sizes is
        # [tensor([ 56.,  56.]), tensor([ 28.,  28.]), tensor([ 14.,  14.]), tensor([ 7.,  7.]), tensor([ 4.,  4.])]

        boxes = []
        for i in range(num_fms):
            fm_size = fm_sizes[i]
            # grid_size =torch.tensor(input_size / fm_size,dtype=torch.int64)
            grid_size = input_size / fm_size 
            fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
            xy = meshgrid(fm_w, fm_h) + 0.5  # [fm_h*fm_w, 2]
            xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, 9, 2)
            
            wh = self.anchor_wh[i].view(1, 1, 9, 2).expand(fm_h, fm_w, 9, 2)
            box = torch.cat([xy, wh], 3)  # [x,y,w,h]
            boxes.append(box.view(-1, 4))
        return torch.cat(boxes, 0)

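1. From the aspect ratios and scale ratios, compute the anchor sizes (9 of them) for every cell of each feature map.
2. Compute the input-image coordinates of the center of every cell on each feature map.
3. Combine 1 and 2 to get x, y, w, h.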
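
The three steps above can also be traced without tensors. The following is a pure-Python sketch that mirrors the logic of `DataEncoder` (same areas, ratios and rounding); it is an illustrative rewrite rather than the code used in the post, and `anchor_sizes` and `anchor_boxes` are hypothetical names.

```python
import math

ANCHOR_AREAS = [32.0 ** 2, 64.0 ** 2, 128.0 ** 2, 256.0 ** 2, 512.0 ** 2]  # P3 -> P7
ASPECT_RATIOS = [0.5, 1.0, 2.0]                      # w / h
SCALE_RATIOS = [1.0, 2 ** (1 / 3), 2 ** (2 / 3)]


def anchor_sizes(area):
    """Step 1: the 9 (w, h) pairs for one pyramid level."""
    sizes = []
    for ar in ASPECT_RATIOS:
        h = math.sqrt(area / ar)
        w = ar * h                                   # so that w * h == area
        for sr in SCALE_RATIOS:
            sizes.append((w * sr, h * sr))
    return sizes


def anchor_boxes(input_w, input_h):
    """Steps 2 and 3: attach every size to every cell center, as (x, y, w, h)."""
    boxes = []
    for level, area in enumerate(ANCHOR_AREAS):
        stride = 2 ** (level + 3)
        fm_w = math.ceil(input_w / stride)
        fm_h = math.ceil(input_h / stride)
        grid_w, grid_h = input_w / fm_w, input_h / fm_h
        sizes = anchor_sizes(area)
        for j in range(fm_h):                        # rows
            for i in range(fm_w):                    # columns
                cx, cy = (i + 0.5) * grid_w, (j + 0.5) * grid_h
                boxes.extend((cx, cy, w, h) for w, h in sizes)
    return boxes


boxes = anchor_boxes(448, 448)
print(len(boxes))   # 37629 = 9 * (56*56 + 28*28 + 14*14 + 7*7 + 4*4)
print(boxes[0])     # first anchor of the top-left P3 cell
```

For a 448x448 input this yields 4181 cells across the five levels, and with 9 anchors per cell the detector has to classify 37629 boxes per image, which is where the foreground/background imbalance comes from.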
