SSD Explained

栐木


Article link: https://blog.csdn.net/m0_37347379/article/details/107900510

SSD

paper
Network architecture
Data processing
- Data augmentation
- Default anchors
Loss
Reference

paper

[https://arxiv.org/pdf/1512.02325.pdf](https://arxiv.org/pdf/1512.02325.pdf)

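Network architecture

Simplified version:
[Figure: simplified SSD network architecture]
Detailed version:
[Figure: detailed SSD network architecture]
Key features: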

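Multi-scale feature maps for detection
The convolutional feature layers appended to the base network decrease in size progressively, so detections can be predicted at multiple scales.
Convolutional predictors for detection
For a feature layer of size m×n with p channels, a small 3×3×p kernel produces either a class score or an offset relative to a default box. In other words, small convolutional kernels predict the object class and the offsets of each bounding box.
Default boxes and aspect ratios
A set of default bounding boxes is associated with each of the feature maps output by the network. The default boxes tile the feature map in a convolutional manner, so the position of each box relative to its feature map cell is fixed. At each feature map cell, the network predicts the offsets relative to the default boxes, as well as the per-class scores that indicate the presence of a class instance in each box. That is, for each of the k boxes at a given location, it computes c class scores and 4 offsets relative to the default box. This takes (c+4)k filters applied at every location of an m×n feature map and yields (c+4)kmn outputs. The default boxes are similar to the anchor boxes in Faster R-CNN. Placing different default boxes on several feature maps of different resolutions efficiently discretizes the space of possible output boxes.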
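As a rough check of the (c+4)kmn count, the sketch below plugs in the SSD300 feature map sizes and aspect ratios from the VOC demo config shown later in this post. The value c = 21 (20 VOC classes plus background) and the helper names are assumptions made for illustration.

```python
# Feature map sizes and aspect ratios of the SSD300 VOC demo config.
feature_maps = [38, 19, 10, 5, 3, 1]
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
num_classes = 21  # assumption: 20 VOC classes + 1 background class


def boxes_per_location(ratios):
    """k: two square boxes plus two rectangles per listed aspect ratio."""
    return 2 + 2 * len(ratios)


def layer_outputs(m, n, k, c):
    """(c + 4) * k filters evaluated at each of the m * n locations."""
    return (c + 4) * k * m * n


ks = [boxes_per_location(r) for r in aspect_ratios]
total_boxes = sum(f * f * k for f, k in zip(feature_maps, ks))
total_outputs = sum(
    layer_outputs(f, f, k, num_classes) for f, k in zip(feature_maps, ks)
)

print(ks)             # [4, 6, 6, 6, 4, 4]
print(total_boxes)    # 8732
print(total_outputs)  # 218300
```

Under these assumptions the six layers together yield 8732 default boxes per image, each with 21 class scores and 4 offsets.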

Data processing

Data augmentation

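For each training image, one of the following three strategies is chosen at random:

Use the entire original input image.
Sample a patch so that the minimum jaccard overlap with the objects is 0.1, 0.3, 0.5, 0.7, or 0.9.
Randomly sample a patch.

Each sampled patch is [0.1, 1] times the size of the original image, with an aspect ratio between 1/2 and 2. If the center of a ground truth box lies inside the sampled patch, the overlapping part of that box is kept. After sampling, each patch is resized to a fixed size and horizontally flipped with probability 0.5. Some photometric distortions are applied as well.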
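The sampling procedure above could be sketched in plain Python as follows. The function names, the (x1, y1, x2, y2) box format, the retry limit, and the fallback to the whole image are assumptions made for this sketch, not details taken from the paper.

```python
import random


def iou(a, b):
    """Jaccard overlap of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_patch(width, height, gt_boxes, rng, max_trials=50):
    """Pick a crop region (x1, y1, x2, y2) with one of the three strategies."""
    whole = (0.0, 0.0, float(width), float(height))
    strategy = rng.choice(["whole", "overlap", "random"])
    if strategy == "whole":
        return whole
    min_iou = rng.choice([0.1, 0.3, 0.5, 0.7, 0.9]) if strategy == "overlap" else None
    for _ in range(max_trials):
        # Patch size is [0.1, 1] of the image, aspect ratio within [1/2, 2].
        w = rng.uniform(0.1, 1.0) * width
        h = rng.uniform(0.1, 1.0) * height
        if not 0.5 <= w / h <= 2.0:
            continue
        x1 = rng.uniform(0.0, width - w)
        y1 = rng.uniform(0.0, height - h)
        patch = (x1, y1, x1 + w, y1 + h)
        if min_iou is None or all(iou(patch, g) >= min_iou for g in gt_boxes):
            return patch
    return whole  # assumption: fall back to the whole image after max_trials


def keep_boxes(patch, gt_boxes):
    """Keep boxes whose center lies in the patch, clipped to the patch."""
    kept = []
    for x1, y1, x2, y2 in gt_boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        if patch[0] < cx < patch[2] and patch[1] < cy < patch[3]:
            kept.append((max(x1, patch[0]), max(y1, patch[1]),
                         min(x2, patch[2]), min(y2, patch[3])))
    return kept


patch = sample_patch(300, 300, [(50, 50, 200, 200)], random.Random(0))
print(patch, keep_boxes(patch, [(50, 50, 200, 200)]))
```

Resizing, flipping, and the photometric distortions are left out here, since they depend on the image library in use.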

Default anchors

# VOC demo (SSD300)
from itertools import product
from math import sqrt

min_sizes = [30, 60, 111, 162, 213, 264]
max_sizes = [60, 111, 162, 213, 264, 315]
feature_maps = [38, 19, 10, 5, 3, 1]
# steps (the stride of each feature map) was not defined in the snippet;
# the values below are assumed to match the ssd.pytorch VOC config
steps = [8, 16, 32, 64, 100, 300]
image_size = 300
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]

mean = []  # flat list of cx, cy, w, h, all relative to the image size
for k, f in enumerate(feature_maps):
    for i, j in product(range(f), repeat=2):
        f_k = image_size / steps[k]
        # unit center x,y
        cx = (j + 0.5) / f_k
        cy = (i + 0.5) / f_k

        # aspect_ratio: 1
        # rel size: min_size
        s_k = min_sizes[k] / image_size
        mean += [cx, cy, s_k, s_k]

        # aspect_ratio: 1
        # rel size: sqrt(s_k * s_(k+1))
        s_k_prime = sqrt(s_k * (max_sizes[k] / image_size))
        mean += [cx, cy, s_k_prime, s_k_prime]

        # rest of aspect ratios: each ratio ar also yields its inverse 1/ar
        for ar in aspect_ratios[k]:
            mean += [cx, cy, s_k * sqrt(ar), s_k / sqrt(ar)]
            mean += [cx, cy, s_k / sqrt(ar), s_k * sqrt(ar)]

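First, a square box with width and height min_size is generated.
Next, a second square box with side sqrt(min_size * max_size) is generated.
The remaining boxes follow from aspect_ratios. For example, aspect_ratio = 2 automatically adds aspect_ratio = 1/2 as well, so two more boxes are generated, one with the following width and height and one with the two swapped:
$w = s_k\sqrt{a_r}, \quad h = s_k / \sqrt{a_r}$

Intuitively, min_size and max_size each contribute one square box, while every aspect_ratio entry contributes 2 rectangular boxes.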
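To make the box shapes concrete, the sketch below lists the boxes generated at a single cell of the first feature map (min_size = 30, max_size = 60, aspect ratio 2). The helper name cell_box_shapes is hypothetical. The formulas are the ones used in the loop above.

```python
from math import sqrt


def cell_box_shapes(min_size, max_size, ratios, image_size=300):
    """Return (w, h) pairs, relative to the image size, for one cell."""
    s_k = min_size / image_size
    s_k_prime = sqrt(s_k * (max_size / image_size))
    shapes = [(s_k, s_k), (s_k_prime, s_k_prime)]  # the two square boxes
    for ar in ratios:
        shapes.append((s_k * sqrt(ar), s_k / sqrt(ar)))  # wide box
        shapes.append((s_k / sqrt(ar), s_k * sqrt(ar)))  # tall box
    return shapes


shapes = cell_box_shapes(30, 60, [2])
for w, h in shapes:
    print(f"{w * 300:.1f} x {h * 300:.1f} px")
# 30.0 x 30.0 px
# 42.4 x 42.4 px
# 42.4 x 21.2 px
# 21.2 x 42.4 px
```

The two rectangular boxes have the same area as the min_size square, since the width is multiplied by sqrt(ar) and the height is divided by it.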

Loss

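Matching strategy

During training, we need to determine which default boxes correspond to which ground truth box. For each ground truth box, the candidates are default boxes that vary over location, aspect ratio, and scale.

First, each ground truth box is matched to the default box with the highest jaccard overlap.
Then, default boxes are matched to any ground truth box whose jaccard overlap with them is higher than a threshold (0.5).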
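The two matching rules can be sketched in plain Python. The function names and the (x1, y1, x2, y2) box format are assumptions made for this illustration. Applying the threshold rule first and the best-overlap rule second ensures that every ground truth box ends up with at least one default box.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(gt_boxes, default_boxes, threshold=0.5):
    """Return, per default box, the index of its ground truth box or -1."""
    matched = [-1] * len(default_boxes)
    if not gt_boxes:
        return matched
    overlaps = [[iou(g, d) for d in default_boxes] for g in gt_boxes]
    # Rule 2: a default box takes its best ground truth if overlap > threshold.
    for d in range(len(default_boxes)):
        best_g = max(range(len(gt_boxes)), key=lambda g: overlaps[g][d])
        if overlaps[best_g][d] > threshold:
            matched[d] = best_g
    # Rule 1: each ground truth claims its best default box, whatever the overlap.
    for g in range(len(gt_boxes)):
        best_d = max(range(len(default_boxes)), key=lambda d: overlaps[g][d])
        matched[best_d] = g
    return matched


defaults = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30), (40, 40, 50, 50)]
gts = [(0, 0, 10, 10), (22, 22, 34, 34)]
print(match(gts, defaults))  # [0, -1, 1, -1]
```

In this example the second ground truth box overlaps its best default box by only about 0.36, so it is matched through the best-overlap rule alone.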

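Hard negative mining

After matching, most default boxes are negatives, especially when the number of possible default boxes is large. This creates a severe imbalance between positive and negative training examples. Instead of using all negative examples, the default boxes are sorted by their highest confidence loss and only the top ones are kept, so that the ratio of negatives to positives is at most 3:1.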
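Before the PyTorch version, here is a minimal plain-Python sketch of the selection for a single image. The function name and the list-based inputs are assumptions for illustration. Unlike the PyTorch code, it caps the number of negatives at the number of available negative boxes.

```python
def hard_negative_mask(conf_loss, is_positive, negpos_ratio=3):
    """Mark the negatives with the largest confidence loss, at most ratio * positives."""
    num_pos = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    num_neg = min(negpos_ratio * num_pos, len(negatives))
    # Sort negative boxes by descending loss and keep the hardest ones.
    negatives.sort(key=lambda i: conf_loss[i], reverse=True)
    keep = set(negatives[:num_neg])
    return [i in keep for i in range(len(conf_loss))]


conf_loss = [0.2, 2.5, 0.1, 1.7, 0.9, 0.05, 3.0, 0.4]
is_positive = [False, True, False, False, False, False, False, False]
print(hard_negative_mask(conf_loss, is_positive))
# [False, False, False, True, True, False, True, False]
```

With one positive box, the three negatives with the largest loss (indices 6, 3 and 4) are kept and the remaining negatives are ignored.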

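import torch

def log_sum_exp(x):
    """Numerically stable log-sum-exp over the class dimension.
    This is used to determine the unaveraged confidence loss across
    all examples in a batch.
    Args:
        x (Tensor): conf_preds from conf layers, shape (N, num_classes)
    """
    x_max = x.detach().max()
    return torch.log(torch.sum(torch.exp(x - x_max), 1, keepdim=True)) + x_max

# The lines below are an excerpt from the loss forward pass: batch_conf, conf_t,
# pos, num and self.negpos_ratio are defined by the surrounding code.
# Per-box cross entropy: log-sum-exp of the scores minus the target-class score
loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))

# Hard Negative Mining
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0  # filter out pos boxes for now
_, loss_idx = loss_c.sort(1, descending=True)  # box indices by descending loss
_, idx_rank = loss_idx.sort(1)  # rank of each box in that ordering
num_pos = pos.long().sum(1, keepdim=True)
num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1)
# boxes whose idx_rank is below num_neg are the hard negatives to keep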

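log_sum_exp is used to compute the per-box confidence loss loss_c.
The loss of the positive boxes is set to 0, since all positives are kept anyway.
loss_c is sorted, and negatives are kept up to 3 times the number of positives. Those with the largest loss are the hard samples.
Derivation of log_sum_exp: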
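One way to write out this step is the standard log-sum-exp identity. The symbols are chosen here for illustration: $x_c$ is the score of class $c$ for one box, $t$ is its target class, and $x_{\max}$ is the maximum score.

$$\mathrm{loss\_c} = -\log\frac{e^{x_t}}{\sum_c e^{x_c}} = \log\sum_c e^{x_c} - x_t$$

$$\log\sum_c e^{x_c} = \log\Big(e^{x_{\max}}\sum_c e^{x_c - x_{\max}}\Big) = x_{\max} + \log\sum_c e^{x_c - x_{\max}}$$

Subtracting $x_{\max}$ keeps every exponent at or below 0, so the exponential cannot overflow. The first line matches log_sum_exp(batch_conf) minus the gathered target-class score in the code above.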

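Loss function

Total loss (formulas as given in the SSD paper, where $N$ is the number of matched default boxes and $\alpha$ weights the localization term):
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
The localization loss is a smooth L1 loss:
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^k \, smooth_{L1}(l_i^m - \hat{g}_j^m)$$
$$\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx}) / d_i^w, \quad \hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy}) / d_i^h, \quad \hat{g}_j^w = \log(g_j^w / d_i^w), \quad \hat{g}_j^h = \log(g_j^h / d_i^h)$$

Here $l_i^m$ is the predicted box, $g_j^m$ is the ground truth box, $cx$ and $cy$ denote center_x and center_y, and $d_i^w$ is the width of the default bounding box. $\hat{g}_j^m$ is computed from $g_j^m$ and $d_i^m$, and summing $x_{ij}^k smooth_{L1}(l_i^m - \hat{g}_j^m)$ gives the final localization loss.
The confidence loss is computed with cross entropy (cross_entropy):
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^p \log(\hat{c}_i^p) - \sum_{i \in Neg} \log(\hat{c}_i^0), \quad \hat{c}_i^p = \frac{\exp(c_i^p)}{\sum_p \exp(c_i^p)}$$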
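A minimal sketch of the localization term for one matched pair, in plain Python. The function names and the (cx, cy, w, h) tuple format are assumptions made here. The smooth L1 definition is the usual one (0.5x² for |x| < 1, |x| − 0.5 otherwise). The variance scaling that some implementations add to the encoding is left out.

```python
from math import log


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def encode(gt, default):
    """Regression target for a ground truth box relative to a default box.

    Both boxes are (cx, cy, w, h) tuples.
    """
    g_cx, g_cy, g_w, g_h = gt
    d_cx, d_cy, d_w, d_h = default
    return (
        (g_cx - d_cx) / d_w,
        (g_cy - d_cy) / d_h,
        log(g_w / d_w),
        log(g_h / d_h),
    )


def loc_loss(pred, gt, default):
    """Sum of smooth L1 over cx, cy, w, h for one matched default box."""
    target = encode(gt, default)
    return sum(smooth_l1(l - g) for l, g in zip(pred, target))


default = (0.5, 0.5, 0.2, 0.2)
gt = (0.6, 0.5, 0.4, 0.2)
target = encode(gt, default)
print(target)                                        # roughly (0.5, 0.0, 0.693, 0.0)
print(loc_loss((0.0, 0.0, 0.0, 0.0), gt, default))   # loss of an all-zero prediction
print(loc_loss(target, gt, default))                 # 0.0 for a perfect prediction
```

Because the network regresses these encoded offsets instead of absolute coordinates, a prediction of all zeros simply reproduces the default box.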

Reference

https://blog.csdn.net/zziahgf/article/details/78297483
https://zhuanlan.zhihu.com/p/77868999
