SSD source code walkthrough

XYYHLark01

Article link: https://blog.csdn.net/XYYHLark01/article/details/101624304

1. preprocess_for_eval()
Image preprocessing.
1) tf_image_whitened()
Subtracts the dataset-wide per-channel pixel mean from the R, G, and B channels.
2) tf_image.resize_image()
Resizes the image to (300, 300, 3).
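The two preprocessing steps can be sketched in plain Python as follows. This is only an illustration on nested lists: the mean values and the helper names (whiten, resize_nearest, preprocess_for_eval_sketch) are assumptions, and nearest-neighbour resizing stands in for whatever interpolation the real tf_image.resize_image() uses.

```python
# Assumed per-channel means (R, G, B); illustrative values, not read from the source.
RGB_MEANS = (123.0, 117.0, 104.0)

def whiten(image, means=RGB_MEANS):
    """Subtract the per-channel mean from every pixel of an H x W x 3 image."""
    return [[[px[c] - means[c] for c in range(3)] for px in row] for row in image]

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an H x W x C nested list to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def preprocess_for_eval_sketch(image, out_shape=(300, 300)):
    """Whiten first, then resize to the fixed network input size."""
    return resize_nearest(whiten(image), out_shape[0], out_shape[1])

# A tiny 2 x 2 RGB image used as input.
image = [[[123.0, 117.0, 104.0], [255.0, 255.0, 255.0]],
         [[0.0, 0.0, 0.0], [200.0, 100.0, 50.0]]]
out = preprocess_for_eval_sketch(image)
print(len(out), len(out[0]), len(out[0][0]))  # 300 300 3
```

A pixel equal to the assumed means maps to [0.0, 0.0, 0.0] after whitening, which is the point of the step: the network sees inputs centered around zero.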
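2. ssd_net
1) SSDNet()
Initialization.
Parameters defined:
feat_layers: the feature layers from which anchors are extracted.
feat_shape: the size of each layer in feat_layers. Each entry starts as a two-element array [feat_w, feat_h]; once the SSD network has been built it is updated to three elements: [feat_w, feat_h, num_anchors];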
anchor_size: the anchor box sizes for each layer in feat_layers;
anchor_ratio: the sequence of aspect-ratio factors for the anchors of each layer in feat_layers;
anchor_steps: the sliding stride of the anchors for each layer in feat_layers.
Note:
len(feat_layers) == len(feat_shape) == len(anchor_size) == len(anchor_ratio) == len(anchor_steps).
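
The length constraint above can be checked mechanically. The sketch below groups the five parameters in a namedtuple and validates them; the container name SSDParamsSketch, the helper check_params, and the concrete values (typical for a 300 x 300 SSD) are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical container mirroring the five parameters listed above.
SSDParamsSketch = namedtuple(
    "SSDParamsSketch",
    ["feat_layers", "feat_shape", "anchor_size", "anchor_ratio", "anchor_steps"])

# Assumed values for a 300 x 300 SSD; treat them as illustrative.
params = SSDParamsSketch(
    feat_layers=["block4", "block7", "block8", "block9", "block10", "block11"],
    feat_shape=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],
    anchor_size=[(21., 45.), (45., 99.), (99., 153.),
                 (153., 207.), (207., 261.), (261., 315.)],
    anchor_ratio=[[2, .5], [2, .5, 3, 1. / 3], [2, .5, 3, 1. / 3],
                  [2, .5, 3, 1. / 3], [2, .5], [2, .5]],
    anchor_steps=[8, 16, 32, 64, 100, 300])

def check_params(p):
    """Return the number of feature layers, or raise if the lists disagree in length."""
    lengths = {name: len(getattr(p, name)) for name in p._fields}
    if len(set(lengths.values())) != 1:
        raise ValueError("per-layer parameter lengths differ: %r" % lengths)
    return lengths["feat_layers"]

print(check_params(params))  # 6
```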

2) ssd_net.net()
Defines the SSD network, ssd_net().
Intermediate values:
$end_points: stores the output of every network layer; the values are:
'block1': shape = (1, 300, 300, 64),
'block2': shape = (1, 150, 150, 128),
'block3': shape = (1, 75, 75, 256),
'block4': shape = (1, 38, 38, 512),
'block5': shape = (1, 19, 19, 512),
'block6': shape = (1, 19, 19, 1024),
'block7': shape = (1, 19, 19, 1024),
'block8': shape = (1, 10, 10, 512),
'block9': shape = (1, 5, 5, 256),
'block10': shape = (1, 3, 3, 256),
'block11': shape = (1, 1, 1, 256)

$feat_layer: the feat_layers from which anchors are extracted.
['block4', 'block7', 'block8', 'block9', 'block10', 'block11']

$Function ssd_multibox_layer()
Constructs a multibox layer and returns the classification and localization predictions. For a 38 x 38 feature map, for example, the returned tensors have shapes location -> [1, 38, 38, num_anchors, 4 (box coordinates)] and prediction -> [1, 38, 38, num_anchors, 21 (object classes)].

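Input parameters:
*inputs: the input feat_layer, e.g. <tf.Tensor 'ssd_300_vgg/conv4/conv4_3/Relu:0' shape=(1, 38, 38, 512) dtype=float32>;
*num_classes: number of object classes; 21 for Pascal VOC 2007 (20 object classes plus background).
*sizes: anchor_size[i] of this layer.
*ratios: anchor_ratio[i] of this layer.
Intermediate values: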
*num_anchors = len(sizes) + len(ratios).
Location outputs per feature-map cell (num_loc_pred) = num_anchors x 4 (four box coordinates);
Class outputs per feature-map cell (num_cls_pred) = num_anchors x num_classes (total number of classes).
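
As a worked check of these counts, the sketch below computes the per-cell output sizes for each feature layer and the total number of anchors. It assumes num_anchors = len(sizes) + len(ratios) and SSD300-style per-layer settings; the helper name multibox_counts and the listed values are illustrative.

```python
def multibox_counts(sizes, ratios, num_classes):
    """Return (num_anchors, num_loc_pred, num_cls_pred) for one feature layer."""
    num_anchors = len(sizes) + len(ratios)
    return num_anchors, num_anchors * 4, num_anchors * num_classes

# Assumed SSD300-style settings, one entry per feature layer.
feat_hw = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchor_size = [(21., 45.), (45., 99.), (99., 153.),
               (153., 207.), (207., 261.), (261., 315.)]
anchor_ratio = [[2, .5], [2, .5, 3, 1. / 3], [2, .5, 3, 1. / 3],
                [2, .5, 3, 1. / 3], [2, .5], [2, .5]]

total_anchors = 0
for (fh, fw), sizes, ratios in zip(feat_hw, anchor_size, anchor_ratio):
    n, n_loc, n_cls = multibox_counts(sizes, ratios, num_classes=21)
    print((fh, fw), n, n_loc, n_cls)
    total_anchors += fh * fw * n

print(total_anchors)  # 8732
```

The first layer yields 4 anchors per cell (16 location outputs, 84 class outputs), and the six layers together give 8732 anchors for one image.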

Return values:
*logits: class predictions
[<tf.Tensor 'ssd_300_vgg/block4_box/Reshape_1:0' shape=(1, 38, 38, 4, 21) dtype=float32>,
<tf.Tensor 'ssd_300_vgg/block7_box/Reshape_1:0' shape=(1, 19, 19, 6, 21) dtype=float32>,
<tf.Tensor 'ssd_300_vgg/block8_box/Reshape_1:0' shape=(1, 10, 10, 6, 21) dtype=float32>,
<tf.Tensor 'ssd_300_vgg/block9_box/Reshape_1:0' shape=(1, 5, 5, 6, 21) dtype=float32>,
<tf.Tensor 'ssd_300_vgg/block10_box/Reshape_1:0' shape=(1, 3, 3, 4, 21) dtype=float32>,
<tf.Tensor 'ssd_300_vgg/block11_box/Reshape_1:0' shape=(1, 1, 1, 4, 21) dtype=float32>]

*locations: location predictions
[<tf.Tensor 'ssd_300_vgg/block4_box/Reshape_0:0' shape=(1, 38, 38, 4, 4) dtype=float32>,
<tf.Tensor 'ssd_300_vgg/block7_box/Reshape_0:0' shape=(1, 19, 19, 6, 4) dtype=float32>,
<tf.Tensor 'ssd_300_vgg/block8_box/Reshape_0:0' shape=(1, 10, 10, 6, 4) dtype=float32>,
<tf.Tensor 'ssd_300_vgg/block9_box/Reshape_0:0' shape=(1, 5, 5, 6, 4) dtype=float32>,
<tf.Tensor 'ssd_300_vgg/block10_box/Reshape_0:0' shape=(1, 3, 3, 4, 4) dtype=float32>,
<tf.Tensor 'ssd_300_vgg/block11_box/Reshape_0:0' shape=(1, 1, 1, 4, 4) dtype=float32>]

Finally, the self.params.feat_shapes parameter is updated so that each entry goes from two elements to three.

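3. ssd_anchor_one_layer()
Purpose:
Computes the default SSD anchor boxes for one layer of feat_layers.
Input parameters:
*img_shape: size of the original image (e.g. [300, 300]);
*feat_shape: size of the feat_layer;
*sizes: predefined sequence of anchor box sizes;
*ratios: sequence of anchor box aspect-ratio factors;
*step: sliding stride.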

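Outputs:
*y: y coordinate of each anchor center in the full image (feature-map y coordinate * step / original image height);
*x: x coordinate of each anchor center in the full image (feature-map x coordinate * step / original image width);
*h: height of each anchor; the number of anchors is len(sizes) + len(ratios);
*w: width of each anchor; the number of anchors is len(sizes) + len(ratios).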
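
The computation described above can be sketched in plain Python. This is a simplified illustration, not the original function: it returns 1-D lists of row and column centers instead of 2-D grids, the 0.5 cell-center offset is an assumption, and the name anchor_one_layer_sketch is hypothetical.

```python
import math

def anchor_one_layer_sketch(img_shape, feat_shape, sizes, ratios, step, offset=0.5):
    """Return (y, x, h, w) for one feature layer, all normalized by the image size."""
    # Center of every feature-map row/column, mapped back to the input image.
    y = [(i + offset) * step / img_shape[0] for i in range(feat_shape[0])]
    x = [(j + offset) * step / img_shape[1] for j in range(feat_shape[1])]

    # First anchor: a square box with the smaller size.
    h = [sizes[0] / img_shape[0]]
    w = [sizes[0] / img_shape[1]]
    # Second anchor: a square box whose side is the geometric mean of both sizes.
    if len(sizes) > 1:
        s = math.sqrt(sizes[0] * sizes[1])
        h.append(s / img_shape[0])
        w.append(s / img_shape[1])
    # Remaining anchors: the smaller size stretched by each aspect ratio.
    for r in ratios:
        h.append(sizes[0] / img_shape[0] / math.sqrt(r))
        w.append(sizes[0] / img_shape[1] * math.sqrt(r))
    return y, x, h, w

y, x, h, w = anchor_one_layer_sketch((300, 300), (38, 38), (21., 45.), [2, .5], 8)
print(len(y), len(x), len(h), len(w))  # 38 38 4 4
```

With two sizes and two ratios the layer gets 2 + 2 = 4 anchor shapes per cell, which matches the fourth dimension of the block4 predictions.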

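4. Loss definition for SSD training:
positive-sample loss + negative-sample loss + anchor loss.
Where:
1) Positive/negative sample loss
Uses tf.nn.sparse_softmax_cross_entropy_with_logits.

2) Anchor loss
Takes the difference between the ground-truth and predicted localisations, then averages it.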
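
A minimal pure-Python sketch of the three loss terms follows. Several choices here are assumptions rather than facts from the text: the localization difference is passed through a smooth L1 function (common in SSD implementations), each term is averaged over the number of anchors, and the negative anchors are supplied through a precomputed neg_mask. All function names are hypothetical.

```python
import math

def softmax_cross_entropy(logits, label):
    """Sparse softmax cross entropy for one anchor (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]

def smooth_l1(d):
    """Smooth L1: quadratic near zero, linear elsewhere (assumed form)."""
    ad = abs(d)
    return 0.5 * d * d if ad < 1.0 else ad - 0.5

def ssd_loss_sketch(logits, labels, loc_pred, loc_true, neg_mask, alpha=1.0):
    """Return (positive loss, negative loss, anchor loss), each averaged over anchors.

    labels[i] > 0 marks a positive anchor, label 0 is background;
    neg_mask[i] selects the background anchors that count as negatives.
    """
    n = len(labels)
    pos = neg = loc = 0.0
    for i in range(n):
        ce = softmax_cross_entropy(logits[i], labels[i])
        if labels[i] > 0:
            pos += ce
            # Only positive anchors contribute to the localization term.
            loc += alpha * sum(smooth_l1(p - t)
                               for p, t in zip(loc_pred[i], loc_true[i]))
        elif neg_mask[i]:
            neg += ce
    return pos / n, neg / n, loc / n

pos_loss, neg_loss, loc_loss = ssd_loss_sketch(
    logits=[[0.0, 0.0], [0.0, 0.0]],
    labels=[1, 0],
    loc_pred=[[0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    loc_true=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    neg_mask=[0, 1])
total_loss = pos_loss + neg_loss + loc_loss
print(total_loss)
```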