A brief introduction to the SSD object detection algorithm:
Paper: https://arxiv.org/abs/1512.02325
Full name: Single Shot MultiBox Detector (so why isn't it called SSMD? Probably because that just doesn't sound as good.)
SSD is a one-stage object detector: classification and localization are performed simultaneously, by a single network from start to finish.
SSD is essentially an improvement on YOLO. As the figure below shows, its most distinctive feature is multi-scale detection: the image is passed through the VGG-16 backbone to obtain a feature map, which is then convolved several more times to produce feature maps of different sizes, and detection is performed on each of these feature maps separately.
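To get a feel for what multi-scale detection buys us: the SSD paper lays a grid of default boxes over each of the six feature maps, with 4 or 6 boxes per grid cell depending on the layer. A quick sketch of the total box count for the standard SSD300 configuration:

```python
# Spatial sizes of the six detection feature maps in SSD300
feature_map_sizes = [38, 19, 10, 5, 3, 1]
# Default boxes per grid cell for each map (from the SSD paper)
boxes_per_location = [4, 6, 6, 6, 4, 4]

total_boxes = sum(s * s * n for s, n in zip(feature_map_sizes, boxes_per_location))
print(total_boxes)  # 8732 default boxes in total
```

The large 38x38 map contributes thousands of small boxes for small objects, while the 1x1 map contributes just a handful of boxes covering the whole image.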
SSD's base network is VGG16 with a few modifications, specifically:
1. The FC6 and FC7 layers of VGG16 are converted into convolutional layers;
2. The Dropout layers and the FC8 layer are removed;
3. Four extra convolutional blocks are appended after FC7.
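Point 1 works because a fully connected layer applied to a flattened feature map is mathematically identical to a convolution whose kernel covers the whole map. A minimal NumPy sketch with toy sizes and random weights (not SSD's actual weights):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 3, 4))       # a tiny 3x3 feature map with 4 channels
W = rng.standard_normal((3 * 3 * 4, 8))  # dense weights: 36 inputs -> 8 outputs

# Fully connected layer on the flattened feature map
fc_out = x.reshape(-1) @ W               # shape (8,)

# The same weights viewed as a 3x3 convolution kernel with 8 filters;
# a 'valid' convolution over a 3x3 input produces a single spatial position
kernel = W.reshape(3, 3, 4, 8)
conv_out = np.einsum('hwc,hwcf->f', x, kernel)  # shape (8,)

print(np.allclose(fc_out, conv_out))     # True: the two layers are equivalent
```

This is exactly how the converted FC6/FC7 layers reuse the idea of a fully connected layer while keeping a spatial output.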
The structure of SSD's base network is summarized in the table below (using a 300×300×3 input as the example).
That is the complete SSD feature extraction network; it is actually quite clear once laid out. Everything from the input through FC6 is just the classic VGG16 architecture.
As you can see, the feature extraction network simply produces the six feature maps we need, with shapes (38, 38, 512), (19, 19, 1024), (10, 10, 512), (5, 5, 256), (3, 3, 256), and (1, 1, 256).
PS:
The 1×1 convolutions exist purely to adjust the channel count.
Wherever padding is not listed, padding='same' is used; this kind of convolution does not change the width or height of the feature map. (Note that Keras itself defaults to padding='valid', so the code below sets 'same' explicitly.)
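The padding note can be checked by hand: under 'same' padding a layer's spatial output size is ceil(in / stride), and under 'valid' it is floor((in - kernel) / stride) + 1. A small sketch tracing a 300×300 input through the layers that change the spatial size, reproducing the six feature-map sizes listed above:

```python
import math

def out_size(size, kernel, stride, padding):
    """Spatial output size of a Keras Conv2D / MaxPooling2D layer."""
    if padding == 'same':
        return math.ceil(size / stride)
    # 'valid': no implicit padding
    return (size - kernel) // stride + 1

s = out_size(300, 2, 2, 'same')  # pool1: 300 -> 150
s = out_size(s, 2, 2, 'same')    # pool2: 150 -> 75
s = out_size(s, 2, 2, 'same')    # pool3: 75  -> 38 (ceil rounds up)
conv4_3 = s                      # 38
s = out_size(s, 2, 2, 'same')    # pool4: 38 -> 19
fc7 = out_size(s, 3, 1, 'same')  # pool5/fc6/fc7 keep 19 (stride 1, 'same')
conv6_2 = out_size(fc7 + 2, 3, 2, 'valid')      # zero-pad 19 -> 21, then 10
conv7_2 = out_size(conv6_2 + 2, 3, 2, 'valid')  # zero-pad 10 -> 12, then 5
conv8_2 = out_size(conv7_2, 3, 1, 'valid')      # 5 -> 3
conv9_2 = out_size(conv8_2, 3, 1, 'valid')      # 3 -> 1
print([conv4_3, fc7, conv6_2, conv7_2, conv8_2, conv9_2])  # [38, 19, 10, 5, 3, 1]
```

This also shows why blocks 6 and 7 need the explicit ZeroPadding2D layers: a bare 'valid' stride-2 convolution on 19 or 10 would not land exactly on 10 or 5.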
Building the SSD feature extraction network with Keras:
from keras.models import Model
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D, Input
def VGG16(input_tensor):
    # Dictionary holding all intermediate layers
    net = {}
    # Block 1: two convolutions + one max pooling
    # input: 300 300 3   output: 150 150 64
    net['input'] = input_tensor  # input_tensor is the input image
    net['conv1_1'] = Conv2D(64, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv1_1')(net['input'])
    net['conv1_2'] = Conv2D(64, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv1_2')(net['conv1_1'])
    net['pool1'] = MaxPooling2D(pool_size=(2, 2),
                                strides=(2, 2),
                                padding='same',
                                name='pool1')(net['conv1_2'])
    # Block 2: two convolutions + one max pooling
    # input: 150 150 64   output: 75 75 128
    net['conv2_1'] = Conv2D(128, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv2_1')(net['pool1'])
    net['conv2_2'] = Conv2D(128, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv2_2')(net['conv2_1'])
    net['pool2'] = MaxPooling2D(pool_size=(2, 2),
                                strides=(2, 2),
                                padding='same',
                                name='pool2')(net['conv2_2'])
    # Block 3: three convolutions + one max pooling
    # input: 75 75 128   output: 38 38 256
    net['conv3_1'] = Conv2D(256, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv3_1')(net['pool2'])
    net['conv3_2'] = Conv2D(256, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv3_2')(net['conv3_1'])
    net['conv3_3'] = Conv2D(256, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv3_3')(net['conv3_2'])
    net['pool3'] = MaxPooling2D(pool_size=(2, 2),
                                strides=(2, 2),
                                padding='same',
                                name='pool3')(net['conv3_3'])
    # Block 4: three convolutions + one max pooling
    # input: 38 38 256   output: 19 19 512
    net['conv4_1'] = Conv2D(512, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv4_1')(net['pool3'])
    net['conv4_2'] = Conv2D(512, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv4_2')(net['conv4_1'])
    net['conv4_3'] = Conv2D(512, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv4_3')(net['conv4_2'])
    # net['conv4_3'] is the first effective feature layer: (38, 38, 512)
    y1 = net['conv4_3']
    net['pool4'] = MaxPooling2D(pool_size=(2, 2),
                                strides=(2, 2),
                                padding='same',
                                name='pool4')(net['conv4_3'])
    # Block 5: three convolutions + one max pooling
    # input: 19 19 512   output: 19 19 512
    net['conv5_1'] = Conv2D(512, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv5_1')(net['pool4'])
    net['conv5_2'] = Conv2D(512, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv5_2')(net['conv5_1'])
    net['conv5_3'] = Conv2D(512, kernel_size=(3, 3),
                            activation='relu',
                            padding='same',
                            name='conv5_3')(net['conv5_2'])
    net['pool5'] = MaxPooling2D(pool_size=(2, 2),
                                strides=(1, 1),  # stride 1, so width and height are unchanged
                                padding='same',
                                name='pool5')(net['conv5_3'])
    # FC6: a convolution replacing the fully connected layer of the original VGG
    # 19 19 512 ---> 19 19 1024 (changes the channel count)
    net['fc6'] = Conv2D(1024, kernel_size=(3, 3),
                        activation='relu',
                        padding='same',
                        name='fc6')(net['pool5'])
    # FC7
    # 19 19 1024 ---> 19 19 1024
    net['fc7'] = Conv2D(1024, kernel_size=(1, 1),
                        activation='relu',
                        padding='same',
                        name='fc7')(net['fc6'])
    # net['fc7'] is the second effective feature layer: (19, 19, 1024)
    y2 = net['fc7']
    # Block 6: one 1x1 convolution + zero padding + one strided convolution
    # input: 19 19 1024   output: 10 10 512
    net['conv6_1'] = Conv2D(256, kernel_size=(1, 1),
                            activation='relu',
                            padding='same',
                            name='conv6_1')(net['fc7'])
    net['conv6_2'] = ZeroPadding2D(padding=((1, 1), (1, 1)),
                                   name='conv6_padding')(net['conv6_1'])
    net['conv6_2'] = Conv2D(512, kernel_size=(3, 3),
                            strides=(2, 2),
                            activation='relu',
                            padding='valid',
                            name='conv6_2')(net['conv6_2'])
    # net['conv6_2'] is the third effective feature layer: (10, 10, 512)
    y3 = net['conv6_2']
    # Block 7: one 1x1 convolution + zero padding + one strided convolution
    # input: 10 10 512   output: 5 5 256
    net['conv7_1'] = Conv2D(128, kernel_size=(1, 1),
                            activation='relu',
                            padding='same',
                            name='conv7_1')(net['conv6_2'])
    net['conv7_2'] = ZeroPadding2D(padding=((1, 1), (1, 1)),
                                   name='conv7_padding')(net['conv7_1'])
    net['conv7_2'] = Conv2D(256, kernel_size=(3, 3),
                            strides=(2, 2),
                            activation='relu',
                            padding='valid',
                            name='conv7_2')(net['conv7_2'])
    # net['conv7_2'] is the fourth effective feature layer: (5, 5, 256)
    y4 = net['conv7_2']
    # Block 8: one 1x1 convolution + one 'valid' convolution
    # input: 5 5 256   output: 3 3 256
    net['conv8_1'] = Conv2D(128, kernel_size=(1, 1),
                            activation='relu',
                            padding='same',
                            name='conv8_1')(net['conv7_2'])
    net['conv8_2'] = Conv2D(256, kernel_size=(3, 3),
                            strides=(1, 1),
                            activation='relu',
                            padding='valid',
                            name='conv8_2')(net['conv8_1'])
    # net['conv8_2'] is the fifth effective feature layer: (3, 3, 256)
    y5 = net['conv8_2']
    # Block 9: one 1x1 convolution + one 'valid' convolution
    # input: 3 3 256   output: 1 1 256
    net['conv9_1'] = Conv2D(128, kernel_size=(1, 1),
                            activation='relu',
                            padding='same',
                            name='conv9_1')(net['conv8_2'])
    net['conv9_2'] = Conv2D(256, kernel_size=(3, 3),
                            strides=(1, 1),
                            activation='relu',
                            padding='valid',
                            name='conv9_2')(net['conv9_1'])
    # net['conv9_2'] is the sixth effective feature layer: (1, 1, 256)
    y6 = net['conv9_2']
    # The six effective feature layers are the outputs of the model
    model = Model(input_tensor, [y1, y2, y3, y4, y5, y6], name="VGG16")
    return model
if __name__ == '__main__':
    inputs = Input(shape=(300, 300, 3))
    model = VGG16(inputs)
    # Print the network structure
    model.summary()