This post was written after studying the PyTorch SSD series by this blogger (https://blog.csdn.net/weixin_44791964); I encourage you to go read the original post directly (https://blog.csdn.net/weixin_44791964/article/details/104981486). The author also has companion videos on Bilibili (https://www.bilibili.com/video/BV1A7411976Z), and the source code is on GitHub (https://github.com/bubbliiiing/ssd-pytorch). Content will be removed upon request.
The code used here comes from that repository; go download the author's PyTorch SSD source from the link above.
Part 1: SSD fundamentals: link
Part 2: SSD overall architecture: link
This is the third SSD post, covering SSD's feature-extraction code.
First, get the code running: clone the repository and follow the steps in the README, and it should basically work. I trained on the VOC2012 dataset (the author originally used VOC2007). I ran into a few small issues but nothing major, and they were sorted out quickly. Once it runs, we can start digging into the code.
Here is the overall SSD architecture again:
The very first part of SSD is VGG, so we start with VGG:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import os
base = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'C', 512, 512, 512, 'M',
512, 512, 512]
"""
The feature-map shapes change as follows:
input image: 300, 300, 3
conv1_1  300, 300, 64
conv1_2  300, 300, 64
pooling  150, 150, 64
conv2_1  150, 150, 128
conv2_2  150, 150, 128
pooling  75, 75, 128
conv3_1  75, 75, 256
conv3_2  75, 75, 256
conv3_3  75, 75, 256
pooling  38, 38, 256
conv4_1  38, 38, 512
conv4_2  38, 38, 512
conv4_3  38, 38, 512
Seeing 38x38x512 here, does it remind you of the 38x38x512 feature layer?
This is where the feature map gets fed into the regression and classification
heads for object prediction.
pooling  19, 19, 512
conv5_1  19, 19, 512
conv5_2  19, 19, 512
conv5_3  19, 19, 512
Then the code after the for loop acts on the feature map:
pool5    19, 19, 512
conv6    19, 19, 1024
conv7    19, 19, 1024
Seeing 19x19x1024 here, does it remind you of the 19x19x1024 feature layer?
This is where the feature map gets fed into the regression and classification
heads for object prediction.
"""
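All of the spatial sizes in the walkthrough above follow from PyTorch's output-size formula for Conv2d/MaxPool2d. Here is a small helper (my own sketch, not part of the repository) that reproduces it, including the ceil_mode rounding that makes the 'C' pooling turn 75 into 38 instead of 37:

```python
import math

def conv_out(size, k, s=1, p=0, d=1, ceil_mode=False):
    """PyTorch's output-size formula for Conv2d / MaxPool2d."""
    num = size + 2 * p - d * (k - 1) - 1
    return (math.ceil(num / s) if ceil_mode else num // s) + 1

print(conv_out(75, 2, 2, ceil_mode=True))  # 38, the 'C' pooling
print(conv_out(75, 2, 2))                  # 37, a plain 'M' would lose a row
print(conv_out(19, 3, p=6, d=6))           # 19, conv6's dilated conv keeps 19x19
```

Note how conv6's padding=6 exactly cancels the dilated 3x3 kernel's reach, so the 19x19 size is preserved.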
# Build the VGG backbone; the parameter i is the number of input channels
def vgg(i):
    layers = []
    # For ordinary RGB images, channel = 3, so i = 3
    in_channels = i
    # Loop over the base list defined above
    for v in base:
        # 'M' means a plain max pooling
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        # 'C' is also a max pooling, but with ceil_mode=True, so odd input
        # sizes round up; if you are unfamiliar with ceil_mode, see
        # https://blog.csdn.net/html5baby/article/details/100609026
        elif v == 'C':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)]
        # A number means a 3x3 convolution with that many output channels
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            # Append the convolution followed by a ReLU
            layers += [conv2d, nn.ReLU(inplace=True)]
            # Update in_channels for the next convolution
            in_channels = v
    # pool5: stride 1, so the spatial size stays 19x19
    pool5 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
    # Two extra convolutions: conv6 (dilated) and conv7
    conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6)
    conv7 = nn.Conv2d(1024, 1024, kernel_size=1)
    # Append them to layers
    layers += [pool5, conv6,
               nn.ReLU(inplace=True), conv7, nn.ReLU(inplace=True)]
    return layers
"""
Unrolling the for loop over base gives:
two 64-channel convolutions:
    nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), appended to layers
    nn.Conv2d(64, 64, kernel_size=3, padding=1), appended to layers
one max pooling:
    nn.MaxPool2d(kernel_size=2, stride=2), appended to layers
two 128-channel convolutions:
    nn.Conv2d(64, 128, kernel_size=3, padding=1), appended to layers
    nn.Conv2d(128, 128, kernel_size=3, padding=1), appended to layers
one max pooling:
    nn.MaxPool2d(kernel_size=2, stride=2), appended to layers
three 256-channel convolutions:
    nn.Conv2d(128, 256, kernel_size=3, padding=1), appended to layers
    nn.Conv2d(256, 256, kernel_size=3, padding=1), appended to layers
    nn.Conv2d(256, 256, kernel_size=3, padding=1), appended to layers
one max pooling, this time different from the previous ones (ceil_mode=True):
    nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True), appended to layers
three 512-channel convolutions:
    nn.Conv2d(256, 512, kernel_size=3, padding=1), appended to layers
    nn.Conv2d(512, 512, kernel_size=3, padding=1), appended to layers
    nn.Conv2d(512, 512, kernel_size=3, padding=1), appended to layers
one max pooling:
    nn.MaxPool2d(kernel_size=2, stride=2), appended to layers
three more 512-channel convolutions:
    nn.Conv2d(512, 512, kernel_size=3, padding=1), appended to layers
    nn.Conv2d(512, 512, kernel_size=3, padding=1), appended to layers
    nn.Conv2d(512, 512, kernel_size=3, padding=1), appended to layers
What can I say, the for loop version does feel classier 😂
"""
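We can sanity-check the shape walkthrough by pushing a dummy tensor through the layer list (the vgg construction from above is repeated here, condensed, so the snippet is self-contained; note that vgg returns a plain Python list of layers, not an nn.Module):

```python
import torch
import torch.nn as nn

base = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'C', 512, 512, 512, 'M',
        512, 512, 512]

def vgg(i):
    # Same construction as above, condensed
    layers, in_channels = [], i
    for v in base:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        elif v == 'C':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)]
        else:
            layers += [nn.Conv2d(in_channels, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_channels = v
    layers += [nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
               nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6),
               nn.ReLU(inplace=True),
               nn.Conv2d(1024, 1024, kernel_size=1), nn.ReLU(inplace=True)]
    return layers

layers = vgg(3)
x = torch.zeros(1, 3, 300, 300)
for layer in layers[:23]:       # up to and including conv4_3's ReLU
    x = layer(x)
conv4_3_shape = tuple(x.shape)  # (1, 512, 38, 38)
for layer in layers[23:]:       # the rest, through conv7
    x = layer(x)
conv7_shape = tuple(x.shape)    # (1, 1024, 19, 19)
print(conv4_3_shape, conv7_shape)
```

The slice `layers[:23]` is exactly the `range(23)` loop you will see in SSD.forward below: it stops right after conv4_3, where the 38x38x512 feature layer is taken.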
That wraps up the VGG part of the feature extractor. Next comes the part we add on top of it. For now you can ignore the rest of the code and only need to understand what the add_extras function does; I will cover the other functions later.
import torch
import os
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from utils.config import Config
from nets.ssd_layers import Detect
from nets.ssd_layers import L2Norm, PriorBox
from nets.vgg import vgg as add_vgg

class SSD(nn.Module):
    def __init__(self, phase, base, extras, head, num_classes):
        super(SSD, self).__init__()
        self.phase = phase
        self.num_classes = num_classes
        self.cfg = Config
        self.vgg = nn.ModuleList(base)
        self.L2Norm = L2Norm(512, 20)
        self.extras = nn.ModuleList(extras)
        self.priorbox = PriorBox(self.cfg)
        with torch.no_grad():
            self.priors = Variable(self.priorbox.forward())
        self.loc = nn.ModuleList(head[0])
        self.conf = nn.ModuleList(head[1])
        if phase == 'test':
            self.softmax = nn.Softmax(dim=-1)
            self.detect = Detect(num_classes, 0, 200, 0.01, 0.45)

    def forward(self, x):
        sources = list()
        loc = list()
        conf = list()
        # Get the conv4_3 feature (first 23 VGG layers), L2-normalized
        for k in range(23):
            x = self.vgg[k](x)
        s = self.L2Norm(x)
        sources.append(s)
        # Get the fc7 (conv7) feature
        for k in range(23, len(self.vgg)):
            x = self.vgg[k](x)
        sources.append(x)
        # Get the features from the extra layers (every second layer)
        for k, v in enumerate(self.extras):
            x = F.relu(v(x), inplace=True)
            if k % 2 == 1:
                sources.append(x)
        # Apply the regression and classification heads to each source
        for (x, l, c) in zip(sources, self.loc, self.conf):
            loc.append(l(x).permute(0, 2, 3, 1).contiguous())
            conf.append(c(x).permute(0, 2, 3, 1).contiguous())
        # Flatten each prediction per image and concatenate
        loc = torch.cat([o.view(o.size(0), -1) for o in loc], 1)
        conf = torch.cat([o.view(o.size(0), -1) for o in conf], 1)
        if self.phase == "test":
            # loc is reshaped to (batch_size, num_anchors, 4)
            # conf is reshaped to (batch_size, num_anchors, num_classes)
            output = self.detect(
                loc.view(loc.size(0), -1, 4),                 # loc preds
                self.softmax(conf.view(conf.size(0), -1,
                             self.num_classes)),              # conf preds
                self.priors
            )
        else:
            output = (
                loc.view(loc.size(0), -1, 4),
                conf.view(conf.size(0), -1, self.num_classes),
                self.priors
            )
        return output
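The permute/contiguous/view dance in forward is easy to get lost in, so here is a minimal sketch of what it does to a single loc feature map (the shapes are illustrative, not taken from the repository):

```python
import torch

b, h, w, anchors_per_cell = 2, 3, 3, 4
# A loc head outputs (b, anchors_per_cell * 4, h, w)
loc_map = torch.randn(b, anchors_per_cell * 4, h, w)
# Move channels last so each cell's predictions sit together in memory
loc_map = loc_map.permute(0, 2, 3, 1).contiguous()   # (b, h, w, 16)
# Flatten per image, then regroup into (b, num_anchors, 4)
loc = loc_map.view(b, -1).view(b, -1, 4)
print(loc.shape)  # torch.Size([2, 36, 4])
```

The contiguous() call is needed because view requires contiguous memory after permute; the final shape has one row of 4 box offsets per anchor, which is what Detect and the loss function consume.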
Here is the add_extras function mentioned above:
def add_extras(i, batch_norm=False):
    # Extra layers added on top of VGG for multi-scale feature extraction;
    # these are the four convolution blocks that follow the VGG backbone
    layers = []
    in_channels = i
    # Block 6
    # 19,19,1024 -> 10,10,512; this 10x10x512 feature map corresponds to
    # dividing the image into a 10x10 grid
    layers += [nn.Conv2d(in_channels, 256, kernel_size=1, stride=1)]
    layers += [nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1)]
    # Block 7
    # 10,10,512 -> 5,5,256, likewise
    layers += [nn.Conv2d(512, 128, kernel_size=1, stride=1)]
    layers += [nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)]
    # Block 8
    # 5,5,256 -> 3,3,256, likewise
    layers += [nn.Conv2d(256, 128, kernel_size=1, stride=1)]
    layers += [nn.Conv2d(128, 256, kernel_size=3, stride=1)]
    # Block 9
    # 3,3,256 -> 1,1,256, likewise
    layers += [nn.Conv2d(256, 128, kernel_size=1, stride=1)]
    layers += [nn.Conv2d(128, 256, kernel_size=3, stride=1)]
    return layers
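Again we can verify the shape comments by running a dummy 19x19x1024 tensor through the extra layers, applying ReLU between them the way SSD.forward does (add_extras is repeated here, condensed, so the snippet is self-contained):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def add_extras(i):
    # Same layers as above, condensed into one list
    return [nn.Conv2d(i, 256, kernel_size=1, stride=1),
            nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1),
            nn.Conv2d(512, 128, kernel_size=1, stride=1),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.Conv2d(256, 128, kernel_size=1, stride=1),
            nn.Conv2d(128, 256, kernel_size=3, stride=1),
            nn.Conv2d(256, 128, kernel_size=1, stride=1),
            nn.Conv2d(128, 256, kernel_size=3, stride=1)]

x = torch.zeros(1, 1024, 19, 19)
shapes = []
for k, layer in enumerate(add_extras(1024)):
    x = F.relu(layer(x), inplace=True)
    if k % 2 == 1:                       # every second layer is a source
        shapes.append(tuple(x.shape)[1:])
print(shapes)
# [(512, 10, 10), (256, 5, 5), (256, 3, 3), (256, 1, 1)]
```

Only the odd-indexed layers feed the prediction heads, which is exactly the `if k % 2 == 1` check in SSD.forward.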
# Number of prior boxes per cell on each of the six feature maps
mbox = [4, 6, 6, 6, 4, 4]

def get_ssd(phase, num_classes):
    vgg, extra_layers = add_vgg(3), add_extras(1024)
    loc_layers = []
    conf_layers = []
    # Index 21 is conv4_3 (512 channels); -2 is conv7 (1024 channels)
    vgg_source = [21, -2]
    for k, v in enumerate(vgg_source):
        loc_layers += [nn.Conv2d(vgg[v].out_channels,
                                 mbox[k] * 4, kernel_size=3, padding=1)]
        conf_layers += [nn.Conv2d(vgg[v].out_channels,
                                  mbox[k] * num_classes, kernel_size=3, padding=1)]
    # Every second extra layer produces a source feature map; start k at 2
    for k, v in enumerate(extra_layers[1::2], 2):
        loc_layers += [nn.Conv2d(v.out_channels, mbox[k]
                                 * 4, kernel_size=3, padding=1)]
        conf_layers += [nn.Conv2d(v.out_channels, mbox[k]
                                  * num_classes, kernel_size=3, padding=1)]
    SSD_MODEL = SSD(phase, vgg, extra_layers, (loc_layers, conf_layers), num_classes)
    return SSD_MODEL
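As a quick arithmetic check, the six feature maps together with the mbox counts above yield the famous 8732 prior boxes of SSD300 (pure Python, no torch needed):

```python
# (feature map size, priors per cell) for the six prediction sources
feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
num_priors = sum(f * f * m for f, m in feature_maps)
print(num_priors)  # 8732
```

This 8732 is the num_anchors dimension that loc and conf are reshaped to in SSD.forward.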