1. Challenges of Semantic Segmentation
- Resolution
Semantic segmentation inevitably relies on convolution and pooling, operations that lower the image resolution and discard original information which is hard to recover during upsampling. (Current remedies include atrous convolution and replacing pooling with stride-2 convolutions.)
- Receptive field
The receptive field is the region of the original image that a unit in the current layer can see. If it is too large, small objects are segmented inaccurately; if it is too small, the incomplete context also hurts accuracy. DeepLab addresses this with the ASPP module, which extracts features with several receptive fields in parallel, and the module is easy to port to other networks.
- Multi-scale features
Convolution and pooling layers with different parameters can produce feature maps at several scales, and fusing them improves performance. However, this may introduce a large amount of computation and parameters, so a balance between speed and accuracy is needed.
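A minimal sketch of this multi-scale idea, with purely illustrative branch sizes (the module name and channel counts are hypothetical, not from any DeepLab paper): three branches with different receptive fields process the same input and are fused by channel concatenation.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Toy multi-scale fusion: three branches with different receptive
    fields, fused by channel concatenation (illustrative only)."""
    def __init__(self, in_ch=64, out_ch=32):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch, 1)                          # small receptive field
        self.b2 = nn.Conv2d(in_ch, out_ch, 3, padding=1)               # medium
        self.b3 = nn.Conv2d(in_ch, out_ch, 3, padding=2, dilation=2)   # larger, via dilation

    def forward(self, x):
        # every branch preserves the spatial size, so they concatenate directly
        return torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)

x = torch.randn(1, 64, 32, 32)
print(MultiScaleFusion()(x).shape)  # torch.Size([1, 96, 32, 32])
```

Because each branch pads so that the spatial size is preserved, fusion is a plain concatenation; this is the same trade-off mentioned above, since three branches cost roughly three times the computation of one.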
2. Paper Abstracts
DeepLab v1 introduces the CRF (because the responses at the final DCNN layer are not localized enough for accurate segmentation) and briefly mentions atrous convolution. Combining the DCNN with a fully connected CRF overcomes the poor localization of deep features, reaching 71.6% IoU on PASCAL VOC 2012 at about 8 frames per second on a typical GPU.
DeepLab v2 uses atrous convolution to enlarge the receptive field without adding parameters, again combines DCNNs with a CRF, and proposes the ASPP module, reaching 79.7% mIoU on PASCAL VOC 2012.
DeepLab v3 designs cascaded and parallel atrous convolution modules and extends ASPP, achieving performance comparable to other models without using a CRF.
DeepLab v3+ extends DeepLab v3 with a simple yet effective decoder module that better localizes segmentation boundaries, reaching 89% mIoU on PASCAL VOC 2012 and 82.1% mIoU on Cityscapes.
3. Model Architecture
DeepLab v1:
DeepLab v2:
Atrous convolution:
Zeros are inserted between the original kernel positions. In the figure above, one green cell originally corresponds to three yellow triangles; after atrous convolution it corresponds to five, so each green cell aggregates more information, and the receptive field grows from 3×3 to 5×5.
(rate denotes the dilation rate. Since atrous convolution, with stride unchanged, enlarges the receptive field by effectively enlarging the kernel, why not simply use a larger kernel such as 7×7? My explanation: a large kernel introduces many parameters, whereas a dilated kernel of the same effective size fills most positions with zeros and therefore has far fewer parameters, enlarging the receptive field while keeping the network efficient.)
Atrous convolution can be seen as a direct answer to the receptive-field challenge above. The receptive field is a key factor in segmentation: traditional pooling shrinks the feature map to enlarge the receptive field, and upsampling then restores the size, but some feature information is lost in the process and cannot be recovered. Is there a way around this? Atrous convolution enlarges the receptive field without changing the size of the output feature map, producing denser feature maps.
Can the whole network then be built from atrous convolutions alone?
No. On the one hand, atrous convolution is not perfect by itself; using it exclusively causes problems of its own (such as the gridding artifact), which interested readers can explore further. On the other hand, without pooling to shrink the feature maps, the model's parameters and computation can grow large enough to be impractical, so atrous convolution is best used together with pooling.
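The two claims above — unchanged output size, larger receptive field at lower parameter cost — can be checked with a small sketch (channel and image sizes chosen arbitrarily):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)

# plain 3x3 conv vs. 3x3 conv with dilation=2 (effective kernel 5x5)
plain   = nn.Conv2d(16, 16, kernel_size=3, padding=1)
dilated = nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2)
big     = nn.Conv2d(16, 16, kernel_size=5, padding=2)

# with padding = dilation, both keep the 64x64 spatial size
print(plain(x).shape, dilated(x).shape)

# the dilated kernel covers the same 5x5 receptive field as the 5x5 kernel,
# but at the parameter cost of a 3x3 kernel
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dilated), count(big))  # 2320 vs 6416
```

This is exactly the trade-off argued for above: the 5×5 receptive field comes nearly for free in parameters because the inserted positions are zeros, not learned weights.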
SPPNet:
A traditional CNN restricts the input size: because of the fully connected layers, the output of the last convolutional layer must have a fixed size, which in turn forces the input image to have a fixed size. (Simply put, if the fully connected layer expects 100 input values, the flattened output of the last convolutional layer must also contain exactly 100 values; an arbitrary input size would yield a different count and cause a mismatch.) SPPNet lets the network accept images of any size by inserting an extra layer between the last convolutional layer and the fully connected layers, whose output size always matches what the fully connected layers expect.
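A minimal sketch of that idea, using PyTorch's adaptive pooling to emulate the pyramid levels (the level sizes here are illustrative, not SPPNet's exact configuration):

```python
import torch
import torch.nn as nn

class MiniSPP(nn.Module):
    """Pool the final conv feature map to fixed grids (1x1, 2x2, 4x4)
    and flatten, so the output length is independent of the input size."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveMaxPool2d(l) for l in levels)

    def forward(self, x):
        return torch.cat([p(x).flatten(1) for p in self.pools], dim=1)

spp = MiniSPP()
for hw in [(13, 13), (20, 31)]:   # different feature-map sizes...
    feat = torch.randn(1, 256, *hw)
    print(spp(feat).shape)        # ...same output: 256 * (1 + 4 + 16) = 5376
```

Whatever the input size, the pooled grids are fixed, so the fully connected layer always receives the same number of values — exactly the mismatch problem described above.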
DeepLab v3:
DeepLab v3 designs cascaded and parallel ASPP modules and drops the CRF. To counter the degradation that appears as the dilation rate grows very large (an extremely dilated 3×3 kernel sees mostly padding and degenerates toward a 1×1 kernel), global average pooling is added to ASPP as an image-level branch.
Comparison of DeepLab v1, v2, and v3
DeepLab v3+:
DeepLab v3+ adopts depthwise separable convolution (popularized by Xception), which can match the results of a standard convolution while greatly reducing the number of parameters.
For example, mapping a 12×12×3 input to an 8×8×256 output with standard 5×5×3 kernels requires 5×5×3×256 = 19,200 parameters, whereas the depthwise separable version needs only 5×5×1×3 + 1×1×3×256 = 843.
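The arithmetic above can be verified directly with PyTorch layers (bias disabled so that only kernel weights are counted):

```python
import torch
import torch.nn as nn

count = lambda m: sum(p.numel() for p in m.parameters())

# standard conv: 3 -> 256 channels with a 5x5 kernel
standard = nn.Conv2d(3, 256, kernel_size=5, bias=False)

# depthwise separable: per-channel 5x5 depthwise conv + 1x1 pointwise conv
separable = nn.Sequential(
    nn.Conv2d(3, 3, kernel_size=5, groups=3, bias=False),  # 5*5*1*3 = 75
    nn.Conv2d(3, 256, kernel_size=1, bias=False),          # 1*1*3*256 = 768
)

print(count(standard), count(separable))  # 19200 843

# both map a 12x12x3 image to 8x8x256
x = torch.randn(1, 3, 12, 12)
print(standard(x).shape, separable(x).shape)  # both torch.Size([1, 256, 8, 8])
```

`groups=3` is what makes the first conv depthwise: each input channel is filtered independently, and the 1×1 pointwise conv then mixes the channels.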
Development of the DeepLab series
4. PyTorch Implementation of DeepLabv3+
Xception network architecture:
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
class ASPP(nn.Module):
    def __init__(self, inplanes, planes, os):
        super(ASPP, self).__init__()
        # dilation rates depend on the output stride
        if os == 16:
            dilations = [1, 6, 12, 18]
        elif os == 8:
            dilations = [1, 12, 24, 36]
        else:
            raise NotImplementedError
        # the branches run in parallel, so each one maps inplanes -> planes
        self.aspp1 = nn.Sequential(
            nn.Conv2d(inplanes, planes, kernel_size=1, stride=1, padding=0,
                      dilation=dilations[0], bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU())
        self.aspp2 = nn.Sequential(
            nn.Conv2d(inplanes, planes, kernel_size=3, stride=1,
                      padding=dilations[1], dilation=dilations[1], bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU())
        self.aspp3 = nn.Sequential(
            nn.Conv2d(inplanes, planes, kernel_size=3, stride=1,
                      padding=dilations[2], dilation=dilations[2], bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU())
        self.aspp4 = nn.Sequential(
            nn.Conv2d(inplanes, planes, kernel_size=3, stride=1,
                      padding=dilations[3], dilation=dilations[3], bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU())
        # image-level branch: global average pooling
        self.gp = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Conv2d(inplanes, planes, 1, stride=1, bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU())
        # 5 * planes channels after concatenating the five branches (1280 for planes=256)
        self.conv1 = nn.Conv2d(5 * planes, planes, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self._init_weight()  # initialize weights

    def forward(self, x):
        x1 = self.aspp1(x)
        x2 = self.aspp2(x)
        x3 = self.aspp3(x)
        x4 = self.aspp4(x)
        x5 = self.gp(x)
        # upsample x5 to x4's spatial size so the five branches can be concatenated
        x5 = F.interpolate(x5, size=x4.size()[2:], mode="bilinear", align_corners=True)
        x = torch.cat((x1, x2, x3, x4, x5), dim=1)
        x = self.conv1(x)
        x = self.bn1(x)
        return x

    def _init_weight(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # He initialization for conv layers
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

# model = ASPP(512, 256, 16)
# model.eval()
# image = torch.randn(1, 512, 176, 240)
# output = model(image)
# print(output.size())  # torch.Size([1, 256, 176, 240])
def fixed_padding(inputs, kernel_size, dilation):
    # pad so that a stride-1 (possibly dilated) conv keeps the spatial size ("same" padding)
    kernel_size_effective = kernel_size + (kernel_size - 1) * (dilation - 1)
    pad_total = kernel_size_effective - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    padded_inputs = F.pad(inputs, (pad_beg, pad_end, pad_beg, pad_end))
    return padded_inputs
class SeparableConv2d_same(nn.Module):
    def __init__(self, inplanes, planes, kernel_size=3, stride=1, dilation=1, bias=False):
        super(SeparableConv2d_same, self).__init__()
        # depthwise conv (groups=inplanes) followed by a 1x1 pointwise conv
        self.conv1 = nn.Conv2d(inplanes, inplanes, kernel_size, stride, 0, dilation,
                               groups=inplanes, bias=bias)
        self.pointwise = nn.Conv2d(inplanes, planes, 1, 1, 0, 1, 1, bias=bias)

    def forward(self, x):
        x = fixed_padding(x, self.conv1.kernel_size[0], dilation=self.conv1.dilation[0])
        x = self.conv1(x)
        x = self.pointwise(x)
        return x
class Block(nn.Module):
    def __init__(self, inplanes, planes, reps, stride=1, dilation=1,
                 start_with_relu=True, is_last=False):  # reps: number of repeated units in the block
        super(Block, self).__init__()
        if planes != inplanes or stride != 1:  # add a projection skip connection when shapes change
            self.skip = nn.Conv2d(inplanes, planes, 1, stride=stride, bias=False)
            self.skipbn = nn.BatchNorm2d(planes)
        else:
            self.skip = None
        self.relu = nn.ReLU(inplace=True)  # inplace overwrites the input, saving memory
        rep = []  # list that assembles the layers of one block
        rep.append(self.relu)  # first layer of the block
        rep.append(SeparableConv2d_same(inplanes, planes, 3, stride=1, dilation=dilation))
        rep.append(nn.BatchNorm2d(planes))
        filters = planes  # record the output channel count for the remaining units
        for i in range(reps - 1):
            rep.append(self.relu)
            rep.append(SeparableConv2d_same(filters, filters, 3, stride=1, dilation=dilation))
            rep.append(nn.BatchNorm2d(filters))
        if not start_with_relu:  # the first block does not start with a ReLU
            rep = rep[1:]
        if stride != 1:  # the entry-flow blocks end with a stride-2 layer
            rep.append(SeparableConv2d_same(planes, planes, 3, stride=2))
        if stride == 1 and is_last:
            rep.append(SeparableConv2d_same(planes, planes, 3, stride=1))
        self.rep = nn.Sequential(*rep)

    def forward(self, inp):
        x = self.rep(inp)
        if self.skip is not None:
            skip = self.skip(inp)
            skip = self.skipbn(skip)
        else:
            skip = inp
        x += skip
        return x
class Xception(nn.Module):
    def __init__(self, inplanes=3, os=16):
        super(Xception, self).__init__()
        if os == 16:
            entry_block3_stride = 2
            middle_block_dilation = 1
            exit_block_dilations = (1, 2)
        elif os == 8:
            entry_block3_stride = 1
            middle_block_dilation = 2
            exit_block_dilations = (2, 4)
        else:
            raise NotImplementedError
        # entry flow
        self.conv1 = nn.Conv2d(inplanes, 32, 3, stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(32, 64, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(64)
        self.block1 = Block(64, 128, reps=2, stride=2, start_with_relu=False)
        self.block2 = Block(128, 256, reps=2, stride=2, start_with_relu=True)
        self.block3 = Block(256, 728, reps=2, stride=entry_block3_stride, start_with_relu=True)
        # middle flow
        self.block4 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block5 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block6 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block7 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block8 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block9 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block10 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block11 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block12 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block13 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block14 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block15 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block16 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block17 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block18 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        self.block19 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation, start_with_relu=True)
        # exit flow
        self.block20 = Block(728, 1024, reps=2, stride=1, dilation=exit_block_dilations[0],
                             start_with_relu=True, is_last=True)
        self.conv3 = SeparableConv2d_same(1024, 1536, 3, stride=1, dilation=exit_block_dilations[1])
        self.bn3 = nn.BatchNorm2d(1536)
        self.conv4 = SeparableConv2d_same(1536, 1536, 3, stride=1, dilation=exit_block_dilations[1])
        self.bn4 = nn.BatchNorm2d(1536)
        self.conv5 = SeparableConv2d_same(1536, 2048, 3, stride=1, dilation=exit_block_dilations[1])
        self.bn5 = nn.BatchNorm2d(2048)
        # init weights
        self._init_weight()

    def forward(self, x):
        # entry flow
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.block1(x)
        low_level_feat = x  # stride-4 features, reused by the DeepLab v3+ decoder
        x = self.block2(x)
        x = self.block3(x)
        # middle flow
        x = self.block4(x)
        x = self.block5(x)
        x = self.block6(x)
        x = self.block7(x)
        x = self.block8(x)
        x = self.block9(x)
        x = self.block10(x)
        x = self.block11(x)
        x = self.block12(x)
        x = self.block13(x)
        x = self.block14(x)
        x = self.block15(x)
        x = self.block16(x)
        x = self.block17(x)
        x = self.block18(x)
        x = self.block19(x)
        # exit flow
        x = self.block20(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.relu(x)
        x = self.conv4(x)
        x = self.bn4(x)
        x = self.relu(x)
        x = self.conv5(x)
        x = self.bn5(x)
        x = self.relu(x)
        return x, low_level_feat

    def _init_weight(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
class DeepLabv3_plus(nn.Module):
    def __init__(self, nInputChannels=3, n_classes=21, os=16, _print=True):
        if _print:
            print("Constructing DeepLabv3+ model...")
            print("Backbone: Xception")
            print("Number of classes: {}".format(n_classes))
            print("Output stride: {}".format(os))
            print("Number of Input Channels: {}".format(nInputChannels))
        super(DeepLabv3_plus, self).__init__()
        # atrous backbone
        self.xception_features = Xception(nInputChannels, os)
        self.ASPP = ASPP(2048, 256, os)  # pass os through instead of hard-coding 16
        self.conv1 = nn.Conv2d(256, 256, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(256)
        self.relu = nn.ReLU()
        # adopt [1x1, 48] for channel reduction of the low-level features
        self.conv2 = nn.Conv2d(128, 48, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(48)
        # decoder: refine the concatenated (256 + 48 = 304)-channel features
        self.last_conv = nn.Sequential(
            nn.Conv2d(304, 256, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, n_classes, kernel_size=1, stride=1))

    def forward(self, input):
        x, low_level_features = self.xception_features(input)
        x = self.ASPP(x)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        # upsample the ASPP output to stride 4, matching the low-level features
        x = F.interpolate(x, size=(int(math.ceil(input.size()[-2] / 4)),
                                   int(math.ceil(input.size()[-1] / 4))),
                          mode='bilinear', align_corners=True)
        low_level_features = self.conv2(low_level_features)
        low_level_features = self.bn2(low_level_features)
        low_level_features = self.relu(low_level_features)
        x = torch.cat((x, low_level_features), dim=1)
        x = self.last_conv(x)
        # final upsampling back to the input resolution
        x = F.interpolate(x, size=input.size()[2:], mode='bilinear', align_corners=True)
        return x


if __name__ == "__main__":
    model = DeepLabv3_plus(nInputChannels=3, n_classes=12, os=16, _print=True)
    model.eval()
    image = torch.randn(1, 3, 352, 480)
    output = model(image)
    print(output.size())  # torch.Size([1, 12, 352, 480])
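The fixed_padding helper in the listing above implements "same" padding for stride-1 dilated convolutions; a self-contained copy can verify that the spatial size is preserved for any dilation rate (the sizes here are arbitrary):

```python
import torch
import torch.nn.functional as F

def fixed_padding(inputs, kernel_size, dilation):
    # same logic as the fixed_padding in the listing above
    kernel_size_effective = kernel_size + (kernel_size - 1) * (dilation - 1)
    pad_total = kernel_size_effective - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return F.pad(inputs, (pad_beg, pad_end, pad_beg, pad_end))

x = torch.randn(1, 3, 33, 33)
for d in (1, 2, 4):
    padded = fixed_padding(x, kernel_size=3, dilation=d)
    y = F.conv2d(padded, torch.randn(8, 3, 3, 3), dilation=d)
    print(y.shape)  # spatial size stays 33x33 for every dilation
```

Padding by (effective kernel size - 1) is what lets SeparableConv2d_same keep odd input sizes intact, which in turn keeps the backbone's output stride exactly 16 (or 8) regardless of dilation.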
Reference:
深度之眼 (Bilibili)