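1. Model Overview
DeepLabv3+ consists of two parts, an Encoder and a Decoder. The Encoder is made up of a backbone (the base feature-extraction network), ASPP, and a channel reduction applied to the ASPP output. The backbone can be ResNet, Xception, or a similar network.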
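To make the Encoder composition concrete, here is a minimal sketch. The `Encoder` class, `toy_backbone`, and `toy_head` are hypothetical names used only for illustration: a real model would plug in ResNet or Xception as the backbone and a full ASPP module (including its channel reduction) as the head.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Backbone followed by an ASPP-style head (both supplied by the caller)."""

    def __init__(self, backbone: nn.Module, aspp: nn.Module):
        super().__init__()
        self.backbone = backbone
        self.aspp = aspp

    def forward(self, x):
        features = self.backbone(x)  # feature map extracted by the backbone
        return self.aspp(features)   # multi-branch context plus channel reduction


# Stand-ins for illustration only (assumptions, not the real DeepLabv3+ parts).
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, 3, stride=2, padding=1),
    nn.ReLU(inplace=True),
)
toy_head = nn.Conv2d(64, 16, 1)  # placeholder for ASPP and its 1x1 reduction

encoder = Encoder(toy_backbone, toy_head)
out = encoder(torch.randn(1, 3, 64, 64))
print(out.shape)
```

Passing the backbone and head in as arguments keeps the sketch independent of any particular backbone choice.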
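2. ASPP
In short, ASPP feeds the feature map extracted by the backbone into several parallel, distinct layers (convolution, atrous convolution, and pooling layers) and then concatenates their outputs along the channel dimension.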
import torch
import torch.nn as nn
import torch.nn.functional as F


# Note: ConvLayer is a helper that is not defined in this section.
class ASPP(nn.Module):
    def __init__(self, in_channels=2048):
        super().__init__()
        # 1x1 convolution branch
        self.conv1 = ConvLayer(in_channels, 256, 1, padding=0)
        # rate = 6
        self.conv2 = ConvLayer(in_channels, 256, 3, padding=6, dilation=6)
        # rate = 12
        self.conv3 = ConvLayer(in_channels, 256, 3, padding=12, dilation=12)
        # rate = 18
        self.conv4 = ConvLayer(in_channels, 256, 3, padding=18, dilation=18)
        # image pooling: global average pooling followed by a 1x1 convolution
        self.pooling = nn.AdaptiveAvgPool2d((1, 1))
        self.conv5 = ConvLayer(in_channels, 256, 1, padding=0)
        # fuse the five concatenated branches back down to 256 channels
        self.conv6 = ConvLayer(256 * 5, 256, 1, padding=0)

    def forward(self, x):
        o1 = self.conv1(x)
        o2 = self.conv2(x)
        o3 = self.conv3(x)
        o4 = self.conv4(x)
        o5 = self.pooling(x)
        o5 = self.conv5(o5)
        # upsample the pooled branch to the input's (H, W); also works for non-square inputs
        o5 = F.interpolate(o5, size=x.shape[-2:], mode='bilinear', align_corners=False)
        # concatenate all branches along the channel dimension
        o = torch.cat((o1, o2, o3, o4, o5), dim=1)
        o = self.conv6(o)
        return o
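The ASPP class above calls `ConvLayer`, which this section does not define. The sketch below is only a guess at what such a helper might look like, assuming the common Conv2d, BatchNorm2d, ReLU pattern. The layer order, `bias=False`, and the attribute name `block` are all assumptions, not the author's implementation.

```python
import torch
import torch.nn as nn


class ConvLayer(nn.Module):
    """Hypothetical helper: Conv2d -> BatchNorm2d -> ReLU."""

    def __init__(self, in_channels, out_channels, kernel_size, padding=0, dilation=1):
        super().__init__()
        self.block = nn.Sequential(
            # bias is omitted because BatchNorm adds its own shift term
            nn.Conv2d(in_channels, out_channels, kernel_size,
                      padding=padding, dilation=dilation, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


# A 3x3 atrous branch with padding equal to dilation keeps the spatial size.
x = torch.randn(2, 64, 33, 33)
branch = ConvLayer(64, 256, 3, padding=6, dilation=6)
y = branch(x)
print(y.shape)
```

One caveat if the helper does include BatchNorm: the image pooling branch feeds a 1x1 map into `conv5`, so in training mode BatchNorm needs a batch size greater than 1 there.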
- In the figure, rate is the dilation of the atrous convolution.
- Image Pooling consists of a pooling layer, a convolution layer, and upsampling.
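A quick arithmetic check shows why each 3x3 atrous branch in the code uses `padding` equal to its rate. The function names below are made up for this sketch. The output-size formula is the standard one for a convolution with stride, padding, and dilation.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that preserves spatial size at stride 1 (odd kernels)."""
    return (effective_kernel_size(kernel_size, dilation) - 1) // 2


def conv_output_size(size: int, kernel_size: int, padding: int,
                     dilation: int, stride: int = 1) -> int:
    """Standard output-size formula for a strided, padded, dilated convolution."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


for rate in (6, 12, 18):
    pad = same_padding(3, rate)
    print(rate, effective_kernel_size(3, rate), pad,
          conv_output_size(33, 3, pad, rate))
```

For a 3x3 kernel the effective span is 2 * rate + 1, so the size-preserving padding works out to exactly the rate, matching the `padding=6/12/18` values used with `dilation=6/12/18`.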