一、Core Idea
1、Replace all fully connected (linear) layers with convolutional layers.
2、An FCN can be divided into two parts: a backbone and a head.
- The backbone extracts features; the feature-extraction portion of VGG16, AlexNet, ResNet, etc. can be used.
- The head predicts a class for every pixel. Because the backbone performs downsampling (e.g., pooling and stride-2 convolutions), the head must upsample the feature maps back to the original image size.
The figure omits convolution layers, activation layers, etc. Kx means the output size (H, W) is K times the input size (H, W).
3、FCNHead
- The head consists of fully convolutional layers, a skip structure, and upsampling (back to the original size).
- Prediction is usually done with a convolution layer of kernel size 1, stride 1 and no padding; this 1x1 convolution is the last of the fully convolutional layers.
- Skip structure: take outputs of different sizes from the feature extractor and make a prediction from each; upsample the smallest prediction to the size of the second-smallest one and sum the two; repeat this process until everything is merged into a single prediction.
- Upsampling (Upsample) can be implemented with transposed convolution (deconvolution) layers, interpolation (interpolate), etc.
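The skip structure and final upsampling described above can be sketched as follows. This is a minimal illustration, not the document's FCNHeadVGG: the three per-scale predictions are random stand-ins, and bilinear interpolation is used as the upsampling method.

```python
# Minimal sketch of the skip structure: upsample the smallest prediction,
# sum with the next scale, repeat, then upsample to the original size.
import torch
import torch.nn.functional as F

num_classes = 21
p8 = torch.randn(1, num_classes, 28, 28)   # prediction at 1/8 resolution
p16 = torch.randn(1, num_classes, 14, 14)  # prediction at 1/16 resolution
p32 = torch.randn(1, num_classes, 7, 7)    # prediction at 1/32 resolution

# Upsample the smallest prediction to the second-smallest size and sum.
fused = p16 + F.interpolate(p32, size=p16.shape[-2:], mode='bilinear', align_corners=False)
# Repeat with the next scale.
fused = p8 + F.interpolate(fused, size=p8.shape[-2:], mode='bilinear', align_corners=False)

# Finally upsample the merged prediction back to the original image size.
out = F.interpolate(fused, size=(224, 224), mode='bilinear', align_corners=False)
print(out.shape)  # torch.Size([1, 21, 224, 224])
```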
二、FCN Example: FCN-VGG16
1、Backbone
The backbone uses the feature-extraction part of VGG16; the outputs of the last three pooling layers must be cached.
class VGG16(Base):
    def __init__(self):
        super(VGG16, self).__init__()
        self.conv1 = ConvLayer(3, 64)
        self.conv2 = ConvLayer(64, 64)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv3 = ConvLayer(64, 128)
        self.conv4 = ConvLayer(128, 128)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.conv5 = ConvLayer(128, 256)
        self.conv6 = ConvLayer(256, 256)
        self.conv7 = ConvLayer(256, 256)
        self.pool3 = nn.MaxPool2d(2, 2)
        self.conv8 = ConvLayer(256, 512)
        self.conv9 = ConvLayer(512, 512)
        self.conv10 = ConvLayer(512, 512)
        self.pool4 = nn.MaxPool2d(2, 2)
        self.conv11 = ConvLayer(512, 512)
        self.conv12 = ConvLayer(512, 512)
        self.conv13 = ConvLayer(512, 512)
        self.pool5 = nn.MaxPool2d(2, 2)

    def forward(self, x):
        os = []
        o = self.conv1(x)
        o = self.conv2(o)
        o = self.pool1(o)
        o = self.conv3(o)
        o = self.conv4(o)
        o = self.pool2(o)
        o = self.conv5(o)
        o = self.conv6(o)
        o = self.conv7(o)
        o = self.pool3(o)
        os.insert(0, o)  # cache the pool3 output (1/8 resolution)
        o = self.conv8(o)
        o = self.conv9(o)
        o = self.conv10(o)
        o = self.pool4(o)
        os.insert(0, o)  # cache the pool4 output (1/16 resolution)
        o = self.conv11(o)
        o = self.conv12(o)
        o = self.conv13(o)
        o = self.pool5(o)
        os.insert(0, o)  # cache the pool5 output (1/32 resolution)
        return os
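The reason the head caches exactly these three outputs can be checked with a quick shape experiment (the 224x224 input size is illustrative): five 2x2 max-pooling stages halve the resolution each time, so pool3, pool4 and pool5 sit at 1/8, 1/16 and 1/32 of the input.

```python
# Track the spatial size through VGG16's five 2x2 max-pooling stages.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)
pool = nn.MaxPool2d(2, 2)
sizes = []
o = x
for _ in range(5):
    o = pool(o)
    sizes.append(o.shape[-1])

# pool1..pool5 output sizes; pool3=28 (1/8), pool4=14 (1/16), pool5=7 (1/32)
print(sizes)  # [112, 56, 28, 14, 7]
```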
2、头部
对于头部,三个缓存的输出提供了三种头部。上采样操作使用了反卷积层,也可使用双线性插值等。 反卷积层的输出不是与目标大小恰好相等,所以使用CenterCrop进行裁剪。当使用插值时,不用裁剪。
- num_classes:种类数量
- k:需要将整合后的预测翻多少倍,恢复至原始图像的大小
class FCNHeadVGG(FCNHead):
    def __init__(self, num_classes, k=32):
        super(FCNHeadVGG, self).__init__(num_classes)
        assert k == 32 or k == 16 or k == 8, 'k must be 32 or 16 or 8'
        self.k = k
        # pool5
        self.conv1 = ConvLayer(512, 4096, kernel_size=7, padding=3)
        self.dropout1 = nn.Dropout2d(0.5)
        self.conv2 = ConvLayer(4096, 4096, kernel_size=1, padding=0)
        self.dropout2 = nn.Dropout2d(0.5)
        self.classifier1 = nn.Conv2d(4096, num_classes, 1, padding=0)
        if k == 32:
            # upsample 32x in a single step
            self.deconv1 = nn.ConvTranspose2d(num_classes, num_classes, 64, 32, bias=False)
        else:
            # upsample 2x, to be fused with the pool4 prediction
            self.deconv1 = nn.ConvTranspose2d(num_classes, num_classes, 4, 2, bias=False)
        # pool4
        self.classifier2 = nn.Conv2d(512, num_classes, 1, padding=0)
        if self.k == 16:
            self.deconv2 = nn.ConvTranspose2d(num_classes, num_classes, 32, 16, bias=False)
        elif k < 16:
            self.deconv2 = nn.ConvTranspose2d(num_classes, num_classes, 4, 2, bias=False)
        # pool3
        if self.k == 8:
            self.classifier3 = nn.Conv2d(256, num_classes, 1, padding=0)
            self.deconv3 = nn.ConvTranspose2d(num_classes, num_classes, 16, 8, bias=False)

    def forward(self, x, original_size=None):
        x5 = x[0]  # pool5 output (1/32)
        x4 = x[1]  # pool4 output (1/16)
        x3 = x[2]  # pool3 output (1/8)
        # pool5
        o5 = self.conv1(x5)
        o5 = self.dropout1(o5)
        o5 = self.conv2(o5)
        o5 = self.dropout2(o5)
        o5 = self.classifier1(o5)
        o = self.deconv1(o5)
        # pool4
        if self.k != 32:
            o4 = self.classifier2(x4)
            o4 = o4 + CenterCrop(o4.shape[-2:])(o)  # fuse with the upsampled pool5 prediction
            o = self.deconv2(o4)
        # pool3
        if self.k == 8:
            o3 = self.classifier3(x3)
            o3 = o3 + CenterCrop(o3.shape[-2:])(o)  # fuse with the upsampled pool4 prediction
            o = self.deconv3(o3)
        o = CenterCrop(original_size)(o)  # ensure that output size is equal to original image size
        return o
3、FCN
The FCN class is composed of the backbone and the head. In forward, the original image size must be cached.
class FCN(nn.Module):
    def __init__(self, base: Base, head: FCNHead):
        super(FCN, self).__init__()
        # base net used for feature extraction
        self.base = base
        # fully convolutional layers
        self.head = head

    def forward(self, x):
        original_size = x.shape[-2:]  # cache the original image size
        os = self.base(x)
        o = self.head(os, original_size)
        return o
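The base/head wiring can be smoke-tested end to end with toy stand-ins. ToyBase and ToyHead below are hypothetical placeholders, not the real VGG16 and FCNHeadVGG; they only reproduce the contract (base returns multi-scale features smallest-first, head receives the cached original size).

```python
# Toy end-to-end check of the FCN wiring: base -> head -> original size.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyBase(nn.Module):
    def forward(self, x):
        # features at 1/32, 1/16 and 1/8 resolution, smallest first
        return [F.avg_pool2d(x, k) for k in (32, 16, 8)]

class ToyHead(nn.Module):
    def forward(self, xs, original_size=None):
        # fuse the three scales, then upsample back to the input size
        o = xs[0]
        for f in xs[1:]:
            o = f + F.interpolate(o, size=f.shape[-2:])
        return F.interpolate(o, size=original_size)

class FCN(nn.Module):
    def __init__(self, base, head):
        super().__init__()
        self.base = base
        self.head = head

    def forward(self, x):
        original_size = x.shape[-2:]  # cache the original image size
        return self.head(self.base(x), original_size)

model = FCN(ToyBase(), ToyHead())
y = model(torch.randn(1, 3, 224, 224))
print(y.shape)  # torch.Size([1, 3, 224, 224])
```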
4、Other Classes
Other custom classes used in the code above.
class ConvLayer(nn.Module):
    """Conv2d + BatchNorm2d + ReLU."""
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1, bias=True):
        super(ConvLayer, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, padding=padding, bias=bias)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        o = self.conv(x)
        o = self.bn(o)
        o = self.relu(o)
        return o

class FCNHead(nn.Module):
    """Base class for FCN heads."""
    def __init__(self, num_classes):
        super(FCNHead, self).__init__()
        self.num_classes = num_classes

class Base(nn.Module):
    """Base class for backbones."""
    def __init__(self):
        super(Base, self).__init__()
三、A Friendly Reminder
Without pretraining the backbone, FCN training can be quite inefficient (see the figure below; the model is FCN-ResNet50). It is recommended to first pair the backbone with an image-classification head and pretrain it on a classification task, then transfer its parameters to the FCN.
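The parameter transfer suggested above amounts to copying the backbone's state dict into an identically structured backbone inside the FCN. A minimal sketch with a hypothetical stand-in backbone (the real one would be VGG16/ResNet):

```python
# Sketch of backbone weight transfer via state_dict / load_state_dict.
import torch
import torch.nn as nn

# stand-in backbone (the real one would be VGG16, ResNet, etc.)
base = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())

# 1) pretraining setup: backbone + a classification head
classifier = nn.Sequential(base, nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
# ... train `classifier` on an image-classification task here ...

# 2) transfer: a fresh, identically structured backbone for the FCN
fcn_base = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
fcn_base.load_state_dict(base.state_dict())  # copy the pretrained parameters

same = torch.equal(fcn_base[0].weight, base[0].weight)
print(same)  # True
```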
四、References
[1] Fully Convolutional Networks for Semantic Segmentation, arXiv:1411.4038.