A Summary of FCN and U-Net for Semantic Segmentation (with Code)

FCN and U-Net were both published around 2015. Both follow an encoder-decoder design: the input is first downsampled (encoded) and then upsampled (decoded) back to a feature map with the same spatial size as the original image, and the loss is computed per pixel between that feature map and the annotation mask. Their main difference is how they fuse features: FCN fuses skip features by element-wise addition, while U-Net concatenates the two feature maps along the channel dimension. This post reimplements FCN and U-Net with TensorFlow and PyTorch, respectively.
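As a minimal sketch of this difference (PyTorch, with made-up shapes, not from the original implementations):

import torch

# two skip-connection feature maps with matching spatial size
deep = torch.randn(1, 64, 32, 32)     # upsampled deep feature
shallow = torch.randn(1, 64, 32, 32)  # shallow encoder feature

fcn_fused = deep + shallow                      # FCN: element-wise add, still 64 channels
unet_fused = torch.cat([deep, shallow], dim=1)  # U-Net: channel concat, now 128 channels
print(fcn_fused.shape, unet_fused.shape)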
A collection of semantic segmentation code found on GitHub: https://github.com/mrgloom/awesome-semantic-segmentation

1. FCN

This section focuses on FCN-8s. The FCN paper builds three variants: FCN-32s, FCN-16s, and FCN-8s. FCN-32s fuses no shallow features and directly upsamples the deepest feature map; FCN-16s fuses one shallow feature map; FCN-8s fuses two shallow feature maps and gives the best segmentation results.
[Figure: FCN-32s / FCN-16s / FCN-8s architecture]

The FCN-8s encoder is a fully convolutional version of VGG-16, with VGG's final fully connected layers replaced by convolutions. The code here is written with the tf1.x framework using the high-level tf.layers API; only the network-definition part is shown.
Original FCN Caffe project: https://github.com/shelhamer/fcn.berkeleyvision.org

import tensorflow as tf

def encode(input):
    # fully convolutional version of VGG-16
    conv1_1 = tf.layers.conv2d(input,  64, 3, padding= 'SAME', activation=tf.nn.relu)
    conv1_2 = tf.layers.conv2d(conv1_1,  64, 3, padding= 'SAME', activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(conv1_2, pool_size=[2, 2], strides=2)
    
    conv2_1 = tf.layers.conv2d(pool1,  128, 3, padding= 'SAME', activation=tf.nn.relu)
    conv2_2 = tf.layers.conv2d(conv2_1,  128, 3, padding= 'SAME', activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(conv2_2, pool_size=[2, 2], strides=2)
    
    conv3_1 = tf.layers.conv2d(pool2,  256, 3, padding= 'SAME', activation=tf.nn.relu)
    conv3_2 = tf.layers.conv2d(conv3_1,  256, 3, padding= 'SAME', activation=tf.nn.relu)
    pool3 = tf.layers.max_pooling2d(conv3_2, pool_size=[2, 2], strides=2)
    
    conv4_1 = tf.layers.conv2d(pool3,  512, 3, padding= 'SAME', activation=tf.nn.relu)
    conv4_2 = tf.layers.conv2d(conv4_1,  512, 3, padding= 'SAME', activation=tf.nn.relu)
    pool4 = tf.layers.max_pooling2d(conv4_2, pool_size=[2, 2], strides=2)
    
    conv5_1 = tf.layers.conv2d(pool4,  512, 3, padding= 'SAME', activation=tf.nn.relu)
    conv5_2 = tf.layers.conv2d(conv5_1,  512, 3, padding= 'SAME', activation=tf.nn.relu)
    pool5 = tf.layers.max_pooling2d(conv5_2, pool_size=[2, 2], strides=2)
    
    fc6 = tf.layers.conv2d(pool5,  4096, 1, padding= 'valid', activation=tf.nn.relu) # with a 128x128 input, pool5 is 4x4, so a 4x4 kernel here would act like VGG's fully connected layer; a 1x1 conv is used instead
    fc6 = tf.layers.dropout(fc6, rate=0.5)  # assign the result, otherwise dropout is a no-op (pass training=True to enable it at train time)
    
    fc7 = tf.layers.conv2d(fc6,  4096, 1, padding= 'valid', activation=tf.nn.relu)
    fc7 = tf.layers.dropout(fc7, rate=0.5)
    
    return pool3, pool4, fc7

    
def decode(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes):
    """
    Create the layers for a fully convolutional network. Build skip connections using the VGG layers.
    :param vgg_layer3_out: TF Tensor for VGG layer 3 output (pool3)
    :param vgg_layer4_out: TF Tensor for VGG layer 4 output (pool4)
    :param vgg_layer7_out: TF Tensor for VGG layer 7 output (fc7)
    :param num_classes: Number of classes to classify
    :return: The Tensor for the last layer of output
    """
    # 1x1 "score" conv mapping fc7 to num_classes channels; arguably it could be
    # skipped, since the transposed conv below also outputs num_classes channels
    layer7_conv = tf.layers.conv2d(vgg_layer7_out, num_classes, 1, 
                                   padding= 'SAME', 
                                   kernel_regularizer= tf.contrib.layers.l2_regularizer(1e-3))
    layer7_trans = tf.layers.conv2d_transpose(layer7_conv, num_classes, 4, 2, 
                                             padding= 'SAME', 
                                             kernel_regularizer= tf.contrib.layers.l2_regularizer(1e-3))
    # this 1x1 conv prepares the VGG feature for fusion with the transposed-conv
    # feature: both operands of the add must have the same channel count, so the
    # VGG feature is mapped to num_classes channels
    layer4_conv = tf.layers.conv2d(vgg_layer4_out, num_classes, 1, 
                                   padding= 'SAME', 
                                   kernel_regularizer= tf.contrib.layers.l2_regularizer(1e-3))
    layer4_out = tf.add(layer7_trans, layer4_conv)
    layer4_trans = tf.layers.conv2d_transpose(layer4_out, num_classes, 4, 2, 
                                             padding= 'SAME', 
                                             kernel_regularizer= tf.contrib.layers.l2_regularizer(1e-3))
    layer3_conv = tf.layers.conv2d(vgg_layer3_out, num_classes, 1, 
                                   padding= 'SAME', 
                                   kernel_regularizer= tf.contrib.layers.l2_regularizer(1e-3))
    layer3_out = tf.add(layer3_conv, layer4_trans)
    
    last_layer = tf.layers.conv2d_transpose(layer3_out, num_classes, 16, 8, 
                                               padding= 'SAME', 
                                               kernel_regularizer= tf.contrib.layers.l2_regularizer(1e-3), name = "last_layer")
    return last_layer    
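For completeness, here is a hedged sketch of how the two functions above could be wired into the per-pixel cross-entropy loss described in the introduction (continuing the tf1.x script above; the placeholder names and the 128x128 shape are illustrative assumptions, not from the original post):

num_classes = 2
images = tf.placeholder(tf.float32, [None, 128, 128, 3])  # assumed input size
labels = tf.placeholder(tf.int32, [None, 128, 128])       # per-pixel class ids

pool3, pool4, fc7 = encode(images)
logits = decode(pool3, pool4, fc7, num_classes)           # same H x W as the input

# per-pixel softmax cross-entropy against the annotation mask
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)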

FCN references:
https://blog.csdn.net/m0_37862527/article/details/79843963
https://blog.csdn.net/weixin_40519315/article/details/104412740
https://zhuanlan.zhihu.com/p/62995971?utm_source=wechat_session
https://blog.csdn.net/qq_36269513/article/details/80420363
https://blog.csdn.net/u013303599/article/details/79231503
GitHub:
https://github.com/shelhamer/fcn.berkeleyvision.org
https://github.com/pierluigiferrari/fcn8s_tensorflow
https://github.com/MarvinTeichmann/tensorflow-fcn/blob/master/fcn8_vgg.py
https://github.com/zijundeng/pytorch-semantic-segmentation/blob/master/models/fcn8s.py

2. U-Net

U-Net fuses features by concatenation, and the network looks like a symmetric U shape. Note, however, that its input and output sizes differ: the input is 572x572 while the output is 388x388, because the convolutions use no padding, so the feature map shrinks after every convolution. A typical segmentation network needs the output to match the input size so that every input pixel gets a class prediction. U-Net's answer to this mismatch is to enlarge the input region: a larger image patch is fed in to predict the segmentation of the smaller central region, which also brings the context around the target region into the input. The missing border data is filled in by the overlap-tile strategy.

[Figure: U-Net architecture]
The figure below illustrates the overlap-tile strategy. Predicting the segmentation in the yellow box requires the image data in the blue box as input; the missing data is extrapolated by mirroring. The white box is the original input image, and the border region around it is generated by mirroring the original; the blue box is then fed to the network to predict the segmentation of the yellow box. Segmentation is done region by region because a full image would be too large for GPU memory.
[Figure: overlap-tile strategy]
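A rough sketch of the overlap-tile idea (PyTorch; the 92-pixel margin is (572 - 388) / 2, the rest of the names are illustrative):

import torch
import torch.nn.functional as F

tile = torch.randn(1, 1, 388, 388)                   # region whose segmentation we want
margin = (572 - 388) // 2                            # 92 context pixels per side
padded = F.pad(tile, (margin,) * 4, mode='reflect')  # mirror the borders -> 1x1x572x572
print(padded.shape)
# feeding `padded` to the network would yield a 388x388 prediction for the central tile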

The U-Net architecture is fairly symmetric: the left half downsamples with convolution plus pooling, and the right half upsamples with convolution plus transposed convolution, finally restoring the features to the input size. The U-Net architecture diagram is very clear, and the network can be built directly from it. The reimplementation below uses PyTorch and differs from the original paper in two places:
(1) The convolutions in this code use padding, whereas the original U-Net convolutions do not, so each convolution there shrinks the feature map by 2 pixels in height and width. To follow the original U-Net exactly, the encoder feature must first be resized to match the decoder feature before the two are concatenated; in PyTorch this can be done by interpolation with torch.nn.functional.upsample() (now named torch.nn.functional.interpolate()), as shown in the sketch after this list. Some argue that repeatedly convolving over padded feature maps lets the error introduced by the padded border grow: the deeper the network, the more abstract the features and the more they are affected by padding.
(2) The original paper does not use batch normalization; the code here adds it.
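A minimal sketch of two ways to match a larger encoder feature to a smaller decoder feature before concatenation (shapes made up for illustration):

import torch
import torch.nn.functional as F

enc = torch.randn(1, 64, 64, 64)   # encoder feature (valid convs leave it larger)
dec = torch.randn(1, 64, 56, 56)   # upsampled decoder feature

# option 1 (original paper): center-crop the encoder feature to the decoder size
dh, dw = enc.size(2) - dec.size(2), enc.size(3) - dec.size(3)
enc_crop = enc[:, :, dh // 2 : dh // 2 + dec.size(2), dw // 2 : dw // 2 + dec.size(3)]

# option 2 (mentioned above): interpolate instead of cropping
enc_resized = F.interpolate(enc, size=dec.shape[2:], mode='bilinear', align_corners=False)

fused = torch.cat([enc_crop, dec], dim=1)  # 1 x 128 x 56 x 56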

import torch
import torch.nn as nn

class convBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(convBlock, self).__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels), # the original paper does not mention batch normalization
            nn.ReLU(),

            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            
            # pooling is applied outside this block, in uNet.forward
        )
    def forward(self, x):
        x = self.cnn(x)
        return x

class upSampling(nn.Module):
    def __init__(self, in_channels, middle_channels, out_channels):
        super(upSampling, self).__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(middle_channels),
            nn.ReLU(),

            nn.Conv2d(middle_channels, middle_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(middle_channels),
            nn.ReLU(),
            
            # stride 2 doubles the spatial size (the stride is the upsampling factor)
            nn.ConvTranspose2d(middle_channels, out_channels, kernel_size=2, stride=2) 
        )
    def forward(self, x):
        x = self.cnn(x)
        return x
    

class uNet(nn.Module):
    def __init__(self, num_classes):
        super(uNet, self).__init__()
        self.enCode1 = convBlock(in_channels=3, out_channels=64)
        self.enCode2 = convBlock(in_channels=64, out_channels=128)
        self.enCode3 = convBlock(in_channels=128, out_channels=256)
        self.enCode4 = convBlock(in_channels=256, out_channels=512)
        self.Maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.deCode1 = upSampling(in_channels=512, middle_channels=1024, out_channels=512)
        self.deCode2 = upSampling(in_channels=1024, middle_channels=512, out_channels=256)
        self.deCode3 = upSampling(in_channels=512, middle_channels=256, out_channels=128)
        self.deCode4 = upSampling(in_channels=256, middle_channels=128, out_channels=64)
        
        self.lastLayer = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),

            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            
            nn.Conv2d(64, num_classes, kernel_size=1) # output channels = number of annotation classes
        )
        
    def forward(self, x):
        enc1 = self.enCode1(x)
        enc1_pool = self.Maxpool(enc1)
        enc2 = self.enCode2(enc1_pool)
        enc2_pool = self.Maxpool(enc2)
        enc3 = self.enCode3(enc2_pool)
        enc3_pool = self.Maxpool(enc3)
        enc4 = self.enCode4(enc3_pool)
        enc4_pool = self.Maxpool(enc4)
        
        dec1 = self.deCode1(enc4_pool)
        dec2 = self.deCode2(torch.cat((dec1, enc4), dim=1))
        dec3 = self.deCode3(torch.cat((dec2, enc3), dim=1))
        dec4 = self.deCode4(torch.cat((dec3, enc2), dim=1))
        
        out = self.lastLayer(torch.cat((dec4, enc1), dim=1))
        return out
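A quick shape check for the model above (any input size divisible by 16 should work, since there are four 2x poolings):

model = uNet(num_classes=2)
x = torch.randn(1, 3, 128, 128)
out = model(x)
print(out.shape)  # torch.Size([1, 2, 128, 128])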

U-Net references:
https://zhuanlan.zhihu.com/p/31428783
https://zhuanlan.zhihu.com/p/118540575
https://zhuanlan.zhihu.com/p/87593567
https://blog.csdn.net/l2181265/article/details/87735610
https://www.yuque.com/yahei/hey-yahei/segmentation
GitHub:
https://github.com/zijundeng/pytorch-semantic-segmentation/blob/master/models/u_net.py
https://github.com/LeeJunHyun/Image_Segmentation
