FCN and U-Net Semantic Segmentation Summary (with Code)

炼丹术师

Tags: computer vision, convolutional neural networks, TensorFlow, deep learning

Article link: https://blog.csdn.net/Never__Say__No/article/details/109090851

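FCN and U-Net were both published in 2015. Both follow an encoder-decoder design: the network first encodes the image, then decodes it into a feature map with the same size as the input, and the loss is computed between every point of that feature map and the corresponding pixel of the annotated mask. The main difference lies in how features are fused: FCN adds the feature maps element-wise, while U-Net stacks (concatenates) them along the channel dimension. This post reimplements FCN in TensorFlow and U-Net in PyTorch.
I also found a collection of semantic segmentation code on GitHub: https://github.com/mrgloom/awesome-semantic-segmentation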
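
The two fusion styles can be compared on shapes alone. The following is a minimal sketch, assuming NumPy and two made-up feature maps in NCHW layout (the names `deep` and `shallow` and the sizes are illustrative, not taken from either paper):

```python
import numpy as np

# Two hypothetical feature maps in NCHW layout: batch 1, 8 channels, 16x16.
deep = np.random.rand(1, 8, 16, 16).astype(np.float32)     # upsampled deep feature
shallow = np.random.rand(1, 8, 16, 16).astype(np.float32)  # skip-connection feature

# FCN-style fusion: element-wise sum, channel count unchanged.
fused_add = deep + shallow

# U-Net-style fusion: stack along the channel axis, channel count doubles.
fused_cat = np.concatenate([deep, shallow], axis=1)

print(fused_add.shape)
print(fused_cat.shape)
```

Addition requires both inputs to have the same number of channels, which is why the FCN decoder below first maps the skip features to `num_classes` channels. Concatenation has no such requirement, but the layer that follows has to accept the combined channel count.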

1. FCN

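This section focuses on FCN-8s. The FCN paper builds three network variants: FCN-32s, FCN-16s, and FCN-8s. FCN-32s fuses no shallow features and upsamples the deep features directly; FCN-16s fuses one shallow layer; FCN-8s fuses two shallow layers and gives the best segmentation results.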
[Figure: FCN network structure]
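
The difference between the three variants comes down to which pooling outputs are fused and how the total ×32 upsampling is split. A rough sketch that tracks only the spatial size, assuming a stride-32 VGG encoder and an input size divisible by 32 (the function names are hypothetical):

```python
def fcn_upsampling_plan(variant):
    """Return (fused skip layers, upsampling factors) for an FCN variant."""
    plans = {
        "FCN-32s": ([], [32]),                      # no skip connection
        "FCN-16s": (["pool4"], [2, 16]),            # x2, add pool4, x16
        "FCN-8s": (["pool4", "pool3"], [2, 2, 8]),  # x2, add pool4, x2, add pool3, x8
    }
    return plans[variant]


def output_size(input_size, variant):
    """Spatial size after the stride-32 encoder followed by the decoder."""
    size = input_size // 32  # five 2x2 poolings in VGG
    _, factors = fcn_upsampling_plan(variant)
    for factor in factors:
        size *= factor
    return size


for name in ("FCN-32s", "FCN-16s", "FCN-8s"):
    skips, factors = fcn_upsampling_plan(name)
    print(name, skips, factors, output_size(224, name))
```

All three variants return to the input resolution; what changes is how much fine detail from `pool4` and `pool3` reaches the output before the final upsampling step.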

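The FCN-8s encoder is a fully convolutional version of VGG-16, in which VGG's original fully connected layers are replaced by convolutional layers. The code here is written for TF 1.x and builds the network with the high-level tf.layers API; only the network definition is shown.
Original FCN Caffe project: https://github.com/shelhamer/fcn.berkeleyvision.org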

import tensorflow as tf  # written for TF 1.x


def encode(inputs, training=False):
    # Fully convolutional, VGG-style encoder.
    # Note: VGG-16 has three conv layers in blocks 3-5; this listing uses two per block.
    conv1_1 = tf.layers.conv2d(inputs, 64, 3, padding='SAME', activation=tf.nn.relu)
    conv1_2 = tf.layers.conv2d(conv1_1, 64, 3, padding='SAME', activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(conv1_2, pool_size=[2, 2], strides=2)

    conv2_1 = tf.layers.conv2d(pool1, 128, 3, padding='SAME', activation=tf.nn.relu)
    conv2_2 = tf.layers.conv2d(conv2_1, 128, 3, padding='SAME', activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(conv2_2, pool_size=[2, 2], strides=2)

    conv3_1 = tf.layers.conv2d(pool2, 256, 3, padding='SAME', activation=tf.nn.relu)
    conv3_2 = tf.layers.conv2d(conv3_1, 256, 3, padding='SAME', activation=tf.nn.relu)
    pool3 = tf.layers.max_pooling2d(conv3_2, pool_size=[2, 2], strides=2)

    conv4_1 = tf.layers.conv2d(pool3, 512, 3, padding='SAME', activation=tf.nn.relu)
    conv4_2 = tf.layers.conv2d(conv4_1, 512, 3, padding='SAME', activation=tf.nn.relu)
    pool4 = tf.layers.max_pooling2d(conv4_2, pool_size=[2, 2], strides=2)

    conv5_1 = tf.layers.conv2d(pool4, 512, 3, padding='SAME', activation=tf.nn.relu)
    conv5_2 = tf.layers.conv2d(conv5_1, 512, 3, padding='SAME', activation=tf.nn.relu)
    pool5 = tf.layers.max_pooling2d(conv5_2, pool_size=[2, 2], strides=2)

    # fc6 and fc7 as convolutions. With a 128x128 input, pool5 is 4x4;
    # the 1x1 kernels keep that size so decode() can upsample it.
    fc6 = tf.layers.conv2d(pool5, 4096, 1, padding='valid', activation=tf.nn.relu)
    fc6 = tf.layers.dropout(fc6, rate=0.5, training=training)  # active only when training=True

    fc7 = tf.layers.conv2d(fc6, 4096, 1, padding='valid', activation=tf.nn.relu)
    fc7 = tf.layers.dropout(fc7, rate=0.5, training=training)

    return pool3, pool4, fc7

    
def decode(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes):
    """
    Create the layers for a fully convolutional network.  Build skip-layers using the vgg layers.
    :param vgg_layer3_out: TF Tensor for VGG Layer 3 output
    :param vgg_layer4_out: TF Tensor for VGG Layer 4 output
    :param vgg_layer7_out: TF Tensor for VGG Layer 7 output
    :param num_classes: Number of classes to classify
    :return: The Tensor for the last layer of output
    """
    regularizer = tf.contrib.layers.l2_regularizer(1e-3)

    # A 1x1 conv on vgg_layer7_out (down to num_classes channels) could be applied
    # first, but it seems unnecessary: the transposed conv below already maps
    # fc7 to num_classes channels while upsampling by 2.
    layer7_trans = tf.layers.conv2d_transpose(vgg_layer7_out, num_classes, 4, 2,
                                              padding='SAME',
                                              kernel_regularizer=regularizer)
    # 1x1 conv so the pool4 feature has num_classes channels and can be added
    # to the upsampled feature (addition needs matching dimensions).
    layer4_conv = tf.layers.conv2d(vgg_layer4_out, num_classes, 1,
                                   padding='SAME',
                                   kernel_regularizer=regularizer)
    layer4_out = tf.add(layer7_trans, layer4_conv)
    layer4_trans = tf.layers.conv2d_transpose(layer4_out, num_classes, 4, 2,
                                              padding='SAME',
                                              kernel_regularizer=regularizer)
    layer3_conv = tf.layers.conv2d(vgg_layer3_out, num_classes, 1,
                                   padding='SAME',
                                   kernel_regularizer=regularizer)
    layer3_out = tf.add(layer3_conv, layer4_trans)

    # Final x8 upsampling back to the input resolution.
    last_layer = tf.layers.conv2d_transpose(layer3_out, num_classes, 16, 8,
                                            padding='SAME',
                                            kernel_regularizer=regularizer,
                                            name="last_layer")
    return last_layer
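
The decoder output has one score per class at every pixel, and the loss mentioned in the introduction is computed pixel by pixel against the mask. Below is a minimal NumPy sketch of a pixel-wise softmax cross-entropy, assuming NHWC logits and an integer class mask (the function name `pixelwise_cross_entropy` is hypothetical; in TF 1.x the corresponding op would typically be `tf.nn.sparse_softmax_cross_entropy_with_logits`):

```python
import numpy as np


def pixelwise_cross_entropy(logits, mask):
    """Mean softmax cross-entropy over all pixels.

    logits: float array [N, H, W, num_classes], e.g. the decoder output
    mask:   int array   [N, H, W] holding a class index per pixel
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the labelled class at each pixel.
    picked = np.take_along_axis(log_probs, mask[..., None], axis=-1)
    return float(-picked.mean())


logits = np.zeros((1, 4, 4, 3), dtype=np.float32)  # uniform prediction over 3 classes
mask = np.zeros((1, 4, 4), dtype=np.int64)         # every pixel labelled class 0
print(pixelwise_cross_entropy(logits, mask))       # close to ln(3) for a uniform prediction
```

The L2 regularization terms attached to the layers above would be added to this data term separately.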

FCN references:
https://blog.csdn.net/m0_37862527/article/details/79843963
https://blog.csdn.net/weixin_40519315/article/details/104412740
https://zhuanlan.zhihu.com/p/62995971?utm_source=wechat_session
https://blog.csdn.net/qq_36269513/article/details/80420363
https://blog.csdn.net/u013303599/article/details/79231503
github:
https://github.com/shelhamer/fcn.berkeleyvision.org
https://github.com/pierluigiferrari/fcn8s_tensorflow
https://github.com/MarvinTeichmann/tensorflow-fcn/blob/master/fcn8_vgg.py
https://github.com/zijundeng/pytorch-semantic-segmentation/blob/master/models/fcn8s.py

2. U-Net

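U-Net fuses features by concatenation, and the network looks like a symmetric U. Note, however, that its input and output sizes differ: the input is 572×572 while the output is 388×388. The reason is that the convolutions use no padding, so the feature map shrinks after every convolution. A segmentation network normally needs the output to be the same size as the input so that every input pixel gets a class. U-Net handles the mismatch by enlarging the region fed to the network: a larger image patch is used to predict the segmentation of its smaller central region, which also brings in the context around the target region. Where the enlarged patch extends past the image, the missing data is filled in with the overlap-tile strategy.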
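
The 572 to 388 shrinkage follows from simple arithmetic: each unpadded 3×3 convolution removes 2 pixels, each pooling halves the size, and each up-convolution doubles it. A short sketch that traces the size, assuming four pooling levels and two convolutions per level as in the U-Net paper (the function name is hypothetical):

```python
def unet_valid_output_size(input_size, depth=4):
    """Trace the spatial size through a U-Net with unpadded 3x3 convolutions."""
    size = input_size
    # Contracting path: two 3x3 valid convs (-2 each), then 2x2 max pooling.
    for _ in range(depth):
        size -= 4
        if size % 2:
            raise ValueError("feature map size must be even before pooling")
        size //= 2
    size -= 4  # two convs at the bottom of the U
    # Expanding path: 2x up-convolution, then two 3x3 valid convs.
    for _ in range(depth):
        size = size * 2 - 4
    return size


out = unet_valid_output_size(572)
print(out, (572 - out) // 2)  # output size and the border lost on each side
```

The border lost on each side is the amount of extra context that has to be supplied around every region to be segmented.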

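[Figure: U-Net architecture]
The figure below illustrates the overlap-tile strategy. Predicting the segmentation inside the yellow box requires the image data inside the blue box as input, and missing data is extrapolated by mirroring. The white box marks the original input image; the border region outside it is produced by mirroring the original. The blue box is then fed to the network to predict the segmentation of the yellow box. (The image is segmented region by region because a whole large image would not fit in GPU memory.)
[Figure: overlap-tile strategy]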
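
The overlap-tile idea can be sketched with NumPy's mirror padding. This is an illustrative sketch only: the tile sizes are the 572/388 values from the paper, while the function name `overlap_tile_inputs`, the single-channel image, and the choice of `mode="reflect"` for the mirroring are assumptions.

```python
import numpy as np

IN_TILE, OUT_TILE = 572, 388        # input/output sizes from the U-Net paper
BORDER = (IN_TILE - OUT_TILE) // 2  # 92 pixels of context on each side


def overlap_tile_inputs(image):
    """Yield (row, col, input_tile) for every output tile of a 2-D image.

    The image is mirror-padded so that each OUT_TILE x OUT_TILE output
    region has a full IN_TILE x IN_TILE input window around it.
    """
    h, w = image.shape
    # Extra padding that rounds the image up to a whole number of output tiles.
    pad_h = -h % OUT_TILE
    pad_w = -w % OUT_TILE
    padded = np.pad(
        image,
        ((BORDER, BORDER + pad_h), (BORDER, BORDER + pad_w)),
        mode="reflect",
    )
    for r in range(0, h, OUT_TILE):
        for c in range(0, w, OUT_TILE):
            # (r, c) is the top-left corner of the output tile in the original image.
            yield r, c, padded[r:r + IN_TILE, c:c + IN_TILE]


image = np.arange(500 * 700, dtype=np.float32).reshape(500, 700)
tiles = list(overlap_tile_inputs(image))
print(len(tiles), tiles[0][2].shape)
```

Each yielded tile plays the role of the blue box; the network's prediction for it would be written back at `(row, col)` as the yellow box, cropped where it runs past the image edge.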

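U-Net is fairly symmetric: the left side downsamples with convolutions and pooling, and the right side upsamples with convolutions and transposed convolutions, finally restoring the features to the original image size. The architecture diagram from the U-Net paper is very clear, and the network can be built by following it directly. The network is reimplemented here in PyTorch, and the code differs from the paper in two places:
(1) The convolutions here use padding, whereas the paper's do not, so in the paper each 3×3 convolution shrinks the height and width by 2 pixels. To follow the paper exactly, the left-side (encoder) feature map must be brought to the size of the right-side (decoder) feature map before the two are concatenated. The paper does this by cropping; interpolating with torch.nn.functional.interpolate() (which replaces the deprecated torch.nn.functional.upsample()) is another option. Some argue that repeatedly convolving padded feature maps makes the error introduced by padding grow, because the deeper the layer, the more abstract the features and the more easily they are affected by padding.
(2) The paper does not use batch normalization; the code here adds it.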
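
For the unpadded variant described in (1), the size matching before concatenation can be done with a center crop. A minimal NumPy sketch, assuming NCHW feature maps and using the 64×64 encoder and 56×56 decoder sizes that arise at the deepest skip connection of the 572×572 network (the helper names are hypothetical; with PyTorch tensors the same slicing works and `torch.cat` would replace `np.concatenate`):

```python
import numpy as np


def center_crop(feature, target_h, target_w):
    """Crop an NCHW feature map around its center to target_h x target_w."""
    _, _, h, w = feature.shape
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return feature[:, :, top:top + target_h, left:left + target_w]


def crop_and_concat(encoder_feat, decoder_feat):
    """Crop the encoder feature to the decoder's size, then stack channels."""
    _, _, h, w = decoder_feat.shape
    cropped = center_crop(encoder_feat, h, w)
    return np.concatenate([cropped, decoder_feat], axis=1)


encoder_feat = np.zeros((1, 512, 64, 64), dtype=np.float32)  # before the 4th pooling
decoder_feat = np.zeros((1, 512, 56, 56), dtype=np.float32)  # after the 1st up-convolution
print(crop_and_concat(encoder_feat, decoder_feat).shape)
```

Cropping keeps the encoder values untouched, whereas interpolation would resample them; the padded implementation below avoids the issue entirely because both feature maps already have the same size.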

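import torch
import torch.nn as nn


class convBlock(nn.Module):
    """Two 3x3 conv + BatchNorm + ReLU layers; padding=1 keeps the spatial size."""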
    def __init__(self, in_channels, out_channels):
        super(convBlock, self).__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),  # the original paper does not mention batch normalization
            nn.ReLU(),

            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            
            # Max pooling is applied in uNet.forward so that the pre-pooling
            # feature can be reused as a skip connection.
        )
    def forward(self, x):
        x = self.cnn(x)
        return x

class upSampling(nn.Module):
    def __init__(self, in_channels, middle_channels, out_channels):
        super(upSampling, self).__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(middle_channels),
            nn.ReLU(),

            nn.Conv2d(middle_channels, middle_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(middle_channels),
            nn.ReLU(),
            
            # stride is the step size, i.e. the upsampling factor
            nn.ConvTranspose2d(middle_channels, out_channels, kernel_size=2, stride=2) 
        )
    def forward(self, x):
        x = self.cnn(x)
        return x
    

class uNet(nn.Module):
    def __init__(self, num_classes):
        super(uNet, self).__init__()
        self.enCode1 = convBlock(in_channels=3, out_channels=64)
        self.enCode2 = convBlock(in_channels=64, out_channels=128)
        self.enCode3 = convBlock(in_channels=128, out_channels=256)
        self.enCode4 = convBlock(in_channels=256, out_channels=512)
        self.Maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.deCode1 = upSampling(in_channels=512, middle_channels=1024, out_channels=512)
        self.deCode2 = upSampling(in_channels=1024, middle_channels=512, out_channels=256)
        self.deCode3 = upSampling(in_channels=512, middle_channels=256, out_channels=128)
        self.deCode4 = upSampling(in_channels=256, middle_channels=128, out_channels=64)
        
        self.lastLayer = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),

            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            
            nn.Conv2d(64, num_classes, kernel_size=1)  # output channels = number of annotated classes
        )
        
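    def forward(self, x):
        # x: [N, 3, H, W]; H and W must be divisible by 16 (four 2x2 poolings)
        # so that the upsampled features match the skip features in size.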
        enc1 = self.enCode1(x)
        enc1_pool = self.Maxpool(enc1)
        enc2 = self.enCode2(enc1_pool)
        enc2_pool = self.Maxpool(enc2)
        enc3 = self.enCode3(enc2_pool)
        enc3_pool = self.Maxpool(enc3)
        enc4 = self.enCode4(enc3_pool)
        enc4_pool = self.Maxpool(enc4)
        
        dec1 = self.deCode1(enc4_pool)
        dec2 = self.deCode2(torch.cat((dec1, enc4), dim=1))
        dec3 = self.deCode3(torch.cat((dec2, enc3), dim=1))
        dec4 = self.deCode4(torch.cat((dec3, enc2), dim=1))
        
        out = self.lastLayer(torch.cat((dec4, enc1), dim=1))
        return out

U-Net references:
https://zhuanlan.zhihu.com/p/31428783
https://zhuanlan.zhihu.com/p/118540575
https://zhuanlan.zhihu.com/p/87593567
https://blog.csdn.net/l2181265/article/details/87735610
https://www.yuque.com/yahei/hey-yahei/segmentation
github:
https://github.com/zijundeng/pytorch-semantic-segmentation/blob/master/models/u_net.py
https://github.com/LeeJunHyun/Image_Segmentation
