[PaddlePaddle] Image Segmentation: From FCN to UNet in One Article

This article introduces image segmentation and the basic principles of the FCN, VGG, and UNet networks and how they relate to one another. It focuses on how the UNet architecture builds on the ideas of FCN to improve segmentation accuracy, and provides an implementation of the UNet model based on the Paddle framework.

PaddlePaddle official image segmentation course:

https://aistudio.baidu.com/aistudio/course/introduce/1767

Image Segmentation

Image segmentation is the technique and process of partitioning an image into a number of specific regions with distinct properties and extracting the objects of interest. It is the key step from image processing to image analysis. Existing segmentation methods fall mainly into the following categories: threshold-based methods, region-based methods, edge-based methods, and methods based on specific theories. From a mathematical point of view, image segmentation is the process of dividing a digital image into mutually disjoint regions. It is also a labeling process: pixels belonging to the same region are assigned the same label.
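
As a concrete (if simplistic) illustration of this labeling view, a threshold-based method can be sketched in a few lines of NumPy; the toy image and the threshold value below are made up purely for illustration.

import numpy as np

# A toy 4x4 grayscale "image"; the pixel values and the threshold are made up for illustration.
img = np.array([[ 10,  20, 200, 210],
                [ 15,  25, 220, 230],
                [ 12, 180, 190,  30],
                [ 11, 170, 160,  20]], dtype=np.uint8)

# Threshold-based segmentation: every pixel above the threshold gets label 1 (foreground),
# the rest get label 0 (background) -- pixels in the same region share the same label.
threshold = 128
mask = (img > threshold).astype(np.int32)
print(mask)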

Fully Convolutional Networks (FCN)

What is FCN?

FCN stands for Fully Convolutional Networks: in short, the network is fully convolutional, with no FC (fully connected) layers.
Semantic segmentation is, at its core, pixel-level classification.
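
A minimal sketch of what "pixel-level classification" means in practice (the shapes here are examples, not taken from the article's code): a segmentation network outputs one score per class for every pixel, and taking the argmax over the class dimension yields a per-pixel label map.

import numpy as np

# Example shapes only: batch of 1, 4 classes, 6x6 "image".
num_classes, h, w = 4, 6, 6
logits = np.random.rand(1, num_classes, h, w)   # per-pixel class scores from a network

# Pixel-level classification: pick the highest-scoring class at every pixel.
label_map = logits.argmax(axis=1)               # shape (1, 6, 6), one class id per pixel
print(label_map.shape, label_map.min(), label_map.max())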

Relationship to VGG

The classic VGG16 convolutional network, with its 2-2-3-3-3 stack of convolutional layers, performs very well at image recognition.
[Figure: the VGG16 architecture]
The FCN, in turn, builds on the traditional VGG network by replacing the FC layers with 1x1 convolutions, so that the classifier still produces a spatial map of class scores rather than a single vector; this score map can then be upsampled (enlarged) back to the input resolution. Because a 1x1 convolution only changes the channel dimension while preserving the spatial layout, an FCN can be made to satisfy input.shape == output.shape in the spatial dimensions.
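
The equivalence this paragraph relies on can be sketched as follows (a minimal NumPy example, not taken from the article's code): a fully connected layer over C-channel features behaves like a 1x1 convolution, but the convolutional form keeps the spatial dimensions, so it can be applied to feature maps of any size.

import numpy as np

# A 1x1 convolution is just a per-pixel matrix multiply over the channel dimension.
c_in, c_out, h, w = 512, 21, 7, 7            # e.g. VGG features -> 21 class scores
feat = np.random.rand(c_in, h, w)            # a feature map
weight = np.random.rand(c_out, c_in)         # the same matrix an FC layer would use

# "FC as 1x1 conv": apply the weight matrix independently at every spatial position.
scores = np.einsum('oc,chw->ohw', weight, feat)
print(scores.shape)                           # (21, 7, 7): spatial layout preserved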

The input image is passed through the VGG16 backbone to obtain a feature map, which is then upsampled back to the input size. Each pixel of the prediction is then matched one-to-one against the ground truth and classified, i.e. classification is done at the pixel level. In other words, the segmentation problem is turned into a classification problem, and classification is exactly what deep learning is good at. If the final feature map were simply upsampled or deconvolved on its own, a lot of information would clearly be lost. FCN's remedy is to fuse pool4, pool3, and the final feature map; since pool3, pool4, and the final feature map have different sizes, they must first be upsampled to a common size before fusion. In the original FCN this fusion is an element-wise addition of the upsampled score maps (channel-wise concatenation is the strategy UNet adopts later).
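
A rough sketch of this FCN-8s-style fusion, using nearest-neighbour repetition as a NumPy stand-in for the learned transposed convolutions; the class count and feature-map sizes are illustrative only.

import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling as a stand-in for a learned transposed convolution.
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

num_classes = 21
score_final = np.random.rand(num_classes,  7,  7)   # 1x1-conv class scores on the last feature map
score_pool4 = np.random.rand(num_classes, 14, 14)   # class scores predicted from pool4
score_pool3 = np.random.rand(num_classes, 28, 28)   # class scores predicted from pool3

# FCN-8s-style fusion: upsample the coarser map and add it to the finer one, element-wise.
fuse4 = score_pool4 + upsample2x(score_final)        # 14x14
fuse3 = score_pool3 + upsample2x(fuse4)              # 28x28
print(fuse3.shape)   # the fused map is then upsampled 8x back to the input resolution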

The UNet Network

Relationship to FCN

UNet draws on the same idea as FCN: downsample the input and then upsample it again, so UNet is also a fully convolutional network. To address FCN's weakness that its segmentation is not fine-grained enough, however, UNet relies on a carefully designed network structure to improve accuracy.

[Figure: the UNet architecture]
As the figure shows, the UNet architecture is shaped like a "U", which is where the name comes from. It can be intuitively split into two parts: a feature-extraction (contracting) path, similar to VGG, and an upsampling (expanding) path.
Each time the feature map is upsampled, it is fused with the encoder feature map of the corresponding scale (with the same number of channels). Compared with FCN, its segmentation quality is very good; UNet also suits large images and performs particularly well on medical images.

Implementing a UNet Model in Paddle

import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph import to_variable
from paddle.fluid.dygraph import Layer
from paddle.fluid.dygraph import Conv2D
from paddle.fluid.dygraph import BatchNorm
from paddle.fluid.dygraph import Pool2D
from paddle.fluid.dygraph import Conv2DTranspose


class Encoder(Layer):
    def __init__(self, num_channels, num_filters):
        super(Encoder, self).__init__()
        # Encoder block: two (3x3 conv + BN + ReLU) stages followed by a 2x2 max pool.
        # Returns the features both before and after pooling; the pre-pool features
        # serve as the skip connection to the decoder.
        self.conv1 = Conv2D(num_channels,
                            num_filters,
                            filter_size=3,
                            stride=1,
                            padding=1)
        self.bn1 = BatchNorm(num_filters, act='relu')

        self.conv2 = Conv2D(num_filters,
                            num_filters,
                            filter_size=3,
                            stride=1,
                            padding=1)
        self.bn2 = BatchNorm(num_filters, act='relu')

        self.pool = Pool2D(pool_size=2, pool_stride=2, pool_type='max', ceil_mode=True)

    def forward(self, inputs):
        # two conv + BN + ReLU stages, then 2x2 max pooling
        x = self.conv1(inputs)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x_pooled = self.pool(x)

        return x, x_pooled


class Decoder(Layer):
    def __init__(self, num_channels, num_filters):
        super(Decoder, self).__init__()
        # Decoder block: one 2x2 transposed conv (doubles the spatial size),
        # followed by two (3x3 conv + BN + ReLU) stages.

        self.up = Conv2DTranspose(num_channels=num_channels,
                                  num_filters=num_filters,
                                  filter_size=2,
                                  stride=2)

        self.conv1 = Conv2D(num_channels,
                            num_filters,
                            filter_size=3,
                            stride=1,
                            padding=1)
        self.bn1 = BatchNorm(num_filters, act='relu')

        self.conv2 = Conv2D(num_filters,
                            num_filters,
                            filter_size=3,
                            stride=1,
                            padding=1)
        self.bn2 = BatchNorm(num_filters, act='relu')

    def forward(self, inputs_prev, inputs):
        # Upsample, pad to match the skip connection, concatenate along channels,
        # then two conv + BN + ReLU stages.
        x = self.up(inputs)
        h_diff = (inputs_prev.shape[2] - x.shape[2])
        w_diff = (inputs_prev.shape[3] - x.shape[3])
        x = fluid.layers.pad2d(x, paddings=[h_diff // 2, h_diff - h_diff // 2, w_diff // 2, w_diff - w_diff // 2])
        x = fluid.layers.concat([inputs_prev, x], axis=1)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.bn2(x)

        return x


class UNet(Layer):
    def __init__(self, num_classes=59):
        super(UNet, self).__init__()
        # encoder: 3->64->128->256->512
        # mid: 512->1024->1024

        # 4 encoders, 4 decoders, and a middle block of two (1x1 conv + BN + ReLU) layers
        self.down1 = Encoder(num_channels=3, num_filters=64)
        self.down2 = Encoder(num_channels=64, num_filters=128)
        self.down3 = Encoder(num_channels=128, num_filters=256)
        self.down4 = Encoder(num_channels=256, num_filters=512)

        self.mid_conv1 = Conv2D(512, 1024, filter_size=1, padding=0, stride=1)
        self.mid_bn1 = BatchNorm(1024, act='relu')
        self.mid_conv2 = Conv2D(1024, 1024, filter_size=1, padding=0, stride=1)
        self.mid_bn2 = BatchNorm(1024, act='relu')

        self.up4 = Decoder(1024, 512)
        self.up3 = Decoder(512, 256)
        self.up2 = Decoder(256, 128)
        self.up1 = Decoder(128, 64)

        self.last_conv = Conv2D(num_channels=64, num_filters=num_classes, filter_size=1)

    def forward(self, inputs):
        x1, x = self.down1(inputs)
        print(x1.shape, x.shape)
        x2, x = self.down2(x)
        print(x2.shape, x.shape)
        x3, x = self.down3(x)
        print(x3.shape, x.shape)
        x4, x = self.down4(x)
        print(x4.shape, x.shape)

        # middle layers
        x = self.mid_conv1(x)
        x = self.mid_bn1(x)
        x = self.mid_conv2(x)
        x = self.mid_bn2(x)

        print(x4.shape, x.shape)
        x = self.up4(x4, x)
        print(x3.shape, x.shape)
        x = self.up3(x3, x)
        print(x2.shape, x.shape)
        x = self.up2(x2, x)
        print(x1.shape, x.shape)
        x = self.up1(x1, x)
        print(x.shape)

        x = self.last_conv(x)

        return x


def main():
    with fluid.dygraph.guard(fluid.CUDAPlace(0)):
        model = UNet(num_classes=59)
        x_data = np.random.rand(1, 3, 123, 123).astype(np.float32)
        inputs = to_variable(x_data)
        pred = model(inputs)

        print(pred.shape)


if __name__ == "__main__":
    main()
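
A few notes on this implementation: the middle (bottleneck) block here uses two 1x1 convolutions, whereas the original UNet paper uses 3x3 convolutions at the bottleneck; either way the overall encoder-decoder shape is the same. Because the encoder pools with ceil_mode=True, odd-sized inputs (such as the 123x123 tensor in main()) produce feature maps whose sizes can differ by a pixel between the contracting and expanding paths, which is what the pad2d call in the decoder is there to reconcile; choosing input sizes that are multiples of 16 (e.g. 128 or 256) avoids the mismatch altogether. Finally, the code is written against the older paddle.fluid dynamic-graph API; on Paddle 2.x the equivalent layers live under paddle.nn (Conv2D, BatchNorm2D, MaxPool2D, Conv2DTranspose), so some renaming would be needed to run it on a recent release.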

References

PaddlePaddle official image segmentation course: https://aistudio.baidu.com/aistudio/course/introduce/1767
Notes compiled by community members
