FCN (32s, 16s, 8s) for Image Semantic Segmentation: Principles and MindSpore Implementation

1. FCN Network Structure

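A Fully Convolutional Network (FCN) is one of the earliest neural networks used for image semantic segmentation. As the name suggests, the main body of an FCN consists entirely of convolutional layers; in the image domain, convolution is a very effective way to extract features. In essence, image segmentation is a classification task: every pixel of the image is classified according to the manual annotation.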

The overall FCN structure is as follows:

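The model in the figure above targets the VOC dataset with 21 segmentation classes, i.e. the dataset contains 21 different segmentation categories. When an image enters the network, the first convolutional layer converts the 3-channel image into a 96-channel feature map, the second produces 256 channels, the third 384 channels, and so on until the last convolutional layer outputs 21 channels, one per segmentation class. In practice, the channel counts of the convolutional layers can be adjusted for the task at hand. Each of the first five stages downsamples the feature map by halving its width and height, so after five stages the feature map is 1/32 of the input size; finally, transposed-convolution (deconvolution) layers restore the feature map to the width and height of the input image.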
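The 1/32 figure follows from five successive halvings. A minimal sketch of that arithmetic, assuming a hypothetical 512x512 input (the helper name `downsampled_sizes` is made up for illustration and is not part of the original scripts):

```python
def downsampled_sizes(size, stages=5):
    """Return the feature-map side length after each 2x downsampling stage."""
    sizes = []
    for _ in range(stages):
        size //= 2  # each stage halves width and height
        sizes.append(size)
    return sizes


sizes = downsampled_sizes(512)
print(sizes)             # [256, 128, 64, 32, 16]
print(512 // sizes[-1])  # 32, i.e. the last map is 1/32 of the input side
```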

2. Implementing the FCN Model Structure

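Depending on the desired segmentation granularity, FCN can be built as FCN32s, FCN16s, FCN8s, and so on. 32s means the output is restored to the input size directly from the 32x-downsampled feature map, while 16s and 8s first fuse the coarse prediction with the 16x- and 8x-downsampled feature maps and restore the input size from there; 4s and 2s structures are also possible. The smaller the number, the more transposed-convolution layers are used for upsampling and the more complex the model becomes; in theory the segmentation is correspondingly finer. Here the models are built with the deep learning framework MindSpore.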
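To make the difference between the variants concrete, the following NumPy sketch mimics the FCN32s and FCN16s output paths. It is only an illustration under stated assumptions: nearest-neighbour repetition stands in for the learned transposed convolutions, the score maps are random, the 512x512 input size is assumed, and `upsample` is a hypothetical helper.

```python
import numpy as np


def upsample(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


rng = np.random.default_rng(0)
n_class, size = 21, 512
score_1_32 = rng.normal(size=(n_class, size // 32, size // 32))  # coarsest scores
score_1_16 = rng.normal(size=(n_class, size // 16, size // 16))  # skip branch

# FCN32s: upsample the coarsest score map straight back to the input size.
out_32s = upsample(score_1_32, 32)

# FCN16s: upsample 2x, fuse with the 1/16 scores, then upsample 16x.
fused = upsample(score_1_32, 2) + score_1_16
out_16s = upsample(fused, 16)

print(out_32s.shape, out_16s.shape)  # (21, 512, 512) (21, 512, 512)
```

FCN8s repeats the fusion once more with the 1/8 scores before the final 8x upsampling.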

FCN32s model structure diagram:

Model construction script:

import mindspore.nn as nn
import mindspore.ops as ops


class FCN32s(nn.Cell):
    def __init__(self, n_class=21):
        super(FCN32s, self).__init__()
        self.block1 = nn.SequentialCell(
            nn.Conv2d(3, 64, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block2 = nn.SequentialCell(
            nn.Conv2d(64, 128, 3),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, 3),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block3 = nn.SequentialCell(
            nn.Conv2d(128, 256, 3),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block4 = nn.SequentialCell(
            nn.Conv2d(256, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block5 = nn.SequentialCell(
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block6 = nn.SequentialCell(
            nn.Conv2d(512, 4096, 7),
            nn.BatchNorm2d(4096),
            nn.ReLU()
        )
        self.block7 = nn.SequentialCell(
            nn.Conv2d(4096, 4096, 1),
            nn.BatchNorm2d(4096),
            nn.ReLU()
        )
        self.upscore = nn.SequentialCell(
            nn.Conv2d(4096, n_class, 1),
            nn.Conv2dTranspose(n_class, n_class, 4, 2, has_bias=False),
            nn.Conv2dTranspose(n_class, n_class, 32, 16, has_bias=False)
        )

    def construct(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = self.block6(x)
        x = self.block7(x)
        x = self.upscore(x)
        return x

FCN16s model structure diagram:

FCN16s model script:

class FCN16s(nn.Cell):
    def __init__(self, n_class=21):
        super(FCN16s, self).__init__()
        self.block1 = nn.SequentialCell(
            nn.Conv2d(3, 64, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block2 = nn.SequentialCell(
            nn.Conv2d(64, 128, 3),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, 3),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block3 = nn.SequentialCell(
            nn.Conv2d(128, 256, 3),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block4 = nn.SequentialCell(
            nn.Conv2d(256, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block5 = nn.SequentialCell(
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block6 = nn.SequentialCell(
            nn.Conv2d(512, 4096, 7),
            nn.BatchNorm2d(4096),
            nn.ReLU()
        )
        self.block7 = nn.SequentialCell(
            nn.Conv2d(4096, 4096, 1),
            nn.BatchNorm2d(4096),
            nn.ReLU()
        )
        self.upscore_pool5 = nn.SequentialCell(
            nn.Conv2d(4096, n_class, 1),
            nn.Conv2dTranspose(n_class, n_class, 4, 2)
        )
        self.score_pool4 = nn.Conv2dTranspose(512, n_class, 1, has_bias=False)
        self.add = ops.Add()  # element-wise add for skip fusion; needs `import mindspore.ops as ops`
        self.upscore_pool = nn.Conv2dTranspose(n_class, n_class, 32, 16, has_bias=False)

    def construct(self, x):
        x1 = self.block1(x)
        x2 = self.block2(x1)
        x3 = self.block3(x2)
        x4 = self.block4(x3)
        x5 = self.block5(x4)
        x6 = self.block6(x5)
        x7 = self.block7(x6)
        pool5 = self.upscore_pool5(x7)
        pool4 = self.score_pool4(x4)
        pool = self.add(pool4, pool5)
        pool = self.upscore_pool(pool)
        return pool

FCN8s model structure diagram:

FCN8s model script:

class FCN8s(nn.Cell):
    def __init__(self, n_class=21):
        super(FCN8s, self).__init__()
        self.block1 = nn.SequentialCell(
            nn.Conv2d(3, 64, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block2 = nn.SequentialCell(
            nn.Conv2d(64, 128, 3),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, 3),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block3 = nn.SequentialCell(
            nn.Conv2d(128, 256, 3),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block4 = nn.SequentialCell(
            nn.Conv2d(256, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block5 = nn.SequentialCell(
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2)
        )
        self.block6 = nn.SequentialCell(
            nn.Conv2d(512, 4096, 7),
            nn.BatchNorm2d(4096),
            nn.ReLU()
        )
        self.block7 = nn.SequentialCell(
            nn.Conv2d(4096, 4096, 1),
            nn.BatchNorm2d(4096),
            nn.ReLU()
        )
        self.upscore_pool5 = nn.SequentialCell(
            nn.Conv2d(4096, n_class, 1),
            nn.Conv2dTranspose(n_class, n_class, 4, 2, has_bias=False)
        )
        self.score_pool4 = nn.Conv2dTranspose(512, n_class, 1, has_bias=False)
        self.score_pool3 = nn.Conv2dTranspose(256, n_class, 1, has_bias=False)
        self.add = ops.Add()  # element-wise add for skip fusion; needs `import mindspore.ops as ops`
        self.upscore_pool4 = nn.Conv2dTranspose(n_class, n_class, 4, 2, has_bias=False)
        self.upscore_pool = nn.Conv2dTranspose(n_class, n_class, 16, 8, has_bias=False)

    def construct(self, x):
        x1 = self.block1(x)
        x2 = self.block2(x1)
        x3 = self.block3(x2)
        x4 = self.block4(x3)
        x5 = self.block5(x4)
        x6 = self.block6(x5)
        x7 = self.block7(x6)
        pool5 = self.upscore_pool5(x7)
        pool4 = self.score_pool4(x4)
        pool3 = self.score_pool3(x3)
        pool4 = self.add(pool4, pool5)
        pool4 = self.upscore_pool4(pool4)
        pool = self.add(pool3, pool4)
        pool = self.upscore_pool(pool)
        return pool
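
A quick way to sanity-check the three decoders is to trace only the spatial scale. The sketch below assumes that every `nn.Conv2dTranspose` with stride s enlarges the map exactly s times (this should hold for MindSpore's default `pad_mode='same'`, but check it against the version in use) and that the decoder starts from a 1/32 score map. `trace_scales` is a hypothetical helper, and 512 is an assumed input size.

```python
def trace_scales(input_size, up_strides, skip_levels):
    """Follow the decoder: upsample, then (optionally) check a skip connection.

    up_strides  -- stride of each transposed convolution, in order
    skip_levels -- downsampling factor of the skip branch fused after each
                   upsampling step, or None when nothing is fused
    """
    size = input_size // 32  # decoder starts from the 1/32 score map
    for stride, level in zip(up_strides, skip_levels):
        size *= stride
        if level is not None:
            # The skip branch must have the same spatial size to be added.
            assert size == input_size // level, (size, level)
    return size


# Strides taken from the three model scripts above.
print(trace_scales(512, [2, 16], [None, None]))     # FCN32s -> 512
print(trace_scales(512, [2, 16], [16, None]))       # FCN16s -> 512
print(trace_scales(512, [2, 2, 8], [16, 8, None]))  # FCN8s  -> 512
```

In all three cases the strides multiply to 32, so the output returns to the input resolution.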

3. Dataset

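With the model structure defined, we train on a dataset to check the model's performance. Here we use an open-source cell segmentation dataset: https://www.kaggle.com/code/kerneler/starter-isbi-challenge-dataset-21087002-9/data. The dataset contains 30 serial-section transmission electron microscopy images of the Drosophila first-instar larva ventral nerve cord (VNC).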

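First, the segmentation label images are converted by value replacement: the white background (pixel value 255) is replaced with 1.

Label image preprocessing: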

import os
import cv2


def convert(path, outpath):
    """Map white (255) label pixels to class id 1 and save the result."""
    os.makedirs(outpath, exist_ok=True)
    for file in os.listdir(path):
        img = cv2.imread(os.path.join(path, file))
        img[img == 255] = 1  # white background -> class 1
        cv2.imwrite(os.path.join(outpath, file), img)

Dataset definition:

import os
import numpy as np


class Cell_seg_dataset:
    def __init__(self, root_path):
        img_path = os.path.join(root_path, 'images')
        label_path = os.path.join(root_path, 'labels')
        # os.listdir makes no ordering guarantee, so sort both listings; this
        # assumes corresponding image and label files sort into the same order.
        img_names = sorted(os.listdir(img_path))
        label_names = sorted(os.listdir(label_path))
        if len(img_names) != len(label_names):
            raise ValueError('The number of images does not match the number of labels!')
        self.img_list = [os.path.join(img_path, name) for name in img_names]
        self.label_list = [os.path.join(label_path, name) for name in label_names]
        # The dataset yields sample indices; the files are read later in _preprocess.
        self.img_index = np.arange(len(img_names))
        self.label_index = np.arange(len(label_names))

    def __getitem__(self, index):
        return self.img_index[index], self.label_index[index]

    def __len__(self):
        return len(self.img_list)

Data preprocessing:

def _preprocess(dataset, images, labels, classes, batch_size, img_channel, img_shape, label_shape):
    # `images` and `labels` hold sample indices; look up the matching file paths.
    img_path = []
    label_path = []
    for i in range(batch_size):
        img_path.append(dataset.img_list[int(images[i])])
        label_path.append(dataset.label_list[int(labels[i])])
    one_hot = ops.OneHot()
    transpose = ops.Transpose()
    img_out = np.zeros((batch_size, img_channel, img_shape, img_shape))
    label_out = np.zeros((batch_size, label_shape, label_shape, classes))
    for i in range(len(images)):
        img = cv2.imread(img_path[i])
        img = img / 255.0  # scale pixel values to [0, 1]
        img = Tensor(img, dtype=mindspore.float32)
        img = transpose(img, (2, 0, 1))  # HWC -> CHW
        label = cv2.imread(label_path[i])
        label = cv2.cvtColor(label, cv2.COLOR_BGR2GRAY)  # cv2.imread returns BGR
        label = one_hot(Tensor(label, dtype=mindspore.int32), classes,
                        Tensor(1, dtype=mindspore.float32),
                        Tensor(0, dtype=mindspore.float32))
        img_out[i] = img.asnumpy()
        label_out[i] = label.asnumpy()
    img_out = Tensor(img_out, dtype=mindspore.float32)
    label_out = Tensor(label_out, dtype=mindspore.float32)
    return img_out, label_out
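
The one-hot step in `_preprocess` can be pictured on a tiny label map. This NumPy sketch is an equivalent formulation for illustration only, not the MindSpore call used above; the 2x2 label values are made up.

```python
import numpy as np

classes = 2
label = np.array([[0, 1],
                  [1, 1]])  # made-up 2x2 label map with class ids 0 and 1

# Indexing an identity matrix with the label map yields one row per pixel.
one_hot = np.eye(classes, dtype=np.float32)[label]

print(one_hot.shape)  # (2, 2, 2): height, width, classes
print(one_hot[0, 1])  # [0. 1.] -> pixel (0, 1) belongs to class 1
```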

4. Model Training

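    First, the loss is computed from the model output and the label data. The dataset used here is a two-class segmentation dataset: each label image is converted by one-hot encoding into a 2-channel feature map, the loss between the network output and the label feature map is computed pixel by pixel, and the model is updated through backpropagation.

    Optimizer: Adam

    Loss function: cross-entropy loss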
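
The per-pixel loss described above can be sketched in plain NumPy. This is a reference formulation under assumptions (logits in N, C, H, W layout, one-hot labels in N, H, W, C layout, losses summed over all pixels), not the MindSpore implementation; `pixelwise_cross_entropy` is a hypothetical name.

```python
import numpy as np


def pixelwise_cross_entropy(logits, one_hot_labels):
    """Sum of per-pixel softmax cross-entropy.

    logits         -- (N, C, H, W) raw network outputs
    one_hot_labels -- (N, H, W, C) one-hot targets
    """
    n, c, h, w = logits.shape
    flat_logits = logits.transpose(0, 2, 3, 1).reshape(-1, c)  # (N*H*W, C)
    flat_labels = one_hot_labels.reshape(-1, c)
    # Numerically stable log-softmax over the class axis.
    shifted = flat_logits - flat_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(flat_labels * log_probs).sum())


# Uniform logits over 2 classes: every pixel contributes ln(2).
logits = np.zeros((1, 2, 4, 4))
labels = np.eye(2)[np.zeros((1, 4, 4), dtype=int)]
print(pixelwise_cross_entropy(logits, labels))  # about 16 * ln(2) = 11.09
```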

Loss computation:

class MyWithLossCell(nn.Cell):
    def __init__(self, backbone, loss_func, batch_size, classes, label_shape):
        super(MyWithLossCell, self).__init__()
        self._backbone = backbone
        self._loss_func = loss_func
        self.transpose = ops.Transpose()
        self.shape = (batch_size * label_shape * label_shape, classes)
        self.reshape = ops.Reshape()
        self.sum = ops.ReduceSum(False)

    def construct(self, inputs, labels):
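        logits = self._backbone(inputs)
        # NCHW -> NHWC so that the class scores of each pixel are contiguous
        logits = self.transpose(logits, (0, 2, 3, 1))
        # Flatten to (num_pixels, classes); labels are already NHWC one-hot
        logits = self.reshape(logits, self.shape)
        labels = self.reshape(labels, self.shape)
        loss = self._loss_func(logits, labels)  # per-pixel cross-entropy
        loss = self.sum(loss)  # sum over all pixels in the batch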
        return loss

Training script:

def train():
    train_data_path = config.train_data
    dataset = Cell_seg_dataset(train_data_path)
    train_data = ds.GeneratorDataset(dataset, ["data", "label"], shuffle=True)
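    # Drop the last incomplete batch: _preprocess and MyWithLossCell assume a fixed batch size.
    train_data = train_data.batch(config.batch_size, drop_remainder=True)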

    if config.backbone == 'FCN8s':
        net = FCN8s(config.num_classes)
    elif config.backbone == 'FCN16s':
        net = FCN16s(config.num_classes)
    else:
        net = FCN32s(config.num_classes)

    if config.use_pretrain_ckpt:
        ckpt_file = config.pretrain_ckpt_path
        param_dict = load_checkpoint(ckpt_file)
        load_param_into_net(net, param_dict)

    # Note: weight_decay=0.9 is an unusually strong L2 penalty; values around 1e-4 are more common.
    opt = nn.Adam(params=net.trainable_params(), learning_rate=config.lr, weight_decay=0.9)
    loss_func = nn.SoftmaxCrossEntropyWithLogits()
    loss_net = MyWithLossCell(net, loss_func, config.batch_size, config.num_classes, config.label_shape)
    train_net = nn.TrainOneStepCell(loss_net, opt)
    train_net.set_train()
    for epoch in range(config.epochs):
        train_loss = 0
        step = 0
        for data in train_data.create_dict_iterator():
            images, labels = _preprocess(dataset, data['data'], data['label'], config.num_classes, config.batch_size,
                                         config.input_channel, config.input_shape, config.label_shape)
            loss = train_net(images, labels)
            step += 1
            print(f'step:{step},loss:{loss}')
            train_loss += loss
        epoch_num = epoch + 1  # avoid shadowing the built-in `iter`
        print(f'epoch:{epoch_num}, train loss:{train_loss}')
        if epoch_num % 10 == 0:
            save_checkpoint(net, f'{epoch_num}.ckpt')

Loss output during training:

 

5. Inference and Validation

     After training, the saved ckpt file is loaded and inference is run on the test data for validation.

Inference script:

import mindspore
from mindspore import load_checkpoint, load_param_into_net, Tensor, ops
from src.model import FCN8s
import numpy as np
import cv2
import matplotlib.pyplot as plt


def main(ckptPath, imagePath, classes):
    img = cv2.imread(imagePath)
    img = img / 255.0
    img = Tensor(img, dtype=mindspore.float32)
    transpose = ops.Transpose()
    img = transpose(img, (2, 0, 1))
    expand_dim = ops.ExpandDims()
    img = expand_dim(img, 0)
    net = FCN8s(classes)
    param_dict = load_checkpoint(ckptPath)
    load_param_into_net(net, param_dict)
    net.set_train(False)
    result = net(img)
    result = np.squeeze(result.asnumpy())
    return result


if __name__ == '__main__':
    img_path = '0.jpg'
    ckpt_path = '800.ckpt'
    num_classes = 2
    result = main(ckpt_path, img_path, num_classes)
    print(result.shape) 
    # Per-pixel argmax over the class axis. Unlike comparing against an initial
    # maximum of 0, this also picks the right class when all logits are negative.
    pred = np.argmax(result, axis=0)
    palette = np.array([[0, 0, 0], [255, 255, 255]], dtype=np.uint8)
    img = palette[pred]  # (H, W, 3) colour image: class 0 black, class 1 white
    plt.figure('image')
    plt.imshow(img)
    plt.show()

 

 

