MindSpore (昇思) 25-Day Check-in Camp - ML - Day 9 - FCN Image Semantic Segmentation

Article link: https://blog.csdn.net/littlesujin/article/details/140027396

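The introductory tutorials are done; now on to the applied example tutorials.

Today's topic is FCN image semantic segmentation on the MindSpore platform.

First, a few basic concepts, written down for other readers and for my own future reference:

What is image semantic segmentation?

Image semantic segmentation assigns every pixel in an image to a semantic class. Unlike image classification, which only identifies which object categories an image contains, semantic segmentation classifies each individual pixel.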

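Characteristics of semantic segmentation

Pixel-level prediction: the output is a segmentation map with the same size as the input image, in which every pixel carries a class label.
Class hierarchy: a task usually involves many classes, such as sky, trees, roads, cars, and pedestrians, and these classes may form a hierarchy.
Local and global information: the model has to use both local features and global information, such as object shape, texture, and contextual relationships.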
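
To make the difference between image classification and pixel-level prediction concrete, here is a minimal NumPy sketch. The score values are made up for illustration; in a real model they would come from the network's output layer.

```python
import numpy as np

# Hypothetical scores for a 2x3 image and 3 classes, laid out as (classes, height, width).
scores = np.array([
    [[0.90, 0.2, 0.1], [0.8, 0.1, 0.3]],  # class 0
    [[0.05, 0.7, 0.2], [0.1, 0.6, 0.1]],  # class 1
    [[0.05, 0.1, 0.7], [0.1, 0.3, 0.6]],  # class 2
])

# Classification: a single label for the whole image
# (here simply the class with the highest mean score).
image_label = int(scores.mean(axis=(1, 2)).argmax())

# Semantic segmentation: one label per pixel, same height and width as the input.
label_map = scores.argmax(axis=0)

print(image_label)
print(label_map)
```

The segmentation result keeps the spatial layout of the input, which is exactly what the metric and visualization code later in this post relies on when it calls `argmax(axis=1)` on a batched output.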

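Applications of semantic segmentation

Autonomous driving: recognizing roads, vehicles, pedestrians, and so on, to support safe driving.
Robot navigation: helping a robot understand its surroundings and plan a path.
Medical image analysis: identifying organs, lesions, and similar structures to assist diagnosis.
Satellite image analysis: identifying land types, buildings, roads, and so on, for resource management and urban planning.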

Common semantic segmentation models

FCN (Fully Convolutional Network)
U-Net
DeepLab
Mask R-CNN (strictly speaking an instance segmentation model)

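Basic principle of FCN (Fully Convolutional Network) for semantic segmentation

An FCN works like a very meticulous painter "coloring in" a photo: every pixel receives a color, and each color stands for a class (person, car, tree, and so on). It gets there by extracting features, shrinking the feature maps, restoring the resolution, and using skip connections.

FCN workflow:

Feature extraction: FCN first runs a series of convolution and pooling operations to extract features from the image, like a painter studying the photo to pick out shapes and textures.
Downsampling: while features are extracted, pooling gradually reduces the spatial size of the feature maps. Like a painter stepping back from the photo, this gives up fine detail in exchange for a wider view of the scene.
Upsampling: to obtain a segmentation map of the same size as the original image, FCN upsamples the feature maps back to the input resolution, like a painter enlarging the shrunken photo and coloring it according to the features observed.
Skip connections: to keep detail, FCN combines shallow features (rich in detail) with deep features (global information), like a painter who consults both the overall outline and the fine texture after enlarging the photo.

The design has three main strengths:

Pixel-level prediction: FCN classifies every pixel in the image, which gives a more precise segmentation result.
End-to-end learning: FCN is trained end to end and learns the segmentation task directly from raw images, with no hand-designed features.
Extensibility: the same design can be adapted to other dense prediction tasks, such as human pose estimation and video segmentation.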
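
The downsample, upsample, and skip-connection flow described above can be traced with plain shape arithmetic. The sketch below is an illustration under assumptions, not the network code from the tutorial: it assumes five stride-2 poolings and transposed convolutions with kernel/stride/padding of 4/2/1 (2x) and 16/8/4 (8x), and it assumes the input size is divisible by 32. `fcn8s_shapes` and `conv_transpose_out` are hypothetical helper names.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output length of a transposed convolution without dilation or output padding."""
    return (size - 1) * stride - 2 * padding + kernel


def fcn8s_shapes(height, width):
    """Trace (height, width) through an FCN-8s-style layout (assumed hyperparameters)."""
    shapes = {"input": (height, width)}
    h, w = height, width
    # Five stride-2 poolings halve the resolution each time.
    for stage in range(1, 6):
        h, w = h // 2, w // 2
        shapes[f"pool{stage}"] = (h, w)

    def upsample(shape, kernel, stride, padding):
        return tuple(conv_transpose_out(s, kernel, stride, padding) for s in shape)

    # 2x upsample the deepest scores, then add the pool4 scores (first skip connection).
    shapes["up2"] = upsample(shapes["pool5"], kernel=4, stride=2, padding=1)
    assert shapes["up2"] == shapes["pool4"]
    # 2x upsample again, then add the pool3 scores (second skip connection).
    shapes["up4"] = upsample(shapes["up2"], kernel=4, stride=2, padding=1)
    assert shapes["up4"] == shapes["pool3"]
    # Final 8x upsample back to the input resolution.
    shapes["output"] = upsample(shapes["up4"], kernel=16, stride=8, padding=4)
    return shapes


shapes = fcn8s_shapes(512, 512)
for name, shape in shapes.items():
    print(name, shape)
```

The two internal assertions encode the reason skip connections work at all: the upsampled deep scores and the shallower scores must have the same spatial size before they can be added element-wise.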

Code:

import numpy as np
import mindspore as ms
import mindspore.nn as nn
import mindspore.train as train

class PixelAccuracy(train.Metric):
    def __init__(self, num_class=21):
        super(PixelAccuracy, self).__init__()
        self.num_class = num_class

    def _generate_matrix(self, gt_image, pre_image):
        mask = (gt_image >= 0) & (gt_image < self.num_class)
        label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
        count = np.bincount(label, minlength=self.num_class**2)
        confusion_matrix = count.reshape(self.num_class, self.num_class)
        return confusion_matrix

    def clear(self):
        self.confusion_matrix = np.zeros((self.num_class,) * 2)

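    def update(self, *inputs):
        # inputs[0]: network output of shape (N, C, H, W); inputs[1]: ground-truth labels
        y_pred = inputs[0].asnumpy().argmax(axis=1)
        # Match the label layout to the prediction instead of hard-coding (4, 512, 512)
        y = inputs[1].asnumpy().reshape(y_pred.shape)
        self.confusion_matrix += self._generate_matrix(y, y_pred)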

    def eval(self):
        pixel_accuracy = np.diag(self.confusion_matrix).sum() / self.confusion_matrix.sum()
        return pixel_accuracy


class PixelAccuracyClass(train.Metric):
    def __init__(self, num_class=21):
        super(PixelAccuracyClass, self).__init__()
        self.num_class = num_class

    def _generate_matrix(self, gt_image, pre_image):
        mask = (gt_image >= 0) & (gt_image < self.num_class)
        label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
        count = np.bincount(label, minlength=self.num_class**2)
        confusion_matrix = count.reshape(self.num_class, self.num_class)
        return confusion_matrix

    def update(self, *inputs):
        y_pred = inputs[0].asnumpy().argmax(axis=1)
        # Match the label layout to the prediction instead of hard-coding (4, 512, 512)
        y = inputs[1].asnumpy().reshape(y_pred.shape)
        self.confusion_matrix += self._generate_matrix(y, y_pred)

    def clear(self):
        self.confusion_matrix = np.zeros((self.num_class,) * 2)

    def eval(self):
        mean_pixel_accuracy = np.diag(self.confusion_matrix) / self.confusion_matrix.sum(axis=1)
        mean_pixel_accuracy = np.nanmean(mean_pixel_accuracy)
        return mean_pixel_accuracy


class MeanIntersectionOverUnion(train.Metric):
    def __init__(self, num_class=21):
        super(MeanIntersectionOverUnion, self).__init__()
        self.num_class = num_class

    def _generate_matrix(self, gt_image, pre_image):
        mask = (gt_image >= 0) & (gt_image < self.num_class)
        label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
        count = np.bincount(label, minlength=self.num_class**2)
        confusion_matrix = count.reshape(self.num_class, self.num_class)
        return confusion_matrix

    def update(self, *inputs):
        y_pred = inputs[0].asnumpy().argmax(axis=1)
        # Match the label layout to the prediction instead of hard-coding (4, 512, 512)
        y = inputs[1].asnumpy().reshape(y_pred.shape)
        self.confusion_matrix += self._generate_matrix(y, y_pred)

    def clear(self):
        self.confusion_matrix = np.zeros((self.num_class,) * 2)

    def eval(self):
        mean_iou = np.diag(self.confusion_matrix) / (
            np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
            np.diag(self.confusion_matrix))
        mean_iou = np.nanmean(mean_iou)
        return mean_iou


class FrequencyWeightedIntersectionOverUnion(train.Metric):
    def __init__(self, num_class=21):
        super(FrequencyWeightedIntersectionOverUnion, self).__init__()
        self.num_class = num_class

    def _generate_matrix(self, gt_image, pre_image):
        mask = (gt_image >= 0) & (gt_image < self.num_class)
        label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
        count = np.bincount(label, minlength=self.num_class**2)
        confusion_matrix = count.reshape(self.num_class, self.num_class)
        return confusion_matrix

    def update(self, *inputs):
        y_pred = inputs[0].asnumpy().argmax(axis=1)
        # Match the label layout to the prediction instead of hard-coding (4, 512, 512)
        y = inputs[1].asnumpy().reshape(y_pred.shape)
        self.confusion_matrix += self._generate_matrix(y, y_pred)

    def clear(self):
        self.confusion_matrix = np.zeros((self.num_class,) * 2)

    def eval(self):
        freq = np.sum(self.confusion_matrix, axis=1) / np.sum(self.confusion_matrix)
        iu = np.diag(self.confusion_matrix) / (
            np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
            np.diag(self.confusion_matrix))

        frequency_weighted_iou = (freq[freq > 0] * iu[freq > 0]).sum()
        return frequency_weighted_iou
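
All four metric classes above build the same confusion matrix and differ only in how they reduce it. The following NumPy-only sketch works through a tiny example by hand, so the formulas can be checked without MindSpore. The 2x3 label maps and the 3-class setup are invented for illustration; 255 plays the role of an ignored label, which the `mask` filters out.

```python
import numpy as np

num_class = 3
gt = np.array([[0, 0, 1], [1, 2, 255]])  # 255 marks a pixel to ignore
pred = np.array([[0, 1, 1], [1, 2, 0]])

# Same construction as _generate_matrix: rows are ground truth, columns are predictions.
mask = (gt >= 0) & (gt < num_class)
index = num_class * gt[mask].astype(int) + pred[mask]
cm = np.bincount(index, minlength=num_class ** 2).reshape(num_class, num_class)

diag = np.diag(cm)
pixel_accuracy = diag.sum() / cm.sum()
mean_pixel_accuracy = np.nanmean(diag / cm.sum(axis=1))
iou = diag / (cm.sum(axis=1) + cm.sum(axis=0) - diag)
mean_iou = np.nanmean(iou)
freq = cm.sum(axis=1) / cm.sum()
fw_iou = (freq[freq > 0] * iou[freq > 0]).sum()

print(cm)
print(pixel_accuracy, mean_pixel_accuracy, mean_iou, fw_iou)
```

Five pixels are valid, four of them are predicted correctly, so the pixel accuracy is 0.8. The ignored pixel never enters the matrix, even though its prediction is "wrong".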

import mindspore
from mindspore import Tensor
from mindspore.train import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor, Model

# FCN8s, load_vgg16 and dataset come from the earlier network-construction and
# data-loading steps of the tutorial, which are not reproduced in this post.
device_target = "Ascend"
mindspore.set_context(mode=mindspore.PYNATIVE_MODE, device_target=device_target)

train_batch_size = 4
num_classes = 21
# Initialize the model structure
net = FCN8s(n_class=21)
# Load the pretrained VGG-16 parameters
load_vgg16()
# Compute the learning rate
min_lr = 0.0005
base_lr = 0.05
train_epochs = 1
iters_per_epoch = dataset.get_dataset_size()
total_step = iters_per_epoch * train_epochs

lr_scheduler = mindspore.nn.cosine_decay_lr(min_lr,
                                            base_lr,
                                            total_step,
                                            iters_per_epoch,
                                            decay_epoch=2)
# Only the last value of the schedule is used, so the optimizer runs with a constant learning rate
lr = Tensor(lr_scheduler[-1])

# Define the loss function (pixels labeled 255 are ignored)
loss = nn.CrossEntropyLoss(ignore_index=255)
# Define the optimizer
optimizer = nn.Momentum(params=net.trainable_params(), learning_rate=lr, momentum=0.9, weight_decay=0.0001)
# Define the loss scale
scale_factor = 4
scale_window = 3000
loss_scale_manager = ms.amp.DynamicLossScaleManager(scale_factor, scale_window)
# Initialize the model; the same metrics are used on every device target
metrics = {"pixel accuracy": PixelAccuracy(),
           "mean pixel accuracy": PixelAccuracyClass(),
           "mean IoU": MeanIntersectionOverUnion(),
           "frequency weighted IoU": FrequencyWeightedIntersectionOverUnion()}
if device_target == "Ascend":
    model = Model(net, loss_fn=loss, optimizer=optimizer, loss_scale_manager=loss_scale_manager, metrics=metrics)
else:
    model = Model(net, loss_fn=loss, optimizer=optimizer, metrics=metrics)

# Set the parameters for saving ckpt files
time_callback = TimeMonitor(data_size=iters_per_epoch)
loss_callback = LossMonitor()
callbacks = [time_callback, loss_callback]
# NOTE: save_steps is defined but not used; checkpoints are actually saved every 10 steps
save_steps = 330
keep_checkpoint_max = 5
config_ckpt = CheckpointConfig(save_checkpoint_steps=10,
                               keep_checkpoint_max=keep_checkpoint_max)
ckpt_callback = ModelCheckpoint(prefix="FCN8s",
                                directory="./ckpt",
                                config=config_ckpt)
callbacks.append(ckpt_callback)
model.train(train_epochs, dataset, callbacks=callbacks)
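
The training code builds a cosine schedule but keeps only its last element. The pure-Python sketch below re-implements the epoch-wise cosine formula as I understand it from the MindSpore documentation for `nn.cosine_decay_lr`; it is not the library code, and `cosine_decay_sketch` is a hypothetical name. The value of 10 steps per epoch is also made up, since the real value depends on the dataset size.

```python
import math


def cosine_decay_sketch(min_lr, max_lr, total_step, step_per_epoch, decay_epoch):
    """Per-step learning rates; the rate only changes when the epoch index changes."""
    rates = []
    for step in range(total_step):
        epoch = min(step // step_per_epoch, decay_epoch)
        cosine = 1 + math.cos(math.pi * epoch / decay_epoch)
        rates.append(min_lr + 0.5 * (max_lr - min_lr) * cosine)
    return rates


# Same hyperparameters as above, with an assumed 10 steps per epoch.
one_epoch = cosine_decay_sketch(0.0005, 0.05, total_step=10, step_per_epoch=10, decay_epoch=2)
two_epochs = cosine_decay_sketch(0.0005, 0.05, total_step=20, step_per_epoch=10, decay_epoch=2)

print(one_epoch[-1])   # rate used when training for a single epoch
print(two_epochs[-1])  # rate reached during the second epoch
```

Under these assumptions, one epoch of training with `decay_epoch=2` never leaves epoch 0 of the schedule, so the last element equals `base_lr` and the decay has no effect. Passing the whole list to the optimizer instead of `lr_scheduler[-1]` would be one way to get a step-dependent rate, but that changes the training setup used in this post.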

Model evaluation:

from download import download
from mindspore import load_checkpoint, load_param_into_net

IMAGE_MEAN = [103.53, 116.28, 123.675]
IMAGE_STD = [57.375, 57.120, 58.395]
DATA_FILE = "dataset/dataset_fcn8s/mindname.mindrecord"

# Download the already trained weight file
url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/FCN8s.ckpt"
download(url, "FCN8s.ckpt", replace=True)
net = FCN8s(n_class=num_classes)

ckpt_file = "FCN8s.ckpt"
param_dict = load_checkpoint(ckpt_file)
load_param_into_net(net, param_dict)

# Rebuild the Model around the freshly loaded network, with the same metrics as in training
metrics = {"pixel accuracy": PixelAccuracy(),
           "mean pixel accuracy": PixelAccuracyClass(),
           "mean IoU": MeanIntersectionOverUnion(),
           "frequency weighted IoU": FrequencyWeightedIntersectionOverUnion()}
if device_target == "Ascend":
    model = Model(net, loss_fn=loss, optimizer=optimizer, loss_scale_manager=loss_scale_manager, metrics=metrics)
else:
    model = Model(net, loss_fn=loss, optimizer=optimizer, metrics=metrics)

# Instantiate the Dataset
# (SegDataset, crop_size, min_scale, max_scale and ignore_label are defined in the
# earlier data-loading step of the tutorial)
dataset = SegDataset(image_mean=IMAGE_MEAN,
                     image_std=IMAGE_STD,
                     data_file=DATA_FILE,
                     batch_size=train_batch_size,
                     crop_size=crop_size,
                     max_scale=max_scale,
                     min_scale=min_scale,
                     ignore_label=ignore_label,
                     num_classes=num_classes,
                     num_readers=2,
                     num_parallel_calls=4)
dataset_eval = dataset.get_dataset()
model.eval(dataset_eval)

Use the trained network to run inference and display the results.

import cv2
import matplotlib.pyplot as plt

net = FCN8s(n_class=num_classes)
# Load the trained weights and set the batch size for display
ckpt_file = "FCN8s.ckpt"
param_dict = load_checkpoint(ckpt_file)
load_param_into_net(net, param_dict)
eval_batch_size = 4
img_lst = []
mask_lst = []
res_lst = []
# Show the inference results (top row: input images, bottom row: predicted segmentation)
plt.figure(figsize=(8, 5))
show_data = next(dataset_eval.create_dict_iterator())
show_images = show_data["data"].asnumpy()
mask_images = show_data["label"].reshape([4, 512, 512])
# Clip the normalized images to [0, 1] so that imshow can display them
show_images = np.clip(show_images, 0, 1)
for i in range(eval_batch_size):
    img_lst.append(show_images[i])
    mask_lst.append(mask_images[i])
res = net(show_data["data"]).asnumpy().argmax(axis=1)
for i in range(eval_batch_size):
    plt.subplot(2, 4, i + 1)
    plt.imshow(img_lst[i].transpose(1, 2, 0))
    plt.axis("off")
    plt.subplots_adjust(wspace=0.05, hspace=0.02)
    plt.subplot(2, 4, i + 5)
    plt.imshow(res[i])
    plt.axis("off")
    plt.subplots_adjust(wspace=0.05, hspace=0.02)
plt.show()
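
The plot above shows the predicted label maps with matplotlib's default colormap. As an optional variation, the sketch below colors a label map with the standard PASCAL VOC palette, which is generated by spreading the bits of each class id across the R, G, and B channels. This is not part of the tutorial code; `voc_colormap` and `colorize` are hypothetical helper names, and the small label array stands in for one predicted map such as `res[i]`.

```python
import numpy as np


def voc_colormap(num_colors=256):
    """Build the PASCAL VOC palette by spreading the bits of each class id over R, G, B."""
    cmap = np.zeros((num_colors, 3), dtype=np.uint8)
    for class_id in range(num_colors):
        r = g = b = 0
        c = class_id
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[class_id] = (r, g, b)
    return cmap


def colorize(label_map, cmap):
    """Turn an (H, W) integer label map into an (H, W, 3) RGB image."""
    return cmap[label_map]


palette = voc_colormap()
# Stand-in for one predicted map; the class ids here are made up.
demo_labels = np.array([[0, 1], [2, 15]])
demo_rgb = colorize(demo_labels, palette)
print(demo_rgb.shape)
```

With this in place, `plt.imshow(colorize(res[i], palette))` would give every class a fixed color across all images, which makes the panels easier to compare.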

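My takeaways from this part:

It introduces the fully convolutional network (FCN), a framework for image semantic segmentation and the pioneering work in applying deep learning to this task.
It explains the concept of semantic segmentation, that is, classifying every pixel in an image, and shows some segmentation examples.
It describes the FCN network structure, including convolutionalization, upsampling, and the skip architecture.
It provides code for data processing, network construction, the loss function and evaluation metrics, model training, and inference.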

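In principle:

FCN replaces the fully connected layers with convolutional layers, so the network can accept input images of any size and output a segmentation result of the same size as the input.
Convolution layers extract image features, pooling layers reduce the resolution of the feature maps, upsampling layers restore that resolution, and the skip architecture combines deep global information with shallow local information.
During training, a cross-entropy loss measures the difference between the network output and the ground-truth labels, and backpropagation updates the network parameters.
During inference, the input image is fed into the trained FCN to obtain the segmentation result.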
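
To show what a per-pixel cross-entropy with an ignored label computes, here is a NumPy sketch. It illustrates the idea only and is not MindSpore's `nn.CrossEntropyLoss` implementation; `pixel_cross_entropy` is a hypothetical name, and the sketch assumes at least one valid pixel in the batch.

```python
import numpy as np


def pixel_cross_entropy(logits, labels, ignore_index=255):
    """Mean cross-entropy over valid pixels. logits: (N, C, H, W); labels: (N, H, W) ints."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Ignored pixels get a dummy class 0 for indexing and are masked out afterwards.
    valid = labels != ignore_index
    safe_labels = np.where(valid, labels, 0)
    picked = np.take_along_axis(log_probs, safe_labels[:, None, :, :], axis=1)[:, 0]
    return -(picked * valid).sum() / valid.sum()


# One 2x2 image with 3 classes; the bottom-right pixel is ignored.
logits = np.zeros((1, 3, 2, 2))
labels = np.array([[[0, 1], [2, 255]]])
uniform_loss = pixel_cross_entropy(logits, labels)

# Changing the scores at the ignored pixel leaves the loss unchanged.
perturbed = logits.copy()
perturbed[0, :, 1, 1] = [5.0, -3.0, 1.0]
perturbed_loss = pixel_cross_entropy(perturbed, labels)
print(uniform_loss, perturbed_loss)
```

With all-zero scores every class has probability 1/3, so the loss over the three valid pixels is log 3. The ignored pixel contributes nothing, which is why boundary pixels labeled 255 do not affect training.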

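The code covers:

1. Data preprocessing: normalize the input images so that they share the same size and value range.
- Data loading: the PASCAL VOC 2012 dataset is mixed with the SBD dataset and loaded with MindSpore's Dataset class.
- Training set visualization: run the code to inspect the loaded dataset images.

2. Network construction: describes the FCN pipeline, including convolution, pooling, and deconvolution.
- Network structure: the FCN-8s network is built with MindSpore's nn module, including convolution, pooling, and deconvolution layers.
- Loading pretrained weights: part of the VGG-16 pretrained weights is loaded to improve model performance.

3. Loss function and evaluation metrics:
- Loss function: cross-entropy loss between the FCN output and the mask.
- Evaluation metrics: the custom metrics PixelAccuracy, PixelAccuracyClass, MeanIntersectionOverUnion, and FrequencyWeightedIntersectionOverUnion are used to evaluate the model.

4. Model training:
- Training setup: instantiate the loss function and the optimizer, compile the network with the Model interface, and train the FCN-8s network.
- Model evaluation: evaluate the trained model on the evaluation data, computing pixel accuracy, mean pixel accuracy, mean IoU, and frequency weighted IoU.

5. Model inference:
- Model inference: run the trained network and display the inference results.

Summary notes:
- FCN is a fully convolutional network for image segmentation; by replacing the fully connected layers with convolutional layers, it makes pixel-level predictions for input images of any size.
- FCN's main contributions include building the network entirely from convolutional layers, accepting input images of any size, and being more efficient.
- The FCN structure consists of convolution, pooling, and deconvolution layers, and its skip architecture combines deep global information with shallow local information.
- The FCN workflow covers data preprocessing, network construction, choice of loss function and evaluation metrics, model training, and model evaluation.
- FCN's inference results can be visualized to show how the model segments the input images.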