Notes on selected topics in fully convolutional networks (FCN)

嘻嘻作者哈哈

Category: Computer Vision. Tags: image segmentation, evaluation metrics

Article link: https://blog.csdn.net/weixin_43971252/article/details/114284885

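I. Global and local information

(1) Local information

Where it is extracted: in the shallow layers of the network.
Characteristics: the receptive field is small, which is why local information comes from the early part of the network; the features are rich in geometric detail of objects.
Purpose: helps segment small objects and refine the result, improving segmentation accuracy.

(2) Global information

Where it is extracted: in the deep layers of the network.
Characteristics: the receptive field is large, which is why global information comes from the middle and late parts of the network; the features are rich in global context (semantic information) about objects.
Purpose: helps segment large objects, improving segmentation accuracy.

Summary: skip connections fuse local information with global information to improve segmentation accuracy.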
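
As a rough illustration of the fusion step, the sketch below adds an upsampled deep score map to a shallow one, in the spirit of FCN-16s. The function names `upsample_nearest` and `fuse_skip` and the toy arrays are hypothetical, and nearest-neighbor upsampling is used here only as a simplification in place of a learned or bilinear upsampling layer.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Upsample a (C, H, W) score map by an integer factor (nearest neighbor)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_skip(shallow_scores, deep_scores):
    """Element-wise sum of shallow (local) scores and upsampled deep (global) scores."""
    factor = shallow_scores.shape[1] // deep_scores.shape[1]
    return shallow_scores + upsample_nearest(deep_scores, factor)


# Hypothetical per-class score maps: 3 classes, 4x4 (shallow) and 2x2 (deep).
shallow = np.ones((3, 4, 4))
deep = np.arange(12, dtype=float).reshape(3, 2, 2)

fused = fuse_skip(shallow, deep)
print(fused.shape)  # (3, 4, 4)
```

The fused map keeps the resolution of the shallow layer while each position also receives the score of the coarse deep cell that covers it.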

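II. Receptive field

Definition: in a CNN, the receptive field is the size of the region in the original input that determines one element of a given layer's output.
Formula:
[Image: receptive-field formula, missing from the source]
From the formula, once the receptive field of the previous layer and the kernel size of the current layer are fixed, the receptive field grows with the convolution stride.
If the stride is too large, however, the convolution discards too much of the feature map and many features can no longer be extracted. We therefore want to reduce the stride while keeping the receptive field unchanged or even enlarging it.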
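
The formula image is not reproduced here, so the sketch below assumes the commonly used recurrence $RF_l = RF_{l-1} + (k_l - 1) \prod_{i<l} s_i$, where $k_l$ is the kernel size of layer $l$ and $s_i$ the stride of layer $i$; it may differ in notation from the original figure. The function name `receptive_field` is hypothetical, and dilation and padding are ignored.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output element.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf = 1    # before any layer, one element sees one input pixel
    jump = 1  # product of the strides of all layers processed so far
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1), (3, 1)]))  # 5
print(receptive_field([(3, 2), (3, 1)]))  # 7
```

Under this recurrence, raising the stride of the first 3x3 layer from 1 to 2 enlarges the receptive field of the second layer from 5 to 7, which matches the stride effect described above.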

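III. Evaluation metrics

IoU: intersection over union between the ground-truth labels and the model's predictions.
PA: pixel accuracy, the fraction of all pixels that are labeled correctly.
MPA: mean pixel accuracy, the fraction of correctly labeled pixels within each class, averaged over all classes.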
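
A minimal sketch of these three definitions, computed directly from label arrays rather than from a confusion matrix. The arrays `gt` and `pred` and the helper names are hypothetical, and the MPA helper assumes every class occurs at least once in the ground truth.

```python
import numpy as np

# Hypothetical 2x3 label maps with two classes (0 and 1).
gt = np.array([[0, 0, 1],
               [0, 1, 1]])
pred = np.array([[0, 1, 1],
                 [0, 1, 0]])


def class_iou(gt, pred, cls):
    """IoU of one class: |gt AND pred| / |gt OR pred| over that class's masks."""
    inter = np.logical_and(gt == cls, pred == cls).sum()
    union = np.logical_or(gt == cls, pred == cls).sum()
    return inter / union


def pixel_accuracy(gt, pred):
    """Fraction of all pixels whose prediction equals the label."""
    return (gt == pred).mean()


def mean_pixel_accuracy(gt, pred, n_class):
    """Per-class accuracy (correct pixels / pixels of that class), then the mean."""
    accs = [(pred[gt == c] == c).mean() for c in range(n_class)]
    return float(np.mean(accs))


print(class_iou(gt, pred, 1))            # 0.5
print(pixel_accuracy(gt, pred))          # 4 of 6 pixels are correct
print(mean_pixel_accuracy(gt, pred, 2))  # both classes score 2/3
```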

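IV. Computing the metrics

IoU, PA, MPA and mIoU can all be computed directly from the confusion matrix.

1. Build the confusion matrix, taking n = 6 classes as an example.

Here L denotes the ground-truth labels of the 10 example pixels and P the model's predictions for them.
First compute $n \times L + P$, then count how often each value occurs using a vector of length $bin = n^2$; reshaping that vector into an (n, n) matrix gives the confusion matrix.

How to read the confusion matrix: rows index the ground-truth class and columns index the predicted class. For example, 2 pixels have ground truth 1 and prediction 1, and 1 pixel has ground truth 1 and prediction 5.
(1) The diagonal holds the number of correctly classified pixels of each class.
(2) Row sums (labels): the sum of a row is the number of pixels of that class in the labels L, that is, the count the class would have if every pixel were predicted correctly; class 1, for example, has 3 pixels.
(3) Column sums (predictions): the sum of a column is the number of pixels the model assigned to that class, that is, the count of the class in the predictions P; for class 1, the model predicts 4 pixels.
2. $IoU = \text{diagonal element} / (\text{row sum} + \text{column sum} - \text{diagonal element})$, computed per class.
3. mIoU is the mean of the per-class IoU values.
4. $PA = \text{sum of diagonal elements} / \text{sum of all elements}$
5. MPA: for each class, divide the number of correctly predicted pixels (the diagonal element of its row) by the total number of pixels of that class (the row sum), then average over the classes.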
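
The five steps can be sketched on a small example. The arrays `L` and `P` below are hypothetical: they were chosen only to agree with the counts quoted above (2 pixels with label 1 predicted as 1, 1 pixel with label 1 predicted as 5, row sum 3 and column sum 4 for class 1) and are not the data from the original figure.

```python
import numpy as np

n = 6  # number of classes
L = np.array([1, 1, 1, 0, 2, 3, 3, 4, 5, 2])  # hypothetical labels, 10 pixels
P = np.array([1, 1, 5, 0, 1, 3, 1, 4, 5, 2])  # hypothetical predictions

# Step 1: each (label, prediction) pair maps to the bin n * label + prediction.
confusion = np.bincount(n * L + P, minlength=n ** 2).reshape(n, n)

diag = np.diag(confusion)        # correctly classified pixels per class
row_sum = confusion.sum(axis=1)  # pixels per class in the labels
col_sum = confusion.sum(axis=0)  # pixels per class in the predictions

iou = diag / (row_sum + col_sum - diag)  # step 2
miou = iou.mean()                        # step 3
pa = diag.sum() / confusion.sum()        # step 4
mpa = (diag / row_sum).mean()            # step 5

print(confusion[1])            # [0 2 0 0 0 1]
print(row_sum[1], col_sum[1])  # 3 4
print(miou, pa, mpa)
```

For class 1 this gives IoU = 2 / (3 + 4 - 2) = 0.4. Every class occurs in `L` here, so no division by zero arises; with absent classes the divisions would need guarding, as the full code below does.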

Code for computing the metrics

from __future__ import division

import numpy as np
import six


def calc_semantic_segmentation_confusion(pred_labels, gt_labels):
    """Collect a confusion matrix.

    The number of classes :math:`n\_class` is
    :math:`max(pred\_labels, gt\_labels) + 1`, which is
    the maximum class id of the inputs added by one.

    Args:
        pred_labels (iterable of numpy.ndarray): A collection of predicted
            labels. The shape of a label array
            is :math:`(H, W)`. :math:`H` and :math:`W`
            are height and width of the label.
        gt_labels (iterable of numpy.ndarray): A collection of ground
            truth labels. The shape of a ground truth label array is
            :math:`(H, W)`, and its corresponding prediction label should
            have the same shape.
            A pixel with value :obj:`-1` will be ignored during evaluation.

    Returns:
        numpy.ndarray:
        A confusion matrix. Its shape is :math:`(n\_class, n\_class)`.
        The :math:`(i, j)` th element corresponds to the number of pixels
        that are labeled as class :math:`i` by the ground truth and
        class :math:`j` by the prediction.

    """
    pred_labels = iter(pred_labels)     # (352, 480)
    gt_labels = iter(gt_labels)     # (352, 480)

    n_class = 12  # initial number of classes (12 here); expanded below if a larger id appears
    confusion = np.zeros((n_class, n_class), dtype=np.int64)    # (12, 12)
    for pred_label, gt_label in six.moves.zip(pred_labels, gt_labels):
        if pred_label.ndim != 2 or gt_label.ndim != 2:
            raise ValueError('ndim of labels should be two.')
        if pred_label.shape != gt_label.shape:
            raise ValueError('Shape of ground truth and prediction should'
                             ' be same.')
        pred_label = pred_label.flatten()   # (168960, )
        gt_label = gt_label.flatten()   # (168960, )

        # Dynamically expand the confusion matrix if necessary.
        lb_max = np.max((pred_label, gt_label))
        # print(lb_max)
        if lb_max >= n_class:
            expanded_confusion = np.zeros(
                (lb_max + 1, lb_max + 1), dtype=np.int64)
            expanded_confusion[0:n_class, 0:n_class] = confusion

            n_class = lb_max + 1
            confusion = expanded_confusion

        # Count statistics from valid pixels. Multiplying the ground-truth id by
        # n_class maps each (i, j) pair to its own bin, i * n_class + j.
        mask = gt_label >= 0
        # Key step:
        confusion += np.bincount(
            n_class * gt_label[mask].astype(int) + pred_label[mask],
            minlength=n_class ** 2)\
            .reshape((n_class, n_class))

    for iter_ in (pred_labels, gt_labels):
        # This code assumes any iterator does not contain None as its items.
        if next(iter_, None) is not None:
            raise ValueError('Length of input iterables need to be same')

    return confusion


def calc_semantic_segmentation_iou(confusion):
    """Calculate Intersection over Union with a given confusion matrix.

    The definition of Intersection over Union (IoU) is as follows,
    where :math:`N_{ij}` is the number of pixels
    that are labeled as class :math:`i` by the ground truth and
    class :math:`j` by the prediction.

    * :math:`\\text{IoU of the i-th class} =  \
        \\frac{N_{ii}}{\\sum_{j=1}^k N_{ij} + \\sum_{j=1}^k N_{ji} - N_{ii}}`

    Args:
        confusion (numpy.ndarray): A confusion matrix. Its shape is
            :math:`(n\_class, n\_class)`.
            The :math:`(i, j)` th element corresponds to the number of pixels
            that are labeled as class :math:`i` by the ground truth and
            class :math:`j` by the prediction.

    Returns:
        numpy.ndarray:
        An array of IoUs for the :math:`n\_class` classes. Its shape is
        :math:`(n\_class,)`.

    """
    iou_denominator = (confusion.sum(axis=1) + confusion.sum(axis=0)
                       - np.diag(confusion))
    iou = np.diag(confusion) / iou_denominator
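    # Note: the index of the background class differs between datasets.
    return iou[:-1]  # the last class is background; exclude it
    # return iou[1:]  # use this if the first class is background
    # return iou  # use this to include background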


def eval_semantic_segmentation(pred_labels, gt_labels):
    """Evaluate metrics used in Semantic Segmentation.

    This function calculates Intersection over Union (IoU), Pixel Accuracy
    and Class Accuracy for the task of semantic segmentation.

    The definition of metrics calculated by this function is as follows,
    where :math:`N_{ij}` is the number of pixels
    that are labeled as class :math:`i` by the ground truth and
    class :math:`j` by the prediction.

    * :math:`\\text{IoU of the i-th class} =  \
        \\frac{N_{ii}}{\\sum_{j=1}^k N_{ij} + \\sum_{j=1}^k N_{ji} - N_{ii}}`
    * :math:`\\text{mIoU} = \\frac{1}{k} \
        \\sum_{i=1}^k \
        \\frac{N_{ii}}{\\sum_{j=1}^k N_{ij} + \\sum_{j=1}^k N_{ji} - N_{ii}}`
    * :math:`\\text{Pixel Accuracy} =  \
        \\frac \
        {\\sum_{i=1}^k N_{ii}} \
        {\\sum_{i=1}^k \\sum_{j=1}^k N_{ij}}`
    * :math:`\\text{Class Accuracy} = \
        \\frac{N_{ii}}{\\sum_{j=1}^k N_{ij}}`
    * :math:`\\text{Mean Class Accuracy} = \\frac{1}{k} \
        \\sum_{i=1}^k \
        \\frac{N_{ii}}{\\sum_{j=1}^k N_{ij}}`

    The more detailed description of the above metrics can be found in a
    review on semantic segmentation [#]_.

    The number of classes :math:`n\_class` is
    :math:`max(pred\_labels, gt\_labels) + 1`, which is
    the maximum class id of the inputs added by one.

    .. [#] Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, \
    Victor Villena-Martinez, Jose Garcia-Rodriguez. \
    `A Review on Deep Learning Techniques Applied to Semantic Segmentation \
    <https://arxiv.org/abs/1704.06857>`_. arXiv 2017.

    Args:
        pred_labels (iterable of numpy.ndarray): A collection of predicted
            labels. The shape of a label array
            is :math:`(H, W)`. :math:`H` and :math:`W`
            are height and width of the label.
            For example, this is a list of labels
            :obj:`[label_0, label_1, ...]`, where
            :obj:`label_i.shape = (H_i, W_i)`.
        gt_labels (iterable of numpy.ndarray): A collection of ground
            truth labels. The shape of a ground truth label array is
            :math:`(H, W)`, and its corresponding prediction label should
            have the same shape.
            A pixel with value :obj:`-1` will be ignored during evaluation.

    Returns:
        dict:

        The keys, value-types and the description of the values are listed
        below.

        * **iou** (*numpy.ndarray*): An array of IoUs for the \
            :math:`n\_class` classes. Its shape is :math:`(n\_class,)`.
        * **miou** (*float*): The average of IoUs over classes.
        * **pixel_accuracy** (*float*): The computed pixel accuracy.
        * **class_accuracy** (*numpy.ndarray*): An array of class accuracies \
            for the :math:`n\_class` classes. \
            Its shape is :math:`(n\_class,)`.
        * **mean_class_accuracy** (*float*): The average of class accuracies.

    # Evaluation code is based on
    # https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/
    # score.py#L37
    """

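    # Confusion matrix: rows = ground truth, columns = prediction, diagonal = correct.
    confusion = calc_semantic_segmentation_confusion(
        pred_labels, gt_labels)
    # Per-class IoU; shape (n_class - 1,), e.g. (11,) for 12 classes,
    # because calc_semantic_segmentation_iou drops the background class.
    iou = calc_semantic_segmentation_iou(confusion)
    # PA: correctly labeled pixels / all counted pixels.
    pixel_accuracy = np.diag(confusion).sum() / confusion.sum()
    # Per-class accuracy: diagonal / row sum (axis=1 sums along each row).
    # The 1e-10 avoids division by zero, so an absent class scores 0, not NaN.
    class_accuracy = np.diag(confusion) / (np.sum(confusion, axis=1) + 1e-10)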

    return {'iou': iou, 'miou': np.nanmean(iou),
            'pixel_accuracy': pixel_accuracy,
            'class_accuracy': class_accuracy,
            'mean_class_accuracy': np.nanmean(class_accuracy[:-1])}
            # 'mean_class_accuracy': np.nanmean(class_accuracy)}
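
The role of the `mask = gt_label >= 0` line can be seen in isolation with a small, self-contained sketch. The arrays below are hypothetical; the point is that pixels labeled -1 are dropped before the bin indices are formed, since `np.bincount` does not accept negative values.

```python
import numpy as np

n_class = 3
gt = np.array([0, 1, -1, 2, -1, 1])  # hypothetical labels; -1 marks ignored pixels
pred = np.array([0, 1, 2, 2, 0, 0])  # hypothetical predictions

mask = gt >= 0  # keep only pixels that carry a valid label
confusion = np.bincount(
    n_class * gt[mask] + pred[mask], minlength=n_class ** 2
).reshape(n_class, n_class)

print(confusion.sum())  # 4: the two ignored pixels are not counted
```

Because ignored pixels never enter the matrix, they affect neither the numerator nor the denominator of IoU, PA or the class accuracies.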