Computing Common Metrics for Image Tasks

倔强青铜ⅳ

Last modified: 2023-12-12 19:47:28

Tags: artificial intelligence, machine learning, deep learning, computer vision

First published: 2023-12-12 19:46:57

Article link: https://blog.csdn.net/weixin_53162487/article/details/134957019

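The classic core tasks of computer vision are image classification, object detection, and image segmentation. Evaluating models on these tasks relies on a handful of standard metrics. This post describes how the common ones are computed and gives simple code implementations.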

Metrics for image classification

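Accuracy, precision, recall, and F1 score are the standard performance metrics for classification tasks. Their definitions and formulas are as follows:
Accuracy: the fraction of all instances that are classified correctly. The formula is: $\text{Accuracy} = \frac{\text{True Positives (TP)} + \text{True Negatives (TN)}}{\text{Total Number of Instances}}$
Precision: the fraction of instances predicted as positive that are actually positive. The formula is: $\text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}}$
Recall (also called sensitivity): the fraction of actual positive instances that are predicted as positive. The formula is: $\text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}}$
F1 score: the harmonic mean of precision and recall, which summarizes both in a single number and gives them equal weight. The formula is: $\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
In these formulas:
True Positives (TP): the number of instances correctly predicted as positive.
True Negatives (TN): the number of instances correctly predicted as negative.
False Positives (FP): the number of instances incorrectly predicted as positive.
False Negatives (FN): the number of instances incorrectly predicted as negative.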
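
As a worked instance of the four formulas, the sketch below plugs in a set of made-up confusion counts. The numbers are purely illustrative and do not come from any real model.

```python
# Hypothetical confusion counts for a binary classifier (illustrative only).
tp, tn, fp, fn = 40, 30, 10, 20

total = tp + tn + fp + fn
accuracy = (tp + tn) / total        # 70 / 100
precision = tp / (tp + fp)          # 40 / 50
recall = tp / (tp + fn)             # 40 / 60
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Here precision is higher than recall because the classifier raises few false alarms (10) but misses more positives (20), and F1 lands between the two.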

These metrics are most often used in binary classification, but they extend to multi-class problems as well. The functions below implement them:

import torch

def calculate_accuracy(y_pred, y_true):
    """Compute accuracy from per-class scores of shape (N, C) and labels of shape (N,)."""
    _, predicted = torch.max(y_pred, 1)  # index of the highest-scoring class
    correct = (predicted == y_true).sum().item()
    accuracy = correct / y_true.size(0)
    return accuracy

def precision_recall_f1(y_pred, y_true, class_id):
    """Compute precision, recall, and F1 score for one specific class."""
    _, predicted = torch.max(y_pred, 1)
    true_positives = ((predicted == class_id) & (y_true == class_id)).sum().item()
    predicted_positives = (predicted == class_id).sum().item()
    actual_positives = (y_true == class_id).sum().item()

    # Fall back to 0.0 when a denominator is zero
    precision = true_positives / predicted_positives if predicted_positives > 0 else 0.0
    recall = true_positives / actual_positives if actual_positives > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0

    return precision, recall, f1

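To use these functions, pass in the model's predictions y_pred and the ground-truth labels y_true. For multi-class problems, you can compute precision, recall, and F1 score for each class separately, or average them over all classes. Note that these functions assume y_pred holds one score per class for each sample, with shape (N, C), for example raw logits before softmax, and that y_true is a tensor of class indices. Because softmax does not change which class has the highest score, probabilities work just as well as logits.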
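
Averaging over all classes can be sketched as follows. This is a minimal NumPy version of an unweighted (macro) average, not the only way to average; the function name macro_precision_recall_f1 is hypothetical, and it assumes the predictions have already been reduced to class indices (for example with an argmax over the scores). Classes with an empty denominator contribute 0, which is one possible convention.

```python
import numpy as np

def macro_precision_recall_f1(predicted, y_true, n_classes):
    """Unweighted (macro) average of per-class precision, recall and F1.

    `predicted` and `y_true` are 1-D sequences of class indices.
    """
    predicted = np.asarray(predicted)
    y_true = np.asarray(y_true)
    precisions, recalls, f1s = [], [], []
    for class_id in range(n_classes):
        tp = np.sum((predicted == class_id) & (y_true == class_id))
        predicted_positives = np.sum(predicted == class_id)
        actual_positives = np.sum(y_true == class_id)
        # Assumed convention: an undefined ratio counts as 0.0
        p = tp / predicted_positives if predicted_positives > 0 else 0.0
        r = tp / actual_positives if actual_positives > 0 else 0.0
        f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return float(np.mean(precisions)), float(np.mean(recalls)), float(np.mean(f1s))

# Illustrative data: 6 samples, 3 classes
predicted = [0, 1, 2, 1, 0, 2]
y_true = [0, 1, 1, 1, 0, 2]
macro_p, macro_r, macro_f1 = macro_precision_recall_f1(predicted, y_true, n_classes=3)
print(f"macro precision={macro_p:.3f} recall={macro_r:.3f} f1={macro_f1:.3f}")
```

A macro average treats every class equally regardless of how many samples it has, so rare classes influence the result as much as frequent ones.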

Metrics for image segmentation and detection

mIoU (mean Intersection over Union), mAP (mean Average Precision), and FPS (Frames Per Second) are commonly used evaluation metrics. The sections below explain what each one means and how it is computed.

1. mIoU (mean Intersection over Union)

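mIoU is a common metric in image segmentation (and detection). It measures how much the predicted segmentation region (or detection box) overlaps with the ground-truth region (or box).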

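IoU (Intersection over Union): for a single class, IoU is the ratio of the intersection to the union of the predicted region and the ground-truth region. In detection, the IoU between a predicted box and a ground-truth box is compared against a threshold (commonly 0.5) to decide whether the prediction counts as a true positive ($\text{TP}$) or a false positive ($\text{FP}$).
mIoU (mean IoU): the average of the IoU values over all classes.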

import math

def calculate_iou(pred, target, n_classes):
    """Per-class IoU; pred and target are torch tensors of class indices."""
    ious = []
    for cls in range(n_classes):
        pred_inds = (pred == cls)
        target_inds = (target == cls)
        intersection = (pred_inds & target_inds).sum().item()  # true positives
        union = pred_inds.sum().item() + target_inds.sum().item() - intersection
        if union == 0:
            ious.append(float('nan'))  # class absent from both prediction and target
        else:
            ious.append(intersection / union)
    return ious

def mean_iou(pred, target, n_classes):
    """Mean IoU over the classes that appear in the prediction or the target."""
    ious = calculate_iou(pred, target, n_classes)
    valid_ious = [iou for iou in ious if not math.isnan(iou)]  # skip absent classes
    if not valid_ious:
        return float('nan')  # no class is present at all
    return sum(valid_ious) / len(valid_ious)
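
The functions above work on per-pixel label maps. For detection boxes, the same ratio and the threshold test can be sketched as follows. This is a minimal illustration that assumes axis-aligned boxes in (x1, y1, x2, y2) corner format; the names box_iou and is_true_positive are hypothetical, and real benchmarks add further matching rules (for example, each ground-truth box can be matched only once).

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, iou_threshold=0.5):
    """Apply the IoU threshold test to one predicted / ground-truth pair."""
    return box_iou(pred_box, gt_box) >= iou_threshold

gt = (0, 0, 10, 10)
loose = (5, 0, 15, 10)   # overlaps half of the ground-truth box
tight = (2, 0, 12, 10)   # overlaps most of the ground-truth box
print(box_iou(loose, gt), is_true_positive(loose, gt))
print(box_iou(tight, gt), is_true_positive(tight, gt))
```

The first prediction has an IoU of 50 / 150 and fails the 0.5 threshold, while the second has 80 / 120 and passes it.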

2. mAP (mean Average Precision)

mAP is an important metric in image detection and is used especially often in object detection tasks.

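AP (Average Precision): for a single class, AP is the area under the precision-recall curve that is traced out as the confidence threshold on the model's predictions for that class is varied.
mAP (mean AP): the average of the AP values over all classes.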

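Computing mAP is a relatively involved process, especially in object detection. It requires building a precision-recall curve for each class and computing the area under each curve. The process usually consists of the following steps:

For each class, compute precision and recall at different confidence thresholds.
For each class, compute the AP (Average Precision) across all confidence thresholds.
Average the AP values over all classes to obtain mAP.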

from sklearn.metrics import average_precision_score
import numpy as np

def calculate_ap_per_class(y_true, y_scores, class_id):
    """Compute the AP of a single class."""
    # Mark samples whose true label is the current class as 1, all others as 0
    y_true_class = (y_true == class_id).astype(int)
    y_scores_class = y_scores[:, class_id]
    ap = average_precision_score(y_true_class, y_scores_class)
    return ap

def calculate_map(y_true, y_scores, n_classes):
    """Compute mAP over all classes."""
    aps = []
    for class_id in range(n_classes):
        ap = calculate_ap_per_class(y_true, y_scores, class_id)
        aps.append(ap)
    return np.nanmean(aps)  # ignore nan values

# Example data
y_true = np.array([0, 1, 2, 1, 0])  # ground-truth classes
rng = np.random.default_rng(0)  # fixed seed so the example is reproducible
y_scores = rng.random((5, 3))  # randomly generated prediction confidences

# Compute mAP
map_score = calculate_map(y_true, y_scores, n_classes=3)
print("mAP:", map_score)

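This example uses sklearn.metrics.average_precision_score to compute the AP of each class, treating each class as a one-vs-rest problem. Note that y_true holds the ground-truth class label of each sample, and y_scores holds the confidence the model outputs for each class.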

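More complex settings such as object detection involve comparing bounding boxes and usually need additional processing, including steps such as Non-Maximum Suppression (NMS). In practice, it is strongly recommended to compute mAP with a mature library or by following the evaluation protocol of the specific dataset, such as the standard procedures for PASCAL VOC or COCO.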
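
To give a feel for what a detection-style AP computation involves, here is a minimal sketch for one class. It assumes the matching step is already done, so each detection arrives with a confidence score and a flag saying whether it matched a ground-truth box. It integrates the precision-recall curve with all-point interpolation, which is one common choice; it is not a reimplementation of the official PASCAL VOC or COCO evaluation, and the function name average_precision_from_detections is hypothetical.

```python
import numpy as np

def average_precision_from_detections(scores, is_tp, n_ground_truth):
    """AP for one class from detection scores and true-positive flags.

    scores: confidence of each detection.
    is_tp: 1 if the detection matched a ground-truth box, else 0
           (assumed to be computed beforehand, e.g. by an IoU threshold).
    n_ground_truth: number of ground-truth boxes of this class.
    """
    if n_ground_truth == 0:
        return float("nan")  # AP is undefined without ground truth
    order = np.argsort(-np.asarray(scores, dtype=float))  # high confidence first
    tp_flags = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp_flags)
    cum_fp = np.cumsum(1.0 - tp_flags)
    recall = cum_tp / n_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)
    # Sentinel points so the curve spans recall 0 to 1
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([0.0], precision, [0.0]))
    # Replace each precision by the maximum precision at any higher recall
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum rectangle areas at the points where recall increases
    idx = np.where(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[idx + 1] - recall[idx]) * precision[idx + 1]))

# Illustrative data: 4 detections, 2 ground-truth boxes
ap = average_precision_from_detections(
    scores=[0.9, 0.8, 0.7, 0.6], is_tp=[1, 0, 1, 0], n_ground_truth=2
)
print(f"AP: {ap:.4f}")
```

Averaging this value over classes gives mAP for a single IoU threshold. Benchmark protocols differ in details such as the interpolation scheme and the set of IoU thresholds, which is one more reason to rely on their official tools for reported numbers.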

3. FPS (Frames Per Second)

FPS is a measure of model speed: the number of frames processed per second.

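import time
import torch

def calculate_fps(model, inputs, iterations=100, warmup=10):
    """Estimate throughput as forward passes per second."""
    with torch.no_grad():  # gradients are not needed for timing inference
        for _ in range(warmup):  # warm-up runs are excluded from the timing
            model(inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for queued GPU work before starting the clock
        start_time = time.perf_counter()
        for _ in range(iterations):
            model(inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
        end_time = time.perf_counter()
    return iterations / (end_time - start_time)

This function estimates FPS by timing a fixed number of forward passes. Here model is your model and inputs is the sample data fed to it; the result equals frames per second when inputs holds a single frame. A few warm-up runs are excluded from the timing, and on a GPU the device is synchronized before the clock is read because CUDA kernels are launched asynchronously.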
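
When each call processes a batch of several frames, the call rate has to be scaled by the batch size to get frames per second. The sketch below shows that bookkeeping with a framework-free timer. The name measure_throughput, its parameters, and the stand-in workload fake_inference are all hypothetical; with a real model you would pass a callable that runs one forward pass.

```python
import time

def measure_throughput(run_once, frames_per_call=1, iterations=50, warmup=5):
    """Return (frames per second, mean latency per call in milliseconds)."""
    for _ in range(warmup):  # warm-up calls are not timed
        run_once()
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    fps = iterations * frames_per_call / elapsed
    latency_ms = 1000.0 * elapsed / iterations
    return fps, latency_ms

def fake_inference():
    """Stand-in workload used only to make the example runnable."""
    return sum(i * i for i in range(10_000))

fps, latency_ms = measure_throughput(fake_inference, frames_per_call=8)
print(f"{fps:.1f} frames/s, {latency_ms:.3f} ms per call")
```

Reporting latency alongside FPS is useful because a larger batch usually raises throughput while also making each call take longer.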

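P.S. The exact implementation of these metrics can vary with your task and dataset. For more precise implementations, refer to the official code of the relevant datasets and models.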

4. Dice Score

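The Dice coefficient is an effective measure for comparing image segmentations, especially in medical image processing. It measures the similarity of two samples and is typically used to compare a predicted segmentation with the ground truth. The method assumes the segmentation is binarized, meaning every pixel belongs either to the foreground or to the background. Its value ranges from 0 to 1, where 1 means a perfect match and 0 means no overlap.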

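The Dice coefficient is defined as:
$\text{Dice} = \frac{2 \times |X \cap Y|}{|X| + |Y|}$
where:

$X$ is the predicted segmentation.
$Y$ is the reference (ground-truth) segmentation.
$|X \cap Y|$ is the size of the intersection of $X$ and $Y$.
$|X|$ and $|Y|$ are the numbers of elements in $X$ and $Y$, respectively.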

import numpy as np

def dice_coefficient(y_true, y_pred):
    """Dice coefficient of two binary masks (arrays of 0s and 1s)."""
    intersection = np.sum(y_true * y_pred)
    total = np.sum(y_true) + np.sum(y_pred)
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return (2. * intersection) / total

# Example data
y_true = np.array([[1, 1, 0, 0],
                   [0, 1, 1, 0],
                   [0, 0, 1, 1]])

y_pred = np.array([[1, 0, 1, 0],
                   [1, 1, 0, 1],
                   [0, 1, 1, 1]])

dice = dice_coefficient(y_true, y_pred)
print(f"Dice Coefficient: {dice}")
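
For binary masks, Dice and IoU are computed from the same overlap counts and are tied together by $\text{Dice} = \frac{2 \times \text{IoU}}{1 + \text{IoU}}$. The sketch below checks this numerically on the example masks. The function name dice_and_iou is hypothetical, and returning 1.0 for two empty masks is an assumed convention.

```python
import numpy as np

def dice_and_iou(y_true, y_pred):
    """Return (Dice, IoU) for two binary masks."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    total = y_true.sum() + y_pred.sum()
    # Assumed convention: two empty masks are treated as a perfect match
    dice = 2.0 * intersection / total if total > 0 else 1.0
    iou = intersection / union if union > 0 else 1.0
    return float(dice), float(iou)

y_true = np.array([[1, 1, 0, 0],
                   [0, 1, 1, 0],
                   [0, 0, 1, 1]])
y_pred = np.array([[1, 0, 1, 0],
                   [1, 1, 0, 1],
                   [0, 1, 1, 1]])

dice, iou = dice_and_iou(y_true, y_pred)
print(f"Dice: {dice:.4f}, IoU: {iou:.4f}, 2*IoU/(1+IoU): {2 * iou / (1 + iou):.4f}")
```

The masks share 4 foreground pixels out of 6 and 8 respectively, so Dice is 8 / 14 and IoU is 4 / 10. Dice is never smaller than IoU for the same pair of masks, so the two scores should not be compared directly.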