【CV\segmentation】Evaluation Metrics for Instance Segmentation in Competitions || Study Notes

【start:20231115】

Introduction

Motivation

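Instance segmentation is a key task in computer vision: it is essential for accurately identifying and localizing the individual objects in an image. Evaluation metrics are indispensable here. They quantify model performance, and they also guide research directions and algorithm optimization.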

Overview

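Below we describe in detail several common evaluation metrics for instance segmentation, including:

  • mIoU (mean Intersection over Union)
  • ImAP (Instance mean Average Precision)
  • F1 score
  • AJI (Aggregated Jaccard Index)
  • mPQ (multi-class Panoptic Quality)
  • Running time

We then introduce some special evaluation metrics that are common in instance segmentation competitions.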

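References

【ref】【Intuitive explanation】The meaning of common deep learning evaluation metrics: TP, FP, TN, FN, Accuracy, Recall, IoU, mIoU

【ref】Computing TP, FP, FN and F1 for instance segmentation (with code)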

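Common evaluation metrics

TP, FP, TN, FN

TP, FP, TN and FN are the most basic counts used in machine-learning evaluation.

For a given class A:

T = true, the prediction is correct; F = false, the prediction is wrong;

P = Positive, the prediction is A; N = Negative, the prediction is not A.

  • TP (True Positive): the number of samples correctly classified as A, i.e. predicted as A and actually A.
  • FP (False Positive): the number of samples wrongly classified as A, i.e. predicted as A but actually not A.
  • TN (True Negative): the number of samples correctly classified as not A, i.e. predicted as not A and actually not A.
  • FN (False Negative): the number of samples wrongly classified as not A, i.e. predicted as not A but actually A.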
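The four definitions can be turned into a small counting routine. This is a minimal sketch, assuming one ground-truth label and one predicted label per sample; `confusion_counts` is a hypothetical helper name and the label lists are made-up examples.

```python
def confusion_counts(y_true, y_pred, cls):
    """Count (TP, FP, TN, FN) for class `cls`; every other label counts as "not cls"."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == cls and truth == cls:
            tp += 1  # predicted A, actually A
        elif pred == cls:
            fp += 1  # predicted A, actually not A
        elif truth == cls:
            fn += 1  # predicted not A, actually A
        else:
            tn += 1  # predicted not A, actually not A
    return tp, fp, tn, fn


y_true = ["A", "A", "B", "B", "A", "C"]
y_pred = ["A", "B", "A", "B", "A", "C"]
print(confusion_counts(y_true, y_pred, "A"))  # (2, 1, 2, 1)
```

The same labels give different counts for each class, because "positive" always means "predicted as the class under consideration".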

Precision, Accuracy, Recall

  • Precision: P = TP/(TP+FP), computed from the confusion matrix

  • Recall: R = TP/(TP+FN)

  • Accuracy: accuracy = (TP+TN)/(TP+TN+FP+FN)
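A minimal sketch of the three ratios computed from the counts. It assumes the counts are already available, and it reports a 0/0 ratio as 0.0, which is a convention chosen here rather than a standard; the function name and counts are hypothetical.

```python
def precision_recall_accuracy(tp, fp, tn, fn):
    """Return (precision, recall, accuracy); a 0/0 ratio is reported as 0.0 (a convention)."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    return precision, recall, accuracy


# Made-up counts: 8 TP, 2 FP, 85 TN, 5 FN.
p, r, acc = precision_recall_accuracy(8, 2, 85, 5)
print(f"P={p:.3f}  R={r:.3f}  Acc={acc:.3f}")  # P=0.800  R=0.615  Acc=0.930
```

Here accuracy is dominated by the 85 true negatives, while recall shows that 5 of the 13 positives were missed. This is why precision and recall are preferred when positives are rare, as individual objects usually are against a large background.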

F1 Score

The F1 score is the harmonic mean of precision and recall, so it takes both into account at once.

It is computed as:

F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}

where:

  • Precision is the fraction of samples predicted as positive that are actually positive: \frac{TP}{TP + FP}
  • Recall is the fraction of actually positive samples that the model predicts as positive: \frac{TP}{TP + FN}
  • TP is the number of true positives, FP the number of false positives and FN the number of false negatives.
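Substituting the two definitions into the formula gives the equivalent count form F1 = 2TP / (2TP + FP + FN). The sketch below checks the equivalence on made-up counts and shows that the harmonic mean lies below the arithmetic mean; both function names are hypothetical.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def f1_from_counts(tp, fp, fn):
    """Equivalent count form: F1 = 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


tp, fp, fn = 8, 2, 5  # made-up counts
precision, recall = tp / (tp + fp), tp / (tp + fn)
print(f1_from_pr(precision, recall), f1_from_counts(tp, fp, fn))  # both ≈ 0.696 (= 16/23)
print((precision + recall) / 2)  # arithmetic mean ≈ 0.708, higher than F1
```

The count form avoids computing precision and recall separately and needs only one zero check, which is convenient in instance-level evaluation where TP, FP and FN come straight out of a matching step.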

mAP(Mean Average Precision)

ImAP(Instance Mean Average Precision)

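ImAP combines precision and recall: it computes the AP (Average Precision) of each class and then averages it over all classes, giving an overall assessment of model performance.

It is computed as:

ImAP = \frac{1}{C} \sum_{c=1}^{C} AP_c

where:

  • C is the total number of classes.
  • AP_c is the average precision of class c.

The average precision AP_c of each class is derived from precision and recall, usually as the area under the precision-recall curve (AUC-PR). The exact procedure varies in practice: typically precision and recall are computed at discrete points (one per confidence threshold), and the curve is then interpolated before the area under it is taken.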
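A sketch of one common variant, all-point interpolation: predictions are sorted by confidence, precision and recall are recorded after each one, each precision is replaced by the maximum precision at any later (higher-recall) point, and the rectangle areas are summed. It assumes each prediction has already been marked as matched or unmatched (for example at IoU ≥ 0.5); the function names and data are hypothetical, and a given challenge's protocol (thresholds, interpolation) may differ.

```python
def average_precision(scores, is_tp, num_gt):
    """AP of one class: area under the interpolated precision-recall curve (all-point).

    scores: confidence of each predicted instance
    is_tp:  whether each prediction was matched to a ground-truth instance
            (the matching rule, e.g. IoU >= 0.5, is assumed to be applied beforehand)
    num_gt: number of ground-truth instances of this class
    """
    if num_gt == 0:
        return 0.0  # convention chosen here; benchmarks treat this case differently
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk down the ranking: one threshold per prediction
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolation: each precision becomes the max precision at its own or any later point.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area: add a rectangle every time recall increases.
    ap, prev_recall = 0.0, 0.0
    for rec, prec in zip(recalls, precisions):
        ap += (rec - prev_recall) * prec
        prev_recall = rec
    return ap


def imap(per_class):
    """ImAP = (1/C) * sum of AP_c; per_class maps class -> (scores, is_tp, num_gt)."""
    aps = [average_precision(*args) for args in per_class.values()]
    return sum(aps) / len(aps)


per_class = {  # made-up detections
    "cell_a": ([0.9, 0.8, 0.7, 0.6], [True, False, True, False], 3),
    "cell_b": ([0.95], [True], 1),
}
print(average_precision(*per_class["cell_a"]))  # ≈ 0.556 (= 5/9)
print(imap(per_class))  # ≈ 0.778 (= 7/9)
```

For "cell_a" the third ground-truth instance is never found, so recall stops at 2/3 and the area beyond it counts as zero: missed instances lower AP even when every retained prediction is ranked well.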

Jaccard Index (JI, also known as IoU: Intersection over Union)

IoU

IoU(Intersection over Union)

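For a given instance, IoU is the ratio of the intersection of the predicted region and the ground-truth region to their union:
IoU = \frac{|M \cap G|}{|M \cup G|}
where M is the predicted segmentation region of the instance and G is its ground-truth segmentation region.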
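A minimal sketch of the formula, assuming masks are stored as sets of (row, col) foreground pixels so that ∩ and ∪ map directly onto Python set operators; `to_pixels` and `iou` are hypothetical helper names, and a real pipeline would more likely use NumPy boolean arrays.

```python
def to_pixels(mask):
    """Convert a binary mask (list of rows of 0/1) into a set of (row, col) foreground pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def iou(m, g):
    """IoU = |M ∩ G| / |M ∪ G|; two empty masks give 0.0 here (some code returns 1.0 instead)."""
    union = len(m | g)
    return len(m & g) / union if union else 0.0


M = to_pixels([[1, 1, 0, 0],
               [1, 1, 0, 0],
               [0, 0, 0, 0]])
G = to_pixels([[0, 1, 1, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 0]])
print(iou(M, G))  # 2 shared pixels / 6 pixels in the union = 0.333...
```

Shifting a 2×2 square by a single column already drops IoU to 1/3, which illustrates how sensitive IoU is for small objects such as cell nuclei.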

mIoU

mIoU(Mean Intersection over Union)

mIoU is the mean of the IoU over all instances:
mIoU = \frac{1}{N} \sum_{i=1}^{N} IoU_i
where N is the total number of instances and IoU_i is the IoU of the i-th instance.
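The formula leaves open which prediction the IoU of instance i is measured against. This sketch assumes label maps (0 = background, each positive integer = one instance) and takes IoU_i as the best IoU between ground-truth instance i and any predicted instance, or 0.0 if none overlaps it. That matching choice and all function names are assumptions for illustration.

```python
def instances(label_map):
    """Split a label map (0 = background, each positive integer = one instance) into pixel sets."""
    inst = {}
    for r, row in enumerate(label_map):
        for c, v in enumerate(row):
            if v:
                inst.setdefault(v, set()).add((r, c))
    return inst


def iou(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_instance_iou(gt_map, pred_map):
    """mIoU = (1/N) * sum of IoU_i over the N ground-truth instances.

    Assumption: IoU_i is the best IoU between ground-truth instance i and any
    predicted instance, or 0.0 when nothing overlaps it.
    """
    gt, pred = instances(gt_map), instances(pred_map)
    if not gt:
        return 0.0
    ious = [max((iou(g, p) for p in pred.values()), default=0.0) for g in gt.values()]
    return sum(ious) / len(ious)


gt_map = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 2, 2],
          [0, 0, 2, 2]]
pred_map = [[5, 5, 0, 0],
            [5, 5, 0, 0],
            [0, 0, 0, 7],
            [0, 0, 0, 7]]
print(mean_instance_iou(gt_map, pred_map))  # (1.0 + 0.5) / 2 = 0.75
```

Because the average runs over ground-truth instances only, extra (false-positive) predicted instances do not lower this score. That gap is one reason AJI and PQ add explicit penalties for unmatched predictions.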

AJI(Aggregated Jaccard Index)

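The Jaccard index of an instance is the ratio of the intersection to the union of the predicted instance region and the ground-truth region.

AJI combines the Jaccard indices of all instances through a weighted average into a single aggregated Jaccard index, which measures how similar the model's segmentation is to the ground-truth segmentation.

It is computed as:

AJI = \frac{\sum_{k=1}^{K} w_k \cdot JI_k}{\sum_{k=1}^{K} w_k}

where:

  • K is the total number of ground-truth instances.
  • JI_k is the Jaccard index of the k-th instance, computed as \frac{|M_k \cap G_k|}{|M_k \cup G_k|}, where M_k is the predicted segmentation region matched to the k-th instance and G_k is the ground-truth segmentation region of the k-th instance.
  • w_k is the weight of the k-th instance, a pixel count. With w_k = |M_k \cup G_k| the formula reduces to AJI = \frac{\sum_k |M_k \cap G_k|}{\sum_k |M_k \cup G_k|}, the form given in Algorithm 1 of the MoNuSeg paper, which additionally adds the pixels of every predicted instance not matched to any ground-truth instance to the denominator.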
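A pure-Python sketch of AJI as the ratio C / U. For each ground-truth instance, pick the predicted instance with the highest IoU, add their intersection to C and their union to U. Then add the area of every predicted instance that was never picked to U. This follows the usual description of AJI; `aggregated_jaccard_index` and `instances` are hypothetical names, and the label-map format (0 = background, one positive integer per instance) is an assumption.

```python
def instances(label_map):
    """Split a label map (0 = background, each positive integer = one instance) into pixel sets."""
    inst = {}
    for r, row in enumerate(label_map):
        for c, v in enumerate(row):
            if v:
                inst.setdefault(v, set()).add((r, c))
    return inst


def aggregated_jaccard_index(gt_map, pred_map):
    """AJI = C / U: summed intersections over summed unions of best-matching pairs (sketch)."""
    gt, pred = instances(gt_map), instances(pred_map)

    def iou(a, b):
        return len(a & b) / len(a | b)

    inter_sum = union_sum = 0
    used = set()
    for g in gt.values():
        if not pred:
            union_sum += len(g)  # nothing to match: the GT area only enlarges U
            continue
        # Predicted instance with the highest IoU against this GT instance; as in the
        # original algorithm, one prediction may be picked by several GT instances.
        j = max(pred, key=lambda k: iou(g, pred[k]))
        inter_sum += len(g & pred[j])
        union_sum += len(g | pred[j])
        used.add(j)
    # Every predicted instance that was never picked adds its full area to U.
    union_sum += sum(len(p) for k, p in pred.items() if k not in used)
    return inter_sum / union_sum if union_sum else 0.0


gt_map = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 2, 2],
          [0, 0, 2, 2]]
pred_map = [[5, 5, 0, 9],  # instance 9 is a 1-pixel false positive
            [5, 5, 0, 0],
            [0, 0, 0, 7],
            [0, 0, 0, 7]]
print(aggregated_jaccard_index(gt_map, pred_map))  # C = 6, U = 8 + 1 → 0.666...
```

Without the unmatched instance 9 the result would be 6/8 = 0.75, the plain union-weighted average of the per-instance Jaccard indices. The extra pixel in the denominator is how AJI penalizes false-positive instances.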

Dice Index

Dice vs IoU

Dice:
Dice = \frac{2|M \cap G|}{|M| + |G|}

IoU:
IoU = \frac{|M \cap G|}{|M \cup G|}

DSC(Dice Similarity Coefficient)

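DSC = \frac{2TP}{2TP + FP + FN}, where TP, FP and FN are pixel counts of the predicted mask against the ground-truth mask.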
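Dice (DSC) and IoU are closely related. With I = |M ∩ G| and U = |M ∪ G| we have |M| + |G| = U + I, so Dice = 2I / (U + I) = 2·IoU / (1 + IoU), and the pixel-count form 2TP / (2TP + FP + FN) is the same quantity. The sketch below checks both identities on a made-up pair of masks stored as sets of (row, col) pixels; the function names are hypothetical.

```python
def dice(m, g):
    """Dice = 2|M ∩ G| / (|M| + |G|); 0.0 when both masks are empty (a convention)."""
    total = len(m) + len(g)
    return 2 * len(m & g) / total if total else 0.0


def iou(m, g):
    union = len(m | g)
    return len(m & g) / union if union else 0.0


def dsc_from_counts(m, g):
    """DSC = 2TP / (2TP + FP + FN) with pixel counts."""
    tp, fp, fn = len(m & g), len(m - g), len(g - m)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


M = {(0, 0), (0, 1), (1, 0), (1, 1)}  # 2x2 predicted square
G = {(0, 1), (0, 2), (1, 1), (1, 2)}  # same square shifted one column right
d, j = dice(M, G), iou(M, G)
print(d, dsc_from_counts(M, G))  # 0.5 0.5
print(j, 2 * j / (1 + j))  # IoU = 1/3, and 2*IoU/(1+IoU) recovers Dice = 0.5
```

Because the mapping is monotonic, Dice and IoU always rank two segmentations of the same object in the same order; Dice simply reports larger numbers for partial overlaps.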

PQ(panoptic quality)

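Panoptic segmentation can be viewed as a combination of semantic segmentation and object detection, so its metric combines a segmentation term (the IoU of matched segments) with a detection term (an F1-style count of matched and unmatched segments). PQ (Panoptic Quality) is defined as:

PQ = \frac{\sum_{(p,g) \in TP} IoU(p,g)}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|} = \underbrace{\frac{\sum_{(p,g) \in TP} IoU(p,g)}{|TP|}}_{\text{SQ}} \times \underbrace{\frac{|TP|}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|}}_{\text{RQ}}

where a predicted segment p and a ground-truth segment g form a true-positive pair when IoU(p,g) > 0.5, SQ is the segmentation quality and RQ is the recognition quality.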
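A pure-Python sketch of PQ for one image and one class, using the standard PQ matching rule that a prediction and a ground-truth instance match only when their IoU exceeds 0.5. With non-overlapping instances that rule admits at most one partner per instance, so no assignment step is needed. Label maps (0 = background, one positive integer per instance) and the function names are assumptions.

```python
def instances(label_map):
    """Split a label map (0 = background, each positive integer = one instance) into pixel sets."""
    inst = {}
    for r, row in enumerate(label_map):
        for c, v in enumerate(row):
            if v:
                inst.setdefault(v, set()).add((r, c))
    return inst


def panoptic_quality(gt_map, pred_map):
    """Return (PQ, SQ, RQ) for one image and one class; IoU > 0.5 defines a match."""
    gt, pred = instances(gt_map), instances(pred_map)
    matched_ious, matched_pred = [], set()
    for g in gt.values():
        for pid, p in pred.items():
            inter = len(g & p)
            if inter and inter / len(g | p) > 0.5:
                # With non-overlapping instances, IoU > 0.5 allows at most one such partner.
                matched_ious.append(inter / len(g | p))
                matched_pred.add(pid)
                break
    tp = len(matched_ious)
    fn = len(gt) - tp
    fp = len(pred) - len(matched_pred)
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0
    rq = tp / denom
    return sq * rq, sq, rq


gt_map = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 2, 2],
          [0, 0, 2, 2]]
pred_map = [[5, 5, 0, 9],
            [5, 5, 0, 0],
            [0, 0, 7, 7],
            [0, 0, 0, 7]]
print(panoptic_quality(gt_map, pred_map))  # PQ ≈ 0.7, SQ = 0.875, RQ = 0.8
```

Here both nuclei are matched (IoU 1.0 and 0.75), and the stray instance 9 costs half a count in the RQ denominator. A match at exactly IoU = 0.5 would not count, so a nucleus predicted with half its area missing is treated as both a false positive and a false negative.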

mPQ(multi-class panoptic quality)

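mPQ averages the Panoptic Quality over all classes, giving a single multi-class panoptic quality score for the model's overall segmentation performance.

Following the Assessment Metrics of CoNIC 2022, the Multi-Class Panoptic Quality (mPQ) is computed as:

mPQ = \frac{1}{T} \sum_{t=1}^{T} PQ_t

PQ_t = \frac{\sum_{(x,y) \in TP_t} IoU(x,y)}{|TP_t| + \frac{1}{2}|FP_t| + \frac{1}{2}|FN_t|}

where T is the number of classes and TP_t, FP_t and FN_t are the matched pairs, unmatched predictions and unmatched ground-truth instances of class t.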
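Since PQ = SQ × RQ simplifies to (sum of matched IoUs) / (TP + FP/2 + FN/2), it can be computed from four running totals per class. The sketch below assumes per-image, per-class matching has already produced (tp, fp, fn, iou_sum) tuples. It sums them over all images before computing PQ, which is how CoNIC's mPQ+ avoids the problem of a class being absent from a patch. The tuple layout, function names and numbers are hypothetical, and the official CoNIC evaluation code may differ in detail.

```python
from collections import defaultdict


def pq_from_stats(tp, fp, fn, iou_sum):
    """PQ = sum of matched IoUs / (TP + FP/2 + FN/2), i.e. SQ * RQ."""
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


def mpq_plus(per_image_stats, classes):
    """mPQ+: sum TP, FP, FN and matched IoU per class over ALL images, then average PQ.

    per_image_stats: one dict per image, class -> (tp, fp, fn, iou_sum); a class
    missing from an image simply contributes nothing for that image.
    """
    totals = defaultdict(lambda: [0, 0, 0, 0.0])
    for stats in per_image_stats:
        for cls, values in stats.items():
            for k, v in enumerate(values):
                totals[cls][k] += v
    return sum(pq_from_stats(*totals[c]) for c in classes) / len(classes)


per_image_stats = [  # made-up matching statistics
    {"epithelial": (2, 1, 0, 1.6), "lymphocyte": (1, 0, 1, 0.9)},
    {"epithelial": (1, 0, 1, 0.7)},  # no lymphocytes in this patch
]
print(mpq_plus(per_image_stats, ["epithelial", "lymphocyte"]))  # (0.575 + 0.6) / 2 ≈ 0.5875
```

Averaging per image instead would require deciding what PQ means for lymphocytes in the second patch, where they are absent from both prediction and ground truth; aggregating first removes that ambiguity.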

Running time

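Running time is the time a model needs to complete the segmentation task, usually measured in seconds.

A low running time is often an important consideration, especially in real-time applications.

Following the Assessment Metrics of NeurIPS 2022 CellSeg,

it is computed as:
[Figure: running-time formula from the NeurIPS 2022 CellSeg metrics page]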
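Whatever the exact scoring formula, the raw input is per-image wall-clock time. A minimal sketch of collecting it with the standard-library `time.perf_counter`; `time_per_image` and `dummy_predict` are hypothetical stand-ins for a real inference call, not the challenge's timing harness.

```python
import time


def time_per_image(predict, images):
    """Wall-clock seconds spent in `predict` for each image.

    `predict` stands for any segmentation model call (a hypothetical callable).
    """
    timings = []
    for image in images:
        start = time.perf_counter()
        predict(image)
        timings.append(time.perf_counter() - start)
    return timings


def dummy_predict(image):
    """Stand-in 'model': thresholds every pixel at 128."""
    return [[1 if v >= 128 else 0 for v in row] for row in image]


images = [[[0, 200], [130, 10]], [[255] * 64 for _ in range(64)]]
timings = time_per_image(dummy_predict, images)
print([f"{t * 1000:.3f} ms" for t in timings])
```

For GPU models, kernels run asynchronously, so the device must be synchronized (for example with `torch.cuda.synchronize()` in PyTorch) before the clock is read. Otherwise the measured time covers only the kernel launch.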

Evaluation metrics used in competitions

{2022} NeurIPS 2022 CellSeg

【paper】Nucleus segmentation: towards automated solutions

【link】https://neurips22-cellseg.grand-challenge.org/metrics/

  • F1 Score
  • Running time
  • F1 Score (Code, threshold=0.5)
  • Running time (Code, please limit the maximum consumption of GPU memory to 10G and RAM to 28GB)
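At instance level, the F1 score needs a matching step first: a predicted instance counts as a TP only if it overlaps a ground-truth instance with IoU at or above the threshold (0.5 for CellSeg). This sketch uses greedy one-to-one matching in descending IoU order, which is an assumption; the official challenge code may use a different matching. Function names and label maps are hypothetical.

```python
def instances(label_map):
    """Split a label map (0 = background, each positive integer = one instance) into pixel sets."""
    inst = {}
    for r, row in enumerate(label_map):
        for c, v in enumerate(row):
            if v:
                inst.setdefault(v, set()).add((r, c))
    return inst


def instance_f1(gt_map, pred_map, threshold=0.5):
    """Instance-level F1: a predicted and a GT instance match when their IoU >= threshold.

    Assumption: greedy one-to-one matching in descending IoU order.
    """
    gt, pred = instances(gt_map), instances(pred_map)
    candidates = []
    for gid, g in gt.items():
        for pid, p in pred.items():
            inter = len(g & p)
            if inter and inter / len(g | p) >= threshold:
                candidates.append((inter / len(g | p), gid, pid))
    candidates.sort(reverse=True)  # best-overlapping pairs are matched first
    matched_gt, matched_pred = set(), set()
    for _, gid, pid in candidates:
        if gid not in matched_gt and pid not in matched_pred:
            matched_gt.add(gid)
            matched_pred.add(pid)
    tp = len(matched_gt)
    fp, fn = len(pred) - tp, len(gt) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


gt_map = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 2, 2],
          [0, 0, 2, 2]]
pred_map = [[5, 5, 0, 9],
            [5, 5, 0, 0],
            [0, 0, 0, 7],
            [0, 0, 0, 7]]
print(instance_f1(gt_map, pred_map, 0.5))  # TP=2, FP=1, FN=0 → 0.8
print(instance_f1(gt_map, pred_map, 0.6))  # TP=1, FP=2, FN=1 → 0.4
```

Raising the threshold from 0.5 to 0.6 halves the score, because the half-covered nucleus 2 turns from a TP into one FP plus one FN. This is why reporting F1 at several thresholds is informative.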

Supplement

  • 2023.08.13 update: We also present the F1 scores at other thresholds (0.6, 0.7, 0.8, 0.9) on the leaderboard.

Ranking Scheme

Both F1 score and running time are used in the ranking scheme. However, the two metrics cannot be directly fused because they have different dimensions. Thus, we use a "rank-then-aggregate" scheme for ranking, including the following three steps:

  • Step 1. Computing the two metrics for each testing case and each team;
  • Step 2. Ranking teams for each of the N testing cases such that each team obtains Nx2 rankings;
  • Step 3. Computing ranking scores for all teams by averaging all these rankings and then normalizing them by the number of teams.
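The three steps can be sketched as follows, assuming step 1 has already produced a per-team, per-case F1 table and running-time table. Higher F1 and lower running time rank better, and tied values share the better rank; the tie handling and all names and numbers are assumptions, not taken from the challenge.

```python
def rank(values, higher_is_better):
    """Rank 1 = best; tied values share the better rank (an assumption about tie handling)."""
    ordered = sorted(values, reverse=higher_is_better)
    return [ordered.index(v) + 1 for v in values]


def rank_then_aggregate(f1, runtime):
    """f1[t][n] and runtime[t][n]: metric of team t on test case n.

    Step 2: rank the teams on each case for each metric (N x 2 rankings per team).
    Step 3: average each team's rankings and divide by the number of teams.
    Lower scores are better.
    """
    teams, cases = len(f1), len(f1[0])
    totals = [0] * teams
    for n in range(cases):
        for table, higher_is_better in ((f1, True), (runtime, False)):
            ranks = rank([table[t][n] for t in range(teams)], higher_is_better)
            for t in range(teams):
                totals[t] += ranks[t]
    return [total / (2 * cases) / teams for total in totals]


# Made-up results for 3 teams on 2 test cases (step 1 already done).
f1 = [[0.90, 0.80], [0.85, 0.85], [0.70, 0.60]]
runtime = [[3.0, 4.0], [1.0, 2.0], [2.0, 1.5]]  # seconds
print(rank_then_aggregate(f1, runtime))  # [0.75, 0.5, 0.75]
```

Team 0 has the best F1 on case 0 but is the slowest on both cases, so it ties with team 2. Ranking removes the unit mismatch between a unitless F1 and seconds, at the cost of ignoring how large each gap is.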

{2022} CoNIC 2022

【paper】CoNIC: Colon Nuclei Identification and Counting Challenge

【link】https://conic-challenge.grand-challenge.org/Evaluation/

  • mPQ+(multi-class panoptic quality)
  • multi-class coefficient of determination
  1. Task 1: Nuclei instance segmentation and classification

We will use multi-class panoptic quality (PQ) to determine the performance of nuclear instance segmentation and classification.

Henceforth, we define the multi-class PQ (mPQ) as the task ranking metric, which averages the PQ over all classes.

Note, for mPQ we calculate the statistics over all images to ensure there are no issues when a particular class is not present in a patch. This is different to mPQ calculation used in previous publications, such as PanNuke, MoNuSAC and in the original Lizard paper, where the PQ is calculated for each image and for each class before the average is taken. Hence, for the purpose of this challenge, we refer to the metric as mPQ+.

  2. Task 2: Nuclear composition regression

For the second task, we will use multi-class coefficient of determination to determine the correlation between the predicted and true counts. For this, the statistic is calculated for each class independently and then the results are averaged.
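The multi-class coefficient of determination can be sketched as R² = 1 − SS_res / SS_tot per class, computed over the per-image counts and then averaged over classes. The class names and counts below are made-up illustrations and the function names are hypothetical; the challenge's own implementation may differ in detail.

```python
def r2(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot for one class."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true counts are equal")
    return 1 - ss_res / ss_tot


def multiclass_r2(counts):
    """Average of per-class R^2; counts maps class -> (true counts, predicted counts) per image."""
    scores = [r2(t, p) for t, p in counts.values()]
    return sum(scores) / len(scores)


counts = {  # made-up nucleus counts for three image patches
    "lymphocyte": ([10, 20, 30], [12, 18, 30]),
    "epithelial": ([5, 5, 20], [5, 10, 15]),
}
print(multiclass_r2(counts))  # (0.96 + 0.667) / 2 ≈ 0.813
```

Each class is normalized by its own spread of true counts (SS_tot), so a rare class with little variation across patches can pull the average down sharply even when its absolute count errors are small.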

{2021} SegPC-2021

【paper】SegPC-2021: A challenge & dataset on segmentation of Multiple Myeloma plasma cells from microscopic images

【link】https://segpc-2021.grand-challenge.org/Evaluation/

  • mIoU(Mean intersection over union)
  • ImAP(Instance Mean Average Precision)
  1. Validation Phase:

mIoU (Mean intersection over union): IoU will be calculated for each instance of the cells of interest. It will be used as the metric to rank the methods/participating teams in the validation phase.

  2. Final Testing Phase:

ImAP (Instance Mean Average Precision): This mean is computed on the average precision (obtained from each cell instance) of all cell instances of the test data. ImAP will be used as the metric to rank the methods/participating teams in the final testing phase.

{2020} MoNuSAC 2020

【link】https://monusac-2020.grand-challenge.org/Evaluation_Metric/

  • PQ(Panoptic Quality)

The metric to evaluate submitted results will be the weighted average of the class-specific Panoptic Quality (PQ). Please refer to section V of this document to get more information about the metric.

{2018} MoNuSeg 2018

【link】https://monuseg.grand-challenge.org/Evaluation/

  • AJI(Aggregated Jaccard Index)

Participants of this challenge should submit 14 PNG images, one for each of the test images, with value 0 for background pixels, and a unique positive integer for pixels corresponding to each segmented nucleus, similar to the label data provided for training images.

Aggregated Jaccard Index (AJI) will be used to compute the nuclei segmentation accuracy. The details of AJI are provided in Algorithm 1 of the paper provided in the Training Data section of the data page.

  • Mean Aggregated Jaccard index over 14 test images will be computed to rank the participants
  • Submissions with missing results on any of the test images will not be ranked
  • Only fully-automated methods, that is the methods that require no manual intervention during testing, will be ranked

The code to compute Aggregated Jaccard Index (AJI) is available here.

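Criticism of evaluation metrics

Selected papers

Panoptic quality should be avoided as a metric for assessing cell nuclei segmentation and classification in digital pathology

【Abstract】Panoptic Quality (PQ) was designed for the task of "panoptic segmentation" (PS). Since its introduction in 2019, it has been used in several digital pathology challenges and in publications on cell nuclei instance segmentation and classification (ISC). Its purpose is to cover both the detection and the segmentation aspects of a task in a single measure, so that algorithms can be ranked by their overall performance. A careful analysis of the metric's properties, of its use in ISC and of the characteristics of core ISC datasets shows that it is not suited to this purpose and should be avoided. Through theoretical analysis, we show that PS and ISC, despite their similarities, have some fundamental differences that make PQ unsuitable. We also show that using Intersection over Union as both the matching rule and the segmentation quality measure within PQ is not suitable for small objects such as nuclei. We illustrate these findings with examples from the NuCLS and MoNuSAC datasets. The code to reproduce our results is available on GitHub (https://github.com/adfoucart/panoptic-quality-supp).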

