医疗图像分割指标

HackerTom

已于 2024-07-28 20:30:49 修改

阅读量2.5k

点赞数 4

分类专栏：机器学习文章标签：医疗图像 medical segmentation 指标分割

于 2023-09-28 17:08:46 首次发布

本文链接：https://blog.csdn.net/HackerTom/article/details/133382705

版权

机器学习专栏收录该内容

121 篇文章

订阅专栏

医疗图像其中两种图像格式：MRI（Magnetic Resonance Imaging，磁共振成像）、CT（Computed Tomography，计算机断层），常存成 .nii.gz 格式。都是 3D 的 $\times W \times L$ ，但不是 RGB，而是类似 grey-scale 图只有一个 channel，如 CT 的值可能是 Hu 值，取值范围可从负几千到正几千，参考 [1]。segmentation label 也相应是 3D 的。
med-planes
这里记录几种分割指标，参考 [2,3]，可调 medpy 的包，其文档也有指标列表。记有 C + 1 个类，0 固定是 background，（某个数据的）prediction 和 label $\hat{Y},Y\in \{0, \dots, C\}^{H \times W \times L}$ ，只留下第 c 类的 binary prediction 和 label 为 $\hat{B}^c,B^c \in \{0, 1\}^{H \times W \times L}$ ， $|\cdot|$ 表示非零元素个数， $\hat{B}^c,B^c$ 的 surface / boundray 记为 $\hat{S}^c, S^c$ ，是其表面 voxels 的三维索引向量（下标）的集合。

Dice Coefficient

即 F1-scoure^[2]。第 c 类的 dice 系数： $DC_c=\frac{2|\hat{B}^c \cap B^c|}{|\hat{B}^c| + |B^c|}$ 调包：medpy.metric.binary.dc。

IoU

即 Jaccard 系数^[2]。第 c 类的 IoU： $IoU_c = \frac{|\hat{B}^c \cap B^c|}{|\hat{B}^c \cup B^c|}$ 调包：medpy.metric.binary.jc。由 [3]，IoU 与 dice 系数有确定的转换关系，即等价，所以两者应该用其中一种就行。C 类的 IoU 取平均就是 mIoU： $mIoU=\frac{\sum_{c=0}^{C}IoU_c}{C + 1}$

（2023.11.23）代码可选包不包含 background。
（2023.10.17）[6,7] 都是算上 background，此处跟它们。
（2023.9.28）不确定算 mIoU 时要不要加上 background。

Frequency Weighted IoU

[5] 用到，但其加权系数的好像写错了，应参照 [6,7] 的公式。而 [5] 本身的实现与 [6-8] 一致，是对的。公式应为： $\begin{aligned} fwIoU_c &= \frac{|B^c|}{HWL}IoU_c \\ fwIoU &= \sum_{c=0}^{C}fwIoU_c \end{aligned}$ 注意此时只求和，不再除以 C + 1。另 [6] 有一份 tensorflow 版。

Sensitivity

即 recall，也叫 TPR（True Positive Rate），非对称。第 c 类的 sensitivity： $R_c=\frac{|\hat{B}^c \cap B^c|}{|B^c|}$ 调包：medpy.metric.binary.sensitivity 或 medpy.metric.binary.recall。

Specificity

也叫 selectivity、TNR（True Negative Rate），是 sensitivity 的补。第 c 类的 specifity： $SP_c = \frac{|(1-\hat{B}^c) \cap (1-B^c)|}{|1-B^c|}$ 调包：medpy.metric.binary.specificity。

(Pixel) Accuracy

[5] 用到 $acc=\frac{1(\hat{Y} = Y)}{HWL}$

Average (Symmetric) Surface Distance

由 [4]，第 c 类的 average surface distance： $ASD(\hat{S}^c, S^c) = \sum_{p \in \hat{S}^c} d_s(p,S^c) \big/ |\hat{S}^c| \ne ASD(S^c, \hat{S}^c)$ 其中 $d_s(\cdot,\cdot)$ 是点到点集（此处是表面 S）距离： $d_s(p,S)=\min_{q \in S} d(p,q)$ $\in \{1,\dots,W\} \times \{1,\dots,H\} \times \{1,\dots,L\}$ 是三维索引向量（下标）， $d(\cdot,\cdot)$ 是欧氏距离。就是对 $\hat{S}$ 中属于此 object 表面的每一个 voxel，都算一下它到 S 的距离，然后取平均。调包：medpy.metric.binary.asd。

ASD 不对称，一般会用 average symmetric surface distance： $ASSD(\hat{S}^c, S^c) = \frac{ASD(\hat{S}^c, S^c) + ASD(S^c, \hat{S}^c)}{2} = ASSD(S^c, \hat{S}^c)$ 调包：medpy.metric.binary.assd。有些文章也会将 ASSD 简叫成 ASD，感觉可以无脑只用 ASSD。

（2024.7.28）此前 assd 在某些情况会报错，在 0.5.2 版本修复，见 #131。

Hausdorff Distance

由 [2,4]，第 c 类的 Hausdorff 距离： $HD(\hat{S}^c,S^c)=\max\left\{ \sup_{p \in \hat{S}^c} d_s(p, S^c), \sup_{p \in S^c} d_s(p, \hat{S}^c) \right\}$ 调包：medpy.metric.binary.hd。

Calibration

从公式看，有些指标在遇见特殊情况（全零 prediction、全零 label、全一 label）可能会有除零问题，一例见下文 example: error from distance metrics 小节。现测试 medpy.metric.binary 各 API 对这些突殊情况作何处理，以便校正不合理之处。

import numpy as np
import medpy.metric.binary as mmb

np.bool = np.bool_ # 新的 numpy 已无 np.bool

p0 = np.array([0, 0, 0, 0]) # all 0
p1 = np.array([1, 1, 1, 1]) # all 1
p  = np.array([0, 1, 1, 0]) # both 0 & 1
y0 = p0
y1 = p1
y  = 1 - p

print("\tdice")
print(mmb.dc(p0, y0), mmb.dc(p0, y), mmb.dc(p, y0))

print("\tiou")
try:
    print(mmb.jc(p0, y0))
except ZeroDivisionError as e:
    print("iou(p0, y0):", e)
print(mmb.jc(p0, y), mmb.jc(p, y0))

print("\tsensitivity")
print(mmb.sensitivity(p0, y0), mmb.sensitivity(p0, y), mmb.sensitivity(p, y0))

print("\tspecificity")
print(mmb.specificity(p1, y1), mmb.specificity(p1, y), mmb.specificity(p, y1))

# ASD 不对称！此处只依上文公式调包，即 pred 在前、label 在后
print("\tasd")
try:
    print(mmb.asd(p0, y0))
except RuntimeError as e:
    print("asd(p0, y0):", e)
try:
    print(mmb.asd(p0, y))
except RuntimeError as e:
    print("asd(p0, y):", e)
try:
    print(mmb.asd(p, y0))
except RuntimeError as e:
    print("asd(p, y0):", e)

print("\tassd")
try:
    print(mmb.assd(p0, y0))
except RuntimeError as e:
    print("assd(p0, y0):", e)
try:
    print(mmb.assd(p0, y))
except RuntimeError as e:
    print("assd(p0, y):", e)
try:
    print(mmb.assd(p, y0))
except RuntimeError as e:
    print("assd(p, y0):", e)

print("\thd")
try:
    print(mmb.hd(p0, y0))
except RuntimeError as e:
    print("hd(p0, y0):", e)
try:
    print(mmb.hd(p0, y))
except RuntimeError as e:
    print("hd(p0, y):", e)
try:
    print(mmb.hd(p, y0))
except RuntimeError as e:
    print("hd(p, y0):", e)

print("\thd95")
try:
    print(mmb.hd95(p0, y0))
except RuntimeError as e:
    print("hd95(p0, y0):", e)
try:
    print(mmb.hd95(p0, y))
except RuntimeError as e:
    print("hd95(p0, y):", e)
try:
    print(mmb.hd95(p, y0))
except RuntimeError as e:
    print("hd95(p, y0):", e)

输出：

        dice
0.0 0.0 0.0
        iou
iou(p0, y0): float division by zero
0.0 0.0
        sensitivity
0.0 0.0 0.0
        specificity
0.0 0.0 0.0
        asd
asd(p0, y0): The first supplied array does not contain any binary object.
asd(p0, y): The first supplied array does not contain any binary object.
asd(p, y0): The second supplied array does not contain any binary object.
        assd
assd(p0, y0): The first supplied array does not contain any binary object.
assd(p0, y): The first supplied array does not contain any binary object.
assd(p, y0): The second supplied array does not contain any binary object.
        hd
hd(p0, y0): The first supplied array does not contain any binary object.
hd(p0, y): The first supplied array does not contain any binary object.
hd(p, y0): The second supplied array does not contain any binary object.
        hd95
hd95(p0, y0): The first supplied array does not contain any binary object.
hd95(p0, y): The first supplied array does not contain any binary object.
hd95(p, y0): The second supplied array does not contain any binary object.

可见 medpy.metric.binary 的处理分两类：

置零：dice、sensitivity、specificity
报错：iou、asd、assd、hd、hd95

分类讨论：

当 prediction 与 label 一致时，即使是全零或全一，也应该视为预测正确，赋最优值，即 dice、iou、sensitivity、specificity 为 1，asd、assd、hd、hd95 为 0，这点 medpy.metric.binary 没处理好；
sensitivity 当 label 为空、prediction 非空时，应当为 0，medpy.metric.binary 的处理是对的；
asd、assd、hd、hd95 四个距离指标当 prediction 和 label 一方为空、一方非空时，没有标准解法，可能这是 medpy.metric.binary 报错、MONAI 只抛 warning 不处理的原因。本文选择置 NaN，求平均时忽略不计，但同时记录 NaN 的个数。

example: error from distance metrics

monai 1.3.0

算 assd 时，如果 y_true 为空，会报错：

Traceback (most recent call last):
  File "/home/tom/codes/monai-tutorials/2d_segmentation/torch/mmwhs_train.py", line 268, in <module>
    main(tempdir)
  File "/home/tom/codes/monai-tutorials/2d_segmentation/torch/mmwhs_train.py", line 246, in main
    _res = evaluate(val_outputs, val_labels, 4+1)
  File "/home/tom/codes/monai-tutorials/2d_segmentation/torch/mmwhs_train.py", line 66, in my_evaluate
    res[metr].append(fn(B_pred_c, B_c))
  File "/share/tom/miniconda3/envs/cu116_pt1131/lib/python3.9/site-packages/medpy/metric/binary.py", line 453, in assd
    assd = numpy.mean( (asd(result, reference, voxelspacing, connectivity), asd(reference, result, voxelspacing, connectivity)) )
  File "/share/tom/miniconda3/envs/cu116_pt1131/lib/python3.9/site-packages/medpy/metric/binary.py", line 561, in asd
    sds = __surface_distances(result, reference, voxelspacing, connectivity)
  File "/share/tom/miniconda3/envs/cu116_pt1131/lib/python3.9/site-packages/medpy/metric/binary.py", line 1215, in __surface_distances
    raise RuntimeError('The second supplied array does not contain any binary object.')
RuntimeError: The second supplied array does not contain any binary object.

其它 distance metrics（asd、hd、hd95）只要 prediction 和 label 其中一方为空也都会遇到。好像没有标准解法：

[11] 中 eval_UDA.py 是报错就将 assd 置 1；
MONAI^[12] 只抛 warning 不处理；
[13] 在 prediction、ground-truth 皆空时置零，只其中一方空时置 128（而 128 恰好是 [13] 中设定的 crop patch size，不知与此有无关系）；
[14] 有置零、373.128664、NaN（即忽略）的方案。

Code

（2024.1.18）重构代码，写成类以便调用，后文对拍测试亦相应更新。
（2023.11.23）改了代码，现跟 MONAI^[12] 在 2D 数据对拍，（基本）一致。
distance metrics 的问题仍未解决，似乎没有标准做法，相关讨论见 [14-16]。现在 Y_pred、Y_true 皆空时，学 [13] 置零；而只其中一方为空时，[13] 置 128、[14] 有置 0、373.128664、NaN 的，不知道什么鬼，此处选置 NaN，即忽略。
MONAI 的 distance metrics 默认忽略 background，此处强制忽略；而其它 metrics 就可选忽不忽略。
~~（2023.11.21）asd、assd、hd、hd95 等 distance metrics 有潜在报错，见后文，未完全修复~~。
分割模型应该一般 predict 的都是 (C+1)-way softmax prediction，即 $\hat{Y}$ 。

import packaging.version
import numpy as np
import medpy.metric.binary as mmb

# https://stackoverflow.com/questions/74893742/how-to-solve-attributeerror-module-numpy-has-no-attribute-bool
if packaging.version.parse(np.__version__) >= packaging.version.parse('1.24'):
    if hasattr(np, 'bool_'):
        np.bool = np.bool_
    else:
        np.bool = bool

class SegEvaluator:
    """segmentation evaluation based on medpy.metric.binary
    It will calibrate some results from medpy.metric.binary (see __call__).
    It will also record the number of NaN in distance-based metrics that are
    caused by empty prediction or label.
    Usage:
        ```python
        # 2D image evaluation
        evaluator = SegEvaluator(N_CLASSES, IGNORE_BACKGROUND)
        for images, labels in loader:
            predictions = model(images)         # [bs, c, h, w]
            predictions = predictions.argmax(1) # -> [bs, h, w]
            for pred, lab in zip(predictions, labels):
                evaluator(pred, lab) # eval each image
        print(evaluator.reduce())
        evaluator.reset()
        ```
    """

    METRICS = {
        "dice": mmb.dc,
        "iou": mmb.jc,
        "accuracy": lambda _B1, _B2: (_B1 == _B2).sum() / _B1.size,
        "sensitivity": mmb.sensitivity,
        "specificity": mmb.specificity,
        "hd": mmb.hd,
        "assd": mmb.assd,
        "hd95": mmb.hd95,
        "asd": mmb.asd
    }

    def __init__(self, n_classes, ignore_bg=False, select=[]):
        """
        Input:
            n_classes: int, assuming 0 is background
            ignore_bg: bool, ignore background or not in reduction
            select: List[str], list of name of metrics of interest,
                i.e. will only evaluate on these selected metrics if provided.
                Should be a subset of supported metrics (see METRICS).
        """
        self.n_classes = n_classes
        self.ignore_bg = ignore_bg

        self.metrics = {}
        _no_select = len(select) == 0
        for m, f in SegEvaluator.METRICS.items():
            if _no_select or m in select:
                self.metrics[m] = f

        self.reset()

    def reset(self):
        # records:
        #  - records[metr][c][i] = <metr> score of i-th datum on c-th class, or
        #  - records[metr][c] = # of NaN caused by empty pred/label
        self.records = {}
        for metr in self.metrics:
            # self.records[metr] = [[]] * self.n_classes # wrong
            self.records[metr] = [[] for _ in range(self.n_classes)]
        for metr in ("hd", "assd", "hd95", "asd"):
            if metr in self.metrics:
                self.records[f"empty_gt_{metr}"] = [0] * self.n_classes
                self.records[f"empty_pred_{metr}"] = [0] * self.n_classes

    def __call__(self, Y_pred, Y_true):
        """evaluates 1 prediction
        Input:
            Y_pred: numpy.ndarray, typically [H, W] or [H, W, L], predicted class ID
            Y_true: same as `Y_pred`, label, ground-truth class ID
        """
        for c in range(self.n_classes):
            B_pred_c = (Y_pred == c).astype(np.int64)
            B_c      = (Y_true == c).astype(np.int64)
            pred_l0, pred_inv_l0, gt_l0, gt_inv_l0 = B_pred_c.sum(), (1 - B_pred_c).sum(), B_c.sum(), (1 - B_c).sum()
            for metr, fn in self.metrics.items():
                is_distance_metr = metr in ("hd", "assd", "hd95", "asd")
                if 0 == c and (self.ignore_bg or is_distance_metr):
                    # always ignore bg for distance metrics
                    a = np.nan
                elif 0 == gt_l0 and 0 == pred_l0 and metr in ("dice", "iou", "sensitivity"):
                    a = 1
                elif 0 == gt_inv_l0 and 0 == pred_inv_l0 and "specificity" == metr:
                    a = 1
                elif is_distance_metr and pred_l0 * gt_l0 == 0: # at least one party is all 0
                    if 0 == pred_l0 and 0 == gt_l0: # both are all 0
                        # nips23a&d, xmed-lab/GenericSSL
                        a = 0
                    else: # only one party is all 0
                        a = np.nan
                        if 0 == pred_l0:
                            self.records[f"empty_pred_{metr}"][c] += 1
                        else: # 0 == gt_l0
                            self.records[f"empty_gt_{metr}"][c] += 1
                else: # normal cases or that medpy can solve well
                    # try:
                    a = fn(B_pred_c, B_c)
                    # except:
                    #     a = np.nan

                self.records[metr][c].append(a)

    def reduce(self, prec=4):
        """calculate class-wise & overall average
        Input:
            prec: int, decimal precision
        Output:
            res: dict
                - res[<metr>]: float, overall average
                - res[<metr>_clswise]: List[float], class-wise average of each class
                - res[empty_pred|gt_<metr>]: int, overall #NaN caused by empty pred/label
                - res[empty_pred|gt_<metr>_clswise]: List[int], class-wise #NaN
        """
        res = {}
        for metr in self.records:
            if metr.startswith("empty_"):
                res[metr+"_clswise"] = self.records[metr]
                res[metr] = int(np.sum(self.records[metr]))
            else:
                CxN = np.asarray(self.records[metr], dtype=np.float32)
                nans = np.isnan(CxN)
                CxN[nans] = 0
                not_nans = ~nans

                # class-wise average
                cls_n = not_nans.sum(1) # [c]
                # cls_avg = np.where(cls_n > 0, CxN.sum(1) / cls_n, 0)
                _cls_n_denom = cls_n.copy()
                _cls_n_denom[0 == _cls_n_denom] = 1 # to avoid warning though not necessary
                cls_avg = np.where(cls_n > 0, CxN.sum(1) / _cls_n_denom, 0)
                res[f"{metr}_clswise"] = np.round(cls_avg, prec).tolist()

                # overall average
                ins_cls_n = not_nans.sum(0) # [n]
                # ins_avg = np.where(ins_cls_n > 0, CxN.sum(0) / ins_cls_n, 0)
                _ins_cls_n_denom = ins_cls_n.copy()
                _ins_cls_n_denom[0 == _ins_cls_n_denom] = 1 # to avoid warning though not necessary
                ins_avg = np.where(ins_cls_n > 0, CxN.sum(0) / _ins_cls_n_denom, 0)
                ins_n = (ins_cls_n > 0).sum()
                avg = ins_avg.sum() / ins_n if ins_n > 0 else 0
                res[metr] = float(np.round(avg, prec))

        return res

compare to MONAI

MedPy 0.4.0
monai 1.3.0

2D

加背景共 5 类

import os, os.path as osp
import numpy as np
import medpy.metric.binary as mmb
from collections import defaultdict
import torch
from monai.networks.utils import one_hot
from monai.metrics import DiceMetric, MeanIoU, GeneralizedDiceScore, ConfusionMatrixMetric, HausdorffDistanceMetric, SurfaceDistanceMetric


class SegEvaluator:
    """同上文"""
    pass


print("自制 pred、label，保证非空以对拍 asd、assd、hd、hd95")
nc = 4+1
gen_data_size = [100, 40*(nc-1)]
gen_data_list = []
for i in range(30):
    pred = np.zeros(gen_data_size, dtype=np.uint8)
    label = pred.copy()
    for ic in range(1, nc):
        fill_v = np.concatenate([np.zeros([70*40], dtype=np.uint8), ic * np.ones([30*40], dtype=np.uint8)])
        np.random.shuffle(fill_v)
        pred[:, (ic-1)*40: ic*40] = fill_v.reshape(100, 40)
        if i > 2: # let the 1st 3 be perfect predictions
            np.random.shuffle(fill_v)
        label[:, (ic-1)*40: ic*40] = fill_v.reshape(100, 40)
    gen_data_list.append((pred, label))
print(len(gen_data_list)) # 30


print("MONAI overall")
monai_metrics = {
    "dice": DiceMetric(include_background=True, reduction="mean", get_not_nans=False, ignore_empty=True, num_classes=4+1),
    "iou": MeanIoU(include_background=True, reduction="mean", get_not_nans=False, ignore_empty=True),
    "sensitivity": ConfusionMatrixMetric(include_background=True, metric_name='sensitivity', reduction="mean", compute_sample=True, get_not_nans=False),
    "specificity": ConfusionMatrixMetric(include_background=True, metric_name='specificity', reduction="mean", compute_sample=True, get_not_nans=False),
    "accuracy": ConfusionMatrixMetric(include_background=True, metric_name='accuracy', reduction="mean", get_not_nans=False),
    # 下面 distance metrics 忽略 background
    "hd": HausdorffDistanceMetric(include_background=False, reduction="mean", get_not_nans=False),
    "hd95": HausdorffDistanceMetric(include_background=False, percentile=95, reduction="mean", get_not_nans=False),
    "assd": SurfaceDistanceMetric(include_background=False, symmetric=True, reduction="mean", get_not_nans=False),
    "asd": SurfaceDistanceMetric(include_background=False, symmetric=False, reduction="mean", get_not_nans=False),
}

print("MONAI class-wise")
monai_metrics_cls = {
    "dice": DiceMetric(include_background=True, reduction="mean_batch", get_not_nans=False, ignore_empty=True, num_classes=4+1),
    "iou": MeanIoU(include_background=True, reduction="mean_batch", get_not_nans=False, ignore_empty=True),
    "sensitivity": ConfusionMatrixMetric(include_background=True, metric_name='sensitivity', reduction="mean_batch", compute_sample=True, get_not_nans=False),
    "specificity": ConfusionMatrixMetric(include_background=True, metric_name='specificity', reduction="mean_batch", compute_sample=True, get_not_nans=False),
    "accuracy": ConfusionMatrixMetric(include_background=True, metric_name='accuracy', reduction="mean_batch", get_not_nans=False),
    # 下面 distance metrics 忽略 background
    "hd": HausdorffDistanceMetric(include_background=False, reduction="mean_batch", get_not_nans=False),
    "hd95": HausdorffDistanceMetric(include_background=False, percentile=95, reduction="mean_batch", get_not_nans=False),
    "assd": SurfaceDistanceMetric(include_background=False, symmetric=True, reduction="mean_batch", get_not_nans=False),
    "asd": SurfaceDistanceMetric(include_background=False, symmetric=False, reduction="mean_batch", get_not_nans=False),
}

print("medpy")
seg_eval = SegEvaluator(4+1, False)


print("2D seg，一张张测")
for pred, label in gen_data_list:
    pred = pred.astype(np.int32)
    label = label.astype(np.int32)
    # print(pred.shape, pred.dtype, np.unique(pred), label.shape, label.dtype, np.unique(label))

    seg_eval(pred, label)

    # MONAI 要求 [B, C, H, W]，且有些指标要求 one-hot
    pred = one_hot(torch.from_numpy(pred).unsqueeze(0).unsqueeze(0), num_classes=4+1, dim=1)
    label = one_hot(torch.from_numpy(label).unsqueeze(0).unsqueeze(0), num_classes=4+1, dim=1)
    for _m in monai_metrics:
        monai_metrics[_m](y_pred=pred, y=label)
        monai_metrics_cls[_m](y_pred=pred, y=label)


print("monai")
for _m in monai_metrics:
    print('\t', _m)
    _agg = monai_metrics[_m].aggregate()#.item()
    _agg_cls = monai_metrics_cls[_m].aggregate()
    # print(type(_agg_cls))
    if isinstance(_agg, list):
        assert len(_agg) == 1
        _agg = _agg[0]
    if isinstance(_agg_cls, list):
        assert len(_agg_cls) == 1
        _agg_cls = _agg_cls[0]
    _agg = round(_agg.item(), 6)
    _agg_cls = [round(_x, 6) for _x in _agg_cls.tolist()]
    # monai_metrics[_m].reset()
    print("class-wise:", _agg_cls)
    print("overall:", _agg)


print("medpy")
medpy_res = {k: v for k, v in seg_eval.reduce(6).items()
	if not k.startswith("empty_")} # 虑掉那些 NaN 记数
for metr in medpy_res:
    if not metr.endswith("_clswise"):
        print('\t', metr)
        print("class-wise:", medpy_res[metr+"_clswise"])
        print("overall:", medpy_res[metr])

输出

monai
	 dice
class-wise: [0.819524, 0.275542, 0.280583, 0.279583, 0.276667]
overall: 0.38638
	 iou
class-wise: [0.699342, 0.197282, 0.200425, 0.199778, 0.197985]
overall: 0.298962
	 sensitivity
class-wise: [0.819524, 0.275542, 0.280583, 0.279583, 0.276667]
overall: 0.38638
	 specificity
class-wise: [0.278094, 0.961871, 0.962136, 0.962083, 0.96193]
overall: 0.825223
	 accuracy
class-wise: [0.711238, 0.927554, 0.928058, 0.927958, 0.927667]
overall: 0.884495
	 hd
class-wise: [3.283388, 3.363868, 3.308639, 3.316445]
overall: 3.318085
	 hd95
class-wise: [2.012461, 2.004592, 2.012461, 2.012461]
overall: 2.010494
	 assd
class-wise: [0.945859, 0.939, 0.945332, 0.943413]
overall: 0.943401
	 asd
class-wise: [0.943933, 0.938989, 0.944227, 0.941334]
overall: 0.942121


medpy
	 dice
class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
overall: 0.38638
	 iou
class-wise: [0.699342, 0.197282, 0.200425, 0.199778, 0.197985]
overall: 0.298962
	 sensitivity
class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
overall: 0.38638
	 specificity
class-wise: [0.278094, 0.961871, 0.962136, 0.962083, 0.96193]
overall: 0.825223
	 accuracy
class-wise: [0.711238, 0.927554, 0.928058, 0.927958, 0.927667]
overall: 0.884495
	 hd
class-wise: [0.0, 3.283388, 3.363868, 3.308639, 3.316445]
overall: 3.318085
	 hd95
class-wise: [0.0, 2.004592, 2.004592, 2.005379, 2.004592]
overall: 2.004789
	 assd
class-wise: [0.0, 0.945857, 0.939, 0.945332, 0.94341]
overall: 0.9434
	 asd
class-wise: [0.0, 0.943933, 0.938989, 0.944227, 0.941334]
overall: 0.942121

结论：hd95 不同，其它一致。对比 medpy 的实现和 MONAI 的实现，是因为它们取 95% 分位数的方式不同：

medpy 是取两个方向的所有 voxel 的 surface distance 的 95% 分位数；
MONAI 是两个方向的 surface diatance 各取 95%分位数，再取两者最大值。

3D

代码接上节

print("2D 辑成 3D")
gen_data_3d_list = []
for i in range(0, len(gen_data_list), 10):
    pred3d = np.concatenate([x[0][:, :, np.newaxis] for x in gen_data_list[i: i+10]], axis=2)
    label3d = np.concatenate([x[1][:, :, np.newaxis] for x in gen_data_list[i: i+10]], axis=2)
    print(pred3d.shape, label3d.shape) # (100, 160, 10) (100, 160, 10)
    gen_data_3d_list.append((pred3d, label3d))
print(len(gen_data_3d_list)) # 3


print("重置 metrics 记录")
seg_eval.reset()
for _m in monai_metrics:
    monai_metrics[_m].reset()
    monai_metrics_cls[_m].reset()


print("3D，逐 volume 测")
for pred, label in gen_data_3d_list:
    pred = pred.astype(np.int32)
    label = label.astype(np.int32)
    # print(pred.shape, pred.dtype, np.unique(pred), label.shape, label.dtype, np.unique(label))

    seg_eval(pred, label)

    pred = one_hot(torch.from_numpy(pred).unsqueeze(0).unsqueeze(0), num_classes=4+1, dim=1)
    label = one_hot(torch.from_numpy(label).unsqueeze(0).unsqueeze(0), num_classes=4+1, dim=1)
    for _m in monai_metrics:
        monai_metrics[_m](y_pred=pred, y=label)
        monai_metrics_cls[_m](y_pred=pred, y=label)


print("monai") # 同前面 2D
for _m in monai_metrics:
    print('\t', _m)
    _agg = monai_metrics[_m].aggregate()#.item()
    _agg_cls = monai_metrics_cls[_m].aggregate()
    # print(type(_agg_cls))
    if isinstance(_agg, list):
        assert len(_agg) == 1
        _agg = _agg[0]
    if isinstance(_agg_cls, list):
        assert len(_agg_cls) == 1
        _agg_cls = _agg_cls[0]

    _agg = round(_agg.item(), 6)
    _agg_cls = [round(_x, 6) for _x in _agg_cls.tolist()]
    # monai_metrics[_m].reset()
    print("class-wise:", _agg_cls)
    print("overall:", _agg)


print("medpy")
medpy_res = {k: v for k, v in seg_eval.reduce(6).items()
	if not k.startswith("empty_")} # 虑掉那些 NaN 记数
for metr in medpy_res:
    if not metr.endswith("_clswise"):
        print('\t', metr)
        print("class-wise:", medpy_res[metr+"_clswise"])
        print("overall:", medpy_res[metr])

输出

monai
	 dice
class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
overall: 0.38638
	 iou
class-wise: [0.695224, 0.165041, 0.168134, 0.168165, 0.165894]
overall: 0.272492
	 sensitivity
class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
overall: 0.38638
	 specificity
class-wise: [0.278094, 0.961871, 0.962136, 0.962083, 0.96193]
overall: 0.825223
	 accuracy
class-wise: [0.711237, 0.927554, 0.928058, 0.927958, 0.927667]
overall: 0.884495
	 hd
class-wise: [2.307209, 2.44949, 2.575802, 2.44949]
overall: 2.445498
	 hd95
class-wise: [1.414214, 1.414214, 1.414214, 1.414214]
overall: 1.414214
	 assd
class-wise: [0.815341, 0.80984, 0.813648, 0.815224]
overall: 0.813514
	 asd
class-wise: [0.812973, 0.809779, 0.813435, 0.814375]
overall: 0.812641


medpy
	 dice
class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
overall: 0.38638
	 iou
class-wise: [0.695224, 0.165041, 0.168134, 0.168165, 0.165894]
overall: 0.272492
	 sensitivity
class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
overall: 0.38638
	 specificity
class-wise: [0.278094, 0.961871, 0.962136, 0.962083, 0.96193]
overall: 0.825223
	 accuracy
class-wise: [0.711238, 0.927554, 0.928058, 0.927958, 0.927667]
overall: 0.884495
	 hd
class-wise: [0.0, 2.307209, 2.44949, 2.575802, 2.44949]
overall: 2.445498
	 hd95
class-wise: [0.0, 1.414213, 1.414213, 1.414213, 1.414213]
overall: 1.414214
	 assd
class-wise: [0.0, 0.815341, 0.80984, 0.813648, 0.815224]
overall: 0.813514
	 asd
class-wise: [0.0, 0.812973, 0.809779, 0.813435, 0.814375]
overall: 0.812641

结论：