SE, SP, PC, F1, JS, DC all 0 when training a medical image segmentation model? Evaluation metrics for medical image segmentation

鱼小丸

Last modified on 2023-12-14 20:41:11

Tags: deep learning, computer vision, artificial intelligence

First published on 2023-02-26 15:34:42

Article link: https://blog.csdn.net/goodenough5/article/details/129226719

Contents

The problem
Solution
I. Background
II. Evaluation metrics
References

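The problem

Validating a deep image segmentation model is a basic task: it measures how good the model is so that it can be improved.
I ran into a problem with the evaluation metrics while training my own model. I was using someone else's code, https://github.com/LeeJunHyun/Image_Segmentation, and after switching to my own dataset I hit one error after another. When training finally started, SE, SP, PC, F1, JS and DC were all 0. It took me a whole day to sort out.
There is another possible cause: training itself may have failed, with the loss becoming NaN, in which case the other metrics are 0 as well. I was using Focal Loss; in that case, lower the initial learning rate a little.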
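The NaN case can be caught early rather than discovered through all-zero metrics. The sketch below is only an illustration: the stand-in model, the learning rate of 1e-4 and the helper `safe_step` are assumptions, not part of the original repository, and skipping a non-finite batch is one possible policy among several.

```python
import torch

# Stand-in model and optimizer; the learning rate value is an assumption.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)


def safe_step(loss, optimizer):
    """Backpropagate and update only if the loss is finite.

    Returns True when an optimizer step was taken, False when the
    loss was NaN/inf and the batch was skipped.
    """
    optimizer.zero_grad()
    if not torch.isfinite(loss).all():
        return False
    loss.backward()
    optimizer.step()
    return True


x = torch.randn(8, 4)
ok = safe_step(model(x).pow(2).mean(), optimizer)        # finite loss
bad = safe_step(torch.tensor(float("nan")), optimizer)   # NaN loss
print(ok, bad)
```

Logging how often `safe_step` returns False gives a quick signal that the learning rate may still be too high.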

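Solution

Update: I recently found that the masks in my dataset are not strictly binary, although the pixel values cluster around 0 and 255. Why the original evaluation code fails on them still needs investigating; with my modified code (listed further down) it works.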

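import cv2
import numpy as np
import matplotlib.pyplot as plt

path = '../tg3k/thyroid-mask/'
# Flag -1 reads the file unchanged, so the stored pixel values are preserved
img = cv2.imread(path + '0001.jpg', -1)
# 256-bin histogram of the pixel intensities in channel 0
hist = cv2.calcHist([img], [0], None, [256], [0, 256])
plt.plot(hist)  # plotting added here; the original snippet stopped at calcHist
plt.show()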

[Figure: histogram of the mask's pixel values]

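In the end I found that my masks are grayscale images, not binary images. Converting them to tensors and binarizing them is enough; there is no need to modify the original author's code. 😂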
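The binarization step could look roughly like the sketch below. The helper name `binarize_mask`, the 0.5 threshold and the sample values are assumptions chosen for illustration. The sketch also shows one plausible reason the original `GT == torch.max(GT)` test struggles on such masks: only pixels exactly equal to the maximum count as foreground, so near-white values such as 252 are treated as background.

```python
import torch


def binarize_mask(mask: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Rescale a grayscale mask to [0, 1] and threshold it to a 0/1 float mask."""
    mask = mask.float()
    max_val = mask.max()
    if max_val == 0:
        # Empty mask: avoid dividing by zero, everything is background.
        return torch.zeros_like(mask)
    return (mask / max_val > threshold).float()


# A grayscale mask whose values cluster near 0 and 255 but are not strictly binary.
gray = torch.tensor([[0.0, 3.0, 120.0],
                     [200.0, 252.0, 255.0]])

binary = binarize_mask(gray)
strict = gray == gray.max()   # the exact-maximum test

print(binary)
print(int(strict.sum()), "foreground pixel(s) under the exact-maximum test")
```

With these sample values the thresholded mask keeps three foreground pixels, while the exact-maximum test keeps only one.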

Divider ----------------------------------------

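At first I wondered whether the model had collapsed and was outputting all-black predictions, which would make these values 0. But the loss was decreasing steadily, the accuracy was increasing, and the accuracy was clearly higher than what an all-black mask would give, so I ruled that out.
Second, I inspected GT. It is single-channel, but inside the metric functions GT is turned into a bool tensor, which I take to be a binarization step; the original code uses GT == torch.max(GT). I tested this on my dataset and the result was all 0. That approach presumably works when the labels are already binary images, whereas my images have to be binarized explicitly. So I binarized them: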

GT =GT/torch.max(GT) > threshold

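I trained once more and the metrics were still all 0?
The code looked fine to me. Then I printed TN, TP, FN and FP and found that they were all 0 as well, which was odd.
In the end I traced the problem to this line: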

TP = ((SR==1)+(GT==1))==2

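I ran it in a Jupyter notebook and found that it returns all False no matter what SR and GT are. Only then did I realize that adding the two bool tensors still gives a bool value (0 or 1), so 2 can never occur. The fix is to use logical operations instead, or to convert the bool tensors to float or another numeric type (I have not tried that). I replaced it with logical operations.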
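The behaviour described above can be reproduced in a few lines. This is a minimal sketch with made-up four-element tensors; the integer-cast variant is the alternative the text mentions but did not try, shown here only for comparison.

```python
import torch

SR = torch.tensor([True, True, False, False])   # thresholded prediction
GT = torch.tensor([True, False, True, False])   # binarized ground truth

# bool + bool stays bool (it behaves like a logical OR), so it never equals 2.
added = (SR == 1) + (GT == 1)
tp_broken = added == 2

# Option 1: logical AND on the bool tensors.
tp_and = (SR == 1) & (GT == 1)

# Option 2: cast to integers first, then the sum can reach 2.
tp_int = ((SR == 1).int() + (GT == 1).int()) == 2

print(added.dtype, tp_broken.sum().item(), tp_and.sum().item(), tp_int.sum().item())
```

Both corrected variants mark the same single pixel as a true positive; the logical form avoids the extra cast.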

import torch


# SR : Segmentation Result
# GT : Ground Truth

def get_accuracy(SR, GT, threshold=0.5):
    SR = SR > threshold
    GT = GT / torch.max(GT) > threshold
    corr = (SR == GT).sum().item()
    tensor_size = SR.size(0) * SR.size(1)* SR.size(2) * SR.size(3)
    acc = float(corr) / float(tensor_size)

    return acc


def get_sensitivity(SR, GT, threshold=0.5):
    # Sensitivity == Recall
    SR = SR > threshold
    GT = GT / torch.max(GT) > threshold

    # TP : True Positive
    # FN : False Negative
    TP = ((SR == 1) & (GT == 1)).sum().item()
    FN = ((SR == 0) & (GT == 1)).sum().item()
    FP = ((SR == 1) & (GT == 0)).sum().item()
    TN = ((SR == 0) & (GT == 0)).sum().item()
    # TN = ((SR==0)*(GT==0))
    # FP = ((SR==1)*(GT==0))
    SE = float(TP) / (float(TP + FN)+ 1e-6)
    # print('TP:'+str(TP))
    # print('FN:' + str(FN))
    # print('FP:' + str(FP))
    # print('TN:' + str(TN))
    return SE


def get_specificity(SR, GT, threshold=0.5):
    SR = SR > threshold
    GT = GT / torch.max(GT) > threshold

    # TN : True Negative
    # FP : False Positive
    TN = ((SR == 0) & (GT == 0)).sum().item()
    # TP = ((SR==1)*(GT==1))
    FP = ((SR == 1) & (GT == 0)).sum().item()
    # FN = ((SR == 0) *(GT == 1))
    SP = float(TN) / (float(TN + FP) + 1e-6)

    return SP


def get_precision(SR, GT, threshold=0.5):
    SR = SR > threshold
    GT = GT / torch.max(GT) > threshold

    # TP : True Positive
    # FP : False Positive
    TP = ((SR == 1) & (GT == 1)).sum().item()
    FP = ((SR == 1) & (GT == 0)).sum().item()

    PC = float(TP) / (float(TP + FP) + 1e-6)

    return PC



def get_F1(SR, GT, threshold=0.5):
    # Sensitivity == Recall
    SE = get_sensitivity(SR, GT, threshold=threshold)
    PC = get_precision(SR, GT, threshold=threshold)

    F1 = 2 * SE * PC / (SE + PC + 1e-6)

    return F1


def get_JS(SR, GT, threshold=0.5):
    # JS : Jaccard similarity
    SR = SR > threshold
    GT = GT / torch.max(GT) > threshold

    Inter =( SR & GT).sum().item()
    # print(Inter)
    Union =((SR | GT)).sum().item()
    # print(torch.sum(SR))
    # print(torch.sum(GT))
    # print(Union)

    JS = float(Inter) / (float(Union) + 1e-6)

    return JS


def get_DC(SR, GT, threshold=0.5):
    # DC : Dice Coefficient
    SR = SR > threshold
    GT = GT / torch.max(GT) > threshold

    Inter =(SR & GT).sum().item()
    Union = ((SR | GT)).sum().item()
    DC = float(2 * Inter) / (float(Union+Inter) + 1e-6)

    return DC
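Each function above repeats the same thresholding and recounts the same pixels. As a sketch of an alternative layout, the counts can be computed once and every metric derived from them. The helpers `confusion_counts` and `all_metrics` are hypothetical names, not part of the original repository, and the tiny 2x2 example is made up to allow a check by hand.

```python
import torch


def confusion_counts(SR, GT, threshold=0.5):
    """Return (TP, FP, FN, TN) as Python ints for one batch."""
    SR = SR > threshold
    GT = GT / torch.max(GT) > threshold   # same binarization as the functions above
    TP = (SR & GT).sum().item()
    FP = (SR & ~GT).sum().item()
    FN = (~SR & GT).sum().item()
    TN = (~SR & ~GT).sum().item()
    return TP, FP, FN, TN


def all_metrics(SR, GT, threshold=0.5, eps=1e-6):
    """Derive SE, SP, PC, F1, JS and DC from a single set of counts."""
    TP, FP, FN, TN = confusion_counts(SR, GT, threshold)
    SE = TP / (TP + FN + eps)
    SP = TN / (TN + FP + eps)
    PC = TP / (TP + FP + eps)
    return {
        "SE": SE,
        "SP": SP,
        "PC": PC,
        "F1": 2 * SE * PC / (SE + PC + eps),
        "JS": TP / (TP + FP + FN + eps),
        "DC": 2 * TP / (2 * TP + FP + FN + eps),
    }


# Shape (1, 1, 2, 2): one TP, one FP, one FN, one TN.
SR = torch.tensor([[[[0.9, 0.8], [0.2, 0.1]]]])
GT = torch.tensor([[[[255.0, 0.0], [255.0, 0.0]]]])
counts = confusion_counts(SR, GT)
metrics = all_metrics(SR, GT)
print(counts, metrics)
```

Printing the raw counts alongside the metrics makes an all-zero result much easier to diagnose than the ratios alone.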

I. Background

[Figure: image not preserved in the source]

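II. Evaluation metrics

1. Precision

Precision, also called positive predictive value, is the ratio of the number of true positives to the sum of true positives and false positives. A true positive is a positive-class pixel that is correctly predicted as positive, whereas a false positive is a pixel that is incorrectly predicted as positive. Precision is a useful measure of prediction success: it indicates what fraction of the positive predictions are actually correct. The higher the value, the more precise the results the model returns, so higher precision suggests the architecture was trained better on the given data.

$$\text{Precision (PR)} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$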

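def get_precision(SR,GT,threshold=0.5):
    SR = SR > threshold
    # Assumes a strictly binary mask: only pixels equal to the maximum are foreground
    GT = GT == torch.max(GT)

    # TP : True Positive
    # FP : False Positive
    # Logical AND is used because adding two bool tensors never yields 2
    TP = (SR==1) & (GT==1)
    FP = (SR==1) & (GT==0)

    PC = float(torch.sum(TP))/(float(torch.sum(TP)+torch.sum(FP)) + 1e-6)

    return PC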

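2. Recall

Recall, also called sensitivity, is the ratio of the number of true positives to the sum of true positives and false negatives. A true positive is a positive-class pixel that is correctly predicted as positive, whereas a false negative is a positive-class pixel that is incorrectly predicted as negative. In other words, recall indicates what fraction of the actual positives was found. Like precision, it is a useful measure of prediction success, especially when the classes are imbalanced. A high value means the model recovers most of the truly positive pixels, so higher recall suggests a better-trained architecture.

$$\text{Recall (RE)} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$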

def get_sensitivity(SR,GT,threshold=0.5):
    # Sensitivity == Recall
    SR = SR > threshold
    # Assumes a strictly binary mask: only pixels equal to the maximum are foreground
    GT = GT == torch.max(GT)

    # TP : True Positive
    # FN : False Negative
    # Logical AND is used because adding two bool tensors never yields 2
    TP = (SR==1) & (GT==1)
    FN = (SR==0) & (GT==1)
    SE = float(torch.sum(TP))/(float(torch.sum(TP)+torch.sum(FN)) + 1e-6)

    return SE

3. F-Measure

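The F-measure is one of the metrics needed to validate a model's accuracy. It combines precision and recall into a single number as their harmonic mean. The harmonic mean is used instead of the arithmetic mean in order to penalize extreme values, so the metric helps balance precision against recall. The score is best at 1 and worst at 0, so a model whose F-measure is close to 1 is considered the best, which means it produces few false positives and few false negatives. The metric applies to binary as well as multi-class classification and segmentation problems.
$$\text{F-Measure} = 2 \times \frac{\text{PR} \times \text{RE}}{\text{PR} + \text{RE}}$$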

def get_F1(SR,GT,threshold=0.5):
    # Sensitivity == Recall
    SE = get_sensitivity(SR,GT,threshold=threshold)
    PC = get_precision(SR,GT,threshold=threshold)

    F1 = 2*SE*PC/(SE+PC + 1e-6)

    return F1
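The point about the harmonic mean penalizing extreme values is easy to check numerically. The sketch below uses made-up precision and recall values, and `f_measure` is a hypothetical stand-alone helper that mirrors the formula used in `get_F1`.

```python
def f_measure(precision, recall, eps=1e-6):
    """Harmonic mean of precision and recall, with the same epsilon guard as get_F1."""
    return 2 * precision * recall / (precision + recall + eps)


balanced = f_measure(0.5, 0.5)   # precision and recall agree
skewed = f_measure(0.9, 0.1)     # same arithmetic mean, very unequal values
arithmetic = (0.9 + 0.1) / 2

print(round(balanced, 3), round(skewed, 3), arithmetic)
```

Both pairs have an arithmetic mean of 0.5, yet the skewed pair scores only about 0.18, which is why a model cannot hide poor recall behind high precision.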

4. AUC

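The area under the curve (AUC) is an aggregate performance metric for classification and segmentation problems. It is computed from the receiver operating characteristic (ROC) curve: it measures the two-dimensional area under the ROC curve from (0, 0) to (1, 1), that is, how far the curve extends toward the top-left (north-west) corner. Its value ranges from 0 to 1, and a larger AUC indicates a better result. AUC is also invariant to the scale of the scores and to the classification threshold. The ROC curve itself depends on the false positive rate (FPR) and the true positive rate (TPR): FPR is the horizontal axis of the ROC plot and TPR is the vertical axis.
$$\begin{aligned} \text{TPR} &= \frac{\text{TP}}{\text{TP} + \text{FN}} \\ \text{FPR} &= \frac{\text{FP}}{\text{FP} + \text{TN}} \end{aligned}$$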
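The post gives no code for AUC, so the following is only a sketch of one way to compute it: sweep a threshold over the raw (unthresholded) scores, record the (FPR, TPR) pairs defined above, and integrate with the trapezoidal rule. The function name `roc_auc` and the sample data are assumptions, and the sketch assumes both classes are present in the labels.

```python
import numpy as np


def roc_auc(scores, labels):
    """Area under the ROC curve via a threshold sweep and the trapezoidal rule."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    n_pos = labels.sum()
    n_neg = (~labels).sum()

    # Thresholds from high to low; +inf gives the (0, 0) starting point.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    tpr, fpr = [], []
    for t in thresholds:
        pred = scores >= t
        tpr.append((pred & labels).sum() / n_pos)
        fpr.append((pred & ~labels).sum() / n_neg)
    tpr, fpr = np.array(tpr), np.array(fpr)

    # Trapezoidal integration of TPR over FPR.
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))


auc_mixed = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
auc_perfect = roc_auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
print(auc_mixed, auc_perfect)
```

For segmentation, `scores` would be the flattened probability map and `labels` the flattened binary mask; on full-size images a library routine is likely to be faster than this loop.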

5. IoU

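Intersection over Union (IoU), also known as the Jaccard index, is a statistical validation measure. It is one of the most commonly used metrics in segmentation: a simple and very effective measure that is routinely used to evaluate segmentation architectures. IoU is defined as the area of overlap (AoO) between the ground truth and the prediction divided by the area of union (AoU) between the ground truth and the prediction [129]. It ranges from 0 to 1, where 0 means no overlap and 1 means complete overlap. Mathematically, it can be written as follows.
$$\text{IoU} = \frac{\text{AoO}}{\text{AoU}} = \frac{\text{TP}}{\text{TP} + \text{FP} + \text{FN}}$$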

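def get_JS(SR,GT,threshold=0.5):
    # JS : Jaccard similarity
    SR = SR > threshold
    # Assumes a strictly binary mask: only pixels equal to the maximum are foreground
    GT = GT == torch.max(GT)

    # Logical AND / OR are used because adding two bool tensors never yields 2
    Inter = torch.sum(SR & GT)
    Union = torch.sum(SR | GT)

    JS = float(Inter)/(float(Union) + 1e-6)

    return JS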

6. Dice coefficient

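The Dice coefficient (DC) is a metric commonly used in segmentation and is closely related to IoU. It is defined as twice the area of overlap divided by the total number of pixels in the two images. Like IoU, it ranges from 0 to 1, where 1 indicates the greatest similarity between the prediction and the ground truth [130]. It therefore measures the similarity between two data samples. Mathematically, the Dice coefficient can be written as follows.
$$\text{DC} = \frac{2 \times \text{TP}}{(\text{TP} + \text{FP}) + (\text{TP} + \text{FN})}$$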

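def get_DC(SR,GT,threshold=0.5):
    # DC : Dice Coefficient
    SR = SR > threshold
    # Assumes a strictly binary mask: only pixels equal to the maximum are foreground
    GT = GT == torch.max(GT)

    # Logical AND is used because adding two bool tensors never yields 2
    Inter = torch.sum(SR & GT)
    DC = float(2*Inter)/(float(torch.sum(SR)+torch.sum(GT)) + 1e-6)

    return DC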
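
The close relationship between the Dice coefficient and IoU can be made explicit. Starting from the two definitions above, both written in terms of TP, FP and FN, a short derivation (a sketch that ignores the small epsilon used in the code) shows that each metric is a monotonic function of the other:

```latex
\begin{aligned}
\text{IoU} &= \frac{\text{TP}}{\text{TP} + \text{FP} + \text{FN}}, \qquad
\text{DC} = \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}} \\[4pt]
\frac{2\,\text{IoU}}{1 + \text{IoU}}
  &= \frac{2\,\text{TP}}{(\text{TP} + \text{FP} + \text{FN}) + \text{TP}}
   = \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}}
   = \text{DC} \\[4pt]
\text{IoU} &= \frac{\text{DC}}{2 - \text{DC}}
\end{aligned}
```

For example, an IoU of 1/3 corresponds to a Dice coefficient of 0.5, so the two metrics rank models identically even though Dice is never smaller than IoU.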