sklearn.metrics.roc_curve raises ValueError: unknown format is not supported

Latest recommended article published on 2023-11-02 22:36:35

Never_Jiao

Categories: python&pytorch, Python and PyTorch pitfalls I've run into. Tags: sklearn, python, unknown data type

Article link: https://blog.csdn.net/Acmer_future_victor/article/details/128216074

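The palest ink beats the best memory: if I don't write down the pitfalls I've hit, I'll just fall right back into them and struggle to climb out.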

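When I used sklearn.metrics.roc_curve to plot an ROC curve, it raised ValueError: unknown format is not supported. After going through a lot of material, I found that the cause was that the input data's type was "unknown". Note that this "type" is not what Python's built-in type() prints, but what sklearn.utils.multiclass.type_of_target returns. See the code below for details.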

import torch
import torch.nn as nn
import os
import numpy as np
import SimpleITK as sitk
from iantsen_dataset import HecktorDataset
from torch.utils.data import DataLoader
from DMCTNet_noGC import DMCTNet_noGC
from numpy import interp  # scipy.interp was only a deprecated alias of numpy.interp
from pathlib import Path
import matplotlib.pyplot as plt
from itertools import cycle
import transforms
from sklearn.metrics import roc_curve, auc, f1_score, precision_recall_curve, average_precision_score
from sklearn.utils.multiclass import type_of_target

os.environ['CUDA_VISIBLE_DEVICES'] = "0"

resample_data_path = r''  # path to the test set
test_weights_path = r''  # path to the trained model weights
num_class = 1  # number of classes
gpu = "cuda:0"

def test(model, test_path):
    # Load the test set and the trained model weights

    path_to_imgs = Path(resample_data_path)
    # patients contains only one case: CHUP015
    patients = [p for p in os.listdir(path_to_imgs) if os.path.isdir(path_to_imgs / p)]

    test_paths = []
    for p in patients:
        path_to_ct = path_to_imgs / p / (p + '_ct.nii.gz')
        path_to_pt = path_to_imgs / p / (p + '_pt.nii.gz')
        path_to_gtvt = path_to_imgs / p / (p + '_gtvt.nii.gz')
        test_paths.append((path_to_ct, path_to_pt))

    val_transforms = transforms.Compose([
        transforms.NormalizeIntensity(),
        transforms.ToTensor(mode='test')
    ])
    output_transform = transforms.Compose([
        transforms.transform_back(mode='test')
    ])

    test_set = HecktorDataset(test_paths, transforms=val_transforms, mode='test')
    test_loader = DataLoader(test_set, batch_size=1, shuffle=False)
    
    model = torch.nn.DataParallel(model).cuda()
    checkpoint = torch.load(test_path)
    model.load_state_dict(checkpoint, strict=True)
    model.eval()

    # Everything above just loads the data and the model

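    score_list = []  # stores predicted scores (not used yet in this single-sample example)
    label_list = []  # stores ground-truth labels (not used yet in this single-sample example)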
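    for sample in test_loader:
        inputs = sample['input'].cuda()
        with torch.no_grad():  # inference only: no autograd graph needed
            outputs = model(inputs)
    # Only one test patient (CHUP015), so outputs and path_to_gtvt both refer to it
    # score_array shape: [1, 1, 144, 144, 72]
    score_array = outputs.detach().cpu().numpy()
    label_itk = sitk.ReadImage(str(path_to_gtvt))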
    # label_array shape:[144, 144, 72]
    label_array = sitk.GetArrayFromImage(label_itk)
    score_array = np.squeeze(score_array)
    # label_array type: <class 'numpy.ndarray'>
    # score_array type: <class 'numpy.ndarray'>
    print('label_array type:', type(label_array))
    print("score_array type:", type(score_array))  
    # type of score: unknown
    # type of label: unknown
    print('type of score:', type_of_target(score_array))
    print('type of label:', type_of_target(label_array))
    
    score_array = score_array.reshape(-1, 1)
    label_array = label_array.reshape(-1, 1)
    # score_array shape: (2985984, 1)
    # label_array shape: (2985984, 1)
    print('score_array shape:', score_array.shape)
    print('label_array shape:', label_array.shape)
    
    # label_array type: <class 'numpy.ndarray'>
    # score_array type: <class 'numpy.ndarray'>
    # type of score: continuous
    # type of label: binary
    print('label_array type:', type(label_array))
    print("score_array type:", type(score_array))  
    print('type of score:', type_of_target(score_array))
    print('type of label:', type_of_target(label_array))

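My goal was to plot the ROC curve on the test set using an already-trained model. Taking a single sample as an example: first load the test set and the trained model, then use the model to produce the predictions outputs. This runs on the GPU, so at this point the data is a tensor. detach(), cpu() and numpy() move it to the CPU and convert it into a NumPy array, score_array, whose shape is 1×1×144×144×72. Then load the label, whose shape is 144×144×72. roc_curve requires score_array and label_array to have the same size, so numpy.squeeze() is used to reduce score_array to 144×144×72. Calling roc_curve at this point raises the error above. Printing the data types of score_array and label_array with both type and type_of_target shows that type reports numpy.ndarray, while type_of_target reports unknown.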
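The failure can be reproduced without the model or the NIfTI files. The sketch below is a minimal, hypothetical reproduction: label_3d and score_3d are small synthetic stand-ins for label_array and the squeezed score_array (not the real data), and it assumes only NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-ins for the squeezed 3D volumes (the real ones are 144x144x72):
# a binary ground-truth mask and a float score map of the same shape.
label_3d = np.zeros((4, 4, 2), dtype=np.uint8)
label_3d[1:3, 1:3, :] = 1
score_3d = np.linspace(0.0, 0.3, label_3d.size).reshape(label_3d.shape) + 0.6 * label_3d

# Python's built-in type() only sees a NumPy array...
print(type(label_3d))            # <class 'numpy.ndarray'>
# ...while sklearn inspects the values and the number of dimensions.
print(type_of_target(label_3d))  # unknown
print(type_of_target(score_3d))  # unknown

# roc_curve needs a 'binary' y_true, so the 3D label array is rejected.
try:
    roc_curve(label_3d, score_3d)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)             # unknown format is not supported
```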

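Most of the fixes I found online were to convert the output to a NumPy array or a list, or to cast the results to int with astype. I tried them all, and none of them helped in my case. Then I went to the official documentation, which describes score_array and label_array as having shape (n_samples,), i.e. one entry per sample. That made me wonder whether a 144×144×72 array meets that requirement at all. Figuring it couldn't hurt, I used reshape to turn both arrays into n×1 columns and printed type_of_target again, and this time it returned continuous and binary. I hadn't expected it to work, but it did. The lesson: however far-fetched an idea seems, if it occurs to you, try it.

The reason it works: type_of_target only classifies 1D and 2D inputs and returns unknown for any array with more than two dimensions, regardless of its values. roc_curve accepts only a binary y_true (or multiclass with an explicit pos_label), so it rejects the 3D label volume with "unknown format is not supported". After reshaping, every voxel is treated as one sample.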
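A minimal sketch of the fix, using the same kind of synthetic stand-ins rather than the real volumes. Since roc_curve documents y_true and y_score as shape (n_samples,), flattening with ravel() matches the docs most directly; the reshape(-1, 1) used in this post also gets past the type check, because a single-column 2D array is still classified as binary/continuous. flatten_pairs is a hypothetical helper (not part of sklearn) showing one way to flatten and concatenate several patients' volumes, e.g. to fill score_list and label_list once there is more than one test patient.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.utils.multiclass import type_of_target


def flatten_pairs(labels, scores):
    """Hypothetical helper: flatten each (label, score) volume pair and
    concatenate them into two 1D arrays of shape (n_voxels_total,)."""
    y_true = np.concatenate([np.asarray(lab).ravel() for lab in labels])
    y_score = np.concatenate([np.asarray(sc).ravel() for sc in scores])
    if y_true.shape != y_score.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_score.shape}")
    return y_true, y_score


# Synthetic stand-ins: a 3D binary mask and a 3D float score map.
label_3d = np.zeros((4, 4, 2), dtype=np.uint8)
label_3d[1:3, 1:3, :] = 1
score_3d = np.linspace(0.0, 0.3, label_3d.size).reshape(label_3d.shape) + 0.6 * label_3d

# The fix from this post: an n x 1 column is 2D, so type_of_target accepts it.
print(type_of_target(label_3d.reshape(-1, 1)))  # binary
print(type_of_target(score_3d.reshape(-1, 1)))  # continuous

# Flattening to 1D matches the documented (n_samples,) shape exactly.
y_true, y_score = flatten_pairs([label_3d], [score_3d])
print(type_of_target(y_true), type_of_target(y_score))  # binary continuous

# Voxel-level ROC curve and AUC (the stand-in scores separate the classes perfectly).
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)
print(roc_auc, roc_auc_score(y_true, y_score))  # 1.0 1.0
```

Flattening treats every voxel as an independent sample, so the result is a voxel-level ROC curve, which is also what the reshape in this post computes.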

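I haven't even finished fixing the code and I'm already writing this pitfall down, so that I never fall into it again. This post is just a record to avoid stepping on the same mine later; if anything here is imprecise or the analysis is wrong, corrections are welcome. (Now I have to get back to fixing my code....)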