nnUNet on 2D datasets: an end-to-end training and testing tutorial (bypassing five-fold cross-validation)

DewNose

Original link: https://blog.csdn.net/only_ctrl/article/details/124775303

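Preface

I recently ran comparison experiments with nnUNet on a 2D version of the MICCAI BraTS 2019 dataset, so I am writing down the whole workflow and the pitfalls I hit. Because I did not have enough compute, I bypassed five-fold cross-validation and trained a single model directly. Following the steps below should reproduce the expected results.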

I. Environment setup

The environment setup follows this blog post:
https://blog.csdn.net/weixin_41693877/article/details/121333947

1. Create a virtual environment

conda create -n nnUNet python=3.8
conda activate nnUNet

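2. Create a working directory

Next, create a folder named nnUNetFrame (any name works; everything related to nnUNet will live inside it) and cd into it in the terminal.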

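3. Clone nnUNet

git clone https://github.com/MIC-DKFZ/nnUNet.git

This clones the nnUNet source code, so nnUNetFrame now contains a directory named nnUNet. All subsequent commands are run from inside it. Note that the commands used in this tutorial (nnUNet_plan_and_preprocess, nnUNet_train, nnUNet_predict) belong to nnUNet v1; the repository's default branch has since moved to v2, so you may need to check out the v1 code.

Enter that directory and install the required libraries:

cd nnUNet
pip install -e .

Do not forget the trailing dot.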

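4. Create the data directories

Create all directories exactly as described below (the original post showed a screenshot of each folder).

nnUNetFrame:

The nnUNet folder holds the code; all data goes into DATASET.

DATASET:

[Figure: contents of the DATASET folder]

nnUNet_raw:

[Figure: contents of the nnUNet_raw folder]

nnUNet_raw_data:

[Figure: contents of the nnUNet_raw_data folder]
This folder holds your tasks. Name each one in the form Taskxxx_xxxxx, and preferably use a task ID of 100 or above to avoid clashing with the IDs of any pretrained tasks.

Taskxxx_xxxx:

[Figure: contents of the task folder]
First create the five folders; how to build dataset.json is covered later.
From top to bottom, the five folders hold the training images (imagesTr), the test images (imagesTs), the predictions on the test set, used for computing metrics (imagesTsPred), the training ground truth (labelsTr), and the test ground truth (labelsTs).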
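
Since the folder screenshots are not available, the layout described above can be created with a short script. This is only a sketch under assumptions: the three subfolders of DATASET are taken from the environment variables set later in this tutorial, the five task subfolders from the paths used in the conversion and prediction steps, and the helper name make_layout is made up for illustration.

```python
import os
import tempfile

# The five task subfolders, in the order listed in the text.
TASK_SUBDIRS = ["imagesTr", "imagesTs", "imagesTsPred", "labelsTr", "labelsTs"]


def make_layout(frame_dir, task_name):
    """Create the DATASET tree under frame_dir and return the task folder path."""
    dataset = os.path.join(frame_dir, "DATASET")
    # Folders that the nnUNet_preprocessed and RESULTS_FOLDER variables point to.
    for name in ("nnUNet_preprocessed", "nnUNet_trained_models"):
        os.makedirs(os.path.join(dataset, name), exist_ok=True)
    # Raw data lives in DATASET/nnUNet_raw/nnUNet_raw_data/<task>/...
    task_dir = os.path.join(dataset, "nnUNet_raw", "nnUNet_raw_data", task_name)
    for sub in TASK_SUBDIRS:
        os.makedirs(os.path.join(task_dir, sub), exist_ok=True)
    return task_dir


if __name__ == "__main__":
    # Demo in a throwaway directory; pass your real nnUNetFrame path instead.
    with tempfile.TemporaryDirectory() as tmp:
        task_dir = make_layout(tmp, "Task100_MICCAIre")
        print(sorted(os.listdir(task_dir)))
```

Calling make_layout with the path of your nnUNetFrame folder and your task name creates every missing folder and leaves existing ones untouched.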

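II. Data preparation

1. Converting 2D images to 3D

nnUNet expects 3D .nii.gz files. For existing 2D slices, whether stored as image files or as .npy arrays, load each slice, add a new leading axis (axis 0) with numpy, and save the result as .nii.gz with SimpleITK (sitk).

For multi-modal data (for example, the four modalities of the brain tumor data), save each modality as a separate file named i_000j.nii.gz, where i is the case name and j is the modality index. Place the files in the five folders created above; each image and its ground truth must share the same case name. The original post illustrated this with screenshots of the two folders:

imagesTr

[Figure: file listing of imagesTr]

labelsTr

[Figure: file listing of labelsTr]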
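
The naming rule can be written down as two small helpers. This is an illustrative sketch: the function names are hypothetical, and padding the modality index to four digits is my generalization of the i_000j pattern, which gives the same names for modality indices 0 to 9.

```python
def image_filename(case_id, modality_index):
    """Image file name: case id, underscore, 4-digit modality index."""
    return f"{case_id}_{modality_index:04d}.nii.gz"


def label_filename(case_id):
    """Label file name: the case id only, with no modality suffix."""
    return f"{case_id}.nii.gz"


if __name__ == "__main__":
    # Four modalities of case 12, followed by its label file.
    for j in range(4):
        print(image_filename(12, j))
    print(label_filename(12))
```

The image files of one case differ only in the modality suffix, while the label file carries the bare case name, which is how an image and its ground truth are matched.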

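The dataset.json file: build it in the format shown below, and take particular care that the lists of training and test file-name dictionaries are correct.

[Figure: example dataset.json]
A small pitfall: for very large datasets, editors such as PyCharm may fail to open the json file; open it with a plain text editor to copy and edit its contents.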
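
Because the screenshot of the json file is missing, here is a hedged sketch of how such a file could be assembled. The field names follow the nnUNet v1 dataset.json layout as I understand it; the function name, the description text, and the modality and label names are placeholders, not values from the original post. Test entries are written as dictionaries when the test set has ground truth, which matches the ['image'] change described in the preprocessing section.

```python
import json


def build_dataset_json(name, modalities, labels, train_ids, test_ids,
                       test_has_labels=True):
    """Return a dict laid out like an nnUNet v1 dataset.json (sketch)."""
    def entry(img_dir, lab_dir, case_id):
        # Image paths carry no modality suffix; only the case name appears.
        return {"image": f"./{img_dir}/{case_id}.nii.gz",
                "label": f"./{lab_dir}/{case_id}.nii.gz"}

    training = [entry("imagesTr", "labelsTr", c) for c in train_ids]
    if test_has_labels:
        test = [entry("imagesTs", "labelsTs", c) for c in test_ids]
    else:
        test = [f"./imagesTs/{c}.nii.gz" for c in test_ids]
    return {
        "name": name,
        "description": "2D slices stored as single-slice 3D volumes",
        "tensorImageSize": "3D",
        "modality": {str(i): m for i, m in enumerate(modalities)},
        "labels": {str(i): l for i, l in enumerate(labels)},
        "numTraining": len(training),
        "numTest": len(test),
        "training": training,
        "test": test,
    }


if __name__ == "__main__":
    info = build_dataset_json(
        name="MICCAIre",
        modalities=["mod0", "mod1", "mod2", "mod3"],          # placeholder names
        labels=["background", "class1", "class2", "class3"],  # placeholder names
        train_ids=range(3),
        test_ids=range(2),
    )
    with open("dataset.json", "w") as f:
        json.dump(info, f, indent=2)
```

Generating the file from the list of case names avoids hand-editing a very large json file.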

Example code for building the json entries and converting the data (the raw data is in .npy format)

import json
import os

import numpy as np
import SimpleITK as sitk
import tqdm

# nnUNet requires label values to be consecutive integers [0, 1, 2, 3, ...] with no gaps.
# Target path and raw-data path; change these to your own paths.
direct_dir = "../nnUNetFrame/DATASET/nnUNet_raw/nnUNet_raw_data/Task100_MICCAIre/"
source_data = "../unet2d-Brats/val_data/"
dir_list = ["imagesTr", "labelsTr", "imagesTs", "labelsTs"]
# class_list names the two raw-data subfolders: images and ground-truth masks.
class_list = ["Image", "Mask"]
# Indices into dir_list. Test set: images 2, labels 3. Training set: images 0, labels 1.
img_idx, mask_idx = 2, 3


# Split an (H, W, C) array into one (1, H, W) array per modality.
def flaris_split(img):
    imgs = []
    for i in range(img.shape[2]):
        imgs.append(np.expand_dims(img[:, :, i], 0))
    return imgs


def change_and_json():
    all_result = []
    img_files = os.listdir(source_data + class_list[0] + "/")
    for i in tqdm.tqdm(range(len(img_files))):
        img = np.load(source_data + class_list[0] + "/" + img_files[i])
        mask = np.load(source_data + class_list[1] + "/" + img_files[i])
        mask[mask == 4] = 3  # remap label 4 to 3 so the labels are consecutive
        imgs = flaris_split(img)
        mask = np.expand_dims(mask, 0)  # add the leading axis: (H, W) -> (1, H, W)

        # Write one file per modality: {case}_000{modality}.nii.gz
        for j in range(len(imgs)):
            img_nii = sitk.GetImageFromArray(imgs[j])
            sitk.WriteImage(img_nii, direct_dir + dir_list[img_idx] + "/" + f"{i}_000{j}.nii.gz")

        mask_nii = sitk.GetImageFromArray(mask)
        sitk.WriteImage(mask_nii, direct_dir + dir_list[mask_idx] + "/" + f"{i}.nii.gz")

        # One json entry per case (not per modality); the image path omits the modality suffix.
        img_dir = "./" + dir_list[img_idx] + "/" + f"{i}.nii.gz"
        mask_dir = "./" + dir_list[mask_idx] + "/" + f"{i}.nii.gz"
        all_result.append({"image": img_dir, "label": mask_dir})

    # result.json contains just this list of entries.
    with open("result.json", "w") as f:
        json.dump(all_result, f)


if __name__ == '__main__':
    change_and_json()
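
The requirement that labels be consecutive is why the script remaps label 4 to 3. For other datasets the same idea can be expressed generally; the following is a plain-Python sketch with hypothetical helper names, shown on flat lists rather than numpy arrays for brevity.

```python
def consecutive_label_map(label_values):
    """Map each distinct label to its rank, giving 0, 1, 2, ... with no gaps."""
    return {old: new for new, old in enumerate(sorted(set(label_values)))}


def remap(labels, mapping):
    """Apply the mapping to a flat list of labels."""
    return [mapping[v] for v in labels]


if __name__ == "__main__":
    # Labels 0, 1, 2, 4 (no 3), as in the brain tumor masks used here.
    mapping = consecutive_label_map([0, 1, 2, 4])
    print(mapping)
    print(remap([0, 4, 2, 4, 1], mapping))
```

Build the mapping from the label values of the whole dataset rather than from a single slice, since a slice that happens to lack one class would otherwise produce a different mapping.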

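III. Pre-training setup and data preprocessing

Still in the nnUNet directory, set the environment variables in the terminal (use straight quotes, and replace … with the path leading to your DATASET folder):

export nnUNet_raw_data_base="…/DATASET/nnUNet_raw"
export nnUNet_preprocessed="…/DATASET/nnUNet_preprocessed"
export RESULTS_FOLDER="…/DATASET/nnUNet_trained_models"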
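
Variables set with export only exist in the current terminal session, so it can help to confirm they are set before starting a long run. A minimal sketch, with a hypothetical helper name, that only reads the three variables named above:

```python
import os

REQUIRED_VARS = ("nnUNet_raw_data_base", "nnUNet_preprocessed", "RESULTS_FOLDER")


def missing_nnunet_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_nnunet_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        for name in REQUIRED_VARS:
            print(name, "=", os.environ[name])
```

Run it in the same terminal in which you issued the export commands.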

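Data preprocessing:

Before preprocessing, the code needs a small modification in the verify_dataset_integrity function in nnunet/preprocessing/sanity_checks.py, at the line that defines expected_test_identifiers (the original post marked the location in a screenshot).

Change 1: if your test set has ground truth, so that each test entry in the json is a dictionary, append ['image'] to the i in the expected_test_identifiers line. If your test set has no ground truth, this change is not needed.

Change 2: on the line after change 1, add the following, which removes duplicate identifiers and prints how many cases were found:

    expected_train_identifiers = np.unique(expected_train_identifiers)
    expected_test_identifiers = np.unique(expected_test_identifiers)
    print('train num', len(expected_train_identifiers))
    print('test num:', len(expected_test_identifiers))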
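
To show what the two changes achieve, here is a standalone sketch of the idea. It is not the library's code: the helper names are hypothetical, and it assumes identifiers are derived by dropping the folder and the .nii.gz suffix from each path.

```python
def case_identifier(path):
    """Strip the directory and the '.nii.gz' suffix from a path."""
    return path.split("/")[-1][:-len(".nii.gz")]


def unique_identifiers(entries):
    """Accept dict entries ({'image': ...}) or plain path strings; drop duplicates."""
    paths = [e["image"] if isinstance(e, dict) else e for e in entries]
    return sorted(set(case_identifier(p) for p in paths))


if __name__ == "__main__":
    test_entries = [
        {"image": "./imagesTs/0.nii.gz", "label": "./labelsTs/0.nii.gz"},
        {"image": "./imagesTs/0.nii.gz", "label": "./labelsTs/0.nii.gz"},  # repeated entry
        {"image": "./imagesTs/1.nii.gz", "label": "./labelsTs/1.nii.gz"},
    ]
    print(unique_identifiers(test_entries))
```

Indexing with ['image'] is what makes dictionary entries usable where a path string is expected, and the de-duplication keeps repeated json entries from inflating the case count.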

Then run the following command in the terminal to preprocess the data:
nnUNet_plan_and_preprocess -t 100 --verify_dataset_integrity
Here 100 is your task ID.

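IV. Training

To train with five-fold cross-validation, follow the procedure in https://blog.csdn.net/weixin_41693877/article/details/121333947

Here we train without five-fold cross-validation, using the option mentioned in the nnUNet GitHub README: pass all as the fold, so that a single model is trained on all of the training data:

CUDA_VISIBLE_DEVICES=0 nnUNet_train 2d nnUNetTrainerV2 Task101_MICCAIadd all --npz

CUDA_VISIBLE_DEVICES selects which GPU to use; 0 is the first GPU. Replace Task101_MICCAIadd with the name of your own task (the other commands in this tutorial use Task100_MICCAIre).

During training, open the current task's folder, nested a few levels inside nnUNet_trained_models under DATASET, and you should see:
[Figure: contents of the task's results folder]
The all folder holds the model being trained. If it contains no model files, check the preceding steps for mistakes.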

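V. Testing

Because we did not use five-fold cross-validation, the test command cannot be obtained from nnUNet's model-selection command, so we write it ourselves. First, one parameter needs to be changed in two files.

In the nnunet/inference/ directory, open predict.py and predict_simple.py and search (Ctrl+F) for the function that takes the model_final_checkpoint parameter; there is one in each file. Change model_final_checkpoint to model_best.

Then run this command in the terminal:
nnUNet_predict -i …/DATASET/nnUNet_raw/nnUNet_raw_data/Task100_MICCAIre/imagesTs/ -o …/DATASET/nnUNet_raw/nnUNet_raw_data/Task100_MICCAIre/imagesTsPred/ -m 2d -t Task100_MICCAIre -f all
This runs the test; the predictions are saved in the folder given by -o.

-i is the folder containing the data to predict, -o is the output folder, -m is the model type (2d), -t is the task name, and -f is the fold to use, which is all here because we trained a single model directly.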
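
If you run predictions for several tasks, assembling the command in a script avoids retyping the long paths. A small sketch that uses only the flags explained above; the function name and the relative base path are illustrative assumptions, so substitute your own folders.

```python
import shlex


def predict_command(input_dir, output_dir, task, model="2d", fold="all"):
    """Return the nnUNet_predict argument list for the given folders."""
    return ["nnUNet_predict", "-i", input_dir, "-o", output_dir,
            "-m", model, "-t", task, "-f", fold]


if __name__ == "__main__":
    task = "Task100_MICCAIre"
    base = "DATASET/nnUNet_raw/nnUNet_raw_data/" + task  # hypothetical relative path
    cmd = predict_command(base + "/imagesTs/", base + "/imagesTsPred/", task)
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Printing the command first lets you check the paths before launching the prediction.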

Evaluation

The predictions are .nii.gz files, so they must be converted to numpy arrays for evaluation. An example is given below; adapt it as needed.

import os

import numpy as np
import SimpleITK as sitk
import torch
import tqdm
from hausdorff import hausdorff_distance


def dice_coef(output, target):
    # Dice = 2 * |A ∩ B| / (|A| + |B|); smooth avoids division by zero on empty masks.
    smooth = 1e-5
    if torch.is_tensor(output):
        output = output.data.cpu().numpy()
    if torch.is_tensor(target):
        target = target.data.cpu().numpy()

    intersection = (output * target).sum()

    return (2. * intersection + smooth) / \
        (output.sum() + target.sum() + smooth)

infer_path = "/nnUNetFrame/DATASET/nnUNet_raw/nnUNet_raw_data/Task100_MICCAIre/imagesTsPred/"  # predictions
label_path = "/nnUNetFrame/DATASET/nnUNet_raw/nnUNet_raw_data/Task100_MICCAIre/labelsTs/"  # test-set labels

# Brain tumor data: convert the label map (0 = background, tumor labels 1, 2, 3)
# into three binary region masks: WT, TC and ET.
def wt_tc_et_make(npmask):
    WT_Label = npmask.copy()  # WT: labels 1, 2 and 3
    WT_Label[npmask == 1] = 1
    WT_Label[npmask == 2] = 1
    WT_Label[npmask == 3] = 1
    TC_Label = npmask.copy()  # TC: labels 1 and 3
    TC_Label[npmask == 1] = 1
    TC_Label[npmask == 2] = 0
    TC_Label[npmask == 3] = 1
    ET_Label = npmask.copy()  # ET: label 3 only
    ET_Label[npmask == 1] = 0
    ET_Label[npmask == 2] = 0
    ET_Label[npmask == 3] = 1
    # Stack the three masks as channels, then move channels first: (3, H, W).
    nplabel = np.empty((npmask.shape[0], npmask.shape[1], 3))
    nplabel[:, :, 0] = WT_Label
    nplabel[:, :, 1] = TC_Label
    nplabel[:, :, 2] = ET_Label
    nplabel = nplabel.transpose((2, 0, 1))
    return nplabel

def visit_data():
    wt_dices = []
    tc_dices = []
    et_dices = []
    dices = [wt_dices, tc_dices, et_dices]

    wt_hd = []
    tc_hd = []
    et_hd = []
    hds = [wt_hd, tc_hd, et_hd]
    # Predictions and labels share file names, so list the labels and look up both.
    image_list = os.listdir(label_path)
    for i in tqdm.tqdm(range(len(image_list))):
        pred_nii = sitk.ReadImage(infer_path + image_list[i], sitk.sitkUInt8)
        pred_arr = sitk.GetArrayFromImage(pred_nii)[0, :, :]  # drop the added axis
        pred = wt_tc_et_make(np.array(pred_arr))

        mask_nii = sitk.ReadImage(label_path + image_list[i], sitk.sitkUInt8)
        mask_arr = sitk.GetArrayFromImage(mask_nii)[0, :, :]
        mask = wt_tc_et_make(np.array(mask_arr))
        for j in range(3):
            dice = dice_coef(pred[j, :, :], mask[j, :, :])
            # Both arguments are plain numpy arrays.
            # Note: hausdorff_distance treats each row of its 2D inputs as one point.
            hd = hausdorff_distance(pred[j, :, :], mask[j, :, :])
            dices[j].append(dice)
            hds[j].append(hd)
    dices = np.array(dices)
    hds = np.array(hds)

    print(f"wt dice is {np.mean(dices[0, :])}")
    print(f"tc dice is {np.mean(dices[1, :])}")
    print(f"et dice is {np.mean(dices[2, :])}")
    print(f"wt hd is {np.mean(hds[0, :])}")
    print(f"tc hd is {np.mean(hds[1, :])}")
    print(f"et hd is {np.mean(hds[2, :])}")

if __name__ == '__main__':
    visit_data()
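
To make the Dice computation in dice_coef concrete, here is the same formula on tiny hand-made masks. It is a plain-Python sketch on flat lists, independent of numpy and torch, using the same smoothing constant as the function above.

```python
def dice(pred, target, smooth=1e-5):
    """Smoothed Dice coefficient for two flat binary masks of equal length."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    target = [1, 0, 1, 0]
    print(dice(pred, target))    # one overlapping pixel, two foreground pixels each
    print(dice([0, 0], [0, 0]))  # both masks empty
```

With one overlapping pixel and two foreground pixels in each mask the score is about 0.5, and two empty masks score 1.0 because of the smoothing term. The second case matters here: slices that contain no tumor in either the prediction or the label count as perfect matches and raise the mean.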