【nnUNet V2 Series】Installing and Debugging nnUNet V2 on Ubuntu - CSDN Blog

Article link: https://blog.csdn.net/qqbb1987/article/details/140553160

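Many installation tutorials online describe nnUNet V1, and some mix V1 and V2 together, so V1 conversion commands end up being used with V2, which causes a lot of confusion. This post collects the installation process specifically for V2. If anything is wrong, please point it out and I will fix it promptly.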

1. Create a virtual environment

conda create -n nnUNetV2 python=3.9
2. Activate the virtual environment

conda activate nnUNetV2
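3. Run nvidia-smi to check the CUDA version (the number it shows is the highest CUDA version the driver supports; on this server it is CUDA 12.1), then install a PyTorch build whose CUDA version does not exceed it (CUDA 11.8 here):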

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
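The rule in step 3 (pick a pytorch-cuda build that does not exceed the CUDA version nvidia-smi reports) can be sketched in Python as below. The list of available builds is an assumption for illustration only; check pytorch.org for the builds actually offered for your PyTorch release. For a CUDA 12.1 driver, both 11.8 (the build used here) and 12.1 satisfy the rule; this sketch simply picks the newest compatible one.

```python
import re

# Assumed example list of pytorch-cuda builds; the real list depends on the PyTorch release.
AVAILABLE_BUILDS = ["11.8", "12.1", "12.4"]


def driver_cuda_version(nvidia_smi_output: str) -> tuple:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field printed by nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def as_version(build: str) -> tuple:
    """Turn a build string such as '11.8' into (11, 8) so versions compare numerically."""
    major, minor = build.split(".")
    return int(major), int(minor)


def pick_build(driver: tuple, builds=AVAILABLE_BUILDS) -> str:
    """Return the newest build whose CUDA version does not exceed the driver's."""
    compatible = [b for b in builds if as_version(b) <= driver]
    if not compatible:
        raise ValueError("the driver is older than every listed build")
    return max(compatible, key=as_version)


# Example header line in the format nvidia-smi prints (values are illustrative).
header = "| NVIDIA-SMI 530.30.02    Driver Version: 530.30.02    CUDA Version: 12.1 |"
print(pick_build(driver_cuda_version(header)))  # 12.1
```

Comparing versions as integer tuples avoids the string-comparison trap where "9.2" would sort after "11.8".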

[Screenshot: successful installation]

Test it: start the Python interpreter and run:

python

import torch
print(torch.__version__)               # prints the PyTorch version
print(torch.cuda.is_available())       # should print True
print(torch.backends.cudnn.version())  # prints the cuDNN version

4. Finally, install nnUNet V2 following the official documentation:

git clone https://github.com/MIC-DKFZ/nnUNet.git
cd nnUNet
pip install -e .
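Note: don't drop the trailing dot.
[Optional] If downloading from GitHub is inconvenient, a pre-downloaded archive is provided (extraction code: x2FK). Unzip it and install from the extracted folder: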
cd nnUNet 
pip install -e .
[Optional] Install hiddenlayer so that nnU-Net can generate network topology plots (skip it if you don't need them):
pip install --upgrade git+https://github.com/FabianIsensee/hiddenlayer.git
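A quick way to confirm that pip install -e . registered the command-line entry points used in the later steps is to look them up on PATH from inside the activated nnUNetV2 environment. This is a minimal sketch; the command list only covers the commands used in this post.

```python
import shutil

# nnUNet V2 console commands used later in this post (assumed to be installed by `pip install -e .`).
COMMANDS = [
    "nnUNetv2_plan_and_preprocess",
    "nnUNetv2_train",
    "nnUNetv2_convert_MSD_dataset",
]


def missing_commands(commands=COMMANDS) -> list:
    """Return the commands that cannot be found on PATH in the current environment."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


missing = missing_commands()
print("all nnUNet V2 commands found" if not missing else f"missing: {missing}")
```

If anything is reported missing, the usual cause is that the nnUNetV2 environment is not activated in the current shell.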

5. Set the environment variables by appending the following to the end of ~/.bashrc (replace xxx with your own username):

export nnUNet_raw="/home/xxx/nnUNetV2-master/DATASET/nnUNet_raw"
export nnUNet_preprocessed="/home/xxx/nnUNetV2-master/DATASET/nnUNet_preprocessed"
export nnUNet_results="/home/xxx/nnUNetV2-master/DATASET/nnUNet_trained_models"

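Then run source ~/.bashrc. Alternatively (this is what I did here), edit paths.py in the nnunetv2 folder directly; this change only affects the V2 code and does not touch the environment or a V1 installation. [Screenshot: modified paths.py]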
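nnunetv2/paths.py reads these three variables from the environment, so it is worth checking that they are visible in the shell you train from. A minimal sketch that reports whether each variable is set and points to an existing directory (the variable names are the ones exported above):

```python
import os

# The three variables exported in ~/.bashrc above.
REQUIRED_VARS = ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results")


def check_nnunet_env(env=os.environ) -> dict:
    """Map each required variable to a short status message."""
    report = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if value is None:
            report[name] = "NOT SET"
        elif not os.path.isdir(value):
            report[name] = f"set, but directory does not exist: {value}"
        else:
            report[name] = f"ok: {value}"
    return report


for name, status in check_nnunet_env().items():
    print(f"{name}: {status}")
```

A "does not exist" status is not always an error (nnUNet_results, for example, may only be created later), but a "NOT SET" status means source ~/.bashrc has not taken effect in this shell.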

6. Download the official data from the Medical Segmentation Decathlon Google Drive.
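The conversion command in the next step expects the extracted Task folder under nnUNet_raw. A minimal sketch of the extraction with Python's tarfile module; the helper name extract_msd_task and the archive name Task04_Hippocampus.tar are assumptions for illustration:

```python
import os
import tarfile


def extract_msd_task(archive_path: str, raw_dir: str) -> list:
    """Extract an MSD archive (hypothetical example: Task04_Hippocampus.tar) into raw_dir.

    Returns the sorted top-level entries of the archive (normally a single Task folder).
    """
    os.makedirs(raw_dir, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        top_level = {os.path.normpath(m.name).split("/")[0] for m in members} - {"."}
        # Use the safer "data" extraction filter when this Python version provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(raw_dir, **kwargs)
    return sorted(top_level)


# Example call (paths are assumptions):
# extract_msd_task("Task04_Hippocampus.tar", os.environ["nnUNet_raw"])
```

Running tar -xf Task04_Hippocampus.tar inside the nnUNet_raw folder does the same job from the shell.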

7. Because nnUNet V1 and nnUNet V2 use different dataset formats, the data needs to be converted:

nnUNetv2_convert_MSD_dataset -i /home/XXX/nnUNetV2-master/DATASET/nnUNet_raw/Task04_Hippocampus -overwrite_id 004

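The folder is renamed from Task04_Hippocampus to Dataset004_Hippocampus, and dataset.json changes as well: V2 no longer lists the individual case names, and labels are recorded differently. [Screenshot: dataset.json before and after conversion]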
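To make the dataset.json change concrete, here is a rough sketch of how the main fields map between the two formats. It is not the official converter (nnUNetv2_convert_MSD_dataset does the real work), and the V1 values below are illustrative.

```python
def v1_to_v2_dataset_json(v1: dict) -> dict:
    """Sketch of the main dataset.json differences between nnU-Net V1 and V2."""
    return {
        # V1 "modality" {"0": "MRI"} corresponds to V2 "channel_names"
        "channel_names": dict(v1["modality"]),
        # V1 maps label value -> name; V2 maps name -> integer label value
        "labels": {name: int(value) for value, name in v1["labels"].items()},
        "numTraining": v1["numTraining"],
        # V2 records the file extension instead of listing every training/test case
        "file_ending": ".nii.gz",
    }


# Illustrative V1-style content for the Hippocampus task.
v1 = {
    "modality": {"0": "MRI"},
    "labels": {"0": "background", "1": "Anterior", "2": "Posterior"},
    "numTraining": 260,
    "training": [{"image": "./imagesTr/hippocampus_001.nii.gz",
                  "label": "./labelsTr/hippocampus_001.nii.gz"}],
}
print(v1_to_v2_dataset_json(v1)["labels"])  # {'background': 0, 'Anterior': 1, 'Posterior': 2}
```

The key point is the inverted labels mapping: V2 looks labels up by name and expects consecutive integer values starting at 0 for background.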

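Don't mix up the V1 and V2 workflows. Some blogs use nnUNet_convert_decathlon_task, which is a V1 command; if you go that route, the result must be converted once more before V2 can use it: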

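nnUNetv2_convert_old_nnUNet_dataset Task004_Hippocampus Dataset004_Hippocampus

Note: the first argument must be the path to the old V1 Task folder (the one containing imagesTr, labelsTr, etc.), not just the task name; the second argument is the new dataset name (not a path) in the DatasetXXX_NAME format.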

8. Preprocess the data:

Template: nnUNetv2_plan_and_preprocess -d DATASET_ID --verify_dataset_integrity

nnUNetv2_plan_and_preprocess -d 004 --verify_dataset_integrity
Note: --verify_dataset_integrity is only needed on the first run; later runs can omit it.
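The ID passed with -d (004 here, or 4 later for training) refers to the folder Dataset004_Hippocampus: the number corresponds to the zero-padded DatasetXXX_ prefix of the folder name. The helper below is hypothetical (not part of nnunetv2) and only illustrates that naming convention, which is handy for checking that the conversion step produced the folder you expect:

```python
import os


def find_dataset_folder(raw_dir: str, dataset_id: int) -> str:
    """Return the single DatasetXXX_Name folder in raw_dir that matches dataset_id."""
    prefix = f"Dataset{dataset_id:03d}_"  # 4 -> "Dataset004_"
    matches = [
        d for d in os.listdir(raw_dir)
        if d.startswith(prefix) and os.path.isdir(os.path.join(raw_dir, d))
    ]
    if len(matches) != 1:
        raise FileNotFoundError(f"expected exactly one {prefix}* folder in {raw_dir}, found {matches}")
    return os.path.join(raw_dir, matches[0])


# Example call (assumes nnUNet_raw is set):
# print(find_dataset_folder(os.environ["nnUNet_raw"], 4))
```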

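Preprocessing result: the preprocessed files are generated in the nnUNet_preprocessed subdirectory of DATASET. [Screenshot: contents of nnUNet_preprocessed]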
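Before training, it helps to know which U-Net configurations preprocessing actually planned, since those are the valid values for UNET_CONFIGURATION. A minimal sketch, assuming the default plans file name nnUNetPlans.json and its top-level "configurations" key:

```python
import json
import os


def list_configurations(plans_path: str) -> list:
    """Return the configuration names defined in an nnUNetPlans.json file."""
    with open(plans_path, encoding="utf-8") as f:
        plans = json.load(f)
    return sorted(plans["configurations"])


if __name__ == "__main__":
    # Assumed location of the plans file for the Hippocampus dataset.
    plans_file = os.path.join(os.environ.get("nnUNet_preprocessed", "."),
                              "Dataset004_Hippocampus", "nnUNetPlans.json")
    if os.path.isfile(plans_file):
        print(list_configurations(plans_file))
```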

9. Train the nnUNet model:

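Template: nnUNetv2_train DATASET_NAME_OR_ID UNET_CONFIGURATION FOLD [--npz]

(--npz is optional: it additionally saves the softmax predictions of the final validation as .npz files, which nnUNetv2_find_best_configuration needs later. Do not add --val to a normal training run: that flag only runs validation on an already trained model.)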

nnUNetv2_train 4 3d_fullres 0

DATASET_NAME_OR_ID: the dataset to train on;

UNET_CONFIGURATION: a string identifying the requested U-Net configuration (the standard ones are 2d, 3d_fullres, 3d_lowres, and 3d_cascade_fullres);

FOLD: which fold of the 5-fold cross-validation to train (0-4).

Note: nnU-Net saves a checkpoint of the model weights every 50 epochs. To resume a previous training run, just add --c to the training command.
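The template and its arguments can be captured in a small helper that builds the argument list before handing it to subprocess. The function build_train_cmd is hypothetical; the configuration names, the 0-4 folds (plus "all", which trains on all cases), --npz and --c are the ones nnU-Net V2 uses.

```python
# Standard nnU-Net V2 configuration names.
CONFIGURATIONS = {"2d", "3d_fullres", "3d_lowres", "3d_cascade_fullres"}


def build_train_cmd(dataset, configuration: str, fold, npz: bool = False, resume: bool = False) -> list:
    """Build the argument list for nnUNetv2_train (hypothetical helper)."""
    if configuration not in CONFIGURATIONS:
        raise ValueError(f"unknown configuration: {configuration}")
    if fold != "all" and fold not in range(5):
        raise ValueError(f"fold must be 0-4 or 'all', got {fold!r}")
    cmd = ["nnUNetv2_train", str(dataset), configuration, str(fold)]
    if npz:
        cmd.append("--npz")  # keep softmax outputs of the final validation
    if resume:
        cmd.append("--c")    # continue from the latest checkpoint
    return cmd


print(" ".join(build_train_cmd(4, "3d_fullres", 0)))  # nnUNetv2_train 4 3d_fullres 0
```

To actually start training, pass the list to subprocess.run(cmd, check=True); checking the arguments first catches typos such as 3d_cascade_lowres before a long job is launched.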

For this test I only ran about 100 epochs; the default is 1000. To change it, search nnUNetTrainer.py for self.num_epochs and set it to 100 (self.num_epochs = 100).
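Because nnUNet was installed with pip install -e ., edits to the cloned source take effect immediately. The manual edit can also be scripted; this sketch assumes nnUNetTrainer.py contains exactly one line of the form self.num_epochs = 1000 and only rewrites that number (back up the file first).

```python
import re

# Matches an indented "self.num_epochs = <number>" assignment and keeps everything before the number.
NUM_EPOCHS_LINE = re.compile(r"^(\s*self\.num_epochs\s*=\s*)\d+", re.MULTILINE)


def set_num_epochs(source: str, epochs: int) -> str:
    """Return the trainer source with the num_epochs assignment set to `epochs`."""
    new_source, count = NUM_EPOCHS_LINE.subn(lambda m: f"{m.group(1)}{epochs}", source)
    if count != 1:
        raise ValueError(f"expected exactly one num_epochs assignment, found {count}")
    return new_source
```

Usage, with the path inside the cloned repository adjusted to your checkout: trainer_py = Path("nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py"), then trainer_py.write_text(set_num_epochs(trainer_py.read_text(), 100)). Failing loudly when the pattern is not found exactly once avoids silently editing the wrong line after an nnUNet update.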

A quick detour into some tricky problems:

When I first started training, I hit an error that cost me a whole day:

1)

RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

Solution:

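Reinstall the CUDA build of PyTorch. During pip install -e ., pip sometimes reinstalls torch and replaces the GPU build with a CPU-only one. First check whether torch.cuda.is_available() returns True; if it does not, reinstall torch: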
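A slightly more informative check than torch.cuda.is_available() alone: torch.version.cuda is None for a CPU-only build, so it tells you whether pip swapped the build or whether the GPU build is installed but cannot see a device. A minimal sketch:

```python
import importlib.util


def torch_gpu_status() -> dict:
    """Report whether torch is installed, which CUDA it was built with, and whether a GPU is usable."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # None for CPU-only builds, e.g. "11.8" for a CUDA 11.8 build
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_gpu_status())
```

If built_with_cuda is None, reinstall torch as shown next; if it is set but cuda_available is False, the problem is more likely the driver or CUDA_VISIBLE_DEVICES.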

conda install pytorch torchvision torchaudio pytorch-cuda=xx.x -c pytorch -c nvidia   (set xx.x to the CUDA version appropriate for your system)

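Once torch works correctly, you can train with the command below. nnUNet_n_proc_DA=0 runs data augmentation in the main process instead of in background workers, so the real error message is printed directly; CUDA_VISIBLE_DEVICES=0 restricts training to the first GPU: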

nnUNet_n_proc_DA=0 CUDA_VISIBLE_DEVICES=0 nnUNetv2_train 4 3d_fullres 0 -device cuda

2)

/tmp/tmpdlxf7jri/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpdlxf7jri/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
   for (Py_ssize_t i = 0; i < len; i++) {
   ^
/tmp/tmpdlxf7jri/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpdlxf7jri/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpdlxf7jri/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
   for (Py_ssize_t i = 0; i < len; i++) {
   ^
... (the same errors repeat for several other /tmp/tmp*/main.c files)

Solution:

First, try this:

pip install triton==2.1.0

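If the problem persists, apply the following fix.

The root cause is that the server's C compiler defaults to an old C standard (pre-C99), which does not allow declaring variables inside a for loop.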
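To check whether a machine's compiler falls into this category, the sketch below parses gcc --version. It assumes the compiler Triton invokes is the system gcc; GCC releases before 5 default to gnu89, which produces exactly the "only allowed in C99 mode" errors above, while GCC 5 and later default to gnu11.

```python
import re
import shutil
import subprocess


def gcc_major_version(version_output: str) -> int:
    """Parse the major version from the first line of `gcc --version` output."""
    first_line = version_output.splitlines()[0]
    match = re.search(r"(\d+)\.\d+", first_line)
    if match is None:
        raise ValueError(f"cannot find a version number in: {first_line!r}")
    return int(match.group(1))


def defaults_to_c89(major: int) -> bool:
    """GCC switched its default C dialect from gnu89 to gnu11 in version 5."""
    return major < 5


if shutil.which("gcc"):
    out = subprocess.run(["gcc", "--version"], capture_output=True, text=True, check=True).stdout
    major = gcc_major_version(out)
    verdict = "defaults to gnu89, C99-style for loops fail" if defaults_to_c89(major) else "default accepts C99-style for loops"
    print(f"gcc {major}: {verdict}")
```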

Locate build.py in /home/XXX/anaconda3/envs/nnUNetV2/lib/python3.9/site-packages/triton/common/ and modify it as follows: