nnunet (Part 1): expected epoch times

Author: shchojj

Source: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/expected_epoch_times.md

Introduction

Recommended setup: 32 GB of RAM, a 6-core/12-thread CPU, an RTX 2080 ti GPU, and SSD storage for the data.
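To compare a Linux machine against these recommendations, the CPU thread count and physical RAM can be read with Python's standard library. This is a minimal sketch: `meets_recommendation` and `total_ram_gb` are hypothetical helpers, reading RAM through `os.sysconf` assumes Linux, and the 10% slack (the OS usually reports a little less than the installed amount) is an assumption. The GPU is not checked.

```python
import os

# Figures from the recommendation above
RECOMMENDED_THREADS = 12
RECOMMENDED_RAM_GB = 32
RAM_SLACK = 0.9  # assumption: accept RAM reported up to 10% below nominal


def meets_recommendation(cpu_threads: int, ram_gb: float) -> dict:
    """Report which of the CPU/RAM recommendations are satisfied."""
    return {
        "cpu_threads": cpu_threads >= RECOMMENDED_THREADS,
        "ram": ram_gb >= RAM_SLACK * RECOMMENDED_RAM_GB,
    }


def total_ram_gb() -> float:
    """Total physical RAM in GiB (Linux: relies on os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


if __name__ == "__main__":
    threads = os.cpu_count() or 0
    print(meets_recommendation(threads, total_ram_gb()))
```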

Benchmark Details

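Three configurations are benchmarked: 2d, 3d_fullres, and 3d_fullres_large (3d_fullres with 3x the default batch size).
Task002_Heart, Task005_Prostate and Task003_Liver together cover the spectrum of dataset properties reasonably well.
The trainer is nnUNetTrainerV2_5epochs: it runs only 5 epochs and skips validation, and the fastest epoch is reported as the epoch time.
The nnUNetTrainerV2_5epochs_dummyLoad trainer does not load any data. It generates dummy data itself, which bypasses the CPU (data augmentation) and I/O bottlenecks.
All trainings use mixed precision.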
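The benchmark protocol (a fixed number of epochs, no validation, fastest epoch reported) and the idea behind the dummyLoad trainer can be sketched in plain Python. This is not nnU-Net's trainer code: `train_step` stands in for one forward/backward pass, `dummy_loader` and `fastest_epoch_time` are hypothetical names, and the 250 iterations per epoch in the usage part is an assumption.

```python
import time
from typing import Any, Callable, Iterable, Iterator, List


def dummy_loader(batch: Any, n_iterations: int) -> Iterator[Any]:
    """Yield one pre-generated batch repeatedly: no disk reads, no augmentation."""
    for _ in range(n_iterations):
        yield batch


def fastest_epoch_time(train_step: Callable[[Any], Any],
                       make_loader: Callable[[], Iterable[Any]],
                       n_epochs: int = 5) -> float:
    """Run n_epochs epochs without validation; return the fastest epoch in seconds."""
    epoch_times: List[float] = []
    for _ in range(n_epochs):
        start = time.perf_counter()
        for batch in make_loader():
            train_step(batch)
        epoch_times.append(time.perf_counter() - start)
    # The minimum discards one-off slowdowns such as a warm-up epoch
    return min(epoch_times)


if __name__ == "__main__":
    patch = [0.5] * 1000  # stand-in for an image patch
    # 250 iterations per epoch is an assumption for this sketch
    t = fastest_epoch_time(lambda b: sum(x * x for x in b),
                           lambda: dummy_loader(patch, 250))
    print(f"fastest epoch: {t:.4f} s")
```

Because `make_loader` is called once per epoch, swapping `dummy_loader` for a real loader is the only change needed to measure the full pipeline instead of the compute alone.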

How to run the benchmark

Locate the folder holding the preprocessed data and the plans file of the corresponding task, e.g. /home/fabian/data/nnUNet_preprocessed/Task002_Heart
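Locating the task folder and its plans files can be scripted. The sketch assumes the nnU-Net v1 layout implied by the example path (one subfolder per task under the directory named by the `nnUNet_preprocessed` environment variable) and plans files named like `nnUNetPlansv2.1_plans_3D.pkl`; `find_plans_files` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import List, Optional, Tuple


def find_plans_files(task_name: str,
                     preprocessed_root: Optional[str] = None) -> Tuple[Path, List[Path]]:
    """Return the task's preprocessed folder and the plans pickles inside it."""
    # Fall back to the nnUNet_preprocessed environment variable
    root = preprocessed_root or os.environ["nnUNet_preprocessed"]
    task_dir = Path(root) / task_name
    # Plans files look like nnUNetPlansv2.1_plans_3D.pkl / ..._plans_2D.pkl
    return task_dir, sorted(task_dir.glob("*_plans_*.pkl"))


if __name__ == "__main__" and "nnUNet_preprocessed" in os.environ:
    folder, plans = find_plans_files("Task002_Heart")
    print(folder)
    for p in plans:
        print("  ", p.name)
```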

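Create the 3d_fullres_large configuration (3d_fullres with 3x the batch size) by running the following in that folder. Note that this configuration needs a GPU with more than 16 GB of VRAM.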

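```python
# Run inside the task's preprocessed folder, e.g. nnUNet_preprocessed/Task002_Heart
from batchgenerators.utilities.file_and_folder_operations import load_pickle, save_pickle

plans = load_pickle('nnUNetPlansv2.1_plans_3D.pkl')  # default 3D plans
stage = max(plans['plans_per_stage'].keys())  # highest stage = 3d_fullres
plans['plans_per_stage'][stage]['batch_size'] *= 3  # triple the batch size
# New plans identifier, selected later with -p nnUNetPlansv2.1_bs3x
save_pickle(plans, 'nnUNetPlansv2.1_bs3x_plans_3D.pkl')
```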
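Since the plans are plain pickle files, checking that only the highest stage's batch size was tripled needs nothing beyond the standard library. `compare_batch_sizes` is a hypothetical helper and assumes the `plans_per_stage` / `batch_size` layout used in the snippet above.

```python
import pickle
from typing import Dict, Tuple


def _load(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)


def compare_batch_sizes(original_path: str,
                        modified_path: str) -> Dict[int, Tuple[int, int]]:
    """Map each stage to (original batch size, modified batch size)."""
    original = _load(original_path)
    modified = _load(modified_path)
    return {
        stage: (original["plans_per_stage"][stage]["batch_size"],
                modified["plans_per_stage"][stage]["batch_size"])
        for stage in original["plans_per_stage"]
    }
```

Run from the task's preprocessed folder, `compare_batch_sizes('nnUNetPlansv2.1_plans_3D.pkl', 'nnUNetPlansv2.1_bs3x_plans_3D.pkl')` should show the same batch size for every stage except the highest, whose value is three times larger.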

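Run the benchmarks; each command takes a few minutes. The epoch times are printed to the terminal and also written to the corresponding log file in RESULTS_FOLDER. Each trainer runs 5 epochs, and the fastest one is taken as the benchmark time.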

```bash
nnUNet_train 2d nnUNetTrainerV2_5epochs TASKID 0
nnUNet_train 3d_fullres nnUNetTrainerV2_5epochs TASKID 0
nnUNet_train 3d_fullres nnUNetTrainerV2_5epochs_dummyLoad TASKID 0
nnUNet_train 3d_fullres nnUNetTrainerV2_5epochs TASKID 0 -p nnUNetPlansv2.1_bs3x # optional, only for GPUs with more than 16GB of VRAM
```
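Picking the benchmark time out of a log can be automated. The sketch assumes each epoch is logged on a line containing `This epoch took <seconds> s`, which is how we recall nnU-Net v1 reporting it; check your own log and adjust `EPOCH_TIME_PATTERN` if the wording differs. The helper names are hypothetical.

```python
import re
from typing import List

# Assumed log line format; adjust if your nnU-Net version logs differently
EPOCH_TIME_PATTERN = re.compile(r"This epoch took ([0-9]+(?:\.[0-9]+)?) s")


def epoch_times_from_log(log_text: str) -> List[float]:
    """Extract all per-epoch durations (seconds) from a training log."""
    return [float(m) for m in EPOCH_TIME_PATTERN.findall(log_text)]


def benchmark_time(log_text: str) -> float:
    """The benchmark time is the fastest of the logged epochs."""
    times = epoch_times_from_log(log_text)
    if not times:
        raise ValueError("no epoch times found in log")
    return min(times)
```

For a log file found under RESULTS_FOLDER, `benchmark_time(Path(log_file).read_text())` (with `from pathlib import Path`) returns the number to compare against the table.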

Results

Epoch times measured by the nnU-Net authors, in seconds (OOM = out of GPU memory):

| Task and configuration | V100 32GB SXM3 (DGX2) 350W | V100 32GB SXM2 300W | V100 32GB PCIe 250W | Titan RTX 24GB 280W | RTX 2080 ti 11GB 250W | Titan Xp 12GB 250W |
|---|---|---|---|---|---|---|
| Task002_Heart 2d | 65.63 | 69.07 | 73.22 | 82.27 | 99.39 | 183.71 |
| Task003_Liver 2d | 71.80 | 73.44 | 78.63 | 86.11 | 103.89 | 187.30 |
| Task005_Prostate 2d | 69.68 | 70.07 | 76.85 | 88.04 | 106.97 | 187.38 |
| Task002_Heart 3d_fullres | 156.13 | 166.32 | 177.91 | 142.74 | 174.60 | 499.65 |
| Task003_Liver 3d_fullres | 137.08 | 144.83 | 157.05 | 114.78 | 146.90 | 500.74 |
| Task005_Prostate 3d_fullres | 119.82 | 126.20 | 135.72 | 106.01 | 135.08 | 463.21 |
| Task002_Heart 3d_fullres dummy | 153.41 | 160.44 | 172.28 | 136.90 | 163.52 | 497.51 |
| Task003_Liver 3d_fullres dummy | 135.63 | 139.76 | 147.33 | 110.61 | 146.37 | 495.55 |
| Task005_Prostate 3d_fullres dummy | 115.65 | 121.48 | 130.71 | 102.03 | 129.16 | 464.14 |
| Task002_Heart 3d_fullres large | 317.63 | 338.79 | 349.91 | 371.94 | OOM | OOM |
| Task003_Liver 3d_fullres large | 271.54 | 285.41 | 295.42 | 324.74 | OOM | OOM |
| Task005_Prostate 3d_fullres large | 280.30 | 296.37 | 304.16 | 289.22 | OOM | OOM |
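Because the dummyLoad trainer skips data loading and augmentation, the gap between a 3d_fullres row and its dummy counterpart roughly measures what data loading costs on that machine. A hypothetical helper for the relative overhead:

```python
def data_loading_overhead(real_epoch_s: float, dummy_epoch_s: float) -> float:
    """Relative extra time of a real epoch over a dummyLoad epoch (0.05 = 5%)."""
    if dummy_epoch_s <= 0:
        raise ValueError("dummy epoch time must be positive")
    return (real_epoch_s - dummy_epoch_s) / dummy_epoch_s


if __name__ == "__main__":
    # Task002_Heart 3d_fullres on an RTX 2080 ti, values from the table above
    print(f"{data_loading_overhead(174.60, 163.52):.1%}")
```

For Task002_Heart on the RTX 2080 ti this prints 6.8%. On your own machine, a much larger gap between the normal and dummyLoad runs points to a CPU or I/O bottleneck rather than the GPU.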

Troubleshooting

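Make sure cuDNN is installed correctly and supports mixed precision, and check that PyTorch is installed correctly. The command below should print 8002 or higher.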
```
python -c 'import torch;print(torch.backends.cudnn.version())'
```
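The same check can be run from a script and combined with a CUDA availability test. `cudnn_version_ok` is a hypothetical helper applying the 8002 threshold from the text; `torch.backends.cudnn.version()` and `torch.cuda.is_available()` are PyTorch functions. The script reports a missing PyTorch installation instead of failing.

```python
MIN_CUDNN_VERSION = 8002  # threshold given in the text above


def cudnn_version_ok(version) -> bool:
    """True if a cuDNN version number (as returned by PyTorch) is recent enough."""
    return version is not None and version >= MIN_CUDNN_VERSION


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        version = torch.backends.cudnn.version()  # None if cuDNN is unavailable
        print("CUDA available:", torch.cuda.is_available())
        status = "OK" if cudnn_version_ok(version) else "too old or missing"
        print("cuDNN version:", version, status)
```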

Identifying the bottleneck

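Use nvidia-smi to monitor the GPU; watch -n 0.1 refreshes its output every 0.1 s. GPU utilization that stays at 90% to 100% means the GPU is fully used, and the power draw should then sit close to its limit (e.g. 237 W / 250 W).
htop shows CPU usage. nnU-Net uses 12 processes for data augmentation plus 1 main process, so 13 processes should be running simultaneously.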

Run each of these in its own terminal:
1. htop
2. watch -n 0.1 nvidia-smi
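The numbers that `watch -n 0.1 nvidia-smi` displays can also be sampled programmatically with nvidia-smi's query mode. `--query-gpu` and `--format=csv,noheader,nounits` are regular nvidia-smi options; `sample_gpus` and `parse_gpu_line` are hypothetical helpers, and the sampling loop is only a sketch.

```python
import shutil
import subprocess
import time
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu,power.draw,power.limit",
         "--format=csv,noheader,nounits"]


def parse_gpu_line(line: str) -> Tuple[float, float, float]:
    """Parse '97, 237.12, 250.00' into (utilization %, power draw W, power limit W).
    Raises ValueError if a field is reported as [N/A]."""
    util, draw, limit = (float(x) for x in line.split(","))
    return util, draw, limit


def sample_gpus() -> List[Tuple[float, float, float]]:
    """One sample per GPU; requires nvidia-smi on PATH."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(l) for l in out.strip().splitlines() if l.strip()]


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    for _ in range(20):  # about 2 s at 0.1 s intervals
        print(sample_gpus())
        time.sleep(0.1)
```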

GPU bottleneck

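If nvidia-smi shows utilization steady at 90% to 100% and the power draw stays close to its limit, the GPU is fully utilized. This is the desired case: the GPU itself, not data loading, is the limiting factor.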
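This criterion can be applied to a series of samples, for instance collected with nvidia-smi's query mode. The 90% utilization cut-off comes from the text; using the median and requiring at least 90% of the power limit are assumptions, and `gpu_saturated` is a hypothetical name.

```python
from statistics import median
from typing import Sequence


def gpu_saturated(utilization: Sequence[float], power_draw: Sequence[float],
                  power_limit: float, min_util: float = 90.0,
                  min_power_fraction: float = 0.9) -> bool:
    """True if median utilization is >= min_util % and the median power draw
    is at least min_power_fraction of the power limit (assumed thresholds)."""
    return (median(utilization) >= min_util
            and median(power_draw) >= min_power_fraction * power_limit)
```

With the example figures above, 237 W out of 250 W is about 95% of the limit, so such a GPU would count as saturated.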

CPU bottleneck

htop shows around 10 or more processes associated with nnU-Net.
nvidia-smi shows GPU utilization dropping to 0 now and then.

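The more modalities the training data has, the more augmentation workers it needs. Most datasets need 12 or more; for datasets with 4 or more modalities, set the nnUNet_n_proc_DA environment variable to more than 12.
If your CPU has fewer than 12 threads, you may have to set nnUNet_n_proc_DA below 12.
Otherwise, upgrade the CPU.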
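nnUNet_n_proc_DA is read from the environment, so it can be set per run, e.g. `nnUNet_n_proc_DA=16 nnUNet_train ...` in a shell. The Python sketch below caps the worker count at the number of CPU threads minus one, leaving a thread for the main training process; this cap is our own heuristic, not an nnU-Net rule, and the helper names are hypothetical.

```python
import os
from typing import Dict, List, Optional


def choose_n_proc_da(desired: int = 12, cpu_threads: Optional[int] = None) -> int:
    """Desired number of augmentation workers, capped at CPU threads - 1
    so one thread stays free for the main process (heuristic)."""
    threads = cpu_threads if cpu_threads is not None else (os.cpu_count() or 1)
    return max(1, min(desired, threads - 1))


def training_env(n_proc_da: int) -> Dict[str, str]:
    """Copy of the current environment with nnUNet_n_proc_DA set."""
    env = dict(os.environ)
    env["nnUNet_n_proc_DA"] = str(n_proc_da)
    return env


if __name__ == "__main__":
    # TASKID is a placeholder, as in the benchmark commands
    cmd: List[str] = ["nnUNet_train", "3d_fullres", "nnUNetTrainerV2_5epochs", "TASKID", "0"]
    n = choose_n_proc_da()
    print(f"would run {' '.join(cmd)} with nnUNet_n_proc_DA={n}")
    # To actually launch: subprocess.run(cmd, env=training_env(n), check=True)
```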

I/O bottleneck

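An I/O bottleneck can be recognized by, among other things, the drive's activity LED:

nvidia-smi shows GPU utilization dropping to 0 now and then, as with a CPU bottleneck.
htop shows relatively few busy nnU-Net processes.
The drive's I/O LED flickers rapidly or stays lit constantly.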
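Both CPU and I/O starvation leave the GPU waiting for data, so utilization drops to 0 between iterations; what separates the two is whether the augmentation workers are busy. A rough classifier along those lines, where the cut-off of 10 busy processes and the function name are assumptions:

```python
from typing import Sequence


def diagnose(utilization: Sequence[float], busy_processes: int,
             expected_busy: int = 10) -> str:
    """Rough bottleneck classification.

    utilization: GPU utilization samples in %, e.g. from nvidia-smi every 0.1 s.
    busy_processes: number of nnU-Net processes htop shows under load.
    expected_busy: assumed cut-off between 'many' and 'few' busy processes.
    """
    gpu_idles = any(u == 0 for u in utilization)
    if not gpu_idles:
        return "GPU-bound (no data bottleneck)"
    if busy_processes >= expected_busy:
        return "CPU bottleneck"
    return "I/O bottleneck"
```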

Store the data on a directly attached (local) SSD.
A SATA SSD can only keep 1-2 GPUs fed. With more GPUs you may need an NVMe SSD connected via PCIe.
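Whether the storage can keep up can be estimated by reading the preprocessed files and measuring throughput. This is a rough sketch: `read_throughput_mb_s` is a hypothetical helper, and files already in the operating system's page cache report far higher speeds than the drive can deliver, so measure on data that has not been read recently.

```python
import time
from pathlib import Path
from typing import Iterable


def read_throughput_mb_s(paths: Iterable[Path],
                         chunk_size: int = 16 * 1024 * 1024) -> float:
    """Read the given files sequentially and return throughput in MB/s.
    Cached files inflate the result."""
    total_bytes = 0
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return total_bytes / 1e6 / elapsed if elapsed > 0 else float("inf")
```

For example, `read_throughput_mb_s(p for p in Path('/home/fabian/data/nnUNet_preprocessed/Task002_Heart').rglob('*') if p.is_file())` reads every file of that task once.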
