- SimpleITKIO: .nii.gz, .nrrd, .mha
- Tiff3DIO: .tif, .tiff. 3D TIF images! Since TIF does not have a standardized way of storing spacing information,
nnU-Net expects each TIF file to be accompanied by an identically named .json file that contains three numbers
(no units and no commas, just separated by whitespace), one for each dimension.
The file extension lists are not exhaustive and depend on what the backend supports. For example, nibabel and SimpleITK
support more formats than the three listed here. The file endings given here are just the ones we tested!
IMPORTANT: nnU-Net can only be used with file formats that use lossless (or no) compression! Because the file
format is defined for an entire dataset (and not separately for images and segmentations; this could be a todo for
the future), we must ensure that there are no compression artifacts that destroy the segmentation maps. So no .jpg and
the like!
Dataset folder structure
Datasets must be located in the nnUNet_raw folder (which you either define when installing nnU-Net or export/set every time you intend to run nnU-Net commands!).
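Assuming the folder is exported as an environment variable named nnUNet_raw (the export/set step mentioned above), your own conversion scripts can resolve it the same way. The helper below is a minimal sketch; the name get_nnunet_raw is hypothetical and not part of nnU-Net.

```python
import os
from pathlib import Path


def get_nnunet_raw() -> Path:
    """Return the nnUNet_raw folder taken from the environment (hypothetical helper)."""
    raw = os.environ.get("nnUNet_raw")
    if not raw:
        # Fail early with a hint instead of writing datasets to an unexpected location
        raise RuntimeError("nnUNet_raw is not set; export it before running nnU-Net commands")
    return Path(raw)
```

For example, get_nnunet_raw() / "Dataset005_Prostate" then points at the Prostate dataset folder.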
Each segmentation dataset is stored as a separate ‘Dataset’. Datasets are associated with a dataset ID, a three-digit
integer, and a dataset name (which you can freely choose). For example, Dataset005_Prostate has ‘Prostate’ as its dataset name and
5 as its dataset ID. Datasets are stored in the nnUNet_raw folder like this:
nnUNet_raw/
├── Dataset001_BrainTumour
├── Dataset002_Heart
├── Dataset003_Liver
├── Dataset004_Hippocampus
├── Dataset005_Prostate
├── ...
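The naming convention above (the literal prefix Dataset, a zero-padded three-digit ID, an underscore, then the free-form name) can be sketched as a pair of helpers. Both function names are hypothetical and only illustrate the pattern; they are not nnU-Net's own API.

```python
import re

# "Dataset" + zero-padded three-digit ID + "_" + free-form dataset name
_FOLDER_RE = re.compile(r"^Dataset(\d{3})_(.+)$")


def dataset_folder_name(dataset_id: int, name: str) -> str:
    """Build a folder name such as Dataset005_Prostate from an ID and a name."""
    if not 0 <= dataset_id <= 999:
        raise ValueError("dataset IDs are three-digit integers")
    return f"Dataset{dataset_id:03d}_{name}"


def parse_dataset_folder(folder: str) -> tuple[int, str]:
    """Split a folder name such as Dataset005_Prostate into (5, 'Prostate')."""
    match = _FOLDER_RE.match(folder)
    if match is None:
        raise ValueError(f"not a dataset folder name: {folder!r}")
    return int(match.group(1)), match.group(2)
```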
Within each dataset folder, the following structure is expected:
Dataset001_BrainTumour/
├── dataset.json
├── imagesTr
├── imagesTs # optional
└── labelsTr
When adding your custom dataset, take a look at the dataset_conversion folder and
pick an ID that is not already taken. IDs 001-010 are for the Medical Segmentation Decathlon.
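To avoid clashing with datasets you have already placed in your own nnUNet_raw folder, a small script can look for the lowest unused ID, starting at 11 so the Decathlon range 001-010 is skipped. This is a sketch under that assumption: it only sees local folders, so it complements rather than replaces the check against the dataset_conversion folder. The function name is hypothetical.

```python
import re
from pathlib import Path

# Matches folder names of the form DatasetXXX_<name> and captures the three-digit ID
_DATASET_ID_RE = re.compile(r"^Dataset(\d{3})_")


def next_free_dataset_id(raw_folder: Path, start: int = 11) -> int:
    """Smallest ID >= start that no DatasetXXX_* folder in raw_folder uses yet."""
    taken = set()
    for entry in raw_folder.iterdir():
        match = _DATASET_ID_RE.match(entry.name)
        if entry.is_dir() and match:
            taken.add(int(match.group(1)))
    for candidate in range(start, 1000):
        if candidate not in taken:
            return candidate
    raise RuntimeError("all three-digit dataset IDs are taken")
```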
- imagesTr contains the images belonging to the training cases. nnU-Net uses this data to perform pipeline configuration,
training with cross-validation, and to find the postprocessing and the best ensemble.
- imagesTs (optional) contains the images that belong to the test cases. nnU-Net does not use them! This could just
be a convenient location for you to store these images. It is a remnant of the Medical Segmentation Decathlon folder structure.
- labelsTr contains the images with the ground truth segmentation maps for the training cases.
- dataset.json contains metadata of the dataset.
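The required top-level layout listed above can be checked before running anything else. A minimal sketch, assuming only what is stated here: dataset.json, imagesTr and labelsTr must exist, while imagesTs is optional. The function name is hypothetical.

```python
from pathlib import Path


def check_dataset_layout(dataset_dir: Path) -> list[str]:
    """Return problems with the top-level layout of a raw dataset folder (empty list if OK)."""
    problems = []
    if not (dataset_dir / "dataset.json").is_file():
        problems.append("missing dataset.json")
    for required in ("imagesTr", "labelsTr"):
        if not (dataset_dir / required).is_dir():
            problems.append(f"missing folder {required}")
    # imagesTs is optional and not used by nnU-Net, so its absence is not reported
    return problems
```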
The scheme introduced above results in the following folder structure. The example shown is the first dataset of the MSD,
BrainTumour, which has four input channels: FLAIR (0000), T1w (0001), T1gd (0002) and T2w (0003). Note that the
imagesTs folder is optional and does not have to be present.
nnUNet_raw/Dataset001_BrainTumour/
├── dataset.json
├── imagesTr
│ ├── BRATS_001_0000.nii.gz
│ ├── BRATS_001_0001.nii.gz
│ ├── BRATS_001_0002.nii.gz
│ ├── BRATS_001_0003.nii.gz
│ ├── BRATS_002_0000.nii.gz
│ ├── BRATS_002_0001.nii.gz
│ ├── BRATS_002_0002.nii.gz
│ ├── BRATS_002_0003.nii.gz
│ ├── ...
├── imagesTs
│ ├── BRATS_485_0000.nii.gz
│ ├── BRATS_485_0001.nii.gz
│ ├── BRATS_485_0002.nii.gz
│ ├── BRATS_485_0003.nii.gz
│ ├── BRATS_486_0000.nii.gz
│ ├── BRATS_486_0001.nii.gz
│ ├── BRATS_486_0002.nii.gz
│ ├── BRATS_486_0003.nii.gz
│ ├── ...
└── labelsTr
├── BRATS_001.nii.gz
├── BRATS_002.nii.gz
├── ...
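The file names in the tree above follow a fixed pattern: each image file is the case identifier plus a four-digit channel index, while the label file carries only the case identifier. A short sketch of that pattern; both helper names are hypothetical.

```python
def image_file_names(case_id: str, num_channels: int, file_ending: str = ".nii.gz") -> list[str]:
    """Per-channel image file names for one case, e.g. BRATS_001_0000.nii.gz."""
    return [f"{case_id}_{channel:04d}{file_ending}" for channel in range(num_channels)]


def label_file_name(case_id: str, file_ending: str = ".nii.gz") -> str:
    """Segmentation file name for one case; it has no channel suffix."""
    return f"{case_id}{file_ending}"
```

For BrainTumour, image_file_names("BRATS_001", 4) yields the four files BRATS_001_0000.nii.gz to BRATS_001_0003.nii.gz, and label_file_name("BRATS_001") yields BRATS_001.nii.gz.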
Here is another example of the second dataset of the MSD, which has only one input channel:
nnUNet_raw/Dataset002_Heart/
├── dataset.json
├── imagesTr
│ ├── la_003_0000.nii.gz
│ ├── la_004_0000.nii.gz
│ ├── ...
├── imagesTs
│ ├── la_001_0000.nii.gz
│ ├── la_002_0000.nii.gz
│ ├── ...
└── labelsTr
├── la_003.nii.gz
├── la_004.nii.gz
├── ...
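A common mistake is a training case that has a label but is missing one of its channel images, or an image without a label. The sketch below cross-checks imagesTr against labelsTr using the naming pattern shown in both examples (for Heart, num_channels=1). It is a hypothetical helper and assumes case identifiers are the label file names without the file ending.

```python
from pathlib import Path


def check_training_cases(dataset_dir: Path, num_channels: int, file_ending: str = ".nii.gz") -> list[str]:
    """Report labels lacking channel images and images lacking a label."""
    images_dir = dataset_dir / "imagesTr"
    labels_dir = dataset_dir / "labelsTr"
    # Case identifiers are the label file names without the file ending
    label_cases = sorted(p.name[: -len(file_ending)] for p in labels_dir.glob(f"*{file_ending}"))
    problems = []
    for case_id in label_cases:
        for channel in range(num_channels):
            expected = f"{case_id}_{channel:04d}{file_ending}"
            if not (images_dir / expected).is_file():
                problems.append(f"{case_id}: missing {expected}")
    # An image's case identifier is everything before its trailing _XXXX channel suffix
    known = set(label_cases)
    for image in sorted(images_dir.glob(f"*{file_ending}")):
        case_id = image.name[: -len(file_ending)].rsplit("_", 1)[0]
        if case_id not in known:
            problems.append(f"{image.name}: no matching label")
    return problems
```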
Remember: For each training case, all images must have the same geometry to ensure that their pixel arrays are aligned. Also
make sure that all your data is co-registered!
See also the section on dataset format inference.
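Identical geometry (size, spacing, origin and direction) across the channels of a case can be checked mechanically; whether the data is truly co-registered cannot, so this check is necessary but not sufficient. A sketch, assuming SimpleITK as the reader (ReadImage, GetSize, GetSpacing, GetOrigin and GetDirection are standard SimpleITK calls); the function names and the tolerance of 1e-5 are hypothetical choices, and the reader is injectable so the comparison logic works without SimpleITK.

```python
from typing import Callable, Sequence, Tuple

Geometry = Tuple[tuple, tuple, tuple, tuple]  # size, spacing, origin, direction


def sitk_geometry(path: str) -> Geometry:
    """Read size, spacing, origin and direction of an image with SimpleITK."""
    import SimpleITK as sitk  # imported lazily so the comparison below has no hard dependency

    image = sitk.ReadImage(path)
    return image.GetSize(), image.GetSpacing(), image.GetOrigin(), image.GetDirection()


def same_geometry(
    paths: Sequence[str],
    read_geometry: Callable[[str], Geometry] = sitk_geometry,
    tol: float = 1e-5,
) -> bool:
    """True if all images share size exactly and spacing/origin/direction within tol."""
    reference = read_geometry(paths[0])
    for path in paths[1:]:
        other = read_geometry(path)
        if reference[0] != other[0]:  # voxel grid sizes must match exactly
            return False
        for ref_values, other_values in zip(reference[1:], other[1:]):
            if any(abs(a - b) > tol for a, b in zip(ref_values, other_values)):
                return False
    return True
```

Usage: same_geometry(["BRATS_001_0000.nii.gz", "BRATS_001_0001.nii.gz", "BRATS_001_0002.nii.gz", "BRATS_001_0003.nii.gz"]) checks one BrainTumour case, assuming those files are in the working directory.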
dataset.json
The dataset.json contains metadata that nnU-Net needs for training. We have greatly reduced the number of required
fields since version 1!
Here is what the dataset.json should look like, using Dataset005_Prostate from the MSD as an example:
{
"channel_names": { # formerly modalities
"0": "T2",
"1": "ADC"
},
"labels": { # THIS IS DIFFERENT NOW!
"background": 0,
"PZ": 1,
"TZ": 2
},
"numTraining": 32,
"file_ending": ".nii.gz",
"overwrite_image_reader_writer": "SimpleITKIO" # optional! If not provided nnU-Net will automatically determine the ReaderWriter
}
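The # annotations in the example above are explanations only; standard JSON parsers such as Python's json module reject comments. Writing the file programmatically avoids that and other syntax slips. A minimal sketch that emits the required fields shown in the example (the optional overwrite_image_reader_writer is left out); the function name is hypothetical.

```python
import json
from pathlib import Path


def write_dataset_json(
    dataset_dir: Path,
    channel_names: dict,
    labels: dict,
    num_training: int,
    file_ending: str,
) -> Path:
    """Write a comment-free dataset.json with the fields from the Prostate example."""
    content = {
        # JSON object keys are strings, so channel indices are stored as "0", "1", ...
        "channel_names": {str(index): name for index, name in channel_names.items()},
        "labels": labels,
        "numTraining": num_training,
        "file_ending": file_ending,
    }
    path = dataset_dir / "dataset.json"
    path.write_text(json.dumps(content, indent=4))
    return path
```

For the Prostate example: write_dataset_json(dataset_dir, {0: "T2", 1: "ADC"}, {"background": 0, "PZ": 1, "TZ": 2}, 32, ".nii.gz").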
The channel_names determine the normalization used by nnU-Net. If a channel is marked as ‘CT’, then a global
normalization based on the intensities in the foreground pixels will be used. If it is something else, per-channel
z-scoring will be used. Refer to the methods section in our paper
for more details. nnU-Net