An In-Depth Reading of the nnU-Net Paper: Original Text + Notes

While learning about the Medical Segmentation Decathlon challenge recently, I read this paper and put together detailed notes to help others understand it and to make it easy to revisit later. I will also keep recording some of my own work here.

Paper: https://arxiv.org/pdf/1809.10486

Code: GitHub - MIC-DKFZ/nnUNet


Title: nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation


Authors:

Fabian Isensee, Jens Petersen, Andre Klein, David Zimmerer, Paul F. Jaeger,
Simon Kohl, Jakob Wasserthal, Gregor Köhler, Tobias Norajitra, Sebastian
Wirkert, and Klaus H. Maier-Hein
Division of Medical Image Computing, German Cancer Research Center (DKFZ),
Heidelberg, Germany


Abstract: The U-Net was presented in 2015. With its straight-forward and successful architecture it quickly evolved to a commonly used benchmark in medical image segmentation. The adaptation of the U-Net to novel problems, however, comprises several degrees of freedom regarding the exact architecture, pre-processing, training and inference. These choices are not independent of each other and substantially impact the overall performance. The present paper introduces the nnU-Net ("no-new-Net"), which refers to a robust and self-adapting framework on the basis of 2D and 3D vanilla U-Nets. We argue the strong case for taking away superfluous bells and whistles of many proposed network designs and instead focus on the remaining aspects that make out the performance and generalizability of a method. We evaluate the nnU-Net in the context of the Medical Segmentation Decathlon challenge, which measures segmentation performance in ten disciplines comprising distinct entities, image modalities, image geometries and dataset sizes, with no manual adjustments between datasets allowed. At the time of manuscript submission, nnU-Net achieves the highest mean dice scores across all classes and seven phase 1 tasks (except class 1 in BrainTumour) in the online leaderboard of the challenge.

Notes: This paper introduces nnU-Net, a robust, self-adapting framework based on 2D and 3D U-Nets that deliberately simplifies network design and concentrates on the factors that actually drive performance and generalizability. nnU-Net showed excellent results in the Medical Segmentation Decathlon, achieving the highest scores on multiple tasks and classes and validating its adaptability and performance.


Keywords: Semantic Segmentation, Medical Imaging, U-Net

Notes: Semantic segmentation vs. instance segmentation:

  • Semantic segmentation: assigns every pixel (or sub-pixel) in an image or video a semantic class label, yielding a dense understanding of the image.
  • Instance segmentation: distinguishes not only different classes but also different instances of the same class, assigning each instance a unique label.

For example, in a photo containing several people, semantic segmentation labels all of them as "person", whereas instance segmentation assigns each person a unique label, telling them apart.


1 Introduction

①:Medical Image Segmentation is currently dominated by deep convolutional neural networks (CNNs). However, each segmentation benchmark seems to require specialized architectures and training scheme modifications to achieve competitive performance [1,2,3,4,5]. This results in huge amounts of publications in the field that, alongside often limited validation on only few or even just a single dataset, make it increasingly difficult for researchers to identify methods that live up to their promised superiority beyond the limited scenarios they are demonstrated on. The Medical Segmentation Decathlon is intended to specifically address this issue: participants in this challenge are asked to create a segmentation algorithm that generalizes across 10 datasets corresponding to different entities of the human body. These algorithms may dynamically adapt to the specifics of a particular dataset, but are only allowed to do so in a fully automatic manner. The challenge is split into two successive phases: 1) a development phase in which participants are given access to 7 datasets to optimize their approach on and, using their final and thus frozen method, must submit segmentations for the corresponding 7 held-out test sets. 2) a second phase to evaluate the same exact method on 3 previously undisclosed datasets.

Notes: The Medical Segmentation Decathlon provides a platform for developing automated segmentation algorithms that generalize across multiple datasets.

②:We hypothesize that some of the architectural modifications presented recently are in part overfitted to specific problems or could suffer from imperfect validation that results from sub-optimal reimplementations of the state-of-the-art. Using the U-Net as a benchmark on an in-house dataset, for example, requires the adaptation of the method to the novel problem. This spans several degrees of freedom. Even though the architecture itself is quite straight-forward, and even though the method is quite commonly used as a benchmark, we believe that the remaining interdependent choices regarding the exact architecture, preprocessing, training, inference and post-processing quite often cause the U-Net to underperform when used as a benchmark. Additionally, architectural tweaks that are intended to improve the performance of a network can rather easily be demonstrated to work if the network is not yet fully optimized for the task at hand, allowing for plenty of headroom for the tweak to improve results. In our own preliminary experiments, these tweaks however were unable to improve segmentation results in fully optimized networks and thus most likely unable to advance the state of the art. This leads us to believe that the influence of non-architectural aspects in segmentation methods is much more impactful, but at the same time also severely underestimated.

Notes: Although the U-Net is commonly used as a benchmark, its performance depends on many interdependent choices around architecture, preprocessing, and training. The authors observe that architectural tweaks easily show gains on networks that are not yet fully optimized, but may bring nothing on fully optimized ones; they therefore argue that the influence of non-architectural factors is severely underestimated.

①:In this paper, we present the nnU-Net ("no-new-Net") framework. It resides on a set of three comparatively simple U-Net models that contain only minor modifications to the original U-Net [6]. We omit recently proposed extensions such as for example the use of residual connections [7,8], dense connections [5] or attention mechanisms [4]. The nnU-Net automatically adapts its architectures to the given image geometry. More importantly though, the nnU-Net framework thoroughly defines all the other steps around them. These are steps where much of the nets' performance can be gained or respectively lost: preprocessing (e.g. resampling and normalization), training (e.g. loss, optimizer setting and data augmentation), inference (e.g. patch-based strategy and ensembling across test-time augmentations and models) and a potential post-processing (e.g. enforcing single connected components if applicable).

Notes: The nnU-Net framework builds on simplified U-Net models and omits complex extensions. Beyond adapting the architecture automatically, it exhaustively defines the surrounding steps of preprocessing, training, inference, and postprocessing, and stresses how much these steps matter for performance.


2 Methods

2.1 Network architectures

①:Medical images commonly encompass a third dimension, which is why we consider a pool of basic U-Net architectures consisting of a 2D U-Net, a 3D U-Net and a U-Net Cascade. While the 2D and 3D U-Nets generate segmentations at full resolution, the cascade first generates low resolution segmentations and subsequently refines them. Our architectural modifications as compared to the U-Net's original formulation are close to negligible and instead we focus our efforts on designing an automatic training pipeline for these models.

        The U-Net [6] is a successful encoder-decoder network that has received a lot of attention in the recent years. Its encoder part works similarly to a traditional classification CNN in that it successively aggregates semantic information at the expense of reduced spatial information. Since in segmentation, both semantic as well as spatial information are crucial for the success of a network, the missing spatial information must somehow be recovered. The U-Net does this through the decoder, which receives semantic information from the bottom of the ’U’ and recombines it with higher resolution feature maps obtained directly from the encoder through skip connections. Unlike other segmentation networks, such as FCN [9] and previous iterations of DeepLab [10] this allows the U-Net to segment fine structures particularly well.

        Just like the original U-Net, we use two plain convolutional layers between poolings in the encoder and transposed convolution operations in the decoder. We deviate from the original architecture in that we replace ReLU activation functions with leaky ReLUs (neg. slope 1e−2) and use instance normalization [11] instead of the more popular batch normalization [12].

Notes: Compared with the original architecture, the modifications to the U-Net are minor, chiefly the leaky ReLU activation and instance normalization, together with a fully automated training pipeline.

a. Batch normalization (BN) vs. instance normalization (IN):

  • Batch Normalization (BN): normalizes with statistics computed over the whole batch during training; common in image classification and other tasks, it accelerates convergence and stabilizes training.
  • Instance Normalization (IN): normalizes each sample on its own; typically used in style transfer and generative models, it emphasizes local features within a single image.

b. ReLU vs. leaky ReLU activations (a minimal sketch of the resulting conv block follows this list):

  • ReLU: f(x) = max(0, x); cheap to compute, but its zero output on negative inputs can cause "dead neurons".
  • Leaky ReLU: f(x) = max(αx, x), which permits a small negative output; α is a small constant, 1e−2 in the paper. This avoids ReLU's constant-zero negative region and reduces dead neurons.
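Below is a minimal sketch, assuming PyTorch, of the modified convolution block described above: two plain convolutions, each followed by instance normalization and a leaky ReLU with negative slope 1e−2, replacing the usual BN/ReLU combination. `ConvBlock3d` is our own illustrative name, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class ConvBlock3d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch),                        # instead of BatchNorm
            nn.LeakyReLU(negative_slope=1e-2, inplace=True),  # instead of ReLU
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.LeakyReLU(negative_slope=1e-2, inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

x = torch.randn(2, 1, 32, 32, 32)    # (batch, channels, D, H, W)
print(ConvBlock3d(1, 30)(x).shape)   # torch.Size([2, 30, 32, 32, 32])
```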

② 2D U-Net: Intuitively, using a 2D U-Net in the context of 3D medical image segmentation appears to be suboptimal because valuable information along the z-axis cannot be aggregated and taken into consideration. However, there is evidence [13] that conventional 3D segmentation methods deteriorate in performance if the dataset is anisotropic (cf. Prostate dataset of the Decathlon challenge).

Notes: A 2D U-Net cannot exploit all the information in a 3D volume, yet it performs comparatively well on anisotropic datasets, where conventional 3D segmentation methods may degrade. The "valuable information along the z-axis" refers to spatial context across slices.

③ 3D U-Net: A 3D U-Net seems like the appropriate method of choice for 3D image data. In an ideal world, we would train such an architecture on the entire patient’s image. In reality however, we are limited by the amount of available GPU memory which allows us to train this architecture only on image patches. While this is not a problem for datasets comprised of smaller images (in terms of number of voxels per patient) such as the Brain Tumour, Hippocampus and Prostate datasets of this challenge, patch-based training, as dictated by datasets with large images such as Liver, may impede training. This is due to the limited field of view of the architecture which thus cannot collect sufficient contextual information to e.g. correctly distinguish parts of a liver from parts of other organs.

Notes: A 3D U-Net works well on 3D data, but GPU memory limits force patch-based training, which is challenging for datasets with large images (e.g. Liver): the limited field of view cannot capture enough context, for instance to tell parts of the liver apart from parts of other organs.

④ U-Net Cascade: To address this practical shortcoming of a 3D U-Net on datasets with large image sizes, we additionally propose a cascaded model. Therefore, a 3D U-Net is first trained on downsampled images (stage 1). The segmentation results of this U-Net are then upsampled to the original voxel spacing and passed as additional (one hot encoded) input channels to a second 3D U-Net, which is trained on patches at full resolution (stage 2). See Figure 1.

Notes: Cascade model: first train a U-Net on downsampled images, then pass its segmentations to a second U-Net that refines them at full resolution.

Fig.1. U-Net Cascade (on applicable datasets only). Stage 1 (left): a 3D U-Net processes downsampled data, the resulting segmentation maps are upsampled to the original resolution. Stage 2 (right): these segmentations are concatenated as one-hot encodings to the full resolution data and refined by a second 3D U-Net.

Dynamic adaptation of network topologies: Due to the large differences in image size (median shape 482 × 512 × 512 for Liver vs. 36 × 50 × 35 for Hippocampus) the input patch size and number of pooling operations per axis (and thus implicitly the number of convolutional layers) must be automatically adapted for each dataset to allow for adequate aggregation of spatial information. Apart from adapting to the image geometries, there are technical constraints like the available memory to account for. Our guiding principle in this respect is to dynamically trade off the batch-size versus the network capacity, presented in detail below:

        We start out with network configurations that we know to be working with our hardware setup. For the 2D U-Net this configuration is an input patch size of 256×256, a batch size of 42 and 30 feature maps in the highest layers (number of feature maps doubles with each downsampling). We automatically adapt these parameters to the median plane size of each dataset (where we use the plane with the lowest in-plane spacing, corresponding to the highest resolution), so that the network effectively trains on entire slices. We configure the networks to pool along each axis until the feature map size for that axis is smaller than 8 (but not more than a maximum of 6 pooling operations). Just like the 2D U-Net, our 3D U-Net uses 30 feature maps at the highest resolution layers. Here we start with a base configuration of input patch size 128 × 128 × 128, and a batch size of 2. Due to memory constraints, we do not increase the input patch volume beyond 128³ voxels, but instead match the aspect ratio of the input patch size to that of the median size of the dataset in voxels. If the median shape of the dataset is smaller than 128³ then we use the median shape as input patch size and increase the batch size (so that the total number of voxels processed is the same as with 128 × 128 × 128 and a batch size of 2). Just like for the 2D U-Net we pool (for a maximum of 5 times) along each axis until the feature maps have size 8.

        For any network we limit the total number of voxels processed per optimizer step (defined as the input patch volume times the batch size) to a maximum of 5% of the dataset. For cases in excess, we reduce the batch size (with a lower-bound of 2).

        All network topologies generated for the phase 1 datasets are presented in Table 1.

Notes: dynamic adaptation of the network topology (a rough sketch of these heuristics follows this list):

  • The topology is adapted dynamically to cope with the large size differences between datasets.
  • The starting point is a configuration known to fit the hardware, which is then adjusted to each dataset's median size.
  • The 2D and 3D U-Nets use different input patch sizes, but both are adapted dynamically to the dataset's median shape.
  • Batch size is traded off against input patch volume so that the voxels processed per optimizer step never exceed 5% of the dataset.
  • Memory constraints can force the batch size down, keeping the network trainable.
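The following is a rough sketch, not the actual nnU-Net code, of the two heuristics summarized above: pooling along each axis until the feature-map edge reaches 8 (with an upper bound on pooling operations), and shrinking the batch size so that one optimizer step never covers more than 5% of the dataset's voxels (lower bound 2). The function names and the simple decrement loop are our own.

```python
import math

def num_poolings_per_axis(patch_size, max_pool=5):
    # Pool along each axis while the feature-map edge stays >= 8, with at most
    # `max_pool` pooling operations (6 for the 2D U-Net, 5 for the 3D U-Net).
    pools = []
    for edge in patch_size:
        n = 0
        while edge / 2 >= 8 and n < max_pool:
            edge /= 2
            n += 1
        pools.append(n)
    return pools

def capped_batch_size(batch_size, patch_size, dataset_voxels):
    # Limit voxels per optimizer step (patch volume * batch size) to 5% of the
    # dataset, with a lower bound of 2 on the batch size.
    patch_voxels = math.prod(patch_size)
    while batch_size * patch_voxels > 0.05 * dataset_voxels and batch_size > 2:
        batch_size -= 1
    return batch_size

print(num_poolings_per_axis((256, 256), max_pool=6))  # [5, 5] -> 8x8 bottleneck
print(num_poolings_per_axis((128, 128, 128)))         # [4, 4, 4]
print(capped_batch_size(42, (256, 256), 10_000_000))  # batch size shrinks to fit
```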

Table 1. Network topologies as automatically generated for the seven phase 1 tasks of the Medical Segmentation Decathlon challenge. 3D U-Net lowres refers to the first stage of the U-Net Cascade. The configuration of the second stage of the U-Net Cascade is identical to the 3D U-Net.

2.2 Preprocessing

①:The preprocessing is part of the fully automated segmentation pipeline that our method consists of and, as such, the steps presented below are carried out without any user intervention.

② Cropping: All data is cropped to the region of nonzero values. This has no effect on most datasets such as liver CT, but will reduce the size (and therefore the computational burden) of skull stripped brain MRI.

③ Resampling: CNNs do not natively understand voxel spacings. In medical images, it is common for different scanners or different acquisition protocols to result in datasets with heterogeneous voxel spacings. To enable our networks to properly learn spatial semantics, all patients are resampled to the median voxel spacing of their respective dataset, where third order spline interpolation is used for image data and nearest neighbor interpolation for the corresponding segmentation mask.

        Necessity for the U-Net Cascade is determined by the following heuristics: If the median shape of the resampled data has more than 4 times the voxels that can be processed as input patch by the 3D U-Net (with a batch size of 2), it qualifies for the U-Net Cascade and this dataset is additionally resampled to a lower resolution. This is done by increasing the voxel spacing (decrease resolution) by a factor of 2 until the above mentioned criterion is met. If the dataset is anisotropic, the higher resolution axes are first downsampled until they match the low resolution axis/axes and only then all axes are downsampled simultaneously. The following datasets of phase 1 fall within the set of described heuristics and hence trigger usage of the U-Net Cascade: Heart, Liver, Lung, and Pancreas.

Notes: Because voxel spacings vary across medical images, every patient is resampled to the median voxel spacing of its dataset so that the network can learn spatial semantics correctly. For datasets with very large volumes, the U-Net Cascade further reduces the resolution to fit the network's processing limits; Heart, Liver, Lung, and Pancreas qualify for the cascade (a short sketch of this trigger follows).
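A hedged sketch of the cascade trigger and low-resolution resampling described above: the 4× rule and the "downsample the finer axes first" behavior follow the text, while the function names and the step-by-step spacing update are our own simplification.

```python
import numpy as np

def needs_cascade(median_shape, patch_size=(128, 128, 128)):
    # 4x rule: more voxels in the median shape than in 4 input patches -> cascade.
    return np.prod(median_shape) > 4 * np.prod(patch_size)

def next_lowres_spacing(spacing):
    # One downsampling step: double the spacing of the higher-resolution axes
    # first (capped at the coarsest axis); once isotropic, double all axes.
    spacing = np.asarray(spacing, dtype=float)
    if spacing.max() > spacing.min():
        fine = spacing < spacing.max()
        spacing[fine] = np.minimum(spacing[fine] * 2, spacing.max())
        return spacing
    return spacing * 2

print(needs_cascade((482, 512, 512)))        # True  -> Liver triggers the cascade
print(needs_cascade((36, 50, 35)))           # False -> Hippocampus does not
print(next_lowres_spacing((3.0, 0.8, 0.8)))  # [3.0, 1.6, 1.6]
```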

a. Nearest-neighbor vs. third-order spline interpolation (a small illustration follows this list):

  • Nearest-neighbor interpolation: f(x) = f(x_nearest), where x_nearest is the known data point closest to the target point x and f(x_nearest) is its value. It simply copies the value of the nearest known point: fast, simple, and intuitive, but prone to blocky artifacts and limited accuracy.
  • Third-order (cubic) spline interpolation: a piecewise polynomial method. Each pair of adjacent data points is connected by a cubic polynomial whose coefficients are chosen so that the curve is continuous and smooth at every knot. The basic idea is to interpolate each interval with a cubic S_i(x) = a_i(x − x_i)³ + b_i(x − x_i)² + c_i(x − x_i) + d_i, where a_i, b_i, c_i, d_i are the coefficients to determine and S_i(x) is defined on [x_i, x_{i+1}]; the pieces must satisfy continuity conditions.
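A small illustration, assuming SciPy, of the two interpolation orders mentioned above: third-order spline (order=3) for image intensities and nearest neighbor (order=0) for the label mask, so that no new label values are invented.

```python
import numpy as np
from scipy.ndimage import zoom

image = np.random.rand(36, 50, 35).astype(np.float32)
mask = np.random.randint(0, 3, size=(36, 50, 35))

zoom_factors = (2.0, 1.0, 1.5)                 # e.g. old_spacing / new_spacing
image_rs = zoom(image, zoom_factors, order=3)  # cubic spline for the image
mask_rs = zoom(mask, zoom_factors, order=0)    # nearest neighbor for labels

print(image_rs.shape, mask_rs.shape, np.unique(mask_rs))
```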

④ Normalization: Because the intensity scale of CT scans is absolute, all CT images are automatically normalized based on statistics of the entire respective dataset: If the modality description in a dataset's corresponding json descriptor file indicates 'ct', all intensity values occurring within the segmentation masks of the training dataset are collected and the entire dataset is normalized by clipping to the [0.5, 99.5] percentiles of these intensity values, followed by a z-score normalization based on the mean and standard deviation of all collected intensity values. For MRI or other image modalities (i.e. if no 'ct' string is found in the modality), simple z-score normalization is applied to the patient individually.

        If cropping reduces the average size of patients in a dataset (in voxels) by 1/4 or more the normalization is carried out only within the mask of nonzero elements and all values outside the mask are set to 0.

Notes: CT images are clipped and z-score normalized with intensity statistics collected over the whole dataset, while other modalities (e.g. MRI) receive a per-patient z-score. If cropping shrinks the average patient size substantially, normalization is restricted to the nonzero mask.

a. z-score normalization: transforms feature values to a standard normal distribution (mean 0, standard deviation 1), putting different features on a common scale, which is especially useful when features have different units or magnitudes: z = (x − μ)/σ, where z is the normalized value, x the original value, μ the feature's mean, and σ its standard deviation. A minimal sketch of the whole normalization rule follows.
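A minimal sketch of the normalization rule described above, assuming NumPy; `fg_values` stands for the intensity values collected within the training segmentation masks, and the function name is ours.

```python
import numpy as np

def normalize(image, modality, fg_values=None):
    # CT: clip to the [0.5, 99.5] percentiles of the intensities collected
    # within the training masks, then z-score with their mean/std.
    if "ct" in modality.lower():
        lo, hi = np.percentile(fg_values, [0.5, 99.5])
        image = np.clip(image, lo, hi)
        return (image - fg_values.mean()) / fg_values.std()
    # MRI and everything else: per-patient z-score.
    return (image - image.mean()) / image.std()

ct = np.random.randn(64, 64, 64) * 300
fg = np.random.randn(100_000) * 150 + 40   # stand-in for mask intensities
print(normalize(ct, "CT", fg).shape)
```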

2.3 Training Procedure

①:All models are trained from scratch and evaluated using five-fold cross-validation on the training set. We train our networks with a combination of dice and cross-entropy loss:

L_{total} = L_{dice} + L_{CE}

For 3D U-Nets operating on nearly entire patients (first stage of the U-Net Cascade and 3D U-Net if no cascade is necessary) we compute the dice loss for each sample in the batch and average over the batch. For all other networks we interpret the samples in the batch as a pseudo-volume and compute the dice loss over all voxels in the batch.

The dice loss formulation used here is a multi-class adaptation of the variant proposed in [14]. Based on past experience [13,1] we favor this formulation over other variants [8,15]. The dice loss is implemented as follows:

L_{dice} = -\frac{2}{|K|} \sum_{k \in K} \frac{\sum_{i \in I} u_i^k v_i^k}{\sum_{i \in I} u_i^k + \sum_{i \in I} v_i^k}

where u is the softmax output of the network and v is a one hot encoding of the ground truth segmentation map. Both u and v have shape I × K with i ∈ I being the number of pixels in the training patch/batch and k ∈ K being the classes.
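A sketch of this multi-class soft dice loss, assuming PyTorch: `u` is the softmax output and `v` the one-hot ground truth, both flattened to shape (I, K) as in the formula; the small epsilon is our own addition for numerical stability.

```python
import torch

def soft_dice_loss(u: torch.Tensor, v: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # u, v: (I, K) = (voxels, classes); softmax output and one-hot ground truth.
    intersect = (u * v).sum(dim=0)                 # sum over voxels i, per class k
    denominator = u.sum(dim=0) + v.sum(dim=0)
    dice_per_class = 2.0 * intersect / (denominator + eps)
    return -dice_per_class.mean()                  # -1/|K| * sum over classes k

u = torch.softmax(torch.randn(1000, 3), dim=1)    # 1000 voxels, 3 classes
v = torch.nn.functional.one_hot(torch.randint(0, 3, (1000,)), 3).float()
print(soft_dice_loss(u, v))                       # roughly -0.33 for random input
```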

We use the Adam optimizer with an initial learning rate of 3 × 10⁻⁴ for all experiments. We define an epoch as the iteration over 250 training batches. During training, we keep an exponential moving average of the validation (l_MA^v) and training (l_MA^t) losses. Whenever l_MA^t did not improve by at least 5 × 10⁻³ within the last 30 epochs, the learning rate was reduced by factor 5. The training was terminated automatically if l_MA^v did not improve by more than 5 × 10⁻³ within the last 60 epochs, but not before the learning rate was smaller than 10⁻⁶.
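A hedged sketch of the training control logic just described: exponential moving averages of the training and validation losses drive both the learning-rate reduction (factor 5 after 30 stalled epochs) and the stopping rule (60 stalled validation epochs, but only once the learning rate is below 10⁻⁶). The EMA coefficient `alpha` is our assumption; the paper does not state it.

```python
class TrainingController:
    def __init__(self, lr=3e-4, alpha=0.9, eps=5e-3):
        self.lr, self.alpha, self.eps = lr, alpha, eps
        self.t_ma = self.v_ma = None
        self.best_t = self.best_v = float("inf")
        self.t_stall = self.v_stall = 0

    def _ema(self, ma, loss):
        return loss if ma is None else self.alpha * ma + (1 - self.alpha) * loss

    def end_epoch(self, train_loss, val_loss):
        self.t_ma = self._ema(self.t_ma, train_loss)
        self.v_ma = self._ema(self.v_ma, val_loss)
        # "Improvement" means beating the best EMA so far by at least eps.
        self.t_stall = 0 if self.t_ma < self.best_t - self.eps else self.t_stall + 1
        self.v_stall = 0 if self.v_ma < self.best_v - self.eps else self.v_stall + 1
        self.best_t = min(self.best_t, self.t_ma)
        self.best_v = min(self.best_v, self.v_ma)
        if self.t_stall >= 30:           # training EMA stalled for 30 epochs
            self.lr /= 5
            self.t_stall = 0
        # Stop only when validation stalled for 60 epochs AND lr < 1e-6.
        return self.v_stall >= 60 and self.lr < 1e-6
```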

Notes:

  • A combination of dice loss and cross-entropy loss is used for training.
  • For 3D U-Nets operating on nearly whole patients, the dice loss is computed per sample and averaged over the batch.
  • For all other networks, the samples in a batch are treated as a pseudo-volume and the dice loss is computed over all voxels in the batch.
  • A multi-class dice loss formulation is used and preferred over other variants.
  • The Adam optimizer is used with an initial learning rate of 3 × 10⁻⁴.
  • The learning rate is adjusted via exponential moving averages (EMA) of the training and validation losses, as sketched above.

② Data Augmentation: When training large neural networks from limited training data, special care has to be taken to prevent overfitting. We address this problem by utilizing a large variety of data augmentation techniques. The following augmentation techniques were applied on the fly during training: random rotations, random scaling, random elastic deformations, gamma correction augmentation and mirroring. Data augmentation was done with our own in-house framework which is publicly available at github.com/MIC-DKFZ/batchgenerators.

        We define sets of data augmentation parameters for the 2D and 3D U-Net separately. These parameters are not modified between datasets.

        Applying three dimensional data augmentation may be suboptimal if the maximum edge length of the input patch size of a 3D U-Net is more than two times as large as the shortest. For datasets where this criterion applies we use our 2D data augmentation instead and apply it slice-wise for each sample.

        The second stage of the U-Net Cascade receives the segmentations of the previous step as additional input channels. To prevent strong co-adaptation we apply random morphological operators (erode, dilate, open, close) and randomly remove connected components of these segmentations.

Patch Sampling: To increase the stability of our network training we enforce that more than a third of the samples in a batch contain at least one randomly chosen foreground class.
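A simple sketch of this sampling rule: force more than a third of each batch to contain at least one voxel of a randomly chosen foreground class. `sample_patch` is a hypothetical helper; the actual patch extraction is elided.

```python
import random

def build_batch(case_list, batch_size, num_classes, sample_patch):
    # Force more than a third of the batch to contain a randomly chosen
    # foreground class; the rest is sampled without constraints.
    batch, n_forced = [], batch_size // 3 + 1
    for j in range(batch_size):
        case = random.choice(case_list)
        if j < n_forced:
            fg = random.randint(1, num_classes - 1)   # class 0 = background
            batch.append(sample_patch(case, force_class=fg))
        else:
            batch.append(sample_patch(case, force_class=None))
    return batch
```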

Notes:

  • To prevent overfitting on limited training data, a wide variety of augmentations (rotations, scaling, elastic deformations, etc.) are applied on the fly.

  • Separate augmentation parameter sets are defined for the 2D and 3D U-Nets and are not changed between datasets.

  • If the longest edge of the 3D U-Net input patch exceeds twice the shortest, 3D augmentation may be suboptimal; 2D augmentation is then applied slice by slice instead.

  • In the second stage of the U-Net Cascade, the model receives the previous stage's segmentations as input; random morphological operations and component removal guard against strong co-adaptation.

  • Instead of feeding whole images to the network, smaller regions called patches are sampled. Patch sampling strategy: for training stability, more than a third of the samples in each batch contain a foreground class (see the sketch above).

2.4 Inference

        Due to the patch-based nature of our training, all inference is done patch-based as well. Since network accuracy decreases towards the border of patches, we weigh voxels close to the center higher than those close to the border, when aggregating predictions across patches. Patches are chosen to overlap by patch size / 2 and we further make use of test time data augmentation by mirroring all patches along all valid axes.

        Combining the tiled prediction and test time data augmentation result in segmentations where the decision for each voxel is obtained by aggregating up to 64 predictions (in the center of a patient using 3D U-Net). For the test cases we use the five networks obtained from our training set cross-validation as an ensemble to further increase the robustness of our models.

Notes:

  • Inference, like training, is patch-based. To counter the accuracy drop near patch borders (which lack context), voxels near the center receive higher weights when predictions are aggregated.
  • Patches overlap by half the patch size, and test-time augmentation mirrors every patch along all valid axes.
  • Combining tiled prediction with test-time augmentation, a voxel at the center of a patient can aggregate up to 64 predictions with the 3D U-Net (a sketch follows this list).
  • For robustness, the five networks from the training-set cross-validation are used as an ensemble at test time.
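A sketch of the tiled inference described above, using NumPy only: patches slide with a half-patch stride, each prediction is weighted by a center-heavy importance map (the paper only says center-weighted; the Gaussian shape is our assumption), and a mirrored copy illustrates test-time augmentation along one axis. `predict_fn` is a hypothetical model wrapper, and for brevity the sketch assumes half-patch strides tile the volume exactly.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_weight_map(patch_size):
    # Center-heavy importance map (Gaussian shape is our assumption).
    w = np.zeros(patch_size)
    w[tuple(s // 2 for s in patch_size)] = 1.0
    return gaussian_filter(w, sigma=[s / 8 for s in patch_size])

def tiled_predict(volume, patch_size, predict_fn):
    acc, norm = np.zeros(volume.shape), np.zeros(volume.shape)
    w = center_weight_map(patch_size)
    steps = [range(0, v - p + 1, p // 2) for v, p in zip(volume.shape, patch_size)]
    for z in steps[0]:
        for y in steps[1]:
            for x in steps[2]:
                sl = (slice(z, z + patch_size[0]),
                      slice(y, y + patch_size[1]),
                      slice(x, x + patch_size[2]))
                patch = volume[sl]
                # Mirror TTA shown along one axis; nnU-Net mirrors all valid axes.
                pred = (predict_fn(patch) + predict_fn(patch[::-1])[::-1]) / 2
                acc[sl] += pred * w      # center voxels count more
                norm[sl] += w
    return acc / np.maximum(norm, 1e-8)
```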

2.5 Postprocessing

A connected component analysis of all ground truth segmentation labels is performed on the training data. If a class lies within a single connected component in all cases, this behaviour is interpreted as a general property of the dataset. Hence, all but the largest connected component for this class are automatically removed on predicted images of the corresponding dataset.

Notes: Connected component analysis helps characterize the anatomical shape and distribution of each class.

  • Connected component analysis on the training data identifies whether a class always appears as a single connected region.
  • Automatic removal of non-largest components: all connected components except the largest are removed for such classes (a sketch follows this list).
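A direct sketch of this postprocessing step, assuming SciPy: keep only the largest connected component of a given class in the predicted label map.

```python
import numpy as np
from scipy.ndimage import label

def keep_largest_component(segmentation, class_id):
    mask = segmentation == class_id
    labeled, n = label(mask)
    if n <= 1:
        return segmentation                     # nothing to remove
    sizes = np.bincount(labeled.ravel())[1:]    # component sizes, skip background
    largest = np.argmax(sizes) + 1
    out = segmentation.copy()
    out[mask & (labeled != largest)] = 0        # drop all smaller components
    return out
```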

2.6 Ensembling and Submission

To further increase the segmentation performance and robustness all possible combinations of two out of three of our models are ensembled for each dataset. For the final submission, the model (or ensemble) that achieves the highest mean foreground dice score on the training set cross-validation is automatically chosen.
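A compact, hypothetical sketch of this step: pairwise ensembles are formed by averaging softmax outputs (our assumption for how two models are combined), and the candidate (single model or pair) with the highest mean foreground dice in cross-validation is chosen. The dice numbers below are made up for illustration.

```python
import numpy as np

def ensemble_probs(prob_a: np.ndarray, prob_b: np.ndarray) -> np.ndarray:
    # Voxel-wise averaging of the two models' class probabilities.
    return (prob_a + prob_b) / 2

def pick_submission(cv_dice: dict) -> str:
    # Candidate with the highest mean foreground dice on the CV folds wins.
    return max(cv_dice, key=cv_dice.get)

candidates = {
    "2D": 0.71, "3D": 0.79, "Cascade": 0.81,                 # single models
    "2D+3D": 0.78, "2D+Cascade": 0.80, "3D+Cascade": 0.82,   # pairwise ensembles
}
print(pick_submission(candidates))  # -> "3D+Cascade"
```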


3 Experiments and Results

        We optimize our network topologies using five-fold cross-validations on the phase 1 datasets. Our phase 1 cross-validation results as well as the corresponding submitted test set results are summarized in Table 2. A "-" indicates that the U-Net Cascade was not applicable (i.e. necessary, according to our criteria) to a dataset because it was already fully covered by the input patch size of the 3D U-Net. The model that was used for the final submission is highlighted in bold. Although several test set submissions were allowed by the platform, we believe it to be bad practice to do so. Hence we only submitted once and report the results of this single submission.

        As can be seen in Table 2 our phase 1 cross-validation results are robustly recovered on the held-out test set indicating a desired absence of over-fitting. The only dataset that suffers from a dip in performance on all of its foreground classes is BrainTumour. The data of this phase 1 dataset stems from the BRATS challenge [16] for which such performance drops between validation and testing are a common sight and attributed to a large shift in the respective data and/or ground-truth distributions.

Notes:

  • The cross-validation results are robustly recovered on the held-out test set, indicating no overfitting.
  • The only dataset with a performance drop across all foreground classes is BrainTumour, attributed to a shift in the data and/or ground-truth distributions.

Table 2. Mean dice scores for the proposed models in all phase 1 tasks. All experiments were run as five-fold cross-validation. The models that we used for generating our test set submission are highlighted in bold. The dice scores of the test sets are shown at the bottom of the table. Test dice scores in bold denote that at the time of manuscript submission these scores were the highest in the online leaderboard of the challenge (decathlon.grand-challenge.org/evaluation/results).


4 Discussion

        In this paper we present the nnU-Net segmentation framework for the medical domain that directly builds around the original U-Net architecture [6] and dynamically adapts itself to the specifics of any given dataset. Based on our hypothesis that non-architectural modifications can be much more powerful than some of the recently presented architectural modifications, the essence of this framework is a thorough design of adaptive preprocessing, training scheme and inference. All design choices required to adapt to a new segmentation task are done in a fully automatic manner with no manual interaction. For each task the nnU-Net automatically runs a five-fold cross-validation for three different automatically configured U-Net models and the model (or ensemble) with the highest mean foreground dice score is chosen for final submission. In the context of the Medical Segmentation Decathlon we demonstrate that the nnU-Net performs competitively on the held-out test sets of 7 highly distinct medical datasets, achieving the highest mean dice scores for all classes of all tasks (except class 1 in the BrainTumour dataset) on the online leaderboard at the time of manuscript submission. We acknowledge that training three models and picking the best one for each dataset independently is not the cleanest solution. Given a larger time-scale, one could investigate proper heuristics to identify the best model for a given dataset prior to training. Our current tendency favors the U-Net Cascade (or the 3D U-Net if the cascade cannot be applied) with the sole (close) exceptions being the Prostate and Liver tasks. Additionally, the added benefit of many of our design choices, such as the use of Leaky ReLUs instead of regular ReLUs and the parameters of our data augmentation were not properly validated in the context of this challenge. Future work will therefore focus on systematically evaluating all design choices via ablation studies.

Notes:

  • The nnU-Net framework:

    • An automated segmentation framework built around the original U-Net architecture that adapts dynamically to different datasets.
    • Its core is adaptive preprocessing, training scheme, and inference, fully automatic with no manual interaction.
  • Model selection:

    • For each task, nnU-Net automatically runs five-fold cross-validation for three differently configured U-Net models and picks the model (or ensemble) with the highest mean foreground dice score for the final submission.
  • Medical Segmentation Decathlon results:

    • nnU-Net performed strongly on 7 highly distinct medical datasets, achieving the highest mean dice scores for all classes of all tasks on the online leaderboard except class 1 of BrainTumour.
  • Design choices and future work:

    • The authors concede that training three models and picking the best is not the cleanest solution; future work may develop heuristics to identify the best model before training.
    • Ablation studies are planned to systematically evaluate all design choices, such as the leaky ReLU and the data augmentation parameters.
