Reference
Learn2Reg: Comprehensive Multi-Task Medical Image Registration Challenge, Dataset and Evaluation in the Era of Deep Learning
Abstract
Medical image registration plays a very important role in improving clinical workflows, computer-assisted interventions and diagnosis, as well as in research studies involving e.g. morphological analysis. Besides ongoing research into new concepts for optimisation, similarity metrics, domain adaptation and deformation models, deep learning for medical registration is currently starting to show promising advances that could improve the robustness, generalisation, computation speed and accuracy of conventional algorithms to enable better practical translation. Nevertheless, before Learn2Reg there was no commonly used benchmark dataset to compare state-of-the-art learning-based registration methods with one another and with their conventional (not trained) counterparts. With few exceptions (CuRIOUS at MICCAI 2018/2019, the Continuous Registration Challenge at WBIR 2018 and Learn2Reg 2020) there has also been no comprehensive registration challenge covering different anatomical structures and evaluation metrics. We also believe that the entry barrier for new teams to contribute to this emerging field is higher than e.g. for segmentation, where standardised datasets (e.g. Medical Decathlon, BraTS) are easily available. In contrast, many registration tasks require resampling from different voxel spacings and affine pre-registration, and can lead to ambiguous and error-prone evaluation of whole deformation fields.

We propose a simplified challenge design that removes many of the common pitfalls for learning and applying transformations. We will provide pre-processed data (resampled, cropped, pre-aligned, etc.) that can be directly employed by most conventional and learning frameworks. Participants will only have to provide docker containers that generate displacement fields in voxel dimensions in a standard orientation; python code to test their application (on local machines) to training data will be provided as open source along with all evaluation metrics. Our challenge consists of three clinically relevant sub-tasks (datasets) that are complementary in nature. They can be addressed by participants either individually or comprehensively, and cover both intra- and inter-patient alignment; CT, ultrasound and MRI modalities; neuro, thorax and abdominal anatomies; and four of the imminent challenges of medical image registration:
- learning from small datasets
- estimating large deformations
- dealing with multi-modal scans
- learning from noisy annotations
Note: the yellow-highlighted parts can be quoted directly when writing papers; the parts in bold summarise the significance of Learn2Reg 2021, the datasets it provides, and the challenges it poses.
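The abstract specifies that submissions produce displacement fields in voxel dimensions on a standard-orientation grid. As a minimal sketch of what applying such a field entails — assuming a field `disp` of shape (3, D, H, W) in voxel units, and not representing the challenge's official tooling — the warp reduces to a single resampling call:

```python
# Minimal sketch (not official Learn2Reg code): warp a moving image with a
# displacement field given in voxel dimensions on the fixed-image grid.
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(moving: np.ndarray, disp: np.ndarray, order: int = 1) -> np.ndarray:
    """Resample `moving` at positions x + disp(x), with x on the fixed grid."""
    grid = np.indices(moving.shape, dtype=np.float32)  # identity grid, shape (3, D, H, W)
    return map_coordinates(moving, grid + disp, order=order, mode="nearest")
```

For label maps, `order=0` (nearest-neighbour interpolation) keeps the warped segmentation integer-valued.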
Task 01: Abdominal MR-CT
Abstract
The human abdomen is an important and complex body space. It is bounded superiorly by the diaphragm and inferiorly by the pelvis, supported by the spine and protected by the muscular abdominal wall. The abdomen contains organs responsible for blood storage, detoxification, urinary excretion, endocrine function and digestion, and includes many vital arteries and veins.
Computed tomography (CT) scans and magnetic resonance images (MRI) are commonly used for diagnosis, prognosis or intervention planning in abdominal disease; yet few specific image registration tools for the abdomen have been developed, and nearly no algorithm is capable of dealing with multimodal alignment.
On abdominal CT and MRI, inter-subject variability (e.g., age, gender, stature, normal anatomical variants, and disease status) can be observed in terms of the size, shape, and appearance of each organ. Soft anatomy deformation further complicates the registration by varying the inter-organ relationships, even within individuals (e.g., due to pose, respiratory cycle, edema, and digestive status).
Within the Learn2Reg challenge, this task requires aligning several disjunct regions with large inter-subject variations and great variability in volume: from small organs of a few hundred voxels to very large ones.
Domain adaptation can play a decisive role when large labelled datasets are only available for one modality. To explore the challenges of this multimodal transfer learning, we will only provide CT labels for training, but evaluate both inter-modal (CT-MR) and intra-modal registration at test time.
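To make the CT-only supervision concrete, here is a hypothetical sketch (in PyTorch; all names are illustrative and nothing here is prescribed by the challenge) of a weakly supervised registration loss: an image-similarity term on every pair, a Dice term applied only where segmentation labels exist (in this task, CT), and a smoothness penalty on the displacement field.

```python
# Hypothetical weakly supervised registration loss; the label term is simply
# skipped for pairs without annotations (e.g. MR in this task).
import torch
import torch.nn.functional as F

def registration_loss(warped, fixed, disp, warped_seg=None, fixed_seg=None,
                      w_dice=1.0, w_smooth=0.1):
    """warped/fixed: (N, 1, D, H, W) images; disp: (N, 3, D, H, W) displacements;
    *_seg: optional soft/one-hot organ masks of the warped and fixed images."""
    loss = F.mse_loss(warped, fixed)                      # stand-in for NCC, MIND, etc.
    if warped_seg is not None and fixed_seg is not None:  # labels available (CT only)
        inter = (warped_seg * fixed_seg).sum()
        loss = loss + w_dice * (1 - 2 * inter / (warped_seg.sum() + fixed_seg.sum() + 1e-6))
    # First-order diffusion regulariser: squared finite differences of disp
    smooth = sum((disp.diff(dim=k) ** 2).mean() for k in (2, 3, 4))
    return loss + w_smooth * smooth
```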
Keywords
intra-patient, CT, MRI, registration, multimodal
Challenges
- Multimodal registration.
- Learning from few/noisy annotations.
- Learning with domain gaps.
Cohorts
- Target cohorts: the subjects/objects for whom the final application will provide useful data.
On the one hand, patients undergoing image-guided surgical interventions, biopsies or radiotherapy could benefit from a deformable anatomical organ atlas that captures spatial relations of organs-at-risk and target regions. On the other hand, shape analysis over large cohorts could provide insight into epidemiological differences in relation to common diseases.
- Challenge cohort: the subjects/objects from which the data were acquired.
The patients come from a colorectal cancer chemotherapy trial: the baseline sessions of the abdominal CT scans were randomly selected from metastatic liver cancer patients, while the remaining scans were acquired from a retrospective post-operative cohort with suspected ventral hernias. An additional hidden dataset consists of whole-body MRI from a general population study.
Imaging modality(ies)
Magnetic Resonance Imaging (MRI) & Computed Tomography (CT)
fixed images: MR; moving images: CT
Context information
The task primarily considers paired CT and MRI scans of the abdomen from the same patients. Two additional unpaired datasets, CT (Task 3 of L2R'20) and MRI (CHAOS MR), are included to aid training.
122 CT/MR scans (16 CT-MR scan pairs (8 Training, 8 Test) + 90 unpaired CT (50)/MR (40) scans)
- For CT: an additional subset includes 13 abdominal organs considered as regions of interest (ROI): spleen, right kidney, left kidney, gallbladder, esophagus, liver, stomach, aorta, inferior vena cava, portal and splenic veins, pancreas, left adrenal gland and right adrenal gland. Not all ROIs have manual segmentations.
- For MRI: manual segmentations are available only for a smaller number of organs and are provided partially with the training data. We encourage approaches that learn by multimodal domain adaptation.
Algorithm target
I.e., the structures/subjects/objects/components the algorithm is designed to focus on (e.g., brain tumours, tips of medical devices, nurses in the operating room, catheters in fluoroscopy scans).
The algorithm should focus on the alignment of abdominal organs within and across modalities in a heterogeneous patient cohort. The focus will be on the alignment of particular regions of interest (ROI) with manual segmentation labels, i.e. spleen, right kidney, left kidney and liver; on the alignment of smaller organs such as gallbladder, esophagus, stomach, inferior vena cava, portal and splenic veins, and pancreas; and on producing plausible, spatially smooth deformations (low standard deviation of the Jacobian determinant).
Training and test case characteristics
A case refers to a pair of CT/MR scans from the same patient (intra-patient registration); different cases come from different patients. Manual segmentation labels are provided together with all CT scans (training and test). All paired CT/MRI scans will also have manual segmentation labels (test labels hidden).
- State the total number of training, validation and test cases.
Training: 20-25 paired MR/CT cases (40-50 scans) + 30 additional CT scans and ~30 additional MRI scans.
Test: 10 paired MR/CT cases (20 scans).
- Mention further important characteristics of the training, validation and test cases.
In this intra-subject registration task, a number of paired MR/CT scans can be directly employed for supervised training. With the provision of unpaired CT and MRI scans, cross-domain learning can be an integral part of this challenge task.
Annotation characteristics
MRI and paired MRI/CT: manual 3D voxel segmentations of at least four abdominal organs: liver (1), spleen (2), right kidney (3), left kidney (4). Annotations were created by experienced graduate students with more than three years of medical imaging experience, using ITK-SNAP¹.
Additional CT: 13 abdominal organs were considered regions of interest (ROI), including spleen, right kidney, left kidney, gallbladder, esophagus, liver, stomach, aorta, inferior vena cava, portal and splenic veins, pancreas, left adrenal gland and right adrenal gland. As suggested by a radiologist, the heart was excluded because it does not appear in its entirety in the dataset; the adrenal glands were included instead for their clinical relevance.
Data pre-processing method(s)
Common pre-processing to the same voxel resolution (2 mm voxel spacing) and spatial dimensions (192 × 160 × 192), as well as affine pre-registration, will be provided to ease the use of learning-based algorithms for participants with little prior experience in image registration.
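As an illustration of what such common pre-processing involves — a sketch using SimpleITK with linear interpolation, assuming (x, y, z) size ordering, and not the organisers' actual pipeline — a scan can be brought onto the 2 mm, 192 × 160 × 192 grid like this:

```python
# Illustrative sketch (not the organisers' pipeline): resample a scan onto the
# task's common grid of 2 mm voxel spacing and 192 x 160 x 192 voxels.
import SimpleITK as sitk

def resample_to_common_grid(image: sitk.Image, default_value: float = -1024.0) -> sitk.Image:
    resampler = sitk.ResampleImageFilter()
    resampler.SetOutputSpacing((2.0, 2.0, 2.0))
    resampler.SetSize((192, 160, 192))
    resampler.SetOutputOrigin(image.GetOrigin())       # keep the patient position
    resampler.SetOutputDirection(image.GetDirection())
    resampler.SetInterpolator(sitk.sitkLinear)         # use sitkNearestNeighbor for labels
    resampler.SetDefaultPixelValue(default_value)      # e.g. air (in HU) for CT
    return resampler.Execute(image)
```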
Sources of error
CT: the mean overall DSC score between annotators (i.e. the inter-rater error) is 0.87 ± 0.13 (0.95 ± 0.04 when only spleen, kidneys and liver are considered). MRI: to be determined.
Metrics
- DSC (Dice similarity coefficient) of segmentations
- HD95 (95th percentile of the Hausdorff distance) of segmentations
- Robustness: 30% lowest DSC of all cases
- SD (standard deviation) of the log Jacobian determinant
- Run-time (computation time)
DSC or TRE measures accuracy; HD95 measures reliability; a robustness score (the 30% of cases with the lowest mean DSC, or the highest mean TRE) penalises outliers; the smoothness of the deformation field (standard deviation of the log Jacobian determinant) is important in registration; and run-time is relevant for clinical application.
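A sketch of these statistics — my reading of the metric definitions above, assuming integer label volumes and voxel-unit displacement fields of shape (3, D, H, W) on the isotropic 2 mm grid; this is not the official evaluation script:

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dsc(seg_a, seg_b, label):
    a, b = seg_a == label, seg_b == label
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else np.nan

def hd95(seg_a, seg_b, label, spacing_mm=2.0):
    a, b = seg_a == label, seg_b == label
    surf_a = a & ~binary_erosion(a)                   # boundary voxels of each mask
    surf_b = b & ~binary_erosion(b)
    # Symmetric surface-to-surface distances, converted to millimetres
    d = np.concatenate([distance_transform_edt(~surf_b)[surf_a],
                        distance_transform_edt(~surf_a)[surf_b]]) * spacing_mm
    return float(np.percentile(d, 95))

def robustness(case_dscs):
    k = max(1, int(0.3 * len(case_dscs)))             # the 30% worst cases
    return float(np.sort(np.asarray(case_dscs))[:k].mean())

def sd_log_jacobian(disp, eps=1e-6):
    # grads[i, j] = d(disp_i)/d(x_j); the Jacobian of phi(x) = x + disp(x) is grads + I
    grads = np.stack([np.stack(np.gradient(disp[i]), 0) for i in range(3)], 0)
    jac = grads + np.eye(3).reshape(3, 3, 1, 1, 1)
    det = np.linalg.det(np.moveaxis(jac, (0, 1), (-2, -1)))
    return float(np.log(np.clip(det, eps, None)).std())  # clipping guards folded voxels
```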
We regard inverse consistency as an additional metric; its value is debated in medical image registration. We decided not to use it as a competition (ranking) metric but to compute it for informational purposes (i.e. to address the question of whether inverse-consistent algorithms are more robust).
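For reference, one common way to quantify inverse consistency — a sketch under the same voxel-unit field assumptions as above, not a definition endorsed by the challenge — is the mean residual left by composing the forward and backward fields:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def inverse_consistency_error(disp_ab, disp_ba):
    """Mean norm of disp_ab(x) + disp_ba(x + disp_ab(x)); zero for a perfectly
    inverse-consistent pair. Both fields have shape (3, D, H, W), voxel units."""
    grid = np.indices(disp_ab.shape[1:], dtype=np.float32)
    coords = grid + disp_ab                        # x + disp_ab(x)
    ba_warped = np.stack([map_coordinates(disp_ba[i], coords, order=1, mode="nearest")
                          for i in range(3)], 0)
    return float(np.linalg.norm(disp_ab + ba_warped, axis=0).mean())
```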
Baselines
We will provide several baseline algorithms against which to compare your method, including:
- PDD-Net (MICCAI '19) unsupervised training
- VoxelMorph (CVPR'18) with and without label supervision
- Deeds
- Elastix, NiftyReg, and/or ANTs (where applicable)
Paul A. Yushkevich, Joseph Piven, Heather Cody Hazlett, Rachel Gimpel Smith, Sean Ho, James C. Gee, and Guido Gerig. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. Neuroimage. 2006 Jul 1; 31(3):1116-28. ↩︎