Title
Universal and extensible language-vision models for organ segmentation and tumor detection from abdominal computed tomography
01
Literature Digest Introduction
Computed tomography (CT) is a widely used and powerful tool for disease diagnosis and treatment planning (Mattikalli et al., 2022; Zhou et al., 2022; Qu et al., 2023; Chen et al., 2024a). In routine clinical workflows, radiologists must analyze hundreds of 2D slices in a single CT volume to locate and interpret diagnostic information, a process that is both tedious and prone to misdiagnosis (Zhou, 2021). Medical image segmentation offers a promising solution: by automatically identifying organs, delineating their boundaries, and highlighting abnormalities, it can improve both the efficiency and the quality of diagnosis (Liu et al., 2021; Hu et al., 2023; Chen et al., 2023a, 2024b; Lai et al., 2024).
Progress in medical image segmentation has relied heavily on specialized datasets. These include organ/tumor-specific datasets such as LiTS (Liver Tumor Segmentation) (Bilic et al., 2019), KiTS (Kidney Tumor Segmentation) (Heller et al., 2019), and MSD (Medical Segmentation Decathlon) (Simpson et al., 2019), as well as abdominal multi-organ annotation datasets such as BTCV (Beyond The Cranial Vault) (Landman et al., 2015), AMOS (Abdominal Multi-Organ Segmentation) (Ji et al., 2022), and AbdomenAtlas (Li et al., 2024). In addition, Wasserthal et al. (2023) proposed an approach that provides a whole-body anatomical view, aiming to capture the comprehensive anatomy of the human body rather than only specific body regions.
Given the high complexity of human anatomy and increasingly fine-grained clinical requirements, we can anticipate the emergence of new organ/tumor annotations (Jaus et al., 2023), such as annotations of the appendix and splenic tumors. There is also growing demand for more detailed anatomical annotations, such as distinguishing the right and left lobes of the liver (Germain et al., 2014). Although liver annotations are common in current datasets, liver-lobe delineation remains understudied (Bilic et al., 2023), which may lead to future overlap between lobe annotations and whole-liver annotations. To meet these emerging needs, recent efforts involve re-annotating existing datasets with humans in the loop and retraining models accordingly (Qu et al., 2023; Wasserthal et al., 2023; Jaus et al., 2023). However, this approach incurs substantial annotation costs, especially for 3D medical imaging, and retraining models from scratch also demands considerable computational resources (Zhang et al., 2023b, 2024). It is therefore important to explore a new framework that can efficiently accommodate new organ/tumor annotations while alleviating the computational burden of retraining.
Abstract
The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and learning scheme. To overcome these limitations, we propose a universal, extensible framework enabling a single model, termed Universal Model, to deal with multiple public datasets and adapt to new classes (e.g., organs/tumors). Firstly, we introduce a novel language-driven parameter generator that leverages language embeddings from large language models, enriching semantic encoding compared with one-hot encoding. Secondly, the conventional output layers are replaced with lightweight, class-specific heads, allowing Universal Model to simultaneously segment 25 organs and six types of tumors and ease the addition of new classes. We train our Universal Model on 3410 CT volumes assembled from 14 publicly available datasets and then test it on 6173 CT volumes from four external datasets. Universal Model achieves first place on six CT tasks in the Medical Segmentation Decathlon (MSD) public leaderboard and leading performance on the Beyond The Cranial Vault (BTCV) dataset. In summary, Universal Model exhibits remarkable computational efficiency (6× faster than other dataset-specific models), demonstrates strong generalization across different hospitals, transfers well to numerous downstream tasks, and more importantly, facilitates the extensibility to new classes while alleviating the catastrophic forgetting of previously learned classes. Codes,
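To make the two design choices named in the abstract more concrete, below is a minimal PyTorch sketch, not the authors' released code, of a language-driven parameter generator that maps a frozen text embedding of each class name, together with a pooled image feature, to the weights of a lightweight class-specific segmentation head. All layer sizes, tensor shapes, and the module name LanguageDrivenHeads are illustrative assumptions; only the overall idea (text embedding → generated per-class 1×1×1 head, independent sigmoid per class) follows the abstract.

```python
# Minimal sketch (illustrative assumptions, not the paper's implementation):
# a parameter generator turns a frozen language embedding of each class into
# the weights of a lightweight, class-specific 1x1x1 segmentation head.

import torch
import torch.nn as nn


class LanguageDrivenHeads(nn.Module):
    """Generates per-class conv heads from language embeddings."""

    def __init__(self, text_dim=512, image_dim=256, feat_dim=48):
        super().__init__()
        self.feat_dim = feat_dim
        # One weight vector + one bias per class head.
        self.param_generator = nn.Sequential(
            nn.Linear(text_dim + image_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, feat_dim + 1),
        )

    def forward(self, decoder_feat, global_feat, class_embeddings):
        # decoder_feat:     (B, feat_dim, D, H, W) from the segmentation decoder
        # global_feat:      (B, image_dim) pooled image feature
        # class_embeddings: (K, text_dim) frozen text embeddings, one per class
        B = decoder_feat.shape[0]
        K = class_embeddings.shape[0]
        masks = []
        for k in range(K):
            text_k = class_embeddings[k].expand(B, -1)            # (B, text_dim)
            params = self.param_generator(
                torch.cat([text_k, global_feat], dim=1))          # (B, feat_dim+1)
            weight = params[:, : self.feat_dim]                   # (B, feat_dim)
            bias = params[:, self.feat_dim:]                      # (B, 1)
            # Apply the generated head as a per-sample 1x1x1 convolution.
            logits = torch.einsum("bc,bcdhw->bdhw", weight, decoder_feat)
            logits = logits + bias.view(B, 1, 1, 1)
            masks.append(logits.unsqueeze(1))
        # Independent sigmoids: classes may overlap and datasets may be
        # partially labeled, so there is no softmax across classes.
        return torch.sigmoid(torch.cat(masks, dim=1))             # (B, K, D, H, W)
```

Because each class owns an independently generated head with its own sigmoid output, adding a new organ or tumor amounts to appending one text embedding and generating one more head, rather than re-encoding all classes in a fixed one-hot output layer.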