VISION-MAE: A Foundation Model for Medical Image Segmentation and Classification
Abstract
Artificial intelligence (AI) has the potential to revolutionize diagnosis and segmentation in medical imaging. However, its development and clinical implementation face multiple challenges, including limited data availability, lack of generalizability, and the need to incorporate multimodal data effectively. A foundation model, a large-scale pre-trained AI model, offers a versatile base that can be adapted to a variety of specific tasks and contexts. Here, we present VISION-MAE, a novel foundation model designed specifically for medical imaging. VISION-MAE is trained on a dataset of 2.5 million unlabeled images spanning multiple modalities (CT, MR, PET, X-ray, and ultrasound) using self-supervised learning, and is then adapted to classification and segmentation tasks using explicit labels. VISION-MAE has high label efficiency, outperforming several benchmark models in both in-domain and out-of-domain applications and achieving high performance even with reduced availability of labeled data. This model represents a significant advancement in medical imaging AI, offering a generalizable and robust solution that improves segmentation and classification while reducing the data annotation workload.
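As a rough illustration of the masked-autoencoder (MAE) pretraining strategy the model's name implies, the sketch below masks a large fraction of image patches, encodes only the visible ones, and reconstructs the pixels of the masked ones. This is a minimal toy version under stated assumptions: the TinyMAE module, its dimensions, the single-channel input, and the 75% mask ratio are illustrative choices, not the authors' actual architecture or training code.

```python
# Minimal sketch of MAE-style self-supervised pretraining (illustrative only;
# not the VISION-MAE implementation). Assumes single-channel images, e.g.
# unlabeled CT/MR slices.
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Toy masked autoencoder: mask patches, encode the visible ones,
    reconstruct pixel values of the masked ones."""
    def __init__(self, img_size=224, patch=16, dim=256, mask_ratio=0.75):
        super().__init__()
        self.patch = patch
        self.mask_ratio = mask_ratio
        self.n_patches = (img_size // patch) ** 2
        self.embed = nn.Linear(patch * patch, dim)        # grayscale patches -> tokens
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=4)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
        self.head = nn.Linear(dim, patch * patch)         # reconstruct raw pixels

    def forward(self, x):
        B = x.size(0)
        # (B, 1, H, W) -> (B, N, patch*patch) non-overlapping patches
        patches = x.unfold(2, self.patch, self.patch).unfold(3, self.patch, self.patch)
        patches = patches.reshape(B, -1, self.patch * self.patch)
        tokens = self.embed(patches) + self.pos
        # Random masking: keep only a (1 - mask_ratio) subset of patches.
        n_keep = int(self.n_patches * (1 - self.mask_ratio))
        perm = torch.rand(B, self.n_patches, device=x.device).argsort(dim=1)
        keep = perm[:, :n_keep]
        idx = keep.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        visible = torch.gather(tokens, 1, idx)
        latent = self.encoder(visible)                    # encode visible patches only
        # Scatter encoded tokens back; fill masked slots with the mask token.
        full = self.mask_token.expand(B, self.n_patches, -1).clone()
        full.scatter_(1, idx, latent)
        recon = self.head(self.decoder(full + self.pos))
        # Reconstruction loss is computed on masked patches only, as in MAE.
        mask = torch.ones(B, self.n_patches, device=x.device)
        mask.scatter_(1, keep, 0.0)
        loss = ((recon - patches) ** 2).mean(-1)
        return (loss * mask).sum() / mask.sum()

model = TinyMAE()
imgs = torch.randn(2, 1, 224, 224)   # stand-in for unlabeled medical images
loss = model(imgs)
loss.backward()
```

After pretraining in this fashion, the encoder weights would be retained and a task-specific head (classifier or segmentation decoder) trained on the labeled data, which is what makes the approach label-efficient: the expensive representation learning happens without annotations.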
Introduction
Recent advances in the field of artificial intelligence (AI) have led to the development of foundation models: machine learning models trained on large-scale, diverse datasets that can be adapted to a wide range of downstream tasks. Whereas traditional deep learning models are trained for specific applications (such as classification of interstitial lung disease or COVID-19) and perform poorly when repurposed for other tasks, foundation models offer more general and adaptable capabilities.