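TransUNet: Rethinking the U-Net Architecture Design for Medical Image Segmentation Through the Lens of Transformers | Literature Express: Applications of Transformer Architectures in Medical Image Analysis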

Title

TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers

01

Introduction

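Convolutional neural networks (CNNs), particularly fully convolutional networks (FCNs) (Long et al., 2015), have gained significant attention in medical image segmentation. Among their various iterations, the U-Net model (Ronneberger et al., 2015), with its symmetric encoder-decoder design and skip connections that enhance detail preservation, has become the go-to choice for many researchers. Building on this approach, remarkable progress has been made across a range of medical imaging tasks, including cardiac segmentation in magnetic resonance imaging (MRI) (Yu et al., 2017), organ delineation in computed tomography (CT) (Zhou et al., 2017; Li et al., 2018b; Yu et al., 2018; Luo et al., 2021), and polyp segmentation in colonoscopy (Zhou et al., 2019).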

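Although CNNs have unmatched representational power, they often fall short in modeling long-range relationships because convolution is a local operation. This limitation is especially apparent when texture, shape, and size vary widely across patients. Recognizing this shortcoming, the research community has increasingly turned to Transformers, models built entirely on attention mechanisms, for their innate strength in capturing global context (Vaswani et al., 2017). However, Transformers process the input as a 1D sequence and prioritize global context modeling, which tends to yield low-resolution features. A more promising hybrid approach is therefore to combine a CNN with a Transformer encoder.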
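To make the "1D sequence" point concrete, here is a minimal sketch of the hybrid idea: a CNN feature map is flattened into one token per spatial position, and self-attention then lets every token see every other token. This is an illustration under simplifying assumptions, not the TransUNet implementation: the function names are hypothetical, the feature map is a toy nested list rather than a real CNN output, and the query/key/value projections are taken to be the identity (a real Transformer layer learns them).

```python
import math

def flatten_to_tokens(feature_map):
    """Turn a C x H x W feature map (nested lists) into H*W tokens of length C."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (an assumption)."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Score this token against every token, near or far in the image.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted average of all tokens.
        outputs.append([
            sum(a * value[d] for a, value in zip(attn, tokens))
            for d in range(dim)
        ])
    return outputs, weights

# Toy 2-channel, 2x3 feature map standing in for a CNN encoder output.
feature_map = [
    [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1]],
    [[0.0, 1.0, 0.5], [0.8, 0.1, 0.3]],
]
tokens = flatten_to_tokens(feature_map)          # 6 tokens, each of dimension 2
outputs, weights = self_attention(tokens)        # each output mixes all 6 positions
```

Every row of `weights` is strictly positive and sums to 1, so each output token draws on the whole feature map. That is the global context a stack of local convolutions only reaches after many layers, while the CNN stage is what supplies the spatial detail before flattening.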

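TransUNet (Chen et al., 2021), first introduced in 2021, was one of the first models to integrate Transformers into medical image analysis. It exploits the high-resolution spatial detail of the U-Net encoder while drawing on the Transformer's strength in global context modeling, which is essential for medical image segmentation. This innovation spurred follow-up work (Cao et al., 2022; Xie et al., 2021; Hatamizadeh et al., 2021). Even so, a comprehensive understanding of Transformer self-attention across the different U-Net components is still lacking.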

Abstract

Medical image segmentation is crucial for healthcare, yet convolution-based methods like U-Net face limitations in modeling long-range dependencies. To address this, Transformers designed for sequence-to-sequence predictions have been integrated into medical image segmentation. However, a comprehensive understanding of Transformers’ self-attention in U-Net components is lacking. TransUNet, first introduced in 2021, is widely recognized as one of the first models to integrate Transformer into medical image analysis. In this study, we present the versatile framework of TransUNet that encapsulates Transformers’ self-attention into two key modules: (1) a Transformer encoder tokenizing image patches from a convolution neural network (CNN) feature map, facilitating global context extraction, and (2) a Transformer decoder refining candidate regions through cross-attention between proposals and U-Net features. These modules can be flexibly inserted into the U-Net backbone, resulting in three configurations: Encoder-only, Decoder-only, and Encoder+Decoder. TransUNet provides a library encompassing both 2D and 3D implementations, enabling users to easily tailor the chosen architecture. Our findings highlight the encoder’s efficacy in modeling interactions among multiple abdominal organs and the decoder’s strength in handling small targets like tumors. It excels in diverse medical applications, such as multi-organ segmentation, pancreatic tumor segmentation, and hepatic vessel segmentation. Notably, our Trans
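The second module in the abstract, a decoder that refines candidate regions through cross-attention between proposals and U-Net features, can be sketched roughly as follows. This is a toy illustration and not the TransUNet library's API: the function names are hypothetical, the projections are assumed to be the identity, and both the residual query update and the dot-product mask readout are assumptions chosen for simplicity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, features):
    """Each proposal query attends over all feature tokens (identity projections assumed)."""
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    refined = []
    for query in queries:
        scores = [scale * sum(q * f for q, f in zip(query, feat)) for feat in features]
        attn = softmax(scores)
        context = [
            sum(a * feat[d] for a, feat in zip(attn, features))
            for d in range(dim)
        ]
        # Residual update: the proposal absorbs the features it attended to.
        refined.append([q + c for q, c in zip(query, context)])
    return refined

def predict_masks(queries, features):
    """Per-query, per-pixel probability: sigmoid of a dot product (assumed readout)."""
    return [
        [
            1.0 / (1.0 + math.exp(-sum(q * f for q, f in zip(query, feat))))
            for feat in features
        ]
        for query in queries
    ]

# Four pixel features (dimension 2) and two hypothetical region proposals.
features = [[2.0, -2.0], [2.0, -2.0], [-2.0, 2.0], [-2.0, 2.0]]
queries = [[1.0, -1.0], [-1.0, 1.0]]

refined = cross_attention(queries, features)
masks = predict_masks(refined, features)
binary = [[int(p > 0.5) for p in row] for row in masks]
# binary == [[1, 1, 0, 0], [0, 0, 1, 1]]: each proposal claims its own two pixels.
```

In this toy setup each proposal pulls itself toward the pixels it already resembles, so its mask sharpens around one region. Under the same assumptions, running the refinement over several layers would correspond to the iterative refinement of candidate regions that the abstract describes.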
