Transformer-Based Visual Segmentation: A Survey
Abstract
Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several specific subfields, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research.