TPAMI 2024: Advances in Transformer-Based Visual Segmentation

Transformer-Based Visual Segmentation: A Survey

GitHub - TPAMI-2024  |  Arxiv

Abstract

Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several specific subfields, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research.

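Visual segmentation partitions images, video frames, or point clouds into multiple segments or groups. It has many real-world applications, including autonomous driving, image editing, robot sensing, and medical analysis, and deep learning methods have advanced it markedly over the past decade.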

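Recently, Transformers, neural networks built on self-attention and originally designed for natural language processing, have substantially outperformed earlier convolutional and recurrent approaches on a range of vision tasks. For segmentation in particular, vision Transformers offer robust, unified, and often simpler solutions.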

The survey gives a thorough overview of Transformer-based visual segmentation and summarizes recent progress.

        First, it reviews the background: problem definitions, datasets, and earlier convolutional methods.

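        Next, it summarizes a meta-architecture that unifies all recent Transformer-based approaches. Building on this meta-architecture, it examines various method designs, including modifications to the meta-architecture and associated applications.

        It also covers several specific subfields: 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation.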

        In addition, it compiles and re-evaluates the reviewed methods on several well-established datasets.

        Finally, it identifies open challenges in the field and proposes directions for future research.
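
The abstract describes Transformers as networks based on self-attention. As a rough illustration only, the sketch below shows scaled dot-product self-attention in plain Python. It is not the survey's meta-architecture: the function names are hypothetical, and it assumes a single head with identity projections (queries, keys, and values are the input tokens themselves), whereas real models use learned projection matrices.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Assumption for illustration: Q = K = V = tokens (no learned weights).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out


if __name__ == "__main__":
    # Two toy "pixel" features; each output mixes both inputs.
    print(self_attention([[1.0, 0.0], [0.0, 1.0]]))
```

In a segmentation setting, the tokens would typically be image patch or pixel features, so every location can aggregate context from every other location in one step.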
