TPAMI 2024: Advances in Transformer-Based Visual Segmentation

Phoenixtree_DongZhao

Published 2024-08-16 17:25:24

Article link: https://blog.csdn.net/u014546828/article/details/141265001

Transformer-Based Visual Segmentation: A Survey

GitHub - TPAMI-2024 | Arxiv

Abstract

Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several specific subfields, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research.

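Visual segmentation aims to partition images, video frames, or point clouds into multiple segments or groups. The technique has many real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable progress in this area.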

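Recently, Transformers, neural networks based on self-attention that were originally designed for natural language processing, have substantially outperformed earlier convolutional and recurrent approaches on a range of vision tasks. In particular, vision Transformers offer robust, unified, and even simpler solutions for various segmentation tasks.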

The survey gives a thorough overview of Transformer-based visual segmentation and summarizes recent advances.

The survey first reviews the background, including problem definitions, datasets, and earlier convolutional methods.

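Next, the survey summarizes a meta-architecture that unifies all recent Transformer-based approaches. Based on this meta-architecture, it examines various method designs, including modifications to the meta-architecture and associated applications.

The paper also presents several specific subfields, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation.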

In addition, the paper compiles and re-evaluates the reviewed methods on several well-established datasets.

Finally, the paper identifies open challenges in the field and proposes directions for future research.
