1. Abstract
Light-weight convolutional networks are widely used in mobile vision applications. Their spatial inductive biases allow them to learn representations with fewer parameters across different vision tasks. However, these representations are spatially local. To learn global representations, the self-attention modules of vision transformers (ViTs) have been adopted. How can the strengths of CNN and Transformer architectures be combined to build a low-latency, light-weight network that performs well on vision tasks? To address this, the paper proposes the MobileViT model. In the authors' words: "In this paper, we ask the following question: is it possible to combine the strengths of CNNs and ViTs to build a light-weight and low latency network for mobile vision tasks? Towards this end, we introduce MobileViT, a light-weight and general-purpose vision transformer for mobile devices."
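To make the local-vs-global contrast concrete, here is a minimal NumPy sketch (not the MobileViT architecture itself, just an illustration): a small convolution mixes only a fixed neighborhood around each position, while scaled dot-product self-attention lets every token attend to every other token in one step. The function names and shapes are my own choices for illustration.

```python
import numpy as np

def local_conv1d(x, w):
    # Local operation: each output position mixes only a small neighborhood
    # (here a 3-tap kernel) -- the "spatial inductive bias" of CNNs.
    n, k = len(x), len(w)
    pad = k // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + k], w) for i in range(n)])

def self_attention(X):
    # Global operation: scaled dot-product self-attention -- every token
    # attends to all tokens, so the receptive field spans the whole input.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                      # (n, n) similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # row-wise softmax
    return w @ X                                       # global mixture per token

x = np.arange(8, dtype=float)
print(local_conv1d(x, np.array([0.25, 0.5, 0.25])))    # local smoothing only

X = np.random.default_rng(0).normal(size=(4, 8))
print(self_attention(X).shape)                         # (4, 8), globally mixed
```

MobileViT's idea, as the abstract states, is to get both behaviors in one light-weight network rather than choosing between them.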
2. Motivation
Self-attention-based models, especially vision transfor