Paper translation: YOLO Nano

Article link: https://blog.csdn.net/SilenceDXY/article/details/102322961

Object detection remains an active area of research in computer vision, and considerable advances and successes have been achieved in this area through the design of deep convolutional neural networks for tackling object detection.
Despite these successes, one of the biggest challenges to widespread deployment of such object detection networks in edge and mobile scenarios is their high computational and memory requirements.
As such, there has been growing research interest in the design of efficient deep neural network architectures catered for edge and mobile usage.
In this study, we introduce YOLO Nano, a highly compact deep convolutional neural network for the task of object detection.
A human-machine collaborative design strategy is leveraged to create YOLO Nano, where principled network design prototyping, based on design principles from the YOLO family of single-shot object detection network architectures, is coupled with machine-driven design exploration to create a compact network with highly customized module-level macroarchitecture and microarchitecture designs tailored for the task of embedded object detection.
The proposed YOLO Nano has a model size of 4.0MB (>15.1× and >8.3× smaller than Tiny YOLOv2 and Tiny YOLOv3, respectively) and requires 4.57B operations for inference (>34% and 17% lower than Tiny YOLOv2 and Tiny YOLOv3, respectively) while still achieving an mAP of 69.1% on the VOC 2007 dataset (~12% and ~10.7% higher than Tiny YOLOv2 and Tiny YOLOv3, respectively).
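As a rough sanity check on these figures, the stated ratios imply approximate model sizes and operation counts for the two baselines. The sketch below only performs that arithmetic, reading the size figures as multiplicative factors and the operation figures as percentage reductions. The derived baseline values are implied by the ratios, not numbers reported in the text, and all names are illustrative.

```python
# Figures stated for YOLO Nano in the abstract.
NANO_SIZE_MB = 4.0   # model size in MB
NANO_OPS_B = 4.57    # inference operations, in billions

# Stated advantages over each baseline (treated here as exact values,
# although the text gives most of them as bounds with ">").
SIZE_FACTOR = {"Tiny YOLOv2": 15.1, "Tiny YOLOv3": 8.3}      # times smaller
OPS_REDUCTION = {"Tiny YOLOv2": 0.34, "Tiny YOLOv3": 0.17}   # fraction lower


def implied_baselines():
    """Return {name: (implied size in MB, implied operations in billions)}."""
    result = {}
    for name in SIZE_FACTOR:
        # "k times smaller" means baseline_size = k * nano_size.
        size_mb = NANO_SIZE_MB * SIZE_FACTOR[name]
        # "p lower" means nano_ops = (1 - p) * baseline_ops.
        ops_b = NANO_OPS_B / (1.0 - OPS_REDUCTION[name])
        result[name] = (round(size_mb, 1), round(ops_b, 2))
    return result


if __name__ == "__main__":
    for name, (size_mb, ops_b) in implied_baselines().items():
        print(f"{name}: about {size_mb} MB, about {ops_b}B operations")
```

Under these readings, the baselines come out at roughly 60 MB and 33 MB, with roughly 6.9B and 5.5B operations respectively.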
Experiments on inference speed and power efficiency on a Jetson AGX Xavier embedded module at different power budgets further demonstrate the efficacy of YOLO Nano for embedded scenarios.

1 Introduction

An active area in the field of computer vision is object detection, where the goal is not only to localize objects of interest within a scene, but also to assign a class label to each of these objects.
Considerable recent successes in object detection stem from modern advances in deep learning [8, 7], particularly the use of deep convolutional neural networks.
Much of the initial focus was on improving accuracy, leading to increasingly complex object detection networks such as SSD [11], R-CNN [2], Mask R-CNN [3], and other extended variants of these networks [6, 9, 18].
While such networks demonstrated state-of-the-art object detection performance, they were very challenging, if not impossible, to deploy on edge and mobile devices due to computational and memory constraints.
In fact, even faster variants such as Faster R-CNN [15] achieve only low single-digit frame rates when running on embedded processors.
This greatly limits the adoption of such networks in a wide range of applications where local embedded processing is required, such as unmanned aerial vehicles, video surveillance, and autonomous driving.
To address this challenge of achieving embedded object detection, there has been growing interest in the exploration and design of highly efficient deep neural network architectures for object detection that are better suited for edge and mobile devices.
A particularly interesting family of object detection networks designed around efficiency is the YOLO family of neural network architectures [12, 13, 14], which leverages a number of design principles to create single-shot architectures that can achieve embedded object detection performance on high-end desktop GPUs.
However, these network architectures remain too large for many edge and mobile scenarios (e.g., 240MB in the case of the YOLOv3 architecture), and their inference speeds drop considerably when running on edge and mobile processors due to computational complexity (e.g., >65B operations in the case of YOLOv3).
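To put the 240MB figure in perspective, a model's size can be converted into an approximate parameter count. The sketch below is a back-of-the-envelope estimate only: it assumes each weight is stored as a 32-bit float and that 1MB is 10^6 bytes, neither of which is stated in the text, and the function name is hypothetical.

```python
def approx_param_count(model_size_mb, bytes_per_param=4):
    """Estimate the number of weights from the stored model size.

    Assumes the file is dominated by weights, each taking
    `bytes_per_param` bytes (4 for 32-bit floats), and 1 MB = 10**6 bytes.
    """
    if bytes_per_param <= 0:
        raise ValueError("bytes_per_param must be positive")
    return model_size_mb * 1_000_000 / bytes_per_param


if __name__ == "__main__":
    # The 240MB quoted for YOLOv3 corresponds to roughly 60 million weights
    # under the assumptions above.
    print(f"{approx_param_count(240) / 1e6:.0f} million parameters")
```

Storing weights at lower precision (a smaller `bytes_per_param`) would shrink the file without changing the parameter count, so size alone does not fix the count.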
To address this issue, Redmon et al. introduced the Tiny YOLO family of network architectures, which have greatly reduced model sizes at the cost of object detection performance.
In this study, we are motivated to explore a human-machine collaborative design strategy for designing highly compact deep convolutional neural networks for the task of object detection, where principled network design prototyping is coupled with machine-driven design exploration.
More specifically, we leverage the design principles from the YOLO family of single-shot object detection network architectures within this human-machine collaborative design strategy to create YOLO Nano, a highly compact network with highly customized module-level macroarchitecture and microarchitecture designs tailored for the task of embedded object detection.

2 Methods

In this study, we introduce YOLO Nano, a highly compact deep convolutional neural network for embedded object detection designed using a human-machine collaborative design strategy [21].
The human-machine collaborative design strategy for designing YOLO Nano comprises two main design stages: i) principled network design prototyping, and ii) machine-driven design exploration.

2.1 Principled network design prototyping

The first design stage in creating YOLO Nano is a principled network design prototyping stage, where we create an initial network design prototype (denoted as ϕ) based on human-driven design principles to guide the machine-driven design exploration stage.
More specifically, we construct an initial network design prototype based on the design principles of the YOLO family of single-shot architectures [12, 13, 14].
A standout characteristic of the YOLO family of network architectures is that they leverage a single network architecture to process the input image and generate the output results, unlike region proposal-based networks, which rely on a region proposal network to generate proposals for where objects lie in the scene, followed by classification of the generated proposals.
As such, all object detection predictions for a single image are made in a single forward pass, compared to the hundreds to thousands of passes that need to be performed to get the final results for region proposal-based networks.
This makes the YOLO family of network architectures significantly faster to run, and thus better suited for embedded object detection.
The initial design prototype used in this study draws inspiration from the YOLO family of network architectures and consists of a stack of feature representation modules, with shortcut connections between the modules as in [14].
Also, as in [14], the feature representation modules are configured in a way similar to feature pyramid networks [10], such that they are capable of representing features at three different scales.
These feature representation modules are followed by several convolutional layers, with the output being a three-dimensional tensor that encodes bounding box, objectness, and class predictions for three different scales.
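To make the output encoding concrete, the sketch below computes the shape of the prediction tensor at each of the three scales. It assumes YOLOv3-style conventions from [14] that the text does not spell out: a 416×416 input, downsampling strides of 32, 16, and 8, three anchor boxes per grid cell, and the 20 object classes of VOC. Function and parameter names are illustrative.

```python
def head_output_shapes(input_size=416, strides=(32, 16, 8),
                       anchors_per_cell=3, num_classes=20):
    """Return one (grid_h, grid_w, channels) shape per detection scale."""
    # Each anchor predicts 4 box coordinates, 1 objectness score,
    # and one score per class.
    per_anchor = 4 + 1 + num_classes
    channels = anchors_per_cell * per_anchor
    shapes = []
    for stride in strides:
        if input_size % stride != 0:
            raise ValueError("input_size must be divisible by every stride")
        grid = input_size // stride
        shapes.append((grid, grid, channels))
    return shapes


def total_predicted_boxes(shapes, anchors_per_cell=3):
    """Count the candidate boxes produced in one forward pass."""
    return sum(h * w * anchors_per_cell for h, w, _ in shapes)


if __name__ == "__main__":
    shapes = head_output_shapes()
    print(shapes)                         # [(13, 13, 75), (26, 26, 75), (52, 52, 75)]
    print(total_predicted_boxes(shapes))  # 10647
```

Under these assumptions, the coarsest grid (13×13) covers large objects and the finest (52×52) covers small ones, and all candidate boxes come from a single forward pass.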
As a result, this initial design prototype architecture allows for efficient multi-scale object detection.
The actual macroarchitecture and microarchitecture designs of the individual modules and layers in the final YOLO Nano network architecture, as well as the number of network modules, are left for the machine-driven design exploration stage to determine automatically, given data as well as human-specified design requirements and constraints designed specifically around edge and mobile scenarios with limited computational and memory capabilities.