Paper translation: YOLO Nano

Object detection remains an active area of research in the field of computer vision, and considerable advances and successes have been achieved in this area through the design of deep convolutional neural networks for tackling object detection.
Despite these successes, one of the biggest challenges to widespread deployment of such object detection networks on edge and mobile scenarios is the high computational and memory requirements.
As such, there has been growing research interest in the design of efficient deep neural network architectures tailored to edge and mobile usage.
In this study, we introduce YOLO Nano, a highly compact deep convolutional neural network for the task of object detection.
A human-machine collaborative design strategy is leveraged to create YOLO Nano, where principled network design prototyping, based on design principles from the YOLO family of single-shot object detection network architectures, is coupled with machine-driven design exploration to create a compact network with highly customized module-level macroarchitecture and microarchitecture designs tailored for the task of embedded object detection.
The proposed YOLO Nano has a model size of 4.0MB (>15.1× and >8.3× smaller than Tiny YOLOv2 and Tiny YOLOv3, respectively) and requires 4.57B operations for inference (>34% and 17% lower than Tiny YOLOv2 and Tiny YOLOv3, respectively), while still achieving an mAP of 69.1% on the VOC 2007 dataset (~12% and ~10.7% higher than Tiny YOLOv2 and Tiny YOLOv3, respectively).
Experiments on inference speed and power efficiency on a Jetson AGX Xavier embedded module at different power budgets further demonstrate the efficacy of YOLO Nano for embedded scenarios.
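
The comparisons quoted above can be turned back into approximate baseline figures with a little arithmetic. The sketch below is only a reading aid: it assumes that "15.1" and "8.3" are multiplicative size factors, that the operation counts are relative reductions, and that the mAP gaps are absolute percentage points. The function names are hypothetical and the results are implied bounds, not numbers reported in this text.

```python
# Figures quoted in the abstract for YOLO Nano.
NANO_SIZE_MB = 4.0   # model size in MB
NANO_OPS_B = 4.57    # billions of operations per inference
NANO_MAP = 69.1      # mAP (%) on VOC 2007


def implied_baseline_size(times_smaller):
    """Baseline size if YOLO Nano is `times_smaller` times smaller (assumed multiplicative)."""
    return NANO_SIZE_MB * times_smaller


def implied_baseline_ops(percent_lower):
    """Baseline operation count if YOLO Nano needs `percent_lower` % fewer operations."""
    return NANO_OPS_B / (1.0 - percent_lower / 100.0)


def implied_baseline_map(points_higher):
    """Baseline mAP if YOLO Nano is `points_higher` points higher (assumed absolute points)."""
    return NANO_MAP - points_higher


if __name__ == "__main__":
    # (size factor, % fewer operations, mAP gap) as quoted for each baseline.
    for name, factor, pct, gap in [("Tiny YOLOv2", 15.1, 34.0, 12.0),
                                   ("Tiny YOLOv3", 8.3, 17.0, 10.7)]:
        print(name,
              round(implied_baseline_size(factor), 1), "MB,",
              round(implied_baseline_ops(pct), 2), "B ops,",
              round(implied_baseline_map(gap), 1), "% mAP")
```

Because the abstract states the size and Tiny YOLOv2 operation figures as lower bounds (">"), the corresponding outputs should be read as lower bounds on the baselines as well.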

1 Introduction

An active area in the field of computer vision is object detection, where the goal is to not only localize objects of interest within a scene, but also assign a class label to each of these objects of interest.
Considerable recent successes in the area of object detection stem from modern advances in deep learning [8, 7], particularly leveraging deep convolutional neural networks.
Much of the initial focus was on improving accuracy, leading to increasingly complex object detection networks such as SSD [11], R-CNN [2], Mask R-CNN [3], and other extended variants of these networks [6, 9, 18].
While such networks demonstrated state-of-the-art object detection performance, they were very challenging, if not impossible, to deploy on edge and mobile devices due to computational and memory constraints.
In fact, even faster variants such as Faster R-CNN [15] have inference speeds at low single-digit frame rates when running on embedded processors.
This greatly limits the widespread adoption of such networks for a wide range of applications, such as unmanned aerial vehicles, video surveillance, and autonomous driving, where local embedded processing is required.
To address this challenge of achieving embedded object detection, there has been a growing interest in the exploration and design of highly efficient deep neural network architectures for object detection that are better suited for edge and mobile devices.
A particularly interesting family of object detection networks designed around efficiency is the YOLO family of neural network architectures [12, 13, 14], which leverage a number of design principles to create single-shot architectures which can achieve embedded object detection performance on high-end desktop GPUs.
However, these network architectures remain too large for many edge and mobile scenarios (e.g., 240MB in the case of the YOLOv3 architecture), and their inference speeds drop considerably when running on edge and mobile processors due to computational complexity (e.g., >65B operations in the case of YOLOv3).
To address this issue, Redmon et al. introduced the Tiny YOLO family of network architectures, which has greatly reduced model sizes at a cost of object detection performance.
In this study, we are motivated to explore a human-machine collaborative design strategy for designing highly compact deep convolutional neural networks for the task of object detection, where principled network design prototyping is coupled with machine-driven design exploration.
More specifically, we leverage the design principles from the YOLO family of single-shot object detection network architectures within this human-machine collaborative design strategy to create YOLO Nano, a highly compact network with highly customized module-level macroarchitecture and microarchitecture designs tailored for the task of embedded object detection.

2 Methods

In this study, we introduce YOLO Nano, a highly compact deep convolutional neural network for embedded object detection designed using a human-machine collaborative design strategy [21].
The human-machine collaborative design strategy for designing YOLO Nano comprises two main design stages: i) principled network design prototyping, and ii) machine-driven design exploration.

2.1 Principled network design prototyping

The first design stage in creating YOLO Nano is a principled network design prototyping stage, where we create an initial network design prototype (denoted as ϕ), based on human-driven design principles, to guide the machine-driven design exploration stage.
More specifically, we construct an initial network design prototype based on the design principles of the YOLO family of single-shot architectures [12, 13, 14].
A standout characteristic of the YOLO family of network architectures is that, unlike region proposal-based networks, which rely on the construction of a region proposal network to generate proposals for where objects lie in the scene followed by classification on the generated proposals, they instead leverage a single network architecture to process the input image and generate the output results.
As such, all object detection predictions for a single image are made in a single forward pass, compared to hundreds to thousands of passes that need to be performed to get the final results for region proposal-based networks.
This makes the YOLO family of network architectures significantly faster to run, and thus better suited for embedded object detection.
The initial design prototype used in this study draws inspiration from the YOLO family of network architectures and is comprised of a stack of feature representation modules, with shortcut connections between the modules as with [14].
Also, as with [14], the feature representation modules are configured, similarly to feature pyramid networks [10], such that they are capable of representing features at three different scales.
These feature representation modules are followed by several convolutional layers, with output being a three-dimensional tensor that encodes bounding box, objectness, and class predictions for three different scales.
As a result, this initial design prototype allows for efficient multi-scale object detection.
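
To make the shape of that output tensor concrete, the following minimal sketch computes one (height, width, channels) shape per scale. The text above states only that each output encodes bounding box, objectness, and class predictions at three scales; the specific values used here (a 416×416 input, strides of 32, 16 and 8, three anchor boxes per grid cell, and the 20 VOC classes) are assumptions borrowed from the common YOLOv3 convention, and `yolo_output_shapes` is a hypothetical helper rather than part of any library.

```python
def yolo_output_shapes(input_size=416, num_classes=20,
                       anchors_per_scale=3, strides=(32, 16, 8)):
    """Return one (grid_h, grid_w, channels) shape per detection scale.

    All default values are assumptions taken from the YOLOv3 convention,
    not values stated in the section this sketch accompanies.
    """
    # Each anchor predicts 4 box coordinates, 1 objectness score,
    # and one score per class.
    per_anchor = 4 + 1 + num_classes
    channels = anchors_per_scale * per_anchor
    shapes = []
    for stride in strides:
        grid = input_size // stride  # spatial resolution at this scale
        shapes.append((grid, grid, channels))
    return shapes


if __name__ == "__main__":
    for shape in yolo_output_shapes():
        print(shape)
    # (13, 13, 75)
    # (26, 26, 75)
    # (52, 52, 75)
```

Under these assumptions, the coarsest grid handles large objects and the finest grid handles small ones, which is what makes the detection multi-scale.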
The actual macroarchitecture and microarchitecture designs of the individual modules and layers in the final YOLO Nano network architecture, as well as the number of network modules, are left for the machine-driven design exploration stage to determine automatically given data as well as human-specified design requirements and constraints designed specifically around edge and mobile scenarios with limited computational and memory capabilities.







