Paper Translation: You Only Look Once: Unified, Real-Time Object Detection

YOLO is a new approach to object detection that frames detection as a regression problem, directly predicting bounding boxes and class probabilities so the model can be optimized end-to-end. YOLO is fast: the base model processes 45 frames per second and the fast version reaches 155 fps, while achieving more than twice the mAP of other real-time systems. Because YOLO reasons about the image globally, it makes fewer background errors, generalizes well, and is suited to real-time computer vision applications.

Abstract


We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.


Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

1. Introduction


Humans glance at an image and instantly know what objects are in the image, where they are, and how they interact. The human visual system is fast and accurate, allowing us to perform complex tasks like driving with little conscious thought. Fast, accurate algorithms for object detection would allow computers to drive cars without specialized sensors, enable assistive devices to convey real-time scene information to human users, and unlock the potential for general purpose, responsive robotic systems.


Current detection systems repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. Systems like deformable parts models (DPM) use a sliding window approach where the classifier is run at evenly spaced locations over the entire image [10].
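As a rough illustration of the sliding-window idea (not DPM's actual implementation — the window size, stride, and image dimensions below are purely illustrative), the classifier is evaluated at evenly spaced positions, and again at each scale:

```python
def sliding_window_locations(img_w, img_h, win_w, win_h, stride):
    """Yield the top-left corner of every evenly spaced window position;
    a classifier would be evaluated at each one (and again per scale)."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y)

# A 448x448 image, 64x64 window, stride 32: 13 x 13 = 169 classifier runs
# at just this one window size.
print(len(list(sliding_window_locations(448, 448, 64, 64, 32))))  # 169
```

The quadratic growth in evaluations with image size is what makes these pipelines slow compared to a single network pass.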


More recent approaches like R-CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene [13]. These complex pipelines are slow and hard to optimize because each individual component must be trained separately.
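The duplicate-elimination step mentioned here is typically implemented as non-maximum suppression; below is a generic greedy sketch, not the exact post-processing used in [13]:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the best-scoring box,
    discard any remaining box that overlaps a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping boxes plus one distant box: the lower-scored
# duplicate is suppressed, the distant box survives.
print(nms([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)],
          [0.9, 0.8, 0.7]))  # [0, 2]
```

Each such stage (proposals, classification, NMS, rescoring) is trained or tuned separately, which is the optimization difficulty the paragraph above refers to.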


We reframe object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. Using our system, you only look once (YOLO) at an image to predict what objects are present and where they are.

Figure 1: The YOLO Detection System. Processing images with YOLO is simple and straightforward. Our system (1) resizes the input image to 448 × 448, (2) runs a single convolutional network on the image, and (3) thresholds the resulting detections by the model's confidence.

YOLO is refreshingly simple: see Figure 1. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance. This unified model has several benefits over traditional methods of object detection.
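Figure 1's three steps can be sketched as follows; `fake_net` and the 0.2 threshold are illustrative stand-ins (the real step 2 is one forward pass of the trained convolutional network on the resized 448 × 448 image):

```python
def detect(image, network, conf_thresh=0.2):
    """Sketch of the Figure 1 pipeline. The resize step is elided
    (the stub ignores its input); step 2 is one network evaluation
    producing (box, confidence) pairs; step 3 thresholds on confidence."""
    detections = network(image)                           # step 2
    return [d for d in detections if d[1] > conf_thresh]  # step 3

# Illustrative stub standing in for the trained network:
fake_net = lambda img: [((0, 0, 100, 100), 0.9), ((5, 5, 20, 20), 0.05)]
print(detect(None, fake_net))  # only the 0.9-confidence box survives
```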


First, YOLO is extremely fast. Since we frame detection as a regression problem we don’t need a complex pipeline. We simply run our neural network on a new image at test time to predict detections. Our base network runs at 45 frames per second with no batch processing on a Titan X GPU and a fast version runs at more than 150 fps. This means we can process streaming video in real-time with less than 25 milliseconds of latency. Furthermore, YOLO achieves more than twice the mean average precision of other real-time systems. For a demo of our system running in real-time on a webcam please see our project webpage: http://pjreddie.com/yolo/.
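The sub-25-millisecond latency claim follows directly from the frame rates; a quick arithmetic check:

```python
def per_frame_budget_ms(fps):
    """Milliseconds per frame at a given throughput."""
    return 1000.0 / fps

base_yolo = per_frame_budget_ms(45)    # ~22.2 ms, under the 25 ms bound
fast_yolo = per_frame_budget_ms(155)   # ~6.5 ms for Fast YOLO
```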


Second, YOLO reasons globally about the image when making predictions. Unlike sliding window and region proposal-based techniques, YOLO sees the entire image during training and test time so it implicitly encodes contextual information about classes as well as their appearance. Fast R-CNN, a top detection method [14], mistakes background patches in an image for objects because it can’t see the larger context. YOLO makes less than half the number of background errors compared to Fast R-CNN.


Third, YOLO learns generalizable representations of objects. When trained on natural images and tested on artwork, YOLO outperforms top detection methods like DPM and R-CNN by a wide margin. Since YOLO is highly generalizable it is less likely to break down when applied to new domains or unexpected inputs.


YOLO still lags behind state-of-the-art detection systems in accuracy. While it can quickly identify objects in images it struggles to precisely localize some objects, especially small ones. We examine these tradeoffs further in our experiments. All of our training and testing code is open source. A variety of pretrained models are also available to download.

2. Unified Detection


We unify the separate components of object detection into a single neural network. Our network uses features from the entire image to predict each bounding box. It also predicts all bounding boxes across all classes for an image simultaneously. This means our network reasons globally about the full image and all the objects in the image. The YOLO design enables end-to-end training and real-time speeds while maintaining high average precision.


Our system divides the input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
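A minimal sketch of this grid assignment, assuming pixel coordinates for the object center and using S = 7 as an illustrative grid size:

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Map an object's center (in pixels) to the (row, col) of the grid
    cell responsible for detecting it. min() guards the case where the
    center sits exactly on the right or bottom image edge."""
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# A center at (224, 64) in a 448x448 image lands in row 1, column 3:
print(responsible_cell(224, 64, 448, 448))  # (1, 3)
```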


Each grid cell predicts B bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and also how accurate it thinks the predicted box is. Formally we define confidence as Pr(Object) * IOU(pred, truth). If no object exists in that cell, the confidence scores should be zero. Otherwise we want the confidence score to equal the intersection over union (IOU) between the predicted box and the ground truth.
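The confidence target follows directly from that definition; a sketch, with boxes given as (x1, y1, x2, y2) corner coordinates (a parameterization chosen here for illustration):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def confidence_target(pred_box, truth_box, object_present):
    """Pr(Object) * IOU: zero for a cell holding no object, otherwise
    the IOU between the predicted box and the ground truth."""
    return iou(pred_box, truth_box) if object_present else 0.0

# A perfect prediction yields a target of 1.0; an empty cell yields 0.0.
```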
