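CV, DL, YOLOv2: A Detailed Illustrated Guide to the YOLOv2 Algorithm, Covering an Introduction (Paper Overview), Architecture Details, and Example Applications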

Author: 一个处女座的程序猿

Last modified: 2024-01-04 19:09:02


First published: 2018-04-16 00:31:54

Article link: https://blog.csdn.net/qq_41185868/article/details/79955385


Related paper

"YOLO9000: Better, Faster, Stronger": summary, abstract, and conclusion

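Link	Paper: https://arxiv.org/abs/1612.08242
Date	December 25, 2016
Authors	Joseph Redmon, Ali Farhadi
Summary	The paper proposes YOLO9000, a real-time object detection method. Its main points are summarized below.

Background:
>> Object detection methods face three main challenges: being fast, being accurate, and recognizing a wide variety of object categories.
>> Compared with classification datasets, detection datasets contain fewer object categories and fewer labelled images, so they cannot cover as broad a range of classes as classification does.

Proposed solution:
>> An improved detection method, YOLOv2, which outperforms prior state-of-the-art methods in both speed and accuracy and sets a new standard for real-time detection.
>> A method for training jointly on classification and detection data, which brings ImageNet classification data into detection and extends the set of detectable categories from a few hundred to more than 9000.
>> WordTree, a hierarchical tree of concepts that combines different datasets such as COCO and ImageNet while preserving each dataset's own structure.
>> Joint training of the classification and detection model: COCO data is used to learn localization and detection-specific detail, and ImageNet is used to extend the set of recognizable categories.

Key characteristics:
>> YOLOv2 reaches state-of-the-art accuracy while running in real time.
>> YOLO9000 can detect more than 9000 object categories in real time.
>> By training jointly on classification and detection data, the model can learn a large number of new categories even under weak supervision.
>> The WordTree structure combines different datasets flexibly, preserving the properties of each while integrating them.

Advantages:
>> Real-time detection over a broad range of categories, which provides technical support for vision applications such as autonomous driving.
>> Existing datasets are exploited as fully as possible to train a stronger model, which goes a long way toward addressing the limited size of detection datasets.
>> The ideas extend to other vision tasks such as semantic segmentation, and training techniques such as multi-scale training are also useful elsewhere.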

Abstract

We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. Using a novel, multi-scale training method the same YOLOv2 model can run at varying sizes, offering an easy tradeoff between speed and accuracy. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster R-CNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don’t have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. But YOLO can detect more than just 200 classes; it predicts detections for more than 9000 different object categories. And it still runs in real-time.

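The multi-scale training mentioned in the abstract can be sketched roughly as follows. The paper describes switching to a new input resolution every 10 batches, chosen from the multiples of 32 between 320 and 608. The function names, the uniform random choice, and the seed are assumptions made for illustration; this shows only the schedule and is not the authors' Darknet code.

```python
import random

STRIDE = 32                                # YOLOv2 downsamples the input by a factor of 32
SIZES = list(range(320, 608 + 1, STRIDE))  # 320, 352, ..., 608
RESIZE_EVERY = 10                          # batches between resolution changes (per the paper)


def input_size_schedule(num_batches, seed=0):
    """Yield the square input size to use for each training batch."""
    rng = random.Random(seed)
    size = None
    for batch_idx in range(num_batches):
        if batch_idx % RESIZE_EVERY == 0:
            # Assumption: sizes are drawn uniformly at random.
            size = rng.choice(SIZES)
        yield size


def grid_cells(size):
    """Side length of the output feature map for a given input size."""
    return size // STRIDE


schedule = list(input_size_schedule(30))
print(schedule[0], schedule[10], schedule[20])
print(grid_cells(416))  # a 416 x 416 input gives a 13 x 13 output grid
```

Because the network is fully convolutional, the same weights can be evaluated at any of these sizes, which is what gives the speed/accuracy tradeoff quoted in the abstract.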

Conclusion

We introduce YOLOv2 and YOLO9000, real-time detection systems. YOLOv2 is state-of-the-art and faster than other detection systems across a variety of detection datasets. Furthermore, it can be run at a variety of image sizes to provide a smooth tradeoff between speed and accuracy.

YOLO9000 is a real-time framework for detection more than 9000 object categories by jointly optimizing detection and classification. We use WordTree to combine data from various sources and our joint optimization technique to train simultaneously on ImageNet and COCO. YOLO9000 is a strong step towards closing the dataset size gap between detection and classification.

Many of our techniques generalize outside of object detection. Our WordTree representation of ImageNet offers a richer, more detailed output space for image classification. Dataset combination using hierarchical classification would be useful in the classification and segmentation domains. Training techniques like multi-scale training could provide benefit across a variety of visual tasks.

For future work we hope to use similar techniques for weakly supervised image segmentation. We also plan to improve our detection results using more powerful matching strategies for assigning weak labels to classification data during training. Computer vision is blessed with an enormous amount of labelled data. We will continue looking for ways to bring different sources and structures of data together to make stronger models of the visual world.


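Introduction to the YOLOv2 algorithm

1. Characteristics, improvements, and strengths and weaknesses of YOLOv2

(1) Characteristics of YOLOv2

YOLOv2 is the second version of YOLO; its goal is to improve accuracy significantly while also making the detector faster.
Compared with proposal-based detectors, YOLOv1 makes more localization errors and has lower recall (the fraction of ground-truth objects that are actually detected).
SSD is a strong competitor to YOLOv1, achieving higher accuracy while still running in real time.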
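To make the notions of localization error and recall concrete, here is a minimal sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2) and an IoU threshold of 0.5, as is conventional for PASCAL VOC; the helper names and the sample boxes are hypothetical, and the matching is simplified (it ignores confidence ordering).

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(gt_boxes, pred_boxes, thr=0.5):
    """Fraction of ground-truth boxes matched by a distinct prediction."""
    used = set()
    hits = 0
    for gt in gt_boxes:
        best_j, best_iou = None, thr
        for j, pred in enumerate(pred_boxes):
            if j in used:
                continue
            value = iou(gt, pred)
            if value >= best_iou:
                best_j, best_iou = j, value
        if best_j is not None:
            used.add(best_j)
            hits += 1
    return hits / len(gt_boxes) if gt_boxes else 0.0


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
# The first prediction is well localized; the second is shifted so far
# that its IoU with the second object falls below the threshold.
predictions = [(1, 1, 10, 10), (25, 25, 35, 35)]
print(recall(ground_truth, predictions))
```

In this toy case the poorly localized second box does not count as a detection, so a localization error directly lowers recall.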

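(2) Improvements in YOLOv2

YOLOv2: applies a series of techniques to improve on YOLOv1, raising accuracy while retaining its speed.
YOLO9000: proposes a joint training method for object classification and detection. WordTree is used to mix data from detection and classification datasets, and training simultaneously on COCO and ImageNet yields YOLO9000, which detects more than 9000 object categories in real time.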
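The WordTree idea can be illustrated with a small sketch. In the paper, each node predicts a probability conditioned on its parent, and the absolute probability of a category is the product of the conditional probabilities along the path to the root. The miniature tree and all the numbers below are hypothetical and chosen only to show the computation.

```python
# Hypothetical miniature WordTree, stored as child -> parent.
PARENT = {
    "physical object": None,
    "animal": "physical object",
    "dog": "animal",
    "terrier": "dog",
    "Norfolk terrier": "terrier",
}

# Hypothetical conditional probabilities Pr(node | parent).
COND_PROB = {
    "physical object": 1.0,
    "animal": 0.9,
    "dog": 0.8,
    "terrier": 0.5,
    "Norfolk terrier": 0.25,
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def absolute_probability(node):
    """Multiply conditional probabilities along the path to the root."""
    prob = 1.0
    for name in path_to_root(node):
        prob *= COND_PROB[name]
    return prob


print(path_to_root("dog"))
print(absolute_probability("Norfolk terrier"))
```

The path also shows why a coarse label is still useful: an image labelled only "dog" supervises "dog", "animal", and "physical object", without saying anything about which kind of dog it is.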

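2. Experimental results

VOC 2007 dataset

Here are the accuracy improvements after applying the techniques discussed so far:
Note: hand-picked anchor boxes were only tried experimentally in YOLOv2. Once dimension priors (box shapes obtained by clustering the training boxes) were available, the hand-picked anchors were dropped. The final model that reaches 78.6 mAP does not use hand-picked anchor boxes either.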
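The dimension priors mentioned in the note come from running k-means on the widths and heights of the training boxes, using d(box, centroid) = 1 - IoU(box, centroid) as the distance, as described in the paper. The sketch below is a pure-Python illustration under stated assumptions: the box list, the function names, the seed, and the iteration cap are hypothetical, and centroids are updated with a simple mean.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_priors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs using the distance 1 - IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Smallest 1 - IoU distance is the same as largest IoU.
            best = max(range(k), key=lambda i: wh_iou(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(m[0] for m in members) / len(members)
                mean_h = sum(m[1] for m in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


# Hypothetical training boxes: three tall, narrow ones and three wide, flat ones.
boxes = [(1.0, 3.0), (1.2, 3.1), (0.9, 2.8), (4.0, 1.0), (4.2, 1.1), (3.8, 0.9)]
priors = kmeans_priors(boxes, k=2)
print(priors)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which is the motivation the paper gives for this choice.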