Detection Review, Part 1: YOLOv2

一名ai小菜鸡

Series: Classic Object Detection Review | Tags: deep learning

Article link: https://blog.csdn.net/fxwfxw7037681/article/details/115870527

YOLOv2 paper walkthrough

Contributions

- Using a novel multi-scale training method, the same YOLOv2 model can run at varying sizes, offering an easy tradeoff between speed and accuracy.
- We propose a method to jointly train on object detection and classification. The resulting model (YOLO9000) predicts detections for more than 9000 different object categories, and it still runs in real time.

Better: accuracy improvements

Batch normalization: by adding batch normalization to all of the convolutional layers in YOLO, we get more than a 2% improvement in mAP.
High-resolution classifier: we first fine-tune the classification network at the full 448×448 resolution for 10 epochs on ImageNet, then fine-tune the resulting network on detection. This gives an increase of almost 4% mAP.
Convolutional with anchor boxes: we remove the fully connected layers from YOLO and use anchor boxes to predict bounding boxes.
- We eliminate one pooling layer to make the output of the network's convolutional layers higher resolution.
- We also shrink the network to operate on 416×416 input images instead of 448×448. Because the network downsamples by a factor of 32, a 416×416 input yields a 13×13 feature map. We want an odd number of locations in the feature map so there is a single center cell, which helps because large objects tend to occupy the center of the image.
- We also decouple the class prediction mechanism from the spatial location (YOLOv1 predicted one set of class probabilities per grid cell) and instead predict class and objectness for every anchor box.
- With anchor boxes, our model gets 69.2 mAP with a recall of 88%, compared with 69.5 mAP and 81% recall without them: a slight drop in mAP buys a large gain in recall.
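
The shape arithmetic behind these changes can be checked with a few lines of Python. This is a minimal sketch, assuming an overall downsampling factor of 32, 5 anchor boxes, and the 20 PASCAL VOC classes; the helper names `grid_size` and `output_channels` are hypothetical, not taken from the paper or Darknet.

```python
# Sketch of YOLOv2's output layout. Assumptions (for illustration only):
# total downsampling factor 32, 5 anchors, 20 PASCAL VOC classes.
STRIDE = 32


def grid_size(input_size: int, stride: int = STRIDE) -> int:
    """Side length of the output feature map for a square input."""
    if input_size % stride != 0:
        raise ValueError("input size must be a multiple of the stride")
    return input_size // stride


def output_channels(num_anchors: int, num_classes: int) -> int:
    """Channels per cell: every anchor predicts 4 box offsets,
    1 objectness score, and its own set of class scores."""
    return num_anchors * (4 + 1 + num_classes)


for size in (448, 416):
    g = grid_size(size)
    has_center = g % 2 == 1  # an odd grid has exactly one center cell
    print(f"{size}x{size} input -> {g}x{g} grid, single center cell: {has_center}")

print("channels:", output_channels(num_anchors=5, num_classes=20))
```

With 5 anchors and 20 classes, each cell of the 13×13 map carries 5 × (4 + 1 + 20) = 125 values, because class scores are now predicted per anchor rather than per cell.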
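Dimension clusters: we run k-means clustering on the training set bounding boxes to automatically find good priors. For our distance metric we use $d(\text{box}, \text{centroid}) = 1 - \text{IOU}(\text{box}, \text{centroid})$, so the clustering favors priors with high IOU independent of box size (with plain Euclidean distance, larger boxes would generate more error than smaller ones). The paper settles on k = 5 as a tradeoff between model complexity and recall.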
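
A minimal pure-Python sketch of this anchor clustering follows. It assumes each box is reduced to its (width, height) and that IOU is computed as if the two boxes shared a center, since only the prior's shape matters; the mean-based centroid update, the seed, the helper names, and the toy boxes are illustrative assumptions.

```python
import random


def iou_wh(a, b):
    """IOU of two boxes given as (w, h), assuming both share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs using d(box, centroid) = 1 - IOU(box, centroid)."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # start from k distinct training boxes
    for _ in range(iters):
        # Assignment step: each box joins the centroid at the smallest distance.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            dists = [1 - iou_wh(box, c) for c in centroids]
            clusters[dists.index(min(dists))].append(box)
        # Update step: move each centroid to the mean (w, h) of its members;
        # an empty cluster keeps its old centroid.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centroids.append((w, h))
            else:
                new_centroids.append(c)
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return sorted(centroids)


# Toy (w, h) data with one group of small boxes and one group of large boxes.
boxes = [(10, 12), (11, 10), (12, 11), (100, 90), (95, 105), (105, 100)]
print(kmeans_anchors(boxes, k=2))
```

On real data you would pass the ground-truth (w, h) pairs of the whole training set and k = 5.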
Direct location prediction: we predict location coordinates relative to the location of the grid cell. This bounds the ground truth to fall between 0 and 1. We use a logistic activation $\sigma$ to constrain the network's predictions to fall in this range. The network predicts 5 coordinates for each bounding box, $t_x, t_y, t_w, t_h$, and $t_o$. If the cell is offset from the top left corner of the image by $(c_x, c_y)$ and the bounding box prior has width and height $p_w, p_h$, then the predictions correspond to:

$$
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w e^{t_w} \\
b_h &= p_h e^{t_h} \\
\Pr(\text{object}) \cdot \text{IOU}(b, \text{object}) &= \sigma(t_o)
\end{aligned}
$$
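
The decoding of $t_x, t_y, t_w, t_h, t_o$ into a box translates directly into code. The sketch below assumes coordinates are expressed in grid-cell units (so $c_x, c_y$ are integer cell indices and $p_w, p_h$ are measured in cells); `decode_box` is a hypothetical helper, not a Darknet function.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation used to squash offsets into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell, prior):
    """Map raw outputs (t_x, t_y, t_w, t_h, t_o) to a box in grid-cell units.

    cell  = (c_x, c_y): offset of the grid cell from the image's top-left corner.
    prior = (p_w, p_h): width and height of the anchor (bounding box prior).
    """
    t_x, t_y, t_w, t_h, t_o = t
    c_x, c_y = cell
    p_w, p_h = prior
    b_x = sigmoid(t_x) + c_x   # center x stays inside column c_x
    b_y = sigmoid(t_y) + c_y   # center y stays inside row c_y
    b_w = p_w * math.exp(t_w)  # scale the prior's width
    b_h = p_h * math.exp(t_h)  # scale the prior's height
    confidence = sigmoid(t_o)  # interpreted as Pr(object) * IOU(b, object)
    return b_x, b_y, b_w, b_h, confidence


# All-zero outputs: the center sits mid-cell and the box equals the prior.
print(decode_box((0.0, 0.0, 0.0, 0.0, 0.0), cell=(6, 6), prior=(3.0, 4.0)))
# -> (6.5, 6.5, 3.0, 4.0, 0.5)
```

Because $\sigma(t_x) \in (0, 1)$, the predicted center can never leave its cell, which is what makes training more stable than with unconstrained offsets.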

Using dimension clusters along with directly predicting the bounding box center location improves YOLO by almost 5% over the version with anchor boxes.
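Fine-grained features: Faster R-CNN and SSD both run their proposal networks at several feature maps to get a range of resolutions. We take a different approach, simply adding a passthrough layer that brings in features from an earlier layer at 26 × 26 resolution. The passthrough layer stacks adjacent features into different channels, turning the 26 × 26 × 512 feature map into a 13 × 13 × 2048 map that can be concatenated with the original 13 × 13 features. This gives a modest 1% performance increase.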
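
The passthrough layer is a space-to-depth rearrangement: every 2 × 2 spatial block is moved into separate channels, so no values are discarded. The sketch below uses nested Python lists indexed as `[channel][row][col]`; the function name `passthrough` and the exact channel ordering are assumptions and may differ from Darknet's own reorg layer.

```python
def passthrough(x, stride=2):
    """Space-to-depth on a C x H x W tensor stored as nested lists.

    Returns a (C * stride**2) x (H / stride) x (W / stride) tensor: each
    stride x stride spatial block is spread over stride**2 channels.
    Channel order (offset-major, then original channel) is an assumption.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % stride or w % stride:
        raise ValueError("height and width must be divisible by stride")
    out = []
    for dy in range(stride):          # row offset inside each block
        for dx in range(stride):      # column offset inside each block
            for ch in range(c):
                out.append([[x[ch][i * stride + dy][j * stride + dx]
                             for j in range(w // stride)]
                            for i in range(h // stride)])
    return out


# Shapes from the paper: 26 x 26 x 512 -> 13 x 13 x 2048.
feat = [[[0.0] * 26 for _ in range(26)] for _ in range(512)]
out = passthrough(feat)
print(len(out), len(out[0]), len(out[0][0]))  # 2048 13 13
```

The resulting 13 × 13 × 2048 map can then be concatenated along the channel axis with the 13 × 13 deep features; in this list representation that is simply `out + deep_features`, where `deep_features` is a hypothetical name for the later layer's output.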
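Multi-scale training: instead of fixing the input image size, we change the network every few iterations. Every 10 batches, our network randomly chooses a new image dimension size. Since the model downsamples by a factor of 32, the sizes are drawn from the multiples of 32: {320, 352, ..., 608}.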
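
A sketch of the multi-scale schedule, assuming a fixed random seed and a hypothetical generator `input_size_schedule`; it only decides the input size for each batch. Resizing the network itself works because YOLOv2 uses only convolutional and pooling layers, so the same weights apply to any input whose side is a multiple of 32.

```python
import random

# Candidate input sizes {320, 352, ..., 608}: all multiples of 32.
SIZES = list(range(320, 608 + 1, 32))


def input_size_schedule(num_batches, every=10, seed=0):
    """Yield the input size for each batch, redrawing it every `every` batches."""
    rng = random.Random(seed)
    size = None
    for batch in range(num_batches):
        if batch % every == 0:
            size = rng.choice(SIZES)
        yield size


sizes = list(input_size_schedule(30))
for size in sorted(set(sizes)):
    print(size, "->", size // 32, "x", size // 32, "grid")
```

Smaller draws such as 320 give a 10 × 10 grid and run faster; larger draws such as 608 give a 19 × 19 grid and see more detail, which is the speed/accuracy tradeoff mentioned in the contributions.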

Faster: speed improvements

We propose a new classification model, called Darknet-19, to be used as the base of YOLOv2. It has 19 convolutional layers and 5 max-pooling layers.
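
Darknet-19's layers can be written down as a plain configuration list. The sequence below follows the Darknet-19 table in the YOLOv2 paper; the tuple format and the `summarize` helper are illustrative assumptions, and batch normalization and activations are omitted for brevity.

```python
# Darknet-19 as (type, filters, kernel size); every maxpool is 2x2 with stride 2.
DARKNET19 = [
    ("conv", 32, 3), ("maxpool",),
    ("conv", 64, 3), ("maxpool",),
    ("conv", 128, 3), ("conv", 64, 1), ("conv", 128, 3), ("maxpool",),
    ("conv", 256, 3), ("conv", 128, 1), ("conv", 256, 3), ("maxpool",),
    ("conv", 512, 3), ("conv", 256, 1), ("conv", 512, 3),
    ("conv", 256, 1), ("conv", 512, 3), ("maxpool",),
    ("conv", 1024, 3), ("conv", 512, 1), ("conv", 1024, 3),
    ("conv", 512, 1), ("conv", 1024, 3),
    ("conv", 1000, 1),  # classifier; followed by global average pooling and softmax
]


def summarize(layers, input_size=224):
    """Count conv and maxpool layers and track the spatial size."""
    convs = sum(1 for layer in layers if layer[0] == "conv")
    pools = sum(1 for layer in layers if layer[0] == "maxpool")
    size = input_size
    for layer in layers:
        if layer[0] == "maxpool":
            size //= 2  # convs use "same" padding, so only pools shrink the map
    return convs, pools, size


print(summarize(DARKNET19))  # (19, 5, 7)
```

Five 2 × 2 max-pools give the overall downsampling factor of 2^5 = 32 used throughout the detection network.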

Stronger: recognizing more categories

We propose a mechanism for jointly training on classification and detection data. During training, we mix images from both detection and classification datasets. When our network sees an image labeled for detection, we can backpropagate based on the full YOLOv2 loss function. When it sees a classification image, we only backpropagate loss from the classification-specific parts of the architecture.
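
The masking rule can be sketched as a function that picks which loss terms contribute for each image. This is a simplification: `joint_loss` and the keys `coord`, `objectness`, and `class` are hypothetical names for groups of YOLOv2 loss terms, and the paper's YOLO9000 section adds details (such as which predicted box receives the classification loss) that this sketch leaves out.

```python
def joint_loss(sample_kind, losses):
    """Sum only the loss terms that should be backpropagated for one image.

    losses: dict with hypothetical keys "coord", "objectness", "class".
    Detection images use the full loss; classification images use only
    the classification-specific term.
    """
    if sample_kind == "detection":
        active = ("coord", "objectness", "class")
    elif sample_kind == "classification":
        active = ("class",)
    else:
        raise ValueError(f"unknown sample kind: {sample_kind!r}")
    return sum(losses[name] for name in active)


# A mixed batch: one detection image and one classification image.
batch = [
    ("detection", {"coord": 0.4, "objectness": 0.3, "class": 0.2}),
    ("classification", {"coord": 0.9, "objectness": 0.8, "class": 0.5}),
]
total = sum(joint_loss(kind, terms) for kind, terms in batch)
print(total)
```

The coordinate and objectness values of the classification image are ignored, since that image has no box labels to supervise them.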
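Hierarchical classification: the ImageNet labels are organized into a tree (WordTree) built from WordNet. At every node the model predicts a conditional probability, applying a softmax over each group of sibling labels rather than one softmax over all classes; the absolute probability of a label is the product of the conditional probabilities along its path to the root.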
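
A minimal sketch of hierarchical prediction on a hypothetical toy tree: a separate softmax over each sibling group yields Pr(node | parent), and multiplying those conditionals along the path to the root gives a label's absolute probability. The tree, the raw scores, and the function names are illustrative, not the real WordTree.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a dict of raw scores."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Hypothetical toy tree: each parent maps to its children (one sibling group).
CHILDREN = {
    "physical object": ["animal", "artifact"],
    "animal": ["dog", "cat"],
    "dog": ["terrier", "hound"],
}
PARENT = {child: p for p, kids in CHILDREN.items() for child in kids}
PARENT["physical object"] = None  # root, assumed to have Pr(root) = 1


def conditional_probs(raw_scores):
    """Apply a separate softmax to every sibling group to get Pr(node | parent)."""
    cond = {}
    for kids in CHILDREN.values():
        cond.update(softmax({k: raw_scores[k] for k in kids}))
    return cond


def absolute_probability(label, cond):
    """Multiply Pr(node | parent) from the label up to the root."""
    p = 1.0
    node = label
    while PARENT[node] is not None:
        p *= cond[node]
        node = PARENT[node]
    return p


scores = {"animal": 2.0, "artifact": 0.0, "dog": 1.0, "cat": 1.0,
          "terrier": 0.5, "hound": 0.5}
cond = conditional_probs(scores)
print(round(absolute_probability("terrier", cond), 4))
```

Because each softmax only normalizes within one sibling group, a classification image labeled "dog" can supervise the "dog vs. cat" choice without forcing a decision between "terrier" and "hound".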