READING NOTE: YOLO9000: Better, Faster, Stronger

TITLE: YOLO9000: Better, Faster, Stronger

AUTHOR: Joseph Redmon, Ali Farhadi

ASSOCIATION: University of Washington, Allen Institute for AI

FROM: arXiv:1612.08242

CONTRIBUTIONS

  1. several improvements have been made for YOLO.
  2. a new method is proposed to harness the large amount of classification data and use it to expand the scope of detection systems.
  3. a joint training algorithm is proposed that trains object detectors on both detection and classification data.

METHOD

The authors summarize the work as a better, faster and Stronger version of YOLO.

Better

  1. Batch Normalization

    Batch Normalization is used in this work. The authors claim that it helps YOLO get more than 2% improvement in mAP. Even though, I doubt BN would help or it might even worsen the performance in real world applications because of my own experience using BN.

  2. High Resolution Classifier

    Instead finetuned on 224*224 images, the classification network is finetuned on 448*448 images, which helps the network perform better on higher resolution. This high resolution classification network gives an increase of almost 4% mAP.

  3. Convolutional With Anchor Boxes

    In YOLOv2, anchor boxes and FCN manner are also adopted. This enbles the YOLO generate much more boxes, which improves recall from 81% (69.5 mAP) to 88% (69.2 mAP).

  4. Dimension Clusters

    Prior works usally define the anchor boxes by hand, for example 1:1, 1:2(2:1) or 1:3(3:1) in SSD. In this work, the anchor boxes are defined by clustering. K-means clustering is used and the distance metric is defined based on IOU, which eliminates the effect caused by the actual size of boxes: larger boxes generate more error than smaller boxes using Euclidean distance.

  5. Direct Location Prediction

    Instead of predicting offsets to the center of the bounding box, YOLO9000 predicts location coordinates relative to the location of the grid cell, which bounds the ground truth to fall between 0 and 1. Then constrained location prediction is easier to learn.

  6. Fine-Grained Features

    In order to ultize finer grained features for localizing smaller objects, the authors add a passthrough layer that brings features from an earlier layer.
    This is similar what has been done in ResNet.

  7. Multi-Scale Training

    Data of different resolutions are used to train the network. This regime forces the network to learn to predict well across a variety of input dimensions. This means the same network can predict detections at different resolutions.

Faster

  1. Instead of using VGG-16, The YOLO framework uses a custom network Darnet-19, which has has 19 convolutional layers and 5 maxpooling layers.

Stronger

  1. Hierarchical Classification

    A hierarchical prediction is built. Several nodes are added to build a tree. At each node, a semantic category is defined at a level. Thus images of different objects may be combined as one label because they belong to one higher level semantic label.

  2. Joint Classification and Detection

    Two datasets are used to train the large scale detetor. One is a traditional classification dataset, which contains a large number of categories. The other one is a detection dataset. When a detection image is seen, backpropagate loss as normal. For classification loss, only backpropagate loss at or above the corresponding level of the label.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值