2017-09-04 You Only Look Once: Unified, Real-Time Object Detection(YOLO)

Source (Zhihu): https://zhuanlan.zhihu.com/p/2491678


    1. Question
      1. Prior work on object detection repurposes classifiers to perform detection. Can object detection instead be framed as a regression problem?
      2. In prior pipelines, object proposal and classification are separate stages. Can a single model predict bounding boxes and class probabilities directly?
    2. Solution

Frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities.

    3. Advantages and Contributions
      1. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes.
      2. Simple and fast: no complex pipeline is needed, and the base network runs at 45 frames per second with no batch processing (on a Titan X GPU).
      3. YOLO reasons globally about the whole image when making predictions, so it makes less than half as many background errors as Fast R-CNN.
      4. The architecture is inspired by GoogLeNet, but instead of its Inception modules it simply uses 1 × 1 reduction layers followed by 3 × 3 convolutional layers, which greatly reduces computation and parameter count.
      5. The network predicts the square root of the bounding box width and height instead of the width and height directly, so that a given deviation counts for more in a small box than in a large one.
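A small numeric sketch of the square-root trick. The widths (0.1 and 0.8 of the image) and the 0.05 deviation are arbitrary example values, not numbers from the paper; the point is only to compare a squared error on the raw width with a squared error on its square root.

```python
import math


def raw_error(pred: float, true: float) -> float:
    """Squared error on the width itself."""
    return (pred - true) ** 2


def sqrt_error(pred: float, true: float) -> float:
    """Squared error on the square root of the width, as in YOLO's w/h loss terms."""
    return (math.sqrt(pred) - math.sqrt(true)) ** 2


delta = 0.05  # same absolute deviation, image-relative units (illustrative)
for label, true_w in [("small box", 0.1), ("large box", 0.8)]:
    pred_w = true_w + delta
    print(label, raw_error(pred_w, true_w), sqrt_error(pred_w, true_w))
```

The raw error is about 0.0025 for both boxes, while the square-root error is about 0.0051 for the small box versus about 0.00076 for the large one, roughly 6.7 times larger, so the loss pushes harder on localization errors in small boxes.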

    4. Weaknesses
      1. Each grid cell predicts only two boxes and can only have one class. This spatial constraint limits the number of nearby objects the model can predict, so it struggles with small objects that appear in groups (e.g. flocks of birds).
      2. The model uses relatively coarse features for predicting bounding boxes, since the architecture has multiple downsampling layers from the input image.
      3. The loss function treats errors the same in small and large bounding boxes, even though a small error in a small box has a much larger effect on IOU than the same error in a large box. Incorrect localization is the main source of error.
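To make the last weakness concrete, here is a small sketch that shifts a 20 × 20 box and a 200 × 200 box by the same 5 pixels and measures the IOU with the original. The box sizes and the shift are made-up example values, and `iou`/`shifted` are hypothetical helpers written for this illustration.

```python
def iou(a, b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def shifted(box, dx):
    """The same box moved dx pixels to the right."""
    return (box[0] + dx, box[1], box[2] + dx, box[3])


small = (100, 100, 120, 120)  # 20 x 20 pixels
large = (100, 100, 300, 300)  # 200 x 200 pixels
print(iou(small, shifted(small, 5)))  # 0.6
print(iou(large, shifted(large, 5)))  # about 0.951
```

The same 5-pixel error costs the small box 40% of its IOU but the large box under 5%, which is why an error measure that treats both equally is a poor fit for detection quality.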
    5. Model overview

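Define the confidence of each predicted box as:

$$C = \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$$

If no object exists in the cell, the confidence should be zero; otherwise it should equal the IOU between the predicted box and the ground truth.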

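Class-specific confidence scores for each box (used at test time):

$$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$$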
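The two definitions above can be sketched in a few lines of plain Python. This is an illustration, not the paper's code: `confidence_target` and `class_specific_scores` are hypothetical names, and the example cell uses only 3 classes instead of 20 to keep it short.

```python
def confidence_target(has_object: bool, iou_with_truth: float) -> float:
    """Training target for a box's confidence: C = Pr(Object) * IOU(truth, pred).

    Pr(Object) is taken as 1 if an object is present in the cell, else 0.
    """
    pr_object = 1.0 if has_object else 0.0
    return pr_object * iou_with_truth


def class_specific_scores(class_probs, box_confs):
    """Test-time scores Pr(Class_i | Object) * Pr(Object) * IOU for one grid cell.

    class_probs: conditional class probabilities Pr(Class_i | Object) of the cell.
    box_confs:   confidence C of each of the cell's B boxes.
    Returns a B x C nested list; entry [j][i] scores class i for box j.
    """
    return [[conf * p for p in class_probs] for conf in box_confs]


# Hypothetical cell with B = 2 boxes and 3 classes.
scores = class_specific_scores([0.7, 0.2, 0.1], [0.9, 0.3])
print(scores)
```

Because the class probabilities are conditioned on an object being present, multiplying by the box confidence yields a score that encodes both how likely the class is and how well the box fits.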

 

 
 


[Figure: Detection principle]

[Figure: Network structure]

Alternating 1 × 1 convolutional layers reduce the feature space from the preceding layers. The convolutional layers are pretrained on the ImageNet classification task at half the resolution (224 × 224 input images), and the resolution is then doubled to 448 × 448 for detection.
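A rough parameter count shows why a 1 × 1 reduction layer in front of a 3 × 3 convolution is cheaper. The channel counts (512 in, reduced to 256, 512 out) are illustrative assumptions, not a claim about one specific layer of the network, and `conv_params` is a hypothetical helper.

```python
def conv_params(in_ch: int, out_ch: int, k: int) -> int:
    """Learnable parameters of a k x k convolution: weights plus one bias per output channel."""
    return in_ch * out_ch * k * k + out_ch


# Illustrative channel counts (assumed): 512 input channels, 512 output channels.
direct = conv_params(512, 512, 3)                              # plain 3 x 3 conv
reduced = conv_params(512, 256, 1) + conv_params(256, 512, 3)  # 1 x 1 reduce to 256, then 3 x 3
print(direct, reduced, reduced / direct)  # direct = 2,359,808; reduced = 1,311,488 (about 56%)
```

Since the multiply-adds of a convolution scale with its weight count times the output's spatial size, the same roughly 44% saving applies to computation at a fixed feature-map resolution.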

Implementation steps:

1) Resize the input image to 448 × 448 and divide it into an S × S grid (S = 7).

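2) Pass the image through the network of 24 convolutional layers followed by 2 fully connected layers. For each grid cell, the network outputs a vector of B × 5 + C = 30 values: B = 2 bounding boxes with 5 predictions each (x, y, w, h, and confidence), plus C = 20 class probabilities, one per labelled PASCAL VOC class. Here (x, y) is the box center relative to the grid cell, and w, h are relative to the whole image. The final output of the network is therefore a 7 × 7 × 30 tensor of predictions.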

 

[Figure: The multi-part loss function at train time]

3) Non-maximal suppression is applied to obtain the final detections.
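The test-time steps can be sketched end to end in pure Python. This is a minimal sketch, not the paper's Darknet implementation: the ordering of the 30 values inside each cell's vector, the assumption that the w and h outputs are square roots, the 0.2 score threshold, the 0.5 IOU threshold and the greedy per-class NMS variant are all choices made for illustration, and `decode`/`nms` are hypothetical helper names. Boxes come out in image-relative coordinates; multiply by the image size to get pixels.

```python
from typing import List, Tuple

S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), image-relative
Detection = Tuple[float, int, Box]       # (class-specific score, class id, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def decode(pred, score_threshold: float = 0.2) -> List[Detection]:
    """Turn an S x S x (B*5 + C) prediction into scored boxes.

    ASSUMED per-cell layout: [x, y, sqrt_w, sqrt_h, conf] for each of the B boxes,
    followed by the C conditional class probabilities.
    """
    detections: List[Detection] = []
    for row in range(S):
        for col in range(S):
            cell = pred[row][col]
            class_probs = cell[B * 5:]
            for j in range(B):
                x, y, sqrt_w, sqrt_h, conf = cell[j * 5:(j + 1) * 5]
                cx, cy = (col + x) / S, (row + y) / S  # cell offset -> image coords
                w, h = sqrt_w ** 2, sqrt_h ** 2         # undo the square root
                box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
                for c, p in enumerate(class_probs):
                    score = conf * p  # class-specific confidence score
                    if score >= score_threshold:
                        detections.append((score, c, box))
    return detections


def nms(detections: List[Detection], iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy per-class non-maximal suppression: keep the best box, drop its overlaps."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(k[1] != det[1] or iou(k[2], det[2]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Toy prediction: one cell holds two nearly identical boxes for class 11.
pred = [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]
cell = pred[3][3]
cell[0:5] = [0.5, 0.5, 0.6, 0.6, 0.9]
cell[5:10] = [0.55, 0.5, 0.6, 0.6, 0.6]
cell[B * 5 + 11] = 1.0
final = nms(decode(pred))
print(final)  # a single detection for class 11 with score 0.9
```

Both predictors in the toy cell fire for the same class, and NMS keeps only the higher-scoring one because the two boxes overlap with an IOU of about 0.96.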
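For reference, the multi-part training loss, transcribed from the YOLO paper with its notation. Here $\mathbb{1}_{ij}^{\text{obj}}$ is 1 when the $j$-th box predictor in cell $i$ is "responsible" for an object (it has the highest IOU with the ground truth among that cell's predictors), $\mathbb{1}_{ij}^{\text{noobj}}$ marks predictors that are not responsible for any object, $\mathbb{1}_{i}^{\text{obj}}$ is 1 when an object appears in cell $i$, $C_i$ is the box confidence defined earlier, and the paper sets $\lambda_{\text{coord}} = 5$ and $\lambda_{\text{noobj}} = 0.5$. The square roots on $w$ and $h$ are the trick listed among the advantages, and the classification term is only penalized when an object is present in the cell.

$$
\begin{aligned}
&\lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
+\, &\lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
+\, &\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2 \\
+\, &\lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
+\, &\sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
$$

The small $\lambda_{\text{noobj}}$ keeps the many cells without objects from overwhelming the gradient from the few cells that do contain objects.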