Detection Review, Part 1: YOLOv1

A walkthrough of the YOLO v1 paper

All of the points below are excerpted from the paper.

Strengths and weaknesses

Strengths:

  1. First, YOLO is extremely fast
  2. YOLO sees the entire image during training and test time, so it implicitly encodes contextual information about classes as well as their appearance. YOLO makes less than half the number of background errors compared to Fast R-CNN.
  3. YOLO learns generalizable representations of objects. Since YOLO is highly generalizable, it is less likely to break down when applied to new domains or unexpected inputs.

Weaknesses:

  1. While it can quickly identify objects in images, it struggles to precisely localize some objects, especially small ones.
  2. Each grid cell only predicts two boxes and can only have one class. This spatial constraint limits the number of nearby objects that our model can predict. Our model struggles with small objects that appear in groups.
  3. Since our model learns to predict bounding boxes from data, it struggles to generalize to objects in new or unusual aspect ratios or configurations.
  4. Our model also uses relatively coarse features for predicting bounding boxes since our architecture has multiple downsampling layers from the input image
  5. our loss function treats errors the same in small bounding boxes versus large bounding boxes
The paper's approach
  • Our system divides the input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
  • Each grid cell predicts B bounding boxes and confidence scores for those boxes
  • Each bounding box consists of 5 predictions: x, y, w, h, and confidence
  • Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object). We only predict one set of class probabilities per grid cell, regardless of the number of boxes B
  • These predictions are encoded as an S × S × (B ∗ 5 + C) tensor.
  • we use S = 7, B = 2. PASCAL VOC has 20 labelled classes so C = 20. Our final prediction is a 7 × 7 × 30 tensor (a decoding sketch follows below)
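
The shape bookkeeping above is easy to get wrong, so here is a minimal decoding sketch. It is not the authors' code: NumPy, the variable names, and in particular the channel ordering (boxes first, then class probabilities) are my assumptions; the paper only specifies the overall S × S × (B ∗ 5 + C) layout.

```python
import numpy as np

S, B, C = 7, 2, 20                      # grid size, boxes per cell, VOC classes
pred = np.random.rand(S, S, B * 5 + C)  # stand-in for one network output (7 x 7 x 30)

# Assumed layout: the first B*5 channels hold (x, y, w, h, confidence) per box.
boxes = pred[..., :B * 5].reshape(S, S, B, 5)
xywh = boxes[..., :4]      # x, y: offsets inside the cell; w, h: relative to the image
box_conf = boxes[..., 4]   # confidence = Pr(Object) * IOU(pred, truth)

# Remaining C channels: one set of conditional class probabilities per cell.
class_probs = pred[..., B * 5:]                                # Pr(Class_i | Object), (7, 7, 20)

# Class-specific confidence score for every box: Pr(Class_i) * IOU.
class_conf = box_conf[..., None] * class_probs[:, :, None, :]  # (7, 7, 2, 20)
print(xywh.shape, box_conf.shape, class_conf.shape)
```

At test time these class-specific scores are what get thresholded and passed to non-maximum suppression.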

Network architecture

[Figure: the YOLO detection network architecture]

  • Our detection network has 24 convolutional layers followed by 2 fully connected layers
  • We pretrain the convolutional layers on the ImageNet classification task at half the resolution (224 × 224 input image) and then double the resolution for detection.
  • Detection often requires fine-grained visual information so we increase the input resolution of the network from 224 × 224 to 448 × 448.
  • We normalize the bounding box width and height by the image width and height so that they fall between 0 and 1.
  • We parametrize the bounding box x and y coordinates to be offsets of a particular grid cell location so they are also bounded between 0 and 1
  • We use a linear activation function for the final layer and all other layers use the following leaky rectified linear activation:
    $$
    \phi(x) =
    \begin{cases}
    x, & \text{if } x > 0 \\
    0.1x, & \text{otherwise}
    \end{cases}
    $$
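
As a quick illustration of the two parametrizations above, here is a small sketch assuming a 448 × 448 input and the S = 7 grid; the helper names (encode_box, leaky_relu) are mine, not the paper's.

```python
import numpy as np

S = 7  # grid size used in the paper

def encode_box(cx, cy, w, h, img_w, img_h, S=S):
    """Map an absolute box (center cx, cy and size w, h, in pixels) to the
    regression targets: cell offsets and image-normalized sizes, all in [0, 1]."""
    col = int(cx / img_w * S)      # grid cell containing the box center
    row = int(cy / img_h * S)
    x = cx / img_w * S - col       # offset of the center inside its cell
    y = cy / img_h * S - row
    return row, col, x, y, w / img_w, h / img_h

def leaky_relu(x):
    """phi(x) = x for x > 0, 0.1 * x otherwise (all layers except the last)."""
    return np.where(x > 0, x, 0.1 * x)

# A box centered at (224, 112) with size 100 x 50 in a 448 x 448 image
# lands in cell (row 1, col 3) with offsets (0.5, 0.75).
print(encode_box(224, 112, 100, 50, img_w=448, img_h=448))
```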

Loss design

  • We use sum-squared error because it is easy to optimize
  • It weights localization error equally with classification error, which may not be ideal
  • in every image many grid cells do not contain any object. This pushes the “confidence” scores of those cells towards zero, often overpowering the gradient from cells that do contain objects. This can lead to model instability, causing training to diverge early on.
  • To remedy this, we increase the loss from bounding box coordinate predictions and decrease the loss from confidence predictions for boxes that don’t contain objects.
  • We use two parameters, λ_coord and λ_noobj, to accomplish this. We set λ_coord = 5 and λ_noobj = 0.5.
  • Sum-squared error also equally weights errors in large boxes and small boxes. Our error metric should reflect that small deviations in large boxes matter less than in small boxes
  • To partially address this we predict the square root of the bounding box width and height instead of the width and height directly
  • the loss function only penalizes classification error if an object is present in that grid cell (hence the conditional class probability discussed earlier)
  • It also only penalizes bounding box coordinate error if that predictor is “responsible” for the ground truth box (i.e. has the highest IOU of any predictor in that grid cell).
  • During training we optimize the following, multi-part loss function:
    $$
    \begin{aligned}
    & \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
    +\, & \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
    +\, & \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2
    + \lambda_{noobj} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\
    +\, & \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left(p_i(c)-\hat{p}_i(c)\right)^2
    \end{aligned}
    $$
    where 𝟙_i^obj denotes whether an object appears in cell i, and 𝟙_ij^obj denotes that the j-th bounding box predictor in cell i is "responsible" for that prediction.
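
To make the weighting concrete, here is a simplified single-cell sketch of this loss, assuming the "responsible" box has already been picked by highest IOU; the full loss sums such terms over all S² cells and also pushes the confidences of empty cells toward zero. Plain NumPy, and the function and argument names are my own, not the reference implementation.

```python
import numpy as np

LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5   # the paper's weighting parameters

def cell_loss(pred_box, true_box, pred_conf, iou, other_conf,
              pred_class, true_class):
    """Loss contribution of one grid cell that contains an object.
    pred_box / true_box: (x, y, w, h); pred_conf: confidence of the responsible
    box; iou: its IOU with the ground truth (the confidence target); other_conf:
    confidence of the non-responsible box in the same cell; pred_class /
    true_class: length-C class probability vectors."""
    px, py, pw, ph = pred_box
    tx, ty, tw, th = true_box

    # Coordinates: centers directly, sizes through square roots so that the same
    # absolute deviation costs more in a small box than in a large one.
    coord = LAMBDA_COORD * ((px - tx) ** 2 + (py - ty) ** 2
                            + (np.sqrt(pw) - np.sqrt(tw)) ** 2
                            + (np.sqrt(ph) - np.sqrt(th)) ** 2)

    # Confidence: the responsible box is pushed toward its IOU; the other box is
    # treated as containing no object and pushed toward zero with a small weight.
    conf = (pred_conf - iou) ** 2 + LAMBDA_NOOBJ * other_conf ** 2

    # Classification: sum-squared error over class probabilities, computed only
    # because this cell contains an object.
    cls = np.sum((np.asarray(pred_class) - np.asarray(true_class)) ** 2)

    return coord + conf + cls

# Toy numbers only, to show the relative weighting of the three terms:
print(cell_loss(pred_box=(0.4, 0.6, 0.20, 0.10), true_box=(0.5, 0.5, 0.25, 0.12),
                pred_conf=0.8, iou=0.6, other_conf=0.3,
                pred_class=[0.05] * 20, true_class=[0.0] * 19 + [1.0]))
```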