READING NOTE: SSD: Single Shot MultiBox Detector

TITLE: SSD: Single Shot MultiBox Detector

AUTHORS: Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

FROM: arXiv:1512.02325v2

CONTRIBUTIONS

  1. SSD, a single-shot detector for multiple categories, is introduced; it is both fast and accurate.
  2. The network is simple to train end to end and reaches high accuracy even with relatively low-resolution input images, which further improves the speed vs. accuracy trade-off.

METHOD

Network structure:

  1. Multi-scale feature maps from different layers are used to handle objects of different sizes.
  2. On each feature map used for detection, a dedicated small network (a set of convolutional filters) learns to predict category scores and location offsets.
  3. Each feature map corresponds to a fixed set of default boxes. These default boxes have different aspect ratios.
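
The size of the prediction layer follows directly from the three points above: every default box at every feature-map location gets one score per category plus four location offsets. The sketch below only counts those outputs. The function name, the feature-map sizes, the boxes per location, and the class count of 21 are illustrative assumptions, not values taken from this note.

```python
def predictions_per_feature_map(rows, cols, boxes_per_location, num_classes):
    """Count the outputs predicted on one feature map.

    Each default box gets num_classes category scores and 4 location offsets.
    """
    per_box = num_classes + 4
    return rows * cols * boxes_per_location * per_box


# Hypothetical configuration: (rows, cols, default boxes per location).
example_maps = [(8, 8, 4), (4, 4, 6)]
example_classes = 21  # assumed: 20 object categories plus background

total_outputs = sum(
    predictions_per_feature_map(r, c, k, example_classes)
    for r, c, k in example_maps
)
```

Coarser feature maps contribute far fewer outputs, so adding extra scales is cheap compared with the finest map.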

Training:

  1. Default boxes and ground-truth boxes are matched. Each ground-truth box is first matched to the default box with the best Jaccard overlap. In addition, default boxes are matched to any ground truth with Jaccard overlap higher than a threshold.
  2. The training objective is a weighted sum of the localization loss (loc) and the confidence loss (conf):
    $L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$

    where N is the number of matched default boxes, x indicates which default boxes are matched to which ground-truth boxes, and α weights the localization term. The localization loss is the Smooth L1 loss between the parameters of the predicted box (l) and the ground-truth box (g). The confidence loss is the softmax loss over the multi-class confidences (c).
  3. The scale of the default boxes for the kth feature map (k = 1, ..., m) is computed as:
    $s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1)$

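    where $s_{min} = 0.2$, $s_{max} = 0.95$, and m is the number of feature maps used for prediction. The width of a default box is $s_k\sqrt{a_r}$ and the height is $s_k/\sqrt{a_r}$, where $a_r$ is the aspect ratio. The centre of a default box at location (i, j) in the kth feature map is $\left(\frac{i + 0.5}{|f_k|}, \frac{j + 0.5}{|f_k|}\right)$, where $|f_k|$ is the size of the kth square feature map.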
  4. Hard negatives are mined. The unmatched default boxes are sorted by confidence, and the top ones are kept as hard negatives so that the ratio of negatives to positives is at most 3:1.
  5. Data augmentation: each training image is either used in its entirety or replaced by a sampled patch whose minimum Jaccard overlap with the objects is 0.1, 0.3, 0.5, 0.7, or 0.9.
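
The training steps above can be sketched in plain Python: the scale formula, the default-box geometry, the two matching rules, and the 3:1 hard-negative cap. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical, boxes are (cx, cy, w, h) tuples in relative coordinates, the 0.5 matching threshold is an assumed value (the note only says "a threshold"), and each default box is assigned to its single best-overlapping ground truth as a simplification.

```python
import math


def default_box_scales(m, s_min=0.2, s_max=0.95):
    """Scale s_k for each of the m feature maps, k = 1..m."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_boxes(fk, scale, aspect_ratios):
    """Default boxes (cx, cy, w, h) for an fk x fk feature map."""
    boxes = []
    for i in range(fk):
        for j in range(fk):
            cx, cy = (i + 0.5) / fk, (j + 0.5) / fk
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes


def jaccard(a, b):
    """Jaccard overlap (intersection over union) of two (cx, cy, w, h) boxes."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match(defaults, truths, threshold=0.5):
    """Return {default_index: truth_index} using both matching rules."""
    matches = {}
    if not defaults or not truths:
        return matches
    # Threshold rule: keep a default box if its best overlap exceeds the threshold.
    for d, box in enumerate(defaults):
        best_t = max(range(len(truths)), key=lambda t: jaccard(box, truths[t]))
        if jaccard(box, truths[best_t]) > threshold:
            matches[d] = best_t
    # Best-overlap rule: every ground truth claims its best default box.
    for t, gt in enumerate(truths):
        best_d = max(range(len(defaults)), key=lambda d: jaccard(defaults[d], gt))
        matches[best_d] = t
    return matches


def hard_negatives(confidences, matches, ratio=3):
    """Unmatched default boxes with the highest confidence, capped at ratio:1."""
    negatives = [d for d in range(len(confidences)) if d not in matches]
    negatives.sort(key=lambda d: confidences[d], reverse=True)
    return negatives[: ratio * len(matches)]


# Usage on a toy 2 x 2 feature map with one ground-truth box.
boxes = default_boxes(2, 0.5, [1.0, 2.0])
assigned = match(boxes, [(0.25, 0.25, 0.5, 0.5)])
```

Applying the threshold rule first and the best-overlap rule second guarantees that every ground-truth box ends up with at least one positive default box, even when no overlap clears the threshold.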

ADVANTAGES

  1. It is fast because detection takes a single shot (one forward pass) and the input resolution is low.
  2. Multi-scale feature maps let it handle objects of different sizes.
  3. End-to-end training.