DenseBox: Unifying Landmark Localization with End to End Object Detection (Reading Notes)

1,(CHARACTERISTIC) A single FCN performs object detection: it directly predicts bounding boxes and object class confidences at all locations and scales of an image, and does not require proposal generation.

2,(MOTIVATION) ① R-CNN struggles to detect small objects, since the low resolution and lack of context in each candidate box significantly decrease classification accuracy on them. ② R-CNN with proposal methods designed for general object detection can give inferior performance on tasks such as face detection, due to low recall for small faces and for faces with complex appearance variations.

3,(MERIT) Detects objects at different scales and under heavy occlusion, accurately and efficiently.

4,(SIMILAR) YOLO also predicts bounding boxes and class probabilities directly from full images in one evaluation.

5,(NETWORK)
5.1 The single convolutional network simultaneously outputs multiple predicted bounding boxes and class confidences.
5.2 The system takes an image (m×n) as input and outputs an (m/4 × n/4) feature map with 5 channels.

6,(KERNEL) Define the top-left and bottom-right points of the target bounding box in output coordinate space as $p_t = (x_t, y_t)$ and $p_b = (x_b, y_b)$ respectively. Then each pixel $i$ located at $(x_i, y_i)$ in the output feature map describes a bounding box with the 5-dimensional vector $t_i = \{score_i,\ x_i - x_t,\ y_i - y_t,\ x_i - x_b,\ y_i - y_b\}$.
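To make the encoding concrete, here is a minimal NumPy sketch that inverts it: given a 5-channel output map, it recovers the box corners at every pixel whose score passes a threshold. The function name `decode_boxes`, the `(5, H, W)` channel layout, and the threshold value are assumptions for illustration, not something taken from the paper.

```python
import numpy as np

def decode_boxes(output, score_threshold=0.5):
    """Convert a (5, H, W) output map into boxes (hypothetical helper).

    Channel 0 is the score; channels 1-4 hold x_i - x_t, y_i - y_t,
    x_i - x_b and y_i - y_b in output-map coordinates.
    Returns an (N, 5) array of [x_t, y_t, x_b, y_b, score].
    """
    score, dxt, dyt, dxb, dyb = output
    # Pixel coordinates (row = y_i, column = x_i) that pass the threshold.
    ys, xs = np.nonzero(score >= score_threshold)
    # Invert the encoding: x_t = x_i - (x_i - x_t), and so on.
    x_t = xs - dxt[ys, xs]
    y_t = ys - dyt[ys, xs]
    x_b = xs - dxb[ys, xs]
    y_b = ys - dyb[ys, xs]
    return np.stack([x_t, y_t, x_b, y_b, score[ys, xs]], axis=1)

# Toy map: one confident pixel at (30, 30) describing the box (24, 24)-(36, 36).
out = np.zeros((5, 60, 60), dtype=np.float32)
out[:, 30, 30] = [0.9, 30 - 24, 30 - 24, 30 - 36, 30 - 36]
boxes = decode_boxes(out)
```

In a real pipeline the decoded boxes would still need to be mapped back to input-image coordinates and de-duplicated; both steps are left out here.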

7,(TRAIN DATA) Crop large patches containing faces and sufficient background information at a single scale for training. Specifically, the patches are cropped and resized to $240 \times 240$, with an object of roughly $50$ pixels in height at the center. Each pixel can be treated as one sample, since every 5-channel pixel describes a bounding box.

8,(LABEL) The positively labeled region in the first channel of the ground-truth map is a filled circle with radius $r_c$, located at the center of a face bounding box. The remaining 4 channels are filled with the distances between each pixel location in the output map and the top-left and bottom-right corners of the nearest bounding box.
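The label construction can be sketched in NumPy for the simplest case of a single box, where "nearest bounding box" is trivially that box. The function name `make_label_map`, the map size of 60 (240/4), and the default radius are illustrative assumptions; these notes do not record how $r_c$ is chosen.

```python
import numpy as np

def make_label_map(box, size=60, radius=3):
    """Build a (5, size, size) ground-truth map for one box (hypothetical helper).

    box = (x_t, y_t, x_b, y_b) in output-map coordinates.
    """
    x_t, y_t, x_b, y_b = box
    # ys[r, c] = r and xs[r, c] = c: the coordinates of every output pixel.
    ys, xs = np.mgrid[0:size, 0:size].astype(np.float32)
    cx, cy = (x_t + x_b) / 2.0, (y_t + y_b) / 2.0

    label = np.zeros((5, size, size), dtype=np.float32)
    # Channel 0: filled circle of radius r_c around the box center.
    label[0] = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    # Channels 1-4: offsets from each pixel to the two box corners.
    label[1] = xs - x_t
    label[2] = ys - y_t
    label[3] = xs - x_b
    label[4] = ys - y_b
    return label

label = make_label_map((24, 24, 36, 36))
```

With several faces in a patch, each pixel would take its offsets from the nearest box instead, which this sketch does not handle.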

9,(NETWORK)
9.1 (INITIALIZATION) The whole network has $16$ convolution layers, with the first $12$ initialized from the VGG-19 model.
9.2 (FEATURE FUSION) We concatenate the feature maps from conv3-4 and conv4-4, using a bilinear up-sampling layer to bring them to the same resolution.

10, (LOSS)
10.1 (BALANCE SAMPLE) Ignoring Gray Zone and Hard Negative Mining. We use a binary mask for each output pixel to indicate whether it is selected in training.
10.2 We normalize the regression target d by dividing by the standard object height.
10.4 Classification loss: $L_{cls} = \| y - y^* \|^2$; BBR loss: $L_{loc} = \sum_{i \in \{tx, ty, bx, by\}} \| d_i - d_i^* \|$.
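As a rough illustration of how the binary mask and the two loss terms could combine, here is a NumPy sketch that follows the formulas exactly as written in these notes. Several things are assumptions rather than facts from the source: the function name `masked_loss`, the weight `loc_weight`, summing instead of averaging, and applying the regression term only on positive pixels.

```python
import numpy as np

def masked_loss(pred, target, mask, loc_weight=1.0):
    """Masked detection loss over a (5, H, W) map (hypothetical helper).

    pred, target: channel 0 is the score, channels 1-4 the (normalized) offsets.
    mask: (H, W) binary map marking the pixels selected for training.
    """
    # L_cls per pixel: squared error on the score channel.
    cls = (pred[0] - target[0]) ** 2
    # L_loc per pixel: sum over the four offsets of |d_i - d_i*|.
    loc = np.abs(pred[1:] - target[1:]).sum(axis=0)
    # Assumption: regression only contributes where the label is positive.
    positive = target[0] > 0
    per_pixel = cls + loc_weight * positive * loc
    return float((mask * per_pixel).sum())

# Toy 2x2 example: one positive pixel, all predictions zero.
pred = np.zeros((5, 2, 2))
target = np.zeros((5, 2, 2))
target[0, 0, 0] = 1.0
target[1:, 0, 0] = 0.5
loss_all = masked_loss(pred, target, np.ones((2, 2)))
```

Pixels in the gray zone, or negatives not chosen by hard negative mining, would simply get a 0 in `mask` and drop out of the sum.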

11,(AUGMENTATIONS) We apply left-right flip, translation shift (of 25 pixels), and scale deformation (from [0.8, 1.25]).
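One way to draw these augmentation parameters per training patch is sketched below. The uniform sampling, the reading of the 25-pixel shift as a range of ±25 in each axis, and the name `sample_augmentation` are all assumptions; the notes only list the three augmentation types and their magnitudes.

```python
import numpy as np

def sample_augmentation(rng, max_shift=25, scale_range=(0.8, 1.25)):
    """Draw one random set of augmentation parameters (hypothetical helper)."""
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    return {
        "flip": bool(rng.integers(0, 2)),           # left-right flip or not
        "shift": (int(dx), int(dy)),                # translation in pixels
        "scale": float(rng.uniform(*scale_range)),  # scale deformation factor
    }

rng = np.random.default_rng(0)
params = sample_augmentation(rng)
```

The sampled values would then be applied to both the image patch and its ground-truth boxes before the label map is built.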
