High Performance Visual Tracking with Siamese Region Proposal Network: Reading Notes


1,(IDEA) In a tracking task there are no pre-defined categories, so the template branch is needed to encode the target's appearance information into the RPN feature map and discriminate foreground from background.

2,(RPN) RPN has many successful applications in detection because of its speed and strong performance; however, it has not been fully exploited in tracking.

3,(NETWORK) We use a modified AlexNet in which the groups from conv2 and conv4 are removed.

4,(KERNEL) The template feature maps $[\varphi(z)]_{cls}$ and $[\varphi(z)]_{reg}$ are used as kernels.
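
To make note 4 concrete, here is a minimal single-channel sketch of using a template feature map as a correlation kernel over the search feature map. It is written in plain Python for readability; the function name `cross_correlate` and the toy feature values are assumptions for illustration, and a real implementation would run a multi-channel convolution in PyTorch instead.

```python
def cross_correlate(search, kernel):
    """Valid-mode 2D cross-correlation of one feature channel with one kernel."""
    sh, sw = len(search), len(search[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(sh - kh + 1):
        row = []
        for j in range(sw - kw + 1):
            acc = 0.0
            # Slide the template over the search feature and take the dot product.
            for u in range(kh):
                for v in range(kw):
                    acc += search[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


# Toy data (hypothetical): the template is a 2x2 patch cut from the search feature.
search_feat = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 0.0],
    [0.0, 3.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
template_feat = [[1.0, 2.0], [3.0, 4.0]]
response = cross_correlate(search_feat, template_feat)
```

The response map peaks where the search feature matches the template, which is the signal the classification and regression branches build on.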

5,(LOSS) The classification branch is supervised with a softmax cross-entropy loss, and the regression branch with a smooth L1 loss on normalized coordinates.
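
The two losses in note 5 can be sketched as follows. This is a minimal sketch under assumptions: boxes are given as (center x, center y, width, height), offsets are normalized in the usual RPN style (center shift divided by anchor size, log ratio for width and height), and `sigma` is a hypothetical smoothing parameter set to 1.

```python
import math


def normalized_offsets(anchor, target):
    """RPN-style normalized regression targets; boxes are (cx, cy, w, h)."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = target
    return [(tx - ax) / aw, (ty - ay) / ah, math.log(tw / aw), math.log(th / ah)]


def smooth_l1(x, sigma=1.0):
    """Quadratic near zero, linear elsewhere."""
    s2 = sigma * sigma
    if abs(x) < 1.0 / s2:
        return 0.5 * s2 * x * x
    return abs(x) - 0.5 / s2


def regression_loss(predicted, anchor, target, sigma=1.0):
    """Sum of smooth L1 over the four normalized offsets."""
    goal = normalized_offsets(anchor, target)
    return sum(smooth_l1(p - g, sigma) for p, g in zip(predicted, goal))


def cross_entropy(logits, label):
    """Softmax cross-entropy for one anchor (numerically stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]
```

For example, `regression_loss([0.0, 0.0, 0.0, 0.0], (10, 10, 20, 40), (15, 10, 20, 40))` penalizes only the horizontal shift, since the other three normalized offsets are zero.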

6,(DATA) During the training phase, sample pairs are picked from ILSVRC with a random interval and from Youtube-BB continuously. Image pairs are extracted from VID and Youtube-BB by choosing frames with an interval of less than 100 and then applying a further crop procedure.

7,(TRAIN) We train Siamese-RPN end-to-end using Stochastic Gradient Descent (SGD) after the Siamese subnetwork has been pretrained on ImageNet.

8,(AUGMENTATIONS) Because the regression branch needs to be trained, some data augmentations are adopted, including affine transformation.

9,(SAMPLE) The criterion from object detection is adopted here: IoU is used together with two thresholds, 0.6 and 0.3. Anchors with IoU above 0.6 against the ground truth are positive, and those with IoU below 0.3 are negative.

10,(SAMPLE) We also limit each training pair to at most 16 positive samples and 64 samples in total.
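
Notes 9 and 10 together describe a sample selection step that can be sketched as below. The sketch assumes anchors above the high threshold count as positive and anchors below the low threshold as negative, with everything in between ignored. Random subsampling of the excess samples is also an assumption, since the notes do not say how the limits are enforced. The names `iou` and `select_samples` are hypothetical.

```python
import random


def iou(a, b):
    """Intersection over union of two corner-format boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_samples(anchors, gt, hi=0.6, lo=0.3, max_pos=16, total=64, rng=None):
    """Return (positive indices, negative indices) for one training pair."""
    rng = rng or random.Random(0)
    pos = [i for i, a in enumerate(anchors) if iou(a, gt) > hi]
    neg = [i for i, a in enumerate(anchors) if iou(a, gt) < lo]
    # Assumed policy: randomly drop positives beyond the cap.
    if len(pos) > max_pos:
        pos = rng.sample(pos, max_pos)
    # Fill the remaining budget with negatives.
    neg = rng.sample(neg, min(len(neg), total - len(pos)))
    return pos, neg
```

With 20 well-aligned anchors and 100 far-away ones, this keeps 16 positives and 48 negatives, so the pair contributes 64 samples in total.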

11,(TRICK) The first proposal selection strategy is to discard the bounding boxes generated by anchors too far away from the center. We only keep the center 7×7 anchors.

12,(TRICK) The second proposal selection strategy is to use a cosine window and a scale change penalty to re-rank the proposals' scores and pick the best one.
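
A rough sketch of the re-ranking in note 12 follows. Proposals are assumed to have already been restricted to the center 7×7 grid from note 11. The exponential form of the penalty, the way the window is blended with the score, and the values of `k` and `window_weight` are all assumptions chosen for illustration, not values taken from these notes.

```python
import math


def hann(n):
    """1D cosine (Hann) window: 0 at the edges, 1 at the center."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def padded_size(w, h):
    """Overall scale of a box with context padding p = (w + h) / 2."""
    p = (w + h) / 2.0
    return math.sqrt((w + p) * (h + p))


def rerank(proposals, prev_w, prev_h, grid=7, k=0.05, window_weight=0.4):
    """proposals: list of (row, col, w, h, score). Returns index of the best one."""
    win = hann(grid)
    finals = []
    for row, col, w, h, score in proposals:
        s = padded_size(w, h) / padded_size(prev_w, prev_h)  # size change
        r = (w / h) / (prev_w / prev_h)                      # aspect-ratio change
        change = max(s, 1.0 / s) * max(r, 1.0 / r)           # equals 1 if unchanged
        penalty = math.exp(-k * (change - 1.0))              # assumed penalty form
        # Assumed blend: penalized score mixed with the 2D cosine window.
        final = (1.0 - window_weight) * score * penalty
        final += window_weight * win[row] * win[col]
        finals.append(final)
    return max(range(len(finals)), key=finals.__getitem__)
```

Under these assumptions, a central proposal whose size matches the previous frame can outrank a corner proposal with a higher raw score, which is the intended effect of suppressing large displacements and abrupt size changes.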

13,(TRICK) After the final bounding box is selected, the target size is updated by linear interpolation so that the shape changes smoothly.
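
The size update in note 13 amounts to a linear interpolation between the previous size and the newly selected one. In this minimal sketch the interpolation rate `lr` is a hypothetical parameter, since the notes do not give its value.

```python
def smooth_size(prev_size, new_size, lr=0.3):
    """Linearly interpolate (width, height) from the previous frame toward the new box."""
    pw, ph = prev_size
    nw, nh = new_size
    return ((1.0 - lr) * pw + lr * nw, (1.0 - lr) * ph + lr * nh)
```

For example, `smooth_size((100.0, 50.0), (120.0, 60.0), lr=0.5)` moves the size halfway toward the new box.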

14,(TRAIN) We use a modified AlexNet pretrained on ImageNet, with the parameters of the first three convolution layers fixed, and fine-tune only the last two convolution layers in Siamese-RPN.

15,(TRAIN) A total of 50 epochs are performed, and the learning rate is decreased in log space from $10^{-2}$ to $10^{-6}$.
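
A learning rate that decreases in log space, as in note 15, can be generated as below. The sketch assumes one value per epoch with both endpoints included, so consecutive epochs differ by a constant factor. The function name `log_space_schedule` is hypothetical.

```python
import math


def log_space_schedule(start=1e-2, end=1e-6, epochs=50):
    """Per-epoch learning rates, evenly spaced in log10 between start and end."""
    a, b = math.log10(start), math.log10(end)
    return [10.0 ** (a + (b - a) * e / (epochs - 1)) for e in range(epochs)]


lrs = log_space_schedule()
```

Each entry of `lrs` would be applied for one epoch, for instance by assigning it to the optimizer's learning rate at the start of that epoch.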

16,(PLATFORM) Our experiments are implemented using PyTorch.

17,(ACCURACY) VOT2016 EAO: 0.3441; OTB2015 AUC: 0.637.

18,(SPEED) 160 fps.
