Reading Note: Detect to Track and Track to Detect

TITLE: Detect to Track and Track to Detect

AUTHOR: Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman

ASSOCIATION: Graz University of Technology, University of Oxford

FROM: arXiv:1710.03958

CONTRIBUTION

  1. A ConvNet architecture is set up for simultaneous detection and tracking, using a multi-task objective for frame-based object detection and across-frame track regression.
  2. Correlation features that represent object co-occurrences across time are introduced to aid the ConvNet during tracking.
  3. Frame-level detections are linked, based on across-frame tracklets, to produce high-accuracy detections at the video level.

METHOD

For frame-level detection, this work adopts R-FCN as the base framework to detect objects in a single frame. Inter-frame correlation features are extracted from the feature maps of the two frames. A multi-task loss combining classification, localization and displacement (track regression) terms is used to train the network. The workflow is shown in the following figure.

[Figure: Framework]

The key innovation of this work is an operation denoted as ROI tracking. Its input is the bounding box regression features of the two frames, $x^{t}_{reg}$ and $x^{t+\tau}_{reg}$, concatenated with the correlation features $x^{t,t+\tau}_{corr}$. The correlation layer performs point-wise feature comparison of two feature maps $x^{t}_{l}$ and $x^{t+\tau}_{l}$:

$$x^{t,t+\tau}_{corr}(i,j,p,q) = \left\langle x^{t}_{l}(i,j),\ x^{t+\tau}_{l}(i+p,\,j+q) \right\rangle$$

where $-d \le p \le d$ and $-d \le q \le d$ are offsets used to compare features in a square neighbourhood around the location $(i, j)$ in the feature map, defined by the maximum displacement $d$.
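The correlation operation can be sketched in plain Python to make the indexing concrete. This is an illustrative, unoptimized sketch and not the authors' implementation: the names `local_correlation` and `one_hot_map`, the nested-list `[i][j][channel]` layout, and the zero contribution from out-of-bounds neighbours are all assumptions made for the example.

```python
def local_correlation(feat_t, feat_t_tau, d):
    """Dot-product correlation inside a (2d+1) x (2d+1) neighbourhood.

    feat_t, feat_t_tau: nested lists indexed [i][j][channel], same shape.
    Returns corr[i][j][p + d][q + d] for offsets -d <= p, q <= d.
    Neighbours outside the map contribute 0 (an assumption of this sketch).
    """
    h, w = len(feat_t), len(feat_t[0])
    size = 2 * d + 1
    corr = [[[[0.0] * size for _ in range(size)] for _ in range(w)]
            for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a = feat_t[i][j]
            for p in range(-d, d + 1):
                for q in range(-d, d + 1):
                    ii, jj = i + p, j + q
                    if 0 <= ii < h and 0 <= jj < w:
                        b = feat_t_tau[ii][jj]
                        # Inner product of the two feature vectors.
                        corr[i][j][p + d][q + d] = sum(
                            x * y for x, y in zip(a, b))
    return corr


def one_hot_map(h, w, pos):
    """Hypothetical helper: 1-channel map that is 1.0 at pos, 0.0 elsewhere."""
    return [[[1.0 if (i, j) == pos else 0.0] for j in range(w)]
            for i in range(h)]


# A feature at (1, 1) in frame t moves one column to the right in frame t+tau.
corr = local_correlation(one_hot_map(3, 3, (1, 1)),
                         one_hot_map(3, 3, (1, 2)), d=1)
```

In this toy case the only non-zero response at location (1, 1) sits at offset (p, q) = (0, 1), that is `corr[1][1][1][2]`, which is the kind of displacement cue the tracking branch can regress from.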

The loss function is written as

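$$L(\{p_i\},\{b_i\},\{\Delta_i\}) = \frac{1}{N}\sum_{i=1}^{N} L_{cls}(p_{i,c^*}) + \lambda\frac{1}{N_{fg}}\sum_{i=1}^{N} [c_i^* > 0]\, L_{reg}(b_i, b_i^*) + \lambda\frac{1}{N_{tra}}\sum_{i=1}^{N_{tra}} L_{tra}(\Delta_i^{t+\tau}, \Delta_i^{*,t+\tau})$$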
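To show how the three terms and their normalizers fit together, here is a small sketch of the objective in plain Python. The note does not spell out the individual loss functions, so this sketch assumes cross-entropy for the classification term and smooth L1 for both regression terms (the usual choices in R-FCN style detectors); the function names and argument layout are hypothetical.

```python
import math


def smooth_l1(pred, target):
    """Smooth L1 distance summed over coordinates (assumed regression loss)."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff if diff < 1.0 else diff - 0.5
    return total


def dt_loss(probs, labels, boxes, gt_boxes, tracks, gt_tracks, lam=1.0):
    """Multi-task loss: classification + box regression + track regression.

    probs[i][c]  : softmax probability of class c for RoI i
    labels[i]    : ground-truth class of RoI i (0 = background)
    boxes, gt_boxes   : per-RoI box regression values and their targets
    tracks, gt_tracks : per-track displacement values and their targets
    """
    n = len(probs)
    # Cross-entropy on the ground-truth class, averaged over all N RoIs.
    l_cls = sum(-math.log(p[c]) for p, c in zip(probs, labels)) / n
    # Box regression only for foreground RoIs, normalized by N_fg.
    fg = [i for i, c in enumerate(labels) if c > 0]
    l_reg = sum(smooth_l1(boxes[i], gt_boxes[i]) for i in fg) / max(len(fg), 1)
    # Track regression, normalized by the number of tracks N_tra.
    l_tra = sum(smooth_l1(d, g) for d, g in zip(tracks, gt_tracks))
    l_tra /= max(len(tracks), 1)
    return l_cls + lam * l_reg + lam * l_tra


loss = dt_loss(
    probs=[[1.0, 0.0], [0.0, 1.0]],
    labels=[0, 1],
    boxes=[[0, 0, 0, 0], [0, 0, 0, 0]],
    gt_boxes=[[0, 0, 0, 0], [0.5, 0, 0, 2]],
    tracks=[[0, 0, 0, 0]],
    gt_tracks=[[1, 0, 0, 0]],
)
```

The indicator $[c_i^* > 0]$ appears here as the `fg` index list: background RoIs contribute to the classification term only.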

A class-wise linking score is defined to combine detections and tracks across time

$$s_c(D^{t}_{i,c}, D^{t+\tau}_{j,c}, T^{t,t+\tau}) = p^{t}_{i,c} + p^{t+\tau}_{j,c} + \phi(D^{t}_{i}, D^{t+\tau}_{j}, T^{t,t+\tau})$$

where the pairwise term $\phi$ evaluates to 1 if the IoU overlap of a track correspondence $T^{t,t+\tau}$ with the detection boxes $D^{t}_{i}$, $D^{t+\tau}_{j}$ is larger than 0.5, and to 0 otherwise. $p^{t}_{i,c}$ and $p^{t+\tau}_{j,c}$ are the softmax probabilities for class $c$. The optimal path across a video can be found by maximizing the summed linking scores over the duration $T$ of the video. Once the optimal tube is found, the detections corresponding to that tube are removed, and the procedure is applied again to the remaining detections. The detection scores in a tube are reweighted by adding the mean of the 50% highest scores in that tube.
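The linking step can be illustrated with a short dynamic-programming sketch in plain Python. It is a simplified reading of the procedure, not the authors' code: it assumes consecutive frames (τ = 1), at least one detection per frame, boxes given as `(x1, y1, x2, y2)`, and that the pairwise term fires when any predicted track overlaps both detections. All function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def link_score(det_a, det_b, tracks):
    """p_a + p_b + phi; phi = 1 if some track overlaps both boxes (IoU > 0.5).

    det_* = (box, class_probability); tracks = [(box_at_t, box_at_t_plus_1)].
    """
    phi = any(iou(det_a[0], ta) > 0.5 and iou(det_b[0], tb) > 0.5
              for ta, tb in tracks)
    return det_a[1] + det_b[1] + (1.0 if phi else 0.0)


def best_tube(frames, tracks):
    """Viterbi-style search for the path with the highest summed link score.

    frames[t] = list of detections; tracks[t] links frame t to frame t + 1.
    Returns (score, list with one detection index per frame).
    """
    score = [0.0] * len(frames[0])
    back = []
    for t in range(1, len(frames)):
        new_score, ptr = [], []
        for det in frames[t]:
            cands = [score[i] + link_score(prev, det, tracks[t - 1])
                     for i, prev in enumerate(frames[t - 1])]
            best = max(range(len(cands)), key=cands.__getitem__)
            new_score.append(cands[best])
            ptr.append(best)
        score = new_score
        back.append(ptr)
    end = max(range(len(score)), key=score.__getitem__)
    path = [end]
    for ptr in reversed(back):  # follow back-pointers to the first frame
        path.append(ptr[path[-1]])
    path.reverse()
    return score[end], path


def rescore(probs):
    """Add the mean of the top 50% of scores in a tube to every score."""
    top = sorted(probs, reverse=True)[:max(1, len(probs) // 2)]
    bonus = sum(top) / len(top)
    return [p + bonus for p in probs]


box_a, box_b = (0, 0, 10, 10), (50, 50, 60, 60)
frames = [
    [(box_a, 0.6), (box_b, 0.7)],
    [(box_a, 0.6), (box_b, 0.7)],
    [(box_a, 0.6), (box_b, 0.1)],
]
tracks = [[(box_a, box_a)], [(box_a, box_a)]]  # only object A is tracked
tube_score, tube = best_tube(frames, tracks)
tube_probs = rescore([frames[t][k][1] for t, k in enumerate(tube)])
```

In this toy input the tracked object wins the tube even though the untracked distractor has higher per-frame scores in the first two frames, which is the effect the pairwise term is meant to have. Removing the tube's detections and calling `best_tube` again on the remainder would give the next tube.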
