[Paper Reading, AAAI 2020, Object Tracking] SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines

lingqing97

Article link: https://blog.csdn.net/qq_39621037/article/details/115220973

Introduction

paper: SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines

code: MegviiDetection/video_analyst

This paper improves on SiamFC and proposes SiamFC++, which achieves state-of-the-art results on OTB2015, VOT2018, LaSOT, GOT-10k, and TrackingNet.

The paper designs SiamFC++ around the following guidelines:

G1: (decomposition of classification and state estimation) The tracker should perform two sub-tasks: classification and state estimation.
G2: (non-ambiguous scoring) The classification score should directly represent the confidence of target existence in the "field of view", i.e. the sub-window of the corresponding pixel, rather than in pre-defined settings such as anchor boxes.
G3: (prior knowledge-free) Tracking approaches should be free of prior knowledge such as the scale/ratio distribution.
G4: (estimation quality assessment) An estimation quality score independent of classification should be used.

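Main Content

[Figure: overall framework of SiamFC++]

The figure above shows the main framework of SiamFC++; the blue and red parts are the branches newly added relative to the original SiamFC. Unlike SiamFC, SiamFC++ adds a quality assessment branch and a bbox regression branch.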

Siamese-based Feature Extraction and Matching

Computing the embedding features through cross-correlation can be written as:

$f_{i}(z, x)=\psi_{i}(\phi(z)) \star \psi_{i}(\phi(x)), i \in\{\mathrm{cls}, \mathrm{reg}\}$

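Here $\star$ denotes cross-correlation, $\phi$ is the shared backbone for common feature extraction, $z$ is the template image, and $x$ is the search image. The task-specific layers $\psi_{cls}$ and $\psi_{reg}$ are applied after common feature extraction to adjust the common features into the task-specific feature space. Note that the features produced by $\psi_{cls}$ and $\psi_{reg}$ are of the same size.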
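To make the $\star$ operation concrete, here is a minimal single-channel sketch in plain Python. It is only an illustration, not the video_analyst implementation: real features have many channels, and the function name `cross_correlate` and the toy feature maps are my own assumptions.

```python
def cross_correlate(search, template):
    """Valid-mode 2D cross-correlation: slide `template` over `search`.

    Both inputs are single-channel feature maps given as lists of lists.
    (Hypothetical helper for illustration only.)
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    out = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            acc = 0.0
            for u in range(th):
                for v in range(tw):
                    acc += search[i + u][j + v] * template[u][v]
            row.append(acc)
        out.append(row)
    return out


# Toy template feature (plays the role of psi(phi(z))).
template = [[1.0, 0.0],
            [0.0, 1.0]]
# Toy search feature (plays the role of psi(phi(x))).
search = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]

response = cross_correlate(search, template)
# The response peaks where the diagonal pattern of the template
# lines up with the same pattern in the search feature.
```

The response map is smaller than the search feature (here 3x3 from a 4x4 input), and its peak marks the best-matching sub-window.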

Application of Design Guidelines in Head Network

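After the embedding features are extracted, a classification head and a regression head split the model into a classification task and a regression task (following guideline G1).

The regression head predicts a 4-dimensional vector for each location. Its target $\boldsymbol{t}^{*}=\left(l^{*}, t^{*}, r^{*}, b^{*}\right)$ is defined as follows (where $s$ is the stride of the backbone network, which is 8 in this paper):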

$\begin{array}{ll} l^{*} & =\left(\left\lfloor\frac{s}{2}\right\rfloor+x s\right)-x_{0}, \quad t^{*}=\left(\left\lfloor\frac{s}{2}\right\rfloor+y s\right)-y_{0} \\ r^{*} & =x_{1}-\left(\left\lfloor\frac{s}{2}\right\rfloor+x s\right), \quad b^{*}=y_{1}-\left(\left\lfloor\frac{s}{2}\right\rfloor+y s\right) \end{array}$

where $(x_0, y_0)$ and $(x_1, y_1)$ denote the left-top and right-bottom corners of the ground-truth bounding box $B^{*}$ associated with point $(x, y)$.
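The target definition above is easy to check by hand. A small sketch, assuming the stride $s=8$ stated above; the function name `regression_target` and the example box are hypothetical:

```python
def regression_target(x, y, box, stride=8):
    """Return (l*, t*, r*, b*) for feature-map location (x, y).

    `box` is the ground-truth box (x0, y0, x1, y1) in input-image pixels.
    The location is first mapped back to the image: floor(s/2) + x*s.
    """
    x0, y0, x1, y1 = box
    cx = stride // 2 + x * stride
    cy = stride // 2 + y * stride
    return (cx - x0, cy - y0, x1 - cx, y1 - cy)


# Feature location (10, 12) maps to image point (84, 100).
target = regression_target(10, 12, (60.0, 70.0, 120.0, 130.0))
# target holds the distances to the left, top, right and bottom edges.
```

For this example the image point is 24 px from the left edge, 30 px from the top, 36 px from the right, and 30 px from the bottom of the box.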

One branch of the classification head outputs the classification score on the feature map $\psi_{cls}$.

A location $(x, y)$ on the feature map $\psi_{cls}$ is considered a positive sample if its corresponding location $\left(\left\lfloor\frac{s}{2}\right\rfloor+x s,\left\lfloor\frac{s}{2}\right\rfloor+y s\right)$ on the input image falls into the ground-truth bounding box. Otherwise, it is a negative sample.
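The labeling rule can be sketched in a few lines. This is an illustration under my own assumptions: the names `is_positive` and `label_map` are hypothetical, and treating points exactly on the box border as positive is my choice, not something the paper states here.

```python
def is_positive(x, y, box, stride=8):
    """True if feature location (x, y) maps to a point inside `box`.

    Border points count as inside (an assumption made for this sketch).
    """
    x0, y0, x1, y1 = box
    cx = stride // 2 + x * stride
    cy = stride // 2 + y * stride
    return x0 <= cx <= x1 and y0 <= cy <= y1


def label_map(size, box, stride=8):
    """Build a size x size map of 1 (positive) / 0 (negative) labels."""
    return [[1 if is_positive(x, y, box, stride) else 0
             for x in range(size)]
            for y in range(size)]


# 5x5 feature map, stride 8: locations map to image coordinates 4, 12, 20, 28, 36.
labels = label_map(5, (10, 10, 30, 22))
for row in labels:
    print(row)
```

Note that no anchor boxes are involved: each score position is labeled purely by where its own pixel lands, which is what guideline G2 asks for.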

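The other branch predicts PSS (the paper notes that IoU can also be used). This branch assesses the quality of the predicted bbox and is used to suppress bboxes far from the target center.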
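As a rough illustration of how such a quality target behaves, the sketch below assumes PSS takes the center-ness form known from FCOS, i.e. $\sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \cdot \frac{\min(t^*, b^*)}{\max(t^*, b^*)}}$. That is my understanding of the definition, so please check the paper for the exact formula. The function name and the zero score for points outside the box are my own assumptions.

```python
import math


def prior_spatial_score(l, t, r, b):
    """Center-ness style quality target from (l*, t*, r*, b*).

    Assumed form: sqrt(min(l, r)/max(l, r) * min(t, b)/max(t, b)).
    Points on or outside the box border get 0.0 (assumption for this sketch).
    """
    if min(l, t, r, b) <= 0:
        return 0.0
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


center_score = prior_spatial_score(30.0, 30.0, 30.0, 30.0)   # exactly at the box center
offset_score = prior_spatial_score(24.0, 30.0, 36.0, 30.0)   # shifted toward the left edge
edge_score = prior_spatial_score(2.0, 30.0, 58.0, 30.0)      # close to the left edge
```

The score is 1 at the box center and decays toward the edges, which is why multiplying it into the classification score down-weights boxes predicted from off-center locations.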

Training Objective

The final loss function to optimize is:

$\begin{array}{r} L\left(\left\{p_{x, y}\right\}, q_{x, y},\left\{\boldsymbol{t}_{x, y}\right\}\right)=\frac{1}{N_{\mathrm{pos}}} \sum_{x, y} L_{\mathrm{cls}}\left(p_{x, y}, c_{x, y}^{*}\right) \\ +\frac{\lambda}{N_{\mathrm{pos}}} \sum_{x, y} 1_{\left\{c_{x, y}^{*}>0\right\}} L_{\text {quality }}\left(q_{x, y}, q_{x, y}^{*}\right) \\ +\frac{\lambda}{N_{\mathrm{pos}}} \sum_{x, y} 1_{\left\{c_{x, y}^{*}>0\right\}} L_{\mathrm{reg}}\left(\boldsymbol{t}_{x, y}, \boldsymbol{t}_{x, y}^{*}\right) \end{array}$

where $L_{cls}$ is the focal loss (see Focal Loss for Dense Object Detection), $L_{quality}$ is the BCE loss, and $L_{reg}$ is the IoU loss.
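The following plain-Python sketch shows how the three terms might be combined for a handful of samples. It is not the authors' training code. The assumptions are mine: focal-loss defaults `alpha=0.25`, `gamma=2.0`, the IoU loss written as $-\ln(\mathrm{IoU})$, `lam=1.0`, and the dictionary keys used for samples.

```python
import math

EPS = 1e-7


def focal_loss(p, label, alpha=0.25, gamma=2.0):
    """Binary focal loss for predicted probability p and label in {0, 1}."""
    p = min(max(p, EPS), 1.0 - EPS)
    if label > 0:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


def bce_loss(q, q_star):
    """Binary cross-entropy between predicted quality q and target q_star."""
    q = min(max(q, EPS), 1.0 - EPS)
    return -(q_star * math.log(q) + (1.0 - q_star) * math.log(1.0 - q))


def iou_loss(pred, target):
    """-ln(IoU) for two boxes given as (l, t, r, b) offsets from the same point."""
    pl, pt, pr, pb = pred
    tl, tt, tr, tb = target
    inter = max(min(pl, tl) + min(pr, tr), 0.0) * max(min(pt, tt) + min(pb, tb), 0.0)
    union = (pl + pr) * (pt + pb) + (tl + tr) * (tt + tb) - inter
    return -math.log(max(inter / union, EPS))


def total_loss(samples, lam=1.0):
    """Classification over all samples; quality and regression over positives only."""
    positives = [s for s in samples if s["c_star"] > 0]
    n_pos = max(1, len(positives))
    cls = sum(focal_loss(s["p"], s["c_star"]) for s in samples)
    quality = sum(bce_loss(s["q"], s["q_star"]) for s in positives)
    reg = sum(iou_loss(s["t"], s["t_star"]) for s in positives)
    return (cls + lam * quality + lam * reg) / n_pos


samples = [
    {"p": 0.9, "c_star": 1, "q": 0.8, "q_star": 0.82,
     "t": (22.0, 30.0, 36.0, 28.0), "t_star": (24.0, 30.0, 36.0, 30.0)},
    {"p": 0.1, "c_star": 0},  # negative sample: only the classification term applies
]
loss = total_loss(samples)
```

The indicator $1_{\{c^*_{x,y}>0\}}$ in the formula corresponds to the `positives` filter: negative locations contribute to the classification term only.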

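Additional Notes

In Appendix B, the authors discuss the processing done at prediction time in more detail.

The model takes the element-wise product of cls_score and quality_assessment to obtain a score map. The score map then goes through a series of post-processing steps (such as multiplying by a Hanning window) to give the final score map, $\tilde{s}[x]$.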

An argmax operation then gives the predicted bbox, as shown below:

$\begin{aligned} x^{*} &=\arg \max _{x \in[0 . . N-1] \otimes 2} \tilde{s}[x] \\ B_{\text {curr }} &=B\left[x^{*}\right] \end{aligned}$
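A minimal sketch of this inference step, covering only the pieces mentioned here: the element-wise product, a Hanning window multiplication, and the argmax. The other post-processing steps from Appendix B are left out, and the function names and toy score values are hypothetical.

```python
import math


def hanning(n):
    """Symmetric Hanning window of length n (zero at both ends)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def final_score_map(cls_score, quality):
    """Element-wise product of the two maps, then multiply by a 2D Hanning window."""
    n = len(cls_score)
    w = hanning(n)
    return [[cls_score[y][x] * quality[y][x] * w[y] * w[x] for x in range(n)]
            for y in range(n)]


def argmax_2d(score_map):
    """Return the (x, y) index of the highest score."""
    best_xy, best_val = (0, 0), score_map[0][0]
    for y, row in enumerate(score_map):
        for x, val in enumerate(row):
            if val > best_val:
                best_xy, best_val = (x, y), val
    return best_xy


n = 5
cls_score = [[0.1] * n for _ in range(n)]
quality = [[0.5] * n for _ in range(n)]
cls_score[2][2], quality[2][2] = 0.7, 0.9   # candidate at the center of the search region
cls_score[2][3], quality[2][3] = 0.9, 0.8   # off-center candidate with a higher raw score

best = argmax_2d(final_score_map(cls_score, quality))
# The box predicted at this index, B[best], would be taken as B_curr.
```

In this toy case the off-center candidate has the higher raw product, but the window favors positions near the previous target location, so the center candidate wins.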

Finally, the bbox is updated according to the following formula ($\alpha$ is a hyperparameter):

$\begin{aligned} \alpha^{\prime} &=\bar{s}\left[x^{*}\right] \cdot \alpha \\ B_{\text{pred}}.\text{size} &=\left(1-\alpha^{\prime}\right) \cdot B_{\text{prev}}.\text{size}+\alpha^{\prime} \cdot B_{\text{curr}}.\text{size} \end{aligned}$
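The size update is a score-weighted linear interpolation between the previous and current box sizes. A small sketch, where the function name and the value `alpha=0.5` are purely illustrative assumptions:

```python
def update_size(prev_size, curr_size, score, alpha=0.5):
    """Blend previous and current (width, height) with rate alpha' = score * alpha."""
    a = score * alpha
    return tuple((1.0 - a) * p + a * c for p, c in zip(prev_size, curr_size))


# Confident prediction: the size moves noticeably toward the current estimate.
confident = update_size((100.0, 50.0), (120.0, 60.0), score=0.8)
# Low-confidence prediction: the size stays close to the previous one.
uncertain = update_size((100.0, 50.0), (120.0, 60.0), score=0.1)
```

With `score=0.8` the effective rate is 0.4, so the width moves from 100 to about 108. With `score=0.1` it only moves to about 101, which keeps the box size stable when the tracker is unsure.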

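Experimental Results

[Figure: experimental results]

Summary

SiamFC++ keeps the simplicity and speed of SiamFC, and the experimental results in the paper show that the newly added regression branch substantially improves tracking performance. The paper also borrows several practices from object detection. Detection and tracking seem to be growing ever closer, so it is worth keeping a close eye on developments in object detection from now on!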
