Introduction
Paper: SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking
Code: ohhhyeahhh/SiamCAR
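The motivation for this paper: trackers in the SiamRPN family rely on an RPN for classification and regression, and these RPN-based trackers only track well when the anchor-box parameters are set carefully, which makes them demanding to tune. In response, the paper builds a simple and efficient tracking model on top of a simple network structure.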
Main Content
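The figure above shows the network architecture of SiamCAR. It consists of two parts: a Siamese Subnetwork for feature extraction and a Classification-Regression Subnetwork for classification and regression. The most important contribution of SiamCAR lies in the second part, the Classification-Regression Subnetwork.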
Feature Extraction
SiamCAR's feature extraction network is the modified ResNet-50 from SiamRPN++. As in SiamRPN++, the features from the last three stages of ResNet-50 are concatenated to give the feature map $\varphi(Z)$ of the template patch and the feature map $\varphi(X)$ of the search region.
The correlation is then computed with the Depthwise Cross Correlation from SiamRPN++:
$$R = \varphi(X) \star \varphi(Z)$$
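To make the operation concrete, here is a minimal NumPy sketch of a depthwise cross-correlation: each channel of the template feature map slides over the matching channel of the search feature map, so the response keeps the same number of channels. The function name `depthwise_xcorr` and the toy shapes are assumptions for illustration and are not taken from the SiamCAR repository.

```python
import numpy as np

def depthwise_xcorr(search, template):
    """Channel-by-channel cross-correlation (no padding, stride 1).

    search:   (C, Hx, Wx) feature map of the search region, phi(X)
    template: (C, Hz, Wz) feature map of the template patch, phi(Z)
    returns:  (C, Hx - Hz + 1, Wx - Wz + 1) response map R
    """
    c, hx, wx = search.shape
    cz, hz, wz = template.shape
    assert c == cz, "both feature maps need the same channel count"
    out_h, out_w = hx - hz + 1, wx - wz + 1
    response = np.empty((c, out_h, out_w), dtype=search.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Window of the search features aligned with the template.
            window = search[:, i:i + hz, j:j + wz]
            # Sum over spatial axes only, so channels never mix.
            response[:, i, j] = (window * template).sum(axis=(1, 2))
    return response

# Toy shapes (hypothetical, far smaller than real feature maps).
rng = np.random.default_rng(0)
phi_x = rng.standard_normal((6, 9, 9))
phi_z = rng.standard_normal((6, 3, 3))
R = depthwise_xcorr(phi_x, phi_z)
print(R.shape)
```

In a deep-learning framework the same effect is usually obtained with a grouped convolution where the number of groups equals the number of channels; the explicit loops here are only for readability.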
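Because the resulting response map $R$ has too many channels ($256 \times 3$), the authors apply a $1 \times 1$ convolution layer to reduce its dimensionality, which cuts the number of model parameters and speeds up inference.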
Bounding Box Prediction
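The Classification-Regression Subnetwork is split into a classification branch and a regression branch. At each location the regression branch outputs $(l, t, r, b)$, the distances from that location to the four sides of the bounding box in the search region. Besides the classification output, the authors attach a centerness branch to the classification branch, which outputs a centerness score. The motivation is their observation that the farther a location is from the target center, the lower the quality of the bounding box it predicts: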
An observation is that the locations far away from the center of an target tend to produce low-quality predicted bounding boxes, which reduces the performance of the tracking system.
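As a small worked example of the $(l, t, r, b)$ encoding, the sketch below computes the four distances for a point and an axis-aligned box given by its top-left and bottom-right corners, and flags whether the point lies inside the box. The function name `ltrb_targets` and the sample coordinates are made up for illustration.

```python
def ltrb_targets(x, y, box):
    """Distances from point (x, y) to the four sides of box.

    box = (x0, y0, x1, y1): top-left and bottom-right corners.
    Returns ((l, t, r, b), inside), where inside is True only when
    all four distances are positive.
    """
    x0, y0, x1, y1 = box
    l, t, r, b = x - x0, y - y0, x1 - x, y1 - y
    inside = min(l, t, r, b) > 0
    return (l, t, r, b), inside

box = (40, 30, 100, 90)                       # hypothetical ground-truth box
inner, inner_ok = ltrb_targets(60, 50, box)   # a point inside the box
outer, outer_ok = ltrb_targets(10, 50, box)   # a point left of the box
print(inner, inner_ok)
print(outer, outer_ok)
```

Note that $l + r$ and $t + b$ recover the box width and height (60 and 60 here) regardless of which interior point is used.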
Finally, the output of the regression branch is trained with an IoU loss and the classification branch with a cross-entropy loss, while the centerness branch uses the following target and loss:
$$C(i, j)=\mathbb{I}\left(\tilde{t}_{(i, j)}\right) * \sqrt{\frac{\min (\tilde{l}, \tilde{r})}{\max (\tilde{l}, \tilde{r})} \times \frac{\min (\tilde{t}, \tilde{b})}{\max (\tilde{t}, \tilde{b})}}$$
$$\mathcal{L}_{cen}=\frac{-1}{\sum \mathbb{I}\left(\tilde{t}_{(i, j)}\right)} \sum_{\mathbb{I}\left(\tilde{t}_{(i, j)}\right)==1} C(i, j) * \log A_{w \times h \times 1}^{cen}(i, j)+(1-C(i, j)) * \log \left(1-A_{w \times h \times 1}^{cen}(i, j)\right)$$
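The two formulas above can be sketched in NumPy as follows: the centerness target is 1 at the box center and decays toward the edges, and the loss is a binary cross-entropy against that soft target, averaged over the locations inside the box. The function names, the `eps` clipping, and the sample values are assumptions for this sketch, and locations are flattened to 1-D arrays for brevity.

```python
import numpy as np

def centerness_target(l, t, r, b):
    """C(i, j): sqrt of the min/max ratios, zeroed outside the box."""
    inside = (l > 0) & (t > 0) & (r > 0) & (b > 0)
    lr = np.minimum(l, r) / np.maximum(np.maximum(l, r), 1e-12)
    tb = np.minimum(t, b) / np.maximum(np.maximum(t, b), 1e-12)
    # Clip so that locations outside the box never produce sqrt of a negative.
    c = np.sqrt(np.clip(lr * tb, 0.0, None))
    return np.where(inside, c, 0.0), inside

def centerness_loss(pred, l, t, r, b, eps=1e-12):
    """Cross-entropy between predicted centerness and C, over inside locations."""
    c, inside = centerness_target(l, t, r, b)
    pred = np.clip(pred, eps, 1.0 - eps)       # numerical safety, an assumption
    bce = c * np.log(pred) + (1.0 - c) * np.log(1.0 - pred)
    n = max(int(inside.sum()), 1)
    return -bce[inside].sum() / n

# Three locations: off-center inside, exact center, outside the box.
l = np.array([20.0, 30.0, -30.0])
t = np.array([20.0, 30.0, 20.0])
r = np.array([40.0, 30.0, 90.0])
b = np.array([40.0, 30.0, 40.0])
pred = np.array([0.5, 0.9, 0.2])               # hypothetical branch outputs

C, mask = centerness_target(l, t, r, b)
loss = centerness_loss(pred, l, t, r, b)
print(C, mask, loss)
```

The third location falls outside the box, so it receives a target of 0 and is excluded from the average.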
The overall loss is:
$$\mathcal{L}=\mathcal{L}_{cls}+\lambda_{1} \mathcal{L}_{cen}+\lambda_{2} \mathcal{L}_{reg}$$
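The sketch below shows how the regression term and the weighted sum might be put together. Two points are assumptions rather than facts from this post: the IoU loss is written in the common $-\ln(\text{IoU})$ form, and the $\lambda$ values in the example are placeholders. The IoU is computed directly from two $(l, t, r, b)$ vectors measured from the same location.

```python
import numpy as np

def iou_from_ltrb(pred, target):
    """IoU of two boxes given as (l, t, r, b) from the same location.

    pred, target: arrays of shape (..., 4) with positive entries.
    """
    pl, pt, pr, pb = np.moveaxis(pred, -1, 0)
    tl, tt, tr, tb = np.moveaxis(target, -1, 0)
    area_p = (pl + pr) * (pt + pb)
    area_t = (tl + tr) * (tt + tb)
    # Both boxes contain the location, so the overlap is the min on each side.
    inter = (np.minimum(pl, tl) + np.minimum(pr, tr)) * \
            (np.minimum(pt, tt) + np.minimum(pb, tb))
    return inter / (area_p + area_t - inter)

def iou_loss(pred, target):
    """Assumed -ln(IoU) form, averaged over locations."""
    return float(np.mean(-np.log(iou_from_ltrb(pred, target))))

def total_loss(l_cls, l_cen, l_reg, lambda1, lambda2):
    """L = L_cls + lambda1 * L_cen + lambda2 * L_reg."""
    return l_cls + lambda1 * l_cen + lambda2 * l_reg

pred = np.array([[20.0, 20.0, 40.0, 40.0], [10.0, 10.0, 10.0, 10.0]])
target = np.array([[20.0, 20.0, 40.0, 40.0], [20.0, 20.0, 20.0, 20.0]])
ious = iou_from_ltrb(pred, target)
l_reg = iou_loss(pred, target)
# Placeholder values for the other terms and for the weights.
loss = total_loss(0.7, 0.4, 0.2, lambda1=1.0, lambda2=2.0)
print(ious, l_reg, loss)
```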
The Tracking Phase
At tracking time, for each location $(i, j)$ the network outputs a 6-dimensional vector $(cls, cen, l, t, r, b)$:
For a location (i, j), the proposed method produces a 6D vector $T_{ij} = (cls, cen, l, t, r, b)$, where $cls$ represents the foreground score of classification, $cen$ represents the centerness score, and $l + r$ and $t + b$ represent the predicted width and height of the target in current frame.
The location $q$ with the highest score is then computed from the following expression:
$$q=\arg \max _{i, j}\left\{\left(1-\lambda_{d}\right) cls_{i j} \times p_{i j}+\lambda_{d} H_{i j}\right\}$$
where $H$ is the cosine window and $\lambda_d$ is the balance weight. The output $q$ is a queried location with the highest score being a target pixel.
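A minimal sketch of this selection step, assuming the per-location factor $p_{ij}$ is already available as an array (the quoted text does not define it, so it is set to all ones here) and building the cosine window as the outer product of two Hanning windows. The function name `best_location` and the $\lambda_d$ values are illustrative choices.

```python
import numpy as np

def best_location(cls, p, lambda_d):
    """Return q = argmax of (1 - lambda_d) * cls * p + lambda_d * H."""
    h, w = cls.shape
    window = np.outer(np.hanning(h), np.hanning(w))   # cosine window H
    score = (1.0 - lambda_d) * cls * p + lambda_d * window
    q = np.unravel_index(np.argmax(score), score.shape)
    return (int(q[0]), int(q[1])), score

cls = np.zeros((5, 5))
cls[2, 3] = 1.0              # strongest foreground response, off-center
p = np.ones((5, 5))          # placeholder for p_ij

q_low, _ = best_location(cls, p, lambda_d=0.3)
q_high, _ = best_location(cls, p, lambda_d=0.9)
print(q_low, q_high)
```

With a small $\lambda_d$ the classification response wins and $q$ lands on `(2, 3)`; with a large $\lambda_d$ the window dominates and pulls $q$ to the center `(2, 2)`, which is the trade-off the balance weight controls.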
Since the pixels around $q$ may also be target pixels, the authors additionally compute the score $cls_{ij} \times p_{ij}$ for the $n$ neighborhoods of $q$, and take a weighted average of the top-k regression boxes to obtain the final target box.
We observed that the pixels located around $q$ are more likely to be the target pixel. Hence we choose the top-k points from n neighborhoods of $q$ according to the value $cls_{ij} \times p_{ij}$. The final prediction is the weighted average of the selected $k$ regression boxes.
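The refinement step could look roughly like the sketch below. Several details are assumptions, since the quoted text does not pin them down: the neighborhood is taken as a square window of a given radius around $q$ (including $q$ itself), the averaging weights are the normalized $cls_{ij} \times p_{ij}$ scores, and each location's regression output is assumed to be already decoded into an absolute `(x0, y0, x1, y1)` box.

```python
import numpy as np

def refine_box(q, cls, p, boxes, radius=1, k=3):
    """Weighted average of the top-k boxes around q.

    q:     (i, j) location chosen by the argmax step
    cls:   (H, W) foreground scores
    p:     (H, W) per-location factor p_ij
    boxes: (H, W, 4) decoded boxes as (x0, y0, x1, y1)
    Assumes at least one positive score in the neighborhood.
    """
    h, w = cls.shape
    qi, qj = q
    i0, i1 = max(qi - radius, 0), min(qi + radius + 1, h)
    j0, j1 = max(qj - radius, 0), min(qj + radius + 1, w)
    score = (cls * p)[i0:i1, j0:j1].ravel()
    cand = boxes[i0:i1, j0:j1].reshape(-1, 4)
    top = np.argsort(score)[::-1][:k]          # indices of the k best scores
    weights = score[top] / score[top].sum()
    return weights @ cand[top]

# Toy 5x5 grid: each location predicts a 20x20 box centered on itself
# (stride of 8 pixels is a hypothetical value).
ii, jj = np.mgrid[0:5, 0:5]
boxes = np.stack([jj * 8 - 10, ii * 8 - 10, jj * 8 + 10, ii * 8 + 10],
                 axis=-1).astype(float)
cls = np.zeros((5, 5))
cls[2, 2], cls[2, 3], cls[1, 2] = 0.5, 0.25, 0.25
p = np.ones((5, 5))

final_box = refine_box((2, 2), cls, p, boxes)
print(final_box)
```

Averaging over several high-scoring neighbors smooths the prediction compared with taking the single box at $q$, which is the stated purpose of this step.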
Experimental Results
Summary
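SiamCAR matches, and even exceeds, the performance of SiamRPN++ with a simple model, which is no small feat. The model is well worth studying in depth, though it is unclear how hard it is to train. The authors' paper and code are also both very easy to follow, so kudos to them!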