【SiamFC】《Fully-Convolutional Siamese Networks for Object Tracking》

bryant_meng

Article link: https://blog.csdn.net/bryant_meng/article/details/136214804

ECCV 2016 Workshops

1 Background and Motivation

Single-object tracking

The tracker may be asked to track any arbitrary object, so it is impossible to have already gathered data and trained a specific detector.

Drawbacks of online-learning methods, which either apply “shallow” methods (e.g. correlation filters) using the network’s internal representation as features or perform SGD (stochastic gradient descent) to fine-tune multiple layers of the network:

a clear deficiency of using data derived exclusively from the current video is that only comparatively simple models can be learnt.

Real-time speed can also be a problem.

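The authors build a single-object tracker on a fully-convolutional Siamese network that is trained offline, so a video object-detection dataset can serve as training data instead of tracking benchmark videos (the fairness of training and testing deep models for tracking using videos from the same domain is a point of controversy).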

2 Related Work

train Recurrent Neural Networks (RNNs) for the problem of object tracking
track objects with a particle filter that uses a learnt distance metric to compare the current appearance to that of the first frame.
feasibility of fine-tuning from pre-trained parameters at test time

3 Advantages / Contributions

we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video
frame-rates beyond real-time
achieves state-of-the-art performance in multiple benchmarks

4 Method

$g(\varphi(z), \varphi(x))$

exemplar image $z$

candidate image $x$

$g$ is a simple distance or similarity metric

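$\varphi$ is the embedding function computed by both branches of the Siamese network, which share the same weights.

[Figure: architecture of the embedding network $\varphi$]

Details of how x and z are obtained (from the pysot code):

[Figure: cropping of the exemplar z and the search image x]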
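
The crop-size arithmetic can be sketched as follows, as it appears in the paper and in pysot-style implementations: context padding is added around the target box, and the padded square region is resized to 127×127 for z and 255×255 for x. The function name `crop_sizes` and its parameters are hypothetical, and the sketch leaves out the actual image cropping and border padding.

```python
import math

def crop_sizes(w, h, exemplar_px=127, search_px=255, context=0.5):
    """Return the side lengths (in original-image pixels) of the square
    regions that are resized to the exemplar z and the search image x.

    exemplar_px / search_px are the input sizes reported for SiamFC;
    context=0.5 is an assumed default that gives p = (w + h) / 4 per side.
    """
    wc = w + context * (w + h)            # box width plus context
    hc = h + context * (w + h)            # box height plus context
    s_z = math.sqrt(wc * hc)              # square region with the same area
    s_x = s_z * search_px / exemplar_px   # search region keeps the same scale
    return s_z, s_x

# A 64x64 target: the exemplar region is 128 px wide before resizing.
s_z, s_x = crop_sizes(w=64, h=64)
```

Both crops share one scale factor, so an object that fills the exemplar appears at the same size inside the larger search image.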

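More specifically, the score map is the cross-correlation of the two embeddings plus a bias:

$$f(z, x) = \varphi(z) * \varphi(x) + b\,\mathbb{1}$$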

$b\,\mathbb{1}$ denotes a signal which takes value $b \in \mathbb{R}$ in every location

So b is presumably the same scalar at every spatial location.
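
To make the cross-correlation concrete, here is a minimal sketch that slides a small exemplar embedding over a larger search embedding and adds one scalar bias to every position. It assumes single-channel embeddings stored as nested lists; `cross_correlate` and the toy inputs are hypothetical. In the actual network the embeddings are multi-channel and the operation runs as a convolution layer.

```python
def cross_correlate(feat_z, feat_x, b=0.0):
    """Slide feat_z over feat_x (valid positions only) and add the bias b."""
    hz, wz = len(feat_z), len(feat_z[0])
    hx, wx = len(feat_x), len(feat_x[0])
    score = []
    for i in range(hx - hz + 1):
        row = []
        for j in range(wx - wz + 1):
            s = 0.0
            for di in range(hz):
                for dj in range(wz):
                    s += feat_z[di][dj] * feat_x[i + di][j + dj]
            row.append(s + b)  # the same scalar b at every location
        score.append(row)
    return score

# Toy data: the exemplar pattern appears in the search feature at offset (1, 1).
feat_z = [[1.0, 0.0],
          [0.0, 1.0]]
feat_x = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]
score = cross_correlate(feat_z, feat_x, b=0.5)
```

The peak of `score` sits at the offset where the exemplar pattern occurs, which is how the tracker reads off the new target position.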

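Loss function: the logistic loss on each score,

$$\ell(y, v) = \log(1 + \exp(-yv))$$

$y \in \{+1, -1\}$ is the ground-truth label.

$v$ is the real-valued score at one position of the score map (it is not restricted to the range 0 to 1).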

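The loss of a whole score map is the mean of the per-position losses:

$$L(y, v) = \frac{1}{|\mathcal{D}|} \sum_{u \in \mathcal{D}} \ell(y[u], v[u])$$

$u$ is a spatial position and $\mathcal{D}$ is the set of positions of the score map; $\ell$ is the per-position logistic loss.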
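
A minimal sketch of this loss, assuming the labels and scores are given as equally sized nested lists: each position contributes the logistic loss log(1 + exp(-y·v)), and the score-map loss is the mean over all positions. The function names are hypothetical, and no class-balancing weights are applied.

```python
import math

def logistic_loss(y, v):
    """log(1 + exp(-y * v)), written in a form that avoids overflow."""
    m = -y * v
    # log(1 + e^m) = max(m, 0) + log(1 + e^(-|m|))
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))

def score_map_loss(labels, scores):
    """Mean logistic loss over every position u of the score map D."""
    total, count = 0.0, 0
    for row_y, row_v in zip(labels, scores):
        for y, v in zip(row_y, row_v):
            total += logistic_loss(y, v)
            count += 1
    return total / count

# A 2x2 toy score map with one positive and three negative positions.
labels = [[1, -1], [-1, -1]]
scores = [[2.0, -1.5], [0.0, -3.0]]
loss = score_map_loss(labels, scores)
```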

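Positions of the score map that lie within radius $R$ of the center of the ground-truth bounding box are positive examples; all others are negative.

$c$ is the center of the ground-truth bounding box.

$k$ is the stride of the network; distances on the score map are multiplied by $k$ before being compared with $R$.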
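
The labeling rule amounts to $y[u] = +1$ if $k\lVert u - c\rVert \le R$ and $-1$ otherwise. The sketch below builds such a label map under the assumption that the target is centered in the score map. `make_labels` is a hypothetical name, and the values 17, 8, and 16 are the score-map size, stride, and radius reported for SiamFC, used here only as an illustration.

```python
import math

def make_labels(size, stride, radius):
    """Label map: +1 within `radius` pixels of the centre, -1 elsewhere."""
    c = (size - 1) / 2.0  # assumed: the target sits at the centre of the map
    labels = []
    for i in range(size):
        row = []
        for j in range(size):
            # Distance on the score map, converted to input pixels by the stride.
            dist = stride * math.hypot(i - c, j - c)
            row.append(1 if dist <= radius else -1)
        labels.append(row)
    return labels

labels = make_labels(size=17, stride=8, radius=16)
num_positive = sum(v == 1 for row in labels for v in row)
```

With these values only a small disc of positions around the center is positive, so negatives heavily outnumber positives.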

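Training uses SGD to minimize the expected loss over the network parameters $\theta$:

$$\arg\min_{\theta} \; \mathbb{E}_{(z, x, y)} \, L(y, f(z, x; \theta))$$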

5 Experiments

Training runs for 50 epochs, each consisting of 50,000 sampled pairs.

SiamFC (Siamese Fully Convolutional) and SiamFC-3s, which searches over 3 scales instead of 5.

The details of the scale search are not clear to me.
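
One plausible reading of the scale search, written as a sketch rather than the authors' implementation: the search image is cropped at several scales around the current size, the peak response of each scale is compared after penalizing any change of scale, and the target size is updated with damping. All function names are hypothetical, and the step, penalty, and damping values are illustrative assumptions.

```python
def scale_factors(num_scales, step):
    """Symmetric scale pyramid, e.g. step**-1, 1, step**1 for three scales."""
    half = num_scales // 2
    return [step ** k for k in range(-half, half + 1)]

def pick_scale(peak_scores, penalty):
    """Index of the best scale after penalising every non-central scale."""
    center = len(peak_scores) // 2
    best_idx, best_val = center, float("-inf")
    for idx, peak in enumerate(peak_scores):
        val = peak if idx == center else peak * penalty
        if val > best_val:
            best_idx, best_val = idx, val
    return best_idx

def update_size(current_size, factor, lr):
    """Damped update: linear interpolation toward the newly chosen scale."""
    return (1.0 - lr) * current_size + lr * current_size * factor

factors = scale_factors(num_scales=3, step=1.05)      # assumed step
best = pick_scale([1.00, 1.00, 1.10], penalty=0.97)   # assumed penalty
new_size = update_size(100.0, factors[best], lr=0.35) # assumed damping
```

Searching 3 scales instead of 5 means fewer crops pass through the network per frame, which is presumably why SiamFC-3s runs faster.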

5.1 Datasets and Metrics

Training set
ImageNet Video (ILSVRC15 VID), used here for tracking: about 4,500 videos

Test sets

ALOV
OTB-13
VOT-14 / VOT-15 / VOT-16

a tracker is successful in a given frame if the intersection over-union (IoU) between its estimate and the ground-truth is above a certain threshold

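Three evaluation protocols commonly used on OTB: TRE, SRE, and OPE

OPE (one-pass evaluation): run the tracker once over the whole sequence, initialized from the ground truth in the first frame; this corresponds to a single TRE run started at the first frame.
TRE (temporal robustness evaluation): split the sequence into 20 segments and start tracking from a different point in time for each run.
SRE (spatial robustness evaluation): perturb the first-frame initialization in 12 ways (shifts of 10% of the target size and scale changes), then track and measure accuracy.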
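
The initializations used by TRE and SRE can be sketched as below. This is an illustration under assumptions, not the OTB toolkit's code: the function names are hypothetical, the 12 SRE perturbations are taken to be 8 spatial shifts plus 4 scale changes, and the scale ratios are assumed values.

```python
def tre_start_frames(num_frames, segments=20):
    """Frame indices at which the TRE runs could be initialised."""
    return [(k * num_frames) // segments for k in range(segments)]

def sre_initial_boxes(box, shift=0.1, scales=(0.8, 0.9, 1.1, 1.2)):
    """Twelve perturbed first-frame boxes (x, y, w, h) for SRE."""
    x, y, w, h = box
    dx, dy = shift * w, shift * h
    boxes = []
    # 8 spatial shifts: 4 axis-aligned and 4 diagonal (assumed layout).
    for sx in (-1, 0, 1):
        for sy in (-1, 0, 1):
            if sx == 0 and sy == 0:
                continue
            boxes.append((x + sx * dx, y + sy * dy, w, h))
    # 4 scale changes about the box centre (assumed ratios).
    cx, cy = x + w / 2.0, y + h / 2.0
    for s in scales:
        boxes.append((cx - s * w / 2.0, cy - s * h / 2.0, s * w, s * h))
    return boxes

starts = tre_start_frames(num_frames=100)
inits = sre_initial_boxes((10.0, 10.0, 20.0, 40.0))
```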

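General metrics

OP (%): overlap precision, the percentage of frames whose overlap ratio exceeds a threshold
overlap ratio (IoU) = overlap area / (predicted box area + ground-truth box area - overlap area)
CLE (pixels): center location error
center location error = Euclidean distance between the ground-truth center and the predicted center
DP: distance precision, the percentage of frames whose center location error is below a threshold
AUC: area under the curve of the success plot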
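
These metrics can be sketched in a few lines, assuming boxes are given as (x, y, w, h) tuples. The function names and the default thresholds (0.5 overlap, 20 pixels) are assumptions for illustration, and the AUC is approximated as the mean success rate over evenly spaced overlap thresholds.

```python
import math

def iou(a, b):
    """Overlap ratio of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def center_error(a, b):
    """Euclidean distance between the two box centres, in pixels."""
    return math.hypot((a[0] + a[2] / 2.0) - (b[0] + b[2] / 2.0),
                      (a[1] + a[3] / 2.0) - (b[1] + b[3] / 2.0))

def overlap_precision(preds, gts, thr=0.5):
    """Fraction of frames whose IoU exceeds thr."""
    return sum(iou(p, g) > thr for p, g in zip(preds, gts)) / len(preds)

def distance_precision(preds, gts, thr=20.0):
    """Fraction of frames whose centre error is at most thr pixels."""
    return sum(center_error(p, g) <= thr for p, g in zip(preds, gts)) / len(preds)

def success_auc(preds, gts, steps=21):
    """Approximate AUC: mean success rate over thresholds 0, 0.05, ..., 1."""
    thrs = [i / (steps - 1) for i in range(steps)]
    return sum(overlap_precision(preds, gts, t) for t in thrs) / steps

box_a = (0.0, 0.0, 10.0, 10.0)
box_b = (5.0, 0.0, 10.0, 10.0)
overlap = iou(box_a, box_b)
error = center_error(box_a, box_b)
```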

Some VOT metrics

Robustness: counts tracking failures, so a larger value means a less stable tracker.

5.2 The OTB-13 benchmark

[Figure: results on the OTB-13 benchmark]

5.3 The VOT benchmarks

VOT-14
[Figure: results on VOT-14]
VOT-15

5.4 Dataset size

[Figure: effect of training dataset size on tracking performance]

A look at results in practice:
[Figure: example tracking results]
Drawback: the aspect ratio of the box is fixed.

6 Conclusion (own) / Future work

From the paper alone, many implementation details are still unclear to me; I will need to work through the code.

Deep Siamese conv-nets have previously been applied to tasks such as face verification, keypoint descriptor learning and one-shot character recognition