Study Notes: Deeper and Wider Siamese Networks for Real-Time Visual Tracking

jing_jing95

Article link: https://blog.csdn.net/jing_jing95/article/details/100927753

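This post looks at how deeper and wider convolutional neural networks can improve the robustness and accuracy of Siamese networks in visual tracking. Directly replacing AlexNet with ResNet or Inception brings no significant improvement, so the authors propose a new residual module (CIR) that removes the negative effect of padding, and design new architectures with a controlled receptive field that aim to improve performance while keeping real-time tracking speed. Experiments show that the new architectures outperform the baseline trackers.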

Deeper and Wider Siamese Networks for Real-Time Visual Tracking

https://arxiv.org/abs/1901.01660

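Siamese networks have drawn great attention in visual tracking because of their balanced accuracy and speed.
However, the backbone networks used in Siamese trackers are relatively shallow, such as AlexNet [18], which does not fully exploit the capability of modern deep neural networks.
In this paper, we investigate how to leverage deeper and wider convolutional neural networks to enhance tracking robustness and accuracy.
We observe that directly replacing the backbone with existing powerful architectures, such as ResNet [14] and Inception [33], does not bring improvements.
The main reasons are:

1) a large increase in the receptive field of neurons reduces feature discriminability and localization precision;

2) the network padding used for convolutions induces a positional bias in learning.

To address these issues, we propose new residual modules that eliminate the negative impact of padding, and use these modules to design new architectures with controlled receptive field size and network stride.
The designed architectures are lightweight and guarantee real-time tracking speed when applied to SiamFC [2] and SiamRPN [20].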
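To make "receptive field size" and "network stride" concrete, here is a minimal sketch of how both grow as layers are stacked. It uses the standard recurrence for a chain of convolution and pooling layers: the receptive field grows by (kernel - 1) times the accumulated stride, and the accumulated stride is multiplied by each layer's stride. The layer list `ALEXNET_LIKE` and the function name are my own assumptions for illustration and are not taken from the paper's text above.

```python
from typing import Iterable, Tuple

# (kernel_size, stride) for each layer, in order.
# Assumption: an AlexNet-style stack chosen for illustration only.
ALEXNET_LIKE = [
    (11, 2),  # conv1
    (3, 2),   # pool1
    (5, 1),   # conv2
    (3, 2),   # pool2
    (3, 1),   # conv3
    (3, 1),   # conv4
    (3, 1),   # conv5
]


def receptive_field_and_stride(
    layers: Iterable[Tuple[int, int]],
) -> Tuple[int, int]:
    """Return (receptive field, total stride) of the last layer's neurons.

    Both values are measured in input pixels along one spatial axis.
    """
    rf, jump = 1, 1  # a single input pixel sees itself; step of 1 pixel
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra tap reaches `jump` pixels further
        jump *= stride             # output neurons move this far apart in the input
    return rf, jump


if __name__ == "__main__":
    print(receptive_field_and_stride(ALEXNET_LIKE))  # (87, 8)
    # Naively stacking four more 3x3 convolutions keeps the stride
    # but inflates the receptive field considerably.
    deeper = ALEXNET_LIKE + [(3, 1)] * 4
    print(receptive_field_and_stride(deeper))  # (151, 8)
```

The second call shows why simply going deeper is not free: every additional 3x3 layer at stride 8 widens the receptive field by 16 input pixels, which is the kind of growth that reason 1) above warns about.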

Visual tracking is one of the fundamental problems in computer vision. It aims to estimate the position of an arbitrary target in a video sequence, given only its location in the initial frame. Tracking at real-time speeds plays a vital role in numerous vision applications, such as surveillance, robotics, and human-computer interaction.

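Notes: visual tracking is one of the basic problems of computer vision.
The goal is to estimate where an arbitrary target is in a video sequence when only its position in the initial frame is given.
Real-time tracking matters for many vision applications, such as surveillance, robotics, and human-computer interaction.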

Recently, trackers based on Siamese networks [2, 7, 12, 13, 20, 34, 40] have drawn great attention due to their high speed and accuracy. However, the backbone network utilized in these trackers is still the classical AlexNet rather than modern deep neural networks that have proven more effective for feature embedding.

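Notes: Siamese-network trackers [2, 7, 12, 13, 20, 34, 40] have attracted attention in recent years for being fast and accurate. Their backbone, however, is still the classic AlexNet rather than a modern deep network, even though modern networks have proven more effective for feature embedding.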

To examine this issue, we replace the shallow backbone with deeper and wider networks, including VGG [29], Inception [33] and ResNet [14]. Unexpectedly, this straightforward replacement does not bring much improvement, and can even cause substantial performance drops when the network depth or width increases, as shown in Fig. 1. This phenomenon runs counter to the evidenc