Ph.D. Application: Research Proposal

       When I applied to overseas Ph.D. programs, I wrote a research proposal (RP). Since it was not in my master's research direction, it is fairly introductory. I am posting it here for reference; it may be of help to some readers.

Why do I want the Ph.D.

About Myself

Background

     My current research direction is multimodal language generation; the goal is to integrate the multimodal information acquired by a robot into a well-formed sentence. At present I have designed a basic framework, drawing on methods from image captioning and machine translation, to carry out this research. I plan to pursue graduate studies toward a Ph.D. degree at your school of computer science from the fall of 20XX. In the future, I am determined to devote myself to research on object tracking.

Research Motivation

     Object tracking is the process of locating an object of interest in a series of images so as to reconstruct the moving object's track. Given a bounding box defining the object of interest in a single frame, the goal of tracking is to automatically determine the object's bounding box in every frame that follows, or to indicate that the object is not visible.

    As a mid-level task in computer vision, object tracking underpins high-level tasks such as pose estimation, action recognition, and behavior analysis. It has numerous practical applications, such as visual surveillance, human-computer interaction, and virtual reality. Although object tracking has been studied for several decades, it remains challenging due to factors such as abrupt appearance changes and severe object occlusions. Beyond these practical applications, which appeal to me deeply, I am curious about tackling these challenges.

Object Tracking

Introduction

    There are two main forms of object tracking, namely single-object tracking (SOT) and multi-object tracking (MOT). Single-object tracking primarily focuses on designing sophisticated appearance and/or motion models to deal with challenging factors such as object deformation, occlusion, illumination changes, motion blur, and background clutter. Multi-object tracking additionally requires two tasks to be solved: determining the number of objects, which typically varies over time, and maintaining their identities. Apart from the challenges common to both SOT and MOT, further key issues that complicate MOT include, among others: 1) frequent occlusions, 2) initialization and termination of tracks, 3) similar appearance, and 4) interactions among multiple objects.

    To deal with these issues, a wide range of solutions has been proposed over the past decades. In general, most tracking algorithms can be categorized into two classes based on their representation schemes: generative and discriminative models. Generative models typically learn an appearance model and use it to search for the image regions with minimal reconstruction error as the tracking results. Typical generative algorithms are sparse representation methods, which represent the object by a set of target and trivial templates to deal with partial occlusion, illumination change, and pose variation. Discriminative models pose object tracking as a detection problem in which a classifier is learned to separate the target object from its surrounding background within a local region. Unlike generative methods, discriminative approaches use both target and background information to find a decision boundary that differentiates the target object from the background. This strategy is employed in tracking-by-detection methods, where a discriminative classifier is trained online using sample patches of the target and the surrounding background.
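The contrast between the two families can be sketched with two toy scoring functions. This is a minimal illustration, not any particular tracker: the 16-dimensional "patch features", the nearest-template reconstruction (a stand-in for full sparse coding), and the hand-set classifier weights are all assumptions for the example.

```python
import numpy as np

def generative_score(candidate, templates):
    """Generative: score a candidate region by how small its reconstruction
    error is under the appearance model (toy nearest-template version)."""
    errors = np.sum((templates - candidate) ** 2, axis=1)
    return -errors.min()  # smaller reconstruction error -> higher score

def discriminative_score(candidate, w, b):
    """Discriminative: a linear classifier trained on target (positive) and
    background (negative) patches scores the candidate directly."""
    return float(w @ candidate + b)

# Toy 16-dimensional patch features: the target looks like all-ones.
target = np.ones(16)
templates = np.stack([target, 0.9 * target, 1.1 * target])  # target templates
near = target + 0.1 * np.random.default_rng(0).standard_normal(16)
far = np.zeros(16)                    # a background-like candidate
w = target / np.linalg.norm(target)   # classifier normal pointing at the target
```

Both scores rank the target-like candidate above the background-like one, but only the discriminative score was shaped by (here, hand-crafted) background information.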

Related Work

     A class of techniques called "tracking-by-detection" emerged after Andriluka et al. combined the advantages of both detection and tracking in a single framework. These methods train a discriminative classifier in an online manner to separate the object from the background. The classifier bootstraps itself by using the current tracker state to extract positive and negative examples from the current frame. However, slight inaccuracies in the tracker can lead to incorrectly labeled training examples, which degrade the classifier and can cause drift. Babenko et al. show that using Multiple Instance Learning (MIL) instead of traditional supervised learning avoids these problems and can lead to a more robust tracker with fewer parameter tweaks.

    The particle filter (PF) realizes recursive Bayesian estimation based on the Monte Carlo method, using a random set of particles to discretely represent the posterior probability density function (PDF) of the object state. The particle filter performs very well on non-linear and non-Gaussian dynamic state estimation problems, and it is widely used in object tracking. Since its invention, several types of appearance models have been proposed for this framework, including color, contour, edge, and saliency models. However, the particle filter itself is a computationally expensive algorithm because each particle must be processed separately; complex appearance models can dramatically increase the overall execution time, rendering the framework impractical in real-life applications. In addition, the particle filter, as a generative algorithm, performs worse under some complex visual scenarios than discriminative algorithms such as correlation filters and deep learning.
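One predict/update/resample cycle of a bootstrap particle filter can be sketched as follows. The 2-D random-walk motion model and the Gaussian likelihood in the usage example are illustrative assumptions standing in for a real motion and appearance model:

```python
import numpy as np

def particle_filter_step(particles, weights, observe, motion_std=2.0, rng=None):
    """One predict/update/resample cycle of a bootstrap particle filter.

    particles: (N, 2) array of candidate object positions (x, y)
    weights:   (N,) normalized importance weights
    observe:   likelihood p(z | state) evaluated for every particle,
               standing in for an appearance-model similarity score
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(particles)
    # Predict: propagate particles through a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: reweight each particle by its observation likelihood.
    weights = weights * observe(particles)
    weights = weights / weights.sum()
    # Estimate: posterior mean of the object state.
    estimate = (weights[:, None] * particles).sum(axis=0)
    # Resample when the effective sample size degenerates.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights, estimate

# Toy usage: localize a static target at (10, 5) with a Gaussian likelihood.
rng = np.random.default_rng(0)
target = np.array([10.0, 5.0])
observe = lambda p: np.exp(-np.sum((p - target) ** 2, axis=1) / 8.0) + 1e-12
particles = rng.uniform(-20.0, 20.0, size=(500, 2))
weights = np.full(500, 1.0 / 500)
for _ in range(10):
    particles, weights, estimate = particle_filter_step(
        particles, weights, observe, rng=rng)
```

Note that `observe` is evaluated for every particle on every frame, which is exactly the per-particle cost the paragraph above points to.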

    Some traditional types of correlation filters, such as ASEF and UMACE filters, are trained offline and used for object detection or target identification. However, their training requirements are poorly suited to tracking: object tracking needs robust filters that can be trained from a single frame and dynamically adapted as the appearance of the target changes. Bolme et al. introduce a regularized variant of ASEF named Minimum Output Sum of Squared Error (MOSSE), which is suitable for visual tracking; a tracker based on MOSSE filters is robust and effective. Because correlation filters can be interpreted as linear classifiers, a natural question is whether they can exploit the kernel trick to classify in richer non-linear feature spaces. Several researchers have investigated this problem: Henriques et al. derive the Kernelized Correlation Filter (KCF), and Patnaik et al. propose kernel SDF filters.
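The MOSSE core can be sketched in a few lines: a closed-form initialization from one frame plus the running-average update that adapts the filter to appearance change. This is a single-channel sketch under simplifying assumptions (no cosine windowing or log preprocessing; the learning rate and regularizer values are illustrative):

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

class MosseFilter:
    """Minimal single-channel MOSSE filter (init, response, online update)."""

    def __init__(self, first_patch, lr=0.125, eps=1e-5):
        self.g = np.fft.fft2(gaussian_response(first_patch.shape))
        f = np.fft.fft2(first_patch)
        self.num = self.g * np.conj(f)   # numerator   A_1 = G . conj(F)
        self.den = f * np.conj(f) + eps  # denominator B_1 = F . conj(F) + eps
        self.lr = lr
        self.eps = eps

    def respond(self, patch):
        """Correlate the filter with a new patch; the argmax of the
        response map gives the target's translation."""
        f = np.fft.fft2(patch)
        return np.real(np.fft.ifft2(f * self.num / self.den))

    def update(self, patch):
        """Exponential running average over frames (learning rate lr)."""
        f = np.fft.fft2(patch)
        self.num = self.lr * self.g * np.conj(f) + (1 - self.lr) * self.num
        self.den = self.lr * (f * np.conj(f) + self.eps) + (1 - self.lr) * self.den

# Usage: shifting the patch by (3, -2) pixels moves the response peak
# from the centre (16, 16) to (19, 14).
patch = np.random.default_rng(1).standard_normal((32, 32))
tracker = MosseFilter(patch)
response = tracker.respond(np.roll(patch, (3, -2), axis=(0, 1)))
peak = np.unravel_index(np.argmax(response), response.shape)
```

All work happens element-wise in the Fourier domain, which is why correlation-filter trackers run at hundreds of frames per second.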

    In the past few years, deep learning architectures have been used successfully to produce very promising results on complicated tasks such as image classification and speech recognition. The key to this success is using deep architectures to learn rich invariant features via multiple nonlinear transformations. Wang et al. believe that visual tracking can benefit from deep learning for the same reasons, and they propose a novel deep learning tracker (DLT) for robust visual tracking. DLT uses a stacked denoising autoencoder (SDAE) to learn generic image features from a large image dataset as auxiliary data and then transfers the learned features to the online tracking task. They later bring the biologically inspired convolutional neural network (CNN) framework to visual tracking to address the challenge of limited labeled training data. Subsequently, Nam et al. propose a novel CNN architecture, referred to as the Multi-Domain Network (MDNet), which learns a shared representation of targets from multiple annotated video sequences, treating each video as a separate domain. In addition, Milan et al. present an approach based on recurrent neural networks (RNNs) to address the challenging problems of data association and trajectory estimation, and show that an RNN-based approach can learn complex motion models in realistic environments.

Tracking System

     A tracking system generally consists of four basic components:

  1. Motion Model. It describes how the location of the object evolves over time. Based on the estimate from the previous frame, the motion model generates a set of candidate regions or bounding boxes that may contain the target in the current frame.
  2. Feature Extraction. Features extracted from the candidate regions or bounding boxes are used for object representation. Common features include histogram, texture, color, Haar-like, and deep convolutional features.
  3. Appearance Model. The appearance model evaluates the likelihood that the object of interest is present in each candidate region. For object tracking, local appearance models are generally more robust than holistic ones.
  4. Online Update Mechanism. This mechanism controls the strategy and frequency of updating the appearance model. It has to strike a balance between model adaptation and drift.
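The four components above can be wired into a generic tracking loop. In this sketch every component function is a placeholder to be supplied by a concrete tracker; the toy components in the usage example are pure assumptions that score candidates by closeness to a known ground-truth track, just to exercise the loop:

```python
def track(frames, init_box, motion_model, extract, appearance, update):
    """Generic tracking loop over the four components listed above."""
    box = init_box
    trajectory = [box]
    for frame in frames[1:]:
        candidates = motion_model(box)                    # 1. motion model
        feats = [extract(frame, c) for c in candidates]   # 2. feature extraction
        scores = [appearance(f) for f in feats]           # 3. appearance model
        box = candidates[scores.index(max(scores))]
        update(frame, box)                                # 4. online update
        trajectory.append(box)
    return trajectory

# Toy usage: "frames" are just indices, and the target moves along a known
# straight line; the appearance model scores candidates by closeness to
# the true position, and the online update is a no-op.
truth = [(i, 2 * i) for i in range(5)]
frames = list(range(5))
motion_model = lambda box: [(box[0] + dx, box[1] + dy)
                            for dx in (0, 1, 2) for dy in (0, 1, 2, 3)]
extract = lambda frame, c: (frame, c)
appearance = lambda feat: -((feat[1][0] - truth[feat[0]][0]) ** 2
                            + (feat[1][1] - truth[feat[0]][1]) ** 2)
update = lambda frame, box: None
trajectory = track(frames, truth[0], motion_model, extract, appearance, update)
```

The same skeleton covers both families discussed earlier: a particle filter supplies a stochastic `motion_model` and a generative `appearance` score, while a correlation-filter tracker supplies a discriminative `appearance` score and a running-average `update`.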

Future Work

    Some research that can be carried out in the future includes:

  1. Reducing the search scope of the object in the motion model.
  2. Applying a visual selective attention mechanism to object tracking.
  3. Studying object tracking in discontinuous video with the aid of person re-identification.

Bibliography

[1] Bolme D S, Beveridge J R, Draper B A, et al. Visual object tracking using adaptive correlation filters[C]. Computer Vision and Pattern Recognition (CVPR), 2010: 2544-2550.

[2] Babenko B, Yang M, Belongie S J, et al. Robust object tracking with online multiple instance learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(8): 1619-1632.

[3] Yilmaz A, Javed O, Shah M, et al. Object tracking: a survey[J]. ACM Computing Surveys, 2006, 38(4).

[4] Andriluka M, Roth S, Schiele B, et al. People-tracking-by-detection and people-detection-by-tracking[C]. Computer Vision and Pattern Recognition (CVPR), 2008: 1-8.

[5] Kalal Z, Mikolajczyk K, Matas J, et al. Tracking-learning-detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7): 1409-1422.

[6] Truong M T, Pak M, Kim S, et al. Single object tracking using particle filter framework and saliency-based weighted color histogram[J]. Multimedia Tools and Applications, 2018, 77(22): 30067-30088.

[7] Zhou T, Ouyang Y, Wang R, et al. Particle filter based on real-time compressive tracking[C]. International Conference on Audio, Language and Image Processing, 2016: 754-759.

[8] Wang N, Yeung D Y. Learning a deep compact image representation for visual tracking[C]. Neural Information Processing Systems (NIPS), 2013: 809-817.

[9] Choi J, Chang H J, Yun S, et al. Attentional correlation filter network for adaptive visual tracking[C]. Computer Vision and Pattern Recognition (CVPR), 2017: 4828-4837.

[10] Milan A, Rezatofighi S H, Dick A R, et al. Online multi-target tracking using recurrent neural networks[C]. AAAI Conference on Artificial Intelligence, 2016: 4225-4232.

[11] Wang N, Shi J, Yeung D Y, et al. Understanding and diagnosing visual tracking systems[C]. International Conference on Computer Vision (ICCV), 2015: 3101-3109.

[12] Wang N, Li S, Gupta A, et al. Transferring rich feature hierarchies for robust visual tracking[J]. arXiv preprint, 2015.

[13] Wei J, Hongjuan L, Wei S, et al. A new particle filter object tracking algorithm based on dynamic transition model[C]. International Conference on Information and Automation, 2016: 1832-1835.

[14] Huang L, Ma B, Shen J, et al. Visual tracking by sampling in part space[J]. IEEE Transactions on Image Processing, 2017, 26(12): 5800-5810.

[15] Zhang K, Zhang L, Liu Q, et al. Fast visual tracking via dense spatio-temporal context learning[C]. European Conference on Computer Vision (ECCV), 2014: 127-141.

[16] Nam H, Han B. Learning multi-domain convolutional neural networks for visual tracking[C]. Computer Vision and Pattern Recognition (CVPR), 2016: 4293-4302.

[17] Henriques J F, Caseiro R, Martins P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596.

[18] Patnaik R, Casasent D. Fast FFT-based distortion-invariant kernel filters for general object recognition[C]. Proceedings of SPIE, 2009, 7252.

[19] Henriques J F, Caseiro R, Martins P, et al. Exploiting the circulant structure of tracking-by-detection with kernels[C]. European Conference on Computer Vision (ECCV), 2012: 702-715.

 

 
