Introduction to Visual Tracking


Definition of Visual Tracking

Detection of a new moving object, and the locations of the previous objects in the current frame.

--- from paper : Multi-kernel Object Tracking

The goal of object tracking is to find the targets between the consecutive frames in image sequences.

--- from paper : Efficient Mean-Shift Tracking via a New Similarity Measure


Challenges in Visual Tracking

Much effort has been made to solve the problem of real-time object tracking over the years. However, tracking algorithms still suffer from fundamental problems including drifts away from targets (partially due to change of viewpoint), inability to adapt to changes of object appearance, dependence on the first frame for template matching, instability to track objects under deformations (e.g. deformed contours), the inefficiency of Monte Carlo simulations for temporal tracking, and reliance on gradients by active contours, i.e. problems with similar intensities on the background and the object, or high gradient edges on the object itself. These problems are due to the complexity of the object dynamics. We also have to deal with difficult tracking conditions which include illumination changes, occlusions, changes of viewpoint, moving cameras and non-translational object motions like zooming and rotation.

--- from paper : Mean-Shift Tracking with Random Sampling


Categories of Visual Tracking Methods

Many tracking algorithms have been proposed and implemented to overcome difficulties that arise from noise, occlusion, clutter, and changes in the foreground objects or in the background environment. Gradient-based methods align tracked regions between successive frames by minimizing a cost function using various gradient descent techniques. Feature-based approaches extract features (such as intensity, colors, edges, contours) and use them to establish correspondence between model images and target images. Knowledge-based tracking algorithms incorporate a priori information about the tracked objects to obtain representations such as projected shape, skin complexion, body blobs, kinematic skeletons and silhouettes. Learning-based approaches apply pattern recognition algorithms to learn the objects either in the eigenspace or in the kernel space, and then search for targets in image sequences.

--- from paper : Efficient Mean-Shift Tracking via a New Similarity Measure


Framework Categories of Visual Tracking

Bottom-up and Top-down approaches are two kinds of methodologies to approach the visual tracking problem. Bottom-up approaches generally tend to construct object states by analyzing the content of images. Basically, many segmentation-based methods can be categorized as Bottom-up approaches. For example, blob tracking techniques group similar image pixels into blobs to estimate the positions and shapes of the target. On the contrary, Top-down approaches generate candidate hypotheses from the previous time frame based on a parametric representation of the target. Tracking is achieved by measuring and verifying these hypotheses against image observations. Many model-based and template-matching methods can be categorized as Top-down approaches. Bottom-up methods could be efficient, yet their robustness is largely limited by the ability of image analysis. On the other hand, Top-down approaches depend less on image analysis, but their performance is largely determined by hypothesis generation and verification.

--- from paper : A Co-inference Approach to Robust Visual Tracking
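The Bottom-up idea of grouping similar pixels into a blob and reading off the target's position and shape can be sketched as follows. This is only an illustrative toy, not the method of any cited paper: the function name `largest_blob`, the fixed intensity threshold, and the use of 4-connectivity are all assumptions made for the example.

```python
from collections import deque


def largest_blob(frame, threshold):
    """Return (centroid, bounding_box) of the largest 4-connected blob of
    pixels with intensity >= threshold, or None if no pixel qualifies.

    frame is a list of rows of numbers; the threshold rule is an assumed,
    deliberately simple notion of "similar pixels".
    """
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or frame[r][c] < threshold:
                continue
            # Flood-fill one blob starting from this seed pixel.
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and frame[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob
    if not best:
        return None
    ys = [p[0] for p in best]
    xs = [p[1] for p in best]
    centroid = (sum(ys) / len(ys), sum(xs) / len(xs))  # estimated position
    bbox = (min(ys), min(xs), max(ys), max(xs))        # estimated shape
    return centroid, bbox


frame = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 9, 0, 7],
    [0, 0, 0, 0, 0],
]
print(largest_blob(frame, threshold=5))
```

Running this per frame gives a target state derived purely from image content, which also shows why robustness depends on the image analysis: if the threshold fails to separate target from background, the estimated state is wrong.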


The Four Elements of Visual Tracking

Target representation: To discriminate the target from other objects, target representation, including the target's geometry, motion, appearance, etc., characterizes the target in a state space either explicitly or implicitly. It is a fundamental problem in computer vision.

Observation representation: Closely related to target representation, observation representation defines the image evidence of the object representation.

Hypotheses measurement: Hypotheses measurement evaluates the matching between hypotheses and image observations.

Hypotheses generating: Hypotheses generating is to produce new hypotheses based on the old estimation of the target's representation and the old observation. The target's dynamics could be embedded in such a predicting process. Intuitively, hypotheses generating characterizes the search range and confidence level of the tracking.

--- from paper : A Co-inference Approach to Robust Visual Tracking
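As a rough illustration of how the four elements fit together in a Top-down tracker, here is a minimal template-matching sketch. Everything in it is an assumption chosen for brevity rather than taken from the paper: the target is represented by its top-left position plus a fixed template, the observation is the raw image patch at that position, hypotheses are measured with the sum of squared differences (SSD), and hypotheses are generated as every position within a small radius of the previous estimate.

```python
def patch(frame, top, left, h, w):
    """Observation representation: the raw h x w image patch at (top, left)."""
    return [row[left:left + w] for row in frame[top:top + h]]


def ssd(a, b):
    """Hypothesis measurement: sum of squared differences (lower is better)."""
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb))


def generate_hypotheses(prev, radius, frame_shape, size):
    """Hypothesis generation: all in-bounds positions within `radius`
    of the previous estimate. The radius sets the search range."""
    (py, px), (rows, cols), (h, w) = prev, frame_shape, size
    return [(y, x)
            for y in range(max(0, py - radius), min(rows - h, py + radius) + 1)
            for x in range(max(0, px - radius), min(cols - w, px + radius) + 1)]


def track_step(frame, template, prev, radius=1):
    """One tracking step: generate hypotheses around `prev`, measure each
    against the template, and return the best-matching position."""
    h, w = len(template), len(template[0])
    shape = (len(frame), len(frame[0]))
    hypotheses = generate_hypotheses(prev, radius, shape, (h, w))
    return min(hypotheses,
               key=lambda s: ssd(patch(frame, s[0], s[1], h, w), template))


# Target representation: a 2x2 bright template, last seen at (1, 1).
template = [[9, 9], [9, 9]]
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0],
]
print(track_step(frame, template, prev=(1, 1)))
```

In this toy frame the target has moved one pixel down and to the right, so it is still inside the search range. With a larger displacement and the same radius, no generated hypothesis would cover the true position, which is the sense in which hypothesis generation bounds what the tracker can find.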