Notes on Dense Trajectories

bonny95

Tags: Computer Vision, descriptor

Article link: https://blog.csdn.net/bonny95/article/details/18123849

The dense trajectory pipeline has three steps:
- Densely sample feature points in each frame.
- Track the points through the video using optical flow.
- Compute multiple descriptors along the trajectories of the feature points to capture shape, appearance, and motion information.
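
As a rough illustration of the tracking step, the sketch below moves a point by the median-filtered flow around its current position, which is the tracking rule described in the dense trajectories paper cited at the end of these notes. The names `median_flow_at` and `track_point`, the nested-list flow fields `flow_u` and `flow_v` (indexed `[y][x]`), the component-wise median, and the 3×3 window are assumptions made for this example. The flow itself would come from a dense optical flow algorithm, which is not shown.

```python
from statistics import median


def median_flow_at(flow, x, y, radius=1):
    """Median of one flow component in a (2*radius+1)^2 window around (x, y)."""
    h, w = len(flow), len(flow[0])
    # Round to the nearest pixel and clamp to the image.
    xi = min(max(int(round(x)), 0), w - 1)
    yi = min(max(int(round(y)), 0), h - 1)
    window = [flow[v][u]
              for v in range(max(yi - radius, 0), min(yi + radius, h - 1) + 1)
              for u in range(max(xi - radius, 0), min(xi + radius, w - 1) + 1)]
    return median(window)


def track_point(point, flow_u, flow_v, radius=1):
    """Return the position of `point` in the next frame."""
    x, y = point
    return (x + median_flow_at(flow_u, x, y, radius),
            y + median_flow_at(flow_v, x, y, radius))
```

The median makes the update robust to isolated flow errors: a single outlier inside the window does not change the predicted position.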

Dense Sampling
- Sampling step size W=5 pixels
- Number of spatial scales: at most 8
- Spatial scale factor between consecutive scales: $1/\sqrt{2}$
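- Points in homogeneous areas are removed: a point is kept only if the smaller eigenvalue of its auto-correlation matrix exceeds the threshold

$$T = 0.001 \times \max_{i \in I} \min(\lambda_i^1, \lambda_i^2)$$

where $(\lambda_i^1, \lambda_i^2)$ are the eigenvalues of the auto-correlation matrix at point $i$ in image $I$.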
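
The sampling rule can be sketched in plain Python as follows. This is a minimal illustration, not the released implementation: the function names, the central-difference gradients, the 3×3 summation window for the auto-correlation matrix, and the grid offset of `step // 2` are all assumptions made for the example. The image is a nested list of gray values indexed `[y][x]`.

```python
import math


def min_eigenvalue_map(img, radius=1):
    """Smaller eigenvalue of the 2x2 auto-correlation matrix at every pixel."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border.
            gx[y][x] = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy[y][x] = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
    lam = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a = b = c = 0.0  # entries of the matrix [[a, b], [b, c]]
            for v in range(max(y - radius, 0), min(y + radius, h - 1) + 1):
                for u in range(max(x - radius, 0), min(x + radius, w - 1) + 1):
                    a += gx[v][u] * gx[v][u]
                    b += gx[v][u] * gy[v][u]
                    c += gy[v][u] * gy[v][u]
            # Closed-form smaller eigenvalue of a symmetric 2x2 matrix.
            lam[y][x] = (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return lam


def dense_sample(img, step=5, quality=0.001):
    """Grid points spaced `step` pixels apart that pass the eigenvalue test."""
    lam = min_eigenvalue_map(img)
    threshold = quality * max(max(row) for row in lam)
    h, w = len(img), len(img[0])
    return [(x, y)
            for y in range(step // 2, h, step)
            for x in range(step // 2, w, step)
            if lam[y][x] > threshold]
```

On an image whose left half is constant, the grid points in that half have a zero eigenvalue and are dropped, while points in a textured region survive.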

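The shape of a trajectory is described by its sequence of displacement vectors, normalized by the sum of their magnitudes:

$$\mathrm{Trajectory} = \frac{(\Delta P_t, \ldots, \Delta P_{t+L-1})}{\sum_{j=t}^{t+L-1} \lVert \Delta P_j \rVert}$$

where $L$ is the length of the trajectory, and the displacement vectors are $\Delta P_t = P_{t+1} - P_t = (x_{t+1} - x_t,\; y_{t+1} - y_t)$.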
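
The trajectory shape descriptor in the dense trajectories paper divides the displacement vectors $\Delta P_t = P_{t+1} - P_t$ by the sum of their magnitudes. A minimal sketch of that normalization, assuming the tracked points are given as a list of `(x, y)` tuples (the function name and the all-zero fallback for a static trajectory are choices made for this example):

```python
import math


def trajectory_descriptor(points):
    """Normalized displacement vector for a trajectory of L+1 tracked points."""
    displacements = [(x1 - x0, y1 - y0)
                     for (x0, y0), (x1, y1) in zip(points, points[1:])]
    total = sum(math.hypot(dx, dy) for dx, dy in displacements)
    if total == 0.0:
        # A static trajectory has no shape; return zeros instead of dividing by 0.
        return [0.0] * (2 * len(displacements))
    # Flatten to (dx_1, dy_1, ..., dx_L, dy_L) and normalize.
    return [c / total for d in displacements for c in d]
```

A trajectory with L displacements yields 2L values, so L = 15 gives a 30-dimensional vector.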

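- HOG (histograms of oriented gradients) – static appearance information
- HOF (histograms of optical flow) – local motion information
- MBH (motion boundary histograms) – motion descriptor based on the gradients of the optical flow, computed separately for the horizontal and vertical flow components (MBHx, MBHy)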
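
To make the gradient-based descriptors more concrete, here is a simplified sketch of an orientation histogram over a 2D scalar field. Applied to image intensities it gives a HOG-like histogram; applied to the horizontal or vertical flow component it gives an MBHx-like or MBHy-like histogram. It is only an illustration: the function name is made up, a single cell is used instead of a spatio-temporal grid of cells, votes are hard-assigned to one of 8 bins, and the histogram is L2-normalized.

```python
import math


def orientation_histogram(field, bins=8):
    """Magnitude-weighted histogram of gradient orientations of a 2D field."""
    h, w = len(field), len(field[0])
    hist = [0.0] * bins
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the border.
            gx = (field[y][min(x + 1, w - 1)] - field[y][max(x - 1, 0)]) / 2.0
            gy = (field[min(y + 1, h - 1)][x] - field[max(y - 1, 0)][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            hist[int(angle / (2.0 * math.pi) * bins) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0.0 else hist
```

Because MBH works on the gradient of the flow rather than the flow itself, a constant flow field (for example, a uniform camera translation) produces an all-zero histogram.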

Format of DTF features

Features are written one per line, in the following format:

frameNum mean_x mean_y var_x var_y length scale x_pos y_pos t_pos Trajectory HOG HOF MBHx MBHy

The first 10 elements are information about the trajectory:

frameNum: The frame on which the trajectory ends
mean_x: The mean of the x coordinates of the trajectory
mean_y: The mean of the y coordinates of the trajectory
var_x: The variance of the x coordinates of the trajectory
var_y: The variance of the y coordinates of the trajectory
length: The length of the trajectory
scale: The spatial scale on which the trajectory is computed
x_pos: The normalized x position w.r.t. the video (0 to 0.999), used for the spatio-temporal pyramid
y_pos: The normalized y position w.r.t. the video (0 to 0.999), used for the spatio-temporal pyramid
t_pos: The normalized t position w.r.t. the video (0 to 0.999), used for the spatio-temporal pyramid

The remaining elements are the five descriptors, concatenated in order: Trajectory, HOG, HOF, MBHx, MBHy.
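
A line in this format can be split into named fields with a few lines of Python. The sketch below is illustrative only: `parse_feature_line` is a made-up name, and the descriptor sizes in `DEFAULT_DIMS` (30, 96, 108, 96, 96) are an assumption corresponding to the tool's default parameters; pass different sizes if the features were extracted with other settings.

```python
INFO_FIELDS = ["frameNum", "mean_x", "mean_y", "var_x", "var_y",
               "length", "scale", "x_pos", "y_pos", "t_pos"]

# Assumed default descriptor sizes; they change with the extraction parameters.
DEFAULT_DIMS = (("Trajectory", 30), ("HOG", 96), ("HOF", 108),
                ("MBHx", 96), ("MBHy", 96))


def parse_feature_line(line, dims=DEFAULT_DIMS):
    """Split one whitespace-separated feature line into a dict of fields."""
    values = [float(v) for v in line.split()]
    expected = len(INFO_FIELDS) + sum(n for _, n in dims)
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    record = dict(zip(INFO_FIELDS, values))
    offset = len(INFO_FIELDS)
    for name, n in dims:
        record[name] = values[offset:offset + n]
        offset += n
    return record
```

Checking the total number of values up front catches a mismatch between the assumed descriptor sizes and the actual file early, instead of silently mis-slicing the descriptors.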

Explicit camera motion estimation
Assumption: two consecutive frames are related by a homography.
- Match feature points between frames using SURF descriptors and dense optical flow.
- Remove inconsistent matches caused by human motion: a human detector is used to discard matches that fall inside human regions (computationally expensive).
- Estimate the homography from the remaining matches with RANSAC.
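
The RANSAC step can be sketched in pure Python as below. This is a teaching example under stated assumptions, not the authors' code: all function names are made up, matches are `((x, y), (u, v))` pairs of points in consecutive frames, each hypothesis is an exact 4-point homography with the last entry fixed to 1, and the iteration count and the 1-pixel tolerance are arbitrary. In practice a library routine would normally be used, followed by a refit on all inliers.

```python
import random


def solve_linear(a, b, eps=1e-9):
    """Gauss-Jordan elimination with partial pivoting; None if singular."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < eps:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4(pairs):
    """Homography (row-major, h[8] = 1) mapping four source points to targets."""
    a, b = [], []
    for (x, y), (u, v) in pairs:
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return None if h is None else h + [1.0]


def project(h, p):
    """Apply homography h to point p; None if p maps to infinity."""
    x, y = p
    w = h[6] * x + h[7] * y + h[8]
    if abs(w) < 1e-12:
        return None
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


def ransac_homography(matches, iters=200, tol=1.0, seed=0):
    """Return (homography, inlier matches) with the largest consensus set."""
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    for _ in range(iters):
        h = homography_from_4(rng.sample(matches, 4))
        if h is None:
            continue  # degenerate sample, e.g. collinear points
        inliers = []
        for src, dst in matches:
            p = project(h, src)
            if p is None:
                continue
            dx, dy = p[0] - dst[0], p[1] - dst[1]
            if dx * dx + dy * dy <= tol * tol:
                inliers.append((src, dst))
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

Matches that move with the camera agree on one homography and form the consensus set; matches on independently moving objects, such as people, end up as outliers. This is why removing human regions beforehand makes the estimate more reliable when a person fills much of the frame.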

H. Wang, C. Schmid. Action recognition with improved trajectories. ICCV 2013.
H. Wang, A. Kläser, C. Schmid, C.-L. Liu. Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision, 103(1), pp. 60-79, May 2013.