A detailed walkthrough of configuring the VOT tracking dataset environment (with tracker integration code)

Background: I recently wanted to test my own algorithm on the VOT dataset and spent several days learning how to set it up, running into a great many problems along the way, most of which I eventually resolved one by one. Using VOT2016 as the example, this post records my configuration process in detail, partly so I don't forget it later and partly in the hope that it helps anyone who comes across it.

Environment: Windows 10 + MATLAB 2018a

VOT official website: https://votchallenge.net/

Before configuring, download the following:

1. Download and extract the VOT2016 dataset. Link: https://pan.baidu.com/s/15gXQWUUa8FY7EYjHfgiUkw (extraction code: caa1). Unzip it locally; I keep mine under E:\VOT\VOT2016\.

2. Download the VOT toolkit (vot-toolkit): https://github.com/votchallenge/vot-toolkit, and unzip it locally.

3. Download trax: https://github.com/votchallenge/trax, and unzip it locally.

4. Download the official pre-integrated examples: https://github.com/votchallenge/integration, and unzip them locally.

With these four steps done, the prerequisites are in place. I split the remaining configuration into the following three parts:

Part 1: Organizing the file structure inside vot-toolkit

1. Create two new folders inside vot-toolkit, named native and vot_workspace.

2. Copy the trax code downloaded in step 3 above into the native folder, giving the path .\vot-toolkit-master\native\trax.

3. Copy the matlab, native, and python folders from the integration repository downloaded in step 4 into .\vot-toolkit-master\tracker\examples. The resulting layout should look roughly like the sketch below.
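Since the original screenshots are not reproduced here, this is the directory layout the three steps above should leave you with:

vot-toolkit-master/
├── native/
│   └── trax/
├── tracker/
│   └── examples/
│       ├── matlab/
│       ├── native/
│       └── python/
├── vot_workspace/
└── ... (the rest of the toolkit files)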

That completes Part 1, organizing the vot-toolkit file structure.

Part 2: Running the official ncc example

The project ships a tracker, ncc, that is already integrated with vot-toolkit. If you can run it end to end, the VOT environment is configured correctly, and integrating your own tracker later follows the same procedure.

1. Run toolkit_path.m (it adds the toolkit directories to the MATLAB path).

2. Locate workspace_create.m in the toolkit's workspace folder, then set MATLAB's current folder to vot_workspace and run workspace_create.m from there.

3. The script then asks you at the command line to choose a dataset (enter 5), to name the tracker you want to run (enter ncc), and to pick a programming language (enter 1). If all the prompts complete without errors, everything is fine so far; the session looks roughly like the sketch below.
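A rough sketch of the interactive session (prompt wording paraphrased from memory; only the three inputs matter):

>> workspace_create
% Select dataset stack:         5    (vot2016)
% Enter a tracker identifier:   ncc
% Select the tracker language:  1    (matlab)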

4. Open tracker_ncc.m and make the following changes (the edited result is sketched after this list):

    (1) Comment out line 2: error('Tracker not configured! Please edit the tracker_ncc.m file.');

    (2) On line 7, change tracker_label = []; to tracker_label = ['ncc'];

    (3) On line 17, change tracker_command = generate_matlab_command('<TODO: set script name>', {'<TODO: set script path>'}); to

        tracker_command = generate_matlab_command('ncc', {'D:\matlab\workspace\vot-toolkit-master\tracker\examples\matlab'});

        Here 'ncc' corresponds to .\vot-toolkit-master\tracker\examples\matlab\ncc.m, and the trailing argument is the folder containing ncc.m; change it to match your own setup.
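After these edits, the relevant lines of tracker_ncc.m read as follows (the folder path is from my machine, so substitute your own):

% tracker_ncc.m, after editing
% error('Tracker not configured! Please edit the tracker_ncc.m file.');

tracker_label = ['ncc'];

tracker_command = generate_matlab_command('ncc', ...
    {'D:\matlab\workspace\vot-toolkit-master\tracker\examples\matlab'});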

5. Open workspace_load.m and, on line 142, set sequences_directory to the folder where you stored the VOT2016 dataset (see below).
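Using the dataset location from the beginning of this post, the edited line would look like this sketch (the exact original right-hand side in your copy of workspace_load.m may differ):

% workspace_load.m, around line 142
sequences_directory = 'E:\VOT\VOT2016\';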

6. With those changes made, run run_test.m. If all goes well, the command window lists the 60 VOT2016 sequences; enter a sequence number and a tracking window appears. Inside that window the cursor becomes a crosshair, and you click once to advance to the next frame.

7. Once the test succeeds, run run_experiments.m to run the tracker over the entire VOT2016 dataset; the predicted results are saved as txt files for later comparison and evaluation. In addition, you can append experiments{1,1}.parameters.repetitions = 1; after line 6 of run_experiments.m so that the algorithm runs each video only once, which saves time (see the sketch below).
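A sketch of that edit; by default the baseline experiment repeats each sequence multiple times, so forcing a single repetition speeds things up considerably:

% run_experiments.m, after line 6 (where the experiments are set up)
experiments{1,1}.parameters.repetitions = 1;  % run each sequence only once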

One problem here that I have not solved: shortly after run_experiments.m starts, a dialog pops up reporting that MATLAB has crashed, and MATLAB then closes itself. Restarting MATLAB and running run_experiments again resumes the run without further crashes, but I still have not figured out the cause. (Some bloggers say it happens when the MATLAB version is too new and recommend 2018a or earlier, but I am already on 2018a.) It does not prevent normal use, but if you know the reason, please let me know.

8. When all sequences have finished, the results folder contains one txt file per sequence, holding the target positions predicted by the algorithm. To analyze and evaluate these results, make the following changes before running run_analysis.m (the edited result is sketched after the list):

    (1) Comment out line 9: error('Analysis not configured! Please edit run_analysis.m file.');

    (2) On line 11, change trackers = tracker_list('ncc', 'TODO'); to trackers = tracker_list('ncc');
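After editing, the relevant lines of run_analysis.m read:

% run_analysis.m, after editing
% error('Analysis not configured! Please edit run_analysis.m file.');

trackers = tracker_list('ncc');  % analyze only the ncc tracker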

9. Run run_analysis.m. After a while, a reports directory is created automatically containing the analysis results; open it and you will find the kinds of plots and tables usually seen in papers.

If you can complete all the steps above, the VOT environment is configured successfully, and you have run and evaluated the official ncc example.

Part 3: Integrating the algorithm you actually want to evaluate

Running and evaluating another algorithm is largely the same as running ncc; the crucial part is writing the interface (integration) function. After writing one or two yourself, you will notice they all follow the same pattern; a minimal skeleton of that pattern is sketched below, followed by the interface functions I wrote, modeled on the existing examples, which successfully integrate KCF and BACF with VOT:
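A minimal sketch of the common pattern. The my_tracker_init and my_tracker_update helpers are hypothetical placeholders for your own tracker's initialization and per-frame update:

function VOT_skeleton
% Minimal VOT MATLAB integration pattern (sketch, not a full tracker).
cleanup = onCleanup(@() exit());                 % VOT: always exit MATLAB at the end
[handle, image, region] = vot('rectangle');      % VOT: get first frame + init region
state = my_tracker_init(imread(image), region);  % hypothetical tracker init
while true
    [handle, image] = handle.frame(handle);      % VOT: fetch path of next frame
    if isempty(image)
        break;                                   % sequence finished
    end
    [state, region] = my_tracker_update(state, imread(image));  % hypothetical update
    handle = handle.report(handle, region);      % VOT: report region for this frame
end
handle.quit(handle);                             % VOT: flush results
end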

function VOT_KCF
% VOT integration 
% 
% *************************************************************
% VOT: Always call exit command at the end to terminate Matlab!
% *************************************************************
cleanup = onCleanup(@() exit() );

% *************************************************************
% VOT: Set random seed to a different value every time.
% *************************************************************
RandStream.setGlobalStream(RandStream('mt19937ar', 'Seed', sum(clock)));

% **********************************
% VOT: Get initialization data
% **********************************
[handle, image, region] = vot('rectangle');

% VOT supplies region as [x, y, width, height]; convert it to KCF's
% conventions: target size as [height, width] plus the target center.
target_sz = [region(1,4), region(1,3)];
pos = [region(1,2), region(1,1)] + floor(target_sz/2);
frame = 1;

%default settings
kernel.type = 'gaussian';
feature_type = 'hog';
features.gray = false;
features.hog = false;
padding = 1.5;  %extra area surrounding the target
lambda = 1e-4;  %regularization
output_sigma_factor = 0.1;  %spatial bandwidth (proportional to target)

switch feature_type
    case 'gray'
        interp_factor = 0.075;  %linear interpolation factor for adaptation
        
        kernel.sigma = 0.2;  %gaussian kernel bandwidth
        
        kernel.poly_a = 1;  %polynomial kernel additive term
        kernel.poly_b = 7;  %polynomial kernel exponent
        
        features.gray = true;
        cell_size = 1;
        
    case 'hog'
        interp_factor = 0.02;
        
        kernel.sigma = 0.5;
        
        kernel.poly_a = 1;
        kernel.poly_b = 9;
        
        features.hog = true;
        features.hog_orientations = 9;
        cell_size = 4;
        
    otherwise
        error('Unknown feature.')
end

%if the target is large, lower the resolution, we don't need that much detail
resize_image = (sqrt(prod(target_sz)) >= 100);  %diagonal size >= threshold
if resize_image
    pos = floor(pos / 2);
    target_sz = floor(target_sz / 2);
end

%window size, taking padding into account
window_sz = floor(target_sz * (1 + padding));

%create regression labels, gaussian shaped, with a bandwidth
%proportional to target size
output_sigma = sqrt(prod(target_sz)) * output_sigma_factor / cell_size;
yf = fft2(gaussian_shaped_labels(output_sigma, floor(window_sz / cell_size)));

%store pre-computed cosine window
cos_window = hann(size(yf,1)) * hann(size(yf,2))';

while true
    % *********************************
    % VOT: Get next frame
    % *********************************
    if frame > 1
        [handle, image] = handle.frame(handle);
        if isempty(image)
            break;
        end
    end
    
    im = imread(image);
    
    if size(im,3) > 1
        im = rgb2gray(im);
    end
    if resize_image
        im = imresize(im, 0.5);
    end
    
    if frame > 1
        %obtain a subwindow for detection at the position from last
        %frame, and convert to Fourier domain (its size is unchanged)
        patch = get_subwindow(im, pos, window_sz);
        zf = fft2(get_features(patch, features, cell_size, cos_window));
        
        %calculate response of the classifier at all shifts
        switch kernel.type
            case 'gaussian'
                kzf = gaussian_correlation(zf, model_xf, kernel.sigma);
            case 'polynomial'
                kzf = polynomial_correlation(zf, model_xf, kernel.poly_a, kernel.poly_b);
            case 'linear'
                kzf = linear_correlation(zf, model_xf);
        end
        response = real(ifft2(model_alphaf .* kzf));  %equation for fast detection
        
        % response_inv = fftshift(response);
        
        %target location is at the maximum response. we must take into
        %account the fact that, if the target doesn't move, the peak
        %will appear at the top-left corner, not at the center (this is
        %discussed in the paper). the responses wrap around cyclically.
        [vert_delta, horiz_delta] = find(response == max(response(:)), 1);
        if vert_delta > size(zf,1) / 2  %wrap around to negative half-space of vertical axis
            vert_delta = vert_delta - size(zf,1);
        end
        if horiz_delta > size(zf,2) / 2  %same for horizontal axis
            horiz_delta = horiz_delta - size(zf,2);
        end
        pos = pos + cell_size * [vert_delta - 1, horiz_delta - 1];
    end
    
    %obtain a subwindow for training at newly estimated target position
    patch = get_subwindow(im, pos, window_sz);
    xf = fft2(get_features(patch, features, cell_size, cos_window));
    
    %Kernel Ridge Regression, calculate alphas (in Fourier domain)
    switch kernel.type
        case 'gaussian'
            kf = gaussian_correlation(xf, xf, kernel.sigma);
        case 'polynomial'
            kf = polynomial_correlation(xf, xf, kernel.poly_a, kernel.poly_b);
        case 'linear'
            kf = linear_correlation(xf, xf);
    end
    alphaf = yf ./ (kf + lambda);   %equation for fast training
    
    if frame == 1  %first frame, train with a single image
        model_alphaf = alphaf;
        model_xf = xf;
    else
        model_alphaf = (1 - interp_factor) * model_alphaf + interp_factor * alphaf;
        model_xf = (1 - interp_factor) * model_xf + interp_factor * xf;
    end
    
    region = [pos([2,1]) - target_sz([2,1])/2, target_sz([2,1])];
    
    if resize_image
        region = region * 2;
    end
    
    % **********************************
    % VOT: Report position for frame
    % **********************************
    if frame > 1
        handle = handle.report(handle, region);
    end
    frame = frame + 1;
    
end
% **********************************
% VOT: Output the results
% **********************************
handle.quit(handle);

end
function VOT_BACF
% VOT integration
%
% *************************************************************
% VOT: Always call exit command at the end to terminate Matlab!
% *************************************************************
cleanup = onCleanup(@() exit() );

% *************************************************************
% VOT: Set random seed to a different value every time.
% *************************************************************
RandStream.setGlobalStream(RandStream('mt19937ar', 'Seed', sum(clock)));

% **********************************
% VOT: Get initialization data
% **********************************
[handle, image, region] = vot('rectangle');

% Same region -> ([height, width] size, center position) conversion as in VOT_KCF.
target_sz = [region(1,4), region(1,3)];
pos = [region(1,2), region(1,1)] + floor(target_sz/2);
frame = 1;


% parameters setting

% HOG feature parameters
hog_params.nDim = 31;
% Grayscale feature parameters
grayscale_params.colorspace = 'gray';
grayscale_params.nDim = 1;
% Global feature parameters
features = {
    ...struct('getFeature',@get_colorspace, 'fparams',grayscale_params),...  % Grayscale is not used as default
    struct('getFeature',@get_fhog,'fparams',hog_params),...
    };
params.t_global.cell_size = 4;                  % Feature cell size
params.t_global.cell_selection_thresh = 0.75^2; % Threshold for reducing the cell size in low-resolution cases
% Search region + extended background parameters
params.search_area_shape = 'square';    % the shape of the training/detection window: 'proportional', 'square' or 'fix_padding'
search_area_scale = 5;           % the size of the training/detection area proportional to the target size
filter_max_area   = 50^2;        % the size of the training/detection area in feature grid cells
% Learning parameters
learning_rate       = 0.0125;        % learning rate
output_sigma_factor = 1/16;		% standard deviation of the desired correlation output (proportional to target)
% Detection parameters
interpolate_response  = 4;        % correlation score interpolation strategy: 0 - off, 1 - feature grid, 2 - pixel grid, 4 - Newton's method
params.newton_iterations     = 50;           % number of Newton's iteration to maximize the detection scores
% Scale parameters
nScales =  5;
scale_step = 1.01;
% size, position, frames initialization
init_target_sz = target_sz;
% ADMM parameters: number of iterations and lambda; mu and betha are set
% inside the tracking loop below.
params.admm_iterations = 2;
params.admm_lambda = 0.01;

%set the feature ratio to the feature-cell size
featureRatio = params.t_global.cell_size;
search_area = prod(init_target_sz / featureRatio * search_area_scale);

% when the number of cells are small, choose a smaller cell size
if isfield(params.t_global, 'cell_selection_thresh')
    if search_area < params.t_global.cell_selection_thresh * filter_max_area
        params.t_global.cell_size = min(featureRatio, max(1, ceil(sqrt(prod(init_target_sz * search_area_scale)/(params.t_global.cell_selection_thresh * filter_max_area)))));
        
        featureRatio = params.t_global.cell_size;
        search_area = prod(init_target_sz / featureRatio * search_area_scale);
    end
end

global_feat_params = params.t_global;

if search_area > filter_max_area
    currentScaleFactor = sqrt(search_area / filter_max_area);
else
    currentScaleFactor = 1.0;
end

% target size at the initial scale
base_target_sz = target_sz / currentScaleFactor;

% window size, taking padding into account
switch params.search_area_shape
    case 'proportional'
        sz = floor( base_target_sz * search_area_scale);     % proportional area, same aspect ratio as the target
    case 'square'
        sz = repmat(sqrt(prod(base_target_sz * search_area_scale)), 1, 2); % square area, ignores the target aspect ratio
    case 'fix_padding'
        sz = base_target_sz + sqrt(prod(base_target_sz * search_area_scale) + (base_target_sz(1) - base_target_sz(2))/4) - sum(base_target_sz)/2; % const padding
    otherwise
        error('Unknown "params.search_area_shape". Must be ''proportional'', ''square'' or ''fix_padding''');
end

% set the size to exactly match the cell size
sz = round(sz / featureRatio) * featureRatio;
use_sz = floor(sz/featureRatio);

% construct the label function- correlation output, 2D gaussian function,
% with a peak located upon the target
output_sigma = sqrt(prod(floor(base_target_sz/featureRatio))) * output_sigma_factor;
rg           = circshift(-floor((use_sz(1)-1)/2):ceil((use_sz(1)-1)/2), [0 -floor((use_sz(1)-1)/2)]);
cg           = circshift(-floor((use_sz(2)-1)/2):ceil((use_sz(2)-1)/2), [0 -floor((use_sz(2)-1)/2)]);
[rs, cs]     = ndgrid(rg,cg);
y            = exp(-0.5 * (((rs.^2 + cs.^2) / output_sigma^2)));
yf           = fft2(y); %   FFT of y.

if interpolate_response == 1
    interp_sz = use_sz * featureRatio;
else
    interp_sz = use_sz;
end

% construct cosine window
cos_window = single(hann(use_sz(1))*hann(use_sz(2))');

im = imread(image);


if size(im,3) == 3
    if all(all(im(:,:,1) == im(:,:,2)))
        colorImage = false;
    else
        colorImage = true;
    end
else
    colorImage = false;
end

% compute feature dimensionality
feature_dim = 0;
for n = 1:length(features)
    
    if ~isfield(features{n}.fparams,'useForColor')
        features{n}.fparams.useForColor = true;
    end
    
    if ~isfield(features{n}.fparams,'useForGray')
        features{n}.fparams.useForGray = true;
    end
    
    if (features{n}.fparams.useForColor && colorImage) || (features{n}.fparams.useForGray && ~colorImage)
        feature_dim = feature_dim + features{n}.fparams.nDim;
    end
end

if size(im,3) > 1 && colorImage == false
    im = im(:,:,1);
end

if nScales > 0
    scale_exp = (-floor((nScales-1)/2):ceil((nScales-1)/2));
    scaleFactors = scale_step .^ scale_exp;
    min_scale_factor = scale_step ^ ceil(log(max(5 ./ sz)) / log(scale_step));
    max_scale_factor = scale_step ^ floor(log(min([size(im,1) size(im,2)] ./ base_target_sz)) / log(scale_step));
end

if interpolate_response >= 3
    % Pre-computes the grid that is used for score optimization
    ky = circshift(-floor((use_sz(1) - 1)/2) : ceil((use_sz(1) - 1)/2), [1, -floor((use_sz(1) - 1)/2)]);
    kx = circshift(-floor((use_sz(2) - 1)/2) : ceil((use_sz(2) - 1)/2), [1, -floor((use_sz(2) - 1)/2)])';
    newton_iterations = params.newton_iterations;
end

% allocate memory for multi-scale tracking
multires_pixel_template = zeros(sz(1), sz(2), size(im,3), nScales, 'uint8');
small_filter_sz = floor(base_target_sz/featureRatio);

while true
    % *********************************
    % VOT: Get next frame
    % *********************************
    if frame > 1
        [handle, image] = handle.frame(handle);
        if isempty(image)
            break;
        end
    end
    im = imread(image);
    
    if size(im,3) > 1 && colorImage == false
        im = im(:,:,1);
    end
    %do not estimate translation and scaling on the first frame, since we
    %just want to initialize the tracker there
    if frame > 1
        for scale_ind = 1:nScales
            multires_pixel_template(:,:,:,scale_ind) = ...
                get_pixels(im, pos, round(sz*currentScaleFactor*scaleFactors(scale_ind)), sz);
        end
        
        xtf = fft2(bsxfun(@times,get_features(multires_pixel_template,features,global_feat_params),cos_window));
        responsef = permute(sum(bsxfun(@times, conj(g_f), xtf), 3), [1 2 4 3]);
        
        % if we undersampled features, we want to interpolate the
        % response so it has the same size as the image patch
        if interpolate_response == 2
            % use dynamic interp size
            interp_sz = floor(size(y) * featureRatio * currentScaleFactor);
        end
        responsef_padded = resizeDFT2(responsef, interp_sz);
        
        % response in the spatial domain
        response = ifft2(responsef_padded, 'symmetric');
        
        % find maximum peak
        if interpolate_response == 3
            error('Invalid parameter value for interpolate_response');
        elseif interpolate_response == 4
            [disp_row, disp_col, sind] = resp_newton(response, responsef_padded, newton_iterations, ky, kx, use_sz);
        else
            [row, col, sind] = ind2sub(size(response), find(response == max(response(:)), 1));
            disp_row = mod(row - 1 + floor((interp_sz(1)-1)/2), interp_sz(1)) - floor((interp_sz(1)-1)/2);
            disp_col = mod(col - 1 + floor((interp_sz(2)-1)/2), interp_sz(2)) - floor((interp_sz(2)-1)/2);
        end
        % calculate translation
        switch interpolate_response
            case 0
                translation_vec = round([disp_row, disp_col] * featureRatio * currentScaleFactor * scaleFactors(sind));
            case 1
                translation_vec = round([disp_row, disp_col] * currentScaleFactor * scaleFactors(sind));
            case 2
                translation_vec = round([disp_row, disp_col] * scaleFactors(sind));
            case 3
                translation_vec = round([disp_row, disp_col] * featureRatio * currentScaleFactor * scaleFactors(sind));
            case 4
                translation_vec = round([disp_row, disp_col] * featureRatio * currentScaleFactor * scaleFactors(sind));
        end
        
        translation_vec = double(translation_vec);
        
        % set the scale
        currentScaleFactor = currentScaleFactor * scaleFactors(sind);
        % adjust to make sure we are not too large or too small
        if currentScaleFactor < min_scale_factor
            currentScaleFactor = min_scale_factor;
        elseif currentScaleFactor > max_scale_factor
            currentScaleFactor = max_scale_factor;
        end
        %         dlmwrite('D:\matlab\workspace\vot-toolkit-master\vot_workspace\x.txt', translation_vec);
        %         dlmwrite('D:\matlab\workspace\vot-toolkit-master\vot_workspace\class.txt', class(translation_vec));
        % update position
        pos = pos + translation_vec;
    end
    
    % extract training sample image region
    pixels = get_pixels(im,pos,round(sz*currentScaleFactor),sz);
    % extract features and do windowing
    xf = fft2(bsxfun(@times,get_features(pixels,features,global_feat_params),cos_window));
    
    if (frame == 1)
        model_xf = xf;
    else
        model_xf = ((1 - learning_rate) * model_xf) + (learning_rate * xf);
    end
    
    g_f = single(zeros(size(xf)));
    h_f = g_f;
    l_f = g_f;
    mu    = 1;
    betha = 10;
    mumax = 1000;
    i = 1;
    
    T = prod(use_sz);
    S_xx = sum(conj(model_xf) .* model_xf, 3);
    params.admm_iterations = 2;
    %   ADMM
    while (i <= params.admm_iterations)
        %   solve for G- please refer to the paper for more details
        B = S_xx + (T * mu);
        S_lx = sum(conj(model_xf) .* l_f, 3);
        S_hx = sum(conj(model_xf) .* h_f, 3);
        g_f = (((1/(T*mu)) * bsxfun(@times, yf, model_xf)) - ((1/mu) * l_f) + h_f) - ...
            bsxfun(@rdivide,(((1/(T*mu)) * bsxfun(@times, model_xf, (S_xx .* yf))) - ((1/mu) * bsxfun(@times, model_xf, S_lx)) + (bsxfun(@times, model_xf, S_hx))), B);
        
        %   solve for H
        h = (T/((mu*T)+ params.admm_lambda))* ifft2((mu*g_f) + l_f);
        [sx,sy,h] = get_subwindow_no_window(h, floor(use_sz/2) , small_filter_sz);
        t = single(zeros(use_sz(1), use_sz(2), size(h,3)));
        t(sx,sy,:) = h;
        h_f = fft2(t);
        
        %   update L
        l_f = l_f + (mu * (g_f - h_f));
        
        %   update mu- betha = 10.
        mu = min(betha * mu, mumax);
        i = i+1;
    end
    
    target_sz = floor(base_target_sz * currentScaleFactor);
    
    region = [pos([2,1]) - floor(target_sz([2,1])/2), target_sz([2,1])];
    
    % **********************************
    % VOT: Report position for frame
    % **********************************
    if frame > 1
        handle = handle.report(handle, region);
    end
    frame = frame + 1;
    
end
% **********************************
% VOT: Output the results
% **********************************
handle.quit(handle);

end

A personal tip:

You will probably hit errors repeatedly while getting this running. If a web search shows that many other people have hit the same error, someone has likely posted a solution you can try. If almost nobody has seen your error, or the posted fixes don't apply to you, try clearing MATLAB's toolbox cache with the command rehash toolboxcache, or simply restart MATLAB; this works surprisingly often.


Update 2021-05-25:

When the vot-toolkit hits an error, it gives no indication of where the error comes from (you cannot debug with breakpoints), so I recommend this post: https://blog.csdn.net/TS____4/article/details/88732647?spm=1001.2014.3001.5501, which describes a way to locate the failing line in your code. When I ran into an error earlier, I tracked down its location by printing variables out; the cause turned out to be a line that called a function whose folder had not been added to the MATLAB path, and a single addpath fixed it. Two sketches of these techniques follow.
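Both paths below are hypothetical examples:

% 1) Poor man's debugging: since breakpoints are unavailable in the
%    MATLAB process the toolkit spawns, dump intermediate values to a file.
dlmwrite('C:\temp\debug_translation.txt', translation_vec);  % hypothetical path

% 2) The fix in my case: put the missing helper function's folder on the
%    MATLAB path at the top of the integration function.
addpath('D:\matlab\workspace\BACF\external');  % hypothetical path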
