MATLAB Multi-Object Tracking Examples (2): Tracking Pedestrians from a Moving Car

Example URL: https://cn.mathworks.com/help/vision/examples/tracking-pedestrians-from-a-moving-car.html


Compared with the previous post, Motion-Based Multiple Object Tracking:

Similarity: the overall structure is the same. Both first detect the targets, then use a Kalman filter to predict their locations in the next frame, and finally associate tracks with detections.


The differences are:

a) The detector is switched from the Gaussian mixture model to Aggregate Channel Features (ACF); for a detailed introduction to the algorithm, see
http://blog.csdn.net/xiny520/article/details/51460148. This effectively solves the following problems encountered before:

① Targets of no interest (vehicles, leaves, and so on) were detected.

② Targets that remain stationary for a long time were not detected.

③ Several pedestrians walking side by side were detected as a single target.

In addition, a non-maximum suppression strategy is applied: when several detection boxes are produced for the same pedestrian (for example, due to background noise or overlapping pedestrians), only the strongest-scoring box is kept, and any box that overlaps it beyond a threshold is suppressed.
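
The demo implements this with the selectStrongestBbox function; here is the call as it appears in detectPeople() below:

[bboxes, scores] = selectStrongestBbox(bboxes, scores, ...
                    'RatioType', 'Min', 'OverlapThreshold', 0.6);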


b) An auxiliary data file, pedScaleTable, is added. It records the relationship between a pedestrian's pixel location in the video frame and the expected size of their bounding box. It is stored as an n-by-1 vector whose i-th entry is the expected pedestrian height, in pixels, when the pedestrian's feet fall on image row i. The row index is computed as i = min(length(pedScaleTable), round(y + height)), where length(pedScaleTable) is the vector's length n, and y and height are the top coordinate and the height of the current candidate box produced by the detector (so y + height is the bottom of the box, i.e., the foot row). Looking up pedScaleTable at this index yields the expected height estHeight, and a candidate is rejected when abs(estHeight - height) > estHeight * scThresh, where scThresh is a preset scale threshold. In this way, candidate boxes whose scale does not match the expectation are removed, which improves tracking precision.

So how is this auxiliary file produced? Training images containing pedestrians at various distances were captured from the same viewpoint in a scene similar to the test environment; the pedestrians' bounding boxes were annotated with the imageLabeler app, and the resulting foot positions and box heights were fit by regression to generate the table. The original article does not say where its training set came from, so in my opinion this data file only suits certain specific settings (such as mid-to-far-range pedestrians filmed from a moving camera). When tracking other kinds of video, the data file should be retrained; otherwise it adds little. See the experiment screenshots below.


Readers who want a closer look at this file can load and inspect it in MATLAB:

scaleDataFile   = 'pedScaleTable.mat';
ld = load(scaleDataFile, 'pedScaleTable');
ld.pedScaleTable
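
The scale-gating step that consumes this table then reduces to a few lines (mirroring the detectPeople helper later in this post; y and height are taken from the detector's candidate boxes):

yfoot = min(length(pedScaleTable), round(y + height));           % clamp the foot row to the table length
estHeight = pedScaleTable(yfoot);                                % expected height at that row
invalid = abs(estHeight - height) > estHeight * option.scThresh; % flag out-of-scale candidates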


c) The cost function used to associate tracks with detections is changed from the Euclidean distance between the predicted box and the detected box to their overlap ratio: the larger the overlapping area between a track's predicted box and a detection's box, the more likely that pair is to be matched.
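
In code this is simply one minus the overlap ratio, computed with bboxOverlapRatio (as in detectionToTrackAssignment below); candidate pairs whose cost exceeds the gating threshold are priced out of the assignment:

cost = 1 - bboxOverlapRatio(predBboxes, bboxes);
cost(cost > option.gatingThresh) = 1 + option.gatingCost;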


d) When deleting unassigned tracks, an additional condition is introduced: a track is also deleted when its confidence score fails to reach a threshold. The confidence score is produced at the detection stage; the higher it is, the more the detected target resembles the target of interest (a pedestrian). Why introduce a confidence score? By analogy: in the motion-based example, every target that looked like a person was treated as one, possibly including cars, wind-blown signboards, leaves, and so on. The ACF detector not only finds person-like targets but also scores them, and only targets whose scores meet the requirement count as people. This filters out most uninteresting targets and improves precision.
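
The corresponding test in deleteLostTracks below combines track age, visibility, and the maximum confidence score over the recent time window:

lostInds = (ages <= option.ageThresh & visibility <= option.visThresh) | ...
    (maxConfidence <= option.confidenceThresh);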



Selected experiment screenshots using the pedScaleTable auxiliary data file:

Figure 1

Figure 2

Figure 3

Figure 4



Selected experiment screenshots without the pedScaleTable auxiliary data file:

Figure 5

Figure 6

Figure 7

Figure 8


In the experiment screenshots, the red box marks the region to be processed: the smaller this region, the faster the detection, but the fewer targets it can find, so adjust it to suit the type and content of your video. Boxes in different colors mark different detected pedestrians, and the number on each box is its confidence score.


As the screenshots show, the presence or absence of the pedScaleTable data file actually has little effect on tracking quality: missed and false detections do not change noticeably. A likely reason is that the camera sits high and faces the direction of travel, so pedestrians straight ahead appear small and those to the sides are far away; pedestrian scales are therefore uniformly small and the scale prior contributes little. Either way, multi-object tracking based on ACF and a Kalman filter is more robust than the version based on a Gaussian mixture model and a Kalman filter, and it successfully solves several of the latter's problems, although its speed and precision still leave room for improvement.


One more note: I ran this demo on MATLAB R2016a. It is not compatible with the latest 2017 release, which updated the built-in ACF implementation, so adjust the functions you call to match your version.


%% Tracking Pedestrians from a Moving Car
%
% This example shows how to track pedestrians using a camera mounted in a
% moving car.
%
% Copyright 2014 The MathWorks, Inc.

%% Overview
% This example shows how to perform automatic detection and tracking of
% people in a video from a moving camera. It demonstrates the flexibility
% of a tracking system adapted to a moving camera, which is ideal for
% automotive safety applications. Unlike the stationary camera
% example, <motion-based-multiple-object-tracking.html The Motion-Based
% Multiple Object Tracking>, this example contains several additional 
% algorithmic steps. These steps include people detection, customized 
% non-maximum suppression, and heuristics to identify and eliminate false 
% alarm tracks. For more information please see 
% <matlab:helpview(fullfile(docroot,'toolbox','vision','vision.map'),'multipleObjectTracking') Multiple Object Tracking>.
%
% This example is a function with the main body at the top and helper 
% routines in the form of 
% <matlab:helpview(fullfile(docroot,'toolbox','matlab','matlab_prog','matlab_prog.map'),'nested_functions') nested functions> 
% below.

function PedestrianTrackingFromMovingCameraExample()

% Create system objects used for reading video, loading prerequisite data file, detecting pedestrians, and displaying the results.
videoFile       = 'MOT17-13-SDP.mp4';
scaleDataFile   = 'pedScaleTable.mat'; % An auxiliary file that helps to determine the size of a pedestrian at different pixel locations.

obj = setupSystemObjects(videoFile, scaleDataFile);

% Create an empty array of tracks.
tracks = initializeTracks(); 

% ID of the next track.
nextId = 1; 

% Total frames = video duration (seconds) * frame rate (frames/second)
numFrames = obj.reader.Duration * obj.reader.FrameRate;

% Video width and height
totalWidth = obj.reader.Width;
totalHeight = obj.reader.Height;

% Set the global parameters.
option.ROI                  = [10 10 totalWidth - 50 totalHeight - 50];  % A rectangle [x, y, w, h] that limits the processing area to ground locations.
option.scThresh             = 0.3;              % A threshold to control the tolerance of error in estimating the scale of a detected pedestrian. 
option.gatingThresh         = 0.9;              % A threshold to reject a candidate match between a detection and a track.
option.gatingCost           = 100;              % A large value for the assignment cost matrix that enforces the rejection of a candidate match.
option.costOfNonAssignment  = 10;               % A tuning parameter to control the likelihood of creation of a new track.
option.timeWindowSize       = 16;               % A tuning parameter to specify the number of frames required to stabilize the confidence score of a track.
option.confidenceThresh     = 2;                % A threshold to determine if a track is true positive or false alarm.
option.ageThresh            = 8;                % A threshold to determine the minimum length required for a track being true positive.
option.visThresh            = 0.6;              % A threshold to determine the minimum visibility value for a track being true positive.

% Detect people and track them across video frames.
for curFrame = 1 : numFrames
    if ~obj.reader.hasFrame()
        break;
    end    
    frame   = readFrame();
 
    [centroids, bboxes, scores] = detectPeople();
    
    predictNewLocationsOfTracks();    
    
    [assignments, unassignedTracks, unassignedDetections] = ...
        detectionToTrackAssignment();
    
    updateAssignedTracks();    
    updateUnassignedTracks();    
    deleteLostTracks();    
    createNewTracks();
    
    displayTrackingResults();
end


%% Auxiliary Input and Global Parameters of the Tracking System
% This tracking system requires a data file that contains information that
% relates the pixel location in the image to the size of the bounding box
% marking the pedestrian's location. This prior knowledge is stored in a
% vector |pedScaleTable|. The n-th entry in |pedScaleTable| represents the
% estimated height of an adult person in pixels. The index |n| references
% the approximate Y-coordinate of the pedestrian's feet.
%
% To obtain such a vector, a collection of training images were taken from
% the same viewpoint and in a similar scene to the testing environment. The
% training images contained images of pedestrians at varying distances
% from the camera. Using the 
% <matlab:helpview(fullfile(docroot,'toolbox','vision','vision.map'),'visionTrainingImageLabeler'); trainingImageLabeler>
% app, bounding boxes of the pedestrians in the images were manually
% annotated. The height of the bounding boxes together with the location of
% the pedestrians in the image were used to generate the scale data file
% through regression. Here is a helper function to show the algorithmic steps to 
% fit the linear regression model:
% <matlab:edit(fullfile(matlabroot,'toolbox','vision','visiondemos','helperTableOfScales.m')) |helperTableOfScales.m|>   
%
%
% There is also a set of global parameters that can be tuned to optimize
% the tracking performance. You can use the descriptions below to learn
% how these parameters affect the tracking performance.
%
% * |ROI| :                 Region-Of-Interest in the form of
%                           [x, y, w, h]. It limits the processing area to 
%                           ground locations.
% * |scThresh| :            Tolerance threshold for scale estimation.
%                           When the difference between the detected scale
%                           and the expected scale exceeds the tolerance,
%                           the candidate detection is considered to be
%                           unrealistic and is removed from the output.
% * |gatingThresh| :        Gating parameter for the distance measure. When
%                           the cost of matching the detected bounding box
%                           and the predicted bounding box exceeds the
%                           threshold, the system removes the association
%                           of the two bounding boxes from tracking
%                           consideration.
% * |gatingCost| :          Value for the assignment cost matrix to
%                           discourage the possible tracking to detection 
%                           assignment.
% * |costOfNonAssignment| : Value for the assignment cost matrix for
%                           not assigning a detection or a track. Setting 
%                           it too low increases the likelihood of
%                           creating a new track, and may result in track 
%                           fragmentation. Setting it too high may result 
%                           in a single track corresponding to a series of
%                           separate moving objects.
% * |timeWindowSize| :      Number of frames required to estimate the
%                           confidence of the track.
% * |confidenceThresh| :    Confidence threshold to determine if the
%                           track is a true positive.
% * |ageThresh| :           Minimum length of a track being a true positive.
% * |visThresh| :           Minimum visibility threshold to determine if
%                           the track is a true positive.


%% Create System Objects for the Tracking System Initialization 
% The |setupSystemObjects| function creates system objects used for reading
% and displaying the video frames and loads the scale data file.
%
% The |pedScaleTable| vector, which is stored in the scale data file,
% encodes our prior knowledge of the target and the scene. Once you have the
% regressor trained from your samples, you can compute the expected height
% at every possible Y-position in the image. These values are stored in the
% vector. The n-th entry in |pedScaleTable| represents our estimated height 
% of an adult person in pixels. The index |n| references the approximate
% Y-coordinate of the pedestrian's feet.

    function obj = setupSystemObjects(videoFile,scaleDataFile)
        % Initialize Video I/O
        % Create objects for reading a video from a file, drawing the 
        % detected and tracked people in each frame, and playing the video.
        
        % Create a video file reader.
        obj.reader = VideoReader(videoFile);
        
        % Create a video player.
        obj.videoPlayer = vision.VideoPlayer('Position', [29, 597, 643, 386]);                
        
        % Load the scale data file                                        
        ld = load(scaleDataFile, 'pedScaleTable');
        obj.pedScaleTable = ld.pedScaleTable;
    end


%% Initialize Tracks
% The |initializeTracks| function creates an array of tracks, where each
% track is a structure representing a moving object in the video. The
% purpose of the structure is to maintain the state of a tracked object.
% The state consists of information used for detection-to-track assignment,
% track termination, and display. 
%
% The structure contains the following fields:
%
% * |id| :                  An integer ID of the track.
% * |color| :               The color of the track for display purpose.
% * |bboxes| :              A N-by-4 matrix to represent the bounding boxes 
%                           of the object with the current box at the last
%                           row. Each row has a form of [x, y, width,
%                           height].
% * |scores| :              An N-by-1 vector to record the classification
%                           score from the person detector with the current
%                           detection score at the last row.
% * |kalmanFilter| :        A Kalman filter object used for motion-based
%                           tracking. We track the center point of the
%                           object in the image.
% * |age| :                 The number of frames since the track was
%                           initialized.
% * |totalVisibleCount| :   The total number of frames in which the object
%                           was detected (visible).
% * |confidence| :          A pair of numbers indicating how much we trust
%                           the track. It stores the maximum and the
%                           average detection scores within a predefined
%                           time window.
% * |predPosition| :        The predicted bounding box in the next frame.

    function tracks = initializeTracks()
        % Create an empty array of tracks
        tracks = struct(...
            'id', {}, ...
            'color', {}, ...
            'bboxes', {}, ...
            'scores', {}, ...
            'kalmanFilter', {}, ...
            'age', {}, ...
            'totalVisibleCount', {}, ...
            'confidence', {}, ...            
            'predPosition', {});
    end

%% Read a Video Frame
% Read the next video frame from the video file.
    function frame = readFrame()
        frame = obj.reader.readFrame();
    end

%% Detect People
% The |detectPeople| function returns the centroids, the bounding boxes,
% and the classification scores of the detected people. It performs
% filtering and non-maximum suppression on the raw output of |detectPeopleACF|.
%
% * |centroids| :         A N-by-2 matrix with each row in the form of [x, y].
% * |bboxes| :            A N-by-4 matrix with each row in the form of
%                         [x, y, width, height].
% * |scores| :            A N-by-1 vector, where each element is the
%                         classification score of the corresponding detection.

    function [centroids, bboxes, scores] = detectPeople()
        % Resize the image to increase the resolution of the pedestrian.
        % This helps detect people further away from the camera.
        % resizeRatio = 1.5;
        % frame = imresize(frame, resizeRatio, 'Antialiasing',false);
        
        % Run ACF people detector within a region of interest to produce
        % detection candidates.
        [bboxes, scores] = detectPeopleACF(frame, option.ROI, ...
            'Model','caltech',...
            'WindowStride', 2,...
            'NumScaleLevels', 4, ...
            'SelectStrongest', false);
        
        % Look up the estimated height of a pedestrian based on location of their feet.
        % height = bboxes(:, 4) / resizeRatio;
        % y = (bboxes(:,2)-1) / resizeRatio + 1; 
        height = bboxes(:, 4);
        y = bboxes(:, 2) - 1;
        yfoot = min(length(obj.pedScaleTable), round(y + height));
        estHeight = obj.pedScaleTable(yfoot);

        % Remove detections whose size deviates from the expected size,
        % provided by the calibrated scale estimation.
        invalid = abs(estHeight - height) > estHeight * option.scThresh;
        bboxes(invalid, :) = [];
        scores(invalid, :) = [];

        % Apply non-maximum suppression to select the strongest bounding boxes.
        % Non-maximum suppression: when detections overlap, discard boxes
        % whose overlap ratio exceeds the threshold.
        [bboxes, scores] = selectStrongestBbox(bboxes, scores, ...
                            'RatioType', 'Min', 'OverlapThreshold', 0.6);                               
        
        % Compute the centroids
        if isempty(bboxes)
            centroids = [];
        else
            centroids = [(bboxes(:, 1) + bboxes(:, 3) / 2), ...
                (bboxes(:, 2) + bboxes(:, 4) / 2)];
        end
    end

%% Predict New Locations of Existing Tracks
% Use the Kalman filter to predict the centroid of each track in the
% current frame, and update its bounding box accordingly. We take the width
% and height of the bounding box in previous frame as our current
% prediction of the size.

    function predictNewLocationsOfTracks()
        for i = 1:length(tracks)
            % Get the last bounding box on this track.
            bbox = tracks(i).bboxes(end, :);
            
            % Predict the current location of the track.
            predictedCentroid = predict(tracks(i).kalmanFilter);
            
            % Shift the bounding box so that its center is at the predicted location.
            tracks(i).predPosition = [predictedCentroid - bbox(3:4)/2, bbox(3:4)];
        end
    end

%% Assign Detections to Tracks
% Assigning object detections in the current frame to existing tracks is
% done by minimizing cost. The cost is computed using the |bboxOverlapRatio| 
% function, and is the overlap ratio between the predicted bounding box and 
% the detected bounding box. In this example, we assume the person will move 
% gradually in consecutive frames due to the high frame rate of the video 
% and the low motion speed of a person.
%
% The algorithm involves two steps: 
%
% Step 1: Compute the cost of assigning every detection to each track using
% the |bboxOverlapRatio| measure. As people move towards or away from the
% camera, their motion will not be accurately described by the centroid
% point alone. The cost takes into account the distance on the image plane as
% well as the scale of the bounding boxes. This prevents assigning
% detections far away from the camera to tracks closer to the
% camera, even if their centroids coincide. The choice of this cost function
% will ease the computation without resorting to a more sophisticated
% dynamic model. The results
% are stored in an MxN matrix, where M is the number of tracks, and N is
% the number of detections.
%
% Step 2: Solve the assignment problem represented by the cost matrix using
% the |assignDetectionsToTracks| function. The function takes the cost
% matrix and the cost of not assigning any detections to a track.
%
% The value for the cost of not assigning a detection to a track depends on
% the range of values returned by the cost function. This value must be
% tuned experimentally. Setting it too low increases the likelihood of
% creating a new track, and may result in track fragmentation. Setting it
% too high may result in a single track corresponding to a series of
% separate moving objects.
%
% The |assignDetectionsToTracks| function uses the Munkres' version of the
% Hungarian algorithm to compute an assignment which minimizes the total
% cost. It returns an M x 2 matrix containing the corresponding indices of
% assigned tracks and detections in its two columns. It also returns the
% indices of tracks and detections that remained unassigned.

    function [assignments, unassignedTracks, unassignedDetections] = ...
            detectionToTrackAssignment()
        
        % Compute the overlap ratio between the predicted boxes and the
        % detected boxes, and compute the cost of assigning each detection
        % to each track. The cost is minimum when the predicted bbox is
        % perfectly aligned with the detected bbox (overlap ratio is one)
        predBboxes = reshape([tracks(:).predPosition], 4, [])';
        cost = 1 - bboxOverlapRatio(predBboxes, bboxes);

        % Force the optimization step to ignore some matches by
        % setting the associated cost to be a large number. Note that this
        % number is different from the 'costOfNonAssignment' below.
        % This is useful when gating (removing unrealistic matches)
        % technique is applied.
        cost(cost > option.gatingThresh) = 1 + option.gatingCost;

        % Solve the assignment problem.
        [assignments, unassignedTracks, unassignedDetections] = ...
            assignDetectionsToTracks(cost, option.costOfNonAssignment);
    end

%% Update Assigned Tracks
% The |updateAssignedTracks| function updates each assigned track with the
% corresponding detection. It calls the |correct| method of
% |vision.KalmanFilter| to correct the location estimate. Next, it stores
% the new bounding box by taking the average of the size of recent (up to) 
% 4 boxes, and increases the age of the track and the total visible count 
% by 1. Finally, the function adjusts our confidence score for the track 
% based on the previous detection scores. 

    function updateAssignedTracks()
        numAssignedTracks = size(assignments, 1);
        for i = 1:numAssignedTracks
            trackIdx = assignments(i, 1);
            detectionIdx = assignments(i, 2);

            centroid = centroids(detectionIdx, :);
            bbox = bboxes(detectionIdx, :);
            
            % Correct the estimate of the object's location
            % using the new detection.
            correct(tracks(trackIdx).kalmanFilter, centroid);
            
            % Stabilize the bounding box by taking the average of the size 
            % of recent (up to) 4 boxes on the track. 
            T = min(size(tracks(trackIdx).bboxes,1), 4);
            w = mean([tracks(trackIdx).bboxes(end-T+1:end, 3); bbox(3)]);
            h = mean([tracks(trackIdx).bboxes(end-T+1:end, 4); bbox(4)]);
            tracks(trackIdx).bboxes(end+1, :) = [centroid - [w, h]/2, w, h];
            
            % Update track's age.
            tracks(trackIdx).age = tracks(trackIdx).age + 1;
            
            % Update track's score history
            tracks(trackIdx).scores = [tracks(trackIdx).scores; scores(detectionIdx)];
            
            % Update visibility.
            tracks(trackIdx).totalVisibleCount = ...
                tracks(trackIdx).totalVisibleCount + 1;
            
            % Adjust track confidence score based on the maximum detection
            % score in the past 'timeWindowSize' frames.
            T = min(option.timeWindowSize, length(tracks(trackIdx).scores));
            score = tracks(trackIdx).scores(end-T+1:end);
            tracks(trackIdx).confidence = [max(score), mean(score)];
        end
    end

%% Update Unassigned Tracks
% The |updateUnassignedTracks| function marks each unassigned track as 
% invisible, increases its age by 1, and appends the predicted bounding box 
% to the track. The confidence is set to zero since we are not sure why it
% was not assigned to a track.

    function updateUnassignedTracks()
        for i = 1:length(unassignedTracks)
            idx = unassignedTracks(i);
            tracks(idx).age = tracks(idx).age + 1;
            tracks(idx).bboxes = [tracks(idx).bboxes; tracks(idx).predPosition];
            tracks(idx).scores = [tracks(idx).scores; 0];
            
            % Adjust track confidence score based on the maximum detection
            % score in the past 'timeWindowSize' frames
            T = min(option.timeWindowSize, length(tracks(idx).scores));
            score = tracks(idx).scores(end-T+1:end);
            tracks(idx).confidence = [max(score), mean(score)];
        end
    end

%% Delete Lost Tracks
% The |deleteLostTracks| function deletes tracks that have been invisible
% for too many consecutive frames. It also deletes recently created tracks
% that have been invisible for many frames overall.
% 
% Noisy detections tend to result in creation of false tracks. For this
% example, we remove a track under following conditions:
%
% * The object was tracked for a short time. This typically happens when a 
%   false detection shows up for a few frames and a track was initiated for it. 
% * The track was marked invisible for most of the frames. 
% * It failed to receive a strong detection within the past few frames, 
%   which is expressed as the maximum detection confidence score.

    function deleteLostTracks()
        if isempty(tracks)
            return;
        end        
        
        % Compute the fraction of the track's age for which it was visible.
        ages = [tracks(:).age]';
        totalVisibleCounts = [tracks(:).totalVisibleCount]';
        visibility = totalVisibleCounts ./ ages;
        
        % Check the maximum detection confidence score.
        confidence = reshape([tracks(:).confidence], 2, [])';
        maxConfidence = confidence(:, 1);

        % Find the indices of 'lost' tracks.
        lostInds = (ages <= option.ageThresh & visibility <= option.visThresh) | ...
             (maxConfidence <= option.confidenceThresh);

        % Delete lost tracks.
        tracks = tracks(~lostInds);
    end

%% Create New Tracks
% Create new tracks from unassigned detections. Assume that any unassigned
% detection is a start of a new track. In practice, you can use other cues
% to eliminate noisy detections, such as size, location, or appearance.

    function createNewTracks()
        unassignedCentroids = centroids(unassignedDetections, :);
        unassignedBboxes = bboxes(unassignedDetections, :);
        unassignedScores = scores(unassignedDetections);
        
        for i = 1:size(unassignedBboxes, 1)            
            centroid = unassignedCentroids(i,:);
            bbox = unassignedBboxes(i, :);
            score = unassignedScores(i);
            
            % Create a Kalman filter object.
            kalmanFilter = configureKalmanFilter('ConstantVelocity', ...
                centroid, [2, 1], [5, 5], 100);
            
            % Create a new track.
            newTrack = struct(...
                'id', nextId, ...
                'color', 255*rand(1,3), ...
                'bboxes', bbox, ...
                'scores', score, ...
                'kalmanFilter', kalmanFilter, ...
                'age', 1, ...
                'totalVisibleCount', 1, ...
                'confidence', [score, score], ...
                'predPosition', bbox);
            
            % Add it to the array of tracks.
            tracks(end + 1) = newTrack; %#ok<AGROW>
            
            % Increment the next id.
            nextId = nextId + 1;
        end
    end

%% Display Tracking Results
% The |displayTrackingResults| function draws a colored bounding box for
% each track on the video frame. The level of transparency of the box
% together with the displayed score indicate the confidence of the
% detections and tracks.
    
    function displayTrackingResults()

        displayRatio = 4/3;
        frame = imresize(frame, displayRatio);
        
        if ~isempty(tracks)
            ages = [tracks(:).age]';        
            confidence = reshape([tracks(:).confidence], 2, [])';
            maxConfidence = confidence(:, 1);
            avgConfidence = confidence(:, 2);
            opacity = min(0.5,max(0.1,avgConfidence/3));
            noDispInds = (ages < option.ageThresh & maxConfidence < option.confidenceThresh) | ...
                       (ages < option.ageThresh / 2);
                   
            for i = 1:length(tracks)
                if ~noDispInds(i)
                    
                    % scale bounding boxes for display
                    bb = tracks(i).bboxes(end, :);
                    bb(:,1:2) = (bb(:,1:2)-1)*displayRatio + 1;
                    bb(:,3:4) = bb(:,3:4) * displayRatio;
                    
                    
                    frame = insertShape(frame, ...
                                            'FilledRectangle', bb, ...
                                            'Color', tracks(i).color, ...
                                            'Opacity', opacity(i));
                    frame = insertObjectAnnotation(frame, ...
                                            'rectangle', bb, ...
                                            num2str(avgConfidence(i)), ...
                                            'Color', tracks(i).color);
                end
            end
        end
        
        frame = insertShape(frame, 'Rectangle', option.ROI * displayRatio, ...
                                'Color', [255, 0, 0], 'LineWidth', 3);
                            
        step(obj.videoPlayer, frame);
        
    end

%%
displayEndOfDemoMessage(mfilename)
end

