Last time I implemented face tracking with a Kalman filter and got decent results, but some problems remained: tracking is poor when the subject turns to show a profile. The KLT algorithm handles this case well. This post has two parts; the theory part draws on a post by a blogger on cnblogs (https://www.cnblogs.com/moondark/archive/2012/05/12/2497391.html).
1. Theory
I have been looking into tracking recently. There are many tracking methods, such as particle filters (PF), meanshift tracking, and KLT tracking (also called the Lucas optical-flow method), and each has its strengths. A particle filter can search globally for a good optimum, but it is relatively slow to solve, and since it is usually based on colour histograms it has trouble telling apart objects of the same colour. Meanshift easily falls into local optima, but it is fast, so quite a few people now combine meanshift with a particle filter for tracking, as the two complement each other in many respects.
The Kanade-Lucas-Tomasi method also performs well at tracking, especially in real-time speed. What it yields are the trajectories of many points, some of which drift, so the tracked points need some post-processing. What is described here is a point-localisation method, i.e. local image matching: the matching problem is turned from a traditional sliding-window search into solving for a displacement d. Along the way we will see under which conditions d is guaranteed to have a solution, and what is special about such points (it turns out that, much of the time, they are corner points).
First, the assumptions behind the KLT algorithm:
1) Brightness constancy.
2) Temporal persistence, i.e. the motion between frames is "small motion".
3) Spatial coherence: neighbouring points undergo similar motion and remain neighbours.
Why make these assumptions? Intuitively, to judge that two adjacent frames I and J of a video look the same over some local window w, we require, for every point in w,

$$I(x, y, t) = J(x', y', t+\tau)$$

Assumption 1 (brightness constancy) ensures this equality is not broken by illumination changes; assumption 2 ensures KLT can actually find the corresponding point; and assumption 3 is made for the following reason (namely, that all points within one window share the same displacement):
Over the window w, every point (x, y) moves in the same direction by (dx, dy) to give (x', y'); that is, the point (x, y) at time t is at (x+dx, y+dy) at time t+τ. The matching problem therefore reduces to finding the d that minimizes

$$\epsilon(\mathbf{d}) = \sum_{(x,y) \in w} \left[ J(x+d_x,\, y+d_y) - I(x,\, y) \right]^2$$

Expressed as an integral, with the two windows centred symmetrically about x, the expression above is equivalent to

$$\epsilon(\mathbf{d}) = \iint_W \left[ J\!\left(\mathbf{x}+\tfrac{\mathbf{d}}{2}\right) - I\!\left(\mathbf{x}-\tfrac{\mathbf{d}}{2}\right) \right]^2 w(\mathbf{x})\, d\mathbf{x}$$

The meaning of this expression is the difference between the two images I and J over a window W: I is taken centred at x − d/2 and J centred at x + d/2, over a rectangular window of half-size w/2, with an optional weighting function w(x). Recalling some calculus, for ε(d) to attain its minimum, the derivative at that extremum must be zero, i.e. the value of

$$\frac{\partial \epsilon(\mathbf{d})}{\partial \mathbf{d}} = \iint_W \left[ J\!\left(\mathbf{x}+\tfrac{\mathbf{d}}{2}\right) - I\!\left(\mathbf{x}-\tfrac{\mathbf{d}}{2}\right) \right] \left[ \nabla I(\mathbf{x}) + \nabla J(\mathbf{x}) \right] w(\mathbf{x})\, d\mathbf{x}$$

must be 0. By the first-order Taylor expansions

$$J\!\left(\mathbf{x}+\tfrac{\mathbf{d}}{2}\right) \approx J(\mathbf{x}) + \tfrac{1}{2}\mathbf{d}^{\mathsf T}\nabla J(\mathbf{x}), \qquad I\!\left(\mathbf{x}-\tfrac{\mathbf{d}}{2}\right) \approx I(\mathbf{x}) - \tfrac{1}{2}\mathbf{d}^{\mathsf T}\nabla I(\mathbf{x})$$

we can get

$$J\!\left(\mathbf{x}+\tfrac{\mathbf{d}}{2}\right) - I\!\left(\mathbf{x}-\tfrac{\mathbf{d}}{2}\right) \approx J(\mathbf{x}) - I(\mathbf{x}) + \mathbf{g}^{\mathsf T}(\mathbf{x})\,\mathbf{d}$$

So the problem turns into

$$\frac{\partial \epsilon(\mathbf{d})}{\partial \mathbf{d}} \approx \iint_W \left[ J(\mathbf{x}) - I(\mathbf{x}) + \mathbf{g}^{\mathsf T}(\mathbf{x})\,\mathbf{d} \right] \mathbf{g}(\mathbf{x})\, w(\mathbf{x})\, d\mathbf{x} = 0$$

where

$$\mathbf{g}(\mathbf{x}) = \tfrac{1}{2}\left[ \nabla I(\mathbf{x}) + \nabla J(\mathbf{x}) \right]$$

Rearranging, the problem becomes

$$\left[ \iint_W \mathbf{g}(\mathbf{x})\,\mathbf{g}^{\mathsf T}(\mathbf{x})\, w(\mathbf{x})\, d\mathbf{x} \right] \mathbf{d} = \iint_W \left[ I(\mathbf{x}) - J(\mathbf{x}) \right] \mathbf{g}(\mathbf{x})\, w(\mathbf{x})\, d\mathbf{x}$$

and this equation can be read as the linear system

$$Z\,\mathbf{d} = \mathbf{e}$$

where Z (the integral on the left) is a 2×2 matrix and e (the integral on the right) is a 2×1 vector. For d to have a solution, Z must satisfy a condition: it must be invertible, or equivalently, since Z is symmetric, both of its eigenvalues must be well above zero. In general, corner points are exactly the points with this property.
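To make the derivation concrete, here is a minimal MATLAB sketch of a single KLT step on one window. It is my own illustration rather than code from the referenced post, and it uses the simpler one-sided form of the same equations, approximating g by the gradient of I alone: it accumulates Z from the spatial gradients, accumulates e from the temporal difference, checks the corner condition through the eigenvalues of Z, and solves Zd = e. The function name kltStep and the eigenvalue threshold are illustrative choices.
function d = kltStep(I, J, cx, cy, r)
%KLTSTEP one Lucas-Kanade step for the window of radius r centred at (cx,cy)
% I, J - consecutive grayscale frames as double matrices
% d - estimated displacement [dx; dy], or NaNs if the window is not corner-like
[Ix, Iy] = gradient(I); %spatial gradients, approximating g
gx = Ix(cy-r:cy+r, cx-r:cx+r);
gy = Iy(cy-r:cy+r, cx-r:cx+r);
It = J(cy-r:cy+r, cx-r:cx+r) - I(cy-r:cy+r, cx-r:cx+r); %temporal difference
%build the 2x2 matrix Z and the 2x1 vector e from the derivation above
Z = [sum(gx(:).^2) sum(gx(:).*gy(:)); sum(gx(:).*gy(:)) sum(gy(:).^2)];
e = -[sum(gx(:).*It(:)); sum(gy(:).*It(:))];
%corner condition: both eigenvalues of Z must be well above zero
if min(eig(Z)) < 1e-2 %threshold chosen for illustration only
d = [NaN; NaN]; %aperture problem: d is not recoverable from this window
else
d = Z\e; %solve Z*d = e
end
end
Under assumption 2 (small motion) a single step already lands near the true displacement; in practice the solve is iterated, and wrapped in an image pyramid, to cope with larger motions.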
2. Code
The program was developed around two example videos. Without further ado, here is the code:
function [faceDet,faceTracks] = trackFaces( videoFilename,fittingModel,outFilename,visualise )
%TRACKFACES tracks all faces in a video
% INPUT
% videofilename - path to input video
% fittingModel - model options for face detection
% outfilename - path where we want to store our face detections
% visualise - boolean to turn on visualisation
% OUTPUT
% faceDet - bounding box detections indexed by frame id
% faceTracks - bounding box detections indexed by track id
% define track structure (track id based)
% faceTracks(trackID).bbox;
% faceTracks(trackID).frameid;
% define detection structure (frame id based)
% faceDet(frameID).trackIDs;
% faceDet(frameID).bboxes;
%
% Written by James Charles at University of Leeds 27/04/16
%
% If you use it then please cite the paper (for which it was developed):
% @InProceedings{Charles16b,
% author = "Charles, J. and Magee, D. and Hogg, D.",
% title = "Virtual Immortality: Reanimating characters from TV shows",
% booktitle = "ECCV Workshop on Virtual/Augmented Reality for Visual Artificial Intelligence (VARVAI)",
% year = "2016",
% }
%
% This work is licensed under the Creative Commons Attribution 4.0 International License.
% To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or
% send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
faceTracks = [];
faceDet = [];
h_img = []; %for visualisation only (image handle)
%load video object
vidobj = VideoReader(videoFilename);
%setup face detector
faceDetector = vision.CascadeObjectDetector();
faceDetector.MaxSize = round(fittingModel.maxsize*fittingModel.facedetect.imscale);
faceDetector.MinSize = round(fittingModel.minsize*fittingModel.facedetect.imscale);
faceDetector.MergeThreshold = fittingModel.facedetect.sensitivity;
%setup trackers
trackers(1).maxID = 0;
trackers(1).numActive = 0;
%get starting frame index based on current saved output and also
%tracker info
[startIndex,trackers] = getSavedIndex(outFilename);
%load the saved tracks
if exist(outFilename,'file')
load(outFilename);
end
%loop over frames and track
for i = startIndex:vidobj.NumberOfFrames
if mod(i,1000)==0
fprintf('frame %d of %d, number of active trackers: %d\n',i,vidobj.NumberOfFrames,trackers(1).numActive);
end
img = read(vidobj,i);
%track all previous faces to this frame
[faceDet,faceTracks,trackers] = track(faceDet,faceTracks,trackers,img,i);
%detect all faces
facebboxes = detectFaces(faceDetector,img);
%start new tracks if we detect new faces
[faceDet,faceTracks,trackers] = addNewFaces(faceDet, faceTracks,trackers,facebboxes,img,i);
%save output every 1000 frames
saveOutput(outFilename,faceDet,faceTracks,trackers,vidobj.NumberOfFrames,i,1000);
%visualise
if visualise
h_img = vis(img,faceDet,trackers,i,h_img);
end
end
%produce tracking structure from faceDet
if isempty(faceTracks)
faceTracks = dets2tracks(faceDet);
frameID = vidobj.NumberOfFrames;
save(outFilename,'faceDet','frameID','faceTracks');
end
end
%function to convert faceDet structure to a faceTracks structure
function faceTracks = dets2tracks(faceDet)
faceTracks(1).bboxes = [];
faceTracks(1).frameIDs = [];
for frameID = 1:numel(faceDet)
if ~isempty(faceDet(frameID).trackIDs)
for t = 1:numel(faceDet(frameID).trackIDs)
trackID = faceDet(frameID).trackIDs(t);
if trackID > numel(faceTracks)
faceTracks(trackID).bboxes = faceDet(frameID).bboxes(t,:);
faceTracks(trackID).frameIDs = frameID;
else
faceTracks(trackID).bboxes = cat(1,faceTracks(trackID).bboxes,faceDet(frameID).bboxes(t,:));
faceTracks(trackID).frameIDs = cat(2,faceTracks(trackID).frameIDs,frameID);
end
end
end
end
end
%function to return start index for tracking
function [index,trackers] = getSavedIndex(outFilename)
if ~exist(outFilename,'file')
index = 1;
trackers(1).maxID = 0;
trackers(1).numActive = 0;
else
data = load(outFilename);
index = data.frameID + 1;
trackers = data.trackers;
end
end
%function to save the tracks
function saveOutput(outFilename,faceDet,faceTracks,trackers,maxFrames,frameID,stepSize)
if frameID == maxFrames || mod(frameID,stepSize)==0
save(outFilename,'faceDet','faceTracks','frameID','trackers');
end
end
%track all faces to the next frame (destroys trackers when neccessary)
function [faceDet,faceTracks,trackers] = track(faceDet,faceTracks,trackers,img,frameID)
if trackers(1).maxID == 0 || trackers(1).numActive == 0
return
else
%for each tracker track bbox to new image
for i = 1:numel(faceDet(frameID-1).trackIDs)
trackerID = find(faceDet(frameID-1).trackIDs(i)==cat(1,trackers(:).trackID)); %find tracker associated with this face
if isempty(trackerID); continue; end %if tracker destroyed then skip
[isGood,facebbox,tracker] = trackBBox(trackers(trackerID),img,faceDet(frameID-1).bboxes(i,:)); %track the face forward in time
%update faceDet and faceTracks with new data
if isGood
[faceDet,faceTracks] = updateFace(faceDet,faceTracks,facebbox,trackers(trackerID).trackID,frameID);
%update the tracker
trackers(trackerID).points = tracker.oldPoints;
trackers(trackerID).tracker = tracker.tracker;
else
if numel(faceDet)<frameID
faceDet(frameID).bboxes = [];
end
%we need to remove this tracker and let the system
%initialise a new one when it gets a good face detection
trackers(1).numActive = max(0,trackers(1).numActive - 1);
release(trackers(trackerID).tracker);
if trackerID == 1
if numel(trackers)==1
trackers(trackerID).points = [];
trackers(trackerID).tracker = [];
else
maxID = trackers(1).maxID;
numActive = trackers(1).numActive;
trackers(trackerID) = [];
trackers(trackerID).maxID = maxID;
trackers(trackerID).numActive = numActive;
end
else
trackers(trackerID) = [];
end
end
end
end
end
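%track a single face bounding box into the next frame using its point tracker:
%the box is re-centred on the mean of the tracked points and accepted only if
%it still overlaps the previous box strongly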
function [isGood, facebbox, tracker] = trackBBox(tracker,img,facebbox)
isGood = false;
tracker.oldPoints = tracker.points;
[points, isFound] = step(tracker.tracker, img);
tracker.points = points(isFound, :);
oldInliers = tracker.oldPoints(isFound, :);
if size(tracker.points, 1) >= 3 % need at least 3 points to estimate the transform
% Estimate the geometric transformation between the old points
% and the new points and eliminate outliers
[xform, oldInliers, visiblePoints] = estimateGeometricTransform(...
oldInliers, tracker.points, 'similarity', 'MaxDistance', 5);
% Position bounding box at centre of points
centre = mean(tracker.points);
tempFacebbox = [centre(1)-facebbox(3)/2,centre(2)-facebbox(4)/2,facebbox(3),facebbox(4)];
tempFacebbox = round(tempFacebbox);
%check if tempFacebbox very different, if so track is not good
A1 = prod(facebbox(3:4)); %area of the previous bounding box
A2 = prod(tempFacebbox(3:4)); %area of the predicted bounding box
%compute intersection area
U = rectint(facebbox,tempFacebbox);
%calculate a similarity score and threshold at 0.7
similarity = (U/A1 + U/A2)/2;
isGood = similarity > 0.7;
facebbox = tempFacebbox;
tracker.oldPoints = visiblePoints;
setPoints(tracker.tracker, tracker.oldPoints);
end
end
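%check each fresh detection against the faces already in this frame: genuinely
%new faces get their own tracker, while re-detected faces just have their
%tracker topped up with fresh feature points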
function [faceDet,faceTracks,trackers] = addNewFaces(faceDet, faceTracks,trackers,facebboxes,img,frameID)
%for each box see if it is a new face
if ~isempty(facebboxes)
numDets = size(facebboxes,1);
for i = 1:numDets
if ~isempty(faceDet)
[isNew,matchingFaceID] = isNewFace(faceDet,facebboxes(i,:),frameID);
if isNew
[faceDet,faceTracks,trackers] = addFace(faceDet,faceTracks,trackers,facebboxes(i,:),img,frameID);
else
%update tracker with points in detected face
if ~isempty(matchingFaceID)
trackerID = find(cat(1,trackers(:).trackID)==faceDet(frameID).trackIDs(matchingFaceID));
if ~isempty(trackerID)
%re-initialise the tracker
trackers(trackerID).points = cat(1,trackers(trackerID).points,getFeaturePoints(img,facebboxes(i,:)));
%randomly sample at most 30 points
Ridx = randperm(size(trackers(trackerID).points,1));
trackers(trackerID).points = trackers(trackerID).points(Ridx(1:min(numel(Ridx),30)),:);
setPoints(trackers(trackerID).tracker, trackers(trackerID).points);
end
end
end
else
[faceDet,faceTracks,trackers] = addFace(faceDet,faceTracks,trackers,facebboxes(i,:),img,frameID);
end
end
end
end
%this function adds a new face and initialises a new tracker
function [faceDet,faceTracks,trackers] = addFace(faceDet,faceTracks,trackers,facebbox,img,frameID)
numActiveTrackers = trackers(1).numActive;
%get features
points = getFeaturePoints(img,facebbox);
if ~isempty(points)
%setup new tracker
trackerID = numActiveTrackers+1;
trackers(1).maxID = trackers(1).maxID + 1; %maxID stores the total number of trackers so far
trackers(1).numActive = trackers(1).numActive + 1;
trackers(trackerID).trackID = trackers(1).maxID;
trackers(trackerID).tracker = vision.PointTracker('MaxBidirectionalError', 2);
%get features to track
trackers(trackerID).points = points;
%initialise the tracker
initialize(trackers(trackerID).tracker,trackers(trackerID).points,img);
%update the faceDet and faceTracks structure
[faceDet,faceTracks] = updateFace(faceDet,faceTracks,facebbox,trackers(trackerID).trackID,frameID);
end
end
%this function updates the faceDet and faceTracks structure with a new
%bounding box
function [faceDet,faceTracks] = updateFace(faceDet,faceTracks,facebbox,trackID,frameID)
if ~isempty(faceDet)
if numel(faceDet)<frameID % add new face and frame
faceDet(frameID).trackIDs = trackID;
faceDet(frameID).bboxes = facebbox;
faceDet(frameID).landmarks = [];
else %append face to current frame
faceDet(frameID).trackIDs = cat(2,faceDet(frameID).trackIDs,trackID);
faceDet(frameID).bboxes = cat(1,faceDet(frameID).bboxes,facebbox);
faceDet(frameID).landmarks = [];
end
else
faceDet(frameID).trackIDs = trackID;
faceDet(frameID).bboxes = facebbox;
faceDet(frameID).landmarks = [];
end
end
function bboxes = detectFaces(faceDetector,img)
bboxes = step(faceDetector, img);
end
%check if the current bounding box is a new detection
function [newface,matchingFaceID] = isNewFace(faceDet,facebbox,frameID)
newface = true; %assume new unless it overlaps an existing box in this frame
matchingFaceID = [];
if isempty(faceDet)
newface = true;
elseif numel(faceDet)<frameID
newface = true;
else
%check overlap of facebbox with bounding boxes present in faceDet
numBBoxes = size(faceDet(frameID).bboxes,1);
A1 = prod(facebbox(3:4)); %area of new bounding box
for i = 1:numBBoxes
A2 = prod(faceDet(frameID).bboxes(i,3:4)); %area of old bounding box
%compute intersection area
U = rectint(facebbox,faceDet(frameID).bboxes(i,:));
%calculate a similarity score and threshold at 0.6
similarity = (U/A1 + U/A2)/2;
if similarity > 0.6
newface = false;
matchingFaceID = i;
break;
else
newface = true;
end
end
end
end
%function to convert a bounding box to points (unused)
function points = bbox2point(bbox)
points = zeros(4,2);
points(1,:) = bbox(1:2);
points(2,:) = points(1,:) + [bbox(3) 0];
points(3,:) = points(1,:) + [bbox(3) bbox(4)];
points(4,:) = points(1,:) + [0 bbox(4)];
end
%function to convert points to a bounding box (unused)
function bbox = point2bbox(points)
bbox = zeros(1,4);
bbox(1) = min(points(:,1));
bbox(2) = min(points(:,2));
bbox(3) = max(points(:,1))-bbox(1);
bbox(4) = max(points(:,2))-bbox(2);
end
%function to return feature points given bounding box and image
function locations = getFeaturePoints(img,bbox)
% points = detectMinEigenFeatures(rgb2gray(img), 'ROI', bbox);
points = detectBRISKFeatures(rgb2gray(img), 'ROI', bbox,'MinContrast',0.001); %restrict detection to the face region (ROI)
locations = points.Location;
if ~isempty(locations)
%get cluster close to centre of points
centre = mean(locations);
dist = sqrt(sum(bsxfun(@minus,locations,centre).^2,2));
i = 1;
idrem = ones(1,size(locations,1));
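%grow the distance threshold until at least one point survives the pruning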
while sum(idrem)>=size(locations,1)
idrem = dist>15*i;
i = i + 0.5;
end
locations(idrem,:) = [];
end
end
%function to visualise the output
function h_img = vis(img,faceDet,trackers,frameID,h_img)
if isempty(h_img) %create figure and image pane
figure
h_img = imagesc(img); axis image;
end
if numel(faceDet)>=frameID
for b = 1:size(faceDet(frameID).bboxes,1)
img = insertObjectAnnotation(img,'rectangle',faceDet(frameID).bboxes(b,:),sprintf('track: %d',faceDet(frameID).trackIDs(b)));
end
for b = 1:size(faceDet(frameID).bboxes,1)
trackerID = find(cat(1,trackers(:).trackID)==faceDet(frameID).trackIDs(b));
if ~isempty(trackerID)
plotPoints = cat(2,trackers(trackerID).points(:,1:2),2*ones(size(trackers(trackerID).points,1),1));
img = insertObjectAnnotation(img,'circle',plotPoints,' ','TextBoxOpacity',0,'Color','white');
end
end
end
set(h_img,'cdata',img);
drawnow
end
The main function is not shown here; write one to suit your own needs (a minimal sketch follows below). This part of the code takes some time to understand and digest. The tracking video cannot be embedded here, so a couple of screenshots would have to stand in for it.
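As a rough illustration, a driver script could look like the sketch below. The field names are exactly the ones trackFaces reads from fittingModel, but the concrete values and file names are assumptions made for this sketch, not taken from the original code.
%hypothetical driver for trackFaces; the values here are illustrative guesses
fittingModel.maxsize = [200 200]; %largest face to accept, [height width] in pixels
fittingModel.minsize = [40 40]; %smallest face to accept
fittingModel.facedetect.imscale = 1; %scale factor applied to the size limits
fittingModel.facedetect.sensitivity = 4; %MergeThreshold of vision.CascadeObjectDetector
[faceDet, faceTracks] = trackFaces('input.mp4', fittingModel, 'tracks.mat', true);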
It is worth noting that the KLT algorithm has since been further optimized and refined for tracking and detection. KLT is also widely used on DSP boards such as the TI C6000 series, and it matters for military and security-surveillance applications; DSP processors run KLT well. That is also useful background for future hardware work.
PS: a newcomer's impressions of MATLAB for machine learning and deep learning. MATLAB's disadvantage in deep learning is now obvious to me: programs run laboriously, a one-or-two-minute video can bog down the computer, and this field really does demand strong hardware. On the engineering side, MATLAB is hard to run on hardware development boards, and converting it to C raises many problems. OpenCV has a slight edge here, since hardware such as the Raspberry Pi and OpenMV can run the code. I will study Python alongside.