基于2D特征的运动估计
基于2D特征的运动估计是从两个或多个匹配的2D点的集合中估计运动的问题。它涉及以下技术点:
- 特征的提取和匹配
- 运动估计
特征提取和匹配
这个过程主要包括三个步骤,即:
- 特征提取
- 光流法追踪匹配的点
- 筛选有效的特征点
根据这个路线,一种经典的代码构成如下:
#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;
void main()
{
VideoCapture cap("xxx.mp4");
if (!cap.isOpened())
return -1;
// Get frame count
int n_frames = int(cap.get(CAP_PROP_FRAME_COUNT));
// Get width and height of video stream
int w = int(cap.get(CAP_PROP_FRAME_WIDTH));
int h = int(cap.get(CAP_PROP_FRAME_HEIGHT));
// Get frames per second (fps)
double fps = cap.get(cv::CAP_PROP_FPS);
Mat curr, curr_gray;
Mat prev, prev_gray;
for (int i = 1; i < n_frames - 1; i++)
{
// Vector from previous and current feature points
vector <Point2f> prev_pts, curr_pts;
// Detect features in previous frame
goodFeaturesToTrack(prev_gray, prev_pts, 200, 0.01, 30);
// Read next frame
bool success = cap.read(curr);
if (!success) break;
// Convert to grayscale
cvtColor(curr, curr_gray, COLOR_BGR2GRAY);
// Calculate optical flow (i.e. track feature points)
vector <uchar> status;
vector <float> err;
calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, curr_pts, status, err);
// Filter only valid points
auto prev_it = prev_pts.begin();
auto curr_it = curr_pts.begin();
for (size_t k = 0; k < status.size(); k++)
{
if (status[k])
{
prev_it++;
curr_it++;
}
else
{
prev_it = prev_pts.erase(prev_it);
curr_it = curr_pts.erase(curr_it);
}
}
}
}
利用goodFeaturesToTrack()
和calcOpticalFlowPyrLK()
,这一套组合拳下来,很容实现特征的提取和匹配。然而,这背后涉及了很多算法及大佬们的思想,后续通过阅读opencv的源码,学习大佬们的精髓。
运动估计
2D运动估计的原理就是计算2D变换。2D变换根据自由度的不同可以分为旋转、平移、相似变换、仿射变换以及投影变换(透视变换)等变换。
2D坐标变换中,旋转、相似都属于线性变换,平移则不属于线性变换。但在齐次坐标系下,平移也变成了线性,上述所有变换将可以由一个变换矩阵来表示,即单应性矩阵。原因是,在透视空间中,平移被看作是一个剪切变换。因此,在2D或3D欧几里空间中,平移是一个非线性变换,但在3D或4D齐次坐标下,就变成了一个线性变换。
根据场景的不同,合理选择不同的变换方式。如,在Video Stabilization中,一般选择仿射变换,当然也可以选择投影变换。但是由于在相机运动估计过程中,假设投影中心并未改变,因此选择仿射变换足够。
RANSAC
我们对匹配的特征点,利用最小二乘进行运动估计时,如果存在外点的话(几乎总是有的),得到的结果将会受外点的干扰而不够鲁棒。因此,在进行运动估计时,首先要对外点进行筛选。常用的方法有RANSAC和最小中位方差(LMS)。
RANSAC算法伪代码:
Given:
data – A set of observations.
model – A model to explain observed data points.
n – Minimum number of data points required to estimate model parameters.
k – Maximum number of iterations allowed in the algorithm.
t – Threshold value to determine data points that are fit well by model.
d – Number of close data points required to assert that a model fits well to data.
Return:
bestFit – model parameters which best fit the data (or null if no good model is found)
iterations = 0
bestFit = null
bestErr = something really large
while iterations < k do
maybeInliers := n randomly selected values from data
maybeModel := model parameters fitted to maybeInliers
alsoInliers := empty set
for every point in data not in maybeInliers do
if point fits maybeModel with an error smaller than t
add point to alsoInliers
end for
if the number of elements in alsoInliers is > d then
// This implies that we may have found a good model
// now test how good it is.
betterModel := model parameters fitted to all points in maybeInliers and alsoInliers
thisErr := a measure of how well betterModel fits these points
if thisErr < bestErr then
bestFit := betterModel
bestErr := thisErr
end if
end if
increment iterations
end while
return bestFit
对于我们运动估计而言,RANSAC算法中对应变量的含义:
Given Var | in code(as follow) | comment |
---|---|---|
data | points0,points1 | 在特征提取和匹配阶段得到的一系列坐标点(x0, y0)和(x1,y1) |
model | M | 仿射变换(下文将会讲解如何求解). |
n | params.size | 即求解仿射变换至少要提供三个坐标点,才能够求解,否则无法提供足够约束求解6个变量 |
k | niters | 最大迭代次数,通过公式可以估计出. |
t | params.thresh | 确定是否为内点的阈值 |
d | ninliersMax | 记录拟合最多内点个数 |
代码
enum MotionModel
{
MM_TRANSLATION = 0,
MM_TRANSLATION_AND_SCALE = 1,
MM_ROTATION = 2,
MM_RIGID = 3,
MM_SIMILARITY = 4,
MM_AFFINE = 5,
MM_HOMOGRAPHY = 6,
MM_UNKNOWN = 7
};
struct RansacParams
{
int size; //!< subset size
float thresh; //!< max error to classify as inlier
float eps; //!< max outliers ratio
float prob; //!< probability of success
RansacParams() : size(0), thresh(0), eps(0), prob(0) {}
/** @brief Constructor
@param size Subset size.
@param thresh Maximum re-projection error value to classify as inlier.
@param eps Maximum ratio of incorrect correspondences.
@param prob Required success probability.
*/
RansacParams(int size, float thresh, float eps, float prob);
/**
@return Number of iterations that'll be performed by RANSAC method.
*/
int niters() const
{
return static_cast<int>(
std::ceil(std::log(1 - prob) / std::log(1 - std::pow(1 - eps, size))));
}
/**
@param model Motion model.
@return Default RANSAC method parameters for the given motion model.
*/
static RansacParams default2dMotion(MotionModel model)
{
CV_Assert(model < MM_UNKNOWN);
if (model == MM_TRANSLATION)
return RansacParams(1, 0.5f, 0.5f, 0.99f);
if (model == MM_TRANSLATION_AND_SCALE)
return RansacParams(2, 0.5f, 0.5f, 0.99f);
if (model == MM_ROTATION)
return RansacParams(1, 0.5f, 0.5f, 0.99f);
if (model == MM_RIGID)
return RansacParams(2, 0.5f, 0.5f, 0.99f);
if (model == MM_SIMILARITY)
return RansacParams(2, 0.5f, 0.5f, 0.99f);
if (model == MM_AFFINE)
return RansacParams(3, 0.5f, 0.5f, 0.99f);
return RansacParams(4, 0.5f, 0.5f, 0.99f);
}
};
Mat estimateGlobalMotionRansac(
InputArray points0, InputArray points1, int model, const RansacParams ¶ms,
float *rmse, int *ninliers)
{
CV_Assert(model <= MM_AFFINE);
CV_Assert(points0.type() == points1.type());
const int npoints = points0.getMat().checkVector(2);
CV_Assert(points1.getMat().checkVector(2) == npoints);
if (npoints < params.size)
return Mat::eye(3, 3, CV_32F);
const Point2f *points0_ = points0.getMat().ptr<Point2f>();
const Point2f *points1_ = points1.getMat().ptr<Point2f>();
const int niters = params.niters(); // 估算推导出的迭代数
// current hypothesis
std::vector<int> indices(params.size);
std::vector<Point2f> subset0(params.size);
std::vector<Point2f> subset1(params.size);
// best hypothesis
std::vector<int> bestIndices(params.size);
Mat_<float> bestM;
int ninliersMax = -1;
RNG rng(0);
Point2f p0, p1;
float x, y;
// 1 use the minimum number of data points required to estimate model parameters to cal model, then record the model corresponding to the max nums of maybeInliers
for (int iter = 0; iter < niters; ++iter)
{
// 1.1 randomly select values
for (int i = 0; i < params.size; ++i)
{
bool ok = false;
while (!ok)
{
ok = true;
indices[i] = static_cast<unsigned>(rng) % npoints;
// The randomly selected data cannot be the same as the previous data. Otherwise, random selection fails
for (int j = 0; j < i; ++j)
if (indices[i] == indices[j])
{ ok = false; break; } // if fails, continue to randomly select
}
}
// based the above index, extract the data used to fit the models
for (int i = 0; i < params.size; ++i)
{
subset0[i] = points0_[indices[i]];
subset1[i] = points1_[indices[i]];
}
// 1.2 cal models
Mat_<float> M = estimateGlobalMotionLeastSquares(subset0, subset1, model, 0);
// 1.3 evaluate every points based the above model
// put all the data into the model above, then cal the nums of maybeInliers
int numinliers = 0;
for (int i = 0; i < npoints; ++i)
{
p0 = points0_[i];
p1 = points1_[i];
x = M(0,0)*p0.x + M(0,1)*p0.y + M(0,2);
y = M(1,0)*p0.x + M(1,1)*p0.y + M(1,2);
if (sqr(x - p1.x) + sqr(y - p1.y) < params.thresh * params.thresh)
numinliers++;
}
// 1.4 record the model corresponding to the max nums of maybeInliers
if (numinliers >= ninliersMax)
{
bestM = M;
ninliersMax = numinliers;
bestIndices.swap(indices);
}
}
// 2 after iters, use the bestM to filter maybeInliers from all points, then cal model
if (ninliersMax < params.size)
{
// compute RMSE
for (int i = 0; i < params.size; ++i)
{
subset0[i] = points0_[bestIndices[i]];
subset1[i] = points1_[bestIndices[i]];
}
bestM = estimateGlobalMotionLeastSquares(subset0, subset1, model, rmse);
}
else
{
subset0.resize(ninliersMax);
subset1.resize(ninliersMax);
// select maybeInliers from all points
for (int i = 0, j = 0; i < npoints && j < ninliersMax ; ++i)
{
p0 = points0_[i];
p1 = points1_[i];
x = bestM(0,0)*p0.x + bestM(0,1)*p0.y + bestM(0,2);
y = bestM(1,0)*p0.x + bestM(1,1)*p0.y + bestM(1,2);
if (sqr(x - p1.x) + sqr(y - p1.y) < params.thresh * params.thresh)
{
subset0[j] = p0;
subset1[j] = p1;
j++;
}
}
// based maybeInliers, cal model
bestM = estimateGlobalMotionLeastSquares(subset0, subset1, model, rmse);
}
if (ninliers)
*ninliers = ninliersMax;
return std::move(bestM);
}
最小二乘估计
通过RANSAC过滤掉可能的外点,然后基于剩余的”内点“,进行运动估计。
基于最小二乘的2D运动估计主要步骤如下:
- 对2D坐标进行各向同性归一化,即
normalizePoints()
。 - 构建求解方程(正规方程)
- 求解正规方程
- 反归一化
各向同性归一化(Isotropic scaling)
用于2D运动估计的稀疏点集来自于不同的图像。因此,在接下来变换矩阵求解前,对坐标点进行性归一化非常重要。
对于DLT,基本矩阵求解时,归一化非常重要。
- The points are translated so that their centroid is at the origin.
- The points are then scaled so that the average distance from the origin is equal to √2.( for points(x, y), This means that the “average” point is equal to (1, 1) )
- This transformation is applied to each of the two images independently.
对应代码如下:
inline float sqr(float x) { return x * x; }
static Mat normalizePoints(int npoints, Point2f* points)
{
float cx = 0.f, cy = 0.f; // centroid of points
for (int i = 0; i < npoints; ++i)
{
cx += points[i].x;
cy += points[i].y;
}
cx /= npoints;
cy /= npoints;
float d = 0.f; // average distance
for (int i = 0; i < npoints; ++i)
{
points[i].x -= cx;
points[i].y -= cy;
d += std::sqrt(sqr(points[i].x) + sqr(points[i].y));
}
d /= npoints;
float s = std::sqrt(2.f) / d;
for (int i = 0; i < npoints; ++i)
{
points[i].x *= s;
points[i].y *= s;
}
// T matrix for normalization
Mat_<float> T = Mat::eye(3, 3, CV_32F);
T(0, 0) = T(1, 1) = s;
T(0, 2) = -cx * s;
T(1, 2) = -cy * s;
return std::move(T);
}
构建求解方程
对于仿射变换有:
展开上述式子,将系数a提取出来,即,
除了6个未知数,其他都是来自坐标点。因此,只要提供3组及以上的坐标点,就可以求解上述方程
// A*x = b
Mat_<float> A(2 * npoints, 3), b(2 * npoints, 1);
float* a0, * a1;
Point2f p0, p1;
for (int i = 0; i < npoints; ++i)
{
a0 = A[2 * i];
a1 = A[2 * i + 1];
p0 = points0[i];
p1 = points1[i];
a0[0] = p0.x; a0[1] = 1; a0[2] = 0;
a1[0] = p0.y; a1[1] = 0; a1[2] = 1;
b(2 * i, 0) = p1.x;
b(2 * i + 1, 0) = p1.y;
}
求解正规方程
Mat_<float> sol;
solve(A, b, sol, DECOMP_NORMAL | DECOMP_LU);
if (rmse)
*rmse = static_cast<float>(norm(A * sol, b, NORM_L2) / std::sqrt(static_cast<double>(npoints)));
Mat_<float> M = Mat::eye(3, 3, CV_32F);
for (int i = 0, k = 0; i < 2; ++i)
for (int j = 0; j < 3; ++j, ++k)
M(i, j) = sol(k, 0);
反归一化
由于求解上述变换矩阵,是在归一化的坐标上进行的,不能直接对像素坐标进行变换。因此,需要反归一化,得到对像素坐标进行变换的变换矩阵。
return T1.inv() * M * T0;
代码
整个仿射变换求解的代码如下:
static Mat estimateGlobMotionLeastSquaresAffine(
int npoints, Point2f* points0, Point2f* points1, float* rmse)
{
Mat_<float> T0 = normalizePoints(npoints, points0);
Mat_<float> T1 = normalizePoints(npoints, points1);
Mat_<float> A(2 * npoints, 6), b(2 * npoints, 1);
float* a0, * a1;
Point2f p0, p1;
for (int i = 0; i < npoints; ++i)
{
a0 = A[2 * i];
a1 = A[2 * i + 1];
p0 = points0[i];
p1 = points1[i];
a0[0] = p0.x; a0[1] = p0.y; a0[2] = 1; a0[3] = a0[4] = a0[5] = 0;
a1[0] = a1[1] = a1[2] = 0; a1[3] = p0.x; a1[4] = p0.y; a1[5] = 1;
b(2 * i, 0) = p1.x;
b(2 * i + 1, 0) = p1.y;
}
Mat_<float> sol;
solve(A, b, sol, DECOMP_NORMAL | DECOMP_LU);
if (rmse)
*rmse = static_cast<float>(norm(A * sol, b, NORM_L2) / std::sqrt(static_cast<double>(npoints)));
Mat_<float> M = Mat::eye(3, 3, CV_32F);
for (int i = 0, k = 0; i < 2; ++i)
for (int j = 0; j < 3; ++j, ++k)
M(i, j) = sol(k, 0);
return T1.inv() * M * T0;
}
同理,其他变换的求解方式类似。
最后
运动估计内容来自opencv中的Video Stabilization模块,该模块主要用于视频的稳像、防抖处理。该模块位于opencv_contrib。
基于2D特征进行运动估计,整体思路简单,值得注意的是,在进行变换矩阵求解前,需要对坐标点进行归一化。
将整个过程的源码进行模块化分解、学习并记录,每一块都有可能在以后的工作中用到,这里就相当于小小的积累吧。