2D Feature-Based Motion Estimation

2D feature-based motion estimation is the problem of estimating motion from two or more sets of matched 2D points. It involves the following topics:

  • Feature extraction and matching
  • Motion estimation

Feature Extraction and Matching

This process consists of three main steps:

  1. Feature extraction
  2. Tracking the matched points with optical flow
  3. Filtering out invalid feature points

Following this outline, a typical implementation looks like the following:

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int main()
{
    VideoCapture cap("xxx.mp4");

    if (!cap.isOpened())
        return -1;

    // Get frame count
    int n_frames = int(cap.get(CAP_PROP_FRAME_COUNT));

    // Get width and height of video stream
    int w = int(cap.get(CAP_PROP_FRAME_WIDTH));
    int h = int(cap.get(CAP_PROP_FRAME_HEIGHT));

    // Get frames per second (fps)
    double fps = cap.get(CAP_PROP_FPS);

    // Read the first frame and convert it to grayscale, so that
    // prev_gray is valid on the first loop iteration
    Mat curr, curr_gray;
    Mat prev, prev_gray;
    if (!cap.read(prev))
        return -1;
    cvtColor(prev, prev_gray, COLOR_BGR2GRAY);

    for (int i = 1; i < n_frames - 1; i++)
    {
        // Vectors of previous and current feature points
        vector<Point2f> prev_pts, curr_pts;

        // Detect features in previous frame
        goodFeaturesToTrack(prev_gray, prev_pts, 200, 0.01, 30);

        // Read next frame
        bool success = cap.read(curr);
        if (!success) break;

        // Convert to grayscale
        cvtColor(curr, curr_gray, COLOR_BGR2GRAY);

        // Calculate optical flow (i.e. track feature points)
        vector<uchar> status;
        vector<float> err;
        calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, curr_pts, status, err);

        // Keep only the points that were tracked successfully
        auto prev_it = prev_pts.begin();
        auto curr_it = curr_pts.begin();
        for (size_t k = 0; k < status.size(); k++)
        {
            if (status[k])
            {
                prev_it++;
                curr_it++;
            }
            else
            {
                prev_it = prev_pts.erase(prev_it);
                curr_it = curr_pts.erase(curr_it);
            }
        }

        // The current frame becomes the previous frame for the next iteration
        curr_gray.copyTo(prev_gray);
    }
    return 0;
}

With the goodFeaturesToTrack() + calcOpticalFlowPyrLK() combination, feature extraction and matching are easy to implement (the parameters above request at most 200 corners, a quality level of 0.01, and a minimum spacing of 30 pixels between corners). A great deal of algorithmic insight lies behind these two functions, though; reading the OpenCV source code, as the rest of this post does, is a good way to absorb it.

Motion Estimation

The principle of 2D motion estimation is to compute a 2D transform. Depending on their degrees of freedom, 2D transforms include rotation, translation, similarity, affine, and projective (perspective) transforms.

Among 2D coordinate transforms, rotation and similarity are linear, while translation is not. In homogeneous coordinates, however, translation also becomes linear, and all the transforms above can be represented by a single transform matrix, the homography. The reason is that in projective space a translation acts like a shear: a translation is a nonlinear transform in 2D or 3D Euclidean space, but becomes a linear one in 3D or 4D homogeneous coordinates.
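
For example, a translation by $(t_x, t_y)$ is not linear on $(x, y)$, but in homogeneous coordinates it becomes a single matrix multiplication:

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$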

The transform should be chosen to fit the scene. In Video Stabilization, for example, an affine transform is the usual choice, though a projective transform would also work; since the camera motion estimation assumes the projection center does not change, an affine transform is sufficient.

RANSAC

When we estimate motion from the matched feature points by least squares, any outliers (and there almost always are some) will corrupt the result and make it non-robust. Outliers therefore have to be screened out before motion estimation. Common methods are RANSAC and least median of squares (LMS).

RANSAC pseudocode:
Given:
    data – A set of observations.
    model – A model to explain observed data points.
    n – Minimum number of data points required to estimate model parameters.
    k – Maximum number of iterations allowed in the algorithm.
    t – Threshold value to determine data points that are fit well by model.
    d – Number of close data points required to assert that a model fits well to data.

Return:
    bestFit – model parameters which best fit the data (or null if no good model is found)

iterations = 0
bestFit = null
bestErr = something really large

while iterations < k do
    maybeInliers := n randomly selected values from data
    maybeModel := model parameters fitted to maybeInliers
    alsoInliers := empty set
    for every point in data not in maybeInliers do
        if point fits maybeModel with an error smaller than t
             add point to alsoInliers
    end for
    if the number of elements in alsoInliers is > d then
        // This implies that we may have found a good model
        // now test how good it is.
        betterModel := model parameters fitted to all points in maybeInliers and alsoInliers
        thisErr := a measure of how well betterModel fits these points
        if thisErr < bestErr then
            bestFit := betterModel
            bestErr := thisErr
        end if
    end if
    increment iterations
end while

return bestFit

For our motion estimation problem, the variables in the RANSAC pseudocode correspond to the code as follows:

Given var   In code            Comment
data        points0, points1   The coordinate pairs (x0, y0) and (x1, y1) obtained in the feature extraction and matching stage
model       M                  The affine transform (how it is solved is explained below)
n           params.size        An affine transform needs at least three point pairs; fewer cannot constrain its six unknowns
k           niters             The maximum number of iterations, estimated from a formula (below)
t           params.thresh      The threshold for deciding whether a point is an inlier
d           ninliersMax        Records the largest number of inliers found so far
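
The iteration count k is chosen so that, with probability prob, at least one of the k random subsets is outlier-free. With outlier ratio eps and subset size size:

$$k = \left\lceil \frac{\ln(1 - \text{prob})}{\ln\bigl(1 - (1 - \text{eps})^{\text{size}}\bigr)} \right\rceil$$

With the MM_AFFINE defaults below (size = 3, eps = 0.5, prob = 0.99), this gives k = ceil(ln 0.01 / ln 0.875) = 35 iterations, which is exactly what RansacParams::niters() computes.
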
Code
enum MotionModel
{
    MM_TRANSLATION = 0,
    MM_TRANSLATION_AND_SCALE = 1,
    MM_ROTATION = 2,
    MM_RIGID = 3,
    MM_SIMILARITY = 4,
    MM_AFFINE = 5,
    MM_HOMOGRAPHY = 6,
    MM_UNKNOWN = 7
};
struct RansacParams
{
    int size; //!< subset size
    float thresh; //!< max error to classify as inlier
    float eps; //!< max outliers ratio
    float prob; //!< probability of success

    RansacParams() : size(0), thresh(0), eps(0), prob(0) {}
    /** @brief Constructor
    @param size Subset size.
    @param thresh Maximum re-projection error value to classify as inlier.
    @param eps Maximum ratio of incorrect correspondences.
    @param prob Required success probability.
     */
    RansacParams(int size, float thresh, float eps, float prob);

    /**
    @return Number of iterations that'll be performed by RANSAC method.
    */
    int niters() const
    {
        return static_cast<int>(
                std::ceil(std::log(1 - prob) / std::log(1 - std::pow(1 - eps, size))));
    }

    /**
    @param model Motion model. 
    @return Default RANSAC method parameters for the given motion model.
    */
    static RansacParams default2dMotion(MotionModel model)
    {
        CV_Assert(model < MM_UNKNOWN);
        if (model == MM_TRANSLATION)
            return RansacParams(1, 0.5f, 0.5f, 0.99f);
        if (model == MM_TRANSLATION_AND_SCALE)
            return RansacParams(2, 0.5f, 0.5f, 0.99f);
        if (model == MM_ROTATION)
            return RansacParams(1, 0.5f, 0.5f, 0.99f);
        if (model == MM_RIGID)
            return RansacParams(2, 0.5f, 0.5f, 0.99f);
        if (model == MM_SIMILARITY)
            return RansacParams(2, 0.5f, 0.5f, 0.99f);
        if (model == MM_AFFINE)
            return RansacParams(3, 0.5f, 0.5f, 0.99f);
        return RansacParams(4, 0.5f, 0.5f, 0.99f);
    }
};

Mat estimateGlobalMotionRansac(
        InputArray points0, InputArray points1, int model, const RansacParams &params,
        float *rmse, int *ninliers)
{
    CV_Assert(model <= MM_AFFINE);
    CV_Assert(points0.type() == points1.type());
    const int npoints = points0.getMat().checkVector(2);
    CV_Assert(points1.getMat().checkVector(2) == npoints);

    if (npoints < params.size)
        return Mat::eye(3, 3, CV_32F);

    const Point2f *points0_ = points0.getMat().ptr<Point2f>();
    const Point2f *points1_ = points1.getMat().ptr<Point2f>();
    const int niters = params.niters(); // iteration count estimated from the formula above

    // current hypothesis
    std::vector<int> indices(params.size);
    std::vector<Point2f> subset0(params.size);
    std::vector<Point2f> subset1(params.size);

    // best hypothesis
    std::vector<int> bestIndices(params.size);

    Mat_<float> bestM;
    int ninliersMax = -1;

    RNG rng(0);
    Point2f p0, p1;
    float x, y;

    // 1 Repeatedly fit the model to minimal random subsets, keeping the one with the most inliers
    for (int iter = 0; iter < niters; ++iter)
    {
        // 1.1 randomly select values
        for (int i = 0; i < params.size; ++i)
        {
            bool ok = false;
            while (!ok)
            {
                ok = true;
                indices[i] = static_cast<unsigned>(rng) % npoints; 
                // Reject duplicate indices: a newly drawn index must differ from all previous ones
                for (int j = 0; j < i; ++j) 
                    if (indices[i] == indices[j])
                        { ok = false; break; } // if fails, continue to randomly select
            }
        }
        // Extract the point pairs selected by the indices above
        for (int i = 0; i < params.size; ++i)
        {
            subset0[i] = points0_[indices[i]];
            subset1[i] = points1_[indices[i]];
        }

        // 1.2 Fit a model to the minimal subset
        Mat_<float> M = estimateGlobalMotionLeastSquares(subset0, subset1, model, 0);

        // 1.3 Evaluate every point against this model and count its inliers
        int numinliers = 0;
        for (int i = 0; i < npoints; ++i)
        {
            p0 = points0_[i];
            p1 = points1_[i];
            x = M(0,0)*p0.x + M(0,1)*p0.y + M(0,2);
            y = M(1,0)*p0.x + M(1,1)*p0.y + M(1,2);
            if (sqr(x - p1.x) + sqr(y - p1.y) < params.thresh * params.thresh)
                numinliers++;
        }
        // 1.4 Keep the model with the largest inlier count so far
        if (numinliers >= ninliersMax)
        {
            bestM = M;
            ninliersMax = numinliers;
            bestIndices.swap(indices);
        }
    }

    // 2 After the iterations, use bestM to select the inliers among all points, then refit the model on them
    if (ninliersMax < params.size)
    {
        // Degenerate case (too few inliers): refit on the best minimal subset and report its RMSE
        for (int i = 0; i < params.size; ++i)
        {
            subset0[i] = points0_[bestIndices[i]];
            subset1[i] = points1_[bestIndices[i]];
        }
        bestM = estimateGlobalMotionLeastSquares(subset0, subset1, model, rmse);
    }
    else
    {
        subset0.resize(ninliersMax);
        subset1.resize(ninliersMax);
        // Select the inliers of bestM from all the points
        for (int i = 0, j = 0; i < npoints && j < ninliersMax ; ++i)
        {
            p0 = points0_[i];
            p1 = points1_[i];
            x = bestM(0,0)*p0.x + bestM(0,1)*p0.y + bestM(0,2);
            y = bestM(1,0)*p0.x + bestM(1,1)*p0.y + bestM(1,2);
            if (sqr(x - p1.x) + sqr(y - p1.y) < params.thresh * params.thresh)
            {
                subset0[j] = p0;
                subset1[j] = p1;
                j++;
            }
        }
        // Refit the model on the selected inliers
        bestM = estimateGlobalMotionLeastSquares(subset0, subset1, model, rmse);
    }

    if (ninliers)
        *ninliers = ninliersMax;

    return std::move(bestM);
}
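
A minimal usage sketch (assuming prev_pts and curr_pts are the filtered point vectors from the tracking stage above; the variable names are mine):

float rmse = 0.f;
int ninliers = 0;
// Estimate an affine motion with the default RANSAC parameters for MM_AFFINE
Mat M = estimateGlobalMotionRansac(
        prev_pts, curr_pts, MM_AFFINE,
        RansacParams::default2dMotion(MM_AFFINE),
        &rmse, &ninliers);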

Least Squares Estimation

RANSAC filters out the likely outliers; motion estimation is then performed on the remaining "inliers".

Least-squares 2D motion estimation consists of the following steps:

  1. Isotropically normalize the 2D coordinates, i.e. normalizePoints()
  2. Build the system of equations (the normal equations)
  3. Solve the normal equations
  4. Denormalize

Isotropic Scaling (Normalization)

The sparse point sets used for 2D motion estimation come from different images. It is therefore important to normalize the coordinates before solving for the transform matrix below.

For DLT and for fundamental-matrix estimation, this normalization is essential.

The main steps are:

  1. The points are translated so that their centroid is at the origin.
  2. The points are then scaled so that the average distance from the origin is equal to √2 (so the "average" point is (1, 1)).
  3. This transformation is applied to each of the two images independently.
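
Taken together, the two steps apply a single 3×3 matrix T to each point set, where $(c_x, c_y)$ is the centroid and $\bar{d}$ is the average distance of the centered points from the origin:

$$T = \begin{bmatrix} s & 0 & -s\,c_x \\ 0 & s & -s\,c_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad s = \frac{\sqrt{2}}{\bar{d}}$$

This is exactly the T matrix returned by normalizePoints() below.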

The corresponding code:

inline float sqr(float x) { return x * x; }

static Mat normalizePoints(int npoints, Point2f* points)
{
    float cx = 0.f, cy = 0.f; // centroid of points
    for (int i = 0; i < npoints; ++i)
    {
        cx += points[i].x;
        cy += points[i].y;
    }
    cx /= npoints;
    cy /= npoints;

    float d = 0.f; // average distance
    for (int i = 0; i < npoints; ++i)
    {
        points[i].x -= cx;
        points[i].y -= cy;
        d += std::sqrt(sqr(points[i].x) + sqr(points[i].y));
    }
    d /= npoints;

    float s = std::sqrt(2.f) / d;
    for (int i = 0; i < npoints; ++i)
    {
        points[i].x *= s;
        points[i].y *= s;
    }
	// T matrix for normalization
    Mat_<float> T = Mat::eye(3, 3, CV_32F);
    T(0, 0) = T(1, 1) = s;
    T(0, 2) = -cx * s;
    T(1, 2) = -cy * s;
    return std::move(T);
}

Building the System of Equations

For an affine transform we have

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} a_0 & a_1 \\ a_3 & a_4 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} + \begin{bmatrix} a_2 \\ a_5 \end{bmatrix}$$

Expanding this and collecting the coefficients $a_0, \dots, a_5$ gives, for each point pair, two linear equations:

$$\begin{bmatrix} x_0 & y_0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_0 & y_0 & 1 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$$

Everything except the six unknowns comes from the coordinate points, so three or more point pairs are enough to solve the system.

// A*x = b, stacking two rows per point pair (x holds the six unknowns)
Mat_<float> A(2 * npoints, 6), b(2 * npoints, 1);
float* a0, * a1;
Point2f p0, p1;

for (int i = 0; i < npoints; ++i)
{
    a0 = A[2 * i];
    a1 = A[2 * i + 1];
    p0 = points0[i];
    p1 = points1[i];
    a0[0] = p0.x; a0[1] = p0.y; a0[2] = 1; a0[3] = a0[4] = a0[5] = 0;
    a1[0] = a1[1] = a1[2] = 0; a1[3] = p0.x; a1[4] = p0.y; a1[5] = 1;
    b(2 * i, 0) = p1.x;
    b(2 * i + 1, 0) = p1.y;
}

Solving the Normal Equations

Mat_<float> sol;
// DECOMP_NORMAL makes solve() build and solve the normal equations
// A^T * A * sol = A^T * b (here via LU), i.e. the least-squares solution
solve(A, b, sol, DECOMP_NORMAL | DECOMP_LU);

if (rmse)
    *rmse = static_cast<float>(norm(A * sol, b, NORM_L2) / std::sqrt(static_cast<double>(npoints)));

// Repack the six solved coefficients into a 3x3 homogeneous matrix
Mat_<float> M = Mat::eye(3, 3, CV_32F);
for (int i = 0, k = 0; i < 2; ++i)
    for (int j = 0; j < 3; ++j, ++k)
        M(i, j) = sol(k, 0);

Denormalization

The transform matrix above was solved on normalized coordinates, so it cannot be applied to pixel coordinates directly. We therefore denormalize it to obtain a matrix that operates on pixel coordinates.
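
Concretely, M was fitted so that $T_1 p_1 \approx M \, (T_0 p_0)$ on the normalized points; left-multiplying by the inverse of $T_1$ recovers the pixel-space mapping:

$$p_1 \approx T_1^{-1} M \, T_0 \, p_0$$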

return T1.inv() * M * T0;

Code

The complete code for solving the affine transform:

static Mat estimateGlobMotionLeastSquaresAffine(
    int npoints, Point2f* points0, Point2f* points1, float* rmse)
{
    Mat_<float> T0 = normalizePoints(npoints, points0);
    Mat_<float> T1 = normalizePoints(npoints, points1);

    Mat_<float> A(2 * npoints, 6), b(2 * npoints, 1);
    float* a0, * a1;
    Point2f p0, p1;

    for (int i = 0; i < npoints; ++i)
    {
        a0 = A[2 * i];
        a1 = A[2 * i + 1];
        p0 = points0[i];
        p1 = points1[i];

        a0[0] = p0.x; a0[1] = p0.y; a0[2] = 1; a0[3] = a0[4] = a0[5] = 0;
        a1[0] = a1[1] = a1[2] = 0; a1[3] = p0.x; a1[4] = p0.y; a1[5] = 1;
        b(2 * i, 0) = p1.x;
        b(2 * i + 1, 0) = p1.y;
    }

    Mat_<float> sol;
    solve(A, b, sol, DECOMP_NORMAL | DECOMP_LU);

    if (rmse)
        *rmse = static_cast<float>(norm(A * sol, b, NORM_L2) / std::sqrt(static_cast<double>(npoints)));

    Mat_<float> M = Mat::eye(3, 3, CV_32F);
    for (int i = 0, k = 0; i < 2; ++i)
        for (int j = 0; j < 3; ++j, ++k)
            M(i, j) = sol(k, 0);

    return T1.inv() * M * T0;
}
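
Once estimated, the matrix can be applied directly to warp a frame. A hedged sketch (prev, pts0, pts1, npoints, and rmse are assumed to come from the earlier stages; warpAffine expects the top 2×3 block of the 3×3 matrix):

Mat M = estimateGlobMotionLeastSquaresAffine(npoints, pts0, pts1, &rmse);
Mat warped;
// Warp with the 2x3 submatrix of the homogeneous 3x3 motion
warpAffine(prev, warped, M(Rect(0, 0, 3, 2)), prev.size());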

The other transforms are estimated in the same way; only the construction of A and the number of unknowns change.
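
As an illustration (a sketch of the idea, not necessarily OpenCV's exact implementation), the pure-translation model has only two unknowns, and its least squares solution reduces to the mean displacement:

// Sketch: for x1 = x0 + tx, y1 = y0 + ty, the least-squares
// solution for (tx, ty) is simply the mean displacement.
static Mat estimateTranslationSketch(int npoints, const Point2f* points0,
                                     const Point2f* points1)
{
    Point2f mean(0.f, 0.f);
    for (int i = 0; i < npoints; ++i)
        mean += points1[i] - points0[i];
    mean *= 1.f / npoints;

    Mat_<float> M = Mat::eye(3, 3, CV_32F);
    M(0, 2) = mean.x;
    M(1, 2) = mean.y;
    return std::move(M);
}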

Closing Remarks

The motion estimation material above comes from OpenCV's Video Stabilization (videostab) module, which targets video stabilization and de-shaking; the module lives under modules/videostab in the OpenCV repository.

The overall idea of 2D feature-based motion estimation is simple; the point worth remembering is that the coordinates must be normalized before solving for the transform matrix.

Breaking the source code of the whole pipeline into modules, studying them, and writing them down: each piece may well be useful in future work, so treat this as a small accumulation.
