基于2D特征的运动估计

最新推荐文章于 2024-05-15 09:05:21 发布

mppcasc

最新推荐文章于 2024-05-15 09:05:21 发布

阅读量569

点赞数

分类专栏：算法文章标签：算法

本文链接：https://blog.csdn.net/mppcasc/article/details/119811049

版权

算法专栏收录该内容

5 篇文章 0 订阅

订阅专栏

基于2D特征的运动估计

基于2D特征的运动估计是从两个或多个匹配的2D点的集合中估计运动的问题。它涉及以下技术点：

特征的提取和匹配
运动估计

特征提取和匹配

这个过程主要包括三个步骤，即：

特征提取
光流法追踪匹配的点
筛选有效的特征点

根据这个路线，一种经典的代码构成如下：

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

void main()
{
    VideoCapture cap("xxx.mp4");

    if (!cap.isOpened())
        return -1;

    // Get frame count
    int n_frames = int(cap.get(CAP_PROP_FRAME_COUNT));

    // Get width and height of video stream
    int w = int(cap.get(CAP_PROP_FRAME_WIDTH));
    int h = int(cap.get(CAP_PROP_FRAME_HEIGHT));

    // Get frames per second (fps)
    double fps = cap.get(cv::CAP_PROP_FPS);

    Mat curr, curr_gray;
    Mat prev, prev_gray;
    for (int i = 1; i < n_frames - 1; i++)
    {
        // Vector from previous and current feature points
        vector <Point2f> prev_pts, curr_pts;

        // Detect features in previous frame
        goodFeaturesToTrack(prev_gray, prev_pts, 200, 0.01, 30);

        // Read next frame 
        bool success = cap.read(curr);
        if (!success) break;

        // Convert to grayscale
        cvtColor(curr, curr_gray, COLOR_BGR2GRAY);

        // Calculate optical flow (i.e. track feature points)
        vector <uchar> status;
        vector <float> err;
        calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, curr_pts, status, err);

        // Filter only valid points
        auto prev_it = prev_pts.begin();
        auto curr_it = curr_pts.begin();
        for (size_t k = 0; k < status.size(); k++)
        {
            if (status[k])
            {
                prev_it++;
                curr_it++;
            }
            else
            {
                prev_it = prev_pts.erase(prev_it);
                curr_it = curr_pts.erase(curr_it);
            }
        }
    }
}

利用goodFeaturesToTrack()和calcOpticalFlowPyrLK()，这一套组合拳下来，很容实现特征的提取和匹配。然而，这背后涉及了很多算法及大佬们的思想，后续通过阅读opencv的源码，学习大佬们的精髓。

运动估计

2D运动估计的原理就是计算2D变换。2D变换根据自由度的不同可以分为旋转、平移、相似变换、仿射变换以及投影变换（透视变换）等变换。

2D坐标变换中，旋转、相似都属于线性变换，平移则不属于线性变换。但在齐次坐标系下，平移也变成了线性，上述所有变换将可以由一个变换矩阵来表示，即单应性矩阵。原因是，在透视空间中，平移被看作是一个剪切变换。因此，在2D或3D欧几里空间中，平移是一个非线性变换，但在3D或4D齐次坐标下，就变成了一个线性变换。

根据场景的不同，合理选择不同的变换方式。如，在Video Stabilization中，一般选择仿射变换，当然也可以选择投影变换。但是由于在相机运动估计过程中，假设投影中心并未改变，因此选择仿射变换足够。

RANSAC

我们对匹配的特征点，利用最小二乘进行运动估计时，如果存在外点的话（几乎总是有的），得到的结果将会受外点的干扰而不够鲁棒。因此，在进行运动估计时，首先要对外点进行筛选。常用的方法有RANSAC和最小中位方差（LMS）。

RANSAC算法伪代码：

Given:
    data – A set of observations.
    model – A model to explain observed data points.
    n – Minimum number of data points required to estimate model parameters.
    k – Maximum number of iterations allowed in the algorithm.
    t – Threshold value to determine data points that are fit well by model.
    d – Number of close data points required to assert that a model fits well to data.

Return:
    bestFit – model parameters which best fit the data (or null if no good model is found)

iterations = 0
bestFit = null
bestErr = something really large

while iterations < k do
    maybeInliers := n randomly selected values from data
    maybeModel := model parameters fitted to maybeInliers
    alsoInliers := empty set
    for every point in data not in maybeInliers do
        if point fits maybeModel with an error smaller than t
             add point to alsoInliers
    end for
    if the number of elements in alsoInliers is > d then
        // This implies that we may have found a good model
        // now test how good it is.
        betterModel := model parameters fitted to all points in maybeInliers and alsoInliers
        thisErr := a measure of how well betterModel fits these points
        if thisErr < bestErr then
            bestFit := betterModel
            bestErr := thisErr
        end if
    end if
    increment iterations
end while

return bestFit

对于我们运动估计而言，RANSAC算法中对应变量的含义：

Given Var	in code(as follow)	comment
data	points0，points1	在特征提取和匹配阶段得到的一系列坐标点(x0, y0)和(x1,y1)
model	M	仿射变换（下文将会讲解如何求解）.
n	params.size	即求解仿射变换至少要提供三个坐标点，才能够求解，否则无法提供足够约束求解6个变量
k	niters	最大迭代次数，通过公式可以估计出.
t	params.thresh	确定是否为内点的阈值
d	ninliersMax	记录拟合最多内点个数

代码

enum MotionModel
{
    MM_TRANSLATION = 0,
    MM_TRANSLATION_AND_SCALE = 1,
    MM_ROTATION = 2,
    MM_RIGID = 3,
    MM_SIMILARITY = 4,
    MM_AFFINE = 5,
    MM_HOMOGRAPHY = 6,
    MM_UNKNOWN = 7
};

struct RansacParams
{
    int size; //!< subset size
    float thresh; //!< max error to classify as inlier
    float eps; //!< max outliers ratio
    float prob; //!< probability of success

    RansacParams() : size(0), thresh(0), eps(0), prob(0) {}
    /** @brief Constructor
    @param size Subset size.
    @param thresh Maximum re-projection error value to classify as inlier.
    @param eps Maximum ratio of incorrect correspondences.
    @param prob Required success probability.
     */
    RansacParams(int size, float thresh, float eps, float prob);

    /**
    @return Number of iterations that'll be performed by RANSAC method.
    */
    int niters() const
    {
        return static_cast<int>(
                std::ceil(std::log(1 - prob) / std::log(1 - std::pow(1 - eps, size))));
    }

    /**
    @param model Motion model. 
    @return Default RANSAC method parameters for the given motion model.
    */
    static RansacParams default2dMotion(MotionModel model)
    {
        CV_Assert(model < MM_UNKNOWN);
        if (model == MM_TRANSLATION)
            return RansacParams(1, 0.5f, 0.5f, 0.99f);
        if (model == MM_TRANSLATION_AND_SCALE)
            return RansacParams(2, 0.5f, 0.5f, 0.99f);
        if (model == MM_ROTATION)
            return RansacParams(1, 0.5f, 0.5f, 0.99f);
        if (model == MM_RIGID)
            return RansacParams(2, 0.5f, 0.5f, 0.99f);
        if (model == MM_SIMILARITY)
            return RansacParams(2, 0.5f, 0.5f, 0.99f);
        if (model == MM_AFFINE)
            return RansacParams(3, 0.5f, 0.5f, 0.99f);
        return RansacParams(4, 0.5f, 0.5f, 0.99f);
    }
};

Mat estimateGlobalMotionRansac(
        InputArray points0, InputArray points1, int model, const RansacParams &params,
        float *rmse, int *ninliers)
{
    CV_Assert(model <= MM_AFFINE);
    CV_Assert(points0.type() == points1.type());
    const int npoints = points0.getMat().checkVector(2);
    CV_Assert(points1.getMat().checkVector(2) == npoints);

    if (npoints < params.size)
        return Mat::eye(3, 3, CV_32F);

    const Point2f *points0_ = points0.getMat().ptr<Point2f>();
    const Point2f *points1_ = points1.getMat().ptr<Point2f>();
    const int niters = params.niters(); // 估算推导出的迭代数

    // current hypothesis
    std::vector<int> indices(params.size);
    std::vector<Point2f> subset0(params.size);
    std::vector<Point2f> subset1(params.size);

    // best hypothesis
    std::vector<int> bestIndices(params.size);

    Mat_<float> bestM;
    int ninliersMax = -1;

    RNG rng(0);
    Point2f p0, p1;
    float x, y;

    // 1 use the minimum number of data points required to estimate model parameters to cal model, then record the model corresponding to the max nums of maybeInliers
    for (int iter = 0; iter < niters; ++iter)
    {
        // 1.1 randomly select values
        for (int i = 0; i < params.size; ++i)
        {
            bool ok = false;
            while (!ok)
            {
                ok = true;
                indices[i] = static_cast<unsigned>(rng) % npoints; 
                // The randomly selected data cannot be the same as the previous data. Otherwise, random selection fails
                for (int j = 0; j < i; ++j) 
                    if (indices[i] == indices[j])
                        { ok = false; break; } // if fails, continue to randomly select
            }
        }
        // based the above index, extract the data used to fit the models
        for (int i = 0; i < params.size; ++i)
        {
            subset0[i] = points0_[indices[i]];
            subset1[i] = points1_[indices[i]];
        }

        // 1.2 cal models
        Mat_<float> M = estimateGlobalMotionLeastSquares(subset0, subset1, model, 0);

        // 1.3 evaluate every points based the above model
        // put all the data into the model above, then cal the nums of maybeInliers
        int numinliers = 0;
        for (int i = 0; i < npoints; ++i)
        {
            p0 = points0_[i];
            p1 = points1_[i];
            x = M(0,0)*p0.x + M(0,1)*p0.y + M(0,2);
            y = M(1,0)*p0.x + M(1,1)*p0.y + M(1,2);
            if (sqr(x - p1.x) + sqr(y - p1.y) < params.thresh * params.thresh)
                numinliers++;
        }
        // 1.4 record the model corresponding to the max nums of maybeInliers
        if (numinliers >= ninliersMax)
        {
            bestM = M;
            ninliersMax = numinliers;
            bestIndices.swap(indices);
        }
    }

    // 2 after iters, use the bestM to filter maybeInliers from all points, then cal model
    if (ninliersMax < params.size)
    {
        // compute RMSE
        for (int i = 0; i < params.size; ++i)
        {
            subset0[i] = points0_[bestIndices[i]];
            subset1[i] = points1_[bestIndices[i]];
        }
        bestM = estimateGlobalMotionLeastSquares(subset0, subset1, model, rmse);
    }
    else
    {
        subset0.resize(ninliersMax);
        subset1.resize(ninliersMax);
        // select maybeInliers from all points
        for (int i = 0, j = 0; i < npoints && j < ninliersMax ; ++i)
        {
            p0 = points0_[i];
            p1 = points1_[i];
            x = bestM(0,0)*p0.x + bestM(0,1)*p0.y + bestM(0,2);
            y = bestM(1,0)*p0.x + bestM(1,1)*p0.y + bestM(1,2);
            if (sqr(x - p1.x) + sqr(y - p1.y) < params.thresh * params.thresh)
            {
                subset0[j] = p0;
                subset1[j] = p1;
                j++;
            }
        }
        // based maybeInliers, cal model
        bestM = estimateGlobalMotionLeastSquares(subset0, subset1, model, rmse);
    }

    if (ninliers)
        *ninliers = ninliersMax;

    return std::move(bestM);
}

最小二乘估计

通过RANSAC过滤掉可能的外点，然后基于剩余的”内点“，进行运动估计。

基于最小二乘的2D运动估计主要步骤如下：

对2D坐标进行各向同性归一化，即normalizePoints()。
构建求解方程（正规方程）
求解正规方程
反归一化

各向同性归一化（Isotropic scaling）

用于2D运动估计的稀疏点集来自于不同的图像。因此，在接下来变换矩阵求解前，对坐标点进行性归一化非常重要。

对于DLT，基本矩阵求解时，归一化非常重要。

主要步骤如下：

The points are translated so that their centroid is at the origin.
The points are then scaled so that the average distance from the origin is equal to √2.( for points(x, y), This means that the “average” point is equal to (1, 1) )
This transformation is applied to each of the two images independently.

对应代码如下：

inline float sqr(float x) { return x * x; }

static Mat normalizePoints(int npoints, Point2f* points)
{
    float cx = 0.f, cy = 0.f; // centroid of points
    for (int i = 0; i < npoints; ++i)
    {
        cx += points[i].x;
        cy += points[i].y;
    }
    cx /= npoints;
    cy /= npoints;

    float d = 0.f; // average distance
    for (int i = 0; i < npoints; ++i)
    {
        points[i].x -= cx;
        points[i].y -= cy;
        d += std::sqrt(sqr(points[i].x) + sqr(points[i].y));
    }
    d /= npoints;

    float s = std::sqrt(2.f) / d;
    for (int i = 0; i < npoints; ++i)
    {
        points[i].x *= s;
        points[i].y *= s;
    }
	// T matrix for normalization
    Mat_<float> T = Mat::eye(3, 3, CV_32F);
    T(0, 0) = T(1, 1) = s;
    T(0, 2) = -cx * s;
    T(1, 2) = -cy * s;
    return std::move(T);
}

构建求解方程

对于仿射变换有：
在这里插入图片描述

展开上述式子，将系数a提取出来，即，
在这里插入图片描述

除了6个未知数，其他都是来自坐标点。因此，只要提供3组及以上的坐标点，就可以求解上述方程

// A*x = b
Mat_<float> A(2 * npoints, 3), b(2 * npoints, 1);
float* a0, * a1;
Point2f p0, p1;

for (int i = 0; i < npoints; ++i)
{
    a0 = A[2 * i];
    a1 = A[2 * i + 1];
    p0 = points0[i];
    p1 = points1[i];
    a0[0] = p0.x; a0[1] = 1; a0[2] = 0;
    a1[0] = p0.y; a1[1] = 0; a1[2] = 1;
    b(2 * i, 0) = p1.x;
    b(2 * i + 1, 0) = p1.y;
}

求解正规方程

 Mat_<float> sol;
solve(A, b, sol, DECOMP_NORMAL | DECOMP_LU);

if (rmse)
    *rmse = static_cast<float>(norm(A * sol, b, NORM_L2) / std::sqrt(static_cast<double>(npoints)));

Mat_<float> M = Mat::eye(3, 3, CV_32F);
for (int i = 0, k = 0; i < 2; ++i)
    for (int j = 0; j < 3; ++j, ++k)
        M(i, j) = sol(k, 0);

反归一化

由于求解上述变换矩阵，是在归一化的坐标上进行的，不能直接对像素坐标进行变换。因此，需要反归一化，得到对像素坐标进行变换的变换矩阵。

return T1.inv() * M * T0;

代码

整个仿射变换求解的代码如下：

static Mat estimateGlobMotionLeastSquaresAffine(
    int npoints, Point2f* points0, Point2f* points1, float* rmse)
{
    Mat_<float> T0 = normalizePoints(npoints, points0);
    Mat_<float> T1 = normalizePoints(npoints, points1);

    Mat_<float> A(2 * npoints, 6), b(2 * npoints, 1);
    float* a0, * a1;
    Point2f p0, p1;

    for (int i = 0; i < npoints; ++i)
    {
        a0 = A[2 * i];
        a1 = A[2 * i + 1];
        p0 = points0[i];
        p1 = points1[i];

        a0[0] = p0.x; a0[1] = p0.y; a0[2] = 1; a0[3] = a0[4] = a0[5] = 0;
        a1[0] = a1[1] = a1[2] = 0; a1[3] = p0.x; a1[4] = p0.y; a1[5] = 1;
        b(2 * i, 0) = p1.x;
        b(2 * i + 1, 0) = p1.y;
    }

    Mat_<float> sol;
    solve(A, b, sol, DECOMP_NORMAL | DECOMP_LU);

    if (rmse)
        *rmse = static_cast<float>(norm(A * sol, b, NORM_L2) / std::sqrt(static_cast<double>(npoints)));

    Mat_<float> M = Mat::eye(3, 3, CV_32F);
    for (int i = 0, k = 0; i < 2; ++i)
        for (int j = 0; j < 3; ++j, ++k)
            M(i, j) = sol(k, 0);

    return T1.inv() * M * T0;
}