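Copyright notice: This is an original article by Davidwang. Commercial use of any kind is strictly prohibited; reproduction requires authorization.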
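Among SLAM optimization algorithms, steepest descent follows a zigzag path in which successive steps are perpendicular to each other. It descends quickly while far from the optimum but converges more and more slowly as it gets closer. Newton's method requires the second-order Hessian matrix, so its computational cost grows sharply when the data set is large or the dimension is high. In practice we therefore prefer the Gauss-Newton method and the Levenberg-Marquardt method (as well as others such as DogLeg).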
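This section does not derive the Gauss-Newton method (the derivation is straightforward); it only uses the result, namely the increment equation: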
$$J(x)J^T(x)\Delta x = -J(x)f(x)$$
This can be abbreviated as:
$$H \Delta x = b$$
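In this section we work through a practical example of the Gauss-Newton method. Suppose we have N observations corrupted by Gaussian noise, and we want to fit a function model to them. Let the chosen model be: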
$$f(x) = ax + \exp(bx + c)$$
The error function is therefore:
$$e_i = y_i - \left(ax_i + \exp(bx_i + c)\right)$$
The derivatives of each error term with respect to the state variables are (the minus signs come from $e_i = y_i - f(x_i)$):
$$\frac{\partial e_i}{\partial a} = -x_i$$
$$\frac{\partial e_i}{\partial b} = -x_i \cdot \exp(bx_i + c)$$
$$\frac{\partial e_i}{\partial c} = -\exp(bx_i + c)$$
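Sign errors in hand-derived derivatives are easy to make, so a quick numerical check can be worthwhile. The following is a minimal sketch, not part of the original program: the function names `residual`, `analyticJacobian` and `numericJacobian` and the step size `h` are assumptions chosen for illustration. It compares the analytic derivatives of $e_i$ with a central finite-difference approximation.

```cpp
#include <array>
#include <cmath>

// Residual e_i = y_i - (a*x_i + exp(b*x_i + c)) for a single data point.
double residual(double a, double b, double c, double xi, double yi)
{
    return yi - (a * xi + std::exp(b * xi + c));
}

// Analytic derivatives of e_i with respect to (a, b, c).
// They do not depend on a or y_i, so those are not passed in.
std::array<double, 3> analyticJacobian(double b, double c, double xi)
{
    const double ex = std::exp(b * xi + c);
    return {-xi, -xi * ex, -ex};
}

// Central finite-difference approximation of the same derivatives.
// h is a hypothetical step size; 1e-6 is a common choice for doubles.
std::array<double, 3> numericJacobian(double a, double b, double c,
                                      double xi, double yi, double h = 1e-6)
{
    return {
        (residual(a + h, b, c, xi, yi) - residual(a - h, b, c, xi, yi)) / (2 * h),
        (residual(a, b + h, c, xi, yi) - residual(a, b - h, c, xi, yi)) / (2 * h),
        (residual(a, b, c + h, xi, yi) - residual(a, b, c - h, xi, yi)) / (2 * h)};
}
```

If the two arrays agree to several digits at a few test points, the analytic expressions are very likely correct; a mismatch in sign shows up immediately.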
The Jacobian of each term, written as a column vector so that $J_i J_i^T$ is a 3×3 matrix, is therefore:
$$J_i = \begin{bmatrix} \frac{\partial e_i}{\partial a}, & \frac{\partial e_i}{\partial b}, & \frac{\partial e_i}{\partial c} \end{bmatrix}^T$$
Therefore, for the N observations, the overall Gauss-Newton increment equation is:
$$\left( \sum_{i=1}^{N} J_i \cdot J_i^T \right) \Delta x_k = \sum_{i=1}^{N} \left( -J_i \cdot e_i \right)$$
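The two sums in the increment equation can be accumulated in a single pass over the data. The sketch below illustrates this with plain standard-library arrays instead of Eigen; the type aliases `Vec3` and `Mat3`, the struct `NormalEquation` and the function `buildNormalEquation` are hypothetical names introduced here, and the noise weighting is left out for brevity.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Left-hand and right-hand side of the increment equation H * dx = b.
struct NormalEquation
{
    Mat3 H{};          // sum over i of J_i * J_i^T (starts at zero)
    Vec3 b{};          // sum over i of -J_i * e_i  (starts at zero)
    double cost = 0.0; // sum over i of e_i^2
};

// Accumulates H, b and the cost for the model f(x) = a*x + exp(b*x + c).
NormalEquation buildNormalEquation(const std::vector<double> &x,
                                   const std::vector<double> &y,
                                   double a, double b, double c)
{
    NormalEquation ne;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const double ex = std::exp(b * x[i] + c);
        const double e = y[i] - (a * x[i] + ex);   // residual e_i
        const Vec3 J = {-x[i], -x[i] * ex, -ex};   // de_i/da, de_i/db, de_i/dc
        for (int r = 0; r < 3; ++r)
        {
            for (int s = 0; s < 3; ++s)
                ne.H[r][s] += J[r] * J[s];         // outer product J_i * J_i^T
            ne.b[r] += -J[r] * e;
        }
        ne.cost += e * e;
    }
    return ne;
}
```

With noise-free data and the estimate equal to the true parameters, every residual vanishes, so `b` is zero and the resulting increment is zero: the true parameters are a fixed point of the iteration.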
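With the increment equation in hand, we can solve for $\Delta x_k$ and then iterate. In this example we generate N data points from the true parameters, add Gaussian noise to each of them, and then optimize with the Gauss-Newton method. The full code is as follows: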
/*
 * Gauss-Newton iteration method
 * author:Davidwang
 * date  :2020.08.24
 */
#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>
#include <Eigen/Core>
#include <Eigen/Dense>
using namespace std;
using namespace Eigen;

void GN(const int, const vector<double> &, const vector<double> &, double &, double &, double &, double const);

int main(int argc, char **argv)
{
    double ar = 18.0, br = 2.0, cr = 1.0; // true parameter values
    double ae = 2.0, be = 4.0, ce = 3.0;  // initial parameter estimates
    int N = 50;                           // number of data points
    double w_sigma = 1.0;                 // noise standard deviation (sigma)
    double inv_sigma = 1.0 / w_sigma;     // information value (1/sigma)
    cv::RNG rng;                          // OpenCV random number generator
    vector<double> x_data, y_data;        // generated noisy data
    for (int i = 0; i < N; i++)
    {
        double x = i / 100.0;
        x_data.push_back(x);
        // cv::RNG::gaussian takes the standard deviation, not the variance
        y_data.push_back(ar * x + exp(br * x + cr) + rng.gaussian(w_sigma));
    }
    chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
    GN(N, x_data, y_data, ae, be, ce, inv_sigma);
    chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
    chrono::duration<double> time_used = chrono::duration_cast<chrono::duration<double>>(t2 - t1);
    cout << "solve time cost = " << time_used.count() << " seconds. " << endl;
    cout << "estimated abc = " << ae << ", " << be << ", " << ce << endl;
    return 0;
}

/// Gauss-Newton method. N: number of data points, x: x values, y: y values,
/// ae, be, ce: estimates of a, b, c (updated in place), inv_sigma: information value (1/sigma)
void GN(const int N, const vector<double> &x, const vector<double> &y, double &ae, double &be, double &ce, double const inv_sigma)
{
    int iterations = 50;           // maximum number of iterations
    double cost = 0, lastCost = 0; // cost of the current and of the previous iteration
    double xi, yi, error, e;
    for (int iter = 0; iter < iterations; iter++)
    {
        Matrix3d H = Matrix3d::Zero();
        Vector3d b = Vector3d::Zero();
        cost = 0;
        for (int i = 0; i < N; i++)
        {
            xi = x[i], yi = y[i];            // the i-th data point
            e = ae * xi + exp(be * xi + ce); // model prediction
            error = yi - e;                  // residual
            Vector3d J;                      // Jacobian (column vector)
            J[0] = -xi;                      // de/da
            J[1] = -xi * exp(be * xi + ce);  // de/db
            J[2] = -exp(be * xi + ce);       // de/dc
            // H = J * W^{-1} * J^T with W^{-1} = 1/sigma^2; the weight can be dropped in this example
            H += inv_sigma * inv_sigma * J * J.transpose();
            // b = -W^{-1} * e * J; the weight can be dropped in this example
            b += -inv_sigma * inv_sigma * error * J;
            cost += error * error;
            cout << "The " << iter + 1 << " iteration, The " << i << " factor " << endl;
            cout << "The Value, x: " << xi << ",y:" << yi << ",e:" << e << ",error:" << error << endl
                 << endl;
        }
        Vector3d dx = H.ldlt().solve(b); // solve the linear system H * dx = b
        if (isnan(dx[0]))
        {
            cout << "result is nan!" << endl;
            break;
        }
        if (iter > 0 && cost >= lastCost)
        {
            // the cost stopped decreasing: keep the previous estimate and stop
            cout << "cost: " << cost << ">= last cost: " << lastCost << ", break." << endl;
            break;
        }
        ae += dx[0];
        be += dx[1];
        ce += dx[2];
        lastCost = cost;
        cout << "total cost: " << cost << ", \t\tupdate: " << dx.transpose() << "\t\testimated params: " << ae << "," << be << "," << ce << endl;
    }
}
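The code solves the linear system with the ldlt() decomposition, which requires the coefficient matrix to be symmetric (Eigen's LDLT additionally expects it to be positive or negative semidefinite, which a matrix of the form $JJ^T$ always satisfies). Since $H = J(x)J^T(x)$, the symmetry is easy to prove:
$$\left( JJ^T \right)^T = \left( J^T \right)^T J^T = JJ^T$$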
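To make the role of the symmetry concrete, here is a minimal hand-written sketch of an $LDL^T$ solve for a 3×3 system. It is an illustration only and not Eigen's implementation: Eigen's ldlt() uses pivoting, whereas this simplified version does not, and the names `solveLDLT`, `Vec3`, `Mat3` and the tolerance `eps` are assumptions made for this example. Only the lower triangle of `H` is read, which is valid precisely because `H` is symmetric.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Solves H * dx = b for a symmetric 3x3 matrix H via H = L * D * L^T,
// with L unit lower triangular and D diagonal. No pivoting is performed.
// Returns false when a diagonal entry of D is numerically zero.
bool solveLDLT(const Mat3 &H, const Vec3 &b, Vec3 &dx, double eps = 1e-12)
{
    Mat3 L{};
    Vec3 D{};
    for (int j = 0; j < 3; ++j)
    {
        double d = H[j][j];
        for (int k = 0; k < j; ++k)
            d -= L[j][k] * L[j][k] * D[k];
        if (std::fabs(d) < eps)
            return false; // singular (or needs pivoting)
        D[j] = d;
        L[j][j] = 1.0;
        for (int i = j + 1; i < 3; ++i)
        {
            double v = H[i][j];
            for (int k = 0; k < j; ++k)
                v -= L[i][k] * L[j][k] * D[k];
            L[i][j] = v / d;
        }
    }
    // Forward substitution: L * z = b
    Vec3 z{};
    for (int i = 0; i < 3; ++i)
    {
        z[i] = b[i];
        for (int k = 0; k < i; ++k)
            z[i] -= L[i][k] * z[k];
    }
    // Diagonal scaling and backward substitution: L^T * dx = D^{-1} * z
    for (int i = 2; i >= 0; --i)
    {
        dx[i] = z[i] / D[i];
        for (int k = i + 1; k < 3; ++k)
            dx[i] -= L[k][i] * dx[k];
    }
    return true;
}
```

A rank-deficient matrix, such as the $J_i J_i^T$ of a single data point, makes this routine return `false`, which mirrors the singular-H situation in which the Gauss-Newton increment is not well defined.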
The CMakeLists.txt is as follows:
cmake_minimum_required(VERSION 2.8)
project(gaussNewton)
set(CMAKE_BUILD_TYPE Release)
set(CMAKE_CXX_FLAGS "-std=c++14 -O3")
list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)
# OpenCV
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
# Eigen
include_directories("/usr/include/eigen3")
add_executable(GN gaussNewton_GN.cpp)
target_link_libraries(GN ${OpenCV_LIBS})
The program compiles under Ubuntu 18.04 with OpenCV 3.4.11. The Gauss-Newton method runs for 11 iterations and yields ae=18.5264, be=2.09878, ce=0.927444, taking 0.0130971 s.
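Our experiments also show that the choice of the initial values (ae, be, ce) matters a great deal: with different initial values the iteration may fail to converge, or even diverge. The reason is that the H matrix computed by the Gauss-Newton method may be singular or ill-conditioned, which makes the increment unstable and prevents convergence. Even when H is neither singular nor ill-conditioned, an overly large step makes the local approximation inaccurate, which can again lead to non-convergence or divergence.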
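One common safeguard against overly large steps is to shrink the increment until the cost actually decreases. The sketch below shows such a backtracking scheme for the model used in this article. It is an assumption-laden illustration rather than part of the original program: `totalCost`, `backtrackingScale`, `Vec3` and `maxHalvings` are hypothetical names, and the increment `dx` is taken as given (for example, the solution of $H \Delta x = b$).

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>; // parameters (a, b, c) or an increment

// Sum of squared residuals for f(x) = a*x + exp(b*x + c).
double totalCost(const std::vector<double> &x, const std::vector<double> &y, const Vec3 &p)
{
    double cost = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const double e = y[i] - (p[0] * x[i] + std::exp(p[1] * x[i] + p[2]));
        cost += e * e;
    }
    return cost;
}

// Returns the first alpha in {1, 1/2, 1/4, ...} for which p + alpha*dx
// strictly lowers the cost, or 0 if none is found within maxHalvings halvings.
double backtrackingScale(const std::vector<double> &x, const std::vector<double> &y,
                         const Vec3 &p, const Vec3 &dx, int maxHalvings = 20)
{
    const double cost0 = totalCost(x, y, p);
    double alpha = 1.0;
    for (int k = 0; k <= maxHalvings; ++k)
    {
        const Vec3 trial = {p[0] + alpha * dx[0], p[1] + alpha * dx[1], p[2] + alpha * dx[2]};
        // A NaN or infinite trial cost fails this comparison and is rejected.
        if (totalCost(x, y, trial) < cost0)
            return alpha;
        alpha *= 0.5;
    }
    return 0.0;
}
```

The caller would then update the parameters with `alpha * dx` instead of `dx` and stop when `alpha` is 0. This addresses only the step-length problem; a singular or ill-conditioned H is what the Levenberg-Marquardt method mentioned at the start of the article is designed to handle.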
References
1. Gao Xiang, 《视觉SLAM十四讲》 (14 Lectures on Visual SLAM)