Gauss–Newton Method and the Levenberg–Marquardt (LM) Method

Gauss–Newton algorithm

Application: the Gauss–Newton algorithm is used to solve non-linear least squares problems.
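For reference (this definition is implied but not written out above): given m data points $(x_i, y_i)$ and a model $f(x, \boldsymbol{\beta})$, the problem is to find the parameter vector β minimizing the sum of squared residuals

$$S(\boldsymbol{\beta}) = \sum_{i=1}^{m} \left( y_i - f(x_i, \boldsymbol{\beta}) \right)^2.$$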


The Gauss–Newton method is derived from Newton's method; the mathematical derivation on Wikipedia is quite clear:

Derivation from Newton's method



Levenberg–Marquardt algorithm



$(x_i, y_i)$ is a set of observed data, where $x_i$ and $y_i$ are real numbers; β is the variable we want to optimize, assumed to be an n×1 vector. The increment δ (a vector) is determined as follows:

The sum of squares $S(\boldsymbol{\beta})$ at its minimum has a zero gradient with respect to β. Using the first-order approximation

$$f(x_i, \boldsymbol{\beta} + \boldsymbol{\delta}) \approx f(x_i, \boldsymbol{\beta}) + J_i \boldsymbol{\delta}, \qquad J_i = \frac{\partial f(x_i, \boldsymbol{\beta})}{\partial \boldsymbol{\beta}},$$

gives

$$S(\boldsymbol{\beta} + \boldsymbol{\delta}) \approx \sum_{i=1}^{m} \left( y_i - f(x_i, \boldsymbol{\beta}) - J_i \boldsymbol{\delta} \right)^2,$$

or in vector notation,

$$\begin{aligned}
S(\boldsymbol{\beta} + \boldsymbol{\delta}) &\approx \| \mathbf{y} - \mathbf{f}(\boldsymbol{\beta}) - \mathbf{J} \boldsymbol{\delta} \|^2 \\
&= [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta}) - \mathbf{J} \boldsymbol{\delta}]^T [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta}) - \mathbf{J} \boldsymbol{\delta}] \\
&= [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})]^T [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})] - [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})]^T \mathbf{J} \boldsymbol{\delta} - (\mathbf{J} \boldsymbol{\delta})^T [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})] + \boldsymbol{\delta}^T \mathbf{J}^T \mathbf{J} \boldsymbol{\delta} \\
&= [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})]^T [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})] - 2 [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})]^T \mathbf{J} \boldsymbol{\delta} + \boldsymbol{\delta}^T \mathbf{J}^T \mathbf{J} \boldsymbol{\delta}.
\end{aligned}$$

Taking the derivative of $S(\boldsymbol{\beta} + \boldsymbol{\delta})$ with respect to δ and setting the result to zero gives

$$(\mathbf{J}^T \mathbf{J}) \boldsymbol{\delta} = \mathbf{J}^T [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})],$$

(this step is a derivative of a scalar with respect to a vector)

where $\mathbf{J}$ is the m×n Jacobian matrix whose i-th row equals $J_i$, and where $\mathbf{f}$ (m×1) and $\mathbf{y}$ (m×1) are vectors with i-th component $f(x_i, \boldsymbol{\beta})$ and $y_i$ respectively. This is a set of linear equations, which can be solved for δ.
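As a minimal sketch of this step (not part of the original notes), one Gauss–Newton increment can be computed by solving the normal equations above; `f_model` and `jac` are hypothetical user-supplied functions returning the model values $f(x_i, \boldsymbol{\beta})$ and the m×n Jacobian $\mathbf{J}$:

```python
import numpy as np

def gauss_newton_step(beta, x, y, f_model, jac):
    """One Gauss-Newton increment: solve (J^T J) delta = J^T (y - f(beta))."""
    r = y - f_model(x, beta)              # residuals, shape (m,)
    J = jac(x, beta)                      # Jacobian of f w.r.t. beta, shape (m, n)
    delta = np.linalg.solve(J.T @ J, J.T @ r)
    return beta + delta                   # updated parameter vector
```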



From this, the increment is obtained by the following process:

The objective function is $S(\boldsymbol{\beta})$. Apply an increment δ to the variable β, differentiate the approximate expression for the new objective $S(\boldsymbol{\beta} + \boldsymbol{\delta})$ with respect to the increment δ, and set the derivative to zero.

The reason for computing the increment δ this way can, I think, be understood as follows:

Suppose $\hat{\boldsymbol{\beta}}$ is the value that minimizes $S(\boldsymbol{\beta})$. How do we find this optimum $\hat{\boldsymbol{\beta}}$? In other words, starting from an initial value β₀, how do we determine an increment δ that minimizes $S(\boldsymbol{\beta})$?

Reasoning backwards: suppose we have already found this increment δ; then necessarily S′(β₀ + δ) = 0.

The problem now is how to find δ, i.e. δ is the unknown. Since δ satisfies S′(β₀ + δ) = 0, the only way to solve for it is to differentiate S(β₀ + δ) with respect to δ and set the derivative to zero.
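Spelling out that differentiation (just the gradient of the quadratic approximation above, written here for completeness):

$$\frac{\partial S(\boldsymbol{\beta} + \boldsymbol{\delta})}{\partial \boldsymbol{\delta}} \approx -2 \mathbf{J}^T [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})] + 2 \mathbf{J}^T \mathbf{J} \boldsymbol{\delta} = 0 \quad\Longrightarrow\quad (\mathbf{J}^T \mathbf{J}) \boldsymbol{\delta} = \mathbf{J}^T [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})].$$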




Levenberg's contribution is to replace this equation by a "damped version",

$$(\mathbf{J}^T \mathbf{J} + \lambda \mathbf{I}) \boldsymbol{\delta} = \mathbf{J}^T [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})],$$

where I is the identity matrix; solving this gives the increment δ to the estimated parameter vector β.
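A minimal sketch of this damped step, reusing the hypothetical `f_model` and `jac` from the Gauss–Newton sketch above:

```python
def levenberg_step(beta, x, y, f_model, jac, lam):
    """Damped increment: solve (J^T J + lam*I) delta = J^T (y - f(beta))."""
    r = y - f_model(x, beta)
    J = jac(x, beta)
    A = J.T @ J + lam * np.eye(len(beta))   # damped normal-equation matrix
    return np.linalg.solve(A, J.T @ r)
```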



(My guess: one probably needs to check whether the reduction $\| S(\boldsymbol{\beta} + \boldsymbol{\delta}) - S(\boldsymbol{\beta}) \|$ exceeds some threshold; if it does, decrease λ, otherwise increase λ.)

The (non-negative) damping factor λ is adjusted at each iteration. If reduction of S is rapid, a smaller value can be used, bringing the algorithm closer to the Gauss–Newton algorithm, whereas if an iteration gives insufficient reduction in the residual, λ can be increased, giving a step closer to the gradient-descent direction. 

If either the length of the calculated step δ or the reduction of the sum of squares from the latest parameter vector β + δ falls below predefined limits, iteration stops, and the last parameter vector β is considered to be the solution.
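Putting the λ-update rule and the stopping tests together, a simplified iteration loop might look like the sketch below; the factor of 10 and the tolerances are illustrative choices, not prescribed by the algorithm, and `levenberg_step` is the sketch given earlier:

```python
def levenberg_marquardt(beta0, x, y, f_model, jac,
                        lam=1e-3, max_iter=100,
                        step_tol=1e-8, reduction_tol=1e-10):
    """Simplified damped least-squares loop with an adaptive damping factor lam."""
    beta = np.asarray(beta0, dtype=float)
    S = np.sum((y - f_model(x, beta)) ** 2)
    for _ in range(max_iter):
        delta = levenberg_step(beta, x, y, f_model, jac, lam)
        S_new = np.sum((y - f_model(x, beta + delta)) ** 2)
        if S_new < S:
            # Sufficient reduction: accept the step and move closer to Gauss-Newton
            reduction = S - S_new
            beta, S = beta + delta, S_new
            lam /= 10.0
            # Stop if the step length or the reduction in S falls below the limits
            if np.linalg.norm(delta) < step_tol or reduction < reduction_tol:
                break
        else:
            # Insufficient reduction: reject the step and move toward gradient descent
            lam *= 10.0
    return beta
```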



Levenberg's algorithm has the disadvantage that if the value of the damping factor λ is large, inverting $\mathbf{J}^T \mathbf{J} + \lambda \mathbf{I}$ is not used at all. Marquardt provided the insight that we can scale each component of the gradient according to the curvature, so that there is larger movement along the directions where the gradient is smaller. This avoids slow convergence in the direction of small gradient. Therefore, Marquardt replaced the identity matrix I with the diagonal matrix consisting of the diagonal elements of $\mathbf{J}^T \mathbf{J}$, resulting in the Levenberg–Marquardt algorithm:

$$[\mathbf{J}^T \mathbf{J} + \lambda \operatorname{diag}(\mathbf{J}^T \mathbf{J})] \boldsymbol{\delta} = \mathbf{J}^T [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})].$$
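In code, Marquardt's modification only changes the damping term in the sketches above (again a hedged sketch, not a reference implementation):

```python
def marquardt_step(beta, x, y, f_model, jac, lam):
    """Increment with Marquardt's scaling: damp by diag(J^T J) instead of I."""
    r = y - f_model(x, beta)
    J = jac(x, beta)
    JTJ = J.T @ J
    A = JTJ + lam * np.diag(np.diag(JTJ))   # scale the damping by the curvature
    return np.linalg.solve(A, J.T @ r)
```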

Some implementation libraries for the L-M method (see Wikipedia for the full list):


The GNU Scientific Library has a C interface to MINPACK.

C/C++ Minpack includes the Levenberg–Marquardt algorithm

levmar is an implementation in C/C++ with support for constraints

sparseLM is a C implementation aimed at minimizing functions with large, arbitrarily sparse Jacobians. Includes a MATLAB MEX interface

InMin library contains a C++ implementation of the algorithm based on the eigen C++ linear algebra library. It has a pure C-language API as well as a Python binding

ceres is a non-linear minimisation library with an implementation of the Levenberg–Marquardt algorithm. It is written in C++ and uses eigen

ALGLIB has implementations of improved LMA in C# / C++ / Delphi / Visual Basic. Improved algorithm takes less time to converge and can use either Jacobian or exact Hessian.
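As a usage note (not from the original list): SciPy's `scipy.optimize.least_squares` with `method='lm'` wraps the MINPACK implementation mentioned above. A small example, with a made-up exponential model and data purely for illustration:

```python
import numpy as np
from scipy.optimize import least_squares

# Made-up data roughly following y = 2 * exp(-0.5 * x)
x_data = np.linspace(0.0, 5.0, 20)
y_data = 2.0 * np.exp(-0.5 * x_data) + 0.01 * np.random.randn(20)

def residuals(beta):
    # Residuals y_i - f(x_i, beta) for the model f(x, beta) = beta[0] * exp(beta[1] * x)
    return y_data - beta[0] * np.exp(beta[1] * x_data)

result = least_squares(residuals, x0=[1.0, -1.0], method='lm')
print(result.x)   # fitted parameters, roughly [2.0, -0.5]
```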



