Gauss-Newton Method
Consider the problem

\min _{\mathbf{x} \in \mathbb{R}^{n}}\left\{g(\mathbf{x}) \equiv \sum_{i=1}^{m}\left(f_{i}(\mathbf{x})-c_{i}\right)^{2}\right\}
Assume that $f_1,\cdots,f_m$ are continuously differentiable on $\mathbb{R}^n$ and that $c_1,\cdots,c_m\in\mathbb{R}$.
It is often convenient to collect the residuals into a vector-valued function:

F\left(\mathbf{x}\right)=\begin{pmatrix} f_1\left(\mathbf{x}\right)-c_1\\ f_2\left(\mathbf{x}\right)-c_2\\ \vdots\\ f_m\left(\mathbf{x}\right)-c_m \end{pmatrix}
so that the problem becomes

\min \|F\left(\mathbf{x}\right)\|^2
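For concreteness, here is a minimal sketch of how $g$, its gradient, $F$, and $J$ might be set up in MATLAB for a small curve-fitting instance; the model $f_i(\mathbf{x})=x_1 e^{x_2 t_i}$ with measurements $c_i=y_i$ and the data below are illustrative assumptions, not part of the original text.

% illustrative instance (assumed): fit y = x1*exp(x2*t) to data,
% so f_i(x) = x1*exp(x2*t_i) and c_i = y_i
t = [0; 1; 2; 3; 4];                          % sample points (made up)
y = [2.0; 3.2; 5.4; 8.9; 14.7];               % measurements (made up)
F = @(x) x(1)*exp(x(2)*t) - y;                % residual vector F(x)
J = @(x) [exp(x(2)*t), x(1)*t.*exp(x(2)*t)];  % m-by-2 Jacobian of F
g    = @(x) norm(F(x))^2;                     % objective g(x) = ||F(x)||^2
grad = @(x) 2*J(x)'*F(x);                     % gradient of g (identity derived below)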
At each iteration, every $f_i$ is replaced by its first-order approximation around the current point $\mathbf{x}_k$, and the resulting linear least-squares problem is solved:

\mathbf{x}_{k+1}=\arg\min_{\mathbf{x}\in\mathbb{R}^n}\left\{\sum_{i=1}^{m}\left[f_i\left(\mathbf{x}_{k}\right)+\nabla f_i\left(\mathbf{x}_k\right)^T\left(\mathbf{x}-\mathbf{x}_{k}\right)-c_i\right]^2\right\}
This subproblem can be rewritten as

\min_{\mathbf{x}\in\mathbb{R}^n}\|\mathbf{A}_{k}\mathbf{x}-\mathbf{b}_{k}\|^2
where

\mathbf{A}_{k}=\left(\begin{array}{c} \nabla f_{1}\left(\mathbf{x}_{k}\right)^{T} \\ \nabla f_{2}\left(\mathbf{x}_{k}\right)^{T} \\ \vdots \\ \nabla f_{m}\left(\mathbf{x}_{k}\right)^{T} \end{array}\right)=J\left(\mathbf{x}_{k}\right)

is the Jacobian matrix of $F$ at $\mathbf{x}_k$, and
\mathbf{b}_{k}=\left(\begin{array}{c} \nabla f_{1}\left(\mathbf{x}_{k}\right)^{T} \mathbf{x}_{k}-f_{1}\left(\mathbf{x}_{k}\right)+c_{1} \\ \nabla f_{2}\left(\mathbf{x}_{k}\right)^{T} \mathbf{x}_{k}-f_{2}\left(\mathbf{x}_{k}\right)+c_{2} \\ \vdots \\ \nabla f_{m}\left(\mathbf{x}_{k}\right)^{T} \mathbf{x}_{k}-f_{m}\left(\mathbf{x}_{k}\right)+c_{m} \end{array}\right)=J\left(\mathbf{x}_{k}\right) \mathbf{x}_{k}-F\left(\mathbf{x}_{k}\right)
Solving this linear least-squares problem via the normal equations (assuming $J(\mathbf{x}_k)$ has full column rank, so that $J(\mathbf{x}_k)^T J(\mathbf{x}_k)$ is invertible) gives

\begin{aligned} \mathbf{x}_{k+1}&=\left(J\left(\mathbf{x}_{k}\right)^TJ\left(\mathbf{x}_{k}\right)\right)^{-1}J\left(\mathbf{x}_{k}\right)^T\mathbf{b}_k\\ &=\left(J\left(\mathbf{x}_{k}\right)^TJ\left(\mathbf{x}_{k}\right)\right)^{-1}J\left(\mathbf{x}_{k}\right)^T\left(J\left(\mathbf{x}_{k}\right) \mathbf{x}_{k}-F\left(\mathbf{x}_{k}\right)\right)\\ &=\mathbf{x}_{k}-\left(J\left(\mathbf{x}_{k}\right)^{T} J\left(\mathbf{x}_{k}\right)\right)^{-1} J\left(\mathbf{x}_{k}\right)^{T} F\left(\mathbf{x}_{k}\right) \end{aligned}
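In code, the matrix inverse is never formed explicitly; the direction is obtained by solving the normal equations with MATLAB's backslash. A minimal sketch of one pure Gauss-Newton step, assuming x holds the current iterate and reusing the handles F and J from the sketch above:

Jk = J(x);                  % Jacobian at the current iterate
Fk = F(x);                  % residual vector at the current iterate
d  = (Jk'*Jk) \ (Jk'*Fk);   % solve (J'J)d = J'F rather than inverting
x  = x - d;                 % pure Gauss-Newton update (stepsize 1)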
The iteration can thus be written as $\mathbf{x}_{k+1}=\mathbf{x}_k-\mathbf{d}_k$ with the direction

\mathbf{d}_{k}=\left(J\left(\mathbf{x}_{k}\right)^{T} J\left(\mathbf{x}_{k}\right)\right)^{-1} J\left(\mathbf{x}_{k}\right)^{T} F\left(\mathbf{x}_{k}\right)
Note that

\nabla g\left(\mathbf{x}\right)=2J\left(\mathbf{x}\right)^T F\left(\mathbf{x}\right)
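This identity is just the chain rule applied to $g(\mathbf{x})=\sum_{i=1}^m\left(f_i(\mathbf{x})-c_i\right)^2$:

\nabla g\left(\mathbf{x}\right)=\sum_{i=1}^{m}2\left(f_i\left(\mathbf{x}\right)-c_i\right)\nabla f_i\left(\mathbf{x}\right)=2J\left(\mathbf{x}\right)^T F\left(\mathbf{x}\right)

since the columns of $J(\mathbf{x})^T$ are exactly the gradients $\nabla f_i(\mathbf{x})$ and the entries of $F(\mathbf{x})$ are the residuals $f_i(\mathbf{x})-c_i$.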
so that

\mathbf{d}_{k}=\frac{1}{2}\left(J\left(\mathbf{x}_{k}\right)^{T} J\left(\mathbf{x}_{k}\right)\right)^{-1} \nabla g\left(\mathbf{x}_{k}\right)

That is, $\mathbf{d}_k$ is the gradient premultiplied by a positive definite matrix (when $J(\mathbf{x}_k)$ has full column rank), so $-\mathbf{d}_k$ is indeed a descent direction of $g$.
The iteration above uses no stepsize (equivalently, a fixed stepsize $t_k\equiv 1$); introducing a stepsize yields the damped Gauss-Newton method.
The full algorithm:

Input: $\epsilon>0$ (tolerance parameter).

Initialization: pick an arbitrary $\mathbf{x}_0\in\mathbb{R}^n$.

General step: for $k=0,1,2,\ldots$
(a) Compute the direction

\mathbf{d}_{k}=\left(J\left(\mathbf{x}_{k}\right)^{T} J\left(\mathbf{x}_{k}\right)\right)^{-1} J\left(\mathbf{x}_{k}\right)^{T} F\left(\mathbf{x}_{k}\right)
(b) Choose a stepsize $t_k$ by a line search on the function

h\left(t\right)=g\left(\mathbf{x}_k-t\mathbf{d}_k\right)
(c) Set

\mathbf{x}_{k+1}=\mathbf{x}_k-t_k\mathbf{d}_k
(d) If $\|\nabla g\left(\mathbf{x}_{k+1}\right)\|\le \epsilon$, stop and output $\mathbf{x}_{k+1}$.
Code: here the stepsize is chosen by backtracking.
function [x,fun_val]=damped_Gauss_Newton(g,grad,J,F,x0,s,alpha,...
    beta,epsilon)
% Damped Gauss-Newton method with backtracking stepsize rule
%
% INPUT
%=======================================
% g ......... objective function
% grad ...... gradient of the objective function
% J ......... Jacobian matrix
% F ......... vector-valued function
% x0......... initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
%             of min g(x)
% fun_val ... optimal function value
x=x0;
J_val=J(x);
F_val=F(x);
% Gauss-Newton direction: solve the normal equations with backslash
d=(J_val'*J_val)\(J_val'*F_val);
fun_val=g(x);
gval=grad(x);
iter=0;
while (norm(gval)>epsilon&&(iter<10000))
    iter=iter+1;
    % backtracking: shrink t until a sufficient decrease is obtained
    t=s;
    while (fun_val-g(x-t*d)<alpha*t*norm(d)^2)
        t=beta*t;
    end
    x=x-t*d;
    J_val=J(x);
    F_val=F(x);
    d=(J_val'*J_val)\(J_val'*F_val);
    fun_val=g(x);
    gval=grad(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(gval),fun_val);
end
if (iter==10000)
    fprintf('did not converge\n')
end
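A usage sketch, reusing the illustrative handles g, grad, J, F defined earlier in this section; the initial point and the backtracking parameters (s, alpha, beta, epsilon) below are typical choices, not values prescribed by the text:

x0 = [1; 0.1];      % arbitrary starting point
[x, fun_val] = damped_Gauss_Newton(g, grad, J, F, x0, 1, 0.5, 0.5, 1e-5);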