Gauss-Newton Method
Consider the problem

\min _{\mathbf{x} \in \mathbb{R}^{n}}\left\{g(\mathbf{x}) \equiv \sum_{i=1}^{m}\left(f_{i}(\mathbf{x})-c_{i}\right)^{2}\right\}
Assume that $f_1,\cdots,f_m$ are continuously differentiable on $\mathbb{R}^n$ and that $c_1,\cdots,c_m\in\mathbb{R}$.
It is often convenient to collect the residuals into a vector-valued function:

F\left(\mathbf{x}\right)=\begin{pmatrix} f_1\left(\mathbf{x}\right)-c_1\\ f_2\left(\mathbf{x}\right)-c_2\\ \vdots\\ f_m\left(\mathbf{x}\right)-c_m \end{pmatrix}
so that the problem becomes

\min \|F\left(\mathbf{x}\right)\|^2
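For concreteness, here is a minimal sketch of how $g$, its gradient, $F$, and $J$ might be set up in MATLAB for a small curve-fitting instance; the model $f_i(\mathbf{x})=x_1 e^{x_2 t_i}$ with measurements $c_i=y_i$ and the data below are illustrative assumptions, not part of the original text.

% illustrative instance (assumed): fit y = x1*exp(x2*t) to data,
% so f_i(x) = x1*exp(x2*t_i) and c_i = y_i
t = [0; 1; 2; 3; 4];                          % sample points (made up)
y = [2.0; 3.2; 5.4; 8.9; 14.7];               % measurements (made up)
F = @(x) x(1)*exp(x(2)*t) - y;                % residual vector F(x)
J = @(x) [exp(x(2)*t), x(1)*t.*exp(x(2)*t)];  % m-by-2 Jacobian of F
g    = @(x) norm(F(x))^2;                     % objective g(x) = ||F(x)||^2
grad = @(x) 2*J(x)'*F(x);                     % gradient of g (identity derived below)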
At each iteration, every $f_i$ is replaced by its first-order approximation around the current point $\mathbf{x}_k$, and the resulting linear least-squares problem is solved:

\mathbf{x}_{k+1}=\arg\min_{\mathbf{x}\in\mathbb{R}^n}\left\{\sum_{i=1}^{m}\left[f_i\left(\mathbf{x}_{k}\right)+\nabla f_i\left(\mathbf{x}_k\right)^T\left(\mathbf{x}-\mathbf{x}_{k}\right)-c_i\right]^2\right\}
This subproblem can be rewritten as

\min_{\mathbf{x}\in\mathbb{R}^n}\|\mathbf{A}_{k}\mathbf{x}-\mathbf{b}_{k}\|^2
where

\mathbf{A}_{k}=\left(\begin{array}{c} \nabla f_{1}\left(\mathbf{x}_{k}\right)^{T} \\ \nabla f_{2}\left(\mathbf{x}_{k}\right)^{T} \\ \vdots \\ \nabla f_{m}\left(\mathbf{x}_{k}\right)^{T} \end{array}\right)=J\left(\mathbf{x}_{k}\right)

is the Jacobian matrix of $F$ at $\mathbf{x}_k$, and
\mathbf{b}_{k}=\left(\begin{array}{c} \nabla f_{1}\left(\mathbf{x}_{k}\right)^{T} \mathbf{x}_{k}-f_{1}\left(\mathbf{x}_{k}\right)+c_{1} \\ \nabla f_{2}\left(\mathbf{x}_{k}\right)^{T} \mathbf{x}_{k}-f_{2}\left(\mathbf{x}_{k}\right)+c_{2} \\ \vdots \\ \nabla f_{m}\left(\mathbf{x}_{k}\right)^{T} \mathbf{x}_{k}-f_{m}\left(\mathbf{x}_{k}\right)+c_{m} \end{array}\right)=J\left(\mathbf{x}_{k}\right) \mathbf{x}_{k}-F\left(\mathbf{x}_{k}\right)
Solving this linear least-squares problem via the normal equations (assuming $J(\mathbf{x}_k)$ has full column rank, so that $J(\mathbf{x}_k)^T J(\mathbf{x}_k)$ is invertible) gives

\begin{aligned} \mathbf{x}_{k+1}&=\left(J\left(\mathbf{x}_{k}\right)^TJ\left(\mathbf{x}_{k}\right)\right)^{-1}J\left(\mathbf{x}_{k}\right)^T\mathbf{b}_k\\ &=\left(J\left(\mathbf{x}_{k}\right)^TJ\left(\mathbf{x}_{k}\right)\right)^{-1}J\left(\mathbf{x}_{k}\right)^T\left(J\left(\mathbf{x}_{k}\right) \mathbf{x}_{k}-F\left(\mathbf{x}_{k}\right)\right)\\ &=\mathbf{x}_{k}-\left(J\left(\mathbf{x}_{k}\right)^{T} J\left(\mathbf{x}_{k}\right)\right)^{-1} J\left(\mathbf{x}_{k}\right)^{T} F\left(\mathbf{x}_{k}\right) \end{aligned}
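In code, the matrix inverse is never formed explicitly; the direction is obtained by solving the normal equations with MATLAB's backslash. A minimal sketch of one pure Gauss-Newton step, assuming x holds the current iterate and reusing the handles F and J from the sketch above:

Jk = J(x);                  % Jacobian at the current iterate
Fk = F(x);                  % residual vector at the current iterate
d  = (Jk'*Jk) \ (Jk'*Fk);   % solve (J'J)d = J'F rather than inverting
x  = x - d;                 % pure Gauss-Newton update (stepsize 1)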
The iteration can thus be written as $\mathbf{x}_{k+1}=\mathbf{x}_k-\mathbf{d}_k$ with the direction

\mathbf{d}_{k}=\left(J\left(\mathbf{x}_{k}\right)^{T} J\left(\mathbf{x}_{k}\right)\right)^{-1} J\left(\mathbf{x}_{k}\right)^{T} F\left(\mathbf{x}_{k}\right)
Note that

\nabla g\left(\mathbf{x}\right)=2J\left(\mathbf{x}\right)^T F\left(\mathbf{x}\right)
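This identity is just the chain rule applied to $g(\mathbf{x})=\sum_{i=1}^m\left(f_i(\mathbf{x})-c_i\right)^2$:

\nabla g\left(\mathbf{x}\right)=\sum_{i=1}^{m}2\left(f_i\left(\mathbf{x}\right)-c_i\right)\nabla f_i\left(\mathbf{x}\right)=2J\left(\mathbf{x}\right)^T F\left(\mathbf{x}\right)

since the columns of $J(\mathbf{x})^T$ are exactly the gradients $\nabla f_i(\mathbf{x})$ and the entries of $F(\mathbf{x})$ are the residuals $f_i(\mathbf{x})-c_i$.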
so that

\mathbf{d}_{k}=\frac{1}{2}\left(J\left(\mathbf{x}_{k}\right)^{T} J\left(\mathbf{x}_{k}\right)\right)^{-1} \nabla g\left(\mathbf{x}_{k}\right)

That is, $\mathbf{d}_k$ is the gradient premultiplied by a positive definite matrix (when $J(\mathbf{x}_k)$ has full column rank), so $-\mathbf{d}_k$ is indeed a descent direction of $g$.
The iteration above uses no stepsize (equivalently, a fixed stepsize $t_k\equiv 1$); introducing a stepsize yields the damped Gauss-Newton method.
The full algorithm:

Input: $\epsilon>0$ (tolerance parameter).

Initialization: pick an arbitrary $\mathbf{x}_0\in\mathbb{R}^n$.

General step: for $k=0,1,2,\ldots$
(a) Compute the direction

\mathbf{d}_{k}=\left(J\left(\mathbf{x}_{k}\right)^{T} J\left(\mathbf{x}_{k}\right)\right)^{-1} J\left(\mathbf{x}_{k}\right)^{T} F\left(\mathbf{x}_{k}\right)
(b) Choose a stepsize $t_k$ by a line search on the function

h\left(t\right)=g\left(\mathbf{x}_k-t\mathbf{d}_k\right)
(c) Set

\mathbf{x}_{k+1}=\mathbf{x}_k-t_k\mathbf{d}_k
(d) If $\|\nabla g\left(\mathbf{x}_{k+1}\right)\|\le \epsilon$, stop and output $\mathbf{x}_{k+1}$.
Code: here the stepsize is chosen by backtracking.
function [x,fun_val]=damped_Gauss_Newton(g,grad,J,F,x0,s,alpha,...
    beta,epsilon)
% Damped Gauss-Newton method with backtracking stepsize rule
%
% INPUT
%=======================================
% g ......... objective function
% grad ...... gradient of the objective function
% J ......... Jacobian matrix
% F ......... vector-valued function
% x0......... initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
%             of min g(x)
% fun_val ... optimal function value
x=x0;
J_val=J(x);
F_val=F(x);
% Gauss-Newton direction: solve the normal equations with backslash
d=(J_val'*J_val)\(J_val'*F_val);
fun_val=g(x);
gval=grad(x);
iter=0;
while (norm(gval)>epsilon&&(iter<10000))
    iter=iter+1;
    % backtracking: shrink t until a sufficient decrease is obtained
    t=s;
    while (fun_val-g(x-t*d)<alpha*t*norm(d)^2)
        t=beta*t;
    end
    x=x-t*d;
    J_val=J(x);
    F_val=F(x);
    d=(J_val'*J_val)\(J_val'*F_val);
    fun_val=g(x);
    gval=grad(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(gval),fun_val);
end
if (iter==10000)
    fprintf('did not converge\n')
end
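A usage sketch, reusing the illustrative handles g, grad, J, F defined earlier in this section; the initial point and the backtracking parameters (s, alpha, beta, epsilon) below are typical choices, not values prescribed by the text:

x0 = [1; 0.1];      % arbitrary starting point
[x, fun_val] = damped_Gauss_Newton(g, grad, J, F, x0, 1, 0.5, 0.5, 1e-5);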