Least-squares problems
$$
\min f(x)=\dfrac{1}{2}\sum_{i=1}^{m}r_i^{2}(x)=\dfrac{1}{2}r(x)^{T}r(x),\quad x\in\mathbb{R}^{n},\ m\geqslant n\tag{1}
$$
Here

$$
r(x)=\left(r_1(x),r_2(x),\cdots,r_m(x)\right)^{T}
$$

is called the residual function, and its value at a point $x$ is called the residual. If every $r_i(x)$ is linear, problem (1) is a linear least-squares problem. If at least one $r_i(x)$ is nonlinear, problem (1) is a nonlinear least-squares problem.
Derivatives of $f(x)$
Let $J(x)$ be the Jacobian matrix of $r(x)$:

$$
J(x)=\dfrac{\partial r}{\partial x}=\left[\nabla r_1(x),\ldots,\nabla r_m(x)\right]^{T}\in\mathbb{R}^{m\times n}\tag{2}
$$
Then the gradient of $f(x)$ is

$$
g(x)=\nabla f(x)=\sum_{i=1}^{m}r_i(x)\nabla r_i(x)=J(x)^{T}r(x)\tag{3}
$$
and the Hessian matrix of $f(x)$ is

$$
\begin{aligned} G(x)&=\nabla^{2}f(x)=\sum_{i=1}^{m}\nabla r_i(x)\nabla r_i(x)^{T}+\sum_{i=1}^{m}r_i(x)\nabla^{2}r_i(x)\\ &=J(x)^{T}J(x)+S(x)\end{aligned}\tag{4}
$$
where

$$
S(x)=\sum_{i=1}^{m}r_i(x)\nabla^{2}r_i(x)\tag{5}
$$
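Formulas (3) to (5) can be checked numerically. The sketch below is for illustration only. The exponential-fit residual $r_i(x)=x_0e^{x_1t_i}-y_i$, the data `t` and `y`, and all helper names are hypothetical choices, not part of the text. The sketch compares $J^Tr$ and $J^TJ+S$ with central finite differences.

```python
import numpy as np

# Hypothetical test problem (not from the text): r_i(x) = x0 * exp(x1 * t_i) - y_i
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([2.0, 1.6, 1.1, 0.9, 0.7])

def residual(x):
    return x[0] * np.exp(x[1] * t) - y

def jacobian(x):
    # Row i is grad r_i(x)^T, as in (2)
    e = np.exp(x[1] * t)
    return np.column_stack([e, x[0] * t * e])

def f(x):
    r = residual(x)
    return 0.5 * r @ r

def S_matrix(x):
    # S(x) = sum_i r_i(x) * Hess r_i(x), formula (5)
    e = np.exp(x[1] * t)
    S = np.zeros((2, 2))
    for ri, ti, ei in zip(residual(x), t, e):
        hess_ri = np.array([[0.0, ti * ei],
                            [ti * ei, x[0] * ti**2 * ei]])
        S += ri * hess_ri
    return S

def gradient(x):
    return jacobian(x).T @ residual(x)        # formula (3)

def hessian(x):
    J = jacobian(x)
    return J.T @ J + S_matrix(x)               # formula (4)

def fd_gradient(fun, x, h=1e-6):
    # Central differences; used only to check the analytic formulas
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        grad[j] = (fun(x + step) - fun(x - step)) / (2 * h)
    return grad

x = np.array([1.8, -0.4])
G_fd = np.column_stack([fd_gradient(lambda z: gradient(z)[j], x) for j in range(2)])
print(np.max(np.abs(gradient(x) - fd_gradient(f, x))))   # tiny (rounding level)
print(np.max(np.abs(hessian(x) - G_fd)))                  # tiny (rounding level)
```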
For convenience, let $x^{\ast}$ denote a solution of (1) and use the notation

$$
J^{\ast}=J(x^{\ast}),\quad J_k=J(x_k),\qquad S^{\ast}=S(x^{\ast}),\quad S_k=S(x_k)
$$

and, likewise, $r_k=r(x_k)$, $f_k=f(x_k)$, $g_k=g(x_k)$, $g^{\ast}=g(x^{\ast})$ and $G_k=G(x_k)$.
Classification of least-squares problems
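At $x^{\ast}$, the size of $\Vert S^{\ast}\Vert$ depends on the residuals and on how nonlinear the problem is. For zero-residual problems ($r(x^{\ast})=0$) and for linear least-squares problems, $\Vert S^{\ast}\Vert=0$. As the residuals grow, or as the functions $r_i(x)\ (i=1,\cdots,m)$ become more strongly nonlinear, $\Vert S^{\ast}\Vert$ increases. Accordingly, algorithms are divided into small-residual and large-residual algorithms. Small-residual algorithms handle problems where $\Vert S^{\ast}\Vert$ is zero or not too large, and large-residual algorithms handle problems where $\Vert S^{\ast}\Vert$ is large.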
Newton's method for least-squares problems
The second-order Taylor expansion of $f$ about $x_k$ is

$$
f(x)=f(x_k)+\nabla f(x_k)^{T}(x-x_k)+\dfrac{1}{2}(x-x_k)^{T}\nabla^{2}f(x_k)(x-x_k)+o\left(\Vert x-x_k\Vert^{2}\right)
$$
Dropping the remainder gives a local quadratic model of $f$:

$$
q(x)=f(x_k)+\nabla f(x_k)^{T}(x-x_k)+\dfrac{1}{2}(x-x_k)^{T}\nabla^{2}f(x_k)(x-x_k)
$$
The stationary point of the quadratic model is found by setting its gradient to zero. It is the minimizer when $\nabla^{2}f(x_k)$ is positive definite.

$$
\nabla q(x)=\nabla f(x_k)+\nabla^{2}f(x_k)(x-x_k)=0
$$
Let $d=x-x_k$ be the step. Substituting the expressions (3) and (4) for $\nabla f(x)$ and $\nabla^{2}f(x)$ gives the Newton equation

$$
\left(J_k^{T}J_k+S_k\right)d=-J_k^{T}r_k\tag{6}
$$
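For least-squares problems, the drawback of Newton's method is that every iteration requires $S_k$, that is, the $m$ symmetric $n\times n$ Hessians $\nabla^{2}r_i(x_k)$. Computing $S_k$ is a heavy burden for an algorithm. The remedy is either to ignore $S_k$ in the Newton equation or to approximate $S_k$ using first-derivative information. Ignoring $S_k$ is justified when the $r_i(x)$ are close to zero or close to linear. This leads to the small-residual algorithms discussed next.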
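The sketch below contrasts the Newton step (6), which needs all $m$ second-derivative matrices, with the step obtained by dropping $S_k$. It uses a hypothetical exponential-fit residual chosen for illustration. The data are generated exactly by the model, so the residual vanishes at the true parameters and so does $S$. That is the situation in which the text suggests ignoring $S_k$.

```python
import numpy as np

# Hypothetical exponential-fit residual: r_i(x) = x0 * exp(x1 * t_i) - y_i
t = np.linspace(0.0, 2.0, 6)

def residual(x, y):
    return x[0] * np.exp(x[1] * t) - y

def jacobian(x):
    e = np.exp(x[1] * t)
    return np.column_stack([e, x[0] * t * e])

def S_matrix(x, y):
    # Needs the Hessian of every r_i: the expensive part of Newton's method
    e = np.exp(x[1] * t)
    S = np.zeros((2, 2))
    for ri, ti, ei in zip(residual(x, y), t, e):
        S += ri * np.array([[0.0, ti * ei], [ti * ei, x[0] * ti**2 * ei]])
    return S

def newton_step(x, y):
    J, r = jacobian(x), residual(x, y)
    return np.linalg.solve(J.T @ J + S_matrix(x, y), -J.T @ r)   # equation (6)

def step_without_S(x, y):
    J, r = jacobian(x), residual(x, y)
    return np.linalg.solve(J.T @ J, -J.T @ r)                   # (6) with S_k dropped

x_true = np.array([2.0, -0.5])
y_exact = x_true[0] * np.exp(x_true[1] * t)        # zero-residual data
print(np.linalg.norm(S_matrix(x_true, y_exact)))   # 0.0: S vanishes where r = 0

x_k = np.array([1.9, -0.45])
print(newton_step(x_k, y_exact), step_without_S(x_k, y_exact))
```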
Gauss-Newton method
Dropping $S_k$ from the Newton equation (6) gives the Gauss-Newton (GN) method. The method can also be understood as follows. Linearizing the residual function at $x_k$, $r(x_k+d)\approx r_k+J_kd$, yields the linear least-squares problem in $d$

$$
\min_{d\in\mathbb{R}^n}q_k(d)=\dfrac{1}{2}\Vert r_k+J_kd\Vert_2^{2}\tag{7}
$$
where

$$
\begin{aligned} q_k(d)&=\dfrac{1}{2}(J_kd+r_k)^{T}(J_kd+r_k)\\ &=\dfrac{1}{2}d^{T}J_k^{T}J_kd+d^{T}\left(J_k^{T}r_k\right)+\dfrac{1}{2}r_k^{T}r_k\end{aligned}\tag{8}
$$
Here $q_k(d)$ is a quadratic approximation of $f(x_k+d)$. It differs from the second-order Taylor approximation of $f(x_k+d)$ only in that $S_k$ is missing from the quadratic term.
The minimizer $d_k$ of problem (7) satisfies

$$
J_k^{T}J_kd_k=-J_k^{T}r_k\tag{9}
$$
Equation (9) is called the Gauss-Newton equation, and the direction $d_k$ obtained from (9) is called the Gauss-Newton direction.
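Since (9) is the normal equation of (7), $d_k$ can be computed either way. The following minimal NumPy sketch uses a random matrix as a stand-in for $J_k$, an assumption made for illustration. Solving (7) directly with `np.linalg.lstsq` avoids forming $J_k^TJ_k$, which would square the condition number of $J_k$.

```python
import numpy as np

def gauss_newton_direction(J, r):
    """Solve the linear least-squares subproblem (7), min 0.5*||r + J d||^2,
    without forming J^T J (lstsq works on J directly)."""
    d, *_ = np.linalg.lstsq(J, -r, rcond=None)
    return d

rng = np.random.default_rng(0)
J = rng.standard_normal((8, 3))   # hypothetical full-column-rank J_k
r = rng.standard_normal(8)        # hypothetical r_k

d_ls = gauss_newton_direction(J, r)
d_ne = np.linalg.solve(J.T @ J, -J.T @ r)   # Gauss-Newton equation (9)
print(np.allclose(d_ls, d_ne))               # True
```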
The Gauss-Newton algorithm for least-squares problems is as follows.
Algorithm 1 (Gauss-Newton method for least-squares problems)
- Step 1: Given $x_0$ and $\varepsilon>0$, set $k:=0$.
- Step 2: If the termination criterion is satisfied, stop.
- Step 3: Solve $J_k^{T}J_kd=-J_k^{T}r_k$ to obtain $d_k$.
- Step 4: Set $x_{k+1}:=x_k+\alpha_kd_k$, where $\alpha_k$ is obtained by a line search. Set $k:=k+1$ and go to Step 2.
The basic Gauss-Newton method is the Gauss-Newton method with $\alpha_k=1$. The Gauss-Newton method with a line search is called the damped Gauss-Newton method.
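The following is a minimal sketch of Algorithm 1 in NumPy. The text leaves the termination test and the line search open, so both are assumptions here:

- Stop when $\Vert g_k\Vert\leqslant\varepsilon$.
- Choose $\alpha_k$ by Armijo backtracking: halve $\alpha$ until $f(x_k+\alpha d_k)\leqslant f_k+10^{-4}\alpha g_k^Td_k$.

Always taking `alpha = 1.0`, with no backtracking, would give the basic method. The exponential-fit test data are hypothetical.

```python
import numpy as np

def damped_gauss_newton(residual, jacobian, x0, eps=1e-8, max_iter=100):
    x = np.asarray(x0, dtype=float)                  # step 1
    for _ in range(max_iter):
        r, J = residual(x), jacobian(x)
        g = J.T @ r                                  # gradient (3)
        if np.linalg.norm(g) <= eps:                 # step 2 (assumed criterion)
            break
        d, *_ = np.linalg.lstsq(J, -r, rcond=None)   # step 3: solve (7), i.e. (9)
        f = 0.5 * r @ r
        alpha = 1.0
        while alpha > 1e-10:                         # step 4: Armijo backtracking
            r_new = residual(x + alpha * d)
            if 0.5 * r_new @ r_new <= f + 1e-4 * alpha * (g @ d):
                break
            alpha *= 0.5
        x = x + alpha * d
    return x

# Usage on hypothetical zero-residual data generated by x* = (2, -0.5)
t = np.linspace(0.0, 2.0, 6)
y = 2.0 * np.exp(-0.5 * t)
res = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
x_star = damped_gauss_newton(res, jac, [1.0, 0.0])
print(x_star)   # approximately [ 2.  -0.5]
```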
The advantage of the Gauss-Newton method is that it requires no second derivatives of $r(x)$. Moreover, from (3) and (9),

$$
d_k^{T}g_k=d_k^{T}J_k^{T}r_k=-d_k^{T}J_k^{T}J_kd_k=-\Vert J_kd_k\Vert^{2}
$$
Hence, suppose $J_k$ has full column rank and $g_k\neq0$. Then $d_k\neq0$ and $J_kd_k\neq0$, so $d_k^{T}g_k<0$ and $d_k$ is a descent direction.
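The following is a quick numerical check of the identity $d_k^Tg_k=-\Vert J_kd_k\Vert^2$. Random data stand in for $J_k$ and $r_k$, an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.standard_normal((6, 3))    # hypothetical full-column-rank J_k
r = rng.standard_normal(6)         # hypothetical residual r_k
g = J.T @ r                        # g_k = J_k^T r_k, formula (3)
d = np.linalg.solve(J.T @ J, -g)   # Gauss-Newton direction (9)
lhs = d @ g
rhs = -np.linalg.norm(J @ d) ** 2
print(lhs, rhs)                    # equal up to rounding, both negative
```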
Theorem 2 (local convergence of the basic Gauss-Newton method)
Let $r_i(x)\in C^{2}\ (i=1,\cdots,m)$, let $x^{\ast}$ be an optimal solution of the least-squares problem (1), and let $J^{\ast T}J^{\ast}$ be positive definite. Suppose the sequence $\{x_k\}$ generated by the basic Gauss-Newton method converges to $x^{\ast}$. If $G(x)$ and $J(x)^{T}J(x)$ are Lipschitz continuous in a neighborhood of $x^{\ast}$, then

$$
\Vert h_{k+1}\Vert\leqslant\left\Vert\left(J^{\ast T}J^{\ast}\right)^{-1}\right\Vert\Vert S^{\ast}\Vert\Vert h_k\Vert+O\left(\Vert h_k\Vert^{2}\right)
$$

where $h_k=x_k-x^{\ast}$.
Proof
Since $f\in C^{2}$ and $G(x)$ is Lipschitz continuous in a neighborhood of $x^{\ast}$, suppose $x_k$ is sufficiently close to $x^{\ast}$. Then, as in the proof of the convergence theorem for Newton's method,

$$
g(x_k+d)=g_k+G_kd+O\left(\Vert d\Vert^{2}\right)
$$
Setting $d=-h_k$ (so that $x_k+d=x^{\ast}$) and using $g^{\ast}=0$ at the minimizer gives

$$
0=g^{\ast}=g_k-G_kh_k+O\left(\Vert h_k\Vert^{2}\right)
$$
Substituting (3) and (4) into this equation gives

$$
J_k^{T}r_k-\left(J_k^{T}J_k+S_k\right)h_k+O\left(\Vert h_k\Vert^{2}\right)=0\tag{10}
$$
Because $J^{\ast T}J^{\ast}$ is positive definite, $J_k^{T}J_k$ is also positive definite, with $\Vert(J_k^{T}J_k)^{-1}\Vert$ bounded, when $x_k$ is sufficiently close to $x^{\ast}$. Multiply (10) on the left by $(J_k^{T}J_k)^{-1}$ and use (9), that is, $(J_k^{T}J_k)^{-1}J_k^{T}r_k=-d_k$. This gives

$$
-d_k-h_k-\left(J_k^{T}J_k\right)^{-1}S_kh_k+O\left(\Vert h_k\Vert^{2}\right)=0
$$
Since $\alpha_k=1$ in the basic method,

$$
d_k+h_k=x_{k+1}-x_k+x_k-x^{\ast}=h_{k+1}
$$
Therefore

$$
\begin{aligned} h_{k+1}&=-\left(J_k^{T}J_k\right)^{-1}S_kh_k+O\left(\Vert h_k\Vert^{2}\right)\\ \Vert h_{k+1}\Vert&\leqslant\left\Vert\left(J_k^{T}J_k\right)^{-1}S_k\right\Vert\Vert h_k\Vert+O\left(\Vert h_k\Vert^{2}\right)\\ &\leqslant\left\Vert\left(J_k^{T}J_k\right)^{-1}S_k-\left(J^{\ast T}J^{\ast}\right)^{-1}S^{\ast}\right\Vert\Vert h_k\Vert+\left\Vert\left(J^{\ast T}J^{\ast}\right)^{-1}\right\Vert\Vert S^{\ast}\Vert\Vert h_k\Vert+O\left(\Vert h_k\Vert^{2}\right)\end{aligned}\tag{11}
$$
In the following proof that $S(x)$ and $\left(J(x)^{T}J(x)\right)^{-1}$ are Lipschitz continuous in a neighborhood of $x^{\ast}$, we write $A_x=A(x)$ for any matrix function $A(x)$. Since $G_x$ and $J_x^{T}J_x$ are Lipschitz continuous in a neighborhood of $x^{\ast}$, there exist $\beta,\gamma>0$ such that, for any two points $x,y$ in that neighborhood,

$$
\begin{aligned}\Vert G(x)-G(y)\Vert&\leqslant\beta\Vert x-y\Vert\\ \left\Vert J(x)^{T}J(x)-J(y)^{T}J(y)\right\Vert&\leqslant\gamma\Vert x-y\Vert\end{aligned}
$$
Hence, since $S(x)=G(x)-J(x)^{T}J(x)$ by (4),

$$
\begin{aligned}\Vert S(x)-S(y)\Vert&=\left\Vert G(x)-G(y)-J(x)^{T}J(x)+J(y)^{T}J(y)\right\Vert\\ &\leqslant\Vert G(x)-G(y)\Vert+\left\Vert J(x)^{T}J(x)-J(y)^{T}J(y)\right\Vert\\ &\leqslant(\beta+\gamma)\Vert x-y\Vert\end{aligned}
$$
Since $J^{\ast T}J^{\ast}$ is positive definite and $J_x^{T}J_x$ is continuous, there exists $\xi>0$ such that $\Vert(J_x^{T}J_x)^{-1}\Vert\leqslant\xi$ for every $x$ in the neighborhood of $x^{\ast}$, shrinking the neighborhood if necessary. Therefore

$$
\begin{aligned}\left\Vert\left(J_x^{T}J_x\right)^{-1}-\left(J_y^{T}J_y\right)^{-1}\right\Vert&=\left\Vert\left(J_x^{T}J_x\right)^{-1}\left(J_y^{T}J_y-J_x^{T}J_x\right)\left(J_y^{T}J_y\right)^{-1}\right\Vert\\ &\leqslant\left\Vert\left(J_x^{T}J_x\right)^{-1}\right\Vert\left\Vert\left(J_y^{T}J_y\right)^{-1}\right\Vert\left\Vert J_y^{T}J_y-J_x^{T}J_x\right\Vert\\ &\leqslant\gamma\xi^{2}\Vert x-y\Vert\end{aligned}
$$
Thus $S_x$ and $(J_x^{T}J_x)^{-1}$ are also Lipschitz continuous in the neighborhood of $x^{\ast}$.
When $x_k$ is sufficiently close to $x^{\ast}$,

$$
\begin{aligned}&\left\Vert\left(J_k^{T}J_k\right)^{-1}S_k-\left(J^{\ast T}J^{\ast}\right)^{-1}S^{\ast}\right\Vert\\ &\leqslant\left\Vert\left(J_k^{T}J_k\right)^{-1}S_k-\left(J_k^{T}J_k\right)^{-1}S^{\ast}\right\Vert+\left\Vert\left(J_k^{T}J_k\right)^{-1}S^{\ast}-\left(J^{\ast T}J^{\ast}\right)^{-1}S^{\ast}\right\Vert\\ &\leqslant(\beta+\gamma)\left\Vert\left(J_k^{T}J_k\right)^{-1}\right\Vert\Vert h_k\Vert+\gamma\xi^{2}\Vert S^{\ast}\Vert\Vert h_k\Vert\\ &\leqslant\left((\beta+\gamma)\xi+\gamma\xi^{2}\Vert S^{\ast}\Vert\right)\Vert h_k\Vert\end{aligned}
$$
so

$$
\left\Vert\left(J_k^{T}J_k\right)^{-1}S_k-\left(J^{\ast T}J^{\ast}\right)^{-1}S^{\ast}\right\Vert\Vert h_k\Vert\leqslant\left((\beta+\gamma)\xi+\gamma\xi^{2}\Vert S^{\ast}\Vert\right)\Vert h_k\Vert^{2}
$$
Substituting this into (11) gives

$$
\Vert h_{k+1}\Vert\leqslant\left\Vert\left(J^{\ast T}J^{\ast}\right)^{-1}\right\Vert\Vert S^{\ast}\Vert\Vert h_k\Vert+O\left(\Vert h_k\Vert^{2}\right)
$$
This proves the theorem.
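The theorem shows that, if $x_k\to x^{\ast}$, the basic Gauss-Newton method converges at one of two rates:
- Quadratic rate. If $\Vert S(x^{\ast})\Vert=0$, as in zero-residual or linear least-squares problems, the method converges near $x^{\ast}$ at the rate of Newton's method.
- Linear rate. If $\Vert S(x^{\ast})\Vert\neq0$, the method converges linearly, with rate constant at most $\Vert(J^{\ast T}J^{\ast})^{-1}\Vert\Vert S^{\ast}\Vert$ when this is less than 1. Convergence slows as $\Vert S(x^{\ast})\Vert$ grows.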
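Hence the convergence speed of the basic Gauss-Newton method depends on the size of the residual at $x^{\ast}$ and on how nearly linear the residual functions are. The smaller the residual, or the closer the residual functions are to linear, the faster the method converges. Otherwise it converges more slowly, and it may not converge at all for problems with very large residuals or strongly nonlinear residual functions.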
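The two rates can be observed on a small example. The scalar problem below ($n=1$, $m=2$) is a hypothetical choice made for illustration:

$$
r(x)=(x+c,\ \lambda x^2+x-c)^T
$$

For every $\lambda$ and $c$ it has the stationary point $x^{\ast}=0$, with $J^{\ast}=(1,1)^T$, $r(x^{\ast})=(c,-c)^T$ and $S^{\ast}=-2\lambda c$. The theorem's constant is therefore $\Vert(J^{\ast T}J^{\ast})^{-1}\Vert\Vert S^{\ast}\Vert=|\lambda c|$, and $h_{k+1}\approx\lambda c\,h_k$. With $c=0$ (zero residual) the error should shrink quadratically. With $c\neq0$ it should shrink by roughly the factor $\lambda c$ per step.

```python
import numpy as np

def basic_gauss_newton_scalar(lam, c, x0, iters):
    """Basic GN (alpha_k = 1) on the hypothetical problem
    r(x) = (x + c, lam*x**2 + x - c)^T, whose solution is x* = 0."""
    xs = [x0]
    x = x0
    for _ in range(iters):
        r = np.array([x + c, lam * x**2 + x - c])
        J = np.array([1.0, 2.0 * lam * x + 1.0])   # n = 1: J is a single column
        x = x - (J @ r) / (J @ J)                  # J^T J d = -J^T r, scalar case
        xs.append(x)
    return np.array(xs)                             # h_k = x_k - x* = x_k

# Nonzero residual (c = 1): h_{k+1}/h_k should approach lam * c = 0.5 (linear rate)
h_big = basic_gauss_newton_scalar(lam=0.5, c=1.0, x0=0.1, iters=8)
print(h_big[1:] / h_big[:-1])

# Zero residual (c = 0): h_{k+1}/h_k**2 stays bounded (quadratic rate)
h_zero = basic_gauss_newton_scalar(lam=0.5, c=0.0, x0=0.1, iters=3)
print(h_zero[1:] / h_zero[:-1] ** 2)
```

If $|\lambda c|>1$, the linearized recursion $h_{k+1}\approx\lambda c\,h_k$ pushes iterates away from $x^{\ast}$. This is consistent with the remark that the method may fail for large residuals or strong nonlinearity.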
The LM method
During the Gauss-Newton iteration, $J_k^{T}J_k$ may become singular. To overcome this difficulty, the LM (Levenberg-Marquardt) method replaces the Gauss-Newton equation with the LM equation

$$
\left(J_k^{T}J_k+\gamma_kI\right)d=-J_k^{T}r_k\tag{12}
$$
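Here $\gamma_k\geqslant0$ is chosen so that $J_k^{T}J_k+\gamma_kI$ is positive definite. In computation, $\gamma_k$ may need to be fairly large to make the matrix sufficiently positive definite. The positive definiteness of $J_k^{T}J_k+\gamma_kI$ guarantees that the resulting direction is a descent direction: if $g_k\neq0$, then $d_k^{T}g_k=-d_k^{T}\left(J_k^{T}J_k+\gamma_kI\right)d_k<0$.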
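The following minimal sketch solves the LM equation (12) in a case where the Gauss-Newton equation (9) breaks down. The rank-deficient $J_k$ and the residual $r_k$ are hypothetical values chosen for illustration. For every $\gamma>0$ the LM system is solvable and the step is a descent direction, and the step shortens as $\gamma$ grows.

```python
import numpy as np

def lm_direction(J, r, gamma):
    """Solve the LM equation (12): (J^T J + gamma*I) d = -J^T r."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + gamma * np.eye(n), -J.T @ r)

# Hypothetical rank-deficient J_k: identical columns make J^T J singular,
# so the Gauss-Newton equation (9) has no unique solution.
J = np.array([[1.0, 1.0], [2.0, 2.0], [0.5, 0.5]])
r = np.array([1.0, 1.0, 2.0])
g = J.T @ r

for gamma in (1e-3, 1e-1, 1.0, 10.0):
    d = lm_direction(J, r, gamma)
    # d^T g < 0 (descent); ||d|| shrinks as gamma grows
    print(gamma, d @ g, np.linalg.norm(d))
```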
(Relation between the LM equation and the trust-region problem)
A vector $d_k$ with $\Vert d_k\Vert\leqslant\Delta_k$ is a global minimizer of the trust-region subproblem

$$
\begin{aligned}&\min_{d}\ \dfrac{1}{2}\Vert J_kd+r_k\Vert^{2}\\ &\ {\rm s.t.}\ \Vert d\Vert^{2}\leqslant\Delta_k^{2},\quad\Delta_k>0\end{aligned}\tag{13}
$$

if and only if there exists $\gamma_k\geqslant0$ such that

$$
\begin{aligned}&\left(J_k^{T}J_k+\gamma_kI\right)d_k=-J_k^{T}r_k\\ &\gamma_k\left(\Delta_k^{2}-\Vert d_k\Vert^{2}\right)=0\end{aligned}\tag{14}
$$
Proof:
Necessity:
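For problem (13), a constraint qualification holds. When the constraint is active, its gradient $2d_k$ is nonzero because $\Delta_k>0$. So by the optimality conditions for constrained optimization there exists $\gamma_k\geqslant0$ such that $(d_k,\gamma_k)$ satisfies the KKT conditions.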
The Lagrangian is

$$
L(d,\gamma)=\dfrac{1}{2}\Vert J_kd+r_k\Vert^{2}-\dfrac{1}{2}\gamma\left(\Delta_k^{2}-\Vert d\Vert^{2}\right)
$$
and the KKT conditions are

$$
\begin{aligned}&\nabla_dL(d_k,\gamma_k)=0\ \Rightarrow\ J_k^{T}r_k+\left(J_k^{T}J_k+\gamma_kI\right)d_k=0\\ &\gamma_k\left(\Delta_k^{2}-\Vert d_k\Vert^{2}\right)=0\quad\text{(complementarity)}\end{aligned}
$$

These are exactly the conditions (14).
Sufficiency:
Because $J_k^{T}J_k+\gamma_kI$ is positive semidefinite, the solution $d_k$ of the first equation in (14) is a global minimizer of the convex quadratic

$$
\tilde{q}_k(d)=\dfrac{1}{2}d^{T}\left(J_k^{T}J_k+\gamma_kI\right)d+d^{T}\left(J_k^{T}r_k\right)+\dfrac{1}{2}r_k^{T}r_k
$$

By (8),

$$
\tilde{q}_k(d)=q_k(d)+\dfrac{1}{2}\gamma_k\Vert d\Vert^{2}
$$
Since $\tilde{q}_k(d)\geqslant\tilde{q}_k(d_k)$ for every $d\in\mathbb{R}^n$,

$$
q_k(d)\geqslant q_k(d_k)+\dfrac{1}{2}\gamma_k\left(\Vert d_k\Vert^{2}-\Vert d\Vert^{2}\right)
$$
By the second equation in (14), if $\gamma_k=0$ then $q_k(d)\geqslant q_k(d_k)$. If $\gamma_k\neq0$ then $\Vert d_k\Vert^{2}=\Delta_k^{2}$, so

$$
q_k(d)\geqslant q_k(d_k)+\dfrac{1}{2}\gamma_k\left(\Delta_k^{2}-\Vert d\Vert^{2}\right)
$$
Since $\gamma_k\geqslant0$, the last term is nonnegative for every $d$ with $\Vert d\Vert^{2}\leqslant\Delta_k^{2}$. Hence $q_k(d)\geqslant q_k(d_k)$ for every feasible $d$, so $d_k$ is a global minimizer of problem (13).
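The equivalence suggests one way to solve (13): increase $\gamma$ until the LM step fits inside the ball. The sketch below assumes $J_k$ has full column rank. It relies on the standard fact that $\Vert d(\gamma)\Vert$ decreases as $\gamma$ increases, and it uses plain bisection on $\gamma$. More refined methods apply Newton's method to the equation $\Vert d(\gamma)\Vert=\Delta_k$ instead. The random test data are hypothetical.

```python
import numpy as np

def lm_direction(J, r, gamma):
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + gamma * np.eye(n), -J.T @ r)   # equation (12)

def trust_region_step(J, r, delta, tol=1e-12):
    """Return (d_k, gamma_k) satisfying (14) for subproblem (13).
    Assumes J has full column rank, so gamma = 0 is admissible."""
    d = lm_direction(J, r, 0.0)
    if np.linalg.norm(d) <= delta:
        return d, 0.0                     # inactive constraint: gamma_k = 0
    lo, hi = 0.0, 1.0
    while np.linalg.norm(lm_direction(J, r, hi)) > delta:
        hi *= 2.0                         # bracket: ||d(gamma)|| decreases in gamma
    while hi - lo > tol * max(1.0, hi):   # bisection on ||d(gamma)|| = delta
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(lm_direction(J, r, mid)) > delta:
            lo = mid
        else:
            hi = mid
    return lm_direction(J, r, hi), hi     # ||d|| <= delta and ||d|| ~= delta

# Usage with hypothetical data; delta is half the GN step length, so the constraint is active
rng = np.random.default_rng(2)
J = rng.standard_normal((10, 3))
r = rng.standard_normal(10)
delta = 0.5 * np.linalg.norm(lm_direction(J, r, 0.0))
d, gamma = trust_region_step(J, r, delta)
print(gamma, np.linalg.norm(d), delta)
```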
Next we consider how to update $\gamma_k$, which is tied to how the trust-region radius $\Delta_k$ is updated. In a trust-region method, moving from $x_k$ to $x_k+d_k$ reduces $f(x)$ by the actual reduction

$$
\Delta f_k=f(x_k)-f(x_k+d_k)
$$
The quadratic model $q_k(d)$ of $f(x_k+d)$ given by (8) decreases by the predicted reduction

$$
\Delta q_k=q_k(0)-q_k(d_k)
$$
Here $q_k(0)=f_k$. Moreover, from the LM equation and $d_k^{T}g_k<0$,

$$
\begin{aligned}\Delta q_k&=q_k(0)-q_k(d_k)\\ &=-\dfrac{1}{2}d_k^{T}J_k^{T}J_kd_k-d_k^{T}\left(J_k^{T}r_k\right)\\ &=\dfrac{1}{2}d_k^{T}\left(-J_k^{T}J_kd_k-\gamma_kd_k+\gamma_kd_k-2J_k^{T}r_k\right)\\ &=\dfrac{1}{2}d_k^{T}\left(-\left(J_k^{T}J_k+\gamma_kI\right)d_k+\gamma_kd_k-2J_k^{T}r_k\right)\\ &=\dfrac{1}{2}d_k^{T}\left(\gamma_kd_k-g_k\right)>0\end{aligned}
$$
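where $g_k=J_k^{T}r_k$. The last equality uses $-\left(J_k^{T}J_k+\gamma_kI\right)d_k=g_k$. The predicted reduction is positive because $\gamma_k\Vert d_k\Vert^{2}\geqslant0$ and $d_k^{T}g_k<0$.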
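The closed form $\Delta q_k=\frac12 d_k^T(\gamma_kd_k-g_k)$ can be checked against a direct evaluation of the model (8). In this sketch the random $J_k$, $r_k$ and the value of $\gamma_k$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
J = rng.standard_normal((7, 3))    # hypothetical J_k
r = rng.standard_normal(7)         # hypothetical r_k
g = J.T @ r                        # g_k
gamma = 0.3                        # any gamma_k >= 0
d = np.linalg.solve(J.T @ J + gamma * np.eye(3), -g)   # LM direction (12)

def q(s):
    # quadratic model (8): q_k(s) = 0.5 * ||J_k s + r_k||^2
    v = J @ s + r
    return 0.5 * v @ v

dq_direct = q(np.zeros(3)) - q(d)
dq_formula = 0.5 * d @ (gamma * d - g)
print(dq_direct, dq_formula)       # equal up to rounding, and positive
```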
Define

$$
\rho_k=\dfrac{\Delta f_k}{\Delta q_k}
$$
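At iteration $k$, the value of $\rho_k$ reflects how well $q_k(d_k)$ approximates $f(x_k+d_k)$. By the LM equation, $\gamma_k$ controls the size of $\Vert d_k\Vert$, with a larger $\gamma_k$ giving a shorter step, and hence controls the size of the trust region. So $\gamma_k$ should be updated in the opposite direction to $\Delta_k$ in a trust-region method. If $\rho_k$ falls below a threshold, the approximation is poor, so shrink the trust region by increasing $\gamma_k$. Otherwise do the opposite: when $\rho_k$ is close to 1, enlarge the trust region by decreasing $\gamma_k$.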
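The following is a minimal sketch of an LM iteration that updates $\gamma_k$ from $\rho_k$ as described above. The text leaves several details open, so all of the following are assumptions:

- the thresholds $0.25$ and $0.75$;
- the update factors $\times4$ and $\times\frac12$;
- accepting the step only when $\rho_k>0$;
- the initial $\gamma_0$;
- the stopping test $\Vert g_k\Vert\leqslant\varepsilon$.

The test data are hypothetical.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, gamma0=1e-2, eps=1e-8, max_iter=200):
    """LM iteration driven by rho_k = Delta f_k / Delta q_k.
    Assumed (not specified in the text): thresholds 0.25/0.75, factors 4 and 1/2,
    accept the step only if rho_k > 0."""
    x = np.asarray(x0, dtype=float)
    gamma = gamma0
    for _ in range(max_iter):
        r, J = residual(x), jacobian(x)
        g = J.T @ r
        if np.linalg.norm(g) <= eps:
            break
        d = np.linalg.solve(J.T @ J + gamma * np.eye(x.size), -g)   # LM equation (12)
        r_new = residual(x + d)
        df = 0.5 * (r @ r) - 0.5 * (r_new @ r_new)   # actual reduction Delta f_k
        dq = 0.5 * d @ (gamma * d - g)               # predicted reduction Delta q_k
        rho = df / dq
        if rho < 0.25:
            gamma *= 4.0    # poor model fit: shrink the trust region, enlarge gamma
        elif rho > 0.75:
            gamma *= 0.5    # good fit: enlarge the trust region, shrink gamma
        if rho > 0:
            x = x + d       # accept the step only if f actually decreased
    return x

# Usage on hypothetical zero-residual data generated by x* = (2, -0.5)
t = np.linspace(0.0, 2.0, 6)
y = 2.0 * np.exp(-0.5 * t)
res = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
print(levenberg_marquardt(res, jac, [1.0, 0.0]))   # approximately [ 2.  -0.5]
```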