Quasi-Newton Methods
The basic idea of quasi-Newton methods is to replace the Hessian $\mathbf{G}_k=\nabla^2 f\left(\mathbf{x}_k\right)$ with an approximating matrix $\mathbf{B}_k$.
Typically, $\mathbf{B}_k$ should have three properties:
- $\mathbf{B}_k\approx \mathbf{G}_k$, so that the search direction produced by the algorithm is close to the Newton direction, ensuring fast convergence;
- $\mathbf{B}_k$ is symmetric positive definite for every $k$, so that the resulting direction is a descent direction of $f$ at $\mathbf{x}_k$;
- the update rule for $\mathbf{B}_k$ is relatively simple, typically a rank-one or rank-two correction.
Let $f:\mathbb{R}^n\to \mathbb{R}$ be twice continuously differentiable on an open set $D\subset \mathbb{R}^n$. Then

$$f\left(\mathbf{x}\right)\approx f\left(\mathbf{x}_{k+1}\right)+\mathbf{g}_{k+1}^{T}\left(\mathbf{x}-\mathbf{x}_{k+1}\right)+\frac{1}{2}\left(\mathbf{x}-\mathbf{x}_{k+1}\right)^T\mathbf{G}_{k+1}\left(\mathbf{x}-\mathbf{x}_{k+1}\right)$$
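As a sanity check, this second-order model is exact when $f$ itself is quadratic. A minimal pure-Python sketch (the matrix `A` and the two points are made-up values, not from the text):

```python
# Verify that the second-order expansion around x_{k+1} reproduces a
# quadratic f(x) = 0.5*x'Ax exactly (hypothetical 2x2 example).

def matvec(A, v):
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A   = [[4.0, 1.0], [1.0, 3.0]]   # constant Hessian G_{k+1}
x   = [1.0, 2.0]
xk1 = [0.5, 1.0]                 # expansion point x_{k+1}

f  = lambda z: 0.5 * dot(z, matvec(A, z))
g1 = matvec(A, xk1)              # gradient g_{k+1} = A x_{k+1}
dx = [a - b for a, b in zip(x, xk1)]

model = f(xk1) + dot(g1, dx) + 0.5 * dot(dx, matvec(A, dx))
print(f(x), model)   # prints 10.0 10.0
```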
Differentiating both sides gives

$$\mathbf{g}\left(\mathbf{x}\right)\approx\mathbf{g}_{k+1}+\mathbf{G}_{k+1}\left(\mathbf{x}-\mathbf{x}_{k+1}\right)$$
Setting $\mathbf{x}=\mathbf{x}_k$ and defining $\mathbf{s}_{k}=\mathbf{x}_{k+1}-\mathbf{x}_k$, $\mathbf{y}_k=\mathbf{g}_{k+1}-\mathbf{g}_k$, we obtain

$$\mathbf{G}_{k+1}\mathbf{s}_k\approx \mathbf{y}_k$$
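For a quadratic $f$ this relation holds exactly, which can be checked numerically (a sketch with made-up values: `A` plays the constant Hessian, `xk` and `xk1` play $\mathbf{x}_k$ and $\mathbf{x}_{k+1}$):

```python
# Numerical check of G_{k+1} s_k = y_k for a quadratic f(x) = 0.5*x'Ax,
# whose gradient is g(x) = A x (hypothetical 2x2 example).

def matvec(A, v):
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]

A   = [[4.0, 1.0], [1.0, 3.0]]       # constant Hessian
xk  = [1.0, 2.0]
xk1 = [0.5, 1.0]

s = [a - b for a, b in zip(xk1, xk)]                        # s_k = x_{k+1} - x_k
y = [a - b for a, b in zip(matvec(A, xk1), matvec(A, xk))]  # y_k = g_{k+1} - g_k

print(matvec(A, s), y)   # identical for a quadratic: [-3.0, -3.5] twice
```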
We now require the approximating matrix $\mathbf{B}_{k+1}$ to satisfy this relation exactly, i.e. the secant equation

$$\mathbf{B}_{k+1}\mathbf{s}_k=\mathbf{y}_k$$
Setting $\mathbf{H}_{k+1}=\mathbf{B}_{k+1}^{-1}$, we equivalently have

$$\mathbf{H}_{k+1}\mathbf{y}_{k}=\mathbf{s}_k$$
According to the third property, we can write

$$\mathbf{B}_{k+1}=\mathbf{B}_{k}+\mathbf{E}_k,\quad\mathbf{H}_{k+1}=\mathbf{H}_k+\mathbf{D}_k$$

where $\mathbf{E}_k$ and $\mathbf{D}_k$ are rank-one or rank-two matrices.
Rank-One (SR1) Correction
Let $\mathbf{E}_k=\alpha \mathbf{u}_k\mathbf{u}_k^T$, where $\alpha \in \mathbb{R}$ and $\mathbf{u}_k\in\mathbb{R}^n$.
Substituting into the secant equation gives

$$\left(\mathbf{B}_k+\alpha\mathbf{u}_k\mathbf{u}_k^T\right)\mathbf{s}_k=\mathbf{y}_k$$
i.e.

$$\alpha\left(\mathbf{u}_k^T\mathbf{s}_k\right)\mathbf{u}_k=\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k$$
This says that $\mathbf{u}_k$ is parallel to $\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k$, so there exists a scalar $\beta$ such that

$$\mathbf{u}_k=\beta\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)$$
Therefore

$$\mathbf{E}_k=\alpha\beta^2\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)^T$$
and, substituting back into the secant equation,

$$\alpha\beta^2\left[\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)^T\mathbf{s}_k\right]\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)=\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k$$
If $\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)^T\mathbf{s}_{k}\neq 0$, then

$$\alpha\beta^2\left[\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)^T\mathbf{s}_k\right]=1$$
so

$$\alpha\beta^2=\frac{1}{\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)^T\mathbf{s}_k}$$
Hence

$$\mathbf{E}_k=\frac{\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)^T}{\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)^T\mathbf{s}_k}$$
and the SR1 update for $\mathbf{B}$ is

$$\mathbf{B}_{k+1}=\mathbf{B}_k+\frac{\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)^T}{\left(\mathbf{y}_{k}-\mathbf{B}_k\mathbf{s}_k\right)^T\mathbf{s}_k}$$
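A small pure-Python sketch of this update (the values of `B`, `s`, and `y` are hypothetical), checking that the updated matrix satisfies the secant equation:

```python
# SR1 update of B; verifies B_{k+1} s_k = y_k (hypothetical 2x2 example).

def matvec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def sr1_update_B(B, s, y):
    n = len(B)
    r = [yi - bsi for yi, bsi in zip(y, matvec(B, s))]    # r = y - B s
    denom = sum(ri * si for ri, si in zip(r, s))          # r^T s
    # add the rank-one correction r r^T / (r^T s)
    return [[B[i][j] + r[i] * r[j] / denom for j in range(n)] for i in range(n)]

B = [[1.0, 0.0], [0.0, 1.0]]   # B_k = I
s = [1.0, 0.5]
y = [2.0, 1.0]
B1 = sr1_update_B(B, s, y)
print(matvec(B1, s))   # ~ y = [2.0, 1.0]
```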
Similarly, applying the same argument to $\mathbf{H}_{k+1}\mathbf{y}_k=\mathbf{s}_k$ with the roles of $\mathbf{s}_k$ and $\mathbf{y}_k$ interchanged,

$$\mathbf{H}_{k+1}=\mathbf{H}_{k}+\frac{\left(\mathbf{s}_{k}-\mathbf{H}_k\mathbf{y}_k\right)\left(\mathbf{s}_{k}-\mathbf{H}_k\mathbf{y}_k\right)^T}{\left(\mathbf{s}_{k}-\mathbf{H}_k\mathbf{y}_k\right)^T\mathbf{y}_k}$$
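The inverse form can be checked the same way (again with hypothetical values for `H`, `s`, and `y`), verifying $\mathbf{H}_{k+1}\mathbf{y}_k=\mathbf{s}_k$:

```python
# SR1 update of H; verifies H_{k+1} y_k = s_k (hypothetical 2x2 example).

def matvec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def sr1_update_H(H, s, y):
    n = len(H)
    r = [si - hyi for si, hyi in zip(s, matvec(H, y))]    # r = s - H y
    denom = sum(ri * yi for ri, yi in zip(r, y))          # r^T y
    return [[H[i][j] + r[i] * r[j] / denom for j in range(n)] for i in range(n)]

H = [[1.0, 0.0], [0.0, 1.0]]   # H_k = I
s = [1.0, 0.5]
y = [2.0, 1.0]
H1 = sr1_update_H(H, s, y)
print(matvec(H1, y))   # ~ s = [1.0, 0.5]
```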
The SR1 algorithm is therefore:
Initialization: choose $\mathbf{x}_0$ and $\mathbf{H}_0$ (usually $\mathbf{H}_0=\mathbf{I}_n$).
Steps:
1. If $\|\mathbf{g}_k\|\le \epsilon$, stop and output $\mathbf{x}_k$.
2. Compute the direction $\mathbf{d}_k=-\mathbf{H}_k\mathbf{g}_k$.
3. Find a step size $\alpha_k$.
4. Set $\mathbf{x}_{k+1}=\mathbf{x}_k+\alpha_k\mathbf{d}_k$ and update $\mathbf{H}_{k}$ to obtain $\mathbf{H}_{k+1}$.
5. Set $k=k+1$ and go to step 1.
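The steps above can also be sketched in pure Python. This is a simplified sketch, not the full method: it uses a fixed step size instead of a line search, a made-up quadratic objective $f(x)=2x_1^2+x_1x_2+1.5x_2^2$, and skips the SR1 update when its denominator nearly vanishes:

```python
# Minimal SR1 sketch (assumptions: fixed step size, hypothetical quadratic).

def grad_f(x):                       # gradient of f(x) = 2*x1^2 + x1*x2 + 1.5*x2^2
    return [4*x[0] + x[1], x[0] + 3*x[1]]

def sr1_minimize(g, x0, step=0.2, eps=1e-8, max_iter=200):
    n = len(x0)
    H = [[float(i == j) for j in range(n)] for i in range(n)]   # H_0 = I_n
    x, grad = list(x0), g(x0)
    for _ in range(max_iter):
        if sum(gi * gi for gi in grad) ** 0.5 <= eps:           # step 1: stopping rule
            break
        d = [-sum(H[i][j] * grad[j] for j in range(n)) for i in range(n)]  # step 2
        x_new = [xi + step * di for xi, di in zip(x, d)]        # steps 3-4
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g(x_new), grad)]
        r = [s[i] - sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
        denom = sum(ri * yi for ri, yi in zip(r, y))            # (s - H y)^T y
        if abs(denom) > 1e-12:       # skip the SR1 update when it is ill-defined
            for i in range(n):
                for j in range(n):
                    H[i][j] += r[i] * r[j] / denom
        x, grad = x_new, g(x_new)
    return x

x_star = sr1_minimize(grad_f, [1.0, 1.0])
print(x_star)   # close to the unique minimizer [0, 0]
```

On this quadratic, the SR1 updates drive $\mathbf{H}_k$ toward the exact inverse Hessian, after which each iteration contracts the error by the fixed step factor.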
Code (MATLAB, SR1 with a backtracking line search):

```matlab
function [x,fun_val]=sr1(f,g,x0,s,alpha,beta,epsilon)
% sr1 with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0 ........ initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant by which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for the stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance) of min f(x)
% fun_val ... optimal function value
grad=g(x0);
fun_val=f(x0);
iter=0;
H_k=eye(length(x0));                 % H_0 = I_n
while (norm(grad)>epsilon)
    iter=iter+1;
    dk=H_k*grad;                     % search direction is -dk
    t=s;                             % backtracking line search
    while (fun_val-f(x0-t*dk)<alpha*t*grad'*dk)
        t=beta*t;
    end
    x=x0-t*dk;
    s_k=x-x0;
    y_k=g(x)-grad;
    % SR1 update of the inverse-Hessian approximation
    H_k=H_k+(s_k-H_k*y_k)*(s_k-H_k*y_k)'/((s_k-H_k*y_k)'*y_k);
    fun_val=f(x);
    grad=g(x);
    x0=x;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
```