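Newton's method converges quickly, but it requires second-order partial derivatives, and the Hessian of the objective function may fail to be positive definite. Quasi-Newton methods were proposed to overcome these drawbacks. Their basic idea is to approximate the inverse of the Hessian used in Newton's method with a matrix that involves no second derivatives.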
The iteration formula of Newton's method is
$$x^{(k+1)}=x^{(k)}+\lambda d^{(k)}$$

$$d^{(k)}=-\nabla^{2}f(x^{(k)})^{-1}\nabla f(x^{(k)})$$
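As a minimal sketch of this iteration, the snippet below computes one Newton step on a small quadratic. The test function $f(x)=\frac{1}{2}x^{T}Ax-b^{T}x$, the matrix `A`, the vector `b`, and the helper name `newton_direction` are illustrative assumptions, not part of the original text. The direction is obtained with a linear solve rather than by forming the inverse explicitly.

```python
import numpy as np

def newton_direction(grad, hess):
    """Newton direction d = -hess^{-1} grad, computed via a linear solve."""
    return -np.linalg.solve(hess, grad)

# Hypothetical test problem: f(x) = 0.5 x^T A x - b^T x, so grad = A x - b, Hessian = A
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)
g = A @ x - b
d = newton_direction(g, A)
x_new = x + 1.0 * d  # step length lambda = 1

# For a quadratic, a full Newton step lands on the stationary point A x = b
print(np.allclose(A @ x_new, b))
```

On a quadratic the Hessian is constant, so a single step with $\lambda=1$ already reaches the minimizer. For general functions the step has to be repeated.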
To construct a matrix $H_{k}$ that approximates $\nabla^{2}f(x^{(k)})^{-1}$, we first analyze how $\nabla^{2}f(x^{(k)})^{-1}$ relates to first derivatives. Expand $f(x)$ in a Taylor series about the point $x^{(k+1)}$, keeping terms up to second order:

$$f(x)\approx f(x^{(k+1)})+\nabla f(x^{(k+1)})^{T}(x-x^{(k+1)})+\frac{1}{2}(x-x^{(k+1)})^{T}\nabla^{2}f(x^{(k+1)})(x-x^{(k+1)})$$

Taking the gradient of both sides shows that, near $x^{(k+1)}$,

$$\nabla f(x)\approx \nabla f(x^{(k+1)})+\nabla^{2}f(x^{(k+1)})(x-x^{(k+1)})$$

Setting $x=x^{(k)}$ gives

$$\nabla f(x^{(k)})\approx \nabla f(x^{(k+1)})+\nabla^{2}f(x^{(k+1)})(x^{(k)}-x^{(k+1)})$$

Define

$$p^{(k)}=x^{(k+1)}-x^{(k)}$$

$$q^{(k)}=\nabla f(x^{(k+1)})-\nabla f(x^{(k)})$$

Then

$$q^{(k)}\approx \nabla^{2}f(x^{(k+1)})p^{(k)}$$

If the Hessian $\nabla^{2}f(x^{(k+1)})$ is invertible, then

$$p^{(k)}\approx \nabla^{2}f(x^{(k+1)})^{-1}q^{(k)}$$

So once $p^{(k)}$ and $q^{(k)}$ have been computed, this relation lets us estimate the inverse of the Hessian. We can therefore replace the inverse Hessian with a matrix $H_{k+1}$ that involves no second derivatives and satisfies

$$p^{(k)}=H_{k+1}q^{(k)}$$

This relation is known as the quasi-Newton condition, and it is the basis of quasi-Newton methods. What remains is to determine the matrix $H_{k+1}$.
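A quick numerical check can make the relation between $p^{(k)}$, $q^{(k)}$ and the Hessian concrete. The sketch below assumes a hypothetical quadratic $f(x)=\frac{1}{2}x^{T}Ax-b^{T}x$, for which the Hessian is the constant matrix `A` and the approximation becomes an equality. The two points `x_k` and `x_k1` are arbitrary illustrative values.

```python
import numpy as np

# Hypothetical quadratic: grad f(x) = A x - b, Hessian = A (constant)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

def grad(x):
    return A @ x - b

x_k = np.array([0.0, 0.0])     # arbitrary point x^(k)
x_k1 = np.array([0.5, -0.25])  # arbitrary point x^(k+1)

p = x_k1 - x_k                 # p^(k) = x^(k+1) - x^(k)
q = grad(x_k1) - grad(x_k)     # q^(k) = grad f(x^(k+1)) - grad f(x^(k))

# q = Hessian * p, and equivalently p = Hessian^{-1} * q
print(np.allclose(q, A @ p))
print(np.allclose(p, np.linalg.solve(A, q)))
```

For a non-quadratic function the two sides would agree only approximately, with the error shrinking as the two points get closer.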
The DFP (Davidon-Fletcher-Powell) algorithm is also known as the variable metric method.
$$H_{k+1}=H_{k}+\frac{p^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}-\frac{H_{k}q^{(k)}q^{(k)T}H_{k}}{q^{(k)T}H_{k}q^{(k)}}$$

which satisfies

$$p^{(k)}=H_{k+1}q^{(k)}$$
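The DFP update translates almost directly into NumPy. The following is a sketch rather than a reference implementation: the function name `dfp_update` and the sample vectors are assumptions chosen for illustration, and the code assumes $H_{k}$ is symmetric so that $H_{k}q^{(k)}q^{(k)T}H_{k}$ can be written as an outer product of $H_{k}q^{(k)}$ with itself.

```python
import numpy as np

def dfp_update(H, p, q):
    """DFP update of the inverse-Hessian approximation H (assumed symmetric)."""
    Hq = H @ q
    # H + p p^T / (p^T q) - (H q)(H q)^T / (q^T H q)
    return H + np.outer(p, p) / (p @ q) - np.outer(Hq, Hq) / (q @ Hq)

# Illustrative values with p^T q > 0
H = np.eye(2)
p = np.array([1.0, 0.5])
q = np.array([2.0, 0.5])

H_next = dfp_update(H, p, q)

# The updated matrix satisfies p = H_next q and stays symmetric
print(np.allclose(H_next @ q, p))
print(np.allclose(H_next, H_next.T))
```

Multiplying the formula by $q^{(k)}$ shows why the check passes: the second term contributes $p^{(k)}$ and the third term cancels $H_{k}q^{(k)}$.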
The DFP method proceeds as follows:
- Step 1: Choose an initial point $x^{(1)}$ and a tolerance $\epsilon>0$.
- Step 2: Set $H_{1}=I_{n}$ (the identity matrix) and $k=1$, and compute the gradient at $x^{(1)}$, $g_{1}=\nabla f(x^{(1)})$.
- Step 3: Set $d^{(k)}=-H_{k}g_{k}$.
- Step 4: Starting from $x^{(k)}$, search along $d^{(k)}$ for a step length $\lambda_{k}$ that satisfies $f(x^{(k)}+\lambda_{k}d^{(k)})=\min_{\lambda\geq 0}f(x^{(k)}+\lambda d^{(k)})$, then update $x^{(k+1)}=x^{(k)}+\lambda_{k}d^{(k)}$.
- Step 5: Check the convergence criterion. If $\|\nabla f(x^{(k+1)})\|\leq\epsilon$, stop and return the point $\hat{x}=x^{(k+1)}$; otherwise go to step 6.
- Step 6: If $k=n$, set $x^{(1)}=x^{(k+1)}$ and return to step 2; otherwise go to step 7.
- Step 7: Compute $g_{k+1}=\nabla f(x^{(k+1)})$, $p^{(k)}=x^{(k+1)}-x^{(k)}$ and $q^{(k)}=g_{k+1}-g_{k}$, then compute $H_{k+1}=H_{k}+\frac{p^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}-\frac{H_{k}q^{(k)}q^{(k)T}H_{k}}{q^{(k)T}H_{k}q^{(k)}}$. Set $k=k+1$ and return to step 3.
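The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a production implementation: the function name `dfp_minimize`, the `line_search` callback, and the `max_restarts` cap are hypothetical, and the example uses a quadratic $f(x)=\frac{1}{2}x^{T}Ax-b^{T}x$ because its exact line search has the closed form $\lambda=-g^{T}d/(d^{T}Ad)$. The sketch also tests the gradient once at the start of each restart cycle, which the listed steps do not do, to avoid searching along a zero direction.

```python
import numpy as np

def dfp_minimize(grad, line_search, x0, eps=1e-8, max_restarts=100):
    """DFP with a restart every n iterations; line_search(x, d) returns the step length."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(max_restarts):
        H = np.eye(n)                          # step 2: reset H to the identity
        g = grad(x)
        if np.linalg.norm(g) <= eps:           # extra guard (assumption, see lead-in)
            return x
        for k in range(1, n + 1):
            d = -H @ g                         # step 3: search direction
            lam = line_search(x, d)            # step 4: step length along d
            x_new = x + lam * d
            g_new = grad(x_new)
            if np.linalg.norm(g_new) <= eps:   # step 5: convergence test
                return x_new
            if k == n:                         # step 6: restart from x_new
                x = x_new
                break
            p, q = x_new - x, g_new - g        # step 7: DFP update of H
            Hq = H @ q
            H = H + np.outer(p, p) / (p @ q) - np.outer(Hq, Hq) / (q @ Hq)
            x, g = x_new, g_new
    return x

# Hypothetical test problem: f(x) = 0.5 x^T A x - b^T x
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

def grad(x):
    return A @ x - b

def exact_line_search(x, d):
    # Closed-form minimizer of f(x + lam * d) for the quadratic above
    return -(grad(x) @ d) / (d @ A @ d)

x_hat = dfp_minimize(grad, exact_line_search, x0=np.zeros(2))
print(np.allclose(A @ x_hat, b))
```

For a general objective the closed-form step is not available, and `line_search` would have to be replaced by a numerical one-dimensional minimization. The update in step 7 divides by $p^{(k)T}q^{(k)}$, so the chosen line search needs to keep that quantity positive.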
BFGS
$$H_{k+1}^{BFGS}=H_{k}+\left(1+\frac{q^{(k)T}H_{k}q^{(k)}}{p^{(k)T}q^{(k)}}\right)\frac{p^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}-\frac{p^{(k)}q^{(k)T}H_{k}+H_{k}q^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}$$
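The BFGS formula can be sketched in the same way as a NumPy function. The name `bfgs_update` and the sample values are assumptions for illustration, and $H_{k}$ is assumed symmetric so that $q^{(k)T}H_{k}$ equals $(H_{k}q^{(k)})^{T}$. The check at the end confirms that the result also satisfies the condition $p^{(k)}=H_{k+1}q^{(k)}$.

```python
import numpy as np

def bfgs_update(H, p, q):
    """BFGS update of the inverse-Hessian approximation H (assumed symmetric)."""
    pq = p @ q
    Hq = H @ q
    first = (1.0 + (q @ Hq) / pq) * np.outer(p, p) / pq
    second = (np.outer(p, Hq) + np.outer(Hq, p)) / pq
    return H + first - second

# Illustrative values with p^T q > 0
H = np.eye(2)
p = np.array([1.0, 0.5])
q = np.array([2.0, 0.5])

H_bfgs = bfgs_update(H, p, q)

# The updated matrix satisfies p = H_bfgs q and stays symmetric
print(np.allclose(H_bfgs @ q, p))
print(np.allclose(H_bfgs, H_bfgs.T))
```

Swapping this function in for the DFP update in step 7 of the algorithm leaves the rest of the iteration unchanged.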