Advanced Optimization Theory and Methods (5)

Conjugate Direction Algorithm

IN: $x^0, d_0,\cdots,d_{n-1}$, where $d_0,\cdots,d_{n-1}$ are Q-conjugate
Property: $x^{k+1}=x^k+\alpha^kd_k$, $\alpha^k=-\frac{{g^k}^Td_k}{{d_k}^TQd_k}$

Lemma 1

In the conjugate direction algorithm, ${g^{k+1}}^Td_i=0, \forall 0\leq k\leq n-1, \forall 0\leq i\leq k$

Lemma 2

Lemma: $f(x^{k+1})=\min_{\alpha^0,\cdots,\alpha^k} f(x^0+\sum_{i=0}^k \alpha^i d_i)$

Pf: Let $D^k=[d_0,\cdots,d_{k-1}]$ be an $n\times k$ matrix,

$$x(\alpha)=x^0+D^k\alpha,\quad \alpha \in \mathbb{R}^k$$

$$\phi^k(\alpha)=f(x(\alpha))=f(x^0+D^k\alpha)$$

$$D\phi^k(\alpha)=\nabla f(x^0+D^k\alpha)^TD^k$$

Let $\overline{\alpha}=[\alpha^0,\alpha^1,\cdots,\alpha^{k-1}]^T$, with $\alpha^i$ generated by the algorithm.

$$D\phi^k(\overline{\alpha})=\nabla f(x^0+D^k\overline{\alpha})^TD^k=\nabla f(x^k)^TD^k={g^k}^TD^k$$

$\because$ Lemma 1 $\Rightarrow {g^k}^Td_i=0, \forall i\leq k-1$

$\therefore D\phi^k(\overline{\alpha})=0$, so $\overline{\alpha}$ satisfies the FONC. Since $\phi^k$ is a convex quadratic when $Q$ is positive definite, the FONC is also sufficient, so $x^k$ minimizes $f$ over $x^0+\mathrm{span}\{d_0,\cdots,d_{k-1}\}$.
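As a minimal sketch of the conjugate direction algorithm and of Lemma 1, the pure-Python snippet below runs the update on an illustrative quadratic. The matrix `Q`, the vector `b`, the starting point, and the helper `q_conjugate_basis` (which builds Q-conjugate directions by Gram-Schmidt in the Q inner product) are assumptions made for this illustration only.

```python
# Conjugate direction method on f(x) = 0.5 x^T Q x - b^T x (illustrative sketch).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def q_conjugate_basis(Q):
    """Hypothetical helper: Q-conjugate directions built from the standard basis."""
    n = len(Q)
    dirs = []
    for i in range(n):
        e = [1.0 if j == i else 0.0 for j in range(n)]
        d = e[:]
        for p in dirs:
            Qp = matvec(Q, p)
            coef = dot(e, Qp) / dot(p, Qp)  # component of e along p in the Q inner product
            d = [dj - coef * pj for dj, pj in zip(d, p)]
        dirs.append(d)
    return dirs

def conjugate_direction(Q, b, x0, dirs):
    """Return the iterates x^0..x^n and the gradients g^0..g^n."""
    x = x0[:]
    xs, gs = [x[:]], []
    for d in dirs:
        g = [q - bi for q, bi in zip(matvec(Q, x), b)]   # g^k = Q x^k - b
        gs.append(g)
        alpha = -dot(g, d) / dot(d, matvec(Q, d))        # alpha^k from the update rule
        x = [xi + alpha * di for xi, di in zip(x, d)]
        xs.append(x[:])
    gs.append([q - bi for q, bi in zip(matvec(Q, x), b)])
    return xs, gs

Q = [[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]]  # symmetric positive definite
b = [3.0, 0.0, 1.0]
dirs = q_conjugate_basis(Q)
xs, gs = conjugate_direction(Q, b, [2.0, -1.0, 1.0], dirs)

# Lemma 1: g^{k+1} is orthogonal to d_0, ..., d_k.
lemma1_residual = max(abs(dot(gs[k + 1], dirs[i]))
                      for k in range(len(dirs)) for i in range(k + 1))
print(xs[-1], lemma1_residual)
```

With these assumed inputs the last iterate should solve $Qx=b$ after $n=3$ steps, and `lemma1_residual` should be zero up to rounding.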

Conjugate Gradient Algorithm

IN: $f(x)=\frac{1}{2}x^TQx-b^Tx$, $x^0$

  1. $k:=0$
  2. Compute $g^0=\nabla f(x^0)$. If $g^0=0$, then stop; else $d_0=-g^0$
  3. $\alpha^k=-\frac{{g^k}^Td_k}{{d_k}^TQd_k}$
  4. $x^{k+1}=x^k+\alpha^kd_k$
  5. $g^{k+1}=\nabla f(x^{k+1})$. If $g^{k+1}=0$, then stop
  6. $\beta^k=\frac{{g^{k+1}}^TQd_k}{{d_k}^TQd_k}$
  7. $d_{k+1}=-g^{k+1}+\beta^kd_k$
  8. $k:=k+1$, go to step 3

Theorem

Thm: The directions $d_0,\cdots,d_{n-1}$ computed in the Conjugate Gradient Algorithm are Q-conjugate.

Pf: By induction.

Base case ($n=2$):

$${d_0}^TQd_1={d_0}^TQ(-g^1+\beta^0d_0)={d_0}^TQ\left(-g^1+\frac{{g^1}^TQd_0}{{d_0}^TQd_0}d_0\right)=0$$

Induction step ($n=k+1$):

To prove ${d_{k+1}}^TQd_j=0, \forall j\leq k$

(Known: ${d_k}^TQd_j=0, \forall j\leq k-1$)

$${d_{k+1}}^TQd_j=(-g^{k+1}+\beta^kd_k)^TQd_j=-{g^{k+1}}^TQd_j+\beta^k{d_k}^TQd_j=-{g^{k+1}}^TQd_j,\quad j\leq k-1$$

$\because g^{j+1}=Qx^{j+1}-b=Qx^j+\alpha^jQd_j-b=g^j+\alpha^jQd_j$

$\therefore {d_{k+1}}^TQd_j=-{g^{k+1}}^T\frac{g^{j+1}-g^j}{\alpha^j}, j\leq k-1$

$\because \forall\, 1\leq j \leq k: d_j=-g^j+\beta^{j-1}d_{j-1}$ (and $d_0=-g^0$), and by Lemma 1 ${g^{k+1}}^Td_j=0, \forall j\leq k$

$\therefore 0={g^{k+1}}^Td_j=-{g^{k+1}}^Tg^j+\beta^{j-1}{g^{k+1}}^Td_{j-1}=-{g^{k+1}}^Tg^j$

$\therefore {g^{k+1}}^Tg^j=0, \forall j\leq k$

$\therefore {d_{k+1}}^TQd_j=0, \forall j\leq k-1$

${d_{k+1}}^TQd_k=0$ (the same computation as in the base case, by the choice of $\beta^k$)

Example

$$f(x)=\frac{3}{2} x_1^2+2x_2^2+\frac{3}{2}x_3^2+x_1x_3+2x_2x_3-3x_1-x_3$$

$$Q=\begin{bmatrix} 3&0&1 \\ 0&4&2\\ 1&2&3 \end{bmatrix}$$
$$b=\begin{bmatrix} 3\\ 0\\ 1 \end{bmatrix}$$
$$x^0=\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}$$
$$g^0=Qx^0-b=\begin{bmatrix} -3 \\ 0\\ -1 \end{bmatrix}$$
$$d_0=-g^0=\begin{bmatrix} 3\\ 0\\ 1 \end{bmatrix}$$
$$\alpha^0=-\frac{[-3,0,-1]\begin{bmatrix} 3\\ 0\\ 1 \end{bmatrix}}{[3,0,1]Q\begin{bmatrix} 3\\ 0\\ 1 \end{bmatrix}}=\frac{10}{36}=0.2778$$
$$x^1=x^0+\alpha^0d_0=\begin{bmatrix} 0.8333\\ 0\\ 0.2778 \end{bmatrix}$$
$$\cdots$$
$$x^3=\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}$$
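The example can be checked numerically with a short pure-Python sketch of the Conjugate Gradient Algorithm. The function name `conjugate_gradient`, the `history` list, and the tolerance `tol` (used in place of the exact test $g^k=0$, which is not practical in floating point) are assumptions of this sketch.

```python
# Conjugate gradient for f(x) = 0.5 x^T Q x - b^T x (illustrative sketch).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(Q, b, x0, tol=1e-12):
    """Follow steps 1-8 of the algorithm; Q is assumed symmetric positive definite."""
    x = x0[:]
    g = [q - bi for q, bi in zip(matvec(Q, x), b)]     # g^0 = Q x^0 - b
    d = [-gi for gi in g]                              # d_0 = -g^0
    history = [(x[:], None)]                           # (x^k, step size that produced it)
    for _ in range(len(b)):                            # at most n steps for a quadratic
        if dot(g, g) ** 0.5 <= tol:                    # tolerance replaces "g^k = 0"
            break
        Qd = matvec(Q, d)
        alpha = -dot(g, d) / dot(d, Qd)                # step 3
        x = [xi + alpha * di for xi, di in zip(x, d)]  # step 4
        g = [q - bi for q, bi in zip(matvec(Q, x), b)] # step 5
        beta = dot(g, Qd) / dot(d, Qd)                 # step 6
        d = [-gi + beta * di for gi, di in zip(g, d)]  # step 7
        history.append((x[:], alpha))
    return x, history

Q = [[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]]
b = [3.0, 0.0, 1.0]
x_min, history = conjugate_gradient(Q, b, [0.0, 0.0, 0.0])
for x_k, alpha_k in history:
    print(x_k, alpha_k)
```

The second entry of `history` holds $x^1$ together with $\alpha^0$, and `x_min` is the final iterate $x^3$.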

Non-Quadratic function f

Problems: Computation of $\alpha^k, \beta^k$ (no matrix $Q$ is available)
Solutions: $\alpha^k=\arg\min_{\alpha>0} f(x^k+\alpha d_k)$ $\rightarrow$ one-dimensional search

$\beta^k$:

Hestenes-Stiefel formula:

$$x^{k+1}=x^k+\alpha^kd_k\Rightarrow Qx^{k+1}-b=Qx^k-b+\alpha^kQd_k\Rightarrow g^{k+1}-g^k=\alpha^kQd_k\Rightarrow Qd_k=\frac{g^{k+1}-g^k}{\alpha^k}$$

$$\beta^k=\frac{{g^{k+1}}^TQd_k}{{d_k}^TQd_k}=\frac{{g^{k+1}}^T(g^{k+1}-g^k)}{{d_k}^T(g^{k+1}-g^k)}$$

Polak-Ribière formula:

$\because {g^k}^Td_k=-{g^k}^Tg^k+\beta^{k-1}{g^k}^Td_{k-1}=-{g^k}^Tg^k$ and ${g^{k+1}}^Td_k=0$, the denominator becomes ${d_k}^T(g^{k+1}-g^k)={g^k}^Tg^k$

$\therefore \beta^k=\frac{{g^{k+1}}^T(g^{k+1}-g^k)}{{g^k}^Tg^k}$

Fletcher-Reeves formula:

$\because 0={g^{k+1}}^Td_k=-{g^{k+1}}^Tg^k+\beta^{k-1}{g^{k+1}}^Td_{k-1}\Rightarrow {g^{k+1}}^Tg^k=0$

$\therefore \beta^k=\frac{{g^{k+1}}^Tg^{k+1}}{{g^k}^Tg^k}$
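The three formulas are derived from the quadratic case, so on a quadratic with exact line search they should all agree with the $Q$-based $\beta^k$. The sketch below checks this for the first step of the example above; the function names are hypothetical and chosen only for this illustration.

```python
# Compare the Hestenes-Stiefel, Polak-Ribiere and Fletcher-Reeves formulas
# with the Q-based beta on a quadratic (illustrative sketch).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def beta_hestenes_stiefel(g_new, g_old, d):
    dg = [a - c for a, c in zip(g_new, g_old)]
    return dot(g_new, dg) / dot(d, dg)

def beta_polak_ribiere(g_new, g_old):
    dg = [a - c for a, c in zip(g_new, g_old)]
    return dot(g_new, dg) / dot(g_old, g_old)

def beta_fletcher_reeves(g_new, g_old):
    return dot(g_new, g_new) / dot(g_old, g_old)

Q = [[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]]
b = [3.0, 0.0, 1.0]
x0 = [0.0, 0.0, 0.0]

g0 = [q - bi for q, bi in zip(matvec(Q, x0), b)]
d0 = [-v for v in g0]
Qd0 = matvec(Q, d0)
alpha0 = -dot(g0, d0) / dot(d0, Qd0)          # exact line search on the quadratic
x1 = [xi + alpha0 * di for xi, di in zip(x0, d0)]
g1 = [q - bi for q, bi in zip(matvec(Q, x1), b)]

beta_q = dot(g1, Qd0) / dot(d0, Qd0)          # formula that needs Q explicitly
betas = (beta_hestenes_stiefel(g1, g0, d0),
         beta_polak_ribiere(g1, g0),
         beta_fletcher_reeves(g1, g0))
print(beta_q, betas)
```

For a non-quadratic $f$ the three values generally differ, which is why they are treated as separate formulas.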

Quasi-Newton Methods

Review of Newton's Method

$$x^{k+1}=x^k-{F(x^k)}^{-1}\nabla f(x^k),\quad f\in C^3$$

Advantages of Newton's method: simple, widely applicable, fast convergence

Problems

not necessarily a descent step, even if ${F(x^k)}^{-1}>0$ $\Rightarrow$ $x^{k+1}=x^k-\alpha^k{F(x^k)}^{-1}\nabla f(x^k)$, where $\alpha^k=\arg\min_{\alpha>0}f(x^k-\alpha{F(x^k)}^{-1}\nabla f(x^k))$

not positive definite: $\Rightarrow$ replace ${F(x^k)}^{-1}$ by $G=(F(x^k)+\mu I_n)^{-1}$ with $\mu\geq 0$ large enough (Levenberg-Marquardt modification)

computing ${F(x^k)}^{-1}\nabla f(x^k)$ is costly (it needs the Hessian and the solution of a linear system)

Solution:

Construct $H^k$ (real-valued, positive definite, symmetric)

$$g^k=\nabla f(x^k)$$

$$d^k=-H^kg^k$$

$$\alpha^k=\arg\min_{\alpha\geq 0} f(x^k+\alpha d^k)$$

$$x^{k+1}=x^k+\alpha^kd^k$$

Theorem

If $g^k\neq0$ and $H^k$ is an $n\times n$ matrix (symmetric, positive definite), then $f(x^{k+1})<f(x^k)$
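A short sketch of why this holds, assuming $f$ is continuously differentiable and $\alpha^k$ is the exact line-search minimizer over $\alpha\geq 0$; the function $\phi$ is notation introduced only for this sketch:

$$\phi(\alpha)=f(x^k+\alpha d^k),\qquad \phi'(0)={g^k}^Td^k=-{g^k}^TH^kg^k<0$$

So $\phi(\alpha)<\phi(0)$ for all sufficiently small $\alpha>0$, and therefore $f(x^{k+1})=\min_{\alpha\geq 0}\phi(\alpha)<\phi(0)=f(x^k)$.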

Theorem

Quadratic functions: $f(x)=\frac{1}{2}x^TQx-b^Tx$

Thm: Apply a quasi-Newton method to a quadratic function with $Q=Q^T$ such that $H^{k+1}\Delta g^i=\Delta x^i, \forall 0\leq i\leq k$, where $\Delta g^i=g^{i+1}-g^i$, $\Delta x^i=x^{i+1}-x^i$. If $\alpha^i\neq 0$ for all $0\leq i\leq k$, then $d^0,\cdots,d^{k+1}$ are Q-conjugate.

Corollary

Applied to a quadratic function of $n$ variables, a quasi-Newton method satisfying the condition above converges in at most $n$ steps.

Computation of H

Rank-One-Correction

$$H^{k+1}=H^k+a^kz^k{z^k}^T,\quad a^k\in \mathbb{R},\ z^k\in \mathbb{R}^n$$

$$z^k{z^k}^T=\begin{bmatrix} z_1^k\\ \vdots\\ z_n^k \end{bmatrix}[z_1^k,\cdots,z_n^k]=\begin{bmatrix} z_1^kz_1^k&\cdots & z_1^kz_n^k\\ \vdots& & \vdots\\ z_n^kz_1^k & \cdots & z_n^kz_n^k \end{bmatrix}$$
$$\mathrm{rank}(z^k{z^k}^T)=1 \quad (z^k\neq 0)$$

$H^{k+1}$ as a function of $H^k, \Delta g^k, \Delta x^k$:

$$H^{k+1}=H^k+\frac{(\Delta x^k-H^k\Delta g^k)(\Delta x^k-H^k\Delta g^k)^T}{{\Delta g^k}^T(\Delta x^k-H^k\Delta g^k)}$$
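One way to arrive at this formula, sketched under the assumption that the update is required to satisfy the condition $H^{k+1}\Delta g^k=\Delta x^k$ and that ${z^k}^T\Delta g^k\neq 0$:

$$H^{k+1}\Delta g^k=H^k\Delta g^k+a^kz^k({z^k}^T\Delta g^k)=\Delta x^k \;\Rightarrow\; \Delta x^k-H^k\Delta g^k=a^k({z^k}^T\Delta g^k)\,z^k$$

so $z^k$ is parallel to $\Delta x^k-H^k\Delta g^k$, and

$$a^kz^k{z^k}^T=\frac{(\Delta x^k-H^k\Delta g^k)(\Delta x^k-H^k\Delta g^k)^T}{a^k({z^k}^T\Delta g^k)^2}$$

Multiplying the first relation on the left by ${\Delta g^k}^T$ gives ${\Delta g^k}^T(\Delta x^k-H^k\Delta g^k)=a^k({z^k}^T\Delta g^k)^2$, which is exactly the denominator in the update.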

Rank-One-Correction Algorithm

IN: $x^0, H^0$

  1. $k:=0$
  2. If $g^k=0$, then stop; else $d^k=-H^kg^k$
  3. Compute $\alpha^k=\arg\min_{\alpha} f(x^k+\alpha d^k)$, $x^{k+1}=x^k+\alpha^kd^k$
  4. Compute $\Delta x^k=\alpha^kd^k$, $\Delta g^k=g^{k+1}-g^k$, $H^{k+1}=H^k+\frac{(\Delta x^k-H^k\Delta g^k)(\Delta x^k-H^k\Delta g^k)^T}{{\Delta g^k}^T(\Delta x^k-H^k\Delta g^k)}$
  5. $k:=k+1$, go to step 2

Theorem

Applying the rank-one correction algorithm to a quadratic function with $Q=Q^T$, we have $H^{k+1}\Delta g^i=\Delta x^i, \forall i\leq k$
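A minimal pure-Python sketch of the rank-one correction algorithm on a quadratic can be used to check this property numerically. The choices $H^0=I$, the test data `Q` and `b`, the tolerance on $g^k$, and the guard on the denominator are assumptions of this sketch; the exact line search uses the closed form $\alpha^k=-\frac{{g^k}^Td^k}{{d^k}^TQd^k}$, which is valid only for quadratics.

```python
# Rank-one correction on f(x) = 0.5 x^T Q x - b^T x (illustrative sketch).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def rank_one_update(H, dx, dg):
    """Return H + z z^T / (dg^T z) with z = dx - H dg."""
    z = [a - c for a, c in zip(dx, matvec(H, dg))]
    denom = dot(dg, z)
    if abs(denom) < 1e-12:  # guard chosen for this sketch only
        raise ZeroDivisionError("rank-one denominator is too small")
    n = len(z)
    return [[H[i][j] + z[i] * z[j] / denom for j in range(n)] for i in range(n)]

def rank_one_quadratic(Q, b, x0):
    n = len(b)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # H^0 = I (assumed)
    x = x0[:]
    g = [q - bi for q, bi in zip(matvec(Q, x), b)]
    pairs = []                                     # stores (dx^i, dg^i)
    for _ in range(n):
        if dot(g, g) ** 0.5 < 1e-12:               # tolerance replaces "g^k = 0"
            break
        d = [-v for v in matvec(H, g)]             # d^k = -H^k g^k
        alpha = -dot(g, d) / dot(d, matvec(Q, d))  # exact line search on a quadratic
        dx = [alpha * di for di in d]
        x = [xi + s for xi, s in zip(x, dx)]
        g_new = [q - bi for q, bi in zip(matvec(Q, x), b)]
        dg = [a - c for a, c in zip(g_new, g)]
        H = rank_one_update(H, dx, dg)
        pairs.append((dx, dg))
        g = g_new
    return x, H, pairs

Q = [[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]]
b = [3.0, 0.0, 1.0]
x_final, H_final, pairs = rank_one_quadratic(Q, b, [0.0, 0.0, 0.0])

# Largest violation of H^{k+1} dg^i = dx^i over all stored pairs.
secant_residual = max(abs(h - s)
                      for dx, dg in pairs
                      for h, s in zip(matvec(H_final, dg), dx))
print(x_final, secant_residual)
```

After $n$ steps with linearly independent $\Delta g^i$, the final matrix should also coincide with $Q^{-1}$ up to rounding.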

Problems

$H^k$ may fail to be positive definite, so $d^k$ need not be a descent direction.
The denominator ${\Delta g^k}^T(\Delta x^k-H^k\Delta g^k)$ may be too small (close to zero), which causes numerical difficulties.

DFP algorithm

$$H^{k+1}=H^k+\frac{\Delta x^k{\Delta x^k}^T}{{\Delta x^k}^T\Delta g^k}-\frac{(H^k\Delta g^k)(H^k\Delta g^k)^T}{{\Delta g^k}^TH^k\Delta g^k}$$

Theorem

Applying DFP to a quadratic function with $Q=Q^T$: $H^{k+1}\Delta g^i=\Delta x^i, \forall i\leq k$
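The DFP update can be sketched in the same way. As before, $H^0=I$, the test data, the tolerance, and the closed-form exact line search for quadratics are assumptions of this sketch, and the function names are hypothetical.

```python
# DFP update on f(x) = 0.5 x^T Q x - b^T x (illustrative sketch).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def dfp_update(H, dx, dg):
    """H + dx dx^T / (dx^T dg) - (H dg)(H dg)^T / (dg^T H dg)."""
    Hdg = matvec(H, dg)
    a = dot(dx, dg)
    c = dot(dg, Hdg)
    n = len(dx)
    return [[H[i][j] + dx[i] * dx[j] / a - Hdg[i] * Hdg[j] / c
             for j in range(n)] for i in range(n)]

def dfp_quadratic(Q, b, x0):
    n = len(b)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # H^0 = I (assumed)
    x = x0[:]
    g = [q - bi for q, bi in zip(matvec(Q, x), b)]
    pairs = []                                     # stores (dx^i, dg^i)
    for _ in range(n):
        if dot(g, g) ** 0.5 < 1e-12:               # tolerance replaces "g^k = 0"
            break
        d = [-v for v in matvec(H, g)]             # d^k = -H^k g^k
        alpha = -dot(g, d) / dot(d, matvec(Q, d))  # exact line search on a quadratic
        dx = [alpha * di for di in d]
        x = [xi + s for xi, s in zip(x, dx)]
        g_new = [q - bi for q, bi in zip(matvec(Q, x), b)]
        dg = [a - c for a, c in zip(g_new, g)]
        H = dfp_update(H, dx, dg)
        pairs.append((dx, dg))
        g = g_new
    return x, H, pairs

Q = [[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]]
b = [3.0, 0.0, 1.0]
x_dfp, H_dfp, pairs = dfp_quadratic(Q, b, [0.0, 0.0, 0.0])

# Largest violation of H^{k+1} dg^i = dx^i over all stored pairs.
dfp_residual = max(abs(h - s)
                   for dx, dg in pairs
                   for h, s in zip(matvec(H_dfp, dg), dx))
print(x_dfp, dfp_residual)
```

Unlike the rank-one correction, both DFP denominators are positive here (${\Delta x^k}^T\Delta g^k={\Delta x^k}^TQ\Delta x^k>0$ for positive definite $Q$), so no guard is included in this sketch.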

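Summary

To keep the notes concise, some of the more tedious proofs are omitted from this lecture onward. This lecture started from the conjugate direction method mentioned in the previous lecture and then introduced the conjugate gradient method. Newton's method has several shortcomings, and quasi-Newton methods were proposed to address them. There are different ways to compute $H$ in a quasi-Newton method; this lecture covered the rank-one correction and the DFP algorithm.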
