Conjugate Direction Algorithm
IN: $x^0, d_0, \cdots, d_{n-1}$, with $d_0, \cdots, d_{n-1}$ Q-conjugate
Property:
$$x^{k+1}=x^k+\alpha^kd_k, \qquad \alpha^k=-\frac{{g^k}^Td_k}{{d_k}^TQd_k}$$
where $g^k=\nabla f(x^k)=Qx^k-b$.
Lemma 1
In the conjugate direction algorithm, ${g^{k+1}}^Td_i=0$ for all $0\leq k\leq n-1$ and all $0\leq i\leq k$.
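A minimal Python sketch of the conjugate direction iteration, meant only as a numerical check of Lemma 1. The Q-conjugate directions are built by Gram-Schmidt conjugation of the standard basis, which is an assumption made here for illustration (the notes take the directions as given). The helper names and the starting point are hypothetical; $Q$ and $b$ are the ones from the Example section.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in A]


def q_conjugate_basis(Q):
    """Gram-Schmidt conjugation of the standard basis with respect to Q."""
    n = len(Q)
    dirs = []
    for i in range(n):
        e = [1.0 if j == i else 0.0 for j in range(n)]
        d = e[:]
        for p in dirs:
            Qp = matvec(Q, p)
            coef = dot(e, Qp) / dot(p, Qp)
            d = [dj - coef * pj for dj, pj in zip(d, p)]
        dirs.append(d)
    return dirs


def conjugate_direction(Q, b, x0, dirs):
    """Iterate x^{k+1} = x^k + alpha^k d_k; track max |g^{k+1}^T d_i| for i <= k."""
    x = list(x0)
    worst = 0.0
    for k, d in enumerate(dirs):
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]      # g^k = Q x^k - b
        alpha = -dot(g, d) / dot(d, matvec(Q, d))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_next = [qx - bi for qx, bi in zip(matvec(Q, x), b)]  # g^{k+1}
        for i in range(k + 1):
            worst = max(worst, abs(dot(g_next, dirs[i])))
    return x, worst


Q = [[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]]
b = [3.0, 0.0, 1.0]
dirs = q_conjugate_basis(Q)
# Hypothetical starting point, chosen so that no step is trivially zero.
x_final, worst = conjugate_direction(Q, b, [2.0, -1.0, 1.0], dirs)
print(x_final, worst)
```

Up to rounding, `worst` should be zero and `x_final` should solve $Qx=b$ after $n=3$ steps.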
Lemma 2
Lemma: $f(x^{k+1})=\min_{\alpha^0,\cdots,\alpha^k} f(x^0+\sum_{i=0}^k \alpha^i d_i)$
Pf: Let $D^k=[d_0,\cdots, d_{k-1}]$ be an $n\times k$ matrix,
$x(\alpha)=x^0+D^k\alpha, \alpha \in \mathbb{R}^k$
$\phi^k(\alpha)=f(x(\alpha))=f(x^0+D^k\alpha)$
$D\phi^k(\alpha)=\nabla f(x^0+D^k\alpha)^TD^k$
Let $\overline{\alpha}=[\alpha^0,\alpha^1,\cdots,\alpha^{k-1}]^T$, with the $\alpha^i$ generated by the algorithm.
$D\phi^k(\overline{\alpha})=\nabla f(x^0+D^k\overline{\alpha})^TD^k=\nabla f(x^k)^TD^k={g^k}^TD^k$
$\because$ Lemma 1 $\Rightarrow {g^k}^Td_i=0, \forall i\leq k-1$
$\therefore D\phi^k(\overline{\alpha})=0$, so $\overline{\alpha}$ satisfies the FONC. Since $\phi^k$ is a convex quadratic when $Q>0$, the FONC is also sufficient, so $x^k$ minimizes $f$ over $x^0+\mathrm{span}\{d_0,\cdots,d_{k-1}\}$. The same argument with $k+1$ in place of $k$ gives the lemma.
Conjugate Gradient Algorithm
IN: $f(x)=\frac{1}{2}x^TQx-b^Tx$, $x^0$
1. $k=0$
2. Compute $g^0=\nabla f(x^0)$. If $g^0=0$, then stop; else $d_0=-g^0$
3. $\alpha^k=-\frac{{g^k}^Td_k}{{d_k}^TQd_k}$
4. $x^{k+1}=x^k+\alpha^kd_k$
5. $g^{k+1}=\nabla f(x^{k+1})$. If $g^{k+1}=0$, then stop
6. $\beta^k=\frac{{g^{k+1}}^TQd_k}{{d_k}^TQd_k}$
7. $d_{k+1}=-g^{k+1}+\beta^kd_k$
8. $k:=k+1$, go to step 3
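The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a tuned solver: $Q$ is assumed symmetric positive definite, the test $g^k=0$ is replaced by a small tolerance `tol` (an assumption for floating point), and the function names and the 2x2 test system are hypothetical.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in A]


def conjugate_gradient(Q, b, x0, tol=1e-12):
    """Minimize 0.5 x^T Q x - b^T x; returns the iterate and the directions used."""
    x = list(x0)
    g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]   # g^0 = Q x^0 - b
    d = [-gi for gi in g]                               # d_0 = -g^0
    directions = []
    for _ in range(len(b)):                             # at most n steps
        if dot(g, g) ** 0.5 <= tol:                     # stand-in for g^k = 0
            break
        Qd = matvec(Q, d)
        dQd = dot(d, Qd)
        alpha = -dot(g, d) / dQd                        # step 3
        x = [xi + alpha * di for xi, di in zip(x, d)]   # step 4
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]  # step 5
        directions.append(d)
        beta = dot(g, Qd) / dQd                         # step 6
        d = [-gi + beta * di for gi, di in zip(g, d)]   # step 7
    return x, directions


# Hypothetical 2x2 test system (not from the notes).
Q = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_star, directions = conjugate_gradient(Q, b, [2.0, 1.0])
print(x_star)
```

Returning the directions makes it easy to check numerically that $d_0^TQd_1\approx 0$, which is the content of the theorem that follows.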
Theorem
Thm: $d_0,\cdots, d_{n-1}$ computed in the Conjugate Gradient Algorithm are Q-conjugate.
Pf:
$n=2$:
${d_0}^TQd_1={d_0}^TQ(-g^1+\beta^0d_0)={d_0}^TQ\left(-g^1+\frac{{g^1}^TQd_0}{{d_0}^TQd_0}d_0\right)=-{d_0}^TQg^1+{g^1}^TQd_0=0$
$n=k+1$:
To prove: ${d_{k+1}}^TQd_j=0, \forall j\leq k$
(Known: ${d_k}^TQd_j=0, \forall j\leq k-1$)
${d_{k+1}}^TQd_j=(-g^{k+1}+\beta^kd_k)^TQd_j=-{g^{k+1}}^TQd_j+\beta^k{d_k}^TQd_j=-{g^{k+1}}^TQd_j$ for $j\leq k-1$
$\because g^{j+1}=Qx^{j+1}-b=Qx^j+\alpha^jQd_j-b=g^j+\alpha^jQd_j$
$\therefore {d_{k+1}}^TQd_j=-{g^{k+1}}^T\frac{g^{j+1}-g^j}{\alpha^j}, j\leq k-1$
$\because \forall j \leq k: d_j=-g^j+\beta^{j-1}d_{j-1}$ (with $d_0=-g^0$)
$\therefore 0={g^{k+1}}^Td_j=-{g^{k+1}}^Tg^j+\beta^{j-1}{g^{k+1}}^Td_{j-1}=-{g^{k+1}}^Tg^j$ (both ${g^{k+1}}^Td_j$ and ${g^{k+1}}^Td_{j-1}$ vanish by Lemma 1)
$\therefore {g^{k+1}}^Tg^j=0, \forall j\leq k$
$\therefore {d_{k+1}}^TQd_j=0, \forall j\leq k-1$
${d_{k+1}}^TQd_k=0$ (the same computation as for $n=2$)
Example
$f(x)=\frac{3}{2} x_1^2+2x_2^2+\frac{3}{2}x_3^2+x_1x_3+2x_2x_3-3x_1-x_3$
$$Q=\begin{bmatrix} 3&0&1 \\ 0&4&2\\ 1&2&3 \end{bmatrix}$$
$$b=\begin{bmatrix} 3\\ 0\\ 1 \end{bmatrix}$$
$$x^0=\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}$$
$$g^0=Qx^0-b=\begin{bmatrix} -3 \\ 0\\ -1 \end{bmatrix}$$
$$d_0=\begin{bmatrix} 3\\ 0\\ 1 \end{bmatrix}$$
$$\alpha^0=-\frac{[-3,0,-1]\begin{bmatrix} 3\\ 0\\ 1 \end{bmatrix}}{[3,0,1]Q\begin{bmatrix} 3\\ 0\\ 1 \end{bmatrix}}=\frac{10}{36}=0.2778$$
$$x^1=x^0+\alpha^0d_0=\begin{bmatrix} 0.8333\\ 0\\ 0.2778 \end{bmatrix}$$
$$\cdots$$
$$x^3=\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}$$
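The first step of this example can be checked in exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the variable names are hypothetical. It computes only $g^0$, $d_0$, $\alpha^0$ and $x^1$, and confirms that the stated final point has zero gradient. It does not reproduce the intermediate iterations that the notes elide.

```python
from fractions import Fraction as F


def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in A]


Q = [[F(3), F(0), F(1)], [F(0), F(4), F(2)], [F(1), F(2), F(3)]]
b = [F(3), F(0), F(1)]
x0 = [F(0), F(0), F(0)]

g0 = [qx - bi for qx, bi in zip(matvec(Q, x0), b)]   # g^0 = Q x^0 - b
d0 = [-gi for gi in g0]                               # d_0 = -g^0
alpha0 = -dot(g0, d0) / dot(d0, matvec(Q, d0))        # exact step length
x1 = [xi + alpha0 * di for xi, di in zip(x0, d0)]

# Gradient at the stated final point x^3 = (1, 0, 0).
g_at_x3 = [qx - bi for qx, bi in zip(matvec(Q, [F(1), F(0), F(0)]), b)]

print(alpha0, [float(v) for v in x1], g_at_x3)
```

Working with fractions avoids any doubt about rounding in the quoted four-digit values; here $\alpha^0=10/36=5/18$.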
Non-Quadratic function f
Problems: Computation of $\alpha^k, \beta^k$
Solutions:
$\alpha^k=\arg\min_{\alpha>0} f(x^k+\alpha d_k)\rightarrow$ one-dimensional search
$\beta^k$: one of the following formulas, none of which needs $Q$
Hestenes-Stiefel formula:
$x^{k+1}=x^k+\alpha^kd_k\Rightarrow Qx^{k+1}-b=Qx^k-b+\alpha^kQd_k\Rightarrow g^{k+1}-g^k=\alpha^kQd_k\Rightarrow Qd_k=\frac{g^{k+1}-g^k}{\alpha^k}$
$\beta^k=\frac{{g^{k+1}}^TQd_k}{{d_k}^TQd_k}=\frac{{g^{k+1}}^T(g^{k+1}-g^k)}{{d_k}^T(g^{k+1}-g^k)}$
Polak-Ribière formula:
$\because {g^k}^Td_k=-{g^k}^Tg^k+\beta^{k-1}{g^k}^Td_{k-1}=-{g^k}^Tg^k$ and ${g^{k+1}}^Td_k=0$, the Hestenes-Stiefel denominator becomes ${d_k}^T(g^{k+1}-g^k)={g^k}^Tg^k$
$\therefore \beta^k=\frac{{g^{k+1}}^T(g^{k+1}-g^k)}{{g^k}^Tg^k}$
Fletcher-Reeves formula:
$\because 0={g^{k+1}}^Td_k=-{g^{k+1}}^Tg^k+\beta^{k-1}{g^{k+1}}^Td_{k-1}\Rightarrow {g^{k+1}}^Tg^k=0$
$\therefore \beta^k=\frac{{g^{k+1}}^Tg^{k+1}}{{g^k}^Tg^k}$
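The derivations above imply that, on a quadratic with exact line search, all three formulas give the same $\beta^k$ as the original $Q$-based expression. The sketch below checks this numerically on the quadratic from the Example section. The function name `beta_variants` is hypothetical. On a non-quadratic $f$ with an inexact one-dimensional search the three values generally differ, which this sketch does not cover.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in A]


def beta_variants(Q, b, x0):
    """Run CG on a quadratic; return (Q-based, HS, PR, FR) values of beta^k."""
    x = list(x0)
    g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
    d = [-gi for gi in g]
    rows = []
    for _ in range(len(b) - 1):
        Qd = matvec(Q, d)
        alpha = -dot(g, d) / dot(d, Qd)               # exact line search
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
        y = [gn - go for gn, go in zip(g_new, g)]     # g^{k+1} - g^k
        q_based = dot(g_new, Qd) / dot(d, Qd)
        hs = dot(g_new, y) / dot(d, y)                # Hestenes-Stiefel
        pr = dot(g_new, y) / dot(g, g)                # Polak-Ribiere
        fr = dot(g_new, g_new) / dot(g, g)            # Fletcher-Reeves
        rows.append((q_based, hs, pr, fr))
        d = [-gn + q_based * di for gn, di in zip(g_new, d)]
        g = g_new
    return rows


Q = [[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]]
b = [3.0, 0.0, 1.0]
rows = beta_variants(Q, b, [0.0, 0.0, 0.0])
for k, row in enumerate(rows):
    print(k, row)
```

Each printed row should contain four values that agree up to rounding.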
Quasi-Newton Methods
Review of Newton's Method
$x^{k+1}=x^k-{F(x^k)}^{-1}\nabla f(x^k), f\in C^3$
Advantages of Newton's method: simple, widely applicable, fast convergence
Problems
not a descent method in general, even if ${F(x^k)}^{-1}>0$ $\Rightarrow$ add a step size: $x^{k+1}=x^k-\alpha^k{F(x^k)}^{-1}\nabla f(x^k)$, where $\alpha^k=\arg\min_{\alpha>0}f(x^k-\alpha{F(x^k)}^{-1}\nabla f(x^k))$
$F(x^k)$ may not be positive definite: $\Rightarrow G=({F(x^k)}^{-1}+\mu I_n)$
computation of ${F(x^k)}^{-1}\nabla f(x^k)$ is expensive
Solution:
Construct $H^k$ (real-valued, positive definite, symmetric)
$g^k=\nabla f(x^k)$
$d^k=-H^kg^k$
$\alpha^k=\arg\min_{\alpha} f(x^k+\alpha d^k)$
$x^{k+1}=x^k+\alpha^kd^k$
Theorem
If $g^k\neq0$ and $H^k$ is an $n\times n$ symmetric positive definite matrix, then $f(x^{k+1})<f(x^k)$
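One way to see why this holds is a directional-derivative computation. The sketch below assumes that $f$ is continuously differentiable and that $\alpha^k$ is the exact line-search step of the scheme above.

$$
\frac{d}{d\alpha}f(x^k+\alpha d^k)\Big|_{\alpha=0}={g^k}^Td^k=-{g^k}^TH^kg^k<0
$$

So $f(x^k+\alpha d^k)<f(x^k)$ for all sufficiently small $\alpha>0$, and the minimizing $\alpha^k$ can only decrease $f$ further.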
Theorem
Quadratic functions: $f(x)=\frac{1}{2}x^TQx-b^Tx$
Thm: Apply a quasi-Newton method to a quadratic function with $Q=Q^T$, such that $H^{k+1}\Delta g^i=\Delta x^i, \forall 0\leq i\leq k$, where $\Delta g^i=g^{i+1}-g^i, \Delta x^i=x^{i+1}-x^i$. If $\alpha^i\neq 0$ for all $0\leq i\leq k$, then $d^0,\cdots,d^{k+1}$ are Q-conjugate.
Corollary
Applying a quasi-Newton method to a quadratic function in $n$ variables, the iteration converges in at most $n$ steps.
Computation of H
Rank-One-Correction
$H^{k+1}=H^k+a^kz^k{z^k}^T, a^k\in \mathbb{R},z^k\in \mathbb{R}^n$
$$z^k{z^k}^T=\begin{bmatrix} z_1^k\\ \vdots\\ z_n^k \end{bmatrix}\begin{bmatrix} z_1^k & \cdots & z_n^k \end{bmatrix}=\begin{bmatrix} z_1^kz_1^k&\cdots & z_1^kz_n^k\\ \vdots& & \vdots\\ z_n^kz_1^k & \cdots & z_n^kz_n^k \end{bmatrix}$$
$$\mathrm{rank}(z^k{z^k}^T)=1$$
$H^{k+1}$ as a function of $H^k, \Delta g^k, \Delta x^k$:
$$H^{k+1}=H^k+\frac{(\Delta x^k-H^k\Delta g^k)(\Delta x^k-H^k\Delta g^k)^T}{{\Delta g^k}^T(\Delta x^k-H^k\Delta g^k)}$$
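A short sketch of where this formula comes from, assuming the update is required to satisfy $H^{k+1}\Delta g^k=\Delta x^k$ and that ${z^k}^T\Delta g^k\neq 0$. Substituting $H^{k+1}=H^k+a^kz^k{z^k}^T$:

$$
\begin{aligned}
H^{k+1}\Delta g^k=\Delta x^k
&\Rightarrow a^k\left({z^k}^T\Delta g^k\right)z^k=\Delta x^k-H^k\Delta g^k \\
&\Rightarrow a^kz^k{z^k}^T=\frac{(\Delta x^k-H^k\Delta g^k)(\Delta x^k-H^k\Delta g^k)^T}{a^k\left({z^k}^T\Delta g^k\right)^2}
\end{aligned}
$$

Multiplying the first implication on the left by ${\Delta g^k}^T$ gives ${\Delta g^k}^T(\Delta x^k-H^k\Delta g^k)=a^k\left({z^k}^T\Delta g^k\right)^2$, which is exactly the denominator in the update.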
Rank-One-Correction Algorithm
IN: $x^0, H^0$
1. $k:=0$
2. If $g^k=0$, then stop; else $d^k=-H^kg^k$
3. Compute $\alpha^k=\arg\min_{\alpha} f(x^k+\alpha d^k)$, $x^{k+1}=x^k+\alpha^kd^k$
4. Compute $\Delta x^k=\alpha^kd^k$, $\Delta g^k=g^{k+1}-g^k$, $H^{k+1}=H^k+\frac{(\Delta x^k-H^k\Delta g^k)(\Delta x^k-H^k\Delta g^k)^T}{{\Delta g^k}^T(\Delta x^k-H^k\Delta g^k)}$
5. $k:=k+1$, go to step 2
Theorem
Apply the rank-one correction algorithm to a quadratic function with $Q=Q^T$; then $H^{k+1}\Delta g^i=\Delta x^i, \forall i\leq k$
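A minimal Python sketch of the rank-one correction algorithm on a quadratic, where the line search has the closed form $\alpha^k=-{g^k}^Td^k/({d^k}^TQd^k)$. The function names, the tolerance, and the small test problem are hypothetical. The update is skipped when its denominator is tiny relative to the vectors involved; that safeguard is an assumption added here for floating-point safety and is not part of the algorithm as stated.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in A]


def rank_one_update(H, dx, dg, rel_tol=1e-8):
    """Return H + r r^T / (dg^T r) with r = dx - H dg, or H if dg^T r is tiny."""
    r = [dxi - hi for dxi, hi in zip(dx, matvec(H, dg))]
    denom = dot(dg, r)
    if abs(denom) <= rel_tol * (dot(dg, dg) * dot(r, r)) ** 0.5:
        return [row[:] for row in H]   # safeguard (assumption, not in the notes)
    n = len(H)
    return [[H[i][j] + r[i] * r[j] / denom for j in range(n)] for i in range(n)]


def rank_one_quasi_newton(Q, b, x0, tol=1e-12):
    """Rank-one quasi-Newton iteration for f(x) = 0.5 x^T Q x - b^T x."""
    n = len(b)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # H^0 = I
    g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
    pairs = []                                        # stored (dx, dg) pairs
    for _ in range(n):
        if dot(g, g) ** 0.5 <= tol:                   # stand-in for g^k = 0
            break
        d = [-v for v in matvec(H, g)]
        alpha = -dot(g, d) / dot(d, matvec(Q, d))     # exact step for a quadratic
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [qx - bi for qx, bi in zip(matvec(Q, x_new), b)]
        dx = [a - c for a, c in zip(x_new, x)]
        dg = [a - c for a, c in zip(g_new, g)]
        H = rank_one_update(H, dx, dg)
        pairs.append((dx, dg))
        x, g = x_new, g_new
    return x, H, pairs


# Hypothetical test problem: f(x) = x_1^2 + 0.5 x_2^2, so Q = diag(2, 1), b = 0.
Q = [[2.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x_min, H_final, pairs = rank_one_quasi_newton(Q, b, [1.0, 2.0])
print(x_min, H_final)
```

On this problem the final `H_final` should be close to $Q^{-1}$, and `matvec(H_final, dg)` should reproduce `dx` for every stored pair, in line with the theorem.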
Problems
$H^k$ positive definite?
${\Delta g^k}^T(\Delta x^k-H^k\Delta g^k)$ too small?
DFP algorithm
$$H^{k+1}=H^k+\frac{\Delta x^k{\Delta x^k}^T}{{\Delta x^k}^T\Delta g^k}-\frac{(H^k\Delta g^k)(H^k\Delta g^k)^T}{{\Delta g^k}^TH^k\Delta g^k}$$
Theorem
Apply to quadratic functions: $H^{k+1}\Delta g^i=\Delta x^i, \forall i\leq k$
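A minimal Python sketch of the DFP update, run for $n=2$ steps on a small quadratic with exact line search. The function name `dfp_update` and the test problem are hypothetical. No safeguard against zero denominators is included, so the sketch assumes ${\Delta x^k}^T\Delta g^k\neq 0$ and ${\Delta g^k}^TH^k\Delta g^k\neq 0$ at every step.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in A]


def dfp_update(H, dx, dg):
    """H + dx dx^T / (dx^T dg) - (H dg)(H dg)^T / (dg^T H dg)."""
    Hdg = matvec(H, dg)
    dx_dg = dot(dx, dg)
    dg_Hdg = dot(dg, Hdg)
    n = len(H)
    return [[H[i][j] + dx[i] * dx[j] / dx_dg - Hdg[i] * Hdg[j] / dg_Hdg
             for j in range(n)] for i in range(n)]


# Hypothetical test problem: f(x) = 0.5 x^T Q x - b^T x.
Q = [[4.0, 2.0], [2.0, 2.0]]
b = [-1.0, 1.0]
x = [0.0, 0.0]
H = [[1.0, 0.0], [0.0, 1.0]]                          # H^0 = I
g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
pairs = []                                            # stored (dx, dg) pairs

for _ in range(len(b)):
    d = [-v for v in matvec(H, g)]                    # d^k = -H^k g^k
    alpha = -dot(g, d) / dot(d, matvec(Q, d))         # exact step for a quadratic
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    g_new = [qx - bi for qx, bi in zip(matvec(Q, x_new), b)]
    dx = [a - c for a, c in zip(x_new, x)]
    dg = [a - c for a, c in zip(g_new, g)]
    H = dfp_update(H, dx, dg)
    pairs.append((dx, dg))
    x, g = x_new, g_new

print(x, H)
```

For this problem the loop should end at the minimizer $x=(-1, 3/2)$ with $H$ equal to $Q^{-1}$, which matches the $n$-step convergence stated in the corollary.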
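Summary
To keep the notes concise, some of the more tedious proofs are gradually omitted from this lecture onward. This lecture starts from the conjugate direction method mentioned in the previous lecture and then introduces the conjugate gradient method. Newton's method has some shortcomings, and quasi-Newton methods were proposed to address them. There are several ways to compute the matrix $H$ in a quasi-Newton method; this lecture covers the rank-one correction and the DFP algorithm.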