Advanced Optimization Theory and Methods (IV)

Review of the Previous Lecture

Fixed Stepsize

$x^{k+1}=x^k-\alpha\nabla f(x^k)$

Steepest Descent

$x^{k+1}=x^k-\alpha^k \nabla f(x^k)$, where $\alpha^k=\arg\min_{\alpha\geq 0} f(x^k-\alpha\nabla f(x^k))$

Gradient Method

Analysis

Theorem 3

Thm: Fixed Stepsize (for $f(x)=\frac{1}{2}x^TQx-b^Tx$ with $Q=Q^T>0$): $x^k\rightarrow x^*$ for any $x^0$ $\Leftrightarrow$ $0<\alpha<\frac{2}{\lambda_{max}(Q)}$

Pf: “$\Leftarrow$”: Rayleigh's inequality: $\lambda_{min}(Q){g^k}^Tg^k\leq {g^k}^TQg^k\leq \lambda_{max}(Q){g^k}^Tg^k$

Applied to $Q^{-1}$, whose largest eigenvalue is $\lambda_{max}(Q^{-1})=\frac{1}{\lambda_{min}(Q)}$: ${g^k}^TQ^{-1}g^k\leq \frac{1}{\lambda_{min}(Q)}{g^k}^Tg^k$

$\Rightarrow r^k=\alpha\frac{{g^k}^TQg^k}{{g^k}^TQ^{-1}g^k}\left(2\frac{{g^k}^Tg^k}{{g^k}^TQg^k}-\alpha\right)\geq\alpha\frac{\lambda_{min}(Q){g^k}^Tg^k}{\lambda_{max}(Q^{-1}){g^k}^Tg^k}\left(2\frac{{g^k}^Tg^k}{\lambda_{max}(Q){g^k}^Tg^k}-\alpha\right)=\alpha \lambda_{min}^2(Q)\left(\frac{2}{\lambda_{max}(Q)}-\alpha\right)=:C>0$

$\Rightarrow \sum_{k=0}^{\infty} r^k=\infty$

$\Rightarrow x^k \rightarrow x^*$ for any $x^0$ (by the convergence theorem of the previous lecture: $x^k\rightarrow x^*$ for any $x^0$ iff $\sum_{k=0}^{\infty} r^k=\infty$)

Pf: “$\Rightarrow$”: Suppose instead that $\alpha\leq 0$ or $\alpha\geq\frac{2}{\lambda_{max}(Q)}$.

Choose $x^0$ such that $x^0-x^*$ is an eigenvector of $Q$ corresponding to $\lambda_{max}(Q)$.

$x^{k+1}=x^k-\alpha(Qx^k-b)=x^k-\alpha(Qx^k-Qx^*)$, since $Qx^*=b$

$\Rightarrow x^{k+1}-x^*=(I_n-\alpha Q)(x^k-x^*)=(I_n-\alpha Q)^{k+1}(x^0-x^*)$

Since $Q(x^0-x^*)=\lambda_{max}(Q)(x^0-x^*)$, we get $(I_n-\alpha Q)(x^0-x^*)=(1-\alpha\lambda_{max}(Q))(x^0-x^*)$; applying this $k+1$ times gives $x^{k+1}-x^*=(1-\alpha \lambda_{max}(Q))^{k+1}(x^0-x^*)$

$\|x^{k+1}-x^*\|=|1-\alpha \lambda_{max}(Q)|^{k+1}\|x^0-x^*\|$

$\because \alpha\leq 0$ or $\alpha\geq\frac{2}{\lambda_{max}(Q)}$

$\therefore |1-\alpha \lambda_{max}(Q)|\geq 1$, so $\|x^k-x^*\|\geq\|x^0-x^*\|>0$ for all $k$

$\therefore x^k$ does not converge to $x^*$
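A quick numerical check of Theorem 3, written as a minimal sketch in plain Python. The data are assumptions chosen for illustration, not from the lecture: $Q=\mathrm{diag}(1,4)$, so $\lambda_{max}(Q)=4$ and the threshold is $\frac{2}{\lambda_{max}(Q)}=0.5$; $b=[1,4]^T$, so $x^*=[1,1]^T$; $x^0=0$. The helper names (`fixed_step_gradient`, `dist`) are hypothetical.

```python
import math

def matvec(Q, x):
    """Dense matrix-vector product for small lists of lists."""
    return [sum(q * xj for q, xj in zip(row, x)) for row in Q]

def fixed_step_gradient(Q, b, x0, alpha, iters):
    """Run x^{k+1} = x^k - alpha * (Q x^k - b) for a fixed number of steps."""
    x = list(x0)
    for _ in range(iters):
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]  # gradient Qx - b
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

def dist(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

Q = [[1.0, 0.0], [0.0, 4.0]]   # lambda_max = 4, so 2 / lambda_max = 0.5
b = [1.0, 4.0]
x_star = [1.0, 1.0]            # solves Q x = b
x0 = [0.0, 0.0]

errors = {}
for alpha in (0.4, 0.5, 0.6):
    x = fixed_step_gradient(Q, b, x0, alpha, iters=50)
    errors[alpha] = dist(x, x_star)
    print(f"alpha = {alpha}: ||x^50 - x*|| = {errors[alpha]:.3e}")
```

With $\alpha=0.4$ both error components shrink by $|1-0.4\lambda|=0.6$ per step; at the boundary $\alpha=0.5$ the component along the $\lambda_{max}$ eigenvector only flips sign ($|1-\alpha\lambda_{max}(Q)|=1$), so the error stays at 1; at $\alpha=0.6$ that component grows by a factor 1.4 per step.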

Order of convergence

Def: Let $x^k\rightarrow x^*$, i.e. $\lim_{k\to\infty} \|x^k-x^*\|=0$. The order of convergence is $p\in \mathbb{R}$ if $0<\lim_{k\to\infty} \frac{\|x^{k+1}-x^*\|}{\|x^k-x^*\|^p}=c<\infty$

The order is $\infty$ if for all $p>0$: $\lim_{k\to\infty} \frac{\|x^{k+1}-x^*\|}{\|x^k-x^*\|^p}=0$

sublinear: $p=1$ and $\lim_{k\to\infty} \frac{\|x^{k+1}-x^*\|}{\|x^k-x^*\|}=1$

linear: $p=1$ and $\lim_{k\to\infty} \frac{\|x^{k+1}-x^*\|}{\|x^k-x^*\|}<1$

superlinear: $p>1$; in that case $\lim_{k\to\infty} \frac{\|x^{k+1}-x^*\|}{\|x^k-x^*\|}=0$

Note: the case $p=2$ is called quadratic convergence.

Example 1

$x^k=\frac{1}{k}$

$x^k\rightarrow 0=x^*$

$\frac{\frac{1}{k+1}}{(\frac{1}{k})^p}=\frac{k^p}{k+1}$

$p<1$: $\lim_{k\to\infty} \frac{k^p}{k+1}=0$

$p=1$: $\lim_{k\to\infty}\frac{k}{k+1}=1$, so the order is 1 and the convergence is sublinear.

Example 2

$x^k=r^k$ ($0<r<1$)

$x^*=0$

$\frac{r^{k+1}}{(r^k)^p}=r^{k(1-p)+1}$

$p<1$: $\lim_{k\to\infty} r^{k(1-p)+1}=0$

$p=1$: $\lim_{k\to\infty} r^{k(1-p)+1}=r<1$, so the order is 1 and the convergence is linear.

Example 3

$x^k=r^{q^k}$, $q>1$, $0<r<1$

$x^*=0$

$\frac{r^{q^{k+1}}}{(r^{q^k})^p}=r^{q^{k+1}-pq^k}=r^{(q-p)q^k}$

$p<q$: $\lim_{k\to\infty}r^{(q-p)q^k}=0$

$p=q$: $\lim_{k\to\infty}r^{(q-p)q^k}=r^0=1$, so the order is $q$ (superlinear).
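The three examples can also be checked numerically. The sketch below is an illustration, not part of the lecture: it assumes $r=0.5$ and $q=2$, and uses a three-point estimate derived from $\|x^{k+1}-x^*\|\approx c\|x^k-x^*\|^p$, namely $p\approx\frac{\log(e_{k+1}/e_k)}{\log(e_k/e_{k-1})}$ with $e_k=\|x^k-x^*\|$. The function names are hypothetical, and Example 3 is evaluated at small $k$ only, because $r^{q^k}$ underflows quickly in floating point.

```python
import math

def estimate_order(e_prev, e_curr, e_next):
    """Three-point estimate of p, assuming e_{k+1} ~ c * e_k**p for large k:
    p ~ log(e_{k+1} / e_k) / log(e_k / e_{k-1})."""
    return math.log(e_next / e_curr) / math.log(e_curr / e_prev)

# Errors |x^k - x*| with x* = 0; r = 0.5 and q = 2 are illustrative choices.
def example1(k):
    return 1.0 / k                 # Example 1: x^k = 1/k

def example2(k, r=0.5):
    return r ** k                  # Example 2: x^k = r^k

def example3(k, r=0.5, q=2):
    return r ** (q ** k)           # Example 3: x^k = r^(q^k)

p1 = estimate_order(example1(999), example1(1000), example1(1001))
p2 = estimate_order(example2(9), example2(10), example2(11))
p3 = estimate_order(example3(3), example3(4), example3(5))

# The p = 1 ratio separates sublinear (tends to 1) from linear (equals r < 1).
ratio1 = example1(1001) / example1(1000)
ratio2 = example2(11) / example2(10)
print(p1, p2, p3, ratio1, ratio2)
```

The estimator gives $p\approx1$ for both Example 1 and Example 2, so it cannot tell sublinear from linear on its own; the $p=1$ ratio does (close to 1 for Example 1, exactly $r=0.5$ for Example 2). For Example 3 it returns $q=2$.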

Example 4

$x^k=1$ for all $k$

$x^k\rightarrow x^*=1$

The numerator $|x^{k+1}-1|$ is $0$ for every $k$, so by convention the ratio $\frac{|x^{k+1}-1|}{|x^k-1|^p}$ is taken as $0$ for every $p>0$; hence the order is $p=\infty$.

Theorem

Notation: $\|x^{k+1}-x^*\|=O(\|x^k-x^*\|^p)$ means:

there exists $c\in \mathbb{R}$ such that for all sufficiently large $k$: $\|x^{k+1}-x^*\|\leq c\|x^{k}-x^*\|^p$

Thm: Let $x^k\rightarrow x^*$. If $\|x^{k+1}-x^*\|=O(\|x^k-x^*\|^p)$, then the order of convergence is at least $p$.

Pf: For large $k$, $\exists c$: $\frac{\|x^{k+1}-x^*\|}{\|x^{k}-x^*\|^p}\leq c$

$\frac{\|x^{k+1}-x^*\|}{\|x^{k}-x^*\|^s}=\frac{\|x^{k+1}-x^*\|}{\|x^{k}-x^*\|^p}\|x^{k}-x^*\|^{p-s}\leq c\|x^{k}-x^*\|^{p-s}$

If $s$ is the order of convergence, then $\lim_{k\to\infty} \frac{\|x^{k+1}-x^*\|}{\|x^k-x^*\|^s}>0$

$\Rightarrow c\lim_{k\to\infty} \|x^k-x^*\|^{p-s}\geq\lim_{k\to\infty} \frac{\|x^{k+1}-x^*\|}{\|x^k-x^*\|^s}>0$

$\because \lim_{k\to\infty} \|x^k-x^*\|=0$

$\therefore$ if $p>s$, then $c\lim_{k\to\infty} \|x^k-x^*\|^{p-s}=0$, a contradiction

$\Rightarrow s \geq p$

Theorem

Thm: Steepest Descent: in the worst case the order of convergence is exactly 1, i.e. there exist a quadratic $f$ and an $x^0$ for which the order cannot exceed 1.

Pf: Let $f(x)=\frac{1}{2}x^TQx-b^Tx$ with $\lambda_{max}(Q)>\lambda_{min}(Q)>0$, and $V(x)=\frac{1}{2}(x-x^*)^TQ(x-x^*)$.

Suffices to prove: $\exists c>0$ such that $\|x^{k+1}-x^*\|\geq c\|x^k-x^*\|$ for all $k$ (then the ratio with $p=1$ cannot tend to 0, so the order cannot be larger than 1).

$\because V(x^{k+1})=\frac{1}{2}(x^{k+1}-x^*)^TQ(x^{k+1}-x^*)\leq\frac{\lambda_{max}(Q)}{2}\|x^{k+1}-x^*\|^2$

and $V(x^k)\geq\frac{\lambda_{min}(Q)}{2}\|x^k-x^*\|^2$, while for steepest descent $V(x^{k+1})=(1-r^k)V(x^k)$ with $r^k=\frac{({g^k}^Tg^k)^2}{({g^k}^TQg^k)({g^k}^TQ^{-1}g^k)}$

$\therefore \|x^{k+1}-x^*\|\geq \sqrt{(1-r^k)\frac{\lambda_{min}(Q)}{\lambda_{max}(Q)}}\|x^k-x^*\|$

It remains to keep $r^k$ bounded away from 1. Note that $r^k<1 \Leftrightarrow g^k$ is not an eigenvector of $Q$. For $n=2$, if $g^0$ is not an eigenvector of $Q$, then consecutive gradients are orthogonal, so $g^{k+2}$ is parallel to $g^k$ and $r^k$ takes only two values, both $<1$.
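A minimal numerical sketch of this argument, assuming values picked for illustration (not from the lecture): a diagonal $Q=\mathrm{diag}(1,10)$, $b=0$ (so $x^*=0$) and $x^0=[10,1]^T$. At every step it checks $V(x^{k+1})=(1-r^k)V(x^k)$ and the lower bound $\sqrt{(1-r^k)\lambda_{min}/\lambda_{max}}$ on the error ratio. The function name `steepest_descent_stats` is hypothetical, and restricting to diagonal $Q$ only keeps the code short.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def steepest_descent_stats(q, x0, iters):
    """Steepest descent on f(x) = 1/2 x^T diag(q) x, so b = 0 and x* = 0.

    For every step k, returns a tuple
    (||x^{k+1}|| / ||x^k||, sqrt((1 - r^k) lmin / lmax), V(x^{k+1}) / V(x^k), 1 - r^k).
    """
    lmin, lmax = min(q), max(q)

    def V(x):  # V(x) = 1/2 (x - x*)^T Q (x - x*) with x* = 0
        return 0.5 * dot(x, [qi * xi for qi, xi in zip(q, x)])

    x = list(x0)
    stats = []
    for _ in range(iters):
        g = [qi * xi for qi, xi in zip(q, x)]               # g^k = Q x^k
        gg = dot(g, g)
        gQg = dot(g, [qi * gi for qi, gi in zip(q, g)])
        gQinvg = dot(g, [gi / qi for qi, gi in zip(q, g)])
        alpha = gg / gQg                                    # exact line search step
        r = gg ** 2 / (gQg * gQinvg)                        # r^k
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        ratio = math.sqrt(dot(x_new, x_new) / dot(x, x))
        bound = math.sqrt((1 - r) * lmin / lmax)
        stats.append((ratio, bound, V(x_new) / V(x), 1 - r))
        x = x_new
    return stats

stats = steepest_descent_stats(q=[1.0, 10.0], x0=[10.0, 1.0], iters=20)
for ratio, bound, v_ratio, one_minus_r in stats[:3]:
    print(f"ratio {ratio:.4f} >= bound {bound:.4f}; V ratio {v_ratio:.4f} vs 1 - r^k {one_minus_r:.4f}")
```

For this starting point the iterates zigzag: each step maps $x^k$ to $\frac{9}{11}$ times its mirror image, so the error ratio stays at $\frac{9}{11}$ forever and never tends to 0, consistent with order exactly 1.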

Newton Method

$f\in C^2$

$x^*$ satisfies the FONC: $\nabla f(x^*)=0$

One variable: $x^{k+1}=x^k-\frac{f'(x^k)}{f''(x^k)}$ $\Rightarrow$ in $\mathbb{R}^n$: $x^{k+1}=x^k-[F(x^k)]^{-1}\nabla f(x^k)$, where $F(x)$ denotes the Hessian of $f$

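Pros and Cons

Pros

Pro: simple; high order of convergence (at least 2 near $x^*$).

Cons

Con: $F(x^k)$ need not be positive definite (it may even be negative definite), and then the Newton direction need not be a descent direction;

even if $F(x^k)>0$, the full Newton step may fail to decrease $f$ (the direction is a descent direction, but the step can overshoot);

$F^{-1}(x^k)$ (or a linear system with $F(x^k)$) must be computed at every iteration.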

Convergence Order

Example

$f(x)=\frac{1}{2}x^TQx-b^Tx$, $Q=Q^T>0$

$\nabla f(x)=Qx-b$

$F(x)=Q$

$x^1=x^0-Q^{-1}(Qx^0-b)=x^0-x^0+Q^{-1}b=Q^{-1}b=x^*$, so for a quadratic, Newton's method reaches $x^*$ in one step from any $x^0$.
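A small sketch confirming the one-step property. It assumes, for illustration, $Q=\begin{bmatrix}4&2\\2&2\end{bmatrix}$, $b=[-1,1]^T$ (the same quadratic as the conjugate direction example later in these notes) and an arbitrary $x^0=[5,-3]^T$. The helpers `solve2` and `newton_step` are hypothetical names; the step solves $F(x)d=-\nabla f(x)$ instead of forming $F^{-1}$ explicitly.

```python
def solve2(A, v):
    """Solve the 2x2 linear system A x = v with Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(v[0] * A[1][1] - A[0][1] * v[1]) / det,
            (A[0][0] * v[1] - A[1][0] * v[0]) / det]

def newton_step(grad, hess, x):
    """One Newton step x - F(x)^{-1} grad f(x), via solving F(x) d = -grad f(x)."""
    g = grad(x)
    d = solve2(hess(x), [-g[0], -g[1]])
    return [x[0] + d[0], x[1] + d[1]]

Q = [[4.0, 2.0], [2.0, 2.0]]
b = [-1.0, 1.0]

def grad(x):   # grad f(x) = Q x - b
    return [Q[0][0] * x[0] + Q[0][1] * x[1] - b[0],
            Q[1][0] * x[0] + Q[1][1] * x[1] - b[1]]

def hess(x):   # F(x) = Q for a quadratic
    return Q

x1 = newton_step(grad, hess, [5.0, -3.0])
print(x1)  # equals Q^{-1} b
```

Here $Q^{-1}b=[-1,\frac{3}{2}]^T$, and the single step lands there regardless of the starting point.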

Theorem

Thm: $f\in C^3$, $x^*$: $\nabla f(x^*)=0$ and $F(x^*)$ invertible. Then, for all $x^0$ sufficiently close to $x^*$, Newton's method is well defined and $x^k$ converges to $x^*$ with order at least 2.

Pf: To prove: $\|x^{k+1}-x^*\|=O(\|x^k-x^*\|^2)$

$\|x^{1}-x^*\|=\|x^0-x^*-F^{-1}(x^0)\nabla f(x^0)\|=\|F^{-1}(x^0)(F(x^0)(x^0-x^*)-\nabla f(x^0))\|\leq \|F^{-1}(x^0)\|\cdot\|F(x^0)(x^0-x^*)-\nabla f(x^0)\|$

$\because F(x^*)$ is invertible and $F$ is continuous ($f\in C^3$), for $x^0$ sufficiently close to $x^*$, $F(x^0)$ is invertible and $\|F^{-1}(x^0)\|$ stays close to the constant $\|F^{-1}(x^*)\|$ $\Rightarrow \|F^{-1}(x^0)\|<c_2$ for some $c_2\in \mathbb{R}$

Taylor expansion of $\nabla f$ about $x^0$: $\nabla f(x)-\nabla f(x^0)=F(x^0)(x-x^0)+O(\|x-x^0\|^2)$

$\forall x, x^0$ with $\|x-x^*\|<\epsilon$, $\|x^0-x^*\|<\epsilon$: $\|\nabla f(x)-\nabla f(x^0)-F(x^0)(x-x^0)\|\leq c_1 \|x-x^0\|^2$ for some $c_1\in \mathbb{R}$

Taking $x=x^*$ (which lies in $\{x:\|x-x^*\|<\epsilon\}$): $\|\nabla f(x^*)-\nabla f(x^0)-F(x^0)(x^*-x^0)\|\leq c_1 \|x^*-x^0\|^2$

Since $\nabla f(x^*)=0$: $\|F(x^0)(x^0-x^*)-\nabla f(x^0)\|\leq c_1\|x^*-x^0\|^2$

$\Rightarrow \|x^1-x^*\|\leq c_1c_2\|x^0-x^*\|^2$

Likewise, as long as $x^k$ stays in this neighborhood: $\|x^{k+1}-x^*\|\leq c_1c_2\|x^k-x^*\|^2$

Let $0<\alpha<1$ and choose $x^0$ satisfying $\|x^0-x^*\|\leq\frac{\alpha}{c_1c_2}$ (and within the neighborhood). Then $\|x^{k+1}-x^*\|\leq c_1c_2\|x^k-x^*\|\cdot\|x^k-x^*\|\leq\alpha\|x^k-x^*\|$, so every $x^k$ stays in the neighborhood and

$\lim_{k\to\infty} \|x^k-x^*\|\leq \lim_{k\to\infty}\alpha^k \|x^0-x^*\|=0$. Hence $x^k\rightarrow x^*$, and by the previous theorem the order is at least 2.

Theorem

Thm: $x^k$: sequence generated by Newton's method. If $F(x^k)>0$ and $\nabla f(x^k)\neq0$, then for $d^k=-F^{-1}(x^k)\nabla f(x^k)$, there exists an $\overline{\alpha}>0$ s.t. $\forall \alpha\in (0,\overline{\alpha}): f(x^k+\alpha d^k)<f(x^k)$.

Pf: Let $\phi(\alpha)=f(x^k+\alpha d^k)$

$\phi'(\alpha)=\nabla f(x^k+\alpha d^k)^Td^k$

$\phi'(0)=\nabla f(x^k)^Td^k=-\nabla f(x^k)^TF^{-1}(x^k)\nabla f(x^k)$

$\because F(x^k)>0 \Rightarrow F^{-1}(x^k)>0$, and $\nabla f(x^k)\neq 0$

$\therefore \phi'(0)<0$

$\therefore \exists \overline{\alpha}>0$ s.t. $\forall \alpha\in (0,\overline{\alpha}): \phi(\alpha)<\phi(0)$

i.e. $\forall \alpha\in (0,\overline{\alpha}): f(x^k+\alpha d^k)<f(x^k)$

Modification

$x^{k+1}=x^k-\alpha^k F^{-1}(x^k)\nabla f(x^k)$, where $\alpha^k=\arg\min_{\alpha\geq 0} f(x^k-\alpha F^{-1}(x^k)\nabla f(x^k))$

If $F(x^k)$ is not positive definite:

Let $\lambda_1,\lambda_2,\cdots,\lambda_n$ be the eigenvalues of $F(x^k)$ with corresponding eigenvectors $v_1,v_2,\cdots,v_n$

Consider $G=F(x^k)+\mu I_n$, $\mu>0$

$Gv_i=(F(x^k)+\mu I_n)v_i=F(x^k)v_i+\mu v_i=\lambda_i v_i+\mu v_i=(\lambda_i+\mu)v_i\Rightarrow v_i$ is an eigenvector of $G$ corresponding to $\lambda_i+\mu$

Choose $\mu$ large enough s.t. $\lambda_i+\mu>0$ $\forall i$ $\Rightarrow G$ is positive definite

Modification: $x^{k+1}=x^k-\alpha^k(F(x^k)+\mu I_n)^{-1}\nabla f(x^k)$

$\alpha^k=\arg\min_{\alpha\geq 0} f(x^k-\alpha(F(x^k)+\mu I_n)^{-1}\nabla f(x^k))$
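A minimal sketch of the shift for $n=2$, assuming a made-up indefinite Hessian $F(x^k)=\mathrm{diag}(2,-2)$ and gradient $\nabla f(x^k)=[1,2]^T$ (illustrative values, not from the lecture). It picks $\mu=-\lambda_{min}(F)+\delta$ with a hypothetical safety margin $\delta=1$, so every eigenvalue of $G$ is at least $\delta$; it sets $\mu=0$ when $F$ is already positive definite, which is an assumption beyond the notes. Only the direction is computed; the line search for $\alpha^k$ is omitted.

```python
import math

def sym2_eigvals(F):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = F[0][0], F[0][1], F[1][1]
    mean, radius = (a + c) / 2, math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius

def solve2(A, v):
    """Solve the 2x2 linear system A x = v with Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(v[0] * A[1][1] - A[0][1] * v[1]) / det,
            (A[0][0] * v[1] - A[1][0] * v[0]) / det]

def shifted_newton_direction(F, g, delta=1.0):
    """Return (mu, d) with d = -(F + mu I)^{-1} g and F + mu I positive definite.

    mu = -lambda_min(F) + delta if F is not positive definite, else 0
    (delta is a hypothetical safety margin)."""
    lam_min, _ = sym2_eigvals(F)
    mu = -lam_min + delta if lam_min <= 0 else 0.0
    G = [[F[0][0] + mu, F[0][1]], [F[1][0], F[1][1] + mu]]
    return mu, solve2(G, [-g[0], -g[1]])

F = [[2.0, 0.0], [0.0, -2.0]]   # indefinite Hessian (illustrative)
g = [1.0, 2.0]                  # gradient (illustrative)

mu, d = shifted_newton_direction(F, g)
d_newton = solve2(F, [-g[0], -g[1]])        # plain Newton direction -F^{-1} g
slope = g[0] * d[0] + g[1] * d[1]           # grad^T d, negative means descent
slope_newton = g[0] * d_newton[0] + g[1] * d_newton[1]
print(mu, d, slope)
print(d_newton, slope_newton)
```

With these numbers the plain Newton direction $[-0.5,1]^T$ has $\nabla f^Td=1.5>0$ (an ascent direction), while the shifted direction $[-0.2,-2]^T$ gives $\nabla f^Td=-4.2<0$.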

Conjugate Direction Method

Def: $Q$: symmetric matrix in $\mathbb{R}^{n\times n}$. $d_0,\cdots,d_m$ are Q-conjugate if $\forall i\neq j$: $d_i^TQd_j=0$

Orthogonality is the special case $Q=I_n$: $x^Ty=0 \Leftrightarrow x^T I_n y=0$

Lemma

Lem: $Q$ symmetric, positive definite. $d_0,\cdots,d_k$: non-zero, Q-conjugate. Then $d_0,\cdots,d_k$ are linearly independent.

Pf: Suppose $a_0d_0+\cdots+a_kd_k=0$. For each $j$: $d_j^TQ(a_0d_0+\cdots+a_kd_k)=0$

By Q-conjugacy all terms with $i\neq j$ vanish: $a_jd_j^TQd_j=0$

$\because Q>0, d_j\neq0 \Rightarrow d_j^TQd_j>0$

$\therefore a_j=0$

Conjugate Direction Algorithm

Input: $f(x)=\frac{1}{2}x^TQx-b^Tx$ ($Q=Q^T>0$), $x^0$, $d_0,\cdots,d_{n-1}$: Q-conjugate

For $k=0,\cdots,n-1$:

$g^k=\nabla f(x^k)=Qx^k-b$

$\alpha^k=-\frac{{g^k}^Td_k}{d_k^TQd_k}$

$x^{k+1}=x^k+\alpha^kd_k$

Thm: For any $x^0$, CDA converges to $x^*$ in $n$ steps, i.e. $x^n=x^*$.

Pf: $\because d_0,\cdots,d_{n-1}$ are nonzero and Q-conjugate

$\therefore d_0,\cdots,d_{n-1}$ are linearly independent, hence a basis of $\mathbb{R}^n$

$\Rightarrow \exists \beta_0,\cdots,\beta_{n-1}: x^*-x^0=\beta_0d_0+\cdots+\beta_{n-1}d_{n-1}$

$\Rightarrow d_k^TQ(x^*-x^0)=\beta_kd_k^TQd_k$

$\Rightarrow \beta_k=\frac{d_k^TQ(x^*-x^0)}{d_k^TQd_k}=-\frac{d_k^Tg^k}{d_k^TQd_k}=\alpha^k$, where the second equality is shown next:

$x^k=x^0+\alpha^0d_0+\cdots+\alpha^{k-1}d_{k-1}$, so $d_k^TQ(x^k-x^0)=0$ by Q-conjugacy

$d_k^TQ(x^*-x^0)=d_k^TQ(x^*-x^k+x^k-x^0)=d_k^TQ(x^*-x^k)=d_k^T(Qx^*-Qx^k)=d_k^T(b-Qx^k)=-d_k^Tg^k$

$\therefore x^*=x^0+\alpha^0d_0+\cdots+\alpha^{n-1}d_{n-1}=x^n$

Example

$Q=\begin{bmatrix} 3&0&1 \\ 0&4&2\\ 1&2&3 \end{bmatrix}$

Compute $d_0,d_1,d_2$

$d_0=\begin{bmatrix} 1 \\ 0\\ 0 \end{bmatrix}$

$d_0^TQd_1=[1,0,0]\begin{bmatrix} 3&0&1 \\ 0&4&2\\ 1&2&3 \end{bmatrix}\begin{bmatrix} d_1^1 \\ d_1^2\\ d_1^3 \end{bmatrix}=3d_1^1+d_1^3=0$

$d_1=\begin{bmatrix} 1 \\ 0\\ -3 \end{bmatrix}$
Note: this shows how to obtain Q-conjugate vectors $d_0$ and $d_1$. First pick a simple $d_0$, then substitute it into $d_0^TQd_1=0$ to obtain a relation between the components of $d_1$, here $3d_1^1+d_1^3=0$, and finally pick $d_1=[1,0,-3]^T$.
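The note stops after $d_1$; a third direction $d_2$ must satisfy $d_0^TQd_2=0$ and $d_1^TQd_2=0$, i.e. be orthogonal to both $Qd_0$ and $Qd_1$ (using $Q=Q^T$). In $\mathbb{R}^3$ one convenient way, a choice made here rather than in the lecture, is to take the cross product $Qd_0\times Qd_1$. A minimal sketch with hypothetical helper names:

```python
def matvec(Q, x):
    """Dense matrix-vector product for small lists of lists."""
    return [sum(q * xj for q, xj in zip(row, x)) for row in Q]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    """Cross product in R^3: the result is orthogonal to both u and v."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
d0 = [1, 0, 0]
d1 = [1, 0, -3]

# d2 must satisfy (Q d0)^T d2 = 0 and (Q d1)^T d2 = 0 (Q is symmetric),
# so in R^3 it can be taken parallel to (Q d0) x (Q d1).
d2 = cross(matvec(Q, d0), matvec(Q, d1))
print(d2)
print(dot(d0, matvec(Q, d1)), dot(d0, matvec(Q, d2)), dot(d1, matvec(Q, d2)))
```

Here $Qd_0=[3,0,1]^T$ and $Qd_1=[0,-6,-8]^T$, giving $d_2=[6,24,-18]^T$, i.e. $d_2\propto[1,4,-3]^T$. By hand: $d_0^TQ[1,4,-3]^T=3-3=0$ and $d_1^TQ[1,4,-3]^T=-24+24=0$.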
Applying the CDA to $f(x)=\frac{1}{2}x^T\begin{bmatrix} 4&2 \\ 2&2 \end{bmatrix}x-[-1,1]x$:

$g(x)=\begin{bmatrix} 4&2 \\ 2&2 \end{bmatrix}x-\begin{bmatrix} -1 \\ 1 \end{bmatrix}$

$x^0=\begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $d_0=\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $d_1=\begin{bmatrix} -\frac{3}{8} \\ \frac{3}{4} \end{bmatrix}$ (check: $d_0^TQd_1=4\cdot(-\frac{3}{8})+2\cdot\frac{3}{4}=0$)

$g^0=\begin{bmatrix} 1 \\ -1 \end{bmatrix}$

$\alpha^0=-\frac{{g^0}^Td_0}{d_0^TQd_0}=-\frac{[1,-1]\begin{bmatrix} 1 \\ 0 \end{bmatrix}}{[1,0]\begin{bmatrix} 4&2 \\ 2&2 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}}=-\frac{1}{4}$

$x^1=x^0+\alpha^0d_0=\begin{bmatrix} -\frac{1}{4} \\ 0 \end{bmatrix}$

$g^1=\begin{bmatrix} 0 \\ -\frac{3}{2} \end{bmatrix}$

$\alpha^1=-\frac{{g^1}^Td_1}{d_1^TQd_1}=\frac{9/8}{9/16}=2$

$x^2=x^1+\alpha^1d_1=\begin{bmatrix} -1 \\ \frac{3}{2} \end{bmatrix}$

$g^2=0$, so $x^2=x^*$, with $f(x^2)=-\frac{5}{4}$
Note: for a quadratic function with $Q>0$, the CDA reaches the minimizer after at most $n$ iterations (here $n=2$).
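A minimal sketch of the conjugate direction algorithm, replaying the worked example with exact rational arithmetic (Python's `fractions.Fraction`, used here only to avoid rounding). The helper names are hypothetical, and the code assumes, as the algorithm does, that the given directions are nonzero and Q-conjugate.

```python
from fractions import Fraction as Fr

def matvec(Q, x):
    return [sum(q * xj for q, xj in zip(row, x)) for row in Q]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_directions(Q, b, x0, directions):
    """Conjugate direction algorithm for f(x) = 1/2 x^T Q x - b^T x.

    Assumes Q is symmetric positive definite and the directions are
    nonzero and Q-conjugate. Returns the final iterate and, per step,
    the tuple (g^k, alpha^k, x^{k+1})."""
    x = list(x0)
    trace = []
    for d in directions:
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]   # g^k = Q x^k - b
        alpha = -dot(g, d) / dot(d, matvec(Q, d))           # alpha^k
        x = [xi + alpha * di for xi, di in zip(x, d)]       # x^{k+1} = x^k + alpha^k d_k
        trace.append((g, alpha, x))
    return x, trace

Q = [[Fr(4), Fr(2)], [Fr(2), Fr(2)]]
b = [Fr(-1), Fr(1)]
d0, d1 = [Fr(1), Fr(0)], [Fr(-3, 8), Fr(3, 4)]
x_n, trace = conjugate_directions(Q, b, [Fr(0), Fr(0)], [d0, d1])

for k, (g, alpha, x) in enumerate(trace):
    print(f"k={k}: g = {[str(v) for v in g]}, alpha = {alpha}, next x = {[str(v) for v in x]}")

g_final = [qx - bi for qx, bi in zip(matvec(Q, x_n), b)]   # gradient at x^2
f_final = dot(x_n, matvec(Q, x_n)) / 2 - dot(b, x_n)       # f(x^2)
print([str(v) for v in g_final], f_final)
```

It reproduces $\alpha^0=-\frac{1}{4}$, $g^1=[0,-\frac{3}{2}]^T$, $\alpha^1=2$ and $x^2=[-1,\frac{3}{2}]^T$, with $\nabla f(x^2)=0$ and $f(x^2)=-\frac{5}{4}$.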

Summary

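This lecture first continued the analysis of the gradient methods from the previous lecture. It then introduced the order of convergence, which makes it possible to compare how fast different methods converge. Finally, it introduced Newton's method and the conjugate direction method.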
