Advanced Optimization Theory and Methods (3)

Gradient Method

方法推导

Fixed-Stepsize Gradient

$x^{k+1}=x^k+\alpha^k d^k$, where $x^0\in\mathbb{R}^n$ is the initial point, $\alpha^k\in\mathbb{R}$ the stepsize, and $d^k\in\mathbb{R}^n$ the search direction.

To pick the direction, maximize the directional derivative: $\max_d \frac{\partial f(x)}{\partial d}=\max_d \nabla f(x)^T d$.

$f(x^{k+1})=f(x^k-\alpha^k d^k)$

By the Cauchy–Schwarz inequality, $\nabla f(x)^T d=\langle\nabla f(x),d\rangle\leq\|\nabla f(x)\|\cdot\|d\|$.

Equality $\langle\nabla f(x),d\rangle=\|\nabla f(x)\|\cdot\|d\|$ holds iff $d$ is a nonnegative multiple of $\nabla f(x)$.

W.l.o.g. take $\|d\|=1$, so

$d=\dfrac{\nabla f(x)}{\|\nabla f(x)\|}$, and $\left\langle\nabla f(x),\dfrac{\nabla f(x)}{\|\nabla f(x)\|}\right\rangle=\|\nabla f(x)\|$.

Fixed-stepsize gradient: $x^{k+1}=x^k-\alpha\dfrac{\nabla f(x^k)}{\|\nabla f(x^k)\|}$
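As a concrete illustration, here is a minimal sketch of the fixed-stepsize update with a normalized direction; the function name, tolerances, and test problem are illustrative choices, not from the lecture:

```python
import numpy as np

def fixed_stepsize_gradient(grad, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Fixed-stepsize gradient method with a normalized direction:
    x^{k+1} = x^k - alpha * grad f(x^k) / ||grad f(x^k)||."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        gn = np.linalg.norm(g)
        if gn < tol:              # FONC: gradient (nearly) zero
            break
        x = x - alpha * g / gn    # step of fixed length alpha
    return x
```

Because the step length is fixed at $\alpha$, the iterates can generally only approach the minimizer to within about $\alpha$, which is one motivation for letting the stepsize vary.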

Steepest Descent Method

$x^{k+1}=x^k-\alpha^k\nabla f(x^k)$, where $\alpha^k=\arg\min_{\alpha\geq0} f(x^k-\alpha\nabla f(x^k))$.

This inner minimization can be viewed through a composite function, turning it into a one-dimensional optimization problem solvable with the methods of the previous lecture (e.g., Newton's method). The composite function is:

$x^k(\alpha)=x^k-\alpha\nabla f(x^k)$

$\phi^k(\alpha)=f(x^k(\alpha))$
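A sketch of steepest descent built on this one-dimensional reduction; the golden-section search below merely stands in for whatever 1-D method one prefers (e.g., Newton's method), and the bracket $[0,10]$ is an arbitrary assumption:

```python
import numpy as np

def golden_section(phi, lo, hi, tol=1e-8):
    """Minimize a unimodal 1-D function phi on [lo, hi] by golden-section search."""
    invphi = (np.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=200):
    """Steepest descent: each step minimizes the 1-D composite
    phi_k(alpha) = f(x^k - alpha * grad f(x^k)) over alpha >= 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        alpha = golden_section(lambda a: f(x - a * g), 0.0, 10.0)
        x = x - alpha * g
    return x
```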

Stopping Criteria

$\nabla f(x^k)=0$ (FONC)

$\|x^{k+1}-x^k\|<\epsilon$

$\dfrac{\|x^{k+1}-x^k\|}{\|x^k\|}<\epsilon$

$|f(x^{k+1})-f(x^k)|<\epsilon$

$\dfrac{|f(x^{k+1})-f(x^k)|}{|f(x^k)|}<\epsilon$

Note: these are common stopping criteria; choose among them according to the needs of the application.
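A sketch of how these criteria look in code; the function name, the small floor guarding division by zero, and the dictionary layout are illustrative choices:

```python
import numpy as np

def stopping_checks(x_new, x_old, f_new, f_old, g_new, eps=1e-6):
    """Evaluate the common stopping criteria; in practice one usually
    stops when any single chosen criterion is met."""
    return {
        "gradient":     np.linalg.norm(g_new) < eps,   # FONC
        "abs_step":     np.linalg.norm(x_new - x_old) < eps,
        "rel_step":     np.linalg.norm(x_new - x_old)
                        / max(np.linalg.norm(x_old), 1e-12) < eps,
        "abs_decrease": abs(f_new - f_old) < eps,
        "rel_decrease": abs(f_new - f_old) / max(abs(f_old), 1e-12) < eps,
    }
```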

Correctness of the Method

Good direction

Claim: $-\nabla f(x^k)$ is a "good" direction: $f(x^{k+1})<f(x^k)$.

Pf: $f(x^{k+1})=f(x^k-\alpha\nabla f(x^k))=f(x^k)-\alpha\|\nabla f(x^k)\|^2+o(\alpha)$

If $\nabla f(x^k)\neq0$, then $f(x^{k+1})-f(x^k)<0$ for sufficiently small $\alpha$.

Note: this proof is for the fixed-stepsize gradient method.

zig-zag

Claim: $\langle x^{k+2}-x^{k+1},\,x^{k+1}-x^k\rangle=0$

Pf: $\langle x^{k+2}-x^{k+1},\,x^{k+1}-x^k\rangle=\langle-\alpha^{k+1}\nabla f(x^{k+1}),\,-\alpha^k\nabla f(x^k)\rangle=\alpha^{k+1}\alpha^k\,\nabla f(x^{k+1})^T\nabla f(x^k)$

$\because\alpha^k$ minimizes $\phi^k(\alpha)=f(x^k(\alpha))$, $x^k(\alpha)=x^k-\alpha\nabla f(x^k)$

$\therefore$ FONC: $0=\phi^{k\prime}(\alpha^k)=\nabla f(x^k-\alpha^k\nabla f(x^k))^T(-\nabla f(x^k))=-\nabla f(x^{k+1})^T\nabla f(x^k)$

Descent

Claim: If $\nabla f(x^k)\neq0$, then $f(x^{k+1})<f(x^k)$.

Pf: $\because\alpha^k$ minimizes $\phi^k(\alpha)=f(x^k-\alpha\nabla f(x^k))$

$\therefore\forall\alpha\geq0:\phi^k(\alpha^k)\leq\phi^k(\alpha)$

$\phi^{k\prime}(0)=-\|\nabla f(x^k)\|^2<0\Rightarrow\exists\bar\alpha>0:\forall\alpha\in(0,\bar\alpha],\ \phi^k(\alpha)<\phi^k(0)$

$f(x^{k+1})=\phi^k(\alpha^k)\leq\phi^k(\bar\alpha)<\phi^k(0)=f(x^k)$

Note: this proof is for the steepest descent method.

Quadratic Functions

$f(x)=\frac{1}{2}x^TQx-b^Tx$, with $x\in\mathbb{R}^n$, $Q\in\mathbb{R}^{n\times n}$, $b\in\mathbb{R}^n$, and $Q$ symmetric positive definite.

$\nabla f(x)=\frac{1}{2}(Q+Q^T)x-b=\frac{1}{2}(Qx+Qx)-b=Qx-b$

$F(x)=Q$ (the Hessian)

Lemma: $\alpha^k=\dfrac{{g^k}^Tg^k}{{g^k}^TQg^k}$, where $g^k=\nabla f(x^k)$.

Pf: Assume $g^k\neq0$ (if $g^k=0$, then $x^k$ is already the unique global minimizer, since $Q\succ0$).

$\phi^k(\alpha)=f(x^k-\alpha g^k)$

$\phi^{k\prime}(\alpha)=(Q(x^k-\alpha g^k)-b)^T(-g^k)=(x^k-\alpha g^k)^TQ(-g^k)+b^Tg^k$

$\because\alpha^k>0$ minimizes $\phi^k$

$\therefore\phi^{k\prime}(\alpha^k)=0$

$\therefore\alpha^k{g^k}^TQg^k=({x^k}^TQ-b^T)g^k$

$\because g^k=\nabla f(x^k)=Qx^k-b$

$\therefore\alpha^k=\dfrac{{g^k}^Tg^k}{{g^k}^TQg^k}$

Note: this is the steepest descent stepsize specialized to quadratic functions.
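A minimal sketch of steepest descent specialized to quadratics, using the closed-form stepsize from the lemma (the function name and tolerances are my own):

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-10, max_iter=1000):
    """Steepest descent for f(x) = 1/2 x^T Q x - b^T x with the exact
    stepsize alpha_k = (g^T g)/(g^T Q g), where g = Qx - b."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = Q @ x - b                     # gradient of the quadratic
        if np.linalg.norm(g) < tol:
            break
        alpha = (g @ g) / (g @ Q @ g)     # closed-form exact line search
        x = x - alpha * g
    return x
```

For a positive definite $Q$ the iterates converge to the unique minimizer $x^*=Q^{-1}b$.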

Example 1

$f(x)=x_1^2+x_2^2\Rightarrow f(x)=\frac{1}{2}x^TQx-b^Tx$ with

$Q=\begin{bmatrix}2&0\\0&2\end{bmatrix}$, $b=\begin{bmatrix}0\\0\end{bmatrix}$, $\nabla f(x)=\begin{bmatrix}2x_1\\2x_2\end{bmatrix}$, $F(x)=\begin{bmatrix}2&0\\0&2\end{bmatrix}$

Starting from $x^0=\begin{bmatrix}1\\1\end{bmatrix}$, iterate $x^{k+1}=x^k-\alpha^kg^k$:

$g^0=\nabla f(x^0)=\begin{bmatrix}2\\2\end{bmatrix}$, $\alpha^0=\dfrac{{g^0}^Tg^0}{{g^0}^TQg^0}=\dfrac{8}{[4,4]\cdot[2,2]^T}=\dfrac{8}{16}=\dfrac{1}{2}$

$x^1=[1,1]^T-\frac{1}{2}[2,2]^T=[0,0]^T$

$g^1=\nabla f(x^1)=\begin{bmatrix}0\\0\end{bmatrix}$, so the method stops at the minimizer after a single step.
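The numbers of Example 1 can be checked numerically; this is a sketch reusing the lemma's stepsize formula:

```python
import numpy as np

# Example 1: f(x) = x1^2 + x2^2, i.e. Q = 2I, b = 0
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
x0 = np.array([1.0, 1.0])
g0 = Q @ x0                          # gradient at x0 is Qx - b with b = 0
alpha0 = (g0 @ g0) / (g0 @ Q @ g0)   # closed-form stepsize: 8/16 = 1/2
x1 = x0 - alpha0 * g0                # lands exactly on the minimizer (0, 0)
```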

Example 2

$f(x)=\frac{1}{5}x_1^2+x_2^2\Rightarrow Q=\begin{bmatrix}\frac{2}{5}&0\\0&2\end{bmatrix}$, $b=\begin{bmatrix}0\\0\end{bmatrix}$, $\nabla f(x)=\begin{bmatrix}\frac{2}{5}x_1\\2x_2\end{bmatrix}$

$x^0=\begin{bmatrix}1\\1\end{bmatrix}$, $g^0=\begin{bmatrix}\frac{2}{5}\\2\end{bmatrix}$

$\alpha^0=\dfrac{{g^0}^Tg^0}{{g^0}^TQg^0}=\dfrac{4.16}{8.064}\approx0.51587$

$x^1=[1,1]^T-0.51587\cdot[\tfrac{2}{5},2]^T=[0.79365,-0.031746]^T$

Unlike Example 1, the iteration does not terminate in one step: the poorer conditioning of $Q$ produces the zig-zag behavior described above.
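The same numeric check for Example 2 (a sketch; $Q$ is read off from $f$):

```python
import numpy as np

# Example 2: f(x) = (1/5)x1^2 + x2^2, i.e. Q = diag(2/5, 2), b = 0
Q = np.array([[0.4, 0.0], [0.0, 2.0]])
x0 = np.array([1.0, 1.0])
g0 = Q @ x0                          # g0 = (2/5, 2)
alpha0 = (g0 @ g0) / (g0 @ Q @ g0)   # 4.16 / 8.064 ~ 0.51587
x1 = x0 - alpha0 * g0                # ~ (0.79365, -0.031746)
```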

Note: steepest descent is widely used across many fields and is one of the most important optimization methods.

Analysis

For this method we should ask whether it stops at all (convergence), why it stops (correctness), and when it stops (efficiency).

Because the method applies so broadly, we begin the analysis with quadratic functions.

$f(x)=\frac{1}{2}x^TQx-b^Tx$

Let $x^*$ be the minimizer; then $\nabla f(x^*)=0\Rightarrow Qx^*-b=0$.

Define $V(x)=f(x)+\frac{1}{2}{x^*}^TQx^*$:

$V(x)=\frac{1}{2}x^TQx-b^Tx+\frac{1}{2}{x^*}^TQx^*=\frac{1}{2}x^TQx-{x^*}^TQx+\frac{1}{2}{x^*}^TQx^*=\frac{1}{2}x^TQ(x-x^*)-\frac{1}{2}{x^*}^TQ(x-x^*)=\frac{1}{2}(x-x^*)^TQ(x-x^*)\Rightarrow V(x^*)=0$

Note: this uses ${x^*}^TQx=x^TQx^*$ (symmetry of $Q$).

$x^{k+1}=x^k-\alpha^kg^k,\quad g^k=Qx^k-b$

Lemma

Lemma: $V(x^{k+1})=(1-r^k)V(x^k)$, where $r^k=1$ if $g^k=0$, and otherwise

$r^k=\alpha^k\dfrac{{g^k}^TQg^k}{{g^k}^TQ^{-1}g^k}\left(2\dfrac{{g^k}^Tg^k}{{g^k}^TQg^k}-\alpha^k\right)$

Pf: Let $y^k=x^k-x^*$.

$V(x^k)=\frac{1}{2}{y^k}^TQy^k$

$V(x^{k+1})=\frac{1}{2}(x^{k+1}-x^*)^TQ(x^{k+1}-x^*)=\frac{1}{2}(x^k-x^*-\alpha^kg^k)^TQ(x^k-x^*-\alpha^kg^k)=\frac{1}{2}{y^k}^TQy^k-\alpha^k{g^k}^TQy^k+\frac{1}{2}{\alpha^k}^2{g^k}^TQg^k$

$\dfrac{V(x^k)-V(x^{k+1})}{V(x^k)}=\dfrac{2\alpha^k{g^k}^TQy^k-{\alpha^k}^2{g^k}^TQg^k}{{y^k}^TQy^k}$

$\because g^k=Qx^k-b=Qx^k-Qx^*=Q(x^k-x^*)=Qy^k$

$\therefore{y^k}^TQy^k=(Q^{-1}g^k)^TQ(Q^{-1}g^k)={g^k}^TQ^{-1}g^k,\quad{g^k}^TQy^k={g^k}^TQQ^{-1}g^k={g^k}^Tg^k$

$\dfrac{V(x^k)-V(x^{k+1})}{V(x^k)}=\dfrac{2\alpha^k{g^k}^Tg^k-{\alpha^k}^2{g^k}^TQg^k}{{g^k}^TQ^{-1}g^k}=\alpha^k\dfrac{{g^k}^TQg^k}{{g^k}^TQ^{-1}g^k}\left(2\dfrac{{g^k}^Tg^k}{{g^k}^TQg^k}-\alpha^k\right)=r^k$
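The lemma can be sanity-checked numerically on a random positive definite $Q$; this is a sketch, and the particular matrix, seed, and dimension are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Q = A @ A.T + 3.0 * np.eye(3)            # random symmetric positive definite Q
b = rng.standard_normal(3)
x_star = np.linalg.solve(Q, b)           # minimizer: Q x* = b

def V(x):
    d = x - x_star
    return 0.5 * d @ Q @ d               # V(x) = 1/2 (x - x*)^T Q (x - x*)

x = rng.standard_normal(3)
g = Q @ x - b
alpha = (g @ g) / (g @ Q @ g)            # exact steepest-descent stepsize
x_next = x - alpha * g
gQinv = g @ np.linalg.solve(Q, g)        # g^T Q^{-1} g
r = alpha * (g @ Q @ g) / gQinv * (2.0 * (g @ g) / (g @ Q @ g) - alpha)
# the lemma predicts V(x_next) == (1 - r) * V(x)
```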

Theorem 1

Thm: Let $x^k$ be generated by a gradient method, and suppose $r^k>0$ for all $k$.

Then $x^k$ converges to $x^*$ for every $x^0$ $\Leftrightarrow$ $\sum_{k=0}^{\infty}r^k=\infty$.

Pf: $r^k=1-\dfrac{V(x^{k+1})}{V(x^k)}$, so $r^k\leq1$ for all $k$.

$r^k=1\Leftrightarrow V(x^{k+1})=0\Leftrightarrow x^{k+1}=x^*$

For all $k$: if $g^k\neq0$, then $r^k<1$; combined with the assumption $r^k>0$,

$g^k\neq0\Rightarrow0<r^k<1$

$\because V(x^{k+1})=(1-r^k)V(x^k)$

$\therefore V(x^k)=\prod_{i=0}^{k-1}(1-r^i)\,V(x^0)$

$x^k\to x^*\Leftrightarrow V(x^k)\to0\Leftrightarrow\prod_{i=0}^{\infty}(1-r^i)=0\Leftrightarrow\sum_{i=0}^{\infty}\log(1-r^i)=-\infty$, and since $0<r^i<1$, the last condition is equivalent to $\sum_{i=0}^{\infty}r^i=\infty$.

Rayleigh's inequality: $\lambda_{\min}(Q)\|x\|^2\leq x^TQx\leq\lambda_{\max}(Q)\|x\|^2$

For $Q$ symmetric, $Q\succ0$: $\lambda_{\min}(Q^{-1})=\dfrac{1}{\lambda_{\max}(Q)},\quad\lambda_{\max}(Q^{-1})=\dfrac{1}{\lambda_{\min}(Q)}$

Lemma: $\dfrac{\lambda_{\min}(Q)}{\lambda_{\max}(Q)}\leq\dfrac{(x^Tx)^2}{(x^TQx)(x^TQ^{-1}x)}\leq\dfrac{\lambda_{\max}(Q)}{\lambda_{\min}(Q)}$
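The eigenvalue bounds in this lemma can likewise be spot-checked numerically (a sketch on a random positive definite matrix; the seed and dimension are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4.0 * np.eye(4)            # symmetric positive definite
lam = np.linalg.eigvalsh(Q)              # eigenvalues in ascending order
lower, upper = lam[0] / lam[-1], lam[-1] / lam[0]

x = rng.standard_normal(4)
ratio = (x @ x) ** 2 / ((x @ Q @ x) * (x @ np.linalg.solve(Q, x)))
# the lemma predicts: lambda_min/lambda_max <= ratio <= lambda_max/lambda_min
```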

Theorem 2

Thm: For steepest descent, $x^k\to x^*$.

Pf: Assume $g^k\neq0$ for all $k$ (otherwise some iterate is already $x^*$).

$\alpha^k=\dfrac{{g^k}^Tg^k}{{g^k}^TQg^k}\Rightarrow r^k=\dfrac{({g^k}^Tg^k)^2}{({g^k}^TQg^k)({g^k}^TQ^{-1}g^k)}$

$r^k\geq\dfrac{\lambda_{\min}(Q)}{\lambda_{\max}(Q)}>0\Rightarrow\sum_{k=0}^{\infty}r^k=\infty$, so $x^k\to x^*$ by Theorem 1.

Summary

This lecture introduced gradient methods. We first derived the fixed-stepsize gradient method, then allowed a variable stepsize to obtain the steepest descent method, gave the explicit steepest descent update for quadratic functions together with worked examples, and finally carried out a theoretical analysis proving the convergence of gradient methods.
