Advanced Optimization Theory and Methods (II)

Review of the Previous Lecture

Constrained

$\min f(x) \quad \text{s.t. } x \in \Omega$

Unconstrained

$\min f(x)$

FONC

If $x^*$ is a local minimizer of $f$ over $\Omega$, then for every feasible direction $d$ at $x^*$: $\nabla f(x^*)^T d \geq 0$.
(interior case) $\nabla f(x^*) = 0$

SONC

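If $x^*$ is a local minimizer and $d$ is a feasible direction at $x^*$ with $\nabla f(x^*)^T d = 0$, then $d^T F(x^*) d \geq 0$, where $F$ denotes the Hessian of $f$.
(interior case) $\nabla f(x^*) = 0$ and $F(x^*) \geq 0$ (positive semidefinite), i.e. $d^T F(x^*) d \geq 0$ for all $d$.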

example

$\min f(x_1, x_2) = x_1^2 - x_2^2$

$x^* = [0, 0]^T$

$\nabla f(x) = [2x_1, -2x_2]^T$

$\nabla f(x^*) = [0, 0]^T$

$F(x) = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}$, which is indefinite (eigenvalues $2$ and $-2$), so $F(x^*) \geq 0$ does not hold.

$d_1 = [1, 0]^T$

$d_1^T F(x^*) d_1 = [2, 0][1, 0]^T = 2 > 0$

$d_2 = [0, 1]^T$

$d_2^T F(x^*) d_2 = -2 < 0$

By SONC, $[0, 0]^T$ is not a local minimizer.
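As a quick numerical check of the example above, the two quadratic forms can be evaluated directly. This is only a sketch; the helper name `quad_form` is introduced here for illustration and is not part of the lecture.

```python
def quad_form(H, d):
    """Return d^T H d for a square matrix H (list of rows) and a vector d."""
    n = len(d)
    return sum(d[i] * H[i][j] * d[j] for i in range(n) for j in range(n))

# Hessian of f(x1, x2) = x1^2 - x2^2. It is constant, so it equals F(x*) at x* = [0, 0].
F = [[2.0, 0.0],
     [0.0, -2.0]]

print(quad_form(F, [1.0, 0.0]))  # curvature along d1, positive
print(quad_form(F, [0.0, 1.0]))  # curvature along d2, negative, so SONC fails
```

A negative value along any direction is enough to rule out a local minimizer at an interior stationary point.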

Contents of This Lecture

SOSC

Theorem Statement

[Second-Order Sufficient Condition]
Let $f \in C^2$ be defined on a region in which $x^*$ is an interior point. Suppose that:
$\nabla f(x^*) = 0$
$F(x^*) > 0$ (the Hessian at $x^*$ is positive definite)
Then $x^*$ is a strict local minimizer of $f$: there exists $\epsilon > 0$ such that $f(x^*) < f(x)$ for all $x \in N_{\epsilon}(x^*)$ with $x \neq x^*$.
Note: for unconstrained optimization problems we can only state conditions that are sufficient, or conditions that are necessary. A condition that is both necessary and sufficient for general $f$ remains an open problem in mathematics, with no answer so far.

Proof

Proof:
$f \in C^2 \Rightarrow F(x^*) = F(x^*)^T$
(by Clairaut's theorem, also known as Schwarz's theorem: $\forall i, j \in \{1, \dots, n\}$, $\frac{\partial^2 f(x^*)}{\partial x_i \partial x_j} = \frac{\partial^2 f(x^*)}{\partial x_j \partial x_i}$)

Rayleigh's inequality: for a symmetric, positive definite $P \in \mathbb{R}^{n \times n}$:
$\lambda_{\min}(P) \|x\|^2 \leq x^T P x \leq \lambda_{\max}(P) \|x\|^2$

where $\lambda_{\min}(P)$ and $\lambda_{\max}(P)$ are the minimal and maximal eigenvalues of $P$, respectively.

A symmetric matrix is positive definite $\Leftrightarrow$ all its eigenvalues are positive.

$\because$ for every $d \neq 0$: $d^T F(x^*) d \geq \lambda_{\min}(F(x^*)) \|d\|^2 > 0$

$\therefore$ by Taylor's theorem with $\nabla f(x^*) = 0$, for all sufficiently small $d \neq 0$:
$f(x^* + d) - f(x^*) = \frac{1}{2} d^T F(x^*) d + o(\|d\|^2) \geq \frac{1}{2} \lambda_{\min}(F(x^*)) \|d\|^2 + o(\|d\|^2) > 0$

Example

$f(x) = x_1^2 + x_2^2$
$\nabla f(x) = [2x_1, 2x_2]^T$
$F(x) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} > 0$
$x^* = [0, 0]^T$
Both conditions of the SOSC hold at $x^*$, so it is a strict local minimizer.
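The two SOSC conditions for this example can also be checked mechanically. The sketch below assumes a $2 \times 2$ symmetric Hessian, for which the eigenvalues have a closed form; the function names `eig_sym_2x2` and `satisfies_sosc` are illustrative choices, not notation from the lecture.

```python
import math

def eig_sym_2x2(H):
    """Eigenvalues (smallest first) of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

def satisfies_sosc(grad, H, tol=1e-12):
    """True if the gradient vanishes and the Hessian is positive definite."""
    lam_min, _ = eig_sym_2x2(H)
    return all(abs(g) <= tol for g in grad) and lam_min > 0

x_star = [0.0, 0.0]
grad = [2 * x_star[0], 2 * x_star[1]]   # gradient of x1^2 + x2^2 at x*
H = [[2.0, 0.0], [0.0, 2.0]]            # Hessian of x1^2 + x2^2
print(satisfies_sosc(grad, H))
```

Positive definiteness is tested through the smallest eigenvalue, which mirrors the use of Rayleigh's inequality in the proof.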

One-dimensional Search Methods

Iterative Method

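"Iterative method" refers to an iterative procedure. Calling it an "algorithm" here would not be quite rigorous, because an algorithm comes with questions of rigor such as a complexity proof, a correctness proof, and a guarantee of termination, whereas the word "method" carries no such obligations. Iteration means starting from some initial point, finding some directions, and updating the point along those directions.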

Golden Section Search

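Assume $f$ is unimodal on $[a_0, b_0]$ (it has exactly one local minimizer in $[a_0, b_0]$).
Basic idea: "narrow down" the interval that contains the minimizer.
Binary search does not work here: the value of $f$ at a single interior point does not tell us on which side of that point the minimizer lies.
Pick two interior points instead of one.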

Method

Input: $a_0, b_0, f, \epsilon$
1. $i = 0$

2. while $b_i - a_i \geq \epsilon$ do

3. Pick $x < y$ from $[a_i, b_i]$

4. If $f(x) < f(y)$ then $a_{i+1} = a_i$, $b_{i+1} = y$;
else $b_{i+1} = b_i$, $a_{i+1} = x$

5. i++

6. end while
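The generic method above leaves the choice of $x$ and $y$ open. The following sketch fixes one possible choice, the trisection points of the current interval; that choice is an assumption made here for illustration, and the function name `narrow_down` is hypothetical.

```python
def narrow_down(f, a, b, eps):
    """Shrink [a, b] around the minimizer of a unimodal f until b - a < eps."""
    evaluations = 0
    while b - a >= eps:
        # Assumption: trisection points. The method allows any a < x < y < b.
        x = a + (b - a) / 3.0
        y = b - (b - a) / 3.0
        fx, fy = f(x), f(y)
        evaluations += 2
        if fx < fy:
            b = y   # the minimizer cannot lie in (y, b]
        else:
            a = x   # the minimizer cannot lie in [a, x)
    return a, b, evaluations

lo, hi, n_evals = narrow_down(lambda x: (x - 0.7) ** 2, 0.0, 2.0, 1e-3)
print(lo, hi, n_evals)
```

This version evaluates $f$ twice per iteration, which is exactly the cost that a cleverer placement of the two points tries to reduce.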

Issues

1. The number of while-loop iterations.
2. The number of evaluations of $f(\cdot)$.

Derivation of the Method

W.L.O.G. (without loss of generality), assume $b_0 - a_0 = 1$.
Let $a_1 < b_1$ be the two test points, placed symmetrically inside $[a_0, b_0]$:
$a_1 - a_0 = b_0 - b_1 = \rho < \frac{1}{2}$

Whichever side is discarded, every step shrinks the interval by the same factor: (new length) $= (1 - \rho) \cdot$ (old length).

$b_1 - a_1 = 1 - 2\rho$

Suppose $f(a_1) < f(b_1)$, so the new interval is $[a_0, b_1]$ with length $1 - \rho$. To reuse $a_1$ as the right test point of the new interval we need
$b_1 - a_1 = \rho(b_1 - a_0) = \rho(1 - \rho) \Rightarrow 1 - 2\rho = \rho - \rho^2 \Rightarrow \rho^2 - 3\rho + 1 = 0$

$\rho_1 = \frac{3 + \sqrt{5}}{2} > \frac{1}{2}$ (rejected), $\rho_2 = \frac{3 - \sqrt{5}}{2} \approx 0.382 < \frac{1}{2}$
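The admissible root can be checked numerically, together with the reuse property that motivated the equation. This is a small sanity check, not part of the original derivation.

```python
import math

rho = (3 - math.sqrt(5)) / 2

print(rho)                       # approximately 0.382
print(rho ** 2 - 3 * rho + 1)    # residual of the quadratic, close to 0

# Reuse property on the unit interval: after keeping [0, 1 - rho],
# the new right test point (1 - rho) * (1 - rho) coincides with the old left point rho.
print((1 - rho) ** 2, rho)

# 1 - rho is the reciprocal of the golden ratio, hence the name of the method.
print(1 - rho, (math.sqrt(5) - 1) / 2)
```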

Algorithm Description

Notation: at iteration $i$, $[a_i, b_i]$ is the current interval and $a_{i+1} < b_{i+1}$ are its two test points. The last assignment in each branch overwrites the discarded test point with the retained endpoint, so that $[a_{i+1}, b_{i+1}]$ is the next interval and $a_{i+2}, b_{i+2}$ are its test points. Only one new value of $f$ is needed per iteration, because the other test point and its value are reused.

1. Compute $a_1 = a_0 + \rho(b_0 - a_0)$, $b_1 = a_0 + (1 - \rho)(b_0 - a_0)$, $f(a_1)$, $f(b_1)$

2. $i = 0$

3. while $b_i - a_i \geq \epsilon$ do
if $f(a_{i+1}) < f(b_{i+1})$ then
$b_{i+2} = a_{i+1}$, $a_{i+2} = a_i + \rho(b_{i+1} - a_i)$, $a_{i+1} = a_i$
else
$a_{i+2} = b_{i+1}$, $b_{i+2} = b_i - \rho(b_i - a_{i+1})$, $b_{i+1} = b_i$

4. i++

5. end while
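The description above can be sketched in Python as follows. To avoid the overloaded indices, this version keeps the interval in `lo`, `hi` and the test points in `a`, `b`; these variable names and the function name `golden_section_search` are choices made for this sketch, not notation from the lecture.

```python
import math

RHO = (3 - math.sqrt(5)) / 2  # about 0.382

def golden_section_search(f, lo, hi, eps):
    """Return (lo, hi, evals): a final interval with hi - lo < eps that contains
    the minimizer of a unimodal f, and the number of evaluations of f."""
    a = lo + RHO * (hi - lo)          # left test point
    b = lo + (1 - RHO) * (hi - lo)    # right test point
    fa, fb = f(a), f(b)
    evals = 2
    while hi - lo >= eps:
        if fa < fb:
            hi = b                    # keep [lo, b]
            b, fb = a, fa             # old left point becomes the new right point
            a = lo + RHO * (hi - lo)
            fa = f(a)
        else:
            lo = a                    # keep [a, hi]
            a, fa = b, fb             # old right point becomes the new left point
            b = hi - RHO * (hi - lo)
            fb = f(b)
        evals += 1
    return lo, hi, evals

f = lambda x: x**4 - 14 * x**3 + 60 * x**2 - 70 * x
lo, hi, evals = golden_section_search(f, 0.0, 2.0, 0.3)
print(round(lo, 4), round(hi, 4), evals)
```

Run on the function used in the worked example of these notes, the sketch ends on roughly $[0.6525, 0.9443]$. It always computes a fresh test point before re-checking the loop condition, so the last evaluation is never used; a tighter version could skip it.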

Time

1. Each while-loop iteration costs one evaluation of $f(\cdot)$ plus $O(1)$ additional work.
2. Number of iterations: the loop stops once $(1 - \rho)^N (b_0 - a_0) < \epsilon$, so
$N$ is the smallest integer with $N > \log_{1-\rho} \frac{\epsilon}{b_0 - a_0}$.
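The iteration count can be obtained without logarithms by shrinking the width until it drops below $\epsilon$, which sidesteps rounding issues at the boundary. The function name `golden_section_steps` is hypothetical.

```python
import math

def golden_section_steps(a0, b0, eps):
    """Smallest N with (1 - rho)^N * (b0 - a0) < eps."""
    rho = (3 - math.sqrt(5)) / 2
    n = 0
    width = b0 - a0
    while width >= eps:
        width *= 1 - rho
        n += 1
    return n

print(golden_section_steps(0.0, 2.0, 0.3))
```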

Example

$\epsilon = 0.3$
$f(x) = x^4 - 14x^3 + 60x^2 - 70x$
Initial interval: $[0, 2]$
$(1 - \rho)^N < \frac{0.3}{2} = 0.15 \Rightarrow N = 4$

1. $a_1 = a_0 + \rho(b_0 - a_0) = 0.7639$
$b_1 = a_0 + (1 - \rho)(b_0 - a_0) = 1.236$
$f(a_1) = -24.36$
$f(b_1) = -18.96$

2. $f(a_1) < f(b_1)$, so the interval becomes $[0, 1.236]$
$b_2 = a_1 = 0.7639$
$a_2 = a_0 + \rho(1.236 - 0) = 0.4721$
$f(b_2) = -24.36$
$f(a_2) = -21.10$

3. $f(a_2) > f(b_2)$, so the interval becomes $[0.4721, 1.236]$
$a_3 = b_2 = 0.7639$
$b_3 = a_2 + (1 - \rho)(1.236 - 0.4721) = 0.9443$
$f(a_3) = -24.36$
$f(b_3) = -23.59$

4. $f(a_3) < f(b_3)$, so the interval becomes $[0.4721, 0.9443]$
$b_4 = a_3 = 0.7639$
$a_4 = 0.4721 + \rho(0.9443 - 0.4721) = 0.6525$
$f(b_4) = -24.36$
$f(a_4) = -23.84$

5. $f(a_4) > f(b_4)$, so the interval becomes $[0.6525, 0.9443]$
$0.9443 - 0.6525 < 0.3 = \epsilon$
The algorithm terminates.

Fibonacci Method

In fact, $\rho$ does not have to be the same in every round; it may vary. Suppose $\rho$ varies, and let us derive the relation between the values of $\rho$ in consecutive rounds.
$\rho_1(1 - \rho_0) = 1 - 2\rho_0$
$\rho_{k+1}(1 - \rho_k) = 1 - 2\rho_k$
$\rho_{k+1} = 1 - \frac{\rho_k}{1 - \rho_k}$

The problem becomes
$\min (1 - \rho_0)(1 - \rho_1) \cdots (1 - \rho_{N-1})$
s.t. $\rho_{k+1} = 1 - \frac{\rho_k}{1 - \rho_k}$, $0 \leq \rho_k \leq \frac{1}{2}$

The solution is $\rho_0 = 1 - \frac{F_N}{F_{N+1}}, \dots, \rho_k = 1 - \frac{F_{N-k}}{F_{N-k+1}}, \dots, \rho_{N-1} = 1 - \frac{F_1}{F_2}$
where $F_k$ is the $k$-th term of the Fibonacci sequence with $F_0 = F_1 = 1$, $F_{k+2} = F_k + F_{k+1}$. (With the other common convention $F_0 = 0$, $F_1 = 1$, every index above shifts up by one; taken literally with that convention, the last ratio would give $\rho_{N-1} = 0$ instead of $\frac{1}{2}$.)

Note: for the same number of steps $N$, this method shrinks the interval more than golden section search: the total reduction factor is $\frac{1}{F_{N+1}}$ versus $(1 - \rho)^N \approx 0.618^N$.
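The Fibonacci ratios can be generated exactly with rational arithmetic and checked against the recurrence. The sketch below assumes the indexing $F_0 = F_1 = 1$ (so $F_2 = 2$), under which the last ratio is $\rho_{N-1} = \frac{1}{2}$; both that convention and the function name `fibonacci_rhos` are assumptions of this example.

```python
from fractions import Fraction

def fibonacci_rhos(n_steps):
    """Return [rho_0, ..., rho_{N-1}] with rho_k = 1 - F_{N-k} / F_{N-k+1},
    using the convention F_0 = F_1 = 1 (an assumption of this sketch)."""
    fib = [1, 1]
    while len(fib) < n_steps + 2:
        fib.append(fib[-1] + fib[-2])
    return [1 - Fraction(fib[n_steps - k], fib[n_steps - k + 1])
            for k in range(n_steps)]

rhos = fibonacci_rhos(4)
print(rhos)

# Total reduction factor (1 - rho_0) ... (1 - rho_{N-1}).
reduction = Fraction(1)
for r in rhos:
    reduction *= 1 - r
print(reduction)
```

For $N = 4$ the reduction factor comes out as $\frac{1}{8}$, compared with about $0.146$ for four golden section steps. In an actual search the final ratio $\frac{1}{2}$ makes the two test points coincide, so implementations usually perturb it slightly.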

Bisection Method

Assume: $f$ is unimodal on $[a_0, b_0]$ and continuously differentiable.

Let $c = \frac{a_0 + b_0}{2}$ be the midpoint and keep the half that contains the minimizer:
$f'(c) < 0$: continue on $[c, b_0]$
$f'(c) > 0$: continue on $[a_0, c]$
$f'(c) = 0$: return $c$

After $N$ steps the interval has length $(\frac{1}{2})^N (b_0 - a_0)$, so the method stops at the first $N$ with $(\frac{1}{2})^N (b_0 - a_0) < \epsilon$.
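A minimal sketch of the three cases above, driven by the derivative only. The function name `bisection_minimize` and the test function are illustrative assumptions.

```python
def bisection_minimize(df, a, b, eps):
    """Locate a minimizer of a unimodal, differentiable f on [a, b]
    from its derivative df, stopping when b - a < eps."""
    while b - a >= eps:
        c = (a + b) / 2.0
        slope = df(c)
        if slope == 0:
            return c, c
        if slope < 0:
            a = c   # f is still decreasing at c: the minimizer is to the right
        else:
            b = c   # f is increasing at c: the minimizer is to the left
    return a, b

# Hypothetical test function f(x) = (x - 0.7)^2 with derivative 2 (x - 0.7).
lo, hi = bisection_minimize(lambda x: 2 * (x - 0.7), 0.0, 2.0, 1e-6)
print(lo, hi)
```

Each step halves the interval at the cost of one derivative evaluation, which is a better reduction factor than $0.618$ but requires $f'$ to be available.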

Newton Method

Assume: $f \in C^2$, and the minimizer $x^* \in [a, b]$ satisfies $f'(x^*) = 0$.

Newton's root-finding iteration for an equation $g(x) = 0$ is $x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)}$. Applying it to $g = f'$ gives the iteration for minimization: $x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$

The method works only when the initial point is chosen well; with a poor initial point it may oscillate and fail to converge.

Example

$f(x) = \frac{1}{2}x^2 - \sin x$

$x_0 = 0.5$

$\epsilon = 10^{-5}$

$f'(x) = x - \cos x$

$f''(x) = 1 + \sin x$

$x_1 = 0.5 - \frac{0.5 - \cos 0.5}{1 + \sin 0.5} = 0.7552$

$x_2 = 0.7391$

$x_3 = 0.7390$

$x_4 = 0.7390$
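The example can be reproduced with a few lines of Python. The stopping rule $|x_{k+1} - x_k| < \epsilon$ is an assumption made for this sketch (the notes give $\epsilon$ without stating the criterion), and `newton_minimize` is a hypothetical name.

```python
import math

def newton_minimize(df, d2f, x0, eps, max_iter=50):
    """Iterate x_{k+1} = x_k - f'(x_k) / f''(x_k).
    Assumed stopping rule: |x_{k+1} - x_k| < eps."""
    x = x0
    iterates = [x]
    for _ in range(max_iter):
        x_next = x - df(x) / d2f(x)
        iterates.append(x_next)
        if abs(x_next - x) < eps:
            break
        x = x_next
    return iterates

iterates = newton_minimize(lambda x: x - math.cos(x),   # f'(x)
                           lambda x: 1 + math.sin(x),   # f''(x)
                           x0=0.5, eps=1e-5)
for k, x in enumerate(iterates):
    print(k, x)
```

Since $f''(x) = 1 + \sin x > 0$ near the limit point, the stationary point found here is indeed a minimizer.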

Secant Method

A secant is a line through two points on a curve (as opposed to a tangent line). The method replaces the second derivative in Newton's method by a finite difference of first derivatives.

$f \in C^1$

$f''(x_k) \approx \frac{f'(x_k) - f'(x_{k-1})}{x_k - x_{k-1}}$

$x_{k+1} = x_k - \frac{f'(x_k)(x_k - x_{k-1})}{f'(x_k) - f'(x_{k-1})}$
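A sketch of the secant update, which needs two starting points but no second derivative. The starting points, the stopping rule and the name `secant_minimize` are assumptions of this example; the test function is the one from the Newton example.

```python
import math

def secant_minimize(df, x_prev, x_curr, eps, max_iter=50):
    """Apply the secant update to f' until successive iterates differ by < eps."""
    for _ in range(max_iter):
        d_prev, d_curr = df(x_prev), df(x_curr)
        if d_curr == d_prev:
            break  # the secant slope is zero, so the update is undefined
        x_next = x_curr - d_curr * (x_curr - x_prev) / (d_curr - d_prev)
        x_prev, x_curr = x_curr, x_next
        if abs(x_curr - x_prev) < eps:
            break
    return x_curr

x_min = secant_minimize(lambda x: x - math.cos(x), 0.0, 0.5, 1e-8)
print(x_min)
```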

Bracketing

Goal: find the initial $a_0, b_0$.
It suffices to find three points $a_0 < c < b_0$ with $f(a_0) > f(c)$ and $f(b_0) > f(c)$.
This method produces a suitable starting interval, which is then handed to one of the other algorithms. In practice, however, it is rarely seen and not very convenient to use.
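One common way to find such a triple is to walk downhill with growing steps until the function value rises again. The notes do not specify a strategy, so the step-doubling scheme below, its parameters and the name `bracket_minimum` are all assumptions of this sketch.

```python
def bracket_minimum(f, x0, step=0.1, growth=2.0, max_iter=60):
    """Search for a < c < b with f(c) <= f(a) and f(c) < f(b).
    Assumed strategy: move downhill from x0 with geometrically growing steps."""
    a, c = x0, x0 + step
    fa, fc = f(a), f(c)
    if fc > fa:
        # Going uphill: turn around and search in the other direction.
        a, c, fa, fc = c, a, fc, fa
        step = -step
    for _ in range(max_iter):
        step *= growth
        b = c + step
        fb = f(b)
        if fb > fc:
            return (a, c, b) if a < b else (b, c, a)
        a, fa, c, fc = c, fc, b, fb
    raise RuntimeError("no bracket found; f may be unbounded below")

f = lambda x: (x - 3.0) ** 2
print(bracket_minimum(f, 0.0))
print(bracket_minimum(f, 10.0))
```

The outer two points of the returned triple can serve as $[a_0, b_0]$ for golden section search or bisection, provided $f$ is unimodal between them.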

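Summary

This lecture first reviewed FONC and SONC, two necessary conditions for a minimizer, and then presented SOSC, a sufficient condition. Although these results look simple, the theory of unconstrained optimization has so far developed only to this point, and no condition that is both necessary and sufficient has been found. The lecture then introduced iterative one-dimensional search methods, with an emphasis on golden section search and a brief look at the Fibonacci method, the bisection method, Newton's method, the secant method, and bracketing.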
