Advanced Optimization Theory and Methods (II)
Review of the Previous Lecture
Constrained
$\min f(x) \quad \text{s.t. } x \in \Omega$
Unconstrained
$\min f(x)$
FONC (First-Order Necessary Condition)
If $x^*$ is a local minimizer, then for every feasible direction $d$ at $x^*$: $\nabla f(x^*)^T d \geq 0$.
(interior) $\nabla f(x^*) = 0$
SONC (Second-Order Necessary Condition)
If $x^*$ is a local minimizer and $d$ is a feasible direction at $x^*$ with $\nabla f(x^*)^T d = 0$, then $d^T F(x^*) d \geq 0$, where $F(x^*)$ denotes the Hessian of $f$ at $x^*$.
(interior) $\nabla f(x^*) = 0$, $F(x^*) \geq 0$ (positive semidefinite)
Example
$\min f(x_1, x_2) = x_1^2 - x_2^2$
$x^* = [0, 0]^T$
$\nabla f(x) = [2x_1, -2x_2]^T$
$\nabla f(x^*) = [0, 0]^T$
$H(x) = F(x) = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}$, which is indefinite (not positive semidefinite)
$d_1 = [1, 0]^T$
$d_1^T F(x^*) d_1 = [2, 0][1, 0]^T = 2 > 0$
$d_2 = [0, 1]^T$
$d_2^T F(x^*) d_2 = [0, -2][0, 1]^T = -2 < 0$
By SONC (violated by $d_2$), $[0, 0]^T$ is not a local minimizer.
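The two quadratic forms above can be reproduced in a few lines. This is a minimal sketch in plain Python with no libraries assumed. The helper `quad_form` is introduced here only for illustration. The Hessian of this $f$ is constant, so $F(x^*)$ is the same matrix everywhere.

```python
# Hessian F(x) of f(x1, x2) = x1^2 - x2^2; it does not depend on x,
# so F(x*) at x* = [0, 0]^T is this same matrix
F = [[2.0, 0.0],
     [0.0, -2.0]]


def quad_form(M, d):
    """Return the quadratic form d^T M d for a 2x2 matrix M and a 2-vector d."""
    Md = [M[0][0] * d[0] + M[0][1] * d[1],
          M[1][0] * d[0] + M[1][1] * d[1]]
    return d[0] * Md[0] + d[1] * Md[1]


d1 = [1.0, 0.0]
d2 = [0.0, 1.0]
q1 = quad_form(F, d1)   # positive: f curves upward along x1
q2 = quad_form(F, d2)   # negative: f curves downward along x2

# At an interior point SONC needs d^T F(x*) d >= 0 for every d; d2 violates it
sonc_holds = q1 >= 0 and q2 >= 0
print(q1, q2, sonc_holds)   # 2.0 -2.0 False
```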
This Lecture
SOSC
Statement of the Theorem
[Second-Order Sufficient Condition]
Let $f \in C^2$ be defined on a region in which $x^*$ is an interior point. Suppose that:
① $\nabla f(x^*) = 0$
② $F(x^*) > 0$ (positive definite)
Then $x^*$ is a strict local minimizer of $f$, i.e. for some $\epsilon > 0$:
$\forall x \in N_{\epsilon}(x^*),\ x \neq x^*: f(x^*) < f(x)$
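Note: for unconstrained optimization we can only give sufficient conditions or necessary conditions here. When $F(x^*)$ is positive semidefinite but singular, neither SONC nor SOSC settles the question. For example, at $x = 0$ the function $x^4$ has a minimizer, $-x^4$ has a maximizer and $x^3$ has neither, yet all three satisfy $f'(0) = f''(0) = 0$. A condition that is both necessary and sufficient is an open problem in mathematics with no answer so far.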
Proof
$f \in C^2 \Rightarrow F(x^*) = F(x^*)^T$
(by Schwarz's theorem, also known as Clairaut's theorem: $\forall i, j \in \{1, \dots, n\},\ \frac{\partial^2 f(x^*)}{\partial x_i \partial x_j} = \frac{\partial^2 f(x^*)}{\partial x_j \partial x_i}$)
Rayleigh's inequality: for a symmetric, positive definite $P \in \mathbb{R}^{n \times n}$:
$\lambda_{\min}(P)\|x\|^2 \leq x^T P x \leq \lambda_{\max}(P)\|x\|^2$
where $\lambda_{\min}(P)$ and $\lambda_{\max}(P)$ are the smallest and largest eigenvalues of $P$, respectively.
A symmetric matrix is positive definite $\Leftrightarrow$ all its eigenvalues are positive.
∵ $\forall d \neq 0$: $d^T F(x^*) d \geq \lambda_{\min}(F(x^*))\|d\|^2 > 0$
∴ By Taylor's theorem and $\nabla f(x^*) = 0$: $f(x^* + d) - f(x^*) = \frac{1}{2} d^T F(x^*) d + o(\|d\|^2) \geq \frac{\lambda_{\min}(F(x^*))}{2}\|d\|^2 + o(\|d\|^2) > 0$ for all sufficiently small $d \neq 0$, because $o(\|d\|^2)/\|d\|^2 \to 0$. Hence $x^*$ is a strict local minimizer.
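Rayleigh's inequality, the key step of the proof, can be sanity-checked numerically. This sketch is only an illustration. The matrix $P = \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix}$ is a hypothetical example and does not come from the lecture. Its eigenvalues follow from the closed form $\frac{a+c}{2} \pm \sqrt{(\frac{a-c}{2})^2 + b^2}$ for a symmetric matrix $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$.

```python
import math
import random


def eig_sym_2x2(a, b, c):
    """Eigenvalues (lambda_min, lambda_max) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def quad_form(a, b, c, x):
    """x^T P x for P = [[a, b], [b, c]] and x = (x1, x2)."""
    return a * x[0] ** 2 + 2 * b * x[0] * x[1] + c * x[1] ** 2


# Hypothetical symmetric positive definite example P = [[4, 1], [1, 3]]
a, b, c = 4.0, 1.0, 3.0
lam_min, lam_max = eig_sym_2x2(a, b, c)

random.seed(0)
for _ in range(1000):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    norm_sq = x[0] ** 2 + x[1] ** 2
    q = quad_form(a, b, c, x)
    # Rayleigh: lam_min * ||x||^2 <= x^T P x <= lam_max * ||x||^2 (small rounding slack)
    assert lam_min * norm_sq - 1e-12 <= q <= lam_max * norm_sq + 1e-12

print(lam_min, lam_max)   # both positive, so P is positive definite
```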
Example
$f(x) = x_1^2 + x_2^2$
$\nabla f(x) = [2x_1, 2x_2]^T$
$H(x) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} > 0$
$x^* = [0, 0]^T$: $\nabla f(x^*) = 0$ and $F(x^*) = H(x^*) > 0$, so by SOSC $x^*$ is a strict local minimizer.
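Both examples can be classified mechanically at an interior point. The procedure is to check FONC and then use the sign of the smallest Hessian eigenvalue, since a symmetric matrix is positive definite $\Leftrightarrow$ all its eigenvalues are positive. This is a plain-Python sketch for the $2 \times 2$ case only. The function name `check_interior_point` is hypothetical. It compares against zero exactly, which is fine for these exact inputs; a practical version would use a tolerance. An exactly zero smallest eigenvalue is reported as inconclusive, because neither SONC nor SOSC decides that case.

```python
import math


def eig_sym_2x2(a, b, c):
    """Eigenvalues (lambda_min, lambda_max) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def check_interior_point(grad, hess):
    """Apply FONC / SONC / SOSC at an interior candidate x* of a 2-variable f.

    grad: gradient at x* as (g1, g2); hess: Hessian F(x*) as (a, b, c),
    meaning the symmetric matrix [[a, b], [b, c]].
    """
    if any(g != 0 for g in grad):
        return "FONC fails: not a local minimizer"
    lam_min, _ = eig_sym_2x2(*hess)
    if lam_min > 0:
        return "SOSC holds: strict local minimizer"
    if lam_min < 0:
        return "SONC fails: not a local minimizer"
    return "inconclusive: F(x*) is only positive semidefinite"


# f(x) = x1^2 + x2^2 at x* = [0, 0]^T: gradient 0, Hessian diag(2, 2)
print(check_interior_point((0.0, 0.0), (2.0, 0.0, 2.0)))
# f(x) = x1^2 - x2^2 at x* = [0, 0]^T: gradient 0, Hessian diag(2, -2)
print(check_interior_point((0.0, 0.0), (2.0, 0.0, -2.0)))
```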
One-dimensional Search Methods
Iterative Method
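"Iterative method" here means an iterative algorithm. Strictly speaking, "algorithm" is not the right word. A rigorous algorithm involves questions such as complexity analysis, proof of correctness and termination, and the word "method" does not carry these requirements. "Iterative" means starting from an initial point, choosing some directions and repeatedly updating the point along them.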
Golden Section Search
Assume $f$ is unimodal on $[a_0, b_0]$ (only one minimizer in $[a_0, b_0]$).
Basic Idea: “Narrow Down”
Binary search does not work: the value of $f$ at a single midpoint cannot tell which half contains the minimizer.
Pick two points instead of one.
Method
Input: $a_0, b_0, f, \epsilon$
1. $i = 0$
2. while $b_i - a_i \geq \epsilon$ do
3. Pick $x < y$ from $[a_i, b_i]$
4. If $f(x) < f(y)$ then $a_{i+1} = a_i$, $b_{i+1} = y$;
   else $b_{i+1} = b_i$, $a_{i+1} = x$
5. $i$++
6. END while
Issues
1. Number of while-loop iterations
2. Number of evaluations of $f(\cdot)$
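The generic method leaves the choice of $x < y$ open. As an assumption for illustration, the sketch below picks the two trisection points of $[a_i, b_i]$ and counts the evaluations of $f$. This makes both issues concrete. The interval shrinks only by a factor $2/3$ per iteration, and every iteration costs two fresh evaluations. The function name `narrow_down` is hypothetical.

```python
def narrow_down(f, a, b, eps):
    """Generic two-point interval reduction for a unimodal f on [a, b].

    Assumption for illustration: x and y are the trisection points of [a, b].
    Returns the final interval, the number of loop iterations and the number
    of evaluations of f.
    """
    iterations = evaluations = 0
    while b - a >= eps:
        x = a + (b - a) / 3
        y = b - (b - a) / 3
        fx, fy = f(x), f(y)     # two fresh evaluations in every iteration
        evaluations += 2
        if fx < fy:
            b = y               # the minimizer cannot lie in (y, b]
        else:
            a = x               # the minimizer cannot lie in [a, x)
        iterations += 1
    return (a, b), iterations, evaluations


def f(x):
    return x**4 - 14 * x**3 + 60 * x**2 - 70 * x


(lo, hi), its, evals = narrow_down(f, 0.0, 2.0, 0.3)
print(lo, hi, its, evals)
```

On the function used in the golden section example below, this naive choice needs two new evaluations per iteration. The golden section method is designed to reuse one of them.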
Derivation
W.O.L.G.(Without Loss of Generality)
Assume $b_0 - a_0 = 1$
$a_1 - a_0 = b_0 - b_1 = \rho < \frac{1}{2}$
$\forall i$: each iteration keeps a subinterval of length $(1-\rho)(b_i - a_i)$ of the current interval $[a_i, b_i]$
$b_1 - a_1 = 1 - 2\rho$
If $f(a_1) < f(b_1)$, the new interval is $[a_0, b_1]$, of length $1 - \rho$. To reuse $a_1$ as its right interior point $b_2$, we need $b_1 - a_1 = \rho(b_1 - a_0) = \rho(1-\rho) \Rightarrow 1 - 2\rho = \rho - \rho^2 \Rightarrow \rho^2 - 3\rho + 1 = 0$
$\rho_1 = \frac{3+\sqrt{5}}{2} > \frac{1}{2}$ (discarded), $\rho_2 = \frac{3-\sqrt{5}}{2} \approx 0.382 < \frac{1}{2}$. Then $1 - \rho = \frac{\sqrt{5}-1}{2} \approx 0.618$ is the golden section ratio, which gives the method its name.
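The derivation can be checked numerically in plain Python. The sketch below evaluates both roots and keeps the one below $\frac{1}{2}$. It then checks the reuse condition $b_1 - a_1 = \rho(b_1 - a_0)$ on the normalized interval $[0, 1]$.

```python
import math

# rho^2 - 3*rho + 1 = 0 has roots (3 ± sqrt(5)) / 2; keep the one below 1/2
rho_1 = (3 + math.sqrt(5)) / 2
rho_2 = (3 - math.sqrt(5)) / 2
assert rho_1 > 0.5 > rho_2

# Reuse condition on the normalized interval [a0, b0] = [0, 1]:
# after keeping [a0, b1], the old a1 must sit at the new right interior point
a0, b0 = 0.0, 1.0
a1, b1 = a0 + rho_2, b0 - rho_2
assert abs((b1 - a1) - rho_2 * (b1 - a0)) < 1e-12

# 1 - rho equals (sqrt(5) - 1) / 2, the golden section ratio
print(round(rho_2, 4), round(1 - rho_2, 4))   # 0.382 0.618
```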
Algorithm
1. Compute $a_1 = a_0 + \rho(b_0 - a_0)$, $b_1 = a_0 + (1-\rho)(b_0 - a_0)$, $f(a_1)$, $f(b_1)$
2. $i = 0$
3. while $b_i - a_i \geq \epsilon$ do
   if $f(a_{i+1}) < f(b_{i+1})$ then
     $b_{i+2} = a_{i+1}$, $a_{i+2} = a_i + \rho(b_{i+1} - a_i)$, $a_{i+1} = a_i$; compute $f(a_{i+2})$
   else
     $a_{i+2} = b_{i+1}$, $b_{i+2} = b_i - \rho(b_i - a_{i+1})$, $b_{i+1} = b_i$; compute $f(b_{i+2})$
4. $i$++
5. END while
Here $a_{i+1} < b_{i+1}$ are the two interior points of the current interval $[a_i, b_i]$. The last assignment in each branch makes $[a_{i+1}, b_{i+1}]$ the next interval. The value of $f$ at the reused point is kept, so each iteration needs only one new evaluation of $f$.
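The algorithm can be sketched in Python, assuming `f` is unimodal on `[a, b]`. The description above reassigns indices ($a_{i+1} = a_i$, $b_{i+1} = b_i$). To avoid that, the sketch keeps the interval endpoints in `a, b` and the two interior points in `x, y`. These names are chosen here for illustration. Each iteration reuses one interior point and its function value, so only one new evaluation of `f` is made.

```python
import math

RHO = (3 - math.sqrt(5)) / 2   # ≈ 0.382, the root rho_2 derived above


def golden_section(f, a, b, eps):
    """Golden section search for a unimodal f on [a, b].

    a, b are the current interval endpoints; x < y are the two interior points.
    Returns the final interval once its length drops below eps.
    """
    x = a + RHO * (b - a)
    y = a + (1 - RHO) * (b - a)
    fx, fy = f(x), f(y)
    while b - a >= eps:
        if fx < fy:
            # Minimizer lies in [a, y]: y becomes the right end, x the new y
            b, y, fy = y, x, fx
            x = a + RHO * (b - a)
            fx = f(x)          # the only new evaluation in this iteration
        else:
            # Minimizer lies in [x, b]: x becomes the left end, y the new x
            a, x, fx = x, y, fy
            y = b - RHO * (b - a)
            fy = f(y)          # the only new evaluation in this iteration
    return a, b


def f(x):
    return x**4 - 14 * x**3 + 60 * x**2 - 70 * x


result = golden_section(f, 0.0, 2.0, 0.3)
print(result)
```

On the worked example that follows ($[0, 2]$, $\epsilon = 0.3$), this sketch performs four iterations and returns approximately $[0.6525, 0.9443]$.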
Time
1. Each while-loop iteration: one evaluation of $f(\cdot)$ + $O(1)$
2. Number of loops $N$: $(1-\rho)^N (b_0 - a_0) < \epsilon$, so $N$ is the smallest integer with $N > \log_{1-\rho}\frac{\epsilon}{b_0 - a_0}$ (the inequality flips because $0 < 1 - \rho < 1$)
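The bound on $N$ can be computed directly. This sketch assumes the stopping rule $(1-\rho)^N(b_0 - a_0) < \epsilon$ stated above. The function name `golden_iterations` is hypothetical.

```python
import math

RHO = (3 - math.sqrt(5)) / 2


def golden_iterations(a0, b0, eps):
    """Smallest integer N with (1 - RHO)**N * (b0 - a0) < eps.

    Because 0 < 1 - RHO < 1, the condition is equivalent to
    N > log_{1-RHO}(eps / (b0 - a0)).
    """
    bound = math.log(eps / (b0 - a0)) / math.log(1 - RHO)
    return max(0, math.floor(bound) + 1)


print(golden_iterations(0.0, 2.0, 0.3))   # 4, as in the example below
```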
Example
$\epsilon = 0.3$
$f(x) = x^4 - 14x^3 + 60x^2 - 70x$
$[a_0, b_0] = [0, 2]$
$(1-\rho)^N < \frac{0.3}{2} = 0.15 \Rightarrow N = 4$
1. Interval $[0, 2]$
$a_1 = a_0 + \rho(b_0 - a_0) = 0.7639$
$b_1 = a_0 + (1-\rho)(b_0 - a_0) = 1.2361$
$f(a_1) = -24.36$
$f(b_1) = -18.96$
2. Interval $[0, 1.2361]$
$b_2 = a_1 = 0.7639$
$a_2 = a_0 + \rho(1.2361 - 0) = 0.4721$
$f(b_2) = -24.36$
$f(a_2) = -21.10$
3. Interval $[0.4721, 1.2361]$
$a_3 = b_2 = 0.7639$
$b_3 = a_2 + (1-\rho)(1.2361 - 0.4721) = 0.9443$
$f(a_3) = -24.36$
$f(b_3) = -23.59$
4. Interval $[0.4721, 0.9443]$
$b_4 = a_3 = 0.7639$
$a_4 = 0.4721 + \rho(0.9443 - 0.4721) = 0.6525$
$f(b_4) = -24.36$
$f(a_4) = -23.84$
5. Interval $[0.6525, 0.9443]$
$0.9443 - 0.6525 = 0.2918 < 0.3 = \epsilon$
The algorithm terminates.
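The arithmetic of this example can be re-checked in a few lines of Python. The sketch recomputes each new point with $\rho = \frac{3-\sqrt{5}}{2}$ and evaluates $f$ there. The printed values should match the numbers above to the precision shown.

```python
import math

rho = (3 - math.sqrt(5)) / 2


def f(x):
    return x**4 - 14 * x**3 + 60 * x**2 - 70 * x


a0, b0 = 0.0, 2.0
a1 = a0 + rho * (b0 - a0)            # step 1, interval [a0, b0]
b1 = a0 + (1 - rho) * (b0 - a0)
a2 = a0 + rho * (b1 - a0)            # step 2, interval [a0, b1]
b3 = a2 + (1 - rho) * (b1 - a2)      # step 3, interval [a2, b1]
a4 = a2 + rho * (b3 - a2)            # step 4, interval [a2, b3]

for name, x in [("a1", a1), ("b1", b1), ("a2", a2), ("b3", b3), ("a4", a4)]:
    print(f"{name} = {x:.4f}, f({name}) = {f(x):.2f}")
print(f"final width b3 - a4 = {b3 - a4:.4f}")   # below eps = 0.3
```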
Fibonacci Method
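In fact, $\rho$ need not be fixed across iterations; it may change from one iteration to the next. Assuming $\rho$ varies, we derive the relation between the values of $\rho$ in successive iterations.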
$\rho_1(1-\rho_0) = 1 - 2\rho_0$
$\rho_{k+1}(1-\rho_k) = 1 - 2\rho_k$
$\rho_{k+1} = 1 - \frac{\rho_k}{1-\rho_k}$
The problem becomes
$\min\ (1-\rho_0)(1-\rho_1)\cdots(1-\rho_{N-1})$
s.t. $\rho_{k+1} = 1 - \frac{\rho_k}{1-\rho_k}$, $0 \leq \rho_k \leq \frac{1}{2}$
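The result is
$\rho_0 = 1 - \frac{F_N}{F_{N+1}},\ \rho_1 = 1 - \frac{F_{N-1}}{F_N},\ \dots,\ \rho_{N-1} = 1 - \frac{F_1}{F_2}$
where $F_k$ is the $k$-th Fibonacci number, indexed as $F_{-1} = 0$, $F_0 = 1$, $F_{k+1} = F_k + F_{k-1}$ (so $F_1 = 1, F_2 = 2, F_3 = 3, F_4 = 5, \dots$). With this indexing $\rho_{N-1} = \frac{1}{2}$, and the total reduction factor is $(1-\rho_0)\cdots(1-\rho_{N-1}) = \frac{F_1}{F_{N+1}} = \frac{1}{F_{N+1}}$.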
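Note: for the same number of iterations $N$, this method ends with a smaller interval than the golden section method.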
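The optimal ratios can be checked with exact rational arithmetic. This sketch uses Python's `fractions.Fraction` and the Fibonacci indexing given above ($F_0 = 1$, $F_1 = 1$, $F_2 = 2, \dots$). For $N = 4$ it lists $\rho_0, \dots, \rho_3$ and verifies the recurrence $\rho_{k+1} = 1 - \frac{\rho_k}{1-\rho_k}$. It also compares the total reduction factor $\frac{1}{F_5} = \frac{1}{8}$ with the golden section factor $(1-\rho)^4 \approx 0.146$. The function name `fibonacci_rhos` is hypothetical.

```python
import math
from fractions import Fraction


def fibonacci_rhos(n):
    """Ratios rho_0, ..., rho_{n-1} of the Fibonacci method, as exact fractions.

    Fibonacci indexing: F_{-1} = 0, F_0 = 1, F_{k+1} = F_k + F_{k-1};
    fib[k] stores F_k for k >= 0, and rho_k = 1 - F_{n-k} / F_{n-k+1}.
    """
    fib = [1, 1]                          # F_0, F_1
    while len(fib) < n + 2:               # need F_0 .. F_{n+1}
        fib.append(fib[-1] + fib[-2])
    rhos = [1 - Fraction(fib[n - k], fib[n - k + 1]) for k in range(n)]
    return rhos, fib


N = 4
rhos, fib = fibonacci_rhos(N)
print(rhos)   # [Fraction(3, 8), Fraction(2, 5), Fraction(1, 3), Fraction(1, 2)]

# Successive ratios satisfy rho_{k+1} = 1 - rho_k / (1 - rho_k)
for r, r_next in zip(rhos, rhos[1:]):
    assert r_next == 1 - r / (1 - r)

# Total reduction factor (1 - rho_0)...(1 - rho_{N-1}) = 1 / F_{N+1}
product = Fraction(1)
for r in rhos:
    product *= 1 - r

golden = (1 - (3 - math.sqrt(5)) / 2) ** N   # golden section factor, same N
print(product, golden)
```

The last ratio is exactly $\frac{1}{2}$, so in the final iteration the two interior points coincide. In practice, a value slightly below $\frac{1}{2}$ is used for that step.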
Bisection Method
Assume: $f$ is unimodal on $[a_0, b_0]$ and continuously differentiable. Let $c = \frac{a_0 + b_0}{2}$ be the midpoint:
$f'(c) < 0$: keep $[c, b_0]$
$f'(c) > 0$: keep $[a_0, c]$
$f'(c) = 0$: return $c$
Each step halves the interval, so $N$ steps suffice when $(\frac{1}{2})^N (b_0 - a_0) < \epsilon$.
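Here is a Python sketch of the bisection method. It is applied to the golden section example, so $f'(x) = 4x^3 - 42x^2 + 120x - 70$, which is increasing on $[0, 2]$ and has a single root there. The function name `bisection` is chosen here for illustration.

```python
def bisection(df, a, b, eps):
    """Bisection on f' for a unimodal, continuously differentiable f on [a, b].

    Halves [a, b] using the sign of f'(c) at the midpoint c until the interval
    is shorter than eps, then returns its midpoint.
    """
    while b - a >= eps:
        c = (a + b) / 2
        slope = df(c)
        if slope < 0:
            a = c            # f is decreasing at c: keep [c, b]
        elif slope > 0:
            b = c            # f is increasing at c: keep [a, c]
        else:
            return c         # f'(c) = 0: c is the minimizer
    return (a + b) / 2


def df(x):
    """f'(x) for f(x) = x^4 - 14x^3 + 60x^2 - 70x."""
    return 4 * x**3 - 42 * x**2 + 120 * x - 70


root = bisection(df, 0.0, 2.0, 1e-6)
print(root)
```

The result lies inside the final golden section interval $[0.6525, 0.9443]$ from the example above.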
Newton Method
Assume $f \in C^2$; we look for $x^* \in [a, b]$ with $f'(x^*) = 0$.
$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$ (for solving $f(x) = 0$) or $x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$ (the same iteration applied to $f'$, for minimizing $f$)
This method only works when the initial point is chosen well. With a poor initial point, the iterates may oscillate and fail to converge.
Example
$f(x) = \frac{1}{2}x^2 - \sin x$
$x_0 = 0.5$
$\epsilon = 10^{-5}$
$f'(x) = x - \cos x$
$f''(x) = 1 + \sin x$
$x_1 = 0.5 - \frac{0.5 - \cos 0.5}{1 + \sin 0.5} = 0.755222$
$x_2 = 0.739142$
$x_3 = 0.739085$
$x_4 = 0.739085$; since $|x_4 - x_3| < \epsilon$, the method stops.
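The iterates above can be reproduced with a short Python sketch. The `max_iter` cap is an added safeguard and is not part of the lecture's method; as noted above, Newton's method can oscillate or fail to converge from a poor starting point. The function name `newton_minimize` is hypothetical.

```python
import math


def newton_minimize(df, d2f, x0, eps, max_iter=50):
    """Newton's method for a stationary point: x_{k+1} = x_k - f'(x_k) / f''(x_k).

    Stops when |x_{k+1} - x_k| < eps and returns all iterates [x0, x1, ...].
    max_iter is an added safeguard against oscillation or divergence.
    """
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        x_next = x - df(x) / d2f(x)
        xs.append(x_next)
        if abs(x_next - x) < eps:
            break
    return xs


def df(x):
    return x - math.cos(x)       # f'(x) for f(x) = x^2 / 2 - sin x


def d2f(x):
    return 1 + math.sin(x)       # f''(x)


iterates = newton_minimize(df, d2f, 0.5, 1e-5)
for k, x in enumerate(iterates):
    print(k, f"{x:.6f}")
```

With $x_0 = 0.5$ and $\epsilon = 10^{-5}$, it stops after four steps at $x_4 \approx 0.739085$, the point where $x = \cos x$.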
Secant Method
A secant is a line through two points of a curve, unlike a tangent, which touches the curve at one point.
$f \in C^1$
In Newton's method, approximate $f''(x_k)$ by $f''(x_k) \approx \frac{f'(x_k) - f'(x_{k-1})}{x_k - x_{k-1}}$, which gives
$x_{k+1} = x_k - \frac{f'(x_k)(x_k - x_{k-1})}{f'(x_k) - f'(x_{k-1})}$
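Here is a sketch of the secant method on the Newton example ($f'(x) = x - \cos x$). The method needs two starting points. The values $x_{-1} = 0.5$ and $x_0 = 1.0$ below are hypothetical choices for illustration, as are the `max_iter` cap and the function name `secant_minimize`.

```python
import math


def secant_minimize(df, x_prev, x, eps, max_iter=50):
    """Secant method on f': Newton's step with f''(x_k) replaced by the
    difference quotient (f'(x_k) - f'(x_{k-1})) / (x_k - x_{k-1}).

    x_prev, x are the two starting points x_{-1}, x_0.
    Stops when |x_{k+1} - x_k| < eps and returns the last iterate.
    """
    for _ in range(max_iter):
        g, g_prev = df(x), df(x_prev)
        if g == g_prev:              # flat secant: the quotient is undefined
            return x
        x_prev, x = x, x - g * (x - x_prev) / (g - g_prev)
        if abs(x - x_prev) < eps:
            break
    return x


def df(x):
    return x - math.cos(x)   # f'(x) for f(x) = x^2 / 2 - sin x


result = secant_minimize(df, 0.5, 1.0, 1e-5)
print(result)
```

Unlike Newton's method, this sketch never evaluates $f''$. The price is a slightly slower (superlinear rather than quadratic) local convergence.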
Bracketing
Find the initial $a_0, b_0$.
It suffices to find three points $a_0 < c < b_0$ with $f(a_0) > f(c)$ and $f(b_0) > f(c)$. For a unimodal $f$, the minimizer then lies in $[a_0, b_0]$.
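This method is used to obtain a suitable initial interval, after which another method is applied. In practice, however, it is rarely used and not very convenient.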
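The lecture does not say how to find the three points. One common approach is shown here purely as a hypothetical sketch: walk downhill from a starting point with a doubling step until $f$ increases. The initial step, the doubling factor and the function name `bracket_minimum` are all assumptions. Ties $f(b) = f(c)$ are not handled specially.

```python
def bracket_minimum(f, x0, step=0.1, max_iter=60):
    """Hypothetical bracketing scheme: return (a0, c, b0) with a0 < c < b0,
    f(a0) > f(c) and f(b0) > f(c).

    Walks downhill from x0, doubling the step each time, until f increases.
    The initial step 0.1 and the doubling factor are illustrative choices.
    """
    a, c = x0, x0 + step
    if f(c) > f(a):            # uphill to the right: walk to the left instead
        a, c = c, a
        step = -step
    fc = f(c)
    for _ in range(max_iter):
        step *= 2
        b = c + step
        fb = f(b)
        if fb > fc:            # f went down and now up again: bracket found
            return (a, c, b) if a < b else (b, c, a)
        a, c, fc = c, b, fb    # still downhill: slide the triple forward
    raise RuntimeError("no bracket found; f may be decreasing without bound")


def f(x):
    return x**4 - 14 * x**3 + 60 * x**2 - 70 * x


triple = bracket_minimum(f, 0.0)
print(triple)   # roughly (0.3, 0.7, 1.5)
```

The resulting $[a_0, b_0]$ can then be handed to the golden section, Fibonacci or bisection method.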
Summary
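This lecture first reviewed FONC and SONC, two necessary conditions for a local minimizer, and then gave SOSC, a sufficient condition. These conditions look simple, but the theory of unconstrained optimization has only developed this far: mathematicians have not yet found a condition that is both necessary and sufficient. The lecture then introduced iterative methods for one-dimensional search. It covered the golden section method in detail and briefly presented the Fibonacci method, the bisection method, Newton's method and the secant method.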