Advanced Optimization Theory and Methods (8)
Global Search Method
The methods covered so far all require the gradient of the objective function, and they require the user to supply an initial point $x_0$. This lecture introduces several heuristic algorithms of a different kind.
Nelder-Mead Simplex
Def: "Simplex"
An object in $\mathbb{R}^n$ determined by an assembly of $n+1$ points such that
$$\det\begin{bmatrix} P_0&P_1&\cdots &P_n \\ 1&1&\cdots &1 \end{bmatrix}\neq 0$$
Initialize: $P_0,\cdots,P_n\in \mathbb{R}^n$
$\big(P_i=P_0+\alpha_i e_i,\ \alpha_i\in \mathbb{R},\ e_i=[0,\cdots,0,1,0,\cdots,0]^T\big)$
Note: the above is one feasible initialization; $e_i$ denotes the $n$-dimensional vector whose $i$-th component is 1 and whose other components are all 0.
Update: replace the vertex $P_i$ with the largest value $f(P_i)$ by a new point.
Terminate when the termination conditions are satisfied.
2-dimensional case: $P_s, P_{nl}, P_l$ with $f(P_s)\leq f(P_{nl})\leq f(P_l)$
Note: in two dimensions there are three initial points; sorting them by function value yields $P_s, P_{nl}, P_l$ (smallest, next-largest, largest).
Sort so that $f(P_0)\leq f(P_1)\leq \cdots \leq f(P_n)$.
Centroid: $P_g=\frac{1}{n} \sum_{i=0}^{n-1} P_i$
Reflection: $P_r=P_g+\rho (P_g-P_l)$ [typical: $\rho=1$]
The case analysis below uses the two-dimensional setting as the running example.
Case 1: $f(P_s)\leq f(P_r)\leq f(P_{nl})$
Replace $P_l$ by $P_r$ $\rightarrow$ next iteration
Case 2: $f(P_r)<f(P_s)$
Expansion: $P_e=P_g+\lambda (P_g-P_l)$ [typical: $\lambda=2$]
Case 2.1: if $f(P_e)\leq f(P_r)$, replace $P_l$ by $P_e$
Case 2.2: otherwise, replace $P_l$ by $P_r$
Case 3: $f(P_r)>f(P_{nl})$
Case 3.1: $f(P_l)>f(P_r)$: contraction $P_c=P_g+r(P_r-P_g)$ [typical: $r=\frac{1}{2}$]
Case 3.2: otherwise: $P_c=P_g+r(P_l-P_g)$ [typical: $r=\frac{1}{2}$]
If $f(P_c)<f(P_l)$, replace $P_l$ by $P_c$; next iteration.
Otherwise, shrinkage: move every vertex toward the best point, $\forall i: P_i \leftarrow P_s + \delta (P_i-P_s)$ [typical: $\delta=\frac{1}{2}$]
Simulated Annealing
Simulated annealing is a randomized search algorithm.
Def: "Neighborhood" of $x$: $N_{\epsilon}(x)=\{x': d(x,x')\leq \epsilon\}$
Naive Random Search
- $k:=0$, initialize $x^0$
- Pick a point $z^k$ at random from $N_{\epsilon}(x^k)$
- If $f(z^k)<f(x^k)$, then $x^{k+1}=z^k$; else $x^{k+1}=x^k$
- If some stop criterion is satisfied, then stop
- $k$++; goto 2
Problem: may get stuck in a local optimum.
One remedy: enlarge $N_{\epsilon}(x)$.
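The loop above, sketched in Python (an $\epsilon$-box stands in for $N_{\epsilon}(x)$, and the test function is an assumption):

```python
import random

def naive_random_search(f, x0, eps=0.5, iters=2000, seed=0):
    """Naive random search: sample z^k from a box around x^k and
    keep it only if it strictly improves f."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        z = [xi + rng.uniform(-eps, eps) for xi in x]  # z^k in N_eps(x^k)
        if f(z) < f(x):                                # accept improvements only
            x = z
    return x

x = naive_random_search(lambda p: (p[0] - 1.0)**2 + (p[1] + 2.0)**2, [0.0, 0.0])
```

On a unimodal function like this quadratic the loop homes in on the minimizer; on multimodal functions it can stall at whichever local minimum it reaches first.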
Simulated Annealing
The loop is the same as in naive random search, except the acceptance step becomes:
- Toss a coin with probability of HEAD equal to $p(k,f(x^k),f(z^k))$. If HEAD, then $x^{k+1}=z^k$; else $x^{k+1}=x^k$
$$p(k,f(x^k),f(z^k))=\min\Big\{1,\exp\Big(-\frac{f(z^k)-f(x^k)}{T_k}\Big)\Big\}$$
where $T_k$ is a positive sequence.
A typical choice is $T_k=\frac{r}{\log(k+2)}$, $r>0$, which decreases monotonically to 0.
$$\begin{cases} f(z^k)<f(x^k): & x^{k+1}=z^k \text{ (with probability } 1\text{)} \\ f(z^k)\geq f(x^k): & x^{k+1}=z^k \text{ (with probability } \exp(-\frac{f(z^k)-f(x^k)}{T_k})\text{)} \end{cases}$$
As $k\to \infty$, the "escape" probability decreases.
Note: by occasionally accepting worse points via the coin toss, this method addresses the problem of naive random search getting trapped in local minima.
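A sketch of the full loop with the $T_k=r/\log(k+2)$ schedule (the box neighborhood and the test function are assumptions; returning the best visited point is a common practical addition):

```python
import math
import random

def simulated_annealing(f, x0, eps=0.5, r=1.0, iters=4000, seed=0):
    """Simulated annealing with cooling schedule T_k = r / log(k + 2)."""
    rng = random.Random(seed)
    x, best = list(x0), list(x0)
    for k in range(iters):
        t = r / math.log(k + 2)                        # T_k, decreasing to 0
        z = [xi + rng.uniform(-eps, eps) for xi in x]  # candidate z^k
        d = f(z) - f(x)
        # Accept with probability min{1, exp(-d / T_k)}:
        # always if d < 0, with decaying probability otherwise.
        if d < 0 or rng.random() < math.exp(-d / t):
            x = z
            if f(x) < f(best):
                best = list(x)
    return best

best = simulated_annealing(lambda p: (p[0] - 1.0)**2 + (p[1] + 2.0)**2,
                           [0.0, 0.0])
```

Early on, $T_k$ is large and uphill moves are accepted often (global exploration); as $T_k\to 0$ the loop behaves like naive random search around the current basin.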
Particle Swarm Optimization (PSO)
Swarm size: $|P|=m$
$\forall i: p_i^{best}$: the best point visited by particle $i$
$g^{best}$: globally best point
Basic PSO
- $k:=0$, generate initial random points $\langle p_i^0,v_i^0\rangle$; set $p_i^{best}=p_i^0$, $g^{best}=\arg\min_i f(p_i^0)$
- For $i=1,\cdots,m$: generate random vectors $r_i^k, s_i^k$ with components drawn from $[0,1]$, and with $\omega<1$, $c_1,c_2\approx 2$, set $V_i^{k+1}=\omega V_i^k+c_1r_i^k(p_i^{best,k}-p_i^k)+c_2s_i^k(g^{best,k}-p_i^k)$ (componentwise products), $p_i^{k+1}=p_i^k+V_i^{k+1}$
- For $i=1,\cdots,m$: if $f(p_i^{k+1})<f(p_i^{best,k})$, then $p_i^{best,k+1}=p_i^{k+1}$; else $p_i^{best,k+1}=p_i^{best,k}$
- If $\exists i\in \{1,\cdots,m\}$ with $f(p_i^{k+1})<f(g^{best,k})$, then $g^{best,k+1}=p_i^{k+1}$; else $g^{best,k+1}=g^{best,k}$
- If some stop criterion is satisfied, then stop
- $k$++; goto 2
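The iteration above as a Python sketch (here $\omega=0.7$ and $c_1=c_2=1.5$ are chosen for stable convergence, slightly below the $c_1,c_2\approx 2$ in the notes; the initial box $[-5,5]^n$ and test function are assumptions):

```python
import random

def pso(f, dim, m=30, iters=200, omega=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic PSO: velocity update pulls each particle toward its own best
    and the global best, with random componentwise weights in [0, 1]."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(m)]
    vel = [[0.0] * dim for _ in range(m)]
    pbest = [list(p) for p in pos]
    gbest = min(pbest, key=f)
    for _ in range(iters):
        for i in range(m):
            for d in range(dim):
                r, s = rng.random(), rng.random()   # components of r_i^k, s_i^k
                vel[i][d] = (omega * vel[i][d]
                             + c1 * r * (pbest[i][d] - pos[i][d])
                             + c2 * s * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):             # update personal best
                pbest[i] = list(pos[i])
                if f(pos[i]) < f(gbest):            # update global best
                    gbest = list(pos[i])
    return gbest

best = pso(lambda p: (p[0] - 1.0)**2 + (p[1] + 2.0)**2, dim=2)
```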
Genetic Algorithms
representation scheme: ① selection ② crossover ③ mutation
Algorithm outline:
- $P_0$
- Selection $\rightarrow M_k$
- Cross Over
- Mutation
- If some stop criterion is satisfied, then stop
- goto 2
For ease of exposition, assume below that we maximize rather than minimize.
Selection
population set: $|P(k)|=N$, $P(k)=\{x_1,\cdots,x_N\}$, $|M(k)|=N$
Note: the goal of selection is to pick $N$ elements from the size-$N$ population set ($P$) to form $M$.
Roulette-Wheel
$$Prob(x_i\to M(k))=\frac{f(x_i)}{F(k)},\quad F(k)=\sum_{i=1}^N f(x_i)$$
Tournament Scheme
Pick two elements $x_i,x_j$ at random; if $f(x_i)>f(x_j)$, select $x_i$ into $M$.
Cross Over
Pick two elements $x_i,x_j$ at random and join the first half of $x_i$ with the second half of $x_j$ to form a new element.
Mutation
With a low probability, mutate one position of an element $x_i$.
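The three operators combine into a sketch on bit strings (the OneMax fitness, population size, and mutation rate are illustrative assumptions, not from the notes):

```python
import random

def genetic_algorithm(fitness, length, pop_size=40, gens=60,
                      p_mut=0.01, seed=0):
    """GA sketch (maximization) on bit strings: roulette-wheel selection,
    one-point crossover at the midpoint, per-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(gens):
        # Roulette-wheel selection: Prob(x_i -> M) = f(x_i) / F
        weights = [fitness(x) for x in pop]
        mating = rng.choices(pop, weights=weights, k=pop_size)
        # Crossover: first half of x_i joined with second half of x_j
        half = length // 2
        kids = []
        for i in range(0, pop_size, 2):
            a, b = mating[i], mating[i + 1]
            kids += [a[:half] + b[half:], b[:half] + a[half:]]
        # Mutation: flip each bit with low probability p_mut
        pop = [[bit ^ 1 if rng.random() < p_mut else bit for bit in kid]
               for kid in kids]
    return max(pop, key=fitness)

best = genetic_algorithm(sum, length=20)  # OneMax: maximize the number of 1s
```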
Constrained Optimization
min $f(x)$
s.t. $x\in \Omega$
Linear Programming (LP)
min/max $f(x)=c^Tx=\sum_{i=1}^n c_i x_i$, $c\in \mathbb{R}^n$, $x \in \mathbb{R}^n$
s.t.
$$\begin{cases} a_{11}x_1+\cdots+a_{1n}x_n>b_1\\ a_{21}x_1+\cdots+a_{2n}x_n\leq b_2\\ \cdots\\ a_{m1}x_1+\cdots+a_{mn}x_n\geq b_m \end{cases}$$
$b_i\in\mathbb{R}$, $\forall\, 1\leq i\leq m$
$a_{ij}\in\mathbb{R}$, $\forall\, 1\leq i\leq m$, $1\leq j\leq n$
This general form is complex, which motivates two simpler forms.
LP Standard Form
min $c^Tx$
s.t. $Ax\geq b$
Normal Form
min $c^Tx$
s.t. $Ax=b$, $x\geq 0$
Note: to satisfy $x\geq 0$, if $x_i$ is required to be $\leq 0$, substitute $-x_i$ for $x_i$; if $x_i$ is unrestricted, set $x_i=u-v$ with $u,v\geq 0$.
Example
max $x_2-x_1$
s.t. $3x_1=x_2-5$, $|x_2|\leq 2$, $x_1\leq 0$
① min $x_1-x_2$
② $x_1\leftarrow -x_1$
③ $|x_2|\leq 2 \Rightarrow x_2\leq 2,\ x_2\geq -2$
④ $x_2=u-v$, $u,v\geq 0$
min $-x_1-(u-v)$
s.t. $-3x_1=u-v-5$, $u-v\leq 2$, $u-v\geq -2$, $x_1,u,v\geq 0$
Adding slack/surplus variables $y,z\geq 0$ to turn the inequalities into equalities:
min $-x_1-u+v$
s.t. $3x_1+u-v=5$, $u-v+y=2$, $u-v-z=-2$, $x_1,u,v,y,z\geq 0$
Theorem
For each LP, there exists an equivalent LP in normal form.
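The normal form derived in the example can be checked numerically; a sketch using `scipy.optimize.linprog` (an assumed external dependency, not from the notes), with variables ordered $[x_1, u, v, y, z]$ and linprog's default nonnegativity bounds:

```python
from scipy.optimize import linprog

# Normal form of the example: min -x1 - u + v
# s.t. 3x1 + u - v = 5,  u - v + y = 2,  u - v - z = -2,
#      x1, u, v, y, z >= 0 (linprog's default bounds)
c = [-1, -1, 1, 0, 0]                 # objective over [x1, u, v, y, z]
A_eq = [[3, 1, -1, 0, 0],
        [0, 1, -1, 1, 0],
        [0, 1, -1, 0, -1]]
b_eq = [5, 2, -2]
res = linprog(c, A_eq=A_eq, b_eq=b_eq)
# Optimal value -3 matches the original problem: max x2 - x1 = 3
# at x2 = 2, x1 = -1 (recall x1 was negated in step ②).
```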
Summary
This lecture first introduced several global search methods: the Nelder-Mead simplex algorithm, simulated annealing, particle swarm optimization, and genetic algorithms (covered only briefly here; see my other blog post). These are all heuristic algorithms whose theoretical foundations are relatively weak, so the lecture did not expand much beyond presenting them.
This is week eight, the midpoint of the semester. The first half covered unconstrained optimization; the second half turns to constrained optimization, starting from the comparatively simple case of linear programming and the simplex method. This lecture showed that any linear program can be converted to normal form, which simplifies the solution methods to come.