Algorithms for Constrained Optimization
Basic idea: reuse the iterative scheme from unconstrained optimization, $x^{k+1}=x^k+\alpha^k d^k$. The difficulty is making the iterates satisfy the constraints.
Projection
Idea
If $x^k+\alpha^k d^k\in\Omega$, then $x^{k+1}=x^k+\alpha^k d^k$; else, $x^{k+1}$ is the "projection" of $x^k+\alpha^k d^k$ onto $\Omega$.
Example
$\Omega=\{x:l_i\leq x_i\leq u_i,\ \forall i\}$, where $l_i,u_i\in\mathbb{R}$.
$$y_i=\begin{cases} u_i, & x_i\geq u_i\\ x_i, & l_i<x_i<u_i\\ l_i, & x_i\leq l_i \end{cases}$$
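As a minimal sketch, this componentwise clipping can be written directly with NumPy (the helper name `project_box` and the example values are illustrative):

```python
import numpy as np

def project_box(x, l, u):
    """Componentwise projection onto the box {x : l_i <= x_i <= u_i}."""
    return np.minimum(np.maximum(x, l), u)

# Project (3, -5, 0.5) onto the box [-1, 1]^3.
print(project_box(np.array([3.0, -5.0, 0.5]), -1.0, 1.0))  # [ 1.  -1.   0.5]
```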
Method
"Projection of $x$ on $\Omega$": $\pi[x]:=$ the closest point of $\Omega$ to $x$.
$$\pi[x^k+\alpha^k d^k]=\arg\min_{z\in\Omega}\|z-(x^k+\alpha^k d^k)\|$$
Projected gradient method: $x^{k+1}=\pi[x^k-\alpha^k \nabla f(x^k)]$, where $\alpha^k=\arg\min_{\alpha\geq 0} f(x^k-\alpha \nabla f(x^k))$.
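A minimal sketch of the projected gradient iteration, assuming a box constraint (so the projection is the clip above) and a fixed step size in place of the exact line search:

```python
import numpy as np

def projected_gradient(grad_f, project, x0, alpha=0.1, iters=200):
    """x^{k+1} = pi[x^k - alpha * grad f(x^k)], with a constant step size."""
    x = project(x0)
    for _ in range(iters):
        x = project(x - alpha * grad_f(x))
    return x

# Minimize f(x) = ||x - c||^2 over the box [-1, 1]^2 (illustrative instance).
c = np.array([2.0, 0.3])
x_star = projected_gradient(lambda x: 2 * (x - c),
                            lambda x: np.clip(x, -1.0, 1.0),
                            x0=np.zeros(2))
print(x_star)  # approaches [1.0, 0.3]
```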
Problem
$\min\limits_{z\in\Omega} \|z-(x^k+\alpha^k d^k)\|$ is in general difficult to solve.
Solution
Orthogonal Projector
min $f(x)$ s.t. $Ax=b$, where $A\in \mathbb{R}^{m\times n}$, $m\leq n$, $\operatorname{rank}A=m$.
Definition
Def: Orthogonal projector: $P=I_n-A^T(AA^T)^{-1}A$
Remark
$P=P^T$, $P^2=P\times P=P$
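These properties are easy to check numerically; a sketch with one hypothetical $A$ (here $m=1$, $n=3$):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0]])  # rank A = m = 1 <= n = 3
n = A.shape[1]
P = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)  # P = I_n - A^T (A A^T)^{-1} A

print(np.allclose(P, P.T))      # True: P = P^T
print(np.allclose(P @ P, P))    # True: P^2 = P
print(np.allclose(P @ A.T, 0))  # True: Pv = 0 for v in the range of A^T (the Lemma below)
```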
Lemma
Let $v\in \mathbb{R}^n$. Then $Pv=0\Leftrightarrow v\in\{x:x=A^Ty,\ y\in\mathbb{R}^m\}$, i.e., $v$ lies in the range of $A^T$.
Theorem
Let $x^*\in\mathbb{R}^n$ be a feasible point. Then $P\nabla f(x^*)=0\Leftrightarrow x^*$ satisfies the Lagrange condition.
Projection
$$x^{k+1}=\pi[x^k-\alpha^k\nabla f(x^k)]=x^k-\alpha^k P\nabla f(x^k)$$
Projected steepest descent
$\alpha^k=\arg\min_{\alpha\geq 0} f(x^k-\alpha P\nabla f(x^k))$
Properties
If $x^0$ is feasible, then $x^k$ is feasible for all $k$.
Theorem
Let $x^k$ be generated by projected steepest descent. If $P\nabla f(x^k)\neq 0$, then $f(x^{k+1})<f(x^k)$.
Properties
$x^*$ is a global minimizer of a convex function $f$ over $\{x:Ax=b\}$ $\Leftrightarrow$ $P\nabla f(x^*)=0$.
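A minimal sketch of projected steepest descent on an illustrative instance, min $\frac{1}{2}\|x\|^2$ s.t. $x_1+x_2=1$ (the minimizer is $(0.5,0.5)$):

```python
import numpy as np

A = np.array([[1.0, 1.0]])
b = np.array([1.0])
P = np.eye(2) - A.T @ np.linalg.solve(A @ A.T, A)

x = np.array([1.0, 0.0])       # feasible start: Ax = b
for _ in range(50):
    d = P @ x                  # P grad f(x); here grad f(x) = x
    if np.allclose(d, 0):
        break                  # P grad f = 0: Lagrange condition holds
    alpha = (d @ x) / (d @ d)  # exact line search for this quadratic
    x = x - alpha * d
print(x, A @ x - b)            # -> [0.5 0.5], stays feasible throughout
```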
Lagrange’s Algorithm
min $f(x)$ s.t. $h(x)=0$, where $h: \mathbb{R}^n\rightarrow \mathbb{R}^m$, with Lagrangian $l(x,\lambda)=f(x)+\lambda^T h(x)$.
Lagrange's algorithm:
$$\begin{cases} x^{k+1}=x^k-\alpha^k(\nabla f(x^k)+Dh(x^k)^T\lambda^k)\\ \lambda^{k+1}=\lambda^k+\beta^k h(x^k) \end{cases}$$
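A minimal sketch of these coupled updates on an illustrative instance, min $\|x\|^2$ s.t. $x_1+x_2-1=0$ (solution $x^*=(0.5,0.5)$, $\lambda^*=-1$); the fixed step sizes are assumptions:

```python
import numpy as np

grad_f = lambda x: 2 * x
h = lambda x: np.array([x[0] + x[1] - 1.0])
Dh = lambda x: np.array([[1.0, 1.0]])  # Jacobian of h

x, lam = np.zeros(2), np.zeros(1)
alpha, beta = 0.1, 0.1
for _ in range(2000):
    x_new = x - alpha * (grad_f(x) + Dh(x).T @ lam)
    lam = lam + beta * h(x)            # ascent step on the multiplier
    x = x_new
print(x, lam)                          # approaches (0.5, 0.5) and -1
```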
Theorem
Suppose $(x^*,\lambda^*)$ satisfies $\nabla f(x^*)+Dh(x^*)^T\lambda^*=0$ and $L(x^*,\lambda^*)\geq 0$ (the Hessian of the Lagrangian). Provided $\alpha,\beta$ are sufficiently small, there exists a neighborhood of $(x^*,\lambda^*)$ such that if the initial point $(x^0,\lambda^0)$ lies in this neighborhood, the algorithm converges to $(x^*,\lambda^*)$ with at least linear order.
min $f(x)$ s.t. $g(x)\leq 0$, with Lagrangian $l(x,\mu)=f(x)+\mu^T g(x)$.
$$x^{k+1}=x^k-\alpha^k(\nabla f(x^k)+Dg(x^k)^T\mu^k)$$
$$\mu^{k+1}=[\mu^k+\beta^k g(x^k)]_+=\max\{\mu^k+\beta^k g(x^k),0\}$$
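The same sketch adapted to the inequality case, where $[\cdot]_+$ keeps the multipliers nonnegative (illustrative instance: min $(x-2)^2$ s.t. $x-1\leq 0$, so $x^*=1$, $\mu^*=2$):

```python
import numpy as np

grad_f = lambda x: 2 * (x - 2.0)
g = lambda x: np.array([x[0] - 1.0])
Dg = lambda x: np.array([[1.0]])

x, mu = np.zeros(1), np.zeros(1)
alpha, beta = 0.05, 0.05
for _ in range(5000):
    x_new = x - alpha * (grad_f(x) + Dg(x).T @ mu)
    mu = np.maximum(mu + beta * g(x), 0.0)  # [.]_+ : clip multipliers at 0
    x = x_new
print(x, mu)                                # approaches 1 and 2
```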
Theorem
Suppose $(x^*,\mu^*)$ satisfies the KKT conditions and $L(x^*,\mu^*)\geq 0$. Provided $\alpha,\beta$ are sufficiently small, there exists a neighborhood of $(x^*,\mu^*)$ such that if the initial point $(x^0,\mu^0)$ lies in it, the algorithm converges to $(x^*,\mu^*)$ with at least linear order.
Penalty Function
min $f(x)$ s.t. $x\in\Omega$ $\quad\Rightarrow\quad$ min $f(x)+rP(x)$
where $r\in \mathbb{R}^+$ is the penalty parameter and $P:\mathbb{R}^n\rightarrow \mathbb{R}$ is the penalty function.
Definition
$P$ is a penalty function if
(1) $P$ is continuous;
(2) $P(x)\geq 0,\ \forall x\in\mathbb{R}^n$;
(3) $P(x)=0\Leftrightarrow x\in\Omega$.
min $f(x)$ s.t. $g_i(x)\leq 0$ $\quad\Rightarrow\quad$ $P(x)=\sum\limits_i g_i^+(x)$, where $g_i^+(x)=\max\{0,g_i(x)\}$.
Example
$g_1(x)=x-2$, $g_2(x)=-(x+1)^3$
$$g_1^+(x)=\begin{cases} 0, & x\leq 2\\ x-2, & \text{otherwise} \end{cases}\qquad g_2^+(x)=\begin{cases} 0, & x\geq -1\\ -(x+1)^3, & \text{otherwise} \end{cases}$$
$$P(x)=\begin{cases} x-2, & x>2\\ 0, & -1\leq x \leq 2\\ -(x+1)^3, & x<-1 \end{cases}$$
Def: Courant-Beltrami penalty: $P(x)=\sum_{i=1}^p (g_i^+(x))^2$
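A sketch of the penalty approach on the example above, with an assumed objective $f(x)=(x+3)^2$ whose unconstrained minimizer $x=-3$ is infeasible; the constrained minimizer over the feasible set $[-1,2]$ is $x=-1$, and a crude grid search shows the penalized minimizer approaching it as $r$ grows:

```python
import numpy as np

f = lambda x: (x + 3.0)**2                      # assumed objective
g1p = lambda x: np.maximum(0.0, x - 2.0)        # g_1^+
g2p = lambda x: np.maximum(0.0, -(x + 1.0)**3)  # g_2^+
P = lambda x: g1p(x)**2 + g2p(x)**2             # Courant-Beltrami penalty

xs = np.linspace(-4.0, 4.0, 100001)             # crude 1-D grid minimization
for r in (1.0, 10.0, 100.0):
    x_r = xs[np.argmin(f(xs) + r * P(xs))]
    print(r, x_r)                               # x_r -> -1 as r increases
```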
Multi-objective Optimization
min $f(x)=\begin{bmatrix} f_1(x)\\ f_2(x)\\ \vdots \\ f_l(x) \end{bmatrix}$ s.t. $x\in\Omega$
Pareto-optimal
Pareto-optimal: $x^*\in\Omega$ is Pareto-optimal if there is no $x\in\Omega$ such that $f_i(x)\leq f_i(x^*)$ for all $i=1,\cdots,l$ and $f_i(x)<f_i(x^*)$ for some $i$.
Multi to Single
① Weighted sum: $f(x)=\sum w_i f_i(x)$ (see the sketch after this list).
② MiniMax: $f(x)=\max\limits_i\{f_i(x)\}$
③ p-norm: $f(x)=\|(f_1(x),\cdots,f_l(x))\|_p=\left(f_1^p(x)+\cdots+f_l^p(x)\right)^{1/p}$; minimizing this is equivalent to minimizing $f_1^p(x)+\cdots+f_l^p(x)$.
④ Satisfactory: min $f_1(x)$ s.t. $f_2(x)\leq b_2,\cdots,f_l(x)\leq b_l$
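A sketch of the weighted-sum approach ① on an illustrative two-objective problem, $f_1(x)=(x-1)^2$ and $f_2(x)=(x+1)^2$ over $\Omega=\mathbb{R}$; sweeping the weight traces out Pareto-optimal points:

```python
import numpy as np

f1 = lambda x: (x - 1.0)**2
f2 = lambda x: (x + 1.0)**2

xs = np.linspace(-2.0, 2.0, 40001)  # crude 1-D grid minimization
for w in (0.1, 0.5, 0.9):
    x_w = xs[np.argmin(w * f1(xs) + (1 - w) * f2(xs))]
    print(w, round(float(x_w), 3))  # x_w = 2w - 1, inside the Pareto set [-1, 1]
```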
Summary
This lecture covered algorithms for constrained optimization, divided into projection methods and penalty function methods. In the projection method, the orthogonal projector is introduced to deal with the difficulty of finding constrained minimizers inside the iterative scheme. In the penalty function method, a penalty function is introduced to penalize points that fall outside the constraint region. Finally, multi-objective optimization was briefly introduced; it is harder and has less established theory, so only Pareto optimality and several ways of converting a multi-objective problem into a single-objective one were covered. This concludes the material on optimization theory and methods.
Exam Focus
Applications of FONC, SONC, and SOSC.
Applications of optimization methods such as the gradient method, Newton's method, and the conjugate gradient method.
The simplex method, Lagrange conditions, and KKT conditions.
Pure application, no proofs; five major problems.
The exam focuses on familiarity with the methods, emphasizing procedure over computation.