Advanced Optimization Theory and Methods (13)
Non-linear Constrained Optimization
$$\min f(x)$$
$$\text{s.t.}\quad h(x)=0,\quad g(x)\leq 0$$
$x\in \mathbb{R}^n$, $f: \mathbb{R}^n\rightarrow \mathbb{R}$, $h:\mathbb{R}^n\rightarrow \mathbb{R}^m$, $g:\mathbb{R}^n\rightarrow \mathbb{R}^p$
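Note: a linear optimization problem has a linear objective and linear constraints; the problem becomes non-linear as soon as the objective function (or any constraint function) is non-linear.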
Case 1
$$\min f(x)$$
$$\text{s.t.}\quad h(x)=0$$
$h:\mathbb{R}^n\rightarrow \mathbb{R}^m$, $h\in C^1$ (continuously differentiable)
Definition
Def: Let $x^*$ satisfy $h_1(x^*)=0,\cdots,h_m(x^*)=0$. $x^*$ is a regular point if $\nabla h_1(x^*),\cdots,\nabla h_m(x^*)$ are linearly independent.
Jacobian: $Dh(x^*)=\begin{bmatrix} Dh_1(x^*)\\ Dh_2(x^*)\\ \vdots\\ Dh_m(x^*) \end{bmatrix}=\begin{bmatrix} \nabla h_1(x^*)^T\\ \nabla h_2(x^*)^T\\ \vdots\\ \nabla h_m(x^*)^T \end{bmatrix}\in\mathbb{R}^{m\times n}$, so $x^*$ is regular exactly when $\operatorname{rank}Dh(x^*)=m$.
Def: Surface: $S=\{x\in\mathbb{R}^n:h_1(x)=0,\cdots,h_m(x)=0\}$
Example 1
$n=3,m=1,h(x)=x_2-x_3^2$
$Dh(x)=[0,1,-2x_3]$
$\forall x\in\mathbb{R}^3: Dh(x)\neq 0$
$S=\{x:x_2-x_3^2=0\}$
Example 2
$h_1(x)=x_1,h_2(x)=x_2-x_3^2$
$Dh(x)=\begin{bmatrix} 1&0&0\\ 0&1&-2x_3 \end{bmatrix}$
$S=\{x:x_1=0,x_2-x_3^2=0\}$
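The regularity test from the definition can be checked numerically. The following is a minimal sketch, assuming the Jacobian of Example 2 and a hand-written rank routine; the function names `jacobian_example2`, `rank` and `is_regular` are illustrative and not part of the lecture.

```python
def jacobian_example2(x):
    """Dh(x) for h1(x) = x1, h2(x) = x2 - x3**2 (each row is one gradient)."""
    x1, x2, x3 = x
    return [[1.0, 0.0, 0.0],
            [0.0, 1.0, -2.0 * x3]]


def rank(matrix, tol=1e-12):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    rows = [row[:] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_regular(jacobian):
    """A feasible point is regular when Dh has full row rank m."""
    return rank(jacobian) == len(jacobian)


point = (0.0, 4.0, 2.0)  # feasible: x1 = 0 and x2 - x3**2 = 0
print(is_regular(jacobian_example2(point)))
```

Because the first two columns of $Dh(x)$ already form an identity block, the check succeeds for every $x$, which matches the observation that all points of $S$ are regular in this example.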
Necessary/Sufficient Conditions
For the unconstrained problem with $f\in C^2$ and Hessian $F$:
FONC: $x^*$ local minimizer $\Rightarrow \nabla f(x^*)=0$
SONC: $x^*$ local minimizer $\Rightarrow \nabla f(x^*)=0$ and $\forall y:y^T F(x^*)y\geq 0$
SOSC: (1) $\nabla f(x^*)=0$ and (2) $\forall y\neq 0:y^T F(x^*)y> 0$ $\Rightarrow x^*$ is a strict local minimizer
Definition
Def: A curve $C$ on a surface $S$ is a set of points $\{x(t)\in S:t\in(a,b)\}$, where $x(t):\mathbb{R}\rightarrow \mathbb{R}^n$ is a continuous function.
The curve is differentiable if
$$\dot{x}(t)=\frac{dx}{dt}(t)=\begin{bmatrix} \dot{x}_1(t)\\ \dot{x}_2(t)\\ \vdots\\ \dot{x}_n(t) \end{bmatrix}$$
exists for all $t\in (a,b)$, and twice differentiable if
$$\ddot{x}(t)=\frac{d^2x}{dt^2}(t)=\begin{bmatrix} \ddot{x}_1(t)\\ \ddot{x}_2(t)\\ \vdots\\ \ddot{x}_n(t) \end{bmatrix}$$
exists for all $t\in (a,b)$.
Def: The tangent space at $x^*\in S=\{x\in\mathbb{R}^n:h(x)=0\}$ is the set $T(x^*)=\{y:Dh(x^*)y=0\}$
Example
$S=\{x\in \mathbb{R}^3: h_1(x)=x_1=0,h_2(x)=x_1-x_2=0\}$
$Dh(x)=\begin{bmatrix} 1&0&0\\ 1&-1&0 \end{bmatrix}$
All points of $S$ are regular.
$T(x)=\{y:\nabla h_1(x)^Ty=0,\nabla h_2(x)^Ty=0\}=\{[0,0,\alpha]^T:\alpha\in\mathbb{R}\}\Rightarrow$ the $x_3$-axis
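Membership in the tangent space is just the linear test $Dh(x^*)y=0$. A small sketch, assuming the constant Jacobian of this example; `in_tangent_space` is a hypothetical helper name:

```python
def in_tangent_space(jacobian, y, tol=1e-12):
    """Return True when Dh(x*) y = 0, i.e. when y lies in T(x*)."""
    return all(abs(sum(a * b for a, b in zip(row, y))) <= tol
               for row in jacobian)


# Dh(x) for h1(x) = x1 and h2(x) = x1 - x2; it does not depend on x here.
Dh = [[1.0, 0.0, 0.0],
      [1.0, -1.0, 0.0]]

print(in_tangent_space(Dh, [0.0, 0.0, 3.5]))  # a direction along the x3-axis
print(in_tangent_space(Dh, [0.0, 1.0, 0.0]))  # a direction along the x2-axis
```

Only the first direction passes, since the second violates $\nabla h_2(x)^Ty=0$.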
Theorem
Thm: Let $x^*$ be a regular point and $T(x^*)$ the tangent space at $x^*$. Then: $y\in T(x^*)\Leftrightarrow \exists$ a differentiable curve on $S$ passing through $x^*$ with derivative $y$ at $x^*$.
FONC(Lagrange’s Condition)
2-Dimensional
$h: \mathbb{R}^2\rightarrow \mathbb{R}$
Let $x^*=[x_1^*,x_2^*]^T, h(x^*)=0$
Assume $\nabla h(x^*)\neq 0$
Let $x(t):\mathbb{R} \rightarrow \mathbb{R}^2$ be a continuously differentiable curve on $S=\{x:h(x)=0\}$:
$$x(t)=\begin{bmatrix} x_1(t)\\ x_2(t) \end{bmatrix},\quad t\in(a,b),\quad x^*=x(t^*)$$
$\because \forall t\in (a,b): h(x(t))=0$
$\therefore \forall t\in (a,b): \frac{d}{dt}h(x(t))=\nabla h(x(t))^T\dot{x}(t)=0$
$\therefore \nabla h(x^*)$ is orthogonal to $\dot{x}(t^*)$
Assume $x^*=x(t^*)$ is a minimizer of $f(x)$ on $S=\{x:h(x)=0\}$
Define $\phi(t)=f(x(t))$. Since $t^*$ minimizes $\phi$ on $(a,b)$: $\stackrel{FONC}{\Rightarrow} \frac{d\phi}{dt}(t^*)=0$
$$0=\frac{d}{dt}\phi(t^*)=\nabla f(x(t^*))^T\dot{x}(t^*)=\nabla f(x^*)^T\dot{x}(t^*)$$
$\Rightarrow \nabla f(x^*)$ is orthogonal to $\dot{x}(t^*)$
In $\mathbb{R}^2$, $\nabla f(x^*)$ and $\nabla h(x^*)$ are both orthogonal to the same tangent direction, so for some $\lambda$:
$$\nabla f(x^*)=\lambda \nabla h(x^*)$$
Summary: Let $x^*$ be a minimizer of $f:\mathbb{R}^2\rightarrow \mathbb{R}$ subject to $h(x)=0,h:\mathbb{R}^2\rightarrow \mathbb{R}$. Then $\nabla h(x^*)$ and $\nabla f(x^*)$ are parallel.
$\Rightarrow$ If $\nabla h(x^*)\neq 0$, then $\exists \lambda^*$ s.t. $\nabla f(x^*)+\lambda^*\nabla h(x^*)=0$
Lagrange’s Theorem[FONC]
Let $x^*$ be a local minimizer of $f:\mathbb{R}^n\rightarrow\mathbb{R}$ subject to $h(x)=0, h:\mathbb{R}^n\rightarrow\mathbb{R}^m,m\leq n$. Assume $x^*$ is regular. Then $\exists \lambda^*\in \mathbb{R}^m$ s.t. $Df(x^*)+{\lambda^*}^TDh(x^*)=0$
Lagrange’s Function
Lagrange’s function: $l:\mathbb{R}^n\times\mathbb{R}^m\rightarrow \mathbb{R}$
$$l(x,\lambda)=f(x)+\lambda^Th(x)$$
Applying the unconstrained FONC to $l(x,\lambda)$ reproduces Lagrange’s condition together with the constraint:
$$Dl(x^*,\lambda^*)=0\Rightarrow \begin{cases} D_xl(x^*,\lambda^*)=Df(x^*)+{\lambda^*}^TDh(x^*)=0\\ D_{\lambda}l(x^*,\lambda^*)=h(x^*)^T=0 \end{cases}$$
Example 1
Given a cuboid with surface area $A$, find its maximum volume.
$$\max x_1x_2x_3$$
$$\text{s.t.}\quad x_1x_2+x_2x_3+x_1x_3=\frac{A}{2}\quad (A>0)$$
$f(x)=-x_1x_2x_3,h(x)=x_1x_2+x_2x_3+x_1x_3-\frac{A}{2}$
$\nabla f(x)=[-x_2x_3,-x_1x_3,-x_1x_2]^T$
$\nabla h(x)=[x_2+x_3,x_1+x_3,x_1+x_2]^T$
All feasible solutions are regular.
$\lambda\in\mathbb{R}$
$$\begin{cases} \nabla f(x)+\lambda \nabla h(x)=0\\ h(x)=0 \end{cases}\Rightarrow \begin{cases} x_2x_3-\lambda(x_2+x_3)=0\\ x_1x_3-\lambda(x_1+x_3)=0\\ x_1x_2-\lambda(x_1+x_2)=0\\ x_1x_2+x_2x_3+x_1x_3-\frac{A}{2}=0 \end{cases}$$
The maximum is attained at $x_1=x_2=x_3=\sqrt{\frac{A}{6}}$ (with $\lambda=\frac{1}{2}\sqrt{\frac{A}{6}}$).
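The cube solution can be checked against the Lagrange system by direct substitution. A minimal sketch, assuming a sample surface area $A=24$ (an arbitrary choice for illustration); `lagrange_residuals` is a hypothetical helper name:

```python
import math


def lagrange_residuals(x, lam, A):
    """Residuals of grad f + lam * grad h = 0 and h = 0 for the cuboid problem."""
    x1, x2, x3 = x
    grad_f = (-x2 * x3, -x1 * x3, -x1 * x2)        # f(x) = -x1*x2*x3
    grad_h = (x2 + x3, x1 + x3, x1 + x2)           # h(x) = x1x2 + x2x3 + x1x3 - A/2
    stationarity = [gf + lam * gh for gf, gh in zip(grad_f, grad_h)]
    feasibility = x1 * x2 + x2 * x3 + x1 * x3 - A / 2
    return stationarity, feasibility


A = 24.0                  # sample surface area (assumption for illustration)
s = math.sqrt(A / 6)      # candidate edge length x1 = x2 = x3
x_star = (s, s, s)
lam_star = s / 2          # from s*s - lam*(s + s) = 0

stationarity, feasibility = lagrange_residuals(x_star, lam_star, A)
print(stationarity, feasibility, s ** 3)
```

All residuals vanish, and the resulting volume is $s^3=(A/6)^{3/2}$.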
Example 2
$f(x)=x_1^2+x_2^2,h(x)=x_1^2+2x_2^2-1$
$\nabla f(x)=\begin{bmatrix} 2x_1\\ 2x_2 \end{bmatrix},\nabla h(x)=\begin{bmatrix} 2x_1\\ 4x_2 \end{bmatrix}$
All feasible solutions are regular.
$$\begin{cases} \nabla f(x)+\lambda \nabla h(x)=0\\ h(x)=0 \end{cases}\Rightarrow \begin{cases} 2x_1+2\lambda x_1=0\\ 2x_2+4\lambda x_2=0\\ x_1^2+2x_2^2=1 \end{cases}$$
From the first equation, either $x_1=0$ or $\lambda=-1$.
$\lambda=-1\Rightarrow\begin{cases} x_1=\pm 1\\ x_2=0 \end{cases}$
$x_1=0\Rightarrow\begin{cases} \lambda=-\frac{1}{2}\\ x_2=\pm \frac{1}{\sqrt{2}} \end{cases}$
$f(\begin{bmatrix} 1\\ 0 \end{bmatrix})=f(\begin{bmatrix} -1\\ 0 \end{bmatrix})=1$
$f(\begin{bmatrix} 0\\ \frac{1}{\sqrt{2}} \end{bmatrix})=f(\begin{bmatrix} 0\\ -\frac{1}{\sqrt{2}} \end{bmatrix})=\frac{1}{2}$
The minimum value $\frac{1}{2}$ is attained at $x_1=0,x_2=\pm \frac{1}{\sqrt{2}}$.
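The four candidate points can be verified and compared mechanically. A short sketch under the definitions of this example; the helper names are illustrative:

```python
import math


def f(x):
    return x[0] ** 2 + x[1] ** 2


def h(x):
    return x[0] ** 2 + 2 * x[1] ** 2 - 1


def stationarity(x, lam):
    """Components of grad f(x) + lam * grad h(x)."""
    return (2 * x[0] + 2 * lam * x[0], 2 * x[1] + 4 * lam * x[1])


r = 1 / math.sqrt(2)
candidates = [((1.0, 0.0), -1.0), ((-1.0, 0.0), -1.0),
              ((0.0, r), -0.5), ((0.0, -r), -0.5)]

for x, lam in candidates:
    # Largest violation of the Lagrange system at this candidate.
    residual = max(abs(v) for v in stationarity(x, lam) + (h(x),))
    print(x, lam, f(x), residual)

best_value = min(f(x) for x, _ in candidates)
print(best_value)
```

Every candidate satisfies the Lagrange system up to rounding, and comparing the objective values picks out the two minimizers on the $x_2$-axis.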
Example 3
$$\min -x^TQx$$
$$\text{s.t.}\quad x^TPx=1$$
$P,Q>0$ (positive definite), $P^T=P,Q^T=Q$
$f(x)=-x^TQx,h(x)=x^TPx-1$
$l(x,\lambda)=x^TQx+\lambda(1-x^TPx)$ (this is $-(f(x)+\lambda h(x))$, which has the same stationary points as $f(x)+\lambda h(x)$)
$D_xl(x,\lambda)=2x^TQ-2\lambda x^TP=0\Rightarrow (\lambda P-Q)x=0\Rightarrow P^{-1}Qx=\lambda x\Rightarrow \lambda,x$ are an eigenvalue and eigenvector of $P^{-1}Q$
$D_{\lambda}l(x,\lambda)=1-x^TPx=0$
$Qx=\lambda Px$
$\Rightarrow x^TQx=\lambda x^TPx$
$\Rightarrow x^TQx=\lambda$
$\Rightarrow \lambda^*$: the maximal eigenvalue of $P^{-1}Q$, since $x^TQx=\lambda$ is to be maximized
SONC
Assume $f:\mathbb{R}^n\rightarrow \mathbb{R},h:\mathbb{R}^n\rightarrow \mathbb{R}^m$ are twice continuously differentiable.
$l(x,\lambda)=f(x)+\lambda^Th(x)=f(x)+\lambda_1h_1(x)+\cdots+\lambda_mh_m(x)$
$L(x,\lambda)=F(x)+\lambda_1H_1(x)+\cdots+\lambda_mH_m(x)$, where $F$ is the Hessian of $f$ and $H_k$ is the Hessian of $h_k$, so $L$ is the Hessian of $l$ with respect to $x$.
Thm(SONC): Let $x^*$ be a local minimizer of $f:\mathbb{R}^n\rightarrow \mathbb{R}$ subject to $h(x)=0,h:\mathbb{R}^n\rightarrow \mathbb{R}^m,m\leq n,f,h\in C^2$. Assume $x^*$ is regular. Then $\exists \lambda^*\in \mathbb{R}^m$ s.t.
$$\begin{cases} Df(x^*)+{\lambda^*}^TDh(x^*)=0\\ \forall y\in T(x^*)=\{y:Dh(x^*)y=0\}:y^TL(x^*,\lambda^*)y\geq 0 \end{cases}$$
SOSC
Let $f,h\in C^2$. If $\exists x^*\in\mathbb{R}^n,\lambda^*\in \mathbb{R}^m$ s.t.
- $Df(x^*)+{\lambda^*}^TDh(x^*)=0$
- $\forall y\in T(x^*),y\neq 0:y^TL(x^*,\lambda^*)y>0$
then $x^*$ is a strict local minimizer of $f(x)$ subject to $h(x)=0$
Example 1
$$\max x^TQx$$
$$\text{s.t.}\quad x^TPx=1$$
$Q=\begin{bmatrix} 4&0\\ 0&1 \end{bmatrix},P=\begin{bmatrix} 2&0\\ 0&1 \end{bmatrix}$
$P^{-1}Q=\begin{bmatrix} 2&0\\ 0&1 \end{bmatrix}$
$\Rightarrow \lambda_1=2,\lambda_2=1$
$\Rightarrow \lambda^*=2$
$\Rightarrow x^*=[\frac{1}{\sqrt{2}},0]^T$ or $x^*=[-\frac{1}{\sqrt{2}},0]^T$
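The second-order sufficient condition can be checked at this candidate by hand or numerically. The sketch below assumes the minimization form $f(x)=-x^TQx$, $h(x)=x^TPx-1$ with $l=f+\lambda h$, so that $L=-2Q+2\lambda P$; this sign convention and the helper names are choices made for the illustration.

```python
import math

Q = [[4.0, 0.0], [0.0, 1.0]]
P = [[2.0, 0.0], [0.0, 1.0]]
lam = 2.0
x = [1 / math.sqrt(2), 0.0]


def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


Qx, Px = matvec(Q, x), matvec(P, x)

# First-order condition: grad_x l = -2 Q x + 2 lam P x = 0, and x^T P x = 1.
grad_l = [-2 * q + 2 * lam * p for q, p in zip(Qx, Px)]
constraint = dot(x, Px) - 1

# Hessian of the Lagrangian: L = -2 Q + 2 lam P.
L = [[-2 * Q[i][j] + 2 * lam * P[i][j] for j in range(2)] for i in range(2)]

# Tangent space: Dh(x*) y = 2 x^T P y = 0, which here forces y1 = 0.
Dh = [2 * p for p in Px]
y = [0.0, 1.0]
curvature = dot(y, matvec(L, y))

print(grad_l, constraint, L, dot(Dh, y), curvature)
```

Here $L=\begin{bmatrix}0&0\\0&2\end{bmatrix}$ is only positive semidefinite on all of $\mathbb{R}^2$, but it is positive definite on the tangent space $\{[0,\alpha]^T\}$, which is all the SOSC requires.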
Example 2
Consider
$$\min \frac{1}{2}x^TQx$$
$$\text{s.t.}\quad Ax=b$$
$Q>0,Q=Q^T,A\in\mathbb{R}^{m\times n},m\leq n, b\in\mathbb{R}^m,\operatorname{rank}A=m$
$l(x,\lambda)=\frac{1}{2}x^TQx+\lambda^T(b-Ax)$
$D_xl(x,\lambda)=x^TQ-\lambda^TA=0$
$\Rightarrow x=Q^{-1}A^T\lambda$
$\Rightarrow Ax=AQ^{-1}A^T\lambda=b$
$\Rightarrow \lambda=(AQ^{-1}A^T)^{-1}b$
$\Rightarrow x=Q^{-1}A^T(AQ^{-1}A^T)^{-1}b$
$L(x,\lambda)=Q>0$, so the SOSC holds and this $x$ is a strict local minimizer.
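The closed-form solution is easy to evaluate in a special case. The sketch below assumes a diagonal $Q=\operatorname{diag}(q)$ with $q_i>0$ and a single constraint $a^Tx=b$ (so $m=1$ and $AQ^{-1}A^T$ is a scalar); the general case would need a linear solver, and `solve_equality_qp_diag` is a hypothetical name.

```python
def solve_equality_qp_diag(q, a, b):
    """Solve min 1/2 x^T Q x s.t. a^T x = b with Q = diag(q), q_i > 0 (m = 1).

    Uses lam = (A Q^{-1} A^T)^{-1} b and x = Q^{-1} A^T lam.
    """
    q_inv_a = [ai / qi for ai, qi in zip(a, q)]           # Q^{-1} A^T
    schur = sum(ai * wi for ai, wi in zip(a, q_inv_a))    # A Q^{-1} A^T (a scalar)
    lam = b / schur
    x = [lam * wi for wi in q_inv_a]
    return x, lam


q, a, b = [2.0, 1.0], [1.0, 1.0], 3.0   # sample data (assumption)
x, lam = solve_equality_qp_diag(q, a, b)

# Check feasibility a^T x = b and stationarity Q x - A^T lam = 0.
feasibility = sum(ai * xi for ai, xi in zip(a, x)) - b
stationarity = [qi * xi - lam * ai for qi, xi, ai in zip(q, x, a)]
print(x, lam, feasibility, stationarity)
```

For this sample data the solver returns $x=[1,2]^T$ and $\lambda=2$, and both residuals are zero.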
Case 2
$$\min f(x)$$
$$\text{s.t.}\quad h(x)=0,\quad g(x)\leq 0$$
$f:\mathbb{R}^n\rightarrow \mathbb{R}$
$h:\mathbb{R}^n\rightarrow \mathbb{R}^m,m\leq n$
$g:\mathbb{R}^n\rightarrow \mathbb{R}^p$
Definition
Def: An inequality constraint $g_j(x)\leq 0$ is called active at $x^*$ if $g_j(x^*)=0$; otherwise it is inactive.
Def: Let $x^*$ satisfy $h(x^*)=0$ and $g(x^*)\leq 0$, and let $J(x^*)=\{j: g_j(x^*)=0\}$. $x^*$ is called regular if $\nabla h_i(x^*)$ for all $1\leq i\leq m$ and $\nabla g_j(x^*)$ for all $j\in J(x^*)$ are linearly independent.
KKT-Theorem(FONC)
Let $f,h,g\in C^1$, and let $x^*$ be a regular point and a local minimizer of $f(x)$ subject to $h(x)=0$ and $g(x)\leq 0$. Then there exist $\lambda^*\in\mathbb{R}^m$ and $\mu^*\in\mathbb{R}^p$ s.t.
- $\mu^*\geq 0$
- $Df(x^*)+{\lambda^*}^TDh(x^*)+{\mu^*}^TDg(x^*)=0$
- ${\mu^*}^Tg(x^*)=0$
Example 1
$$\min -\frac{400R}{(10+R)^2}$$
$$\text{s.t.}\quad -R\leq 0$$
$\nabla f(R)=-\frac{400(10-R)}{(10+R)^3}$
KKT conditions together with feasibility:
$$\begin{cases} \mu^*\geq 0\\ Df(x^*)+{\lambda^*}^TDh(x^*)+{\mu^*}^TDg(x^*)=0\\ {\mu^*}^T g(x^*)=0\\ g(x^*)\leq 0\\ h(x^*)=0 \end{cases}$$
With $g(R)=-R$ and no equality constraint:
$$\Rightarrow \begin{cases} \mu\geq 0\\ -\frac{400(10-R)}{(10+R)^3}-\mu=0\\ \mu R=0\\ R\geq 0 \end{cases}$$
If $\mu>0$, then $R=0,\mu=-4$ (✕)
If $\mu=0\Rightarrow R=10$ (✓)
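The case analysis above can be replayed in a few lines. A minimal sketch for this one-dimensional example, assuming $g(R)=-R$ so that $Dg=-1$; `kkt_holds` is a hypothetical helper and the tolerance is an arbitrary choice:

```python
def grad_f(R):
    """Derivative of f(R) = -400 R / (10 + R)**2."""
    return -400.0 * (10.0 - R) / (10.0 + R) ** 3


def kkt_holds(R, mu, tol=1e-9):
    """Check mu >= 0, Df + mu * Dg = 0 (Dg = -1), mu * g = 0 and R >= 0."""
    stationarity = grad_f(R) - mu
    return (mu >= -tol
            and abs(stationarity) <= tol
            and abs(mu * R) <= tol
            and R >= -tol)


print(kkt_holds(R=10.0, mu=0.0))           # inactive constraint, mu = 0
print(kkt_holds(R=0.0, mu=grad_f(0.0)))    # active constraint forces mu = -4
```

The second call fails only because the multiplier required by stationarity is negative, which is exactly the rejected case in the analysis.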
Example 2
$$\min -\frac{4000}{(10+R)^2}$$
$$\text{s.t.}\quad -R\leq 0$$
$\nabla f(R)=\frac{8000}{(10+R)^3}$
KKT: $\begin{cases} \mu\geq 0\\ \frac{8000}{(10+R)^3}-\mu=0\\ \mu R=0\\ R\geq 0 \end{cases}$
$\mu=0\Rightarrow$ no solution (✕)
$\mu>0\Rightarrow R=0,\mu=8$ (✓)
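Summary
This lecture introduced non-linear constrained optimization problems. Depending on the constraints, the problem was split into two cases: the first has only equality constraints, and the second has both equality and inequality constraints. For the first case, the focus was Lagrange's condition, which was derived in the two-dimensional setting. Since Lagrange's condition is a first-order necessary condition (FONC), the method of Lagrange multipliers, which uses this condition to find extrema, was introduced next. The second-order necessary condition (SONC) and the second-order sufficient condition (SOSC) were then covered briefly. Finally, the second case was considered and the KKT conditions were stated.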