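An important property of convex functions is that every local minimum is also a global minimum. If the function is strictly convex it has at most one minimizer, and if no local minimum exists then no global minimum exists either. For a convex optimization problem, the search for the global optimum therefore reduces to the search for a local optimum:
1. For unconstrained problems, the local (and hence global) optimum can be found with gradient-based methods;
2. For problems with equality constraints, it can be found with the method of Lagrange multipliers;
3. For problems with both equality and inequality constraints, it can be found from the KKT (Karush-Kuhn-Tucker) conditions.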
Unconstrained optimization problems
Theory of unconstrained optimization
Consider the following optimization problem:
$$\min\limits_{\mathbf{x}} f(\mathbf{x})$$
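For a convex, differentiable objective this problem is straightforward: take the gradient of the objective and set it to zero. Any point that satisfies this condition is an optimal solution: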
$$\min\limits_{\mathbf{x}} f(\mathbf{x}) \to \nabla_{\mathbf{x}} f(\mathbf{x}) = \mathbf{0}$$
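The stationarity condition can also be reached numerically with a gradient method. The following is a minimal sketch, assuming a hypothetical convex quadratic $f(x_1, x_2) = (x_1 - 1)^2 + 2(x_2 + 2)^2$ that does not come from the original text; the step size and iteration count are illustrative choices as well.

```python
def grad_f(x1, x2):
    """Gradient of the hypothetical objective f = (x1 - 1)^2 + 2 (x2 + 2)^2."""
    return 2.0 * (x1 - 1.0), 4.0 * (x2 + 2.0)


def gradient_descent(x1, x2, step=0.1, iterations=500):
    """Repeatedly move against the gradient until it (approximately) vanishes."""
    for _ in range(iterations):
        g1, g2 = grad_f(x1, x2)
        x1 -= step * g1
        x2 -= step * g2
    return x1, x2


x1_opt, x2_opt = gradient_descent(0.0, 0.0)
print(x1_opt, x2_opt)  # close to the stationary point (1, -2)
```

For a quadratic objective the gradient equation is linear and can be solved directly; the iterative form is mainly useful when no closed-form solution is available.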
Unconstrained optimization: worked example
Find the minimum of the following quadratic function:
$$\min\limits_{x_1, x_2} f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2 - 2x_1 - 4x_2 + 3$$
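Take the partial derivatives with respect to $x_1$ and $x_2$ and set them to zero to obtain the stationary point of the objective. Because the objective is convex, this local extremum is also the global minimum.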
$$\left\{ \begin{array}{l} \partial f/\partial x_1 = 2x_1 + x_2 - 2 = 0\\ \partial f/\partial x_2 = x_1 + 2x_2 - 4 = 0 \end{array} \right. \to \left\{ \begin{array}{l} x_1 = 0\\ x_2 = 2 \end{array} \right.$$
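Because the objective is quadratic, the stationarity equations form a linear system. The sketch below solves that system with NumPy and checks convexity through the eigenvalues of the Hessian; the matrix form $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T H \mathbf{x} + \mathbf{c}^T\mathbf{x} + 3$ and the variable names are notation introduced here for illustration.

```python
import numpy as np

# f(x) = 0.5 * x^T H x + c^T x + 3 with the coefficients of the worked example.
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # Hessian of f
c = np.array([-2.0, -4.0])   # linear coefficients


def f(x):
    return 0.5 * x @ H @ x + c @ x + 3.0


# Setting the gradient H x + c to zero gives a linear system.
x_star = np.linalg.solve(H, -c)

# A positive definite Hessian means f is strictly convex,
# so the stationary point is the unique global minimum.
eigenvalues = np.linalg.eigvalsh(H)

print(x_star, f(x_star), eigenvalues)
```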
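Single equality constraint
Theory of optimization with a single equality constraint
Consider the following optimization problem with a single equality constraint: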
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \ h(\mathbf{x}) = 0 \end{array} \right.$$
Different values of $\mathbf{x}$ give different objective values, and each objective value is attained on a whole set of points $\mathbf{x}$; in other words, $f(\mathbf{x})$ can be pictured as a family of level curves (contours). The constraint $h(\mathbf{x}) = 0$ describes a curve or a surface. When the curve or surface $h(\mathbf{x}) = 0$ is exactly tangent to a level curve of $f(\mathbf{x})$, $f(\mathbf{x})$ attains an extremum on the constraint. Tangency means that the gradients of $f(\mathbf{x})$ and $h(\mathbf{x})$ are parallel, that is, linearly dependent: $\nabla_{\mathbf{x}} f(\mathbf{x}) + \lambda \nabla_{\mathbf{x}} h(\mathbf{x}) = \mathbf{0}$. The sign in front of $\lambda$ is purely a convention chosen to keep later expressions tidy; writing $-\lambda$ instead would only flip the sign of the multiplier. Combined with the original equality constraint $h(\mathbf{x}) = 0$, the solution is characterized by:
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \ h(\mathbf{x}) = 0 \end{array} \right. \to \left\{ \begin{array}{l} \nabla_{\mathbf{x}} f(\mathbf{x}) + \lambda \nabla_{\mathbf{x}} h(\mathbf{x}) = \mathbf{0}\\ h(\mathbf{x}) = 0 \end{array} \right.$$
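In other words, solving the constrained problem above is equivalent to finding a stationary point of the unconstrained function $L(\mathbf{x}, \lambda)$. It is written as a minimization here, although the stationary point is in general a saddle point of $L$ rather than a minimum: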
$$\min\limits_{\mathbf{x}, \lambda} L = f(\mathbf{x}) + \lambda h(\mathbf{x}) \to \left\{ \begin{array}{l} \nabla_{\mathbf{x}} L(\mathbf{x}, \lambda) = \nabla_{\mathbf{x}} f(\mathbf{x}) + \lambda \nabla_{\mathbf{x}} h(\mathbf{x}) = \mathbf{0}\\ \nabla_\lambda L(\mathbf{x}, \lambda) = h(\mathbf{x}) = 0 \end{array} \right.$$
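Hence, for a problem with an equality constraint, we can introduce an extra variable $\lambda$ and convert the constrained problem into an unconstrained one, as shown below. The extra variable $\lambda$ is called the Lagrange multiplier, and this way of converting an equality-constrained problem into an unconstrained one is called the method of Lagrange multipliers.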
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \ h(\mathbf{x}) = 0 \end{array} \right. \to \min\limits_{\mathbf{x}, \lambda} L = f(\mathbf{x}) + \lambda h(\mathbf{x})$$
Single equality constraint: worked example
Solve the following convex optimization problem with a single equality constraint:
$$\left\{ \begin{array}{l} \min\limits_{x_1, x_2} f(x_1, x_2) = 0.5x_1^2 + 1.5x_2^2 - x_1 - 2x_2\\ \mathrm{s.t.}\ \ x_1 + x_2 = 1 \end{array} \right.$$
Build the Lagrangian with the method of Lagrange multipliers, which turns the constrained problem above into an unconstrained one:
$$\min\limits_{x_1, x_2, \lambda} L(x_1, x_2, \lambda) = 0.5x_1^2 + 1.5x_2^2 - x_1 - 2x_2 + \lambda (x_1 + x_2 - 1)$$
Then solve the following system of equations to find the values of $x_1$, $x_2$ and $\lambda$ at the stationary point of the unconstrained function $L$:
$$\left\{ \begin{array}{l} \partial L/\partial x_1 = x_1 - 1 + \lambda = 0\\ \partial L/\partial x_2 = 3x_2 - 2 + \lambda = 0\\ \partial L/\partial \lambda = x_1 + x_2 - 1 = 0 \end{array} \right. \to \left\{ \begin{array}{l} x_1 = 0.5\\ x_2 = 0.5\\ \lambda = 0.5 \end{array} \right.$$
The minimum of the original problem is therefore:
$$f_{\min} = 0.5x_1^2 + 1.5x_2^2 - x_1 - 2x_2 = -1$$
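The three stationarity equations of this example are linear in $(x_1, x_2, \lambda)$, so they can be checked numerically. The sketch below solves them with NumPy and also verifies the tangency condition $\nabla f + \lambda \nabla h = \mathbf{0}$; the matrix layout and variable names are choices made here for illustration.

```python
import numpy as np

# Unknowns ordered as (x1, x2, lambda). Each row is one stationarity equation:
#   x1      + lambda = 1
#   3 x2    + lambda = 2
#   x1 + x2          = 1
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([1.0, 2.0, 1.0])

x1, x2, lam = np.linalg.solve(A, b)
f_min = 0.5 * x1**2 + 1.5 * x2**2 - x1 - 2.0 * x2

# Tangency check: grad f + lambda * grad h should vanish, with grad h = (1, 1).
grad_f = np.array([x1 - 1.0, 3.0 * x2 - 2.0])
residual = grad_f + lam * np.array([1.0, 1.0])

print(x1, x2, lam, f_min, residual)
```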
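Multiple equality constraints
Theory of optimization with multiple equality constraints
Next, generalize the single-constraint problem above to the case of several equality constraints: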
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \ h_i(\mathbf{x}) = 0,\ i = 1, 2, 3, \ldots \end{array} \right.$$
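The gradient condition now relaxes to the following: the gradient of $f(\mathbf{x})$ must be expressible as some linear combination of the gradients of $h_i(\mathbf{x}),\ i = 1, 2, 3, \ldots$. This is a considerably looser requirement than in the single-constraint case, as shown below: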
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \ h_i(\mathbf{x}) = 0,\ i = 1, 2, 3, \ldots \end{array} \right. \to \left\{ \begin{array}{l} \nabla_{\mathbf{x}} f + \lambda_1 \nabla_{\mathbf{x}} h_1 + \lambda_2 \nabla_{\mathbf{x}} h_2 + \ldots = \mathbf{0}\\ h_i(\mathbf{x}) = 0,\ i = 1, 2, 3, \ldots \end{array} \right.$$
Equivalently, we can build the unconstrained function $L(\mathbf{x}, \lambda_1, \lambda_2, \ldots)$ and look for its stationary point:
$$\min\limits_{\mathbf{x},\ \lambda_i} L = f + \lambda_1 h_1 + \lambda_2 h_2 + \ldots \to \left\{ \begin{array}{l} \nabla_{\mathbf{x}} L = \nabla_{\mathbf{x}} f + \lambda_1 \nabla_{\mathbf{x}} h_1 + \lambda_2 \nabla_{\mathbf{x}} h_2 + \ldots = \mathbf{0}\\ \nabla_{\lambda_i} L = h_i(\mathbf{x}) = 0,\ i = 1, 2, 3, \ldots \end{array} \right.$$
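This gives the method of Lagrange multipliers for problems with multiple equality constraints, which converts the constrained problem into an unconstrained one: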
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \ h_i(\mathbf{x}) = 0,\ i = 1, 2, 3, \ldots \end{array} \right. \to \min\limits_{\mathbf{x}, \lambda_i} L = f(\mathbf{x}) + \lambda_1 h_1(\mathbf{x}) + \lambda_2 h_2(\mathbf{x}) + \ldots$$
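When the objective is quadratic and the constraints are linear, the stationarity conditions of the Lagrangian form a single linear system in $(\mathbf{x}, \lambda_1, \lambda_2, \ldots)$. The following is a sketch under that assumption, with $f = \tfrac{1}{2}\mathbf{x}^T Q \mathbf{x} + \mathbf{c}^T \mathbf{x}$ and $h(\mathbf{x}) = A\mathbf{x} - \mathbf{b}$; the function name `solve_equality_qp` and the demo problem are hypothetical and not taken from the original text.

```python
import numpy as np


def solve_equality_qp(Q, c, A, b):
    """Stationary point of L = 0.5 x^T Q x + c^T x + lambda^T (A x - b).

    grad_x L = Q x + c + A^T lambda = 0 and grad_lambda L = A x - b = 0
    form one linear system in (x, lambda).
    """
    n, m = Q.shape[0], A.shape[0]
    kkt = np.zeros((n + m, n + m))
    kkt[:n, :n] = Q
    kkt[:n, n:] = A.T
    kkt[n:, :n] = A
    rhs = np.concatenate([-c, b])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n], solution[n:]


# Hypothetical demo: minimize x1^2 + x2^2 + x3^2
# subject to x1 + x2 + x3 = 3 and x1 - x2 = 0.
Q = 2.0 * np.eye(3)
c = np.zeros(3)
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
b = np.array([3.0, 0.0])

x, lam = solve_equality_qp(Q, c, A, b)
print(x, lam)
```

In the demo the second multiplier comes out as zero, because the constraint $x_1 = x_2$ is already satisfied by the solution of the problem with the first constraint alone.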
Multiple equality constraints: worked example
Consider the following convex optimization problem:
$$\left\{ \begin{array}{l} \min\limits_{x_1, x_2, x_3, x_4} f = 2x_1^2 + 3x_2^2 + 4x_3^2 + 2x_4^2 + 2x_1 x_2 + 3x_3 x_4 + 3x_1 + 4x_2 + 5x_3 + 6x_4\\ \mathrm{s.t.}\ \left\{ \begin{array}{l} h_1 = x_1 + x_2 - 2 = 0\\ h_2 = x_3 + x_4 - 3 = 0 \end{array} \right. \end{array} \right.$$
Build the Lagrangian with the method of Lagrange multipliers to convert the constrained problem above into an unconstrained one:
$$\min\limits_{x_i, \lambda_i} L(x_1, x_2, x_3, x_4, \lambda_1, \lambda_2) = f + \lambda_1 h_1 + \lambda_2 h_2$$
Take the partial derivatives of the Lagrangian and set them to zero to obtain the values of $x_1$, $x_2$, $x_3$, $x_4$, $\lambda_1$ and $\lambda_2$ at the stationary point of $L$:
$$\left\{ \begin{array}{l} \partial L/\partial x_1 = 4x_1 + 2x_2 + 3 + \lambda_1 = 0\\ \partial L/\partial x_2 = 6x_2 + 2x_1 + 4 + \lambda_1 = 0\\ \partial L/\partial x_3 = 8x_3 + 3x_4 + 5 + \lambda_2 = 0\\ \partial L/\partial x_4 = 4x_4 + 3x_3 + 6 + \lambda_2 = 0\\ \partial L/\partial \lambda_1 = x_1 + x_2 - 2 = 0\\ \partial L/\partial \lambda_2 = x_3 + x_4 - 3 = 0 \end{array} \right. \to \left\{ \begin{array}{l} x_1 = 3/2\\ x_2 = 1/2\\ x_3 = 2/3\\ x_4 = 7/3\\ \lambda_1 = -10\\ \lambda_2 = -52/3 \end{array} \right.$$
The minimum of the original objective $f(x_1, x_2, x_3, x_4)$ is therefore:
$$f_{\min} = 2x_1^2 + 3x_2^2 + 4x_3^2 + 2x_4^2 + 2x_1 x_2 + 3x_3 x_4 + 3x_1 + 4x_2 + 5x_3 + 6x_4 = 575/12$$
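Since the solution involves fractions, exact rational arithmetic is a convenient way to double-check it. The sketch below substitutes the reported values into the six stationarity equations and into the objective using Python's `fractions` module; the variable names are chosen here for illustration.

```python
from fractions import Fraction as F

# Solution reported in the worked example.
x1, x2, x3, x4 = F(3, 2), F(1, 2), F(2, 3), F(7, 3)
lam1, lam2 = F(-10), F(-52, 3)

# Partial derivatives of L = f + lam1 * h1 + lam2 * h2.
stationarity = [
    4 * x1 + 2 * x2 + 3 + lam1,
    6 * x2 + 2 * x1 + 4 + lam1,
    8 * x3 + 3 * x4 + 5 + lam2,
    4 * x4 + 3 * x3 + 6 + lam2,
    x1 + x2 - 2,
    x3 + x4 - 3,
]

f_min = (2 * x1**2 + 3 * x2**2 + 4 * x3**2 + 2 * x4**2
         + 2 * x1 * x2 + 3 * x3 * x4
         + 3 * x1 + 4 * x2 + 5 * x3 + 6 * x4)

print(stationarity, f_min)
```

All six residuals evaluate to exactly zero and the objective evaluates to exactly 575/12, which matches the values stated above.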
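Single inequality constraint
Theory of optimization with a single inequality constraint
An optimization problem with a single inequality constraint is written as: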
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \ g(\mathbf{x}) \le 0 \end{array} \right.$$
Let $\mathbf{x}^*$ be the optimal solution of the problem above. The relationship between $\mathbf{x}^*$ and the feasible set $K = \{ \mathbf{x} \mid g(\mathbf{x}) \le 0 \}$ falls into one of two cases:
(1) $g(\mathbf{x}^*) < 0$: the optimum lies in the interior of $K$ and is called an interior solution; the inequality constraint is inactive;
(2) $g(\mathbf{x}^*) = 0$: the optimum lies on the boundary of $K$ and is called a boundary solution; the constraint is active, the inequality becomes an equality, and the method of Lagrange multipliers introduced above applies.
A problem with a single inequality constraint can therefore be solved with the following steps:
Step 1: Ignore the inequality constraint and solve the unconstrained problem for the objective directly, which gives an initial candidate $\mathbf{x}_0^*$. Check whether $\mathbf{x}_0^*$ satisfies the inequality constraint. If $g(\mathbf{x}_0^*) \le 0$, that is, $\mathbf{x}_0^* \in K$, then $\mathbf{x}_0^*$ is the final solution $\mathbf{x}^*$ and the procedure ends.
$$\min\limits_{\mathbf{x}} f(\mathbf{x}) \to \nabla_{\mathbf{x}} f(\mathbf{x}) = \mathbf{0} \to \mathbf{x}_0^*$$
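Step 2: If the initial candidate violates the inequality constraint, that is, $\mathbf{x}_0^* \notin K$, then the optimal solution $\mathbf{x}^*$ must lie on the boundary of $K$ and cannot lie in its interior (a consequence of convexity). We can therefore set up the following equality-constrained problem and obtain $\mathbf{x}^*$ with the method of Lagrange multipliers, which ends the procedure.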
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \ g(\mathbf{x}) = 0 \end{array} \right. \to \min\limits_{\mathbf{x}, \mu} L = f(\mathbf{x}) + \mu g(\mathbf{x}) \to \mathbf{x}^*$$
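The two steps translate almost directly into code when the objective is a convex quadratic and the constraint is linear. The following is a sketch under those assumptions, with $f = \tfrac{1}{2}\mathbf{x}^T Q \mathbf{x} + \mathbf{c}^T \mathbf{x}$ and $g(\mathbf{x}) = \mathbf{a}^T \mathbf{x} - b$; the function name `solve_single_inequality_qp` and the one-dimensional demo are hypothetical.

```python
import numpy as np


def solve_single_inequality_qp(Q, c, a, b):
    """Minimize 0.5 x^T Q x + c^T x subject to a^T x <= b (Q positive definite).

    Returns (x, mu), where mu is the multiplier of the inequality constraint.
    """
    # Step 1: ignore the constraint and solve grad f = Q x + c = 0.
    x0 = np.linalg.solve(Q, -c)
    if a @ x0 <= b:
        return x0, 0.0  # interior solution, constraint inactive

    # Step 2: the optimum lies on the boundary a^T x = b; solve the
    # Lagrangian stationarity system in the unknowns (x, mu).
    n = Q.shape[0]
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = Q
    kkt[:n, n] = a
    kkt[n, :n] = a
    rhs = np.concatenate([-c, [b]])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n], solution[n]


# Hypothetical demo: minimize (x - 2)^2 subject to x <= 1.
# In the form above: 0.5 * 2 * x^2 - 4 x (the constant 4 is dropped).
x, mu = solve_single_inequality_qp(np.array([[2.0]]), np.array([-4.0]),
                                   np.array([1.0]), 1.0)
print(x, mu)
```

In the demo the unconstrained minimizer $x = 2$ is infeasible, so the second step runs and returns the boundary point together with a positive multiplier.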
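These two steps are enough to solve the inequality-constrained problem. The case split is somewhat clumsy, however, and it is convenient to merge the two cases into a single mathematical statement. Note that the manipulation below only unifies the notation; the result is known as the KKT conditions, and in substance they are identical to the case analysis above. The value of the KKT form lies mainly in its compact notation; in practice, working through the two steps above is often closer to geometric intuition.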
$$\left\{ \begin{array}{l} \text{Interior solution:}\\ \mu = 0\\ g(\mathbf{x}) \le 0\\ \nabla_{\mathbf{x}} L(\mathbf{x}, \mu) = \mathbf{0} \end{array} \right\} + \left\{ \begin{array}{l} \text{Boundary solution:}\\ g(\mathbf{x}) = 0\\ \nabla_{\mathbf{x}} L(\mathbf{x}, \mu) = \mathbf{0}\\ \nabla_\mu L(\mathbf{x}, \mu) = 0 \end{array} \right\} \to \text{KKT}: \left\{ \begin{array}{l} \mu g(\mathbf{x}) = 0\\ g(\mathbf{x}) \le 0\\ \nabla_{\mathbf{x}} L(\mathbf{x}, \mu) = \mathbf{0}\\ \mu \ge 0 \end{array} \right.$$
The first three KKT conditions are easy to understand, but how should $\mu \ge 0$ be read? Comparing the two cases with the KKT system shows that the boundary-case condition $\nabla_\mu L(\mathbf{x}, \mu) = g(\mathbf{x}) = 0$ no longer appears explicitly: it is absorbed into the complementary slackness condition $\mu g(\mathbf{x}) = 0$. In the boundary case we built the Lagrangian and wrote down the system of equations that determines both $\mathbf{x}$ and $\mu$. What we actually want is $\mathbf{x}$; the exact value of $\mu$ is of little interest. The KKT conditions therefore only state the admissible range of $\mu$, namely $\mu \ge 0$, and this sign condition has a geometric meaning, explained next.
In the boundary case the problem was converted into an equality-constrained one, but it still differs slightly from a pure equality-constrained problem. Here we want not only an extremum on the boundary but a minimum over the feasible set, which means that moving further into the feasible set $K$ can only make the objective worse. This restricts the linear relation between $\nabla_{\mathbf{x}} f(\mathbf{x}^*)$ and $\nabla_{\mathbf{x}} g(\mathbf{x}^*)$. Because $f(\mathbf{x})$ is minimized on the boundary, the direction in which $f(\mathbf{x})$ increases must point into $K$. On the other hand, $\nabla_{\mathbf{x}} g(\mathbf{x})$ points in the direction in which $g(\mathbf{x})$ increases, that is, out of $K$; for example, for $g(\mathbf{x}) = x_1^2 + x_2^2$ the gradient $(2x_1,\ 2x_2)$ points in the direction of fastest growth of $g(\mathbf{x})$. Hence $\nabla_{\mathbf{x}} f(\mathbf{x}^*)$ and $\nabla_{\mathbf{x}} g(\mathbf{x}^*)$ must point in opposite directions, so the multiplier in $\nabla_{\mathbf{x}} f(\mathbf{x}^*) + \mu \nabla_{\mathbf{x}} g(\mathbf{x}^*) = \mathbf{0}$ necessarily satisfies $\mu \ge 0$.
In summary, the KKT conditions for the inequality-constrained problem are:
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \ g(\mathbf{x}) \le 0 \end{array} \right. \to \text{build: } L(\mathbf{x}, \mu) = f(\mathbf{x}) + \mu g(\mathbf{x}) \to \text{KKT}: \left\{ \begin{array}{l} \nabla_{\mathbf{x}} L(\mathbf{x}, \mu) = \mathbf{0}\\ \mu g(\mathbf{x}) = 0\\ g(\mathbf{x}) \le 0\\ \mu \ge 0 \end{array} \right.$$
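The four KKT conditions can be tested mechanically for any candidate pair $(\mathbf{x}, \mu)$. The following is a minimal sketch of such a check; the function name `satisfies_kkt`, the tolerance, and the one-dimensional demo problem are hypothetical and only serve as an illustration.

```python
def satisfies_kkt(grad_f, g, grad_g, x, mu, tol=1e-9):
    """Check the four KKT conditions at the point x with multiplier mu.

    grad_f and grad_g return gradients as lists; g returns a float.
    """
    gf, gg = grad_f(x), grad_g(x)
    stationarity = all(abs(a + mu * b) <= tol for a, b in zip(gf, gg))
    complementary = abs(mu * g(x)) <= tol
    primal_feasible = g(x) <= tol
    dual_feasible = mu >= -tol
    return stationarity and complementary and primal_feasible and dual_feasible


# Hypothetical demo: minimize (x - 2)^2 subject to g(x) = x - 1 <= 0.
grad_f = lambda x: [2.0 * (x[0] - 2.0)]
g = lambda x: x[0] - 1.0
grad_g = lambda x: [1.0]

print(satisfies_kkt(grad_f, g, grad_g, [1.0], 2.0))  # boundary solution
print(satisfies_kkt(grad_f, g, grad_g, [2.0], 0.0))  # infeasible point
```

The first call passes all four conditions; the second fails because the unconstrained minimizer $x = 2$ violates $g(x) \le 0$.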
Single inequality constraint: worked example 1
Consider the following optimization problem with a single inequality constraint:
$$\left\{ \begin{array}{l} \min\limits_{x_1, x_2} f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2 - 2x_1 - 4x_2 + 3\\ \mathrm{s.t.}\ \ g(x_1, x_2) = x_1 + x_2 - 3 \le 0 \end{array} \right.$$
First ignore the inequality constraint and minimize the objective directly:
$$\left\{ \begin{array}{l} \partial f/\partial x_1 = 2x_1 + x_2 - 2 = 0\\ \partial f/\partial x_2 = x_1 + 2x_2 - 4 = 0 \end{array} \right. \to \left\{ \begin{array}{l} x_1 = 0\\ x_2 = 2 \end{array} \right.$$
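This solution satisfies the inequality constraint, since $g(0, 2) = 0 + 2 - 3 = -1 \le 0$. Because it is the global optimum of the unconstrained problem and is also feasible, it is the solution of the constrained problem.
Single inequality constraint: worked example 2
Consider the following optimization problem with a single inequality constraint: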
$$\left\{ \begin{array}{l} \min\limits_{x_1, x_2} f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2 - 2x_1 - 4x_2 + 3\\ \mathrm{s.t.}\ \ g(x_1, x_2) = x_1 + x_2 - 1 \le 0 \end{array} \right.$$
As before, ignore the inequality constraint and minimize the objective directly:
$$\left\{ \begin{array}{l} \partial f/\partial x_1 = 2x_1 + x_2 - 2 = 0\\ \partial f/\partial x_2 = x_1 + 2x_2 - 4 = 0 \end{array} \right. \to \left\{ \begin{array}{l} x_1 = 0\\ x_2 = 2 \end{array} \right.$$
This unconstrained optimum violates the constraint $g(x_1, x_2) = x_1 + x_2 - 1 \le 0$, since $g(0, 2) = 1 > 0$. By the analysis above, the optimum must lie on the boundary $g(x_1, x_2) = x_1 + x_2 - 1 = 0$. The problem therefore becomes:
$$\left\{ \begin{array}{l} \min\limits_{x_1, x_2} f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2 - 2x_1 - 4x_2 + 3\\ \mathrm{s.t.}\ \ g(x_1, x_2) = x_1 + x_2 - 1 = 0 \end{array} \right.$$
Solve this problem with the method of Lagrange multipliers by building the Lagrangian and finding its stationary point:
$$\min\limits_{x_1, x_2, \mu} L(x_1, x_2, \mu) = x_1^2 + x_1 x_2 + x_2^2 - 2x_1 - 4x_2 + 3 + \mu (x_1 + x_2 - 1)$$
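Take the partial derivatives of $L(x_1, x_2, \mu)$ with respect to $x_1$, $x_2$ and $\mu$ and set them to zero. This gives the optimum of the equality-constrained problem, which is also the optimum of the original inequality-constrained problem. Here $\mu$ is computed exactly as well, and it is indeed non-negative.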
$$\left\{ \begin{array}{l} \partial L/\partial x_1 = 2x_1 + x_2 - 2 + \mu = 0\\ \partial L/\partial x_2 = x_1 + 2x_2 - 4 + \mu = 0\\ \partial L/\partial \mu = x_1 + x_2 - 1 = 0 \end{array} \right. \to \left\{ \begin{array}{l} x_1 = -0.5\\ x_2 = 1.5\\ \mu = 1.5 \end{array} \right.$$
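The sign of $\mu$ can be tied back to the gradient picture by substituting the solution of worked example 2. The sketch below checks that $\nabla f = -\mu \nabla g$ at the reported point and compares the constrained optimum with the unconstrained one at $(0, 2)$; the helper names are chosen here for illustration.

```python
def f(x1, x2):
    return x1**2 + x1 * x2 + x2**2 - 2 * x1 - 4 * x2 + 3


def grad_f(x1, x2):
    return (2 * x1 + x2 - 2, x1 + 2 * x2 - 4)


x1, x2, mu = -0.5, 1.5, 1.5   # solution of worked example 2
grad_g = (1.0, 1.0)           # gradient of g = x1 + x2 - 1

# Stationarity: grad f + mu * grad g = 0, so grad f = -mu * grad g
# points opposite to grad g whenever mu > 0.
residual = tuple(gf + mu * gg for gf, gg in zip(grad_f(x1, x2), grad_g))
on_boundary = x1 + x2 - 1

# The constrained minimum cannot beat the unconstrained one at (0, 2).
print(residual, on_boundary, f(x1, x2), f(0, 2))
```

The constrained optimum has a larger objective value than the unconstrained optimum, as expected once the constraint is active.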
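Multiple inequality constraints
Theory of optimization with multiple inequality constraints
Generalizing the single-inequality case gives the optimization problem with multiple inequality constraints: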
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \ g_j(\mathbf{x}) \le 0,\ j = 1, 2, 3, \ldots \end{array} \right.$$
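With more inequality constraints, the feasible set $K = \{ \mathbf{x} \mid g_j(\mathbf{x}) \le 0,\ j = 1, 2, 3, \ldots \}$ is more restricted and the corresponding geometric region may be smaller, but nothing changes in essence. As before, let $\mathbf{x}^*$ be the optimal solution of the problem above; the relationship between $\mathbf{x}^*$ and the feasible set $K$ falls into one of two cases:
(1) All constraints satisfy $g_j(\mathbf{x}^*) < 0$: the optimum lies in the interior of $K$ (interior solution), and the inequality constraints are inactive;
(2) At least one constraint satisfies $g_j(\mathbf{x}^*) = 0$: the optimum lies on the boundary of $K$ (boundary solution); the active constraints become equality constraints and the method of Lagrange multipliers introduced above applies. The difficulty is that, with several inequality constraints, it is not known in advance which constraint or constraints are active at the optimum. In practical engineering problems the number of inequality constraints is usually small, so one can enumerate all combinations of active constraints, solve each one, and pick the best feasible result. Going one step further, geometric reasoning can be used to prune the search: some combinations can be ruled out without being solved, so fewer combinations have to be tried before the optimum is found; see the final worked example.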
As in the single-inequality case, we can build the Lagrangian $L(\mathbf{x}, \mu_1, \mu_2, \ldots)$, merge the case analysis above into a single mathematical statement, and write the corresponding KKT conditions:
$$\text{build: } L(\mathbf{x}, \mu_1, \mu_2, \ldots) = f(\mathbf{x}) + \mu_1 g_1(\mathbf{x}) + \mu_2 g_2(\mathbf{x}) + \ldots \to \text{KKT}: \left\{ \begin{array}{l} \nabla_{\mathbf{x}} L(\mathbf{x}, \mu_1, \mu_2, \ldots) = \mathbf{0}\\ \mu_j g_j(\mathbf{x}) = 0\\ g_j(\mathbf{x}) \le 0,\ j = 1, 2, 3, \ldots\\ \mu_j \ge 0 \end{array} \right.$$
Note: applying the KKT conditions still amounts to trying combinations. By $\mu_j g_j(\mathbf{x}) = 0$, each inequality constraint splits into the two cases $\mu_j = 0$ and $g_j(\mathbf{x}) = 0$, which gives $2^m$ combinations for $m$ inequality constraints. The conditions $g_j(\mathbf{x}) \le 0$ and $\mu_j \ge 0$ eliminate combinations along the way; for a strictly convex objective, every combination that survives leads to the same unique solution.
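The enumeration of $\mu_j = 0$ versus $g_j(\mathbf{x}) = 0$ can be automated when the objective is a convex quadratic and the constraints are linear. The following is a sketch under those assumptions, with $f = \tfrac{1}{2}\mathbf{x}^T Q \mathbf{x} + \mathbf{c}^T \mathbf{x}$ and constraints $G\mathbf{x} \le \mathbf{h}$; the function name `solve_qp_by_enumeration` and the demo problem are hypothetical. Because the problem is convex, the first combination that passes both checks is returned.

```python
import itertools

import numpy as np


def solve_qp_by_enumeration(Q, c, G, h, tol=1e-9):
    """Minimize 0.5 x^T Q x + c^T x subject to G x <= h by trying all 2^m
    active sets. Assumes Q is positive definite; returns (x, mu) or None."""
    n, m = Q.shape[0], G.shape[0]
    for pattern in itertools.product([False, True], repeat=m):
        active = np.array([j for j in range(m) if pattern[j]], dtype=int)
        k = len(active)
        # Inactive constraints get mu_j = 0; active ones become g_j(x) = 0.
        kkt = np.zeros((n + k, n + k))
        kkt[:n, :n] = Q
        kkt[:n, n:] = G[active].T
        kkt[n:, :n] = G[active]
        rhs = np.concatenate([-c, h[active]])
        try:
            solution = np.linalg.solve(kkt, rhs)
        except np.linalg.LinAlgError:
            continue  # linearly dependent active constraints
        x = solution[:n]
        mu = np.zeros(m)
        mu[active] = solution[n:]
        # Discard combinations that violate g_j <= 0 or mu_j >= 0.
        if np.all(G @ x <= h + tol) and np.all(mu >= -tol):
            return x, mu
    return None


# Hypothetical demo: minimize (x1 - 2)^2 + (x2 - 1)^2
# subject to x1 <= 1 and x2 <= 3.
Q = 2.0 * np.eye(2)
c = np.array([-4.0, -2.0])
G = np.eye(2)
h = np.array([1.0, 3.0])

x_opt, mu_opt = solve_qp_by_enumeration(Q, c, G, h)
print(x_opt, mu_opt)
```

The cost grows as $2^m$, so this brute-force approach is only reasonable for a handful of constraints.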
Multiple inequality constraints: worked example
Consider the following optimization problem with multiple inequality constraints:
$$\left\{ \begin{array}{l} \min\limits_{x_1, x_2, x_3} f = x_1^2 + 2x_2^2 + 4x_3^2 - 4x_1 - 4x_2\\ \mathrm{s.t.}\ \left\{ \begin{array}{l} g_1 = x_1 + x_2 \le 0\\ g_2 = x_2 \le 0 \end{array} \right. \end{array} \right.$$
Build the Lagrangian $L(x_1, x_2, x_3, \mu_1, \mu_2)$:
$$L = x_1^2 + 2x_2^2 + 4x_3^2 - 4x_1 - 4x_2 + \mu_1 (x_1 + x_2) + \mu_2 x_2$$
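Note that the Lagrangian is not written here in order to convert the constrained problem into an unconstrained one, which is a slight difference from the equality-constrained case above; its purpose is to state the KKT conditions: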
$$\left\{ \begin{array}{l} \partial L/\partial x_1 = 2x_1 - 4 + \mu_1 = 0\\ \partial L/\partial x_2 = 4x_2 - 4 + \mu_1 + \mu_2 = 0\\ \partial L/\partial x_3 = 8x_3 = 0\\ \mu_1 (x_1 + x_2) = 0\\ \mu_2 x_2 = 0\\ x_1 + x_2 \le 0\\ x_2 \le 0\\ \mu_1 \ge 0\\ \mu_2 \ge 0 \end{array} \right.$$
Using $\mu_1 (x_1 + x_2) = 0$ and $\mu_2 x_2 = 0$, the system splits into four cases:
$$\text{Case 1}: \left\{ \begin{array}{l} 2x_1 - 4 = 0\\ 4x_2 - 4 = 0\\ 8x_3 = 0\\ \mu_1 = 0\\ \mu_2 = 0 \end{array} \right.,\ \text{Case 2}: \left\{ \begin{array}{l} 2x_1 - 4 = 0\\ 4x_2 - 4 + \mu_2 = 0\\ 8x_3 = 0\\ \mu_1 = 0\\ x_2 = 0 \end{array} \right.,\ \text{Case 3}: \left\{ \begin{array}{l} 2x_1 - 4 + \mu_1 = 0\\ 4x_2 - 4 + \mu_1 = 0\\ x_3 = 0\\ x_1 + x_2 = 0\\ \mu_2 = 0 \end{array} \right.,\ \text{Case 4}: \left\{ \begin{array}{l} 2x_1 - 4 + \mu_1 = 0\\ 4x_2 - 4 + \mu_1 + \mu_2 = 0\\ 8x_3 = 0\\ x_1 + x_2 = 0\\ x_2 = 0 \end{array} \right.$$
Solving the four systems in turn gives:
$$\text{Case 1}: \left\{ \begin{array}{l} x_1 = 2\\ x_2 = 1\\ x_3 = 0\\ \mu_1 = 0\\ \mu_2 = 0 \end{array} \right.,\ \text{Case 2}: \left\{ \begin{array}{l} x_1 = 2\\ x_2 = 0\\ x_3 = 0\\ \mu_1 = 0\\ \mu_2 = 4 \end{array} \right.,\ \text{Case 3}: \left\{ \begin{array}{l} x_1 = 0\\ x_2 = 0\\ x_3 = 0\\ \mu_1 = 4\\ \mu_2 = 0 \end{array} \right.,\ \text{Case 4}: \left\{ \begin{array}{l} x_1 = 0\\ x_2 = 0\\ x_3 = 0\\ \mu_1 = 4\\ \mu_2 = 0 \end{array} \right.$$
Case 1 and Case 2 violate $x_1 + x_2 \le 0$. Case 3 and Case 4 lead to the same point, $x_1 = 0,\ x_2 = 0,\ x_3 = 0$, which is the optimal solution of the problem.
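As an informal cross-check of this result, one can sample random feasible points and confirm that none of them has a smaller objective value than the origin. The sketch below does this with Python's standard `random` module; the sampling box $[-3, 3]^3$, the sample count, and the seed are arbitrary choices made here, and a sampling check of this kind is only a sanity test, not a proof.

```python
import random


def f(x1, x2, x3):
    return x1**2 + 2 * x2**2 + 4 * x3**2 - 4 * x1 - 4 * x2


def feasible(x1, x2, x3):
    return x1 + x2 <= 0 and x2 <= 0


random.seed(0)  # fixed seed so the check is reproducible
f_star = f(0.0, 0.0, 0.0)

best_sampled = min(
    f(x1, x2, x3)
    for x1, x2, x3 in (
        tuple(random.uniform(-3.0, 3.0) for _ in range(3))
        for _ in range(20000)
    )
    if feasible(x1, x2, x3)
)

print(f_star, best_sampled)
```

The outcome agrees with a direct argument: on the feasible set $x_1 + x_2 \le 0$, so $-4(x_1 + x_2) \ge 0$ and every term of $f$ is non-negative.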
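General constrained optimization
Theory of general constrained optimization
Combining the equality and inequality constraints above gives the KKT method for a general constrained optimization problem: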
$$\left\{ \begin{array}{l} \min\limits_{\mathbf{x}} f(\mathbf{x})\\ \mathrm{s.t.}\ \left\{ \begin{array}{l} h_i(\mathbf{x}) = 0,\ i = 1, 2, 3, \ldots\\ g_j(\mathbf{x}) \le 0,\ j = 1, 2, 3, \ldots \end{array} \right. \end{array} \right. \to \text{build: } L = f(\mathbf{x}) + \lambda_1 h_1(\mathbf{x}) + \lambda_2 h_2(\mathbf{x}) + \ldots + \mu_1 g_1(\mathbf{x}) + \mu_2 g_2(\mathbf{x}) + \ldots \to \text{KKT}: \left\{ \begin{array}{l} \nabla_{\mathbf{x}} L(\mathbf{x}, \lambda_1, \lambda_2, \ldots, \mu_1, \mu_2, \ldots) = \mathbf{0}\\ h_i(\mathbf{x}) = 0,\ i = 1, 2, 3, \ldots\\ \mu_j g_j(\mathbf{x}) = 0\\ g_j(\mathbf{x}) \le 0,\ j = 1, 2, 3, \ldots\\ \mu_j \ge 0 \end{array} \right.$$
Note: applying these KKT conditions still amounts to trying combinations. By $\mu_j g_j(\mathbf{x}) = 0$, each inequality constraint splits into the two cases $\mu_j = 0$ and $g_j(\mathbf{x}) = 0$, which gives $2^m$ combinations for $m$ inequality constraints. The conditions $g_j(\mathbf{x}) \le 0$ and $\mu_j \ge 0$ eliminate the combinations that do not correspond to the optimum; for a strictly convex objective, every combination that survives leads to the same unique solution.
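General constrained optimization: worked example
Consider the following optimization problem with both equality and inequality constraints: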
$$\left\{ \begin{array}{l} \min\limits_{x_1, x_2, x_3} f = x_1^2 + 2x_2^2 + 4x_3^2 - 4x_1 - 4x_2\\ \mathrm{s.t.}\ \left\{ \begin{array}{l} h_1 = 2x_2 - x_1 = 0\\ g_1 = x_1 - 1 \le 0\\ g_2 = -1 - x_1 \le 0\\ g_3 = x_2 - 1 \le 0\\ g_4 = -1 - x_2 \le 0\\ g_5 = x_3 - 1 \le 0\\ g_6 = -1 - x_3 \le 0 \end{array} \right. \end{array} \right.$$
Build the Lagrangian $L(x_1, x_2, x_3, \lambda_1, \mu_1, \mu_2, \mu_3, \mu_4, \mu_5, \mu_6)$ and write the KKT conditions:
$$\text{build: } L = f + \lambda_1 h_1 + \mu_1 g_1 + \mu_2 g_2 + \mu_3 g_3 + \mu_4 g_4 + \mu_5 g_5 + \mu_6 g_6 \to \text{KKT}: \left\{ \begin{array}{l} \partial L/\partial x_k = 0,\ k = 1, 2, 3\\ h_1 = 0\\ \mu_j g_j = 0,\ j = 1, 2, 3, 4, 5, 6\\ g_j \le 0,\ j = 1, 2, 3, 4, 5, 6\\ \mu_j \ge 0,\ j = 1, 2, 3, 4, 5, 6 \end{array} \right.$$
Splitting into cases according to $\mu_j g_j = 0,\ j = 1, 2, 3, 4, 5, 6$ yields the final solution, but this requires examining $2^6 = 64$ cases. For online or manual solution, geometric reasoning can simplify the work and improve computational efficiency.
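First consider the geometry of the objective: the level sets of $f$ are ellipsoids in three-dimensional space. Their common center is obtained from the equations below; it is the optimum of the unconstrained problem, that is, the optimum over the whole space.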
$$\left\{ \begin{array}{l} \partial f/\partial x_1 = 2x_1 - 4 = 0\\ \partial f/\partial x_2 = 4x_2 - 4 = 0\\ \partial f/\partial x_3 = 8x_3 = 0 \end{array} \right. \to \left\{ \begin{array}{l} x_1 = 2\\ x_2 = 1\\ x_3 = 0 \end{array} \right.$$
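The constraints restrict the feasible region geometrically. The six inequality constraints define a cube in three-dimensional space, and the inside of the cube is the region allowed by the inequalities. The equality constraint corresponds to a plane. Together, the equality and inequality constraints define a cross-section, the intersection of the cube with the plane, which is the final feasible region.
Because the objective is convex, the optimum must occur, in order of priority, at (1) the point where an ellipsoid is tangent to the cross-section plane; (2) otherwise, a point where an ellipsoid is tangent to one of the four edges of the cross-section; (3) otherwise, one of the four vertices. Which one applies depends on whether the tangent point lies within the cross-section. First compute the tangent point of the ellipsoid with the plane: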
$$\left\{ \begin{array}{l} \min\limits_{x_1, x_2, x_3} f = x_1^2 + 2x_2^2 + 4x_3^2 - 4x_1 - 4x_2\\ \mathrm{s.t.}\ \ 2x_2 - x_1 = 0 \end{array} \right. \to x_1 = 2,\ x_2 = 1,\ x_3 = 0,\ \text{outside the cross-section}$$
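Next, compute the tangent points of the ellipsoid with the four edges of the cross-section, as listed in the four cases below. Two of the cases are feasible candidates; comparing their objective values, $f = -4.5$ for Case 1 and $f = 7.5$ for Case 2, shows that Case 1 is better. The solution of Case 1, $x_1 = 1,\ x_2 = 0.5,\ x_3 = 0$, is therefore the optimum.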
$$\text{Case 1}: \left\{ \begin{array}{l} \min\limits_{x_1, x_2, x_3} f = x_1^2 + 2x_2^2 + 4x_3^2 - 4x_1 - 4x_2\\ \mathrm{s.t.}\ \left\{ \begin{array}{l} 2x_2 - x_1 = 0\\ x_1 - 1 = 0 \end{array} \right. \end{array} \right. \to x_1 = 1,\ x_2 = 0.5,\ x_3 = 0,\ \text{feasible candidate}$$
$$\text{Case 2}: \left\{ \begin{array}{l} \min\limits_{x_1, x_2, x_3} f = x_1^2 + 2x_2^2 + 4x_3^2 - 4x_1 - 4x_2\\ \mathrm{s.t.}\ \left\{ \begin{array}{l} 2x_2 - x_1 = 0\\ x_1 + 1 = 0 \end{array} \right. \end{array} \right. \to x_1 = -1,\ x_2 = -0.5,\ x_3 = 0,\ \text{feasible candidate}$$
$$\text{Case 3}: \left\{ \begin{array}{l} \min\limits_{x_1, x_2, x_3} f = x_1^2 + 2x_2^2 + 4x_3^2 - 4x_1 - 4x_2\\ \mathrm{s.t.}\ \left\{ \begin{array}{l} 2x_2 - x_1 = 0\\ x_3 - 1 = 0 \end{array} \right. \end{array} \right. \to x_1 = 2,\ x_2 = 1,\ x_3 = 1,\ \text{outside the feasible region}$$
$$\text{Case 4}: \left\{ \begin{array}{l} \min\limits_{x_1, x_2, x_3} f = x_1^2 + 2x_2^2 + 4x_3^2 - 4x_1 - 4x_2\\ \mathrm{s.t.}\ \left\{ \begin{array}{l} 2x_2 - x_1 = 0\\ x_3 + 1 = 0 \end{array} \right. \end{array} \right. \to x_1 = 2,\ x_2 = 1,\ x_3 = -1,\ \text{outside the feasible region}$$
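The pruning argument can be replayed in a few lines: filter the candidate points by feasibility, compare objective values, and confirm that the winner satisfies the KKT sign condition. The sketch below does this in plain Python; the multiplier values $\lambda_1$ and $\mu_1$ are derived here from the stationarity equations for illustration and are not stated in the original text.

```python
def f(x1, x2, x3):
    return x1**2 + 2 * x2**2 + 4 * x3**2 - 4 * x1 - 4 * x2


def feasible(x1, x2, x3, tol=1e-12):
    on_plane = abs(2 * x2 - x1) <= tol
    in_cube = all(-1 - tol <= v <= 1 + tol for v in (x1, x2, x3))
    return on_plane and in_cube


# Tangent points from the analysis: the plane itself, then the four edges.
candidates = {
    "plane": (2.0, 1.0, 0.0),
    "case 1": (1.0, 0.5, 0.0),
    "case 2": (-1.0, -0.5, 0.0),
    "case 3": (2.0, 1.0, 1.0),
    "case 4": (2.0, 1.0, -1.0),
}
feasible_values = {name: f(*p) for name, p in candidates.items() if feasible(*p)}
best = min(feasible_values, key=feasible_values.get)

# KKT check at case 1, where h1 = 2 x2 - x1 and g1 = x1 - 1 are active:
#   dL/dx2 = 4 x2 - 4 + 2 lambda1       = 0
#   dL/dx1 = 2 x1 - 4 - lambda1 + mu1   = 0
x1, x2, x3 = candidates["case 1"]
lambda1 = (4 - 4 * x2) / 2
mu1 = 4 - 2 * x1 + lambda1

print(feasible_values, best, lambda1, mu1)
```

The multiplier of the active inequality $g_1$ comes out positive, which is consistent with Case 1 being the optimum; the remaining $\mu_j$ are zero because their constraints are inactive there.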