1. Mathematical Programming
Mathematical programming, also called optimization, is the task of finding the best element in a set of feasible solutions. It can be written as

$$\begin{aligned} & \text{minimize}\ f_0(\bm{x}) \\ & \text{subject to}\ f_i(\bm{x}) \le b_i, \quad i = 1, 2, \dots, m \end{aligned}$$

where $\bm{x} = [x_1, \dots, x_n]^T$ is the optimization variable, $f_0: \bm{R}^n \rightarrow \bm{R}$ is the objective function, and $f_i: \bm{R}^n \rightarrow \bm{R}$ are the inequality constraint functions. A point $\bm{x}^*$ is an optimal solution if it is feasible and

$$\forall \bm{z} \in \{\bm{z} \mid f_i(\bm{z}) \le b_i,\ i = 1, \dots, m\}, \quad f_0(\bm{z}) \ge f_0(\bm{x}^*)$$

As an example from image processing, suppose the observed image $I_0(x,y)$ is corrupted by noise and we want to recover a clean image $I(x,y)$. Using the prior knowledge that natural images are piecewise smooth, we consider the total variation (TV) norm

$$\|I\|_{TV} = \sum_y\sum_x\left[(I(x,y) - I(x,y-1))^2 + (I(x,y) - I(x-1,y))^2\right]^{1/2}$$

which sums, over all pixels, the square root of the sum of the squared differences in the two directions. For natural images the TV norm is usually small, so denoising can be posed as the optimization problem

$$\text{minimize}\ \|I\|_{TV} + \lambda\|I - I_0\|_F^2$$

which keeps the recovered image smooth while keeping it close to the noisy observation.
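The objective above can be evaluated directly on a small image. The following is a minimal sketch, not a denoising solver: the names `tv_norm` and `denoise_objective` are illustrative, the image is a plain list of rows, and treating differences that reach outside the image as zero is an assumption, since the text does not specify boundary handling.

```python
import math

def tv_norm(img):
    """Isotropic TV norm of a 2-D image stored as a list of rows (img[y][x]).

    Assumption: a difference that would reach outside the image is taken as 0.
    """
    total = 0.0
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            dy = value - img[y - 1][x] if y > 0 else 0.0  # I(x,y) - I(x,y-1)
            dx = value - row[x - 1] if x > 0 else 0.0     # I(x,y) - I(x-1,y)
            total += math.sqrt(dx * dx + dy * dy)
    return total

def denoise_objective(img, noisy, lam):
    """Evaluate ||I||_TV + lam * ||I - I0||_F^2 for two equally sized images."""
    fidelity = sum((a - b) ** 2
                   for row_a, row_b in zip(img, noisy)
                   for a, b in zip(row_a, row_b))
    return tv_norm(img) + lam * fidelity

flat = [[1.0, 1.0], [1.0, 1.0]]    # perfectly smooth candidate
noisy = [[1.0, 2.0], [1.0, 1.0]]   # observation with one noisy pixel
print(tv_norm(flat), tv_norm(noisy))
print(denoise_objective(flat, noisy, 0.5), denoise_objective(noisy, noisy, 0.5))
```

In this toy case the smooth candidate has a lower objective than the noisy observation itself when $\lambda = 0.5$, which is the trade-off the formulation is meant to express.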
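Mathematical programs can be classified from several angles: linear versus nonlinear programming, according to whether the objective and all constraints are linear; convex versus nonconvex optimization, according to whether the objective and the feasible set are convex (the two classes differ fundamentally, and linear programming is a typical convex problem); smooth versus nonsmooth optimization, according to the differentiability of the objective function; continuous versus discrete optimization, according to the feasible set (discrete problems are in general nonconvex); and single-objective versus multi-objective problems, according to the number of objective functions.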
2. Affine Sets
First consider two distinct points $\bm{x}_1, \bm{x}_2 \in \bm{R}^n$. With a parameter $\theta \in \bm{R}$, the line through them is

$$\bm{y} = \theta\bm{x}_1 + (1 - \theta)\bm{x}_2$$

Restricting the parameter to $\theta \in [0, 1]$ gives the line segment between the two points:

$$\bm{y} = \theta\bm{x}_1 + (1 - \theta)\bm{x}_2, \quad \theta \in [0,1]$$

On this basis, a set $\bm{C}$ is called an affine set if, for all $\bm{x}_1, \bm{x}_2 \in \bm{C}$, the line through $\bm{x}_1$ and $\bm{x}_2$ also lies in $\bm{C}$. The definition extends from two points to any number of points: an affine set contains every combination $\theta_1\bm{x}_1 + \dots + \theta_k\bm{x}_k$ of its points with $\theta_1 + \dots + \theta_k = 1$.
Next we examine a property of affine sets. Let $\bm{C}$ be an affine set, fix any $\bm{x}_0 \in \bm{C}$, and define

$$\bm{V} = \bm{C} - \bm{x}_0 = \{\bm{x} - \bm{x}_0 \mid \bm{x} \in \bm{C}\}$$

$\bm{V}$ is called the subspace associated with $\bm{C}$, and it is itself an affine set. To see this, take any $\bm{v}_1, \bm{v}_2 \in \bm{V}$ and any $a, b \in \bm{R}$, and look at how $a\bm{v}_1 + b\bm{v}_2 + \bm{x}_0$ relates to $\bm{C}$:

$$a\bm{v}_1 + b\bm{v}_2 + \bm{x}_0 = a(\bm{v}_1 + \bm{x}_0) + b(\bm{v}_2 + \bm{x}_0) + (1 - a - b)\bm{x}_0$$

Here $\bm{v}_1 + \bm{x}_0 \in \bm{C}$, $\bm{v}_2 + \bm{x}_0 \in \bm{C}$ and $\bm{x}_0 \in \bm{C}$, and the coefficients $a$, $b$ and $1 - a - b$ sum to 1, so the right-hand side is an affine combination of points of $\bm{C}$. Hence $a\bm{v}_1 + b\bm{v}_2 + \bm{x}_0 \in \bm{C}$, that is, $a\bm{v}_1 + b\bm{v}_2 \in \bm{V}$. The property of $\bm{V}$ is therefore

$$\forall \bm{v}_1, \bm{v}_2 \in \bm{V},\ \forall a, b \in \bm{R}: \quad a\bm{v}_1 + b\bm{v}_2 + \bm{x}_0 \in \bm{C}, \quad \text{i.e.}\ a\bm{v}_1 + b\bm{v}_2 \in \bm{V}$$

Geometrically, $\bm{C}$ is an arbitrary line, plane or higher-dimensional flat, while $\bm{V}$ is parallel to $\bm{C}$ and passes through the origin.
Consider $\bm{C} = \{\bm{X} \mid \bm{A}\bm{X} = \bm{b}\}$ and any $\bm{X}_1, \bm{X}_2 \in \bm{C}$, so that

$$\bm{A}\bm{X}_1 = \bm{b}, \quad \bm{A}\bm{X}_2 = \bm{b}$$

For any $\theta \in \bm{R}$, multiplying the two equations by $\theta$ and $1 - \theta$ gives

$$\theta\bm{A}\bm{X}_1 = \theta\bm{b}, \quad (1 - \theta)\bm{A}\bm{X}_2 = (1 - \theta)\bm{b}$$

and adding them yields

$$\theta\bm{A}\bm{X}_1 + (1 - \theta)\bm{A}\bm{X}_2 = \bm{A}(\theta\bm{X}_1 + (1 - \theta)\bm{X}_2) = \bm{b}$$

that is, $\theta\bm{X}_1 + (1 - \theta)\bm{X}_2 \in \bm{C}$. The solution set of a system of linear equations is therefore an affine set. Its associated subspace is

$$\bm{V} = \{\bm{X} - \bm{X}_0 \mid \bm{A}\bm{X} = \bm{b}\}, \quad \bm{A}\bm{X}_0 = \bm{b}$$

that is,

$$\bm{V} = \{\bm{X} - \bm{X}_0 \mid \bm{A}(\bm{X} - \bm{X}_0) = \bm{0}\}$$

Writing $\bm{Y} = \bm{X} - \bm{X}_0$ gives $\bm{V} = \{\bm{Y} \mid \bm{A}\bm{Y} = \bm{0}\}$, the null space of $\bm{A}$. So in higher dimensions it still holds that $\bm{V}$ is parallel to $\bm{C}$ and passes through the origin.
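The argument above can be checked numerically. This is a small sketch with a hypothetical one-equation system chosen only for illustration; the helper names `matvec`, `affine_combination` and `solves` are not from the text.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def affine_combination(theta, x1, x2):
    """Return theta * x1 + (1 - theta) * x2 for any real theta."""
    return [theta * a + (1 - theta) * b for a, b in zip(x1, x2)]

def solves(A, x, b, tol=1e-9):
    """True if A x = b up to a small tolerance."""
    return all(abs(r - bi) <= tol for r, bi in zip(matvec(A, x), b))

# Hypothetical system: x1 + x2 + x3 = 3, with two particular solutions.
A = [[1.0, 1.0, 1.0]]
b = [3.0]
X1 = [3.0, 0.0, 0.0]
X2 = [0.0, 1.0, 2.0]

# theta ranges over all reals, not only [0, 1]: the whole line stays in C.
for theta in (-2.0, 0.25, 1.0, 3.5):
    X = affine_combination(theta, X1, X2)
    print(theta, X, solves(A, X, b))

# The difference of two solutions lies in the associated subspace {Y | A Y = 0}.
Y = [a - c for a, c in zip(X1, X2)]
print(Y, solves(A, Y, [0.0]))
```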
For an arbitrary set $\bm{C}$, the smallest affine set containing it is its affine hull, defined as

$$\text{aff}\ \bm{C} = \{ \theta_1\bm{X}_1 + \dots + \theta_k\bm{X}_k \mid \bm{X}_1, \dots, \bm{X}_k \in \bm{C},\ \theta_1 + \dots + \theta_k = 1 \}$$
3. Convex Sets
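A set $\bm{C}$ is called a convex set if, for all $\bm{x}_1, \bm{x}_2 \in \bm{C}$, the line segment between $\bm{x}_1$ and $\bm{x}_2$ also lies in $\bm{C}$. The definition extends to convex combinations of any number of points. Every affine set is a special case of a convex set.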
For an arbitrary set $\bm{C}$, the smallest convex set containing it is its convex hull, defined as

$$\text{Conv}\ \bm{C} = \{ \theta_1\bm{X}_1 + \dots + \theta_k\bm{X}_k \mid \bm{X}_1, \dots, \bm{X}_k \in \bm{C},\ \theta_1, \dots, \theta_k \in [0, 1],\ \theta_1 + \dots + \theta_k = 1 \}$$

A set $\bm{C}$ is called a cone if for all $\bm{x} \in \bm{C}$ and all $\theta \ge 0$ we have $\theta\bm{x} \in \bm{C}$; a nonempty cone always contains the origin. A set $\bm{C}$ is called a convex cone if for all $\bm{x}_1, \bm{x}_2 \in \bm{C}$ and all $\theta_1, \theta_2 \ge 0$ we have $\theta_1\bm{x}_1 + \theta_2\bm{x}_2 \in \bm{C}$. For an arbitrary set $\bm{C}$, the conic hull is defined as

$$\{ \theta_1\bm{X}_1 + \dots + \theta_k\bm{X}_k \mid \bm{X}_1, \dots, \bm{X}_k \in \bm{C},\ \theta_1, \dots, \theta_k \ge 0\}$$

Some special cases are worth listing: a single point is affine and convex, but it is a convex cone only if it is the origin; the empty set is affine, convex, and (vacuously) a convex cone; the whole space $\bm{R}^n$ is affine, convex, and a convex cone; any subspace of $\bm{R}^n$ is affine, convex, and a convex cone; any line is affine and convex, and a line through the origin is a convex cone; any line segment is convex, is affine only when it degenerates to a point, and is a convex cone only when that point is the origin; any ray is convex, is affine only when it degenerates to a point, and is a convex cone when it starts at the origin.
Now consider more involved examples.

A hyperplane is a set $\{\bm{x} \mid \bm{a}^T\bm{x} = b\}$ with $\bm{a}, \bm{x} \in \bm{R}^n$, $\bm{a} \ne \bm{0}$, $b \in \bm{R}$; in low dimensions it is a line or a plane. A hyperplane is affine and convex, and a hyperplane through the origin ($b = 0$) is a subspace and therefore a convex cone.

A halfspace $\{\bm{x} \mid \bm{a}^T\bm{x} > b\}$ or $\{\bm{x} \mid \bm{a}^T\bm{x} \le b\}$, with the same conditions on $\bm{a}$ and $b$, is convex but not affine. The closed halfspace is a convex cone when its boundary passes through the origin ($b = 0$).

A ball $\{\bm{x} \mid \|\bm{x} - \bm{x}_c\|_2 \le r\}$ with $\bm{x}_c \in \bm{R}^n$, which in low dimensions is a disk or a solid sphere, is convex. It is affine only when it degenerates to a point, and a convex cone only when that point is the origin. To prove that a ball is convex, let $B(\bm{x}_c, r) = \{\bm{x} \mid \|\bm{x} - \bm{x}_c\|_2 \le r\}$ and take any $\bm{x}_1, \bm{x}_2 \in B$, so that $\|\bm{x}_1 - \bm{x}_c\|_2 \le r$ and $\|\bm{x}_2 - \bm{x}_c\|_2 \le r$. For $\theta \in [0, 1]$,

$$\begin{aligned} &\|\theta\bm{x}_1 + (1 - \theta)\bm{x}_2 - \bm{x}_c\|_2 \\ =\ & \|\theta(\bm{x}_1 - \bm{x}_c) + (1 - \theta)(\bm{x}_2 - \bm{x}_c)\|_2 \\ \le\ & \|\theta(\bm{x}_1 - \bm{x}_c)\|_2 + \|(1 - \theta)(\bm{x}_2 - \bm{x}_c)\|_2 \\ =\ & \theta\|\bm{x}_1 - \bm{x}_c\|_2 + (1 - \theta)\|\bm{x}_2 - \bm{x}_c\|_2 \\ \le\ & r \end{aligned}$$

so every convex combination of points in the ball stays in the ball, and the ball is convex.

An ellipsoid is a set $\{\bm{x} \mid (\bm{x} - \bm{x}_c)^T\bm{P}^{-1}(\bm{x} - \bm{x}_c) \le 1\}$ with $\bm{x}_c \in \bm{R}^n$ and $\bm{P} \in \bm{S}^n_{++}$, where $\bm{S}^n_{++}$ denotes the $n \times n$ symmetric positive definite matrices. The matrix $\bm{P}$ determines the semi-axis lengths, which are the square roots of its eigenvalues. For example, the ellipsoid

$$\left\{\bm{x} \,\middle|\, (\bm{x} - \bm{x}_c)^T\left( \begin{matrix}4 & 0 \\ 0 & 1 \end{matrix} \right)^{-1}(\bm{x} - \bm{x}_c) \le 1\right\}$$

with $\bm{x}_c = \bm{0}$ expands to $\{(x_1, x_2) \mid x_1^2/4 + x_2^2 \le 1\}$, with semi-axes 2 and 1. An ellipsoid is convex.

A polyhedron is the solution set of finitely many linear inequalities and equalities,

$$\{\bm{x} \mid \bm{a}_j^T\bm{x} \le b_j,\ j = 1, 2, \dots, m,\ \ \bm{c}_j^T\bm{x} = d_j,\ j = 1, 2, \dots, p\}$$

It may be unbounded, and it is convex.

For a simplex, choose $k + 1$ points $\bm{v}_0, \dots, \bm{v}_k$ in $\bm{R}^n$ such that $\bm{v}_1 - \bm{v}_0, \dots, \bm{v}_k - \bm{v}_0$ are linearly independent. The simplex associated with these points is

$$\text{Conv}\{\bm{v}_0, \dots, \bm{v}_k\} = \{\theta_0\bm{v}_0 + \dots + \theta_k\bm{v}_k \mid \theta \ge 0,\ \bm{1}^T\theta = 1\}$$

In two dimensions, $k = 1$ gives a line segment and $k = 2$ a triangle, and for $k \ge 3$ the vectors $\bm{v}_i - \bm{v}_0$ can no longer be linearly independent. In three dimensions the simplexes are line segments, triangles and tetrahedra.

A simplex is always a polyhedron, which we now prove. Let $C$ be the simplex, so every $\bm{x} \in C$ can be written as $\bm{x} = \theta_0\bm{v}_0 + \dots + \theta_k\bm{v}_k$ with $\theta \ge 0$, $\bm{1}^T\theta = 1$, and $\bm{v}_1 - \bm{v}_0, \dots, \bm{v}_k - \bm{v}_0$ linearly independent. Define $\bm{y} = (\theta_1, \dots, \theta_k)^T$ and $\bm{B} = (\bm{v}_1 - \bm{v}_0, \dots, \bm{v}_k - \bm{v}_0) \in \bm{R}^{n \times k}$. Since $\theta_0 = 1 - \bm{1}^T\bm{y} \ge 0$, we have $\bm{1}^T\bm{y} \le 1$ and $\bm{y} \ge 0$, and

$$\begin{aligned} \bm{x} &= \theta_0\bm{v}_0 + \dots + \theta_k\bm{v}_k \\ &= \bm{v}_0 + \theta_1(\bm{v}_1 - \bm{v}_0) + \dots + \theta_k(\bm{v}_k - \bm{v}_0) \\ &= \bm{v}_0 + \bm{B}\bm{y} \end{aligned}$$

Because $\text{Rank}(\bm{B}_{n \times k}) = k$ with $k \le n$, there is a nonsingular matrix $\bm{A} = \left( \begin{matrix}\bm{A}_1 \\ \bm{A}_2\end{matrix} \right) \in \bm{R}^{n \times n}$ such that

$$\bm{A}\bm{B} = \left( \begin{matrix}\bm{A}_1 \\ \bm{A}_2\end{matrix} \right)\bm{B} = \left( \begin{matrix}\bm{I}_k \\ \bm{0}\end{matrix} \right)$$

Multiplying by $\bm{A}$ gives $\bm{A}\bm{x} = \bm{A}\bm{v}_0 + \bm{A}\bm{B}\bm{y}$, that is,

$$\left( \begin{matrix}\bm{A}_1 \\ \bm{A}_2\end{matrix} \right)\bm{x} = \left( \begin{matrix}\bm{A}_1 \\ \bm{A}_2\end{matrix} \right)\bm{v}_0 + \left( \begin{matrix}\bm{A}_1 \\ \bm{A}_2\end{matrix} \right)\bm{B}\bm{y}$$

which expands to

$$\bm{A}_1\bm{x} = \bm{A}_1\bm{v}_0 + \bm{y}, \quad \bm{A}_2\bm{x} = \bm{A}_2\bm{v}_0$$

Substituting $\bm{y} = \bm{A}_1\bm{x} - \bm{A}_1\bm{v}_0$ into $\bm{1}^T\bm{y} \le 1$, $\bm{y} \ge 0$ gives

$$\bm{A}_1\bm{x} \ge \bm{A}_1\bm{v}_0, \quad \bm{1}^T\bm{A}_1\bm{x} \le \bm{1}^T\bm{A}_1\bm{v}_0 + 1, \quad \bm{A}_2\bm{x} = \bm{A}_2\bm{v}_0$$

These are finitely many linear inequalities and equalities in $\bm{x}$, which proves the claim.
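For a triangle in the plane ($k = n = 2$) the matrix $\bm{B}$ in the proof is square, so $\bm{A} = \bm{B}^{-1}$ and the block $\bm{A}_2$ is empty. The following sketch, with illustrative names `barycentric_y` and `in_triangle`, turns the resulting inequalities $\bm{y} \ge 0$, $\bm{1}^T\bm{y} \le 1$ into a membership test; it is one possible instance of the construction, not a general implementation.

```python
def barycentric_y(x, v0, v1, v2):
    """Solve B y = x - v0 with B = [v1 - v0, v2 - v0] via the 2x2 inverse."""
    b11, b21 = v1[0] - v0[0], v1[1] - v0[1]   # first column of B
    b12, b22 = v2[0] - v0[0], v2[1] - v0[1]   # second column of B
    det = b11 * b22 - b12 * b21
    if det == 0:
        raise ValueError("v1 - v0 and v2 - v0 must be linearly independent")
    dx, dy = x[0] - v0[0], x[1] - v0[1]
    y1 = (b22 * dx - b12 * dy) / det
    y2 = (-b21 * dx + b11 * dy) / det
    return y1, y2

def in_triangle(x, v0, v1, v2, tol=1e-12):
    """Polyhedral description of the simplex: y >= 0 and 1^T y <= 1."""
    y1, y2 = barycentric_y(x, v0, v1, v2)
    return y1 >= -tol and y2 >= -tol and y1 + y2 <= 1 + tol

v0, v1, v2 = (0.0, 0.0), (2.0, 0.0), (0.0, 2.0)
print(in_triangle((0.5, 0.5), v0, v1, v2))  # interior point
print(in_triangle((2.0, 0.0), v0, v1, v2))  # a vertex
print(in_triangle((2.0, 2.0), v0, v1, v2))  # outside: y1 + y2 = 2
```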
Consider the set of symmetric matrices $\bm{S}^n$, the set of symmetric positive semidefinite matrices $\bm{S}^n_+$, and the set of symmetric positive definite matrices $\bm{S}^n_{++}$. We now prove that $\bm{S}^n_+$ is a convex cone, and hence a convex set. Take any $\theta_1, \theta_2 \ge 0$ and any $\bm{A}, \bm{B} \in \bm{S}^n_+$, so that

$$\forall \bm{X} \in \bm{R}^n, \quad \bm{X}^T\bm{A}\bm{X} \ge 0, \quad \bm{X}^T\bm{B}\bm{X} \ge 0$$

Then

$$\bm{X}^T(\theta_1\bm{A} + \theta_2\bm{B})\bm{X} = \theta_1\bm{X}^T\bm{A}\bm{X} + \theta_2\bm{X}^T\bm{B}\bm{X} \ge 0$$

so the symmetric positive semidefinite matrices form a convex cone. The symmetric positive definite matrices, in contrast, form a convex set but not a convex cone, because the zero matrix (obtained with $\theta_1 = \theta_2 = 0$) is not positive definite.
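The cone property can be illustrated on $2 \times 2$ matrices. This sketch relies on the fact that a symmetric $2 \times 2$ matrix is positive semidefinite exactly when its trace and determinant are both nonnegative; the matrices and the helper names `combine`, `is_psd_2x2` and `is_pd_2x2` are chosen here only for illustration.

```python
def combine(t1, A, t2, B):
    """Return t1 * A + t2 * B for 2x2 matrices stored as nested lists."""
    return [[t1 * A[i][j] + t2 * B[i][j] for j in range(2)] for i in range(2)]

def is_psd_2x2(M, tol=1e-12):
    """Symmetric 2x2 M is PSD iff trace >= 0 and det >= 0."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return trace >= -tol and det >= -tol

def is_pd_2x2(M):
    """Symmetric 2x2 M is positive definite iff trace > 0 and det > 0."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return trace > 0 and det > 0

A = [[2.0, 1.0], [1.0, 2.0]]     # positive definite
B = [[1.0, -1.0], [-1.0, 1.0]]   # positive semidefinite, singular

print(is_psd_2x2(combine(3.0, A, 0.5, B)))   # nonnegative combination stays PSD
zero = combine(0.0, A, 0.0, A)
print(is_psd_2x2(zero), is_pd_2x2(zero))     # zero matrix: PSD but not PD
```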
4. Convexity-Preserving Operations
If $S_1, S_2$ are convex sets, then $S_1 \cap S_2$ is a convex set. The result extends to the intersection of any number of convex sets.
Consider $f(\bm{x}) = \bm{A}\bm{x} + \bm{b}$ with $\bm{A} \in \bm{R}^{m \times n}$, $\bm{b} \in \bm{R}^m$; then $f: \bm{R}^n \rightarrow \bm{R}^m$ is an affine function. If $S \subseteq \bm{R}^n$ is convex and $f: \bm{R}^n \rightarrow \bm{R}^m$ is affine, then the image $f(S) = \{f(\bm{x}) \mid \bm{x} \in S\}$ is convex. The inverse image of a convex set under an affine function, $f^{-1}(T) = \{\bm{x} \mid f(\bm{x}) \in T\}$, is convex as well.
If $S_1, S_2$ are convex sets, then their sum $\{x + y \mid x \in S_1, y \in S_2\}$ is convex, and their Cartesian product $\{(x, y) \mid x \in S_1, y \in S_2\}$ is convex.
Consider the linear matrix inequality (LMI)

$$A(\bm{x}) = x_1\bm{A}_1 + \dots + x_n\bm{A}_n \preceq \bm{B}, \quad \bm{B}, \bm{A}_i \in \bm{S}^m,\ x_i \in \bm{R}$$

where $A(\bm{x}) \preceq \bm{B}$ means that $A(\bm{x}) - \bm{B}$ is negative semidefinite. The set $\{\bm{x} \mid A(\bm{x}) \preceq \bm{B}\}$ is convex. To see this, consider the affine map $f(\bm{x}) = \bm{B} - A(\bm{x})$. The symmetric positive semidefinite matrices form a convex cone, so the inverse image

$$f^{-1}(\bm{S}^m_+) = \{\bm{x} \mid \bm{B} - A(\bm{x}) \succeq 0\}$$

is convex; that is, the solution set of an LMI is convex.
Consider the function $p(\bm{z}, t) = \bm{z}/t$ with $\bm{z} \in \bm{R}^n$, $t \in R_{++}$, which is called the perspective function. If a set $C$ of points $(\bm{z}, t)$ in the domain of $p$ is convex, then its image $p(C)$ is convex. To see this, take two points $\bm{x} = (\bm{x}', x_{n+1})$ and $\bm{y} = (\bm{y}', y_{n+1})$ in $\bm{R}^{n+1}$ with $x_{n+1}, y_{n+1} > 0$. A point on the segment between them is $\theta\bm{x} + (1 - \theta)\bm{y}$ with $\theta \in [0, 1]$, and its image is

$$\begin{aligned}p(\theta\bm{x} + (1 - \theta)\bm{y}) &= \frac{\theta\bm{x}' + (1 - \theta)\bm{y}'}{\theta x_{n+1} + (1 - \theta)y_{n+1}} \\&= \frac{\theta x_{n+1}}{\theta x_{n+1} + (1 - \theta)y_{n+1}} \cdot \frac{\bm{x}'}{x_{n+1}} + \frac{(1 - \theta)y_{n+1}}{\theta x_{n+1} + (1 - \theta)y_{n+1}} \cdot \frac{\bm{y}'}{y_{n+1}} \\&= \mu\, p(\bm{x}', x_{n+1}) + (1 - \mu)\, p(\bm{y}', y_{n+1}) \end{aligned}$$

where $\mu = \theta x_{n+1}/(\theta x_{n+1} + (1 - \theta)y_{n+1}) \in [0, 1]$. The result is a convex combination of $p(\bm{x})$ and $p(\bm{y})$, so line segments are mapped to line segments. The inverse image under the perspective function,

$$p^{-1}(C) = \{(\bm{x}, t) \in \bm{R}^{n+1} \mid \bm{x}/t \in C,\ t > 0\}$$

of a convex set $C$ is also convex.
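The identity $p(\theta\bm{x} + (1 - \theta)\bm{y}) = \mu p(\bm{x}) + (1 - \mu) p(\bm{y})$ can be verified numerically. The sketch below uses two arbitrary points of $\bm{R}^3$ with positive last coordinate; the function names `perspective`, `mu_for` and `mix` are illustrative.

```python
def perspective(point):
    """p(z, t) = z / t for a point (z_1, ..., z_n, t) with t > 0."""
    *z, t = point
    if t <= 0:
        raise ValueError("the last coordinate must be positive")
    return [zi / t for zi in z]

def mu_for(theta, x, y):
    """mu = theta * x_{n+1} / (theta * x_{n+1} + (1 - theta) * y_{n+1})."""
    return theta * x[-1] / (theta * x[-1] + (1 - theta) * y[-1])

def mix(w, a, b):
    """Convex combination w * a + (1 - w) * b of two vectors."""
    return [w * ai + (1 - w) * bi for ai, bi in zip(a, b)]

x = [2.0, 4.0, 2.0]    # p(x) = (1, 2)
y = [3.0, -3.0, 1.0]   # p(y) = (3, -3)

for theta in (0.0, 0.25, 0.5, 1.0):
    lhs = perspective(mix(theta, x, y))
    mu = mu_for(theta, x, y)
    rhs = mix(mu, perspective(x), perspective(y))
    print(theta, mu, lhs, rhs)
```

Note that $\mu$ generally differs from $\theta$: the segment is mapped onto a segment, but not at uniform speed.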
Consider the affine function

$$g(\bm{x}) = \left( \begin{matrix}\bm{A} \\ \bm{c}^T\end{matrix} \right)\bm{x} + \left( \begin{matrix}\bm{b} \\ d\end{matrix} \right), \quad \bm{A} \in \bm{R}^{m \times n},\ \bm{c} \in \bm{R}^{n},\ \bm{b} \in \bm{R}^{m},\ d \in R$$

and the perspective function $p: \bm{R}^{m+1} \rightarrow \bm{R}^{m}$. The linear-fractional function is defined as the composition $f = p \circ g$, that is,

$$f(\bm{x}) = (\bm{A}\bm{x} + \bm{b})/(\bm{c}^T\bm{x} + d), \quad \text{dom}\ f = \{\bm{x} \mid \bm{c}^T\bm{x} + d > 0\}$$

The image of any convex set in $\text{dom}\ f$ under a linear-fractional function is again convex.

As an example, consider the conditional probabilities of two random variables $u \in \{1, \dots, n\}$ and $v \in \{1, \dots, m\}$. With the joint probabilities $p_{ij} = P(u = i, v = j)$ and the conditional probabilities $f_{i|j} = P(u = i \mid v = j)$, we have

$$f_{i|j} = \frac{p_{ij}}{\sum_{k=1}^{n} p_{kj}}$$

which is a linear-fractional mapping of the joint probabilities.
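As a small illustration of this mapping, the sketch below computes the conditional distribution of $u$ given $v = j$ from a joint table. The table values and the name `conditional_given_v` are made up for the example.

```python
def conditional_given_v(joint, j):
    """Return [f_{i|j}] = [p_ij / sum_k p_kj] for a joint table joint[i][j]."""
    column = [row[j] for row in joint]
    total = sum(column)            # this is P(v = j), the denominator
    if total <= 0:
        raise ValueError("P(v = j) must be positive")
    return [p / total for p in column]

# Hypothetical joint distribution of u in {1, 2} (rows) and v in {1, 2} (columns).
joint = [[0.1, 0.2],
         [0.3, 0.4]]

print(conditional_given_v(joint, 0))
print(conditional_given_v(joint, 1))
```

The numerator is linear in the joint probabilities and the denominator is a positive linear function of them, which is exactly the linear-fractional form.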
5. Convex Functions
Let $f: \bm{R}^n \rightarrow R$. If $\text{dom}\ f$ is a convex set and, for all $\bm{x}, \bm{y} \in \text{dom}\ f$ and $0 \le \theta \le 1$,

$$f(\theta\bm{x} + (1 - \theta)\bm{y}) \le \theta f(\bm{x}) + (1 - \theta)f(\bm{y})$$

then $f$ is called a convex function. If the inequality holds strictly whenever $\bm{x} \ne \bm{y}$ and $0 < \theta < 1$, then $f$ is called strictly convex. If $f$ is convex, then $-f$ is concave.
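For any convex function $f$, consider a line $\bm{x} + t\bm{v}$ through a point $\bm{x}$ of $\text{dom}\ f$; then $g(t) = f(\bm{x} + t\bm{v})$ is convex in $t$. Conversely, if $g$ is convex for every such line, then $f$ is convex, so the convexity of a function can be checked by restricting it to lines.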
Any convex function $f$ can be extended to

$$g(\bm{x}) = \begin{cases} f(\bm{x}), & \bm{x} \in \text{dom}\ f \\ \infty, & \bm{x} \notin \text{dom}\ f \end{cases}$$

and the extended function $g$ is still convex.
Consider the first-order condition for convexity. Suppose $f: \bm{R}^n \rightarrow R$ is differentiable, that is, the gradient $\nabla f$ exists everywhere on $\text{dom}\ f$. Then $f$ is convex if and only if $\text{dom}\ f$ is convex and

$$f(\bm{y}) \ge f(\bm{x}) + \nabla f(\bm{x})^T(\bm{y} - \bm{x}), \quad \forall \bm{x}, \bm{y} \in \text{dom}\ f$$

This is an important property. If there is a point $\bm{x}$ with $\nabla f(\bm{x}) = \bm{0}$, the inequality becomes $f(\bm{y}) \ge f(\bm{x})$ for all $\bm{y} \in \text{dom}\ f$, so $\bm{x}$ is a global minimizer. This is a central idea of convex optimization.
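The first-order inequality can be spot-checked on a concrete function. The sketch below uses the convex quadratic $f(\bm{x}) = x_1^2 + x_1x_2 + x_2^2$, an example chosen here and not taken from the text, and samples random pairs of points; a passing check is consistent with convexity but is of course not a proof.

```python
import random

def f(x):
    """Example convex quadratic: x1^2 + x1*x2 + x2^2."""
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2

def grad_f(x):
    """Gradient of the example quadratic."""
    return [2 * x[0] + x[1], x[0] + 2 * x[1]]

def first_order_gap(x, y):
    """f(y) - f(x) - grad f(x)^T (y - x); nonnegative for convex f."""
    g = grad_f(x)
    return f(y) - f(x) - sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))

rng = random.Random(0)  # fixed seed so the run is repeatable
gaps = []
for _ in range(1000):
    x = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    y = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    gaps.append(first_order_gap(x, y))

print(min(gaps) >= -1e-9)                      # tolerance for rounding error
print(grad_f([0.0, 0.0]), f([0.0, 0.0]))       # zero gradient at the minimizer
```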
Consider the second-order condition for convexity. If $f: \bm{R}^n \rightarrow R$ is twice differentiable, then $f$ is convex if and only if $\text{dom}\ f$ is convex and

$$\nabla^2 f(\bm{x}) \succeq 0, \quad \forall \bm{x} \in \text{dom}\ f$$

where $\nabla^2 f(\bm{x})$ is the Hessian matrix.
Consider a quadratic function $f: \bm{R}^n \rightarrow R$ of the form

$$f(\bm{x}) = \bm{x}^T\bm{P}\bm{x}/2 + \bm{q}^T\bm{x} + r, \quad \bm{P} \in \bm{S}^n,\ \bm{q} \in \bm{R}^n,\ r \in R$$

To determine its convexity it suffices to look at its Hessian $\nabla^2 f(\bm{x}) = \bm{P}$: the function is convex if and only if $\bm{P} \succeq 0$.
Consider an affine function $f(\bm{x}) = \bm{a}^T\bm{x} + b$. Its Hessian is $\nabla^2 f(\bm{x}) = \bm{0}$, so it is both convex and concave.
Consider the exponential function $f(x) = e^{ax}$, $x \in R$. Its second derivative is $f''(x) = a^2e^{ax} \ge 0$, so it is convex.
Consider the power function $f(x) = x^a$, $x \in R_{++}$. Its second derivative is $f''(x) = a(a - 1)x^{a-2}$, so it is convex when $a \ge 1$ or $a \le 0$, and concave when $0 \le a \le 1$.
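Since $x^{a-2} > 0$ on $R_{++}$, the curvature of $x^a$ is decided by the sign of $a(a - 1)$. The following sketch classifies an exponent this way and cross-checks it with a midpoint test; the function names `power_curvature` and `midpoint_gap` and the sample points are illustrative.

```python
def power_curvature(a):
    """Classify x**a on x > 0 from the sign of a * (a - 1)."""
    s = a * (a - 1)
    if s > 0:
        return "convex"
    if s < 0:
        return "concave"
    return "affine"   # a == 0 (constant) or a == 1 (identity)

def midpoint_gap(a, x, y):
    """(f(x) + f(y)) / 2 - f((x + y) / 2); nonnegative when f is convex."""
    return (x ** a + y ** a) / 2 - ((x + y) / 2) ** a

for a in (-1.0, 0.5, 1.0, 2.0):
    print(a, power_curvature(a), midpoint_gap(a, 1.0, 4.0))
```

For $a = 0.5$ the midpoint gap is negative, which confirms that the square root is concave rather than convex.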
Consider the negative entropy $f(x) = x\log x$, $x \in R_{++}$. Its second derivative is $1/x > 0$, so it is strictly convex.
Consider a norm $p(\bm{x})$, $\bm{x} \in \bm{R}^n$, which satisfies

$$p(a\bm{x}) = |a|\,p(\bm{x}), \qquad p(\bm{x} + \bm{y}) \le p(\bm{x}) + p(\bm{y}), \qquad p(\bm{x}) = 0 \iff \bm{x} = \bm{0}$$

To examine its convexity, take any $\bm{x}, \bm{y} \in \bm{R}^n$ and $\theta \in [0, 1]$. Then

$$\begin{aligned} p(\theta\bm{x} + (1 - \theta)\bm{y}) &\le p(\theta\bm{x}) + p((1 - \theta)\bm{y}) \\ &= \theta p(\bm{x}) + (1 - \theta)p(\bm{y}) \end{aligned}$$

so every norm is convex. In contrast, the so-called 0-norm, which counts the nonzero entries of a vector,

$$\|\bm{x}\|_0 = \text{num}\{i \mid x_i \ne 0\}$$

is not a norm, and it is not convex.
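Both failures of the 0-norm can be seen on a two-dimensional counterexample. The vectors below are chosen for illustration, and `l0` is a hypothetical helper name.

```python
def l0(x):
    """Number of nonzero entries of x."""
    return sum(1 for xi in x if xi != 0)

x = [1.0, 0.0]
y = [0.0, 1.0]
theta = 0.5
mid = [theta * a + (1 - theta) * b for a, b in zip(x, y)]

# Convexity would require l0(mid) <= theta * l0(x) + (1 - theta) * l0(y).
print(l0(mid), theta * l0(x) + (1 - theta) * l0(y))

# Homogeneity p(a x) = |a| p(x) fails as well: scaling does not change the count.
print(l0([2 * xi for xi in x]), 2 * l0(x))
```

The midpoint has two nonzero entries while the right-hand side of the convexity inequality equals 1, so the inequality is violated.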
Consider the max function $f(\bm{x}) = \max\{x_1, \dots, x_n\}$, $\bm{x} \in \bm{R}^n$. For any $\bm{x}, \bm{y} \in \bm{R}^n$ and $\theta \in [0, 1]$,

$$\begin{aligned} f(\theta\bm{x} + (1 - \theta)\bm{y}) &= \max\{\theta x_i + (1 - \theta)y_i,\ i = 1, \dots, n\} \\ &\le \theta \max\{x_i\} + (1 - \theta) \max\{y_i\}, \quad i = 1, \dots, n\end{aligned}$$

so the max function is convex. The max function is not differentiable, however. To work around this, it is replaced by a smooth analytic approximation, the log-sum-exp function

$$f(\bm{x}) = \log(e^{x_1} + \dots + e^{x_n}), \quad \bm{x} \in \bm{R}^n$$

which satisfies

$$\max\{x_i\} \le f(\bm{x}) \le \max\{x_i\} + \log n$$

Its gradient and Hessian are given by

$$\begin{aligned} \partial f/\partial x_i &= e^{x_i}\Big/\sum_k e^{x_k} \\ \bm{H}_{ij} &= \partial^2 f/\partial x_i\partial x_j = -e^{x_i}e^{x_j}\Big/\Big(\sum_k e^{x_k}\Big)^2, \quad i \ne j \\ \bm{H}_{ii} &= \partial^2 f/\partial x_i^2 = \Big(-e^{x_i}e^{x_i} + e^{x_i}\sum_k e^{x_k}\Big)\Big/\Big(\sum_k e^{x_k}\Big)^2 \end{aligned}$$

With $\bm{z} = (e^{x_1}, \dots, e^{x_n})^T$ this can be written as

$$\bm{H} = \frac{1}{(\bm{1}^T\bm{z})^2}\left[(\bm{1}^T\bm{z})\,\text{diag}(\bm{z}) - \bm{z}\bm{z}^T\right]$$

To show that $\bm{H}$ is positive semidefinite we need $\bm{V}^T\bm{H}\bm{V} \ge 0$ for all $\bm{V} \in \bm{R}^n$. Writing the positive factor $1/(\bm{1}^T\bm{z})^2$ as $k_{++}$, we have

$$\begin{aligned}\bm{V}^T\bm{H}\bm{V} &= k_{++}\left[(\bm{1}^T\bm{z})\bm{V}^T\text{diag}(\bm{z})\bm{V} - \bm{V}^T\bm{z}\bm{z}^T\bm{V}\right] \\&= k_{++}\Big[\sum_i z_i\sum_i v_i^2z_i - \Big(\sum_i v_iz_i\Big)^2\Big] \end{aligned}$$

Setting $a_i = v_i z_i^{1/2}$ and $b_i = z_i^{1/2}$ gives

$$\begin{aligned}\bm{V}^T\bm{H}\bm{V} &= k_{++}\Big[\sum_i z_i\sum_i v_i^2z_i - \Big(\sum_i v_iz_i\Big)^2\Big] \\&= k_{++}\left[\bm{b}^T\bm{b}\,\bm{a}^T\bm{a} - (\bm{a}^T\bm{b})^2\right] \end{aligned}$$

By the Cauchy-Schwarz inequality this expression is nonnegative, so the log-sum-exp function is convex.
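The bounds $\max\{x_i\} \le f(\bm{x}) \le \max\{x_i\} + \log n$ are easy to check numerically. The sketch below subtracts the maximum before exponentiating, a common numerical-stability technique that is added here and is not part of the derivation above; `log_sum_exp` and `within_bounds` are illustrative names.

```python
import math

def log_sum_exp(x):
    """log(sum_i exp(x_i)), computed as m + log(sum_i exp(x_i - m)) with m = max(x).

    Shifting by m is an added stability measure; it avoids overflow for large x_i
    and gives the same value mathematically.
    """
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

def within_bounds(x, tol=1e-12):
    """Check max(x) <= lse(x) <= max(x) + log(n)."""
    value = log_sum_exp(x)
    return max(x) - tol <= value <= max(x) + math.log(len(x)) + tol

for x in ([1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0], [1000.0, 1000.0], [-5.0, 10.0]):
    print(x, log_sum_exp(x), within_bounds(x))
```

The upper bound is attained when all entries are equal, and the lower bound is approached when one entry dominates the others.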
6. Convexity-Preserving Operations on Functions
If $f_1, \dots, f_m$ are convex functions, then their nonnegative weighted sum

$$f = \sum_i w_if_i, \quad w_i \ge 0$$

is convex. This extends to the continuous case: if $f(x, y)$ is convex in $x$ for every $y \in A$ and $w(y) \ge 0$, then

$$g(x) = \int_{y \in A}w(y)f(x, y)\,dy$$

is convex.
Consider $f: \bm{R}^n \rightarrow R$, $\bm{A} \in \bm{R}^{n \times m}$, $\bm{b} \in \bm{R}^n$, and define $g(\bm{x}) = f(\bm{A}\bm{x} + \bm{b})$. If $f$ is convex, then $g$ is convex. Indeed, for all $\bm{x}, \bm{y} \in \text{dom}\ g$ and $0 \le \theta \le 1$,

$$\begin{aligned} g(\theta\bm{x} + (1 - \theta)\bm{y}) &= f(\theta\bm{A}\bm{x} + (1 - \theta)\bm{A}\bm{y} + \bm{b}) \\&= f(\theta(\bm{A}\bm{x} + \bm{b}) + (1 - \theta)(\bm{A}\bm{y} + \bm{b})) \\ &\le \theta f(\bm{A}\bm{x} + \bm{b}) + (1 - \theta)f(\bm{A}\bm{y} + \bm{b}) \\&= \theta g(\bm{x}) + (1 - \theta)g(\bm{y}) \end{aligned}$$

Here the affine map is applied first and the function second. Now consider the reverse order, with the functions applied first and the affine map second: let $f_i: \bm{R}^n \rightarrow R$ be convex, $\bm{a} \in \bm{R}^n$, $b \in R$, and define

$$g(\bm{x}) = \bm{a}^T(f_1(\bm{x}), \dots, f_n(\bm{x}))^T + b$$

If all entries of $\bm{a}$ are nonnegative, this is a nonnegative weighted sum of convex functions plus a constant, and hence convex.
Consider the pointwise maximum of two functions: if $f_1$ and $f_2$ are convex, then $f(x) = \max\{f_1(x), f_2(x)\}$ is convex. For all $\bm{x}, \bm{y} \in \text{dom}\ f$ and $0 \le \theta \le 1$,

$$\begin{aligned} f(\theta\bm{x} + (1 - \theta)\bm{y}) &= \max\{f_1(\theta\bm{x} + (1 - \theta)\bm{y}),\ f_2(\theta\bm{x} + (1 - \theta)\bm{y})\} \\&\le \max\{\theta f_1(\bm{x}) + (1 - \theta)f_1(\bm{y}),\ \theta f_2(\bm{x}) + (1 - \theta)f_2(\bm{y})\} \\&\le \max\{\theta f_1(\bm{x}), \theta f_2(\bm{x})\} + \max\{(1 - \theta)f_1(\bm{y}), (1 - \theta)f_2(\bm{y})\} \\&= \theta f(\bm{x}) + (1 - \theta)f(\bm{y}) \end{aligned}$$

Next consider the composition of functions. Let $h: \bm{R}^k \rightarrow R$ and $g: \bm{R}^n \rightarrow \bm{R}^k$; their composition is $f = h \circ g: \bm{R}^n \rightarrow R$, with domain $\text{dom}\ f = \{x \in \text{dom}\ g \mid g(x) \in \text{dom}\ h\}$. First examine the one-dimensional case of twice differentiable functions defined on all of $R$. The second derivative of $f(x) = h(g(x))$ is

$$f''(x) = h''(g(x))\,g'(x)^2 + h'(g(x))\,g''(x)$$

Hence $f$ is convex if $h$ is convex and nondecreasing and $g$ is convex, or if $h$ is convex and nonincreasing and $g$ is concave. The more general cases, namely higher dimensions, functions not defined on all of $R$, and functions that are not twice differentiable, are handled with the Hessian, the extended-value extension, and the original definition of convexity, respectively.
Given $f: \bm{R}^n \rightarrow R$, define $g: \bm{R}^n \times R_{++} \rightarrow R$ by

$$g(\bm{x}, t) = tf(\bm{x}/t), \quad \text{dom}\ g = \{(\bm{x}, t) \mid t > 0,\ \bm{x}/t \in \text{dom}\ f\}$$

This is the perspective of $f$. If $f$ is convex, then $g$ is convex.
Consider the negative logarithm $f(x) = -\log x$, which is convex. Its perspective $g(x, t) = t\log(t/x)$ is therefore also convex. Now let $\bm{u}, \bm{v} \in \bm{R}_{++}^n$; then

$$g(\bm{u}, \bm{v}) = \sum_i u_i\log(u_i/v_i)$$

is convex as well, since it is a sum of convex functions. Further,

$$D_{KL}(\bm{u}, \bm{v}) = \sum_i \left(u_i\log(u_i/v_i) - u_i + v_i\right)$$

is called the KL divergence. It is a convex function, and it is a special case of a Bregman divergence. For a convex function $f: \bm{R}^n \rightarrow R$, the Bregman divergence is

$$D_B(\bm{u}, \bm{v}) = f(\bm{u}) - f(\bm{v}) - \nabla f(\bm{v})^T(\bm{u} - \bm{v})$$

With $f(\bm{u}) = \sum_i u_i\log u_i - \sum_i u_i$ it reduces to the KL divergence. A Bregman divergence is not in general jointly convex in $(\bm{u}, \bm{v})$, so the convexity of the KL divergence follows from the perspective argument above rather than from its being a Bregman divergence.
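The reduction of the Bregman divergence to the KL divergence can be checked numerically. In this sketch the gradient of $f(\bm{u}) = \sum_i u_i\log u_i - \sum_i u_i$ is taken to be $(\log v_1, \dots, \log v_n)$, which follows by differentiating term by term; the test vectors and function names are illustrative.

```python
import math

def kl_divergence(u, v):
    """D_KL(u, v) = sum_i (u_i * log(u_i / v_i) - u_i + v_i) for positive u, v."""
    return sum(ui * math.log(ui / vi) - ui + vi for ui, vi in zip(u, v))

def neg_entropy(u):
    """f(u) = sum_i u_i * log(u_i) - sum_i u_i."""
    return sum(ui * math.log(ui) - ui for ui in u)

def grad_neg_entropy(v):
    """Gradient of f: d/dv_i (v_i log v_i - v_i) = log v_i."""
    return [math.log(vi) for vi in v]

def bregman(f, grad_f, u, v):
    """D_B(u, v) = f(u) - f(v) - grad f(v)^T (u - v)."""
    g = grad_f(v)
    return f(u) - f(v) - sum(gi * (ui - vi) for gi, ui, vi in zip(g, u, v))

u = [0.2, 0.5, 1.3]
v = [0.4, 0.4, 0.9]
print(kl_divergence(u, v))
print(bregman(neg_entropy, grad_neg_entropy, u, v))
```

The two printed values agree up to rounding, and both vanish when $\bm{u} = \bm{v}$.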
7. Quasiconvex Functions
For a function $f: \bm{R}^n \rightarrow R$, the $\alpha$-sublevel set is defined as

$$C_\alpha = \{\bm{x} \in \text{dom}\ f \mid f(\bm{x}) \le \alpha\}$$

All sublevel sets of a convex function are convex. For any $\bm{x}, \bm{y} \in C_\alpha$ we have $f(\bm{x}) \le \alpha$ and $f(\bm{y}) \le \alpha$, so for $\theta \in [0, 1]$

$$\begin{aligned} f(\theta\bm{x} + (1 - \theta)\bm{y}) &\le \theta f(\bm{x}) + (1 - \theta)f(\bm{y}) \\&\le \alpha \end{aligned}$$

which holds for every $\alpha$. The converse is not true: a function whose sublevel sets are all convex need not be convex.
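To see what sublevel sets mean, take a convex function $f: \bm{R}^2 \rightarrow R$ and project its graph onto the plane. As $\alpha$ increases, the projected sublevel sets form a nested, nondecreasing family of convex sets. The same holds in higher dimensions.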
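A function that may fail to be convex but whose sublevel sets are all convex is called a quasiconvex function. Every convex function is quasiconvex, but the converse does not hold; a quasiconvex function may even be concave. Quasiconvex functions are also called unimodal functions, and, broadly speaking, convex optimization algorithms can also be applied to them. Quasiconvexity can be stated in mathematical terms: if $\text{dom}\ f$ is convex and, for all $\bm{x}, \bm{y} \in \text{dom}\ f$ and $\theta \in [0, 1]$,

$$\max\{f(\bm{x}), f(\bm{y})\} \ge f(\theta\bm{x} + (1 - \theta)\bm{y})$$

then $f$ is called quasiconvex.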
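The difference between the two definitions shows up on a simple one-dimensional example. The sketch below uses $f(x) = \sqrt{|x|}$, a function chosen here for illustration, and tests both inequalities on a finite grid; a grid test can only give evidence for quasiconvexity, while a single violation is enough to rule out convexity.

```python
import itertools
import math

def f(x):
    """Example function: sqrt(|x|), quasiconvex on R but not convex."""
    return math.sqrt(abs(x))

def satisfies_quasiconvex(func, points, thetas, tol=1e-12):
    """Check f(theta*x + (1-theta)*y) <= max(f(x), f(y)) on a grid."""
    return all(func(t * x + (1 - t) * y) <= max(func(x), func(y)) + tol
               for x, y in itertools.product(points, repeat=2)
               for t in thetas)

def satisfies_convex(func, points, thetas, tol=1e-12):
    """Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y) on a grid."""
    return all(func(t * x + (1 - t) * y) <= t * func(x) + (1 - t) * func(y) + tol
               for x, y in itertools.product(points, repeat=2)
               for t in thetas)

points = [-4 + 0.5 * i for i in range(17)]   # grid on [-4, 4]
thetas = [0.0, 0.25, 0.5, 0.75, 1.0]

print(satisfies_quasiconvex(f, points, thetas))
print(satisfies_convex(f, points, thetas))
```

The convexity test fails, for instance, at $x = 0$, $y = 4$, $\theta = 0.5$, where $f(2) \approx 1.414$ exceeds the average value 1.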
For a differentiable quasiconvex function $f$, the following first-order condition holds: if $f(\bm{y}) \le f(\bm{x})$, then

$$\nabla f(\bm{x})^T(\bm{y} - \bm{x}) \le 0$$
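For a twice differentiable quasiconvex function $f$, the following second-order condition holds at every $\bm{x} \in \text{dom}\ f$: if $\bm{y}^T\nabla f(\bm{x}) = 0$, then

$$\bm{y}^T\nabla^2 f(\bm{x})\bm{y} \ge 0$$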