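Prove that $f(X)=\ln(e^{x_1}+e^{x_2}+\cdots+e^{x_n})$ is a convex function.
Method 1: Proof from the definition
Let $X=(x_1,\dots,x_n)^T$ and $Y=(y_1,\dots,y_n)^T$ be two vectors in $\mathbb{R}^n$, and let $0\le a\le 1$.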
$$
\begin{aligned}
f(aX+(1-a)Y) &= \ln\left(e^{ax_1+(1-a)y_1}+e^{ax_2+(1-a)y_2}+\cdots+e^{ax_n+(1-a)y_n}\right)\\
&= \ln\left(e^{ax_1}\cdot e^{(1-a)y_1}+e^{ax_2}\cdot e^{(1-a)y_2}+\cdots+e^{ax_n}\cdot e^{(1-a)y_n}\right)\\
&\le \ln\left(\left(e^{x_1}+e^{x_2}+\cdots+e^{x_n}\right)^{a}\times\left(e^{y_1}+e^{y_2}+\cdots+e^{y_n}\right)^{1-a}\right)\\
&= af(X)+(1-a)f(Y)
\end{aligned}
$$
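The inequality step follows from Hölder's inequality.
Hölder's inequality: for vectors $U, V\in\mathbb{R}^n$ with nonnegative entries,

$$
U^TV\le\|U\|_q\|V\|_p,\qquad \frac{1}{q}+\frac{1}{p}=1,\quad p,q>1.
$$

For $0<a<1$, set:
- $U=(e^{ax_1},e^{ax_2},\dots,e^{ax_n})^T$,
- $V=(e^{(1-a)y_1},e^{(1-a)y_2},\dots,e^{(1-a)y_n})^T$,
- $q=1/a$,
- $p=1/(1-a)$.

Then $U^TV$ is the sum inside the logarithm on the left of the inequality, while $\|U\|_q=\left(\sum_{i=1}^n e^{x_i}\right)^{a}$ and $\|V\|_p=\left(\sum_{i=1}^n e^{y_i}\right)^{1-a}$ give the right side. For $a=0$ or $a=1$ the inequality holds trivially with equality.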
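As a sanity check on Method 1, the following minimal Python sketch tests both the Hölder step and the resulting convexity inequality on random inputs. The function names (`f`, `holder_sides`, `check`), the sampling ranges, and the tolerances are illustrative assumptions, not part of the proof; passing the check is numerical evidence only.

```python
import math
import random


def f(x):
    """Log-sum-exp: ln(e^{x_1} + ... + e^{x_n})."""
    m = max(x)  # shift by the maximum for numerical stability
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def holder_sides(x, y, a):
    """Return (lhs, rhs) of the Hölder step in Method 1, for 0 < a < 1."""
    lhs = sum(math.exp(a * xi + (1 - a) * yi) for xi, yi in zip(x, y))
    rhs = (sum(math.exp(xi) for xi in x) ** a
           * sum(math.exp(yi) for yi in y) ** (1 - a))
    return lhs, rhs


def check(trials=1000, n=5, seed=0):
    """Return True if no random trial violates either inequality.

    The ranges and tolerances below are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(n)]
        y = [rng.uniform(-3, 3) for _ in range(n)]
        a = rng.uniform(0.01, 0.99)
        lhs, rhs = holder_sides(x, y, a)
        if lhs > rhs * (1 + 1e-12):
            return False
        mix = [a * xi + (1 - a) * yi for xi, yi in zip(x, y)]
        if f(mix) > a * f(x) + (1 - a) * f(y) + 1e-12:
            return False
    return True
```

Calling `check()` samples random vectors and returns `True` when neither inequality is violated beyond the rounding tolerance.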
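Method 2: Proof via the second-order necessary and sufficient condition for convexity
Let $g(X)=e^{x_1}+e^{x_2}+\cdots+e^{x_n}$ and $Z=(e^{x_1},e^{x_2},\dots,e^{x_n})^T$ with entries $z_i=e^{x_i}$, so that $f(X)=\ln g(X)$, and let $Y=(y_1,y_2,\dots,y_n)^T$ be an arbitrary vector in $\mathbb{R}^n$.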
$$
\begin{aligned}
\nabla f(X) &= (e^{x_1},e^{x_2},\cdots,e^{x_n})^T/g(X) = Z/g(X)\\
\nabla^2 f(X) &= \frac{\partial^2 f(X)}{\partial X\,\partial X^T}\\
&= \frac{\partial\,\bigl(Z/g(X)\bigr)}{\partial X^T}\\
&= \left((\nabla Z)^T g(X)-Z\,\nabla^T g(X)\right)/g^2(X)\\
&= \left\{\operatorname{diag}[Z]\,g(X)-ZZ^T\right\}/g^2(X)\\
Y^T\nabla^2 f(X)\,Y &= \frac{1}{g^2(X)}\left(Y^T\operatorname{diag}[Z]\,Y\,g(X)-Y^T(ZZ^T)Y\right)\\
&= \frac{1}{g^2(X)}\left(\left(e^{x_1}+e^{x_2}+\cdots+e^{x_n}\right)\left(\sum_{i=1}^n z_iy_i^2\right)-\left(\sum_{i=1}^n z_iy_i\right)^2\right)\\
&= \frac{1}{g^2(X)}\left(\sum_{i=1}^n z_i\sum_{i=1}^n z_iy_i^2-\left(\sum_{i=1}^n \sqrt{z_i}\,\sqrt{z_i}\,y_i\right)^2\right)\\
&\ge 0\\
\nabla^2 f(X) &\succeq 0
\end{aligned}
$$
Hence $f(X)$ is a convex function.
The inequality here follows from the Cauchy-Schwarz inequality: for vectors $U, V\in\mathbb{R}^n$,

$$
|U^TV|\le\|U\|\,\|V\|
$$

Set:
- $U=(\sqrt{z_1},\sqrt{z_2},\cdots,\sqrt{z_n})^T$
- $V=(\sqrt{z_1}y_1,\sqrt{z_2}y_2,\cdots,\sqrt{z_n}y_n)^T$

Squaring both sides gives $(U^TV)^2\le\|U\|^2\|V\|^2$, that is, $\left(\sum_{i=1}^n z_iy_i\right)^2\le\left(\sum_{i=1}^n z_i\right)\left(\sum_{i=1}^n z_iy_i^2\right)$, which is exactly the nonnegativity used in the derivation.
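The Hessian formula from Method 2 can also be checked numerically. The sketch below builds $\{\operatorname{diag}[Z]\,g(X)-ZZ^T\}/g^2(X)$ entry by entry, evaluates the quadratic form for random $Y$, and compares it with the closed-form expression bounded by the Cauchy-Schwarz inequality. The function names (`hessian`, `quad_form`, `closed_form`, `check`), sampling ranges, and tolerances are assumptions made for illustration.

```python
import math
import random


def hessian(x):
    """Hessian of f(X) = ln(sum_i e^{x_i}): (diag(Z) g - Z Z^T) / g^2."""
    z = [math.exp(xi) for xi in x]
    g = sum(z)
    n = len(x)
    return [[((z[i] * g if i == j else 0.0) - z[i] * z[j]) / g ** 2
             for j in range(n)]
            for i in range(n)]


def quad_form(h, y):
    """Compute Y^T H Y for a square matrix h given as a list of rows."""
    n = len(y)
    return sum(y[i] * h[i][j] * y[j] for i in range(n) for j in range(n))


def closed_form(x, y):
    """(sum z_i * sum z_i y_i^2 - (sum z_i y_i)^2) / g^2, as in the derivation."""
    z = [math.exp(xi) for xi in x]
    g = sum(z)
    s2 = sum(zi * yi * yi for zi, yi in zip(z, y))
    s1 = sum(zi * yi for zi, yi in zip(z, y))
    return (g * s2 - s1 ** 2) / g ** 2


def check(trials=1000, n=4, seed=0):
    """Return True if the quadratic form matches the closed form and is >= 0.

    Ranges and tolerances are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-2, 2) for _ in range(n)]
        y = [rng.uniform(-5, 5) for _ in range(n)]
        q = quad_form(hessian(x), y)
        if abs(q - closed_form(x, y)) > 1e-9 or q < -1e-9:
            return False
    return True
```

For $X=(0,0)^T$ the Hessian is $\begin{pmatrix}1/4 & -1/4\\ -1/4 & 1/4\end{pmatrix}$, and the quadratic form vanishes for $Y=(1,1)^T$. This is the equality case of Cauchy-Schwarz, so the Hessian is positive semidefinite but not positive definite.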