Index of all notes in this series (with links to the lecture videos): 中科大-凸优化 (USTC Convex Optimization)
I. log-sum-exp (analytic approximation of max)
$$f(x)=\log\left(e^{x_1}+\cdots+e^{x_n}\right),\qquad x\in\mathbb{R}^n$$
$$\max\{x_1,\cdots,x_n\}\le f(x)\le\max\{x_1,\cdots,x_n\}+\log n$$
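The two bounds follow from $e^{\max_i x_i}\le\sum_i e^{x_i}\le n\,e^{\max_i x_i}$. As a quick numerical sanity check, here is a minimal sketch assuming NumPy is available; the helper names `log_sum_exp` and `bounds_hold` are made up for this illustration and are not from the lecture.

```python
import numpy as np

def log_sum_exp(x):
    """Numerically stable log(sum(exp(x))) using the max-shift trick."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    # log(sum(exp(x))) = m + log(sum(exp(x - m))); the shift avoids overflow
    return m + np.log(np.sum(np.exp(x - m)))

def bounds_hold(x, tol=1e-12):
    """Check max(x) <= log_sum_exp(x) <= max(x) + log(n), up to rounding."""
    x = np.asarray(x, dtype=float)
    f = log_sum_exp(x)
    m = np.max(x)
    return bool(m - tol <= f <= m + np.log(x.size) + tol)

# Random test points (the sampling scheme is an arbitrary choice)
rng = np.random.default_rng(0)
samples = [rng.normal(scale=10.0, size=5) for _ in range(1000)]
print(all(bounds_hold(x) for x in samples))
```

The upper bound is attained when all components are equal, and the lower bound is approached when one component dominates the rest.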
$$\frac{\partial f}{\partial x_i}=\frac{e^{x_i}}{e^{x_1}+\cdots+e^{x_n}},\qquad H=\big[\,H_{ij}\,\big]$$
When $i\neq j$,
$$\frac{\partial^2 f}{\partial x_i\,\partial x_j}=\frac{-e^{x_i}e^{x_j}}{(e^{x_1}+\cdots+e^{x_n})^2}=\frac{-e^{x_i}e^{x_j}}{(\mathbf{1}^Tz)^2}$$
When $i=j$,
$$\frac{\partial^2 f}{\partial x_i^2}=\frac{-e^{x_i}e^{x_i}+e^{x_i}(e^{x_1}+\cdots+e^{x_n})}{(e^{x_1}+\cdots+e^{x_n})^2}=\frac{-e^{x_i}e^{x_i}+e^{x_i}\,\mathbf{1}^Tz}{(\mathbf{1}^Tz)^2}$$
where $z=[e^{x_1},\cdots,e^{x_n}]^T$ and $\mathbf{1}$ is the all-ones vector, so that $\mathbf{1}^Tz=e^{x_1}+\cdots+e^{x_n}$.
$$H=\underbrace{\frac{1}{(\mathbf{1}^Tz)^2}}_{>0}\;\underbrace{\Big((\mathbf{1}^Tz)\operatorname{diag}\{z\}-zz^T\Big)}_{K\in\mathbb{R}^{n\times n}}$$
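To double-check the closed form of $H$, it can be compared against central finite differences of the gradient. This is only a sketch assuming NumPy; the function names, the test point `x`, and the step size `h` are illustrative choices, not part of the notes.

```python
import numpy as np

def lse_gradient(x):
    """Gradient of log-sum-exp: z / (1^T z) with z_i = exp(x_i)."""
    z = np.exp(x - np.max(x))  # shifting by max(x) leaves z / sum(z) unchanged
    return z / np.sum(z)

def lse_hessian(x):
    """Closed form ((1^T z) diag(z) - z z^T) / (1^T z)^2."""
    z = np.exp(x - np.max(x))  # the common scale factor cancels in the ratio
    s = np.sum(z)
    return (s * np.diag(z) - np.outer(z, z)) / s**2

def numerical_hessian(x, h=1e-5):
    """Central differences of the gradient, one coordinate at a time."""
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (lse_gradient(x + e) - lse_gradient(x - e)) / (2 * h)
    return H

x = np.array([0.5, -1.0, 2.0, 0.0])
# Largest entrywise gap between the formula and the finite-difference estimate
print(np.max(np.abs(lse_hessian(x) - numerical_hessian(x))))
```

Each row of $H$ sums to zero, since $(\mathbf{1}^Tz)z_i-z_i(\mathbf{1}^Tz)=0$; this makes a handy extra check.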
Since the scalar factor is positive, it suffices to show that $K$ is positive semidefinite:
$$\forall v\in\mathbb{R}^n,\qquad v^TKv\ge0$$
$$\begin{aligned}v^TKv&=(\mathbf{1}^Tz)\,v^T\operatorname{diag}\{z\}\,v-v^Tzz^Tv\\&=\underbrace{\Big(\sum_iz_i\Big)}_{b^Tb}\underbrace{\Big(\sum_iv_i^2z_i\Big)}_{a^Ta}-\underbrace{\Big(\sum_iv_iz_i\Big)^2}_{(a^Tb)^2}\end{aligned}$$
$$a_i=v_i\sqrt{z_i},\qquad b_i=\sqrt{z_i}$$
$$v^TKv=(b^Tb)(a^Ta)-(a^Tb)^2\ge0$$
by the Cauchy-Schwarz inequality. Hence $H$ is positive semidefinite
$\Rightarrow$ log-sum-exp is a convex function.
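The Cauchy-Schwarz step can also be checked numerically: the quadratic form computed directly from the matrix $(\mathbf{1}^Tz)\operatorname{diag}\{z\}-zz^T$ should agree with $(b^Tb)(a^Ta)-(a^Tb)^2$ and never be negative. The sketch below assumes NumPy; `quadratic_form_two_ways` is a hypothetical helper written only for this check.

```python
import numpy as np

def quadratic_form_two_ways(x, v):
    """Return v^T K v computed directly and via the Cauchy-Schwarz form."""
    z = np.exp(x)
    K = np.sum(z) * np.diag(z) - np.outer(z, z)
    direct = v @ K @ v
    a = v * np.sqrt(z)   # a_i = v_i * sqrt(z_i)
    b = np.sqrt(z)       # b_i = sqrt(z_i)
    via_cs = (b @ b) * (a @ a) - (a @ b) ** 2
    return direct, via_cs

rng = np.random.default_rng(1)
max_gap = 0.0        # largest disagreement between the two computations
smallest = np.inf    # smallest value of v^T K v seen
for _ in range(1000):
    x = rng.normal(size=6)
    v = rng.normal(size=6)
    direct, via_cs = quadratic_form_two_ways(x, v)
    max_gap = max(max_gap, abs(direct - via_cs))
    smallest = min(smallest, via_cs)
print(max_gap, smallest)
```

Equality in Cauchy-Schwarz holds when $a$ is parallel to $b$, that is, when all $v_i$ are equal, so the quadratic form vanishes along the all-ones direction and log-sum-exp is convex but not strictly convex.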
II. Geometric mean
$$f(x)=(x_1\cdot\ldots\cdot x_n)^{\frac1n},\qquad x\in\mathbb{R}^n_{++}$$
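is a concave function. Each component is restricted to be strictly positive (the domain is $\mathbb{R}^n_{++}$), mainly to avoid having to deal with complex values.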
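The notes state concavity without proof. A numerical spot check of the defining inequality $f(\theta x+(1-\theta)y)\ge\theta f(x)+(1-\theta)f(y)$ is sketched below, assuming NumPy; it is not a proof, and the helper names and sampling ranges are arbitrary choices for illustration.

```python
import numpy as np

def geometric_mean(x):
    """(x_1 * ... * x_n)^(1/n) for strictly positive x, computed via logs."""
    x = np.asarray(x, dtype=float)
    return float(np.exp(np.mean(np.log(x))))

def concavity_gap(x, y, theta):
    """f(theta*x + (1-theta)*y) - (theta*f(x) + (1-theta)*f(y)).

    This is nonnegative for every x, y, theta in [0, 1] when f is concave.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mix = theta * x + (1 - theta) * y
    chord = theta * geometric_mean(x) + (1 - theta) * geometric_mean(y)
    return geometric_mean(mix) - chord

rng = np.random.default_rng(2)
gaps = [
    concavity_gap(rng.uniform(0.1, 10.0, size=4),
                  rng.uniform(0.1, 10.0, size=4),
                  rng.uniform())
    for _ in range(1000)
]
print(min(gaps) >= -1e-12)
```

For example, with $x=(1,4)$, $y=(4,1)$ and $\theta=\frac12$ the midpoint is $(2.5,2.5)$ with geometric mean $2.5$, while both endpoints have geometric mean $2$, so the gap is $0.5$.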
III. Log-determinant of a symmetric positive definite matrix
$$f(x)=\log\det(x),\qquad \operatorname{dom}f=S_{++}^n$$
When $n=1$, $f(x)=\log x$ is a concave function;
When $n>1$, restrict $f$ to an arbitrary line through the domain: take any $z\in S_{++}^n$, $t\in\mathbb{R}$ and $v\in\mathbb{R}^{n\times n}$ such that
$$z+tv\in S_{++}^n=\operatorname{dom}f,$$
which forces $v\in S^n$ (the symmetric $n\times n$ matrices).
$$\begin{aligned}g(t)&=f(z+tv)=\log\det(z+tv)\\&=\log\det\Big\{z^{\frac12}\big(I+tz^{-\frac12}vz^{-\frac12}\big)z^{\frac12}\Big\}\\&=\log\det\{z\}+\log\det\Big\{I+t\underbrace{z^{-\frac12}vz^{-\frac12}}_{\text{eigenvalues }\lambda_i}\Big\}\\&=\log\det\{z\}+\sum_{i=1}^n\log(1+t\lambda_i)\end{aligned}$$
The last step uses the eigendecomposition of the symmetric matrix $z^{-\frac12}vz^{-\frac12}$. Let $z^{-\frac12}vz^{-\frac12}=Q\Lambda Q^T$ with $QQ^T=I$. Then
$$\begin{aligned}\det\big(I+tz^{-\frac12}vz^{-\frac12}\big)&=\det(QQ^T+tQ\Lambda Q^T)\\&=\det(Q)\det(I+t\Lambda)\det(Q^T)\\&=\det(\underbrace{QQ^T}_{I_n})\,\det(\underbrace{I+t\Lambda}_{\text{diagonal entries }1+t\lambda_i})=\prod_{i=1}^n(1+t\lambda_i)\end{aligned}$$
$$g'(t)=\sum_i\frac{\lambda_i}{1+t\lambda_i},\qquad g''(t)=\sum_i\frac{-\lambda_i^2}{(1+t\lambda_i)^2}\le0$$
Hence $g$ is concave along every such line, and therefore $f$ is a concave function.
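The restriction-to-a-line argument can be sanity-checked numerically: compute the eigenvalues $\lambda_i$ of $z^{-\frac12}vz^{-\frac12}$, evaluate the formula for $g''(t)$, and compare it with a second-order finite difference of $g$. The sketch below assumes NumPy; the function names, the random matrices, and the values of `t` and `h` are illustrative assumptions.

```python
import numpy as np

def line_eigenvalues(z, v):
    """Eigenvalues of z^{-1/2} v z^{-1/2} for z positive definite, v symmetric."""
    w, U = np.linalg.eigh(z)
    z_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    return np.linalg.eigvalsh(z_inv_sqrt @ v @ z_inv_sqrt)

def g(z, v, t):
    """g(t) = log det(z + t v), defined only while z + t v is positive definite."""
    eig = np.linalg.eigvalsh(z + t * v)
    if np.min(eig) <= 0:
        raise ValueError("z + t v is outside the domain (not positive definite)")
    return float(np.sum(np.log(eig)))

def g_second_derivative(z, v, t):
    """Closed form g''(t) = -sum_i lambda_i^2 / (1 + t lambda_i)^2."""
    lam = line_eigenvalues(z, v)
    return float(-np.sum(lam**2 / (1 + t * lam) ** 2))

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
z = A @ A.T + 4 * np.eye(4)   # positive definite base point
B = rng.normal(size=(4, 4))
v = (B + B.T) / 2             # symmetric direction
t, h = 0.1, 1e-4
finite_diff = (g(z, v, t + h) - 2 * g(z, v, t) + g(z, v, t - h)) / h**2
print(finite_diff, g_second_derivative(z, v, t))
```

The two printed numbers should agree to several digits and be nonpositive, matching $g''(t)\le0$.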
Next chapter: 中科大-凸优化 笔记(lec14)-保凸运算 (USTC Convex Optimization notes, lec14: operations that preserve convexity)