Norms
- A norm on $R^n$ is a function $p(x),\ x\in R^n$, that satisfies:
- $p(ax)=|a|\,p(x)$
- $p(x+y)\leq p(x)+p(y)$
- $p(x)=0 \Leftrightarrow x=0$
- Proof that every norm is a convex function:
$\forall x,y \in R^n,\ \forall\, 0 \leq \theta \leq 1$
$p(\theta x +(1-\theta)y) \leq p(\theta x)+p((1-\theta)y) = \theta p(x)+(1-\theta)p(y)$
The inequality is the triangle inequality; the equality is homogeneity, using $\theta \geq 0$ and $1-\theta \geq 0$.
- The zero "norm" is not a norm, and it is not a convex function either:
$\|x\|_0=$ number of nonzero entries of $x$
Consider $x\in R$ (the argument extends to $R^n$):
$\|x\|_0=\begin{cases} 1, & x \neq 0 \\ 0, & x=0 \end{cases}$
Homogeneity fails because $\|2x\|_0=\|x\|_0 \neq 2\|x\|_0$ for $x \neq 0$. Convexity fails because $\|\tfrac{1}{2}\cdot 0+\tfrac{1}{2}\cdot 1\|_0=1 > \tfrac{1}{2}\|0\|_0+\tfrac{1}{2}\|1\|_0=\tfrac{1}{2}$.
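The convexity argument for norms, and the counterexample for the zero "norm", can be spot-checked numerically. The sketch below is only an illustration under assumptions: the helper names (`convexity_gap`, `smallest_gap`) are made up for this example, and random sampling can expose a violation but never proves convexity.

```python
import math
import random

def norm_l1(x):
    return sum(abs(v) for v in x)

def norm_l2(x):
    return math.sqrt(sum(v * v for v in x))

def norm_linf(x):
    return max(abs(v) for v in x)

def norm_l0(x):
    # Number of nonzero entries; called a "norm" only by convention.
    return sum(1 for v in x if v != 0)

def convexity_gap(p, x, y, theta):
    """theta*p(x) + (1-theta)*p(y) - p(theta*x + (1-theta)*y).

    A negative value means the convexity inequality fails at this point.
    """
    mix = [theta * a + (1 - theta) * b for a, b in zip(x, y)]
    return theta * p(x) + (1 - theta) * p(y) - p(mix)

def smallest_gap(p, n=3, trials=1000, seed=0):
    """Smallest gap seen over random samples (a spot check, not a proof)."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(n)]
        y = [rng.uniform(-5, 5) for _ in range(n)]
        worst = min(worst, convexity_gap(p, x, y, rng.random()))
    return worst

if __name__ == "__main__":
    for p in (norm_l1, norm_l2, norm_linf):
        # Allow a tiny negative tolerance for floating-point rounding.
        print(p.__name__, smallest_gap(p) >= -1e-9)
    # The counterexample from the text: x = 0, y = 1, theta = 1/2.
    print("l0 gap:", convexity_gap(norm_l0, [0.0], [1.0], 0.5))
```

The true norms never produce a meaningfully negative gap, while the zero "norm" gives a gap of $-\tfrac{1}{2}$ at the point used in the text.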
Max function
- $f(x)=\max\{x_1,\cdots,x_n\}$
$\rightarrow \forall x,y\in R^n,\ \forall\, 0 \leq \theta \leq 1$
$f(\theta x+(1-\theta)y)=\max_i\{\theta x_i+(1-\theta)y_i\}$
$\leq \theta \max_i\{x_i\}+(1-\theta)\max_i\{y_i\}$
$=\theta f(x)+(1-\theta)f(y)$
- However, the max function is not differentiable everywhere, so a smooth (analytic) approximation is used: log-sum-exp
$f(x)=\log(e^{x_1}+\cdots+e^{x_n})$
$\rightarrow \max\{x_1,\cdots,x_n\} \leq f(x) \leq \max\{x_1,\cdots,x_n\}+\log(n)$
$\rightarrow$ To prove that this function is convex, use the fourth definition of a convex function (the second-order condition).
$\rightarrow \frac{\partial f}{\partial x_i}=\frac{e^{x_i}}{e^{x_1}+\cdots+e^{x_n}}$, then take the second derivatives:
- $i\neq j$
$\rightarrow \frac{\partial^2f}{\partial x_i \partial x_j}=\frac{-e^{x_i}e^{x_j}}{(e^{x_1}+\cdots+e^{x_n})^2}$
- $i = j$
$\rightarrow \frac{\partial^2f}{\partial x_i^2}=\frac{-e^{x_i}e^{x_i}+e^{x_i}(e^{x_1}+\cdots+e^{x_n})}{(e^{x_1}+\cdots+e^{x_n})^2}$
It then remains to check that the Hessian matrix built from these second derivatives is positive semidefinite. Writing $z_i=e^{x_i}$ and $s=\sum_i z_i$, for any $v \in R^n$: $v^T\nabla^2f(x)v=\frac{s\sum_i z_iv_i^2-(\sum_i z_iv_i)^2}{s^2}\geq 0$ by the Cauchy-Schwarz inequality.
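The bounds on log-sum-exp and the positive semidefiniteness of its Hessian can be checked numerically. This is a minimal sketch: the function names are illustrative, and subtracting the maximum before exponentiating is a standard stabilization trick that the notes themselves do not mention.

```python
import math
import random

def logsumexp(x):
    """log(sum_i exp(x_i)), shifted by max(x) to avoid overflow."""
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

def hessian(x):
    """Hessian of log-sum-exp: H[i][j] = p_i*[i == j] - p_i*p_j,
    where p_i = exp(x_i) / sum_k exp(x_k)."""
    m = max(x)
    z = [math.exp(v - m) for v in x]  # the shift cancels in the ratio
    s = sum(z)
    p = [v / s for v in z]
    n = len(x)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]

def quad_form(H, v):
    """v^T H v for a square matrix H given as nested lists."""
    n = len(v)
    return sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-3, 3) for _ in range(4)]
    f = logsumexp(x)
    print(max(x) <= f <= max(x) + math.log(len(x)))  # the bounds from the text
    H = hessian(x)
    worst = min(quad_form(H, [rng.uniform(-1, 1) for _ in x]) for _ in range(1000))
    print(worst >= -1e-12)  # no direction with negative curvature was found
```

The entries of `hessian` match the two cases derived above: $-p_ip_j$ for $i \neq j$ and $p_i-p_i^2$ for $i=j$.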
Geometric mean
- $f(x)=(x_1 \times \cdots\times x_n)^{\frac{1}{n}},\ x \in R_{++}^n$ (this function is concave on $R_{++}^n$)
Log-determinant
- $f(x)=\log(\det(x)),\ \mathrm{dom}\,f = S_{++}^n$; the domain is the set of symmetric positive definite matrices
$\rightarrow$ When $n>1$, restrict $f$ to a line: $\forall z \in S_{++}^n,\ t \in R,\ \forall v\in R^{n \times n}$, where $v$ must be symmetric
$\rightarrow z+tv \in S_{++}^n$
$\rightarrow g(t)=f(z+tv)$
$\rightarrow =\log(\det(z+tv))$
$\rightarrow =\log(\det(z^{\frac{1}{2}}(I+t z^{-\frac{1}{2}}v z^{-\frac{1}{2}}) z^{\frac{1}{2}}))$
$\rightarrow =\log(\det(z))+\log(\det(I+tz^{-\frac{1}{2}}vz^{-\frac{1}{2}}))$
$\rightarrow =\log(\det(z))+\sum_{i=1}^n\log(1+t\lambda_i)$ (here $\lambda_i$ are the eigenvalues of $z^{-\frac{1}{2}}vz^{-\frac{1}{2}}$; this step uses the fact that a determinant equals the product of the eigenvalues)
$\rightarrow g^{'}(t)=\sum_i\frac{\lambda_i}{1+t\lambda_i}$
$\rightarrow g^{''}(t)=-\sum_i\frac{\lambda_i^2}{(1+t\lambda_i)^2}\leq 0$
$\rightarrow$ $g$ is concave along every such line, so $f$ is concave.
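The line restriction can be verified on a small case. The sketch below assumes $2 \times 2$ matrices so that determinants and eigenvalues have closed forms and no linear-algebra library is needed. The matrices `Z` and `V` are arbitrary illustrative values, and the eigenvalues are computed from $z^{-1}v$, which is similar to $z^{-\frac{1}{2}}vz^{-\frac{1}{2}}$ and therefore has the same eigenvalues.

```python
import math

def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def logdet_line(z, v, t):
    """g(t) = log det(z + t*v), evaluated directly."""
    m = [[z[i][j] + t * v[i][j] for j in range(2)] for i in range(2)]
    return math.log(det2(m))

def eigs_zinv_v(z, v):
    """Eigenvalues of z^{-1} v via the trace/determinant formula for 2x2 matrices."""
    dz = det2(z)
    (a, b), (c, d) = z
    zinv = [[d / dz, -b / dz], [-c / dz, a / dz]]
    m = [[sum(zinv[i][k] * v[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr = m[0][0] + m[1][1]
    # The eigenvalues are real, so the discriminant is >= 0 up to rounding.
    disc = math.sqrt(max(tr * tr - 4 * det2(m), 0.0))
    return ((tr - disc) / 2, (tr + disc) / 2)

def g_via_eigs(z, v, t):
    """g(t) = log det(z) + sum_i log(1 + t*lambda_i)."""
    return math.log(det2(z)) + sum(math.log(1 + t * lam) for lam in eigs_zinv_v(z, v))

def g_second(z, v, t):
    """g''(t) = -sum_i lambda_i^2 / (1 + t*lambda_i)^2."""
    return -sum(lam ** 2 / (1 + t * lam) ** 2 for lam in eigs_zinv_v(z, v))

if __name__ == "__main__":
    Z = [[2.0, 1.0], [1.0, 3.0]]    # symmetric positive definite (illustrative)
    V = [[1.0, -2.0], [-2.0, 0.5]]  # symmetric direction (illustrative)
    t = 0.1                         # small enough that Z + t*V stays positive definite
    print(abs(logdet_line(Z, V, t) - g_via_eigs(Z, V, t)) < 1e-9)
    print(g_second(Z, V, t) <= 0)
```

Both expressions for $g(t)$ agree on this example, and $g''(t)$ is nonpositive, as the derivation predicts.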
Operations that preserve convexity
1. Nonnegative weighted sum
- If every $f_i$ is convex and $w_i \geq 0$, then $g=\sum w_if_i$ is also convex (easy to prove).
- Suppose $f(x,y)$ is convex in $x$ for every $y\in A$; $f(x,y)$ itself need not be jointly convex in $(x,y)$.
$\rightarrow w(y) \geq 0,\ \forall y \in A$
$\rightarrow g(x)=\int_{y\in A}w(y)f(x,y)dy$ is a convex function.
2. Affine mapping
- $f:R^n \rightarrow R,\quad A \in R^{n \times m},\ b\in R^n$
$g(x)=f(Ax+b),\quad Ax+b \in \mathrm{dom}\,f$; if $f$ is convex, then $g$ is convex (easy to prove).
- $f_i:R^n \rightarrow R,\ i=1,\cdots,m$ are convex, $A \in R^m,\ b \in R$
$g(x)=A^T[f_1(x),\cdots,f_m(x)]^T+b$ is not necessarily convex: a weighted combination of convex functions is guaranteed to be convex only when the weights are nonnegative.
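The role of the sign of the weights shows up on a one-dimensional example. In the sketch below, the functions $f_1(x)=x^2$ and $f_2(x)=e^x$, the weight vectors and the offset $b=3$ are arbitrary illustrative choices, and `combine` and `midpoint_violation` are hypothetical helper names.

```python
import math

def combine(weights, funcs, b=0.0):
    """Return g(x) = sum_i weights[i] * funcs[i](x) + b."""
    def g(x):
        return sum(w * f(x) for w, f in zip(weights, funcs)) + b
    return g

def midpoint_violation(g, x, y):
    """g((x+y)/2) - (g(x) + g(y))/2; a positive value disproves convexity."""
    return g((x + y) / 2) - (g(x) + g(y)) / 2

funcs = [lambda x: x * x, math.exp]        # both convex on R
g_pos = combine([1.0, 1.0], funcs, b=3.0)  # nonnegative weights
g_neg = combine([1.0, -1.0], funcs, b=3.0) # one negative weight

if __name__ == "__main__":
    grid = [i / 4 for i in range(-12, 13)]
    worst_pos = max(midpoint_violation(g_pos, x, y) for x in grid for y in grid)
    print(worst_pos <= 1e-12)                      # no violation found on the grid
    print(midpoint_violation(g_neg, 1.0, 3.0) > 0) # x^2 - e^x + 3 is not convex
```

With the weights $(1,-1)$ the combination $x^2-e^x+3$ has second derivative $2-e^x$, which is negative for $x>\log 2$, so the midpoint test fails between $x=1$ and $x=3$.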
3. Pointwise maximum of two functions
- If $f_1,f_2$ are convex, then $f(x)=\max\{f_1(x),f_2(x)\}$ is also convex, with $\mathrm{dom}\,f=\mathrm{dom}\,f_1 \cap \mathrm{dom}\,f_2$
- Sum of the $r$ largest entries of a vector $x \in R^n$:
$x[i]$ is the $i$-th largest entry
$f(x)=\sum_{i=1}^{r}x[i]$
$\rightarrow f(x)=\max\{x_{i_1}+\cdots+x_{i_r} \mid 1 \leq i_1<\cdots<i_r \leq n\}$; each term inside the max is a linear (hence affine) function of $x$
$\rightarrow f_k(x)=A_kx$, where the row vector $A_k$ selects one index set $\{i_1,\cdots,i_r\}$
$\rightarrow f(x)=\max\{f_k(x) \mid k=1,\cdots,C_n^r\}$
- If $f(x,y)$ is convex in $x$ for every $y \in A$, then $g(x)=\sup_{y \in A}f(x,y)$ is convex: the pointwise maximum (supremum) of infinitely many convex functions is also convex.
- Example: the largest eigenvalue of a real symmetric matrix
$\rightarrow f(x)=\lambda_{max}(x),\ \mathrm{dom}\,f=S^{m \times m}$
$\rightarrow xy=\lambda y$ (with $y \neq 0$ an eigenvector of $x$)
$\rightarrow y^Txy=y^T\lambda y$
$\rightarrow y^Txy=\lambda \|y\|^2$
$\rightarrow \lambda=\frac{y^Txy}{\|y\|^2}$
Assume $\|y\|^2=1$
$\rightarrow \lambda_{max}(x)=\sup\{y^Txy \mid \|y\|^2=1\}$
$\rightarrow$ This expression is convex in $x$: for each fixed $y$, $y^Txy$ is linear in $x$ and therefore convex, and taking the $\sup$ over $y$ preserves convexity.
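Both examples in this part, the sum of the $r$ largest entries and the largest eigenvalue, can be spot-checked with a short script. This is a sketch under assumptions: the eigenvalue check is restricted to $2 \times 2$ symmetric matrices $[[a,b],[b,c]]$ so that a closed form can be used, all helper names are illustrative, and random midpoint tests can only fail to find a violation; they do not prove convexity.

```python
import itertools
import math
import random

def sum_r_largest(x, r):
    """Sum of the r largest entries, computed by sorting."""
    return sum(sorted(x, reverse=True)[:r])

def sum_r_largest_as_max(x, r):
    """Same value, written as a max over all C(n, r) index sets of a linear function."""
    return max(sum(x[i] for i in idx)
               for idx in itertools.combinations(range(len(x)), r))

def lambda_max_2x2(m):
    """Largest eigenvalue of [[a, b], [b, c]], with m = (a, b, c)."""
    a, b, c = m
    return (a + c) / 2 + math.hypot((a - c) / 2, b)

def midpoint_gap(f, p, q):
    """(f(p) + f(q))/2 - f((p+q)/2); nonnegative whenever f is convex."""
    mid = [(u + v) / 2 for u, v in zip(p, q)]
    return (f(p) + f(q)) / 2 - f(mid)

def smallest_midpoint_gap(f, dim, trials=1000, seed=0):
    """Smallest midpoint gap over random pairs of points in [-3, 3]^dim."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        p = [rng.uniform(-3, 3) for _ in range(dim)]
        q = [rng.uniform(-3, 3) for _ in range(dim)]
        worst = min(worst, midpoint_gap(f, p, q))
    return worst

if __name__ == "__main__":
    x = [3, -1, 4, 1, 5]
    print(sum_r_largest(x, 2) == sum_r_largest_as_max(x, 2))
    print(smallest_midpoint_gap(lambda v: sum_r_largest(v, 2), 5) >= -1e-9)
    print(smallest_midpoint_gap(lambda_max_2x2, 3) >= -1e-9)
```

The entries `(a, b, c)` parametrize the symmetric matrix linearly, so averaging the parameter vectors is the same as averaging the matrices.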
4. Composition of functions
- $h:R^k \rightarrow R,\quad g:R^n \rightarrow R^k$
$f=h \circ g=h(g(x)):R^n \rightarrow R,\quad \mathrm{dom}\,f=\{x \in \mathrm{dom}\,g \mid g(x)\in \mathrm{dom}\,h\}$
- One-dimensional case: $k=n=1$, with everything defined on the real line
- $\mathrm{dom}\,g=\mathrm{dom}\,h=\mathrm{dom}\,f=R$
- $h,g$ are both twice differentiable
- $f^{''}(x)=h^{''}(g(x))g^{'}(x)^2+h^{'}(g(x))g^{''}(x)$, and $f$ is convex when $f^{''}(x) \geq 0$ for all $x$
- $h$ convex and nondecreasing, $g$ convex $\Rightarrow$ $f$ convex
- $h$ convex and nonincreasing, $g$ concave $\Rightarrow$ $f$ convex
- $h$ concave and nondecreasing, $g$ concave $\Rightarrow$ $f$ concave
- $h$ concave and nonincreasing, $g$ convex $\Rightarrow$ $f$ concave
In practice, these conditions often cannot all be satisfied.
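The chain-rule formula for $f^{''}$ and the need for monotonicity of $h$ can be illustrated numerically. The sketch below makes its own choices: $h(t)=e^t$ with $g(x)=x^2$ as an instance of the first rule, and $h(t)=t^2$ with $g(x)=x^2-1$ as a case where $h$ is convex but not monotone. The helper names are hypothetical.

```python
import math

def f_second(h1, h2, g, g1, g2, x):
    """Chain rule: f''(x) = h''(g(x)) * g'(x)^2 + h'(g(x)) * g''(x).

    h1, h2 are h' and h''; g1, g2 are g' and g''.
    """
    return h2(g(x)) * g1(x) ** 2 + h1(g(x)) * g2(x)

def finite_diff_second(f, x, eps=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2

# Rule 1: h(t) = exp(t) is convex and nondecreasing, g(x) = x^2 is convex.
def g_sq(x): return x * x
def g_sq1(x): return 2 * x
def g_sq2(x): return 2.0

# Caution: h(t) = t^2 is convex but not monotone, g(x) = x^2 - 1 is convex.
def g_shift(x): return x * x - 1
def h_sq1(t): return 2 * t
def h_sq2(t): return 2.0

if __name__ == "__main__":
    x = 0.5
    exact = f_second(math.exp, math.exp, g_sq, g_sq1, g_sq2, x)
    approx = finite_diff_second(lambda u: math.exp(g_sq(u)), x)
    print(abs(exact - approx) < 1e-4, exact > 0)
    # f(x) = (x^2 - 1)^2 has negative curvature at x = 0, so it is not convex.
    print(f_second(h_sq1, h_sq2, g_shift, g_sq1, g_sq2, 0.0))
```

In the second case $f^{''}(0)=h^{''}(-1)\cdot 0+h^{'}(-1)\cdot 2=-4$: the term $h^{'}(g(x))g^{''}(x)$ turns negative because $h$ is decreasing at $g(0)=-1$, which is exactly what the monotonicity condition rules out.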
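- General case: $n,k \geq 1$
- $h,g$ need not be twice differentiable
- $h$ is replaced by its extended-value extension $\hat{h}$ ($\hat{h}=h$ on $\mathrm{dom}\,h$; outside $\mathrm{dom}\,h$ it is $+\infty$ when $h$ is convex and $-\infty$ when $h$ is concave)
- $h$ convex, $\hat{h}$ nondecreasing, $g$ convex $\Rightarrow$ $f$ convex
- $h$ convex, $\hat{h}$ nonincreasing, $g$ concave $\Rightarrow$ $f$ convex
- $h$ concave, $\hat{h}$ nondecreasing, $g$ concave $\Rightarrow$ $f$ concave
- $h$ concave, $\hat{h}$ nonincreasing, $g$ convex $\Rightarrow$ $f$ concave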
- Proof of the first rule: $h$ convex, $\hat{h}$ nondecreasing, $g$ convex $\Rightarrow$ $f$ convex
$\rightarrow \forall x,y\in \mathrm{dom}\,f,\quad 0 \leq \theta \leq 1$; $g$ is convex, and by the definition of $\mathrm{dom}\,f$ we have $x,y \in \mathrm{dom}\,g$ and $g(x),g(y) \in \mathrm{dom}\,h$
$\rightarrow$ $h$ is convex, so $\mathrm{dom}\,h$ is convex, and $g(x),g(y) \in \mathrm{dom}\,h$ gives $\theta g(x)+(1-\theta)g(y) \in \mathrm{dom}\,h$
$\rightarrow g(\theta x+(1-\theta)y) \leq \theta g(x)+(1-\theta)g(y)$
$\rightarrow f(\theta x+(1-\theta)y)=h(g(\theta x+(1-\theta)y))$
$\rightarrow$ Subproblem to prove: $g(\theta x+(1-\theta)y) \in \mathrm{dom}\,h$
$\rightarrow$ Suppose $g(\theta x+(1-\theta)y) \notin \mathrm{dom}\,h$. Because the extension $\hat{h}$ of $h$ is nondecreasing,
$\rightarrow \hat{h}(g(\theta x+(1-\theta)y)) \leq \hat{h}(\theta g(x)+(1-\theta)g(y))$
$\rightarrow$ If $g(\theta x+(1-\theta)y)$ is not in $\mathrm{dom}\,h$, the left-hand side is $+\infty$, so the inequality can hold only if the right-hand side is also $+\infty$. That contradicts $\theta g(x)+(1-\theta)g(y) \in \mathrm{dom}\,h$, where $\hat{h}$ is finite.
$\rightarrow$ Hence $g(\theta x+(1-\theta)y) \in \mathrm{dom}\,h$
$\rightarrow f(\theta x+(1-\theta)y)=h(g(\theta x+(1-\theta)y)) \leq h(\theta g(x)+(1-\theta)g(y))$
$\leq \theta h(g(x))+(1-\theta)h(g(y))$
$=\theta f(x)+(1-\theta)f(y)$
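A concrete instance of the first rule with a restricted domain may help. The example below is an assumption chosen for illustration and is not taken from the notes: $h(t)=-\log(-t)$ with $\mathrm{dom}\,h=\{t<0\}$ is convex, its extended-value extension (equal to $+\infty$ for $t \geq 0$) is nondecreasing, and $g(x)=x^2-1$ is convex, so $f(x)=-\log(1-x^2)$ should be convex on $(-1,1)$.

```python
import math
import random

def h_ext(t):
    """Extended-value extension of h(t) = -log(-t): +inf outside dom h = {t < 0}."""
    return -math.log(-t) if t < 0 else math.inf

def g(x):
    return x * x - 1.0

def f(x):
    """f = h_ext(g(x)); finite exactly on dom f = (-1, 1)."""
    return h_ext(g(x))

def convexity_gap(func, x, y, theta):
    """theta*func(x) + (1-theta)*func(y) - func(theta*x + (1-theta)*y)."""
    return theta * func(x) + (1 - theta) * func(y) - func(theta * x + (1 - theta) * y)

def smallest_gap(func, lo, hi, trials=2000, seed=0):
    """Smallest gap over random points inside [lo, hi] (a spot check, not a proof)."""
    rng = random.Random(seed)
    return min(convexity_gap(func, rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random())
               for _ in range(trials))

if __name__ == "__main__":
    print(h_ext(-2.0) <= h_ext(-1.0) <= h_ext(0.5))  # h_ext is nondecreasing here
    print(f(2.0) == math.inf)                        # x = 2 lies outside dom f
    print(smallest_gap(f, -0.99, 0.99) >= -1e-9)     # no violation found in dom f
```

Sampling is restricted to points inside $\mathrm{dom}\,f$ so that every value is finite and the arithmetic never mixes $0$ with $+\infty$.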