Contents
- Notation
- [Jensen’s inequality]
- $f(\theta x + (1-\theta)y) \le \theta f(x)+(1-\theta)f(y)$
- $f(\theta_1 x_1 + \ldots + \theta_k x_k) \le \theta_1 f(x_1)+\ldots+ \theta_k f(x_k)$
- $f(\int_S p(x)x \, \mathrm{d}x) \le \int_S f(x)p(x) \, \mathrm{d}x$
- $f(\mathrm{E}x) \le \mathrm{E}(f(x))$
- [Young’s inequality] $ab \le \frac{a^p}{p} + \frac{b^q}{q}$
- [Hölder’s inequality] $\|xy\|_1 \le \|x\|_p \|y\|_q$
- [trace-nuclear] $\mathrm{Tr}(A^TB) \le \|A\|\|B\|_*$
- [Arithmetic-geometric mean inequality] $a^{\theta}b^{1-\theta} \le \theta a +(1-\theta)b$
- [Gibbs’ inequality] $-\sum_{i=1}^n p_i \log p_i \le -\sum_{i=1}^n p_i\log q_i$
- [Gronwall’s inequality] $u(t) \le f(t)e^{\int_0^t h(s)\mathrm{d}s}$
- [$C_p$ inequality] $(|a|+|b|)^p \le C_p(|a|^p+|b|^p)$
Notation
For a matrix $A \in \mathbb{R}^{m \times n}$:
- $\|A\|$: the spectral norm of $A$;
- $\|A\|_*$: the nuclear norm of $A$;
- $\|A\|_F$: the Frobenius norm of $A$;
- $\mathrm{rank}(\cdot)$: the rank of a matrix.
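All three norms can be read off the singular values. The sketch below assumes numpy is available, and `matrix_norms` is a hypothetical helper name. It shows that the spectral norm is the largest singular value, the nuclear norm is their sum, and the Frobenius norm is the square root of their sum of squares. It then cross-checks these against `numpy.linalg.norm` with `ord=2`, `"nuc"` and `"fro"`.

```python
import numpy as np

def matrix_norms(A: np.ndarray) -> dict:
    """Spectral, nuclear and Frobenius norms of A computed from its singular values."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values in descending order
    return {
        "spectral": float(s[0]),                    # ||A||   = largest singular value
        "nuclear": float(s.sum()),                  # ||A||_* = sum of singular values
        "frobenius": float(np.sqrt((s**2).sum())),  # ||A||_F = sqrt(sum of squared singular values)
        "rank": int(np.linalg.matrix_rank(A)),      # number of nonzero singular values (numpy tolerance)
    }

A = np.array([[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]])  # a 3x2 matrix with singular values 4 and 3
norms = matrix_norms(A)

# Cross-check against numpy's built-in matrix norms
assert np.isclose(norms["spectral"], np.linalg.norm(A, 2))
assert np.isclose(norms["nuclear"], np.linalg.norm(A, "nuc"))
assert np.isclose(norms["frobenius"], np.linalg.norm(A, "fro"))
print(norms)
```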
[Jensen’s inequality]
$f(\theta x + (1-\theta)y) \le \theta f(x)+(1-\theta)f(y)$
If $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is convex, $\theta \in [0, 1]$ and $x,y\in \mathrm{dom}\,f$, then
$$f(\theta x + (1-\theta)y) \le \theta f(x)+(1-\theta)f(y).$$
This is in fact just the definition of a convex function. It is the most basic form of Jensen's inequality.
$f(\theta_1 x_1 + \ldots + \theta_k x_k) \le \theta_1 f(x_1)+\ldots+ \theta_k f(x_k)$
If $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is convex, $\theta_i \in [0, 1]$ with $\sum_{i=1}^k \theta_i =1$, and $x_1, \ldots, x_k \in \mathrm{dom}\,f$, then
$$f(\theta_1 x_1 + \ldots + \theta_k x_k) \le \theta_1 f(x_1)+\ldots+ \theta_k f(x_k).$$
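Proof: We use induction on $k$. The case $k = 2$ is the definition above.

If $\theta_1 = 0$, the statement reduces to the case of $k-1$ points. If $\theta_1 = 1$, all other $\theta_i$ vanish and the inequality holds trivially. So assume $\theta_1 \in (0,1)$.

Let $\theta = \theta_1$, $\theta' = 1-\theta$ and $x = x_1$, and define $y$ by $\theta'y = \theta_2 x_2 + \ldots + \theta_k x_k$. By the definition of convexity,
$$f(\theta x + \theta' y) \le \theta f(x) + \theta' f(y).$$
Since $\sum_{i=2}^k \theta_i / \theta'=1$, the point $y$ is a convex combination of $x_2, \ldots, x_k$ with weights $\theta_i/\theta'$. These weights satisfy the same conditions, and $y \in \mathrm{dom}\,f$ because the domain of a convex function is convex. The induction hypothesis therefore gives $f(y) \le \sum_{i=2}^k \frac{\theta_i}{\theta'} f(x_i)$. Substituting this into the inequality above proves the claim.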
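Here is a quick numerical sanity check of the finite form. It is a sketch, not part of the proof. The helper `jensen_gap` is a hypothetical name introduced here. It returns the right-hand side minus the left-hand side for random weights that sum to 1, which should never be negative (up to rounding) for convex functions such as $e^x$ or $x^2$.

```python
import math
import random

def jensen_gap(f, xs, weights):
    """Return sum_i w_i f(x_i) - f(sum_i w_i x_i); it is >= 0 whenever f is convex."""
    assert all(w >= 0 for w in weights) and math.isclose(sum(weights), 1.0)
    mean = sum(w * x for w, x in zip(weights, xs))
    return sum(w * f(x) for w, x in zip(weights, xs)) - f(mean)

random.seed(0)
for _ in range(1000):
    k = random.randint(2, 6)
    xs = [random.uniform(-3, 3) for _ in range(k)]
    raw = [random.random() for _ in range(k)]
    total = sum(raw)
    ws = [r / total for r in raw]  # normalize so that the weights sum to 1
    # exp and t^2 are convex on R; allow a tiny tolerance for floating-point rounding
    assert jensen_gap(math.exp, xs, ws) >= -1e-12
    assert jensen_gap(lambda t: t * t, xs, ws) >= -1e-12
```

For $f(t) = t^2$ the gap is exactly the weighted variance of the points. For example, `jensen_gap(lambda t: t * t, [0.0, 2.0], [0.5, 0.5])` equals `1.0`.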
$f(\int_S p(x)x \, \mathrm{d}x) \le \int_S f(x)p(x) \, \mathrm{d}x$
If $p(x) \ge 0$ on $S \subseteq \mathrm{dom}\,f$ and $\int_S p(x) \, \mathrm{d}x=1$, then, whenever the integrals involved exist,
$$f\Big(\int_S p(x)x \, \mathrm{d}x\Big) \le \int_S f(x)p(x) \, \mathrm{d}x.$$
Attempted proof (note: only an attempt, i.e. a heuristic sketch):
Partition $S$ into small cells with sample points $x_i$, and let $\theta_i = p(x_i) \Delta x_i$, $i=1,2,\ldots,k$, with $\sum_{i=1}^k \theta_i =1$.

Strictly speaking, the Riemann sum $\sum_i \theta_i$ is only approximately 1. Dividing each $\theta_i$ by $\sum_i \theta_i$ fixes this, and the correction factor tends to 1, at least when $p(x)$ is continuous.

The second (finite) form of Jensen's inequality then gives
$$f\Big(\sum_{i=1}^k p(x_i)\Delta x_i \, x_i\Big) \le \sum_{i=1}^k p(x_i)\Delta x_i \, f(x_i).$$
Letting $\max |\Delta x_i| \rightarrow 0$ yields the integral form. This involves interchanging a limit with $f$. Since a convex function is continuous on the interior of its domain, the interchange should be allowed there (I believe this is right).
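The discretization argument above can be mimicked numerically. The sketch below assumes the example density $p(x) = 2x$ on $[0, 1]$ and $f = \exp$; the helper name `riemann_jensen` is hypothetical. It builds midpoint weights $\theta_i = p(x_i)\Delta x_i$, renormalizes them to sum to 1, and evaluates both sides of the finite inequality.

As the grid is refined, the two sides should approach $f(\int_0^1 2x\cdot x\,\mathrm{d}x) = e^{2/3} \approx 1.948$ and $\int_0^1 2x e^x\,\mathrm{d}x = 2$.

```python
import math

def riemann_jensen(f, p, a, b, n):
    """Discretize [a, b] into n cells and return (lhs, rhs) of the finite Jensen
    inequality with weights theta_i = p(x_i) * dx, renormalized to sum to 1."""
    dx = (b - a) / n
    xs = [a + (i + 0.5) * dx for i in range(n)]  # midpoints of the cells
    thetas = [p(x) * dx for x in xs]
    total = sum(thetas)                          # close to 1, but not exactly 1 in general
    thetas = [t / total for t in thetas]
    lhs = f(sum(t * x for t, x in zip(thetas, xs)))
    rhs = sum(t * f(x) for t, x in zip(thetas, xs))
    return lhs, rhs

# Example: density p(x) = 2x on [0, 1] and the convex function f = exp
for n in (10, 100, 1000):
    lhs, rhs = riemann_jensen(math.exp, lambda x: 2 * x, 0.0, 1.0, n)
    print(n, lhs, rhs)
```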
$f(\mathrm{E}x) \le \mathrm{E}(f(x))$
If $x$ is a random variable such that the event $x \in \mathrm{dom}\,f$ has probability 1, $f$ is convex, and the expectations involved exist, then
$$f(\mathrm{E}x) \le \mathrm{E}(f(x)).$$
Proof:
Let $S = \mathrm{dom}\,f$, and let $p(x)$ be the probability density function of $x$ (assuming it has one). Then $\int_S p(x)\,\mathrm{d}x=1$, so the integral form of Jensen's inequality gives
$$f(\mathrm{E}x) \le \mathrm{E}(f(x)).$$
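As a Monte Carlo illustration, consider a sample $x_1, \ldots, x_n$. Its empirical distribution is itself a probability distribution, so the finite form applies exactly: $f$ of the sample mean never exceeds the sample mean of $f$, up to rounding.

The sketch below assumes $x \sim N(0, 1)$ and $f = \exp$. In that case $f(\mathrm{E}x) = 1$ and $\mathrm{E}\,e^x = e^{1/2} \approx 1.649$. The helper name `jensen_check` is hypothetical.

```python
import math
import random

def jensen_check(f, samples):
    """Return (f(sample mean), sample mean of f). The empirical distribution is a
    probability distribution, so Jensen's inequality holds for it exactly."""
    n = len(samples)
    mean_x = sum(samples) / n
    mean_fx = sum(f(x) for x in samples) / n
    return f(mean_x), mean_fx

random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # x ~ N(0, 1)
lhs, rhs = jensen_check(math.exp, xs)
print(lhs, rhs)  # roughly 1 and roughly e^{1/2}
```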
[Young’s inequality] $ab \le \frac{a^p}{p} + \frac{b^q}{q}$
Let $p,q \in (1, +\infty)$ be real numbers satisfying
$$\frac{1}{p} + \frac{1}{q} = 1.$$
If $a, b \ge 0$ are real numbers, then
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}.$$
Proof 1:
For $x \in \mathbb{R}^+$ and $\alpha \in (0, 1)$ we have $x^{\alpha} \le 1 + \alpha (x-1)$. This holds because $x^{\alpha}$ is concave and the right-hand side is its tangent line at the point $(1, 1)$.

Set $x = b/a$ and $\alpha = 1/q$, and multiply both sides by $a$, using $1 - 1/q = 1/p$. This gives
$$a^{1/p}b^{1/q} \le \frac{a}{p} + \frac{b}{q}.$$
Substituting $a:=a^p$, $b:=b^q$ yields the claim. The inequality holds trivially when $a = 0$ or $b = 0$, which completes the proof.
Proof 2:
Consider the curve in the $Oxy$ plane defined by $y=x^{p-1}$. Since $\frac{1}{p-1} = q-1$, it can also be written as $x=y^{\frac{1}{p-1}}=y^{q-1}$. Integrating gives
$$S_1 = \int_0^a y \, \mathrm{d}x = \int_0^a x^{p-1} \, \mathrm{d}x = \frac{a^p}{p}, \qquad S_2 = \int_0^b x \, \mathrm{d}y = \int_0^b y^{q-1} \, \mathrm{d}y = \frac{b^q}{q}.$$
$S_1$ is the area between the curve and the $x$-axis over $[0,a]$, and $S_2$ is the area between the curve and the $y$-axis over $[0,b]$. Together they cover the rectangle $[0,a]\times[0,b]$, so
$$ab \le S_1 + S_2 = \frac{a^p}{p} + \frac{b^q}{q}.$$
Equality holds only when $b^q = a^p$, i.e. when the corner $(a, b)$ lies on the curve. Q.E.D.
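The following is a quick randomized check of Young's inequality together with its equality case $b^q = a^p$ (equivalently $b = a^{p-1}$). It is a sketch: the helper `young_gap` is a hypothetical name, and the conjugate exponent is computed as $q = p/(p-1)$.

```python
import random

def young_gap(a, b, p):
    """Return a^p/p + b^q/q - a*b, where q = p/(p-1) is the conjugate exponent of p."""
    q = p / (p - 1)
    return a**p / p + b**q / q - a * b

random.seed(2)
for _ in range(10_000):
    a, b = random.uniform(0, 5), random.uniform(0, 5)
    p = random.uniform(1.05, 6.0)
    # Never negative, up to a relative rounding tolerance
    assert young_gap(a, b, p) >= -1e-9 * (1 + a * b)

# Equality case: with p = 3 and b = a^(p-1) = 4 we have b^q = a^p = 8
print(young_gap(2.0, 4.0, 3.0))  # zero up to rounding
```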
[Hölder’s inequality] $\|xy\|_1 \le \|x\|_p \|y\|_q$
Discrete form
Let $p, q \in (1, +\infty)$ with $\frac{1}{p}+\frac{1}{q}=1$, and let $x, y \in \mathbb{C}^{n}$, where $\mathbb{C}$ denotes the field of complex numbers and $xy$ is the elementwise product. Then
$$\|xy\|_1 = \sum_{i=1}^n |x_iy_i| \le \Big(\sum_{i=1}^n |x_i|^p\Big)^{\frac{1}{p}}\Big(\sum_{i=1}^n |y_i|^q\Big)^{\frac{1}{q}} = \|x\|_p \|y\|_q.$$
Note that an $m \times n$ matrix can be regarded as an $mn$-dimensional vector.
Proof:
If $x = 0$ or $y = 0$ there is nothing to prove, so assume both are nonzero and let
$$a_k = \frac{|x_k|}{(\sum_{i=1}^{n} |x_i|^p)^{\frac{1}{p}}}, \quad b_k = \frac{|y_k|}{(\sum_{i=1}^{n} |y_i|^q)^{\frac{1}{q}}}.$$
Then $\sum_{k=1}^n a_k^p = 1$ and $\sum_{k=1}^n b_k^q = 1$. Summing Young's inequality $a_kb_k \le \frac{a_k^p}{p} + \frac{b_k^q}{q}$ over $k$ gives
$$\sum_{k=1}^n a_k b_k \le \frac{\sum_{k=1}^{n}a_k^p}{p} + \frac{\sum_{k=1}^{n}b_k^q}{q} = \frac{1}{p} + \frac{1}{q}=1,$$
that is,
$$\frac{\sum_{i=1}^n |x_i||y_i|}{(\sum_{i=1}^n |x_i|^p)^{\frac{1}{p}}(\sum_{i=1}^n |y_i|^q)^{\frac{1}{q}}} \le 1,$$
which proves the claim.
It is also worth mentioning the case $n \rightarrow +\infty$. If both series on the right-hand side converge, the inequality continues to hold in the limit $n \rightarrow +\infty$.
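A randomized check of the discrete form on complex vectors is sketched below. The helpers `lp_norm` and `holder_sides` are hypothetical names, and $q = p/(p-1)$ is computed from $p$.

The final example uses the equality case $|y_i|^q \propto |x_i|^p$. With $p = 3$, $x = (1, 2)$ and $y = (1, 4)$, both sides equal 9.

```python
import random

def lp_norm(v, p):
    """(sum_i |v_i|^p)^(1/p) for a list of real or complex numbers."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def holder_sides(x, y, p):
    """Return (||xy||_1, ||x||_p * ||y||_q) with q = p / (p - 1)."""
    q = p / (p - 1)
    lhs = sum(abs(xi * yi) for xi, yi in zip(x, y))
    return lhs, lp_norm(x, p) * lp_norm(y, q)

random.seed(3)
for _ in range(2000):
    n = random.randint(1, 8)
    x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    y = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    p = random.uniform(1.1, 5.0)
    lhs, rhs = holder_sides(x, y, p)
    assert lhs <= rhs * (1 + 1e-12)  # small relative tolerance for rounding

# Equality case: |y_i| = |x_i|^(p-1)
print(holder_sides([1.0, 2.0], [1.0, 4.0], 3.0))
```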
Integral form
Let $p, q \in (1, +\infty)$ with $\frac{1}{p}+\frac{1}{q}=1$, and let $x(t), y(t)$ be functions of $t\in [t_0, t_1]$ such that
$$\int_{t_0}^{t_1}|x(t)y(t)|\,\mathrm{d}t,\quad \Big[\int_{t_0}^{t_1}|x(t)|^p\,\mathrm{d}t\Big]^{\frac{1}{p}}, \quad \Big[\int_{t_0}^{t_1}|y(t)|^q\,\mathrm{d}t\Big]^{\frac{1}{q}}$$
all exist. Then
$$\int_{t_0}^{t_1}|x(t)y(t)|\,\mathrm{d}t \le \Big[\int_{t_0}^{t_1}|x(t)|^p\,\mathrm{d}t\Big]^{\frac{1}{p}} \Big[\int_{t_0}^{t_1}|y(t)|^q\,\mathrm{d}t\Big]^{\frac{1}{q}}.$$
Proof:
Assume both integrals on the right-hand side are nonzero (if one of them is 0, the left-hand side is 0 as well), and let
$$a = \frac{|x(t)|}{[\int_{t_0}^{t_1}|x(t)|^p\,\mathrm{d}t]^{\frac{1}{p}}}, \quad b = \frac{|y(t)|}{[\int_{t_0}^{t_1}|y(t)|^q\,\mathrm{d}t]^{\frac{1}{q}}}.$$
Then $\int_{t_0}^{t_1}a^p \,\mathrm{d}t=1$ and $\int_{t_0}^{t_1}b^q \,\mathrm{d}t=1$. Applying Young's inequality $ab\le \frac{a^p}{p} + \frac{b^q}{q}$ pointwise and integrating gives
$$\int_{t_0}^{t_1}ab \,\mathrm{d}t \le \frac{1}{p} + \frac{1}{q} = 1,$$
that is,
$$\int_{t_0}^{t_1}|x(t)y(t)|\,\mathrm{d}t \le \Big[\int_{t_0}^{t_1}|x(t)|^p\,\mathrm{d}t\Big]^{\frac{1}{p}} \Big[\int_{t_0}^{t_1}|y(t)|^q\,\mathrm{d}t\Big]^{\frac{1}{q}}.$$
Q.E.D.
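The integral form can be checked numerically with a simple quadrature. The sketch below makes these assumptions:
- example functions $x(t) = \sin t$ and $y(t) = e^t$ on $[0, 2]$, with $p = 3$;
- the midpoint rule for every integral;
- hypothetical helper names `midpoint_integral` and `holder_integral_sides`.

For reference, the left-hand side has the closed form $\big(e^2(\sin 2 - \cos 2) + 1\big)/2 \approx 5.40$.

```python
import math

def midpoint_integral(g, t0, t1, n=20_000):
    """Approximate the integral of g over [t0, t1] with the midpoint rule."""
    h = (t1 - t0) / n
    return h * sum(g(t0 + (i + 0.5) * h) for i in range(n))

def holder_integral_sides(x, y, p, t0, t1):
    """Return (int |x y| dt, (int |x|^p dt)^(1/p) * (int |y|^q dt)^(1/q))."""
    q = p / (p - 1)
    lhs = midpoint_integral(lambda t: abs(x(t) * y(t)), t0, t1)
    rhs = (midpoint_integral(lambda t: abs(x(t)) ** p, t0, t1) ** (1 / p)
           * midpoint_integral(lambda t: abs(y(t)) ** q, t0, t1) ** (1 / q))
    return lhs, rhs

lhs, rhs = holder_integral_sides(math.sin, math.exp, 3.0, 0.0, 2.0)
print(lhs, rhs)  # lhs is about 5.40 and rhs is slightly larger
```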
[trace-nuclear] $\mathrm{Tr}(A^TB) \le \|A\|\|B\|_*$
Proof:
By the dual characterization of $\|B\|_*$ (the nuclear norm is the dual norm of the spectral norm),
$$\|B\|_* = \sup \{\mathrm{Tr}(A^TB) \mid \|A\| \le 1\} = \sup \{\mathrm{Tr}(A^TB) \mid \|A\| = 1\} \quad\Rightarrow\quad \alpha \|B\|_* \ge \alpha\,\mathrm{Tr}(A^TB), \quad \|A\| =1,\ \alpha \ge 0.$$
Replace $A$ by $\alpha A$, so that now $\|A\| = \alpha$. Then
$$\|A\|\|B\|_* \ge \mathrm{Tr}(A^TB).$$
Every nonzero $A$ can be written as $\alpha A'$ with $\alpha = \|A\| > 0$ and $\|A'\| = 1$, and the case $A = 0$ is trivial. Hence the inequality holds for all $A, B$, provided that $A^TB$ is defined and square, i.e. $A$ and $B$ have the same size.
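A numerical sketch of the trace-nuclear inequality follows. It assumes numpy, and `trace_nuclear_sides` is a hypothetical helper name. Random $4\times 3$ pairs should always satisfy the inequality.

The bound is attained as follows. Write the thin SVD $B = U\,\mathrm{diag}(s)\,V^T$ and take $A = UV^T$. Then $\|A\| = 1$ and $\mathrm{Tr}(A^TB) = \mathrm{Tr}(V\,\mathrm{diag}(s)\,V^T) = \sum_i s_i = \|B\|_*$.

```python
import numpy as np

rng = np.random.default_rng(0)

def trace_nuclear_sides(A, B):
    """Return (Tr(A^T B), ||A|| * ||B||_*), where ||.|| is the spectral norm."""
    lhs = np.trace(A.T @ B)
    rhs = np.linalg.norm(A, 2) * np.linalg.norm(B, "nuc")
    return lhs, rhs

for _ in range(500):
    A = rng.standard_normal((4, 3))
    B = rng.standard_normal((4, 3))
    lhs, rhs = trace_nuclear_sides(A, B)
    assert lhs <= rhs + 1e-10  # small tolerance for rounding

# Equality case: A = U V^T built from the thin SVD of B
B = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(B, full_matrices=False)
A = U @ Vt
lhs, rhs = trace_nuclear_sides(A, B)
print(lhs, rhs)  # equal up to rounding
```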
[Arithmetic-geometric mean inequality] $a^{\theta}b^{1-\theta} \le \theta a +(1-\theta)b$
If $a,b\ge 0$ and $\theta \in [0, 1]$, then
$$a^{\theta}b^{1-\theta} \le \theta a +(1-\theta)b.$$
For $\theta = 1/2$ this becomes $\sqrt{ab} \le (a+b)/2$.
Proof 1: The case $a = 0$ or $b = 0$ is immediate, so let $a, b > 0$. Since $-\log x$ is a convex function on $(0, +\infty)$, [Jensen's inequality] gives
$$-\log (\theta a + (1-\theta)b) \le -\theta \log(a) -(1-\theta) \log(b).$$
Exponentiating both sides (the exponential function is increasing) gives
$$\big(\theta a+(1-\theta)b\big)^{-1} \le \big(a^{\theta}b^{1-\theta}\big)^{-1},$$
so
$$a^{\theta}b^{1-\theta} \le \theta a +(1-\theta)b.$$
Proof 2:
By [Young's inequality],
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}.$$
Let $\theta \in (0,1)$; the cases $\theta = 0, 1$ are trivial. Replace $a$ by $a^{\theta}$ and $b$ by $b^{1-\theta}$, and take $p = 1/\theta$, $q=1/(1-\theta)$. These $p, q$ satisfy $\frac{1}{p}+\frac{1}{q}=1$, and $(a^\theta)^p = a$, $(b^{1-\theta})^q = b$. Therefore
$$a^{\theta}b^{1-\theta} \le \theta a +(1-\theta)b.$$
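A small randomized check of the weighted AM-GM inequality is sketched below; `weighted_am_gm` is a hypothetical helper name. The last line is the classical case $\theta = 1/2$, $a = 4$, $b = 9$, where $\sqrt{36} = 6 \le 6.5$.

```python
import random

def weighted_am_gm(a, b, theta):
    """Return (a^theta * b^(1-theta), theta*a + (1-theta)*b) for a, b >= 0 and theta in [0, 1]."""
    return a**theta * b**(1 - theta), theta * a + (1 - theta) * b

random.seed(4)
for _ in range(10_000):
    a, b, theta = random.uniform(0, 10), random.uniform(0, 10), random.random()
    gm, am = weighted_am_gm(a, b, theta)
    assert gm <= am * (1 + 1e-12) + 1e-15  # tiny tolerance for rounding

print(weighted_am_gm(4.0, 9.0, 0.5))  # (6.0, 6.5)
```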
[Gibbs’ inequality] $-\sum_{i=1}^n p_i \log p_i \le -\sum_{i=1}^n p_i\log q_i$
Suppose $P=\{p_1, \ldots, p_n\}$ and $Q=\{q_1, \ldots, q_n\}$ are probability distributions. Then the following inequality holds:
$$-\sum_{i=1}^n p_i \log p_i \le -\sum_{i=1}^n p_i\log q_i.$$
This is equivalent to
$$\sum_{i=1}^n p_i \log p_i \ge \sum_{i=1}^n p_i\log q_i,$$
and also to
$$-\sum_{i=1}^n p_i \log \frac{p_i}{q_i} \le 0.$$
Equality holds if and only if $p_i=q_i$ for all $i$.
In other words, the KL divergence is nonnegative:
$$D(P\|Q)=-\sum_{i=1}^n p_i\ln \frac{q_i}{p_i} \ge 0.$$
Proof 1:
Since $\log a = \frac{\ln a}{\ln 2}$, it suffices to prove the inequality for $\ln$.
Let $I$ denote the set of indices with $p_i > 0$. Since $\ln x \le x-1$ for $x>0$,
$$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \ge -\sum_{i \in I} p_i \Big(\frac{q_i}{p_i}-1\Big) =-\sum_{i \in I} q_i +1 \ge 0.$$
If $q_i = 0$ for some $i \in I$, the left-hand side is $+\infty$ and there is nothing to prove. With the convention $0\ln0=0$, the terms with $i \notin I$ contribute nothing, so the inequality holds.

Now consider equality. Since $\ln x = x-1$ only at $x=1$, equality forces $p_i=q_i$ for $i\in I$. Since $\sum_{i\in I} p_i=1$, it follows that $\sum_{i\in I} q_i=1$, hence $p_i=q_i=0$ for $i \notin I$. Therefore $p_i =q_i$ for $i=1,2,\ldots, n$.
Proof 2:
Since $-\log$ is strictly convex, Jensen's inequality (with weights $p_i$, summing over the indices with $p_i > 0$) gives
$$\sum_i p_i \log \frac{q_i}{p_i} \le \log \sum_i p_i \frac{q_i}{p_i} = \log \sum_i q_i \le 0.$$
By the equality condition of Jensen's inequality for a strictly convex function, equality requires
$$\frac{p_1}{q_1} = \frac{p_2}{q_2} =\cdots =\frac{p_n}{q_n}.$$
Since $\sum_i q_i=\sum_i p_i =1$, equality holds exactly when $p_i=q_i$. The case $p_i=0$ is handled in the same way as above.
Naturally, the inequality extends to the integral form:
$$D(P\| Q)=-\int p(x) \log \frac{q(x)}{p(x)} \,\mathrm{d}x \ge 0.$$
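Here is a sketch of Gibbs' inequality as a property of the KL divergence, using natural logarithms (nats) and the convention $0 \ln 0 = 0$. The helper names `kl_divergence` and `random_distribution` are hypothetical. The divergence is $+\infty$ when some $p_i > 0$ has $q_i = 0$, and it is exactly 0 when $P = Q$.

```python
import math
import random

def kl_divergence(p, q):
    """D(P||Q) = sum_i p_i ln(p_i / q_i), with the convention 0 ln 0 = 0.
    Returns math.inf if some p_i > 0 has q_i = 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # the term 0 * ln(0 / q_i) is taken to be 0
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def random_distribution(n):
    """A random probability vector of length n."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

random.seed(5)
for _ in range(5000):
    n = random.randint(1, 6)
    P, Q = random_distribution(n), random_distribution(n)
    assert kl_divergence(P, Q) >= -1e-12  # Gibbs' inequality, up to rounding

print(kl_divergence([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))  # ln 2, about 0.6931
```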
[Gronwall’s inequality] $u(t) \le f(t)e^{\int_0^t h(s)\mathrm{d}s}$
Suppose $f$ is nonnegative and monotonically increasing on $[0, +\infty)$, $h, u \in \mathrm{C}[0, +\infty)$, $h$ is nonnegative, and
$$u(t) \le f(t) + \int_{0}^t h(s)u(s) \,\mathrm{d}s, \quad t\ge 0.$$
Then
$$u(t) \le f(t)e^{\int_0^t h(s)\,\mathrm{d}s}.$$
Note:
The equation
$$u(t) = f(t) + \int_{0}^t h(s)u(s) \,\mathrm{d}s$$
does not imply
$$u(t) = f(t)e^{\int_0^t h(s)\,\mathrm{d}s}.$$
For example, with $f(t) = t$ and $h \equiv 1$, the equation gives $u' = 1 + u$ and $u(0) = 0$, so $u(t) = e^t - 1$, whereas $f(t)e^{t} = te^t$. However, when $f(t)\equiv C_0 \ge 0$ the implication does hold; this can be proved with a method similar to Proof 1.
Proof 1:
Let $w(t)=\int_0^t h(s)u(s) \,\mathrm{d}s$. Then $w(0)=0$ and $w'(t)=h(t)u(t)$. Since $h \ge 0$,
$$w'(t)=h(t)u(t)\le h(t) f(t)+h(t)w(t),$$
that is,
$$w'(t)-h(t)w(t)\le h(t)f(t).$$
Let $H(t)=\int_0^t h(s)\,\mathrm{d}s$, so that $H(0)=0$ and $H'(t)=h(t)$.
Multiplying both sides by $e^{-H(t)} > 0$ does not change the direction of the inequality:
$$e^{-H(t)}(w'(t)-h(t)w(t))=(e^{-H(t)}w(t))'\le e^{-H(t)}h(t)f(t).$$
Integrating both sides over $[0, t]$ and using $w(0) = 0$ gives
$$w(t)\le e^{H(t)} \int_0^t e^{-H(s)}h(s)f(s)\,\mathrm{d}s.$$
Because $f(t)$ is increasing and the integrand is nonnegative,
$$\int_0^t e^{-H(s)}h(s)f(s)\,\mathrm{d}s\le f(t)\int_0^t e^{-H(s)}h(s)\,\mathrm{d}s =-e^{-H(s)}\Big|_0^t \, f(t)=(1-e^{-H(t)})f(t).$$
Therefore
$$u(t) \le f(t)+w(t) \le f(t) + (e^{H(t)}-1)f(t) = e^{H(t)}f(t).$$
Q.E.D.
Proof 2 (requires $u$ to be nonnegative):
$$\begin{array}{ll} u(t) &\le f(t) + \int_{0}^t h(s)u(s) \,\mathrm{d}s \\ & \le f(t) +\epsilon + \int_{0}^t h(s)u(s) \,\mathrm{d}s, \quad \epsilon > 0. \end{array}$$
Hence (the denominator is positive because $f, u \ge 0$ and $\epsilon > 0$)
$$\frac{h(t)u(t)}{f(t)+\epsilon + \int_{0}^t h(s)u(s) \,\mathrm{d}s} \le h(t).$$
Integrating both sides over $[0,t]$:
$$\int_0^t \frac{h(s)u(s)}{f(s)+\epsilon + \int_{0}^s h(\tau)u(\tau) \,\mathrm{d}\tau} \,\mathrm{d}s\le \int_0^t h(s)\,\mathrm{d}s.$$
Since $f(t)$ is increasing, for $s\in[0, t]$:
$$\frac{h(s)u(s)}{f(s)+\epsilon + \int_{0}^s h(\tau)u(\tau) \,\mathrm{d}\tau} \ge \frac{h(s)u(s)}{f(t)+\epsilon + \int_{0}^s h(\tau)u(\tau) \,\mathrm{d}\tau} \ge 0.$$
The integrand on the right is the derivative in $s$ of $\ln\big(f(t)+\epsilon+\int_0^s h(\tau)u(\tau)\,\mathrm{d}\tau\big)$. Therefore
$$\int_0^t \frac{h(s)u(s)}{f(t)+\epsilon + \int_{0}^s h(\tau)u(\tau) \,\mathrm{d}\tau} \,\mathrm{d}s=\ln \frac{f(t)+\epsilon+\int_0^t h(s)u(s)\,\mathrm{d}s}{f(t)+\epsilon}\le \int_0^t h(s)\,\mathrm{d}s,$$
so
$$f(t)+\epsilon + \int_0^t h(s)u(s)\,\mathrm{d}s \le e^{H(t)}(f(t)+\epsilon),$$
where $H(t)=\int_0^t h(s) \,\mathrm{d}s$.
Letting $\epsilon \rightarrow 0$ on both sides gives
$$u(t) \le f(t)+ \int_0^t h(s)u(s)\,\mathrm{d}s \le e^{H(t)}f(t).$$
Q.E.D.
Proof 3:
Fix $T > 0$ and let $M(T)=\max_{0\le t\le T} \int_0^t h(s)u(s)\,\mathrm{d}s$. Then for $0 \le t \le T$,
$$u(t)\le f(t)+M(T) \quad\Rightarrow\quad h(t)u(t) \le h(t)f(t) + M(T)h(t),$$
and therefore
$$u(t) \le f(t)+\int_0^t \big(h(s)f(s)+M(T)h(s)\big)\,\mathrm{d}s.$$
Since $f(t)$ is increasing,
$$\int_0^t h(s)f(s)\,\mathrm{d}s\le f(t) \int_0^t h(s)\,\mathrm{d}s.$$
Writing $H(t)=\int_0^t h(s)\,\mathrm{d}s$, we get
$$u(t)\le f(t)(1+H(t))+H(t)M(T) \quad\Rightarrow\quad h(t)u(t)\le f(t)h(t)(1+H(t))+h(t)H(t)M(T),$$
and therefore
$$u(t) \le f(t)+\int_0^t \big(f(s)h(s)(1+H(s))+h(s)H(s)M(T)\big) \,\mathrm{d} s.$$
Note that
$$\int_0^t H(s)h(s) \,\mathrm{d}s=\frac{H^2(t)-H^2(0)}{2}=\frac{H^2(t)}{2},$$
so, again using that $f$ is increasing,
$$u(t) \le f(t)\Big(1+H(t)+ \frac{H^2(t)}{2!}\Big)+\frac{H^2(t)M(T)}{2!}.$$
Repeating this procedure gives
$$u(t) \le f(t)\Big(1+H(t)+ \frac{H^2(t)}{2!} + \cdots+\frac{H^n(t)}{n!}\Big)+\frac{H^n(t)M(T)}{n!}.$$
Letting $n\rightarrow + \infty$:
$$u(t) \le f(t)e^{H(t)}+0.$$
Since $T$ is arbitrary, this holds for all $t \ge 0$. Q.E.D.
Remark:
The last step can also be done using
$$1+t+\ldots+\frac{t^n}{n!}\le e^t, \quad t\ge0.$$
However, if we take limits on both sides instead, we don't need to worry about the sign of $t$. It is a bit superfluous, but it's cooler.
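Proof 3 is essentially Picard iteration, i.e. repeated substitution into the integral equation. The sketch below runs that iteration on a grid for the equality case $u(t) = f(t) + \int_0^t h(s)u(s)\,\mathrm{d}s$ and checks the Gronwall bound. It makes these assumptions:
- example data $f(t) = 1 + t$ and $h \equiv 1$, so $H(t) = t$;
- the trapezoid rule for the integral;
- hypothetical helper name `picard_gronwall`.

For these data the exact solution is $u(t) = 2e^t - 1$. It stays strictly below $f(t)e^{H(t)} = (1+t)e^t$ for $t > 0$, which also illustrates the note above that equality in the integral equation does not give equality in the bound.

```python
import math

def picard_gronwall(f, h, T=2.0, n=2000, iterations=60):
    """Solve u(t) = f(t) + int_0^t h(s) u(s) ds on a uniform grid over [0, T]
    by Picard iteration (repeated substitution), using the trapezoid rule."""
    dt = T / n
    ts = [i * dt for i in range(n + 1)]
    u = [f(t) for t in ts]  # start from u_0 = f
    for _ in range(iterations):
        new_u, integral = [f(ts[0])], 0.0
        for i in range(1, n + 1):
            integral += 0.5 * dt * (h(ts[i - 1]) * u[i - 1] + h(ts[i]) * u[i])
            new_u.append(f(ts[i]) + integral)
        u = new_u
    return ts, u

def f(t):
    return 1.0 + t  # nonnegative and increasing

def h(t):
    return 1.0      # nonnegative, so H(t) = t

ts, u = picard_gronwall(f, h)
for t, ut in zip(ts, u):
    assert ut <= f(t) * math.exp(t) + 1e-6  # Gronwall bound f(t) e^{H(t)}
print(u[-1], 2 * math.exp(2.0) - 1, f(2.0) * math.exp(2.0))
```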
[$C_p$ inequality] $(|a|+|b|)^p \le C_p(|a|^p+|b|^p)$
Suppose $a, b$ are real numbers and $p>0$. Then
$$(|a|+|b|)^p \le C_p(|a|^p+|b|^p),$$
where
$$C_p = \left \{ \begin{array}{ll} 1, & 0<p \le 1, \\ 2^{p-1}, & p>1. \end{array} \right.$$
Proof:
Case $0<p\le1$: consider the function $f(x) = (1+x)^p-x^p-1$, $x \ge 0$. For $x > 0$ its derivative is
$$f'(x) = p[(x+1)^{p-1}-x^{p-1}]\le 0,$$
since $p-1 \le 0$ and $x+1 > x$. Hence $f(x)$ is nonincreasing on $[0,+\infty)$. Since $f(0)=0$, we get $f(x)\le0$. Substituting $x = |b|/|a|$ ($a \ne 0$) and multiplying by $|a|^p$ gives
$$(|a|+|b|)^p \le |a|^p+|b|^p = C_p(|a|^p+|b|^p).$$
Clearly this also holds when $a=0$.
Case $p>1$: by the convexity of $|x|^p$ (Jensen's inequality with $\theta = 1/2$),
$$\Big(\frac{|a|+|b|}{2}\Big)^{p} \le \frac{1}{2}(|a|^p+|b|^p).$$
Multiplying both sides by $2^p$ gives $(|a|+|b|)^p \le 2^{p-1}(|a|^p+|b|^p)$.
Q.E.D.
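A randomized check of the $C_p$ inequality is sketched below; `c_p` and `cp_sides` are hypothetical helper names. The constants cannot be improved. For $p > 1$, $a = b$ gives equality. For $0 < p \le 1$, $b = 0$ gives equality.

```python
import random

def c_p(p):
    """The constant C_p: 1 for 0 < p <= 1 and 2^(p-1) for p > 1."""
    return 1.0 if p <= 1 else 2.0 ** (p - 1)

def cp_sides(a, b, p):
    """Return ((|a| + |b|)^p, C_p * (|a|^p + |b|^p))."""
    return (abs(a) + abs(b)) ** p, c_p(p) * (abs(a) ** p + abs(b) ** p)

random.seed(6)
for _ in range(10_000):
    a, b = random.uniform(-5, 5), random.uniform(-5, 5)
    p = random.uniform(0.05, 6.0)
    lhs, rhs = cp_sides(a, b, p)
    assert lhs <= rhs * (1 + 1e-12)  # small relative tolerance for rounding

print(cp_sides(1.0, 1.0, 3.0))  # (8.0, 8.0): a = b attains C_p for p > 1
print(cp_sides(2.0, 0.0, 0.5))  # both sides equal sqrt(2): b = 0 attains C_p = 1
```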