Probability and Statistics
Hoeffding's Inequality and Markov's Inequality
Markov's Inequality
(Markov's inequality) For any non-negative random variable $X$ and any $\epsilon > 0$:
$$P(X \geq \epsilon) \leq \frac{E(X)}{\epsilon}$$
Proof:
$$
\begin{aligned}
E(X) &= \int_0^\infty x f(x)\,dx\\
&\geq \int_\epsilon^\infty x f(x)\,dx\\
&\geq \int_\epsilon^\infty \epsilon f(x)\,dx\\
&= \epsilon P(X \geq \epsilon)
\end{aligned}
$$
In the discrete case, simply replace the integrals with sums.
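As a quick sanity check, the bound can be compared with a simulation. The sketch below is illustrative only: the exponential distribution, the sample size, and the helper name `markov_check` are assumptions chosen for this example and are not part of the original argument.

```python
import random


def markov_check(eps, n_samples=100_000, rate=1.0, seed=0):
    """Return (empirical tail frequency, empirical mean / eps).

    Samples are drawn from an Exponential(rate) distribution, which is
    non-negative, so Markov's inequality applies. The distribution is an
    assumption made for illustration.
    """
    rng = random.Random(seed)
    samples = [rng.expovariate(rate) for _ in range(n_samples)]
    tail = sum(x >= eps for x in samples) / n_samples
    mean = sum(samples) / n_samples
    return tail, mean / eps


for eps in (0.5, 1.0, 2.0, 4.0):
    tail, bound = markov_check(eps)
    print(f"eps={eps}: P(X >= eps) ~ {tail:.4f}, E(X)/eps ~ {bound:.4f}")
```

Because the samples are non-negative, the empirical tail frequency can never exceed the empirical mean divided by `eps`, which mirrors the two inequalities used in the proof.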
Hoeffding's Inequality
(Hoeffding's lemma) Let $X$ be a random variable with $E(X) = 0$ and $a \leq X \leq b$. Then for any $t > 0$:
$$E(e^{tX}) \leq e^{\frac{t^2(b-a)^2}{8}}$$
Proof: Since $e^{tX}$ is a convex function of $X$ (its second derivative satisfies $f''(x) > 0$), convexity on the interval $[a, b]$ gives:
$$e^{tX} \leq \frac{b-X}{b-a}e^{ta} + \frac{X-a}{b-a}e^{tb}$$
Taking the expectation of both sides with respect to $X$ and using $E(X) = 0$:
$$E(e^{tX}) \leq \frac{b}{b-a}e^{ta} + \frac{-a}{b-a}e^{tb} = e^{\phi(t)}$$
Therefore:
$$\phi(t) = \log\left(\frac{b}{b-a}e^{ta} + \frac{-a}{b-a}e^{tb}\right) = ta + \log\left(\frac{b}{b-a} + \frac{-a}{b-a}e^{t(b-a)}\right)$$
$$\phi'(t) = a - \frac{a e^{t(b-a)}}{\frac{b}{b-a}-\frac{a}{b-a}e^{t(b-a)}} = a - \frac{a}{\frac{b}{b-a}e^{-t(b-a)}-\frac{a}{b-a}}$$
$$
\begin{aligned}
\phi''(t) &= \frac{-ab\,e^{-t(b-a)}}{\left[\frac{b}{b-a}e^{-t(b-a)}-\frac{a}{b-a}\right]^2}\\
&\overset{\alpha = \frac{-a}{b-a}}{\longrightarrow} \frac{\frac{-ab}{(b-a)^2}e^{-t(b-a)}(b-a)^2}{\left[(1-\alpha)e^{-t(b-a)}+\alpha\right]^2}\\
&= \frac{\alpha(1-\alpha)e^{-t(b-a)}(b-a)^2}{\left[(1-\alpha)e^{-t(b-a)}+\alpha\right]^2}\\
&= \frac{\alpha}{(1-\alpha)e^{-t(b-a)}+\alpha}\cdot\frac{(1-\alpha)e^{-t(b-a)}}{(1-\alpha)e^{-t(b-a)}+\alpha}\,(b-a)^2\\
&\overset{u = \frac{\alpha}{(1-\alpha)e^{-t(b-a)}+\alpha}}{\longrightarrow} u(1-u)(b-a)^2 \leq \frac{1}{4}(b-a)^2
\end{aligned}
$$
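The closed form $\phi''(t) = u(1-u)(b-a)^2$ can be cross-checked against a finite-difference second derivative of $\phi$. This is only a numerical sketch: the function names `phi`, `phi_second`, and `phi_second_numeric`, the step size `h`, and the values $a = -1$, $b = 2$ are assumptions picked for illustration.

```python
import math


def phi(t, a, b):
    """phi(t) = log( b/(b-a) * e^{ta} - a/(b-a) * e^{tb} ), assuming a < 0 < b."""
    return math.log(b / (b - a) * math.exp(t * a) - a / (b - a) * math.exp(t * b))


def phi_second(t, a, b):
    """Closed form u * (1 - u) * (b - a)^2 with alpha = -a / (b - a)."""
    alpha = -a / (b - a)
    decay = math.exp(-t * (b - a))
    u = alpha / ((1 - alpha) * decay + alpha)
    return u * (1 - u) * (b - a) ** 2


def phi_second_numeric(t, a, b, h=1e-4):
    """Central finite-difference approximation of phi''(t)."""
    return (phi(t + h, a, b) - 2 * phi(t, a, b) + phi(t - h, a, b)) / (h * h)


a, b = -1.0, 2.0  # illustrative interval with a < 0 < b
for t in (0.1, 0.5, 1.0, 2.0):
    exact = phi_second(t, a, b)
    approx = phi_second_numeric(t, a, b)
    print(f"t={t}: closed form {exact:.6f}, finite difference {approx:.6f}, "
          f"cap (b-a)^2/4 = {(b - a) ** 2 / 4:.6f}")
```

Since $u \in [0, 1]$, the product $u(1-u)$ is at most $1/4$, so the closed form should stay at or below the printed cap for every $t$.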
By Taylor's theorem, for any $t > 0$ there exists $\theta \in [0, t]$ such that:
$$\phi(t) = \phi(0) + t\phi'(0) + \frac{t^2}{2}\phi''(\theta) \leq 0 + 0 + \frac{t^2}{2}\cdot \frac{1}{4}(b-a)^2 = \frac{t^2(b-a)^2}{8}$$
Hence $E(e^{tX}) \leq e^{\phi(t)} \leq e^{\frac{t^2(b-a)^2}{8}}$, which completes the proof.
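The lemma can be checked on a concrete case. The sketch below assumes a two-point random variable that takes the value $a$ with probability $b/(b-a)$ and the value $b$ with probability $-a/(b-a)$, so that its mean is zero; the helper names `two_point_mgf` and `hoeffding_lemma_bound` are hypothetical and exist only for this example.

```python
import math


def two_point_mgf(t, a, b):
    """E[e^{tX}] for X in {a, b} with P(X = b) = -a / (b - a).

    This choice of probabilities makes E[X] = 0 and requires a < 0 < b.
    """
    p_b = -a / (b - a)
    return (1 - p_b) * math.exp(t * a) + p_b * math.exp(t * b)


def hoeffding_lemma_bound(t, a, b):
    """Right-hand side of Hoeffding's lemma: exp(t^2 (b - a)^2 / 8)."""
    return math.exp(t * t * (b - a) ** 2 / 8)


a, b = -1.0, 3.0  # illustrative interval with a < 0 < b
for t in (0.1, 0.5, 1.0, 2.0):
    print(f"t={t}: E[e^(tX)] = {two_point_mgf(t, a, b):.6f} "
          f"<= {hoeffding_lemma_bound(t, a, b):.6f}")
```

For this distribution the moment generating function equals $e^{\phi(t)}$ exactly, so the comparison exercises only the final Taylor bound on $\phi(t)$.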
(Hoeffding's inequality) Let $X_1,\cdots,X_m$ be mutually independent random variables, with each $X_i$ taking values between $a_i$ and $b_i$. Then for any $\epsilon > 0$ and $S_m = \sum_{i=1}^m X_i$:
$$
\begin{aligned}
P[S_m - E[S_m]\geq \epsilon] &\leq e^{\frac{-2\epsilon^2}{\sum_{i=1}^m (b_i-a_i)^2}}\\
P[S_m - E[S_m]\leq -\epsilon] &\leq e^{\frac{-2\epsilon^2}{\sum_{i=1}^m (b_i-a_i)^2}}
\end{aligned}
$$
Proof:
For any $t > 0$:
$$
\begin{aligned}
P[S_m - E[S_m]\geq \epsilon] &= P[e^{t(S_m - E[S_m])}\geq e^{t\epsilon}]\\
&\leq e^{-t\epsilon}E[e^{t(S_m - E[S_m])}] \qquad\qquad \text{(Markov's inequality)}\\
&= e^{-t\epsilon}\prod_{i=1}^m E[e^{t(X_i - E[X_i])}] \qquad\quad \text{(mutual independence)}\\
&\leq e^{-t\epsilon}\prod_{i=1}^m e^{\frac{t^2(b_i-a_i)^2}{8}} \qquad\qquad \text{(Hoeffding's lemma)}\\
&= e^{-t\epsilon}\, e^{t^2\sum_{i=1}^m\frac{(b_i-a_i)^2}{8}}\\
&\leq e^{\frac{-2\epsilon^2}{\sum_{i=1}^m(b_i-a_i)^2}}
\end{aligned}
$$
The last step takes $t = \frac{4\epsilon}{\sum_{i=1}^m(b_i-a_i)^2}$, the value that minimizes the exponent $-t\epsilon + \frac{t^2}{8}\sum_{i=1}^m(b_i-a_i)^2$.
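The choice of $t$ can be confirmed numerically by evaluating the exponent on a grid. In this sketch, `s` stands for $\sum_{i=1}^m (b_i - a_i)^2$; the names `exponent` and `optimal_t` and the values of `eps` and `s` are assumptions made for illustration.

```python
def exponent(t, eps, s):
    """Exponent -t*eps + t^2 * s / 8, where s = sum of (b_i - a_i)^2."""
    return -t * eps + t * t * s / 8


def optimal_t(eps, s):
    """Stationary point of the exponent: t = 4 * eps / s."""
    return 4 * eps / s


eps, s = 0.3, 5.0  # illustrative values
t_star = optimal_t(eps, s)
grid = [k / 1000 for k in range(1, 2001)]  # t in (0, 2]
t_grid = min(grid, key=lambda t: exponent(t, eps, s))

print(f"closed-form t* = {t_star:.4f}, grid minimizer = {t_grid:.4f}")
print(f"exponent at t* = {exponent(t_star, eps, s):.6f}, "
      f"-2*eps^2/s = {-2 * eps ** 2 / s:.6f}")
```

The exponent is a convex quadratic in $t$, so the stationary point is its global minimum and the grid minimizer should agree with `t_star` up to the grid spacing.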
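This proves the upper-tail bound. The lower-tail bound follows by applying the same argument to the variables $-X_i$, which completes the proof.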
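To see how the bound behaves in practice, the following sketch compares it with a simulated upper tail. It assumes $m$ independent Bernoulli($p$) variables, so every $X_i$ lies in $[0, 1]$; the function names `hoeffding_bound` and `empirical_upper_tail`, the trial count, and the seed are all illustrative assumptions.

```python
import math
import random


def hoeffding_bound(eps, ranges):
    """exp(-2 eps^2 / sum (b_i - a_i)^2) for a list of (a_i, b_i) pairs."""
    return math.exp(-2 * eps ** 2 / sum((b - a) ** 2 for a, b in ranges))


def empirical_upper_tail(m, eps, p=0.5, n_trials=5_000, seed=0):
    """Estimate P[S_m - E[S_m] >= eps] for a sum of m Bernoulli(p) variables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        s = sum(rng.random() < p for _ in range(m))
        if s - m * p >= eps:
            hits += 1
    return hits / n_trials


m = 100
ranges = [(0.0, 1.0)] * m  # each Bernoulli variable lies in [0, 1]
for eps in (5.0, 10.0, 15.0):
    print(f"eps={eps}: empirical tail {empirical_upper_tail(m, eps):.4f}, "
          f"Hoeffding bound {hoeffding_bound(eps, ranges):.4f}")
```

The bound depends only on the ranges $[a_i, b_i]$ and not on the actual distribution, so it is usually loose for any particular case, as the simulated frequencies suggest.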