The CSDN post https://blog.csdn.net/SanyHo/article/details/105569224 already gives a fairly detailed derivation, so this note only shows how Hoeffding's inequality leads to the following inequality (stated here for a loss bounded in $[0, 1]$):

$$\mathbb{P}(R(f) - \hat{R}(f) \geq \epsilon) \leq \exp(-2N\epsilon^2)$$
Hoeffding's inequality states that

$$\mathbb{P}(\mathbb{E} S_n - S_n \geq t) \leq \exp\left(\frac{-2t^2}{\sum_{i=1}^N (b_i - a_i)^2}\right),$$

where $S_n = \sum_{i=1}^N Z_i$ is the sum of $N$ iid random variables, each bounded as $a_i \leq Z_i \leq b_i$.
Recall that $R(f) = \mathbb{E}\, Loss(X, f(X))$ is an expectation, while $\hat{R}(f) = \frac{1}{N} \sum_{i=1}^N Loss(X_i, f(X_i))$ is a sample mean. Take $Z_i = Loss(X_i, f(X_i))$ and suppose the loss is bounded, $0 \leq Loss(X_i, f(X_i)) \leq L$, so that $a_i = 0$ and $b_i = L$. Substituting into Hoeffding's inequality gives:
$$\mathbb{P}\left(\frac{1}{N} \mathbb{E} \sum_{i=1}^N Z_i - \frac{1}{N} \sum_{i=1}^N Z_i \geq t\right) = \mathbb{P}\left(\mathbb{E} \sum_{i=1}^N Z_i - \sum_{i=1}^N Z_i \geq Nt\right) \leq \exp\left(\frac{-2 N^2 t^2}{N L^2}\right)$$
The left-hand side is exactly $\mathbb{P}(R(f) - \hat{R}(f) \geq t)$, and the exponent simplifies to $-2Nt^2/L^2$. Replacing $t$ with $\epsilon$ therefore gives

$$\mathbb{P}(R(f) - \hat{R}(f) \geq \epsilon) \leq \exp\left(-\frac{2N \epsilon^2}{L^2}\right)$$
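The bound can be sanity-checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes a toy 0-1 loss whose values are Bernoulli draws with true risk `p` (so $L = 1$), and the function names `hoeffding_bound` and `empirical_tail` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def hoeffding_bound(n: int, eps: float, loss_max: float = 1.0) -> float:
    """Right-hand side exp(-2 N eps^2 / L^2) of the one-sided bound."""
    return math.exp(-2.0 * n * eps ** 2 / loss_max ** 2)


def empirical_tail(n: int, eps: float, p: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(R(f) - R_hat(f) >= eps).

    Assumption: each loss value is an independent Bernoulli(p) draw,
    a toy stand-in for Loss(X_i, f(X_i)) with true risk R(f) = p.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Empirical risk: mean of n simulated 0-1 losses.
        r_hat = sum(rng.random() < p for _ in range(n)) / n
        # Small tolerance so floating-point rounding does not drop the boundary case.
        if p - r_hat >= eps - 1e-12:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, eps, p = 100, 0.1, 0.3
    print("empirical tail :", empirical_tail(n, eps, p, trials=20000))
    print("Hoeffding bound:", hoeffding_bound(n, eps))
```

The estimated tail probability should come out well below the bound, since Hoeffding's inequality holds for every bounded distribution and is therefore usually loose for any particular one.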
Equivalently, setting $\gamma = \exp(-2N\epsilon^2/L^2)$ and solving for $\epsilon$,

$$R(f) \leq \hat{R}(f) + \sqrt{-\frac{L^2 \log \gamma}{2N}}$$

holds with probability at least $1-\gamma$.
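To get a feel for the size of the square-root term, it can be evaluated directly. This is a small sketch under the same assumptions as the derivation (loss bounded in $[0, L]$, iid samples); `risk_gap` and `samples_needed` are hypothetical helper names, not taken from the referenced post.

```python
import math


def risk_gap(n: int, gamma: float, loss_max: float = 1.0) -> float:
    """sqrt(-L^2 log(gamma) / (2N)): the term added to the empirical risk."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    return math.sqrt(-(loss_max ** 2) * math.log(gamma) / (2.0 * n))


def samples_needed(eps: float, gamma: float, loss_max: float = 1.0) -> int:
    """Smallest N with exp(-2 N eps^2 / L^2) <= gamma."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    return math.ceil(-(loss_max ** 2) * math.log(gamma) / (2.0 * eps ** 2))


if __name__ == "__main__":
    print(risk_gap(1000, 0.05))        # gap for N = 1000, gamma = 0.05, L = 1
    print(samples_needed(0.05, 0.05))  # N needed for eps = 0.05, gamma = 0.05
```

With $L = 1$, $N = 1000$ and $\gamma = 0.05$ the gap is roughly 0.039, and reaching $\epsilon = 0.05$ at the same confidence takes 600 samples. The gap shrinks only as $1/\sqrt{N}$, so halving it requires four times as many samples.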