Machine Learning Foundations Notes (6): Theory of Generalization



Lecture 6: Theory of Generalization


Restriction of Break Point


The Four Break Points


N=3, K=2 Break Point


$$\begin{aligned} & m_{\mathcal{H}}(N) \\ \leq & \text{ maximum possible } m_{\mathcal{H}}(N) \text{ given } k \\ \leq & \operatorname{poly}(N) \end{aligned}$$

Fun Time

When the minimum break point is $k = 1$, what is the maximum possible $m_{\mathcal{H}}(N)$ when $N = 3$?
1. 1 ✓    2. 2    3. 3    4. 4


Explanation
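Because $k = 1$, not even a single point can be shattered: no input may appear with both labels. Once one dichotomy is fixed, no other dichotomy can coexist with it, so $m_{\mathcal{H}}(N) = 1$.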

Bounding Function: Basic Cases


Bounding Function

bounding function $B(N,k)$:
  maximum possible $m_{\mathcal{H}}(N)$ when break point = $k$
$B(N, k) \leq \operatorname{poly}(N)$

In other words, $B(N, k)$ is an upper bound on $m_{\mathcal{H}}(N)$.
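To make the definition concrete, here is a minimal brute-force sketch that searches for the largest set of dichotomies on $N$ points in which no $k$ points are shattered. The function names `shatters` and `brute_force_B` are illustrative, not from the lecture, and the search is only feasible for very small $N$ (it enumerates subsets of all $2^N$ dichotomies).

```python
from itertools import combinations, product


def shatters(dichotomies, subset):
    """Return True if the dichotomies realise all 2^k label patterns on `subset`."""
    patterns = {tuple(d[i] for i in subset) for d in dichotomies}
    return len(patterns) == 2 ** len(subset)


def brute_force_B(N, k):
    """Size of the largest dichotomy set on N points that shatters no k points."""
    all_dichotomies = list(product((0, 1), repeat=N))
    subsets = list(combinations(range(N), k))
    # Try sizes from largest to smallest; the first feasible size is B(N, k).
    for size in range(len(all_dichotomies), 0, -1):
        for candidate in combinations(all_dichotomies, size):
            if not any(shatters(candidate, s) for s in subsets):
                return size
    return 0


print(brute_force_B(3, 1))
print(brute_force_B(3, 2))
print(brute_force_B(4, 3))
```

The search is exponential by design; it is meant only to confirm small table entries by exhaustive checking, independent of any formula.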


Table of Bounding Function


Fun Time

For 2D perceptrons, which of the following claims is true?
1 minimum break point $k = 2$
2 $m_{\mathcal{H}}(4) = 15$
3 $m_{\mathcal{H}}(N) < B(N, k)$ when $N = k =$ minimum break point   ✓
4 $m_{\mathcal{H}}(N) > B(N, k)$ when $N = k =$ minimum break point


Explanation
minimum break point $k = 4$ (three points can be shattered, four cannot)
$m_{\mathcal{H}}(4) = 14$
$B(N, k)$ is an upper bound on $m_{\mathcal{H}}(N)$; here $B(4, 4) = 15 > 14$
If you do not remember 2D perceptrons, review Effective Number of Hypotheses in Lecture 5: Training versus Testing.

Bounding Function: Inductive Cases


$B(4,3)=11=2\alpha+\beta$
Instance Estimating Part


$$\begin{aligned} B(N, k) &= 2\alpha+\beta \\ \alpha+\beta &\leq B(N-1, k) \\ \alpha &\leq B(N-1, k-1) \\ \Rightarrow B(N, k) &\leq B(N-1, k)+B(N-1, k-1) \end{aligned}$$
$$B(N, k) \leq \sum_{i=0}^{k-1}\binom{N}{i}$$
The Upper Bound of Bounding Function

The $\le$ is actually an equality:

$$B(N, k) = B(N-1, k)+B(N-1, k-1)$$
$$B(N, k) = \sum_{i=0}^{k-1}\binom{N}{i} = C_N^0+C_N^1+\cdots+C_N^{k-1}$$
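The recurrence and the closed form can be checked against each other numerically. The sketch below assumes the usual base cases ($B(N,1)=1$, $B(N,k)=2^N$ for $N<k$, and $B(N,N)=2^N-1$); the names `B` and `B_closed` are illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def B(N, k):
    """Bounding function via the recurrence, with assumed base cases."""
    if k == 1:
        return 1           # no point may take both labels
    if N < k:
        return 2 ** N      # too few points for the break point to bite
    if N == k:
        return 2 ** N - 1  # drop one dichotomy so the N points are not shattered
    return B(N - 1, k) + B(N - 1, k - 1)


def B_closed(N, k):
    """Closed form: sum of binomial coefficients C(N, i) for i < k."""
    return sum(comb(N, i) for i in range(k))


# Print a small table: rows are N = 1..6, columns are k = 1..6.
for N in range(1, 7):
    print(N, [B(N, k) for k in range(1, 7)])
```

Each row of the printout corresponds to one value of $N$, so it can be read as a small version of the bounding-function table.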

The Three Break Points

2D perceptrons: break point at 4, so $m_{\mathcal{H}}(N) \leq B(N, 4) = \frac{1}{6} N^{3}+\frac{5}{6} N+1 = O(N^3)$
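As a quick check of the cubic, expanding the binomial sum for $k = 4$ term by term gives:

$$\begin{aligned} B(N,4) &= \binom{N}{0}+\binom{N}{1}+\binom{N}{2}+\binom{N}{3} \\ &= 1 + N + \frac{N(N-1)}{2} + \frac{N(N-1)(N-2)}{6} \\ &= 1 + N + \frac{N^{2}-N}{2} + \frac{N^{3}-3N^{2}+2N}{6} \\ &= \frac{1}{6}N^{3} + \frac{5}{6}N + 1 \end{aligned}$$

The $N^2$ terms cancel ($\frac{1}{2} - \frac{3}{6} = 0$), and the linear coefficient is $1 - \frac{1}{2} + \frac{2}{6} = \frac{5}{6}$.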

Fun Time

For 1D perceptrons (positive and negative rays), we know that $m_{\mathcal{H}}(N) = 2N$. Let $k$ be the minimum break point. Which of the following is not true?
1 $k = 3$
2 for some integers $N>0$, $m_{\mathcal{H}}(N)=\sum_{i=0}^{k-1}\binom{N}{i}$
3 for all integers $N>0$, $m_{\mathcal{H}}(N)=\sum_{i=0}^{k-1}\binom{N}{i}$   ✓
4 for all integers $N>2$, $m_{\mathcal{H}}(N)<\sum_{i=0}^{k-1}\binom{N}{i}$


Explanation
minimum break point $k = 3$
$B(N, k) = \sum_{i=0}^{k-1}\binom{N}{i}$
$B(N, k)$ is an upper bound on $m_{\mathcal{H}}(N)$. For this hypothesis set, $m_{\mathcal{H}}(N) < B(N, k)$ when $N \ge k$, and $m_{\mathcal{H}}(N) = B(N, k)$ when $N < k$.
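The comparison between $2N$ and $B(N,3)$ can be verified by direct enumeration. The following is a sketch under the assumption that the $N$ inputs are distinct points on a line (here the integers $0,\dots,N-1$) and that a 1D perceptron is a threshold plus a sign; the function names are illustrative.

```python
from math import comb


def count_1d_perceptron_dichotomies(N):
    """Count distinct labelings of N distinct 1D points by sign * sign(x - t)."""
    xs = list(range(N))
    # One threshold below all points, one between each adjacent pair, one above all.
    thresholds = [x - 0.5 for x in xs] + [xs[-1] + 0.5]
    found = set()
    for t in thresholds:
        for s in (1, -1):  # positive ray or negative ray
            found.add(tuple(s if x > t else -s for x in xs))
    return len(found)


def B3(N):
    """B(N, 3) from the closed form."""
    return sum(comb(N, i) for i in range(3))


for N in range(1, 8):
    m = count_1d_perceptron_dichotomies(N)
    print(N, m, B3(N), "equal" if m == B3(N) else "strictly smaller")
```

Only $N = 1$ and $N = 2$ reach equality; from $N = 3$ onward the count $2N$ falls strictly below $B(N,3)$, which is why statement 3 is the false one.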


Extension: revisit the Fun Time from Effective Number of Hypotheses in Lecture 5: Training versus Testing.
Find the number of effective dichotomies of a 2D perceptron on 5 points ($k=4$, $N=5$, $m_{\mathcal{H}}(N) = ? \leq \frac{1}{6} N^{3}+\frac{5}{6} N+1$). Here $N > k$, and equality is not attained.
The correct answer is $22 < \frac{125}{6}+\frac{25}{6}+1 = 26$, so the check passes. Revisiting the earlier exercise is fun too.
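The count of 22 can also be reproduced by enumeration. The sketch below is not the lecture's method: it assumes the points are in general position (no three collinear) and relies on the fact that any separating line can be slid and rotated until it passes through two of the points, after which a tiny perturbation gives those two points any of the four label combinations. The coordinates are arbitrary example points.

```python
from itertools import combinations, product


def cross(o, a, b):
    """Signed area test: > 0 if b is left of the directed line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def count_2d_perceptron_dichotomies(points):
    """Count linearly separable labelings of points in general position."""
    n = len(points)
    found = set()
    for i, j in combinations(range(n), 2):
        sides = [cross(points[i], points[j], p) for p in points]
        if any(s == 0 for t, s in enumerate(sides) if t not in (i, j)):
            raise ValueError("points are not in general position")
        for flip in (1, -1):  # which side of the line is labeled +1
            base = [1 if flip * s > 0 else -1 for s in sides]
            # The two points on the line can take any labels after perturbation.
            for li, lj in product((1, -1), repeat=2):
                labels = list(base)
                labels[i], labels[j] = li, lj
                found.add(tuple(labels))
    return len(found)


five_points = [(0, 0), (4, 1), (5, 4), (2, 6), (-1, 3)]
print(count_2d_perceptron_dichotomies(five_points[:4]))
print(count_2d_perceptron_dichotomies(five_points))
```

Using integer coordinates keeps the side test exact, so no floating-point tolerance is needed.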


A Pictorial Proof

Step 1: Replace E_out by E_in'

Replace $E_{out}$ (infinitely many possible values) with $E_{in}'$ (finitely many). I have not yet figured out where this inequality and its factor of $\frac{1}{2}$ come from.

Step 2: Decompose H by Kind

The upper bound is now stated in terms of $m_{\mathcal{H}}(2N)$.

Step 3: Use Hoeffding without Replacement

Use Hoeffding's inequality for sampling without replacement. The result is similar, except that $\nu=E_{in}$ and $\mu=\frac{E_{in}+E_{in}'}{2}$.

Vapnik-Chervonenkis (VC) bound

$$\begin{aligned} & \mathbb{P}\left[\exists h \in \mathcal{H} \text{ s.t. } \left|E_{in}(h)-E_{out}(h)\right|>\epsilon\right] \\ & \leq 4 m_{\mathcal{H}}(2N) \exp\left(-\frac{1}{8} \epsilon^{2} N\right) \end{aligned}$$
   $m_{\mathcal{H}}(N)$ can replace $M$ with a few changes

Fun Time

For positive rays, $m_{\mathcal{H}}(N) = N + 1$. Plug it into the VC bound for $\epsilon = 0.1$ and $N = 10000$. What is the VC bound of BAD events?
$\mathbb{P}\left[\exists h \in \mathcal{H} \text{ s.t. } \left|E_{in}(h)-E_{out}(h)\right|>\epsilon\right] \leq 4 m_{\mathcal{H}}(2N) \exp\left(-\frac{1}{8} \epsilon^{2} N\right)$
1 $2.77 \times 10^{-87}$
2 $5.54 \times 10^{-83}$
3 $2.98 \times 10^{-1}$   ✓
4 $2.29 \times 10^{-2}$


Explanation
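Plug the values into the formula: $4 \times (2 \times 10000 + 1) \times \exp\left(-\frac{1}{8} \times 0.1^{2} \times 10000\right) = 4 \times 20001 \times e^{-12.5} \approx$
0.2981471603789822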
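The arithmetic can be reproduced in a few lines. This is a minimal sketch; `vc_bound` and its `growth` argument are illustrative names, with the growth function passed in as a plain Python callable.

```python
from math import exp


def vc_bound(growth, N, eps):
    """Right-hand side of the VC bound: 4 * m_H(2N) * exp(-eps^2 * N / 8)."""
    return 4 * growth(2 * N) * exp(-eps ** 2 * N / 8)


# Positive rays: m_H(N) = N + 1.
bound = vc_bound(lambda n: n + 1, N=10000, eps=0.1)
print(bound)  # roughly 0.298
```

Even with 10000 examples the bound is about 0.3, which shows how loose the VC bound is for moderate $N$.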

Summary

This lecture covers the meaning and derivation of the bounding function $B(N,k)$ and the VC bound.


Lecture Summary


If $m_{\mathcal{H}}(N)$ has a break point and $N$ is large enough, then with high probability $E_{out} \approx E_{in}$.
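To get a feel for what "large enough" means, the sketch below evaluates the VC bound for 2D perceptrons, replacing $m_{\mathcal{H}}(2N)$ with its upper bound $B(2N, 4)$. The choice $\epsilon = 0.1$ and the sample sizes are arbitrary illustrations, and the function names are hypothetical.

```python
from math import comb, exp


def bounding_function(n, k=4):
    """B(n, k) as the sum of binomial coefficients C(n, i) for i < k."""
    return sum(comb(n, i) for i in range(k))


def vc_bound_2d_perceptron(N, eps=0.1):
    """VC bound with m_H(2N) replaced by its upper bound B(2N, 4)."""
    return 4 * bounding_function(2 * N) * exp(-eps ** 2 * N / 8)


for N in (10**3, 10**4, 10**5, 10**6):
    print(N, vc_bound_2d_perceptron(N))
```

For the smaller sample sizes the value exceeds 1 and says nothing; once $N$ is large enough the exponential term overwhelms the cubic growth and the bound collapses toward zero.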


Restriction of Break Point
  break point ‘breaks’ consequent points

Bounding Function: Basic Cases
   $B(N,k)$ bounds $m_{\mathcal{H}}(N)$ with break point $k$

Bounding Function: Inductive Cases
   $B(N,k)$ is $\operatorname{poly}(N)$

A Pictorial Proof
   $m_{\mathcal{H}}(N)$ can replace $M$ with a few changes

References

Machine Learning Foundations (机器学习基石), Hsuan-Tien Lin (林轩田)
