Lecture 5: Training versus Testing
Recap and Preview
Two Central Questions
1. make sure that $E_{\mathrm{out}}(g)$ is close enough to $E_{\mathrm{in}}(g)$;
2. make $E_{\mathrm{in}}(g)$ small enough.
Fun Time
Data size: how large do we need?
One way to use the inequality
$$\mathbb{P}\left[\left|E_{\mathrm{in}}(g)-E_{\mathrm{out}}(g)\right|>\epsilon\right] \leq \underbrace{2 \cdot M \cdot \exp\left(-2 \epsilon^{2} N\right)}_{\delta}$$
is to pick a tolerable difference $\epsilon$ as well as a tolerable BAD probability $\delta$, and then gather data with size $N$ large enough to achieve those tolerance criteria. Let $\epsilon = 0.1$, $\delta = 0.05$, and $M = 100$.
What is the data size needed?
1. 215   2. 415 ✓   3. 615   4. 815
Explanation
$$N=\frac{1}{2 \epsilon^{2}} \ln \frac{2 M}{\delta}$$

so $N = 414.7 \approx 415$.
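A quick numeric check of this formula, as a minimal Python sketch (the variable names are ours):

```python
import math

# N = (1 / (2 * epsilon^2)) * ln(2M / delta) for the quiz values
epsilon, delta, M = 0.1, 0.05, 100
N = math.log(2 * M / delta) / (2 * epsilon ** 2)
print(N)             # ~414.7
print(math.ceil(N))  # 415 examples suffice for these tolerances
```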
Effective Number of Lines
Uniform Bound
$$B_{m}: \left|E_{\mathrm{in}}(h_{m})-E_{\mathrm{out}}(h_{m})\right|>\epsilon$$
For similar hypotheses $h_i \approx h_j$: for most $\mathcal{D}$, $E_{\mathrm{in}}(h_i) = E_{\mathrm{in}}(h_j)$, and $E_{\mathrm{out}}(h_i) \approx E_{\mathrm{out}}(h_j)$.
So the union bound

$$\mathbb{P}_{\mathcal{D}}[\mathrm{BAD}\ \mathcal{D} \text{ for } h_1]+\mathbb{P}_{\mathcal{D}}[\mathrm{BAD}\ \mathcal{D} \text{ for } h_2]+\cdots+\mathbb{P}_{\mathcal{D}}[\mathrm{BAD}\ \mathcal{D} \text{ for } h_M] \quad (\text{union bound})$$

is over-estimating: the BAD events overlap heavily. To merge the overlapping parts, we group similar hypotheses into kinds.
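To see the slack concretely, here is a small Monte Carlo sketch (our own toy setup, not from the lecture): a noiseless 1-D target $f(x)=\operatorname{sign}(x)$, uniform inputs on $[-1,1]$, and two nearly identical threshold hypotheses, so their BAD events almost coincide and the union bound roughly doubles the true probability:

```python
import random

random.seed(0)
N, eps, trials = 50, 0.1, 20000

def e_in(t, xs):
    # in-sample error of h(x) = sign(x - t) against f(x) = sign(x)
    return sum((x >= t) != (x >= 0) for x in xs) / len(xs)

def e_out(t):
    # h and f disagree exactly on the interval between 0 and t,
    # which has probability |t| / 2 under the uniform distribution on [-1, 1]
    return abs(t) / 2

bad1 = bad2 = bad_any = 0
for _ in range(trials):
    xs = [random.uniform(-1, 1) for _ in range(N)]
    b1 = abs(e_in(0.30, xs) - e_out(0.30)) > eps
    b2 = abs(e_in(0.32, xs) - e_out(0.32)) > eps  # h2 nearly equals h1
    bad1 += b1; bad2 += b2; bad_any += (b1 or b2)

# the union-bound estimate is about twice the actual BAD probability here
print(bad1 / trials + bad2 / trials, ">=", bad_any / trials)
```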
Many Lines
$$\mathcal{H}=\left\{\text{all lines in } \mathbb{R}^{2}\right\}$$
Effective Number of Hypotheses
1 point: 2 kinds of lines
2 points: 4 kinds
3 points: at most 8 kinds (only 6 when the three points are collinear)
4 points: at most 14 kinds
Since $\text{effective}(N) \ll 2^N$, we can replace the infinite $M$ with the effective number of kinds, $\text{effective}(N)$.
Fun Time
What is the effective number of lines for five inputs $\in \mathbb{R}^2$?
1. 14   2. 16   3. 22 ✓   4. 32
Explanation
There are 32 ($2^5$) labelings in total. 5 O's: 1 kind; 4 O's and 1 X: 5 kinds; 3 O's and 2 X's: of the $C_5^3 = 10$ labelings, only 5 can be realized by a line. Since O and X are symmetric, the effective number of kinds is $2 \times (1+5+5) = 22$.
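A brute-force count confirms these numbers, under the assumption that the inputs are in convex position (e.g., on a circle); a labeling is then realizable by a line iff each class occupies one contiguous arc, i.e. the circular label sequence has at most 2 sign changes:

```python
from itertools import product

def effective_lines(N):
    # count labelings whose circular sequence has at most 2 sign changes
    count = 0
    for labels in product([+1, -1], repeat=N):
        changes = sum(labels[i] != labels[(i + 1) % N] for i in range(N))
        if changes <= 2:
            count += 1
    return count

for N in range(1, 6):
    print(N, effective_lines(N))  # 2, 4, 8, 14, 22
```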
Growth Function
$\text{effective}(N) \ll 2^N$, so we hope to make the replacement $M \Rightarrow \text{effective}(N)$. The growth function $m_{\mathcal{H}}(N)$ names this quantity: the maximum number of dichotomies that $\mathcal{H}$ can generate on any $N$ inputs.
Growth Function for Positive Rays
$$h(x)=\operatorname{sign}(x-a)$$
$$m_{\mathcal{H}}(N)=N+1 \ll 2^{N}$$

($N$ points cut the line into $N+1$ regions for the threshold $a$.)
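This count is easy to verify by enumeration (a small sketch; the function name is ours): with the threshold placed in each of the $N+1$ gaps of the sorted points, the +1 labels always form a suffix.

```python
def positive_ray_dichotomies(N):
    # threshold in gap k labels points k, k+1, ..., N-1 as +1
    return {tuple(i >= k for i in range(N)) for k in range(N + 1)}

for N in range(1, 6):
    print(N, len(positive_ray_dichotomies(N)))  # N + 1
```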
Growth Function for Positive Intervals
$$h(x)=+1 \text{ if } x \in[\ell, r), \; -1 \text{ otherwise}$$
$$m_{\mathcal{H}}(N)=\frac{1}{2} N^{2}+\frac{1}{2} N+1 \ll 2^{N}$$

(Choose the interval's two ends among the $N+1$ gaps, $\binom{N+1}{2}$ ways, plus the all-$\times$ dichotomy.)
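Again verifiable by enumerating interval placements over the $N+1$ gaps (a sketch with our own function name):

```python
def positive_interval_dichotomies(N):
    # the +1 labels form a contiguous run; include the empty run too
    dichos = {tuple([False] * N)}
    for l in range(N):                 # left end in gap l
        for r in range(l + 1, N + 1):  # right end in gap r
            dichos.add(tuple(l <= i < r for i in range(N)))
    return dichos

for N in range(1, 6):
    print(N, len(positive_interval_dichotomies(N)))  # N*(N+1)//2 + 1
```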
Growth Function for Convex Sets
Every dichotomy can be implemented:

$$m_{\mathcal{H}}(N)=2^{N}$$

(Place the $N$ inputs on a circle; for any dichotomy, the convex hull of the +1-labeled points contains exactly those points, so every dichotomy is realized.) When $m_{\mathcal{H}}(N)=2^{N}$, we call those $N$ inputs 'shattered' by $\mathcal{H}$.
Fun Time
Consider positive and negative rays as $\mathcal{H}$, which is equivalent to the perceptron hypothesis set in 1D. The hypothesis set is often called 'decision stump' to describe the shape of its hypotheses. What is the growth function $m_{\mathcal{H}}(N)$?
1. $N$   2. $N+1$   3. $2N$ ✓   4. $2^N$
Explanation
Positive and negative rays each contribute $N-1$ non-constant dichotomies; adding the all-X and all-O patterns gives $2 \times (N-1) + 2 = 2N$.
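The same enumeration trick verifies the $2N$ count (a sketch; names are ours): a decision stump is a positive ray or its flip.

```python
def decision_stump_dichotomies(N):
    # h(x) = s * sign(x - a), with s in {+1, -1}
    dichos = set()
    for k in range(N + 1):
        pattern = tuple(i >= k for i in range(N))
        dichos.add(pattern)                        # s = +1
        dichos.add(tuple(not p for p in pattern))  # s = -1
    return dichos

for N in range(1, 6):
    print(N, len(decision_stump_dichotomies(N)))  # 2 * N
```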
Break Point
If $m_{\mathcal{H}}$ is polynomial, $O(N^{k-1})$, we can use it to replace $M$.
If $m_{\mathcal{H}}(k)<2^{k}$, call $k$ a break point for $\mathcal{H}$.
If $k$ is a break point, then $k+1, k+2, k+3, \ldots$ are also break points, so we take $k$ to be the minimum break point.
Fun Time
Consider positive and negative rays as $\mathcal{H}$, which is equivalent to the perceptron hypothesis set in 1D. As discussed in an earlier quiz question, the growth function $m_{\mathcal{H}}(N) = 2N$. What is the minimum break point for $\mathcal{H}$?
1. 1   2. 2   3. 3 ✓   4. 4
Explanation
$$m_{\mathcal{H}}(N) = 2N: \quad 2 \times 2 = 4 = 2^2, \quad 2 \times 3 = 6 < 2^3$$

So the minimum break point (the 'ray of hope') for positive and negative rays is 3.
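Given a growth function, the minimum break point can be found by direct search (a sketch; the helper is ours):

```python
def min_break_point(m_H, max_N=30):
    # smallest k with m_H(k) < 2^k, or None if none exists up to max_N
    for k in range(1, max_N + 1):
        if m_H(k) < 2 ** k:
            return k
    return None

print(min_break_point(lambda N: 2 * N))              # 3: positive/negative rays
print(min_break_point(lambda N: N + 1))              # 2: positive rays
print(min_break_point(lambda N: N * (N + 1) // 2 + 1))  # 3: positive intervals
print(min_break_point(lambda N: 2 ** N))             # None: convex sets shatter
```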
Summary
Lecture summary
If $m_{\mathcal{H}}$ is polynomial, $O(N^{k-1})$, we can replace $M$ with it and lower the upper bound for the hypothesis set. The minimum break point $k$ determines the degree of that polynomial.
Recap and Preview
two questions: $E_{\mathrm{out}}(g) \approx E_{\mathrm{in}}(g)$, and $E_{\mathrm{in}}(g) \approx 0$
Effective Number of Lines
at most 14 through the eye of 4 inputs
Effective Number of Hypotheses
at most $m_{\mathcal{H}}(N)$ through the eye of $N$ inputs
Break Point
when $m_{\mathcal{H}}(N)$ becomes 'non-exponential'
References
《Machine Learning Foundations》(机器学习基石)—— Hsuan-Tien Lin (林轩田)