Caltech machine learning, video 13 notes (validation)

8:58 2014-10-09
start Caltech machine learning, video 13


validation


9:57 2014-10-09
outline:


* validation set


* model selection


* cross validation


10:03 2014-10-09
Validation vs. regularization


Eout(h) = Ein(h) + overfit penalty


regularization estimates "overfit penalty"


validation estimates "Eout(h)"


10:08 2014-10-09
Eval(h) // validation error


// this will be a good estimate of the out-of-sample performance
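

The definition behind this note, in the lecture's notation: for a validation set Dval with K points,

Eval(h) = (1/K) Σ_{(xn, yn) ∈ Dval} e(h(xn), yn)

where e(·, ·) is the pointwise error (squared error, binary error, etc.).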


10:13 2014-10-09
K is taken out of N 


// the validation set is different from the training set


10:18 2014-10-09
K points => validation


N-K points => training


10:18 2014-10-09
Dval, Dtrain
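

A minimal sketch of the split (my variable names, assuming NumPy arrays; not code from the lecture):

```python
import numpy as np

def split_data(X, y, K, seed=0):
    """K points -> validation (Dval), N - K points -> training (Dtrain)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))      # shuffle indices so the split is random
    val, train = idx[:K], idx[K:]
    return X[train], y[train], X[val], y[val]
```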


10:22 2014-10-09
small K => g- stays close to g, but Eval(g-) is an unreliable (noisy) estimate


large K => Eval(g-) reliably estimates Eout(g-), but g- is trained on fewer points, so it is a worse hypothesis


10:26 2014-10-09
why not put K back into the original N?


10:26 2014-10-09
we call it validation because we use it to make choices


10:34 2014-10-09
Dval is used to make learning choices


If an estimate of Eout affects learning, the set is a validation set, not a test set


10:36 2014-10-09
early stopping


10:36 2014-10-09
this is going up, I better stop here
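

A sketch of that stopping rule (train_epoch and val_error are hypothetical callables, one optimization step and Eval on Dval respectively; the names are mine):

```python
def early_stopping(model, train_epoch, val_error, max_epochs=1000, patience=10):
    """Stop training once the validation error stops improving."""
    best_err, since_best = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch(model)        # one more step of minimizing Ein
        err = val_error(model)    # Eval: our estimate of Eout
        if err < best_err:
            best_err, since_best = err, 0
        else:
            since_best += 1
            if since_best >= patience:
                break             # "this is going up, I better stop here"
    return best_err
```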


10:37 2014-10-09
What is the difference?


* Test set is unbiased;


* validation set has optimistic bias


10:39 2014-10-09
e1 is an unbiased estimate of out-of-sample error


10:42 2014-10-09
unbiased means the expected value is what it should be


10:42 2014-10-09
Error estimates e1 & e2


Pick h ∈ {h1, h2} with e = min(e1, e2)


what is the expectation of e: E(e)?
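

A toy calculation (my example, not the lecture's) that makes the answer concrete: let e1 and e2 be independent and uniform on [0, 1], so each is an unbiased estimate with expected value 1/2. Then

E(e) = E[min(e1, e2)] = ∫₀¹ P(min(e1, e2) > t) dt = ∫₀¹ (1 - t)² dt = 1/3 < 1/2

so the minimum of two unbiased estimates is biased downward, i.e. optimistic.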


10:45 2014-10-09
now we realize that this is an optimistic bias


10:46 2014-10-09
fortunately for us, the utility of validation in machine learning is so great that we're going to swallow the bias


10:47 2014-10-09
so with this understanding, let's use validation for model selection, which is what validation sets are for


10:48 2014-10-09
the choice of λ happens to be a manifestation of this


10:48 2014-10-09
Using Dval more than once


10:49 2014-10-09
that's a choice between models


10:50 2014-10-09
they have a little minus sign (g-) because I'm training on Dtrain


10:53 2014-10-09
so these are done without any validation, just trained on a reduced set.


10:53 2014-10-09
once I get them, I'm going to evaluate the performance


10:54 2014-10-09
these are "validation errors"


10:54 2014-10-09
your model selection is to look at these errors, which are supposed to reflect the out-of-sample performance if you use this as your final product


10:57 2014-10-09
you pick the smallest of them, now you have a bias


10:57 2014-10-09
now we realize it has an optimistic bias


10:58 2014-10-09
we're now going back to our full data set


10:58 2014-10-09
restore your full D as we did before


10:59 2014-10-09
so this is the algorithm for model selection
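

A sketch of that algorithm (fit and error are hypothetical helpers: fit(m, X, y) trains model m and returns a hypothesis, error(g, X, y) returns its average error):

```python
def select_model(models, fit, error, Xtr, ytr, Xval, yval, X, y):
    """Train every model on Dtrain, pick the smallest Eval, retrain on all of D."""
    finalists = [fit(m, Xtr, ytr) for m in models]     # the gm- hypotheses
    e_val = [error(g, Xval, yval) for g in finalists]  # validation errors
    m_star = min(range(len(models)), key=e_val.__getitem__)
    return fit(models[m_star], X, y)                   # restore full D, output g
```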


10:59 2014-10-09
so I'm going to run an experiment to show you the bias


11:00 2014-10-09
not because it has inherently good performance, but because you looked for the one with a good performance


11:01 2014-10-09
validation set size


11:02 2014-10-09
and after that, I look at the actual out-of-sample error


11:03 2014-10-09 
I'd like to ask you 2 questions:


* why do the curves go up?


* why are the 2 curves getting closer together?


11:06 2014-10-09
because when I use more points for validation, I use fewer for training, so the hypothesis I end up with is worse


11:07 2014-10-09
how much bias depends on the factors involved, but the bias is there


11:11 2014-10-09
I'm using the validation set to estimate the Eout


11:12 2014-10-09
the validation set (Dval) is used for "training" on the "finalist" models


11:16 2014-10-09
if you have a decent validation set (of size K), then your estimate will not be that far from Eout (the out-of-sample error)
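

The standard bound behind that remark (my addition, not verbatim from the lecture): choosing among M finalist models with K validation points is like "learning" with a finite hypothesis set of size M, so

Eout(gm*) ≤ Eval(gm*) + O(√(ln M / K))

and a decent K keeps the penalty term small.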


11:25 2014-10-09
so I'm choosing when to stop


11:25 2014-10-09
the training of the network tries to choose the weights of the network


11:27 2014-10-09
validation error is a reasonable estimate of the 


out-of-sample error that we can rely on


11:28 2014-10-09
data contamination:


if you use the data for making choices, you're contaminating it as far as its ability to gauge the real performance is concerned


11:31 2014-10-09
contamination: optimistic (deceptive) bias


11:32 2014-10-09
you're trying to measure what is the level of contamination


11:33 2014-10-09
we have a great Ein, and we know Ein is no indication


of Eout, this has been contaminated to death


11:34 2014-10-09
when you go to the 'test set', this is totally clean,


there is no bias here


11:35 2014-10-09
Ein   // in-sample error


Etest // test set error


Eval  // validation error


11:36 2014-10-09
the validation set is in between, it's slightly 


contaminated.


11:36 2014-10-09
now we go to 'cross validation', a very sweet regime


11:38 2014-10-09
the dilemma about K


11:40 2014-10-09
the fluctuation around the estimate we want


11:39 2014-10-09
Eout(g) // g is the hypothesis we're going to report


11:42 2014-10-09
Eout(g-) 


// this is the proper out-of-sample error, but for the hypothesis trained on the reduced set


11:42 2014-10-09
Eout(g) ≈ Eout(g-) ≈ Eval(g-)


Eout(g)  // this is what we want


Eout(g-) // this is unknown to me


Eval(g-) // this is what I'm working with


11:43 2014-10-09
I want K to be small so that: Eout(g) ≈ Eout(g-)


11:45 2014-10-09
but also I want K to be large, because  Eout(g-) ≈ Eval(g-)


11:45 2014-10-09
can we have K both small & large?


11:46 2014-10-09
leave one out, leave more out


11:46 2014-10-09
I'm going to use N-1 points for training,


and 1 point for validation


11:47 2014-10-09
I'm going to create a reduced set from D, called Dn


11:48 2014-10-09
this one(the taken out) will be the one I use for validation


11:48 2014-10-09
let's look at the validation error


11:49 2014-10-09
in this case, the validation error is based on just 1 point


11:49 2014-10-09
what happens if I repeat this exercise for different


small n?


11:50 2014-10-09
so in spite of these being different hypotheses, they all come from training on N-1 points


11:53 2014-10-09
I'm going to define the cross validation error: Ecv
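

Concretely, en = e(gn-(xn), yn) is the error on the left-out point, and

Ecv = (1/N) Σ_{n=1..N} en

A minimal leave-one-out sketch (fit and err are hypothetical helpers, as before):

```python
import numpy as np

def loo_cv_error(X, y, fit, err):
    """Ecv: train on N-1 points, validate on the point left out, average over n."""
    N = len(y)
    e = np.empty(N)
    for n in range(N):
        mask = np.arange(N) != n            # Dn = D minus the point (xn, yn)
        g_minus = fit(X[mask], y[mask])     # gn-, trained on N-1 points
        e[n] = err(g_minus, X[n:n+1], y[n:n+1])  # en: error on the left-out point
    return float(e.mean())                  # Ecv
```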


11:53 2014-10-09
the catch is that these are not independent; each of them is affected by the others


11:55 2014-10-09
It's remarkably good at getting it


11:56 2014-10-09
let's just estimate the out-of-sample error 


using the cross validation method


11:57 2014-10-09
and we take an average performance of these


as an indication of what will happen out of sample


12:01 2014-10-09
we're training on only 2 points here; when we're done, we're using 3 points


12:02 2014-10-09
but think of 99/100, who cares?


12:02 2014-10-09
so let's use this for model selection


12:02 2014-10-09
model selection using CV // CV == Cross Validation
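

A sketch of that, reusing loo_cv_error from above (make_fit(m) is a hypothetical helper returning a training function for model m, e.g. for one value of λ):

```python
def select_model_cv(models, make_fit, err, X, y):
    """Pick the model whose cross validation error Ecv is smallest."""
    ecv = [loo_cv_error(X, y, make_fit(m), err) for m in models]
    m_star = min(range(len(models)), key=ecv.__getitem__)
    return make_fit(models[m_star])(X, y)   # retrain the winner on all of D
```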


12:03 2014-10-09
we'd like to find a separating surface


12:07 2014-10-09
Ecv tracks Eout very nicely


12:09 2014-10-09
if I use it as a criterion for model choice


12:10 2014-10-09
let me cut off at six, and see what the performance is like


// early stop


12:10 2014-10-09
without validation, I'm using the full model


12:11 2014-10-09
with validation, you stop at 6, because the cross validation tells you to do so; it's a nice smooth surface


12:12 2014-10-09
I don't care about driving the in-sample error to zero; that's harmful in some cases


12:12 2014-10-09
so now you can see why validation is seen in this context as similar to regularization: it does the same thing, it prevents overfitting, but it prevents overfitting by estimating the out-of-sample error (Eout) rather than estimating something else


12:16 2014-10-09
we seldom use leave-one-out in real problems


12:18 2014-10-09
take more points for validation


12:18 2014-10-09
Leave more than one out


12:18 2014-10-09
what you do is you take your data set and just break it into several folds


12:18 2014-10-09
exactly the same, except that here I'm taking out a chunk of points at a time


12:20 2014-10-09
this is what I recommend to you:


10-fold cross validation
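

A sketch of the 10-fold version (same hypothetical fit/err helpers; err returns the average error on the held-out chunk):

```python
import numpy as np

def kfold_cv_error(X, y, fit, err, k=10, seed=0):
    """Break D into k folds; each fold is the validation chunk exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # chunks of ~N/k points
    errs = []
    for val in folds:
        train = np.setdiff1d(np.arange(len(y)), val)    # the other k-1 folds
        g_minus = fit(X[train], y[train])
        errs.append(err(g_minus, X[val], y[val]))       # error on held-out chunk
    return float(np.mean(errs))
```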


-----------------------------------------------
13:29 2014-10-09
both validation & cross validation have bias


for the same reason