Caltech machine learning, video 12 notes (Regularization)

10:05 2014-10-08
start Caltech machine learning, video 11


regularization


10:05 2014-10-08
overfitting: we're fitting the data all too well


at the expense of the out-of-sample performance


10:06 2014-10-08
if you think of what the VC analysis told us,


the VC analysis told us that given the data resources &


the complexity of the hypothesis set, with nothing said


about the target; given those, we can predict the level


of generalization as a bound.


10:08 2014-10-08
data resource + VC dimension => level of generalization
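
This is the bound from the VC analysis (my shorthand; Ω here is the generalization-error bar that grows with the VC dimension and shrinks with N):

Eout(h) <= Ein(h) + Ω(N, H, δ)   // holds with probability >= 1 - δ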


10:09 2014-10-08
source of the overfitting is to fit the noise


11:15 2014-10-08
stochastic noise/deterministic noise


11:16 2014-10-08
deterministic noise is a function of the limitations of your model


11:17 2014-10-08
regularization: the 1st cure for overfitting


11:18 2014-10-08
outline:


* Regularization - informal


* Regularization - formal


* Weight decay


* Choosing a regularizer


11:19 2014-10-08
unconstrained solution: minimize Ein //in-sample error
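
Written out for the linear model with a nonlinear transform Z (as in the lecture's polynomial example), the unconstrained problem and its one-step solution are:

Ein(w) = 1/N * ||Z w - y||²
Wlin = (Zᵀ Z)⁻¹ Zᵀ y    // assuming Zᵀ Z is invertible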


11:37 2014-10-08
let's look at the constrained version: what happens


if we constrain the weights


11:40 2014-10-08
so here is the constraint we're going to work with


11:41 2014-10-08
I have a smaller hypothesis set, so the VC dimension is going


in the direction of being smaller, so I'm in a better position


as far as generalization is concerned.


11:44 2014-10-08
constraining the weights:


* Hard constraint


* Soft-order constraint


11:45 2014-10-08
Wreg instead of Wlin


11:46 2014-10-08
you minimize this subject to the constraint
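
Written out, the soft-order constraint problem is (C is a budget on the total weight size):

minimize    Ein(w) = 1/N * ||Z w - y||²
subject to  wᵀ w <= C
solution:   Wreg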


11:47 2014-10-08
KKT (Karush-Kuhn-Tucker conditions)


11:47 2014-10-08
I have 2 things here: I have the error surface I'm trying


to minimize, and I have the constraint.


11:49 2014-10-08
I'm going to put contours where the in-sample error is constant


11:50 2014-10-08
let's take a point on the surface


11:52 2014-10-08
let's look at the gradient of the objective function


11:53 2014-10-08
gradient of the objective function will give me a good


idea about the direction to move in order to minimize the


objective function


11:54 2014-10-08
moving along the circle will change the value of Ein


11:58 2014-10-08
Augmented error: Eaug(w)
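
The geometric/KKT argument gives the condition at the solution: the gradient of Ein must point opposite to Wreg (normal to the constraint circle). Writing the proportionality constant as 2λ/N, this is the same as minimizing an augmented error with no constraint:

∇Ein(Wreg) + 2λ/N * Wreg = 0
⇔  minimize  Eaug(w) = Ein(w) + λ/N * wᵀ w    // unconstrained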


12:05 2014-10-08
regularization term


12:06 2014-10-08
I use a subset of the hypothesis set and I expect good 


generalization.


12:07 2014-10-08
one-step learning, including regularization
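
A minimal sketch of that one-step solution with weight decay, assuming numpy and placeholder names Z (transformed data matrix), y (targets) and lam (the regularization parameter λ):

import numpy as np

def weight_decay_solution(Z, y, lam):
    # one-step solution: Wreg = (Zᵀ Z + λ I)⁻¹ Zᵀ y
    n_cols = Z.shape[1]
    A = Z.T @ Z + lam * np.eye(n_cols)
    return np.linalg.solve(A, Z.T @ y)   # solve instead of an explicit inverse

# usage sketch:
# Wreg = weight_decay_solution(Z, y, lam=0.01)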


12:13 2014-10-08
let's apply it and see the results in the real case.


12:14 2014-10-08
so the medicine is working, a small dose of medicine


did the job


12:16 2014-10-08
I think we're overdosing here.


12:16 2014-10-08
if you keep increasing λ => overdose !!!


12:17 2014-10-08
the choice of λ is extremely critical


12:18 2014-10-08
the good news is that this won't just be a heuristic choice


12:18 2014-10-08
the choice of λ will be extremely principled, based on validation


12:19 2014-10-08
we went to another extreme: now we're "underfitting"


12:20 2014-10-08
overfitting => underfitting


12:20 2014-10-08
the proper choice of λ is important


12:21 2014-10-08
the most famous regularizer is "weight decay"


12:21 2014-10-08
we know that in neural networks you don't have a neat


closed-form solution, you use gradient descent


12:22 2014-10-08
batch gradient descent => stochastic gradient descent (SGD)


12:23 2014-10-08
I'm in the weight space & this is my weight, and 


here is the direction that backpropagation suggests to move in.


12:27 2014-10-08
it used to be that, without regularization, I would move from here to here


12:27 2014-10-08
shrinking & moving


12:28 2014-10-08
the weights decay from one step to the next; hence the name 'weight decay'
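
Written out, the batch gradient-descent step with weight decay is (η is the learning rate):

w(t+1) = w(t) - η * ∇Ein(w(t)) - η * 2λ/N * w(t)
       = (1 - 2ηλ/N) * w(t) - η * ∇Ein(w(t))    // shrink the weights, then move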


12:29 2014-10-08
weight space


12:31 2014-10-08
some weights are more important than others


12:31 2014-10-08
low-order fit


12:33 2014-10-08
Tikhonov regularizer
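
The general quadratic (Tikhonov) form covers both plain weight decay and the low-order/high-order emphasis above:

Ω(w) = wᵀ Γᵀ Γ w    // Γ = I gives plain weight decay; a diagonal Γ weights individual w_q differently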


12:34 2014-10-08
regularization parameter λ


12:36 2014-10-08
you have to use the regularizer, because without 


the regularizer, you're going to get overfitting


12:38 2014-10-08
but there are guidelines to choose the regularizer


12:38 2014-10-08
after you choose the regularizer, there is still the choice of


the λ


12:39 2014-10-08
practical rule:


stochastic noise is 'high-frequency'


deterministic noise is also non-smooth


12:41 2014-10-08
because of this, here is the guideline for


choosing regularizer:


=> constrain learning towards smoother hypotheses


12:42 2014-10-08
regularization is a cure, and the cure has a side-effect


12:42 2014-10-08
it's a cure for fitting the noise


12:43 2014-10-08
punishing the noise more than you punish the signal


12:43 2014-10-08
in most parameterizations, small weights correspond 


to smoother hypotheses; that's why 'weight decay' (favoring small weights)


works well in those cases.


12:45 2014-10-08
general form of augmented error


calling the regularizer Ω = Ω(h)


12:46 2014-10-08
we minimize 


Eaug(h) = Ein(h) + λ/N * Ω(h) 


// this is what we minimize
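
Compare with the VC-style bound: the augmented error has the same form, with the regularizer Ω(h) playing the role of the complexity term Ω(H):

Eout(h) <= Ein(h) + Ω(H)           // generalization bound
Eaug(h)  = Ein(h) + λ/N * Ω(h)     // what we minimize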


12:47 2014-10-08
Eaug is better than Ein as a proxy for Eout


12:50 2014-10-08
augmented error(Eaug) is better than Ein for approximating Eout


12:51 2014-10-08
we found a better proxy for the out-of-sample(Eout)


12:51 2014-10-08
how do we choose the regularizer?


mainly a heuristic choice


12:52 2014-10-08
perfect hypothesis set


12:52 2014-10-08
the perfect regularizer Ω:


constrain in the 'direction' of the target function


12:52 2014-10-08
regularization is an attempt to reduce overfitting


12:55 2014-10-08
harms the overfitting (noise) more than the fitting


12:56 2014-10-08
guidelines:


move in the direction of smoother/simpler


12:56 2014-10-08
we have the error function for the movie rating


12:57 2014-10-08
the notion of simple here is very interesting


12:59 2014-10-08
now you regularize toward the simpler solution


13:04 2014-10-08
what happens if you choose a bad Ω? // Ω is the regularizer


we don't worry too much, because we have the saving grace 


of λ; we're going to use validation


13:06 2014-10-08
if validation tells us it's harmful, we'll factor


the regularizer out of the game altogether.


13:08 2014-10-08
neural network regularizer


13:09 2014-10-08
weight decay


13:09 2014-10-08
so we have this big network, layer upon layer upon layer...


13:11 2014-10-08
I'm looking at the functionality that I'm implementing


13:12 2014-10-08
as you increase the weight, you're going to enter the more


interesting nonlinearity here.


13:12 2014-10-08
you're going from the most simple to the most complex


13:13 2014-10-08
weight decay: from linear (small weights) to 'logical' (large, saturated weights)


13:13 2014-10-08
weight elimination:


fewer weights => smaller VC dimension
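
The soft version of weight elimination uses (β is a scale parameter):

Ω(w) = Σ_q  w_q² / (β² + w_q²)    // ≈ w_q²/β² for small weights (decay), ≈ 1 for large weights (counting weights)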


13:15 2014-10-08
early stopping as a regularizer
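
A rough sketch of early stopping (my own illustration, not from a slide): keep training while the validation error improves, and return the weights from the best point. The names model, train_one_epoch and validation_error are placeholders.

import copy

def train_with_early_stopping(model, train_one_epoch, validation_error,
                              max_epochs=1000, patience=10):
    best_err = float("inf")
    best_model = copy.deepcopy(model)
    epochs_since_best = 0
    for _ in range(max_epochs):
        train_one_epoch(model)              # one pass of gradient descent / SGD
        err = validation_error(model)       # error on a held-out validation set
        if err < best_err:
            best_err, best_model = err, copy.deepcopy(model)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:   # stop when validation error stops improving
                break
    return best_model, best_err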


13:17 2014-10-08
regularization through the optimizer


13:18 2014-10-08
the optimal λ:


as you increase the noise, you need more regularization


-----------------------------------------------------
13:38 2014-10-08
there are regularizers that stood the test of time


13:38 2014-10-08
machine learning is somewhere between theory & practice

