Caltech machine learning, video 12 notes (Regularization)

10:05 2014-10-08
start Caltech machine learning, video 11


regularization


10:05 2014-10-08
overfitting: we're fitting the data all too well


at the expense of the out-of-sample performance


10:06 2014-10-08
if you think of what the VC analysis told us,


the VC analysis told us that given the data resources &


the complexity of the hypothesis set, with nothing said


about the target; given those, we can predict the level


of generalization as a bound.


10:08 2014-10-08
data resource + VC dimension => level of generalization
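
This is the bound from the VC analysis (my shorthand; Ω here is the generalization-error bar that grows with the VC dimension and shrinks with N):

Eout(h) <= Ein(h) + Ω(N, H, δ)   // holds with probability >= 1 - δ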


10:09 2014-10-08
source of the overfitting is to fit the noise


11:15 2014-10-08
stochastic noise/deterministic noise


11:16 2014-10-08
deterministic noise is a function of the limitations of your model


11:17 2014-10-08
regularization: the 1st cure for overfitting


11:18 2014-10-08
outline:


* Regularization - informal


* Regularization - formal


* Weight decay


* Choosing a regularizer


11:19 2014-10-08
unconstrained solution: minimize Ein //in-sample error
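
Written out for the linear model with a nonlinear transform Z (as in the lecture's polynomial example), the unconstrained problem and its one-step solution are:

Ein(w) = 1/N * ||Z w - y||²
Wlin = (Zᵀ Z)⁻¹ Zᵀ y    // assuming Zᵀ Z is invertible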


11:37 2014-10-08
let's look at the constrained version: what happens


if we constrain the weights


11:40 2014-10-08
so here is the constraint we're going to work with


11:41 2014-10-08
I have a smaller hypothesis set, so the VC dimension is going


in the direction of being smaller, so I'm in a better position


as far as generalization is concerned.


11:44 2014-10-08
constraining the weights:


* Hard constraint


* Soft-order constraint


11:45 2014-10-08
Wreg instead of Wlin


11:46 2014-10-08
you minimize this subject to the constraint
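
Written out, the soft-order constraint problem is (C is a budget on the total weight size):

minimize    Ein(w) = 1/N * ||Z w - y||²
subject to  wᵀ w <= C
solution:   Wreg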


11:47 2014-10-08
KKT (Karush-Kuhn-Tucker conditions)


11:47 2014-10-08
I have 2 things here: I have the error surface I'm trying


to minimize, and I have the constraint.


11:49 2014-10-08
I'm going to put contours where the in-sample error is constant


11:50 2014-10-08
let's take a point on the surface


11:52 2014-10-08
let's look at the gradient of the objective function


11:53 2014-10-08
gradient of the objective function will give me a good


idea about the direction to move in order to minimize the


objective function


11:54 2014-10-08
moving along the circle will change the value of Ein


11:58 2014-10-08
Augmented error: Eaug(w)
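
The geometric/KKT argument gives the condition at the solution: the gradient of Ein must point opposite to Wreg (normal to the constraint circle). Writing the proportionality constant as 2λ/N, this is the same as minimizing an augmented error with no constraint:

∇Ein(Wreg) + 2λ/N * Wreg = 0
⇔  minimize  Eaug(w) = Ein(w) + λ/N * wᵀ w    // unconstrained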


12:05 2014-10-08
regularization term


12:06 2014-10-08
I use a subset of the hypothesis set and I expect good 


generalization.


12:07 2014-10-08
one-step learning, including regularization
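
A minimal sketch of that one-step solution with weight decay, assuming numpy and placeholder names Z (transformed data matrix), y (targets) and lam (the regularization parameter λ):

import numpy as np

def weight_decay_solution(Z, y, lam):
    # one-step solution: Wreg = (Zᵀ Z + λ I)⁻¹ Zᵀ y
    n_cols = Z.shape[1]
    A = Z.T @ Z + lam * np.eye(n_cols)
    return np.linalg.solve(A, Z.T @ y)   # solve instead of an explicit inverse

# usage sketch:
# Wreg = weight_decay_solution(Z, y, lam=0.01)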


12:13 2014-10-08
let's apply it and see the results in the real case.


12:14 2014-10-08
so the medicine is working, a small dose of medicine


did the job


12:16 2014-10-08
I think we're overdosing here.


12:16 2014-10-08
if you keep increasing λ => overdose !!!


12:17 2014-10-08
the choice of λ is extremely critical


12:18 2014-10-08
the good news is that this won't just be a heuristic choice


12:18 2014-10-08
the choice of λ will be extremely principled, based on validation


12:19 2014-10-08
we went to another extreme: now we're "underfitting"


12:20 2014-10-08
overfitting => underfitting


12:20 2014-10-08
the proper choice of λ is important


12:21 2014-10-08
the most famous regularizer is "weight decay"


12:21 2014-10-08
we know that in neural networks you don't have a neat


closed-form solution, you use gradient descent


12:22 2014-10-08
batch gradient descent => stochastic gradient descent (SGD)


12:23 2014-10-08
I'm in the weight space & this is my weight, and 


here is the direction that backpropagation suggests to move in.


12:27 2014-10-08
it used to be that, without regularization, I would move from here to here


12:27 2014-10-08
shrinking & moving


12:28 2014-10-08
the weights decay from one step to the next; hence the name 'weight decay'
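
Written out, the batch gradient-descent step with weight decay is (η is the learning rate):

w(t+1) = w(t) - η * ∇Ein(w(t)) - η * 2λ/N * w(t)
       = (1 - 2ηλ/N) * w(t) - η * ∇Ein(w(t))    // shrink the weights, then move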


12:29 2014-10-08
weight space


12:31 2014-10-08
some weights are more important than others


12:31 2014-10-08
low-order fit


12:33 2014-10-08
Tikhonov regularizer
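
The general quadratic (Tikhonov) form covers both plain weight decay and the low-order/high-order emphasis above:

Ω(w) = wᵀ Γᵀ Γ w    // Γ = I gives plain weight decay; a diagonal Γ weights individual w_q differently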


12:34 2014-10-08
regularization parameter λ


12:36 2014-10-08
you have to use the regularizer, because without 


the regularizer, you're going to get overfitting


12:38 2014-10-08
but there are guidelines to choose the regularizer


12:38 2014-10-08
after you choose the regularizer, there is still the choice of


the λ


12:39 2014-10-08
practical rule:


stochastic noise is 'high-frequency'


deterministic noise is also non-smooth


12:41 2014-10-08
because of this, here is the guideline for


choosing regularizer:


=> constrain learning towards smoother hypotheses


12:42 2014-10-08
regularization is a cure, and the cure has a side-effect


12:42 2014-10-08
it's a cure for fitting the noise


12:43 2014-10-08
punishing the noise more than you punish the signal


12:43 2014-10-08
in most parameterizations, small weights correspond 


to smoother hypotheses; that's why 'weight decay' (favoring small weights)


works well in those cases.


12:45 2014-10-08
general form of augmented error


calling the regularizer Ω = Ω(h)


12:46 2014-10-08
we minimize 


Eaug(h) = Ein(h) + λ/N * Ω(h) 


// this is what we minimize
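
Compare with the VC-style bound: the augmented error has the same form, with the regularizer Ω(h) playing the role of the complexity term Ω(H):

Eout(h) <= Ein(h) + Ω(H)           // generalization bound
Eaug(h)  = Ein(h) + λ/N * Ω(h)     // what we minimize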


12:47 2014-10-08
Eaug is better than Ein as a proxy for Eout


12:50 2014-10-08
augmented error(Eaug) is better than Ein for approximating Eout


12:51 2014-10-08
we found a better proxy for the out-of-sample(Eout)


12:51 2014-10-08
how do we choose the regularizer?


mainly a heuristic choice


12:52 2014-10-08
perfect hypothesis set


12:52 2014-10-08
the perfect regularizer Ω:


constrain in the 'direction' of the target function


12:52 2014-10-08
regularization is an attempt to reduce overfitting


12:55 2014-10-08
harms the overfitting (noise) more than the fitting


12:56 2014-10-08
guidelines:


move in the direction of smoother/simpler


12:56 2014-10-08
we have the error function for the movie rating


12:57 2014-10-08
the notion of simple here is very interesting


12:59 2014-10-08
now you regularize toward the simpler solution


13:04 2014-10-08
what happens if you choose a bad Ω? // Ω is the regularizer


we don't worry too much, because we have the saving grace 


of λ; we're going to use validation


13:06 2014-10-08
if validation tells us it's harmful, we'll factor


the regularizer out of the game altogether.


13:08 2014-10-08
neural network regularizer


13:09 2014-10-08
weight decay


13:09 2014-10-08
so we have this big network, layer upon layer upon layer...


13:11 2014-10-08
I'm looking at the functionality that I'm implementing


13:12 2014-10-08
as you increase the weight, you're going to enter the more


interesting nonlinearity here.


13:12 2014-10-08
you're going from the most simple to the most complex


13:13 2014-10-08
weight decay: from linear (small weights) to 'logical' (large, saturated weights)


13:13 2014-10-08
weight elimination:


fewer weights => smaller VC dimension
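
The soft version of weight elimination uses (β is a scale parameter):

Ω(w) = Σ_q  w_q² / (β² + w_q²)    // ≈ w_q²/β² for small weights (decay), ≈ 1 for large weights (counting weights)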


13:15 2014-10-08
early stopping as a regularizer
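
A rough sketch of early stopping (my own illustration, not from a slide): keep training while the validation error improves, and return the weights from the best point. The names model, train_one_epoch and validation_error are placeholders.

import copy

def train_with_early_stopping(model, train_one_epoch, validation_error,
                              max_epochs=1000, patience=10):
    best_err = float("inf")
    best_model = copy.deepcopy(model)
    epochs_since_best = 0
    for _ in range(max_epochs):
        train_one_epoch(model)              # one pass of gradient descent / SGD
        err = validation_error(model)       # error on a held-out validation set
        if err < best_err:
            best_err, best_model = err, copy.deepcopy(model)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:   # stop when validation error stops improving
                break
    return best_model, best_err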


13:17 2014-10-08
regularization through the optimizer


13:18 2014-10-08
the optimal λ:


as you increase the noise, you need more regularization


-----------------------------------------------------
13:38 2014-10-08
there are regularizers that stood the test of time


13:38 2014-10-08
machine learning is somewhere between theory & practice

