Caltech machine learning, video 9 notes (The Linear Model II)

8:39 2014-09-27 
start Caltech machine learning, 


video 9, the Linear Model II


8:40 2014-09-27
Bias-Variance decomposition of the out-of-sample error


8:41 2014-09-27
* linear classification


* linear regression


* logistic regression


8:54 2014-09-27
the tradeoff between approximation & generalization


8:55 2014-09-27
the generalization ability of linear classification


8:55 2014-09-27
nonlinear transformation


8:57 2014-09-27
feature space


8:57 2014-09-27
linear surface => quadratic surface
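
a minimal sketch of this idea (my own illustration, assuming NumPy): map x = (x1, x2) into a quadratic feature space z, run any linear algorithm there, and the linear boundary in z-space becomes a quadratic surface back in x-space.

import numpy as np

def quadratic_transform(X):
    # map 2-D inputs (x1, x2) to z = (1, x1, x2, x1^2, x1*x2, x2^2)
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

def classify(X, w_tilde):
    # a linear hypothesis sign(w_tilde . z) in z-space is a
    # quadratic decision surface in the original x-space
    return np.sign(quadratic_transform(X) @ w_tilde)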


8:59 2014-09-27
almost separable: this guy is erroneously classified.


9:06 2014-09-27
the lesson learned from this is that looking at the data before choosing the model can be hazardous to your Eout health: not your physical health, but your generalization health.


9:15 2014-09-27
if you look at the data, then by our definition you have already done learning


9:17 2014-09-27
VC dimension of the hypotheses set


9:17 2014-09-27
this is the manifestation of the biggest trap that practitioners fall into.


9:18 2014-09-27
when you go into machine learning and learn from the data, choosing the model is very tricky

9:19 2014-09-27
it's very tempting: let me just look at the data and pick something suitable


9:20 2014-09-27
it's not against the law, you can do it, but just charge accordingly.


9:20 2014-09-27
if you look at the data before choosing your model, you have already forfeited the warranty that is given by the VC inequality.


9:22 2014-09-27
this is basically the manifestation of snooping; you snoop into the data in a way that is not allowed.


9:22 2014-09-27
data snooping


9:22 2014-09-27
when you do this, bad things happen.


9:23 2014-09-27
validation, model selection


9:24 2014-09-27
it will be a legitimate way of selecting a model; it is model selection that does not contaminate the data,


9:25 2014-09-27
it's no longer trusted to reflect the real performance, because you have already used it in learning


9:26 2014-09-27
the linear model is an economy car, the nonlinear model gives you a truck


9:28 2014-09-27
logistic regression


9:28 2014-09-27
the model: what is the hypothesis set?


9:28 2014-09-27
soft threshold
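
a minimal sketch of the soft threshold (my own illustration, assuming NumPy): the logistic function theta(s) = e^s / (1 + e^s) applied to the signal s = w.x gives a hypothesis that can be read as a probability.

import numpy as np

def theta(s):
    # logistic (soft-threshold) function: theta(s) = e^s / (1 + e^s)
    return 1.0 / (1.0 + np.exp(-s))

def h(x, w):
    # logistic regression hypothesis: a value in (0, 1), read as a probability
    return theta(np.dot(w, x))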


9:36 2014-09-27
there is a probability distribution sitting there generating the examples.


9:37 2014-09-27
credit score, risk score


9:41 2014-09-27
this is supervised learning, I have to give you tags.


9:44 2014-09-27
error measure based on likelihood


9:51 2014-09-27
the data is generated by this target function


9:52 2014-09-27
if that probability is very small, then your assumption must be poor.


9:52 2014-09-27
and if that probability is high, then your assumption


has more plausibility.


9:52 2014-09-27
so I can use this as a comparative way to say that this hypothesis is more plausible than that one


9:53 2014-09-27
what is the probability of generating this data if your assumption is true?


// result => causal ???


9:54 2014-09-27
what is the most probable hypothesis given the data?


what is the probability of the data given the hypothesis?


9:57 2014-09-27
prior


9:57 2014-09-27
if I choose a hypothesis under which having the data is very plausible, it looks like this hypothesis is very likely; hence the name "likelihood"


9:59 2014-09-27
what is the likelihood of this whole data set?


10:06 2014-09-27
maximizing the likelihood => minimizing the error measure
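
my reconstruction of that step: with h(x) = theta(w.x) and theta(-s) = 1 - theta(s), each example contributes P(y | x) = theta(y w^T x), so taking -(1/N) ln of the likelihood turns the product into a sum and the maximization into a minimization:

\max_{\mathbf w}\ \prod_{n=1}^{N} \theta\!\left(y_n \mathbf w^{\mathsf T}\mathbf x_n\right)
\;\Longleftrightarrow\;
\min_{\mathbf w}\ \frac{1}{N}\sum_{n=1}^{N} \ln\!\left(1 + e^{-y_n \mathbf w^{\mathsf T}\mathbf x_n}\right) = E_{\mathrm{in}}(\mathbf w)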


10:08 2014-09-27
we're maximizing the likelihood of this hypothesis under this data set.


10:12 2014-09-27
cross-entropy error
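
a minimal NumPy sketch of this error measure (my own illustration; the formula is the one derived above):

import numpy as np

def cross_entropy_error(w, X, y):
    # E_in(w) = (1/N) * sum_n ln(1 + exp(-y_n * w.x_n))
    # X is an (N, d) array of inputs, y holds labels in {-1, +1}
    return np.mean(np.log(1.0 + np.exp(-y * (X @ w))))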


10:19 2014-09-27
learning algorithm


10:19 2014-09-27
How to minimize Ein?


10:20 2014-09-27
linear regression => logistic regression


10:20 2014-09-27
iterative solution, closed-form solution


10:21 2014-09-27
iterative method: gradient descent


10:22 2014-09-27
convex optimization


10:24 2014-09-27
you're sitting on the surface, then you close your eyes, and all you do is feel around you, and then decide that this direction is more promising than that one; that's all you do in one step.


10:28 2014-09-27
when you get to the new point, you repeat, repeat, ...


10:29 2014-09-27
until you get to the minimum.


10:29 2014-09-27
that's all there is to the iterative method you're going to use.


10:29 2014-09-27
fixed-step size


10:30 2014-09-27
Iterative method: gradient descent


* general method for nonlinear optimization


* start at w(0); take a step along the steepest slope


* fixed step size


10:30 2014-09-27
in this situation, you're going to derive what v hat should be


10:34 2014-09-27
gradient descent


10:34 2014-09-27
how do I choose the direction in order to make this as negative as possible?
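
as I recall the derivation: for a small step of size eta along a unit vector v hat, the change in E_in is approximately the inner product below, which is most negative when v hat points against the gradient:

\Delta E_{\mathrm{in}} \approx \eta\, \nabla E_{\mathrm{in}}(\mathbf w(0))^{\mathsf T}\hat{\mathbf v},
\qquad
\hat{\mathbf v} = -\,\frac{\nabla E_{\mathrm{in}}(\mathbf w(0))}{\lVert \nabla E_{\mathrm{in}}(\mathbf w(0)) \rVert}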


10:38 2014-09-27
Fixed-size step?


10:44 2014-09-27
logistic regression algorithm


// using gradient descent
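
a minimal sketch of the whole algorithm in NumPy (the gradient formula follows from the cross-entropy error above; the learning rate, iteration cap, and stopping rule are my own illustrative choices):

import numpy as np

def logistic_regression(X, y, eta=0.1, max_iters=10000, tol=1e-6):
    # minimize the cross-entropy E_in by batch gradient descent;
    # X: (N, d) inputs (include a column of ones for the bias), y: labels in {-1, +1}
    N, d = X.shape
    w = np.zeros(d)                      # initialize the weights at w(0) = 0
    for _ in range(max_iters):
        # gradient of E_in: -(1/N) * sum_n y_n x_n / (1 + exp(y_n w.x_n))
        grad = -(y[:, None] * X / (1.0 + np.exp(y * (X @ w)))[:, None]).mean(axis=0)
        if np.linalg.norm(grad) < tol:   # one possible termination criterion
            break
        w = w - eta * grad               # fixed-step move along the negative gradient
    return w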


10:50 2014-09-27
summary of linear model:


* perceptron // linear classification


* linear regression


* logistic regression


10:52 2014-09-27
Apply to credit analysis


* perceptron => Approve or Deny  => binary classification error


(PLA, Pocket)


* linear regression     => Amount of Credit  => squared error 


(Pseudo-inverse)


* logistic regression   => Probability of Default => cross-entropy error


(Gradient descent)


10:53 2014-09-27
I will stop here, and then we'll start after a short break.


10:57 2014-09-27
let's start the Q & A


10:57 2014-09-27
there is the question of "learning rate"


10:58 2014-09-27
there are other questions about "initialization"


10:58 2014-09-27
so let's set up a target error; if I don't get to the target error, I won't stop.


11:00 2014-09-27
local minimum, global minimum


11:01 2014-09-27
termination is tricky; a combination of criteria is the best way.


11:05 2014-09-27
in many situations you just do gradient descent in a simple way and get a very good result.


11:08 2014-09-27
you're applying the algorithm faithfully, and ...


11:09 2014-09-27
from a practical point of view, you start from different initialization points, and each of them will go to its own local minimum.


11:10 2014-09-27
ordinarily this will give you a good local minimum, but getting the global minimum is NP-hard.


11:12 2014-09-27
for entropy, you get a function based on the probability.


11:15 2014-09-27
because you will be charged for that.


11:36 2014-09-27
because it uses CPU cycles but does not improve much


11:36 2014-09-27
Neural Networks & hidden layers