Caltech machine learning, video 9 notes (The Linear Model II)

8:39 2014-09-27 
start Caltech machine learning, 


video 9, the Linear Model II


8:40 2014-09-27
Bias-Variance decomposition of the out-of-sample error


8:41 2014-09-27
* linear classification


* linear regression


* logistic regression


8:54 2014-09-27
the tradeoff between approximation & generalization


8:55 2014-09-27
the generalization ability of linear classification


8:55 2014-09-27
nonlinear transformation


8:57 2014-09-27
feature space


8:57 2014-09-27
linear surface => quadratic surface
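
a minimal sketch of this idea (my own illustration, assuming NumPy): map x = (x1, x2) into a quadratic feature space z, run any linear algorithm there, and the linear boundary in z-space becomes a quadratic surface back in x-space.

import numpy as np

def quadratic_transform(X):
    # map 2-D inputs (x1, x2) to z = (1, x1, x2, x1^2, x1*x2, x2^2)
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

def classify(X, w_tilde):
    # a linear hypothesis sign(w_tilde . z) in z-space is a
    # quadratic decision surface in the original x-space
    return np.sign(quadratic_transform(X) @ w_tilde)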


8:59 2014-09-27
almost separable: this guy is erroneously classified.


9:06 2014-09-27
the lesson learned from this is that looking at the data before choosing the model can be hazardous to your Eout health: not your physical health, but your generalization health.


9:15 2014-09-27
if you look at the data, then by our definition you have already done learning


9:17 2014-09-27
VC dimension of the hypotheses set


9:17 2014-09-27
this is the manifestation of the biggest trap that practitioners fall into.


9:18 2014-09-27
when you go into machine learning and learn from the data, choosing the model is very tricky

9:19 2014-09-27
it's very tempting: let me just look at the data and pick something suitable


9:20 2014-09-27
it's not against the law, you can do it, but just charge accordingly.


9:20 2014-09-27
if you look at the data before choosing your model, you have already forfeited the warranty that is given by the VC inequality.


9:22 2014-09-27
this is basically the manifestation of snooping; you snoop into the data in a way that is not allowed.


9:22 2014-09-27
data snooping


9:22 2014-09-27
when you do this, bad things happen.


9:23 2014-09-27
validation, model selection


9:24 2014-09-27
it will be a legitimate way of selecting a model; it is model selection that does not contaminate the data,


9:25 2014-09-27
it's no longer trusted to reflect the real performance, because you have already used it in learning


9:26 2014-09-27
the linear model is an economy car, the nonlinear model gives you a truck


9:28 2014-09-27
logistic regression


9:28 2014-09-27
the model: what is the hypothesis set?


9:28 2014-09-27
soft threshold
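
a minimal sketch of the soft threshold (my own illustration, assuming NumPy): the logistic function theta(s) = e^s / (1 + e^s) applied to the signal s = w.x gives a hypothesis that can be read as a probability.

import numpy as np

def theta(s):
    # logistic (soft-threshold) function: theta(s) = e^s / (1 + e^s)
    return 1.0 / (1.0 + np.exp(-s))

def h(x, w):
    # logistic regression hypothesis: a value in (0, 1), read as a probability
    return theta(np.dot(w, x))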


9:36 2014-09-27
there is a probability distribution sitting there generating the examples.


9:37 2014-09-27
credit score, risk score


9:41 2014-09-27
this is supervised learning, I have to give you tags.


9:44 2014-09-27
error measure based on likelihood


9:51 2014-09-27
the data is generated by this target function


9:52 2014-09-27
if that probability is very small, then your assumption must be poor.


9:52 2014-09-27
and if that probability is high, then your assumption


has more plausibility.


9:52 2014-09-27
so I can use this as a comparative way to say that this hypothesis is more plausible than that one


9:53 2014-09-27
what is the probability of generating this data if your assumption is true?


// result => causal ???


9:54 2014-09-27
what is the most probable hypothesis given the data?


what is the probability of the data given the hypothesis?


9:57 2014-09-27
prior


9:57 2014-09-27
if I choose a hypothesis under which having the data is very plausible, it looks like this hypothesis is very likely; hence the name "likelihood"


9:59 2014-09-27
what is the likelihood of this whole data set?


10:06 2014-09-27
maximizing the likelihood => minimizing the error measure
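
my reconstruction of that step: with h(x) = theta(w.x) and theta(-s) = 1 - theta(s), each example contributes P(y | x) = theta(y w^T x), so taking -(1/N) ln of the likelihood turns the product into a sum and the maximization into a minimization:

\max_{\mathbf w}\ \prod_{n=1}^{N} \theta\!\left(y_n \mathbf w^{\mathsf T}\mathbf x_n\right)
\;\Longleftrightarrow\;
\min_{\mathbf w}\ \frac{1}{N}\sum_{n=1}^{N} \ln\!\left(1 + e^{-y_n \mathbf w^{\mathsf T}\mathbf x_n}\right) = E_{\mathrm{in}}(\mathbf w)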


10:08 2014-09-27
we're maximizing the likelihood of this hypothesis under this data set.


10:12 2014-09-27
cross-entropy error
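
a minimal NumPy sketch of this error measure (my own illustration; the formula is the one derived above):

import numpy as np

def cross_entropy_error(w, X, y):
    # E_in(w) = (1/N) * sum_n ln(1 + exp(-y_n * w.x_n))
    # X is an (N, d) array of inputs, y holds labels in {-1, +1}
    return np.mean(np.log(1.0 + np.exp(-y * (X @ w))))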


10:19 2014-09-27
learning algorithm


10:19 2014-09-27
How to minimize Ein?


10:20 2014-09-27
linear regression => logistic regression


10:20 2014-09-27
iterative solution, closed-form solution


10:21 2014-09-27
iterative method: gradient descent


10:22 2014-09-27
convex optimization


10:24 2014-09-27
you're sitting on the surface, then you close your eyes, and all you do is feel around you, and then decide that this direction is more promising than that one; that's all you do in one step.


10:28 2014-09-27
when you get to the new point, you repeat, repeat, ...


10:29 2014-09-27
until you get to the minimum.


10:29 2014-09-27
that's all there is to the iterative method you're going to use.


10:29 2014-09-27
fixed-step size


10:30 2014-09-27
Iterative method: gradient descent


* general method for nonlinear optimization


* start at w(0); take a step along the steepest slope


* fixed step size


10:30 2014-09-27
in this situation, you're going to derive what v hat should be


10:34 2014-09-27
gradient descent


10:34 2014-09-27
how do I choose the direction in order to make this as negative as possible?
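
as I recall the derivation: for a small step of size eta along a unit vector v hat, the change in E_in is approximately the inner product below, which is most negative when v hat points against the gradient:

\Delta E_{\mathrm{in}} \approx \eta\, \nabla E_{\mathrm{in}}(\mathbf w(0))^{\mathsf T}\hat{\mathbf v},
\qquad
\hat{\mathbf v} = -\,\frac{\nabla E_{\mathrm{in}}(\mathbf w(0))}{\lVert \nabla E_{\mathrm{in}}(\mathbf w(0)) \rVert}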


10:38 2014-09-27
Fixed-size step?


10:44 2014-09-27
logistic regression algorithm


// using gradient descent
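
a minimal sketch of the whole algorithm in NumPy (the gradient formula follows from the cross-entropy error above; the learning rate, iteration cap, and stopping rule are my own illustrative choices):

import numpy as np

def logistic_regression(X, y, eta=0.1, max_iters=10000, tol=1e-6):
    # minimize the cross-entropy E_in by batch gradient descent;
    # X: (N, d) inputs (include a column of ones for the bias), y: labels in {-1, +1}
    N, d = X.shape
    w = np.zeros(d)                      # initialize the weights at w(0) = 0
    for _ in range(max_iters):
        # gradient of E_in: -(1/N) * sum_n y_n x_n / (1 + exp(y_n w.x_n))
        grad = -(y[:, None] * X / (1.0 + np.exp(y * (X @ w)))[:, None]).mean(axis=0)
        if np.linalg.norm(grad) < tol:   # one possible termination criterion
            break
        w = w - eta * grad               # fixed-step move along the negative gradient
    return w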


10:50 2014-09-27
summary of linear model:


* perceptron // linear classification


* linear regression


* logistic regression


10:52 2014-09-27
Apply to credit analysis


* perceptron => Approve or Deny  => binary classification error


(PLA, Pocket)


* linear regression     => Amount of Credit  => squared error 


(Pseudo-inverse)


* logistic regression   => Probability of Default => cross-entropy error


(Gradient descent)


10:53 2014-09-27
I will stop here, and then we'll start after a short break.


10:57 2014-09-27
let's start the Q & A


10:57 2014-09-27
there is the question of "learning rate"


10:58 2014-09-27
there are other questions about "initialization"


10:58 2014-09-27
so let's set up a target error; if I don't get to the target error, I won't stop.


11:00 2014-09-27
local minimum, global minimum


11:01 2014-09-27
termination is tricky; a combination of criteria is the best way.


11:05 2014-09-27
in many situations you just do gradient descent in a simple way and get a very good result.


11:08 2014-09-27
you're applying the algorithm faithfully, and ...


11:09 2014-09-27
from a practical point of view, you start from different initialization points, and each of them will go to its own local minimum.


11:10 2014-09-27
ordinarily this will give you a good local minimum, but getting the global minimum is NP-hard.


11:12 2014-09-27
for entropy, you get a function based on the probability.


11:15 2014-09-27
because you will be charged for that.


11:36 2014-09-27
because it uses CPU cycles but does not improve much


11:36 2014-09-27
Neural Networks & hidden layers