8:46 2014-09-26 Friday
start CalTech machine learning video 8
Bias-Variance Tradeoff
8:46 2014-09-26
what is the "VC dimension"?
VC dimension dVC(H) is "the largest number of points H can shatter"
8:47 2014-09-26
this is where it applies
8:48 2014-09-26
the most important part of the application is
the disappearing blocks, because they give the
VC inequality its generality: the VC bound is
valid for
1. any "learning algorithm",
2. any "input distribution", and
3. any "target function" we may be trying to learn.
8:51 2014-09-26
N: the number of examples you need
Rule of thumb: N >= 10 * dVC
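The rule of thumb can be wrapped in a tiny helper (a sketch; the factor 10 is the lecture's heuristic, not a theorem):

```python
def min_examples(d_vc, factor=10):
    """Rule-of-thumb sample size: N >= factor * d_VC (factor 10 per the lecture)."""
    return factor * d_vc

# e.g. the 2D perceptron has d_VC = 3
print(min_examples(3))   # -> 30
```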
8:53 2014-09-26
generalization bound:
Eout <= Ein + Ω
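The Ω penalty can be computed numerically. A sketch, assuming the usual form of the VC generalization bound with the polynomial bound on the growth function (exact constants vary by textbook):

```python
import math

def omega(n, d_vc, delta=0.05):
    """VC penalty: Omega = sqrt((8/N) * ln(4 * m_H(2N) / delta)),
    using the polynomial bound on the growth function m_H(N) <= N**d_vc + 1."""
    m_h = (2 * n) ** d_vc + 1
    return math.sqrt((8.0 / n) * math.log(4.0 * m_h / delta))

# Omega shrinks as N grows: more data => tighter bound on Eout
print(omega(1000, 3), omega(10000, 3))
```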
8:54 2014-09-26
it gives us a different angle of generalization
8:55 2014-09-26
* bias & variance
* learning curves
8:58 2014-09-26
small Eout:
that's the purpose of learning;
if Eout is small, you have learned: you have a hypothesis
that approximates the target function well.
8:59 2014-09-26
Small Eout: good approximation of f out of sample
9:00 2014-09-26
More complex H => better chance of approximating f
Less complex H => better chance of generalizing out of sample.
9:01 2014-09-26
but having a bigger hypothesis set may be bad news,
because you may not be able to find the right hypothesis in it.
9:02 2014-09-26
what is the ideal hypothesis set for learning?
9:02 2014-09-26
Quantifying the tradeoff:
VC analysis was one approach: Eout <= Ein + Ω
// the generalization bound
9:06 2014-09-26
Ein captures the approximation; Ω is purely about generalization
9:07 2014-09-26
Bias-variance analysis is another: decomposing Eout into
1. How well H can approximate f
2. How well we can zoom in on a good h ∈ H
9:13 2014-09-26
this is the best hypothesis; it has a certain
ability to approximate. To pick it, I have to use the examples
to zoom in within the hypothesis set.
9:14 2014-09-26
decomposing into: approximation + generalization
9:22 2014-09-26
the 1st thing I'm going to do, is exchange
the order of expectations
9:25 2014-09-26
the average hypothesis
9:26 2014-09-26
you learn from one data set and get a hypothesis;
someone else learns from another data set and gets another
hypothesis. So how about taking the expectation of these hypotheses?
9:28 2014-09-26
hopping from your guy to the target goes in small steps:
1. from your hypothesis to the best (average) hypothesis
2. another hop from the best hypothesis to the target function
your hypothesis => the best hypothesis => the target function
9:38 2014-09-26
the cross term goes away, and that's the advantage
of the particular measure that we have.
9:40 2014-09-26
bias(x) + var(x)
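In symbols (my reconstruction of the steps sketched above): with g^(D) the hypothesis learned from data set D and ḡ(x) = E_D[g^(D)(x)] the average hypothesis, the cross term vanishes because E_D[g^(D)(x) − ḡ(x)] = 0, leaving

```latex
\mathbb{E}_D\!\left[\big(g^{(D)}(x) - f(x)\big)^2\right]
  = \underbrace{\big(\bar g(x) - f(x)\big)^2}_{\mathrm{bias}(x)}
  + \underbrace{\mathbb{E}_D\!\left[\big(g^{(D)}(x) - \bar g(x)\big)^2\right]}_{\mathrm{var}(x)}
```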
9:43 2014-09-26
and this is the bias + variance decomposition
9:47 2014-09-26
here I have a small hypothesis set;
in this one, I have a huge hypothesis set.
9:50 2014-09-26
if the hypothesis set gets bigger => bias decreases, variance increases
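A minimal simulation of the small-vs-huge comparison (my sketch, assuming the style of experiment used in the lecture: target f(x) = sin(πx), data sets of 2 points each, a constant model H0 against a linear model H1):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)
xs = np.linspace(-1, 1, 200)      # grid on which bias/var are measured
K = 5000                          # number of independent data sets

h0 = np.empty((K, xs.size))       # constant hypotheses h(x) = b
h1 = np.empty((K, xs.size))       # linear hypotheses  h(x) = a*x + b
for k in range(K):
    x = rng.uniform(-1, 1, 2)     # 2 examples per data set
    y = f(x)
    h0[k] = y.mean()              # best constant fit to the 2 points
    a, b = np.polyfit(x, y, 1)    # best line through the 2 points
    h1[k] = a * xs + b

for name, h in [("H0 (constant)", h0), ("H1 (line)", h1)]:
    g_bar = h.mean(axis=0)                    # average hypothesis
    bias = np.mean((g_bar - f(xs)) ** 2)
    var = np.mean(h.var(axis=0))
    print(f"{name}: bias={bias:.2f} var={var:.2f}")
```

The bigger set H1 wins on bias but loses badly on variance, so with only 2 points the simpler H0 has the smaller expected Eout.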
9:56 2014-09-26
which one is better?
better for what? that's the key issue.
10:03 2014-09-26
you don't know the target function, you only have the examples
10:07 2014-09-26
let's do the bias + variance decomposition
10:10 2014-09-26
when you're in a learning situation, always remember
you're matching the "model complexity" to the "data resource"
you have, not to the "target complexity"
10:25 2014-09-26
the question is not whether the target function is there;
the question is: can I find it?
10:27 2014-09-26
expected Eout // expected out-of-sample error
expected Ein // expected in-sample error
how do they vary with N?
10:29 2014-09-26
expected error vs Number of Data Points, N
10:30 2014-09-26
it doesn't bother me, because the in-sample error
is not the bottom line, the out-of-sample error is.
10:32 2014-09-26
VC <=> Bias + Variance
10:37 2014-09-26
VC analysis
10:37 2014-09-26
Eout <= Ein + Ω
Eout // out-of-sample error
Ein // in-sample error
Ω // generalization error
// generalize from in-sample to out-of-sample error
10:37 2014-09-26
generalization:
in-sample error => out-of-sample error
10:42 2014-09-26
linear regression
10:44 2014-09-26
noisy target: linear + noise
10:44 2014-09-26
Data set D
10:45 2014-09-26
you have the input data points & the corresponding outputs
10:45 2014-09-26
in-sample error pattern
10:46 2014-09-26
in-sample error vector
10:47 2014-09-26
error pattern
10:47 2014-09-26
out-of-sample error vector
10:47 2014-09-26
pseudo-inverse
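The pseudo-inverse solution mentioned here, w = X⁺y, in a short sketch (synthetic noisy linear target; the names and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, d))])  # bias column + inputs
w_true = rng.normal(size=d + 1)
y = X @ w_true + 0.1 * rng.normal(size=N)                   # noisy linear target

w = np.linalg.pinv(X) @ y       # least-squares fit via the pseudo-inverse
print(np.round(w - w_true, 2))  # small residual: w is close to w_true
```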
10:48 2014-09-26
expected in-sample error,
expected out-of-sample error,
expected generalization error
10:56 2014-09-26
degree of freedom
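The learning curves for this linear-regression example can be tabulated; a sketch assuming the closed forms derived in the lecture (noisy linear target with noise variance σ², d+1 degrees of freedom, N examples):

```python
def expected_errors(sigma2, d, n):
    """Linear-regression learning curves:
      E[Ein]  = sigma2 * (1 - (d+1)/N)
      E[Eout] = sigma2 * (1 + (d+1)/N)
    so the expected generalization error is 2 * sigma2 * (d+1) / N."""
    dof = d + 1
    e_in = sigma2 * (1 - dof / n)
    e_out = sigma2 * (1 + dof / n)
    return e_in, e_out

# both curves converge to sigma2 as N grows
for n in (20, 100, 1000):
    e_in, e_out = expected_errors(1.0, 4, n)
    print(n, round(e_in, 3), round(e_out, 3))
```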