18:56 2014-10-09
start CalTech machine learning, video 15
Kernel Methods
18:56 2014-10-09
if you think of the linear model as an economy car,
you can think of the SVM as a luxury car
18:56 2014-10-09
maximizing the margin
18:58 2014-10-09
Review of Lecture 14
* The margin
* quadratic programming
* support vectors
* nonlinear transform
19:01 2014-10-09
support vectors are the ones that achieve the margin,
they're used to define the plane
19:02 2014-10-09
an in-sample quantity (#support vectors) bounds the out-of-sample error
19:03 2014-10-09
we went to a fairly high-dimensional space
19:04 2014-10-09
outline:
* The kernel trick
* Soft-margin SVM
19:06 2014-10-09
you're going to a high-dimensional space without
paying the price for it
19:07 2014-10-09
you count the number of "support vectors"
19:09 2014-10-09
I'm the guardian of the z space, you come to me
with requests
19:17 2014-10-09
inner product after a transformation
/
8:58 2014-10-10 Friday
start CalTech machine learning, video 15
Kernel Methods
8:58 2014-10-10
outline:
* The kernel trick
* Soft-margin SVM
9:08 2014-10-10
extend SVM from the linearly separable case to
the non-separable case by allowing yourself
to make errors
9:09 2014-10-10
shift from hard-margin to soft-margin to allow
some errors
9:10 2014-10-10
in a practical problem, you're going to use both:
you go to a high-dimensional space, possibly
an infinite-dimensional space, without paying the price
for it
9:11 2014-10-10
the idea of the kernel is that: I want to go to the
z space without paying the price for it.
9:12 2014-10-10
you count the #support vectors, the dimensionality
of the z space doesn't appear
9:13 2014-10-10
we still need to take the inner product in the z space
9:13 2014-10-10
what do I need from the z space in order to carry
out the machinery that we have seen so far?
9:14 2014-10-10
in order to carry out the Lagrangian machinery, I need
the inner product in the z space
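for reference, the dual problem from Lecture 14; the only
place z appears is the inner product zₙᵀzₘ:

    maximize   L(α) = Σn αn − (1/2) Σn Σm yn ym αn αm znᵀzm
    subject to αn ≥ 0 for n = 1, ..., N   and   Σn αn yn = 0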
9:16 2014-10-10
but getting the inner product in the z space is less demanding
than getting the actual vectors in the z space.
9:16 2014-10-10
I'm the guardian of the z space, I'm closing the door, nobody
has access to the z space, you come to me with requests; if you
give me an x and ask me for the transformation, that's a big
demand, I have to hand you a big z.
9:19 2014-10-10
but all I'm willing to give you is "inner product"
9:20 2014-10-10
you give me x & x', I close the door, do my thing, and come
back with a number, which is the inner product zᵀz',
without actually telling you what z & z' are
9:21 2014-10-10
that will be a simple operation.
9:21 2014-10-10
it's a good thing, because we can completely focus on
the inner product in the z space, and see if that can lead us
to a simplification.
9:22 2014-10-10
so in this first constraint, I don't see any z
9:23 2014-10-10
I don't know what w is, w lives in the z space
9:23 2014-10-10
can I get away with just inner products to solve this?
9:24 2014-10-10
w is not mysterious to us, we already solved for it
9:24 2014-10-10
in particular, the sum is over the support vectors, the points with nonzero αs
9:25 2014-10-10
I solve for b by taking any support vector
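from Lecture 14, with zn = Φ(xn); only the support vectors
(the points with αn > 0) contribute:

    w = Σ(αn > 0) αn yn zn
    b from any support vector m:  ym (wᵀzm + b) = 1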
9:26 2014-10-10
we only deal with z, as far as inner product is concerned
9:27 2014-10-10
if I'm able to compute the inner product in the z space
without visiting the z space, I can still carry out this machinery
9:28 2014-10-10
all we need is the following:
I give you x & x', 2 points in the x space,
you do your thing, come back with a number,
and promise that this is the inner product in the z space,
the mysterious z space.
9:30 2014-10-10
you do your thing; knowing that the z space exists is sufficient
9:31 2014-10-10
let's look at the idea as a generalized inner product
9:31 2014-10-10
we view it as a generalized inner product in the x space
9:32 2014-10-10
K(x, x') // the kernel
the kernel will correspond to some z space
9:33 2014-10-10
K(x, x') // generalized inner product between x & x'
9:33 2014-10-10
not an ordinary inner product, but an inner product
after a transformation
9:34 2014-10-10
z = Φ(x)         // transformation
K(x, x') = zᵀz'  // inner product in the z space
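a minimal sketch (mine, not from the slides) checking this
correspondence for d = 2, using the 2nd-order polynomial kernel
that comes up shortly; the names phi and kernel are made up:

import numpy as np

def phi(x):
    # explicit transform to the z space (6 dimensions for d = 2, Q = 2)
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2])

def kernel(x, xp):
    # K(x, x') = (1 + x.x')^2, computed entirely in the x space
    return (1.0 + np.dot(x, xp)) ** 2

x, xp = np.array([0.3, -1.2]), np.array([2.0, 0.5])
print(np.dot(phi(x), phi(xp)))  # inner product taken in the z space
print(kernel(x, xp))            # same number, without visiting z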
9:36 2014-10-10
now we come to the trick:
Can we compute this kernel, K(x, x') without transforming
x & x'?
9:37 2014-10-10
it doesn't transform things into the z space and take the inner product;
it just tells you what the kernel is, and then I'm going to
convince you that this kernel actually corresponds to a transformation
to some z space, and taking the inner product there.
9:39 2014-10-10
the main thing you can observe is that this is not
an ordinary inner product
9:40 2014-10-10
the polynomial kernel
9:42 2014-10-10
the equivalent kernel
9:45 2014-10-10
a valid kernel is the inner product in some space
9:46 2014-10-10
how much computation does it take you to do this?
9:46 2014-10-10
multiply them, and raise to the power Q
9:47 2014-10-10
d: dimensionality of the x space
9:48 2014-10-10
you can see that I expanded this conceptually, not
computationally.
9:48 2014-10-10
this will be an ugly beast to deal with
9:50 2014-10-10
but the bottom line is that a kernel of this form does
correspond to an inner product in a higher-dimensional space.
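a quick sketch (mine) of why this is cheap: the polynomial kernel
costs O(d) per evaluation regardless of Q, while the equivalent
z space has C(d+Q, Q) dimensions:

import numpy as np
from math import comb

def poly_kernel(x, xp, Q):
    # d multiplications for the inner product, one addition, one power
    return (1.0 + np.dot(x, xp)) ** Q

d, Q = 100, 10
x, xp = np.random.randn(d), np.random.randn(d)
print(poly_kernel(x, xp, Q))  # cheap, done in the x space
print(comb(d + Q, Q))         # dimension of the equivalent z space: ~4.7e13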
9:51 2014-10-10
we compute it in the x space just using this formula.
9:52 2014-10-10
with this in mind, we need only z to exist;
let's get carried away, and try to just pick a kernel
that takes us to z without even imagining what Φ is.
9:54 2014-10-10
we take this K(x, x') to be an inner product in "some" space z
9:55 2014-10-10
We need only Z to exist!
9:55 2014-10-10
I have no idea what it is, but I can compute it.
9:56 2014-10-10
the interesting thing is that that space is infinite-dimensional
9:56 2014-10-10
you get the benefit of a horrific nonlinear transformation
9:57 2014-10-10
here we don't care; we carry out the machinery, then we
count the #support vectors
10:00 2014-10-10
my purpose is to convince you that there is a z space,
and this is an inner product
10:02 2014-10-10
this is a very interesting kernel, called the
RBF(Radial Basis Function) kernel
10:04 2014-10-10
let's look at the kernel in action; it's a very
sophisticated kernel, and it corresponds to an infinite-
dimensional space
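the RBF kernel from the lecture, with γ a width parameter
(the evaluation sketch below is mine):

    K(x, x') = exp(−γ ‖x − x'‖²)

import numpy as np

def rbf_kernel(x, xp, gamma=1.0):
    # expanding the exponential gives infinitely many features,
    # yet evaluating it is a finite computation in the x space
    diff = x - xp
    return np.exp(-gamma * np.dot(diff, diff))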
10:06 2014-10-10
so this is the data set I'm working with
10:07 2014-10-10
now I'm going to transform x into an infinite dimensional space
10:08 2014-10-10
you get the kernel, you pass it on to the quadratic programming
package, and the QP gives you the "support vectors"
10:09 2014-10-10
I darken the dots that end up being support vectors
10:11 2014-10-10
I have 9 SVs altogether, out of a hundred points; can you tell me
what Eout (the out-of-sample error) is?
can you bound it from above?
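the bound from the course, an in-sample count bounding Eout:

    E[Eout] ≤ E[#SVs] / (N − 1)

so with 9 support vectors out of 100 points, roughly 9/99 ≈ 9%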
10:12 2014-10-10
the z space is a mysterious guy
10:13 2014-10-10
but you can see why support vectors are called support vectors
10:14 2014-10-10
if you don't get linear separability in the z space,
you're really in trouble
10:15 2014-10-10
the other thing is the notion of a distance: when I get
linear separability there (in the z space), I get a margin,
and maximizing that margin is
already done by the machinery.
10:17 2014-10-10
I do get a small number of support vectors
10:17 2014-10-10
but again, this is not the margin; these
guys are pre-images of support vectors, and the distance we
solve for lives in the z space
10:19 2014-10-10
you may get support vectors that look far away in the x space;
the geometry happens in a space you don't see
10:19 2014-10-10
so we get the solution, it's a pretty nice tool to have
10:20 2014-10-10
check the number of SVs; it tells you about the generalization properties
10:21 2014-10-10
if I give you a kernel, and it's a valid kernel, how
do you formulate the problem in the z space?
10:21 2014-10-10
Kernel formulation of SVM
10:22 2014-10-10
QP == Quadratic Programming
10:22 2014-10-10
quadratic coefficient
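the kernelized dual handed to the QP package; the
quadratic-coefficient matrix has entries yn ym K(xn, xm):

    maximize   Σn αn − (1/2) Σn Σm yn ym αn αm K(xn, xm)
    subject to αn ≥ 0   and   Σn αn yn = 0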
10:22 2014-10-10
how do I construct the hypothesis in terms of the kernel?
10:24 2014-10-10
SVM is not a specific model, you choose a kernel,
it will give you a different model
10:26 2014-10-10
this is for any support vector which is defined by
αm > 0
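a sketch (variable names are mine) of building the final
hypothesis from the QP output, never leaving the x space:

import numpy as np

def make_hypothesis(X, y, alpha, kernel):
    # support vectors: the points with alpha > 0 (up to numerical tolerance)
    sv = np.where(alpha > 1e-8)[0]
    m = sv[0]  # any support vector works for solving b
    b = y[m] - sum(alpha[n] * y[n] * kernel(X[n], X[m]) for n in sv)
    def g(x):
        # g(x) = sign( sum_n alpha_n y_n K(x_n, x) + b )
        return np.sign(sum(alpha[n] * y[n] * kernel(X[n], x) for n in sv) + b)
    return g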
10:28 2014-10-10
I just avoided the z space by using the kernel;
I got a solution that made me completely forget
that I did a nonlinear transformation to the z space
10:30 2014-10-10
the only thing to remember is that this transformation
depends on your data set.
10:31 2014-10-10
so the whole idea of the kernel is that you don't
visit the z space; you verify that this is a valid kernel,
namely some inner product in a space, without visiting
that space.
10:34 2014-10-10
by the way, in support vector machines, you will
come up with your own kernels
10:34 2014-10-10
to establish that a kernel is valid, there are 3 approaches:
* by construction  // construct the transform Φ explicitly
* math properties  // use the math properties of the kernel (Mercer's condition)
* who cares?
10:36 2014-10-10
Design your own kernel
K(x, x') is a valid kernel iff
1. it is symmetric, and
2. the matrix [K(xi, xj)] is positive semi-definite
   for any choice of points x1, ..., xN
// i.e., cᵀKc ≥ 0 for every vector c
// Mercer's condition
10:39 2014-10-10
but that indeed is the condition, and if you manage to
establish the condition for your kernel, then you have established
that the z space exists, even if you don't know what the z space
is.
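a small numerical sanity check (mine) of Mercer's condition:
a valid kernel must give a symmetric PSD matrix for ANY points:

import numpy as np

def gram_matrix(points, kernel):
    # Kij = K(xi, xj)
    return np.array([[kernel(a, b) for b in points] for a in points])

points = [np.random.randn(3) for _ in range(50)]
K = gram_matrix(points, lambda a, b: (1.0 + np.dot(a, b)) ** 2)
print(np.allclose(K, K.T))                   # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-9)  # all eigenvalues ≥ 0 (PSD)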
10:43 2014-10-10
now we're going to the case where the data is not
linearly separable, and we still insist on separating,
while accepting some errors
=> soft-margin SVM
10:45 2014-10-10
2 types of non-separability:
* slightly // soft-margin SVMs deal with this
* serious // kernels deal with this
10:46 2014-10-10
you will be combining the kernel with the soft-margin SVM
in almost all the problems you encounter
10:47 2014-10-10
Error measure:
* Margin violation
*
10:52 2014-10-10
error measure based on violating the margin
10:53 2014-10-10
when this fails, the margin is violated
10:54 2014-10-10
I'm going to introduce a slack for every point
10:55 2014-10-10
now I'm going to penalize you for the total violation
you made; I'm just going to add up these violations
10:56 2014-10-10
so this is the quantity that I provide to you;
it captures the violation of the margin.
10:58 2014-10-10
this is no different from our notion of augmented error
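the soft-margin optimization from the lecture; C trades
margin against total violation:

    minimize   (1/2) wᵀw + C Σn ξn
    subject to yn (wᵀzn + b) ≥ 1 − ξn   and   ξn ≥ 0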
10:59 2014-10-10
* margin support vectors
* non-margin support vectors
11:13 2014-10-10
the KKT conditions are necessary
11:18 2014-10-10
the major success of SVM is in classification
------------------------------------------------------