CalTech machine learning, video 15 notes (Kernel Methods)

18:56 2014-10-09
Kernel Method

if you think of the linear model as an economy car,

you can think of the SVM as a luxury car

maximizing the margin

Review of Lecture 14

* The margin

* quadratic programming

* support vectors

* nonlinear transform

support vectors are the ones that achieve the margin,

they're used to define the plane

an in-sample check on the out-of-sample error

we went to a fairly high-dimensional space

* The kernel trick

* Soft-margin SVM

you're going to a high-dimensional space without

paying the price for it

you count the number of "support vectors"

I'm the guardian of the z space, you come to me

with requests,

inner product after a transformation

start CalTech machine learning, video 15
Kernel Methods


* The kernel trick

* Soft-margin SVM

extend SVM from the linearly separable case to

the non-separable case, allowing yourself

to make errors

shift from hard-margin to soft-margin to allow

some errors

in a practical problem, you're going to use both,

you're going to a high dimensional space, probably 

an infinite dimensional space without paying the price

for it

the idea of the kernel is that: I want to go to the 

z space without paying the price for it.

you count the #support vectors, the dimensionality 

of the z space doesn't appear

we still need to take the inner product in the z space

what do I need from the z space in order to be carrying 

out the machinery that I have seen so far?

in order to be able to carry out the Lagrangian, I need to get the

inner product in the z space

but getting the inner product in the z space is less demanding

than getting the actual vector in the z space.

I'm the guardian of the z space, I'm closing the door, nobody

has access to the z space, you come to me with requests, if you

give me an x, and ask me what the transformation is, it's a big

demand, I have to handle a big z.

but all I'm willing to give you is the "inner product"

you give me x & x', I close the door, do my thing, and come

back with a number, which is the inner product of z & z',

without actually telling you what z & z' are

that will be a simple operation,

it's a good thing, because we can completely focus on the

inner product in the z space, and see if that can lead us

to a simplification.

so this is the 1st constraint, I don't see any z

I don't know what w is, w lives in the z space

can I get away with just inner products to solve this?

w is not mysterious to us, we already solved for it

it involves only the support vectors, the ones with nonzero αs

I solve for b by taking any support vector

we only deal with z, as far as inner product is concerned

if I'm able to compute the inner product in the z space

without visiting the z space, I still can carry this machinery

all we have to do is this:

I give you x & x', 2 points in the x space, 

you do your thing, come back with a number, 

promise that this is the inner product in the z space.

the mysterious z space.

you do something, you know the existence is sufficient

let's look at the idea as a generalized inner product

we view it as a generalized inner product in the x space

K(x, x') // the kernel

the kernel will correspond to some z space

K(x, x') // generalized inner product between x & x'

not an ordinary inner product, but an inner product

after a transformation

z = Φ(x) // transformation

K(x, x') = z^T z' // inner product in the z space

now we come to the trick:

Can we compute this kernel, K(x, x') without transforming

x & x'?

it doesn't transform things into the z space & take the inner product,

it just tells you what the kernel is, and then I'm going to

convince you that this kernel actually corresponds to a transformation

to some z space, and taking the inner product there.

the main thing you can observe is that this is not

an ordinary inner product in the x space

the polynomial kernel

the equivalent kernel

a valid kernel is the inner product in some space
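
a worked expansion for the smallest interesting case, as a sketch: assuming x = (x_1, x_2) and the 2nd-order kernel K(x, x') = (1 + x^T x')^2, the kernel is an inner product of two transformed vectors. The symbol Φ is the transformation from the notes; the √2 factors are what makes the cross terms match.

```latex
\begin{align*}
K(\mathbf{x}, \mathbf{x}') &= (1 + \mathbf{x}^{T}\mathbf{x}')^{2} = (1 + x_1 x_1' + x_2 x_2')^{2} \\
&= 1 + x_1^{2}\,{x_1'}^{2} + x_2^{2}\,{x_2'}^{2} + 2 x_1 x_1' + 2 x_2 x_2' + 2 x_1 x_1' x_2 x_2' \\
&= \Phi(\mathbf{x})^{T}\,\Phi(\mathbf{x}'), \qquad
\Phi(\mathbf{x}) = \left(1,\; x_1^{2},\; x_2^{2},\; \sqrt{2}\,x_1,\; \sqrt{2}\,x_2,\; \sqrt{2}\,x_1 x_2\right)
\end{align*}
```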

how much computation does it take you to do this?

multiply them, and raise to the power Q

d: dimensionality of the x space

you can see that I expand this conceptually not


this will be an ugly beast to deal with

but the bottom line is a kernel of this form does

correspond to an inner product in a higher space.
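
a small numeric check of that claim, as a sketch only: it assumes Q = 2 and a 2-dimensional x, and the names poly_kernel, phi2 and inner are made up for this illustration. The kernel is computed once directly in the x space and once by an explicit trip through the z space; the two numbers should agree.

```python
import math

def poly_kernel(x, xp, Q=2):
    """Polynomial kernel (1 + x.x')^Q, computed entirely in the x space."""
    dot = sum(a * b for a, b in zip(x, xp))
    return (1.0 + dot) ** Q

def phi2(x):
    """Explicit z-space transform that matches Q = 2 for a 2-dimensional x."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, x1 * x1, x2 * x2, r2 * x1, r2 * x2, r2 * x1 * x2]

def inner(z, zp):
    """Ordinary inner product of two vectors."""
    return sum(a * b for a, b in zip(z, zp))

x, xp = (0.5, -1.0), (2.0, 3.0)
k_direct = poly_kernel(x, xp)         # d multiplications, then one power
k_via_z = inner(phi2(x), phi2(xp))    # transform both points, then inner product
print(k_direct, k_via_z)
```

the direct route never builds a z vector, which is the whole point: its cost depends on d, not on the size of the z space.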

by computing it in the x space just using this formula,

with this in mind, we need only z to exist,

let's get carried away, and try to just get a kernel,

and have it map us to z without even imagining what z is.

we take this K(x, x') to be an inner product in "some" space z

9:55 2014-10-10
We need only Z to exist!

I have no idea what it is, but I can compute it.

the interesting thing is that this space is infinite dimensional

you get the benefit of a horrific nonlinear transformation

here we don't care, we carry the machinery, then we

count the #support vectors, 

my purpose is to convince you that there is a z space,

and this is an inner product

this is a very interesting kernel, called the

RBF(Radial Basis Function) kernel
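
a sketch of why this kernel lives in an infinite dimensional space, assuming a scalar x and the form K(x, x') = exp(-(x - x')^2): expanding exp(2xx') as a Taylor series gives one z coordinate per power of x. The function names and the truncation are my own illustration; the true transform has infinitely many coordinates, the code keeps only the first few and compares with the exact kernel.

```python
import math

def rbf_kernel(x, xp, gamma=1.0):
    """RBF kernel exp(-gamma * (x - x')^2) for scalar inputs."""
    return math.exp(-gamma * (x - xp) ** 2)

def phi_truncated(x, terms):
    """First `terms` coordinates of the infinite transform (gamma = 1):
    coordinate k is exp(-x^2) * sqrt(2^k / k!) * x^k."""
    return [math.exp(-x * x) * math.sqrt(2.0 ** k / math.factorial(k)) * x ** k
            for k in range(terms)]

x, xp = 0.7, -0.3
exact = rbf_kernel(x, xp)
for terms in (2, 5, 20):
    z, zp = phi_truncated(x, terms), phi_truncated(xp, terms)
    approx = sum(a * b for a, b in zip(z, zp))
    print(terms, approx, exact)
```

the truncated inner product approaches the kernel value as more coordinates are kept, while the kernel itself costs one subtraction, one square and one exponential.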

let's look at the kernel in action, it's a very

sophisticated kernel, it corresponds to an infinite

dimensional space,

so this is the data set I'm working with

now I'm going to transform x into an infinite dimensional space

you get the kernel, you pass it on to the quadratic programming,

the QP gives you the "support vectors"

I darken the dots that end up being support vectors

I have 9 SVs altogether, out of a hundred points, can you tell me

what the Eout (out-of-sample error) is?

can you bound it from above?
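
the arithmetic behind that question, using the bound from Lecture 14 (number of support vectors over N - 1). Note the bound is on expected values, so plugging in the 9 SVs from a single run gives a rough indication rather than a guarantee:

```latex
\[
\mathbb{E}[E_{\text{out}}] \;\le\; \frac{\mathbb{E}[\#\text{ of SVs}]}{N-1},
\qquad
\frac{9}{100-1} = \frac{9}{99} \approx 0.09
\]
```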

the z space is a mysterious guy

but you can see why support vectors are called support vectors

if you don't get linear separability in the z space, 

you're really in trouble

the other thing is that when you think of the notion

of a distance, when I get the linear separability there(the z space),

I get a margin, I try to maximize the margin, that is 

already maximized by the machinery.

I do get the small number of support vectors

but again, this is not the margin, these

guys are pre-images of support vectors, the distance

we solve for lives in the z space,

so you may get support vectors which look far away in the x space,

10:19 2014-10-10
so we get the solution, it's a pretty nice tool to have

check the number of SVs, it will tell you the generalization property

if I give you a kernel, and it's a valid kernel, how

do you formulate the problem in the z space?

Kernel formulation of SVM

QP == Quadratic Programming

quadratic coefficient
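
written out, as a sketch of the hard-margin dual with the kernel in place of the z-space inner product: the only change from the linear case is that the quadratic coefficients become y_n y_m K(x_n, x_m).

```latex
\[
\max_{\alpha}\;\; \sum_{n=1}^{N} \alpha_n
\;-\; \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} y_n y_m\, \alpha_n \alpha_m\, K(\mathbf{x}_n, \mathbf{x}_m)
\qquad \text{subject to} \quad \alpha_n \ge 0, \quad \sum_{n=1}^{N} \alpha_n y_n = 0
\]
```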

how do I construct the hypothesis in terms of the kernel?

SVM is not a specific model, you choose a kernel,

it will give you a different model

this is for any support vector which is defined by

αm > 0
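
a sketch of how the pieces fit once the QP has returned the αs; the QP solver itself is not shown. The function names are hypothetical, the kernel is the plain inner product to keep the numbers checkable, and the α values for the two-point toy set were worked out by hand for this illustration.

```python
def linear_kernel(x, xp):
    """Ordinary inner product, standing in for any valid kernel."""
    return sum(a * b for a, b in zip(x, xp))

def quad_coefficients(X, y, kernel):
    """Matrix handed to the QP: entry (n, m) is y_n * y_m * K(x_n, x_m)."""
    N = len(X)
    return [[y[n] * y[m] * kernel(X[n], X[m]) for m in range(N)] for n in range(N)]

def solve_b(X, y, alpha, kernel, tol=1e-8):
    """b = y_m - sum_n alpha_n y_n K(x_n, x_m), using any support vector m (alpha_m > 0)."""
    m = next(i for i, a in enumerate(alpha) if a > tol)
    return y[m] - sum(a * yn * kernel(xn, X[m])
                      for a, yn, xn in zip(alpha, y, X) if a > tol)

def g(x, X, y, alpha, b, kernel, tol=1e-8):
    """Final hypothesis sign(sum_n alpha_n y_n K(x_n, x) + b); only SVs contribute."""
    s = sum(a * yn * kernel(xn, x) for a, yn, xn in zip(alpha, y, X) if a > tol) + b
    return 1 if s >= 0 else -1

# Toy data: two points on a line; alpha is the hand-derived dual solution for them.
X = [(-1.0,), (1.0,)]
y = [-1, 1]
alpha = [0.5, 0.5]
b = solve_b(X, y, alpha, linear_kernel)
print(quad_coefficients(X, y, linear_kernel), b, g((0.3,), X, y, alpha, b, linear_kernel))
```

swapping linear_kernel for another valid kernel changes the model without touching the rest of the code, and no z vector is ever formed.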

I just avoided the label by using the kernel,

the solution I got made me completely forget

that I did a nonlinear transformation to the z space

the only thing to remember is that this transformation

depends on your data set.

so the whole idea of the kernel is that you don't

visit the z space, if you find this is a valid kernel

namely some inner product in a space without visiting 

that space.

by the way, in support vector machines, you will

come up with your own kernels

to show that a kernel is valid, there are 3 approaches:

* By construction // conceptually construction?

* Math property   // use the math properties of the kernel(Mercer's condition)

* who cares?

Design your own kernel

K(x, x') is a valid kernel iff

1. it is symmetric

2. the matrix [K(x_i, x_j)] is positive semi-definite, for any x_1, ..., x_N

// positive semi-definite: the matrix should be "greater than or equal to zero"

// Mercer's condition
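
a spot check of the two conditions on a handful of points, as a sketch only. It can expose a bad kernel but cannot prove a good one, since the condition has to hold for every choice of points and the test only tries a few vectors v. All names here are made up for the illustration, and not_a_kernel is a deliberately broken example.

```python
import math
import random

def gram(points, kernel):
    """Kernel matrix with entry (i, j) = K(x_i, x_j)."""
    return [[kernel(a, b) for b in points] for a in points]

def is_symmetric(K, tol=1e-12):
    n = len(K)
    return all(abs(K[i][j] - K[j][i]) <= tol for i in range(n) for j in range(n))

def quad_form(K, v):
    """v^T K v; must be >= 0 for every v if K is positive semi-definite."""
    n = len(K)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

def passes_psd_spot_check(K, trials=200, tol=1e-9, seed=0):
    """Necessary check only: searches for a vector v with v^T K v < 0."""
    rng = random.Random(seed)
    n = len(K)
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    randoms = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(trials)]
    return all(quad_form(K, v) >= -tol for v in basis + randoms)

rbf = lambda a, b: math.exp(-(a - b) ** 2)
not_a_kernel = lambda a, b: -a * b   # symmetric, but fails condition 2

pts = [-1.5, -0.2, 0.4, 2.0]
print(is_symmetric(gram(pts, rbf)), passes_psd_spot_check(gram(pts, rbf)))
print(is_symmetric(gram(pts, not_a_kernel)), passes_psd_spot_check(gram(pts, not_a_kernel)))
```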

but that indeed is the condition, and if you manage to 

establish the condition for any kernel, then you establish

the z space exists, even if you don't know what the z space is


now we're going to the case where the data is not

linearly separable, and we still insist on separating them

while making some errors

=> soft-margin SVM

2 types of non-separable

* slightly  // soft-margin SVMs deal with this

* serious  // kernels deal with this

you will be combining the kernel with the soft-margin SVM

in almost all the problems you encounter

Error measure:

* Margin violation


error measure based on violating the margin

10:53 2014-10-10
10:54 2014-10-10
10:55 2014-10-10
now I'm going to penalize you for the total violation

10:56 2014-10-10
so this is the quantity that I provide to you,

it captures the violation of the margin.

this is no different from our notion of augmented error
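
a sketch of the violation quantity in code, assuming the standard soft-margin form: each point gets a slack ξ_n = max(0, 1 - y_n (w^T x_n + b)), and the objective is (1/2) w^T w + C times the total violation. The data, w, b and C below are arbitrary values picked for illustration, not a solved SVM.

```python
def slack(x, y, w, b):
    """Margin violation xi = max(0, 1 - y * (w.x + b)); zero when the margin is respected."""
    signal = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * signal)

def soft_margin_objective(X, Y, w, b, C):
    """(1/2) w.w + C * (total violation), the quantity soft-margin SVM minimizes."""
    total_violation = sum(slack(x, y, w, b) for x, y in zip(X, Y))
    return 0.5 * sum(wi * wi for wi in w) + C * total_violation

# Four points: one safely outside the margin, one inside it, one misclassified, one safe.
X = [(2.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (-2.0, 0.0)]
Y = [1, 1, 1, -1]
w, b, C = (1.0, 0.0), 0.0, 10.0
print([slack(x, y, w, b) for x, y in zip(X, Y)])
print(soft_margin_objective(X, Y, w, b, C))
```

a violation between 0 and 1 means the point is inside the margin but still on the right side; above 1 it is misclassified. C sets how much the total violation counts against the margin term, which is where the resemblance to augmented error comes from.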

* margin support vectors

* non-margin support vectors
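
how the two types show up in the dual, as a sketch assuming the standard soft-margin result: the only change from the hard-margin QP is an upper bound C on each α, and the type of support vector is read off from where α_n sits relative to that bound (ξ_n is the margin violation of point n).

```latex
\[
\max_{\alpha}\;\; \sum_{n=1}^{N} \alpha_n
- \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} y_n y_m\, \alpha_n \alpha_m\, K(\mathbf{x}_n, \mathbf{x}_m)
\qquad \text{subject to} \quad 0 \le \alpha_n \le C, \quad \sum_{n=1}^{N} \alpha_n y_n = 0
\]
\[
\begin{array}{ll}
0 < \alpha_n < C: & \text{margin support vector, sits exactly on the margin } (\xi_n = 0) \\
\alpha_n = C: & \text{non-margin support vector, typically violates the margin } (\xi_n > 0)
\end{array}
\]
```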

the KKT condition is necessary

the major success of SVM is in classification





