Caltech machine learning, video 14 notes (Support Vector Machine)

14:32 2014-10-09 Thursday
start Caltech machine learning, video 14




Support Vector Machine


14:33 2014-10-09
SVM is arguably the most successful classification method in machine learning


14:42 2014-10-09
it's a very very neat piece of work for machine learning


14:43 2014-10-09
outline:


* maximizing the margin


* the solution


* nonlinear transforms


14:43 2014-10-09
it will be a constrained optimization problem


14:44 2014-10-09
linear separation


14:44 2014-10-09
linearly separable data


14:44 2014-10-09
I can get different lines. Is there an advantage to choosing one of the lines over the others?


14:45 2014-10-09
that's the margin you have


14:46 2014-10-09
margin of error


14:46 2014-10-09
if you look at this line, it does seem to have


a better margin.


14:46 2014-10-09
let me get the biggest possible margin


14:47 2014-10-09
which is the best line for classification?


14:47 2014-10-09
you will choose the fat margin


14:48 2014-10-09
let us ask 2 questions:


* why is a big margin better?


* can I solve for a w that maximizes the margin?


14:48 2014-10-09
all possible dichotomies


14:50 2014-10-09
growth function


14:50 2014-10-09
we know that a big growth function is bad news for generalization
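
(my own recap of the VC bound from the earlier lectures, so this note stands on its own; with probability at least 1 - δ,)

\[
E_{\text{out}}(g) \;\le\; E_{\text{in}}(g) + \sqrt{\frac{8}{N}\,\ln\frac{4\,m_{\mathcal{H}}(2N)}{\delta}}
\]

so the bigger the growth function m_H, the looser the bound.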


14:50 2014-10-09
dichotomies with fat margin


14:51 2014-10-09
I'm requiring that the margin be at least something


14:53 2014-10-09
effectively, by requiring the margin to be at least something, I'm putting a restriction on the growth function


14:54 2014-10-09
a fat margin implies fewer dichotomies


14:54 2014-10-09
fat-margin dichotomies have a smaller growth function, and a smaller VC dimension, than if I did not restrict them at all


14:55 2014-10-09
we will find out that indeed, when you have a bigger margin, you achieve better out-of-sample performance


14:57 2014-10-09
fat margins are good


14:57 2014-10-09
find the w that not only classifies the points correctly, but does so with the biggest possible margin.


14:57 2014-10-09
the margin is just the distance from the plane to the nearest point


14:58 2014-10-09
hyperplane that separates the points


14:58 2014-10-09
if I give you w & xn, can you get a formula, plug them in, and give me the distance between the plane described by w and the point xn?


15:00 2014-10-09
normalize w


15:01 2014-10-09
I have the plane, and on the plane the signal == 0. The plane doesn't touch any points (the points are linearly separable). Now when the signal becomes positive, you're moving in one direction: you hit the closest point, then you hit more points, the interior points so to speak. And when you go in the other direction, the signal is negative, and you hit the other points.


15:05 2014-10-09
pull out w0


15:07 2014-10-09
the plane is now : 


w'x + b = 0 (no x0)
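
(my fill-in of what "normalize w" means in this notation: rescale w & b so that the closest point gives signal 1,)

\[
\min_{n=1,\dots,N}\,\bigl|\,\mathbf{w}^{\mathsf T}\mathbf{x}_n + b\,\bigr| = 1
\]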


15:08 2014-10-09
computing the distance


15:08 2014-10-09
this is the geometry, I have a plane, and I have a point,


I want to estimate the distance.


15:09 2014-10-09
the vector w is orthogonal to the plane // normal vector


15:10 2014-10-09
now you can see the good old b dropped out


15:10 2014-10-09
x' - x'' // this must be orthogonal to the vector w


15:11 2014-10-09
this could be any 2 points on the plane, and the conclusion is that: 


w is orthogonal to any vector on the plane


15:12 2014-10-09
now w has an interpretation // w is always orthogonal to the plane


15:13 2014-10-09
if you project this fellow on this direction


15:14 2014-10-09
project vector a onto vector b


15:14 2014-10-09
unit vector
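
(worked out: take any point x on the plane, project xn - x onto the unit normal w/||w||,)

\[
\text{dist}(\mathbf{x}_n,\ \text{plane})
= \left|\frac{\mathbf{w}^{\mathsf T}}{\lVert\mathbf{w}\rVert}\,(\mathbf{x}_n-\mathbf{x})\right|
= \frac{\bigl|\mathbf{w}^{\mathsf T}\mathbf{x}_n + b \;-\; (\mathbf{w}^{\mathsf T}\mathbf{x} + b)\bigr|}{\lVert\mathbf{w}\rVert}
= \frac{\bigl|\mathbf{w}^{\mathsf T}\mathbf{x}_n + b\bigr|}{\lVert\mathbf{w}\rVert}
= \frac{1}{\lVert\mathbf{w}\rVert}
\]

(the second-to-last step uses w'x + b = 0 for a point x on the plane, and the last step uses the normalization |w'xn + b| = 1 for the nearest point, so the margin is 1/||w||)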


15:14 2014-10-09
we're going to find an equivalent problem that is more friendly


15:17 2014-10-09
the optimization problem: maximize ... subject to ...
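
(filling in the "..." for myself from the slide, as I recall it:)

\[
\max_{\mathbf{w},\,b}\ \frac{1}{\lVert\mathbf{w}\rVert}
\quad\text{subject to}\quad
\min_{n}\,\bigl|\mathbf{w}^{\mathsf T}\mathbf{x}_n + b\bigr| = 1
\]

which is equivalent to the friendlier

\[
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\,\mathbf{w}^{\mathsf T}\mathbf{w}
\quad\text{subject to}\quad
y_n\bigl(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b\bigr) \ge 1,\quad n = 1,\dots,N
\]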


15:17 2014-10-09
does anybody see quadratic programming coming on 


the horizon?


15:19 2014-10-09
when you solve it, you'll get the separating plane


with the best margin.


15:23 2014-10-09
constrained optimization  // Lagrange multiplier


15:24 2014-10-09
but here the constraints are "inequality constraints"


instead of "equality constraints"


15:25 2014-10-09
inequality constraints => KKT


15:26 2014-10-09
it is good to look at that picture because 


it will put the analysis here in perspective


15:27 2014-10-09
what are we optimizing? and what is the constraint?


15:29 2014-10-09
objective function, constraints


15:30 2014-10-09
pass it to a quadratic programming package to get the solution


15:30 2014-10-09
and the constraints become part of the objective function


15:32 2014-10-09
we're minimizing this with respect to what? with respect to w & b
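
(my fill-in of this step from the slides: the constraints go into the Lagrangian, which we then minimize over w & b,)

\[
\mathcal{L}(\mathbf{w}, b, \boldsymbol{\alpha})
= \tfrac{1}{2}\,\mathbf{w}^{\mathsf T}\mathbf{w}
- \sum_{n=1}^{N} \alpha_n\Bigl(y_n\bigl(\mathbf{w}^{\mathsf T}\mathbf{x}_n + b\bigr) - 1\Bigr),
\qquad \alpha_n \ge 0
\]

\[
\nabla_{\mathbf{w}}\mathcal{L} = \mathbf{0}
\ \Rightarrow\
\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n,
\qquad
\frac{\partial \mathcal{L}}{\partial b} = 0
\ \Rightarrow\
\sum_{n=1}^{N} \alpha_n y_n = 0
\]

substituting back gives the dual, in terms of the αs alone:

\[
\max_{\boldsymbol{\alpha}}\
\sum_{n=1}^{N} \alpha_n
- \tfrac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}
  y_n y_m\, \alpha_n \alpha_m\, \mathbf{x}_n^{\mathsf T}\mathbf{x}_m
\quad\text{subject to}\quad
\alpha_n \ge 0,\ \ \sum_{n=1}^{N} \alpha_n y_n = 0
\]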


15:34 2014-10-09
classification, segmentation


15:38 2014-10-09
this is a "KKT condition"


15:42 2014-10-09
the solution - quadratic programming


15:43 2014-10-09
convex optimization


15:48 2014-10-09
convex function


15:49 2014-10-09
just a word of warning before we go ...


15:49 2014-10-09
it's a practical consideration, but it's an 


important consideration.


15:50 2014-10-09
flirting with danger


15:51 2014-10-09
QP hands us α // QP == Quadratic Programming


15:52 2014-10-09
so the solution is a vector of αs


15:53 2014-10-09
you get the αs, you plug them in, so you get the w
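
(a minimal sketch of that pipeline in code, under my own assumptions: the cvxopt QP solver and a made-up helper name hard_margin_svm; none of this is from the lecture itself)

import numpy as np
from cvxopt import matrix, solvers   # a generic quadratic programming package

def hard_margin_svm(X, y):
    """Hard-margin SVM via the dual QP.  X: (N, d) data, y: (N,) labels in {-1, +1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N = X.shape[0]
    Yx = y[:, None] * X                        # rows are y_n * x_n
    Q = Yx @ Yx.T + 1e-8 * np.eye(N)           # Q[n,m] = y_n y_m x_n.x_m (+ tiny ridge for stability)
    # cvxopt minimizes (1/2) a'Qa + q'a, so q = -1 encodes "maximize sum(a) - (1/2) a'Qa"
    sol = solvers.qp(matrix(Q), matrix(-np.ones(N)),
                     matrix(-np.eye(N)), matrix(np.zeros(N)),          # -a_n <= 0, i.e. a_n >= 0
                     matrix(y.reshape(1, -1)), matrix(0.0))            # sum_n y_n a_n = 0
    alpha = np.ravel(sol['x'])
    sv = alpha > 1e-6                          # numerically, "alpha_n > 0" marks a support vector
    w = (alpha * y) @ X                        # w = sum_n alpha_n y_n x_n
    b = y[sv][0] - w @ X[sv][0]                # solve for b using any support vector
    return w, b, alpha, sv

On a linearly separable data set this should hand back mostly zero αs, with positive αs only for the support vectors, which is exactly the picture described below.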


15:53 2014-10-09
so I would like to tell you about a condition which is very important, and which will be the key to finding the support vectors: the KKT condition


15:55 2014-10-09
when you look at the vector, you just see a whole bunch


of αs that are 0


15:55 2014-10-09
for all the interior points, you're guaranteed that the 


multiplier is zero


15:59 2014-10-09
we end up with no need for regularization


15:59 2014-10-09
αn > 0    =>    Xn is a support vector


16:00 2014-10-09
now we come to the interesting definition. α is largely zeros (the interior points); the most important points in the game are the points that actually define the plane & the margin, and these are the ones for which α is positive


16:03 2014-10-09
these are called support vectors. So I have N points, I classify them & I get the maximum margin; because it's a maximum margin, it touches some of the +1 & some of the -1 points. Those points support the plane, so to speak, and are called the "support vectors"; the other guys are interior points


16:06 2014-10-09
α greater than zero identifies a support vector


16:07 2014-10-09
where are the support vectors in this picture?


they're the closest ones to the plane


16:08 2014-10-09
all of these guys here & here contribute nothing,


α == 0 in this case


16:08 2014-10-09
and the support vectors achieve the margin exactly


they're the critical points


16:09 2014-10-09
this is our way to get the αs back; this is the currency we get back from the quadratic programming, and we plug it in in order to get the w


16:11 2014-10-09
now I notice that many of the αs are zero, and α is only positive for support vectors,


16:12 2014-10-09
then I can say that I can take this sum over only the support vectors (SVs)
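
(writing that out:)

\[
\mathbf{w}
= \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n
= \sum_{\mathbf{x}_n \in \text{SV}} \alpha_n y_n \mathbf{x}_n
\qquad\text{(since } \alpha_n = 0 \text{ for every non-support vector)}
\]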


16:13 2014-10-09
seems only a minor technicality, but there is a very important


point in here


16:13 2014-10-09
the αs are the parameters of your model; when they're 0, they don't count.


16:14 2014-10-09
w // weight vector


16:14 2014-10-09
now you realize why there might be a generalization dividend here, because I end up with fewer parameters


16:16 2014-10-09
solve for b using any SV(Support Vector)
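
(my fill-in: a support vector satisfies its constraint with equality, so for any support vector xm,)

\[
y_m\bigl(\mathbf{w}^{\mathsf T}\mathbf{x}_m + b\bigr) = 1
\ \Rightarrow\
b = y_m - \mathbf{w}^{\mathsf T}\mathbf{x}_m
\qquad\text{(using } y_m = \pm 1,\ \text{so } 1/y_m = y_m\text{)}
\]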


16:16 2014-10-09
now you have w & b, and you're ready with the classification line or hyperplane


16:17 2014-10-09
nonlinear transformation


16:17 2014-10-09
now let me close with the nonlinear transformation, which will be a very short presentation that has an enormous impact. We've been talking about a linear boundary and the linearly separable case


16:19 2014-10-09
instead of working in the x space, we work in the z space


16:19 2014-10-09
z instead of x


16:20 2014-10-09
you don't just get a generic separator, you are getting the best separator according to the SVM


16:23 2014-10-09
I'm getting the separating line or plane in the z space
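
(a small sketch of what that looks like with the hard_margin_svm helper I sketched earlier; the 2nd-order transform below is just my own example, not the one from the slides)

import numpy as np

def phi(X):
    """A made-up 2nd-order polynomial transform x -> z for 2-D inputs."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

Z = phi(X)                                    # X: the original (N, 2) data
w_z, b_z, alpha, sv = hard_margin_svm(Z, y)   # same machinery, now in the z space
# to classify a new point, transform it with phi first, then take sign(w_z . z_new + b_z)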


16:24 2014-10-09
If I have a nonlinear transformation, do I still have SVs? Sure, you have 'support vectors' in the z space: you get the plane there, you get the margin, the margin will touch some points, and these are your support vectors by definition.


16:28 2014-10-09
you can identify them even without looking geometrically at the z space, because what are the support vectors? I look at the αs I get, and the ones that are positive are the ones that correspond to support vectors


16:31 2014-10-09
this is what the boundaries look like


16:32 2014-10-09
now we have alarm bells, overfitting, overfitting....


16:32 2014-10-09
so if I look at what the support vectors look like back in the x space,


16:33 2014-10-09
"pre-images" of support vectors


16:33 2014-10-09
but you need to be careful, because the formal definition


is in the z space


16:34 2014-10-09
now the interesting aspect here is that, if this is true,


16:34 2014-10-09
that's remarkable, because I just went to a million-dimensional space, w is a million-dimensional vector, and when I did the solution, if I have only 4 support vectors, the generalization behavior goes with 4 parameters


16:36 2014-10-09
so this looks like a sophisticated surface, but it's a sophisticated surface in disguise: it's very carefully chosen. There are lots of snake-like boundaries that go around and mess up the generalization; this one will be the best of them, and you have a handle on how good the generalization is just by counting the number of support vectors


16:38 2014-10-09
the distances between the support vectors & the surface here are not the margin; the margins are in the linear (z) space, etc.


16:39 2014-10-09
these guys are likely to be close to the surface, but the distances won't be the same, and there are perhaps other vectors that look like they should be support vectors, but they're not! What makes them support vectors is that they achieve the margin in the z space; this is just an illustrative version of it.


16:40 2014-10-09
now we come to the generalization result that


makes this fly!


16:41 2014-10-09
generalization result: Eout <= #of SV's / (N-1)


16:42 2014-10-09
that will give you an upper bound on Eout


16:42 2014-10-09
the result is very close to this.


16:42 2014-10-09
in order to get the correct result, you take expected values (over data sets of size N) on both sides
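
(the form I believe the exact statement takes, with expected values over data sets of size N:)

\[
\mathbb{E}\bigl[E_{\text{out}}\bigr] \;\le\; \frac{\mathbb{E}\bigl[\#\ \text{of SV's}\bigr]}{N-1}
\]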


16:44 2014-10-09
how many support vectors do you get?


16:44 2014-10-09
you ask me, after you're done with this..., how many support vectors do you get? If you have a thousand data points and you get 10 support vectors, you're in pretty good shape, regardless of the dimensionality of the z space you visited


16:46 2014-10-09
now I can go to a space of any dimension, and it will be fine?


16:46 2014-10-09
but this is the main theoretical result that makes people use support vectors


16:47 2014-10-09
kernel method, which is a modification of SVM


--------------------------------------------------------
16:53 2014-10-09
nonlinear transformation => kernel method ???


16:53 2014-10-09
in almost all the cases, the vast majority of terms 


will be zero


16:54 2014-10-09
this is not an accurate statement, but it's a reasonable statement


16:55 2014-10-09
the number of nonzero parameters, which corresponds to the VC dimension, happens to be the # of SVs by definition


16:56 2014-10-09
so the number of SVs gives you a bound on Eout


16:56 2014-10-09
some of the aggregation methods, like boosting, have a margin as well, and then you can compare that,


16:59 2014-10-09
basically the support vectors are there to "support


the separating plane"


16:59 2014-10-09
they're the ones that dictate the margin