Andrew Ng Machine Learning Course Notes -- The Optimal Margin Classifier Problem

The optimal margin classifier: we were beginning to develop support vector machines. The hypothesis is represented as h_{w,b}(x) = g(w^T x + b), where g outputs +1 or -1, and in our development of support vector machines we change the convention so that the class labels y are +1 and -1. Last time we talked about the functional margin, gamma hat i. The intuition is that if the functional margin is a large positive number, then we are classifying a training example correctly and very confidently: if y_i = +1, we would like w^T x_i + b to be very large. The geometric margin has the interpretation of the distance between a training example and the separating hyperplane, where the separating hyperplane is defined by the equation w^T x + b = 0. It is actually a signed distance: it is positive if you classify the example correctly, and if you misclassify the example it is the negative of the distance from the hyperplane to the training example. We also defined the functional and geometric margins with respect to the training set as the minimum of the individual margins over all training examples.

In our development of the optimal margin classifier, the learning algorithm chooses parameters w and b so as to maximize the geometric margin. The goal is to find the separating hyperplane that separates the positive and negative examples with as large a distance as possible between the hyperplane and the examples. One property you can exploit when choosing w and b to maximize this is that you can scale w and b arbitrarily without changing the decision boundary. So we impose the scaling constraint that the worst-case functional margin, min over i of y_i (w^T x_i + b), is equal to 1: if you solve for w and b and find your worst-case functional margin is actually 10, then by dividing w and b through by a factor of 10 you bring the functional margin down to 1, so this constraint can always be satisfied by rescaling. The resulting picture is a quadratic objective with linear constraints, each of which rules out a half space. If you can picture this in 3D, this is a convex problem that has no local optima: as long as you stay within the set of points that has not been ruled out, you converge to the global optimum. So that is a convex optimization problem.
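A compact restatement of the above in symbols (my notation, writing \hat{\gamma} for the functional margin and \gamma for the geometric margin; the quadratic form of the final problem is the one used again in the SVM dual section below):

\hat{\gamma}_i = y_i\,(w^\top x_i + b), \qquad \hat{\gamma} = \min_i \hat{\gamma}_i \quad \text{(functional margin)}

\gamma_i = \frac{y_i\,(w^\top x_i + b)}{\lVert w \rVert}, \qquad \gamma = \min_i \gamma_i \quad \text{(geometric margin)}

After imposing the scaling constraint \hat{\gamma} = 1, maximizing the geometric margin 1/\lVert w \rVert is equivalent to

\begin{aligned}
\min_{w,b} \quad & \tfrac{1}{2}\lVert w \rVert^2 \\
\text{s.t.} \quad & y_i\,(w^\top x_i + b) \ge 1, \quad i = 1, \dots, m
\end{aligned}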

Primal and dual optimization: the method of Lagrange multipliers is this. Suppose there is some function f(w) you want to minimize, subject to a set of constraints h_i(w) = 0 for i = 1, ..., l; we can write h(w) as a vector-valued function, so the constraint is h(w) = 0, where 0 has an arrow on top (the zero vector). You construct the Lagrangian, which is the original optimization objective plus a sum of the constraints, each multiplied by a Lagrange multiplier beta_i. You then take the partial derivatives with respect to w and with respect to your Lagrange multipliers and set them to 0. The theorem is that for w to be a solution, it is necessary that betas exist such that those partial derivatives are equal to 0.

In the generalized version we also have inequality constraints g_i(w) <= 0, with multipliers alpha_i >= 0, and we define theta_p(w) as the maximum of the Lagrangian over alpha and beta. Notice that if g_i(w) is greater than 0, so if w violates one of your primal problem's constraints, then theta_p(w) is infinity: suppose I pick a value of w that violates one of these constraints, so some g_i(w) is positive; since theta_p maximizes this function over alpha and beta, by setting the corresponding alpha_i to plus infinity I can make the Lagrangian arbitrarily large. Similarly, if h_i(w) is not equal to 0 for some value of i, then the Lagrangian has a beta_i h_i(w) term, and by setting beta_i to plus infinity or minus infinity depending on the sign of h_i(w), I can make this plus infinity as well. Otherwise, theta_p(w) is just equal to f(w): it turns out that if I have a value of w that satisfies all of the g_i and h_i constraints, then the maximum over alpha and beta of the Lagrange multiplier terms is obtained by setting all of those terms to 0.
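In symbols, the generalized Lagrangian and the function theta_p described above (standard form; the alpha_i >= 0 multiply the inequality constraints g_i(w) <= 0 and the beta_i multiply the equality constraints h_i(w) = 0):

\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i\, g_i(w) + \sum_{i=1}^{l} \beta_i\, h_i(w)

\theta_{\mathcal{P}}(w) = \max_{\alpha \ge 0,\ \beta} \mathcal{L}(w, \alpha, \beta)
 = \begin{cases} f(w) & \text{if } g_i(w) \le 0 \text{ and } h_i(w) = 0 \text{ for all } i \\ +\infty & \text{otherwise} \end{cases}

so the primal problem, min over w of f(w) subject to these constraints, is the same as \min_w \theta_{\mathcal{P}}(w).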

Dual problem: p star is the value of the primal optimization problem, the min over w of the max over alpha, beta of the Lagrangian; the dual value d star is the max over alpha, beta of the min over w. In general, the max of the min of something is less than or equal to the min of the max of something, and it turns out that under certain conditions these two optimization problems have the same value. Our strategy for working out the support vector machine will be: write down the primal optimization problem, which we did previously for the optimal margin classifier; then derive the dual optimization problem for it; then solve the dual problem; and by modifying that a little bit, that is how we will derive the support vector machine.
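The weak duality fact referred to above, written out (standard statement; equality p* = d* holds only under the additional conditions discussed in the KKT section):

d^* = \max_{\alpha \ge 0,\ \beta}\ \min_{w}\ \mathcal{L}(w, \alpha, \beta)
 \;\le\; \min_{w}\ \max_{\alpha \ge 0,\ \beta}\ \mathcal{L}(w, \alpha, \beta) = p^*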

Convex optimization: for our purposes it means that the Hessian H is positive semidefinite, so the whole function is bowl-shaped and has no local optima other than the global one.
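As a quick check that the optimal margin problem is convex in this sense (my own note, using the objective from the SVM dual section below):

f(w) = \tfrac{1}{2}\lVert w \rVert^2 \;\Rightarrow\; \nabla^2 f(w) = I \succeq 0,

and each constraint y_i\,(w^\top x_i + b) \ge 1 is linear in (w, b), so the feasible set is an intersection of half spaces and is itself convex.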

KKT conditions: under suitable conditions (f and the g_i convex, the h_i affine, and the constraints strictly feasible), strong duality holds, p* = d*, and there exist w*, alpha*, beta* satisfying the Karush-Kuhn-Tucker (KKT) conditions. In particular, the complementary slackness condition alpha_i* g_i(w*) = 0 is what forces most of the alpha_i to be zero in the SVM dual below.
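For reference, the standard statement of the conditions (given here because the complementary slackness line is what drives the support-vector argument in the next section):

\begin{aligned}
\frac{\partial}{\partial w_i} \mathcal{L}(w^*, \alpha^*, \beta^*) &= 0, \qquad
\frac{\partial}{\partial \beta_i} \mathcal{L}(w^*, \alpha^*, \beta^*) = 0 \\
\alpha_i^*\, g_i(w^*) &= 0 \quad \text{(complementary slackness)} \\
g_i(w^*) &\le 0, \qquad \alpha_i^* \ge 0
\end{aligned}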

SVM dual: the problem we worked out previously was to minimize the norm of w squared, with a one half added by convention, subject to the constraints y_i (w^T x_i + b) >= 1. Let me take each constraint and rewrite it in the form g_i(w,b) <= 0. By complementary slackness, if alpha_i is non-zero then necessarily g_i(w,b) is equal to 0, which means example i has functional margin exactly equal to 1 (recall the constraint was that the functional margin has to be greater than or equal to 1). One useful property, which turns out to be true in general as well, is that when you find a solution to this optimization problem, relatively few training examples have functional margin equal to 1; they are what we are going to call the support vectors. Setting the partial derivative of the Lagrangian with respect to w to zero shows that w is a weighted sum of your input feature vectors x_i: a sum over the examples in your training set, with weights given by the alpha_i's.
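A small numerical illustration of the two facts above: that only a few training examples end up with non-zero alpha_i (the support vectors), and that w is a weighted sum of those examples. This is my own sketch using scikit-learn's SVC with a linear kernel and a large C to approximate the hard-margin classifier described here, not code from the course:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two Gaussian blobs labeled +1 / -1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 2.0], 0.4, size=(20, 2)),
               rng.normal([-2.0, -2.0], 0.4, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

# A very large C approximates the hard-margin optimal margin classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Relatively few examples have non-zero alpha_i: the support vectors.
print("support vectors:", len(clf.support_), "out of", len(X))

# dual_coef_ stores alpha_i * y_i for the support vectors, so
# w = sum_i alpha_i y_i x_i can be rebuilt from the dual solution ...
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
# ... and it matches the primal weight vector sklearn reports.
print(np.allclose(w_from_dual, clf.coef_))
# Support vectors sit on the margin: y_i (w^T x_i + b) is (about) 1.
margins = y[clf.support_] * (X[clf.support_] @ clf.coef_.ravel() + clf.intercept_[0])
print(np.round(margins, 3))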

Kernels: substituting w back into the Lagrangian gives a function W(alpha) of the alphas alone, and my dual problem is the following: maximize W(alpha) over the alpha_i, subject to alpha_i >= 0 and sum over i of alpha_i y_i = 0. Notice that W(alpha) depends on the inputs only through inner products between training examples, which is what will let us apply kernels. Solving this dual problem is how we get the support vector machine.
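Written out, the dual problem referred to above (standard form; the inner products of x_i with x_j are the quantities a kernel would later replace):

\begin{aligned}
\max_{\alpha} \quad & W(\alpha) = \sum_{i=1}^{m} \alpha_i
  - \tfrac{1}{2} \sum_{i,j=1}^{m} y_i y_j\, \alpha_i \alpha_j\, \langle x_i, x_j \rangle \\
\text{s.t.} \quad & \alpha_i \ge 0, \quad i = 1, \dots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
\end{aligned}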
