The Standard SVM Formulation
Given an implicit embedding $\Phi$ and training data $(x_i, y_i)$ drawn from two classes with labels $y_i = \pm 1$, a Support Vector Machine finds a hyperplane $w^T \Phi(x) + b = 0$ that best separates the two classes (see Fig. 1). The learnt hyperplane is optimal in the sense that it maximises the margin, the distance from the hyperplane to the closest training points, while minimising some measure of loss on the training data.
Figure 1. The SVM learns a hyperplane which best separates the two classes. Red dots have a label $y_i = +1$ while blue dots have a label $y_i = -1$.
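For concreteness, a maximum-margin hyperplane like the one in Fig. 1 can be fit in a few lines. This is a minimal sketch assuming scikit-learn and toy Gaussian data, neither of which is part of the original text, with a linear kernel so that $\Phi(x) = x$:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two Gaussian blobs labelled +1 and -1, as in Fig. 1.
X = np.vstack([rng.normal(+2.0, 1.0, size=(20, 2)),
               rng.normal(-2.0, 1.0, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]  # the learnt hyperplane w^T x + b = 0
print("w =", w, "b =", b)
```

With a linear kernel `coef_` exposes $w$ directly; for an implicit embedding the hyperplane is only accessible through the dual expansion derived below.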
More formally, the primal formulation of the $\ell_1$ C-SVM is
$$
\min_{w,\, b,\, \xi} \;\; \frac{1}{2} w^T w + C \sum_i \xi_i
\quad \text{subject to} \quad
y_i \left( w^T \Phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0 \;\; \forall i,
$$

where $C$ is a user-specified misclassification penalty and the $\xi_i$ are slack variables measuring each point's margin violation.
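To make the objective concrete, note that at the optimum each slack equals the hinge loss $\max(0,\, 1 - y_i(w^T \Phi(x_i) + b))$, so the constraints can be folded into the objective. The NumPy sketch below evaluates the primal objective under that standard rewriting, assuming a linear kernel for illustration:

```python
import numpy as np

def primal_objective(w, b, X, y, C):
    """Evaluate the l1 C-SVM primal objective 0.5*||w||^2 + C*sum(xi).

    At the optimum each slack xi_i equals the hinge loss
    max(0, 1 - y_i * (w^T x_i + b)), so the constrained problem can be
    evaluated in this unconstrained form. Assumes a linear kernel,
    i.e. Phi(x) = x.
    """
    margins = y * (X @ w + b)                # y_i * (w^T x_i + b)
    slacks = np.maximum(0.0, 1.0 - margins)  # xi_i
    return 0.5 * np.dot(w, w) + C * np.sum(slacks)
```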
The primal variables cannot be solved for directly since $w$ is often infinite dimensional and $\Phi$ is unspecified. The solution is instead obtained by moving to a dual formulation. First, the Lagrangian is formed by adding the constraints, each weighted by a Lagrange multiplier, to the objective. Next, it is shown that for this problem the order of optimisation, an outer minimisation over the primal variables around an inner maximisation over the Lagrange multipliers, can be switched. This is helpful because the minimisation over the primal variables can then be carried out first, analytically, which simplifies many terms.
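To sketch that step (the algebra is standard, though not spelled out in the original): introducing multipliers $\alpha_i \ge 0$ for the margin constraints and $\mu_i \ge 0$ for the constraints $\xi_i \ge 0$, the Lagrangian is

$$
L(w, b, \xi, \alpha, \mu) = \frac{1}{2} w^T w + C \sum_i \xi_i
- \sum_i \alpha_i \left[ y_i \left( w^T \Phi(x_i) + b \right) - 1 + \xi_i \right]
- \sum_i \mu_i \xi_i ,
$$

and setting its derivatives with respect to the primal variables $w$, $b$ and $\xi_i$ to zero yields

$$
w = \sum_i \alpha_i y_i \Phi(x_i), \qquad \sum_i \alpha_i y_i = 0, \qquad \alpha_i = C - \mu_i .
$$

Substituting these conditions back eliminates the primal variables and leads to the following simplified dual formulation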
$$
\max_{\alpha} \;\; \mathbf{1}^T \alpha - \frac{1}{2} \alpha^T Y K Y \alpha
\quad \text{subject to} \quad
\mathbf{1}^T Y \alpha = 0, \quad 0 \le \alpha_i \le C ,
$$
where $Y$ is a diagonal matrix with the labels on the diagonal and $K$ is the kernel (Gram) matrix with entries $K_{ij} = \Phi(x_i)^T \Phi(x_j)$. The dual is an instance of a convex Quadratic Programming problem, so any local optimum is also a global optimum. Having solved for $\alpha$, the normal to the separating hyperplane turns out to be $w = \sum_i y_i \alpha_i \Phi(x_i)$, and $b$ can be recovered from $w^T \Phi(x_i) + b = y_i$ for any support vector $x_i$ with $0 < \alpha_i < C$. A novel point $x$ can now be classified as $\pm 1$ by evaluating $\operatorname{sign}(w^T \Phi(x) + b)$; since $w$ may be infinite dimensional, this is computed through the kernel as $\operatorname{sign}\left(\sum_i y_i \alpha_i \Phi(x_i)^T \Phi(x) + b\right)$.
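As a final sketch (again assuming scikit-learn, which is not part of the original text; its `dual_coef_` attribute stores the products $y_i \alpha_i$ for the support vectors only), the dual solution can be read off and used to classify a novel point:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 1.0, size=(20, 2)),
               rng.normal(-2.0, 1.0, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

alpha_y = clf.dual_coef_[0]        # y_i * alpha_i, support vectors only
sv = clf.support_vectors_          # the x_i with alpha_i > 0

# w = sum_i y_i alpha_i Phi(x_i); explicit here only because Phi(x) = x.
w = alpha_y @ sv
b = clf.intercept_[0]

x_new = np.array([0.5, 1.5])                 # a novel point
print(np.sign(w @ x_new + b))                # sign(w^T Phi(x) + b)
print(clf.predict(x_new[None, :])[0])        # agrees with the library
```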