Linear SVM


We want to find a hyper-plane $w^\top x + b = 0$ that maximizes the margin.

Set up

We first show that the vector $w$ is orthogonal to this hyper-plane. Let $x_1$ and $x_2$ be any two points on the hyper-plane, so that $w^\top x_1 + b = 0$ and $w^\top x_2 + b = 0$. Subtracting the two equations gives $w^\top (x_1 - x_2) = 0$. Since $x_1 - x_2$ is an arbitrary direction lying in the hyper-plane, $w$ is orthogonal to the hyper-plane. Next, we draw two dashed lines at $w^\top x + b = 1$ and $w^\top x + b = -1$. The value $1$ is only a convention: any positive constant $c$ works equally well, because dividing $w$ and $b$ by $c$ leaves the hyper-plane unchanged and moves the dashed lines back to $\pm 1$.
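The orthogonality argument can be checked numerically. The following is a minimal sketch in plain Python; the values of `w` and `b` and the helper `point_on_hyperplane` are hypothetical and chosen only for illustration.

```python
# Illustrative 2-D example: w and b are arbitrary assumed values.
w = (3.0, 4.0)
b = -5.0

def dot(u, v):
    """Euclidean inner product of two equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def point_on_hyperplane(first, level=0.0):
    """Return the 2-D point with the given first coordinate that
    satisfies w^T x + b = level (assumes w[1] != 0)."""
    second = (level - b - w[0] * first) / w[1]
    return (first, second)

# Two different points on the hyper-plane w^T x + b = 0.
x_a = point_on_hyperplane(-2.0)
x_b = point_on_hyperplane(7.0)

# Their difference is a direction lying inside the hyper-plane.
diff = tuple(p - q for p, q in zip(x_a, x_b))

# w^T (x_a - x_b) should vanish, up to floating-point rounding.
orthogonality = dot(w, diff)
print(orthogonality)
```

Any other pair of first coordinates gives the same result, since the difference of two points on the hyper-plane always cancels the offset $b$.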

Now pick any line parallel to $w$ (that is, orthogonal to the hyper-plane). This line intersects the two dashed lines at the points $x^{(+)}$ and $x^{(-)}$, where $x^{(+)}$ lies on $w^\top x + b = 1$ and $x^{(-)}$ lies on $w^\top x + b = -1$. We want to maximize the margin

$$\text{margin} = \| x^{(+)} - x^{(-)} \|$$
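To make the definition concrete, the sketch below finds $x^{(+)}$ and $x^{(-)}$ numerically by moving from an arbitrary starting point along $w$ until each dashed line is reached, then measures the distance between them. The values of `w`, `b`, and `x_start`, and the helper `move_along_w`, are assumptions for illustration. The result is compared against the well-known closed form $2 / \|w\|$ for this setup.

```python
import math

# Illustrative 2-D example: w, b and x_start are arbitrary assumed values.
w = (3.0, 4.0)
b = -5.0

def dot(u, v):
    """Euclidean inner product of two equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def f(x):
    """Value of w^T x + b at the point x."""
    return dot(w, x) + b

def move_along_w(x, level):
    """Slide x along the direction w until w^T x + b equals `level`."""
    t = (level - f(x)) / dot(w, w)
    return tuple(xi + t * wi for xi, wi in zip(x, w))

# Any starting point defines a line parallel to w.
x_start = (10.0, -3.0)
x_plus = move_along_w(x_start, 1.0)    # intersection with w^T x + b = 1
x_minus = move_along_w(x_start, -1.0)  # intersection with w^T x + b = -1

# margin = || x_plus - x_minus ||
margin = math.sqrt(sum((p - q) ** 2 for p, q in zip(x_plus, x_minus)))
closed_form = 2.0 / math.sqrt(dot(w, w))

# The two values should agree up to floating-point rounding.
print(margin, closed_form)
```

Changing `x_start` picks a different line parallel to $w$ but leaves the measured margin unchanged, which is why the choice of line does not matter.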

Re-express margin

Since $w$ is parallel to $x^{(+)} - x^{(-)}$, we have $x^{(+)} - x^{(-)} = \lambda w$ for some scalar $\lambda$. Then

$$x^{(+)} = \lambda w + x^{(-)}$$

Since $w^\top x^{(+)} + b = 1$

