Linear SVM
Idea
We want to find a hyperplane $w^\top x + b = 0$ that maximizes the margin.
Set up
We first show that the vector $w$ is orthogonal to this hyperplane. Let $x_1$, $x_2$ be any two points on the hyperplane, so that $w^\top x_1 + b = 0$ and $w^\top x_2 + b = 0$. Subtracting the two equations gives $w^\top (x_1 - x_2) = 0$. Since $x_1 - x_2$ is an arbitrary direction lying in the hyperplane, $w$ is orthogonal to the hyperplane. Now we set the two dashed lines to $w^\top x + b = 1$ and $w^\top x + b = -1$. The value $1$ is only a convention: any positive constant would work, because $w$ and $b$ can be rescaled accordingly.
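The orthogonality argument can be checked numerically. The following is a minimal sketch, not part of the original derivation: the helper names `dot` and `point_on_hyperplane`, and the particular values of `w` and `b`, are assumptions chosen for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def point_on_hyperplane(w, b, rng, c=0.0):
    """Return a random point x with w.x + b = c (hypothetical helper).

    The first n-1 coordinates are drawn at random and the last one is
    solved for, so this requires w[-1] != 0.
    """
    free = [rng.uniform(-5.0, 5.0) for _ in w[:-1]]
    last = (c - b - dot(w[:-1], free)) / w[-1]
    return free + [last]


rng = random.Random(0)          # fixed seed for reproducibility
w, b = [2.0, -1.0, 3.0], 0.5    # illustrative values, not from the article

x1 = point_on_hyperplane(w, b, rng)
x2 = point_on_hyperplane(w, b, rng)
diff = [p - q for p, q in zip(x1, x2)]

# w.(x1 - x2) should vanish up to floating-point rounding
print(abs(dot(w, diff)) < 1e-9)
```

Any other pair of points on the same hyperplane should give the same result, which is what makes $w$ a normal vector.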
Now pick any line parallel to $w$ (and therefore orthogonal to the hyperplane). This line intersects the two dashed lines at the points $x^{(+)}$ and $x^{(-)}$, respectively. We want to maximize the margin
$$\text{margin} = \| x^{(+)} - x^{(-)} \|$$
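As a rough illustration of why "any line parallel to $w$" is enough, the sketch below starts from two different points on $w^\top x + b = -1$, walks along $w$ until it reaches $w^\top x + b = 1$, and measures the distance travelled. The function name `margin_along_w` and the numbers used are assumptions made for this example only.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def margin_along_w(w, b, x_minus):
    """Walk from x_minus (on w.x + b = -1) along w to the line w.x + b = 1.

    Returns the intersection point x_plus and the distance ||x_plus - x_minus||.
    """
    # Solve w.(x_minus + t*w) + b = 1 for the step size t
    t = (1.0 - (dot(w, x_minus) + b)) / dot(w, w)
    x_plus = [xm + t * wi for xm, wi in zip(x_minus, w)]
    dist = math.sqrt(sum((p - m) ** 2 for p, m in zip(x_plus, x_minus)))
    return x_plus, dist


w, b = [3.0, 4.0], 1.0                       # illustrative values
starts = [[-2.0 / 3.0, 0.0], [2.0, -2.0]]    # both satisfy w.x + b = -1

margins = []
for x_minus in starts:
    x_plus, m = margin_along_w(w, b, x_minus)
    margins.append(m)
    print(x_minus, "->", x_plus, "margin:", m)
```

Both starting points should report the same margin up to rounding, so the choice of line does not affect the quantity being maximized.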
Re-express margin
Since $w$ is parallel to $x^{(+)} - x^{(-)}$, we have $x^{(+)} - x^{(-)} = \lambda w$ for some scalar $\lambda$. Then
$$x^{(+)} = \lambda w + x^{(-)}$$
Since $w^\top x^{(+)} + b = 1$