The Support Vector Machine (SVM) is a supervised machine-learning method aimed mainly at binary classification: it looks for the decision boundary that best separates the two classes.
Problem: given two classes of data points, find a decision boundary that separates them as well as possible.
Mathematical expression of the decision boundary:
$$\boldsymbol{w}^{T} \boldsymbol{x}+b=0$$
Here $\boldsymbol{w}$ is a vector (the normal of the boundary), so this one expression covers the decision boundary whether it is a line or a plane.
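A one-line check (my own addition) of why $w$ is the normal: for any two points $x_{1}, x_{2}$ on the boundary, $w^{T} x_{1}+b=0$ and $w^{T} x_{2}+b=0$, so
$$w^{T}\left(x_{1}-x_{2}\right)=0,$$
i.e. $w$ is orthogonal to every direction lying inside the boundary.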
Distance from a point to the decision boundary (computed here as point-to-plane distance, since the plane case is the more common and more general one):
Assume $\mathbf{x}^{\prime}$ lies on the decision plane, so $\mathbf{w}^{T} \mathbf{x}^{\prime}=-b$. Form the vector $\mathbf{x}-\mathbf{x}^{\prime}$ and, using vector projection, project it onto the plane's normal direction $\mathbf{w}$; the length of the projection is
$$\left|\frac{\mathbf{w}^{T}}{\|\mathbf{w}\|}\left(\mathbf{x}-\mathbf{x}^{\prime}\right)\right|.$$
Substituting $\mathbf{w}^{T} \mathbf{x}^{\prime}=-b$ and simplifying gives the point-to-plane distance:
$$\operatorname{distance}(\mathbf{x}, b, \mathbf{w})=\frac{1}{\|\mathbf{w}\|}\left|\mathbf{w}^{T} \mathbf{x}+b\right|$$
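As a small numeric sanity check (my own sketch, not part of the original notes), the formula can be evaluated directly with NumPy:

import numpy as np

w = np.array([0.5, 0.5])   # normal vector of an example plane (illustrative values)
b = -2.0
x = np.array([3.0, 3.0])   # an example point

# distance(x, b, w) = |w.x + b| / ||w||
print(abs(w @ x + b) / np.linalg.norm(w))  # 1.414..., i.e. sqrt(2)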
The optimization objective:
Find the decision boundary (the $\mathbf{w}$, $b$) for which the points closest to the boundary are as far from it as possible.
Deriving the objective function:
In SVM the label values are defined as 1 and -1 (logistic regression uses 0 and 1).
Let the decision function be $y(x)=w^{T} \Phi(x)+b$; then $y\left(x_{i}\right)>0$ means $y_{i}=1$ and $y\left(x_{i}\right)<0$ means $y_{i}=-1$ (in the decision function, $\Phi(x)$ denotes a kernel transformation applied to the raw data). Hence, for every correctly classified point,
$$y_{i} \cdot y\left(x_{i}\right)>0$$
Using $y_{i} \cdot y\left(x_{i}\right)>0$, the point-to-plane distance simplifies to
$$\operatorname{distance}\left(x_{i}, b, \mathbf{w}\right)=\frac{y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)}{\|w\|}$$
Following the optimization objective, the objective function is
$$\underset{w, b}{\arg \max }\left\{\frac{1}{\|w\|} \min _{i}\left[y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)\right]\right\}$$
Simplifying the objective function:
By rescaling (multiplying $w$ and $b$ by the same factor) we can force the functional margin of every point to satisfy
$$y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right) \geq 1$$
(rescaling the functional margin does not change the geometric margin).
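Why the rescaling is harmless (my own one-line check): for any $k>0$,
$$\frac{y_{i}\left((k w)^{T} \Phi\left(x_{i}\right)+k b\right)}{\|k w\|}=\frac{k\, y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)}{k\,\|w\|}=\frac{y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)}{\|w\|},$$
so the geometric margin, and therefore the optimum, is unchanged.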
With that normalization we get directly $\min _{i}\left[y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)\right]=1$, so the objective becomes
$$\underset{w, b}{\arg \max }\ \frac{1}{\|w\|}$$
$$\text{s.t. } y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right) \geq 1$$
Converting to a minimization problem, the objective becomes
$$\min _{w, b}\ \frac{1}{2}\|w\|^{2}$$
$$\text{s.t. } y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right) \geq 1$$
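This primal form is just a small quadratic program, so it can be sanity-checked numerically before the analytical derivation below. A minimal sketch (my own, using scipy and the same three points the worked example uses later):

import numpy as np
from scipy.optimize import minimize

# Toy data: positives (3,3), (4,3); negative (1,1) -- the worked example below.
X = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1., 1., -1.])

def objective(p):               # p = [w1, w2, b]
    return 0.5 * p[:2] @ p[:2]  # (1/2)||w||^2

# One margin constraint y_i (w.x_i + b) >= 1 per sample.
cons = [{'type': 'ineq', 'fun': lambda p, i=i: y[i] * (X[i] @ p[:2] + p[2]) - 1.0}
        for i in range(len(y))]

res = minimize(objective, np.zeros(3), constraints=cons)
print(res.x)  # approximately [0.5, 0.5, -2.0], matching the hand-derived plane below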
Solving the objective function:
Apply the method of Lagrange multipliers to the constrained minimization, introducing multipliers $\alpha_{i}$:
$$L(w, b, \alpha)=\frac{1}{2}\|w\|^{2}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)-1\right)$$
Convert the minimization into its dual: first maximize, then minimize; and since the problem satisfies the KKT conditions, the order of max and min can be exchanged. The objective becomes a minimize-then-maximize problem:
$$\max _{\alpha} \min _{w, b} L(w, b, \alpha)$$
Taking the partial derivative with respect to $w$:
$$\frac{\partial L}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)$$
Taking the partial derivative with respect to $b$:
$$\frac{\partial L}{\partial b}=0 \Rightarrow 0=\sum_{i=1}^{n} \alpha_{i} y_{i}$$
Substituting $w=\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)$ and $0=\sum_{i=1}^{n} \alpha_{i} y_{i}$ into $L(w, b, \alpha)$ gives
$$\begin{aligned} L(w, b, \alpha) &=\frac{1}{2} w^{T} w-w^{T} \sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)-b \sum_{i=1}^{n} \alpha_{i} y_{i}+\sum_{i=1}^{n} \alpha_{i} \\ &=\sum_{i=1}^{n} \alpha_{i}-\frac{1}{2}\left(\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)\right)^{T} \sum_{j=1}^{n} \alpha_{j} y_{j} \Phi\left(x_{j}\right) \\ &=\sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j} \Phi^{T}\left(x_{i}\right) \Phi\left(x_{j}\right) \end{aligned}$$
This completes the minimization half of $L(w, b, \alpha)$, i.e. which $w$, $b$ make $L(w, b, \alpha)$ smallest. What remains is to find the $\alpha$ that makes $L(w, b, \alpha)$ largest:
$$\max _{\alpha} \sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(\Phi\left(x_{i}\right) \cdot \Phi\left(x_{j}\right)\right)$$
$$\text{s.t. } \sum_{i=1}^{n} \alpha_{i} y_{i}=0, \quad \alpha_{i} \geq 0$$
Converting once more to a minimization problem:
$$\min _{\alpha} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(\Phi\left(x_{i}\right) \cdot \Phi\left(x_{j}\right)\right)-\sum_{i=1}^{n} \alpha_{i}$$
$$\text{s.t. } \sum_{i=1}^{n} \alpha_{i} y_{i}=0, \quad \alpha_{i} \geq 0$$
Since the expression involves inner products of the raw data, plug in the data before solving further.
Example data -- positives: (3,3), (4,3); negative: (1,1).
Substituting the data into
$$\min _{\alpha} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(\Phi\left(x_{i}\right) \cdot \Phi\left(x_{j}\right)\right)-\sum_{i=1}^{n} \alpha_{i}$$
gives
$$\frac{1}{2}\left(18 \alpha_{1}^{2}+25 \alpha_{2}^{2}+2 \alpha_{3}^{2}+42 \alpha_{1} \alpha_{2}-12 \alpha_{1} \alpha_{3}-14 \alpha_{2} \alpha_{3}\right)-\alpha_{1}-\alpha_{2}-\alpha_{3}$$
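Where the coefficients come from (my own expansion for clarity): they are the pairwise inner products,
$$x_{1} \cdot x_{1}=18, \quad x_{2} \cdot x_{2}=25, \quad x_{3} \cdot x_{3}=2, \quad x_{1} \cdot x_{2}=21, \quad x_{1} \cdot x_{3}=6, \quad x_{2} \cdot x_{3}=7,$$
each weighted by $y_{i} y_{j}$; e.g. the cross term $2 y_{1} y_{2}\left(x_{1} \cdot x_{2}\right) \alpha_{1} \alpha_{2}=42 \alpha_{1} \alpha_{2}$, while $2 y_{1} y_{3}\left(x_{1} \cdot x_{3}\right) \alpha_{1} \alpha_{3}=-12 \alpha_{1} \alpha_{3}$.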
The equality constraint $\sum_{i} \alpha_{i} y_{i}=0$ gives $\alpha_{1}+\alpha_{2}=\alpha_{3}$, so the expression simplifies to
$$4 \alpha_{1}^{2}+\frac{13}{2} \alpha_{2}^{2}+10 \alpha_{1} \alpha_{2}-2 \alpha_{1}-2 \alpha_{2}$$
Setting the partial derivatives with respect to $\alpha_{1}$ and $\alpha_{2}$ to zero gives $\alpha_{1}=1.5$, $\alpha_{2}=-1$.
This result violates the constraint $\alpha_{i} \geq 0$, so the solution must lie on the boundary of the feasible region.
Substituting $\alpha_{1}=0$, $\alpha_{2}=2/13$ (the minimizer along the edge $\alpha_{1}=0$) into $4 \alpha_{1}^{2}+\frac{13}{2} \alpha_{2}^{2}+10 \alpha_{1} \alpha_{2}-2 \alpha_{1}-2 \alpha_{2}$ gives $-0.153$.
Substituting $\alpha_{1}=0.25$, $\alpha_{2}=0$ (the minimizer along the edge $\alpha_{2}=0$) into the same expression gives $-0.25$, which is smaller.
So the solution is $\alpha_{1}=0.25$, $\alpha_{2}=0$, $\alpha_{3}=0.25$.
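The same boundary-case reasoning can be cross-checked by handing the constrained dual to an off-the-shelf solver. A minimal sketch (my own, using scipy):

import numpy as np
from scipy.optimize import minimize

X = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1., 1., -1.])
G = (y[:, None] * X) @ (y[:, None] * X).T  # G[i, j] = y_i y_j (x_i . x_j)

def dual(a):
    return 0.5 * a @ G @ a - a.sum()

res = minimize(dual, np.zeros(3),
               bounds=[(0., None)] * 3,                             # alpha_i >= 0
               constraints={'type': 'eq', 'fun': lambda a: a @ y})  # sum alpha_i y_i = 0
print(res.x)  # approximately [0.25, 0, 0.25]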
Substituting the solution into the expression for $w$, $w=\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)$:
$$w=\frac{1}{4} \cdot 1 \cdot(3,3)+\frac{1}{4} \cdot(-1) \cdot(1,1)=\left(\frac{1}{2}, \frac{1}{2}\right)$$
Substituting into the expression for $b$ (evaluated at any support vector $x_{j}$), $b=y_{j}-\sum_{i=1}^{n} \alpha_{i} y_{i}\left(x_{i} \cdot x_{j}\right)$:
$$b=1-\left(\frac{1}{4} \cdot 1 \cdot 18+\frac{1}{4} \cdot(-1) \cdot 6\right)=-2$$
So the equation of the separating plane is
$$0.5 x_{1}+0.5 x_{2}-2=0$$
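As a final cross-check (my own sketch), fitting sklearn's linear SVC on the same three points with a very large C (approximating the hard margin) recovers the same plane:

import numpy as np
from sklearn.svm import SVC

X = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1, 1, -1])

clf = SVC(kernel='linear', C=1e6)  # large C ~ hard margin
clf.fit(X, y)
print(clf.coef_, clf.intercept_)   # approximately [[0.5, 0.5]] and [-2.0]
print(clf.support_vectors_)        # (3,3) and (1,1) -- the points with alpha_i > 0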
Support vectors:
When solving for the $\alpha$ values we saw that not every point's $\alpha$ is nonzero. In fact only the points closest to the decision boundary have $\alpha \neq 0$, i.e. only those points influence where the boundary lies. Such points are called support vectors.
Soft-Margin
The soft-margin problem improves on the hard-margin one. It addresses noisy data: if the boundary had to respect every noise point, a usable boundary could be very hard to find, so a loss term is introduced into the objective.
The loss used here is
$$\max \left(0, 1-y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right)\right)$$
(note the label factor $y_{i}$: the loss is zero exactly when a point is on the correct side with margin at least 1). This loss is called the hinge loss, because its graph looks like a hinge.
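A minimal NumPy sketch of the hinge loss (my own illustration):

import numpy as np

def hinge_loss(w, b, X, y):
    """Average of max(0, 1 - y_i (w.x_i + b)) over a dataset."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins).mean()

# Points on or outside the margin contribute 0; violations grow linearly.
X = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1., 1., -1.])
print(hinge_loss(np.array([0.5, 0.5]), -2.0, X, y))  # 0.0 -- every margin >= 1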
So the new objective function is
$$\min \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \max \left(0, 1-y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right)\right)$$
$$\text{s.t. } y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right) \geq 1-\max \left(0, 1-y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right)\right)$$
Here $C$ is a hyperparameter, the penalty coefficient: a large $C$ means classification is strict and errors are barely tolerated; a small $C$ means larger errors can be tolerated.
Define the slack variable $\xi_{i}=\max \left(0, 1-y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right)\right)$ (the hinge loss of point $i$), so the objective becomes
$$\min \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi_{i}$$
$$\text{s.t. } y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right) \geq 1-\xi_{i}, \quad \xi_{i} \geq 0$$
Apply the Lagrangian method to this objective:
$$L(w, b, \xi, \alpha, \mu) \equiv \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi_{i}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(w \cdot x_{i}+b\right)-1+\xi_{i}\right)-\sum_{i=1}^{n} \mu_{i} \xi_{i}$$
As before, convert $L(w, b, \xi, \alpha, \mu)$ into a minimize-then-maximize problem:
$$\max _{\alpha} \min _{w, b, \xi} L(w, b, \xi, \alpha, \mu)$$
Taking the partial derivative with respect to $w$:
$$\frac{\partial L}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)$$
Taking the partial derivative with respect to $b$:
$$\frac{\partial L}{\partial b}=0 \Rightarrow 0=\sum_{i=1}^{n} \alpha_{i} y_{i}$$
Taking the partial derivative with respect to $\xi_{i}$:
$$\frac{\partial L}{\partial \xi_{i}}=0 \Rightarrow C-\alpha_{i}-\mu_{i}=0$$
In addition there are two multiplier constraints: $\alpha_{i} \geq 0$ and $\mu_{i} \geq 0$.
Substituting the three identities into the Lagrangian and converting the maximization into a minimization gives
$$\min _{\alpha} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{n} \alpha_{i}$$
$$\text{s.t. } \sum_{i=1}^{n} \alpha_{i} y_{i}=0, \quad 0 \leq \alpha_{i} \leq C$$
The box constraint follows by combining $C-\alpha_{i}-\mu_{i}=0$ with the multiplier constraints: since $\mu_{i}=C-\alpha_{i} \geq 0$, we get $\alpha_{i} \leq C$. As before, plug in the data to obtain the $\alpha$ values, compute $\mathbf{w}$ and $b$ from them, and read off the final decision boundary.
Kernel methods
To handle data that are not separable in low dimensions, a kernel method maps the low-dimensional data into a higher-dimensional space and looks for the decision boundary there (a boundary is easier to find in higher dimensions; in principle, given enough dimensions, any data set can be separated).
Example:
Two points, $X=(1,2,3)$ and $Y=(4,5,6)$, are mapped to higher dimensions via
$$f(x)=\left(x_{1} x_{1}, x_{1} x_{2}, x_{1} x_{3}, x_{2} x_{1}, x_{2} x_{2}, x_{2} x_{3}, x_{3} x_{1}, x_{3} x_{2}, x_{3} x_{3}\right)$$
The results are
$$f(X)=(1,2,3,2,4,6,3,6,9)$$
$$f(Y)=(16,20,24,20,25,30,24,30,36)$$
The inner product of $X$ and $Y$ in the mapped space is then
$$\langle f(X), f(Y)\rangle=16+40+72+40+100+180+72+180+324=1024$$
Computing it this way gives the answer, but when the dimension gets large the computation becomes far too expensive.
Notice, however, that
$$K(x, y)=(\langle x, y\rangle)^{2}$$
gives the same value:
$$K(X, Y)=(4+10+18)^{2}=32^{2}=1024$$
This greatly reduces the computational cost.
So the kernel trick never actually computes in the high-dimensional space: it carries out the high-dimensional inner-product computation while staying in the low-dimensional space. That is one of its key benefits.
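A quick NumPy check of the example above (my own sketch):

import numpy as np

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])

# Explicit 9-dimensional feature map f(v) = (v_i * v_j for all i, j).
def f(v):
    return np.outer(v, v).ravel()

print(f(x) @ f(y))   # 1024.0 -- inner product in the mapped space
print((x @ y) ** 2)  # 1024.0 -- the same value via the kernel K(x, y) = <x, y>^2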
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator was removed in newer versions
from sklearn.model_selection import train_test_split
# Helper for plotting an SVM decision function-------------------------------------------------------------------------
def plot_svc_decision_function(model, ax=None, plot_support=True):
    """Plot the decision function for a 2D SVC"""
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    # create grid to evaluate model
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    # plot decision boundary and margins
    ax.contour(X, Y, P, colors='k',
               levels=[-1, 0, 1], alpha=0.5,
               linestyles=['--', '-', '--'])
    # plot support vectors
    if plot_support:
        ax.scatter(model.support_vectors_[:, 0],
                   model.support_vectors_[:, 1],
                   s=300, linewidth=1, facecolors='none')
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
# A basic SVM-----------------------------------------------------------------------------------------------------------
from sklearn.svm import SVC
X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.8)  # generate data
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = SVC(kernel="linear")  # gamma is unused by the linear kernel
model.fit(X_train, y_train)
plt.scatter(X[:, 0], X[:, 1], c=y)
plot_svc_decision_function(model)
plt.show()
print(model.support_vectors_)
print(model.score(X_test, y_test))
# SVM with a kernel function--------------------------------------------------------------------------------------------
from sklearn.datasets import make_circles  # concentric-circle data
X, y = make_circles(100, factor=0.1, noise=0.1)
X_train, X_test, y_train, y_test = train_test_split(X, y)
clf = SVC(kernel="rbf", C=1E6)  # Gaussian (RBF) kernel
clf.fit(X_train, y_train)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
plot_svc_decision_function(clf)
plt.show()
print(clf.predict(X_test))
print(clf.score(X_test, y_test))
# Comparing different values of C--------------------------------------------------------------------------------------
X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.8)
X_train, X_test, y_train, y_test = train_test_split(X, y)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap="autumn")
model = SVC(kernel="linear", C=10)
model.fit(X_train, y_train)
plot_svc_decision_function(model)
plt.show()
print(model.score(X_test, y_test))
model = SVC(kernel="linear", C=0.1)
model.fit(X_train, y_train)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap="autumn")
plot_svc_decision_function(model)
plt.show()
print(model.score(X_test, y_test))
A face-recognition example
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
# Load the face dataset-------------------------------------------------------------------------------------------------
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=60)  # keep only people with 60+ images
print(faces.target_names)
print(faces.images.shape)
# Show a few of the faces-----------------------------------------------------------------------------------------------
fig, ax = plt.subplots(3, 5)
for i, axi in enumerate(ax.flat):
    axi.imshow(faces.images[i], cmap='bone')
    axi.set(xticks=[], yticks=[],
            xlabel=faces.target_names[faces.target[i]])
plt.show()
# Each pixel is a feature, which is too many; reduce with PCA, then feed the SVM----------------------------------------
pca = PCA(n_components=150, whiten=True, random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)  # pipeline: data goes through PCA first, then the SVM
# Build train/test sets and grid-search for the best parameters
Xtrain, Xtest, ytrain, ytest = train_test_split(faces.data, faces.target, test_size=0.2)
param_grid = {'svc__C': [1, 5, 10, 50],
              'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)
grid.fit(Xtrain, ytrain)
print(grid.best_params_)
# Predict the test set with the best classifier-------------------------------------------------------------------------
model = grid.best_estimator_
yfit = model.predict(Xtest)
# Show predictions against the true labels------------------------------------------------------------------------------
fig, ax = plt.subplots(4, 6)
for i, axi in enumerate(ax.flat):
    axi.imshow(Xtest[i].reshape(62, 47), cmap='bone')
    axi.set(xticks=[], yticks=[])
    axi.set_ylabel(faces.target_names[yfit[i]].split()[-1],
                   color='black' if yfit[i] == ytest[i] else 'red')  # correct names in black, wrong ones in red
fig.suptitle('Predicted Names; Incorrect Labels in Red', size=14)
plt.show()
# Summarize the results with classification_report----------------------------------------------------------------------
from sklearn.metrics import classification_report
print(classification_report(ytest, yfit,
                            target_names=faces.target_names))
# Show the confusion matrix---------------------------------------------------------------------------------------------
import seaborn as sns
from sklearn.metrics import confusion_matrix
mat = confusion_matrix(ytest, yfit)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=faces.target_names,
            yticklabels=faces.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label')
plt.show()