Support Vector Machines and a Case Study: SVM Parameter Tuning

Support Vector Machine (SVM)

The questions to address: What makes one decision boundary better than another? What if the feature data themselves are hard to separate? What is the computational complexity, and is it practical to apply?
Goal: derive the SVM from the questions above

![](笔记图片/image-20220114140117203.png)

Decision boundary: pick the one farthest from the "minefield" (the minefield being the points right at the margin; we want a large margin)

![](笔记图片/image-20220114140645085.png)

Computing the distance

The distance from a point $x$ to the hyperplane $w^{T} \Phi(x)+b=0$ is $\frac{\left|w^{T} \Phi(x)+b\right|}{\|w\|}$

![](笔记图片/image-20220114140726896.png)

Labeling the data
Dataset: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$y$ is the sample's class label: $y = +1$ when $x$ is a positive example, $y = -1$ when $x$ is a negative example
Decision function: $y(x)=w^{T} \Phi(x)+b$, where $\Phi(x)$ is a transformation of the data (more on this later)

$\Rightarrow \begin{cases} y\left(x_{i}\right)>0 \Leftrightarrow y_{i}=+1 \\ y\left(x_{i}\right)<0 \Leftrightarrow y_{i}=-1 \end{cases} \quad \Rightarrow y_{i} \cdot y\left(x_{i}\right)>0$

The optimization objective
Intuitively: find a line (a $w$ and $b$) such that the points closest to it (the minefield) are as far from it as possible

Simplifying the point-to-line distance gives: $\frac{y_{i} \cdot\left(w^{T} \cdot \Phi\left(x_{i}\right)+b\right)}{\|w\|}$ (since $y_{i} \cdot y\left(x_{i}\right)>0$, the absolute value can be removed without changing the value)

Objective function
Rescaling: the decision function $(w, b)$ can be rescaled so that the closest points satisfy $|y(x)| \geq 1$
$\Rightarrow y_{i} \cdot\left(w^{T} \cdot \Phi\left(x_{i}\right)+b\right) \geq 1$ (previously we only required this to be positive; now the condition is stricter)
Optimization objective: $\underset{w, b}{\arg \max }\left\{\frac{1}{\|w\|} \min _{i}\left[y_{i} \cdot\left(w^{T} \cdot \Phi\left(x_{i}\right)+b\right)\right]\right\}$
Since $y_{i} \cdot\left(w^{T} \cdot \Phi\left(x_{i}\right)+b\right) \geq 1$ with equality for the closest points, the inner minimum is 1, so we only need $\underset{w, b}{\arg \max } \frac{1}{\|w\|}$ (objective function done!)

Current objective: $\max _{w, b} \frac{1}{\|w\|}$, subject to: $y_{i}\left(w^{T} \cdot \Phi\left(x_{i}\right)+b\right) \geq 1$

Standard trick: convert the maximization into a minimization: $\min _{w, b} \frac{1}{2}\|w\|^{2}$ (maximizing $\frac{1}{\|w\|}$ is equivalent to minimizing $\|w\|$; squaring and the factor $\frac{1}{2}$ are for convenience when differentiating)

How to solve it: with the method of Lagrange multipliers

Lagrange multipliers
The constrained optimization problem:

$$\begin{array}{ll} \min _{x} \quad f_{0}(x) & \\ \text {subject to } & f_{i}(x) \leq 0, \quad i=1, \ldots, m \\ & h_{i}(x)=0, \quad i=1, \ldots, q \end{array}$$

Converted form: $\min L(x, \lambda, \nu)=f_{0}(x)+\sum_{i=1}^{m} \lambda_{i} f_{i}(x)+\sum_{i=1}^{q} \nu_{i} h_{i}(x)$

Our Lagrangian: $L(w, b, \alpha)=\frac{1}{2}\|w\|^{2}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(w^{T} \cdot \Phi\left(x_{i}\right)+b\right)-1\right)$ (don't forget the constraint: $y_{i}\left(w^{T} \cdot \Phi\left(x_{i}\right)+b\right) \geq 1$)

Solving the SVM

Take partial derivatives of $L$ with respect to $w$ and $b$; each yields a condition. By duality, the min and max can be swapped:

$\min _{w, b} \max _{\alpha} L(w, b, \alpha) \rightarrow \max _{\alpha} \min _{w, b} L(w, b, \alpha)$

Partial derivative with respect to $w$: $\frac{\partial L}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)$

Partial derivative with respect to $b$: $\frac{\partial L}{\partial b}=0 \Rightarrow \sum_{i=1}^{n} \alpha_{i} y_{i}=0$

Substituting back into $L(w, b, \alpha)=\frac{1}{2}\|w\|^{2}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(w^{T} \cdot \Phi\left(x_{i}\right)+b\right)-1\right)$, with $w=\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)$ and $\sum_{i=1}^{n} \alpha_{i} y_{i}=0$:

$$\begin{aligned} L &=\frac{1}{2} w^{T} w-w^{T} \sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)-b \sum_{i=1}^{n} \alpha_{i} y_{i}+\sum_{i=1}^{n} \alpha_{i} \\ &=\sum_{i=1}^{n} \alpha_{i}-\frac{1}{2}\left(\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)\right)^{T} \sum_{j=1}^{n} \alpha_{j} y_{j} \Phi\left(x_{j}\right) \\ &=\sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j} \Phi^{T}\left(x_{i}\right) \Phi\left(x_{j}\right) \end{aligned}$$

This completes the first step, solving $\min _{w, b} L(w, b, \alpha)$

Now maximize over $\alpha$: $\max _{\alpha} \sum_{i=1}^{n} \alpha_{i}-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(\Phi\left(x_{i}\right) \cdot \Phi\left(x_{j}\right)\right)$ subject to: $\sum_{i=1}^{n} \alpha_{i} y_{i}=0$, $\alpha_{i} \geq 0$

Convert the maximization into a minimization: $\min _{\alpha} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(\Phi\left(x_{i}\right) \cdot \Phi\left(x_{j}\right)\right)-\sum_{i=1}^{n} \alpha_{i}$ subject to: $\sum_{i=1}^{n} \alpha_{i} y_{i}=0$, $\alpha_{i} \geq 0$

A worked SVM example

Data: three points, with positive examples $x_1=(3,3)$, $x_2=(4,3)$ and negative example $x_3=(1,1)$

Solve: $\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{n} \alpha_{i}$

Constraints: $\alpha_{1}+\alpha_{2}-\alpha_{3}=0$, $\quad \alpha_{i} \geq 0,\ i=1,2,3$

![](笔记图片/image-20220114145924388.png)

Original objective: $\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{n} \alpha_{i}$; substituting the data gives:

$\frac{1}{2}\left(18 \alpha_{1}^{2}+25 \alpha_{2}^{2}+2 \alpha_{3}^{2}+42 \alpha_{1} \alpha_{2}-12 \alpha_{1} \alpha_{3}-14 \alpha_{2} \alpha_{3}\right)-\alpha_{1}-\alpha_{2}-\alpha_{3}$

Since $\alpha_{1}+\alpha_{2}=\alpha_{3}$, this simplifies to: $4 \alpha_{1}^{2}+\frac{13}{2} \alpha_{2}^{2}+10 \alpha_{1} \alpha_{2}-2 \alpha_{1}-2 \alpha_{2}$
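
The intermediate arithmetic, expanding each $\alpha_3$ term with $\alpha_3 = \alpha_1 + \alpha_2$:

$$\begin{aligned} 2 \alpha_{3}^{2} &= 2 \alpha_{1}^{2}+4 \alpha_{1} \alpha_{2}+2 \alpha_{2}^{2} \\ -12 \alpha_{1} \alpha_{3} &= -12 \alpha_{1}^{2}-12 \alpha_{1} \alpha_{2} \\ -14 \alpha_{2} \alpha_{3} &= -14 \alpha_{1} \alpha_{2}-14 \alpha_{2}^{2} \end{aligned}$$

Collecting terms turns the bracket into $8 \alpha_{1}^{2}+13 \alpha_{2}^{2}+20 \alpha_{1} \alpha_{2}$; halving it and subtracting $\alpha_{1}+\alpha_{2}+\alpha_{3}=2 \alpha_{1}+2 \alpha_{2}$ gives the simplified form above.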

Setting the partial derivatives with respect to $\alpha_1$ and $\alpha_2$ to zero gives $\alpha_{1}=1.5$, $\alpha_{2}=-1$, which violates the constraint $\alpha_{i} \geq 0,\ i=1,2,3$, so the minimum must lie on the boundary of the feasible region.

On the boundary $\alpha_{1}=0$: minimizing over $\alpha_2$ gives $\alpha_{2}=2/13$; substituting into the objective gives $-2/13 \approx -0.153$

On the boundary $\alpha_{2}=0$: minimizing over $\alpha_1$ gives $\alpha_{1}=0.25$; substituting gives $-0.25$, the smaller of the two (this is it!)

The minimum is therefore attained at $\alpha=(0.25,\ 0,\ 0.25)$

Substitute the $\alpha$ values into $w=\sum_{i=1}^{n} \alpha_{i} y_{i} \Phi\left(x_{i}\right)$:

$w=\frac{1}{4} \cdot 1 \cdot(3,3)+\frac{1}{4} \cdot(-1) \cdot(1,1)=\left(\frac{1}{2}, \frac{1}{2}\right)$

$b=y_{j}-\sum_{i=1}^{n} \alpha_{i} y_{i}\left(x_{i} \cdot x_{j}\right)$ for any support vector $x_j$; taking $x_j = x_1$: $b=1-\left(\frac{1}{4} \cdot 1 \cdot 18+\frac{1}{4} \cdot(-1) \cdot 6\right)=-2$

The separating hyperplane is: $0.5 x_{1}+0.5 x_{2}-2=0$
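
As a sanity check, the same solution can be recovered numerically with scikit-learn (a sketch; a very large C approximates the hard-margin problem):

# Numerical check of the worked example
import numpy as np
from sklearn.svm import SVC

X = np.array([[3, 3], [4, 3], [1, 1]])
y = np.array([1, 1, -1])

clf = SVC(kernel='linear', C=1e10).fit(X, y)
print(clf.coef_)       # expect approximately [[0.5, 0.5]]
print(clf.intercept_)  # expect approximately [-2.]
print(clf.support_)    # indices of the support vectors (x1 and x3)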

Support vectors: the data points that actually matter, i.e. those with $\alpha \neq 0$ (by complementary slackness, $\alpha_{i}\left(y_{i}\left(w^{T} \Phi\left(x_{i}\right)+b\right)-1\right)=0$, so $\alpha_i>0$ only for points exactly on the margin)

![](笔记图片/image-20220114151716334.png)

soft-margin

Soft margin: data sometimes contain noisy points, and if we insist on honoring them, the resulting line is poor

The earlier formulation requires the two classes to be separated perfectly, which is a bit too strict; let's relax it a little!

To address this, introduce a slack variable $\xi_i$ for each point: $y_{i}\left(w \cdot x_{i}+b\right) \geq 1-\xi_{i}$

![](笔记图片/image-20220114151826662.png)

New objective function: $\min \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi_{i}$

When C is very large: classification must be strict, with essentially no errors allowed; when C is very small: larger errors are tolerated. C is a hyperparameter we must choose (we will explore it experimentally in the case study below).

Lagrangian: $L(w, b, \xi, \alpha, \mu) \equiv \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi_{i}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(w \cdot x_{i}+b\right)-1+\xi_{i}\right)-\sum_{i=1}^{n} \mu_{i} \xi_{i}$

$w=\sum_{i=1}^{n} \alpha_{i} y_{i} \phi\left(x_{i}\right), \qquad \min _{\alpha} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{n} \alpha_{i}$

Constraints: $\sum_{i=1}^{n} \alpha_{i} y_{i}=0$, $C-\alpha_{i}-\mu_{i}=0$, $\alpha_{i} \geq 0$, $\mu_{i} \geq 0$. Solving in the same way yields the dual: $\min _{\alpha} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{n} \alpha_{i}$ subject to $\sum_{i=1}^{n} \alpha_{i} y_{i}=0$ and $0 \leq \alpha_{i} \leq C$.
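
A quick experiment (a sketch on synthetic blob data) showing that a smaller C tolerates more margin violations and therefore typically keeps more support vectors:

# Smaller C -> softer margin -> usually more support vectors
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=1.2)
for C in [0.01, 1.0, 100.0]:
    n_sv = SVC(kernel='linear', C=C).fit(X, y).n_support_.sum()
    print('C = {}: {} support vectors'.format(C, n_sv))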

When the data are not separable in low dimensions

Kernel transformation: if the data cannot be separated in a low-dimensional space, map them into a higher-dimensional one

![](笔记图片/image-20220114153114674.png)

Goal: find a suitable transformation, i.e. the map $\phi(X)$

![](笔记图片/image-20220114153211922.png)

![](笔记图片/image-20220114153235089.png)

Gaussian kernel: $K(\mathrm{X}, \mathrm{Y})=\exp \left\{-\frac{\|X-Y\|^{2}}{2 \sigma^{2}}\right\}$

![](笔记图片/image-20220114153324745.png)
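
A minimal sketch of this kernel in code (note that sklearn parameterizes the RBF kernel as $\exp(-\gamma\|X-Y\|^2)$, i.e. $\gamma = 1/(2\sigma^2)$):

# Gaussian (RBF) kernel between two vectors; sigma is the bandwidth
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

print(gaussian_kernel(np.array([0., 0.]), np.array([1., 1.])))  # exp(-1) ~ 0.368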

Case Study: SVM Parameter Tuning

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# use seaborn plotting defaults
import seaborn as sns; sns.set()

The basic idea of support vectors

<img src="…/images/3.png" width="900">

How do we solve this linearly inseparable problem? Let's map it to a higher dimension and try:

$z = x^{2} + y^{2}$
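
As a quick sketch of this idea on the circles dataset used later in this notebook (assuming 2-D points in an array X of shape (n, 2)):

# Hand-crafted feature lift z = x^2 + y^2: inner-circle points get a small z,
# outer-ring points a large z, so a plane in (x, y, z) space can separate them
from sklearn.datasets import make_circles

X, y = make_circles(100, factor=.1, noise=.1)
z = (X ** 2).sum(axis=1)
print(z[y == 1].mean(), z[y == 0].mean())  # two clearly different scales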

Example

# Generate some random data
# from sklearn.datasets.samples_generator import make_blobs
from sklearn.datasets import make_blobs
# cluster_std controls how spread out each cluster is
X, y = make_blobs(n_samples=50, centers=2,
                  random_state=0, cluster_std=0.60)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')

![](笔记图片/image-20220114164427675.png)

Let's draw a few arbitrary separating lines; which one is best?

xfit = np.linspace(-1, 3.5)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
plt.plot([0.6], [2.1], 'x', color='red', markeredgewidth=2, markersize=10)

for m, b in [(1, 0.65), (0.5, 1.6), (-0.2, 2.9)]:
    plt.plot(xfit, m * xfit + b, '-k')

plt.xlim(-1, 3.5);

![](笔记图片/image-20220114164502571.png)

Support Vector Machines: maximizing the margin around the "minefield"

xfit = np.linspace(-1, 3.5)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')

for m, b, d in [(1, 0.65, 0.33), (0.5, 1.6, 0.55), (-0.2, 2.9, 0.2)]:
    yfit = m * xfit + b
    plt.plot(xfit, yfit, '-k')
    plt.fill_between(xfit, yfit - d, yfit + d, edgecolor='none',
                     color='#AAAAAA', alpha=0.4)

plt.xlim(-1, 3.5);

![](笔记图片/image-20220114164530761.png)

# Train a basic SVM
from sklearn.svm import SVC # "Support vector classifier"
model = SVC(kernel='linear')
model.fit(X, y)

SVC(kernel='linear')

# Plotting helper
def plot_svc_decision_function(model, ax=None, plot_support=True):
    """Plot the decision function for a 2D SVC"""
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    
    # create grid to evaluate model
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    
    # plot decision boundary and margins
    ax.contour(X, Y, P, colors='k',
               levels=[-1, 0, 1], alpha=0.5,
               linestyles=['--', '-', '--'])
    
    # plot support vectors
    if plot_support:
        ax.scatter(model.support_vectors_[:, 0],
                   model.support_vectors_[:, 1],
                   s=300, linewidth=1, edgecolors='black', facecolors='none');
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
plot_svc_decision_function(model);

![](笔记图片/image-20220114164615830.png)

  • This line is the decision boundary we were looking for

  • Notice that three points carry a special marker; they lie exactly on the margin

  • They are our support vectors

  • In Scikit-Learn they are stored in the support_vectors_ attribute

model.support_vectors_
array([[0.44359863, 3.11530945],
       [2.33812285, 3.43116792],
       [2.06156753, 1.96918596]])
  • Notice that the support vectors alone are enough to build the model

  • Next, let's vary the number of data points and see whether the result changes

  • We'll use 60 points and 120 points respectively

def plot_svm(N=10, ax=None):
    X, y = make_blobs(n_samples=200, centers=2,
                      random_state=0, cluster_std=0.60)
    X = X[:N]
    y = y[:N]
    model = SVC(kernel='linear', C=1E10)
    model.fit(X, y)
    
    ax = ax or plt.gca()
    ax.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
    ax.set_xlim(-1, 4)
    ax.set_ylim(-1, 6)
    plot_svc_decision_function(model, ax)

fig, ax = plt.subplots(1, 2, figsize=(16, 6))
fig.subplots_adjust(left=0.0625, right=0.95, wspace=0.1)
for axi, N in zip(ax, [60, 120]):
    plot_svm(N, axi)
    axi.set_title('N = {0}'.format(N))

![](笔记图片/image-20220114164714072.png)

  • The left panel shows the result with 60 points, the right one with 120
  • As long as the support vectors are unchanged, adding more data makes no difference!

SVM with kernel functions

  • First, let's see whether a linear kernel can still separate the harder dataset below:
from sklearn.datasets import make_circles
X, y = make_circles(100, factor=.1, noise=.1)

clf = SVC(kernel='linear').fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
plot_svc_decision_function(clf, plot_support=False);

![](笔记图片/image-20220114164751892.png)

  • No luck, the classes can't be separated. What now? Let's try a kernel transformation to a higher dimension!
  • We can visualize this extra data dimension using a three-dimensional plot:
# Add a new dimension r
from mpl_toolkits import mplot3d
r = np.exp(-(X ** 2).sum(1))
def plot_3D(elev=30, azim=30, X=X, y=y):
    ax = plt.subplot(projection='3d')
    ax.scatter3D(X[:, 0], X[:, 1], r, c=y, s=50, cmap='autumn')
    ax.view_init(elev=elev, azim=azim)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('r')

plot_3D(elev=45, azim=45, X=X, y=y)

![](笔记图片/image-20220114164824559.png)

# Use the radial basis function (RBF) kernel
clf = SVC(kernel='rbf', C=1E6)
clf.fit(X, y)

SVC(C=1000000.0)

# Now that's much better!
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
plot_svc_decision_function(clf)
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=300, lw=1, edgecolors='black', facecolors='none');

![](笔记图片/image-20220114164902732.png)

With this kernelized SVM we learn an appropriate nonlinear decision boundary. This kernel-transformation strategy is used all the time in machine learning!
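
To make the kernel trick concrete, here is a sketch (gamma fixed at 1.0, an arbitrary choice) showing that handing SVC a precomputed Gaussian Gram matrix should reproduce kernel='rbf':

# kernel='precomputed' accepts the Gram matrix K[i, j] = exp(-gamma * ||xi - xj||^2)
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_circles

X, y = make_circles(100, factor=.1, noise=.1, random_state=0)
gamma = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq_dists)

clf_pre = SVC(kernel='precomputed', C=1E6).fit(K, y)
clf_rbf = SVC(kernel='rbf', gamma=gamma, C=1E6).fit(X, y)
print(np.allclose(clf_pre.dual_coef_, clf_rbf.dual_coef_))  # should print True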

Tuning SVM parameters: the soft-margin problem

Tuning the C parameter

  • As C approaches infinity: classification is strict and no errors are tolerated
  • As C becomes very small: larger errors are tolerated
X, y = make_blobs(n_samples=100, centers=2,
                  random_state=0, cluster_std=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn');

![](笔记图片/image-20220114164937470.png)

X, y = make_blobs(n_samples=100, centers=2,
                  random_state=0, cluster_std=0.8)

fig, ax = plt.subplots(1, 2, figsize=(16, 6))
fig.subplots_adjust(left=0.0625, right=0.95, wspace=0.1)

for axi, C in zip(ax, [10.0, 0.1]):
    model = SVC(kernel='linear', C=C).fit(X, y)
    axi.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
    plot_svc_decision_function(model, axi)
    axi.scatter(model.support_vectors_[:, 0],
                model.support_vectors_[:, 1],
                s=300, lw=1, edgecolors='black', facecolors='none');
    axi.set_title('C = {0:.1f}'.format(C), size=14)

![](笔记图片/image-20220114165003169.png)

# gamma controls the model's complexity (larger gamma -> more complex boundary)
X, y = make_blobs(n_samples=100, centers=2,
                  random_state=0, cluster_std=1.1)

fig, ax = plt.subplots(1, 2, figsize=(16, 6))
fig.subplots_adjust(left=0.0625, right=0.95, wspace=0.1)

for axi, gamma in zip(ax, [10.0, 0.1]):
    model = SVC(kernel='rbf', gamma=gamma).fit(X, y)
    axi.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
    plot_svc_decision_function(model, axi)
    axi.scatter(model.support_vectors_[:, 0],
                model.support_vectors_[:, 1],
                s=300, lw=1, edgecolors='black', facecolors='none');
    axi.set_title('gamma = {0:.1f}'.format(gamma), size=14)

![](笔记图片/image-20220114165028249.png)

Example: Face Recognition

As an example of support vector machines in action, let’s take a look at the facial recognition problem.
We will use the Labeled Faces in the Wild dataset, which consists of several thousand collated photos of various public figures.
A fetcher for the dataset is built into Scikit-Learn:

# Face recognition dataset
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=60)
print(faces.target_names)
print(faces.images.shape)

['Ariel Sharon' 'Colin Powell' 'Donald Rumsfeld' 'George W Bush'
'Gerhard Schroeder' 'Hugo Chavez' 'Junichiro Koizumi' 'Tony Blair']
(1348, 62, 47)

Let’s plot a few of these faces to see what we’re working with:

fig, ax = plt.subplots(3, 5)
for i, axi in enumerate(ax.flat):
    axi.imshow(faces.images[i], cmap='bone')
    axi.set(xticks=[], yticks=[],
            xlabel=faces.target_names[faces.target[i]])

![](笔记图片/image-20220114165127838.png)

  • Each image has size [62×47]
  • Here every pixel is treated as a feature, but that is far too many features; let's reduce the dimensionality with PCA first!
from sklearn.svm import SVC
#from sklearn.decomposition import RandomizedPCA
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

pca = PCA(n_components=150, whiten=True, random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(faces.data, faces.target,
                                                random_state=40)

Use grid-search cross-validation to choose our parameters

from sklearn.model_selection import GridSearchCV
param_grid = {'svc__C': [1, 5, 10],
              'svc__gamma': [0.0001, 0.0005, 0.001]}
grid = GridSearchCV(model, param_grid)

%time grid.fit(Xtrain, ytrain)
print(grid.best_params_)

Wall time: 15.3 s
{'svc__C': 5, 'svc__gamma': 0.001}

model = grid.best_estimator_
yfit = model.predict(Xtest)
yfit.shape

(337,)

fig, ax = plt.subplots(4, 6)
for i, axi in enumerate(ax.flat):
    axi.imshow(Xtest[i].reshape(62, 47), cmap='bone')
    axi.set(xticks=[], yticks=[])
    axi.set_ylabel(faces.target_names[yfit[i]].split()[-1],
                   color='black' if yfit[i] == ytest[i] else 'red')
fig.suptitle('Predicted Names; Incorrect Labels in Red', size=14);

![](笔记图片/image-20220114165303836.png)

from sklearn.metrics import classification_report
print(classification_report(ytest, yfit,
                            target_names=faces.target_names))

![](笔记图片/image-20220114165331145.png)

  • Precision = TP / (TP + FP): of the samples predicted positive, the fraction that are truly positive
  • Recall = TP / (TP + FN): of the truly positive samples, the fraction that are correctly predicted
  • F1 = 2 × precision × recall / (precision + recall)
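
A tiny sketch of these formulas with made-up counts (tp, fp, fn are hypothetical numbers, purely for illustration):

# precision, recall and F1 from hypothetical confusion-matrix counts
tp, fp, fn = 40, 5, 10
precision = tp / (tp + fp)                          # 40/45 ~ 0.889
recall = tp / (tp + fn)                             # 40/50 = 0.8
f1 = 2 * precision * recall / (precision + recall)  # ~ 0.842
print(precision, recall, f1)
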
# Diagonal entries count the samples predicted as their own (true) class
from sklearn.metrics import confusion_matrix
mat = confusion_matrix(ytest, yfit)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=faces.target_names,
            yticklabels=faces.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label');

![](笔记图片/image-20220114165400617.png)

  • A view like this helps us see which people are most easily confused with one another

Troubleshooting

ModuleNotFoundError: No module named 'sklearn.datasets.samples_generator'
https://blog.csdn.net/qq_46092061/article/details/119033931
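
The fix, already used in the code above: import the generator directly from sklearn.datasets.

# Old location (removed in newer scikit-learn releases):
# from sklearn.datasets.samples_generator import make_blobs
# New location:
from sklearn.datasets import make_blobs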
