SVM: Theory and Implementation

A Support Vector Machine (SVM) is a supervised machine-learning method used mainly for binary classification. Concretely, it looks for a decision boundary that separates the two classes of data as well as possible.

Problem: given two classes of data points, find a decision boundary that separates the two classes as well as possible.
The decision boundary is written as $\boldsymbol{w}^{T}\boldsymbol{x}+b=0$. Here $\boldsymbol{w}$ is a vector, so this expression covers the boundary whether it is a line or a (hyper)plane.

Distance from a point to the decision boundary (the boundary is most often a plane, which is also the more involved case, so we compute the point-to-plane distance):
Assume $\mathbf{x}'$ lies on the decision plane, so $\mathbf{w}^{T}\mathbf{x}'=-b$. Form the vector $\mathbf{x}-\mathbf{x}'$ and, using vector projection, project it onto the plane's normal direction $\mathbf{w}$; the length of that projection is $\left|\frac{\mathbf{w}^{T}}{\|\mathbf{w}\|}\left(\mathbf{x}-\mathbf{x}'\right)\right|$. Substituting $\mathbf{w}^{T}\mathbf{x}'=-b$ and simplifying gives the point-to-plane distance: $\operatorname{distance}(\mathbf{x}, b, \mathbf{w})=\frac{1}{\|\mathbf{w}\|}\left|\mathbf{w}^{T}\mathbf{x}+b\right|$
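As a quick numeric check of this formula, the sketch below evaluates it with numpy; the plane $w=(1,2)$, $b=-3$ and the point $x=(4,5)$ are made-up values used only for illustration.

import numpy as np

# Hypothetical plane w^T x + b = 0 and a point x (illustrative values only)
w = np.array([1.0, 2.0])
b = -3.0
x = np.array([4.0, 5.0])

# distance(x, b, w) = |w^T x + b| / ||w||
distance = abs(w @ x + b) / np.linalg.norm(w)
print(distance)  # |4 + 10 - 3| / sqrt(5) ≈ 4.92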

The optimization goal
Find a decision boundary ($\mathbf{w}$, $b$) such that the points closest to the boundary are as far from it as possible.

Deriving the objective function
In an SVM the label values are defined as 1 or -1 (Logistic Regression defines them as 0 or 1).
Let the decision function be $y(x)=w^{T}\Phi(x)+b$: if $y(x_{i})>0$ then $y_{i}=1$, and if $y(x_{i})<0$ then $y_{i}=-1$ (in the decision function, $\Phi(x)$ denotes a kernel transformation applied to the original data). Therefore $y_{i}\cdot y(x_{i})>0$.
Using $y_{i}\cdot y(x_{i})>0$, the point-to-plane distance simplifies to:
$\operatorname{distance}(\mathbf{x}, b, \mathbf{w})=\frac{y_{i}\cdot\left(w^{T}\cdot\Phi\left(x_{i}\right)+b\right)}{\|w\|}$
From the optimization goal, the objective function is: $\underset{w, b}{\arg\max}\left\{\frac{1}{\|w\|}\min_{i}\left[y_{i}\cdot\left(w^{T}\cdot\Phi\left(x_{i}\right)+b\right)\right]\right\}$

Simplifying the objective function
By rescaling ($w$ and $b$ can be scaled by the same factor without changing the boundary) we can require $|y(x_{i})|\geq 1$, i.e. $y_{i}\cdot\left(w^{T}\cdot\Phi\left(x_{i}\right)+b\right)\geq 1$ (rescaling the functional margin does not affect the geometric margin).
It then follows directly that $\min_{i}\left[y_{i}\cdot\left(w^{T}\cdot\Phi\left(x_{i}\right)+b\right)\right]=1$, so the objective becomes $\underset{w, b}{\arg\max}\,\frac{1}{\|w\|}$ s.t. $y_{i}\cdot\left(w^{T}\cdot\Phi\left(x_{i}\right)+b\right)\geq 1$
Converted into a minimization problem, the objective becomes $\min_{w, b}\frac{1}{2}\|w\|^{2}$ s.t. $y_{i}\cdot\left(w^{T}\cdot\Phi\left(x_{i}\right)+b\right)\geq 1$

Solving the objective function
Apply the method of Lagrange multipliers to this constrained minimization, introducing multipliers $\alpha_{i}$,
which gives $L(w, b, \alpha)=\frac{1}{2}\|w\|^{2}-\sum_{i=1}^{n}\alpha_{i}\left(y_{i}\left(w^{T}\cdot\Phi\left(x_{i}\right)+b\right)-1\right)$
The minimization is converted into its dual problem. Because the problem satisfies the KKT conditions, the order of maximization and minimization can be exchanged, so the objective becomes a minimize-first, maximize-second problem: $\max_{\alpha}\min_{w, b} L(w, b, \alpha)$
Setting the partial derivative with respect to $w$ to zero: $\frac{\partial L}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n}\alpha_{i} y_{i}\Phi\left(x_{i}\right)$
Setting the partial derivative with respect to $b$ to zero: $\frac{\partial L}{\partial b}=0 \Rightarrow 0=\sum_{i=1}^{n}\alpha_{i} y_{i}$
Substituting $w=\sum_{i=1}^{n}\alpha_{i} y_{i}\Phi\left(x_{i}\right)$ and $0=\sum_{i=1}^{n}\alpha_{i} y_{i}$ back into $L(w, b, \alpha)$:
$L(w, b, \alpha)=\frac{1}{2}w^{T}w-w^{T}\sum_{i=1}^{n}\alpha_{i} y_{i}\Phi\left(x_{i}\right)-b\sum_{i=1}^{n}\alpha_{i} y_{i}+\sum_{i=1}^{n}\alpha_{i}=\sum_{i=1}^{n}\alpha_{i}-\frac{1}{2}\left(\sum_{i=1}^{n}\alpha_{i} y_{i}\Phi\left(x_{i}\right)\right)^{T}\sum_{j=1}^{n}\alpha_{j} y_{j}\Phi\left(x_{j}\right)=\sum_{i=1}^{n}\alpha_{i}-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j} y_{i} y_{j}\Phi^{T}\left(x_{i}\right)\Phi\left(x_{j}\right)$
This completes the minimization part of $L(w, b, \alpha)$, i.e. finding the $w$ and $b$ that minimize it. We still need the $\alpha$ that maximizes $L(w, b, \alpha)$, i.e.
$\max_{\alpha}\sum_{i=1}^{n}\alpha_{i}-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j} y_{i} y_{j}\left(\Phi\left(x_{i}\right)\cdot\Phi\left(x_{j}\right)\right)$ s.t. $\sum_{i=1}^{n}\alpha_{i} y_{i}=0$, $\alpha_{i}\geq 0$
Converting this back to a minimization gives $\min_{\alpha}\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j} y_{i} y_{j}\left(\Phi\left(x_{i}\right)\cdot\Phi\left(x_{j}\right)\right)-\sum_{i=1}^{n}\alpha_{i}$ s.t. $\sum_{i=1}^{n}\alpha_{i} y_{i}=0$, $\alpha_{i}\geq 0$
Since the dual involves inner products of the original data, we substitute the data and then continue solving.
For example, suppose the data are: positive examples (3,3), (4,3); negative example (1,1).
Substituting the data into $\min_{\alpha}\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j} y_{i} y_{j}\left(\Phi\left(x_{i}\right)\cdot\Phi\left(x_{j}\right)\right)-\sum_{i=1}^{n}\alpha_{i}$ gives
$\frac{1}{2}\left(18\alpha_{1}^{2}+25\alpha_{2}^{2}+2\alpha_{3}^{2}+42\alpha_{1}\alpha_{2}-12\alpha_{1}\alpha_{3}-14\alpha_{2}\alpha_{3}\right)-\alpha_{1}-\alpha_{2}-\alpha_{3}$
Since $\alpha_{1}+\alpha_{2}=\alpha_{3}$ (from the constraint $\sum_{i}\alpha_{i} y_{i}=0$), this simplifies to $4\alpha_{1}^{2}+\frac{13}{2}\alpha_{2}^{2}+10\alpha_{1}\alpha_{2}-2\alpha_{1}-2\alpha_{2}$
Setting the partial derivatives with respect to $\alpha_{1}$ and $\alpha_{2}$ to zero gives $\alpha_{1}=1.5$, $\alpha_{2}=-1$.
This does not satisfy the constraint $\alpha_{i}\geq 0$, so the minimum lies on the boundary of the feasible region.
Substituting $\alpha_{1}=0$, $\alpha_{2}=2/13$ into $4\alpha_{1}^{2}+\frac{13}{2}\alpha_{2}^{2}+10\alpha_{1}\alpha_{2}-2\alpha_{1}-2\alpha_{2}$ gives about $-0.153$.
Substituting $\alpha_{1}=0.25$, $\alpha_{2}=0$ into the same expression gives $-0.25$.
So the solution is $\alpha_{1}=0.25$, $\alpha_{2}=0$, $\alpha_{3}=0.25$.
Substituting into $w=\sum_{i=1}^{n}\alpha_{i} y_{i}\Phi\left(x_{i}\right)$ gives $w=\frac{1}{4}\cdot 1\cdot(3,3)+\frac{1}{4}\cdot(-1)\cdot(1,1)=\left(\frac{1}{2},\frac{1}{2}\right)$
Substituting into $b=y_{j}-\sum_{i=1}^{n}\alpha_{i} y_{i}\left(x_{i}\cdot x_{j}\right)$, with $x_{j}$ taken as the support vector $(3,3)$, gives $b=1-\left(\frac{1}{4}\cdot 1\cdot 18+\frac{1}{4}\cdot(-1)\cdot 6\right)=-2$
So the plane equation is $0.5 x_{1}+0.5 x_{2}-2=0$
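The hand computation above can be checked with a few lines of numpy; this is just a sketch that re-evaluates the formulas for $w$ and $b$ using the $\alpha$ values found above.

import numpy as np

# Toy data from the worked example
X = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1., 1., -1.])
alpha = np.array([0.25, 0.0, 0.25])

# w = sum_i alpha_i * y_i * x_i
w = (alpha * y) @ X
# b = y_j - sum_i alpha_i * y_i * (x_i . x_j), using the support vector x_1 = (3, 3)
b = y[0] - (alpha * y) @ (X @ X[0])
print(w, b)  # expected: [0.5 0.5] -2.0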

Support vectors
When solving for the $\alpha$ values it becomes clear that not every point has a nonzero $\alpha$. In fact, only the points closest to the decision boundary have $\alpha\neq 0$; only these points influence where the boundary lies, and such points are called support vectors.
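This can be observed directly in scikit-learn: after fitting an SVC, only the support vectors are stored, and dual_coef_ holds the corresponding $\alpha_{i} y_{i}$ values. Below is a small sketch using the toy data from the worked example; the very large C is only there to approximate the hard-margin case.

import numpy as np
from sklearn.svm import SVC

X = np.array([[3., 3.], [4., 3.], [1., 1.]])
y = np.array([1, 1, -1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates the hard margin
clf.fit(X, y)

print(clf.support_vectors_)       # only the points with alpha != 0
print(clf.dual_coef_)             # alpha_i * y_i for each support vector
print(clf.coef_, clf.intercept_)  # should be close to w = (0.5, 0.5), b = -2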

Soft-Margin
The soft-margin problem is an improvement on the hard-margin one. Data often contain noise points, and if those points must all be classified correctly it can be very hard to find a separating boundary, so a loss term is introduced into the objective function.
The loss used here is $\max\left(0, 1-y_{i}\left(\mathbf{w}^{T}\mathbf{x}_{i}+b\right)\right)$, also called the hinge loss because its graph looks like a hinge.

So the new objective function is
$\min \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{n}\max\left(0, 1-y_{i}\left(\mathbf{w}^{T}\mathbf{x}_{i}+b\right)\right)$. Here $C$ is a hyperparameter, the penalty coefficient: a large $C$ means classification is strict and errors are barely tolerated, while a small $C$ means larger errors can be tolerated.
Defining slack variables $\xi_{i}=\max\left(0, 1-y_{i}\left(\mathbf{w}^{T}\mathbf{x}_{i}+b\right)\right)$, the objective function becomes
$\min \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{n}\xi_{i}$ s.t. $y_{i}\cdot\left(w^{T}\cdot\Phi\left(x_{i}\right)+b\right)\geq 1-\xi_{i}$, $\xi_{i}\geq 0$
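The soft-margin objective is easy to evaluate directly; the sketch below computes $\frac{1}{2}\|w\|^{2}+C\sum_{i}\max\left(0, 1-y_{i}(w^{T}x_{i}+b)\right)$ for a given $w$ and $b$. The extra noise point and the values of $w$, $b$, $C$ are made up for illustration.

import numpy as np

def soft_margin_objective(w, b, X, y, C):
    # (1/2)||w||^2 + C * sum of hinge losses max(0, 1 - y_i (w^T x_i + b))
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * (w @ w) + C * hinge.sum()

# Toy data from the worked example plus one noise point on the wrong side
X = np.array([[3., 3.], [4., 3.], [1., 1.], [2.5, 2.5]])
y = np.array([1., 1., -1., -1.])
w = np.array([0.5, 0.5])
b = -2.0
print(soft_margin_objective(w, b, X, y, C=1.0))  # only the noise point contributes a loss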
Applying the Lagrange method to this objective:
$L(w, b, \xi, \alpha, \mu)=\frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{n}\xi_{i}-\sum_{i=1}^{n}\alpha_{i}\left(y_{i}\left(w\cdot x_{i}+b\right)-1+\xi_{i}\right)-\sum_{i=1}^{n}\mu_{i}\xi_{i}$
As before, $L(w, b, \xi, \alpha, \mu)$ is turned into a minimize-first, maximize-second problem: $\max_{\alpha}\min_{w, b, \xi} L(w, b, \xi, \alpha, \mu)$
Setting the partial derivative with respect to $w$ to zero: $\frac{\partial L}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n}\alpha_{i} y_{i}\Phi\left(x_{i}\right)$
Setting the partial derivative with respect to $b$ to zero: $\frac{\partial L}{\partial b}=0 \Rightarrow 0=\sum_{i=1}^{n}\alpha_{i} y_{i}$
Setting the partial derivative with respect to $\xi_{i}$ to zero: $\frac{\partial L}{\partial \xi_{i}}=0 \Rightarrow C-\alpha_{i}-\mu_{i}=0$. In addition, the multipliers must satisfy
$\alpha_{i}\geq 0$, $\mu_{i}\geq 0$
Substituting these three equations back in and converting the maximization into a minimization gives $\min_{\alpha}\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j} y_{i} y_{j}\left(x_{i}\cdot x_{j}\right)-\sum_{i=1}^{n}\alpha_{i}$ s.t. $\sum_{i=1}^{n}\alpha_{i} y_{i}=0$, $0\leq\alpha_{i}\leq C$. As before, we plug in the data, solve for the $\alpha$ values, compute $\mathbf{w}$ and $b$ from them, and obtain the final decision boundary.

Kernel methods
To handle data that are not separable in a low-dimensional space, a kernel method maps the low-dimensional data into a higher-dimensional space and searches for a decision boundary there (a decision boundary is easier to find in a higher-dimensional space; in theory, with a high enough dimension any data can be separated).

Example:
Two points X = (1,2,3) and Y = (4,5,6).
Map them to a higher dimension via $f(x)=\left(x_{1}x_{1}, x_{1}x_{2}, x_{1}x_{3}, x_{2}x_{1}, x_{2}x_{2}, x_{2}x_{3}, x_{3}x_{1}, x_{3}x_{2}, x_{3}x_{3}\right)$.
The results are $f(X)=(1,2,3,2,4,6,3,6,9)$
$f(Y)=(16,20,24,20,25,30,24,30,36)$
The inner product of X and Y in the high-dimensional space is then $\langle f(X), f(Y)\rangle=16+40+72+40+100+180+72+180+324=1024$
This computation gives the result, but when the dimension is large the computational cost is too high.
Notice, however, that $K(x, y)=\left(\langle x, y\rangle\right)^{2}$:
$K(X, Y)=(4+10+18)^{2}=32^{2}=1024$
which greatly reduces the computational cost.
So the kernel trick never actually computes anything in the high-dimensional space; it evaluates the high-dimensional inner product of the samples while staying in the low-dimensional space. This is one of the main benefits of the kernel trick.
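The equality $\langle f(X), f(Y)\rangle=\left(\langle X, Y\rangle\right)^{2}$ from the example can be verified directly; the sketch below just reproduces the numbers above with numpy.

import numpy as np

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])

# Explicit map into the 9-dimensional space: f(x) = (x_i * x_j for all i, j)
fx = np.outer(x, x).ravel()
fy = np.outer(y, y).ravel()

print(fx @ fy)       # 1024.0, computed in the high-dimensional space
print((x @ y) ** 2)  # 1024.0, computed with the kernel K(x, y) = (<x, y>)^2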

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator was removed in newer scikit-learn
from sklearn.model_selection import train_test_split

#Function to plot the SVM decision boundary-------------------------------------------------------------------------------------------------------------------
def plot_svc_decision_function(model, ax=None, plot_support=True):
    """Plot the decision function for a 2D SVC"""
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()

    # create grid to evaluate model
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)

    # plot decision boundary and margins
    ax.contour(X, Y, P, colors='k',
               levels=[-1, 0, 1], alpha=0.5,
               linestyles=['--', '-', '--'])

    # plot support vectors
    if plot_support:
        ax.scatter(model.support_vectors_[:, 0],
                   model.support_vectors_[:, 1],
                   s=300, linewidth=1, facecolors='none', edgecolors='black')
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)







#SVM implementation-------------------------------------------------------------------------------------------------------------------
from sklearn.svm import SVC

X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.8)  # generate data
X_train, X_test, y_train, y_test = train_test_split(X, y)

model = SVC(kernel="linear", gamma="auto")
model.fit(X_train, y_train)

plt.scatter(X[:, 0], X[:, 1], c=y)
plot_svc_decision_function(model)
plt.show()

print(model.support_vectors_)
print(model.score(X_test, y_test))






#SVM with a kernel function-------------------------------------------------------------------------------------------------------------------
from sklearn.datasets import make_circles  # generate concentric-circle data

X, y = make_circles(100, factor=0.1, noise=0.1)
X_train, X_test, y_train, y_test = train_test_split(X, y)

clf = SVC(kernel="rbf", C=1E6)  # RBF (Gaussian) kernel
clf.fit(X_train, y_train)

plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
plot_svc_decision_function(clf)
plt.show()

print(clf.predict(X_test))
print(clf.score(X_test, y_test))







#Comparing different values of C-------------------------------------------------------------------------------------------------------------------
X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.8)
X_train, X_test, y_train, y_test = train_test_split(X, y)
plt.scatter(X[:, 0], X[:, 1], c=y,s=50, cmap = "autumn")

model = SVC(kernel="linear",C=10)
model.fit(X_train,y_train)

plot_svc_decision_function(model)
plt.show()

print(model.score(X_test,y_test))


model = SVC(kernel = "linear", C = 0.1)
model.fit(X_train,y_train)

plt.scatter(X[:, 0], X[:, 1], c=y,s=50, cmap = "autumn")
plot_svc_decision_function(model)
plt.show()

print(model.score(X_test,y_test))

Face recognition example

import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV



#Load the face dataset------------------------------------------------------------------------------------------------------------------------------
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=60)  # keep only people with at least 60 images
print(faces.target_names)
print(faces.images.shape)




#Show a sample of the faces------------------------------------------------------------------------------------------------------------------------------
fig, ax = plt.subplots(3, 5)
for i, axi in enumerate(ax.flat):
    axi.imshow(faces.images[i], cmap='bone')
    axi.set(xticks=[], yticks=[],
            xlabel=faces.target_names[faces.target[i]])
plt.show()






#Each pixel is a feature, which is far too many; use PCA to reduce the dimensionality, then feed the data into the SVM------------------------------------------------------------------------------------------------------------------------------
pca = PCA(n_components=150, whiten=True, random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)  # build a pipeline: data goes through PCA first, then into the SVM




#Create the training and test sets and search for the best parameters
Xtrain, Xtest, ytrain, ytest = train_test_split(faces.data, faces.target,test_size=0.2)
print("*************")
param_grid = {'svc__C': [1, 5, 10, 50],
              'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)
grid.fit(Xtrain, ytrain)
print(grid.best_params_)




#Predict the test set with the best estimator------------------------------------------------------------------------------------------------------------------------------
model = grid.best_estimator_
yfit = model.predict(Xtest)





#Show predictions against the true labels------------------------------------------------------------------------------------------------------------------------------
fig, ax = plt.subplots(4, 6)
for i, axi in enumerate(ax.flat):
    axi.imshow(Xtest[i].reshape(62, 47), cmap='bone')
    axi.set(xticks=[], yticks=[])
    axi.set_ylabel(faces.target_names[yfit[i]].split()[-1],
                   color='black' if yfit[i] == ytest[i] else 'red')  # correct predictions in black, incorrect ones in red
fig.suptitle('Predicted Names; Incorrect Labels in Red', size=14)
plt.show()





#Show the classification results with classification_report-----------------------------------------------------------------------------------------------------------------------------
from sklearn.metrics import classification_report
print(classification_report(ytest, yfit,
                            target_names=faces.target_names))






#Show the confusion matrix-----------------------------------------------------------------------------------------------------------------------------
import seaborn as sns
from sklearn.metrics import confusion_matrix
mat = confusion_matrix(ytest, yfit)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=faces.target_names,
            yticklabels=faces.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label')
plt.show()