Support Vector Machines: Linear Support Vector Machines and Kernel Functions

1. Objective function:
   If the training data are not linearly separable, introduce a slack variable $\xi_i \geq 0$ for each sample and require the functional margin plus the slack to be at least 1. The constraint becomes
$$y_i(w \cdot x_i + b) \geq 1 - \xi_i$$
   and the objective function is:
$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$
$$\text{s.t.} \quad y_i(w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, 2, \cdots, n$$
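   The penalty parameter $C > 0$ trades margin width against constraint violations: a larger $C$ punishes slack more heavily. As a minimal sketch of this trade-off (assuming scikit-learn; the toy dataset and all names here are my own illustration, not from the original):

```python
# Soft-margin linear SVM: C controls how heavily slack (margin violations)
# is penalized. Assumes scikit-learn; the toy data is illustrative only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])  # overlapping classes: not separable

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C tolerates many violations (wide margin, many support vectors);
    # large C fits the training data more tightly.
    print(f"C={C}: support vectors={clf.n_support_.sum()}, "
          f"train accuracy={clf.score(X, y):.2f}")
```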
2. Lagrangian of the SVM with slack variables and its solution:
$$L(w, b, \xi, \alpha, \mu) \equiv \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left( y_i(w \cdot x_i + b) - 1 + \xi_i \right) - \sum_{i=1}^{n} \mu_i \xi_i$$
   Setting the partial derivatives with respect to $w$, $b$, and $\xi$ to zero:
$$\begin{aligned} \frac{\partial L}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i \\ \frac{\partial L}{\partial b} = 0 &\;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0 \\ \frac{\partial L}{\partial \xi_i} = 0 &\;\Rightarrow\; C - \alpha_i - \mu_i = 0 \end{aligned}$$
   Substituting these three identities back into $L$ gives
$$\min_{w, b, \xi} L(w, b, \xi, \alpha, \mu) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^{n} \alpha_i$$
   Maximizing this expression over $\alpha$ yields
$$\begin{aligned} \max_{\alpha} \quad & -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^{n} \alpha_i \\ \text{s.t.} \quad & \sum_{i=1}^{n} \alpha_i y_i = 0 \\ & C - \alpha_i - \mu_i = 0, \quad \alpha_i \geq 0, \quad \mu_i \geq 0, \quad i = 1, 2, \ldots, n \end{aligned}$$
   The last three constraints can be combined to eliminate $\mu_i$, leaving $0 \leq \alpha_i \leq C$.
   Rearranging gives the dual problem:
$$\begin{aligned} \min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{n} \alpha_i \\ \text{s.t.} \quad & \sum_{i=1}^{n} \alpha_i y_i = 0 \\ & 0 \leq \alpha_i \leq C, \quad i = 1, 2, \ldots, n \end{aligned}$$
   The training procedure is then: construct and solve this constrained optimization problem to obtain the optimal solution $\alpha^* = (\alpha_1^*, \alpha_2^*, \ldots, \alpha_n^*)^T$, then compute
$$w^* = \sum_{i=1}^{n} \alpha_i^* y_i x_i$$
   and, picking any component $\alpha_j^*$ with $0 < \alpha_j^* < C$ (a support vector lying exactly on the margin),
$$b^* = y_j - \sum_{i=1}^{n} \alpha_i^* y_i (x_i \cdot x_j)$$
   This yields the separating hyperplane $w^* \cdot x + b^* = 0$ and the classification decision function
$$f(x) = \operatorname{sign}(w^* \cdot x + b^*)$$
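   As a sketch of this procedure (the dataset, the generic solver, and all names are my own illustration; real implementations use SMO, as in libsvm, rather than a general-purpose optimizer), the dual can be solved directly with SciPy and $w^*$, $b^*$ recovered from $\alpha^*$:

```python
# Solve the soft-margin dual QP on a tiny dataset, then recover w* and b*.
import numpy as np
from scipy.optimize import minimize

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.0, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, C = len(y), 10.0
Q = (y[:, None] * y[None, :]) * (X @ X.T)     # Q_ij = y_i y_j (x_i . x_j)

dual = lambda a: 0.5 * a @ Q @ a - a.sum()    # dual objective (minimized)
cons = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
res = minimize(dual, np.zeros(n), bounds=[(0, C)] * n, constraints=[cons])
alpha = res.x

w = (alpha * y) @ X                            # w* = sum_i alpha_i* y_i x_i
j = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))  # on-margin support vector
b = y[j] - (alpha * y * (X @ X[j])).sum()      # b* from that support vector
print("w* =", w, " b* =", b)
print("predictions:", np.sign(X @ w + b))
```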
3. Loss function analysis
   Green: 0/1 loss; blue: SVM hinge loss; red: logistic loss. The hinge loss is $\max(0,\, 1 - y f(x))$ and the logistic loss is $\log(1 + e^{-y f(x)})$; both are convex surrogates for the 0/1 loss.
[Figure: comparison of the 0/1, hinge, and logistic losses]
   Logistic loss function:
[Figure: the logistic loss function]
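   As a small numeric illustration (mine, standing in for the original figures), the three losses can be tabulated as functions of the margin $m = y f(x)$:

```python
# Compare 0/1, hinge, and logistic losses as functions of the margin m = y*f(x).
import numpy as np

m = np.linspace(-2, 2, 9)                  # margin values
zero_one = (m <= 0).astype(float)          # 0/1 loss
hinge = np.maximum(0.0, 1.0 - m)           # SVM hinge loss
logistic = np.log(1.0 + np.exp(-m))        # logistic loss (natural log)

for row in zip(m, zero_one, hinge, logistic):
    print("m=%+.1f  0/1=%.0f  hinge=%.2f  logistic=%.2f" % row)
# The hinge and logistic losses are smooth/convex surrogates for the
# discontinuous 0/1 loss, which is what makes the optimization tractable.
```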
4. Kernel functions
   A kernel function maps the original input space into a new feature space, so that samples that are linearly inseparable in the input space can become separable in the feature space. Three kernels are commonly used (a code sketch of all three follows the list):
      Polynomial kernel: $\kappa(x_1, x_2) = (x_1 \cdot x_2 + c)^d$
      Gaussian (RBF) kernel: $\kappa(x_1, x_2) = \exp(-\gamma \cdot \|x_1 - x_2\|^2)$
      Sigmoid kernel: $\kappa(x_1, x_2) = \tanh(x_1 \cdot x_2 + c)$
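   A minimal NumPy sketch of the three kernels (the function names and the $\gamma$, $c$, $d$ parameterizations follow the formulas above; everything else is my own illustration):

```python
# The three common kernels from the formulas above, for vector inputs.
import numpy as np

def poly_kernel(x1, x2, c=1.0, d=2):
    return (np.dot(x1, x2) + c) ** d

def rbf_kernel(x1, x2, gamma=0.5):
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def sigmoid_kernel(x1, x2, c=0.0):
    return np.tanh(np.dot(x1, x2) + c)

x, z = np.array([1.0, 2.0, 3.0]), np.array([0.5, -1.0, 2.0])
print(poly_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))
```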
   (1) Polynomial kernel:
$$\kappa(\vec{x}, \vec{y}) = (\vec{x} \cdot \vec{y})^2 = \left( \sum_{i=1}^{n} x_i y_i \right)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j y_i y_j = \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i x_j)(y_i y_j)$$
      In particular, for $n = 3$ the corresponding feature map is:
$$\Phi(\vec{x}) = \left( x_1 x_1,\; x_1 x_2,\; x_1 x_3,\; x_2 x_1,\; x_2 x_2,\; x_2 x_3,\; x_3 x_1,\; x_3 x_2,\; x_3 x_3 \right)^T$$
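      A quick numeric check (my own illustration) that this quadratic kernel equals the dot product of the explicit 9-dimensional feature vectors:

```python
# Verify (x . y)^2 == Phi(x) . Phi(y) for the explicit quadratic feature map.
import numpy as np

def phi(v):
    # All 9 ordered products v_i * v_j for n = 3.
    return np.outer(v, v).ravel()

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])
assert np.isclose(np.dot(x, y) ** 2, np.dot(phi(x), phi(y)))  # both equal 20.25
```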
   Likewise,
$$\kappa(\vec{x}, \vec{y}) = (\vec{x} \cdot \vec{y} + c)^2 = (\vec{x} \cdot \vec{y})^2 + 2c\, \vec{x} \cdot \vec{y} + c^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i x_j)(y_i y_j) + \sum_{i=1}^{n} \left( \sqrt{2c}\, x_i \right) \left( \sqrt{2c}\, y_i \right) + c \cdot c$$
      In particular, for $n = 3$ the feature map is:
$$\Phi(\vec{x}) = \left( x_1 x_1,\; x_1 x_2,\; x_1 x_3,\; x_2 x_1,\; x_2 x_2,\; x_2 x_3,\; x_3 x_1,\; x_3 x_2,\; x_3 x_3,\; \sqrt{2c}\, x_1,\; \sqrt{2c}\, x_2,\; \sqrt{2c}\, x_3,\; c \right)^T$$
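      Extending the previous check (again my own illustration) to the inhomogeneous kernel and its 13-dimensional map:

```python
# Verify (x . y + c)^2 == Phi(x) . Phi(y) for the 13-dimensional map above.
import numpy as np

def phi_c(v, c):
    # 9 quadratic terms, 3 scaled linear terms, and the constant c.
    return np.concatenate([np.outer(v, v).ravel(), np.sqrt(2 * c) * v, [c]])

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])
c = 1.0
assert np.isclose((np.dot(x, y) + c) ** 2, np.dot(phi_c(x, c), phi_c(y, c)))
```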
   Kernel feature mapping:
[Figure: kernel mapping from the input space to the feature space]
   (2) Gaussian kernel (shown here for scalar inputs):
$$\kappa(x_1, x_2) = e^{-\frac{(x_1 - x_2)^2}{2\sigma^2}} = e^{-\frac{x_1^2 + x_2^2 - 2 x_1 x_2}{2\sigma^2}} = e^{-\frac{x_1^2 + x_2^2}{2\sigma^2}} \cdot e^{\frac{x_1 x_2}{\sigma^2}}$$
Expanding $e^{x_1 x_2 / \sigma^2}$ as a Taylor series:
$$\begin{aligned} \kappa(x_1, x_2) &= e^{-\frac{x_1^2 + x_2^2}{2\sigma^2}} \cdot \left( 1 + \frac{1}{\sigma^2} \cdot \frac{x_1 x_2}{1!} + \left( \frac{1}{\sigma^2} \right)^2 \cdot \frac{(x_1 x_2)^2}{2!} + \cdots + \left( \frac{1}{\sigma^2} \right)^n \cdot \frac{(x_1 x_2)^n}{n!} + \cdots \right) \\ &= e^{-\frac{x_1^2 + x_2^2}{2\sigma^2}} \cdot \left( 1 \cdot 1 + \frac{1}{1!} \frac{x_1}{\sigma} \cdot \frac{x_2}{\sigma} + \frac{1}{2!} \cdot \frac{x_1^2}{\sigma^2} \cdot \frac{x_2^2}{\sigma^2} + \cdots + \frac{1}{n!} \cdot \frac{x_1^n}{\sigma^n} \cdot \frac{x_2^n}{\sigma^n} + \cdots \right) \\ &= \Phi(x_1)^T \cdot \Phi(x_2) \end{aligned}$$
where
$$\Phi(x) = e^{-\frac{x^2}{2\sigma^2}} \left( 1,\; \sqrt{\tfrac{1}{1!}} \frac{x}{\sigma},\; \sqrt{\tfrac{1}{2!}} \frac{x^2}{\sigma^2},\; \sqrt{\tfrac{1}{3!}} \frac{x^3}{\sigma^3},\; \cdots,\; \sqrt{\tfrac{1}{n!}} \frac{x^n}{\sigma^n},\; \cdots \right)$$
This shows that the Gaussian kernel corresponds to an infinite-dimensional feature map.
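A numeric sketch (my own illustration) of truncating this infinite feature map: as more terms are kept, $\Phi(x_1)^T \Phi(x_2)$ converges to the exact Gaussian kernel value:

```python
# Truncate the infinite-dimensional Gaussian feature map and watch the
# dot product converge to the exact kernel value.
import numpy as np
from math import factorial

def phi_truncated(x, sigma=1.0, terms=5):
    coeffs = [np.sqrt(1.0 / factorial(k)) * (x / sigma) ** k for k in range(terms)]
    return np.exp(-x**2 / (2 * sigma**2)) * np.array(coeffs)

x1, x2, sigma = 0.8, -0.3, 1.0
exact = np.exp(-(x1 - x2) ** 2 / (2 * sigma**2))
for terms in (1, 2, 4, 8, 16):
    approx = np.dot(phi_truncated(x1, sigma, terms), phi_truncated(x2, sigma, terms))
    print(f"{terms:2d} terms: {approx:.6f}  (exact: {exact:.6f})")
```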
5. Summary and reflections
   SVM extends to multi-class classification via one-vs-one or one-vs-rest schemes.
   SVM vs. logistic regression: an SVM outputs the class label directly, while logistic regression outputs the posterior probability of belonging to each class.
   SVM also applies to regression problems, as support vector regression (SVR). A short sketch of both uses follows.
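   A minimal sketch of multi-class SVM and SVR (assuming scikit-learn; the data and parameter values are my own illustration):

```python
# Multi-class SVM (one-vs-one / one-vs-rest) and support vector regression.
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)

# Three-class toy problem: SVC trains one-vs-one pairs internally;
# decision_function_shape only changes the shape of the reported scores.
X = rng.normal(size=(90, 2)) + np.repeat(np.array([[0, 0], [3, 0], [0, 3]]), 30, axis=0)
y = np.repeat([0, 1, 2], 30)
clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(X, y)
print("3-class train accuracy:", clf.score(X, y))

# Regression with an RBF kernel (SVR): epsilon sets the insensitive tube width.
x = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
t = np.sin(x).ravel()
reg = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(x, t)
print("SVR fit R^2:", round(reg.score(x, t), 3))
```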
