Machine Learning 06 - Support Vector Machine

I am following Andrew Ng's machine learning course at Stanford and taking notes as I go, for review and consolidation.
My knowledge is limited, so if you spot errors or omissions, or have suggestions, please bear with me and point them out.

6.1 Large Margin Classification

6.1.1 Optimization objective

Here we introduce the last supervised learning algorithm: the Support Vector Machine (SVM).

Hypothesis :

$$h_\theta(x) = \begin{cases} 1 & \text{if } \theta^T x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Cost function :

$$\min_\theta \; C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$$

where $\text{cost}_1$ is the cost when $y = 1$ and $\text{cost}_0$ is the cost when $y = 0$. An intuitive explanation is below:

(figure: cost function)
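To make the piecewise shape concrete, here is a minimal NumPy sketch of hinge-style $\text{cost}_1$, $\text{cost}_0$, and the objective above. The helper names (`cost1`, `cost0`, `svm_objective`) are mine, and the unit slope is an assumption; the course only draws the shapes:

```python
import numpy as np

def cost1(z):
    # Cost when y = 1: zero once z >= 1, grows linearly as z decreases.
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Cost when y = 0: zero once z <= -1, grows linearly as z increases.
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    # The objective above; theta[0] is the intercept and is not regularized.
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return data_term + reg_term
```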

Decision boundary :

(figure: decision boundary)

The SVM finds a decision boundary with the largest margin to the data. The effect of the regularization parameter $C$ is shown below:

(figure: the regularization parameter C)

6.1.2 Concept of kernels

In this part, in order to fit a non-linear decision boundary, we adapt the hypothesis function to

$$h_\theta(x) = \begin{cases} 1 & \text{if } \theta_0 + \theta_1 f_1 + \theta_2 f_2 + \cdots \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

(1) Polynomial

$$f_i = x_j^k \quad (i, j = 1, 2, \cdots)$$

It can fit the dataset very well, but we don't know in advance which features to add, and it is computationally expensive.
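To see the cost, a short sketch using scikit-learn's `PolynomialFeatures` (assuming that library is acceptable here; the degrees and data are illustrative) shows how fast the feature count grows:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])  # two samples with n = 2 raw features

for degree in (2, 3, 5):
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    X_poly = poly.fit_transform(X)
    # Feature count is C(n + degree, degree) - 1, which blows up with n and degree.
    print(degree, X_poly.shape[1])
```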

(2) Gaussian Kernel

First, choose some landmarks $l^{(i)} \; (i = 1, 2, \cdots)$.

Second, define $f_i \; (i = 1, 2, \cdots)$; for example, with the Gaussian kernel:

$$f_i = \exp\left( -\frac{\| x - l^{(i)} \|^2}{2\sigma^2} \right) = \operatorname{sim}(x, l^{(i)})$$

It measures the similarity of two points:

  • If $x \approx l^{(i)}$, then $f_i \approx 1$;
  • If $x$ is far from $l^{(i)}$, then $f_i \approx 0$.

And $\sigma$ acts as a length scale for the distance between the two points:

(figure: example)
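A minimal sketch of the similarity function (the function name is mine; the landmark and $\sigma$ values are illustrative) shows this rescaling effect:

```python
import numpy as np

def gaussian_kernel(x, l, sigma):
    # sim(x, l) = exp(-||x - l||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 1.0])
l = np.array([3.0, 2.0])  # a landmark at distance sqrt(5) from x

for sigma in (0.5, 1.0, 3.0):
    # Larger sigma -> similarity decays more slowly, so x looks "closer" to l.
    print(sigma, gaussian_kernel(x, l, sigma))
```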

Finally, here is an example of what it predicts:

(figure: example prediction)

6.1.3 SVM with kernels

(1) Choose landmarks

Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})$, choose $l^{(1)} = x^{(1)}, l^{(2)} = x^{(2)}, \cdots, l^{(m)} = x^{(m)}$.

(2) Define kernels

We define $f^{(i)}$ using the Gaussian kernel:

$$f^{(i)} = \begin{bmatrix} f_0^{(i)} \\ f_1^{(i)} \\ \vdots \\ f_m^{(i)} \end{bmatrix} = \begin{bmatrix} 1 \\ \operatorname{sim}(x^{(i)}, l^{(1)}) \\ \vdots \\ \operatorname{sim}(x^{(i)}, l^{(m)}) \end{bmatrix}, \quad i = 1, 2, \cdots, m$$
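Putting (1) and (2) together, here is a sketch of the feature mapping (function names are mine; it assumes $f_0^{(i)} = 1$ as the intercept feature, per the course convention):

```python
import numpy as np

def gaussian_kernel(x, l, sigma):
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

def kernel_features(X, sigma):
    # Map each x^(i) to f^(i) = [1, sim(x^(i), l^(1)), ..., sim(x^(i), l^(m))],
    # where every training example doubles as a landmark.
    m = X.shape[0]
    F = np.ones((m, m + 1))  # column 0 holds the intercept feature f_0 = 1
    for i in range(m):
        for j in range(m):
            F[i, j + 1] = gaussian_kernel(X[i], X[j], sigma)
    return F

X = np.random.default_rng(0).normal(size=(5, 2))
print(kernel_features(X, sigma=1.0).shape)  # (m, m + 1)
```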

(3) Training

$$\min_\theta \; C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T f^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T f^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{m} \theta_j^2$$

Use a minimization algorithm to solve it.
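In practice one calls an off-the-shelf SVM package rather than hand-writing the minimizer. Here is a sketch with scikit-learn's `SVC` (an assumption, since the course demonstrates `liblinear`/`libsvm`; its `gamma` corresponds to $1/(2\sigma^2)$ in the notation above, and the data is synthetic):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # non-linear boundary

sigma = 1.0
clf = make_pipeline(
    StandardScaler(),  # feature scaling, as recommended in the notes below
    SVC(kernel="rbf", C=1.0, gamma=1.0 / (2.0 * sigma ** 2)),
)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```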

(4) Evaluation

  • Large $C$: lower bias, higher variance.
  • Small $C$: higher bias, lower variance.
  • Large $\sigma^2$: higher bias, lower variance ($f_i$ varies more smoothly).
  • Small $\sigma^2$: lower bias, higher variance.
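Since $C$ and $\sigma^2$ trade bias against variance in opposite directions, they are usually picked by cross-validation. A sketch with scikit-learn's `GridSearchCV` (the grid values and data are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

sigmas = np.array([0.3, 1.0, 3.0])
param_grid = {
    "C": [0.1, 1.0, 10.0],                     # large C: lower bias, higher variance
    "gamma": list(1.0 / (2.0 * sigmas ** 2)),  # small sigma^2: higher variance
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # the (C, sigma^2) pair chosen by cross-validation
```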

(5) Note

  • Perform feature scaling before using the Gaussian kernel.
  • Not all similarity functions make valid kernels; they must satisfy Mercer's theorem so that SVM packages run correctly.
  • Other kernels: polynomial kernel, string kernel, …
  • Multi-class classification: use the one-vs-all method.
  • If $n \gg m$, use logistic regression or an SVM without a kernel; if $n$ is small and $m$ is intermediate, use an SVM with a Gaussian kernel; if $m \gg n$, create more features and fall back to the first case. A neural network is likely to work well in most of these settings.