Classifier Reviews in Machine Learning #1


SVM (Support Vector Machine)

1. Concept

SVM is a classifier that, for two sets of linearly separable data, finds a line (a hyperplane in higher dimensions) separating them. This line is special because it lies right in the middle of the two sets: among all separating lines, it is the one with the largest distance (the margin) to the closest data points. Because of this maximum margin, the line still tends to work when out-of-sample data is added to the sets.
The closest points are the support vectors; removing a data point that is a support vector can change the decision boundary, while removing any other point does not.
Datasets with a clear classification boundary work best with SVMs.
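
As a minimal sketch of this (using scikit-learn's `SVC`, an assumed implementation rather than anything specified above), we can fit a linear SVM on two separable clusters and inspect its support vectors:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters: linearly separable by construction.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.8, random_state=0)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only the support vectors define the decision boundary: removing one
# of them can move the boundary; removing any other point leaves it as-is.
print("support vectors per class:", clf.n_support_)
print("support vector coordinates:\n", clf.support_vectors_)
```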

2. Parameters

1. C

C is the misclassification penalty.
The higher C is, the less tolerance there is for misclassification, which may result in over-fitting.
The lower C is, the more tolerance there is for misclassification, which may result in under-fitting.
A C that is too big or too small lowers the model's ability to generalize.

Hard margin: the SVM allows very little (ideally no) error in classification.
Soft margin: also called a noisy linear SVM; it tolerates some misclassified points.
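
A small sketch of this trade-off, again assuming scikit-learn: on overlapping clusters (where some training error is unavoidable), comparing training and test accuracy across several values of C shows the hard-margin/soft-margin behavior described above.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Overlapping clusters, so some misclassification is unavoidable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A very large C approximates a hard margin (little tolerance for
# misclassified training points); a small C gives a softer margin.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    print(f"C={C:<7} train acc={clf.score(X_train, y_train):.2f} "
          f"test acc={clf.score(X_test, y_test):.2f}")
```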

2. Kernel

The selection of the kernel is very important in SVM, especially for data that is not linearly separable. The goal is to map the linearly inseparable data into a high-dimensional feature space where it becomes linearly separable. We denote this mapping by $\Phi(x)$.
During optimization, inner products of the form $\Phi(x_i) \cdot \Phi(x_j)$ appear. Computing them explicitly would require working in a very high-dimensional space, so we introduce a kernel function that evaluates the same inner product directly from $x_i$ and $x_j$, which is much faster.
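
As a small numeric check of this trick (my own illustration; the feature map below is one standard choice, not given in the text): for the degree-2 polynomial kernel listed in the table below, an explicit 6-dimensional mapping $\Phi$ satisfies $\Phi(x) \cdot \Phi(z) = ((x \cdot z) + 1)^2$, so the kernel computes the high-dimensional inner product without ever forming $\Phi$.

```python
import numpy as np

def phi(v):
    # Explicit feature map whose inner product equals ((x . z) + 1)^2.
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = phi(x) @ phi(z)       # inner product in the 6-D feature space
rhs = (x @ z + 1.0) ** 2    # polynomial kernel with d = 2
print(lhs, rhs)             # both print 4.0
```
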
Here are some kernels that are often used in SVM:

| Name | Usage | Function |
| --- | --- | --- |
| Linear kernel | Mainly for linearly separable data, and also when there is a large number of features | $k(x, x_i) = x \cdot x_i$ |
| Polynomial kernel | Can achieve the projection, but has many parameters; when the degree is high, elements of the kernel matrix get close to zero and the computational complexity becomes huge | $k(x, x_i) = ((x \cdot x_i) + 1)^d$ |
| RBF kernel | For linearly inseparable data; few parameters; suits a normal number of samples with few features. When you don't know what to use, try this one first (the most used one) | $k(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{\sigma^2}\right)$ |
| Sigmoid kernel | Makes the SVM act like a neural network | $k(x, x_i) = \tanh(\gamma\,(x \cdot x_i) + c)$ |
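
To see the kernel choice matter, here is a hedged sketch (scikit-learn again, with default kernel parameters assumed) comparing the four kernels from the table on interleaved half-moons, a dataset that is not linearly separable:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaved half-moons: not linearly separable.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)  # default kernel params
    print(f"{kernel:>7}: test acc={clf.score(X_test, y_test):.2f}")
```
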
gamma

gamma is a parameter of the RBF kernel; in the formula above it plays the role of $1/\sigma^2$.
The greater gamma is, the narrower the influence of each support vector and the more complex the decision boundary, which risks over-fitting; the smaller gamma is, the smoother the boundary, which risks under-fitting. gamma also changes how many training points end up as support vectors, and the number of support vectors affects the training and predicting speed.
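
A quick way to observe this, under the same scikit-learn assumption, is to count the support vectors as gamma varies:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# scikit-learn parameterizes the RBF kernel as exp(-gamma * ||x - x_i||^2),
# so its gamma corresponds to 1 / sigma^2 in the formula above.
for gamma in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(f"gamma={gamma:<6} support vectors={len(clf.support_vectors_)}")
```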
