After training an SVM and saving the weights, how do I use the linear SVM weights for feature selection?

I have built a linear SVM model for two classes (1 and 0), using the following code:

class1.svm.model

and I have extracted the weights for the training set using the following code:

#extract the weights and constant from the SVM model:

w

b

I get weights for each feature, like in the following example:

X2 0.001710949

X3 -0.002717934

X4 -0.001118897

X5 0.009280056

X993 -0.000256577

X1118 0

X1452 0.004280963

X2673 0.002971335

X4013 -0.004369505
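(The extraction code above is truncated in the post. For a linear SVM, the primal weight vector is typically recovered from the trained model's dual coefficients and support vectors as w = Σᵢ coefᵢ · SVᵢ, with the intercept b = −rho. A minimal sketch in Python with made-up toy numbers — none of these values come from the question:)

```python
# Hypothetical dual coefficients (alpha_i * y_i), support vectors, and rho
# from a trained linear SVM; the numbers here are illustrative only.
coefs = [0.5, -0.3, -0.2]
SV = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
rho = 0.1

# Primal weight vector: w_j = sum_i coefs[i] * SV[i][j]
w = [sum(c * sv[j] for c, sv in zip(coefs, SV)) for j in range(len(SV[0]))]
b = -rho

print(w, b)  # one weight per feature, plus the constant term
```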

Now, how do I perform feature selection based on the weights extracted for each feature? How should I build a weight matrix?

I have read papers, but the concept is still not clear to me. Please help!

Solution

I've dashed this answer off rather quickly, so I expect there will be quite a few points that others can expand on, but as something to get you started...

There are a number of ways of doing this, but the first thing to tackle is converting the linear weights into a measure of how important each feature is to the classification. This is a relatively simple three-step process:

1. Normalise the input data such that each feature has mean = 0 and standard deviation = 1.

2. Train your model.

3. Take the absolute value of the weights. That is, if a weight is -0.57, use 0.57.

Optionally you can generate a more robust measure of feature importance by repeating the above several times on different sets of training data which you have created by randomly re-sampling your original training data.
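The three steps (and the optional bootstrap resampling) can be sketched as follows. Note that `fit` below is a placeholder for training the SVM itself; function names are mine, not from any library:

```python
import random
import statistics

def standardize(X):
    """Step 1: scale each feature to mean 0 and standard deviation 1."""
    cols = list(zip(*X))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

def importance(weights):
    """Step 3: the importance of each feature is its absolute weight."""
    return [abs(w) for w in weights]

def bootstrap_importance(X, y, fit, n_rounds=20, seed=0):
    """Optional: average importances over models trained on bootstrap
    resamples of the data. `fit` is a stand-in for training the SVM;
    it must return one weight per feature."""
    rng = random.Random(seed)
    n = len(y)
    totals = [0.0] * len(X[0])
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]           # resample rows
        w = fit([X[i] for i in idx], [y[i] for i in idx])
        totals = [t + v for t, v in zip(totals, importance(w))]
    return [t / n_rounds for t in totals]
```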

Now that you have a way to determine how important each feature is to the classification, you can use this in a number of different ways to select which features to include in your final model. I will give an example of Recursive Feature Elimination, since it is one of my favourites, but you may want to look into iterative feature selection, or noise perturbation.

So, to perform recursive feature elimination:

1. Start by training a model on the entire set of features, and calculate its feature importances.

2. Discard the feature with the smallest importance value, and re-train the model on the remaining features.

3. Repeat step 2 until you have a small enough set of features[1].

[1] where a small enough set of features is determined by the point at which accuracy begins to suffer when you apply your model to a validation set. On which note: when doing this sort of feature selection, make sure you have not only separate training and test sets, but also a validation set for choosing how many features to keep.
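The elimination loop itself can be sketched as below. For simplicity this version stops at a fixed `n_keep` rather than monitoring validation accuracy, and `toy_linear_weights` is a stand-in for the SVM fit (per-feature covariance with the label), not a real SVM:

```python
def toy_linear_weights(X, y):
    """Stand-in for the SVM fit: per-feature covariance with the label.
    A real implementation would train a linear SVM on (X, y) and return w."""
    n = len(y)
    ybar = sum(y) / n
    weights = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        xbar = sum(col) / n
        weights.append(sum((a - xbar) * (b - ybar) for a, b in zip(col, y)) / n)
    return weights

def rfe(X, y, n_keep, fit=toy_linear_weights):
    """Recursive feature elimination: repeatedly drop the feature with the
    smallest absolute weight, then retrain on the remaining features."""
    remaining = list(range(len(X[0])))              # indices of surviving features
    while len(remaining) > n_keep:
        Xsub = [[row[j] for j in remaining] for row in X]
        w = fit(Xsub, y)
        importances = [abs(v) for v in w]
        drop = importances.index(min(importances))  # least important feature
        remaining.pop(drop)                         # discard, then loop retrains
    return remaining
```

For example, on a toy data set where one column tracks the label, another is weak noise, and a third is constant, the loop discards the constant and noisy columns first and keeps the informative one.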
