Several Common Feature Selection Methods

Recommended reading

Read this post first: it compares a wider range of methods and ends with a summary comparison example: https://blog.csdn.net/SecondLieutenant/article/details/80693765

The end of this post is a useful supplement: https://blog.csdn.net/u010899985/article/details/81699091

The feature selection checklist in this post is worth reading: https://machinelearningmastery.com/an-introduction-to-feature-selection/


The Journal of Machine Learning Research (JMLR) is worth following: http://www.jmlr.org

It publishes the latest research and applications; for example, the checklist above comes from this JMLR paper:

http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf

Another paper worth reading: A Robust Learning Approach for Regression Models Based on Distributionally Robust Optimization. http://www.jmlr.org/papers/volume19/17-295/17-295.pdf


I have reproduced the salient parts of the checklist here:

  1. Do you have domain knowledge? If yes, construct a better set of "ad hoc" features.
  2. Are your features commensurate? If no, consider normalizing them (sketch below).
  3. Do you suspect interdependence of features? If yes, expand your feature set by constructing conjunctive features or products of features, as much as your computer resources allow you (sketch below).
  4. Do you need to prune the input variables (e.g. for cost, speed or data understanding reasons)? If no, construct disjunctive features or weighted sums of features.
  5. Do you need to assess features individually (e.g. to understand their influence on the system or because their number is so large that you need to do a first filtering)? If yes, use a variable ranking method (sketch below); else, do it anyway to get baseline results.
  6. Do you need a predictor? If no, stop.
  7. Do you suspect your data is "dirty" (has a few meaningless input patterns and/or noisy outputs or wrong class labels)? If yes, detect the outlier examples using the top ranking variables obtained in step 5 as representation; check and/or discard them (sketch below).
  8. Do you know what to try first? If no, use a linear predictor. Use a forward selection method with the "probe" method as a stopping criterion, or use the 0-norm embedded method for comparison. Following the ranking of step 5, construct a sequence of predictors of the same nature using increasing subsets of features. Can you match or improve performance with a smaller subset? If yes, try a non-linear predictor with that subset (sketch below).
  9. Do you have new ideas, time, computational resources, and enough examples? If yes, compare several feature selection methods, including your new idea, correlation coefficients, backward selection and embedded methods. Use linear and non-linear predictors. Select the best approach with model selection (sketch below).
  10. Do you want a stable solution (to improve performance and/or understanding)? If yes, subsample your data and redo your analysis for several "bootstraps" (sketch below).
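For step 2, a minimal sketch of putting incommensurate features on a common scale, assuming scikit-learn is available. StandardScaler (zero mean, unit variance per column) is one common choice, not the only one; the data here is made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Rescale each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))  # ~[0. 0.] and [1. 1.]
```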
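For step 3, a sketch of expanding the feature set with products of features. PolynomialFeatures with interaction_only=True appends only the pairwise cross-products; the random data and feature count are illustrative.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.random((100, 4))  # hypothetical data: 100 samples, 4 features

# degree=2 with interaction_only=True adds only cross-products (x_i * x_j),
# not squares; include_bias=False drops the constant column.
expander = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_expanded = expander.fit_transform(X)

print(X.shape, "->", X_expanded.shape)  # (100, 4) -> (100, 10): 4 originals + 6 products
```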
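For step 5, a sketch of univariate variable ranking. The checklist does not prescribe a statistic; the ANOVA F-score from scikit-learn is used here as one option, with a synthetic dataset standing in for real data.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

# Synthetic stand-in: 20 features, of which 5 are informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

scores, _ = f_classif(X, y)        # one F-score per feature
ranking = scores.argsort()[::-1]   # feature indices, best first
print("top 5 features by F-score:", ranking[:5])
```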
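For step 7, a sketch of flagging "dirty" examples using only the top-ranked variables from step 5 as the representation. IsolationForest is one possible outlier detector; the checklist leaves the choice of method open.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.feature_selection import f_classif

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Represent each example by the 5 top-ranked features (as in step 5).
scores, _ = f_classif(X, y)
top = scores.argsort()[::-1][:5]

# fit_predict returns -1 for outliers, 1 for inliers.
labels = IsolationForest(random_state=0).fit_predict(X[:, top])
print("flagged", (labels == -1).sum(), "examples to check and/or discard")
```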
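For step 8, a sketch of forward selection around a linear predictor. scikit-learn's SequentialFeatureSelector with a fixed, cross-validated subset size stands in for the paper's "probe" stopping criterion, which it does not implement.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Greedy forward selection: add one feature at a time, keeping the one that
# most improves the 5-fold cross-validated score of the linear model.
selector = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                     n_features_to_select=5,
                                     direction="forward", cv=5)
selector.fit(X, y)
print("selected features:", selector.get_support(indices=True))
```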
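For step 9, a sketch comparing one method from each family the checklist names (univariate ranking, backward selection, and an embedded L1 method), all scored under the same predictor by cross-validation. The particular methods and the subset size k=5 are illustrative, not the paper's prescription.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

methods = {
    "univariate ranking": SelectKBest(f_classif, k=5),
    "backward selection": RFE(LogisticRegression(max_iter=1000),
                              n_features_to_select=5),
    "embedded (L1)": SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear"), max_features=5),
}
for name, sel in methods.items():
    # Selection happens inside the pipeline, so each CV fold is selected
    # on its own training split (no leakage from the test split).
    pipe = make_pipeline(sel, LogisticRegression(max_iter=1000))
    score = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```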
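For step 10, a sketch of checking stability: redo the step-5 ranking on bootstrap resamples and count how often each feature makes the top 5. The 100 rounds and top-5 cutoff are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.utils import resample

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

counts = np.zeros(X.shape[1])
for seed in range(100):                         # 100 bootstrap rounds
    Xb, yb = resample(X, y, random_state=seed)  # sample with replacement
    scores, _ = f_classif(Xb, yb)
    counts[scores.argsort()[::-1][:5]] += 1     # tally top-5 appearances

# Features with frequency near 1.0 are stably selected.
print("selection frequency per feature:", counts / 100)
```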