Several Common Feature Selection Methods

Just Jump

Categories: python, Algorithm and Model Analysis Methods. Tags: feature selection

Original link: https://blog.csdn.net/SecondLieutenant/article/details/80693765

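Recommended reading on common feature selection methods

I recommend this article. It compares more methods and ends with a summary example that compares them side by side: https://blog.csdn.net/SecondLieutenant/article/details/80693765

The end of this article adds a few supplementary points: https://blog.csdn.net/u010899985/article/details/81699091

The feature selection checklist in this article is worth reading: https://machinelearningmastery.com/an-introduction-to-feature-selection/

The Journal of Machine Learning Research (JMLR) is worth following: http://www.jmlr.org

It publishes recent research and applied results that are worth reading. For example, the checklist above comes from this JMLR paper:

http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf

Another paper, A Robust Learning Approach for Regression Models Based on Distributionally Robust Optimization, is also worth reading: http://www.jmlr.org/papers/volume19/17-295/17-295.pdf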

I have reproduced the salient parts of the checklist here:

1. Do you have domain knowledge? If yes, construct a better set of "ad hoc" features.
2. Are your features commensurate? If no, consider normalizing them.
3. Do you suspect interdependence of features? If yes, expand your feature set by constructing conjunctive features or products of features, as much as your computer resources allow you.
4. Do you need to prune the input variables (e.g. for cost, speed or data understanding reasons)? If no, construct disjunctive features or weighted sums of features.
5. Do you need to assess features individually (e.g. to understand their influence on the system or because their number is so large that you need to do a first filtering)? If yes, use a variable ranking method; else, do it anyway to get baseline results.
6. Do you need a predictor? If no, stop.
7. Do you suspect your data is "dirty" (has a few meaningless input patterns and/or noisy outputs or wrong class labels)? If yes, detect the outlier examples using the top ranking variables obtained in step 5 as representation; check and/or discard them.
8. Do you know what to try first? If no, use a linear predictor. Use a forward selection method with the "probe" method as a stopping criterion or use the 0-norm embedded method. For comparison, following the ranking of step 5, construct a sequence of predictors of same nature using increasing subsets of features. Can you match or improve performance with a smaller subset? If yes, try a non-linear predictor with that subset.
9. Do you have new ideas, time, computational resources, and enough examples? If yes, compare several feature selection methods, including your new idea, correlation coefficients, backward selection and embedded methods. Use linear and non-linear predictors. Select the best approach with model selection.
10. Do you want a stable solution (to improve performance and/or understanding)? If yes, subsample your data and redo your analysis for several "bootstraps".
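
The variable-ranking question and the final stability question in the checklist can be tried out with a small sketch. The Python below uses only the standard library. It ranks features by the absolute value of their Pearson correlation with the target, then repeats the ranking on bootstrap resamples to see how often each feature stays in the top k. The toy data, the helper names `pearson`, `rank_features` and `bootstrap_top_k`, and the choice of Pearson correlation as the ranking criterion are all assumptions made for illustration; the checklist itself does not prescribe any of them.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def rank_features(columns, target):
    """Return feature indices sorted by |correlation with target|, best first."""
    scores = [abs(pearson(col, target)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


def bootstrap_top_k(columns, target, k, n_rounds, rng):
    """Fraction of bootstrap resamples in which each feature ranks in the top k."""
    n = len(target)
    counts = [0] * len(columns)
    for _ in range(n_rounds):
        # Resample row indices with replacement, then rebuild columns and target.
        idx = [rng.randrange(n) for _ in range(n)]
        cols = [[col[i] for i in idx] for col in columns]
        y_boot = [target[i] for i in idx]
        for j in rank_features(cols, y_boot)[:k]:
            counts[j] += 1
    return [c / n_rounds for c in counts]


# Toy data, invented for this example: two informative features and two noise features.
rng = random.Random(0)
n = 200
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]  # pure noise
x3 = [rng.gauss(0, 1) for _ in range(n)]  # pure noise
y = [3 * a + 2 * b + rng.gauss(0, 0.5) for a, b in zip(x0, x1)]
columns = [x0, x1, x2, x3]

ranking = rank_features(columns, y)
stability = bootstrap_top_k(columns, y, k=2, n_rounds=50, rng=rng)
print("ranking:", ranking)
print("top-2 selection frequency:", stability)
```

A feature whose selection frequency stays close to 1 across resamples is a stable choice, while one that drifts in and out of the top k deserves more scrutiny. Pearson correlation only captures linear, single-variable effects, so this is a baseline ranking rather than a complete selection procedure.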
