Feature Selection Study Notes

These are study notes on feature selection, covering correlation analysis both between features and between features and the class label. They introduce filter methods (e.g. variance thresholding, the Pearson correlation coefficient, the chi-square test), wrapper methods (exhaustive search, heuristic search, random search, and recursive feature elimination), and embedded methods (e.g. LASSO). These strategies play a key role in improving model performance and reducing the risk of overfitting.


1. Correlation analysis between features
2. Correlation analysis between features and the class label

Three families of feature selection methods

1. Filter methods
Definition: score each feature by its divergence (e.g. variance) or by its correlation with the target, then select features by setting a score threshold or by keeping a fixed number of top-ranked features. Filter methods split into univariate and multivariate approaches.

  • Univariate filter methods: ignore interactions between features; rank features by their correlation or mutual information with the target variable and drop the least relevant ones. They are computationally efficient and not prone to overfitting (see the sketch after this list).
  • Multivariate filter methods: take relationships between features into account; common examples are correlation-based and consistency-based feature selection.
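
A minimal sketch of a univariate filter, assuming a feature matrix `X`, labels `y`, and a cutoff `k` chosen purely for illustration; it ranks features by mutual information with the target and keeps the top k:

```python
# Univariate filter sketch: score each feature independently against the
# class label with mutual information, then keep the top-k features.
# The Iris data and k = 2 are placeholder assumptions for illustration.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

mi_scores = mutual_info_classif(X, y, random_state=0)

k = 2
top_k_idx = np.argsort(mi_scores)[::-1][:k]   # indices of the k best features
X_selected = X[:, top_k_idx]

print("mutual information scores:", np.round(mi_scores, 3))
print("selected feature indices:", top_k_idx)
```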

1.1 Common filter methods

(1) Variance thresholding
(2) Pearson correlation coefficient
(3) Chi-square test; mutual information and the maximal information coefficient (MIC) (items 1-3 are illustrated in the sketch after this list)
(4) Fisher score: the Fisher linear discriminant projects n-dimensional training samples onto a 1-dimensional space and classifies them there; the key parameter is the projection direction w.
(5) Correlation-based feature selection (CFS)
(6) Minimum redundancy maximum relevance (mRMR)
(7) Relief algorithm
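
A hedged sketch of the first three criteria above using scikit-learn; the variance threshold of 0.2, the choice of k = 2, and the Iris data are illustrative assumptions, not values from the original notes:

```python
# Filter-method sketch: variance thresholding, Pearson correlation with the
# label, and a chi-square test, each applied independently to the features.
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

X, y = load_iris(return_X_y=True)

# (1) Variance thresholding: drop features whose variance is below 0.2.
X_var = VarianceThreshold(threshold=0.2).fit_transform(X)

# (2) Pearson correlation of each feature with the class label.
pearson_scores = [pearsonr(X[:, j], y)[0] for j in range(X.shape[1])]

# (3) Chi-square test: keep the 2 highest-scoring features
# (chi2 requires non-negative feature values).
X_chi2 = SelectKBest(chi2, k=2).fit_transform(X, y)

print("after variance threshold:", X_var.shape)
print("Pearson scores:", np.round(pearson_scores, 2))
print("after chi-square selection:", X_chi2.shape)
```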

2. Wrapper methods
Definition: evaluate candidate feature subsets by training a model on each one and scoring its performance; common search strategies include exhaustive search, heuristic search, random search, and recursive feature elimination (RFE).

In many data analysis tasks, one is often confronted with very high dimensional data. Feature selection techniques are designed to find the relevant subset of the original features, which can facilitate clustering, classification and retrieval. The feature selection problem is essentially a combinatorial optimization problem, which is computationally expensive. Traditional feature selection methods address this issue by selecting the top-ranked features based on scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus cannot produce an optimal feature subset. Inspired by recent developments in manifold learning and L1-regularized models for subset selection, Multi-Cluster/Class Feature Selection (MCFS) selects those features such that the multi-cluster/class structure of the data is best preserved. The corresponding optimization problem can be solved efficiently, since it only involves a sparse eigen-problem and an L1-regularized least squares problem. MCFS can be applied in supervised, unsupervised, and semi-supervised settings.

References:
- Deng Cai, Chiyuan Zhang, Xiaofei He, "Unsupervised Feature Selection for Multi-Cluster Data", Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'10), July 2010.
- Xiaofei He, Deng Cai, Partha Niyogi, "Laplacian Score for Feature Selection", Advances in Neural Information Processing Systems 18 (NIPS'05), Vancouver, Canada, 2005.
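
The summary at the top lists recursive feature elimination (RFE) as one wrapper-style strategy. A minimal scikit-learn sketch follows; the logistic regression estimator, the Iris data, and n_features_to_select=2 are illustrative assumptions rather than choices from the original notes:

```python
# Wrapper-method sketch: recursive feature elimination (RFE) repeatedly fits
# a model and removes the weakest feature(s) until the requested number of
# features remains. The estimator and n_features_to_select are assumptions.
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2, step=1)
selector.fit(X, y)

print("features kept:", selector.support_)    # boolean mask over columns
print("feature ranking:", selector.ranking_)  # 1 marks selected features
```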