scikit-learn-feature_selection

闪闪发亮的小星星

Article link: https://blog.csdn.net/weixin_39107270/article/details/134944752

Reference: the "Feature selection" section of the scikit-learn user guide.

1. Removing low-variance features

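A low variance means the feature barely changes from sample to sample, so it carries little information. Features whose variance falls below a chosen threshold are removed.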
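The original post illustrated this step with an image that did not survive. As a minimal sketch, assuming the scikit-learn `VarianceThreshold` transformer is the intended tool, the toy boolean matrix and the 0.8 cut-off below are illustrative choices and not taken from the post. For a boolean feature the variance is p(1 - p), so the threshold drops columns that hold the same value in more than 80% of the samples.

```python
from sklearn.feature_selection import VarianceThreshold

# Toy boolean data (illustrative): the first column is 0 in 5 of 6 rows
X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]

# Variance of a Bernoulli feature is p * (1 - p); drop features below that bound
selector = VarianceThreshold(threshold=0.8 * (1 - 0.8))
X_reduced = selector.fit_transform(X)

print(selector.variances_)     # per-feature variance computed during fit
print(selector.get_support())  # boolean mask of the features that were kept
print(X_reduced.shape)
```

Only the first column falls under the threshold here, so two of the three features remain.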

Univariate feature selection

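Univariate feature selection scores each feature on its own with a statistical test and keeps the best ones (for example, the top k). scikit-learn provides the following methods: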

SelectKBest removes all but the k highest scoring features.
SelectPercentile removes all but a user-specified highest scoring percentage of features.
SelectFpr, SelectFdr, and SelectFwe apply common univariate statistical tests to each feature, selecting by false positive rate, false discovery rate, and family-wise error respectively.
GenericUnivariateSelect performs univariate feature selection with a configurable strategy, which makes it possible to choose the best univariate selection strategy with a hyper-parameter search estimator.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Load the iris data: 150 samples, 4 features
X, y = load_iris(return_X_y=True)
print(X.shape)  # (150, 4)

# Keep the 2 features with the highest ANOVA F-scores
X_new = SelectKBest(f_classif, k=2).fit_transform(X, y)
print(X_new.shape)  # (150, 2)
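The other selectors in the list follow the same fit/transform pattern. The sketch below is illustrative and not from the original post: it keeps the top 50% of the iris features with `SelectPercentile`, then expresses the same choice through `GenericUnivariateSelect`, where `mode` and `param` are the knobs a hyper-parameter search could vary.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import (
    GenericUnivariateSelect,
    SelectPercentile,
    f_classif,
)

X, y = load_iris(return_X_y=True)

# Keep the top 50% of features ranked by ANOVA F-score
X_pct = SelectPercentile(f_classif, percentile=50).fit_transform(X, y)

# Same strategy through the configurable selector
generic = GenericUnivariateSelect(f_classif, mode="percentile", param=50)
X_gen = generic.fit_transform(X, y)

print(X_pct.shape, X_gen.shape)
print(generic.scores_)  # F-score of each original feature
```

Switching `mode` to `"k_best"`, `"fpr"`, `"fdr"`, or `"fwe"` selects the corresponding strategy without changing the rest of the code.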

Example from the scikit-learn gallery:

https://scikit-learn.org/stable/auto_examples/feature_selection/plot_feature_selection.html#sphx-glr-download-auto-examples-feature-selection-plot-feature-selection-py

Recursive feature elimination

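Given an external estimator that assigns weights to features (for example, the coefficients of a linear model), recursive feature elimination (RFE) selects features by recursively considering smaller and smaller feature sets. First, the estimator is trained on the initial feature set, and the importance of each feature is obtained through a specific attribute (such as coef_ or feature_importances_) or through a callable. Then the least important features are pruned from the current set. The procedure is repeated recursively on the pruned set until the desired number of features is reached.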
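The procedure described above can be sketched with scikit-learn's `RFE` class. The choice of `LogisticRegression` as the external estimator, the iris data, and the target of 2 features are assumptions made for illustration; the original post showed only an image here.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# A linear model exposes coef_, which RFE uses as the feature importance
estimator = LogisticRegression(max_iter=1000)

# Remove one feature per iteration (step=1) until 2 features remain
rfe = RFE(estimator, n_features_to_select=2, step=1)
rfe.fit(X, y)

print(rfe.support_)  # True for the features that were kept
print(rfe.ranking_)  # 1 for kept features; larger numbers were pruned earlier

X_selected = rfe.transform(X)
print(X_selected.shape)
```

When the right number of features is not known in advance, `RFECV` wraps the same loop in cross-validation to pick it.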

Feature selection using SelectFromModel

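SelectFromModel is a meta-transformer that can be used with any estimator that assigns an importance to each feature after fitting, either through a specific attribute (such as coef_ or feature_importances_) or through an importance_getter callable. A feature is considered unimportant and removed if its importance is below the supplied threshold parameter. Besides specifying the threshold numerically, there are built-in heuristics that find a threshold from a string argument. The available heuristics are "mean", "median", and float multiples of these, such as "0.1*mean".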
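A minimal sketch of the two string heuristics follows. The random forest, its settings, and the iris data are illustrative assumptions; any estimator that exposes `coef_` or `feature_importances_` after fitting would work in its place.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# The forest exposes feature_importances_ after fitting
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Keep features whose importance is at least the median importance
median_selector = SelectFromModel(forest, threshold="median")
X_median = median_selector.fit_transform(X, y)

# A scaled heuristic: keep features at or above 0.1 times the mean importance
scaled_selector = SelectFromModel(forest, threshold="0.1*mean")
X_scaled = scaled_selector.fit_transform(X, y)

print(median_selector.threshold_, median_selector.get_support())
print(scaled_selector.threshold_, scaled_selector.get_support())
```

The resolved numeric cut-off is available as `threshold_` after fitting, which helps when checking what a string heuristic actually evaluated to.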