Feature Selection---SelectKBest


云仄


I came across this method while reading a paper, so I looked into it.

from sklearn.feature_selection import SelectKBest

http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest.set_params

class SelectKBest(_BaseFilter):
    """Select features according to the k highest scores.

    Read more in the :ref:`User Guide <univariate_feature_selection>`.

    Parameters
    ----------
    score_func : callable
        Function taking two arrays X and y, and returning a pair of arrays
        (scores, pvalues) or a single array with scores.
        Default is f_classif (see below "See also"). The default function only
        works with classification tasks.

    k : int or "all", optional, default=10
        Number of top features to select.
        The "all" option bypasses selection, for use in a parameter search.

    Attributes
    ----------
    scores_ : array-like, shape=(n_features,)
        Scores of features.

    pvalues_ : array-like, shape=(n_features,)
        p-values of feature scores, None if `score_func` returned only scores.

    Notes
    -----
    Ties between features with equal scores will be broken in an unspecified
    way.

    See also
    --------
    f_classif: ANOVA F-value between label/feature for classification tasks.
    mutual_info_classif: Mutual information for a discrete target.
    chi2: Chi-squared stats of non-negative features for classification tasks.
    f_regression: F-value between label/feature for regression tasks.
    mutual_info_regression: Mutual information for a continuous target.
    SelectPercentile: Select features based on percentile of the highest scores.
    SelectFpr: Select features based on a false positive rate test.
    SelectFdr: Select features based on an estimated false discovery rate.
    SelectFwe: Select features based on family-wise error rate.
    GenericUnivariateSelect: Univariate feature selector with configurable mode.
    """

An example from the official documentation (you have to supply the scoring function and the value of k yourself):
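The example itself did not survive in this copy of the post. Below is a minimal sketch along the lines of the scikit-learn documentation example, which scores the digits dataset (the post later refers back to "the digits data") with chi2 and keeps k=20 features. The choice of chi2 and k=20 comes from that docs example, not from this post.

```python
# Keep the 20 digit pixels with the highest chi-squared scores.
# chi2 requires non-negative features, which pixel intensities are.
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)
print(X.shape)      # (1797, 64)

# Both the scoring function and k are supplied by the caller.
X_new = SelectKBest(chi2, k=20).fit_transform(X, y)
print(X_new.shape)  # (1797, 20)
```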

Parameters

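1. score_func: callable. A function that takes two arrays X and y and returns either a pair of arrays (scores, pvalues) or a single array of scores. The default is f_classif, which only works for classification tasks.
2. k: int or "all", optional, default=10. The number of top-scoring features to select. The "all" option bypasses selection, for use in a parameter search.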
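A small sketch of both parameters, assuming the iris dataset as toy data. `variance_score` is a hypothetical scorer written only for illustration: it shows that any callable taking (X, y) works, and that returning scores alone (no p-values) is allowed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# k="all" bypasses selection: every column is kept, which is handy when
# "all" is one of the candidate values in a parameter search.
keep_all = SelectKBest(f_classif, k="all").fit(X, y)
print(keep_all.transform(X).shape)  # (150, 4)

# Hypothetical scorer: rank features by their variance, ignoring y.
# It returns a single array of scores, so no p-values are available.
def variance_score(X, y):
    return np.var(X, axis=0)

by_var = SelectKBest(variance_score, k=2).fit(X, y)
print(by_var.get_support(indices=True))
print(by_var.pvalues_)  # None, because only scores were returned
```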

Attributes

1. scores_: array-like, shape=(n_features,). The scores of the features.
2. pvalues_: array-like, shape=(n_features,). The p-values of the feature scores; None if score_func returned only scores.
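To see how the attributes relate to the selection, here is a sketch (iris is assumed as toy data) that checks the selected columns are exactly the k columns with the largest `scores_`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
selector = SelectKBest(f_classif, k=2).fit(X, y)

# One score and one p-value per input feature.
print(selector.scores_.shape, selector.pvalues_.shape)  # (4,) (4,)

# Indices of the 2 largest scores, sorted to match get_support's order.
top_k = np.sort(np.argsort(selector.scores_)[-2:])
print(top_k, selector.get_support(indices=True))
```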

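Scoring functions available for score_func: those listed under "See also" in the docstring above, i.e. f_classif, mutual_info_classif and chi2 for classification, and f_regression and mutual_info_regression for regression.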
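A sketch that swaps several classification scorers into the same selector, assuming iris as toy data (its features are non-negative, so chi2 is valid). mutual_info_classif is randomized, so the seed is fixed with `functools.partial`. It returns scores only, so `pvalues_` stays None. The regression scorers would be used the same way with a continuous target.

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif

X, y = load_iris(return_X_y=True)

score_funcs = {
    "f_classif": f_classif,
    "chi2": chi2,
    # Fix the random seed so the mutual-information estimate is repeatable.
    "mutual_info_classif": partial(mutual_info_classif, random_state=0),
}

results = {}
for name, func in score_funcs.items():
    selector = SelectKBest(func, k=2).fit(X, y)
    results[name] = selector
    # f_classif and chi2 return (scores, pvalues); mutual_info returns scores only.
    print(name, selector.get_support(indices=True),
          "p-values:", selector.pvalues_ is not None)
```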

Methods

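1. fit(X, y): run the score function on (X, y) and get the appropriate features.
2. fit_transform(X[, y]): fit to the data, then transform it.
3. get_params([deep]): get the parameters of this estimator.
4. get_support([indices]): get a mask, or integer indices, of the selected features.
5. inverse_transform(X): reverse the transformation.
6. set_params(**params): set the parameters of this estimator.
7. transform(X): reduce X to the selected features.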
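The methods listed above can be exercised in one short sketch (iris is assumed as toy data). Note that inverse_transform does not recover the dropped values. It restores the original width and fills the removed columns with zeros.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(f_classif, k=2)
selector.fit(X, y)                  # compute scores_ and pvalues_
X_sel = selector.transform(X)       # keep only the 2 best columns
print(X_sel.shape)                  # (150, 2)

# fit_transform does both steps at once and gives the same result.
same = np.array_equal(X_sel, SelectKBest(f_classif, k=2).fit_transform(X, y))

# get_support: boolean mask by default, integer indices with indices=True.
mask = selector.get_support()
idx = selector.get_support(indices=True)

# inverse_transform restores the original width; dropped columns become 0.
X_back = selector.inverse_transform(X_sel)
print(X_back.shape)                 # (150, 4)

# get_params / set_params read and change hyperparameters; refit afterwards.
print(selector.get_params()["k"])   # 2
selector.set_params(k=3).fit(X, y)
print(selector.transform(X).shape)  # (150, 3)
```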

How do you get the names or indices of the selected features? It was already mentioned among the methods above: get_support().

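The digits data used earlier has no feature names, so I switched to the Boston housing data, which does. Because it is a regression dataset, the scoring function changes accordingly, to f_regression. You have to call fit first before you can use get_support(). If its argument indices is set to True, the return value is the indices of the selected features. If you were hoping to get the feature names directly, this call cannot do that, but mapping the indices back to names through the dataset is easy. Here I set k to 5, which selects the 5 highest-scoring features: attributes 2, 5, 9, 10 and 12 (zero-based indices, i.e. INDUS, RM, TAX, PTRATIO and LSTAT).
If indices is False (the default), the return value is a Boolean mask indicating whether each feature was selected.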
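The post used the Boston housing data, but `load_boston` was removed in scikit-learn 1.2. This sketch therefore swaps in the bundled diabetes regression dataset (an assumption, not the author's data) to show both forms of get_support() and how to map them back to feature names:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

data = load_diabetes()
X, y, names = data.data, data.target, np.array(data.feature_names)

# Regression target, so score with f_regression; fit before get_support().
selector = SelectKBest(f_regression, k=5).fit(X, y)

idx = selector.get_support(indices=True)  # zero-based integer indices
mask = selector.get_support()             # one boolean per feature

# Either form can index the feature-name array to recover the names.
print(idx, names[idx])
print(mask, names[mask])
```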

Link: https://www.jianshu.com/p/586ba8c96a3d
