What does sklearn's feature_importances_ mean?

This post explains the meaning of `feature_importances_` in scikit-learn: it is based on the total reduction in node impurity attributable to a feature, averaged over all trees. It also covers another evaluation method, mean decrease accuracy, which directly measures a feature's impact on model accuracy. Code examples for both methods are included, and the differences between them are discussed.

Addendum: this blog post also explains the topic particularly well.

Main text:

Answer from a scikit-learn author:

There are indeed several ways to get feature “importances”. As often, there is no strict consensus about what this word means.
In scikit-learn, we implement the importance as described in [1] (often cited, but unfortunately rarely read…). It is sometimes called “gini importance” or “mean decrease impurity” and is defined as the total decrease in node impurity (weighted by the probability of reaching that node (which is approximated by the proportion of samples reaching that node)) averaged over all trees of the ensemble.
In the literature or in some other packages, you can also find feature importances implemented as the “mean decrease accuracy”. Basically, the idea is to measure the decrease in accuracy on OOB data when you randomly permute the values for that feature. If the decrease is low, then the feature is not important, and vice-versa.
(Note that both algorithms are available in the randomForest R package.)
[1]: Breiman, Friedman, “Classification and regression trees”, 1984.
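
Written out as a formula (notation mine, following the verbal definition above), the importance of a feature j in scikit-learn is roughly:

$$\mathrm{Imp}(j) \;=\; \frac{1}{T}\sum_{t=1}^{T}\;\sum_{n \in t:\ \mathrm{split}(n)=j} p(n)\,\Delta i(n)$$

where T is the number of trees, p(n) is the fraction of samples reaching node n, and Δi(n) is the impurity decrease (e.g. in Gini impurity) produced by the split at node n. scikit-learn additionally normalizes the importances so that they sum to 1.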

So there are two popular ways to evaluate feature importance.
Code for both methods can be found in this article: Selecting good features – Part III: random forests

  1. Mean decrease impurity
    The principle behind this method is the same one by which tree models perform classification and regression: the more important a feature is, the more it increases node purity (i.e. reduces node impurity) when used for a split. There are several impurity criteria, such as Gini impurity, entropy, and information gain.
    This is also what sklearn's feature_importances_ measures.
    Code:
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
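
# The rest of the original snippet was cut off; the lines below are a minimal
# reconstruction sketch, not the author's exact code.
# Note: load_boston was removed in scikit-learn 1.2; on newer versions,
# substitute another regression dataset such as load_diabetes.
boston = load_boston()
X, y, names = boston.data, boston.target, boston.feature_names

# feature_importances_ is the Gini importance / mean decrease impurity
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)

# Rank features by their mean decrease in impurity
for name, importance in sorted(zip(names, rf.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name}: {importance:.4f}")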
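  2. Mean decrease accuracy
    As the quoted answer explains, this method randomly permutes the values of a feature and measures how much the model's accuracy (or score) drops; a small drop means the feature is unimportant. Below is a minimal sketch using scikit-learn's permutation_importance on a held-out test set (an analogous but not identical setup to the OOB-based computation in the R randomForest package); the dataset and parameters are illustrative only.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Shuffle each feature on the held-out set and record the drop in score
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.4f} "
          f"+/- {result.importances_std[i]:.4f}")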