9 Common Methods for Feature Importance Analysis in Python

Python_P叔

Article link: https://blog.csdn.net/Saki_Python/article/details/135387378

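Feature importance analysis estimates how useful or valuable each feature (variable or input) is for making predictions. The goal is to identify the features that most influence the model's output. It is a widely used technique in machine learning.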

Why is feature importance analysis important?

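Suppose a dataset contains dozens or even hundreds of features, each of which may contribute to the performance of your machine learning model. Not all features are equally useful, however. Some may be redundant or irrelevant, which increases modeling complexity and can lead to overfitting.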

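Feature importance analysis identifies the most informative features so that you can focus on them, which brings several advantages:

Improved model performance
Reduced overfitting
Faster training and inference
Better interpretability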

Below we take a closer look at some feature importance analysis methods in Python.

Feature importance analysis methods

1. Permutation importance (PermutationImportance)

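This method randomly shuffles the values of one feature at a time and measures how much the model's performance drops. The larger the drop, the more important the feature.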



from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

cancer = load_breast_cancer()

X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, random_state=1)

rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X_train, y_train)

# Accuracy on the unshuffled test set, printed for reference
baseline = rf.score(X_test, y_test)
print(f"Baseline accuracy: {baseline:.3f}")

# Shuffle each feature 10 times and record the resulting drop in accuracy
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=1, scoring='accuracy')

# Mean drop in accuracy per feature across the repeats
importances = result.importances_mean

# Visualize permutation importances
plt.bar(range(len(importances)), importances)
plt.xlabel('Feature Index')
plt.ylabel('Permutation Importance')
plt.show()
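
To make the shuffling step concrete, here is a minimal hand-written sketch of the same idea. It is not how scikit-learn implements `permutation_importance` internally. The helper name `manual_permutation_importance` and its parameters (`n_repeats`, `seed`) are hypothetical and chosen only for illustration. It assumes accuracy (the classifier's default `score`) as the metric.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(X_train, y_train)


def manual_permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Hypothetical helper: mean drop in model.score after shuffling each column."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)  # score on the untouched data
    drops = np.zeros((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            # Shuffle only column j, which breaks its link to the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j, r] = baseline - model.score(X_perm, y)
    return drops.mean(axis=1)


manual_importances = manual_permutation_importance(model, X_test, y_test)

# Indices of the five features whose shuffling hurts accuracy the most
top5 = np.argsort(manual_importances)[::-1][:5]
print(top5, manual_importances[top5])
```

Values near zero, or slightly negative, suggest the model barely relies on that feature on this test split. With strongly correlated features the drop can be small for each of them individually, so the scores are best read together rather than one at a time.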

2. Built-in feature importance (coef_ or feature_importances_)

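Some models, such as linear regression and random forests, expose feature importance scores directly: linear models through coef_ and tree ensembles through feature_importances_. These scores indicate how much each feature contributes to the final prediction.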
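
For the `coef_` side, a minimal sketch might look like the following. Because the breast cancer dataset is a classification task, it swaps in `LogisticRegression` for the linear regression mentioned above. It also standardizes the features first, on the assumption that coefficient magnitudes are only comparable when the features share a scale. The pipeline and variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

# Standardize so that coefficient magnitudes are on a comparable scale
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

# Binary problem: coef_ has shape (1, n_features), so take the single row
coefs = pipe.named_steps["logisticregression"].coef_[0]

# Rank features by absolute coefficient; the sign gives the direction of the effect
order = np.argsort(np.abs(coefs))[::-1]
for i in order[:5]:
    print(f"{data.feature_names[i]}: {coefs[i]:+.3f}")
```

Unlike `feature_importances_`, which is non-negative, coefficients are signed, so ranking by absolute value is one reasonable convention rather than the only one.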



from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X, y)

# Impurity-based importances computed during training
importances = rf.feature_importances_

# Plot importances
plt.bar(range(X.shape[1]), importances)
plt.xlabel('Feature Index