Random forest model in sklearn: feature importances with a random forest using Python + sklearn

Latest recommended article published on 2024-07-15 15:30:12

任博冰Bob

Article tags: random forest model sklearn

Article link: https://blog.csdn.net/weixin_35708531/article/details/112194506

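This article uses an example to show how to fit a random forest model with Python's sklearn library and how to evaluate the feature importances and their variability across trees. The results show that three features carry significant information for the classification task, while the remaining features are unimportant.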

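This example shows how to use a random forest to evaluate the importance of features on an artificial classification task. The red bars in the plot are the feature importances of the forest, and the error bars show their inter-trees variability, that is, the standard deviation of each importance across the individual trees.

As expected, the plot suggests that 3 features are informative, while the remaining features are not.

Output: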

    Feature ranking:
    1. feature 1 (0.295902)
    2. feature 2 (0.208351)
    3. feature 0 (0.177632)
    4. feature 3 (0.047121)
    5. feature 6 (0.046303)
    6. feature 8 (0.046013)
    7. feature 7 (0.045575)
    8. feature 4 (0.044614)
    9. feature 9 (0.044577)
    10. feature 5 (0.043912)
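The ranking can be cross-checked against the way the data is generated. The sketch below assumes the same settings as this example; because `shuffle=False`, `make_classification` keeps the informative features in the first `n_informative` columns, so columns 0, 1 and 2 should be the top three. The names `top3` and `informative` are illustrative and not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Same synthetic task as in this example: 10 features, 3 of them informative.
# With shuffle=False the informative features stay in columns 0, 1 and 2.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           random_state=0, shuffle=False)

forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
forest.fit(X, y)

importances = forest.feature_importances_

# Indices of the three most important features (illustrative helper names).
top3 = set(np.argsort(importances)[::-1][:3].tolist())
informative = {0, 1, 2}

print("top-3 features:", sorted(top3))
print("matches the informative columns:", top3 == informative)
print("importances sum to:", importances.sum())
```

The exact importance values may differ slightly between scikit-learn versions, but the gap between the three informative features and the rest is large enough that the top three should stay the same.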

    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier

    # Build a classification task with 3 informative features
    X, y = make_classification(n_samples=1000,
                               n_features=10,
                               n_informative=3,
                               n_redundant=0,
                               n_repeated=0,
                               n_classes=2,
                               random_state=0,
                               shuffle=False)

    # Build the forest and compute the feature importances
    # (note: ExtraTreesClassifier is an ensemble of extremely randomized trees)
    forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
    forest.fit(X, y)
    importances = forest.feature_importances_
    # Inter-trees variability: standard deviation of each importance across trees
    std = np.std([tree.feature_importances_ for tree in forest.estimators_],
                 axis=0)
    # Feature indices sorted from most to least important
    indices = np.argsort(importances)[::-1]

    # Print the feature ranking
    print("Feature ranking:")
    for f in range(X.shape[1]):
        print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))

    # Plot the feature importances of the forest
    plt.figure()
    plt.title("Feature importances")
    plt.bar(range(X.shape[1]), importances[indices],
            color="r", yerr=std[indices], align="center")
    plt.xticks(range(X.shape[1]), indices)
    plt.xlim([-1, X.shape[1]])
    plt.show()
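To make the link between the bar heights and the error bars more concrete, the following minimal sketch collects the per-tree importances into one array. It assumes, as is the case in the scikit-learn versions I am aware of, that the forest-level `feature_importances_` is the average of the per-tree values. The smaller `n_estimators=50` and the name `per_tree` are choices made for this illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           random_state=0, shuffle=False)

# A smaller forest than in the example above, to keep the sketch quick.
forest = ExtraTreesClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# One row per tree, one column per feature (illustrative name).
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])

mean_importance = per_tree.mean(axis=0)  # compared with the bar heights below
std_importance = per_tree.std(axis=0)    # the error bars (inter-trees variability)

print("per-tree array shape:", per_tree.shape)
print("forest importances equal the per-tree mean:",
      np.allclose(mean_importance, forest.feature_importances_))
for i in range(X.shape[1]):
    print("feature %d: %.4f +/- %.4f" % (i, mean_importance[i], std_importance[i]))
```

A large standard deviation relative to the mean indicates that individual trees disagree about how useful a feature is, which is why the error bars are worth reading alongside the bar heights.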