Python: Random Forest Classification Analysis

Latest recommended article published on 2024-07-15 09:51:19

炼丹师666


Category: Algorithms

Article link: https://blog.csdn.net/wj1298250240/article/details/103821636



import matplotlib.pyplot as plt
import mglearn
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=42)
# n_estimators: number of decision trees in the forest
# random_state: random seed, fixed so the result is reproducible
forest = RandomForestClassifier(n_estimators=5, random_state=2)
forest.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=5,
                       n_jobs=None, oob_score=False, random_state=2, verbose=0,
                       warm_start=False)
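# As part of the random forest, the trees are stored in the estimators_ attribute. We visualize the
# decision boundary learned by each tree, along with their aggregate prediction (the one made by the whole forest).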

fig, axes = plt.subplots(2, 3, figsize=(20, 10))
for i, (ax, tree) in enumerate(zip(axes.ravel(), forest.estimators_)):
    ax.set_title("Tree {}".format(i))
    mglearn.plots.plot_tree_partition(X_train, y_train, tree, ax=ax)
    
mglearn.plots.plot_2d_separator(forest, X_train, fill=True, ax=axes[-1, -1],
                                alpha=.4)
axes[-1, -1].set_title("Random Forest")
mglearn.discrete_scatter(X_train[:, 0], X_train[:, 1], y_train)
[<matplotlib.lines.Line2D at 0x1743fa87470>,
 <matplotlib.lines.Line2D at 0x1743fa87a20>]

[Figure: decision boundaries learned by the five trees (Tree 0 to Tree 4) and by the random forest]
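To make the "aggregate prediction" concrete, the following is a minimal sketch of how the forest's output relates to its individual trees. It assumes scikit-learn's behavior of averaging the class probabilities predicted by each tree, and it rebuilds the same data and model as above so that it runs on its own. The names `per_tree_proba`, `mean_proba`, and `manual_pred` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

forest = RandomForestClassifier(n_estimators=5, random_state=2)
forest.fit(X_train, y_train)

# Each fitted tree is stored in forest.estimators_ and predicts class probabilities.
per_tree_proba = np.array([tree.predict_proba(X_test)
                           for tree in forest.estimators_])

# Average the per-tree probabilities, then pick the most probable class.
mean_proba = per_tree_proba.mean(axis=0)
manual_pred = forest.classes_[np.argmax(mean_proba, axis=1)]

# Compare the hand-computed aggregate with the forest's own output.
print(np.allclose(mean_proba, forest.predict_proba(X_test)))
print(np.array_equal(manual_pred, forest.predict(X_test)))
```

Because the probabilities are averaged rather than the hard votes counted, a tree that is very confident about a point weighs more than one that is nearly undecided.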

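# As another example, we apply a random forest of 100 trees to the breast cancer dataset
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)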

print("Accuracy on training set: {:.3f}".format(forest.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(forest.score(X_test, y_test)))
Accuracy on training set: 1.000
Accuracy on test set: 0.972

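# Without tuning any parameters, the random forest reaches an accuracy of 97%, better than the linear
# models or a single decision tree. We could adjust the max_features parameter, or apply pre-pruning
# as we did for a single decision tree. However, the default parameters of the random forest
# often already give very good results.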
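The tuning options mentioned above could look like the sketch below. The values `max_features=5` and `max_depth=4` are illustrative assumptions, not tuned settings from the original post, and `tuned_forest` is a name introduced here. No particular accuracy is claimed for this configuration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# Illustrative values only (assumptions, not tuned):
# max_features limits how many features each split may consider,
# max_depth pre-prunes every tree to a depth of at most 4.
tuned_forest = RandomForestClassifier(
    n_estimators=100, max_features=5, max_depth=4, random_state=0)
tuned_forest.fit(X_train, y_train)

train_acc = tuned_forest.score(X_train, y_train)
test_acc = tuned_forest.score(X_test, y_test)
print("Accuracy on training set: {:.3f}".format(train_acc))
print("Accuracy on test set: {:.3f}".format(test_acc))
```

A smaller `max_features` makes the trees more different from each other, while a depth limit makes each tree simpler. Comparing the two printed scores against the defaults shows whether the restriction helps on this dataset.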
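# Like decision trees, random forests also provide feature importances, computed by summing the feature
# importances of all trees in the forest and averaging them. In general, the feature importances given by a random forest are more reliable than those given by a single tree.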
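The averaging described above can be checked by hand. This is a minimal sketch, assuming scikit-learn's behavior of averaging the per-tree importances and normalizing the result to sum to 1. The names `per_tree` and `manual` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# One row of importances per tree, one column per feature.
per_tree = np.array([tree.feature_importances_
                     for tree in forest.estimators_])

# Average over the trees and renormalize so the importances sum to 1.
manual = per_tree.mean(axis=0)
manual = manual / manual.sum()

print(np.allclose(manual, forest.feature_importances_))
```

Averaging over many trees smooths out the choices any single tree happens to make, which is one reason the forest's importances tend to be more stable.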



plot_feature_importances_cancer(forest)
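The helper `plot_feature_importances_cancer` called above is not defined anywhere in this post. The sketch below shows one plausible way to write it as a horizontal bar chart with matplotlib. The function body is an assumption about what the helper does, not the author's confirmed implementation.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()


def plot_feature_importances_cancer(model):
    """Draw one horizontal bar per feature of the breast cancer dataset."""
    # Assumed implementation: the original post does not show this helper.
    n_features = cancer.data.shape[1]
    plt.barh(np.arange(n_features), model.feature_importances_, align="center")
    plt.yticks(np.arange(n_features), cancer.feature_names)
    plt.xlabel("Feature importance")
    plt.ylabel("Feature")
    plt.ylim(-1, n_features)


X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

plot_feature_importances_cancer(forest)
# Call plt.show() or plt.savefig(...) afterwards to view the chart.
```

Any fitted model that exposes `feature_importances_`, such as a single decision tree, can be passed to the same function for a side-by-side comparison.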

[Figure: feature importances computed by the random forest on the breast cancer dataset]

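Compared with a single tree, the random forest assigns non-zero importance to many more features. Like the single decision tree, the random forest gives a lot of importance to the "worst radius" feature, but overall it actually picks "worst perimeter" as the most informative feature. Because of the randomness involved in building the forest, the algorithm has to consider many possible explanations, and as a result the random forest captures a broader picture of the data than a single tree does.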
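The comparison above can be reproduced numerically instead of being read off the plots. The sketch below assumes an unpruned `DecisionTreeClassifier` with `random_state=0` as the single tree, which may differ from the tree the original comparison used. The exact counts and ranking can vary with the scikit-learn version, so no expected output is shown.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# Assumption: an unpruned single tree serves as the baseline.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Count how many features each model actually uses.
tree_nonzero = int(np.count_nonzero(tree.feature_importances_))
forest_nonzero = int(np.count_nonzero(forest.feature_importances_))
print("Features with non-zero importance (tree):", tree_nonzero)
print("Features with non-zero importance (forest):", forest_nonzero)

# List the three features the forest ranks highest.
top = np.argsort(forest.feature_importances_)[::-1][:3]
for idx in top:
    print(cancer.feature_names[idx],
          "{:.3f}".format(forest.feature_importances_[idx]))
```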
