Plotting the decision surfaces of ensembles of trees on the iris dataset with Python and sklearn

Article link: https://blog.csdn.net/weixin_39830225/article/details/111634711

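This article shows how to use Python and sklearn to plot the decision surfaces of a decision tree, a random forest, extra-trees, and an AdaBoost classifier on the iris dataset. It compares the results for different feature pairs and discusses how model choice and parameter settings affect them.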


Plot the decision surfaces of ensembles of trees trained on pairs of features of the iris dataset.

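The plot compares the decision surfaces learned by a decision tree classifier (first column), a random forest classifier (second column), an extra-trees classifier (third column), and an AdaBoost classifier (fourth column).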

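In the first row, the classifiers are built using only the sepal width and sepal length features; in the second row, using only the petal length and sepal length; and in the third row, using only the petal width and petal length.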

In descending order of quality, when trained (outside of this example) on all 4 features using 30 estimators and scored using 10-fold cross-validation, we see:

ExtraTreesClassifier()  # 0.95 score
RandomForestClassifier()  # 0.94 score
AdaBoost(DecisionTree(max_depth=3))  # 0.94 score
DecisionTree(max_depth=None)  # 0.94 score

Increasing max_depth for AdaBoost lowers the standard deviation of the scores (but the average score does not improve).
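The cross-validated comparison above is not part of the example script. A minimal sketch of how such scores might be computed follows, assuming `cross_val_score` with `cv=10` on all four features. The `candidates` dictionary and the `random_state=0` setting are choices made for this illustration, so the exact numbers may differ from those quoted.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
n_estimators = 30

# random_state=0 is an assumption made here for reproducibility;
# it does not come from the original example.
candidates = {
    "ExtraTrees": ExtraTreesClassifier(n_estimators=n_estimators,
                                       random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=n_estimators,
                                           random_state=0),
    "AdaBoost": AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                                   n_estimators=n_estimators,
                                   random_state=0),
    "DecisionTree": DecisionTreeClassifier(max_depth=None, random_state=0),
}

cv_results = {}
for name, model in candidates.items():
    # 10-fold cross-validation on all 4 features
    scores = cross_val_score(model, iris.data, iris.target, cv=10)
    cv_results[name] = (scores.mean(), scores.std())
    print("{}: mean={:.3f}, std={:.3f}".format(
        name, scores.mean(), scores.std()))
```

Printing the standard deviation next to the mean makes it possible to check the remark about AdaBoost and max_depth by changing the depth of the base tree and rerunning.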

See the console output for further details about each model.

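In this example you might try to:

Vary max_depth for the DecisionTreeClassifier and for the AdaBoostClassifier (where it is set on the base DecisionTreeClassifier): perhaps try max_depth=3 for the DecisionTreeClassifier, or max_depth=None for the AdaBoostClassifier.
Vary n_estimators.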
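The suggested experiments could be sketched as follows. This is only an illustration: the feature pair `[0, 1]`, the `n_estimators` values `(5, 30, 100)`, and `random_state=0` are assumptions chosen here, and the scores are training accuracy, as in the script.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, [0, 1]]  # sepal length and sepal width
y = iris.target

# Decision tree: unlimited depth versus max_depth=3
tree_scores = {}
for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    tree_scores[depth] = tree.score(X, y)
    print("DecisionTree max_depth={}: {:.3f}".format(
        depth, tree_scores[depth]))

# AdaBoost: max_depth is a parameter of the base tree, not of AdaBoost itself
boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=None),
                           n_estimators=30, random_state=0).fit(X, y)
print("AdaBoost fitted {} estimators, score {:.3f}".format(
    len(boost.estimators_), boost.score(X, y)))

# Random forest: vary the number of trees
forest_scores = {}
for n in (5, 30, 100):
    forest = RandomForestClassifier(n_estimators=n, random_state=0).fit(X, y)
    forest_scores[n] = forest.score(X, y)
    print("RandomForest n_estimators={}: {:.3f}".format(
        n, forest_scores[n]))
```

Because these are training scores, a deeper tree will look at least as good as a shallow one; cross-validation would be needed to judge generalization.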

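It is worth noting that RandomForests and ExtraTrees can be fitted in parallel on many cores, because each tree is built independently of the others. AdaBoost's estimators are built sequentially, each depending on the one before it, so AdaBoost does not use multiple cores.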
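As a rough sketch of what this means in practice, the forest classifiers accept an `n_jobs` parameter, whereas `AdaBoostClassifier` does not expose one. The value `n_jobs=-1` (use all available cores) and `random_state=0` are assumptions for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)

iris = load_iris()
X, y = iris.data, iris.target

# n_jobs=-1 asks scikit-learn to build the trees on all available cores
forest = RandomForestClassifier(n_estimators=30, n_jobs=-1,
                                random_state=0).fit(X, y)
extra = ExtraTreesClassifier(n_estimators=30, n_jobs=-1,
                             random_state=0).fit(X, y)

# Check which ensemble classes expose an n_jobs parameter at all
has_n_jobs = {
    cls.__name__: "n_jobs" in cls().get_params(deep=False)
    for cls in (RandomForestClassifier, ExtraTreesClassifier,
                AdaBoostClassifier)
}
print(has_n_jobs)
```

On a dataset as small as iris the parallel speed-up is unlikely to be noticeable; the setting matters more for larger data or many more trees.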

Output:

DecisionTree with features [0, 1] has a score of 0.9266666666666666
RandomForest with 30 estimators with features [0, 1] has a score of 0.9266666666666666
ExtraTrees with 30 estimators with features [0, 1] has a score of 0.9266666666666666
AdaBoost with 30 estimators with features [0, 1] has a score of 0.84
DecisionTree with features [0, 2] has a score of 0.9933333333333333
RandomForest with 30 estimators with features [0, 2] has a score of 0.9933333333333333
ExtraTrees with 30 estimators with features [0, 2] has a score of 0.9933333333333333
AdaBoost with 30 estimators with features [0, 2] has a score of 0.9933333333333333
DecisionTree with features [2, 3] has a score of 0.9933333333333333
RandomForest with 30 estimators with features [2, 3] has a score of 0.9933333333333333
ExtraTrees with 30 estimators with features [2, 3] has a score of 0.9933333333333333
AdaBoost with 30 estimators with features [2, 3] has a score of 0.9933333333333333
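The short model names in this output (DecisionTree, RandomForest, and so on) are produced in the script by slicing the string form of each model's type. The sketch below walks through that slicing; `short_title` is a hypothetical helper name introduced only for this illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def short_title(model):
    """Hypothetical helper mirroring the slicing used in the script."""
    full = str(type(model))          # "<class '...DecisionTreeClassifier'>"
    name = full.split(".")[-1][:-2]  # keep the class name, drop trailing "'>"
    return name[:-len("Classifier")]  # drop the "Classifier" suffix


titles = [short_title(m)
          for m in (DecisionTreeClassifier(), RandomForestClassifier())]
print(titles)

# type(model).__name__ gives the class name directly and may be a
# simpler starting point than parsing str(type(model)).
direct = type(DecisionTreeClassifier()).__name__
print(direct)
```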

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

from sklearn.datasets import load_iris
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier)
from sklearn.tree import DecisionTreeClassifier

# Parameters
n_classes = 3
n_estimators = 30
cmap = plt.cm.RdYlBu
plot_step = 0.02  # fine step width for decision surface contours
plot_step_coarser = 0.5  # step width for coarse classifier guesses
RANDOM_SEED = 13  # fix the seed on each iteration

# Load data
iris = load_iris()

plot_idx = 1

models = [DecisionTreeClassifier(max_depth=None),
          RandomForestClassifier(n_estimators=n_estimators),
          ExtraTreesClassifier(n_estimators=n_estimators),
          AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                             n_estimators=n_estimators)]

for pair in ([0, 1], [0, 2], [2, 3]):
    for model in models:
        # We only take the two corresponding features
        X = iris.data[:, pair]
        y = iris.target

        # Shuffle
        idx = np.arange(X.shape[0])
        np.random.seed(RANDOM_SEED)
        np.random.shuffle(idx)
        X = X[idx]
        y = y[idx]

        # Standardize
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        X = (X - mean) / std

        # Train
        model.fit(X, y)

        # Note: this is the accuracy on the training data itself
        scores = model.score(X, y)

        # Create a title for each column and for the console output by
        # using str() and slicing away the useless parts of the string
        model_title = str(type(model)).split(
            ".")[-1][:-2][:-len("Classifier")]

        model_details = model_title
        if hasattr(model, "estimators_"):
            model_details += " with {} estimators".format(
                len(model.estimators_))
        print(model_details + " with features", pair,
              "has a score of", scores)

        plt.subplot(3, 4, plot_idx)
        if plot_idx <= len(models):
            # Add a title at the top of each column
            plt.title(model_title, fontsize=9)

        # Plot the decision boundary using a fine mesh as input to a
        # filled contour plot
        x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
        y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
        xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                             np.arange(y_min, y_max, plot_step))

        # Plot either a single DecisionTreeClassifier or alpha blend the
        # decision surfaces of the ensemble of classifiers
        if isinstance(model, DecisionTreeClassifier):
            Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
            Z = Z.reshape(xx.shape)
            cs = plt.contourf(xx, yy, Z, cmap=cmap)
        else:
            # Choose the alpha blend level according to the number of
            # estimators in use (note that AdaBoost can use fewer estimators
            # than its maximum if it achieves a good enough fit early on)
            estimator_alpha = 1.0 / len(model.estimators_)
            for tree in model.estimators_:
                Z = tree.predict(np.c_[xx.ravel(), yy.ravel()])
                Z = Z.reshape(xx.shape)
                cs = plt.contourf(xx, yy, Z, alpha=estimator_alpha, cmap=cmap)

        # Build a coarser grid to plot a set of ensemble classifications,
        # to show how these differ from what we see in the decision
        # surfaces. These points are regularly spaced and have no black
        # outline
        xx_coarser, yy_coarser = np.meshgrid(
            np.arange(x_min, x_max, plot_step_coarser),
            np.arange(y_min, y_max, plot_step_coarser))
        Z_points_coarser = model.predict(
            np.c_[xx_coarser.ravel(), yy_coarser.ravel()]
        ).reshape(xx_coarser.shape)
        cs_points = plt.scatter(xx_coarser, yy_coarser, s=15,
                                c=Z_points_coarser, cmap=cmap,
                                edgecolors="none")

        # Plot the training points; these are clustered together and have
        # a black outline
        plt.scatter(X[:, 0], X[:, 1], c=y,
                    cmap=ListedColormap(['r', 'y', 'b']),
                    edgecolor='k', s=20)
        plot_idx += 1  # move on to the next plot in sequence

plt.suptitle("Classifiers on feature subsets of the Iris dataset", fontsize=12)
plt.axis("tight")
plt.tight_layout(h_pad=0.2, w_pad=0.2, pad=2.5)
plt.show()
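The script standardizes each feature pair by hand with `(X - mean) / std`. As a small aside, the sketch below checks that this should agree with scikit-learn's `StandardScaler`, since both use the population standard deviation. The choice of the feature pair `[2, 3]` and the variable names are assumptions for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data[:, [2, 3]]  # petal length and petal width

# Manual standardization, as done in the script
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same transformation using StandardScaler
X_scaled = StandardScaler().fit_transform(X)

same = np.allclose(X_manual, X_scaled)
print(same)
```

Using `StandardScaler` has the practical advantage that the fitted mean and scale can be reused to transform new data in the same way.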

Total running time of the script: (0 minutes 6.692 seconds)

Estimated memory usage: 14 MB
