sklearn决策树概述

最新推荐文章于 2024-04-30 17:34:36 发布

江南路漫

最新推荐文章于 2024-04-30 17:34:36 发布

阅读量98

点赞数

本文链接：https://blog.csdn.net/ucler/article/details/119740717

版权

决策树是一类常见的机器学习方法，决策树学习的目的是为了产生一棵泛化能力强，即处理未见示例能力强的决策树。决策树是个递归生成的过程，如何选择最优划分属性是决策树学习的关键。我们希望决策树的分支节点所包含的样本尽可能属于同一类别，信息熵、增益率和基尼指数都可以用来形容节点的“纯度”。

以下是简单的决策树示例：

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

# Fit regression model
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_1.fit(X, y)
regr_2.fit(X, y)

# Predict
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)

# Plot the results
plt.figure()
plt.scatter(X, y, s=20, edgecolor="black",
            c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue",
         label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()

以iris数据集为例，我们可以构建如下的决策树：

from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
X, y = iris.data, iris.target
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)

tree.plot_tree(clf)

带结果图片的决策树代码

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Parameters
n_classes = 3
plot_colors = "ryb"
plot_step = 0.02

# Load data
iris = load_iris()

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],
                                [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Train
    clf = DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)

    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)

    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])

    # Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.where(y == i)
        plt.scatter(X[idx, 0], X[idx, 1], c=color, label=iris.target_names[i],
                    cmap=plt.cm.RdYlBu, edgecolor='black', s=15)

plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")

plt.figure()
clf = DecisionTreeClassifier().fit(iris.data, iris.target)
plot_tree(clf, filled=True)
plt.show()

江南路漫

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
sklearn决策树概述

决策树是一类常见的机器学习方法，决策树学习的目的是为了产生一棵泛化能力强，即处理未见示例能力强的决策树。决策树是个递归生成的过程，如何选择最优划分属性是决策树学习的关键。我们希望决策树的分支节点所包含的样本尽可能属于同一类别，信息熵、增益率和基尼指数都可以用来形容节点的“纯度”。以下是简单的决策树示例：# Create a random datasetrng = np.random.RandomState(1)X = np.sort(5 * rng.rand(80, 1), axis=0)y =
复制链接

扫一扫