Classifying the Iris Dataset with CART Decision Trees in Python

Steven_AgN3

Tags: python, decision tree, classification, machine learning, artificial intelligence

Article link: https://blog.csdn.net/Steven_AgN3/article/details/133652818

1. Obtaining the dataset.

Load the Iris dataset that ships with scikit-learn.

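2. Splitting the dataset.

Use the hold-out method to build a training set and a test set, making sure each class has the same proportion in both.

Requirement: 80% training set, 20% test set.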

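3. Training the models.

Using the training set, fit two CART classification trees of different complexity (controlled by maximum depth), visualize the fitted trees, and report the feature importance scores of each tree.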

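4. Evaluating the classification trees on the test set.

(1) From the predicted class and the true class of each test sample, build the confusion matrix and visualize it.

(2) From the confusion matrix, estimate the precision, recall, and F1 score of each class, as well as the macro precision, macro recall, and macro F1; also estimate the overall accuracy.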

5. Fit the same two trees of different depths on the entire dataset and visualize them.

The source code is as follows:
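The stratification requirement in step 2 can be checked in isolation. The following is a minimal sketch, not part of the original script: it counts the classes on each side of an 80/20 split. The seed `random_state=0` and the names `train_counts` and `test_counts` are assumptions chosen for illustration.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target  # 150 samples, 4 features, 3 classes of 50 samples each

# stratify=y keeps each class's share identical in both subsets.
# random_state=0 is an arbitrary seed (an assumption) used only for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Count how many samples of each class ended up in each subset.
train_counts = Counter(y_train.tolist())
test_counts = Counter(y_test.tolist())
print("train:", dict(sorted(train_counts.items())))
print("test: ", dict(sorted(test_counts.items())))
```

Because Iris has 50 samples per class, a stratified 80/20 split should leave 40 samples of each class in the training set and 10 in the test set, whichever seed is used.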

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
X, y = iris.data, iris.target
# Stratified hold-out split: 80% training, 20% test, same class proportions in both.
# random_state=42 is an arbitrary fixed seed so that the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Train two classification trees of different depths
dtree_shallow = DecisionTreeClassifier(max_depth=2)
dtree_shallow.fit(X_train, y_train)

dtree_deep = DecisionTreeClassifier(max_depth=4)
dtree_deep.fit(X_train, y_train)

# Visualize both trees side by side
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 7))

plot_tree(dtree_shallow, filled=True, rounded=True, ax=axes[0], feature_names=iris.feature_names,
          class_names=iris.target_names)
axes[0].set_title('Shallow Decision Tree')

plot_tree(dtree_deep, filled=True, rounded=True, ax=axes[1], feature_names=iris.feature_names,
          class_names=iris.target_names)
axes[1].set_title('Deep Decision Tree')
plt.show()


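# Print the importance score of each feature (indexed in the order of iris.feature_names)
def show_score(tree):
    importance = tree.feature_importances_
    for i, v in enumerate(importance):
        print('Feature: %0d, Score: %.5f' % (i, v))


print("Feature importance scores of the depth-2 tree:")
show_score(dtree_shallow)
print("Feature importance scores of the depth-4 tree:")
show_score(dtree_deep)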


def show_confusion_matrix(tree, title):
    # Predict the classes of the test samples
    y_pred = tree.predict(X_test)

    # Build the confusion matrix (rows: true class, columns: predicted class)
    cm = confusion_matrix(y_test, y_pred)
    df_cm = pd.DataFrame(cm)
    ax = sns.heatmap(df_cm, annot=True, cmap="Purples")
    ax.set_title(title)  # title
    ax.set_xlabel('predict target')  # x-axis label
    ax.set_ylabel('true target')  # y-axis label
    plt.show()


# Plot the confusion matrix of each of the two trees
show_confusion_matrix(dtree_shallow, 'Confusion Matrix of ShallowTree')
show_confusion_matrix(dtree_deep, 'Confusion Matrix of DeepTree')


def show_performance_measurement(tree):
    # Compute per-class precision, recall, and F1, plus the overall accuracy
    y_pred = tree.predict(X_test)
    precision = precision_score(y_test, y_pred, average=None)
    recall = recall_score(y_test, y_pred, average=None)
    f1 = f1_score(y_test, y_pred, average=None)
    accuracy = accuracy_score(y_test, y_pred)

    macro_precision = precision_score(y_test, y_pred, average='macro')
    macro_recall = recall_score(y_test, y_pred, average='macro')
    macro_f1 = f1_score(y_test, y_pred, average='macro')

    print(f'Precision: {precision}')
    print(f'Recall: {recall}')
    print(f'F1 score: {f1}')
    print(f'Accuracy: {accuracy}')
    print(f'Macro Precision: {macro_precision}')
    print(f'Macro Recall: {macro_recall}')
    print(f'Macro F1 score: {macro_f1}')


# Print the performance metrics of each of the two trees
print("Performance metrics of the depth-2 tree:")
show_performance_measurement(dtree_shallow)
print("Performance metrics of the depth-4 tree:")
show_performance_measurement(dtree_deep)

# Fit trees of both depths on the entire dataset
X = iris.data
y = iris.target

# Build the two decision tree models
tree1 = DecisionTreeClassifier(max_depth=2)
tree1.fit(X, y)
tree2 = DecisionTreeClassifier(max_depth=4)
tree2.fit(X, y)

# Visualize the two decision tree models
plt.figure(figsize=(15, 7))

plt.subplot(1, 2, 1)  # one row, two columns; draw the first tree in the first slot
plot_tree(tree1, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.title('Decision Tree with max depth 2')

plt.subplot(1, 2, 2)  # one row, two columns; draw the second tree in the second slot
plot_tree(tree2, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.title('Decision Tree with max depth 4')
plt.show()
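The script prints importance scores by feature index, which is hard to read without looking up the column order. As a hedged complement, the sketch below pairs each score with its feature name and reports the depth and leaf count each tree actually reaches. The helper `summarize_tree` and the seed `random_state=0` are illustrative assumptions and do not appear in the original code.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target


def summarize_tree(max_depth):
    """Fit a tree on the full dataset; return its depth, leaf count, and ranked importances."""
    # random_state=0 is an arbitrary seed (an assumption) that makes tie-breaking repeatable.
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    # Pair every importance score with its feature name, largest score first.
    ranked = sorted(
        zip(iris.feature_names, tree.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return tree.get_depth(), tree.get_n_leaves(), ranked


for depth_limit in (2, 4):
    depth, n_leaves, ranked = summarize_tree(depth_limit)
    print(f"max_depth={depth_limit}: actual depth={depth}, leaves={n_leaves}")
    for name, score in ranked:
        print(f"  {name}: {score:.5f}")
```

`max_depth` is only an upper bound, so the reported depth can be smaller than the limit, and the importance scores of a fitted tree sum to 1.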

Run output and figures:

Feature importance scores of the depth-2 tree:
Feature: 0, Score: 0.00000
Feature: 1, Score: 0.00000
Feature: 2, Score: 0.00000
Feature: 3, Score: 1.00000
Feature importance scores of the depth-4 tree:
Feature: 0, Score: 0.01875
Feature: 1, Score: 0.01875
Feature: 2, Score: 0.05648
Feature: 3, Score: 0.90602
Performance metrics of the depth-2 tree:
Precision: [1.  0.9 0.9]
Recall: [1.  0.9 0.9]
F1 score: [1.  0.9 0.9]
Accuracy: 0.9333333333333333
Macro Precision: 0.9333333333333332
Macro Recall: 0.9333333333333332
Macro F1 score: 0.9333333333333332
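Performance metrics of the depth-4 tree (in this recorded run the script evaluated dtree_shallow a second time here, so the values repeat those above):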
Precision: [1.  0.9 0.9]
Recall: [1.  0.9 0.9]
F1 score: [1.  0.9 0.9]
Accuracy: 0.9333333333333333
Macro Precision: 0.9333333333333332
Macro Recall: 0.9333333333333332
Macro F1 score: 0.9333333333333332
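The per-class values printed above can be recomputed by hand from a confusion matrix, which makes the definitions in step 4(2) concrete. The sketch below is an illustration only: the matrix `cm` is an assumption reconstructed from the printed metrics (10 test samples per class, one versicolor/virginica confusion in each direction), not a matrix taken from the original heatmaps.

```python
import numpy as np

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
# Assumed for illustration; it is consistent with the metrics printed above.
cm = np.array([
    [10, 0, 0],
    [0, 9, 1],
    [0, 1, 9],
])

tp = np.diag(cm)                   # correctly classified samples per class
precision = tp / cm.sum(axis=0)    # column sums = samples predicted as each class
recall = tp / cm.sum(axis=1)       # row sums = samples truly in each class
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()     # share of all samples on the diagonal

# Macro averages are unweighted means of the per-class values.
macro_precision = precision.mean()
macro_recall = recall.mean()
macro_f1 = f1.mean()

print("Precision:", precision)
print("Recall:", recall)
print("F1 score:", f1)
print("Accuracy:", accuracy)
print("Macro F1 score:", macro_f1)
```

These are the same quantities that `precision_score`, `recall_score`, and `f1_score` return with `average=None` and `average='macro'`.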
