Tuning Classification Tree Parameters

望月斩

Last modified: 2022-08-22 13:23:35

Tags: decision tree, machine learning, python

First published: 2022-08-22 13:21:25

Article link: https://blog.csdn.net/weixin_59959097/article/details/126463945

import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine

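# Parameter tuning
#   Why tune: to keep the decision tree from overfitting the training set. An overfit tree has adapted to the
#   noise in the training data, so it makes errors when it judges new data. Tuning therefore improves
#   prediction accuracy on data the tree has not seen.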

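# Parameters:
max_depth_ = 3  # maximum depth of the tree
min_samples_leafs_ = 10  # every child node produced by a split must contain at least N samples; a split that would
#                        violate this is not made, or the node is split in a way that satisfies the limit.
#                        Starting from 5 is recommended. A float is also accepted and is read as a fraction
#                        of the number of samples in the dataset
min_samples_split_ = 10  # a node must contain at least N samples to be split; a node with fewer is not split
max_features_ = 10  # number of features considered at each split; the sensible range depends on how many features the data has
min_impurity_decrease_ = 0.01  # minimum impurity decrease (information gain) a split must achieve. With entropy on
#                              three classes the impurity is at most log2(3), about 1.58, so a value of 10 would block every split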

# A good value cannot be picked in one shot while tuning; plotting the score against the parameter helps

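# Load the wine dataset and hold out 30% of it as a test set
wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, train_size=0.7, random_state=30
)

# Fit one tree per max_depth value from 1 to 20 and record its test accuracy
depths = range(1, 21)
scores = []
for depth in depths:
    clf = tree.DecisionTreeClassifier(criterion="entropy", splitter="best", max_depth=depth, random_state=20)
    clf = clf.fit(x_train, y_train)
    scores.append(clf.score(x_test, y_test))

# Plot test accuracy against max_depth to see where the curve levels off
plt.plot(depths, scores)
plt.xlabel("max_depth")
plt.ylabel("test accuracy")
plt.show()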
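
The curve above varies only max_depth. When several of the parameters listed earlier need tuning together, one option is a cross-validated grid search on the training split. The sketch below is not part of the original script: it swaps the plotting approach for scikit-learn's GridSearchCV, and the candidate values in param_grid are hypothetical, chosen only for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data split as in the script above
wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, train_size=0.7, random_state=30
)

# Hypothetical candidate values (an assumption, not taken from the original post)
param_grid = {
    "max_depth": [2, 3, 4, 5, 6],
    "min_samples_leaf": [1, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_impurity_decrease": [0.0, 0.01, 0.05],
}

# 5-fold cross-validation on the training set only, so the test set stays unseen
search = GridSearchCV(
    DecisionTreeClassifier(criterion="entropy", random_state=20),
    param_grid,
    cv=5,
)
search.fit(x_train, y_train)

# Best combination found, and the accuracy of the refitted tree on the held-out test set
best_params = search.best_params_
test_score = search.score(x_test, y_test)
print(best_params)
print(test_score)
```

Selecting parameters by cross-validation rather than by test-set score keeps the test set as an honest final check. Plotting one parameter at a time, as in the script above, is still useful for seeing how sensitive the score is to each one.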