Classification:
from sklearn.datasets import load_iris
iris=load_iris()
# print(iris)
# print(len(iris["data"]))  # 150 samples
from sklearn.model_selection import train_test_split
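# Split the data into a training set and a test set. test_size=0.2 puts 20% of the data in the test set, i.e. 30 of the 150 samples. random_state=1 fixes the random seed, so the same 30 samples are picked on every run.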
train_data,test_data,train_target,test_target=train_test_split(iris.data,iris.target,test_size=0.2,random_state=1)
# 2. Build the model
from sklearn import tree
# Create the classifier
clf=tree.DecisionTreeClassifier(criterion="entropy")
clf.fit(train_data,train_target)  # train on the training set; in general, fit builds the model and predict makes predictions
# Predict on the test set
y_pred=clf.predict(test_data)
print(test_target)
print(y_pred)
import numpy as np
print(np.mean(y_pred==test_target))
Output:
[0 1 1 0 2 1 2 0 0 2 1 0 2 1 1 0 1 1 0 0 1 1 1 0 2 1 0 0 1 2]
[0 1 1 0 2 1 2 0 0 2 1 0 2 1 1 0 1 1 0 0 1 1 2 0 2 1 0 0 1 2]
0.9666666666666667
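The accuracy above is computed by hand with np.mean. As a minimal sketch, the same number can be obtained with sklearn.metrics.accuracy_score, and confusion_matrix shows which classes get mixed up. The random_state=0 on the tree is an assumption added here for repeatable runs; it is not in the original code, so the exact numbers may differ from the output above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# Same split as above: 120 training samples, 30 test samples
train_data, test_data, train_target, test_target = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1
)

# random_state=0 is an assumption for reproducibility, not part of the original
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(train_data, train_target)
y_pred = clf.predict(test_data)

# Fraction of test samples predicted correctly
acc = accuracy_score(test_target, y_pred)
# Rows are true classes, columns are predicted classes
cm = confusion_matrix(test_target, y_pred)

print(acc)
print(cm)
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the number of test samples equals the accuracy.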
The parameter tuning material below is quoted from someone else's blog. Original post: https://blog.csdn.net/qq_38923076/article/details/82931340
The decision tree classifier in sklearn has 13 parameters, as follows (this is the signature quoted in that post; newer sklearn releases no longer have min_impurity_split or presort):
class sklearn.tree.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False)
The English documentation is here: decision tree parameters
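One common way to tune these parameters is a cross-validated grid search. The sketch below uses sklearn's GridSearchCV over a few of the parameters listed above. The candidate values in param_grid, the 5-fold setting, and random_state=0 are assumptions chosen for illustration; they do not come from the quoted post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
train_data, test_data, train_target, test_target = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1
)

# Candidate values are illustrative assumptions, not recommendations
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 4, None],
    "min_samples_leaf": [1, 2, 5],
}

# 5-fold cross-validation on the training set for every combination
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(train_data, train_target)

print(search.best_params_)   # best combination found on the training folds
print(search.best_score_)    # mean cross-validated accuracy of that combination
test_score = search.score(test_data, test_target)  # accuracy on the held-out test set
print(test_score)
```

The test set is only used once at the end, so the reported test accuracy is not influenced by the parameter search.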
Regression:
# 1. Decision tree regression
from sklearn import tree
model_decision_tree_regression = tree.DecisionTreeRegressor()
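The snippet above only creates the regressor. As a rough sketch of the full fit/predict cycle, the example below trains it on a noisy sine curve. The synthetic data, max_depth=3, and random_state=0 are assumptions made for illustration and are not part of the original.

```python
import numpy as np
from sklearn import tree

# Synthetic data (an assumption for this sketch): 80 points on a noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# max_depth=3 keeps the tree small so it does not simply memorize the noise
model_decision_tree_regression = tree.DecisionTreeRegressor(max_depth=3, random_state=0)
model_decision_tree_regression.fit(X, y)

# Predict at a few new points; each prediction is the mean target of one leaf
X_new = np.array([[0.5], [2.5], [4.5]])
y_new = model_decision_tree_regression.predict(X_new)
print(y_new)

# R^2 on the training data
r2 = model_decision_tree_regression.score(X, y)
print(r2)
```

As with the classifier, fit builds the model and predict returns the predictions; the difference is that the regressor outputs continuous values and score reports R^2 instead of accuracy.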
The parameter descriptions below are reposted from https://blog.csdn.net/wl2858623940/article/details/80448875