This post follows this article:
https://developer.aliyun.com/article/753507
2021 MCM/ICM | 2 | Problem 3: decision tree classification model
1. Train the decision tree model
1.1 Import packages
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn import tree
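Next, load the Iris dataset. scikit-learn ships with the Iris dataset built in, so there is no need to download it from another website. The following Python code loads it:
1.2 Start training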
import pandas as pd
from sklearn.datasets import load_iris
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
X_train, X_test, Y_train, Y_test = train_test_split(df[data.feature_names], df['target'], random_state=0)
# Step 1: Import the model you want to use
# This was already imported earlier in the notebook so commenting out
#from sklearn.tree import DecisionTreeClassifier
# Step 2: Make an instance of the Model
clf = DecisionTreeClassifier(max_depth = 2,
random_state = 0)
# Step 3: Train the model on the data
clf.fit(X_train, Y_train)
# Step 4: Predict labels of unseen (test) data
# Not doing this step in the tutorial
# clf.predict(X_test)
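The tutorial skips Step 4. Below is a minimal sketch of it, assuming the same split as above (`train_test_split` with `random_state=0` and its default 25% test share). `accuracy_score` is an addition that the original does not use, and the data preparation is repeated so the block runs on its own:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data preparation as above, repeated so this block is self-contained
data = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, random_state=0
)

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_train, Y_train)

# Step 4: predict labels for the held-out rows and compare them with the truth
Y_pred = clf.predict(X_test)
acc = accuracy_score(Y_test, Y_pred)
print(f"test accuracy: {acc:.3f}")
# clf.score(X_test, Y_test) computes the same mean accuracy in one call
```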
2. Visualize the decision tree with Matplotlib
tree.plot_tree(clf);
Add the feature and class names:
fn=['sepal length (cm)','sepal width (cm)','petal length (cm)','petal width (cm)']
cn=['setosa', 'versicolor', 'virginica']
fig, axes = plt.subplots(nrows = 1,ncols = 1,figsize = (4,4), dpi=300)
tree.plot_tree(clf,
feature_names = fn,
class_names=cn,
filled = True);
fig.savefig('imagename.png')
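Instead of typing the names by hand, they can be read from the object returned by `load_iris()`: `data.feature_names` and `data.target_names` hold the same strings as `fn` and `cn` above. The sketch below also selects the non-interactive Agg backend (an assumption, useful when running as a plain script or on a machine without a display) and closes the figure after saving:

```python
import matplotlib
matplotlib.use("Agg")  # render to files only; no display window needed
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, random_state=0
)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, Y_train)

# Take the labels from the dataset itself rather than hard-coding them
fn = list(data.feature_names)
cn = [str(name) for name in data.target_names]

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(4, 4), dpi=300)
tree.plot_tree(clf, feature_names=fn, class_names=cn, filled=True, ax=ax)
fig.savefig("imagename.png")
plt.close(fig)  # release the figure once it is written to disk
```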
3. Visualize the decision tree with Graphviz
3.1 Export the decision tree model to a dot file
tree.export_graphviz(clf,
out_file="tree.dot",
feature_names = fn,
class_names=cn,
filled = True)
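`export_graphviz` can also return the DOT source as a string when `out_file=None`, which is a convenient way to inspect what gets written to `tree.dot`. A small sketch that repeats the model setup so it runs on its own:

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, random_state=0
)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, Y_train)

# out_file=None makes export_graphviz return the DOT text instead of writing a file
dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=list(data.feature_names),
    class_names=[str(name) for name in data.target_names],
    filled=True,
)
print(dot_data.splitlines()[0])  # opening line of the DOT graph definition
```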
3.2 Install graphviz with conda
conda install python-graphviz
3.3 Convert the exported dot file to an image file
(Preferably use absolute file paths.)
dot -Tpng /Download/tree.dot -o /Download/tree.png
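The same conversion can be driven from Python. Below is a minimal sketch using only the standard library, assuming the Graphviz `dot` executable is on PATH; `dot_to_png` is a hypothetical helper name, not part of any library. It resolves paths to absolute ones, in line with the advice above, and writes the PNG next to the dot file:

```python
import shutil
import subprocess
from pathlib import Path

from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def dot_to_png(dot_path):
    """Hypothetical helper: run `dot -Tpng <in> -o <out>` on absolute paths."""
    dot_file = Path(dot_path).resolve()
    png_file = dot_file.with_suffix(".png")
    dot_exe = shutil.which("dot")
    if dot_exe is None:
        raise FileNotFoundError("Graphviz 'dot' executable not found on PATH")
    # check=True raises CalledProcessError if dot exits with an error
    subprocess.run([dot_exe, "-Tpng", str(dot_file), "-o", str(png_file)], check=True)
    return png_file


# Produce a tree.dot to convert (same kind of model as in the steps above)
data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)
tree.export_graphviz(clf, out_file="tree.dot", feature_names=list(data.feature_names), filled=True)

if shutil.which("dot"):
    print("written:", dot_to_png("tree.dot"))
else:
    print("dot not found; install Graphviz first")
```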
The result is as follows:
4. Alternatively, you can visualize a single decision tree from a random forest
For details, see https://developer.aliyun.com/article/753507
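For reference, here is a minimal sketch of that idea (not the linked article's exact code): a fitted `RandomForestClassifier` keeps its individual trees in the `estimators_` attribute, and each one is an ordinary `DecisionTreeClassifier` that `tree.plot_tree` accepts. `n_estimators=10`, `max_depth=2` and the file name `rf_tree0.png` are arbitrary choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # file output only
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
rf = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=0)
rf.fit(data.data, data.target)

# Each fitted member of the forest is a DecisionTreeClassifier
first_tree = rf.estimators_[0]

fig, ax = plt.subplots(figsize=(4, 4), dpi=300)
tree.plot_tree(
    first_tree,
    feature_names=list(data.feature_names),
    class_names=[str(name) for name in data.target_names],
    filled=True,
    ax=ax,
)
fig.savefig("rf_tree0.png")
plt.close(fig)
```

Changing the index in `rf.estimators_[0]` selects a different tree; a loop over `rf.estimators_` would save one image per tree.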