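Graphviz
1. Download
Download link: http://www.graphviz.org/Download_windows.php
2. Install
Run the installer and click Next through each step.
3. Configure the environment
Add the bin directory under the Graphviz installation path to the system PATH environment variable.
4. Verify the installation
Open a cmd command prompt
and enter dot -version (dot -V also prints the version)
If version information is printed, the configuration succeeded.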
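The same check can be done from Python, which is handy before running a script that depends on Graphviz. This is a minimal sketch using only the standard library; the helper name graphviz_version is made up for illustration, and it assumes that dot -V writes its version line to stdout or stderr (both are captured).

```python
import shutil
import subprocess


def graphviz_version():
    """Return the text printed by `dot -V`, or None if dot is not on PATH."""
    dot_path = shutil.which("dot")
    if dot_path is None:
        return None
    # Capture both streams, since the version line may go to either one.
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    return (result.stdout + result.stderr).strip()


version = graphviz_version()
if version is None:
    print("dot not found: check that the Graphviz bin directory is on PATH")
else:
    print(version)
```

If this prints the "not found" message right after editing PATH, opening a new command prompt usually helps, because already-open windows keep the old environment.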
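Python
Generating tree.dot from Python
1. Put 1.txt in the Python project directory
Contents of 1.txt (each line holds two numeric features followed by a label):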
1.5 50 thin
1.5 60 fat
1.6 40 thin
1.6 60 fat
1.7 60 thin
1.7 80 fat
1.8 60 thin
1.8 90 fat
1.9 70 thin
1.9 80 fat
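To make the file format concrete, here is a small sketch of how one such line breaks down into a feature list and a label. The function name parse_samples is hypothetical, and the sample lines are copied from the data above; what the two numeric columns mean is not stated in the file, so the code treats them simply as two features.

```python
def parse_samples(lines):
    """Split each line into numeric features and a trailing text label."""
    features, labels = [], []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        features.append([float(tk) for tk in tokens[:-1]])
        labels.append(tokens[-1])
    return features, labels


# First three rows of 1.txt
sample_lines = ["1.5 50 thin", "1.5 60 fat", "1.6 40 thin"]
features, labels = parse_samples(sample_lines)
print(features)
print(labels)
```

Using split() without an argument tolerates repeated spaces and trailing newlines, which a plain split(' ') does not.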
2. Decision tree code, which generates a new file, tree.dot
# -*- coding: utf-8 -*-
import numpy as np
from sklearn import tree
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import classification_report
# train_test_split lives in sklearn.model_selection;
# the old sklearn.cross_validation module was removed in scikit-learn 0.20.
from sklearn.model_selection import train_test_split

# Read the data: the last token on each line is the label, the rest are features
data = []
labels = []
with open("1.txt") as ifile:
    for line in ifile:
        tokens = line.strip().split(' ')
        data.append([float(tk) for tk in tokens[:-1]])
        labels.append(tokens[-1])
x = np.array(data)
labels = np.array(labels)
y = np.zeros(labels.shape)

# Convert the labels to 0/1 (thin = 0, fat = 1)
y[labels == 'fat'] = 1

# Split into training data and test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Train the decision tree, using information entropy as the split criterion
clf = tree.DecisionTreeClassifier(criterion='entropy')
print(clf)
clf.fit(x_train, y_train)

# Write the tree structure to tree.dot
with open("tree.dot", 'w') as f:
    tree.export_graphviz(clf, out_file=f)

# Feature importances: the larger the value, the more that feature contributes to the classification
print(clf.feature_importances_)

# Print the predictions on the training data and the training accuracy
answer = clf.predict(x_train)
print(x_train)
print(answer)
print(y_train)
print(np.mean(answer == y_train))

# Precision and recall
precision, recall, thresholds = precision_recall_curve(y_train, clf.predict(x_train))
# classification_report expects class labels, so use predict rather than predict_proba
answer = clf.predict(x)
print(classification_report(y, answer, target_names=['thin', 'fat']))
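By default export_graphviz labels features by index (X[0], X[1]) and does not name the classes. The sketch below shows one way to get more readable node labels using the feature_names and class_names parameters. The names "height" and "weight" are assumptions about what the two columns of 1.txt represent, the output file name tree_named.dot is made up, and the tree here is trained on all ten rows with a fixed random_state, so it will not necessarily match the tree produced by the script above.

```python
from sklearn import tree

# Rows copied from 1.txt; the column meanings (height, weight) are assumed.
X = [[1.5, 50], [1.5, 60], [1.6, 40], [1.6, 60], [1.7, 60],
     [1.7, 80], [1.8, 60], [1.8, 90], [1.9, 70], [1.9, 80]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # 0 = thin, 1 = fat

clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# With out_file=None the DOT source is returned as a string.
dot_text = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=["height", "weight"],  # assumed column names
    class_names=["thin", "fat"],         # listed in ascending order of the class values 0, 1
    filled=True,                         # colour nodes by majority class
)

with open("tree_named.dot", "w") as f:
    f.write(dot_text)
print(dot_text.splitlines()[0])
```

The resulting file can be rendered with dot in the same way as tree.dot.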
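Contents of the generated tree.dot (the train/test split is random, so the exact tree can differ between runs):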
digraph Tree {
node [shape=box] ;
0 [label="X[1] <= 75.0\nentropy = 1.0\nsamples = 8\nvalue = [4, 4]"] ;
1 [label="X[0] <= 1.65\nentropy = 0.9183\nsamples = 6\nvalue = [4, 2]"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label="X[1] <= 50.0\nentropy = 0.9183\nsamples = 3\nvalue = [1, 2]"] ;
1 -> 2 ;
3 [label="entropy = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
2 -> 3 ;
4 [label="entropy = 0.0\nsamples = 2\nvalue = [0, 2]"] ;
2 -> 4 ;
5 [label="entropy = 0.0\nsamples = 3\nvalue = [3, 0]"] ;
1 -> 5 ;
6 [label="entropy = 0.0\nsamples = 2\nvalue = [0, 2]"] ;
0 -> 6 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
}
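The entropy figures in the node labels can be checked by hand from the value = [thin, fat] counts. The sketch below computes the Shannon entropy (base 2) of a list of class counts; the function name entropy is just an illustrative helper, not part of scikit-learn.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:  # an empty class contributes nothing
            p = c / total
            result -= p * math.log2(p)
    return result


print(round(entropy([4, 4]), 4))  # node 0: value = [4, 4]
print(round(entropy([4, 2]), 4))  # node 1: value = [4, 2]
print(round(entropy([1, 0]), 4))  # node 3: a pure leaf
```

A 50/50 node gives 1.0, a 4-to-2 node gives about 0.9183, and a pure leaf gives 0.0, matching the labels of nodes 0, 1 and 3 in the listing.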
3. In the command line, cd to the directory containing tree.dot
Enter dot -Tpdf tree.dot -o tree.pdf or dot -Tpng tree.dot -o tree.png to generate a PDF file or a PNG image, respectively.
4. The tree.pdf file is generated
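The rendering step can also be run from the Python script itself, so the image is produced right after tree.dot is written. This is a rough sketch using only the standard library; build_dot_command and render are hypothetical helper names, and the sketch assumes dot is on PATH as configured earlier. It simply skips rendering when dot or the input file is missing.

```python
import shutil
import subprocess
from pathlib import Path


def build_dot_command(dot_file, fmt):
    """Build the same command line as `dot -T<fmt> tree.dot -o tree.<fmt>`."""
    out_file = str(Path(dot_file).with_suffix("." + fmt))
    return ["dot", "-T" + fmt, str(dot_file), "-o", out_file]


def render(dot_file, fmt="pdf"):
    """Render dot_file and return the output path, or None if it cannot run."""
    if shutil.which("dot") is None or not Path(dot_file).exists():
        return None
    cmd = build_dot_command(dot_file, fmt)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if dot fails
    return cmd[-1]


print(build_dot_command("tree.dot", "pdf"))
print(render("tree.dot", "png"))
```

Passing the command as a list rather than a single string avoids quoting problems when the path contains spaces.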