【Machine Learning】A Simple Applied Example: A Decision Tree, Visualized with graphviz

Import libraries

import pandas as pd
import numpy as np
import graphviz  # Python wrapper; the Graphviz binaries must also be installed to render
from sklearn import tree
from sklearn.model_selection import train_test_split

Load the data

df_t=pd.read_excel(r'D:\EdgeDownloadPlace\3dd40612152202ee8440f82a3d277008\train.xlsx')
df_t=df_t.drop(columns='uid')
df_t

Replace the '?' placeholders with each column's mode (most frequent value)

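for col in df_t.columns:
    # value_counts() is sorted by frequency; skip '?' if it is itself the most common value
    counts = df_t[col].value_counts()
    mode = counts.index[0] if counts.index[0] != '?' else counts.index[1]
    # .loc avoids chained assignment, which may silently write to a copy
    df_t.loc[df_t[col] == '?', col] = mode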
df_t
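The replacement loop above can be checked on a tiny hand-made frame before running it on the real file. This is a minimal sketch: the helper `fill_question_marks` and the columns `a` and `b` are hypothetical, and unlike the loop above it simply skips a column that contains nothing but '?'.

```python
import pandas as pd

def fill_question_marks(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where '?' in each column is replaced by that column's most frequent non-'?' value."""
    out = df.copy()
    for col in out.columns:
        counts = out[col].value_counts()      # sorted by frequency, descending
        non_q = counts[counts.index != '?']   # ignore the placeholder itself
        if non_q.empty:
            continue                          # nothing to impute from
        out.loc[out[col] == '?', col] = non_q.index[0]
    return out

# Hypothetical toy data: column 'b' has '?' as its most common value
demo = pd.DataFrame({'a': [1, 1, '?', 2], 'b': ['?', '?', '?', 3]})
filled = fill_question_marks(demo)
print(filled)
```

Column `b` shows why the original loop checks whether the top entry is '?': there the placeholder outnumbers the real values, so the second entry of `value_counts()` has to be used instead.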

Convert to a float32 matrix

arr_t=df_t.values.astype(np.float32)
arr_t

array([[61., 0., 2., …, 0., 7., 0.],
[64., 1., 3., …, 0., 7., 1.],
[40., 0., 4., …, 0., 6., 1.],
…,
[65., 0., 3., …, 1., 3., 0.],
[63., 1., 4., …, 0., 7., 0.],
[55., 0., 4., …, 1., 7., 1.]], dtype=float32)
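The split below passes `arr_t[:,:-1]` as features and `arr_t[:,-1]` as labels, which assumes the label is the last column. A small sketch of that slicing on an invented 3×4 array (the values are made up for illustration):

```python
import numpy as np

# Invented rows: three feature columns followed by a 0/1 label column
arr = np.array([[1, 2, 3, 0],
                [4, 5, 6, 1],
                [7, 8, 9, 1]], dtype=np.float32)

X = arr[:, :-1]   # every column except the last -> feature matrix
y = arr[:, -1]    # last column only -> 1-D label vector

print(X.shape, y.shape)  # (3, 3) (3,)
```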
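Split the data into a training set and a test set (the imported file is entirely training data, so the test set is carved out of it)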

Xtrain,Xtest,Ytrain,Ytest = train_test_split(arr_t[:,:-1],arr_t[:,-1],test_size=0.3)
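With `test_size=0.3`, scikit-learn rounds the number of test samples up and gives the remainder to training. A quick sketch on 10 synthetic rows; `random_state=0` is an addition here to make the shuffle repeatable. The original call does not set it, so its split, and therefore its score, will differ from run to run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20, dtype=np.float32).reshape(10, 2)  # 10 synthetic samples, 2 features
y = np.array([0, 1] * 5, dtype=np.float32)          # alternating synthetic labels

Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=0)
print(len(Xtrain), len(Xtest))  # 7 3
```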

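Instantiate the decision tree, train the model, and check its accuracy on the test set (`score` returns the mean accuracy)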

dtc = tree.DecisionTreeClassifier(criterion="entropy"
                                 ,max_depth=4
                                 ,min_samples_split=10).fit(Xtrain,Ytrain)
score = dtc.score(Xtest,Ytest)
score

0.8140703517587939
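`criterion="entropy"` tells the tree to choose the splits that most reduce Shannon entropy, H = -Σ p_k log2 p_k (information gain). The following is a hand-rolled sketch of that quantity for a single binary split, not scikit-learn's internal code; `entropy` and `information_gain` are illustrative helpers.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two child nodes."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = [0, 0, 0, 1, 1, 1]
print(entropy(parent))                                  # 1.0
print(information_gain(parent, [0, 0, 0], [1, 1, 1]))  # 1.0 (perfect split)
print(information_gain(parent, [0, 0, 1], [0, 1, 1]))  # ~0.082 (weak split)
```

`max_depth=4` and `min_samples_split=10` then cap how far this greedy search can go: no node deeper than 4, and no node with fewer than 10 samples is split further.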

Plot the tree

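graph_tree = graphviz.Source(tree.export_graphviz(dtc
                                 ,feature_names = df_t.keys()[:-1]
                                 # class_names are matched to dtc.classes_ in ascending order, so
                                 # '患病' (diseased) labels class 0.0 and '不患病' (not diseased) labels 1.0
                                 ,class_names = ['患病','不患病']
                                 ,filled = True
                                 ,rounded = True))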
graph_tree
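If the Graphviz binaries are not installed (the Python `graphviz` package only wraps them), scikit-learn's `tree.export_text` prints the same splits as plain text. A sketch on the bundled iris dataset, which stands in for the original Excel file; `max_depth=2` is chosen here only to keep the output short.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Plain-text rendering: one line per node, indented by depth
report = tree.export_text(clf, feature_names=list(iris.feature_names))
print(report)
print("depth:", clf.get_depth())  # never exceeds max_depth
```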

