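Chapter 5, Thinking and Practice 2: split the dataset into training and test sets and evaluate the decision tree classifier; save the classifier from Example 5-3 to a file, then reload it to predict new data.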

Latest recommended article published 2024-03-30 16:29:35

Author: 是学数据分析的阿龙

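Column: Exercise solutions for Song Hui (宋晖), 《数据科学技术与应用》 (Data Science Technology and Applications), 2nd edition. Tags: decision tree, data analysis, python, machine learning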

Article link: https://blog.csdn.net/m0_51474486/article/details/124143862

P101, Thinking and Practice 2

1. Split the dataset into training and test sets and evaluate the performance of the decision tree classifier.

# 1
# Load the data and encode the text columns as numbers
import pandas as pd
from sklearn import metrics, model_selection, tree

data = pd.read_csv('data/bankdebt.csv', index_col=0, header=None)
data.loc[data[1] == 'Yes', 1] = 1
data.loc[data[1] == 'No', 1] = 0
data.loc[data[4] == 'Yes', 4] = 1
data.loc[data[4] == 'No', 4] = 0
data.loc[data[2] == 'Single', 2] = 1
data.loc[data[2] == 'Married', 2] = 2
data.loc[data[2] == 'Divorced', 2] = 3
# Split into training and test sets (70% / 30%)
x = data.iloc[:, 0:3].values.astype(float)
y = data.iloc[:, 3].values.astype(int)
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.3, random_state=1)
# Evaluate the decision tree classifier
# Train on the training set
clf = tree.DecisionTreeClassifier()
clf = clf.fit(x_train, y_train)
print('Training accuracy:', clf.score(x_train, y_train))
# Test on the test set and report performance
predicted_y = clf.predict(x_test)
print(metrics.classification_report(y_test, predicted_y))
print('Confusion matrix:')
print(metrics.confusion_matrix(y_test, predicted_y))
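
As a rough, self-contained illustration of why the score is checked on both sets, the sketch below compares training and test accuracy for an unconstrained tree and a depth-limited one. The data comes from make_classification and is purely hypothetical; it only stands in for bankdebt.csv, which is not reproduced here, and the depth limit of 3 is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: 300 samples, 3 features, with some label noise
X, y = make_classification(n_samples=300, n_features=3, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

scores = {}
for depth in (None, 3):  # None lets the tree grow until every leaf is pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=1)
    model.fit(X_train, y_train)
    scores[depth] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"max_depth={depth}: train={scores[depth][0]:.3f}, test={scores[depth][1]:.3f}")
```

A fully grown tree typically memorizes its training rows, so the training score alone says little; the test score is the number to compare across settings.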

Extension: drawing the decision tree (continues from the code above)

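Method 1: use the export_text function. Note that export_text was added in the scikit-learn (sklearn for short) 0.21 series; I used 0.21.3. On older versions the import fails.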

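Check the installed sklearn version (the same approach works for other libraries):
import sklearn
print(sklearn.__version__)
Install version 0.21.3 (other libraries can be pinned to a specific version the same way):
# Open Anaconda Prompt and enter:
pip install scikit-learn==0.21.3
After the install finishes, restart Jupyter Notebook and run the code again.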
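
The version check can also be done in code before importing export_text. This is only a minimal sketch: version_tuple and REQUIRED are hypothetical helper names, and the parsing is deliberately simple (it keeps the leading digits of the first three dot-separated fields).

```python
import sklearn

def version_tuple(version):
    """Turn a string such as '0.21.3' or '1.4.0rc1' into a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)

REQUIRED = (0, 21)  # export_text first appeared in the 0.21 series
installed = version_tuple(sklearn.__version__)
if installed >= REQUIRED:
    print("export_text should be available; installed version:", sklearn.__version__)
else:
    print("scikit-learn is too old for export_text:", sklearn.__version__)
```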

# [Extension] Display the fitted decision tree
# Method 1: use the export_text function
from sklearn.tree import export_text
fName = ['House', 'Marital', 'Income']
clfStruc = export_text(clf, feature_names=fName)
print(clfStruc)
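
export_text also takes a few formatting parameters. The sketch below tries max_depth, decimals, and show_weights on a tiny tree; the rows and labels are made up for illustration and only reuse the post's feature layout, and toy_x, toy_y, toy_clf are hypothetical names.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up rows in the same layout as the post: [House, Marital, Income]
toy_x = [[1, 1, 12.5], [0, 2, 10.0], [0, 1, 7.0], [1, 2, 12.0],
         [0, 3, 9.5], [0, 2, 6.0], [1, 3, 22.0], [0, 1, 8.5]]
toy_y = [1, 1, 0, 1, 0, 0, 1, 0]  # made-up labels

toy_clf = DecisionTreeClassifier(random_state=0).fit(toy_x, toy_y)

rules = export_text(
    toy_clf,
    feature_names=["House", "Marital", "Income"],
    max_depth=2,        # truncate deeper branches in the printout
    decimals=1,         # digits shown after the decimal point
    show_weights=True,  # show per-class sample counts at each leaf
)
print(rules)
```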

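Method 2: visualize the fitted tree with Graphviz. This needs both Graphviz and IPython. Graphviz is a standalone program, so pip alone cannot install it. IPython, an interactive Python environment widely used for scientific computing and visualization, installs directly with pip. The code further down also imports the graphviz Python package (from graphviz import Source); that wrapper is a separate pip package and does not include the Graphviz program itself.
# Open Anaconda Prompt and enter:
pip install ipython
pip install graphviz
Graphviz is open-source graph visualization software that represents structural information as diagrams of abstract graphs and networks. Download page: https://graphviz.org/download/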

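My computer runs 64-bit Windows; choose the download that matches your own machine:

After installing, the Graphviz directory still has to be added to the PATH environment variable, otherwise the code fails.

Running the following code adds it to PATH for the current Python session; 'E:/anaconda/Graphviz/bin/' is where I installed Graphviz.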
import os
os.environ["PATH"] += os.pathsep + 'E:/anaconda/Graphviz/bin/'
Once the environment variable is set, the tree can be drawn.
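
Before drawing, it can help to confirm that the Graphviz executable is actually reachable. The following is a minimal sketch using only the standard library: add_to_path is a hypothetical helper, and GRAPHVIZ_BIN is a placeholder directory that must be replaced with the real install location.

```python
import os
import shutil

def add_to_path(directory):
    """Append directory to PATH for this process only, skipping duplicates."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    if directory not in entries:
        os.environ["PATH"] = os.pathsep.join(entries + [directory])

# Placeholder location (an assumption); replace with your Graphviz bin directory
GRAPHVIZ_BIN = os.path.join(os.path.expanduser("~"), "Graphviz", "bin")
add_to_path(GRAPHVIZ_BIN)

dot_path = shutil.which("dot")  # None when the executable cannot be found
if dot_path is None:
    print("Graphviz 'dot' was not found on PATH; check the install directory.")
else:
    print("Graphviz 'dot' found at:", dot_path)
```

Because the helper skips directories that are already listed, rerunning the notebook cell does not keep lengthening PATH.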

# Method 2: visualize the fitted tree with Graphviz
# Build and display the decision tree diagram
featureName = ['House', 'Marital', 'Income']
className = ['Cheat', 'Not Cheat']  # listed in ascending label order: 0, then 1
# Build the graph
from graphviz import Source
graph = Source(tree.export_graphviz(clf, out_file=None, feature_names=featureName, class_names=className))
# Save it to a file and display it
png_bytes = graph.pipe(format='png')
with open('dectree.png', 'wb') as f:
    f.write(png_bytes)
from IPython.display import Image
Image(png_bytes)

Result: [figure: the rendered decision tree, also saved as dectree.png]
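
If installing Graphviz is not an option, sklearn.tree.plot_tree (available since scikit-learn 0.21) draws a similar diagram with matplotlib alone. This is an alternative to the post's approach, not part of it; the rows, labels, class names, and output file name below are all made up for the sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Made-up rows in the same layout as the post: [House, Marital, Income]
demo_x = [[1, 1, 12.5], [0, 2, 10.0], [0, 1, 7.0], [1, 2, 12.0],
          [0, 3, 9.5], [0, 2, 6.0], [1, 3, 22.0], [0, 1, 8.5]]
demo_y = [1, 1, 0, 1, 0, 0, 1, 0]  # made-up labels

demo_clf = DecisionTreeClassifier(random_state=0).fit(demo_x, demo_y)

fig, ax = plt.subplots(figsize=(6, 4))
plot_tree(demo_clf,
          feature_names=["House", "Marital", "Income"],
          class_names=["No", "Yes"],  # ascending label order: 0, then 1
          filled=True, ax=ax)
fig.savefig("dectree_matplotlib.png", dpi=150)
plt.close(fig)
```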

2. Save the classifier from Example 5-3 to a file, then reload it to predict new data.

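# 2
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package
import numpy as np
joblib.dump(clf, 'clf.pkl')          # save the model to disk
load_clf = joblib.load('clf.pkl')    # load the saved model
new_x = np.array([[0, 1, 7.5]])      # new sample (values I made up)
print('Can the debt be repaid:', np.where(load_clf.predict(new_x) == 0, 'No', 'Yes'))    # predict for the new sample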
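
A quick way to confirm that saving and reloading worked is to compare predictions before and after the round trip. The sketch below is self-contained and runs without the CSV: the training rows and labels are made up, and the model is written to a temporary directory rather than to clf.pkl.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up rows in the same layout as the post: [House, Marital, Income]
train_x = [[1, 1, 12.5], [0, 2, 10.0], [0, 1, 7.0], [1, 2, 12.0],
           [0, 3, 9.5], [0, 2, 6.0], [1, 3, 22.0], [0, 1, 8.5]]
train_y = [1, 1, 0, 1, 0, 0, 1, 0]  # made-up labels

model = DecisionTreeClassifier(random_state=0).fit(train_x, train_y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "clf.pkl")
    joblib.dump(model, model_path)     # write the fitted model to disk
    restored = joblib.load(model_path)  # read it back into a new object

new_x = np.array([[0, 1, 7.5]])
same = np.array_equal(model.predict(new_x), restored.predict(new_x))
print("Predictions match after reload:", same)
```

Files written by joblib.dump are pickle-based, so only load files from a source you trust, and ideally with the same scikit-learn version that saved them.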