1. Inspect the table and import the libraries
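import pandas as pd
from sklearn import tree
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
# '文件名字.xlsx' ("file name") is a placeholder: replace it with the path to your workbook
data = pd.read_excel('文件名字.xlsx')
data.head()  # preview the first five rows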
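Because the next step selects the features by position (`data.iloc[:, 1:13]`), it helps to print each column's index before slicing and to check the class balance of the target. A minimal sketch, assuming a small hypothetical DataFrame in place of the Excel sheet (the columns `客户ID`, `年龄`, `收入` and all values are invented for illustration; only `购买意愿` comes from the original):

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: an ID column, two features and
# the target column '购买意愿' (purchase intention). Replace with your own data.
data = pd.DataFrame({
    "客户ID": [1, 2, 3, 4],
    "年龄": [25, 40, 33, 51],
    "收入": [3, 8, 5, 9],
    "购买意愿": [1, 0, 1, 0],
})

# Print the position and name of every column to check the iloc slice.
for pos, name in enumerate(data.columns):
    print(pos, name)

# Features: every column after the first (ID) one, excluding the target.
feature_cols = [c for c in data.columns[1:] if c != "购买意愿"]
print(feature_cols)

# Class balance of the target; a strongly skewed target makes accuracy less informative.
print(data["购买意愿"].value_counts())
```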
2. Split the data into a training set and a test set (9:1)
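from sklearn.model_selection import train_test_split
# Features: columns 1-12 (column 0 is skipped, presumably an ID column; adjust the slice to your sheet)
feature_vector = data.iloc[:, 1:13]
# Target: the purchase-intention column '购买意愿'
target_variable = data['购买意愿']
# test_size=0.1 holds out 10% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(feature_vector, target_variable, test_size=0.1, random_state=3)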
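`train_test_split` with `test_size=0.1` reserves 10% of the rows for testing and keeps the other 90% for training. A sketch on hypothetical random data (100 rows, invented column names `f1`..`f3`) showing the resulting sizes; the `stratify=y` argument is not in the original and is shown only as an optional way to keep the class ratio the same in both parts:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 3 integer features, binary target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 10, size=(100, 3)), columns=["f1", "f2", "f3"])
y = pd.Series(rng.integers(0, 2, size=100), name="target")

# test_size=0.1 gives the 9:1 split; random_state fixes the shuffle;
# stratify=y (optional, not in the original) preserves the class proportions.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=3, stratify=y
)
print(len(x_train), len(x_test))  # 90 10
```

Because `random_state` is fixed, rerunning the split returns the same rows, so the test score is comparable across runs.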
3. Build, train and evaluate the model
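# Decision tree that chooses splits by information gain (entropy), limited to depth 7
clf = tree.DecisionTreeClassifier(criterion='entropy', max_depth=7, random_state=3)
clf = clf.fit(x_train, y_train)
clf.score(x_test, y_test)  # mean accuracy on the test set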
Result: 0.875, i.e. 87.5% of the test samples are classified correctly.
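With `criterion='entropy'`, each split is chosen to maximise information gain: the drop in Shannon entropy H = -Σ p_k log2 p_k from the parent node to its size-weighted children. The helpers below are a hand-written illustration of that quantity on made-up labels, not scikit-learn's internal code:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, mask):
    """Entropy reduction when labels is split into labels[mask] and labels[~mask]."""
    left, right = labels[mask], labels[~mask]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted

# Hypothetical labels and one numeric feature.
y = np.array([1, 1, 1, 0, 0, 0, 1, 0])
feature = np.array([5, 7, 6, 1, 2, 3, 8, 4])

print(entropy(y))                          # 1.0: the classes are 50/50
print(information_gain(y, feature > 4.5))  # 1.0: this threshold separates the classes perfectly
print(information_gain(y, feature > 7.5))  # much smaller: only one sample is split off
```

The tree evaluates candidate thresholds like these for every feature and keeps the one with the largest gain; `max_depth=7` then caps how many times this can be repeated along any path.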
4. Evaluate the model with a confusion matrix
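import warnings
from sklearn.metrics import ConfusionMatrixDisplay
warnings.filterwarnings('ignore')  # silence library warnings in the output
y_pred = clf.predict(x_test)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=clf.classes_).plot()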
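The diagonal of the confusion matrix holds the correct predictions (87.5% of the test set, matching the accuracy above), so the model can reasonably serve as a basis for decision-making.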
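The confusion matrix also yields the other usual metrics. A sketch on hypothetical labels (not the article's results) showing that accuracy is the diagonal divided by the total, which is what `clf.score` returns, and how precision and recall for class 1 come from the same four cells:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels for 8 test samples.
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0, 0, 0])

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
tn, fp, fn, tp = cm.ravel()            # cell order for binary labels 0/1

accuracy = (tn + tp) / cm.sum()  # correct predictions (diagonal) / all predictions
precision = tp / (tp + fp)       # share of predicted 1s that really are 1
recall = tp / (tp + fn)          # share of true 1s that were found

print(cm)  # [[3 1]
           #  [2 2]]
print(accuracy, precision, recall)
```

Checking precision and recall as well as accuracy matters when one class is much rarer than the other, because a high accuracy can then hide a class that is almost never predicted.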
5. Visualize the decision tree
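import matplotlib
matplotlib.rcParams['font.sans-serif'] = ['SimHei']  # font that can render the Chinese labels
matplotlib.rcParams['axes.unicode_minus'] = False  # keep minus signs displayable with SimHei
# Feature names must come from the training columns so that name i matches feature i
# (data.columns also contains column 0 and the target, which would shift every name)
feature_names = x_train.columns.tolist()
# class_names follow the order of clf.classes_ (0, then 1): 愿意 = willing, 不愿意 = unwilling;
# check that this matches how 购买意愿 is coded in your data
class_names = ['愿意', '不愿意']
plt.figure(dpi=80, figsize=(25, 9))
tree.plot_tree(clf, feature_names=feature_names, class_names=class_names, impurity=False, fontsize=10)
plt.show()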
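When the plotted tree is too wide to read, `sklearn.tree.export_text` prints the same split rules as indented text. A sketch that uses the Iris dataset as a stand-in for the purchase data (an assumption, since the original workbook is not available), taking the feature names from the training DataFrame's own columns:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Iris is a substitute for the purchase-intention data.
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=3)
clf.fit(X, y)

# Names taken from the same DataFrame the model was trained on,
# so feature_names[i] describes feature i.
feature_names = X.columns.tolist()
report = export_text(clf, feature_names=feature_names)
print(report)
```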
6. Predict new customers
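# Sheet '预测客户数据' ("customers to predict") holds the new customers
forecast = pd.read_excel('文件名字.xlsx', sheet_name='预测客户数据')
# Columns 1 onward must be the same 12 features, in the same order, as in training
clf.predict(forecast.iloc[:, 1:])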
Output: array([0, 0, 1, 0, 0], dtype=int64), i.e. the third of the five customers is predicted as class 1 and the other four as class 0.
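Two things make the prediction step more robust: selecting the forecast columns by the training column names instead of by position, and looking at `predict_proba` as well as the hard labels. A sketch on small hypothetical data (`年龄`, `收入`, `客户ID` and all values are invented), reusing the original `class_names` order, where label 0 maps to 愿意; check that mapping against your own coding of 购买意愿:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: two features and a 0/1 label.
x_train = pd.DataFrame({"年龄": [22, 35, 47, 52, 29, 61],
                        "收入": [3, 6, 8, 9, 4, 7]})
y_train = pd.Series([0, 0, 1, 1, 0, 1])
clf = DecisionTreeClassifier(criterion="entropy", random_state=3).fit(x_train, y_train)

# Hypothetical new customers: an extra ID column and a different column order.
forecast = pd.DataFrame({"客户ID": [101, 102],
                         "收入": [5, 9],
                         "年龄": [30, 58]})

# Select exactly the training columns, in training order, before predicting.
new_x = forecast[x_train.columns]
labels = clf.predict(new_x)
proba = clf.predict_proba(new_x)  # one column per class, in clf.classes_ order

# Same convention as plot_tree: class_names[i] names clf.classes_[i].
class_names = ["愿意", "不愿意"]
label_to_name = dict(zip(clf.classes_, class_names))
names = [label_to_name[label] for label in labels]
print(labels, names)
print(proba)
```

Selecting by name means an extra ID column or a reordered sheet cannot silently feed the wrong values into the model.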