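Machine Learning Decision Tree Homework
Assignment 1: Represent the weather dataset with one-hot encoding and draw the decision tree.
Assignment 2:
Given the following dataset of actual and predicted values: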
data = {'y_Actual': ['Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'No'],
'y_Predicted': ['Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'No'] }
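1) Compute recall, precision, accuracy, and the F1 score by hand (study the F1 score on your own first) (important, may appear on the exam)
2) Verify the results above programmatically with sklearn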
Assignment 1: Represent the weather dataset with one-hot encoding and draw the decision tree.
One-hot encoding the weather dataset
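# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier
# Load the data
df = pd.read_csv('playornot.csv')
# Split into features (all columns but the last) and label (last column)
X, y = df.iloc[:, :-1], df.iloc[:, -1]
# Instantiate the encoder
enc = preprocessing.OneHotEncoder()
# Fit the encoder on the features (the encoder does not use the labels)
enc.fit(X)
# Transform the features and store the one-hot encoding as a dense array
df1 = enc.transform(X).toarray()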
[Output: the one-hot encoded array df1]
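Drawing the decision tree
# Build the model
cls = DecisionTreeClassifier()
# Train the model
cls.fit(df1, y)
# Tree labels: one-hot column names for internal nodes, class labels for leaves
# (get_feature_names_out requires scikit-learn 1.0 or later; class names must follow the order of cls.classes_)
feature_name, class_name = list(enc.get_feature_names_out()), [str(c) for c in cls.classes_]
# Use a font that can render Chinese characters
plt.rcParams['font.sans-serif'] = ['SimHei']
# Create a 1x1 figure, 10 inches wide and 10 inches tall
fig, ax = plt.subplots(1, 1, figsize = (10,10))
# Plot the decision tree
tree.plot_tree(cls, feature_names = feature_name, class_names = class_name, filled = True)
plt.show()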
[Output: the decision tree plot]
Assignment 2:
Given the following dataset of actual and predicted values:
data = {'y_Actual': ['Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'No'],
'y_Predicted': ['Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'No'] }
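Part 1 asks for a hand calculation. As a worked sketch, assume 'Yes' is the positive class. Going through the 12 (actual, predicted) pairs gives TP = 4 (Yes predicted as Yes), FP = 2 (No predicted as Yes), FN = 1 (Yes predicted as No), and TN = 5 (No predicted as No). The four scores then follow from the standard definitions:

```latex
\begin{align*}
\text{Precision} &= \frac{TP}{TP + FP} = \frac{4}{4 + 2} = \frac{2}{3} \approx 0.667 \\
\text{Recall}    &= \frac{TP}{TP + FN} = \frac{4}{4 + 1} = 0.8 \\
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{4 + 5}{12} = 0.75 \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN} = \frac{8}{11} \approx 0.727
\end{align*}
```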
from sklearn import metrics
# Encode the labels as Yes = 1 (positive class), No = 0
y_Predicted = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]
y_Actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
# sklearn metrics take (y_true, y_pred) in that order; swapping them exchanges precision and recall
print("precision_score =", metrics.precision_score(y_Actual, y_Predicted))
print("recall_score =", metrics.recall_score(y_Actual, y_Predicted))
print("accuracy_score =", metrics.accuracy_score(y_Actual, y_Predicted))
print("f1_score =", metrics.f1_score(y_Actual, y_Predicted))
[Result: the printed precision, recall, accuracy, and F1 values]
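As a cross-check between a manual count and sklearn, the sketch below works directly on the original 'Yes'/'No' labels rather than a 0/1 encoding. It reads TN, FP, FN, and TP off `confusion_matrix`, computes the four scores from those counts, and compares them with sklearn's own functions. Treating 'Yes' as the positive class is an assumption, and the lowercase variable names are chosen here for illustration.

```python
from sklearn import metrics

y_actual = ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No"]
y_predicted = ["Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "No"]

# labels=["No", "Yes"] fixes the row/column order, so ravel() yields TN, FP, FN, TP.
cm = metrics.confusion_matrix(y_actual, y_predicted, labels=["No", "Yes"])
tn, fp, fn, tp = cm.ravel()

# Scores computed from the counts.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * precision * recall / (precision + recall)

# sklearn's versions: arguments are (y_true, y_pred); pos_label names the positive class.
sk_precision = metrics.precision_score(y_actual, y_predicted, pos_label="Yes")
sk_recall = metrics.recall_score(y_actual, y_predicted, pos_label="Yes")
sk_accuracy = metrics.accuracy_score(y_actual, y_predicted)
sk_f1 = metrics.f1_score(y_actual, y_predicted, pos_label="Yes")

print("TN, FP, FN, TP =", tn, fp, fn, tp)
print("precision:", precision, sk_precision)
print("recall:   ", recall, sk_recall)
print("accuracy: ", accuracy, sk_accuracy)
print("f1:       ", f1, sk_f1)
```

Passing the true labels first matters: accuracy and F1 are unchanged if the two arguments are swapped, but precision and recall trade places.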