信息论基础(熵 联合熵 条件熵 信息增益 基尼不纯度)
决策树简单的理解为if-then的集合,其优点主要有分类速度快、可读性等。
决策树的生成主要可分为三个步骤:特征的选择、决策树的生成、决策树的剪枝。
熵
联合熵
条件熵
信息增益
基尼不纯度
决策树的不同分类算法(原理及应用场景)
ID3算法
C4.5
CART分类树
回归树原理
决策树防止过拟合手段
模型评估
sklearn参数详解,Python绘制决策树
函数:sklearn.tree.DecisionTreeClassifier()
当不进行剪枝时:
#!/user/bin/env python
#-*- coding:utf-8 -*-
from sklearn.tree import export_graphviz
from sklearn.tree import DecisionTreeClassifier
#3:1拆分数据集
from sklearn.model_selection import train_test_split
#乳腺癌数据集
from sklearn.datasets import load_breast_cancer
import pydot
cancer = load_breast_cancer()
#参数random_state是指随机生成器,0表示函数输出是固定不变的
X_train,X_test,y_train,y_test = train_test_split(cancer['data'],cancer['target'],random_state=42)
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train,y_train)
print('Train score:{:.3f}'.format(tree.score(X_train,y_train)))
print('Test score:{:.3f}'.format(tree.score(X_test,y_test)))
#生成可视化图
export_graphviz(tree,out_file="tree.dot",class_names=['严重','轻微'],feature_names=cancer.feature_names,impurity=False,filled=True)
#展示可视化图
(graph,) = pydot.graph_from_dot_file('tree.dot')
graph.write_png('tree.png')
进行预剪枝时:
#!/user/bin/env python
#-*- coding:utf-8 -*-
from sklearn.tree import export_graphviz
from sklearn.tree import DecisionTreeClassifier
#3:1拆分数据集
from sklearn.model_selection import train_test_split
#乳腺癌数据集
from sklearn.datasets import load_breast_cancer
import pydot
cancer = load_breast_cancer()
#参数random_state是指随机生成器,0表示函数输出是固定不变的
X_train,X_test,y_train,y_test = train_test_split(cancer['data'],cancer['target'],random_state=42)
#设置深度为4,即产生4个问题就停止生长
tree = DecisionTreeClassifier(max_depth=4,random_state=0)
tree.fit(X_train,y_train)
print('Train score:{:.3f}'.format(tree.score(X_train,y_train)))
print('Test score:{:.3f}'.format(tree.score(X_test,y_test)))
#生成可视化图
export_graphviz(tree,out_file="tree.dot",class_names=['严重','轻微'],feature_names=cancer.feature_names,impurity=False,filled=True)
#展示可视化图
(graph,) = pydot.graph_from_dot_file('tree.dot')
graph.write_png('tree.png')
参考
西瓜书
cs229吴恩达机器学习课程
李航统计学习
谷歌搜索
公式推导参考:http://t.cn/EJ4F9Q0
任务地址