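Q: For the dataset, build decision trees with sklearn's DecisionTreeClassifier, once using information gain and once using the Gini index as the splitting criterion.
Code: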
from matplotlib import pyplot as plt
# Features
a1 = [1,1,1,1,1,0,0,0,1,1]
a2 = [0,1,1,0,1,0,0,0,1,0]
# Build the feature matrix: one [a1, a2] row per sample
X = []
for i in range(len(a1)):
    x = [a1[i], a2[i]]
    X.append(x)
# Class labels
Y = [1,1,1,0,1,0,0,0,0,0]
from sklearn import tree
# Information gain: uncomment the next line and comment out the Gini line
# clf = tree.DecisionTreeClassifier(criterion='entropy')
# Gini index
clf = tree.DecisionTreeClassifier(criterion='gini')
clf = clf.fit(X,Y)
tree.plot_tree(clf,filled=True)
plt.title("Decision tree trained on all the features")
plt.show()
# Print the decision tree as text
r = tree.export_text(clf)
print(r)
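The listing above trains one criterion at a time. As a minimal sketch, assuming the same data, both criteria can be fitted in a single run. The helper name `fit_both`, the `random_state=0` setting, and the `feature_names` labels are illustrative choices that are not part of the original code.

```python
from sklearn import tree

# Same data as in the listing above
a1 = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]
a2 = [0, 1, 1, 0, 1, 0, 0, 0, 1, 0]
X = [list(row) for row in zip(a1, a2)]
Y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]


def fit_both(X, Y):
    """Fit one tree per criterion and return them keyed by criterion name.

    Hypothetical helper for illustration only.
    """
    models = {}
    for criterion in ("gini", "entropy"):
        # random_state is fixed (an assumption) so repeated runs give the same tree
        clf = tree.DecisionTreeClassifier(criterion=criterion, random_state=0)
        models[criterion] = clf.fit(X, Y)
    return models


models = fit_both(X, Y)
for criterion, clf in models.items():
    print(f"criterion={criterion}")
    # feature_names replaces feature_0 / feature_1 with a1 / a2 in the printout
    print(tree.export_text(clf, feature_names=["a1", "a2"]))
```

Printing the two trees one after the other makes it easier to see which feature each criterion picks at the root.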
1. Gini index:
(Text decision tree)
|--- feature_1 <= 0.50
|   |--- feature_0 <= 0.50
|   |   |--- class: 0
|   |--- feature_0 > 0.50
|   |   |--- class: 0
|--- feature_1 > 0.50
|   |--- class: 1
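To see why the Gini tree puts feature_1 at the root, the weighted Gini impurity of each candidate split can be computed by hand. The following is a rough sketch in plain Python, not sklearn's internal implementation; the helper names `gini` and `weighted_gini` are made up for illustration.

```python
from collections import Counter

a1 = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]  # feature_0
a2 = [0, 1, 1, 0, 1, 0, 0, 0, 1, 0]  # feature_1
Y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(feature, labels):
    """Size-weighted Gini impurity of the children after splitting on a feature."""
    n = len(labels)
    total = 0.0
    for value in set(feature):
        child = [y for x, y in zip(feature, labels) if x == value]
        total += len(child) / n * gini(child)
    return total


print("parent:", round(gini(Y), 4))
print("split on a1 (feature_0):", round(weighted_gini(a1, Y), 4))
print("split on a2 (feature_1):", round(weighted_gini(a2, Y), 4))
```

With this data the weighted impurity is about 0.343 for a1 and about 0.317 for a2. The lower value belongs to a2, which is consistent with feature_1 appearing at the root of the Gini tree.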
2. Information gain:
(Text decision tree)
|--- feature_0 <= 0.50
|   |--- class: 0
|--- feature_0 > 0.50
|   |--- feature_1 <= 0.50
|   |   |--- class: 0
|   |--- feature_1 > 0.50
|   |   |--- class: 1
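The entropy tree starts with feature_0 instead, and the same kind of hand check explains it. This is a minimal sketch in plain Python, assuming base-2 logarithms; `entropy` and `information_gain` are hypothetical helper names, not sklearn functions.

```python
from collections import Counter
from math import log2

a1 = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]  # feature_0
a2 = [0, 1, 1, 0, 1, 0, 0, 0, 1, 0]  # feature_1
Y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        child = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(child) / n * entropy(child)
    return entropy(labels) - remainder


print("gain for a1 (feature_0):", round(information_gain(a1, Y), 4))
print("gain for a2 (feature_1):", round(information_gain(a2, Y), 4))
```

Here the gain is roughly 0.281 for a1 and roughly 0.256 for a2, so information gain prefers feature_0 at the root while the Gini index prefers feature_1. The two criteria rank the same pair of splits differently on this small dataset, which is why the two trees differ.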