Decision

Eily_king

Category: python

Article link: https://blog.csdn.net/u011084269/article/details/104406074


# -*- coding: utf-8 -*-
import numpy as np

data = np.array([[0.3, 5, 2, 0],
                 [0.4, 6, 0, 0],
                 [0.5, 6.5, 1, 1],
                 [0.6, 6, 0, 0],
                 [0.7, 9, 2, 1],
                 [0.5, 7, 1, 0],
                 [0.4, 6, 0, 0],
                 [0.6, 8.5, 0, 1],
                 [0.3, 5.5, 2, 0],
                 [0.9, 10, 0, 1],
                 [1, 12, 1, 0],
                 [0.6, 9, 1, 0]])


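# Toy data: the last column is the label, the remaining columns are features.
Y = data[:, -1]
X = data[:, 0:-1]
from sklearn.model_selection import train_test_split
from sklearn import tree

# Load the breast cancer dataset; from here on X and Y refer to this data,
# replacing the toy arrays defined above.
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
X = cancer['data']
Y = cancer['target']
# Hold out 20% of the samples for testing; the fixed seed makes the split reproducible.
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)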

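# Entropy measures the amount of information in a probability distribution, i.e. the
# uncertainty of a random variable: the larger the entropy, the greater the uncertainty.
# criterion can be 'entropy' or 'gini'. Compared with the Gini index, entropy is more
# sensitive to impurity and penalizes it most strongly, but in practice the two
# criteria give essentially the same results.
# Entropy is slightly slower to compute than the Gini index, because the Gini index
# involves no logarithm.
# Pruning parameter: max_depth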

# Fit a depth-limited tree that splits on information entropy.
clf = tree.DecisionTreeClassifier(criterion='entropy', max_depth=4)
clf.fit(x_train, y_train)
# print('Feature importances:', clf.feature_importances_)
answer = clf.predict(x_test)
# print('Predicted classes for the test data:', answer)
# print('True classes for the test data:', y_test)
from collections import Counter
print(clf.score(x_train, y_train))  # accuracy on the training split
print(clf.score(x_test, y_test))    # accuracy on the test split
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import classification_report

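# Accuracy, precision and recall

print(np.mean(answer == y_test))  # test accuracy, same value as clf.score(x_test, y_test)
# precision_recall_curve is meant to receive scores rather than hard labels,
# so pass the predicted probability of class 1.
precision, recall, thresholds = precision_recall_curve(y_train, clf.predict_proba(x_train)[:, 1])
# Predict class 1 whenever its probability is at least 0.1. Note that X is the
# full dataset, so this report also covers the training samples.
answer = clf.predict_proba(X)[:, 1]
answer = np.where(answer < 0.1, 0, 1)
# a marks each sample as 1 if it was predicted correctly, 0 otherwise.
a = np.zeros(Y.shape)
a[Y == answer] = 1
print(Counter(Y), Counter(a))
print(classification_report(Y, answer))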
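
The script thresholds the class-1 probability at 0.1 and then prints a classification report. To see how that choice trades precision against recall, the following is a minimal sketch that scores only the held-out split at a few cut-offs. It reuses the dataset, split and tree settings from the script; the `random_state=0` on the tree, the cut-offs 0.5 and 0.9, and the `results` dictionary are assumptions added for illustration and are not part of the original.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    cancer['data'], cancer['target'], test_size=0.2, random_state=42)

# random_state=0 is an assumption so that repeated runs build the same tree.
clf = tree.DecisionTreeClassifier(criterion='entropy', max_depth=4, random_state=0)
clf.fit(x_train, y_train)

# Probability of class 1 for each held-out sample.
scores = clf.predict_proba(x_test)[:, 1]

# Full curve: one (precision, recall) pair per distinct score, plus a final end point.
precision, recall, thresholds = precision_recall_curve(y_test, scores)

# Precision and recall at a few hand-picked cut-offs (0.1 is the one used in the script).
results = {}
for t in (0.1, 0.5, 0.9):
    pred = np.where(scores < t, 0, 1)
    results[t] = (precision_score(y_test, pred, zero_division=0),
                  recall_score(y_test, pred, zero_division=0))
    print(t, results[t])
```

Raising the cut-off can only shrink the set of samples predicted as class 1, so recall never increases as the threshold goes up; precision may move in either direction.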

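The comments in the script compare information entropy with the Gini index and point out that Gini needs no logarithm. As a rough sketch of that comparison, the two impurity measures can be computed directly for a class-probability vector. The helper names `entropy` and `gini` are hypothetical and are not scikit-learn functions; the formulas are the standard definitions, H = Σ p·log2(1/p) and G = 1 − Σ p².

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a class-probability vector (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # terms with p == 0 contribute nothing to the sum
    return float(np.sum(p * np.log2(1.0 / p)))

def gini(p):
    """Gini impurity of a class-probability vector (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

# Two-class nodes ranging from maximally mixed to pure.
for p1 in (0.5, 0.7, 0.9, 1.0):
    dist = [p1, 1.0 - p1]
    print(p1, entropy(dist), gini(dist))
```

For an evenly mixed two-class node the entropy is 1 bit and the Gini impurity is 0.5; both measures fall to 0 for a pure node.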