1. Evaluating classification models
• estimator.score()
• The most commonly used metric is accuracy, i.e. the percentage of predictions that are correct
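For a classifier, estimator.score(X, y) returns the mean accuracy, so it should agree with sklearn.metrics.accuracy_score and with a plain "fraction of correct predictions" computed by hand. The sketch below is a minimal check under assumed toy word-count data (X and y are hypothetical); MultinomialNB is used only because the demo later in this section uses it.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy word-count features (4 documents, 3 words) and their labels
X = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
y = np.array([0, 0, 1, 1])

clf = MultinomialNB().fit(X, y)
y_pred = clf.predict(X)

# For classifiers, score() returns mean accuracy: the fraction of correct predictions
print(clf.score(X, y), accuracy_score(y, y_pred), np.mean(y_pred == y))
```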
1.1 Confusion matrix
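There is no need to memorize every cell of the confusion matrix; just remember precision and recall. Of the two, recall often gets more attention in practice, for example when missing a truly positive case is costly.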
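As a concrete reference for the four cells, scikit-learn's sklearn.metrics.confusion_matrix puts the true labels on the rows and the predicted labels on the columns; for a binary problem with labels 0 and 1, ravel() returns the cells in the order TN, FP, FN, TP. A small sketch with made-up labels (the y_true / y_pred values are hypothetical):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Rows are true classes, columns are predicted classes (both ordered 0, 1)
cm = confusion_matrix(y_true, y_pred)
print(cm)

# For binary labels, ravel() yields the four cells as TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print("TP =", tp, "FP =", fp, "FN =", fn, "TN =", tn)
```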
1.2 Precision and recall
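Writing TP, FP, FN and TN for the true-positive, false-positive, false-negative and true-negative cells of the confusion matrix, the two metrics are conventionally defined as follows (for one class treated as "positive"):

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
$$

Precision asks "of all samples predicted positive, how many are really positive"; recall asks "of all samples that really are positive, how many did the model find". For example, with TP = 3, FP = 2 and FN = 1, precision is 3/5 = 0.6 and recall is 3/4 = 0.75.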
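Another classification metric is the F1-score, the harmonic mean of precision and recall. It is high only when precision and recall are both high, so it reflects how robust the model is.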
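The F1-score is usually written as F1 = 2 · Precision · Recall / (Precision + Recall). The sketch below computes precision, recall and F1 by hand from counted TP/FP/FN on hypothetical toy labels, then compares them with scikit-learn's precision_score, recall_score and f1_score (default binary averaging, positive label 1):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical toy labels for a binary problem (1 = positive)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Count the relevant confusion-matrix cells by hand
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print("by hand: ", precision, recall, f1)
print("sklearn: ", precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```

Because the harmonic mean is dominated by the smaller value, a model with precision 1.0 but recall 0.1 still gets a low F1.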
2. Classification evaluation API
sklearn.metrics.classification_report
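classification_report takes the true labels and the predicted labels; target_names supplies readable class names, and output_dict=True returns the same numbers as a nested dict instead of a formatted string. A sketch on hypothetical three-class toy labels (the class names "cat", "dog", "bird" are made up for illustration):

```python
from sklearn.metrics import classification_report

# Hypothetical three-class toy labels
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 0, 2]
names = ["cat", "dog", "bird"]

# Text report: one row per class, then accuracy / macro avg / weighted avg
print(classification_report(y_true, y_pred, target_names=names))

# output_dict=True gives the same numbers as a nested dict, handy for further processing
report = classification_report(y_true, y_pred, target_names=names, output_dict=True)
print(report["dog"]["recall"], report["accuracy"])
```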
Code demo
from sklearn.naive_bayes import MultinomialNB
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
# Load the data (the 20 Newsgroups dataset is downloaded on first use)
news = fetch_20newsgroups(subset='all')
# Split the data into training and test sets (25% held out for testing)
x_train, x_test, y_train, y_test = train_test_split(news.data, news.target, test_size=0.25)
# Extract features from the text
tf = TfidfVectorizer()
# Learn the vocabulary from the training set and compute the TF-IDF weight of each word in every article, e.g. ['a', 'b', 'c', 'd']
x_train = tf.fit_transform(x_train)
# print('Feature names:', tf.get_feature_names_out())  # get_feature_names() was removed in scikit-learn 1.2
x_test = tf.transform(x_test)
# Predict with multinomial naive Bayes (alpha=1.0 is Laplace smoothing)
mlt = MultinomialNB(alpha=1.0)
# print(x_train)
mlt.fit(x_train, y_train)
y_predict = mlt.predict(x_test)
print('Predicted categories of the articles:', y_predict)
# Accuracy
print('Accuracy:', mlt.score(x_test, y_test))
print('Precision and recall for each category:')
print(classification_report(y_test, y_predict, target_names=news.target_names))
Result (train_test_split is called without random_state, so the exact numbers change from run to run):
Predicted categories of the articles: [ 4  8  7 ... 12  3 11]
Accuracy: 0.8586587436332768
Precision and recall for each category:
                          precision    recall  f1-score   support

             alt.atheism       0.90      0.79      0.84       187
           comp.graphics       0.87      0.75      0.80       243
 comp.os.ms-windows.misc       0.87      0.84      0.86       244
comp.sys.ibm.pc.hardware       0.73      0.91      0.81       235
   comp.sys.mac.hardware       0.94      0.82      0.87       255
          comp.windows.x       0.94      0.83      0.88       266
            misc.forsale       0.94      0.70      0.80       241
               rec.autos       0.89      0.92      0.91       245
         rec.motorcycles       0.92      0.95      0.93       234
      rec.sport.baseball       0.94      0.95      0.95       258
        rec.sport.hockey       0.92      0.98      0.95       256
               sci.crypt       0.74      0.96      0.83       251
         sci.electronics       0.90      0.80      0.85       251
                 sci.med       0.96      0.90      0.93       235
               sci.space       0.93      0.96      0.95       266
  soc.religion.christian       0.66      0.97      0.79       277
      talk.politics.guns       0.72      0.96      0.82       220
   talk.politics.mideast       0.88      1.00      0.93       209
      talk.politics.misc       1.00      0.60      0.75       188
      talk.religion.misc       0.98      0.27      0.42       151

                accuracy                           0.86      4712
               macro avg       0.88      0.84      0.84      4712
            weighted avg       0.88      0.86      0.85      4712
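The "macro avg" and "weighted avg" rows can be reproduced from the per-class rows: macro avg is the plain mean over classes, while weighted avg weights each class by its support. The sketch below uses the recall and support columns copied from the report above; since those values are already rounded to two decimals, the recomputed averages only agree with the report to about two decimals.

```python
# Recall and support columns copied from the report above (recall rounded to 2 decimals)
recall = [0.79, 0.75, 0.84, 0.91, 0.82, 0.83, 0.70, 0.92, 0.95, 0.95,
          0.98, 0.96, 0.80, 0.90, 0.96, 0.97, 0.96, 1.00, 0.60, 0.27]
support = [187, 243, 244, 235, 255, 266, 241, 245, 234, 258,
           256, 251, 251, 235, 266, 277, 220, 209, 188, 151]

# macro avg: every class counts equally, regardless of how many samples it has
macro_recall = sum(recall) / len(recall)

# weighted avg: each class is weighted by its support (number of true samples)
weighted_recall = sum(r * s for r, s in zip(recall, support)) / sum(support)

print(round(macro_recall, 2), round(weighted_recall, 2))
```

For single-label multiclass data, the support-weighted recall works out to total correct predictions divided by total samples, which is why the weighted-avg recall matches the accuracy row. The macro average is pulled down more visibly by small, poorly recalled classes such as talk.religion.misc.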