[scikit-learn] Metrics for evaluating classifier performance: confusion matrix, ROC, AUC, and more

JasonDing1354

Article link: https://blog.csdn.net/JasonDing1354/article/details/50562543

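Summary (generated automatically by CSDN): This article looks at how to evaluate classifier performance in scikit-learn, covering the limitations of classification accuracy and how the confusion matrix, the ROC curve, and AUC give a fuller picture of model performance. Using a worked example, it explains the true positives, true negatives, false positives, and false negatives in a confusion matrix and how they determine a classifier's sensitivity and specificity. It also discusses how adjusting the classification threshold affects performance, and stresses the value of AUC as an evaluation metric when the classes are imbalanced.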

Overview

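The purpose of model evaluation and the usual evaluation procedure
What classification accuracy is useful for, and where it falls short
How a confusion matrix describes the performance of a classifier
How the metrics derived from a confusion matrix are computed
Adjusting classifier performance by changing the classification threshold
What the ROC curve is used for
How the Area Under the Curve (AUC) differs from classification accuracy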

1. Review

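Model evaluation lets us choose among different model types, tuning parameters, and feature sets. To do that we need two things: an evaluation procedure that estimates how well a trained model generalizes to out-of-sample data, and an appropriate evaluation metric that quantifies the model's performance.

For the evaluation procedure, an earlier post introduced K-fold cross-validation. For the evaluation metric, regression problems can use Mean Absolute Error, Mean Squared Error, or Root Mean Squared Error, while classification problems can use classification accuracy and the metrics introduced in this post.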
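As a reminder of how the three regression metrics named above are defined, here is a minimal pure-Python sketch. The function names and the toy values are illustrative assumptions rather than part of the original post; in practice `sklearn.metrics` offers `mean_absolute_error` and `mean_squared_error`.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

# Hypothetical toy values, for illustration only
y_true = [100, 50, 30, 20]
y_pred = [90, 50, 50, 30]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print(mae, mse, rmse)
```

Squaring penalizes large errors more heavily, which is why the RMSE here is larger than the MAE even though both are in the units of the target.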

2. Classification accuracy

Here we use the Pima Indians Diabetes dataset, which contains health measurements and diabetes status for 768 patients.

In [1]:

# read the data into a Pandas DataFrame
import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data'
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
pima = pd.read_csv(url, header=None, names=col_names)

In [2]:

# print the first 5 rows of data
pima.head()

Out[2]:

	pregnant	glucose	bp	skin	insulin	bmi	pedigree	age	label
0	6	148	72	35	0	33.6	0.627	50	1
1	1	85	66	29	0	26.6	0.351	31	0
2	8	183	64	0	0	23.3	0.672	32	1
3	1	89	66	23	94	28.1	0.167	21	0
4	0	137	40	35	168	43.1	2.288	33	1

In the label column of the table above, 1 means the patient has diabetes and 0 means the patient does not.

In [3]:

# define X and y
feature_cols = ['pregnant', 'insulin', 'bmi', 'age']
X = pima[feature_cols]
y = pima.label

In [4]:

# split X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
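With no size arguments, `train_test_split` holds out 25% of the rows, which is why the test set examined later has 192 of the 768 rows. The idea of a shuffled hold-out split can be sketched in pure Python as follows. This is only an illustration: `simple_train_test_split` is a hypothetical helper, it is not scikit-learn's implementation, and it will not select the same rows as `random_state=0`.

```python
import random

def simple_train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the row indices, then hold out the first n_test of them."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_test = int(round(len(rows) * test_fraction))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test

# Stand-in for the 768 patient rows (just row numbers here)
rows = list(range(768))
train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed plays the same role as `random_state`: the split is random but reproducible, so results can be compared between runs.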

In [5]:

# train a logistic regression model on the training set
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

Out[5]:

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr',
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0)

In [6]:

# make class predictions for the testing set
y_pred_class = logreg.predict(X_test)
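For a binary logistic regression, `predict` amounts to computing the probability of class 1 and labeling the sample 1 when that probability exceeds 0.5. The sketch below shows the mechanism in pure Python. The coefficients and intercept are made-up numbers chosen for illustration, not the values fitted by `logreg`, and `predict_class` is a hypothetical helper.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, coefficients, intercept, threshold=0.5):
    """Return (class label, probability of class 1) for one sample."""
    z = intercept + sum(w * x for w, x in zip(coefficients, features))
    probability = sigmoid(z)
    label = 1 if probability > threshold else 0
    return label, probability

# Hypothetical coefficients and intercept (NOT the fitted values)
coefficients = [0.1, 0.0, 0.05, 0.02]
intercept = -3.0

# pregnant, insulin, bmi, age taken from row 0 of the table above
patient = [6, 0, 33.6, 50]

label, probability = predict_class(patient, coefficients, intercept)
strict_label, _ = predict_class(patient, coefficients, intercept, threshold=0.7)
print(label, round(probability, 3), strict_label)
```

Raising the threshold makes the classifier more reluctant to predict class 1, which is the lever behind the threshold adjustment listed in the overview.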

In [7]:

# calculate accuracy
from sklearn import metrics
print(metrics.accuracy_score(y_test, y_pred_class))

0.692708333333

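Classification accuracy is the percentage of predictions that are correct.

Null accuracy is the accuracy a model would achieve by always predicting the most frequent class.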
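Both definitions fit in a few lines of pure Python. The sketch below uses hypothetical labels, and the helper names `accuracy` and `null_accuracy` are assumptions made for this example, not scikit-learn functions.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def null_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)

# Hypothetical labels, for illustration only
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

model_acc = accuracy(y_true, y_pred)
baseline_acc = null_accuracy(y_true)
print(model_acc, baseline_acc)
```

In this toy case the model's accuracy only matches the null accuracy, so it is no better than a constant guess. Because `null_accuracy` counts classes with `Counter`, it works for more than two classes as well.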

In [8]:

# examine the class distribution of the testing set (using a Pandas Series method)
y_test.value_counts()

Out[8]:

0    130
1     62
dtype: int64

In [9]:

# calculate the percentage of ones
y_test.mean()

Out[9]:

0.32291666666666669

In [10]:

# calculate the percentage of zeros
1 - y_test.mean()

Out[10]:

0.67708333333333326

In [11]:

# calculate null accuracy(for binary classification problems coded as 0/1)
max(y_test.mean(), 1-y_test.mean())

Out[11]:

0.67708333333333326

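The null accuracy is 68% and the classification accuracy is 69%, so the model barely beats always predicting the majority class. This shows that classification accuracy on its own is not a very good evaluation metric here: one of its weaknesses is that it tells us nothing about the underlying distribution of the test data.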
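To see what accuracy hides, consider a baseline that always predicts 0 on a test set with the class counts shown above (130 zeros, 62 ones). The sketch below counts the four outcome types by hand; `confusion_counts` is a hypothetical helper written for this example, and the baseline predictions are constructed, not produced by `logreg`.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Class counts taken from the test set above: 130 zeros and 62 ones
y_true = [0] * 130 + [1] * 62
# Baseline that always predicts the majority class
y_pred = [0] * len(y_true)

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
baseline_accuracy = (tp + tn) / len(y_true)
print(tp, tn, fp, fn, round(baseline_accuracy, 3))
```

The baseline reaches about 68% accuracy while identifying none of the 62 diabetic patients, which is the kind of failure a single accuracy number cannot reveal.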

In [12]:

# calculate null accuracy (for multi-class classification problems)
y_test.value_counts().head(1) / len(y_test)

Out[12]:

0    0.677083
dtype: float64
