Regression model evaluation metrics
Mean squared error (MSE)
$$MSE = \frac{1}{m}\sum_{i=1}^{m}\left(y_{test}^{(i)} - \hat{y}_{test}^{(i)}\right)^2$$
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_true, y_predict))
Root mean squared error (RMSE)
$$RMSE = \sqrt{MSE}$$
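To make the two formulas above concrete, here is a minimal from-scratch sketch in plain Python. The helper names (`mse_manual`, `rmse_manual`) and the sample values are made up for illustration; in practice sklearn's `mean_squared_error` does the averaging for you.

```python
import math


def mse_manual(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    m = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / m


def rmse_manual(y_true, y_pred):
    """Square root of MSE, which brings the error back to the units of y."""
    return math.sqrt(mse_manual(y_true, y_pred))


# Made-up illustrative values, not data from the text
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mse = mse_manual(y_true, y_pred)    # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = rmse_manual(y_true, y_pred)  # sqrt(1.5)
print(mse, rmse)
```

Because RMSE is expressed in the same units as the target, it is usually the easier of the two to read.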
Mean absolute error (MAE)
$$MAE = \frac{1}{m}\sum_{i=1}^{m}\left|y_{test}^{(i)} - \hat{y}_{test}^{(i)}\right|$$
from sklearn.metrics import mean_absolute_error
print(mean_absolute_error(y_true, y_predict))
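The MAE formula can be checked by hand in the same way. This is a small sketch with a hypothetical helper name (`mae_manual`) and made-up values, not a replacement for `mean_absolute_error`.

```python
def mae_manual(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    m = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / m


# Made-up illustrative values
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae = mae_manual(y_true, y_pred)  # (1 + 0 + 2 + 1) / 4 = 1.0
print(mae)
```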
R squared (R Square)
- The main evaluation metric for regression. It puts different models on a common scale: the value is at most 1, where 1 is a perfect fit, 0 matches a baseline that always predicts the mean of y, and a negative value is worse than that baseline.
$$R^2 = 1 - \frac{\sum_{i=1}^m(\hat{y}^{(i)} - y^{(i)})^2}{\sum_{i=1}^m(\overline{y} - y^{(i)})^2} = 1 - \frac{MSE}{var(y)}$$
where var denotes the variance.
import numpy as np
from sklearn.metrics import r2_score
print(1 - mean_squared_error(y, y_predict) / np.var(y))  # R squared computed from its definition
print(r2_score(y, y_predict))  # R squared as provided by scikit-learn
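The equality between the two forms of R squared can be verified with a short plain-Python sketch. The function names and the data are assumptions made for illustration; the variance used here is the population variance (dividing by m), which is what the identity with MSE requires.

```python
def r2_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((y_mean - t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def r2_from_mse(y_true, y_pred):
    """The same quantity written as 1 - MSE / var(y)."""
    m = len(y_true)
    y_mean = sum(y_true) / m
    mse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / m
    var_y = sum((t - y_mean) ** 2 for t in y_true) / m  # population variance
    return 1 - mse / var_y


# Made-up illustrative values
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

r2_a = r2_manual(y_true, y_pred)
r2_b = r2_from_mse(y_true, y_pred)

# Baseline model: always predict the mean of y, which gives R^2 = 0
baseline = [sum(y_true) / len(y_true)] * len(y_true)
r2_baseline = r2_manual(y_true, baseline)
print(r2_a, r2_b, r2_baseline)
```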
Classification model evaluation metrics
Accuracy
The proportion of test-set samples that are predicted correctly.
from sklearn.metrics import accuracy_score
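# Accuracy from the true test labels and the predicted labels
score1 = accuracy_score(y_test, y_predict)
# Accuracy computed by the fitted classifier directly from the test set
score2 = KNN_classifier.score(x_test, y_test)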
Confusion matrix
- For highly skewed (imbalanced) data, accuracy alone is far from enough to evaluate classification results.
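A quick sketch shows why accuracy misleads on skewed data. The data and the "always predict negative" model below are invented for illustration, and `accuracy_manual` is a hypothetical helper.

```python
def accuracy_manual(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up skewed data: 990 negatives (0) and only 10 positives (1)
y_test = [0] * 990 + [1] * 10
# A useless "classifier" that always predicts the majority class
y_predict = [0] * len(y_test)

accuracy = accuracy_manual(y_test, y_predict)
# Count how many of the real positives were actually detected
positives_found = sum(1 for t, p in zip(y_test, y_predict) if t == 1 and p == 1)
print(accuracy, positives_found)
```

The accuracy comes out at 0.99 even though not a single positive sample is found, which is the gap the confusion matrix is meant to expose.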
For a binary classification problem
Rows represent the true values, columns represent the predicted values
0 means negative, 1 means positive
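| | 0 (predicted) | 1 (predicted) |
|---|---|---|
| 0 (actual) | number of correct negative predictions (TN) | number of incorrect positive predictions (FP) |
| 1 (actual) | number of incorrect negative predictions (FN) | number of correct positive predictions (TP) |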
from sklearn.metrics import confusion_matrix
# Confusion matrix (rows: true labels, columns: predicted labels)
confusion_matrix(y_test, y_predict)
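The four cells of the table can be counted by hand. The sketch below uses a hypothetical helper (`confusion_matrix_binary`) and made-up labels, and returns the same row/column layout as the table: `[[TN, FP], [FN, TP]]`.

```python
def confusion_matrix_binary(y_true, y_pred):
    """Count TN, FP, FN, TP for labels 0 (negative) and 1 (positive)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tp += 1
    # Rows are true labels, columns are predicted labels
    return [[tn, fp], [fn, tp]]


# Made-up illustrative labels
y_test = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_predict = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

matrix = confusion_matrix_binary(y_test, y_predict)
print(matrix)
```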
- Precision
$$precision = \frac{TP}{TP+FP}$$
from sklearn.metrics import precision_score
precision_score(y_test, y_predict)
- Recall
$$recall = \frac{TP}{TP+FN}$$
from sklearn.metrics import recall_score
recall_score(y_test, y_predict)
Precision and recall constrain each other: improving one usually comes at the cost of the other.
- Harmonic mean (F1 score)
$$\frac{1}{F1} = \frac{1}{2}\left(\frac{1}{precision}+\frac{1}{recall}\right)$$
$$F1 = \frac{2 \cdot precision \cdot recall}{precision+recall}$$
from sklearn.metrics import f1_score
f1_score(y_test, y_predict)
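The three formulas above can be tied together with a small sketch that starts from the confusion-matrix counts. The helper name and the counts are assumptions for illustration, and returning 0.0 when a denominator is zero is a convention chosen for this sketch only.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: TP = 3, FP = 2, FN = 1
precision, recall, f1 = precision_recall_f1(tp=3, fp=2, fn=1)
print(precision, recall, f1)

# A lopsided case: precision 1.0 but recall only 0.1
_, _, f1_lopsided = precision_recall_f1(tp=1, fp=0, fn=9)
arithmetic_mean = (1.0 + 0.1) / 2
print(f1_lopsided, arithmetic_mean)
```

In the lopsided case the harmonic mean stays close to the weaker of the two values, whereas the arithmetic mean would hide the poor recall. That is the reason F1 uses the harmonic mean.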
ROC curve
- TPR
$$TPR = recall = \frac{TP}{TP+FN}$$
- FPR
$$FPR = \frac{FP}{TN+FP}$$
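import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve
# decision_scores: per-sample scores for the positive class, e.g. from the classifier's decision_function(x_test)
fprs, tprs, thresholds = roc_curve(y_test, decision_scores)
plt.plot(fprs, tprs)
plt.xlabel('fpr')
plt.ylabel('tpr')
plt.title('ROC curve')
plt.show()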
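Each point on the ROC curve is the (FPR, TPR) pair obtained at one decision threshold. The following plain-Python sketch sweeps the threshold over the observed scores to produce those points. `roc_points` and the sample data are hypothetical, and the output may contain more points than sklearn's `roc_curve` returns.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs obtained by lowering the threshold step by step."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    # Threshold above every score: nothing is predicted positive
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score reaches the threshold
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up labels and scores
y_test = [0, 0, 1, 1]
decision_scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_test, decision_scores)
print(points)
```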
- Area under the ROC curve (AUC): the larger the area, the better the model
from sklearn.metrics import roc_auc_score
roc_auc_score(y_test, decision_scores)
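The area itself can be approximated with the trapezoidal rule over the (FPR, TPR) points. This is a sketch under the assumption that the points are already sorted by FPR; `auc_trapezoid` and the point lists are made up for illustration.

```python
def auc_trapezoid(fprs, tprs):
    """Area under a piecewise-linear curve; fprs must be in ascending order."""
    area = 0.0
    for i in range(1, len(fprs)):
        width = fprs[i] - fprs[i - 1]
        avg_height = (tprs[i] + tprs[i - 1]) / 2
        area += width * avg_height
    return area


# Made-up ROC points for a small example
fprs = [0.0, 0.0, 0.5, 0.5, 1.0]
tprs = [0.0, 0.5, 0.5, 1.0, 1.0]
auc = auc_trapezoid(fprs, tprs)

# The diagonal from (0, 0) to (1, 1) corresponds to random guessing
diagonal_auc = auc_trapezoid([0.0, 1.0], [0.0, 1.0])
print(auc, diagonal_auc)
```

An area of 0.5 corresponds to the diagonal of random guessing, so a useful model should sit clearly above that value.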
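Confusion matrix in multiclass classification
# Precision and recall (average='micro' pools the TP, FP and FN counts over all classes)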
precision_score(y_test, y_predict, average='micro')
recall_score(y_test, y_predict, average='micro')
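What micro averaging does can be sketched by hand: the per-class counts are pooled first and the ratio is taken once. For contrast, the sketch also computes a macro average (the unweighted mean of per-class precision), which the text above does not use. All helper names and labels here are invented for illustration.

```python
def per_class_counts(y_true, y_pred, label):
    """TP and FP for one class, treating that class as the positive one."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
    return tp, fp


def precision_micro(y_true, y_pred):
    """Pool TP and FP over all classes, then compute one ratio."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = [per_class_counts(y_true, y_pred, c) for c in labels]
    total_tp = sum(tp for tp, _ in counts)
    total_fp = sum(fp for _, fp in counts)
    return total_tp / (total_tp + total_fp)


def precision_macro(y_true, y_pred):
    """Compute precision per class, then take the unweighted mean."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in labels:
        tp, fp = per_class_counts(y_true, y_pred, c)
        per_class.append(tp / (tp + fp) if (tp + fp) else 0.0)
    return sum(per_class) / len(per_class)


# Made-up three-class labels
y_test = [0, 1, 2, 2, 1, 0, 2, 2]
y_predict = [0, 2, 2, 2, 1, 1, 2, 0]

micro = precision_micro(y_test, y_predict)  # 5 correct out of 8 predictions
macro = precision_macro(y_test, y_predict)  # mean of 0.5, 0.5 and 0.75
print(micro, macro)
```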