Evaluation Metrics
- Common metrics for classification problems:
Binary: accuracy, precision, recall, F-score, PR curve, ROC curve and AUC.
Multi-class: accuracy, macro-average and micro-average, F-score
| | | Actual class | |
|---|---|---|---|
| | | 1 | 0 |
| Predicted class | 1 | True positive | False positive |
| | 0 | False negative | True negative |
- accuracy: $\frac{Tp+Tn}{total}$
- precision: $\frac{Tp}{Predicted\ positive}=\frac{Tp}{Tp+Fp}$
- recall: $\frac{Tp}{Actual\ positive}=\frac{Tp}{Tp+Fn}$
- F1 score: $2\frac{PR}{P+R}$
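As a quick check of the four formulas above, the sketch below counts the confusion-matrix cells by hand and applies each definition. The label lists are made-up toy data, and `confusion_counts` is a hypothetical helper name, not part of any library.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

# Toy labels, invented for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Each line mirrors one formula; the ratios are undefined if a denominator is zero
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print('Tp, Fp, Fn, Tn:', tp, fp, fn, tn)
print('accuracy:', accuracy)
print('precision:', precision)
print('recall:', recall)
print('F1:', f1)
```

On this toy data precision and recall differ, which makes it easy to see that F1 sits between the two and closer to the smaller one.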
- Common metrics for regression: mean absolute error (MAE), mean squared error (MSE), mean absolute percentage error (MAPE), root mean squared error (RMSE), and R2 (R-squared)
$$MAE=\frac{1}{N} \sum_{i=1}^{N}\left|y_{i}-\hat{y}_{i}\right|$$
$$MSE=\frac{1}{N} \sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}$$
$$MAPE=\frac{1}{N} \sum_{i=1}^{N}\frac{|y_{i}-\hat{y}_{i}|}{y_i}$$
$$R^{2}=1-\frac{SS_{res}}{SS_{tot}}=1-\frac{\sum\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum\left(y_{i}-\overline{y}\right)^{2}}$$
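The regression formulas above translate almost line for line into NumPy. The following is a minimal sketch rather than a reference implementation: `regression_metrics` is a hypothetical helper name, the numbers are invented toy values, and RMSE is taken as the square root of MSE.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE and R^2 computed directly from the formulas."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    # MAPE divides by y_i, so it is undefined when any target equals zero
    mape = np.mean(np.abs(err) / y_true)

    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Toy targets and predictions, invented for illustration only
y_true = [2.0, 4.0, 5.0, 10.0]
y_pred = [2.5, 3.0, 5.0, 8.0]

m = regression_metrics(y_true, y_pred)
for name, value in m.items():
    print(name, round(float(value), 4))
```

Because MSE squares each error, the single error of 2.0 contributes most of the MSE here, while MAE weights all errors linearly.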
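Code
Loading the data
Read the data
import pandas as pd
# 'path' is a placeholder: point each call at the corresponding space-separated file
Train_data = pd.read_csv('path', sep=' ')
Test_data = pd.read_csv('path', sep=' ')
Inspect the basic structure of the data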
print('Train data shape:', Train_data.shape)
print('TestA data shape:', Test_data.shape)
Train_data.head()
Train_data.describe()
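To try these inspection calls without the competition files, the sketch below reads a tiny space-separated table from memory. The column names and values are invented for illustration and are not taken from the actual dataset; `demo` and `raw` are hypothetical names.

```python
import io
import pandas as pd

# A tiny space-separated table standing in for the real file;
# the column names and values are invented for illustration.
raw = io.StringIO(
    "id price mileage\n"
    "0 1850 12.5\n"
    "1 3600 15.0\n"
    "2 6222 12.5\n"
)
demo = pd.read_csv(raw, sep=' ')

print('Demo data shape:', demo.shape)  # (number of rows, number of columns)
print(demo.head())                     # first rows of the table (up to 5)
print(demo.describe())                 # count, mean, std, min, quartiles, max per numeric column
```

Outside a notebook, `head()` and `describe()` need to be wrapped in `print` as shown, since a bare expression is only displayed in interactive sessions.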
Evaluation metric examples
- Classification metrics: accuracy, precision, recall, F1-score, AUC
from sklearn import metrics
# y_true: ground-truth labels; y_pred: predicted labels;
# y_scores: predicted scores or probabilities for the positive class
print('ACC:', metrics.accuracy_score(y_true, y_pred))
print('Precision:', metrics.precision_score(y_true, y_pred))
print('Recall:', metrics.recall_score(y_true, y_pred))
print('F1-score:', metrics.f1_score(y_true, y_pred))
print('AUC score:', metrics.roc_auc_score(y_true, y_scores))
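The calls above assume that `y_true`, `y_pred` and `y_scores` already exist. A self-contained sketch with invented toy labels might look like the following; it also shows the `average` argument of `f1_score`, which selects macro or micro averaging in the multi-class case. All variable names and values here are illustrative assumptions.

```python
from sklearn import metrics

# Binary toy example (labels invented for illustration)
y_true = [0, 1, 1, 1]
y_pred = [0, 1, 0, 1]
acc = metrics.accuracy_score(y_true, y_pred)
prec = metrics.precision_score(y_true, y_pred)
rec = metrics.recall_score(y_true, y_pred)
f1 = metrics.f1_score(y_true, y_pred)
print('ACC:', acc, 'Precision:', prec, 'Recall:', rec, 'F1-score:', f1)

# AUC is computed from scores or probabilities, not from hard 0/1 predictions
y_true_auc = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
auc = metrics.roc_auc_score(y_true_auc, y_scores)
print('AUC score:', auc)

# Multi-class toy example: 'macro' averages the per-class F1 values,
# 'micro' pools the true/false positive counts over all classes first
y_true_mc = [0, 1, 2, 2, 1, 0]
y_pred_mc = [0, 2, 2, 2, 1, 0]
f1_macro = metrics.f1_score(y_true_mc, y_pred_mc, average='macro')
f1_micro = metrics.f1_score(y_true_mc, y_pred_mc, average='micro')
print('Macro F1:', f1_macro, 'Micro F1:', f1_micro)
```

Macro averaging gives every class equal weight, so it tends to be the more informative choice when classes are imbalanced and the rare classes matter.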
- Regression metrics: MSE, RMSE, MAE, MAPE, R2 score
import numpy as np
from sklearn import metrics
# MAPE implemented by hand; scikit-learn 0.24+ also provides
# metrics.mean_absolute_percentage_error
def mape(y_true, y_pred):
    # Convert to arrays so that plain Python lists work as well
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_pred - y_true) / y_true))
# MSE
print('MSE:', metrics.mean_squared_error(y_true, y_pred))
# RMSE
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_true, y_pred)))
# MAE
print('MAE:', metrics.mean_absolute_error(y_true, y_pred))
# MAPE
print('MAPE:', mape(y_true, y_pred))
# R2
print('R2-score:', metrics.r2_score(y_true, y_pred))
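The regression calls can be exercised end to end on a few invented numbers, as in the sketch below. It also compares a hand-written MAPE with `metrics.mean_absolute_percentage_error`, which assumes scikit-learn 0.24 or newer; the data and variable names are illustrative only.

```python
import numpy as np
from sklearn import metrics

# Toy targets and predictions, invented for illustration only
y_true = np.array([3.0, 5.0, 2.5, 8.0])
y_pred = np.array([2.5, 5.0, 3.0, 7.0])

mse = metrics.mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = metrics.mean_absolute_error(y_true, y_pred)
r2 = metrics.r2_score(y_true, y_pred)

# Hand-written MAPE next to the library version (scikit-learn >= 0.24)
mape_manual = np.mean(np.abs((y_pred - y_true) / y_true))
mape_sklearn = metrics.mean_absolute_percentage_error(y_true, y_pred)

print('MSE:', mse)
print('RMSE:', rmse)
print('MAE:', mae)
print('MAPE (manual):', mape_manual)
print('MAPE (sklearn):', mape_sklearn)
print('R2-score:', r2)
```

Both MAPE variants return a fraction rather than a percentage, so multiply by 100 if a percent figure is wanted.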
Reference: https://github.com/datawhalechina/team-learning
Datawhale Task 1: Understanding the Competition Task