For the theory behind AUC, see 【机器学习】ROC & AUC.
1. Computing AUC with sklearn
from sklearn.metrics import roc_auc_score
auc_score = roc_auc_score(y_truth, y_pred)
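y_pred can be either class labels or probabilities. Probabilities (or other continuous scores) are usually the better choice, because hard labels reduce the ROC curve to a single threshold.
roc_auc_score computes the AUC directly from the true labels and the predictions, so you do not have to compute the ROC curve yourself.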
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 1, 1])                  # true labels
y_pred1 = np.array([0.3, 0.2, 0.25, 0.7])   # predictions as probabilities
y_pred2 = np.array([0, 0, 1, 0])            # predictions as class labels

# Predictions are probabilities
auc_score1 = roc_auc_score(y, y_pred1)
print(auc_score1)  # 0.75

# Predictions are class labels
auc_score2 = roc_auc_score(y, y_pred2)
print(auc_score2)  # 0.75
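As a quick check on the point that roc_auc_score skips the explicit ROC step, here is a minimal sketch that computes the same value both ways: once directly, and once through roc_curve followed by auc. The variable names (y_true, y_score, direct, via_curve) are only for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, auc

y_true = np.array([0, 0, 1, 1])               # true labels
y_score = np.array([0.3, 0.2, 0.25, 0.7])     # predicted probabilities

# Route 1: AUC straight from labels and scores.
direct = roc_auc_score(y_true, y_score)

# Route 2: build the ROC curve first, then integrate it.
fpr, tpr, _ = roc_curve(y_true, y_score)
via_curve = auc(fpr, tpr)

print(direct, via_curve)  # both routes should agree
```

Both routes should agree up to floating-point error, so the shorter call is enough unless you also need the curve itself, for example to plot it.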
2. Writing our own function: auc_calculate
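import numpy as np
from sklearn.metrics import roc_curve
from sklearn.metrics import auc

# --- Hand-written implementation following the formula
def auc_calculate(labels, preds, n_bins=100):
    # Number of positives and negatives; their product is the number of
    # (positive, negative) pairs.
    positive_len = sum(labels)
    negative_len = len(labels) - positive_len
    total_case = positive_len * negative_len
    # Histogram the scores of each class into n_bins equal-width bins on [0, 1].
    pos_histogram = [0 for _ in range(n_bins)]
    neg_histogram = [0 for _ in range(n_bins)]
    bin_width = 1.0 / n_bins
    for i in range(len(labels)):
        # Clamp the index so that a score of exactly 1.0 lands in the last bin
        # instead of raising an IndexError.
        nth_bin = min(int(preds[i] / bin_width), n_bins - 1)
        if labels[i] == 1:
            pos_histogram[nth_bin] += 1
        else:
            neg_histogram[nth_bin] += 1
    accumulated_neg = 0
    satisfied_pair = 0
    for i in range(n_bins):
        # Positives in bin i outrank every negative in a lower bin;
        # pairs that share a bin are treated as ties and count 0.5.
        satisfied_pair += (pos_histogram[i] * accumulated_neg + pos_histogram[i] * neg_histogram[i] * 0.5)
        accumulated_neg += neg_histogram[i]
    return satisfied_pair / float(total_case)

if __name__ == '__main__':
    y = np.array([1, 0, 0, 0, 1, 0, 1, 0])
    pred = np.array([0.9, 0.8, 0.3, 0.1, 0.4, 0.9, 0.66, 0.7])
    fpr, tpr, thresholds = roc_curve(y, pred, pos_label=1)
    print("-----sklearn:", auc(fpr, tpr))
    print("-----py script:", auc_calculate(y, pred))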
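Because the function above groups scores into bins, two different scores that fall into the same bin are counted as a tie, so its result is an approximation whose accuracy depends on n_bins. For comparison, here is a minimal sketch of the exact pairwise count: the fraction of (positive, negative) pairs in which the positive sample scores higher, with ties counted as 0.5. The name auc_pairwise is hypothetical and not part of sklearn, and the O(P*N) comparison is only meant for small inputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_pairwise(labels, scores):
    """Exact AUC by comparing every (positive, negative) pair (hypothetical helper)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative sample")
    # diff[i, j] = score of positive i minus score of negative j
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0)    # positive ranked above negative
    ties = np.sum(diff == 0)   # identical scores count as half a win
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


y = np.array([1, 0, 0, 0, 1, 0, 1, 0])
pred = np.array([0.9, 0.8, 0.3, 0.1, 0.4, 0.9, 0.66, 0.7])

exact = auc_pairwise(y, pred)
reference = roc_auc_score(y, pred)
print(exact, reference)
```

On this small data set the pairwise count and sklearn should give the same value, which makes it a convenient reference when checking a binned implementation.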
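In my opinion, when writing real code it is fine to just use sklearn's built-in functions, which are quick and convenient. When you are trying to understand the theory, it helps to read through a hand-written implementation (your own or someone else's).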