What is AUC?
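AUC is the area under the ROC curve. Equivalently, it is the probability that, when one positive sample and one negative sample are drawn at random from a batch, the positive sample gets a higher predicted score than the negative one. We estimate this probability with a frequency. Let M be the number of positive samples and N the number of negative samples. Sort all samples by predicted score and assign ranks so that the highest score gets rank M+N and the lowest gets rank 1. There are M×N positive-negative pairs in total, which form the denominator; the numerator is the number of pairs in which the positive sample scores higher than the negative one. With distinct scores, let $r_k$ be the rank of the positive with the $k$-th lowest score: $r_k - 1$ samples lie below it, $k - 1$ of which are positives, so it beats $r_k - k$ negatives. Summing over $k = 1, \dots, M$ gives $\sum_k r_k - M(M+1)/2$ winning pairs, so the final expression is

$$\mathrm{AUC} = \frac{\sum_{i \in \text{positive}} \mathrm{rank}_i - \frac{M(M+1)}{2}}{M \times N}$$

When several samples share the same score, each of them gets the average of their ranks, which counts a tied positive-negative pair as half a win.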
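As a sanity check on the definition, here is a minimal brute-force sketch that compares every positive with every negative directly. `pairwise_auc` is a hypothetical helper name, not from the original; like `getAuc`, it treats labels greater than 0 as positive, and it counts a tie as half a pair, which is what the averaged ranks amount to. It runs in O(M×N) time, so it is only meant as a reference on small inputs.

```python
def pairwise_auc(labels, preds):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tied pair counts as 0.5. O(M*N) time, intended only as a reference.
    """
    pos_scores = [p for p, y in zip(preds, labels) if y > 0]
    neg_scores = [p for p, y in zip(preds, labels) if y <= 0]
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


# Tie example: 3 positives, 5 negatives, four samples share the score 0.5.
scores = [0.8, 0.6, 0.5, 0.5, 0.5, 0.5, 0.3, 0.2]
labels = [1, 1, 0, 0, 1, 0, 0, 0]
print(pairwise_auc(labels, scores))  # 13.5 / 15 = 0.9
```

Here the positive at 0.5 beats the two lower negatives and ties with three negatives, contributing 2 + 1.5 = 3.5 wins; the other two positives beat all 5 negatives.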
Python code:
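```python
# Example with ties:
#   scores: 0.8, 0.6, 0.5, 0.5, 0.5, 0.5, 0.3, 0.2
#   labels: 1,   1,   0,   0,   1,   0,   0,   0
# Samples with the same score share the average of their ranks.
#
# Variables for the current group of tied scores:
#   pos_count: number of positive samples in the group
#   cur_sum:   sum of the ranks in the group
#   count:     number of samples (positive and negative) in the group
def getAuc(labels, preds):
    # Indices sorted by ascending score, so position i has rank i + 1.
    sorted_index = sorted(range(len(labels)), key=lambda i: preds[i])
    cur_sum = 0
    count = 0
    pos = 0  # total number of positives (M)
    neg = 0  # total number of negatives (N)
    pos_count = 0
    auc = 0.0  # accumulates the sum of the positives' ranks
    last_pred = preds[sorted_index[0]]
    for i in range(len(sorted_index)):
        index = sorted_index[i]
        pred = preds[index]
        label = labels[index]
        if label > 0:
            pos += 1
        else:
            neg += 1
        if pred == last_pred:
            # Same score as the current group: extend the group.
            cur_sum += i + 1
            count += 1
            if label > 0:
                pos_count += 1
        else:
            # Close the previous group: each positive in it gets the average rank.
            auc += pos_count * cur_sum / count
            # Start a new group with the current sample.
            cur_sum = i + 1
            count = 1
            last_pred = pred
            pos_count = 1 if label > 0 else 0
    # Close the last group.
    auc += pos_count * cur_sum / count
    if pos == 0 or neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Subtract M(M+1)/2, then divide by M*N.
    auc -= pos * (pos + 1) / 2
    auc /= pos * neg
    return auc


pred = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.6, 0.6, 0.7, 0.8, 0.9, 1.0]
label = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1]
print(getAuc(label, pred))  # 37/42, about 0.881
```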
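The same rank formula can be written more compactly by letting `itertools.groupby` collect runs of tied scores after sorting. This is a sketch, not the original author's code, and `rank_auc` is a hypothetical name. Each tie group spanning ranks `start` to `start + size - 1` gets the average rank `start + (size - 1) / 2`, and the result is (sum of positive ranks − M(M+1)/2) / (M×N).

```python
from itertools import groupby


def rank_auc(labels, preds):
    """Rank-based AUC; tied scores share the average of their 1-based ranks."""
    pairs = sorted(zip(preds, labels), key=lambda t: t[0])  # ascending score
    pos = sum(1 for _, y in pairs if y > 0)  # M
    neg = len(pairs) - pos  # N
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    rank_sum = 0.0  # sum of the (average) ranks of positive samples
    start = 1  # rank of the first sample in the current tie group
    for _, group in groupby(pairs, key=lambda t: t[0]):
        group_labels = [y for _, y in group]
        size = len(group_labels)
        avg_rank = start + (size - 1) / 2  # mean of start .. start + size - 1
        rank_sum += avg_rank * sum(1 for y in group_labels if y > 0)
        start += size
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)


print(rank_auc([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1],
               [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.6, 0.6, 0.7, 0.8, 0.9, 1.0]))  # 37/42, about 0.881
print(rank_auc([1, 1, 0, 0, 1, 0, 0, 0],
               [0.8, 0.6, 0.5, 0.5, 0.5, 0.5, 0.3, 0.2]))  # 0.9
```

In the first call the four positives at 0.6 occupy ranks 6 to 9 (average 7.5), and the other positives sit at ranks 10, 12 and 13, so the rank sum is 30 + 35 = 65 and the AUC is (65 − 28) / (7 × 6) = 37/42.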