Beyond classification_report: computing precision@k, recall@k and f1-score@k for top-k predictions

sklearn's built-in classification_report computes precision, recall and f1-score per class for binary or multiclass problems.

Example:

from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 0]
y_pred = [1, 0, 2, 1, 1]
print(classification_report(y_true, y_pred))

The output is:

                precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.00      0.00      0.00         1
           2       1.00      0.50      0.67         2

    accuracy                           0.20         5
   macro avg       0.33      0.17      0.22         5
weighted avg       0.40      0.20      0.27         5

As the output shows, for this three-class problem (classes 0, 1 and 2), classification_report reports precision, recall and f1-score for each class, plus a weighted avg computed using each class's support as the weight. Note that the function expects y_pred to contain exactly one prediction per sample. In practice, however, we often need to evaluate the top-k results a model returns (k is typically 3 or 5). Does classification_report support this? Let's find out:

y_true = [0, 5, 0, 3, 4, 2, 1, 1, 5, 4]
y_pred = [[0, 0, 2, 1, 5],
          [2, 2, 4, 1, 4],
          [4, 5, 1, 3, 5],
          [5, 4, 2, 4, 3],
          [2, 0, 0, 2, 3],
          [3, 3, 4, 1, 4],
          [1, 1, 0, 1, 2],
          [1, 4, 4, 2, 4],
          [4, 1, 3, 3, 5],
          [2, 4, 2, 2, 3]]

This is a six-class scenario, with the top-5 predictions for each sample (sorted by descending probability); we want per-class precision.

print(classification_report(y_true, y_pred))

This raises an error:

ValueError                                Traceback (most recent call last)
<ipython-input-176-8c81bcbe70d1> in <module>()
----> 1 print(classification_report(y_true,y_pred))

E:\python35\lib\site-packages\sklearn\metrics\_classification.py in classification_report(y_true, y_pred, labels, target_names, sample_weight, digits, output_dict, zero_division)
   1965     """
   1966 
-> 1967     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
   1968 
   1969     labels_given = True

E:\python35\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
     88     if len(y_type) > 1:
     89         raise ValueError("Classification metrics can't handle a mix of {0} "
---> 90                          "and {1} targets".format(type_true, type_pred))
     91 
     92     # We can't have more than one value on y_type => The set is no more needed

ValueError: Classification metrics can't handle a mix of multiclass and multiclass-multioutput targets

Clearly, the native classification_report cannot compute precision and the related metrics when multiple predictions are given per sample.

So let's implement it ourselves: given k (k > 1) predictions per sample, compute precision@k, recall@k and f1_score@k.

The implementation:

# y_true: 1-D list of true labels
# y_pred: 2-D list, one row of top-k predictions per sample
# num: number of predictions to evaluate (num <= len(y_pred[i]))
def precision_recall_fscore_k(y_true, y_pred, num=3):
    # also accept a flat list of single predictions
    if not isinstance(y_pred[0], list):
        y_pred = [[each] for each in y_pred]
    # keep only the first num predictions per sample
    y_pred = [each[0:num] for each in y_pred]
    unique_label = count_unique_label(y_true, y_pred)
    # per-class precision, recall, f1-score and support
    res = {}
    result = ''
    for each in unique_label:
        tp_fn = y_true.count(each)  # TP + FN
        # TP + FP: samples whose top-num list contains this class
        tp_fp = 0
        for i in y_pred:
            if each in i:
                tp_fp += 1
        # TP: samples of this class whose top-num list contains the true label
        tp = 0
        for i in range(len(y_true)):
            if y_true[i] == each and each in y_pred[i]:
                tp += 1
        support = tp_fn
        try:
            precision = round(tp / tp_fp, 2)
            recall = round(tp / tp_fn, 2)
            f1_score = round(2 / ((1 / precision) + (1 / recall)), 2)
        except ZeroDivisionError:
            precision = 0
            recall = 0
            f1_score = 0
        res[str(each)] = [precision, recall, f1_score, support]
    title = '\t' + 'precision@' + str(num) + '\t' + 'recall@' + str(num) + '\t' + 'f1_score@' + str(num) + '\t' + 'support' + '\n'
    result += title
    for k, v in sorted(res.items()):
        cur = str(k) + '\t' + str(v[0]) + '\t' + str(v[1]) + '\t' + str(v[2]) + '\t' + str(v[3]) + '\n'
        result += cur
    # support-weighted averages across all classes
    sums = len(y_true)
    weight_info = [(v[0] * v[3], v[1] * v[3], v[2] * v[3]) for k, v in sorted(res.items())]
    weight_precision = sum(each[0] for each in weight_info) / sums
    weight_recall = sum(each[1] for each in weight_info) / sums
    weight_f1_score = sum(each[2] for each in weight_info) / sums
    last_line = 'avg_total' + '\t' + str(round(weight_precision, 2)) + '\t' + str(round(weight_recall, 2)) + '\t' + str(round(weight_f1_score, 2)) + '\t' + str(sums)
    result += last_line
    return result

# collect every class label appearing in y_true or y_pred
def count_unique_label(y_true, y_pred):
    unique_label = set(y_true)
    for i in y_pred:
        unique_label.update(i)
    return list(unique_label)

Run precision_recall_fscore_k:

res=precision_recall_fscore_k(y_true,y_pred,num=3)
print(res)

The result:

   precision@3	recall@3	f1_score@3	support
0	0.33	0.5	0.4	2
1	0.5	1.0	0.67	2
2	0	0	0	1
3	0	0	0	1
4	0.14	0.5	0.22	2
5	0	0	0	2
avg_total	0.19	0.4	0.26	10
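As a sanity check, the k = 1 special case of this counting scheme reproduces the per-class precision values that classification_report printed for the first example at the top of this post (a minimal sketch; `precision_at_k` is a helper name introduced here, not part of the function above):

```python
def precision_at_k(y_true, y_pred_topk, label, k):
    # keep only the first k predictions per sample
    preds = [p[:k] for p in y_pred_topk]
    # TP: samples of this class whose top-k list contains the true label
    tp = sum(1 for t, p in zip(y_true, preds) if t == label and label in p)
    # TP + FP: samples whose top-k list contains the label at all
    tp_fp = sum(1 for p in preds if label in p)
    return tp / tp_fp if tp_fp else 0.0

y_true = [0, 1, 2, 2, 0]
y_pred = [[1], [0], [2], [1], [1]]  # k = 1: one prediction per sample

for label in (0, 1, 2):
    print(label, precision_at_k(y_true, y_pred, label, k=1))
# class 2 scores 1.0; classes 0 and 1 score 0.0 -- matching the
# classification_report output in the first example
```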

Done: the function produces the desired top-k report.

Discussion:

Evaluating precision, recall and f1-score on a model's top-k results: compared with k = 1, precision at k > 1 generally drops; recall generally rises; the direction of f1-score is indeterminate, depending on how precision and recall change together.

Reasons: (1) precision = TP / (TP + FP). As the model returns more predictions per sample, TP + FP grows by one for every sample whose top-k list contains the class, and most of these extra hits are wrong, so precision generally falls. (2) recall = TP / (TP + FN). TP + FN is just the class's support, fixed by the test set, while TP can only grow as k increases, so recall rises. (3) f1-score = 2 / ((1/precision) + (1/recall)), so its trend is indeterminate and depends on the joint movement of precision and recall.
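The recall claim in (2) can be checked directly on the six-class data above: because the top-1 list is a prefix of the top-3 list, which is a prefix of the top-5 list, TP can only grow with k while TP + FN stays fixed (a minimal sketch; `recall_at_k` is a helper name introduced here):

```python
def recall_at_k(y_true, y_pred_topk, label, k):
    preds = [p[:k] for p in y_pred_topk]
    # TP: samples of this class whose top-k list contains the true label
    tp = sum(1 for t, p in zip(y_true, preds) if t == label and label in p)
    tp_fn = y_true.count(label)  # TP + FN: fixed, independent of k
    return tp / tp_fn if tp_fn else 0.0

y_true = [0, 5, 0, 3, 4, 2, 1, 1, 5, 4]
y_pred = [[0, 0, 2, 1, 5], [2, 2, 4, 1, 4], [4, 5, 1, 3, 5],
          [5, 4, 2, 4, 3], [2, 0, 0, 2, 3], [3, 3, 4, 1, 4],
          [1, 1, 0, 1, 2], [1, 4, 4, 2, 4], [4, 1, 3, 3, 5],
          [2, 4, 2, 2, 3]]

# the top-k lists are nested, so recall@k is non-decreasing in k per class
for label in range(6):
    recalls = [recall_at_k(y_true, y_pred, label, k) for k in (1, 3, 5)]
    assert recalls == sorted(recalls)
```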
