Beyond classification_report: computing precision@k, recall@k and f1-score@k for top-k predictions

sklearn's built-in classification_report computes precision, recall and f1-score per class for binary or multiclass problems.

Example:

from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 0]
y_pred = [1, 0, 2, 1, 1]
print(classification_report(y_true, y_pred))

The output is:

                precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.00      0.00      0.00         1
           2       1.00      0.50      0.67         2

    accuracy                           0.20         5
   macro avg       0.33      0.17      0.22         5
weighted avg       0.40      0.20      0.27         5

As the output shows, for this three-class problem (classes 0, 1 and 2), classification_report reports precision, recall and f1-score for each class, plus a weighted avg computed using each class's support as the weight. Note that the function expects y_pred to contain exactly one prediction per sample. In practice, however, we often need to evaluate the top-k results a model returns (k is typically 3 or 5). Does classification_report support this? Let's find out:

y_true = [0, 5, 0, 3, 4, 2, 1, 1, 5, 4]
y_pred = [[0, 0, 2, 1, 5],
          [2, 2, 4, 1, 4],
          [4, 5, 1, 3, 5],
          [5, 4, 2, 4, 3],
          [2, 0, 0, 2, 3],
          [3, 3, 4, 1, 4],
          [1, 1, 0, 1, 2],
          [1, 4, 4, 2, 4],
          [4, 1, 3, 3, 5],
          [2, 4, 2, 2, 3]]

This is a six-class scenario, with the top-5 predictions for each sample (sorted by descending probability); we want per-class precision.

print(classification_report(y_true, y_pred))

This raises an error:

ValueError                                Traceback (most recent call last)
<ipython-input-176-8c81bcbe70d1> in <module>()
----> 1 print(classification_report(y_true,y_pred))

E:\python35\lib\site-packages\sklearn\metrics\_classification.py in classification_report(y_true, y_pred, labels, target_names, sample_weight, digits, output_dict, zero_division)
   1965     """
   1966 
-> 1967     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
   1968 
   1969     labels_given = True

E:\python35\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
     88     if len(y_type) > 1:
     89         raise ValueError("Classification metrics can't handle a mix of {0} "
---> 90                          "and {1} targets".format(type_true, type_pred))
     91 
     92     # We can't have more than one value on y_type => The set is no more needed

ValueError: Classification metrics can't handle a mix of multiclass and multiclass-multioutput targets

Clearly, the native classification_report cannot compute precision and the related metrics when multiple predictions are given per sample.

So let's implement it ourselves: given k (k > 1) predictions per sample, compute precision@k, recall@k and f1_score@k.

The implementation:

# y_true: 1-D list of true labels
# y_pred: 2-D list, one row of top-k predictions per sample
# num: number of predictions to evaluate (num <= len(y_pred[i]))
def precision_recall_fscore_k(y_true, y_pred, num=3):
    # also accept a flat list of single predictions
    if not isinstance(y_pred[0], list):
        y_pred = [[each] for each in y_pred]
    # keep only the first num predictions per sample
    y_pred = [each[0:num] for each in y_pred]
    unique_label = count_unique_label(y_true, y_pred)
    # per-class precision, recall, f1-score and support
    res = {}
    result = ''
    for each in unique_label:
        tp_fn = y_true.count(each)  # TP + FN
        # TP + FP: samples whose top-num list contains this class
        tp_fp = 0
        for i in y_pred:
            if each in i:
                tp_fp += 1
        # TP: samples of this class whose top-num list contains the true label
        tp = 0
        for i in range(len(y_true)):
            if y_true[i] == each and each in y_pred[i]:
                tp += 1
        support = tp_fn
        try:
            precision = round(tp / tp_fp, 2)
            recall = round(tp / tp_fn, 2)
            f1_score = round(2 / ((1 / precision) + (1 / recall)), 2)
        except ZeroDivisionError:
            precision = 0
            recall = 0
            f1_score = 0
        res[str(each)] = [precision, recall, f1_score, support]
    title = '\t' + 'precision@' + str(num) + '\t' + 'recall@' + str(num) + '\t' + 'f1_score@' + str(num) + '\t' + 'support' + '\n'
    result += title
    for k, v in sorted(res.items()):
        cur = str(k) + '\t' + str(v[0]) + '\t' + str(v[1]) + '\t' + str(v[2]) + '\t' + str(v[3]) + '\n'
        result += cur
    # support-weighted averages across all classes
    sums = len(y_true)
    weight_info = [(v[0] * v[3], v[1] * v[3], v[2] * v[3]) for k, v in sorted(res.items())]
    weight_precision = sum(each[0] for each in weight_info) / sums
    weight_recall = sum(each[1] for each in weight_info) / sums
    weight_f1_score = sum(each[2] for each in weight_info) / sums
    last_line = 'avg_total' + '\t' + str(round(weight_precision, 2)) + '\t' + str(round(weight_recall, 2)) + '\t' + str(round(weight_f1_score, 2)) + '\t' + str(sums)
    result += last_line
    return result

# collect every class label appearing in y_true or y_pred
def count_unique_label(y_true, y_pred):
    unique_label = set(y_true)
    for i in y_pred:
        unique_label.update(i)
    return list(unique_label)

Run precision_recall_fscore_k:

res=precision_recall_fscore_k(y_true,y_pred,num=3)
print(res)

The result:

   precision@3	recall@3	f1_score@3	support
0	0.33	0.5	0.4	2
1	0.5	1.0	0.67	2
2	0	0	0	1
3	0	0	0	1
4	0.14	0.5	0.22	2
5	0	0	0	2
avg_total	0.19	0.4	0.26	10
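As a sanity check, the k = 1 special case of this counting scheme reproduces the per-class precision values that classification_report printed for the first example at the top of this post (a minimal sketch; `precision_at_k` is a helper name introduced here, not part of the function above):

```python
def precision_at_k(y_true, y_pred_topk, label, k):
    # keep only the first k predictions per sample
    preds = [p[:k] for p in y_pred_topk]
    # TP: samples of this class whose top-k list contains the true label
    tp = sum(1 for t, p in zip(y_true, preds) if t == label and label in p)
    # TP + FP: samples whose top-k list contains the label at all
    tp_fp = sum(1 for p in preds if label in p)
    return tp / tp_fp if tp_fp else 0.0

y_true = [0, 1, 2, 2, 0]
y_pred = [[1], [0], [2], [1], [1]]  # k = 1: one prediction per sample

for label in (0, 1, 2):
    print(label, precision_at_k(y_true, y_pred, label, k=1))
# class 2 scores 1.0; classes 0 and 1 score 0.0 -- matching the
# classification_report output in the first example
```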

Done: the function produces the desired top-k report.

Discussion:

Evaluating precision, recall and f1-score on a model's top-k results: compared with k = 1, precision at k > 1 generally drops; recall generally rises; the direction of f1-score is indeterminate, depending on how precision and recall change together.

Reasons: (1) precision = TP / (TP + FP). As the model returns more predictions per sample, TP + FP grows by one for every sample whose top-k list contains the class, and most of these extra hits are wrong, so precision generally falls. (2) recall = TP / (TP + FN). TP + FN is just the class's support, fixed by the test set, while TP can only grow as k increases, so recall rises. (3) f1-score = 2 / ((1/precision) + (1/recall)), so its trend is indeterminate and depends on the joint movement of precision and recall.
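The recall claim in (2) can be checked directly on the six-class data above: because the top-1 list is a prefix of the top-3 list, which is a prefix of the top-5 list, TP can only grow with k while TP + FN stays fixed (a minimal sketch; `recall_at_k` is a helper name introduced here):

```python
def recall_at_k(y_true, y_pred_topk, label, k):
    preds = [p[:k] for p in y_pred_topk]
    # TP: samples of this class whose top-k list contains the true label
    tp = sum(1 for t, p in zip(y_true, preds) if t == label and label in p)
    tp_fn = y_true.count(label)  # TP + FN: fixed, independent of k
    return tp / tp_fn if tp_fn else 0.0

y_true = [0, 5, 0, 3, 4, 2, 1, 1, 5, 4]
y_pred = [[0, 0, 2, 1, 5], [2, 2, 4, 1, 4], [4, 5, 1, 3, 5],
          [5, 4, 2, 4, 3], [2, 0, 0, 2, 3], [3, 3, 4, 1, 4],
          [1, 1, 0, 1, 2], [1, 4, 4, 2, 4], [4, 1, 3, 3, 5],
          [2, 4, 2, 2, 3]]

# the top-k lists are nested, so recall@k is non-decreasing in k per class
for label in range(6):
    recalls = [recall_at_k(y_true, y_pred, label, k) for k in (1, 3, 5)]
    assert recalls == sorted(recalls)
```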
