Computing recall in Python: how to get precision, recall, and F-measure from a confusion matrix


I'm using Python and have some confusion matrices. I'd like to calculate precision, recall, and F-measure from these confusion matrices for multiclass classification. My result logs don't contain y_true and y_pred, just the confusion matrices.

Could you tell me how to get these scores from a confusion matrix in multiclass classification?

Solution

Let's consider the case of MNIST data classification (10 classes), where for a test set of 10,000 samples we get the following confusion matrix cm (Numpy array):

array([[ 963,    0,    0,    1,    0,    2,   11,    1,    2,    0],
       [   0, 1119,    3,    2,    1,    0,    4,    1,    4,    1],
       [  12,    3,  972,    9,    6,    0,    6,    9,   13,    2],
       [   0,    0,    8,  975,    0,    2,    2,   10,   10,    3],
       [   0,    2,    3,    0,  953,    0,   11,    2,    3,    8],
       [   8,    1,    0,   21,    2,  818,   17,    2,   15,    8],
       [   9,    3,    1,    1,    4,    2,  938,    0,    0,    0],
       [   2,    7,   19,    2,    2,    0,    0,  975,    2,   19],
       [   8,    5,    4,    8,    6,    4,   14,   11,  906,    8],
       [  11,    7,    1,   12,   16,    1,    1,    6,    5,  949]])
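If you want to run the snippets below yourself, one option (an addition, not part of the original answer) is to paste the matrix into np.array under the name cm that the rest of the code uses:

import numpy as np

# same values as the matrix shown above; rows are true classes, columns are predicted classes
cm = np.array([[ 963,    0,    0,    1,    0,    2,   11,    1,    2,    0],
               [   0, 1119,    3,    2,    1,    0,    4,    1,    4,    1],
               [  12,    3,  972,    9,    6,    0,    6,    9,   13,    2],
               [   0,    0,    8,  975,    0,    2,    2,   10,   10,    3],
               [   0,    2,    3,    0,  953,    0,   11,    2,    3,    8],
               [   8,    1,    0,   21,    2,  818,   17,    2,   15,    8],
               [   9,    3,    1,    1,    4,    2,  938,    0,    0,    0],
               [   2,    7,   19,    2,    2,    0,    0,  975,    2,   19],
               [   8,    5,    4,    8,    6,    4,   14,   11,  906,    8],
               [  11,    7,    1,   12,   16,    1,    1,    6,    5,  949]])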

In order to get the precision & recall (per class), we need to compute the TP, FP, and FN per class. We don't need the TN, but we will compute them too, as they will help us with a sanity check later.

The True Positives are simply the diagonal elements:

# numpy should have already been imported as np
TP = np.diag(cm)
TP
# array([ 963, 1119,  972,  975,  953,  818,  938,  975,  906,  949])

The False Positives are the sum of the respective column, minus the diagonal element (i.e. the TP element):

FP = np.sum(cm, axis=0) - TP
FP
# array([50, 28, 39, 56, 37, 11, 66, 42, 54, 49])

Similarly, the False Negatives are the sum of the respective row, minus the diagonal (i.e. TP) element:

FN = np.sum(cm, axis=1) - TP
FN
# array([17, 16, 60, 35, 29, 74, 20, 53, 68, 60])

Now, the True Negatives are a little trickier; let's first think about what exactly a True Negative means with respect to, say, class 0: it means all the samples that have been correctly identified as not being 0. So, essentially, what we should do is remove the corresponding row & column from the confusion matrix, and then sum up all the remaining elements:

num_classes = 10
TN = []
for i in range(num_classes):
    temp = np.delete(cm, i, 0)    # delete ith row
    temp = np.delete(temp, i, 1)  # delete ith column
    TN.append(sum(sum(temp)))

TN
# [8970, 8837, 8929, 8934, 8981, 9097, 8976, 8930, 8972, 8942]
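As a side note (this variant is an addition, not part of the original answer), the same TN values can be obtained without the loop: every sample that is not a TP, FP, or FN for a given class is by definition a TN for that class, so

# total number of samples minus everything else per class
TN_alt = cm.sum() - (TP + FP + FN)
TN_alt
# array([8970, 8837, 8929, 8934, 8981, 9097, 8976, 8930, 8972, 8942])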

Let's do a sanity check: for each class, the sum of TP, FP, FN, and TN must be equal to the size of our test set (here 10,000). Let's confirm that this is indeed the case:

l = 10000
for i in range(num_classes):
    print(TP[i] + FP[i] + FN[i] + TN[i] == l)

The result is

True
True
True
True
True
True
True
True
True
True
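Equivalently, the whole check can be compressed into one vectorized assertion (a small addition to the original answer):

# passes silently if every class sums to the test-set size, raises AssertionError otherwise
assert np.all(TP + FP + FN + np.array(TN) == l)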

Having calculated these quantities, it is now straightforward to get the precision & recall per class:

precision = TP / (TP + FP)
recall = TP / (TP + FN)

which for this example are

precision
# array([ 0.95064166, 0.97558849, 0.96142433, 0.9456838 , 0.96262626,
#         0.986731 , 0.93426295, 0.95870206, 0.94375 , 0.9509018])

recall
# array([ 0.98265306, 0.98590308, 0.94186047, 0.96534653, 0.97046843,
#         0.91704036, 0.97912317, 0.94844358, 0.9301848 , 0.94053518])
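The question also asks for the F-measure. With the per-class precision and recall in hand, the per-class F1 score (the balanced F-measure) follows from the standard formula; the name f1 below is introduced here, not in the original answer:

# harmonic mean of precision and recall, computed element-wise (one value per class)
f1 = 2 * precision * recall / (precision + recall)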

You should now be able to compute these quantities for a confusion matrix of virtually any size.
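If you need a single summary number rather than per-class scores, a common convention (again an addition to the original answer) is to macro-average the per-class values, or to micro-average by pooling the counts; for single-label multiclass data such as this, the micro-averaged precision and recall both reduce to the overall accuracy:

# macro averages: unweighted mean of the per-class scores
macro_precision = precision.mean()
macro_recall = recall.mean()
macro_f1 = f1.mean()  # f1 as computed in the snippet above

# micro averages: pool the counts over all classes
micro_precision = TP.sum() / (TP.sum() + FP.sum())  # equals overall accuracy here
micro_recall = TP.sum() / (TP.sum() + FN.sum())     # same value, since sum(FP) == sum(FN)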
