Computing recall in Python: how to get precision, recall, and F-measure from a confusion matrix


I'm using Python and have some confusion matrices. I'd like to calculate precision, recall, and F-measure from these confusion matrices for multiclass classification. My result logs don't contain y_true and y_pred, just the confusion matrices.

Could you tell me how to get these scores from a confusion matrix in multiclass classification?

Solution

Let's consider the case of MNIST data classification (10 classes), where for a test set of 10,000 samples we get the following confusion matrix cm (Numpy array):

array([[ 963,    0,    0,    1,    0,    2,   11,    1,    2,    0],
       [   0, 1119,    3,    2,    1,    0,    4,    1,    4,    1],
       [  12,    3,  972,    9,    6,    0,    6,    9,   13,    2],
       [   0,    0,    8,  975,    0,    2,    2,   10,   10,    3],
       [   0,    2,    3,    0,  953,    0,   11,    2,    3,    8],
       [   8,    1,    0,   21,    2,  818,   17,    2,   15,    8],
       [   9,    3,    1,    1,    4,    2,  938,    0,    0,    0],
       [   2,    7,   19,    2,    2,    0,    0,  975,    2,   19],
       [   8,    5,    4,    8,    6,    4,   14,   11,  906,    8],
       [  11,    7,    1,   12,   16,    1,    1,    6,    5,  949]])
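If you want to run the snippets below yourself, one option (an addition, not part of the original answer) is to paste the matrix into np.array under the name cm that the rest of the code uses:

import numpy as np

# same values as the matrix shown above; rows are true classes, columns are predicted classes
cm = np.array([[ 963,    0,    0,    1,    0,    2,   11,    1,    2,    0],
               [   0, 1119,    3,    2,    1,    0,    4,    1,    4,    1],
               [  12,    3,  972,    9,    6,    0,    6,    9,   13,    2],
               [   0,    0,    8,  975,    0,    2,    2,   10,   10,    3],
               [   0,    2,    3,    0,  953,    0,   11,    2,    3,    8],
               [   8,    1,    0,   21,    2,  818,   17,    2,   15,    8],
               [   9,    3,    1,    1,    4,    2,  938,    0,    0,    0],
               [   2,    7,   19,    2,    2,    0,    0,  975,    2,   19],
               [   8,    5,    4,    8,    6,    4,   14,   11,  906,    8],
               [  11,    7,    1,   12,   16,    1,    1,    6,    5,  949]])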

In order to get the precision & recall (per class), we need to compute the TP, FP, and FN per class. We don't need the TN, but we will compute them too, as they will help us with a sanity check later.

The True Positives are simply the diagonal elements:

# numpy should have already been imported as np
TP = np.diag(cm)
TP
# array([ 963, 1119,  972,  975,  953,  818,  938,  975,  906,  949])

The False Positives are the sum of the respective column, minus the diagonal element (i.e. the TP element):

FP = np.sum(cm, axis=0) - TP
FP
# array([50, 28, 39, 56, 37, 11, 66, 42, 54, 49])

Similarly, the False Negatives are the sum of the respective row, minus the diagonal (i.e. TP) element:

FN = np.sum(cm, axis=1) - TP
FN
# array([17, 16, 60, 35, 29, 74, 20, 53, 68, 60])

Now, the True Negatives are a little trickier; let's first think about what exactly a True Negative means with respect to, say, class 0: it means all the samples that have been correctly identified as not being 0. So, essentially, what we should do is remove the corresponding row & column from the confusion matrix, and then sum up all the remaining elements:

num_classes = 10
TN = []
for i in range(num_classes):
    temp = np.delete(cm, i, 0)    # delete ith row
    temp = np.delete(temp, i, 1)  # delete ith column
    TN.append(sum(sum(temp)))

TN
# [8970, 8837, 8929, 8934, 8981, 9097, 8976, 8930, 8972, 8942]
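As a side note (this variant is an addition, not part of the original answer), the same TN values can be obtained without the loop: every sample that is not a TP, FP, or FN for a given class is by definition a TN for that class, so

# total number of samples minus everything else per class
TN_alt = cm.sum() - (TP + FP + FN)
TN_alt
# array([8970, 8837, 8929, 8934, 8981, 9097, 8976, 8930, 8972, 8942])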

Let's do a sanity check: for each class, the sum of TP, FP, FN, and TN must be equal to the size of our test set (here 10,000). Let's confirm that this is indeed the case:

l = 10000
for i in range(num_classes):
    print(TP[i] + FP[i] + FN[i] + TN[i] == l)

The result is

True
True
True
True
True
True
True
True
True
True
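Equivalently, the whole check can be compressed into one vectorized assertion (a small addition to the original answer):

# passes silently if every class sums to the test-set size, raises AssertionError otherwise
assert np.all(TP + FP + FN + np.array(TN) == l)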

Having calculated these quantities, it is now straightforward to get the precision & recall per class:

precision = TP / (TP + FP)
recall = TP / (TP + FN)

which for this example are

precision
# array([ 0.95064166, 0.97558849, 0.96142433, 0.9456838 , 0.96262626,
#         0.986731 , 0.93426295, 0.95870206, 0.94375 , 0.9509018])

recall
# array([ 0.98265306, 0.98590308, 0.94186047, 0.96534653, 0.97046843,
#         0.91704036, 0.97912317, 0.94844358, 0.9301848 , 0.94053518])
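The question also asks for the F-measure. With the per-class precision and recall in hand, the per-class F1 score (the balanced F-measure) follows from the standard formula; the name f1 below is introduced here, not in the original answer:

# harmonic mean of precision and recall, computed element-wise (one value per class)
f1 = 2 * precision * recall / (precision + recall)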

You should now be able to compute these quantities for a confusion matrix of virtually any size.
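If you need a single summary number rather than per-class scores, a common convention (again an addition to the original answer) is to macro-average the per-class values, or to micro-average by pooling the counts; for single-label multiclass data such as this, the micro-averaged precision and recall both reduce to the overall accuracy:

# macro averages: unweighted mean of the per-class scores
macro_precision = precision.mean()
macro_recall = recall.mean()
macro_f1 = f1.mean()  # f1 as computed in the snippet above

# micro averages: pool the counts over all classes
micro_precision = TP.sum() / (TP.sum() + FP.sum())  # equals overall accuracy here
micro_recall = TP.sum() / (TP.sum() + FN.sum())     # same value, since sum(FP) == sum(FN)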
