Accuracy metrics for object detection: F1 & IoU

References:

https://stats.stackexchange.com/questions/273537/f1-dice-score-vs-iou

https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/

 

Definition:

IoU (Intersection over Union) / Jaccard:

TP / (TP + FP + FN)

F1 score / Dice:

2TP / (2TP + FP + FN)

[Figure: illustration of IoU]
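As a concrete illustration of the two definitions, here is a minimal Python sketch (assuming NumPy and two binary masks of equal shape; the function name and the toy masks are made up for this example):

```python
import numpy as np

def iou_and_f1(pred: np.ndarray, gt: np.ndarray):
    """Count TP/FP/FN between two binary masks and return (IoU, F1)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / (tp + fp + fn)          # TP / (TP + FP + FN)
    f1 = 2 * tp / (2 * tp + fp + fn)   # 2TP / (2TP + FP + FN)
    return iou, f1

# Toy 4x4 masks: ground truth has 4 positive pixels, the prediction
# covers all of them plus 2 extra pixels (TP=4, FP=2, FN=0).
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True
print(iou_and_f1(pred, gt))  # (0.666..., 0.8)
```

On this toy pair of masks TP = 4, FP = 2, FN = 0, so IoU = 4/6 ≈ 0.667 and F1 = 8/10 = 0.8; note that 0.8 / (2 − 0.8) ≈ 0.667, consistent with the relations derived below.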

 

More explanation:

From the definition of the two metrics, we have that IoU and F score are always within a factor of 2 of each other:

F/2 ≤ IoU ≤ F

and also that they meet at the extremes of one and zero under the conditions that you would expect (perfect match and completely disjoint).

Note also that the ratio between them can be related explicitly to the IoU:

IoU/F = 1/2 + IoU/2

so that the ratio approaches 1/2 as both metrics approach zero.
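Both relations are easy to sanity-check numerically. A small Python check (the random TP/FP/FN ranges are arbitrary, chosen only to exercise the formulas):

```python
import random

# Check F/2 <= IoU <= F and IoU/F = 1/2 + IoU/2 for random counts.
for _ in range(10_000):
    tp = random.randint(1, 100)   # at least one true positive so F > 0
    fp = random.randint(0, 100)
    fn = random.randint(0, 100)
    iou = tp / (tp + fp + fn)
    f = 2 * tp / (2 * tp + fp + fn)
    assert f / 2 <= iou <= f
    assert abs(iou / f - (0.5 + iou / 2)) < 1e-12
print("both relations hold on all samples")
```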

But there's a stronger statement that can be made for the typical application of classification in machine learning. For any fixed "ground truth", the two metrics are always positively correlated. That is to say, if classifier A is better than classifier B under one metric, it is also better under the other metric.

It is tempting then to conclude that the two metrics are functionally equivalent, so that the choice between them is arbitrary, but not so fast! The problem comes when taking the average score over a set of inferences. Then the difference emerges when quantifying how much worse classifier B is than A in any given case.

In general, the IoU metric tends to penalize single instances of bad classification more than the F score does, quantitatively, even when both agree that a given instance is bad. Similarly to how L2 can penalize the largest mistakes more than L1, the IoU metric tends to have a "squaring" effect on the errors relative to the F score. So the F score tends to measure something closer to average performance, while the IoU score measures something closer to worst-case performance.

Suppose, for example, that the vast majority of the inferences are moderately better with classifier A than with B, but some of them are significantly worse using classifier A. It may then be the case that the F metric favors classifier A while the IoU metric favors classifier B.
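To see how this can happen, here is a constructed Python example; the per-image F1 scores are entirely hypothetical, chosen only to exhibit the flip, and IoU is recovered per image via IoU = F/(2 − F):

```python
def f_to_iou(f):
    """IoU as a function of the F/Dice score: IoU = F / (2 - F)."""
    return f / (2.0 - f)

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical per-image F1 scores on six images: classifier A is moderately
# better than B on five images but much worse on the sixth.
f_A = [0.70] * 5 + [0.50]
f_B = [0.60] * 5 + [0.95]
iou_A = [f_to_iou(f) for f in f_A]
iou_B = [f_to_iou(f) for f in f_B]

print(f"mean F1 : A={mean(f_A):.4f}  B={mean(f_B):.4f}")     # F1 favors A (0.6667 > 0.6583)
print(f"mean IoU: A={mean(iou_A):.4f}  B={mean(iou_B):.4f}")  # IoU favors B (0.5043 < 0.5079)
```

Classifier A wins on mean F1 (0.6667 vs 0.6583) while classifier B wins on mean IoU (0.5043 vs 0.5079): the single image where A is much worse costs more under IoU than the five moderate gains buy back.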

To be sure, both of these metrics are much more alike than they are different. But both of them suffer from another disadvantage from the standpoint of taking averages of these scores over many inferences: they both overstate the importance of instances with little-to-no ground truth positives. In the common example of image segmentation, if an image only has a single pixel of some detectable class, and the classifier detects that pixel and one other pixel, its F score is a lowly 2/3 and the IoU is even worse at 1/2. Trivial mistakes like these can seriously dominate the average score taken over a set of images. In short, each score weights every pixel error inversely proportionally to the size of the selected/relevant set, rather than treating all pixel errors equally.
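Plugging the single-pixel example into the formulas, and averaging it into a hypothetical batch (the 0.85 IoU scores for the other nine images are invented for illustration):

```python
# Single-pixel case from the paragraph above: 1 ground-truth pixel, the
# classifier marks that pixel plus one extra one -> TP=1, FP=1, FN=0.
tp, fp, fn = 1, 1, 0
f1 = 2 * tp / (2 * tp + fp + fn)   # 2/3
iou = tp / (tp + fp + fn)          # 1/2

# Hypothetical batch: nine images segmented reasonably well plus this one.
batch_iou = [0.85] * 9 + [iou]
print(f1, iou, sum(batch_iou) / len(batch_iou))  # 0.666... 0.5 0.815
```

One near-trivial two-pixel image pulls the batch-average IoU from 0.85 down to 0.815.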

There is a far simpler metric that avoids this problem. Simply use the total error: FN + FP (e.g. 5% of the image's pixels were miscategorized). In the case where one type of error is more important than the other, a weighted sum may be used: c0·FP + c1·FN.
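A minimal sketch of that alternative, again assuming NumPy binary masks (the function name and the c_fp/c_fn weights are illustrative, not a standard API):

```python
import numpy as np

def error_rate(pred: np.ndarray, gt: np.ndarray,
               c_fp: float = 1.0, c_fn: float = 1.0) -> float:
    """Fraction of pixels miscategorized: (c_fp*FP + c_fn*FN) / total pixels."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return float(c_fp * fp + c_fn * fn) / gt.size

# The single-pixel example embedded in a 100x100 image: one FP out of 10,000 pixels.
gt = np.zeros((100, 100), dtype=bool)
gt[0, 0] = True
pred = gt.copy()
pred[0, 1] = True
print(error_rate(pred, gt))  # 0.0001 -- the trivial mistake no longer dominates
```

On the single-pixel example embedded in a 100×100 image, the one wrong pixel now contributes an error rate of 0.0001 instead of halving the score.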
