Evaluation Metrics for Video Object Segmentation (VOS): J&F

Latest recommended article published on 2024-12-02 17:37:34

汐梦聆海



Article link: https://blog.csdn.net/jackzhang11/article/details/108413171



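This post briefly summarizes the two most important evaluation metrics for the VOS task, J&F. J is the Jaccard index and F is the boundary F-score (F-measure). J measures the IoU between the predicted mask and the ground-truth (GT) mask. F measures how well the boundary of the predicted mask agrees with the GT boundary. Each is described below.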
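On benchmarks such as DAVIS, the headline J&F number is commonly the average of the mean J and the mean F. Below is a minimal sketch of only that final averaging step. It assumes that per-frame J and F scores for a single object are already available. The helper name `j_and_f_mean` and the scores are hypothetical, and real evaluation toolkits also aggregate over objects and sequences.

```python
import numpy as np

def j_and_f_mean(j_scores, f_scores):
    """Hypothetical helper: combine per-frame J and F scores into one J&F value.

    Assumes j_scores[k] and f_scores[k] belong to the same object in frame k.
    """
    j_mean = float(np.mean(j_scores))  # mean region similarity (J)
    f_mean = float(np.mean(f_scores))  # mean boundary accuracy (F)
    return (j_mean + f_mean) / 2.0     # J&F: average of the two means

# Made-up per-frame scores, for illustration only.
print(j_and_f_mean([0.80, 0.70, 0.90], [0.60, 0.70, 0.80]))  # approximately 0.75
```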

Jaccard

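Computing J is very simple: it is just the IoU between the predicted mask $M$ and the GT mask $G$, i.e. the ratio $J=\frac{|M\cap G|}{|M\cup G|}$. The numerator is the intersection of the foreground regions of the two masks, and the denominator is their union. The code is as follows: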

import numpy as np

def db_eval_iou(annotation, segmentation):
    """ Compute region similarity as the Jaccard Index.
    Arguments:
        annotation   (ndarray): binary annotation   map.
        segmentation (ndarray): binary segmentation map.
    Return:
        jaccard (float): region similarity
    """
    # np.bool was removed in NumPy 1.24; use the builtin bool instead
    annotation   = annotation.astype(bool)
    segmentation = segmentation.astype(bool)

    # Both masks empty: define the IoU as 1 (perfect agreement)
    if np.isclose(np.sum(annotation), 0) and np.isclose(np.sum(segmentation), 0):
        return 1.0
    else:
        # |intersection| / |union| of the foreground pixels
        return np.sum(annotation & segmentation) / \
               np.sum(annotation | segmentation, dtype=np.float32)
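As a quick sanity check of the formula, the sketch below builds two hypothetical 4×4 masks. They overlap in 4 pixels and together cover 12 pixels, so J = 4/12 ≈ 0.333. It recomputes the ratio directly with NumPy instead of calling db_eval_iou, so that it runs on its own.

```python
import numpy as np

# Hypothetical 4x4 masks: GT covers columns 0-1,
# the prediction covers columns 1-2 (shifted right by one column).
gt = np.zeros((4, 4), dtype=bool)
gt[:, 0:2] = True
pred = np.zeros((4, 4), dtype=bool)
pred[:, 1:3] = True

intersection = np.logical_and(gt, pred).sum()  # column 1 only -> 4 pixels
union = np.logical_or(gt, pred).sum()          # columns 0-2   -> 12 pixels
j = intersection / union

print(intersection, union, round(float(j), 3))  # 4 12 0.333
```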

F-score

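The F-score evaluates whether the boundary of the predicted mask corresponds to the boundary of the GT mask. First, the boundary pixels of the predicted mask and of the GT mask are extracted into binary maps. In these maps, boundary pixels are True and all other pixels are False. The F-score is then defined as: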

$F=\frac{2PR}{P+R}$

where P denotes precision and R denotes recall, computed respectively as:

$P=\frac{TP}{TP+FP}$

$R=\frac{TP}{TP+FN}$
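Because F is the harmonic mean of P and R, it is pulled toward the smaller of the two. The numbers below are made up purely to illustrate this. Balanced P = R = 0.7 gives F = 0.7, while P = 0.9 and R = 0.5, which have the same arithmetic mean, give a lower F ≈ 0.643. The helper name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.7, 0.7), 3))  # 0.7
print(round(f_measure(0.9, 0.5), 3))  # 0.643
```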

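For P, the denominator is the total number of boundary pixels in the predicted mask. The numerator is the number of those predicted boundary pixels that actually belong to the GT boundary. For example, suppose the predicted mask has 100 boundary pixels but only 70 of them lie on the GT boundary (true positives); precision is then 70%. How is that count of 70 obtained, i.e. how many of the pixels predicted as positive are true positives?

The GT boundary is first processed with binary_dilation. This widens it into a tolerance band (of radius bound_pix in the code below), so that predicted boundary pixels that are slightly off still count as matches. The predicted boundary is then multiplied element-wise with the dilated GT boundary, and summing the result gives the number of true positives.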
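The sketch below illustrates the precision computation on a hypothetical toy case in which the predicted boundary is a vertical line one pixel to the right of the GT boundary. Without tolerance (radius 0), no predicted pixel hits the GT boundary, so P = 0. Dilating the GT boundary by a disk of radius 1 absorbs the one-pixel offset, and P becomes 1. To stay dependency-free, it uses a small pure-NumPy disk dilation in place of skimage's `binary_dilation(..., disk(r))`. Both `dilate_disk` and `boundary_precision` are hypothetical helpers.

```python
import numpy as np

def dilate_disk(mask, r):
    """Binary dilation of a 2-D bool mask by a disk of integer radius r.

    Hypothetical pure-NumPy stand-in for skimage's binary_dilation(mask, disk(r)).
    """
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask, r)                    # False padding on every side
    out = np.zeros_like(mask)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx <= r * r:      # offset lies inside the disk
                out |= padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out

def boundary_precision(fg_boundary, gt_boundary, r):
    """Fraction of predicted boundary pixels within r pixels of the GT boundary."""
    gt_dil = dilate_disk(gt_boundary, r)        # tolerance band around the GT
    fg_match = fg_boundary & gt_dil             # predicted pixels inside the band
    return fg_match.sum() / fg_boundary.sum()

gt_b = np.zeros((6, 6), dtype=bool)
gt_b[:, 2] = True                               # GT boundary: column 2
fg_b = np.zeros((6, 6), dtype=bool)
fg_b[:, 3] = True                               # prediction: column 3 (one pixel off)

print(boundary_precision(fg_b, gt_b, r=0))      # 0.0
print(boundary_precision(fg_b, gt_b, r=1))      # 1.0
```

For scale, take the db_eval_boundary code in this post with its default bound_th=0.008. On a 480×854 frame, it uses a radius of ceil(0.008 × √(480² + 854²)) = ceil(7.84) = 8 pixels.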

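Likewise, for R, the denominator is the total number of boundary pixels in the GT mask. The numerator is how many of these genuinely positive pixels were predicted. For example, suppose the GT boundary has 100 pixels, but only 70 of them are predicted as positive while the other 30 are wrongly predicted as negative; recall is then 70%. Concretely, the predicted boundary is first dilated with binary_dilation. The GT boundary is then multiplied element-wise with the dilated predicted boundary, and summing the result gives the number of true positives.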
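Recall mirrors the precision computation with the roles swapped: the predicted boundary is dilated, and the GT boundary is checked against it. In the hypothetical toy case below, the prediction traces only the upper half of the GT boundary, again offset by one pixel. With a radius-1 tolerance, R = 3/6 = 0.5. `dilate_disk` is the same kind of hypothetical pure-NumPy stand-in for skimage's disk dilation, and `boundary_recall` is also hypothetical.

```python
import numpy as np

def dilate_disk(mask, r):
    """Hypothetical pure-NumPy stand-in for binary_dilation(mask, disk(r))."""
    h, w = mask.shape
    padded = np.pad(mask.astype(bool), r)    # False padding on every side
    out = np.zeros((h, w), dtype=bool)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx <= r * r:   # offset lies inside the disk
                out |= padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out

def boundary_recall(fg_boundary, gt_boundary, r):
    """Fraction of GT boundary pixels within r pixels of the predicted boundary."""
    fg_dil = dilate_disk(fg_boundary, r)     # tolerance band around the prediction
    gt_match = gt_boundary & fg_dil          # GT pixels recovered by the prediction
    return gt_match.sum() / gt_boundary.sum()

gt_b = np.zeros((6, 6), dtype=bool)
gt_b[:, 2] = True        # GT boundary: all 6 pixels of column 2
fg_b = np.zeros((6, 6), dtype=bool)
fg_b[:3, 3] = True       # prediction: only rows 0-2 of column 3

print(boundary_recall(fg_b, gt_b, r=1))  # 0.5
```

Every predicted pixel in this example lies within one pixel of the GT boundary, so its precision would be 1.0. Only recall exposes the missing lower half.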

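The description above is somewhat hard to follow, so in short:

- Precision P starts from the prediction. It asks how many of the pixels predicted as boundary really are boundary pixels, checked against the GT.
- Recall R starts from the GT. Given the N positive pixels on the GT boundary, it asks how many of them are actually predicted as positive, checked against the predicted mask.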
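To make the asymmetry concrete, the hypothetical example below gives a prediction that traces the entire GT boundary exactly but also adds a spurious contour elsewhere. Every GT pixel is recovered, so R = 1. However, only half of the predicted pixels are correct, so P = 0.5 and F ≈ 0.667. To keep the sketch short, it uses exact matching (no dilation), whereas the db_eval_boundary code in this post also applies the bound_pix tolerance. The helper name `boundary_prf` is hypothetical.

```python
import numpy as np

def boundary_prf(fg_boundary, gt_boundary):
    """Precision, recall and F of two boundary maps with exact (zero-tolerance) matching."""
    hits = np.logical_and(fg_boundary, gt_boundary).sum()
    precision = hits / fg_boundary.sum()   # judged from the prediction's side
    recall = hits / gt_boundary.sum()      # judged from the GT's side
    f = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return float(precision), float(recall), float(f)

gt_b = np.zeros((8, 8), dtype=bool)
gt_b[:, 1] = True          # GT boundary: 8 pixels in column 1
fg_b = gt_b.copy()
fg_b[:, 6] = True          # spurious extra contour: 8 more pixels in column 6

p, r, f = boundary_prf(fg_b, gt_b)
print(p, r, round(f, 3))   # 0.5 1.0 0.667
```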

The algorithm for this metric is as follows:

import numpy as np
from skimage.morphology import binary_dilation, disk

def db_eval_boundary(foreground_mask, gt_mask, bound_th=0.008):
    """
    Compute the boundary F-measure between foreground_mask and gt_mask.
    Calculates precision/recall for boundaries between foreground_mask and
    gt_mask using morphological operators to speed it up.

    Arguments:
        foreground_mask (ndarray): binary segmentation image.
        gt_mask         (ndarray): binary annotated image.
        bound_th        (float):   boundary tolerance; >= 1 is read as pixels,
                                   < 1 as a fraction of the image diagonal.

    Returns:
        F (float): boundaries F-measure
    """
    assert np.atleast_3d(foreground_mask).shape[2] == 1

    # Tolerance radius in pixels
    bound_pix = bound_th if bound_th >= 1 else \
            np.ceil(bound_th * np.linalg.norm(foreground_mask.shape))
    bound_pix = int(bound_pix)

    # Get the pixel boundaries of both masks
    fg_boundary = seg2bmap(foreground_mask)
    gt_boundary = seg2bmap(gt_mask)

    # Dilate each boundary by a disk so that small offsets are tolerated
    fg_dil = binary_dilation(fg_boundary, disk(bound_pix))
    gt_dil = binary_dilation(gt_boundary, disk(bound_pix))

    # Boundary pixels that fall inside the other mask's tolerance band
    gt_match = gt_boundary & fg_dil
    fg_match = fg_boundary & gt_dil

    # Number of boundary pixels in each mask
    n_fg = np.sum(fg_boundary)
    n_gt = np.sum(gt_boundary)

    # Compute precision and recall (with conventions for empty boundaries)
    if n_fg == 0 and n_gt > 0:
        precision = 1
        recall = 0
    elif n_fg > 0 and n_gt == 0:
        precision = 0
        recall = 1
    elif n_fg == 0 and n_gt == 0:
        precision = 1
        recall = 1
    else:
        precision = np.sum(fg_match) / float(n_fg)
        recall    = np.sum(gt_match) / float(n_gt)

    # Compute F measure (harmonic mean of precision and recall)
    if precision + recall == 0:
        F = 0
    else:
        F = 2 * precision * recall / (precision + recall)

    return F

def seg2bmap(seg, width=None, height=None):
    """
    From a segmentation, compute a binary boundary map with 1 pixel wide
    boundaries.  The boundary pixels are offset by 1/2 pixel towards the
    origin from the actual segment boundary.

    Arguments:
        seg     : Segments labeled from 1..k.
        width   : Width of desired bmap  <= seg.shape[1]
        height  : Height of desired bmap <= seg.shape[0]

    Returns:
        bmap (ndarray): Binary boundary map.

     David Martin <dmartin@eecs.berkeley.edu>
     January 2003
    """

    # Every non-zero label becomes foreground (np.bool was removed in NumPy 1.24)
    seg = seg.astype(bool)

    assert np.atleast_3d(seg).shape[2] == 1

    width  = seg.shape[1] if width  is None else width
    height = seg.shape[0] if height is None else height

    h, w = seg.shape[:2]

    ar1 = float(width) / float(height)
    ar2 = float(w) / float(h)

    # Use `or`: with `|` the comparisons would chain incorrectly
    assert not (width > w or height > h or abs(ar1 - ar2) > 0.01), \
        "Can't convert %dx%d seg to %dx%d bmap." % (w, h, width, height)

    # Shifted copies: east, south and south-east neighbours of each pixel
    e  = np.zeros_like(seg)
    s  = np.zeros_like(seg)
    se = np.zeros_like(seg)

    e[:, :-1]    = seg[:, 1:]
    s[:-1, :]    = seg[1:, :]
    se[:-1, :-1] = seg[1:, 1:]

    # A pixel is a boundary pixel if it differs from any of those neighbours
    b         = (seg ^ e) | (seg ^ s) | (seg ^ se)
    b[-1, :]  = seg[-1, :] ^ e[-1, :]
    b[:, -1]  = seg[:, -1] ^ s[:, -1]
    b[-1, -1] = 0

    if w == width and h == height:
        bmap = b
    else:
        # Map each boundary pixel onto the smaller grid (0-based indices)
        bmap = np.zeros((height, width), dtype=bool)
        for x in range(w):
            for y in range(h):
                if b[y, x]:
                    j = y * height // h
                    i = x * width // w
                    bmap[j, i] = True

    return bmap
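To see what seg2bmap produces, the sketch below reimplements only its same-size branch on a hypothetical 5×5 mask containing a 3×3 square. That branch uses a shifted-XOR trick: a pixel is marked as boundary when it differs from its east, south or south-east neighbour. The function name `boundary_map` is hypothetical. It follows the logic listed above but leaves out the resizing branch.

```python
import numpy as np

def boundary_map(seg):
    """Hypothetical same-size version of seg2bmap: 1-pixel-wide boundary map."""
    seg = np.asarray(seg, dtype=bool)
    e = np.zeros_like(seg)     # east neighbour of each pixel
    s = np.zeros_like(seg)     # south neighbour
    se = np.zeros_like(seg)    # south-east neighbour
    e[:, :-1] = seg[:, 1:]
    s[:-1, :] = seg[1:, :]
    se[:-1, :-1] = seg[1:, 1:]
    # Boundary where a pixel differs from any of those neighbours
    b = (seg ^ e) | (seg ^ s) | (seg ^ se)
    # Last row/column only compare along the direction that still exists
    b[-1, :] = seg[-1, :] ^ e[-1, :]
    b[:, -1] = seg[:, -1] ^ s[:, -1]
    b[-1, -1] = False
    return b

mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True          # a 3x3 square in the middle

print(boundary_map(mask).astype(int))
# [[1 1 1 1 0]
#  [1 0 0 1 0]
#  [1 0 0 1 0]
#  [1 1 1 1 0]
#  [0 0 0 0 0]]
```

The resulting ring covers rows and columns 0 to 3. That is the square's extent shifted one pixel toward the origin on the top and left sides, which is the half-pixel offset described in the docstring.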