Interactive Image Segmentation

步步咏凉天

已于 2023-10-08 19:47:11 修改

阅读量505

点赞数

分类专栏： CV 文章标签：深度学习

于 2023-07-15 22:48:34 首次发布

本文链接：https://blog.csdn.net/qq_39753778/article/details/131745215

版权

CV 专栏收录该内容

14 篇文章 1 订阅

订阅专栏

PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click

ECCV 2020
Nanyang Technological University, Singapore
Existing interactive object segmentation methods mainly
take spatial interactions such as bounding boxes or clicks as input.
However, these interactions do not contain information about explicit
attributes of the target-of-interest and thus cannot quickly specify
what the selected object exactly is, especially when there are diverse
scales of candidate objects or the target-of-interest contains multiple
objects. Therefore, excessive user interactions are often required to
reach desirable results. On the other hand, in existing approaches
attribute information of objects is often not well utilized in interactive
segmentation. We propose to employ phrase expressions as another
interaction input to infer the attributes of target object. In this way,
we can 1) leverage spatial clicks to locate the target object and 2)
utilize semantic phrases to qualify the attributes of the target object.
Specifically, the phrase expressions focus on “what” the target object
is and the spatial clicks are in charge of “where” the target object is,
which together help to accurately segment the target-of-interest with
smaller number of interactions. Moreover, the proposed approach is
flexible in terms of interaction modes and can efficiently handle complex
scenarios by leveraging the strengths of each type of input. Our multi-
modal phrase+click approach achieves new state-of-the-art performance
on interactive segmentation. To the best of our knowledge, this is the first
work to leverage both clicks and phrases for interactive segmentation.
在这里插入图片描述

FocalClick: Towards Practical Interactive Image Segmentation

阿里巴巴

CVPR2022
Interactive segmentation allows users to extract target masks by making positive/negative clicks. Although explored by many previous works, there is still a gap between academic approaches and industrial needs: first, existing models are not efficient enough to work on low power devices; second, they perform poorly when used to refine preexisting masks as they could not avoid destroying the correct part. FocalClick solves both issues at once by predicting and updating the mask in localized areas. For higher efficiency, we decompose the slow prediction on the entire image into two fast inferences on small crops: a coarse segmentation on the Target Crop, and a local refinement on the Focus Crop. To make the model work with preexisting masks, we formulate a sub-task termed Interactive Mask Correction, and propose Progressive Merge as the solution. Progressive Merge exploits morphological information to decide where to preserve and where to update, enabling users to refine any preexisting mask effectively. FocalClick achieves competitive results against SOTA methods with significantly smaller FLOPs. It also shows significant superiority when making corrections on preexisting masks. Code and data will be released at github.com/XavierCHEN34/ClickSEG
在这里插入图片描述

SimpleClick: Interactive Image Segmentation with Simple Vision Transformers

ICCV 2023
University of North Carolina at Chapel Hill
Click-based interactive image segmentation aims at extracting objects with a limited user clicking. A hierarchical backbone is the de-facto architecture for current methods. Recently, the plain, non-hierarchical Vision Transformer (ViT) has emerged as a competitive backbone for dense prediction tasks. This design allows the original ViT to be a foundation model that can be finetuned for downstream tasks without redesigning a hierarchical backbone for pretraining. Although this design is simple and has been proven effective, it has not yet been explored for interactive image segmentation. To fill this gap, we propose SimpleClick, the first interactive segmentation method that leverages a plain backbone. Based on the plain backbone, we introduce a symmetric patch embedding layer that encodes clicks into the backbone with minor modifications to the backbone itself. With the plain backbone pretrained as a masked autoencoder (MAE), SimpleClick achieves state-of-the-art performance. Remarkably, our method achieves 4.15 NoC@90 on SBD, improving 21.8% over the previous best result. Extensive evaluation on medical images demonstrates the generalizability of our method. We further develop an extremely tiny ViT backbone for SimpleClick and provide a detailed computational analysis, highlighting its suitability as a practical annotation tool.
在这里插入图片描述

Interactive Segmentation as Gaussian Process Classification

CVPR2023
西安交通大学
Click-based interactive segmentation (IS) aims to extract the target objects under user interaction. For this task, most of the current deep learning (DL)-based methods mainly follow the general pipelines of semantic segmentation. Albeit achieving promising performance, they do not fully and explicitly utilize and propagate the click information, inevitably leading to unsatisfactory segmentation results, even at clicked points. Against this issue, in this paper, we propose to formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image. To solve this model, we utilize amortized variational inference to approximate the intractable GP posterior in a data-driven manner and then decouple the approximated GP posterior into double space forms for efficient sampling with linear complexity. Then, we correspondingly construct a GP classification framework, named GPCIS, which is integrated with the deep kernel learning mechanism for more flexibility. The main specificities of the proposed GPCIS lie in: 1) Under the explicit guidance of the derived GP posterior, the information contained in clicks can be finely propagated to the entire image and then boost the segmentation; 2) The accuracy of predictions at clicks has good theoretical support. These merits of GPCIS as well as its good generality and high efficiency are substantiated by comprehensive experiments on several benchmarks, as compared with representative methods both quantitatively and qualitatively.
在这里插入图片描述

在这里插入图片描述

FocalClick: Towards Practical Interactive Image Segmentation

CVPR2022 阿里巴巴
Interactive segmentation allows users to extract target masks by making positive/negative clicks. Although explored by many previous works, there is still a gap between academic approaches and industrial needs: first, existing models are not efficient enough to work on low power devices; second, they perform poorly when used to refine preexisting masks as they could not avoid destroying the correct part. FocalClick solves both issues at once by predicting and updating the mask in localized areas. For higher efficiency, we decompose the slow prediction on the entire image into two fast inferences on small crops: a coarse segmentation on the Target Crop, and a local refinement on the Focus Crop. To make the model work with preexisting masks, we formulate a sub-task termed Interactive Mask Correction, and propose Progressive Merge as the solution. Progressive Merge exploits morphological information to decide where to preserve and where to update, enabling users to refine any preexisting mask effectively. FocalClick achieves competitive results against SOTA methods with significantly smaller FLOPs. It also shows significant superiority when making corrections on preexisting masks. Code and data will be released at github. com/XavierCHEN34/ClickSEG
在这里插入图片描述

$SAM^{Med}$ : A medical image annotation framework based on large vision model

华东师范大学
https://arxiv.org/pdf/2307.05617.pdf
Recently, large vision model, Segment Anything Model (SAM), has revolutionized the computer vision field, especially for image segmentation. SAM presented a new promptable segmentation paradigm that exhibit its remarkable zero-shot generalization ability. An extensive researches have explore the potential and limits of SAM in various downstream tasks. In this study, we presents SAMMed, an enhanced framework for medical image annotation that leverages the capabilities of SAM. SAMMed framework consisted of two submodules, namely SAMassist and SAMauto. The SAMassist demonstrates the generalization ability of SAM to the downstream medical segmentation task using the prompt-learning approach. Results show a significant improvement in segmentation accuracy with only approximately 5 input points. The SAMauto model aims to accelerate the annotation process by automatically generating input prompts. The proposed SAP-Net model achieves superior segmentation performance with only five annotated slices, achieving an average Dice coefficient of 0.80 and 0.82 for kidney and liver segmentation, respectively. Overall, SAMMed demonstrates promising results in medical image annotation. These findings highlight the potential of leveraging large-scale vision models in medical image annotation tasks.
在这里插入图片描述

AdaptiveClick: Clicks-aware Transformer with Adaptive Focal Loss for Interactive Image Segmentation

https://arxiv.org/abs/2305.04276
湖南大学
Interactive Image Segmentation (IIS) has emerged as a promising technique for decreasing annotation time. Substantial progress has been made in pre- and post-processing for IIS, but the critical issue of interaction ambiguity notably hindering segmentation quality, has been under-researched. To address this, we introduce AdaptiveClick – a clicks-aware transformer incorporating an adaptive focal loss, which tackles annotation inconsistencies with tools for mask- and pixel-level ambiguity resolution. To the best of our knowledge, AdaptiveClick is the first transformer-based, mask-adaptive segmentation framework for IIS. The key ingredient of our method is the Clicks-aware Mask-adaptive Transformer Decoder (CAMD), which enhances the interaction between clicks and image features. Additionally, AdaptiveClick enables pixel-adaptive differentiation of hard and easy samples in the decision space, independent of their varying distributions. This is primarily achieved by optimizing a generalized Adaptive Focal Loss (AFL) with a theoretical guarantee, where two adaptive coefficients control the ratio of gradient values for hard and easy pixels. Our analysis reveals that the commonly used Focal and BCE losses can be considered special cases of the proposed AFL loss. With a plain ViT backbone, extensive experimental results on nine datasets demonstrate the superiority of AdaptiveClick compared to state-of-the-art methods. Code will be publicly available at this https URL.

在这里插入图片描述

Focused and Collaborative Feedback Integration for Interactive Image Segmentation

CVPR 2023
清华
Interactive image segmentation aims at obtaining a segmentation mask for an image using simple user annotations. During each round of interaction, the segmentation result from the previous round serves as feedback to guide the user’s annotation and provides dense prior information for the segmentation model, effectively acting as a bridge between interactions. Existing methods overlook the importance of feedback or simply concatenate it with the original input, leading to underutilization of feedback and an increase in the number of required annotations. To address this, we propose an approach called Focused and Collaborative Feedback Integration (FCFI) to fully exploit the feedback for click-based interactive image segmentation. FCFI first focuses on a local area around the new click and corrects the feedback based on the similarities of high-level features. It then alternately and collaboratively updates the feedback and deep features to integrate the feedback into the features. The efficacy and efficiency of FCFI were validated on four benchmarks, namely GrabCut, Berkeley, SBD, and DAVIS. Experimental results show that FCFI achieved new state-of-the-art performance with less computational overhead than previous methods. The source code is available at https://github.com/veizgyauzgyauz/FCFI. 在这里插入图片描述