1. Motivation
- Instance segmentation requires costly annotations such as bounding boxes and segmentation masks for learning.
- We propose a fully unsupervised learning method that learns class-agnostic instance segmentation without any annotations.
- FreeSOLO, a novel localization-aware pre-training framework, contains two major pillars: Free Mask and Self-Supervised SOLO.
2. Contribution
- We propose the Free Mask approach, which leverages the specific design of SOLO to effectively extract coarse object masks and semantic embeddings in an unsupervised manner.
- We further propose Self-Supervised SOLO, which takes the coarse masks and semantic embeddings from Free Mask and trains the SOLO instance segmentation model, with several novel design elements to overcome label noise in the coarse masks.
- With the above methods, FreeSOLO presents a simple and effective framework that demonstrates unsupervised instance segmentation successfully for the first time. Notably, it outperforms some proposal generation methods that use manual annotations. FreeSOLO also outperforms state-of-the-art methods for unsupervised object detection/discovery by a significant margin (relative +100% in COCO AP).
- In addition, FreeSOLO serves as a strong self-supervised pretext task for representation learning for instance segmentation. For example, when fine-tuning on the COCO dataset with 5% labeled masks, FreeSOLO outperforms DenseCL [14] by +9.8% AP.
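The notes do not spell out how Free Mask turns a self-supervised backbone into coarse masks. Below is a minimal NumPy sketch under stated assumptions: dense features from some self-supervised backbone are sampled at the centre of each S x S grid cell to form queries (mirroring SOLO's grid), each query is compared with every pixel by cosine similarity, and the resulting score map is thresholded into a coarse mask ranked by a maskness score. The function name `free_mask_sketch`, the clipping used to turn similarities into soft masks, and the nearest-neighbour sampling are illustrative choices, not FreeSOLO's actual implementation; mask NMS and the Self-Supervised SOLO training stage are omitted.

```python
import numpy as np


def free_mask_sketch(features: np.ndarray, grid_size: int, threshold: float = 0.5):
    """Hypothetical Free Mask-style coarse mask extraction (illustrative only).

    features: (H, W, E) dense features from a self-supervised backbone (assumed input).
    Returns soft masks (S*S, H, W), binary masks (S*S, H, W) and maskness (S*S,).
    """
    h, w, e = features.shape
    # L2-normalise each pixel feature so dot products become cosine similarities
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    keys = features / np.maximum(norms, 1e-6)

    # Queries: one key vector sampled at the centre of each S x S grid cell
    # (nearest-neighbour sampling, an assumption made for simplicity)
    rows = ((np.arange(grid_size) + 0.5) * h / grid_size).astype(int)
    cols = ((np.arange(grid_size) + 0.5) * w / grid_size).astype(int)
    queries = keys[rows[:, None], cols[None, :]].reshape(-1, e)  # (S*S, E)

    # Cosine similarity of every query with every pixel: one score map per query
    sim = queries @ keys.reshape(-1, e).T  # (S*S, H*W)
    soft = np.clip(sim, 0.0, 1.0).reshape(-1, h, w)  # assumed normalisation

    binary = soft > threshold
    # Maskness: mean soft value over each mask's foreground pixels
    fg = binary.sum(axis=(1, 2))
    maskness = (soft * binary).sum(axis=(1, 2)) / np.maximum(fg, 1)
    return soft, binary, maskness


# Toy features: left half of the image is one "object", right half another
feats = np.zeros((8, 8, 2))
feats[:, :4, 0] = 1.0
feats[:, 4:, 1] = 1.0
soft, binary, maskness = free_mask_sketch(feats, grid_size=2)
print(maskness)  # [1. 1. 1. 1.]
```

Because every grid cell yields one candidate, an S x S grid produces S^2 coarse masks per image, many of them duplicates (in the toy example, cells in the same half produce identical masks), so some form of mask NMS is needed to prune them.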
3. Method
3.1 Background
A brief review of SOLO
- Two branches: a category branch and a mask branch.
- Mask branch: H x W x S^2 (one mask channel per grid cell).
- Category branch: S x S x C (C class scores per grid cell).
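The grid-based assignment behind these shapes can be illustrated with a small sketch. It assumes an object is assigned only to the grid cell containing its centre (SOLO itself uses a small centre region, omitted here) and uses the SOLO convention that the mask for cell (i, j) lives in channel k = i * S + j of the H x W x S^2 mask output. The function name `solo_assign` is hypothetical.

```python
import numpy as np


def solo_assign(center_xy, image_hw, grid_size, num_classes, class_id):
    """Map an object centre to its SOLO grid cell and mask channel (sketch).

    center_xy: (cx, cy) in pixels; image_hw: (H, W) of the image.
    Returns the cell (i, j), the mask channel k and the S x S x C category target.
    """
    cx, cy = center_xy
    h, w = image_hw
    # Row i from the y coordinate, column j from the x coordinate
    i = min(int(cy * grid_size / h), grid_size - 1)
    j = min(int(cx * grid_size / w), grid_size - 1)
    # Channel of the H x W x S^2 mask output that should hold this instance
    k = i * grid_size + j
    # One-hot category target on the S x S grid
    cate = np.zeros((grid_size, grid_size, num_classes), dtype=np.float32)
    cate[i, j, class_id] = 1.0
    return (i, j), k, cate


# Example: a 100 x 200 image, object centred at x=150, y=30, S=5, C=3
cell, k, cate = solo_assign((150, 30), (100, 200), grid_size=5, num_classes=3, class_id=2)
print(cell, k)  # (1, 3) 8
```

At inference the same index works in reverse: if cell (i, j) scores high for some class in the category branch, its instance mask is read from channel i * S + j of the mask output.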
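SOLOv2
- Adds Matrix NMS, a parallel NMS computed on masks.
- Adds a dynamic mask branch, decoupled into a mask kernel branch (S x S x D) and a mask feature branch (H x W x E).
- With a 1x1 kernel, D = E; with a 3x3 kernel, D = 9E. The D-dimensional vector predicted at each grid cell serves as the convolution kernel applied to the H x W x E feature map, so it needs E weights per kernel position (1 position for 1x1, 9 for 3x3).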
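The D = E / D = 9E relation can be checked with a small NumPy sketch of the dynamic convolution: each grid cell's D-dimensional kernel is reshaped into a k x k x E filter and applied to the H x W x E mask features, giving one H x W mask per cell. The (ky, kx, E) ordering of the flattened kernel and the function name `dynamic_masks` are assumptions for illustration; the sketch returns mask logits, and a real model would apply a sigmoid afterwards.

```python
import numpy as np


def dynamic_masks(kernels: np.ndarray, features: np.ndarray, ksize: int) -> np.ndarray:
    """Apply per-grid-cell dynamic kernels to shared mask features (sketch).

    kernels: (S, S, D) from the mask kernel branch, with D = E * ksize * ksize.
    features: (H, W, E) from the mask feature branch.
    Returns mask logits of shape (S*S, H, W), one mask per grid cell.
    """
    h, w, e = features.shape
    d = kernels.shape[-1]
    if d != e * ksize * ksize:
        raise ValueError(f"expected D = {e * ksize * ksize}, got {d}")
    pad = ksize // 2
    padded = np.pad(features, ((pad, pad), (pad, pad), (0, 0)))
    # Gather each pixel's ksize x ksize neighbourhood: (H, W, ksize, ksize, E)
    patches = np.stack(
        [np.stack([padded[dy:dy + h, dx:dx + w] for dx in range(ksize)], axis=2)
         for dy in range(ksize)],
        axis=2,
    )
    patches = patches.reshape(h, w, d)     # flattened in (ky, kx, E) order
    flat_kernels = kernels.reshape(-1, d)  # (S*S, D), same assumed order
    # Convolution = dot product between each kernel and each neighbourhood
    return np.einsum("hwd,nd->nhw", patches, flat_kernels)


rng = np.random.default_rng(0)
S, H, W, E = 2, 6, 6, 4
feats = rng.standard_normal((H, W, E))
k1 = rng.standard_normal((S, S, E))  # 1x1 kernels: D = E
m1 = dynamic_masks(k1, feats, ksize=1)
# A 3x3 kernel (D = 9E) whose only non-zero weights sit at the centre tap
k3 = np.zeros((S, S, 9 * E))
k3[..., 4 * E:5 * E] = k1
m3 = dynamic_masks(k3, feats, ksize=3)
print(m1.shape, np.allclose(m1, m3))  # (4, 6, 6) True
```

Decoupling this way means the H x W x E feature map is computed once and shared, while each grid cell only predicts a D-dimensional vector instead of a full H x W mask channel as in SOLO.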