Reading notes: SOLO: Segmenting Objects by Locations

brightandjk

Article link: https://blog.csdn.net/brightandjk/article/details/107252842

Abstract

Instance segmentation:

The mainstream approach is "detect-then-segment", as in Mask R-CNN.

This paper introduces the notion of "instance categories".

It assigns a category to each pixel within an instance according to the instance's location and size, thus converting instance mask segmentation into a classification-solvable problem.

SOLO is simpler and more flexible than Mask R-CNN.

Introduction

Conclusion: the core idea of SOLO is to separate object instances by their locations and sizes.

Location: an image can be divided into a grid of S×S cells, leading to S² center location classes.

grid cell = center location category

center location category = channel axis

Therefore, each output channel is responsible for one center location category.
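
The mapping from an object center to a grid cell, and from the grid cell to an output channel, can be sketched as follows. This is only an illustration, not the authors' code: the function name `center_to_channel` is hypothetical, and the row-major layout k = i·S + j is assumed to match the indexing used in the Inference notes.

```python
def center_to_channel(cx, cy, img_w, img_h, S):
    """Map an object center (in pixels) to its grid cell and channel index.

    Assumption: row i indexes the vertical axis, column j the horizontal
    axis, and channels are laid out row-major, k = i * S + j.
    """
    # Clamp so that a center on the right/bottom border stays inside the grid.
    j = min(int(cx / img_w * S), S - 1)
    i = min(int(cy / img_h * S), S - 1)
    k = i * S + j
    return i, j, k


# Example: a 5x5 grid over a 500x400 image, object centered at (260, 90).
i, j, k = center_to_channel(260, 90, img_w=500, img_h=400, S=5)
print(i, j, k)  # 1 2 7
```

With S = 5 there are 25 center location classes, so the mask branch would need 25 output channels under this layout.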

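Location prediction is converted from regression into classification: with classification, it is more straightforward to model a varying number of instances with a fixed number of channels, and no post-processing is required.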

Size: to distinguish instances of different sizes, SOLO uses FPN, which is designed for detecting objects of different sizes in an image.

SOLO converts coordinate regression into classification through discrete quantization.

Purpose: to avoid the heuristic coordinate normalization and log-transformation typically used in detectors such as YOLO.

Related Work

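Instance segmentation is arguably the hardest of the four core vision tasks. It shares the nature of semantic segmentation, since it requires pixel-level classification, and part of the nature of object detection, since different instances must be localized even when they belong to the same class. Research on instance segmentation has therefore long relied on fairly complex two-stage methods, which follow two lines: bottom-up methods based on semantic segmentation and top-down methods based on detection.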

Top-down Instance Segmentation

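SOLO is totally box-free and thus not restricted by (anchor) box locations and scales; it naturally benefits from the inherent advantages of FCNs.

Top-down instance segmentation works as follows: first an object detector finds the region (bounding box) containing each instance, then semantic segmentation is performed inside each detected box, and each segmentation result is output as a separate instance.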

Bottom-up Instance Segmentation

SOLO directly learns with the instance mask annotations solely during training, and predicts instance masks and semantic categories end-to-end without grouping post-processing.

SOLO is a direct end-to-end instance segmentation approach

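Bottom-up instance segmentation works as follows: first perform pixel-level semantic segmentation, then separate the different instances by means such as clustering or metric learning.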

Direct Instance Segmentation

SOLO takes an image as input, directly outputs instance masks and corresponding class probabilities, in a fully convolutional, box-free and grouping-free paradigm.

Our simple network can be optimized end-to-end without the need of box supervision. To predict, the network directly maps an input image to masks for each individual instance, relying on neither intermediate operators like RoI feature cropping, nor grouping post-processing.

Our Method: SOLO

Problem Formulation

Central idea of the SOLO framework:

Reformulate instance segmentation as two simultaneous problems:

category-aware prediction: predicting the semantic category of an object
instance-aware mask generation: segmenting that object instance

Semantic Category

For each grid cell, SOLO predicts a C-dimensional output that indicates the semantic class probabilities of the instance, where C is the number of classes.

If we divide the input image into S×S grids, the output space will be S×S×C.

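Supplement:

The feature map produced by FPN has size H×W and is resized to S×S. Three methods can be used:

1. direct bilinear interpolation

2. adaptive pooling

3. region-grid interpolation

According to the authors' experiments, the three methods perform almost the same.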
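
As an illustration of the second option, the sketch below resizes an H×W map to S×S by averaging over bins. It is a plain-Python stand-in written for these notes, not the authors' implementation; the function name `adaptive_avg_pool2d` and the choice of average (rather than max) pooling are assumptions.

```python
def adaptive_avg_pool2d(feat, S):
    """Resize an H x W feature map (a list of rows) to S x S by bin averaging."""
    H, W = len(feat), len(feat[0])
    out = []
    for i in range(S):
        # Bin boundaries follow the usual adaptive-pooling rule:
        # start = floor(i * H / S), end = ceil((i + 1) * H / S).
        r0, r1 = (i * H) // S, -((-(i + 1) * H) // S)
        row = []
        for j in range(S):
            c0, c1 = (j * W) // S, -((-(j + 1) * W) // S)
            vals = [feat[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


# Example: a 4x4 map holding 0..15 pooled down to a 2x2 grid.
feat = [[4 * r + c for c in range(4)] for r in range(4)]
print(adaptive_avg_pool2d(feat, 2))  # [[2.5, 4.5], [10.5, 12.5]]
```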

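Positive and negative samples: a grid cell is a positive sample if it falls into the center region of a ground truth mask, and a negative sample otherwise. Given the center (cx, cy), width w and height h of a ground truth mask, the center region is (cx, cy, 0.2w, 0.2h). With the factor set to 0.2, each ground truth mask yields about 3 positive samples on average.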
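
A minimal sketch of this assignment is given below. It assumes that "falls into the center region" means the grid cell is touched by the box of size (0.2w, 0.2h) centered at (cx, cy); the notes do not spell out the exact rule, so this interpretation, the function name `positive_cells` and the parameter name `eps` are all assumptions.

```python
def positive_cells(cx, cy, w, h, img_w, img_h, S, eps=0.2):
    """Return the grid cells (i, j) covered by the center region of one mask."""
    # Center region: a box of size (eps * w, eps * h) centered at (cx, cy).
    left, right = cx - 0.5 * eps * w, cx + 0.5 * eps * w
    top, bottom = cy - 0.5 * eps * h, cy + 0.5 * eps * h

    def to_index(v, size):
        # Pixel coordinate -> grid index, clamped to [0, S - 1].
        return max(0, min(S - 1, int(v / size * S)))

    j0, j1 = to_index(left, img_w), to_index(right, img_w)
    i0, i1 = to_index(top, img_h), to_index(bottom, img_h)
    return [(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]


# Example: 400x400 image, 8x8 grid (50-pixel cells), a 200x100 object
# centered at (190, 210). Its center region spans x in [170, 210].
print(positive_cells(190, 210, 200, 100, 400, 400, S=8))  # [(4, 3), (4, 4)]
```

Every cell that is not returned for any ground truth mask would be treated as a negative sample.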

Instance Mask

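A spatially variant model is needed here, or more precisely a position-sensitive one, because the segmentation masks are conditioned on the grid cells and must be separated by different feature channels.

If the original feature tensor has size H×W×D, the new tensor has size H×W×(D+2), where the last two channels hold the x-y pixel coordinates.
We create a tensor with the same spatial size as the input that contains the pixel coordinates normalized to [−1, 1].
This tensor is then concatenated to the input features and passed to the following layers.

By giving the convolution access to its own input coordinates, we add spatial functionality to the conventional FCN model.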
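
The coordinate channels can be sketched in plain Python as follows. This is an illustrative toy, not the authors' code: features are stored channel-first as a list of D maps (the notes use H×W×D), and the names `coord_channels` and `add_coords` are hypothetical.

```python
def coord_channels(H, W):
    """Build two H x W maps holding x and y coordinates normalized to [-1, 1]."""
    def lin(n):
        # n evenly spaced values from -1 to 1 (a single value maps to 0).
        return [0.0] if n == 1 else [-1 + 2 * t / (n - 1) for t in range(n)]

    xs, ys = lin(W), lin(H)
    x_map = [[xs[c] for c in range(W)] for _ in range(H)]
    y_map = [[ys[r]] * W for r in range(H)]
    return x_map, y_map


def add_coords(feat):
    """feat: list of D channel maps, each H x W. Returns D + 2 channels."""
    H, W = len(feat[0]), len(feat[0][0])
    x_map, y_map = coord_channels(H, W)
    return feat + [x_map, y_map]


# Example: one 2x3 feature channel becomes three channels.
out = add_coords([[[0.0] * 3 for _ in range(2)]])
print(len(out))   # 3
print(out[1][0])  # [-1.0, 0.0, 1.0]
```

Because every pixel now carries its own position, a convolution applied to the result can produce outputs that depend on location.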

Forming Instance Segmentation.

NMS is used to obtain the final instance segmentation results.

Network Architecture

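To demonstrate the generality and effectiveness of the approach, the authors instantiate SOLO with multiple architectures. The differences include:

the backbone architecture used for feature extraction;
the head used for computing the instance segmentation results;
the training loss function used to optimize the model.

Head structure: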

SOLO Learning

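Label assignment:

Category prediction branch

Positive sample: grid (i, j) is considered a positive sample if it falls into the center region of any ground truth mask.

In other words, if the overlap between grid (i, j) and the center region of a ground truth mask exceeds a certain threshold, the cell is treated as a positive sample.

Negative sample: every other grid cell. As noted under Semantic Category, the center region is (cx, cy, 0.2w, 0.2h), which gives about 3 positive samples per ground truth on average.

Every positive sample has a corresponding ground truth: a binary mask.

Once the positive grid cells have been marked in the upper (category) branch, the matching channel among the S² channels of the lower (mask) branch is labeled with that mask.

Mask prediction branch

Each positive sample provides a binary segmentation mask.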

Loss Function

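For the Category Branch, Focal Loss is used. Focal Loss does better than BCE when most of the mask is negative (background).
For the Mask Branch, Dice Loss is used. It achieves better results without any hyperparameter tuning.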
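
To make the mask loss concrete, here is a small sketch of a Dice loss on flattened masks, using the form 1 − 2Σpq / (Σp² + Σq²). The smoothing term `eps` is an assumption added for numerical safety, and the function is an illustration rather than the authors' implementation.

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss between a soft mask and a binary mask (flat lists in [0, 1])."""
    inter = sum(p * q for p, q in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(q * q for q in target)
    # eps (an assumed smoothing term) keeps the ratio defined for empty masks.
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


# A perfect prediction gives a loss of 0; disjoint masks give a loss near 1.
print(dice_loss([1, 0, 1, 0], [1, 0, 1, 0]))
print(dice_loss([1, 1, 0, 0], [0, 0, 1, 1]))
```

The loss depends only on the overlap ratio, so adding more correctly predicted background pixels does not change it. This is one way to see why no foreground/background weighting has to be tuned.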

Inference

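Inference with SOLO is very straightforward.

image → backbone network and FPN → category score p_{i,j} at grid (i, j) and the corresponding mask m_k, where k = i·S + j

A confidence threshold of 0.1 is used to filter out low-confidence predictions.

The top 500 scoring masks are then selected and fed into the NMS operation. To convert the predicted soft masks into binary masks, they are binarized with a threshold of 0.5. We keep the top 100 instance masks for evaluation.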
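
The post-processing steps can be sketched as below. The thresholds 0.1, 500, 0.5 and 100 come from the notes; the greedy mask NMS and its IoU threshold `nms_iou=0.5` are assumptions, since the notes do not state which NMS variant or threshold is used. Masks are flat lists for brevity, and all function names are hypothetical.

```python
def mask_iou(a, b):
    """IoU of two binary masks given as flat lists of 0/1."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 0.0


def postprocess(scores, soft_masks, score_thr=0.1, pre_nms_top=500,
                mask_thr=0.5, nms_iou=0.5, max_out=100):
    """scores[k] is the best class score of grid cell k, soft_masks[k] its mask.

    nms_iou is an assumed value; the other defaults follow the notes.
    """
    # 1. Drop low-confidence cells, then keep the highest-scoring candidates.
    cand = [k for k, s in enumerate(scores) if s > score_thr]
    cand.sort(key=lambda k: scores[k], reverse=True)
    cand = cand[:pre_nms_top]
    # 2. Binarize the soft masks.
    binary = {k: [1 if v > mask_thr else 0 for v in soft_masks[k]] for k in cand}
    # 3. Greedy mask NMS: keep a mask only if it does not overlap a kept one.
    keep = []
    for k in cand:
        if all(mask_iou(binary[k], binary[m]) <= nms_iou for m in keep):
            keep.append(k)
    # 4. Return at most max_out instances as (channel, score, binary mask).
    return [(k, scores[k], binary[k]) for k in keep[:max_out]]


scores = [0.9, 0.8, 0.05, 0.6]
soft_masks = [[0.9, 0.8, 0.1, 0.0], [0.7, 0.9, 0.2, 0.1],
              [0.9, 0.9, 0.9, 0.9], [0.1, 0.0, 0.8, 0.9]]
print([k for k, _, _ in postprocess(scores, soft_masks)])  # [0, 3]
```

In the example, channel 2 is removed by the score threshold and channel 1 is suppressed because it duplicates the higher-scoring channel 0.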

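Convolution is translation-invariant. The mask branch resembles semantic segmentation: it is an FCN (fully convolutional network) and is therefore translation-invariant as well. In this method, however, the masks are not generated directly but are conditioned on the grid location (S² channels), so translation variance is required.
How is translation variance achieved? By building the D+2 channel tensor, which provides global position information.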

Recap:

Experiments

Results: SOLO achieves good results even under challenging conditions.

How SOLO Works?

SOLO converts the instance segmentation problem into a position-aware classification task

Only one instance will be activated at each grid, and one instance may be predicted by multiple adjacent mask channels. During inference, we use NMS to suppress these redundant masks.

The final result is the superposition of the instance masks of all grid cells (that is, of every output channel), after which NMS filters out the redundant ones.

Ablation Experiments

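1. Grid number: this shows the importance of FPN in handling multi-scale prediction.

2. Multi-level Prediction

3. CoordConv: a single CoordConv already makes the prediction spatially variant and position sensitive.

4. Loss function: Dice Loss achieves the best results without manual tuning of the loss hyperparameters. Dice Loss views the pixels as a whole and automatically establishes the right balance between foreground and background pixels.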

Alignment in the category branch:

Here, we compare three common implementations: interpolation, adaptive-pool, and region-grid-interpolation. There is no noticeable performance gap between these variants (± 0.1 AP), indicating the alignment process is fairly flexible.

The three methods make little difference.

Different head depth:

The authors compare different head depths.

When the depth grows beyond 7, the performance becomes stable.

In this paper, a depth of 7 is used in the other experiments.