Reading notes: Mask R-CNN (Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick; Facebook AI Research)

brightandjk

Category: Mask R-CNN. Tag: computer vision

Article link: https://blog.csdn.net/brightandjk/article/details/107322024

abstract

method: Mask R-CNN
Adds a branch to Faster R-CNN that predicts an object mask,
in parallel with the existing branch for bounding-box recognition.
Adds only a small overhead to Faster R-CNN; runs at 5 fps.
Easy to generalize to other tasks (e.g. human pose estimation).

introduction

Our goal in this work is to develop a comparably enabling framework for instance segmentation.

instance segmentation:

correctly detecting all objects in an image

precisely segmenting each instance

therefore:
instance segmentation = object detection + semantic segmentation

Mask R-CNN:
Mask R-CNN = Faster R-CNN (classification + bounding-box regression) + a branch predicting a segmentation mask on each RoI (a small FCN applied to each RoI)

Mask R-CNN extends Faster R-CNN: an FCN segments every proposal box produced by Faster R-CNN, and this segmentation task runs in parallel with localization and classification.

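CNN feature extraction
The RPN (optionally on top of an FPN backbone) generates N proposal windows per image
Each proposal window is mapped onto the feature map
RoIAlign turns the features of each proposal into a fixed-size feature map
Finally, fully connected layers perform classification and box regression, and a small FCN predicts the mask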

RoIAlign

Improves mask accuracy by a relative 10% to 50%:
RoIAlign replaces the RoI Pooling used in Faster R-CNN,
because Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs.
It is essential to decouple mask and class prediction:
predict a binary mask for each class independently (segmentation)
predict the category with the network's RoI classification branch (class prediction)

These two predictions are made by two separate branches.

Related Work

FCIS (fully convolutional instance segmentation). Shortcoming: it makes errors on overlapping instances and produces spurious predictions.

Comparison of FCIS and Mask R-CNN (which is based on an instance-first strategy)

Mask R-CNN

Outputs for each candidate object:
a class label
a bounding-box offset
a mask output (requiring a much finer spatial layout of the object)
Essential ingredient: pixel-to-pixel alignment

Mask R-CNN:

Structure

Stage 1: RPN
Stage 2: for each RoI found by the RPN, predict the class and box offset (bounding-box classification and regression)
In parallel, output a binary mask for each RoI

loss function

L = L_cls + L_box + L_mask: classification loss + bounding-box loss + the average binary cross-entropy loss (mask)
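The mask term of the loss can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`mask_loss`, `total_loss`) and the nested-list layout of the K per-class masks are assumptions made for this example. It shows a per-pixel sigmoid followed by binary cross-entropy, averaged over the m x m mask of the ground-truth class only, so the masks of the other classes do not contribute.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mask_loss(mask_logits, gt_mask, gt_class):
    """Average binary cross-entropy on the mask of the ground-truth class.

    mask_logits: K masks, each an m x m nested list of raw scores (hypothetical layout).
    gt_mask:     m x m nested list of 0/1 ground-truth pixels.
    gt_class:    index k of the ground-truth class; the other K-1 masks are ignored.
    """
    logits_k = mask_logits[gt_class]
    total, count = 0.0, 0
    for row_logits, row_gt in zip(logits_k, gt_mask):
        for z, y in zip(row_logits, row_gt):
            p = sigmoid(z)  # per-pixel sigmoid, no competition between classes
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count


def total_loss(l_cls, l_box, l_mask):
    # Multi-task loss: L = L_cls + L_box + L_mask
    return l_cls + l_box + l_mask


# Two classes, 2 x 2 masks; the ground-truth class is 1.
mask_logits = [
    [[0.0, 0.0], [0.0, 0.0]],    # class 0 mask, ignored for this RoI
    [[2.0, -2.0], [-2.0, 2.0]],  # class 1 mask
]
gt_mask = [[1, 0], [0, 1]]
print(mask_loss(mask_logits, gt_mask, gt_class=1))
```

Because only the mask of class k enters the loss, changing the scores of the other classes leaves L_mask unchanged, which is the decoupling of mask and class prediction described in these notes.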

RoIAlign
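Solves the coordinate misalignment problem

Avoids any quantization of the RoI boundaries or bins
Uses bilinear interpolation [17] to compute the exact feature values at the sampled locations in each bin, and aggregates the results (using max or average pooling)
RoIWarp overlooked the alignment issue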

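What does avoiding quantization of the boundaries mean?
RoIPooling quantizes twice:
the boundaries of the candidate box are quantized
the quantized box is divided evenly into K*K cells, and the boundaries of each cell are quantized again
The authors summarize this as the "misalignment" problem.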
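The effect of the two quantization steps can be made concrete with a little arithmetic. The sketch below is a simplification under stated assumptions: a feature stride of 16, a 7 x 7 output grid, and plain floor rounding in both steps (real RoIPool implementations round bin edges somewhat differently). The function name `roipool_quantization` is hypothetical.

```python
import math


def roipool_quantization(roi_size_px, stride=16, bins=7):
    """Track one RoI side length through the two RoIPool quantizations."""
    exact_feat = roi_size_px / stride          # exact size on the feature map
    quant_feat = math.floor(exact_feat)        # 1st quantization: box boundary
    exact_bin = exact_feat / bins              # bin size if nothing were rounded
    quant_bin = math.floor(quant_feat / bins)  # 2nd quantization: bin boundary
    covered_px = quant_bin * bins * stride     # extent actually pooled, in image pixels
    return exact_feat, quant_feat, exact_bin, quant_bin, covered_px


exact_feat, quant_feat, exact_bin, quant_bin, covered_px = roipool_quantization(200)
print(exact_feat, quant_feat)  # 12.5 feature cells become 12
print(exact_bin, quant_bin)    # about 1.79 cells per bin become 1
print(covered_px)              # 112 of the original 200 pixels
```

Under these assumptions, a 200-pixel side ends up pooled over only 112 pixels' worth of features. An offset of that size matters little for classification but is significant for a pixel-accurate mask.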

A concrete worked example:
https://blog.csdn.net/qq_16065939/article/details/84641916?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522159471604919725247655915%2522%252C%2522scm%2522%253A%252220140713.130102334…%2522%257D&request_id=159471604919725247655915&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2_alltop_click~default-2-84641916.pc_ecpm_v3_pc_rank_v3&utm_term=mask+r+c+n+n

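How is quantization of the boundaries avoided?

Iterate over every candidate region, keeping its floating-point boundaries unquantized.
Divide the candidate region into k x k cells; the cell boundaries are not quantized either.
In each cell, compute four fixed sampling positions, obtain the value at each of them by bilinear interpolation, then apply max pooling.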
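The three steps above can be sketched in plain Python. This is an illustrative sketch rather than a reference implementation. It assumes a single-channel feature map stored as a nested list, a RoI already expressed in feature-map coordinates, feature values located at integer coordinates, and four sampling points per cell on a regular 2 x 2 grid. The names `bilinear`, `roi_align`, and `pool` are hypothetical.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map at a floating-point (y, x)."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the map
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, roi, k=2, samples=2, pool=max):
    """roi = (y1, x1, y2, x2) as floats; no coordinate is ever rounded."""
    y1, x1, y2, x2 = roi
    bin_h, bin_w = (y2 - y1) / k, (x2 - x1) / k  # float cell sizes
    out = []
    for i in range(k):
        row = []
        for j in range(k):
            vals = []
            for sy in range(samples):      # samples * samples points per cell
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    vals.append(bilinear(feat, y, x))
            row.append(pool(vals))         # max here; an average also works
        out.append(row)
    return out


# Feature map whose value equals its column index, so interpolation returns x itself.
feat = [[float(c) for c in range(4)] for _ in range(4)]
print(roi_align(feat, (0.5, 0.5, 2.5, 2.5)))
```

Passing `pool=lambda v: sum(v) / len(v)` switches the aggregation from max to average pooling, the other option mentioned in these notes.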

[Figure: visual comparison of RoIPool and RoIAlign]

Network Architecture:
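Different backbones are used: ResNet-50, ResNet-101, ResNeXt-50, ResNeXt-101.

Different head architectures are used as well. When Faster R-CNN uses ResNet-50, features are taken from conv4 (C4) and fed to the RPN; this configuration is called ResNet-50-C4.

Besides the structures above, the authors also use a more effective backbone, FPN
(https://blog.csdn.net/wangdongwei0/article/details/83110305)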

implementation details

Not read in detail.

Experiments: Instance Segmentation

Architecture

Mask R-CNN improves with deeper networks and with more advanced network designs.

Multinomial vs. Independent Masks:
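
The AP with sigmoid (independent per-class binary masks) and the AP with softmax (multinomial masks) differ substantially, which shows that decoupling class prediction from mask prediction is necessary.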

Class-Specific vs. Class-Agnostic Masks:

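Mask R-CNN as normally used predicts class-specific masks: one m x m mask per class, from which the mask of the predicted class is selected. With class-agnostic masks, where the mask branch outputs a single m x m mask regardless of class, the results are similar: 29.7 (class-agnostic) vs. 30.3 (class-specific).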
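The difference between the two variants at inference time comes down to which mask is read out. The sketch below assumes the mask branch output is a nested list of per-pixel probabilities, with K masks in the class-specific case and a single mask in the class-agnostic case, and binarizes at 0.5. The function name `select_mask` and the data layout are hypothetical.

```python
def select_mask(mask_probs, predicted_class, threshold=0.5):
    """Pick the mask to keep for one RoI and binarize it.

    mask_probs: K x m x m (class-specific) or 1 x m x m (class-agnostic).
    predicted_class: class index k chosen by the classification branch.
    """
    # Class-specific: take the k-th mask. Class-agnostic: only one mask exists.
    k = predicted_class if len(mask_probs) > 1 else 0
    return [[1 if p >= threshold else 0 for p in row] for row in mask_probs[k]]


class_specific = [
    [[0.9, 0.1], [0.2, 0.8]],  # mask for class 0
    [[0.1, 0.9], [0.7, 0.3]],  # mask for class 1
]
class_agnostic = [[[0.6, 0.4], [0.5, 0.2]]]

print(select_mask(class_specific, predicted_class=1))
print(select_mask(class_agnostic, predicted_class=1))
```

In both cases the class label comes from the classification branch alone; the mask branch only has to answer "which pixels belong to the object".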

Bounding Box Detection Results

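Mask R-CNN is more accurate than Faster R-CNN.
Faster R-CNN is more accurate when it uses RoIAlign.
The gap between Mask R-CNN's mask AP and box AP is small: for example, AP 37.1 (mask, Table 1) versus AP 39.8 (box, Table 3), a difference of only 2.7 points. This indicates that the approach largely closes the gap between object detection and the more challenging instance segmentation task.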

Mask R-CNN for Human Pose Estimation

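A keypoint's location is modeled as a one-hot mask, and Mask R-CNN is used to predict K masks, one for each of the K keypoint types (e.g. left shoulder, right elbow). This task helps demonstrate the flexibility of Mask R-CNN.
Alignment is crucial for pixel-level localization, for both masks and keypoints.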
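The one-hot keypoint encoding can be sketched as below. The helper names (`keypoint_target`, `keypoint_loss`, `decode_keypoint`) are hypothetical, and the loss shown (cross-entropy over a softmax across all m x m positions, which encourages a single detected point) is written from memory of the paper rather than taken from these notes, so treat it as an assumption.

```python
import math


def keypoint_target(y, x, m):
    """One-hot m x m mask: only the pixel holding the keypoint is foreground."""
    target = [[0] * m for _ in range(m)]
    target[y][x] = 1
    return target


def keypoint_loss(logits, y, x):
    """Cross-entropy of an (m*m)-way softmax against the one-hot target at (y, x)."""
    flat = [v for row in logits for v in row]
    m = len(logits[0])
    mx = max(flat)  # subtract the max for numerical stability
    log_z = mx + math.log(sum(math.exp(v - mx) for v in flat))
    return log_z - flat[y * m + x]


def decode_keypoint(logits):
    """Predicted location = position of the highest score."""
    best = max((v, r, c) for r, row in enumerate(logits) for c, v in enumerate(row))
    return best[1], best[2]


# One mask per keypoint type; here a single 2 x 2 mask for, say, "left shoulder".
target = keypoint_target(1, 0, m=2)
logits = [[0.0, 0.0], [5.0, 0.0]]
print(target, decode_keypoint(logits), keypoint_loss(logits, 1, 0))
```

For K keypoint types the same computation is repeated on K independent masks, one per type.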

References:
https://blog.csdn.net/qq_37392244/article/details/88844681?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.compare&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.compare

https://blog.csdn.net/myGFZ/article/details/79136610?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522159463374719195239843550%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=159463374719195239843550&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2_allfirst_rank_ecpm_v3~pc_rank_v3-3-79136610.pc_ecpm_v3_pc_rank_v3&utm_term=mask+r+c+n+n

https://blog.csdn.net/wangdongwei0/article/details/83110305?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.compare&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.compare
