Mask RCNN -- Mask Scoring R-CNN

Latest recommended article published 2022-10-14 16:07:38

Blue_Whale2020


Tags: artificial intelligence, python, object detection

Article link: https://blog.csdn.net/blue_whale2020/article/details/122504289


https://zhuanlan.zhihu.com/p/37998710
https://blog.csdn.net/qq_37392244/article/details/88844681

———————————————————————————————————————————

Inter-class competition in semantic segmentation (why Mask R-CNN chose sigmoid)

semantic segmentation, which typically uses a per-pixel softmax and a multinomial cross-entropy loss. In that case, masks across classes compete; in our case, with a per-pixel sigmoid and a binary loss, they do not.
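
To make the "compete / do not compete" distinction concrete, here is a minimal sketch for a single pixel. The logit values are made up for illustration; only the standard `math` module is used.

```python
import math

def softmax(logits):
    """Softmax over the class logits of one pixel; the outputs sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Independent per-class probability for one pixel."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits of one pixel for K = 3 classes.
logits = [2.0, 1.0, -1.0]
raised = [2.0, 3.0, -1.0]  # only the class-1 logit is increased

# Softmax: raising class 1 pushes the probability of class 0 down.
p_soft_before = softmax(logits)
p_soft_after = softmax(raised)

# Sigmoid: class 0 is unaffected by what happens to class 1.
p_sig_before = [sigmoid(z) for z in logits]
p_sig_after = [sigmoid(z) for z in raised]

print("softmax class 0:", p_soft_before[0], "->", p_soft_after[0])
print("sigmoid class 0:", p_sig_before[0], "->", p_sig_after[0])
```

With softmax the class-0 probability drops even though its own logit did not change; with per-class sigmoids it stays exactly the same, which is the sense in which the masks "do not compete".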

See the book Deep Learning, Chapter 6, Section 6.2 (output units).

sigmoid maps a value into the interval (0, 1) for binary classification; softmax is used for multi-class classification: https://blog.csdn.net/u014422406/article/details/52805924
This post covers using sigmoid together with cross entropy, which simplifies the form of the derivative:
https://zhuanlan.zhihu.com/p/35709485
This post covers using softmax together with cross entropy, which works out much like the sigmoid case: https://zhuanlan.zhihu.com/p/25723112
But if mean squared error is used in place of cross entropy, the gradient can vanish:

https://zhuanlan.zhihu.com/p/35707643
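
For reference, the simplification and the vanishing-gradient issue can be seen in a short derivation. This is the standard textbook calculation for a single sigmoid output, not something taken from the linked posts; $z$ is the logit, $a$ the activation, and $y \in \{0, 1\}$ the label.

$$
a = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \frac{\partial a}{\partial z} = a(1 - a)
$$

$$
L_{\mathrm{CE}} = -\bigl[y \ln a + (1 - y)\ln(1 - a)\bigr]
\;\Rightarrow\;
\frac{\partial L_{\mathrm{CE}}}{\partial z} = \frac{a - y}{a(1 - a)} \cdot a(1 - a) = a - y
$$

$$
L_{\mathrm{MSE}} = \tfrac{1}{2}(a - y)^2
\;\Rightarrow\;
\frac{\partial L_{\mathrm{MSE}}}{\partial z} = (a - y)\, a(1 - a)
$$

With cross entropy the factor $a(1-a)$ cancels, so the gradient is proportional to the error. With mean squared error the factor stays, and it goes to zero whenever the sigmoid saturates ($a \to 0$ or $a \to 1$), even if the prediction is badly wrong.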

Differences and connections between softmax loss and cross entropy loss: https://blog.csdn.net/luoxuexiong/article/details/90062937

——————————————————————————————————————————

Understanding inter-class competition in softmax

Source: https://blog.csdn.net/bianlongpeng/article/details/113144455

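softmax with loss is a common classification loss. Strength: it is good at inter-class competition and accentuates the gap between the correct label and the wrong labels.
Weakness: it represents intra-class variation poorly, and the learned features are fairly loose. When it is used as a face-recognition loss, the bias term is usually set to zero.
Suppose a ten-class problem: each class has its own weight vector, w0 ... w9, and a feature f is assigned to the class whose weight vector has the largest inner product with it. Once the model is trained the weights w are fixed, so for a feature of a given norm the inner product of f and w depends only on the angle between them, and the features end up distributed radially. At inference time, whether f1 and f2 are similar is decided by their Euclidean distance, but because feature norms differ widely this often gives results like the one in the source post's figure.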
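
Since the figure is not available here, a small numeric sketch can stand in for it. The 2-D weights and features below are invented for illustration (two classes instead of ten, bias set to zero); the point is only that features spread radially can share a class yet be far apart in Euclidean distance.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical class weight vectors (one per class, no bias term).
weights = [[1.0, 0.0], [0.0, 1.0]]

def predict(f):
    """Assign f to the class whose weight vector has the largest inner product."""
    scores = [dot(w, f) for w in weights]
    return scores.index(max(scores))

f1 = [1.0, 0.2]   # class-0 direction, small norm
f2 = [10.0, 2.0]  # same direction as f1, ten times the norm
f3 = [0.2, 1.0]   # class-1 direction, small norm

print("classes:", predict(f1), predict(f2), predict(f3))
print("euclid(f1, f2) =", euclid(f1, f2), " euclid(f1, f3) =", euclid(f1, f3))
print("cosine(f1, f2) =", cosine(f1, f2), " cosine(f1, f3) =", cosine(f1, f3))
```

Here f1 and f2 fall in the same class and point the same way, yet by Euclidean distance f1 is closer to f3, which belongs to the other class. The angle (cosine similarity) ranks them the expected way round.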

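Source: https://blog.csdn.net/Bruce_0712/article/details/106387837

softmax loss is good at learning inter-class information because it uses an inter-class competition mechanism: it only cares about how accurate the predicted probability for the correct label is and ignores the differences among the incorrect labels, so the learned features end up rather scattered.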

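That post mentions [3], the Large-Margin Softmax Loss paper.

The paper's main analysis is that softmax separates different x vectors by their norm and angle. It then adds an angular margin factor m, which makes the softmax objective harder to learn and widens the distance between classes.

A write-up on understanding that paper:

https://blog.csdn.net/u014380165/article/details/76864572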
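
As a rough sketch of what the margin factor m does: the target-class logit ‖w‖‖x‖cos(θ) is replaced by ‖w‖‖x‖ψ(θ), where ψ is the piecewise function below (the form I recall from the L-Softmax paper; check the paper for the exact definition). The function names and the sampled angles are my own choices for illustration.

```python
import math

def psi(theta, m):
    """psi(theta) = (-1)^k * cos(m * theta) - 2k for theta in [k*pi/m, (k+1)*pi/m].

    theta is the angle (in [0, pi]) between the feature and the weight vector
    of its ground-truth class; m is the integer margin factor (m = 1 gives cos).
    """
    k = min(int(theta * m / math.pi), m - 1)  # index of the interval holding theta
    return (-1) ** k * math.cos(m * theta) - 2 * k

def target_logit(w_norm, x_norm, theta, m):
    """Logit of the ground-truth class with the angular margin applied."""
    return w_norm * x_norm * psi(theta, m)

# Compare plain cos(theta) with psi(theta, m=4) on a grid of angles.
thetas = [i * math.pi / 100 for i in range(101)]
plain = [math.cos(t) for t in thetas]
margin = [psi(t, 4) for t in thetas]

for t in (0.0, math.pi / 8, math.pi / 4, math.pi / 2):
    print(f"theta={t:.3f}  cos={math.cos(t):+.3f}  psi(m=4)={psi(t, 4):+.3f}")
```

For the same angle, ψ is never larger than cos(θ), so the correct class only wins if the feature sits at a much smaller angle to its weight vector. That is the sense in which the task gets harder and classes are pushed apart.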

———————————————————————————————————————————

Mask Scoring RCNN

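First, a walkthrough of the paper:

https://blog.csdn.net/m0_38007695/article/details/88256702

The paper's main contribution stems from the authors' observation that in Mask R-CNN the score assigned to a mask comes from the classification and detection result.

The original Mask R-CNN paper actually mentions this as well (I am going by the paper, since I have not read the code yet):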

We run the box prediction branch on these proposals, followed by non-maximum suppression [14]. The mask branch is then applied to the highest scoring 100 detection boxes. Although this differs from the parallel computation used in training, it speeds up inference and improves accuracy (due to the use of fewer, more accurate RoIs). The mask branch can predict K masks per RoI, but we only use the k-th mask, where k is the predicted class by the classification branch. The m×m floating-number mask output is then resized to the RoI size, and binarized at a threshold of 0.5.
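
The last two sentences of the quote can be sketched as follows. This is a toy illustration on nested lists, not the actual Mask R-CNN code: the function name is hypothetical, the resize to the RoI size is left out, and whether a value of exactly 0.5 counts as foreground is my assumption.

```python
def select_and_binarize(masks, predicted_class, threshold=0.5):
    """Pick the mask of the predicted class and binarize it.

    masks: list of K soft masks for one RoI, each an m x m nested list of
           probabilities in [0, 1] (one mask per class).
    predicted_class: class index k chosen by the classification branch.
    """
    soft = masks[predicted_class]  # the other K - 1 masks are discarded
    # Assumption: strictly greater than the threshold counts as foreground.
    return [[1 if p > threshold else 0 for p in row] for row in soft]

# Hypothetical output of the mask branch for one RoI: K = 2 classes, m = 2.
masks = [
    [[0.9, 0.2], [0.6, 0.4]],  # class 0
    [[0.1, 0.8], [0.3, 0.7]],  # class 1
]

print(select_and_binarize(masks, predicted_class=1))
```

Note that nothing in this step feeds back into the detection's confidence: the classification branch decides which mask to use, and the mask itself is only thresholded.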

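In other words, inference and training use different strategies, mainly for the sake of inference speed (personally I do not see anything wrong with how Mask R-CNN trains the mask branch). It also means the mask branch takes no part in scoring the result.

That is the inspiration for the "scoring" idea.

The paper first shows, in its Figure 2, that Mask R-CNN's mask score (the estimated quality) has little to do with MaskIoU (the actual quality).

The training strategy is explained quite clearly in the training part of Section 3.2.

For inference, the paper says: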
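
For clarity, MaskIoU here means the pixel-level intersection over union between the predicted binary mask and the ground-truth mask. A minimal sketch on nested lists (the function name and the toy masks are mine; returning 0.0 for two empty masks is an arbitrary convention):

```python
def mask_iou(pred, gt):
    """IoU between two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            intersection += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumption: two empty masks are given IoU 0.0 to avoid dividing by zero.
    return intersection / union if union else 0.0

pred = [[1, 1], [0, 0]]
gt = [[1, 0], [1, 0]]
print(mask_iou(pred, gt))  # 1 shared pixel out of 3 covered pixels
```

A detection can have a very high classification score and still get a low value here, which is the mismatch the paper's Figure 2 is about.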

Inference: During inference, we just use MaskIoU head to calibrate classification score generated from R-CNN. Specifically, suppose the R-CNN stage of Mask R-CNN outputs N bounding boxes, and among them top-k (i.e. k = 100) scoring boxes after SoftNMS [2] are selected. Then the top-k boxes are fed into the Mask head to generate multi-class masks. This is the standard Mask R-CNN inference procedure. We follow this procedure as well, and feed the top-k target masks to predict the MaskIoU. The predicted MaskIoU are multiplied with classification score, to get the new calibrated mask score as the final mask confidence.

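In other words, the classification score used to serve directly as the measure of mask quality; now it only supplies one factor. It is multiplied by the MaskIoU predicted by the MaskIoU head, and the product becomes the final mask score. Predicted MaskIoU and detection confidence together are what rank the masks at inference time.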
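
A minimal sketch of that calibration step, with invented numbers. The function name is hypothetical, and the predicted MaskIoU values would in practice come from the MaskIoU head rather than being typed in.

```python
def calibrated_mask_scores(cls_scores, predicted_mask_ious):
    """Mask score = classification score * predicted MaskIoU, per detection."""
    return [s * iou for s, iou in zip(cls_scores, predicted_mask_ious)]

# Hypothetical values for three detections kept after NMS.
cls_scores = [0.95, 0.90, 0.60]           # from the R-CNN classification branch
predicted_mask_ious = [0.50, 0.85, 0.90]  # from the MaskIoU head

mask_scores = calibrated_mask_scores(cls_scores, predicted_mask_ious)

# Rank detections by each score, best first.
rank_by_cls = sorted(range(3), key=lambda i: cls_scores[i], reverse=True)
rank_by_mask = sorted(range(3), key=lambda i: mask_scores[i], reverse=True)

print("mask scores:", mask_scores)
print("ranking by classification score:", rank_by_cls)
print("ranking by calibrated mask score:", rank_by_mask)
```

Detection 0 is the most confident by classification alone, but its predicted mask quality is poor, so it drops to last once the scores are calibrated.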