Mask-RCNN Source Code Reading Notes


听着远山和炊烟


Category: Deep Learning - Semantic Segmentation

Article link: https://blog.csdn.net/qq_17172467/article/details/79748854


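This article walks through the training process of the Mask-RCNN model in detail: creating the model, building the inputs, the shared convolutional layers, generating anchors, the RPN model, proposal generation, detection targets, and building the network heads. It focuses on anchor generation, the role of ProposalLayer, and RPN target generation, and traces how anchors are turned into the final proposals.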


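I read this blog post: https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1

That post walks through several pieces of code in ipynb format, but it does not analyze the other Python files (including the coco source).

Over the past few days I studied that source code myself. There may be mistakes here; corrections and criticism from experts are welcome.

This has been sitting in my drafts folder unpublished for a while...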

######################### Separator ###########################################

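Notes on reading coco.py:

COCO provides images together with the (possibly multiple) annotations of each image; the coco source abbreviates an annotation as ann.

Each image corresponds to one img_id;

One image_id can have multiple annotations, abbreviated as anns in the code;

Each ann has its own category;

Multiple categories are called cats in the source;

So one image, that is, one img_id, corresponds to multiple anns and multiple cats; one image therefore corresponds to masks of several categories;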
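The img_id / anns / cats relationship described above can be sketched with plain dictionaries. This is only a toy illustration: the field names (`id`, `image_id`, `category_id`) follow the COCO annotation JSON layout, but the ids, file names, and the helper functions `build_index` and `category_names` are made up for this example and are not taken from coco.py.

```python
from collections import defaultdict

# Toy annotation file in the COCO layout: "images", "annotations", "categories".
# All concrete ids and names below are invented for illustration.
dataset = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 10, "name": "person"}, {"id": 20, "name": "dog"}],
    "annotations": [
        {"id": 100, "image_id": 1, "category_id": 10},
        {"id": 101, "image_id": 1, "category_id": 20},
        {"id": 102, "image_id": 1, "category_id": 10},
        {"id": 103, "image_id": 2, "category_id": 20},
    ],
}


def build_index(dataset):
    """Return (img_to_anns, cats): anns grouped by image_id, categories by id."""
    img_to_anns = defaultdict(list)
    for ann in dataset["annotations"]:
        img_to_anns[ann["image_id"]].append(ann)
    cats = {cat["id"]: cat for cat in dataset["categories"]}
    return img_to_anns, cats


def category_names(img_id, img_to_anns, cats):
    """Sorted, de-duplicated category names that appear in one image."""
    return sorted({cats[ann["category_id"]]["name"] for ann in img_to_anns[img_id]})


img_to_anns, cats = build_index(dataset)
print(len(img_to_anns[1]))                   # number of anns attached to img_id 1
print(category_names(1, img_to_anns, cats))  # categories present in img_id 1
```

One img_id maps to a list of anns, and each ann points at exactly one category, which is why a single image ends up with masks of several classes.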

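bounding boxes derived from the masks: [instance_number, (y1, x1, y2, x2)] (the masks themselves are stored as [height, width, instance_number])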
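A (y1, x1, y2, x2) box per instance can be computed from binary instance masks. Below is a dependency-free sketch under two assumptions of mine: masks are indexed as `masks[y][x][i]` (height, width, instance), and y2/x2 are treated as exclusive. The function name `extract_bboxes` is used here only for illustration, and the real code works on NumPy arrays rather than nested lists.

```python
def extract_bboxes(masks):
    """Compute one box per instance from binary masks.

    masks: nested list indexed as masks[y][x][i] (height, width, instance).
    Returns a list of [y1, x1, y2, x2]; y2 and x2 are exclusive, and an
    empty mask yields [0, 0, 0, 0].
    """
    height = len(masks)
    width = len(masks[0]) if height else 0
    count = len(masks[0][0]) if width else 0
    boxes = []
    for i in range(count):
        # Rows and columns that contain at least one foreground pixel.
        ys = [y for y in range(height) if any(masks[y][x][i] for x in range(width))]
        xs = [x for x in range(width) if any(masks[y][x][i] for y in range(height))]
        if ys and xs:
            boxes.append([ys[0], xs[0], ys[-1] + 1, xs[-1] + 1])
        else:
            boxes.append([0, 0, 0, 0])
    return boxes


# A 4x5 image with two instances: instance 0 covers rows 1-2 and columns 1-3,
# instance 1 is empty (as a zero-padded slot would be).
H, W, N = 4, 5, 2
masks = [[[0] * N for _ in range(W)] for _ in range(H)]
for y in (1, 2):
    for x in (1, 2, 3):
        masks[y][x][0] = 1

boxes = extract_bboxes(masks)
print(boxes)
```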

anchors: [anchor_count, (y1,x1,y2,x2)]

So the generated result is:

(Screenshot taken from the blog post https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1)

RPN_ANCHOR_SCALES = (32,64,128,256,512)

1. Create model in training mode:

Creating the model is like setting up a skeleton: at this point no actual data has been passed into it yet.

model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

Analysis of class MaskRCNN():

1.a Inputs

(1) Use keras.layers.Input() to create input_image and input_image_meta; this builds the skeleton of the input layer.

(2) RPN GT

Use keras.layers.Input() to create input_rpn_match [None, 1] and input_rpn_bbox [None, 4].

(3) Detection GT (class IDs, bounding boxes, and masks)

Use keras.layers.Input() to create:

# 1. GT Class IDs (zero padded) : input_gt_class_ids [None],

# 2. GT Boxes in pixels (zero padded)
# [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates : input_gt_boxes

# Normalize coordinates

# 3. GT Masks (zero padded), also created with keras.layers.Input()

# [batch, height, width, MAX_GT_INSTANCES]
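To make the shapes and the "zero padded" / "normalize coordinates" steps above concrete, here is a small pure-Python sketch. It is not Keras code and not the repository's implementation: the helper names are hypothetical, and the normalization is assumed to be a plain division by image height and width (the real code may differ in details such as a pixel offset).

```python
def training_input_shapes(height, width, max_gt_instances):
    """Per-sample shapes of the GT inputs listed above (batch axis omitted).

    None marks an axis whose length is not fixed when the graph is built.
    """
    return {
        "input_rpn_match": (None, 1),
        "input_rpn_bbox": (None, 4),
        "input_gt_class_ids": (None,),
        "input_gt_boxes": (max_gt_instances, 4),
        "input_gt_masks": (height, width, max_gt_instances),
    }


def zero_pad_boxes(boxes, max_gt_instances):
    """Pad (or truncate) a list of boxes to exactly max_gt_instances rows."""
    padded = [list(box) for box in boxes[:max_gt_instances]]
    padded += [[0, 0, 0, 0] for _ in range(max_gt_instances - len(padded))]
    return padded


def normalize_boxes(boxes, height, width):
    """Scale pixel boxes [(y1, x1, y2, x2), ...] into the 0..1 range.

    Assumption: simple division by the image size, nothing more.
    """
    scale = (height, width, height, width)
    return [[coord / s for coord, s in zip(box, scale)] for box in boxes]


shapes = training_input_shapes(1024, 1024, max_gt_instances=3)
gt_boxes = zero_pad_boxes([[0, 256, 512, 1024]], max_gt_instances=3)
print(shapes["input_gt_masks"])
print(normalize_boxes(gt_boxes, 1024, 1024))
```

Padding every image to the same number of GT instances is what allows the boxes and masks of a batch to be stacked into one tensor; the all-zero rows are the padding.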

1.b Build the shared convolutional layers

The FPN structure finally yields

rpn_feature_maps = [P2, P3, P4, P5, P6]

mrcnn_feature_maps = [P2, P3, P4, P5] (every level has depth 256)
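As a rough sanity check on the pyramid levels, the spatial size of each feature map can be worked out from the image size. The sketch below assumes strides of 4, 8, 16, 32 and 64 for P2..P6 and a 1024x1024 input; both are assumptions of this example rather than something stated in these notes, and `pyramid_shapes` is a hypothetical helper.

```python
import math

# Assumed strides of P2..P6 relative to the input image.
BACKBONE_STRIDES = (4, 8, 16, 32, 64)
PYRAMID_DEPTH = 256  # channel depth of every pyramid level, as noted above


def pyramid_shapes(image_height, image_width, strides=BACKBONE_STRIDES):
    """Return {level_name: (height, width, depth)} for P2..P6."""
    shapes = {}
    for level, stride in enumerate(strides, start=2):
        shapes["P%d" % level] = (
            math.ceil(image_height / stride),
            math.ceil(image_width / stride),
            PYRAMID_DEPTH,
        )
    return shapes


shapes = pyramid_shapes(1024, 1024)
rpn_feature_maps = [shapes[name] for name in ("P2", "P3", "P4", "P5", "P6")]
mrcnn_feature_maps = rpn_feature_maps[:4]  # the list without P6

for name in sorted(shapes):
    print(name, shapes[name])
```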

# Generate Anchors

"""Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """

Anchors are generated by iterating over every scale in the configured RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512) # Length of square anchor side in pixels

In the end we get all the anchors for every pixel position of the feature maps.
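The per-scale iteration can be sketched in pure Python as follows. This is a simplified illustration, not the repository's NumPy implementation: the ratios (0.5, 1, 2), the strides (4, 8, 16, 32, 64), the 1024x1024 image size, and the ordering of anchors inside one level are all assumptions of this example.

```python
import math


def generate_anchors(scale, ratios, feature_shape, feature_stride, anchor_stride=1):
    """Anchors for one pyramid level as [y1, x1, y2, x2] in image pixels."""
    # One (height, width) pair per ratio, where ratio = width / height.
    sizes = [(scale / math.sqrt(r), scale * math.sqrt(r)) for r in ratios]
    anchors = []
    for fy in range(0, feature_shape[0], anchor_stride):
        for fx in range(0, feature_shape[1], anchor_stride):
            # Map the feature-map position back to image coordinates.
            cy, cx = fy * feature_stride, fx * feature_stride
            for h, w in sizes:
                anchors.append([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2])
    return anchors


def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides):
    """Concatenate per-level anchors; anchors of scales[0] come first."""
    anchors = []
    for scale, shape, stride in zip(scales, feature_shapes, feature_strides):
        anchors.extend(generate_anchors(scale, ratios, shape, stride))
    return anchors


RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
RPN_ANCHOR_RATIOS = (0.5, 1, 2)        # assumed for this sketch
BACKBONE_STRIDES = (4, 8, 16, 32, 64)  # assumed for this sketch
IMAGE_SIZE = 1024                      # assumed for this sketch

feature_shapes = [(IMAGE_SIZE // s, IMAGE_SIZE // s) for s in BACKBONE_STRIDES]
anchors = generate_pyramid_anchors(
    RPN_ANCHOR_SCALES, RPN_ANCHOR_RATIOS, feature_shapes, BACKBONE_STRIDES
)
print(len(anchors))
```

Each scale is paired with exactly one pyramid level, while all ratios are applied at every level, which matches the docstring quoted above.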