Deep Learning Series - Instance Segmentation (II): A Detailed Walkthrough of the Mask R-CNN Instance Segmentation Code, Training on Your Own Data

Back to the main table of contents

Back to the Instance Segmentation table of contents

Previous chapter: Deep Learning Series - Instance Segmentation (I): Mask R-CNN Paper Translation and Summary

Next chapter: Deep Learning Series - Instance Segmentation (III): Detailed Walkthrough of the Mask R-CNN Code, Training on Your Own Data: Related Networks, Data Processing, Utilities, etc.

 

Paper: Mask R-CNN

Authors' code: Mask R-CNN code

My optimized code: mask_rcnn_pro (just adjust the config.py settings and it runs out of the box)

 

This section gives a detailed walkthrough of the Mask R-CNN instance segmentation code; the next section covers the related networks, data processing, utilities, etc. for training on your own data.

 

Before starting the code walkthrough, a salute to the great Kaiming He!

My code is an optimized version of the code released with the paper. While studying Mask R-CNN, I found that the released mask_rcnn_coco.h5 model is very strong (after all, it was trained on 8 GPUs and refined through repeated tuning and validation, and in the paper it unsurprisingly outperforms the competition).

Data provided by the authors:

train2014 data:
              http://images.cocodataset.org/zips/train2014.zip
              http://images.cocodataset.org/annotations/annotations_trainval2014.zip

val2014 data(valminusminival):
              http://images.cocodataset.org/zips/val2014.zip
              https://dl.dropboxusercontent.com/s/s3tw5zcg7395368/instances_valminusminival2014.json.zip?dl=0

val2014 data(minival):
               http://images.cocodataset.org/zips/val2014.zip
               https://dl.dropboxusercontent.com/s/o43o90bna78omob/instances_minival2014.json.zip?dl=0

For readers who cannot access the links above, here are Baidu Cloud mirrors:

mask_rcnn_coco.h5 model: https://pan.baidu.com/s/1-_zxnWCmE9Vea7UOsl0naQ   extraction code: y7gr

coco data: https://pan.baidu.com/s/1gYAR4cxp1femrh8YgB6R7A    extraction code: 8xkx

 

Now, time to get to work:

 

II. Creating labelme JSON annotations

Running other people's code on other people's data is not the goal. What really matters is adapting the code to your own data and running it on that.

1. First, install labelme. This is very simple; the steps below install it in a conda environment:

# Open a terminal

# Create a virtual environment named labelme, with Python 3.6 as the interpreter
conda create -n labelme python=3.6
# Activate the environment on Windows:
activate labelme
# Activate the environment on Linux:
source activate labelme
# On first activation, upgrade pip first to avoid version errors during the installs below:
python -m pip install --upgrade pip
# Now the main part: labelme only needs the following 3 packages:
conda install pyqt
conda install pillow
pip install labelme

# Once installed, simply run the command: labelme  to launch the labelme tool

2. Annotate your images with labelme.

3. labelme generates one JSON annotation file per image (see the sketch after this list).

4. Put the images and JSON files into the corresponding folders (by default the paths used in config.py below: dataset/images for the images and dataset/ann_json for the labelme JSON files).
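Each annotated image yields a JSON file containing the class label and polygon points of every object. Below is a minimal sketch of how to peek into one of these files; the file name is only an example, and while "shapes", "label" and "points" are standard labelme fields, the exact field set can vary with the labelme version:

```python
import json

# Example path; point this at any JSON file produced by labelme.
with open("dataset/ann_json/example_image.json", "r") as file:
    ann = json.load(file)

# Basic image information recorded by labelme.
print(ann["imagePath"], ann["imageHeight"], ann["imageWidth"])

# Each "shape" is one labelled instance: a class name plus a polygon.
for shape in ann["shapes"]:
    print(shape["label"], len(shape["points"]), "polygon points")
```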

 

III. Converting our data into a COCO dataset.
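For orientation, the COCO "instances" format that the labelme annotations get converted into (the conversion itself is done by prepare.py, introduced below) looks roughly like this. This is a generic sketch with placeholder values, not the exact output of the conversion script:

```python
# A generic COCO "instances" layout with placeholder values; the real file is
# produced by the conversion script from the labelme annotations.
coco_skeleton = {
    "images": [
        {"id": 1, "file_name": "example_image.jpg", "height": 1024, "width": 1024},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # One flat [x1, y1, x2, y2, ...] list per polygon.
            "segmentation": [[10.0, 10.0, 200.0, 10.0, 200.0, 150.0, 10.0, 150.0]],
            "bbox": [10.0, 10.0, 190.0, 140.0],  # [x, y, width, height]
            "area": 26600.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "your_class_name"},
    ],
}
```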

 

Before touching the code, let's first take a look at the project file structure.

The README.md file reads as follows:

# Mask R-CNN for Object Detection and Segmentation
# [mask_rcnn_pro](https://github.com/wandaoyi/mask_rcnn_pro)

- [Paper](https://arxiv.org/abs/1703.06870)
- [My CSDN blog](https://blog.csdn.net/qq_38299170/article/details/105233638) 
This project combines Python 3, Keras, and TensorFlow. The model is based on an FPN network with a ResNet101 backbone and generates bounding boxes and segmentation masks for every object in an image.

The repository includes:
* Source code of Mask R-CNN built on FPN and ResNet101.
* Training code for MS COCO
* Pre-trained weights for MS COCO
* Jupyter notebooks to visualize the detection pipeline at every step
* ParallelModel class for multi-GPU training
* Evaluation on MS COCO metrics (AP)
* Example of training on your own dataset

```bashrc
mask_rcnn_coco.h5:
- https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5

train2014 data:
- http://images.cocodataset.org/zips/train2014.zip
- http://images.cocodataset.org/annotations/annotations_trainval2014.zip

val2014 data(valminusminival):
- http://images.cocodataset.org/zips/val2014.zip
- https://dl.dropboxusercontent.com/s/s3tw5zcg7395368/instances_valminusminival2014.json.zip?dl=0

val2014 data(minival):
- http://images.cocodataset.org/zips/val2014.zip
- https://dl.dropboxusercontent.com/s/o43o90bna78omob/instances_minival2014.json.zip?dl=0
```


# Getting Started

* See the config.py file for configuration.
* The configuration values used in the def __init__(self) methods of the files below basically all come from config.py.


* Test and see the results:
* mask_test.py: download the mask_rcnn_coco.h5 model, grab a few images, set up the configuration, then just run it and look at the output.


* Data preparation:
* prepare.py: run it directly to convert the labelme JSON annotations into COCO JSON data,
* and split the data into train / val / test sets (see the sketch after this README).


* Training:
* mask_train.py: run it directly and keep an eye on the loss.
* mask_rcnn_coco.h5 is a very strong pre-trained model, so training starts from an excellent baseline.


* Multi-GPU training:
* parallel_model.py: I don't have multiple GPUs, so this step is unverified; the code is kept as in the original authors' repo.


* Operationally, this project boils down to those three steps; nothing complicated or intimidating.
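To make the data-preparation step above more concrete, here is a rough sketch of what the train / val / test split amounts to. This is not the actual prepare.py (read that in the repository); everything except the names taken from config.py below is hypothetical:

```python
import os
import random

from config import cfg


def split_dataset():
    # Collect the labelme JSON annotation files.
    json_names = [name for name in os.listdir(cfg.COMMON.JSON_PATH)
                  if name.endswith(cfg.COMMON.JSON_SUFFIX)]
    random.shuffle(json_names)

    n_total = len(json_names)
    n_train = int(n_total * cfg.COMMON.TRAIN_PERCENT)
    n_val = int(n_total * cfg.COMMON.VAL_PERCENT)

    # Map each output list file to its share of the shuffled names.
    splits = {
        cfg.COMMON.TRAIN_DATA_PATH: json_names[:n_train],
        cfg.COMMON.VAL_DATA_PATH: json_names[n_train:n_train + n_val],
        cfg.COMMON.TEST_DATA_PATH: json_names[n_train + n_val:],
    }

    # Write one annotation file name per line for each split.
    for save_path, names in splits.items():
        with open(save_path, "w") as file:
            file.write("\n".join(names))


if __name__ == "__main__":
    split_dataset()
```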



I have made the code essentially one-click: you normally don't need to modify the code itself, only adjust the configuration.

The config.py file is as follows:

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/04/15 22:53
# @Author   : WanDaoYi
# @FileName : config.py
# ============================================

import os
from easydict import EasyDict as edict

# Download URL for the mask_rcnn_coco.h5 pre-trained model: https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5

__C = edict()
# Consumers can get config by: from config import cfg
cfg = __C

# common options
__C.COMMON = edict()

# URL of the pre-trained COCO model from the paper's authors
__C.COMMON.COCO_MODEL_URL = "https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5"

# Relative path (current directory)
__C.COMMON.RELATIVE_PATH = "./"

# Default mask background class; background is the first class
__C.COMMON.DEFAULT_CLASS_INFO = [{"source": "", "id": 0, "name": "BG"}]

__C.COMMON.DATA_SET_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "dataset")

# Path to the raw image files
__C.COMMON.IMAGE_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "dataset/images")
# Path to the JSON annotation files generated by labelme
__C.COMMON.JSON_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "dataset/ann_json")

# Whether to delete existing files: True to delete, False to keep
__C.COMMON.FILE_EXISTS_FLAG = True

# Data split ratios
__C.COMMON.TRAIN_PERCENT = 0.7
__C.COMMON.VAL_PERCENT = 0.2
__C.COMMON.TEST_PERCENT = 0.1

# Data source
__C.COMMON.DATA_SOURCE = "our_data"

# File suffixes
__C.COMMON.JSON_SUFFIX = ".json"
__C.COMMON.PNG_SUFFIX = ".png"
__C.COMMON.JPG_SUFFIX = ".jpg"
__C.COMMON.TXT_SUFFIX = ".txt"

# Paths where the split data lists are saved
__C.COMMON.TRAIN_DATA_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/train_data.txt")
__C.COMMON.VAL_DATA_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/val_data.txt")
__C.COMMON.TEST_DATA_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/test_data.txt")

__C.COMMON.LOGS_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "logs")

# Path to the coco_class_names.txt file
__C.COMMON.COCO_CLASS_NAMES_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/coco_class_names.txt")
__C.COMMON.OUR_CLASS_NAMES_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/our_class_names.txt")

# Input image resizing
# Generally, use the "square" resizing mode for training and predicting
# and it should work well in most cases. In this mode, images are scaled
# up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
# scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
# padded with zeros to make it a square so multiple images can be put
# in one batch.
# Available resizing modes:
# none:   No resizing or padding. Return the image unchanged.
# square: Resize and pad with zeros to get a square image
#         of size [max_dim, max_dim].
# pad64:  Pads width and height with zeros to make them multiples of 64.
#         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
#         up before padding. IMAGE_MAX_DIM is ignored in this mode.
#         The multiple of 64 is needed to ensure smooth scaling of feature
#         maps up and down the 6 levels of the FPN pyramid (2**6=64).
# crop:   Picks random crops from the image. First, scales the image based
#         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
#         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
#         IMAGE_MAX_DIM is not used in this mode.
__C.COMMON.IMAGE_RESIZE_MODE = "square"
__C.COMMON.IMAGE_MIN_DIM = 800
__C.COMMON.IMAGE_MAX_DIM = 1024
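# Example: with the settings above, a 480 x 640 image would first be scaled by
# 800 / 480 ≈ 1.67, but that would push the long side to about 1067 > 1024, so
# the scale is capped at 1024 / 640 = 1.6, giving 768 x 1024, which is then
# zero-padded to 1024 x 1024.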

# Minimum scaling ratio. Checked after MIN_IMAGE_DIM and can force further
# up scaling. For example, if set to 2 then images are scaled up to double
# the width and height, or more, even if MIN_IMAGE_DIM doesn't require it.
# However, in 'square' mode, it can be overruled by IMAGE_MAX_DIM.
__C.COMMON.IMAGE_MIN_SCALE = 0

# Whether to apply cropping: True means crop
__C.COMMON.CROP_FLAG = False
# Shape of the training input images
if __C.COMMON.CROP_FLAG:
    # [h, w, c]
    __C.COMMON.IMAGE_SHAPE = [800, 800, 3]
else:
    __C.COMMON.IMAGE_SHAPE = [1024, 1024, 3]

# 1 background + n classes
# __C.COMMON.CLASS_NUM = 1 + 80
__C.COMMON.CLASS_NUM = 1 + 1

# image_id (1) + original_image_shape (3) + image_shape (3) + image_coor (y1, x1, y2, x2) (4) +
# scale (1) + class_num (number of classes)
# See the compose_image_meta() method
__C.COMMON.IMAGE_META_SIZE = 1 + 3 + 3 + 4 + 1 + __C.COMMON.CLASS_NUM
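# Example: with CLASS_NUM = 1 + 1 above, IMAGE_META_SIZE = 1 + 3 + 3 + 4 + 1 + 2 = 14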

# The backbone supports resNet50 and resNet101
__C.COMMON.BACKBONE = "resNet101"

# The strides of each layer of the FPN Pyramid. These values
# are based on a resNet101 backbone.
__C.COMMON.BACKBONE_STRIDES = [4, 8, 16, 32, 64]

# Train or freeze batch normalization layers
#     None: Train BN layers. This is the normal mode
#     False: Freeze BN layers. Good when using a small batch size
#     True: (don't use). Set layer in training mode even when predicting
# Defaulting to False since batch size is often small
__C.COMMON.TRAIN_FLAG = False

# Size of the top-down layers used to build the feature pyramid
__C.COMMON.TOP_DOWN_PYRAMID_SIZE = 256

# Length of square anchor side in pixels
__C.COMMON.RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)

# Ratios of anchors at each cell (width/height)
# A value of 1 represents a square anchor, and 0.5 is a wide anchor
__C.COMMON.RPN_ANCHOR_RATIOS = [0.5, 1, 2]

# Anchor stride
# If 1 then anchors are created for each cell in the backbone feature map.
# If 2, then anchors are created for every other cell, and so on.
__C.COMMON.RPN_ANCHOR_STRIDE = 1
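# Example: with a 1024 x 1024 input and the strides above, the pyramid feature
# maps are 256, 128, 64, 32 and 16 cells per side, so with 3 ratios per cell the
# RPN evaluates 3 * (256^2 + 128^2 + 64^2 + 32^2 + 16^2) = 261888 anchors.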

# Bounding box refinement standard deviation for RPN and final detections.
__C.COMMON.RPN_BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]
__C.COMMON.BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]

# Image mean (RGB)
__C.COMMON.MEAN_PIXEL = [123.7, 116.8, 103.9]

# ROIs kept after tf.nn.top_k and before non-maximum suppression
__C.COMMON.PRE_NMS_LIMIT = 6000

# Non-max suppression threshold to filter RPN proposals.
# You can increase this during training to generate more proposals.
__C.COMMON.RPN_NMS_THRESHOLD = 0.7

# Minimum probability value to accept a detected instance
# ROIs below this threshold are skipped
__C.COMMON.DETECTION_MIN_CONFIDENCE = 0.7

# Pooled ROIs
__C.COMMON.POOL_SIZE =