Deep Learning Series - Instance Segmentation (II): A Detailed Walkthrough of the Mask R-CNN Instance Segmentation Code, Training on Your Own Data

Back to the main table of contents

Back to the Instance Segmentation table of contents

Previous chapter: Deep Learning Series - Instance Segmentation (I): Mask R-CNN Paper Translation and Summary

Next chapter: Deep Learning Series - Instance Segmentation (III): Detailed Walkthrough of the Mask R-CNN Code, Training on Your Own Data: Related Networks, Data Processing, Utilities, etc.

 

Paper: Mask R-CNN

Authors' code: Mask R-CNN code

My optimized code: mask_rcnn_pro (just adjust the config.py settings and it runs out of the box)

 

This section gives a detailed walkthrough of the Mask R-CNN instance segmentation code; the next section covers the related networks, data processing, utilities, etc. for training on your own data.

 

Before starting the code walkthrough, a salute to the great Kaiming He!

My code is an optimized version of the code released with the paper. While studying Mask R-CNN, I found that the released mask_rcnn_coco.h5 model is very strong (after all, it was trained on 8 GPUs and refined through repeated tuning and validation, and in the paper it unsurprisingly outperforms the competition).

Data provided by the authors:

train2014 data:
              http://images.cocodataset.org/zips/train2014.zip
              http://images.cocodataset.org/annotations/annotations_trainval2014.zip

val2014 data(valminusminival):
              http://images.cocodataset.org/zips/val2014.zip
              https://dl.dropboxusercontent.com/s/s3tw5zcg7395368/instances_valminusminival2014.json.zip?dl=0

val2014 data(minival):
               http://images.cocodataset.org/zips/val2014.zip
               https://dl.dropboxusercontent.com/s/o43o90bna78omob/instances_minival2014.json.zip?dl=0

For readers who cannot access the links above, here are Baidu Cloud mirrors:

mask_rcnn_coco.h5 model: https://pan.baidu.com/s/1-_zxnWCmE9Vea7UOsl0naQ   extraction code: y7gr

coco data: https://pan.baidu.com/s/1gYAR4cxp1femrh8YgB6R7A    extraction code: 8xkx

 

Now, time to get to work:

 

II. Creating labelme JSON annotations

Running other people's code on other people's data is not the goal. What really matters is adapting the code to your own data and running it on that.

1. First, install labelme. This is very simple; the steps below install it in a conda environment:

# Open a terminal

# Create a virtual environment named labelme, with Python 3.6 as the interpreter
conda create -n labelme python=3.6
# Activate the environment on Windows:
activate labelme
# Activate the environment on Linux:
source activate labelme
# On first activation, upgrade pip first to avoid version errors during the installs below:
python -m pip install --upgrade pip
# Now the main part: labelme only needs the following 3 packages:
conda install pyqt
conda install pillow
pip install labelme

# Once installed, simply run the command: labelme  to launch the labelme tool

2. Annotate your images with labelme.

3. labelme generates one JSON annotation file per image (see the sketch after this list).

4. Put the images and JSON files into the corresponding folders (by default the paths used in config.py below: dataset/images for the images and dataset/ann_json for the labelme JSON files).
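Each annotated image yields a JSON file containing the class label and polygon points of every object. Below is a minimal sketch of how to peek into one of these files; the file name is only an example, and while "shapes", "label" and "points" are standard labelme fields, the exact field set can vary with the labelme version:

```python
import json

# Example path; point this at any JSON file produced by labelme.
with open("dataset/ann_json/example_image.json", "r") as file:
    ann = json.load(file)

# Basic image information recorded by labelme.
print(ann["imagePath"], ann["imageHeight"], ann["imageWidth"])

# Each "shape" is one labelled instance: a class name plus a polygon.
for shape in ann["shapes"]:
    print(shape["label"], len(shape["points"]), "polygon points")
```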

 

III. Converting our data into a COCO dataset.
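For orientation, the COCO "instances" format that the labelme annotations get converted into (the conversion itself is done by prepare.py, introduced below) looks roughly like this. This is a generic sketch with placeholder values, not the exact output of the conversion script:

```python
# A generic COCO "instances" layout with placeholder values; the real file is
# produced by the conversion script from the labelme annotations.
coco_skeleton = {
    "images": [
        {"id": 1, "file_name": "example_image.jpg", "height": 1024, "width": 1024},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # One flat [x1, y1, x2, y2, ...] list per polygon.
            "segmentation": [[10.0, 10.0, 200.0, 10.0, 200.0, 150.0, 10.0, 150.0]],
            "bbox": [10.0, 10.0, 190.0, 140.0],  # [x, y, width, height]
            "area": 26600.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "your_class_name"},
    ],
}
```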

 

Before touching the code, let's first take a look at the project file structure.

The README.md file reads as follows:

# Mask R-CNN for Object Detection and Segmentation
# [mask_rcnn_pro](https://github.com/wandaoyi/mask_rcnn_pro)

- [Paper](https://arxiv.org/abs/1703.06870)
- [My CSDN blog](https://blog.csdn.net/qq_38299170/article/details/105233638) 
This project combines Python 3, Keras, and TensorFlow. The model is based on an FPN network with a ResNet101 backbone and generates bounding boxes and segmentation masks for every object in an image.

The repository includes:
* Source code of Mask R-CNN built on FPN and ResNet101.
* Training code for MS COCO
* Pre-trained weights for MS COCO
* Jupyter notebooks to visualize the detection pipeline at every step
* ParallelModel class for multi-GPU training
* Evaluation on MS COCO metrics (AP)
* Example of training on your own dataset

```bashrc
mask_rcnn_coco.h5:
- https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5

train2014 data:
- http://images.cocodataset.org/zips/train2014.zip
- http://images.cocodataset.org/annotations/annotations_trainval2014.zip

val2014 data(valminusminival):
- http://images.cocodataset.org/zips/val2014.zip
- https://dl.dropboxusercontent.com/s/s3tw5zcg7395368/instances_valminusminival2014.json.zip?dl=0

val2014 data(minival):
- http://images.cocodataset.org/zips/val2014.zip
- https://dl.dropboxusercontent.com/s/o43o90bna78omob/instances_minival2014.json.zip?dl=0
```


# Getting Started

* See the config.py file for configuration.
* The configuration values used in the def __init__(self) methods of the files below basically all come from config.py.


* Test and see the results:
* mask_test.py: download the mask_rcnn_coco.h5 model, grab a few images, set up the configuration, then just run it and look at the output.


* Data preparation:
* prepare.py: run it directly to convert the labelme JSON annotations into COCO JSON data,
* and split the data into train / val / test sets (see the sketch after this README).


* Training:
* mask_train.py: run it directly and keep an eye on the loss.
* mask_rcnn_coco.h5 is a very strong pre-trained model, so training starts from an excellent baseline.


* Multi-GPU training:
* parallel_model.py: I don't have multiple GPUs, so this step is unverified; the code is kept as in the original authors' repo.


* Operationally, this project boils down to those three steps; nothing complicated or intimidating.
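To make the data-preparation step above more concrete, here is a rough sketch of what the train / val / test split amounts to. This is not the actual prepare.py (read that in the repository); everything except the names taken from config.py below is hypothetical:

```python
import os
import random

from config import cfg


def split_dataset():
    # Collect the labelme JSON annotation files.
    json_names = [name for name in os.listdir(cfg.COMMON.JSON_PATH)
                  if name.endswith(cfg.COMMON.JSON_SUFFIX)]
    random.shuffle(json_names)

    n_total = len(json_names)
    n_train = int(n_total * cfg.COMMON.TRAIN_PERCENT)
    n_val = int(n_total * cfg.COMMON.VAL_PERCENT)

    # Map each output list file to its share of the shuffled names.
    splits = {
        cfg.COMMON.TRAIN_DATA_PATH: json_names[:n_train],
        cfg.COMMON.VAL_DATA_PATH: json_names[n_train:n_train + n_val],
        cfg.COMMON.TEST_DATA_PATH: json_names[n_train + n_val:],
    }

    # Write one annotation file name per line for each split.
    for save_path, names in splits.items():
        with open(save_path, "w") as file:
            file.write("\n".join(names))


if __name__ == "__main__":
    split_dataset()
```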



I have made the code essentially one-click: you normally don't need to modify the code itself, only adjust the configuration.

The config.py file is as follows:

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/04/15 22:53
# @Author   : WanDaoYi
# @FileName : config.py
# ============================================

import os
from easydict import EasyDict as edict

# Download URL for the mask_rcnn_coco.h5 pre-trained model: https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5

__C = edict()
# Consumers can get config by: from config import cfg
cfg = __C

# common options
__C.COMMON = edict()

# URL of the pre-trained COCO model from the paper's authors
__C.COMMON.COCO_MODEL_URL = "https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5"

# Relative path (current directory)
__C.COMMON.RELATIVE_PATH = "./"

# Default mask background class; background is the first class
__C.COMMON.DEFAULT_CLASS_INFO = [{"source": "", "id": 0, "name": "BG"}]

__C.COMMON.DATA_SET_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "dataset")

# Path to the raw image files
__C.COMMON.IMAGE_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "dataset/images")
# Path to the JSON annotation files generated by labelme
__C.COMMON.JSON_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "dataset/ann_json")

# Whether to delete existing files: True to delete, False to keep
__C.COMMON.FILE_EXISTS_FLAG = True

# Data split ratios
__C.COMMON.TRAIN_PERCENT = 0.7
__C.COMMON.VAL_PERCENT = 0.2
__C.COMMON.TEST_PERCENT = 0.1

# Data source
__C.COMMON.DATA_SOURCE = "our_data"

# File suffixes
__C.COMMON.JSON_SUFFIX = ".json"
__C.COMMON.PNG_SUFFIX = ".png"
__C.COMMON.JPG_SUFFIX = ".jpg"
__C.COMMON.TXT_SUFFIX = ".txt"

# Paths where the split data lists are saved
__C.COMMON.TRAIN_DATA_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/train_data.txt")
__C.COMMON.VAL_DATA_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/val_data.txt")
__C.COMMON.TEST_DATA_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/test_data.txt")

__C.COMMON.LOGS_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "logs")

# Path to the coco_class_names.txt file
__C.COMMON.COCO_CLASS_NAMES_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/coco_class_names.txt")
__C.COMMON.OUR_CLASS_NAMES_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/our_class_names.txt")

# Input image resizing
# Generally, use the "square" resizing mode for training and predicting
# and it should work well in most cases. In this mode, images are scaled
# up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
# scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
# padded with zeros to make it a square so multiple images can be put
# in one batch.
# Available resizing modes:
# none:   No resizing or padding. Return the image unchanged.
# square: Resize and pad with zeros to get a square image
#         of size [max_dim, max_dim].
# pad64:  Pads width and height with zeros to make them multiples of 64.
#         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
#         up before padding. IMAGE_MAX_DIM is ignored in this mode.
#         The multiple of 64 is needed to ensure smooth scaling of feature
#         maps up and down the 6 levels of the FPN pyramid (2**6=64).
# crop:   Picks random crops from the image. First, scales the image based
#         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
#         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
#         IMAGE_MAX_DIM is not used in this mode.
__C.COMMON.IMAGE_RESIZE_MODE = "square"
__C.COMMON.IMAGE_MIN_DIM = 800
__C.COMMON.IMAGE_MAX_DIM = 1024
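# Example: with the settings above, a 480 x 640 image would first be scaled by
# 800 / 480 ≈ 1.67, but that would push the long side to about 1067 > 1024, so
# the scale is capped at 1024 / 640 = 1.6, giving 768 x 1024, which is then
# zero-padded to 1024 x 1024.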

# Minimum scaling ratio. Checked after MIN_IMAGE_DIM and can force further
# up scaling. For example, if set to 2 then images are scaled up to double
# the width and height, or more, even if MIN_IMAGE_DIM doesn't require it.
# However, in 'square' mode, it can be overruled by IMAGE_MAX_DIM.
__C.COMMON.IMAGE_MIN_SCALE = 0

# Whether to apply cropping: True means crop
__C.COMMON.CROP_FLAG = False
# Shape of the training input images
if __C.COMMON.CROP_FLAG:
    # [h, w, c]
    __C.COMMON.IMAGE_SHAPE = [800, 800, 3]
else:
    __C.COMMON.IMAGE_SHAPE = [1024, 1024, 3]

# 1 background + n classes
# __C.COMMON.CLASS_NUM = 1 + 80
__C.COMMON.CLASS_NUM = 1 + 1

# image_id (1) + original_image_shape (3) + image_shape (3) + image_coor (y1, x1, y2, x2) (4) +
# scale (1) + class_num (number of classes)
# See the compose_image_meta() method
__C.COMMON.IMAGE_META_SIZE = 1 + 3 + 3 + 4 + 1 + __C.COMMON.CLASS_NUM
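# Example: with CLASS_NUM = 1 + 1 above, IMAGE_META_SIZE = 1 + 3 + 3 + 4 + 1 + 2 = 14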

# The backbone supports resNet50 and resNet101
__C.COMMON.BACKBONE = "resNet101"

# The strides of each layer of the FPN Pyramid. These values
# are based on a resNet101 backbone.
__C.COMMON.BACKBONE_STRIDES = [4, 8, 16, 32, 64]

# Train or freeze batch normalization layers
#     None: Train BN layers. This is the normal mode
#     False: Freeze BN layers. Good when using a small batch size
#     True: (don't use). Set layer in training mode even when predicting
# Defaulting to False since batch size is often small
__C.COMMON.TRAIN_FLAG = False

# Size of the top-down layers used to build the feature pyramid
__C.COMMON.TOP_DOWN_PYRAMID_SIZE = 256

# Length of square anchor side in pixels
__C.COMMON.RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)

# Ratios of anchors at each cell (width/height)
# A value of 1 represents a square anchor, and 0.5 is a wide anchor
__C.COMMON.RPN_ANCHOR_RATIOS = [0.5, 1, 2]

# Anchor stride
# If 1 then anchors are created for each cell in the backbone feature map.
# If 2, then anchors are created for every other cell, and so on.
__C.COMMON.RPN_ANCHOR_STRIDE = 1
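# Example: with a 1024 x 1024 input and the strides above, the pyramid feature
# maps are 256, 128, 64, 32 and 16 cells per side, so with 3 ratios per cell the
# RPN evaluates 3 * (256^2 + 128^2 + 64^2 + 32^2 + 16^2) = 261888 anchors.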

# Bounding box refinement standard deviation for RPN and final detections.
__C.COMMON.RPN_BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]
__C.COMMON.BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]

# Image mean (RGB)
__C.COMMON.MEAN_PIXEL = [123.7, 116.8, 103.9]

# ROIs kept after tf.nn.top_k and before non-maximum suppression
__C.COMMON.PRE_NMS_LIMIT = 6000

# Non-max suppression threshold to filter RPN proposals.
# You can increase this during training to generate more proposals.
__C.COMMON.RPN_NMS_THRESHOLD = 0.7

# Minimum probability value to accept a detected instance
# ROIs below this threshold are skipped
__C.COMMON.DETECTION_MIN_CONFIDENCE = 0.7

# Pooled ROIs
__C.COMMON.POOL_SIZE =