Building a Mask R-CNN on Your Own Dataset with TensorFlow


This is a post I have been putting off for almost half a year (○´・д・)ノ
TensorFlow is currently the most widely used deep learning framework. Besides Faster R-CNN it also provides Mask R-CNN, and with the TensorFlow Models repository you can quickly build a Mask R-CNN on your own dataset.

I. Preparing the Data Files

1. Downloading the images

This post uses the "person" class as the example, so we need a large number of images containing people; a Python crawler can fetch them quickly.
The crawler source file is image_gather.py. To run it, create a name.txt file in the same directory as the script and write the search terms you want to download, one per line; I use "美女" as the example. Then run:

python image_gather.py

Then enter the number of images to download per term; I entered 20.

import re
import requests
from urllib import error
from bs4 import BeautifulSoup
import os

num = 0
numPicture = 0
file = ''
List = []


def Find(url):
    """Count the image URLs Baidu returns for this query, 60 per page."""
    global List
    print('INFO: detecting all images, please wait...')
    t = 0
    s = 0
    while t < 1000:
        Url = url + str(t)
        try:
            Result = requests.get(Url, timeout=7)
        except BaseException:
            t = t + 60
            continue
        else:
            result = Result.text
            # Use a regex to extract the image URLs from the page source
            pic_url = re.findall('"objURL":"(.*?)",', result, re.S)
            s += len(pic_url)
            if len(pic_url) == 0:
                break
            else:
                List.append(pic_url)
                t = t + 60
    return s


def recommend(url):
    """Collect Baidu's related-search suggestions (the topRS div)."""
    Re = []
    try:
        html = requests.get(url)
    except error.HTTPError:
        return
    else:
        html.encoding = 'utf-8'
        bsObj = BeautifulSoup(html.text, 'html.parser')
        div = bsObj.find('div', id='topRS')
        if div is not None:
            listA = div.findAll('a')
            for i in listA:
                if i is not None:
                    Re.append(i.get_text())
        return Re


def downloadPicture(html, keyword):
    """Download every image on this result page until the quota is reached."""
    global num
    # Use a regex to extract the image URLs from the page source
    pic_url = re.findall('"objURL":"(.*?)",', html, re.S)
    print('INFO: found images for keyword ' + keyword + ', downloading...')
    for each in pic_url:
        print('INFO: downloading image ' + str(num + 1) + ', address: ' + str(each))
        try:
            if each is not None:
                pic = requests.get(each, timeout=7)
            else:
                continue
        except BaseException:
            print('Error: cannot download this image')
            continue
        else:
            string = os.path.join(file, keyword + '_' + str(num) + '.jpg')
            with open(string, 'wb') as fp:
                fp.write(pic.content)
            num += 1
        if num >= numPicture:
            return


if __name__ == '__main__':
    tm = int(input('Please input the num of each name:'))
    numPicture = tm
    line_list = []
    with open('./name.txt', encoding='utf-8') as f:
        line_list = [k.strip() for k in f.readlines()]  # strip() removes trailing whitespace
    for word in line_list:
        url = 'http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=' + word + '&pn='
        tot = Find(url)
        Recommend = recommend(url)  # record the related-search suggestions
        print('INFO: detection found that %s has %d images' % (word, tot))
        file = word + '文件'
        if os.path.exists(file):
            print('INFO: folder exists, using a fallback name')
            file = word + '2'
        os.mkdir(file)
        t = 0
        tmp = url
        while t < numPicture:
            try:
                url = tmp + str(t)
                result = requests.get(url, timeout=10)
                print(url)
            except error.HTTPError:
                print('INFO: timeout error, check the network')
                t = t + 60
            else:
                downloadPicture(result.text, word)
                t = t + 60
        numPicture = numPicture + tm
    print('INFO: download finished')

The result looks like this: (screenshot: the downloaded image folder)

2. File naming conventions

After downloading, check for images that cannot be opened, then rename the folder and the files. The 美女 folder looks like this: (screenshot: downloaded folder)
A properly organized data folder looks like this: (screenshot: renamed files)
For a quick batch-renaming method, see NO.5 Tensorflow在win10下实现object detection.
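If you would rather script it, below is a minimal renaming sketch; the folder name is the one the crawler created above and is only an assumption, so adjust it to yours. Numbering the files 1.jpg, 2.jpg, ... matters because the optional scripts later in this post assume that convention.

import os

# Rename every jpg in the folder to 1.jpg, 2.jpg, ... in sorted order.
folder = '美女文件'  # the folder created by the crawler; adjust to yours
names = sorted(n for n in os.listdir(folder) if n.lower().endswith('.jpg'))
for i, name in enumerate(names, start=1):
    src = os.path.join(folder, name)
    dst = os.path.join(folder, '{}.jpg'.format(i))
    if src != dst:
        os.rename(src, dst)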

II. Building the Dataset

1. Splitting the data files

Split the files into two sets: train and test. (screenshot: train/test folders) A minimal split script is sketched below.
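This is a sketch of a random 80/20 split, assuming the renamed images all sit in one source folder; every folder name here is an assumption to adjust to your own layout.

import os
import random
import shutil

# Randomly split the renamed images into train/ and test/ (80/20).
src = '美女文件'  # the downloaded, renamed folder; adjust to yours
random.seed(0)  # fixed seed so the split is reproducible
files = [n for n in os.listdir(src) if n.lower().endswith('.jpg')]
random.shuffle(files)
split = int(len(files) * 0.8)
for subset, names in [('train', files[:split]), ('test', files[split:])]:
    if not os.path.exists(subset):
        os.mkdir(subset)
    for name in names:
        shutil.copy(os.path.join(src, name), os.path.join(subset, name))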

2. Annotating with labelme

Open labelme.
Choose Open Dir to point at your folder, then Create Polygon to start outlining. (screenshot: labelme)
When an outline is done, save it with the person label. My outlines in this demo are rough; you can zoom in with the mouse wheel and outline more precisely. (screenshot: save dialog)
For installing and using labelme, see NO.3 Tensorflow在win10下实现object detection.

When everything is done, the folder looks like this: every image has a corresponding json file, which stores the label and the coordinates of every point you clicked while outlining. (screenshot: images with json files)
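To see what a labelme annotation actually stores, here is a minimal sketch; the file name train/1.json is only an assumed example following the naming convention above.

import json

# Each labelme annotation is plain json; 'shapes' holds one entry per polygon,
# with its label and the [x, y] coordinates of every clicked point.
with open('train/1.json') as f:
    ann = json.load(f)
for shape in ann['shapes']:
    print(shape['label'], len(shape['points']), 'points')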
Next, create labelmap.pbtxt; for the details, see NO.5 Tensorflow在win10下实现object detection:

item {
  id: 1
  name: 'person'
}

3. Generating the dataset

First, put the json files and the image files in separate folders, like this:

test        # raw test images
test_json   # json files for test
train       # raw train images
train_json  # json files for train

(screenshot: folder layout)
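A minimal sketch for the separation, assuming the json files currently sit next to the images inside train and test:

import glob
import os
import shutil

# Move the labelme json files out of train/ and test/ into train_json/ and
# test_json/, matching the layout listed above.
for subset in ['train', 'test']:
    out_dir = subset + '_json'
    if not os.path.exists(out_dir):
        os.mkdir(out_dir)
    for json_path in glob.glob(os.path.join(subset, '*.json')):
        shutil.move(json_path, out_dir)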
Then the following three files are needed to convert the dataset to tfrecord form:
create_tf_record.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""Convert raw dataset to TFRecord for object_detection.

Please note that this tool only applies to labelme's annotations(json file).

Example usage:
    python create_tf_record.py \
        --images_dir=your absolute path to read images.
        --annotations_json_dir=your path to annotaion json files.
        --label_map_path=your path to label_map.pbtxt
        --output_path=your path to write .record.
"""

import cv2
import glob
import hashlib
import io
import json
import numpy as np
import os
import PIL.Image
import tensorflow as tf

import read_pbtxt_file


flags = tf.app.flags

flags.DEFINE_string('images_dir', None, 'Path to images directory.')
flags.DEFINE_string('annotations_json_dir', 'datasets/annotations', 
                    'Path to annotations directory.')
flags.DEFINE_string('label_map_path', None, 'Path to label map proto.')
flags.DEFINE_string('output_path', None, 'Path to the output tfrecord.')

FLAGS = flags.FLAGS


def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def int64_list_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def bytes_list_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))


def float_list_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))


def create_tf_example(annotation_dict, label_map_dict=None):
    """Converts image and annotations to a tf.Example proto.
    
    Args:
        annotation_dict: A dictionary containing the following keys:
            ['height', 'width', 'filename', 'sha256_key', 'encoded_jpg',
             'format', 'xmins', 'xmaxs', 'ymins', 'ymaxs', 'masks',
             'class_names'].
        label_map_dict: A dictionary maping class_names to indices.
            
    Returns:
        example: The converted tf.Example.
        
    Raises:
        ValueError: If label_map_dict is None or is not containing a class_name.
    """
    if annotation_dict is None:
        return None
    if label_map_dict is None:
        raise ValueError('`label_map_dict` is None')
        
    height = annotation_dict.get('height', None)
    width = annotation_dict.get('width', None)
    filename = annotation_dict.get('filename', None)
    sha256_key = annotation_dict.get('sha256_key', None)
    encoded_jpg = annotation_dict.get('encoded_jpg', None)
    image_format = annotation_dict.get('format', None)
    xmins = annotation_dict.get('xmins', None)
    xmaxs = annotation_dict.get('xmaxs', None)
    ymins = annotation_dict.get('ymins', None)
    ymaxs = annotation_dict.get('ymaxs', None)
    masks = annotation_dict.get('masks', None)
    class_names = annotation_dict.get('class_names', None)
    
    labels = []
    for class_name in class_names:
        label = label_map_dict.get(class_name, None)
        if label is None:
            raise ValueError('`label_map_dict` does not contain {}.'.format(
                class_name))
        labels.append(label)
            
    encoded_masks = []
    for mask in masks:
        pil_image = PIL.Image.fromarray(mask.astype(np.uint8))
        output_io = io.BytesIO()
        pil_image.save(output_io, format='PNG')
        encoded_masks.append(output_io.getvalue())
        
    feature_dict = {
        'image/height': int64_feature(height),
        'image/width': int64_feature(width),
        'image/filename': bytes_feature(filename.encode('utf8')),
        'image/source_id': bytes_feature(filename.encode('utf8')),
        'image/key/sha256': bytes_feature(sha256_key.encode('utf8')),
        'image/encoded': bytes_feature(encoded_jpg),
        'image/format': bytes_feature(image_format.encode('utf8')),
        'image/object/bbox/xmin': float_list_feature(xmins),
        'image/object/bbox/xmax': float_list_feature(xmaxs),
        'image/object/bbox/ymin': float_list_feature(ymins),
        'image/object/bbox/ymax': float_list_feature(ymaxs),
        'image/object/mask': bytes_list_feature(encoded_masks),
        'image/object/class/label': int64_list_feature(labels)}
    example = tf.train.Example(features=tf.train.Features(
        feature=feature_dict))
    return example


def _get_annotation_dict(images_dir, annotation_json_path):  
    """Get boundingboxes and masks.
    
    Args:
        images_dir: Path to images directory.
        annotation_json_path: Path to annotated json file corresponding to
            the image. The json file annotated by labelme with keys:
                ['lineColor', 'imageData', 'fillColor', 'imagePath', 'shapes',
                 'flags'].
            
    Returns:
        annotation_dict: A dictionary containing the following keys:
            ['height', 'width', 'filename', 'sha256_key', 'encoded_jpg',
             'format', 'xmins', 'xmaxs', 'ymins', 'ymaxs', 'masks',
             'class_names'].
    """
    
    if (not os.path.exists(images_dir) or
        not os.path.exists(annotation_json_path)):
        return None
    
    with open(annotation_json_path, 'r') as f:
        json_text = json.load(f)
    shapes = json_text.get('shapes', None)
    if shapes is None:
        return None
    image_relative_path = json_text.get('imagePath', None)
    if image_relative_path is None:
        return None
    image_name = image_relative_path.split('/')[-1]
    image_path = os.path.join(images_dir, image_name)
    image_format = image_name.split('.')[-1].replace('jpg', 'jpeg')
    if not os.path.exists(image_path):
        return None
    
    with tf.gfile.GFile(image_path, 'rb') as fid:
        encoded_jpg = fid.read()
    image = cv2.imread(image_path)
    height = image.shape[0]
    width = image.shape[1]
    key = hashlib.sha256(encoded_jpg).hexdigest()
    
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    masks = []
    class_names = []
    hole_polygons = []
    for mark in shapes:
        class_name = mark.get('label')
        class_names.append(class_name)
        polygon = mark.get('points')
        polygon = np.array(polygon, dtype=np.int32)  # cv2.fillPoly expects integer points
        if class_name == 'hole':
            hole_polygons.append(polygon)
        else:
            mask = np.zeros(image.shape[:2])
            cv2.fillPoly(mask, [polygon], 1)
            masks.append(mask)
            
            # Boundingbox
            x = polygon[:, 0]
            y = polygon[:, 1]
            xmin = np.min(x)
            xmax = np.max(x)
            ymin = np.min(y)
            ymax = np.max(y)
            xmins.append(float(xmin) / width)
            xmaxs.append(float(xmax) / width)
            ymins.append(float(ymin) / height)
            ymaxs.append(float(ymax) / height)
    # Remove holes in the masks (cv2.fillPoly modifies each mask in place)
    for mask in masks:
        cv2.fillPoly(mask, hole_polygons, 0)
        
    annotation_dict = {'height': height,
                       'width': width,
                       'filename': image_name,
                       'sha256_key': key,
                       'encoded_jpg': encoded_jpg,
                       'format': image_format,
                       'xmins': xmins,
                       'xmaxs': xmaxs,
                       'ymins': ymins,
                       'ymaxs': ymaxs,
                       'masks': masks,
                       'class_names': class_names}
    return annotation_dict


def main(_):
    if not os.path.exists(FLAGS.images_dir):
        raise ValueError('`images_dir` does not exist.')
    if not os.path.exists(FLAGS.annotations_json_dir):
        raise ValueError('`annotations_json_dir` does not exist.')
    if not os.path.exists(FLAGS.label_map_path):
        raise ValueError('`label_map_path` does not exist.')
        
    label_map = read_pbtxt_file.get_label_map_dict(FLAGS.label_map_path)
    
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
        
    num_annotations_skipped = 0
    annotations_json_path = os.path.join(FLAGS.annotations_json_dir, '*.json')
    for i, annotation_file in enumerate(glob.glob(annotations_json_path)):
        if i % 100 == 0:
            print('On image %d' % i)
            
        annotation_dict = _get_annotation_dict(
            FLAGS.images_dir, annotation_file)
        if annotation_dict is None:
            num_annotations_skipped += 1
            continue
        tf_example = create_tf_example(annotation_dict, label_map)
        writer.write(tf_example.SerializeToString())
    
    writer.close()
    print('Successfully created TFRecord to {}, skipped {} annotations.'.format(
        FLAGS.output_path, num_annotations_skipped))


if __name__ == '__main__':
    tf.app.run()
                

read_pbtxt_file.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Aug 26 13:42:50 2018

@author: shirhe-lyh
"""

"""A tool to read .pbtxt file.

See Details at:
    TensorFlow models/research/object_detetion/protos/string_int_label_pb2.py
    TensorFlow models/research/object_detection/utils/label_map_util.py
"""

import tensorflow as tf

from google.protobuf import text_format

import string_int_label_map_pb2


def load_pbtxt_file(path):
    """Read .pbtxt file.
    
    Args: 
        path: Path to StringIntLabelMap proto text file (.pbtxt file).
        
    Returns:
        A StringIntLabelMapProto.
        
    Raises:
        ValueError: If the path does not exist.
    """
    if not tf.gfile.Exists(path):
        raise ValueError('`path` does not exist.')
        
    with tf.gfile.GFile(path, 'r') as fid:
        pbtxt_string = fid.read()
        pbtxt = string_int_label_map_pb2.StringIntLabelMap()
        try:
            text_format.Merge(pbtxt_string, pbtxt)
        except text_format.ParseError:
            pbtxt.ParseFromString(pbtxt_string)
    return pbtxt


def get_label_map_dict(path):
    """Reads a .pbtxt file and returns a dictionary.
    
    Args:
        path: Path to StringIntLabelMap proto text file.
        
    Returns:
        A dictionary mapping class names to indices.
    """
    pbtxt = load_pbtxt_file(path)
    
    result_dict = {}
    for item in pbtxt.item:
        result_dict[item.name] = item.id
    return result_dict
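Usage is straightforward; with the labelmap.pbtxt defined earlier, the returned dictionary maps each class name to its id:

from read_pbtxt_file import get_label_map_dict

# With the labelmap.pbtxt above, this prints {'person': 1}
print(get_label_map_dict('labelmap.pbtxt'))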
        

string_int_label_map_pb2.py

# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: object_detection/protos/string_int_label_map.proto

import sys
_b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()




DESCRIPTOR = _descriptor.FileDescriptor(
  name='object_detection/protos/string_int_label_map.proto',
  package='object_detection.protos',
  syntax='proto2',
  serialized_options=None,
  serialized_pb=_b('\n2object_detection/protos/string_int_label_map.proto\x12\x17object_detection.protos\"G\n\x15StringIntLabelMapItem\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\n\n\x02id\x18\x02 \x01(\x05\x12\x14\n\x0c\x64isplay_name\x18\x03 \x01(\t\"Q\n\x11StringIntLabelMap\x12<\n\x04item\x18\x01 \x03(\x0b\x32..object_detection.protos.StringIntLabelMapItem')
)




_STRINGINTLABELMAPITEM = _descriptor.Descriptor(
  name='StringIntLabelMapItem',
  full_name='object_detection.protos.StringIntLabelMapItem',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='name', full_name='object_detection.protos.StringIntLabelMapItem.name', index=0,
      number=1, type=9, cpp_type=9, label=1,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='id', full_name='object_detection.protos.StringIntLabelMapItem.id', index=1,
      number=2, type=5, cpp_type=1, label=1,
      has_default_value=False, default_value=0,
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='display_name', full_name='object_detection.protos.StringIntLabelMapItem.display_name', index=2,
      number=3, type=9, cpp_type=9, label=1,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
  ],
  extensions=[
  ],
  nested_types=[],
  enum_types=[
  ],
  serialized_options=None,
  is_extendable=False,
  syntax='proto2',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=79,
  serialized_end=150,
)


_STRINGINTLABELMAP = _descriptor.Descriptor(
  name='StringIntLabelMap',
  full_name='object_detection.protos.StringIntLabelMap',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='item', full_name='object_detection.protos.StringIntLabelMap.item', index=0,
      number=1, type=11, cpp_type=10, label=3,
      has_default_value=False, default_value=[],
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
  ],
  extensions=[
  ],
  nested_types=[],
  enum_types=[
  ],
  serialized_options=None,
  is_extendable=False,
  syntax='proto2',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=152,
  serialized_end=233,
)

_STRINGINTLABELMAP.fields_by_name['item'].message_type = _STRINGINTLABELMAPITEM
DESCRIPTOR.message_types_by_name['StringIntLabelMapItem'] = _STRINGINTLABELMAPITEM
DESCRIPTOR.message_types_by_name['StringIntLabelMap'] = _STRINGINTLABELMAP
_sym_db.RegisterFileDescriptor(DESCRIPTOR)

StringIntLabelMapItem = _reflection.GeneratedProtocolMessageType('StringIntLabelMapItem', (_message.Message,), dict(
  DESCRIPTOR = _STRINGINTLABELMAPITEM,
  __module__ = 'object_detection.protos.string_int_label_map_pb2'
  # @@protoc_insertion_point(class_scope:object_detection.protos.StringIntLabelMapItem)
  ))
_sym_db.RegisterMessage(StringIntLabelMapItem)

StringIntLabelMap = _reflection.GeneratedProtocolMessageType('StringIntLabelMap', (_message.Message,), dict(
  DESCRIPTOR = _STRINGINTLABELMAP,
  __module__ = 'object_detection.protos.string_int_label_map_pb2'
  # @@protoc_insertion_point(class_scope:object_detection.protos.StringIntLabelMap)
  ))
_sym_db.RegisterMessage(StringIntLabelMap)


# @@protoc_insertion_point(module_scope)

The final file list should look like this: raw holds the original downloaded images, without any sorting, while images holds the sorted and labeled data. (screenshot: top-level folders)
The contents of raw and images are listed below; ignore segmentation for now, that is the optional step later 😀. train.record is the record file generated by the command below. (screenshot: folder contents)
Run the following command once for train and once for test, and you will get train.record and test.record:

python create_tf_record.py --images_dir=images/train --annotations_json_dir=images/train_json --label_map_path=labelmap.pbtxt --output_path=images/train.record

(screenshot: generated record files)
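As a quick sanity check before training, you can count how many examples actually landed in the record file; this uses the TF 1.x API consistent with the rest of this post:

import tensorflow as tf

# Count the examples written into the generated record file.
count = sum(1 for _ in tf.python_io.tf_record_iterator('images/train.record'))
print('train.record contains {} examples'.format(count))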

III. Training and Deployment

1. Download a pre-trained model

Download link: model zoo
Find the mask_rcnn pre-trained models, pick whichever suits your needs, then download and unpack it. (screenshot: model zoo)

2. Create the Mask R-CNN config file

This works like creating the faster rcnn config in NO.5 Tensorflow在win10下实现object detection: pick the mask config that matches the model you want to train from the official sample config files, and set the parameters the same way. Mine is below as an example.
mask_rcnn_inception_v2_coco.config

# Mask R-CNN with Inception V2
# Configured for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 800
        max_dimension: 1365
      }
    }
    number_of_stages: 3
    feature_extractor {
      type: 'faster_rcnn_inception_v2'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        predict_instance_masks: true
        mask_height: 15
        mask_width: 15
        mask_prediction_conv_depth: 0
        mask_prediction_num_conv_layers: 2
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        conv_hyperparams {
          op: CONV
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.01
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_mask_prediction_loss_weight: 4.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "mask_rcnn_inception_v2_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: the line below caps training at 1M steps (the 200K figure in the
  # original sample config referred to the pets dataset). With this cap the
  # first learning-rate decay at step 900K still fires, but the second at
  # step 1.2M is never reached. Remove the line to train indefinitely.
  num_steps: 1000000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "train.record"
  }
  label_map_path: "labelmap.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
}

eval_config: {
  num_examples: 50
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "test.record"
  }
  label_map_path: "labelmap.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
  shuffle: false
  num_readers: 1
}

You only need to swap in your own tfrecord files and config file in place of the ones from NO.5 Tensorflow在win10下实现object detection; if you got the faster rcnn deployment working there, this post should be easy to follow. The record files map as:

train.record → train.record
validation.record → test.record
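Before launching a long training run, it can be worth parsing the edited config once to catch typos; this is a minimal sketch assuming the object_detection package from TensorFlow Models is on your path, using the config filename from the training command below:

import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

# Parse the pipeline config; a typo raises a ParseError with the line number.
config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.gfile.GFile('mask_rcnn_inception_v2_coco_2018_01_28.config', 'r') as f:
    text_format.Merge(f.read(), config)
print('num_classes:', config.model.faster_rcnn.num_classes)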

Model training

!python train.py --train_dir training/ --pipeline_config_path mask_rcnn_inception_v2_coco_2018_01_28.config
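While it trains, you can watch the loss curves in TensorBoard, pointed at the training/ directory used above:

tensorboard --logdir training/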

Freezing the model

!python export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path mask_rcnn_inception_v2_coco_2018_01_28.config \
--trained_checkpoint_prefix training/model.ckpt-500000 \
--output_directory export/

Optional steps

1. segmentation.py

This file works together with labelme_json_to_dataset.exe. To demonstrate it, I created an optional folder holding the test images and their json files. What segmentation.py does is run labelme's semantic-segmentation export over every json file. Adjust path_file_name to your own folder; you can do this for both test and train.
segmentation.py

# coding=utf-8
import os
import glob

# Collect all labelme json files in the optional folder
path_file_name = glob.glob('optional/*.json')
file_num = len(path_file_name)
file_name = [i for i in range(file_num)]
for i in range(file_num):
    file_name[i] = path_file_name[i].split('\\')[-1]
    print("INFO:" + file_name[i])
print("INFO:There are " + str(file_num) + " json files")
# Assumes the json files are named 1.json .. N.json (see the renaming step)
json_files = [os.path.join('optional/', '{}.json'.format(i)) for i in range(1, file_num + 1)]

# Run labelme's converter on each json file; it creates one folder per file
for json_file in json_files:
    run = "labelme_json_to_dataset.exe %s" % (json_file)
    os.system(run)

print("INFO:finished")

Run the command:

python segmentation.py

The result: the optional folder now contains 50 sub-folders, equal to the number of test images, and each sub-folder holds 4 files. (screenshot: generated folders)

2. classification.py

Its job is to sort the files generated in step 1. Put the folders generated in step 1 under

images/segmentation

with both the test and the train ones inside. (screenshot: segmentation folder)
classification.py

import glob
import os
import argparse
import shutil

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--classification", type=str, required=True,
                help="which classification file you need, e.g. label.png")
args = vars(ap.parse_args())

classification_path = args["classification"]

print('INFO:This script is aimed to classify the images and text files')

path_file_name = glob.glob('images/segmentation/train/*_json')
file_num = len(path_file_name)
print('INFO:There are ' + str(file_num) + ' folders to be dealt with')

PATH_TO_CLASSIFICATION_DIR = "images/segmentation/train"
# Destination folder is named after the classification file without extension
out_dir = os.path.join(PATH_TO_CLASSIFICATION_DIR, classification_path.split('.')[0])
print('INFO:' + classification_path + ' will be dealt with')
if not os.path.exists(out_dir):
    os.mkdir(out_dir)
print('INFO:your destination folder is ' + out_dir)
# Assumes the folders are named 1_json .. N_json (labelme_json_to_dataset output)
for i in range(1, file_num + 1):
    source_file = out_dir + '/' + classification_path
    destination_file = out_dir + '/{}'.format(i) + '.' + classification_path.split('.')[-1]
    classification_file_name = os.path.join(PATH_TO_CLASSIFICATION_DIR, '{}_json'.format(i), classification_path)
    print('INFO:' + classification_file_name)
    if not os.path.exists(destination_file):
        shutil.copy(classification_file_name, out_dir)
        os.rename(source_file, destination_file)

print('INFO:finished')

Run the command below.
label.png can also be label_viz.png and the like; edit the hard-coded folder in the script to process test and train separately. You could even rewrite the argparse.ArgumentParser() setup to make it friendlier.

python classification.py --classification label.png  # edit the script for test/train

The result is below; a folder named after the classification file, minus the extension, is created automatically. (screenshot: output folders)
label_viz.png looks like this: (screenshot: label_viz.png)

IV. Running Inference with the Model

Training and freezing the model follow the post NO.6 Tensorflow在win10下实现object detection, which describes the steps very clearly; see also Tensorflow.ipynb. I will give my GitHub address at the end. My model was trained for 500,000 steps. The inference code likewise follows that post: you only need to change a few paths and the model name. I ran it on a video; two screenshots are below. (screenshots: video frames)
Video inference source:

import numpy as np
import os
import sys
import tensorflow as tf
import cv2
from PIL import Image

# This is needed since the script lives in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops

from utils import label_map_util
from utils import visualization_utils as vis_util

MODEL_NAME = 'export'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('labelmap.pbtxt')

detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

def run_inference_for_single_image(image, graph):
  # Note: this opens a new Session for every frame, which is slow but keeps
  # the demo simple; for real use, create the Session once outside the loop.
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[1], image.shape[2])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: image})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.int64)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict


# The output size must match the input frame size; (856, 458) fits my IpMan.mp4
video = cv2.VideoWriter("Ip.avi", cv2.VideoWriter_fourcc(*'XVID'), 30.00, (856, 458))
datapath = 'IpMan.mp4'
video_capture = cv2.VideoCapture(datapath)
i = 1
while True:
    ret, frame = video_capture.read()
    if not ret:  # stop when the video runs out of frames
        break
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    image_np = load_image_into_numpy_array(image)
    image_np_expanded = np.expand_dims(image_np, axis=0)
    output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    img=cv2.cvtColor(np.asarray(image_np), cv2.COLOR_RGB2BGR)
    print(i)
    video.write(img)
    i=i+1

video_capture.release()
video.release()

With that, the full Mask R-CNN workflow on TensorFlow is complete.

V. References

My GitHub
