Notes on Using the Object Detection API

0. Installing tensorflow-gpu on Ubuntu

For installing tensorflow-gpu with Anaconda on Ubuntu, see my article "Installing tensorflow-gpu under Anaconda on Ubuntu with one command".
After installation, TensorFlow's default path is ~/anaconda3/lib/python3.5/site-packages/tensorflow

1. Download Location and Directory Layout of the Object Detection API

https://github.com/tensorflow/models
Downloading from there gives models-master.zip.
Unzip it, place it under the installed tensorflow folder, and rename models-master to models; the object_detection path is then models/research/object_detection.

The directory structure we ultimately need is: ~/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/object_detection

2. Compiling the Files Under object_detection/protos with protoc

Check the protoc version on Ubuntu:

protoc --version

The TensorFlow Object Detection API requires protoc 2.6.0 or newer, otherwise compilation fails; if your version is too old, upgrade it before compiling. Mine is 3.6.0. The compile commands are:

cd ~/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/
protoc object_detection/protos/*.proto --python_out=.
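To confirm the compilation worked, a quick check of my own (a minimal sketch, run from the research/ directory) is to import one of the generated modules:

# sanity check (sketch): if protoc succeeded, the generated *_pb2 modules
# under object_detection/protos/ can be imported
from object_detection.protos import pipeline_pb2
print('protos compiled OK')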

3. Adding Environment Variables

The tensorflow/models/research/ and tensorflow/models/research/slim directories need to be added to the PYTHONPATH environment variable.

gedit ~/.bashrc                     # open .bashrc

Add the environment variables:

export PYTHONPATH="/home/leo/anaconda3/lib/python3.5/site-packages/tensorflow/models/research:$PYTHONPATH"
export PYTHONPATH="/home/leo/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/slim:$PYTHONPATH"
source ~/.bashrc                 # apply the changes immediately
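To verify the variables took effect, a small import test of my own (a sketch assuming the stock repo layout) can be run from any directory:

# sanity check (sketch): both imports succeed only when research/ and
# research/slim are on PYTHONPATH
import object_detection        # lives in models/research
import nets                    # lives in models/research/slim
print('PYTHONPATH is set up correctly')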

4. Testing the object_detection API

cd ~/anaconda3/lib/python3.5/site-packages/tensorflow/models/research
python object_detection/builders/model_builder_test.py

If it prints OK, the configuration is correct.

5. Testing a Pretrained Model

The COCO dataset is a dataset released by Microsoft for image recognition training; the objects in its images come with accurate segmentations and locations, covering 90 object categories. By default the Object Detection API provides 5 pretrained models, all trained on COCO:
SSD + MobileNet
SSD + Inception V2
R-FCN + ResNet101
Faster R-CNN + ResNet101
Faster R-CNN + Inception-ResNet V2

These pretrained models can be downloaded from:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

• Create a my_object_detection folder

  • Create my_object_detection/models/
    Unzip the downloaded ssd_mobilenet_v1_coco_2018_01_28 into it, giving the directory structure my_object_detection/models/ssd_mobilenet_v1_coco_2018_01_28/

  • Create my_object_detection/images/
    Inside it create test_images/, giving the structure my_object_detection/images/test_images/
    Put two test images in it, or copy the two images from the original object_detection/test_images/

  • Create my_object_detection/data/ and copy the mscoco_label_map.pbtxt file from the original object_detection/data/ into it

  • Create my_object_detection/leo_object_detection_APP_ssd_mobilenet_v1_coco_2018_01_28.py with the following content:

#encoding:utf-8
import tensorflow as tf
import numpy as np
 
import os
from matplotlib import pyplot as plt
from PIL import Image
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_utils
 
# Directory of the downloaded model
MODEL_DIR = 'models/ssd_mobilenet_v1_coco_2018_01_28'
# Frozen graph file of the downloaded model
MODEL_CHECK_FILE = os.path.join(MODEL_DIR, 'frozen_inference_graph.pb')
# Label map corresponding to the dataset
MODEL_LABEL_MAP = os.path.join('data', 'mscoco_label_map.pbtxt')
# Number of classes in the dataset; open mscoco_label_map.pbtxt to check
MODEL_NUM_CLASSES = 90

# Collect the test image file names into a list
PATH_TO_TEST_IMAGES_DIR = 'images/test_images'
TEST_IMAGES_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3)]

# Output figure size, in inches
IMAGE_SIZE = (12, 8)
 
tf.reset_default_graph()
 
# Load the model into the default graph
with tf.gfile.GFile(MODEL_CHECK_FILE, 'rb') as fd:
    _graph = tf.GraphDef()
    _graph.ParseFromString(fd.read())
    tf.import_graph_def(_graph, name='')
 
# Load the COCO labels, converting mscoco_label_map.pbtxt into the form
# {1: {'id': 1, 'name': u'person'} ... 90: {'id': 90, 'name': u'toothbrush'}}
label_map = label_map_util.load_labelmap(MODEL_LABEL_MAP)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=MODEL_NUM_CLASSES)
category_index = label_map_util.create_category_index(categories)
 
# Convert an image into a numpy array
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)
 
# Run the computation in the graph
detection_graph = tf.get_default_graph()
with tf.Session(graph=detection_graph) as sess:
    for image_path in TEST_IMAGES_PATHS:
        print(image_path)
        # Read the image
        image = Image.open(image_path)
        # Convert the image data to an array
        image_np = load_image_into_numpy_array(image)
        # Add a batch dimension
        image_np_expanded = np.expand_dims(image_np, axis=0)
        # The lines below fetch tensors already defined in the model; use them directly
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # All detection boxes
        boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Confidence score of each detection
        scores = detection_graph.get_tensor_by_name('detection_scores:0')
        # Class of each box
        classes = detection_graph.get_tensor_by_name('detection_classes:0')
        # Number of detections
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        # Run inference
        (boxes, scores, classes, num_detections) = sess.run([boxes, scores, classes, num_detections],
                                                            feed_dict={image_tensor : image_np_expanded})
        # Print the detection results
        print(num_detections)
        print(boxes)
        print(classes)
        print(scores)

        # Visualize the results
        vis_utils.visualize_boxes_and_labels_on_image_array(
            image_np,
            np.squeeze(boxes),
            np.squeeze(classes).astype(np.int32),
            np.squeeze(scores),
            category_index,
            use_normalized_coordinates=True,
            line_thickness=8
        )
        # Display
        plt.figure(figsize=IMAGE_SIZE)
        plt.imshow(image_np)
        plt.show()

Run the script from the my_object_detection/ directory (its paths are relative); it displays the two test images with the detected boxes and labels drawn on them.

6. A First Pass at Training Your Own Model

Download the dataset:
Download the PASCAL VOC 2012 dataset. It has already been curated: 17125 images in total, each annotated with objects from 20 categories including people, animals, vehicles, and furniture.
Unzip the downloaded dataset into my_object_detection/images/, giving the structure my_object_detection/images/VOCdevkit/VOC2012/. The VOC2012/ directory contains five folders; JPEGImages holds all the images and Annotations holds the object annotations for each image.

Data format conversion:
TensorFlow training expects data in TFRecord format, so the PASCAL VOC 2012 dataset has to be converted to TFRecord before training starts.
Create my_object_detection/create_pascal_tf_record_train.sh with the following content (it is run from my_object_detection/, so the full script path and an explicit --label_map_path are needed):

python ~/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/object_detection/dataset_tools/create_pascal_tf_record.py \
    --data_dir=./images/VOCdevkit/ \
    --year=VOC2012 \
    --set=train \
    --label_map_path=$HOME/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/object_detection/data/pascal_label_map.pbtxt \
    --output_path=./images/VOCdevkit/pascal_train.record

Create my_object_detection/create_pascal_tf_record_val.sh with the following content:

python ~/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/object_detection/dataset_tools/create_pascal_tf_record.py \
    --data_dir=./images/VOCdevkit/ \
    --year=VOC2012 \
    --set=val \
    --label_map_path=$HOME/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/object_detection/data/pascal_label_map.pbtxt \
    --output_path=./images/VOCdevkit/pascal_val.record

Run bash create_pascal_tf_record_train.sh
Run bash create_pascal_tf_record_val.sh
If pascal_train.record and pascal_val.record appear under my_object_detection/images/VOCdevkit/, the conversion succeeded.
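As an extra check, the examples in each record file can be counted; here is a small sketch of mine using the TF 1.x tf.python_io API used throughout this post (run from my_object_detection/):

# count the examples written into the generated TFRecord files
import tensorflow as tf

for name in ['pascal_train.record', 'pascal_val.record']:
    path = 'images/VOCdevkit/' + name
    count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
    print('{}: {} examples'.format(path, count))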

Download the pretrained model:
Download the faster_rcnn_inception_resnet_v2_atrous_coco model and unzip it into my_object_detection/models/.

Import the pbtxt file for the PASCAL VOC 2012 dataset:
Copy the pascal_label_map.pbtxt file from the original object_detection/data/ into my_object_detection/data/.

Configure the training parameters:
Create my_object_detection/training_config/, copy the original object_detection/samples/configs/faster_rcnn_inception_resnet_v2_atrous_coco.config file into it, rename it faster_rcnn_inception_resnet_v2_atrous_coco_voc2012.config, and modify it as follows:

Change num_classes: 90 to num_classes: 20
Change num_examples: 8000 to num_examples: 5823
Change the 5 occurrences of PATH_TO_BE_CONFIGURED to the corresponding paths
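The 5823 here is just the size of the VOC2012 val split; a quick sketch of mine to confirm it by counting the split file (path follows the layout above):

# count the image ids listed in the VOC2012 val split
with open('images/VOCdevkit/VOC2012/ImageSets/Main/val.txt') as f:
    print(sum(1 for line in f if line.strip()))   # expect 5823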

My final modified config is:

# Faster R-CNN with Inception Resnet v2, Atrous version;
# Configured for the PASCAL VOC 2012 dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 20
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "./models/faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "./images/VOCdevkit/pascal_train.record"
  }
  label_map_path: "./data/pascal_label_map.pbtxt"
}

eval_config: {
  num_examples: 5823
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "./images/VOCdevkit/pascal_val.record"
  }
  label_map_path: "./data/pascal_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

Start training:
Create the my_object_detection/train_dir/ folder to store the training results.
Create my_object_detection/leo_object_detection_COCO_VOC2012_training.sh with the following content:

python ~/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/object_detection/legacy/train.py \
	--train_dir=./train_dir/ \
	--pipeline_config_path=./training_config/faster_rcnn_inception_resnet_v2_atrous_coco_voc2012.config \

Run bash leo_object_detection_COCO_VOC2012_training.sh to start training :)

7. Exporting Your Trained Model

Create the my_object_detection/export_dir/ folder to store the exported model.
Create the my_object_detection/leo_export_inference_graph.sh file with the following content:

python ~/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/object_detection/export_inference_graph.py \
    --input_type=image_tensor \
    --pipeline_config_path=./training_config/faster_rcnn_inception_resnet_v2_atrous_coco_voc2012.config \
    --trained_checkpoint_prefix=./train_dir/model.ckpt-200000 \
    --output_directory=./export_dir/ \


Run bash leo_export_inference_graph.sh to export the trained model.
After exporting, the results appear in my_object_detection/export_dir/.
Among them, frozen_inference_graph.pb is the final artifact.
Create the my_object_detection/models/faster_rcnn_inception_resnet_v2_atrous_COCO_VOC2012 folder
and move the contents of my_object_detection/export_dir/ into it, so the model can be loaded from there.

8. Testing Your Trained Model

Adjust the relevant paths in the my_object_detection/leo_object_detection_APP_ssd_mobilenet_v1_coco_2018_01_28.py file from step 5 and save it as a new Python file, my_object_detection/leo_object_detection_APP_faster_rcnn_inception_resnet_v2_atrous_COCO_VOC2012.py, to test the training result. I grabbed a random image from the web; although it occasionally recognizes the horse as a dog, the whole pipeline works well.

9. Training Your Real Model

In real applications, the targets to be recognized vary enormously, so you need to train on your specific targets.

9.1 Annotating the Targets

The annotation tool used is labelImg; detailed installation and usage instructions are given on its GitHub page.
Once annotation is done, every image has a corresponding xml file holding its annotation info. I used a web crawler to scrape many images of GYY and annotated them.

  • Create the my_object_detection/images/myimages/ folder to store your own images and annotations.
  • Create the my_object_detection/images/myimages/my_train_images/GYY/ folder and copy the training images and their annotated xml files into it.
  • Create the my_object_detection/images/myimages/my_test_images/GYY/ folder and copy the test images and their annotated xml files into it.

9.2 Data Preprocessing: Converting xml to csv

  • Create the my_object_detection/images/myimages/xml2csv_my_train_images_GYY.py file and write the following into it:
# xml2csv.py

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

os.chdir('/home/leo/Desktop/object_detection_API_work/my_object_detection/images/myimages/my_train_images/GYY/')
path = '/home/leo/Desktop/object_detection_API_work/my_object_detection/images/myimages/my_train_images/GYY/'

def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
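            # size[0]/size[1] are the <width>/<height> children; member[0] is
            # <name> and member[4] is <bndbox>, whose children labelImg writes
            # in the order xmin, ymin, xmax, ymax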
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    image_path = path
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('GYY_train.csv', index=None)
    print('Successfully converted xml to csv.')


main()
  • Create the my_object_detection/images/myimages/xml2csv_my_test_images_GYY.py file and write the following into it (identical to the script above except for the paths and the output file name):
# xml2csv.py

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

os.chdir('/home/leo/Desktop/object_detection_API_work/my_object_detection/images/myimages/my_test_images/GYY/')
path = '/home/leo/Desktop/object_detection_API_work/my_object_detection/images/myimages/my_test_images/GYY/'

def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    image_path = path
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('GYY_test.csv', index=None)
    print('Successfully converted xml to csv.')


main()

The two Python files above convert the xml annotation info into csv format. cd to my_object_detection/images/myimages/ and run:

python xml2csv_my_train_images_GYY.py
python xml2csv_my_test_images_GYY.py

Then check the my_object_detection/images/myimages/my_train_images/GYY and my_object_detection/images/myimages/my_test_images/GYY/ folders; normally a GYY_train.csv and a GYY_test.csv file will have been generated in them. Open both files and make sure their contents are correct, for example as sketched below.
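For a quick look at the generated files, a small pandas sketch of mine (run from my_object_detection/images/myimages/):

# peek at the first rows of each generated csv
import pandas as pd
print(pd.read_csv('my_train_images/GYY/GYY_train.csv').head())
print(pd.read_csv('my_test_images/GYY/GYY_test.csv').head())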

9.3 Data Preprocessing: Converting csv to TFRecord

  • Create the my_object_detection/images/myimages/leo_generate_tfrecord_train_GYY.py file and write:
# leo_generate_tfrecord_train_GYY.py

# -*- coding: utf-8 -*-


"""
Usage:
 # From tensorflow/models/
 # Create train data:
 python generate_tfrecord.py --csv_input=data/tv_vehicle_labels.csv  --output_path=train.record
 # Create test data:
 python generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=test.record
"""


import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict


flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# TO-DO: replace this with a label map lookup
def class_text_to_int(row_label):
   if row_label == 'GaoYY':     # change this to your own class
       return 1
   else:
       return None


def split(df, group):
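   # group the csv rows by filename so that all boxes belonging to one
   # image end up in a single tf.train.Example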
   data = namedtuple('data', ['filename', 'object'])
   gb = df.groupby(group)
   return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
   with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
       encoded_jpg = fid.read()
   encoded_jpg_io = io.BytesIO(encoded_jpg)
   image = Image.open(encoded_jpg_io)
   width, height = image.size

   filename = group.filename.encode('utf8')
   image_format = b'jpg'
   xmins = []
   xmaxs = []
   ymins = []
   ymaxs = []
   classes_text = []
   classes = []

   for index, row in group.object.iterrows():
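       # normalize pixel coordinates to [0, 1], which is what the TF Object
       # Detection API expects for bounding boxes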
       xmins.append(row['xmin'] / width)
       xmaxs.append(row['xmax'] / width)
       ymins.append(row['ymin'] / height)
       ymaxs.append(row['ymax'] / height)
       classes_text.append(row['class'].encode('utf8'))
       classes.append(class_text_to_int(row['class']))

   tf_example = tf.train.Example(features=tf.train.Features(feature={
       'image/height': dataset_util.int64_feature(height),
       'image/width': dataset_util.int64_feature(width),
       'image/filename': dataset_util.bytes_feature(filename),
       'image/source_id': dataset_util.bytes_feature(filename),
       'image/encoded': dataset_util.bytes_feature(encoded_jpg),
       'image/format': dataset_util.bytes_feature(image_format),
       'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
       'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
       'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
       'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
       'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
       'image/object/class/label': dataset_util.int64_list_feature(classes),
   }))
   return tf_example


def main(_):
   writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
   path = os.path.join(os.getcwd(), 'my_train_images/GYY')         #  change as needed
   examples = pd.read_csv(FLAGS.csv_input)
   grouped = split(examples, 'filename')
   for group in grouped:
       tf_example = create_tf_example(group, path)
       writer.write(tf_example.SerializeToString())

   writer.close()
   output_path = os.path.join(os.getcwd(), FLAGS.output_path)
   print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
   tf.app.run()
  • Create the my_object_detection/images/myimages/leo_generate_tfrecord_test_GYY.py file and write (identical to the script above except for the image directory):
# leo_generate_tfrecord_test_GYY.py

# -*- coding: utf-8 -*-


"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/tv_vehicle_labels.csv  --output_path=train.record
  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=test.record
"""


import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict


flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# TO-DO: replace this with a label map lookup
def class_text_to_int(row_label):
    if row_label == 'GaoYY':     # change this to your own class
        return 1
    else:
        return None


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'my_test_images/GYY')         #  change as needed
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()
  • Create the my_object_detection/images/myimages/leo_generate_tfrecord_train_GYY.sh file and write:
python leo_generate_tfrecord_train_GYY.py \
    --csv_input=./my_train_images/GYY/GYY_train.csv \
    --output_path=./my_train_images/GYY_train.record \
  • Create the my_object_detection/images/myimages/leo_generate_tfrecord_test_GYY.sh file and write:
python leo_generate_tfrecord_test_GYY.py \
    --csv_input=./my_test_images/GYY/GYY_test.csv \
    --output_path=./my_test_images/GYY_test.record \
  • cd to the my_object_detection/images/myimages/ folder and run
bash leo_generate_tfrecord_train_GYY.sh
bash leo_generate_tfrecord_test_GYY.sh
  • If generation succeeds, a GYY_train.record file appears in the my_object_detection/images/myimages/my_train_images folder and a GYY_test.record file in the my_object_detection/images/myimages/my_test_images folder (the tf_record_iterator check from section 6 can verify them here too).

9.4 Setting Up the pbtxt File

Create the my_object_detection/data/GYY_label_map.pbtxt file and write:
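Since class_text_to_int above maps the single class, 'GaoYY', to id 1, the label map presumably needs just this one entry:

item {
  id: 1
  name: 'GaoYY'
}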


9.5 Configuring the Training Parameters

Copy the faster_rcnn_inception_resnet_v2_atrous_coco.config file into my_object_detection/training_config/ again, rename the copy faster_rcnn_inception_resnet_v2_atrous_coco_GYY.config, and modify it:

Change num_classes: 20 to num_classes: 1                        # I have only one class, GaoYY
Change num_examples: 5823 to num_examples: 21                   # my validation set has only 21 images
Change the 5 occurrences of PATH_TO_BE_CONFIGURED to the corresponding paths

My final modified config is:

# Faster R-CNN with Inception Resnet v2, Atrous version;
# Configured for the GYY dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "./models/faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "./images/myimages/my_train_images/GYY_train.record"
  }
  label_map_path: "./data/GYY_label_map.pbtxt"
}

eval_config: {
  num_examples: 21
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "./images/myimages/my_test_images/GYY_test.record"
  }
  label_map_path: "./data/GYY_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

9.6 Starting Training

Create the my_object_detection/train_dir/GYY/ folder to store the GYY training results.
Create my_object_detection/leo_object_detection_COCO_GYY_training.sh with the following content:

python ~/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/object_detection/legacy/train.py \
    --train_dir=./train_dir/GYY/ \
    --pipeline_config_path=./training_config/faster_rcnn_inception_resnet_v2_atrous_coco_GYY.config \

Run bash leo_object_detection_COCO_GYY_training.sh to start training :)

9.7 Monitoring Training

In the my_object_detection/ folder run:

tensorboard --logdir=./train_dir/GYY

Then open 127.0.0.1:6006 in a browser to watch training progress in real time.

9.8 Exporting the Trained Model

Create the my_object_detection/export_dir/GYY/ folder to store the exported model.
Create the my_object_detection/leo_export_inference_graph_GYY.sh file with the following content:

python ~/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/object_detection/export_inference_graph.py \
    --input_type=image_tensor \
    --pipeline_config_path=./training_config/faster_rcnn_inception_resnet_v2_atrous_coco_GYY.config \
    --trained_checkpoint_prefix=./train_dir/GYY/model.ckpt-5739 \
    --output_directory=./export_dir/GYY/ \


Run bash leo_export_inference_graph_GYY.sh to export the trained model.

9.9 Testing Your Own Model

Find a few images on the web for testing and put them in the my_object_detection/images/test_images/ folder, named GaoYY_test0.jpeg through GaoYY_test6.jpeg.
Create the my_object_detection/leo_object_detection_APP_faster_rcnn_inception_resnet_v2_atrous_COCO_GYY.py file
and write:

#encoding:utf-8
import tensorflow as tf
import numpy as np
 
import os
from matplotlib import pyplot as plt
from PIL import Image
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_utils
 
# Directory of the exported model
MODEL_DIR = 'export_dir/GYY'
# Frozen graph file of the exported model
MODEL_CHECK_FILE = os.path.join(MODEL_DIR, 'frozen_inference_graph.pb')
# Label map corresponding to the dataset
MODEL_LABEL_MAP = os.path.join('data', 'GYY_label_map.pbtxt')
# Number of classes in the dataset; open GYY_label_map.pbtxt to check
MODEL_NUM_CLASSES = 1

# Collect the test image file names into a list
PATH_TO_TEST_IMAGES_DIR = 'images/test_images'
TEST_IMAGES_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'GaoYY_test{}.jpeg'.format(i)) for i in range(0, 7)]

# Output figure size, in inches
IMAGE_SIZE = (12, 8)
 
tf.reset_default_graph()
 
# Load the model into the default graph
with tf.gfile.GFile(MODEL_CHECK_FILE, 'rb') as fd:
    _graph = tf.GraphDef()
    _graph.ParseFromString(fd.read())
    tf.import_graph_def(_graph, name='')
 
# Load the data labels, converting GYY_label_map.pbtxt into the form
# {1: {'id': 1, 'name': 'GaoYY'}}
label_map = label_map_util.load_labelmap(MODEL_LABEL_MAP)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=MODEL_NUM_CLASSES)
category_index = label_map_util.create_category_index(categories)
 
# Convert an image into a numpy array
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)
 
# Run the computation in the graph
detection_graph = tf.get_default_graph()
with tf.Session(graph=detection_graph) as sess:
    for image_path in TEST_IMAGES_PATHS:
        print(image_path)
        # Read the image
        image = Image.open(image_path)
        # Convert the image data to an array
        image_np = load_image_into_numpy_array(image)
        # Add a batch dimension
        image_np_expanded = np.expand_dims(image_np, axis=0)
        # The lines below fetch tensors already defined in the model; use them directly
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # All detection boxes
        boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Confidence score of each detection
        scores = detection_graph.get_tensor_by_name('detection_scores:0')
        # Class of each box
        classes = detection_graph.get_tensor_by_name('detection_classes:0')
        # Number of detections
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        # Run inference
        (boxes, scores, classes, num_detections) = sess.run([boxes, scores, classes, num_detections],
                                                            feed_dict={image_tensor : image_np_expanded})
        # Print the detection results
        print(num_detections)
        print(boxes)
        print(classes)
        print(scores)

        # Visualize the results
        vis_utils.visualize_boxes_and_labels_on_image_array(
            image_np,
            np.squeeze(boxes),
            np.squeeze(classes).astype(np.int32),
            np.squeeze(scores),
            category_index,
            use_normalized_coordinates=True,
            line_thickness=8
        )
        # Display
        plt.figure(figsize=IMAGE_SIZE)
        plt.imshow(image_np)
        plt.show()

Run python leo_object_detection_APP_faster_rcnn_inception_resnet_v2_atrous_COCO_GYY.py in the my_object_detection/ folder. The results:

She can still be recognized with makeup like that? The model might actually be quite capable.

Wait, it even picks out her husband as her? That is a rather different level of capability.

What is going on, every woman gets recognized as GaoYY? The capability is slipping.

Even ChaiJ is recognized as GaoYY? The capability is mediocre.

Even a man is recognized as GaoYY? The capability is limited indeed.
What??? Even Teacher Gao falls victim!? This capability really is quite, quite poor and limited.

10. Summary

The Object Detection API is Google's TensorFlow-based object detection API. The workflow: annotate your data, which fixes the classes before training; convert the annotation xml to csv, and then to TensorFlow's preferred TFRecord; configure the training parameters; train; export the training result as a model; and finally call that model to do object detection.
I annotated data for one specific face, and the training result was within expectations: it detects large targets like faces very well, and even Teacher Gao got detected. In all seriousness, working out how to use the object detection API was just playing around; reading papers and the literature is what really matters.
