For installation, follow the official tutorial.
Note that during installation you need to upgrade protoc to a 3.* version, otherwise compilation will fail, possibly with errors like:
- cannot import name 'preprocessor_pb2'
- cannot import name string_int_label_map_pb2
- Import "object_detection/protos/ssd.proto" was not found or had errors.
Be sure to compile the object_detection/protos folder first (from tensorflow/models/research/, run protoc object_detection/protos/*.proto --python_out=.), otherwise you will hit these errors.
1. Training
1.1 Creating the labelmap.pbtxt file
This follows the official code; adapt the intermediate steps to your own data.
import pandas as pd


def create_labelmap(word_count_file="../data/sub_obj_word_count.txt",
                    labelmap_outfile="../data/labelmap.pbtxt"):
    """Write a TF Object Detection label map from a word-count file.

    :param word_count_file: CSV with one "name,count" row per object class
    :param labelmap_outfile: path of the labelmap.pbtxt file to write
    :return: dict mapping each class name to its 1-based id
    """
    df = pd.read_csv(word_count_file, header=None,
                     names=["obj_name", "obj_cnt"])
    objects = df.obj_name.tolist()

    class_map = {}
    # Open once in write mode so rerunning does not append duplicate entries.
    with open(labelmap_outfile, "w") as f:
        for idx, name in enumerate(objects):
            # Ids are 1-based: id 0 is reserved for the background class.
            f.write("item {\n")
            f.write("  id: %d\n" % (idx + 1))
            f.write("  name: '%s'\n" % name)
            f.write("}\n\n")
            class_map[name] = idx + 1
    return class_map
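For reference, each entry this writes to labelmap.pbtxt has the form below ('person' is just an illustrative class name; ids start at 1 because 0 is reserved for the background class):

item {
  id: 1
  name: 'person'
}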
1.2 Creating the TFRecord files
import tensorflow as tf

from object_detection.utils import dataset_util

flags = tf.app.flags
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS


def create_tf_example(example):
    # TODO(user): Populate the following variables from your example.
    height = None  # Image height
    width = None  # Image width
    filename = None  # Filename of the image. Empty if image is not from file
    encoded_image_data = None  # Encoded image bytes
    image_format = None  # b'jpeg' or b'png'

    xmins = []  # List of normalized left x coordinates in bounding box (1 per box)
    xmaxs = []  # List of normalized right x coordinates in bounding box (1 per box)
    ymins = []  # List of normalized top y coordinates in bounding box (1 per box)
    ymaxs = []  # List of normalized bottom y coordinates in bounding box (1 per box)
    classes_text = []  # List of string class name of bounding box (1 per box)
    classes = []  # List of integer class id of bounding box (1 per box)

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_image_data),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)

    # TODO(user): Write code to read in your dataset to examples variable

    for example in examples:
        tf_example = create_tf_example(example)
        writer.write(tf_example.SerializeToString())

    writer.close()


if __name__ == '__main__':
    tf.app.run()
Alternatively, you can put your labels into a CSV file with the following format:
filename | width | height | class | xmin | ymin | xmax | ymax |
---|---|---|---|---|---|---|---|
cam_image1.jpg | 480 | 270 | queen | 173 | 24 | 260 | 137 |
cam_image1.jpg | 480 | 270 | queen | 165 | 135 | 253 | 251 |
cam_image1.jpg | 480 | 270 | ten | 255 | 96 | 337 | 208 |
cam_image10.jpg | 960 | 540 | ten | 501 | 116 | 700 | 353 |
cam_image10.jpg | 960 | 540 | queen | 261 | 124 | 453 | 370 |
cam_image11.jpg | 960 | 540 | nine | 225 | 96 | 490 | 396 |
cam_image12.jpg | 960 | 540 | king | 362 | 149 | 560 | 389 |
cam_image13.jpg | 960 | 540 | jack | 349 | 142 | 550 | 388 |
cam_image14.jpg | 960 | 540 | jack | 297 | 167 | 512 | 420 |
cam_image15.jpg | 960 | 540 | ace | 367 | 181 | 589 | 457 |
cam_image16.jpg | 960 | 540 | ace | 303 | 155 | 525 | 456 |
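If all labels start in a single CSV, a quick way to split it by image into train and test label files is the sketch below (labels.csv, the output file names, and the 80/20 ratio are assumptions):

import numpy as np
import pandas as pd

df = pd.read_csv("labels.csv")  # hypothetical all-labels file
filenames = df.filename.unique()
np.random.seed(1)  # reproducible split
np.random.shuffle(filenames)
cut = int(0.8 * len(filenames))
# Split by filename so all boxes of an image land in the same set.
df[df.filename.isin(filenames[:cut])].to_csv("train_labels.csv", index=False)
df[df.filename.isin(filenames[cut:])].to_csv("test_labels.csv", index=False)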
At this point you need three files: the labelmap plus the train and test label CSVs. Then use the following script to generate the TFRecord files:
"""
Usage:
# From tensorflow/models/
# Create train data:
python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train_img --output_path=train.record
# Create test data:
python generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test_img --output_path=test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import
import os
import io
import pandas as pd
import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict
flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('image_dir', '', 'Path to the image directory')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS
# TODO: replace this hard-coded path with a proper label map lookup.
def class_text_to_int(row_label):
    words = pd.read_csv("/home/jamesben/relationship_vrd/data/sub_obj_word_count.txt",
                        header=None, names=["name", "freq"]).name.tolist()
    word2ix = {y: x for x, y in enumerate(words)}
    # Add 1 so the ids match the 1-based labelmap from section 1.1
    # (id 0 is reserved for background).
    return word2ix[row_label] + 1


def split(df, group):
    # Group the CSV rows by filename so each image yields one tf.Example.
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x))
            for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        # Box coordinates are stored normalized to [0, 1].
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), FLAGS.image_dir)
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()
Then run the two commands from the usage docstring above to generate train.record and test.record. This script is the recommended way to create them.
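To sanity-check a generated file, you can read back the first record and print a few of its features; a minimal sketch using the same TF 1.x APIs as the script above (assumes train.record is in the current directory):

import tensorflow as tf

# Iterate over the serialized records and decode the first one.
for record in tf.python_io.tf_record_iterator("train.record"):
    example = tf.train.Example()
    example.ParseFromString(record)
    feat = example.features.feature
    print(feat['image/filename'].bytes_list.value[0],
          feat['image/height'].int64_list.value[0],
          feat['image/width'].int64_list.value[0])
    break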
1.3 Editing a samples/configs/*.config file
This file configures the model, the training schedule, and the input/output settings. The key fields to edit are num_classes under model, fine_tune_checkpoint under train_config, plus the train_input_reader, eval_config, and eval_input_reader sections.
model {
  faster_rcnn {
    num_classes: 100
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "test_ckpt/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "object_detection/vrd_tfrecord/vrd_train.record"
  }
  label_map_path: "object_detection/data/vrd_labelmap.pbtxt"
}
eval_config: {
  num_examples: 955  # note: this must equal the number of images in the eval set
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "object_detection/vrd_tfrecord/vrd_val.record"
  }
  label_map_path: "object_detection/data/vrd_labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}
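Before training, it can be worth parsing the edited config with the API's config utility to catch syntax errors early; a sketch, assuming object_detection is importable and your version provides config_util.get_configs_from_pipeline_file (the config path is hypothetical):

from object_detection.utils import config_util

# Parses the pipeline config into its component protos.
configs = config_util.get_configs_from_pipeline_file(
    "samples/configs/faster_rcnn_resnet101_coco.config")
print(configs["model"].faster_rcnn.num_classes)      # should match the label map
print(configs["train_config"].fine_tune_checkpoint)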
1.4 Setting the command-line flags for training
Pass the following flags to the training script (the full command is shown in section 3):
--train_dir=train_dir \
--pipeline_config_path=pipeline_config_path
2. Evaluating the trained model
2.1 Exporting the trained ckpt model to a pb file
After training, you will have the following three files:
- model.ckpt-${CHECKPOINT_NUMBER}.data-00000-of-00001
- model.ckpt-${CHECKPOINT_NUMBER}.index
- model.ckpt-${CHECKPOINT_NUMBER}.meta
Run export_inference_graph.py:
# From tensorflow/models/research/
python object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path path/to/ssd_inception_v2.config \
--trained_checkpoint_prefix path/to/model.ckpt-369 \
    --output_directory path/to/exported_model_directory
A frozen_inference_graph.pb file will then appear in the output_directory.
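To confirm the export succeeded, the frozen graph can be loaded back into a graph definition; a minimal sketch (TF 1.x, using the output path from the command above):

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("path/to/exported_model_directory/frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")  # raises if the graph is malformed
print("loaded %d nodes" % len(graph_def.node))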
2.2 Running inference
Run infer_detections:
# From tensorflow/models/research/oid
SPLIT=validation # or test
TF_RECORD_FILES=$(ls -1 ${SPLIT}_tfrecords/* | tr '\n' ',')  # collect all TFRecord files, comma-separated
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.inference.infer_detections \
--input_tfrecord_paths=$TF_RECORD_FILES \
  --output_tfrecord_path=${SPLIT}_detections.tfrecord \
--inference_graph=faster_rcnn_inception_resnet_v2_atrous_oid/frozen_inference_graph.pb \
  --discard_image_pixels  # detections are only used to compute mAP, so image pixels need not be stored
When it finishes, you will get a validation_detections.tfrecord file, which is used to compute mAP.
2.3 Generating the metrics configuration files
# From tensorflow/models/research/oid
SPLIT=validation # or test
NUM_SHARDS=1  # set to the number of shards if inference was run in parallel (e.g. one per GPU)
mkdir -p ${SPLIT}_eval_metrics
echo "
label_map_path: '../object_detection/data/oid_bbox_trainable_label_map.pbtxt'
tf_record_input_reader: { input_path: '${SPLIT}_detections.tfrecord@${NUM_SHARDS}' }
" > ${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
echo "
metrics_set: 'coco_detection_metrics'
" > ${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt
The metrics_set field supports the following options:
- pascal_voc_detection_metrics
- weighted_pascal_voc_detection_metrics
- pascal_voc_instance_segmentation_metrics
- open_images_detection_metrics
- coco_detection_metrics
- coco_mask_metrics
After this script runs, two configuration files are generated:
- validation_eval_config.pbtxt
- validation_input_config.pbtxt
Both are used when computing the evaluation results.
2.4 Computing the evaluation metrics
Run the following script:
# From tensorflow/models/research/oid
SPLIT=validation # or test
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.metrics.offline_eval_map_corloc \
--eval_dir=${SPLIT}_eval_metrics \
--eval_config_path=${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt \
--input_config_path=${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
When it finishes, the evaluation results are printed and also written to a metrics.csv file in the eval directory.
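The CSV can then be inspected with pandas, for example (the path follows the eval_dir used above; the exact layout of the file may vary by API version, so treat this as a sketch):

import pandas as pd

metrics = pd.read_csv("validation_eval_metrics/metrics.csv")
print(metrics)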
3. Monitoring training and overfitting in TensorBoard
To monitor progress in TensorBoard, organize your data as the official docs require:
+ data(folder)
    - label_map file
    - train TFRecord file
    - eval TFRecord file
+ models(folder)
    + model(folder)
        - pipeline config file
        + train(folder)
        + eval(folder)
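A throwaway helper to create this layout (the workspace root name here is an assumption):

import os

root = "workspace"  # hypothetical root directory
for d in ("data", "models/model/train", "models/model/eval"):
    # makedirs creates the intermediate folders too.
    os.makedirs(os.path.join(root, d))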
Then start training with:
# From the tensorflow/models/research/ directory
python object_detection/train.py \
--logtostderr \
--pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
--train_dir=${PATH_TO_TRAIN_DIR}
Here ${PATH_TO_YOUR_PIPELINE_CONFIG} is the path to the config file above, and ${PATH_TO_TRAIN_DIR} is the directory where training checkpoints and events are written, i.e. the train folder above.
While training runs, start the evaluation job:
# From the tensorflow/models/research/ directory
python object_detection/eval.py \
--logtostderr \
--pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
--checkpoint_dir=${PATH_TO_TRAIN_DIR} \
--eval_dir=${PATH_TO_EVAL_DIR}
The evaluation job periodically picks up the latest checkpoint in the train directory and evaluates it on the test data. ${PATH_TO_YOUR_PIPELINE_CONFIG} is the path of the config file, ${PATH_TO_TRAIN_DIR} is the directory containing the training checkpoints, and ${PATH_TO_EVAL_DIR} is the directory where evaluation event files are written.
Once both jobs are running, you can inspect the model in TensorBoard. Change into the models directory above and run:
tensorboard --logdir=${PATH_TO_MODEL_DIRECTORY}
Here ${PATH_TO_MODEL_DIRECTORY} is the parent of the train and eval directories, i.e. the model folder above. TensorBoard will then show the train and eval losses as well as the mAP.