Training a Custom Object Detection Model with TensorFlow and Real-Time Video Recognition (Part 2)

Outline

I. Python and TensorFlow installation and environment setup
II. Object Detection API configuration
III. Annotating training samples with LabelImg
IV. Converting the annotated training and validation samples to TFRecord format
V. Model selection and parameter configuration
VI. Running train.py under object_detection to start training
VII. Freezing the training result into a .pb model
VIII. Calling the model for prediction on a video stream

Training a Custom Object Detection Model with TensorFlow and Real-Time Video Recognition (Part 1)

Following on from the previous post, the next steps are:

1. Model selection and parameter configuration

Now that we have TFRecord data that TensorFlow can consume directly, the next step is choosing a model.
The official model zoo at https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md lists plenty of complete, ready-to-use pretrained models.
The speed column in that table gives the official inference time for each model (in practice the real time is usually much higher). It also shows that the ssd_mobilenet_* family really is lightweight and fast (fast, but with lower accuracy than the R-CNN family).
With those trade-offs in mind, pick a suitable config file from the sample configs that ship with the API. The SSD family is the better choice for real-time work; if accuracy is what you are after, go with the faster_rcnn_resnet50 family. In fact the fields we need to edit are the same in every config file (I configured and trained both ssd_mobilenet_v1_coco and faster_rcnn_inception_v2_pets to confirm this). Sticking with ssd_mobilenet_v1_pets.config as the example, copy it into our own directory, open it, and edit the following items from top to bottom:
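For reference, the sample configs live under object_detection\samples\configs in the models repo, so copying one out could look like this (the destination is just my example layout, adjust it to yours):

copy models\research\object_detection\samples\configs\ssd_mobilenet_v1_pets.config D:\PyCharm\raccoon_dataset_sample\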
num_classes: the number of object classes the model should detect.

batch_size: the number of images fed in per training step; adjust it to what your machine can handle.

1. The two checkpoint lines (156 and 157 in the sample config) load the weights of the original ssd_mobilenet_v1_coco model as the starting point for our custom training; they can simply be deleted (a snippet of what they look like follows below).
2. num_steps: the total number of training steps.
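For context, the two checkpoint lines mentioned in item 1 look roughly like this in the sample config (the path is the placeholder the sample ships with); deleting them means training from scratch, while keeping them and pointing the path at a downloaded ssd_mobilenet_v1_coco checkpoint means fine-tuning:

  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  from_detection_checkpoint: true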

These point to the training TFRecord and the evaluation TFRecord respectively, e.g.:

train_input_reader: {
  tf_record_input_reader {
    input_path: "D:\PyCharm\\raccoon_dataset_sample\\data\\train.record"
  }
  label_map_path: "D:\PyCharm\\raccoon_dataset_sample\\object_label_map.pbtxt"
}

eval_config: {
  num_examples: 4
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "D:\PyCharm\\raccoon_dataset_sample\\data\\test.record"
  }
  label_map_path: "D:\PyCharm\\raccoon_dataset_sample\\object_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

The complete config after these edits looks like this:

model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
          anchorwise_output: true
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          anchorwise_output: true
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 1
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "D:\PyCharm\\raccoon_dataset_sample\\data\\train.record"
  }
  label_map_path: "D:\PyCharm\\raccoon_dataset_sample\\object_label_map.pbtxt"
}

eval_config: {
  num_examples: 4
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "D:\PyCharm\\raccoon_dataset_sample\\data\\test.record"
  }
  label_map_path: "D:\PyCharm\\raccoon_dataset_sample\\object_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

As before, use absolute paths with double backslashes to avoid ambiguity and path errors.
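If the escaped backslashes feel fiddly, forward slashes also work for Windows paths here; a variant of the same reader block (my own example, not from the original config) would be:

train_input_reader: {
  tf_record_input_reader {
    input_path: "D:/PyCharm/raccoon_dataset_sample/data/train.record"
  }
  label_map_path: "D:/PyCharm/raccoon_dataset_sample/object_label_map.pbtxt"
}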

The object_label_map.pbtxt file is the label map used for training: write one item per class, with ids starting at 1 and increasing, e.g.:

item {
  id: 1
  name: 'cigarette'
}
item {
  id: 2
  name: 'raccoon'
}

2. Start training

Create a train_dir to hold the data produced during training, and an export_dir to hold the exported model.

In the ****\models\research\object_detection directory, Shift + right-click to open a command window and run:

python legacy\train.py --train_dir=***\train_dir --pipeline_config_path=*****\ssd_mobilenet_v1_pets.config

If you hit an error like:

tensorflow.python.framework.errors_impl.NotFoundError: NewRandomAccessFile failed to Create/Open:  : <garbled Windows message meaning "The system cannot find the path specified">

check your paths and run again. If all goes well, training starts and the long wait begins; a training checkpoint is saved roughly every 10 minutes.

3. Export the model

After training finishes, train_dir contains the following: the events files can be visualized with TensorBoard, and model.ckpt-****.meta is the file we need, where **** is the step number generated during training.
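To watch the loss curves while training (or afterwards), point TensorBoard at train_dir; a typical command (the path is a placeholder) is:

tensorboard --logdir=YOUR_PATH\train_dir

Then open http://localhost:6006 in a browser.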

Again, in the ****\models\research\object_detection directory, Shift + right-click to open a command window and run:

Export the model as a frozen .pb file:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path YOUR_PATH/ssd_mobilenet_v1_pets.config --trained_checkpoint_prefix YOUR_PATH/train_dir/model.ckpt-*****  --output_directory  YOUR_PATH/export_dir/
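If you are not sure which step number to put in place of *****, one option (a small helper of my own, not part of the original post) is to ask TensorFlow for the latest checkpoint prefix in train_dir:

import tensorflow as tf

# prints something like YOUR_PATH/train_dir/model.ckpt-20000
print(tf.train.latest_checkpoint("YOUR_PATH/train_dir"))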

When it finishes, export_dir contains the exported model, including the frozen_inference_graph.pb we will load next.

4. Using the model (with optimizations over the official demo)

The code first:

# -*- coding: utf-8 -*-

import time
start = time.time()
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import cv2

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
import pandas as pd

cv2.namedWindow("frame",0)
cv2.resizeWindow("frame", 640, 480)

if tf.__version__ < '1.4.0':
    raise ImportError('Please upgrade your tensorflow installation to v1.4.* or later!')
os.chdir('D:\\ObjectDetection\\models\\research\\object_detection')

# Env setup
# This is needed to display the images.
# %matplotlib inline
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
# Object detection imports
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = 'YOUR_PB_PATH\\frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('YOUR_PBTXT_PATH', 'object_label_map.pbtxt')
NUM_CLASSES = 3  # YOUR_CLASS_NUM

# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()

# Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES,
                                                            use_display_name=True)
category_index = label_map_util.create_category_index(categories)


# Helper code
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

def video_capture(image_tensor, detection_boxes, detection_scores, detection_classes, num_detections, sess, video_path):
    if video_path == "novideo":
        # 0 is the camera index; with a single camera the default device is 0
        cap = cv2.VideoCapture(0)
    else:
        cap = cv2.VideoCapture(video_path)
    i = 1
    while 1:
        ref, frame = cap.read()
        if ref:
            i = i + 1
            if i % 3 == 0:
                loss_show(image_tensor, detection_boxes, detection_scores, detection_classes, num_detections, frame,
                          sess)
            else:
                cv2.imshow("frame", frame)
            # show the frame for 30 ms; press Esc to quit
            c = cv2.waitKey(30) & 0xff
            if c == 27:  # 27 is the key code for Esc
                cap.release()
                break
        else:
            print("ref == false ")
            break


def init_ogject_detection(video_path):
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            od_graph_def = tf.GraphDef()
            with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')
            # Definite input and output Tensors for detection_graph
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represent how level of confidence for each of the objects.
            # Score is shown on the result image, together with the class label.
            detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
            detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            video_capture(image_tensor, detection_boxes, detection_scores, detection_classes, num_detections, sess, video_path)


def loss_show(image_tensor, detection_boxes, detection_scores, detection_classes, num_detections, image_np, sess):
    starttime = time.time()
    image_np = Image.fromarray(cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB))
    image_np = load_image_into_numpy_array(image_np)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    (boxes, scores, classes, num) = sess.run(
        [detection_boxes, detection_scores, detection_classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})
    #print("--scores--->", scores)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=5,
        min_score_thresh=.6)
    # write images: the annotated frame could be saved to disk here if needed
    print("------------use time ====> ", time.time() - starttime)
    image_np = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
    cv2.imshow("frame", image_np)

def load_pic(path):
    i = 0
    starttime = time.time()
    i = i + 1
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            od_graph_def = tf.GraphDef()
            with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')
                # Definite input and output Tensors for detection_graph
                image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
                # Each box represents a part of the image where a particular object was detected.
                detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
                # Each score represent how level of confidence for each of the objects.
                # Score is shown on the result image, together with the class label.
                detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
                detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
                num_detections = detection_graph.get_tensor_by_name('num_detections:0')
                # TEST_IMAGE_PATHS = os.listdir(os.path.join(image_folder))
                # os.makedirs(output_image_path+image_folder)
                # data = pd.DataFrame()
                # for image_path in TEST_IMAGE_DIRS:


                filelist = os.listdir(path)  # all files in this folder
                print(len(filelist))
                for file in filelist:  # iterate over every file
                    print(file)
                    image = Image.open(path+file)
                    # the array based representation of the image will be used later in order to prepare the
                    # result image with boxes and labels on it.
                    image_np = load_image_into_numpy_array(image)
                    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
                    image_np_expanded = np.expand_dims(image_np, axis=0)
                    # Actual detection.
                    (boxes, scores, classes, num) = sess.run(
                        [detection_boxes, detection_scores, detection_classes, num_detections],
                        feed_dict={image_tensor: image_np_expanded})


                    # Visualization of the results of a detection.
                    vis_util.visualize_boxes_and_labels_on_image_array(
                        image_np,
                        np.squeeze(boxes),
                        np.squeeze(classes).astype(np.int32),
                        np.squeeze(scores),
                        category_index,
                        use_normalized_coordinates=True,
                        line_thickness=8)

                    # show the annotated image
                    #print(str(i), "------------use time ====> ", time.time() - starttime)
                    image_np = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
                    cv2.imshow("pic", image_np)
                    cv2.waitKey(0)


if __name__ == '__main__':
    PIC_PATH = "YOUR_PATH"
    VIDEO_PATH = "YOUR_PATH"  # pass a local video file path, or "novideo" to use the camera
    # prediction over a folder of images
    # load_pic(PIC_PATH)
    # real-time prediction on a video stream
    # init_ogject_detection(VIDEO_PATH)

def visualize_boxes_and_labels_on_image_array(
    image,
    boxes,
    classes,
    scores,
    category_index,
    instance_masks=None,
    instance_boundaries=None,
    keypoints=None,
    use_normalized_coordinates=False,
    max_boxes_to_draw=20,
    min_score_thresh=.5,
    agnostic_mode=False,
    line_thickness=4,
    groundtruth_box_visualization_color='black',
    skip_scores=False,
    skip_labels=False):
  """Overlay labeled boxes on an image with formatted scores and label names.

  This function groups boxes that correspond to the same location
  and creates a display string for each detection and overlays these
  on the image. Note that this function modifies the image in place, and returns
  that same image.

  Args:
    image: uint8 numpy array with shape (img_height, img_width, 3)
    boxes: a numpy array of shape [N, 4]
    classes: a numpy array of shape [N]. Note that class indices are 1-based,
      and match the keys in the label map.
    scores: a numpy array of shape [N] or None.  If scores=None, then
      this function assumes that the boxes to be plotted are groundtruth
      boxes and plot all boxes as black with no classes or scores.
    category_index: a dict containing category dictionaries (each holding
      category index `id` and category name `name`) keyed by category indices.
    instance_masks: a numpy array of shape [N, image_height, image_width] with
      values ranging between 0 and 1, can be None.
    instance_boundaries: a numpy array of shape [N, image_height, image_width]
      with values ranging between 0 and 1, can be None.
    keypoints: a numpy array of shape [N, num_keypoints, 2], can
      be None
    use_normalized_coordinates: whether boxes is to be interpreted as
      normalized coordinates or not.
    max_boxes_to_draw: maximum number of boxes to visualize.  If None, draw
      all boxes.
    min_score_thresh: minimum score threshold for a box to be visualized
    agnostic_mode: boolean (default: False) controlling whether to evaluate in
      class-agnostic mode or not.  This mode will display scores but ignore
      classes.
    line_thickness: integer (default: 4) controlling line width of the boxes.
    groundtruth_box_visualization_color: box color for visualizing groundtruth
      boxes
    skip_scores: whether to skip score when drawing a single detection
    skip_labels: whether to skip label when drawing a single detection

  Returns:
    uint8 numpy array with shape (img_height, img_width, 3) with overlaid boxes.
  """

The prototype above is the drawing helper from object_detection/utils/visualization_utils.py. By default it only draws detections whose score is at least 50%, so you can tune that to your own needs, e.g. change the call to:

 vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8,
        min_score_thresh=.6)

so that only detections scoring above 60% get a box drawn.

The main optimization is for the real-time video case. The original logic opened the video first and then re-entered the following with ... with ... block for every single frame (the idea for the change came from a Stack Overflow Q&A):

with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:

It is changed to load the model first and keep a single session alive, and only then start per-frame prediction on the video. The time per 640*480 frame drops from about 2 seconds to 0.3-0.5 seconds now, a very noticeable speedup!
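Stripped of the drawing details, the pattern is just this (a minimal sketch, assuming detection_graph already has the frozen graph imported, as in the code above):

import cv2
import numpy as np
import tensorflow as tf

def run_video(detection_graph, video_source=0):
    cap = cv2.VideoCapture(video_source)
    with detection_graph.as_default():
        # one session for the whole video instead of one per frame
        with tf.Session(graph=detection_graph) as sess:
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            boxes_t = detection_graph.get_tensor_by_name('detection_boxes:0')
            scores_t = detection_graph.get_tensor_by_name('detection_scores:0')
            classes_t = detection_graph.get_tensor_by_name('detection_classes:0')
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                # the same session is reused for every frame
                boxes, scores, classes = sess.run(
                    [boxes_t, scores_t, classes_t],
                    feed_dict={image_tensor: np.expand_dims(rgb, axis=0)})
                # ... draw the results on the frame and cv2.imshow it ...
    cap.release()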

Feel free to test it and optimize further!
