A tutorial on implementing video object detection with Google's open-source TensorFlow Object Detection API



This tutorial implements Google's recently released TensorFlow Object Detection API for video object detection on Windows 10; other platforms can also use it as a reference.
It filters and consolidates material from around the web (reference links are attached at the end of the post) and is intended as a quick guide to setting up the environment and getting video object detection running; for more details on the API itself, please search on your own.
Note: the Windows user name must not contain Chinese characters!
Install Python
Note: TensorFlow on Windows only supports Python 3.5.x.
Go to the Python 3.5.2 download page, choose the Windows installer under Files, then download and install it.

Install TensorFlow
Go to the TensorFlow on Windows download page; this tutorial uses the simplest combination: CPU support only + native pip.
Open cmd and enter the following command to download and install TensorFlow; it ends up under python\Lib\site-packages\tensorflow:
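For the CPU support only + native pip combination, the command given on the TensorFlow on Windows page should look roughly like this:

pip3 install --upgrade tensorflow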

Open IDLE and enter the following:
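A typical check is the "Hello, TensorFlow" snippet from the install guide (TensorFlow 1.x style, matching the version used here):

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))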

If you see output like the following, the installation succeeded:

If anything goes wrong, consult the FAQ at the bottom of the TensorFlow on Windows download page.

Install Protoc
Protoc is used to compile files the API needs at run time. Go to the Protoc download page and download the archive whose name contains win32, similar to the one shown below.

After extracting it, copy protoc.exe from the bin folder into c:\windows\system32 (a shortcut for putting the directory that contains protoc.exe on the PATH).
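As a quick sanity check, you can open a new cmd window and confirm that protoc is found:

protoc --version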
Install git
Go to the git website and download git for Windows; for detailed installation and configuration notes, see the article listed in the references.

Install the remaining components
In cmd, enter the following commands to download and install the supporting components the API needs:
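Per the Object Detection API installation instructions, the dependencies are pillow, lxml, jupyter and matplotlib, installed roughly as:

pip3 install pillow
pip3 install lxml
pip3 install jupyter
pip3 install matplotlib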

Note: native pip can be affected by other Python installations on the machine. The author had previously installed Anaconda for simulation work, so jupyter and the other components were installed into Anaconda's site-packages folder and later failed to import.
Download the code and compile it
Enter the following in cmd:
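Cloning the repository would presumably be done with:

git clone https://github.com/tensorflow/models.git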

This clones Google's tensorflow/models code from GitHub, by default onto the C drive.
Still in cmd, change into the models folder and compile the Object Detection API code:
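The Protobuf compilation step documented for the Object Detection API, run from the models directory, is:

protoc object_detection/protos/*.proto --python_out=.

(On Windows the * wildcard may not be expanded automatically; if so, list the .proto files one by one.)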

Run the notebook demo
Still in the models folder, run the following command:
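Launching the notebook server is simply:

jupyter notebook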

The browser opens automatically and shows a page like this:

Open object_detection_tutorial.ipynb inside the object_detection folder:

Click Run All in the Cell menu, wait roughly three minutes (the author's machine is nearly scrap), and results like the following appear:

 

Change the file paths and you can run detection on your own images:

 

 

Note: the image file names must match what the code expects, e.g. image1.jpg.
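For reference, the tutorial notebook (reproduced in full at the end of this post) builds its image list as follows, so your own files must follow the same naming pattern unless you change it:

PATH_TO_TEST_IMAGES_DIR = 'object_detection/test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]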

The TensorFlow Object Detection API provides five detection models that can be called directly; the default is the simplest one, ssd + mobilenet.
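The default model name, as it appears in the notebook code at the end of this post, is:

MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'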

 

To use one of the other models, simply change MODEL_NAME to one of the following values:
MODEL_NAME = 'ssd_inception_v2_coco_11_06_2017'
MODEL_NAME = 'rfcn_resnet101_coco_11_06_2017'
MODEL_NAME = 'faster_rcnn_resnet101_coco_11_06_2017'
MODEL_NAME = 'faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017'

Switching the model to faster_rcnn_inception_resnet gives the following results:

 

Accuracy does improve dramatically, but speed drops: on the author's aging machine it takes about five minutes to produce a result.
Video object detection
Google has published the complete code for this project on GitHub; next we add the necessary pieces on top of that code to detect objects in video.
Step 1: install the opencv cv2 package
The opencv packages for Python can be downloaded from the Python website; follow the link to the download page.

The version the author installed is shown below:

After downloading it, run the install command in cmd:
pip install opencv_python-3.2.0.8-cp35-cp35m-win_amd64.whl

Once the install finishes, open IDLE and enter:
import cv2

If no error is raised, the opencv-python library imported correctly and the environment is ready.
Step 2: import the cv2 package in the original code
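Concretely, this just means adding the import near the top of the notebook, next to the other imports:

import cv2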

Step 3: add the video detection code
The main steps are:
1. Use the VideoFileClip function to grab frames from the video.
2. Use the fl_image function to replace each original frame with the modified one, feeding every grabbed frame through the object detector.
3. All of the modified frames are combined into a new video.
Append the following code to the end of the original notebook, in order (you can copy it from the complete code and make a few changes, or simply copy the modified code below):

# Import everything needed to edit/save/watch video clips
import imageio
imageio.plugins.ffmpeg.download()
from moviepy.editor import VideoFileClip
from IPython.display import HTML

This step downloads ffmpeg.win32.exe, which is required for editing the clips. The download is easily interrupted on a restricted network, so you can fetch the file with a download manager and place it at the following path:
C:\Users\<username>\AppData\Local\imageio\ffmpeg\ffmpeg.win32.exe
def detect_objects(image_np, sess, detection_graph):
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for each of the objects.
    # The score is shown on the result image, together with the class label.
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    # Actual detection.
    (boxes, scores, classes, num_detections) = sess.run(
        [boxes, scores, classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8)
    return image_np


def process_image(image):
    # NOTE: The output you return should be a color image (3 channel) for processing video below
    # you should return the final output (image with lines are drawn on lanes)
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            image_process = detect_objects(image, sess, detection_graph)
            return image_process

white_output = 'video1_out.mp4'
clip1 = VideoFileClip("video1.mp4").subclip(25,30)
white_clip = clip1.fl_image(process_image)  # NOTE: this function expects color images!
%time white_clip.write_videofile(white_output, audio=False)

Here video1.mp4 has already been copied into the object_detection folder; subclip(25,30) means that only the 25s-30s segment of the video is processed.
HTML("""
<video width="960" height="540" controls>
  <source src="{0}">
</video>
""".format(white_output))

Original video:

The video after detection:

from moviepy.editor import *
clip1 = VideoFileClip("video1_out.mp4")
clip1.write_gif("final.gif")

This exports the processed video as a gif and saves it to the object_detection folder.
That concludes this quick tutorial; you should now be able to use Google's open API to detect objects in video.
References
Zhihu: He Zhiyuan's answer to "How good is Google's open TensorFlow Object Detection API?"
Lin Junyu's blog: importing the opencv-python library
myboyliu2007's column: how to install ffmpeg
Chen Qiang: installing protocol buffers in detail
Synced (机器之心): how to build a video object detection system with the TensorFlow API
Installing git on Windows and configuring the environment variables

Source: <http://m.blog.csdn.net/xiaoxiao123jun/article/details/76605928>

In [1]:
import os
import cv2
import time
import argparse
import multiprocessing
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
%matplotlib inline
In [2]:
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
In [3]:
CWD_PATH = os.getcwd()
# Path to frozen detection graph. This is the actual model that is used for the object detection.
MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
PATH_TO_CKPT = os.path.join(CWD_PATH, 'object_detection', MODEL_NAME, 'frozen_inference_graph.pb')
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join(CWD_PATH, 'object_detection', 'data', 'mscoco_label_map.pbtxt')
In [4]:
NUM_CLASSES = 90
# Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES,
                                                            use_display_name=True)
category_index = label_map_util.create_category_index(categories)
In [5]:
def detect_objects(image_np, sess, detection_graph):
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was detected.
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represent how level of confidence for each of the objects.
    # Score is shown on the result image, together with the class label.
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
# Actual detection.
    (boxes, scores, classes, num_detections) = sess.run(
        [boxes, scores, classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})
# Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8)
    return image_np
In [6]:
# First test on images
PATH_TO_TEST_IMAGES_DIR = 'object_detection/test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]
# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
In [7]:
def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)
In [8]:
from PIL import Image
for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    image_np = load_image_into_numpy_array(image)
    plt.imshow(image_np)
    print(image.size, image_np.shape)
(1024, 636) (636, 1024, 3)
(1352, 900) (900, 1352, 3)
In [9]:
#Load a frozen TF model
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
In [10]:
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        for image_path in TEST_IMAGE_PATHS:
            image = Image.open(image_path)
            image_np = load_image_into_numpy_array(image)
            image_process = detect_objects(image_np, sess, detection_graph)
            print(image_process.shape)
            plt.figure(figsize=IMAGE_SIZE)
            plt.imshow(image_process)
(636, 1024, 3)
(900, 1352, 3)

In [11]:
# Import everything needed to edit/save/watch video clips
from moviepy.editor import VideoFileClip
from IPython.display import HTML
In [12]:
def process_image(image):
    # NOTE: The output you return should be a color image (3 channel) for processing video below
    # you should return the final output (image with lines are drawn on lanes)
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            image_process = detect_objects(image, sess, detection_graph)
            return image_process
In [13]:
white_output = 'video1_out.mp4'
clip1 = VideoFileClip("video1.mp4").subclip(0,2)
white_clip = clip1.fl_image(process_image) #NOTE: this function expects color images!!s
%time white_clip.write_videofile(white_output, audio=False)
[MoviePy] >>>> Building video video1_out.mp4
[MoviePy] Writing video video1_out.mp4
100%|██████████| 48/48 [00:33<00:00,  1.44it/s]
[MoviePy] Done.
[MoviePy] >>>> Video ready: video1_out.mp4
CPU times: user 1min 6s, sys: 3.29 s, total: 1min 10s
Wall time: 33.9 s

In [14]:
HTML("""
<video width="960" height="540" controls>
  <source src="{0}">
</video>
""".format(white_output))
Out[14]:
In [38]:
white_output1 = 'cars_out.mp4'
clip1 = VideoFileClip("cars.mp4").subclip(0,2)
white_clip = clip1.fl_image(process_image) #NOTE: this function expects color images!!s
%time white_clip.write_videofile(white_output1, audio=False)
[MoviePy] >>>> Building video cars_out.mp4
[MoviePy] Writing video cars_out.mp4
100%|██████████| 60/60 [00:44<00:00,  1.38it/s]
[MoviePy] Done.
[MoviePy] >>>> Video ready: cars_out.mp4
CPU times: user 1min 25s, sys: 4.6 s, total: 1min 30s
Wall time: 44.4 s

In [39]:
HTML("""
<video width="960" height="540" controls>
  <source src="{0}">
</video>
""".format(white_output1))
Out[39]:
In [17]:
white_output2 = 'fruits1_out.mp4'
clip2 = VideoFileClip("fruits1.mp4").subclip(0,1)
white_clip = clip2.fl_image(process_image) #NOTE: this function expects color images!!s
%time white_clip.write_videofile(white_output2, audio=False)
[MoviePy] >>>> Building video fruits1_out.mp4
[MoviePy] Writing video fruits1_out.mp4
100%|██████████| 30/30 [00:21<00:00,  1.42it/s]
[MoviePy] Done.
[MoviePy] >>>> Video ready: fruits1_out.mp4
CPU times: user 41.9 s, sys: 1.65 s, total: 43.6 s
Wall time: 21.3 s

In [18]:
HTML("""
<video width="960" height="540" controls>
  <source src="{0}">
</video>
""".format(white_output2))
Out[18]:
In [40]:
white_output3 = 'dog_out.mp4'
clip3 = VideoFileClip("dog.mp4").subclip(12,14)
white_clip = clip3.fl_image(process_image) #NOTE: this function expects color images!!s
%time white_clip.write_videofile(white_output3, audio=False)
[MoviePy] >>>> Building video dog_out.mp4
[MoviePy] Writing video dog_out.mp4
100%|██████████| 60/60 [00:43<00:00,  1.43it/s]
[MoviePy] Done.
[MoviePy] >>>> Video ready: dog_out.mp4
CPU times: user 1min 25s, sys: 3.58 s, total: 1min 29s
Wall time: 43.7 s

In [41]:
HTML("""
<video width="960" height="540" controls>
  <source src="{0}">
</video>
""".format(white_output3))
Out[41]:
In [42]:
# Merge videos
from moviepy.editor import VideoFileClip, concatenate_videoclips
clip1 = VideoFileClip("cars_out.mp4")
clip2 = VideoFileClip("fruits1_out.mp4")
clip3 = VideoFileClip("dog_out.mp4")
final_clip = concatenate_videoclips([clip1,clip2,clip3], method="compose")
final_clip.write_videofile("my_concatenation.mp4",bitrate="5000k")
[MoviePy] >>>> Building video my_concatenation.mp4
[MoviePy] Writing video my_concatenation.mp4
100%|██████████| 150/150 [00:00<00:00, 189.12it/s]
[MoviePy] Done.
[MoviePy] >>>> Video ready: my_concatenation.mp4
In [43]:
from moviepy.editor import *
clip = VideoFileClip("my_concatenation.mp4")
clip.write_gif("final.gif")
[MoviePy] Building file final.gif with imageio
100%|██████████| 151/151 [00:11<00:00, 12.35it/s]
In [ ]:


Source: <https://render.githubusercontent.com/view/ipynb?commit=b53a7940c560f54a52f6cbce0e66c06696a65685&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f70726979612d647769766564692f446565702d4c6561726e696e672f623533613739343063353630663534613532663663626365306536366330363639366136353638352f4f626a6563745f446574656374696f6e5f54656e736f72666c6f775f4150492e6970796e62&nwo=priya-dwivedi%2FDeep-Learning&path=Object_Detection_Tensorflow_API.ipynb&repository_id=65932217&repository_type=Repository>

 
