How to play Quidditch using the TensorFlow Object Detection API

by Bharath Raj

Deep Learning never ceases to amaze me. It has had a profound impact on several domains, beating benchmarks left and right.

Image classification using convolutional neural networks (CNNs) is fairly easy today, especially with the advent of powerful front-end wrappers such as Keras with a TensorFlow back-end. But what if you want to identify more than one object in an image?

This problem is called “object localization and detection.” It is much more difficult than simple classification. In fact, until 2015, image localization using CNNs was very slow and inefficient. Check out this blog post by Dhruv to read about the history of object detection in Deep Learning, if you’re interested.

Sounds cool. But is it hard to code?

Worry not, TensorFlow's Object Detection API comes to the rescue! They have done most of the heavy lifting for you. All you need to do is prepare the dataset and set some configurations. You can then train your model and use it for inference.

TensorFlow also provides pre-trained models, trained on the MS COCO, Kitti, or Open Images datasets. You could use them as is, if you just want standard object detection. The drawback is that they are pre-defined: they can only predict the classes defined by those datasets.

But, what if you wanted to detect something that’s not on the possible list of classes? That’s the purpose of this blog post. I will guide you through creating your own custom object detection program, using a fun example of Quidditch from the Harry Potter universe! (For all you Star Wars fans, here’s a similar blog post that you might like).

Getting started

Start by cloning my GitHub repository, found here. This will be your base directory. All the files referenced in this blog post are available in the repository.

Alternatively, you can clone the TensorFlow models repo. If you choose the latter, you only need the folders named “slim” and “object_detection,” so feel free to remove the rest. Don’t rename anything inside these folders (unless you’re sure it won’t mess with the code).

Dependencies

Assuming you have TensorFlow installed, you may need to install a few more dependencies, which you can do by executing the following in the base directory:

pip install -r requirements.txt

The API uses Protobufs to configure and train model parameters. We need to compile the Protobuf libraries before using them. First, you have to install the Protobuf Compiler using the below command:

sudo apt-get install protobuf-compiler

Now, you can compile the Protobuf libraries using the following command:

protoc object_detection/protos/*.proto --python_out=.

You need to append the path of your base directory, as well as your slim directory to your Python path variable. Note that you have to complete this step every time you open a new terminal. You can do so by executing the below command. Alternatively, you can add it to your ~/.bashrc file to automate the process.

export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Preparing the inputs

My motive was pretty straightforward. I wanted to build a Quidditch Seeker using TensorFlow. Specifically, I wanted to write a program to locate the snitch in every frame.

But then, I decided to up the stakes. How about trying to identify all the moving pieces of equipment used in Quidditch?

We start by preparing the label_map.pbtxt file. This would contain all the target label names as well as an ID number for each label. Note that the label ID should start from 1. Here’s the content of the file that I used for my project.

item { id: 1 name: 'snitch' }
item { id: 2 name: 'quaffle' }
item { id: 3 name: 'bludger' }

Now, it's time to collect the dataset.

Fun! Or boring, depending on your taste, but it’s a mundane task all the same.

I collected the dataset by sampling all the frames from a Harry Potter video clip, using a small code snippet I wrote with the OpenCV framework. Once that was done, I used another code snippet to randomly sample 300 images from the dataset. The code snippets are available in utils.py in my GitHub repo if you would like to do the same.

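The exact snippets are in utils.py, but here is a minimal sketch of the idea, assuming a hypothetical clip named clip.mp4 and folders named frames and images (these names are placeholders, not necessarily what the repo uses):

import os
import random
import shutil

import cv2  # OpenCV

os.makedirs('frames', exist_ok=True)
os.makedirs('images', exist_ok=True)

# Dump every frame of the clip to disk.
cap = cv2.VideoCapture('clip.mp4')
count = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imwrite(os.path.join('frames', '%d.jpg' % count), frame)
    count += 1
cap.release()

# Randomly keep 300 frames for annotation.
for name in random.sample(os.listdir('frames'), 300):
    shutil.copy(os.path.join('frames', name), os.path.join('images', name))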

You heard me right. Only 300 images. Yeah, my dataset wasn’t huge. That’s mainly because I can’t afford to annotate a lot of images. If you want, you can opt for paid services like Amazon Mechanical Turk to annotate your images.

Annotations

Every image localization task requires ground truth annotations. The annotations used here are XML files with 4 coordinates representing the location of the bounding box surrounding an object, and its label. We use the Pascal VOC format. A sample annotation would look like this:

<annotation>
  <filename>182.jpg</filename>
  <size>
    <width>1280</width>
    <height>586</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>bludger</name>
    <bndbox>
      <xmin>581</xmin>
      <ymin>106</ymin>
      <xmax>618</xmax>
      <ymax>142</ymax>
    </bndbox>
  </object>
  <object>
    <name>quaffle</name>
    <bndbox>
      <xmin>127</xmin>
      <ymin>406</ymin>
      <xmax>239</xmax>
      <ymax>526</ymax>
    </bndbox>
  </object>
</annotation>

You might be thinking, “Do I really need to go through the pain of manually typing in annotations in XML files?” Absolutely not! There are tools which let you use a GUI to draw boxes over objects and annotate them. Fun! LabelImg is an excellent tool for Linux/Windows users. Alternatively, RectLabel is a good choice for Mac users.

A few footnotes before you start collecting your dataset:

  • Do not rename your image files after you annotate them. The code tries to look up an image using the file name specified inside your XML file (which LabelImg automatically fills in with the image file name). Also, make sure your image and XML files have the same name.

  • Make sure you resize the images to the desired size before you start annotating them. If you do so later on, the annotations will not make sense, and you will have to scale the annotation values inside the XMLs (a small resize sketch is shown after this list).

  • LabelImg may output some extra elements to the XML file (such as <pose>, <truncated>, <path>). You do not need to remove those, as they won't interfere with the code.

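On the resizing footnote above, here is a minimal sketch of resizing every file in the images folder before annotation, again with OpenCV (the target size is purely illustrative; pick whatever size suits your data):

import os

import cv2

TARGET_W, TARGET_H = 1280, 720  # illustrative size, not a requirement of the API

for name in os.listdir('images'):
    path = os.path.join('images', name)
    img = cv2.imread(path)
    if img is None:  # skip anything that isn't an image
        continue
    cv2.imwrite(path, cv2.resize(img, (TARGET_W, TARGET_H)))  # overwrite in place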

In case you mess anything up, the utils.py file has some utility functions that can help you out. If you just want to give Quidditch a shot, you could download my annotated dataset instead. Both are available in my GitHub repository.

Lastly, create a text file named trainval.txt. It should contain the names of all your image/XML files, without extensions. For instance, if you have img1.jpg, img2.jpg and img1.xml, img2.xml in your dataset, your trainval.txt file should look like this:

img1
img2
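
If you would rather not type this out by hand, a small sketch like the following can generate the file from the images folder (it simply strips the file extensions, and assumes every image has a matching XML):

import os

with open('trainval.txt', 'w') as f:
    for name in sorted(os.listdir('images')):
        base, ext = os.path.splitext(name)
        if ext.lower() in ('.jpg', '.jpeg', '.png'):
            f.write(base + '\n')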

Separate your dataset into two folders, namely images and annotations. Place the label_map.pbtxt and trainval.txt inside your annotations folder. Create a folder named xmls inside the annotations folder and place all your XMLs inside that. Your directory hierarchy should look something like this:

-base_directory
|-images
|-annotations
||-xmls
||-label_map.pbtxt
||-trainval.txt

The API accepts inputs in the TFRecords file format. Worry not, you can easily convert your current dataset into the required format with the help of a small utility function. Use the create_tf_record.py file provided in my repo to convert your dataset into TFRecords. You should execute the following command in your base directory:

python create_tf_record.py \
    --data_dir=`pwd` \
    --output_dir=`pwd`

You will find two files, train.record and val.record, after the program finishes its execution. The standard dataset split is 70% for training and 30% for validation. You can change the split fraction in the main() function of the file if needed.

Training the model

Whew, that was a rather long process to get things ready. The end is near. We need to select a localization model to train. The problem is, there are so many options to choose from. Each varies in performance in terms of speed and accuracy. You have to choose the right model for the right job. If you wish to learn more about the trade-off, this paper is a good read.

In short, SSDs are fast but may fail to detect smaller objects with decent accuracy, whereas Faster RCNNs are relatively slower and larger, but have better accuracy.

The TensorFlow Object Detection API has provided us with a bunch of pre-trained models. It is highly recommended to initialize training using a pre-trained model. It can heavily reduce the training time.

Download one of these models, and extract the contents into your base directory. Since I was more focused on the accuracy, but also wanted a reasonable execution time, I chose the ResNet-50 version of the Faster RCNN model. After extraction, you will receive the model checkpoints, a frozen inference graph, and a pipeline.config file.

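If you would rather script the extraction, a one-off sketch like this does the job (the archive name below is only an example; use whichever file you actually downloaded from the model zoo):

import tarfile

# Example archive name -- replace with the file you downloaded.
with tarfile.open('faster_rcnn_resnet50_coco_2018_01_28.tar.gz') as tar:
    tar.extractall('.')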

One last thing remains! You have to define the “training job” in the pipeline.config file. Place the file in the base directory. What really matters is the last few lines of the file — you only need to set the highlighted values to your respective file locations.

  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "model.ckpt"
  from_detection_checkpoint: true
  num_steps: 200000
}
train_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "train.record"
  }
}
eval_config {
  num_examples: 8000
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  num_readers: 1
  tf_record_input_reader {
    input_path: "val.record"
  }
}

If you have experience in setting the best hyperparameters for your model, you may do so. The creators have given some rather brief guidelines here.

You’re all set to train your model now! Execute the below command to start the training job.

python object_detection/train.py \
    --logtostderr \
    --pipeline_config_path=pipeline.config \
    --train_dir=train

My laptop GPU (Nvidia 950M, 2 GB) couldn't handle the model size, so I had to run it on the CPU instead. It took around 7–13 seconds per step on my device. After about 10,000 excruciating steps, the model achieved a pretty good accuracy. I stopped training after it reached 20,000 steps, solely because it had taken two days already.

You can resume training from a checkpoint by modifying the "fine_tune_checkpoint" attribute from model.ckpt to model.ckpt-xxxx, where xxxx represents the global step number of the saved checkpoint (for example, model.ckpt-10000 if the last checkpoint was saved at step 10,000).

Exporting the model for inference

What’s the point of training the model if you can’t use it for object detection? API to the rescue again! But there’s a catch. Their inference module requires a frozen graph model as an input. Not to worry though: using the following command, you can export your trained model to a frozen graph model.

python object_detection/export_inference_graph.py \
    --input_type=image_tensor \
    --pipeline_config_path=pipeline.config \
    --trained_checkpoint_prefix=train/model.ckpt-xxxxx \
    --output_directory=output

Neat! You will obtain a file named frozen_inference_graph.pb, along with a bunch of checkpoint files.

You can find a file named inference.py in my GitHub repo. You can use it to test or run your object detection module. The code is pretty self-explanatory, and is similar to the Object Detection Demo presented by the creators. You can execute it by typing in the following command:

python object_detection/inference.py \
    --input_dir={PATH} \
    --output_dir={PATH} \
    --label_map={PATH} \
    --frozen_graph={PATH} \
    --num_output_classes={NUM}

Replace the highlighted characters {PATH} with the filename or path of the respective file/directory. Replace {NUM} with the number of objects you have defined for your model to detect (In my case, 3).

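If you are curious what the inference step boils down to, here is a stripped-down sketch of loading the frozen graph and running it on a single image, in the spirit of inference.py and the official demo. The tensor names are the standard ones produced by export_inference_graph.py; the image path is just a placeholder:

import numpy as np
import tensorflow as tf
import cv2

# Load the frozen graph exported earlier.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('output/frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

# Read one image and add a batch dimension.
image = cv2.cvtColor(cv2.imread('images/182.jpg'), cv2.COLOR_BGR2RGB)
image_expanded = np.expand_dims(image, axis=0)

with tf.Session(graph=graph) as sess:
    boxes, scores, classes, num = sess.run(
        ['detection_boxes:0', 'detection_scores:0',
         'detection_classes:0', 'num_detections:0'],
        feed_dict={'image_tensor:0': image_expanded})

# Keep detections above a confidence threshold.
for box, score, cls in zip(boxes[0], scores[0], classes[0]):
    if score > 0.5:
        # box is [ymin, xmin, ymax, xmax] in normalized coordinates
        print(int(cls), float(score), box)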

Results

Check out these videos to see its performance for yourself! The first video demonstrates the model’s capability to distinguish all three objects, whereas the second video flaunts its prowess as a seeker.

Pretty impressive I would say! It does have an issue with distinguishing heads from Quidditch objects. But considering the size of our dataset, the performance is pretty good.

Training it for too long led to massive over-fitting (it was no longer size invariant), even though it reduced some mistakes. You can overcome this by having a larger dataset.

Thank you for reading this article! Hit that clap button if you enjoyed it! Hope it helped you create your own object detection program. If you have any questions, you can hit me up on LinkedIn or send me an email (bharathrajn98@gmail.com).

Originally published at: https://www.freecodecamp.org/news/how-to-play-quidditch-using-the-tensorflow-object-detection-api-b0742b99065d/
