TensorFlow Architecture
Object detection is one of the most popular and widely used computer vision methods today. The goal is not only to determine whether an object is present in the image, as in most common classification problems, but also to point out the location of each object of interest. That makes it the necessary approach for situations where multiple objects may appear in the same image.
One of the challenges of this method is creating the dataset, since the positions of all objects in each image must be set manually, which takes a lot of time when there is a large number of observations.
This process is inefficient, expensive, and time-consuming, especially in problems that require labeling dozens of objects in each image or that demand specialized knowledge.
Based on this, I created a TensorFlow Semi-supervised Object Detection Architecture (TSODA) to iteratively train an object detection model and use it to automatically label new images based on a confidence threshold, adding them to the subsequent training process.
In this article, I’ll show you the necessary steps to reproduce this approach in your object detection project. With this, you’ll be able to create labels in your images automatically while measuring the model performance!
Table of contents:
How TSODA Works
Example Application
Implementation
Results
Conclusion
How TSODA works
It works like any other semi-supervised method: training uses both labeled and unlabeled data, unlike the more common supervised approach.
An initial model is trained on strongly labeled data annotated by hand and learns some features from it. The model then runs inference on the unlabeled data, and the newly labeled images are added to a new training round.
The whole idea can be illustrated by the following image:
This operation is repeated until a stop criterion is reached: either the maximum number of executions or no remaining unlabeled data.
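The loop described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code from the TSODA repository: `self_training_loop`, `train`, and `predict` are hypothetical names, and the two callables stand in for whatever training and inference routines your project uses.

```python
def self_training_loop(labeled, unlabeled, train, predict,
                       threshold=0.8, max_iterations=5):
    """Train, pseudo-label confident images, and repeat.

    `train(model, labeled)` returns an updated model and
    `predict(model, image)` returns a (label, score) pair.
    Both are hypothetical callables supplied by the caller.
    """
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    model = None
    for _ in range(max_iterations):      # stop criterion 1: number of executions
        model = train(model, labeled)
        if not unlabeled:                # stop criterion 2: nothing left to label
            break
        still_unlabeled = []
        for image in unlabeled:
            label, score = predict(model, image)
            if score >= threshold:
                # Confident prediction: keep it as a new training label.
                labeled.append((image, label))
            else:
                still_unlabeled.append(image)
        unlabeled = still_unlabeled
    return model, labeled, unlabeled
```

Images that never reach the threshold simply stay in the unlabeled pool, so they can be labeled by hand later or retried after more training.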
As shown in the diagram, a confidence threshold of 80% was initially configured. This is an important parameter, since the new images will be used in a new training round and, if incorrectly labeled, could introduce undesirable noise that undermines model performance.
The purpose of TSODA is to provide a simple and fast way to use semi-supervised learning in your object detection project.
Example Application
To exemplify the approach and test whether everything works properly, a random sample of 1,100 images was drawn from the Asirra dataset, with 50% from each class.
The images were labeled manually for later comparison. You can download the same data on Kaggle.
I used the Single Shot MultiBox Detector (SSD) as the object detection architecture and Inception as the base network, instead of VGG-16 as in the original paper.
SSD and Inception offer a good trade-off between training speed and accuracy, so I think they are a great starting point. This matters because in each iteration TSODA needs to save a checkpoint of the trained model, run inference on new images, and load the model to train it again, so faster training allows more iterations and more of these images to be added to the learning.
Testing performance
To test TSODA’s performance, just 100 labeled images of each class were provided, to be split into training and test sets, while 900 were left unlabeled, simulating a situation where only a little time was spent creating the labeled dataset. The results obtained were compared to a model trained with all the manually labeled images.
The data were randomly split into 80% of images for training and 20% for testing.
Implementation
As the name suggests, the whole architecture is built in the TensorFlow environment, version 2.x.
This new TF version is not yet fully compatible with object detection, and some parts were difficult to adapt. In the coming months, however, it will be the default and most widely used version of TF across projects, which is why I think it’s important to adapt the code to use it.
To create TSODA, new scripts and folders were added to a fork of the TF Model Garden repository, so you can easily clone it and run your semi-supervised project with just small modifications. It also keeps a structure familiar to those who work with TF.
You can clone my repository to easily follow these steps or adapt your TF model repository.
The work was done inside models/research/object_detection, where you will find the following folders and files:
inference_from_model.py: This file is executed to run the model on new images.
generate_xml.py and generate_tfrecord.py: Both are used to create the train and test TFRecords used to train the object detection model (these scripts are adapted from the raccoon dataset).
test_images and train_images folders: Contain the JPG images and XML files that will be used.
unlabeled_images and labeled_images folders: Contain, respectively, all images without labels and the images automatically labeled by the algorithm, which will be divided into the training and test folders to keep the split ratio.
Inside the utils folder there are also a few things:
generate_xml.py: This script is responsible for taking the model inference and generating a new XML file that is stored inside the labeled_images folder.
visualization_utils.py: This file also has some modifications in the code to capture the model inference and pass it to the “generateXml” class.
That’s it, this is all you need to have in your repository!
Preparing Environment
To run this project you will need nothing!
The training process runs in a Google Colab Notebook, so it’s fast and simple to train your model. You will literally just need to replace my images with yours and choose another base model if you want.
Make a copy of the original Colab Notebook to your Google Drive and execute it.
If you really want to run TSODA on your own machine, you’ll see the installation requirements at the beginning of the Jupyter notebook. Just follow them, but don’t forget to also install TF 2.x. I recommend creating a virtual environment.
Understanding the code
inference_from_model.py is responsible for loading the saved_model.pb that was created during training and using it to run inference on the unlabeled images. Most of the code was taken from object_detection_tutorial.ipynb, found in the colab_tutorials folder.
If you don’t want to use Colab for training you’ll need to replace the paths at the beginning of the file.
Another important method in this file is partition_data, which is responsible for splitting the inferred images (which will be in the labeled_images folder) into training and test sets so as to keep the same ratio.
One change you may want to make is the split ratio. In my case I chose an 80/20 proportion, but if you want something different you can set it in the method parameter.
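To make the split concrete, here is a minimal sketch of an 80/20 partition over a list of file names. It is not the partition_data method from the repository, which works on the labeled_images folder; `split_train_test`, `test_ratio`, and `seed` are hypothetical names chosen for this example.

```python
import random


def split_train_test(filenames, test_ratio=0.2, seed=None):
    """Shuffle file names and split them into (train, test) lists.

    Hypothetical helper: `test_ratio` is the fraction sent to the test set.
    """
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = sorted(filenames)              # fixed starting order
    random.Random(seed).shuffle(shuffled)     # reproducible when seed is set
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


names = [f"img_{i}.jpg" for i in range(10)]
train_files, test_files = split_train_test(names, test_ratio=0.2, seed=42)
print(len(train_files), len(test_files))
```

In a real project the two lists would then be used to move each image, together with its XML file, into the train_images and test_images folders.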
visualization_utils.py is where the bounding boxes are drawn onto the image, so we use it to get the boxes’ positions, class name, and file name, and pass them to our XML generator. The following code shows most of the process:
The XML is generated if a box is detected in the image with a confidence level higher than the specified threshold.
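A minimal sketch of this filtering step is shown below. It assumes the usual output layout of the TensorFlow Object Detection API, where each box is given as normalized `[ymin, xmin, ymax, xmax]` alongside a class id and a score. The function name `select_confident_boxes` and the dictionary layout are hypothetical and are not taken from the modified visualization_utils.py.

```python
def select_confident_boxes(boxes, classes, scores, image_width, image_height,
                           class_names, min_score=0.8):
    """Keep detections above `min_score` and convert them to pixel coordinates.

    `boxes` holds normalized (ymin, xmin, ymax, xmax) tuples and
    `class_names` maps class ids to readable names.
    """
    selected = []
    for (ymin, xmin, ymax, xmax), class_id, score in zip(boxes, classes, scores):
        if score < min_score:
            continue  # too uncertain to be used as a training label
        selected.append({
            "name": class_names[class_id],
            "xmin": round(xmin * image_width),
            "ymin": round(ymin * image_height),
            "xmax": round(xmax * image_width),
            "ymax": round(ymax * image_height),
            "score": float(score),
        })
    return selected
```

If the returned list is empty, no XML is written and the image stays in the unlabeled pool for a later iteration.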
All the information arrives in generate_xml.py, and the XML is created using ElementTree.
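As an illustration of what building such a file with ElementTree can look like, here is a minimal sketch that writes a Pascal VOC style annotation. The function name `build_annotation_xml` and the exact set of fields are assumptions made for this example; the generate_xml.py in the repository may store more or different elements.

```python
import xml.etree.ElementTree as ET


def build_annotation_xml(filename, width, height, objects, depth=3):
    """Return a Pascal VOC style annotation as an XML string.

    `objects` is a list of dicts with the keys
    name, xmin, ymin, xmax, ymax (pixel coordinates).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = str(depth)

    for obj in objects:
        node = ET.SubElement(root, "object")
        ET.SubElement(node, "name").text = obj["name"]
        bndbox = ET.SubElement(node, "bndbox")
        for key in ("xmin", "ymin", "xmax", "ymax"):
            ET.SubElement(bndbox, key).text = str(obj[key])

    return ET.tostring(root, encoding="unicode")


xml_text = build_annotation_xml(
    "cat.1.jpg", 200, 100,
    [{"name": "cat", "xmin": 40, "ymin": 10, "xmax": 120, "ymax": 50}],
)
print(xml_text)
```

Writing the returned string to a file next to the image gives an annotation in the same general format that manual labeling tools commonly produce.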
Inside the code there are comments that will help you understand how everything works.
Results
The mean Average Precision (mAP) was used to evaluate model performance. If you have any doubts about how it works, check out this.
The first test trained a model for 4,000 epochs using all the strongly labeled images.
The training took about twenty-one minutes and the results are shown in Table 1.
As expected, the model reached a high mAP, mainly at a lower IoU threshold.
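When computing mAP, a predicted box is matched to a ground-truth box by their Intersection over Union (IoU), and a lower IoU threshold accepts looser matches. A minimal sketch of the IoU calculation for two boxes in `(xmin, ymin, xmax, ymax)` form follows. The function name `iou` is chosen for this example and is not part of the TSODA code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Two boxes that overlap over half of their width, as in the example call, share one third of their combined area, so they would count as a match at an IoU threshold of 0.3 but not at 0.5.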
The second test used the same configuration, but with TSODA and just 100 labeled images. In each iteration, the model was trained for 1,000 epochs and then used to infer and create new labeled images. The results are shown in Figure 2.
The whole training process took thirty-eight minutes, about seventeen minutes more than the previous one, and the model reached a worse final mAP, as shown in Table 2:
As Table 3 reveals, most images were successfully annotated in the first iteration and added to the training set. This could mean that the minimum confidence threshold isn’t high enough: in the first thousand epochs the model has not yet converged properly, possibly creating wrong annotations.
TSODA requires more time and epochs to improve model performance and get close to the original method. This happens because adding new images to the training set leads to a drop in mAP, since the model needs to learn how to generalize to new patterns. Figure 2 shows this: the mAP decreases as new images are included, then starts increasing again as the model learns new features.
Figure 3 shows some examples of automatically annotated images. Notably, some labels are not very precisely marked, but they are good enough to give the model more information.
Some new experiments were performed with a different epoch increment schedule as well as a higher confidence threshold. The results are presented in Table 4:
Setting the confidence threshold to 90% ensures a higher chance of a correct label in the predictions, which is an important factor for model convergence. In addition, training ran for 2,500 epochs in the initial iteration instead of just 1,000, since the first iteration is where most images are labeled, so the model needs to learn more features to be able to clear the higher confidence threshold. After the first iteration, each subsequent one adds 1,500 epochs, up to a limit of 8,500. These new configurations improved the final results.
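The schedule can be written out explicitly. The sketch below assumes the numbers are cumulative totals, meaning the first iteration stops at 2,500 epochs and each later iteration continues training for another 1,500 until the total reaches 8,500. That reading is an interpretation of the text, and `epoch_schedule` is a hypothetical helper, not a function from the repository.

```python
def epoch_schedule(initial=2500, increment=1500, limit=8500):
    """Return the cumulative epoch target for each TSODA iteration."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    schedule = []
    total = initial
    while total <= limit:
        schedule.append(total)
        total += increment
    return schedule


print(epoch_schedule())  # [2500, 4000, 5500, 7000, 8500]
```

Under this reading the experiment runs five iterations, and changing `initial` or `increment` is how the schedule would be adapted to a different problem.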
TSODA may perform differently depending on the kind of object of interest and its complexity. The results could be improved by training for more epochs or setting a higher confidence threshold, with the drawback of increased training time. Also, the epoch increment per iteration must be adjusted to the problem, to control model convergence based on the number of unlabeled images and the threshold.
Nevertheless, this is a good alternative, since training time is cheaper than manual labeling time, which requires a human. TSODA was also built so that, with just a few modifications, it’s possible to train a completely new large-scale model from scratch.
The auto-created labels could also be manually adjusted in some images, which can improve the overall performance and is faster than creating all the labels manually.
Conclusion
The proposed TSODA can achieve satisfactory results in creating labels for unlabeled images, reaching results similar to a strongly labeled training approach but with considerably less human effort. The solution is also adaptable to any other CNN detector architecture and is easy and fast to implement, helping the dataset creation process while measuring overall object detector performance.
Translated from: https://towardsdatascience.com/tensorflow-semi-supervised-object-detection-architecture-757b9c88f270