Training AI with CGI

In this article we go through how we trained a computer vision model (AI) to detect sub-components of a Raspberry Pi using only synthetic data (CGI).

Training with synthetic data is an increasingly popular way to quench the thirst of data-hungry deep learning models. The datasets used for this project are available for free at app.zumolabs.ai [1]. We want to make using synthetic data easy for everyone and plan to release more datasets in the future.

The Problem

The Raspberry Pi is a single-board computer very popular with hobbyists. Our goal was to detect some of the sub-components that sit on the board: the pin connectors, the audio jack, and the Ethernet port. Though this is a toy problem, it is not far from what you see in the real world, where automating component and defect detection using computer vision can improve the speed and reliability of manufacturing.

The Data

To generate synthetic data, we first need a 3D model of the object. Luckily, in today’s world, most objects already exist in the virtual world. Asset aggregation sites like SketchFab, TurboSquid, or Thangs, have commoditized 3D models [2]. Tip for the wise: if you can’t find a model on the internet, try contacting the manufacturer directly, or scan and model the object yourself.

[Figure: (top) synthetic images and (bottom) segmentation masks.]

We then use a game engine (such as Unity or Unreal Engine) to take thousands of images of our 3D model from a variety of camera angles and under a variety of lighting conditions. Each image has a corresponding segmentation mask, which sections out the different components in the image. In future articles we will dive deeper into the process of creating synthetic images (so stay tuned!).

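The article's pipeline does this step in a game engine such as Unity or Unreal. As a purely illustrative alternative in Python, here is a minimal sketch of the same idea using Blender's scripting API (bpy); the object, camera, and light names are assumptions, and a real pipeline would also write out the matching segmentation mask for every frame.

```python
# Hypothetical Blender (bpy) sketch: render a 3D model from many random
# camera angles and lighting conditions. Assumes a scene containing objects
# named "Camera", "Light", and "RaspberryPi" (names are made up).
import math
import random

import bpy

scene = bpy.context.scene
camera = bpy.data.objects["Camera"]
target = bpy.data.objects["RaspberryPi"]
light = bpy.data.lights["Light"]

# Keep the camera pointed at the board no matter where we move it.
track = camera.constraints.new(type="TRACK_TO")
track.target = target
track.track_axis = "TRACK_NEGATIVE_Z"
track.up_axis = "UP_Y"

for i in range(1000):  # thousands of frames, as in the article
    # Sample a camera position on a hemisphere above the board.
    radius = random.uniform(0.3, 0.8)
    theta = random.uniform(0.0, 2.0 * math.pi)
    phi = random.uniform(0.1, 0.5 * math.pi)
    camera.location = (
        radius * math.sin(phi) * math.cos(theta),
        radius * math.sin(phi) * math.sin(theta),
        radius * math.cos(phi),
    )

    # Vary the lighting between frames.
    light.energy = random.uniform(100.0, 1000.0)

    scene.render.filepath = f"//renders/rgb_{i:05d}.png"
    bpy.ops.render.render(write_still=True)
    # A real pipeline would also export a segmentation mask here, e.g. via
    # per-object pass indices and the object-index render pass.
```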

So now that we have thousands of synthetic images, we should be good, right? No! It is very important to test synthetically trained models on real images to know whether they generalize successfully. There is a gap between simulation-produced data and real data known as the sim2real gap. One way to think about it is that deep learning models will overfit on the smallest of details, and if we aren't careful many of those details may only exist in the synthetic data.

For this problem, we manually annotated a small test dataset of a dozen real images. Manual annotation is time-consuming and expensive. It is important to note that if we were using real images for training, we would have to manually annotate thousands of images instead of just a handful for testing! That is, unfortunately, the status quo we are trying to change. Getting rid of this manual annotation process is a critical step in building better AI.

One way we can start closing the sim2real gap is through a technique known as domain randomization [3][4]. This strategy involves randomizing properties of the virtual scene, especially the visual appearance of the backgrounds and of the RPi itself. This has the downstream effect of making the model we train on this data more robust to variations in color and lighting, which is also known as the network's ability to generalize.

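In our pipeline this randomization happens at render time, but the idea is easy to see in a post-processing sketch: composite the rendered object over random background photos and jitter its colors. This is a hedged illustration rather than our actual pipeline; the directory names are made up, and it assumes the object was rendered with a transparent background.

```python
# Hypothetical domain-randomization sketch: random backgrounds plus random
# photometric jitter, so the model cannot latch onto rendering-specific
# color and lighting statistics. Directory names are made up.
import random
from pathlib import Path

from PIL import Image, ImageEnhance

backgrounds = list(Path("backgrounds").glob("*.jpg"))
Path("randomized").mkdir(exist_ok=True)

def randomize(render_path: Path) -> Image.Image:
    fg = Image.open(render_path).convert("RGBA")  # RGBA render of the Pi
    bg = Image.open(random.choice(backgrounds)).convert("RGBA").resize(fg.size)

    # Paste the foreground over a random real-world background.
    out = Image.alpha_composite(bg, fg).convert("RGB")

    # Randomize global appearance (brightness, contrast, saturation).
    out = ImageEnhance.Brightness(out).enhance(random.uniform(0.5, 1.5))
    out = ImageEnhance.Contrast(out).enhance(random.uniform(0.5, 1.5))
    out = ImageEnhance.Color(out).enhance(random.uniform(0.0, 2.0))
    return out

for i, path in enumerate(sorted(Path("renders").glob("rgb_*.png"))):
    randomize(path).save(f"randomized/rgb_{i:05d}.png")
```

Note that purely photometric randomization leaves the geometry untouched, so the segmentation masks remain valid without any changes.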

[Figure: domain randomized images.]
[Figure: domain randomization increases the variance of the synthetic data distribution.]

The Model and Training

Now we get to the model. There are many different types of computer vision models to choose from; models which leverage deep learning are currently the most popular, and they work very well for detection tasks like this one. We used a model from PyTorch's torchvision library based on the ResNet architecture [5]. Synthetic data will work with any model architecture, so feel free to experiment and find the one that best fits your use case.

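The article only says the model comes from torchvision and is ResNet-based; one plausible reading is a Faster R-CNN detector with a ResNet-50 FPN backbone, so treat the sketch below as an assumption rather than the exact model used. The classification head is swapped out for our four classes (background plus the three sub-components).

```python
# Hedged sketch: a torchvision detector matching the article's description.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 4  # background, pin connectors, audio jack, Ethernet port

# Start from a COCO-pretrained Faster R-CNN with a ResNet-50 FPN backbone.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Replace the box-classification head so it predicts our classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
```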

We trained our model with four different synthetic datasets to show how domain randomization and dataset size affect performance on our real test dataset (a sketch of the training loop follows the list):

  • Dataset A — 15 thousand realistic synthetic images.
  • Dataset B — 15 thousand domain randomized synthetic images.
  • Dataset C — 6 thousand realistic synthetic images.
  • Dataset D — 6 thousand domain randomized synthetic images.
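As promised above, here is a hedged sketch of the training loop we would run once per dataset. It assumes a PyTorch dataset yielding (image, target) pairs, where each target dict carries "boxes" and "labels" tensors; torchvision detection models return a dict of losses when called in training mode.

```python
import torch

def train(model, dataset, num_epochs=10, lr=1e-4, device="cuda"):
    # Detection images vary in size, so the collate function simply
    # zips the batch into tuples instead of stacking tensors.
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=4, shuffle=True,
        collate_fn=lambda batch: tuple(zip(*batch)),
    )
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for _ in range(num_epochs):
        for images, targets in loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

            # In training mode, torchvision detectors return a loss dict.
            loss_dict = model(images, targets)
            loss = sum(loss_dict.values())

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```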

[Figure: mean average precision (mAP) on synthetic data compared to real data.]

We use mAP (mean average precision) to measure the performance of our computer vision model. Note that aggregate metrics can be misleading on their own, so always inspect actual predictions to confirm the model is behaving as it should. As we expected, performance improves as more synthetic data is used. Deep learning models will almost always improve with larger datasets, but, more interestingly, training on a domain randomized synthetic dataset yields a significant performance boost on our real test dataset.

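The article does not say how mAP was computed; as one hedged option, the torchmetrics library provides a ready-made implementation. The boxes and labels below are toy values; in practice preds comes from the model in eval mode and targets from the hand-annotated real test set.

```python
# Hedged sketch: computing mAP with torchmetrics (toy predictions/targets).
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision()

preds = [{
    "boxes": torch.tensor([[10.0, 10.0, 50.0, 40.0]]),  # xyxy pixels
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([1]),  # e.g. 1 = pin connectors
}]
targets = [{
    "boxes": torch.tensor([[12.0, 11.0, 49.0, 42.0]]),
    "labels": torch.tensor([1]),
}]

metric.update(preds, targets)
print(metric.compute()["map"])  # mAP averaged over IoU thresholds
```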

Conclusion

TLDR: in this article, we trained a computer vision model to detect sub-components of a Raspberry Pi using entirely synthetic data. We used the technique of domain randomization to improve the performance of our model on real images. And, ta-da! Our trained model works on real data despite never having seen a single real image.

Thanks for reading, and make sure to check out the datasets for yourself at app.zumolabs.ai! If you have any questions or are curious about synthetic data, send us an email at info@zumolabs.ai; we love to chat.

[1] Zumo Labs Data Portal. (app.zumolabs.ai)

[2] 3D asset sites: SketchFab (sketchfab.com), TurboSquid (turbosquid.com), Thangs (thangs.com).

[3] Lilian Weng. “Domain Randomization for Sim2Real Transfer”. (https://lilianweng.github.io/lil-log/2019/05/05/domain-randomization.html).

[4] Josh Tobin, et al. “Domain randomization for transferring deep neural networks from simulation to the real world.” IROS, 2017. (https://arxiv.org/abs/1703.06907).

[5] Torchvision on GitHub. (https://github.com/pytorch/vision).

Source: https://towardsdatascience.com/training-ai-with-cgi-b2fb3ca43929
