How Synthetic Data Is Accelerating Computer Vision

This article originally appeared in Hacker Noon.

In the spring of 1993, a Harvard statistics professor named Donald Rubin sat down to write a paper. Rubin’s paper would go on to change the way that artificial intelligence is researched and practiced, but its stated goal was more modest: analyze data from the 1990 U.S. census, while preserving the anonymity of its respondents.

It wasn’t feasible to simply anonymize the data, because individuals could still be identified by their home address, phone number, or social security number, all of which were crucial to the analyses that Rubin’s colleagues wanted to perform. To solve the problem, Rubin generated a set of anonymized census responses whose population statistics mirrored those of the original data set. This way, Rubin’s colleagues could draw valid statistical inferences about the makeup of the United States without compromising the identity of its citizens.
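
To make the idea of "population statistics that mirror the original" concrete, here is a minimal sketch in Python. It is not the procedure from Rubin's paper, which the article does not describe; it swaps in a much simpler technique (fitting a multivariate Gaussian and sampling from it). The `synthesize` function and the age and income columns are hypothetical, and the "real" records are themselves simulated stand-ins.

```python
import numpy as np


def synthesize(records: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Draw synthetic rows from a Gaussian fitted to the real records.

    Assumption: the data is numeric and roughly Gaussian. Real census
    data would need a far richer model than this.
    """
    rng = np.random.default_rng(seed)
    mean = records.mean(axis=0)            # per-column means
    cov = np.cov(records, rowvar=False)    # column-by-column covariance
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Stand-in for confidential records; columns are hypothetical (age, income).
source_rng = np.random.default_rng(42)
real = source_rng.multivariate_normal(
    mean=[40.0, 50_000.0],
    cov=[[100.0, 30_000.0], [30_000.0, 2.5e8]],
    size=5_000,
)

synthetic = synthesize(real, n_samples=5_000)

# No synthetic row corresponds to a real respondent, yet the aggregate
# statistics stay close to those of the original table.
print("real means:     ", real.mean(axis=0))
print("synthetic means:", synthetic.mean(axis=0))
```

An analyst who only needs means, variances, and correlations can work from `synthetic` without ever seeing `real`. A sketch this simple offers no formal privacy guarantee.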

Rubin’s solution was original. He had produced synthetic data, and in doing so contributed the term to our academic vocabulary. His approach was popularized by statisticians, economists, and medical researchers.

Synthetic Data in Machine Learning

Decades later, synthetic data found a new use as an accelerant to machine learning. Machine learning systems are predictive, and most require data — the more, the better.

For example, the accuracy of a supervised machine learning model that predicts election outcomes will improve with more data. But elections are infrequent events, which means the data-derived predictive power of such a model is limited (changes to the model’s architecture could yield small performance improvements but would be dwarfed by the impact of doubling its training dataset).

To achieve more predictive power, the model needs more data. It must also be able to account for changes to the mechanisms that determine election results, so that valid inferences can be made about how those mechanisms relate to the outcomes.

Generating synthetic data whose properties enable valid inference was the original purpose of Rubin’s work. Inspired by it, researchers at Caltech and UC Irvine created synthetic electoral data that might have been recorded at the ballot box, but was not.

In that study, synthetic data was used to overcome data scarcity, but data privacy is another grave concern. Industries that traffic in highly sensitive personal information, like healthcare, are zealous advocates of synthetic data because regulations often preclude their data scientists from working with real patient records.

Privacy and scarcity are important data access problems, and solving them makes models more performant. But in a different corner of the machine learning community, synthetic data is being used to give models new capabilities — the ability to see things they otherwise wouldn’t, and to make novel predictions.

Synthetic Images

The subset of machine learning that processes images is called computer vision. Like models for predicting elections, most computer vision models improve with data.

The dominant approach to data acquisition in computer vision relies on humans sitting in a room, labelling images according to their contents. This is a crucial but labor-intensive process (ImageNet, a now-famous collection of photos, contains roughly 14 million hand-annotated images).

Labels are important because they are the method by which we encode our semantic understanding of the world into a computer. For example, the people sitting in that room, labelling images, might be annotating photos as Cat or Dog, to show a computer how to recognize the difference. But labels need not be constrained to things that are discernible by the human eye.

Synthetic images created by computers can carry labels for properties that humans cannot reliably quantify, such as depth or transparency.

Imagine trying to measure the relative depth of thousands of individual plastic bottles in an image. Now measure their transparency, and the angles at which they reflect light. The task is impossible for a human, but photos with these attributes broaden the inference possibilities for a computer vision model.
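
The reason a renderer can supply these labels is that it already knows every object's depth and opacity before it draws a single pixel. The sketch below illustrates that point with a toy 2D renderer. Everything in it is an assumption made for illustration: `render_scene` is a hypothetical function, the "bottles" are plain discs, and the value ranges are arbitrary.

```python
import numpy as np


def render_scene(n_objects: int, size: int = 64, seed: int = 0):
    """Render random discs and return (image, depth, alpha) arrays.

    depth[y, x] is the distance of the nearest object covering that pixel
    (inf for background); alpha[y, x] is that object's opacity.
    """
    rng = np.random.default_rng(seed)
    image = np.zeros((size, size), dtype=np.float32)
    depth = np.full((size, size), np.inf, dtype=np.float32)
    alpha = np.zeros((size, size), dtype=np.float32)
    ys, xs = np.mgrid[0:size, 0:size]

    objects = [
        {
            "cx": rng.uniform(0, size), "cy": rng.uniform(0, size),
            "r": rng.uniform(3, 10),          # disc radius in pixels
            "z": rng.uniform(1.0, 10.0),      # distance from the camera
            "a": rng.uniform(0.2, 1.0),       # opacity (1.0 = fully opaque)
            "shade": rng.uniform(0.3, 1.0),   # object brightness
        }
        for _ in range(n_objects)
    ]

    # Paint far-to-near so nearer objects overwrite the labels of farther ones.
    for o in sorted(objects, key=lambda o: -o["z"]):
        mask = (xs - o["cx"]) ** 2 + (ys - o["cy"]) ** 2 <= o["r"] ** 2
        image[mask] = o["a"] * o["shade"] + (1 - o["a"]) * image[mask]  # alpha blend
        depth[mask] = o["z"]
        alpha[mask] = o["a"]
    return image, depth, alpha


image, depth, alpha = render_scene(n_objects=20)
print(image.shape, depth.shape, alpha.shape)
```

Each call yields an image together with pixel-exact depth and transparency maps, at no annotation cost. A production pipeline would use a real 3D renderer, but the labels fall out of the scene description in the same way.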

In the retail and waste management industries, for example, robots can pick stock and recycle plastic bottles with far greater dexterity when they are trained on synthetic datasets that include depth and transparency labels (researchers proved this earlier this year). Using synthetic data, the robots became more intelligent.

A Contrarian Bet

It’s true that computers have been generating images for decades, but doing so photorealistically, with aesthetic diversity, and at scale, is very difficult. Generative Adversarial Networks, or GANs, are a sophisticated solution. They create information procedurally, which means they can provide infinite variation in images, yet require no more human guidance than that of other deep learning models. For those who know how to use them, GANs have enabled an advantaged data supply chain.

Still, synthetic data is, for now, a contrarian bet because the conventional wisdom assumes that models trained with human-labelled images are more performant than those trained with synthetic images.

But evidence from the academic community suggests that the conventional wisdom is wrong. In many cases, models that are trained on, or augmented with, synthetic data are more performant than models trained on real-world data alone, and they can perceive things that other models cannot.

This is already evident in the autonomous vehicles industry, where real-world uncertainty and dynamism have created unprecedented demand for synthetic data. Uber, Tesla, Waymo, and Zoox won’t put cars on the road unless they are safe, but how can they anticipate every driving scenario that might occur?

Capturing millions of hours of rainy, nighttime, mountainous driving scenarios with a real driver in a real car is impractical. It would take too long and put people in unnecessary danger. A better solution is an image generation pipeline that can provide unlimited scenic diversity. It’s likely that all major autonomous vehicle companies have incorporated synthetic data into their computer vision systems.
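
One way to picture such a pipeline is as a sampler over scene parameters that feeds a renderer. The sketch below covers only the sampling step, under stated assumptions: `SceneConfig`, `sample_scenes`, `rare_cases`, and the weather and terrain categories are hypothetical names chosen for illustration, not any company's actual tooling.

```python
import random
from dataclasses import dataclass

WEATHER = ("clear", "rain", "fog", "snow")
TERRAIN = ("urban", "highway", "mountain")


@dataclass(frozen=True)
class SceneConfig:
    """Parameters a renderer would need to draw one driving scene."""
    weather: str
    terrain: str
    hour: int          # local time of day, 0-23
    n_vehicles: int    # other vehicles in view

    @property
    def is_night(self) -> bool:
        return self.hour < 6 or self.hour >= 20


def sample_scenes(n: int, seed: int = 0) -> list[SceneConfig]:
    """Sample n scene configurations uniformly over the parameter space."""
    rng = random.Random(seed)
    return [
        SceneConfig(
            weather=rng.choice(WEATHER),
            terrain=rng.choice(TERRAIN),
            hour=rng.randrange(24),
            n_vehicles=rng.randint(0, 30),
        )
        for _ in range(n)
    ]


def rare_cases(scenes: list[SceneConfig]) -> list[SceneConfig]:
    """Select the rainy, nighttime, mountainous scenes from a sample."""
    return [
        s for s in scenes
        if s.weather == "rain" and s.terrain == "mountain" and s.is_night
    ]


scenes = sample_scenes(1_000)
print(len(rare_cases(scenes)), "rainy night mountain scenes out of", len(scenes))
```

Because the sampler is cheap, the distribution can be skewed toward rare and dangerous conditions as heavily as needed, which is exactly what a real driver in a real car cannot do safely.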

Another argument for synthetic images is economic. Like other digital goods, their marginal cost of production is near zero. As long as the alternative is humans labelling images, synthetic data will be cheaper, or so the argument goes.

In reality, the unit economics are more complicated. Many high-value use cases require custom 3D assets, which must be purchased, or drawn by CGI artists using animation software. Competitive advantage amongst the first wave of synthetic data startups may come in their ability to spread the fixed cost of such artists. The returns to doing so would be large, but also require a consistent, recurring use case across customers, which doesn’t yet exist in all synthetic image markets.
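
The amortization argument can be written as a small formula: per-image cost is the fixed asset cost divided by the total number of images it supports, plus the marginal rendering cost. The sketch below is only illustrative. The function `cost_per_image` is hypothetical, and every dollar figure is a made-up placeholder, not data from the article or the market.

```python
def cost_per_image(
    fixed_asset_cost: float,
    marginal_cost: float,
    images_per_customer: int,
    n_customers: int = 1,
) -> float:
    """Per-image cost when one set of 3D assets is shared across customers."""
    if images_per_customer <= 0 or n_customers <= 0:
        raise ValueError("image and customer counts must be positive")
    total_images = images_per_customer * n_customers
    return fixed_asset_cost / total_images + marginal_cost


# Hypothetical numbers, chosen only to show the shape of the curve.
ASSETS = 10_000.0    # one-off cost of custom 3D assets
RENDER = 0.01        # marginal cost of rendering one image
IMAGES = 1_000       # images each customer needs

for customers in (1, 10, 100):
    unit = cost_per_image(ASSETS, RENDER, IMAGES, customers)
    print(f"{customers:>3} customers: {unit:.2f} per image")
```

With a single customer, the fixed cost dominates the unit price; it only approaches the near-zero marginal cost once the same assets serve many customers, which is why a recurring use case matters.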

As the performance advantage of synthetic data becomes more dramatic, it will be appreciated outside of the academic and startup communities. Google, Amazon, and Microsoft will incorporate synthetic data into their end-to-end machine learning pipelines. They may choose to expose the actual data synthesis tools to users, but will more likely monitor inference and automatically retrain models in response to confounding data (this will not be limited to computer vision).

Still, startups that can identify differentiated and high-value use cases and build predictable revenue streams around them will enjoy enviable market positions.

For startups and incumbents alike, one thing is clear: Society’s performance expectations of machine learning systems are rising, and synthetic data is being used to meet them.

Translated from: https://medium.com/zetta-venture-partners/how-synthetic-data-is-accelerating-computer-vision-d21556a0d8af
