Reconstruct Photorealistic Scenes from Tourists' Public Photos on the Internet

Computer Vision

Using tourists' public photos from the internet, they were able to reconstruct multiple viewpoints of a scene while preserving realistic shadows and lighting! This is a huge advance over the state of the art in photorealistic scene rendering, and their results are simply amazing. Let's see how they achieved that, along with some more examples.

Paper introduction

Image source: https://research.cs.cornell.edu/crowdplenoptic/templates/comparison_i2i.html

Researchers at Cornell University introduced a new way to use online public photos taken by tourists to construct a continuous set of light fields and to synthesize novel views capturing the scene's appearance at all times of day. The complexity of this task is that the pictures are taken at different times of day, in different seasons, and from different orientations. To address this, they introduced DeepMPI, a new multi-plane image representation that does exactly what they needed. Their method is completely unsupervised, requiring no information beyond the internet photos themselves, and allows real-time synthesis of photorealistic views that are continuous in both space and lighting.

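To ground the idea, a multi-plane image (MPI) represents a scene as a stack of fronto-parallel RGBA layers at fixed depths, which are alpha-composited to form an image. Here is a minimal sketch of that compositing step; it illustrates the general MPI concept the paper builds on, not the authors' actual code, and the array layout is an assumption.

```python
import numpy as np

def composite_mpi(rgba_planes: np.ndarray) -> np.ndarray:
    """Alpha-composite an MPI into a single image.

    rgba_planes: (D, H, W, 4) array with plane 0 the farthest and plane
    D-1 the nearest; RGB and alpha values are assumed to lie in [0, 1].
    """
    out = np.zeros_like(rgba_planes[0, ..., :3])
    for plane in rgba_planes:  # back to front, standard "over" operator
        rgb, alpha = plane[..., :3], plane[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)
    return out

# For a novel view, each plane would first be warped by the homography
# its depth induces before this compositing step.
```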

You can see how much better their results are compared with the previous state-of-the-art models.

Image source: https://research.cs.cornell.edu/crowdplenoptic/templates/comparison_i2i.html

Now that we’ve covered what they’ve done and why it is so impressive, let’s see how they’ve achieved that and some more results.

In short, they synthesize arbitrary views of a scene under continuously varying viewing conditions, such as lighting, using internet pictures taken under many different lighting conditions and from many angles. The system takes unstructured internet photos of a specific place and learns to reconstruct a representation of the light field that respects real-world shadow physics.

As you just saw, the previous work's light fields are inconsistent across the scene; achieving this consistency is the paper's greatest contribution.

How have they achieved that?

This is done with a two-stage model architecture.

Image source: https://research.cs.cornell.edu/crowdplenoptic/templates/comparison_i2i.html

First, they build their new DeepMPI representation. They start by reprojecting every image to the reference viewpoint and averaging all these reprojected images at each depth plane, creating a mean RGB plane sweep volume (PSV): a set of views warped with disparities in a given range. Since this mean RGB PSV cannot accurately model scene content that is occluded in the reference view, they introduced the second stage of their network.

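As a rough illustration of this averaging step, the sketch below builds a mean RGB PSV by warping each source image onto a set of fronto-parallel depth planes in the reference view via plane-induced homographies, then averaging per plane. The names and camera conventions (poses map reference-frame points into each source frame) are assumptions for illustration; a real implementation would also mask out-of-view pixels rather than averaging zeros.

```python
import cv2
import numpy as np

def plane_homography(K_ref, K_src, R, t, depth):
    """Homography sending reference pixels to source pixels for the
    fronto-parallel plane z = depth in the reference frame."""
    n = np.array([[0.0, 0.0, 1.0]])  # plane normal as a (1, 3) row
    return K_src @ (R + (t.reshape(3, 1) @ n) / depth) @ np.linalg.inv(K_ref)

def mean_psv(images, K_ref, K_srcs, Rs, ts, depths, out_hw):
    """Reproject every image onto each depth plane of the reference view
    and average, giving a (D, H, W, 3) mean RGB plane sweep volume.

    images: list of (h, w, 3) float32 arrays; K_*: 3x3 intrinsics;
    Rs, ts: rotations/translations from the reference to each source frame.
    """
    H, W = out_hw
    psv = np.zeros((len(depths), H, W, 3), dtype=np.float32)
    for d_idx, depth in enumerate(depths):
        for img, K_src, R, t in zip(images, K_srcs, Rs, ts):
            M = plane_homography(K_ref, K_src, R, t, depth)
            # WARP_INVERSE_MAP: output pixel x samples the source at M @ x
            psv[d_idx] += cv2.warpPerspective(
                img, M, (W, H), flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        psv[d_idx] /= len(images)
    return psv
```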

This second part optimizes the latent features in their DeepMPI representation using an encoder and a rendering network. It is able to capture and re-render time-varying appearance.

The encoder’s role is to produce an “appearance vector” from an exemplar image and an auxiliary deep buffer containing semantic and depth information about the scene. The deep buffer lets the encoder learn complex appearance by aligning the illumination information in the exemplar image with the scene’s intrinsic properties encoded in the DeepMPI representation. Without this alignment, the results would be as inconsistent as the previous work we’ve seen. This aligned deep buffer is the main reason for the realistic shadows and lighting in the rendered scenes.

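A hedged sketch of this encoder idea: a small CNN ingests the exemplar RGB concatenated with the aligned deep buffer (its semantic and depth channels here assumed to be rendered from the DeepMPI) and pools down to a single appearance vector. The channel counts and layer sizes below are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class AppearanceEncoder(nn.Module):
    def __init__(self, buffer_channels: int = 9, z_dim: int = 64):
        super().__init__()
        in_ch = 3 + buffer_channels  # exemplar RGB + deep buffer
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling: one vector per image
        )
        self.fc = nn.Linear(128, z_dim)

    def forward(self, exemplar, deep_buffer):
        x = torch.cat([exemplar, deep_buffer], dim=1)  # (N, 3+B, H, W)
        return self.fc(self.net(x).flatten(1))         # (N, z_dim)
```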

Image source: https://research.cs.cornell.edu/crowdplenoptic/templates/comparison_i2i.html

Then, the rendering network, represented by G in the model’s architecture, takes both the DeepMPI projected to a specific target viewpoint and the appearance vector produced by the encoder, and predicts the corresponding RGB color layers.

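To make the data flow concrete, here is a toy stand-in for G that maps each plane's latent features, conditioned on the appearance vector, to an RGBA layer ready for compositing. The real network is a U-Net conditioned with AdaIN (described below); this sketch simply broadcasts the appearance vector as extra channels, and every size in it is an assumption.

```python
import torch
import torch.nn as nn

class ToyRenderer(nn.Module):
    def __init__(self, feat_ch: int = 32, z_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch + z_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 4, 3, padding=1),  # RGB + alpha per plane
        )

    def forward(self, planes, z):
        # planes: (D, F, H, W) DeepMPI features warped to the target view
        # z: (z_dim,) appearance vector from the encoder
        D, _, H, W = planes.shape
        z_map = z.view(1, -1, 1, 1).expand(D, -1, H, W)
        return torch.sigmoid(self.net(torch.cat([planes, z_map], dim=1)))
```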

Image source: https://arxiv.org/abs/1703.06868

This rendering network is a variant of the U-Net encoder-decoder architecture that uses AdaIN (adaptive instance normalization), a technique popularized by style transfer applications. It produces a natural scene appearance while stabilizing training, preserving the color and style of the exemplar images. I linked the AdaIN paper at the end of this article for more information.

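The AdaIN operation itself, from the paper linked above, is simple: normalize the content features per channel, then rescale them to target statistics, which in this setting would be derived from the appearance vector. A minimal PyTorch version:

```python
import torch

def adain(content, style_mean, style_std, eps: float = 1e-5):
    """content: (N, C, H, W) feature maps; style_mean/style_std: (N, C)
    target statistics, e.g. predicted from the appearance vector."""
    mean = content.mean(dim=(2, 3), keepdim=True)
    std = content.std(dim=(2, 3), keepdim=True) + eps
    normalized = (content - mean) / std
    return normalized * style_std[..., None, None] + style_mean[..., None, None]
```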

Results

In short, given a specific exemplar photo, they were able to synthesize novel views close to the reference viewpoint while preserving the exemplar’s appearance. It is mind-blowingly accurate; just take a minute to see the results they got under multiple lighting conditions in this short video:

The project website is linked below; according to the authors, the code and dataset are coming soon.

Of course, this was a simple overview of this new technique. I strongly recommend reading the paper linked below for more information.

Project page, code, and paper: https://research.cs.cornell.edu/crowdplenoptic/
AdaIN paper: https://arxiv.org/abs/1703.06868

If you like my work and want to support me, I’d greatly appreciate it if you follow me on my social media channels:

  • The best way to support me is by following me on Medium.

  • Subscribe to my YouTube channel.

  • Follow my projects on LinkedIn.

  • Learn AI together: join our Discord community, share your projects and papers, find the best courses and Kaggle teammates, and much more!

Translated from: https://medium.com/towards-artificial-intelligence/reconstruct-photorealistic-scenes-from-tourists-public-photos-on-the-internet-bb9ad39c96f3
