Generative Modeling with Variational Autoencoder (VAE)

The generative model is one of the interesting fields in machine learning: instead of classifying data, a network is trained to learn the data distribution, which can then be used to generate new content. The two approaches most commonly used for generative modeling are the Generative Adversarial Network (GAN) and the Variational Autoencoder (VAE). In this article, I will attempt to explain the intuition behind the Variational Autoencoder (VAE) and how it can generate data like the faces above.

Autoencoder (AE)

Before going to the Variational Autoencoder, we will first discuss the Autoencoder. An Autoencoder is a self-supervised neural network that learns how to encode the input into a lower-dimensional representation, then decode it and reconstruct the data to be as close to the input as possible.

[Figure. Source]

An Autoencoder consists of 3 parts:

  • Encoder, the layers that encode the input data into a lower-dimensional representation.

  • Compressed, the layer that contains the encoded/compressed representation and has the lowest dimensionality. Also known as the bottleneck.

  • Decoder, the layers that learn to decode or reconstruct the encoded representation back into data as close to the input as possible.
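To make these three parts concrete, here is a minimal sketch of an under-complete autoencoder in PyTorch. The layer sizes (784 → 128 → 32) are illustrative assumptions for flattened 28×28 images, not values prescribed by anything above:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=128, latent_dim=32):
        super().__init__()
        # Encoder: compress the input down to the bottleneck dimension
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),  # bottleneck / compressed representation
        )
        # Decoder: reconstruct the input from the bottleneck
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)      # latent vector (encoding)
        x_hat = self.decoder(z)  # reconstruction
        return x_hat, z
```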

For the Autoencoder to learn the best encoding and decoding, it aims to minimize the reconstruction error, which is basically the difference between the reconstructed data and the input.

[Figure: L2 (squared) reconstruction loss]
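Written out, the squared (L2) reconstruction loss between an input x and its reconstruction x̂ is

$$\mathcal{L}_{\text{rec}}(x, \hat{x}) = \lVert x - \hat{x} \rVert_2^2 = \sum_i (x_i - \hat{x}_i)^2$$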

Notice that we use the L2 (squared) reconstruction loss instead of L1. If you want to know the intuition behind choosing L2 instead of L1, you may want to read this article.

What is the Autoencoder used for?

You may be thinking that the Autoencoder would be used for compression, but surprisingly it is not very popular in the compression field, as dedicated compression algorithms still perform better. Instead, here are some of the common applications of the Autoencoder:

  • Denoising: To make the Autoencoder learn to denoise an image, we use a corrupted or noisy image as the input, then modify the reconstruction loss to minimize the difference between the reconstructed output and the original clean image instead of the corrupted input. The goal is for the encoder to encode only the useful features, so random noise should be lost during the reconstruction. (A minimal training-step sketch follows this list.)

  • Dimensionality Reduction: By using an “under-complete” Autoencoder, where the bottleneck has fewer dimensions than the input, the Autoencoder is able to represent the data in lower dimensions non-linearly, in contrast to PCA (Principal Component Analysis), which is limited to linear transformations.

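As a rough sketch of the denoising setup described in the first bullet above, reusing the AutoEncoder class from the earlier sketch and assuming a simple Gaussian corruption (the noise level is an arbitrary illustrative choice):

```python
import torch
import torch.nn.functional as F

model = AutoEncoder(input_dim=784)  # the sketch class defined earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def denoising_step(clean_batch, noise_std=0.3):
    # Corrupt the input, but compute the loss against the original clean image
    noisy_batch = clean_batch + noise_std * torch.randn_like(clean_batch)
    reconstruction, _ = model(noisy_batch)
    loss = F.mse_loss(reconstruction, clean_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```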

However, what if we want our autoencoder to generate new data instead of just giving an output similar to the input? We will discuss this in the next section.

Generating New Data with an Autoencoder

The idea behind generating new data with an Autoencoder is that by modifying the encoded data (the latent vector), we should be able to get data that is different from the input. To simplify this, let’s imagine a scenario where you are trying to encode several images into a 2D encoding like below.

[Figure. Source: Joseph Rocca]

Now, to generate a new image, we can simply sample a point from the latent space above. For example, if we sample a point between the dog and the bird, we may be able to get an image of a bird and dog hybrid, or a new animal that is only half able to fly, like a chicken.

[Figure. Source: Joseph Rocca]
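In code, “sampling a point from the latent space” is just picking a latent vector and decoding it. A sketch, assuming an autoencoder like the earlier one trained with latent_dim=2 so the latent space matches the 2D picture (the coordinates are made up for illustration):

```python
import torch

model = AutoEncoder(input_dim=784, latent_dim=2)  # assume this model has already been trained

# A hypothetical point between the "dog" and "bird" encodings in the 2D latent space
z = torch.tensor([[0.7, -1.2]])
with torch.no_grad():
    new_image = model.decoder(z)  # decode the chosen point into a (hopefully) new image
```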

However, the latent vectors (encodings) that the encoder generates tend to be irregular, unorganized, or uninterpretable, as the model only aims to reconstruct the input as closely as possible without any constraint on the latent space itself. Therefore it does not care how it encodes the data as long as it can reconstruct the input perfectly.

[Figure. Source: Joseph Rocca]

Due to the freedom given to the Autoencoder model in how it encodes the latent vector, the latent space is likely to contain many empty areas that produce random or uninterpretable output, as shown by the empty area in the figure. In contrast, we would like the areas of the latent space that have meaningful output to be continuous yet separated, like the right figure below, which also allows easy interpolation between different attributes.

[Figure. Source: Joseph Rocca]

Hence, the Variational Autoencoder tries to solve this problem by adding a regularization term to avoid overfitting and to ensure that the latent space has good properties, such as continuity, that enable the generative process.

Variational Autoencoder (VAE)

The Variational Autoencoder is able to generate new data by regularizing the latent space to be continuous like below, hence enabling smooth interpolation between different attributes and removing gaps that might return unrealistic output.

[Figure. Source]

But how does VAE optimize the model to be like this?

[Figure. Source: Joseph Rocca]

A variational autoencoder encodes the latent attributes of the input in a probabilistic manner (as a distribution) instead of a deterministic manner (a single value) like a vanilla autoencoder.

[Figure. Source: Jeremy Jordan]

Imagine the example above where an autoencoder encodes the image into a latent attribute that represents the smile in the photo (note that in real training, we won’t know what each attribute actually represents). A vanilla autoencoder will give a single value for the latent attribute, but a variational autoencoder instead stores the latent attribute as a probability distribution over the attribute, like in the right figure above.

[Figures. Source: Jeremy Jordan]

Now, as we have the probability distribution of each attribute, we can simply sample any value from the distribution to generate a new output.

How to Store a Distribution?

The question that first popped to my mind when I learned that VAE stores the latent variable as a probability distribution was how to actually store a distribution.

One important assumption that we make to simplify this process is that the latent distribution is always a Gaussian distribution. A Gaussian distribution can be easily described with two values: the mean and the variance, or the standard deviation (you can calculate the standard deviation from the variance).

[Figure. Source]
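Concretely, a one-dimensional Gaussian is fully specified by its mean μ and variance σ²:

$$\mathcal{N}(z;\, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)$$

so storing the pair (μ, σ²) for each latent dimension is enough to store the whole distribution.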

Now, our encoder will output the mean and variance for each latent dimension that we want, and we sample z from that distribution to generate the new data.

[Figure. Source: Jeremy Jordan]
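A minimal sketch of such an encoder in PyTorch. Predicting the log-variance rather than the variance itself is a common implementation choice for numerical stability; it is an assumption here, not something prescribed above:

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=128, latent_dim=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # Two heads: the mean and the log-variance of each latent dimension
        self.mu_head = nn.Linear(hidden_dim, latent_dim)
        self.logvar_head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = self.body(x)
        return self.mu_head(h), self.logvar_head(h)
```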

Mathematical Details

Now we will go deeper into the implementation of the VAE. We will denote x as our input data and z as the latent variable (encoded representation). In a vanilla autoencoder, the encoder transforms the input x into the latent variable z and the decoder transforms z into the reconstructed output. However, in a variational autoencoder, the encoder instead transforms x into the probability distribution of the latent variable, p(z|x), from which the latent variable z is randomly sampled and then decoded by the decoder into the reconstructed output.

[Figure. Source: Joseph Rocca]

To compute the latent distribution p(z|x), we can use Bayes’ formula to get

p(z|x) = p(x|z) p(z) / p(x)

Where

p(x) = ∫ p(x|z) p(z) dz

Unfortunately, computing p(x) is hard: it is usually an intractable distribution, which means it cannot be expressed in closed form and cannot be computed by a polynomial-time algorithm.

Therefore, we will use the variational inference method to approximate the distribution instead. Basically, we select some other tractable distribution q to approximate the distribution p. To do this, we want the parameters of q(z|x) to be very similar to p(z|x). For a deeper explanation of variational inference, you may want to read this article by Jonathan Hui.

Kullback-Leibler Divergence (KL-Divergence)

To make q(z|x) similar to p(z|x), we need to minimize the difference between the two distributions, which we measure with the Kullback-Leibler divergence (KL-divergence). The KL-divergence is a measure of the difference between two distributions. The easiest way to understand the KL-divergence is by visualizing the graph below.

[Figure. Source: Jonathan Hui]

From the graph, you can see that the KL-divergence is 0 at the intersection, where the two distributions are the same. Hence, by minimizing the KL-divergence, we make the two distributions as similar as possible.

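For reference, the KL-divergence we are minimizing between q(z|x) and p(z|x) is defined as

$$\mathrm{KL}\big(q(z|x)\,\|\,p(z|x)\big) = \int q(z|x)\,\log\frac{q(z|x)}{p(z|x)}\,dz$$

which is zero exactly when the two distributions coincide.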

Finally, we can define our loss function to be the following.

loss = ‖x − x̂‖² + KL( q(z|x) ‖ p(z) )

The first term is the reconstruction loss, which is just the difference between the reconstructed output and the input, usually measured with the MSE (Mean Squared Error). The second term is the KL-divergence between our chosen distribution q(z|x) and the prior p(z), which is usually a normal distribution with zero mean and unit variance, N(0,1). The distribution q(z|x) will be encouraged to stay close to p(z) during training.

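Under the usual Gaussian assumptions, with q(z|x) = N(μ, σ²) per latent dimension and the prior p(z) = N(0, 1), the KL term has a closed form, so the loss can be sketched as follows (the sum reduction and the unweighted combination of the two terms are implementation assumptions):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term: squared error between the input and the reconstruction
    recon = F.mse_loss(x_hat, x, reduction="sum")
    # Closed-form KL divergence between N(mu, sigma^2) and the standard normal N(0, 1)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```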

Why Use Both the Reconstruction Loss and KL-Divergence?

After all this talk about the KL-divergence, why do we still use the reconstruction loss inside the overall loss function? To understand the intuition behind the loss function and the effects of both the reconstruction loss and the KL-divergence on the latent space, let’s look at the diagram below.

[Figure. Image credit (modified)]

The problem with a plain autoencoder that uses only the reconstruction loss, as described earlier, is that the latent space will have gaps that do not really represent any meaningful data. This is the reason why the variational autoencoder uses distributions instead and minimizes the difference with the KL-divergence. However, if we only focus on mimicking the prior distribution with the KL-divergence loss term, we will end up describing every latent unit as a unit normal distribution and fail to describe the original data.

Therefore, by using a combination of the two, we strike a balance: a latent representation that is close to the prior distribution but still describes certain features of the input.

[Figure. Source: Jeremy Jordan]

Reparameterization Trick

One problem that you may face when implementing a variational autoencoder is implementing the sampling process. When training the model, backpropagation cannot flow through the random sampling step. However, we can use a trick called “reparameterization”, which separates out the random process so that backpropagation can happen.

The “reparameterization trick” is the idea that we randomly sample ε from a unit normal distribution, multiply it by the standard deviation, and shift it by the mean, like in the figure below.

[Figure. Source: Jeremy Jordan]
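A sketch of the reparameterized sampling step, assuming the encoder outputs the mean and the log-variance as in the earlier encoder sketch:

```python
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)  # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)    # epsilon ~ N(0, I): the only source of randomness
    return mu + eps * std          # z = mu + sigma * epsilon, differentiable w.r.t. mu and logvar
```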

Recent Impactful Work on VAE

Although a variational autoencoder network is able to generate new content, the outputs tend to be blurry. The Generative Adversarial Network (GAN), another approach to building generative models, has been more popular due to its ability to produce sharper images, although it can be quite unstable during training.

However, a recent paper from NVIDIA published this year, NVAE: A Deep Hierarchical Variational Autoencoder, introduced a new architecture designed for VAEs and managed to produce high-quality faces using CelebA HQ.

[Figure. Source]

Additionally, there is also the idea of combining the autoencoder with a GAN, as in VAE-GAN and AAE. The Adversarial Autoencoder (AAE) is an approach similar to the VAE, but it replaces the KL-divergence loss with an adversarial loss and has been used for certain purposes such as anomaly detection. In conclusion, VAE is still worth researching and can be very suitable in certain use cases.

Acknowledgment

I’d like to give a huge thanks to Joseph Rocca and Jeremy Jordan for their great articles explaining the variational autoencoder intuitively. Their visual aids have been very useful in helping me to understand and visualize the concepts. These are the kinds of visual aids and clarity that I hope to be able to produce in the future. Therefore, I really recommend that you read their articles for further understanding.

Translated from: https://medium.com/vitrox-publication/generative-modeling-with-variational-auto-encoder-vae-fc449be9890e
