20190604_vector of ones - CSDN Blog

Article link: https://blog.csdn.net/yj13811596648/article/details/90769639

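TF: reshaping a tensor with tf.reshape
The function prototype is
def reshape(tensor, shape, name=None)
The first argument is the tensor to be reshaped.
The second argument is the target shape.
It returns a new tensor with the shape given by shape.
Note that at most one dimension in shape may be set to -1, which means that dimension is computed automatically from the total number of elements.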
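The -1 rule can be illustrated without TensorFlow. The sketch below is plain Python; the helper name infer_shape is hypothetical and is not part of the TensorFlow API. It only mimics how the missing dimension is derived from the element count.

```python
from math import prod

def infer_shape(num_elements, shape):
    """Resolve a single -1 entry in `shape` (hypothetical helper, not a TF function)."""
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    # Product of all explicitly given dimensions.
    known = prod(d for d in shape if d != -1)
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    if known == 0 or num_elements % known != 0:
        raise ValueError("cannot infer the -1 dimension")
    # The unknown dimension is whatever is left over.
    return [num_elements // known if d == -1 else d for d in shape]

# A tensor with 12 elements reshaped to 3 rows: the column count is inferred.
print(infer_shape(12, [3, -1]))
```

With 12 elements and a requested shape of [3, -1], the inferred dimension is 12 / 3 = 4.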

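What is an autoencoder?
If you are familiar with PCA (principal component analysis): when it comes to extracting the main features, an autoencoder does the same job and can even outperform PCA. In other words, an autoencoder can reduce the dimensionality of features just as PCA does.

As for the decoder, we can put it to use as well. During training, the decoder learns to decompress the condensed information back into the original data, so it acts as a decompressor; we can even think of it as a generator (similar to a GAN). A special kind of autoencoder built for this purpose is the variational autoencoder (VAE). A detailed explanation is available here:
http://kvfrans.com/variational-autoencoders-explained/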

In my previous post about generative adversarial networks, I went over a simple method for training a network that could generate realistic-looking images.
However, there were a couple of downsides to using a plain GAN.
First, the images are generated from arbitrary noise. If you wanted to generate a picture with specific features, there’s no way of determining which initial noise values would produce that picture, other than searching over the entire distribution.
Second, a generative adversarial model only discriminates between “real” and “fake” images. There are no constraints requiring that an image of a cat actually look like a cat. This leads to results where there’s no actual object in a generated image, but the style just looks like a picture.
In this post, I’ll go over the variational autoencoder, a type of network that solves these two problems.
What is a variational autoencoder?
To get an understanding of a VAE, we’ll start from a simple network and add parts step by step.
A common way of describing a neural network is as an approximation of some function we wish to model. However, a network can also be thought of as a data structure that holds information.
Let’s say we had a network composed of a few deconvolution layers. We set the input to always be a vector of ones. Then, we can train the network to reduce the mean squared error between its output and one target image. The “data” for that image is now contained within the network’s parameters.
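A minimal sketch of this idea in plain Python follows. It swaps the deconvolution layers for a single linear layer so the example stays dependency-free; the 4-pixel target image, the learning rate, and the function name train_memorizer are all illustrative assumptions.

```python
def train_memorizer(target, n_inputs=3, lr=0.1, steps=200):
    """Fit one linear layer so that layer(vector of ones) reproduces `target`."""
    ones = [1.0] * n_inputs                       # the fixed input: a vector of ones
    weights = [[0.0] * n_inputs for _ in target]  # one row of weights per output pixel
    n = len(target)
    for _ in range(steps):
        output = [sum(w * x for w, x in zip(row, ones)) for row in weights]
        for i, row in enumerate(weights):
            # d(MSE)/d(weights[i][j]) = 2/n * (output[i] - target[i]) * ones[j]
            grad_out = 2.0 / n * (output[i] - target[i])
            for j in range(n_inputs):
                row[j] -= lr * grad_out * ones[j]
    output = [sum(w * x for w, x in zip(row, ones)) for row in weights]
    mse = sum((o - t) ** 2 for o, t in zip(output, target)) / n
    return weights, output, mse

target_image = [0.2, 0.9, 0.5, 0.1]  # a made-up 4-pixel "image"
weights, reconstruction, mse = train_memorizer(target_image)
print(reconstruction, mse)
```

After training, the input carries no information at all; the image lives entirely in weights.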

Now, let’s try it on multiple images.
Instead of a vector of ones, we’ll use a one-hot vector for the input. [1, 0, 0, 0] could mean a cat image, while [0, 1, 0, 0] could mean a dog.
This works, but with a vector of length 4 we can only store up to 4 images. Using a longer vector means adding more and more parameters so the network can memorize the different images.
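To see why one-hot inputs scale poorly, here is a small sketch in plain Python. It again assumes a single linear layer in place of the deconvolution layers, and the 3-pixel cat and dog images are made up. Each one-hot input simply selects one column of the weight matrix, so every stored image needs its own column of parameters.

```python
def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def decode(weights, code):
    """Linear 'decoder': output pixel i is the dot product of row i with the code."""
    return [sum(w * c for w, c in zip(row, code)) for row in weights]

cat = [0.2, 0.9, 0.5]  # made-up 3-pixel images
dog = [0.7, 0.1, 0.4]
images = [cat, dog]

# weights[i][k] holds pixel i of image k, so each image occupies its own column.
weights = [[img[i] for img in images] for i in range(len(cat))]

cat_out = decode(weights, one_hot(0, len(images)))
dog_out = decode(weights, one_hot(1, len(images)))
n_params = len(weights) * len(weights[0])
print(cat_out, dog_out, n_params)
```

Storing a third image would require a third column, so the parameter count grows linearly with the number of images.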
To fix this, we use a vector of real numbers instead of a one-hot vector. We can think of this as a code for an image, which is where the terms encode/decode come from. For example, [3.3, 4.5, 2.1, 9.8] could represent the cat image, while [3.4, 2.1, 6.7, 4.2] could represent the dog.
This initial vector is known as our latent variables.
Choosing the latent variables randomly, like I did above, is obviously a bad idea. In an autoencoder, we add another component that takes in the original images and encodes them into vectors for us. The deconvolutional layers then “decode” the vectors back to the original images.
We’ve finally reached a stage where our model has some hint of a practical use. We can train our network on as many images as we want. If we save the encoded vector of an image, we can reconstruct it later by passing it into the decoder portion. What we have is the standard autoencoder.
However, we’re trying to build a generative model here, not just a fuzzy data structure that can “memorize” images. We can’t generate anything yet, since we don’t know how to create latent vectors other than by encoding them from images.
There’s a simple solution here. We add a constraint on the encoding network that forces it to generate latent vectors that roughly follow a unit Gaussian distribution. It is this constraint that separates a variational autoencoder from a standard one.
Generating new images is now easy: all we need to do is sample a latent vector from the unit Gaussian and pass it into the decoder.
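The generation step can be sketched as follows. This is only an illustration: the decoder here is a fixed linear map with placeholder weights standing in for a trained decoder, and the latent size of 2 and the names sample_latent and decode are assumptions.

```python
import random

def sample_latent(dim, rng):
    """Draw each latent variable independently from N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def decode(weights, latent):
    """Stand-in decoder: a linear map from the latent vector to pixel values."""
    return [sum(w * z for w, z in zip(row, latent)) for row in weights]

rng = random.Random(0)  # seeded so the example is reproducible

# Placeholder weights; in a real VAE these would come from training.
decoder_weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]

latent = sample_latent(2, rng)
generated = decode(decoder_weights, latent)
print(latent, generated)
```

No encoder is involved at generation time: because the latent vectors were constrained to look like unit Gaussian samples during training, a fresh sample is a plausible input for the decoder.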
In practice, there’s a tradeoff between how accurate our network can be and how closely its latent variables can match the unit Gaussian distribution.
We let the network decide this itself. For our loss term, we sum two separate losses: the generative loss, which is a mean squared error that measures how accurately the network reconstructed the images, and a latent loss, which is the KL divergence that measures how closely the latent variables match a unit Gaussian.

generation_loss = mean(square(generated_image - real_image))
latent_loss = KL-Divergence(latent_variable, unit_gaussian)
loss = generation_loss + latent_loss
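The pseudocode above can be made concrete under one common assumption that this section does not spell out: the encoder outputs a mean and a standard deviation for each latent dimension, describing a diagonal Gaussian. In that case the KL divergence to a unit Gaussian has the closed form 0.5 * sum(mean^2 + stddev^2 - log(stddev^2) - 1). The function names below are illustrative.

```python
import math

def generation_loss(generated, real):
    """Mean squared error between the reconstructed and the real image."""
    return sum((g - r) ** 2 for g, r in zip(generated, real)) / len(real)

def latent_loss(mean, stddev):
    """KL( N(mean, stddev^2) || N(0, 1) ), summed over latent dimensions.

    Assumes a diagonal Gaussian parameterised by per-dimension mean and stddev.
    """
    return 0.5 * sum(
        m * m + s * s - math.log(s * s) - 1.0 for m, s in zip(mean, stddev)
    )

def vae_loss(generated, real, mean, stddev):
    """Total loss: reconstruction error plus the latent regulariser."""
    return generation_loss(generated, real) + latent_loss(mean, stddev)

# Latent variables that already match the unit Gaussian add no latent loss.
print(latent_loss([0.0, 0.0], [1.0, 1.0]))
print(vae_loss([1.0, 2.0], [1.0, 4.0], [1.0], [1.0]))
```

The latent loss is zero exactly when every mean is 0 and every standard deviation is 1, and it grows as the encoder drifts away from the unit Gaussian, which is what produces the tradeoff against reconstruction accuracy.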