Autoencoders
An autoencoder is a neural network made up of two parts:
- An encoder network that compresses high-dimensional input data into a lower-dimensional representation vector
- A decoder network that decompresses a given representation vector back to the original domain
The network is trained to find weights for the encoder and decoder that minimize the loss between the original input and the reconstruction of the input after it has passed through the encoder and decoder. (In other words, the loss function is the reconstruction loss between the original image and the image the model outputs.)
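A video made by an uploader on Bilibili illustrates this very vividly, I think; the image below is a screenshot from that video:
As the video explains, the main difference from an ordinary neural network is the bottleneck in the middle of the network: that hidden layer has very few units, so it filters the input and forces the network to extract features.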
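To make the bottleneck and the reconstruction loss concrete for myself, here is a minimal sketch in plain Python. It is not the book's model: the toy 2-D data, the single-unit linear bottleneck, the learning rate, and the names `encode`, `decode`, and `reconstruction_loss` are all my own assumptions, and a real autoencoder would be built with a deep learning framework and nonlinear layers.

```python
import random

random.seed(0)

# Toy data (an assumption for illustration): 2-D points lying close to the
# line x2 = 2 * x1, so one latent number is enough to describe each point.
data = []
for _ in range(200):
    t = random.uniform(-1.0, 1.0)
    data.append((t + random.gauss(0, 0.01), 2 * t + random.gauss(0, 0.01)))

# Encoder weights w (2 -> 1) and decoder weights v (1 -> 2), small random init.
w = [random.uniform(-0.5, 0.5) for _ in range(2)]
v = [random.uniform(-0.5, 0.5) for _ in range(2)]


def encode(x):
    """Compress a 2-D input into the 1-D bottleneck value z."""
    return w[0] * x[0] + w[1] * x[1]


def decode(z):
    """Expand the bottleneck value back into the 2-D input space."""
    return (v[0] * z, v[1] * z)


def reconstruction_loss():
    """Mean squared error between the inputs and their reconstructions."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


initial_loss = reconstruction_loss()

learning_rate = 0.05
for epoch in range(500):
    grad_w = [0.0, 0.0]
    grad_v = [0.0, 0.0]
    for x in data:
        z = encode(x)
        x_hat = decode(z)
        err = (x_hat[0] - x[0], x_hat[1] - x[1])
        # Error signal propagated back through the decoder to the bottleneck.
        back = err[0] * v[0] + err[1] * v[1]
        for i in range(2):
            grad_v[i] += 2 * err[i] * z
            grad_w[i] += 2 * back * x[i]
    # Full-batch gradient descent step on both sets of weights.
    for i in range(2):
        v[i] -= learning_rate * grad_v[i] / len(data)
        w[i] -= learning_rate * grad_w[i] / len(data)

final_loss = reconstruction_loss()
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Because the data really is close to one-dimensional, the single bottleneck unit can keep almost everything that matters, which is the "filtering" effect described above.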
Variational Autoencoders
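A variational autoencoder (VAE) makes just two changes to the plain autoencoder:
- The encoder no longer maps each input to a single point in the latent space; it maps it to a distribution instead
- A KL divergence term is added to the loss function, measuring the difference between the learned distribution and the standard normal distribution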
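The two changes can be sketched in a few lines. This is my own illustration under common assumptions, not code from the book or the video: I assume the encoder outputs a mean `mu` and a log-variance `log_var` for a diagonal Gaussian, that sampling uses the usual reparameterization z = mu + sigma * epsilon, and that the KL term uses the closed form for a diagonal Gaussian against N(0, I). The numeric values of `mu` and `log_var` are made up.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * epsilon, with epsilon ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu = [0.5, -1.0]       # hypothetical encoder output: mean of the distribution
log_var = [0.0, -0.5]  # hypothetical encoder output: log-variance

z = sample_latent(mu, log_var, rng)      # a point drawn from the distribution
kl = kl_to_standard_normal(mu, log_var)  # penalty added to the reconstruction loss
print("sampled z:", z)
print("KL term:", round(kl, 4))
```

The KL term vanishes only when `mu` is all zeros and `log_var` is all zeros, that is, when the learned distribution already equals the standard normal. How strongly it is weighted against the reconstruction loss is a tuning choice that this sketch leaves out.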
The same video gives a figure that illustrates this:
Using VAEs to Generate Faces
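The book closes with a face-generation example that builds a VAE model; its basic framework follows the principles described above. Below are brief notes on the parts I found most instructive: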
- One benefit of mapping images into a lower-dimensional space is that we can perform arithmetic on vectors in this latent space that has a visual analogue when decoded back into the original image domain. (That is, we can change the picture by manipulating its low-dimensional vector: for example, given a sad 😟 face, we might want to add a smile.)
- To do this we first need to find a vector in the latent space that points in the direction of increasing smile. Adding this vector to the encoding of the original image in the latent space will give us a new point which, when decoded, should give us a more smiley version of the original image. (So the first step is to find a vector that represents a smile 😊.)
- If we take the average position of encoded images in the latent space with the attribute smiling and subtract the average position of encoded images that do not have the attribute smiling, we will obtain the vector that points from not smiling to smiling, which is exactly what we need. (In short, subtract the mean encoding of one labeled set from the mean encoding of the other.)
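The latent arithmetic in the notes above comes down to two means and one addition. Here is a small sketch with made-up 2-D latent codes; in the real example the codes would come from the trained encoder and the result would be passed through the decoder. The names `mean_vector`, `smile_vector`, and the strength factor `alpha` are my own, chosen only for illustration.

```python
def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical latent encodings of labeled images (stand-ins for encoder output).
smiling = [[1.0, 2.0], [3.0, 2.0]]
not_smiling = [[0.0, 1.0], [2.0, -1.0]]

# Vector pointing from "not smiling" toward "smiling" in the latent space.
smile_vector = [s - n for s, n in
                zip(mean_vector(smiling), mean_vector(not_smiling))]

# Move an encoded face along that direction; alpha controls how far.
z_original = [0.5, 0.5]
alpha = 2.0
z_smiling = [z + alpha * d for z, d in zip(z_original, smile_vector)]

print("smile vector:", smile_vector)
print("shifted encoding:", z_smiling)
```

Decoding `z_smiling` for several values of `alpha` should, if the latent space is well behaved, give progressively more smiling versions of the same face.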
We can use a similar idea to morph between two faces. Imagine two points in the latent space, A and B, that represent two images. If you started at point A and walked toward point B in a straight line, decoding each point on the line as you went, you would see a gradual transition from the starting face to the end face. (This is the book's face-morphing example.)
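Walking from A to B in a straight line is plain linear interpolation, z = (1 - alpha) * A + alpha * B with alpha going from 0 to 1. A minimal sketch, again with made-up 2-D points standing in for real encodings and a hypothetical helper name `interpolate`:

```python
def interpolate(a, b, steps):
    """Return `steps` evenly spaced points on the segment from a to b (inclusive)."""
    points = []
    for i in range(steps):
        alpha = i / (steps - 1)  # 0.0 at point A, 1.0 at point B
        points.append([(1 - alpha) * x + alpha * y for x, y in zip(a, b)])
    return points


# Hypothetical latent encodings of two faces.
point_a = [0.0, 1.0]
point_b = [2.0, -1.0]

path = interpolate(point_a, point_b, steps=5)
for z in path:
    print(z)  # each z would be decoded into one frame of the morph
```

Decoding every point on `path` in order would give the sequence of intermediate faces; more steps means a smoother transition.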