A Complete Understanding of the VAE Loss Function, from Formula to Code

The VAE loss function is, at bottom, maximum likelihood estimation on the training samples. Since the likelihood cannot be maximized directly, we maximize the evidence lower bound (ELBO) instead. It is called the evidence lower bound because $p(x)$ is the evidence factor in Bayes' formula: a constant used for normalization. Why that factor is called the evidence is a longer story; Bayes' formula is customarily explained in terms of hypotheses and evidence...

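Concretely, for a latent-variable model with prior $p(z)$ and likelihood $p(x \mid z)$ (standard VAE notation, introduced here for reference), Bayes' rule reads:

$$p(z \mid x) = \frac{p(x \mid z)\,p(z)}{p(x)}, \qquad p(x) = \int p(x \mid z)\,p(z)\,dz$$

The denominator $p(x)$ is the evidence: it normalizes the posterior and does not depend on $z$.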

Next comes maximizing the ELBO. Dissecting the ELBO, there are two derivations.
The first uses Jensen's inequality:
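Writing $q(z \mid x)$ for the encoder's approximate posterior, the standard derivation is:

$$\log p(x) = \log \int q(z \mid x)\,\frac{p(x,z)}{q(z \mid x)}\,dz \;\ge\; \mathbb{E}_{q(z \mid x)}\!\left[\log \frac{p(x,z)}{q(z \mid x)}\right] = \text{ELBO}$$

The inequality is Jensen's inequality applied to the concave $\log$ function.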
The second uses the KL (Kullback-Leibler) divergence:
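Alternatively, the log-likelihood can be decomposed exactly:

$$\log p(x) = \mathbb{E}_{q(z \mid x)}\!\left[\log \frac{p(x,z)}{q(z \mid x)}\right] + \mathrm{KL}\big(q(z \mid x)\,\|\,p(z \mid x)\big) = \text{ELBO} + \mathrm{KL}\big(q(z \mid x)\,\|\,p(z \mid x)\big)$$

Since the KL divergence is non-negative, the ELBO lower-bounds $\log p(x)$, and the gap is exactly the KL divergence between the approximate posterior and the true posterior.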
The ELBO can then be written as two terms: the reconstruction loss and the KL (Kullback-Leibler) divergence loss.
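Splitting $p(x,z) = p(x \mid z)\,p(z)$ inside the ELBO gives:

$$\text{ELBO} = \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big] - \mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)$$

The first term rewards faithful reconstruction; the second keeps the approximate posterior close to the prior.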
With an additional assumption, further derivation turns the first term into the MSE loss:
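Assuming a Gaussian decoder with fixed variance, $p(x \mid z) = \mathcal{N}\big(x;\,\hat{x}(z),\,\sigma^2 I\big)$ with decoder output $\hat{x}(z)$ (a common modeling choice):

$$\log p(x \mid z) = -\frac{1}{2\sigma^2}\,\big\|x - \hat{x}(z)\big\|^2 + \text{const}$$

so maximizing the first term is, up to a scale factor and an additive constant, the same as minimizing the MSE between the input and its reconstruction.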

The second term is the KL divergence between two Gaussians, for which we use the closed-form KL divergence between two Gaussian distributions:
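For two univariate Gaussians, the KL divergence has the closed form:

$$\mathrm{KL}\big(\mathcal{N}(\mu_1,\sigma_1^2)\,\|\,\mathcal{N}(\mu_2,\sigma_2^2)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$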
Substituting the encoder's Gaussian $q(z \mid x) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$ and the standard normal prior $p(z) = \mathcal{N}(0, I)$ gives:
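$$\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big) = -\frac{1}{2}\sum_{i=1}^{K}\left(1 + \log(\sigma_i^2) - \mu_i^2 - \sigma_i^2\right)$$

This is exactly the KL loss term implemented in the code at the end of this post.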
Below is how to implement it.

ELBO Loss Components

The ELBO loss is given by the sum of two separate loss terms:

  1. Reconstruction Loss: This measures how well the decoder reconstructs the input data. It is typically computed as the mean squared error (MSE) or the cross-entropy between the original input and the reconstructed output.

    $$\text{reconstruction loss} = \text{MSE}(\text{reconstructed image},\ \text{input image})$$

  2. KL Loss (Kullback-Leibler Divergence): This measures the difference between the learned distribution over the latent variables and a target (prior) distribution, which is typically a standard normal distribution. Minimizing the KL loss ensures that the learned means and variances are as close as possible to those of the target distribution.

    For a latent space of dimension $K$, the KL loss is given by:
    $$\text{KL loss} = -\frac{1}{2} \sum_{i=1}^{K} \left( 1 + \log(\sigma_i^2) - \mu_i^2 - \sigma_i^2 \right)$$
    where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the $i$-th latent dimension, respectively.

ELBO Loss Formula

The ELBO loss is the sum of the reconstruction loss and the KL loss; it is (up to constants) the negative of the ELBO, so minimizing the loss maximizes the ELBO:
$$\text{ELBO loss} = \text{reconstruction loss} + \text{KL loss}$$

Intuitive Understanding

Practical Effect of Including a KL Loss Term

The practical effect of including the KL loss term is to pack the clusters that the reconstruction loss learns tightly around the center of the latent space, forming a continuous space to sample from. This helps in generating diverse and meaningful samples from the latent space, because the latent variables are encouraged to follow the target (normal) distribution.

Summary

  • Reconstruction Loss: Measures the discrepancy between the reconstructed data and the original input.
  • KL Loss: Ensures that the learned latent distribution is close to the target (prior) distribution.
  • ELBO Loss: The sum of the reconstruction loss and the KL loss, which is optimized during training to learn a good latent representation of the data.

By minimizing the ELBO loss, the VAE learns a latent space that is both informative (through the reconstruction loss) and structured (through the KL loss), enabling it to generate new data that is similar to the training data.

In MATLAB, the combined loss can be implemented as follows:

function loss = elboLoss(Y,T,mu,logSigmaSq)
% ELBO loss for a VAE: reconstruction loss plus KL divergence loss.
% mu and logSigmaSq hold the latent means and log-variances log(sigma_i^2),
% of size K-by-B for latent dimension K and mini-batch size B.

% Reconstruction loss: MSE between reconstruction Y and input T.
reconstructionLoss = mse(Y,T);

% KL divergence to the standard normal prior, summed over the K latent
% dimensions, then averaged over the mini-batch.
KL = -0.5 * sum(1 + logSigmaSq - mu.^2 - exp(logSigmaSq),1);
KL = mean(KL);

% Combined loss.
loss = reconstructionLoss + KL;
end
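A minimal usage sketch, assuming MATLAB's Deep Learning Toolbox (whose mse function accepts dlarray inputs); the image size, latent dimension, and batch size below are hypothetical:

% Hypothetical data: 28x28 grayscale images, latent dimension K = 20, batch size B = 128.
Y = dlarray(rand(28,28,1,128,'single'),'SSCB');    % reconstructed images from the decoder
T = dlarray(rand(28,28,1,128,'single'),'SSCB');    % original input images
mu = dlarray(randn(20,128,'single'));              % latent means, K-by-B
logSigmaSq = dlarray(randn(20,128,'single'));      % latent log-variances, K-by-B
loss = elboLoss(Y,T,mu,logSigmaSq);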