generative models notes

weixin_44504134

Last modified: 2024-05-27 14:12:20

Tags: deep learning

First published: 2023-11-06 11:36:33

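Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. Reposts must include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_44504134/article/details/132969843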

The approach: estimate the distribution of data (images)

Intractable normalizing constant
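
To make the difficulty concrete, one common way to write the problem is shown below. The energy function $E_\theta$ and the symbol $Z_\theta$ are notation assumed here for illustration and do not appear in the original notes.

$$p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta}, \qquad Z_\theta = \int e^{-E_\theta(x)}\,dx$$

The integral runs over every possible image, so $Z_\theta$ generally cannot be computed. Each model family below can be read as a different way of avoiding this integral.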

GAN

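Generator + Discriminator architecture

Drawbacks: 1. Two networks are trained at the same time, so training is not very stable. 2. Diversity is limited, since it comes mainly from the random noise $z$.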
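
The instability comes from the fact that the two networks optimize opposing objectives. The standard minimax formulation, with $G$ the generator, $D$ the discriminator, and $p(z)$ the noise prior (symbols assumed here, following the usual GAN notation), is:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

The only source of randomness in a generated sample $G(z)$ is $z$, which matches the diversity remark above.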

Auto Encoder Family

AE

DAE

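$x_c$ is a corrupted copy of $x$ with noise added, which makes the model more robust; this is similar in spirit to MAE.

The purpose of AE/DAE is to compress $x$; the output is $z$ (the bottleneck), which is then used for tasks such as classification, not for generation.

$z$ is not a distribution, only a fixed feature used for reconstruction.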

VAE

The encoder predicts a posterior distribution $q(z|x)$.

Given $z$ drawn from the prior, $p(x|z)$ is the likelihood; the objective is maximum likelihood.
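
As a hedged illustration of the two ingredients this description relies on, the sketch below shows the reparameterization trick for sampling $z \sim q(z|x)$ and the closed-form KL term against a standard normal prior. It assumes a diagonal Gaussian posterior parametrized by `mu` and `log_var`; these names and the choice of prior are assumptions made here, not details from the notes.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    Writing the sample this way keeps z a deterministic function of
    (mu, log_var), so gradients could flow through the encoder outputs.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))
```

In a typical VAE loss this KL term is added to a reconstruction term derived from $p(x|z)$; together they form a lower bound on the log-likelihood.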

VQVAE (Vector-Quantized)

Diffusion Models

Use a diffusion process to add noise step by step, $x_0 \rightarrow x_1 \rightarrow \dots \rightarrow x_T$, so that the original image is diffused into a Gaussian distribution. If this works, we can reverse the diffusion (using a U-Net structure) to obtain an image from random noise.
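
One step of the forward (noising) process is usually written as a Gaussian transition. The variance schedule $\beta_t \in (0, 1)$ is notation assumed here, following the DDPM formulation:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right), \qquad t = 1, \dots, T$$

Each step shrinks the signal slightly and adds a little noise, so for large $T$ the distribution of $x_T$ approaches $\mathcal{N}(0, I)$.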

DDPM - improved DDPM

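Predict the noise added at each step instead of predicting $x_{t-1}$ directly.

Role of the temporal (timestep) embedding: the model uses the same set of parameters at every diffusion step, yet the input goes from pure noise to a progressively clearer image, so the model needs to be told which step it is at.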
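
A minimal sketch of one way to build such an embedding is the Transformer-style sinusoidal encoding below. The function name, the `max_period` default, and the sin-then-cos layout are assumptions for illustration; actual implementations vary in these details.

```python
import math


def timestep_embedding(t, dim, max_period=10000.0):
    """Sinusoidal embedding of timestep t as a list of length dim (dim even).

    The first half holds sines and the second half cosines, at
    geometrically spaced frequencies from 1 down to about 1/max_period.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]
```

The resulting vector would typically be passed through a small MLP and added to the feature maps, which lets a single shared network behave differently at early (noisy) and late (clean) steps.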

The noise $\epsilon$ at each step is a distribution; DDPM restricts it to a Gaussian with fixed variance, so only the mean needs to be predicted.
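
Written out, with $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, and $\epsilon_\theta$ the noise-prediction network (notation assumed here, following the DDPM paper), the reverse step with fixed variance $\sigma_t^2$ is:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\right), \qquad \mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right)$$

Predicting the mean is therefore equivalent to predicting the noise, and the simplified training objective becomes:

$$L_{simple} = \mathbb{E}_{t,\,x_0,\,\epsilon}\left[\left\|\epsilon - \epsilon_\theta(x_t, t)\right\|^2\right]$$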

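DDPM can also be viewed as an encoder-decoder model similar to a VAE, with the following differences:

1. The encoding process in DDPM is fixed, whereas the VAE encoder is learned.

2. In DDPM the dimensionality is the same before and after encoding and decoding, whereas the bottleneck of a typical VAE is much smaller than the image.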

Diffusion beats GAN

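Use a larger model in place of the original U-Net (in the paper this is a scaled-up, improved U-Net), and reduce the number of sampling steps to 25.

Classifier guidance sacrifices some diversity in exchange for better sample quality (the classifier is trained on ImageNet data with noise added).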
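
As a sketch of how the classifier enters sampling, the guided reverse step shifts the predicted mean along the classifier's gradient. Here $p_\phi(y \mid x_t)$ is the noisy-image classifier, $s$ is a guidance scale, and $\Sigma_\theta$ is the reverse-step covariance; this notation is assumed here and is not in the original notes.

$$x_{t-1} \sim \mathcal{N}\left(\mu_\theta(x_t, t) + s\,\Sigma_\theta(x_t, t)\,\nabla_{x_t} \log p_\phi(y \mid x_t),\ \Sigma_\theta(x_t, t)\right)$$

A larger $s$ pushes samples harder toward class $y$, which is where the quality-for-diversity trade-off comes from. The classifier has to be trained on noised images because it is evaluated on $x_t$, not on clean images.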

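Every form of guidance enters as the condition $y$ in $f_\theta$.

Question: with guided diffusion, the guidance loss can be folded in to obtain a better $f$ that is optimized with respect to $y$; but at inference time, must the $y$ provided be of the same kind as the one used in training?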

GLIDE - DALLE2

classifier-free guidance
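
Classifier-free guidance removes the separate classifier: one network is trained with the condition randomly dropped, and at sampling time its conditional and unconditional noise predictions are combined. The sketch below shows only that combination step on plain Python lists; the function and argument names are hypothetical, and `scale` plays the role of the guidance strength.

```python
def classifier_free_guidance(eps_cond, eps_uncond, scale):
    """Extrapolate from the unconditional prediction toward the conditional one.

    scale = 0 gives the unconditional prediction, scale = 1 gives the
    conditional prediction, and scale > 1 strengthens the conditioning.
    """
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
```

For example, `classifier_free_guidance([1.0, 2.0], [0.0, 1.0], 3.0)` moves each component three times as far from the unconditional value as the conditional prediction does.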

*Reference

DALL·E 2 (including an introduction to diffusion models) [Paper Close Reading], bilibili

A Detailed Explanation of the VAE Model, Zhihu

From Autoencoder to Beta-VAE | Lil'Log

Berkeley Course

The general idea of a generative model:

Assume that all data (images) are drawn from an underlying distribution $p_{data}$. We approximate this distribution with a parametric model $p_{\theta}$, and estimate $\theta$ by maximum likelihood on the training data.
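
Spelled out for a training set $x_1, \dots, x_N$ drawn from $p_{data}$ (the indexing and $N$ are notation assumed here), the maximum-likelihood estimate is:

$$\theta^* = \arg\max_\theta \frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i)$$

As $N$ grows, this is equivalent to minimizing $\mathrm{KL}\left(p_{data} \,\|\, p_\theta\right)$, since the entropy of $p_{data}$ does not depend on $\theta$.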

Hung-yi Lee (李宏毅) course

Text-to-Image framework

Framework: text encoder → generation model → (latent representation of the image) → image decoder

The text encoder and the image decoder can be trained separately (without paired text-image data).

DDPM

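During training, a timestep $t$ is sampled at random and the noise from step 0 to step $t$ is applied directly in one shot; inference, by contrast, proceeds step by step.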
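
The one-shot jump is possible because the forward process has a closed form, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$. Below is a minimal sketch of building one training example, using only the standard library and treating an image as a flat list of floats. The linear schedule endpoints (`1e-4` to `0.02`) follow the DDPM paper; all function names are hypothetical, and the network itself is left out.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumes T >= 2)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + step * i for i in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for all s up to t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, eps):
    """Jump from x_0 straight to x_t (t is a 0-based index into alpha_bars)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]


def training_example(x0, alpha_bars, rng=random):
    """Draw a random t and noise; return the model input x_t, t, and target eps."""
    t = rng.randrange(len(alpha_bars))
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return q_sample(x0, t, alpha_bars, eps), t, eps
```

A training step would feed `x_t` and `t` to the network and regress its output onto `eps`. Sampling cannot take the same shortcut, because each reverse step depends on the network's prediction at the previous one.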
