How-Diffusion-Models-Work

AI周红伟

Tags: AIGC, chatgpt, large models

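Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.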

Article link: https://blog.csdn.net/starzhou/article/details/136487603

Notes from "How Diffusion Models Work" by DeepLearning.AI

Contents

Intuition

Sampling

With Extra Noise

Training

Context Embedding

Faster Sampling

Notes

Taught By Sharon Zhou

Noted by Atul

Missing Prerequisite: Backprop

Example used throughout the course: generating 16x16 sprites for video games.

Intuition

Goal: Given a large set of sprite images, generate even more sprite images.

What does the network learn?
- Fine details
- General outline
- Everything in between

Noising Process (Bob the sprite as an ink drop: noise is added step by step until the sprite dissolves, like ink diffusing in water)

Denoising Process (what should the NN think?)
- If it's Bob the sprite, keep it as it is
- If it's likely to be Bob, suggest more details to fill in
- If it's just an outline of a sprite, suggest general details for likely sprites (Bob, Fred, ...)
- If it's nothing, suggest the outline of a sprite

Give the NN input noise whose pixels are sampled from a normal distribution, and get a completely new sprite!

Sampling

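Assume you have a trained NN.
At each denoising step, it predicts the noise in the current image and subtracts it to get a slightly better image.
NOTE: After each denoising step (except the last), some fresh random noise, scaled to the timestep, is added back. The NN expects normally distributed noisy input, and without this extra noise the samples collapse into near-identical blobs (what these notes call "mode collapse").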
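
The denoising step described above can be sketched in plain Python. This is a minimal sketch of the standard DDPM update, not the course notebook's code: the names `linear_schedule` and `denoise_step`, the schedule values (500 steps, betas from 1e-4 to 0.02), and the all-zero `fake_prediction` standing in for the trained network are all assumptions made for illustration.

```python
import math
import random


def linear_schedule(timesteps=500, beta1=1e-4, beta2=0.02):
    """Build beta, alpha and cumulative alpha-bar lists, indexed 0..timesteps."""
    betas = [beta1 + (beta2 - beta1) * i / timesteps for i in range(timesteps + 1)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def denoise_step(x, predicted_noise, t, schedule, rng, add_noise=True):
    """One reverse step: subtract the predicted noise, then add fresh noise back."""
    betas, alphas, alpha_bars = schedule
    scale = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    mean = [(xi - scale * ei) / math.sqrt(alphas[t])
            for xi, ei in zip(x, predicted_noise)]
    if not add_noise or t <= 1:
        return mean  # the final step returns the estimate without extra noise
    sigma = math.sqrt(betas[t])
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


rng = random.Random(0)
schedule = linear_schedule()
x = [rng.gauss(0.0, 1.0) for _ in range(16 * 16)]  # start from pure noise
fake_prediction = [0.0] * len(x)  # placeholder for the trained network's output
x_prev = denoise_step(x, fake_prediction, 500, schedule, rng)
```

A full sampler would call `denoise_step` in a loop from the last timestep down to 1, asking the network for a new noise prediction each time. Passing `add_noise=False` reproduces the variant without extra noise.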

Neural Network

UNet Architecture
- Input and output of same size
- First used for image segmentation

Takes a noisy image, embeds it into a smaller space by downsampling, then upsamples to predict the noise
Can take additional information in the form of embeddings
- Time: encodes the timestep, and therefore the level of noise added
- Context: guides the generation process
Check out forward() in the sampling notebook
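
To make the down-then-up shape of the network concrete, here is a shape-only sketch with no learned weights and no skip connections, so it is not a real UNet. The helpers `avg_pool2`, `upsample2`, and `condition` are hypothetical, and scaling by a context value and shifting by a time value is just one simple way to inject embeddings, assumed here for illustration.

```python
def avg_pool2(img):
    """Halve height and width by averaging each 2x2 block."""
    h, w = len(img), len(img[0])
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


def upsample2(img):
    """Double height and width by repeating each pixel (nearest neighbour)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def condition(feature_map, context_scale, time_shift):
    """Scale features by a context value and shift them by a time value."""
    return [[context_scale * v + time_shift for v in row] for row in feature_map]


noisy = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
down1 = avg_pool2(noisy)   # 16x16 -> 8x8
down2 = avg_pool2(down1)   # 8x8 -> 4x4 bottleneck
up1 = upsample2(condition(down2, context_scale=1.0, time_shift=0.0))  # 4x4 -> 8x8
up2 = upsample2(up1)       # 8x8 -> 16x16, same size as the input
```

In a real network each stage would be a stack of convolutions, and the time and context embeddings would be vectors produced by small learned layers rather than single numbers.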

Training

The NN learns to predict noise, which amounts to learning the distribution of what is "not noise"

Sample a training image, a timestep t, and noise, all at random
- The timestep controls the level of noise
- Randomising the timestep for each image keeps training stable
Add the noise to the image
Feed the noisy image into the NN, which predicts the noise
Compute the loss between the actual and the predicted noise
Backprop and learn
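
The training steps above, up to the loss, can be sketched as follows. This is a rough illustration under stated assumptions: `perturb` uses the standard DDPM forward-noising formula, the two flat "sprites" and `zero_model` are toy placeholders, and the schedule values are assumed. The backprop step is left out because it needs an autodiff library.

```python
import math
import random


def perturb(x0, alpha_bar_t, noise):
    """Forward noising: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    keep, spread = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [keep * p + spread * n for p, n in zip(x0, noise)]


def mse(predicted, actual):
    """Mean squared error between predicted and actual noise."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def training_loss(dataset, alpha_bars, model, rng):
    """Draw one random (image, timestep, noise) triple and score the model on it."""
    x0 = rng.choice(dataset)
    t = rng.randint(1, len(alpha_bars) - 1)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = perturb(x0, alpha_bars[t], noise)
    return mse(model(x_t, t), noise)


# Toy stand-ins: two flat 16x16 "sprites" and a model that always predicts zero noise.
dataset = [[0.0] * 256, [1.0] * 256]
alpha_bars = [1.0]
for i in range(1, 501):
    beta = 1e-4 + (0.02 - 1e-4) * i / 500  # assumed linear schedule
    alpha_bars.append(alpha_bars[-1] * (1.0 - beta))


def zero_model(x_t, t):
    """Placeholder network: ignores its input and predicts no noise at all."""
    return [0.0] * len(x_t)


loss = training_loss(dataset, alpha_bars, zero_model, random.Random(0))
```

A trained network would replace `zero_model`, and an optimizer would repeat this over many random triples, nudging the weights to lower the loss each time.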

Control

Embeddings are vectors, for instance strings represented as vectors of numbers
They are given as input to the NN along with the training image
Each one is associated with a training example and its properties
Uses: generate funky mixtures by combining embeddings
Context formats
- Text
- Categories, one-hot encoded (e.g. hero, non-hero, spells, ...)
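
As a small illustration of one-hot context vectors and of mixing them, consider the sketch below. The `CATEGORIES` list is an illustrative subset taken from the examples above, and `one_hot` and `mix` are hypothetical helper names.

```python
CATEGORIES = ["hero", "non-hero", "spell"]  # illustrative subset, not a full list


def one_hot(label):
    """Return a vector with 1.0 at the label's index and 0.0 elsewhere."""
    return [1.0 if name == label else 0.0 for name in CATEGORIES]


def mix(a, b, weight=0.5):
    """Blend two context vectors; weight is the share given to the first one."""
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


hero = one_hot("hero")
spell = one_hot("spell")
hero_spell = mix(hero, spell)  # half hero, half spell
```

Feeding a blended vector such as `hero_spell` to a context-conditioned network is what produces the "funky mixtures" mentioned above.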

Fast Sampling: DDIM

DDPM is slow!
- Many timesteps, and each step depends on the previous one (Markovian)
DDIM skips timesteps and adds no random noise at each step, so sampling is deterministic
Quality can be lower than DDPM
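
The step skipping can be sketched as below. This is a minimal version of the standard deterministic DDIM update, not the course notebook's code: `ddim_timesteps` and `ddim_step` are hypothetical names, and the alpha-bar values are passed in directly rather than computed from a schedule.

```python
import math


def ddim_timesteps(timesteps, n_steps):
    """Pick every (timesteps // n_steps)-th timestep, from high noise to low."""
    stride = timesteps // n_steps
    return list(range(timesteps, 0, -stride))


def ddim_step(x_t, predicted_noise, alpha_bar_t, alpha_bar_prev):
    """Deterministic jump from timestep t to an earlier timestep (no random noise)."""
    out = []
    for xi, ei in zip(x_t, predicted_noise):
        # Estimate the clean pixel, then re-noise it to the earlier noise level.
        x0_estimate = (xi - math.sqrt(1.0 - alpha_bar_t) * ei) / math.sqrt(alpha_bar_t)
        out.append(math.sqrt(alpha_bar_prev) * x0_estimate
                   + math.sqrt(1.0 - alpha_bar_prev) * ei)
    return out


steps = ddim_timesteps(500, 20)  # 20 sampling steps instead of 500
```

Because `ddim_step` draws no random numbers, the same starting noise always yields the same sample, and the stride can be widened to trade quality for speed.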

Summary

Other applications: music, inpainting, textual inversion
