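Diffusion Models from Theory to Practice, Task 1: Introduction to Diffusion Models (CSDN blog)

Article link: https://blog.csdn.net/github_38929225/article/details/133955313

This article walks through building a custom diffusion model from scratch: loading data, training a UNet model, adding noise with a scheduler, and using the Hugging Face API for multi-GPU training and model upload. The tutorial covers the path from the basics to practical use.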


Tutorial link

Diffusion Models from Theory to Practice, Unit 1 notebook: https://github.com/darcula1993/diffusion-models-class-CN/blob/main/unit1/01_introduction_to_diffusers_CN.ipynb

Environment:

Windows 10
git bash
conda
Encoding: UTF-8
jupyterlab

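What you will learn

See a powerful custom diffusion model pipeline in action (and find out how to make your own version)
Create your own mini pipeline by:
    Recapping the core ideas behind diffusion models
    Loading data from the Hub for training
    Exploring how a scheduler adds noise to the data
    Creating and training a UNet model
    Putting the pieces together into a working pipeline
Edit and run a script that launches a longer training run and handles:
    Multi-GPU training via Accelerate
    Experiment logging to track key statistics
    Uploading the final model to the Hugging Face Hub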

Hugging Face API

Hugging Face models tend to be large, so git-lfs is needed to upload or download them.

%%capture
!sudo apt -qq install git-lfs
!git config --global credential.helper store

If the commands above ran without problems, you can log in from the notebook by entering your access token.

# Log in to Hugging Face through huggingface_hub
from huggingface_hub import notebook_login
notebook_login()

[Image: screenshot of a successful login]

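Create a model repository, push the model to the Hub, and create a model card so the model can be downloaded at any time. This assumes the pipeline has already been saved locally to my_pipeline (see step 5).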

from huggingface_hub import get_full_repo_name

model_name = "sd-class-butterflies-32"
hub_model_id = get_full_repo_name(model_name)
hub_model_id

from huggingface_hub import HfApi, create_repo

# Create the model repository on the 🤗 Hub and push the pipeline files to it
create_repo(hub_model_id)
api = HfApi()
# Upload each component into its own subfolder, matching the layout that model_index.json refers to
api.upload_folder(
    folder_path="my_pipeline/scheduler", path_in_repo="scheduler", repo_id=hub_model_id
)
api.upload_folder(folder_path="my_pipeline/unet", path_in_repo="unet", repo_id=hub_model_id)
api.upload_file(
    path_or_fileobj="my_pipeline/model_index.json",
    path_in_repo="model_index.json",
    repo_id=hub_model_id,
)
# Create a nice model card
from huggingface_hub import ModelCard

content = f"""
---
license: mit
tags:
- pytorch
- diffusers
- unconditional-image-generation
- diffusion-models-class
---
# Model Card for Unit 1 of the [Diffusion Models Class 🧨](https://github.com/huggingface/diffusion-models-class)
This model is a diffusion model for unconditional image generation of cute 🦋.
## Usage

from diffusers import DDPMPipeline

pipeline = DDPMPipeline.from_pretrained('{hub_model_id}')
image = pipeline().images[0]
image
"""

card = ModelCard(content)
card.push_to_hub(hub_model_id)

Check that the uploaded model loads and works

from diffusers import DDPMPipeline

image_pipe = DDPMPipeline.from_pretrained(hub_model_id)
pipeline_output = image_pipe()
pipeline_output.images[0]

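The tutorial's longer training script is started with accelerate launch. The command failed on my Windows machine, so its results remain to be verified later. The script covers what you are likely to need when training a larger model on more data: multi-GPU support, logging of progress and sample images, gradient checkpointing to support larger batch sizes, automatic model upload, and so on.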

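The core API of diffusers is divided into three main components:

Pipelines: high-level classes designed to make it quick and easy to generate samples from popular pretrained diffusion models.
Models: popular network architectures for training new diffusion models, e.g. UNet.
Schedulers: various techniques for generating images from noise during inference, as well as for generating the noisy images needed for training.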

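Training a diffusion model looks roughly like this:

Load some images from the training set
Add noise in varying amounts
Feed the noisy versions of the inputs into the model
Evaluate how well the model denoises these inputs
Use this information to update the model weights, then repeat

1. Load some images from the training set

(1) Call load_dataset() from the Hugging Face datasets library to download the data, register the preprocessing function with dataset.set_transform(), then pass the dataset to torch.utils.data.DataLoader to build an iterable that serves batches.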

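import torch
import torchvision
from datasets import load_dataset
from PIL import Image
from torchvision import transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
image_size = 32  # the tutorial trains on 32x32 images
batch_size = 64  # batch size used in the tutorial; adjust to your GPU memory

# Step 1: build the dataset by downloading a ready-made one from the Hugging Face Hub
dataset = load_dataset("huggan/smithsonian_butterflies_subset", split="train")

# Preprocessing applied on the fly; `preprocess` is the transforms.Compose()
# pipeline shown in the transforms.Compose() note below
def transform(examples):
    images = [preprocess(image.convert("RGB")) for image in examples["image"]]
    return {"images": images}

# Step 2: register the transform and wrap the dataset in a DataLoader
dataset.set_transform(transform)

# Create a dataloader from the dataset to serve up the transformed images in batches
train_dataloader = torch.utils.data.DataLoader(
    dataset, batch_size=batch_size, shuffle=True
)
# Step 3: pull one batch from the DataLoader and look at the first 8 images
xb = next(iter(train_dataloader))["images"].to(device)[:8]
print("X shape:", xb.shape)
# show_images() is a helper defined in the tutorial notebook
show_images(xb).resize((8 * 64, 64), resample=Image.NEAREST)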

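If you are not familiar with PyTorch, here is a quick overview.
A PyTorch data-processing pipeline generally follows three steps:
Reference: "pytorch 数据预处理三剑客" (the three essentials of PyTorch data preprocessing)
dataset = MyDataset()             # Step 1: construct the Dataset object
dataloader = DataLoader(dataset)  # Step 2: wrap it in a DataLoader to get an iterable
num_epochs = 100
for epoch in range(num_epochs):   # Step 3: iterate over the data
    for img, label in dataloader:
        pass  # training code goes here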

[Image: the resulting batch of images]
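To make the three steps concrete without requiring PyTorch, here is a minimal pure-Python stand-in for what a DataLoader does with batch_size and shuffle=True. The iterate_batches helper is hypothetical and written only for illustration; the real torch.utils.data.DataLoader also collates items into tensors, can use worker processes, and reshuffles every epoch.

```python
import random

def iterate_batches(dataset, batch_size, shuffle=True, seed=0):
    """Yield lists of items from `dataset`, batch_size at a time (illustrative stand-in)."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # fixed seed here only for reproducibility
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

# Step 1: anything with __len__ and __getitem__ can play the role of a dataset
toy_dataset = [f"img_{i}" for i in range(10)]
# Steps 2 and 3: build the iterable and consume it
batches = list(iterate_batches(toy_dataset, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

The last batch is smaller because 10 is not a multiple of 4, which is also the default behaviour of a real DataLoader unless drop_last=True is set.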

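(2) transforms.Compose() comes from torchvision.transforms, the image preprocessing package that accompanies PyTorch, and chains several preprocessing steps into one. As in the tutorial code, the image is first resized, then randomly flipped horizontally, then converted to a tensor, and finally normalized.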

# Define data augmentations
preprocess = transforms.Compose(
    [
        transforms.Resize((image_size, image_size)),  # Resize
        transforms.RandomHorizontalFlip(),  # Randomly flip (data augmentation)
        transforms.ToTensor(),  # Convert to tensor (0, 1)
        transforms.Normalize([0.5], [0.5]),  # Map to (-1, 1)
    ]
)

Reference for Compose() and the preprocessing of PIL images: "transforms.Compose()函数" (the transforms.Compose() function).
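The value ranges in the code comments can be checked by hand. The sketch below reproduces, for a single 8-bit pixel value, the arithmetic that ToTensor and Normalize([0.5], [0.5]) perform; the three function names are made up for this illustration and are not part of torchvision.

```python
def to_unit_range(pixel):
    """What ToTensor does to an 8-bit pixel value: scale 0..255 to 0..1."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """What Normalize([0.5], [0.5]) does per channel: (value - mean) / std."""
    return (value - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, useful for turning model outputs back into viewable pixels."""
    return value * std + mean

for pixel in (0, 128, 255):
    print(pixel, normalize(to_unit_range(pixel)))
```

Black (0) maps to -1 and white (255) maps to 1, so the data is centred around zero, the same range as the Gaussian noise that is added later.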

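2. Add noise in varying amounts

Our plan for training is to take these input images, add noise to them, and then feed the noisy images to the model. During inference, we use the model's predictions to iteratively remove the noise. In diffusers, both of these processes are handled by the scheduler.
The noise scheduler determines how much noise is added at each timestep. Here is how to create a scheduler using the default settings for 'DDPM' training and sampling (based on the paper "Denoising Diffusion Probabilistic Models"):

# Get DDPMScheduler from the diffusers library; we use 1000 timesteps, each adding a little noise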
from diffusers import DDPMScheduler
noise_scheduler = DDPMScheduler(num_train_timesteps=1000)

The DDPM paper describes a corruption process that adds a small amount of noise at every timestep. Given $x_{t-1}$ for some timestep, we can get the next (slightly noisier) version $x_t$ with:

$q(\mathbf{x}_t \vert \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \sqrt{1 - \beta_t} \mathbf{x}_{t-1}, \beta_t\mathbf{I}) \quad q(\mathbf{x}_{1:T} \vert \mathbf{x}_0) = \prod^T_{t=1} q(\mathbf{x}_t \vert \mathbf{x}_{t-1})$

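That is, we take $x_{t-1}$, scale it by $\sqrt{1 - \beta_t}$, and add Gaussian noise with variance $\beta_t$ (in other words, noise scaled by $\sqrt{\beta_t}$). The $\beta_t$ values are set for every t by a schedule and determine how much noise is added at each timestep. We do not want to apply this step 500 times to get $x_{500}$, so we use another formula that gives $x_t$ for any t directly from $x_0$: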

$\begin{aligned} q(\mathbf{x}_t \vert \mathbf{x}_0) &= \mathcal{N}(\mathbf{x}_t; \sqrt{\bar{\alpha}_t} \mathbf{x}_0, (1 - \bar{\alpha}_t) \mathbf{I}) \end{aligned}$ where $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$ and $\alpha_i = 1-\beta_i$
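As a sanity check on the closed form, the following sketch applies the single-step corruption repeatedly to a scalar and compares the empirical mean and variance with $\sqrt{\bar{\alpha}_t}\,x_0$ and $1 - \bar{\alpha}_t$. The constant schedule of 20 steps with $\beta = 0.1$ is an arbitrary choice for illustration, not the DDPM default, and the helper names are hypothetical.

```python
import math
import random

def noisy_by_iteration(x0, betas, rng):
    """Apply q(x_t | x_{t-1}) once per beta: scale by sqrt(1 - beta), add N(0, beta) noise."""
    x = x0
    for beta in betas:
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x

def alpha_bar(betas):
    """Cumulative product of alpha_i = 1 - beta_i over the given steps."""
    product = 1.0
    for beta in betas:
        product *= 1.0 - beta
    return product

rng = random.Random(0)
betas = [0.1] * 20  # illustrative constant schedule, not the DDPM default
x0 = 1.0
samples = [noisy_by_iteration(x0, betas, rng) for _ in range(20000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
a_bar = alpha_bar(betas)
print("empirical mean, variance:  ", mean, var)
print("closed-form mean, variance:", math.sqrt(a_bar) * x0, 1.0 - a_bar)
```

The two lines should agree up to sampling error, which is why the scheduler can jump straight to any timestep instead of looping.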

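Math notation always looks scary! Luckily the scheduler handles all of this for us. We can plot $\sqrt{\bar{\alpha}_t}$ (labelled sqrt_alpha_prod) and $\sqrt{(1 - \bar{\alpha}_t)}$ (labelled sqrt_one_minus_alpha_prod) to see how the input (x) and the noise are scaled and mixed across timesteps: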

[Image: plot of sqrt_alpha_prod and sqrt_one_minus_alpha_prod against timestep]
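The two curves can also be computed by hand. This is a minimal sketch that assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps (to my knowledge the DDPMScheduler defaults, but treat the values as an assumption); the function names are made up for the example.

```python
import math

def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced betas; the start and end values are assumed DDPM-style defaults."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def signal_and_noise_scales(betas):
    """Return (sqrt_alpha_prod, sqrt_one_minus_alpha_prod) for every timestep."""
    sqrt_alpha_prod, sqrt_one_minus_alpha_prod = [], []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta  # running product of alpha_i = 1 - beta_i
        sqrt_alpha_prod.append(math.sqrt(alpha_bar))
        sqrt_one_minus_alpha_prod.append(math.sqrt(1.0 - alpha_bar))
    return sqrt_alpha_prod, sqrt_one_minus_alpha_prod

signal, noise = signal_and_noise_scales(linear_betas())
for t in (0, 250, 500, 750, 999):
    print(f"t={t:4d}  signal={signal[t]:.4f}  noise={noise[t]:.4f}")
```

The signal scale starts near 1 and decays towards 0 while the noise scale does the opposite, and the squares of the two always sum to 1.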

timesteps = torch.linspace(0, 999, 8).long().to(device)
noise = torch.randn_like(xb)
noisy_xb = noise_scheduler.add_noise(xb, noise, timesteps)
print("Noisy X shape", noisy_xb.shape)
show_images(noisy_xb).resize((8 * 64, 64), resample=Image.NEAREST)

[Image: the batch after adding noise]

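torch.linspace returns evenly spaced points; reference: torch.linspace

torch.linspace(start, end, steps, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor
Returns a 1-D tensor of steps points evenly spaced over the interval from start to end, both endpoints included.
The length of the output tensor is determined by steps.
Parameters:
start (float) - the starting value of the interval
end (float) - the ending value of the interval
steps (int) - the number of points to generate between start and end
out (Tensor, optional) - the output tensor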
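To see which timesteps the noising example picks, here is a pure-Python equivalent of the spacing rule, written as an illustration rather than as PyTorch's implementation. Truncating to integers mirrors the .long() call in torch.linspace(0, 999, 8).long().

```python
def linspace(start, end, steps):
    """Return `steps` evenly spaced floats from start to end, both endpoints included."""
    if steps == 1:
        return [float(start)]
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]

# Truncate each point to an integer timestep, as .long() does for positive values
timesteps = [int(value) for value in linspace(0, 999, 8)]
print(timesteps)  # [0, 142, 285, 428, 570, 713, 856, 999]
```

So the eight images in the batch are noised at eight levels spread from almost clean (t=0) to almost pure noise (t=999).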

torch.randn_like()

torch.randn_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) -> Tensor
Returns a tensor with the same size as input, filled with random numbers from the standard normal distribution (mean 0, variance 1). That is,
torch.randn_like(input) is equivalent to torch.randn(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)

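3. Feed the noisy versions of the inputs into the model

Now we come to the core component: the model itself.
Most diffusion models use an architecture that is some variant of the U-net (https://arxiv.org/abs/1505.04597), and that is what we use here.
In a nutshell:
- The input image passes through several blocks of ResNet layers, each of which halves the image size.
- It then passes through the same number of blocks that upsample it again.
- Skip connections link the features on the downsampling path to the corresponding layers on the upsampling path.
A key feature of this model is that the output image has the same size as the input, which is exactly what we need here.
Diffusers provides a handy UNet2DModel class that creates this architecture in PyTorch.
Let's create a U-net for our target image size.
Note that down_block_types corresponds to the downsampling blocks (green in the diagram) and up_block_types to the upsampling blocks (red in the diagram):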

[Image: U-net architecture diagram]

# Define the model
from diffusers import UNet2DModel

# Create a model
model = UNet2DModel(
    sample_size=image_size,  # the target image resolution
    in_channels=3,  # the number of input channels, 3 for RGB images
    out_channels=3,  # the number of output channels
    layers_per_block=2,  # how many ResNet layers to use per UNet block
    block_out_channels=(64, 128, 128, 256),  # More channels -> more parameters
    down_block_types=(
        "DownBlock2D",  # a regular ResNet downsampling block
        "DownBlock2D",
        "AttnDownBlock2D",  # a ResNet downsampling block with spatial self-attention
        "AttnDownBlock2D",
    ),
    up_block_types=(
        "AttnUpBlock2D",
        "AttnUpBlock2D",  # a ResNet upsampling block with spatial self-attention
        "UpBlock2D",
        "UpBlock2D",  # a regular ResNet upsampling block
    ),
)
model.to(device);
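To get a feel for how sample_size and block_out_channels shape the feature maps on the way down, the sketch below tracks the tensor shape after each down block. It rests on an assumption about the diffusers implementation as I understand it: every down block except the final one ends with a 2x downsample. The helper is hypothetical; printing the real model or inspecting intermediate outputs is the reliable way to confirm the numbers.

```python
def down_path_shapes(sample_size, block_out_channels):
    """Return (channels, height, width) after each down block.

    Assumption: every block except the last one ends with a 2x downsample.
    """
    shapes = []
    size = sample_size
    last = len(block_out_channels) - 1
    for index, channels in enumerate(block_out_channels):
        if index != last:
            size //= 2  # halve the spatial resolution
        shapes.append((channels, size, size))
    return shapes

for shape in down_path_shapes(32, (64, 128, 128, 256)):
    print(shape)
```

Under that assumption a 32x32 input shrinks to 4x4 at the bottleneck while the channel count grows from 64 to 256, and the up blocks reverse the process so the output is 32x32 again.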

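4. Evaluate how well the model denoises these inputs, use this information to update the model weights, and repeat

Below is a typical PyTorch optimization loop: we feed in the data batch by batch and update the model parameters step by step with an optimizer, in this case AdamW with a learning rate of 0.0004.
For each batch of data, we:
Sample some random timesteps
Add noise to the data according to the schedule
Feed the noisy data through the model
Compare the model prediction with the target (here, the noise that was added) using MSE as the loss function
Update the model parameters via loss.backward() and optimizer.step()
Along the way we record the loss values for plotting later.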
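The same five steps can be seen in a dependency-free toy, sketched here under heavy simplifications: the "dataset" is the single scalar 2.0, there is one fixed noise level ($\bar{\alpha} = 0.5$) instead of random timesteps, the "model" is a line w * noisy_x + b, and plain SGD replaces AdamW. None of this comes from the tutorial; it only illustrates the noise-prediction objective.

```python
import math
import random

rng = random.Random(0)
alpha_bar = 0.5   # fixed noise level for this toy (a single "timestep")
x0 = 2.0          # toy dataset: every clean sample equals 2.0
w, b = 0.0, 0.0   # "model": predicted_noise = w * noisy_x + b
lr, batch_size = 0.05, 32
losses = []

for step in range(2000):
    grad_w = grad_b = loss = 0.0
    for _ in range(batch_size):
        noise = rng.gauss(0.0, 1.0)  # sample the noise
        noisy_x = math.sqrt(alpha_bar) * x0 + math.sqrt(1 - alpha_bar) * noise  # add it
        error = (w * noisy_x + b) - noise  # prediction minus target (the noise)
        loss += error ** 2 / batch_size    # mean squared error
        grad_w += 2 * error * noisy_x / batch_size
        grad_b += 2 * error / batch_size
    w -= lr * grad_w  # gradient step (plain SGD)
    b -= lr * grad_b
    losses.append(loss)

print(f"w={w:.3f}, b={b:.3f}, first loss={losses[0]:.3f}, last loss={losses[-1]:.2e}")
```

For this toy the exact answer is known: noise = noisy_x / sqrt(0.5) - 2, so w should approach sqrt(2) and b should approach -2. A real UNet learns the analogous mapping for images and for every timestep.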

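NB: This code takes about 10 minutes to run. You can also skip the next two blocks and use a pretrained model instead. If you like, explore how much reducing the number of channels in the model layers speeds things up.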

# Set the noise scheduler
noise_scheduler = DDPMScheduler(
    num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2"
)

# Training loop
import torch.nn.functional as F  # provides F.mse_loss used below

optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)

losses = []

for epoch in range(30):
    for step, batch in enumerate(train_dataloader):
        clean_images = batch["images"].to(device)
        # Sample noise to add to the images
        noise = torch.randn(clean_images.shape).to(clean_images.device)
        bs = clean_images.shape[0]

        # Sample a random timestep for each image
        timesteps = torch.randint(
            0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device
        ).long()

        # Add noise to the clean images according to the noise magnitude at each timestep
        noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

        # Get the model prediction
        noise_pred = model(noisy_images, timesteps, return_dict=False)[0]

        # Calculate the loss
        loss = F.mse_loss(noise_pred, noise)
        loss.backward()  # backpropagate; passing the loss as an argument would rescale the gradients
        losses.append(loss.item())

        # Update the model parameters with the optimizer
        optimizer.step()
        optimizer.zero_grad()

    if (epoch + 1) % 5 == 0:
        loss_last_epoch = sum(losses[-len(train_dataloader) :]) / len(train_dataloader)
        print(f"Epoch:{epoch+1}, loss: {loss_last_epoch}")
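The training code above switches the scheduler to beta_schedule="squaredcos_cap_v2". As far as I know, this name refers to the squared-cosine schedule from Nichol and Dhariwal's "Improved Denoising Diffusion Probabilistic Models", with each beta capped at 0.999. The sketch below follows that formula; the function name and the default offset s=0.008 are assumptions taken from the paper rather than read from the diffusers source.

```python
import math

def cosine_betas(num_steps=1000, max_beta=0.999, s=0.008):
    """Betas derived from a squared-cosine alpha_bar curve, capped at max_beta."""
    def alpha_bar(fraction):
        # Fraction of the way through the process (0 = clean, 1 = pure noise)
        return math.cos((fraction + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for i in range(num_steps):
        t1, t2 = i / num_steps, (i + 1) / num_steps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas

betas = cosine_betas()
print(f"first beta={betas[0]:.6f}, middle beta={betas[500]:.6f}, last beta={betas[-1]:.3f}")
```

Compared with a linear schedule, this one adds noise more gently in the early timesteps, which tends to help with small images such as the 32x32 butterflies used here.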


5. Generating images requires a pipeline

Option 1: Create a pipeline

from diffusers import DDPMPipeline
image_pipe = DDPMPipeline(unet=model, scheduler=noise_scheduler)
pipeline_output = image_pipe()
pipeline_output.images[0]

[Image: the generated sample]

Save the pipeline to a local folder

image_pipe.save_pretrained("my_pipeline")

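Option 2: Write a sampling loop

Starting from random noise, we run through the scheduler's timesteps from the noisiest to the least noisy, removing a small amount of noise at each step based on the model's prediction.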

# Random starting point (8 random images):
sample = torch.randn(8, 3, 32, 32).to(device)

for i, t in enumerate(noise_scheduler.timesteps):

    # Get model pred
    with torch.no_grad():
        residual = model(sample, t).sample

    # Update sample with step
    sample = noise_scheduler.step(residual, t, sample).prev_sample

show_images(sample)
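What noise_scheduler.step() does at each iteration can be sketched for a scalar with the sampling rule from the DDPM paper (Algorithm 2). This is not the diffusers implementation, which is organised differently; the linear schedule values, the toy dataset (every clean sample equals 2.0), and the ideal_predictor function standing in for a trained model are all assumptions made for the illustration.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced betas (assumed DDPM-style values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def sample(predict_noise, betas, rng):
    """Scalar DDPM ancestral sampling: start from pure noise, denoise step by step."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)

    x = rng.gauss(0.0, 1.0)                # random starting point
    for t in reversed(range(len(betas))):  # from the noisiest step to the cleanest
        beta, a_bar = betas[t], alpha_bars[t]
        predicted = predict_noise(x, a_bar)
        # Remove the predicted noise contribution and undo the step's scaling
        x = (x - beta / math.sqrt(1.0 - a_bar) * predicted) / math.sqrt(1.0 - beta)
        if t > 0:                          # no fresh noise on the final step
            x += math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x

TARGET = 2.0  # toy "dataset": every clean sample equals 2.0

def ideal_predictor(x_t, a_bar):
    """Exact noise for the toy dataset, standing in for a trained model."""
    return (x_t - math.sqrt(a_bar) * TARGET) / math.sqrt(1.0 - a_bar)

result = sample(ideal_predictor, linear_betas(1000), random.Random(0))
print(result)
```

With a perfect noise predictor the loop lands on the data value regardless of the random starting point; with a trained UNet the same loop produces images that resemble the training set.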