【Diffusers Library】Part 1: Quick Start

Robin C

Category: AIGC  Tags: AI art, AIGC, stable diffusion

Article link: https://blog.csdn.net/qq_38423499/article/details/136527457

A few words first

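This is the AI assistant for consumer decisions that we developed. We will keep improving it, and you are welcome to try it and send feedback. Scan the QR code with WeChat to add it.
Official link: https://ailab.smzdm.com/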

************************************************************** Divider *******************************************************************

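I have recently been working on some AIGC projects, so I inevitably ran into the Diffusers library. When I searched online, though, I found relatively little material, so I am writing up a summary. On the one hand, it deepens my own understanding of the library; on the other, I want to share it with everyone. If you spot any problems, feel free to leave a comment. Many thanks!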

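First, here is the official link: https://huggingface.co/docs/diffusers/index. This is the official documentation, which has become more and more comprehensive with each release, although from mainland China you may need a VPN to reach it.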

Diffusers overview

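For image generation, diffusers covers roughly the following five kinds of tasks:
[Figure: the five image-generation task types supported by diffusers]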

Installing Diffusers

pip install --upgrade diffusers accelerate transformers

After installation, open a Python shell and run import diffusers (for example, print diffusers.__version__) to check that everything works.

Diffusers quick start (simple and practical)

1. Instantiating a pipeline

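DiffusionPipeline is the easiest way to run inference with a pretrained diffusion system. It is an end-to-end system that bundles a model and a scheduler, and you can use DiffusionPipeline directly for many tasks.
When you instantiate a model with DiffusionPipeline, if the specified model has not been downloaded yet, it is downloaded automatically when the command runs (downloads take a while, and you will likely need a VPN). The example below downloads the "runwayml/stable-diffusion-v1-5" model. Note that the runwayml/stable-diffusion-v1-5 repository has since been removed from the Hugging Face Hub; the same weights are mirrored at "stable-diffusion-v1-5/stable-diffusion-v1-5", which you can substitute in the examples below.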

from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
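To make "an end-to-end system that bundles a model and a scheduler" concrete, here is a toy sketch in plain Python. ToyModel, ToyScheduler and ToyPipeline are hypothetical classes made up for illustration, not diffusers APIs: the "model" simply predicts half of the current value as noise and the "scheduler" subtracts it. Only the structure is the point: the pipeline owns both components and loops over the scheduler's timesteps, asking the model for a prediction and the scheduler for the next sample.

```python
import random


class ToyModel:
    """Hypothetical stand-in for a denoising network: 'predicts' half the current value as noise."""

    def __call__(self, sample, t):
        return [0.5 * x for x in sample]


class ToyScheduler:
    """Hypothetical stand-in for a scheduler: owns the timesteps and the update rule, but no weights."""

    def __init__(self, num_steps=10):
        self.timesteps = list(range(num_steps - 1, -1, -1))  # e.g. 9, 8, ..., 0

    def step(self, noise_pred, t, sample):
        # Remove the predicted noise to get a slightly cleaner sample
        return [x - n for x, n in zip(sample, noise_pred)]


class ToyPipeline:
    """Bundles a model and a scheduler into one callable, the way a diffusion pipeline does."""

    def __init__(self, model, scheduler):
        self.model = model
        self.scheduler = scheduler

    def __call__(self, size=4, seed=0):
        rng = random.Random(seed)
        sample = [rng.gauss(0.0, 1.0) for _ in range(size)]  # start from Gaussian noise
        for t in self.scheduler.timesteps:
            noise_pred = self.model(sample, t)
            sample = self.scheduler.step(noise_pred, t, sample)
        return sample


pipe = ToyPipeline(ToyModel(), ToyScheduler(num_steps=10))
result = pipe(size=4, seed=0)
print(result)
```

Because each toy step halves the sample, ten steps shrink the starting noise by a factor of 1024; a real model instead predicts noise that moves the sample toward a plausible image.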

By default, models are downloaded to:

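# macOS
/Users/<username>/.cache/huggingface/hub

# Linux (when running as root; for other users it is /home/<username>/.cache/huggingface/hub)
/root/.cache/huggingface/hub

# Windows
# I haven't tried it, but it should likewise be .cache/huggingface/hub under the user's home directory, i.e. C:\Users\<username>\.cache\huggingface\hub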

Downloaded model folders follow a fixed naming format: models--<author>--<model name>.
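The folder name can be derived from the repository id. The two helpers below are a hypothetical sketch, not functions from diffusers or huggingface_hub. The cache-root lookup is a simplified assumption (the HF_HUB_CACHE environment variable first, then HF_HOME/hub, then ~/.cache/huggingface/hub); huggingface_hub's real resolution logic handles more cases.

```python
import os
from pathlib import Path


def hub_cache_root() -> Path:
    # Simplified lookup: HF_HUB_CACHE wins, then $HF_HOME/hub, then ~/.cache/huggingface/hub
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    if "HF_HOME" in os.environ:
        return Path(os.environ["HF_HOME"]) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def model_folder_name(repo_id: str) -> str:
    # "author/model-name" -> "models--author--model-name"
    return "models--" + repo_id.replace("/", "--")


repo_id = "runwayml/stable-diffusion-v1-5"
print(model_folder_name(repo_id))
print(hub_cache_root() / model_folder_name(repo_id))
```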
Evaluating the instantiated object shows the components it contains:

# Enter:
pipeline
# Output:
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.13.1",
  ...,
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  ...,
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}

Also, if you want to load the model from a different path, download it to that path in advance:

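# In a shell: clone the repository (the weights are stored with Git LFS)
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5

# In Python: load from the local clone
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
# Alternatively, load directly from a snapshot folder in the local cache
model_path = "/Users/<username>/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/1d0c4ebf6ff58a5caecab40fa1406526bca4b5b9"
pipeline = DiffusionPipeline.from_pretrained(model_path)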
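Instead of copying the commit hash by hand, you can look it up: in the cache layout, the file refs/main holds the commit hash of the downloaded revision, and snapshots/<hash> holds the files. The function below is a hypothetical helper that sketches this lookup with the standard library only, assuming the revision has already been downloaded; the demo builds a fake cache in a temporary folder so it runs without network access. (huggingface_hub's snapshot_download(..., local_files_only=True) should return the same kind of path.)

```python
import tempfile
from pathlib import Path


def local_snapshot_path(cache_root: Path, repo_id: str, revision: str = "main") -> Path:
    """Hypothetical helper: find the cached snapshot folder of repo_id at a given revision."""
    repo_dir = Path(cache_root) / ("models--" + repo_id.replace("/", "--"))
    ref_file = repo_dir / "refs" / revision
    if not ref_file.is_file():
        raise FileNotFoundError(f"revision {revision!r} of {repo_id} is not cached in {cache_root}")
    commit_hash = ref_file.read_text().strip()  # refs/<revision> contains the commit hash
    return repo_dir / "snapshots" / commit_hash


# Demo: build a fake cache layout in a temporary directory and resolve it
with tempfile.TemporaryDirectory() as tmp:
    repo_dir = Path(tmp) / "models--runwayml--stable-diffusion-v1-5"
    (repo_dir / "refs").mkdir(parents=True)
    (repo_dir / "refs" / "main").write_text("1d0c4ebf6ff58a5caecab40fa1406526bca4b5b9")
    print(local_snapshot_path(Path(tmp), "runwayml/stable-diffusion-v1-5"))
```

The returned path can then be passed to DiffusionPipeline.from_pretrained as model_path.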

2. Switching the compute device

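Running this pipeline on a GPU is strongly recommended, because the model has roughly 1.4 billion parameters. Just as in PyTorch, you can move the pipeline to the GPU: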

pipeline.to("cuda")
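A quick back-of-the-envelope calculation shows why the device matters. This sketch assumes the roughly 1.4 billion parameters quoted above and counts only the weights (4 bytes per parameter in float32, 2 in float16); activations and framework overhead come on top, so real memory use is higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    # Memory needed for the weights alone, in GiB (1 GiB = 1024**3 bytes)
    return num_params * bytes_per_param / 1024**3


num_params = 1.4e9  # approximate parameter count quoted above
for dtype, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype}: {weight_memory_gib(num_params, nbytes):.2f} GiB")
```

That is about 5.2 GiB for the weights in float32 and 2.6 GiB in float16, which is why GPU users often load the pipeline with torch_dtype=torch.float16.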

3. Generating and saving images

Now you can pass a text prompt to the pipeline to generate an image and get back the denoised result. By default, the output image is returned as a PIL.Image object.

# Generate an image
image = pipeline("An image of a squirrel in Picasso style").images[0]
image  # in a Jupyter notebook, this line displays the image
# Save the image
image.save("image_of_squirrel_painting.png")


4. Schedulers (key point)

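A scheduler is essentially what the Stable Diffusion web UI calls a sampler.
Different schedulers strike different trade-offs between denoising speed and quality. Recently, several fast schedulers have also been released, such as LCM.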
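Part of the speed/quality trade-off comes from how many of the training timesteps a scheduler actually visits at inference time. The sketch below is a simplified, assumed version of evenly spaced ("leading") spacing in plain Python, not the code of any particular diffusers scheduler; real schedulers differ in spacing and offsets. Fewer steps mean fewer UNet calls and therefore faster generation, at the risk of lower quality unless the scheduler is designed for few steps.

```python
def inference_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list[int]:
    """Pick evenly spaced timesteps, ordered from noisiest to cleanest (simplified 'leading' spacing)."""
    if not 0 < num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in [1, num_train_timesteps]")
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio for i in range(num_inference_steps)][::-1]


for steps in (1000, 50, 4):
    ts = inference_timesteps(1000, steps)
    print(f"{steps:4d} steps -> {len(ts)} UNet calls, first timesteps: {ts[:3]}")
```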

Note: unlike models, schedulers have no trainable weights; they are parameter-free.

The pipeline's default scheduler is PNDMScheduler. To switch to a different one, for example EulerDiscreteScheduler:

from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# Swap in EulerDiscreteScheduler, built from the current scheduler's config
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)

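In the quick-start tutorial, you instantiate a DDPMScheduler with its from_pretrained() method. In everyday use you don't need to do this; just load the pipeline and the scheduler comes with it. It is done here only to make the image-generation process of a diffusion model easier to follow: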

from diffusers import DDPMScheduler

repo_id = "google/ddpm-cat-256"
scheduler = DDPMScheduler.from_pretrained(repo_id)
scheduler
# Output:
DDPMScheduler {
  "_class_name": "DDPMScheduler",
  "_diffusers_version": "0.13.1",
  "beta_end": 0.02,
  "beta_schedule": "linear",
  "beta_start": 0.0001,
  "clip_sample": true,
  "clip_sample_range": 1.0,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "trained_betas": null,
  "variance_type": "fixed_small"
}

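num_train_timesteps: the length of the denoising process, in other words, the number of timesteps needed to turn random Gaussian noise into a data sample.
beta_schedule: the type of noise schedule used for inference and training.
beta_start and beta_end: the starting and ending noise values of the schedule.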
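With the values from the config above, the linear schedule can be written out directly: the betas are spaced linearly from beta_start to beta_end over num_train_timesteps steps, alpha = 1 - beta, and the running product of the alphas (alpha_bar) says how much of the original signal's variance survives at each step. This is a plain-Python sketch of the standard DDPM linear schedule, not diffusers' internal code.

```python
def linear_beta_schedule(beta_start: float, beta_end: float, num_train_timesteps: int) -> list[float]:
    # Evenly spaced betas from beta_start to beta_end (both inclusive), like a linspace
    n = num_train_timesteps
    return [beta_start + (beta_end - beta_start) * i / (n - 1) for i in range(n)]


def alphas_cumprod(betas: list[float]) -> list[float]:
    # Running product of alpha_t = 1 - beta_t: the fraction of signal variance left at each step
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


betas = linear_beta_schedule(0.0001, 0.02, 1000)  # values from the DDPMScheduler config above
abar = alphas_cumprod(betas)
print(f"beta: {betas[0]:.4f} -> {betas[-1]:.4f}")
print(f"signal left (alpha_bar): t=0 {abar[0]:.4f}, t=499 {abar[499]:.4f}, t=999 {abar[-1]:.6f}")
```

By the last step almost no signal is left, which is why generation can start from pure Gaussian noise.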

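import torch
from diffusers import DDPMScheduler, UNet2DModel

repo_id = "google/ddpm-cat-256"
model = UNet2DModel.from_pretrained(repo_id)
scheduler = DDPMScheduler.from_pretrained(repo_id)

torch.manual_seed(0)  # fix the seed so the starting noise is reproducible
noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
with torch.no_grad():
    noisy_residual = model(sample=noisy_sample, timestep=2).sample  # the UNet predicts the noise residual
less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample  # the scheduler computes x_{t-1}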

5. Models

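When a diffusion model generates an image, it runs the noise-diffusion process in reverse: starting from random noise, at each step the model predicts the noise in the current sample and the scheduler removes part of it.

To put it in an intuitive analogy: scatter beans of many colors on a sheet of white paper, then judge each bean. If it is one you want, predict True; if not, predict False. Each new scatter builds on the previous round's predictions, and after a few dozen rounds of scattering and predicting, a picture emerges.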
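The analogy can be made a little more precise. In DDPM-style models, the forward (noising) process mixes a clean sample x0 with Gaussian noise eps as x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, and the network is trained to predict eps. The plain-Python sketch below uses a made-up four-value "image" and an arbitrary alpha_bar of 0.3 to show that a perfect noise prediction would recover x0 exactly; a real model only approximates eps, which is why generation proceeds gradually over many steps.

```python
import math
import random


def add_noise(x0, noise, alpha_bar):
    # Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * n for x, n in zip(x0, noise)]


def predict_x0(xt, noise_pred, alpha_bar):
    # Invert the forward formula using a noise prediction
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * n) / a for x, n in zip(xt, noise_pred)]


rng = random.Random(0)
x0 = [-1.0, -0.5, 0.5, 1.0]                # a tiny made-up "image" in [-1, 1]
noise = [rng.gauss(0.0, 1.0) for _ in x0]  # eps ~ N(0, 1)
alpha_bar = 0.3                            # illustrative amount of remaining signal

xt = add_noise(x0, noise, alpha_bar)
x0_hat = predict_x0(xt, noise, alpha_bar)  # pretend the model predicted the noise perfectly
print([round(v, 6) for v in x0_hat])
```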

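Models are initialized with the from_pretrained() method, which also caches the model weights locally so that loading is faster next time.
For the quick start, you load a UNet2DModel (not commonly used in practice), a basic unconditional image-generation model, with a checkpoint trained on cat images: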

from diffusers import UNet2DModel

repo_id = "google/ddpm-cat-256"
model = UNet2DModel.from_pretrained(repo_id)
model.config

Output:

FrozenDict([('sample_size', 256), ('in_channels', 3), ('out_channels', 3), ('center_input_sample', False), ('time_embedding_type', 'positional'), ('freq_shift', 1), ('flip_sin_to_cos', False), ('down_block_types', ['DownBlock2D', 'DownBlock2D', 'DownBlock2D', 'DownBlock2D', 'AttnDownBlock2D', 'DownBlock2D']), ('up_block_types', ['UpBlock2D', 'AttnUpBlock2D', 'UpBlock2D', 'UpBlock2D', 'UpBlock2D', 'UpBlock2D']), ('block_out_channels', [128, 128, 256, 256, 512, 512]), ('layers_per_block', 2), ('mid_block_scale_factor', 1), ('downsample_padding', 0), ('downsample_type', 'conv'), ('upsample_type', 'conv'), ('dropout', 0.0), ('act_fn', 'silu'), ('attention_head_dim', None), ('norm_num_groups', 32), ('attn_norm_num_groups', None), ('norm_eps', 1e-06), ('resnet_time_scale_shift', 'default'), ('add_attention', True), ('class_embed_type', None), ('num_class_embeds', None), ('num_train_timesteps', None), ('_use_default_values', ['upsample_type', 'num_class_embeds', 'class_embed_type', 'resnet_time_scale_shift', 'dropout', 'num_train_timesteps', 'add_attention', 'attn_norm_num_groups', 'downsample_type']), ('_class_name', 'UNet2DModel'), ('_diffusers_version', '0.0.4'), ('_name_or_path', 'google/ddpm-cat-256')])

As shown above, calling model.config returns the model's configuration parameters.

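The model config is a 🧊 frozen 🧊 dictionary, meaning these parameters cannot be changed after the model is created. This is deliberate: it ensures that the parameters used to define the model architecture at the start stay the same, while other parameters can still be adjusted during inference.
Some of the most important parameters: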

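sample_size: the height and width of the input sample.
in_channels: the number of channels of the input sample.
down_block_types and up_block_types: the types of the downsampling and upsampling blocks used to build the U-Net architecture.
block_out_channels: the number of output channels of the downsampling blocks; also used, in reverse order, for the number of input channels of the upsampling blocks.
layers_per_block: the number of ResNet blocks in each U-Net block.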
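These parameters determine the shape of the network. Assuming, as in UNet2DModel, that every down block except the last halves the spatial size, the config printed above (sample_size 256, six down blocks) gives the resolutions computed below. down_block_resolutions is a hypothetical helper that only does the arithmetic; it does not build the model.

```python
def down_block_resolutions(sample_size: int, num_down_blocks: int) -> list[int]:
    """Spatial size each down block works at; every block except the last halves it afterwards."""
    sizes, size = [], sample_size
    for i in range(num_down_blocks):
        sizes.append(size)
        if i < num_down_blocks - 1:
            size //= 2
    return sizes


# Values copied from the google/ddpm-cat-256 config printed above
down_block_types = ["DownBlock2D", "DownBlock2D", "DownBlock2D", "DownBlock2D", "AttnDownBlock2D", "DownBlock2D"]
block_out_channels = [128, 128, 256, 256, 512, 512]

sizes = down_block_resolutions(256, len(down_block_types))
for block, size, channels in zip(down_block_types, sizes, block_out_channels):
    print(f"{block:16s} {size:3d}x{size:<3d} -> {channels} output channels")
print("up-block channels (reversed):", block_out_channels[::-1])
```

Under this assumption the single attention block runs at 16x16, where attention over all positions is still cheap.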

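As mentioned earlier, since prediction starts from noise, we should fix a random seed before predicting, so that the initial noise (and therefore the generated image) is reproducible. The complete script below generates a cat step by step: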

import numpy as np
import PIL.Image
import torch
import tqdm
from diffusers import DDPMScheduler, UNet2DModel

# Load the model
model_id = "google/ddpm-cat-256"
model = UNet2DModel.from_pretrained(model_id)
print(model.config)

# Load the scheduler from the same repository (its config holds the noise schedule)
scheduler = DDPMScheduler.from_pretrained(model_id)


# Convert a sample in [-1, 1] to an image and save it
def display_sample(sample, i):
    image_processed = sample.cpu().permute(0, 2, 3, 1)  # NCHW -> NHWC
    image_processed = (image_processed.clamp(-1.0, 1.0) + 1.0) * 127.5  # [-1, 1] -> [0, 255]
    image_processed = image_processed.numpy().astype(np.uint8)

    image_pil = PIL.Image.fromarray(image_processed[0])
    # In a notebook you could instead use: display(f"Image at step {i}"); display(image_pil)
    image_pil.save(f"step_{i}.png")


# Move computation to the GPU
device = "cuda:0"
model.to(device)

# Fix the random seed, then sample the initial Gaussian noise
torch.manual_seed(0)
noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
sample = noisy_sample.to(device)

# Generate a cat, one denoising step per timestep
for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
    # 1. predict noise residual
    with torch.no_grad():
        residual = model(sample, t).sample

    # 2. compute less noisy image and set x_t -> x_t-1
    sample = scheduler.step(residual, t, sample).prev_sample

    # 3. optionally look at image
    if (i + 1) % 50 == 0:
        display_sample(sample, i + 1)
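The conversion inside display_sample maps model outputs from [-1, 1] to 8-bit pixel values via (x + 1) * 127.5. Below is the same mapping for plain Python floats, with clamping added as an assumed safeguard: without it, values that stray slightly outside [-1, 1] would not saturate at 0 or 255 when cast to uint8 and could wrap around instead.

```python
def to_uint8_pixel(x: float) -> int:
    # Clamp to the model's output range, then map [-1, 1] -> [0, 255]
    x = max(-1.0, min(1.0, x))
    return int((x + 1.0) * 127.5)  # int() truncates, like astype(np.uint8) for in-range values


print([to_uint8_pixel(v) for v in (-1.2, -1.0, 0.0, 0.5, 1.0, 1.3)])
```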