Getting Started with Diffusers: Hands-On Practice (Lots of Fun)

Latest recommended article published 2025-03-11 21:41:34

组学之心

Article link: https://blog.csdn.net/weixin_56751316/article/details/140068914

For future updates, follow the WeChat official account 组学之心.

Diffusers in practice

Repository: https://github.com/huggingface/diffusers

1. Environment Setup

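Run the following command to install the required packages (torch, torchvision and matplotlib are also used by the code below):

pip install -qq -U diffusers datasets transformers accelerate ftfy pyarrow huggingface_hub torch torchvision matplotlib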

Then go to https://huggingface.co/settings/tokens, create a Hugging Face access token with "write" permission, and note it down.

I use PyCharm, so I log in from the PyCharm terminal by running: huggingface-cli login

Paste the write token, then answer Y at the prompt; the login should succeed.

For other ways to log in, see: https://huggingface.co/docs/hub/models-adding-libraries

Next, install Git LFS, which is used later to upload model checkpoints: https://git-lfs.com/

1.1 DreamBooth

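DreamBooth is a technique for fine-tuning a Stable Diffusion model so that it picks up extra information such as a specific face, object, or style. The sd-dreambooth-library organization on Hugging Face collects text-to-image models fine-tuned this way: we can call them with the diffusers package, or clone a project to build our own backend product. At the time of writing there were 249 assorted models to play with~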

https://huggingface.co/sd-dreambooth-library

More walkthrough videos can be found here: https://www.youtube.com/watch?v=tgRiZzwSdXg

I picked one of the most popular projects to try: disco-diffusion-style

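import os
import torch
import matplotlib.pyplot as plt
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
out_dir = "00zuxuezhixin/diffusers_practice"
os.makedirs(out_dir, exist_ok=True)  # make sure the output folder exists

# Load the pipeline; this downloads the fine-tuned model weights
pipeline = DiffusionPipeline.from_pretrained("sd-dreambooth-library/disco-diffusion-style").to(device)
# The prompt describing the image you want in this style
prompt = "A cyberpunk-style building"
image = pipeline(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

plt.imshow(image)
plt.axis('off')
plt.savefig(f"{out_dir}/disco.png")
plt.show()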

The result:
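About the `guidance_scale=7.5` argument above: Stable Diffusion pipelines use classifier-free guidance, running the UNet once with the prompt and once without it, then pushing the noise prediction further in the direction the prompt points. The sketch below shows only that combination rule on plain Python lists; `apply_cfg` is a hypothetical helper, not a diffusers API, and in the real pipeline the same arithmetic is applied to tensors at every denoising step.

```python
def apply_cfg(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: uncond + scale * (text - uncond), element-wise.

    Hypothetical helper for illustration; the two lists stand in for the UNet's
    noise predictions without and with the text prompt.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_text)]

uncond = [0.0, 0.2, -0.1]
text = [0.1, 0.2, 0.3]

# guidance_scale = 1.0 just returns the text-conditioned prediction
print(apply_cfg(uncond, text, 1.0))
# Larger scales exaggerate the difference the prompt makes
print(apply_cfg(uncond, text, 7.5))
```

Higher values typically make the image follow the prompt more closely, usually at some cost in diversity.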

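1.2 Core Diffusers APIs

Diffusers has three main parts:

Pipelines: high-level classes designed to make deployment and common tasks easy; they let you quickly generate samples with popular pretrained diffusion models.
Models: the network architectures, such as a UNet, used when training a new diffusion model.
Schedulers: during inference they implement various techniques for generating images from noise; they can also produce the noisy images needed during training.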

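A simplified training procedure for a diffusion model:

1. Load images from the training set

2. Add noise at different levels

3. Feed the noised images into the model

4. Evaluate how well the model denoises these inputs

5. Use that information to update the model weights, then repeat.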
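Steps 2 to 5 can be written compactly for the DDPM setup used later in this post, assuming a scheduler with per-step noise levels $\beta_t$ and a model $\epsilon_\theta$ that predicts the added noise. With $\alpha_t = 1-\beta_t$ and $\bar\alpha_t = \prod_{s\le t}\alpha_s$, the noising step and the training loss are:

```latex
x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\qquad t \sim \mathrm{Uniform}\{0, \dots, T-1\}

\mathcal{L}(\theta) = \mathbb{E}_{x_0,\,t,\,\epsilon}\left[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\right]
```

Here $\epsilon_\theta$ is the UNet; up to a constant factor (a mean instead of a sum), this is the objective the training loop in section 2.4 minimizes with `F.mse_loss(noise_pred, noise)`.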

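2. Hands-on: Generating Images

2.1 Downloading the Dataset

Here we use a dataset of 1000 butterfly images from the Hugging Face Hub. After resizing, each image is 32×32 pixels, which is small and good for practice.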

import os
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from PIL import Image
from datasets import load_dataset
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
out_dir = "00zuxuezhixin/diffusers_practice"
os.makedirs(out_dir, exist_ok=True)  # make sure the output folder exists

#####################------------------ Download the data
dataset = load_dataset("huggan/smithsonian_butterflies_subset", split="train")
image_size = 32
batch_size = 64

# Preprocessing and data augmentation
preprocess = transforms.Compose(
    [
        transforms.Resize((image_size, image_size)),  # resize to 32×32
        transforms.RandomHorizontalFlip(),  # random horizontal flip
        transforms.ToTensor(),  # convert integer pixel values in [0, 255] to floats in [0, 1]
        transforms.Normalize([0.5], [0.5]),  # normalize: map [0, 1] to [-1, 1]
    ]
)

def transform(examples):
    images = [preprocess(image.convert("RGB")) for image in examples["image"]]
    return {"images": images}

dataset.set_transform(transform)
# Wrap the preprocessed dataset in a DataLoader
train_dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)

#####################------------------ Take a look at some images
def show_images(xb):
    # Arrange the batch in a grid; normalize=True rescales values to [0, 1] for display
    grid = torchvision.utils.make_grid(xb, nrow=8, padding=2, normalize=True)
    npimg = grid.cpu().numpy()
    return np.transpose(npimg, (1, 2, 0))  # CHW -> HWC for matplotlib/PIL

# Get one batch from the data loader and keep the first 8 images
xb = next(iter(train_dataloader))["images"].to(device)[:8]
print("X shape:", xb.shape)
img_grid = show_images(xb)
resized_img = Image.fromarray((img_grid * 255).astype('uint8')).resize((8 * 64, 64), resample=Image.NEAREST)

# Plot
plt.figure(figsize=(16, 2))
plt.imshow(resized_img)
plt.axis('off')
plt.savefig(f"{out_dir}/butterflies.png")
plt.show()
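The `Normalize([0.5], [0.5])` step computes `(x - 0.5) / 0.5` per pixel, so values in [0, 1] land in [-1, 1], the range in which noise is added. To view images again you would invert it with `x * 0.5 + 0.5`; in `show_images` this is instead handled by `make_grid(..., normalize=True)`, which min-max rescales the grid. A plain-Python sketch of the forward and inverse maps (the function names are illustrative only):

```python
def normalize(x, mean=0.5, std=0.5):
    # What transforms.Normalize([0.5], [0.5]) does to each pixel value
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    # Inverse map, back to [0, 1] for display
    return y * std + mean

pixels = [0.0, 0.25, 0.5, 1.0]            # values after ToTensor(), in [0, 1]
scaled = [normalize(p) for p in pixels]    # now in [-1, 1]
restored = [denormalize(s) for s in scaled]
print(scaled)    # [-1.0, -0.5, 0.0, 1.0]
print(restored)  # [0.0, 0.25, 0.5, 1.0]
```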

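2.2 The Noise Scheduler

During training we take images, add noise to them, and feed the noisy images to the model. In the reverse (sampling) process, the model's predictions are used to remove the noise step by step. Both processes are handled by the scheduler.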

from diffusers import DDPMScheduler
noise_scheduler = DDPMScheduler(num_train_timesteps=1000)  # initialize a DDPM scheduler
timesteps = torch.linspace(0, 999, 8).long().to(device)  # 8 evenly spaced timesteps from 0 to 999
noise = torch.randn_like(xb)  # random noise with the same shape as xb
noisy_xb = noise_scheduler.add_noise(xb, noise, timesteps)  # add noise to each image in xb at its timestep

# Convert the images into a displayable grid
img_grid = show_images(noisy_xb)
# Rescale to [0, 255] and enlarge for display
resized_img = Image.fromarray((img_grid * 255).astype('uint8')).resize((8 * 64, 64), resample=Image.NEAREST)

plt.figure(figsize=(16, 2))
plt.imshow(resized_img)
plt.axis('off')
plt.savefig("00zuxuezhixin/diffusers_practice/butterflies_noise.png")
plt.show()
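`add_noise` implements the noising formula x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. By default `DDPMScheduler(num_train_timesteps=1000)` uses a linear beta schedule (`beta_start=0.0001`, `beta_end=0.02`). The plain-Python sketch below rebuilds that schedule to show how much of the original image survives at the eight timesteps visualized above; it mirrors the scheduler's arithmetic for a single pixel value and is not the library code itself.

```python
import math

def linear_alphas_cumprod(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linearly spaced betas, mirroring DDPMScheduler's default "linear" schedule
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas_cumprod, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta  # alpha_bar_t = product of (1 - beta_s) for s <= t
        alphas_cumprod.append(prod)
    return alphas_cumprod

def add_noise(x0, eps, t, alphas_cumprod):
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, for one pixel value
    a = alphas_cumprod[t]
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * eps

alphas_cumprod = linear_alphas_cumprod()
# The 8 timesteps given by torch.linspace(0, 999, 8).long(): 0, 142, ..., 999
timesteps = [int(999 * k / 7) for k in range(8)]
for t in timesteps:
    signal = math.sqrt(alphas_cumprod[t])
    noise = math.sqrt(1.0 - alphas_cumprod[t])
    print(f"t={t:3d}  signal weight={signal:.3f}  noise weight={noise:.3f}")

# Example: a pixel value of 0.8 with a noise sample of 0.3 at t=500
print(add_noise(0.8, 0.3, 500, alphas_cumprod))
```

By t=999 the signal weight is below 0.01, which is why the last image in the grid looks like pure noise.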

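Curious readers can experiment further with different scheduler parameters and see how they affect the results:

# Adds only a small amount of noise (small betas)
noise_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_start=0.001, beta_end=0.004)
# The cosine schedule may work better for small images
noise_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule='squaredcos_cap_v2')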
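To get a feel for these options, the sketch below rebuilds three schedules in plain Python and compares ᾱ_t, the fraction of signal variance left at step t. The `squaredcos_cap_v2` part derives each beta from the squared-cosine curve ᾱ(t) = cos²(((t/T + 0.008) / 1.008) · π/2) and caps it at 0.999; treat it as an approximation of what diffusers does rather than a copy of its code.

```python
import math

def cumprod(betas):
    # alpha_bar_t = product of (1 - beta_s) for s <= t
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def linear_betas(T, beta_start, beta_end):
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def cosine_betas(T, max_beta=0.999):
    # Betas derived from a squared-cosine alpha_bar curve, capped at max_beta
    def alpha_bar(t):
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
    return [min(1 - alpha_bar((i + 1) / T) / alpha_bar(i / T), max_beta) for i in range(T)]

T = 1000
schedules = {
    "linear (default)": cumprod(linear_betas(T, 1e-4, 0.02)),
    "small betas": cumprod(linear_betas(T, 0.001, 0.004)),
    "squaredcos_cap_v2": cumprod(cosine_betas(T)),
}
for name, ab in schedules.items():
    print(f"{name:18s} t=250: {ab[250]:.3f}  t=500: {ab[500]:.3f}  t=999: {ab[999]:.5f}")
```

With the default linear schedule most of the signal is already gone by the middle of the chain, while the cosine schedule destroys information more gradually. The small-beta setting never gets close to pure noise, so a model trained with it would likely struggle when sampling starts from pure noise.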

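2.3 Defining the Diffusion Model

As introduced in the previous post, a UNet-style model is a good choice here: it takes a noisy image and outputs a tensor of the same shape.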

from diffusers import UNet2DModel

model = UNet2DModel(
    sample_size=image_size,  # target image resolution (32)
    in_channels=3,  # number of input channels; 3 for RGB images
    out_channels=3,  # number of output channels
    layers_per_block=2,  # number of ResNet layers per UNet block
    block_out_channels=(64, 128, 128, 256),  # output channels of each UNet block
    down_block_types=(
        "DownBlock2D",  # a regular ResNet downsampling block
        "DownBlock2D",
        "AttnDownBlock2D",  # a ResNet downsampling block with spatial self-attention
        "AttnDownBlock2D",
    ),
    up_block_types=(
        "AttnUpBlock2D",
        "AttnUpBlock2D",  # a ResNet upsampling block with spatial self-attention
        "UpBlock2D",
        "UpBlock2D",  # a regular ResNet upsampling block
    ),
)
model.to(device)

with torch.no_grad():
    model_prediction = model(noisy_xb, timesteps).sample
print(model_prediction.shape)  # torch.Size([8, 3, 32, 32])
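Why does the output shape match the input? Each down block except the last halves the spatial size, and the up path mirrors it back. A small sketch of that bookkeeping for `block_out_channels=(64, 128, 128, 256)` and `sample_size=32`, assuming (as in diffusers' `UNet2DModel`) that only the final down block skips downsampling; `unet_resolutions` is a hypothetical helper, not part of diffusers:

```python
def unet_resolutions(sample_size, block_out_channels):
    """Spatial size and channel count at each down block (hypothetical helper)."""
    sizes, size = [], sample_size
    for i, channels in enumerate(block_out_channels):
        sizes.append((size, channels))
        is_final_block = i == len(block_out_channels) - 1
        if not is_final_block:
            # A downsampling layer halves height and width; odd sizes would not round-trip
            if size % 2:
                raise ValueError(f"size {size} cannot be halved evenly")
            size //= 2
    return sizes

print(unet_resolutions(32, (64, 128, 128, 256)))
# [(32, 64), (16, 128), (8, 128), (4, 256)]
```

So `sample_size` should be divisible by 2 ** (len(block_out_channels) - 1) = 8; a size like 36 would likely break the skip connections between the down and up paths.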

2.4 Writing the Training Loop

import torch.nn.functional as F

# Set up the noise scheduler (cosine schedule)
noise_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2")
# Optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)
# Per-step losses
losses = []

model.train()
# Training loop
for epoch in range(20):
    epoch_losses = []
    for step, batch in enumerate(train_dataloader):
        clean_images = batch["images"].to(device)

        # Sample random Gaussian noise
        noise = torch.randn(clean_images.shape, device=clean_images.device)
        bs = clean_images.shape[0]

        # Sample a random timestep for each image
        timesteps = torch.randint(
            0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device
        ).long()

        # Add noise to the clean images according to each timestep's noise level
        with torch.no_grad():
            noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

        # Get the model's noise prediction and compare it with the true noise
        noise_pred = model(noisy_images, timesteps, return_dict=False)[0]
        loss = F.mse_loss(noise_pred, noise)

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Record this step's loss
        epoch_losses.append(loss.item())

    # Append this epoch's step losses to the overall list
    losses.extend(epoch_losses)

    if (epoch + 1) % 5 == 0:
        loss_last_epoch = sum(epoch_losses) / len(epoch_losses)
        print(f"Epoch:{epoch+1}, loss: {loss_last_epoch}")
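How many points end up in `losses`? With the 1000-image dataset and `batch_size=64`, the DataLoader yields 15 full batches plus one final batch of 40 images (the partial batch is kept because `drop_last` defaults to False), so each epoch is 16 optimizer steps and 20 epochs give 320 steps, the length of the x-axis in the loss plot. A quick check:

```python
import math

num_images = 1000  # size of huggan/smithsonian_butterflies_subset (train split)
batch_size = 64
epochs = 20

steps_per_epoch = math.ceil(num_images / batch_size)  # DataLoader keeps the last partial batch
last_batch = num_images - (steps_per_epoch - 1) * batch_size
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_steps)  # 16 40 320
```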

Plot the loss curve:

plt.figure(figsize=(10, 5))
plt.plot(losses, label='Training Loss')
plt.xlabel('Step')
plt.ylabel('Loss')
plt.title('Training Loss Over Time')
plt.legend()
plt.grid(True)
plt.savefig("00zuxuezhixin/diffusers_practice/butterflies_loss.png")
plt.show()

2.5 Generating Images with the Trained Model

Method 1: build a pipeline

from diffusers import DDPMPipeline
image_pipe = DDPMPipeline(unet=model, scheduler=noise_scheduler)
pipeline_output = image_pipe()  # runs the full 1000-step DDPM sampling loop by default
generated_image = pipeline_output.images[0]  # the generated PIL image

plt.figure(figsize=(2, 2))
plt.imshow(generated_image)
plt.axis('off')
plt.savefig("00zuxuezhixin/diffusers_practice/butterflies_generated_image.png")
plt.show()

# Save the pipeline (model_index.json plus scheduler/ and unet/ subfolders)
image_pipe.save_pretrained("00zuxuezhixin/diffusers_practice/butterflies_my_pipeline")

The generated image:

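The saved pipeline files: besides model_index.json, the scheduler and unet subfolders contain all the components needed to generate images. You can upload them to the Hugging Face Hub to share with others, or do the upload through the API, as in the code in section 3.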
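Because `model_index.json` names each component and `from_pretrained` loads each one from the subfolder of the same name, a saved or uploaded pipeline only loads if that layout is preserved. A hypothetical helper (not part of diffusers) that checks a local pipeline folder using only the standard library:

```python
import json
import os
import tempfile

def missing_components(pipeline_dir):
    """Hypothetical helper: list components named in model_index.json that have no subfolder."""
    with open(os.path.join(pipeline_dir, "model_index.json")) as f:
        index = json.load(f)
    # Keys starting with "_" (e.g. "_class_name", "_diffusers_version") are metadata, not components
    components = [name for name in index if not name.startswith("_")]
    return [name for name in components if not os.path.isdir(os.path.join(pipeline_dir, name))]

# Demo on a throwaway folder that mimics a DDPMPipeline layout with the unet/ folder missing
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "model_index.json"), "w") as f:
        json.dump({"_class_name": "DDPMPipeline",
                   "scheduler": ["diffusers", "DDPMScheduler"],
                   "unet": ["diffusers", "UNet2DModel"]}, f)
    os.mkdir(os.path.join(tmp, "scheduler"))
    demo_missing = missing_components(tmp)
    print(demo_missing)  # ['unet']
```

On the real folder, `missing_components("00zuxuezhixin/diffusers_practice/butterflies_my_pipeline")` should return an empty list after `save_pretrained`.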

Method 2: write the sampling loop yourself

# Start from random noise samples
sample = torch.randn(8, 3, 32, 32).to(device)
model.eval()
noise_scheduler.set_timesteps(1000)  # use all 1000 training timesteps, from 999 down to 0
# Denoising loop
for i, t in enumerate(noise_scheduler.timesteps):
    with torch.no_grad():
        residual = model(sample, t).sample  # the model's noise prediction
    sample = noise_scheduler.step(residual, t, sample).prev_sample

# Convert the generated images into a displayable grid
img_grid = show_images(sample)
plt.figure(figsize=(16, 8))
plt.imshow(img_grid)
plt.axis('off')
plt.savefig("00zuxuezhixin/diffusers_practice/butterflies_generated_image2.png")
plt.show()
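What does `noise_scheduler.step` do with the model's noise prediction? For the default DDPM settings it first estimates the clean image x̂_0 from the predicted noise, then samples x_{t-1} from a Gaussian whose mean mixes x̂_0 and x_t. The plain-Python sketch below reproduces that update for a single value under two stated simplifications: it leaves out the scheduler's clipping of x̂_0 to [-1, 1], and it replaces the UNet with a hypothetical "oracle" that knows the true clean value, so the loop should land on it.

```python
import math
import random

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # default linear schedule
alphas_cumprod, prod = [], 1.0
for b in betas:
    prod *= 1.0 - b
    alphas_cumprod.append(prod)

def ddpm_step(eps_pred, t, x_t, rng):
    ab_t = alphas_cumprod[t]
    ab_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
    alpha_t = ab_t / ab_prev
    beta_t = 1.0 - alpha_t
    # 1) Estimate the clean sample from the predicted noise
    x0_hat = (x_t - math.sqrt(1 - ab_t) * eps_pred) / math.sqrt(ab_t)
    # 2) Mean of the posterior q(x_{t-1} | x_t, x0_hat)
    mean = (math.sqrt(ab_prev) * beta_t / (1 - ab_t)) * x0_hat \
        + (math.sqrt(alpha_t) * (1 - ab_prev) / (1 - ab_t)) * x_t
    if t == 0:
        return mean  # no noise is added at the final step
    variance = (1 - ab_prev) / (1 - ab_t) * beta_t
    return mean + math.sqrt(variance) * rng.gauss(0.0, 1.0)

def oracle_eps(x_t, t, x0_true=0.5):
    # Hypothetical perfect denoiser: returns the exact noise for a known clean value
    ab_t = alphas_cumprod[t]
    return (x_t - math.sqrt(ab_t) * x0_true) / math.sqrt(1 - ab_t)

rng = random.Random(0)
x = rng.gauss(0.0, 1.0)  # start from pure noise, like torch.randn
for t in reversed(range(T)):  # 999, 998, ..., 0
    x = ddpm_step(oracle_eps(x, t), t, x, rng)
print(x)  # ~0.5
```

With a trained UNet instead of the oracle, x̂_0 is only an estimate at each step and gets refined as t decreases.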

3. Uploading the Model to the Hugging Face Hub

from huggingface_hub import get_full_repo_name
from huggingface_hub import HfApi, create_repo
from huggingface_hub import ModelCard

model_name = "zuxuezhixin-sd-class-butterflies-32"
hub_model_id = get_full_repo_name(model_name)  # "<your-username>/<model_name>"
print(hub_model_id)

## Create a model repository on Hugging Face and upload the saved pipeline
create_repo(hub_model_id, exist_ok=True)
api = HfApi()
# Keep the scheduler/ and unet/ subfolders: model_index.json expects them,
# so DDPMPipeline.from_pretrained can find each component
api.upload_folder(
    folder_path="00zuxuezhixin/diffusers_practice/butterflies_my_pipeline/scheduler",
    path_in_repo="scheduler",
    repo_id=hub_model_id,
)
api.upload_folder(
    folder_path="00zuxuezhixin/diffusers_practice/butterflies_my_pipeline/unet",
    path_in_repo="unet",
    repo_id=hub_model_id,
)
api.upload_file(
    path_or_fileobj="00zuxuezhixin/diffusers_practice/butterflies_my_pipeline/model_index.json",
    path_in_repo="model_index.json",
    repo_id=hub_model_id,
)


## Create a model card on Hugging Face that describes this model
content = f"""
---
license: mit
tags:
- pytorch
- diffusers
- unconditional-image-generation
- diffusion-models-class
---

This model is a diffusion model for unconditional image generation of butterflies.

## Usage

```python
from diffusers import DDPMPipeline

pipeline = DDPMPipeline.from_pretrained('{hub_model_id}')
image = pipeline().images[0]
image
```"""

card = ModelCard(content)
card.push_to_hub(hub_model_id)