The Latest Stable Diffusion Code Guide for 2024: Bookmark It Now!

高级绘画师PP

Last modified 2024-09-04 17:10:58

Tags: stable diffusion, artificial intelligence, AI art

First published 2024-09-04 09:20:09

Article link: https://blog.csdn.net/m0_71746299/article/details/141884034

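Stable Diffusion is a Latent Diffusion [1] model developed by researchers from the Machine Vision and Learning group (CompVis) at LMU Munich.

With support from EleutherAI and LAION, Stability AI, CompVis, and Runway jointly released the trained model to the public in late August 2022. For more information, see the official blog post.

Since the public release, the community has done incredible work to make Stable Diffusion faster, more memory-efficient, and more performant.

Diffusers provides a simple API for running Stable Diffusion that incorporates these memory, compute, and quality improvements.

This notebook walks through the improvements one by one so that you can make the best use of StableDiffusionPipeline for inference.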

Prompt Engineering

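When running Stable Diffusion in inference, we usually want to generate an image of a certain type or style and then improve on it. Improving a previously generated image means running inference over and over with a different prompt, and possibly a different seed, until we are happy with the result. So the first priority is to make Stable Diffusion as fast as possible, so that we can generate as many images as possible in a given amount of time. This can be done by improving both computational efficiency (speed) and memory efficiency (GPU RAM).

Let's look at computational efficiency first.

Throughout the notebook, we focus on runwayml/stable-diffusion-v1-5: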

model_id = "runwayml/stable-diffusion-v1-5"

Let's load the pipeline.

Speed Optimization

import torch
from diffusers import StableDiffusionPipeline

# Download (or load from the local cache) the pretrained weights.
pipe = StableDiffusionPipeline.from_pretrained(model_id)

Our goal is to generate a beautiful photo of an old warrior chief. Later we will try to find the best prompt for such a photo; for now, let's keep the prompt simple:

prompt = "portrait photo of a old warrior chief"

First, we should make sure inference runs on a GPU, so let's move the pipeline to the GPU, just as we would with any PyTorch module.

pipe = pipe.to("cuda")

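To generate an image, call the pipeline itself, which invokes the StableDiffusionPipeline.__call__ method.

To make sure we can reproduce essentially the same image on every call, let's use a generator. For more information, see the reproducibility section of the Diffusers documentation.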

generator = torch.Generator("cuda").manual_seed(0)
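To make the idea of a seeded generator concrete, here is a minimal sketch. It uses Python's standard `random.Random` purely as a stand-in for `torch.Generator`, an assumption made so that the example runs without a GPU. The helper name `sample_noise` is hypothetical and is not part of Diffusers.

```python
import random

def sample_noise(seed, n=4):
    """Draw n pseudo-random values from a generator seeded with `seed`."""
    # Stand-in for torch.Generator(...).manual_seed(seed).
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

first = sample_noise(0)
second = sample_noise(0)
other = sample_noise(1)

print(first == second)  # same seed: identical starting noise
print(first == other)   # different seed: different starting noise
```

The same principle is what lets a fixed seed reproduce (almost) the same image across pipeline calls.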

Now, let's take a look at the result.

image = pipe(prompt, generator=generator).images[0]
image

Cool. This took about 30 seconds on a T4 GPU (you may see faster inference if your allocated GPU is better than a T4).
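If you want to measure such timings yourself, a small wall-clock helper is usually enough. The sketch below is one possible way to do it; the name `timed` is hypothetical, and the workload is a stand-in so that the snippet runs anywhere.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload; in the notebook this could be
# `timed(pipe, prompt, generator=generator)`.
result, seconds = timed(sum, range(1_000_000))
print(f"took {seconds:.3f} s")
```

Timing the first call also includes one-off warm-up costs, so it can be worth measuring a second call as well.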

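The call above used full float32 precision by default and ran the default number of inference steps (50). The easiest speed-ups are switching to float16 (half) precision and running fewer inference steps. Let's first reload the model in float16.

import torch

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")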
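Besides speed, half precision also halves the memory needed to hold the weights, because a float32 value takes 4 bytes and a float16 value takes 2. The sketch below shows the arithmetic; the parameter count is a hypothetical round number chosen for illustration, not the actual size of this model.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def weights_size_gib(num_params, dtype):
    """Approximate memory needed to hold num_params weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

# Hypothetical parameter count, chosen only for illustration.
num_params = 1_000_000_000

fp32 = weights_size_gib(num_params, "float32")
fp16 = weights_size_gib(num_params, "float16")
print(f"float32: {fp32:.2f} GiB, float16: {fp16:.2f} GiB")
```

Activations and other buffers add to this, so treat the numbers as a lower bound on real GPU memory usage.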

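We can now call the pipeline again to generate an image.

generator = torch.Generator("cuda").manual_seed(0)

image = pipe(prompt, generator=generator).images[0]
image

Cool, that is almost three times faster for essentially the same image quality.

We strongly recommend always running the pipeline in float16, as we have so far very rarely seen quality degrade because of it.

Next, let's see whether we really need 50 inference steps or whether fewer would do. The number of inference steps is tied to the denoising scheduler we use. Choosing a more efficient scheduler can help reduce the number of steps.

Let's look at all the schedulers that the Stable Diffusion pipeline is compatible with.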

pipe.scheduler.compatibles
    [diffusers.schedulers.scheduling_dpmsolver_singlestep.DPMSolverSinglestepScheduler,
     diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler,
     diffusers.schedulers.scheduling_heun_discrete.HeunDiscreteScheduler,
     diffusers.schedulers.scheduling_pndm.PNDMScheduler,
     diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler,
     diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler,
     diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler,
     diffusers.schedulers.scheduling_ddpm.DDPMScheduler,
     diffusers.schedulers.scheduling_ddim.DDIMScheduler]

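Cool, that's a lot of schedulers.
Diffusers keeps adding new schedulers/samplers that can be used with Stable Diffusion. For more information, we recommend the official documentation.

Right now Stable Diffusion is using the PNDMScheduler, which usually requires around 50 inference steps. However, other schedulers such as DPMSolverMultistepScheduler or DPMSolverSinglestepScheduler seem to get by with just 20 to 25 inference steps. Let's try it.

You can set a new scheduler with the from_config function.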

from diffusers import DPMSolverMultistepScheduler

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

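Now, let's try reducing the number of inference steps to 20.

generator = torch.Generator("cuda").manual_seed(0)

image = pipe(prompt, generator=generator, num_inference_steps=20).images[0]
image

The image now does look a little different, but arguably its quality is just as high. Meanwhile, inference time is down to just 4 seconds.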
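The timings quoted so far fit together as simple arithmetic. The sketch below assumes that runtime scales linearly with the number of inference steps, which is an approximation rather than a guarantee, and it rounds "almost three times faster" to a factor of 3.

```python
baseline_s = 30.0     # float32, 50 steps on a T4 (figure quoted in the text)
fp16_speedup = 3.0    # "almost three times faster" in float16, rounded
default_steps = 50
reduced_steps = 20

fp16_s = baseline_s / fp16_speedup
# Assumption: time grows linearly with the number of denoising steps.
estimated_s = fp16_s * reduced_steps / default_steps

print(f"float16, 50 steps: ~{fp16_s:.0f} s")
print(f"float16, 20 steps: ~{estimated_s:.0f} s")
```

The estimate lands on roughly 4 seconds, in line with the measured time reported for 20 steps.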

Memory Optimization

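Using less memory during generation indirectly means more speed, since we are often trying to maximize the number of images generated per second. Usually, the more images per inference run, the more images per second as well.

The easiest way to find out how many images we can generate at once is simply to try it and see when we hit an "Out-of-Memory (OOM)" error.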
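The trial-and-error search can be automated. The following is a minimal sketch under stated assumptions: `find_max_batch_size` and `fake_run_batch` are hypothetical names, and the fake runner raises `MemoryError` so that the example runs without a GPU. With PyTorch you would instead catch the CUDA out-of-memory error, which is raised as a `RuntimeError` (or a subclass of it).

```python
def find_max_batch_size(run_batch, start=1, limit=64):
    """Double the batch size until run_batch fails; return the last size that worked.

    run_batch is a hypothetical callable that raises MemoryError when a batch
    does not fit. Returns 0 if even `start` fails.
    """
    best = 0
    size = start
    while size <= limit:
        try:
            run_batch(size)
        except MemoryError:
            break
        best = size
        size *= 2
    return best

def fake_run_batch(batch_size):
    # Stand-in for a real pipeline call with `batch_size` prompts;
    # pretends that anything above 4 images exhausts GPU memory.
    if batch_size > 4:
        raise MemoryError("simulated out-of-memory")

print(find_max_batch_size(fake_run_batch))  # prints 4
```

Doubling keeps the number of trial runs small; a finer search between the last success and the first failure is possible if the exact limit matters.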

We can run batched inference simply by passing in a list of prompts and a list of generators.

Let's define a quick function that builds a batch for us.

def get_inputs(batch_size=1):
    generator = [torch.Generator("cuda").manual_seed(i) for i in range(batch_size)]
    prompts = batch_size * [prompt]
    num_inference_steps = 20

    return {"prompt": prompts, "generator": generator, "num_inference_steps": num_inference_steps}

This function returns a list of prompts and a list of generators, so we can reuse the generator that produced a result we like.

We also need a way to easily display a batch of images.

from PIL import Image

def image_grid(imgs, rows=2, cols=2):
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid
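The only non-obvious part of image_grid is the paste position. The sketch below isolates that index arithmetic in a hypothetical helper, `grid_boxes`, so it can be checked without any images.

```python
def grid_boxes(n_images, cols, w, h):
    """Return the (x, y) paste position of each image, filling rows left to right."""
    return [(i % cols * w, i // cols * h) for i in range(n_images)]

# Four 512x512 images in a 2x2 grid.
boxes = grid_boxes(4, cols=2, w=512, h=512)
print(boxes)  # [(0, 0), (512, 0), (0, 512), (512, 512)]
```

The column index is `i % cols` and the row index is `i // cols`, so the number of images should equal `rows * cols` for the grid to be filled exactly.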

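Cool, let's see how much memory we can use, starting with batch_size=4.

images = pipe(**get_inputs(batch_size=4)).images
image_grid(images)

A batch_size larger than 4 will error out in this notebook (assuming we are running on a T4 GPU). We can also see that throughput is only slightly better than before: 3.75 s/image compared to 4 s/image.

However, the community has found some nice tricks to ease the memory constraints further. After Stable Diffusion was released, the community found improvements within days and shared them freely on GitHub, open source at its finest! We believe the original idea came from a GitHub thread.

By far the most memory is taken up by the cross-attention layers. Instead of running this operation in one batch, it can be run sequentially to save a significant amount of memory.
This is easily enabled by calling enable_attention_slicing, as follows: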

pipe.enable_attention_slicing()
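To build intuition for why slicing is safe, here is a toy sketch in pure Python. It is not the Diffusers implementation; all function names are hypothetical. It only illustrates that computing attention a few heads at a time gives the same result as computing all heads in one go, while needing the intermediate score matrices of just one slice at a time.

```python
import math

def attention(q, k, v):
    """Scaled dot-product attention for one head; q, k, v are lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * vj[d] for w, vj in zip(weights, v))
            for d in range(len(v[0]))
        ])
    return out

def attention_all_at_once(heads):
    # Compute every head's output in a single step.
    return [attention(q, k, v) for q, k, v in heads]

def attention_sliced(heads, slice_size=1):
    # Process `slice_size` heads at a time, one slice after another.
    out = []
    for start in range(0, len(heads), slice_size):
        for q, k, v in heads[start:start + slice_size]:
            out.append(attention(q, k, v))
    return out

# Two toy heads, each with 2 tokens of dimension 2.
heads = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]),
    ([[0.5, 0.5], [1.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]),
]
same = attention_all_at_once(heads) == attention_sliced(heads)
print(same)
```

The trade-off is that the slices run one after another, which is why slicing saves memory but does not by itself make generation faster.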

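Great, now that attention slicing is enabled, let's try doubling the batch size again, to batch_size=8:

images = pipe(**get_inputs(batch_size=8)).images
image_grid(images, rows=2, cols=4)

Nice, it works. However, the speed gain is again not very large (though it may be more significant on other GPUs).

Each image takes roughly 3.5 seconds to generate, which is probably about as fast as we can get on a simple T4 without sacrificing quality.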
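To compare the three configurations on one scale, the per-image times quoted in the text can be converted into throughput. The dictionary labels below are descriptive names chosen for this sketch.

```python
# Seconds per image quoted in the text for each configuration on a T4.
seconds_per_image = {
    "batch_size=1": 4.0,
    "batch_size=4": 3.75,
    "batch_size=8 + attention slicing": 3.5,
}

images_per_minute = {name: 60.0 / s for name, s in seconds_per_image.items()}
for name, rate in images_per_minute.items():
    print(f"{name}: {rate:.1f} images/minute")
```

Going from 15 to about 17 images per minute confirms that batching helps only modestly on this GPU.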

Next, let's look at how to improve the quality!

Quality improvements

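Now that our image generation pipeline is very fast, let's try to get the highest possible image quality.

First of all, image quality is highly subjective, so it is hard to make general statements here.

The most obvious way to improve quality is to use better checkpoints. Many improved versions have been released since Stable Diffusion first came out.

A newer version does not necessarily mean better image quality with the same parameters. People have mentioned that 2.0 is slightly worse than 1.5 for certain prompts, but that with the right prompt engineering 2.0 and 2.1 seem to be better.

Overall, we strongly recommend experimenting with the models and reading advice online. For example, it has been shown that negative prompts are very important for getting the best possible quality out of 2.0 and 2.1; there is a nice blog post on this.

In addition, the community has started fine-tuning many of the versions above towards particular styles, and some of these fine-tunes are of extremely high quality and have gained a lot of traction.

We recommend browsing the diffusers checkpoints sorted by downloads and trying out different ones.

For simplicity, we will stick with v1.5 in what follows.

Next, we can also try to optimize individual components of the pipeline, for example by swapping out the latent decoder. For more details on how the whole Stable Diffusion pipeline works, see this blog post [2].
Let's load the latest autoencoder from stabilityai.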

from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to("cuda")

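Now we can use it by setting it as the pipeline's vae.

pipe.vae = vae

Let's run the same prompt as before to compare quality.

images = pipe(**get_inputs(batch_size=8)).images
image_grid(images, rows=2, cols=4)

The difference looks small, but the new generations are arguably a bit sharper.

Cool, finally, let's look at prompt engineering.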

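Our goal was to generate a photo of an old warrior chief. Let's now try to bring more color into the photos and make them look more impressive.

Originally our prompt was "portrait photo of a old warrior chief".

To improve the prompt, it often helps to add cues that could have been used online to save high-quality photos, as well as more details.

Essentially, when doing prompt engineering, one has to think:

How was the photo I want, or a similar one, probably stored on the internet?
What additional details can I give to steer the model towards the style I want?

Cool, let's add more details.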

prompt += ", tribal panther make up, blue on red, side profile, looking away, serious eyes"

Let's also add some cues that usually help generate higher-quality images.

prompt += " 50mm portrait photography, hard rim lighting photography--beta --ar 2:3  --beta --upbeta"
prompt
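Appending fragments with `+=` works, but it is easy to lose a comma or a space along the way. As an alternative, the sketch below assembles a prompt from separate lists of details and style cues. The helper `build_prompt` and the variable `prompt_example` are hypothetical names, and the fragments are a subset of those used above.

```python
def build_prompt(subject, details=(), style_tags=()):
    """Join a subject with comma-separated detail and style fragments."""
    parts = [subject, *details, *style_tags]
    return ", ".join(part.strip() for part in parts if part.strip())

prompt_example = build_prompt(
    "portrait photo of a old warrior chief",
    details=["tribal panther make up", "blue on red", "side profile"],
    style_tags=["50mm portrait photography", "hard rim lighting photography"],
)
print(prompt_example)
```

Keeping details and style cues in separate lists makes it easier to try them out one at a time.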

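Cool, let's now try this prompt.

images = pipe(**get_inputs(batch_size=8)).images
image_grid(images, rows=2, cols=4)

Pretty impressive! We got some very high-quality generations. The second image is my personal favorite, so I'll reuse its seed and see whether I can tweak the prompt slightly by swapping "old" for "oldest", "old", "" (nothing), and "young".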

prompts = [
    "portrait photo of the oldest warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3  --beta --upbeta",
    "portrait photo of a old warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3  --beta --upbeta",
    "portrait photo of a warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3  --beta --upbeta",
    "portrait photo of a young warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3  --beta --upbeta",
]

generator = [torch.Generator("cuda").manual_seed(1) for _ in range(len(prompts))]  # 1 because we want the 2nd image

images = pipe(prompt=prompts, generator=generator, num_inference_steps=25).images
image_grid(images)
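Writing the four prompts out by hand invites copy-paste slips. A minimal sketch of generating such variants from a template follows; the shortened template text and the names `template`, `subjects`, `variants`, and `seeds` are assumptions made for illustration.

```python
template = "portrait photo of {subject} warrior chief, side profile, looking away"
subjects = ["the oldest", "a old", "a", "a young"]

variants = [template.format(subject=s) for s in subjects]
seeds = [1] * len(variants)  # same seed for every variant

for seed, text in zip(seeds, variants):
    print(seed, text)
```

Holding the seed fixed while changing a single word makes it easier to attribute differences between the images to the prompt rather than to the starting noise.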

The first image looks nice! The eye movement changed slightly and looks good. This concludes our 101 guide on how to use Stable Diffusion.