Stunning: Creating Art Portraits of Your Pet with Stable Diffusion and Dreambooth (Part 1)

Article link: https://blog.csdn.net/weixin_39915649/article/details/131552203

This post shows how to use Stable Diffusion and Dreambooth to create portraits of a pet dog.

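Introduction

When I first started playing with Stable Diffusion text-to-image generation in August 2022, my first reaction was, "OMG! I need to make art prints for my art wall!" My face fell almost immediately, though, because vanilla Stable Diffusion is quite hard to control. Reproducing a specific subject takes extra strategies and techniques, none of which existed at the time.

In the months that followed, several community projects emerged that aim to give AI artists full creative control over the visual output they are trying to produce. One such technique is LoRA (Low-Rank Adaptation). I explored LoRA in earlier posts on making self-portraits with Stable Diffusion and on blending custom artist styles.

An even more popular technique is Dreambooth, which is the focus of the rest of this post. I'll walk through the entire workflow for bringing a Stable Diffusion image to life as a high-quality framed art print: creating the artwork with Dreambooth, Stable Diffusion, outpainting, inpainting, and upscaling; preparing it for print in Photoshop; and finally printing it on fine art paper with an Epson XP-15000 printer.

The rest of this post covers these steps in detail.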

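What is Dreambooth?

Dreambooth is a fine-tuning technique for text-to-image diffusion models. In practice, this means you can fine-tune an already capable open-source Stable Diffusion model so that it reliably generates stylistically consistent images of a subject you define.

High-level diagram of how Dreambooth works.

If you're into this kind of thing, I highly recommend reading the Dreambooth paper, available at https://arxiv.org/abs/2208.12242. Some sections are technical, but the paper also includes many image examples that help build intuition for what is possible. I found the paper very inspiring; it is what prompted me to make this art project and write this post, perhaps mostly because it contains so many example photos of dogs. A few images from the paper are included below.

"It's like a photo booth, but once the subject is captured, it can be synthesized wherever your dreams take you..." (https://dreambooth.github.io/)

Input: 5 photos of a dog. Output: endless images. The dataset lives here: https://github.com/google/dreambooth/tree/main/dataset/dog5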

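How to train your own Dreambooth model with Replicate

For this project and post, I'll train a Dreambooth model on photos of my best friend, 🧀Queso.

Queso is a very photogenic and adorable English Cream Golden Retriever. He is the best boy ever, which makes him the perfect subject for training a custom Dreambooth model!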

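Building an image training set

The first thing you need when training a custom Dreambooth model is a "high quality" training set of images. I put "high quality" in quotes because I have seen fairly good results from less-than-ideal images. Still, the usual practice is to pick several images of the subject in different poses, environments, and lighting conditions. The more variety you have (in pose, environment, and lighting), the more versatile and flexible the fine-tuned Dreambooth model will be.

In the paper, the authors train Dreambooth models with 3-5 photos, but in the community it is common to use more. So I collected 40 photos of Queso in a variety of poses, lighting, and environments.

I chose to cut out the image backgrounds because some of the photos were taken in very similar environments, and in early tests those background elements started showing up in the generated images. This step is entirely optional; I don't recommend it unless you run into the same problem. With the Object Selection tool in Photoshop it goes quickly: select Queso, invert the selection, and delete the background.

Once I had all the images, I created a .zip file and uploaded it to S3, where it can be referenced by URL. This matters because the next step passes the URL of this zip file to the Dreambooth training job.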
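As a rough illustration of the packaging step, here is a minimal Python sketch that collects the images in a folder into a single .zip using only the standard library. The function name build_training_zip, the list of accepted file suffixes, and the choice of a flat archive (no subfolders) are assumptions made for this example, not requirements stated by Replicate.

```python
import zipfile
from pathlib import Path

# Assumed set of image extensions; adjust to match your own photos.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def build_training_zip(image_dir, zip_path):
    """Pack every image in image_dir into a flat .zip; return the archived names."""
    image_dir = Path(image_dir)
    images = sorted(
        p for p in image_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image in images:
            # arcname keeps the archive flat (no parent folders inside the zip)
            archive.write(image, arcname=image.name)
    return [image.name for image in images]
```

The resulting file can then be uploaded with whatever tool you prefer (the S3 web console, for example), as long as the training job can fetch it by URL.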

Here is an image grid of Queso's training set.

40 photos of Queso, backgrounds removed.

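Running Dreambooth training on Replicate

For Dreambooth training, I chose Replicate. Replicate works well for a project like this because it minimizes the pain of wrangling cloud GPUs and installing and configuring everything by hand. You just send an HTTP request, without having to think about GPUs or about terminating instances when you're done. Replicate has a semi-documented Dreambooth training API, which is described in a post on the Replicate blog.

If you're feeling adventurous and want to dig deeper, I recommend trying @TheLastBen's fast-stable-diffusion Google Colab notebooks. The set includes notebooks for training Dreambooth models and for quickly launching the Automatic1111 Stable Diffusion web UI.

Following the Replicate blog post that documents Dreambooth training, I put together a quick one-off bash script with a few hard-coded inputs.

Here is the queso-1-5.sh bash script, which is mostly copied and pasted from the Replicate blog.

The script takes 30-40 minutes from start to finish, so take a break and go for a walk with a furry friend. Unfortunately, more training steps mean more training time, and 4000 steps is a lot.

Notice the model field in the JSON request body. Once the training job completes, a private Replicate model is created at a URL like https://replicate.com/jakedahn/queso-1-5 (my model is private, so that URL returns a 404). Once the model exists, you can generate images through the Replicate web UI or the Replicate API.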

#!/bin/bash

# The Dreambooth model will be added to your Replicate account at the URL
# named in the "model" field. Replace "jakedahn" with your own username.
# (JSON does not allow comments, so this note lives outside the request body.)
curl -X POST \
    -H "Authorization: Token $REPLICATE_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
            "input": {
                "instance_prompt": "a photo of a qdg dog",
                "class_prompt": "photograph of a golden retriever dog, 4k hd, high detail photograph, sharp lens, realistic, highly detailed, fur",
                "instance_data": "https://shruggyface.s3-us-west-2.amazonaws.com/queso-2023-transparent-all.zip",
                "max_train_steps": 4000
            },
            "model": "jakedahn/queso-1-5",
            "trainer_version": "cd3f925f7ab21afaef7d45224790eedbb837eeac40d22e8fefe015489ab644aa",
            "webhook_completed": "https://abc123.m.pipedream.net/queso-1-5"
        }' \
    https://dreambooth-api-experimental.replicate.com/v1/trainings

Then run it like this:

REPLICATE_API_TOKEN=your-token-here ./queso-1-5.sh

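Breaking down the inputs

The inputs in this script define how the Dreambooth model is trained, so they matter.

instance_prompt: The instance prompt is a sort of example prompt, the kind you would use to get an image of the model's subject. The recommended format is a [identifier] [class noun]. You want the identifier to be a unique "token", which means it should be 3-4 letters and should not be a real word. I have heard that some people find particular tokens that work better for them; I chose qdg.
class_prompt: Training a Dreambooth model requires additional "regularization images", which help prevent extreme overfitting. Without them, every generated image would just try to reproduce an exact image from the training set. Supplying extra "similar" images alongside the training set, in this case more photos of golden retrievers, makes the resulting model more flexible and gives better results across more scenarios. By default, Replicate generates 50 images from the class prompt; I recommend experimenting with more.
instance_data: This points to a zip file containing all of the training images. Replicate has an API for uploading this file to their servers, but it is a bit involved, so I simply host the file on S3, which also makes it easy to reuse in future projects.
max_train_steps: The number of training steps. Up to a point, higher is better. There are many conflicting claims about this value, but the most consistent advice seems to be "100 steps per training image". With 40 images, that comes to 4000 steps. In earlier training runs I got good results with 40 images and 3000 steps, so this is something to experiment with yourself.
trainer_version: The trainer version matters! There are a few options to choose from.
- For Stable Diffusion v1.5, use cd3f925f7ab21afaef7d45224790eedbb837eeac40d22e8fefe015489ab644aa.
- For Stable Diffusion v2.1, use d5e058608f43886b9620a8fbb1501853b8cbae4f45c857a014011c86ee614ffb.
webhook_completed: Training a Dreambooth model takes a while, and it is nice to be notified when it finishes. For this webhook URL I used a requestbin from https://pipedream.com, which provides a simple UI for exploring the data sent to the webhook endpoint:

Screenshot of a Pipedream.com requestbin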
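To make the relationship between these inputs concrete, the following Python sketch assembles the same JSON request body from a handful of parameters. The helper name build_training_payload and its arguments are hypothetical and exist only for illustration; the field names and the v1.5 trainer hash are taken from the bash script above, and the "100 steps per training image" rule is applied as a default that you are free to override.

```python
import json

# Trainer hash for Stable Diffusion v1.5, as listed above.
SD_15_TRAINER = "cd3f925f7ab21afaef7d45224790eedbb837eeac40d22e8fefe015489ab644aa"

def build_training_payload(identifier, class_noun, class_prompt,
                           instance_data_url, model, num_images,
                           trainer_version=SD_15_TRAINER,
                           steps_per_image=100, webhook_completed=None):
    """Assemble the request body for a Dreambooth training job (illustrative helper)."""
    payload = {
        "input": {
            # Recommended format: a [identifier] [class noun]
            "instance_prompt": f"a photo of a {identifier} {class_noun}",
            "class_prompt": class_prompt,
            "instance_data": instance_data_url,
            # Community rule of thumb: roughly 100 steps per training image
            "max_train_steps": num_images * steps_per_image,
        },
        "model": model,
        "trainer_version": trainer_version,
    }
    if webhook_completed:
        payload["webhook_completed"] = webhook_completed
    return payload

example = build_training_payload(
    identifier="qdg",
    class_noun="dog",
    class_prompt="photograph of a golden retriever dog",
    instance_data_url="https://example.com/training-images.zip",  # placeholder URL
    model="your-username/your-model",  # placeholder slug
    num_images=40,
)
print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the -d argument of the curl command, or sent with any HTTP client using the same two headers.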

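Generating images

Great! If you have been following along so far, you should now have your own custom Dreambooth model! Next comes the fun part: generating lots of images of your furry friend.

First, you need to write some prompts; then you can generate hundreds or thousands of images.

I took the easy route and spent an hour browsing Lexica's infinite scroll. Lexica is a huge collection of AI-generated images, each shared along with its prompt. After a while, I picked ten cool images from the search term dog portrait and copied their prompts.

I collected the following prompts and replaced the dog breed with the token qdg: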

PROMPTS = [
    "Adorably cute qdg dog portrait, artstation winner by Victo Ngai, Kilian Eng and by Jake Parker, vibrant colors, winning-award masterpiece, fantastically gaudy, aesthetic octane render, 8K HD Resolution",
    "Incredibly cute golden retriever qdg dog portrait, artstation winner by Victo Ngai, Kilian Eng and by Jake Parker, vibrant colors, winning-award masterpiece, fantastically gaudy, aesthetic octane render, 8K HD Resolution",
    "a high quality painting of a very cute golden retriever qdg dog puppy, friendly, curious expression. painting by artgerm and greg rutkowski and alphonse mucha ",
    "magnificent qdg dog portrait masterpiece work of art. oil on canvas. Digitally painted. Realistic. 3D. 8k. UHD.",
    "intricate five star qdg dog facial portrait by casey weldon, oil on canvas, hdr, high detail, photo realistic, hyperrealism, matte finish, high contrast, 3 d depth, centered, masterpiece, vivid and vibrant colors, enhanced light effect, enhanced eye detail, artstationhd ",
    "a portrait of a qdg dog in a scenic environment by mary beale and rembrandt, royal, noble, baroque art, trending on artstation ",
    "a painted portrait of a qdg dog with brown fur, no white fur, wearing a sea captain's uniform and hat, sea in background, oil painting by thomas gainsborough, elegant, highly detailed, anthro, anthropomorphic dog, epic fantasy art, trending on artstation, photorealistic, photoshop, behance winner ",
    "qdg dog guarding her home, dramatic sunset lighting, mat painting, highly detailed, ",
    "qdg dog, realistic shaded lighting poster by ilya kuvshinov katsuhiro otomo, magali villeneuve, artgerm, jeremy lipkin and michael garmash and rob rey ",
    "a painting of a qdg dog dog, greg rutkowski, cinematic lighting, hyper realistic painting",
]
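If you would rather not edit prompts by hand, the substitution can be scripted. The sketch below is one possible approach, not what was used for the list above: the function insert_token and the breed list you pass to it are hypothetical, and it replaces each breed name (plus an optional trailing "dog") with the token followed by the class noun.

```python
import re

def insert_token(prompt, breeds, token="qdg", class_noun="dog"):
    """Replace any breed name in prompt with '<token> <class_noun>'."""
    if not breeds:
        return prompt
    # Longest names first so "golden retriever" wins over "retriever".
    ordered = sorted(breeds, key=len, reverse=True)
    pattern = "|".join(re.escape(breed) for breed in ordered)
    return re.sub(
        rf"\b(?:{pattern})\b(?:\s+dog)?",
        f"{token} {class_noun}",
        prompt,
        flags=re.IGNORECASE,
    )

print(insert_token("a portrait of a corgi in a scenic environment", ["corgi"]))
```

Hand-checking the results is still worthwhile, since copied prompts often mention the breed in more than one way.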

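I then wrote a super quick and dirty Python script that runs each prompt 10 times, generating 100 images in total. I ran it many times... I never got tired of looking at AI-generated art portraits of my dog.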

import os
import random
import urllib.request  # "import urllib" alone does not load the request submodule

import replicate

USERNAME = 'jakedahn'
MODEL_NAME = 'queso-1-5'  # must match the "model" slug used in the training request
MODEL_SLUG = f'{USERNAME}/{MODEL_NAME}'

# Defined before download_prompt because it is used as a default argument there
NEGATIVE_PROMPT = "cartoon, blurry, deformed, watermark, dark lighting, image caption, caption, text, cropped, low quality, low resolution, malformed, messy, blurry, watermark"

# Fetch the model from Replicate
model = replicate.models.get(MODEL_SLUG)
# Grab the latest version
version = model.versions.list()[0]

def download_prompt(prompt, negative_prompt=NEGATIVE_PROMPT, num_outputs=1):
    print("=====================================================================")
    print("prompt:", prompt)
    print("negative_prompt:", negative_prompt)
    print("num_outputs:", num_outputs)
    print("=====================================================================")
    # version.predict() is the call the original script uses; newer releases of
    # the replicate client may expose a different interface.
    image_urls = version.predict(
        prompt=prompt,
        width=512,
        height=512,
        negative_prompt=negative_prompt,
        num_outputs=num_outputs,
    )
    for url in image_urls:
        img_id = url.split("/")[4][:6]
        # Build a filename-friendly slug without overwriting the prompt itself
        slug = prompt.replace(" ", "-").replace(",", "").replace(".", "-")
        out_file = f"data/{MODEL_NAME}/{img_id}--{slug}"[:200]
        out_file = out_file + ".jpg"

        # Create the output folder if it does not exist
        os.makedirs(os.path.dirname(out_file), exist_ok=True)

        print("Downloading to", out_file)
        urllib.request.urlretrieve(url, out_file)
    print("=====================================================================")

# These prompts all come from https://lexica.art/?q=dog+portrait
PROMPTS = [
    "Adorably cute qdg dog portrait, artstation winner by Victo Ngai, Kilian Eng and by Jake Parker, vibrant colors, winning-award masterpiece, fantastically gaudy, aesthetic octane render, 8K HD Resolution",
    "Incredibly cute golden retriever qdg dog portrait, artstation winner by Victo Ngai, Kilian Eng and by Jake Parker, vibrant colors, winning-award masterpiece, fantastically gaudy, aesthetic octane render, 8K HD Resolution",
    "a high quality painting of a very cute golden retriever qdg dog puppy, friendly, curious expression. painting by artgerm and greg rutkowski and alphonse mucha ",
    "magnificent qdg dog portrait masterpiece work of art. oil on canvas. Digitally painted. Realistic. 3D. 8k. UHD.",
    "intricate five star qdg dog facial portrait by casey weldon, oil on canvas, hdr, high detail, photo realistic, hyperrealism, matte finish, high contrast, 3 d depth, centered, masterpiece, vivid and vibrant colors, enhanced light effect, enhanced eye detail, artstationhd ",
    "a portrait of a qdg dog in a scenic environment by mary beale and rembrandt, royal, noble, baroque art, trending on artstation ",
    "a painted portrait of a qdg dog with brown fur, no white fur, wearing a sea captain's uniform and hat, sea in background, oil painting by thomas gainsborough, elegant, highly detailed, anthro, anthropomorphic dog, epic fantasy art, trending on artstation, photorealistic, photoshop, behance winner ",
    "qdg dog guarding her home, dramatic sunset lighting, mat painting, highly detailed, ",
    "qdg dog, realistic shaded lighting poster by ilya kuvshinov katsuhiro otomo, magali villeneuve, artgerm, jeremy lipkin and michael garmash and rob rey ",
    "a painting of a qdg dog dog, greg rutkowski, cinematic lighting, hyper realistic painting",
]

# Shuffle the prompt order
random.shuffle(PROMPTS)

# Loop 10 times to generate 100 images
for i in range(10):
    for prompt in PROMPTS:
        download_prompt(prompt)

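After running this script many times, I had generated at least 1000 images. Roughly 20% were nonsense, and roughly 80% were cute, funny, or accurate. Here are some of my favorites:

Hand-picked favorites

Finding the one

Eventually, after producing hundreds of artificial Quesos, I found this one. I love the color palette, which is vibrant and full of contrast. I love the texture and all of the detail. It also captures Queso's eyes really well, which is ultimately what drew me in. Every time I look at it, I think, "Wow, that's Queso!"

This one. This is the one.

Now, the ultimate goal of this art project is to end up with a high-quality art print that I can frame and hang on my art wall. Cool as it is, this image would not make a great art print; the awkward cropping at the top and bottom limits its potential. On top of that, printed at 300 dpi, a 512x512 px image measures only about 1.7x1.7 inches on paper. The goal is 11x17 inches.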
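The print-size arithmetic is easy to check. In this small sketch (the function names are made up for the example), pixels divided by dpi gives inches, and inches multiplied by dpi gives the pixel dimensions a target print size requires.

```python
def print_size_inches(width_px, height_px, dpi=300):
    """Physical size of an image when printed at the given dpi."""
    return (width_px / dpi, height_px / dpi)

def required_pixels(width_in, height_in, dpi=300):
    """Pixel dimensions needed to print at the given size and dpi."""
    return (round(width_in * dpi), round(height_in * dpi))

width_in, height_in = print_size_inches(512, 512)
print(f"{width_in:.1f} x {height_in:.1f} inches")  # 1.7 x 1.7 inches
print(required_pixels(11, 17))                     # (3300, 5100)
```

In other words, an 11x17 inch print at 300 dpi needs 3300x5100 pixels, roughly ten times the height of the 512 px original, which is why cropping fixes and upscaling are both needed.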

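So the next step in this project is to fix the awkward cropping. It's a shame you can't just add new pixels to the top and bottom.

Except you can! That is exactly what outpainting is for; stay tuned for the next installment.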