Hands-On with Stability AI Text-to-Image, Image-to-Image, and Image Editing (Part 2)

Latest recommended article published 2025-01-16 10:59:36

佛州小李哥

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/m0_66628975/article/details/141759983

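Today I (小李哥) will show how to build a generative AI image application with the Stable Diffusion model on Amazon Bedrock, the Amazon Web Services (AWS) platform for leading AI models. This series has three parts. In Part 1 we tried each of the model's signature features in the AWS console: text-to-image, image-to-image, and image inpainting.

In this Part 2, I'll walk you through calling the Stable Diffusion model through the API and trying the same features in code. You can use the hands-on projects in this post to build your own AI skills and apply them in your daily work. Stay tuned for Part 3, where we use the Stable Diffusion API to build an image generation web application of your own.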

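Background knowledge

What is Amazon Bedrock

Amazon Bedrock is a fully managed service that offers high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a unified API. It also provides a broad set of capabilities for building generative AI applications with security, privacy, and responsible AI in mind. With Amazon Bedrock, developers can:

Easily test and evaluate how different foundation models perform on their use cases;
Customize applications with techniques such as fine-tuning and retrieval-augmented generation (RAG);
Build intelligent agents that carry out tasks automatically using their enterprise systems and data sources.

Because Amazon Bedrock is serverless, developers do not manage any infrastructure, and they can securely integrate and deploy generative AI capabilities into their applications using the AWS services they already know.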

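What are the Stability AI models?

Stability AI is a company dedicated to developing and providing generative AI models, and its models are widely used for image generation. The best known is Stable Diffusion, which generates highly realistic images from a user's text description. Thanks to their generation quality and flexibility, these models are widely used and well regarded in application development.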

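Use cases for Stability AI models

Creative design:

Stability AI's generative models are widely used in creative design. They help designers and artists quickly produce high-quality images, illustrations, and visual content for product showcases, brand promotion, and social media. A short text description is enough to generate visual material that matches a given theme or style, which greatly improves design efficiency and creative expression.

Game development:

Stability AI's image generation is also widely used in game development. Developers can use these models to quickly generate game scenes, character designs, and props, which saves art resources and speeds up development. With AI image generation, even small teams can produce visually striking game content.

Education and training:

In education and training, Stability AI models can generate personalized learning materials and lesson plans based on teaching needs. This helps teachers and trainers improve their results, especially in training on generative AI topics, and raises learner engagement and the overall learning experience.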

What this hands-on covers

1. Text-to-image through the Amazon Bedrock API

2. Image-to-image through the Amazon Bedrock API

3. Image editing (inpainting) through the Amazon Bedrock API

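Step-by-step walkthrough

Text-to-image

First, let's look at the text-to-image inference parameters of the Stability model, using Stability AI SDXL 1.0 as the example. The parameters fall into two groups: required and optional.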

Required parameters:

Parameter	Description	Minimum	Maximum
text_prompts	Array of text prompts for generation, each with prompt text and a weight	0	2000

Optional parameters:

Parameter	Description	Default	Minimum	Maximum	Allowed values
weight	Weight the model applies to the prompt	1
cfg_scale	How closely the final image follows the prompt	7	0	35
clip_guidance_preset	CLIP guidance preset				FAST_BLUE, FAST_GREEN, NONE, SIMPLE, SLOW, SLOWER, SLOWEST
height	Height of the generated image				1024x1024, 1152x896, 1216x832, 1344x768, 1536x640, 640x1536, 768x1344, 832x1216, 896x1152
width	Width of the generated image				1024x1024, 1152x896, 1216x832, 1344x768, 1536x640, 640x1536, 768x1344, 832x1216, 896x1152
sampler	Sampler for the diffusion process				DDIM, DDPM, K_DPMPP_2M, K_DPMPP_2S_ANCESTRAL, K_DPM_2, K_DPM_2_ANCESTRAL, K_EULER, K_EULER_ANCESTRAL, K_HEUN, K_LMS
samples	Number of images to generate	1	1	1
seed	Seed that determines the initial noise	0	0	4294967295
steps	Number of generation steps; affects how many times the image is sampled and how accurate the result is	30	10	50
style_preset	Style preset that guides the image model toward a particular style				3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture
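
The ranges in the table can be checked locally before a request is sent. The snippet below is a minimal sketch, not part of the original walkthrough: `check_sdxl_params` is a hypothetical helper, the size pairs are read as width x height, and the 0 to 2000 range for text_prompts is read as a prompt length in characters. Both readings are assumptions.

```python
# Ranges and sizes copied from the optional-parameter table above.
# Assumption: each size pair is read as (width, height).
ALLOWED_SIZES = {
    (1024, 1024), (1152, 896), (1216, 832), (1344, 768), (1536, 640),
    (640, 1536), (768, 1344), (832, 1216), (896, 1152),
}
RANGES = {
    "cfg_scale": (0, 35),
    "samples": (1, 1),
    "seed": (0, 4294967295),
    "steps": (10, 50),
}


def check_sdxl_params(params: dict) -> list:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside [{low}, {high}]")
    # The size is only checked when both dimensions are given explicitly.
    if "width" in params and "height" in params:
        size = (params["width"], params["height"])
        if size not in ALLOWED_SIZES:
            problems.append(f"size {size[0]}x{size[1]} is not in the allowed list")
    # Assumption: the 2000 limit applies to the length of each prompt text.
    for prompt in params.get("text_prompts", []):
        if len(prompt.get("text", "")) > 2000:
            problems.append("a prompt is longer than 2000 characters")
    return problems


print(check_sdxl_params({"cfg_scale": 5, "steps": 60, "seed": 42}))
```

A check like this only mirrors the table; the service remains the final authority on what it accepts.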

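Hands-on code:

First, we use the Amazon Bedrock API to generate an image. Make sure the required dependencies are installed: "boto3", "botocore", and "PIL" (installed as the Pillow package). The sample code follows, with explanations in the code comments: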

import base64
import io
import json
import os
import sys

import boto3
from PIL import Image
import botocore

boto3_bedrock = boto3.client('bedrock-runtime')

prompt = "a beautiful mountain landscape"  # the text prompt
negative_prompts = [
    "poorly rendered",
    "poor background details",
    "poorly drawn mountains",
    "disfigured mountain features",
]  # negative prompts: features we do not want in the generated image
style_preset = "photographic"  # style preset; "photographic" gives the image a photo-like look (e.g. photographic, digital-art, cinematic, ...)
clip_guidance_preset = "FAST_GREEN"  # CLIP guidance preset; may affect generation speed and some aspects of the result (e.g. FAST_BLUE FAST_GREEN NONE SIMPLE SLOW SLOWER SLOWEST)
sampler = "K_DPMPP_2S_ANCESTRAL"  # sampler used during generation; different samplers can affect image quality and diversity (e.g. DDIM, DDPM, K_DPMPP_SDE, K_DPMPP_2M, K_DPMPP_2S_ANCESTRAL, K_DPM_2, K_DPM_2_ANCESTRAL, K_EULER, K_EULER_ANCESTRAL, K_HEUN, K_LMS)
width = 768  # width of the generated image, in pixels

request = json.dumps({
    "text_prompts": (
        [{"text": prompt, "weight": 1.0}]
        + [{"text": negprompt, "weight": -1.0} for negprompt in negative_prompts]
    ),
    "cfg_scale": 5,
    "seed": 42,
    "steps": 50,  # kept within the 10-50 range listed in the parameter table
    "style_preset": style_preset,
    "clip_guidance_preset": clip_guidance_preset,
    "sampler": sampler,
    "width": width,
})
modelId = "stability.stable-diffusion-xl-v1"

response = boto3_bedrock.invoke_model(body=request, modelId=modelId)
response_body = json.loads(response.get("body").read())

print(response_body["result"])
base_64_img_str = response_body["artifacts"][0].get("base64")
print(f"{base_64_img_str[0:80]}...")

os.makedirs("data", exist_ok=True)
image_1 = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, "utf-8"))))
image_1.save("data/image_1.jpg")
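
The code above takes the first artifact and decodes it directly. A slightly more defensive version of that step is sketched below. `first_image_bytes` is a hypothetical helper, and the per-artifact "finishReason" field with a "SUCCESS" value is an assumption about the response shape that this article does not show. The sample body is a stand-in, not real model output.

```python
import base64


def first_image_bytes(response_body: dict) -> bytes:
    """Decode the first artifact of a parsed response body into raw bytes."""
    artifacts = response_body.get("artifacts") or []
    if not artifacts:
        raise ValueError("response contains no artifacts")
    artifact = artifacts[0]
    # Assumption: each artifact carries a "finishReason"; "SUCCESS" means usable.
    reason = artifact.get("finishReason", "SUCCESS")
    if reason != "SUCCESS":
        raise RuntimeError(f"generation did not succeed: {reason}")
    return base64.b64decode(artifact["base64"])


# Stand-in for a parsed response; the payload is plain text, not a real image.
fake_body = {
    "result": "success",
    "artifacts": [
        {
            "base64": base64.b64encode(b"not-a-real-image").decode("ascii"),
            "finishReason": "SUCCESS",
        }
    ],
}
print(first_image_bytes(fake_body))
```

With a real response, the returned bytes can be wrapped in io.BytesIO and opened with Image.open, as in the code above.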

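Inference result

Our prompt was "a beautiful mountain landscape", which asks for a beautiful mountain scene. This is the image we got:

Image-to-image

Next, let's look at the image-to-image inference parameters of the Stability model, again using Stability AI SDXL 1.0 as the example. The parameters fall into two groups: required and optional.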

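Required parameters

Parameter	Description	Minimum	Maximum
text_prompts	Array of text prompts for generation, each with prompt text and a weight	0	2000
init_image	Base64-encoded image that initializes the diffusion process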

Optional parameters

Parameter	Description	Default	Minimum	Maximum	Allowed values
weight	Weight the model applies to the prompt	1
cfg_scale	How closely the final image follows the prompt	7	0	35
clip_guidance_preset	CLIP guidance preset				FAST_BLUE, FAST_GREEN, NONE, SIMPLE, SLOW, SLOWER, SLOWEST
height	Height of the generated image				1024x1024, 1152x896, 1216x832, 1344x768, 1536x640, 640x1536, 768x1344, 832x1216, 896x1152
width	Width of the generated image				1024x1024, 1152x896, 1216x832, 1344x768, 1536x640, 640x1536, 768x1344, 832x1216, 896x1152
sampler	Sampler for the diffusion process				DDIM, DDPM, K_DPMPP_2M, K_DPMPP_2S_ANCESTRAL, K_DPM_2, K_DPM_2_ANCESTRAL, K_EULER, K_EULER_ANCESTRAL, K_HEUN, K_LMS
samples	Number of images to generate	1	1	1
seed	Seed that determines the initial noise	0	0	4294967295
steps	Number of generation steps; affects how many times the image is sampled and how accurate the result is	30	10	50
style_preset	Style preset that guides the image model toward a particular style				3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture
extras	Extra parameters passed to the engine

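Hands-on code

We use the Amazon Bedrock API to generate a new image from an existing one. Make sure the required dependencies are installed: "boto3", "botocore", and "PIL" (installed as the Pillow package). The sample code follows, with explanations in the code comments: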

import base64
import io
import json
import os
import sys

import boto3
from PIL import Image
import botocore


def image_to_base64(img) -> str:
    """Convert a PIL Image or local image file path to a base64 string for Amazon Bedrock"""
    if isinstance(img, str):
        if os.path.isfile(img):
            print(f"Reading image from file: {img}")
            with open(img, "rb") as f:
                return base64.b64encode(f.read()).decode("utf-8")
        else:
            raise FileNotFoundError(f"File {img} does not exist")
    elif isinstance(img, Image.Image):
        print("Converting PIL Image to base64 string")
        buffer = io.BytesIO()
        img.save(buffer, format="jpeg")
        return base64.b64encode(buffer.getvalue()).decode("utf-8")
    else:
        raise ValueError(f"Expected str (filename) or PIL Image. Got {type(img)}")

boto3_bedrock = boto3.client('bedrock-runtime')

img_path = '../image/data/image_1.jpg'
# Open the image
img = Image.open(img_path)

init_image_b64 = image_to_base64(img)
print(init_image_b64[:80] + "...")

change_prompt = "add denser number of trees, extend lake"
negative_prompts = [
    "poorly rendered",
    "poor background details",
    "poorly drawn mountains",
    "disfigured mountain features",
]  # negative prompts: features we do not want in the generated image
style_preset = "cinematic"  # style preset; "cinematic" gives the image a film-like look (e.g. photographic, digital-art, cinematic, ...)
clip_guidance_preset = "FAST_BLUE"  # CLIP guidance preset; may affect generation speed and some aspects of the result (e.g. FAST_BLUE FAST_GREEN NONE SIMPLE SLOW SLOWER SLOWEST)
sampler = "K_DPMPP_2S_ANCESTRAL"  # sampler used during generation; different samplers can affect image quality and diversity (e.g. DDIM, DDPM, K_DPMPP_SDE, K_DPMPP_2M, K_DPMPP_2S_ANCESTRAL, K_DPM_2, K_DPM_2_ANCESTRAL, K_EULER, K_EULER_ANCESTRAL, K_HEUN, K_LMS)
width = 768  # image width in pixels; not passed in the request below

request = json.dumps({
    "text_prompts": (
        [{"text": change_prompt, "weight": 1.0}]
        + [{"text": negprompt, "weight": -1.0} for negprompt in negative_prompts]
    ),
    "cfg_scale": 10,
    "init_image": init_image_b64,
    "seed": 321,
    "start_schedule": 0.6,
    "steps": 50,
    "style_preset": style_preset,
    "clip_guidance_preset": clip_guidance_preset,
    "sampler": sampler,
})
modelId = "stability.stable-diffusion-xl-v1"

response = boto3_bedrock.invoke_model(body=request, modelId=modelId)
response_body = json.loads(response.get("body").read())

print(response_body["result"])
image_2_b64_str = response_body["artifacts"][0].get("base64")
print(f"{image_2_b64_str[0:80]}...")

os.makedirs("data", exist_ok=True)
image_2 = Image.open(io.BytesIO(base64.decodebytes(bytes(image_2_b64_str, "utf-8"))))
image_2.save("data/image_2.jpg")
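
The request body above can also be assembled by a small function, which makes it easier to reuse with other images and prompts. This is a minimal sketch: `build_img2img_request` is a hypothetical helper, the field names mirror the ones used in this article's request, and the image bytes here are a placeholder rather than a real file.

```python
import base64
import json


def build_img2img_request(image_bytes, prompt, negative_prompts=(), **options):
    """Build an image-to-image request body shaped like the one in the article."""
    text_prompts = [{"text": prompt, "weight": 1.0}]
    # Negative prompts get a negative weight so the model steers away from them.
    text_prompts += [{"text": text, "weight": -1.0} for text in negative_prompts]
    body = {
        "text_prompts": text_prompts,
        "init_image": base64.b64encode(image_bytes).decode("utf-8"),
    }
    # Extra fields (cfg_scale, seed, steps, style_preset, ...) pass through unchanged.
    body.update(options)
    return json.dumps(body)


request = build_img2img_request(
    b"raw image bytes go here",  # placeholder; read a real file with open(path, "rb")
    "add denser number of trees, extend lake",
    negative_prompts=["poorly rendered"],
    cfg_scale=10,
    seed=321,
    steps=50,
)
print(json.loads(request)["text_prompts"])
```

The returned string can be passed as the body argument of invoke_model, in the same way as the request built inline above.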

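Inference result

Our prompt was "add denser number of trees, extend lake", which asks for denser trees and a larger lake in the existing image. This is the image we got:

Image editing

Next, let's look at the image editing (inpainting) inference parameters of the Stability model. The idea is to replace the masked part of the original image with newly generated content. Using Stability AI SDXL 1.0 as the example, the parameters fall into two groups: required and optional.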

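Required parameters

Parameter	Description	Minimum	Maximum
text_prompts	Array of text prompts for generation	0	2000
init_image	Base64-encoded image that initializes the diffusion process
mask_source	Determines where the mask comes from
mask_image	Base64-encoded image used as the mask for the source image in init_image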

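Optional parameters

Parameter	Default	Minimum	Maximum	Description and allowed values
weight	1			Weight the model applies to the prompt.
cfg_scale	7	0	35	How closely the final image follows the prompt.
clip_guidance_preset				Preset for the model's image generation process. Allowed values: FAST_BLUE, FAST_GREEN, NONE, SIMPLE, SLOW, SLOWER, SLOWEST.
sampler				Sampler for the diffusion process. Allowed values: DDIM, DDPM, K_DPMPP_2M, K_DPMPP_2S_ANCESTRAL, K_DPM_2, K_DPM_2_ANCESTRAL, K_EULER, K_EULER_ANCESTRAL, K_HEUN, K_LMS.
samples	1	1	1	Number of images to generate.
seed	0	0	4294967295	Seed that initializes the noise.
steps	30	10	50	Number of generation steps used to sample the image.
style_preset				Preset that guides the image model toward a particular style. Allowed values: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.
extras				Extra parameters for the engine. Note: 'extras' is meant for in-development or experimental features, may change, and should be used with caution.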

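Hands-on code

We use the Amazon Bedrock API to edit an image. Make sure the required dependencies are installed: "boto3", "botocore", and "PIL" (installed as the Pillow package). The sample code follows, with explanations in the code comments: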

import base64
import io
import json
import os
import sys

import boto3
from PIL import Image
import botocore

from PIL import ImageOps

def image_to_base64(img) -> str:
    """Convert a PIL Image or local image file path to a base64 string for Amazon Bedrock"""
    if isinstance(img, str):
        if os.path.isfile(img):
            print(f"Reading image from file: {img}")
            with open(img, "rb") as f:
                return base64.b64encode(f.read()).decode("utf-8")
        else:
            raise FileNotFoundError(f"File {img} does not exist")
    elif isinstance(img, Image.Image):
        print("Converting PIL Image to base64 string")
        buffer = io.BytesIO()
        img.save(buffer, format="jpeg")
        return base64.b64encode(buffer.getvalue()).decode("utf-8")
    else:
        raise ValueError(f"Expected str (filename) or PIL Image. Got {type(img)}")

def inpaint_mask(img, box):
    """Generates a segmentation mask for inpainting"""
    img_size = img.size
    assert len(box) == 4  # (left, top, right, bottom)
    assert box[0] < box[2]
    assert box[1] < box[3]
    return ImageOps.expand(
        Image.new(
            mode = "RGB",
            size = (
                box[2] - box[0],
                box[3] - box[1]
            ),
            color = 'black'
        ),
        border=(
            box[0],
            box[1],
            img_size[0] - box[2],
            img_size[1] - box[3]
        ),
        fill='white'
    )
    
img_path = '../image_to_image/data/image_2.jpg'
# Open the image
image_2 = Image.open(img_path)

img2_size = image_2.size
box = (
        (0),
        (img2_size[1] - 900) ,
        (img2_size[0]),
        img2_size[1] - 700
    )

# Mask
mask = inpaint_mask(
    image_2,
    box
)

# Debug: in a notebook, this bare expression displays the mask (in a script, call mask.show())
mask

boto3_bedrock = boto3.client('bedrock-runtime')

inpaint_prompt = "add a helicopter"  # what to draw inside the masked region
style_preset = "cinematic"  # style preset; "cinematic" gives the image a film-like look (e.g. photographic, digital-art, cinematic, ...)
clip_guidance_preset = "FAST_BLUE"  # CLIP guidance preset (e.g. FAST_BLUE FAST_GREEN NONE SIMPLE SLOW SLOWER SLOWEST); not passed in the request below
sampler = "K_DPMPP_2S_ANCESTRAL"  # sampler used during generation (e.g. DDIM, DDPM, K_DPMPP_SDE, K_DPMPP_2M, K_DPMPP_2S_ANCESTRAL, K_DPM_2, K_DPM_2_ANCESTRAL, K_EULER, K_EULER_ANCESTRAL, K_HEUN, K_LMS); not passed in the request below
width = 768  # image width in pixels; not passed in the request below
request = json.dumps({
    "text_prompts":[{"text": inpaint_prompt}],
    "init_image": image_to_base64(image_2),
    "mask_source": "MASK_IMAGE_BLACK",  # the black pixels of mask_image mark the region to repaint
    "mask_image": image_to_base64(mask),
    "cfg_scale": 10,
    "seed": 32123,
    "style_preset": style_preset,
})

modelId = "stability.stable-diffusion-xl-v1"

response = boto3_bedrock.invoke_model(body=request, modelId=modelId)
response_body = json.loads(response.get("body").read())

print(response_body["result"])
image_3_b64_str = response_body["artifacts"][0].get("base64")

os.makedirs("data", exist_ok=True)
inpaint = Image.open(io.BytesIO(base64.decodebytes(bytes(image_3_b64_str, "utf-8"))))
inpaint.save("data/inpaint.jpg")
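
The mask box in the code above uses fixed offsets of 900 and 700 pixels from the bottom edge, so on an image shorter than 900 pixels the top of the box would be negative. The sketch below keeps a full-width band inside the image. `band_box` is a hypothetical helper that is not part of the original code, and sliding the band down to the top edge on small images is one design choice among several.

```python
def band_box(image_size, offset_from_bottom, band_height):
    """Return (left, top, right, bottom) for a full-width band inside the image."""
    width, height = image_size
    # Start the band offset_from_bottom pixels above the bottom edge,
    # but never above the top of the image.
    top = max(0, height - offset_from_bottom)
    # End the band band_height pixels lower, but never below the bottom edge.
    bottom = min(height, top + band_height)
    if top >= bottom:
        raise ValueError("the band is empty for this image size")
    return (0, top, width, bottom)


# Same band as the article's box on a 1024x1024 image: 900 px up, 200 px tall.
print(band_box((1024, 1024), 900, 200))
# On a shorter image the band starts at the top edge instead of going negative.
print(band_box((768, 512), 900, 200))
```

The resulting tuple has the same (left, top, right, bottom) layout that inpaint_mask expects for its box argument.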