Generating High-Quality Images with FLUX Text-to-Image

Latest recommended article published on 2025-03-06 17:59:03

二分掌柜的


Categories: Large Models, Object Detection, YOLOv5. Tags: deep learning, python, flux, transformer, ViT, large models

Article link: https://blog.csdn.net/flyfish1986/article/details/144915102


flyfish

import torch
from diffusers import FluxPipeline


# Initialize the FluxPipeline from a local model directory
model_path = "/home/FLUX___1-dev"
pipe = FluxPipeline.from_pretrained(model_path, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power


# List of English prompts, one per scene
prompts = [
    "A magnificent future world where vast, rolling meadows stretch as far as the eye can see, dotted with charming cottages and colorful wildflowers, and the sky is a brilliant canvas of pastel hues.",
    "A vast and idyllic landscape where nature thrives in perfect balance, with towering mountain ranges and crystal-clear rivers flowing through, and the air is filled with the sweet fragrance of blooming flowers."
]


# Generation parameters shared by all prompts
height = 1024
width = 1024
guidance_scale = 3.5
num_inference_steps = 100
max_sequence_length = 512
generator = torch.Generator("cuda")


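# Count the prompts
total_images = len(prompts)


# Number of images to generate for each prompt
num_images_per_prompt = 2


# Loop over the prompts and generate images for each one
for i, prompt in enumerate(prompts):
    # Report which prompt is being processed and how many there are in total
    print(f"Generating images for prompt {i + 1} of {total_images}")
    # Optional: seed the generator so each prompt gets a different but reproducible result
    seed = i  # use the prompt index as the seed, or pick seeds some other way
    generator.manual_seed(seed)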


    # Generate the images
    result = pipe(
        prompt,
        height=height,
        width=width,
        guidance_scale=guidance_scale,
        num_inference_steps=num_inference_steps,
        max_sequence_length=max_sequence_length,
        generator=generator,
        num_images_per_prompt=num_images_per_prompt  # number of images generated for this prompt
    )


    # Save the generated images to local files
    for j in range(len(result.images)):
        filename = f"flux_dev_hd_{i}_{j}.png"
        result.images[j].save(filename)
        print(f"Saved {filename}")

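Imports and initialization:

import torch and from diffusers import FluxPipeline: import the required libraries.
model_path and pipe = FluxPipeline.from_pretrained(...): specify the local model directory and load the pretrained weights in bfloat16.
pipe.enable_model_cpu_offload(): keeps model components on the CPU and moves each one to the GPU only while it is running, which lowers peak GPU memory use at some cost in speed.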

Prompt list:

prompts: a list of English descriptions used for image generation; each element describes a different imagined scene.

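Parameter settings:

height and width: the size of the generated image in pixels.
guidance_scale: controls how closely the generated image follows the prompt.
num_inference_steps: the number of denoising steps; more steps can improve image quality but increase computation time.
max_sequence_length: the maximum length of the input prompt, counted in tokens.
generator: the source of random numbers for image generation.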
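
The size parameters can be sanity-checked before the pipeline is called. The following is a minimal sketch, assuming that FLUX image dimensions should be multiples of 16 (an assumption worth verifying against the diffusers documentation for your version). `snap_to_multiple` is a hypothetical helper and is not part of diffusers.

```python
def snap_to_multiple(value: int, multiple: int = 16) -> int:
    """Round value down to the nearest positive multiple of `multiple`."""
    if value < multiple:
        raise ValueError(f"value must be at least {multiple}, got {value}")
    return (value // multiple) * multiple


# Assumption: 16 is the divisor required for FLUX image sizes.
height = snap_to_multiple(1024)  # already a multiple of 16, unchanged
width = snap_to_multiple(1000)   # rounded down to 992
print(height, width)  # prints: 1024 992
```

Rounding down rather than up keeps the requested size as an upper bound, which is the safer choice when GPU memory is tight.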

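Prompt and image counts:

total_images: the number of prompts in the list (despite its name, it counts prompts, not images).
num_images_per_prompt: the number of images generated for each prompt.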
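
The number of files a run writes is the product of these two counts. A small sketch of the arithmetic, with placeholder prompts and the illustrative names `prompt_count` and `image_count` (neither appears in the original script):

```python
prompts = ["first prompt", "second prompt"]  # placeholder prompts for illustration
num_images_per_prompt = 2

prompt_count = len(prompts)                         # how many prompts there are
image_count = prompt_count * num_images_per_prompt  # how many images will be saved
print(f"{prompt_count} prompts x {num_images_per_prompt} images each = {image_count} files")
```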

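Image generation and saving:

for i, prompt in enumerate(prompts): iterates over the prompt list.
generator.manual_seed(seed): sets the random seed.
result = pipe(...): calls pipe with the parameters above to generate the images.
for j in range(len(result.images)): iterates over the list of generated images.
filename = f"flux_dev_hd_{i}_{j}.png": builds the file name from the prompt index and the image index.
result.images[j].save(filename): saves the image.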
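
The script seeds the generator once per prompt, so the images of one prompt can only be reproduced together as a batch. One possible alternative, sketched here under assumptions, is to plan a distinct seed and output path for every image up front, so that any single image can be regenerated later. `plan_outputs`, the `outputs` directory and the seed formula are all hypothetical choices for illustration; how the seeds are then handed to the pipeline should be checked against the diffusers documentation.

```python
from pathlib import Path


def plan_outputs(num_prompts, images_per_prompt, out_dir="outputs", base_seed=0):
    """Return one (seed, path) pair per image, in generation order."""
    plan = []
    for i in range(num_prompts):
        for j in range(images_per_prompt):
            # Each (prompt, image) pair gets its own seed, unique within the run.
            seed = base_seed + i * images_per_prompt + j
            path = Path(out_dir) / f"flux_dev_hd_{i}_{j}.png"
            plan.append((seed, path))
    return plan


for seed, path in plan_outputs(num_prompts=2, images_per_prompt=2):
    print(seed, path.name)
```

The file names follow the same `flux_dev_hd_{i}_{j}.png` pattern as the script, so the plan can be compared directly with the files the original loop produces.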

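A closer look at the guidance_scale parameter:

Meaning

guidance_scale controls how strongly the generation process is steered by the text prompt, that is, how much the prompt influences the final image.

Value range and effects

Minimum:
- In principle guidance_scale can be as low as 0, but at that value the text prompt has almost no influence on the result. The model relies mainly on the noise distribution rather than on the description, so the output can be far from what you expected.
- For example, with guidance_scale = 0 and the prompt "A red apple on a green table", the image may show neither an apple nor a green table and may instead look like an arbitrary, meaningless pattern.
Maximum:
- There is usually no hard upper limit. Larger values make the image follow the prompt more strictly, but they can also cause problems.
- Very large values can make the image look rigid or unnatural: the model leans too heavily on the prompt, so diversity drops and some features are overemphasized at the expense of overall balance.
- For example, with the same "A red apple on a green table" prompt and a very large value such as 20, the shapes and colors of the apple and table may match the description closely yet look too perfect to be real, with oversaturated colors or overly regular shapes.
Recommended range:
- A reasonable range for guidance_scale is typically 1 to 10; the script above uses 3.5.
- For more creative and flexible results, start with lower values, around 1 to 3. The images tend to show more variety but may not match the description exactly.
- If you want the image to follow the description more closely, use a value between 5 and 10.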
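
The effect of the scale can be illustrated with the classifier-free guidance formula as it is commonly written, where the final prediction is the unconditional prediction plus the scale times the difference between the prompt-conditioned and unconditional predictions. This is a toy sketch of that general idea with made-up numbers. FLUX.1-dev is understood to be guidance-distilled and to receive the scale as a model input, so this should not be read as what FluxPipeline actually executes. `apply_guidance` is a hypothetical helper.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Combine unconditional and prompt-conditioned predictions element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction made without the prompt
cond = [1.0, 3.0]    # toy prediction made with the prompt

for scale in (0.0, 1.0, 3.5):
    print(scale, apply_guidance(uncond, cond, scale))
# 0.0 reproduces the unconditional prediction: [0.0, 1.0]
# 1.0 reproduces the conditioned prediction:   [1.0, 3.0]
# 3.5 extrapolates past the conditioned one:   [3.5, 8.0]
```

In this formulation a scale of 0 ignores the prompt entirely, while values above 1 push the result further in the direction the prompt suggests, which matches the trade-off described above.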