Diffusers: Safety Checker, Checkpoint, DiffusionPipeline

https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading
https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/loading.md

1. Safety Checker

The safety checker helps identify and block generated images that contain inappropriate content such as violence, sexual material, or hate speech, so that the output complies with social norms and legal requirements.

Diffusers implements a safety checker for Stable Diffusion models which can generate harmful content. The safety checker screens the generated output against known hardcoded not-safe-for-work (NSFW) content. If for whatever reason you’d like to disable the safety checker, pass safety_checker=None to the from_pretrained() method.

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", safety_checker=None, use_safetensors=True)
"""
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`.
Ensure that you abide by the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public.
Both the diffusers team and Hugging Face strongly recommend keeping the safety filter enabled in all public-facing circumstances, disabling it only for use cases that involve analyzing network behavior or auditing its results.
For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
"""

2. Checkpoint variants

A checkpoint variant is usually a checkpoint whose weights are:

  • Stored in a different floating point type, such as torch.float16, because it only requires half the bandwidth and storage to download. You can’t use this variant if you’re continuing training or using a CPU.
  • Non-exponential mean averaged (EMA) weights which shouldn’t be used for inference. You should use this variant to continue finetuning a model.

When the checkpoints have identical model structures, but they were trained on different datasets and with a different training setup, they should be stored in separate repositories. For example, stabilityai/stable-diffusion-2 and stabilityai/stable-diffusion-2-1 are stored in separate repositories.

Otherwise, a variant is identical to the original checkpoint. They have exactly the same serialization format (like safetensors), model structure, and their weights have identical tensor shapes.

checkpoint type    weight name                                    argument for loading weights
original           diffusion_pytorch_model.safetensors            -
floating point     diffusion_pytorch_model.fp16.safetensors       variant, torch_dtype
non-EMA            diffusion_pytorch_model.non_ema.safetensors    variant
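
If you want to check which variant files a repository actually ships before downloading anything, you can list its files with the huggingface_hub client. A small sketch (not part of the original text), filtering for the UNet weights so the original / fp16 / non_ema naming above is visible:

from huggingface_hub import list_repo_files

# List every file in the repo and keep only the UNet weight files.
files = list_repo_files("stable-diffusion-v1-5/stable-diffusion-v1-5")
print(sorted(f for f in files if f.startswith("unet/")))
# e.g. ['unet/config.json', 'unet/diffusion_pytorch_model.bin',
#       'unet/diffusion_pytorch_model.fp16.safetensors', ...]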

There are two important arguments for loading variants:

  • torch_dtype specifies the floating point precision of the loaded checkpoint. For example, if you want to save bandwidth by loading an fp16 variant, you should set variant="fp16" and torch_dtype=torch.float16 to convert the weights to fp16. Otherwise, the fp16 weights are converted to the default fp32 precision.

    If you only set torch_dtype=torch.float16, the default fp32 weights are downloaded first and then converted to fp16.

  • variant specifies which files should be loaded from the repository. For example, if you want to load a non-EMA variant of a UNet from stable-diffusion-v1-5/stable-diffusion-v1-5, set variant="non_ema" to download the non_ema file (a component-level sketch follows the pipeline examples below).

fp16

from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
)

non-EMA

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema", use_safetensors=True
)
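
The variant argument also works when loading a single component with subfolder, as mentioned in the bullet above. A minimal sketch of loading only the non-EMA UNet, assuming the repository ships unet/diffusion_pytorch_model.non_ema.safetensors:

from diffusers import UNet2DConditionModel

# Load just the UNet subfolder, selecting the non-EMA weight file.
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="unet",
    variant="non_ema",
    use_safetensors=True,
)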

Use the variant parameter in the DiffusionPipeline.save_pretrained method to save a checkpoint as a different floating point type or as a non-EMA variant. You should try to save a variant to the same folder as the original checkpoint, so you have the option of loading both from the same folder.

fp16

from diffusers import DiffusionPipeline

pipeline.save_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16")

non-EMA

pipeline.save_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema")

If the folder only contains the variant weights rather than the original checkpoint, you must specify the variant argument; otherwise from_pretrained throws an Exception because it can't find the original checkpoint.

# this won't work
pipeline = DiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
)

# this works
pipeline = DiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
)

3. DiffusionPipeline explained

As a class method, DiffusionPipeline.from_pretrained is responsible for two things:

  • Download the latest version of the folder structure required for inference and cache it. If the latest folder structure is available in the local cache, DiffusionPipeline.from_pretrained reuses the cache and won’t redownload the files.
  • Load the cached weights into the correct pipeline class - retrieved from the model_index.json file - and return an instance of it.

The pipelines’ underlying folder structure corresponds directly with their class instances. For example, the StableDiffusionPipeline corresponds to the folder structure in stable-diffusion-v1-5/stable-diffusion-v1-5.

from diffusers import DiffusionPipeline

repo_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
pipeline = DiffusionPipeline.from_pretrained(repo_id, use_safetensors=True)
print(pipeline)

You’ll see pipeline is an instance of StableDiffusionPipeline, which consists of seven components:

  • "feature_extractor": a class transformers.CLIPImageProcessor from Transformers.
  • "safety_checker": a component for screening against harmful content.
  • "scheduler": an instance of class diffusers.PNDMScheduler.
  • "text_encoder": a class transformers.CLIPTextModel from Transformers.
  • "tokenizer": a class transformers.CLIPTokenizer from Transformers.
  • "unet": an instance of class diffusers.UNet2DConditionModel.
  • "vae": an instance of class diffusers.AutoencoderKL.
StableDiffusionPipeline {
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}

Compare the components of the pipeline instance to the stable-diffusion-v1-5/stable-diffusion-v1-5 folder structure, and you’ll see there is a separate folder for each of the components in the repository:

.
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── safety_checker
│   ├── config.json
│   ├── model.fp16.safetensors
│   ├── model.safetensors
│   ├── pytorch_model.bin
│   └── pytorch_model.fp16.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   ├── model.fp16.safetensors
│   ├── model.safetensors
│   ├── pytorch_model.bin
│   └── pytorch_model.fp16.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   ├── diffusion_pytorch_model.bin
│   ├── diffusion_pytorch_model.fp16.bin
│   ├── diffusion_pytorch_model.fp16.safetensors
│   ├── diffusion_pytorch_model.non_ema.bin
│   ├── diffusion_pytorch_model.non_ema.safetensors
│   └── diffusion_pytorch_model.safetensors
└── vae
    ├── config.json
    ├── diffusion_pytorch_model.bin
    ├── diffusion_pytorch_model.fp16.bin
    ├── diffusion_pytorch_model.fp16.safetensors
    └── diffusion_pytorch_model.safetensors

You can access each of the components of the pipeline as an attribute to view its configuration:

pipeline.tokenizer
CLIPTokenizer(
    name_or_path="/root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/39593d5650112b4cc580433f6b0435385882d819/tokenizer",
    vocab_size=49408,
    model_max_length=77,
    is_fast=False,
    padding_side="right",
    truncation_side="right",
    special_tokens={
        "bos_token": AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "eos_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "unk_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "pad_token": "<|endoftext|>",
    },
    clean_up_tokenization_spaces=True
)
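
Because components are exposed as plain attributes, you can also swap one out after loading. A short sketch (an illustrative choice, not from the original text) replacing the default PNDMScheduler with a DPMSolverMultistepScheduler built from the existing scheduler's config:

from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True
)

# Build a new scheduler from the current scheduler's config and assign it.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
print(pipeline.scheduler)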

Every pipeline expects a model_index.json file that tells the DiffusionPipeline:

  • which pipeline class to load from _class_name
  • which version of Diffusers was used to create the model in _diffusers_version
  • what components from which library are stored in the subfolders (name corresponds to the component and subfolder name, library corresponds to the name of the library to load the class from, and class corresponds to the class name)

{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
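
If you want to inspect this file without loading the full pipeline, you can download just model_index.json with huggingface_hub and parse it. A small sketch (not part of the original text):

import json
from huggingface_hub import hf_hub_download

# Download only model_index.json from the repository and parse it.
index_path = hf_hub_download(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", filename="model_index.json"
)
with open(index_path) as f:
    index = json.load(f)

# Each component name maps to a [library, class] pair used by from_pretrained.
print(index["_class_name"])  # StableDiffusionPipeline
print(index["unet"])         # ['diffusers', 'UNet2DConditionModel']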

