Diffusers: Safety Checker, Checkpoint, DiffusionPipeline

https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading
https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/loading.md

1. Safety Checker

The safety checker helps identify and block generated images that contain inappropriate content such as violence, sexual material, or hate speech, so that the output complies with social norms and legal requirements.

Diffusers implements a safety checker for Stable Diffusion models which can generate harmful content. The safety checker screens the generated output against known hardcoded not-safe-for-work (NSFW) content. If for whatever reason you’d like to disable the safety checker, pass safety_checker=None to the from_pretrained() method.

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", safety_checker=None, use_safetensors=True)
"""
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`.
Ensure that you abide by the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public.
Both the diffusers team and Hugging Face strongly recommend keeping the safety filter enabled in all public-facing circumstances, disabling it only for use cases that involve analyzing network behavior or auditing its results.
For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
"""

2. Checkpoint variants

A checkpoint variant is usually a checkpoint whose weights are:

  • Stored in a different floating point type, such as torch.float16, because it only requires half the bandwidth and storage to download. You can’t use this variant if you’re continuing training or using a CPU.
  • Non-exponential mean averaged (EMA) weights which shouldn’t be used for inference. You should use this variant to continue finetuning a model.

When the checkpoints have identical model structures, but they were trained on different datasets and with a different training setup, they should be stored in separate repositories. For example, stabilityai/stable-diffusion-2 and stabilityai/stable-diffusion-2-1 are stored in separate repositories.

Otherwise, a variant is identical to the original checkpoint. They have exactly the same serialization format (like safetensors), model structure, and their weights have identical tensor shapes.

checkpoint type    weight name                                    argument for loading weights
original           diffusion_pytorch_model.safetensors            -
floating point     diffusion_pytorch_model.fp16.safetensors       variant, torch_dtype
non-EMA            diffusion_pytorch_model.non_ema.safetensors    variant
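
If you want to check which variant files a repository actually ships before downloading anything, you can list its files with the huggingface_hub client. A small sketch (not part of the original text), filtering for the UNet weights so the original / fp16 / non_ema naming above is visible:

from huggingface_hub import list_repo_files

# List every file in the repo and keep only the UNet weight files.
files = list_repo_files("stable-diffusion-v1-5/stable-diffusion-v1-5")
print(sorted(f for f in files if f.startswith("unet/")))
# e.g. ['unet/config.json', 'unet/diffusion_pytorch_model.bin',
#       'unet/diffusion_pytorch_model.fp16.safetensors', ...]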

There are two important arguments for loading variants:

  • torch_dtype specifies the floating point precision of the loaded checkpoint. For example, if you want to save bandwidth by loading an fp16 variant, you should set variant="fp16" and torch_dtype=torch.float16 to convert the weights to fp16. Otherwise, the fp16 weights are converted to the default fp32 precision.

    If you only set torch_dtype=torch.float16, the default fp32 weights are downloaded first and then converted to fp16.

  • variant specifies which files should be loaded from the repository. For example, if you want to load a non-EMA variant of a UNet from stable-diffusion-v1-5/stable-diffusion-v1-5, set variant="non_ema" to download the non_ema file (a component-level sketch follows the pipeline examples below).

fp16

from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
)

non-EMA

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema", use_safetensors=True
)
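
The variant argument also works when loading a single component with subfolder, as mentioned in the bullet above. A minimal sketch of loading only the non-EMA UNet, assuming the repository ships unet/diffusion_pytorch_model.non_ema.safetensors:

from diffusers import UNet2DConditionModel

# Load just the UNet subfolder, selecting the non-EMA weight file.
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="unet",
    variant="non_ema",
    use_safetensors=True,
)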

Use the variant parameter in the DiffusionPipeline.save_pretrained method to save a checkpoint as a different floating point type or as a non-EMA variant. You should try to save a variant to the same folder as the original checkpoint, so you have the option of loading both from the same folder.

fp16

from diffusers import DiffusionPipeline

pipeline.save_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16")

non-EMA

pipeline.save_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema")

If the folder only contains the variant weights rather than the original checkpoint, you must specify the variant argument; otherwise from_pretrained throws an Exception because it can't find the original checkpoint.

# this won't work
pipeline = DiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
)

# this works
pipeline = DiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
)

3. DiffusionPipeline explained

As a class method, DiffusionPipeline.from_pretrained is responsible for two things:

  • Download the latest version of the folder structure required for inference and cache it. If the latest folder structure is available in the local cache, DiffusionPipeline.from_pretrained reuses the cache and won’t redownload the files.
  • Load the cached weights into the correct pipeline class - retrieved from the model_index.json file - and return an instance of it.

The pipelines’ underlying folder structure corresponds directly with their class instances. For example, the StableDiffusionPipeline corresponds to the folder structure in stable-diffusion-v1-5/stable-diffusion-v1-5.

from diffusers import DiffusionPipeline

repo_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
pipeline = DiffusionPipeline.from_pretrained(repo_id, use_safetensors=True)
print(pipeline)

You’ll see pipeline is an instance of StableDiffusionPipeline, which consists of seven components:

  • "feature_extractor": a class transformers.CLIPImageProcessor from Transformers.
  • "safety_checker": a component for screening against harmful content.
  • "scheduler": an instance of class diffusers.PNDMScheduler.
  • "text_encoder": a class transformers.CLIPTextModel from Transformers.
  • "tokenizer": a class transformers.CLIPTokenizer from Transformers.
  • "unet": an instance of class diffusers.UNet2DConditionModel.
  • "vae": an instance of class diffusers.AutoencoderKL.
StableDiffusionPipeline {
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}

Compare the components of the pipeline instance to the stable-diffusion-v1-5/stable-diffusion-v1-5 folder structure, and you’ll see there is a separate folder for each of the components in the repository:

.
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── safety_checker
│   ├── config.json
│   ├── model.fp16.safetensors
│   ├── model.safetensors
│   ├── pytorch_model.bin
│   └── pytorch_model.fp16.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   ├── model.fp16.safetensors
│   ├── model.safetensors
│   ├── pytorch_model.bin
│   └── pytorch_model.fp16.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   ├── diffusion_pytorch_model.bin
│   ├── diffusion_pytorch_model.fp16.bin
│   ├── diffusion_pytorch_model.fp16.safetensors
│   ├── diffusion_pytorch_model.non_ema.bin
│   ├── diffusion_pytorch_model.non_ema.safetensors
│   └── diffusion_pytorch_model.safetensors
└── vae
    ├── config.json
    ├── diffusion_pytorch_model.bin
    ├── diffusion_pytorch_model.fp16.bin
    ├── diffusion_pytorch_model.fp16.safetensors
    └── diffusion_pytorch_model.safetensors

You can access each of the components of the pipeline as an attribute to view its configuration:

pipeline.tokenizer
CLIPTokenizer(
    name_or_path="/root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/39593d5650112b4cc580433f6b0435385882d819/tokenizer",
    vocab_size=49408,
    model_max_length=77,
    is_fast=False,
    padding_side="right",
    truncation_side="right",
    special_tokens={
        "bos_token": AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "eos_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "unk_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "pad_token": "<|endoftext|>",
    },
    clean_up_tokenization_spaces=True
)
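
Because components are exposed as plain attributes, you can also swap one out after loading. A short sketch (an illustrative choice, not from the original text) replacing the default PNDMScheduler with a DPMSolverMultistepScheduler built from the existing scheduler's config:

from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True
)

# Build a new scheduler from the current scheduler's config and assign it.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
print(pipeline.scheduler)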

Every pipeline expects a model_index.json file that tells the DiffusionPipeline:

  • which pipeline class to load from _class_name
  • which version of Diffusers was used to create the model in _diffusers_version
  • what components from which library are stored in the subfolders (name corresponds to the component and subfolder name, library corresponds to the name of the library to load the class from, and class corresponds to the class name)

{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
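
If you want to inspect this file without loading the full pipeline, you can download just model_index.json with huggingface_hub and parse it. A small sketch (not part of the original text):

import json
from huggingface_hub import hf_hub_download

# Download only model_index.json from the repository and parse it.
index_path = hf_hub_download(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", filename="model_index.json"
)
with open(index_path) as f:
    index = json.load(f)

# Each component name maps to a [library, class] pair used by from_pretrained.
print(index["_class_name"])  # StableDiffusionPipeline
print(index["unet"])         # ['diffusers', 'UNet2DConditionModel']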

