Diffusers: Safety Checker, Checkpoint, DiffusionPipeline
https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading
https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/loading.md
1. Safety Checker
The safety checker helps identify and block generated images containing inappropriate content such as violence, pornography, or hate speech, keeping outputs within social norms and legal requirements.
Diffusers implements a safety checker for Stable Diffusion models, which can otherwise generate harmful content. The safety checker screens the generated output against known hardcoded not-safe-for-work (NSFW) content. If for whatever reason you'd like to disable the safety checker, pass `safety_checker=None` to the `from_pretrained()` method.
```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", safety_checker=None, use_safetensors=True
)
"""
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`.
Ensure that you abide by the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public.
Both the diffusers team and Hugging Face strongly recommend keeping the safety filter enabled in all public-facing circumstances, disabling it only for use cases that involve analyzing network behavior or auditing its results.
For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
"""
```
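The checker's contract is simple even though the real classifier is model-based: each generated image comes back either untouched or replaced (the real checker returns black images), together with a per-image boolean flag. A toy sketch of that interface, where `is_flagged` is a hypothetical stand-in predicate, not the actual NSFW classifier:

```python
# Toy sketch of the safety-checker contract: images in, screened images
# plus per-image booleans out. `is_flagged` is a made-up stand-in for
# the real model-based NSFW classifier.
from typing import Callable, List, Tuple

def screen_outputs(
    images: List[str],
    is_flagged: Callable[[str], bool],
) -> Tuple[List[str], List[bool]]:
    flags = [is_flagged(img) for img in images]
    # The real checker replaces flagged images with black images;
    # here we just substitute a placeholder string.
    screened = ["<blacked-out>" if f else img for img, f in zip(images, flags)]
    return screened, flags

# Usage with a dummy predicate that flags anything containing "nsfw":
images, flags = screen_outputs(["cat", "nsfw-thing"], lambda s: "nsfw" in s)
print(images, flags)
```

Passing `safety_checker=None` skips this screening step entirely, which is why the warning above asks you not to expose unfiltered results publicly.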
2. Checkpoint variants
A checkpoint variant is usually a checkpoint whose weights are:

- Stored in a different floating point type, such as `torch.float16`, because it only requires half the bandwidth and storage to download. You can't use this variant if you're continuing training or using a CPU.
- Non-exponential mean averaged (EMA) weights, which shouldn't be used for inference. You should use this variant to continue finetuning a model.
When the checkpoints have identical model structures, but they were trained on different datasets and with a different training setup, they should be stored in separate repositories. For example, stabilityai/stable-diffusion-2 and stabilityai/stable-diffusion-2-1 are stored in separate repositories.
Otherwise, a variant is identical to the original checkpoint. They have exactly the same serialization format (like safetensors), model structure, and their weights have identical tensor shapes.
| checkpoint type | weight name | argument for loading weights |
|---|---|---|
| original | `diffusion_pytorch_model.safetensors` | |
| floating point | `diffusion_pytorch_model.fp16.safetensors` | `variant`, `torch_dtype` |
| non-EMA | `diffusion_pytorch_model.non_ema.safetensors` | `variant` |
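The naming scheme in the table can be captured in a few lines. `weight_filename` below is a hypothetical helper that mirrors the pattern `diffusion_pytorch_model[.variant].safetensors` (or `.bin` for non-safetensors weights); it is not a Diffusers API, which resolves these names internally:

```python
# Hypothetical helper mirroring the variant naming convention above.
from typing import Optional

def weight_filename(variant: Optional[str] = None, use_safetensors: bool = True) -> str:
    base = "diffusion_pytorch_model"
    ext = "safetensors" if use_safetensors else "bin"
    return f"{base}.{variant}.{ext}" if variant else f"{base}.{ext}"

print(weight_filename())                               # diffusion_pytorch_model.safetensors
print(weight_filename("fp16"))                         # diffusion_pytorch_model.fp16.safetensors
print(weight_filename("non_ema", use_safetensors=False))  # diffusion_pytorch_model.non_ema.bin
```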
There are two important arguments for loading variants:

- `torch_dtype` specifies the floating point precision of the loaded checkpoint. For example, if you want to save bandwidth by loading a fp16 variant, you should set `variant="fp16"` and `torch_dtype=torch.float16` to convert the weights to fp16. Otherwise, the fp16 weights are converted to the default fp32 precision. If you only set `torch_dtype=torch.float16`, the default fp32 weights are downloaded first and then converted to fp16.
- `variant` specifies which files should be loaded from the repository. For example, if you want to load a non-EMA variant of a UNet from stable-diffusion-v1-5/stable-diffusion-v1-5, set `variant="non_ema"` to download the `non_ema` file.
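The "half the bandwidth" claim is easy to verify without any ML library: half precision stores 2 bytes per value versus 4 for fp32. A quick check with Python's stdlib `struct`, whose format `'e'` is the IEEE 754 half float:

```python
import struct

values = [0.5, -1.25, 3.0, 100.0]  # arbitrary example "weights"

fp32_bytes = struct.pack(f"{len(values)}f", *values)  # 4 bytes per value
fp16_bytes = struct.pack(f"{len(values)}e", *values)  # 2 bytes per value
print(len(fp32_bytes), len(fp16_bytes))  # 16 8

# These particular values are exactly representable in fp16, so they
# round-trip; in general fp16 reduces precision.
restored = list(struct.unpack(f"{len(values)}e", fp16_bytes))
```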
fp16:

```python
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
)
```
non-EMA:

```python
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema", use_safetensors=True
)
```
Use the `variant` parameter in the `DiffusionPipeline.save_pretrained` method to save a checkpoint as a different floating point type or as a non-EMA variant. You should try to save a variant to the same folder as the original checkpoint, so you have the option of loading both from the same folder.
fp16:

```python
from diffusers import DiffusionPipeline

pipeline.save_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16")
```

non-EMA:

```python
pipeline.save_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema")
```
If you don't save the variant to an existing folder, you must specify the `variant` argument; otherwise it'll throw an `Exception` because it can't find the original checkpoint.
```python
# this won't work
pipeline = DiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
)

# this works
pipeline = DiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
)
```
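That failure mode boils down to a lookup over the weight files actually present in the folder. `pick_weights` below is a hypothetical illustration of why loading fails when only variant files were saved and no `variant` is passed; it is not how Diffusers is implemented:

```python
# Hypothetical sketch of variant resolution: given the weight files in
# a folder, pick the one matching the requested variant, or fail the
# way from_pretrained does when the original checkpoint is missing.
from typing import List, Optional

def pick_weights(files: List[str], variant: Optional[str] = None) -> str:
    wanted = (
        f"diffusion_pytorch_model.{variant}.safetensors"
        if variant
        else "diffusion_pytorch_model.safetensors"
    )
    if wanted not in files:
        raise FileNotFoundError(f"no checkpoint named {wanted}")
    return wanted

files = ["diffusion_pytorch_model.fp16.safetensors"]  # only the fp16 variant was saved
print(pick_weights(files, variant="fp16"))  # resolves the fp16 file
try:
    pick_weights(files)  # no variant given -> original checkpoint not found
except FileNotFoundError as e:
    print(e)
```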
3. DiffusionPipeline explained
As a class method, `DiffusionPipeline.from_pretrained` is responsible for two things:

- Download the latest version of the folder structure required for inference and cache it. If the latest folder structure is available in the local cache, `DiffusionPipeline.from_pretrained` reuses the cache and won't redownload the files.
- Load the cached weights into the correct pipeline class, retrieved from the `model_index.json` file, and return an instance of it.
The pipelines' underlying folder structure corresponds directly with their class instances. For example, the `StableDiffusionPipeline` corresponds to the folder structure in stable-diffusion-v1-5/stable-diffusion-v1-5.
```python
from diffusers import DiffusionPipeline

repo_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
pipeline = DiffusionPipeline.from_pretrained(repo_id, use_safetensors=True)
print(pipeline)
```
You'll see pipeline is an instance of `StableDiffusionPipeline`, which consists of seven components:

- `"feature_extractor"`: a `transformers.CLIPImageProcessor` from Transformers.
- `"safety_checker"`: a component for screening against harmful content.
- `"scheduler"`: an instance of `diffusers.PNDMScheduler`.
- `"text_encoder"`: a `transformers.CLIPTextModel` from Transformers.
- `"tokenizer"`: a `transformers.CLIPTokenizer` from Transformers.
- `"unet"`: an instance of `diffusers.UNet2DConditionModel`.
- `"vae"`: an instance of `diffusers.AutoencoderKL`.
```
StableDiffusionPipeline {
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
Compare the components of the pipeline instance to the stable-diffusion-v1-5/stable-diffusion-v1-5
folder structure, and you’ll see there is a separate folder for each of the components in the repository:
```
.
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── safety_checker
│   ├── config.json
│   ├── model.fp16.safetensors
│   ├── model.safetensors
│   ├── pytorch_model.bin
│   └── pytorch_model.fp16.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   ├── model.fp16.safetensors
│   ├── model.safetensors
│   ├── pytorch_model.bin
│   └── pytorch_model.fp16.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   ├── diffusion_pytorch_model.bin
│   ├── diffusion_pytorch_model.fp16.bin
│   ├── diffusion_pytorch_model.fp16.safetensors
│   ├── diffusion_pytorch_model.non_ema.bin
│   ├── diffusion_pytorch_model.non_ema.safetensors
│   └── diffusion_pytorch_model.safetensors
└── vae
    ├── config.json
    ├── diffusion_pytorch_model.bin
    ├── diffusion_pytorch_model.fp16.bin
    ├── diffusion_pytorch_model.fp16.safetensors
    └── diffusion_pytorch_model.safetensors
```
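The invariant here (one subfolder per component, with `model_index.json` at the root) can be sketched by recreating a skeleton of this layout in a temporary directory and checking it; nothing below touches the real repository, and the component names are copied from the folder tree:

```python
# Recreate a skeleton of the pipeline folder layout and check the
# one-subfolder-per-component invariant.
import json
import tempfile
from pathlib import Path

components = ["feature_extractor", "safety_checker", "scheduler",
              "text_encoder", "tokenizer", "unet", "vae"]

root = Path(tempfile.mkdtemp())
(root / "model_index.json").write_text(json.dumps({"_class_name": "StableDiffusionPipeline"}))
for name in components:
    (root / name).mkdir()  # each component gets its own subfolder

subfolders = sorted(p.name for p in root.iterdir() if p.is_dir())
print(subfolders == sorted(components))  # True
```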
You can access each of the components of the pipeline as an attribute to view its configuration:
```python
pipeline.tokenizer
```

```
CLIPTokenizer(
    name_or_path="/root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/39593d5650112b4cc580433f6b0435385882d819/tokenizer",
    vocab_size=49408,
    model_max_length=77,
    is_fast=False,
    padding_side="right",
    truncation_side="right",
    special_tokens={
        "bos_token": AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "eos_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "unk_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "pad_token": "<|endoftext|>",
    },
    clean_up_tokenization_spaces=True
)
```
Every pipeline expects a `model_index.json` file that tells the `DiffusionPipeline`:

- which pipeline class to load from `_class_name`
- which version of Diffusers was used to create the model in `_diffusers_version`
- what components from which library are stored in the subfolders (`name` corresponds to the component and subfolder name, `library` corresponds to the name of the library to load the class from, and `class` corresponds to the class name)
```json
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
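The dispatch step of `from_pretrained` can be sketched by parsing such a `model_index.json`: read `_class_name`, then resolve each `[library, class]` pair to an object. The `REGISTRY` below is a toy stand-in for real class resolution, which imports the classes from `diffusers` and `transformers`:

```python
import json

# A trimmed-down model_index.json (two components for brevity).
model_index = json.loads("""
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "scheduler": ["diffusers", "PNDMScheduler"],
  "tokenizer": ["transformers", "CLIPTokenizer"]
}
""")

# Toy registry standing in for "import class `cls` from library `lib`".
REGISTRY = {
    ("diffusers", "PNDMScheduler"): "scheduler-object",
    ("transformers", "CLIPTokenizer"): "tokenizer-object",
}

pipeline_cls = model_index.pop("_class_name")   # which pipeline class to build
model_index.pop("_diffusers_version")           # metadata, not a component
components = {
    name: REGISTRY[(lib, cls)] for name, (lib, cls) in model_index.items()
}
print(pipeline_cls, sorted(components))
```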
References
[1] Yongqiang Cheng, https://yongqiang.blog.csdn.net/