A Summary of Stable Diffusion Pretrained Models


The relevant repositories and checkpoints on GitHub are currently rather scattered, so this post collects them in one place for easier reference.

Stable UnCLIP 2.1

New stable diffusion finetune (Stable unCLIP 2.1, Hugging Face) at 768x768 resolution, based on SD2.1-768.

This model allows for image variations and mixing operations as described in Hierarchical Text-Conditional Image Generation with CLIP Latents, and, thanks to its modularity, can be combined with other models such as KARLO.

Comes in two variants:
sd21-unclip-l.ckpt:
conditioned on CLIP ViT-L image embeddings
sd21-unclip-h.ckpt:
conditioned on CLIP ViT-H image embeddings

Instructions are available in the official stability-ai/stablediffusion repository.
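For reference, here is a minimal sketch of generating image variations with these unCLIP checkpoints through the diffusers library. The model id, file names, and half-precision setting are assumptions for illustration, not part of the original instructions.

```python
# Sketch: image variations with Stable unCLIP 2.1 via diffusers (assumed model id).
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from PIL import Image

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip",  # assumed id for the ViT-H variant
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("input.png").convert("RGB")  # hypothetical input image
variations = pipe(image=init_image).images           # image variation, no text prompt
variations[0].save("variation.png")
```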

Version 2.1

New stable diffusion model (Stable Diffusion 2.1-v) at 768x768 resolution and (Stable Diffusion 2.1-base) at 512x512 resolution; both have the same number of parameters and architecture as 2.0 and are fine-tuned from 2.0 on a less restrictive NSFW filtering of the LAION-5B dataset.

By default, the attention operation of the model is evaluated at full precision when xformers is not installed. To enable fp16 (which can cause numerical instabilities with the vanilla attention module on the v2.1 model), run your script as ATTN_PRECISION=fp16 python <thescript.py>
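If you run the checkpoints through the diffusers library rather than the reference scripts, half precision is requested when loading the pipeline instead of via the ATTN_PRECISION environment variable. A hedged sketch, assuming the Hugging Face model id stabilityai/stable-diffusion-2-1:

```python
# Sketch: loading Stable Diffusion 2.1 in fp16 with diffusers (alternative to the
# ATTN_PRECISION=fp16 flag used by the reference scripts). Model id is an assumption.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,  # run the UNet and attention in half precision
).to("cuda")

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```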

Version 2.0

  • New stable diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. Same number of parameters in the U-Net as 1.5, but uses OpenCLIP-ViT/H as the text encoder and is trained from scratch. SD 2.0-v is a so-called v-prediction model.
  • The above model is finetuned from SD 2.0-base (512-base-ema.ckpt), which was trained as a standard noise-prediction model on 512x512 images and is also made available.
  • Added an x4 upscaling latent text-guided diffusion model.
  • New depth-guided stable diffusion model, finetuned from SD 2.0-base. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis (a usage sketch follows this list).
  • A text-guided inpainting model, finetuned from SD 2.0-base.
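As an illustration of the depth-guided model in the list above, here is a minimal depth2img sketch using diffusers; the model id stabilityai/stable-diffusion-2-depth, the file names, and the strength value are assumptions for illustration.

```python
# Sketch: structure-preserving img2img with the depth-guided SD 2.0 model via diffusers.
# When no depth map is passed, the pipeline estimates depth internally (MiDaS-based).
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from PIL import Image

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",  # assumed Hugging Face model id
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("room.png").convert("RGB")  # hypothetical input image
result = pipe(
    prompt="a cozy living room, watercolor style",
    image=init_image,
    strength=0.7,  # how far to deviate from the input while keeping its structure
).images[0]
result.save("room_watercolor.png")
```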

Version 1

  • sd-v1-1.ckpt:
    237k steps at resolution 256x256 on laion2B-en. 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
  • sd-v1-2.ckpt:
    Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on laion-aesthetics v2 5+ (a subset of laion2B-en with estimated aesthetics score > 5.0, and additionally filtered to images with an original size >= 512x512, and an estimated watermark probability < 0.5. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using the LAION-Aesthetics Predictor V2).
  • sd-v1-3.ckpt:
    Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
  • sd-v1-4.ckpt:
    Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
  • sd-v1-5.ckpt:
    Resumed from sd-v1-2.ckpt. 595k steps at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
  • sd-v1-5-inpainting.ckpt:
    Resumed from sd-v1-5.ckpt. 440k steps of inpainting training at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask everything.
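The “10% dropping of the text-conditioning” mentioned for several checkpoints above is the standard classifier-free guidance training trick: with a fixed probability the caption is replaced by an empty prompt, so the model also learns an unconditional score. A minimal sketch of that idea (the function name and probability are illustrative, not the actual training code):

```python
import random

def maybe_drop_caption(caption: str, drop_prob: float = 0.1) -> str:
    """Classifier-free guidance training trick: with probability `drop_prob`,
    replace the caption by the empty string so the same model learns both the
    text-conditional and the unconditional distribution."""
    return "" if random.random() < drop_prob else caption

# At sampling time the two predictions are then combined, schematically:
#   eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```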

Inpainting

  • 512-inpainting-ema.ckpt
    Resumed from 512-base-ema.ckpt and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA which, in combination with the latent VAE representation of the masked image, is used as additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. The same strategy was used to train the 1.5-inpainting checkpoint.
  • sd-v1-5-inpainting.ckpt
    Resumed from sd-v1-2.ckpt. 595k steps at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. Then 440k steps of inpainting training at resolution 512x512 on “laion-aesthetics v2 5+” with 10% dropping of the text-conditioning. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask everything.
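For completeness, a hedged sketch of running one of these inpainting checkpoints through diffusers; the model id stabilityai/stable-diffusion-2-inpainting and the image/mask file names are assumptions for illustration.

```python
# Sketch: text-guided inpainting with the SD 2.0 inpainting checkpoint via diffusers.
# White pixels in the mask are repainted; black pixels are kept from the input image.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",  # assumed Hugging Face model id
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("photo.png").convert("RGB")  # hypothetical input image
mask = Image.open("mask.png").convert("RGB")    # hypothetical mask image

result = pipe(
    prompt="a small wooden bench in a park",
    image=image,
    mask_image=mask,
).images[0]
result.save("inpainted.png")
```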