A Detailed Guide to the Parameters You Can Pass to a Pipeline in the Diffusers Library

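Preface

Whether deploying in the cloud or simply calling Diffusers locally to generate images, I found that neither the Hugging Face Diffusers website, the GitHub repository, nor the many tutorials online clearly list the parameters that can be passed to a pipeline. After stepping through the code line by line, I finally found an explanation of the pipeline's input parameters in the diffusers source code. It reads as follows: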

  1. prompt (str or List[str], optional):
    The prompt or prompts to guide image generation. If not defined, you need to pass prompt_embeds.
  2. height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor):
    The height in pixels of the generated image.
  3. width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor):
    The width in pixels of the generated image.
  4. num_inference_steps (int, optional, defaults to 50):
    The number of denoising steps. More denoising steps usually lead to a higher quality image at the
    expense of slower inference.
  5. guidance_scale (float, optional, defaults to 7.5):
    A higher guidance scale value encourages the model to generate images closely linked to the text
    prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
  6. negative_prompt (str or List[str], optional):
    The prompt or prompts to guide what to not include in image generation. If not defined, you need to
    pass negative_prompt_embeds instead. Ignored when not using guidance (guidance_scale < 1).
  7. num_images_per_prompt (int, optional, defaults to 1):
    The number of images to generate per prompt.
  8. eta (float, optional, defaults to 0.0):
    Corresponds to parameter eta (η) from the DDIM paper. Only applies to the [~schedulers.DDIMScheduler], and is ignored in other schedulers.
  9. generator (torch.Generator or List[torch.Generator], optional):
    A torch.Generator to make generation deterministic.
  10. latents (torch.FloatTensor, optional):
    Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image
    generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
    tensor is generated by sampling using the supplied random generator.
  11. prompt_embeds (torch.FloatTensor, optional):
    Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
    provided, text embeddings are generated from the prompt input argument.
  12. negative_prompt_embeds (torch.FloatTensor, optional):
    Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
    not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
  13. output_type (str, optional, defaults to "pil"):
    The output format of the generated image. Choose between PIL.Image or np.array.
  14. return_dict (bool, optional, defaults to True):
    Whether or not to return a [~pipelines.stable_diffusion.StableDiffusionPipelineOutput] instead of a
    plain tuple.
  15. callback (Callable, optional):
    A function that calls every callback_steps steps during inference. The function is called with the
    following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
  16. callback_steps (int, optional, defaults to 1):
    The frequency at which the callback function is called. If not specified, the callback is called at
    every step.
  17. cross_attention_kwargs (dict, optional):
    A kwargs dictionary that if specified is passed along to the [AttentionProcessor] as defined in
    self.processor.
  18. guidance_rescale (float, optional, defaults to 0.7):
    Guidance rescale factor from Common Diffusion Noise Schedules and Sample Steps are Flawed.
    Guidance rescale factor should fix overexposure when using zero terminal SNR.
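
To make the height, width, and latents entries more concrete, here is a minimal sketch of how the default image size and the expected latents shape fall out of the model configuration. The helper names default_size and latent_shape are hypothetical and are not part of diffusers. The values 64, 8, and 4 are my assumptions for Stable Diffusion v1.5 (unet sample_size, VAE scale factor, and latent channels); check them against the model you actually load.

```python
def default_size(sample_size: int, vae_scale_factor: int) -> int:
    """Default height/width in pixels: unet sample_size * vae_scale_factor."""
    return sample_size * vae_scale_factor


def latent_shape(num_prompts, num_images_per_prompt, height, width,
                 vae_scale_factor=8, latent_channels=4):
    """Shape a pre-generated `latents` tensor would need for these settings.

    The defaults (8 and 4) are assumed Stable Diffusion v1.5 values.
    """
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError(
            f"height and width must be divisible by {vae_scale_factor}, "
            f"got {height}x{width}"
        )
    # One latent per generated image: prompts times images per prompt.
    batch = num_prompts * num_images_per_prompt
    return (batch, latent_channels,
            height // vae_scale_factor, width // vae_scale_factor)


print(default_size(64, 8))           # 512
print(latent_shape(1, 1, 768, 768))  # (1, 4, 96, 96)
print(latent_shape(2, 3, 512, 512))  # (6, 4, 64, 64)
```

Under these assumptions, the latent batch grows with both the number of prompts and num_images_per_prompt, which helps explain why a large num_images_per_prompt can exhaust GPU memory.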

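Of these, the ones we use most often are:

  1. prompt: the positive prompt
  2. height, width: the height and width of the generated image
  3. num_inference_steps: many tutorials skip this one; it sets the number of denoising steps in the diffusion process (more steps usually give better quality but slower inference)
  4. guidance_scale: how closely the image follows the text; the higher the value, the closer the generated image is to the prompt (but bigger is not always better: very large values give poor image quality)
  5. negative_prompt: the negative prompt
  6. num_images_per_prompt: the number of images produced per call
  7. generator: random generator settings (for example, to fix the seed of the output)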
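
To show why guidance_scale behaves this way, here is a minimal sketch of classifier-free guidance on plain Python lists. The function name apply_guidance and the toy numbers are mine; in the real pipeline the two inputs are the UNet noise predictions for the negative (or empty) prompt and for the positive prompt, and they are tensors rather than lists.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction and
    move toward the text-conditioned one, scaled by guidance_scale."""
    return [u + guidance_scale * (t - u)
            for u, t in zip(noise_uncond, noise_text)]


uncond = [0.0, 1.0, -1.0]  # toy prediction for the negative/empty prompt
text = [1.0, 1.0, 0.0]     # toy prediction for the positive prompt

print(apply_guidance(uncond, text, 1.0))  # [1.0, 1.0, 0.0]
print(apply_guidance(uncond, text, 7.5))  # [7.5, 1.0, 6.5]
```

At a scale of 1 the result is just the text-conditioned prediction, which fits the docstring's note that guidance is only enabled when guidance_scale > 1 and that negative_prompt is ignored otherwise. Larger scales exaggerate the difference between the two predictions, which is one plausible reason very high values hurt image quality.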

The other parameters are rarely needed.
The pipeline source file is diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py, around line 546.

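Example:

from diffusers import StableDiffusionPipeline
import torch  # PyTorch must be installed; diffusers runs on top of it

# Download (or load from cache) the Stable Diffusion v1.5 weights
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id)
pipe = pipe.to("cuda")  # move the model to the GPU

# Generate one 768x768 image with a guidance scale of 7
image = pipe("An image of a squirrel in Picasso style", height=768, width=768, guidance_scale=7).images[0]
image.save("squirrel.png")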
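
The example above uses only a few of the common parameters. Here is a hedged sketch that also sets negative_prompt, num_inference_steps, num_images_per_prompt, and a seeded generator. The helpers common_call_kwargs and generate are hypothetical names of my own, not diffusers APIs, and the negative prompt text and seed are arbitrary. The pipeline call is left commented out because it needs the pipe object created above.

```python
def common_call_kwargs(prompt, negative_prompt=None, height=512, width=512,
                       num_inference_steps=50, guidance_scale=7.5,
                       num_images_per_prompt=1):
    """Gather the frequently used pipeline arguments into one dict."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "height": height,
        "width": width,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "num_images_per_prompt": num_images_per_prompt,
    }


def generate(pipe, seed, **call_kwargs):
    """Call the pipeline with a seeded generator so the run is repeatable."""
    import torch  # lazy import: the helper above stays usable without torch

    generator = torch.Generator(device="cpu").manual_seed(seed)
    return pipe(generator=generator, **call_kwargs).images


kwargs = common_call_kwargs(
    "An image of a squirrel in Picasso style",
    negative_prompt="blurry, low quality",
    height=768,
    width=768,
    num_inference_steps=30,
    guidance_scale=7,
    num_images_per_prompt=2,
)

# With the `pipe` object from the example above:
# images = generate(pipe, seed=42, **kwargs)
# for i, img in enumerate(images):
#     img.save(f"squirrel_{i}.png")
```

Reusing the same seed with the same arguments should give the same images on the same hardware and library versions, which is handy when comparing the effect of a single parameter such as guidance_scale.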