Error when running Diffusers' train_text_to_image_lora.py: Attempting to unscale FP16 gradients.

Project: https://github.com/huggingface/diffusers/tree/main

Running the LoRA fine-tuning example train_text_to_image_lora.py produced the following error:

Traceback (most recent call last):
  File "/root/LoRA/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 969, in <module>
    main()
  File "/root/LoRA/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 796, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/accelerate/accelerator.py", line 2101, in clip_grad_norm_
    self.unscale_gradients()
  File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/accelerate/accelerator.py", line 2064, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 284, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 212, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
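The last frame is inside PyTorch's GradScaler, which raises this error when it is asked to unscale a gradient stored in float16. That usually means some trainable parameters are themselves in fp16. Below is a minimal diagnostic sketch, not part of the training script: the helper names `fp16_trainable_params` and `upcast_trainable_params` and the toy model are assumptions made for illustration. It lists trainable fp16 parameters and casts them to float32 while leaving frozen weights alone.

```python
import torch
from torch import nn


def fp16_trainable_params(model: nn.Module) -> list[str]:
    """Return names of parameters that require grad and are stored in float16."""
    return [
        name
        for name, p in model.named_parameters()
        if p.requires_grad and p.dtype == torch.float16
    ]


def upcast_trainable_params(model: nn.Module) -> None:
    """Cast trainable float16 parameters to float32; frozen ones are untouched."""
    for p in model.parameters():
        if p.requires_grad and p.dtype == torch.float16:
            p.data = p.data.to(torch.float32)


# Toy stand-in for the real model (hypothetical): the first layer is frozen,
# the second is trainable, and both start out in fp16.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2)).half()
for p in model[0].parameters():
    p.requires_grad_(False)

before = fp16_trainable_params(model)  # trainable fp16 parameters found
upcast_trainable_params(model)
after = fp16_trainable_params(model)   # none should remain
print(before, after)
```

Running the first helper on the real model before the training loop would show whether this is the cause in a given setup. Whether upcasting alone is enough for this script was not verified here.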

The project's issues suggest various fixes, but I tried them all and none of them worked (possibly because I misunderstood them).

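In the end I stumbled on a fix by accident:

Solution: switching from fp16 to bf16 resolved the error. Note that bf16 must be set in both accelerate config and the training script.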

# --mixed_precision was originally "fp16"
accelerate launch --mixed_precision="bf16"  train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --dataloader_num_workers=8 \
  --resolution=512 \
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-04 \
  --max_grad_norm=1 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir=${OUTPUT_DIR} \
  --push_to_hub \
  --hub_model_id=${HUB_MODEL_ID} \
  --report_to=wandb \
  --checkpointing_steps=500 \
  --validation_prompt="A pokemon with blue eyes." \
  --seed=1337
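bf16 is not available on every GPU, so it can be worth checking before launching with the command above. The following is a small sketch under that assumption. The helper name `pick_mixed_precision` is hypothetical, and falling back to "no" (full precision) rather than "fp16" is a choice made here to avoid reintroducing the error.

```python
import torch


def pick_mixed_precision() -> str:
    """Return "bf16" when the current CUDA device supports it, otherwise "no"."""
    # Check is_available() first so the bf16 query is only made when a GPU exists.
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return "bf16"
    return "no"


mode = pick_mixed_precision()
print(f'--mixed_precision="{mode}"')
```

The printed flag can then be passed to accelerate launch in place of the hard-coded value.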
