Project page:
GitHub - huggingface/diffusers: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
https://github.com/huggingface/diffusers/tree/main
Running the LoRA fine-tuning example train_text_to_image_lora.py fails with the following error:
Traceback (most recent call last):
  File "/root/LoRA/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 969, in <module>
    main()
  File "/root/LoRA/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 796, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/accelerate/accelerator.py", line 2101, in clip_grad_norm_
    self.unscale_gradients()
  File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/accelerate/accelerator.py", line 2064, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 284, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/root/miniconda3/envs/myconda/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 212, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
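Some context on the last frame: GradScaler raises this ValueError when a gradient it is asked to unscale is stored in torch.float16, which means the trainable parameters themselves are in fp16. The scaler is part of fp16 training because fp16 has a narrow exponent range and small gradients underflow to zero. bf16 keeps the float32 exponent range, and as far as I understand accelerate does not set up a GradScaler for bf16. The sketch below only illustrates that numeric-range difference with the Python standard library. The helper names `to_fp16` and `to_bf16` and the value 1e-8 are made up for illustration, and bf16 is approximated by truncation rather than proper rounding.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision (struct format 'e') and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of the float32
    encoding (truncation; real hardware usually rounds to nearest)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


tiny_grad = 1e-8        # illustrative gradient magnitude (assumption)
loss_scale = 65536.0    # 2**16, the default initial scale of torch's GradScaler

# fp16 cannot represent the value at all: it underflows to zero.
print("fp16 unscaled:", to_fp16(tiny_grad))

# Scaling before the fp16 cast keeps it representable; dividing the scale
# back out afterwards is the "unscale" step seen in the traceback.
print("fp16 scaled then unscaled:", to_fp16(tiny_grad * loss_scale) / loss_scale)

# bf16 has the exponent range of float32, so no scaling is needed.
print("bf16 unscaled:", to_bf16(tiny_grad))
```

The trade-off is precision: bf16 keeps fewer mantissa bits than fp16, so values are coarser even though they do not underflow.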
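The GitHub issues suggest various fixes, but none of the ones I tried worked (possibly because I misunderstood them).
In the end I stumbled on something that did:
Solution: switch mixed precision from fp16 to bf16 (note that bf16 must be set both in accelerate config and in the training script's launch command).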
# --mixed_precision was originally "fp16"
accelerate launch --mixed_precision="bf16" train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--dataloader_num_workers=8 \
--resolution=512 \
--center_crop \
--random_flip \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--max_train_steps=15000 \
--learning_rate=1e-04 \
--max_grad_norm=1 \
--lr_scheduler="cosine" \
--lr_warmup_steps=0 \
--output_dir=${OUTPUT_DIR} \
--push_to_hub \
--hub_model_id=${HUB_MODEL_ID} \
--report_to=wandb \
--checkpointing_steps=500 \
--validation_prompt="A pokemon with blue eyes." \
--seed=1337
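Since the fix depends on the saved accelerate config also saying bf16, it can help to check what `accelerate config` actually wrote. Here is a minimal sketch, assuming the config sits at accelerate's usual default location `~/.cache/huggingface/accelerate/default_config.yaml` (it may be elsewhere if the Hugging Face cache directory was moved or a custom config file is passed). `read_mixed_precision` is a hypothetical helper, not part of accelerate, and it uses a simple regex rather than a full YAML parser.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed default location of the file written by `accelerate config`.
DEFAULT_CONFIG = (
    Path.home() / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"
)


def read_mixed_precision(yaml_text: str) -> Optional[str]:
    """Return the value of the top-level `mixed_precision` key, or None
    if the key is absent. Handles optionally quoted values such as 'no'."""
    match = re.search(
        r"^mixed_precision:\s*['\"]?(\w+)['\"]?\s*$",
        yaml_text,
        flags=re.MULTILINE,
    )
    return match.group(1) if match else None


if __name__ == "__main__":
    if DEFAULT_CONFIG.exists():
        value = read_mixed_precision(DEFAULT_CONFIG.read_text())
        print(f"{DEFAULT_CONFIG}: mixed_precision = {value}")
        if value != "bf16":
            print("Saved config is not bf16; re-run `accelerate config`.")
    else:
        print(f"No config found at {DEFAULT_CONFIG}")
```

If the printed value is still fp16, re-running `accelerate config` and choosing bf16 brings the saved config in line with the launch command above.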