Continuing from the previous post: I trained bloomz with PEFT's LoRA. After one epoch the model still seemed under-trained, so I tried to load the checkpoint and resume training:
trainer.train(resume_from_checkpoint='<checkpoint dir>')
This raised an error:
raise ValueError(f"Can't find a valid checkpoint at {resume_from_checkpoint}")
ValueError: Can't find a valid checkpoint at <checkpoint dir>
See Peft Model not resuming from Checkpoint · Issue #24252 · huggingface/transformers · GitHub.
The root cause is Trainer._load_from_checkpoint: it only looks for full-model weight files (pytorch_model.bin and friends), but a PEFT checkpoint contains only the adapter weights, so the validity check fails and the ValueError above is raised.
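You can see why by listing the checkpoint directory; the file names below are typical for a PEFT LoRA checkpoint and may vary by version:

import os
print(os.listdir('<checkpoint dir>'))
# e.g. ['adapter_model.bin', 'adapter_config.json', 'optimizer.pt',
#       'scheduler.pt', 'trainer_state.json', 'rng_state.pth']
# There is no pytorch_model.bin, which is what the stock loader checks for.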
Fix: define a Trainer subclass that overrides the checkpoint-loading method, and use this subclass to create the trainer object. First the imports:
from transformers import Trainer
import os
from peft import PeftModel
from transformers.utils import (
    ADAPTER_SAFE_WEIGHTS_NAME,
    ADAPTER_WEIGHTS_NAME,
    is_sagemaker_mp_enabled,
)
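
Then the override itself. This is a minimal sketch following the workaround discussed in the issue thread (the class name PeftTrainer is my own, not from the issue): if the model is a PeftModel and the checkpoint directory contains adapter weights, reload the adapter in trainable mode; otherwise defer to the stock loader.

class PeftTrainer(Trainer):
    def _load_from_checkpoint(self, resume_from_checkpoint, model=None):
        # Under SageMaker model parallelism, keep the default behaviour.
        if is_sagemaker_mp_enabled():
            return super()._load_from_checkpoint(resume_from_checkpoint, model=model)
        if model is None:
            model = self.model
        # A PEFT checkpoint holds only adapter weights
        # (adapter_model.bin or adapter_model.safetensors), which the
        # stock loader does not treat as a valid checkpoint.
        adapter_weights = os.path.join(resume_from_checkpoint, ADAPTER_WEIGHTS_NAME)
        adapter_safe_weights = os.path.join(resume_from_checkpoint, ADAPTER_SAFE_WEIGHTS_NAME)
        if isinstance(model, PeftModel) and (
            os.path.exists(adapter_weights) or os.path.exists(adapter_safe_weights)
        ):
            # Reload the adapter in trainable mode so training can continue.
            model.load_adapter(resume_from_checkpoint, model.active_adapter, is_trainable=True)
        else:
            super()._load_from_checkpoint(resume_from_checkpoint, model=model)

Create the trainer from this subclass instead of Trainer and resume as before (the constructor arguments here are placeholders for your own setup):

trainer = PeftTrainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train(resume_from_checkpoint='<checkpoint dir>')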