Problem Description
While running the open-source project LuXun-GPT, the following error was raised when loading the LoRA model with peft:
Traceback (most recent call last):
File "/xxx/LuXun-GPT/inference.py", line 52, in <module>
peft_model = PeftModel.from_pretrained(model, args.lora).eval()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/xxx/.conda/envs/luxuntest/lib/python3.11/site-packages/peft/peft_model.py", line 231, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "/xxx/.conda/envs/luxuntest/lib/python3.11/site-packages/peft/peft_model.py", line 500, in load_adapter
load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/xxx/.conda/envs/luxuntest/lib/python3.11/site-packages/peft/utils/save_and_load.py", line 123, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/xxx/.conda/envs/luxuntest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.transformer.layers.0.attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([16, 4096]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.transformer.layers.0.attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([8192, 8, 1]) from checkpoint, the shape in current model is torch.Size([12288, 8]).
size mismatch for base_model.model.transformer.layers.1.attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([16, 4096]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.transformer.layers.1.attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([8192, 8, 1]) from checkpoint, the shape in current model is torch.Size([12288, 8]).
... (the same lora_A / lora_B size mismatch is reported for every remaining layer, layers 2 through 27) ...
Judging from the error message alone, the base model and the LoRA adapter clearly disagree in tensor sizes. Since ChatGLM ships in two versions, 0.1.0 and 1.1.0, and the LoRA weights trained and released with the project should themselves be fine, my first guess was that the ChatGLM version was the culprit; however, both versions were tried and neither fixed the error.
Solution
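To see where the mismatch comes from before from_pretrained is even called, it helps to inspect the adapter files directly. The sketch below is my own addition and not part of LuXun-GPT; it assumes the standard peft file layout of this era (adapter_config.json plus adapter_model.bin) and uses a placeholder path in place of args.lora.

import json
import os
import torch

lora_dir = "path/to/lora"  # placeholder for the directory passed as args.lora

# The rank the PeftModel will be built with comes from adapter_config.json
with open(os.path.join(lora_dir, "adapter_config.json")) as f:
    adapter_config = json.load(f)
print("configured LoRA rank r =", adapter_config.get("r"))

# The saved weights may have been serialized by a different peft version
state = torch.load(os.path.join(lora_dir, "adapter_model.bin"), map_location="cpu")
for name, tensor in list(state.items())[:4]:
    print(name, tuple(tensor.shape))  # e.g. a lora_A weight stored as (16, 4096) here

Comparing the printed checkpoint shapes with the shapes in the traceback makes it obvious that the weights on disk and the model that peft constructs are built under different assumptions.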
pip install peft==0.2.0
Testing revealed that the peft library changed how LoRA models are loaded between the 0.3.0dev release and the final 0.3.0 release.
The project's README recommends installing the latest peft, but the author apparently developed against the 0.3.0dev version (presumably not expecting peft to change so much afterwards), so downgrading peft to an earlier release resolves the problem.
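As a small safeguard (my own suggestion, not from the project), the inference script can assert the installed peft version before loading the adapter, so a mismatched environment fails with a clear message instead of the size-mismatch traceback above; model and args.lora are the same objects as in the project's inference.py.

import peft
from peft import PeftModel

# Fail early if the environment drifted away from the version the adapter was saved with
assert peft.__version__ == "0.2.0", f"unexpected peft version: {peft.__version__}"
peft_model = PeftModel.from_pretrained(model, args.lora).eval()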