Problem Description
While running the open-source project LuXun-GPT, the following error was raised when loading the LoRA model with peft:
Traceback (most recent call last):
File "/xxx/LuXun-GPT/inference.py", line 52, in <module>
peft_model = PeftModel.from_pretrained(model, args.lora).eval()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/xxx/.conda/envs/luxuntest/lib/python3.11/site-packages/peft/peft_model.py", line 231, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "/xxx/.conda/envs/luxuntest/lib/python3.11/site-packages/peft/peft_model.py", line 500, in load_adapter
load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/xxx/.conda/envs/luxuntest/lib/python3.11/site-packages/peft/utils/save_and_load.py", line 123, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/xxx/.conda/envs/luxuntest/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.transformer.layers.0.attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([16, 4096]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.transformer.layers.0.attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([8192, 8, 1]) from checkpoint, the shape in current model is torch.Size([12288, 8]).
size mismatch for base_model.model.transformer.layers.1.attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([16, 4096]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.transformer.layers.1.attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([8192, 8, 1]) from checkpoint, the shape in current model is torch.Size([12288, 8]).
... (the same lora_A / lora_B size mismatch is reported for every remaining layer, layers 2 through 27) ...
Judging from the error message alone, the base model and the LoRA adapter clearly disagree in tensor sizes. Since ChatGLM ships in two versions, 0.1.0 and 1.1.0, and the LoRA weights trained and released with the project should themselves be fine, my first guess was that the ChatGLM version was the culprit; however, both versions were tried and neither fixed the error.
Solution
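To see where the mismatch comes from before from_pretrained is even called, it helps to inspect the adapter files directly. The sketch below is my own addition and not part of LuXun-GPT; it assumes the standard peft file layout of this era (adapter_config.json plus adapter_model.bin) and uses a placeholder path in place of args.lora.

import json
import os
import torch

lora_dir = "path/to/lora"  # placeholder for the directory passed as args.lora

# The rank the PeftModel will be built with comes from adapter_config.json
with open(os.path.join(lora_dir, "adapter_config.json")) as f:
    adapter_config = json.load(f)
print("configured LoRA rank r =", adapter_config.get("r"))

# The saved weights may have been serialized by a different peft version
state = torch.load(os.path.join(lora_dir, "adapter_model.bin"), map_location="cpu")
for name, tensor in list(state.items())[:4]:
    print(name, tuple(tensor.shape))  # e.g. a lora_A weight stored as (16, 4096) here

Comparing the printed checkpoint shapes with the shapes in the traceback makes it obvious that the weights on disk and the model that peft constructs are built under different assumptions.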
pip install peft==0.2.0
Testing revealed that the peft library changed how LoRA models are loaded between the 0.3.0dev release and the final 0.3.0 release.
The project's README recommends installing the latest peft, but the author apparently developed against the 0.3.0dev version (presumably not expecting peft to change so much afterwards), so downgrading peft to an earlier release resolves the problem.
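As a small safeguard (my own suggestion, not from the project), the inference script can assert the installed peft version before loading the adapter, so a mismatched environment fails with a clear message instead of the size-mismatch traceback above; model and args.lora are the same objects as in the project's inference.py.

import peft
from peft import PeftModel

# Fail early if the environment drifted away from the version the adapter was saved with
assert peft.__version__ == "0.2.0", f"unexpected peft version: {peft.__version__}"
peft_model = PeftModel.from_pretrained(model, args.lora).eval()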