Qwen-7B-Chat-Int4 fine-tuning error: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU

杜波超

Last modified: 2024-06-28 13:04:55


Tags: algorithms, python, computer vision, natural language processing, pytorch

First published: 2024-06-28 13:00:33

Article link: https://blog.csdn.net/dubochao_xinxi/article/details/140040566


Fine-tuning Qwen-7B-Chat-Int4 fails with the following error:

Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU
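Why this happens: as the message itself says, when a GPTQ-quantized checkpoint such as Qwen-7B-Chat-Int4 is loaded with the Exllama backend enabled, every module must sit on the GPU, so a model placed on CPU (or offloaded to disk) is rejected. The snippet below is a simplified, hypothetical re-creation of that check, intended only to show the logic. The function name `check_exllama_placement` is made up, and the real check inside the GPTQ loading code may differ in detail. The `{"": "cpu"}` map roughly corresponds to loading the whole model with `device_map="cpu"`.

```python
# Simplified, hypothetical illustration of the load-time placement check.
# The real check lives in the GPTQ loading code and may differ in detail.


def check_exllama_placement(device_map: dict, use_exllama: bool) -> None:
    """Raise if Exllama is enabled while any module is mapped to CPU or disk."""
    offloaded = {"cpu", "disk"}
    if use_exllama and any(str(dev) in offloaded for dev in device_map.values()):
        raise ValueError(
            "Found modules on cpu/disk. Using Exllama backend requires all the "
            "modules to be on GPU"
        )


# Every module on GPU 0: passes.
check_exllama_placement({"transformer": 0, "lm_head": 0}, use_exllama=True)

# Whole model on CPU with Exllama still enabled: raises.
try:
    check_exllama_placement({"": "cpu"}, use_exllama=True)
except ValueError as err:
    print("Error:", err)

# Same placement with Exllama turned off: passes.
check_exllama_placement({"": "cpu"}, use_exllama=False)
```

This is why the fix below works: turning Exllama off removes the GPU-only requirement, so a CPU placement is accepted.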

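To fix it, edit the config.json file in the model directory and add disable_exllama: true to the quantization_config section, so that the quantized model no longer uses the GPU-only Exllama backend. The steps are as follows: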

Step 1: Locate the `config.json` file

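config.json is usually located in the directory of the model you downloaded. For example, if your model directory is Qwen/Qwen-7B-Chat-Int4, config.json should be directly inside that directory.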
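If you are not sure which config.json belongs to the quantized model (for instance, when several models share one download folder), a short search can help. This is a sketch using only the Python standard library; `find_quantized_configs` is a hypothetical helper, and it assumes the quantized checkpoint can be recognized by a top-level quantization_config key in its config.json.

```python
import json
from pathlib import Path
from typing import List


def find_quantized_configs(root: str) -> List[Path]:
    """Return every config.json under `root` whose top level has a quantization_config key."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []  # nothing to search
    matches = []
    for path in sorted(root_path.rglob("config.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable, non-UTF-8, or malformed files
        if isinstance(data, dict) and "quantization_config" in data:
            matches.append(path)
    return matches
```

For example, `find_quantized_configs("Qwen/Qwen-7B-Chat-Int4")` lists the matching file(s) in that directory, or returns an empty list if the directory does not exist.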

Step 2: Edit the `config.json` file

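Open config.json, find the quantization_config section, and add "disable_exllama": true, "use_exllama": false. Newer versions of transformers replaced disable_exllama with use_exllama, so setting both keys covers older and newer versions.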

{
  ...
  "quantization_config": {
    ...
    "disable_exllama": true,
    "use_exllama": false
  },
  ...
}
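Instead of editing the file by hand, the same change can be scripted. A minimal sketch, assuming config.json already contains a quantization_config object; `patch_exllama_flags` is a hypothetical helper name, not part of transformers. It rewrites the file with 2-space indentation, so the original formatting is not preserved byte-for-byte.

```python
import json
from pathlib import Path


def patch_exllama_flags(config_path: str) -> dict:
    """Turn off Exllama in config.json and return the updated quantization_config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    quant = config.get("quantization_config")
    if not isinstance(quant, dict):
        raise KeyError(f"{path} has no quantization_config section")
    quant["disable_exllama"] = True  # flag name read by older versions
    quant["use_exllama"] = False     # flag name read by newer versions
    # Write the whole config back; ensure_ascii=False keeps non-ASCII text readable
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return quant
```

Call it as `patch_exllama_flags("Qwen/Qwen-7B-Chat-Int4/config.json")`; it returns the updated quantization_config so you can confirm both flags before loading the model.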

Step 3: Save the file and load the model

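Save the modified config.json, then load the model with the following code. The example loads a local model directory on CPU, wraps it with a LoRA adapter, and asks it to classify a set of sample book reviews: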

# -*- coding: utf-8 -*-
import torch
from modelscope import AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoConfig

device = "cpu"  # the device to load the model onto


def chat(model, prompt):
    """Send a single user prompt to the model and return its decoded reply.

    Uses the module-level `tokenizer` created in the __main__ block below.
    """
    messages = [{"role": "user", "content": prompt}]
    # Render the conversation with the tokenizer's chat template
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)

    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=256
    )
    # Drop the prompt tokens so only the newly generated tokens are decoded
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response


if __name__ == "__main__":
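    # Load the model weights with Transformers.
    # Local model directory; point this at your Int4 model (e.g. Qwen/Qwen-7B-Chat-Int4).
    model_dir = "qwen/Qwen-1_8B-Chat/"
    tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False, trust_remote_code=True)
    config = AutoConfig.from_pretrained(model_dir, trust_remote_code=True)
    # The Exllama flags are read from quantization_config (present only in GPTQ/Int4
    # checkpoints), not from the top level of the config, so set them there.
    quant_cfg = getattr(config, "quantization_config", None)
    if isinstance(quant_cfg, dict):
        quant_cfg["disable_exllama"] = True
        quant_cfg["use_exllama"] = False
    model = AutoModelForCausalLM.from_pretrained(model_dir,
                                                 config=config,
                                                 device_map="cpu",
                                                 torch_dtype=torch.bfloat16
                                                 ).eval()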

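    # LoRA configuration. Qwen (v1) models name their linear layers
    # c_attn / c_proj / w1 / w2; names such as q_proj ... down_proj belong to
    # LLaMA-style architectures and are not found in a Qwen v1 model.
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        target_modules=["c_attn", "c_proj", "w1", "w2"],
        inference_mode=True,  # inference mode
        r=8,  # LoRA rank
        lora_alpha=32,  # LoRA alpha (scaling factor); see the LoRA paper for details
        lora_dropout=0.1,  # dropout ratio
    )

    model1 = get_peft_model(model, lora_config)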

    # Sample Chinese book reviews used as test inputs (kept in the original language)
    book_review = ["很生气，一晚上看完，只有生气。太矫情了",
                   "这个标题真是太贴切了，真的是罪，真的是美。",
                   "没有才调，看在材料份上加一星。",
                   "电影更值得一看",
                   "历史书做成这样真是太赞了！",
                   "废话太多",
                   "啥玩意，情色系的啊。故事一般。看的困。",
                   "你是猴子请来的逗逼吗。",
                   "人生读过最狗血的书之一  除了对了解穆斯林信仰风俗有所帮助之外都是狗血",
                   "作为资深影迷，这本书必读",
                   "两章果断弃！",
                   "跟看我高中同学的日记本差不多。",
                   "文不对题，读不下去。",
                   "莫非法国人的法语水平都堕落了？",
                   "就不加友情分了…",
                   "如隔夜白开，索然无味。",
                   "2015.1025  融合了我喜欢的所有元素，校园爱情、破镜重圆、高干子弟，可是却写不出一篇让人有一口气读下去的好文。",
                   "没多大意思，文笔俏皮轻佻得刻意。",
                   "据说抄袭大风刮过的《桃花债》和公子欢喜的《思凡》，呵呵哒",
                   "第二遍"]

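    # Prompt: "Review: {} Classify the review above as 好评 (positive) or 差评 (negative);
    # reply with only 好评 or 差评"
    prompt = "评论：{} 请将以上评论分类到 好评 或 差评（你只需要回复 好评 或 差评）"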

    for review in book_review:
        new_prompt = prompt.format(review)
        response = chat(model1,new_prompt)
        print(response, review)
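The loop above prints each raw reply next to its review. To turn replies into labels, a small post-processing step can help, since the model may add whitespace or extra words around the label. A minimal sketch; `parse_label` and `summarize` are hypothetical helpers, and treating a reply that contains both labels or neither as unparseable (`None`) is a design choice, not something the original script does.

```python
from collections import Counter
from typing import Iterable, Optional

# The two labels the prompt asks the model to choose from
LABELS = ("好评", "差评")  # positive review, negative review


def parse_label(response: str) -> Optional[str]:
    """Return the single label found in the reply, or None if it has both or neither."""
    found = [label for label in LABELS if label in response]
    return found[0] if len(found) == 1 else None


def summarize(responses: Iterable[str]) -> Counter:
    """Count replies per parsed label; the None key counts unparseable replies."""
    return Counter(parse_label(r) for r in responses)
```

Inside the loop you could call `parse_label(response)` before printing, and pass all collected replies to `summarize` to get a quick tally.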

Notes

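If you load the model from a remote model hub (for example, by letting from_pretrained download it), there is no local config.json to edit yet. Download the model to a local directory first, edit its config.json, and then load the model from that local directory.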

You can download the model to a local directory with the following code:

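from huggingface_hub import snapshot_download

model_name = "Qwen/Qwen-7B-Chat-Int4"
# Download the repository files only (config.json, weights, tokenizer, remote code).
# Calling AutoModelForCausalLM.from_pretrained here would load the model and can hit
# the same Exllama error before you get a chance to edit config.json.
local_path = snapshot_download(repo_id=model_name, local_dir="./local_model/Qwen/Qwen-7B-Chat-Int4")
print(local_path)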

Then edit ./local_model/Qwen/Qwen-7B-Chat-Int4/config.json as described in Step 2.
