Qwen-14B-Chat-Int4: A Detailed, Hand-Holding Guide to Fine-Tuning

昕领神会

Last modified 2024-02-28 13:25:08

First published 2024-01-16 13:41:54

Article link: https://blog.csdn.net/weixin_43907339/article/details/135622725


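This article explains how to download the pre-trained Qwen-14B-Chat-Int4 model and fine-tune it, covering cloning with Git, installing dependencies, configuring the model paths, running the fine-tuning script, adjusting the quantization config, and setting up the web demo.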

(Summary generated automatically by CSDN.)

1. Download the Qwen-14B-Chat-Int4 model

git clone https://www.modelscope.cn/qwen/Qwen-14B-Chat-Int4.git
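If Git LFS is not installed when cloning, the large weight files can arrive as small text pointer files instead of real weights, and loading fails much later. A minimal sketch that lists such leftovers, assuming the weights use the .safetensors or .bin extension and carry Git LFS's standard pointer header; find_lfs_pointers is a hypothetical helper, not part of any tool used in this guide.

```python
from pathlib import Path

# Git LFS pointer files start with this header instead of binary weight data.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

# Assumed weight-file extensions; adjust to what the repository actually contains.
WEIGHT_SUFFIXES = {".safetensors", ".bin"}


def find_lfs_pointers(model_dir):
    """Return weight files under model_dir that are still Git LFS pointer files."""
    pointers = []
    for path in sorted(Path(model_dir).rglob("*")):
        if path.is_file() and path.suffix in WEIGHT_SUFFIXES:
            with path.open("rb") as f:
                head = f.read(len(LFS_POINTER_PREFIX))
            if head == LFS_POINTER_PREFIX:
                pointers.append(path)
    return pointers
```

For example, a non-empty result from find_lfs_pointers("Qwen-14B-Chat-Int4") suggests running git lfs install and then git lfs pull inside the cloned folder.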

2. Download flash-attention

git clone https://github.com/Dao-AILab/flash-attention

3. Download the Qwen code

git clone https://github.com/QwenLM/Qwen.git

From the Qwen repository, copy the finetune folder and the two Python files web_demo.py and finetune.py (one folder and two .py files) into the model folder.
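The same copy step as a minimal Python sketch, assuming the Qwen repository from step 3 and the model folder from step 1 are both local directories; copy_qwen_files is a hypothetical helper and the paths in the usage line are placeholders.

```python
import shutil
from pathlib import Path


def copy_qwen_files(qwen_repo, model_dir):
    """Copy finetune/, finetune.py and web_demo.py from the Qwen repo into model_dir."""
    src, dst = Path(qwen_repo), Path(model_dir)
    # Copy the whole finetune folder (shell scripts, DeepSpeed configs, ...).
    shutil.copytree(src / "finetune", dst / "finetune", dirs_exist_ok=True)
    # Copy the two top-level Python files.
    for name in ("finetune.py", "web_demo.py"):
        shutil.copy2(src / name, dst / name)
    # Return the model folder's contents so the result is easy to eyeball.
    return sorted(p.name for p in dst.iterdir())
```

Usage, with placeholder paths: copy_qwen_files("Qwen", "/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4"). dirs_exist_ok requires Python 3.8 or later.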

4. Install flash-attention

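cd flash-attention

# Install flash-attention itself (as in the Qwen README).
pip install .

# The installs below are optional and may be slow.
pip install csrc/layer_norm

# If your flash-attn version is higher than 2.1.1, the following is not needed.
pip install csrc/rotary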
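Whether the rotary extension is needed depends on the installed flash-attn version. A small sketch to check it, assuming the pip distribution name flash-attn and comparing only the numeric release part of the version string (suffixes such as .post1 are ignored); the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Keep the leading numeric release segments: '2.3.6.post1' -> (2, 3, 6)."""
    m = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(x) for x in m.group(0).split(".")) if m else ()


def rotary_needed(flash_attn_version):
    """csrc/rotary is only needed when flash-attn is 2.1.1 or older."""
    return parse_version(flash_attn_version) <= (2, 1, 1)


def installed_flash_attn():
    """Return the installed flash-attn version string, or None if it is absent."""
    try:
        return version("flash-attn")
    except PackageNotFoundError:
        return None
```

For instance, rotary_needed("2.3.6") is False, so pip install csrc/rotary can be skipped for that version.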

5. In finetune/finetune_qlora_single_gpu.sh, change the MODEL path, the DATA path, and the path of finetune.py on the python line

MODEL="/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4" # Set the path if you do not want to load from huggingface directly
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/total36-67.json"
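
# Point the python line at the finetune.py copied into the model folder;
# the arguments between the first and last line are omitted here and stay unchanged.
python /mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune.py \
    ... \
    --deepspeed ds_config_zero2.json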
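These three edits can also be scripted. This is a sketch that assumes the lines in the script start with MODEL=", DATA=" and python ... finetune.py, as in the excerpt above; set_script_paths is a hypothetical helper and rewrites the file in place.

```python
import re
from pathlib import Path


def set_script_paths(script_path, model, data, finetune_py):
    """Rewrite MODEL=, DATA= and the `python ... finetune.py` line in a shell script."""
    text = Path(script_path).read_text(encoding="utf-8")
    # Replace only the quoted value; a trailing comment on the line is kept.
    # Lambdas avoid re.sub treating backslashes in paths as escapes.
    text = re.sub(r'^MODEL="[^"]*"', lambda m: f'MODEL="{model}"', text, flags=re.M)
    text = re.sub(r'^DATA="[^"]*"', lambda m: f'DATA="{data}"', text, flags=re.M)
    # Point the python launcher at the copied finetune.py.
    text = re.sub(r"^python \S*finetune\.py", lambda m: f"python {finetune_py}", text, flags=re.M)
    Path(script_path).write_text(text, encoding="utf-8")
    return text
```

Running it twice with the same arguments gives the same file, so it is safe to re-run after a partial edit.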

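6. In the output_qwen folder, set base_model_name_or_path in adapter_config.json to the local base model path. (output_qwen and this file are created by the training run in step 8, so make this change after training finishes.)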

{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 64,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "w2",
    "c_proj",
    "c_attn",
    "w1"
  ],
  "task_type": "CAUSAL_LM"
}
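The same edit done programmatically, as a sketch assuming output_qwen is where the training run wrote the adapter; set_base_model is a hypothetical helper that changes only base_model_name_or_path and rewrites the file with 2-space indentation like the listing above.

```python
import json
from pathlib import Path


def set_base_model(adapter_dir, base_model_path):
    """Point adapter_config.json at a local base model; return the previous value."""
    cfg_path = Path(adapter_dir) / "adapter_config.json"
    cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
    old = cfg.get("base_model_name_or_path")
    cfg["base_model_name_or_path"] = str(base_model_path)
    # All other keys (r, lora_alpha, target_modules, ...) are written back unchanged.
    cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False), encoding="utf-8")
    return old
```

Usage, with the paths from this guide: set_base_model("/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/output_qwen", "/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4").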

7. Put the JSON training data file into the finetune folder, so that it matches the DATA path set in step 5.
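The training script's comment asks for a JSON file consisting of a list of conversations and points to the fine-tuning section of the Qwen README. A sketch of a checker, assuming the README's record shape (an id plus a conversations list of {"from": ..., "value": ...} turns); requiring strict user/assistant alternation is this sketch's own choice, and the helper names are hypothetical.

```python
import json
from pathlib import Path

ROLES = ("user", "assistant")


def validate_conversations(records):
    """Return a list of problems found in Qwen-style conversation records."""
    if not isinstance(records, list):
        return ["top level must be a JSON list"]
    problems = []
    for i, rec in enumerate(records):
        turns = rec.get("conversations") if isinstance(rec, dict) else None
        if not isinstance(turns, list) or not turns:
            problems.append(f"record {i}: missing or empty 'conversations'")
            continue
        for j, turn in enumerate(turns):
            expected = ROLES[j % 2]  # turns alternate user, assistant, user, ...
            if not isinstance(turn, dict) or turn.get("from") != expected:
                problems.append(f"record {i}, turn {j}: expected from={expected!r}")
            elif not isinstance(turn.get("value"), str):
                problems.append(f"record {i}, turn {j}: 'value' must be a string")
    return problems


def check_file(path):
    """Load a training data file and validate it."""
    return validate_conversations(json.loads(Path(path).read_text(encoding="utf-8")))
```

An empty list from check_file("total36-67.json") means every record has the expected shape.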

8. Run training (single GPU):

bash finetune_qlora_single_gpu.sh
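Before moving on, it can help to confirm that training actually saved an adapter. A sketch, assuming PEFT writes adapter_config.json plus adapter_model.safetensors (or adapter_model.bin with older PEFT versions) into output_qwen; check_adapter_output is a hypothetical helper.

```python
from pathlib import Path

# Adapter weight file names PEFT may write, newest format first.
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")


def check_adapter_output(output_dir):
    """Return (ok, message) describing whether output_dir looks like a saved adapter."""
    out = Path(output_dir)
    if not (out / "adapter_config.json").is_file():
        return False, f"{out}/adapter_config.json not found"
    weights = [name for name in ADAPTER_WEIGHTS if (out / name).is_file()]
    if not weights:
        return False, f"no adapter weights in {out}"
    return True, f"found {weights[0]}"
```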

After fine-tuning succeeds:

9. Then, in the model's config.json, add "disable_exllama": true under quantization_config:

"quantization_config": {
  "bits": 4,
  "group_size": 128,
  "damp_percent": 0.01,
  "desc_act": false,
  "static_groups": false,
  "sym": true,
  "true_sequential": true,
  "model_name_or_path": null,
  "model_file_base_name": "model",
  "quant_method": "gptq",
  "disable_exllama": true
},
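A scripted version of this edit, as a sketch: set_disable_exllama is a hypothetical helper that adds the key to the existing quantization_config block, leaves every other setting alone, and raises if the block is missing.

```python
import json
from pathlib import Path


def set_disable_exllama(model_dir):
    """Add "disable_exllama": true under quantization_config in config.json."""
    cfg_path = Path(model_dir) / "config.json"
    cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
    qcfg = cfg.get("quantization_config")
    if qcfg is None:
        raise KeyError("config.json has no quantization_config section")
    qcfg["disable_exllama"] = True
    cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False), encoding="utf-8")
    return qcfg
```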

10. In web_demo.py, load the model through PEFT by adding the following:

    # model = AutoModelForCausalLM.from_pretrained(
    #     args.checkpoint_path,
    #     device_map=device_map,
    #     trust_remote_code=True,
    #     resume_download=True,
    # ).eval()
    from peft import AutoPeftModelForCausalLM

    model = AutoPeftModelForCausalLM.from_pretrained(
        # Path to the folder produced by training
        '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/output_qwen',
        device_map="auto",
        trust_remote_code=True
    ).eval()

and comment out the original model loading code, as shown above.

11. In web_demo.py, set DEFAULT_CKPT_PATH to the base model folder and the path passed to AutoPeftModelForCausalLM.from_pretrained to the training output folder:

DEFAULT_CKPT_PATH = '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4'

model = AutoPeftModelForCausalLM.from_pretrained(
    # Path to the folder produced by training
    '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/output_qwen',
    device_map="auto",
    trust_remote_code=True
).eval()
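Two paths have to agree here: web_demo.py reads the base model folder from DEFAULT_CKPT_PATH, while AutoPeftModelForCausalLM loads the base model named by base_model_name_or_path in the adapter's adapter_config.json (step 6). A sketch of a pre-launch check, with check_paths as a hypothetical helper:

```python
import json
from pathlib import Path


def check_paths(default_ckpt_path, adapter_dir):
    """Return problems that would keep web_demo.py from loading the fine-tuned model."""
    problems = []
    base, adapter = Path(default_ckpt_path), Path(adapter_dir)
    if not (base / "config.json").is_file():
        problems.append(f"DEFAULT_CKPT_PATH has no config.json: {base}")
    cfg_file = adapter / "adapter_config.json"
    if not cfg_file.is_file():
        problems.append(f"adapter folder has no adapter_config.json: {adapter}")
    else:
        recorded = json.loads(cfg_file.read_text(encoding="utf-8")).get("base_model_name_or_path")
        # Compare resolved paths so trailing slashes or symlinks do not cause false alarms.
        if recorded is None or Path(recorded).resolve() != base.resolve():
            problems.append(f"base_model_name_or_path is {recorded!r}, expected {str(base)!r}")
    return problems
```

An empty list from check_paths(DEFAULT_CKPT_PATH, ".../finetune/output_qwen") means both paths line up.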

12. Launch the web demo (run from the folder containing web_demo.py):

python web_demo.py