1、下载Qwen-14B-Chat-int4 模型
git clone https://www.modelscope.cn/qwen/Qwen-14B-Chat-Int4.git
2、下载
git clone https://github.com/Dao-AILab/flash-attention
3、下载 QWEN 代码
git clone https://github.com/QwenLM/Qwen.git
将 finetune 文件夹web_demo.py finetune.py 一个文件夹 和两个PY文件 粘贴到 模型文件夹下
4、执行
cd flash-attention
# 下方安装可选,安装可能比较缓慢。
pip install csrc/layer_norm
# 如果flash-attn版本高于2.1.1,下方无需安装。
pip install csrc/rotary
5、修改finetune 文件夹的 finetune_qlora_single_gpu.sh 文件里的MODEL、 DATA、 python 路径
MODEL="/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4" # Set the path if you do not want to load from huggingface directly
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/total36-67.json"
python /mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune.py \
--deepspeed ds_config_zero2.json
6、修改output_qwen文件夹adapter_config.json文件的base_model_name_or_path
{
"alpha_pattern": {},
"auto_mapping": null,
"base_model_name_or_path": "/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4",
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layers_pattern": null,
"layers_to_transform": null,
"loftq_config": {},
"lora_alpha": 16,
"lora_dropout": 0.05,
"megatron_config": null,
"megatron_core": "megatron.core",
"modules_to_save": null,
"peft_type": "LORA",
"r": 64,
"rank_pattern": {},
"revision": null,
"target_modules": [
"w2",
"c_proj",
"c_attn",
"w1"
],
"task_type": "CAUSAL_LM"
}
7、将json模型文件放到此目录下
8、执行 # 单卡训练
bash finetune_qlora_single_gpu.sh
模型微调成功后
9、然后 修改config.json文件在quantization_config
下添加"disable_exllama":true
"quantization_config": {
"bits": 4,
"group_size": 128,
"damp_percent": 0.01,
"desc_act": false,
"static_groups": false,
"sym": true,
"true_sequential": true,
"model_name_or_path": null,
"model_file_base_name": "model",
"quant_method": "gptq",
"disable_exllama":true
},
10、在web_demo.py 添加
# model = AutoModelForCausalLM.from_pretrained(
# args.checkpoint_path,
# device_map=device_map,
# trust_remote_code=True,
# resume_download=True,
# ).eval()
from peft import AutoPeftModelForCausalLM
model = AutoPeftModelForCausalLM.from_pretrained(
# 训练后生成的文件夹路径
'/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/output_qwen',
device_map="auto",
trust_remote_code=True
).eval()
并把以前的 model注释掉
11、修改web_demo.py
DEFAULT_CKPT_PATH
AutoPeftModelForCausalLM.from_pretrained 路径
DEFAULT_CKPT_PATH = '/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4'
model = AutoPeftModelForCausalLM.from_pretrained(
# 训练后生成的文件夹路径
'/mnt/workspace/Qwen/model/Qwen-14B-Chat-Int4/finetune/output_qwen',
device_map="auto",
trust_remote_code=True
).eval()
12、 启动
python web_demo.py
注意:这里的路径是根据个人下载的路径为准
最后执行 python web_demo.py 如果报错哪个包没有安装就安装哪个包
但是如果安装出现 提问后页面回答报错 控制台输出的时候 有可能是遇到gradio版本不兼容
更新到对应的版本即可 我的版本是3.40.0 希望大家不要踩坑