I. Local Deployment
1 Code download
Project repository: https://github.com/QwenLM/Qwen
Clone it locally: git clone https://github.com/QwenLM/Qwen.git
2 Environment setup
conda create -n qwen2 python==3.10.8
conda activate qwen2
My CUDA version is 12.4, so I install the matching PyTorch build:
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
Offline PyTorch wheels are available at https://download.pytorch.org/whl/torch_stable.html
For example, on a CUDA 11.8 machine use this wheel instead:
torch-2.2.2+cu118-cp310-cp310-linux_x86_64.whl
Reference: PyTorch GPU builds and their matching CUDA/cuDNN versions (CSDN blog).
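After installing, a quick sanity check (a minimal sketch) confirms that PyTorch was built against CUDA and can see the GPU:

# check_torch.py -- verify the PyTorch install and CUDA visibility
import torch

print(torch.__version__)          # e.g. 2.4.0
print(torch.cuda.is_available())  # should be True on a working GPU setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))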
3 Model download
Resource usage per model size (the original table is omitted here).
3.1 Downloading the 7B model
git clone https://www.modelscope.cn/qwen/Qwen-7B-Chat.git
Local path: /media/waj/新加卷/github/NLP/llm_model/Qwen-7B-Chat
3.2 Downloading the 1.8B model
git clone https://www.modelscope.cn/qwen/Qwen-1_8B-Chat.git
Local path: /media/waj/新加卷/github/NLP/llm_model/Qwen-1_8B-Chat
Int4 quantized version:
git clone https://www.modelscope.cn/qwen/Qwen-1_8B-Chat-Int4.git
(To use the quantized model: requirement 1, torch 2.0 or later and transformers 4.32.0 or later;
requirement 2, pip install auto-gptq optimum.)
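A quick way to confirm both requirements are met (a minimal sketch):

# Verify the version requirements for the Int4 (GPTQ) model
import torch, transformers

print(torch.__version__)         # needs >= 2.0
print(transformers.__version__)  # needs >= 4.32.0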
4 Local model inference
1 Install the dependencies: pip install -r requirements_web_demo.txt
2 vim web_demo.py and edit the model path inside (optional; the -c flag below can supply the path instead)
3 Run inference (a programmatic alternative is sketched after these commands):
GPU inference:
python web_demo.py --server-name 0.0.0.0 -c /media/waj/新加卷/github/NLP/llm_model/Qwen-1_8B-Chat
CPU inference (takes roughly 5-6 GB of RAM):
python web_demo.py --server-name 0.0.0.0 -c /media/waj/新加卷/github/NLP/llm_model/Qwen-1_8B-Chat --cpu-only
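If you do not need the web UI, the model can also be queried directly through the chat() helper exposed by Qwen's remote code (a minimal sketch, using the local path from above):

# chat_demo.py -- programmatic inference with the local Qwen-1.8B-Chat
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/media/waj/新加卷/github/NLP/llm_model/Qwen-1_8B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", trust_remote_code=True
).eval()

response, history = model.chat(tokenizer, "你好", history=None)
print(response)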
5 Dataset
5.1 Data download
https://hf-mirror.com/datasets/ShengbinYue/DISC-Law-SFT
Download: https://modelscope.cn/datasets/Robin021/DISC-Law-SFT/files
Target data format (the conversation layout Qwen's finetune script expects):
[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好"
      },
      {
        "from": "assistant",
        "value": "我是一个语言模型,我叫通义千问。"
      }
    ]
  }
]
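Note that the raw DISC-Law-SFT-Triplet jsonl stores plain input/output fields rather than this conversation layout; that is what the processing step below converts. A quick peek at the raw fields (a sketch, path as in Appendix 1):

# peek.py -- inspect the raw jsonl before converting
import json

path = "/media/waj/新加卷/github/NLP/data/DISC-Law-SFT/DISC-Law-SFT-Triplet-released.jsonl"
with open(path, encoding="utf-8") as f:
    first = json.loads(f.readline())
print(first.keys())  # expect keys including "input" and "output"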
5.2 Data processing
Run the conversion script (listed in Appendix 1):
python process_data_law.py
Path of the processed data: /media/waj/新加卷/github/NLP/Qwen/train_data_law.json
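A quick check that the converted file parses and has the expected shape (a minimal sketch):

# Verify the converted training file
import json

with open("/media/waj/新加卷/github/NLP/Qwen/train_data_law.json", encoding="utf-8") as f:
    samples = json.load(f)
print(len(samples))                    # number of training samples
print(samples[0]["conversations"][0])  # first user turn of the first sample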
6 Model training
6.1 Install dependencies
pip install "peft<0.8.0" deepspeed
6.2 Training
Edit the model path and data path in finetune/finetune_lora_single_gpu.sh:
MODEL="/media/waj/新加卷/github/NLP/llm_model/Qwen-1_8B-Chat"
DATA="/media/waj/新加卷/github/NLP/Qwen/train_data_law.json"
bash finetune/finetune_lora_single_gpu.sh
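For reference, the LoRA configuration that Qwen's finetune.py applies looks roughly like the sketch below. The exact values live in finetune.py and may differ in your checkout; treat them as illustrative, not authoritative (c_attn/c_proj/w1/w2 are Qwen's attention and MLP projection modules):

# Sketch of the LoRA setup used for the single-GPU finetune (illustrative values)
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                 # LoRA rank
    lora_alpha=16,        # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "w1", "w2"],
    task_type="CAUSAL_LM",
)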
7 Inference with the fine-tuned model
7.1 Merging the LoRA weights
Directory of the fine-tuned adapter:
path_to_adapter="/media/waj/新加卷/github/NLP/test/Qwen-main/output_qwen/checkpoint-1200/"
Directory to save the merged model:
new_model_directory="/media/waj/新加卷/github/NLP/llm_model/Qwen-1_8B-Chat_law2"
Run the merge script (listed in Appendix 2):
python qwen_lora_merge.py
7.2 Web UI inference with the merged model
python web_demo.py --server-name 0.0.0.0 -c /media/waj/新加卷/github/NLP/llm_model/Qwen-1_8B-Chat_law2
Appendix
Appendix 1: process_data_law.py
import json

# Read the .jsonl source file line by line
json_data = []
with open('/media/waj/新加卷/github/NLP/data/DISC-Law-SFT/DISC-Law-SFT-Triplet-released.jsonl', 'r', encoding='utf-8') as file:
    for line in file:
        data = json.loads(line)
        json_data.append(data)

# Template to be filled with the converted samples
template = []

# Walk the dataset and map each (input, output) pair to Qwen's conversation format
for idx, data in enumerate(json_data):
    conversation = [
        {
            "from": "user",
            "value": data["input"]
        },
        {
            "from": "assistant",
            "value": data["output"]
        }
    ]
    template.append({
        "id": f"identity_{idx}",
        "conversations": conversation
    })

print(len(template))

# Print one converted sample for inspection
print(json.dumps(template[2], ensure_ascii=False, indent=2))

# Write the converted samples to a local file
output_file_path = "train_data_law.json"
with open(output_file_path, 'w', encoding='utf-8') as f:
    json.dump(template, f, ensure_ascii=False, indent=2)

print(f"Processed data written to: {output_file_path}")
Appendix 2: qwen_lora_merge.py
# 1. Merge the LoRA adapter into the base model
from peft import AutoPeftModelForCausalLM

path_to_adapter = "/media/waj/新加卷/github/NLP/test/Qwen-main/output_qwen/checkpoint-1200/"
new_model_directory = "/media/waj/新加卷/github/NLP/llm_model/Qwen-1_8B-Chat_law2"

model = AutoPeftModelForCausalLM.from_pretrained(
    path_to_adapter,  # path to the finetuning output directory
    device_map="auto",
    trust_remote_code=True
).eval()
merged_model = model.merge_and_unload()

# max_shard_size and safe_serialization are optional: they control checkpoint
# sharding and saving in the safetensors format, respectively.
merged_model.save_pretrained(new_model_directory, max_shard_size="2048MB", safe_serialization=True)

# 2. Save the tokenizer alongside the merged weights
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    path_to_adapter,  # path to the finetuning output directory
    trust_remote_code=True
)
tokenizer.save_pretrained(new_model_directory)
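After merging, a quick smoke test confirms the merged model loads and responds (a minimal sketch; the sample question is only an illustration):

# Smoke-test the merged law model
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_dir = "/media/waj/新加卷/github/NLP/llm_model/Qwen-1_8B-Chat_law2"
tokenizer = AutoTokenizer.from_pretrained(merged_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    merged_dir, device_map="auto", trust_remote_code=True
).eval()

response, _ = model.chat(tokenizer, "什么是合同违约责任?", history=None)
print(response)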