ChatGLM Inference and P-Tuning v2 Fine-Tuning (Strictly Following the Official Tutorial)

Latest recommended article published on 2024-06-29 19:28:56

xdbk2023


Tags: Artificial Intelligence

Article link: https://blog.csdn.net/weixin_50728220/article/details/137912175


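This article explains in detail how to deploy the ChatGLM-6B model on Windows: setting up a conda environment, downloading the model, and modifying configuration files for inference and fine-tuning. It highlights using quantization to reduce VRAM requirements, handling CUDA version compatibility issues, and evaluating the model after fine-tuning.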


Hardware and settings: Windows, RTX 3060, quantization_bit=4. A minimum of about 6.7 GB of VRAM is enough for both inference and fine-tuning.

1. Deployment

First create a new conda virtual environment and a PyCharm project.

Then open Anaconda Powershell Prompt.

cd into the project directory and run:

git clone https://github.com/THUDM/ChatGLM-6B.git

cd ChatGLM-6B

pip install -r requirements.txt

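Download the model weight files. Every download method is slow, so download them manually from Hugging Face and allow plenty of time: THUDM/chatglm-6b at main (huggingface.co)

Inside the ChatGLM-6B folder, create a new folder named chatglm-6b to hold all the files downloaded from Hugging Face.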
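
Because the manual download involves many large files, it can help to confirm that nothing is missing before running anything. The following is a minimal sketch, assuming the downloaded checkpoint is sharded and ships the usual Hugging Face index file pytorch_model.bin.index.json; the function name missing_weight_shards is hypothetical.

```python
import json
from pathlib import Path


def missing_weight_shards(model_dir):
    """Return shard file names listed in the index but absent from model_dir."""
    model_dir = Path(model_dir)
    # Assumption: the checkpoint uses the standard sharded-checkpoint index file.
    index_path = model_dir / "pytorch_model.bin.index.json"
    if not index_path.is_file():
        # Without the index we cannot tell which shards are expected.
        raise FileNotFoundError(f"index file not found: {index_path}")
    with index_path.open(encoding="utf-8") as f:
        index = json.load(f)
    # "weight_map" maps each parameter name to the shard file that stores it.
    expected = set(index["weight_map"].values())
    return sorted(name for name in expected if not (model_dir / name).is_file())
```

Calling missing_weight_shards("chatglm-6b") from the ChatGLM-6B folder should return an empty list once the download is complete. It only checks that the files exist, not that they are intact.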

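2. Inference

In cli_demo.py, change the model loading path from its original value to the local path, i.e. the chatglm-6b folder created above (the same THUDM/chatglm-6b to chatglm-6b substitution described for train.sh in section 3).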
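
As an alternative to hard-coding the path, the choice between the local folder and the Hub identifier can be made at run time. This is only a sketch: the helper pick_model_path is hypothetical and not part of cli_demo.py, and it assumes that a usable local copy always contains a config.json file.

```python
from pathlib import Path

HUB_ID = "THUDM/chatglm-6b"  # identifier that the post replaces with a local path
LOCAL_DIR = "chatglm-6b"     # local folder created in section 1


def pick_model_path(local_dir=LOCAL_DIR, hub_id=HUB_ID):
    """Prefer the local folder when it holds a config.json, else use the Hub ID."""
    if (Path(local_dir) / "config.json").is_file():
        return str(local_dir)
    return hub_id


# Intended use (requires the transformers package):
#   tokenizer = AutoTokenizer.from_pretrained(pick_model_path(), trust_remote_code=True)
```

The relative path only resolves when the script is started from the ChatGLM-6B folder, which is the working directory used throughout this post.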

Run the demo in Anaconda Powershell Prompt or the PyCharm Terminal:

python cli_demo.py

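Note: if you hit a torch version error, run nvidia-smi on the command line to check the CUDA version, then reinstall a suitable torch build from the PyTorch website. The CUDA version of the torch build only needs to be no higher than the CUDA version reported for your machine.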
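
The version rule in the note can be written as a small check. This is a sketch under two assumptions: the nvidia-smi header contains a field of the form "CUDA Version: X.Y", and versions are plain dotted integers. All three function names are hypothetical.

```python
import re


def parse_version(text):
    """Turn '11.8' into (11, 8) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_cuda_version(nvidia_smi_output):
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's header text."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", nvidia_smi_output)
    return match.group(1) if match else None


def torch_build_is_compatible(torch_cuda, driver_cuda):
    """True when the torch build's CUDA version does not exceed the driver's."""
    return parse_version(torch_cuda) <= parse_version(driver_cuda)
```

For an installed torch, the build's CUDA version is available as torch.version.cuda, so torch_build_is_compatible(torch.version.cuda, "12.2") would test it against a driver reporting CUDA 12.2.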

3. Fine-tuning

Follow the P-Tuning v2 fine-tuning tutorial: ChatGLM-6B/ptuning/README.md at main · THUDM/ChatGLM-6B (github.com)

Besides the ChatGLM-6B dependencies, the following packages are also required:

pip install rouge_chinese nltk jieba datasets

Download the dataset from Tsinghua Cloud.

The ADGEN dataset task is to generate an advertising text (summary) from the input (content).

{
    "content": "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳",
    "summary": "这件衬衫的款式非常的宽松，利落的线条可以很好的隐藏身材上的小缺点，穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳，漂亮的绳结展现出了十足的个性，配合时尚的泡泡袖型，尽显女性甜美可爱的气息。"
}
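
The content field in the sample above packs attribute/value pairs into one string, with "*" between pairs and "#" between an attribute and its value. A minimal sketch of unpacking it, using a hypothetical helper name parse_content:

```python
def parse_content(content):
    """Split 'key#value*key#value' into a list of (key, value) pairs."""
    pairs = []
    for item in content.split("*"):
        # partition keeps everything after the first '#' as the value.
        key, _, value = item.partition("#")
        pairs.append((key, value))
    return pairs
```

A list of pairs is used rather than a dict because the same attribute can appear more than once: the sample contains 版型 twice, with the values 宽松 and 显瘦.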

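Unzip the archive and place the inner AdvertiseGen folder (the one inside the extracted AdvertiseGen folder) in the ChatGLM-6B folder.

Edit ../ChatGLM-6B/ptuning/train.sh. Five things need to change (the last item below covers two parameters):

Change python3 to python; otherwise running bash train.sh from ChatGLM's ptuning folder does nothing.

Change main.py to ptuning/main.py, because the working directory stays at the ChatGLM-6B folder throughout.

Change THUDM/chatglm-6b to chatglm-6b, again pointing at the local path.

per_device_train_batch_size can be changed to 4, with gradient_accumulation_steps changed to 4 at the same time (any combination works as long as per_device_train_batch_size * gradient_accumulation_steps = 16). The official README gives the reason: the default per_device_train_batch_size=1 with gradient_accumulation_steps=16 is equivalent to a total batch size of 16, and raising per_device_train_batch_size while keeping the product fixed speeds up training at the cost of more VRAM.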
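
The constraint on the two parameters is simple arithmetic. The sketch below lists every pair whose product is 16, assuming a single GPU so that the effective batch size is exactly that product; the function name valid_batch_settings is hypothetical.

```python
TARGET_TOTAL_BATCH = 16  # required product of the two train.sh parameters


def valid_batch_settings(total=TARGET_TOTAL_BATCH):
    """List (per_device_train_batch_size, gradient_accumulation_steps) pairs."""
    return [(b, total // b) for b in range(1, total + 1) if total % b == 0]


print(valid_batch_settings())
# [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)]
```

Pairs further to the right need fewer accumulation steps per update but hold more samples in VRAM at once.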

Run:

bash ptuning/train.sh

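Running a bash script on Windows requires Git Bash (installation omitted here). Assuming it is already installed, see the following post on activating a conda virtual environment from Git Bash on Windows: Activating a conda virtual environment with Git Bash on Windows (CSDN blog)

Note: if you get the following error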

ptuning/train.sh: line 25: 780 Segmentation fault

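The cause may be memory: I got this error with per_device_train_batch_size set to 8, and training ran through after changing it to 4.

Fine-tuning then took a little over 5 hours.

(Screenshots: start time, end time, and training result.)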

4. Evaluation after fine-tuning

Edit ptuning/evaluate.sh.

(Screenshots: evaluate.sh before and after the change.)

Run:

bash ptuning/evaluate.sh

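(Screenshots: start time and end time.)

The run took about 30 minutes in total.

The evaluation metrics are Chinese Rouge score and BLEU-4. The generated results are saved in ./output/adgen-chatglm-6b-pt-8-1e-2/generated_predictions.txt. Note that the checkpoint loaded in section 5 lives under output/adgen-chatglm-6b-pt-128-2e-2, so with those settings the file is likely under that directory instead.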
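
To inspect individual predictions, the results file can be read back and scored by hand. The sketch below assumes the file holds one JSON object per line with the keys "labels" and "predict"; check a line of your own file before relying on this. The score is a simplified character-level ROUGE-1 F1, swapped in for illustration. It is not the jieba-tokenized rouge_chinese score that the evaluation script reports, so the numbers will differ.

```python
import json
from collections import Counter


def load_predictions(path):
    """Read one JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def char_rouge1_f1(prediction, reference):
    """Character-level unigram F1 with clipped counts (simplified ROUGE-1)."""
    # Counter intersection keeps the minimum count per character.
    overlap = sum((Counter(prediction) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(prediction)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

Under the assumed keys, a per-example score would be char_rouge1_f1(row["predict"], row["labels"]) for each row returned by load_predictions.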

5. Loading the new model for inference

Create a new file test.py in the ChatGLM-6B directory.

Paste in the following code:

import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("chatglm-6b", trust_remote_code=True)
# Load the base model; pre_seq_len must match the value used for fine-tuning
config = AutoConfig.from_pretrained("chatglm-6b", trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained("chatglm-6b", config=config, trust_remote_code=True)
# Load the P-Tuning v2 checkpoint and keep only the prefix encoder weights
CHECKPOINT_PATH = 'output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-3000'
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        # Strip the "transformer.prefix_encoder." prefix from the key
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

# Comment out the following line if you don't use quantization
model = model.quantize(4)
model = model.half().cuda()
model.transformer.prefix_encoder.float()
model = model.eval()

response, history = model.chat(tokenizer, "类型#工装裤*颜色#深蓝色*图案#条纹*裤长#八分裤", history=[])
print(response)

Run test.py.
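
The key-renaming loop in test.py is the step most likely to confuse, so here it is in isolation on a plain dict, with no torch required. This is an illustrative sketch: the helper name extract_prefix_encoder_state and the example keys are hypothetical stand-ins for a real checkpoint.

```python
PREFIX = "transformer.prefix_encoder."


def extract_prefix_encoder_state(state_dict, prefix=PREFIX):
    """Keep only keys under `prefix` and strip the prefix from each key."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Hypothetical checkpoint contents; the values stand in for tensors.
example = {
    "transformer.prefix_encoder.embedding.weight": "prefix-weights",
    "lm_head.weight": "ignored",
}
print(extract_prefix_encoder_state(example))
# {'embedding.weight': 'prefix-weights'}
```

The stripping is needed because load_state_dict is called on model.transformer.prefix_encoder itself, which expects keys relative to that submodule rather than to the whole model.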