Fine-tuning ChatGLM2-6B

小张Tt

Last modified 2024-03-22 11:01:45

Category: Large Models. Tags: python, artificial intelligence, language models

First published 2024-03-22 10:27:24

Article link: https://blog.csdn.net/weixin_43788282/article/details/136932328

Contents

Preface
1. Downloading ChatGLM2-6B
2. Fine-tuning ChatGLM2-6B

Preface

I have recently become interested in large language models, so I tried fine-tuning one. This post records the process.

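1. Downloading ChatGLM2-6B

Follow the official instructions to install the dependencies and download the source and model files. Plenty of tutorials already cover this, so I will not repeat them here.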

https://github.com/thudm/chatglm2-6b

In my case I pulled a Docker image from Docker Hub instead:

https://hub.docker.com/r/woshikid/chatglm2-6b

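2. Fine-tuning ChatGLM2-6B

Create a new dataset. Most tutorials online use the clothing-advertisement dataset. The file holds one JSON object per line, in the format below (you can define your own fields, as long as you keep the same layout):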

{"content": "", "summary": ""}
{"content": "", "summary": ""}
{"content": "", "summary": ""}

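The key names are yours to choose, as long as the training arguments (--prompt_column and --response_column) use the same names.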
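
To make the one-object-per-line format concrete, here is a minimal sketch that writes and reads such a file. The helper names `write_jsonl` and `read_jsonl` and the sample records are my own placeholders, not part of the ChatGLM2-6B repository; the keys default to `content` and `summary` as in the format above.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path, prompt_key="content", response_key="summary"):
    """Write one JSON object per line, checking that both keys are present."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for i, rec in enumerate(records):
            missing = [k for k in (prompt_key, response_key) if k not in rec]
            if missing:
                raise ValueError(f"record {i} is missing keys: {missing}")
            # ensure_ascii=False keeps Chinese text readable in the file
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical sample records, for illustration only
samples = [
    {"content": "example prompt 1", "summary": "example response 1"},
    {"content": "example prompt 2", "summary": "example response 2"},
]

with tempfile.TemporaryDirectory() as tmp:
    out = write_jsonl(samples, Path(tmp) / "train.json")
    loaded = read_jsonl(out)

print(loaded == samples)
```

If you rename the keys, pass the same names to `write_jsonl` and to the training script so the two stay in sync.
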
For reference, see the official P-Tuning v2 fine-tuning tutorial:
https://www.heywhale.com/mw/project/64984a7b72ebe240516ae79c

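Besides the ChatGLM2-6B dependencies, fine-tuning needs the following packages (the quotes around transformers[torch] keep shells such as zsh from treating the brackets as a glob pattern):

pip install rouge_chinese nltk jieba datasets "transformers[torch]" -i https://pypi.tuna.tsinghua.edu.cn/simple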
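
Before starting a long training run, it can help to confirm that these packages are importable in the current environment. The following is a small sketch using only the standard library. The function name `missing_modules` is my own, and the module list assumes each package imports under the same name it is installed with, plus `torch` from the `transformers[torch]` extra.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the pip packages listed above
REQUIRED_MODULES = ["rouge_chinese", "nltk", "jieba", "datasets", "transformers", "torch"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All fine-tuning dependencies are importable.")
```

This only checks that each module can be located. It does not verify versions or GPU support.
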

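Configure the fine-tuning parameters and start training. (In my runs the model sometimes kept only what it knew before fine-tuning and ignored the new data, and at other times kept only the fine-tuning data and forgot everything it knew before.)
My guess is that this is related to the loss: adjust the learning rate and the number of training steps until the loss settles at a moderate value. If it is too small the model has probably overfitted the new data, and if it is too large the model has not learned it.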

Open train.sh, set the parameters, and run it to start fine-tuning.
My parameters are below (my own fine-tuning dataset has only 20 samples).

PRE_SEQ_LEN=128
LR=2e-2
NUM_GPUS=4

torchrun --standalone --nnodes=1 --nproc-per-node=$NUM_GPUS /ChatGLM2-6B/ptuning/main.py \
    --do_train \
    --train_file /ChatGLM2-6B/3DGF/train.json \
    --validation_file /ChatGLM2-6B/3DGF/dev.json \
    --preprocessing_num_workers 10 \
    --prompt_column instruction \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path /ChatGLM2-6B/chatglm2-6b \
    --output_dir /ChatGLM2-6B/output/adgen-chatglm2-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 6 \
    --per_device_eval_batch_size 6 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 300 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
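
A quick back-of-the-envelope check on these parameters may help explain the forgetting described earlier. The sketch below multiplies out the nominal batch size per optimizer step and gives a rough lower bound on how many times the 20 samples are seen in 300 steps. The function names are my own, and how the trainer actually batches a dataset smaller than the nominal batch is an implementation detail this sketch does not model.

```python
def nominal_effective_batch(per_device_batch, num_gpus, grad_accum_steps):
    """Samples consumed per optimizer step if the dataset were large enough."""
    return per_device_batch * num_gpus * grad_accum_steps


def min_passes_over_data(max_steps, effective_batch, dataset_size):
    """Rough lower bound on the number of full passes over the dataset."""
    if effective_batch >= dataset_size:
        # Every optimizer step already covers the whole dataset at least once
        return max_steps
    return max_steps * effective_batch // dataset_size


effective = nominal_effective_batch(per_device_batch=6, num_gpus=4, grad_accum_steps=16)
passes = min_passes_over_data(max_steps=300, effective_batch=effective, dataset_size=20)
print(effective, passes)  # 384 300
```

With a nominal batch of 384 against 20 samples, each step covers the entire dataset, so the model sees every sample at least 300 times at a learning rate of 2e-2. That may be one reason it ends up remembering only the fine-tuning data; fewer steps or a lower learning rate would be worth trying.
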

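After fine-tuning finishes, the official repository provides two demos.
/ChatGLM2-6B/ptuning/web_demo.py opens a Gradio interface; you only need to change which model version it loads.
There is also a single-turn question-answering demo.

I wrote my own multi-turn demo for the fine-tuned model, but for some reason only the first turn is answered correctly and later turns go wrong. Any pointers would be appreciated.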

import os
import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from IPython.display import clear_output

# Load the base model; pre_seq_len must match the value used in training
model_path = "/ChatGLM2-6B/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained(model_path, config=config, trust_remote_code=True)

# Load the fine-tuned prefix-encoder weights from the P-Tuning checkpoint
prefix_state_dict = torch.load(os.path.join("output/adgen-chatglm2-6b-pt-128-2e-2/checkpoint-100", "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

# Move the model to the GPU in half precision and switch to inference mode
model = model.half().cuda()
model.transformer.prefix_encoder.float()
model = model.eval()

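# Stream a reply, print the final text, and return the updated history
def display_answer(model, query, history=None):
    history = history or []
    response = ""
    # stream_chat yields (partial_response, updated_history) pairs;
    # keep the last pair so the history is not thrown away
    for response, history in model.stream_chat(tokenizer, query, history=history):
        pass
    clear_output(wait=True)
    print(response)
    return history

# Run in a Linux terminal; type "exit" to quit
print("Welcome to Xiao Zhang's exclusive AI, your intelligent companion!")
history = []
while True:
    query = input("Xiao Zhang: ")
    if query.lower() == "exit":
        break
    # Pass the history back in so each turn can see the earlier ones
    history = display_answer(model, query, history)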
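
The checkpoint-loading step in the script above keeps only the entries whose names start with `transformer.prefix_encoder.` and strips that prefix before loading them into the prefix encoder. Here is a small standalone sketch of that filtering on a plain dict. The function name `extract_prefix_weights` and the example keys are my own illustrations, and strings stand in for real tensors.

```python
PREFIX = "transformer.prefix_encoder."


def extract_prefix_weights(state_dict, prefix=PREFIX):
    """Keep only entries under `prefix` and strip it from their names."""
    return {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}


# Hypothetical checkpoint contents; strings stand in for tensors
fake_checkpoint = {
    "transformer.prefix_encoder.embedding.weight": "prefix-weights",
    "transformer.some_other_module.weight": "ignored",
}

print(extract_prefix_weights(fake_checkpoint))
# {'embedding.weight': 'prefix-weights'}
```

If the filtered dict comes back empty, the checkpoint path or the prefix string is probably wrong, which is worth checking before calling `load_state_dict`.
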