Qwen-1.8B (Tongyi Qianwen) inference and fine-tuning: a hands-on, step-by-step tutorial

Latest recommended article published on 2025-04-11 14:37:23

Tags: #python #natural-language-processing #deep-learning #language-model

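I. Environment

Operating system

Windows 11 with WSL (Ubuntu). Running directly on Windows is not recommended because it runs into problems, but you can install WSL and work there.

Tools

PyCharm

The Qwen-1.8B-Chat model

The Qwen runtime code

Miniconda or Anaconda, installed inside WSL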

II. Download the model

Go to ModelScope and pick a Chat model. This tutorial uses Qwen-1.8B-Chat.

You can pull it with the following command:

git clone https://www.modelscope.cn/qwen/Qwen-1_8B-Chat.git
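Large weight files in model repositories like this one are normally stored with Git LFS, so a clone made without Git LFS installed can leave tiny pointer files where the real weights should be. The sketch below is one way to check a cloned directory. The helper name find_lfs_pointers and the 1 KB size threshold are my own assumptions, not part of any official tooling.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir, max_pointer_size=1024):
    """Return relative names of files that look like Git LFS pointers.

    Hypothetical helper: a pointer file is tiny and starts with LFS_HEADER,
    which means the real content was never downloaded.
    """
    root = Path(model_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories and Git's own bookkeeping files.
        if ".git" in relative.parts or not path.is_file():
            continue
        if path.stat().st_size > max_pointer_size:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_HEADER)) == LFS_HEADER:
                pointers.append(relative.as_posix())
    return pointers
```

If the function returns a non-empty list for the cloned model directory, installing Git LFS and running git lfs pull inside that directory should fetch the real files.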

III. Clone the Qwen runtime code

git clone https://github.com/QwenLM/Qwen.git

After cloning the code, it is worth reading README_CN.md. If you skip it, just follow the steps below.

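IV. Set up the Conda virtual environment

Create the environment wherever you plan to run the code: use the conda inside WSL to run in WSL, or the conda on Windows to run on Windows.

1. Create the virtual environment

conda create -n <env-name> python=3.8

2. Activate the virtual environment

conda activate <env-name>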
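Before installing anything, it can help to confirm that the shell really is inside the new environment. The sketch below is a minimal check, assuming the environment was activated through conda, which sets the CONDA_DEFAULT_ENV variable. The function name check_environment is hypothetical.

```python
import os
import sys


def check_environment(min_version=(3, 8)):
    """Return (ok, message) describing the active interpreter.

    CONDA_DEFAULT_ENV is set by conda when an environment is activated;
    it is absent when no conda environment is active.
    """
    env_name = os.environ.get("CONDA_DEFAULT_ENV", "<no conda env active>")
    version = tuple(sys.version_info[:3])
    ok = version >= tuple(min_version)
    message = "env={} python={}.{}.{}".format(env_name, *version)
    return ok, message


if __name__ == "__main__":
    ok, message = check_environment()
    print(message, "OK" if ok else "too old")
```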

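3. Install the required packages

Run the following command in the cloned Qwen code directory:

pip install -r requirements.txt

4. Install PyTorch

Go to the official PyTorch website and pick the install command that matches your setup (operating system, CPU or CUDA version).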
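Once PyTorch is installed, a quick check shows whether it can see the GPU. The sketch below uses only standard PyTorch calls. The function name torch_status is my own, and it also handles the case where torch is not installed yet.

```python
import importlib.util


def torch_status():
    """Report whether PyTorch is importable and whether it sees a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the check works even without torch

    if torch.cuda.is_available():
        return "torch {} with CUDA, GPU: {}".format(
            torch.__version__, torch.cuda.get_device_name(0)
        )
    return "torch {} (CPU only)".format(torch.__version__)


print(torch_status())
```

If this reports CPU only on a machine with an NVIDIA GPU, the CPU build of PyTorch was probably installed and the install command needs to be chosen again.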

V. Run inference

The official demo is enough for inference. The code is as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_dir = '<directory containing the model>'

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained(model_dir, revision='master', trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained(model_dir, revision='master', device_map="auto", trust_remote_code=True).eval()

# Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True) # lets you set the generation length, top_p, and other hyperparameters

# 1st dialogue turn (the prompt means "Hello")
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# 你好！很高兴为你提供帮助。

# 2nd dialogue turn (the prompt asks for a story about a young person who works hard, starts a business, and succeeds)
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
print(response)
# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。
# 故事的主人公叫李明，他来自一个普通的家庭，父母都是普通的工人。从小，李明就立下了一个目标：要成为一名成功的企业家。
# 为了实现这个目标，李明勤奋学习，考上了大学。在大学期间，他积极参加各种创业比赛，获得了不少奖项。他还利用课余时间去实习，积累了宝贵的经验。
# 毕业后，李明决定开始自己的创业之路。他开始寻找投资机会，但多次都被拒绝了。然而，他并没有放弃。他继续努力，不断改进自己的创业计划，并寻找新的投资机会。
# 最终，李明成功地获得了一笔投资，开始了自己的创业之路。他成立了一家科技公司，专注于开发新型软件。在他的领导下，公司迅速发展起来，成为了一家成功的科技企业。
# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险，不断学习和改进自己。他的成功也证明了，只要努力奋斗，任何人都有可能取得成功。

# 3rd dialogue turn (the prompt asks for a title for the story)
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
print(response)
# 《奋斗创业：一个年轻人的成功之路》

# Qwen-1.8B-Chat can do role playing, language style transfer, task setting, and behavior setting through the system prompt.
# The system prompt in the first example asks the model to reply in a cute anime-style tone.
response, _ = model.chat(tokenizer, "你好呀", history=None, system="请用二次元可爱语气和我说话")
print(response)
# 你好啊！我是一只可爱的二次元猫咪哦，不知道你有什么问题需要我帮忙解答吗？

response, _ = model.chat(tokenizer, "My colleague works diligently", history=None, system="You will write beautiful compliments according to needs")
print(response)
# Your colleague is an outstanding worker! Their dedication and hard work are truly inspiring. They always go above and beyond to ensure that
# their tasks are completed on time and to the highest standard. I am lucky to have them as a colleague, and I know I can count on them to handle any challenge that comes their way.
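As far as I can tell from the Qwen remote code, the history that model.chat takes and returns is a list of (query, response) pairs, and the system prompt, history, and new query are folded into one ChatML-style prompt. The sketch below is a simplified, text-only illustration of that assembly. The function name build_chatml_prompt is hypothetical, and the real implementation works on token ids and drops old turns that do not fit the context window.

```python
def build_chatml_prompt(query, history=None, system="You are a helpful assistant."):
    """Render a system prompt, past turns, and a new query as ChatML text.

    history is a list of (user_query, assistant_response) pairs, the same
    shape that model.chat returns. Simplified sketch: no tokenization and
    no truncation of old turns.
    """
    im_start, im_end = "<|im_start|>", "<|im_end|>"
    parts = ["{}system\n{}{}".format(im_start, system, im_end)]
    for past_query, past_response in history or []:
        parts.append("{}user\n{}{}".format(im_start, past_query, im_end))
        parts.append("{}assistant\n{}{}".format(im_start, past_response, im_end))
    parts.append("{}user\n{}{}".format(im_start, query, im_end))
    # The prompt ends with an open assistant turn for the model to complete.
    parts.append("{}assistant\n".format(im_start))
    return "\n".join(parts)


history = [("Hello", "Hello! Glad to help.")]
print(build_chatml_prompt("Give the story a title", history=history))
```

This is why passing history=history in later turns lets the model see the earlier conversation, while history=None starts a fresh one.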

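VI. Fine-tune the model

Fine-tuning is best done with the official scripts, which are described in the official documentation.

1. Prepare the fine-tuning dataset

The dataset here uses the conversation format that the official fine-tuning script expects: a JSON list of samples, each with an id and a conversations list of from/value turns.

Create the dataset file tran.json

Its content is as follows: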

[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好"
      },
      {
        "from": "assistant",
        "value": "你好，我是通通通。"
      }
    ]
  }
]
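Writing many samples by hand is error-prone, so the file can also be generated. The sketch below builds the same structure from (question, answer) pairs and checks its shape before writing. The function names build_dataset, validate_dataset, and write_dataset are hypothetical helpers of mine, not part of the Qwen repository.

```python
import json


def build_dataset(pairs, id_prefix="identity"):
    """Turn (user_text, assistant_text) pairs into the conversation format."""
    samples = []
    for index, (user_text, assistant_text) in enumerate(pairs):
        samples.append({
            "id": "{}_{}".format(id_prefix, index),
            "conversations": [
                {"from": "user", "value": user_text},
                {"from": "assistant", "value": assistant_text},
            ],
        })
    return samples


def validate_dataset(samples):
    """Raise ValueError if a sample does not match the expected shape."""
    for sample in samples:
        turns = sample.get("conversations")
        if "id" not in sample or not turns:
            raise ValueError("sample needs 'id' and non-empty 'conversations'")
        for turn in turns:
            if turn.get("from") not in ("user", "assistant") or "value" not in turn:
                raise ValueError("bad turn in sample {}".format(sample["id"]))


def write_dataset(samples, path):
    """Validate the samples and write them as UTF-8 JSON."""
    validate_dataset(samples)
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps Chinese text readable in the file.
        json.dump(samples, fh, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    write_dataset(build_dataset([("你好", "你好，我是通通通。")]), "tran.json")
```

Run as a script, this reproduces the single-sample tran.json shown above.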

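2. Install the packages needed for fine-tuning

If you installed the CUDA build of PyTorch, you also need to install the CUDA Toolkit.

Download and install it from the official NVIDIA website.

Then, inside the virtual environment you run in, install deepspeed: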

pip install deepspeed

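Note: installing deepspeed directly on Windows fails, because one of the libraries it needs is Linux-only and does not support Windows. It does install under WSL, where the CUDA Toolkit also needs to be installed. If you only run CPU inference, skip the CUDA Toolkit.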
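Since the install only works on Linux or WSL, a small check can tell which of the two environments a script is actually running in. The sketch below relies on the fact that WSL kernels report a release string containing "microsoft". The function name detect_platform and its optional arguments are my own additions for illustration.

```python
import platform


def detect_platform(system=None, release=None):
    """Classify the host as 'windows', 'wsl', 'linux', or 'other'.

    The arguments exist only so the logic can be exercised without
    changing machines; by default the real host is inspected.
    """
    system = system or platform.system()
    release = release or platform.release()
    if system == "Windows":
        return "windows"
    if system == "Linux":
        # WSL kernels include "microsoft" in their release string.
        return "wsl" if "microsoft" in release.lower() else "linux"
    return "other"


print(detect_platform())
```

If this prints windows, switch to a WSL shell before running pip install deepspeed.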

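3. Prepare the fine-tuning script

Next, I recommend running the training script that Qwen provides rather than writing your own.

I trained on a single CUDA GPU, so I used the script finetune_lora_single_gpu.sh

For convenience, I edited the paths in the script directly: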

#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1

MODEL="<model directory>" # Set the path if you do not want to load from huggingface directly
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="<path to the dataset JSON file>"
...

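4. Start fine-tuning

Run the script.

If you run it under WSL, you do not need to copy files from Windows 11 into WSL: WSL automatically mounts the Windows drives under /mnt. For example, the D: drive is at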

/mnt/d

bash ./finetune_lora_single_gpu.sh
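The /mnt mapping described above is mechanical, so Windows paths for the model and dataset can be converted in code before they go into the script. The sketch below is a minimal converter. The function name windows_to_wsl and the example path are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path):
    """Map a Windows path with a drive letter to its /mnt/<drive>/... form."""
    win_path = PureWindowsPath(path)
    drive = win_path.drive  # e.g. "D:"
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError("expected a path with a drive letter")
    # parts[0] is the drive anchor; the rest are folder and file names.
    return str(PurePosixPath("/mnt", drive[0].lower(), *win_path.parts[1:]))


print(windows_to_wsl(r"D:\models\Qwen-1_8B-Chat"))  # /mnt/d/models/Qwen-1_8B-Chat
```

WSL also ships a wslpath command that does the same conversion from the shell.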

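Then comes a long wait. If nothing fails, a directory named output_qwen is created in the directory you ran the script from. It holds the fine-tuning outputs and configuration.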
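A quick way to confirm that the run produced a usable result is to look at the adapter configuration. The sketch below assumes the LoRA script saves a standard PEFT adapter, which writes an adapter_config.json file into the output directory. The function name read_adapter_info is hypothetical.

```python
import json
from pathlib import Path


def read_adapter_info(output_dir):
    """Return key fields from adapter_config.json in a PEFT output directory."""
    config_path = Path(output_dir) / "adapter_config.json"
    if not config_path.is_file():
        raise FileNotFoundError("no adapter_config.json in {}".format(output_dir))
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    # .get() is used because the exact key set can vary between PEFT versions.
    return {
        "base_model": config.get("base_model_name_or_path"),
        "peft_type": config.get("peft_type"),
        "rank": config.get("r"),
    }
```

Calling read_adapter_info("output_qwen") shows which base model path the adapter points to. That path has to stay valid, because loading the adapter later also loads the base model from it.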

5. Run inference with the fine-tuned weights

The official code below worked without any pitfalls when I tested it:

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_dir = '<fine-tuning output directory>'

# Load the base model together with the LoRA adapter saved in model_dir
model = AutoPeftModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    trust_remote_code=True
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_dir, revision='master', trust_remote_code=True)

response, history = model.chat(tokenizer, "你好", history=None)
print(response)