Deploying ChatGLM-6B on a GPU and on Colab

Momosaki

Last modified 2023-10-24 12:06:28



First published 2023-10-24 12:01:41

Article link: https://blog.csdn.net/Momosaki/article/details/134008733


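This post assumes that PyTorch and the other prerequisites are already set up, so it does not explain them. It is only a personal record of installing ChatGLM-6B.

Installing ChatGLM-6B on a GPU

Installing the environment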

pip install -r requirements.txt

The recommended transformers version is 4.27.1, but in theory any version not lower than 4.23.1 will do.
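To check an installed transformers version against the range above, a version comparison along these lines may help. This is a minimal sketch: the helper names `parse_version`, `meets_minimum`, and `installed_transformers_version` are made up for illustration, and only the first three numeric components of the version string are compared.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '4.27.1' into (4, 27, 1), keeping at most three numeric parts."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def meets_minimum(installed, minimum="4.23.1"):
    """True when the installed version is not lower than the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    else:
        print(found, "ok" if meets_minimum(found) else "older than 4.23.1")
```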

Downloading the model from the Hugging Face Hub

Downloading the model from the Hugging Face Hub requires installing Git LFS first, then running

git clone https://huggingface.co/THUDM/chatglm-6b

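I could not get the clone to finish, so in the end I downloaded the files directly from the Hugging Face page: [Hugging Face/ChatGLM-6B](https://huggingface.co/THUDM/chatglm-6b/tree/main)

~~Even so, the download kept getting interrupted by network fluctuations and never actually finished. This step is much faster when deploying on Colab.~~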


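Once the download is done, replace THUDM/chatglm-6b in the code with the path to the local chatglm-6b folder.

Note that the path has to be written as D:\\深度学习\\ChatGLM-6B\\chatglm-6b, that is, with \\ as the separator. When I wrote it with /, I got the Windows error "The filename, directory name, or volume label syntax is incorrect."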
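One way to avoid typing doubled backslashes by hand is to let `pathlib` produce the Windows-style string, or to use a raw string. The sketch below reuses the path from this post purely as an example. It only shows how to obtain the backslash form, and makes no claim about which code paths accept forward slashes.

```python
from pathlib import PureWindowsPath

# Example path from this post; replace it with your own model folder.
model_dir = PureWindowsPath("D:/深度学习/ChatGLM-6B/chatglm-6b")

# str() on a Windows path always uses backslashes as separators.
model_path = str(model_dir)

# A raw string is another way to write the same path without doubling "\".
raw_path = r"D:\深度学习\ChatGLM-6B\chatglm-6b"

print(model_path == raw_path)
```

Either `model_path` or `raw_path` can then be passed wherever the post uses the local model path.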

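Calling the model from code

Following the official documentation, the ChatGLM-6B model can be called with the code below to generate a conversation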

>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)
晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:

1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。
2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。
3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。
4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。
5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。
6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。

如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。

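In addition, the model implementation is still changing. To pin the implementation you use and guarantee compatibility, add the revision='v1.1.0' argument to the from_pretrained call. v1.1.0 is the latest version at the time of writing; the full version list is in the Change Log.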
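As a rough illustration of pinning the revision, the arguments can be collected once and passed to both loaders. This is a sketch rather than the official snippet: `pinned_kwargs` and `load_pinned` are hypothetical helper names, and actually calling `load_pinned` needs transformers, network access, and a CUDA GPU.

```python
def pinned_kwargs(revision="v1.1.0"):
    """Keyword arguments shared by the tokenizer and model loaders."""
    return {"trust_remote_code": True, "revision": revision}


def load_pinned(model_id="THUDM/chatglm-6b", revision="v1.1.0"):
    """Load the tokenizer and model at a fixed revision."""
    # Imported lazily so pinned_kwargs() can be used without transformers.
    from transformers import AutoModel, AutoTokenizer

    kwargs = pinned_kwargs(revision)
    tokenizer = AutoTokenizer.from_pretrained(model_id, **kwargs)
    model = AutoModel.from_pretrained(model_id, **kwargs).half().cuda()
    return tokenizer, model.eval()


# Usage (not executed here because it downloads the model):
# tokenizer, model = load_pinned()
```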

Running the model code

In the downloaded source code, open web_demo.py and change the model path to wherever the chatglm-6b model is stored on your machine

from transformers import AutoModel, AutoTokenizer
import gradio as gr
import mdtex2html

tokenizer = AutoTokenizer.from_pretrained("D:\\深度学习\\ChatGLM-6B\\chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("D:\\深度学习\\ChatGLM-6B\\chatglm-6b", trust_remote_code=True).half().quantize(8).cuda()

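Choose how to load the model according to your GPU's actual VRAM (a mismatch will not damage the card, but loading typically fails with a CUDA out-of-memory error)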

# 6 GB of VRAM: use 4-bit quantization
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(4).cuda()

# 10 GB of VRAM: use 8-bit quantization
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(8).cuda()

# 14 GB of VRAM or more: no quantization needed
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().cuda()
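The three cases above can be folded into one helper that picks the bit width from the amount of VRAM. This is only a sketch using the thresholds listed in this post: `pick_quantization` and `load_for_vram` are hypothetical names, and `vram_gb` is assumed to be the card's nominal size in GB, supplied by you.

```python
def pick_quantization(vram_gb):
    """Map VRAM in GB to a quantization bit width.

    Returns None when no quantization is needed (14 GB or more),
    8 for 10 GB or more, 4 for 6 GB or more, and raises below that.
    """
    if vram_gb >= 14:
        return None
    if vram_gb >= 10:
        return 8
    if vram_gb >= 6:
        return 4
    raise ValueError(f"{vram_gb} GB is below the 6 GB listed for 4-bit")


def load_for_vram(model_path, vram_gb):
    """Load the model with the quantization chosen for this much VRAM."""
    # Imported lazily; calling this needs transformers and a CUDA GPU.
    from transformers import AutoModel

    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half()
    bits = pick_quantization(vram_gb)
    if bits is not None:
        model = model.quantize(bits)
    return model.cuda().eval()
```

For example, `load_for_vram("model", 8)` would take the 4-bit branch.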

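It was only at this step that I found out my own GPU could not run the model, so I switched to Colab.

Deploying on Colab

Mounting Google Drive and checking the GPU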

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Change into the Google Drive folder
%cd /content/drive/MyDrive

# Check whether there is enough GPU memory
!nvidia-smi

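Downloading the ChatGLM project and model files

# Clone the ChatGLM project from GitHub
# Re-running this cell is harmless: only the first run downloads anything. Later runs fail with an error because 'ChatGLM-6B' already exists and is not empty, so nothing is downloaded twice
!git clone https://github.com/THUDM/ChatGLM-6B.git

# After the clone finishes, create a model folder inside the project's root folder
# (the ChatGLM-6B folder that was just downloaded) to hold the ChatGLM model files
!mkdir -p /content/drive/MyDrive/ChatGLM-6B/model
# Change into the model folder before downloading the model files
%cd /content/drive/MyDrive/ChatGLM-6B/model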
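If the error on re-running the clone cell is a nuisance, the same steps can be written so that they are safe to repeat. The following is a sketch only: `ensure_repo` is a hypothetical helper, and the `run` parameter exists just so the clone command can be swapped out.

```python
import os
import subprocess

REPO_URL = "https://github.com/THUDM/ChatGLM-6B.git"


def ensure_repo(parent_dir, run=subprocess.run):
    """Clone ChatGLM-6B into parent_dir unless it exists, then create model/."""
    repo_dir = os.path.join(parent_dir, "ChatGLM-6B")
    if not os.path.isdir(repo_dir):
        # Only reached on the first run, so no "already exists" error later.
        run(["git", "clone", REPO_URL, repo_dir], check=True)
    model_dir = os.path.join(repo_dir, "model")
    os.makedirs(model_dir, exist_ok=True)
    return model_dir


# Usage in Colab (not executed here):
# model_dir = ensure_repo("/content/drive/MyDrive")
```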

Model file download commands

This part only needs to be run once; running it again downloads the files again

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/LICENSE

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/MODEL_LICENSE

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/README.md

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/config.json

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/configuration_chatglm.py

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/modeling_chatglm.py

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/quantization.py

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/ice_text.model

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/quantization_kernels.c

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/quantization_kernels_parallel.c

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/tokenization_chatglm.py

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/tokenizer_config.json

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/pytorch_model.bin

!wget https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/.gitattributes
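The same file list can also be driven from a short Python loop that skips files already present, so re-running the cell does not fetch everything again. This is a sketch under assumptions: `missing_files` and `download_missing` are hypothetical helpers, and a file counts as "present" if it merely exists, so a partially downloaded file would have to be deleted by hand first.

```python
import os
import urllib.request

BASE_URL = "https://huggingface.co/THUDM/chatglm-6b-int4/resolve/main/"
FILES = [
    "LICENSE", "MODEL_LICENSE", "README.md", "config.json",
    "configuration_chatglm.py", "modeling_chatglm.py", "quantization.py",
    "ice_text.model", "quantization_kernels.c",
    "quantization_kernels_parallel.c", "tokenization_chatglm.py",
    "tokenizer_config.json", "pytorch_model.bin", ".gitattributes",
]


def missing_files(target_dir, names=FILES):
    """Names from the list that do not exist yet in target_dir."""
    return [n for n in names if not os.path.exists(os.path.join(target_dir, n))]


def download_missing(target_dir, fetch=urllib.request.urlretrieve):
    """Download every missing file into target_dir and return their names."""
    todo = missing_files(target_dir)
    for name in todo:
        fetch(BASE_URL + name, os.path.join(target_dir, name))
    return todo


# Usage in Colab (not executed here because it downloads several GB):
# download_missing("/content/drive/MyDrive/ChatGLM-6B/model")
```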

Configuring the runtime environment

# Change into the ChatGLM-6B folder to set up the runtime environment
%cd /content/drive/MyDrive/ChatGLM-6B

!pip install protobuf==3.20.0 transformers==4.27.1 icetk cpm_kernels

!pip install -r requirements.txt

Testing whether the deployment succeeded

from transformers import AutoTokenizer,AutoModel
tokenizer = AutoTokenizer.from_pretrained('model',trust_remote_code=True)
model = AutoModel.from_pretrained('model',trust_remote_code=True).half().quantize(4).cuda()
response,history = model.chat(tokenizer,'你好',history=[])
print(response)
response,history = model.chat(tokenizer,'晚上睡不着应该怎么办',history=history)
print(response)

Result

[Image: screenshot of the output]
