Deploying ChatGLM-6B on Windows

zsh_abc

Last modified 2023-10-07 22:10:54


Tags: python

First published 2023-10-07 18:31:12

Article link: https://blog.csdn.net/qq_45437316/article/details/133650747


Private (local) deployment of ChatGLM on Windows
Minimum GPU memory required: 6 GB

Environment:
conda, docker, a Python venv, or a direct system-wide install all work; pick whichever suits your needs.

In cmd, run nvcc -V to check the installed CUDA version, then install the PyTorch build that matches it.
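If you would rather read the CUDA version from a script than by eye, the sketch below pulls the release number out of the nvcc -V output. The helper names parse_cuda_release and installed_cuda_release are made up for this example, and the sample string only imitates the shape of the line nvcc prints; its version numbers are illustrative.

```python
import re
import subprocess


def parse_cuda_release(nvcc_output):
    """Return the CUDA release (for example "11.8") found in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def installed_cuda_release():
    """Run `nvcc -V` and parse its output; returns None if nvcc is not on PATH."""
    try:
        result = subprocess.run(
            ["nvcc", "-V"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    return parse_cuda_release(result.stdout)


# Illustrative sample line, not captured from a real machine.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(parse_cuda_release(sample))
```

The release number is what you compare against the CUDA options offered on the PyTorch install page.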

Install PyTorch
Official site: https://pytorch.org/

Verify that the installation succeeded
1. Start the Python interpreter and run:

import torch 
torch.cuda.is_available()

2. Or run the check directly from the command line:

python -c "import torch; print(torch.cuda.is_available())"
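A True/False answer does not tell you which GPU PyTorch sees or how much memory it has, and that matters for the quantization choice later on. Here is a slightly fuller check, as a minimal sketch; cuda_report is a name invented for this example, and it assumes the GPU of interest is device 0.

```python
import importlib.util


def cuda_report():
    """Return a short, human-readable summary of the PyTorch/CUDA setup."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    # torch.version.cuda is None for CPU-only builds of PyTorch.
    lines = [f"torch {torch.__version__}, built for CUDA {torch.version.cuda}"]
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)  # assumption: use GPU 0
        lines.append(f"GPU 0: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
    else:
        lines.append("CUDA is not available to PyTorch")
    return "\n".join(lines)


print(cuda_report())
```

If the report shows a CPU-only build, reinstall PyTorch with the CUDA option selected on the official site.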

Download the source code

git clone https://github.com/THUDM/ChatGLM-6B.git

Download the model files
https://huggingface.co/THUDM/chatglm-6b/tree/main
https://cloud.tsinghua.edu.cn/d/fb9f16d6dc8f482596c2/

Create a folder named model inside the ChatGLM-6B directory and put all of the model and configuration files into it.
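A missing file in the model folder usually only shows up as a confusing error at load time, so a quick check beforehand can save a restart. The sketch below is one way to do it. The function names are hypothetical, config.json is the only file name assumed by default, and the weight-file pattern pytorch_model*.bin is an assumption you should adjust to match what the download page actually lists.

```python
from pathlib import Path


def missing_model_files(model_dir, required=("config.json",)):
    """Return the names from `required` that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


def has_weight_shards(model_dir, pattern="pytorch_model*.bin"):
    """Return True if at least one file in model_dir matches the (assumed) weight pattern."""
    return any(Path(model_dir).glob(pattern))


def check_model_dir(model_dir="model"):
    """Print a short status line for the folder used in this article."""
    missing = missing_model_files(model_dir)
    if missing:
        print("Missing files:", ", ".join(missing))
    elif not has_weight_shards(model_dir):
        print("No weight files found; is the download complete?")
    else:
        print("Model folder looks complete")
```

Run check_model_dir() from inside the ChatGLM-6B directory so that the relative path model resolves to the folder created above.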

In web_demo.py, change the model path to the local folder:

tokenizer = AutoTokenizer.from_pretrained("model", trust_remote_code=True)
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().cuda()

Install the dependencies

pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

If GPU memory is limited, load a quantized model

# With 6 GB of GPU memory, use 4-bit quantization
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(4).cuda()

# With 10 GB of GPU memory, use 8-bit quantization
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(8).cuda()

# With 14 GB or more, no quantization is needed; my card has 16 GB, so this is the line I use
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().cuda()
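The three variants above differ only in the quantize call, so the choice can be made from the amount of GPU memory. The sketch below encodes the 6 / 10 / 14 GB thresholds from this article; pick_quantization and load_model are names invented for the example, and load_model simply chains the same calls shown above.

```python
def pick_quantization(vram_gb):
    """Map GPU memory in GB to a quantization setting: None (FP16), 8, or 4 bits."""
    if vram_gb >= 14:
        return None  # enough memory to run in half precision without quantization
    if vram_gb >= 10:
        return 8
    if vram_gb >= 6:
        return 4
    raise ValueError(f"{vram_gb} GB is below the 6 GB minimum for ChatGLM-6B")


def load_model(model_path, bits):
    """Load the model with the same call chain as in the article, quantizing if bits is set."""
    # Imported here so pick_quantization can be used without transformers installed.
    from transformers import AutoModel

    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half()
    if bits is not None:
        model = model.quantize(bits)
    return model.cuda()


for gb in (6, 10, 16):
    print(gb, "GB ->", pick_quantization(gb))
```

In web_demo.py this would replace the single model = ... line with model = load_model("model", pick_quantization(16)), using your own card's memory in place of 16.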

Launch the web UI

python web_demo.py

Calling ChatGLM through the API

Install the dependencies

pip install fastapi uvicorn

Start the API server

python api.py

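1. Using curl (the line continuations and single quotes below are bash syntax, so run this in a bash-style shell such as Git Bash rather than cmd)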

curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'

2. Using Python requests

import requests

url = "http://127.0.0.1:8000"
headers = {
    'Content-Type': 'application/json'
}
data = {
    'prompt': '你好',
    'history': []
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    # Parse the JSON body and print the model's reply from the "response" field
    rep = response.json()["response"]
    print(rep)
else:
    print("Request failed, status code:", response.status_code)
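The request above always sends an empty history, so every call starts a fresh conversation. For multi-turn chat, the history has to be passed back on each request. The sketch below shows one way to do that. It uses the standard library's urllib instead of requests only so that it has no extra dependency, the helper names build_request and chat are hypothetical, and it assumes the server's JSON reply carries an updated "history" field next to "response"; check the repository's api.py to confirm.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000"


def build_request(prompt, history):
    """Build the POST request with the same JSON body as the examples above."""
    body = json.dumps({"prompt": prompt, "history": history}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, history=None):
    """Send one turn and return (reply_text, updated_history)."""
    request = build_request(prompt, history or [])
    with urllib.request.urlopen(request, timeout=120) as reply:
        data = json.loads(reply.read().decode("utf-8"))
    # Assumption: the reply includes the updated "history"; fall back to the old one if not.
    return data["response"], data.get("history", history or [])
```

With the API server running, a two-turn exchange would be answer, history = chat("你好") followed by answer, history = chat("Tell me more", history).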

Summary: pull the source, set up the environment, start the program.
