Deploying your own ChatGLM model locally, Part 01: getting the model running on your machine

Latest recommended article published on 2024-05-17 16:47:54

Author: 起个新名字8859-1

Tags: artificial intelligence, nlp

Article link: https://blog.csdn.net/qq_37686944/article/details/138017095

I. Download

Demo: https://github.com/THUDM/ChatGLM3?tab=readme-ov-file
Model (download as needed): https://huggingface.co/THUDM/chatglm3-6b

II. Install dependencies

1. Inside the virtual environment (.venv), install the packages listed in requirements.txt

(.venv) E:\Agi\ChatGLM3-main> pip install -r requirements.txt
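
Installing into the global Python instead of .venv is an easy mistake, so it can help to confirm which interpreter is active before running pip. The following is a minimal sketch using only the standard library; the helper name `in_virtualenv` is an illustrative choice and not part of the ChatGLM3 repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

If this prints False, activate .venv first and then rerun the pip command.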

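2. Install PyTorch (GPU build): https://pytorch.org/get-started/locally/#windows-pip

Check your CUDA version first; if you do not know it, run nvidia-smi on the command line. If it is 12.x, install with the following command: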

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
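
After the install finishes, a quick check from Python shows whether the CUDA build was actually picked up. This is a rough sketch rather than an official procedure; `describe_torch` is a hypothetical helper name, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch() -> dict:
    """Summarize the installed PyTorch build, or report that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # None here means the build was not compiled against CUDA.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `cuda_build` is None or `cuda_available` is False, the installed wheel or the driver is the first thing to recheck.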

III. Run the demo

1. Change the model path in cli_demo.py (for example, if the model is stored at E:/Agi/chatglm-6b)

MODEL_PATH = os.environ.get('MODEL_PATH', 'E:/Agi/chatglm-6b')
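
Because the path is read from the MODEL_PATH environment variable first, an existing variable silently overrides the edited default. The sketch below mirrors that lookup and checks that the directory looks like a downloaded model. The helper names are hypothetical, and treating the presence of config.json as a sufficient sanity check is an assumption.

```python
import os
from pathlib import Path


def resolve_model_path(default: str) -> Path:
    """Mirror the demo's lookup: MODEL_PATH wins, otherwise the default."""
    return Path(os.environ.get("MODEL_PATH", default))


def missing_files(model_dir: Path, required=("config.json",)) -> list:
    """Return the names in `required` that are absent from model_dir."""
    return [name for name in required if not (model_dir / name).is_file()]


if __name__ == "__main__":
    path = resolve_model_path("E:/Agi/chatglm-6b")
    print("model directory:", path)
    print("missing files:", missing_files(path))
```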

2. Run the main program and chat with ChatGLM. If everything works at this point, you can close this post; if you hit an error, read on to Part IV.

IV. Troubleshooting

Problem 1

'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'

Solution

The transformers version specified in requirements.txt is too new. Consider lowering it from >=4.39.x to ==4.33.2.
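
To see which versions are actually installed in the environment before and after changing the pin, the standard library's `importlib.metadata` is enough. A small sketch; `installed_version` is an illustrative helper name.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("transformers", "tokenizers", "torch"):
        print(name, "->", installed_version(name))
```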

Problem 2

Cannot install -r requirements.txt (line 4) and tokenizers>=0.15.0 because these package versions have conflicting dependencies

Solution

Follow the hint and lower the tokenizers version. If transformers is pinned to ==4.33.2, consider constraining tokenizers to <0.14.0.
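
The two pin changes can also be applied to the requirements text programmatically. The sketch below is one possible approach, not part of the original workflow: `apply_pins` is a hypothetical helper, the sample text is an illustrative stand-in rather than the real file contents, and the simple regex drops extras or environment markers on the lines it rewrites.

```python
import re


def apply_pins(requirements_text: str, pins: dict) -> str:
    """Replace the version specifier of each package named in `pins`."""
    out = []
    for line in requirements_text.splitlines():
        # Package name = leading run of letters, digits, '.', '_' or '-'.
        match = re.match(r"\s*([A-Za-z0-9._-]+)", line)
        name = match.group(1).lower() if match else None
        if name in pins:
            out.append(f"{match.group(1)}{pins[name]}")
        else:
            # Comments, blank lines, and other packages pass through unchanged.
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Illustrative sample, not the actual requirements.txt.
    sample = "transformers>=4.39.0\ntokenizers>=0.15.0\nother-pkg>=1.0"
    pins = {"transformers": "==4.33.2", "tokenizers": "<0.14.0"}
    print(apply_pins(sample, pins))
```

After rewriting the file, rerun pip install -r requirements.txt inside .venv so the resolver sees the new constraints.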

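Problem 3

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Solution

1. PyTorch was not installed correctly
Check that the installed PyTorch build is the right one. This error typically appears when half-precision (FP16) tensors are processed on the CPU, which is what happens when a CPU-only build was installed instead of the CUDA build.
2. The graphics card does not support it
Check that the precision of the downloaded model matches the compute capability of the GPU. For example, the RTX 4060 has compute capability 8.9 and supports the FP16 data type.
https://developer.nvidia.com/cuda-gpus
https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix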
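
Both checks can be scripted. The sketch below reports whether a CUDA device is visible and what its compute capability is, then suggests a precision. The rule "float16 on a CUDA GPU, float32 otherwise" is a simplifying assumption for illustration, and both function names are hypothetical.

```python
import importlib.util


def pick_dtype_name(cuda_available: bool) -> str:
    """Suggest half precision on a CUDA GPU and full precision otherwise."""
    return "float16" if cuda_available else "float32"


def detect_device():
    """Return (cuda_available, compute_capability or None)."""
    if importlib.util.find_spec("torch") is None:
        return False, None
    import torch

    if not torch.cuda.is_available():
        return False, None
    # get_device_capability returns a (major, minor) tuple, e.g. (8, 9).
    return True, torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    available, capability = detect_device()
    print("CUDA available:", available, "| compute capability:", capability)
    print("suggested dtype:", pick_dtype_name(available))
```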

Problem 4

expected scalar type Half but found Float

Solution

In cli_demo.py, where the model is loaded, set the data type to half precision by adding .half() at initialization:

model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, device_map="auto").half().eval()

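Problem 5

ValueError: not enough values to unpack

Solution

The error means the loop unpacks more values per iteration than stream_chat yields. Change the number of values unpacked from stream_chat from three to two by dropping past_key_values. See https://juejin.cn/post/7293786247654441001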

for response, history in model.stream_chat(tokenizer, query, history=history, top_p=1,
                                           temperature=0.01,
                                           past_key_values=past_key_values,
                                           return_past_key_values=True):
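
An alternative to editing the unpacking by hand is to read the yielded items by index, so the loop works whether the stream yields two or three values. This is a sketch of that idea, not code from the demo: `iter_responses` is a hypothetical wrapper, and `fake_stream` is a stand-in for model.stream_chat used only for illustration.

```python
def iter_responses(stream):
    """Yield (response, history) from a stream of 2- or 3-item tuples."""
    for item in stream:
        # Index instead of unpacking so an extra past_key_values entry is ignored.
        yield item[0], item[1]


def fake_stream(with_cache: bool):
    """Hypothetical stand-in for model.stream_chat, for illustration only."""
    for text in ("Hel", "Hello"):
        history = [("hi", text)]
        yield (text, history, None) if with_cache else (text, history)


if __name__ == "__main__":
    for response, history in iter_responses(fake_stream(with_cache=False)):
        print(response)
```

In cli_demo.py the same wrapper would be applied around the model.stream_chat(...) call, keeping the loop body unchanged.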