Large Model Deployment: Running chatGLM2-6b-int4 on CPU

hzp666

Last modified 2024-05-15 11:56:23

First published 2023-07-25 16:20:35

Article link: https://blog.csdn.net/hzp666/article/details/131918360

First, a look at the result (screenshot in the original post).

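1. Download the code

1.1 Main code

Main code repository:

GitHub - THUDM/ChatGLM2-6B: ChatGLM2-6B: An Open Bilingual Chat LLM (https://github.com/THUDM/ChatGLM2-6B)

Save it to a local folder. Any location works, but the path should preferably contain no Chinese characters and no spaces.

eg: C:\work\20230724GLM\ChatGLM2-6B-main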
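The path advice above can be checked before downloading anything. The following is a minimal sketch; the helper name is_safe_path is hypothetical and only tests for non-ASCII characters and spaces, which is an assumption about what "safe" means here.

```python
def is_safe_path(path: str) -> bool:
    """Return True if the path is pure ASCII and contains no spaces."""
    return path.isascii() and " " not in path


# The example folder from this article passes the check.
print(is_safe_path(r"C:\work\20230724GLM\ChatGLM2-6B-main"))  # True
# A path with Chinese characters and a space does not.
print(is_safe_path(r"C:\我的 项目\ChatGLM2-6B-main"))  # False
```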

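End result: the ChatGLM2-6B source files are in that folder.

1.2 Download the int4 quantized weights

Create a new local folder, for example next to the main code folder.

eg:C:\work\20230724GLM\chatglm2-6b-int4

Download the weights from the Tsinghua (THUDM) repository:

THUDM/chatglm2-6b-int4 at main

Download every file in the repository, 11 files in total.

PS: two more files are needed, quantization_kernels.c and quantization_kernels_parallel.c. Both must be placed inside the chatglm2-6b-int4 weights folder.

These two files ship with the first-generation chatglm-6b-int4 but are not included in chatglm2-6b-int4.

Download them from:

THUDM/chatglm-6b-int4 at main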
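Since the two kernel sources are easy to forget, a quick check of the weights folder can help. This is a minimal sketch; the function name missing_kernel_files is hypothetical, and the folder path is the example used in this article.

```python
from pathlib import Path

# The two extra C sources that must sit next to the int4 weights.
KERNEL_FILES = ("quantization_kernels.c", "quantization_kernels_parallel.c")


def missing_kernel_files(model_dir):
    """Return the kernel source files that are absent from model_dir."""
    folder = Path(model_dir)
    return [name for name in KERNEL_FILES if not (folder / name).is_file()]


missing = missing_kernel_files(r"C:\work\20230724GLM\chatglm2-6b-int4")
print("missing:", missing or "none")
```

An empty result means both files are in place; any name it prints still has to be copied over from the chatglm-6b-int4 repository.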

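End result: the weights folder contains the 11 downloaded files plus the two kernel source files.

2. Install dependencies

The main code folder contains a requirements.txt file. Every package listed in it needs to be installed.

There are a few pitfalls here.

2.1 Installing torch: the download fails frequently, so a proxy or VPN may be needed.

2.2 Installing packages through the PyCharm package manager is slow and error-prone.

Workaround: after opening the main code project, click Terminal in the bottom-left corner and install the packages with pip install.

Some of the packages are available here:

https://download.csdn.net/download/hzp666/88093824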
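The terminal install can also be driven from Python, which guarantees that pip targets the same interpreter the project uses. This is a minimal sketch; install_requirements and its dry_run flag are hypothetical names, and by default the function only builds the command without running it.

```python
import subprocess
import sys


def install_requirements(requirements="requirements.txt", dry_run=True):
    """Build the pip command for the current interpreter; run it unless dry_run."""
    cmd = [sys.executable, "-m", "pip", "install", "-r", requirements]
    if not dry_run:
        # Raises CalledProcessError if pip exits with a non-zero status.
        subprocess.check_call(cmd)
    return cmd


# Dry run: show the command that would be executed.
print(" ".join(install_requirements()))
```

Calling install_requirements(dry_run=False) from the main code folder performs the actual installation.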

2.3 Install TDM-GCC

gcc is needed to compile quantization_kernels.c and quantization_kernels_parallel.c.

https://download.csdn.net/download/hzp666/88101341

If the program prints the following when it runs:

No compiled kernel found.
Compiling kernels : C:\Users\DuFei\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\DuFei\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.c -shared -o C:\Users\DuFei\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.so
Kernels compiled : C:\Users\DuFei\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers

then either gcc is not installed or those two files are missing.
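Whether gcc is actually reachable can be verified from the same Python environment before loading the model. This is a minimal sketch using only the standard library; find_gcc is a hypothetical helper name.

```python
import shutil
import subprocess


def find_gcc():
    """Return the full path of gcc if it is on PATH, otherwise None."""
    return shutil.which("gcc")


gcc_path = find_gcc()
if gcc_path is None:
    print("gcc not found on PATH; install TDM-GCC and reopen the terminal")
else:
    # Print the first line of the version banner as a sanity check.
    result = subprocess.run([gcc_path, "--version"], capture_output=True, text=True)
    print(result.stdout.splitlines()[0] if result.stdout else gcc_path)
```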

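3. Modify the code

The project directory contains four entry-point files: API mode, CLI (command line), web, and a Streamlit-based web version. The Streamlit version, web2_demo.py, has to be started with streamlit run web2_demo.py. (The upstream repository names this file web_demo2.py; use whichever name your copy has.)

Only two lines of code need to change in each of the four files, and the change and its location are the same in every file. There is no need to edit all of them; change only the file for the mode you intend to use.

3.1 Change the model path

Point the path at the folder that holds the int4 quantized weights.

3.2 Switch to CPU mode

Before: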

model = AutoModel.from_pretrained("C:\\work\\20230724GLM\\chatglm2-6b-int4", trust_remote_code=True).cuda()

After:

model = AutoModel.from_pretrained("C:\\work\\20230724GLM\\chatglm2-6b-int4", trust_remote_code=True).float()
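Put together, the two changes can be tried outside the demo scripts. The following is a minimal sketch, not the demo code itself: load_cpu_model is a hypothetical helper, the tokenizer line is assumed to take the same local path as the model line, and model.chat is the chat method provided by the model's remote code. The load is skipped if the example folder does not exist.

```python
import os

# Example weights folder from this article; adjust to your own location.
MODEL_DIR = r"C:\work\20230724GLM\chatglm2-6b-int4"


def load_cpu_model(model_dir):
    """Load tokenizer and int4 model from a local folder for CPU inference."""
    # Imported here so this file can be read without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    # .float() instead of .cuda() keeps the model on the CPU.
    model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).float()
    return tokenizer, model.eval()


if os.path.isdir(MODEL_DIR):
    tokenizer, model = load_cpu_model(MODEL_DIR)
    response, history = model.chat(tokenizer, "Hello", history=[])
    print(response)
else:
    print("model folder not found:", MODEL_DIR)
```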

4. Start the program

In the terminal, run: streamlit run web2_demo.py