How to Run ChatGLM-6B Inference on the MLU370

小军军军军军军

Article link: https://blog.csdn.net/xiaojunjun200211/article/details/131392259

Note: for how to log in to and use the MLU370, see the previous post.

Contents

Preface
I. MLU370
II. Procedure
Summary

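Preface

There are various types of pretraining architectures, including autoencoding models (e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5). However, no single pretraining framework performs best across all tasks in the three main categories: natural language understanding (NLU), unconditional generation, and conditional generation. We propose a General Language Model (GLM) based on autoregressive blank infilling to address this challenge. By adding 2D positional encodings and allowing spans to be predicted in arbitrary order, GLM improves blank-filling pretraining, which yields performance gains over BERT and T5 on NLU tasks. Meanwhile, GLM can be pretrained for different types of tasks by varying the number and lengths of the blanks. On a wide range of tasks across NLU, conditional generation, and unconditional generation, GLM outperforms BERT, T5, and GPT given the same model size and data, and achieves the best performance from a single pretrained model with 1.25x the parameters of BERT Large, demonstrating its generalizability to different downstream tasks.
Quoted from the abstract of the GLM paper, which describes the architecture ChatGLM-6B is built on.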

I. MLU370

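This walkthrough uses an MLU370-M8 card as the example. To try out an M8 card yourself, see the previous post for how to obtain trial resources.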

II. Procedure

1. Download the source code

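The modified code is ready to download as-is. The model used is version 0.1.0.
Modified ChatGLM-6B source: https://pan.baidu.com/s/1qFDOrpHpX-NFy3q5PhAixQ?pwd=0eek
Modified ChatGLM-6B model: https://pan.baidu.com/s/1VxHhb0E6tMO17DvSp_0Hpg?pwd=izue
Example of a model change: the MLU build currently supports PyTorch 1.9 but not 1.10, so skip_init (added in 1.10) is unavailable. The fix is to define skip_init as a plain function in the model code.
transformers-4.27.1: https://github.com/xiaojunjun65/transformers-mlu_4.27.1
This package must be installed from source. The code has already been converted for MLU (see the previous post for the conversion method).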
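
The skip_init change can be sketched roughly as follows. This is only a guess at the shape of the patch, not the code shipped in the download: the name `fallback_skip_init` is hypothetical, and the fallback simply constructs the module the normal way, so its weights are initialized and then overwritten when the checkpoint is loaded.

```python
def fallback_skip_init(module_cls, *args, **kwargs):
    """Stand-in for torch.nn.utils.skip_init on PyTorch < 1.10.

    Unlike the real skip_init, this runs the module's normal parameter
    initialization; loading the checkpoint overwrites those values anyway.
    """
    return module_cls(*args, **kwargs)


try:
    # Present in PyTorch >= 1.10.
    from torch.nn.utils import skip_init
except ImportError:
    # PyTorch 1.9 (or torch not installed): use the plain-function fallback.
    skip_init = fallback_skip_init
```

The cost of this fallback is a slower model construction, since every layer is initialized once before the real weights are loaded; the outputs after loading should be unaffected.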

2. Set up the environment

1. Log in to the cloud platform

See the previous post on how to use the MLU370.

(pytorch) root@notebook-glm-6b-y4amqq-notebook-0:/workspace/volume/glm-6b-guojun/glm-6b#

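After logging in, the prompt looks like the line above. The (pytorch) prefix shows that the environment has been activated. Then place the source code downloaded earlier in the same directory.

2. Build and install transformers_mlu-4.27.1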

pip install -e ./transformers_v4.27.1_mlu/

tomli              2.0.1
tomlkit            0.11.7
torch              1.9.0
torch-mlu          1.13.0-torch1.9
torchvision        0.10.0a0+300a8a4
tqdm               4.65.0
transformers       4.27.1           /workspace/volume/glm-6b-guojun/glm-6b/transformers_v4.27.1_mlu
typed-ast          1.5.4
typing             3.7.4.3
typing_extensions  4.5.0
urllib3            1.26.15
wheel              0.38.4
wrapt              1.15.0
zipp               3.15.0

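If pip list produces output like the above, with transformers 4.27.1 pointing at the local source directory, the installation succeeded. Then install sentencepiece, which the ChatGLM-6B tokenizer requires: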

pip install sentencepiece
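
As an alternative to reading through pip list by eye, the versions can be checked from Python. This is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_versions` is hypothetical, and the package names are the ones that appear in the listing above.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["torch", "torch-mlu", "transformers", "sentencepiece"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

On the environment shown in this post, torch should report 1.9.0 and transformers 4.27.1; a `NOT INSTALLED` entry points at the step that still needs to be run.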

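3. Modify model loading

In cli_demo.py, point from_pretrained at the local model directory and replace .cuda() with .mlu():

import os
import platform
import signal
from transformers import AutoTokenizer, AutoModel

# Local directory holding the downloaded ChatGLM-6B weights (used instead of the "THUDM/chatglm-6b" hub ID).
MODEL_PATH = "/workspace/volume/glm-6b-guojun/glm-6b/chatglm-6b_data"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Cast the weights to fp16 and move the model to the MLU; .mlu() takes the place of the original .cuda().
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().mlu()

# Inference only, so switch to evaluation mode.
model = model.eval()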
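
If the same script needs to run on both GPU and MLU machines, the device call can be chosen by name instead of being hard-coded. The helper below is a hypothetical sketch, not part of the modified source; it assumes only that the model object exposes `.half()` plus a method named after the backend (`.cuda()` in stock PyTorch, `.mlu()` in the MLU build used in this post).

```python
def move_to_backend(model, backend="mlu"):
    """Cast the model to fp16, then call model.<backend>() to move it to the device."""
    half_model = model.half()
    mover = getattr(half_model, backend, None)
    if not callable(mover):
        raise RuntimeError(
            f"model has no .{backend}() method; is that backend installed?"
        )
    return mover()
```

With the loading code above, this would be used as `model = move_to_backend(AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True), "mlu").eval()`.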

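4. Successful run

User: Tell me a story

ChatGLM-6B: I can tell you a story. Please tell me what kind of story you would like to hear: a romance, an adventure, science fiction, or some other kind?

User: The Old Man and the Sea

ChatGLM-6B: The Old Man and the Sea is a famous novel by Hemingway. It tells the story of a man called the Old Man and an Abraham fish.

The old man is an aging fisherman who lives in a small hut by the sea and goes fishing every day. He is very good at fishing, but his health begins to decline and he has to give up his work.

One day, the old man decides to catch a bigger fish, but he does not know how big this fish will be. He takes his net and harpoon and sets out on his fishing trip.

He crosses the sea, swimming through the territory of many sharks and sea turtles, but he does not find the giant fish. In the end, he finally finds the harpoon, but he does not spear the fish.

The old man is very disappointed, but he does not give up. He keeps swimming until he finally reaches the fish. He catches the fish with his net, but he does not keep it.

The old man takes the fish back to his hut, and he and his wife eat it together. Although the old man did not catch the giant fish, he learned a lot from the experience. He realizes that even if he cannot fulfill his dream, he can still enjoy life.

This story tells us that even when we run into difficulties, we must not give up. We should persevere until we finally succeed.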

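Welcome to the ChatGLM-6B model. Type something to start chatting; enter clear to clear the conversation history, or stop to exit the program.

User: Hello, tell me about today's weather in Beijing

ChatGLM-6B: Hello. According to the information I have, today's weather in Beijing is cloudy turning sunny, with a high of about 18°C and a low of about 5°C. In addition, there may be some smog this morning, so please take protective measures. Have a pleasant day!

User: What kind of clothes are suitable for today's weather

ChatGLM-6B: Today's temperature is fairly comfortable, but with the smog you should take protective measures. Choose clothing that absorbs moisture and breathes well, such as cotton or linen shirts, light down jackets, and sports jackets, to stay comfortable and healthy. Also wear suitable shoes and socks so that inappropriate clothing does not get in the way of your activities and travel.

User: Where is a good place to go

ChatGLM-6B: If you like outdoor activities, you can head to cities around Beijing, such as Zhangjiakou, Tianjin, and Dezhou, to enjoy the sunshine and beaches. If you prefer indoor activities, you can visit shopping malls, parks, museums, and similar places to experience different kinds of culture and art. Whichever you choose, the decision should be based on your own interests and preferences.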
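
The multi-turn behaviour in these transcripts, including the clear and stop commands from the welcome banner, can be sketched as a small loop. This is an illustrative reimplementation under assumptions, not the demo script itself: `run_chat` and `chat_fn` are hypothetical names, and `chat_fn` stands for any callable that takes a query plus the history and returns the reply plus the updated history.

```python
def run_chat(chat_fn, user_inputs):
    """Drive a multi-turn chat session.

    chat_fn(query, history) must return (response, new_history).
    The input "clear" resets the history; "stop" ends the session.
    """
    history = []
    responses = []
    for query in user_inputs:
        command = query.strip()
        if command == "stop":
            break
        if command == "clear":
            history = []
            continue
        response, history = chat_fn(query, history)
        responses.append(response)
    return responses, history
```

With the model loaded as in step 3, `chat_fn` could be `lambda q, h: model.chat(tokenizer, q, history=h)`, since the ChatGLM-6B model code exposes a `chat` method that returns the response together with the updated history. Carrying the history forward is what lets the second question in the transcript ("today's weather") refer back to Beijing.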

Every 2.0s: cnmon                                                                      notebook-glm-6b-y4amqq-notebook-0: Mon Jun 26 21:47:42 2023

Mon Jun 26 21:47:42 2023
+------------------------------------------------------------------------------+
| CNMON v4.20.18                                                               |
+-------------------------------+----------------------+-----------------------+
| Card  VF  Name       Firmware | Inited        Driver | Util        Ecc-Error |
| Fan   Temp      Pwr:Usage/Cap |         Memory-Usage |         vMemory-Usage |
|===============================+======================+=======================|
| 0     /   MLU370-M8    v1.1.6 | On          v4.20.18 | 38%         N/A       |
|  0%   35C         69 W/ 300 W | 21744 MiB/ 47868 MiB | 31929 MiB/1048576 MiB |
+-------------------------------+----------------------+-----------------------+
| 1     /   MLU370-M8    v1.1.6 | On          v4.20.18 | 0%          N/A       |
|  0%   33C         52 W/ 300 W |     0 MiB/ 47868 MiB | 10240 MiB/1048576 MiB |
+-------------------------------+----------------------+-----------------------+

+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  Card  VF  PID    Command Line                             MLU Memory Usage  |
|==============================================================================|
|  0     /   9159   python                                          21689 MiB  |
+------------------------------------------------------------------------------+
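
In the cnmon output above, card 0 reports 21744 MiB of 47868 MiB in use while the model is loaded. To pull those numbers out of the text for logging, a regular expression is enough. This is a sketch that assumes the line layout shown here (CNMON v4.20.18); other versions may format the table differently, and `parse_memory_usage` is a hypothetical helper name.

```python
import re

# Matches the first "<used> MiB/ <total> MiB" pair on a cnmon status line.
_USAGE = re.compile(r"(\d+)\s*MiB/\s*(\d+)\s*MiB")


def parse_memory_usage(line):
    """Return (used_mib, total_mib) from a cnmon line, or None if no pair is found."""
    match = _USAGE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = "|  0%   35C         69 W/ 300 W | 21744 MiB/ 47868 MiB | 31929 MiB/1048576 MiB |"
used, total = parse_memory_usage(line)
print(f"{used} / {total} MiB = {used / total:.1%}")  # prints: 21744 / 47868 MiB = 45.4%
```

Because `search` returns the first match, the physical memory column is read and the vMemory column to its right is ignored.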

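Summary

Overall, inference runs smoothly without stuttering, and generation is fast. The migration report below outlines the changes that were made, for reference.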

# Cambricon PyTorch Model Migration Report
## Cambricon PyTorch Changes
| No. | File | Description |
| --- | --- | --- |
| 1 | cli_demo.py:7 | change "model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()" to "model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().mlu() " |
| 2 | web_demo_old.py:5 | change "model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()" to "model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().mlu() " |
| 3 | api.py:4 | add "import torch_mlu" |
| 4 | api.py:6 | change "DEVICE = "cuda"" to "DEVICE = "mlu" " |
| 5 | api.py:12 | change "if torch.cuda.is_available():" to "if torch.mlu.is_available(): " |
| 6 | api.py:13 | change "with torch.cuda.device(CUDA_DEVICE):" to "with torch.mlu.device(CUDA_DEVICE): " |
| 7 | api.py:14 | change "torch.cuda.empty_cache()" to "torch.mlu.empty_cache() " |
| 8 | api.py:15 | change "torch.cuda.ipc_collect()" to "torch.mlu.ipc_collect() " |
| 9 | api.py:54 | change "model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()" to "model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().mlu() " |
| 10 | web_demo.py:6 | change "model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()" to "model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().mlu() " |
| 11 | web_demo2.py:15 | change "model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()" to "model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().mlu() " |
| 12 | utils.py:41 | change "model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True, **kwargs).half().cuda()" to "model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True, **kwargs).half().mlu() " |
| 13 | ptuning/web_demo.py:6 | add "import torch_mlu" |
| 14 | ptuning/web_demo.py:157 | change "model = model.half().cuda()" to "model = model.half().mlu() " |
| 15 | ptuning/web_demo.py:158 | change "model.transformer.prefix_encoder.float().cuda()" to "model.transformer.prefix_encoder.float().mlu() " |
| 16 | ptuning/trainer.py:59 | add "import torch_mlu" |
| 17 | ptuning/trainer.py:455 | change "# postpone switching model to cuda when:" to "# postpone switching model to mlu when: " |
| 18 | ptuning/trainer.py:561 | change "self.use_cuda_amp = False" to "self.use_mlu_amp = False " |
| 19 | ptuning/trainer.py:597 | change "args.half_precision_backend = "cuda_amp"" to "args.half_precision_backend = "mlu_amp" " |
| 20 | ptuning/trainer.py:604 | change "if args.half_precision_backend == "cuda_amp":" to "if args.half_precision_backend == "mlu_amp": " |
| 21 | ptuning/trainer.py:605 | change "self.use_cuda_amp = True" to "self.use_mlu_amp = True " |
| 22 | ptuning/trainer.py:623 | change "self.scaler = torch.cuda.amp.GradScaler()" to "self.scaler = torch.mlu.amp.GradScaler() " |
| 23 | ptuning/trainer.py:638 | change "and self.use_cuda_amp" to "and self.use_mlu_amp " |
| 24 | ptuning/trainer.py:1335 | change "self.use_cuda_amp = False" to "self.use_mlu_amp = False " |
| 25 | ptuning/trainer.py:2272 | change "if torch.cuda.is_available():" to "if torch.mlu.is_available(): " |
| 26 | ptuning/trainer.py:2274 | change "torch.cuda.random.set_rng_state(checkpoint_rng_state["cuda"])" to "torch.mlu.random.set_rng_state(checkpoint_rng_state["mlu"]) " |
| 27 | ptuning/trainer.py:2277 | change "torch.cuda.random.set_rng_state_all(checkpoint_rng_state["cuda"])" to "torch.mlu.random.set_rng_state_all(checkpoint_rng_state["mlu"]) " |
| 28 | ptuning/trainer.py:2366 | change "if torch.cuda.is_available():" to "if torch.mlu.is_available(): " |
| 29 | ptuning/trainer.py:2369 | change "rng_states["cuda"] = torch.cuda.random.get_rng_state_all()" to "rng_states["mlu"] = torch.mlu.random.get_rng_state_all() " |
| 30 | ptuning/trainer.py:2371 | change "rng_states["cuda"] = torch.cuda.random.get_rng_state()" to "rng_states["mlu"] = torch.mlu.random.get_rng_state() " |
| 31 | ptuning/trainer.py:2607 | change "if self.use_cuda_amp or self.use_cpu_amp:" to "if self.use_mlu_amp or self.use_cpu_amp: " |
| 32 | ptuning/trainer.py:2612 | change "else torch.cuda.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype)" to "else torch.mlu.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype) " |
| 33 | ptuning/trainer.py:2615 | change "ctx_manager = torch.cuda.amp.autocast()" to "ctx_manager = torch.mlu.amp.autocast() " |
| 34 | ptuning/trainer_seq2seq.py:17 | add "import torch_mlu" |
| 35 | ptuning/main.py:31 | add "import torch_mlu" |
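
Most rows in the report are mechanical string substitutions. The sketch below reproduces that pattern for a single line of source; it is not Cambricon's conversion tool (see the previous post for that), the names `_REPLACEMENTS` and `convert_line` are hypothetical, and the substitution list covers only the kinds of changes visible in this report.

```python
# Ordered (old, new) substitutions mirroring the changes listed in the report.
_REPLACEMENTS = [
    ("torch.cuda.", "torch.mlu."),  # e.g. torch.cuda.is_available()
    (".cuda()", ".mlu()"),          # e.g. model.half().cuda()
    ('"cuda"', '"mlu"'),            # e.g. DEVICE = "cuda"
    ("cuda_amp", "mlu_amp"),        # e.g. self.use_cuda_amp
]


def convert_line(line):
    """Apply the CUDA-to-MLU substitutions to one line of Python source."""
    for old, new in _REPLACEMENTS:
        line = line.replace(old, new)
    return line


print(convert_line("model = model.half().cuda()"))
print(convert_line('rng_states["cuda"] = torch.cuda.random.get_rng_state_all()'))
```

Uppercase identifiers such as `CUDA_DEVICE` are left alone, which matches row 6 of the report. The sketch does not handle the `add "import torch_mlu"` rows; those insertions would still need to be made by hand or by the real tool.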