LLM Study Notes (2): Deploying ChatGLM3 Locally

yali325

Last modified 2024-09-06 16:14:02


Tags: notes, deep learning, artificial intelligence

First published 2024-09-05 22:26:22

Article link: https://blog.csdn.net/2303_81265404/article/details/141941748



After round after round of configuration, I successfully configured my way into a pile of bugs...

Error 1: ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in this format.

(In other words: the current `device_map` placed some of the weights on disk, so you either have to pass an `offload_folder` for them or, if the model ships its weights in that format, make sure the `safetensors` package is installed.)

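Solution (applies when loading the model on a GPU):

In the Python file that loads the model you want to run, change the value passed for the device_map argument to "cuda" (the original value is "auto"). That resolves the error.

For example, in client.py under composite_demo: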
AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, device_map="cuda").eval()
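
The change above can also be sketched as a small helper. This is only an illustration, not the code from the ChatGLM3 repository: `build_load_kwargs` and `load_model` are hypothetical names, the "offload" folder name is made up, and the sketch assumes `torch`, `transformers` and `accelerate` are installed and that the whole model fits in GPU memory when "cuda" is used. The non-GPU branch shows the other option named in the error message, which is to keep "auto" placement and pass an `offload_folder`.

```python
from typing import Optional


def build_load_kwargs(use_gpu: bool, offload_dir: Optional[str] = None) -> dict:
    """Build keyword arguments for AutoModel.from_pretrained (hypothetical helper)."""
    kwargs = {"trust_remote_code": True}
    if use_gpu:
        # The fix from this post: place the whole model on the GPU.
        kwargs["device_map"] = "cuda"
    else:
        # Alternative named in the error message: keep "auto" placement,
        # but provide a folder for the weights that get offloaded to disk.
        kwargs["device_map"] = "auto"
        kwargs["offload_folder"] = offload_dir or "offload"  # assumed folder name
    return kwargs


def load_model(model_path: str):
    """Load the model at model_path with the arguments chosen above."""
    # Imported lazily so build_load_kwargs can be inspected without torch installed.
    import torch
    from transformers import AutoModel

    kwargs = build_load_kwargs(use_gpu=torch.cuda.is_available())
    return AutoModel.from_pretrained(model_path, **kwargs).eval()
```

Calling `load_model(MODEL_PATH)` would then stand in for the one-line call in client.py. Keeping the argument selection in its own function makes it easy to print and check what is actually passed to `from_pretrained`.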

Error 2: C:\Users\yali\.cache\huggingface\modules\transformers_modules\models\modeling_chatglm.py:226: UserWarning: 1Torch was not compiled with flash attention.

(Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,

[Screenshots of the error: PyCharm and command line]

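Cause:

The model has to be loaded on the first run, and it is stored under "C:\Users\<username>\.cache\huggingface". After I deleted that folder, the error appeared.

After reading the ChatGLM3 documentation, I found that this is data it has to load.

Solution: just don't delete the data in that folder.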
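
To check whether that folder is still there before starting a demo, something like the following minimal sketch could be used. The helper names `hf_cache_dir` and `cache_present` are hypothetical, and the sketch assumes the default cache location under the user's home directory (the location can be moved, for example through the `HF_HOME` environment variable, in which case this check would look in the wrong place).

```python
from pathlib import Path


def hf_cache_dir(home: Path) -> Path:
    """Return the default Hugging Face cache folder under a home directory."""
    return home / ".cache" / "huggingface"


def cache_present(home: Path) -> bool:
    """True if the cache folder exists and contains at least one entry."""
    cache = hf_cache_dir(home)
    return cache.is_dir() and any(cache.iterdir())


if __name__ == "__main__":
    home = Path.home()
    print(hf_cache_dir(home), "present:", cache_present(home))
```

If `cache_present` returns False, the folder has been removed or emptied, and the next run will have to fetch that data again.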