Component | Version |
---|---|
mindspore | 2.1.1 |
CANN | 6.3.RC2_linux-aarch64 |
mindformers | dev |
Hardware | 2 × Atlas 310I Pro |
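Goal: I want to deploy chatglm3 on these two inference cards. There is no complete tutorial, so I am working it out on my own. After installing the base components, I tried to generate the HCCL JSON file and ran: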
(ascend_py39) [root@xctest1 mindformers]# python ./mindformers/tools/hccl_tools.py --device_num "[0,8)" --server_ip=10.23.13.83
start /root/llm/mind/mindformers/./mindformers/tools/hccl_tools.py
visible_devices:['0', '1', '2', '3', '4', '5', '6', '7']
server_id:10.23.13.83
device_num_list: [0, 1, 2, 3, 4, 5, 6, 7]
/bin/sh: hccn_tool: command not found
Failed to call hccn_tool, try to read /etc/hccn.conf instead
Traceback (most recent call last):
File "/root/llm/mind/mindformers/./mindformers/tools/hccl_tools.py", line 175, in <module>
main()
File "/root/llm/mind/mindformers/./mindformers/tools/hccl_tools.py", line 149, in main
device_ip = device_ips[device_id]
KeyError: '0'
Could someone tell me what the problem is here? This machine has only two cards.
npu-smi info
****************************************************Answer*****************************************************
mindformers
Version compatibility
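The currently supported hardware is the Atlas 800 training server and the Atlas 800T A2 training server.
Any inference-only 310 card will most likely run into other problems as well.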
As for the JSON, you can simply write it by hand.
Just change the IPs in the template below:
{
"version": "1.0",
"server_count": "1",
"server_list": [
{
"server_id": "10.*.*.*",
"device": [
{"device_id": "0","device_ip": "192.1.*.6","rank_id": "0"},
{"device_id": "1","device_ip": "192.2.*.6","rank_id": "1"}],
"host_nic_ip": "reserve"
}
],
"status": "completed"
}
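If you would rather not edit the file by hand, the same structure can be generated with a few lines of Python. This is only a minimal sketch that mirrors the template above. The function name `build_rank_table` is made up for illustration, and the addresses are placeholders that you need to replace with your own server IP and the device IPs of your two cards.

```python
import json


def build_rank_table(server_ip, device_ips):
    """Build a single-server rank table with the same fields as the template above.

    Hypothetical helper: device_id and rank_id are both taken from the
    position of each IP in device_ips (0, 1, ...).
    """
    devices = [
        {"device_id": str(i), "device_ip": ip, "rank_id": str(i)}
        for i, ip in enumerate(device_ips)
    ]
    return {
        "version": "1.0",
        "server_count": "1",
        "server_list": [
            {
                "server_id": server_ip,
                "device": devices,
                "host_nic_ip": "reserve",
            }
        ],
        "status": "completed",
    }


# Placeholder addresses (assumptions): substitute your real values.
table = build_rank_table("10.0.0.1", ["192.168.1.6", "192.168.2.6"])
rank_table_json = json.dumps(table, indent=4)
print(rank_table_json)
```

Save the printed text to a `.json` file and point your launch configuration at it. The list has only two entries because the machine has two cards, so the `[0,8)` range used in the original command does not apply here.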