Errors when running large models with MindIE, and how to fix them
Error 1: lib/libatb_speed_torch.so: undefined symbol: _ZN6at_npu6native14get_npu_formatERKN2at6TensorE
2024-09-13 14:51:31,896 [ERROR] model.py:30 - [Model] >>> Exception:/usr/local/Ascend/atb-models/lib/libatb_speed_torch.so: undefined symbol: _ZN6at_npu6native14get_npu_formatERKN2at6TensorE
Traceback (most recent call last):
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/site-packages/model_wrapper/model.py", line 28, in initialize
return self.python_model.initialize(config)
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/site-packages/model_wrapper/standard_model.py", line 28, in initialize
self.generator = Generator(
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/site-packages/mindie_llm/text_generator/generator.py", line 41, in __init__
self.generator_backend = get_generator_backend(model_config)
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/site-packages/mindie_llm/text_generator/adapter/__init__.py", line 17, in get_generator_backend
return generator_cls(model_config)
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 16, in __init__
super().__init__(model_config)
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 17, in __init__
self.model_wrapper = get_model_wrapper(model_config, backend_type)
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/site-packages/mindie_llm/modeling/model_wrapper/__init__.py", line 16, in get_model_wrapper
return wrapper_cls(**model_config)
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 36, in __init__
self.model_runner.load_weights()
File "/usr/local/Ascend/atb-models/atb_llm/runner/model_runner.py", line 84, in load_weights
self.model = self.model_cls(self.config, weights)
File "/usr/local/Ascend/atb-models/atb_llm/models/qwen2/flash_causal_qwen2.py", line 17, in __init__
super().__init__(config, weights)
File "/usr/local/Ascend/atb-models/atb_llm/models/base/flash_causal_lm.py", line 21, in __init__
load_atb_speed()
File "/usr/local/Ascend/atb-models/atb_llm/utils/initial.py", line 28, in load_atb_speed
torch.classes.load_library(lib_path)
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/site-packages/torch/_classes.py", line 51, in load_library
torch.ops.load_library(path)
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/site-packages/torch/_ops.py", line 852, in load_library
ctypes.CDLL(path)
File "/root/miniconda3/envs/MindIE_1.0.RC2/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /usr/local/Ascend/atb-models/lib/libatb_speed_torch.so: undefined symbol: _ZN6at_npu6native14get_npu_formatERKN2at6TensorE
2024-09-13 14:51:31,899 [ERROR] model.py:33 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
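To confirm the failure outside the MindIE service, the library load can be attempted directly. The following is only a minimal sketch: `try_load` is a hypothetical helper, the library path is copied from the log above, and it assumes that importing `torch` and `torch_npu` beforehand roughly mirrors what the service does before loading the library.

```python
import ctypes


def try_load(lib_path):
    """Try to load a shared library; return (ok, message)."""
    try:
        ctypes.CDLL(lib_path)
    except OSError as exc:
        # A missing file and an undefined symbol both surface as OSError.
        return False, str(exc)
    return True, "loaded"


if __name__ == "__main__":
    # Assumption: torch and torch_npu are imported first so that their
    # shared libraries are already present in the process.
    try:
        import torch  # noqa: F401
        import torch_npu  # noqa: F401
    except (ImportError, OSError) as exc:
        print(f"import failed: {exc}")

    # Path copied from the log above.
    ok, message = try_load("/usr/local/Ascend/atb-models/lib/libatb_speed_torch.so")
    print("OK" if ok else f"FAILED: {message}")
```

If the message still contains `undefined symbol`, the problem lies in the installed packages rather than in the MindIE configuration.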
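- Solution:
The missing symbol demangles to at_npu::native::get_npu_format(at::Tensor const&), a function that lives in torch_npu, so the installed torch-npu does not match the version libatb_speed_torch.so was built against. Install the torch-npu version that matches the version requirements listed here:
https://www.hiascend.com/developer/download/community/result?module=ie+pt+cann
Download page:
https://gitee.com/ascend/pytorch/releases/tag/v6.0.rc2-pytorch2.1.0
In my case, installing torch_npu-2.1.0.post6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl resolved the error.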
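Before installing, it can help to check that the wheel fits the environment. The sketch below is illustrative only: `parse_wheel_name`, `base_release`, and `installed_version` are hypothetical helpers, not part of MindIE or torch_npu, and the comparison assumes that a torch_npu wheel targets the torch release with the same leading `X.Y.Z` version, as the `pytorch2.1.0` release tag suggests.

```python
import re
import sys
from importlib import metadata

# Wheel file name taken from the text above.
WHEEL = "torch_npu-2.1.0.post6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl"


def parse_wheel_name(filename):
    """Split a wheel file name into its standard components."""
    parts = filename[: -len(".whl")].split("-")
    return {
        "name": parts[0],
        "version": parts[1],
        # The last three fields are always python, ABI and platform tags.
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def base_release(version):
    """Return the leading X.Y.Z of a version string, or None."""
    match = re.match(r"\d+\.\d+\.\d+", version)
    return match.group(0) if match else None


def installed_version(package):
    """Return the installed version of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    wheel = parse_wheel_name(WHEEL)
    interpreter_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    print("wheel python tag:", wheel["python_tag"], "| interpreter:", interpreter_tag)
    for package in ("torch", "torch_npu"):
        print(package, "installed:", installed_version(package))

    torch_version = installed_version("torch")
    # Assumption: matching X.Y.Z means the wheel targets this torch release.
    if torch_version and base_release(torch_version) != base_release(wheel["version"]):
        print("torch and the torch_npu wheel target different releases")
```

A mismatch in the Python tag means pip will refuse the wheel. A mismatch in the base release is the kind of version skew that can produce an undefined-symbol error like the one above.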