Fixing Several DeepSpeed Errors on Windows 11
1. Problem: Unable to pre-compile async_io
Problem description
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
test.c
LINK : fatal error LNK1181: cannot open input file 'aio.lib'
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-4_iivh0f\deepspeed_07db96ef892c4625952709f60a01fd85\setup.py", line 163, in <module>
abort(f"Unable to pre-compile {op_name}")
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-4_iivh0f\deepspeed_07db96ef892c4625952709f60a01fd85\setup.py", line 51, in abort
assert False, msg
AssertionError: Unable to pre-compile async_io
Setting ds_accelerator to cuda (auto detect)
[WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io
[end of output]
Solution
- First, download the DeepSpeed source with git:
git clone https://github.com/microsoft/DeepSpeed.git
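- In PowerShell, set DS_BUILD_OPS to 0 so that the build does not try to pre-compile the ops (in cmd, the equivalent is set DS_BUILD_OPS=0):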
Set-Item Env:\DS_BUILD_OPS 0
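- In cmd or PowerShell (the same session in which the variable was set), change into the DeepSpeed folder and run build_win.bat to build DeepSpeed.
- Change into the dist folder and run pip install on the generated wheel; the installation should then succeed. The wheel file name depends on the version that was built: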
pip install .\deepspeed-0.14.1+unknown-py3-none-any.whl
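The build steps above can also be driven from Python. This is only a minimal sketch: the helper names make_build_env and build_deepspeed are hypothetical, and it assumes build_win.bat sits in the root of the cloned repository, as described in the steps above.

```python
import os
import subprocess


def make_build_env(base_env=None):
    """Return a copy of the environment with DS_BUILD_OPS forced to "0"."""
    env = dict(os.environ if base_env is None else base_env)
    env["DS_BUILD_OPS"] = "0"  # skip pre-compiling the ops (including async_io)
    return env


def build_deepspeed(repo_dir):
    """Run build_win.bat inside repo_dir with DS_BUILD_OPS=0 (Windows only)."""
    # "cmd /c" runs the batch file and then exits; check=True raises on failure.
    return subprocess.run(
        ["cmd", "/c", "build_win.bat"],
        cwd=repo_dir,
        env=make_build_env(),
        check=True,
    )
```

Calling build_deepspeed("DeepSpeed") from the directory that contains the clone should leave the wheel in the dist folder, after which the pip install step is unchanged.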
2. Problem: ModuleNotFoundError: No module named 'deepspeed.accelerator'
Problem description
...
raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
No module named 'deepspeed.accelerator'
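Solution
This error appears when DeepSpeed was installed from the package downloaded from GitHub. The offline installation and the online pip install do not load the package in the same way, so the accelerator subpackage is not found at import time. To fix it, locate the accelerator folder in the offline package and copy it to the corresponding location (presumably the installed deepspeed package directory).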
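The copy step can be scripted. The following is a minimal sketch, not an official DeepSpeed tool: the function name copy_missing_subpackage is hypothetical, and it assumes the target is the installed deepspeed package directory and that the accelerator folder sits at the top level of the clone.

```python
import shutil
from pathlib import Path


def copy_missing_subpackage(clone_dir, installed_pkg_dir, name="accelerator"):
    """Copy clone_dir/name into installed_pkg_dir/name unless it already exists."""
    source = Path(clone_dir) / name
    target = Path(installed_pkg_dir) / name
    if not source.is_dir():
        raise FileNotFoundError(f"{source} does not exist")
    if target.is_dir():
        return target  # already present, nothing to copy
    shutil.copytree(source, target)
    return target
```

The installed package directory can be found from the Location line printed by pip show deepspeed, with deepspeed appended to that path.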
3. Problem: ModuleNotFoundError: No module named 'deepspeed.ops.op_builder'
Problem description
raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
No module named 'deepspeed.ops.op_builder'
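Solution
This problem occurs because:
- torch, torchvision, and torchaudio were installed from PyPI rather than directly from the official PyTorch site.
- The CPU build of torch was installed instead of the GPU build.
Install the GPU builds with the following command: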
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
After the installation completes, the error no longer appears.
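To check which torch build is actually installed, a short script can help. This is a minimal sketch: the function name describe_torch_build is hypothetical, and it relies only on torch.__version__, torch.version.cuda (None for CPU-only builds), and torch.cuda.is_available().

```python
import importlib


def describe_torch_build():
    """Return a one-line summary of the installed torch build."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "torch is not installed"
    cuda_version = torch.version.cuda  # None when torch was built without CUDA
    if cuda_version is None:
        return f"torch {torch.__version__} is a CPU-only build"
    return (
        f"torch {torch.__version__} was built for CUDA {cuda_version}; "
        f"GPU available: {torch.cuda.is_available()}"
    )


print(describe_torch_build())
```

If the summary reports a CPU-only build, the second cause listed above applies and the GPU wheels need to be installed.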