这个问题在google上能搜到各种版本,其实都没有简单直接地解决问题,有让你改cuda版本的,有让你重装环境的,总之代价都非常大
AttributeError: ‘DeepSpeedCPUAdam’ object has no attribute ‘ds_opt_adam’
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f7b588b8720>
Traceback (most recent call last):
File "/root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
^^^^^^^^^^^^^^^^
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
解决过程
1、命令行输入
python -c 'import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'
报错
deepspeed.ops.op_builder.builder.CUDAMismatchException: >- DeepSpeed Op Builder: Installed CUDA version 11.7 does not match the version torch was compiled with 12.1, unable to compile cuda/cpp extensions without a matching cuda version.
2、进到源码
/root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py
if sys_cuda_version != torch_cuda_version:
return True
if (cuda_major in cuda_minor_mismatch_ok and sys_cuda_version in cuda_minor_mismatch_ok[cuda_major]
and torch_cuda_version in cuda_minor_mismatch_ok[cuda_major]):
print(f"Installed CUDA version {sys_cuda_version} does not match the "
f"version torch was compiled with {torch.version.cuda} "
"but since the APIs are compatible, accepting this combination")
return True
elif os.getenv("DS_SKIP_CUDA_CHECK", "0") == "1":
print(
f"{WARNING} DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
f"version torch was compiled with {torch.version.cuda}."
"Detected `DS_SKIP_CUDA_CHECK=1`: Allowing this combination of CUDA, but it may result in unexpected behavior."
)
return True
raise CUDAMismatchException(
f">- DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
f"version torch was compiled with {torch.version.cuda}, unable to compile "
"cuda/cpp extensions without a matching cuda version.")
return True
说是系统cuda和torch的cuda版本不匹配,我们直接改成不检查cuda版本
修改成
if sys_cuda_version != torch_cuda_version:
return True
if (cuda_major in cuda_minor_mismatch_ok and sys_cuda_version in cuda_minor_mismatch_ok[cuda_major]
and torch_cuda_version in cuda_minor_mismatch_ok[cuda_major]):
print(f"Installed CUDA version {sys_cuda_version} does not match the "
f"version torch was compiled with {torch.version.cuda} "
"but since the APIs are compatible, accepting this combination")
return True
elif os.getenv("DS_SKIP_CUDA_CHECK", "0") == "1":
print(
f"{WARNING} DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
f"version torch was compiled with {torch.version.cuda}."
"Detected `DS_SKIP_CUDA_CHECK=1`: Allowing this combination of CUDA, but it may result in unexpected behavior."
)
return True
raise CUDAMismatchException(
f">- DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
f"version torch was compiled with {torch.version.cuda}, unable to compile "
"cuda/cpp extensions without a matching cuda version.")
return True
3、修改完后再次在命令行执行
python -c 'import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'
输出以下内容表示大功告成
[2024-03-26 15:53:59,618] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Using /root/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py311_cu121/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include/TH -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /root/miniconda3/envs/zhangzc/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -c /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
[2/4] c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include/TH -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /root/miniconda3/envs/zhangzc/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o
[3/4] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include/TH -isystem /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /root/miniconda3/envs/zhangzc/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
[4/4] c++ cpu_adam.o cpu_adam_impl.o custom_cuda_kernel.cuda.o -shared -lcurand -L/root/miniconda3/envs/zhangzc/lib/python3.11/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o cpu_adam.so
Loading extension module cpu_adam...
Time to load cpu_adam op: 38.48683714866638 seconds
总结
其实可以不改源码的,看上面的源码就知道,可以通过设置环境变量的方式跳过cuda检查:
在执行代码前加上
DS_SKIP_CUDA_CHECK=1