NameError: name 'CUDA_RUNTIME_LIB' is not defined

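While running a Python script, I hit a problem related to the CUDA libraries. The error indicated that the required libcudart.so.11.0 file could not be found. Locating the correct CUDA library file and copying it into the CONDA_PREFIX/lib directory resolved this, but a missing libcusparse.so.11 error then followed. The recommended fix is to compile bitsandbytes from source so that it matches the installed CUDA version.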

WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
Traceback (most recent call last):
  File "finetune.py", line 6, in <module>
    import bitsandbytes as bnb
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 7, in <module>
    from .autograd._functions import (
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/autograd/__init__.py", line 1, in <module>
    from ._functions import undo_layout, get_inverse_transform_indices
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 9, in <module>
    import bitsandbytes.functional as F
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/functional.py", line 17, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 13, in <module>
    setup.run_cuda_setup()
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 101, in run_cuda_setup
    binary_name, cudart_path, cuda, cc, cuda_version_string = evaluate_cuda_setup()
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 382, in evaluate_cuda_setup
    cudart_path = determine_cuda_runtime_lib_path()
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 247, in determine_cuda_runtime_lib_path
    CUDASetup.get_instance().add_log_entry(f'{candidate_env_vars["CONDA_PREFIX"]} did not contain '
NameError: name 'CUDA_RUNTIME_LIB' is not defined
Traceback (most recent call last):
  File "finetune.py", line 6, in <module>
    import bitsandbytes as bnb
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 7, in <module>
    from .autograd._functions import (
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/autograd/__init__.py", line 1, in <module>
    from ._functions import undo_layout, get_inverse_transform_indices
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 9, in <module>
    import bitsandbytes.functional as F
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/functional.py", line 17, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 13, in <module>
    setup.run_cuda_setup()
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 101, in run_cuda_setup
    binary_name, cudart_path, cuda, cc, cuda_version_string = evaluate_cuda_setup()
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 382, in evaluate_cuda_setup
    cudart_path = determine_cuda_runtime_lib_path()
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 247, in determine_cuda_runtime_lib_path
    CUDASetup.get_instance().add_log_entry(f'{candidate_env_vars["CONDA_PREFIX"]} did not contain '
NameError: name 'CUDA_RUNTIME_LIB' is not defined
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 372341) of binary: /home/gaosong/anaconda3/envs/vicuna8/bin/python
Traceback (most recent call last):
  File "/home/gaosong/anaconda3/envs/vicuna8/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-06-08_15:31:06
  host      : server
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 372342)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-06-08_15:31:06
  host      : server
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 372341)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

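This error is raised while bitsandbytes is searching for the CUDA installation. As the traceback shows, it occurs in determine_cuda_runtime_lib_path, on the line that logs that the CONDA_PREFIX directory did not contain the CUDA runtime library.

If the library can be found directly, that line never runs and the error does not appear.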
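
The failure mode can be illustrated with a few lines of generic Python. This is only a sketch of the pattern, not the bitsandbytes source: `locate`, `try_locate`, `LIB_NAME` and `MISSING_NAME` are hypothetical names made up for the example. A name that is referenced only on a rarely taken branch goes unnoticed until that branch actually runs.

```python
import os

LIB_NAME = "libcudart.so.11.0"  # hypothetical constant for this sketch


def locate(directory):
    """Return the library path, or build a warning when it is absent."""
    path = os.path.join(directory, LIB_NAME)
    if os.path.exists(path):
        return path
    # This branch references a name that was never defined. Python only
    # notices when the line executes, i.e. when the file is missing.
    return f"{directory} did not contain {MISSING_NAME}"  # noqa: F821


def try_locate(directory):
    """Call locate() and report a NameError instead of crashing."""
    try:
        return locate(directory)
    except NameError as exc:
        return f"NameError: {exc}"
```

When the file is present, `try_locate` returns its path; when it is missing, the undefined name is hit and a NameError is reported, which matches the shape of the traceback above.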

Run echo $CONDA_PREFIX to see where the environment directory is, then change into its lib directory:

cd $CONDA_PREFIX/lib

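Check whether the library exists:

ls libcudart.so.11.0 reports that the file does not exist. The name libcudart.so.11.0 is used here simply because another environment on this machine has that version, and it works there.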

sudo find / -name 'libcudart.so.11.0'

Once the file is found, copy it into the $CONDA_PREFIX/lib directory.

In my case the command was:

cp /work1/home/gaosong/anaconda3/envs/gpt/lib/libcudart.so.11.0 $CONDA_PREFIX/lib
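
The search step above can also be done without sudo by walking only the directories you care about. The following is a minimal sketch under that assumption; `find_library` and `conda_lib_dir` are hypothetical helper names, and the search roots are whatever you choose to pass in.

```python
import os
from pathlib import Path


def find_library(name, roots):
    """Return every path under the given roots whose file name equals `name`."""
    matches = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # skip roots that do not exist on this machine
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                matches.append(Path(dirpath) / name)
    return matches


def conda_lib_dir():
    """Return $CONDA_PREFIX/lib, or None when no conda environment is active."""
    prefix = os.environ.get("CONDA_PREFIX")
    return Path(prefix) / "lib" if prefix else None
```

For example, `find_library("libcudart.so.11.0", [Path.home() / "anaconda3" / "envs"])` lists the copies in other conda environments (that root is an assumption about the local layout), and any hit can then be copied into the directory returned by `conda_lib_dir()`.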

After that, a different error appeared:

CUDA SETUP: CUDA runtime path found: /home/gaosong/anaconda3/envs/vicuna8/lib/libcudart.so.11.0

CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/gaosong/anaconda3/envs/vicuna8/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
libcusparse.so.11: cannot open shared object file: No such file or directory
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=117 make cuda11x 
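
The log shows that the prebuilt libbitsandbytes_cuda117.so also needs libcusparse.so.11, so copying libcudart alone is not enough. Before retrying, it can help to check which shared libraries the dynamic loader can actually open. This is a small sketch using the standard ctypes module; `missing_libraries` is a hypothetical helper name, and the list of names to check is taken from the messages above rather than from any official dependency list.

```python
import ctypes


def missing_libraries(names):
    """Return the subset of `names` that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError when the library is not found
        except OSError:
            missing.append(name)
    return missing
```

Calling `missing_libraries(["libcudart.so.11.0", "libcusparse.so.11"])` inside the affected environment returns the names that still cannot be loaded.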

Next I tried the CUDA 12.1 runtime instead:

cd $CONDA_PREFIX/lib
rm -rf libcudart.so.11.0 
cp /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudart.so.12 ./
mv libcudart.so.12 libcudart.so.12.0

# After upgrading to 0.38.0 this check raised an error, so I commented it out

# if USE_8bit is True:

#     assert bnb.__version__ >= '0.37.2', "Please downgrade bitsandbytes's version, for example: pip install bitsandbytes==0.37.2"
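
As a side note, the check above compares version strings with >=, which compares character by character rather than numerically (for instance, '0.100.0' >= '0.37.2' evaluates to False). If the check is to be kept rather than commented out, a numeric comparison is less fragile. This is a minimal sketch assuming plain dotted version strings; `version_tuple` and `at_least` are hypothetical helpers, not part of bitsandbytes or finetune.py.

```python
import re


def version_tuple(version):
    """Turn '0.38.0' or '0.38.0.post1' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed, required):
    """True when `installed` is numerically >= `required`."""
    return version_tuple(installed) >= version_tuple(required)
```

With this, the condition could be written as `at_least(bnb.__version__, "0.37.2")`.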

       
