[Solved] Installing CUDA on Linux without administrator rights (specific version, environment variables): Command 'nvcc' not found / CUDA_HOME environment variable is not set

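Contents

Background:

Problem 1 this post solves:

Problem 2 this post solves:

Cause analysis:

Step 1: Manually download the official NVIDIA CUDA toolkit

Core idea

Step 2: Install the CUDA toolkit without administrator rights

2.1 Run the installer and skip the driver

2.2 Verify the installation

Step 3: Configure environment variables (critical)

3.1 Temporary configuration (current terminal only)

3.2 Permanent configuration (all terminals)

Step 4: Verify that nvcc works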

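Background:

  • When I migrated my environment from AutoDL to my university's Slurm cluster, CUDA errors started to appear.
  • AutoDL had spoiled me: its images come with CUDA preinstalled, whereas the university servers have no CUDA toolkit installed.

Problem 1 this post solves:

  • Installing mmdet3d needs a GPU plus the CUDA toolkit, but the environment only has conda's cudatoolkit package and no full CUDA toolkit, so the install fails: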
(sparseocc) schen744@gpu3-11:~/code/sparseocc/mmdetection3d$ pip install -v -e .
Using pip 22.3.1 from /hpc2hdd/home/schen744/.conda/envs/sparseocc/lib/python3.7/site-packages/pip (python 3.7)
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///hpc2hdd/home/schen744/code/sparseocc/mmdetection3d
  Running command python setup.py egg_info
  Traceback (most recent call last):
    File "/hpc2hdd/home/schen744/.conda/envs/sparseocc/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 2035, in _join_cuda_home
      raise EnvironmentError('CUDA_HOME environment variable is not set. '
  OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /hpc2hdd/home/schen744/.conda/envs/sparseocc/bin/python -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize
  
  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/hpc2hdd/home/schen744/code/sparseocc/mmdetection3d/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-xinq3w4l
  cwd: /hpc2hdd/home/schen744/code/sparseocc/mmdetection3d/
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(sparseocc) schen744@gpu3-11:~/code/sparseocc/mmdetection3d$ nvcc -V

Command 'nvcc' not found, but can be installed with:

apt install nvidia-cuda-toolkit
Please ask your administrator.

(sparseocc) schen744@gpu3-11:~/code/sparseocc/mmdetection3d$

After that I rebuilt the environment from scratch, but the problem remained.

Problem 2 this post solves:

(sparseocc) schen744@gpu3-9:~/code/sparseocc$ nvidia-smi
Sun Jun  1 17:11:56 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A40                     Off | 00000000:35:00.0 Off |                    0 |
|  0%   29C    P8              33W / 300W |     11MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
(sparseocc) schen744@gpu3-9:~/code/sparseocc$ conda list cudatoolkit
# packages in environment at /hpc2hdd/home/schen744/.conda/envs/sparseocc:
#
# Name                    Version                   Build  Channel
cudatoolkit               11.3.1              hb98b00a_13    conda-forge
(sparseocc) schen744@gpu3-9:~/code/sparseocc$ nvcc --version

Command 'nvcc' not found, but can be installed with:

apt install nvidia-cuda-toolkit
Please ask your administrator.

(sparseocc) schen744@gpu3-9:~/code/sparseocc$

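Cause analysis:

nvcc (the CUDA compiler) is a core component of the CUDA toolkit, and the full CUDA toolkit is not installed in the current environment. conda list does show cudatoolkit=11.3.1, but conda's cudatoolkit package usually contains only the runtime libraries (such as libcudart.so); it does not include the nvcc compiler or the development tools.

nvidia-smi reports "CUDA Version: 12.2", which is the highest CUDA version the installed driver supports. The cudatoolkit=11.3.1 installed by conda is compatible with it (NVIDIA drivers are backward compatible with older CUDA toolkits), so a version conflict is not the main cause here.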
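The diagnosis above can be reproduced with a few shell checks. This is a minimal sketch: check_nvcc is a helper name I made up for illustration, and it only reports what the current shell can see.

```shell
# check_nvcc is a hypothetical helper, not part of CUDA or conda.
check_nvcc() {
    # Is the CUDA compiler reachable through PATH?
    if command -v nvcc >/dev/null 2>&1; then
        echo "nvcc found: $(command -v nvcc)"
    else
        echo "nvcc not found on PATH"
    fi
    # Is CUDA_HOME set at all?
    echo "CUDA_HOME=${CUDA_HOME:-<unset>}"
    # Inside a conda environment, list the CUDA runtime libraries it ships.
    if [ -n "${CONDA_PREFIX:-}" ]; then
        ls "$CONDA_PREFIX"/lib/libcudart* 2>/dev/null \
            || echo "no libcudart under $CONDA_PREFIX/lib"
    fi
}

check_nvcc
```

If the runtime libraries are listed but nvcc is not found, the environment is in exactly the state described above: runtime only, no compiler.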

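Step 1: Manually download the official NVIDIA CUDA toolkit

If you have no administrator rights, download a CUDA version that is compatible with your driver (for example 11.3 or 12.2) from the NVIDIA CUDA Toolkit Archive. Taking CUDA 11.3 as the example:

  1. Open the CUDA Toolkit 11.3 download page (or the latest one, "CUDA Toolkit 12.9 Downloads | NVIDIA Developer") and select your platform (for example Linux → x86_64 → Ubuntu → 20.04 → runfile).

  2. Download the installer with the command shown on the page: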
    wget https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda_11.3.0_465.19.01_linux.run

Verify the checksum:

md5sum cuda_11.3.0_465.19.01_linux.run

Output:

(sparseocc) schen744@gpu3-9:~/code/test$ md5sum cuda_11.3.0_465.19.01_linux.run
406cecd830bb369fa4d3bd6f50a39a7a  cuda_11.3.0_465.19.01_linux.run
(sparseocc) schen744@gpu3-9:~/code/test$

Compare it with the value published by NVIDIA (developer.download.nvidia.cn/compute/cuda/11.3.0/docs/sidebar/md5sum.txt).

The two values match, so the download is intact.
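Comparing the two hashes by eye is error-prone, so md5sum can do the comparison itself. A small sketch: the expected value is copied from the output above, and verify_md5 is a helper name introduced here for illustration.

```shell
# Expected checksum, copied from the md5sum output shown above.
expected="406cecd830bb369fa4d3bd6f50a39a7a"
file="cuda_11.3.0_465.19.01_linux.run"

# verify_md5 <expected-md5> <file>  (hypothetical helper)
# Feeds a "hash  filename" line to md5sum -c, which prints OK or FAILED
# and returns a non-zero exit status on mismatch.
verify_md5() {
    echo "$1  $2" | md5sum -c -
}

# Only run the check when the installer is actually in this directory.
if [ -f "$file" ]; then
    verify_md5 "$expected" "$file"
fi
```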

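Installing CUDA without sudo (administrator rights)

On a shared cluster you will most likely not have sudo (administrator) rights, so the following approach is useful.

The official install command, "sudo sh cuda_11.3.0_465.19.01_linux.run", will most likely fail when you have no administrator rights.

Without administrator rights, the missing nvcc can be fixed by installing the CUDA toolkit to a custom path. The steps are as follows.

Core idea

The CUDA runfile installer supports installing to a non-system path (a directory the user can write to, such as ~/cuda), which needs no sudo. During installation, select only the toolkit (Toolkit); do not install the driver (Driver) or system-level components (such as the cuda symlink under /usr/local).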

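Step 2: Install the CUDA toolkit without administrator rights

We have already downloaded the cuda_11.3.0_465.19.01_linux.run installer; run it directly and point it at a custom path.

2.1 Run the installer and skip the driver

Start the installation with the following command (no sudo needed):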

sh cuda_11.3.0_465.19.01_linux.run \
  --toolkit \
  --toolkitpath=/hpc2hdd/home/schen744/code/test/cuda-11.3 \
  --silent

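Option reference:

These options come from the NVIDIA installer itself. The full list can be printed with:

sh cuda_11.3.0_465.19.01_linux.run --help

The options used here:

  • --toolkit: install only the CUDA toolkit (which includes nvcc).
  • --toolkitpath=/hpc2hdd/home/schen744/code/test/cuda-11.3: install the toolkit into the cuda-11.3 directory under /hpc2hdd/home/schen744/code/test/. The path is up to you, as long as you have write permission. The path is the easiest thing to get wrong, so I strongly recommend an absolute path (pwd prints the absolute path of the current directory).
  • --silent: silent installation (no interactive prompts).
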
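Since the install path is where things usually go wrong, it can be checked before the installer starts. The sketch below assumes the installer sits in the current directory and installs next to it; check_prefix is a hypothetical helper, not an installer feature.

```shell
# check_prefix <path>  (hypothetical helper)
# Succeeds only if <path> is absolute and its parent directory is writable.
check_prefix() {
    case "$1" in
        /*) ;;                                             # absolute path: fine
        *)  echo "not an absolute path: $1" >&2; return 1 ;;
    esac
    parent=$(dirname "$1")
    [ -d "$parent" ] && [ -w "$parent" ] || {
        echo "cannot write to $parent" >&2
        return 1
    }
}

installer="cuda_11.3.0_465.19.01_linux.run"
prefix="$(pwd)/cuda-11.3"   # assumption: install next to the installer

# Run the installer only if the path check passes and the file exists.
if check_prefix "$prefix" && [ -f "$installer" ]; then
    sh "$installer" --toolkit --toolkitpath="$prefix" --silent
fi
```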
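2.2 Verify the installation

When the installer finishes, check that bin/nvcc exists in the install directory:

ls /hpc2hdd/home/schen744/code/test/cuda-11.3/bin/nvcc

If the command prints that path, the toolkit was installed successfully.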
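A slightly stricter check also confirms that nvcc is executable and that lib64 exists, then calls nvcc by its full path, which works even before any environment variable is configured. A sketch under the same path assumption as above; toolkit_ok is a made-up helper name.

```shell
# toolkit_ok <install-prefix>  (hypothetical helper)
# True if the prefix contains an executable nvcc and a lib64 directory.
toolkit_ok() {
    [ -x "$1/bin/nvcc" ] && [ -d "$1/lib64" ]
}

prefix="/hpc2hdd/home/schen744/code/test/cuda-11.3"
if toolkit_ok "$prefix"; then
    # Calling nvcc by absolute path does not depend on PATH.
    "$prefix/bin/nvcc" --version
else
    echo "toolkit not found under $prefix" >&2
fi
```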

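Step 3: Configure environment variables (critical)

The toolkit's bin directory (which contains nvcc) and lib64 directory (which contains the runtime libraries) must be added to the environment variables; otherwise the shell cannot find nvcc.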

3.1 Temporary configuration (current terminal only)

export PATH=/hpc2hdd/home/schen744/code/test/cuda-11.3/bin:$PATH
export LD_LIBRARY_PATH=/hpc2hdd/home/schen744/code/test/cuda-11.3/lib64:$LD_LIBRARY_PATH

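The error in problem 1 complains about CUDA_HOME specifically. As far as I know, PyTorch's cpp_extension module falls back to locating nvcc on PATH when CUDA_HOME is unset, so the PATH export is usually enough, but setting CUDA_HOME explicitly removes the guesswork. A sketch, assuming the same install path as in step 2 (cuda_root is just a local variable name):

```shell
# Assumption: the toolkit was installed to this path in step 2.
cuda_root="/hpc2hdd/home/schen744/code/test/cuda-11.3"

# Point build tools at the install root explicitly.
export CUDA_HOME="$cuda_root"
export PATH="$CUDA_HOME/bin:$PATH"
# ${VAR:+:$VAR} appends the old value only if it was non-empty,
# which avoids a trailing ':' (an empty entry means "current directory").
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```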
3.2 Permanent configuration (all terminals)

Append the same environment variables to ~/.bashrc (or ~/.zshrc, depending on your shell):

echo 'export PATH=/hpc2hdd/home/schen744/code/test/cuda-11.3/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/hpc2hdd/home/schen744/code/test/cuda-11.3/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

Then reload the configuration so it takes effect:

source ~/.bashrc
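Running the echo commands more than once appends duplicate lines to ~/.bashrc. A small guard avoids that; add_line_once is a hypothetical helper written for this post, not a standard command.

```shell
# add_line_once <line> <file>  (hypothetical helper)
# Appends <line> to <file> only if an identical line is not already there.
add_line_once() {
    touch "$2"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Usage (uncomment to apply to your own shell startup file):
# add_line_once 'export PATH=/hpc2hdd/home/schen744/code/test/cuda-11.3/bin:$PATH' ~/.bashrc
# add_line_once 'export LD_LIBRARY_PATH=/hpc2hdd/home/schen744/code/test/cuda-11.3/lib64:$LD_LIBRARY_PATH' ~/.bashrc
```

The single quotes matter: they keep $PATH unexpanded, so the variable is evaluated each time the shell starts rather than frozen at the moment the line is written.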

Step 4: Verify that nvcc works

Check the nvcc version with:

nvcc --version

If the output looks like the following, the installation succeeded:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
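To tie this back to problem 1, it is worth checking what PyTorch itself resolves as the CUDA root before re-running pip install -v -e . for mmdet3d. A sketch, assuming the conda environment with torch is active; show_torch_cuda_home is a made-up helper name, while CUDA_HOME is the module-level value in torch.utils.cpp_extension that the traceback above refers to.

```shell
# show_torch_cuda_home  (hypothetical helper)
# Prints the CUDA root that PyTorch's extension builder will use.
show_torch_cuda_home() {
    if command -v python >/dev/null 2>&1 \
        && python -c 'import torch' >/dev/null 2>&1; then
        python -c 'from torch.utils.cpp_extension import CUDA_HOME; print("CUDA_HOME seen by PyTorch:", CUDA_HOME)'
    else
        echo "PyTorch is not importable in this environment"
    fi
}

show_torch_cuda_home
```

If the printed value is None, the build will fail with the same OSError as before; otherwise it should point at the cuda-11.3 directory installed in step 2.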

Problem solved!
