UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11060)

Latest recommended article published on 2024-04-11 16:35:02

匿名的魔术师

Tags: deep learning, artificial intelligence, python

Article link: https://blog.csdn.net/allrubots/article/details/136559021

Problem description

I ran into this warning while running an LLM, but it can also appear in everyday PyTorch work.

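Cause

The NVIDIA driver version is incompatible with the torch and CUDA versions installed in the environment. The NVIDIA driver determines the highest CUDA and CUDA Toolkit version the system can support, and it is backward compatible: a given driver can run software built for that CUDA version or any older one, but not for a newer one.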
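
The number in the warning (11060) is the CUDA version the driver supports, packed into a single integer. The sketch below unpacks it, assuming the usual CUDA encoding of 1000 * major + 10 * minor; the function name `decode_cuda_version` is made up for illustration and is not part of PyTorch or CUDA.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split an integer such as 11060 into (major, minor).

    Assumption: the value follows the CUDA convention
    1000 * major + 10 * minor.
    """
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


if __name__ == "__main__":
    major, minor = decode_cuda_version(11060)
    print(f"The driver supports CUDA up to {major}.{minor}")
```

Under that assumption, 11060 corresponds to CUDA 11.6, the same value that `nvidia-smi` reports as "CUDA Version".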

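The background below is taken from a reference source.

CUDA is a computing platform introduced by the GPU vendor NVIDIA. CUDA™ is a general-purpose parallel computing architecture, that is, a parallel computing platform and programming model that lets GPUs solve complex computational problems. CUDA stands for Compute Unified Device Architecture.

The CUDA Toolkit can be thought of as a toolbox. It mainly contains the CUDA C and CUDA C++ compilers, a number of scientific and utility libraries, code samples for the CUDA and library APIs, and some CUDA development tools. The CUDA Toolkit version is sometimes simply called the CUDA version.

cuDNN, short for NVIDIA CUDA® Deep Neural Network library, is a GPU-accelerated library that NVIDIA designed specifically for the basic operations of deep neural networks. cuDNN provides highly optimized implementations of the standard routines used in deep neural networks.

PyTorch is a deep learning framework built on CUDA, so each GPU build of PyTorch is tied to a specific CUDA Toolkit version. That version appears as the suffix of the package version, for example cu121 for CUDA 12.1.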
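
To make the link between a PyTorch build and its CUDA Toolkit version concrete, here is a minimal sketch that reads the CUDA version out of a version string such as `2.2.1+cu121`. The helper name `cuda_from_torch_version` is hypothetical, and the parsing rule (last digit is the minor version, the rest is the major version) is an assumption that fits tags like cu116 and cu121.

```python
import re
from typing import Optional, Tuple


def cuda_from_torch_version(version: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a string like '2.2.1+cu121', else None."""
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        # No CUDA tag, e.g. a CPU-only build such as '2.2.1+cpu'.
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version ('121' -> 12.1).
    return int(digits[:-1]), int(digits[-1])


if __name__ == "__main__":
    print(cuda_from_torch_version("2.2.1+cu121"))
    print(cuda_from_torch_version("1.13.1+cu116"))
```

When torch is already importable, `torch.version.cuda` gives the same information directly as a string, so string parsing like this is mainly useful for checking a version before installing it.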

Solution

First, check the NVIDIA driver version:

(llama) root@046cf2400456:~/data/zjx/Code-subject/LLaMA/llama-main# nvidia-smi
Fri Mar  8 04:16:34 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:04:00.0 Off |                  N/A |
| 22%   28C    P8     3W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:05:00.0 Off |                  N/A |
| 22%   26C    P8     3W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

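The header shows driver version 510.47.03, and the top-right corner shows CUDA Version 11.6. The latter is the highest CUDA version this driver supports, and it matches the 11060 reported in the warning.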
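
If the two values need to be read in a script instead of by eye, a small regular expression over the header line is enough. This is only a sketch: `parse_smi_header` is a hypothetical helper, and it assumes the header keeps the "Driver Version: ... CUDA Version: ..." layout shown in the output above.

```python
import re
from typing import Dict, Optional


def parse_smi_header(text: str) -> Dict[str, Optional[str]]:
    """Extract the driver version and the maximum supported CUDA version."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return {
        "driver_version": driver.group(1) if driver else None,
        "max_cuda_version": cuda.group(1) if cuda else None,
    }


if __name__ == "__main__":
    # Header line copied from the nvidia-smi output above.
    sample = ("| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    "
              "CUDA Version: 11.6     |")
    print(parse_smi_header(sample))
```

On a real machine the text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` instead of a pasted sample.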

Next, check which torch build is installed in the current environment:

(llama) root@046cf2400456:~/data/zjx/Code-subject/LLaMA/llama-main# python 
Python 3.8.13 (default, Mar 28 2022, 11:38:47) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
2.2.1+cu121
>>> exit()

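The environment has torch 2.2.1+cu121, a build compiled for CUDA 12.1, which is newer than the CUDA 11.6 that the driver supports. Checking further confirms that the GPU is unavailable in this state: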

(llama) root@046cf2400456:~/data/zjx/Code-subject/LLaMA/llama-main# python
Python 3.8.13 (default, Mar 28 2022, 11:38:47) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/root/anaconda3/envs/llama/lib/python3.8/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11060). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
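
The mismatch can be stated as a simple comparison: the CUDA version of the torch build must not exceed the maximum CUDA version the driver supports. The sketch below encodes that rule with a hypothetical helper, `build_supported`. It is deliberately conservative, since NVIDIA also documents minor-version compatibility within a major release, so some same-major combinations may work even when this check says no.

```python
from typing import Tuple


def build_supported(build_cuda: Tuple[int, int],
                    driver_max_cuda: Tuple[int, int]) -> bool:
    """Conservative check: the build's CUDA version must not exceed the driver's."""
    # Tuples compare element by element, so (12, 1) > (11, 6).
    return tuple(build_cuda) <= tuple(driver_max_cuda)


if __name__ == "__main__":
    driver_max = (11, 6)  # from nvidia-smi: CUDA Version 11.6
    print(build_supported((12, 1), driver_max))  # torch 2.2.1+cu121 -> False
    print(build_supported((11, 6), driver_max))  # torch 1.13.1+cu116 -> True
```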

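So the fix is to install a torch+CUDA build that matches the NVIDIA driver. The PyTorch website lists the install command for each version; in my case I picked the torch 1.13.1 build for CUDA 11.6 (cu116).

Then uninstall the existing torch and install the new build: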

(llama) root@046cf2400456:~/data/zjx/Code-subject/LLaMA/llama-main# pip uninstall torch
(llama) root@046cf2400456:~/data/zjx/Code-subject/LLaMA/llama-main# pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

The GPU is now available:

(llama) root@046cf2400456:~/data/zjx/Code-subject/LLaMA/llama-main# python
Python 3.8.13 (default, Mar 28 2022, 11:38:47) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> exit()
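
The interactive checks above can also be bundled into one small script, which is handy after every reinstall. This is a sketch, not part of the original fix: `gpu_report` is a hypothetical helper that relies only on `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()` and `torch.cuda.get_device_name()`, and it returns early if torch is not installed.

```python
import importlib.util


def gpu_report() -> dict:
    """Collect the torch build information and GPU availability in one dict."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch  # imported lazily so the script also runs without torch

    report["torch_version"] = torch.__version__
    report["build_cuda"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_0"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is still False after the reinstall, comparing `build_cuda` with the CUDA Version shown by `nvidia-smi` is the first thing to check.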