Configuring onnxruntime-gpu on a Linux Server

伪_装

Last modified: 2025-01-17 13:35:54

First published: 2025-01-16 18:04:33

Article link: https://blog.csdn.net/weixin_62828995/article/details/145188826

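This article is an installation guide for running onnxruntime-gpu without relying on the CUDA and cuDNN installed on the server host: ONNX GPU inference acceleration uses only the CUDA dependency packages inside the virtual environment. To match the inference node, we configure everything in the base environment and do not create a new virtual environment.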

Upgrade pip

pip install --upgrade pip

Install `PyTorch`

First, check which CUDA versions the system can support

# nvidia-smi
Thu Jan 16 01:04:13 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          On  | 00000000:38:00.0 Off |                    0 |
| N/A   46C    P0              71W / 300W |    435MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

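The output above shows that the driver supports up to CUDA 12.2, so we can install a GPU build of Torch that targets CUDA 12.2 or lower. The install command can be found on the official Previous PyTorch Versions | PyTorch page: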

conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.1 -c pytorch -c nvidia

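Enter y when prompted. Here we choose Torch 2.5.0 built for CUDA 12.1. Once the installation finishes, we recommend running the command above a second time, because network problems sometimes leave some dependency packages only partially installed.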

Verify that the GPU build was installed successfully

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0), torch.version.cuda)"
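The one-liner above raises an error when no CUDA device is visible, because `torch.cuda.get_device_name(0)` is called unconditionally. A slightly more defensive sketch follows; it uses only the same `torch.cuda` calls, and the function name `describe_torch_cuda` is made up for illustration:

```python
import importlib.util


def describe_torch_cuda():
    """Return a small report on the installed Torch build (hypothetical helper)."""
    # Avoid an ImportError when Torch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "cuda_available": False, "device": None, "cuda": None}

    import torch

    available = torch.cuda.is_available()
    return {
        "installed": True,
        "cuda_available": available,
        # Only query the device name when a CUDA device is actually visible.
        "device": torch.cuda.get_device_name(0) if available else None,
        # torch.version.cuda is None for CPU-only builds.
        "cuda": torch.version.cuda,
    }


if __name__ == "__main__":
    print(describe_torch_cuda())
```

If `cuda_available` is False while `cuda` shows a version string, the GPU build is installed but the driver or device is not visible to the process.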

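Install `onnxruntime-gpu`

Check available versions

Request an arbitrarily large version number that does not exist; pip's error message then lists the onnxruntime-gpu versions that can be installed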

pip install onnxruntime-gpu==1.88

The output lists every installable version number

from versions: 1.11.0, 1.11.1, 1.12.0, 1.12.1, 1.13.1, 1.14.0, 1.14.1, 1.15.0, 1.15.1, 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.17.0, 1.17.1, 1.18.0, 1.18.1, 1.19.0, 1.19.2)
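When this check is scripted, the version list can be pulled out of pip's message programmatically. The sketch below is a minimal example, assuming the message keeps the `(from versions: ...)` shape shown above and that the versions are purely numeric; both function names are hypothetical:

```python
import re


def parse_available_versions(pip_output):
    """Extract version strings from pip's '(from versions: ...)' message."""
    match = re.search(r"from versions:\s*([^)]*)\)", pip_output)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def latest_in_series(versions, series):
    """Return the highest version in a series such as '1.19', or None."""
    candidates = [v for v in versions if v.startswith(series + ".")]
    # Compare numerically so that 1.19.10 would sort after 1.19.2.
    return max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")), default=None)


if __name__ == "__main__":
    sample = "(from versions: 1.17.0, 1.17.1, 1.18.0, 1.18.1, 1.19.0, 1.19.2)"
    versions = parse_available_versions(sample)
    print(latest_in_series(versions, "1.19"))
```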

Uninstall the installed version

Uninstall any installed onnxruntime-gpu and onnxruntime

pip uninstall onnxruntime-gpu onnxruntime

Check the `libcublasLt.so.` version

We also need to check which version of libcublasLt.so. is installed, because different libcublasLt.so. versions support different onnxruntime-gpu releases. See the table below:

`libcublasLt.so.`	`onnxruntime-gpu`
11	1.18.x, 1.17.x
12	1.19.x

To look up the libcublasLt.so. version, we also need to install the mlocate package

sudo apt-get update
sudo apt-get install mlocate

Print the location of libcublasLt.so. to see which version is installed

updatedb
locate libcublasLt.so.11          
locate libcublasLt.so.12

Each path printed is an install location; pick the onnxruntime-gpu version according to the table above.
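Where `locate` is not available, the same lookup can be approximated in Python. The sketch below is only an illustration: the search locations (the active environment's `lib` directory and `/usr/local/cuda*`) are assumptions, the mapping is copied from the table above, and `find_cublaslt_majors` is a hypothetical name:

```python
import re
import sys
from pathlib import Path

# Mapping copied from the table above: libcublasLt major version -> onnxruntime-gpu series.
ORT_SERIES_BY_CUBLASLT_MAJOR = {11: ["1.18.x", "1.17.x"], 12: ["1.19.x"]}


def find_cublaslt_majors(search_roots):
    """Return {major_version: [paths]} for libcublasLt.so.<major> files under search_roots."""
    found = {}
    for root in search_roots:
        root = Path(root)
        if not root.is_dir():
            continue
        for path in root.rglob("libcublasLt.so.*"):
            match = re.match(r"libcublasLt\.so\.(\d+)", path.name)
            if match:
                found.setdefault(int(match.group(1)), []).append(str(path))
    return found


if __name__ == "__main__":
    # Assumed search locations; adjust them to wherever CUDA libraries live on your machine.
    roots = [Path(sys.prefix) / "lib", *Path("/usr/local").glob("cuda*")]
    for major, paths in sorted(find_cublaslt_majors(roots).items()):
        print(major, ORT_SERIES_BY_CUBLASLT_MAJOR.get(major, "not in table"), paths[0])
```

Searching the environment's own `lib` directory first fits the goal of this article, which is to rely on the CUDA packages inside the environment rather than on the host installation.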

Then, based on your CUDA version, find the matching onnxruntime-gpu version on the official NVIDIA - CUDA | onnxruntime page.

Uninstall the old onnxruntime-gpu and install the new version. For my CUDA 12.1 setup, the matching release is onnxruntime-gpu 1.19.0.
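After reinstalling, it can help to confirm what is actually present in the environment. The following sketch uses the standard library's `importlib.metadata`; the helper names and the default `"1.19."` prefix are illustrative assumptions taken from the version chosen above:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_onnxruntime_install(expected_prefix="1.19."):
    """Report whether onnxruntime-gpu matches the expected series (hypothetical helper)."""
    cpu = installed_version("onnxruntime")
    gpu = installed_version("onnxruntime-gpu")
    return {
        "onnxruntime": cpu,
        "onnxruntime-gpu": gpu,
        # Having both packages installed is the situation the uninstall step avoids.
        "both_installed": cpu is not None and gpu is not None,
        "gpu_version_ok": gpu is not None and gpu.startswith(expected_prefix),
    }


if __name__ == "__main__":
    print(check_onnxruntime_install())
```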

`onnxruntime-gpu` test

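The Python test code is as follows:

import onnxruntime as ort

def init_gpu(model_path):
    # ort.get_device() reports 'GPU' when the installed build supports GPU execution
    providers = ["CUDAExecutionProvider"] if ort.get_device() == 'GPU' else ['CPUExecutionProvider']
    session = ort.InferenceSession(model_path, providers=providers)
    # get_providers() lists the providers this session actually uses
    print(f"# Model initialized, providers in use: {session.get_providers()}")

if __name__ == '__main__':
    model_path = './onnx_xx.onnx'  # path to your own ONNX model
    init_gpu(model_path)

If the test prints Model initialized, providers in use: ['CUDAExecutionProvider', 'CPUExecutionProvider'], the model has been initialized on the GPU successfully.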
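Because onnxruntime only logs a warning and continues on the CPU when the CUDA provider fails to load, a script may want to check the provider list rather than read the printed output. The sketch below is a minimal example with hypothetical helper names; the lists are hard-coded stand-ins for the return values of `ort.get_available_providers()` and `session.get_providers()` so that it runs without a model file:

```python
def choose_providers(available):
    """Prefer CUDA when the installed build offers it, keeping CPU as a fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]


def cuda_is_active(session_providers):
    """True if the session actually bound to the CUDA execution provider."""
    return "CUDAExecutionProvider" in session_providers


if __name__ == "__main__":
    # Stand-in for ort.get_available_providers() on a GPU build.
    available = ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
    print(choose_providers(available))
    # Stand-in for session.get_providers() after a failed CUDA load.
    print(cuda_is_active(["CPUExecutionProvider"]))
```

In a real script, the result of `choose_providers` would be passed as the `providers` argument of `ort.InferenceSession`, and `cuda_is_active(session.get_providers())` would be checked afterwards.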

Common problems

numpy version problem

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Solution: uninstall the current numpy
pip uninstall numpy
Install a 1.x release
pip install numpy==1.24.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

libcublasLt.so environment variable problem

2025-01-16 05:52:20.219748146 [E:onnxruntime:Default, provider_bridge_ort.cc:1992 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1637 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.12: cannot open shared object file: No such file or directory

2025-01-16 05:52:20.220658808 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:965 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they’re in the PATH, and that your GPU is supported.
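When this error appears, first check that the right onnxruntime-gpu version is installed; in most cases the cause is a version that does not match the installed CUDA. If the version is correct, the problem is most likely the environment variables, which can be fixed as follows.
Solution: find the location of libcublasLt.so.12
Method 1
apt-get install sudo
sudo find / -name libcublasLt.so.12
Method 2
locate libcublasLt.so.12
Configure the environment variable
vim ~/.bashrc
Add the environment variable, using the directory found above if it is not /usr/local/cuda/lib64
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Apply the environment variable
source ~/.bashrc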
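The directory added to LD_LIBRARY_PATH should be the one that actually contains the library found by `find` or `locate`. As a rough sketch, the export line can be derived from the search results; `build_ld_library_path` is a hypothetical helper, and the sample path is only a placeholder for your own search output:

```python
import os
from pathlib import Path


def build_ld_library_path(library_paths, current=""):
    """Prepend the parent directories of library_paths to an LD_LIBRARY_PATH value."""
    dirs = []
    for lib in library_paths:
        parent = str(Path(lib).parent)
        if parent not in dirs:
            dirs.append(parent)
    # Keep existing entries, dropping empty ones and any that were just prepended.
    existing = [d for d in current.split(":") if d and d not in dirs]
    return ":".join(dirs + existing)


if __name__ == "__main__":
    # Example path only; substitute the output of `locate libcublasLt.so.12`.
    found = ["/usr/local/cuda/lib64/libcublasLt.so.12"]
    value = build_ld_library_path(found, os.environ.get("LD_LIBRARY_PATH", ""))
    print(f"export LD_LIBRARY_PATH={value}")
```

The printed line can then be pasted into ~/.bashrc in place of the hard-coded path.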

libcudnn.so environment variable problem

2025-01-16 04:23:22.326215464 [E:onnxruntime:Default, provider_bridge_ort.cc:1548 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1209 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory

2025-01-16 04:23:22.326932000 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:861 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
Solution: find the location of libcudnn.so.8
Method 1
sudo find / -name libcudnn.so.8
Method 2
locate libcudnn.so.8
Configure the environment variable
vim ~/.bashrc
Add the environment variable, using the directory found above if it is not /usr/local/cuda/lib64
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Apply the environment variable
source ~/.bashrc
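To confirm that the change took effect, one option is to ask the dynamic loader directly whether it can open the libraries named in the error messages. This is a minimal sketch using the standard `ctypes` module; `can_load` is a hypothetical helper, and the library names are the ones quoted in the logs above:

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can resolve and open library_name."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    for name in ("libcublasLt.so.12", "libcudnn.so.8", "libcudnn.so.9"):
        print(name, "loadable" if can_load(name) else "NOT found by the loader")
```

LD_LIBRARY_PATH is read when a process starts, so run this from a new shell, or after `source ~/.bashrc`, rather than from a Python session that was already open.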

Install the `ultralytics` package

pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple
