docker nvidia-smi Failed to initialize NVML: Driver/library version mismatch RuntimeError: cuda_call

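A Docker container built on an x86_64 Ubuntu 20.04 system hit a CUDA driver version incompatibility at run time, so nvidia-smi could not initialize NVML. After an attempt to upgrade CUDA to 11.8 failed, downloading and installing the latest NVIDIA driver (535.54.03) resolved the problem and also fixed the onnxruntime runtime error.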


Symptoms

A Docker image built on x86_64 Ubuntu 20.04 reported errors when it was run for testing on Deepin 20.8.

root@400ffcdf5dce:/opt/roop/roop# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
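This NVML message typically means that the loaded NVIDIA kernel module and the user-space library libnvidia-ml come from different driver releases. A minimal sketch for comparing the two is shown here. The helper names are hypothetical, and the library directory /usr/lib/x86_64-linux-gnu is an assumption that may differ on your system or inside your container.

```python
import re
from pathlib import Path
from typing import Optional


def kernel_module_version(proc_text: str) -> Optional[str]:
    """Return the dotted version from the 'NVRM version:' line, if present."""
    match = re.search(r"NVRM version:.*?\s(\d+(?:\.\d+)+)(?:\s|$)", proc_text)
    return match.group(1) if match else None


def library_version(filename: str) -> Optional[str]:
    """Return the version suffix of a name like libnvidia-ml.so.535.54.03.

    Symlinks such as libnvidia-ml.so.1 carry no driver version and yield None.
    """
    match = re.match(r"libnvidia-ml\.so\.(\d+(?:\.\d+)+)$", filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    proc_file = Path("/proc/driver/nvidia/version")
    if proc_file.exists():
        print("kernel module:", kernel_module_version(proc_file.read_text()))

    # Assumed library location; adjust for your distribution.
    lib_dir = Path("/usr/lib/x86_64-linux-gnu")
    if lib_dir.is_dir():
        for lib in sorted(lib_dir.glob("libnvidia-ml.so.*")):
            version = library_version(lib.name)
            if version:
                print("user-space library:", version)
```

If the two printed versions differ, the mismatch reported by nvidia-smi is the expected result.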

Python error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/roop/roop/run.py", line 6, in <module>
    core.run()
  File "/opt/roop/roop/roop/core.py", line 235, in run
    start()
  File "/opt/roop/roop/roop/core.py", line 166, in start
    if not frame_processor.pre_start():
  File "/opt/roop/roop/roop/processors/frame/face_swapper.py", line 28, in pre_start
    elif not get_one_face(cv2.imread(roop.globals.source_path)):
  File "/opt/roop/roop/roop/face_analyser.py", line 20, in get_one_face
    face = get_face_analyser().get(frame)
  File "/opt/roop/roop/roop/face_analyser.py", line 14, in get_face_analyser
    FACE_ANALYSER = insightface.app.FaceAnalysis(name='buffalo_l', providers=roop.globals.execution_providers)
  File "/usr/lib/python3.10/site-packages/insightface/app/face_analysis.py", line 31, in __init__
    model = model_zoo.get_model(onnx_file, **kwargs)
  File "/usr/lib/python3.10/site-packages/insightface/model_zoo/model_zoo.py", line 96, in get_model
    model = router.get_model(providers=providers, provider_options=provider_options)
  File "/usr/lib/python3.10/site-packages/insightface/model_zoo/model_zoo.py", line 40, in get_model
    session = PickableInferenceSession(self.onnx_file, **kwargs)
  File "/usr/lib/python3.10/site-packages/insightface/model_zoo/model_zoo.py", line 25, in __init__
    super().__init__(model_path, **kwargs)
  File "/usr/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 394, in __init__
    raise fallback_error from e
  File "/usr/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 389, in __init__
    self._create_inference_session(self._fallback_providers, None)
  File "/usr/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 435, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=394874859 ; hostname=400ffcdf5dce ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=238 ; expr=cudaSetDevice(info_.device_id); 

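Installing cuda-11.8 (failed)

The NVIDIA driver on the host differs from the one on the original Ubuntu 20.04 machine (the container requires a newer driver version), so it does not match the CUDA version inside the container.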
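The comparison described above can be sketched as a simple version check. This is an illustration under stated assumptions: the function names are hypothetical, the minimum of 520.61.05 is taken from the CUDA 11.8 installer file name used in this post, and the older host version 470.82.01 is a made-up example value.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted driver version such as '535.54.03' into (535, 54, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(host_driver: str, required: str) -> bool:
    """True when the host driver is at least the version the container needs."""
    return parse_version(host_driver) >= parse_version(required)


# Assumption: taken from the CUDA 11.8 local installer file name in this post.
REQUIRED_FOR_CUDA_11_8 = "520.61.05"

if __name__ == "__main__":
    # "470.82.01" is a hypothetical older host driver, used only for illustration.
    for host in ("470.82.01", "535.54.03"):
        ok = driver_is_new_enough(host, REQUIRED_FOR_CUDA_11_8)
        print(host, "is sufficient" if ok else "is too old")
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "99.0" as newer than "535.54.03".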

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-ubuntu2004-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-8-local_11.8.0-520.61.05-1_amd64.deb && rm cuda-repo-ubuntu2004-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-11-8-local/cuda-368EAC11-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

It turned out that no matching CUDA version was available.

CUDA Toolkit 11.8 Downloads | NVIDIA Developer

Downloading and upgrading the driver (succeeded)

Download the latest driver:

https://us.download.nvidia.cn/XFree86/Linux-x86_64/535.54.03/NVIDIA-Linux-x86_64-535.54.03.run

Install it and reboot; everything then works normally again.
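After the reboot, one way to confirm that the new driver is active is to ask nvidia-smi for the driver version. The sketch below is not part of the original procedure; the function names are hypothetical, and it assumes nvidia-smi is on the PATH.

```python
import shutil
import subprocess
from typing import Optional


def first_driver_version(output: str) -> Optional[str]:
    """Return the first non-empty line of the query output (one line per GPU)."""
    for line in output.splitlines():
        if line.strip():
            return line.strip()
    return None


def reported_driver_version() -> Optional[str]:
    """Query nvidia-smi for the driver version; None if unavailable or failing."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return first_driver_version(result.stdout)


if __name__ == "__main__":
    version = reported_driver_version()
    expected = "535.54.03"  # the driver version installed in this post
    if version is None:
        print("nvidia-smi is missing or still failing")
    else:
        print(f"driver {version}; matches {expected}: {version == expected}")
```

Running the same check inside the container should report the same version, because the container uses the host's driver.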

For the GPU driver installation process and troubleshooting, see:

"Deepin 20.08 Linux: upgrading the NVIDIA driver, black screen, error nvrm api mismatch" (hkNaruto's blog, CSDN)
