Symptoms
A Docker image built on an x86_64 Ubuntu 20.04 host fails its test run when started on Deepin 20.8.
root@400ffcdf5dce:/opt/roop/roop# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
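A "Driver/library version mismatch" from NVML generally means that the user-space NVIDIA libraries visible to the process are a different version from the kernel module that is currently loaded. As a minimal sketch for checking the kernel-module side, the Python below reads /proc/driver/nvidia/version and extracts the version number. The helper names are hypothetical, and the parsing assumes the usual "NVRM version: ..." line layout of that file; adjust it if your driver prints something different.

```python
import re
from pathlib import Path
from typing import Optional

# Standard location where the loaded NVIDIA kernel module reports its version.
PROC_VERSION = Path("/proc/driver/nvidia/version")


def parse_kernel_module_version(text: str) -> Optional[str]:
    """Return the first dotted version number on the 'NVRM version:' line."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            match = re.search(r"\b\d+(?:\.\d+)+\b", line)
            return match.group(0) if match else None
    return None


def loaded_kernel_module_version() -> Optional[str]:
    """Read the version of the loaded kernel module, or None if unavailable."""
    try:
        return parse_kernel_module_version(PROC_VERSION.read_text())
    except OSError:
        # No NVIDIA kernel module loaded, or the file is not visible here.
        return None


if __name__ == "__main__":
    print("loaded kernel module:", loaded_kernel_module_version())
```

If the number printed here differs from the driver libraries the process actually loads, nvidia-smi is expected to fail in the way shown above.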
Python error
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/roop/roop/run.py", line 6, in <module>
    core.run()
  File "/opt/roop/roop/roop/core.py", line 235, in run
    start()
  File "/opt/roop/roop/roop/core.py", line 166, in start
    if not frame_processor.pre_start():
  File "/opt/roop/roop/roop/processors/frame/face_swapper.py", line 28, in pre_start
    elif not get_one_face(cv2.imread(roop.globals.source_path)):
  File "/opt/roop/roop/roop/face_analyser.py", line 20, in get_one_face
    face = get_face_analyser().get(frame)
  File "/opt/roop/roop/roop/face_analyser.py", line 14, in get_face_analyser
    FACE_ANALYSER = insightface.app.FaceAnalysis(name='buffalo_l', providers=roop.globals.execution_providers)
  File "/usr/lib/python3.10/site-packages/insightface/app/face_analysis.py", line 31, in __init__
    model = model_zoo.get_model(onnx_file, **kwargs)
  File "/usr/lib/python3.10/site-packages/insightface/model_zoo/model_zoo.py", line 96, in get_model
    model = router.get_model(providers=providers, provider_options=provider_options)
  File "/usr/lib/python3.10/site-packages/insightface/model_zoo/model_zoo.py", line 40, in get_model
    session = PickableInferenceSession(self.onnx_file, **kwargs)
  File "/usr/lib/python3.10/site-packages/insightface/model_zoo/model_zoo.py", line 25, in __init__
    super().__init__(model_path, **kwargs)
  File "/usr/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 394, in __init__
    raise fallback_error from e
  File "/usr/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 389, in __init__
    self._create_inference_session(self._fallback_providers, None)
  File "/usr/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 435, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=394874859 ; hostname=400ffcdf5dce ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=238 ; expr=cudaSetDevice(info_.device_id);
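Installing cuda-11.8 (failed)
The NVIDIA driver on this host is not the same as the one on the original Ubuntu 20.04 machine (the container requires a newer driver version), so it does not match the CUDA version inside the container.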
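To make that comparison concrete, here is a small sketch that compares dotted driver version strings numerically rather than as text. It assumes that 520.61.05, the driver version embedded in the CUDA 11.8.0 installer filename used in this post, is the version the container expects; the real minimum depends on NVIDIA's compatibility rules. The 470.129.06 host version and the function names are purely hypothetical.

```python
from typing import Tuple

# Assumption: taken from the CUDA 11.8.0 local installer filename
# (cuda-repo-ubuntu2004-11-8-local_11.8.0-520.61.05-1_amd64.deb).
EXPECTED_DRIVER = "520.61.05"


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '520.61.05' into (520, 61, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_is_new_enough(host_driver: str, required: str = EXPECTED_DRIVER) -> bool:
    """True if the host driver version is at least the required version."""
    return version_tuple(host_driver) >= version_tuple(required)


if __name__ == "__main__":
    # Hypothetical old host driver versus the one installed later in this post.
    for candidate in ("470.129.06", "535.54.03"):
        print(candidate, "ok" if driver_is_new_enough(candidate) else "too old")
```

Comparing tuples of integers avoids the trap of string comparison, where "99.0" would sort after "520.61.05".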
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-ubuntu2004-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-8-local_11.8.0-520.61.05-1_amd64.deb && rm cuda-repo-ubuntu2004-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-11-8-local/cuda-368EAC11-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
It turned out that no matching CUDA version was available.
CUDA Toolkit 11.8 Downloads | NVIDIA Developer
Downloading and upgrading the driver again (succeeded)
Download the latest driver:
https://us.download.nvidia.cn/XFree86/Linux-x86_64/535.54.03/NVIDIA-Linux-x86_64-535.54.03.run
Install it and reboot; after that everything works normally again.
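If a different driver version is needed, the download link can be rebuilt from the version number. The sketch below only mirrors the shape of the link used in this post; that other versions follow the same path layout on this mirror is an assumption, and the function names are hypothetical.

```python
# Base taken from the download link used in this post.
BASE_URL = "https://us.download.nvidia.cn/XFree86/Linux-x86_64"


def driver_installer_name(version: str) -> str:
    """File name of the .run installer for a given driver version."""
    return f"NVIDIA-Linux-x86_64-{version}.run"


def driver_installer_url(version: str) -> str:
    """Download URL, assuming the <base>/<version>/<file> layout seen above."""
    return f"{BASE_URL}/{version}/{driver_installer_name(version)}"


if __name__ == "__main__":
    print(driver_installer_url("535.54.03"))
```

Check that the generated link actually exists before relying on it.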
For the GPU driver installation procedure and troubleshooting, see:
Deepin 20.08 Linux: upgrading the NVIDIA driver, black screen, "nvrm api mismatch" error (hkNaruto's blog, CSDN)