Problem
(1) Inside the Docker container, the CUDA driver reports an error and GPU training fails:
[Hint: 'cudaErrorNoDevice'.
This indicates that no CUDA-capable devices were detected by the installed CUDA driver. ]
(at ../paddle/phi/backends/gpu/cuda/cuda_info.cc:67)
(2) Running nvidia-smi in the container fails as well:
Failed to initialize NVML: Unknown Error
Solution
Restarting the Docker container fixes the problem:
docker restart <container>
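If the problem comes back, the check-then-restart step can be scripted. Below is a minimal sketch in Python, assuming it runs on the host and the Docker CLI is on PATH. `needs_restart` and `check_and_restart` are hypothetical helper names, and treating any non-zero nvidia-smi exit code (or the "Failed to initialize NVML" message) as "GPU lost" is an assumption, not a documented rule.

```python
import subprocess

# Message nvidia-smi printed in this case when NVML could not be initialized.
NVML_ERROR = "Failed to initialize NVML"


def needs_restart(returncode: int, output: str) -> bool:
    """Assumption: a non-zero exit code or the NVML error text means the GPU is lost."""
    return returncode != 0 or NVML_ERROR in output


def check_and_restart(container: str) -> bool:
    """Run nvidia-smi inside `container` and restart the container if it fails.

    Returns True if a restart was issued.
    """
    result = subprocess.run(
        ["docker", "exec", container, "nvidia-smi"],
        capture_output=True,
        text=True,
    )
    if needs_restart(result.returncode, result.stdout + result.stderr):
        # Equivalent to running: docker restart <container>
        subprocess.run(["docker", "restart", container], check=True)
        return True
    return False
```

Example: `check_and_restart("my_container")`, where `my_container` is a placeholder name. Keep in mind that `docker restart` stops the container first, so any training process running inside it is terminated.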