Running "docker run --gpus all ..." fails with the following error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/1c530a4ee5d0941a9cf96799547e522a8629fe3def7d05d8024faf94684621af/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
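Cause
The image was built on a native Ubuntu host, and starting it under WSL with GPU support (--gpus all) fails. As the message says ("file creation failed ... libnvidia-ml.so.1: file exists"), the image's own filesystem already contains NVIDIA driver libraries such as libnvidia-ml.so.1. When nvidia-container-cli tries to create those files in the container to mount the host's driver libraries, it finds them already there and aborts.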
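To confirm this cause before changing anything, you can check whether the image itself carries these libraries. The helper below is a hypothetical sketch (the function name is not from the original post). It assumes the image contains /bin/sh, as Ubuntu-based images do, and uses the x86_64 library path from the error message.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): list NVIDIA driver
# libraries that are stored in the image's own layers.
# --gpus is deliberately NOT passed, so nothing is injected at start-up and
# any file that shows up must already be part of the image.
# --entrypoint sh bypasses any ENTRYPOINT the image may define.
list_baked_gpu_libs() {
  image="$1"
  docker run --rm --entrypoint sh "$image" -c \
    'ls -l /usr/lib/x86_64-linux-gnu/libnvidia-* /usr/lib/x86_64-linux-gnu/libcuda.so* 2>/dev/null'
}
```

For example, `list_baked_gpu_libs my-image:1.0`: if it lists libnvidia-ml.so.1 or libcuda.so.1, the image has its own copies. Run against the fixed image (my-image:1.1), it should print nothing.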
Solution
1. Start a container from the original image with plain docker, i.e. without --gpus all, so that no GPU libraries are injected:
docker run -it --name=my-container --rm my-image:1.0
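2. Inside that container, delete /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 and /usr/lib/x86_64-linux-gnu/libcuda.so.1. The commands below remove all libnvidia-* and libcuda.so* files in that directory: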
rm /usr/lib/x86_64-linux-gnu/libnvidia-*
rm /usr/lib/x86_64-linux-gnu/libcuda.so*
3. Open a new terminal (leave the container running) and commit the container as a new image:
docker commit my-container my-image:1.1
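4. Exit the original container (it was started with --rm, so it is removed on exit and the name my-container is freed), then start the image committed in the previous step with GPU support, which gives a new GPU-enabled container: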
docker run --gpus all -it --name=my-container --rm my-image:1.1
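Steps 1 to 3 can also be done non-interactively with a regular docker build instead of docker commit. This is a swapped-in technique, not the method from the steps above, and the function names are hypothetical. It assumes the base image has /bin/sh and uses the same library paths as step 2. As with docker commit, the deletion happens in a new layer, so the files are hidden rather than removed from the lower layers (the image does not get smaller).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): generate a Dockerfile that
# derives a new image from BASE and deletes the baked-in NVIDIA libraries.
gpu_fix_dockerfile() {
  base="$1"
  printf 'FROM %s\n' "$base"
  # rm -f keeps the build from failing if a pattern matches nothing.
  printf '%s\n' 'RUN rm -f /usr/lib/x86_64-linux-gnu/libnvidia-* /usr/lib/x86_64-linux-gnu/libcuda.so*'
}

# Build NEW_TAG from BASE. The Dockerfile is read from stdin ("-"), so no
# build context is sent; nothing is COPYed, so none is needed.
build_gpu_fixed_image() {
  base="$1"
  new_tag="$2"
  gpu_fix_dockerfile "$base" | docker build -t "$new_tag" -
}
```

Usage: `build_gpu_fixed_image my-image:1.0 my-image:1.1`, then `docker run --gpus all -it --rm my-image:1.1`. Because the new image is built FROM the original, it inherits its CMD and ENTRYPOINT, and no container has to be kept running while committing.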