Setting Up a GPU Deep Learning Environment in a Docker Container

nvidia-smi

Wed Nov 24 13:44:18 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:21:00.0 Off |                  N/A |
|  0%   35C    P8    21W / 370W |    180MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2341      G   /usr/lib/xorg/Xorg                167MiB |
|    0   N/A  N/A      2445      G   /usr/bin/gnome-shell               11MiB |
+-----------------------------------------------------------------------------+

The GPU cannot be used inside the container, so let's find out why.

Inside the Docker container:
nvidia-smi
bash: nvidia-smi: command not found
docker run -itd --gpus all --name rv -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all wq001/rastertorch
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
ERRO[0000] error waiting for container: context canceled
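
The error says the `devices` cgroup subsystem cannot be found. One possible explanation, which this post does not confirm, is that the host boots with the unified cgroup v2 hierarchy, where no separate `devices` controller exists. A minimal sketch for checking which hierarchy is mounted; the helper name `describe_cgroup_fs` is made up for illustration:

```shell
# Hypothetical helper: map the filesystem type mounted at /sys/fs/cgroup
# to a human-readable cgroup mode.
describe_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "cgroup v2 (unified hierarchy)" ;;
        tmpfs)     echo "cgroup v1 (or hybrid) hierarchy" ;;
        *)         echo "unknown: $1" ;;
    esac
}

# Usage on the host (stat -f reports on the filesystem, %T prints its type):
#   describe_cgroup_fs "$(stat -fc %T /sys/fs/cgroup)"
```

If this reports cgroup v2, the toolkit version in use may simply predate cgroup v2 support; treat that as a hypothesis to verify rather than a diagnosis.
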
nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --

I1124 05:37:22.076466 139016 nvc.c:372] initializing library context (version=1.6.0, build=dd2c49d6699e4d8529fbeaa58ee91554977b652e)
I1124 05:37:22.076510 139016 nvc.c:346] using root /
I1124 05:37:22.076516 139016 nvc.c:347] using ldcache /etc/ld.so.cache
I1124 05:37:22.076519 139016 nvc.c:348] using unprivileged user 65534:65534
I1124 05:37:22.076535 139016 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1124 05:37:22.076634 139016 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
I1124 05:37:22.078579 139017 nvc.c:274] loading kernel module nvidia
I1124 05:37:22.078778 139017 nvc.c:278] running mknod for /dev/nvidiactl
I1124 05:37:22.078815 139017 nvc.c:282] running mknod for /dev/nvidia0
I1124 05:37:22.078835 139017 nvc.c:286] running mknod for all nvcaps in /dev/nvidia-caps
I1124 05:37:22.085097 139017 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I1124 05:37:22.085235 139017 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I1124 05:37:22.087446 139017 nvc.c:292] loading kernel module nvidia_uvm
I1124 05:37:22.087503 139017 nvc.c:296] running mknod for /dev/nvidia-uvm
I1124 05:37:22.087578 139017 nvc.c:301] loading kernel module nvidia_modeset
I1124 05:37:22.087630 139017 nvc.c:305] running mknod for /dev/nvidia-modeset
I1124 05:37:22.087983 139018 driver.c:101] starting driver service
I1124 05:37:22.089796 139016 nvc_info.c:758] requesting driver information with ''
I1124 05:37:22.090886 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.495.44
I1124 05:37:22.090931 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.495.44
I1124 05:37:22.090973 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.495.44
I1124 05:37:22.091011 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.495.44
I1124 05:37:22.091058 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.495.44
I1124 05:37:22.091101 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.495.44
I1124 05:37:22.091137 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.495.44
I1124 05:37:22.091167 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.495.44
I1124 05:37:22.091208 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.495.44
I1124 05:37:22.091234 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.495.44
I1124 05:37:22.091261 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.495.44
I1124 05:37:22.091292 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.495.44
I1124 05:37:22.091334 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.495.44
I1124 05:37:22.091373 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.495.44
I1124 05:37:22.091401 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.495.44
I1124 05:37:22.091435 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.495.44
I1124 05:37:22.091476 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.495.44
I1124 05:37:22.091517 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.495.44
I1124 05:37:22.091670 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.495.44
I1124 05:37:22.091761 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.495.44
I1124 05:37:22.091793 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.495.44
I1124 05:37:22.091825 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.495.44
I1124 05:37:22.091855 139016 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.495.44
I1124 05:37:22.091903 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.495.44
I1124 05:37:22.091937 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.495.44
I1124 05:37:22.091982 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.495.44
I1124 05:37:22.092024 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.495.44
I1124 05:37:22.092054 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.495.44
I1124 05:37:22.092095 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.495.44
I1124 05:37:22.092123 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.495.44
I1124 05:37:22.092150 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.495.44
I1124 05:37:22.092182 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.495.44
I1124 05:37:22.092224 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.495.44
I1124 05:37:22.092264 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.495.44
I1124 05:37:22.092293 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.495.44
I1124 05:37:22.092325 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.495.44
I1124 05:37:22.092379 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libcuda.so.495.44
I1124 05:37:22.092429 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.495.44
I1124 05:37:22.092460 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.495.44
I1124 05:37:22.092489 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.495.44
I1124 05:37:22.092519 139016 nvc_info.c:171] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.495.44
W1124 05:37:22.092533 139016 nvc_info.c:397] missing library libnvidia-nscq.so
W1124 05:37:22.092539 139016 nvc_info.c:397] missing library libnvidia-fatbinaryloader.so
W1124 05:37:22.092548 139016 nvc_info.c:397] missing library libvdpau_nvidia.so
W1124 05:37:22.092555 139016 nvc_info.c:397] missing library libnvidia-ifr.so
W1124 05:37:22.092561 139016 nvc_info.c:397] missing library libnvidia-cbl.so
W1124 05:37:22.092564 139016 nvc_info.c:401] missing compat32 library libnvidia-cfg.so
W1124 05:37:22.092569 139016 nvc_info.c:401] missing compat32 library libnvidia-nscq.so
W1124 05:37:22.092574 139016 nvc_info.c:401] missing compat32 library libnvidia-fatbinaryloader.so
W1124 05:37:22.092580 139016 nvc_info.c:401] missing compat32 library libnvidia-allocator.so
W1124 05:37:22.092584 139016 nvc_info.c:401] missing compat32 library libnvidia-ngx.so
W1124 05:37:22.092590 139016 nvc_info.c:401] missing compat32 library libvdpau_nvidia.so
W1124 05:37:22.092596 139016 nvc_info.c:401] missing compat32 library libnvidia-ifr.so
W1124 05:37:22.092601 139016 nvc_info.c:401] missing compat32 library libnvidia-rtcore.so
W1124 05:37:22.092605 139016 nvc_info.c:401] missing compat32 library libnvoptix.so
W1124 05:37:22.092610 139016 nvc_info.c:401] missing compat32 library libnvidia-cbl.so
I1124 05:37:22.092792 139016 nvc_info.c:297] selecting /usr/bin/nvidia-smi
I1124 05:37:22.092808 139016 nvc_info.c:297] selecting /usr/bin/nvidia-debugdump
I1124 05:37:22.092823 139016 nvc_info.c:297] selecting /usr/bin/nvidia-persistenced
I1124 05:37:22.092843 139016 nvc_info.c:297] selecting /usr/bin/nvidia-cuda-mps-control
I1124 05:37:22.092858 139016 nvc_info.c:297] selecting /usr/bin/nvidia-cuda-mps-server
W1124 05:37:22.092909 139016 nvc_info.c:423] missing binary nv-fabricmanager
I1124 05:37:22.092931 139016 nvc_info.c:341] listing firmware path /usr/lib/firmware/nvidia/495.44
I1124 05:37:22.092952 139016 nvc_info.c:520] listing device /dev/nvidiactl
I1124 05:37:22.092957 139016 nvc_info.c:520] listing device /dev/nvidia-uvm
I1124 05:37:22.092960 139016 nvc_info.c:520] listing device /dev/nvidia-uvm-tools
I1124 05:37:22.092963 139016 nvc_info.c:520] listing device /dev/nvidia-modeset
I1124 05:37:22.092983 139016 nvc_info.c:341] listing ipc path /run/nvidia-persistenced/socket
W1124 05:37:22.093001 139016 nvc_info.c:347] missing ipc path /var/run/nvidia-fabricmanager/socket
W1124 05:37:22.093014 139016 nvc_info.c:347] missing ipc path /tmp/nvidia-mps
I1124 05:37:22.093020 139016 nvc_info.c:814] requesting device information with ''
I1124 05:37:22.098736 139016 nvc_info.c:705] listing device /dev/nvidia0 (GPU-88c8d8c8-8b2f-1f42-481d-6943f660960a at 00000000:21:00.0)
NVRM version:   495.44
CUDA version:   11.5

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce RTX 3090
Brand:          GeForce
GPU UUID:       GPU-88c8d8c8-8b2f-1f42-481d-6943f660960a
Bus Location:   00000000:21:00.0
Architecture:   8.6
I1124 05:37:22.098764 139016 nvc.c:423] shutting down library context
I1124 05:37:22.099300 139018 driver.c:163] terminating driver service
I1124 05:37:22.099651 139016 driver.c:203] driver service terminated successfully

The GPU driver is installed correctly, so keep looking.
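
The debug log is long, but only a few lines of the `info` output matter for this conclusion. A small sketch that pulls them out, assuming the output format shown above; `summarize_cli_info` is an illustrative name, not part of any NVIDIA tool:

```shell
# Hypothetical helper: keep only the driver, CUDA and GPU model lines
# from `nvidia-container-cli info` output read on stdin.
summarize_cli_info() {
    grep -E '^(NVRM version|CUDA version|Model):'
}

# Usage on the host:
#   nvidia-container-cli info | summarize_cli_info
```

If the three lines come back with sensible values, the driver side is most likely fine and the problem lies elsewhere.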

uname -a
Linux TRX40-AORUS-PRO 5.13.0-21-generic #21-Ubuntu SMP Tue Oct 19 08:59:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
1. distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
2. curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
OK

The next step fails:

3. curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Unsupported distribution!
# Check https://nvidia.github.io/nvidia-docker

https://nvidia.github.io/nvidia-docker
This machine runs Ubuntu 21, which is not on the supported list. What now? Set the distribution manually:

distribution="ubuntu20.04"
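
The detection step and the manual override can be folded into one small script. This is only a sketch: the function names and the list of supported distributions passed in are assumptions made for illustration, so check the actual list at https://nvidia.github.io/nvidia-docker before relying on it.

```shell
# Print $ID$VERSION_ID from an os-release style file (default: /etc/os-release).
# The file is sourced in a subshell so its variables do not leak out.
detect_distribution() {
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$ID$VERSION_ID" )
}

# Print the detected distribution if it appears in the supported list,
# otherwise print the fallback. Arguments: detected, fallback, supported...
# The supported list is an ASSUMPTION supplied by the caller.
pick_distribution() {
    detected=$1
    fallback=$2
    shift 2
    for supported in "$@"; do
        if [ "$detected" = "$supported" ]; then
            printf '%s\n' "$detected"
            return 0
        fi
    done
    printf '%s\n' "$fallback"
}

# Usage:
#   distribution=$(pick_distribution "$(detect_distribution)" ubuntu20.04 ubuntu18.04 ubuntu20.04)
```

Falling back to an older release only changes which repository list is fetched; it does not guarantee that the packages work on the newer system.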


4. curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
5. sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
Hit:1 http://download.zerotier.com/debian/bionic bionic InRelease
Get:2 http://security.ubuntu.com/ubuntu impish-security InRelease [110 kB]
Hit:3 http://cn.archive.ubuntu.com/ubuntu impish InRelease
Hit:4 https://download.docker.com/linux/ubuntu impish InRelease
Get:5 http://cn.archive.ubuntu.com/ubuntu impish-updates InRelease [110 kB]
Hit:6 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  InRelease
Hit:7 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  InRelease
Hit:8 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  InRelease
Get:9 http://security.ubuntu.com/ubuntu impish-security/main amd64 DEP-11 Metadata [6,436 B]
Get:10 http://cn.archive.ubuntu.com/ubuntu impish-backports InRelease [101 kB]
Get:11 http://security.ubuntu.com/ubuntu impish-security/universe amd64 DEP-11 Metadata [2,312 B]
Get:12 http://cn.archive.ubuntu.com/ubuntu impish-updates/main amd64 DEP-11 Metadata [18.9 kB]
Get:13 http://cn.archive.ubuntu.com/ubuntu impish-updates/universe amd64 DEP-11 Metadata [4,196 B]
Get:14 http://cn.archive.ubuntu.com/ubuntu impish-backports/universe amd64 DEP-11 Metadata [9,284 B]
Fetched 363 kB in 4s (87.4 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
nvidia-container-toolkit is already the newest version (1.6.0-1).
nvidia-container-toolkit set to manually installed.
The following packages were automatically installed and are no longer required:
  chromium-codecs-ffmpeg-extra gstreamer1.0-vaapi libgstreamer-plugins-bad1.0-0 libva-wayland2
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 66 not upgraded.

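Odd: nvidia-container-toolkit was already installed at the newest version (1.6.0-1), so nothing changed.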

6. sudo systemctl restart docker

Check whether the --gpus option is available:

docker run --help | grep -i gpus
      --gpus gpu-request               GPU devices to add to the container ('all' to pass all GPUs)
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
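
The `--gpus` check can be scripted so that it fails loudly when the flag is missing. A sketch; `has_gpus_flag` is an illustrative helper name:

```shell
# Hypothetical helper: succeed if the help text on stdin mentions --gpus.
has_gpus_flag() {
    grep -q -e '--gpus'
}

# Usage on the host:
#   docker run --help | has_gpus_flag && echo "--gpus is supported"
```

This only shows that the Docker CLI knows the flag; it does not prove that a container can actually reach the GPU.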

The GPU still cannot be used.
I'll reinstall Ubuntu 20 and update this post afterwards.

docker run -itd --gpus all --name rv -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all quay.io/azavea/raster-vision:pytorch-latest

It works now.
