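These notes record a simple way to let an LXC container use physical hardware (here, GPUs).
0. Driver installation
For an LXC container to use the hardware, the PVE host must be able to use it first. In essence, LXC shares the host's hardware rather than taking it over, so the driver has to work on the host.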
INTEL
$ apt install intel-media-va-driver-non-free
To verify:
$ apt install intel-gpu-tools
$ intel_gpu_top
AMD
To verify:
$ apt install radeontop
$ radeontop
NVIDIA
$ apt install dkms proxmox-headers-6.5.13-1-pve # replace with your kernel version
$ apt install nvidia-driver
To verify:
$ nvidia-smi
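The headers package has to match the running kernel. A minimal sketch that derives the package name from uname -r, assuming the proxmox-headers-<kernel release> naming used in the command above also holds on your PVE version (headers_pkg is a hypothetical helper name, not a PVE tool):

```shell
#!/bin/sh
# Build the headers package name for a kernel release.
# Assumption: the package is named proxmox-headers-<release>, as in the
# command above; headers_pkg is an illustrative helper.
headers_pkg() {
    # Default to the running kernel when no release is given.
    printf 'proxmox-headers-%s\n' "${1:-$(uname -r)}"
}

# Only print the command; run it yourself once the name looks right.
echo "apt install dkms $(headers_pkg)"
```

Printing the command instead of running it makes it easy to check the package name before installing anything.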
1. INTEL/AMD
Run the following as root on the PVE host.
$ ls -la /dev/dri/
total 0
drwxr-xr-x 3 root root 140 Feb 24 21:36 .
drwxr-xr-x 19 root root 4380 Feb 24 21:52 ..
drwxr-xr-x 2 root root 120 Feb 24 21:36 by-path
crw-rw---- 1 root video 226, 0 Feb 24 21:36 card0
crw-rw---- 1 root video 226, 1 Feb 24 21:36 card1
crw-rw---- 1 root render 226, 128 Feb 24 21:36 renderD128
crw-rw---- 1 root render 226, 129 Feb 24 21:36 renderD129
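In this output, card0 and renderD128 are my Intel integrated GPU, and the other two belong to the discrete GPU. To check which node belongs to which PCI device: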
$ ls -la /dev/dri/by-path/
total 0
drwxr-xr-x 2 root root 120 Feb 24 21:36 .
drwxr-xr-x 3 root root 140 Feb 24 21:36 ..
lrwxrwxrwx 1 root root 8 Feb 24 21:36 pci-0000:00:02.0-card -> ../card0
lrwxrwxrwx 1 root root 13 Feb 24 21:36 pci-0000:00:02.0-render -> ../renderD128
lrwxrwxrwx 1 root root 8 Feb 24 21:36 pci-0000:01:00.0-card -> ../card1
lrwxrwxrwx 1 root root 13 Feb 24 21:36 pci-0000:01:00.0-render -> ../renderD129
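The by-path names only give PCI addresses. As a rough sketch, the vendor of each address can be read from sysfs and turned into a name. The vendor IDs 0x8086 (Intel), 0x10de (NVIDIA) and 0x1002 (AMD) are the standard PCI IDs; vendor_name and pci_addr are hypothetical helper names introduced for this example.

```shell
#!/bin/sh
# Map the PCI vendor ID found in sysfs to a vendor name.
vendor_name() {
    case "$1" in
        0x8086) echo Intel ;;
        0x10de) echo NVIDIA ;;
        0x1002) echo AMD ;;
        *)      echo "unknown ($1)" ;;
    esac
}

# Extract the PCI address from a by-path link name such as
# pci-0000:00:02.0-render.
pci_addr() {
    name=${1#pci-}
    printf '%s\n' "${name%-*}"
}

for link in /dev/dri/by-path/pci-*; do
    [ -e "$link" ] || continue   # glob did not match: no DRM devices
    addr=$(pci_addr "${link##*/}")
    vendor=$(cat "/sys/bus/pci/devices/$addr/vendor" 2>/dev/null)
    printf '%s -> %s\n' "$(readlink -f "$link")" "$(vendor_name "$vendor")"
done
```

Each output line pairs a device node with a vendor, which is enough to decide which card and renderD numbers to pass to the container.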
Next, edit the container's configuration file (xxx is the container ID):
$ nano /etc/pve/lxc/xxx.conf
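If the file contains unprivileged: 1, delete it (the container then runs privileged), and add the following. The major:minor numbers must match your own ls output one for one.
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm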
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
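To avoid copying the numbers by hand, the lines can be generated from the device nodes themselves. This is only a sketch, assuming GNU stat (as shipped with PVE's Debian base); lxc_dev_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Print the allow rule and the bind-mount entry for each character device
# given as an argument. lxc_dev_lines is an illustrative helper name.
lxc_dev_lines() {
    for node in "$@"; do
        if [ ! -c "$node" ]; then
            echo "skipping $node: not a character device" >&2
            continue
        fi
        # GNU stat prints the device numbers in hex: %t = major, %T = minor.
        hex=$(stat -c '%t %T' "$node")
        printf 'lxc.cgroup2.devices.allow: c %d:%d rwm\n' "0x${hex% *}" "0x${hex#* }"
        # The mount target is the same path without the leading slash.
        printf 'lxc.mount.entry: %s %s none bind,optional,create=file\n' "$node" "${node#/}"
    done
}
```

Calling lxc_dev_lines /dev/dri/card0 /dev/dri/renderD128 on the host prints one allow rule and one mount entry per node; review the output before pasting it into the configuration file.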
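That's it; start the container. Inside the container, install the driver and verify as in section 0.
2. NVIDIA
The only difference is that you also need to check the following: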
$ ls -la /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Feb 25 13:12 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Feb 25 13:12 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Feb 25 13:12 /dev/nvidia-modeset
crw-rw-rw- 1 root root 510, 0 Feb 25 13:12 /dev/nvidia-uvm
crw-rw-rw- 1 root root 510, 1 Feb 25 13:12 /dev/nvidia-uvm-tools
/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root 80 Feb 25 13:12 .
drwxr-xr-x 19 root root 4400 Feb 25 13:12 ..
cr-------- 1 root root 235, 1 Feb 25 13:12 nvidia-cap1
cr--r--r-- 1 root root 235, 2 Feb 25 13:12 nvidia-cap2
Then add the following to the container's configuration file; again, the numbers must match your own output:
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 226:1 rwm
lxc.cgroup2.devices.allow: c 226:129 rwm
lxc.cgroup2.devices.allow: c 235:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.mount.entry: /dev/dri/card1 dev/dri/card1 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD129 dev/dri/renderD129 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
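The major numbers of nvidia-uvm and nvidia-caps (510 and 235 above) are, to my understanding, assigned dynamically, so they may differ on another host or after a driver change. A small sketch that prints one wildcard allow rule per major number actually present, again assuming GNU stat (allow_lines_for is a hypothetical helper name):

```shell
#!/bin/sh
# Print one wildcard allow rule per distinct major number among the given
# character device nodes; anything that is not a character device is ignored.
allow_lines_for() {
    for node in "$@"; do
        [ -c "$node" ] || continue
        # %t is the major number in hex; convert it to decimal.
        printf '%d\n' "0x$(stat -c '%t' "$node")"
    done | sort -nu | while read -r major; do
        printf 'lxc.cgroup2.devices.allow: c %s:* rwm\n' "$major"
    done
}

# Prints nothing if the host has no NVIDIA device nodes.
allow_lines_for /dev/nvidia* /dev/nvidia-caps/*
```

On the host shown above this should yield rules for majors 195, 235 and 510; the 226 rules for /dev/dri are not covered and still have to be added as in section 1.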
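Then install the driver inside the container and verify.
Driver installation inside the container: download NVIDIA_xxx.run (the same driver version as on the host), then run xxx.run --no-kernel-module, since the kernel module is already provided by the host.
CUDA inside the container: download the .run file, then run cuda_xxx.run --override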