For building your own multi-GPU server, see https://blog.csdn.net/landian0531/article/details/120242839
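Cause of the error
An unexpected power outage rebooted the Ubuntu server, and afterwards the Docker containers could not be started with the command docker ps -aq | xargs -I {} docker start {}
The error was as follows: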
gpu@gpu-workstation:~$ docker ps -aq | xargs -I {} docker start {}
Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown
Error: failed to start containers: 485f0e25b37c
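The message "nvml error: driver not loaded" indicates that the NVIDIA kernel module was not loaded when the container runtime hook ran. A minimal shell sketch for confirming this on the host follows. The helper name nvidia_module_loaded is hypothetical and not part of any tool; it only assumes that /proc/modules lists one loaded module per line with the module name first.

```shell
# Return success if a module named exactly "nvidia" is listed in the given
# file (default: /proc/modules, where each line starts with a module name).
# The function name is an illustrative assumption, not an existing command.
nvidia_module_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

# Show which kernel the host actually booted into.
echo "Running kernel: $(uname -r)"

if nvidia_module_loaded; then
    echo "nvidia kernel module: loaded"
else
    echo "nvidia kernel module: not loaded"
fi
```

If the module is reported as not loaded, the kernel version printed by uname -r is the first thing to compare against the kernel the driver was working under before the reboot.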
Fix: remove the newer kernel
List the kernel packages currently installed with dpkg --get-selections | grep linux
gpu@gpu-workstation:~$ dpkg --get-selections | grep linux
binutils-x86-64-linux-gnu install
console-setup-linux install
libnvpair1linux install
libselinux1:amd64 install
libuutil1linux install
libzfs2linux install
libzpool2linux install
linux-base install
linux-firmware install
linux-generic install
linux-headers-5.4.0-88 install
linux-headers-5.4.0-88-generic hold
linux-headers-5.4.0-89 install
linux-headers-5.4.0-89-generic install
linux-headers-generic install
linux-image-5.4.0-88-generic hold
linux-image-5.4.0-89-generic install
linux-image-generic install
linux-libc-dev:amd64 install
linux-modules-5.4.0-88-generic hold
linux-modules-5.4.0-89-generic install
linux-modules-extra-5.4.0-88-generic hold
linux-modules-extra-5.4.0-89-generic install
util-linux install
zfsutils-linux install
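The listing is long because grep linux matches every package with "linux" in its name. As a small sketch, the filter below keeps only the versioned kernel image packages and their selection state. list_kernel_images is a hypothetical helper name, and the dpkg call is guarded so the snippet does nothing on hosts without dpkg.

```shell
# Read `dpkg --get-selections` output on stdin and print only versioned
# kernel image packages (signed or unsigned) with their selection state.
# The function name is an illustrative assumption.
list_kernel_images() {
    awk '$1 ~ /^linux-image-(unsigned-)?[0-9]/ { print $1, $2 }'
}

# Typical use on the server; skipped automatically where dpkg is absent.
if command -v dpkg >/dev/null 2>&1; then
    dpkg --get-selections | list_kernel_images
fi
```

For the listing above this would print linux-image-5.4.0-88-generic hold and linux-image-5.4.0-89-generic install, which makes the extra, unheld kernel easy to spot.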
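The system had automatically installed kernel 5.4.0-89, and the server booted into it after the power loss (the purge log below warns "Removing the running kernel"). The NVIDIA driver was not loaded under this kernel, so remove the kernel with sudo apt-get purge linux-image-5.4.0-89-generic
A prompt appears partway through; choose Cancel. (Note: removing a kernel is risky, so use your own judgment.)
After the removal, reboot the server and the containers can be started again.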
gpu@gpu-workstation:~$ sudo apt-get purge linux-image-5.4.0-89-generic
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
amd64-microcode intel-microcode iucode-tool libdbus-glib-1-2 libevdev2 libimobiledevice6 libplist3 libupower-glib3 libusbmuxd6 linux-headers-generic thermald upower usbmuxd
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
linux-image-unsigned-5.4.0-89-generic
Suggested packages:
fdutils linux-doc | linux-source-5.4.0 linux-tools
The following packages will be REMOVED:
linux-generic* linux-image-5.4.0-89-generic* linux-image-generic* linux-modules-extra-5.4.0-89-generic*
The following NEW packages will be installed:
linux-image-unsigned-5.4.0-89-generic
0 upgraded, 1 newly installed, 4 to remove and 39 not upgraded.
Need to get 9,011 kB of archives.
After this operation, 202 MB disk space will be freed.
Do you want to continue? [Y/n] y
Get:1 http://ca.archive.ubuntu.com/ubuntu focal-updates/main amd64 linux-image-unsigned-5.4.0-89-generic amd64 5.4.0-89.100 [9,011 kB]
Fetched 9,011 kB in 4s (2,522 kB/s)
(Reading database ... 113040 files and directories currently installed.)
Removing linux-generic (5.4.0.89.93) ...
Removing linux-image-generic (5.4.0.89.93) ...
Removing linux-modules-extra-5.4.0-89-generic (5.4.0-89.100) ...
Removing linux-image-5.4.0-89-generic (5.4.0-89.100) ...
W: Removing the running kernel
I: /boot/vmlinuz is now a symlink to vmlinuz-5.4.0-88-generic
I: /boot/initrd.img is now a symlink to initrd.img-5.4.0-88-generic
/etc/kernel/postrm.d/initramfs-tools:
update-initramfs: Deleting /boot/initrd.img-5.4.0-89-generic
/etc/kernel/postrm.d/zz-update-grub:
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.4.0-88-generic
Found initrd image: /boot/initrd.img-5.4.0-88-generic
Adding boot menu entry for UEFI Firmware Settings
done
Selecting previously unselected package linux-image-unsigned-5.4.0-89-generic.
(Reading database ... 107660 files and directories currently installed.)
Preparing to unpack .../linux-image-unsigned-5.4.0-89-generic_5.4.0-89.100_amd64.deb ...
Unpacking linux-image-unsigned-5.4.0-89-generic (5.4.0-89.100) ...
Setting up linux-image-unsigned-5.4.0-89-generic (5.4.0-89.100) ...
I: /boot/vmlinuz is now a symlink to vmlinuz-5.4.0-89-generic
I: /boot/initrd.img is now a symlink to initrd.img-5.4.0-89-generic
(Reading database ... 107663 files and directories currently installed.)
Purging configuration files for linux-modules-extra-5.4.0-89-generic (5.4.0-89.100) ...
Purging configuration files for linux-image-5.4.0-89-generic (5.4.0-89.100) ...
I: /boot/vmlinuz is now a symlink to vmlinuz-5.4.0-88-generic
I: /boot/initrd.img is now a symlink to initrd.img-5.4.0-88-generic
/var/lib/dpkg/info/linux-image-5.4.0-89-generic.postrm ... removing pending trigger
rmdir: failed to remove '/lib/modules/5.4.0-89-generic': Directory not empty
Processing triggers for linux-image-unsigned-5.4.0-89-generic (5.4.0-89.100) ...
gpu@gpu-workstation:~$
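The log above shows apt installing linux-image-unsigned-5.4.0-89-generic as a replacement and leaving /lib/modules/5.4.0-89-generic non-empty, which suggests that other 5.4.0-89 packages (for example linux-modules-5.4.0-89-generic from the earlier listing) were still installed. One possible follow-up, which is not part of the original procedure, is to list every package for that kernel version and simulate purging them together. kernel_pkgs_for_version is a hypothetical helper name; the example command is left as a comment because it needs a Debian or Ubuntu host.

```shell
# Read `dpkg --get-selections` output on stdin and print every linux-*
# package whose name contains the given kernel version, skipping entries
# already marked for removal. The function name is an illustrative assumption.
kernel_pkgs_for_version() {
    awk -v ver="$1" '
        $1 ~ /^linux-/ && index($1, "-" ver) &&
        $2 != "deinstall" && $2 != "purge" { print $1 }
    '
}

# Example: simulate (-s) the purge first and review what apt would do.
#   dpkg --get-selections | kernel_pkgs_for_version 5.4.0-89 \
#       | xargs -r apt-get -s purge
```

Running the simulation before a real purge shows whether apt would again pull in a substitute package, which is worth checking given the risk of removing kernels.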