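Notes on SSH suddenly refusing logins. The problem was reproduced as follows:
1. Goal: test kata-containers
2. Install kata-containers
3. Install qemu
4. Edit daemon.json to configure Docker to use kata-runtime
5. Restart Docker
6. Start a container with Docker, specifying kata-runtime
7. Without exiting the current terminal, try to ssh in from another terminal: the connection fails with Connection closed by xxx port 22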
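For reference, steps 4 to 6 typically look like the sketch below. It is not the exact configuration used here, because the post doesn't show it. The binary path /usr/bin/kata-runtime and the helper name kata_daemon_json are assumptions. The `runtimes` key in daemon.json and `docker run --runtime` are standard Docker features.

```shell
# Sketch: print a minimal /etc/docker/daemon.json that registers Kata as an
# extra Docker runtime named "kata-runtime".
# ASSUMPTION: the default binary path below is hypothetical; pass the real
# path as $1.
kata_daemon_json() {
  local runtime_path
  runtime_path="${1:-/usr/bin/kata-runtime}"
  cat <<EOF
{
  "runtimes": {
    "kata-runtime": {
      "path": "${runtime_path}"
    }
  }
}
EOF
}

# Usage (as root). This overwrites daemon.json, so merge by hand if the
# file already contains other settings:
#   kata_daemon_json > /etc/docker/daemon.json
#   systemctl restart docker
#   docker run --rm -it --runtime kata-runtime busybox sh
```

The runtime name given to `--runtime` must match the key under `runtimes`.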
Troubleshooting
Checked /var/log/audit/audit.log
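It contained errors like type=SECCOMP xxx syscall=115 xxx SYSCALL=unknown-syscall(-1). My guess was that seccomp was blocking system call 115. Looking it up (the cnblogs post "Linux system call table" by gavanwanggw) gave sys_getgroups, but that didn't get me anywhere. (Correction: 115 = getgroups is the x86_64 numbering. The record below shows arch=c00000b7, i.e. aarch64, where syscall 115 is clock_nanosleep. sig=31 is SIGSYS, meaning the seccomp filter killed the process. Since that process runs as uid 74 (sshd), it is most likely the pre-authentication sshd child, which OpenSSH confines in a seccomp sandbox. This is consistent with the later finding that a glibc upgrade triggered the problem: the newer glibc probably issues clock_nanosleep where the old one did not, and this sshd's filter does not allow it.)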
type=CRYPTO_KEY_USER msg=audit(1637803219.190:73052): pid=1476535 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=SHA256:ee:2d:b2:8d:be:03:a8:ac:ba:a5:e6:6e:80:cb:49:66:90:a4:19:cd:ec:4c:0e:d0:47:e2:df:d3:8e:cb:10:3a direction=? spid=1476535 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1637803219.190:73053): pid=1476535 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=SHA256:20:95:62:93:a2:c1:97:98:92:6b:99:fd:63:9e:00:5b:4f:00:81:2b:04:c2:1a:90:87:f6:c5:c9:81:e5:a3:5b direction=? spid=1476535 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1637803219.190:73054): pid=1476535 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=SHA256:9f:86:d3:f7:c1:f9:3d:18:75:b9:9d:ad:9f:5d:da:7a:6c:bb:69:ad:40:26:b5:55:04:d2:15:8d:c9:67:de:56 direction=? spid=1476535 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_SESSION msg=audit(1637803219.190:73055): pid=1476534 uid=0 auid=4294967295 ses=4294967295 msg='op=start direction=from-server cipher=chacha20-poly1305@openssh.com ksize=512 mac=<implicit> pfs=curve25519-sha256@libssh.org spid=1476535 suid=74 rport=14192 laddr=172.20.192.120 lport=22 exe="/usr/sbin/sshd" hostname=? addr=172.20.16.75 terminal=? res=success'UID="root" AUID="unset" SUID="sshd"
type=CRYPTO_SESSION msg=audit(1637803219.190:73056): pid=1476534 uid=0 auid=4294967295 ses=4294967295 msg='op=start direction=from-client cipher=chacha20-poly1305@openssh.com ksize=512 mac=<implicit> pfs=curve25519-sha256@libssh.org spid=1476535 suid=74 rport=14192 laddr=172.20.192.120 lport=22 exe="/usr/sbin/sshd" hostname=? addr=172.20.16.75 terminal=? res=success'UID="root" AUID="unset" SUID="sshd"
type=SECCOMP msg=audit(1637803219.230:73057): auid=4294967295 uid=74 gid=74 ses=4294967295 pid=1476535 comm="sshd" exe="/usr/sbin/sshd" sig=31 arch=c00000b7 syscall=115 compat=0 ip=0xfffcef881328 code=0x0AUID="unset" UID="sshd" GID="sshd" ARCH=aarch64 SYSCALL=unknown-syscall(-1)
type=ANOM_ABEND msg=audit(1637803219.230:73058): auid=4294967295 uid=74 gid=74 ses=4294967295 pid=1476535 comm="sshd" exe="/usr/sbin/sshd" sig=31 res=1AUID="unset" UID="sshd" GID="sshd"
type=USER_ERR msg=audit(1637803219.230:73059): pid=1476534 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:bad_ident grantors=? acct="?" exe="/usr/sbin/sshd" hostname=172.20.16.75 addr=172.20.16.75 terminal=ssh res=failed'UID="root" AUID="unset"
type=CRYPTO_KEY_USER msg=audit(1637803219.230:73060): pid=1476534 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=SHA256:ee:2d:b2:8d:be:03:a8:ac:ba:a5:e6:6e:80:cb:49:66:90:a4:19:cd:ec:4c:0e:d0:47:e2:df:d3:8e:cb:10:3a direction=? spid=1476534 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1637803219.230:73061): pid=1476534 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=SHA256:20:95:62:93:a2:c1:97:98:92:6b:99:fd:63:9e:00:5b:4f:00:81:2b:04:c2:1a:90:87:f6:c5:c9:81:e5:a3:5b direction=? spid=1476534 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1637803219.230:73062): pid=1476534 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=SHA256:9f:86:d3:f7:c1:f9:3d:18:75:b9:9d:ad:9f:5d:da:7a:6c:bb:69:ad:40:26:b5:55:04:d2:15:8d:c9:67:de:56 direction=? spid=1476534 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=USER_LOGIN msg=audit(1637803219.230:73063): pid=1476534 uid=0 auid=4294967295 ses=4294967295 msg='op=login acct="root" exe="/usr/sbin/sshd" hostname=? addr=172.20.16.75 terminal=ssh res=failed'UID="root" AUID="unset"
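To check the syscall number against the right architecture, the SECCOMP record can be decoded as in this POSIX shell sketch. decode_seccomp is a hypothetical helper. Its lookup tables only cover the values that appear in this post: the audit arch constants 0xc00000b7 (aarch64) and 0xc000003e (x86_64), syscall 115 on each, and signal 31. On a real host, `ausearch -m SECCOMP -i` and `ausyscall aarch64 115` from the audit package should give the same answer without a hand-written table.

```shell
# decode_seccomp RECORD
# Sketch: name the arch, syscall and signal in one audit SECCOMP record.
# The tables are deliberately tiny (only the values seen in this post).
decode_seccomp() {
  local arch_hex nr sig arch name
  # Pull the raw numeric fields out of the record passed as $1.
  arch_hex=$(printf '%s\n' "$1" | sed -n 's/.* arch=\([[:xdigit:]]*\).*/\1/p')
  nr=$(printf '%s\n' "$1" | sed -n 's/.* syscall=\([0-9]*\).*/\1/p')
  sig=$(printf '%s\n' "$1" | sed -n 's/.* sig=\([0-9]*\).*/\1/p')

  # AUDIT_ARCH_* values from <linux/audit.h>
  case "$arch_hex" in
    c00000b7) arch=aarch64 ;;
    c000003e) arch=x86_64 ;;
    *) arch="arch-$arch_hex" ;;
  esac

  # Syscall numbers differ per architecture; only 115 is listed here.
  case "$arch:$nr" in
    aarch64:115) name=clock_nanosleep ;;
    x86_64:115) name=getgroups ;;
    *) name=unknown ;;
  esac

  # Signal 31 is SIGSYS on both architectures; seccomp reports it on a kill.
  [ "$sig" = 31 ] && sig=SIGSYS

  printf '%s syscall %s = %s, signal %s\n' "$arch" "$nr" "$name" "$sig"
}

# Usage:
#   decode_seccomp "$(grep -m1 'type=SECCOMP' /var/log/audit/audit.log)"
```

Applied to the SECCOMP line above, this reports aarch64 syscall 115 as clock_nanosleep rather than getgroups.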
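I decided to repeat the whole procedure from scratch on another machine to pin down which step caused the failure. It turned out the problem appeared right after installing qemu, so I tried the following:
1. Uninstall qemu along with all of its dependencies. The problem persisted.
2. That ruled out the qemu dependencies themselves, so I checked which packages had been upgraded when qemu was installed. These were glibc, glibc-common, glibc-devel, libselinux, libsepol and nettle, so I downgraded them: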
yum downgrade glibc glibc-common glibc-devel libselinux libsepol nettle -y
3. I reproduced the problem on a virtual machine while troubleshooting, and on the VM the downgrade also ran into the following errors:
...
错误:%prein(qemu-2:4.0.1-11.ky10.aarch64) 脚本执行失败,捕捉到信号: 11
Error in PREIN scriptlet in rpm package qemu
Downgrading : gstreamer1-plugins-base-1.14.4-3.ky10.aarch64 17/38
错误:qemu-2:4.0.1-11.ky10.aarch64: 安裝 已失败
Downgrading : mesa-libEGL-18.2.2-7.ky10.aarch64 18/38
Downgrading : mesa-libGL-18.2.2-7.ky10.aarch64 19/38
Running scriptlet: glibc-devel-2.28-36.1.ky10.aarch64 20/38
错误:%prein(glibc-devel-2.28-36.1.ky10.aarch64) 脚本执行失败,捕捉到信号: 11
Error in PREIN scriptlet in rpm package glibc-devel
Cleanup : gstreamer1-plugins-base-1.16.2-2.oe1.aarch64 21/38
错误:glibc-devel-2.28-36.1.ky10.aarch64: 安裝 已失败
错误:glibc-devel-2.31-10.oe1.aarch64: 删除 已跳过
Cleanup : gstreamer1-1.16.2-3.oe1.aarch64 22/38
错误:qemu-2:4.1.0-54.oe1.aarch64: 删除 已跳过
Cleanup : mesa-libGL-20.1.4-1.oe1.aarch64 23/38
Cleanup : mesa-libEGL-20.1.4-1.oe1.aarch64 24/38
Cleanup : alsa-lib-1.2.4-1.oe1.aarch64 25/38
Cleanup : virglrenderer-0.8.2-1.oe1.aarch64 26/38
Running scriptlet: virglrenderer-0.8.2-1.oe1.aarch64 26/38
Cleanup : mesa-libgbm-20.1.4-1.oe1.aarch64 27/38
Cleanup : libvisual-1:0.4.0-27.oe1.aarch64 28/38
Cleanup : mesa-libglapi-20.1.4-1.oe1.aarch64 29/38
Cleanup : opus-1.3.1-1.oe1.aarch64 30/38
Cleanup : libvorbis-1:1.3.7-1.oe1.aarch64 31/38
Cleanup : nettle-3.6-5.oe1.aarch64 32/38
Cleanup : glibc-common-2.31-10.oe1.aarch64 33/38
Cleanup : libselinux-3.1-2.oe1.aarch64 34/38
Cleanup : glibc-2.31-10.oe1.aarch64 35/38
错误:libsepol-3.1-3.oe1.aarch64: 删除 已跳过
Running scriptlet: glibc-common-2.28-36.1.ky10.aarch64 35/38
/usr/sbin/build-locale-archive: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
Running scriptlet: glibc-2.31-10.oe1.aarch64 35/38
/bin/sh: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory
警告:%triggerpostun(glibc-common-2.28-36.1.ky10.aarch64) 脚本执行失败,退出状态码为 127
Error in <unknown> scriptlet in rpm package glibc
/bin/sh: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory
警告:%triggerin(glibc-common-2.28-36.1.ky10.aarch64) 脚本执行失败,退出状态码为 127
...
...
Installed:
libiscsi-1.18.0-6.ky10.aarch64
Failed:
glibc-devel-2.28-36.1.ky10.aarch64 glibc-devel-2.31-10.oe1.aarch64 libsepol-2.9-1.ky10.aarch64
libsepol-3.1-3.oe1.aarch64 qemu-2:4.0.1-11.ky10.aarch64 qemu-2:4.1.0-54.oe1.aarch64
CUnit-2.1.3-22.oe1.aarch64
Error: Transaction failed
4. Running the downgrade command again then failed with:
/usr/bin/python3: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory
5. Running find to locate libpthread.so.0 failed the same way:
find: error while loading shared libraries: libm.so.6: cannot open shared object file: No such file or directory
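6. This is a shared-library problem, and running ldconfig fixed it. ldconfig rebuilds the dynamic linker cache and recreates the soname symlinks (such as libc.so.6). Because ldconfig is statically linked, it still runs when dynamically linked programs cannot start: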
ldconfig
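7. The machine where the error first appeared was a physical machine. After the same downgrade it showed none of the errors above, and the problem was solved. Conclusion: the problem was caused by packages that were upgraded along with qemu. In the VM log above, for example, glibc-common was going from 2.31-10.oe1 back to 2.28-36.1.ky10.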
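Two of the checks above can be scripted. This POSIX shell sketch is not what was actually run here, and both function names are hypothetical.

changed_pkgs compares two package snapshots and lists version changes. The snapshots could be taken with `rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n'` before and after installing qemu. This is one way (an assumption, since the post doesn't say how it was done) to find packages like glibc that were upgraded as a side effect.

missing_libs reports sonames that no longer resolve in a library directory. It uses only shell builtins, so it still works from an already-open shell even when newly started dynamically linked programs fail, as python3 and find did in steps 4 and 5.

```shell
# changed_pkgs BEFORE AFTER
# Each file holds "name version-release" lines, for example from:
#   rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' > before.txt
# Prints "name: old -> new" for packages present in both with different
# versions. Packages installed in several versions at once (kernel,
# gpg-pubkey) may produce spurious pairs; fine for a quick look.
changed_pkgs() {
  local tmp
  tmp=$(mktemp -d) || return 1
  # join needs both inputs sorted on the join field with the same collation
  LC_ALL=C sort -k1,1 "$1" > "$tmp/before"
  LC_ALL=C sort -k1,1 "$2" > "$tmp/after"
  LC_ALL=C join "$tmp/before" "$tmp/after" |
    awk '$2 != $3 { print $1 ": " $2 " -> " $3 }'
  rm -rf "$tmp"
}

# missing_libs DIR SONAME...
# Prints each SONAME that does not resolve to an existing file in DIR.
# [ -e ] follows symlinks, so a dangling libc.so.6 link counts as missing.
missing_libs() {
  local dir lib
  dir=$1
  shift
  for lib in "$@"; do
    [ -e "$dir/$lib" ] || echo "$lib"
  done
}

# Usage, ASSUMING the libraries live in /lib64 (typical on RPM-based
# aarch64 systems):
#   missing_libs /lib64 libc.so.6 libdl.so.2 libpthread.so.0 libm.so.6
# If anything is printed, running ldconfig as root recreates the soname links.
```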