SSH 8.4: tracking down why ordinary accounts cannot connect

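Problem:
On the Phytium 2000 platform, after upgrading SSH 7.3 to SSH 8.4, only the root account can connect. Ordinary accounts cannot: in SecureCRT the connection drops right after the user name is entered, and the password prompt never appears. On other architectures every account connects normally with the upgraded SSH 8.4, and on the Phytium 2000 every account also connects normally with the previous SSH 7.3.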

Debug 1: start sshd in -ddd (level 3) debug mode

[root]# /usr/local/sbin/sshd -ddd #start in -ddd (level 3) debug mode
debug1: KEX done [preauth]
debug3: receive packet: type 5 [preauth]
debug3: send packet: type 6 [preauth]
debug3: receive packet: type 50 [preauth]
debug1: userauth-request for user config service ssh-connection method none [preauth]
debug1: attempt 0 failures 0 [preauth]
debug1: monitor_read_log: child log fd closed
debug3: mm_request_receive entering
debug1: do_cleanup
debug1: Killing privsep child 9632
[Inferior 1 (process 9499) exited with code 0377]

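Compared with the output for a root login, the log only shows "debug3: receive packet: type 50 [preauth]". Before the connection drops, the sshd server never prints "debug3: send packet: type 51 [preauth]", which means the send function is never reached.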
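The comparison above comes down to checking which SSH message numbers appear in the debug log. As a minimal sketch, the script below names the message types seen in the `-ddd` output and reports whether a received type 50 request was ever answered. The message numbers are the ones assigned in the SSH RFCs; the helper names `annotate` and `unanswered_auth_request` are made up for this illustration and are not part of OpenSSH.

```python
import re

# Message numbers from RFC 4250 / RFC 8308 (only the ones that appear in this post).
SSH_MSG_NAMES = {
    5: "SSH_MSG_SERVICE_REQUEST",
    6: "SSH_MSG_SERVICE_ACCEPT",
    7: "SSH_MSG_EXT_INFO",
    50: "SSH_MSG_USERAUTH_REQUEST",
    51: "SSH_MSG_USERAUTH_FAILURE",
    52: "SSH_MSG_USERAUTH_SUCCESS",
}

PACKET_RE = re.compile(r"debug3: (send|receive) packet: type (\d+)")


def annotate(log_text):
    """Return (direction, type, name) for every packet line in a debug log."""
    events = []
    for match in PACKET_RE.finditer(log_text):
        direction, msg_type = match.group(1), int(match.group(2))
        events.append((direction, msg_type, SSH_MSG_NAMES.get(msg_type, "unknown")))
    return events


def unanswered_auth_request(log_text):
    """True if a received type 50 is never followed by a sent type 51 or 52."""
    events = annotate(log_text)
    for index, (direction, msg_type, _) in enumerate(events):
        if direction == "receive" and msg_type == 50:
            replies = [t for d, t, _ in events[index + 1:]
                       if d == "send" and t in (51, 52)]
            if not replies:
                return True
    return False


# Excerpt of the server-side log shown above.
SERVER_LOG = """\
debug3: receive packet: type 5 [preauth]
debug3: send packet: type 6 [preauth]
debug3: receive packet: type 50 [preauth]
debug1: do_cleanup
"""

if __name__ == "__main__":
    for event in annotate(SERVER_LOG):
        print(event)
    print("unanswered auth request:", unanswered_auth_request(SERVER_LOG))
```

Running the same check on the log of a successful root login should report that the request was answered, which makes the difference between the two cases easy to spot.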

Debug 2: run sshd under gdb to capture the exit backtrace

[root]# gdb /usr/local/sbin/sshd
(gdb) b do_cleanup #break on the exit function, then use bt to find where it exits
(gdb) r -ddd #start in -ddd (level 3) debug mode

Breakpoint 1, do_cleanup (ssh=0x64ed70, authctxt=0x628930) at session.c:2662
2662 session.c: No such file or directory.
(gdb) bt
#0 do_cleanup (ssh=0x64ed70, authctxt=0x628930) at session.c:2662
#1 0x000000000040a8a0 in cleanup_exit (i=i@entry=255) at sshd.c:2563
#2 0x0000000000424308 in mm_request_receive (sock=6, m=m@entry=0x65b0b0) at monitor_wrap.c:150
#3 0x00000000004231d0 in monitor_read (ssh=ssh@entry=0x64ed70,
pmonitor=pmonitor@entry=0x65adc0, ent=0x604a30 <mon_dispatch_proto20>, pent=0x7ffffff420, pent@entry=0x7ffffff480) at monitor.c:506
#4 0x000000000423c0c in monitor_child_preauth (ssh=ssh@entry=0x64ed70, pmonitor=0x65adc0) at monitor.c:304
#5 0x000000000409344 in privsep_preauth (ssh=0x64ed70) at sshd.c:517
#6 main (ac=, av=) at sshd.c:2357
(gdb) p errno
$1 = 32 #errno 32 is EPIPE (broken pipe), not a signal number

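The exit reason is confirmed as "[Errno 32] Broken pipe" (EPIPE). According to some web searches, this error shows up when a connection is being closed, so the key is to find where the close happens. Another explanation is that the server process gets SIGPIPE when it writes to a socket whose other end (the client) has been fully closed. This typically happens when the client program closes the socket (with close()) without waiting for all the data from the server.
Tracing into atomicio6() shows that it simply reads the pipe descriptor, but read() returns a length of 0, which by itself does not reveal the cause. (OpenSSH's atomicio code reports such an end-of-file by setting errno to EPIPE, which is where the value 32 comes from.)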
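The two symptoms mentioned here, errno 32 and a read that returns 0 bytes, can be reproduced in isolation. The sketch below uses a plain pipe rather than sshd's monitor socket, so it only illustrates the general behaviour of a closed peer; the function names are hypothetical.

```python
import errno
import os


def describe_errno(code):
    """Map a numeric errno (as printed by gdb's `p errno`) to its name and message."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def read_after_writer_closed():
    """Reading from a pipe whose write end is closed returns 0 bytes (end of file)."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)
    data = os.read(read_fd, 16)
    os.close(read_fd)
    return data


def write_after_reader_closed():
    """Writing to a pipe whose read end is closed fails with EPIPE.

    CPython ignores SIGPIPE by default, so the failure surfaces as an
    exception instead of killing the process.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        os.write(write_fd, b"x")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    print(describe_errno(32))
    print("bytes read after writer closed:", len(read_after_writer_closed()))
    print("errno on write after reader closed:", write_after_reader_closed())
```

A zero-length read therefore says only that the other end is gone. It does not say why, which is why the later steps look at the process on the other side of the pipe.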

Debug 3: run shellinabox under gdb to debug ssh

#killall -9 shellinaboxd
#gdb /usr/local/bin/shellinaboxd
(gdb) set follow-fork-mode child #follow the child process
(gdb) b read_string
Breakpoint 1 at 0x4098b0: file shellinabox/launcher.c, line 265.
(gdb) r -b -t -s /:SSH:127.0.0.1 -p 4201
Breakpoint 1, read_string (echo=1, prompt=0x473870 "127 login: ", retstr=0x7ffffff158) at shellinabox/launcher.c:265
265 shellinabox/launcher.c: No such file or directory.
(gdb) b main
Breakpoint 2 at 0x4080dc: file shellinabox/shellinaboxd.c, line 1226.
(gdb) c
Continuing.

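Connecting through shellinabox shows the same symptom as SecureCRT, so the disconnect is debugged from the client side. Start shellinaboxd under gdb, follow child processes, and set a breakpoint on read_string(), the function that reads the account name. When that breakpoint fires, set a breakpoint on main. Because breakpoints carry over to the program the child executes, this stops at main() of the /usr/local/bin/ssh process that shellinaboxd launches. Then set a breakpoint on the send function ssh_packet_send2_wrapped() and dump the packets for comparison: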

process 2070 is executing new program: /usr/local/bin/ssh
Breakpoint 2, main (ac=24, av=0x7ffffffb38) at ssh.c:515
515 ssh.c: No such file or directory.
(gdb) b ssh_packet_send2_wrapped
Breakpoint 1 at 0x43e6dc: file packet.c, line 1075.
(gdb) c
Breakpoint 1, ssh_packet_send2_wrapped (ssh=ssh@entry=0x609ad0) at packet.c:1075
packet.c: No such file or directory.
(gdb) p sshbuf_dump(state->outgoing_packet,stderr) #dump the outgoing packet

The packets printed on the shellinabox page:

127 login: config
buffer 0x60aa00 len = 1502

buffer 0x60aa00 len = 42
0000: 00 00 00 00 00 1e 00 00 00 20 48 30 fe 28 a5 59  ......... H0.(.Y
0016: 9a d0 78 e8 7e f1 73 4b c2 22 7b 39 c8 2c 88 cf  ..x.~.sK."{9.,..
0032: e4 25 0d 19 a1 d6 0b 82 7c 34  .%......|4
buffer 0x60aa00 len = 6
0000: 00 00 00 00 00 15  ......
buffer 0x60aa00 len = 22
0000: 00 00 00 00 00 05 00 00 00 0c 73 73 68 2d 75 73  ..........ssh-us
0016: 65 72 61 75 74 68  erauth
buffer 0x60aa00 len = 42
0000: 00 00 00 00 00 32 00 00 00 06 63 6f 6e 66 69 67  .....2....config
0016: 00 00 00 0e 73 73 68 2d 63 6f 6e 6e 65 63 74 69  ....ssh-connecti
0032: 6f 6e 00 00 00 04 6e 6f 6e 65  on....none #userauth_none
Session closed.

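From the packet dumps, the SSH client disconnects immediately after userauth_none() sends its packet. This view cannot show the client and server sides of the exchange at the same time.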
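The last dump above can be decoded by hand, and a few lines of code make the layout explicit. This is a minimal sketch that assumes the first 5 bytes of the dumped buffer are the not-yet-filled packet length (4 bytes) and padding length (1 byte), as they appear in the dumps in this post. The function names are hypothetical; the field order (user, service, method) follows the SSH_MSG_USERAUTH_REQUEST layout in RFC 4252.

```python
import struct


def read_string(buf, offset):
    """Read an SSH 'string': a 4-byte big-endian length followed by that many bytes."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    return buf[start:start + length], start + length


def parse_userauth_request(dump_hex):
    """Decode a buffer printed by sshbuf_dump() for an outgoing type 50 packet.

    Assumption: bytes 0..4 are the placeholder length and padding-length
    fields, so the message type sits at offset 5.
    """
    buf = bytes.fromhex(dump_hex)
    msg_type = buf[5]
    user, offset = read_string(buf, 6)
    service, offset = read_string(buf, offset)
    method, offset = read_string(buf, offset)
    return msg_type, user.decode(), service.decode(), method.decode()


# The 42-byte buffer from the shellinabox session above.
DUMP = (
    "00 00 00 00 00 32 00 00 00 06 63 6f 6e 66 69 67 "
    "00 00 00 0e 73 73 68 2d 63 6f 6e 6e 65 63 74 69 "
    "6f 6e 00 00 00 04 6e 6f 6e 65"
)

if __name__ == "__main__":
    print(parse_userauth_request(DUMP))
```

The decoded fields match the server-side log line "userauth-request for user config service ssh-connection method none", so the request itself leaves the client intact.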

Debug 4: connect to sshd with ssh -vvv

#ssh -vvv config@192.169.8.88 #start in -vvv debug mode
debug2: pubkey_prepare: done
debug3: send packet: type 5
debug3: receive packet: type 7
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,webauthn-sk-ecdsa-sha2-nistp256@openssh.com>
debug3: receive packet: type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug3: send packet: type 50 #packet sent, but no reply is received
Connection closed by 192.169.8.88 port 22

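When the SSH client connects to the sshd server, it prints "debug3: send packet: type 50" and then disconnects, with no "debug3: receive packet: type 51" line. This matches the earlier sshd debug output, which shows only "debug3: receive packet: type 50 [preauth]". In other words, the ssh client sends the type 50 packet, never receives a type 51 packet, and closes the connection, while the sshd server receives the type 50 packet, never sends a type 51 packet, and its child process exits.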

Debug 5: run ssh under gdb with a breakpoint

#gdb /usr/local/bin/ssh
(gdb) b ssh_packet_send2_wrapped
Breakpoint 1 at 0x43e6dc: file packet.c, line 1075.
(gdb) r -vvv audit@192.169.8.88
Starting program: /usr/local/bin/ssh -vvv audit@192.169.8.88
debug3: received packet: type 6
read/plain[6]:
buffer 0x60bac0 len = 16
0000: 00 00 00 0c 73 73 68 2d 75 73 65 72 61 75 74 68  ....ssh-userauth
debug1: received packet type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: packet_start[50]
Breakpoint 1, ssh_packet_send2_wrapped (ssh=ssh@entry=0x609b10) at packet.c:1075
1075 in packet.c
(gdb) c
Continuing.
debug3: send packet: type 50 #packet sent, but no reply is received
plain: buffer 0x60b960 len = 41
0000: 00 00 00 00 00 32 00 00 00 05 61 75 64 69 74 00  .....2....audit.
0016: 00 00 0e 73 73 68 2d 63 6f 6e 6e 65 63 74 69 6f  ...ssh-connectio
0032: 6e 00 00 00 04 6e 6f 6e 65  n....none
debug1: send: len 52 (includes padlen 11, aadlen 4)
debug1: packet_read()
Connection closed by 192.169.8.88 port 22
[Inferior 1 (process 4159) exited with code 0377]
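The line "send: len 52 (includes padlen 11, aadlen 4)" follows from the 41-byte plain buffer printed just before it. The sketch below reproduces that arithmetic. The block size of 8 is an assumption, since the negotiated cipher is not shown in the log, although it does reproduce the logged numbers. The function name `padded_length` is made up for this example.

```python
def padded_length(buffer_len, block_size=8, aadlen=4):
    """Return (total_len, padlen) for an outgoing packet buffer.

    buffer_len is the length printed as "plain: buffer ... len = N"; it
    already includes the 4-byte length field and the 1-byte padding-length
    field. block_size=8 is an assumption, not taken from the log.
    """
    covered = buffer_len - aadlen  # bytes that must align to the block size
    padlen = block_size - (covered % block_size)
    if padlen < 4:  # RFC 4253 requires at least 4 bytes of padding
        padlen += block_size
    return buffer_len + padlen, padlen


if __name__ == "__main__":
    total, padlen = padded_length(41)
    print(f"send: len {total} (includes padlen {padlen}, aadlen 4)")
```

The packet that goes out on the wire is therefore well formed. The failure is not in how the client builds or pads the request.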

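From the send trace, the SSH client disconnects right after userauth_none() sends its packet. The type 50 request is sent from input_userauth_service_accept(), the callback that ssh_dispatch_run() invokes for the type 6 packet. The call stack:
[Figure: client call stack when the type 50 packet is sent]
After the ssh client sends the type 50 packet, sshd receives it, but the client never gets a type 51 packet in ssh_packet_read_seqnr(), because the sshd child process exits without sending one. When select() on the client unblocks, read() returns a length of 0, the return value is set to -52 (SSH_ERR_CONN_CLOSED), and the client ends up closing the connection. This is also how the sshd main process ends up reading from a closed connection and getting [Errno 32]. The function that returns the error:
[Figure: source of the function that returns the error]
Because ssh_packet_read_seqnr() never receives a type 51 packet and returns -52, control passes to ssh_dispatch_run_fatal(), which cleans up and exits the ssh process. The exit call stack:
[Figure: client call stack on exit]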
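The client-side sequence described above (wait in select, read 0 bytes, treat it as a closed connection) can be sketched with a local socket pair. This is an illustration only, not OpenSSH code. The constant -52 is the value of SSH_ERR_CONN_CLOSED in OpenSSH's ssherr.h; the function names and the payload are made up.

```python
import select
import socket

SSH_ERR_CONN_CLOSED = -52  # value defined in OpenSSH's ssherr.h


def wait_for_reply(sock, timeout=2.0):
    """Wait until the socket is readable, then read; 0 bytes means the peer closed."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out, nothing to read
    data = sock.recv(4096)
    if data == b"":
        return SSH_ERR_CONN_CLOSED
    return data


def demo():
    """Simulate a server that receives a request and exits without replying."""
    client, server = socket.socketpair()
    client.sendall(b"type 50 request")  # the client sends its request
    server.recv(4096)                   # the server receives it ...
    server.close()                      # ... and goes away without a reply
    result = wait_for_reply(client)
    client.close()
    return result


if __name__ == "__main__":
    print("client result:", demo())
```

From the client's point of view a crashed server and a server that deliberately hangs up look identical, so the remaining steps move to the server side.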
Debug 6: attach gdb to the sshd child process

2014 root 0:00 sshd: /usr/local/sbin/sshd [listener] 1 of 10-100 startups #sshd main process
2021 root 0:01 sshd: root@pts/0 #sshd child process
2023 root 0:00 -sh
2027 root 0:00 sshd: root@pts/1 #sshd child process
2029 root 0:00 -sh
2053 root 0:00 [kworker/0:1]
2090 root 0:00 [kworker/0:0]
2091 root 0:00 gdb /usr/local/bin/ssh
2093 root 0:00 /usr/local/bin/ssh -vvv audit@192.169.8.88
2096 root 0:00 sshd: [accepted] #sshd child started for the ssh client
2097 sshd 0:00 sshd: [net] #sshd child started for the ssh client
2098 root 0:00 [kworker/0:2]
2099 root 0:00 ps
#gdb -p 2097 #attach to the server-side sshd child started for the ssh client
(gdb) b ssh_packet_read_poll2
Breakpoint 1 at 0x44d3f0: file packet.c, line 1492.
(gdb) c
Continuing.
ssh_dispatch_run (ssh=ssh@entry=0x2766c1c0, mode=mode@entry=0, done=done@entry=0x2768cee0) at dispatch.c:97
dispatch.c: No such file or directory.
(gdb)
106 in dispatch.c
(gdb)
107 in dispatch.c
(gdb)
106 in dispatch.c
(gdb)
108 in dispatch.c
(gdb)
113 in dispatch.c #once the callback runs, SIGSYS is raised and the process exits
(gdb)
Program terminated with signal SIGSYS, Bad system call.
The program no longer exists.

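The line numbers need to be matched against the source to identify the type and the callback function:
[Figure: dispatch.c source for the lines stepped through above]
#define SIGSYS 12 /* non-existent system call invoked */
SIGSYS is delivered when a process invokes a system call that does not exist. The operating system delivers the signal and the process is terminated; the default action is to terminate the process and produce a core dump. The numeric value is platform-specific: the definition quoted here uses 12, while Linux on aarch64 numbers SIGSYS as 31.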
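To see what "terminated by SIGSYS" looks like from a parent process, the sketch below starts a child that sends SIGSYS to itself and decodes how it died. This only demonstrates the default action of the signal; the child uses kill rather than an invalid system call, so it does not reproduce the sshd failure. The names `CHILD_CODE` and `run_child_killed_by_sigsys` are hypothetical.

```python
import signal
import subprocess
import sys

# Code run in the child: disable core files, then raise SIGSYS against itself.
CHILD_CODE = (
    "import os, resource, signal\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "os.kill(os.getpid(), signal.SIGSYS)\n"
)


def run_child_killed_by_sigsys():
    """Start a child that sends itself SIGSYS and return the signal that killed it."""
    completed = subprocess.run([sys.executable, "-c", CHILD_CODE])
    # subprocess reports death by signal as a negative return code.
    if completed.returncode < 0:
        return signal.Signals(-completed.returncode)
    return None


if __name__ == "__main__":
    killed_by = run_child_killed_by_sigsys()
    print("child killed by:", killed_by.name if killed_by else None)
    print("SIGSYS number on this platform:", int(signal.SIGSYS))
```

Printing the number on the target board is a quick way to confirm which value of SIGSYS applies there.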

(gdb) p type #print the message type
$4 = 50 '2'
(gdb) p ssh->dispatch[50] #look up the callback function
$5 = (dispatch_fn *) 0x416954 <input_userauth_request>
(gdb) s
input_userauth_request (type=50, seq=4, ssh=0xf9b1250) at auth2.c:270
270 auth2.c: No such file or directory.
(gdb) n
271 in auth2.c #this is where the process dies
(gdb)
Program terminated with signal SIGSYS, Bad system call.
The program no longer exists.

The process dies in a database get operation, so set a breakpoint on that function:

(gdb) b sqlite3_get_table
Breakpoint 1 at 0x7fb4303158
(gdb) c
Continuing.
Breakpoint 1, 0x0000007fb4303158 in sqlite3_get_table () from /lib/libsqlite3.so.0
(gdb) n
Single stepping until exit from function sqlite3_get_table,
which has no line number information.
Program terminated with signal SIGSYS, Bad system call.
The program no longer exists.

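SSH 8.4 had previously shown a case where calling sqlite3_open(), closing the database, and calling sqlite3_open() again failed. The failure of sqlite3_get_table() on repeated calls, however, occurs only on the Phytium 2000, and why it raises SIGSYS is unknown.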
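One way to narrow this down would be to exercise the same open, query, close pattern outside sshd and see whether the library alone misbehaves on the board. The sketch below is only a rough stand-in. The table name and contents are invented, and Python's sqlite3 module uses the prepared-statement API rather than sqlite3_get_table(), so a clean run here would not rule out a problem in that specific call.

```python
import os
import sqlite3
import tempfile

QUERY = "SELECT name FROM users ORDER BY name"  # hypothetical table and column


def open_query_close(db_path):
    """Open the database, run the same SELECT twice, close, and return both results."""
    conn = sqlite3.connect(db_path)
    try:
        first = conn.execute(QUERY).fetchall()
        second = conn.execute(QUERY).fetchall()
    finally:
        conn.close()
    return first, second


def reproduce(rounds=3):
    """Create a throwaway database, then open/query/close it several times."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "accounts.db")
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE users (name TEXT)")
        conn.executemany("INSERT INTO users VALUES (?)", [("audit",), ("config",)])
        conn.commit()
        conn.close()
        return [open_query_close(db_path) for _ in range(rounds)]


if __name__ == "__main__":
    for round_number, (first, second) in enumerate(reproduce(), start=1):
        print(round_number, first == second, first)
```

If a standalone test like this passes while the same calls fail inside sshd, the difference lies in the process context rather than in the database file or the query.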

Debug 7: attach gdb to the sshd main process

#gdb -p 2049
(gdb) b fork
Breakpoint 1 at 0x7fa4433a30
(gdb) c
Continuing.
Breakpoint 1, 0x0000007fa4433a30 in fork () from /lib/libc.so.6
(gdb) set follow-fork-mode child
(gdb) b sqlite3_get_table
Breakpoint 3 at 0x7f8a374158
(gdb) c
Continuing.
Breakpoint 3, 0x0000007f8a374158 in sqlite3_get_table () from /lib/libsqlite3.so.0
(gdb) c
[New process 2895]
Breakpoint 3, 0x0000007f8a374158 in sqlite3_get_table () from /lib/libsqlite3.so.0
(gdb) s
Single stepping until exit from function sqlite3_get_table,
which has no line number information.
Program terminated with signal SIGSYS, Bad system call.
The program no longer exists.

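As shown, the first call to sqlite3_get_table() succeeds and a repeated call fails. The problem is specific to the Phytium 2000; other architectures are fine. The reason SIGSYS is raised was not found and was not investigated further, so it remains an open issue for now. Because the database operations are code we added to SSH 8.4 ourselves and the cause of the SIGSYS could not be found, the only option was to remove them. With the database operations removed, connections work normally.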
