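1. Environment overview
The affected environment is an OpenStack cluster running the Train release.
Four virtual machines in the OpenStack cluster host a Kubernetes cluster: three masters and one node.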
$ nova list
| ccec3aee-15e5-4710-8e4c-5a4db6826022 | k8s-m1 | ACTIVE | - | Running | 5gc_mgt=180.0.0.3; clu_net2=172.0.2.139, 192.168.32.198 |
| 9eacafc1-8ce7-4073-a8c2-73270cffae31 | k8s-m2 | ACTIVE | - | Running | 5gc_mgt=180.0.0.206; clu_net2=172.0.2.146, 192.168.32.186 |
| 8128acd7-fd05-4367-985d-a36023f87de8 | k8s-m3 | ACTIVE | - | Running | 5gc_mgt=180.0.0.8; clu_net2=172.0.2.234, 192.168.32.159 |
| c8713bd5-5d53-4def-ad35-f9222b4f0f6f | k8s-n1 | ACTIVE | - | Running | 5gc_mgt=180.0.0.71; clu_net2=172.0.2.69, 192.168.32.200 |
2. SSH login to a VM fails with kex_exchange_identification: read: Connection reset by peer
Logging in to a VM from the OpenStack controller node fails:
$ ssh root@k8s-m1
kex_exchange_identification: read: Connection reset by peer
3. Problem analysis
3.1. Abnormal event
Before the SSH login failures appeared, the OpenStack controller node had rebooted unexpectedly.
$ uptime
14:21:10 up 3 min, 4 users, load average: 3.33, 2.19, 0.90
The controller has been up for only 3 minutes.
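The same check can be scripted instead of read off the `uptime` output. This is a minimal sketch, assuming a Linux host with `/proc/uptime`. The function names and the 600-second default threshold are hypothetical choices made for illustration.

```shell
# Succeed (exit 0) if an uptime given in seconds is below a threshold.
# $1: uptime in whole seconds; $2: threshold in seconds (default 600, an assumption).
booted_recently() {
  [ "$1" -lt "${2:-600}" ]
}

# Print whole seconds since boot, read from /proc/uptime (Linux only).
uptime_seconds() {
  cut -d. -f1 /proc/uptime
}

# Usage on the controller node:
#   booted_recently "$(uptime_seconds)" && echo "controller rebooted recently"
```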
3.2. Additional environment details
In this environment, every OpenStack VM is booted from a Cinder volume on shared storage.
$ openstack volume list|grep 0b585a44-eafa-405f-b7d7-92b613e4436f
| 0b585a44-eafa-405f-b7d7-92b613e4436f | | in-use | 51 | Attached to k8s-m1 on /dev/vda |
The Cinder storage pool, in turn, lives in a volume group (VG) on the controller node.
$ sudo vgs
VG #PV #LV #SN Attr VSize VFree
vg_cinder_volume 1 34 0 wz--n- 799.99g 39.81g
$ sudo grep -r vg_cinder_volume /etc/cinder
/etc/cinder/cinder.conf:volume_group=vg_cinder_volume
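Because the volumes are logical volumes in `vg_cinder_volume`, one thing worth checking after a controller reboot is whether all of them came back active. The sketch below relies on the fifth character of `lv_attr` being `a` for an active LV. The helper name `inactive_lvs` is hypothetical, and an inactive LV is only a hint here, not a confirmed cause of this incident.

```shell
# Read "lv_name lv_attr" pairs on stdin and print the names of LVs whose
# state flag (5th character of lv_attr) is not "a" (active).
inactive_lvs() {
  awk 'NF >= 2 && substr($2, 5, 1) != "a" { print $1 }'
}

# Usage on the controller node:
#   sudo lvs --noheadings -o lv_name,lv_attr vg_cinder_volume | inactive_lvs
```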
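3.3. Hypothesis
1. The controller reboot cut the VMs off from their shared storage, which put the guest kernels into an abnormal state.
2. The SSH error kex_exchange_identification: read: Connection reset by peer means the connection was reset before the server sent its SSH identification string, which suggests that the guest itself, probably its kernel, dropped the SSH connection.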
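To make the second point concrete, a rough triage of common SSH client errors can be sketched as below. The function name and category labels are hypothetical. The mapping reflects typical causes only: a reset before the banner can also come from other things, such as the server's access controls.

```shell
# Map an ssh client error message to a coarse failure category.
classify_ssh_failure() {
  case "$1" in
    *kex_exchange_identification*"Connection reset by peer"*)
      echo "reset-before-banner" ;;  # TCP connected, but no SSH banner arrived
    *"Connection refused"*)
      echo "port-closed" ;;          # nothing is listening on the SSH port
    *"Connection timed out"*|*"No route to host"*)
      echo "unreachable" ;;          # network path or host is down
    *)
      echo "other" ;;
  esac
}

# Usage:
#   classify_ssh_failure "$(ssh -o BatchMode=yes -o ConnectTimeout=5 root@k8s-m1 true 2>&1)"
```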
4. Resolution
4.1. Log in to the VM through VNC
4.1.1. Get the VNC console URL
$ nova get-vnc-console ccec3aee-15e5-4710-8e4c-5a4db6826022 novnc
+-------+-----------------------------------------------------------------------------------------------+
| Type | Url |
+-------+-----------------------------------------------------------------------------------------------+
| novnc | https://172.255.0.113:6080/vnc_auto.html?path=%3Ftoken%3Db72fee1a-4d70-4523-ae38-e207be9a9e65 |
+-------+-----------------------------------------------------------------------------------------------+
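4.1.2. Open the URL in a browser
The VNC console output shows that the guest kernel is in an abnormal state, and logging in through VNC is not possible either.
4.2. Reboot the VMs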
$ nova reboot --hard ccec3aee-15e5-4710-8e4c-5a4db6826022 9eacafc1-8ce7-4073-a8c2-73270cffae31 8128acd7-fd05-4367-985d-a36023f87de8 c8713bd5-5d53-4def-ad35-f9222b4f0f6f
Request to reboot server k8s-m1 (ccec3aee-15e5-4710-8e4c-5a4db6826022) has been accepted.
Request to reboot server k8s-m2 (9eacafc1-8ce7-4073-a8c2-73270cffae31) has been accepted.
Request to reboot server k8s-m3 (8128acd7-fd05-4367-985d-a36023f87de8) has been accepted.
Request to reboot server k8s-n1 (c8713bd5-5d53-4def-ad35-f9222b4f0f6f) has been accepted.
After the hard reboot, SSH login to the VMs works normally again.
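Rather than pasting the four IDs by hand, the ID list can be derived from the `nova list` table. This is a minimal sketch: the helper name `server_ids_by_name` is hypothetical, and it assumes the table layout shown in section 1, with the ID in the first column and the name in the second.

```shell
# Read a nova-style ASCII table on stdin and print the ID of every row
# whose Name column matches the given regular expression.
server_ids_by_name() {
  awk -F'|' -v re="$1" '
    NF >= 3 {
      id = $2; name = $3
      gsub(/^[ \t]+|[ \t]+$/, "", id)    # trim padding around the ID
      gsub(/^[ \t]+|[ \t]+$/, "", name)  # trim padding around the name
      if (name ~ re) print id
    }'
}

# Usage (hard-reboots every VM whose name starts with "k8s-"):
#   nova list | server_ids_by_name '^k8s-' | xargs nova reboot --hard
```

Reviewing the printed IDs before piping them into `xargs` is a sensible precaution, since a hard reboot does not shut the guest down cleanly.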