k8s cluster flannel issue: telnet to an open port on a node returns Connect timeout

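Some time ago, the port health checks on our Tencent Cloud CLB suddenly reported a batch of failures, yet checking the ports by hand showed everything was normal. I asked Tencent Cloud's engineers, and their standard fix for port health-check problems is to set the following parameters in sysctl.conf: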

net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_timestamps = 0
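Edits to /etc/sysctl.conf only take effect after sysctl -p (or a reboot), so before ruling these settings out it is worth confirming the live values on each node. The minimal Python sketch below reads them from /proc/sys; EXPECTED, sysctl_path and check_sysctl are hypothetical helper names, not part of any tool. Note that net.ipv4.tcp_tw_recycle was removed in Linux 4.12, so on newer kernels it shows up as missing rather than 0.

```python
from __future__ import annotations

from pathlib import Path

# The three values suggested by Tencent Cloud support (from the text above).
EXPECTED = {
    "net.ipv4.tcp_tw_reuse": "0",
    "net.ipv4.tcp_tw_recycle": "0",
    "net.ipv4.tcp_timestamps": "0",
}


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted key such as net.ipv4.tcp_tw_reuse to its /proc/sys file."""
    return Path(root, *key.split("."))


def check_sysctl(expected: dict[str, str], root: str = "/proc/sys") -> dict[str, str | None]:
    """Return only the keys whose live value differs from the expected one.

    A value of None means the key does not exist on this kernel
    (e.g. tcp_tw_recycle on Linux 4.12 and later).
    """
    mismatches = {}
    for key, want in expected.items():
        path = sysctl_path(key, root)
        if not path.is_file():
            mismatches[key] = None
            continue
        have = path.read_text().strip()
        if have != want:
            mismatches[key] = have
    return mismatches


if __name__ == "__main__":
    problems = check_sysctl(EXPECTED)
    if not problems:
        print("all sysctl values match")
    for key, have in problems.items():
        print(f"{key}: expected {EXPECTED[key]}, got {have if have is not None else 'missing'}")
```

Reading /proc/sys needs no special privileges, so this can be run as any user on each node behind the CLB.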

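After changing these parameters, the problem persisted.
Since the health-check settings themselves were confirmed to be correct, the problem had to be on my own servers.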

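Troubleshooting approach:

  1. First I checked the status of the cluster's nodes and pods: all of them were healthy, and the services could be accessed normally (in hindsight, this careless check is what set up the rest of the story);
  2. The logs of all cluster components contained nothing useful;
  3. Internal communication between several of the cluster's nodes also looked fine.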

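Since it was the port check that failed, I tested whether the port itself was reachable, using telnet because it is quick and convenient.
Sure enough, telnet to the port returned Connect timeout (I did not keep a record at the time, so the output below is a reconstruction):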

[root@localhost ~]# telnet 192.168.159.129 5601
Trying 192.168.159.129...
telnet: connect to address 192.168.159.129: Connection timed out
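telnet makes one connection attempt per run, which makes an intermittent failure easy to miss. A small Python probe can repeat the TCP handshake and count how often it succeeds, times out, or is refused. This is a sketch; probe and probe_many are hypothetical names, and the 3-second timeout is an arbitrary choice.

```python
from __future__ import annotations

import socket
from collections import Counter


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP handshake and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout: what telnet reports as "Connection timed out"
        return "timeout"
    except ConnectionRefusedError:
        # RST received: the host is reachable but nothing listens on the port
        return "refused"
    except OSError as exc:
        # Anything else, e.g. "No route to host" or "Network is unreachable"
        return f"error: {exc.strerror or exc}"


def probe_many(host: str, port: int, attempts: int = 10, timeout: float = 3.0) -> Counter:
    """Repeat the probe so that intermittent timeouts become visible."""
    return Counter(probe(host, port, timeout) for _ in range(attempts))
```

For example, probe_many("192.168.159.129", 5601) returns a Counter of outcomes; a mix of "open" and "timeout" entries would match the on-again, off-again behaviour described next.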

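Occasionally the connection would work on one attempt and fail on the next, and I spent a long time chasing this.
Later, after asking around, I found that every service on one particular node was unreachable. Checking the network between the cluster nodes showed that the other nodes had no route to the problem node.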

[root@k8s-w2 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    100    0        0 eth0
172.18.39.0     172.18.39.0     255.255.255.0   UG    0      0        0 flannel.1
172.18.42.0     172.18.42.0     255.255.255.0   UG    0      0        0 flannel.1
172.18.56.0     172.18.56.0     255.255.255.0   UG    0      0        0 flannel.1
172.18.83.0     0.0.0.0         255.255.255.0   U     0      0        0 docker0
192.168.159.0   0.0.0.0         255.255.255.0   U     100    0        0 eth0
[root@k8s-w2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:a9:29:ac brd ff:ff:ff:ff:ff:ff
    inet 192.168.159.133/24 brd 192.168.159.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::8f09:17c2:30ca:6e5f/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::166d:3ad1:c8fa:16ef/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::76f:987b:2d68:f60c/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 92:48:b3:78:71:de brd ff:ff:ff:ff:ff:ff
    inet 172.18.83.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::9048:b3ff:fe78:71de/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:58:35:85:7d brd ff:ff:ff:ff:ff:ff
    inet 172.18.83.1/24 brd 172.18.83.255 scope global docker0
       valid_lft forever preferred_lft forever

Taking one of the other machines (k8s-m1) as an example:

[root@k8s-m1 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    100    0        0 eth0
172.18.39.0     172.18.39.0     255.255.255.0   UG    0      0        0 flannel.1
172.18.42.0     172.18.42.0     255.255.255.0   UG    0      0        0 flannel.1
172.18.56.0     0.0.0.0         255.255.255.0   U     0      0        0 docker0
192.168.159.0   0.0.0.0         255.255.255.0   U     100    0        0 eth0

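So w2 has routes to every other machine in the cluster, but the other machines have no route back to w2's pod subnet (172.18.83.0/24). Since the master node has no route to w2 either, that machine cannot be reached over the cluster network.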
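Comparing route tables by eye gets tedious as the cluster grows. The sketch below parses route -n output (use -n so destinations are not replaced by hostnames) and reports which peer pod subnets have no flannel.1 route. The list of peer subnets is an input you must supply; where flannel keeps its leases depends on the backend (with the etcd backend they are stored by default under /coreos.com/network/subnets). The sample table is the k8s-m1 output above, and the function names are hypothetical.

```python
from __future__ import annotations

import ipaddress

# `route` output captured on k8s-m1 in this article, before the fix.
M1_ROUTE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    100    0        0 eth0
172.18.39.0     172.18.39.0     255.255.255.0   UG    0      0        0 flannel.1
172.18.42.0     172.18.42.0     255.255.255.0   UG    0      0        0 flannel.1
172.18.56.0     0.0.0.0         255.255.255.0   U     0      0        0 docker0
192.168.159.0   0.0.0.0         255.255.255.0   U     100    0        0 eth0
"""


def flannel_routes(route_output: str, iface: str = "flannel.1") -> set[ipaddress.IPv4Network]:
    """Collect destination subnets routed through the flannel VXLAN device."""
    nets = set()
    for line in route_output.splitlines():
        cols = line.split()
        # Data rows have 8 columns: Destination Gateway Genmask Flags Metric Ref Use Iface
        if len(cols) == 8 and cols[7] == iface:
            nets.add(ipaddress.IPv4Network(f"{cols[0]}/{cols[2]}"))
    return nets


def missing_routes(route_output: str, peer_subnets: list[str]) -> set[ipaddress.IPv4Network]:
    """Return the peer pod subnets that have no flannel.1 route on this node."""
    expected = {ipaddress.IPv4Network(s) for s in peer_subnets}
    return expected - flannel_routes(route_output)


if __name__ == "__main__":
    # Hypothetical input: the pod subnets of the *other* nodes as seen from k8s-m1.
    peers = ["172.18.39.0/24", "172.18.42.0/24", "172.18.83.0/24"]
    print(sorted(missing_routes(M1_ROUTE, peers)))
```

Run against the k8s-m1 table, it flags w2's subnet 172.18.83.0/24 as the one with no route.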
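Services appeared unaffected because the yml files define 2 replicas: as long as one of the two pods can serve requests, the service stays reachable, although sometimes a page only loaded after a manual refresh (presumably when the request had been sent to the pod on w2).
After finding the problem, I first restarted flannel on all nodes. After the restart, flannel and docker on the problem node were in different subnets: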

[root@k8s-w2 ~]# systemctl restart flannel
[root@k8s-w2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:a9:29:ac brd ff:ff:ff:ff:ff:ff
    inet 192.168.159.133/24 brd 192.168.159.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::8f09:17c2:30ca:6e5f/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::166d:3ad1:c8fa:16ef/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::76f:987b:2d68:f60c/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 92:48:b3:78:71:de brd ff:ff:ff:ff:ff:ff
    inet 172.18.90.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::9048:b3ff:fe78:71de/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:58:35:85:7d brd ff:ff:ff:ff:ff:ff
    inet 172.18.56.1/24 brd 172.18.56.255 scope global docker0
       valid_lft forever preferred_lft forever
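With the VXLAN backend used here, flannel.1 carries the .0 address of the node's lease (172.18.90.0/32 above), and docker0 should carry the .1 address of the same subnet. The sketch below parses ip a output and checks exactly that. The /24 lease length is an assumption matching the 255.255.255.0 masks in the route tables, and the function names are hypothetical.

```python
from __future__ import annotations

import ipaddress
import re

# Interface header lines look like "3: flannel.1: <BROADCAST,...>"
IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)")
# IPv4 lines look like "    inet 172.18.90.0/32 scope global flannel.1" (inet6 lines do not match)
INET_RE = re.compile(r"^\s+inet\s+(\S+)")


def ipv4_by_iface(ip_a_output: str) -> dict[str, ipaddress.IPv4Interface]:
    """Return the first IPv4 address of each interface in `ip a` output."""
    addrs = {}
    current = None
    for line in ip_a_output.splitlines():
        header = IFACE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        inet = INET_RE.match(line)
        if inet and current is not None:
            addrs.setdefault(current, ipaddress.IPv4Interface(inet.group(1)))
    return addrs


def bridge_matches_lease(ip_a_output: str, flannel_iface: str = "flannel.1",
                         bridge: str = "docker0", subnet_len: int = 24) -> bool:
    """True if the bridge's address lies inside the lease held by the flannel device."""
    addrs = ipv4_by_iface(ip_a_output)
    lease = ipaddress.IPv4Network(f"{addrs[flannel_iface].ip}/{subnet_len}", strict=False)
    return addrs[bridge].ip in lease
```

On a live node you could pass it subprocess.run(["ip", "a"], capture_output=True, text=True).stdout; for the output above it returns False, because 172.18.56.1 is outside 172.18.90.0/24.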

So I restarted both flannel and docker on that node:

[root@k8s-w2 ~]# systemctl restart flannel docker
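Why docker had to be restarted as well: in a typical binary flannel installation (an assumption, since the unit files are not shown here), flanneld writes its lease to /run/flannel/subnet.env, and dockerd's --bip and --mtu are derived from that file only when docker starts, for example via flannel's mk-docker-opts.sh helper. If flannel picks up a new lease, docker0 keeps the old subnet until docker is restarted. The sketch below models that relationship; the sample file contents and the function names are hypothetical.

```python
from __future__ import annotations

import ipaddress

# Hypothetical /run/flannel/subnet.env contents for the repaired w2 node.
SAMPLE_ENV = """\
FLANNEL_NETWORK=172.18.0.0/16
FLANNEL_SUBNET=172.18.90.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false
"""


def parse_subnet_env(text: str) -> dict[str, str]:
    """Parse the KEY=VALUE lines that flanneld writes to its subnet file."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def docker_bridge_args(env: dict[str, str]) -> list[str]:
    """Derive dockerd bridge options from the lease (similar in spirit to mk-docker-opts.sh)."""
    return [f"--bip={env['FLANNEL_SUBNET']}", f"--mtu={env['FLANNEL_MTU']}"]


def docker_needs_restart(env: dict[str, str], docker0_cidr: str) -> bool:
    """True if docker0 still carries an address from an older lease."""
    lease = ipaddress.IPv4Interface(env["FLANNEL_SUBNET"]).network
    return ipaddress.IPv4Interface(docker0_cidr).ip not in lease


if __name__ == "__main__":
    env = parse_subnet_env(SAMPLE_ENV)
    print(docker_bridge_args(env))
    # docker0 address taken from the `ip a` output before docker was restarted
    print(docker_needs_restart(env, "172.18.56.1/24"))
```

This is also why flannel has to be up with its new lease before docker starts: docker only reads the bridge options once, at daemon startup.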

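Checking the subnets and routes again, everything was in place now, and after a short while all the port health checks reported normal as well.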

[root@k8s-m1 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    100    0        0 eth0
172.18.39.0     172.18.39.0     255.255.255.0   UG    0      0        0 flannel.1
172.18.42.0     172.18.42.0     255.255.255.0   UG    0      0        0 flannel.1
172.18.56.0     0.0.0.0         255.255.255.0   U     0      0        0 docker0
172.18.90.0     172.18.90.0     255.255.255.0   UG    0      0        0 flannel.1
192.168.159.0   0.0.0.0         255.255.255.0   U     100    0        0 eth0
[root@k8s-w2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:a9:29:ac brd ff:ff:ff:ff:ff:ff
    inet 192.168.159.133/24 brd 192.168.159.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::8f09:17c2:30ca:6e5f/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::166d:3ad1:c8fa:16ef/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::76f:987b:2d68:f60c/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 92:48:b3:78:71:de brd ff:ff:ff:ff:ff:ff
    inet 172.18.90.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::9048:b3ff:fe78:71de/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:58:35:85:7d brd ff:ff:ff:ff:ff:ff
    inet 172.18.90.1/24 brd 172.18.90.255 scope global docker0
       valid_lft forever preferred_lft forever

That is the basic approach and process I used to resolve this flannel network problem on our side. I hope it helps.
