Problem: 172.16.30.29 cannot ping 172.16.30.55.
I. Troubleshooting:
1. Check the firewall status on both machines:
ufw status
Status: inactive
Confirmed that the firewall is not enabled on either machine.
2. Ping the gateway from both machines:
root@server:~# ping 172.16.0.1
PING 172.16.0.1 (172.16.0.1) 56(84) bytes of data.
64 bytes from 172.16.0.1: icmp_seq=1 ttl=255 time=0.211 ms
64 bytes from 172.16.0.1: icmp_seq=2 ttl=255 time=0.115 ms
64 bytes from 172.16.0.1: icmp_seq=3 ttl=255 time=0.125 ms
root@hit:~# ping 172.16.0.1
PING 172.16.0.1 (172.16.0.1) 56(84) bytes of data.
64 bytes from 172.16.0.1: icmp_seq=1 ttl=255 time=0.302 ms
64 bytes from 172.16.0.1: icmp_seq=2 ttl=255 time=0.114 ms
64 bytes from 172.16.0.1: icmp_seq=3 ttl=255 time=0.126 ms
Both machines can ping the gateway!
3. Test with a tcpdump packet capture:
root@hit:~# ifconfig
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.30.1 netmask 255.255.255.0 broadcast 172.16.30.255
inet6 fe80::42:14ff:fe26:6672 prefixlen 64 scopeid 0x20<link>
ether 02:42:14:26:66:72 txqueuelen 0 (Ethernet)
RX packets 34075 bytes 24690437 (24.6 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 37023 bytes 3542065 (3.5 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.30.55 netmask 255.255.192.0 broadcast 172.16.63.255
inet6 fe80::20c:29ff:fecb:d143 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:cb:d1:43 txqueuelen 1000 (Ethernet)
RX packets 6486177771 bytes 3933249709629 (3.9 TB)
RX errors 0 dropped 2434 overruns 0 frame 0
TX packets 5891491940 bytes 3476200301974 (3.4 TB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
......
Run tcpdump on 172.16.30.55:
tcpdump -i ens160 icmp
Run ping on 172.16.30.29:
root@bjhit-prod-tg-server:~# ping 172.16.30.55
PING 172.16.30.55 (172.16.30.55) 56(84) bytes of data.
From 172.16.30.29 icmp_seq=1 Destination Host Unreachable
From 172.16.30.29 icmp_seq=2 Destination Host Unreachable
From 172.16.30.29 icmp_seq=3 Destination Host Unreachable
From 172.16.30.29 icmp_seq=4 Destination Host Unreachable
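Nothing was captured on 172.16.30.55 either.
4. After some thought and a closer look at the ifconfig output, the cause turned out to be docker0: its address is 172.16.30.1/24, which conflicts with the local IP range!
docker0's 24-bit mask is more specific than ens160's 18-bit mask, so on 172.16.30.55 any traffic addressed to 172.16.30.0/24 prefers the docker0 route (longest-prefix match) and is sent into Docker's internal network, where 172.16.30.29 does not exist!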
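The route selection described above can be reproduced with a small sketch. It uses Python's standard `ipaddress` module and the two address/netmask pairs from the ifconfig output. `INTERFACES` and `pick_interface` are names made up for this illustration, and the model is deliberately simplified: it looks only at directly connected networks and ignores the default route, metrics, and policy routing.

```python
import ipaddress

# Addresses and netmasks copied from the ifconfig output on 172.16.30.55.
INTERFACES = {
    "docker0": ipaddress.ip_interface("172.16.30.1/255.255.255.0"),
    "ens160": ipaddress.ip_interface("172.16.30.55/255.255.192.0"),
}


def pick_interface(destination, interfaces=INTERFACES):
    """Return the interface whose connected network is the longest-prefix match.

    Simplified model (an assumption of this sketch): only directly
    connected networks are considered.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [
        (iface.network.prefixlen, name)
        for name, iface in interfaces.items()
        if dest in iface.network
    ]
    if not candidates:
        return None  # no connected network matches; a default route would apply
    # The largest prefix length is the most specific match.
    return max(candidates)[1]


if __name__ == "__main__":
    for name, iface in INTERFACES.items():
        print(name, "is connected to", iface.network)
    for target in ("172.16.30.29", "172.16.0.1"):
        print(target, "->", pick_interface(target))
```

In this model 172.16.30.29 matches both networks and the /24 on docker0 wins, while the gateway 172.16.0.1 matches only ens160's 172.16.0.0/18. That is consistent with the gateway pings succeeding while the host-to-host ping failed.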
II. Fixing the problem
The key to the fix is to change docker0's IP range by adding the "bip" field to /etc/docker/daemon.json:
root@hit:~# vim /etc/docker/daemon.json
{
  "insecure-registries": ["172.16.0.120:30002"],
  "bip": "172.16.29.1/24"
}
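Before restarting Docker, it can help to sanity-check the configuration. The sketch below is not part of the original procedure. It parses the daemon.json text (a malformed file can stop Docker from starting) and reports how the "bip" network relates to the LAN on ens160. `bip_report`, `ARTICLE_CONFIG`, and the `must_reach` list are hypothetical names and values chosen for illustration.

```python
import ipaddress
import json

# The host LAN as seen on ens160 (172.16.30.55 with netmask 255.255.192.0).
HOST_LAN = ipaddress.ip_interface("172.16.30.55/18").network

# The configuration from this article, inlined so the sketch is self-contained.
ARTICLE_CONFIG = """
{"insecure-registries": ["172.16.0.120:30002"],
 "bip": "172.16.29.1/24"}
"""


def bip_report(daemon_json_text, lan=HOST_LAN, must_reach=()):
    """Parse daemon.json text and report how its "bip" relates to the LAN."""
    config = json.loads(daemon_json_text)  # raises ValueError on malformed JSON
    bridge = ipaddress.ip_interface(config["bip"]).network
    # LAN hosts that fall inside the bridge network would be routed to docker0.
    shadowed = [h for h in must_reach if ipaddress.ip_address(h) in bridge]
    return {
        "bridge": str(bridge),
        "overlaps_lan": bridge.overlaps(lan),
        "shadowed_hosts": shadowed,
    }


if __name__ == "__main__":
    # Hypothetical list of LAN addresses this host must be able to reach.
    hosts = ["172.16.30.29", "172.16.0.1", "172.16.0.120"]
    print(bip_report(ARTICLE_CONFIG, must_reach=hosts))
```

For the configuration above, no listed host is shadowed, but `overlaps_lan` is still true: 172.16.29.0/24 lies inside ens160's 172.16.0.0/18. The fix therefore works only as long as no LAN host that this machine needs to reach uses a 172.16.29.x address. A range entirely outside the /18 would avoid the overlap altogether, assuming one is free in this environment.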
Restart Docker:
root@hit:~# systemctl daemon-reload
root@hit:~# systemctl restart docker
Check docker0's IP address again:
root@hit:~# ifconfig
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.29.1 netmask 255.255.255.0 broadcast 172.16.29.255
inet6 fe80::42:14ff:fe26:6672 prefixlen 64 scopeid 0x20<link>
ether 02:42:14:26:66:72 txqueuelen 0 (Ethernet)
RX packets 82615 bytes 31799805 (31.7 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 86117 bytes 20611953 (20.6 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.30.55 netmask 255.255.192.0 broadcast 172.16.63.255
inet6 fe80::20c:29ff:fecb:d143 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:cb:d1:43 txqueuelen 1000 (Ethernet)
RX packets 6486296058 bytes 3933309702820 (3.9 TB)
RX errors 0 dropped 2434 overruns 0 frame 0
TX packets 5891622974 bytes 3476234757845 (3.4 TB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0 now shows 172.16.29.1, so the change has taken effect!
Verify connectivity:
172.16.30.55 pings 172.16.30.29:
root@hit:~# ping 172.16.30.29
PING 172.16.30.29 (172.16.30.29) 56(84) bytes of data.
64 bytes from 172.16.30.29: icmp_seq=1 ttl=64 time=0.688 ms
64 bytes from 172.16.30.29: icmp_seq=2 ttl=64 time=0.116 ms
64 bytes from 172.16.30.29: icmp_seq=3 ttl=64 time=0.119 ms
172.16.30.29 pings 172.16.30.55:
root@bjhit-prod-tg-server:~# ping 172.16.30.55
PING 172.16.30.55 (172.16.30.55) 56(84) bytes of data.
64 bytes from 172.16.30.55: icmp_seq=1 ttl=64 time=0.145 ms
64 bytes from 172.16.30.55: icmp_seq=2 ttl=64 time=0.102 ms
64 bytes from 172.16.30.55: icmp_seq=3 ttl=64 time=0.142 ms
Perfect, all done!