Test environment
Host IP addresses
master: 192.168.66.40, runs the etcd service
node1: 192.168.66.41, runs the flannel service
node2: 192.168.66.42, runs the flannel service
Version information
flanneld -version
v0.13.0
etcd --version
etcd Version: 3.4.15
Git SHA: aa7126864
Go Version: go1.12.17
Go OS/Arch: linux/amd64
Install etcd
etcd version:
ETCD_VER=v3.4.15
Download URL:
https://github.com/etcd-io/etcd/releases/download/v3.4.15/etcd-v3.4.15-linux-amd64.tar.gz
Extract it into a directory under /tmp (any other directory works too):
mkdir -p /tmp/etcd-download-test
tar xzvf etcd-v3.4.15-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
Copy the executables to /usr/local/bin/:
cp /tmp/etcd-download-test/etcd* /usr/local/bin/
Check that the etcd version is correct:
etcd --version
etcdctl version
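Start the etcd service
Start etcd and listen on the client URL (run it in a separate terminal, or append & to keep it in the background):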
etcd --enable-v2 -listen-client-urls http://192.168.66.40:2379 -advertise-client-urls http://192.168.66.40:2379
Check that etcd is usable
etcdctl --endpoints=192.168.66.40:2379 put foo bar
v2 format: ETCDCTL_API=2 etcdctl --endpoints=http://192.168.66.40:2379 set foo bar
etcdctl --endpoints=192.168.66.40:2379 get foo
v2 format: ETCDCTL_API=2 etcdctl --endpoints=http://192.168.66.40:2379 get foo
Install flannel
1. Download flannel
wget https://github.com/coreos/flannel/releases/download/v0.13.0/flanneld-amd64 && chmod +x flanneld-amd64
2. Copy it into the bin directory under the name flanneld
cp flanneld-amd64 /usr/local/bin/flanneld
VxLAN mode
Save the flannel network configuration to etcd
First write the configuration to the file flannel-config.json, with the following content:
[root@docker-manager flannel]# cat flannel-config.json
{
"Network": "192.168.88.0/16",
"SubnetLen":24,
"Backend": {
"Type": "vxlan"
}
}
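- Network defines the IP pool of this network as 192.168.88.0/16 (with a /16 mask this is the range 192.168.0.0/16, which is what FLANNEL_NETWORK shows later)
- SubnetLen sets the prefix length of the subnet allocated to each host to 24 bits
- Backend is vxlan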
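As a quick sanity check on the pool size, the number of per-host subnets follows from the two prefix lengths. The sketch below is only illustrative; subnet_count is a hypothetical helper name and not part of flannel.

```shell
#!/usr/bin/env bash
# subnet_count NETWORK_PREFIX SUBNET_LEN
# Prints how many subnets of length SUBNET_LEN fit into a network
# whose prefix length is NETWORK_PREFIX (hypothetical helper).
subnet_count() {
  local network_prefix=$1 subnet_len=$2
  echo $(( 1 << (subnet_len - network_prefix) ))
}

# A /16 pool split into /24 subnets, as in flannel-config.json above.
subnet_count 16 24
```

The flanneld log in this article reports picking from the range 192.168.1.0 ... 192.168.255.0, so in practice the first /24 appears to be left out of the allocation.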
Save the configuration to etcd
ETCDCTL_API=2 etcdctl --endpoints=http://192.168.66.40:2379 set /docker-test/network/config < flannel-config.json
[root@docker-manager flannel]#ETCDCTL_API=2 etcdctl --endpoints=http://192.168.66.40:2379 set /docker-test/network/config < flannel-config.json
OK
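Here /docker-test/network/config is the key of this etcd entry, and its value is the content of flannel-config.json. The path in front of /config can be chosen freely; that prefix is later passed to flanneld as a startup parameter (-etcd-prefix).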
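flanneld looks for its configuration under the key config below the prefix it is given, so the prefix passed on the command line is the etcd key without the trailing /config. A minimal sketch of that relationship, assuming bash; prefix_from_key is a hypothetical helper used only for illustration.

```shell
#!/usr/bin/env bash
# prefix_from_key CONFIG_KEY
# Strips the trailing "/config" from an etcd key to obtain the value
# that would be passed to flanneld as -etcd-prefix (hypothetical helper).
prefix_from_key() {
  local key=$1
  case "$key" in
    */config) echo "${key%/config}" ;;
    *) echo "key must end in /config: $key" >&2; return 1 ;;
  esac
}

# The key used in this article.
prefix_from_key /docker-test/network/config
```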
Confirm that the write succeeded
etcdctl --endpoints=http://192.168.66.40:2379 get /docker-test/network/config
[root@docker-manager flannel]# etcdctl --endpoints=192.168.66.40:2379 get /docker-test/network/config
/docker-test/network/config
{
"Network": "192.168.88.0/16",
"SubnetLen":24,
"Backend": {
"Type": "vxlan"
}
}
Start flannel
Start flannel on node1
flanneld -etcd-endpoints=http://192.168.66.40:2379 -iface=ens38 -etcd-prefix=/docker-test/network
[root@docker1 home]# flanneld -etcd-endpoints=http://192.168.66.40:2379 -iface=ens38 -etcd-prefix=/docker-test/network
I0302 17:58:23.985522 17230 main.go:531] Using interface with name ens38 and address 192.168.66.41
I0302 17:58:23.985600 17230 main.go:548] Defaulting external address to interface address (192.168.66.41)
I0302 17:58:23.985692 17230 main.go:246] Created subnet manager: Etcd Local Manager with Previous Subnet: None
I0302 17:58:23.985699 17230 main.go:249] Installing signal handlers
I0302 17:58:23.992234 17230 main.go:390] Found network config - Backend type: vxlan
I0302 17:58:23.992454 17230 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
I0302 17:58:24.027640 17230 local_manager.go:234] Picking subnet in range 192.168.1.0 ... 192.168.255.0
I0302 17:58:24.029605 17230 local_manager.go:220] Allocated lease (192.168.88.0/24) to current node (192.168.66.41)
I0302 17:58:24.034292 17230 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0302 17:58:24.034555 17230 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0302 17:58:24.034578 17230 main.go:325] Running backend.
I0302 17:58:24.034877 17230 vxlan_network.go:60] watching for new subnet leases
I0302 17:58:24.037219 17230 main.go:433] Waiting for 23h1m27.109360778s to renew lease
I0302 17:58:24.041997 17230 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I0302 17:58:24.042031 17230 iptables.go:167] Deleting iptables rule: -s 192.168.0.0/16 -j ACCEPT
I0302 17:58:24.045793 17230 iptables.go:167] Deleting iptables rule: -d 192.168.0.0/16 -j ACCEPT
I0302 17:58:24.049215 17230 iptables.go:155] Adding iptables rule: -s 192.168.0.0/16 -j ACCEPT
I0302 17:58:24.055379 17230 iptables.go:155] Adding iptables rule: -d 192.168.0.0/16 -j ACCEPT
Figure: IP address obtained by node1
Start flannel on node2
flanneld -etcd-endpoints=http://192.168.66.40:2379 -iface=ens38 -etcd-prefix=/docker-test/network
[root@docker2 ~]# flanneld -etcd-endpoints=http://192.168.66.40:2379 -iface=ens38 -etcd-prefix=/docker-test/network
I0303 11:14:22.890372 86807 main.go:531] Using interface with name ens38 and address 192.168.66.42
I0303 11:14:22.890456 86807 main.go:548] Defaulting external address to interface address (192.168.66.42)
I0303 11:14:22.890575 86807 main.go:246] Created subnet manager: Etcd Local Manager with Previous Subnet: None
I0303 11:14:22.890581 86807 main.go:249] Installing signal handlers
I0303 11:14:22.896486 86807 main.go:390] Found network config - Backend type: vxlan
I0303 11:14:22.899503 86807 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
I0303 11:14:22.945784 86807 local_manager.go:234] Picking subnet in range 192.168.1.0 ... 192.168.255.0
I0303 11:14:22.947584 86807 local_manager.go:220] Allocated lease (192.168.34.0/24) to current node (192.168.66.42)
I0303 11:14:22.952407 86807 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0303 11:14:22.952585 86807 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0303 11:14:22.952601 86807 main.go:325] Running backend.
I0303 11:14:22.952918 86807 vxlan_network.go:60] watching for new subnet leases
I0303 11:14:22.955354 86807 main.go:433] Waiting for 23h1m1.52196736s to renew lease
I0303 11:14:22.962913 86807 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I0303 11:14:22.962948 86807 iptables.go:167] Deleting iptables rule: -s 192.168.0.0/16 -j ACCEPT
I0303 11:14:22.966786 86807 iptables.go:167] Deleting iptables rule: -d 192.168.0.0/16 -j ACCEPT
I0303 11:14:22.970561 86807 iptables.go:155] Adding iptables rule: -s 192.168.0.0/16 -j ACCEPT
I0303 11:14:22.979335 86807 iptables.go:155] Adding iptables rule: -d 192.168.0.0/16 -j ACCEPT
Figure: IP address obtained by node2
Configure Docker to use flannel
Edit the Docker configuration file on node1
vim /etc/systemd/system/docker.service.d/10-machine.conf
Add the two parameters --bip and --mtu
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver overlay2 --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=generic --bip=192.168.88.1/24 --mtu=1450
Environment=
The values of these two parameters must match FLANNEL_SUBNET and FLANNEL_MTU in /run/flannel/subnet.env.
[root@docker1 ~]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=192.168.0.0/16
FLANNEL_SUBNET=192.168.88.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false
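To avoid copying the values by hand, the two dockerd flags can be derived from the file. This is a minimal sketch assuming bash and the subnet.env format shown above; docker_flags_from_subnet_env is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# docker_flags_from_subnet_env ENV_FILE
# Reads FLANNEL_SUBNET and FLANNEL_MTU from a flannel subnet.env file
# and prints the matching dockerd flags (hypothetical helper).
docker_flags_from_subnet_env() {
  local env_file=$1
  (
    # Source the file in a subshell so the FLANNEL_* variables
    # do not leak into the calling shell.
    . "$env_file" || exit 1
    echo "--bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}"
  )
}

# Usage on a node where flanneld is running:
#   docker_flags_from_subnet_env /run/flannel/subnet.env
```

With the node1 file above, the function would print the same --bip and --mtu values that were added to 10-machine.conf.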
Restart the Docker service
systemctl daemon-reload
systemctl restart docker.service
Edit the Docker configuration file on node2
vim /etc/systemd/system/docker.service.d/10-machine.conf
Add the two parameters --bip and --mtu
[root@docker2 ~]# cat /etc/systemd/system/docker.service.d/10-machine.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver overlay2 --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=generic --bip=192.168.34.1/24 --mtu=1450
Environment=
[root@docker2 ~]#
The values of these two parameters must match FLANNEL_SUBNET and FLANNEL_MTU in /run/flannel/subnet.env.
[root@docker2 ~]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=192.168.0.0/16
FLANNEL_SUBNET=192.168.34.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false
[root@docker2 ~]#
Restart the Docker service
systemctl daemon-reload
systemctl restart docker.service
Network configuration on node2 before and after the change
Routes before the change
[root@docker2 ~]# ip r
default via 10.121.137.254 dev ens33
10.121.136.0/23 dev ens33 proto kernel scope link src 10.121.137.42
169.254.0.0/16 dev ens33 scope link metric 1002
169.254.0.0/16 dev ens37 scope link metric 1003
169.254.0.0/16 dev ens38 scope link metric 1019
169.254.0.0/16 dev ens37.10 scope link metric 1022
169.254.0.0/16 dev ens37.20 scope link metric 1023
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
190.100.100.0/24 dev ens37 proto kernel scope link src 190.100.100.42
192.168.66.0/24 dev ens38 proto kernel scope link src 192.168.66.42
192.168.88.0/24 via 192.168.88.0 dev flannel.1 onlink
Routes after the change
[root@docker2 ~]# ip r
default via 10.121.137.254 dev ens33
10.121.136.0/23 dev ens33 proto kernel scope link src 10.121.137.42
169.254.0.0/16 dev ens33 scope link metric 1002
169.254.0.0/16 dev ens37 scope link metric 1003
169.254.0.0/16 dev ens38 scope link metric 1019
169.254.0.0/16 dev ens37.10 scope link metric 1022
169.254.0.0/16 dev ens37.20 scope link metric 1023
190.100.100.0/24 dev ens37 proto kernel scope link src 190.100.100.42
192.168.34.0/24 dev docker0 proto kernel scope link src 192.168.34.1
192.168.66.0/24 dev ens38 proto kernel scope link src 192.168.66.42
192.168.88.0/24 via 192.168.88.0 dev flannel.1 onlink
[root@docker2 ~]#
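As shown, flannel does not create a new Docker network; it uses the default bridge network directly. Containers on the same host are connected through docker0, and cross-host traffic is forwarded through flannel.1.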
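One way to confirm that docker0 has moved onto the flannel subnet is to pull its route out of the ip r output. The sketch below assumes bash and awk and the route line format shown above; docker0_subnet is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# docker0_subnet
# Reads `ip r` style output on stdin and prints the subnet that is
# directly attached to docker0 (hypothetical helper).
docker0_subnet() {
  awk '$2 == "dev" && $3 == "docker0" { print $1 }'
}

# Two route lines copied from the node2 output after the change.
printf '%s\n' \
  '192.168.34.0/24 dev docker0 proto kernel scope link src 192.168.34.1' \
  '192.168.88.0/24 via 192.168.88.0 dev flannel.1 onlink' | docker0_subnet
```

On a live node the same function can be fed directly with ip r | docker0_subnet, and the result compared with FLANNEL_SUBNET.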
Connect containers to the flannel network
Run a container on node1
docker run -itd centos
Container IP address: 192.168.88.2
Run a container on node2
docker run -itd centos
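Container IP address: 192.168.34.2
A ping test shows that the two containers can reach each other even though they are in different subnets.
Cross-subnet communication flow
So how do the two different subnets communicate? Taking node1 as the example:
Figure: routing table on node1
- When the container on node1 pings the container on node2, the destination is in a different subnet, so the packet is sent to the default gateway docker0 (192.168.88.1)
- According to node1's routing table, the packet is forwarded to flannel.1
- flannel.1 encapsulates the packet in VxLAN and sends it to node2 through ens38
- node2 receives the packet and flannel.1 decapsulates it; the destination is 192.168.34.2, so according to node2's routing table the packet is delivered through docker0 to the centos container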
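The routing decision in the steps above can be sketched as a small prefix match. This is an illustration in bash only, not how the kernel or flannel implements it; ip_to_int, in_cidr and egress_device are hypothetical helper names, and the subnets are the ones used in this article.

```shell
#!/usr/bin/env bash
# ip_to_int A.B.C.D : prints the IPv4 address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr IP NET/LEN : succeeds if IP lies inside the given CIDR block.
in_cidr() {
  local ip=$1 net=${2%/*} len=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# egress_device LOCAL_SUBNET FLANNEL_NETWORK DEST_IP
# Prints the device a host would use for DEST_IP:
# docker0 for the local subnet, flannel.1 for the rest of the flannel
# network, and "default" for everything else.
egress_device() {
  if in_cidr "$3" "$1"; then
    echo docker0
  elif in_cidr "$3" "$2"; then
    echo flannel.1
  else
    echo default
  fi
}

# From node1 (local subnet 192.168.88.0/24):
egress_device 192.168.88.0/24 192.168.0.0/16 192.168.88.2   # same host
egress_device 192.168.88.0/24 192.168.0.0/16 192.168.34.2   # container on node2
```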
host-gw mode
Modify the flannel configuration file
cat /home/flannel-config.json
{
"Network": "192.168.88.0/16",
"SubnetLen": 24,
"Backend": {
"Type": "host-gw"
}
}
Update the etcd database
[root@docker-manager flannel]# ETCDCTL_API=2 etcdctl --endpoints=http://192.168.66.40:2379 set /docker-test/network/config < flannel-config.json
{
"Network": "192.168.88.0/16",
"SubnetLen": 24,
"Backend": {
"Type": "host-gw"
}
}
[root@docker-manager flannel]# ETCDCTL_API=2 etcdctl --endpoints=http://192.168.66.40:2379 get /docker-test/network/config
{
"Network": "192.168.88.0/16",
"SubnetLen": 24,
"Backend": {
"Type": "host-gw"
}
}
[root@docker-manager flannel]#
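Start flannel on node1
Figure: node1's routing table before flannel starts
Figure: node1's routing table after flannel starts
Start flannel on node2
The startup procedure is the same as on node1, so no screenshots are included here.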
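In host-gw mode flanneld installs an ordinary route for each peer subnet that points at the peer host's address, with no tunnel device involved. The sketch below only prints what such a route would look like as an ip command; host_gw_route is a hypothetical helper, and the example assumes node2 still holds the subnet 192.168.34.0/24 from the VxLAN run.

```shell
#!/usr/bin/env bash
# host_gw_route PEER_SUBNET PEER_HOST_IP IFACE
# Prints the ip command equivalent to the route a host-gw backend
# would install for one peer (hypothetical helper; prints only,
# it does not modify the routing table).
host_gw_route() {
  local peer_subnet=$1 peer_host_ip=$2 iface=$3
  echo "ip route add ${peer_subnet} via ${peer_host_ip} dev ${iface}"
}

# Route on node1 towards node2's container subnet.
host_gw_route 192.168.34.0/24 192.168.66.42 ens38
```

flanneld manages these routes itself, so the command does not need to be run by hand; comparing it with the ip r output is a quick way to check that the backend is active.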
Change the MTU
After flannel starts in host-gw mode, the MTU becomes 1500, so Docker's MTU has to be changed again
[root@docker1 ~]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=192.168.0.0/16
FLANNEL_SUBNET=192.168.88.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=false
[root@docker1 ~]#
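Change the MTU (set --mtu=1500):
vim /etc/systemd/system/docker.service.d/10-machine.conf
Restart the Docker service:
systemctl daemon-reload
systemctl restart docker.service
Note: do this on both nodes
Finally, start a container on each of the two nodes and verify connectivity between the different subnets. The steps are the same as in the vxlan section and are omitted here.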
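VxLAN versus host-gw
- host-gw configures every host as a gateway; each host knows the other hosts' subnets and forwarding addresses. vxlan builds tunnels between the hosts, and containers on different hosts all sit inside one large network.
- Although vxlan and host-gw use different mechanisms to connect the hosts, nothing changes from the container's point of view.
- Because vxlan has to encapsulate and decapsulate every packet, its performance is slightly lower than that of host-gw.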
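The encapsulation cost is also visible in the MTU values flannel wrote to subnet.env: 1450 for vxlan and 1500 for host-gw. The sketch below reproduces that arithmetic, assuming an IPv4 underlay and a link MTU of 1500 on ens38 (the link MTU is not shown in this article); flannel_mtu is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# flannel_mtu BACKEND LINK_MTU
# Prints the MTU left for container traffic (hypothetical helper).
# VXLAN over IPv4 adds 50 bytes per packet:
#   14 (inner Ethernet) + 8 (VXLAN) + 8 (UDP) + 20 (outer IPv4)
flannel_mtu() {
  local backend=$1 link_mtu=$2
  case "$backend" in
    vxlan)   echo $(( link_mtu - 50 )) ;;
    host-gw) echo "$link_mtu" ;;
    *) echo "unknown backend: $backend" >&2; return 1 ;;
  esac
}

flannel_mtu vxlan 1500
flannel_mtu host-gw 1500
```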
Troubleshooting
Couldn't fetch network config: client: etcd cluster is unavailable or misconfigured; error #0: EOF
Solution
Use http instead of https. Whether http or https is correct here depends on which protocol etcd is listening on.
flanneld -etcd-endpoints=http://192.168.66.40:2379 -iface=ens38 -etcd-prefix=/docker-test/network
Couldn't fetch network config: client: response is invalid json. The endpoint is probably not valid etcd cluster endpoint.timed out
Troubleshooting steps:
1. On node1, use curl to verify that the etcd service is working
[root@docker1 ~]# curl http://192.168.66.40:2379/version
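{"etcdserver":"3.4.15","etcdcluster":"3.4.0"}[root@docker1 ~]#
The response confirms that communication between node1 and the master, and the etcd service itself, are both fine
2. Online research suggests the problem is version related: etcd 3.4 disables the v2 API by default, so the current flannel version (v0.13.0) cannot talk to etcd (3.4.15) out of the box
Solution
Enable v2 compatibility when starting etcd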
etcd --enable-v2 -listen-client-urls http://192.168.66.40:2379 -advertise-client-urls http://192.168.66.40:2379
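Couldn't fetch network config: 100: Key not found (/docker-test)
After the problems above were solved, a new error appeared: "Couldn't fetch network config: 100: Key not found (/docker-test) [7] timed out"
Cause
flanneld talks to etcd through the v2 API and cannot see keys that were written through the v3 API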
flanneld -version
v0.13.0
etcd --version
etcd Version: 3.4.15
Git SHA: aa7126864
Go Version: go1.12.17
Go OS/Arch: linux/amd64
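Solution
Option 1: export ETCDCTL_API=2 (in later experiments this did not seem to work reliably, so option 2 is recommended)
Option 2: prefix the etcdctl command with ETCDCTL_API=2
ETCDCTL_API=2 etcdctl --endpoints=http://192.168.66.40:2379 set /docker-test/network/config < flannel-config.json
Note that whichever option is used, the etcdctl command line differs between v2 and v3. The main differences are:
- With v2, the value after --endpoints= must include https or http; v3 does not need it
- v2 writes data with set; v3 uses put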
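The two differences can be put side by side with a small helper that prints the write command for either API version. It is a sketch in bash that only builds the command line as text and does not contact etcd; etcdctl_write_cmd is a hypothetical name, and the http scheme for v2 is taken from this article's setup.

```shell
#!/usr/bin/env bash
# etcdctl_write_cmd API_VERSION HOST:PORT KEY VALUE
# Prints the etcdctl command line for writing KEY=VALUE with the
# given API version (hypothetical helper; nothing is executed).
etcdctl_write_cmd() {
  local api=$1 endpoint=$2 key=$3 value=$4
  case "$api" in
    2) echo "ETCDCTL_API=2 etcdctl --endpoints=http://${endpoint} set ${key} ${value}" ;;
    3) echo "ETCDCTL_API=3 etcdctl --endpoints=${endpoint} put ${key} ${value}" ;;
    *) echo "unsupported API version: $api" >&2; return 1 ;;
  esac
}

etcdctl_write_cmd 2 192.168.66.40:2379 foo bar
etcdctl_write_cmd 3 192.168.66.40:2379 foo bar
```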
When everything works, node1 creates a network interface named flannel.1
References:
https://github.com/coreos/flannel/issues/1191
https://github.com/coreos/flannel/issues/554
https://github.com/coreos/flannel/issues/755
https://bugzilla.redhat.com/show_bug.cgi?id=1498096
Summary of the problems
The three problems above were essentially all caused by version mismatches. The fixes are:
1. Enable v2 compatibility when starting etcd
etcd --enable-v2 -listen-client-urls http://192.168.66.40:2379 -advertise-client-urls http://192.168.66.40:2379
2. Write the data to etcd through the v2 API
ETCDCTL_API=2 etcdctl --endpoints=http://192.168.66.40:2379 set /docker-test/network/config < flannel-config.json
3. Explicitly specify http or https in the endpoints when starting flannel
flanneld -etcd-endpoints=http://192.168.66.40:2379 -iface=ens38 -etcd-prefix=/docker-test/network
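Also note that the -etcd-prefix value is one path segment shorter than the key written to etcd: flanneld reads its configuration from the config key under that prefix, so the prefix /docker-test/network corresponds to the key /docker-test/network/config.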