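Introduction
Swarm is the cluster management tool provided officially by Docker. It abstracts a number of Docker hosts into a single unit and manages the Docker resources on all of those hosts through one entry point.
Swarm is similar to Kubernetes, but it is lighter-weight and offers fewer features.
Deployment
Prerequisites
Docker installation: see the setup guide 【docker in centos7.x】
Firewall configuration: see the notes 【firewalld in CentOS7.x】
Network configuration
Each node needs the following ports open:
- 2377/tcp: secure communication between clients and the Swarm (cluster management traffic).
- 7946/tcp and 7946/udp: control-plane gossip between nodes.
- 4789/udp: VXLAN-based overlay networks.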
[root@docker02 ~]# firewall-cmd --zone=public --add-port=2377/tcp --add-port=7946/tcp --add-port=7946/udp --add-port=4789/udp --permanent
success
[root@docker02 ~]# firewall-cmd --reload
success
[root@docker02 ~]# firewall-cmd --list-ports
2377/tcp 7946/tcp 7946/udp 4789/udp
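Typing the port list by hand is where mistakes creep in (a transposed digit, or tcp where udp is needed). The sketch below builds the firewall-cmd arguments from a single list and only prints the resulting command for review. The variable SWARM_PORTS and the helper build_port_args are hypothetical names introduced here, not part of firewalld or Docker.

```shell
#!/bin/sh
# Ports from the list above: 7946 needs both TCP and UDP, 4789 is UDP only.
SWARM_PORTS="2377/tcp 7946/tcp 7946/udp 4789/udp"

# Hypothetical helper: turn the port list into --add-port arguments.
build_port_args() {
    args=""
    for port in $SWARM_PORTS; do
        args="$args --add-port=$port"
    done
    # Strip the leading space left by the first iteration.
    printf '%s\n' "${args# }"
}

# Print the command instead of executing it, so it can be checked first.
echo "firewall-cmd --zone=public $(build_port_args) --permanent"
```

Once the printed command looks right, it can be run on each node, followed by firewall-cmd --reload as in the transcript above.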
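Setup procedure
Setting up a Swarm is sometimes called initializing a Swarm. The overall flow is:
Initialize the first manager node ->
Join additional manager nodes ->
Join the remaining worker nodes -> Done
Initializing the first manager node
# --advertise-addr specifies the IP and port that other nodes use to connect to this manager. It is optional; when the node has several IPs, it selects which one to use. It can also name an IP that is not on the node itself, such as a load balancer IP.
# --listen-addr specifies the IP and port that carry Swarm traffic. It usually matches --advertise-addr, but on a node with several IPs it can pin one specific IP. It is also required when --advertise-addr is set to a remote IP address (such as a load balancer IP).
# It is recommended to always pass both options with an explicit IP and port.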
[root@docker01 ~]# docker swarm init --advertise-addr 192.168.1.130:2377 --listen-addr 192.168.1.130:2377
Swarm initialized: current node (zkdi8xfgm54ldw359s0skadul) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-5d67oz1ga0dddo6innd859gba4ba358a6letwf30hlszk8mnx8-1trvysjx5se6gk13qkjk2qwf3 192.168.1.130:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
# Show the token for joining the swarm as a worker node
[root@docker01 ~]# docker swarm join-token worker
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-5d67oz1ga0dddo6innd859gba4ba358a6letwf30hlszk8mnx8-1trvysjx5se6gk13qkjk2qwf3 192.168.1.130:2377
# Show the token for joining the swarm as a manager node
[root@docker01 ~]# docker swarm join-token manager
To add a manager to this swarm, run the following command:
docker swarm join --token SWMTKN-1-5d67oz1ga0dddo6innd859gba4ba358a6letwf30hlszk8mnx8-9adk2df3arm75avkjnxg8om0j 192.168.1.130:2377
# List the nodes in the current swarm cluster
[root@docker01 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
zkdi8xfgm54ldw359s0skadul * docker01.zangh Ready Active Leader 20.10.16
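- By default the first node becomes a manager node and is elected Leader.
- The join commands for worker nodes and manager nodes use different join tokens (SWMTKN…). Whether a node joins as a worker or as a manager therefore depends entirely on which token is used. Keep join tokens safe: a token is all that is needed to join a node to the Swarm!
- Manager nodes reach consensus using the Raft algorithm.
Joining additional manager nodes
# Join using the manager token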
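The two tokens shown above can be compared field by field. The sketch below splits each token on "-" and checks which parts match. Reading the third field as a cluster-wide part and the fourth as a role-specific secret is an assumption inferred from how the two tokens differ; token_field is a hypothetical helper, not a Docker command.

```shell
#!/bin/sh
# Hypothetical helper: print one "-"-separated field of a join token.
# Assumed layout: SWMTKN-1-<cluster part>-<role secret>
token_field() {
    # $1 = token, $2 = field index (3 = cluster part, 4 = role secret)
    printf '%s\n' "$1" | cut -d- -f"$2"
}

# Tokens copied from the join-token output above.
worker="SWMTKN-1-5d67oz1ga0dddo6innd859gba4ba358a6letwf30hlszk8mnx8-1trvysjx5se6gk13qkjk2qwf3"
manager="SWMTKN-1-5d67oz1ga0dddo6innd859gba4ba358a6letwf30hlszk8mnx8-9adk2df3arm75avkjnxg8om0j"

if [ "$(token_field "$worker" 3)" = "$(token_field "$manager" 3)" ]; then
    echo "cluster part: identical"
fi
if [ "$(token_field "$worker" 4)" != "$(token_field "$manager" 4)" ]; then
    echo "role secret: different"
fi
```

If a token is ever exposed, docker swarm join-token --rotate worker (or manager) issues a new one; nodes that have already joined are not affected.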
[root@docker03 ~]# docker swarm join --token SWMTKN-1-5d67oz1ga0dddo6innd859gba4ba358a6letwf30hlszk8mnx8-9adk2df3arm75avkjnxg8om0j 192.168.1.130:2377
This node joined a swarm as a manager.
# Check node status; * marks the current node
[root@docker01 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
zkdi8xfgm54ldw359s0skadul * docker01.zangh Ready Active Leader 20.10.16
2tmz78uuijz2nql2gq1i69gwf docker02.zangh Ready Active 20.10.16
[root@docker03 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
zkdi8xfgm54ldw359s0skadul docker01.zangh Ready Active Leader 20.10.16
ni8okp2di7irsft7urq5m8al7 * docker03.zangh Ready Active Reachable 20.10.16
Joining the remaining worker nodes
# Join using the worker token
[root@docker02 ~]# docker swarm join --token SWMTKN-1-5d67oz1ga0dddo6innd859gba4ba358a6letwf30hlszk8mnx8-1trvysjx5se6gk13qkjk2qwf3 192.168.1.130:2377
This node joined a swarm as a worker.
# Check node status; * marks the current node
[root@docker01 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
zkdi8xfgm54ldw359s0skadul * docker01.zangh Ready Active Leader 20.10.16
2tmz78uuijz2nql2gq1i69gwf docker02.zangh Ready Active 20.10.16
ni8okp2di7irsft7urq5m8al7 docker03.zangh Ready Active Reachable 20.10.16
Tearing down the cluster
[root@docker02 ~]# docker swarm leave -f
Node left the swarm.
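After a node leaves, its entry usually stays in docker node ls on the managers with status Down until it is removed. The sketch below prints (without running anything) one possible removal order for a node. The function plan_node_removal is a hypothetical helper; demoting a manager before it leaves is the commonly recommended order, stated here as an assumption rather than a step from the notes above.

```shell
#!/bin/sh
# Hypothetical helper: print a removal plan for one node, nothing is executed.
# $1 = role (worker|manager), $2 = node hostname
plan_node_removal() {
    role=$1
    node=$2
    if [ "$role" = "manager" ]; then
        # Demote first so the Raft member set shrinks cleanly.
        echo "on a remaining manager: docker node demote $node"
    fi
    echo "on $node: docker swarm leave"
    # Clear the leftover Down entry from the node list.
    echo "on a remaining manager: docker node rm $node"
}

plan_node_removal worker docker02.zangh
plan_node_removal manager docker03.zangh
```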
Usage
Multi-node cluster failure simulation
- Cluster configuration: 3 manager nodes + 6 worker nodes, normal state:
[root@docker-compose02 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
m5a28kskbredhcx2uoesk1gy8 docker-compose01.zangh Ready Active Leader 20.10.9
mo51soa7ekhmms82ksrrgblpd * docker-compose02.zangh Ready Active Reachable 20.10.9
rminqzat73c93gk8us7288eoz docker-compose03.zangh Ready Active Reachable 20.10.9
m87j62tqn6qqc06jssvrd5801 docker-swarm01.zangh Ready Active 20.10.9
ddwo2dht2wfsx2mt464wrtxyf docker-swarm02.zangh Ready Active 20.10.9
m178u1m5jlqmcht0ta9619jr3 docker-swarm03.zangh Ready Active 20.10.9
1yya7a2sr2azu7k2h0d255l74 docker-swarm04.zangh Ready Active 20.10.9
kxtnj674bzzqob8w55hzo43xl docker-swarm05.zangh Ready Active 20.10.9
w3umwvti24sms7aixxxqm9qgp docker-swarm06.zangh Ready Active 20.10.9
Manager node failure simulation
- Cluster configuration: 3 manager nodes + 6 worker nodes, degraded state (1 manager node offline):
[root@docker-compose02 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
m5a28kskbredhcx2uoesk1gy8 docker-compose01.zangh Unknown Active Unreachable 20.10.9
mo51soa7ekhmms82ksrrgblpd * docker-compose02.zangh Ready Active Leader 20.10.9
rminqzat73c93gk8us7288eoz docker-compose03.zangh Ready Active Reachable 20.10.9
m87j62tqn6qqc06jssvrd5801 docker-swarm01.zangh Unknown Active 20.10.9
ddwo2dht2wfsx2mt464wrtxyf docker-swarm02.zangh Ready Active 20.10.9
m178u1m5jlqmcht0ta9619jr3 docker-swarm03.zangh Ready Active 20.10.9
1yya7a2sr2azu7k2h0d255l74 docker-swarm04.zangh Ready Active 20.10.9
kxtnj674bzzqob8w55hzo43xl docker-swarm05.zangh Ready Active 20.10.9
w3umwvti24sms7aixxxqm9qgp docker-swarm06.zangh Unknown Active 20.10.9
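- Cluster configuration: 3 manager nodes + 6 worker nodes, degraded state (2 manager nodes offline):
- Raft needs a majority of managers to elect a leader. With only 1 of 3 managers online, that majority is lost, so the cluster cannot be managed until enough managers are back and a leader is elected again.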
[root@docker-compose03 ~]# docker node ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
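The "more than half of the managers" condition in the error above is plain integer arithmetic. The sketch below works it out for a few manager counts; quorum, tolerated_failures and has_quorum are hypothetical helper names used only for illustration.

```shell
#!/bin/sh
# Smallest number of managers that is more than half of $1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# How many managers may fail while a majority still remains.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# Succeeds if $2 online managers out of $1 total still form a majority.
has_quorum() {
    [ "$2" -ge "$(quorum "$1")" ]
}

for n in 1 3 5 7; do
    echo "managers=$n quorum=$(quorum "$n") tolerated=$(tolerated_failures "$n")"
done

# The case simulated above: 3 managers, only 1 still online.
if has_quorum 3 1; then echo "3 managers, 1 online: leader possible"; else echo "3 managers, 1 online: no leader"; fi
```

For the 3-manager cluster in these notes the majority is 2, which matches the two simulations: losing one manager leaves the cluster manageable, losing two does not.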
- Cluster configuration: 3 manager nodes + 6 worker nodes, state after recovery:
[root@docker-compose03 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
m5a28kskbredhcx2uoesk1gy8 docker-compose01.zangh Ready Active Reachable 20.10.9
mo51soa7ekhmms82ksrrgblpd docker-compose02.zangh Ready Active Reachable 20.10.9
rminqzat73c93gk8us7288eoz * docker-compose03.zangh Ready Active Leader 20.10.9
m87j62tqn6qqc06jssvrd5801 docker-swarm01.zangh Ready Active 20.10.9
ddwo2dht2wfsx2mt464wrtxyf docker-swarm02.zangh Ready Active 20.10.9
m178u1m5jlqmcht0ta9619jr3 docker-swarm03.zangh Ready Active 20.10.9
1yya7a2sr2azu7k2h0d255l74 docker-swarm04.zangh Ready Active 20.10.9
kxtnj674bzzqob8w55hzo43xl docker-swarm05.zangh Ready Active 20.10.9
w3umwvti24sms7aixxxqm9qgp docker-swarm06.zangh Ready Active 20.10.9
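Worker node failure simulation
- Cluster configuration: 3 manager nodes + 6 worker nodes, degraded state (1 worker node offline):
- Because of the heartbeat interval between managers and workers, a manager does not notice immediately that a worker has gone offline; the cluster keeps serving requests in the meantime.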
[root@docker-compose03 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
m5a28kskbredhcx2uoesk1gy8 docker-compose01.zangh Ready Active Reachable 20.10.9
mo51soa7ekhmms82ksrrgblpd docker-compose02.zangh Ready Active Reachable 20.10.9
rminqzat73c93gk8us7288eoz * docker-compose03.zangh Ready Active Leader 20.10.9
m87j62tqn6qqc06jssvrd5801 docker-swarm01.zangh Down Active 20.10.9
ddwo2dht2wfsx2mt464wrtxyf docker-swarm02.zangh Ready Active 20.10.9
m178u1m5jlqmcht0ta9619jr3 docker-swarm03.zangh Ready Active 20.10.9
1yya7a2sr2azu7k2h0d255l74 docker-swarm04.zangh Ready Active 20.10.9
kxtnj674bzzqob8w55hzo43xl docker-swarm05.zangh Ready Active 20.10.9
w3umwvti24sms7aixxxqm9qgp docker-swarm06.zangh Ready Active 20.10.9
- Cluster configuration: 3 manager nodes + 6 worker nodes, degraded state (2 worker nodes offline):
[root@docker-compose01 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
m5a28kskbredhcx2uoesk1gy8 * docker-compose01.zangh Ready Active Reachable 20.10.9
mo51soa7ekhmms82ksrrgblpd docker-compose02.zangh Ready Active Reachable 20.10.9
rminqzat73c93gk8us7288eoz docker-compose03.zangh Ready Active Leader 20.10.9
m87j62tqn6qqc06jssvrd5801 docker-swarm01.zangh Down Active 20.10.9
ddwo2dht2wfsx2mt464wrtxyf docker-swarm02.zangh Down Active 20.10.9
m178u1m5jlqmcht0ta9619jr3 docker-swarm03.zangh Ready Active 20.10.9
1yya7a2sr2azu7k2h0d255l74 docker-swarm04.zangh Ready Active 20.10.9
kxtnj674bzzqob8w55hzo43xl docker-swarm05.zangh Ready Active 20.10.9
w3umwvti24sms7aixxxqm9qgp docker-swarm06.zangh Ready Active 20.10.9
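- Cluster configuration: 3 manager nodes + 6 worker nodes, degraded state (all worker nodes offline):
- The cluster is still usable, because manager nodes can also take on the work of worker nodes.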
[root@docker-compose01 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
m5a28kskbredhcx2uoesk1gy8 * docker-compose01.zangh Ready Active Reachable 20.10.9
mo51soa7ekhmms82ksrrgblpd docker-compose02.zangh Ready Active Reachable 20.10.9
rminqzat73c93gk8us7288eoz docker-compose03.zangh Ready Active Leader 20.10.9
m87j62tqn6qqc06jssvrd5801 docker-swarm01.zangh Down Active 20.10.9
ddwo2dht2wfsx2mt464wrtxyf docker-swarm02.zangh Down Active 20.10.9
m178u1m5jlqmcht0ta9619jr3 docker-swarm03.zangh Down Active 20.10.9
1yya7a2sr2azu7k2h0d255l74 docker-swarm04.zangh Down Active 20.10.9
kxtnj674bzzqob8w55hzo43xl docker-swarm05.zangh Down Active 20.10.9
w3umwvti24sms7aixxxqm9qgp docker-swarm06.zangh Down Active 20.10.9
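Reading the STATUS column by eye gets tedious during simulations like the ones above. The sketch below filters "hostname status" lines and prints every node that is not Ready. The helper unready_nodes is hypothetical, and the sample input is a shortened copy of the two-workers-down listing; on a live manager the same two columns could come from docker node ls --format '{{.Hostname}} {{.Status}}'.

```shell
#!/bin/sh
# Hypothetical helper: read "hostname status" lines on stdin and
# print the hostname of every node whose status is not Ready.
unready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Sample input copied from the listing with two workers offline.
sample="docker-compose01.zangh Ready
docker-compose02.zangh Ready
docker-compose03.zangh Ready
docker-swarm01.zangh Down
docker-swarm02.zangh Down
docker-swarm03.zangh Ready"

printf '%s\n' "$sample" | unready_nodes
```

Nodes shown as Unknown, as in the manager failure listing, are reported too, since the filter matches anything other than Ready.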