I. Understanding Docker Swarm
- Swarm is an orchestration tool built into Docker itself: a simple orchestrator developed in-house by Docker, Inc.
- Nodes in a swarm cluster have two roles: manager and worker. For high availability there are normally at least two managers (an odd number such as three is recommended; see the sketch after this list)
- The managers keep their state in sync through a Raft-based distributed store
- Workers are the nodes that primarily run the containers
- Worker nodes exchange information with each other over a Gossip network
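For example, once more nodes have joined (shown below), a worker can be promoted so the cluster reaches the recommended odd number of managers; a minimal sketch, assuming a node named swarm-work1 already exists:
# Promote a worker to manager for HA (run on an existing manager)
docker node promote swarm-work1
# Demote it back to a worker if it is no longer needed as a manager
docker node demote swarm-work1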
1. Creating a three-node swarm cluster
# View the help for swarm init
[root@localhost voting]# docker swarm init --help
Usage: docker swarm init [OPTIONS]
Initialize a swarm
Options:
--advertise-addr string Advertised address (format: <ip|interface>[:port])
--autolock Enable manager autolocking (requiring an unlock key to start a
stopped manager)
--availability string Availability of the node ("active"|"pause"|"drain") (default
"active")
--cert-expiry duration Validity period for node certificates (ns|us|ms|s|m|h) (default
2160h0m0s)
--data-path-addr string Address or interface to use for data path traffic (format:
<ip|interface>)
--data-path-port uint32 Port number to use for data path traffic (1024 - 49151). If no
value is set or is set to 0, the default port (4789) is used.
--default-addr-pool ipNetSlice default address pool in CIDR format (default [])
--default-addr-pool-mask-length uint32 default address pool subnet mask length (default 24)
--dispatcher-heartbeat duration Dispatcher heartbeat period (ns|us|ms|s|m|h) (default 5s)
--external-ca external-ca Specifications of one or more certificate signing endpoints
--force-new-cluster Force create a new cluster from current state
--listen-addr node-addr Listen address (format: <ip|interface>[:port]) (default 0.0.0.0:2377)
--max-snapshots uint Number of additional Raft snapshots to retain
--snapshot-interval uint Number of log entries between Raft snapshots (default 10000)
--task-history-limit int Task history retention limit (default 5)
Initialize the first node (note: the first node initialized becomes a manager)
# Running the command prints the following instructions, much like kubeadm
[root@localhost ~]# docker swarm init --advertise-addr 192.168.10.70
Swarm initialized: current node (i88ecofl4tnoo3ndp7ob8m2bx) is now a manager.
To add a worker to this swarm, run the following command:
# Run the command below on the other nodes to join them to the swarm as workers
docker swarm join --token SWMTKN-1-2nlhgt9g6b0hnk5hhq6ofxteh1ktu24gu4ha7v03ok66i6z9jd-9o603r396hw3yzome71b0ydgy 192.168.10.70:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Run the join command above on the worker nodes
[root@localhost ~]# docker swarm join --token SWMTKN-1-2nlhgt9g6b0hnk5hhq6ofxteh1ktu24gu4ha7v03ok66i6z9jd-9o603r396hw3yzome71b0ydgy 192.168.10.70:2377
# The node reports that it joined the swarm as a worker
This node joined a swarm as a worker.
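If the join command from the init output gets lost, it can be reprinted at any time on a manager:
# Reprint the worker join command (run on a manager)
docker swarm join-token worker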
On the manager node you can check the status of the cluster nodes
# The cluster now contains three nodes
[root@localhost ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
i88ecofl4tnoo3ndp7ob8m2bx * localhost Ready Active Leader 20.10.6
iu98ufck2e8ftogiffy5f91um localhost.localdomain Ready Active 20.10.6
jang0tvfjrl5c11u54y63o6tc localhost.localdomain Ready Active 20.10.6
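All three nodes above still report default hostnames, which makes them hard to tell apart; one way to get the distinct names used later (swarm-work1, swarm-work2) is to rename each host, a sketch:
# On each node, set a distinct hostname, for example:
hostnamectl set-hostname swarm-work1
# The docker daemon may need a restart before the new name shows up in 'docker node ls'
systemctl restart docker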
2. Basic docker swarm operations
- Within a swarm cluster, a service plays the role that a container plays on a single host
- docker service create is the counterpart of docker run, except that docker run creates a container on the local host while docker service create creates it somewhere in the swarm cluster
- A service can be scaled horizontally
Demonstration:
# Create a service
[root@localhost ~]# docker service create --name demo -d busybox /bin/sh -c "while true;do sleep 4000;done"
lfk2kmdnod84j9i1xlr7g9qjp
# List the services in the cluster
[root@localhost ~]# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
lfk2kmdnod84 demo replicated 1/1 busybox:latest
# Check the service's tasks (note: unlike docker ps, docker service ps must be given the service name)
[root@localhost ~]# docker service ps demo
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
l4qkr3bpk551 demo.1 busybox:latest localhost.localdomain Running Running about a minute ago
# Scale the service horizontally; here the demo service is scaled to 3 replicas
[root@localhost ~]# docker service scale demo=3
demo scaled to 3
overall progress: 3 out of 3 tasks
1/3: running [==================================================>]
2/3: running [==================================================>]
3/3: running [==================================================>]
verify: Service converged
# List again; the replica count is now 3
[root@localhost ~]# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
lfk2kmdnod84 demo replicated 3/3 busybox:latest
# Check the details; the three containers are scheduled across different nodes
[root@localhost ~]# docker service ps demo --filter "desired-state=running"
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
l4qkr3bpk551 demo.1 busybox:latest swarm-work1 Running Running 20 minutes ago
ojzd9e5fuuzl demo.2 busybox:latest localhost Running Running 15 minutes ago
kvnl9oa1rn3n demo.3 busybox:latest localhost Running Running 13 minutes ago
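The same scaling can also be expressed through docker service update:
# Equivalent to 'docker service scale demo=3'
docker service update --replicas 3 demo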
The swarm cluster has a built-in reconciliation mechanism: when a service's replica count falls below the desired number, the missing tasks are recreated automatically
# Check the replica count
[root@localhost ~]# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
lfk2kmdnod84 demo replicated 3/3 busybox:latest
# On swarm-work1, force-remove one container to test whether it recovers automatically
[root@swarm-work1 ~]# docker rm -f 63fcfafa62b9
# Immediately after deleting it on swarm-work1, check on the manager; one replica is missing
[root@localhost ~]# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
lfk2kmdnod84 demo replicated 2/3 busybox:latest
# Check again a few seconds later; the deleted container has been recreated
[root@localhost ~]# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
lfk2kmdnod84 demo replicated 3/3 busybox:latest
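The replaced task stays visible in the service's task history, so the failure can be confirmed after the fact (task IDs will differ):
# Without a desired-state filter, the shut-down task is listed alongside its replacement
docker service ps demo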
Delete a service: docker service rm <service-name>
# Remove the demo service
[root@localhost ~]# docker service rm demo
# Check again; the service no longer exists
[root@localhost ~]# docker service ps demo
no such service: demo
3. Deploying WordPress in the swarm cluster
- Because containers on different machines must communicate, an overlay network is required
# Create an overlay network
[root@localhost ~]# docker network create -d overlay demo
p9kbg79gpwkvp1un79eogvo0y
# Check the manager node's networks; the demo overlay network has been created
[root@localhost ~]# docker network ls
NETWORK ID NAME DRIVER SCOPE
451c3b1f0dce bridge bridge local
p9kbg79gpwkv demo overlay swarm
5adcb662ede3 docker_gwbridge bridge local
ef2dd6544dee host host local
5m0vr63uorwl ingress overlay swarm
0051745c4534 none null local
# Check the two worker nodes; the demo network has not been synced to them yet
[root@localhost ~]# sshpass -f .passwd.txt ssh 192.168.10.20 "docker network ls"
NETWORK ID NAME DRIVER SCOPE
58a3d0b1fb40 bridge bridge local
ba35cba6afaf docker_gwbridge bridge local
450b1d5510f0 host host local
5m0vr63uorwl ingress overlay swarm
5d7c09a3f93b none null local
[root@localhost ~]# sshpass -f .passwd.txt ssh 192.168.10.30 "docker network ls"
NETWORK ID NAME DRIVER SCOPE
ce9f019a4a3a bridge bridge local
b977f4a38389 docker_gwbridge bridge local
71e5f6103085 host host local
5m0vr63uorwl ingress overlay swarm
7dc75ea6de05 none null local
Create two services, starting with the mysql service.
--mount type=volume,source=mysql-data,destination=/var/lib/mysql
- --mount: the equivalent of -v
- type=volume: the mount type is a volume
- source=mysql-data: the name of the volume
- destination=/var/lib/mysql: the target directory inside the container
[root@localhost ~]# docker service create --name mysql -d --network demo -e MYSQL_ROOT_PASSWORD=123456 -e MYSQL_DATABASE=wordpress --mount type=volume,source=mysql-data,destination=/var/lib/mysql mysql
# Check the mysql service; its task is running on the manager
[root@localhost ~]# docker service ps mysql
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
66auhvk03hl4 mysql.1 mysql:latest localhost Running Running 58 seconds ago
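As a sanity check, the named volume from --mount should now exist on the node that runs the task; a quick look, assuming the volume name mysql-data from above:
# On the node running the mysql task
docker volume ls
docker volume inspect mysql-data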
Next, create WordPress.
# Create the wordpress service
[root@localhost ~]# docker service create --name wordpress -d --network demo -p 80:80 --env WORDPRESS_DB_HOST=mysql:3306 --env WORDPRESS_DB_USER=root --env WORDPRESS_DB_PASSWORD=123456 wordpress
# Check which node wordpress is running on; it was scheduled to swarm-work1
[root@localhost ~]# docker service ps wordpress
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
oxcqgg0h8851 wordpress.1 wordpress:latest swarm-work1 Running Preparing 17 seconds ago
# On swarm-work1 itself, the wordpress container is running
[root@swarm-work1 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2e55903404da wordpress:latest "docker-entrypoint.s…" 28 seconds ago Up 27 seconds 80/tcp wordpress.1.rzywcpd3li0hhhv2p9jdzp34i
Browsing to swarm-work1's IP works: the site loads.
Accessing the manager's IP and swarm-work2's IP also works.
Now if we check the networks on swarm-work1 and swarm-work2 again, the demo network has appeared on swarm-work1, the node the service was scheduled to, while swarm-work2, which received no task, still does not have it
[root@localhost ~]# sshpass -f .passwd.txt ssh 192.168.10.20 "docker network ls"
NETWORK ID NAME DRIVER SCOPE
ca3a3bd1b7c8 bridge bridge local
ba35cba6afaf docker_gwbridge bridge local
450b1d5510f0 host host local
5m0vr63uorwl ingress overlay swarm
5d7c09a3f93b none null local
[root@localhost ~]# sshpass -f .passwd.txt ssh 192.168.10.30 "docker network ls"
NETWORK ID NAME DRIVER SCOPE
a9d9db94cade bridge bridge local
p9kbg79gpwkv demo overlay swarm
b977f4a38389 docker_gwbridge bridge local
71e5f6103085 host host local
5m0vr63uorwl ingress overlay swarm
7dc75ea6de05 none null local
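The same two services can also be described declaratively and launched with docker stack deploy; a minimal, untested sketch (the file name stack.yml and the stack name wp are assumptions, not from the demo above):
# Write a compose v3 file covering both services, then deploy it as a stack
cat > stack.yml <<'EOF'
version: "3.8"
services:
  mysql:
    image: mysql
    environment:
      MYSQL_ROOT_PASSWORD: "123456"
      MYSQL_DATABASE: wordpress
    volumes:
      - mysql-data:/var/lib/mysql
    networks:
      - demo
  wordpress:
    image: wordpress
    ports:
      - "80:80"
    environment:
      WORDPRESS_DB_HOST: "mysql:3306"
      WORDPRESS_DB_USER: root
      WORDPRESS_DB_PASSWORD: "123456"
    networks:
      - demo
volumes:
  mysql-data:
networks:
  demo:
    driver: overlay
EOF
# Deploy; service and network names get a 'wp_' prefix
docker stack deploy -c stack.yml wp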
II. Cluster Service Communication: the Routing Mesh
- Swarm has built-in service discovery: when a service is created on an overlay network, swarm automatically allocates a virtual IP (VIP) and binds it to the service, so communication keeps working no matter how the IPs of the service's containers change. The VIP forwards traffic to the containers' real IPs using LVS.
# Create a whoami service; the container answers HTTP requests with its hostname
[root@localhost ~]# docker service create --name whoami -p 8000:8000 --network demo -d jwilder/whoami
[root@localhost ~]# docker service ps whoami
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
nam97zd7o28h whoami.1 jwilder/whoami:latest swarm-work2 Running Running 39 seconds ago
# Access the whoami service
[root@localhost ~]# curl 192.168.10.20:8000
I'm c7d008059b94
----------------------------------------------------------
# Run a busybox container as a client
[root@localhost ~]# docker service create --name client -d --network demo busybox sh -c "while true;do sleep 3000;done"
# Check the service's task; it is on swarm-work2
[root@localhost ~]# docker service ps client --filter "desired-state=ready"
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
2tpreifgg2gp client.1 busybox:latest swarm-work2 Ready Ready less than a second ago
# Exec into the client container and ping whoami; it answers, and the target address is 10.0.1.14
[root@localhost ~]# docker exec -it 5f sh
/ # ping whoami
PING whoami (10.0.1.14): 56 data bytes
64 bytes from 10.0.1.14: seq=0 ttl=64 time=0.147 ms
64 bytes from 10.0.1.14: seq=1 ttl=64 time=0.074 ms
64 bytes from 10.0.1.14: seq=2 ttl=64 time=0.126 ms
# Check whoami's address: 10.0.1.14 is not the container's own address but the virtual IP swarm bound to the service
[root@swarm-work2 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c7d008059b94 jwilder/whoami:latest "/app/http" 14 minutes ago Up 14 minutes 8000/tcp whoami.1.nam97zd7o28huqpn2rjnqll22
[root@swarm-work2 ~]# docker exec -it c7 ip a
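To confirm that 10.0.1.14 is the VIP swarm allocated for the service rather than any container address, the service endpoint can also be inspected on the manager:
# Print the virtual IPs bound to the whoami service
docker service inspect --format '{{json .Endpoint.VirtualIPs}}' whoami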
# Extend the experiment: scale whoami up and see whether the pinged address changes
[root@localhost ~]# docker service scale whoami=3
# Exec into the client container and ping again; the target address is unchanged
[root@localhost ~]# docker exec -it 5f sh
/ # ping whoami
PING whoami (10.0.1.14): 56 data bytes
64 bytes from 10.0.1.14: seq=0 ttl=64 time=0.217 ms
64 bytes from 10.0.1.14: seq=1 ttl=64 time=0.071 ms
64 bytes from 10.0.1.14: seq=2 ttl=64 time=0.076 ms
# nslookup tasks.whoami resolves the real addresses of the whoami tasks
/ # nslookup tasks.whoami
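If you would rather have DNS resolve directly to the task IPs instead of a single VIP, a service can be created in DNS round-robin mode; a sketch (the service name whoami-dnsrr is made up for illustration):
# DNSRR mode: the service name resolves to the task IPs and no VIP is allocated
docker service create --name whoami-dnsrr --endpoint-mode dnsrr --network demo -d jwilder/whoami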
- Internal: container-to-container traffic goes over the overlay network (implemented with a VIP, a virtual IP)
- Ingress: if a service publishes a port, the service can be reached through that port on any swarm node
1. Internal Load Balancing
2. Ingress Network
- What the ingress network gives you: for example, when a web service container runs on node1, ingress lets you reach it by accessing node2 or any other node; ingress performs an IP-plus-port forward
- Ingress is implemented with IPVS. The forwarding rules can be seen in the iptables nat table
# Inspect the iptables rules: any traffic arriving on port 8000 is DNATed to 172.30.0.2:8000
[root@localhost ~]# iptables -t nat -L -v
……(output truncated)
Chain DOCKER-INGRESS (2 references)
pkts bytes target prot opt in out source destination
2 104 DNAT tcp -- any any anywhere anywhere tcp dpt:irdmi to:172.30.0.2:8000
6 312 DNAT tcp -- any any anywhere anywhere tcp dpt:http to:172.30.0.2:80
655 40130 RETURN all -- any any anywhere anywhere
# Now find out what kind of address 172.30.0.2 is
# 1. brctl show reveals four interfaces attached to the docker_gwbridge bridge
[root@localhost ~]# brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02426f275910 no
docker_gwbridge 8000.0242c9d88f33 no veth4a9c734
veth4af5146
vethbb93b94
vethff15d0c
virbr0 8000.525400893c89 yes virbr0-nic
# 2. Inspecting docker_gwbridge in detail shows that the ingress-sbox endpoint holds exactly 172.30.0.2, so ingress traffic is in fact forwarded onto the docker_gwbridge network
[root@localhost ~]# docker network inspect docker_gwbridge
……(output truncated)
"Containers": {
"18711b958fff1b824b6278c80cbb0f7bc30c131420c99e46d894f5e9cf4427f0": {
"Name": "gateway_9faea5e8a179",
"EndpointID": "4e04a4c2e83d3caefe6095735ed8bee58484be2bbf3398ae7cdd421aea4b3cbd",
"MacAddress": "02:42:ac:1e:00:05",
"IPv4Address": "172.30.0.5/16",
"IPv6Address": ""
},
"5fce69392cbd8247e593e028582fc753287ffa734fd54af314e775fb6abf1516": {
"Name": "gateway_1fa8b3ba7c37",
"EndpointID": "b11f0f4979488be6aa1eeff2e9b5b4eca2f55a8bc0cca0cda060a1023d90db82",
"MacAddress": "02:42:ac:1e:00:04",
"IPv4Address": "172.30.0.4/16",
"IPv6Address": ""
},
"9b6454d949244e9ce19304bbf6d40f86e6f5cdc58ddbc2133a32bd3098a6f5a5": {
"Name": "gateway_77bf3eda0db8",
"EndpointID": "4949a7aaeb13e8aaade37461290b448e4557e0c57a0490db0616177b650a41d7",
"MacAddress": "02:42:ac:1e:00:03",
"IPv4Address": "172.30.0.3/16",
"IPv6Address": ""
},
"ingress-sbox": {
"Name": "gateway_ingress-sbox",
"EndpointID": "b7eb71949d02f8251f6ca06b15a33646b310dc47228c3d7c1cfcd4e3859b3e70",
"MacAddress": "02:42:ac:1e:00:02",
"IPv4Address": "172.30.0.2/16",
"IPv6Address": ""
}
……
Network diagram (figure omitted).
# Docker's network namespaces, including the ingress ones, can be listed under /var/run/docker/netns/
[root@localhost ~]# ls /var/run/docker/netns/
1-5m0vr63uor 1fa8b3ba7c37 1-p9kbg79gpw 7745f2003e1d 77bf3eda0db8 9faea5e8a179 ingress_sbox lb_p9kbg79gp
# Enter the ingress_sbox network namespace
[root@localhost ~]# nsenter --net=/var/run/docker/netns/ingress_sbox
# Check the addresses inside ingress_sbox
[root@localhost ~]# ip a
……(output truncated)
119: eth1@if120: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:1e:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 172.30.0.2/16 brd 172.30.255.255 scope global eth1
valid_lft forever preferred_lft forever
# In the ingress_sbox network namespace, inspect iptables: the mangle table marks packets by published port (fwmark 0x103 for 80, 0x104 for 8000) for IPVS load balancing
[root@localhost ~]# iptables -nL -v -t mangle
Chain PREROUTING (policy ACCEPT 33 packets, 2534 bytes)
pkts bytes target prot opt in out source destination
54 7780 MARK tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 MARK set 0x103
19 1598 MARK tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8000 MARK set 0x104
Chain INPUT (policy ACCEPT 19 packets, 1598 bytes)
pkts bytes target prot opt in out source destination
0 0 MARK all -- * * 0.0.0.0/0 10.0.0.5 MARK set 0x103
0 0 MARK all -- * * 0.0.0.0/0 10.0.0.7 MARK set 0x104
……(output truncated)
# ipvsadm -l also shows the IPVS load-balancing rules; rr means round-robin (FWM 259/260 are the 0x103/0x104 marks set above)
[root@localhost ~]# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM 259 rr
-> 10.0.0.6:0 Masq 1 0 0
FWM 260 rr
-> 10.0.0.8:0 Masq 1 0 0
-> 10.0.0.9:0 Masq 1 0 0
-> 10.0.0.10:0 Masq 1 0 0
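To watch the backend list grow, the service can be scaled again and the IPVS rules re-read from the host; a sketch:
# Scale up, then list the IPVS rules in the ingress namespace numerically
docker service scale whoami=4
nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -ln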