8. Communication Between Services
An application built on a microservices architecture consists of multiple services. For example, the web frontend runs httpd, memcached provides caching, and mysql stores the data. Each tier is a swarm service, and each service runs several containers. In such an architecture, services inevitably need to communicate with each other.
Service Discovery
One approach is to publish every service and access them all through the routing mesh. The obvious drawback is that memcached and mysql would then also be exposed to the outside world, which is a security risk.
If we do not publish them, swarm must provide a mechanism that can:
1. Let a service reach other services in a simple way.
2. Keep consuming services unaffected when the IP of a service replica changes.
3. Keep consuming services unaffected when the number of replicas of a service changes.
This is exactly service discovery. Docker Swarm provides it natively: through service discovery, a consumer of a service can communicate with it without knowing where it runs, what its IP is, or how many replicas it has.
Hands-on Deployment
1. Create an overlay network
To use service discovery, the services that need to communicate must be attached to the same overlay network, so we first create a new one (this only needs to be done on swarm-manager). Can we simply use the built-in ingress network? Unfortunately not: ingress currently does not provide service discovery, so we must create an overlay network of our own.
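A minimal sketch of creating the network; myapp_net is the name the deployment commands below attach to:

```shell
# Run on swarm-manager only; swarm propagates the network to a
# worker when a task that uses it is scheduled there.
docker network create --driver overlay myapp_net

# Confirm the network exists and uses the overlay driver
docker network ls --filter name=myapp_net
```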
2. Deploy services to the overlay network
Deploy a web service and attach it to the newly created overlay network.
[root@swarm-manager ~]# docker service create --name my_web --replicas=3 --network myapp_net httpd
ytsfzosj6c1jxom7scsltncrf
overall progress: 3 out of 3 tasks
1/3: running [==================================================>]
2/3: running [==================================================>]
3/3: running [==================================================>]
verify: Service converged
[root@swarm-manager ~]#
Deploy a util service for testing and attach it to the same overlay network.
[root@swarm-manager ~]# docker service create --name util --network myapp_net busybox:1.28.3 sleep 10000000
eiknvmsrrmhkdipxsy2d45hi8
overall progress: 1 out of 1 tasks
1/1: running
verify: Service converged
[root@swarm-manager ~]#
The `sleep 10000000` keeps the busybox container running so that we can enter it and access the service my_web. Note: the nslookup command in newer busybox images reports errors; use an older version such as 1.28.3.
3. Verify
Confirm with docker service ps util that util is running on swarm-worker1.
[root@swarm-manager ~]# docker service ps util
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
uzub4he2axfb util.1 busybox:1.28.3 swarm-worker1 Running Running 6 minutes ago
[root@swarm-manager ~]#
Log in to swarm-worker1 and ping the service my_web from the container util.1.
[root@swarm-worker1 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
42ce32a9c9f9 busybox:1.28.3 "sleep 10000000" 9 minutes ago Up 9 minutes util.1.uzub4he2axfbvnxaxuwpv9mqr
941a47d7b8f6 httpd:latest "httpd-foreground" 14 minutes ago Up 14 minutes 80/tcp my_web.2.sgu8fdgnovg7c1xqwukya8cqs
[root@swarm-worker1 ~]# docker exec util.1.uzub4he2axfbvnxaxuwpv9mqr ping -c 3 my_web
PING my_web (10.0.1.18): 56 data bytes
64 bytes from 10.0.1.18: seq=0 ttl=64 time=0.181 ms
64 bytes from 10.0.1.18: seq=1 ttl=64 time=0.063 ms
64 bytes from 10.0.1.18: seq=2 ttl=64 time=0.060 ms
--- my_web ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.060/0.101/0.181 ms
[root@swarm-worker1 ~]#
10.0.1.18 is the VIP (Virtual IP) of the my_web service; swarm load-balances traffic sent to the VIP across all replicas. So pinging the web service from the util service succeeds, but the name resolves not to a replica's IP address, but to the service's VIP, which provides the load balancing.
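The VIP allocated to the service on each network can also be read from the service spec with docker service inspect (a sketch; the Go-template path is part of the standard inspect output):

```shell
# List the virtual IPs allocated to my_web, one per attached network
docker service inspect my_web \
  --format '{{range .Endpoint.VirtualIPs}}{{.NetworkID}} {{.Addr}}{{"\n"}}{{end}}'
```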
We can run the following command to see the IP of each replica.
[root@swarm-worker1 ~]# docker exec util.1.uzub4he2axfbvnxaxuwpv9mqr nslookup tasks.my_web
Server: 127.0.0.11
Address 1: 127.0.0.11
Name: tasks.my_web
Address 1: 10.0.1.19 my_web.3.jecmwxbyha7wx7iv9akr8z9ku.myapp_net
Address 2: 10.0.1.20 my_web.1.jqqahg7wkjrqr2q0j3b942jjt.myapp_net
Address 3: 10.0.1.21 my_web.2.sgu8fdgnovg7c1xqwukya8cqs.myapp_net
10.0.1.19, 10.0.1.20, and 10.0.1.21 are the IPs of the individual replicas. However, a consumer of the service (here util.1) does not need to know the replicas' IPs or the VIP of my_web at all; it can reach the service simply by its name, my_web.
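VIP is the default endpoint mode. Swarm also supports a DNS round-robin mode, in which the service name resolves directly to the replica IPs instead of a single VIP. A sketch of creating a service that way (my_web_dnsrr is a hypothetical name):

```shell
# --endpoint-mode dnsrr: no VIP is allocated; the embedded DNS server
# returns the replica IPs directly. Note that dnsrr cannot be combined
# with ports published through the routing mesh.
docker service create --name my_web_dnsrr --replicas 3 \
  --network myapp_net --endpoint-mode dnsrr httpd
```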
9. Rolling Updates of a Service
For a service deployed with multiple replicas, how do we update each replica in a rolling fashion?
Rolling updates reduce the risk of an application update: if a replica fails to update, the whole update pauses while the remaining replicas continue to serve traffic.
Since there are always replicas running during the update, business continuity is preserved as well.
Below we deploy a three-replica httpd service, upgrade it from httpd:2.4.35 to httpd:2.4.37, and then roll back to the original version.
1. Create the service with image httpd:2.4.35 and 3 replicas
[root@localhost ~]# docker service create --name httpd_2435 --replicas 3 httpd:2.4.35
k8y1twyg0j8kbb9iv1eiou1kl
overall progress: 3 out of 3 tasks
1/3: running
2/3: running
3/3: running
verify: Service converged
[root@localhost ~]#
2. Update the service to image httpd:2.4.37; --image specifies the new image.
[root@swarm-manager ~]# docker service update --image httpd:2.4.37 httpd_2435
httpd_2435
overall progress: 3 out of 3 tasks
1/3: running
2/3: running
3/3: running
verify: Service converged
[root@swarm-manager ~]#
Swarm performs the rolling update in the following steps:
1. Stop the first replica.
2. Schedule the task and select a worker node.
3. Start the replica on the worker with the new image.
4. If the replica (container) starts successfully, continue with the next replica; if it fails, pause the entire update.
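The pause-on-failure behavior in the last step is itself configurable via --update-failure-action (valid values are pause, continue, and rollback). A sketch:

```shell
# Automatically roll back instead of pausing when an updated
# replica fails to start
docker service update --update-failure-action rollback \
  --image httpd:2.4.37 httpd_2435
```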
3. Check the result with docker service ps. All three replicas have been updated to httpd:2.4.37.
[root@swarm-manager ~]# docker service ps httpd_2435
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
vgtlj9p3wm4f httpd_2435.1 httpd:2.4.37 swarm-worker2 Running Running 20 seconds ago
7ie69runo0fk \_ httpd_2435.1 httpd:2.4.35 swarm-worker2 Shutdown Shutdown 54 seconds ago
nygpgm3rcktb httpd_2435.2 httpd:2.4.37 swarm-worker1 Running Running 57 seconds ago
ifacs8pjc20s \_ httpd_2435.2 httpd:2.4.35 swarm-worker1 Shutdown Shutdown about a minute ago
208o4cq1sbyt httpd_2435.3 httpd:2.4.37 swarm-worker1 Running Running 56 seconds ago
227v6kftxy0y \_ httpd_2435.3 httpd:2.4.35 swarm-worker2 Shutdown Shutdown 56 seconds ago
[root@swarm-manager ~]#
4. Roll back to the previous version with --rollback
This quickly restores the state before the update.
[root@swarm-manager ~]# docker service update --rollback httpd_2435
httpd_2435
rollback: manually requested rollback
overall progress: rolling back update: 3 out of 3 tasks
1/3: running
2/3: running
3/3: running
verify: Service converged
[root@swarm-manager ~]#
5. After rolling back, a new httpd:2.4.35 container is started for each replica rather than reusing the previous ones
[root@swarm-manager ~]# docker service ps httpd_2435
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
luu1kh08bt4z httpd_2435.1 httpd:2.4.35 swarm-worker2 Running Running 54 seconds ago
vgtlj9p3wm4f \_ httpd_2435.1 httpd:2.4.37 swarm-worker2 Shutdown Shutdown 55 seconds ago
7ie69runo0fk \_ httpd_2435.1 httpd:2.4.35 swarm-worker2 Shutdown Shutdown 29 minutes ago
idp4k4e1mqj9 httpd_2435.2 httpd:2.4.35 swarm-worker1 Running Running 52 seconds ago
nygpgm3rcktb \_ httpd_2435.2 httpd:2.4.37 swarm-worker1 Shutdown Shutdown 53 seconds ago
ifacs8pjc20s \_ httpd_2435.2 httpd:2.4.35 swarm-worker1 Shutdown Shutdown 29 minutes ago
pldxqcs7ccn3 httpd_2435.3 httpd:2.4.35 swarm-worker2 Running Running 56 seconds ago
208o4cq1sbyt \_ httpd_2435.3 httpd:2.4.37 swarm-worker1 Shutdown Shutdown 57 seconds ago
227v6kftxy0y \_ httpd_2435.3 httpd:2.4.35 swarm-worker2 Shutdown Shutdown 29 minutes ago
[root@swarm-manager ~]#
By default, Swarm updates one replica at a time, with no delay between replicas. You can set the number of replicas updated in parallel with --update-parallelism and the interval between batches with --update-delay. In the example below, suppose the service had 20 replicas; we update 4 at a time with a 10-second delay:
docker service update --image httpd:2.2 --update-parallelism 4 --update-delay 10s httpd_2435
Note: elastic scaling, i.e. increasing or decreasing the replica count, is not bound by these settings; scaling completes as fast as possible.
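The update policy currently in effect on a service can be verified from its spec (a sketch):

```shell
# Show the service's rolling-update settings (parallelism, delay,
# failure action, etc.) as JSON
docker service inspect httpd_2435 \
  --format '{{json .Spec.UpdateConfig}}'
```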
10. How Swarm Manages Data
A service's container replicas scale up and down, fail over, and are created and destroyed on different hosts. This raises a question: if the service has data to manage, where should that data live?
Option 1: bake it into the container image.
Clearly not viable: unless the data never changes, how would you keep it in sync across multiple replicas?
Option 2: store it in a local directory on the Docker host and map it into the container via a volume.
Replicas on the same host could share that volume, but how would replicas on different hosts stay in sync?
Option 3: use a Docker volume driver, letting an external storage provider manage and serve the volume, which is then mounted into every replica on every Docker host.
This is currently the best approach: the volume does not depend on the Docker hosts or containers, its lifecycle is managed by the storage provider, and the provider is fully responsible for the volume's availability and data integrity; Docker merely consumes it.
Below we use NFS to put the third option into practice:
swarm-worker1 nfs-client 192.168.1.109
swarm-worker2 nfs-client 192.168.1.108
swarm-manager nfs-client + nfs-server(/var/nfs) 192.168.1.107
1. Install and configure the nfs-server
[root@swarm-manager ~]# yum install -y nfs-utils
# install the NFS service
[root@swarm-manager ~]# yum install -y rpcbind
# install the RPC service
[root@swarm-manager ~]# mkdir /var/nfs
[root@swarm-manager ~]# cat /etc/exports
/var/nfs *(rw,sync,no_root_squash)
[root@swarm-manager ~]# systemctl start rpcbind
[root@swarm-manager ~]# systemctl enable rpcbind
[root@swarm-manager ~]# systemctl start nfs-server
[root@swarm-manager ~]# systemctl enable nfs-server
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
[root@swarm-manager ~]# showmount -e 192.168.1.107
Export list for 192.168.1.107:
/var/nfs *
[root@swarm-manager ~]#
2. Install and configure the nfs-client (the volume must be created on every host)
[root@swarm-manager ~]# docker volume create --driver local --opt type=nfs --opt o=addr=192.168.1.107,rw --opt device=:/var/nfs volume-nfs
volume-nfs
[root@swarm-manager ~]# docker volume ls
DRIVER VOLUME NAME
local volume-nfs
[root@swarm-manager ~]#
[root@swarm-worker1 ~]# yum -y install nfs-utils rpcbind
[root@swarm-worker1 ~]# showmount -e 192.168.1.107
Export list for 192.168.1.107:
/var/nfs *
[root@swarm-worker1 ~]# docker volume create --driver local --opt type=nfs --opt o=addr=192.168.1.107,rw --opt device=:/var/nfs volume-nfs
volume-nfs
[root@swarm-worker1 ~]# docker volume ls
DRIVER VOLUME NAME
local volume-nfs
[root@swarm-worker2 ~]# yum -y install nfs-utils rpcbind
[root@swarm-worker2 ~]# showmount -e 192.168.1.107
Export list for 192.168.1.107:
/var/nfs *
[root@swarm-worker2 ~]# docker volume create --driver local --opt type=nfs --opt o=addr=192.168.1.107,rw --opt device=:/var/nfs volume-nfs
volume-nfs
[root@swarm-worker2 ~]# docker volume ls
DRIVER VOLUME NAME
local volume-nfs
3. Create a service with the NFS volume mounted, and verify
[root@swarm-manager ~]# docker service create --name my_web --publish 80:80 --mount type=volume,source=volume-nfs,volume-nocopy=true,destination=/usr/local/apache2/htdocs --replicas 2 httpd
dhrf6dkql4kwzwfc65u47xopl
overall progress: 2 out of 2 tasks
1/2: running
2/2: running
verify: Service converged
If you run into an error here, add the parameter volume-nocopy=true (already included in the command above).
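As an alternative to pre-creating volume-nfs on every host (step 2), the NFS options can be embedded in the --mount flag itself; swarm then creates the volume on demand on whichever node a task lands on. A sketch, assuming the same NFS export as above (my_web2 and volume-nfs2 are hypothetical names):

```shell
# volume-opt values are passed to the local driver on each node, so
# the volume is created wherever a replica is scheduled; options
# containing commas are wrapped in embedded double quotes
docker service create --name my_web2 --publish 8080:80 --replicas 2 \
  --mount 'type=volume,source=volume-nfs2,destination=/usr/local/apache2/htdocs,volume-nocopy=true,volume-driver=local,volume-opt=type=nfs,volume-opt=device=:/var/nfs,"volume-opt=o=addr=192.168.1.107,rw"' \
  httpd
```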
4. Update the html file in the shared directory /var/nfs/ and check the output with curl
[root@swarm-manager ~]# cat /var/nfs/index.html
docker swarm nfs volume test
[root@swarm-manager ~]# curl http://192.168.1.107
docker swarm nfs volume test
[root@swarm-manager ~]# curl http://192.168.1.108
docker swarm nfs volume test
[root@swarm-manager ~]# curl http://192.168.1.109
docker swarm nfs volume test
5. Check the service
[root@swarm-manager ~]# docker service ps my_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
v4envw9v96qj my_web.1 httpd:latest swarm-worker1 Running Running 17 minutes ago
ma2wnu0eodn4 my_web.2 httpd:latest swarm-worker2 Running Running 17 minutes ago
[root@swarm-manager ~]#
6. Inspect each replica container
[root@swarm-worker1 ~]# docker exec -it my_web.1.v4envw9v96qj12xx0isv3ezcy cat /usr/local/apache2/htdocs/index.html
docker swarm nfs volume test
[root@swarm-worker1 ~]# docker volume inspect volume-nfs
[
{
"CreatedAt": "2020-05-13T02:06:38-04:00",
"Driver": "local",
"Labels": {},
"Mountpoint": "/var/lib/docker/volumes/volume-nfs/_data",
"Name": "volume-nfs",
"Options": {
"device": ":/var/nfs",
"o": "addr=192.168.1.107,rw",
"type": "nfs"
},
"Scope": "local"
}
]
[root@swarm-worker1 ~]# docker inspect my_web.1.v4envw9v96qj12xx0isv3ezcy | jq .[0].Mounts
-bash: jq: command not found
# jq is not installed on this host; with jq available the actual output would be:
[
{
"Type": "volume",
"Name": "volume-nfs",
"Source": "/var/lib/docker/volumes/volume-nfs/_data",
"Destination": "/usr/local/apache2/htdocs",
"Driver": "local",
"Mode": "z",
"RW": true,
"Propagation": ""
}
]
7. The NFS volume is now successfully mounted into the service. Next, verify Swarm data persistence during scale-up and failover.
① Scale up by adding replicas and verify that the data appears in the newly started containers
[root@swarm-manager ~]# docker service update --replicas 4 my_web
my_web
overall progress: 4 out of 4 tasks
1/4: running
2/4: running
3/4: running
4/4: running
verify: Service converged
[root@swarm-manager ~]# docker service ps my_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
68g07ulol72e my_web.1 httpd:latest swarm-worker1 Running Running 26 minutes ago
ma2wnu0eodn4 my_web.2 httpd:latest swarm-worker2 Running Running 50 minutes ago
jwlr705r68iq my_web.3 httpd:latest swarm-worker2 Running Running about a minute ago
gj0pdh1suybq my_web.4 httpd:latest swarm-worker1 Running Running about a minute ago
② Verify the new replica
[root@swarm-worker1 ~]# docker inspect my_web.4.gj0pdh1suybqmpml8nrkit99j | jq .[0].Mounts
[
{
"Type": "volume",
"Name": "volume-nfs",
"Source": "/var/lib/docker/volumes/volume-nfs/_data",
"Destination": "/usr/local/apache2/htdocs",
"Driver": "local",
"Mode": "z",
"RW": true,
"Propagation": ""
}
]
[root@swarm-worker1 ~]# docker exec my_web.4.gj0pdh1suybqmpml8nrkit99j cat /usr/local/apache2/htdocs/index.html
docker swarm nfs volume test
[root@swarm-worker1 ~]#
③ Update the volume content and verify
[root@swarm-manager ~]# echo "add test str" >> /var/nfs/index.html
[root@swarm-manager ~]# cat /var/nfs/index.html
docker swarm nfs volume test
add test str
[root@swarm-manager ~]# curl http://192.168.1.107
docker swarm nfs volume test
add test str
[root@swarm-manager ~]# curl http://192.168.1.108
docker swarm nfs volume test
add test str
[root@swarm-manager ~]# curl http://192.168.1.109
docker swarm nfs volume test
add test str
[root@swarm-manager ~]#
④ Verify each replica
[root@swarm-worker1 ~]# docker exec my_web.4.gj0pdh1suybqmpml8nrkit99j cat /usr/local/apache2/htdocs/index.html
docker swarm nfs volume test
add test str
[root@swarm-worker1 ~]# docker exec my_web.1.68g07ulol72ef8xl72tb3raeg cat /usr/local/apache2/htdocs/index.html
docker swarm nfs volume test
add test str
[root@swarm-worker1 ~]#
⑤ Failover verification: power off swarm-worker1.
State before the shutdown:
[root@swarm-manager ~]# docker service ps my_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
68g07ulol72e my_web.1 httpd:latest swarm-worker1 Running Running 55 minutes ago
ma2wnu0eodn4 my_web.2 httpd:latest swarm-worker2 Running Running about an hour ago
jwlr705r68iq my_web.3 httpd:latest swarm-worker2 Running Running 30 minutes ago
gj0pdh1suybq my_web.4 httpd:latest swarm-worker1 Running Running 30 minutes ago
After shutting down swarm-worker1:
[root@swarm-manager ~]# docker service ps my_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
bm5ndipc2fym my_web.1 httpd:latest swarm-worker2 Ready Ready 4 seconds ago
v4envw9v96qj \_ my_web.1 httpd:latest swarm-worker1 Shutdown Complete 58 minutes ago
ma2wnu0eodn4 my_web.2 httpd:latest swarm-worker2 Running Running about an hour ago
jwlr705r68iq my_web.3 httpd:latest swarm-worker2 Running Running 33 minutes ago
a2a0ut21mxm4 my_web.4 httpd:latest swarm-worker2 Ready Ready 4 seconds ago
gj0pdh1suybq \_ my_web.4 httpd:latest swarm-worker1 Shutdown Running 17 seconds ago
Verify the containers on swarm-worker2:
[root@swarm-worker2 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cd49394e4337 httpd:latest "httpd-foreground" 2 minutes ago Up 2 minutes 80/tcp my_web.4.a2a0ut21mxm42i02w2kwbc2fg
a228218a0656 httpd:latest "httpd-foreground" 2 minutes ago Up 2 minutes 80/tcp my_web.1.bm5ndipc2fymnanu9113vxrup
28efe52c66f8 httpd:latest "httpd-foreground" 35 minutes ago Up 35 minutes 80/tcp my_web.3.jwlr705r68iq9cd6rguqaqlf7
9540958f0b7d httpd:latest "httpd-foreground" About an hour ago Up About an hour 80/tcp my_web.2.ma2wnu0eodn4iuspzwp7hdi9z
[root@swarm-worker2 ~]# docker exec my_web.1.bm5ndipc2fymnanu9113vxrup cat /usr/local/apache2/htdocs/index.html
docker swarm nfs volume test
add test str
[root@swarm-worker2 ~]# docker exec my_web.2.ma2wnu0eodn4iuspzwp7hdi9z cat /usr/local/apache2/htdocs/index.html
docker swarm nfs volume test
add test str
[root@swarm-worker2 ~]# docker exec my_web.4.a2a0ut21mxm42i02w2kwbc2fg cat /usr/local/apache2/htdocs/index.html
docker swarm nfs volume test
add test str
[root@swarm-worker2 ~]#
[root@swarm-manager ~]# curl http://192.168.1.109 # swarm-worker1 is powered off, so its IP is no longer reachable
curl: (7) Failed to connect to 192.168.1.109 port 80: No route to host
[root@swarm-manager ~]# curl http://192.168.1.108
docker swarm nfs volume test
add test str
[root@swarm-manager ~]# curl http://192.168.1.107
docker swarm nfs volume test
add test str