Preparation
Prepare three machines to form a minimal swarm cluster: one manager node and two worker nodes.
172.16.0.200 Ubuntu14.04
172.16.0.201 Ubuntu14.04
172.16.0.202 Ubuntu14.04
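In production it is best to run 2n+1 (n >= 1) manager nodes, so that a majority of managers survives the loss of n of them. More is not always better, though: the official recommendation is at most 7 manager nodes.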
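The 2n+1 rule comes down to simple majority arithmetic, sketched below. The function names quorum and fault_tolerance are made up for this illustration and are not Docker commands.

```shell
#!/bin/sh
# quorum N: smallest number of managers that forms a majority of N.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: how many of N managers can be lost while a
# majority is still available.
fault_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

# Print the numbers for the odd manager counts discussed above.
for n in 1 3 5 7; do
    echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

An even count adds no tolerance over the odd count below it (4 managers still tolerate only 1 failure), which is why odd numbers are preferred.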
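Installing Docker
Automated installation with a script
For test and development environments, Docker provides a convenience script that simplifies installation. On Ubuntu it can be used as follows: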
$ curl -fsSL get.docker.com -o get-docker.sh
$ sudo sh get-docker.sh --mirror Aliyun
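These commands download the script to get-docker.sh and run it using the Aliyun mirror. The script then installs the Edge release of Docker CE on the system and starts it by default.
Or
Install directly with the DaoCloud install script: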
curl -sSL https://get.daocloud.io/docker | sh
Start command
service docker start
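Creating the docker user group
By default, the docker command communicates with the Docker engine over a Unix socket, and only the root user and members of the docker group can access that socket. For security reasons, Linux systems generally do not use the root user directly, so a better approach is to add the users who need Docker to the docker group.
Create the docker group: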
groupadd docker
Add the current user to the docker group:
usermod -aG docker $USER
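Log out of the current terminal and log back in so that the new group membership takes effect, then run the hello-world test below.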
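Whether the new membership is active in the current session can be checked before running any docker command. The following is a small sketch; has_group is a hypothetical helper name. It relies on id -nG printing the groups of the current process when called without a user name, which is why a fresh login is needed before docker shows up.

```shell
#!/bin/sh
# has_group GROUP_LIST GROUP: succeed if GROUP appears as a whole word
# in the space-separated GROUP_LIST.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# `id -nG` without arguments lists the groups of the current session.
if has_group "$(id -nG)" docker; then
    echo "current session is in the docker group"
else
    echo "not in the docker group yet; log out and back in"
fi
```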
Testing that Docker is installed correctly
$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
ca4f61b1923c: Pull complete
Digest: sha256:be0cd392e45be79ffeffa6b05338b98ebb16c87b255f48e297ec7f98e123905c
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://cloud.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/engine/userguide/
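If the output above appears, Docker is installed correctly.
Creating the swarm cluster
The three machines are listed at the top, and Docker is installed on all of them. Here 172.16.0.200 is the manager and the other two are workers.
Initializing the cluster
We use docker swarm init to initialize a Swarm cluster on the local machine.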
root@ubuntu:~# docker swarm init --advertise-addr 172.16.0.200
Swarm initialized: current node (u66elsqnr7cx3ufefopuvchbm) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
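If your Docker host has multiple network interfaces and therefore multiple IP addresses, you must specify the IP with --advertise-addr.
The node that runs docker swarm init automatically becomes a manager node.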
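To decide which address to pass to --advertise-addr, it helps to list the IPv4 addresses of the host first. The sketch below parses the one-line-per-address output of ip -o -4 addr show; list_ipv4 is a hypothetical helper name, and the sample input is illustrative rather than captured from the machines above.

```shell
#!/bin/sh
# list_ipv4: read `ip -o -4 addr show` output on stdin and print
# "<interface> <address>" for every non-loopback interface.
list_ipv4() {
    awk '$2 != "lo" { split($4, a, "/"); print $2, a[1] }'
}

# Illustrative sample input (hypothetical host with two interfaces).
list_ipv4 <<'EOF'
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 172.16.0.200/24 brd 172.16.0.255 scope global eth0
3: eth1    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth1
EOF

# On a real host:
#   ip -o -4 addr show | list_ipv4
```

Pick the address that the other nodes can reach and pass it to docker swarm init --advertise-addr.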
Adding worker nodes
Log in to 201:
root@api:~# ssh 172.16.0.201
root@172.16.0.201's password:
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)
Last login: Fri Dec 29 12:05:09 2017 from 172.16.0.200
Join the cluster
Run the join command printed when the cluster was initialized above:
root@ubuntu:~# docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377
This node joined a swarm as a worker.
Log in to 202:
root@api:~# ssh 172.16.0.202
root@172.16.0.202's password:
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)
Last login: Fri Dec 29 12:05:09 2017 from 172.16.0.200
Join the cluster:
root@ubuntu:~# docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377
This node joined a swarm as a worker.
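Viewing the cluster
After the two steps above we have a minimal Swarm cluster with one manager node and two worker nodes.
On the manager node, use docker node ls to view the cluster.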
root@ubuntu:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
c88xqfzcbhhlg6c07oochr2g7 ubuntu Ready Active
rq5oh6hfdo32t9xbl7z7sgikm ubuntu Ready Active
u66elsqnr7cx3ufefopuvchbm * ubuntu Ready Active Leader
Leaving the cluster
root@ubuntu:~# docker swarm leave
Node left the swarm.
After both workers have left, check on the manager:
root@ubuntu:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
c88xqfzcbhhlg6c07oochr2g7 ubuntu Down Active
rq5oh6hfdo32t9xbl7z7sgikm ubuntu Down Active
u66elsqnr7cx3ufefopuvchbm * ubuntu Ready Active Leader
The STATUS of both workers is now Down.
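Nodes that have left stay in the list as Down until they are removed on the manager. The following sketch picks the Down entries out of a two-column "ID STATUS" listing; down_nodes is a hypothetical helper name, and the commented docker invocation assumes the .ID and .Status placeholders of docker node ls --format.

```shell
#!/bin/sh
# down_nodes: read "<node-id> <status>" lines on stdin and print the
# IDs whose status is Down.
down_nodes() {
    awk '$2 == "Down" { print $1 }'
}

# Sample input built from the node list above.
down_nodes <<'EOF'
c88xqfzcbhhlg6c07oochr2g7 Down
rq5oh6hfdo32t9xbl7z7sgikm Down
u66elsqnr7cx3ufefopuvchbm Ready
EOF

# On the manager, the IDs could then be removed from the node list:
#   docker node ls --format '{{.ID}} {{.Status}}' | down_nodes | xargs -r docker node rm
```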
Questions
The output above includes a token. If I forget it, how do I add a new worker or manager node later? Use docker swarm join-token worker to display it again, as shown below:
root@ubuntu:~# docker swarm join-token worker
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377
Joining as a manager works the same way; just change the argument:
root@ubuntu:~# docker swarm join-token manager
To add a manager to this swarm, run the following command:
docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-469wlqjh1jjuupv8ha29l2rzm 172.16.0.200:2377
The token can also be rotated:
$ docker swarm join-token --rotate worker
Succesfully rotated worker join token.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-3pu6hszjas19xyp7ghgosyx9k8atbfcr8p2is99znpy26u2lkl-b30ljddcqhef9b9v4rs7mel7t \
172.16.0.200:2377
After the token has been rotated with --rotate, only the new token can be used to join the cluster.
The -q or --quiet flag prints only the token:
root@ubuntu:~# docker swarm join-token -q worker
SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii
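Because -q prints only the token, it is easy to capture in a script and assemble the join command for a new node. This is a minimal sketch; join_command is a hypothetical helper name, and the ssh line in the comments assumes root SSH access as used earlier in this article.

```shell
#!/bin/sh
# join_command TOKEN MANAGER_ADDR: print the command a node has to run
# to join the swarm.
join_command() {
    printf 'docker swarm join --token %s %s\n' "$1" "$2"
}

# Example with a placeholder token.
join_command "<worker-token>" 172.16.0.200:2377

# On the manager, with a real token:
#   token=$(docker swarm join-token -q worker)
#   ssh root@172.16.0.201 "$(join_command "$token" 172.16.0.200:2377)"
```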
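Deploying services
We use the docker service command to manage services in the Swarm cluster. It can only be run on a manager node.
Creating a service
Run on the manager node: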
root@ubuntu:~# docker service create --name nginx --replicas 3 -p 80:80 nginx
tqd95pxsro7o0rs33ylz9zimj
overall progress: 3 out of 3 tasks
1/3: running [==================================================>]
2/3: running [==================================================>]
3/3: running [==================================================>]
verify: Service converged
Now enter the IP of any node in a browser, or use curl, to see the default nginx page, for example curl http://172.16.0.200:
root@ubuntu:~# curl http://172.16.0.200
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Checking each machine shows that an nginx container has been started on it:
root@worker2:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
08a92a22061b nginx:latest "nginx -g 'daemon of…" 33 seconds ago Up 33 seconds 80/tcp nginx.1.tfhtoyshmzun54x17l3l3sop5
Viewing services
Use docker service ls to list the services running in the current Swarm cluster.
root@ubuntu:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
tqd95pxsro7o nginx replicated 3/3 nginx:latest *:80->80/tcp
Use docker service ps xxx to see the details of a service, such as which nodes its tasks are running on:
root@manager1:~# docker service ps nginx
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
tfhtoyshmzun nginx.1 nginx:latest worker2 Running Running 18 minutes ago
p6b1gfragl8y nginx.2 nginx:latest manager1 Running Running 16 minutes ago
250owqr51y50 nginx.3 nginx:latest worker1 Running Running 20 minutes ago
Use docker service logs xxx to view the logs of a service. Earlier I ran curl http://172.16.0.200 three times, and the logs look like this:
root@manager1:~# docker service logs nginx
nginx.2.p6b1gfragl8y@manager1 | 10.255.0.2 - - [02/Jan/2018:09:58:31 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.1.tfhtoyshmzun@worker2 | 10.255.0.2 - - [02/Jan/2018:10:01:48 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.3.250owqr51y50@worker1 | 10.255.0.2 - - [02/Jan/2018:10:05:12 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
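In the command above, nginx is the service name. You can also view the logs of a single task within the service; pressing Tab after logs completes the available names. I sent three more requests, and each of the three tasks now shows two requests, which shows that requests are load-balanced across them: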
root@manager1:~# docker service logs 250owqr51y50
nginx.3.250owqr51y50@worker1 | 10.255.0.2 - - [02/Jan/2018:10:05:12 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.3.250owqr51y50@worker1 | 10.255.0.2 - - [02/Jan/2018:10:06:43 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
root@manager1:~# docker service logs p6b1gfragl8y
nginx.2.p6b1gfragl8y@manager1 | 10.255.0.2 - - [02/Jan/2018:09:58:31 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.2.p6b1gfragl8y@manager1 | 10.255.0.2 - - [02/Jan/2018:10:06:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
root@manager1:~# docker service logs tfhtoyshmzun
nginx.1.tfhtoyshmzun@worker2 | 10.255.0.2 - - [02/Jan/2018:10:01:48 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.1.tfhtoyshmzun@worker2 | 10.255.0.2 - - [02/Jan/2018:10:06:42 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
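The per-task request counts read off the logs above can also be tallied with a short script. This is a minimal sketch; count_per_task is a hypothetical helper name, and it assumes every log line starts with the task.id@node prefix seen in the output above.

```shell
#!/bin/sh
# count_per_task: read `docker service logs` output on stdin and print
# "<task@node> <number of lines>" for each task, sorted by task name.
count_per_task() {
    awk '{ c[$1]++ } END { for (k in c) print k, c[k] }' | sort
}

# On the manager:
#   docker service logs nginx | count_per_task
```

With roughly even counts per task, the routing mesh is spreading requests across the replicas.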
Removing a service
Use docker service rm xxx to remove a service from the swarm cluster.
root@manager1:~# docker service rm nginx
nginx
Check again: the nginx service is gone.
root@manager1:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
On the worker nodes, the containers have been removed as well:
root@worker1:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
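Deploying with compose in swarm
A swarm cluster can also use a compose file (docker-compose.yml) to configure and start multiple services. We use an nginx web service plus a visualizer service as the example.
Configuration file
docker-compose.yml on the manager1 node: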
version: "3"
services:
  web:
    image: nginx
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
    ports:
      - "80:80"
    networks:
      - webnet
  visualizer:
    image: dockersamples/visualizer:stable
    ports:
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    deploy:
      placement:
        constraints: [node.role == manager]
    networks:
      - webnet
networks:
  webnet:
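Here:
1. Two services are started: web and visualizer.
2. web consists of 3 nginx replicas. visualizer is an open-source project that draws a diagram of the containers running across the whole swarm; it is constrained to run only on manager nodes.
3. A network named webnet of type overlay is created; it appears as proj_webnet in the last row of the listing below. All started containers use this network to reach each other.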
root@manager1:~# docker network ls
NETWORK ID NAME DRIVER SCOPE
965de71420b3 bridge bridge local
f66cd75a8741 docker_gwbridge bridge local
136fb30aa99c host host local
73ha87ntxcd5 ingress overlay swarm
eebca5dc00a0 none null local
pjxvmr7b1wcq proj_webnet overlay swarm
Deploying the services
deploy: deploys the stack
-c: specifies the configuration file
proj: the stack name, which can be anything
root@manager1:~# docker stack deploy -c docker-compose.yml proj
Creating network proj_webnet
Creating service proj_web
Creating service proj_visualizer
Once deployment completes, open http://<any node IP>:8080 to see the visualizer's monitoring page.
Viewing services
List all stacks:
root@manager1:~# docker stack ls
NAME SERVICES
proj 2
There is one stack with 2 services (web and visualizer).
List all services in the stack:
root@manager1:~# docker stack services proj
ID NAME MODE REPLICAS IMAGE PORTS
44u5poqrieoy proj_visualizer replicated 1/1 dockersamples/visualizer:stable *:8080->8080/tcp
s2hlawvwwal4 proj_web replicated 3/3 nginx:latest *:80->80/tcp
List the tasks in the stack and where they are running:
root@manager1:~# docker stack ps proj
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
quexa31y9kkt proj_visualizer.1 dockersamples/visualizer:stable manager1 Running Running 20 minutes ago
4ozvbgpkrqtb proj_web.1 nginx:latest worker2 Running Running 21 minutes ago
skkh5uzsyl9o proj_web.2 nginx:latest manager1 Running Running 21 minutes ago
uns9r5vdx1x4 proj_web.3 nginx:latest worker1 Running Running 21 minutes ago
Scaling a service
For example, to go from 3 replicas to 5, use docker service scale proj_web=5:
root@manager1:~# docker service scale proj_web=5
proj_web scaled to 5
overall progress: 5 out of 5 tasks
1/5: running [==================================================>]
2/5: running [==================================================>]
3/5: running [==================================================>]
4/5: running [==================================================>]
5/5: running [==================================================>]
verify: Service converged
Check the services:
root@manager1:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
lhxiyrevf4dp proj_visualizer replicated 1/1 dockersamples/visualizer:stable *:8080->8080/tcp
uxo0piooz4it proj_web replicated 5/5 nginx:latest *:80->80/tcp
root@manager1:~# docker service ps proj_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
6jqcnikrh39j proj_web.1 nginx:latest worker1 Running Running 4 minutes ago
o7gq6w3wisbk proj_web.2 nginx:latest worker2 Running Running 4 minutes ago
15hzrtciynx7 proj_web.3 nginx:latest manager1 Running Running 4 minutes ago
zjtcv1e68aag proj_web.4 nginx:latest worker1 Running Running 2 minutes ago
7mvta3g7m8fm proj_web.5 nginx:latest worker2 Running Running 2 minutes ago
To scale down, simply set a smaller number, for example back to 3 replicas:
root@manager1:~# docker service scale proj_web=3
proj_web scaled to 3
overall progress: 3 out of 3 tasks
1/3: running [==================================================>]
2/3: running [==================================================>]
3/3: running [==================================================>]
verify: Service converged
root@manager1:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
lhxiyrevf4dp proj_visualizer replicated 1/1 dockersamples/visualizer:stable *:8080->8080/tcp
uxo0piooz4it proj_web replicated 3/3 nginx:latest *:80->80/tcp
root@manager1:~# docker service ps proj_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
15hzrtciynx7 proj_web.3 nginx:latest manager1 Running Running 6 minutes ago
zjtcv1e68aag proj_web.4 nginx:latest worker1 Running Running 4 minutes ago
7mvta3g7m8fm proj_web.5 nginx:latest worker2 Running Running 4 minutes ago
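The REPLICAS column shows running/desired counts, so a script can tell whether a scale operation has converged by comparing the two halves. The sketch below does that; replicas_ready is a hypothetical helper name, and the commented loop assumes the .Replicas placeholder of docker service ls --format prints a plain "N/M" value as in the output above.

```shell
#!/bin/sh
# replicas_ready "RUNNING/DESIRED": succeed when both numbers match,
# for example "3/3"; fail for "2/3".
replicas_ready() {
    running=${1%/*}   # text before the slash
    desired=${1#*/}   # text after the slash
    [ "$running" = "$desired" ]
}

# On the manager, wait until proj_web has converged:
#   until replicas_ready "$(docker service ls --filter name=proj_web --format '{{.Replicas}}')"; do
#       sleep 1
#   done
```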
Removing the stack
docker stack rm xxx removes the stack and its services:
root@manager1:~# docker stack rm proj
Removing service proj_visualizer
Removing service proj_web
Removing network proj_webnet
root@manager1:~# docker stack ls
NAME SERVICES
root@manager1:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
The webnet network has been removed as well:
root@worker2:~# docker network ls
NETWORK ID NAME DRIVER SCOPE
feff2a6d1503 bridge bridge local
2f0e401d10a6 docker_gwbridge bridge local
7a60d8ee6f8b host host local
73ha87ntxcd5 ingress overlay swarm
98442c4c5766 none null local
-----------------------
Testing load balancing and failure recovery
For example, I stop a container on 202:
root@worker2:~# docker stop 505
505
root@manager1:~# docker stack ps proj
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
quexa31y9kkt proj_visualizer.1 dockersamples/visualizer:stable manager1 Running Running 31 minutes ago
4ozvbgpkrqtb proj_web.1 nginx:latest worker2 Shutdown Complete 3 seconds ago
skkh5uzsyl9o proj_web.2 nginx:latest manager1 Running Running 32 minutes ago
uns9r5vdx1x4 proj_web.3 nginx:latest worker1 Running Running 32 minutes ago
The task on worker2 is now Shutdown.
I expected a new container to be started after a while, but none was:
root@manager1:~# docker stack ps proj
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
quexa31y9kkt proj_visualizer.1 dockersamples/visualizer:stable manager1 Running Running 37 minutes ago
4ozvbgpkrqtb proj_web.1 nginx:latest worker2 Shutdown Complete 6 minutes ago
skkh5uzsyl9o proj_web.2 nginx:latest manager1 Running Running 38 minutes ago
uns9r5vdx1x4 proj_web.3 nginx:latest worker1 Running Running 38 minutes ago
I expected that starting the container on 202 again would restore the task, but it did not:
root@worker2:~# docker start 505
505
root@manager1:~# docker stack ps proj
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
quexa31y9kkt proj_visualizer.1 dockersamples/visualizer:stable manager1 Running Running 40 minutes ago
4ozvbgpkrqtb proj_web.1 nginx:latest worker2 Shutdown Complete 8 minutes ago
skkh5uzsyl9o proj_web.2 nginx:latest manager1 Running Running 41 minutes ago
uns9r5vdx1x4 proj_web.3 nginx:latest worker1 Running Running 41 minutes ago
What about removing the container? Still no replacement:
root@worker2:~# docker rm -f 505
505
root@worker2:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@manager1:~# docker stack ps proj
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
quexa31y9kkt proj_visualizer.1 dockersamples/visualizer:stable manager1 Running Running 1 hours ago
4ozvbgpkrqtb proj_web.1 nginx:latest worker2 Shutdown Complete 1 hours ago
skkh5uzsyl9o proj_web.2 nginx:latest manager1 Running Running 1 hours ago
uns9r5vdx1x4 proj_web.3 nginx:latest worker1 Running Running 1 hours ago
Although that task is down, the service is still reachable, for example with curl http://172.16.0.202:
root@api:~# curl http://172.16.0.202
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
After several tests, I found that a container that is killed is detected and replaced right away, as shown below.
Kill the container on 202:
root@worker2:~# docker kill 2ff47
2ff47
root@worker2:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
The replica count drops from 3 to 2:
root@manager1:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
lhxiyrevf4dp proj_visualizer replicated 1/1 dockersamples/visualizer:stable *:8080->8080/tcp
uxo0piooz4it proj_web replicated 2/3 nginx:latest *:80->80/tcp
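A few seconds later, though, a new container is started and the 3 replicas are restored. docker service ps still lists the container that was shut down: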
root@manager1:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
lhxiyrevf4dp proj_visualizer replicated 1/1 dockersamples/visualizer:stable *:8080->8080/tcp
uxo0piooz4it proj_web replicated 3/3 nginx:latest *:80->80/tcp
root@manager1:~# docker service ps proj_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
15hzrtciynx7 proj_web.3 nginx:latest manager1 Running Running 9 minutes ago
zjtcv1e68aag proj_web.4 nginx:latest worker1 Running Running 6 minutes ago
gu528xwks2h8 proj_web.5 nginx:latest worker2 Running Running 22 seconds ago
7mvta3g7m8fm \_ proj_web.5 nginx:latest worker2 Shutdown Failed 28 seconds ago "task: non-zero exit (137)"
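So the difference most likely lies in the exit status rather than in a bug. A container stopped with docker stop lets nginx shut down gracefully, so the task ends as Complete, and with restart_policy condition: on-failure in the compose file the swarm does not start a replacement. docker kill ends the container with a non-zero exit code (137), which counts as a failure and triggers a restart.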
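The different outcomes of docker stop and docker kill line up with the restart_policy condition: on-failure set in the compose file above. The sketch below models the three restart conditions (none, on-failure, any) as a function of the container's exit code. would_restart is a hypothetical helper written only for illustration; it is not how swarm is implemented.

```shell
#!/bin/sh
# would_restart CONDITION EXIT_CODE: succeed if a task that exited with
# EXIT_CODE would be restarted under the given restart condition.
would_restart() {
    case "$1" in
        none)       return 1 ;;          # never restart
        on-failure) [ "$2" -ne 0 ] ;;    # restart only on non-zero exit
        any)        return 0 ;;          # always restart
        *)          echo "unknown condition: $1" >&2; return 2 ;;
    esac
}

# Exit code 0 is the graceful stop seen as "Complete" above;
# 137 is 128 + 9, the code reported after SIGKILL.
for code in 0 137; do
    if would_restart on-failure "$code"; then
        echo "exit $code -> restarted"
    else
        echo "exit $code -> left stopped"
    fi
done
```

Under this model, switching the compose file to condition: any would also bring back tasks that exited cleanly.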
Alternatively, running docker service update proj_web also restarts a task whose container was stopped with docker stop:
root@manager1:~# docker service update proj_web
proj_web
overall progress: 3 out of 3 tasks
1/3: running [==================================================>]
2/3:
3/3: running [==================================================>]
verify: Service converged
It also removed the containers that had exited for various reasons. As shown below, the earlier entry “7mvta3g7m8fm \_ proj_web.5 nginx:latest worker2 Shutdown Failed 5 minutes ago "task: non-zero exit (137)"” is gone:
root@manager1:~# docker service ps proj_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
r5ievoac5x3o proj_web.1 nginx:latest worker2 Running Running 11 seconds ago
15hzrtciynx7 proj_web.3 nginx:latest manager1 Running Running 16 minutes ago
zjtcv1e68aag proj_web.4 nginx:latest worker1 Running Running 13 minutes ago
On 202, the exited containers are gone as well:
root@worker2:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
69346c5e694e nginx:latest "nginx -g 'daemon of…" 2 minutes ago Up 2 minutes 80/tcp proj_web.1.r5ievoac5x3obvgelod00o9vw
I found another problem: when I try to enter a container with docker attach, it hangs, and after I interrupt it manually the container dies:
root@manager1:~# docker attach proj_web.3.15hzrtciynx72plzhjula2cd5
^C
root@manager1:~#
root@manager1:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
lhxiyrevf4dp proj_visualizer replicated 1/1 dockersamples/visualizer:stable *:8080->8080/tcp
uxo0piooz4it proj_web replicated 2/3 nginx:latest *:80->80/tcp
root@manager1:~# docker stack ps proj
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
r5ievoac5x3o proj_web.1 nginx:latest worker2 Running Running 18 hours ago
ige19sbe6zma proj_visualizer.1 dockersamples/visualizer:stable manager1 Running Running 19 hours ago
15hzrtciynx7 proj_web.3 nginx:latest manager1 Shutdown Complete 18 seconds ago
zjtcv1e68aag proj_web.4 nginx:latest worker1 Running Running 19 hours ago
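proj_web.3 is now Shutdown, and the cluster does not restart it, just as with the manual stop earlier. A likely explanation is that docker attach forwards signals to the container by default, so Ctrl-C stopped nginx, which then exited cleanly.
docker service update also hangs at this point, and no replacement for the dead container is started: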
root@manager1:~# docker service update proj_web
proj_web
overall progress: 2 out of 3 tasks
1/3:
2/3: running [==================================================>]
3/3: running [==================================================>]
^C
Operation continuing in background.
Use `docker service ps proj_web` to check progress.
root@manager1:~# docker service ps proj_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
r5ievoac5x3o proj_web.1 nginx:latest worker2 Running Running 19 hours ago
15hzrtciynx7 proj_web.3 nginx:latest manager1 Shutdown Complete 12 minutes ago
zjtcv1e68aag proj_web.4 nginx:latest worker1 Running Running 19 hours ago
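docker service scale proj_web=4 hangs as well. The newly added container starts, but the dead one still does not come up, possibly because of a problem with the service's internal network: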
root@manager1:~# docker service scale proj_web=4
proj_web scaled to 4
overall progress: 3 out of 4 tasks
1/4: running [==================================================>]
2/4:
3/4: running [==================================================>]
4/4: running [==================================================>]
Suspecting a container network problem, what happens if the broken container is removed?
root@manager1:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ecd4a8bebc9c nginx:latest "nginx -g 'daemon of…" About a minute ago Up About a minute 80/tcp proj_web.2.xwmvxfs23ijab0wessptod8d7
d58eb620e9aa nginx:latest "nginx -g 'daemon of…" 19 hours ago Exited (0) 21 minutes ago proj_web.3.15hzrtciynx72plzhjula2cd5
1879b088ca37 dockersamples/visualizer:stable "npm start" 19 hours ago Up 19 hours 8080/tcp proj_visualizer.1.ige19sbe6zma322oplynf7csp
5bc982461a41 dockersamples/visualizer:stable "npm start" 27 hours ago Exited (0) 19 hours ago proj_visualizer.1.quexa31y9kkt1m4a4pqcwskzt
root@manager1:~# docker rm d58
d58
root@manager1:~# docker service update proj_web
proj_web
overall progress: 3 out of 4 tasks
1/4:
2/4: running [==================================================>]
3/4: running [==================================================>]
4/4: running [==================================================>]
^C
Operation continuing in background.
Use `docker service ps proj_web` to check progress.
##### Removing it does not help either: the service still has 4 replicas, and one of them never comes up
root@manager1:~# docker rm 5bc
5bc
root@manager1:~# docker service update proj_web
proj_web
overall progress: 3 out of 4 tasks
1/4:
2/4: running [==================================================>]
3/4: running [==================================================>]
4/4: running [==================================================>]
^C
Operation continuing in background.
Use `docker service ps proj_web` to check progress.
root@manager1:~# docker service scale proj_web=3
proj_web scaled to 3
overall progress: 3 out of 3 tasks
1/3: running [==================================================>]
2/3: running [==================================================>]
3/3: running [==================================================>]
verify: Service converged
##### After scaling back to 3 replicas everything is fine again, which suggests the problem was the broken container's network
root@manager1:~# docker service ps proj_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
r5ievoac5x3o proj_web.1 nginx:latest worker2 Running Running 19 hours ago
xwmvxfs23ija proj_web.2 nginx:latest manager1 Running Running 4 minutes ago
zjtcv1e68aag proj_web.4 nginx:latest worker1 Running Running 19 hours ago
root@manager1:~#