total 0
lrwxrwxrwx 1 root root 40 May 6 15:54 flanneld.service.bak -> /usr/lib/systemd/system/flanneld.service
[root@10-255-20-174 docker.service.requires]# pwd
/etc/systemd/system/docker.service.requires
Docker fails to start because the etcd subnet configuration is missing
October 24, 2017 10:18:49, Fly2leo
Docker fails to start
Run systemctl status docker to find the cause. It reports the following errors:
Oct 14 16:39:10 *.*.* systemd[1]: Dependency failed for Docker Application Container Engine.
Oct 14 16:39:10 *.*.* systemd[1]: Job docker.service/start failed with result 'dependency'.
Oct 24 09:02:34 *.*.* systemd[1]: Dependency failed for Docker Application Container Engine.
Oct 24 09:02:34 *.*.* systemd[1]: Job docker.service/start failed with result 'dependency'.
Oct 24 09:45:01 *.*.* systemd[1]: Dependency failed for Docker Application Container Engine.
Oct 24 09:45:01 *.*.* systemd[1]: Job docker.service/start failed with result 'dependency'.
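The 'dependency' result means a unit that docker.service requires did not start. A minimal sketch for finding which one, assuming the standard systemctl show and systemctl is-active subcommands; the function names and the SYSTEMCTL override variable are hypothetical and exist only for illustration:

```shell
#!/bin/sh
# Sketch: list the units a service requires and report those that are not active.
# SYSTEMCTL is an illustrative override so the logic can be exercised without systemd.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Print the units named in the Requires= property, one per line.
required_units() {
    "$SYSTEMCTL" show -p Requires "$1" \
        | sed -e 's/^Requires=//' \
        | tr ' ' '\n' \
        | sed '/^$/d'
}

# Print "not active: UNIT" for every required unit that is not currently active.
failed_requirements() {
    for unit in $(required_units "$1"); do
        if ! "$SYSTEMCTL" is-active --quiet "$unit"; then
            echo "not active: $unit"
        fi
    done
}
```

Calling failed_requirements docker.service on the affected host would be expected to name the unit that blocked Docker, which can then be inspected with systemctl status.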
Because flanneld is installed, docker now depends on flanneld. Run systemctl status flanneld to find the cause. The following errors appear:
Oct 24 10:15:51 *.*.* flanneld-start[1187]: E1024 10:15:51.327561 1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110550]
Oct 24 10:15:52 *.*.* flanneld-start[1187]: E1024 10:15:52.328849 1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110551]
Oct 24 10:15:53 *.*.* flanneld-start[1187]: E1024 10:15:53.329930 1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110552]
Oct 24 10:15:54 *.*.* flanneld-start[1187]: E1024 10:15:54.331301 1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110553]
Oct 24 10:15:55 *.*.* flanneld-start[1187]: E1024 10:15:55.332503 1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110554]
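The cause is that the etcd subnet configuration key /kube-centos/network/config was not found. Suspecting that the subnet has to be configured again every time etcd is started, I created the following etcd startup script: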
systemctl start etcd
etcdctl mkdir /kube-centos/network
etcdctl mk /kube-centos/network/config "{ \"Network\": \"172.30.0.0/16\", \"SubnetLen\": 24, \"Backend\": { \"Type\": \"vxlan\" } }"
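As an optional variant, the key can be written only when it is actually missing, so that rerunning the script does not fail on an existing key. This is a minimal sketch, assuming the same etcd v2 etcdctl commands (get, mk) used above; the function names and the ETCDCTL and FLANNEL_KEY override variables are hypothetical:

```shell
#!/bin/sh
# Sketch: seed flannel's network config in etcd only when the key is missing.
# ETCDCTL and FLANNEL_KEY are illustrative overrides, not part of the original script.
ETCDCTL="${ETCDCTL:-etcdctl}"
FLANNEL_KEY="${FLANNEL_KEY:-/kube-centos/network/config}"

# Print the flannel network config JSON for a CIDR, subnet length and backend type.
flannel_network_config() {
    printf '{ "Network": "%s", "SubnetLen": %s, "Backend": { "Type": "%s" } }\n' \
        "$1" "$2" "$3"
}

# Create the key only if "etcdctl get" cannot read it.
# If etcd itself is unreachable, the mk call fails too and the function returns non-zero.
ensure_flannel_config() {
    if "$ETCDCTL" get "$FLANNEL_KEY" >/dev/null 2>&1; then
        echo "exists: $FLANNEL_KEY"
    else
        "$ETCDCTL" mk "$FLANNEL_KEY" \
            "$(flannel_network_config 172.30.0.0/16 24 vxlan)" >/dev/null \
            && echo "created: $FLANNEL_KEY"
    fi
}
```

With these definitions, ensure_flannel_config would be called after systemctl start etcd in place of the two etcdctl lines.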
Run that startup script, then run systemctl start flanneld
[root@q Kubernetes]# systemctl status flanneld
● flanneld.service - Flanneld overlay address etcd agent
Loaded: loaded (/usr/lib/systemd/system/flanneld.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2017-10-24 10:15:56 CST; 44s ago
Process: 1334 ExecStartPost=/usr/libexec/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker (code=exited, status=0/SUCCESS)
Main PID: 1187 (flanneld)
Memory: 6.9M
CGroup: /system.slice/flanneld.service
└─1187 /usr/bin/flanneld -etcd-endpoints=http://q.emulian.com:2379 -etcd-prefix=/kube-centos/network
Oct 24 10:15:51 *.*.* flanneld-start[1187]: E1024 10:15:51.327561 1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110550]
Oct 24 10:15:52 *.*.* flanneld-start[1187]: E1024 10:15:52.328849 1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110551]
Oct 24 10:15:53 *.*.* flanneld-start[1187]: E1024 10:15:53.329930 1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110552]
Oct 24 10:15:54 *.*.* flanneld-start[1187]: E1024 10:15:54.331301 1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110553]
Oct 24 10:15:55 *.*.* flanneld-start[1187]: E1024 10:15:55.332503 1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110554]
Oct 24 10:15:56 *.*.* flanneld-start[1187]: I1024 10:15:56.337850 1187 local_manager.go:179] Picking subnet in range 172.30.1.0 ... 172.30.255.0
Oct 24 10:15:56 *.*.* flanneld-start[1187]: I1024 10:15:56.339319 1187 manager.go:250] Lease acquired: 172.30.56.0/24
Oct 24 10:15:56 *.*.* flanneld-start[1187]: I1024 10:15:56.340333 1187 network.go:58] Watching for L3 misses
Oct 24 10:15:56 *.*.* flanneld-start[1187]: I1024 10:15:56.340369 1187 network.go:66] Watching for new subnet leases
flanneld started successfully. Next, run systemctl start docker
[root@q Kubernetes]# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/docker.service.d
└─flannel.conf
Active: active (running) since Tue 2017-10-24 10:17:02 CST; 10min ago
Docs: http://docs.docker.com
Main PID: 1684 (dockerd-current)
Memory: 48.6M
CGroup: /system.slice/docker.service
├─1684 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/dock...
└─1720 /usr/bin/docker-containerd-current -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/lib...
Oct 24 10:17:01 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:01.969192305+08:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Oct 24 10:17:01 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:01.970172523+08:00" level=warning msg="mountpoint for pids not found"
Oct 24 10:17:01 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:01.970874674+08:00" level=info msg="Loading containers: start."
Oct 24 10:17:01 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:01.986058242+08:00" level=info msg="Firewalld running: false"
Oct 24 10:17:02 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:02.078754482+08:00" level=info msg="Loading containers: done."
Oct 24 10:17:02 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:02.079203957+08:00" level=info msg="Daemon has completed initialization"
Oct 24 10:17:02 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:02.079223135+08:00" level=info msg="Docker daemon" commit="88a4867/1.12.6" graphdriver=devicemapper version=1.12.6
Oct 24 10:17:02 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:02.089322355+08:00" level=info msg="API listen on [::]:2375"
Oct 24 10:17:02 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:02.089334645+08:00" level=info msg="API listen on /var/run/docker.sock"
Oct 24 10:17:02 *.*.* systemd[1]: Started Docker Application Container Engine
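Docker started successfully
Addendum on another flanneld startup error: startup failed because of a firewall port problem. The error output is as follows: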
[root@q kubernetes]# systemctl status flanneld.service
● flanneld.service - Flanneld overlay address etcd agent
Loaded: loaded (/usr/lib/systemd/system/flanneld.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2017-11-16 13:32:16 CST; 4min 51s ago
Process: 25604 ExecStart=/usr/bin/flanneld-start $FLANNEL_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 25604 (code=exited, status=0/SUCCESS)
Nov 16 13:32:09 *.*.* flanneld-start[25604]: I1116 13:32:09.616564 25604 main.go:132] Installing signal handlers
Nov 16 13:32:09 *.*.* flanneld-start[25604]: I1116 13:32:09.616932 25604 manager.go:136] Determining IP address of default interface
Nov 16 13:32:09 *.*.* flanneld-start[25604]: I1116 13:32:09.617847 25604 manager.go:149] Using interface with name enp4s0f1 and address 192.168.1.101
Nov 16 13:32:09 *.*.* flanneld-start[25604]: I1116 13:32:09.617894 25604 manager.go:166] Defaulting external address to interface address (192.168.1.101)
Nov 16 13:32:10 *.*.* flanneld-start[25604]: E1116 13:32:10.618518 25604 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp *.*.*:2379: i/o timeout
Nov 16 13:32:12 *.*.* flanneld-start[25604]: E1116 13:32:12.619119 25604 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp *.*.*:2379: i/o timeout
Nov 16 13:32:14 *.*.* flanneld-start[25604]: E1116 13:32:14.619756 25604 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp *.*.*:2379: i/o timeout
Nov 16 13:32:15 *.*.* flanneld-start[25604]: E1116 13:32:15.622421 25604 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp *.*.*:2379: getsockopt: no route to host
Nov 16 13:32:16 *.*.* flanneld-start[25604]: I1116 13:32:16.446052 25604 main.go:172] Exiting...
Nov 16 13:32:16 *.*.* systemd[1]: Stopped Flanneld overlay address etcd agent.
The main error message is:
failed to retrieve network config: client: etcd cluster is unavailable or misconfigured;
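After investigation, I found that firewalld had not opened port 2379, so flanneld could not reach the etcd service and failed to start.
The etcd configuration file is located at /etc/etcd/etcd.conf
The flanneld configuration file is located at /etc/sysconfig/flanneld.conf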
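One way to open the port is sketched below. It assumes firewalld's firewall-cmd is available and that the commands run as root on the host where etcd listens. The open_tcp_port function and the FIREWALL_CMD override variable are hypothetical names used only for illustration:

```shell
#!/bin/sh
# Sketch: open a TCP port in firewalld unless it is already open.
# FIREWALL_CMD is an illustrative override so the logic can be tried without firewalld.
FIREWALL_CMD="${FIREWALL_CMD:-firewall-cmd}"

open_tcp_port() {
    local port="$1"
    # --query-port exits 0 when the port is already open in the runtime configuration.
    if "$FIREWALL_CMD" --query-port="${port}/tcp" >/dev/null 2>&1; then
        echo "already open: ${port}/tcp"
        return 0
    fi
    # Add the rule permanently, then reload so it also applies to the running firewall.
    "$FIREWALL_CMD" --permanent --add-port="${port}/tcp" >/dev/null \
        && "$FIREWALL_CMD" --reload >/dev/null \
        && echo "opened: ${port}/tcp"
}
```

Calling open_tcp_port 2379 and then restarting flanneld should let it reach etcd, provided the firewall was the only obstacle.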