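Our company deploys Rancher as a single-node Docker installation and uses Rancher to manage its Kubernetes clusters. One day the Rancher UI would not open, and none of the deployed containers were usable either. I went straight to the server to check. The rancher container still existed, but trying to exec into it failed with "cannot exec in a stopped state: unknown". Reading the rancher logs, however, still worked: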
E0712 15:47:03.730752 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.ProjectCatalog: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/projectcatalogs?allowWatchBookmarks=true&resourceVersion=155367341&timeout=30m0s&timeoutSeconds=574: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730790 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.Catalog: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/catalogs?allowWatchBookmarks=true&resourceVersion=155367339&timeout=30m0s&timeoutSeconds=404: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:02.947639 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.KontainerDriver: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/kontainerdrivers?allowWatchBookmarks=true&resourceVersion=155367345&timeout=30m0s&timeoutSeconds=481: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730823 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.Pipeline: Get https://127.0.0.1:6443/apis/project.cattle.io/v3/watch/pipelines?allowWatchBookmarks=true&resourceVersion=155367348&timeout=30m0s&timeoutSeconds=568: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730842 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v1.Namespace: Get https://127.0.0.1:6443/api/v1/watch/namespaces?allowWatchBookmarks=true&resourceVersion=155367325&timeout=30m0s&timeoutSeconds=449: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:02.947667 6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.RKEK8sSystemImage: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/rkek8ssystemimages?allowWatchBookmarks=true&resourceVersion=155367347&timeout=30m0s&timeoutSeconds=504: dial tcp 127.0.0.1:6443: connect: connection refused
2021/07/12 15:47:03 [FATAL] k3s exited with: exit status 255
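In a long log, the two lines that matter are easy to miss among the repeated watch errors. Below is a minimal sketch of a filter that prints the most recent [FATAL] line and counts refused connections to 127.0.0.1:6443. The function name triage_rancher_log is hypothetical, and the sketch assumes the log is piped in, for example from docker logs.

```shell
# triage_rancher_log (hypothetical helper): read a Rancher container log on
# stdin and summarize the two signals used in this incident.
triage_rancher_log() {
  local log fatal refused
  log=$(cat)
  # Most recent [FATAL] line, i.e. the last fatal reason that was logged.
  fatal=$(printf '%s\n' "$log" | grep -F '[FATAL]' | tail -n 1)
  # Number of lines reporting the embedded API server on 6443 as unreachable.
  refused=$(printf '%s\n' "$log" | grep -cF '127.0.0.1:6443: connect: connection refused')
  echo "fatal: ${fatal:-none}"
  echo "6443 refused: $refused"
}

# Usage (assumes the container is named "rancher"; docker logs writes the
# container's stderr to stderr, hence 2>&1):
# docker logs --tail 500 rancher 2>&1 | triage_rancher_log
```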
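The logs show "k3s exited with: exit status 255" together with "127.0.0.1:6443: connect: connection refused". Port 6443 is the kube-apiserver port, so I suspected a problem with the Kubernetes cluster. Searching for exit status 255 turned up a GitHub issue with a relevant reply (which I read through Chrome's automatic translation), as well as another blog post describing a situation very similar to mine, also with a relevant reply.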
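Both suggested that k3s had probably crashed, so I rebooted the machine. After the reboot k3s was running normally, but Rancher had not started, so I restarted the rancher Docker container: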
docker restart rancher
This failed because port 443 was already in use, so I looked for the process holding port 443:
netstat -tunlp|grep 443
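Note that grep 443 also matches ports such as 6443 or 4430. Below is a minimal sketch that keeps only sockets whose local port is exactly the requested one and prints their PIDs. listeners_on_port is a hypothetical helper, and it assumes the column layout of net-tools netstat -tunlp, where the fourth column is the local address.

```shell
# listeners_on_port PORT (hypothetical helper): read `netstat -tunlp` output
# on stdin and print the PID of every socket bound to exactly PORT.
listeners_on_port() {
  awk -v port="$1" '
    # Column 4 is the local address, e.g. 0.0.0.0:443 or :::443; anchoring
    # on ":PORT" at the end keeps 6443 from matching 443.
    $4 ~ (":" port "$") {
      # The PID/Program column follows State for TCP but not for UDP, so
      # take the first field shaped like "1234/name".
      for (i = 5; i <= NF; i++)
        if ($i ~ /^[0-9]+\//) { split($i, parts, "/"); print parts[1]; break }
    }' | sort -u
}

# Usage: netstat -tunlp | listeners_on_port 443
```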
The port was held by nginx, even though nginx was not installed on this machine. So I used its PID to find where this nginx process was running from:
cd /proc/92922/cwd
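Besides cwd, /proc/<PID>/ also exposes the executable and the cgroup, and on Docker hosts the cgroup path normally contains the 64-character container ID, which can tie a process to a container without reading its configuration. Below is a minimal sketch, assuming a Linux host, root access, and Docker's usual cgroup naming (/docker/<id> or docker-<id>.scope). describe_pid and container_id_from_cgroup are hypothetical helper names.

```shell
# container_id_from_cgroup (hypothetical helper): read a /proc/<PID>/cgroup
# file on stdin and print the first 64-hex-digit ID in it, which on Docker
# hosts is usually the container ID. Prints nothing if none is found.
container_id_from_cgroup() {
  grep -oE '[0-9a-f]{64}' | head -n 1
}

# describe_pid PID (hypothetical helper): show the executable, working
# directory and container ID of a process. Reading another user's /proc
# entries requires root.
describe_pid() {
  echo "exe: $(readlink "/proc/$1/exe")"
  echo "cwd: $(readlink "/proc/$1/cwd")"
  echo "container: $(container_id_from_cgroup < "/proc/$1/cgroup")"
}

# Usage, with the PID found above:
# describe_pid 92922
# docker ps --no-trunc --filter "id=$(container_id_from_cgroup < /proc/92922/cgroup)"
```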
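The /proc/92922/cwd directory contained an nginx configuration, and nginx.conf was full of ingress-controller settings. So I guessed that this nginx belonged to the ingress-controller container and inspected that container: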
docker inspect ingress_controller
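The full docker inspect output is long. Below is a minimal sketch that narrows it to the two fields relevant to a port conflict, the network mode and any published ports, using the Go-template --format option of docker inspect; ports_of is a hypothetical helper name. If the mode is host, the container's nginx binds host ports directly, which would be consistent with netstat reporting nginx itself rather than a docker-proxy process.

```shell
# ports_of CONTAINER (hypothetical helper): print the container's network
# mode followed by its published port bindings as JSON.
ports_of() {
  docker inspect --format \
    '{{.HostConfig.NetworkMode}} {{json .HostConfig.PortBindings}}' "$1"
}

# Usage: ports_of ingress_controller
```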
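The inspect output confirmed that ingress_controller was holding port 443. So I stopped ingress_controller first, then started rancher, and then started ingress_controller again: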
docker stop ingress_controller
docker restart rancher
docker restart ingress_controller
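Rancher could not start while ingress_controller held port 443, so the sequence above frees that port first. Below is a minimal sketch of the same sequence as a script that also waits for 443 to become free before restarting rancher. The helper names are hypothetical, the container names are the ones from this incident, and port_in_use assumes the ss tool from iproute2 is installed.

```shell
# port_in_use PORT (hypothetical helper): succeed if a TCP socket is
# listening on PORT. The ss header line never contains "LISTEN".
port_in_use() {
  ss -ltn "sport = :$1" | grep -q LISTEN
}

# wait_port_free PORT TRIES (hypothetical helper): poll once per second and
# succeed as soon as PORT is free; fail after TRIES retries.
wait_port_free() {
  local port=$1 tries=$2
  while port_in_use "$port"; do
    [ "$tries" -le 0 ] && return 1
    tries=$((tries - 1))
    sleep 1
  done
  return 0
}

# recover_rancher (hypothetical helper): the stop/restart order used above.
recover_rancher() {
  docker stop ingress_controller || return 1
  if ! wait_port_free 443 30; then
    echo "port 443 is still in use" >&2
    return 1
  fi
  docker restart rancher || return 1
  docker restart ingress_controller
}
```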
Problem solved.