Incident notes: Rancher UI suddenly unreachable, and the K8s cluster unreachable too

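This post records an incident in which the Rancher UI became unreachable and the container services stopped working. The logs pointed to failed connections to the K8s API, and port 443 turned out to be held by the Ingress Controller. The author works through the connection-refused errors, the k3s failure, and the port conflict step by step, and finally restores communication between Rancher and the K8s cluster.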

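Our company runs Rancher as a single-node Docker deployment and uses Rancher to operate the K8s cluster. One day the Rancher UI would not open, and none of the deployed containers were usable either. I logged in to the server right away and found that the rancher container still existed. Trying to exec into it failed with "cannot exec in a stopped state: unknown", but the Rancher logs could still be read: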

E0712 15:47:03.730752       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.ProjectCatalog: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/projectcatalogs?allowWatchBookmarks=true&resourceVersion=155367341&timeout=30m0s&timeoutSeconds=574: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730790       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.Catalog: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/catalogs?allowWatchBookmarks=true&resourceVersion=155367339&timeout=30m0s&timeoutSeconds=404: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:02.947639       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.KontainerDriver: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/kontainerdrivers?allowWatchBookmarks=true&resourceVersion=155367345&timeout=30m0s&timeoutSeconds=481: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730823       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.Pipeline: Get https://127.0.0.1:6443/apis/project.cattle.io/v3/watch/pipelines?allowWatchBookmarks=true&resourceVersion=155367348&timeout=30m0s&timeoutSeconds=568: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730842       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v1.Namespace: Get https://127.0.0.1:6443/api/v1/watch/namespaces?allowWatchBookmarks=true&resourceVersion=155367325&timeout=30m0s&timeoutSeconds=449: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:02.947667       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.RKEK8sSystemImage: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/rkek8ssystemimages?allowWatchBookmarks=true&resourceVersion=155367347&timeout=30m0s&timeoutSeconds=504: dial tcp 127.0.0.1:6443: connect: connection refused
2021/07/12 15:47:03 [FATAL] k3s exited with: exit status 255
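
With this much repeated output it can help to condense the log before reading it. The following is a minimal sketch, not something from the original troubleshooting session: it assumes the log has been saved to a file, and the function name summarize_rancher_log is made up for illustration.

```shell
# Hypothetical helper (not from the original post): condense a saved Rancher log.
# Assumes the log was captured first, e.g.  docker logs rancher > rancher.log 2>&1
summarize_rancher_log() {
  local log_file="$1"
  # Count refused connections per target address, e.g. 127.0.0.1:6443.
  grep -o 'dial tcp [0-9.]*:[0-9]*: connect: connection refused' "$log_file" |
    awk '{ sub(/:$/, "", $3); count[$3]++ }
         END { for (addr in count) print count[addr], "refused", addr }'
  # Print fatal lines such as the k3s exit status; stay quiet if there are none.
  grep -F '[FATAL]' "$log_file" || true
}
```

Run against the excerpt above, this would report six refusals for 127.0.0.1:6443 followed by the [FATAL] line, which are the two signals worth chasing.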

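Two things stood out in the log: "k3s exited with: exit status 255" and "127.0.0.1:6443: connect: connection refused". Port 6443 is the kube-apiserver port, so the problem was probably in the K8s cluster itself. I then searched for exit status 255 and found this on GitHub: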

Issue: https://github.com/rancher/rancher/issues/22841

See the reply further down in that issue (I read it through Chrome's automatic translation).

Another post described a situation much like mine:
Post: https://forums.cnrancher.com/q_988.html

See also the reply under that post.

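Most likely k3s had crashed, so I rebooted the machine. After the reboot k3s was running normally, but Rancher had not come up, so I restarted the rancher Docker container: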

docker restart rancher

The restart showed that port 443 was already in use.

So I looked for the process holding port 443:

netstat -tunlp|grep 443
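
A plain grep for 443 also matches ports such as 8443 and any PID that happens to contain those digits. As a sketch of a stricter filter, the function below (listeners_on_port is a hypothetical name) reads the netstat output on stdin and keeps only TCP sockets listening on exactly the requested port. It assumes the usual Linux net-tools column layout.

```shell
# Hypothetical helper: read `netstat -tunlp` output on stdin and print
# "PID PROGRAM LOCAL_ADDRESS" for TCP sockets listening on exactly the given port.
listeners_on_port() {
  local port="$1"
  awk -v port="$port" '
    $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") {
      split($7, owner, "/")        # $7 looks like 92922/nginx:
      sub(/:$/, "", owner[2])      # drop the colon left over from "nginx: master"
      print owner[1], owner[2], $4
    }'
}
# Usage (root is needed to see PIDs owned by other users):
#   netstat -tunlp | listeners_on_port 443
```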

It was nginx, but nginx had never been installed on this machine, so I used the PID to see where this nginx lived:

cd /proc/92922/cwd
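
When a process turns up that was never installed on the host, its cgroup file is often a quicker route to the owning container than browsing its working directory. This is only a sketch: it assumes the process was started by Docker and that the container ID shows up in /proc/PID/cgroup as a 64-character hex string, which depends on the cgroup version and driver in use. The function name is hypothetical.

```shell
# Hypothetical helper: pull a Docker container ID out of a cgroup file.
# Assumes the ID appears as a 64-character hex string, which is the usual layout
# but depends on the cgroup version and driver in use.
container_id_from_cgroup() {
  local cgroup_file="$1"
  grep -oE '[0-9a-f]{64}' "$cgroup_file" | head -n 1
}
# Usage with the PID reported by netstat:
#   id=$(container_id_from_cgroup /proc/92922/cgroup)
#   docker inspect --format '{{.Name}}' "$id"
```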

There was an nginx configuration there. Opening nginx.conf showed a lot of ingress-controller settings, so I guessed that this nginx belonged to the ingress-controller container and inspected that container:

docker inspect ingress_controller 

The output lists the ports mapped by ingress_controller.
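
The same question can be asked of every running container at once. The sketch below assumes the ports are published through Docker port mappings; a container on host networking would not appear. containers_publishing_port is a hypothetical name, and the function only parses text from stdin, so the docker ps call appears as a usage comment.

```shell
# Hypothetical helper: read "name ports..." lines on stdin and print the names
# of containers that publish the given host port (e.g. 0.0.0.0:443->443/tcp).
containers_publishing_port() {
  local port="$1"
  awk -v port="$port" '$0 ~ (":" port "->") { print $1 }'
}
# Usage:
#   docker ps --format '{{.Names}} {{.Ports}}' | containers_publishing_port 443
```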

It was indeed holding port 443, so I stopped ingress_controller first, then restarted rancher, and then restarted ingress_controller:

docker stop ingress_controller
docker restart rancher
docker restart ingress_controller

Problem solved.
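
If the same conflict comes back after another reboot, the three commands can be wrapped so that each step runs only when the previous one succeeded. This is a minimal sketch under the assumption that the containers are named as in this post. The function name and the DOCKER override are illustrative, not part of the original fix.

```shell
# Hypothetical wrapper around the three commands above.
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the steps without running them.
recover_rancher() {
  local docker_cmd="${DOCKER:-docker}"
  local ingress="${1:-ingress_controller}"
  local rancher="${2:-rancher}"
  # Free port 443, bring Rancher back while the port is free, then restore ingress.
  "$docker_cmd" stop "$ingress" &&
    "$docker_cmd" restart "$rancher" &&
    "$docker_cmd" restart "$ingress"
}
```

Calling it as DOCKER=echo recover_rancher prints the three steps in order without touching any container, which is a cheap way to check the names before running it for real.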
