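General troubleshooting
General log collection script for the cluster: run the script below on a node to collect its runtime logs, then look through those logs for errors.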
curl -O https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/log-collector-script/linux/eks-log-collector.sh
sudo bash eks-log-collector.sh
1. Errors when creating an ALB
Error 1
configmaps "aws-load-balancer-controller-leader" is forbidden: User "system:serviceaccount:kube-system:aws-load-balancer-controller" cannot get resource "configmaps" in API group "" in the namespace "kube-system": RBAC: role.rbac.authorization.k8s.io "aws-load-balancer-controller-leader-election-role" not found
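The error points to an RBAC authorization problem: the Role that the controller's service account relies on for leader election cannot be found.
Troubleshooting approach
- Check that the IAM Role is correctly bound to the cluster
- Check that the corresponding service account (SA) in the cluster has been granted the correct cluster permissions
- Check that the cluster Role has the correct permissions
Reference: https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/aws-load-balancer-controller.html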
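The three checks above can be sketched with kubectl. This is a minimal sketch, not a definitive procedure: the namespace, service account, Role, and ConfigMap names are copied from the error message, and it assumes the IAM role is attached through the `eks.amazonaws.com/role-arn` annotation on the service account (IRSA), which may not match every setup.

```shell
#!/usr/bin/env bash
# Names copied from the error message above.
NS="kube-system"
SA="aws-load-balancer-controller"
ROLE="aws-load-balancer-controller-leader-election-role"
LEADER_CM="aws-load-balancer-controller-leader"

check_lbc_rbac() {
  # 1. IAM role binding: with IRSA (an assumption here), the service account
  #    should carry an eks.amazonaws.com/role-arn annotation.
  kubectl get serviceaccount "$SA" -n "$NS" -o yaml

  # 2. The Role named in the error must exist, and a RoleBinding in the same
  #    namespace must reference it with the service account as a subject.
  kubectl get role "$ROLE" -n "$NS"
  kubectl get rolebinding -n "$NS" -o wide

  # 3. Ask the API server whether the service account may read the
  #    leader-election ConfigMap; prints "yes" or "no".
  kubectl auth can-i get "configmaps/${LEADER_CM}" \
    --as="system:serviceaccount:${NS}:${SA}" -n "$NS"
}
```

After sourcing the file, run `check_lbc_rbac`. If step 2 reports that the Role is not found, the controller's RBAC manifests were probably not applied completely, and reinstalling the controller as described in the reference above is the likely fix.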
Error 2
"msg": "Reconciler error",..."error":"couldn't auto-discover subnets: unable to discover at least one subnet"
The error shows that the controller cannot auto-discover any subnets.
The subnets do not carry the tags for this cluster.
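To confirm the diagnosis, it can help to list the tags on each subnet in the cluster's VPC. The function below is a sketch using the AWS CLI; the function name is made up for illustration, and the cluster name and region are arguments you supply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every subnet in the cluster's VPC with its tags.
# Usage: list_cluster_subnet_tags <cluster-name> <region>
list_cluster_subnet_tags() {
  local cluster="$1" region="$2" vpc_id

  # Look up the VPC that the EKS cluster is attached to.
  vpc_id="$(aws eks describe-cluster --name "$cluster" --region "$region" \
    --query 'cluster.resourcesVpcConfig.vpcId' --output text)" || return 1

  # List the subnets of that VPC together with their tags.
  aws ec2 describe-subnets --region "$region" \
    --filters "Name=vpc-id,Values=${vpc_id}" \
    --query 'Subnets[].{Subnet:SubnetId,AZ:AvailabilityZone,Tags:Tags}' \
    --output json
}
```

Subnets whose tag list has no `kubernetes.io/role/elb` or `kubernetes.io/role/internal-elb` key are the candidates that auto-discovery would skip.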
Solution
- Add the cluster tags to the subnets; see https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.3/deploy/subnet_discovery/
- Or list the subnets explicitly with an annotation in the Ingress YAML file: alb.ingress.kubernetes.io/subnets: sub-xxx,sub-xxx
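Tagging a subnet can be sketched with the AWS CLI as follows. The function name is hypothetical and the tag value `shared` is an assumption (`owned` is the other value the subnet discovery page describes). The role tag depends on whether the load balancer is internet-facing or internal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the subnet discovery tags to one subnet.
# Usage: tag_subnet_for_alb <cluster-name> <subnet-id> [internet-facing|internal]
tag_subnet_for_alb() {
  local cluster="$1" subnet="$2" scheme="${3:-internet-facing}" role_tag

  # Public subnets use the "elb" role tag, private subnets "internal-elb".
  case "$scheme" in
    internet-facing) role_tag="kubernetes.io/role/elb" ;;
    internal)        role_tag="kubernetes.io/role/internal-elb" ;;
    *) echo "unknown scheme: $scheme" >&2; return 2 ;;
  esac

  # "shared" is assumed here; use "owned" if the subnet belongs to this cluster only.
  aws ec2 create-tags --resources "$subnet" --tags \
    "Key=kubernetes.io/cluster/${cluster},Value=shared" \
    "Key=${role_tag},Value=1"
}
```

For example, `tag_subnet_for_alb cluster-name <subnet-id> internal` tags a private subnet for an internal ALB. Once the tags are in place, the controller should pick up the subnets on its next reconcile.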
2. Errors from eksctl get nodegroup
$ eksctl get nodegroup --cluster cluster-name --region ap-southeast-1
2021-12-07 16:27:54 [ℹ] eksctl version 0.75.0
2021-12-07 16:27:54 [ℹ] using region ap-southeast-1
2021-12-07 16:27:56 [!] retryable error (Throttling: Rate exceeded
status code: 400, request id: abbd7fb9-4333-485b-8145-0863d6ba1321) from cloudformation/DescribeStacks - will retry after delay of 9.81369948s
2021-12-07