Reference:
https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/update-cluster.html
1. Check the current EKS version
kubectl version --short
2. Upgrade eksctl (run as sudo-user; version 0.84 or later is required)
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
eksctl version
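Since the procedure requires eksctl 0.84 or later, the installed version can be checked programmatically before continuing. A minimal sketch using `sort -V`; the `current` value here is a placeholder for the real output of `eksctl version`:

```shell
#!/bin/bash
# Sketch: check that an eksctl version string meets the 0.84 minimum.
# "0.86.0" is a placeholder; in real use, take the value from `eksctl version`.
required="0.84.0"
current="0.86.0"

# sort -V sorts version strings numerically; if the required version sorts
# first (or is equal), the current version is new enough.
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
  echo "eksctl $current is OK (>= $required)"
else
  echo "eksctl $current is too old, upgrade required"
fi
```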
3. Stop the application services
cd /home/srm-user/eksctl
cat > scale_deployment_0.sh <<'EOF'
#!/bin/bash
deployments=(
  fr-data-sync
  fr-interface
  srm-admin
  srm-basic-interface
  srm-basic-platform
  srm-cux
  srm-file
  srm-front-core
  srm-gateway
  srm-iam
  srm-import
  srm-interface
  srm-mdm
  srm-message
  srm-monitor
  srm-oauth
  srm-platform
  srm-purchase-cooperation
  srm-purchase-cooperation-job
  srm-report
  srm-sada
  srm-saga
  srm-scheduler
  srm-script-container
  srm-settle-account
  srm-source
  srm-supplier
  srm-swagger
  srm-workflow
)
for deployment in "${deployments[@]}"; do
  kubectl -n gsp-vrf scale deployment "$deployment" --replicas=0
  echo "$deployment scaled to 0 replicas"
done
EOF
bash scale_deployment_0.sh
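Before executing the script against production, the commands it will run can be generated for review first. A small sketch (only two sample deployment names from the list above; `echo` instead of executing):

```shell
#!/bin/bash
# Sketch: print the scale-down commands for review instead of running them.
# The deployment names are a small sample of the full list above.
namespace="gsp-vrf"
deployments=(srm-admin srm-gateway)

for d in "${deployments[@]}"; do
  echo "kubectl -n $namespace scale deployment $d --replicas=0"
done
```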
4. Upgrade the cluster (run as the srm-user user; takes about 30 minutes) (step 4 in the reference doc)
eksctl upgrade cluster --name vrf01-tky-eks-gsp-fr --approve
5. Upgrade the node groups (two node groups per cluster; about 20 minutes each) (step 5 in the reference doc)
Because permissions in the production environment are tightly controlled, contact Nakamura-san on chat and send him the node group names; he will perform the operation in the console.
# List the node groups
aws eks list-nodegroups --cluster-name prd01-tky-eks-gsp-fr
# Upgrade the node groups
eksctl upgrade nodegroup --name=prd01-a-tky-eks-nodegroup-gsp-fr --cluster=prd01-tky-eks-gsp-fr
eksctl upgrade nodegroup --name=prd02-c-tky-eks-nodegroup-gsp-fr --cluster=prd01-tky-eks-gsp-fr
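Since the node group upgrade takes around 20 minutes, a polling loop can watch for the node group to return to ACTIVE before moving on. A sketch with the AWS call stubbed out; the real call (shown in the comment) needs the cluster and node group names filled in:

```shell
#!/bin/bash
# Sketch: poll until a node group reports ACTIVE.
# nodegroup_status is a stub; in real use it would wrap:
#   aws eks describe-nodegroup --cluster-name <cluster> --nodegroup-name <nodegroup> \
#     --query 'nodegroup.status' --output text
nodegroup_status() {
  echo "ACTIVE"   # stubbed so the sketch runs standalone
}

until [ "$(nodegroup_status)" = "ACTIVE" ]; do
  echo "waiting for node group to become ACTIVE..."
  sleep 30
done
echo "node group is ACTIVE"
```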
6. Label the new EKS nodes
Check the node labels:
kubectl get node --show-labels
New EKS nodes come up without these labels, so pods cannot be scheduled onto them. Label the new nodes: set one node in the prd01-a-tky-eks-nodegroup-gsp-fr node group to env=operation, and the rest to env=prd.
kubectl label nodes ip-10-183-161-91.ap-northeast-1.compute.internal env=operation
kubectl label nodes ip-10-183-160-48.ap-northeast-1.compute.internal env=prd
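When more nodes are involved, the same rule (first node env=operation, the rest env=prd) can be driven by a loop. A sketch with hypothetical node names; `echo` is left in front of kubectl so the commands can be reviewed before actually applying them:

```shell
#!/bin/bash
# Sketch: label the first node env=operation and the rest env=prd.
# Node names are hypothetical; in real use, fill the array from
# `kubectl get nodes -o name`. Remove the leading `echo` to apply for real.
nodes=(ip-10-0-0-1.example.internal ip-10-0-0-2.example.internal ip-10-0-0-3.example.internal)

for i in "${!nodes[@]}"; do
  if [ "$i" -eq 0 ]; then
    env_label="operation"
  else
    env_label="prd"
  fi
  echo kubectl label nodes "${nodes[$i]}" "env=$env_label"
done
```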
7. Upgrade the add-ons (VPC CNI, CoreDNS, and kube-proxy) (step 8 in the reference doc)
The target VPC CNI, CoreDNS, and kube-proxy versions depend on the Kubernetes version being upgraded to:

1.18 -> 1.19
The VPC CNI does not need to be upgraded.
CoreDNS
kubectl set image --namespace kube-system deployment.apps/coredns \
  coredns=602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1
kube-proxy
kubectl set image daemonset.apps/kube-proxy \
  -n kube-system \
  kube-proxy=602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.19.6-eksbuild.2
1.19 -> 1.20
The VPC CNI does not need to be upgraded.
CoreDNS
kubectl edit clusterrole system:coredns -n kube-system
# Add the following rule:
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - list
  - watch
# Update the image
kubectl set image --namespace kube-system deployment.apps/coredns \
  coredns=602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/coredns:v1.8.3-eksbuild.1
kube-proxy
kubectl set image daemonset.apps/kube-proxy \
  -n kube-system \
  kube-proxy=602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.20.4-eksbuild.2
1.20 -> 1.21
VPC CNI
curl -o aws-k8s-cni.yaml https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.10/config/master/aws-k8s-cni.yaml
sed -i.bak -e 's/us-west-2/ap-northeast-1/' aws-k8s-cni.yaml
sed -i.bak -e 's/v1.10.2/v1.10.2-eksbuild.1/' aws-k8s-cni.yaml
kubectl apply -f aws-k8s-cni.yaml
CoreDNS
kubectl set image --namespace kube-system deployment.apps/coredns \
  coredns=602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/coredns:v1.8.4-eksbuild.1
kube-proxy
kubectl set image daemonset.apps/kube-proxy \
  -n kube-system \
  kube-proxy=602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy:v1.21.2-eksbuild.2
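Before running any of the `kubectl set image` commands above, it helps to compare the currently deployed image tag against the target. A sketch of the tag extraction; the image string is hard-coded here, but in real use it would come from `kubectl -n kube-system get deployment coredns -o jsonpath='{.spec.template.spec.containers[0].image}'`:

```shell
#!/bin/bash
# Sketch: extract the version tag from an EKS add-on image reference and
# compare it with the target tag before upgrading.
image="602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/coredns:v1.8.3-eksbuild.1"
target="v1.8.4-eksbuild.1"

# Strip everything up to the last colon, leaving just the tag.
current="${image##*:}"

if [ "$current" = "$target" ]; then
  echo "coredns already at $target"
else
  echo "coredns at $current, target is $target"
fi
```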
8. Upgrade kubectl (run as sudo-user)
curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.21.2/2021-07-05/bin/linux/amd64/kubectl
chmod a+x kubectl
sudo mv kubectl /usr/local/bin/
kubectl version --short
9. Restart the services
Check that pods are running:
kubectl get pod -A
Start the application services:
kubectl scale deployment -n gsp-prd srm-admin --replicas=2
kubectl scale deployment -n gsp-prd srm-saga --replicas=2
sleep 360
kubectl scale deployment -n gsp-prd fr-data-sync --replicas=4
kubectl scale deployment -n gsp-prd fr-interface --replicas=4
kubectl scale deployment -n gsp-prd srm-basic-interface --replicas=4
kubectl scale deployment -n gsp-prd srm-basic-platform --replicas=3
kubectl scale deployment -n gsp-prd srm-cux --replicas=4
kubectl scale deployment -n gsp-prd srm-file --replicas=2
kubectl scale deployment -n gsp-prd srm-front-core --replicas=1
kubectl scale deployment -n gsp-prd srm-gateway --replicas=2
kubectl scale deployment -n gsp-prd srm-iam --replicas=2
kubectl scale deployment -n gsp-prd srm-import --replicas=1
kubectl scale deployment -n gsp-prd srm-interface --replicas=2
kubectl scale deployment -n gsp-prd srm-mdm --replicas=4
kubectl scale deployment -n gsp-prd srm-message --replicas=2
kubectl scale deployment -n gsp-prd srm-monitor --replicas=2
kubectl scale deployment -n gsp-prd srm-oauth --replicas=4
kubectl scale deployment -n gsp-prd srm-platform --replicas=4
kubectl scale deployment -n gsp-prd srm-purchase-cooperation-job --replicas=1
kubectl scale deployment -n gsp-prd srm-purchase-cooperation --replicas=6
kubectl scale deployment -n gsp-prd srm-report --replicas=2
kubectl scale deployment -n gsp-prd srm-sada --replicas=2
kubectl scale deployment -n gsp-prd srm-scheduler --replicas=2
kubectl scale deployment -n gsp-prd srm-script-container --replicas=2
kubectl scale deployment -n gsp-prd srm-settle-account --replicas=4
kubectl scale deployment -n gsp-prd srm-source --replicas=4
kubectl scale deployment -n gsp-prd srm-supplier --replicas=2
kubectl scale deployment -n gsp-prd srm-swagger --replicas=2
kubectl scale deployment -n gsp-prd srm-workflow --replicas=2
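The long list of scale commands above can also be driven from a single name-to-replicas map, which makes the target counts easier to review in one place. A sketch using a bash associative array, with only a few of the entries shown; `echo` is left in front of kubectl so the generated commands can be checked before running:

```shell
#!/bin/bash
# Sketch: scale deployments from a name->replicas map.
# Only a sample of the full list above is included here.
declare -A replicas=(
  [srm-admin]=2
  [srm-gateway]=2
  [srm-purchase-cooperation]=6
)

for d in "${!replicas[@]}"; do
  echo kubectl scale deployment -n gsp-prd "$d" --replicas="${replicas[$d]}"
done
```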
10. Install the Jenkins tools
jenkins/jenkins:2.289.3-lts
Install the tools:
apt-get install maven
apt-get install apt-transport-https ca-certificates curl gnupg2 software-properties-common vim
add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/debian $(lsb_release -cs) stable"
curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/debian/gpg | apt-key add -
apt-get update
apt-get install docker-ce docker-ce-cli containerd.io
service docker start
curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x ./kubectl
mv ./kubectl /usr/local/bin/kubectl
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list
apt-get update && apt-get install yarn
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install
aws configure
# When prompted for the region, enter: ap-northeast-1
apt install python
wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
python get-pip.py
pip install coscmd
coscmd config -a xxx -s xxx -b fr-prd-shanghai-gsp-staticcontent-1302901213 -r ap-shanghai
vim /etc/profile
# Add the npm PATH entry:
export PATH=$PATH:/var/jenkins_home/node/nodejs/bin
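After the installs, a quick loop can confirm that each expected tool actually landed on PATH. A sketch; the tool names mirror the installs above:

```shell
#!/bin/bash
# Sketch: report whether each expected tool is available on PATH
# after the installation steps above.
tools=(mvn docker kubectl yarn aws python pip coscmd)

for t in "${tools[@]}"; do
  if command -v "$t" >/dev/null 2>&1; then
    echo "$t: ok"
  else
    echo "$t: MISSING"
  fi
done
```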
11. Create a standby EKS cluster
Use Terraform to create the EKS cluster:
git clone https://github.com/fastretailing/srm-infrastructure.git
cd srm-infrastructure/aws/
cp prd.tfvars prd03.tfvars
Edit prd03.tfvars: update the cluster name, cluster version, and node group names.
region = ""
access_key = ""
secret_key = ""
#---------------------------------------------eksctl----------------------------------------------
eksctl_version = "0.84.0"
eks_name = "prd03-tky-eks-gsp-fr"
eks_version = "1.21"
eksctl_api_version = "eksctl.io/v1alpha5"
eksctl_cluster_revision = "1"
// It is used for the internal service network segment of eks cluster
serviceIPv4CIDR = "192.168.0.0/22"
nodeGroup_0_name = "prd03-a-tky-eks-nodegroup-gsp-fr"
kubeconfig_path = "~/.kube/config"
nodeGroup_0_labels = {
  Name     = "prd03-a-tky-eks-nodegroup-gsp-fr"
  Brand    = "fr"
  Country  = "JP"
  Domain   = "G1Production"
  Env      = "Production"
  Role     = "EKS NodeGroup"
  Segment  = "Private"
  SystemID = "GSP"
}
nodeGroup_0_az = "ap-northeast-1a"
nodeGroup_0_instanceType = "m5a.4xlarge"
nodeGroup_0_desiredCapacity = "2"
// 200 Gi
nodeGroup_0_volumeSize = "200"
nodeGroup_1_name = "prd03-c-tky-eks-nodegroup-gsp-fr"
nodeGroup_1_labels = {
  Name     = "prd03-c-tky-eks-nodegroup-gsp-fr"
  Brand    = "fr"
  Country  = "JP"
  Domain   = "G1Production"
  Env      = "Production"
  Role     = "EKS NodeGroup"
  Segment  = "Private"
  SystemID = "GSP"
}
nodeGroup_1_az = "ap-northeast-1c"
nodeGroup_1_instanceType = "m5a.4xlarge"
nodeGroup_1_desiredCapacity = "2"
nodeGroup_1_volumeSize = "200"
eks_node_key_name = "fr-admin"
iams = [
  {
    iamarn   = "arn:aws:iam::599453524280:role/prd01-tky-gsp-fr-operation"
    username = "prd01-tky-gsp-fr-operation"
    groups   = ["system:masters"]
  }
]
fluent_bit_policy_name = "prd01-tky-fluent-bit-policy-gsp-fr"
nodeGroup_0_maxSize = 2
nodeGroup_0_minSize = 2
nodeGroup_1_maxSize = 2
nodeGroup_1_minSize = 2
# VPC
//----------------------
vpc_id = ""
public_subnet_0_id = ""
public_subnet_1_id = ""
private_subnet_0_id = ""
private_subnet_1_id = ""
secure_subnet_0_id = ""
secure_subnet_1_id = ""
Contact NAKAMURA to run Terraform and create the cluster:
terraform apply -target=module.eksctl -var-file=prd03.tfvars