Spawning an Autoscaling EKS Cluster

Options for creating an EKS cluster are many, among others:

Of course, these solutions give you quite a bare cluster, and the challenge is then to add all the tools needed to be production-ready.

One desirable feature is the ability for the cluster to autoscale depending on the workload. More precisely, when additional load is applied, we are looking to horizontally scale our cluster by increasing the number of nodes.

In this post, we will set up a new cluster from scratch using CDK and take a look at the Cluster Autoscaler component to fulfill this requirement.

⚙️ AWS Setup

First, you will need an AWS account. You can create a new account on the sign-in page or reuse an existing account. Be aware that:

  • EKS is not included in the AWS free tier, meaning you will be billed for the managed Kubernetes control plane.
  • You can use t3.micro instances for free with limitations

Choosing the node instance type

For an EKS setup, you certainly don’t want to use these free t3.micro instances:

  • t2/t3 instances are capped in terms of CPU usage
  • the default AWS CNI plugin assigns one IP from the subnet to each pod, meaning the number of pods running on a node is limited by the networking capacity of the node.

For example, a t3.micro instance supports up to 4 pods (2 ENI * (2 IP - 1) + 2), which is simply too limited. You can find the number of supported pods by instance type here.

To keep it simple and cheap, we will use t3.medium instances, which support up to 17 pods per node. They are not free but quite inexpensive.
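
As a quick sanity check, the formula can be evaluated directly. The snippet below is purely illustrative; the ENI and IP-per-ENI figures come from the AWS per-instance-type ENI limits (t3.micro: 2 ENIs with 2 IPv4 addresses each, t3.medium: 3 ENIs with 6 IPv4 addresses each).

// Max pods per node with the default AWS CNI plugin:
//   maxPods = ENIs * (IPv4 addresses per ENI - 1) + 2
function maxPods(enis: number, ipsPerEni: number): number {
  return enis * (ipsPerEni - 1) + 2;
}

console.log(maxPods(2, 2)); // t3.micro:  2 * (2 - 1) + 2 = 4
console.log(maxPods(3, 6)); // t3.medium: 3 * (6 - 1) + 2 = 17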

By the way, if you want to bypass the pod networking limitation, you need to opt out of the AWS CNI plugin and use another plugin, for example, Calico.

Create an admin user

Operating infrastructure with CDK requires extensive rights. To keep it simple, create a new user and grant it the managed AdministratorAccess policy. In real life, you may want to create a custom IAM policy tailored to your needs.
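
For reference, here is a minimal sketch of what that looks like with the AWS CLI; the user name cdk-admin is purely illustrative.

> aws iam create-user --user-name cdk-admin
> aws iam attach-user-policy --user-name cdk-admin --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
> aws iam create-access-key --user-name cdk-admin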

Install CDK

Follow the official documentation to get started with CDK.
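
In short, assuming Node.js and npm are already installed, it boils down to installing the CDK CLI globally and checking the version:

> npm install -g aws-cdk
> cdk --version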

📝 Write the CDK stack

Let’s begin by creating a dedicated VPC and a basic cluster.

// imports at the top of the stack file, assuming CDK v1 modules as used throughout this post
import * as ec2 from "@aws-cdk/aws-ec2";
import * as eks from "@aws-cdk/aws-eks";

const vpc = new ec2.Vpc(this, "eks-vpc");
const cluster = new eks.Cluster(this, "Cluster", {
  vpc: vpc,
  defaultCapacity: 0, // we want to manage capacity ourselves
  version: eks.KubernetesVersion.V1_17,
});

There are still no compute resources allocated. Let’s add a default managed node group for this cluster, using between 1 and 3 instances:

// managed node group backed by an EC2 Auto Scaling group
const ng = cluster.addNodegroup("nodegroup", {
  instanceType: new ec2.InstanceType("t3.medium"),
  minSize: 1,
  maxSize: 3,
});

Managed node groups are automatically kept up to date by AWS and are created in an Autoscaling Group.

🎬 Run the stack

First, you will need to bootstrap the CDK tool, which creates a bucket for the CloudFormation manifests, then run the stack. Grab a ☕️ as it will take more than 10 minutes.

> cdk bootstrap
> cdk deploy

🔑 Get your Kubernetes credentials

You should get back some outputs, including the AWS CLI command to generate your kubectl configuration:

CdkStack.testclusterClusterConfigCommand1851F735 = aws eks update-kubeconfig --name testclusterCluster00507BD3-639846f8ec5241a69f54eabd38c730a0 --region us-east-1 --role-arn arn:aws:iam::xxx:role/CdkStack-testclusterClusterMastersRoleAAD0ED84-DR14A5TYS195

Note that a master role has been created for you. Run this command to generate your kubectl configuration.

Then validate you can access your new cluster with a simple command:

> kubectl get no
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-165-141.ec2.internal   Ready    <none>   48m   v1.17.9-eks-4c6976

We now have a working cluster with one node, as expected.

⚖️ Autoscale the cluster

Our nodes are part of an Autoscaling Group which allows us to easily change the number of running nodes from 1 to 3.

Of course, we want to automate this scaling depending on the load. One classical way of doing this is to add some automatic scaling rules based on CPU, RAM, or other custom metrics.

But let’s consider a basic use case:

  • You want to run a pod with high memory or CPU requirements
  • Your current nodes are underutilized, so there’s no memory or CPU pressure to trigger your Autoscaling Group rules
  • You submit the pod manifest
  • The scheduler cannot find a node with enough resources to accommodate the pod, even though there is plenty of capacity globally
  • Your pod is left unscheduled!

To solve such a use case we need a more Kubernetes-specific approach to autoscaling, one that can detect unscheduled pods. Enter the Cluster Autoscaler.

The Cluster Autoscaler is capable of detecting pods left unscheduled due to resource constraints and increasing the node count accordingly. Conversely, when nodes are underutilized, pods are rescheduled onto other nodes and the node count is decreased.

🛠 Installing the Cluster Autoscaler

Because we want the Cluster Autoscaler to be a fundamental part of our cluster, we can add it to our CDK stack.

Let’s add some code. The main steps are:

  • Create a policy allowing the Cluster Autoscaler to do its job
  • Attach this policy to the managed node group role
  • Tag the nodes properly to allow the autoscaler to auto-discover them
  • Install the Cluster Autoscaler manifest

Another option would have been to use Helm to install the autoscaler manifests, but it’s simpler this way as everything is in the same place.
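
For reference, the Helm route could also be kept inside the CDK stack using the eks.HelmChart construct. The sketch below is only an assumption-level example: the chart name, repository, and values follow the upstream cluster-autoscaler Helm chart and would need to be checked against its documentation.

// hypothetical sketch: installing the Cluster Autoscaler from its Helm chart instead of raw manifests
new eks.HelmChart(this, "cluster-autoscaler-chart", {
  cluster,
  chart: "cluster-autoscaler",                            // assumed chart name
  repository: "https://kubernetes.github.io/autoscaler",  // assumed chart repository
  namespace: "kube-system",
  values: {
    autoDiscovery: { clusterName: cluster.clusterName },  // assumed values layout
    awsRegion: this.region,
  },
});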

// assumes additional CDK v1 imports: iam from "@aws-cdk/aws-iam", cdk and CfnJson from "@aws-cdk/core"
enableAutoscaling(cluster: eks.Cluster, ng: eks.Nodegroup, version: string = "v1.17.3") {
    // IAM policy letting the autoscaler inspect and resize the Auto Scaling group
    const autoscalerStmt = new iam.PolicyStatement();
    autoscalerStmt.addResources("*");
    autoscalerStmt.addActions(
      "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:DescribeAutoScalingInstances",
      "autoscaling:DescribeLaunchConfigurations",
      "autoscaling:DescribeTags",
      "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup",
      "ec2:DescribeLaunchTemplateVersions"
    );
    const autoscalerPolicy = new iam.Policy(this, "cluster-autoscaler-policy", {
      policyName: "ClusterAutoscalerPolicy",
      statements: [autoscalerStmt],
    });
    // attach the policy to the role assumed by the managed node group instances
    autoscalerPolicy.attachToRole(ng.role);

    // tag the node group so the autoscaler can auto-discover it
    const clusterName = new CfnJson(this, "clusterName", {
      value: cluster.clusterName,
    });
    cdk.Tag.add(ng, `k8s.io/cluster-autoscaler/${clusterName}`, "owned", { applyToLaunchedInstances: true });
    cdk.Tag.add(ng, "k8s.io/cluster-autoscaler/enabled", "true", { applyToLaunchedInstances: true });

    // install the Cluster Autoscaler Kubernetes manifests (ServiceAccount, RBAC, Deployment, ...)
    new eks.KubernetesManifest(this, "cluster-autoscaler", {
      cluster,
      manifest: [  {
          apiVersion: "v1",
          kind: "ServiceAccount", 
          ... }], // full code is available here: https://gist.github.com/esys/bb7bbeb44565f85f48b3112a8d73a092
    });
}

Note: we use the CfnJson object to wrap the clusterName, because a CDK token (a value that is not yet resolved) cannot be used as a key in tags. By doing this, the tagging operation is delayed until the clusterName is resolved.
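
For completeness, the enableAutoscaling method above is then invoked from the stack constructor, right after the cluster and node group have been created:

// in the stack constructor, after the cluster and node group definitions
this.enableAutoscaling(cluster, ng);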

Update the CDK stack again with cdk deploy and check that everything is installed correctly:

> kubectl -n kube-system get all -l app=cluster-autoscaler
NAME                                      READY   STATUS    RESTARTS   AGE
pod/cluster-autoscaler-57dfd566f9-27j29   1/1     Running   0          70m

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-autoscaler   1/1     1            1           103m

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-autoscaler-57dfd566f9   1         1         1       103m

🏋🏼‍♂️ Testing the autoscaling capability

First, create a Deployment manifest for the Nginx image. Note that we ask Kubernetes to guarantee a comfortable amount of CPU and RAM for each pod (see the limits/requests section).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-scale
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        service: nginx
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-scale
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi

Apply it to your cluster; you should have exactly one replica of Nginx running:

> kubectl apply -f nginx-deploy.yaml
> kubectl get po -l app=nginx
NAME                           READY   STATUS    RESTARTS   AGE
nginx-scale-69644568d9-t995l   1/1     Running   0          68s

Let’s scale up the number of replicas and watch what happens:

> kubectl scale deploy nginx-scale --replicas=5
> kubectl get po -l app=nginx -w
NAME                           READY   STATUS    RESTARTS   AGE
nginx-scale-69644568d9-ng7nj   1/1     Running   0          27s
nginx-scale-69644568d9-pv62h   0/1     Pending   0          27s
nginx-scale-69644568d9-sng2v   0/1     Pending   0          27s
nginx-scale-69644568d9-t995l   1/1     Running   0          6m42s
nginx-scale-69644568d9-xfwkf   1/1     Running   0          27s

Of the 5 requested replicas, 2 stay pending. If you describe the pending pods, you should see there is not enough CPU:

> kubectl describe po nginx-scale-69644568d9-pv62h
...
0/1 nodes are available: 1 Insufficient cpu.

If you look at the Cluster Autoscaler logs, you can see that a scale-up to 2 instances is in progress:

> kubectl -n kube-system logs deployment/cluster-autoscaler | grep Scale-up
I0904 13:43:55.584100       1 scale_up.go:700] Scale-up: setting group eks-4eba2c80-0b01-9821-207d-572d47bedd1a size to 2
...

The pending pods should soon be running, and the node count is appropriately scaled up to 2 🎉

> kubectl get no
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-165-141.ec2.internal   Ready    <none>   107m    v1.17.9-eks-4c6976
ip-10-0-228-170.ec2.internal   Ready    <none>   7m19s   v1.17.9-eks-4c6976

The reverse should also be true: if you scale down the number of replicas, your cluster should shrink after a few minutes.
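
For example, scaling the deployment back down and watching the nodes should eventually show the cluster returning to a single node:

> kubectl scale deploy nginx-scale --replicas=1
> kubectl get no -w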

🧹 Cleaning up

Simply destroy the CDK stack by running the cdk destroy command. Note that the CDK bootstrapping resources have to be cleaned up by hand!
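
For reference, assuming the default bootstrap stack name CDKToolkit, the manual cleanup looks roughly like this (the staging bucket name is account-specific, so the placeholder below is illustrative):

> aws s3 rb s3://<your-cdktoolkit-staging-bucket> --force
> aws cloudformation delete-stack --stack-name CDKToolkit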

Thank you for reading! 🙇‍♂️ You can find the complete CDK code example here.


Subscribe to FAUN topics and get your weekly curated email of the must-read tech stories, news, and tutorials 🗞️

Follow us on Twitter 🐦 and Facebook 👥 and Instagram 📷 and join our Facebook and Linkedin Groups 💬

If this post was helpful, please click the clap 👏 button below a few times to show your support for the author! ⬇

Translated from: https://medium.com/faun/spawning-an-autoscaling-eks-cluster-52977aa8b467
