Save Up to 50% of Your Kubernetes Costs With Preemptible Instances

Containers have come a long way, and Kubernetes isn’t just changing the technology landscape but also the organisational mindset. With more and more companies moving towards cloud-native technologies, the demand for containers and Kubernetes is ever-increasing.

Kubernetes runs on servers, and servers can either be physical or virtual. With cloud taking a prominent role in the current IT landscape, it’s become much easier to implement near-infinite scaling and to cost optimise your workloads.

Gone are the days when servers were bought in advance, provisioned in racks, and maintained manually. With the cloud, you can spin up and spin down a virtual machine in minutes and pay only for the infrastructure you provision. A great power, indeed!

Most cloud providers, such as AWS and GCP, offer spot or preemptible instances at a much cheaper rate than on-demand instances. The only condition is that the provider can terminate the instance to reclaim resources whenever it needs them. In GCP, preemptible instances are also removed automatically 24 hours after being provisioned.

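For example, provisioning a preemptible VM on GCP is just a matter of adding a flag to the usual create command. The instance name, zone, and machine type below are placeholders:

gcloud compute instances create my-preemptible-vm \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --preemptible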

According to Google, preemptible instances can reduce your compute-engine costs by up to 80%.

That is a win-win situation for both cloud providers and users. While cloud providers benefit by making the best possible use of idle resources during off-peak hours, users can benefit from the low price it offers and run stateless workloads with ease.

Where to Use Preemptible/Spot Instances

Are preemptible instances suitable for all kinds of workloads? No! They fit a fairly narrow niche of use cases, typically machine-learning jobs or other stateless workloads.

Whenever you design anything to run on a preemptible instance, you need to make sure your application is fault-tolerant and can resume processing from the point where it left off.

Preemptible instances are unsuitable for stateful applications, such as databases, where data durability is of utmost importance. But if you need to run applications such as load balancers, a Hadoop cluster, or anything that’s unaffected by state changes, then preemptible instances can be a good option.

Preemptible Instances on GKE

The Google Kubernetes Engine (GKE) is a managed Kubernetes service offered by Google Cloud. It’s one of the most robust and feature-rich Kubernetes clusters available, and one of the features it offers is allowing you to provision a preemptible Kubernetes node pool. That means you can take advantage of the cost savings in your container workloads as well.

If you’re running microservices that don’t store state, this is a gold mine for you. With the right architecture, you can save a lot of money by combining on-demand and preemptible instances in separate node pools of the Kubernetes cluster and taking advantage of the cluster-autoscaling feature GKE provides.

Running Load Balancers on GKE

Load balancers are expensive resources, and most organisations tend to use Ingress controllers to manage traffic within the Kubernetes cluster. For organisations using a service mesh like Istio, the Istio Ingress controller offers similar functionality.

These resources are stateless and can be good candidates for running on preemptible worker nodes. We’ll use a hybrid strategy here, where we’ll allow the workloads to spread between preemptible and ordinary nodes. That’s because if the preemptible nodes are taken back, we want to ensure customers don’t notice a service degradation.

We’ll install the NGINX ingress controllers on a GKE cluster and use preemptible nodes to run them. So let’s get started.

Creating the Cluster

Let’s start by creating a GKE cluster with two worker nodes in the default node pool.

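A command along the following lines does the job; the cluster name, zone, and machine type here are illustrative choices, not the exact values used in the original setup:

gcloud container clusters create preemptible-demo \
  --zone us-central1-a \
  --num-nodes 2 \
  --machine-type e2-standard-2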

Now let’s create a preemptible node pool and attach it to the cluster.

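Something like the following creates the pool. The pool name, machine type, and taint key are assumptions for illustration, while the --preemptible flag, the two-to-six-node autoscaling range, and the NoSchedule taint reflect the setup described below:

gcloud container node-pools create preemptible-pool \
  --cluster preemptible-demo \
  --zone us-central1-a \
  --machine-type e2-standard-2 \
  --num-nodes 2 \
  --preemptible \
  --enable-autoscaling --min-nodes 2 --max-nodes 6 \
  --node-taints cloud.google.com/gke-preemptible=true:NoSchedule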

Right, so we now have a Kubernetes cluster with two regular worker nodes on the default node pool and two preemptible worker nodes on the preemptible node pool. The preemptible node pool has autoscaling enabled and can range from two to six nodes.

Assigning a Taint to the Preemptible Nodes

As you may have noticed, we have also assigned a taint to the preemptible node pool, as we don’t want all workloads to be scheduled on the preemptible nodes. It’s only the Ingress controller pods that we’d like to assign to the preemptible nodes. The rest should remain on the standard nodes.

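You can verify the taint on the new nodes. GKE also labels preemptible nodes with cloud.google.com/gke-preemptible=true, which comes in handy for the node affinity later (the node name below is a placeholder):

kubectl get nodes -l cloud.google.com/gke-preemptible=true
kubectl describe node <preemptible-node-name> | grep Taints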

Deploying the Ingress Controller

To deploy the Ingress controller, download the NGINX Ingress controller manifest.

wget https://raw.githubusercontent.com/bharatmicrosystems/nginx-lb/master/deploy.yaml

I’ve modified it to include a toleration for the NoSchedule taint on the preemptible node pool and a node affinity that prefers the preemptible node pool during scheduling.

If you don’t want it to prefer preemptible nodes and want Kubernetes to decide the best node, remove the nodeAffinity section from the deploy.yaml file.

Toleration
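
The toleration added to the controller Deployment's pod spec looks roughly like this, assuming the same taint key used when creating the node pool above:

tolerations:
  - key: cloud.google.com/gke-preemptible
    operator: Equal
    value: "true"
    effect: NoSchedule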

Node affinity
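
The node affinity is a preferred (soft) rule against GKE's preemptible node label, roughly as follows:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: cloud.google.com/gke-preemptible
              operator: In
              values:
                - "true"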

Apply the manifest:

kubectl apply -f deploy.yaml

Once deployed, we should be able to see the pod in the ingress-nginx namespace.

$ kubectl get pod -n ingress-nginx
NAME READY STATUS RESTARTS AGE
ingress-nginx-controller-6b6855f9cb-sj4mw 1/1 Running 0 3m16s

Let’s wait for some time for the cloud provider to allocate an external IP to the ingress-nginx-controller load-balancer service, and then query the service to find out its external IP.

$ kubectl get svc -n ingress-nginx ingress-nginx-controller
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller LoadBalancer 10.8.6.193 35.224.126.9 80:32546/TCP,443:32267/TCP 10m

Let’s now create a HorizontalPodAutoscaler for the Ingress controller so it can scale within the cluster. That’s necessary because all requests to the workloads are routed via the Ingress controller pods, so we want them to scale with traffic.

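A one-liner is enough here. The 70% CPU target matches what we discuss below, while the minimum and maximum replica counts are illustrative assumptions:

kubectl autoscale deployment ingress-nginx-controller -n ingress-nginx \
  --cpu-percent=70 --min=1 --max=10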

Wait for some time for the HPA to kick in, and soon we will find that the nginx-ingress-controller has two pods.

The interesting bit here is that both Ingress controller pods are provisioned in the preemptible pool. That’s because the node affinity prefers it, but when the load increases, we’ll see them burst across the entire cluster.

Deploying the Sample Application

Now let’s deploy a sample application that the Ingress controller would serve via the load balancer. Let’s deploy two versions of the application to understand how it works.

Let’s create the deployment and service for v1.

$ kubectl create deployment app-v1 --image=bharamicrosystems/nginx:v1
deployment.apps/app-v1 created
$ kubectl expose deployment app-v1 --port 80
service/app-v1 exposed

Then create a deployment and service for v2.

$ kubectl create deployment app-v2 --image=bharamicrosystems/nginx:v2
deployment.apps/app-v2 created
$ kubectl expose deployment app-v2 --port 80
service/app-v2 exposed

Deploying the Ingress Resources

Now let’s expose the app-v1 and app-v2 externally via an Ingress resource.

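An Ingress resource along these lines routes the two paths to the respective services. The resource name is arbitrary, the ingressClassName assumes the controller registered an IngressClass called nginx (older manifests use the kubernetes.io/ingress.class annotation instead), and depending on how the sample images serve their content a rewrite annotation may also be needed:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /v1
            pathType: Prefix
            backend:
              service:
                name: app-v1
                port:
                  number: 80
          - path: /v2
            pathType: Prefix
            backend:
              service:
                name: app-v2
                port:
                  number: 80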

As we defined, if we hit the Ingress controller load balancer with /v1, we should get a response from app-v1 (and /v2 from app-v2). Let’s curl to see for ourselves.

$ curl 35.224.126.9/v1
This is version 1
$ curl 35.224.126.9/v2
This is version 2

The Ingress rules are working perfectly fine.

Load Testing the Cluster

Let’s make things interesting and do some load testing. We’ll use the hey utility for this test, but feel free to use any load-testing tool you want.

We’ve configured the nginx-ingress-controller pods to autoscale when CPU utilisation rises beyond the 70% target, and we’ve set the pod’s CPU limit to 200 millicores. So anything exceeding 140m of CPU per pod should spin up another pod to handle the load.

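For reference, the relevant part of the controller container spec would look something like this. The values are assumptions: the CPU request is set equal to the 200m limit, since the HPA's percentage target is measured against requests, which is what makes 140m the tipping point:

resources:
  requests:
    cpu: 200m      # assumed equal to the limit so that 70% of it is 140m
    memory: 256Mi  # illustrative value
  limits:
    cpu: 200m
    memory: 256Mi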

Let’s plan for 1000 concurrent requests for 300s for v1 and v2. That should spin up several pod replicas in the preemptible node pool.

I’ll run the tests in two separate terminals. In a third terminal, I’ll run watch 'kubectl top pod -n ingress-nginx'.

Let’s crack on with the hey command in the first terminal:

$ hey -z 300s -c 1000 http://35.224.126.9/v1

And the second terminal simultaneously:

$ hey -z 300s -c 1000 http://35.224.126.9/v2

In the third terminal, watching kubectl top pod, we see that as the load increases, Kubernetes spins up additional NGINX Ingress controller pods.

If we get the pods, we see that most of them have landed on the preemptible nodes, with one pod deployed on the default node pool. That’s because the affinity merely prefers preemptible nodes; it’s a soft rule, so if we lose the preemptible nodes, the pods can still be scheduled on the regular node pool.

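The wide output shows the node each controller pod landed on:

kubectl get pod -n ingress-nginx -o wide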

If we get the nodes, we see GKE node autoscaling in action. GKE has created another preemptible node in the node pool to handle the additional load.

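Filtering on the preemptible label (assuming GKE's standard cloud.google.com/gke-preemptible label) makes the new node easy to spot:

kubectl get nodes -l cloud.google.com/gke-preemptible=true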

Conclusion

Running your stateless applications on preemptible nodes is a decent cost-savings strategy. They’re quick to recover and replace, and if a node is taken away, they can quickly burst into other available nodes.

Using preemptible nodes for running your Ingress controllers might be a good strategy if your application SLO can tolerate some error and you don’t have a very high-availability requirement.

It takes a minute to spin up a new preemptible node when GCP takes one away. Spinning up another pod in a new node takes 10 seconds. You need to decide if your SLO allows a disruption of one minute every 24 hours. If not, then you should instead stick to regular nodes.

You can minimise disruption by creating the node pool during the least busy period of the day. Smaller preemptible nodes usually survive the full 24 hours, so they tend to be reclaimed around the same quiet time of day they were created, when there aren’t many users around.

It’s also possible to drain the nodes using a shutdown script within the node VM. That’ll ensure better availability.

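A minimal sketch of such a script, set as the instance's shutdown-script metadata, might look like the following. Bear in mind that preemptible VMs only get about 30 seconds of notice, and the node needs credentials to reach the API server, so treat this as a best-effort measure rather than a guarantee:

#!/bin/bash
# Best-effort cordon and drain of this node before the preemptible VM is reclaimed.
NODE_NAME=$(hostname)
kubectl cordon "${NODE_NAME}"
kubectl drain "${NODE_NAME}" --ignore-daemonsets --delete-emptydir-data \
  --force --grace-period=10 --timeout=25s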

Thanks for reading — I hope you enjoyed the article.

Translated from: https://medium.com/better-programming/save-up-to-50-of-your-kubernetes-costs-with-preemptible-instances-bdade99ccd39
