Accelerate your Kubernetes journey with Amazon EKS

Good afternoon and welcome to K206, Accelerate your Kubernetes journey with Amazon EKS. In this session, you'll hear about the major concerns of customers running Kubernetes workloads in production - concerns like managing the control plane, scaling the cluster, and let's not forget about upgrades.

You'll also hear how Amazon EKS was built to address those concerns and provide you with production-grade Kubernetes. We will also provide an insight into some EKS best practices that come from working with our customers.

Right at the end, we'll finish with an exciting announcement that will give you an opportunity to be amongst the first people in the world to unlock a new achievement.

My name is Liz Duke and I'm a Principal Specialist SA for containers at AWS. And I'll be joined on stage by Prati Goer, who's a Senior Software Engineer on our Kubernetes team.

I wanted to start off by talking about the impact that Kubernetes has had on container orchestration. It's huge. Kubernetes is open source and it's a graduated project from the CNCF, or Cloud Native Computing Foundation.

I don't know if you've seen all the projects the CNCF has. It's a little like this. Yeah, I know, I know, you can't read it, right? This is a slide from a couple of years ago, as I can't fit all the projects on one slide these days.

So with all this choice, it can be hard to know where to start. EKS simplifies a lot of those options for you. Not that you couldn't include these projects, or maybe just a few of them if you wanted to, as EKS is upstream Kubernetes conformant, but EKS comes with prebuilt options for things like logging, the container network interface and ingress.

EKS also provides you with support for Kubernetes versions that have been battle tested for security and scalability.

Now, from listening to our customers, we hear some common themes as to why customers are moving to containers. They want to move quickly to keep up with the pace of innovation in technology, and microservices help you do that by enabling individual teams to work on specific features.

But even when moving quickly, they don't want to be insecure, so security is paramount. But it's no good having a secure solution with lots of features if it doesn't scale and perform, because your customers will go elsewhere.

In today's economy, it's more important than ever to only pay for what you need when you need it. If you need to expand to serve customers in another geographical region, you want to be able to create that solution quickly without blowing the budget.

Now, more than ever, organizations are looking to modernize legacy systems. And increasingly they're looking to do that on AWS with Kubernetes. This in turn brings its own challenges, especially around matching that Kubernetes release cadence with the additional governance that comes with being an enterprise organization.

Of course, you want that Kubernetes agility and pace of innovation. So how can EKS help you align that speed with your processes, and make it easier for you to not only know the impact of upgrading, but how to improve and derisk that upgrade experience?

Let's see. So what makes up an EKS cluster? Well, an EKS cluster is a Kubernetes cluster, and some of you may be very familiar with Kubernetes and others less so. A Kubernetes cluster is a way to run containers in units called pods, and it consists of two main areas.

You have the control plane which is used for running the cluster and you have the data plane which is used for running the applications themselves inside of pods.

EKS started with a managed control plane, as that's the part that customers told us they had the most problems with - upgrades being a particular pain point, but also scaling it and making it resilient.

So how does EKS help with that? Well, the control plane for EKS is fully managed and resilient. We run two copies of the API server and three copies of etcd, the cluster state store, in an AWS managed account across multiple Availability Zones. The control plane of EKS is designed to withstand single Availability Zone events. And of course, you also get AWS support with your EKS clusters.

As for upgrades of the control plane, you've got to keep on top of these, as Kubernetes now has three releases per year. With EKS, these upgrades are just a single action, either through the console, the API, the CLI, or through our preferred command line tool, eksctl. And we support four current versions of Kubernetes.
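As a sketch of how that single action looks with eksctl, assuming a hypothetical cluster config file: bumping the version field and re-running the upgrade command is all that's needed.

```yaml
# cluster.yaml - hypothetical eksctl ClusterConfig; names and region are examples.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster        # hypothetical cluster name
  region: us-west-2
  version: "1.28"           # target Kubernetes version for the control plane
```

Running `eksctl upgrade cluster -f cluster.yaml --approve` would then upgrade the control plane to the version in the file.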

When you're running Kubernetes clusters, you want to be able to scale your cluster easily. And when you scale your pods and nodes, EKS takes care of scaling the control plane for you behind the scenes. As we learned how the control plane behaved under different conditions, we adjusted our metrics to make that scaling more responsive.

Now, we use a variety of metrics to scale the control plane, including the number of worker nodes and the size of the etcd database. And once we've scaled up the control plane, we won't scale it back down again unless your utilization has dropped below that scaling threshold and remained there for several days.

Security comes built in with the EKS control plane. We apply security patching, including embargoed CVEs - those are Common Vulnerabilities and Exposures. We encrypt the etcd database and we secure access to the control plane.

As I said, EKS supports four Kubernetes versions at any given time, which gives you up to 14 months to upgrade - although we'd really like it if you'd upgrade a bit more often than that. We don't allow alpha feature flags, as in most cases they're not stable enough for use in production.

But what about user access to the cluster itself? Kubernetes doesn't actually come with a user directory. It expects you to provide one while it gets on with its core task of orchestrating the pods.

We do have a service that deals with managing users. You might have heard of it: IAM, or Identity and Access Management. That service scales rather well, and so we integrated it as the user authentication service for the cluster. And the way we did this was to use a Kubernetes object called a ConfigMap.
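As an illustration of that ConfigMap-based integration, here is a minimal sketch of the aws-auth ConfigMap; the account ID and role name are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Map a hypothetical IAM role to Kubernetes cluster-admin permissions.
    - rolearn: arn:aws:iam::111122223333:role/eks-admins
      username: eks-admin
      groups:
        - system:masters
```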

Now, this is a YAML config file, and the feedback we got from our customers was that they didn't really enjoy this method of integration. And so, coming soon, we have cluster access management. Because up until now, what you would do is you'd create the cluster using the EKS APIs and then you'd switch to using the Kubernetes APIs to manage user access.

Now, with cluster access management, you'll be able to control both your user authentication and authorization using one set of APIs, and no more awkward YAML config file. Other AWS services integrated with EKS, like Amazon Elastic MapReduce, can also use cluster access management controls to obtain the permissions they need to run applications on EKS clusters, without the need for cluster admins to perform multiple prerequisite configuration steps.

Now, as I said earlier, Kubernetes releases a new version three times a year and EKS supports four versions at any one time. So to work with Kubernetes, you've got to be working with the release cycle, and we see customers wanting more help in this area.

So one of the areas the service team have worked on has been the upgrade time of the control plane. Now, as I said, upgrading the control plane is a single action, but previously it could take up to around 40 minutes once you'd kicked off that upgrade for it to complete. Now, it should complete in under 10 minutes.

Customers really like that upgrade experience with a single action. So on our roadmap, we've extended that ability to not only upgrade the control plane, but to also upgrade the data plane and add-ons. Where customers also need more help is with the ability to upgrade with confidence.

So if you have multiple teams running on top of the same EKS cluster, you might not know all the APIs that they're calling. And between Kubernetes versions, you get APIs that come in, and you get APIs that are deprecated. So to help with that, on our roadmap, you will be able to generate a report on the readiness of your cluster to upgrade to the next version.

And along with these features will come a metadata API so you'll be able to run these actions programmatically.

Now, with Kubernetes, the support life cycle comes from the Kubernetes project, and that's why EKS follows that same life cycle. At the start of this talk, I mentioned that we supported each Kubernetes version for 14 months, but we know that keeping up with that Kubernetes release cycle is difficult, especially for those customers who might be reliant on third parties testing their applications against new versions of Kubernetes.

Or maybe in your case, you have to have extended change-freeze windows, maybe over the end of a financial year for reporting purposes, or over holiday periods. We've just had Thanksgiving in the US. We're coming up to Christmas. Maybe you need an extended change-freeze period that covers from before Thanksgiving into the new year.

Now you can focus on your business requirements and upgrade when it's convenient for you, because earlier this year we announced extended support for Amazon EKS in preview. This extends that 14-month upgrade period out to 26 months. And best of all, there's no action needed by you to enroll in this. As long as you're on version 1.23 or higher, when your version goes out of normal life cycle support, you'll still be supported.

I've spoken a lot so far about the control plane. But what about the data plane? You've got different options with the data plane for EKS. But again, where customers were telling us that they needed help was in upgrading it and in scaling it. So how does EKS help with that?

Let's first look at the different options you've got for the data plane. The original option for the data plane was self-managed EC2 instances, and that's still available today. With these self-managed instances, you're fully in charge of controlling the life cycle of those instances. You provision them, they exist in your accounts, and you make sure that you upgrade your nodes and manage the life cycle of those instances.

While you're doing that, you also have to be aware that you need to upgrade your AMIs underneath when there are security releases. With EKS managed node groups, which is another option, we step in and take some of that responsibility away from you. We help you both in provisioning the nodes and scaling them, and in handling them during the upgrade process.

And then we've got Karpenter. Karpenter is the new kid on the block. It's an open source autoscaler that works with your EKS clusters to provision the right nodes for your workloads.

Our final option is AWS Fargate. With Fargate, you specify the amount of CPU and memory you require and that's what we bill you for. So you get that granularity of billing down to the pod level.

So, looking at managed node groups a little bit more: managed node groups for EKS come with Auto Scaling groups, which configure the EC2 instances on launch. And managed node groups simplify the worker node upgrades by managing the cordoning and the draining of the nodes, and also by scaling the number of nodes in your Auto Scaling groups, so you don't suffer from reduced capacity during that upgrade process.

Now, this used to be done on a node-by-node basis. But we've got customers with a large number of nodes in their node group, so this really didn't work for them. And so we extended this to make it possible for you to upgrade nodes in parallel.

So now you can select either a percentage of nodes to upgrade at the same time or a specific number of nodes to upgrade at the same time. And when there's a new AMI available for your managed node group, you'll be notified.
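For example, with eksctl, parallel node upgrades can be configured on a managed node group roughly like this (cluster and node group names are hypothetical):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster          # hypothetical cluster
  region: us-west-2
managedNodeGroups:
  - name: workers             # hypothetical node group
    instanceType: m5.large
    desiredCapacity: 6
    updateConfig:
      # Upgrade up to 25% of the nodes in parallel instead of one at a time.
      maxUnavailablePercentage: 25
```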

So why am I calling Karpenter intelligent compute for EKS? Well, when you want more copies of your application running in more pods, then you need more instances. Normally, this works in EKS by integrating with the Cluster Autoscaler. The Cluster Autoscaler then talks to the EC2 Auto Scaling group and says, I need more instances. These nodes would then join the cluster, and the pods which have been in pending status get scheduled on the nodes.

But there's always been a disconnect between the Auto Scaling groups and the EKS cluster because the Auto Scaling group doesn't know or care about the cluster. All it knows is that it's been told more instances are needed. Auto Scaling groups are also seen as being more restrictive in what nodes can be part of an Auto Scaling group and diversifying the Auto Scaling group is also seen as a pain point by customers.

So the EKS service team created Karpenter. This interfaces directly with the EC2 Fleet API, and it selects the most optimal nodes based on what's in your pod specification. So say, for example, your application needs to run on GPUs instead of CPUs and you put that in your pod specification. Karpenter will go out and find a node that suits. Karpenter also looks to see how it can pack pods onto nodes, doing bin packing on your behalf.
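As a sketch, a pod spec that asks for a GPU might look like this; Karpenter reads the resource request and the well-known instance label, then provisions a suitable node. The image and constraint are hypothetical, and a matching Karpenter provisioner is assumed to exist in the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference            # hypothetical workload
spec:
  containers:
    - name: app
      image: registry.example.com/inference:latest   # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1      # Karpenter provisions a GPU-capable node for this
  nodeSelector:
    karpenter.k8s.aws/instance-category: g           # ask for a g-family instance
```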

Karpenter also will consolidate pods onto smaller nodes if the demand on your application reduces and your nodes and pods scale in. Today, we have many customers running Karpenter in production, seeing not only great performance but also awesome optimization, leading to impressive cost savings across their clusters.

Now, if you don't want to manage compute at all, not even the tiniest bit, you could use our managed offering of EKS Fargate. With Fargate, we manage the patching, the updating and the scaling of the underlying nodes and we don't share those underlying nodes with other pods, whether it's for another customer or within the same account.

There are some considerations, as you can see from the slide. But with Fargate, you get to truly focus on spending time on your business applications rather than managing infrastructure. Moving on to networking: we implement the EKS VPC CNI, a container network interface.

By default, when you build an EKS cluster, the VPC CNI takes care of requesting a new VPC IP address for every pod that's created. Now in Kubernetes, by default, every pod can talk to every other pod too. So to restrict traffic between pods, the Kubernetes native way, you'd implement network policies.

Previously, customers would have to add in an additional third party CNI to implement network policies in the EKS cluster. And this was more work both to manage and to keep updated. So the support for network policies built into the VPC CNI was one of the most requested features on our roadmap. And earlier on this year, we released that feature.

We implemented the feature using eBPF which has a lot of advantages over more traditional methods such as IP tables. Some of these advantages are things like more efficient packet filtering which can lead to better performance using network policies. You can implement granular network restrictions between pods and implement least privilege at the pod level. This adds to your defense in depth strategy for EKS.
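A minimal sketch of such a network policy, assuming hypothetical `frontend` and `backend` labels in a hypothetical namespace: only pods labeled `app: frontend` may reach the backend pods, and only on port 8080.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-from-frontend-only
  namespace: demo                 # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: backend                # policy applies to backend pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend       # only frontend pods are allowed in
      ports:
        - protocol: TCP
          port: 8080
```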

Now, Kubernetes clusters rarely exist in isolation; they're part of a workload environment. Maybe your application requires a message queue like Amazon SQS, and most likely you'll be integrating against one or more databases like Amazon DynamoDB or Amazon RDS. So you need to think about the security of integrating with those other AWS services. I already mentioned using IAM for user access and network policies to restrict network traffic.

But what about using IAM roles to restrict access to other AWS services? With EKS, you can do that as well. And what about sensitive information like passwords? Kubernetes stores sensitive information, just like other container orchestrators in objects called secrets. These are stored in etcd.

Now, by default in Kubernetes, these are only base64 encoded. But with EKS, you not only get encryption of etcd by default, which protects your secrets where they're stored, but you can also use KMS envelope encryption, which protects your secrets where they're used.
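With eksctl, enabling KMS envelope encryption for secrets is roughly a one-field addition to the cluster config; the key ARN below is a placeholder, not a real key.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster              # hypothetical cluster
  region: us-west-2
secretsEncryption:
  # Placeholder KMS key ARN - substitute your own customer managed key.
  keyARN: arn:aws:kms:us-west-2:111122223333:key/example-key-id
```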

And maybe you want to run analytics workloads on EKS using Amazon Elastic MapReduce on EKS. You can use a single EKS cluster to run multiple Apache Spark versions and configurations along with automated provisioning scaling and faster run times.

Another higher level service like EMR that's integrated directly with EKS is Amazon GuardDuty. Amazon GuardDuty is a managed threat detection service that provides you with a more accurate and easy way to continuously monitor and protect your AWS accounts and workloads.

You've been able to monitor the EKS API actions using AWS CloudTrail and have access to the Kubernetes audit logs for a long time. Now you can continuously monitor all your Amazon EKS clusters across your organization with just a few steps in the GuardDuty console, and you can protect your EKS workloads with a new GuardDuty security agent that adds visibility into individual container runtime activities.

Amazon GuardDuty EKS Runtime Protection adds a layer of security with very little work required by you to implement. You can enable it with just two steps: the first step enables GuardDuty runtime monitoring for EKS in your account, and the second step deploys the agent across your clusters.

Now GuardDuty starts looking for unusual and suspicious behavior. So maybe your pods are trying to contact the IP addresses of known command and control servers to make them part of a botnet or maybe they're trying to look at the host metadata to gain access to IAM credentials. You can then make informed decisions based on those findings.

You've probably got at least one operations team, and maybe more. So to make it easy for centralized operations teams to know about your Kubernetes clusters, Amazon EKS has a console that can be used to see all of your Kubernetes clusters, whether they are EKS clusters or self-managed clusters running on AWS, on premises, or even on other clouds. It does this using the EKS Connector.

There's always been a tension between making sure your workloads operate at the right sort of level of performance and give you the right availability and balancing that against the cost to run those workloads. This can be more of a challenge when we're looking at doing this in Kubernetes because you could have multiple teams running on the same underlying cluster.

So you need to be allocating the costs to those teams accurately. Or maybe you're actually running a multi-tenant cluster for your customers and you need to charge those customers the right amount for the resources they're using. So you're doing chargeback or showback and most likely you want to be doing reporting, budget forecasting, cost optimization.

There was a study done by the FinOps Foundation which surveyed customers and found that less than 66% of organizations had any form of detailed Kubernetes cost monitoring, and only 22% of the organizations surveyed could accurately predict their Kubernetes costs within 5% or better.

So what drives these costs? What their survey found, and what we see, is that the majority of costs for your Kubernetes clusters - and it's no different on EKS than on other Kubernetes clusters - comes from your compute and memory. It's over 90% of your overall costs.

Yes, you might have other costs that come from, for example, running third party applications on your cluster. So you might be paying license fees for those. Maybe you've got some storage, if you've got stateful workloads or maybe you're sending data across Availability Zones and you've got some data transfer costs.

In fact, the cost for the managed control plane is almost insignificant compared to these other costs. And if these costs are coming from applications which have been specified incorrectly for how much CPU and memory they need, then what we see is a lot of underutilized resources because you know what it's like, right?

You ask somebody, how much CPU and memory do you need to run this application? And more than likely, the first initial estimate is a finger in the air: this is how much I need. Which is fine, as long as you're continually measuring and adjusting that based on the performance of the applications. Otherwise, the scheduler just looks at how much you've specified for that pod and looks to find the available amount of resource within your cluster.

And if you're using an Auto Scaling Group and that resource doesn't exist, it'll spin up more resource to host that pod. So you need to be aware of what you're spending on your pods. So when you're looking to optimize costs, you not only need to measure the performance of your applications inside your pods, but how much those are costing you, whether it's at the pod level or more commonly at the namespace level.
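Those finger-in-the-air estimates live in the pod spec's resource requests, which is what the scheduler - and your cost reporting - works from. A minimal sketch, with hypothetical numbers that would need to be validated against real measurements:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                       # hypothetical app
spec:
  containers:
    - name: web
      image: public.ecr.aws/nginx/nginx:latest
      resources:
        requests:                 # what the scheduler reserves - drives node sizing and cost
          cpu: 250m
          memory: 256Mi
        limits:                   # cap to contain a misbehaving container
          memory: 512Mi
```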

A lot of customers have been using Kubecost for this. Now, originally Kubecost came in two different flavors. You had the open source version which came with a limited feature set, support from the community but was free. And then you had the enterprise version which was supported directly by Kubecost, came with a richer feature set but came with an enterprise license cost.

With EKS, you've got a third option and that's the EKS Optimized Kubecost bundle. We've partnered with Kubecost to give you a version of Kubecost with more features than the open source version. You get support direct from AWS with this version of Kubecost, but you don't get charged for it. We pick up the licensing fee for that.

You can deploy this version of Kubecost onto your EKS clusters by using a Helm chart, a managed add-on, or an add-on from the AWS Marketplace. And you can also now get a unified view across your clusters for the costs that you're incurring by using this optimized Kubecost bundle.

So whether you take advantage of the depth and breadth of services that AWS offers to build new cloud native applications, or you want to use third party software applications on your EKS clusters, AWS can help. The AWS Marketplace and Amazon EKS have partnered to give customers a better experience.

You can deploy software solutions from providers as an add-on. Some of the available options are Crossplane from Upbound, Apache Kafka from Confluent or Dynatrace from Dynatrace.

So you might be thinking, oh, this is great if I want to run Kubernetes in the cloud. But what about when I need it in a particular country because of data residency requirements? Or maybe you've just invested a lot of money in your on-premises infrastructure and you'd like to make use of it before you retire it.

We have as a goal to provide our customers Kubernetes how they want it, where they need it. This includes in the cloud, on-premises and everywhere in between. So hopefully, you can see why Amazon EKS offers you production grade Kubernetes and how we're always focusing on continually improving the service to better enable our customers with that.

I'd like to hand over to Cedric who's going to cover some common issues and best practices to help you get up and running quickly with EKS.

When the demand reduces, you would definitely want this to be seamless and fast, without customers noticing it. It is recommended to configure and run a node autoscaler - either Karpenter or Cluster Autoscaler - for managing your nodes for this purpose.

The EKS team developed Karpenter, as we saw in earlier slides, to help users save infra costs and have efficient resource utilization in such scenarios.

Everything is great. You have your application auto scaling in and out, saving you hundreds of thousands of dollars in infra and operation costs when not required, and you are able to serve your customers best. However, suddenly you see some of the pods are not running or serving any customer requests anymore.

Let's first see, as an EKS user, what tools are available and how you can use these tools to diagnose such scenarios. Starting with the control plane, you would want to monitor any errors that are happening in your cluster, or just how your cluster is performing, and check how requests are being processed by the API server running in your control plane.

Next, you would want to monitor all the nodes that are running, in case they are running into any kind of issues, and other critical components like storage or networking, to check if there are any network-related issues that might be causing problems in the cluster. You start with your metrics and logs from all these components and start to analyze them. But how do you proceed? Which tools do you even use to start looking for these metrics and logs?

Let's see what are the tools available at your disposal to make it easy for you to collect and visualize all the cluster insights.

EKS provides all the logs and metrics from the control plane and data plane in CloudWatch, in your account. Users can run CloudWatch Insights queries to analyze the control plane and check how requests are handled by it.

Similarly, they can run these insights to scrape metrics from your workloads related to resource utilization on the nodes, pods, network performance, and the status of your nodes. CloudWatch also allows you to have prebuilt dashboards for metrics, giving you a single pane of glass for all your metrics.

Also, you can publish all these metrics to Amazon Managed Prometheus through the agentless feature, which was just announced a couple of days ago.

Next up is logging. You can configure your cluster to publish all the control plane and data plane logs to your CloudWatch account. For the data plane, you would need to install a DaemonSet, something like Fluent Bit. And once all the logs are published to CloudWatch, Container Insights classifies these logs into three different categories.

Starting with your application logs: these are directly streamed into your application log groups. Next are your host logs: these are things like var/log messages, which go into your host log group per node. And the last category is your data plane logs: these are for components like kube-proxy, aws-node, or the container runtime, which run on every worker node and are responsible for maintaining running pods, and are captured as data plane logs.

Great. Now that you have monitoring for your metrics and logging established for your cluster, with the help of these tools and the data you are collecting, let's see how we can narrow down why the pods are not running in that cluster we started with.

Let's start with checking what's happening in the control plane, where you can check your API server metrics and API server audit logs. Starting with the metrics, you can check that your API server is healthy and available, there are no delays from the API server, and there are no errors for the requests that are being sent to the API server. So in terms of API server metrics, everything looks good.

Let's continue looking at the audit logs, which will help us answer some deep insights into API server operation, for things like: are there any errors for requests sent specifically by your user agent? Are there any errors for requests being sent by the core controllers, such as those running in kube-controller-manager? Are there any errors during scheduling of these pods from kube-scheduler? These are just some examples of things you can find answers to, based on your metrics and logs, by using Container Insights.

The EKS team also publishes sample Container Insights queries at the link shown here, in the AWS Observability guide under log aggregation.
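As an illustration of the kind of query you'd find there, a hypothetical CloudWatch Logs Insights query that surfaces failed requests from the API server audit logs might look like:

```
fields @timestamp, verb, requestURI, responseStatus.code
| filter @logStream like /kube-apiserver-audit/
| filter responseStatus.code >= 400
| sort @timestamp desc
| limit 20
```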

Continuing our search for the answer to why pods are not becoming ready, let's analyze the data plane metrics and logs for the kube nodes. Again, CloudWatch Container Insights to the rescue, and we are able to answer things like: are all my kube nodes healthy? Are the kube nodes running into any kind of resource limit, like CPU or memory pressure? Looking into the data plane logs, you check if the container images for all the pods are available on the nodes, whether dependent objects like secrets and config maps are available, and whether the nodes are running into any kind of IP exhaustion. And it looks like in this case, that's what's happening: we have a node running out of free IPs, which is causing pods to not go into the ready state.

Let's see if this info is available somewhere else as well. Enter the world of cluster networking. As pods are created and terminated in the cluster, IPs are assigned and become available. Before looking into the metrics available from cluster networking, for those who are not familiar with it, let's understand at a high level what it means. A cluster network, more commonly known as a CNI plugin, can be used to configure pod networking in a given Kubernetes cluster. EKS does come configured with this networking out of the box, which is the AWS VPC CNI, and it uses your VPC to assign the IP addresses. That means you get native VPC networking. This also means you get simple, secure networking which is built in, and you can use tools like VPC Flow Logs to monitor the traffic.

The EKS team provides a CNI metrics helper tool to scrape this information. It aggregates metrics at a cluster level and publishes all the metrics to CloudWatch. There are a large number of metrics published by this tool, and there are some sample metrics which can help you answer things like: how many ENIs are available or allocated in my cluster, and how many IPs are available or used in my VPC?

Now, this provides you with an option to configure alarms to avoid running into such a situation, and users can take proactive steps by adding secondary CIDRs or using the IPv6 address type for your clusters.
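For a new cluster, opting into IPv6 with eksctl is one field in the network config - a sketch, noting that the IP family must be chosen at cluster creation and cannot be changed later:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster              # hypothetical cluster
  region: us-west-2
kubernetesNetworkConfig:
  ipFamily: IPv6                  # sidesteps IPv4 exhaustion for pod addressing
addons:                           # IPv6 clusters need up-to-date core add-ons
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
```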

Let's see why monitoring the ENI and IP usage for the subnets used by the clusters becomes very critical. As we saw in an earlier slide about the EKS architecture, the communication between the customer VPC and the control plane is also established through an EKS-managed ENI. If your account is hitting any of the limits for ENI creation or IP exhaustion, so that new ENIs cannot be created or no free IPs are available, this would mean operations like scaling and security-related patching to your clusters are blocked, and clusters are marked as degraded. As you start running a large number of critical applications in these clusters, it becomes critical for the control plane of these clusters to be scaled and upgraded whenever required.

Let's take a look at another scenario in your cluster. As the usage of your cluster increases, there are more and more developers and operators deploying their applications on these clusters, and you need to ensure that they are labeling their pods correctly. For that, you consider adding an admission webhook controller to your cluster. Let's take a look at a high level at how this works with your requirements.

You have a user who is talking to your API server. This could be an end user or CI/CD, and they send a request to create pods. When the request to create pods is sent, the API server forwards it to the admission webhook controller. It verifies that the pod contains the required label. Once everything is correct, it returns the response to the API server, and your pod object is stored in etcd.

You might already be running various webhooks in your cluster for different use cases. So let's take a look at how you can ensure these webhooks are not impacting any of your requests to the API server.

The EKS control plane provides metrics for the API server which can help you monitor cases where the API server is not able to reach the admission webhook controller. Similarly, you should also be monitoring for cases where the admission webhook controller is responding with errors for the requests being sent by the API server. Where possible, you should also configure admission webhooks with a fail-open policy and a timeout shorter than 30 seconds, which prevents requests from being completely rejected by the API server.
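A sketch of those two settings on a validating webhook - the webhook name, backing service, and rule are all hypothetical:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: require-labels                 # hypothetical webhook
webhooks:
  - name: labels.example.com
    failurePolicy: Ignore              # fail open: don't block requests if the webhook is down
    timeoutSeconds: 5                  # well under the 30-second maximum
    clientConfig:
      service:
        name: label-checker            # hypothetical backing service
        namespace: default
        path: /validate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    admissionReviewVersions: ["v1"]
    sideEffects: None
```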

The Kubernetes docs actually provide some more best practices for running admission webhooks in production and cover some additional things you should be considering. I do recommend checking those docs out.

All right, now you're not a small coffee shop anymore: you're taking hundreds of orders every hour and serving other items as well. Your cluster has grown as a result, and there are now a large number of requests being sent to the API server, and occasionally you see errors like 429 when interacting with the control plane.

Before we look at the most common cause and how to mitigate these errors, let's first understand how the API server handles these requests. Prior to API Priority and Fairness (APF), a feature added in Kubernetes 1.20, every kind of call, whether a health check request, a kubectl get, or a list request, was treated equally at the API server level. The API server did provide a couple of flags to control how many mutating and non-mutating requests could be in flight, and the remaining calls would be denied with a 429 status code.
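For reference, those pre-APF in-flight flags look like this on a self-managed kube-apiserver (400 and 200 are the upstream defaults; on EKS the control plane is managed, so these are set for you):

```
# In-flight caps on the API server; requests beyond these limits
# are rejected with a 429 status code.
kube-apiserver \
  --max-requests-inflight=400 \
  --max-mutating-requests-inflight=200
```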

What this meant was that if a large number of heavy calls were being made, represented here in red (for example, list calls that are listing hundreds of thousands of objects in your cluster), it could potentially impact the static stability of your control plane, since health check requests could be rejected while the API server was still processing these large, time-consuming requests.

So how does APF help in this case? It introduces two resources: the first is the PriorityLevelConfiguration, and the second is the FlowSchema. PriorityLevelConfigurations define the priority levels for your requests, and FlowSchemas classify each request and match it to a priority level. In simple terms, we can think of these as requests getting added to separate queues. With this, health checks and other calls critical to availability can be in their own queue and not be impacted when other slow, large requests are being made in your cluster.
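As a sketch of what those two resources look like, here is a hypothetical priority level plus a FlowSchema that routes pod-create calls from one service account into it. The names are made up, and the API group version shown is the one from recent Kubernetes releases; older clusters use a beta version of this group:

```yaml
# Hypothetical APF flow for application-critical pod creation.
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: critical-pod-writes
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 30
    limitResponse:
      type: Queue
      queuing:
        queues: 16
        queueLengthLimit: 50
        handSize: 4
---
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: app-pod-creates
spec:
  priorityLevelConfiguration:
    name: critical-pod-writes   # queue defined above
  matchingPrecedence: 500
  distinguisherMethod:
    type: ByUser
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: app-deployer   # hypothetical deployer identity
            namespace: default
      resourceRules:
        - apiGroups: [""]
          resources: ["pods"]
          verbs: ["create"]
          namespaces: ["*"]
```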

However, if you have APF configured so that critical calls, like the create-pod calls for your application, share a bucket with list calls, those list calls can potentially impact your create-pod calls, because they are all falling into the same bucket. At this point, it becomes really critical to monitor the control plane metrics for APF, which can help you diagnose much earlier whether you are hitting any concurrency limits or your queues are filling up.
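Assuming you scrape the API server's metrics endpoint with Prometheus, the upstream APF metrics can be watched with queries along these lines:

```
# Requests rejected per priority level (queue overflow or timeout)
sum(rate(apiserver_flowcontrol_rejected_requests_total[5m])) by (priority_level)

# Requests currently waiting in APF queues, per priority level
sum(apiserver_flowcontrol_current_inqueue_requests) by (priority_level)
```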

Also, it is recommended to reduce the large number of expensive list calls being made to the control plane. Shout out to the scalability team, who have published great documentation in the EKS Best Practices Guide at this link, where they go over these best practices around APF, which metrics to monitor, and some more background on APF.

Finally, I would like to summarize some of the learnings and best practices for running EKS clusters: run pods as part of your Deployments and use topology spread constraints; configure pod and node autoscalers; monitor metrics and logs for the control plane and the data plane; use the CNI metrics helper for monitoring CNI metrics, and monitor any other component you deploy that is critical to your application. Make sure you monitor your admission controller webhooks, configure APF flows for critical requests, and monitor APF metrics.

This was just a set of the most recommended best practices, which can help ease a large part of your operations. The EKS team publishes a complete Best Practices Guide which is quite comprehensive and covers topics like security, scalability, reliability, and networking. I would highly recommend checking this link in case you haven't seen it. Next, over to you, Liz, for the final announcement.

Thanks, Prati. So finally, we're at the end of our talk, and I did say that we'd provide you with the opportunity to be amongst the first people to unlock a new achievement. Now, until now, we haven't had any specific EKS digital certifications, so I'd like to tell you about the first one of its kind. We're launching the digital EKS learning badge along with its learning path. In fact, it went live just the other day. So if you or your team want to know about the core concepts of EKS and how to get started, we've got you covered. You'll also be able to take an assessment and earn a digital badge that you can then share on your resume, with your employer, or on social media. Test your level of knowledge on Amazon EKS and show what you know.

Thanks for attending the talk. I know it's at the end of a long day.
