What’s new and what’s next with Amazon EKS

Nate: Hi, everybody. Good afternoon. Thank you so much for coming down to the Mandalay Bay. My name is Nate Taber, and I lead the product team for Amazon Kubernetes. Really excited to be talking to you today about what's new and what's next with Amazon EKS.

I'm joined by Mike Stefanick, who's a product leader on our team as well, and Mike and I are gonna be splitting this presentation up. I'm gonna talk about the intro and some high-level stuff, then Mike's gonna deep dive into some of the really important features that we're building and what's coming next, and then we're gonna wrap it up. All the way throughout, we're gonna be talking about what we've been doing in the last year and what's coming in different areas.

So let's go ahead and get going here. The thing I wanted to start with — and it's hard to share numbers or quantify this, so we put a smattering of different companies up on the slide here — is that Amazon EKS was announced here at re:Invent in 2017, and we've been out as a service since 2018. So we're almost 4.5 years old, and we will be five years old next June. And since we announced Amazon EKS, it's become adopted by almost every type of industry and every type of company around the world.

And so we have companies running gaming servers, running critical applications, running machine learning. Some of the favorite apps that you may use — the ones that are on your phone every day or the ones you may log into — are running on Amazon EKS. And so we have a huge responsibility, and we have a huge scope of different types of customers that we're supporting.

So these customers that are running EKS — what are they looking to do? There are four key goals that customers ask us for when we talk to them. The first is that they want to standardize IT operations in order to accelerate the pace of delivery and the amount of change they can make. As you're probably familiar, the faster you can safely and reliably change a system, the faster you can innovate, because innovation comes from trying new things. It comes from experimentation, and not everything is going to work. It's a problem if it takes a long time to put a change into your IT system. And this could be really simple — people say, hey, that shopping cart checkout is really clunky, we wanna change that, right? That's a change you have to deploy to production. Or, you know, when I try to spin up this part of my application, it's giving me an error, something's not working right — you need to make a change.

So the faster you can make those changes — and if you can standardize how you make changes across all of your IT systems, because many of our customers aren't just supporting a single application, they're supporting hundreds or thousands of applications across their organizations — if you can standardize and accelerate the pace of change, you can accelerate the pace of innovation. So that's the first goal.

Second, customers want to reduce their fixed expense. How do you reduce fixed expense? You run things more efficiently, which means you have to eliminate complex contracts and management overhead. You wanna reduce the number of people that have to be in the loop to do an operation, to make a change, to scale your system. The more you can reduce that, the less cost you have built into the system, and the more easily you can scale the system across your organization.

These aren't things specific to Kubernetes, by the way, but these are the kinds of things that customers are using Kubernetes to achieve. And then, as part of that, you want to enable the whole organization. This is a really common pattern we see with EKS and Kubernetes: it's not just about enabling a single development team, it's not about making one application better. It's about pushing out a standard operating system for how you're running in the cloud across the whole organization, and allowing multiple development teams to get efficiency gains from a standard framework for deploying and managing applications at scale.

And then finally, planning for the future and reducing risk. Obviously, this is something that development teams think about in terms of technical debt, but organizational leaders like CTOs and CIOs are thinking about the long-term sustainability of their code bases. One of the great things that we have at AWS is a very strong promise to customers about the quality and longevity of our services — AWS tends to not turn things off. And at the same time, with Kubernetes, we have a huge community — both inside of AWS, but also with other cloud providers, other partner companies, and our customers — that is innovating and helping to grow this project. We think that if you do things in an open and transparent way, if you're building in open source and people are sharing the burden, then you have a code base that's built for the long term, because it's built with the community in mind. That helps customers reduce their long-term risk, plan for the future, and say: you know what, this system is worth investing in, and it's worth taking our time to build something on here.

So given those goals, what's our mission on the Kubernetes team? We have a few different things. First of all, security comes first for us. Security is our most important thing, and we'll get into what we're doing in the security space for Kubernetes in a few minutes. But we think that if you don't have a secure system, you almost might as well not have a system, right? If you don't have security, you might as well do something different, because one of the most important things for customers is knowing that the stuff they're running is not going to have issues, is not going to be hacked or have a breach, and that they're gonna have good controls over what they're doing.

So that's the first thing we do: put security at the heart of everything we do. We don't make changes to our systems unless they pass a strict set of security reviews and standards — and that's not just EKS, that's all of AWS. So security comes first.

The second thing is open standards — to utilize community and open standards, and to not just use those standards but help propagate and design them. That's a really important part of what we do, going back to that long-term future: we want to be using APIs and mechanisms that help customers standardize.

Third is to build in best practices. This is something we think we can uniquely do at AWS: bring in best practices, both the practices we see from our customers and the practices we've developed over, you know, 15 years of running large-scale distributed systems at Amazon. The team that builds EKS and runs EKS for our customers — some of them are coming from the Kubernetes community, but a lot of these people are long-term Amazonians who've built large-scale distributed systems at Amazon for many, many years, and they bring those practices into EKS. We think that expressing those through our service is a really important mission.

And then finally, seamless integrations — we'll talk about this as well. We wanna make sure that you're getting the most out of the cloud when you run EKS. EKS is a layer, an access layer into the AWS cloud, right? So we wanna make sure that any integrations down to those cloud resources — below, on the side, and above — are seamless, and that you can get the full value out of AWS when you're running Kubernetes with AWS. And then finally, 24/7 support: we want you to feel like, as a customer, you have complete support from Amazon for everything that you're doing.

So these are the things that we're doing, and the overarching goal is to give customers application-ready, production-grade Kubernetes in the cloud and also in the data center.

So this is great. It's a mission. Okay. But how do we actually execute it? How do we think about our priorities in order to fulfill this mission?

Well, like I said, security is first. So security is at the bottom here, and this is our pyramid of customer priority. When we talk to customers and ask what's important, what matters to you — this is the hierarchy of how we think about the priorities for our customers, and by extension our service.

So like I said, security is at the bottom. If you don't have security, you might as well not even be running a system at this point, especially on the internet, especially at cloud scale.

Second is reliability. If you don't know that the system is gonna be up, then you're gonna be in trouble. So the system has to stay up; it has to be reliable.

Third, you have efficiency. Once you have a system that's secure, a system that you can trust, you want a system that is efficient, that gives you the best cost value per resource. You want to pay what you need to pay, and you don't want to pay any more. So you want really high efficiency, and that's both for cost — but there's also been a lot of talk this week about sustainability. When you think about efficiency, one of the largest drivers of usage on a Kubernetes cluster is EC2; it's the compute. So if we can optimize the resource utilization of a cluster, we can help you save money, and we can also be more sustainable in terms of environmental impact. And that's felt at an aggregate scale.

So efficiency — and then once you have those things, you can start to think about higher-order functions, things like automated operations. How do I make sure that everything comes together really seamlessly, and that I have to do minimal work to run and maintain the system? And then once I have all this — you can think about this in the scope of a Kubernetes cluster, right? I have a cluster that's secure, reliable, efficient, I can automate it, and then I can take that package and roll it out as a standard across my organization.

So these are our priorities, and we're gonna go through each of these layers of the pyramid. Trust me, it's not arduous — it's actually quite interesting. We're gonna talk about what EKS is doing at every layer of this pyramid, both what we've been doing and what we plan to do.

So we're gonna go ahead and kick it off with security, and Mike's gonna lead the way talking about how AWS is helping to improve security for Kubernetes clusters.

Mike: Thank you, Nate. So, security: the bottom of the pyramid, but it's always the top priority for us. We think about it in three different areas: supply chain, compliance, and controls. Supply chain is all about whether the software you're running is patched and secure, and it's a shared responsibility with EKS. We're gonna make sure the control plane is always handled for you, and on your side, where you're actually running your applications, we're gonna build and deliver software that has all the latest fixes and upstream patches. Compliance and controls are related: with compliance, EKS checks just about every compliance mandate out there. A lot of you operate in regulated industries, and you just want it to be easy to check that box without having to do extra work — EKS makes that easy for you. For things that don't necessarily fall under compliance, you might need additional controls, and that's about making sure you can assess and remediate any issues you might have in your cluster.

This might seem like a mistake — what's the Kubernetes version slide doing in the first section, on security? But it's actually very intentional. Maybe the best way you can stay secure with EKS is staying up to date on Kubernetes versions. We do not allow clusters to stay on unsupported versions, and that's not to be annoying — it's really all about security. Upstream Kubernetes doesn't accept security patches for unsupported versions, and while we do our best to backport things, at a certain point the code base might drift so far that it becomes impossible to backport.

So a lot of what you'll see in our roadmap today is about making upgrades easier. We know that it can be challenging, and Kubernetes moves fast. But the first thing you do when you adopt EKS is have an upgrade strategy, and that's really the number one way to stay secure.

So this is our schedule. We'll send out notifications if you're running an older version, but ideally you're never there — you just always have a strategy to keep up to date on your versions.

Getting into some of the improvements we've launched this year. It might seem like Log4j was a long time ago, but it was actually just after last re:Invent that it came out, so we included it here in this slide. And this is a really good example of — while we're talking about roadmap here, we're never gonna give exact dates, because security is that top priority. I get asked by customers, by solutions architects, by account managers: when is this feature coming? What date is it coming? And we're never gonna give exact dates, because if something like this comes out, we drop everything.

So when Log4j came out, anything we were working on stopped — it was all hands on deck to build fixes and make sure that EKS customers were patched. We released a DaemonSet to do live patching, and we worked with the Bottlerocket team for those of you using Bottlerocket.

"And this is just a good example of, of a case where security takes priority over everything. Some features launched this year. One really that, that I want to highlight is the work we've been doing with the GuardDuty team.

So earlier this year, GuardDuty launched protection for EKS clusters, where they scan the audit logs of clusters and can detect things like whether any malicious actor tried to log into the cluster. That's phase one. And this was actually preannounced just yesterday: GuardDuty is gonna launch support for runtime protection of EKS clusters. So in addition to the cluster itself, it'll do operating system and container scanning, and it'll do things like checks for file access or network intrusion, all the way down to the container level.

And the nice thing about the integration with GuardDuty is it's just a check box. You go into GuardDuty at the payer account level and you can say, I want to secure all of my EKS clusters, and that's all you have to do. You don't have to run your own agents or do a bunch of work. It's just a check box.

Speaking of compliance, AWS PrivateLink is something where we're gonna check the box soon. Not to be confused: when you create an EKS cluster, you have the option to get a private endpoint for the Kubernetes API server itself, and that's something we've supported for quite a while. But the gap we have is supporting private access to the EKS management API. So if you want to create a cluster or delete a cluster, and do that without having traffic traverse the internet, that's something we're gonna launch soon.

Cluster access management — anybody who's used EKS even a little bit is probably familiar with the way you manage which IAM users and roles can access the cluster. Today you do that through a ConfigMap running inside the cluster. It works, but it's fairly brittle — you might accidentally type something wrong.

And so we're working on an API replacement for the ConfigMap. It'll be a strongly typed API that lets you associate IAM roles and identities with the cluster without having to do it inside the cluster itself. So you'll never have to worry about locking yourself out of a cluster. You can control whether the cluster creator has admin access to the Kubernetes API server, which is something people have been asking for for a while. And it'll just really simplify the process of managing user access.
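For context, here's roughly what that aws-auth ConfigMap looks like today (a minimal sketch; the role ARNs and username mappings below are placeholder values):

```yaml
# kube-system/aws-auth: today's ConfigMap-based mapping that the
# upcoming access management API is meant to replace. Values below
# are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/eks-node-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
    - rolearn: arn:aws:iam::111122223333:role/dev-team
      username: dev-team
      groups:
        - developers
```

One stray indent or typo in that block can lock users out of the cluster, which is exactly the brittleness a strongly typed API removes.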

I think, just overall, with a lot of the features you'll see today that we're talking about coming soon, we really want to simplify a lot of the management operations that you do with EKS.

The second part of this API: we've heard from customers about RBAC, which is the Kubernetes method for giving users access to resources in the cluster — say you need to read something from a namespace, or write a secret, et cetera. RBAC is powerful. But for those more straightforward use cases — where you just need to give this developer read-only access to a namespace, or you wanna give an IAM role admin access to namespaces matching dev-* — you can do that with RBAC, but you have to write lots and lots of Roles, RoleBindings, ClusterRoles. It can be a lot.

So the second part of this feature will be doing authorization through the EKS API itself. You can do those exact use cases I just mentioned: authenticate this user, and also give them access to the namespace. You can completely skip RBAC — it's optional — but we think it will really help for a lot of those more straightforward use cases.
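To make that concrete, even the simple "read-only access to one namespace" case takes a pair of objects like these today (a sketch; the names and resource lists are illustrative):

```yaml
# A namespaced Role granting read-only access to common resources...
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-only
  namespace: team-a
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
---
# ...plus a RoleBinding tying it to the user mapped from IAM.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-only-binding
  namespace: team-a
subjects:
  - kind: User
    name: dev-team          # the username mapped from the IAM role
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: read-only
  apiGroup: rbac.authorization.k8s.io
```

Multiply that by every team, namespace, and cluster, and you can see why folding the common cases into the EKS API is attractive.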

Another security-related feature: the VPC CNI is the CNI that we build and support, and it comes with every EKS cluster out of the box. It's tightly integrated with the VPC — it runs as an underlay network using the VPC itself, so you can take advantage of VPC features like security group integration or prefix delegation.

One of the gaps in the CNI plugin is support for Kubernetes NetworkPolicy. Generally, customers use this to secure traffic within a cluster. Say you have a multi-tenant cluster where team A is running their application and team B is running theirs. It's not that they're malicious or running untrusted code, but you want to prevent noisy neighbors, or this application just shouldn't be able to talk to that application.

Today you can run third-party plugins to do that, but we're gonna build this natively into the CNI plugin, so you don't have to run additional software. It will also be our first foray into eBPF. This is a technology that's got a lot of buzz out there, and this is the first use case where we feel it really makes sense for us to use eBPF.

We've heard from customers using other plugins that iptables can run into issues at scale, and we wanna make sure we support clusters of any size, so we're gonna implement these network policy rules using eBPF. And I think this is the last one in the security section.
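The policies themselves are just standard Kubernetes NetworkPolicy objects, regardless of which plugin enforces them — for example, locking a namespace down so its pods only accept traffic from within that namespace (a sketch; the namespace name is illustrative):

```yaml
# Deny cross-namespace ingress to team-a's pods: only pods in the
# same namespace may connect. Enforcement is up to the CNI plugin.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only
  namespace: team-a
spec:
  podSelector: {}          # applies to every pod in team-a
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}  # any pod, but only within team-a
```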

So another one on the theme of: there's an existing feature, but we have an opportunity to make it easier for you. Sort of the reverse of giving human users access to the cluster is: I have pods running inside the cluster that need to access an AWS service. It could be the AWS Load Balancer Controller that needs to call the ELB APIs.

We have a feature today called IAM roles for service accounts that does that. You can grant fine-grained permissions without having to give all of those permissions to the node's IAM role, and it's the secure, least-privileged way to do it. But there are issues at scale.

If you're running lots of clusters, you have to go through a number of steps to configure it, and past a certain number of clusters, you have to create duplicate roles. We just want to remove a lot of that automation that you have to build yourself and do it in the service.

So we're working on the new version of IAM roles for service accounts. It'll work a lot like ECS and Lambda do, where you just create the role once — the trust policy is updated to say "trust eks.amazonaws.com" — and then you associate that role with any number of clusters. So it'll really simplify the way you manage credentials for applications running in your clusters.
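For reference, today's IAM roles for service accounts works by annotating a Kubernetes service account with a role ARN, after setting up an OIDC provider per cluster (a minimal sketch; the role ARN is a placeholder):

```yaml
# Today's IRSA: pods using this service account receive credentials
# for the annotated role via the cluster's OIDC provider. The new
# version aims to drop the per-cluster OIDC setup and role duplication.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/alb-controller
```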

The other feature we're gonna add support for is IAM session tags. This will help you implement fine-grained permissions without needing to write lots and lots of policies.

Okay, on to reliability, which is the next rung in the pyramid after security. With reliability, it's all about making sure the system is stable and up. EKS has our own SLA, but we want to make sure that you can also meet your SLA for your customers. That's really how we think about reliability.

So we wanna make sure that the cluster itself — our side of the responsibility model, the control plane — can meet your scaling needs, and we do a lot of work there, which I'll talk about. But then it's also about your side, the running applications: how do you get the compute, networking, and storage capacity that you need to run your apps?

Okay, cluster updates and creates. I know a lot of you have been asking for faster cluster updates and creation for a long time. This year, we focused on update time. We reduced the amount of time to update a cluster from sometimes 40 minutes to now under 10 minutes. And this supports all types of updates, including version upgrades and associating OIDC providers with your cluster.

It seems like a simple change, but there's actually quite a lot of work that went on behind the scenes to do this, because with every EKS cluster you create, you're getting dedicated infrastructure: in our accounts, we're creating EC2 instances, NAT gateways, VPCs, load balancers, Auto Scaling groups — and we're doing that across multiple availability zones.

So there was a lot of work we did to optimize that workflow of bringing up newer versions of the control plane. On the roadmap is faster cluster creates. I know for a lot of dev/test and CI/CD use cases where you might need to bring clusters up and down, the amount of time it takes today can be anywhere from 10 to 15 minutes, and that's just frankly too long.

So we're gonna work to optimize that to get down to at least under five minutes, but I have expectations we'll do quite a bit better than that. And then finally, on the topic of upgrades: upgrading with more confidence.

We've heard from a lot of you that it can be scary to hit the upgrade button, right? You're managing platforms, and you're not entirely sure what teams are running in their clusters. Are they running an outdated version of the Ingress API? Are they, I don't know, mounting the Docker socket in their pods while they're moving to containerd? And while we put a lot of effort into the release notes of every EKS version, we all know our track record with reading documentation and release notes, right?

So we want to build that into the service, so you don't have to read very, very detailed notes for every single version. When you go to click the upgrade button, we'll generate a report. If we detect that some pod — or rather, some resource in the cluster — is using a deprecated API version, we'll stop the upgrade and tell you, "Hey, you have to go fix this." So that's one of the things on the roadmap that we intend to do to really make it simpler to upgrade your clusters.

Control plane scaling — this one is actually related to the faster updates that I talked about. In terms of reliability, on our side of the responsibility model, we are constantly working to improve the performance of your control plane. You pay the same price for every EKS cluster, but behind the scenes we're gonna scale to larger EC2 instances, higher-capacity EBS volumes, larger database sizes — and we're doing all of that on your behalf, seamlessly, without any change that you notice or any change in pricing.

And the faster updates that I mentioned before also allow us to scale faster. So this is not something you see or an API call you make, but it's something we do behind the scenes on your behalf. And then, on the roadmap, we are constantly working to push the limits of Kubernetes.

So we have teams that are testing clusters up to 10,000 and in some cases 15,000 worker nodes. We want to make sure our scale is going to be far beyond where you'll ever need to be, and a lot of those learnings trickle down into existing clusters — things like tuning the max requests in flight or the max QPS you can drive against a cluster. The learnings from the largest scale are trickling down to all different clusters, so you're getting those benefits.

So an EKS cluster that was created four years ago is running much, much different infrastructure on the control plane than it was when it started.

Okay, here's another one. It seems like we released it a while ago, but this was actually post-re:Invent last year — I think we released it in January. When you talk about reliability, you wanna make sure you have the capacity in your cluster to run your applications, and with the VPC CNI, one of the challenges of the tight integration with the VPC is that many of you are running in environments where you might not have even met the team that manages the VPC. It might be a separate team, and the VPC was built years ago, when you were running a single app on a VM.

And so it can be challenging to run your applications when you're constrained by IPv4 space. So one of the features we launched last year was support for IPv6. It was finally the project that wasn't perpetually two years away — we actually went in and did it. It really simplifies networking in your cluster and allows you to scale and get that reliability.

The other thing we really took into account with this is making sure the transition was easy. We realize IPv6 can be a scary thing — it's been on everybody's roadmap, always two years away, right? So we built it in a way that is slightly different from upstream Kubernetes dual stack, where every pod gets both a v4 and a v6 address. That doesn't actually solve the problem you're running into, which is running out of IPv4 space.

So we took a different approach, where every pod in your cluster gets a v6 address, but you still get IPv4 at the boundaries of the cluster. For ingress into your cluster, we integrate with ALB and NLB, which both support dual stack at the front door for handling traffic, and that can get routed to v6 pods in your cluster.

And then, if an application in your cluster needs to talk to something that's IPv4 outside the cluster, the worker node is still dual stack, so traffic will get routed through the node, and it'll talk to whatever is outside your cluster that still has a v4-only address. So we really kept in mind trying to make it easy to migrate.
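If you use eksctl, turning this on is roughly a one-line choice at cluster creation (a sketch, assuming eksctl; IPv6 is a create-time decision rather than a conversion of an existing v4 cluster, and the name, region, and version here are placeholders):

```yaml
# eksctl ClusterConfig for an IPv6 cluster: every pod gets a v6
# address, while nodes and the cluster boundary stay dual stack.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ipv6-demo
  region: us-west-2
  version: "1.24"
kubernetesNetworkConfig:
  ipFamily: IPv6
iam:
  withOIDC: true           # required for the managed VPC CNI add-on
addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
managedNodeGroups:
  - name: default
    instanceType: m5.large
    desiredCapacity: 2
```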

Here's one I'm super excited about — this was just announced an hour or two ago, I think, so the timing is good here.

We've been working with the VPC team for well over a year now on supporting VPC Lattice. The easiest way to think about VPC Lattice: many of you are familiar with service mesh and the capabilities you get from running a service mesh like Istio or Linkerd or Envoy.

Running a service mesh comes with challenges. You typically have to manage the control plane, you're running a sidecar next to every one of your pods, which (a) takes up resources that might otherwise be allocated to your applications, and (b) can be really challenging to upgrade if you need to upgrade the Envoy version. Now you have to roll your entire fleet of applications just to do that.

So there were lots of challenges we heard about with managing sidecars to get service mesh capabilities. And we decided that we were gonna take what a service mesh typically does and build that directly into the VPC. So VPC Lattice gives you layer 7 application networking capabilities without the need for sidecars. It's just built into the VPC.

It supports all AWS compute services — EC2, Lambda, ECS (I think ECS is coming soon). But we worked really closely with the Lattice team to make sure it felt like a really Kubernetes-native experience when you're using Lattice, without having to understand all the details of it.

So we built a Kubernetes controller that uses the Kubernetes Gateway APIs, which are a relatively new set of APIs upstream — they're the evolution of the Ingress and Service APIs. And you don't actually have to understand Lattice. Just like with the Load Balancer Controller, where you run it and apply an Ingress object — with this, you run the controller, apply a Gateway object, and it's gonna go program Lattice.

One of the really powerful features we get from building a service mesh into the VPC: as a best practice, a lot of you are running multiple clusters — you might be running them in multiple accounts, different VPCs — and to get communication across those boundaries, you have to use tools like VPC peering or Transit Gateway. Many of you aren't experts, and don't want to be experts, in that.

And so with Lattice, you don't have to worry about any of that anymore. You just say "Service A needs to talk to Service B," and that traffic is handled seamlessly behind the scenes.

So this was announced as a private preview yesterday. The public preview should be out in a month or two, and that's when we'll release the controller and you can all go try it out.
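To give a feel for the model, the controller consumes standard Gateway API objects along these lines (a hypothetical sketch — the gatewayClassName and listener details are illustrative, not the controller's confirmed schema, since it hadn't shipped publicly at the time of this talk):

```yaml
# A Gateway requesting a Lattice-backed entry point (the class name
# is an assumption), plus an HTTPRoute sending traffic to a Service.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
spec:
  gatewayClassName: amazon-vpc-lattice   # assumed class name
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: service-a-route
spec:
  parentRefs:
    - name: my-gateway
  rules:
    - backendRefs:
        - name: service-a     # a plain Kubernetes Service
          port: 8080
```

The design point is that you express routing in portable Gateway API terms, and the controller translates that into Lattice configuration behind the scenes.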

Reliability also means being able to run your applications near where your customers are. As an example, we have a lot of gaming customers on EKS, and it's really important — or banking customers, another good example. Banking customers need to run in certain regions for regulatory reasons, while for gaming customers it's all about latency, making sure their servers run near their players.

So EKS supports every commercial region except, I think, the most recently announced one or two, and we're working hard to support those. We support all the Local Zones and Wavelength Zones, and there's a bunch of new regions coming — EKS will be there, if not day one, then really close to it.

If regions aren't enough, and you need to run workloads somewhere AWS doesn't have a presence, we also support Outposts. There are two different modes we support: you can run worker nodes on Outposts with the control plane in the region, or — more recently, after a lot of feedback from customers — we added support for running local clusters on Outposts, meaning the entire control plane runs on the Outpost.

And so you can survive network disconnects — if network connectivity back to the region gets severed, your applications can still continue to run for, I believe, up to seven days.

And then finally, if Outposts isn't an option — some of you still have hardware in your own data centers, existing contracts — and you want to be able to run Kubernetes in a way that's consistent with how you run in the cloud, the other member of the EKS family is EKS Anywhere. Today we support running on VMware, bare metal, CloudStack, and a couple other options coming soon.

And this is really about — I think on the very first slide, the mission was, you know, making EKS run in all of the environments you need to run in. And we support a bunch of different options for you.

And then finally, efficiency. So you're secure, you're reliable, you have the capacity you need. Now you wanna go optimize: I wanna make sure I'm running just the right amount of compute, and not too much.

Really, when we think about this, it's about giving you the flexibility to decide on uptime versus cost, because that's a tradeoff everyone is always having to think about. You could provision 100x capacity and your app would never go down, but that's not very efficient. You could also run at 99% utilization, right near the top, and that probably doesn't make sense if you have spiky workloads.

So when we think about efficiency, it's about giving you tools to easily find that sweet spot between uptime and cost. And then it's also about monitoring it: I wanna make sure I can see the cost I'm incurring in my cluster, both at the cluster level itself and even within the cluster, at the namespace and pod level.

A quick aside on how we think about efficiency: we want EKS to be the best place to run Kubernetes on AWS, and that means we're gonna integrate with other AWS services, because they are experts in certain areas and we're not gonna do everything ourselves.

There are really three different layers we think about when we're integrating with AWS. The first is the lower-level infrastructure services — things like load balancers, EC2, EBS, all that core infrastructure you need to run your cluster. Generally, we're gonna write controllers to automate provisioning that infrastructure for you, so you don't have to think about it.

Then it's supporting services — maybe your application needs an S3 bucket, or it needs a database. We have a project called AWS Controllers for Kubernetes, which allows you to provision AWS resources in a Kubernetes-native way. This really appeals to developers who are familiar with Kubernetes and writing their manifests in YAML, and who don't necessarily want to become experts in Terraform just to go get an S3 bucket.

And so this gives them the option to package up their application manifests along with their supporting infrastructure, all in a Kubernetes-native way.

And then finally, higher-level services — there are lots of AWS services out there that are experts in things like security, batch, and big data, and we want to give those services an easy way to run their workloads on EKS clusters, or to provide their value proposition to whatever you're running in EKS.

A couple of recent launches in the efficiency area. For those of you who follow the Containers Roadmap, I believe this was the highest-upvoted issue, and we finally closed the gap on this one.

For those of you managing node groups and using the Cluster Autoscaler, you can now scale your node groups down to zero and, more importantly, scale them back up when applications come in. We actually did some work upstream in the Cluster Autoscaler to make it easy to call the managed node groups API to do this.

So I know there was a lot of waiting on this one, but we finally launched it. Lots of improvements are still going on in managed node groups. I've been getting lots of asks: when will we support Amazon Linux 2022? I think the tentative plan is with 1.25, but it'll be early next year that we launch an AMI with AL2022 support.

And then there are other minor but really important features — like, if your EC2 instance has a health problem, we should automatically detect that, terminate it, and bring up a replacement. Or some of you have use cases where you need more control over upgrading, because your application is a game server or something that runs for a longer time. So we're gonna add more flexibility and control over upgrades with managed node groups.
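Pulling those pieces together, a managed node group that can scale to zero and limits upgrade disruption might look like this with eksctl (a sketch, assuming eksctl; the Cluster Autoscaler still needs its own deployment, and the names and sizes here are placeholders):

```yaml
# eksctl ClusterConfig fragment: a managed node group the Cluster
# Autoscaler can scale all the way down to zero nodes, with an
# update policy capping how many nodes churn at once.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
managedNodeGroups:
  - name: batch-workers
    instanceType: m5.xlarge
    minSize: 0               # scale-to-zero now supported
    maxSize: 20
    desiredCapacity: 0
    updateConfig:
      maxUnavailable: 1      # upgrade one node at a time
    labels:
      workload: batch        # autoscaler scales up on matching demand
```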

Karpenter is the other big project that we've put a lot of investment into over the last year or two. It was last year at re:Invent that we announced general availability, and there have been a lot of features we've worked on over this year that I'll talk about in a second.

But at a high level, Karpenter was born when we heard about lots of issues customers were having with the way traditional autoscaling works in Kubernetes. Five or six years ago, when Kubernetes started, running Auto Scaling groups and scaling those groups made a lot of sense, because everyone was used to running their applications on VMs, and it made for an easy migration.

But we took a step back and really thought about what a Kubernetes-native autoscaling and node provisioning experience would look like if you designed it from scratch. And that's where Karpenter came from.

It really flips the traditional ASG-plus-autoscaler model on its head: instead of predefining everything, you just provide a set of constraints to Karpenter, and it's gonna go work with EC2 to find the right capacity for you.

So it's — I won't call it serverless — but it's getting to that point, because you don't have to predefine infrastructure. You just apply your applications and let Karpenter do its thing and work with EC2.

There have been a number of features we launched with Karpenter over the last year. Probably the biggest one folks were waiting on was consolidation. Karpenter will now look for opportunities to scale down nodes — whether it's at night, when you have time-of-day workloads that run more in the day and less at night, Karpenter will see opportunities to scale down.

It will even do On-Demand to Spot consolidation if it finds opportunities to do that, and you've told Karpenter that your workload can tolerate those interruptions.

And then there have been a few other things: full support for the Kubernetes scheduler — things like affinity and anti-affinity scheduling, EBS volume awareness for stateful workloads — and support for IPv6.

And then another big one we just launched was native termination handling, so you don't need to run any other infrastructure to handle Spot terminations or anything else coming from EC2 — Karpenter will natively handle that for you.
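As a rough illustration of the constraints model, a Karpenter Provisioner from the v1alpha5 era of the API looks something like this (a sketch; the requirements, limits, and providerRef name are illustrative):

```yaml
# Karpenter v1alpha5 Provisioner: constraints, not predefined groups.
# Karpenter picks instance types within these bounds and, with
# consolidation enabled, bin-packs and scales nodes down over time.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  consolidation:
    enabled: true            # actively replace/remove underused nodes
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]   # allow Spot where tolerated
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
  limits:
    resources:
      cpu: "1000"            # cap total provisioned CPU
  providerRef:
    name: default            # AWSNodeTemplate with subnets/SGs
```

Note there's no instance type or group size anywhere in there — Karpenter chooses capacity to fit the pending pods within those constraints.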

A few things on the roadmap for Karpenter. Very similar to managed node groups, we want to give you more control over node upgrades. This is something we've heard is a challenge for version upgrades: generally, updating the control plane is pretty easy, but your applications are actually running on the worker nodes, and sometimes you need more control — like, "I wanna do it only at this time of day," or "I only want to take so many nodes down at once."

So we're going to give you that control in Karpenter over how you move to newer versions of the optimized AMI. And a lot of that work will be getting Karpenter to version 1.0 — it's production ready, but there are a couple of features we're waiting on before we give it the 1.0 moniker.

And then after that, the focus will be on making Karpenter really easy to use on EKS — we're gonna fully manage the integration. Karpenter has that unique chicken-and-egg problem where it has to run somewhere before it can provision other capacity, so we're gonna make that super simple, where you don't have to think about it.

I think maybe one or two more here. For those of you who want a true serverless experience — where you don't even see the nodes, and AWS fully handles the patching and keeping things up to date — we support AWS Fargate.

One of the really big launches we made this year that makes it much, much simpler to run workloads on Fargate is support for wildcards. You can now define a Fargate profile for your cluster and say "fargate-*" — every namespace that matches that pattern runs on Fargate. Previously, every time you launched a new namespace, you had to go create a new Fargate profile. So this really, really simplifies the way you can launch your workloads on Fargate.
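With eksctl, a wildcard profile along those lines might look like this (a sketch, assuming eksctl passes the selector through to the EKS API; the profile and namespace names are placeholders):

```yaml
# eksctl ClusterConfig fragment: one Fargate profile whose selector
# matches every namespace beginning with "fargate-", so new matching
# namespaces need no additional profiles.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
fargateProfiles:
  - name: fp-wildcard
    selectors:
      - namespace: "fargate-*"   # wildcard namespace match
```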

Okay, two more slides on efficiency and then I'll hand it back. Karpenter, managed node groups, autoscaling — that all gets you the right compute. But how do you actually dig deeper and ask: you might have 1,000 nodes that got spun up, and you want to know which team, which workload, is actually utilizing the EC2 capacity in my cluster or on my platform?

There are a couple of different use cases here. Some of you might be running a single application per cluster, or maybe a team per cluster, and cluster-level visibility is good enough for you. So the feature we launched several months back is that we now add an AWS cost allocation tag to every single EC2 instance that joins a cluster. It can come from managed node groups, Karpenter, self-managed nodes — doesn't matter. We're gonna add a tag with the cluster name to every instance.

So that makes it really, really easy to go into the AWS Cost and Usage Reports and generate a report of how much this cluster costs in terms of EC2 spend. For a lot of you, that's not necessarily enough — it helps, it's a start — but you're running platforms, and you have lots of teams sharing a cluster.

So you need more visibility, and to meet that use case, we partnered with Kubecost. We worked with them this year and launched an EKS-optimized version of Kubecost, and it includes several enterprise features at no additional cost to you.

There are two ways to install it. Initially it was a Helm chart, and as of yesterday — a new feature that I think Nate will talk about in a second — you can now install certain operational software from partners directly through the EKS APIs. Kubecost was one of the launch partners for the EKS add-ons marketplace integration that launched yesterday.

So Kubecost gives you the ability to break down your cluster costs by namespace, even down to the pod label. If you have certain labels on all of your pods, you can see how much they cost in terms of EC2, in terms of EBS, and I think a few other dimensions. And that's efficiency — I think we're on to automated operations, and I'll hand it back.

Nate: Awesome. Thanks, Mike.

So yeah, we're making our way higher and higher up the pyramid into higher-level things that are very important — but the security, reliability, and efficiency of a system always come first. So let's talk about what we're doing at a higher level to help you scale and bring Kubernetes across the entire organization, and make it easier to manage your EKS clusters and your Kubernetes applications.

So, automated operations — what do we need for automated operations? Well, you need lifecycle operations that can be automated, so you can trigger these things and they proceed without user intervention — taking people out of the loop. You need common tools that ship with the system: if every time you provision a system you have to put a bunch of stuff into it before you can use it, that takes a lot of time, it slows down deployment, and it makes the system more complicated to manage. And you want to be able to easily standardize and enforce best practices with those tools, so you don't disrupt workflows, but you make sure things are running in a way that's expected and normal — and if there's an issue, you can solve it really easily across all the environments where you're running.

And then finally, you need insights: you need to understand, from multiple sources, what you can do to improve performance, reduce costs, and minimize time to resolve issues. So what are we doing in this space? Really, talking about the tooling side — Mike was alluding to this — it takes a lot of work to make a cluster production ready on Kubernetes. Of course there's EKS, there's the control plane, there's the compute, but there's a bunch of stuff that customers run inside the cluster in order to bring their apps onto those clusters.

What people do today is they'll start a cluster, get the compute going, have everything running — and then they have to put a bunch of tooling on that cluster before they can actually start running applications: monitoring tools, network tools, security tools. These are tools that are coming from AWS, things like CSI drivers; they're coming from GitHub as community tools; and they're also coming from places like AWS Marketplace or third-party vendors.

So we built a system a few years ago to help automate all of this tooling, because getting the tools onto the cluster is actually not that hard — you can do it, you can install this stuff, it's a hassle — but when it comes to keeping that stuff up to date, and keeping the cluster up to date, it gets even harder. You can imagine the sprawl as you go from one cluster to 10 clusters to 100 clusters: you're downloading and installing tools, but then you're managing an entire tree of dependencies — which clusters are updated where, is everything up to date, and what are the latest versions of all this software?

And so EKS add-ons provides a standard API that ships with EKS to let you install and manage the lifecycle of the tooling you need to put on the cluster to make it production ready. To date, this tooling has come from AWS, but that's meant we have a somewhat limited catalog.

The mission of our AWS catalog is to ship add-on software that we're building — things like the CNI and CSI drivers — and core components of the Kubernetes project, things like Metrics Server, that are just undifferentiated parts of Kubernetes. But what about all the other tooling? What about tooling like Datadog agents from vendors, which you may want to put on the cluster as part of your system? If you can't put everything through add-ons, then it's not actually really helping, because you still have to go out of band.

So our goal is to put everything through the add-ons API and make it really simple to template and provision complete, production-ready clusters as part of EKS. This week, we announced that you can now launch vendor-provided tools from AWS Marketplace through EKS add-ons. These are the launch partners we started with: Kubecost, Teleport, Factorhouse, Tetrate, Dynatrace, and Upbound — six really great partners that joined us for this launch, and we have a bunch more partners onboarding to EKS add-ons that are coming soon. And this is the kind of thing — we'll talk about our public roadmap and our feedback mechanisms as we wrap up — where we really look for community involvement, for your involvement. If there's something you wanna see as part of add-ons that you're using in your stack, let us know, and let the company you're using know: hey, I want that as an EKS add-on, can you get on Marketplace? The goal is that you can template out your whole cluster, run a standard cluster with all the tooling, and then easily keep it up to date.

One of the coolest things about add-ons is not just the lifecycle automation — it's also the metadata API, which is really kind of an unsung hero of the add-on system. If you're going out to GitHub or to various vendors and downloading software, it can be really complicated to keep track of which software version works with which version of Kubernetes. Part of add-ons is a metadata API where you can easily see the compatible versions between an add-on and the EKS version. So this is something that helps you automate and understand whether your cluster is up to date.

In addition to that, when you install this software, there are often things you need to set — maybe some flags or values you need to provide to make it work — and this has been a pain point for customers. So we're really excited to be launching, very soon, what we call add-ons configuration. This lets you pass unique values — for example, if you want to modify the CoreDNS Corefile — through the EKS add-ons APIs and have them persist so they're not overwritten. We think this is an important capability that will let us onboard more and more add-ons into EKS, so you can have that true vision of standardization. On EKS Anywhere, we have a similar vision.
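As a sketch of how the add-ons API surfaces in practice — shown here through eksctl, assuming its add-on support, and treating the configuration block as illustrative of the just-announced configuration feature rather than a confirmed final schema:

```yaml
# eksctl ClusterConfig fragment: add-ons installed and versioned
# through the EKS add-ons API rather than hand-applied manifests.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
addons:
  - name: vpc-cni
    version: latest
  - name: coredns
    version: latest
    # Illustrative of the announced add-ons configuration feature:
    # values passed through the EKS API persist across updates.
    configurationValues: |
      replicaCount: 3
```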

With EKS Anywhere, we call these curated packages, not add-ons. The reason is that EKS in the cloud ships with a lot of capabilities you may take for granted — things like image repositories (we have Amazon ECR) and load balancers (we have ALB) — but on premises, you don't have these things. A lot of the time you need an image repository, for example, to run your on-prem cluster, especially if you're operating in an air-gapped or frequently disconnected environment.

So with EKS Anywhere, we're beginning to ship what we call curated packages, and it's the same mission as EKS add-ons: you want to be able to take an EKS Anywhere cluster, run it on premises, and have all the core capabilities you need to actually go into production. This year, we announced support for EKS Anywhere curated packages, and we're bringing in a bunch of things: a local registry with Harbor, Metrics Server, ingress with Emissary, AWS Distro for OpenTelemetry (our OTel operator), service load balancing with MetalLB, and a few others. And we have a list here of a bunch of stuff the team is working on that we're really excited to bring to EKS Anywhere curated packages in 2023. So that's add-ons.

That's how you automate getting software onto the cluster. But what about all the things you need to do to automate your troubleshooting and understand what's going on? One of the things we did in 2022 is continue to expand the EKS console. Last year, we announced the ability to register all of your clusters — your EKS clusters in the cloud, clusters from other locations including on premises or other cloud providers, self-hosted clusters — into the EKS console and visualize all of them. And this year we added support for integrated metrics and for the full Kubernetes API, and I'll show you what that looks like.

So when you go into the EKS console now, you can actually see the full Kubernetes API for every single cluster, and you can quickly navigate through it and see both everything that's running and the deep-linked information about the nodes, the images that are running, and all the supporting information inside of AWS for that cluster. The goal with the console is to give you full access to the API and an integrated troubleshooting environment, where you can quickly deep-link into different parts of the cluster, or into supporting AWS infrastructure — things like EC2 instances, load balancers, or image repositories — that may be causing an issue and are supporting your cluster.

So, organizational standards — this is the top of the pyramid, right? This is what Maslow might call self-actualization. What's really our goal here? We have secure clusters, reliable clusters, efficient clusters, we've automated our operations — now, how is EKS letting you get the same supported Kubernetes experience across all of your environments, enforce best practices, and allow portability between environments when you need it?

The first thing we've done over the last few years is take EKS from something that just runs in the cloud — which is on the far right here — and stretch it across all these different environment options. So we have, like Mike said, support for Local Zones, support for Wavelength Zones, support for AWS Outposts. EKS Anywhere allows you to run Kubernetes clusters in a variety of on-premises environments in your data center — bare metal, VMware, CloudStack. And then EKS Distro is even more core.

EKS Distro allows you to take the core Kubernetes that we run as part of EKS and put it anywhere you want, with that same degree of curation and management from AWS. So you know that the Kubernetes you're running on any device, anywhere, with EKS Distro is the same Kubernetes we're shipping in the managed EKS service in the cloud, in all of these managed edge locations, and as part of our on-premises suite with Amazon EKS Anywhere.

So Mike talked a little bit earlier about the infrastructure services we're providing. I want to talk a little bit about what we're doing on the supporting services side, and also on the higher-level services.

AWS Controllers for Kubernetes is how you can articulate the AWS services that support your Kubernetes applications. If you need, like Mike said, an S3 bucket, or a DynamoDB table to support your Kubernetes app, there are two ways to do this. One is to start the cluster, define your application, go out to CloudFormation, define the S3 bucket, and then link everything together and hope that it works. That can take a long time, especially if you're trying to manage it all at scale.

But with ACK, you can use the ACK controllers to define and manage AWS resources through your Kubernetes API. You can define your application and its supporting AWS resources in the same YAML file and have them instantiated with a shared lifecycle. So if you need that S3 bucket to support your stateful application, you can create them at the same time, and you can use Kubernetes constructs like labels and secrets to store things like endpoints and let applications discover database endpoints and so on.
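For instance, with the ACK S3 controller installed, a bucket is just another Kubernetes object (a minimal sketch; the bucket name is a placeholder and must be globally unique):

```yaml
# An S3 bucket declared through the ACK S3 controller, living
# alongside application manifests and sharing their lifecycle.
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: my-app-bucket
spec:
  name: my-app-bucket-111122223333   # globally unique bucket name
```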

So we have a lot of work that we've been doing on ACK. The team's been incredibly busy this year. Some of this was launched last year, but the majority of these services came online in 2022. I won't read them all out — it would take me all day to explain them — but we have a lot of core AWS functionality here. What's shown as coming soon is in preview right now, so you can actually go see it on the ACK website, and there's a whole larger list on the website of stuff we're currently building. The goal is that you should have the majority of core AWS infrastructure services at your fingertips, so you can manage them through the Kubernetes API alongside your applications.

So those are the supporting services — AWS things that go alongside your Kubernetes apps. What about the higher-level services? Amazon has a lot of really cool things that we're running and building for customers that we think can supercharge Kubernetes clusters.

This last year, we announced support for Amazon EMR on EKS, which allows you to run Spark jobs directly on your Kubernetes clusters using Amazon EMR. Before, when you were running Apache Spark, you either had to do self-managed Spark or use EMR on EC2 with dedicated resources. With EMR on EKS, you can run EMR Spark jobs on shared Kubernetes clusters and get the standardization — right, we're talking about organizational standards — the standardization of Kubernetes, while taking advantage of the Amazon EMR service.

And this year, in 2022, we took that a step further. We worked with the AWS Batch team, and we announced support for AWS Batch on EKS. It's a similar idea. Previously, you could run batch workloads on Amazon ECS or on Amazon EC2, but if your organization is standardizing on Kubernetes, you can now use the powerful scheduling algorithms of AWS Batch and schedule batch workloads directly on EKS — AWS Batch does the batch scheduling, and those batch workloads all run on your Kubernetes clusters.

And what's really cool about this is that, just like EMR, you can segregate those workloads to different parts of your clusters, and you can share clusters across multiple batch jobs, multiple workloads. So you can establish Kubernetes as a standard while continuing to use these powerful managed services from AWS — and there's more to come.

One of the things we've done on the EKS team — we think Kubernetes as an open source project is really valuable because customers know what's being worked on and what's coming, and we wanna continue that spirit with Amazon EKS. So we publish a public roadmap. It's on GitHub — you can follow the URL here or the QR code — and it lets you see what we're working on and what's coming soon. It's also a great place to give us feedback and to share new ideas.

One of the things I really, really like about this roadmap: normally, when you're working with AWS, you work with your AWS account manager or solutions architect, and you might say, hey, I'd really like it if this service did this thing — and you can feel a little bit disconnected; it can be hard to reach the product team, right? But the public roadmap is a place where I spend a lot of time, and I know Mike spends a lot of time, and we read the issues that come in. We love it when customers come and comment and share code samples and share their problems.

Our goal here is to establish a more direct line of communication so that we can build the right things. The feature Mike mentioned earlier — scale to zero, right? That had hundreds of upvotes on the public roadmap. Having this really gives us a direct view into what our customers need, and we use it every single day to help with our prioritization.

So, really appreciate your time this afternoon. Glad you let us share what's new and what's coming next with Amazon EKS. We're gonna keep working hard for you and keep building Kubernetes on AWS and beyond. Thank you so much.
