Navigating the future of AI: Deploying generative models on Amazon EKS

OK. The future of AI, generative models, Kubernetes: there are admittedly quite a few buzzwords in the title of this session. Would it look good on your resume, adding the fact that you're attending? Maybe. We're not gonna stop you from doing that, but we don't think that's really why you're here.

More likely, you're working in an organization that's already using Kubernetes for more traditional workloads and you're exploring how to support machine learning workloads. Maybe over the course of this past year you've been called into emergency meetings with your leadership, who asked: what are we gonna do with gen AI? What's our gen AI story? Gen AI FOMO, if you want to call it that, there's plenty of it going around these days.

So your organization puts its brainstorming hats on and comes up with some potential applications of gen AI, and now a new set of challenges falls squarely on your shoulders. You are the DevOps engineers, the MLOps engineers, the software development managers responsible for providing application development environments in your organization, and your data scientists are coming to you. They need faster storage, more computing power, access to the latest libraries and machine learning toolkits.

Your finance department is watching nervously, worried about the costs of gen AI. Your leadership wants results yesterday, and you need to deliver, all while maintaining security, compliance, and a high bar.

Do you have to start from scratch and learn a totally new set of technologies to make this happen? In this talk, myself, Mike, a product manager with EKS, and Rama, who's a container specialist with AWS, are gonna talk about how extending an EKS platform to support machine learning workloads can actually accelerate your organization's journey to building and deploying gen AI workloads.

We'll show how EKS makes it easy to integrate and use some of the latest AWS machine learning innovations in compute, networking, and infrastructure. We'll talk about the latest developments in the Kubernetes machine learning open source community that you can leverage to quickly transform your existing EKS environment to provide a stress-free, and maybe even delightful, experience to your data scientists, who shouldn't have to become Kubernetes experts. When you put it all together, leveraging and extending EKS for ML and gen AI workloads can help you go from gen AI idea to production application faster than you thought possible.

And finally, what's better than a couple of AWS employees standing up here for an hour telling you all of this? Getting to hear from an actual customer who's already gone through this journey. We have a special guest with us today, John Weber, who's the Senior Director of Developer Productivity at Adobe, and he's gonna share how they extended their internal EKS platform to support machine learning workloads and used it to build and deploy their highly successful Adobe Firefly product. With that, I'm gonna first hand it off to Rama, who's gonna start with some background on Kubernetes and machine learning. Thanks, Mike.

Hi folks, good afternoon. I'm Rama. I'm a Senior Container Specialist at AWS. I drive worldwide go-to-market for AI/ML workloads on EKS, helping customers across the world on their EKS journeys.

So getting to today's topic, I'm actually gonna start from why even Kubernetes, before we get to EKS. Why even do machine learning on Kubernetes, and why do we see a lot of our customers doing it today? As Mike mentioned, it can seem like ML and Kubernetes are just a couple of buzzwords put together, right? But as you start to deep dive and figure out what the actual needs for machine learning are, you start to see that Kubernetes actually fits perfectly into that world of ML.

So that's what I attempted with this slide, which is to put together a set of machine learning needs and the challenges you come across when you're trying to meet those needs, whether with Kubernetes or with machine learning in general.

Any machine learning development really starts with you choosing a framework of choice to build on top of. Those frameworks come with a lot of complexity in the form of dependency management. Not only do you have to ensure that you are including all of the machine learning library and framework dependencies, you also have to ensure that you are loading up all of your GPU drivers and anything else the GPU needs in order for your code to talk to the GPU layer and leverage all of its functionality.

Say we solve that challenge. The next is, of course, how do you get massive compute at scale to address all of your distributed training and inference needs? Thanks to AWS, you get access to that massive compute with a click of a button, but you still need to be able to manage and orchestrate that massive compute, storage, and networking to fit your specific needs. With ML, the needs are really about how you do distributed training at scale, where you're trying to expose your model to a large amount of data within a short amount of time so that it gets to learn and produce inference results for you. From an inference standpoint, you're trying to scale up as demand comes in, but you also want to ensure that you're scaling down as demand goes down to save resources and cost. From a training standpoint, you still need to ensure that you're reducing failures, again to stop wastage of resources.

Say you figured out those challenges as well. You still need to figure out how you do logging, how you do monitoring, how you build observability into your machine learning models, how you do identity and access management at scale, and how you ensure that many teams across your organization are able to efficiently share resources, but in a secure manner as well. And talking about security: while you're trying to meet and overcome all of these challenges, you still need to keep in mind that at every point in time you have to meet your security and compliance requirements.

That itself is complex, and trying to overcome all of these challenges means that you're spending a lot of time on managing and orchestrating infrastructure primitives. That is time you're taking away from building core business value, which is really your machine learning model development.

So what you get is a lose-lose situation: you are driving up your operational cost, but at the same time you are prolonging the time to market for your models and ML products as well.

So you really need a tool that can abstract all of those infrastructure primitives for you while providing enough levers to do all of the customization that you want. Kubernetes does exactly that; it's the forte of Kubernetes.

Before even getting to Kubernetes, if you're on the journey to Kubernetes, you are making an inherent choice, which is to use containers as the packaging method for your machine learning workloads. With that, you've already solved the first piece of the puzzle, which is dependency management. With containers, you get a single unit within which you can package all your ML code, your libraries, your framework dependencies, your GPU drivers, toolkits, and so on. It gives you a single unit that you can easily move across your environments without worrying about consistency, and you're also able to do it in a lightweight manner.

You also get easy scalability, because it's a single unit that you can spread across multiple compute instances; you get to scale out and do distributed training, or do inference on demand as you see fit.

Coming to Kubernetes: once you have packaged your application into a container, what Kubernetes really offers is automated scaling inherently built in. That is a native capability of Kubernetes, in the form of the Horizontal Pod Autoscaler or Cluster Autoscaler, or a couple of other options that are coming up right now as well. With that, you get to meet all of your cluster provisioning needs. Kubernetes abstracts the actual compute provisioning and provides the scaling that you need at massive scale.
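To make that concrete, here's a minimal sketch of what that built-in scaling looks like as a HorizontalPodAutoscaler manifest; the Deployment name and the CPU target are hypothetical placeholders rather than anything from the session:

```yaml
# Minimal sketch of Kubernetes-native autoscaling (HPA).
# The Deployment name "inference-server" and the 70% CPU target
# are hypothetical values chosen for illustration.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server
  minReplicas: 1          # scale down when demand drops to save cost
  maxReplicas: 20         # scale up as inference demand grows
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Once applied, the controller continuously adjusts replicas between the min and max based on observed utilization, which is exactly the scale-up-and-down behavior described above.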

And the next step, once you have solved this, is what we call improved resource utilization with Kubernetes. It provides a lot of options at the container level. It's already using minimal resources, without you having to spin up VMs for every application that you bring up, but it also provides features like resource quotas that you can use to further drive up your resource utilization.

Technical issues apart, you still have to do management at scale. You need to be able to expose these resources, and with the GPU scarcity that all of us are facing, you need to ensure that whatever resources you have available are being used completely. A lot of the time that comes down to having to share them across multiple teams. How do you do it securely? How do you do it in a way that isolates your workloads and doesn't let other folks in? That's a native capability of Kubernetes via its namespace constructs.
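As a rough sketch of that namespace-based sharing, combined with the resource quotas mentioned a moment ago, here is what one team's slice of a shared cluster might look like; the team name and quota numbers are illustrative assumptions, not values from the talk:

```yaml
# Hypothetical namespace for one ML team, with a ResourceQuota that
# caps how much of the shared cluster (including GPUs) it can claim.
apiVersion: v1
kind: Namespace
metadata:
  name: team-ml-research
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-ml-research-quota
  namespace: team-ml-research
spec:
  hard:
    requests.cpu: "200"
    requests.memory: 800Gi
    requests.nvidia.com/gpu: "16"   # limit scarce GPUs per team
    pods: "100"
```

The quota caps what the team can request in aggregate, so scarce GPUs can be divided across teams without any one team over-claiming.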

Another key advantage of Kubernetes is the influence it has in the open source community. In fact, most of the foundation models that you see, the ones responsible for creating this gen AI revolution in the first place, were actually built on top of Kubernetes. That's the kind of influence it has: a robust open source community with a vast number of folks constantly contributing and innovating at scale. And you get to adopt those innovations into your enterprise much quicker with Kubernetes in place.

So where does Kubernetes really fit into the machine learning workflow? It comes in when a machine learning developer is satisfied with local experimentation and wants to scale it out and expose the model to all the data that is available. All the machine learning developer needs to do is tell Kubernetes which container to use, what types of resources the container needs to run, and how many resources it needs. Kubernetes goes out, provisions the machines, deploys the containers, and takes care of scaling completely.

So you get simplified scaling, but you also get the flexibility to define different types of resources, CPU or GPU. Most importantly, Kubernetes controllers have the inherent capability to keep watching the cluster and make sure it's meeting the desired capacity or desired state that you have configured. So whenever a failure happens, you're automatically able to address it without any manual intervention, speeding up your distributed training and inference processes as well.
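Here's a minimal sketch of what "telling Kubernetes which container and how many resources" might look like for a training job; the image, command, and GPU count are hypothetical:

```yaml
# Hypothetical training Job: the developer declares which container to
# run and how many resources (here GPUs) it needs; Kubernetes schedules
# it and restarts failed pods to converge back to the desired state.
apiVersion: batch/v1
kind: Job
metadata:
  name: llm-finetune
spec:
  backoffLimit: 3                  # retry automatically on failure
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: trainer
          image: my-registry/llm-train:latest   # hypothetical image
          command: ["python", "train.py"]
          resources:
            requests:
              cpu: "16"
              memory: 64Gi
              nvidia.com/gpu: 4    # ask the scheduler for 4 GPUs
            limits:
              nvidia.com/gpu: 4
```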

Those are the technical reasons why we see a lot of our customers adopt Kubernetes for machine learning workloads. But we also see customers adopt Kubernetes for machine learning workloads for strategic reasons. I say that because a lot of our customers have chosen to standardize on Kubernetes as the way to run their applications, and they're building platforms on top of Kubernetes to enable that. It's usually because of one key reason: Kubernetes gives them a common interface that they can use to standardize their deployments across cloud, on premises, and edge locations.

Not only do they standardize on deployment practices, they also build foundational capabilities: how you do logging and monitoring, how you do identity management, how you do security at scale, how you apply security governance across your enterprise. All of that is already figured out and built into the platform that sits on top of Kubernetes.

With that, a lot of customers tend to reuse that platform and build ML capability on top of it. It provides them three distinct advantages. The first one: you get to reuse whatever is already built, so you're not wasting a lot of resources. But more importantly, you're also aligning with your existing enterprise standards and governance.

The second key advantage customers get is reduced cost, because you're not rebuilding something that already exists from the ground up, which means you save a lot of cost, and you get to use a lot of open source technology as well, which doesn't add any premium.

And the third point is about accelerating time to market: because you're able to utilize something that's already built, you're able to get going much faster and launch your machine learning products much faster as well, something that John will allude to in his presentation about how Adobe did that on their side.

With that, I will hand it over to Mike again, who will talk a little more specifically about generative AI and how EKS comes into the picture.

OK. If this was day four of re:Invent, maybe I'd spare you another gen AI overview, but it's only day two, so we'll make this quick. Gen AI has captured our imaginations over the last year for its ability to write stories, produce music videos, create conversations, even write code. But before we get to gen AI, we really need to take a step back and start at the top. What's AI? AI is simply the ability for a computer to replicate something that previously required human intelligence.

At the next level you get to machine learning, which is a subset of AI, and it's about automating AI: a computer replicating a task without being explicitly told to do so. The most common example you can think of here is your favorite streaming platform giving you recommendations on what to watch next.

Then we get to deep learning, which is yet another subset of ML, inspired by the human brain. Common tasks here are image recognition and speech recognition: where machine learning can recognize a face, deep learning can say it's your face.

And then we get to generative AI. Gen AI is the ability for a machine to ingest existing content and generate new and original content. With all of this talk of gen AI that's happened over the last year, did it come out of nowhere? Why now? In reality, the seeds have been there for decades, and there are a few key developments that have happened in the last couple of years to make it happen.

First is the proliferation of data and the ability to store it efficiently: Wikipedia, Reddit, all of these sources of data that can be used as training input. Then you have cloud computing, the ability to spin up huge clusters of GPU instances just in time without a massive upfront investment in your own data center. And then lastly, in the last couple of years there have been some key machine learning innovations that have really enabled gen AI to come together. To give you a sense of the scale: in 2019, the largest BERT models were something like 300 million parameters in size, and today's state-of-the-art large language models are something like 500 billion parameters. In just three years, that's over a 1500x increase in the size of models that can be trained, based on all these machine learning innovations.

What can you actually do with gen AI? Generally, it falls into three high-level buckets of use cases. The first is improving your customer experience. Probably the most common one to think of here is chatbots. Chatbots aren't necessarily new, they've been around for many years, but previously they were rule based, complex to maintain, and often didn't feel very conversational. Gen AI based chatbots can be developed much quicker and more generalized for your use cases, and often the person might not even realize they're talking to a bot because they sound so conversational.

The next is boosting employee productivity. I imagine most of you don't love writing status documents, for example; you could use a gen AI assistant to look over your emails and Slacks and automatically generate a status report for your boss. Code generation is another one, of course, one you're hearing about a lot at re:Invent: the ability for an assistant to do the boilerplate code stuff that's not fun, so you can focus on writing the more interesting code. And then lastly, similar to employee productivity but slightly different, improving business processes: text extraction, document processing, et cetera.

"Ok. Now we're gonna get to our, our gen AI audience challenge and I, I didn't know ahead of time. This is during lunch, but this is the image I, I chose. One of these images is a standard Adobe stock photo. The other is generated with Adobe Firefly. We'll do a show of hands - who thinks the image on the left is the one that's generated? Maybe 50/50. The one on the left was generated with Adobe Fly Fly. Using that input, the one on the right is a standard Adobe stock photo. It's still early days. I'm sure John will tell you they're working all the time to improve their models and over time it's gonna get harder and harder to tell the difference between a generated image and, and one that's uh just a standard actual photo.

OK. Let's get into how EKS can help you with running generative AI workloads. First, I want to talk about the difference between traditional machine learning models and gen AI models. With traditional models, you often require months of costly upfront investment to do manual labeling and gathering of data. Then you're using that data set to train your model for a very specific task. With gen AI, you're training what are called foundational models, which can be trained on a large amount of unlabeled data, essentially raw data that you can feed into a model, and it's gonna output a foundational model that does generally well at a bunch of tasks. You can then take that model and fine-tune it for your specific use case, but using a much lower amount of data compared to the labeled data you might need for traditional models. The key difference is that you don't need to start from scratch with every new use case you come up with when you're using generative AI models.

Some of the challenges involved with generative AI: scale, even more so than with your traditional models, because as mentioned, you might be using vast amounts of input to train these foundational models. When you're using Kubernetes, you need something that can auto-provision your nodes to handle that scale. You also need a control plane that's gonna be able to handle hundreds, maybe thousands of nodes in your cluster. Because of the scale, you're not gonna fit the training job on a single node, especially for foundational models; you might be running thousands of instances, and you need a way to efficiently split and distribute that data across the cluster. And then you want to reduce failures. Inevitably, when you're talking about thousands of GPU instances, hardware is gonna fail, and you wanna be able to minimize the impact of that.

Generally, EKS's focus for helping large scale distributed training is providing seamless Kubernetes-native integrations to all of the AWS compute, storage, and networking primitives that you can use for ML. The EKS control plane automatically scales: if you're running a bunch of instances, we're gonna scale behind the scenes, we might be running r5.24xlarge instances for your control plane. Even if you're running hundreds of nodes in your cluster, we still charge the same 10 cents an hour. It's gonna be much more cost effective than trying to build your own cluster.

We also vend accelerated AMIs that work out of the box with the various EC2 accelerated instances, NVIDIA-based ones and also Amazon's own Trainium instances, which I believe were released last year. On the storage side, we build Kubernetes-native drivers to integrate with the various AWS storage services; FSx for Lustre is a common one that we see customers using for large scale training. And on the networking side, we build plugins that integrate with EC2 networking components that you can use for training. The one to highlight here is Elastic Fabric Adapter (EFA), which is a specialized EC2 network interface designed for high bandwidth inter-node communication. With EFA and EKS, you can get the performance of an on-premises HPC cluster, but with the scalability and flexibility of AWS.
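As an illustration of those storage and networking integrations, here's a hedged sketch of dynamically provisioning an FSx for Lustre filesystem through the FSx CSI driver and mounting it into a training pod that also requests an EFA interface; the subnet, security group, image, and sizes are all placeholders:

```yaml
# Hypothetical FSx for Lustre filesystem provisioned via the AWS FSx
# CSI driver; subnet, security group, and sizes are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fsx-lustre-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-0123456789abcdef0          # placeholder
  securityGroupIds: sg-0123456789abcdef0      # placeholder
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: fsx-lustre-sc
  resources:
    requests:
      storage: 1200Gi
---
# A training pod mounts the shared filesystem and also requests an EFA
# interface for high-bandwidth inter-node communication.
apiVersion: v1
kind: Pod
metadata:
  name: dist-trainer-0
spec:
  containers:
    - name: trainer
      image: my-registry/llm-train:latest      # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 8
          vpc.amazonaws.com/efa: 1
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: training-data
```

The vpc.amazonaws.com/efa resource is exposed by the AWS EFA device plugin, which has to be installed on the cluster for that request to be schedulable.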

Then we move on to inferencing. Not every model that you build is gonna be successful, but the ones that are, you're gonna want to use in production and actually provide predictions to your users. Scale is still a challenge here, but it's a different type of challenge; it's a lot closer to the scaling challenges of traditional workloads, where you need to scale up and down based on user demand. Performance: inferencing could now be in the actual path of your users, and you don't want a slow prediction to get in the way of a good user experience. And cost, especially for a successful application that you might be running over the long term: inference might account for up to 90% of the overall costs, because once you have a relatively successful model, you might only need to train every now and then to fine-tune it, but most of the cost is gonna come from inference.

How does EKS help here? You weren't going to escape an EKS breakout session this year without at least one mention of Karpenter. If you aren't familiar with Karpenter, it's a project that we open sourced a couple of years back. It's designed for high performance, flexible Kubernetes node auto scaling. With traditional node auto scaling in Kubernetes, you're predefining your compute: you might be saying I want node group A with these instance types, node group B with other instance types. I've talked to customers who are running hundreds of node groups in their cluster, which becomes really hard to manage and hard to really take advantage of AWS's compute offerings. Karpenter shifts that paradigm: instead of predefining everything, you provide Karpenter a set of constraints, as specific or broad as you want, and then you let Karpenter look at your workload requirements. You might need X number of GPUs, and it's gonna go call EC2 APIs directly to get you just-in-time, right-sized compute for your workload.
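A minimal sketch of that "constraints instead of node groups" idea, using the Karpenter NodePool API; the instance types, capacity types, and GPU limit here are illustrative assumptions, not a recommendation from the talk:

```yaml
# Hypothetical Karpenter NodePool: broad constraints instead of
# predefined node groups. Karpenter picks right-sized instances within
# these bounds based on pending pod requirements.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu-inference
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g5.xlarge", "g5.2xlarge", "g5.4xlarge"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default          # EC2NodeClass defined elsewhere
  limits:
    nvidia.com/gpu: "64"       # cap total GPUs this pool may provision
```

Pods that request nvidia.com/gpu then trigger Karpenter to launch an instance from within those constraints, and the limits block caps how much the pool can ever provision.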

In all honesty, we didn't necessarily build Karpenter with ML use cases in mind, but it turns out that especially for inference, given it's somewhat similar to more traditional web applications, Karpenter works really well for that case.

OK, what's coming? Are we satisfied? No, of course not. We, like our customers, are never satisfied, and you keep us innovating. We're gonna keep integrating with all the new AWS infrastructure features that come out. In fact, this slide is now slightly out of date: just last night we announced the CSI driver for S3. S3 open sourced a technology earlier this year called Mountpoint, which you can use to mount an S3 bucket onto an EC2 instance and use local file system commands that automatically get converted to S3 object API calls. With the CSI driver, you get a Kubernetes-native interface to S3. You don't need to write custom application code or have elevated privileges to mount that bucket; the CSI driver will do that for you. It's available as an EKS add-on. When you combine EKS and S3, you can process petabytes of data across potentially thousands of instances and benefit from S3's scalability and high throughput. We're pretty excited about this one, and I'd highly encourage you to go check out the announcement and learn more.
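For a sense of what that Kubernetes-native interface to S3 looks like, here's a hedged sketch of statically provisioning a volume backed by the Mountpoint for Amazon S3 CSI driver; the bucket name, region, and sizes are placeholders, and the exact volume attributes may differ by driver version:

```yaml
# Hypothetical statically provisioned volume backed by the Mountpoint
# for Amazon S3 CSI driver; bucket name and region are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-training-corpus
spec:
  capacity:
    storage: 1200Gi            # required by the API, not meaningful for S3
  accessModes: ["ReadWriteMany"]
  storageClassName: ""         # static provisioning
  mountOptions:
    - region us-east-1         # placeholder region
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-training-corpus-handle   # any unique string
    volumeAttributes:
      bucketName: my-training-corpus          # placeholder bucket
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-training-corpus
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""
  volumeName: s3-training-corpus
  resources:
    requests:
      storage: 1200Gi
```

A training pod can then mount the claim like any other volume and read objects from the bucket as if they were local files.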

We're doubling down on Karpenter, especially for ML, given the signal we've seen of how many customers are telling us that they're using Karpenter for their ML workloads. One of the features I'll highlight here is native support for EFA. If the pull request is not merged already, it's imminent; in another release or two of Karpenter we should have native EFA support, where if your pod requests an EFA interface, Karpenter will recognize that and automatically set up all the required EFA configuration and interfaces when it starts those instances.

Performance and resiliency: not the most exciting marketing topics, but they're areas we're always gonna continue to invest in. Container image lazy loading is a good one to highlight here. We talk to a lot of customers who sometimes measure their ML container images in tens of gigabytes; I've heard of 50 and even hundreds of gigabytes, and it can be frustrating to start your instance: the instance comes up in 30 seconds, but then you're waiting 10 minutes for the image to download. AWS open sourced a technology earlier this year called Seekable OCI (SOCI), and it's actually possible to use with EKS today, but it requires quite a bit of setup on your part to make it work. That's one of the items on our roadmap next year, to make it much easier to use with EKS. It helps especially in cases where you want to scale quickly for ML workloads but you have large container images; you don't necessarily have to wait all of that time before your training job can start.

And the last one, and I would argue this is the most important, is that ML on EKS should just work. You shouldn't have to do a bunch of special configuration to get the most performance and flexibility when you're using EKS for ML workloads. Another slide that's actually slightly out of date: the EKS accelerated AMIs, as of a few weeks ago, already support the latest NVIDIA drivers that work with EC2 P5 instances.

OK, so sounds great. Who's actually using generative AI on EKS? We've highlighted a few customers here; let's actually start in the middle, with customers who are using EKS to train foundational models. These are the ones running at the largest scale, because they're training on vast quantities of data. Anthropic, whom Rama mentioned: if you watched Adam's keynote this morning, their CEO was on stage; they use EKS for all of their training, and they train their Claude model on EKS. On the bottom layer, we have customers who are building their own gen AI platforms on top of EKS, which you might want to choose to use as a starting point: it's gonna be a simpler interface to run gen AI workloads, but with a Kubernetes-like experience. And on the top, we have EKS customers who build consumer focused products and are extending their EKS platforms to support ML/AI workloads. Of course, the one I'm gonna highlight here is Adobe, but no need to hear that from me. Instead, I'm gonna hand it over to John, who's gonna talk about how they extended their EKS environment to support gen AI workloads.

Thanks, Mike. I'm really excited today to share the story of how Adobe delivers the power of AI to our users. This is Adobe Firefly.

With Adobe Firefly, we want to empower folks, from, say, the creative professional all the way down to my teenage daughter, to create and manipulate content as they imagine it by simply describing it. We run a mix of AI capabilities, including Firefly, across all of our clouds so we can incorporate those products and capabilities, so our users can experience that Adobe magic that we're so famous for.

I don't know how things work at your company, but at Adobe, sometimes it's really hard to get a product out to market. For some developers, it could take up to 30 days to go from zero to hello world. Some of those steps involve filing tickets to get access to tools like observability or source control, or creating a new cloud account. And in the meantime, while they're writing code, testing integrations, and deploying it, the folks they sent those tickets to will come back and ask a whole bunch of questions because they didn't understand what was actually needed. And then, when all that gets settled and sorted out, you still have to worry about production readiness. Do you have the right monitoring set up? Do you have a backup strategy defined? Do you have incident management set up, so you're ready to go at two in the morning on a Tuesday when something breaks?

I'm part of the developer platforms group, and our motto is really simple:

"Help developers write better software faster. How do you make this landscape any easier where our developers can actually do that? I'm a fan of abstractions and platforms you need to simplify and in many cases oversimplify for your developers. I'm also a fan of our cloud providers like AWS where they can manage and operate this at scale better than I could even think of. So where do those two principles intersect?

I want to draw your attention to the blue box at the bottom labeled Adobe Internal Developer Platform. At Adobe, the core of our developer platform was built in 2016, and it's called Ethos. When we first built it, we built it on Mesos, and then it quickly became obvious we bet on the wrong horse and we pivoted to Kubernetes. What Ethos tries to do is expose capabilities so developers don't need to think about any of these things. They don't need to think about how to run cloud infrastructure at scale. They don't need to think about how they onboard and create a CI/CD pipeline. They certainly don't want to think about security and compliance. And last but not least, I don't know any developer that loves to do cost efficiency. What they want is to have a platform make those decisions on their behalf without any input. Ethos achieves all of this.

We originally ran our Kubernetes on EC2 nodes. I'm sure you guys did that as well, and I don't know about you, but I was not a fan of running etcd at scale. I was not a fan of trying to manage rate limiting. I was not a fan of trying to manage API server latency. We have better things to do. So we quickly set up a program to convert all of our EC2-based clusters to EKS, and as we did that, unsurprisingly, we found it to be cheaper and way more reliable than what we had running Kubernetes natively on EC2.

So when a team comes to us and says, hey, I've got this great idea around something like generative AI, what do we do? Well, I have yet to meet an executive that says, that sounds like a great idea, come back in a couple of months when you have a fully baked idea and have actually thought this through, and then we'll talk. No, they want it like last week or last month. The other thing is, the stakes in this space for Adobe were really high, so we needed something that our developers and Adobe were comfortable with, to make sure we got this right the first time. We also needed to understand the customer experience. Our developers are using tools like distributed tracing, Prometheus, and log ingestion to understand the customer experience, so we know we're delivering that world-class experience that our customers expect from us. So for us, the choice was easy: for products and services like AI, we'll go ahead and build those on Ethos.

OK, John, you've sold me on abstractions, you've sold me on platforms. How does a team like, say, Firefly get out the door and deploy their application? At Adobe, we've embraced GitOps powered by Argo. But first they'll usually go to a screen like this to get bootstrapped. We provide a set of libraries and capabilities called the Adobe Service Runtime, which makes all those integration pains that I mentioned earlier much easier. In the onboarding section, we'll ask them some simple questions: what clusters do they want to deploy to, how many environments do they need, what size and number of containers do they need? And then we'll walk them through installing a GitHub app within their repo that creates all their Helm charts and workflows to get a working CI/CD workflow up and running.

Once the application is actually deployed and onboarded, we'll go ahead and create things like DNS entries. We'll create their namespace, everything that a developer will need in order to get to production.

So what does production look like for a developer at Adobe? It's this screen; this is where they'll see their runtime. In that screen you'll see things like Argo sync status and application health. You also get namespace info, in case they have to roll up their sleeves, get dirty, and actually do some debugging, and you get a sense of what images they're deploying in their namespace. And thankfully, this developer has decided to install Prometheus to scrape metrics and send those off to long-term storage.

One screen, all of the relevant info your developer will ever need. So obviously having a unified developer platform makes everything easy, right? Not so much. GPUs remain hard to come by; you may have your code ready to ship, but not the hardware to deploy that code to. Another thing we're kind of famous for is that we like to abuse the Kubernetes API, and we tend to get rate limited by our friends at AWS. Last thing: container startup time. Unfortunately, containers of this type are rather big and don't start in, say, seconds or sub-seconds, and so that becomes incredibly hard to scale.

So what did we do? The first thing we did was work with our partners at AWS to make sure we had supply ready to deploy applications like generative AI. Secondly, we got experts on the phone from Amazon to actually look at our usage of the Kubernetes API, chastise us a bit, and tell us some better ways to actually get the metrics we were interested in. Finally, we kept it simple: keep dependencies local and do some optimizations on your application startup time. It'll do wonders in terms of scalability.

The nice thing is when we deploy applications on EKS we never have to think about the control plane. If we get that right on day one, we don't have any incidents and it scales beautifully.

So where is AWS investing, and what are the returns on those investments? The first metric that I look at at my level is our cluster-to-operator ratio: simply, how much labor does it take for us to manage the infrastructure that we deploy? I'm happy to say that as we've moved to EKS, we've driven that number from a 10 to 1 ratio to over 30 to 1.

Secondly, we're gonna retire our homegrown CI/CD pipeline and fully embrace Argo, which is way more flexible for new and exotic use cases like gen AI.

Finally, we need to continue to reduce friction for our developers. Having fewer places for them to go will make them much more productive and much happier, and you'll achieve higher velocity.

What's next? Hopefully you saw some of our MAX announcements last month, where we're going to take these capabilities and expand them: Firefly audio, Firefly video, Firefly 3D models. On the Ethos side, I'm really excited about our EC2 control plane retirement party that should happen close to the first of the year. Unfortunately, GPUs will remain scarce, so we're working with our partners at Amazon to take a look at the Inferentia2 series for our inference.

Lastly, I will always ask the provocative question, and I encourage you to as well: is someone or something doing this better than I ever could? So we'll continue to look at open source solutions or managed solutions to see what our options are.

Lastly, let me leave you with this. Never underestimate the power of a single URL to unlock your developers. Thank you. Thanks.

So I know we have painted a pretty rosy picture here today, but running ML on Kubernetes is not without its challenges. Kubernetes was not built with machine learning in mind. A lot of Kubernetes-native constructs like Deployments and StatefulSets make sense to application developers, and they're able to easily adopt them, whereas for ML scientists it's not something native to them. So we need an abstraction layer that makes it easy for them to consume Kubernetes-native constructs while operating in a familiar environment that they're used to.

A general pattern that we see a lot of our customers take, not very different from what John was mentioning, is that they start out building a generative AI product or model with custom tooling so they can get to market quickly. Once they prove out that generative AI product, they take a step back and look at building an end-to-end ML platform, and at how they can do it in a standardized manner that helps them scale for many years to come.

A general approach our customers take is to use Kubernetes for what it's good at, which is managing the infrastructure primitives, while taking advantage of open source solutions. There's a whole ocean of ML solutions available out there that provide native Kubernetes integrations, so you're getting all of the ML-specific functionality with tighter integration on Kubernetes, and you're not ending up paying any premium for achieving that ML functionality on your existing Kubernetes clusters.

Again, as I mentioned, there's a whole ocean of solutions available out there from an ML standpoint, and a lot of our customers were asking for guidance on what the standard stack is that they should use: how do I achieve this end-to-end ML functionality in a way that is more standardized and more recommended by AWS? For that, we came up with what we call the JARK stack. It stands for JupyterHub, Argo, and Ray put together on top of Kubernetes (EKS). These tools together form an end-to-end ML stack and provide the end-to-end ML functionality that you need and that you can use to build and scale your ML platform with EKS and Kubernetes.

JupyterHub provides a familiar interface for data scientists; they're used to working with JupyterHub for their model experimentation and development. Argo Workflows takes care of all the ML-specific workflows needed to manage ML-specific tasks. That is then handed over to Ray, which takes care of orchestrating model parallelism and also provides an endpoint through which you can serve inference from your models.
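To give a flavor of the Argo Workflows piece, here's a minimal sketch of a two-step workflow, preprocess then fine-tune; the images, commands, and GPU count are hypothetical and not taken from the blueprint:

```yaml
# Hypothetical two-step Argo Workflow: preprocess data, then fine-tune.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: finetune-pipeline-
spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      steps:
        - - name: preprocess
            template: preprocess
        - - name: finetune
            template: finetune
    - name: preprocess
      container:
        image: my-registry/preprocess:latest    # hypothetical image
        command: ["python", "preprocess.py"]
    - name: finetune
      container:
        image: my-registry/llm-train:latest     # hypothetical image
        command: ["python", "train.py"]
        resources:
          limits:
            nvidia.com/gpu: 4                   # hypothetical GPU count
```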

If you're interested, we have a blueprint for this. It's a Terraform module that you can download, and with nearly a single terraform apply you'll be able to deploy this entire stack end to end. We also have a blog that we put out recently explaining step by step how you can deploy the stack on your EKS clusters.

If you're more curious, we also have a hands-on workshop later today at 5:30pm in Hall D, Room 404 at the Venetian, where we'll walk through actually creating an EKS cluster, deploying the JARK stack using our blueprint, downloading your own Stable Diffusion model, fine-tuning it, and getting some inference out of that fine-tuned model. So check it out if you're interested in learning further.

That's one of the stacks that we have put out for machine learning. But late last year, we started to see a lot of our customers wanting to adopt EKS for their data workloads, especially across data analytics, data streaming, as well as the ML that we have spoken about today. So we created a new project for that, called Data on EKS, which provides all of these blueprints, best practices, and guidance specifically focused on data workloads, making it easier for our customers to deploy these data workloads on EKS.

We call it DoEKS, short for Data on EKS. It provides Terraform blueprints that you can use with just a terraform apply, and you get all of our stacks, or any stack of interest to you, across data analytics and data streaming. We have Apache Flink on EKS, we have Kafka on EKS, and we have Spark on EKS as well. So feel free to check out our website and engage with us on GitHub. And if you're really interested, if you are on the same path of trying to adopt Kubernetes or EKS for these workloads, reach out to our account teams, who can connect you with a container specialist in your region who will be able to guide you through the conversations and talk about which stacks or solutions you can use, or which blueprints you can leverage from Data on EKS.

If you want to reach out, my DMs are always open, and you can reach out to me on LinkedIn as well. I'm definitely open to getting on a call any time with you folks, walking through our Data on EKS blueprints, and providing guidance and best practices on how you can adopt them.

We're almost at the end of the presentation. In terms of takeaways, I just wanted to quickly share some quotes from existing customers who are deploying their generative AI on EKS and have adopted it successfully. Sorry for packing the slide with so much text, but if you just look at the highlights, most of the advantages or benefits that customers have been able to reap are around, as I mentioned, being able to optimize your costs while reducing your development effort and the time it takes.

With that, we come to our takeaways. The first is accelerated time to market, because you get to use a lot of capabilities that are built into EKS and you're able to abstract away all of the infrastructure primitives, moving you away from managing those primitives so you can focus on your own ML model development and deliver core business value at a faster rate.

And again, you get to reduce cost, because you're using an existing platform, you're not rebuilding from scratch or reinventing the wheel, and you're also using a lot of open source solutions that don't require you to pay a premium to achieve the ML-specific functionality that you need.

And most importantly, with EKS we manage the control plane for you, and with that you get to scale much further than you would be able to with your own self-managed Kubernetes.

And last but not least, as I mentioned, Data on EKS is the project we have launched to provide you an easier way to deploy an end-to-end ML stack, data analytics stack, or data streaming stack that you can leverage to accelerate your journey.

With that, we have come to the end of the session. Thank you so much for your time and attention. Feel free to leave a survey, actually, do leave a survey. All your feedback really helps us fine-tune our content.
