Accelerate generative AI and ML workloads with AWS storage

Thank you all for coming to this session. We know you had a lot of amazing sessions to choose from today, so Jordan and I are really happy that you chose ours.

My name is Peter Imming. I work on the Amazon S3 product management team.

Jordan: My name is Jordan Dolman. I work on the Amazon FSx file storage team.

The session that you are in, if you're in the right spot, is "Accelerating Generative AI and ML Workloads with AWS Storage." Is everybody in the right room, right session? Ok.

Alright. So we're going to be covering a variety of topics here this morning. Jordan and I have spent the first few minutes talking with several of you, and the good news and the bad news is that none of you seem to be doing it the same way.

So there will be a number of choices here that we're going to present to you, and we'll try to provide you some prescriptive guidance based upon what some of you have told us and what we think you're going to want to talk about today.

But mainly what we're going to be talking about for the next 50 minutes or so is really why we're all here, which is: how does your business and your organization adopt generative AI and machine learning, if you're not already there? And what is the right AWS storage that you want to pair with that?

And like I said earlier, no one seems to have chosen the same way to do things, which is perfectly fine. I think we've anticipated this in our session today, and we're going to try to provide prescriptive guidance for each of the different scenarios that we encounter quite a bit.

Just so we can get a show of hands in here: how many of you are on the infrastructure storage side, where your background is in storage, more of a storage admin or an infrastructure admin? Ok.

Where are my data scientists, my ML practitioners? Ok. That's about a 50/50 mix. Ok. So we're going to try to cover both of those here today and give you a balanced view of why storage is important for gen AI and ML and what factors go into making the right choice.

And also give you some guidance as an ML practitioner, what you should be looking for from the storage perspective. So not all storage is created equal, not all data is created equal in value. So there's going to be probably a mix and match here that we're going to talk you through and that's what we're going to spend the next 50 minutes talking about.

So if that does not sound interesting to you, no harm, no foul if you want to get up and move on to a different session, but that's going to be our focus. We've really all come here today because we've essentially arrived at a tipping point for gen AI and ML: we now have a massive amount of data, and the gen AI and ML training tools have matured.

We now have CPU and GPU instances that are amazingly powerful, high performance, and that demand high performance storage. And this has reached an inflection point where businesses want to know, your organizations want to know, you want to know how you can best adopt AWS storage to meet the needs of your applications that depend on gen AI, are making real-time predictions, real-time inference, and need better performance out of the storage to support those applications.

So what we're going to be talking about: where do we start this journey? Well, we're going to start at the beginning. Artificial intelligence has been around since the fifties, right? This is something where we have data, we are making predictions based upon the data, and we are essentially setting up systems to mimic human decision making. The way we do that is through machine learning.

We have a large amount of data that we search for patterns in, to build logic into these models, and then make predictions or inferences about the data, or fine-tune on it. These are some of the categories that we're going to be talking about today, but we're really going to be focusing on machine learning. That's what Jordan and I are going to focus on: tailoring and tuning the storage choices that you have today to fit into that machine learning model.

Now, when we talk about machine learning today, this is something where we've seen a split in the room right away. If you're a data scientist or an ML engineer coming in and looking at storage, it's one of those things that you expect is just sort of there. It's a utility. It is something that you just consume. Maybe you haven't given a lot of thought to which storage service is the right fit. Maybe there's not been a lot of design thinking about how to lay out the data, how to organize it, how to optimize it for training, for data loading, for your checkpoints.

And so that's what we're going to spend time here talking about today. On the flip side of that, though, if you are a storage admin or an infrastructure admin, when we talk to you, you see the world as storage, storage, storage. It's infrastructure, and gen AI and ML is just another application in the laundry list of applications that you've got to support and design and architect your storage infrastructure for.

So there's a dichotomy there between those two viewpoints; neither is wrong, neither is incorrect. It's just two different viewpoints. And our goal here today is to give you some balance, some perspective from both sides. Jordan and I have been in the storage industry for more than 30 years, so we tend to have a storage viewpoint. But what we've come to learn over the last four years or so at AWS is that the world is changing, and that gen AI and ML is a different type of application that demands a bit more thinking about storage and about how you want to lay out the data and how you want to consume the data.

Do you want to stream it? Do you want to copy it and then load from local storage? You've got a lot of choices and all of you are making different choices right now, but it's something that we're going to cover today and really kind of try and dive deep on some of the options for AWS storage that we think can really help accelerate your gen AI and ML training.

Alright. So our agenda is going to be short and sweet. We've already talked about why storage matters; now we're going to focus on some of the options that we've got out there and the design and thinking patterns that we want you to take away from this session today. So Jordan's going to walk you through some of those options right now.

Jordan: Thanks Pete.

Ok. So the role of storage is really going to differ based on where you are in the ML life cycle, so I'll just acknowledge that up front. As Pete mentioned, we've been working on storage for quite a while. We work with a lot of customers who are running ML workloads, and the area that we hear the most questions about, where customers are looking for the most guidance, is actually in this middle section here of building and training models.

And that's because that's where a lot of the cost of infrastructure is going, a lot of the cost of compute, and where there's a lot of innovation happening right now. And so we're going to focus today even more on this building and training of models.

And we're going to do that in the context of a few different services that customers use when they train models. We're going to talk about SageMaker, which is our fully-managed offering for ML, the easiest experience when working through the ML life cycle.

We're also going to provide guidance for those of you who are doing it yourself with ML frameworks on EC2, EKS, etc., and so we'll cover that as well, starting with a quick primer.

If you're not as familiar with Amazon SageMaker, it's a fully-managed service on AWS that basically lets you train models without having to think about a lot of the underlying infrastructure. SageMaker is going to spin up those compute instances, load the Docker image, load the data from your storage, train your model, do checkpointing, and save the results of the model; all of that gets handled for you.

So you don't have to think about what's happening on the back end. And that's what we mean when we say it's a fully-managed experience: all those resources that get provisioned and deprovisioned, that happens in SageMaker's account, away from you, so you don't have to think about the nitty-gritty details of it.

But for those of you who want full flexibility, we also have support to train your own models using ML frameworks on EC2, EKS, etc. And oftentimes, what we see is customers who are looking for more flexibility: things like scheduling training jobs or being able to SSH onto the instances to get more data about what's happening while you're training.

Those are some of the reasons why customers use their own ML frameworks and don't use SageMaker; they're looking for that extra flexibility. And so again, we're going to talk about how storage plays a role in both of these, but just to set the stage of what these are and why they exist.

Regardless of which path you choose, the trend we're seeing is that these models are getting bigger. I'm sure in every session people are saying models are getting bigger. The reason I mention it in this storage session is because bigger models need to be trained with more data, and obviously more data has an implication on storage.

The most obvious one being you have to store the data. But there are others that we think are actually even more important, and they are going to be the focus of our discussion today.

The first is that the data you need to train those models needs to be delivered to those GPU instances, and those CPU instances in some cases, as quickly and efficiently as possible. And that's because the cost of training a model is largely determined by the amount that you're spending on those compute resources.

And so if your compute resources are sitting idle, waiting for data to arrive from storage, that's wasted cycles. It's more expensive training workload cost. And it's also, you know, delaying your time to actually get results, get insight from your model.

In this way, model training is actually more akin to classic high-performance computing workloads than it is to enterprise IT. In enterprise IT, the goal of storage is often to just minimize costs. You've got a lot of data and the goal is just try to minimize the cost of storing that data.

In high-performance computing, and in what we see in ML, the goal of storage is actually to minimize the cost of the whole training workload. And that means minimizing the cost of the compute and the storage together.

And so this is one of the first areas that we want to pay attention to and focus on: getting the data loaded into the GPUs. And there's a number of factors that will inform how quickly and effectively you can do that.

One of the obvious ones is the size of your data sets. I always want to make sure we're really clear here: at some point, storage performance can be a non-issue, it can be immaterial. If you're moving very small amounts of data, the time it takes to move that data doesn't really matter.

But again, as we're talking about larger data sets, moving that data can consume more of the time of your training workload. So we want to make sure we're, we're being conscious of it as the data sets grow.

We want to think about it in terms of what's happening in the underlying storage layers. So again, Pete and I spend a lot of time in storage. And so we're always thinking kind of end to end about what's happening when customers are running workloads.

And so an example of that is when we build our file systems, we build them sometimes with solid state disks and sometimes with spinning hard disks. A spinning hard disk has a mechanical arm that can only move so fast to access different pieces of data with random access.

And so that means that if you're working with a lot of small files and your storage is hard disk based, there's going to be a performance implication as that arm is moving around. And that's the end-to-end thing that we're trying to make sure people are thinking about, especially all of you who are coming from an ML and data science background.

We want you thinking: if you're working with small files and all of your data is structured in that way, it's possible that you might have a latency impact as you try to access each individual file, and that may have an impact on your training time, as an example. There are other things that also have an impact on loading that data into your GPUs.

So things like whether that's happening in parallel or sequentially, and the file formats you use; and again, we'll talk through some of these in more detail as we get into the storage recommendations.

So that's on the loading-the-data side. The other area where we see a lot of importance of storage with ML training is with checkpointing.

So again, these models are getting bigger. To train large models, you often need more compute resources, and as you grow the number of compute instances that are supporting that model, the failure rates of any individual node start to stack up.

And so what we see is, you'll train a model and you'll want to actually save the state of that model. You want to checkpoint the model and write the state of the model off to disk so that if there is a failure, you can recover back to that point and you can minimize the amount of training that has to happen again.

So you can see here, we take our first checkpoint, we are in the process of taking our second checkpoint, we have some type of failure.

The goal of checkpointing is to be able to then load the state of the model from the old checkpoint and resume your training from there. And so if you think about this from a storage perspective, we want to be able to reduce that checkpoint time, meaning we want the storage to be able to accept writes from your compute cluster as quickly as possible.

And then if you need to actually restore from a checkpoint, we want to make sure that that data can come from storage and return to the compute cluster as quickly as possible. Now when I work with my customers, both internal and external customers, what I'm seeing is with some of the recent models, especially large language models with a ton of parameters, these checkpoints can be terabytes in size.

So imagine writing terabytes of data to disk and trying to do that sometimes every few minutes, sometimes every few hours - that has a tremendous load on the storage. So you really want to make sure that your storage can actually accept those writes, has enough throughput to be able to accept those writes, and also return those checkpoints to the compute cluster in the case of a failure.

It's also just worth noting that checkpoints aren't only for failure use cases. A lot of times many customers will store checkpoints because they're constantly tuning their models and iteratively building them. And sometimes they want to revert back to a prior state where they had better results if they're really paying close attention to how the model is being trained throughout the training process.
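To make that checkpoint flow concrete, here is a minimal sketch of periodic checkpointing in PyTorch. The model, optimizer, and path names are hypothetical; in practice the path would point at whatever local, file system, or S3-backed storage sits behind your training job.

```python
import torch

def save_checkpoint(model, optimizer, step, path):
    # Persist the full training state so a failed or interrupted job
    # can resume from this point instead of starting over.
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        path,
    )

def load_checkpoint(model, optimizer, path):
    # Restore the most recent saved state and return the step to resume from.
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]
```

A training loop would call save_checkpoint every N iterations (or minutes), and call load_checkpoint on restart if a checkpoint exists; how fast those writes and reads complete is exactly the storage question discussed above.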

So kind of pulling this all together at a high level, we're going to focus today on training: improving the training speed and data loading, as well as the checkpoint side, and what this looks like for different environments, different places you might be coming from, and different tools you might be using. So we'll jump on in there.

Optimizing ML training cost and performance. So what we're seeing right now is that at a high level, there's many ways to get to the cloud, but we see kind of two buckets of customers that I think are helpful frames of reference for us to at least talk about today.

The first is customers who are basically lifting and shifting from on-prem to AWS. Pete mentioned that ML is not new, that's very much true. A lot of ML has been done on-premises for years, but customers are looking to move to the cloud for a number of reasons which we'll get to.

And then of course, there are many customers who have been building on AWS and putting together data lakes for years. And what we see here is that this fundamental starting place really shapes how customers make their storage decisions and choose to interact with the different services on AWS.

So we're going to use this as our frame of reference for the rest of this session. How many of you all would count yourself on the left side of the slide, where you're bringing in data from on-premises as a lift and shift journey? Ok. Are the rest of you on the right side then, coming with data already in an S3 data lake? Ok. And about 30% neither. Yeah, there we go. That was my assumption. That's right. Great.

So we'll start on the lift and shift side. In 2018, we found that there were so many customers looking to do this lift and shift, and what they were asking for was file systems that looked like what they had on-premises. And so we launched a service called Amazon FSx, where we basically are building file systems and fully managing them.

We're taking care of all the underlying infrastructure for a range of file systems that are out there in the market, some commercial, some open source. And the goal here was to provide people with the POSIX interface and the administrative features that they're accustomed to while also giving them access to cloud native services, cloud native features and the agility and scale of the cloud.

And so for machine learning, specifically, the FSx offering that we typically recommend and typically see our customers choosing is Lustre. For those of you not familiar with Lustre, it's the world's most popular high-performance file system. It's an open-source file system actively being developed by the open-source community and it has tremendous scalability, which is why it's used in many of the national labs.

It's also used in a lot of different traditional high-performance computing applications from genomics to financial simulations and what we're seeing today and for the last few years is this has also been extremely helpful for customers who are lifting and shifting their machine learning workloads to the cloud as well.

The benefits that FSx for Lustre brings to ML and working with a file system are up here. You can see it's compatible with virtually every ML application. That's because ML applications have been developed on file systems, whether it was local file systems or shared file systems.

It offers an intuitive interface and many of the customers that we work with - the end users, ML researchers, data scientists - aren't necessarily experts in storage. But they are familiar with using a drive on their computer. It also provides highly scalable low latency access for random reads and we have the ability to provide different price and performance options.

So FSx for Lustre has been really successful in that way for these machine learning customers. If I go a little bit deeper into these, we do support - despite the fact that you've lifted and shifted these file systems to the cloud - we do support native integration with SageMaker and with all of the other cloud native capabilities that you would need to run ML frameworks on FSx.

For Lustre, we also provide, I mentioned before, that kind of POSIX interface, that permissions model and intuitive kind of place to share data amongst ML researchers. And one of the things that has been particularly helpful with many of our customers who have large teams of ML researchers is the ability to guarantee consistency across a file system.

So you can see this bullet on the rightmost point here - FSx for Lustre gives you the ability to mount a file system for hundreds of researchers to work with and collaborate, which is really important from what we've seen over the last few years. And it ensures that anyone who's writing to a file will have that content updated across the fleet of all other clients who are accessing that same data.

So there's complete cache consistency, so you don't end up with a bunch of users overwriting each other's data. All of the locking capabilities are natively built in to make sure that people can collaborate without having to think about the storage layer underneath.

FSx for Lustre also provides very scalable performance. And so if you're trying to work with a small model and then scale it up once you have your larger dataset, or if you want to actually accelerate the speed with which you can train a model, the ability to have scalable performance with low latency is another reason why FSx for Lustre has been used by many of our customers.

And to put things into perspective, when I sit and watch the metrics of our customers' file systems when they're running these ML training workloads, I'm seeing sometimes up to hundreds of gigabytes per second of throughput. So it's pretty tremendous I/O that you can get up to in some cases with very large compute clusters, and obviously you want to make sure you're not creating any bottlenecks.

The little graph on the right side here is basically demonstrating that as you add more GPUs, what you want to see is scalable storage performance. And that's something that we get with this file system offering.

We do this, by the way, similar to the way many of you might do this on-premises. We do this by co-locating our infrastructure in the same availability zones as our compute cluster and by building our storage and compute together. And so if you've been kind of thinking about like how we're going to make sure that the performance in the cloud actually looks like the performance we have on-premises, the reason or the way that we're actually able to deliver that is by building the infrastructure in a very similar way because we know that there's limits to physics and we have to keep these resources close together to achieve the latencies that are required for these applications.

And then the last thing I'll call out here on the file system side is that we do have these cost optimized options for running these workloads. So when you build a file system, you can actually choose the size of the servers that are underpinning the file system. Those are the servers that determine how quickly you can read and write data.

When customers come from on-premises, they basically have to choose a level of performance that they think will kind of match their needs. But one of the things that we launched actually recently as a cloud native capability is the ability to swap out those servers underneath.

So if your I/O needs change over time, if you don't need beefy servers to support that I/O, you can now click a button, run an API call, and actually reconfigure your file systems to better meet the needs of your workload - whether that's scaling up or scaling down.

And so again, I think the goal that we saw with Amazon FSx was to give customers a very simple and familiar file system with those cloud native integrations, cloud native features for machine learning.

And one of the things that file systems inherently provide, because of the random access, is that most of the time the end users, the ML researchers, don't really have to think so much about how their data is structured and how their data is stored. The low latency and quick random access I just mentioned obscure a lot of the data decisions that sometimes come up when you're working with storage and ML. So that's going to wrap up a quick primer on the file system migration side of things.

Obviously many of you are also data lake customers, so let's shift over to that side. It's worth noting that many of you, for years, have been taking your data and gluing it together into these S3 based data lakes. And nowadays, we actually have exabytes of data on S3, which again has incredible scalability and can be accessed with high levels of performance, throughput specifically.

And what we're seeing over the last few months is that because this data is already in the cloud, many of our customers have been able to leverage that data to move more quickly and with more agility in this current space of machine learning.

Now, before we get too deep into the data lake side, I also just want to call out that if any of the things that I talked about on the file system portion of this are interesting to you, if those resonate, if those capabilities or those benefits seem like things that would be helpful to you as a data lake customer, that's ok too.

We actually have the ability with FSx for Lustre to link a file system to your S3 bucket. And so everything that I just talked about can still apply even if you're storing your data in Amazon S3. And so just to give you a sense of how that works, what that looks like, you can take your Amazon S3 bucket, you can create a Lustre file system, and then we can kind of run a script or API call that will basically take all of the object metadata on S3 and port it over to the file system.

And then moving forward everything stays synchronized. And so now you've got a file system with complete POSIX compliance, all the user ID and group ID permissions with your data on S3. When you spin up a node and you mount the file system, just like any other file system, all of your data is accessible to your compute nodes.

If you run a file open command, we will go and retrieve the data from S3 and we'll park it on the file system. The first time might take a little bit longer because it's actually coming from S3, but afterwards it's cached and stored locally on the file system.
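As a rough sketch of how that S3 link can be set up programmatically (the file system ID, bucket name, and paths below are placeholders, and the exact parameters depend on your FSx for Lustre deployment type), you can create a data repository association with the AWS SDK:

```python
import boto3

fsx = boto3.client("fsx")

# Link a directory in an existing FSx for Lustre file system to an S3 bucket.
# Auto-import keeps the file system metadata in sync as objects change in S3;
# auto-export writes file system changes back to the bucket.
response = fsx.create_data_repository_association(
    FileSystemId="fs-0123456789abcdef0",         # placeholder file system ID
    FileSystemPath="/ml-data",                    # where the bucket appears in the file system
    DataRepositoryPath="s3://my-training-data",   # placeholder bucket
    BatchImportMetaDataOnCreate=True,             # import existing object metadata up front
    S3={
        "AutoImportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
        "AutoExportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
    },
)
print(response["Association"]["AssociationId"])
```

Once the association is in place, mounting the file system on your compute nodes works the same as any other Lustre mount, and the lazy loading and caching behavior described above applies.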

And so outside of machine learning, this has been incredibly helpful and popular with customers who have file-based workloads with data on S3. But in machine learning specifically, this has been incredibly helpful to enable collaboration and a lot of iterative research when hundreds of researchers or even sometimes small teams are collaborating on data cleansing, preparing it, and then eventually training because they have this familiar interface.

So again, it's an option, not necessarily what you have to be using, but it's something that you can use if you choose to. And maybe I'll spend a little bit of time walking through a couple of particular use cases that we've seen for customers who take this model.

One thing is when customers are actually training models. You see the bar on the left side here; this is using SageMaker with file mode, which takes the data, copies it to a compute cluster, and then trains a model. Now, if you look at that bar, you'll see two things:

  1. There's a section of the bar in orange; that's basically time spent moving data.
  2. The second is the time spent actually training the model.

If you compare this first bar with the second one, the thing you'll see right away is there's no more data movement, there's no data download. And that's because ML is inherently file-based. And so to start training the model, it has to be in a kind of a file format that's accessible to these training instances.

When you put Lustre in front of S3, you get a file format right away. And so you can just start reading data directly from S3 without moving the data in advance.

The second thing that's again very helpful is that the performance of the file system allows you to take the data and as you're running multiple epochs, you can actually get low latency access with every single run. So again, if you're not really thinking too much about how you're structuring your data and you've got smaller files, small I/O, that latency and throughput benefit of working with Lustre can really start to add up.

The third bar chart here is really showing what happens after you kind of get that data loaded onto the file system and you don't have to pull it from S3 anymore - it's cached. You can see there's an even bigger kind of additional benefit because now you're getting low latency with every single run that you're working with that data.

So that's a little bit on combining the two services together and the kind of better together story there. There's also benefits or ways that customers are using Amazon FSx with S3 for checkpointing.

And so here you can see if you're running your model, you get past the first 100 iterations, you generate your checkpoint. What customers are doing is they're taking that checkpoint, they're storing it on their file system, they're automatically having that get replicated back to S3, then they move on, do the same with the second checkpoint, and then they actually release the data from their file system.

So the colder data sets, the colder checkpoints that they no longer need, they remove from the file system because you can release data similar to a cache. And then they keep the latest checkpoint around again just in case there is that failure, they can go ahead and quickly load that data back.

So this is something that we see customers doing again - large checkpoints, a lot of throughput needed to support those.

But you don't really want all that data, which is sometimes more data for the checkpoints than for the data sets themselves, sitting around on very high-performance storage for long. So the ability to tier to S3 has been incredibly popular with our customers.
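As a sketch of that "tier and release" step (the IDs and paths are placeholders, and this assumes the checkpoint directory is linked to S3 so the data has already been exported), a data repository task can free file system capacity once a checkpoint is safely in S3:

```python
import boto3

fsx = boto3.client("fsx")

# Release file data that has already been exported to the linked S3 bucket,
# keeping the file metadata in place so it can be lazily reloaded if needed.
fsx.create_data_repository_task(
    FileSystemId="fs-0123456789abcdef0",           # placeholder file system ID
    Type="RELEASE_DATA_FROM_FILESYSTEM",
    Paths=["ml-data/checkpoints/step-000100"],     # older checkpoint to release
    Report={"Enabled": False},                     # skip the completion report
)
```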

Pete: How many of you all are trying to write checkpoints more frequently than once per hour? Is anyone here trying to checkpoint more than once per hour? How about every four hours? What about once a day? Okay. All right. Checkpoints are obviously super bursty, as Jordan talked about, with multiple terabytes coming in. So this is where the storage optimization choices that you make matter; this is what we're focused on right now and will be spending more time talking about.

Jordan: Yeah. So that's a little bit on the file side and the integration with S3. Pete, I'll let you take over and share what we've been doing on the S3 side.

Pete: Thanks, Jordan. So we have really two mantras, simplicity and performance, that we want to talk to you about today, along with some of the launches that we've done recently. But first I wanted to say thank you to all of you that are storing your data in S3. I know we are over 300 trillion objects stored in S3 today, so we just wanted to say thank you for putting your trust in S3. That's an incredible number and something that we're really proud of.

But one of the reasons why S3 is so popular as a storage destination for data lakes is that it's easy to put data in and easy to get data out. It's fast, it's scalable, with virtually unlimited scale and publicly accessible endpoints that you can stream data into and out of. We have multiple storage classes that you can choose to put your data in depending on your needs for performance, cost, and availability. Those are all reasons why customers like yourself have chosen S3 to put data in, to the tune of 340 trillion objects.

But from an ML perspective, there's the ability to have high throughput for large models and large training sets. And as Jordan talked about, checkpoints are really bursty workloads, and S3 is able to handle that well: large amounts of data can spike and flow into S3, and then over time, as it cools off, that data can be archived to more cost-optimized storage classes. So that's really one of the things that we've focused on for the data lake side.

But now a lot of you are choosing S3 data lakes as the source for your training jobs, which is fantastic. But as we walked around and surveyed many of you, there definitely was no one way that you were all training your data today: a lot of SageMaker, but also some PyTorch, some TensorFlow, we even had some Ray Data. So you have different choices, and for many of you, it's not a single choice. You're choosing multiple frameworks, you're choosing SageMaker as a fully managed service in addition to doing it yourself, because your data scientists are asking for that and you want that training ongoing and easily accessible.

So what we focus on on the S3 side is supporting both of these, living in both worlds. And what we'll be talking about for the rest of the session is what we've done this past week in terms of performance and simplicity, both for SageMaker customers and for the do-it-yourself crowd: I want to install my own framework, I want to have my own compute instances, whether it's Amazon EC2 or Amazon EKS or Amazon ECS, and I want to roll my own. Both are great choices, and what we wanted to do is support you with either choice that you make, or if your choice is both, give you a great experience in both.

And so what Jordan talked about with Lustre is very much true across the board for any storage that you want to load data from. With SageMaker, you're typically going to have two choices here. Out of the box, SageMaker is going to offer you a file mode and a fast file mode, and the choice is yours: do you want to copy the data out of S3 onto local instance storage and then train from there, or do you want to stream data out of S3 and skip that copy step? Those are your two choices. There's also a pipe mode, but we're really not seeing a lot of customers interested in that. File mode is the default choice for SageMaker, and that's where we see most customers today.

But as Jordan talked about, having to copy the data out of S3 in its entirety before the training job can start doesn't lend itself well to GPUs sitting idle waiting for data; those costs can add up. And customers have said, well, why can't I just stream it out of S3 without having to land the data first on local storage? That's where fast file mode comes in. Fast file mode was launched a couple of years ago by the SageMaker team and it's exclusive to S3, so you'll be streaming the data out of S3, again without having to choose local storage to copy the data to first.
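As a sketch of what choosing fast file mode looks like with the SageMaker Python SDK (the bucket, role ARN, training script, instance type, and framework version below are placeholders):

```python
from sagemaker.pytorch import PyTorch
from sagemaker.inputs import TrainingInput

# Placeholder role, script, and data locations.
estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_type="ml.p4d.24xlarge",
    instance_count=1,
    framework_version="2.0.1",
    py_version="py310",
)

# FastFile streams objects from S3 on demand instead of copying the whole
# dataset to local storage before training starts (the File mode default).
train_input = TrainingInput(
    s3_data="s3://my-training-data/train/",
    input_mode="FastFile",
)

estimator.fit({"train": train_input})
```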

Now, that works fantastic today with S3, but this week we've also announced a new storage class, Amazon S3 Express One Zone, which pairs really well with fast file mode because it's built for really one purpose: high performance, ultra low latency, single-digit millisecond levels of latency, very consistent, with high throughput and also high transactions per second. This was launched in Adam's keynote on Tuesday. Did everybody see that? Is everybody familiar with S3 Express One Zone? I know a lot of you were asking questions about that coming in. Thank you.

So we're going to spend some time talking about Express One Zone. It pairs really well with fast file mode, and it also works great with file mode to copy data out of, if that's your preference. But with fast file mode, now you can skip that copy step completely and just stream out of S3 Express at a much higher rate; images processed per second goes way up when you're using Express as the storage class for fast file mode data input.

So when we look at S3 Express One Zone, really, what is it? It is a new storage class, which is amazing. It is from the S3 team, and it uses the S3 APIs that you use today: GetObject, PutObject, ListObjects, DeleteObject, HeadObject; those primitives all work with S3 Express One Zone. But it's purpose built for performance, for ultra low latency, high transactions per second, and for data that's really your most frequently accessed data. There are a couple of pieces, though, that come with that, that enable that.

So the storage itself is a different type of storage than S3 Standard, and the advantage of that is about a 10x reduction in time to first byte latency. So your GetObjects, your PutObjects, those happen at a much faster pace on S3 Express than S3 Standard. It's also a more consistent level of low latency, so you typically don't see spikes or variations in first byte latency. It's pretty much like a metronome: very consistent, single-digit millisecond levels of first byte latency, which is great.

But as you drive more and more transactions, we also needed something that would be able to scale to handle those large numbers of transactions per second, and that's where the second piece comes in: the directory bucket. So for the first time in our 17-year history, you're getting a new bucket type from S3. When you go to create a bucket now in S3, you'll be asked the question: do you want a general purpose bucket, which we've had since the inception of S3 and which can support all of our other storage classes, or do you want a directory bucket? Jordan wants a directory bucket. He loves directory buckets.

Directory buckets are built for scale: the scale to hundreds of thousands of transactions per second. The data is laid out hierarchically in a directory bucket. They actually have directories, using a delimiter to tell us what is a directory versus what is an object key. Again, directory buckets use the same S3 APIs that you use today, GetObject and PutObject, but the data is organized hierarchically. The advantage of that is that it ends up mimicking a file system more closely, with that hierarchy of directories.

They're always private. The data is always encrypted, and they live in a different namespace than your S3 Standard buckets. They can never be made public; Block Public Access is always enabled. These are zonal buckets, essentially a one-zone storage class, as the name implies, built and designed to co-locate and focus on your most frequently accessed data. They also have a different endpoint and a different authorization model than S3 Standard.
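As a sketch of creating a directory bucket with the SDK (the bucket name and Availability Zone ID below are examples; directory bucket names embed the AZ ID and end in "--x-s3"):

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Create an S3 Express One Zone directory bucket in a specific Availability Zone.
s3.create_bucket(
    Bucket="my-training-data--use1-az5--x-s3",   # example name with AZ ID suffix
    CreateBucketConfiguration={
        "Location": {"Type": "AvailabilityZone", "Name": "use1-az5"},
        "Bucket": {"Type": "Directory", "DataRedundancy": "SingleAvailabilityZone"},
    },
)
```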

So when you look at our storage classes, you can see now where this fits in. It's really, again, focused on performance more than anything else, and that advantage comes from the architecture of S3 Express. So let's take a closer look at what that actually looks like.

So as the name implies, one zone means a single availability zone. Just like Jordan talked about with FSx for Lustre on the UltraClusters, the closer your data is to your compute instances, the lower the latency. Physics, right? That is just the laws of nature. This new bucket type can be placed into an availability zone of your choice. You'll actually tell us where you want to physically locate that storage, which availability zone; that might be us-east-1, Availability Zone 5.

Now, if that's where your compute instances are, you've got those co-located for really low levels of single-digit millisecond latency. Well, what if you've got an EKS cluster that spans availability zones 1, 2, 3, 4, and 5? You can still place a directory bucket in any availability zone that we have available for S3 Express, and you can go across availability zones in the same region. You'll pay a small penalty in latency, roughly 2 to 4 milliseconds, but there's no cross-AZ cost for transferring the data. As you all know, with S3, data transfers within a region from S3 to another AWS service are not charged for.

So you can go across availability zones with no cost for data transfer. There's a small amount of latency involved in going across availability zones into another AZ. But now you have the choice: you can co-locate if you need to, and if that's not possible, no problem, you'll still go across the availability zone for your data transfers with a small latency impact.

Now, when we look at the architecture, you'll also notice something off to the left, and that is something that we probably have not done enough evangelism on: the AWS SDKs. Now, with Express One Zone, you're still going to be using your same APIs.

"But the SDK have been retuned and optimized to support this new ultra low levels of latency. So there's a new API called CreateSession for S3 Express One Zone. CreateSession will allow you to essentially have a session token that we use that gets created after you authenticate. That allows us to essentially cache the credentials so that we do not have to go out to IAM on every single read and write request to the bucket. We cache those credentials for roughly five minutes and the SDKs will automate that refresh. You don't have to manage the session token. But the ability then to have that session based token allows us to then read and write out of a directory bucket at very high frequency without having to go out on every single object.

Now, the flip side of that is because machine learning and AI applications are typically doing the read and writing, they're looking at the entire bucket, they're reading and writing at high frequency, not caring about individual permissions per object or per directory. So what this means for a directory bucket is that you'll be setting the permissions at the entire bucket level. The bucket is either read/write or the bucket is read only - it's your choice. But essentially the permissions model is set at the bucket level.

The advantage of this, though, like we talked about, is the latency, but the transactions per second limits are also set at the bucket level. If you've ever received a 503 message out of S3 where you've exceeded the per-prefix limits of 3,500 writes per second and 5,500 reads, that requires you to think about how you want to organize your data in S3 Standard today. With S3 Express One Zone and directory buckets, you no longer have to think about per-prefix read and write limits. It's set at the bucket level, so you essentially have hundreds of thousands of TPS available per bucket, rather than having to worry about each individual prefix, what its limits are, breaching those limits, and having to throttle back or retry.

So that's one of the big differences about directory buckets: the permission boundary is set at the bucket level and the TPS limits are at the bucket level. Now, for the buckets themselves, even in the bucket policy, you'll actually use CreateSession as the action to authenticate and allow access to the bucket, again read only or read/write; read/write is the default. When it's read/write, obviously you can read and write data, delete data, list data, and head objects. But the CreateSession action will be the only action allowed in the directory bucket policy, so you'll no longer see a directory bucket policy that has PutObject, GetObject, DeleteObject. Those are all encompassed inside of the CreateSession API.
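As a sketch of what that looks like in a directory bucket policy (the account ID, role, and bucket name are placeholders; the action namespace for directory buckets is s3express):

```python
import json
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "my-training-data--use1-az5--x-s3"   # placeholder directory bucket

# Grant a training role access to the whole bucket via CreateSession;
# object-level actions (GetObject, PutObject, ...) are covered by the session.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/TrainingRole"},
            "Action": "s3express:CreateSession",
            "Resource": f"arn:aws:s3express:us-east-1:123456789012:bucket/{bucket}",
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```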

So this is one of the big differences for us and it's why the SDKs are so critical to use if you're thinking about S3 Express One Zone. The SDKs do that heavy lifting of managing the CreateSession API and the session token. And now they've been optimized with a technology called the Common Runtime, which puts our best practices into code. Your multipart transfers get optimized based upon the type of instance, and we also have the ability to look at and understand the optimum number of concurrent readers and writers to upload large objects in parallel.

So when we look at S3 Express One Zone, how do we get started with it if you already have your data in an S3 data lake today? Well, we've created a new data import capability. It's fully managed by S3, and it's a simple one click in the console: on every directory bucket, you'll simply tell us to import data. You'll tell us a prefix or a bucket that you already have in S3 today, and we'll then start copying the objects at high speed from your general purpose buckets into your new S3 directory bucket.

So you don't have to give us a manifest file any longer. You don't have to tell us individual objects you want us to copy. You just tell us a prefix in an existing bucket or the entire bucket that you want us to import into S3 Express. Really couldn't be easier. It really is a one click import.

We've also made optimizations outside of just SageMaker, because you all have told us that you want choice and you are using different frameworks. But simplicity and performance are still our two driving goals for what we wanted to launch this week. So what I'm going to talk about next is how we've optimized the code for S3 APIs, how we've increased per-client throughput, and how we've reduced the latencies for frequently read data.

So this past week while we're here, and prior to this week, we've launched three new features that we want to talk to you about today. For those customers that are using PyTorch, we have a new connector for PyTorch, an S3 connector for PyTorch. This is from the S3 team, and it's really designed to manage and take away the chores of listing all of your buckets and then managing concurrent requests out of those buckets.

So if you're using the standard default PyTorch data loader, the connector for PyTorch would replace that. And I'll show you what that looks like here in a second. But the idea here is that our best practices are now in code inside of this connector and it will require minimal code changes on your side to adopt. It's out there today on GitHub, you can get started with this. Now, I'll talk more about it in a second.

The other launch this week is MountPoint for Amazon S3. MountPoint allows you to perform file operations against an S3 bucket. So you can install the MountPoint client on an EC2 instance and then mount that S3 bucket to perform file operations as if you were talking to a file system.

So you've got the ability now to use MountPoint for S3 with S3 Standard buckets, but it works even better with S3 Express buckets, especially if you have a lot of small files that you're reading and writing, whether random or sequential. The low latency performance of Express and the high TPS capabilities of a directory bucket pair exceedingly well with MountPoint.

And then, beyond that, we've also added the ability to cache data locally with MountPoint, because as fast as we can stream data out of S3, it's even faster if the data is stored locally when you have to reread it. So I'll talk about how we've added local caching for MountPoint as well.
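To illustrate what "file operations against a bucket" means in practice, here is a minimal sketch assuming the bucket has already been mounted (the mount path below is hypothetical); any framework that expects files can read it with ordinary file I/O:

```python
import os

mount_path = "/mnt/training-data"   # hypothetical MountPoint mount location

# Standard file operations work against the mounted bucket; reads are served
# from S3 (or from the optional local cache on re-reads).
for name in sorted(os.listdir(mount_path)):
    full_path = os.path.join(mount_path, name)
    if os.path.isfile(full_path):
        with open(full_path, "rb") as f:
            sample = f.read()
            # ...hand `sample` to your data pipeline...
```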

The PyTorch connector, though, is really amazing. What we've seen with it is a 40% improvement in throughput and a 40% improvement in checkpoint save and loading times. The connector itself is super simple to set up and get going. It works with both iterable and map style datasets, your choice. And again, it takes away all of the chores of having to list out your buckets and then manage and tune all of the concurrent requests. We did the work for you so that you don't have to worry about tuning.

Now, for streaming, obviously, iterable-style datasets are that model, and map-style datasets are for random access; again, we support both, and also checkpointing, all with the connector. The connector is a simple pip install for PyTorch. This is something you can get started with today, essentially replacing the default data loader inside of PyTorch.
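As a sketch of what using the connector can look like (the bucket, prefix, and region are placeholders, and the exact class and argument names should be checked against the s3torchconnector package documentation):

```python
import torch
from torch.utils.data import DataLoader
from s3torchconnector import S3MapDataset, S3Checkpoint

def to_bytes(obj):
    # Each item is a reader over one S3 object; return its raw bytes.
    return obj.read()

def keep_as_list(batch):
    # Variable-length byte strings; let the training step do the decoding.
    return batch

# Map-style dataset over every object under the prefix (random access).
dataset = S3MapDataset.from_prefix(
    "s3://my-training-data/train/",
    region="us-east-1",
    transform=to_bytes,
)
loader = DataLoader(dataset, batch_size=32, num_workers=4, collate_fn=keep_as_list)

for batch in loader:
    pass  # decode the bytes and run your training step here

# Writing a checkpoint straight to S3 through the same connector.
model = torch.nn.Linear(128, 128)   # stand-in for your real model
checkpoint = S3Checkpoint(region="us-east-1")
with checkpoint.writer("s3://my-training-data/checkpoints/step-000100.pt") as writer:
    torch.save(model.state_dict(), writer)
```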

So we're really excited about this and we think you will be too once you have the chance to try out the performance. Now for MountPoint, you can see what mounting an S3 Express bucket, an Express One Zone directory bucket, looks like. The directory buckets themselves will actually have a different name appended at the end of them. You can see here on our bucket name: we've got the bucket name, but we've also got the availability zone ID and the region, along with a new alias that we reserve especially for directory buckets. So your directory buckets will have a different name because they live in a zone; they have a regional endpoint and a zonal endpoint.

But mounting them with MountPoint allows you to have essentially up to 30% faster training. So when would you use MountPoint versus the PyTorch connector? Well, if you're using PyTorch, use the connector - pretty simple. If you're using TensorFlow, Ray, Mosaic, an alternative framework, maybe you built yourself internally - MountPoint is there to give you file access to an S3 bucket. It will allow you then to read and write out of that bucket using standard file operations. And then optionally, you now have the ability to cache data locally on the instance that you install that MountPoint client on.

Now let's take a look at some of the performance results here. When we start out in our first epoch, we've got a bit of a difference here. Most customers ask me, well Pete, shouldn't those be more similar since it's the first epoch? Well, MountPoint for S3 will actually store some of the metadata locally as soon as you mount the bucket, so even right out of the gate you're getting a performance benefit. That performance benefit multiplies on subsequent epochs, right? We are reading the data locally that we've cached rather than having to go out to either S3 Standard or S3 Express; the local cache is there and available to store the most frequently read data, maybe images, maybe video files. When we don't have to go request them from S3 and can get them locally, we can take full advantage of the local NVMe SSDs that underpin the local instance storage.

So you've got choices here on how to mix and match. Our intent again with performance and simplicity as our goals is to cover whatever choice you make. If you want to do it yourself - great. If you want to use SageMaker - great. If you want to use PyTorch and want a specific connector that's pre-built and pre-optimized and supported by AWS specifically for PyTorch - we now have that as well.

So we've come to the end of our session here, and whether you're coming from on-premises with a lift and shift, where you've got HPC applications and an HPC environment, or you're starting out with S3 data lakes, the idea here is we've got choices for you that we think deliver the performance and simplicity that you want without having to spend time tuning.

Now, when we look again at our earlier slides, we wanted a balance between storage and AI and ML, and we are looking to provide that balance. You've seen the launches that we put out this week so that you can better design your storage for your AI and ML applications and also have your AI/ML applications tuned for AWS storage. And we think that combination is the right balance to take away from this.

So you've got choices now, and these are all available. Everything that Jordan and I talked about today is available now; this isn't a preview. So we want to say thank you for your time and thank you for choosing our session. If you want to learn more, there is a hands-on lab happening Thursday, yep, that's tomorrow: STG 312 in the Wynn at 2:30 PM. So go check that out if you'd like some hands-on experience with what we talked about today. Thank you very much. Thanks all.
