What’s new with AWS file storage

Good afternoon. Hope everyone's week is going well so far.

Welcome to What's New with Amazon File Storage. I'm Ed Naim and I lead our file storage offerings here at AWS. And joining me today is Mark Roper, who is a principal engineer for our file storage services.

We're really excited to talk with you today about some of the innovations that we've delivered this year. We want to share some of the details about them. We want to share how customers are using some of the capabilities that we've delivered. And we also want to give you a little bit of a glimpse into how we think about where we invest and what are the types of capabilities that we are continually evolving and continually making better.

I'll start by just giving you a sense for how many things we've launched this year, how many capabilities. We've had more than 20 feature launches since this time last year at re:Invent. And this week we're launching another seven capabilities. So that's on average a delivery pretty much every other week. We're excited at the pace. And we see that pace continuing forward as well. So a lot of good, exciting things that we've been working on and we're continuing to work on.

Today, this talk is not just about what we delivered over the past year. We really want to more broadly talk about several topics:

  1. The evolving role of file storage in the world as data growth accelerates.

  2. The challenges that are emerging with this fast pace of data growth.

  3. The solutions we've built, and what we're investing in, to support a pretty broad spectrum of cloud workloads that depend on file storage.

  4. And finally, some details about the capabilities we've delivered recently - a dive into specifics that we hope will be pretty helpful.

So let's actually start by taking a step back and sharing how we think about solving your data challenges and how that permeates what we do as we deliver new capabilities to you.

At AWS, we know that more than ever before, innovation relies on data and the ability to leverage data effectively. That's why we commonly say that data is the basis of modern innovation. And because of data's role, organizations are reaching unprecedented levels of scale with their data sets, serving a growing set of workloads across everything from analytics to high performance computing to machine learning and generative AI. The volumes of data out there are growing at an accelerating rate.

According to IDC, 101 zettabytes of data were created and replicated around the world in 2022, and that data is expected to more than double by 2026.

Why is it growing so fast? Well, the growth is driven by a bunch of things. Some examples on this slide are an increasing base of video cameras and video analytics, the adoption of AI/ML, the use of simulated and virtual environments, the proliferation of IoT and other data generating devices. So a lot of these kinds of things happening in parallel really leading to this explosion in data growth.

And when we double click to really understand what kind of data is out there, we see that the majority of it is file data. Why is so much of it file data? Well, file data has a really rich history. It has been developed and matured over 50+ years, which means most applications speak file and operating systems are built around a file interface to data. So it has become the default, standard interface for application developers. It provides a really simple shared access model for collaborating across users and applications. And there's a really wide array of file systems optimized for high performance - a lot of shared file systems, for example, offer near-local latency.

So these are the reasons that file really has so much of a role to play in data today. And for these reasons, it's pretty ubiquitous and files are used across a really broad spectrum of workloads. I won't go through all of these on the slide, but you can get a feel for it: everything from enterprise IT applications to data science and analytics to web serving and content management. So a really, really broad set of workloads depend on file at their core.

So when we talk about this growth of data, it's kind of like, well, what's the problem? Well, growth and scale are really hard, especially at the scale that we're talking about for data volumes. Why is it hard? Well, it's really hard to predict future data growth. So traditional models of buying and provisioning hardware are challenging and they just really don't work anymore. It's becoming increasingly critical to optimize data storage costs as volumes grow. And then maintaining availability, resiliency, security - those things become much harder for organizations to do on their own as the volume of data grows, as the locations the data is accessed from increase, and as the applications themselves grow and the number of applications proliferates.

We really look at what we offer with our file storage services as a key part of helping customers solve these challenges of growth and these challenges of scale. Some examples of how customers leverage AWS and leverage our file storage services to address these issues - our services provide elasticity, they provide pay as you go pricing so you don't need to provision for peak, you don't need to predict where you're going to be 3 months, 6 months, a year from now. Our services provide real time visibility into costs. And we do a lot of automatic cost optimization - we place data optimally in different storage classes, we do a bunch of compression and deduplication type of things. So a lot of focus on automatic cost optimization.

And then in addition to providing all of AWS's security and resiliency benefits, we allow customers to manage their on-prem and their cloud environments within a single security perimeter leveraging services like site-to-site VPN and Direct Connect.

But it's not just about dealing with the scale and dealing with the growth. Businesses also want to do more with their data and with their file data. And there's this kind of really insatiable business desire to leverage data for insights and to build competitive advantage in ways that we've just never seen before. A lot of what we see day to day, a lot of the discussions that I have with customers are about organizations figuring out how to use all their data with innovative AWS services that let them do more with the data, get rich insights from the data.

So a big part of our focus as well within the file storage space is making sure that a lot of AWS's services work natively and seamlessly with your file data so that you can really get more from the data.

We talked a few minutes ago about the use cases and that really broad set of use cases. So how do we think about serving such a broad set of use cases since a lot of them have very different needs?

Well, really what it comes down to is we focus on serving two different types of individuals:

  1. The first are enterprise storage and application administrators. These are folks that have traditionally deployed file storage on-premises. So they've typically spent years developing administrative processes and workflows that rely on the APIs and capabilities of particular storage solutions from vendors like NetApp. They tend to be experts in storage and they want really granular control over their file storage. They want to really have the last level of control over how they manage it. And when moving to the cloud, what they commonly want is they want all the benefits of the cloud - the agility, the scalability, the economics, the ability to do more with their data - but they also don't want to give up the capabilities that they've become used to on-premises for managing their data. And they want to be able to have access to the same APIs and the same management features and functions so that they don't have to change all of their management processes.

  2. The other is builders and data scientists - that's kind of the other end of the spectrum. These are cloud native application builders and cloud native data scientists. They're not migrating data from on-prem NAS; they're developing and deploying new apps and new workloads in the cloud. What are they looking to do? At its most basic, they need to share storage across multiple compute resources. But they don't rely on specific NAS capabilities, specific performance profiles, or specific APIs - they really don't have any of that need. They just want really simple shared storage, and they don't have a storage background. What they ideally want is to create an endpoint for their data, mount that endpoint from any type of instance, any container or serverless function, and just go. The types of applications we're talking about are SaaS applications, maybe user shares for data scientists, and DevOps tooling workflows.

So those are kind of the two different sets of folks that we spend a lot of time obsessing about. And although it is a pretty broad set of workloads that we really try to serve with file storage, we do see them all fitting into three categories of use cases. And we spend a lot of time thinking about how do we really enable those use cases:

  1. User shares and enterprise apps - Think of that as kind of traditional NAS, shared storage that almost all medium and large enterprises have, commonly exposed over NFS or SMB, often integrated with enterprise identity systems like AD or Kerberos. It's things like file shares, enterprise applications, ERP, HR apps, analytics tools that have been developed over decades.

  2. Compute intensive workloads - Applications that rely on really high performance shared storage to support really demanding data processing. So think of clusters of dozens up to thousands of CPU or GPU instances, and the name of the game there is you don't want your storage to bottleneck your expensive GPU. The last thing you want is for your GPU or your CPU to be waiting for your storage. So you really want your storage I/O to keep up. It's things like genomics analysis, semiconductor chip design, high performance computing, a lot of machine learning and AI type applications.

  3. Cloud native applications - Really more of the builder persona that I was talking about where you have applications that are built for the cloud. They just need shared storage, they don't need NAS capabilities. They just want full elasticity, simplicity. Think of things like SaaS applications, dev ops, platform development, data science use cases.

So those are the three areas that we really think about as we think about building capabilities.

And how do we serve those? Well, we have the Amazon FSx family which is really focused on user shares and enterprise apps and compute intensive workloads. And we have Amazon EFS which is really focused on cloud native applications and builders.

So let's just talk about each of those quickly. For FSx, think of it as the file storage analog of RDS. It provides the most popular file systems - Windows File Server, NetApp ONTAP, OpenZFS, and Lustre - as fully managed, single-tenant deployments. It's really our default recommendation for storage and IT admins that are bringing NAS to AWS, because it has the like-for-like features of the file systems folks are running on-prem. We provide them in the cloud, fully managed and with all the benefits of the cloud.

And then with its high performance file systems - OpenZFS and Lustre - FSx is also our default solution for the compute heavy ML/AI/HPC compute intensive workloads.

And then we have EFS. EFS is a serverless file system that delivers an elastic, really simple experience for sharing data across compute instances. It's our default solution for those builders I was talking about who are looking for simple, shareable data in the cloud. We've designed that experience to align with what builders want: a serverless solution and full elasticity.

As we think about the areas that we invest in, we typically gravitate around these three areas:

  1. Performance
  2. Data management and resilience capabilities
  3. Cost optimization

So let's jump in and talk about how we've invested in those themes across these areas over the past year.

Let's start with user shares and enterprise apps, and within that, with performance. Performance matters, and frankly, we're never done raising our performance bar.

Arm is a name I'm sure a lot of you are familiar with. They're a leading technology provider of semiconductor IP, and their IP powers billions of devices out there.

One of the things we've heard from customers like Arm, who is an FSx for NetApp ONTAP customer, is that they love ONTAP's capabilities and they love FSx's capabilities, and they're using FSx for ONTAP for a pretty wide set of chip design workloads.

But until this past week, some of their most performance-intensive ONTAP workloads required much higher levels of performance than they could get from a single ONTAP file system.

In some cases, that meant customers couldn't use FSx for ONTAP. In other cases, folks like Arm would use multiple ONTAP file systems, essentially try to shard data across them, and have compute instances mounting multiple file systems - just a bunch of management and complexity overhead.

That's just not something that customers want to do.

And so we were really excited this week to announce that we're launching FSx for ONTAP file systems with much higher performance limits. We call these scale-out file systems.

With these file systems, you get a 9x increase in read throughput - before you could get 4 gigabytes per second of throughput, and now you can get 36 gigabytes per second - a 6x increase in write throughput, from 1 to 6 gigabytes per second, and a 7x increase in IOPS, from 160,000 IOPS to 1.2 million IOPS. And this is all within a single file system.

Let me talk a little bit about how it works. Essentially, it works by scaling out the file system cluster.

Before this week, you could think of each ONTAP file system as having two servers that make up the file system: one active node and one passive node. Data is replicated between those nodes for resiliency, and when the primary node fails, file system traffic automatically fails over to the secondary node.

So it's highly available - if a node goes away, data is still being served. But that's what we offered until this week: a single high-availability pair of file servers. And you can probably imagine the constraint there: it's two servers, and really it's one server delivering traffic at a given time, so there's a limit to how much a single server can serve.

With scale-out file systems, you launch deployments that have multiple HA pairs, so you have multiple pairs of servers serving traffic to your clients.

From the client's perspective, it all just works - clients aren't really aware that there are multiple servers. What's happening behind the scenes is that you're scaling out your cluster of servers, and with what we announced this week, you can scale to six HA pairs. That's how you get to the performance numbers we talked about earlier.

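To make that concrete, here is a rough boto3 sketch of what creating a scale-out file system could look like. The parameter names (HAPairs, ThroughputCapacityPerHAPair) reflect my reading of this launch, and the subnet ID, capacity, and throughput values are placeholders - check the current CreateFileSystem API reference before relying on them.

```python
# A minimal sketch (not an official sample) of creating an FSx for ONTAP
# scale-out file system with boto3. IDs and sizing values are illustrative.
import boto3

fsx = boto3.client("fsx", region_name="us-east-1")

response = fsx.create_file_system(
    FileSystemType="ONTAP",
    StorageCapacity=1048576,                  # total SSD capacity in GiB, spread across HA pairs
    SubnetIds=["subnet-0123456789abcdef0"],   # scale-out deployments are single-AZ
    OntapConfiguration={
        "DeploymentType": "SINGLE_AZ_2",        # deployment type used by scale-out file systems
        "HAPairs": 6,                           # up to six HA pairs per file system at launch
        "ThroughputCapacityPerHAPair": 6144,    # MB/s per HA pair (illustrative value)
    },
)
print(response["FileSystem"]["FileSystemId"])
```
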
Earlier this year, we also meaningfully increased performance on FSx for Windows File Server. We didn't do that by launching scale-out; instead, we're leveraging the most up-to-date AWS infrastructure under the hood to get significant performance benefits.

So for Windows File Server, we have a 6x increase in throughput, a 5x increase in SSD IOPS, and a 10x increase in the number of IOPS you can provision per gigabyte of data - pretty meaningful performance increases for Windows applications.

A big part of what enables FSx to serve such a broad spectrum of file workloads is the really rich set of data capabilities it provides. For example, NetApp has invested decades in building a really rich set of data management, data access, and data protection capabilities into its ONTAP file systems, along with a wide variety of storage efficiencies and support for hybrid architectures. And FSx for ONTAP is the only complete ONTAP running in the cloud as a fully managed service. That means you get all of these capabilities, and this is a big part of what draws people to FSx.

To give you a sense of how much these data capabilities matter: eHealth NSW was looking to migrate an enterprise image repository to the cloud, and to move it really quickly. That particular application relied pretty heavily on on-premises ONTAP storage, because ONTAP provided the full set of NetApp data management and data movement capabilities, including snapshots and SnapMirror.

FSx for ONTAP allowed them to move their data to the cloud very quickly and easily, and also to keep using the same kinds of management processes and storage optimizations they were used to on-premises.

And the results have been pretty great for them. They've seen a 10-fold improvement in performance and a 70% reduction in critical incidents. What they said they didn't really expect at the start is that they can now implement enhancements to applications up to 50% faster, because they can launch environments in under four hours to conduct testing - something that in the past would have taken 6 to 8 weeks. And they've already recognized $16 million worth of benefits from avoided costs.

For these reasons, we really do continue to focus a lot on how we expose even more data management capabilities in our FSx offerings, and a big part of our focus this year was on that with FSx for ONTAP.

In July, we introduced SnapLock on FSx for ONTAP. With SnapLock, FSx for ONTAP is the first and only fully managed cloud file service with write-once-read-many (WORM) data protection capabilities. It means that with SnapLock, your data is immutable and indelible - it's protected against modification or deletion of files. It really acts as a line of defense against ransomware, and it also allows companies to meet compliance requirements.

There are two different modes to SnapLock. The first is SnapLock Enterprise, which protects files against modification and deletion but allows authorized users to delete files if necessary. Then there's SnapLock Compliance, a more locked-down form of SnapLock where data cannot be modified or deleted by any user until its retention period expires. That helps address a variety of government and industry-specific mandates.

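If it helps to see it in code, here's a sketch of creating an ONTAP volume with SnapLock enabled through boto3. The SnaplockConfiguration structure is based on my understanding of the CreateVolume API at the time of this launch, and the SVM ID, size, and retention values are hypothetical placeholders.

```python
# A rough sketch of creating an FSx for ONTAP volume with SnapLock Compliance.
# Values below are illustrative; verify parameter names against the current API docs.
import boto3

fsx = boto3.client("fsx")

response = fsx.create_volume(
    VolumeType="ONTAP",
    Name="finance_worm_vol",
    OntapConfiguration={
        "StorageVirtualMachineId": "svm-0123456789abcdef0",
        "JunctionPath": "/finance_worm_vol",
        "SizeInMegabytes": 102400,            # ~100 GiB
        "SnaplockConfiguration": {
            "SnaplockType": "COMPLIANCE",      # or "ENTERPRISE" for the less restrictive mode
            "RetentionPeriod": {
                "DefaultRetention": {"Type": "YEARS", "Value": 7},
                "MinimumRetention": {"Type": "DAYS", "Value": 30},
                "MaximumRetention": {"Type": "YEARS", "Value": 10},
            },
        },
    },
)
print(response["Volume"]["VolumeId"])
```
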
Then let me touch on two of the ONTAP data management capabilities we announced this week.

The first is for shared VPCs: you can now create Multi-AZ file systems in shared VPCs. Before, you could only create Single-AZ file systems there; now you can do Multi-AZ. That's pretty important for a variety of larger enterprises that have multiple VPCs, where sharing across VPCs is a big part of how they operate.

We also launched the ability to manage FlexGroups from within the AWS console, CLI, and SDK, and to take backups of your FlexGroups. For those who aren't familiar with FlexGroups, there are actually two different types of volumes that ONTAP provides. FlexVols are the standard ONTAP volume for general-purpose workloads, while FlexGroups are purpose-built for higher performance and storage scalability - they essentially work by spreading data across multiple ONTAP aggregates (collections of disks). You could create FlexGroups before, but you had to use the NetApp CLI; now that's all exposed through the AWS console, CLI, and SDK, as sketched below.

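Here's roughly what creating a FlexGroup volume through the AWS API (rather than the NetApp CLI) could look like. VolumeStyle and AggregateConfiguration are the parameter names as I recall them from this launch; the SVM ID, aggregate names, and size are illustrative and should be checked against the current API reference.

```python
# A sketch of creating an ONTAP FlexGroup volume via the AWS API.
# Assumes a scale-out file system whose aggregates are named aggr1..aggr4.
import boto3

fsx = boto3.client("fsx")

response = fsx.create_volume(
    VolumeType="ONTAP",
    Name="ml_dataset_flexgroup",
    OntapConfiguration={
        "StorageVirtualMachineId": "svm-0123456789abcdef0",
        "JunctionPath": "/ml_dataset_flexgroup",
        "VolumeStyle": "FLEXGROUP",              # FLEXVOL is the default style
        "AggregateConfiguration": {
            "Aggregates": ["aggr1", "aggr2", "aggr3", "aggr4"],
        },
        "SizeInBytes": 10 * 1024**4,             # 10 TiB spread across the aggregates
    },
)
print(response["Volume"]["VolumeId"])
```
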
And with that, I'm going to turn it over to Mark.

Thanks, Ed. Hey, everyone. I'm Mark Roper, an engineer on AWS file services, and I'm going to talk about compute-intensive workloads.

To serve our most demanding file storage workloads on AWS, we offer two fully managed open-source file systems: FSx for Lustre and FSx for OpenZFS.

Lustre, if you're not familiar, is the world's most popular parallel file system. It's open source, and it's used in supercomputers at universities and national laboratories - where it was originally developed - for all sorts of compute-intensive workloads. FSx for Lustre supports up to a terabyte per second of throughput in a single file system.

FSx for OpenZFS offers the lowest latencies of any of our file storage solutions on AWS - as low as a few hundred microseconds per operation. It also offers a broad range of data management capabilities that are built into the ZFS file system and built upon by FSx.

We launched data management capabilities and cost optimization features in both of these file systems this year that I'd love to share with you today.

First, when we launched FSx for Lustre in 2018, we saw adoption across a broad range of the HPC workloads the file system was originally developed to serve on-premises, going back years. That wasn't a surprise to us - that's why we built it. These are workloads like oil and gas seismic processing, physics simulation, video and image processing, financial services, healthcare and life sciences, genomics, analytics - a whole long list of things that Lustre serves really well.

What pleasantly surprised us, in the era of generative AI that we're all living in now, is how much adoption there has been of FSx for Lustre by AWS customers working on generative AI. It's not a surprise when we think about it, because it's really all about performance.

Generative AI research and training workloads are a lot like traditional HPC workloads. They're compute intensive, typically using hundreds or thousands of CPU or GPU instances. They tend to read training data from shared storage, and they often write checkpointing data back to shared storage while the workloads are running. So the performance requirements of these workloads are quite high, and as Ed mentioned earlier, the compute clusters for these really performance-intensive workloads are very expensive, so it's critical that you have storage that can serve the needs of those large clusters. Lustre's consistent low latencies and very high aggregate throughput are a really great match for generative AI customers for this reason.

OK. So on AWS, for generative AI workloads, we have Amazon EC2 UltraClusters, which provide access to supercomputing-class performance for everyday ML and HPC developers with an easy pay-as-you-go model. UltraClusters deliver up to 30,000 Trainium accelerators, or up to 20,000 of the latest NVIDIA H100 GPUs, in your UltraCluster.

Amazon FSx for Lustre is really the storage infrastructure for these UltraClusters, delivering hundreds of gigabytes per second - up to a terabyte per second - of aggregate throughput and millions of IOPS. So UltraClusters give customers on-demand access to what is basically a supercomputer, along with FSx for Lustre for storage, to cut their model training times from weeks down to days and hours in some cases.

To train ML models with file-based workloads when your data is in an S3 data lake, we also integrate FSx for Lustre with S3. You can link your FSx for Lustre file system to your S3 data lake, and then data stored in S3 is automatically loaded into Lustre when your workload starts accessing it, where it's processed. You can also optionally configure your FSx for Lustre file system to push data back to S3 when your workloads are complete, which minimizes manual data management and copying and gives you the high-performance file interface your workload may require.

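As a concrete sketch, this is roughly how you link an FSx for Lustre file system to an S3 data lake with a data repository association using boto3, so objects are lazily loaded on first access and changed files are exported back. The file system ID, bucket, and paths are hypothetical placeholders.

```python
# A minimal sketch of creating a data repository association between an
# FSx for Lustre file system and an S3 bucket/prefix.
import boto3

fsx = boto3.client("fsx")

fsx.create_data_repository_association(
    FileSystemId="fs-0123456789abcdef0",
    FileSystemPath="/training-data",                  # directory inside the Lustre file system
    DataRepositoryPath="s3://my-data-lake/training/",  # linked bucket and prefix
    BatchImportMetaDataOnCreate=True,                  # load file metadata up front
    S3={
        "AutoImportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
        "AutoExportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
    },
)
```
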
Using FSx for Lustre with S3 this year, LG AI Research developed EXAONE, a 300-billion-parameter multimodal foundation model trained on images and text. When LG AI Research talks about the benefits of coming to the cloud and AWS file storage, they talk about not needing to manage their own hardware and infrastructure, not needing to provision capacity, and paying only for what they use - when they're done, they stop. They were able to reduce the cost of training EXAONE, estimated against their previous methods, by 35%. And that was with S3 as a data lake, FSx for Lustre acting as a high-performance file cache in front of that data lake, and Amazon SageMaker.

When performance and scale are this massive, managing cost is also really important. Something that customers with S3-linked FSx for Lustre file systems have been asking us for is more control over how they clear data out of their S3-connected file system when a workload is complete. That way, they can make space on the file system for more data and run subsequent workloads - which may be linked to different buckets or different prefixes in their bucket - without having to expand the size of their file system or manage what's on the file system directly.

So in August of this year, we launched a capability that lets customers call an API and release data that's no longer needed from their FSx for Lustre file system, freeing up storage capacity for subsequent workloads. When we do this, the metadata about the structure of your bucket remains in the file system, so if your workload does need to go back and re-access data that was previously loaded, all of that is transparently reloaded into your FSx for Lustre file system.

This new capability gives you direct control over the size and cost of your file system and helps make sure you never run out of space. Really, what it does is let you keep an FSx for Lustre file system acting as a cache in front of your bucket in a long-lived way, without needing to spin file systems up and tear them down.

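Here's a sketch of that release capability as I understand it: a data repository task that releases file data (but not metadata) from an S3-linked file system. The task type name and the ReleaseConfiguration shape reflect my reading of the launch; the file system ID and path are placeholders.

```python
# A sketch of releasing previously loaded data from an S3-linked FSx for Lustre
# file system to free capacity for the next workload.
import boto3

fsx = boto3.client("fsx")

task = fsx.create_data_repository_task(
    FileSystemId="fs-0123456789abcdef0",
    Type="RELEASE_DATA_FROM_FILESYSTEM",
    Paths=["training-data/epoch-1"],       # release only files under this path
    Report={"Enabled": False},
    ReleaseConfiguration={
        # Only release files that have already been exported to S3 and
        # haven't been accessed recently.
        "DurationSinceLastAccess": {"Unit": "DAYS", "Value": 1},
    },
)
print(task["DataRepositoryTask"]["TaskId"])
```
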
Also on the theme of managing cost and performance: at re:Invent this year, we're announcing FSx for Lustre on-demand throughput scaling. With on-demand throughput scaling, with a click of a button in the console - or by using the AWS CLI or SDK - you can make a call to increase or decrease the throughput capacity of an existing FSx for Lustre file system.

Until today, customers who wanted to reuse the same file system for multiple workloads really needed to provision the throughput capacity of that file system for their peak workload. Then, when running workloads that aren't as intensive and don't need the throughput, they were stuck either paying for that capacity or tearing the whole thing down and creating new infrastructure later.

With today's launch, you can use long-lived file systems and optimize your cost by reducing throughput capacity when you don't need it, and alternatively burst by increasing the performance of your file system when you're running a particularly compute-intensive workload. You can use this capability today on your FSx for Lustre file systems that are S3-linked - new and existing ones. It's an exciting new way to run your high-performance workloads without trading off performance, and to do it all at a lower cost.

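In practice, that scaling call might look something like the following boto3 sketch, which updates the per-unit storage throughput of an existing file system. The parameter name and values reflect my understanding of the launch; the file system ID is a placeholder and allowed values depend on the deployment type.

```python
# A sketch of on-demand throughput scaling for FSx for Lustre.
import boto3

fsx = boto3.client("fsx")

# Scale up before a heavy training run (MB/s per TiB of storage)...
fsx.update_file_system(
    FileSystemId="fs-0123456789abcdef0",
    LustreConfiguration={"PerUnitStorageThroughput": 1000},
)

# ...and scale back down once the job completes to reduce cost
# (after the previous update has finished).
fsx.update_file_system(
    FileSystemId="fs-0123456789abcdef0",
    LustreConfiguration={"PerUnitStorageThroughput": 250},
)
```
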
OK. So that's FSx for Lustre, our file storage optimized for the highest scale. But there are many compute-intensive workloads that run on AWS that are latency sensitive, or that perform lots of operations on tiny files. These workloads don't necessarily need the scale-out performance of a parallel file system; what they really want is the lowest possible latencies, which you achieve by having a single active file server node. That's really why we launched FSx for OpenZFS: to support these latency- and IOPS-intensive workloads on AWS. So let me talk a little bit about what we've launched recently on FSx for OpenZFS.

In August, we launched a new feature that enables you to create OpenZFS file systems deployed with two nodes: an active node in one Availability Zone and a standby passive node in another Availability Zone. By deploying your OpenZFS file system this way, you're able to achieve much higher resilience, so you can run applications like critical databases, line-of-business applications, and web server applications that require basically continuous availability of their file storage.

FSx for OpenZFS Multi-AZ file systems support four nines of availability - 99.99% of the time, the file system will be available on the data path - as well as enhanced durability. You can launch Multi-AZ FSx for OpenZFS file systems today in the console or with the AWS CLI.

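For reference, here's a rough sketch of creating a Multi-AZ FSx for OpenZFS file system with boto3. The MULTI_AZ_1 deployment type is the multi-AZ option as I understand it; subnet IDs, capacity, and throughput values are placeholders to adjust for your environment.

```python
# A sketch of creating a Multi-AZ FSx for OpenZFS file system.
import boto3

fsx = boto3.client("fsx")

response = fsx.create_file_system(
    FileSystemType="OPENZFS",
    StorageCapacity=2048,                              # GiB of SSD storage
    SubnetIds=["subnet-aaaa1111", "subnet-bbbb2222"],  # one subnet per Availability Zone
    OpenZFSConfiguration={
        "DeploymentType": "MULTI_AZ_1",
        "PreferredSubnetId": "subnet-aaaa1111",  # AZ that hosts the active file server
        "ThroughputCapacity": 320,               # MB/s
        "AutomaticBackupRetentionDays": 7,
    },
)
print(response["FileSystem"]["FileSystemId"])
```
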
Companies like Hatch are using FSx for OpenZFS because they love the performance. Hatch is a cloud-native gaming company, and they're using OpenZFS as part of their global hosting platform, as a repository for the data they're hosting there. They love the performance of ZFS, but they do have one challenge.

Hatch's use case requires replicating data across multiple file systems that may be in different Regions or different Availability Zones. They do that today using open-source copy tools, but it's a challenge because they need to manage their own infrastructure to do it. It's not something they're interested in doing - it's not core to their business - but it's something they need to do for this application.

So for use cases like Hatch's, where data needs to be repeatedly replicated, we're really excited to announce this week at re:Invent that we're supporting on-demand data replication for FSx for OpenZFS. On-demand data replication enables you to easily and efficiently transfer incremental point-in-time snapshots from volumes on one file system to volumes on another file system. With on-demand data replication, you leverage the speed and efficiency of ZFS send-and-receive technology, and with just a few clicks, you can create a resilient copy of your data across multiple file systems.

This is really useful for read-replica workloads; for building custom data replication workflows where you want to run active-active workloads in multiple locations that act on the same shared data and keep it updated over time; and it's also a great way to do disaster recovery. On-demand data replication is also faster and more efficient than using copy tools, because it uses incremental snapshots and only replicates the data that has changed between replications - and it does this with compression. We have customers in financial services and in media and entertainment that use ZFS send-and-receive replication on-premises, and they're now able to run those workloads on AWS using FSx for OpenZFS.

OK, so how does it work? FSx for OpenZFS on-demand replication uses ZFS snapshots. You take a ZFS snapshot of your dataset, and then FSx manages securely sending that snapshot over the network from one file system to another, automatically retrying if there are any interruptions and making sure the transfer is done in a secure way. The snapshot is then received on the other end and applied to the destination. It's the incremental replication over time where the value really comes in: you can have very large datasets being incrementally replicated over time, and you're only replicating the churn that has occurred on the source since the last replication.

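A minimal sketch of that flow with boto3, under my reading of the launch: snapshot the source volume, then copy that snapshot to a volume on the destination file system with an incremental strategy. The API name (CopySnapshotAndUpdateVolume), volume IDs, and option values are assumptions to verify against the current API reference.

```python
# A sketch of on-demand data replication between two FSx for OpenZFS volumes.
import boto3

fsx = boto3.client("fsx")

# 1. Snapshot the source volume.
snap = fsx.create_snapshot(
    Name="nightly-replica-point",
    VolumeId="fsvol-source0123456789",
)
snapshot_arn = snap["Snapshot"]["ResourceARN"]
# (In practice, wait for the snapshot to reach AVAILABLE before copying.)

# 2. Send that snapshot to a volume on the destination file system,
#    replicating only the changes since the last copy.
fsx.copy_snapshot_and_update_volume(
    VolumeId="fsvol-destination98765432",   # destination volume
    SourceSnapshotARN=snapshot_arn,
    CopyStrategy="INCREMENTAL_COPY",
)
```
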
OK. So that was compute-intensive workloads; on to cloud-native applications. As Ed mentioned at the start, we serve cloud-native applications with Amazon Elastic File System, or EFS, which launched in 2015. EFS is the default choice for application builders who need shared file storage that's really simple - where they don't need to manage the storage themselves, and they don't want or need the hands-on capabilities of a storage administrator. What they really want is to never run out of space and to always have the performance they need when their application demands it.

First, on EFS, we've been continuously investing in improving performance over time. Just to give you a sense of the timeline: at re:Invent 2022, we reduced file operation latencies by 2x, so customers running latency-sensitive or serial workloads just started running faster. In April of this year, we increased single file system throughput 3x, to 10 gigabytes per second of read throughput and 3 gigabytes per second of write throughput on a single file system, enabling a whole new set of streaming workloads that otherwise weren't able to use EFS. At Storage Day this August, we announced a more than 3x increase in single file system IOPS. And for re:Invent this week, we're announcing more performance improvements: starting this week, Amazon EFS delivers up to 250,000 read IOPS for frequently accessed data - a more than 4x increase over the prior limit.

With these enhancements, you can continue to benefit from the simplicity and elasticity of EFS that Ed mentioned while running even more IOPS-demanding workloads on Amazon EFS, including workloads like Spark or Hadoop. These new limits are available today on every EFS file system.

OK. Along with the investments in performance, we're also investing in how you manage your costs on EFS. When you use EFS today, you're basically getting three benefits. First, you're reducing your operational costs relative to being on-premises. As Ed mentioned at the beginning, if you're trying to figure out what your workloads are going to need for the next year or three years as you provision hardware, it's a challenge - you have to make sure you get enough, and if you're wrong, it's costly to pivot: you need to provision more hardware and potentially find new space. None of that applies with EFS - you're just mounting a file system, putting as much data into it as you'd like, paying for what you use, and never running out of space.

Second, EFS's bottomless storage and elastic throughput mean you don't have to worry about over-provisioning. There's never a problem of paying for more than you're using - you only pay for what you use.

And last, and maybe most interestingly, EFS automatically moves your colder data to lower-cost storage for you. I want to talk a little bit more about that.

EFS today has two storage classes: a high-performance SSD Standard storage class, and a lower-cost Infrequent Access (IA) storage class for cold data. EFS also has lifecycle management, an existing capability that moves any files not accessed for a configurable amount of time from the Standard storage class to the IA storage class. Data remains in the EFS file system when it's moved between storage classes, and if a workload accesses cold data, that data is still available to the application and can be moved back to the Standard storage class automatically. By having both of these capabilities - the IA storage class along with Standard, plus lifecycle management - EFS eliminates the need for you to manually manage where your data is stored in order to optimize your costs.

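As a concrete example, configuring lifecycle management is a single API call. Here's a minimal boto3 sketch that moves files to IA after 30 days without access and back to Standard on first access; the file system ID and the 30-day value are just illustrative.

```python
# A minimal sketch of EFS lifecycle management: Standard <-> Infrequent Access.
import boto3

efs = boto3.client("efs")

efs.put_lifecycle_configuration(
    FileSystemId="fs-0123456789abcdef0",
    LifecyclePolicies=[
        {"TransitionToIA": "AFTER_30_DAYS"},                 # tier cold files to IA
        {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},  # bring files back on access
    ],
)
```
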
Just to give you a sense of the value of this: SAP uses EFS to provide shared storage between nodes for the SAP RISE deployments they manage on behalf of their customers. In 2023, they were looking to do cost optimization in a serious way, and by taking advantage of the cost optimization features of lifecycle management and the Infrequent Access storage tier in EFS, they were able to reduce their overall bill by more than 40% while maintaining the same performance for their workloads - without making any changes on the application side at all. They used lifecycle management to tier their data into the IA tier. They also moved from provisioned throughput to EFS Elastic Throughput, an existing capability that lets you just use the throughput you need and pay only for the throughput you use, and that reduced that component of the bill by 45% as well.

OK. So EFS is getting even more affordable. This year at re:Invent, we're announcing a new storage class for Amazon EFS called Amazon EFS Archive. It's a storage class that's cost-optimized for your coldest data - data that you're really going to access only a few times a year or less. This is a great feature for customers that want to retain data that's quite cold, maybe for future use for AI or ML - who knows - or for retention requirements where you're simply required to retain data for a certain number of years and you need to do so in an affordable way.

EFS Archive storage costs less than one cent per gigabyte per month - an incredible price - and it supports the same lifecycle management experience I explained previously. So you can configure how many days your cold data needs to stay in the IA tier before it's automatically moved down to Amazon EFS Archive. And, as with IA and Standard, the applications mounting your EFS file system don't know about any of this: all of the data remains available to the application, and should the data be needed, it's accessible in real time.

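The Archive tier plugs into the same lifecycle configuration shown earlier; as I understand the updated API, you add a TransitionToArchive policy alongside TransitionToIA. The 90-day value below is just an example.

```python
# Extending the earlier lifecycle sketch: Standard -> IA after 30 days without
# access, then down to the new EFS Archive class after 90 days.
import boto3

efs = boto3.client("efs")

efs.put_lifecycle_configuration(
    FileSystemId="fs-0123456789abcdef0",
    LifecyclePolicies=[
        {"TransitionToIA": "AFTER_30_DAYS"},
        {"TransitionToArchive": "AFTER_90_DAYS"},
        {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},
    ],
)
```
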
So to reiterate: with the EFS Archive storage class, you can save up to 50% on your bill for your coldest data, while preserving that valuable data for future use and complying with retention requirements in a cost-effective way.

Those are the new announcements we have for EFS this year. With that, I'll hand it back over to Ed.

Thanks, Mark.

So just in closing: we've talked about a few of the capabilities we launched this year, and we've talked about how we think about where we invest and, thematically, how we invest.

What I want to close on is my favorite slide, which really is kind of a thank you. It's an honor to serve so many customers. Our investments really do come from what we hear from you, and we've shown a few examples of companies who are using our file services to do pretty innovative, pretty amazing things. This is really, truly what motivates our engineering teams and our product teams, and so I just wanted to close on that note with a thanks.

OK. Well, thanks, everyone. Take care.
