AWS re:Invent 2022 - What’s new in Amazon EC2 (CMP225)

Good afternoon everyone and welcome to CMP225. Sorry, it seems a little loud. I'm Art Baudel, a Principal Product Marketing Manager in the EC2 Core group. Here today with me is Martin Yip, who will also be presenting. Martin, why don't you introduce yourself?

Yeah hi everyone! So happy to be here today, so excited to be back in person, back on stage. I lead EC2 product marketing for compute as well as networking. I’m happy to be here and interact with everyone today.

Alright, so we're going to go back and forth a little bit throughout the deck to keep it interesting, and if we have a little bit of time I'll leave room for some questions at the end. Just a note: to my left, your right, there's a microphone, so please step up to it for questions so they can be recorded. If we run out of time to get through this, Martin and I will make ourselves available outside right after, since there is another session in the room afterwards.

So today our plan is to cover the last 12 months or so of launches and other news we've had with EC2, try to demystify EC2 a bit for everybody, and talk directionally about where we've gone and a little bit about how we have developed EC2.

I'd like to start with this little thing here, the light bulb. Our goal in EC2 is actually to make EC2 as simple as flipping a light switch, to make it easy for you, and hopefully by the end of the talk today I'll have demystified EC2 enough that we can all agree it is something that is pretty easy.

I want to start with a couple of stats to frame the global breadth of Amazon and EC2. To date, more than 30 billion instances have been launched in Amazon EC2 since we launched back in 2006. That is an incredible number of instances, and it speaks to the breadth of the network we have built to provide the reliability that's needed; I'll talk about how we have done that so far.

Everything we have done has been built upon what we call the cloud pillars, something we started with on day one. If you ever take a look at Jeff Barr's 2006 blog post, those two things were: first, we want to provide you, our customers, with the tools and services to work securely and reliably in the cloud; and second, to provide you with the best possible performance at the lowest cost. I'm going to talk about how Amazon has designed our systems to be able to do that.

We're going to finish today's talk with a little discussion about cost, because I think in today's macroeconomic climate everybody is interested in how to pay for all of this in a cost-efficient manner.

The next thing: for those who attended or heard Dave Brown - Dave Brown is the VP for EC2 - he says to me all the time that sometimes the small things make a big difference, and it is our focus on some of those small things that has led to so many of the innovations I'm going to talk about today. I'm going to talk a little bit about what those small things are that we paid attention to, which actually resulted in the creation of some of our own silicon, why we started down that path, and I hope to invite you on that journey as well today.

Today, roughly a hundred million EC2 instances are launched per day at Amazon. If you had come here in 2021, the number I would have had on the screen last year was 60 million instances, so this gives an idea of the rate of increase that has happened here at Amazon EC2. And I'm going to share with you today some of the stories about how we have disrupted and revolutionized the compute paradigm. One other thing: we actually even hold the patent for defining the cloud and the virtual CPU that we use today.

Those 100 million instances incidentally correspond to roughly 700 instances launched in Amazon EC2 per second around the world. And just another note: in 2022 customers launched five times as many instances as they did in 2018, which speaks to how much our customers' usage has grown.

To summarize a little bit about where we're going with the talk: we're going to talk about that global scale, I'm going to talk about the innovations of Nitro and the Nitro system itself, Martin's going to come up here and talk a little bit more about some of the compute instances and the new introductions we've had, and then I'll come back up and talk a little bit more about cost.

As a reminder, we celebrated our 16th anniversary this year, in August of 2022, and this represents the portfolio of regions we have had. In our first 10 years we introduced 11 regions; today we have 30 regions launched, including one just two weeks ago in Hyderabad. We have five other announced regions, and we have 96 Availability Zones. Amazon as a cloud provider has the most Availability Zones, and most of our regions have a minimum of three Availability Zones in them. Why do I talk about that? It speaks to our focus on providing reliable, available services all the time.

This is like having multiple data centers within those regions, but these regions are spread throughout the globe, and our customers vary in where they need that performance; in many cases you want the performance even closer. So we introduced Local Zones. In 2021 we announced that we would add an additional 30 Local Zones to the original 17, and we've already begun that process; today we have 25 Local Zones spanning the globe. What these Local Zones do is allow AWS compute to get even closer to you.

These Local Zones are spread through many major cities throughout the globe. Especially if you have applications like streaming video or gaming, Local Zones allow us to provide closer connections to the customers that are out there. And in addition to Local Zones, we also offer AWS Outposts. Outposts are our vision for being able to deliver compute to many customers even in an on-premises fashion.

Outposts are something we can deploy in 1U to 42U racks, which means we can provide essentially an entire data center on premises if needed, or we can place something within an IoT situation on premises as well. What this does is help reduce the cost it takes to work with the cloud. These Outposts are fully managed by AWS, and they allow customers such as yourselves the opportunity to use one set of APIs when managing the cloud and also when managing content placed in the on-premises environment.

The customers currently using Outposts are pretty wide and varied, across many different industries. I'll also mention that earlier we had Nasdaq on stage with Dave talking about how they have leveraged both Local Zones and Outposts on their premises in order to run their entire financial network in the AWS cloud.

So why AWS infrastructure? Our infrastructure comprises everything from routers, load balancers, custom servers and semiconductors to our own custom software and silicon. All of this is designed and purpose-built for the cloud, and designed and developed by AWS. It's all unique to us, and the question customers and other people ask me all the time is: why do all of this yourself, why not take off-the-shelf components? The answer is that we started to realize that by designing the servers ourselves we can improve the overall reliability. If a problem happens in the server, we can handle it and fix it very quickly without having to rely on a third party, and that small fix in reliability led us to realize that we could expand that approach to other spaces as well.

Years ago we introduced our load balancers, which are at the heart of being able to optimize performance and balance load between different servers within our data centers. We introduced our own custom software, our hypervisor, which I'll talk a little bit more about; that lightweight hypervisor does not consume the CPU resources, allowing us to provide higher performance in the cloud than competitors. Just this week we announced that we now have over 600 instances. Amazon Web Services offers more instances than any other cloud provider, but people sometimes tell me that 600 is a lot of instances and creates a lot of confusion. I would say that the instances are actually very tailored, and you can really narrow them down to the salient few that are important to your workloads.

So if we take each of these and start with the categories: choose the category - for example, let's say you have a general purpose workload, perhaps you're hosting web services - you look at the general purpose category, and then you look at the processor type and capabilities. Perhaps you need to do the work on x86, or maybe you're flexible and can use the Arm architecture. Now you've further narrowed what you need down to a handful of instances - AMD or Intel in the case of x86, or Arm, meaning the Graviton processors - and finally you can choose whether you're going to use a managed service, narrowing it down further.

So every time you choose based on your workloads, that list of 600 shrinks down to the ones that are most important for you, and these are the building blocks that we believe everyone can work from and how we have designed our portfolio.
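
To make that narrowing concrete, here is a rough sketch (my own illustration, not something shown in the talk) that uses boto3's describe_instance_types API to filter the catalog down to current-generation Arm (Graviton) instance types and print their vCPU and memory sizes; the region is just an example.

```python
import boto3

# Filter the EC2 instance-type catalog down to current-generation Arm instances.
ec2 = boto3.client("ec2", region_name="us-east-1")  # example region
paginator = ec2.get_paginator("describe_instance_types")

pages = paginator.paginate(
    Filters=[
        {"Name": "processor-info.supported-architecture", "Values": ["arm64"]},
        {"Name": "current-generation", "Values": ["true"]},
    ]
)

for page in pages:
    for it in page["InstanceTypes"]:
        vcpus = it["VCpuInfo"]["DefaultVCpus"]
        mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
        print(f'{it["InstanceType"]}: {vcpus} vCPUs, {mem_gib:.0f} GiB')
```

Swapping the architecture value to "x86_64", or adding filters for a particular family, is how you would narrow further to the AMD, Intel or accelerated categories described above.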

Back in 2006 we had one instance: the M1. The M1 instance came with one gigabit of networking, which at the time we thought was really impressive, but since then we realized that customers don't need just a one-size-fits-all instance, and that wasn't really practical for us. So we introduced general purpose, which are those M instance types; we then did compute optimized; we then began offering memory optimized; then we added accelerated computing, which includes our FPGAs, our GPUs and the machine learning platforms; and we also added storage optimized instances as well.

So we've tried to break that down, and a somewhat unknown fact is that you can actually still launch M1 instances to this day in our cloud, should you decide you want to deploy them, and we even have a class on how you can use all of our older instances. We have not deprecated a single instance since our launch in 2006, so customers can take a look at all of them.

So what is this innovation pace? I put this chart up - I didn't go all the way back to 2006, we start in 2010 - and you can see the various milestones and the pace accelerating. I want to call out a couple of things: one is our introduction of Inferentia 2, which Martin will talk a little more about and which is coming in 2023. I also want to call out two other items: Graviton, back in 2018, and the fact that we've introduced three generations of Graviton since that introduction. If you think about the pace of silicon innovation, that's pretty rapid. And also the innovation of the Nitro system, which is where the acceleration of our instances began.

What is that Nitro system? It's more than this pretty picture, but the picture gives a great idea of the architecture we've designed here. The Nitro system is unique to Amazon. What the Nitro system has allowed us to do is develop all these instances even faster and deploy them faster for you. The other thing about the Nitro system is that focusing on it has allowed us to deliver better performance to the customers that are here.

So the Nitro system provides an overall performance benefit to customers at a component level, and how do I show that performance? With this chart. On the far side of the slide from where I'm standing - your left - is standard benchmark information, just SPECint: basic compute information, where most cloud providers have roughly the same performance. However, if you look across to what I would characterize as real-world workloads, you can see the variation starts to be pretty significant - especially if you look at Redis, where we have a 27 percent advantage versus some cloud providers, or Memcached, where we have up to 22 percent.

So all in all, using Amazon Web Services we believe you get about 15 percent better performance simply by being here, and that's the equivalent of almost a full generation of CPU performance.

So why are we able to do that? Part of it is that Nitro system: I have offloaded a ton of functions that many other providers have to place on the CPU, which allows me to provide more performance. Just as a note about this data: it is based on our sixth generation instances, which launched in 2022, so this is a direct comparison on the Ice Lake processor on the x86 side.

So why build your own chips? These chips allow us to specialize: they allow me to provide specialized security to the platform, and they allow me to deliver things better and faster - as I said, since Graviton launched in 2018 we've delivered three generations of the processor. That is an incredible pace of innovation in the silicon space, and it lets us deliver other innovations as well; we'll talk about the Nitro SSDs and also about Graviton as we get there. And finally, this is the core of our security as well.

We believe at Amazon that we are not interested in touching or viewing any customer data; your data and your workloads, and their security, are our biggest priority. I cannot, and no operator at Amazon can, physically access any of your workloads or your data. This is core to how we have architected things. I'm going to talk briefly about security as we go forward, but the Nitro system, as I mentioned before, isn't just one thing: it's a combination of our cards, which handle VPC networking, EBS, storage and the controller, plus a security chip that comes into play as soon as we power on.

Security is very, very important. As we boot up our systems every day, or any time we do a reboot, we perform cryptographic attestation to verify the images. If there is ever a problem with the images in the system, we do not boot, and we rectify it. Keeping this away from customers such as yourselves allows us to make updates and improvements without a direct customer impact, including when we update software and the operating system and make those types of changes.

And finally, that lightweight hypervisor allows us to better schedule the instances.

So first I'm going to talk about networking and what we've delivered. As I said earlier, back in 2006 we delivered one gigabit per second of networking, something we were pretty proud of, and then we made a pretty large leap back in 2019 to 25 gigabits, and now 50 gigabits, in our networking performance. We introduced our network-enhanced instances, and earlier this week we announced that we now have network enhancements in our sixth generation of up to 200 gigabits per second.

I'll just call out the very large 1,600 gigabits per second on the far right side and answer the question of where the heck that is and who uses it. That is in our machine learning space, in the Trn1n area. For folks doing Trainium - if you're doing machine learning workloads - in the last five years or so models have moved from millions of parameters to billions, and you need this additional networking performance out there.

So we'll talk a little bit about Trn1n as we go through here, but that's what's on the outside. This week we introduced our EC2 sixth generation network-optimized instances. These offer, as I said, 200 gigabits per second of networking bandwidth, which means you can increase your data transfer from S3 by 2x, which is pretty cool, and these instances support up to 80 gigabits per second of EBS bandwidth.

Next we introduced storage. Coming from a silicon background, I'm not sure I always considered storage to be silicon, but it's in the same relative family, I would say. Why would we start focusing on it? Everyone usually talks about performance on the CPU side of the house, but the reality is that if you can't access the data you're working on, or the time it takes to access that data - the latency - is too high, it becomes a performance problem as well.

So we recognized this and introduced our Nitro SSDs back in 2021 at re:Invent. We started down this path with the Nitro SSDs, and what this has done is give us the ability to provide 60 percent lower I/O latency and up to a 70 percent reduction in latency variability, two key elements when accessing SSDs. Our SSDs are also encrypted with AES-256 encryption as well.

Earlier this year we introduced I4i, which is our first storage instance on the x86 platform using these SSDs. It uses Intel's Xeon Ice Lake processors, and we offer 30 percent better compute performance than the original I3 instances. We also have two other Graviton-based storage instances, the Im4gn and the Is4gen. The difference between the two, aside from the slight name change, is the vCPU-to-memory ratio: the Im4gn has a one-to-four vCPU-to-memory ratio, and the Is4gen has a one-to-six ratio, so you can choose depending on what you need.

But since those introductions earlier this year, we haven't stopped there. Earlier this week - actually yesterday - we introduced torn write protection. This is enabled by our Nitro SSDs for the I4i storage instances, and it's an innovation we've delivered for all of our customers to improve the overall performance and reliability of customer workloads. You can see up to 30 percent better data transfer performance using torn write protection.

I think there's no better way to talk about this than a brief customer example, and I'll mention that Splunk was actually here yesterday talking about how they have used and leveraged these. One key thing about Splunk is that they do a lot of data collection and have to analyze data and produce their reports really quickly, so I think it's a great example. Using the Graviton2-based Is4gen instances, they've experienced almost a 50 percent decrease in their search runtime, which results in better productivity for their customers.

I talked a little bit about networking, we did storage, and I'm going to finish here talking about security. Today Amazon is trusted by millions of customers around the world to protect their data and services, and security is key to that. A lot of people in this industry call this confidential computing, but there is no real dictionary or Webster's definition of what confidential computing is.

So I'm going to offer our take on confidential computing here. We see confidential computing as falling into two dimensions. The first dimension is protecting data from the cloud provider, which is pictured here - that's us. We want to protect your data from us.

The second dimension is protecting data within a customer: perhaps you have two entities that you need to isolate, and you want to protect personal data. As an example, we published a blog with The Trade Desk. They were using UID2 - you may never have heard of UID2, but I'm sure everybody in the room has accepted a cookie once or twice in their browser. UID 2.0 is a way to protect your personal information while still providing you a customized ad - sorry, everybody needs to get a customized ad to pay for things - but without having to exchange too much personal information.

So that's a great example of dimension number two, which we'll talk about here. We introduced Nitro Enclaves two years ago, in 2020, and I'll talk about that briefly. On the security side, earlier this year we introduced NitroTPM, which brings the Trusted Platform Module to EC2, so you can now use it to protect secrets, keys and other sensitive data on EC2.

So going back to the second dimension, protecting the other data that you have: Nitro Enclaves provide fully isolated, secure environments for this kind of sensitive work. We introduced Nitro Enclaves, as I mentioned, two years ago in order to solve problems customers came to us with, particularly about personal data that they wanted to protect. They are hardened, constrained environments that can only be accessed, as I said, through a local secure channel.

In the last two months we've introduced two new features for Nitro Enclaves. One, Nitro Enclaves are now available on Graviton, and Dave Brown announced earlier that we now also offer Nitro Enclaves in Kubernetes, so from your Kubernetes pod you can also launch an enclave, further enhancing the feature set here.
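
As a minimal sketch (placeholder AMI, instance type and region; not code from the talk), this is how enabling Nitro Enclaves at launch looks through boto3 - the EnclaveOptions flag marks the instance as enclave-capable, and the enclave itself is built and run later with the Nitro Enclaves tooling on the instance:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # example region
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="m6g.xlarge",          # illustrative enclave-capable Graviton instance
    MinCount=1,
    MaxCount=1,
    EnclaveOptions={"Enabled": True},   # allow a Nitro Enclave to be created on this instance
)
print(resp["Instances"][0]["InstanceId"])
```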

Next I'd like to have Martin come up; Martin's going to start talking a little bit about where we are going and some of the instances we've introduced on the x86 and Arm platforms.

Thanks, Art. Hello everyone again, excited to be here.

As Art mentioned, AWS offers the broadest and deepest choice for our customers. Why do we do this? It's important for customers to be able to tailor their infrastructure to their workload needs, and part of that is offering customers choice in terms of processors across Intel, AMD and AWS.

Let’s dive into Intel first. So Intel has been with AWS from the very beginning back in 2006. The m1 instance that Art talked about, they were with us powering that instance. And since then we’ve grown our partnership and now we have over 350 instances together spanning every single compute category in every single region.

Art mentioned earlier the network optimized instances that we launched here this week, but I really want to dive into some of the other instances that we launched earlier this year. The Amazon EC2 C6id, M6id and R6id instances are our sixth generation disk variants of the core instances; they're powered by third generation Intel Xeon Scalable processors, aka Ice Lake, and come equipped with NVMe-attached storage.

They come equipped with up to 7.6 terabytes of local NVMe storage and deliver up to 15% better price performance over comparable previous generation instances. It's not just about the storage though, it's also about faster processing and faster networking, right? With each generation we try to offer customers more in terms of performance and price performance.

So with these instances they offer 2x faster networking and 20% higher memory bandwidth than previous generations. They also come with support for the new Total Memory Encryption that encrypts the memory of the instance itself.

The main difference between C, M and R, as Art mentioned, is really the vCPU-to-memory ratio - whether you need compute optimized with a 1:2 ratio, general purpose with a 1:4 ratio, or memory optimized with a 1:8 ratio of vCPU to memory. These instances are ideal for your general core compute workloads, including things like enterprise workloads, databases, backend and frontend servers, as well as many other things, but specifically those that need additional access to high-speed, low latency storage.

And as you know, at AWS we don't stop innovating. We also announced earlier this week that we are introducing the fourth generation of Intel Xeon Scalable processors into our portfolio, and the first instance to have them is going to be the Amazon EC2 R7iz instance.

These instances are both high frequency and memory optimized. Again, they're powered by the fourth generation Intel Xeon Scalable processor, aka Sapphire Rapids. They'll have up to 128 vCPUs with one terabyte of memory, which results in 2.6x more vCPUs and compute versus previous generation instances and up to 20% higher performance. They're also going to be the first x86 instances to have DDR5 memory, so that means much faster memory overall and 2.4x higher memory bandwidth.

These are high frequency instances, so they're really designed for things like EDA workloads, databases with high per-core licensing costs, or gaming that requires really big, beefy instances. We look forward to delivering these in 2023.

Coming up next, let's dive a little bit into AMD. AMD has been a great partner. We were the first cloud provider to offer AMD in our portfolio back in 2018, and today we have over 100 instances with AMD. They span just about every category as well, and they offer better economics: you get the performance that you need, but at a 10% discount versus other comparable x86 instances.

Let’s dive into some of the instances that we launched this year with AMD. Earlier this year we launched the sixth generation compute optimized, general purpose and memory optimized C6a, M6a and R6a instances. These again are kind of the core compute instances that are the workhorses of your portfolio.

They're powered by third generation AMD EPYC processors, with up to 192 vCPUs, up to 1.5 terabytes of memory, 50 gigabits per second of networking and 40 gigabits per second of EBS bandwidth, depending on the instance.

They offer significant performance improvements over the previous generation - 35% better performance for the M6a and R6a. That's mostly because we're doing a two-generation jump from the first generation Naples processor to the Milan processor; with the compute optimized instances, jumping from Rome to Milan, you're getting a 15% price performance improvement.

They also offer a lot faster networking - 2.5x faster networking and up to 2x faster EBS bandwidth - so you're going to be able to transfer data and make transactions to and from EBS a lot faster. And as I said, they offer great performance, but you also get the 10% lower cost versus other x86 instances out there.

Now let's talk about AWS and the Graviton processor. One thing Art talked about earlier was Nitro. With Nitro we actually got a lot of experience building Arm hardware as well as building software for the Arm ecosystem, and we took that experience and applied it toward building our own Arm-based processor, the Graviton processor.

We launched our first Graviton processor in 2018, and customers loved it but wanted more, so we offered them the Graviton2 processor a year later. Today we have over 100 Graviton2-based instances across every single category, and these Graviton instances are great in the sense that they offer 40% better price performance over comparable x86-based instances.

So just for the effort of migrating over, you're getting a significant price performance improvement. But customers wanted more. They liked the power, they liked the cost benefits of Graviton2, but they wanted even more performance.

So last year at re:Invent we announced the Graviton3 processor, and earlier this year we delivered the first instance based on Graviton3, the C7g. Graviton3 processors offer 25% higher performance than Graviton2 processors, so they offer a lot more of the performance that you need.

They also offer 2x higher floating point performance for things like your compute intensive workloads. They were the first in the cloud to offer DDR5 memory, which, as I mentioned earlier, gives a really big boost in memory bandwidth - 50% more memory bandwidth. And perhaps most importantly, they're really sustainable in terms of being carbon friendly: both Graviton2 and Graviton3 use 60% less energy for the same performance than other comparable CPUs out there. That's really important to a lot of companies - being able to get that performance and cost, but also being good to the environment.

I mentioned the C7g earlier, but let's dive a little bit deeper. C7g is the first instance powered by Graviton3 processors. They offer the best price performance for compute intensive workloads in the EC2 portfolio, with up to 64 vCPUs and 128 gigabytes of memory, 30 gigabits per second of network performance and 25 gigabits per second of EBS performance.

As I mentioned before because they’re powered by Graviton 3 they get that 25% extra performance and 2x higher floating point performance. And really they’re great for your compute intensive workloads.

And then earlier this week we also announced a network optimized version of the C7g. These are great not just for compute intensive workloads, but for workloads that are both compute intensive and network intensive. They're built using the new version of the Nitro card, so you get up to 200 gigabits per second of network bandwidth and up to 50% higher packet processing performance - really great for data transfers to and from EBS as well as S3 and other data intensive activities.

And because they're built on Graviton, you get that same Graviton performance benefit that I mentioned earlier - really great for network intensive workloads, things like network virtual appliances, analytics, and even CPU-based machine learning.

So customers love Graviton. Today we have customers across all industries, from startups to enterprises, across all sectors. We have over 40,000 customers using Graviton - quite significant growth over the four years since Graviton was first introduced.

Let me dive a little bit into some of these customer stories. Epic Games - the makers of the Unreal Engine as well as games like Fortnite and Gears of War - use Graviton for their game engines and found that Graviton was very suitable for their demanding, latency sensitive workloads. While running massive multiplayer gaming, they're also able to get that significant price performance benefit.

Formula One, perhaps the world's most prestigious and well-known motor racing competition, uses Graviton for their computational fluid dynamics workloads to help them model how their race cars will behave in various wind and aerodynamics simulations, so that they can produce a better performing race car.

And then Honeycomb.io, the maker of an observability platform, has gone all in on Graviton. Liz Fong-Jones said that when she used Graviton2 and Graviton3, she thought it was sorcery how easy it was to migrate to Graviton and get all that performance and cost benefit.

And speaking of that, we've talked a lot about performance and cost, but really the third pillar of benefit is just that: the ease of migration. One of the things that we did from day one with Graviton was make sure that the partners were ready. So we worked a lot with partners, including all the Linux operating systems, the ISVs and the open source projects, to make sure that they were ready to run and support Graviton at launch. And we've been really successful with that as the community has grown; today all the major Linux operating systems, development tools and open source software support Graviton.

So we're really happy to see that adoption out there. And speaking of Graviton, it's not just about the Graviton instances, which you might be familiar with - it's also about managed services. Whether you want to do it yourself with Graviton-based instances or use AWS managed services like RDS, ElastiCache, Neptune, EMR and more, you can do that and get the same price performance benefit with Graviton.

Earlier this year, at Silicon Innovation Day, we announced the Graviton FastStart program, which makes it easier for you to migrate over to Graviton, whether it's EC2 instances or managed services, and get that great performance and price performance benefit.

We've seen customers using Lambda migrate over to Graviton and get that benefit in as little as four hours, so it's a really quick and easy way to start adopting Graviton.
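
As an illustrative sketch of that kind of migration (the function name, role ARN and zip file here are placeholders), creating a Lambda function that targets Graviton is largely a matter of selecting the arm64 architecture:

```python
import boto3

lam = boto3.client("lambda", region_name="us-east-1")  # example region
with open("handler.zip", "rb") as f:                    # placeholder deployment package
    code = f.read()

lam.create_function(
    FunctionName="my-arm64-function",                   # placeholder name
    Runtime="python3.9",
    Role="arn:aws:iam::123456789012:role/lambda-exec",  # placeholder role ARN
    Handler="handler.main",
    Code={"ZipFile": code},
    Architectures=["arm64"],                            # run the function on Graviton
)
```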

So that's a lot of choice for our customers, but there's actually one more partner that we have not spoken about yet, and that partner is Apple. Not too long ago we began working with Apple to enable Mac workloads on the AWS EC2 platform. We introduced the first Mac instances in 2020, two years ago, featuring macOS on Mac mini hardware. Customers loved the experience - the ease of use, the cost benefit of pay as you go, and all the tools of AWS - but they wanted more.

So earlier this year we announced the EC2 M1 Mac instances, powered by Apple silicon, the M1 chip, offered for the first time in AWS. The M1 chip integrates the CPU, GPU, Neural Engine, I/O and much more into a single chip. With the M1 Mac instances you get improved performance - actually 4x better performance for things like builds - compared to the x86-based Mac instances, and you still get all the benefits in terms of security, reliability and the ability to connect with and use other AWS services.

So you know customer feedback since we launched this has just been amazing and you know obviously we’re going to do more in the future.

So when we started off on this cloud journey, we talked a lot about breadth and depth and being able to support every single workload. But there was one workload that we thought was really difficult to support: high performance computing.

High performance computing was one of the workloads people thought could never be run in the cloud because it's so compute intensive. It requires not only compute but a lot of storage, a lot of memory, fast networks and things like that. And HPC is used to tackle some of the world's biggest challenges, from drug discovery to genomics to energy utilization and much more.

Customers have tried HPC on AWS and they've been very successful. Customers like Formula One, as mentioned earlier, AstraZeneca and Lawrence Livermore National Laboratory have partnered with AWS to solve some of the toughest challenges in the cloud.

We've worked for many years to make sure that HPC workloads run well on AWS. We've built services like AWS Batch to help you run hundreds of thousands of computing jobs in the cloud; AWS ParallelCluster, an open source cluster management tool that lets you automatically set up the required resources for HPC workloads; the Elastic Fabric Adapter, which allows you to scale your applications to thousands of CPUs and GPUs and run distributed applications; and Amazon FSx for Lustre, which provides fully managed shared storage with scalability and performance.
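
To make the EFA piece concrete, here is a hedged sketch (placeholder AMI, subnet, security group and placement group; a real cluster would more likely be stood up with AWS ParallelCluster) of requesting an EFA-enabled HPC instance with boto3:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-2")  # example region
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",                # placeholder AMI
    InstanceType="hpc6a.48xlarge",                  # EFA-capable HPC instance type
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",                     # request an Elastic Fabric Adapter
        "SubnetId": "subnet-0123456789abcdef0",     # placeholder subnet
        "Groups": ["sg-0123456789abcdef0"],         # placeholder security group
    }],
    Placement={"GroupName": "my-cluster-pg"},       # placeholder cluster placement group
)
```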

But customers really wanted and needed the EC2 instances themselves to be HPC optimized as well. They were telling us that you can't create a performant, cost-effective foundation for HPC without HPC optimized instances.

So over the years we have built many EC2 instances that are capable of running HPC workloads. Just earlier this year we announced Hpc6a instances, based on the third generation AMD EPYC processor; they offer 65% better price performance for HPC.

We have C7g and C6i and C6a, but customers want more. So I was happy that this week we announced Hpc6id. Hpc6id addresses another kind of HPC workload: whereas Hpc6a addresses compute intensive workloads, Hpc6id addresses HPC workloads that are more memory and data intensive.

They offer 600 gigabits of EFA networking, so you get much faster networking, and you get much better price performance - 2.2 times better price performance for data intensive HPC workloads. This is really targeted toward, as I said, that new kind of workload: memory and data intensive work like finite element analysis, which requires modeling the performance of complex structures like wind turbines, concrete buildings and industrial equipment.

HPC instances are really designed to deliver leading price performance for these data and memory intensive HPC applications. The Hpc6id instances are powered by third generation Intel Xeon Scalable processors and deliver the best price performance for this category of workload.

They're really ideal for those tightly coupled HPC applications, and they deliver the best per-vCPU compute and memory performance for those applications.

We also announced another HPC instance this week, the Hpc7g instance. These are the first HPC instances powered by a Graviton processor, and they actually include a new version of the Graviton processor called Graviton3E.

What these Graviton3E processors do is remove the performance limits of Graviton3 to unlock, among other things, 35% better vector processing performance, which is great for HPC applications that depend heavily on vector instruction processing.

Hpc7g instances are ideal for yet another type of HPC workload: we had compute intensive, we had data and memory intensive, and these are geared toward workloads that are both compute and network intensive. They come with the fifth generation Nitro cards we talked about earlier, which offer 200 gigabits per second of EFA networking for really fast transfers as well.

Just a couple of weeks ago, at the Supercomputing conference, SC, we were honored to win, for the fifth year in a row, the HPCwire award for best HPC cloud platform, which recognizes this platform for helping so many organizations run their HPC workloads.

So this marks the fifth year in a row that we’ve won this award and we’re really honored to have that.

Just to transition a little bit: I've talked about general purpose compute and I've talked about HPC, but now let's talk about another category of workloads that's also very compute intensive - machine learning.

As with processors, we offer the broadest choice of accelerators as well. We offer FPGAs from Xilinx, GPUs from Nvidia and AMD, and our own custom accelerators with Trainium and Inferentia.

As workloads become more demanding, the need for acceleration with these accelerators is increasing across the board. Machine learning is a powerful technology that's becoming ubiquitous everywhere. There are two main aspects of machine learning: one is training, which is all about building and creating the models, and then there's inference, where you use those models to generate predictions based on input.

Today, over 30,000 customers build, train and deploy their machine learning applications using Amazon EC2 infrastructure services. On the training side we have instances with powerful GPUs like the P3s and P4s, and we also partnered with Intel last year to offer their Habana Gaudi processors in the DL1 instance. On the inference side we have G4s and G5s, powered by Nvidia GPUs, that really help with inference.

But as ML became ubiquitous, ML models became large and complex, and they're growing exponentially in size. These large and complex models result in a rising cost of compute. Just think: three years ago, state of the art machine learning models were perhaps 100 million parameters in size. The models coming out today have grown to hundreds of billions of parameters. That's a multi-hundred-fold increase in size, which is tremendous but also daunting, because that's a lot to compute.

This growth in model size is expanding the time to train the models from days to weeks and sometimes even months because they’re so big and so complex now.

So AWS noticed this trend and started developing our own ML chips as well. We took our experience from developing the Graviton processor and thought, okay, why don't we apply it to the ML side.

So a couple of months ago we announced the Amazon EC2 Trn1 instances, which are powered by AWS Trainium processors. These offer the highest performance and lowest cost for training deep learning models in EC2. They're powered by 16 Trainium accelerators with up to 512 gigabytes of high bandwidth memory, along with 800 gigabits per second of networking throughput, so you can do your distributed machine learning training.

Trn1, as I said, is the first instance with 800 gigabits of networking performance, but it also natively supports machine learning frameworks like PyTorch and TensorFlow, so it becomes easier for customers who are already doing machine learning to adapt their models to Trainium.
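
For a sense of what that framework support looks like, here is a minimal training-loop sketch assuming the AWS Neuron SDK's PyTorch/XLA integration is installed on a Trn1 instance (package and device names follow the torch-xla convention; treat the details as illustrative, not as the exact Neuron workflow):

```python
import torch
import torch_xla.core.xla_model as xm  # XLA backend, provided on Trn1 via the Neuron SDK

device = xm.xla_device()                # resolves to a NeuronCore on a Trn1 instance
model = torch.nn.Linear(784, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(64, 784).to(device)
labels = torch.randint(0, 10, (64,)).to(device)

for _ in range(10):
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    loss.backward()
    xm.optimizer_step(optimizer)        # steps the optimizer and executes the XLA graph
```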

And we're not stopping there, as Art alluded to. We're going to come out with a network optimized version of the Trainium instance that offers even faster networking - 1,600 gigabits per second - which will speed up training times even further. It's incredible what developers are going to be able to do to train these ultra large models with greater efficiency and faster training times.

On the flip side, a couple of years ago we announced the EC2 Inf1 instances. Once you've trained your models, you then have to use them: provide input and ultimately make inferences, and those inferences require quite a bit of performance to generate predictions in real time. That's why we built the Inf1 instances.

We wanted to deliver low latency as well as low cost of inference for customers. Inf1 is great for small and medium-complexity models, but as I said before, these models are growing much, much bigger.

So that's why earlier this week we announced the Inferentia2 processor, which will be able to handle much bigger models - models with 100 billion plus parameters. So now you can do both training and inference on AWS silicon, with Trainium and Inferentia, to handle these really large, ultra complex models.

Inferentia2 will deliver up to 4x the throughput and up to 10x lower latency - just one tenth of the latency - of what Inferentia1 delivers today. So you're going to be able to make those real-time predictions and inferences really, really quickly.

They offer up to 12 Inferentia accelerators and up to 384 gigabytes of high bandwidth memory so that you can have really fast inference. And just like Inferentia1, they'll be integrated with PyTorch and TensorFlow, so you can have an easy time running your models on Inferentia.
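
As a hedged illustration of that PyTorch integration (API names as documented for the AWS Neuron SDK's torch_neuronx package; verify against the SDK version you install), compiling a model for the Inferentia accelerators looks roughly like this:

```python
import torch
import torch_neuronx  # AWS Neuron SDK package for Inf2/Trn1

# A toy model and example input, used only to drive ahead-of-time compilation.
model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU()).eval()
example = torch.randn(1, 128)

neuron_model = torch_neuronx.trace(model, example)  # compile for NeuronCores
torch.jit.save(neuron_model, "model_neuron.pt")     # reload later with torch.jit.load
print(neuron_model(example).shape)                  # run inference on the accelerator
```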

And just like Graviton, we really focus on sustainability here. One thing I'm happy to say is that these Inferentia2 chips will offer 45% better performance per watt than GPU-based instances. Again, we always look to offer performance and lower cost, but also to be as green as possible.

So we've seen customers of all sizes adopt Inferentia. Why don't we dive into some of these stories. Autodesk uses Inferentia for their Autodesk Virtual Agent, or AVA. This is basically a chatbot that answers over a hundred thousand customer questions per month by applying natural language understanding and deep learning techniques to extract the context, intent and meaning behind queries. With Inferentia they were able to get 4.9 times higher throughput than GPU-based instances - a tremendous improvement in the speed at which they're able to respond to the questions coming in.

Anthem created applications that automate the generation of actionable insights from customer data with deep learning natural language models. With Inferentia they were able to get 2x higher throughput than GPU-based instances - a tremendous amount of improvement using Inferentia.

And finally Airbnb: they were able to see a 2x improvement in throughput right out of the box by switching over to Inferentia from their GPU-based instances for BERT models using PyTorch - a tremendous speed gain and cost reduction by switching over to Inferentia.

With that I’ll switch over to Art and he’ll talk about cost optimization. Thanks Martin!

So we're going to wrap up today by talking a little bit about cost optimization. I'll probably have a couple of minutes for a few questions, but as we talked about all of this today, and Martin talked about all the instances we have, I think the final thing is that everything costs money to run. So how do we do this in a way that actually helps you save some dollars?

We offer a large number of instances. As we talked about before, we offer the x86 platform and we also offer Graviton. Within x86 we offer two different types - the Intel and AMD instances - and with AMD you get a 10% discount. Also, if you use instances in lower-cost geos you can save even further. So there are instant savings, and then there are further savings on top.

As Martin mentioned, with Graviton you can save about 40% or so using the Graviton-based instances if you can run your workloads on Arm. We also offer a series of purchase options: On-Demand, which is our base purchase model; Spot, where you can get a discount by using up our extra capacity; and Savings Plans as well.

And then we deliver tools to everybody to help you save and provide recommendations. Tools such as Compute Optimizer can give you a recommendation on where to use another instance without reducing any performance while also getting a better cost.

So what do those savings models look like? As I mentioned before, On-Demand is our baseline. Savings Plans require a commitment to us for you to get a discount, but with that you can get substantial savings. And finally, there are Spot Instances.

And since I really didn't want to end today without some final statistics, I just want to close with a handful of things. Since we introduced Savings Plans in 2019, customers have saved over 15 billion dollars by using them. That's a pretty large amount of money that can be saved if you sign up.

The other thing is that since we started Spot Instances, customers have actually saved 10 billion dollars in the last five years. At dinner last night I was talking to a Japanese customer of ours who discovered that between 1am and 5am in our Tokyo region there is plenty of Spot capacity, and his instances can be automatically scheduled to run between one and five, and they don't mind a little bit of disruption here or there.

So if you're in that category, there's an easy way to take advantage of Spot at odd hours or in alternative regions. It's a really great way to save a lot of money.
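
A rough sketch of what asking for that Spot capacity looks like with boto3 (AMI and instance type are placeholders, and the 1am-to-5am scheduling would live in whatever scheduler triggers this code):

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-northeast-1")  # e.g. the Tokyo region
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="c6g.2xlarge",         # illustrative instance type
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",                             # request Spot capacity
        "SpotOptions": {"SpotInstanceType": "one-time"},   # interruptible, run-once request
    },
)
```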

So Dave Brown likes to show this chart, which means I like to show this chart as well since I work for him. Here we show how we think your configuration should be laid out as a baseline. At the bottom, in pink, are the Savings Plans: for as much of your work as is steady state, you can put it in a Savings Plan to save money on it.

Then occasionally there is work that goes in the On-Demand bucket; it costs a little bit more, but it's something you couldn't have planned out months or years in advance. And finally, for workloads that can afford some interruption, you can use Spot Instances to save money.

A great example of that is running payroll for your company. Payroll is something that comes out, let's say, twice a month; it doesn't have to happen all on Thursday afternoon. If you pace it out, it can be interrupted over time as long as you achieve the goal of delivering pay to the employees every two weeks or whatever your cycle is. In the meantime you can save yourself a lot of money on that calculation.

So my final stat today before we close: since we introduced Compute Optimizer, we have offered 10 billion recommendations. This includes recommendations covering a little over 80% of EC2 usage. It's an easy way to save, and I highly encourage everybody to use it.

In a different presentation that I gave last year on cost savings, I brought up and showed how to use Compute Optimizer. But everyone has access to Compute Optimizer, and it is based on real data - your data. So it's not random data; the recommendations delivered to you are based on your own usage patterns.

So we do an optimization for you based on your EC2 instances. We're a huge data company - Martin just talked about all of this machine learning - and we use that ourselves to provide these recommendations to you, and that is the key to our success. We also offer AWS support resources as well when you use Compute Optimizer.
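
As a small sketch (assuming the account has already opted in to Compute Optimizer), pulling those EC2 rightsizing recommendations programmatically with boto3 looks something like this:

```python
import boto3

co = boto3.client("compute-optimizer", region_name="us-east-1")  # example region
resp = co.get_ec2_instance_recommendations()

for rec in resp["instanceRecommendations"]:
    current = rec["currentInstanceType"]
    finding = rec["finding"]                          # e.g. OVER_PROVISIONED or OPTIMIZED
    options = rec.get("recommendationOptions", [])
    top = options[0]["instanceType"] if options else "n/a"
    print(f"{rec['instanceArn']}: {current} ({finding}) -> {top}")
```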

So I just want to thank all of you for coming out here today, and thank everybody for coming to re:Invent. If you have any questions, I'm happy to take them here - you can step up to the mic over here on my left side, your right side of the room. Otherwise, I just want to say thanks! [Applause]
