Good afternoon, everyone, and welcome to CMP 225. Sorry, it seems a little loud. I'm Art Bowl, a Principal Product Marketing Manager in the EC2 Core group, and I'm here today with Martin Yip, who will also be presenting with me. Martin, why don't you introduce yourself?
Yeah. Hi, everyone. Happy to be here today, and excited to be back in person, back on stage. I lead EC2 product marketing for compute as well as networking, and I'm happy to be here and interact with everyone today.
Alright. We're going to go back and forth a little bit throughout the deck to keep it interesting, and if we have a little time, I'll leave room for questions at the end. Just a note: to my left, your right, there's a microphone. If you have a question, please step up to the microphone so it's captured in the recording. And if we don't have enough time, Martin and I will make ourselves available right outside afterwards, since there's another session in this room after ours.
Today, our plan is to cover the last 12 months or so of launches and other news in EC2, to demystify EC2 a bit, and to talk about where we've gone directionally and a little about how we have developed EC2.
I'd like to start with this little thing here: the light bulb. Our goal is to make EC2 as simple as flipping a light switch. And hopefully by the end of today's talk, I'll have demystified EC2 enough that we can all agree it is something that is pretty easy.
I want to start with a couple of stats about the global breadth of Amazon EC2. To date, more than 30 billion instances have been launched in Amazon EC2 since we launched back in 2006. That is an incredible number of instances, and it speaks to the breadth of the network we've had to build and the reliability that's needed; I'll talk about how we have done that. Everything we have done has been built upon what we call the cloud pillars, which we started with on day one. If you've ever taken a look at Jeff Barr's 2006 blog post, those two things were: first, we want to provide you, our customers, with the tools and services to work securely and reliably in the cloud; and second, we want to provide you with the best possible performance at the lowest cost. I'm going to talk about how Amazon has designed our systems to do that. We'll finish today's talk with a discussion about cost, because in today's macroeconomic climate everybody is interested in how to pay for all of this in a cost-efficient manner.
Next, for those who attended or heard Dave Brown (Dave is the VP for EC2): he says to me all the time that sometimes the small things make a big difference, and it's our focus on some of those small things that has led to so many of the innovations I'm going to talk about today. I'll talk a little about what those small things are that we paid attention to, which actually resulted in the creation of some of our own silicon, and why we started down that path. I hope to invite you on that journey as well today.
Today, roughly 100 million EC2 instances are launched per day at Amazon. If you had come here in 2021, the number I would have had on the screen was 60 million instances. That gives you an idea of the rate of increase here at Amazon EC2, and I'm going to share some of the stories about how we have disrupted and revolutionized the compute paradigm. One other thing: we actually hold the patent defining the virtual CPU that we use today. Those 100 million instances, incidentally, correspond to roughly 700 instances launched in Amazon EC2 per second around the world. And one more note: in 2022, customers launched five times as many instances as they did in 2018, which says a lot about how much our customers are using.
To summarize where we're going with the talk: we'll cover that global scale, I'll talk about the innovations of Nitro and the Nitro System itself, Martin will come up and talk about some of the compute instances and the new introductions we've had, and then I'll come back up and talk more about cost.
As a reminder, we celebrated our 16th anniversary this year, in August of 2022, and this slide represents our portfolio of Regions. In the first 10 years, we introduced 11 Regions. Today, we have 30 launched Regions, including one just two weeks ago in Hyderabad, five more announced Regions, and 96 Availability Zones. AWS has the most Availability Zones of any cloud provider, and most of our Regions have a minimum of three Availability Zones. Why do I call that out? It speaks to our focus on providing reliable, highly available services all the time. It's like having multiple data centers within each Region. These Regions are spread throughout the globe, but our customers vary in where they need that performance, and in many cases you want performance even closer.
So we introduced Local Zones. We announced in 2021 that we would add an additional 30 Local Zones, and we've already begun that process on top of the original 17; today we have 25 Local Zones spanning the globe. What Local Zones do is bring AWS compute even closer. They are spread through many major cities around the world, and if you have applications like video streaming or gaming, Local Zones allow us to provide closer connections to your customers.
In addition to Local Zones, we also offer AWS Outposts. Outposts are our vision for delivering compute to customers on premises. We can deploy Outposts in form factors from 1U and 2U servers up to 42U racks, which means we can provide essentially an entire data center on premises if needed, or embed compute in an IoT situation on premises as well. This helps reduce the cost of working with the cloud. Outposts are fully managed by AWS, and they give customers like you the opportunity to use one set of APIs for managing both the cloud and content placed in the on-premises environment.
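To make that single-API point concrete, here's a minimal sketch, assuming boto3 and hypothetical resource IDs: launching an instance onto an Outpost uses the exact same run_instances call as launching in a Region, just targeting a subnet that lives on the Outpost.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Launching onto an Outpost is the same API as launching in-Region;
# the subnet below is assumed to be one created on the Outpost.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",        # hypothetical AMI ID
    InstanceType="m5.xlarge",
    SubnetId="subnet-0123456789abcdef0",    # hypothetical Outpost subnet
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```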
The customers currently using Outposts are wide and varied, spanning many different industries. I'll also mention that earlier, we had Nasdaq on stage with Dave talking about how they have leveraged both Local Zones and Outposts in order to run their entire financial network on the AWS cloud.
So why AWS infrastructure? Our infrastructure comprises everything from routers, load balancers, custom servers, and semiconductors to our own custom software and silicon. All of it is designed and purpose-built for the cloud, designed and developed by AWS; it's all unique to us. The question customers and others ask me all the time is: why do all of this yourself? Why not take off-the-shelf components? The answer is that we realized that by designing the servers ourselves, we could improve overall reliability. If a problem happens in a server, we can handle it and fix it very quickly without having to rely on a third party. That was one small fix in reliability that made us realize we could expand the approach to other spaces as well.
Years ago, we introduced our own load balancers, which are at the heart of optimizing performance and balancing load between different servers within our data centers. We also introduced our own custom software: our hypervisor, which I'll talk more about. That lightweight hypervisor does not consume the host's CPU resources, allowing us to provide higher performance in the cloud than competitors.
Just this week, we announced that we now have over 600 instance types; Amazon Web Services offers more instance types than any other cloud provider. People sometimes tell me that 600 is a lot of instances and that it creates a lot of confusion. But the instances are actually very tailored, and you can quickly narrow them down to the salient few that are important to your workloads.
Take each of these categories and choose. For example, say you have a general purpose workload, perhaps hosting web services: you look at the general purpose instances, then at the processor type in the capabilities category. Perhaps you need to do the work on x86, or maybe you're flexible and can use the Arm architecture. Now you've narrowed it down to a handful of instances: AMD or Intel in the case of x86, or Arm, meaning the Graviton processors. And finally, if you're going to use a managed service, that narrows it further. Every time you make a choice, that list of 600 shrinks down to the ones most important for you (the sketch below shows one programmatic way to do this). These are the real building blocks that we believe everyone can work from, and they're how we have designed our lineup.
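To illustrate that narrowing programmatically, here's a minimal sketch using boto3's DescribeInstanceTypes API; the filter names are the documented ones, but the selection criteria (current-generation arm64 types) are just an example:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Narrow the ~600 instance types down to current-generation arm64 (Graviton) types.
paginator = ec2.get_paginator("describe_instance_types")
matches = []
for page in paginator.paginate(
    Filters=[
        {"Name": "processor-info.supported-architecture", "Values": ["arm64"]},
        {"Name": "current-generation", "Values": ["true"]},
    ]
):
    matches.extend(t["InstanceType"] for t in page["InstanceTypes"])

print(sorted(matches)[:10])  # e.g. a handful of c6g/c7g/m6g/r6g sizes
```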
Back in 2006, we had one instance: the M1. The M1 instance came with one gigabit of networking, which at the time we thought was really impressive. But since then we realized customers don't need a one-size-fits-all instance; that wasn't really practical. So we introduced general purpose, which are the M instance types; then compute optimized; then memory optimized; then we added accelerated computing, which includes our FPGAs, our GPUs, and the machine learning platforms; and we added storage optimized instances as well.
A somewhat unknown fact is that you can actually still launch M1 instances to this day, should you decide you want to deploy them, and we even have a class on how to use all of our older instances. We have not deprecated a single instance type since our launch in 2006, so customers can take a look at all of them.
So what is this innovation pace? I put this chart up; I didn't go all the way back to 2006, it starts in 2010, and you can see the various milestones and the accelerating pace. I wanted to call out a couple of things. One is our introduction of Inferentia2, which Martin will talk more about, coming in 2023. I also want to call out two other items: Graviton, back in 2018, and the fact that we've introduced three generations of Graviton since then (if you think about the pace of silicon innovation, that's pretty rapid), and the Nitro System, which is where the acceleration of our instance introductions began.
What is the Nitro System? It's more than this pretty picture, but the picture gives you and me a good idea of the architecture we've designed. The Nitro System is unique to Amazon. It has allowed us to develop all these instances faster and deploy them faster for you, and focusing on it has allowed us to deliver better performance to customers.
The Nitro System provides an overall performance benefit to customers at the component level. How do I show that? With this chart. On your left of the slide is standard benchmark information (this is SPEC), so basic compute information, and most cloud providers have roughly the same performance there. However, if you look to the right, at what I would characterize as real-world workloads, the variation starts to be pretty significant: on Redis we have up to a 27% advantage versus some cloud providers, and on Memcached up to 22%.
All in all, we believe you get about 15% better performance simply by using Amazon Web Services. That's the equivalent of almost a full generation of CPU performance just by coming here.
Why are we able to do that? Part of it is the Nitro System: we have offloaded a ton of functions that many others have to place on the CPU, which lets us provide more performance. Just a note about this data: it's based on our sixth-generation instances, which launched in 2022, so this is a direct comparison on the Ice Lake processor on the x86 side.
So why build your own chips? These chips allow us to specialize. They allow me to provide specialized security for the platform, and to deliver things better and faster; as I said, since Graviton launched in 2018, we've delivered three generations of the processor, an incredible pace of innovation in the silicon space. They also let us deliver on other innovations, like the Nitro SSDs and Graviton, which we'll get to. Finally, this is the core of our security. At Amazon, we are not interested in touching or viewing any customer data; your data and your workloads, and their security, are our biggest priority. I cannot, and no operator at Amazon can, physically access any of your workloads or your data. This is core to how we designed the system.

I'll talk briefly about security as we go forward, but the Nitro System, as I mentioned, isn't just one thing. It's a combination of our cards, which handle VPC networking, EBS, storage, and the controller; a security chip; and the hypervisor. Security matters every time we power on: as we boot the system, or any time we reboot, we perform cryptographic attestation to verify the images. If there's ever a problem with the images in the system, we do not boot, and we rectify it. Keeping this away from customers allows us to make updates and improvements, including updates to software and the operating system, without direct customer impact. And finally, that lightweight hypervisor allows us to better schedule instances.
First, I'm going to talk about networking and what we've delivered. As I said earlier, back in 2006 we delivered one gigabit per second of networking, something we were pretty proud of. We then made a pretty large leap, to 25 gigabits back in 2019 and then 50 gigabits, and we introduced our network-enhanced instances. Earlier this week, we announced network enhancements in our sixth generation, up to 200 gigabits per second. I'll also call out the very large 1,600 gigabits per second on the far right side and answer the question of where the heck that is used: that's our machine learning space, the Trn1n instances. For folks doing machine learning training: in the last five years or so, models have moved from millions of parameters to billions, and you need this additional networking performance. We'll talk a little more about Trn1n as we go through.
This week we introduced our EC2 sixth-generation network-optimized instances. These offer, as I said, 200 gigabits per second of networking bandwidth, which means you can increase your data transfer rate from S3 by 2x, which is pretty cool, and these instances support up to 80 gigabits per second of EBS bandwidth.
So next: storage.
Now, coming from a silicon background, I'm not sure I always considered storage to be silicon, but it's in the same relative family, I'd say. Why would we focus here? Everyone usually talks about performance on the CPU side of the house. But the reality is that if you can't access the data you're working with, or the latency to access that data is high, that becomes a performance problem as well.
We recognized this and introduced our Nitro SSDs back in 2021 at re:Invent; that's where we started down this path. What this has given us is the ability to provide 60% lower I/O latency and up to 70% reduction in latency variability, two key elements when accessing SSDs. Our SSDs are also encrypted with AES-256 encryption.
Earlier this year, we introduced I4i, our first storage instance with these SSDs on the x86 platform. It uses Intel's Ice Lake Xeon processors and offers 30% better compute performance than the original I3. We also have two Graviton-based storage instances, the IM4gn and the IS4gen. The difference between the two, aside from the slight name change, is the memory-to-vCPU ratio: the IM4gn has a 1-to-4 vCPU-to-memory ratio and the IS4gen a 1-to-6 ratio, so you can choose depending on what you need.
But we haven't stopped with those introductions from earlier this year. This week, actually yesterday, we introduced Torn Write Prevention, enabled by our Nitro SSDs, for I4i storage. This is an innovation we've delivered for all of our customers to improve the overall performance and reliability of database workloads: with Torn Write Prevention, you can see up to 30% better transaction performance.
I think there's no better way to talk about this than a brief customer example, and I might mention that Splunk was actually here yesterday talking about how they have used and leveraged these. One key thing about Splunk is that they do a lot of data collection, and they have to analyze data and produce reports really quickly, so I think they're a great example here, using Graviton2.
On the IS4gen instances, they've experienced almost a 50% decrease in their search runtime, which results in better productivity for their customers. So, I've talked a little about networking, we did storage; I'm going to finish by talking about security.
Today, Amazon is trusted by millions of customers around the world to protect their data, and security is key to that. A lot of people in this industry call this confidential computing, but there is no real dictionary definition of confidential computing and what it is, so I'm going to offer our take on it.
We see confidential computing as falling into two dimensions. The first dimension is protecting data from the cloud provider, which is pictured here; that's us, so we want to protect your data from us. The second dimension is protecting data within the customer's own environment: perhaps you have two entities you need to isolate from each other, and you want to protect personal data.
Think about an example: we published a blog with The Trade Desk. They were using UID 2.0. You may never have heard of UID 2.0, but I'm sure everybody in the room has accepted a cookie once or twice in their browser. UID 2.0 is a way to protect your personal information while still serving you a customized ad (sorry, everybody needs to get a customized ad to pay for things) but without exchanging too much personal information. That's a great example of dimension number two.
We introduced Nitro Enclaves in 2020, two years ago, and I'll talk about that briefly. On the security side, earlier this year we introduced NitroTPM, which brings the Trusted Platform Module to EC2. You can now use it to protect secrets, keys, and other sensitive data on EC2, going back to that second dimension, protecting your own data.
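For context, NitroTPM is enabled at the AMI level rather than per instance. Here's a hedged sketch with boto3, assuming an existing EBS snapshot (the snapshot ID and AMI name below are hypothetical placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Register an AMI with TPM 2.0 support; NitroTPM requires UEFI boot mode.
response = ec2.register_image(
    Name="my-nitrotpm-ami",                               # hypothetical name
    Architecture="x86_64",
    VirtualizationType="hvm",
    RootDeviceName="/dev/xvda",
    BlockDeviceMappings=[{
        "DeviceName": "/dev/xvda",
        "Ebs": {"SnapshotId": "snap-0123456789abcdef0"},  # hypothetical snapshot
    }],
    BootMode="uefi",
    TpmSupport="v2.0",   # instances launched from this AMI get a TPM 2.0 device
)
print(response["ImageId"])
```

Instances launched from that AMI can then use standard TPM tooling to seal keys and other secrets.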
Nitro Enclaves provide fully isolated, secure execution environments. We introduced them, as I mentioned, two years ago to solve problems customers had come to us about, particularly around personal data they wanted to protect. They are hardened, constrained environments that can only be accessed through a secure local channel.
Then, in the last two months, we've introduced two new features for Nitro Enclaves. One: Nitro Enclaves are now available on Graviton. And Dave Brown announced earlier that we now also offer Nitro Enclaves in EKS pods, so from your EKS pod you can launch an enclave as well, further enhancing the feature set here.
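For a sense of how little ceremony this takes, here's a minimal launch sketch with boto3 (the AMI ID is a hypothetical placeholder); enclave support is just a flag at launch, and the enclave itself is then built and run on the instance with the Nitro CLI:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request Nitro Enclaves support at launch time with EnclaveOptions.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",     # hypothetical AMI ID
    InstanceType="m6g.xlarge",           # Enclaves now work on Graviton instances too
    MinCount=1,
    MaxCount=1,
    EnclaveOptions={"Enabled": True},    # reserve enclave-capable resources
)
print(response["Instances"][0]["InstanceId"])
```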
Next, I'd like to have Martin come up. Martin is going to talk about where we are going and some of the instances we've introduced on the x86 and Arm platforms.
Martin: Thanks, Art, and hello, everyone; again, excited to be here. As Art mentioned, AWS offers the broadest and deepest choice for our customers. Why do we do this? It's important for customers to be able to tailor their infrastructure to their workload needs, and part of that is offering choice of processor across Intel, AMD, and AWS. Let's dive into Intel first.
Intel has been with AWS from the very beginning; back in 2006, they powered the M1 instance that Art talked about. Since then, we've grown our partnership, and now we have over 350 instances together, spanning every single compute category and every single Region.
Art mentioned the network-optimized instances we launched earlier this week, but I really want to dive into some of the other instances we launched earlier this year. The Amazon EC2 C6id, M6id, and R6id instances are our sixth-generation disk ("d") variants of the core instances. They're powered by third-generation Intel Xeon processors, a.k.a. Ice Lake, come equipped with up to 7.6 terabytes of local NVMe-attached storage, and deliver up to 15% better price performance over comparable previous-generation instances.
It's not just about the storage; it's also about faster processing and faster networking. With each generation, we try to offer customers more in terms of performance and price performance. These instances offer 2x faster networking and 20% higher memory bandwidth than previous generations, and they come with support for the new Total Memory Encryption, which encrypts the memory of the instance itself. The main difference between C, M, and R, as Art mentioned earlier, is really the vCPU-to-memory ratio: compute optimized with a 1-to-2 ratio, general purpose with a 1-to-4 ratio, or memory optimized with a 1-to-8 ratio.
These instances are ideal for your general core compute workloads, including enterprise workloads, databases, and backend and frontend servers, among many other things; specifically, those that need access to high-speed, low-latency storage. And as you know, at AWS we don't stop innovating.
We also announced earlier this week that we are introducing fourth-generation Intel Xeon processors into our portfolio, and the first instance to have them is going to be the Amazon EC2 R7iz instance. These are both high-frequency and memory-optimized instances, powered by the fourth-generation Intel Xeon processor, a.k.a. Sapphire Rapids. They'll have up to 128 vCPUs with one terabyte of memory, which results in 2.6x more vCPUs and compute versus previous-generation instances, plus 20% higher memory bandwidth, and they'll be the first x86 instances to have DDR5 memory.
That means much faster memory overall, with 2.4x higher memory bandwidth. As I said, these are high-frequency instances, so they're really designed for things like HPC workloads, databases with high per-core licensing costs, or gaming workloads that require really big, beefy instances. We look forward to delivering these in 2023.
Next, let's dive into AMD. AMD has been a great partner; we were the first cloud provider to offer AMD in our portfolio, back in 2018, and today we have over 100 instances with AMD, spanning just about every category as well.
They offer better economics: you get the performance you need, but at a 10% discount versus comparable x86 instances. Let's dive into some of the instances we launched this year with AMD.
Earlier this year, we launched the sixth-generation compute optimized, general purpose, and memory optimized C6a, M6a, and R6a instances. These again are the core compute instances, the workhorses of your portfolio. They're powered by third-generation AMD EPYC processors, with up to 192 vCPUs, up to 1 or 1.5 terabytes of memory, and 50 gigabits per second of networking.
They also offer 40 gigabits per second of EBS bandwidth. Depending on the instance, they offer significant performance improvements over the previous generation: 35% better performance for the M6a and R6a, mostly because we're making a two-generation jump from the first-generation Naples processor to the Milan processor. With the compute optimized, we're jumping from Rome to Milan, so you're getting a 15% price performance jump there. They also offer much faster networking (2.5x faster) and up to 2x faster EBS bandwidth.
So you're going to be able to transfer data and make transactions to and from EBS a lot faster. And as I said, you get great performance plus 10% lower cost versus other x86 instances out there.
Now, let's talk about AWS and the Graviton processor. One thing Art talked about earlier was Nitro. With Nitro, we got a lot of experience building Arm hardware as well as software for the Arm ecosystem, and we took that experience and applied it toward building our own Arm-based processor: Graviton. We launched our first Graviton processor in 2018, and customers loved it, but they wanted more.
So we offered them the Graviton2 processor a year later. Today, we have over 100 Graviton2-based instances across every single category, and these instances are great in the sense that they offer 40% better price performance over comparable x86-based instances.
So just for the effort of migrating over, you get a significant price performance improvement. But customers wanted more: they liked the power and the cost benefits of Graviton2, but they wanted even more performance.
So last year at re:Invent, we announced Graviton3 processors, and earlier this year we delivered the first instance based on them, the C7g. Graviton3 processors deliver 25% higher performance than Graviton2.
So they offer a lot more of the performance you need, and they also deliver 2x higher floating-point performance for things like your compute-intensive workloads.
They were the first instances in the cloud to offer DDR5 memory, which, as I mentioned earlier, provides a really big boost in memory bandwidth: 50% more memory bandwidth performance. And perhaps most importantly, they're really sustainable in terms of being carbon friendly.
Graviton2 and Graviton3 use 60% less energy for the same performance than other comparable CPUs out there. That's really important to a lot of companies: getting the performance and cost benefits, but also being good to the environment.
I mentioned the C7g earlier, but let's dive a little deeper. C7g is the first instance powered by Graviton3. It offers the best price performance for compute-intensive workloads in the EC2 portfolio, with up to 64 vCPUs, 128 gigabytes of memory, 30 gigabits per second of network performance, and 25 gigabits per second of EBS performance.
As I mentioned before, because they're powered by Graviton3, they get that 25% extra performance and 2x higher floating-point performance, and they're really great for your compute-intensive workloads.
Earlier this week, we also announced a network-optimized version of the C7g. These are great for workloads that are not just compute intensive but also network intensive. They're built using the new version of the Nitro card, so you get up to 200 gigabits per second of network bandwidth and up to 50% higher packet-processing performance.
So they're really great for data transfers to and from EBS and S3 and other data-intensive activities. And because they're built on Graviton, you get the same Graviton performance benefits I mentioned earlier. They're really great for network-intensive workloads, things like network virtual appliances, analytics, and even CPU-based machine learning.
Customers love Graviton. Today, we have customers across all industries, from startups to enterprises, and across all sectors: over 40,000 customers actually using Graviton. That's quite significant growth over the four years since Graviton was first introduced. Let me dive into some of these customer stories.
Epic Games, the makers of the Unreal Engine as well as games like Fortnite and Gears of War, use Graviton for their game servers and found Graviton very suitable for their demanding, latency-sensitive workloads. While running massive multiplayer gaming, they were also able to get that significant price performance benefit.
Formula One, perhaps the world's most prestigious and well-known motor racing competition, uses Graviton for their computational fluid dynamics workloads, to model how their race cars behave in various wind and aerodynamics simulations so they can produce a better-performing race car.
And then Honeycomb, the maker of an observability platform, has gone all in on Graviton. Liz Fong-Jones said that when she used Graviton2 and then Graviton3, it felt like sorcery how easy it was to migrate to Graviton and get all of that performance and cost benefit.
And speaking of that: we've talked a lot about performance and cost, but the third pillar of benefit is the ease of migration. One of the things we did from day one with Graviton was make sure the partners were ready.
We worked a lot with partners, including all the Linux operating systems, the ISVs, and the open source projects, to make sure they were ready to run on and support Graviton at launch. And we've been really successful with that as the community has grown: today, all the major Linux operating systems, development tools, and open source software support Graviton.
So we're really happy to see that adoption out there. And speaking of Graviton, it's not just about the Graviton instances you might be familiar with; it's also about managed services.
Whether you want to do it yourself with Graviton instances or use AWS managed services like RDS, ElastiCache, Neptune, EMR, and more, you can do that and get the same price performance benefit with Graviton; often it's just a different instance class, as the sketch below shows.
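As a rough illustration, here's a hedged boto3 sketch for RDS, with hypothetical identifiers and a placeholder password; the point is simply that Graviton arrives through the instance class:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# In RDS, Graviton is selected via the instance class; db.r6g.* is Graviton2-backed.
response = rds.create_db_instance(
    DBInstanceIdentifier="my-graviton-db",   # hypothetical identifier
    DBInstanceClass="db.r6g.large",          # Graviton2 instance class
    Engine="postgres",
    MasterUsername="adminuser",
    MasterUserPassword="change-me-please",   # placeholder; prefer Secrets Manager
    AllocatedStorage=100,
)
print(response["DBInstance"]["DBInstanceStatus"])
```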
Earlier this year, at Innovation Day, we announced the Graviton Fast Start program, which makes it easier for you to migrate over to Graviton, whether on EC2 instances or managed services, and get that great performance and price performance benefit.
We've seen customers use Lambda, migrate over to Graviton, and get that benefit in as little as four hours. It's a really quick and easy way to start adopting Graviton.
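To show how small the change can be on Lambda, here's a hedged sketch with boto3; the function name, role ARN, and artifact location are hypothetical, and it assumes your code and dependencies are arm64 compatible:

```python
import boto3

lam = boto3.client("lambda", region_name="us-east-1")

# Running on Graviton is just the Architectures field at function creation.
response = lam.create_function(
    FunctionName="my-arm64-function",                      # hypothetical name
    Runtime="python3.9",
    Role="arn:aws:iam::123456789012:role/my-lambda-role",  # hypothetical role
    Handler="app.handler",
    Code={"S3Bucket": "my-artifacts-bucket", "S3Key": "function.zip"},
    Architectures=["arm64"],   # Graviton instead of the x86_64 default
)
print(response["FunctionArn"])
```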
That's a lot of choice for our customers, but there's actually one more partner we haven't spoken about yet, and that partner is Apple. Not too long ago, we began working with Apple to enable Mac workloads on the AWS EC2 platform.
We introduced the first Mac instances two years ago, in 2020. They feature macOS on Mac mini hardware, with the ease of use and cost benefits of pay-as-you-go and all the tools of AWS. Customers loved the experience, but they wanted more.
Earlier this year, we announced the EC2 M1 Mac instances, powered by Apple silicon: the M1 chip, which integrates the CPU, GPU, Neural Engine, I/O, and much more into a single chip.
With the M1 Mac instances, you get improved performance, actually up to 4x better performance for builds and similar tasks compared to the x86-based Mac instances. And you still get all the benefits in terms of security and reliability and the ability to connect with and use other AWS services.
Customer feedback since we launched has been amazing, and obviously we're going to do more in the future.
When we started off on this cloud journey, we talked a lot about breadth and depth and being able to support every single workload. But there was one workload that people thought would be really difficult to support: high performance computing.
HPC was one of the workloads people thought could never run in the cloud, because it's so compute intensive. It requires not only compute but a lot of storage, a lot of memory, fast networking, and so on.
HPC is used to tackle some of the world's biggest challenges, from drug discovery to genomics to energy utilization and much more. Customers have tried HPC on AWS and been very successful: customers like Formula One, AstraZeneca, and Lawrence Livermore National Laboratory have partnered with AWS to solve some of the toughest challenges in the cloud.
We've worked for many years to make sure HPC workloads run well on AWS. We've built services like AWS Batch, which helps you run hundreds of thousands of computing jobs in the cloud; AWS ParallelCluster, an open source cluster management tool that automatically sets up the required resources for HPC workloads; Elastic Fabric Adapter (EFA), which lets you scale your applications to thousands of CPUs and GPUs and run distributed applications; and Amazon FSx for Lustre, which provides fully managed shared storage with scalability and performance. (A sketch of launching EFA-enabled instances follows below.)
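Here's a hedged sketch of the EFA piece: launching a pair of HPC instances with an EFA interface into a cluster placement group using boto3. The AMI, subnet, and security group IDs are hypothetical placeholders, and real HPC setups would more likely go through ParallelCluster:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Tightly coupled HPC jobs typically want a cluster placement group plus EFA.
ec2.create_placement_group(GroupName="my-hpc-group", Strategy="cluster")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",              # hypothetical HPC-ready AMI
    InstanceType="hpc6a.48xlarge",
    MinCount=2,
    MaxCount=2,
    Placement={"GroupName": "my-hpc-group"},
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",                   # attach an Elastic Fabric Adapter
        "SubnetId": "subnet-0123456789abcdef0",   # hypothetical subnet
        "Groups": ["sg-0123456789abcdef0"],       # hypothetical security group
    }],
)
print([i["InstanceId"] for i in response["Instances"]])
```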
But customers really wanted and needed the EC2 instances themselves to be HPC optimized as well. They told us you can't create a performant, cost-effective foundation for HPC without HPC-optimized instances.
Over the years, we have built many EC2 instances capable of running HPC workloads. Earlier this year, we announced HPC6a instances, based on the AMD third-generation EPYC processor, which offer 65% better price performance for HPC.
We also have C7g, C6i, and C6a, but customers wanted more. So I was happy that this week we announced HPC6id.
HPC6id addresses another kind of HPC workload: whereas HPC6a addresses compute-intensive workloads, HPC6id addresses workloads that are more memory and data intensive. They offer 200 gigabits of EFA networking, so you get much faster networking.
You get much better price performance, about 2.2x better, for data-intensive HPC workloads. This is really targeted at finite element analysis, which requires modeling the performance of complex structures like wind turbines, concrete buildings, and industrial equipment.
These HPC instances are designed to deliver leading price performance for those data- and memory-intensive HPC applications. The HPC6id instances are powered by third-generation Intel Xeon Scalable processors and deliver the best price performance for this category of workload.
They’re really ideal for those tightly coupled HPC applications. And they deliver the best per vCPU compute and memory performance for those applications.
We also announced another HPC instance this week: the HPC7g instance. These are the first HPC instances powered by Graviton processors, and they include a new type of Graviton processor, a new version called the Graviton3E.
What the Graviton3E processor does is push past Graviton3's performance limits to unlock more performance, including 35% better vector processing performance, which is great for HPC applications that depend heavily on vector processing.
And these HPC7g instances are ideal for yet another type of HPC workload: we had compute intensive, we had data and memory intensive, and these are geared toward workloads that are both compute and network intensive.
They come with the fifth-generation Nitro cards I talked about earlier, and they offer 200 gigabits per second of EFA networking for really fast transfers as well.
Just a couple of weeks ago, at the Supercomputing Conference (SC22), we were honored to win the HPCwire Best HPC Cloud Platform award for the fifth year in a row, recognizing the platform we offer to the many companies running HPC workloads out there. We're really honored to have that.
To transition a little: I've talked about generalized compute and about HPC, but now let's talk about another category of workloads that's also very compute intensive: machine learning.
As with processors, we offer the broadest choice of accelerators as well: FPGAs from Xilinx, GPUs from NVIDIA and AMD, and our own custom accelerators, Trainium and Inferentia. As workloads become more demanding, the need for acceleration is increasing across the board.
Machine learning is a powerful technology that's becoming ubiquitous. There are two main aspects of machine learning: training, which is all about building and creating the models, and inference, where you use those models to generate predictions from input.
Today, over 30,000 customers build, train, and deploy their machine learning applications using Amazon EC2 infrastructure services every year. On the training side, we have instances with powerful GPUs, like the P3 and P4. We also partnered with Intel last year to offer their Habana Gaudi processors in the DL1 instance.
On the inference side, we have G4 and G5 instances powered by NVIDIA GPUs that really help with inference.
But as ML became ubiquitous, models became large and complex, growing exponentially in size, and these larger, more complex models bring a rising cost of compute.
Three years ago, state-of-the-art machine learning models were perhaps 100 million parameters in size. Today's models have grown to hundreds of billions of parameters, roughly a 1,000x increase in size, which is tremendous, but a little scary, because that's a lot of input.
This growth in model size is stretching training time from days to weeks and sometimes even months, because the models are so big and complex now.
AWS noticed this trend, and we started developing our own ML chips as well. We took our experience from developing the Graviton processors and thought: why don't we apply it to the ML side?
A couple of months ago, we announced the Amazon EC2 Trn1 instances, powered by AWS Trainium processors. These offer the highest performance and lowest cost for training deep learning models in EC2. They come with 16 Trainium accelerators, up to 512 gigabytes of high-bandwidth memory, and 800 gigabits per second of network throughput, so you can do distributed machine learning training.
Trn1 is the first EC2 instance with 800 gigabits of networking performance. They also support native machine learning frameworks like PyTorch and TensorFlow, so it's easier for customers who are already doing machine learning to bring their models over and train them.
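To give a feel for that framework support, here's a hedged sketch of a PyTorch training step on Trainium through the Neuron SDK's torch-xla backend. It assumes the torch-neuronx/torch-xla packages that ship with the Neuron SDK are installed on a Trn1 instance, and the model and data are toy stand-ins, not a definitive recipe:

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # torch-xla, used by the Neuron PyTorch stack

# On a Trn1 instance, the XLA device maps onto the Trainium accelerators.
device = xm.xla_device()

model = nn.Linear(784, 10).to(device)            # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(64, 784).to(device)          # stand-in for a real data loader
    y = torch.randint(0, 10, (64,)).to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    xm.optimizer_step(optimizer)                 # steps the optimizer and syncs the XLA graph
```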
And we're not stopping there. As I alluded to, we're going to come out with a network-optimized version of the Trn1 instance that will offer even faster networking, 1,600 gigabits per second, which will speed up training times even further. It's incredible what developers are going to be able to do: training these ultra-large models with greater efficiency and faster training times.
On the flip side, a couple of years ago we announced the Inf1 instances. Once you've trained your models, you have to use them: provide input and ultimately make inferences, and those inferences require quite a bit of performance to generate predictions in real time. That's why we built the Inf1 instances.
Inf1 is great for models of small and medium complexity. But as I said, models are growing much, much bigger. That's why, earlier this week, we announced the Inferentia2 processor, which will be able to handle much bigger models: 100-billion-plus parameter models, actually.
Now you can do both training and inference on AWS silicon, with Trainium and Inferentia2, to handle these really large, ultra-complex models.
Inferentia2 will deliver up to 4x the throughput and 10x lower latency, just one tenth of the latency of Inferentia1, so you're going to be able to get real-time predictions and inference really, really quickly.
The instances will offer up to 12 Inferentia2 accelerators and up to 384 gigabytes of high-bandwidth memory for really fast inference. And just like Inferentia1, they'll be integrated with PyTorch and TensorFlow, so you can have an easy time running your models on Inferentia.
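As a hedged sketch of that PyTorch integration: compiling a model for Inferentia with the Neuron SDK looks roughly like this. torch_neuronx.trace is the Neuron compilation entry point; the toy model is purely illustrative, and an Inf2 instance with the Neuron SDK installed is assumed:

```python
import torch
import torch.nn as nn
import torch_neuronx  # part of the AWS Neuron SDK

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
example = torch.randn(1, 784)          # example input drives tracing/compilation

# Compile the model ahead of time for the NeuronCores.
neuron_model = torch_neuronx.trace(model, example)

# Inference then looks like ordinary PyTorch.
prediction = neuron_model(torch.randn(1, 784))
print(prediction.shape)
```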
And just like with Graviton, we really focus on sustainability here. I'm happy to say that these Inferentia2 chips will offer 45% better performance per watt than GPU-based instances. Again, we always look to offer performance at lower cost, while also trying to be as green as possible.
We've seen customers of all sizes adopt Inferentia. Autodesk uses Inferentia for the Autodesk Virtual Agent, or AVA: a chatbot that answers over 100,000 customer questions per month by applying natural language understanding and deep learning techniques to extract the context, intent, and meaning behind queries.
With Inferentia, Autodesk was able to get 4.9x higher throughput than GPU-based instances, a tremendous improvement in the speed at which they can respond to incoming questions.
Anthem created applications that automate the generation of actionable insights from customer data using deep learning natural language models. With Inferentia, they got 2x higher throughput than GPU-based instances, again a tremendous improvement from using Inferentia.
And Airbnb saw a 2x improvement in throughput right out of the box by switching their BERT models in PyTorch over to Inferentia from GPU-based instances: a tremendous amount of speed gained and cost lowered by switching to Inferentia.
With that, I'll switch over to Art and he'll talk about cost optimization.