Compute innovation for any application, anywhere

Please welcome the Vice President of Amazon Elastic Compute Cloud, Dave Brown.

Hello, everyone and welcome to re:Invent 2023 and welcome to the Compute Innovation talk. It's great to be here. I'm so excited to see so many of you and we have a lot of compute innovation to cover over the next hour. So let's get going.

This is where it all started: Cape Town, South Africa. I joined a small team of 14 people in 2007 working on what would become EC2. And I can tell you, back then we literally had no idea what we were building, and we certainly had no idea that it would become what EC2 is today.

But this is a Compute Innovation talk and we shouldn't really get deeply into compute innovation without taking a look at some of the pivotal inventions that have happened in the compute era. Every evolution of compute that we experience today has roots in the endeavors of the pioneers that have gone before us.

This is an image of the first transistor, created by Bell Labs back in 1947 to process streams of bits. Before that, there were vacuum tubes, known as thermionic valves. But who inspired these inventions?

Well, if we go all the way back to 1815, we find Ada Lovelace, commonly referred to as the first programmer. Now, back then, Ada didn't have many great compute solutions. In fact, all she had were the design documents for a project known as the Analytical Engine, a very early design for a mechanical general-purpose computer. She actually called the CPU the Mill, and she called the storage system the Store.

Now, given these design documents, Ada was able to write what is believed to be the first algorithm intended to be processed by a machine. This was nearly 200 years ago, and she was able to envision a future where computers could produce content directly from human interactions. In fact, content beyond arithmetic, like art and music.

The incredible thing is that today we find ourselves in an era where Ada's theories have been validated: large language models are creating images and entire symphonies, and even writing poetry for us. What could Ada have achieved with these capabilities if she were around today? That is what motivates us at AWS to constantly push the boundaries of innovation on your behalf. And we are constantly amazed by what you, our customers, are able to do with the tools and services that we provide.

Well, since 2007, our goal has been to find ways to democratize compute, to make it readily available for when you need it. And back then, AWS was brand new. Even the idea of virtualized compute instances that could be rented by the hour over the internet was unheard of.

I remember struggling to explain what we were doing to folks who asked. Multi-tenancy through virtualization has become a force of disruption that continues to evolve today.

Well, as we grew EC2, we faced two challenges. First, we needed to find a secure means of dealing with some of the problems that multi-tenancy presented, mostly in the form of overhead: how do I ensure that one customer's workload doesn't affect another customer's workload? Second, security has been and always will be fundamental to everything that we do at AWS. We do not compromise on security.

The virtualization technology available at the time was never going to support the types of workloads that we knew customers would want to bring to the cloud, and so we had to think differently. Starting in about 2008, we completely redesigned the way that hypervisors worked. Hypervisors back then ran on and used the central processor of the machine, and what we found was that about 25% of the resources on the machine were dedicated to the hypervisor and to services that AWS needed to run. That left only 75% of the machine available to customers.

It also meant that we were not able to achieve the performance, whether in compute, networking, or storage, that we knew we needed to run your workloads. And so we had to change the way this worked, and starting in 2012 we went on a journey that took us about seven years: developing the Nitro system.

Now, the Nitro system uses dedicated hardware to offload all of the processing that we need to do. Whether that's networking I/O, storage I/O, security, or even the management APIs, every part of the AWS system runs on dedicated hardware that we have inside the server. The rest of the server, the processor, the memory, the storage, is 100% available to you as the customer.

This offloading has had an enormous impact on our ability to provide you not only with the best performance in the cloud, but also with the best security. This has been a journey over the last 10 years, during which we have released five different Nitro chips. Each subsequent generation has improved on the last along many important dimensions, including moving features and functionality that we previously had in software into the Nitro chips themselves.

As we have refined the hardware, lower latency, higher throughput, and the ability to handle more packets per second have constantly improved across our Nitro journey. Today, the Nitro system has grown to be a core component of EC2, offloading the virtualization functions to dedicated hardware and software. It has been key to unlocking numerous EC2 innovations across critical areas of functionality.

And one of those is how we actually manage our fleet. Running a fleet of instances on a single box is relatively simple; when you do it across millions of machines, it becomes a lot more challenging. Every now and then, we need to do some management on instances. One example is live migration. We may have a server that is underperforming, or we may need to move some customer instances from one host to another. Live migration allows us to do that. In fact, we do millions of live migrations every single week on EC2, and customers don't even notice that it is happening underneath their workload.

Now, when we want to do upgrades, deploy patches, or improve the service, we do live updates. This is an even less intrusive process, where we're able to update the firmware, the operating system, the hypervisor, the kernel, any part of the underlying system, without any impact to the running instance. We do many, many of these every single week.

Nitro has also enabled a faster pace of innovation. In fact, today we have over 750 different instance types on EC2. Now, at past events, whenever I put this slide up, somebody says to me, "Well, Dave, how do I know which one to choose?" It has been a question that we have struggled to answer. We've tried to explain that there are various swim lanes, or that it's not really that hard to understand, or that you can go and look at our documentation.

But today, thanks to generative AI, I'm happy to let you know that we finally have an answer to that question. Today, I am happy to announce the availability of Amazon Q to help you select EC2 instances. With this functionality, you are able to simply describe what you need for your workload, and the EC2 console, using Amazon Q, will make recommendations about which instances you should use. This draws on all of the knowledge that we have from running EC2 for the last 17 years, noting what customers have used and which instances work best. Let me give you a few examples.

Here, we are going to enter a basic query: help me with instance families to deploy a relational database workload for my web app with the highest performance. Q will immediately come back with a set of suggestions. In this case, it is recommending the new seventh-generation R7a instance, which makes use of the AMD Genoa processor. It includes a few other options as well and describes each one of them.

Now, Q actually retains context, so you can ask Q for an update. Maybe you're looking for an Intel-specific instance, so we can ask Q: could you give me some suggestions that make use of an Intel CPU? Once again, Q will do a little bit of processing and come back with a set of recommendations, such as our seventh-generation R7i instance, which runs Intel's Sapphire Rapids processor.

Now, both of those first suggestions also use our high-memory instances, which are the best machines to use for database performance optimization. So now that we've got instance selection sorted, let's look at some of the other areas where Nitro has allowed us to improve performance.

One of those areas is network bandwidth. Had we stayed with software-based virtualization all the way back in 2008 and not gone on the journey to hardware offloading, we would have been able to achieve about 100 gigabits per second and no more; there's basically a cap if you're doing it in software. We thought we were doing pretty well back in 2010 when we launched our CC1 instance with 10 gigabits of throughput. We thought that was really fast.

Well, recently, thanks to generative AI, we've really been pushing the bandwidth limits. In fact, in 2020 we launched our P4d instance with the NVIDIA A100 GPU and 400 gigabits of throughput, and at the time we thought we'd probably never need any more bandwidth. As you can see, since then we've had incredible growth in bandwidth requirements.

Our latest NVIDIA H100 instance, the P5, uses 3.2 terabits per second of network bandwidth per instance. And our newest Trainium2 instance, which we announced today, uses up to 6.4 terabits per second. We expect that generative AI workloads are going to continue to grow these bandwidth requirements.

Another area where we've been able to innovate is EBS bandwidth, the bandwidth for talking to your network-attached storage. For many years, we provided up to 14 gigabits per second. We then increased that to 60 gigabits per second with 260,000 IOPS, and in 2022 to 80 gigabits per second with 350,000 IOPS.

Well, today we are happy to let you know that we have increased that further, to 100 gigabits per second and 400,000 IOPS from your EC2 instance to an EBS volume. This level of performance comes with EBS io2 Block Express volumes. When we first launched io2 Block Express, it was only available on the R5b instance, but we've now extended it to all Nitro instances. The R5b has been incredibly popular with database workloads, where you are now able to get levels of IOPS and bandwidth performance that allow you to run very, very large databases. And even on our existing instances and volumes, we've been able to go back and improve performance.

And so, up to four times higher max IOPS, throughput, and size than io1 volumes can provide, and 10 times lower outlier latencies, is what you get with io2 Block Express volumes, really unlocking storage-intensive workloads on EC2.
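To make this concrete, here is a minimal sketch, my illustration rather than anything shown in the talk, of provisioning an io2 volume with boto3; the size, IOPS, Availability Zone, and tag values are all assumed placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a 4 TiB io2 volume with a high provisioned-IOPS setting.
# On current Nitro instances, io2 volumes run on the Block Express
# architecture described above.
volume = ec2.create_volume(
    VolumeType="io2",
    Size=4096,                       # volume size in GiB
    Iops=64000,                      # provisioned IOPS for this volume
    AvailabilityZone="us-east-1a",   # must match the instance's AZ
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "Name", "Value": "db-data"}],
    }],
)
print(volume["VolumeId"])
```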

As I said earlier, security is always our number one goal in everything that we do and everything that we build, and we will not compromise. With the Nitro system, we have been able to incorporate security at the silicon level, enhancing our overall security posture. We built the Nitro Security Chip directly onto the motherboard of the server, and it protects the server from unauthorized modifications by continuously monitoring and verifying both the instance hardware and the firmware.

In fact, any machine that lands in our data center is validated by the Nitro Security chip to ensure that every single hardware and software component in that machine is exactly as it should be from the supply chain. And throughout the lifetime of that instance, every single hardware component and firmware component is as it should be. This is a level of security that is not available anywhere else.

We have also been able to take our Nitro security and provide you with a feature called AWS Nitro Enclaves. I don't know if you've ever had this problem of trying to protect some data, and I mean really protect some data, on a system. The problem always becomes: well, where do you put the password? You can encrypt something, but then you have to store the key somewhere; you may have a key store that you put it in, but that key store has a password. You essentially end up with this recursive problem of how to store something securely that you can guarantee nobody can access.

And that's why we launched AWS Nitro Enclaves. Nitro Enclaves provides you with a hardened and highly constrained VM that greatly simplifies the handling of sensitive data. It's about bringing additional isolation to your instance to protect data in use from unauthorized access, even from within your own organization. Sometimes this is protecting against internal threats, where you want to make sure that no employee has access to an SSL key; it could also be personally identifiable information. It allows you to carve out an isolated environment from your existing instance and gives you the option to allocate CPU and memory resources to it.
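As a rough sketch of how you would get started, assuming a placeholder AMI ID and instance type: enclave support is a single flag at launch, and the enclave itself is then built and run on the instance with the nitro-cli tool.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a Nitro-based parent instance with enclave support enabled.
# The AMI ID is a placeholder; the instance type must be an
# enclave-capable Nitro instance with vCPUs and memory to spare,
# since the enclave is carved out of the parent's resources.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder
    InstanceType="m5.xlarge",
    MinCount=1,
    MaxCount=1,
    EnclaveOptions={"Enabled": True},
)
print(response["Instances"][0]["InstanceId"])
```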

Now, customers have been adopting Nitro Enclaves since we launched it a few years ago, and Itaú, the Brazilian bank, is a great example of a customer that's been leveraging Nitro Enclaves to innovate in new and secure ways. Their digital assets department is a business unit responsible for the development of solutions using blockchain technology, and Nitro Enclaves has helped them create a safe environment for the manipulation of cryptographic keys, reducing their attack surface and guaranteeing that nobody has access to those keys.

In fact, we've been so confident in the security of Nitro that we invited an external security company, the NCC Group out of the UK, to do a deep analysis and audit of the Nitro system. It was just incredible to see them make a public statement, which they published earlier this year, that there is no mechanism by which a cloud service provider employee can access customer content stored on host instances or in encrypted EBS volumes. That's an incredible statement from a well-trusted security company, and it really speaks to just how paranoid we are about making sure that your data on AWS is always secure.

Now, all of this innovation has not come easy. It's taken a long time, a long time and one of my favorite sayings or statements that Andy has always said is there is no compression algorithm for experience. Sometimes the only way you can get good at something is to continue to innovate, continue to take feedback on what customers tell you what works and what doesn't work and continue the process of iteration and innovation.

Well, we took the Nitro system and we got really good at programming for ARM. And so we thought: there must be a way to use ARM as a server processor. This started our journey of building custom server processors, which drove innovation across EC2, bringing disruptive improvements in price performance.

We are proud to say that we were the first major cloud provider to offer custom server processors to customers. Similar to Nitro, what you can see here is that many other cloud providers are now starting to catch up, but it's super early in their journey, while we've been doing this since 2018.

In 2018, we launched our very first ARM-based processor, Graviton. What we didn't tell you at the time is that this was the Nitro chip we were using on our Nitro cards; it wasn't actually a server CPU. The reason we put it out there, even though it wasn't the highest-performing processor, was to spark the ecosystem. We wanted folks to know that ARM was coming to the cloud, to get the operating systems and open-source software to move.

In 2019, we launched Graviton2. It is just an incredible processor and has helped customers cost optimize by improving price performance for their workloads. And then in 2021 we launched Graviton3 to provide even more performance, with Graviton3 providing 25% better performance than Graviton2.

Today we have over 100 instance types using our Graviton processors, including a number of instances using our very latest Graviton3 processor. In fact, we've built over 2 million Graviton processors since we started the journey. Graviton offers up to 40% better price performance and 60% less energy usage than comparable EC2 instances, which not only improves your sustainability and carbon footprint, it also just uses less energy, which is incredibly important in the world that we live in today.

Over 50,000 customers, including all of our top 100 customers, are using Graviton processors, and one of those customers is SAP. About two years ago at re:Invent, we announced that SAP was moving SAP HANA, their largest database, to Graviton, and we started the journey with them: could we move an enterprise application as critical as SAP to a Graviton processor?

Well, I'm happy to tell you that just a few weeks ago SAP announced that they had completed their journey, and SAP HANA is now running on Graviton processors. They saw 30% better compute performance and 50% higher memory bandwidth for their analytical workloads, and they were also able to reduce their carbon impact by an estimated 45%. It was just amazing what we were able to do with SAP.

Now, the innovation that AWS does is not limited to the custom silicon that we build. We have a large number of partners that we work incredibly closely with, and we innovate with those partners to bring you the best of their products together with the best of AWS. Three of these partners in the processor space today are Intel, AMD, and Apple.

I am very excited to talk about some of the latest generation EC2 instances that we have built in collaboration with each of these partners. Let's start with Intel. We've had a long-standing relationship that dates all the way back to the very first instance that we launched on Amazon EC2; that was actually a photo of my laptop when I joined the team.

To meet the explosive growth in demand for EC2 compute, we've launched more than 400 instance types using Intel processors, and we have continuously raised the bar on performance as well. With our seventh-generation Intel-based instances, we saw an opportunity to take this level of innovation one step further.

Sapphire Rapids is an amazing CPU, Intel's latest on the market. But we saw an opportunity to collaborate with Intel to build a custom version of their Sapphire Rapids processor that is optimized for the cloud and only available on AWS. This, together with a few optimizations that we made within the Nitro system, means that our seventh-generation Intel-based instances are up to 15% more performant than comparable Intel processors on other cloud providers or even in your own data center.

And this means that AWS is the only place in the world to get the highest performance with Intel Sapphire Rapids. We're incredibly excited to see what we've done together with Intel on these processors.

A good example of this is Tufts Medicine. They're a leading integrated health care system that brings together the best of academic and community health care to deliver exceptional, connected, and accessible care experiences across Massachusetts.

They have consolidated their electronic health records into AWS and are working to transition all on-premises data centers to the cloud. They moved from M5n instances to the new seventh-generation Intel Sapphire Rapids instances and were able to increase user session density by an incredible 68%.

They were also able to save $1.2 million annually just by completing this migration to the newest instance type. We also know that some customers have workloads that maybe aren't looking to be the most performant but are looking for cost-optimized solutions. And so we also innovated together with Intel to bring out our newest flex instance types.

These provide up to 19% better price performance compared to the previous generation of Intel instances. We have already seen some of EC2's top customers start using M7i-flex, and they have achieved significant price performance benefits. So that's another option as you look to find the very best instance for your workload. But we know that there are some workloads out there that want as much memory as we can possibly give them.

And so today I'm happy to announce that we are launching our newest high-memory instance, the Amazon EC2 U7i instance, which provides up to 32 terabytes of memory, more than 30% higher than what we had on the previous version of this instance, improving data load times. We also increased the EBS bandwidth to 100 gigabits per second, a 5x increase over the previous instance.

These instances also support the EBS io2 Block Express volumes that we spoke about earlier, and we've improved compute performance by up to 130% over the previous generation. Workloads like SAP, which use an enormous amount of memory, commonly run on these instances, and we're happy to say that SAP has used and certified these instances and they're ready for your S/4HANA workloads.

Now, the next partner is AMD and we've been on a similar innovation journey with AMD for our seventh generation instances. We were the first cloud provider to bring the new AMD EPYC processors to the cloud in 2018. And we have continued to innovate together with AMD over the last three generations of the EPYC processor.

Our sixth generation instances are based on the AMD Milan processor and they offer 10% lower cost versus comparable x86 instances. These instances continue to be popular with customers that have been looking to cost optimize on x86. You can get a lower cost instance with great performance and it's very easy to move from the existing instance to an AMD instance since they all use x86.

Now, as AMD has continued to improve, they have increased the core count, and AMD has a very high core count today, 96 cores on some of these CPUs. So we looked at the core count, and we looked at the fact that Nitro does not use any of those cores for its own processing, and we asked: was there a way that we could provide more performance for you as a customer, together with AMD?

So our seventh-generation AMD instances using the Genoa processor provide you with 96 physical cores, with no virtual cores. That has increased compute performance by up to 50% over the previous generation. These instances now provide the best price performance on x86 for most high-performance workloads. We're incredibly excited about these instances as well, and about how we've been able to think differently about bringing AMD's latest-generation processors to the cloud.

One of our long-time customers that we've worked with over many, many years is Netflix. They have been benchmarking these AMD Genoa instances, and they experienced a 40% throughput improvement, a 50% latency reduction, and a 16% reduction in concurrent GC pauses over what they were using previously. Those are just incredible numbers to see from one generation to the next, and we're excited to see what customers do with AMD.

We brought Apple to the cloud in 2020, when we launched the Apple Mac and made it available to you in the cloud for your build-and-test workloads. The M1 Mac instances continue to be incredibly popular, and we've now refreshed those instances: we've been able to bring the very latest Apple M2 Mac minis as well as the M2 Pro Mac minis, deeply coupled with Nitro, and they provide up to 35% faster performance versus M1 Mac instances.

With these instances, customers are able to achieve faster builds and convenient distributed testing without having to install, manage, patch, and upgrade physical infrastructure. This has really been a game changer. Prior to these instances, we had many customers telling us that they had moved all of their workloads to the cloud, but they still had some Mac minis sitting under their desks for their build-and-test workloads. Customers like Pinterest, Goldman Sachs, and Riot Games have reported up to a 4x improvement in build performance, up to a 3x improvement in parallel builds, and up to an 80% reduction in build times. It's just amazing what you're able to do when you can get a Mac mini in the cloud.

Now, let's shift gears and focus on another significant area of compute innovation. To do that, please join me in welcoming our VP of Serverless Compute for AWS, Swami Sivasubramanian.

Thank you, Dave. This is my sixth re:Invent, and every year it's great to see what you're doing with our technology. When I think about the concept of serverless compute, of removing the management of servers, I can't help but think about an influential figure: Grace Hopper. Her work in creating the first compiler was not merely a technical achievement but revolutionary, and she really shaped the future. Her foresight made the programming languages that we know today possible.

All the innovation that we've touched on today is towards one goal: to enable you to innovate in your business. We innovate across all layers of the infrastructure to help you achieve better business outcomes for your customers. Innovation comes with agility and being able to move quickly, and that comes with doing less. This is why we invest in technologies like serverless and container orchestration, so you can bring your ideas to market faster.

We pioneered serverless nine years ago with the launch of AWS Lambda in 2014, and we continue to invest in many areas, including the developer experience, to help you build faster and in support of your enterprise monitoring and compliance. We do all of this while focusing on well-architected fundamentals: performance, security, resiliency, and cost.

As an example of this innovation, just four years after launching Lambda we introduced a major advancement with Firecracker. This is a virtualization technology that makes use of KVM to run functions with super-fast startup times: you launch these lightweight microVMs in a fraction of a second, and you get security and workload isolation without trading off performance. We've taken that even further with Lambda SnapStart. SnapStart uses Firecracker snapshotting: it takes an image of the initialized environment and then restores the environment from a cache. When you run your Lambda function with SnapStart, your function will typically start up to 10 times faster. And you can gain further performance and cost benefits by running your serverless workloads on Graviton, like what Dave was talking about earlier: up to 40% better price performance for your serverless workloads. This all results in agility and efficiency for your business.
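As a minimal sketch of what turning this on looks like, with a hypothetical function name: SnapStart is enabled per function and applies to published versions, so you publish a version to take the snapshot.

```python
import boto3

lam = boto3.client("lambda", region_name="us-east-1")

# Enable SnapStart for published versions of the function.
# "my-java-function" is a placeholder; at launch, SnapStart
# targeted Java runtimes.
lam.update_function_configuration(
    FunctionName="my-java-function",
    SnapStart={"ApplyOn": "PublishedVersions"},
)

# Wait for the configuration update to finish before publishing.
lam.get_waiter("function_updated_v2").wait(FunctionName="my-java-function")

# The snapshot is taken when a version is published; invocations of
# this version then restore from the cached snapshot.
version = lam.publish_version(FunctionName="my-java-function")
print(version["Version"])
```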

Another way that we help you drive agility is by enabling you to build faster. Most serverless applications are built as microservices that are loosely coupled through events and messages. You build your application by composing your services and your SaaS applications; these model real-world interactions, and they help you build highly responsive and fault-tolerant applications. To help you build event-driven architectures, we launched Amazon EventBridge. It's a serverless service that uses events to connect applications together quickly, what you'll sometimes hear referred to as an ESB, and it makes it easier for developers to build scalable event-driven applications. Today, more than 1.5 million customers use EventBridge to deliver more than 2.6 trillion events every month. It's directly integrated with more than 200 different event sources right when you start, including 45 industry-leading SaaS platforms, and you can easily connect and stream data from these sources without having to write any code.
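To show what connecting applications through events looks like in practice, here is a minimal sketch with boto3; the event bus is the default one, and the source and detail-type names are hypothetical.

```python
import json

import boto3

events = boto3.client("events", region_name="us-east-1")

# Publish a custom application event to the default event bus.
# EventBridge rules that match on Source/DetailType route the event
# to targets such as Lambda functions, queues, or pipes.
events.put_events(
    Entries=[{
        "EventBusName": "default",
        "Source": "com.example.orders",   # hypothetical source
        "DetailType": "OrderPlaced",      # hypothetical event type
        "Detail": json.dumps({"orderId": "1234", "total": 42.5}),
    }]
)
```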

EventBridge started off as a bus and a schema registry, but we've expanded on that over time, and we now have scheduler and pipes capabilities. Pipes helps customers create point-to-point integrations between event producers and consumers. With Pipes, you go faster: you write less code and you build fully managed integrations. It saves cost because you're only processing and paying for the events that you use, and it also reduces your operational load, since you're not worrying about scaling, patching, and provisioning.

Agility is achieved with speed of development, but also with the speed of running your workloads. Usually, to launch a container, you need to download the entire container image from a registry, and that can be slow; those images are big. This is where Seekable OCI (SOCI) comes in. This is a technology that was developed by one of our teams at AWS, and it helps you scale out faster by enabling containers to start without waiting for the full download of the image. We recently launched this in ECS and Fargate.

When it comes to container orchestration, some customers choose to run Kubernetes, and that's why we've invested in EKS, our managed Kubernetes service. We made some key decisions about the principles we wanted to focus on in joining Kubernetes with the best of AWS. We wanted to make sure that we continue to focus on security as job zero, just like what Dave was talking about earlier, and we also wanted to make it easier for you to have native integrations with AWS services.

We continue to recognize the community work that has gone into Kubernetes. EKS recently celebrated its fifth birthday, and we feel it encompasses the best of AWS with its performance, scale, reliability, and availability. It also allows customers to take advantage of AWS innovations in compute, storage, networking, and security. This is why more customers are running their Kubernetes workloads on AWS than anywhere else: the 2021 CNCF report states that AWS was the most widely used cloud service provider for backend developers who rely on Kubernetes, with a share of 64%.

Often, compute is the main driver of cost for workloads running on EKS, and this means that you're seeking the right balance of performance and cost. This led to the newest innovation by the EKS team, called Karpenter. It's an open-source provisioning and management system for EKS that automates our best practices in managing cost and performance. It uses the Kubernetes scheduling semantics and optimizes by observing the applications in your cluster and consolidating pods based on need. That means it will select, launch, update, and terminate EC2 instances to minimize cost while maintaining performance. And today, I'm pleased to announce the donation of Karpenter to the CNCF autoscaling SIG. This is a significant milestone for Karpenter, and it's the next step for driving innovation within the community. We will continue to invest in Karpenter to leverage the performance efficiencies of AWS compute such as Graviton. Speaking of Graviton, it's time to turn things back to Dave.

Dave: Well, thank you, Swami. It's just amazing to see the level of innovation that we're doing at the higher levels of the stack, in serverless and containers; really great to see. Well, let's now go from the topmost level of our compute stack all the way back down to the silicon. Swami gave you a bit of a hint at what we're going to be talking about now: let's get back to Graviton.

Earlier today, Adam announced Graviton4, our brand-new CPU. This is the most powerful and energy-efficient chip that we have ever built at AWS. I wanted to give you some insights into how we think about silicon design and benchmarking. Firstly, benchmarks are not real-world workloads, yet many chip designers work backwards from a micro-benchmark. Because these tests are well understood, they motivate how designers design and improve their chips. But micro-benchmarks are not real-world workloads; as it turns out, they are quite different. This mismatch means that hardware is being designed against a set of goalposts that won't translate to optimal performance in real-world production. It's a little bit like studying for a test in school, only to find out that there's so much more to understand to really master the topic.

And so when we think about chip design, it helps us to visualize this, and we do that using radar graphs. A single plot can give us a holistic view of a given workload's characteristics within the CPU. Each axis of the graph corresponds to a different chip design trait, with the value corresponding to that workload's sensitivity towards that trait in determining its overall performance. We can also divide this chart into two halves. On the left we have the front end of the processor, which receives instructions; its performance is influenced by factors like the number of branches, branch targets, and instructions, which ultimately result in front-end stalls. The back end of the processor executes these instructions and is sensitive to properties like data in the L1, L2, and L3 caches and the instruction window size; these ultimately result in back-end stalls. Plotting front-end statistics on the left and back-end statistics on the right allows us to see at a glance whether a workload skews towards front-end or back-end sensitivity. The higher the value, the more relevant that property is to performance, so smaller is better in this graph.

So what do these graphs teach us? Let's take a look at a few micro-benchmarks. This is an example of a traditional micro-benchmark, and we see it stressing the L3 cache with a large number of back-end stalls. What this tells us is that the back end of the CPU pipeline generally isn't able to find work to do. While we see these sensitivities on the right side of the graph, we don't see much on the left side. When we step back and think about it, that makes a lot of sense: most of these micro-benchmarks are small kernels that someone has extracted. They do the same thing in a repetitive loop, and they are meant to be stable and easy to run.

Well, if we take a look at some real-world examples, we see quite a difference. Here we have Cassandra, Groovy, and NGINX, a mix of common workloads that we see customers running today on AWS. When we compare these real-world workloads to the traditional benchmarks, we can see that they are bottlenecked on an entirely separate set of factors. In particular, these workloads are impacted by things the micro-benchmarks do not care about: the branch predictor is missing more, there are lots of instruction misses out of the L1 and L2 caches, and there are TLB misses. And unlike the benchmarks, the front end is causing the stalling. All of this shows up as front-end stalls being high here, not the back-end stalls we saw in the micro-benchmark. So, quite a different signature. When we design our processors, we want to analyze real-world workloads.

So here is an example of MySQL running on Graviton3 and Graviton4, and you can see that we see lower sensitivities on Graviton4 across all dimensions. Remember that smaller is better. This results in improved performance on our MySQL workload, allowing Graviton4 to do more work for the same number of CPUs. This is why we can say that customers see a 40% performance improvement over Graviton3 when running MySQL on Graviton4. Just an incredible improvement, generation over generation.

When you see us state a benchmark figure like "Graviton4 is 30% faster than Graviton3," it is an average across a number of real-world benchmarks, not a micro-benchmark that we've run to get the optimal result. That is why, when customers adopt Graviton, they generally see the performance benefits that we've stated. Graviton4 has 50% more cores and twice the L2 cache compared to Graviton3, and it will enable instances that scale up to three times more CPUs and three times more memory than what was previously available. So you can get larger instances that are able to support your highest-performance databases, in-memory caches, and big data analytics workloads.

We've also focused on improving security. We have optimized performance for your real-world workloads, and Graviton4 raises the bar on security by validating, monitoring, and protecting every stage of the host boot process. We fully encrypt all high-speed physical hardware interfaces, including the DRAM, the Nitro cards, and the coherent links between the CPUs.

Today, we also announced the availability of our Amazon EC2 R8g instance, the first EC2 instance to be available with the new Graviton 4 processor, which is available in preview today. R8g instances deliver the best price performance and energy efficiency of any EC2 instance that we have ever built. They provide up to 30% more compute performance and use the fastest DDR5 memory that we've ever deployed within AWS. They offer unparalleled performance and efficiency for your most demanding memory sensitive workloads including things like big data analytics and high performance databases. And I encourage you to sign up for the preview to test out Graviton 4 for your workloads as soon as possible.

Now, it would be impossible to have a conversation about compute innovation today and not talk about machine learning and generative AI. And it would be impossible to reflect on ML innovation without paying tribute to Alan Turing. He was a visionary mathematician and the father of theoretical computer science, and his pioneering work laid the foundation for the algorithms that form the bedrock of machine learning.

His imagination stretched beyond the merely mechanical towards the realm of what machines could one day learn and achieve. Turing's question "Can machines think?" set us on a quest that has driven decades of innovation, leading to the almost unbelievable capabilities we are seeing in generative AI today.

Within Amazon and AWS, we've been working with machine learning in production for more than 20 years, from some of the earliest days of our retail website. AWS offers the most performant ML infrastructure available in the cloud for training and inference of ML workloads. In fact, every year over 100,000 customers build, train, and deploy ML workloads and applications using EC2 infrastructure, and the growth in this area has just been amazing.

If you go back to 2012, the common models at the time maybe ran on two GPUs and used 60 million parameters, with very little network bandwidth between them. Today, customers are scaling ML training workloads to more than 10,000 GPUs and up to 500 billion parameters. That's just incredible growth in that period of time, and it has meant that we have continued to innovate in both the instances that are available and the network that needs to support these workloads.

Now, we've had a long history of innovation with Nvidia. In fact, we were the first cloud provider to bring an Nvidia GPU to the cloud. And earlier this summer, we launched the P5 instance powered by Nvidia's latest H100 Tensor Core GPUs. And these instances have been designed for training and deploying complex large language models and diffusion models, powering the most demanding generative AI workloads.

The P5 instances accelerate your time to solution by up to 4x compared to what was available in the previous generation. And they reduce the cost to train models by up to 40%.

I also want to take a look at one of the things we announced today with Adam in his keynote: a new collaboration together with NVIDIA. As I said, we have been incredible partners with NVIDIA, and today I am incredibly excited to announce some new developments.

Not only did we announce new NVIDIA-based instance types that will be coming soon, including the availability of the L4 GPU in our G6 instance, the L40S GPU in our G6e instance, which is specifically designed for workloads like Omniverse, and the new H200 GPU in what will be our P5e instance.

We also announced that NVIDIA and AWS will be working together to build the world's fastest AI supercomputer. This will be a supercomputer with up to 16,000 GH200 Grace Hopper Superchips, bringing together NVIDIA's ARM-based Grace processor and its Hopper GPU, and providing up to 65 exaFLOPS of AI capability. Just phenomenal.

We also announced that NVIDIA's DGX Cloud will now be hosted on AWS using this incredible AI supercomputer. And we announced that EFA, our Elastic Fabric Adapter, which we developed to provide high-bandwidth, low-latency networking between GPU instances, will support this AI supercomputer; we will be working with NVIDIA to deeply integrate both EFA and Nitro into the system. You can expect all of this to be available on EC2 in 2024. Just an incredible partnership of two companies that are innovating in incredible ways on behalf of our collective customers.

As we did in the CPU space, we have also been innovating in the machine learning space with custom silicon. We began developing chips for machine learning more than 5 years ago when we started to see the growth in machine learning.

We put out our very first chip, AWS Inferentia, focused on inference, which was the largest workload at the time. A major milestone for us was the launch of Trainium-based instances last year at re:Invent, which reduced the cost of training models by up to 50%.

As the number of parameters in models has continued to increase, we followed by launching a new variant of Trainium instances with up to 1,600 gigabits per second of networking. And we now have clusters of up to 64,000 Trainium accelerators, all interconnected with our Elastic Fabric Adapter, which provides non-blocking, petabit-scale networking.

We did not stop there: just today, we announced the preview of our AWS Trainium2 processor. It's our second-generation chip purpose-built for generative AI and ML training. These instances will be up to 4x faster than the previous generation, offering you the ability to get an eye-watering 20 exaFLOPS of supercomputing performance.

Trainium-based instances are optimized for training large-scale generative AI models with hundreds of billions or even trillions of parameters, and Trainium delivers a seamless experience, working with standard frameworks such as PyTorch, TensorFlow, and JAX.

However, once these foundation models are deployed at scale, more of the cost will be associated with running the models and doing inference. That's why we launched our Inferentia-based instances. These are available today, powered by the AWS Inferentia2 chip, and they are optimized specifically for large-scale generative AI applications with models containing hundreds of billions of parameters.

Inferentia2 offers up to 4x higher throughput and up to 10x lower latency than Inferentia1, and that equates to 40% better price performance than comparable EC2 instances.

Earlier this year, we announced a strategic collaboration with Anthropic to advance generative AI, and you saw Anthropic's CEO on stage with Adam earlier today. Anthropic will use Trainium and Inferentia chips to build, train, and deploy its future generation models, benefiting from the price performance, scale, and security that AWS provides, and AWS will become Anthropic's primary cloud provider. It's been really great to work with the Anthropic team. They've done a lot of model training on AWS so far, and we're excited about what this collaboration means for you, our customers, in not only bringing you models but also improving our custom silicon for machine learning.

And now I'd like to invite a customer to share how they've been innovating using generative AI. This is one of my favorite products, and when I'm not running the compute and networking business at AWS, I'm often spending time in things like Photoshop, Premiere, and Illustrator. And so it's fantastic to welcome the CIO of Adobe, Cynthia Stoddard.

Cynthia: Thank you, Dave. It's great to be here and good afternoon to everyone. I'm Cynthia Stoddard and I have a wonderful job. I'm the CIO at Adobe and I'm excited to be here today and share about Adobe's journey in our collaboration with AWS over the years.

So, a little bit of history here. Adobe's mission is to change the world through personalized digital experiences. We are empowering everyone, everywhere to imagine, create, and bring any digital experience to life. Today, our mission is more relevant than ever as we experience tectonic shifts in technology and the explosion of content across the board.

We're fortunate to have three successful businesses: Creative Cloud, Document Cloud, and our Experience Cloud. They are in the sweet spot of where the world needs technology to play a role. Our product strategy is to unleash creativity for all, accelerate document productivity, and empower digital businesses.

Adobe's product portfolio is the foundation of digital experiences, starting with the first creative spark, through the creation and development of all content and media, to the personalized delivery across every type of surface that you have.

Last year, Adobe celebrated its 40th anniversary. Our simple yet enduring founding principles still guide us today. Innovation is at the core: we believe that innovation can come from anywhere in the company. Adobe's success story over the past four decades comes from our ability to constantly transform our businesses, take bold bets, and look around the corner.

For example, we invented desktop publishing with PostScript. We revolutionized art and imaging with Photoshop, pioneered electronic documents with PDF and Acrobat, and created digital marketing with our acquisition of Omniture. The list goes on and on of what we've done over the past 40 years.

A seminal moment in this journey came more than a decade ago, when we completely transformed our business model. Back then, we had long release cycles, 18 to 24 months, so people had to wait for any new features in our creative products. But then we moved from boxed software to the cloud and to a subscription model, and everything changed at that point.

This was possible because of great partners like AWS, who were critical in our transformation. The move to the cloud led to explosive growth of our business, a faster pace of innovation for our engineering teams, and much better experiences for our customers. This gave us the power to get features into the hands of our customers faster than ever, instead of needing to wait for the next release cycle. Now, that was over 10 years ago, and it laid the foundation for the Adobe that you know today: a company that is ever evolving and at the forefront of technological innovation.

Then in 2016, Adobe built an intelligent platform called Adobe Sensei, leveraging Amazon EC2 instances and using the cloud as a platform. Sensei was Adobe's next crucial transformative innovation opportunity. We started by infusing smaller machine learning models into our workflows, and Adobe Sensei became the foundation of Adobe's machine learning features across our entire product line.

For example, a few years ago we introduced Neural Filters in Photoshop, a Sensei-powered feature that revolutionized photo editing workflows. In this photo, for example, with Neural Filters we were able to take this old image and intelligently restore and colorize it with just a few clicks. This is the power of Adobe Sensei.

As the use of the platform grew, the demand for larger models trained on our datasets grew also. At each step, AWS met the demand, first with P3 and then P4 instances, improving network and storage performance in step with the compute. There's the photo; isn't that amazing?

The scale and pace of innovation began to accelerate, and earlier this year we began a new phase in our decade of leadership in AI. Last spring, we introduced Adobe Firefly, a family of creative generative AI models. We see generative AI as a transformational technology that presents Adobe with an incredible opportunity to innovate for years to come, reimagining how digital content and experiences are created and delivered.

Built on advanced machine learning models, Firefly is trained on licensed content, such as Adobe Stock, and public domain content where the copyright has expired, and it is designed for safe commercial use. Firefly allows anyone to use their own inputs to creatively generate everything from beautiful images, video, and 3D to marketing copy, conversations, and personalized experiences.

In just months, what began as a text-to-image generator has become embedded throughout our product portfolio as a creative copilot. Over 4 billion images have been generated with Firefly to date. Firefly is accelerating both in how and where it's used and in the quality of the images it creates; you can see here just how much and how quickly our models are improving.

We also recently introduced additional Firefly models into our products: the world's first generative AI model focused on vector graphics, bringing generative AI expertise directly into Adobe Illustrator workflows, and an industry-first model for template design in Adobe Express that generates fully editable templates in seconds from just a text description.

Firefly also extends into the enterprise. It is embedded in our enterprise offerings in Adobe GenStudio, an end-to-end solution that brings our products, with Firefly at the core, to help brands meet their needs for content, optimize the content supply chain, and drive business growth through scaled personalization efforts.

Adobe GenStudio leverages Firefly to enable businesses to generate content that is designed for safe commercial use. Customers can customize and fine-tune Firefly using their own brand style, characters, and objects to generate on-brand content. GenStudio also provides access to Firefly APIs across various platforms to supercharge workflows and automation.

And we're just getting started. Adobe is going to have the best, most complete, and most natively integrated creative models in the world, and we have a lot planned for the months ahead.

Adobe has always innovated and has always led the way for creators, but as you can see, the creative pace is accelerating. Looking ahead, the transformation continues: as you know, AI is a disruptive force rippling through the industry, and Adobe is making significant investments to help bring AI to our customers. Our partnership with AWS is helping us innovate at a rapid pace.

The AWS account team for Adobe and Adobe engineering have been great partners, providing a high level of service and support as we grow Adobe's business with AWS. We are excited to continue our work together to create personalized digital experiences.

Thank you.

Dave: Oh, thank you so much, Cynthia. It's just been an incredible journey to not only watch Adobe innovate but to be able to innovate with them.

Now, let's get back to GPUs. Many of you have recently spoken to us about some of the challenges you've had in accessing the GPUs that you're looking for and need. And so when we looked at our EC2 GPU offerings, we saw an opportunity to innovate, to make it a whole lot easier to get predictable capacity on a dedicated network with low-latency, high-bandwidth networking for your machine learning workloads.

A couple of weeks ago, we announced the availability of Amazon EC2 Capacity Blocks for Machine Learning. This is a first-of-its-kind cloud usage model that further democratizes machine learning by making it easier for you to access GPU compute capacity. It eliminates the need to hold on to GPU instances that are not currently used, reduces your costs, and makes development cycles more predictable.

Customers can now reserve our latest NVIDIA-powered P5 instances by specifying the cluster size, a future start date, and a duration, very similar to reserving a hotel room. EC2 Capacity Blocks does the rest, ensuring that you get uninterrupted access to the GPU compute capacity required for your machine learning projects when you need it. We are excited to see how Capacity Blocks works for you.
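Here is a minimal sketch of that flow with boto3, assuming a recent SDK version that includes the Capacity Blocks APIs; the instance count, duration, and date window are illustrative values.

```python
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Search for a block of four p5.48xlarge instances for 48 hours,
# starting sometime in the next two weeks.
now = datetime.now(timezone.utc)
offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5.48xlarge",
    InstanceCount=4,
    CapacityDurationHours=48,
    StartDateRange=now,
    EndDateRange=now + timedelta(days=14),
)

# Purchase the first matching offering, much like booking a room
# for specific dates.
offering = offerings["CapacityBlockOfferings"][0]
ec2.purchase_capacity_block(
    CapacityBlockOfferingId=offering["CapacityBlockOfferingId"],
    InstancePlatform="Linux/UNIX",
)
```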

Now, we have also been looking at how we can use ML to speed up development and prototyping without compromising on production best practices such as infrastructure as code. I'm happy to announce AWS Console-to-Code, which is deeply integrated into our EC2 console and allows you to move from prototyping to production code a whole lot faster.

Console-to-Code uses state-of-the-art large language models and deep learning algorithms to generate infrastructure as code in formats such as CloudFormation, CDK, SDK code, Terraform, and others. These executable and repeatable code snippets follow AWS best practices and help improve deployment success rates to create reliable production workloads.

Let's take a look at how it works. You can now record actions within the EC2 console and then simply select Console-to-Code. Next, you can ask it to generate that code in a format of your choice. Once you click, the console will generate the code to repeat the actions that you recorded, code that you can use to automate your infrastructure. This significantly simplifies going from an idea to actually having something ready to put into CloudFormation or another format.
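The generated code is ordinary SDK or infrastructure-as-code text. As a rough illustration, not actual Console-to-Code output, a recorded "launch instance" action could come back as a Python SDK snippet along these lines, with every identifier below a placeholder:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Re-run the recorded launch action. All IDs and values here are
# placeholders standing in for whatever was selected in the console.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroupIds=["sg-0123456789abcdef0"],
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "console-to-code-demo"}],
    }],
)
print(response["Instances"][0]["InstanceId"])
```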

As I wrap up my time with you, I'd like to share some more exciting news. Today, we are announcing a new compute knowledge digital badge and an aligned learning path via AWS Skill Builder. I will be taking the test later today. Complete a short test, and you can earn a digital badge that you can share on your resume, on your LinkedIn profile, with your employer, or on social media.

We stand on the shoulders of giants in providing the most secure, reliable, and scalable cloud that enables your innovation. None of this would have been possible without the innovators who went before us and paved the way for where we are today. We are very excited to play a small part with the innovators of today, companies such as Moderna, Boom, and John Deere, that are changing industries and the way we live through the innovation we've done together with them.

And so as we build these tools and services, we are excited to innovate on your behalf. And the question now remains, what will you build?

Thank you.
