AWS Graviton deep dive: The best price performance for your AWS workloads

Hello and welcome to CMP327, our breakout session where you can learn more about how Graviton enables the best price performance for your workloads.

Introducing myself: I am Sudhir Raman. I lead Core Compute Product Management at Amazon EC2, and co-presenting with me today is Oran Barak, Head of Core Compute Engineering at Stripe.

So, really excited to be back in person presenting at re:Invent, and we have an exciting agenda lined up for you today. We're going to dive deeper into our Graviton processors, talk about key workloads and performance, and discuss best practices for transitioning your workloads to Graviton and lowering your costs. Oran here is also going to walk us through Stripe's Graviton adoption journey and key takeaways and learnings.

So with that, let's get started.

AWS has been investing in building custom chips for several years. This has included the custom chips powering our Nitro cards, part of the Nitro system that's allowed us to offload storage and networking from the main processor, maximize resource efficiency, and enhance security. These custom chips and solutions have also included storage solutions like Nitro SSDs, machine learning chips that accelerate machine learning performance such as Inferentia and Trainium, and also the modern and efficient Graviton chips that power our servers.

And there are many reasons why we have invested in building custom chips. Specialization and efficiency: we are able to optimize our hardware for our use cases at AWS, focusing on the right feature set and optimizing for cost and power. Speed: going all the way from product definition to building the parts to landing servers in our data centers, the entire hardware and software stack is owned and operated by a single team under one roof. Innovation: we are able to create more value for our customers by innovating across all layers of the stack, be it the Nitro hypervisor, our virtualization stack, or the servers and the hardware that power them, and we are able to do that holistically as opposed to optimizing each of these components in a silo. And finally, security: Nitro provides us with a mechanism to enhance the security of our servers through a hardware-based root of trust.

So looking at the Graviton journey: the AWS Graviton journey started back in 2018, when the first iteration of the Graviton chip powered the EC2 A1 instances that introduced Arm as an architecture for the first time in the cloud. That really proved out that applications could run in the cloud at scale on Arm-based servers. Since then, with each iteration of the Graviton chip, we've continued to push performance and efficiency at a large scale. For example, Graviton2, when it was first announced in 2019, delivered 2x the performance per core of the first iteration of Graviton, and it also delivered 4x the number of cores versus that first Graviton chip. Continuing on that journey, Graviton3 has delivered another step-function improvement in performance over the Graviton2-based chips.

Now, let's take a closer look at both Graviton2 and Graviton3, starting with Graviton2.

So Graviton2-based EC2 instance types were launched delivering up to 40% better price performance compared to other instance types within EC2, and these are targeted at a broad spectrum of workloads: anything from web serving, load balancing, and gaming all the way through databases, in-memory caches, and big data analytics. Today, we have 12 different instance types powered by Graviton2, which gives our customers wide selection and choice for a variety of workloads, be it compute optimized, general purpose, or whether you need more storage, memory, networking, or even GPU-based options.

Now, what we've seen in the marketplace is a lot of momentum with our customers adopting Graviton to lower their overall costs. An example here is WISL, where they observed a 50% reduction in run time using Graviton2-based storage instances compared to their existing I3 and I3en fleets, and that's led to a reduction in their overall storage costs. Continuing with the customer momentum: today, 48 of the top 50 EC2 customers by usage are running on Graviton2-based instances to lower their overall costs. These include customers of all sizes, from the latest startups all the way through very well-established enterprises, across geos and across multiple industry verticals.

Another key theme that's resonating with the customer base is the energy efficiency of Graviton processors, which contributes towards customers' sustainability goals. An example here is NEC and NTT Docomo, who saw a 72% reduction in power consumption running their 5G core software on Graviton processors.

It's not just external customers. We are using Graviton-based servers internally within Amazon for many of our mission-critical workloads. For example, at both the Prime Day 2021 and 2022 events, a number of core retail services relied on Graviton servers to realize both infrastructure scaling and cost efficiencies.

Now, while Graviton2 has already delivered many benefits, we see customers continuing to bring more workloads to the cloud as they transform their organizations and fuel new opportunities, and this has required us to deliver more performance with each generation. That's where Graviton3 comes into the picture.

So with Graviton3, we have the C7g instance, which is the first instance within EC2 to be powered by Graviton3 processors, and C7g delivers the best price performance for compute-optimized workloads within EC2. A little more detail on Graviton3 and C7g: C7g delivers up to 25% better compute performance, 2x higher floating-point performance, as well as 3x better machine learning performance over Graviton2. It's also a first in the cloud: our Graviton3 chips are the first in the cloud to enable DDR5 memory, and that really helps workloads, which get access to 50% more memory bandwidth versus the DDR4 memory that we use in Graviton2.

Along with that, we also get the benefits of energy efficiency. You'll find that Graviton instances are up to 60% more energy efficient compared to other comparable instances in our EC2 data centers.

Now, let's double-click and go a little deeper into the Graviton3 processor, talk about the chip architecture, and discuss some workload performance results.

Graviton3, as the name suggests, is the third generation of Graviton, and it's the most powerful Graviton chip that we've built so far. It has over 50 billion transistors, up from the 30 billion transistors we used in Graviton2. As you can see in the picture, it also has a chiplet-based architecture with seven silicon dies, and I'll talk more about what each of those represents in just a second. And like I said, it also represents the first time we've introduced DDR5 in our data centers, which brings more memory bandwidth.

Here is a closer look at how the Graviton3 architecture compares to Graviton2. A lot of the performance improvements delivered by Graviton3 come through what we call IPC, or instructions per cycle. As you can see here, the additional transistors going from Graviton2 to Graviton3 have really gone into making a much bigger, more powerful core.

Graviton3 has a 2x wider front end. It has a bigger branch predictor. It also has 2x wider issue and a larger instruction window. Other enhancements include twice the SIMD performance compared to Graviton2, including support for Scalable Vector Extension, or SVE, as well as support for the bfloat16 instructions. Graviton3 also has 2x the memory operations with enhanced prefetching, and twice the number of ALUs and multipliers, with wider multipliers that help with cryptographic operations such as TLS session negotiation.

Graviton3 also supports pointer authentication, which is a technology that can help prevent attacks such as return-oriented programming. Another of the unique aspects of Graviton is that in the EC2 instances powered by Graviton, every vCPU is a full core: there is no simultaneous multithreading or hyperthreading in our Graviton processors. Let's look at some of the implications and advantages of that.

So if you look at a typical x86 instance where every vCPU is a thread, you can sometimes see contention in the execution stream, between the cores or in the caches, among the threads. This is sometimes good if you're trying to fill holes in your execution, but other times it may end up delaying execution. With Graviton and the C7g instance, every vCPU is a full physical core, and that really helps: there is no contention for resources across cores or caches.
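To make the difference concrete, here's a minimal sketch of checking whether a host exposes one hardware thread per core, which is what you'd see on a Graviton instance. The sample `lscpu` text below is illustrative, not captured from a real machine.

```python
# Parse the "Thread(s) per core" field from lscpu-style output.
def threads_per_core(lscpu_output: str) -> int:
    for line in lscpu_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Thread(s) per core":
            return int(value.strip())
    raise ValueError("Thread(s) per core not found")

# Illustrative sample output (field names match real lscpu).
sample = """CPU(s):              64
Thread(s) per core:  1
Core(s) per socket:  64"""

# One thread per core means no SMT sharing between vCPUs.
assert threads_per_core(sample) == 1
```

On an SMT-enabled x86 host you would typically see `Thread(s) per core: 2` instead.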

Another key part of the enhancements we've driven in Graviton are based on customer feedback. With Graviton2, we heard a lot of customers say that they were really happy they didn't have to worry about NUMA domains, that is, non-uniform memory access. They see a consistent path from core to memory, and don't have to worry about the concept of near memory versus far memory: should the application be aware, or should we leave it to the OS, which has imperfect information? Customers didn't have to worry about any of that, and that feedback really resonated with us and has led us to keep all the cores together on Graviton3 as well. All of these cores are connected through a mesh architecture, and the mesh supports up to 2 terabytes per second of bisection bandwidth across the cores.

Next, we populated the caches across these cores. If you look at the Graviton3 chip, you'll find that there is more than 100 megabytes of user-accessible cache on the chip. And you can map the picture I'm showing in the middle to the diagram at the top right, which shows what each die represents.

What you see right in the middle is all the cores and their connectivity, and then come the memory modules: DDR5 memory controllers running at 4800 MT/s, which is 50% more bandwidth compared to DDR4 on Graviton2. And similar to Graviton2, the memory has always-on memory encryption. Down at the bottom you see PCIe Gen 5 support for I/O connectivity.

Another enhancement that has gone into both Graviton2 and Graviton3 is direct interrupt injection. Traditionally, interrupts go from our I/O cards, the Nitro cards, through the hypervisor to the customer VM. With Graviton2 and 3, interrupts are directly injected into the guest. This results in lower latency, and since the Nitro hypervisor is not involved, it also results in higher throughput.

Now, let's talk about some workload performance results. Right at the outset, I'll say that there is no substitute for running your own workloads and doing your own benchmarking, because performance can obviously vary by workload. But this is meant to give you a flavor of what to expect on some typical workloads.

So first up, we have Spark SQL. What we've benchmarked is a Graviton2-based C6g instance, that's the yellow bar, versus a Graviton3-based C7g instance, which is the green bar. We benchmarked an eight-node cluster on a one-terabyte data set using Spark 3.3, with Corretto 17 for the JDK, and we're using the 4xlarge instance size for an apples-to-apples comparison. We observed that the Graviton3-based C7g delivers 28% higher performance versus the C6g.

Another area of growing importance is video processing, given the amount of video going over the internet; video encoding and video processing tend to be key cloud-based use cases. So AWS engineers as well as the open-source community have been collaborating to improve the performance of video processing on Graviton processors and have been driving a number of optimizations into the community.

So what we see here is a comparison of FFmpeg 4.2 versus the latest branch of FFmpeg that has many of these optimizations. Net net, what you see here is not just that C7g performs better than C6g, which is expected, but that the new optimizations we are driving through the ecosystem have improved the performance of video encoding on Graviton processors significantly, by more than 60%.

The next area here is machine learning performance. Graviton3 delivers a leap in machine learning performance as well, with 2x the vector width, 1.5x more memory bandwidth, and support for bfloat16. Here, AWS teams and many external teams have collaborated to drive a large number of improvements in TensorFlow, in PyTorch, in oneDNN, as well as in the Arm Compute Library. All of that has resulted in massive gains for CPU-based machine learning inference across many PyTorch and TensorFlow models, which is depicted in the chart on the right. Here is some feedback from customers who have adopted Graviton3, and there are many more examples published on our website. The general theme you'll see across the board is an improvement in performance anywhere between 25% and 45%, depending on the workload type.

A lot of customers have also noted the lower latency and more consistent overall performance, even when the system is fully loaded.

That brings us to some of our newest innovations on Graviton and the new EC2 instance types that we announced at this re:Invent just a couple of days ago.

First up is the C7gn instance. This is our newest network-optimized instance, delivering the best price performance for network-intensive workloads on EC2. A couple of key things to note here on C7gn: it delivers the highest network bandwidth, up to 200 gigabits per second, as well as the best packet-processing performance across network-optimized instances. These are also powered by our latest Nitro cards, the Nitro v5, the latest generation, which helps deliver the improved networking performance. And not only that, the new Nitro cards are also extremely energy efficient: they deliver 40% higher performance per watt compared to our previous-generation Nitro cards.

C7gn instances are now available in preview. There's a sign-up page, so you can sign up if you're interested and the AWS team will be able to give you access.

The second Graviton-based instance that was announced is the Hpc7g instance. This is optimized for tightly coupled HPC workloads that are compute intensive, and it is powered by the new Graviton3E processors. Graviton3E delivers up to 35% higher vector performance compared to Graviton3, so it's really optimized for HPC use cases such as weather forecasting, molecular dynamics, and computational fluid dynamics. Customers can also deploy these using AWS ParallelCluster, which is an open-source cluster management tool. These instances are coming soon, and we'll be able to share more details going into 2023.

Now, in our conversations with customers, a typical theme that comes up is: how mature is the Arm software ecosystem, and how easy is it for me to run my applications on Graviton? So I want to talk a little bit about both of those questions.

Starting with software and services, what we have seen is a lot of momentum over the last few years in the overall Arm software ecosystem support.

Starting with the operating systems: all the major commercial operating systems are fully supported on Graviton, which includes Amazon Linux, Ubuntu, Red Hat Enterprise Linux, as well as SUSE Linux Enterprise Server. There's also broad support from the community across the various Linux offerings, and all of these are delivered as AMIs that work just like any other AMI you would deploy on an instance. It's a similar story in the container space.

There's broad support for containers across the Graviton ecosystem, with a lot of customers deploying container workloads on Graviton. This includes Docker and Kubernetes, and it also includes our AWS managed services such as ECS and EKS.

We also find support in the container registries, which offer multi-architecture images that include both x86 and arm64, be it ECR, Docker Hub, or other popular registries. Some of the newer container technologies, like Firecracker microVMs and the Bottlerocket OS, are fully supported on Graviton as well.

Here are some more examples of other companies whose software is supported on Graviton. This is an ever-growing list, and it's meant to be a sample of some of the popular tools and software that customers use.

Starting with databases: databases are a first-class citizen on Graviton, and the majority of open-source databases as well as many commercial databases include support for Graviton today. You'll also find that for your logging, monitoring, and security software, as well as your CI/CD tools, there are many options available that are fully supported.

To grow the ecosystem, AWS has also invested in a program called Graviton Ready. This consists of certified partner solutions, where software vendors have optimized and fully validated their solutions on Graviton, which makes it easy for customers to deploy them on Graviton instance types.

This list has been growing rapidly, and we continue to document Graviton Ready partners on our website, where you can find a much more comprehensive list.

Speaking of databases, another exciting announcement we had at re:Invent 2021 is that SAP and AWS jointly announced a partnership to support SAP HANA Cloud on Graviton. SAP HANA Cloud, as many of you may know, is a fully managed in-memory cloud database as a service. Since then, there's been collaboration between AWS and SAP across multiple areas, including getting the Graviton infrastructure ready for build pipelines, as well as preparing SAP HANA Cloud's containerized microservices and the SAP HANA database container to run on Graviton.

SAP HANA Cloud is based on SAP's Kubernetes project called Gardener, which will make Graviton-based EC2 instances available within the Kubernetes worker pools.

While you can use Graviton directly on EC2, we've also extended the price-performance benefits of Graviton to many of the popular AWS managed services. This includes services across databases, analytics, and compute, as well as the newest addition for machine learning, the Amazon SageMaker service. Managed services typically represent a very low-friction path for customers to run on Graviton, because in many cases it's simply an instance-type switch with little to no code changes.

Another recent addition is AWS Nitro Enclaves support on Graviton. Nitro Enclaves, as you know, allow you to create isolated compute environments to securely process highly sensitive data, and these are supported on both Graviton2 and Graviton3 based instance types.

Transitioning to Graviton. So let's talk through some of the best practices you can use to run your workloads on Graviton instances.

Firstly, in terms of workload targets: typically, Linux-based workloads, both open source and commercial, are good targets given the growing software ecosystem. The general rule we give is that the more current your software, the better, and that's generally true even for your existing deployments outside of Graviton. There are multiple AWS tools and SDKs that can help with this transition, and many familiar tools such as Auto Scaling groups fully support multiple architectures, so you can actually run mixed x86 and arm64 clusters.

I want to point out the URL over here, which is our Graviton Getting Started technical guide on GitHub. It's a very useful resource that I would encourage you to bookmark, since it documents best practices across various languages and applications that will allow you to get the best performance out of the system, along with the right tuning optimizations that are available.

In addition, we also have a Graviton Fast Start program that provides step-by-step guidance on both self-managed workloads as well as managed services for lowering your cost with Graviton.

And for those of you who haven't tried Graviton yet, we have an ongoing free trial, so you can try Graviton for free with the T4g instance free trial.

I also want to talk through some specific popular workloads and applications that customers are running, and point out some things to look out for as you start considering moving your workloads.

Firstly, containers. These are super popular, and we have a lot of customer success stories of moving containerized microservices to Graviton. A couple of things to look out for here: first and foremost, container images are architecture specific, so you will need arm64 images for your container software.

The good news here is that the container registries support multi-architecture images. What this means is that, because they do it with manifest lists, in a seamless manner the right image, based on the host type and architecture, is deployed automatically when you're using a multi-architecture environment.
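As a rough illustration of that manifest-list mechanism, here's a sketch in Python. The platform keys and digests are made up, but the lookup mirrors how a container runtime resolves a multi-architecture tag to the image entry matching the host:

```python
# A manifest list maps (os, architecture) to an image digest.
# Digests here are fabricated placeholders, not real images.
MANIFEST_LIST = {
    ("linux", "amd64"): "sha256:1111",
    ("linux", "arm64"): "sha256:2222",
}

def resolve_image(os_name: str, arch: str) -> str:
    """Pick the digest matching the host platform, like a runtime does."""
    key = (os_name, arch)
    if key not in MANIFEST_LIST:
        raise RuntimeError(f"no image for {os_name}/{arch}")
    return MANIFEST_LIST[key]

# A Graviton host (linux/arm64) automatically gets the arm64 image.
assert resolve_image("linux", "arm64") == "sha256:2222"
```

The practical upshot is that a single tag works across a mixed fleet: the host's platform decides which digest is pulled.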

There's also a lot of popular container software in the ecosystem that's already available in the registries with arm64 image versions. We have a full list in that Getting Started technical guide, but you can see some examples listed on the slide. And in cases where you don't have an image, it's also easy to build an arm64 image yourself through what's supported in the ecosystem, using things like Docker Engine, Docker Desktop, or Docker buildx, which allows you to build an x86 and an arm64 image at the same time.

Java-based applications also tend to be really popular, and they generally perform well out of the box on arm64; since Java runs on the JVM, there's no need to recompile your application. JDK binaries for Graviton are available from multiple sources, but if you have a choice, we would recommend using Amazon Corretto, since that provides you with a path to some of the newest optimizations AWS is driving in Java performance. Java 8 and newer is supported on arm64, but we've found that many customers use Java 11 and newer to get the best performance out of the system.

Just one caveat to look out for if you're running Java applications: shared objects, which are architecture specific. You might have code written in lower-level languages and compiled for a specific architecture. The best way to identify those is to unzip your JAR file, look for the ELF binaries, and make sure that for every x86 version there is a corresponding arm64 version.
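That JAR check can be sketched in Python. The `x86_64`/`aarch64` path layout used here is just one common JNI packaging convention, so treat this as an illustration rather than a universal rule:

```python
import io
import zipfile

def missing_arm64_libs(jar_bytes: bytes) -> list[str]:
    """Flag any x86-64 .so in the jar that lacks an aarch64 counterpart."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        names = set(jar.namelist())
    missing = []
    for name in names:
        if name.endswith(".so") and "x86_64" in name:
            if name.replace("x86_64", "aarch64") not in names:
                missing.append(name)
    return missing

# Build a toy jar in memory: libfoo is ported, libbar is not.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("native/x86_64/libfoo.so", b"")
    jar.writestr("native/aarch64/libfoo.so", b"")
    jar.writestr("native/x86_64/libbar.so", b"")

assert missing_arm64_libs(buf.getvalue()) == ["native/x86_64/libbar.so"]
```

Libraries flagged this way usually just need an upgrade, since most popular JNI-bundling projects now ship arm64 natives.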

If you're running C or C++, those applications would need to be recompiled, so using a recent compiler is preferred. And if you have any dependencies on assembly intrinsics, or you rely on AVX instructions, those would need to be ported. There are options there, like sse2neon or SIMDe, that can help with the porting.

Another point to note here is Large System Extensions, or LSE. Both Graviton2 and Graviton3 support LSE, which delivers low-cost atomic operations that improve system throughput for things like locks and mutexes. An optimized libc with LSE support is available through the popular operating systems.

Finally, a word on Python. If you're using Python, typically you would use pip install to get all your Python packages; make sure you have a new enough version of pip available. On the AWS side, we are also actively working to make several pre-compiled packages available for Graviton.

There are more than 200 packages done already, and you can track the latest status of those builds on our Graviton GitHub page. A bunch of them are listed on this slide, and there are many more. One caveat I would point out: while there are many pre-compiled packages available, if you run into a situation where a package is not pre-compiled for Arm, pip will attempt to build it from source, and that may result in a slightly longer installation time.
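As a rough sketch of why pip sometimes falls back to building from source, here's a simplified check of a wheel filename's platform tag. Real pip also matches Python version and ABI tags, so this is illustrative only:

```python
def runs_on_graviton(wheel_filename: str) -> bool:
    """Simplified check: does this wheel's platform tag cover aarch64 Linux?"""
    platform_tag = wheel_filename.removesuffix(".whl").split("-")[-1]
    # Pure-Python wheels are tagged "any" and run on every architecture.
    return "aarch64" in platform_tag or platform_tag == "any"

# An aarch64 manylinux wheel installs directly on Graviton.
assert runs_on_graviton("numpy-1.23.4-cp310-cp310-manylinux2014_aarch64.whl")
# So does a pure-Python wheel.
assert runs_on_graviton("requests-2.28.1-py3-none-any.whl")
# An x86-only wheel does not; pip would build that package from source.
assert not runs_on_graviton("numpy-1.23.4-cp310-cp310-manylinux2014_x86_64.whl")
```

When no compatible wheel exists, pip downloads the sdist and compiles it on the instance, which is where the longer install times come from.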

And for guidance on many other applications such as Go, PHP, Ruby, TensorFlow, .NET, and more, again I'll point you to that URL; bookmark it, and you'll find much more detail there for each of these.

So with that, I will turn it over to Oran to walk us through Stripe's Graviton adoption journey and key learnings.

Hey folks. My name is Oran. I'm the Head of Core Compute at Stripe, and I'm here to talk a little bit about what we've done with Graviton in the last six months.

First of all, what is Stripe? Many people know us, but for those who don't: we're a fintech company, and we're still a startup. What we do is build payment infrastructure for the internet. We have millions of companies as our customers today, everywhere from very large enterprises to small startups, and basically they use us to accept payments, grow their revenue, and accelerate their business opportunities.

We currently have about 8,000 employees around the world in 23 countries, and 30% of our workforce is remote. So when you use Stripe, what does that actually look like? In many cases, we're integrated with partners; here you can see Apple Pay, and in certain countries we use Amazon Pay, Google Pay, Apple Pay, and others. In other cases, you would see our logo directly. We're currently available to businesses in more than 50 countries, and those businesses can accept payments from their customers in 200 countries and territories around the world. And as I said, this ranges everywhere from giant platforms that produce websites to solopreneurs and tiny businesses.

Let's talk a little bit about our growth in the last two years. Currently, we're processing more than 500 million API calls per day, and in 2021 we processed in excess of $640 billion across the world. That was 60% growth year over year, much of it fueled by COVID. We are experiencing tremendous growth in our infrastructure: as you can see from this graph, across the last two years the number of virtual machines in our infrastructure has grown 5x.

We're one of the few companies in the financial industry that actually publishes their API availability publicly. We guarantee five nines of SLA, and that's something we're very proud of. In many cases, our dependencies don't provide that kind of SLA, so we've built a lot of systems and redundancies to give our customers a great experience even in edge cases. We've been collaborating with AWS for more than 10 years.

Stripe was actually born in the cloud, and all of our infrastructure is currently within AWS. Obviously, we're using all of the basics: EBS, RDS, EC2, S3, ElastiCache, and many, many others. We're always trying more and more of the offerings AWS brings to us, and we have a very tight collaboration with our AWS partners.

So it wasn't a surprise that about six months ago, AWS came to us and said, hey, how about you try Graviton? At that point, you know, there was a big question mark. Should we go and invest? It's a new ecosystem. It's Arm, and we don't have experience with Arm. We're very happy with our Intel infrastructure. Should we actually do this?

The pitch was: if you do this, you can see dramatic cost reductions, with price performance improving up to 40%. Even if you don't scale down your fleets as a result of decreased CPU usage, you can still just replace machine for machine and see 20 to 30% direct cost savings. And obviously, there's flexibility of choice for our engineers to pick the right hardware platform for their workload.

So this sounded really good, but it's a process; we don't take these things lightly. And we said, you know what, these numbers are convincing. We want to take a first use case and see what we can do with it.

So let's talk a little bit about how you identify the very first use case. Let's assume your company is trying to do a similar thing: what should you look for?

Well, step number one is cost sizing. Obviously, you want to pick a workload that would benefit from Graviton the most. How do you find such a thing? There are many cases, just as Sudhir has shown us, and on the internet, of other companies that have migrated workloads. So the idea is simple: you pick something very similar and say, well, if this worked really well for other companies, let's see if it works well for us.

Then, of course, you do an effort evaluation. Just picking any workload won't do: how much effort is it really to lift and shift this onto Graviton? What's the effort to get it into production? Getting a PoC up and running could take a few weeks, and that's great, but what's the real effort to go full production with Graviton? Remember those five nines? That means there is no downtime, there are no maintenance windows, there's no "well, you know, we're upgrading, therefore you'll see some hiccups." Those five nines have to hold at all times. How do you do this?

Obviously, you want to benchmark performance very, very carefully before committing to moving a huge fleet of machines to Graviton. And finally, after taking all of this into consideration, you build a plan and start executing.

So what did we choose? We chose Trino. Trino is an open-source SQL query engine. Basically, it's used at Stripe to power internal dashboards for metrics and financial targets. In many cases, teams use it to run queries to understand their cost attributions, and external customers use it for reporting.

So here's a simplified flow of what happens: somebody hits our front-end APIs, and basically this goes into internal reporting APIs, into a query engine, and from there into the Trino databases.

So why Trino? Well, we've seen other reports on the internet that migrating Trino to Graviton is pretty straightforward. It's open source, there are already examples of this working, and the results look good. Internally, when we assessed the effort, it seemed like it wasn't a big deal to basically take this and try it out.

So OK, we did a PoC and the results looked nice. Now we need to get into actual production. So what are the hurdles? The first step, just like Sudhir mentioned, is that you need an AMI that supports Graviton. Amazon provides many AMIs, but we have our own: Stripe basically builds its own AMI. It's based on Ubuntu, but we have lots of components that we've added over the years. So we needed to build a Stripe AMI that supports Graviton.

So once again, what does that mean? We have many tools that do the actual build, for example Packer and others, but the idea here is that now you're building not one AMI but two. You basically need to say: every time I have a security patch, a bug fix, an OS update, I need to build two images. So that pipeline goes from one to two. And as many know, one to two is really not that different from one to n.
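To make that one-to-n point concrete, here's a minimal Python sketch of what the fan-out in such a build pipeline might look like. This is purely illustrative: the function and image names are hypothetical, not Stripe's actual tooling, which drives Packer.

```python
# Hypothetical sketch: fanning one base-image build out to every
# architecture the fleet runs. All names are illustrative only.
ARCHITECTURES = ["x86_64", "arm64"]  # Intel and Graviton (Arm)

def build_image(base_name: str, version: str, arch: str) -> str:
    # In a real pipeline this would invoke Packer with an
    # arch-specific builder; here we just return the image tag.
    return f"{base_name}-{version}-{arch}"

def build_all(base_name: str, version: str) -> list:
    # One trigger (security patch, bug fix, OS update) now
    # produces one artifact per architecture.
    return [build_image(base_name, version, arch) for arch in ARCHITECTURES]

print(build_all("example-ubuntu", "2022.11"))
```

The point of the sketch is that once the pipeline iterates over a list of architectures at all, going from two to n is just a longer list.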

The next step after you've created your own AMI is to basically ask: OK, what about the actual applications? Some of them are using Kubernetes, but in this case Trino did not, so we didn't have to go down that path. But then again, there are many components. Trino is not just an open-source package that you go and install; there are many microservices and other components that go into this ecosystem.

So obviously, just like with the AMIs, there are two things here. One is the build: you have to basically augment your build pipelines to create two artifacts, one for Intel, one for Arm. And after you've done that, you have to basically look at deployments, right?

So let's say you are a developer. The infrastructure teams have created an AMI for you, they've augmented the build platform, and you're writing a piece of code, let's say in Go. You don't really want to care whether this goes to an Intel machine or an Arm machine. If you have to care, we've just added more work to hundreds of teams at Stripe.

So that means that every time you change your code and compile, our build system and our CI/CD pipelines have to be augmented to build all those artifacts and then deploy them to your target machines, taking the machine type into account automatically. So none of this is, you know, scaling a mountain, but it's lots of friction points: you need to augment your CI/CD pipelines and test.
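The deploy-time half of that can be sketched very simply: the pipeline resolves the target machine's architecture to the matching artifact, so the developer never has to care. This is a hypothetical Python sketch, not Stripe's actual deploy system; the artifact names and mapping are made up.

```python
# Hypothetical sketch of deploy-time artifact selection. In a real
# system the architecture would come from instance metadata or the
# EC2 API; the filenames here are invented for illustration.
ARTIFACTS = {
    "x86_64": "service-1.4.2-linux-amd64.tar.gz",
    "arm64":  "service-1.4.2-linux-arm64.tar.gz",
}

def artifact_for_instance(instance_arch: str) -> str:
    # Fail loudly if a build for this architecture was never produced,
    # rather than deploying the wrong binary.
    if instance_arch not in ARTIFACTS:
        raise ValueError(f"no build for architecture {instance_arch!r}")
    return ARTIFACTS[instance_arch]

print(artifact_for_instance("arm64"))
```

Once this lookup exists in the CD pipeline, a mixed Intel/Arm fleet is handled the same way as a homogeneous one.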

OK, so now we've done all this and everything works. How do we validate performance? Well, the first step was to basically use Airflow and Spark to take actual production loads, not synthetic tests (those were done in the PoC stage), and basically replay our own production traffic.

So we built a shadow fleet and replayed all production traffic through both the Intel fleet and the Arm fleet, and compared them one to one. Here's a simplified table, and you can see that the results were actually pretty impressive. We see that for short queries there's about a 37% reduction in execution time, for medium queries 57%, and across all queries, on average, a 54% reduction in the time they take to execute.
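As a sanity check on how a number like "54% reduction" is derived from paired replay runs, here's a small Python sketch. The latency figures are made up for illustration; only the formula is the point.

```python
# Illustrative only: percentage reduction in mean query time between
# a baseline (Intel) fleet and a candidate (Graviton) fleet, using
# invented latencies from replaying the same queries on both.
from statistics import mean

def pct_reduction(baseline_ms, candidate_ms):
    b, c = mean(baseline_ms), mean(candidate_ms)
    return round(100 * (b - c) / b, 1)

intel_ms    = [120, 110, 130, 125]  # made-up baseline latencies
graviton_ms = [60, 55, 62, 58]      # made-up candidate latencies

print(pct_reduction(intel_ms, graviton_ms))
```

With real data you'd compute this per query class (short, medium, all), which is how a simplified table like the one on the slide comes together.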

Now, when a query finishes faster, that means it's taking less CPU, which means more queries can be executed on the same machine. OK, so we've done all this. Now, here's a little bit more about our shadow fleet and how we did this. Basically, we launched it dark: we did not go directly into production, but ran this for several weeks on both fleets and compared all the results.

Here are some interesting facts that came from this shadow testing. The blue line is basically the Graviton fleet, and the orange line is the Intel fleet. What you can see here is that it's not just executing faster; there's a very nice clean line that comes from Graviton. What does that mean? It means there are fewer spikes. You see fewer problems at P99.99. It's more repeatable. That's the bottom line: it's more repeatable, and you get the same results again and again.
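A rough Python sketch of the kind of tail-latency comparison behind a chart like this: compute a high percentile for each fleet and compare it to the median. The sample data is synthetic; a "clean line" shows up as a small gap between P50 and P99, a spiky one as a large gap.

```python
# Synthetic illustration of tail-latency comparison between fleets.
def percentile(samples, p):
    # Nearest-rank percentile on sorted data (simple, not interpolated).
    s = sorted(samples)
    k = max(0, min(len(s) - 1, int(round(p / 100 * len(s))) - 1))
    return s[k]

# A mostly-steady fleet vs. one with occasional spikes (made-up data).
steady = [50] * 98 + [55, 60]
spiky  = [100] * 98 + [400, 900]

for name, fleet in (("steady", steady), ("spiky", spiky)):
    print(name, "p50:", percentile(fleet, 50), "p99:", percentile(fleet, 99))
```

In real shadow testing you'd aggregate weeks of samples per fleet before trusting P99.99, since the deepest tails need a lot of data to be statistically meaningful.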

And so this basically built our confidence that this is not just a cost-reduction exercise. This was a surprise: we're actually improving our customers' experience, because we're giving them a more repeatable one.

So finally, here's the summary of all this: up to 50% query performance improvement. The error rates (what are errors? basically timeouts of queries that did not finish in time) were reduced by 10 to 15%. It's pretty easy to cross-compile all the artifacts. We now have support in our build systems and deploy systems for both architectures, which means we're ready to start taking on more workloads; we've done all the heavy lifting. Again, the colors in this graph are inverted, but you can see how the Graviton fleet is way less spiky than the Intel one.

So what are the key takeaways? What should you take from all this data that I just showed you? Graviton itself works great. If you just go into your EC2 console and launch it, it just works, right? There's no need to worry about boxes crashing. To date, we haven't seen any abnormalities or hardware crashes or anything of the sort. It just works.

However, in order to basically prove your own workload, you need a solid testing infrastructure. You need a way to aggregate all this data, and you need it to be statistically significant. It's not enough to do a quick PoC and say "I'm good." Then, basically, how do you roll this out to production? This seems like a trivial question, but it's not if you have a fleet of thousands of machines, right?

It's not that one day you flip all of them onto Graviton; it's going to be very gradual. You're going to be living for weeks in a split-brain mode, where part of it is Intel and part of it is Arm, and you need to be prepared for this. Your on-call teams and your support teams need to understand that there's going to be a period with two types of machines; it's no longer homogeneous. So how do you do this transition easily and safely? That's something you have to plan for and tool for.
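One way to tool for that gradual transition is a simple rollout schedule: ramp the fraction of the fleet on Graviton step by step, so the mixed (split-brain) period is explicit and visible to on-call teams. The Python sketch below is hypothetical; the ramp fractions and fleet size are invented.

```python
# Hypothetical rollout-schedule sketch: the fraction of the fleet on
# Graviton ramps up step by step, so for weeks the fleet is
# intentionally mixed (part Intel, part Arm). Numbers are invented.
def rollout_counts(fleet_size, steps):
    # Each step is the target fraction on Graviton; returns a
    # (graviton_count, intel_count) pair per step.
    out = []
    for frac in steps:
        g = int(fleet_size * frac)
        out.append((g, fleet_size - g))
    return out

# Illustrative ramp over several weeks for a 1000-machine fleet.
print(rollout_counts(1000, [0.01, 0.05, 0.25, 0.5, 1.0]))
```

Each step is a natural checkpoint: compare error rates and tail latencies across the two halves of the fleet before widening the ramp, and keep the previous step as the rollback target.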

So what's next? What's coming up at Stripe? First of all, we've heard a lot of great things about Graviton3 and we want to try it out. We want to try Amazon managed services as well, and we're currently building a plan with AWS Professional Services to accelerate: instead of taking one workload at a time, we want to take dozens. We want to basically go mass scale and see if we can enjoy the same things we saw with Trino on many, many other workloads.

Of course, this is a lot of work. This is why we partnered with AWS. Migrating one service at a time is easy; migrating 20, 30, 40 at a time is hard, and we want to understand how we can do this better and build automation tools to make it happen.

OK, Sudhir. Thank you.

All right. Thank you. Thank you, Oran.

So that pretty much brings us to the end of our presentation. We have a quick summary here, with a couple of key takeaways on Graviton providing the best price performance. There's a free trial: if you haven't tried Graviton yet, you can get started, and a lot of data on customer references and best practices is available at these URLs. We really want to thank you for taking the time to attend re:Invent in person and for coming to this session.

There is a feedback app that should come up on your mobile, so we would really appreciate it if you could fill that out and provide feedback.

Thank you. Thank you very much.
