Compute innovations enabled by the AWS Nitro System

Sharma: Hello, everyone, and welcome to re:Invent, and welcome to our session on compute innovation using the AWS Nitro System. My name is Sharma. I work as a principal product manager on the EC2 instances team, and today I'm joined by Vijay, a principal architect in software engineering at Salesforce. Together we are super excited to be talking about the AWS Nitro System, the underlying technology on which our modern instances are built.

If you are looking to get an idea of what the AWS Nitro System is, what its benefits are, and the innovations it has enabled over the years, you will like this session. If you want to understand the new instances we have launched — specifically the Gen 7 instances built on the AWS Nitro System — you will love this session. And finally, if you want to know how Salesforce, our customer, has been able to leverage our instances for performance gains, you will get valuable information from Vijay in this session.

So without any further delay, let's get started.

Nitro has had a huge impact on EC2 and on AWS: the majority of our fleet is based on the Nitro System. We introduced Nitro back in 2017, but the actual work on Nitro started almost a decade earlier. When we talk about the Nitro System, we often talk about it as a single, abstract entity. But what are its components? My goal is to break the Nitro System down into its different components and talk about each of them in detail.

The image you see on the slide is an instance stack. The top three parts are what live on the server: a customer application, the instance OS, and a lightweight Nitro Hypervisor. The bottom part is the Nitro System: dedicated hardware for security, storage, and networking functionality.

So when someone asks you what the Nitro System is, you can say that the Nitro System is a lightweight hypervisor plus dedicated hardware used for security, storage, and many other functions. But before I dive into each of those components, let's take a look at the motivation behind creating the Nitro System.

When AWS first started, we used the state-of-the-art hardware and software of the time to architect the cloud, very similar to on-premises data centers. But one problem we encountered was that the system hypervisor was consuming too many compute resources, including vCPUs and memory. There were also performance variations in the EC2 software stack, partly caused by increasing customer demand, and that hinted at problems that could come in the future.

AWS realized we needed to do something different, and that something different led to what we call our hero, the AWS Nitro System. Essentially, we were trying to solve three key problems with the Nitro System: one, reduce the hypervisor tax so we can give you back the compute resources; two, remove the performance variation; and three, build a foundation for all of our EC2 instances.

This required a large departure from classic virtualization technologies intended for on-prem usage, and a reimagining of what it would look like in the cloud. To understand that, look at this slide: it shows a classic virtualization diagram of a host. The square boxes in the host are where the customer instances live, and the networking, storage, management, security, and monitoring functions, along with the hypervisor, sit in the host itself.

Now, if you look at the diagram for the AWS Nitro System, you will see that we have ported all of those functions — networking, storage, management, security, monitoring — over to dedicated hardware. Even the hypervisor is a very thin hypervisor, and we have purposely kept it thin; we will dive deeper into that in a moment.

Diving one level deeper, at a high level there are three components of the Nitro System: the Nitro Cards, the Nitro Security Chip, and the Nitro Hypervisor. The Nitro Cards and the Nitro Security Chip are the hardware-based components, whereas the Nitro Hypervisor is a software-based component.

Let's take a look at the different Nitro Cards we use. There are different types of Nitro Cards for networking, Elastic Block Store, and local instance storage, and you can map them to the instance features we offer. For example, the VPC networking card maps to the networking bandwidth we offer with our instances, the EBS card maps to the EBS bandwidth, and the local NVMe storage card maps to the instance store disks.

But there is also a behind-the-scenes card, which we call the Nitro Controller. The Nitro Controller is the brain of the system: it interfaces with the EC2 control plane at one end, and it coordinates with the other cards, the Nitro Security Chip, and the Nitro Hypervisor.

Now, the Nitro Security Chip is integrated into the motherboard itself and is responsible for protection and integrity monitoring of the hardware resources. Let's take an example to understand this better. Any server has different hardware components — memory DIMMs, or a microcontroller that keeps track of temperature — and each of them runs some form of firmware. That firmware needs to be updated for new enhancements, to fix bugs, or for other reasons.

Typically, you would log on to the host and use a tool to flash the firmware updates. And since customers are already running on the host, there is a chance a customer could get to those SPI flashes. Not with Nitro: we have dedicated hardware to apply the firmware updates, and no one can tamper with the SPI flashes. This makes running your instance in the Nitro environment secure.

The last component is the Nitro Hypervisor, which we have intentionally kept very, very thin. The one thing it does is program the hardware for virtualization — carving out the vCPUs and memory — and the hypervisor talks only to the Nitro Controller. It stays quiet while the instance is running. The goal was to get Nitro Hypervisor performance to be the same as bare metal performance.

Just by a quick show of hands, how many of you have heard about bare metal instances? I think the majority of you have. Bare metal instances are instances with no hypervisor involved: you get direct access to the hardware. We will talk in more depth about bare metal instances, and Vijay will also show a performance comparison of bare metal instances against virtualized instances.

Enough of looking at the symbolic representation of what Nitro looks like — let's see what the Nitro System looks like under the hood. It's hard to believe that some of these cards are actually the building blocks of our cloud. These cards are the brainchild of Annapurna Labs, which is now a part of Amazon, and they are the underlying foundation upon which our modern EC2 instances are built. If you look at these cards, they look very similar to network interface cards, and over the years we have developed newer cards with each generation of instances we have launched.

Now, as a customer, you might say: OK, AWS has created a great framework, but what's in it for me? What are the benefits of the AWS Nitro System? There are three essential benefits: performance, security, and innovation.

The first, obvious performance benefit is that because we offload all of the networking, EBS, and storage functionality to dedicated cards, you get more vCPUs, more memory — all of the compute resources — so you get more performance. The second performance benefit shows up when you run real-world workloads. For example, customers have told us they run memcached, NGINX, FFmpeg, or x264 to stress the system and compare system-level performance, and when they run those real-world workloads on EC2 instances and compare against other cloud providers, they typically see higher performance on EC2. That is because of Nitro. And lastly, Nitro also gives us the ability to scale networking and EBS bandwidth; I will talk in a bit more detail on subsequent slides about how we have scaled both using Nitro.
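
To make that kind of comparison concrete, here is a minimal sketch (not from the session) of timing one of the real-world workloads mentioned above — a CPU-bound x264 encode via FFmpeg — so the same script can be run on each instance type under test. The input clip name is a hypothetical placeholder, and it assumes ffmpeg with libx264 is installed.

```python
# Minimal sketch: time a CPU-bound x264 encode so results can be compared
# across instance types. Assumes ffmpeg with libx264 is installed.
import subprocess
import time

def encode_seconds(input_file: str) -> float:
    """Run an x264 encode, discard the output, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(
        ["ffmpeg", "-y", "-i", input_file, "-c:v", "libx264",
         "-preset", "medium", "-f", "null", "-"],
        check=True, capture_output=True,
    )
    return time.perf_counter() - start

if __name__ == "__main__":
    t = encode_seconds("sample_1080p.mp4")  # hypothetical test clip
    print(f"encode took {t:.1f}s on this instance type")
```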

The next benefit is enhanced security that monitors, protects, and verifies the instance hardware and firmware. I spoke about the Nitro Security Chip on my previous slide, but there are also details available in a white paper we published last year, and I would encourage you to read it. I have provided the link as well as the QR code for that white paper.

The third and biggest benefit is the pace of innovation. Nitro allows us to accelerate the pace of innovation. Think of it this way: you have a CPU that you qualify, and different Nitro Cards that already exist, just like Lego blocks. You can take the CPU and connect different Nitro Cards to come up with different instances. Isn't that amazing? This is especially important in the rapidly changing environment we live in: if a customer comes up with some kind of new requirement, we are able to meet it quickly, at a high quality bar. Super important.

Now let's dive deep into the performance angle and the innovation angle, starting with Nitro performance on networking bandwidth.

When we launched our very first instance, the M1, we provided networking bandwidth of 1 Gbps, which at that time was sufficient. But as customers started bringing more workloads from on-prem to AWS, they started asking for more networking bandwidth.

So in 2013 — and this is where the Nitro journey actually began; that's why I said almost a decade back — we started offloading the networking functionality to a dedicated card, and we were able to achieve networking bandwidth of 10 Gbps. In subsequent generations, we have been able to keep increasing that networking bandwidth.

For example, for our Gen 5 instances, the C5s and M5s, we provide networking bandwidth of about 25 Gbps. And for our Gen 6 and Gen 7 instances, we provide networking bandwidth of about 50 Gbps.

Customers then said that there are more demanding workloads — network appliances, telco workloads, AI/ML workloads, HPC workloads — and they need even higher networking bandwidth. So we used a series of networking cards to deliver it: either a single dedicated card for networking, or a series of cards delivering higher networking bandwidth together. And if you attended Adam's keynote yesterday, or Dave Brown's session, you would have seen that we are able to support networking bandwidth of 6,400 Gbps — simply unheard of — through our Trainium-based instances, which will be coming out soon.

Let's take a look at EBS bandwidth now. Once our experiment offloading the networking functionality to dedicated hardware was successful, the next thing we did, in 2015, was offload the EBS functionality to a dedicated card.

Instantly, we were able to increase the EBS bandwidth from 2 Gbps to 4 Gbps, and we were even able to double the IOPS. Over time, as we came out with new versions of the Nitro Card, we have been able to keep scaling the EBS bandwidth. For example, with our Gen 5 instances we supported EBS bandwidth of about 19 Gbps, and with our Gen 6 and Gen 7 instances we supported even higher EBS bandwidth, about 40 Gbps.

Then customers said they have more demanding workloads, like file systems or database workloads, and they need even higher EBS bandwidth. Because we had the Nitro System, we were able to scale EBS bandwidth to 60 Gbps, first with the R5b instances we launched in 2020. Then, with our C6in and R6in instances launched at last re:Invent, we were able to support EBS bandwidth of 80 Gbps, and 350K IOPS as well.

Then this year, with just a few changes to the Nitro Card, we increased the EBS bandwidth from 80 to 100 Gbps, and we also scaled the IOPS from 350K to 400K. All of this is possible because we have the Nitro System.
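
As a back-of-envelope illustration (simple arithmetic on the numbers above, not an AWS sizing tool): whether an EBS-heavy workload hits the IOPS ceiling or the bandwidth ceiling first depends on the I/O size.

```python
# Back-of-envelope sketch: relate the Gen 7 EBS limits quoted above
# (100 Gbps bandwidth, 400K IOPS) for a few common I/O sizes.
EBS_GBPS = 100        # per-instance EBS bandwidth limit (Gbps)
EBS_IOPS = 400_000    # per-instance EBS IOPS limit

bytes_per_s = EBS_GBPS * 1e9 / 8   # 12.5 GB/s
for io_kib in (4, 16, 64, 256):
    iops_at_bw = bytes_per_s / (io_kib * 1024)
    bound = "IOPS-bound" if EBS_IOPS < iops_at_bw else "bandwidth-bound"
    print(f"{io_kib:>4} KiB I/Os: effective ceiling "
          f"{min(EBS_IOPS, iops_at_bw):,.0f} IOPS ({bound})")
```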

Now let's talk about the innovation angle. I would like you to read the chart this way: post-2017 is the Nitro era, and before 2017 is the pre-Nitro era. In the pre-Nitro era, from 2006 to 2017, before we launched Nitro-based instances, we shipped 70 instance types. And when I talk about instance types, m5.large is one instance type and m5.xlarge is another instance type, or instance configuration.

When I talk simply about instances, then M6i is an instance and M5 is an instance. From 2017 to 2023, we have been able to support more than 750 instance types, and that is possible because we have the Nitro System — it is the biggest change between the 2006-2017 era and the 2017-2023 era.

Not only that: if you look at the chart post-2017, you will see that every year we had some kind of new technology coming out. For example, in 2018 we launched our first Arm-based instances using Graviton. In 2020 we launched Graviton2-based instances. We launched our Nitro SSDs and Trainium-based instances as well. The world is talking about AI/ML in 2023, but we launched an inference chip in 2019 itself, through our Inf1 instances. And this year, we even refreshed Inf1 with Inf2.

All of this has been possible because of the ability to innovate faster that the Nitro System gives us. We have also been able to innovate across different CPU processor families. In the pre-Nitro era, before 2018, we had instances with Intel processors. Then, in 2018, we were the first cloud provider to offer AMD-based instances. We were also the first cloud provider to offer Arm-based instances in the cloud, using our own Graviton chip. And we were the first — and so far the only — cloud provider to offer Mac-based instances.

So you see a lot of firsts here. Because of the foundational Nitro technologies we have been using, we have been able to deliver the right compute for each application and workload, and we have been able to offer our customers a broad choice of instances using different processors and architectures.

We recently launched the Gen 7 instances across different swim lanes, and thanks to Nitro, we have been able to deliver 17-plus EC2 instances quickly. Now, when I talk about instances in this case, M7i is one instance and M7i Flex is another instance — not instance type — so just note that distinction.

Let's look at the instances we have offered in the past year, year and a half or so. We started the Gen 7 journey with Graviton3-based instances: we launched C7g instances, and then M7g and R7g instances. These instances provide up to 25% higher compute performance and two times higher floating point performance to accelerate the most compute-intensive workloads.

These were the first instances in the cloud to support DDR5 memory, which provides 50% more memory bandwidth than the DDR4 memory DIMMs we had. And these instances can also be used for AI and ML workloads, because they provide up to three times higher performance for machine learning workloads, two times higher vector width, and support for the bfloat16 data type.

There are also PyTorch and TensorFlow optimizations that help with AI/ML workloads. So we launched C7g, our compute-optimized instances, plus general purpose and memory-optimized instances, along with the disk variants. We also launched a high-networking variant of the Graviton3-based instances, and then, for the very first time, we launched HPC Graviton3-based instances.

So we launched that many instances using the Graviton3 processor in the past year or so. And if you think that is impressive, then be ready to be impressed even more, because yesterday, if you attended Adam's keynote, he announced the Graviton4-based instances. We launched the Graviton4 chip, with R8g as our first Graviton4-based instance.

These instances provide up to 30% better compute performance compared to the previous-generation R7g instances, and they provide memory of up to 1.5 terabytes — three times more than the previous generation — which is suitable for scaling high-performance database workloads. And these are the most energy-efficient instances in the cloud.

Now, all of this has been possible because of Nitro, but there is a customer angle here as well: what have our customers said about these instances? Let's take a look at what customers are saying about our Graviton instances.

Customers like Sprinklr have been able to leverage C7g instances and get 27% better performance versus C6g instances. Customers like NextRoll have handled 15% more requests with C7g instances, with 40% better latency versus the previous-generation C6g instances.

A customer like Honeycomb has 100% of its EC2 fleet on Graviton, and because of the performance of the C7g instances, they have been able to consolidate their workloads, use 30% fewer instances, and save massive costs versus the C6g instances for the same workloads.

A customer like Thelium has been able to use the high-networking variant of the C7g instances and get 29% better performance for their real-time workload management application.

So the meta theme here is that Graviton instances have saved cost for our customers, and customers are getting the best price performance on Graviton-based instances.

Next, I would like to talk about innovation in the Intel swim lane. AWS has been partnering with Intel for the past 17 years, the longest of any cloud vendor. In this partnership, AWS's and Intel's leadership teams, engineering teams, and many other cross-functional teams have been engaged ever since AWS launched its first EC2 instance in 2006. And in those 17 years of collaboration, AWS has introduced more than 400 different Intel-based instances.

And since August, we have launched more than five different Sapphire Rapids-based instances. Let's take a look at what we have been able to offer with them.

I want you to focus on the innovation pillar first and look at the processors. These aren't roadmap processors from Intel; these are custom processors. AWS has partnered with Intel to have custom processors available only on AWS.

So when you combine the EC2 Nitro System with custom processors, our Gen 7 Intel instances are able to deliver 15% better performance than other cloud providers using comparable Intel processors. Isn't that amazing?

The good news is that the innovation doesn't end here. Traditionally, EC2 instances were offered with a single bare metal size. But starting with Gen 7, for the very first time, we are offering two bare metal sizes with our Intel instances. And each of these bare metal sizes supports the discrete on-device accelerators that come with the Intel chips.

There are three discrete on-device accelerators — QAT, DSA, and IAA — and they are supported on these bare metal sizes. So if you have a workload related to compression or encryption, or a database workload, you will benefit from these accelerators.

Our Gen 7 Intel-based instances provide up to 19% better price performance versus the previous-generation Ice Lake-based instances, and there are many new features and offerings with our Intel-based instances.

There are different flavors of instances. First of all, M7i Flex is a new category of instances which we just introduced, and I will talk about it in a moment. We also announced a preview of U7i instances, which offer up to 32 terabytes of memory. And then there is the fastest Sapphire Rapids-based instance in the cloud, the R7iz, among our memory-optimized instances.

We support DDR5 memory DIMMs, sizes up to 48xlarge, and we have increased the EBS volume attachment limit from 28 to 128. There is also support for the AMX instruction set on our Intel-based instances.

So if you have some kind of entry-level AI/ML workload, you can consider using Gen 7 Intel instances. If you have fairly simple ML models and you are just starting your AI/ML journey, you can use these instruction sets that come with the Intel processor, and then gradually transition to GPU-based instances once your models become more complicated.
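
As a hedged illustration of that entry-level path (not from the session): here is a minimal PyTorch sketch that runs a small model in bfloat16 on the CPU. On processors with the matching instructions — AMX on Sapphire Rapids — PyTorch's oneDNN backend can use them automatically; the model and sizes are made up for the example.

```python
# Minimal sketch: CPU inference in bfloat16. On AMX-capable processors,
# PyTorch's oneDNN backend can dispatch the matmuls to those instructions.
import torch

model = torch.nn.Sequential(          # illustrative toy model
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
).eval()

x = torch.randn(64, 1024)
with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(x)
print(logits.dtype)  # torch.bfloat16
```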

Now, the new things on those slides — the two new innovations — were M7i Flex, a new category of instances different from our C, M, and R instances (the C7i, M7i, and R7i), and the dual bare metal offering, the two bare metal sizes.

When we were designing the Gen 7 Intel instances, we realized that many customers do not fully utilize the compute resources of their EC2 instances, yet they were paying for CPUs they were not utilizing. And last year, as you all know, cost optimization was a theme, and our customers were asking us to give them the latest and greatest without increasing the price.

This led to the innovation of a new category of instances, which we call M7i Flex. M7i Flex is the easiest way to get price-performance benefits for the majority of general-purpose workloads. So if you have web serving workloads, virtual desktop workloads, or other general-purpose workloads, you can consider using M7i Flex.

They offer up to 19% better price performance versus the previous-generation Ice Lake-based instances, and they are available in the most common sizes, from large to 8xlarge, which means you get up to 32 vCPUs and 128 GiB of memory. They are the ideal first choice for most applications that do not utilize all of their compute resources.
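
A minimal sketch, assuming boto3 and AWS credentials are configured, of how you could confirm those published sizes yourself through the EC2 API:

```python
# Minimal sketch: look up the published vCPU/memory for two M7i Flex sizes.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_instance_types(
    InstanceTypes=["m7i-flex.large", "m7i-flex.8xlarge"]
)
for it in resp["InstanceTypes"]:
    vcpus = it["VCpuInfo"]["DefaultVCpus"]
    mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
    print(f"{it['InstanceType']}: {vcpus} vCPUs, {mem_gib:.0f} GiB")
```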

If you have some other kind of application that does fully utilize compute resources — some kind of financial trading application, say — then don't panic: we have M7i instances you can use. But for all the other general-purpose workloads, you can use M7i Flex.

Now, M7i Flex was possible, and we were able to deliver it quickly, because we had the Nitro System. There is a new flex CPU scheduler that was built as part of the Nitro System: the Nitro Hypervisor runs this scheduler, which ensures that, at the server level, the M7i Flex instances get the CPU they need instantly.

Going beyond the Nitro System and Nitro Hypervisor, we also have management services — comprising intelligent placement and live migration — that ensure M7i Flex instances get the CPU they need instantly, by leveraging servers across the EC2 fleet.

So, net-net, the AWS Nitro System is what enabled the M7i Flex instances: it allows them to get the CPU resources they need within the server, as well as across servers, to deliver reliable and consistent performance.

Let's talk about the Gen 7 Intel bare metal instances. Back in 2017, our customers told us that they have legacy workloads where they cannot use a hypervisor — licensing-bound database workloads, for example — and they needed direct access to the bare metal hardware. Because we had the AWS Nitro System — and remember from my earlier slide, we intentionally kept the hypervisor very thin — we were able to remove the hypervisor and provide our customers with bare metal instances.

So we launched the i3 bare metal instances for the very first time in 2017. But what happened after that? With each instance family we launched, the bare metal variant was typically the largest size, and as we refreshed to newer generations, the CPU vendors were also increasing their core counts. For example, our Cascade Lake processors were 24 cores; our Sapphire Rapids processor is 48 cores. And customers said they wanted smaller bare metal sizes, because they were already using the Cascade Lake processors and they wanted to transition smoothly using smaller bare metal sizes.

There were also customers like Nasdaq who wanted better NUMA affinity, or single-socket bare metal sizes. That's why we decided to launch two bare metal sizes for the very first time with our Gen 7 Intel-based instances. And this was, again, all possible because of the Nitro System.

Let's take a look at what customers have been saying about the Gen 7 Intel instances. Customers like Salesforce have been able to get up to 25% higher throughput at 11% faster response times on M7i versus M6i — and Vijay will talk more about these performance numbers in the next few slides.

Customers like Nasdaq have been able to simplify their new architecture and leverage our smaller bare metal sizes for their financial application workloads. And a healthcare customer like Tufts Medicine has seen three times more users per session and 68% higher performance on M7i versus M6in instances.

And customers like Ara Labs have been able to use R7iz instances — the fastest Sapphire Rapids instance in the cloud — and get 25% better performance versus the R6i instances for their EDA workloads.

So the meta point here is that it's fascinating to see workloads across so many different industries where customers are able to leverage our Gen 7 Intel instances.

I do want to switch gears and talk about our Gen 7 Genoa-based instances as well. In 2018, we were the first cloud provider to offer instances with AMD processors. We positioned the AMD-based instances at a price 10% cheaper than the comparable Intel x86-based instances, and we continued that positioning with the Gen 6 AMD-based instances as well, which allowed customers to save cost.

But with Genoa, we realized that AMD had made significant advances in their core counts — they were offering up to 96 cores — and our customers were also asking us for higher performance. So with our Genoa-based instances, we decided to disable simultaneous multithreading. By that I mean we offer one vCPU as one physical core, where typically we had been offering two vCPUs per physical core. Because of that, we were able to deliver a disruptive performance improvement of up to 50% versus the Gen 6 AMD-based instances.
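
A minimal, Linux-only sketch (an illustration, not from the session) of how you could verify that on a running instance: with SMT disabled, each vCPU's thread-siblings list in sysfs contains only itself, whereas on an SMT-on instance it lists two sibling hyperthreads.

```python
# Minimal sketch: on an SMT-off instance (e.g., M7a), each CPU's
# thread_siblings_list should contain only that CPU's own number.
from pathlib import Path

for cpu_dir in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
    siblings = (cpu_dir / "topology" / "thread_siblings_list").read_text().strip()
    print(f"{cpu_dir.name}: siblings {siblings}")
```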

Because the AMD-based instances also use the Nitro System — essentially the same components as the instances built on other CPU processors — customers can migrate to AMD-based instances with little or essentially no code modification.

In terms of features, our AMD-based instances were built using DDR5 memory DIMMs. They have a new medium size, and they have AVX-512 with VNNI instructions enabled — not available in Gen 6, only in Gen 7 — plus support for bfloat16. And as with our Intel-based instances, these instances have an increased EBS volume attachment limit of up to 128.

Let's see what our customers are saying about our AMD-based instances. Customers like Netflix have tried our AMD-based instances and observed 40% throughput improvement and 50% latency reduction — with the M7a instances specifically, versus the previous generation.

Customers like Ferrari have been able to use Hpc7a instances and get a 30% performance improvement for their computational fluid dynamics (CFD) workloads, and a 25% performance improvement for their finite element analysis workloads.

There are customers like TotalCAE and ICONA who have been able to leverage our AMD-based instances, either for the higher networking bandwidth or to reduce their simulation times.

So the meta point here is that if customers are looking for higher performance, or looking to get their jobs done sooner, they are considering our Gen 7 AMD-based instances.

Now that you have heard from an insider how AWS has been able to leverage the Nitro System and create so many instances, I would like to pass the stage to Vijay, who will talk about how Salesforce, one of our top customers, has been able to leverage the Nitro-based instances and get performance improvements. Thank you.

Vijay: Hello, everyone. My name is Vijay Pula. I'm from Salesforce. Salesforce has been the pioneer of the software-as-a-service model right from 1999, and it is currently one of the largest enterprise software companies in the world.

As you heard yesterday from Adam's keynote, AWS and Salesforce are extending their relationship to the next level, and currently AWS is the primary cloud provider for Salesforce. So I'm excited to be here to talk about how we leverage EC2 instances for our fleet-wide workloads at Salesforce.

Before I proceed, here are the standard disclaimers that we usually put in any talk that Salesforce delivers. As you are probably already aware, performance numbers have a huge variance based on workload types, test configurations and environments, and all the different tunables you have in the stack and how you set them.

So the gist of the message here is: please don't make any purchasing decisions based on the results I'm going to be sharing here. OK.

Having said that, let's start with a quick overview of the Salesforce Einstein 1 Platform. For those of you who don't know about it, it is the world's number one AI CRM platform. In this platform, as you see on the left side of the picture, Data Cloud and Einstein AI are both native to the platform.

With this platform, our customers can enhance and seamlessly align their CRM applications using the integrated metadata framework that you see at the bottom of the picture, and you can generate code, content, emails, campaigns, and a lot more with intelligent builders. And you can automate securely across people, processes, and systems to increase productivity.

You can use low-code and no-code tools to drive dev and admin efficiency, and you can extend Salesforce faster with the open APIs and the thousands of ecosystem partners that we have in the Salesforce ecosystem.

To learn more, I have put a QR code on the slide; you can scan it, and it will take you to a page where you can learn more about this platform.

So, to summarize: with the Salesforce Einstein 1 Platform, our customers can easily create AI-powered applications and workflows that supercharge productivity, reduce costs, and deliver amazing, trusted customer experiences, all powered by data, AI, and CRM.

And this platform has massive scale: we serve tens of millions of users on a daily basis, across many, many industries and many geographies. On a daily basis, we process over 3 billion transactions with an average response time of under 300 milliseconds.
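
For a rough sense of what that means per second (simple arithmetic on the numbers above, nothing more):

```python
# 3 billion transactions/day averages to roughly 35K transactions/second;
# real traffic is bursty, so peaks will be well above the average.
per_day = 3_000_000_000
print(f"{per_day / 86_400:,.0f} transactions/second on average")
```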

Now that we have seen a high-level overview of the Salesforce product stack, let's talk about Salesforce Hyperforce.

If you have been using Salesforce products and services, or if you have been following Salesforce's announcements, you have probably already heard this term, Hyperforce. So let's look at what Hyperforce is.

Hyperforce is the next generation of Salesforce's infrastructure architecture, engineered for the public cloud. If you want to learn more about Hyperforce, again, I have put a QR code here that you can scan, and it will take you to a page that has tons of information on Hyperforce.

There are some key differentiators with Hyperforce compared to our competitors. As a customer on Hyperforce, you can manage your data residency: you can choose where on Hyperforce you want to run your Salesforce applications, you can assign local storage, and we can offer the several compliance certifications that the local region requires.

As a customer on Hyperforce, you can scale sustainably, serving your customers fast with a flexible and scalable infrastructure that is delivered on a carbon-neutral cloud.

On Hyperforce, you get a deeper level of security, because we enable end-to-end data encryption by default — that includes data at rest and also data in transit. With Hyperforce, you can also safeguard customer data using robust and transparent privacy controls: you know exactly what customer data is being collected, who has access to it, and how it is being used.

With Hyperforce, as a customer you can also increase your agility by reducing the time to value, working faster using the zero-downtime upgrades that Salesforce delivers on Hyperforce.

There are also extensive native public cloud integrations with several other products and services that AWS offers, and that other clouds offer as well.

So, in a nutshell, on Hyperforce our customers can scale globally across the many regions where Hyperforce is available, while at the same time serving their customers locally in each region.

Now let's look at what Hyperforce on AWS is. Hyperforce by design is multi-substrate, which means that in addition to AWS, we do work with a few other cloud providers.

But as of now, AWS is our primary cloud provider. On the map you see here, I have overlaid the AWS global infrastructure map, showing all the regions where Hyperforce is currently live and serving production traffic, with live customers on it.

As you can see, there are about 18 AWS regions where Hyperforce is currently live, and there are a couple of other regions where we are in the process of deploying Hyperforce and going through all the different compliance validations before we open it up for our customers.

So as we speak, there are more regions we are going into on AWS. And Hyperforce's design and implementation align with all six pillars of the AWS Well-Architected Framework, so you can rest assured that it is highly available, secure, reliable, and so on. OK.

Hyperforce is engineered so that our customers can expand their growth into new regions, improve their regional performance, handle mission-critical workloads, and receive the latest and greatest from Salesforce with a lot of speed.

Now that we have seen the Salesforce overview, Hyperforce, and what Hyperforce on AWS looks like, let's look at the compute that we use in Hyperforce on AWS.

Of course, in addition to compute, we consume tons of other AWS production services.

But since this session is about the Nitro System, we will keep this discussion focused on EC2. Just like any enterprise, mission-critical system, under the hood Hyperforce on AWS is a very distributed ecosystem. It is service-oriented, with a lot of microservices, and there is a wide range of technologies that we use across the different subsystems in our overall ecosystem.

We carefully choose a given technology based on what is best suited for a given subsystem and the task at hand. So, in a nutshell, we have a wide variety of workloads, each of which has different characteristics of its own.

And as you heard from Sharma, AWS has been innovating so fast — they have released 750-plus instance types. So how do you even keep up with all of this? How do you know the best instance type to pick for a given workload?

At Salesforce, the process, at a very high level, is as follows. We have ongoing discussions with AWS and our awesome technology partners at Intel and AMD to understand what the next processor is that the chip vendors are coming out with, when AWS is going to translate it into an instance type available to us, and what the benefits and advantages of the upcoming processors are and how we can leverage them.

That is one part of the equation. The other is that we use an instance recommender tool: we point it at our production systems, or our pre-production systems, where we have all kinds of workloads mixed together, and the instance recommender runs through the different instance types we pick as a subset and gives us a list of recommendations.

We also look at any industry-standard benchmarks that are already available, and if there aren't any, we run industry-standard benchmarks ourselves. And based on that, we of course also look at the price-performance ratios, because we would like to optimize for cost as well, while making sure that we deliver trust and performance to our customers.
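
A minimal sketch of that last shortlisting step — ranking candidates by throughput per dollar. The throughput and price numbers below are placeholders, not Salesforce's results or actual AWS prices.

```python
# Minimal sketch: rank shortlisted instance types by price performance.
# Values are illustrative placeholders only.
candidates = {
    # name: (relative throughput from your benchmark, $/hour)
    "m5.24xlarge":  (1.00, 4.61),
    "m6i.24xlarge": (1.15, 4.61),
    "m7i.24xlarge": (1.44, 4.84),
}

ranked = sorted(
    candidates.items(),
    key=lambda kv: kv[1][0] / kv[1][1],   # throughput per dollar, higher is better
    reverse=True,
)
for name, (perf, price) in ranked:
    print(f"{name}: {perf / price:.3f} relative throughput per $/hour")
```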

Once we understand all of this, we shortlist a few instance types that we would like to experiment with, and that is when it comes to my lab. We run a series of workloads that are specific to Salesforce — some of which I will explain on the next slide — and based on the benchmark results, we work again with AWS, and with Intel or AMD depending on which instance type we are picking, to understand the supply logistics. Because Hyperforce is spread across so many regions, we don't always have the latest and greatest instance types available across the board.

We take that into consideration, and then we prepare our rollout plans, where we have a very strict stagger in how we deploy the new instance types to our production fleet. And of course, we deploy in such a way that it goes to our pre-production environments first; we have a bake period to see if there are any issues we may find, and only after that do we move it onward.

We have our own engineering org in one of the Hyperforce instances. Anything engineering is confident of delivering, we first put there — so if something breaks, the whole engineering team, thousands of engineers, are all experiencing the downtime.

So we take it there first. Once we are confident with the pre-production environments, we put it there and let it bake, and then we take it to another instance where we have internal Salesforce orgs and internal Salesforce customers across the different divisions. We make sure that works fine before we take it to our production systems, where we have live customers.

And even there, we do it in a staggered manner and we observe closely. We have a huge number of metrics emitted from across the stack, from the different subsystems, and all of that is monitored — we have monitoring and alerting in place to make sure that nothing breaks.

Also, because of the way the Nitro System is designed — Sharma explained it so beautifully — there is a very quick turnaround: most of the time, from the moment Intel or AMD announces a new chip to the time it is available for assessment in my lab, the turnaround is very quick. And that is possible purely because of the Nitro System that AWS has.

Now, moving on. Before I go to the next slide, I would like to quickly explain a few more details of how we do the benchmarks. We have synthetic benchmarks that we run in my lab, and we prepare something that we call a composite workload.

A composite workload is a mixture of several different scenarios that we see in production. We pick the top few most commonly used scenarios, and we also pick the top few most critical scenarios. For example, login is not one of the most commonly used scenarios, but it is the most critical, so we pick the critical scenarios as well.

Then we pick a mix of scenarios that exercises the different subsystems of our target under test, and use it to drive the intended load. That's how we prepare the composite workload.

Then there is the load shape, where we try to mimic typical production load, and we also have some other types of tests, where we do overload and stress tests and things like that. In addition to that, the data set is a very important component of any benchmark — at least for Salesforce, and I imagine for most other applications as well.

When we say data shape, it's not only about the data volumes; it's also about the data complexity — how all the different entities relate to each other, the complexity of the data set — and some of the customizations, because the Salesforce platform is highly customizable and our customers do all kinds of customization on top. So we pick a few of those as representative scenarios. That's how we run our benchmarks.
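
To make the composite-workload idea concrete, here is a minimal sketch of a weighted scenario mix — common scenarios plus critical-but-rare ones such as login. The scenario names and weights are illustrative assumptions, not Salesforce's actual mix.

```python
# Minimal sketch: pick the next scenario to drive, proportional to its
# weight in the composite workload. Names and weights are made up.
import random

SCENARIOS = {
    "view_record":   0.40,
    "run_report":    0.25,
    "search":        0.20,
    "update_record": 0.10,
    "login":         0.05,   # rare but critical, so always in the mix
}

def next_scenario(rng: random.Random) -> str:
    names, weights = zip(*SCENARIOS.items())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)
print([next_scenario(rng) for _ in range(10)])
```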

What you see here on the screen is how those synthetic benchmarks have behaved across the different instance types. And remember, this particular application is a Java-based application; it is containerized and deployed on Kubernetes.

Across the tests, the resource spec of a given Kubernetes pod, and of all the containers inside it, is fixed — nothing changes. Similarly, the number of Kubernetes pods that the application gets for a given test is also fixed. And likewise, the load shape, the data shape, and all the other test parameters I mentioned earlier are all held constant. OK.

The only thing that changes is the underlying EC2 instance type that we want to experiment with. The far-left data point, the first data point you see, is the m5.12xlarge, and that is the instance type Hyperforce started with — the one that was running fleet-wide across our entire set of Hyperforce instances.

At some point, we decided to vertically scale the instance size. We were still on M5, but we were quickly finding — because of the massive migrations we were doing from our first-party data centers to the Hyperforce instances — that we had this pattern where we kept adding more and more nodes to our clusters.

So at some point we increased the instance size from m5.12xlarge to m5.24xlarge, so that we get more resource capacity for the same number of nodes in our clusters. At that point, of course, we again did a detailed assessment in the lab. And as you can see here, there are two plots.

The one in blue tracks the throughput, or capacity, and the one in green is our response times; the blue one is mapped to the axis on the left and the green one to the axis on the right. So you can see what happened when we went from the m5.12xlarge to the m5.24xlarge.

The workload throughput we got was on par. Now, if you're asking, "Hey, you just doubled the instance size — why is your throughput still the same?", you have to remember what I said earlier: the resource spec per Kubernetes pod, and the number of pods we allocate to that application, are the same.

The only thing that changed is the underlying instance type. Previously, on the m5.12xlarge, each node was packed with a certain number of pods; with the 24xlarge, each node is more densely packed. But at the aggregate level, we still got on-par throughput.

Yes, the response time degraded slightly, but it was still well within our acceptable range. So this shows that when you vertically scale the instance size, because of the way the Nitro System is designed and implemented, there are no side effects — you can play around with the different instance sizes.

But of course, if you are on Kubernetes, you have to keep other factors in mind — for example, the bin-packing efficiency and the amount of total unallocated resources — because those do impact your costs.
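
A back-of-envelope sketch of that bin-packing point: with a fixed pod resource spec, doubling the node size roughly doubles the pods per node, while the leftover unallocatable slice is what you watch for cost. The pod spec and system reservation below are assumed values, not Salesforce's.

```python
# Back-of-envelope sketch: pods that fit per node for a fixed pod spec.
POD_VCPU, POD_MEM_GIB = 4, 16      # assumed fixed pod request
NODE_OVERHEAD_VCPU = 2             # assumed kubelet/system reservation

nodes = {"m5.12xlarge": (48, 192), "m5.24xlarge": (96, 384)}  # vCPU, GiB
for name, (vcpu, mem_gib) in nodes.items():
    by_cpu = (vcpu - NODE_OVERHEAD_VCPU) // POD_VCPU
    by_mem = mem_gib // POD_MEM_GIB
    print(f"{name}: fits {min(by_cpu, by_mem)} pods "
          f"(CPU allows {by_cpu}, memory allows {by_mem})")
```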

Then, when M6i became available, we again did a series of assessments in the lab and rolled it out to production. We did hit a glitch there, and we worked with our Intel technology partners to eventually nail down that issue and resolve it.

But as you can see here, the throughput increased and the response times reduced, which is good — you want lower response times and higher throughput. Similarly, we are now evaluating — and these are the latest results from our lab — the m7i.24xlarge. As you can see when comparing the m7i.24xlarge with the m6i:

the throughput has increased by 25%, and the response times have reduced by 11% — it has become 11% faster — which is a great trend to have. And we see the same trend on AMD; some of our production instances run on AMD.

When we compared the m6a.24xlarge and the m7a.24xlarge, we noticed 40% higher throughput and 19% faster response times on M7a as compared to M6a.

As for how you translate this into cost savings — that's a whole other topic, for maybe some other day. But that's the performance angle.

Now let's talk about performance on the bare metal instances. As Sharma was mentioning, the difference between bare metal and the VM is that bare metal does not have the hypervisor, while the equivalent VM does — but both of them have the same resource spec. So when AWS launched the m5.metal, we were very curious — you know how it is with performance engineers.

We wanted to measure and find out: is the hypervisor really lightweight, with absolutely no impact? So we ran a bunch of benchmarks from the industry-standard SPEC CPU 2017 test suite, and what we found is that the m5.24xlarge and the equivalent m5.metal had very similar performance characteristics.

It was within 1.5 to 2%, which is well within the error margin — so it was on par. And similarly, when we experimented with the m7i.48xlarge and the equivalent m7i.metal-48xl, we again found a very similar trend: both of them show on-par performance.

So that is basically another data point to support what Sharma was saying: the hypervisor is so light that it really does not have any performance overhead.

So with that, here's back to Sharma — I think we just have the last slide.

Sharma: The key takeaway here is that AWS has a unique differentiator in the AWS Nitro System. We have had it since 2017; some of the other cloud vendors have only just started. It has allowed us to innovate faster on behalf of our customers and deliver better performance and security versus other cloud providers.

We launched the EC2 Gen 7 instances using the Nitro System, and we have been able to launch 17-plus new instances across different CPU options, with many new features. And lastly, the AWS Nitro System has allowed partners such as Salesforce to take advantage of an infrastructure that continuously improves.

So with that, we would like to conclude our presentation. Please do fill out the survey in the mobile app, and if you have any questions or would like to contact us, we have provided our email IDs as well.
