Deep dive into the AWS Nitro System

Hello and welcome to a deep dive into the AWS Nitro System. My name is Ali Saidi. I'm a Senior Principal Engineer at AWS, and today I'm gonna talk to you about some of the things we've done in Nitro, why we've done them, and the benefits you get from those.

But I first want to start with the silicon that we're building in AWS. We've been building silicon in multiple areas now, spanning I/O and data center infrastructure with the Nitro System, which is what I'm going to spend today talking about, as well as core compute and machine learning.

We've moved a lot of functionality away from a traditional hypervisor onto purpose-built chips that we built in AWS. This journey started back in 2013, when we moved networking off of the hypervisor with the C3 instance, and then, working with a startup at the time called Annapurna Labs, which AWS would later acquire, we've been building and iterating on that ever since.

And so I'm going to talk mostly about that today, but I'd like to also spend a couple of minutes talking about the other silicon we're building. With Graviton, we've built general-purpose compute that provides the best price performance in EC2, and just yesterday, we announced the fourth generation of the Graviton chip. Similarly, for several years now we've been building ML chips, both for inference with Inferentia and training with Trainium.

And yesterday we announced Trainium2, our second generation of training chip, which provides best-in-class price performance and efficiency for machine learning workloads.

So why do we do this? Why do we spend our time building these chips? And the answer is pretty simple: we do it when we have a path to those chips providing you with better value. And there are a couple of reasons for this.

The first is specialization: by building our own chips, we get to really specialize that hardware for the use cases within AWS, and specialization lets us tailor our designs to our particular needs. You might say, well, of course, any chip is going to be built for particular needs. But if you look at the way most people build a chip, they want to make that chip applicable to a number of customers, so they add a bunch of different features of which any one customer is only going to need a handful.

We get to laser focus on the features we need to make these chips perform well in AWS, and not burden them with those other features. The second is speed: by building our own chips, we get better speed of execution. And the reason for this is kind of simple: we own the end-to-end development process, from the concept of what we're going to build, all the way to what server it's going to be in, what instance it's going to be in, and when we're going to provide it to our customers.

And that just lets us bring it to you faster. But it also means that we get to design across that boundary. We can have the people who are working on the software that will end up in production actually using that software as we define and build the product.

The third is innovation. Building our own chips really lets us innovate more because it removes the typical silos that you have. You don't have a chip team, a card team, a server team, and a hypervisor team that are different organizations. It's one big organization, and we can innovate across those boundaries.

And lastly, security. Nitro provided us a mechanism to enhance the security of our servers through a hardware root of trust, through verification of the firmware on the server, and by limiting interactions with the host to a narrow set of APIs.

So let's dive more into the Nitro System. It was really a fundamental rethink of how virtualization should be done at its lowest level. The system has Nitro cards, which offer networking, storage and security functionality, and on the host, we run our Nitro hypervisor. On top of that, we run your virtual machines and you run your applications.

I'm going to talk about how this separation provides better security and performance and has allowed us to innovate more. But first, let's talk a little bit about how we got here. It all started with a simple question: if we applied our learnings from operating EC2 for around a decade, what would we change? What problems could we fix?

And there were a lot of good suggestions: we could improve throughput and performance, we could simplify our hypervisor, we could reduce latency and jitter so we'd have a bare-metal-like experience, we could provide transparent encryption.

We could remove operator access, and we could have a narrow set of auditable APIs. And Nitro really gave us a path to implement all of these. Nitro is a combination of purpose-built hardware and software.

We are on our fifth generation of Nitro chip now. We introduced the Nitro System back in 2017 with the C5 instance at re:Invent, though it all kind of started in 2013 with the C3 instance, when we first moved some networking functionality off onto an accelerator card and had enhanced networking.

Over time, we expanded this to other I/O like EBS and instance storage. And finally, we built a hypervisor from which we got to remove a bunch of functionality. Since 2017, we have announced something over 600 instance types, all of which have been powered by the Nitro hypervisor.

So here's a view of how a server looked before the Nitro System. We have customer instances, and here they're running on a hypervisor called Xen. Now, Xen's great, but it also did a lot. In this case, it did memory management and CPU scheduling, but it also did device emulation, rate limiters, security group enforcement, and quite a bit more. It even had a full-blown Linux user space.

And this privileged Dom0 and all these functions used host CPU and introduced jitter. So we started offloading these capabilities onto dedicated hardware, and as I mentioned, we started first with networking.

So let's dive into networking. Now, we have a full family of cards with Nitro: some that do networking, some that do storage, some that do multiple things, and we throw these terms around. But I thought today it would be useful to actually bring one and show you. This is a Nitro card. It's got a connector that kind of looks like a PCIe device; otherwise, it looks like a typical PCIe card.

And if you plugged it into a server, you would see the ENA devices that you're probably familiar with. You would see the NVMe devices for EBS or instance storage. And that card lets us offload the VPC data plane.

So when you attach an ENI to your instance, there are security groups for that ENI, there are flow logs, there's routing, and really there's all the encapsulation that's being done to create your VPC and layer your packets on top of our software-defined network. And that's all happening on this card.
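
To make that concrete, here's a minimal sketch of creating and attaching an ENI with security groups using boto3, the Python AWS SDK. All the resource IDs are placeholders for illustration:

```python
import boto3

ec2 = boto3.client("ec2")

# Create an ENI in a subnet with security groups attached.
# subnet-0abc1234 and sg-0abc1234 are placeholder IDs.
eni = ec2.create_network_interface(
    SubnetId="subnet-0abc1234",
    Groups=["sg-0abc1234"],
    Description="example ENI",
)["NetworkInterface"]

# Attach it to a running instance; the Nitro card then enforces
# security groups, flow logs, and VPC encapsulation for this device.
ec2.attach_network_interface(
    NetworkInterfaceId=eni["NetworkInterfaceId"],
    InstanceId="i-0abc1234",
    DeviceIndex=1,
)
```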

All of that used to happen in the hypervisor, and it got moved over. By moving to dedicated hardware, we could increase the performance of those operations by orders of magnitude. These cards also support technologies like VPC encryption.

So when you enable VPC encryption, as the packets move through the card, they're encrypted with 256-bit AES encryption in transit, with no performance overhead. And this is great because you don't have to choose between performance and encryption; it just happens transparently.

As I mentioned, you experience these through the ENA device, whether you attach an ENA device or your system boots with the basic ENA device attached. ENA has two parts: one part is what's running on the card, that's the device side, and there's also the driver in your instance.

And if you look at what we've done with ENA, it's been really extensible. When we announced ENA, it supported up to 10 gigabits. Since then, we went to 25 gigabits, 100 gigabits, and 200 gigabits, all with the same driver. You can stop an instance with an ENA device at 10 gigabits, start it on another piece of hardware, and get a 200-gigabit instance.

And that's actually pretty unique if you go and look at network drivers. There's a wide variety of them, and even a single vendor tends to iterate and ship newer versions of the driver that aren't backward compatible with the old hardware, or forward compatible with the new hardware.

So with ENA, we've been able to simplify the experience of moving from generation to generation by holding that device model constant for a number of years.
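
If you want to check this from the API side, here's a small boto3 sketch (the instance ID is a placeholder) that asks whether ENA is enabled on an instance:

```python
import boto3

ec2 = boto3.client("ec2")

# Check whether an instance has ENA enabled; the same ena driver
# serves the 10, 25, 100, and 200 Gbps hardware generations.
resp = ec2.describe_instance_attribute(
    InstanceId="i-0abc1234",  # placeholder ID
    Attribute="enaSupport",
)
print("ENA enabled:", resp["EnaSupport"].get("Value", False))
```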

We also have support for the Elastic Fabric Adapter. EFA is a network device with features geared towards HPC and machine learning workloads.

HPC and machine learning workloads are special. Usually a workload is either high bandwidth or low latency, but machine learning and HPC workloads tend to need both high bandwidth and low latency.

And so with EFA, we developed and used another AWS-developed technology called the Scalable Reliable Datagram, or SRD. Let's talk about that for a minute.

Traditional network protocols take a single path through the network and are relatively slow to react to things like congestion, and even slower to react to failures. The networks in our data centers look something like this: they're highly connected, and there are many paths that packets can take between two servers.

And with that knowledge, we can detect things like congestion and failure much faster than traditional protocols that were meant for internet-scale, many-millisecond latencies. And that's exactly what we're doing with SRD.

SRD provides a foundation where we can utilize multiple paths through our network simultaneously, allowing us to distribute bandwidth across those paths, react quickly to congestion, and significantly reduce tail latencies. And these tails are really important in distributed applications and bulk synchronous applications, where you can't start the next step until your communication is done.

So even if your p50 is great, you have to wait for your p100 before the next bit of the computation starts. And that's one of the reasons EFA is so useful for HPC-like workloads.
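
As a rough illustration of why multipath helps the tail, here's a toy Monte Carlo model (not SRD itself, and the latency numbers are made up): a flow either rides one path, or effectively gets the least-congested of eight:

```python
import random

def path_latency():
    """Toy model: ~50 µs base latency, with a 1% chance a path is
    congested and adds a large delay. Numbers are illustrative only."""
    base = random.gauss(50, 5)
    if random.random() < 0.01:
        base += random.uniform(500, 2000)  # congestion spike
    return max(base, 1.0)

def percentile(samples, p):
    s = sorted(samples)
    return s[int(p / 100 * (len(s) - 1))]

random.seed(0)
N = 100_000
single = [path_latency() for _ in range(N)]
# Spraying across 8 paths: the transfer completes via the best path.
multi = [min(path_latency() for _ in range(8)) for _ in range(N)]

for name, data in [("single path", single), ("8-way spray", multi)]:
    print(f"{name}: p50={percentile(data, 50):.0f}µs "
          f"p99.9={percentile(data, 99.9):.0f}µs")
```

The p50s come out nearly identical, but the single-path p99.9 is dominated by the congestion spikes while the multipath tail stays close to the median, which is the effect described above.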

So here I'm showing scaling of an HPC application with the number of cores. The cores are increasing on the x-axis, and the performance you're getting from those cores is on the y-axis. In a perfect world, this would be linear: as you add more cores, you'd get more performance, one to one.

But you can see that before EFA, in this purple line, that was the case up to about 400 cores. Then the performance flattened, and even worse, as you increased the core count, the scaling went negative: you were adding more cores to solve the problem, but you were actually slowing the system down. With EFA, this is the blue line. It's not quite the red line, but it keeps scaling, all the way up to over 1,000 cores.

And this allows you to run larger problems on AWS. The first version of EFA supported 100 gigabits. The second version supports up to 200 gigabits, and the latency of packets through our Nitro cards was reduced by 30%.

So let's take another look at this with TCP, for when you're not using an HPC-like network, and talk about flow hashing.

Traditional protocols like TCP want to take a single path through the network, and while they can deal with out-of-order packets, they can't deal with them as a common occurrence. So traffic takes a single path through our network.

And even in a large, over-provisioned network like the ones we operate, there can be congestion from time to time. There can also be an occasional link failure, and this will cause TCP to back off and its congestion control to kick in. As I mentioned, that's really meant for internet-scale latencies, not for latencies within our data centers.

So if you have congestion, you'll have lower throughput. If you have a link failure, you have to wait for a timeout before you can re-establish that connection. And you might say, well, why can't we just use multiple paths? Because TCP doesn't really handle out-of-order packets particularly well.

But what we can do with SRD is send traffic down multiple paths and offload that to our Nitro cards. And that's what we're doing with ENA Express. You can enable ENA Express on a wide variety of instance sizes now, and the Nitro card takes care of sending your packets down multiple paths simultaneously, assembling them in the right order at the end, and delivering them to your application without any application changes, only a settings change.

This works for both TCP and UDP, and the benefits are pretty huge. We see up to a 5x increase in single-flow bandwidth, from 5 gigabits to 25 gigabits, and up to an 85% reduction in p99.9 tail latencies. As I mentioned, it's a simple configuration change, and it's supported on a number of instance types now.

And within the same AZ, your traffic can start using ENA Express.
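
Here's what that settings change can look like with boto3: the EnaSrdSpecification attribute on a network interface enables SRD for TCP and, optionally, UDP. The ENI ID is a placeholder, and the instance type has to support ENA Express:

```python
import boto3

ec2 = boto3.client("ec2")

# Turn on ENA Express (SRD) for an existing network interface.
# Both endpoints need to support it and be in the same AZ.
ec2.modify_network_interface_attribute(
    NetworkInterfaceId="eni-0abc1234",  # placeholder ID
    EnaSrdSpecification={
        "EnaSrdEnabled": True,
        "EnaSrdUdpSpecification": {"EnaSrdUdpEnabled": True},
    },
)
```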

So if we look at how we've increased network bandwidth over time: when EC2 started, we had a 1-gigabit network. Since then, we've increased that from 1 to 10 to 25 to 50 to 100 to 200 gigabits. And with our machine learning platforms, we've gone from 400 gigabits up to 1,600 gigabits, to 3.2 terabits, and now even 6.4 terabits. Just a huge increase in the bandwidth of a single instance.

So networking is obviously a hugely important piece for workloads. Storage is also an important aspect, especially for workloads like databases that need fast, consistent storage.

Unlike with networking, where we created the ENA device, with NVMe there's a great standard driver that's very extensible. That's what we use for our EBS storage and for our instance storage. The Nitro cards that I just showed you expose an NVMe interface on one side, and they translate those NVMe commands into transactions to our EBS data plane, just like on the networking side.

If you have an encrypted EBS volume, as soon as the NVMe request transits the Nitro card, it gets encrypted or decrypted right there, not on the storage side.

So really, the Nitro card is providing an NVMe-to-remote-storage protocol here. Under the hood, for a lot of different types of instances and volumes, we're using SRD as well, reducing tail latencies by using multiple paths through our network to go from your instances to the EBS data plane.
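
As a small worked example, here's a boto3 sketch that creates and attaches an encrypted gp3 volume; the encryption itself then happens on the Nitro card, transparently to the guest. The IDs and AZ are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Create an encrypted gp3 volume; once attached, the Nitro card
# encrypts/decrypts NVMe requests in transit to the EBS data plane.
vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # placeholder AZ
    Size=100,                        # GiB
    VolumeType="gp3",
    Encrypted=True,
)

# Wait until the volume is ready, then attach it to an instance.
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
ec2.attach_volume(
    VolumeId=vol["VolumeId"],
    InstanceId="i-0abc1234",        # placeholder ID
    Device="/dev/sdf",
)
```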

And very similarly, we've gone from 2 gigabits of EBS bandwidth all the way up to 100 gigabits of EBS bandwidth today.

Now, EBS is great. It's the right answer for a lot of customers: you get durability, you get snapshots. But some customers also want local storage.

So let me talk to you a little bit about how the Nitro card enables better local storage as well. And to do that, I really need to talk a bit about what flash is and how an SSD works.

There really are two important parts of a flash device. The first is the NAND - this is where bits get stored. But the NAND's kind of peculiar: you can only write blocks of NAND that you've erased, and you have to erase blocks, usually in megabyte-sized chunks. So if you want to update just one byte, you actually need to copy a chunk of data, update that one byte and store it somewhere else. And in doing so you create some garbage.

And so to abstract all of this, there's something called a Flash Translation Layer, or FTL. This maps the logical addresses the operating system understands to the physical locations in the various NAND chips. It does the garbage collection that I just mentioned, and it does wear leveling. If you zoom out a bit, it's much closer to a write-logging database than maybe you'd think.
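
To make the idea concrete, here's a deliberately tiny FTL model in Python: logical writes are appended into the currently open erase block, the logical-to-physical map is updated, and any superseded copy becomes garbage. A real FTL adds garbage collection, wear leveling, and parallel NAND channels:

```python
BLOCK_PAGES = 4  # pages per erase block (real erase blocks are ~MB-sized)

class ToyFTL:
    def __init__(self, num_blocks):
        self.mapping = {}                    # logical page -> (block, slot)
        self.blocks = [[] for _ in range(num_blocks)]
        self.open_block = 0

    def write(self, lpage, data):
        if len(self.blocks[self.open_block]) == BLOCK_PAGES:
            self.open_block += 1             # block full: open the next one
        blk = self.blocks[self.open_block]
        blk.append((lpage, data))
        # Remapping the logical page turns any old copy into garbage that
        # a garbage collector would later reclaim by erasing whole blocks.
        self.mapping[lpage] = (self.open_block, len(blk) - 1)

    def read(self, lpage):
        b, s = self.mapping[lpage]
        return self.blocks[b][s][1]

ftl = ToyFTL(num_blocks=8)
ftl.write(0, b"v1")
ftl.write(0, b"v2")   # an overwrite is appended elsewhere; "v1" is now garbage
print(ftl.read(0))    # b'v2'
```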

And if you go and acquire a number of SSDs, every one of them has its own FTL, and they all behave a little bit differently. They usually do a pretty good job, but in our experience over years of operating a number of them, they can also be unpredictable at times. You'll have garbage collection kick in at just the wrong moment and suddenly cause requests to stall.

So how can we fix this? Well, with our Nitro cards, we've integrated an FTL into them as well. And so we can provide the same FTL across a number of different devices. This provides up to 60% lower latencies, improved reliability because we can update the firmware on those devices. And also again, encryption - we can encrypt all that data with an ephemeral key.

So here's how our servers look after we've moved all that I/O that I've mentioned away from the hypervisor onto dedicated hardware. You can see this Dom0 is now significantly offloaded and many of the functions are now handled by the Nitro card.

So what else can we do? So after offloading that I/O we can go back and revisit - well, what does the hypervisor really need to do? And this led us to develop a Nitro hypervisor, truly a lightweight hypervisor with all that functionality removed. It just does memory and CPU allocation and then it gets out of the way. By that I mean it's quiescent - if you're not asking it to do anything, it's not there. It's not consuming cycles. It's got a small size.

And with that, we can also remove things like the network stack. There's no network stack in the Nitro hypervisor. There's no systemd, there's no SSH, there's no way to even connect to that hypervisor.

Putting all that together, we end up getting bare metal like performance. We've had customers run a bare metal system and compare it to a virtual system and say, you know, there's almost no difference in performance there.

Additionally, we've added the Nitro security chip to our systems. So this is a component on our motherboard that enables us to measure and confirm the contents of flash devices like the devices that power the components that measure fan speeds or temperatures. By measuring the contents of the flash, we can confirm that it has the contents that we expect and we can also update them out of band to make sure that we're running the latest firmware on all these components.

So with that, we've migrated the management functionality as well and we can remove Dom0 and replace the Xen hypervisor with our own lightweight hypervisor, the Nitro based hypervisor.

And if you look at the number of instances we've been able to launch, you can kind of see how much this has accelerated the pace of innovation with this modular design and being able to take Nitro cards and put them in different systems.

From 2006 to 2017, that's 11 years - we had 70 different instance types. Going from 2017 to 2023, in those 6 years, we've gotten to about 750. So over 600 instance types in those 6 years. And these types provide you options with high networking, with storage, with different processor architectures all so you can optimize your costs for your specific use case.

So let's talk about the security aspects of the Nitro system. Keeping customers' workloads secure and confidential is our top priority. And security is a shared responsibility - AWS is responsible for the security of the cloud. We operate, manage and control the components from the host, the host software, the virtualization layer down to the physical security facilities that those systems are in. And you, as customers, are responsible for security in the cloud - what I mean by that is for something like EC2, keeping your OS up to date, configuring things like security groups appropriately, and just general management of the guest OS.

In the past couple of years, there's been a lot of interest in the world of confidential computing and also some confusion. And usually the first question I ask people is, well, what are you protecting data from? And usually the answer is they want to protect the code and data from other people, other than them - be that other customers or a cloud service provider. And this is called "Dimension 1".

By the way, when we architected Nitro, we built mechanisms to protect customer code and data from outside access. It's really fundamental to the design decisions we made.

When we built Nitro, first, there's a physical separation from where your code's running to where our encapsulation and infrastructure is running: on the Nitro card versus the host.

When our systems boot up, I mentioned the Nitro security chip that measures the firmware, confirms its contents. Our Nitro controllers boot, measure themselves and use that to unlock their storage and allow the system to boot with the Nitro systems.

We're able to live update the components in there without downtime. The software is developed by a globally distributed set of teams with multi-party code reviews. It's cryptographically signed by an automated build pipeline that engineers don't have access to and then deployed by our deployment service which defines the deployment velocity policy and where changes can be deployed. And then rolls back if there's any anomalies.

But most importantly, there's no remote access to these systems. There's no SSH server or anything like an SSH server. There's only an API that's encrypted, authenticated, authorized and logged - and none of these APIs provide access to customer content.

The other dimension people talk about is dividing customer workloads into more and less trusted components. This separation can limit your operators' access, but also compartmentalize bugs in your application so they don't expose secrets.

A common case here can be to put certificates into an enclave and then only allow the parent process to request signing with those certificates. So you have all the protections of a Nitro based instance, but enclaves give you a couple more things:

An enclave is an instance where the parent process has donated some memory and some CPUs to create a new instance. But that instance doesn't have a couple of things - it doesn't have networking, it doesn't have any persistent storage. It just has a small "straw" that goes back to the parent. And this makes it a separate, highly constrained VM - the parent can't SSH into it, the parent can't control or inspect the processes or applications running in it.

And so in that enclave, you can have secrets and do things like request signing with those secrets or process data where the bulk of the application can't access it.
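
That "straw" back to the parent is a vsock channel. As a minimal sketch (Linux-only, the port number is arbitrary, and the signing logic here is hypothetical), the enclave side of such a service could look like this:

```python
import socket

# Inside the enclave: listen on the vsock channel back to the parent.
# There is no network stack or persistent storage here; this socket is
# the only channel. Port 5000 is an arbitrary choice for illustration.
ENCLAVE_PORT = 5000

s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
s.bind((socket.VMADDR_CID_ANY, ENCLAVE_PORT))
s.listen(1)

conn, _ = s.accept()
payload = conn.recv(4096)
# Hypothetical step: sign `payload` with a key that never leaves the
# enclave, then return only the signature to the parent.
conn.sendall(b"signature-over:" + payload)
conn.close()
```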

We've also been adding some other features, security features to Nitro. One is Secure Boot - so AMIs today are immutable. When you have an AMI and you create one, it can't be changed. But customers have been using Secure Boot in a number of areas and asked us if we could support that.

And with Secure Boot, the firmware measures or chain measures the bootloader, the operating system, and then compares those measurements to UEFI variables. And what this ultimately provides is more defense in depth to secure software from threats that persist to reboots. So if for some reason, the software on the root volume has changed, this can be detected with UEFI attributes.
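
For example, the UEFI variable store behind those measurements can be read through the EC2 API. This boto3 sketch (placeholder instance ID) pulls it, for instance so you could reuse it when registering a Secure Boot-enabled AMI:

```python
import boto3

ec2 = boto3.client("ec2")

# Retrieve the UEFI variable store (which holds the Secure Boot key
# and signature databases) from an existing instance.
resp = ec2.get_instance_uefi_data(InstanceId="i-0abc1234")  # placeholder
uefi_blob = resp["UefiData"]
print(len(uefi_blob), "bytes of UEFI data")
```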

Another thing we've added is a TPM device. So a TPM is a Trusted Platform Module - the Nitro TPM exposes a standard TPM 2 conforming device that is part of the boot process, can extend measurements, tie the platform identity into it and use those measurements to generate cryptographic data that can then be used for other applications.

People have applications that rely on BitLocker or others and this is a way to use those.
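
Here's a sketch, again with boto3, of registering an AMI that boots via UEFI and exposes a NitroTPM 2.0 device to the guest; the snapshot ID is a placeholder:

```python
import boto3

ec2 = boto3.client("ec2")

# Register an EBS-backed AMI with UEFI boot and TPM 2.0 support.
# Instances launched from it can use the TPM for measured boot or
# applications like BitLocker.
resp = ec2.register_image(
    Name="tpm-enabled-ami",
    Architecture="x86_64",
    RootDeviceName="/dev/xvda",
    BlockDeviceMappings=[{
        "DeviceName": "/dev/xvda",
        "Ebs": {"SnapshotId": "snap-0abc1234"},  # placeholder snapshot
    }],
    BootMode="uefi",
    TpmSupport="v2.0",
)
print("AMI:", resp["ImageId"])
```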

Nitro has delivered a unique architecture that restricts operator access to customer data. And we've written down a lot of the aspects of this in this white paper called "The Security Design of the Nitro System". It's available on the AWS website.

And more recently, the NCC Group has performed an independent architecture review of the Nitro System and found that there's no mechanism for a cloud service provider employee to log in to or access customer content stored in instance storage and encrypted EBS volumes. So a really powerful statement.

So let's put some of this together. We've talked about various aspects: networking, storage, and security. How does it all work together?

So to use an example, I'm going to start with a Graviton 3 server. Graviton 3 is a little bit odd in that it has three sockets. Typically a server has one or two, but Graviton 3 has three. And the reason for that is really that for a certain power envelope of a rack, we could just stick more processors in there. So we did.

The end-to-end ownership of both Graviton and the Nitro system allowed us to develop a Nitro card that could talk not to one socket or two sockets, but to three. And all these three sockets have independent lifecycles that are managed by the Nitro card - they could be bare metal instances, they could be virtual instances.

So I'm going to use this picture for the next few slides. And on the left, we have the EC2 control plane. If you start an instance and call the RunInstances API, you're gonna talk to the EC2 control plane. There are a number of services in there to do things like authenticate your request, identify a pool of capacity that meets the requirements for what you've asked for - the Availability Zone, the instance size, the instance type - and ultimately identify a specific host that can run your VM.

On the right here, I'm showing a Nitro controller and some Nitro cards. And in this example, there are three VMs running - each with 64 cores, 128 gigabytes of RAM, 30 gigabits of networking and 20 gigabits of EBS bandwidth.

So let's walk through how an instance gets launched:

Your API call comes into the EC2 control plane and we find a specific host to target for that. And the control plane makes a request to the Nitro controller - "Please allocate some resources." In this case, 64 cores, 128 gigabytes of RAM.

The Nitro controller talks to the hypervisor and requests that those resources be allocated and reserved. When that's set up, the control plane then asks the Nitro cards to please attach the fundamental devices - your ENI device, your boot volume, any other volumes or network adapters that you've asked for as part of the instance. Those get created and connected to the Nitro cards. This all happens over PCIe. So there's no hypervisor interaction between accessing those devices and the actual communication with them.

And then a last API call - this is "Start the instance". The instance starts, it looks at its EBS volume, it finds the code to boot and it starts booting. Your instance starts running, it boots the OS, it talks on the network, and it's available.
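
The API call that kicks all of this off is RunInstances. A minimal boto3 sketch, with placeholder IDs and a Graviton3-based instance type picked for this example, looks like this:

```python
import boto3

ec2 = boto3.client("ec2")

# RunInstances triggers the flow above: the control plane picks a
# host, the Nitro controller reserves CPU and memory, the Nitro cards
# attach the ENI and the EBS boot volume, and the instance starts.
resp = ec2.run_instances(
    ImageId="ami-0abc1234",           # placeholder AMI
    InstanceType="c7g.16xlarge",      # Graviton3-based example
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0abc1234",       # placeholder subnet
    SecurityGroupIds=["sg-0abc1234"], # placeholder security group
)
print("Launched:", resp["Instances"][0]["InstanceId"])
```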

Now because we moved all that functionality off of the hypervisor, if you start a bare metal instance, the same thing happens except there's no hypervisor - you just get the whole host. And instead of telling the hypervisor to start the instance, we just boot up the main CPU.

A lot of people think of the CPU as the nexus of a system, but in our servers, we've kind of turned that around - the Nitro card has become the nexus of the system, it controls when the host boots.

Because all the functionality for EBS and VPC is handled outside the hypervisor, you get the same experience whether you're running on a bare metal system or on a virtual machine - you get the same devices, you get the same configurability like security groups. And we can take that even further:

For a couple of years now we've had Apple Mac based instances. And here too, if you opened up one of these servers, you'd find there's actually a Mac mini sitting in a server along with a Nitro card with Mac support. Thunderbolt is really just another way to transport PCIe. So there's a Mac, a Thunderbolt-to-PCIe bridge, and a Nitro card, all in a server, plus a little solenoid to push the power button on the host.

But this means that you, as a developer, can get the same access to Apple machines in the same environment that you expect with EC2 - you can have security groups, you can have the elasticity, configurability and monitoring that you've come to expect.

So, I've actually flown through this. As I wrap up, I want to share with you today that we're announcing a new Compute knowledge digital badge and learning path on AWS Skill Builder. This is a way to gain and demonstrate compute knowledge and skills.

And with that, I thank you for spending time with me today. I'll be hanging around here for a little while and I'm happy to take any questions offline. Thank you.
