Protect sensitive data in use with AWS confidential computing

Arvind: Welcome to "Protect Sensitive Data and Use with AWS Confidential Computing." My name is Arvind and I'm the specialist responsible for the Confidential Computing business in AWS. I will be jointly presenting this session with JD Bean, Principal Security Architect, and Alex Roiz, Software Development Engineer from Stripe.

Before I begin, I would first like to start by thanking every one of you for choosing to spend your time here with us today. You have a lot of choices in an event like this, but you chose us. I'm both humbled and grateful.

I also want to let you know one more thing today: I'm super excited because, for the first time, we have one of our customers with us to share their journey in building applications leveraging Nitro Enclaves. So you're not just going to hear from JD and me; you're going to hear from Stripe about their experience using Nitro Enclaves and confidential computing capabilities from AWS.

No matter where you are in your cloud journey, if you are in the business of building applications that process sensitive data and you're looking for mechanisms to protect this data, you are in the right session. If you're not in any of this business, but you still chose to be here, I applaud your curiosity. We'll make it count. We'll make sure you learn something new today.

Let me begin by first setting the agenda for what we're going to discuss today. I'll start by introducing our perspective on confidential computing and talk a little bit about the different security and privacy dimensions of confidential computing. Then we'll go ahead and introduce the Nitro System and talk to you about our no operator access assurance.

We'll then switch gears and talk a little bit about Nitro Enclaves itself and discuss a little bit about key features and benefits. Then you're going to hear from Stripe about their journey, their experience in building applications, leveraging Nitro Enclaves.

I'll then share a little bit more about other use cases, leveraging Nitro Enclaves and then close the session out with some resources for you to learn more.

So what is confidential computing? Every year I do this session, I try to ask for a show of hands. So I'm going to do that again. I want to see the numbers growing. The cult is getting bigger. How many of you know about confidential computing? As I thought. Two years ago when I did this, there were maybe two people in the room who raised their hand. So we're definitely doing something right. We're seeing over half the audience know about confidential computing. That's a win in my opinion.

Let me share our perspective on it at AWS. We define confidential computing as the use of specialized hardware and associated firmware to protect data in use from any unauthorized access. The key to focus here is protecting data in use.

If you take a step back and think about what we do with data, we do three main things with it: we store it, we move it, we process it. That's it. Everything we do with data can be bucketed under these. The mechanisms to protect data at rest and in transit have existed for a while. What we are doing with confidential computing is extending that protection and providing capabilities to protect this data when it's in use. So now you have end-to-end protection: data at rest, in transit, and in use. That's really the focus of confidential computing, and that's why we are focusing on protecting data in use today.

Upon speaking to customers and partners, we have identified two distinct security and privacy dimensions that customers want to protect their code and data from.

Dimension one is where customers want to ensure that their data remains protected from any operator from the cloud provider. In this case, that would be AWS.

Dimension two is where the customers want to protect their data even from themselves, even from admin level users or malicious actors who could pose as admin level users and gain access to data.

Now, all of this was a little bit of a mouthful with the definition, with the dimensions and everything. If you forget about everything I talked about today and just want to take away something:

Confidential computing = protecting data in use

Two distinct security and privacy dimensions:

  • Dimension one: you're protected from us
  • Dimension two: you're protected from you

That's it. That's the crux of the session. If you are clear about this, the rest of it is going to be easy to follow today.

So what are these data types that customers are concerned about protecting? Right, not all data is created equal - there's innocuous data and there's sensitive data. When we are talking about confidential computing, most often we are only talking about sensitive data, not everything needs the same kind of protection.

Here's a few examples of data types that are considered to be sensitive:

  • Personally identifiable information
  • Healthcare information
  • Financial information
  • Digital assets
  • Intellectual property like machine learning models

These are all what we see as being used most often when customers are looking at confidential computing capabilities, but the data types are not limited to this. I just put a few examples here for you.

None of this data is new to the cloud. That's what you have to remember. None of this is new to the cloud. This has been in the cloud for a while and customers have been processing and using this in the cloud for a long time.

But what's happening right now is there's more awareness around wanting to protect this data while it's being processed, be it due to regulatory and compliance requirements or be it because they want to up their security game, whatever the case may be. And the workloads are also evolving, the requirements are changing. That's why we are discussing more and more about confidential computing these days. That's why you're hearing more about it.

Ok. Now you know what confidential computing is, you know what the data types are, and you know what customers want to protect. But how are we doing it at AWS?

We have two distinct solutions:

  1. Our AWS Nitro System. The Nitro System is the foundation of all of the virtualization that powers modern EC2 instances. When I say modern, that's all the EC2 instances we launched since early 2018. And with the Nitro System, we address dimension one by default. So if you're running a Nitro-based EC2 instance today, your content is already protected from AWS operators. That's dimension one, no operator access.

  2. AWS Nitro Enclaves. With Nitro Enclaves, we provide you with the capability to spin up an isolated and hardened compute environment, which provides further isolation on top of what the Nitro System already provides. So you can isolate even yourselves from the data. And that's dimension two.

I'm now going to invite JD Bean on stage to talk to you about the Nitro System in depth. JD...

JD: As Arvind mentioned, I am JD Bean, I am Principal Security and Sovereignty Architect under our AWS Compute Services organization.

So I want to underscore something that Arvind said just to make sure that we all walk away with this as a baseline. If you are here to learn what you as a customer need to do to protect your sensitive data and code within an EC2 instance from access by an AWS operator, the answer is: use and select a modern EC2 instance for your workload.

That is a number of instances we released in 2017 and every single EC2 instance type that we have released from the beginning of 2018 onward. That's it.

Now, if that's what you're here for, you could take a break, check your email, spend a few minutes. If you're watching at home, feel free to skip forward a few minutes.

I hope that you don't, because what I want to talk about a little bit is how we've designed the system to enable that outcome.

In our shared responsibility model, this is about security and privacy of the cloud. So to understand the Nitro System, it's very useful to understand a little bit about the journey that got us to the Nitro System.

So pictured here is not the Nitro System. Pictured here is a high-level logical representation of what I'll call a classical virtualization system. Specifically, as you'll note from the word Xen there, this portrays the Xen hypervisor and uses terminology that is familiar in the Xen system.

Before we had the Nitro System, we were actually using Xen. And the last instance we released that looked like this was actually way back in 2012.

So we look at the system, and at the time it was released in 2012, this was a state-of-the-art design. Still today, the vast overwhelming majority of production virtualization systems in the world look more or less like that. But we looked at it and we said: this is a perfectly fine system, it meets our bar, but we want to take it further, and we saw some room for improvement.

One of the things we saw was that on the CPU cores of our virtualization system, we had to allocate quite a bit of resources, up to 30%, to doing kind of boring housekeeping tasks: things like converting customer EC2 instance network traffic, emulating a virtual device, encapsulating that traffic in our software-defined network, and shuffling packets around. And we have these really expensive general purpose CPUs that we wanted to be able to allocate fully to our customers.

So we started out with an idea, and that was to take our VPC networking component out. Actually, I'll take a quick step back here. What's going on in this picture?

So we have our Xen hypervisor; that's the core of the virtualization system. What it does is, more or less, manage a little bit of shadow state, allocate resources for virtual machines, and handle some of the privileged instructions, the things that you don't want a virtual machine to be able to do because they would undermine the security and integrity of the overall virtualization system.

The little orange boxes are the part that most of our customers care the most about - those are their EC2 instances. And then there's a bunch of other stuff. And all of it tracks back to this Dom0.

Dom0 exists to allow our virtualization system, that's the hypervisor, to actually be useful for customers - to provide things like I/O for networking or network attached storage like EBS or even locally attached storage. It also provides all the management, the governance, the security, the monitoring and the orchestration that allows our customers to put in an API request that says "EC2 run instance" and sort of get their very own virtual machine in the cloud and all of that stuff runs inside Dom0.

It's a special purpose virtual machine that's a copy of a general purpose operating system. For AWS, we use our own Amazon Linux.

So taking a look at the system, we saw that the VPC networking capability was using a lot of resources.

And we thought to ourselves: we think we can offer better networking performance for customers, and better compute performance and security for what's left on the box. So we took that component out, removed it from the system main board, and ran it in a new custom silicon device attached to the instance, what we called a Nitro Card.

So we moved VPC networking off to its own system on a chip, its own computer basically, with its own CPU and its own memory, custom-built and designed to operate in our software-defined infrastructure. And it worked really well. Customers really loved it, it achieved the ends we wanted, and we looked at the system and thought: maybe we can take it further.

The next natural choice was to look at our EBS storage. So we moved that off into a new custom-built card that actually worked quite similarly. On the back end, it was communicating over our internal networks to EBS storage servers; EBS is a network-provided storage layer. But on the local side, it was presenting the interface of a local NVMe drive to customer instances. So it was doing a lot of the same networking work on the back side, just presenting a different interface into the EC2 instance. And that worked really well.

So we kept going and we pulled off local storage as well. These are locally connected drives; we created new Nitro Cards and connected them up, and we were able to remove even more housekeeping tasks from the main CPU. And then we decided to really go for it. The last thing that was running in Dom0 now was the management, the security, the monitoring, all that orchestration software that makes a virtual machine system useful at scale for our customers.

So we pulled that off as well, into its own card, a special card we call the Nitro Controller card; it's sort of the queen. Dom0 started to look a bit like a vestigial organ; it didn't really have a purpose anymore. So we completed our journey of evolution and came up with something radically different: we removed Dom0, and at the same time, we also removed the Xen hypervisor.

We replaced it with our own custom, in-house-built Nitro Hypervisor based on KVM, the Linux kernel virtualization module. We now have this incredibly thin, I would describe it as firmware-like, hypervisor. Instead of having to run the Xen hypervisor along with a full copy of an operating system, we were able to remove any Amazon-managed general purpose computing environment from the system main board where we were running our customer workloads.

So the Nitro Hypervisor is super tiny. It lacks all sorts of things that you would expect from a general computing environment: it doesn't have a shell, it doesn't have a networking stack or a general purpose file system. It's very much treated and managed as an injectable firmware module. And from a performance standpoint, this meant we were able to dedicate the full resources of one of the underlying EC2 host servers directly to our customers. This provided huge improvements in things like jitter, performance, and the cost value proposition.

Now I could go on and on and on about all the incredible things the nitro system has allowed us to do. But since this is a confidential computing talk, I'm going to focus in on that very specific benefit that it provides, but do know that there's a ton of other incredible benefits that the nitro system has brought with it both from a security and privacy standpoint and from a performance standpoint.

So a quick overview. The Nitro System, we call it a system, is built of three main types of components. The first are the Nitro Cards. These are the cards I described earlier, custom-built silicon actually built and designed by Annapurna Labs, the same group that's responsible for Graviton chips. This is actually where our relationship with them began.

We have Nitro Cards responsible for local storage, VPC networking, and EBS storage. Integrated onto the system main board, we have a special component called the Nitro Security Chip. This has some interesting and really important functionality for our system: it allows the Nitro Controller card to bridge its root of trust onto the system main board.

So what happens is, when the system starts up, the Nitro Security Chip actually holds the system in reset and allows the Nitro Controller card to come in and validate every single piece of non-volatile storage on the system main board, to ensure that only known-good, trusted code is running on that box. In operation, it also provides defense in depth by blocking any writes to non-volatile storage from the main CPU.

Lastly is the Nitro Hypervisor. Now, technically, this one deserves an asterisk, as we do offer bare metal servers, thanks to the Nitro System, that don't involve any Nitro Hypervisor. But for our virtualized systems, which are the vast majority of our EC2 instances, we have this incredibly lightweight hypervisor. It provides basic memory and CPU allocation, sets up connections between the cards and the instances, the virtual machines, directly, and then steps back; it effectively is not there.

And so we have some really incredible metrics. You can go and compare the performance of a bare metal instance with its virtualized equivalent and see the incredibly slim performance difference. Now, is that a particular security benefit? No. But it's indicative of just how small and narrow the software that makes up the Nitro Hypervisor really is.

So on the right here, you actually get to see what a few of these Nitro Cards look like. They have evolved over the years; we're now in our fifth generation of Nitro chip. And this is the really important feature I wanted to underscore, because it's so critical to confidential computing and why the Nitro System is the fundamental basis of confidential computing throughout AWS.

In designing the system from the ground up to run at scale within our environment, we were able to make a really important choice on behalf of our customers, which is to provide no mechanism whatsoever for an AWS operator to log into the nitro hypervisor or any of the nitro cards.

Now, I want to be clear here: even going back in the Xen days, we had all sorts of controls and protections and monitoring to protect against any potential operator activity. However, with the Nitro System, we took that a step further, building in a technical restriction, in fact, the absence of a mechanism altogether to log in. The Nitro Hypervisor itself has no SSH server and, as I mentioned, no connection to our network. The only thing the Nitro Hypervisor can communicate with is the Nitro Cards themselves.

And similarly, the Nitro Cards also have no SSH server. Instead, the only communication they have with the outside world is through inbound API operations that are authenticated, authorized, logged, and audited. And what is incredibly important is that none of those APIs have the ability to access customer data in EC2 memory, encrypted EBS volumes, or encrypted network traffic.

Now, over the last few years, we've made some big strides in providing greater transparency and assurance to our customers around this design and its attributes. First is the security design of the AWS Nitro System, a detailed white paper going into a number of the topics I discussed today in greater depth, as well as covering a lot of the other security benefits that we didn't touch on. If you're interested, I encourage you to give it a look.

If you'd rather not read that paper but would rather see a third party's deep dive into the Nitro System, we brought in NCC Group, a well-respected security firm, to assess the AWS Nitro System architecture, to look at our artifacts, and to interview our most senior engineers. Ultimately, spoiler alert, they found no gaps in the Nitro System that would compromise any of the security claims we make about the absence of AWS operator access.

Finally, we also added these commitments into our service terms themselves. This isn't something that we did just for our biggest or most security-conscious customers; it's something that benefits each and every customer that signs up for an AWS account, whether they're a solo developer or an intelligence agency.

So with that said, I hope that you took away from this just how seriously we've taken the investment and development of the Nitro System in providing confidential computing by default for our customers.

With that, I'll invite Arvind back up to speak about Enclaves and the second dimension of confidential computing.

Arvind: Thank you, JD. So you've heard us talk about confidential computing, what our perspective on it is, and the two security and privacy dimensions from which you want to protect data. Now you know all about the Nitro System, so we can talk about Enclaves.

What is AWS Nitro Enclaves? To understand Enclaves, we'll have to first take a look at what a normal EC2 host looks like. And if you're getting the drift of this, we always like to compare with something so we can make our point. We saw that with the Nitro System too, right?

Alright. So on the screen here, we have this big white box, which is what your standard instance looks like today. Inside the instance, you have a bunch of different things: the OS, the application that's going to process your data, maybe some third-party libraries you leverage to build your applications, and different users who could have different levels of access to the instance.

Now when it's game time and you have to process data, you're going to bring encrypted data into the instance, but to process it, you will have to decrypt it. You can't process encrypted data at scale. Yet once you decrypt it, you're revealing plain text to all of the entities that are in the instance today. And that's really the model that you have to start thinking about.

Do you want any of those entities that have access to the instance to gain access to that data? And if you don't want that, if you want to replicate this environment that we created for you to create one for yourself, that's really when you start thinking about Nitro Enclaves.

With Nitro Enclaves, you have the capability to carve out CPU and memory resources from your own EC2 instance. If you're running an instance today, you can carve out CPU and memory resources from it to create a Nitro Enclave. By creating that, you're creating a hardened and isolated compute environment within which you can then proceed to decrypt your data, revealing plain text only to the application residing in the enclave so it can process that data.

With Nitro Enclaves, you're getting additional isolation on top of what you already got with the Nitro System. So that's what Nitro Enclaves is about. So really like when you start making your choices about what to use, when the question to ask yourself is what are you protecting and who are you protecting it from? The answer to that will guide the solution that you're gonna use here.
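As a rough sketch of what that carving-out looks like in practice: enclaves are typically built and launched with the `nitro-cli` tool on the parent instance. The image name, output path, CPU count, and memory size below are illustrative placeholders, not values from this talk.

```python
# Sketch of the Nitro Enclaves lifecycle as driven by the nitro-cli tool.
# The image name, file paths, CPU count, and memory size are illustrative.

def build_enclave_cmd(docker_uri: str, eif_path: str) -> list[str]:
    """Convert a Docker image into an Enclave Image File (EIF).

    At build time, nitro-cli prints the measurements (PCR hashes) of the
    image, which you can later pin in a KMS key policy."""
    return [
        "nitro-cli", "build-enclave",
        "--docker-uri", docker_uri,
        "--output-file", eif_path,
    ]

def run_enclave_cmd(eif_path: str, cpus: int, memory_mib: int) -> list[str]:
    """Carve CPU and memory out of the parent instance and boot the enclave."""
    return [
        "nitro-cli", "run-enclave",
        "--eif-path", eif_path,
        "--cpu-count", str(cpus),
        "--memory", str(memory_mib),
    ]

# Example: a 2-vCPU, 2 GiB enclave from a hypothetical application image.
build = build_enclave_cmd("my-enclave-app:latest", "my-enclave-app.eif")
run = run_enclave_cmd("my-enclave-app.eif", cpus=2, memory_mib=2048)
```

The resources named here come out of the parent instance's own allocation, which is why the instance type must be enclave-enabled and sized for both the parent workload and the enclave.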

Enclaves can also provide proof of identity; this is where attestation comes into the picture. The enclave brings with it an attestation document signed by the Nitro Hypervisor that you heard JD talk about. The attestation document contains measurements: hashes of the enclave image that you created and of the specific application running in the enclave that's going to process your sensitive data, along with the parent instance ID and some IAM information, like the role of the parent instance. You can also include user-defined information, like a nonce, if you're concerned about replay attacks and want to protect against them.

And with the attestation document, the enclave is now able to prove its identity to another entity it wants to establish trust with, before you start exchanging data or providing secrets to the enclave. We ourselves leverage this to provide first-class integration with AWS KMS.

Most often, we find customers using AWS KMS when they use Nitro Enclaves. To establish trust with KMS, the enclave provides the attestation document. When you built the enclave, you would have received these measurements from the enclave image file that you created.

Now you could set up your key policy in KMS and use it to check the attestation document you receive from the enclave. If the measurements in the attestation document match those in the key policy, then KMS knows it can share secrets with this enclave. Not just any enclave, but the specific enclave running the specific application that is set to process the sensitive data you're expecting it to process.

So we provide first-class integration with KMS, but you're by no means tied to AWS KMS. If you want to use your own key management service, you're free to. The only difference is you will have to write the plumbing we just talked about, matching the key policy and all of that. The attestation document is available for you regardless of whether it's KMS or not.
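To make the key-policy check concrete, here is a minimal sketch in Python. The role ARN, account ID, and PCR0 value are placeholders; `kms:RecipientAttestation:PCR0` is the kind of condition key KMS evaluates against the enclave's attestation document, and the helper at the bottom shows the equality check in miniature.

```python
# PCR0 is the hash of the enclave image (a SHA-384, 96 hex characters).
# This placeholder stands in for the value nitro-cli printed at build time.
EXPECTED_PCR0 = "0" * 96

# A key policy statement that only allows Decrypt when the request carries
# an attestation document whose PCR0 matches the pinned measurement.
# The principal ARN and account ID are hypothetical.
key_policy_statement = {
    "Sid": "AllowDecryptOnlyFromThisEnclave",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::111122223333:role/EnclaveParentRole"},
    "Action": "kms:Decrypt",
    "Resource": "*",
    "Condition": {
        "StringEqualsIgnoreCase": {
            "kms:RecipientAttestation:PCR0": EXPECTED_PCR0
        }
    },
}

def measurements_match(attestation_pcrs: dict[str, str],
                       expected: dict[str, str]) -> bool:
    """The check in miniature: every pinned PCR must match exactly.

    This is what you would reimplement yourself if you brought your own
    key management service instead of KMS."""
    return all(
        attestation_pcrs.get(name, "").lower() == value.lower()
        for name, value in expected.items()
    )
```

The point of pinning measurements this way is that the secret is released to a specific, reviewed enclave image, not to the parent instance and not to any other enclave.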

So, having heard about the Nitro System, confidential computing, and Nitro Enclaves, and how to use them: where do you use them? Let's talk a little bit about the features and benefits of Nitro Enclaves.

First is additional isolation and security. When I talk about this hardened and isolated compute environment that you're creating, there are some guidelines that come with it, some things that make it a hardened and isolated compute environment.

Number one, the enclave does not have any external network connectivity. It cannot talk to the outside world; it has to go through the parent instance to talk to the outside world.

Number two, there is no persistent storage. And number three, there is no root user access; there is just no interactive access. You cannot SSH into an enclave. In fact, the only communication channel that exists between the enclave and everything else is a secure local channel through which the enclave talks to the parent instance from which it was spun up.

So the thing to remember here is you've created an enclave-based environment to protect sensitive data and process it inside the enclave, which means that when you're bringing data from the external world into the enclave, it's incumbent on you to make sure that you're not revealing plain text to the parent instance. The parent instance only acts as a conduit to the enclave.

So wherever you store this encrypted data, you bring it still encrypted through the parent instance through that secure local channel into the enclave and then proceed to decrypt it. So the enclave is providing additional isolation by making sure there is no external connectivity and storage and all of that. But as you build your applications, as you bring data, as you take care of your data flow, you have to keep in mind that you are not revealing plain text to the parent instance.
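A minimal sketch of that conduit pattern, assuming the enclave application listens on an arbitrary vsock port and that messages are length-prefixed; both the port number and the framing are choices of your application, not requirements of Nitro Enclaves:

```python
import socket
import struct

# Arbitrary port the enclave application is assumed to listen on over vsock.
ENCLAVE_PORT = 5005

def frame(payload: bytes) -> bytes:
    """Length-prefix a message. vsock is a byte stream, so the enclave needs
    to know where one ciphertext ends and the next begins."""
    return struct.pack(">I", len(payload)) + payload

def send_ciphertext(enclave_cid: int, ciphertext: bytes) -> None:
    """Forward still-encrypted data from the parent instance to the enclave.

    The parent never decrypts; it only proxies ciphertext over the local
    vsock channel to the enclave's context ID (AF_VSOCK is Linux-only).
    Decryption happens inside the enclave, after it has obtained the key,
    for example via attestation against KMS."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as s:
        s.connect((enclave_cid, ENCLAVE_PORT))
        s.sendall(frame(ciphertext))
```

The design point is that plain text only ever exists inside the enclave's memory; the parent instance handles framing and transport, nothing more.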

The second big feature, benefit if you will, is flexibility. I talked about the ability for you to create enclaves by allocating CPU and memory resources, and you can scale this as you please. This depends on the size of the application that you're going to drop inside the enclave and the data payload that you're planning to bring inside the enclave to be processed.

Depending on that, you can choose how much CPU and how much memory you want to allocate to spin up this enclave and run your application in there. So there's a lot of flexibility there; there are a lot of instance sizes and types that we have made enclave-enabled for you.

And to me, the most exciting part of this flexibility piece is that it's processor agnostic. It does not matter if you're using x86 (Intel or AMD) or ARM-based CPUs in your fleet; you can spin up enclaves from any of these. We have support for all of these different CPU types as well.

And then lastly, cryptographic attestation. We touched a little bit on this: I talked to you about the attestation document that the Nitro Hypervisor signs and sends over. That helps the enclave prove its identity and authorizes the code that's running inside the enclave. And we have leveraged that ourselves by providing first-class integration with KMS.

So we've made it a lot easier to use with the native KMS integration, taking the heavy lifting off you if you have to do this attestation for yourself.

So now you know about Enclaves, and you know the features and benefits. So where is it really used? What kinds of applications take advantage of enclaves?

Here's a high-level look at it before I go into actual use cases and workloads: cryptographic operations, if you want to decrypt data, which we've been talking about since I started the session; signature validation; tokenization; masking; inferencing with machine learning models. These are some high-level constructs to think about when you're asking: OK, what are the use cases? Where do I apply this? Where does this really matter?

And here's just a smattering of different customers from various segments who are using enclaves right now, and I see some in the audience as well. So thank you for using it, and thank you for being here.

That said I now want to invite Alex from Stripe to talk to you about their journey about their experience using Nitro Enclaves in building applications that process sensitive data, Alex.

Alex: Thank you, Arvind. So my name is Alex. I'm a staff software engineer with Stripe. Raise your hands if you've heard of Stripe before. Well, almost everyone's hand went up. I was going to say, if you hadn't heard of us: if you've ever used a card to make a payment online or in person, there's a good chance that Stripe has facilitated a payment between you and a merchant.

Stripe is a company that offers payment processing services and APIs for applications ranging from point of sale to mobile devices. Millions of companies of all sizes use Stripe to accept payments, send payouts and automate other financial processes.

To do this to process billions of dollars of payments and move money for our customers, Stripe has to handle a lot of sensitive information. This includes information we get from customers like credit card numbers, bank account numbers, transaction details and PII such as email addresses and phone numbers.

But then we also have to actually move that money through the financial world and to do this, we need to communicate with other entities like banks and credit card networks. This means we must also protect the information we use to identify ourselves to these networks and also protect data in transit.

Lastly, we must protect the keys that we use to secure all of our data internally, both at rest and in transit. So I want to talk about what we're looking for in a solution for protecting our most sensitive keys.

We're looking for something that allows us to fundamentally restrict the access to this key material. In most cases, this means restricting it so that there's no way a human can handle the key directly. We want only our trusted services to have access to these keys or even knowledge of what these keys are.

We also want our ideal solution to produce an audit trail of when any of that key material is used and what it is used for. Lastly, we want to ensure that any software that interacts directly with those keys, that may use them and generate that audit trail, has been reviewed by at least two individuals and has gone through our standard release process.

So Stripe has historically used a variety of different tools and techniques for securing these keys. We've done everything from OS level hardening to at times actually using dedicated hardware security modules and the precise tool we use depends on both the sensitivity of the data and sometimes the compliance standards that we're subject to.

AWS Nitro Enclaves provided us a new opportunity or a new way to secure these cryptographic keys. And let's take a quick look at why this solution was a good fit for us.

As I mentioned, there's no external communication by default. This means we're able to control the channels through which the software running in the enclave communicates, and reduce the risk of exfiltration through channels we did not intend. As we mentioned, there's also no local storage, so you don't have to worry about key material getting written to disk, showing up in a core memory dump, and things like that. There's also no interactive access by default.

So this means that even in those rare cases where we want to allow debug access for engineers in production, we don't have to worry about an engineer accidentally being exposed to one of these keys.

Enclaves are also easy to provision and upgrade. I mentioned hardware security modules provide some of the best-in-class security; they're some of the safest places you can store keys, but they're very cumbersome to deploy, they're cumbersome to upgrade, and they don't play well with the tools that we use to manage other, more traditional Linux servers.

Enclaves, on the other hand, were really easy to provision within our existing AWS environment and played well with a lot of the existing tools we were already using. HSMs are also incredibly expensive, and Nitro Enclaves were great because they gave us a way to secure data by just securing the compute resources we were already using, rather than having to buy a box that would have to live somewhere else.

However, like HSMs, we wanted to have really tight control over the firmware and the software that runs in that secure context, and enclaves gave us a great way to make sure that we're only releasing keys to these enclaves when we're absolutely positive that we're running the correct software on them.

And lastly, because we are running in AWS, we wanted a solution that allowed us to use the AWS primitives we are already used to for things like authorization, authentication, and key management. With enclaves, we get that.

So let me walk you through the solution we built with Nitro Enclaves. When a typical service instance spins up, we set up a parent service, which is our typical application service that responds to requests from the outside, interacts with outside storage, and so on.

We also spin up a Nitro enclave, which is going to be the component that handles the key material directly. There are also a couple of support services related to processing network requests and logging, and I'll speak to those momentarily.

The reason for the separate enclave service in this context is that we want to make sure all the keys are handled in that environment, separate from the parent service. Our goal is to handle them in a process that is isolated from outside communication, can't accidentally write to local storage, and where we have tight controls on exactly what software is running.

The interesting part is how we get these keys into the enclave when we're starting up. So what happens when one of these instances bootstraps is that the enclave service will request from the parent service a KMS-encrypted copy of all the keys it needs to do whatever task it has been allocated.

The enclave takes these encrypted keys. It also gets from the Nitro Security Module a copy of its attestation certificate, and it requests that the Nitro Security Module include in that attestation document an ephemeral public key associated with that enclave.

The enclave then, through the proxy service, makes a request to KMS. KMS will validate the attestation document, ensure that the request was made from an enclave matching one of the allow-listed software measurements, and then re-encrypt the keys to the public key presented in that attestation document. Those are returned to the enclave, the enclave decrypts them with its ephemeral private key, and now it has the keys in memory that it needs to operate.
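The key-loading handshake described above can be sketched as a small mock. Everything here is illustrative and of our own naming (`MockNSM`, `MockKMS`, the XOR "cipher", the made-up PCR value), not the real AWS APIs: a real enclave asks the NSM device for a signed attestation document, and KMS uses RSA-OAEP against the ephemeral public key rather than XOR. Stripe's actual tooling was in Go; Python is used here only to keep the sketch compact.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """Toy stand-in cipher; real KMS re-encrypts with RSA-OAEP."""
    return bytes(x ^ y for x, y in zip(a, b))

ALLOWED_PCR0 = {"0f0e0d"}  # allow-listed enclave image measurement (made up)

class MockNSM:
    """Stands in for the Nitro Security Module: produces an attestation
    document carrying the enclave's measurement plus any public key the
    enclave asks to have included."""
    def attest(self, pcr0: str, public_key: bytes) -> dict:
        return {"pcr0": pcr0, "public_key": public_key}

class MockKMS:
    """Stands in for KMS Decrypt with attestation: validates the
    measurement against the key policy's allow list, decrypts the
    ciphertext, and re-encrypts the result to the ephemeral key."""
    def __init__(self, master: bytes):
        self._master = master
    def decrypt_for_enclave(self, ciphertext: bytes, attestation: dict) -> bytes:
        if attestation["pcr0"] not in ALLOWED_PCR0:
            raise PermissionError("enclave measurement not allow-listed")
        plaintext = xor(ciphertext, self._master)          # KMS-side decrypt
        return xor(plaintext, attestation["public_key"])   # re-encrypt to enclave

class MockEnclave:
    def __init__(self, nsm: MockNSM, pcr0: str):
        self._ephemeral = os.urandom(32)  # never leaves enclave memory
        self.attestation = nsm.attest(pcr0, self._ephemeral)
    def unwrap(self, wrapped: bytes) -> bytes:
        return xor(wrapped, self._ephemeral)

# The flow: the parent hands the enclave a KMS-encrypted key; the enclave
# presents its attestation to KMS and unwraps the result in memory.
master = os.urandom(32)
data_key = os.urandom(32)
stored_ciphertext = xor(data_key, master)   # what the parent service holds

kms = MockKMS(master)
enclave = MockEnclave(MockNSM(), pcr0="0f0e0d")
wrapped = kms.decrypt_for_enclave(stored_ciphertext, enclave.attestation)
recovered = enclave.unwrap(wrapped)
assert recovered == data_key
```

The important property the mock preserves is that plaintext key material only ever exists inside the enclave, bound to an ephemeral key KMS saw in a measurement-checked attestation document.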

After we do this, we mark all the services as healthy and we're ready to serve requests. For availability and latency reasons, we only do that key loading at startup. Once an enclave in our case is bootstrapped, we run with all the keys we need in memory, and this is to avoid round trips to KMS each time we have to handle a request. There are a couple of other things on there.

We have the proxy service, which will also proxy requests to our observability platform. And we have a logging service, which facilitates writes to a log on disk from the Nitro enclave. We'll talk more about those in a little bit.

So what did we have to build ourselves to make all of this work? First, we had to build some client-side tooling to run within the enclave itself. Stripe uses a few different languages, but our initial use cases were targeting services that we had written in Go.

We wanted to avoid using the AWS-provided Rust SDK through C bindings, as C/C++ had been a source of frustration for us in the past. So we ended up building our own tooling in Go for allowing the enclave to communicate with the Nitro Security Module and make calls to KMS.

We also, in our case, built our own TCP and UDP proxy. While AWS did provide a TCP proxy, we needed UDP support for some of the observability solutions we were using.
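At its core, such a proxy is a byte-level relay between two sockets. The sketch below is a minimal illustration of that relay (the name `relay` is ours, not from any AWS tooling); a socketpair stands in for the vsock leg so the sketch stays portable, though Python does expose `socket.AF_VSOCK` on Linux since 3.9.

```python
import selectors
import socket

def relay(a: socket.socket, b: socket.socket, bufsize: int = 4096) -> None:
    """Copy bytes in both directions between two connected sockets until
    either side closes. In a real enclave proxy, one side would be an
    AF_VSOCK socket facing the enclave and the other a TCP (or UDP)
    socket facing the network."""
    with selectors.DefaultSelector() as sel:
        sel.register(a, selectors.EVENT_READ, data=b)
        sel.register(b, selectors.EVENT_READ, data=a)
        while True:
            for key, _ in sel.select():
                chunk = key.fileobj.recv(bufsize)
                if not chunk:      # peer closed; stop relaying
                    return
                key.data.sendall(chunk)
```

In use, the parent runs one `relay` per connection, pumping bytes between the vsock side and the outside network; UDP needs a datagram variant of the same idea, which is the part AWS's TCP-only proxy didn't cover.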

And although we could have used a network proxy to ship logs directly from the enclave to our log store, for performance reasons we wanted to aggregate the logs on the parent instance. So we built another logging proxy that would allow us to output our log messages to a file on the parent instance before we ship them off to our log store.

The last thing we built, which was sort of interesting, is that we really wanted to maintain the invariant within Stripe that all communication between services happens over TLS. We saw the opportunity to establish trust between the parent service and the enclave by building off the attestation primitives once again. What we did is, additionally on startup, the enclave will generate a self-signed TLS certificate, which it sends as part of an attestation document to the parent, just like with KMS. The parent validates that the attestation document matches the measurements it expected, and will then use that certificate for future communications with the enclave. In this way, the parent service can be confident, when it reaches over that TLS connection, that it is communicating with the correct enclave. It also avoids having to do the attestation dance for each individual request.
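That certificate-in-attestation handshake amounts to measurement-checked certificate pinning, which can be modeled as below. Everything here is a mock of ours: a real enclave would generate an actual X.509 certificate and have the NSM sign an attestation document carrying the certificate (or its hash) in a user-data field, with random bytes standing in for DER here.

```python
import hashlib
import os

ALLOWED_PCR0 = {"0f0e0d"}  # expected enclave image measurement (made up)

def fingerprint(cert: bytes) -> str:
    return hashlib.sha256(cert).hexdigest()

def enclave_startup(pcr0: str) -> tuple:
    """Enclave side: generate a 'self-signed certificate' (random bytes
    stand in for real DER) and bind its fingerprint into the attestation
    document's user data."""
    cert = os.urandom(64)
    attestation = {"pcr0": pcr0, "user_data": fingerprint(cert)}
    return cert, attestation

def parent_pin(attestation: dict) -> str:
    """Parent side: check the measurement once at startup, then pin the
    fingerprint for all future TLS connections to the enclave."""
    if attestation["pcr0"] not in ALLOWED_PCR0:
        raise PermissionError("unexpected enclave measurement")
    return attestation["user_data"]

def parent_verify(presented_cert: bytes, pinned: str) -> bool:
    """On each TLS handshake: accept only the exact pinned certificate."""
    return fingerprint(presented_cert) == pinned

cert, att = enclave_startup("0f0e0d")
pinned = parent_pin(att)
assert parent_verify(cert, pinned)
assert not parent_verify(os.urandom(64), pinned)
```

The design choice this captures is doing the expensive attestation check once, at startup, and letting ordinary TLS carry the trust for every request afterward.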

One interesting little performance hiccup we hit while deploying this: we did some early proof-of-concept testing before deploying a more production-ready service into a Nitro Enclave. When we deployed that more finalized version, we noticed there was a regression in the performance we were seeing from the enclave, especially in terms of latency and throughput. After a bunch of ad hoc debugging and trial and error, we were curious when we noticed that when we disabled logging, the performance issues went away. This confused us, because we knew we were definitely not saturating the overall network bandwidth between the parent service and the enclave. After doing our own research and having some conversations with engineers at AWS, we came to understand that what we were running into was a limitation of the Linux vsock implementation: it did not efficiently handle multiple concurrent network streams.

And the reason for this is that all traffic, regardless of the ports involved, is multiplexed over the same underlying queue in a vsock. As we described, we have different proxies running for different things: we're sending logging information, we're doing the actual RPCs, and we're also sending observability data. Our log statements were sitting in the same queue as our API responses, which was gumming up the works. In our case, it just meant that we had to be much more frugal when deciding what we were going to log. We couldn't be the prolific loggers that we were in our other production services, but we made it work, and we deployed this all to production earlier this year. Since doing that, we really haven't thought about enclaves at all, which is exactly what we wanted.
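Being frugal with logs on a shared vsock queue can be as simple as putting a token-bucket gate in front of the logging proxy. This sketch (the `LogBudget` name and numbers are ours, not Stripe's) drops over-budget log lines outright rather than queueing them, so they can never sit in the shared queue ahead of API responses.

```python
import time

class LogBudget:
    """Token bucket: admit at most `rate` log lines per second, with a
    small burst allowance. Anything over budget is dropped rather than
    queued, keeping the shared vsock queue free for RPC traffic."""
    def __init__(self, rate: float, burst: float, now=time.monotonic):
        self.rate, self.burst = rate, burst
        self._tokens = burst
        self._now = now
        self._last = now()

    def allow(self) -> bool:
        t = self._now()
        # Refill tokens for the elapsed time, capped at the burst size.
        self._tokens = min(self.burst, self._tokens + (t - self._last) * self.rate)
        self._last = t
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False

# With a frozen fake clock, a burst of 5 admits exactly 5 back-to-back lines.
clock = iter([0.0] * 10)
budget = LogBudget(rate=1.0, burst=5.0, now=lambda: next(clock))
admitted = sum(budget.allow() for _ in range(8))
assert admitted == 5
```

A proxy would call `allow()` per log line and silently drop (or count) the rest; sampling by log level is the obvious refinement.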

A few things we did notice, though. One was performance: we experienced no overall performance overhead from adding enclaves to our solution, and we actually saw more consistent latency in our applications, because we eliminated a network hop where previously we were sometimes going to a different host to do some of the cryptographic operations. We greatly simplified our deployment infrastructure, and we were able to remove a lot of the complexity we had built ourselves to try to harden hosts at the OS level; instead, we can now just use the Nitro Enclave primitives. We reduced toil for operators by making it really easy to replace instances and do OS upgrades, because the keys can just be loaded into the enclave itself, because the trust is built in.

And related to this, we were actually able to remove a critical failure risk: because the instances can come up automatically, we can auto-recover from failures much faster. In one case, we were able to reduce the time to recovery from hours to minutes, because again, we can just trust these enclaves to bootstrap themselves. So, thank you. This was our journey with enclaves, and I'd like to hand it back to Arvind to discuss some other customer use cases. Thank you, Alex.
