Defense in depth: Securely building a multi-tenant generative AI service

My name is Eric Brandwein. I'm a Distinguished Engineer with the Amazon Security team, and I'm going to tell you about how we build defense in depth into one of our new generative AI offerings. It's called CodeWhisperer Customizations. It gives you the ability to tune CodeWhisperer on your own code base such that it gives you locally appropriate code completions.

As is often the case in talks like this, I'm taking credit for the work of others. There's a large set of amazing teams that have done a bunch of incredible work and I'm just the one that gets to talk to you about it.

In addition to being a computer geek, I'm a photographer, or at least I'm an amateur photographer, but I'm still a computer geek. So I find ways to combine the two. I used to tell people that these were the hot springs at Hverir in Iceland, but I was recently back in Iceland and I learned that hverir is actually the Icelandic word for hot springs. So these are the hot springs at hot springs.

One of the easiest ways to chew up CPU time with photography is by stitching panoramas. In ancient times before your phone would do this for you automatically, you'd take multiple pictures and then you'd load them up on the computer. I used a piece of software called Hugin and it would seamlessly stitch them together into a single image. This is fascinating stuff. You have to deal with misalignments in the position of the pictures, parallax errors, lens distortion, what projection you want the final panorama projected onto, and more. And today, either your phone does it for you automatically or most of the time you can do it in Photoshop with like three clicks.

Regardless, when people talk about stitching pictures together, this is usually what they mean. This is the view of Cornell's campus looking north from the bell tower. But this isn't the only way to stitch pictures together. Sometimes you have a scene like this one where the camera lacks sufficient dynamic range. The ratio of the brightest pixel to the darkest pixel is called dynamic range, and the camera lacks enough of it to capture what you see. There's no setting that's right here. The sky is blown out; there's no detail in it and it can't be recovered. And the rocks in the foreground are underexposed. If you were to crank the brightness on them, it would just be noise.

And so what you can do is take multiple images with different exposure settings. The one on the left is about three stops darker; the one on the right is about three stops lighter. And so I've got the correctly exposed sky, midground and foreground. And then if you merge them together, you get an image that you couldn't possibly have taken with your camera. It's one that's much closer to what your eye sees. It's a great way to chew up even more CPU cycles. And it's also something that your phone now does for you automatically; it's called HDR (High Dynamic Range).

I had trouble wrapping my brain around what HDR was doing until I realized that it's just like stitching a panorama except you stitch a panorama in x and y and you stitch HDR in depth. And so it's just stitching in a different dimension. Once I had that realization, it clicked for me.

This is an arrow slit in a castle in Conwy, Wales. And of course, if you really want to bring your laptop to its knees, you do both at the same time. If I recall correctly, this panorama is nine images wide, two images tall and five images deep, for a total of 90 source images. It's wider, taller and deeper than anything you could have taken with a camera. And it just about burned out the fan on my laptop. It was a spectacular render. This is the Franschhoek Valley in South Africa.

So that's great, fascinating stuff, Eric, but you came here for a security talk, not a photography lecture. How do they relate?

Well, panorama stitching in multiple dimensions: I want to talk about defense in depth in multiple dimensions. This is a picture of the Swiss cheese model of failure. It was originally proposed by James Reason in 1990. Each of your mechanisms for preventing some failure is like a slice of Swiss cheese. Some number of inbound failures are going to hit the cheese. Some number of inbound failures are going to hit one of the holes and be let through. And so you stack multiple slices of cheese and they stop more and more potential failures, but no matter what you do, no matter how many slices of cheese you stack, there's always going to be a path through where something can happen.

And so in security, this is often called the attack chain or the kill chain. It's a good model. And usually when people say defense in depth, this is what they're talking about: they mean having more slices of cheese. They're requiring an adversary to break something else, to get past another gate, before they can pivot and go on to further access. This makes it harder for a security issue to occur, and it increases the odds that the defenders detect it and stop it. Again, it's a good model, but it's stitching together security mechanisms in just one direction, just like stitching a panorama.

We don't want this service to just be secure when we launch it. We want it to be secure as the team changes, as the service grows, as people come and go, as the demands of our customers increase. And so we want defense in depth. A common tool when defending something is a threat model. This is where you lay out explicitly the threats that you're going to protect against, how those threats apply to your service, and what mitigations you have in place. This talk is essentially walking through the threat model of CodeWhisperer Customizations.

The first step is laying out the threats. This list isn't exhaustive, but it's enough to motivate this talk. So clearly, we're worried about external actors. These are the people that are outside the company. Maybe they're wandering down the hall, checking every single door; maybe they're specifically targeting us or specifically targeting you. But they're obviously one of the threats about which we're concerned.

We're also concerned about insider threat. Usually when people talk about this, they're referring to an employee with privileged access who has been subverted for an economic or ideological purpose and is actively working on behalf of some external party. Closely related to this is credential theft, where some outside actor has stolen the credentials of an employee and is acting as them. The employee isn't actually malicious.

We're also concerned with insiders with no malicious intent. Security mechanisms are often subtle, and the threats they defend against are not always deeply understood by everyone. The membership of a service team turns over in the course of years. People join, people leave; the set of people that made the decisions at design time isn't the set of people on the service team today. And so changes made by an authorized service team member acting in good faith, just trying to get new features out, may actually wind up weakening one of our security guarantees. We have to protect against this as well. This is one of those other dimensions of depth that we worry about: it's depth across time, across developers.

And also, of course, we're worried about software correctness. This most often comes up when talking about publicly known issues in software, where there's a CVE, a Common Vulnerabilities and Exposures entry. It's just a database that maps an identifier to a software package and version, and most publicly available security scanners run off of CVEs. But we're also worried about issues in the software that we write ourselves. There's never going to be a CVE issued for one of the microservices deep inside of CodeWhisperer; that's our code, no one else is ever going to see it. And so we have to do all of the things to make sure that the things we launch, the things we build, are secure, that they meet the bar.

And finally, we have to be secure in the face of hardware failures. At our scale, improbable events happen daily and so-called impossible events occur with regularity. If something goes wrong, telling our customers, oh, that's not how it should have happened, is not going to be satisfactory. You expect us to deliver, and we have to deliver even in the face of hardware that doesn't behave the way it's supposed to behave.

So an invariant is a statement that's always true. A security invariant is just an invariant about security. This is one of the tools that we use whenever we build anything internally. It really helps to just write down the statements that you think are always going to be true about your service. It makes your assumptions explicit, and it leads to a clearer conversation with everyone. Writing down the invariants is important, but it doesn't really become a security invariant until you have a test for it. And ideally that test is running in production, validating the thing that you're shipping to customers.

And so one of the most important mental shifts that you have to make when you're thinking about security invariants is that most builders are very good at positive testing: make sure that it does the thing that I want it to do. When you're thinking about security invariants, you also have to think about what are the things that I don't want to have happen, and you need to make sure that the bad things don't happen. You have to perform negative testing.
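Here's a minimal sketch of what that shift looks like in practice. The bucket names and the test harness are hypothetical illustrations, not CodeWhisperer's actual tests; the point is that the suite contains an explicit test asserting that the forbidden thing is refused:

```python
# A sketch of positive vs. negative testing with pytest and boto3.
# Bucket and key names are hypothetical.
import boto3
import pytest
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def test_tenant_can_read_own_object():
    # Positive test: the thing we want to happen does happen.
    body = s3.get_object(Bucket="tenant-a-bucket", Key="data.txt")["Body"].read()
    assert body  # we got our own data back

def test_tenant_cannot_read_other_tenants_object():
    # Negative test: the thing we must never allow is actually refused.
    with pytest.raises(ClientError) as err:
        s3.get_object(Bucket="tenant-b-bucket", Key="data.txt")
    assert err.value.response["Error"]["Code"] == "AccessDenied"
```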

So these are the top-level security invariants chosen by CodeWhisperer Customizations. These aren't specific enough to actually write tests against, and they're going to turn into a much larger catalog of more specific security invariants. For example, for the first one up there, you could say: the container that ingests customer code from S3 terminates after handling a single request. That guarantees that that container is going to run once. If something bad happens, the container evaporates, we instantiate a new one, we start fresh. That's something we can test.

Typically we'll have unit and integration tests. There's a whole gamut of things that this code has to run through before it can be pushed to production. But we also want to have a test live in production. For the example invariant that I just gave, we could have that software emit a metric for the number of requests it has handled. It's going to sit at zero while it's waiting for work. Then the one request comes in, the metric increments to one, and it sits at one while it's working, and then the container should evaporate. If it doesn't, if it doesn't terminate, if we ever see a two in that metric, we get alerted.
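A sketch of that production test, assuming a hypothetical CodeIngestion namespace and RequestsHandled metric (neither is CodeWhisperer's real telemetry), might look like this:

```python
# The "runs once" invariant as a CloudWatch alarm: if the per-container
# request counter ever reaches 2, the container handled a second request
# instead of terminating, and we page. Names and ARNs are hypothetical.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="ingestion-container-handled-more-than-one-request",
    Namespace="CodeIngestion",            # hypothetical namespace
    MetricName="RequestsHandled",         # hypothetical per-container metric
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=2,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",      # no data just means no work yet
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:security-invariants"],
)
```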

There's a lot we could cover, and so some stuff is going to have to be given short shrift. It doesn't mean that it's not important, but it does mean that it's not in this talk.

The easiest way for an adversary to get into your systems is a publicly known vulnerability. You don't get to talk about cool defense in depth until you have a robust patching program in place. AWS has a robust patching program in place. Patching is not anyone's favorite topic to talk about, not even in the security community. It is hard and you just have to do it. Patch everything.

One of the nice things about running in a VPC is that network segmentation is ubiquitous. In the good old days, the rack that your server was in probably dictated which network segment it was on, and if you wanted to add a firewall between two boxes, you might have to fish cables around, talk to the networking team, talk to the firewall team. In a VPC, you can just say: this is in a security group, these are the rules, and you can change those rules whenever you want. So you can shrink-wrap network segmentation around the various services that make up your offering. And you should just do this. You should absolutely run network segmentation ubiquitously in VPC. We have a bunch of tools that will help you get this done.
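As a minimal sketch of that shrink-wrapping, assuming hypothetical names and IDs, a security group can admit traffic only from one upstream component and nothing else:

```python
# A security group that only the control plane's security group may reach,
# on HTTPS, and nothing else. VPC and group IDs are hypothetical.
import boto3

ec2 = boto3.client("ec2")

sg = ec2.create_security_group(
    GroupName="indexing-workers",
    Description="Only the control plane may reach the indexing workers",
    VpcId="vpc-0123456789abcdef0",        # hypothetical VPC
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        # Reference the caller's security group instead of an IP range, so
        # the rule tracks instances as they come and go.
        "UserIdGroupPairs": [{"GroupId": "sg-0aaaabbbbccccdddd"}],
    }],
)
```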

Another nice thing about running in AWS is the flexibility of our identity system. You can give each of the components in your service exactly and only the rights they need to operate. Again, we've got all sorts of tooling to help you write good policies, to minimize policies, to understand the impact of changes to your policies, and there have been many talks on this at this conference. All three of these are defense in depth mechanisms. We do them, but they're not the interesting things that I want to tell you about today, so I'm not going to tell you about them today.
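For concreteness, "exactly and only the rights they need" tends to look like a policy of this shape; the queue and table ARNs here are hypothetical:

```python
# A minimal least-privilege policy sketch for a component that reads one
# queue and writes one table, and can do nothing else.
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
            "Resource": "arn:aws:sqs:us-east-1:111122223333:work-queue",
        },
        {
            "Effect": "Allow",
            "Action": ["dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/results",
        },
    ],
}
print(json.dumps(policy, indent=2))
```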

We've got a world-class application security team, and every single customer-facing service or feature gets deep engagement from that team. Everyone calls this an AppSec review, but I hate that term because it sounds like the team brings us something and asks us to review it, asks us to put our stamp of approval on it. That's not how it works. The security team engages with the service team before they start building.

We have the working backwards process: the first thing you write is the press release and frequently asked questions. That's when the security team gets engaged, and they work together to design a secure service and the test suite that we need to ensure that it remains secure. We also penetration test every single customer-facing service. We send people that work for us, but they are evil and they carry pointy sticks, and they go and poke our services and make sure that they're suitable for purpose.

And we have a test framework that automatically ingests the AWS SDK. Any service with even a reasonable number of APIs is going to wind up in the AWS SDK. We're going to automatically ingest it and generate a couple thousand test cases for that service that are run automatically by the security team. If there's more complicated logic that needs to be tested, we'll work with the service team for them to build their own bespoke tests. But the fundamental stuff is built automatically as soon as the APIs show up in our internal version of the AWS SDK, and run automatically, constantly, in production, in every region, by the security team.

In order to understand the security that we've put in place, you need to understand what a confused deputy is. This isn't a Barney Fife reference; it's a real, honest-to-god security problem. So let's say that we've got the AWS Simple Data Getter. The valuable service provided by Simple Data Getter is that you no longer need to go all the way to S3 to get your data. You can ask the Simple Data Getter to fetch it for you. You've configured Simple Data Getter by giving it a role in your account to assume, and this role has access to your top-secret bucket.

Now you can ask Simple Data Getter: can you please go get my data? And it can reach out to S3. Everything's working and you're happy. But this is a multi-tenant service. So I come along. I'm widely regarded as a terrible person. I configure the Simple Data Getter so it can access my kitty-pictures bucket. And so now I can ask the Simple Data Getter to get me my kitty pictures, and it reaches out to S3 and it gets me my kitty pictures, and all is well. Now, your data is top secret. You've locked it down. If I try to call S3 and get your data, I'm going to talk to the hand; I'm not going to be able to get your data. That is exactly the purpose of our access control model. Of course I can't get your data, but the Simple Data Getter can. So all I need to do is send a carefully formed request to the Simple Data Getter, and it's going to reach out to your top-secret bucket, because it has access to your top-secret bucket.

I am using the Simple Data Getter as my deputy, and it is acting in a confused fashion. This is a confused deputy attack. Now, this is clearly a contrived example. We would never build this trivial service; it serves no actual purpose, and hopefully the problem is obvious even to non-security practitioners. But in the real world, confused deputy issues can be incredibly subtle and you need to think about them.

So let's actually talk about CodeWhisperer Customizations. Let's get a high-level view of it. I've left out a bunch of details and I've lumped some internal components together so that it fits on the slide. If you work on the CodeWhisperer team and are watching this, I'm sorry about what I've done to your work.

First, a user calls in to create a customization. They give us a name for their new customization and a pointer to their code. This can be in a variety of locations, code repos, things like that, or it can be in an S3 bucket, which is the example that I'm going to walk through today. The other cases are largely parallel to this one. I've labeled this box AWS Step Functions, but in reality it's a complex orchestration of Lambda and Fargate and AWS Batch all running under AWS Step Functions. It's not important for this talk exactly how this control plane is constructed.

So the CodeWhisperer front end calls into the control plane, which then reaches out to S3 to get the code that we're going to index. And the result of this training is going to wind up in Amazon OpenSearch Serverless and an S3 bucket owned by the CodeWhisperer service.

So now our end user, our developer, comes along, and we need to add the CodeWhisperer model hosting service to our diagram. The developer calls in. It's not actually the developer calling in; it's going to be automatic in their integrated development environment, their IDE. But they call in. They ask for completions. That calls into Step Functions, which is going to talk to OpenSearch Serverless and to model hosting, and it's going to return completions. That's it. That's a high-level overview of how CodeWhisperer Customizations works. This is what we have to secure.

And if we were only building one of these, this would be easy. But we have to build this as a multi-tenant service. It would be really easy if I could fire up a bulldozer and make each of you your own physical data center, and then rack servers and build a whole bunch of single-tenant instances of this. None of you would like the latency of delivery of that, or the cost structure. And so we build a single multi-tenant instance of this per region. That's the crux of the engineering challenge here.

So let's dive in. The first thing that we're going to look at is employee access. You're paying us to operate this service for you. You expect us to respond when something goes wrong, when there's an outage. And so some level of employee access to the service is expected and appropriate. How do we manage that access?

If you take one thing away from this talk, please use MFA. Go get a real MFA token. If you're protecting anything of value, anything that you offer to customers, use MFA. At AWS, we require hardware tokens, U2F or FIDO tokens, for access to any of our production infrastructure, for every code commit, things like that. The easiest way in is an unpatched system. The second easiest way in is a weak credential, a weak password. Humans are bad at picking strong passwords. We do not get upgraded often; there are no patches available. We're bad at picking strong passwords for multiple sites. We're bad at remembering high-entropy strings. Don't make humans do that; go use MFA. Enabling MFA is probably the highest-ROI security work that you can do. If you're protecting an account that you use to order lunch, or you're logging in at the local hardware store, use whatever you want. But if you're protecting anything where the value of what you're protecting exceeds the cost of a SIM-jacking attack, SMS is not MFA. We also have a years-long investment in an operational tool that we call Mechanic.

Most host accesses are for predefined operations: checking some stats, tailing logs, wiggling the wires, troubleshooting. We know how to do this. We've got templates for doing this. And so the easy way to do this is just to SSH into the box. But we all talk about least privilege, and access to the box to run arbitrary Unix commands is way more access than you need to just tail the logs. So Mechanic is a system where we have all of these predefined verbs. The verbs are classified as to how sensitive they are, and based on the sensitivity of the verb, there are different levels of approval required to run it. The verbs are parameterized: you can tell it which log file you want to tail or which process you need to kill, things like that. And we have moved a tremendous amount of our human interaction with our services to Mechanic, to the point where we've driven down the noise floor and we can talk about every access that is not driven by Mechanic.

So there's still a long tail of accesses where operators get on the box, but we can talk about those with the humans involved and with their leadership. And then there's a feedback loop where we build new verbs based on the things that people are still running manually. But what about a witting insider? What about an unwitting insider? If you know what a blind XSS is, or the security engineer's favorite command line, curl pipe bash, how do you deal with that? What about the $5 wrench? If someone wants in badly enough, they are going to gain access; they are going to get an employee to give up their credentials or turn over access to their laptop.

So that's one of the reasons that we have what we call contingent authorization. The idea is that nobody, zero people, have persistent access to these servers. If I have to access a server, either I have to use Mechanic or I need some level of approval before I can run the commands, before I can get on to that box. And the justification for that access might be what we call an MCM, a change management record. It might be a pager ticket. It might be a customer support case. But I have to show evidence that my access to this box at this time is appropriate. And if I don't have that evidence, some operational event or whatever, I need a peer on my team, or perhaps even someone in leadership, to approve my access.

We also have a system that we call Chronicle. At the very bottom of it is the Linux auditd, but we capture every command invoked by our operators on every host. And then we've got this giant machine that streams these logs off of the box and runs them through all sorts of rules and summarizations and flags anomalies and things like that. The typical latency between a human running a command and us being able to cut a ticket based on that command is tens of seconds. So between Mechanic and Chronicle and contingent authorization, we're scrutinizing every single access to the box, not just what the person is accessing, but what they're doing once they've accessed that box. And we review this with every level of the company, and that's not hyperbole.

We have a quarterly meeting with Andy Jassy where we go through our contingent authorization and Mechanic metrics. This is something that the company takes very seriously. And so as a result, it's very hard for you to masquerade as me. You need my hardware token and PIN and access to a baselined device that passes our posture checks. If you can act as me, or if you've subverted me with a $5 wrench or a duffel bag full of cash, I still have no access. You're going to have to subvert one of my peers in order to get approval for the access that you want. And so maybe you've subverted two of us. In that case, every single thing that I do is going to be logged and analyzed and reported on. And so the ability for an adversary to act as an employee, intentionally or otherwise, is getting smaller and smaller. This is defense in depth.

It's both cultural and mechanistic. You can't act as me. If you can, you can't get to where you want. If you can, you'll get logged and reported on. New employees have to follow these processes, and these processes help our operations; they make our systems better. And so the builders lean into this: they want fewer outages, they want fewer fat fingers. The culture of the organization, the culture of the team, drives this, and it gives us high confidence that over time this will be maintained.

So we need to get to your code in S3. How do we do that? In the early times, what you would do is add CodeWhisperer to your bucket policy, or you'd give CodeWhisperer a role in your account, and CodeWhisperer could go fetch from your bucket. That's a potential stumbling block. The person that's setting this up may not have permission to change the bucket policy; they may not be familiar with creating policies. But it's also more access than we need. We only need to access that bucket once, when the customer calls us.

And so we've built something called forward access sessions (FAS). This enables us to act on behalf of a customer, but only when they've sent us an API call and asked us to do work on their behalf. In order to do this, we need to introduce some new components.

We've got IAM, the Identity and Access Management service. You all use this whether you know it or not. And we've got a service that I've called the identity firewall. This is one of the microservices in the control plane of CodeWhisperer. This microservice is the only piece that has the permissions required to access IAM, KMS, etc.

So as before, the customer calls in. This is a signed API call, so there's going to be a signature attached to it. And now our front end can call the identity firewall, and that's going to call IAM and ask for a FAS.

Now, what we've done here is we've passed in the digest, the hash, of the request that came in from the customer. We don't have to pass the whole request, which is potentially very large, but we've passed in enough to validate the signature, and we've also passed in the signature itself. So the IAM service can validate that there's actually a customer that has called us and that wants us to do work.

We passed in the profile. The profile is agreed upon by the service team, the security team and the identity team, so that no one person can set up a profile. And then we've also passed in a scope-down policy. When we set up the profile, we didn't know what bucket a customer was going to use, so we had to put s3:* as the resource. But now that we know which bucket the customer is going to use, we can pass in a narrower policy when we request the FAS.

And the result of this, if it all checks out, is a standard set of short-term AWS API keys that give us access to that one bucket for a limited period of time. When those keys expire, they automatically go inert. In the absence of a valid signed customer API call, CodeWhisperer has zero access to any customer repos or S3 buckets.
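FAS itself is internal AWS machinery, but the scope-down idea maps directly onto something any AWS customer can use today: an STS role assumption with an inline session policy. This is a sketch under that assumption; the role ARN and bucket name are hypothetical:

```python
# The effective permissions are the intersection of the role's own policy
# and the inline session policy, so these short-term keys work for exactly
# one bucket and go inert after the session expires.
import json

import boto3

sts = boto3.client("sts")

scope_down = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::top-secret-bucket",       # hypothetical customer bucket
            "arn:aws:s3:::top-secret-bucket/*",
        ],
    }],
}

creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/DataGetterRole",  # hypothetical
    RoleSessionName="create-customization",
    Policy=json.dumps(scope_down),  # intersected with the role's own policy
    DurationSeconds=900,            # the keys go inert after 15 minutes
)["Credentials"]
```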

So the FAS is passed along to the workflow so that it can access the customer's bucket, and we just call in. When we put all of this together: CodeWhisperer at rest has zero access to customer buckets, regardless of how many customers are using the service. When a customer calls to create a customization, only the identity firewall has the access to create the FAS at IAM to pass along to the workflow.

And one important thing is that the identity firewall never sees a byte of customer data. It's never directly interacted with by customers, and it's thus harder to subvert. And when the credentials are used in the workflow, they're good for only a single bucket, for only a short time window. Again, this is defense in depth.

I particularly like the identity firewall service. It not only centralizes all of these security functions, which is a good thing; it also means that if a new member of the team trying to add a feature does something that's not in accordance with our invariants, it simply won't work unless they add code to the identity firewall. That gives us a control point in the code base where we can add extra scrutiny and make sure that the system is continuing to act in the way that we intended. This is also a nice confused deputy protection: if by some means you manage to trick me into trying to access someone else's bucket, when I get the FAS, the forward access session, from the IAM service, it's only going to work for your bucket. My attempt to call someone else's bucket is going to get an access denied error, and we've protected our customers.

So now that we've fetched the code, we need to index it. The index that we use while generating code completions is stored in Amazon OpenSearch Serverless. This was actually in Adam's keynote yesterday. We're currently using RAG, retrieval-augmented generation, and so there's some additional data that we need that's stored in our own S3 bucket.

So we fetch the data, we start indexing it, we write it to our own data stores. But access to these data stores is equivalent to access to the customer's S3 bucket. We have to protect this as well. How do we minimize privileges here? In order to do that, we need to introduce another friend: KMS, the Key Management Service. This is a public AWS service, and hopefully you all are using it. This is how we manage keys and pass them around our services so that you have control over how your data is encrypted. Again, we strongly encourage you to use KMS.

Before we move on, we need to learn a little bit about Amazon OpenSearch Serverless. It's a relatively new offering. It's a cloud-native search service with compute for data ingestion and indexing that's completely separate from the compute for query processing, so you can scale them independently. The data is backed by S3, so it inherits S3's durability. There's a lot that we could say about this service, but for the purposes of this talk, the top-level concept in Amazon OpenSearch Serverless is a collection. You can think of this as roughly equivalent to an S3 bucket; that's about the granularity of it. It's a set of search indexes and documents associated with a single workload.

All of the data in Amazon OpenSearch Serverless is encrypted at rest using a KMS key, and there is exactly one key per collection. So if you want to use two different KMS keys, you have to have two different collections.

If our customer, when they call CodeWhisperer Customizations, passes in a KMS key, that's the key that we'll use. If not, we'll use a service-owned key. And multiple collections can share the underlying compute as long as they use the same KMS key; a collection that uses a different KMS key will be placed on separate Amazon OpenSearch Serverless compute.

So when the customer calls in to create a customization, they can optionally pass in the identifier for a KMS key. If they do, that's what we're going to use to encrypt all of the data for this customization at rest. If they don't, again, we're going to use our own key.

Now that we know which key they want to use, the identity firewall is going to call KMS and create a grant on that key so that we have access to use it for our purposes. The call to create a grant is going to show up in your CloudTrail logs. It's going to be auditable, and you can always query KMS to see what grants are outstanding from which services. You can also revoke those grants. That is going to break your CodeWhisperer customization, but it gives you additional visibility into, and control over, how your data is used.
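A sketch of that grant, using the public KMS API (the key ARN and role name are hypothetical placeholders, not CodeWhisperer's actual principals):

```python
# The grant shows up in the customer's CloudTrail, can be listed with
# kms.list_grants, and can be revoked with kms.revoke_grant.
import boto3

kms = boto3.client("kms")

grant = kms.create_grant(
    KeyId="arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",  # customer's key
    GranteePrincipal="arn:aws:iam::444455556666:role/IndexingWorkflow",  # hypothetical service role
    Operations=["Encrypt", "Decrypt", "GenerateDataKey"],
)
print(grant["GrantId"])
```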

So now we can finally kick off the workflow. It's going to call into Amazon OpenSearch Serverless and create the collection using the key that the customer specified. And this is an important design decision for the service. If we'd been running our own search index, this would have been a big job for the service team. To minimize that work, to make it feasible for them to do it, they probably would have run a large single index and then built a security model on top of it, where they have all of these bits of metadata and a whole bunch of if-then-else code to make sure that exactly and only the data that matches the owner of the data and all of the permissions models is returned in response to queries.

But because Amazon OpenSearch Serverless exists, because we have another service team that solves this problem for them, we can create extra collections. Every time a customer creates a customization, we create a new collection. So rather than using this giant ball of code, the OpenSearch query processing engine, to make sure that we keep customers separate, we've got this very simple mechanism where it's just a separate collection per customer, per customization. And so this turns the problem of keeping customers' data separate into a much smaller, much more tractable, much simpler problem.
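A sketch of that per-customization isolation, assuming hypothetical names and a customer-supplied key ARN, and a vector-search collection type as you might use for a RAG-style index:

```python
# The encryption policy binds the collection to the customer's key; the
# collection boundary itself, not query-time filtering code, is the
# separation mechanism.
import json

import boto3

aoss = boto3.client("opensearchserverless")
customer_key_arn = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"

aoss.create_security_policy(
    name="cust-abc123-encryption",
    type="encryption",
    policy=json.dumps({
        "Rules": [{"ResourceType": "collection", "Resource": ["collection/cust-abc123"]}],
        "AWSOwnedKey": False,              # use the customer's key, not a service key
        "KmsARN": customer_key_arn,
    }),
)

# One collection per customization.
aoss.create_collection(name="cust-abc123", type="VECTORSEARCH")
```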

And one side effect of this is that if a customer specifies a KMS key when they create their customization, they're going to get placed on separate compute, used only for customizations encrypted with that key. So generally, using KMS is a great way to keep your data separate, but specifically here, it's got an extra benefit.

So now we can call into the identity firewall, which is going to call into IAM, and it's going to assume a role in our own account. Once again, we're going to pass a scope-down policy limiting our access. This isn't a FAS; this is us acting within our own account. It's a standard IAM role assumption, but we pass in this extra policy that limits us to exactly and only one collection.

And now we pass that set of credentials along to the workflow. It is running with access to one collection and one prefix in that S3 bucket. So whether it's a hardware error, a well-meaning software developer trying to index data across two customers where the streams got crossed, or a memory corruption error, it doesn't matter what went wrong: we only have access to one customer's worth of data. Our separation invariant here is enforced by the architecture of the service.
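A sketch of what that scope-down session policy might look like; the collection ID, account number, and bucket name are hypothetical:

```python
# One collection, one S3 prefix, nothing else. This is the inline policy
# that would be passed on the role assumption for the indexing workflow.
import json

session_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Data-plane access to exactly one OpenSearch Serverless collection.
            "Effect": "Allow",
            "Action": "aoss:APIAccessAll",
            "Resource": "arn:aws:aoss:us-east-1:444455556666:collection/cust-abc123",
        },
        {
            # Read/write limited to this customization's prefix in our own bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::service-owned-bucket/cust-abc123/*",
        },
    ],
}
print(json.dumps(session_policy, indent=2))
```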

So after all of this work, we've finally got a customization that's available to our end users. Now they can make a call to CodeWhisperer, and again, this is going to happen automatically in their editor, and ask for code completions. The first thing that we do after we authenticate this user is determine whether or not they are authorized to use this customization.

This seems like a pretty straightforward problem today. The permissions model in CodeWhisperer Customizations is very simple. There's a set of users; they are authorized to use a customization; that's it. There's no complex sharing, there's no hierarchy, there's no nesting, there's no access control within a customization. You have access to it or you don't.

But still, asking the CodeWhisperer team to build an authorization model, to build all of that logic, is a large job, and systems that start simple rarely stay simple. There's going to be future complexity. You, our customers, are going to ask for things from us, and we're going to make the service more complicated in order to delight you, and then the CodeWhisperer team would have to absorb that complexity. It's incredibly important that your authorization model work even in weird corner cases. So we want confidence that this thing is going to do exactly what we expect it to do, and only what we expect it to do, every single time.

Rather than asking the CodeWhisperer team to build this, we're using Amazon Verified Permissions. This is a public service that's offered to all of our customers. Also, the Cedar language that's running underneath Amazon Verified Permissions is open source. You can download it, you can build it into your own code, you can use it to build your own authorization services. But in this case, we're using Amazon Verified Permissions, the publicly available service. The Cedar language that I just mentioned is how you express policies. It's relatively simple and it's easy to read. It uses the same model that AWS policies use, where you've got Allows and Denies, and Denies trump Allows. The ordering of statements doesn't matter. There's no looping.

"It's not touring complete intentionally. And so it's pretty straightforward to read and write policies in Cedar. And one important thing about this is you give VP a schema for your application, you describe the hierarchy of resources that you have of operations on those resources. And so now when you try and submit a policy and you've got a typo you've transposed two characters or you've got a resource that's nonsensical for a certain principle, the system will kick it out.

It's absolutely work to write this schema. It takes time, but it will pay itself off the first time you submit a policy that is either semantically or syntactically incorrect and it gets kicked out immediately, rather than running with that policy in production for who knows how long with inappropriate access control. So the fully validated policies in Amazon Verified Permissions are awesome. And it's formally proven; this actually came out of our automated reasoning group. The entire system was designed such that we can write formal mathematical proofs of properties of the system and guarantee that Verified Permissions does exactly what we think it does.

So this is something that we're really excited about. The permissions model in CodeWhisperer is, as I said, quite simple, but the fact that policies are separate documents, not woven through the code in if-then-else statements, is awesome. CodeWhisperer doesn't expose Cedar to our customers directly, but we could in the future as we make this richer and more complex. If the right thing to do is to allow customers to submit Cedar policies, we could totally do that.
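A sketch of what such an authorization check looks like against the public Amazon Verified Permissions API; the policy store ID and the entity and action names are hypothetical, not CodeWhisperer's real schema:

```python
# A Cedar policy of roughly this shape would live in the policy store,
# validated against the application's schema when submitted:
#
#   permit(
#     principal == CodeWhisperer::User::"alice",
#     action == CodeWhisperer::Action::"UseCustomization",
#     resource == CodeWhisperer::Customization::"cust-abc123"
#   );
import boto3

avp = boto3.client("verifiedpermissions")

decision = avp.is_authorized(
    policyStoreId="ps-0123456789abcdef",  # hypothetical
    principal={"entityType": "CodeWhisperer::User", "entityId": "alice"},
    action={"actionType": "CodeWhisperer::Action", "actionId": "UseCustomization"},
    resource={"entityType": "CodeWhisperer::Customization", "entityId": "cust-abc123"},
)

# "ALLOW" only if some policy permits it; the default is deny.
print(decision["decision"])
```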

And so now that the user has been authorized, we need to introduce the model hosting service. For reasons that I'll talk about in a minute, we're not using one of our ML services here; we're directly hosting this on GPUs. So as before, the customer has asked for a completion, and we're going to call into the identity firewall. Since we're only accessing our own resources, again, we don't use a FAS here; we just do a role assumption. But now we know the identity of the customization that the customer is going to access.

So again, we assume a role and we pass in a scope-down policy that limits us to just this one collection, this one customization, just like the last couple of times. Again, we've very narrowly scoped our access here. And so now we can invoke the inference workflow. We connect to that one Amazon OpenSearch Serverless index that we have access to, and then we connect to the model hosting service.

Today we're doing inference on GPUs, and these are large, complex pieces of hardware. The firmware on a GPU is approximately the size and complexity of an operating system. And so we've chosen to bind each of our models to dedicated GPU cores, and we've minimized the complexity of the software running on the model hosting service to ensure that an inference call for one customer can't interact with an inference call from another customer.

There's no specific issue that we're aware of here; this is just a defense in depth mechanism. This is us saying: just in case something goes wrong here, we want to make sure that we've got separation all the way down to the GPU cores. It would be nice if we could host this on one of our ML services, but hosting ML models is one of our core competencies, and we thought that this was an appropriate cost for the service to bear.

And so again, this is defense in depth. We've tightly minimized our privileges here. This is the hot path; this is where we're going to get the most traffic. Setting up customizations is an infrequent operation. Calling in to get code completions is going to happen all day, every day, as people are working. And so this is where we need to focus and make sure that we've got everything buttoned up.

Humans like to name things. I'm pretty sure that none of you has a child or a dog named after a GUID. But here are some system-generated IDs. If you're like me, you're going to remember them by their first couple of digits, and there are only 16 hex chars, so you're going to get collisions and you're going to get confused. We are bad at remembering these things. But the vast majority of places that these identifiers get used is deep in the bowels of the system. The user interface facade is skin deep; the rest of it is all computers.

Here are some human-generated IDs. These are not real examples, but I've seen all of these in the wild in our systems. These are going to propagate through our systems. They're going to wander around. You're going to have to be super careful everywhere they get logged, everywhere they get displayed. Take that first one there, with a little bit of JavaScript in the middle. Some horrible person out on the internet is going to create a customization, they're going to give the customization that name, they're going to call in to AWS Support and say that they're having problems with their customization, and the support agent, doing their job, is going to pull up their internal tools to look at that customization. This is called a blind XSS, cross-site scripting, attack.

So the support tool has to render this correctly. The log analysis tool has to render this correctly. It is super hard to get that right everywhere. Using system-generated IDs makes it easier to propagate these things through, to have all of these different surfaces touch these things safely.
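A minimal sketch of that pattern; the function and table names are hypothetical:

```python
# Human names stay skin deep: everything internal flows by an opaque,
# system-generated identifier, and the display name lives only at the
# facade, where it must be escaped before rendering.
import uuid

display_names: dict[str, str] = {}  # facade-only lookup table

def create_customization(display_name: str) -> str:
    customization_id = f"cust-{uuid.uuid4().hex}"   # what propagates internally
    display_names[customization_id] = display_name  # rendered (escaped!) only at the edge
    return customization_id

# Even a hostile name never travels past the facade:
cid = create_customization('<script>alert(1)</script>')
print(cid)  # e.g. cust-3f9c0e...
```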

So CodeWhisperer allows people to name things. We've been talking about the name of your customization the whole talk, but that's skin deep. All of the identifiers that flow through the system are machine generated and not exposed to customers. So as an outsider, you can't even guess the name of the Amazon OpenSearch Serverless collection that we're using to index your code. You can't craft the policy injection that would cause you to skip over to someone else's collection. We're still vigilant for things like this. We still have to worry about the customer service case here, because the evil person is still going to name something like this and they're still going to call in. But at least it's only at the facade of the service. It doesn't propagate all the way through.

Some encryption modes, like AES-GCM, give us another layer. AES is the Advanced Encryption Standard; it is largely the only block cipher that we have these days. GCM is Galois/Counter Mode. It's a way of turning a cipher that encrypts a single block, that takes a block of plaintext into a block of ciphertext, into something that can encrypt messages of approximately arbitrary length.

AES-GCM supports something called additional authenticated data. In KMS, we call this encryption context. It's applied as a set of key-value pairs. So if you specify encryption context when you encrypt, you have to specify exactly the same encryption context when you decrypt, or the decryption will fail. This is not secret data; your encryption context will get logged in CloudTrail, so do not put sensitive things in encryption context. But it has to be there if you supplied it at encryption time, and this can be used as a confused deputy protection.

So this is the encryption context that CodeWhisperer Customizations actually uses. When we write your data to S3, we set this as the encryption context. It's not huge or complex, but it does include the identifier of your customization. And so if by some means (clever adversary input, memory corruption, hardware failure) we wound up with the streams crossed, this check is going to catch it. The encryption context is generated right at the point that we're reading or writing from S3 or from the index. So if the thread thinks that it's working on behalf of you, but it's really working on behalf of someone else, it's going to generate the wrong encryption context, S3 is going to kick the request out, and we'll get a ticket.
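Here's a sketch of that check using the public KMS API directly. The key ARN and the context key are hypothetical; CodeWhisperer's real context includes the customization identifier, as described above:

```python
# Encryption context as a confused deputy check: a thread working on
# behalf of the wrong customization generates the wrong context at
# decrypt time, and KMS fails closed.
import boto3
from botocore.exceptions import ClientError

kms = boto3.client("kms")
key_arn = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"

ciphertext = kms.encrypt(
    KeyId=key_arn,
    Plaintext=b"index shard for customization abc123",
    EncryptionContext={"customization-id": "cust-abc123"},  # logged in CloudTrail; not secret
)["CiphertextBlob"]

# Right context: decryption succeeds.
kms.decrypt(CiphertextBlob=ciphertext,
            EncryptionContext={"customization-id": "cust-abc123"})

# Crossed streams: the wrong customization ID is refused.
try:
    kms.decrypt(CiphertextBlob=ciphertext,
                EncryptionContext={"customization-id": "cust-zzz999"})
except ClientError as err:
    print("refused:", err.response["Error"]["Code"])  # InvalidCiphertextException
```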

We've come to the end of the talk, as we've walked through CodeWhisperer Customizations and how we launched it securely. Hopefully, you understand that when we talk about defense in depth, we mean depth in multiple dimensions: depth along the set of mechanisms that you'd have to bypass, but also depth in terms of culture, in terms of testing, to make sure that the invariants that we've chosen remain sound throughout the life of the service, no matter who joins or leaves the service team.

To accomplish this, we use mechanisms. When Amazonians talk about mechanisms, there's something very specific that they mean. There's a Jeff talk from one of the all-hands back in the day, and the Jeff quote is: good intentions don't work, you need a mechanism. You have to assume that your colleagues are well intentioned, that everyone wants to do the right things, that everyone is trying to help customers out. Yet things fall apart over time. Why do things fall apart over time? Because good intentions don't work. You need a mechanism.

So for example, ingesting the AWS SDK and automatically generating tests is a mechanism. We don't have to remember to periodically ingest the SDK. We don't need to remember to send someone to run the weekly test suite. As soon as the SDK gets published, new tests get generated and everything gets run automatically. That's a mechanism. And so we scale, we provide defense in depth, through mechanisms. But we also use culture, because if we're not bringing our people with us, then we're never going to get to where we want to get to.

Examples of this are all of the operational tooling that we have, things like Mechanic, where it is baked deep into our DNA at this point that getting on a box is the last thing you want to do, that you want to build the operational tools instead. And we're very lucky here that this is one of those cases where our security goals line up with the company's operational goals. No one likes outages; no one likes making mistakes. We have learned that building these tools more than pays off; the ROI is incredibly rapid. And so now you've got a mechanism that enforces a culture of people following these rules, of people building these tools, of people thinking operationally and thinking about security, which then builds new mechanisms.

And this is a journey. I've been at Amazon for just shy of 16 years. When I joined the company, the only thing we had were root keys. There was no IAM. We actually only had one region. There was no EBS. All of the things that we've described here took us years and years and years to build. This was a long-term investment, and I'm incredibly proud of what the team has produced.

I don't want you to leave this talk and think, oh my gosh, there's no way we could ever replicate that. We didn't build this overnight. We didn't wake up one morning and say, you know what, we need defense in depth. We've been working on this for years. And if you would like to do something like this, first of all, there's a bunch of AWS services like IAM and KMS, and all of those things are available to you today. We had to build them before we could use them, but you can use them now. Just get started, start building in this direction, and it's like compound interest: it adds up incredibly quickly.

And so our ability to launch a service with defense in depth like this, along multiple dimensions, is a result of that investment over a series of years. So thank you very much.
