Amazon S3 security and access control best practices

Hello and welcome. I hope you had a great morning and you're looking forward to a wonderful week here at re:Invent.

If you're here, it's because you use AWS. And if you use AWS, it's almost guaranteed that at some point sooner or later, you'll be storing your data in Amazon Simple Storage Service (Amazon S3).

Amazon S3 provides a foundational service - flexible, scalable, durable object storage for the cloud. And once you start putting data in the cloud, your number one priority becomes securing that data: making sure that only the parties who need to have access have access, and no one else.

We at AWS deeply understand having security as a top priority because we think about it that way too. I'm Meg Later. You'll hear from Becky. We both work at AWS and this is what we think about all day because S3 is so ubiquitous within AWS.

It's worth acknowledging that there are many different kinds of workloads, and they have different security objectives. S3's extreme scalability, availability, and durability make it a great place to store your business's data lake. By storing your data lakes on S3, you unlock ever more value from AWS's machine learning and analytics offerings, partner offerings, or your own applications. So that's probably one way that you're using S3. It's also probably where your data ends up at the end of the day. It's probably where you're storing your logs.

S3 as a highly available object store is also a highly convenient place to hold the bits of connective tissue that hold your workloads together. It's likely where you're storing your infrastructure data and your configurations. And in fact, this is one of S3's flagship use cases.

It's also a great place from which to host content for your customers. This includes media content as well as websites and their assets.

Who uses this data? Certainly human users - for example, consumers of your website or scientists interacting with your data lake. And this little red worker hat that you'll see used throughout the presentation represents AWS Identity and Access Management (IAM) authenticated identities: your applications and workloads that produce and consume data in AWS. These will be interacting with your data as well.

And your whole job here is to protect this data, no matter what it is and no matter what the desired use cases are. We know that when you start looking at security in the cloud, there's a lot of advice and chatter out there, so it can seem complicated. It's not. In preparing for this talk, we boiled it down to the four best practices that our customers use to successfully secure their data in the cloud.

We're going to talk about a few security controls and some new default settings. We'll chat about encryption at rest in S3 - a new default - and when you might want to use a more advanced setting. We'll chat about how you can monitor and audit access to your data. And we're going to give you some practical tips for using AWS Identity and Access Management (IAM) policies.

So let's get started with our secure defaults. You know, this is actually the third re:Invent where I've given some flavor of this talk. And every year it changes a little bit. But this year, I'm really excited to talk about how we used to have a bigger list of recommendations. And we've turned some of those recommendations into secure defaults, things that you just get for free that you used to have to configure for your bucket.

We've made these recent changes this year. So on January 5th, we launched encryption by default, which encrypts all new objects in both new and existing buckets with Amazon S3 managed encryption that's called SSE-S3. And we're going to chat a little bit more about that later. This happens automatically unless you select a different mode of encryption.

Then in April, we enabled Block Public Access and disabled ACLs which are Access Control Lists for all new S3 buckets. These changes enhance your security posture and make your buckets secure by default.

Now let's dive into these features in more detail. As I mentioned earlier, all new S3 buckets have S3 Block Public Access on by default. So when you're getting ready to use S3, what's the very first thing that you do? You're going to create a bucket. And like all other resources - which means things that you own in AWS - an S3 bucket is owned by the account from which it was created.

The moment you create an S3 bucket in an AWS account, it is private. It's impossible for an identity from outside that account to access it. It's private by default, and it always was. Yes, there are configurations by which the bucket can be shared with entities outside the account, and you'll hear a ton about those later. But until you take a deliberate action to configure your bucket that way, it is private. And maybe I should explain what we mean by entities within the account.

Exactly as I said before, AWS's identity system is Identity and Access Management, or IAM, and the identities in IAM are called principals. You're going to hear that word a lot later in this talk. They represent users - IAM does in fact have users - but most modern use cases involve the IAM role, an identity associated with temporary signing credentials.

IAM roles and users use their credentials to sign requests to the AWS API, and that includes the S3 API - these are authenticated requests. Actually, IAM users and roles are themselves resources; they belong to an account. Here, I'm showing you three roles that are owned by the same AWS account as the bucket I just created. So they're in the same account, and the bucket's private by default.

So can they access the bucket? Actually, no. IAM principals have policies attached to them that say what they can do. So they might have access, if they have an IAM policy that says they have access - and we'll see how to do that later. But you know who definitely does not have access? An identity, either an anonymous user or an IAM principal, belonging to some other account outside the bucket's account. No matter what policy they have over there in their account, unless the bucket was configured to grant them access, they definitely do not have access.

It is of course possible, and very common, to configure an S3 bucket so data is available outside of the account for read, write, or both. And in a large enterprise environment, you're going to be doing that a lot. But even if you have tons of cross-account data sharing scenarios in the environment, it's unlikely that any of those scenarios intend to make the bucket fully publicly accessible. Block Public Access is a simple feature by which you say that the contents of this S3 bucket are, under no circumstances, to be made publicly accessible.

What that means, very technically, is that unless you have a specific configuration on the bucket granting access to a specific outside entity, outside entities will be denied access.

Does this mean that you can no longer share data outside of your account? No. This is a particularly powerful feature because it's not a hammer locking your data into your account; rather, it's targeted at preventing accidental exposure. In fact, to make the feature even simpler, you can and should set it account-wide. When you do that, all of your existing S3 buckets are covered - it's set and forget, with no further configuration needed. And remember, for new S3 buckets and new accounts you don't need to do anything at all: S3 Block Public Access is enabled by default.
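
To make that concrete, here's a minimal boto3 sketch of setting Block Public Access account-wide (the account ID is a placeholder; the four flags are the standard Block Public Access settings):

```python
import boto3

ACCOUNT_ID = "111122223333"  # placeholder account ID

s3control = boto3.client("s3control")

# Turn on all four Block Public Access settings for the entire account.
# New buckets already get this behavior by default; the account-level
# setting also covers every existing bucket, with no per-bucket work.
s3control.put_public_access_block(
    AccountId=ACCOUNT_ID,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```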

I do hear from customers: hey, S3 is providing me great, scalable, cost-effective storage for my website content - HTML, CSS, JavaScript. This is content that I do want to make available to the world on the internet. Does that mean I need to create exceptions to Block Public Access, or that I can't use it for my whole account?

Actually, no. The cool thing about Block Public Access is that it blocks inadvertent public access. Your website is supposed to be on the internet. So here's what you're going to do: you're going to use Amazon CloudFront, which is another scalable and cost-effective AWS service that's optimized for low-latency delivery of content over the internet.

You're going to put your S3 bucket with your site assets behind it. And when you do this, you're going to write an S3 bucket policy that grants access specifically to your CloudFront origin access identity, which is the identity in CloudFront that's going to fetch your content from S3 when it needs to load the cache. This is a well-configured, secure S3 bucket, because it is granting specific access to a specific identity.
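
As a sketch, a bucket policy like the following grants read access to a CloudFront origin access identity and nothing else (the bucket name and OAI ID are placeholders; newer setups would typically use CloudFront origin access control instead, but the shape is similar):

```python
import json
import boto3

BUCKET = "my-site-assets"      # placeholder bucket name
OAI_ID = "E2EXAMPLE1EXAMPLE"   # placeholder origin access identity ID

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontOAIReadOnly",
        "Effect": "Allow",
        # Legacy OAI principal format: a specific identity, not the world.
        "Principal": {"AWS": f"arn:aws:iam::cloudfront:user/"
                             f"CloudFront Origin Access Identity {OAI_ID}"},
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
    }],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```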

Alright. Now we're going to talk about the modernization of our access controls in S3: ACLs disabled by default.

So in S3, historically, there are multiple ways to grant access to your data. Like in every other AWS service, IAM is a mechanism that you use to grant access to the data. The bucket has always been private by default, so within the account you can take advantage of IAM's granularity to give different IAM roles - for both users and systems - exactly the right level of access to S3.

And there's also the bucket policy. This too is an IAM policy, except that it's on the bucket, and it has two main purposes: one is to let you share the data with specific IAM entities outside of the account as needed, and two is to define preventative controls that assert what types of access should never happen.

Becky is going to go into detail later in the talk about how you use each of these policies. Now, actually, in S3 - and unique among AWS services - there's a third way. S3 launched in 2006, a full five years before IAM, and the access control mechanism that predates IAM is called Access Control Lists (ACLs). They are meant to bear some similarity to the ACLs that you'll see in file systems.

They still exist, but you shouldn't use them anymore. And when you create an S3 bucket today, this mechanism is entirely disabled by default, so you don't need to do anything. Of course, you probably have buckets that were created before April, and some of these may have ACLs. You may have already disabled ACLs on buckets in your account, which you can and should do.
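
For an existing bucket, disabling ACLs is one call: set the bucket's object ownership to BucketOwnerEnforced. A minimal sketch, with a placeholder bucket name:

```python
import boto3

s3 = boto3.client("s3")

# BucketOwnerEnforced disables ACLs entirely: the bucket owner owns every
# object, and access is governed by policies alone. The setting can be
# rolled back by writing a different ObjectOwnership value.
s3.put_bucket_ownership_controls(
    Bucket="my-existing-bucket",  # placeholder
    OwnershipControls={"Rules": [{"ObjectOwnership": "BucketOwnerEnforced"}]},
)
```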

But we hear from customers that while they understand the importance of this security step, they want to be very sure that they can disable ACLs without interrupting existing desired access patterns. So this year, we've also launched two ways to get visibility into your use of ACLs. Even for a large-scale S3 bucket, S3 inventory reports will indicate the ACLs present on each object, so you can produce an inventory report and query it.

Our two data access logging mechanisms in S3 - both AWS CloudTrail events and S3 server access logs - indicate when a data access event relied on an ACL in order to succeed, and we're going to talk about each of those in one second. But first, I want to point out that when you disable ACLs on your bucket, no metadata actually changes from an operational perspective. This is really important: it means that you can always roll back your change to disable ACLs if you find that you unexpectedly need to reverse that action. As operators in AWS, we really understand the importance of rollback in any operational safety program, and that's how it works here.

Alright. So we're going to briefly give you a bit of detail on how, if you have an existing and actively used S3 bucket, you can disable those ACLs.

So, as I said before, the first mechanism is the S3 inventory report. This is a great management feature of S3 where you can request a listing of every S3 object that you own. The listings are delivered in standard formats like CSV and Parquet, and it's the most efficient way to get a full listing of all the objects in all of your S3 buckets.

And it's highly useful for a number of reasons. S3 inventory reports aren't new, but these two fields are: they're going to tell you the ACL setting on each object. So if you have an existing, actively used S3 bucket and you're preparing to adopt this important security step, the first thing that you're going to do is request an S3 bucket inventory.

As I said before, the data is in a standard format and it's queryable, so this means you can use Amazon Athena or another tool to quickly identify any object with non-default ACL settings. In the common case where you find that you have none, you're ready to disable ACLs. But if you do find that there are non-default ACLs in your bucket, the next step is to figure out whether or not they're actually being used in practice, and this requires visibility into your access patterns.
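
For a flavor of that query, here's a hedged sketch using Athena from boto3. It assumes you've already created an Athena table over the inventory report; the table, database, and column names are placeholders to adapt to your inventory schema:

```python
import boto3

athena = boto3.client("athena")

# Pull the keys whose inventory ACL field is populated so you can review
# them. Column and table names are assumptions; match them to your schema.
QUERY = """
SELECT key, object_access_control_list
FROM my_inventory_table
WHERE object_access_control_list IS NOT NULL
LIMIT 100
"""

athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "my_inventory_db"},              # placeholder
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder
)
```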

So this year, we've added that visibility to the two methods of logging that I mentioned before - CloudTrail and server access logs - and we're going to say a little bit more about what those are later in this talk. This new log field, aclRequired, tells you which specific accesses relied on the older ACL system in order to succeed. When you analyze your logs, you can identify which types of access patterns you need to model with an IAM policy. Once you are no longer seeing this attribute, you can be confident that disabling ACLs will not interrupt any existing usage patterns. This analysis is best done with some SQL queries against the data, which is straightforward to do with Athena.
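
Run the same way as the inventory query above, a sketch of the CloudTrail side might look like this; the table name and the exact shape of the additionaleventdata field are assumptions to verify against your CloudTrail setup:

```python
# Find S3 data events that only succeeded because an ACL granted access.
ACL_REQUIRED_QUERY = """
SELECT eventtime, eventname, useridentity.arn AS caller
FROM cloudtrail_logs
WHERE eventsource = 's3.amazonaws.com'
  AND additionaleventdata LIKE '%aclRequired%'
ORDER BY eventtime DESC
"""
```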

The QR code here takes you to our blog post where we show you in detail how to run those queries and what to do with the results.

All right. So next, we're going to talk about our last secure default - encryption at rest. First we're going to start with some recommendations and who they're for. A large majority of use cases fall into one of these two. In fact, the biggest category is: do nothing. When you write data to S3, as of last January, it's encrypted with a key that S3 owns and manages - and again, that's for both new and existing buckets. Your data is encrypted at rest; not much more to say about that. And it simplifies any internal conversations you're having about whether your data will be encrypted: all your new data is encrypted.

A smaller number of use cases require you to have a second layer of permissions control on your data, or a key that you own and manage. When this is the case, we recommend that you use AWS Key Management Service (KMS). KMS gives you a way to do this, since it has its own resource policy in IAM. And for these cases, we recommend that you use a CMK, which is a customer managed key in KMS.

For completeness, I'm going to do a quick comparison of a few different modes here. So we talked about SSE-S3 - this is the one that you get by default. There's a flavor of SSE-KMS where you use an AWS managed key; similar to SSE-S3, it's not a key that you own and manage, but you do get some additional logging over SSE-S3 in your CloudTrail logs. Importantly, though, this mode impacts your ability to share data outside of your account, because there's a policy attached to that AWS managed key that you don't control. So you'd never be able to share that data outside of your account.

And then there's SSE-KMS with a customer managed key, where you manage the key policies, you get additional logging in CloudTrail, and you handle the key rotation - KMS will do it for you on a yearly basis if you don't do it more often than that. There's no impact to shareability, because you can modify the KMS key policy.

So looking at these options, we think nowadays that use cases for that middle option are almost always better served either by taking advantage of the default SSE-S3 that you get by doing nothing, or by creating and configuring a CMK if you need that additional level of permissions or key management.

We also heard from some customers who face specific regulatory requirements to apply multiple layers of encryption to data. As of this past June, S3 can do that for you too: we launched dual-layer server-side encryption with keys stored in KMS (abbreviated DSSE-KMS). With this launch, customers can fulfill regulatory requirements to apply multiple layers of encryption to data without having to do client-side encryption before sending their data to S3.

This is implemented with two different AES-GCM modules. DSSE-KMS uses AWS Key Management Service to generate the data keys, giving customers control over the customer managed keys.

All right. So now you've selected the mode that you want to use based on your specific regulatory and compliance and security requirements. So how are we going to make sure that our objects are encrypted?

Well, you can specify the mode at the per-object level during the put, using the headers. This is how you would do it for SSE-S3 - but you do not need to do this anymore, because it's enabled by default. For SSE-KMS, you're going to specify the mode and the key that you want to use to encrypt your data. And similarly for DSSE-KMS, you're again going to specify the mode and the key that we should be using to encrypt your data.
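
Here's what those per-object headers look like in boto3; the bucket, keys, and KMS key ARN are placeholders:

```python
import boto3

s3 = boto3.client("s3")
KMS_KEY = ("arn:aws:kms:us-east-1:111122223333:key/"
           "1234abcd-12ab-34cd-56ef-1234567890ab")  # placeholder key ARN

# SSE-S3: this header is now redundant, since it's the default.
s3.put_object(Bucket="my-bucket", Key="a.txt", Body=b"hello",
              ServerSideEncryption="AES256")

# SSE-KMS: name the mode and the key.
s3.put_object(Bucket="my-bucket", Key="b.txt", Body=b"hello",
              ServerSideEncryption="aws:kms", SSEKMSKeyId=KMS_KEY)

# DSSE-KMS: dual-layer encryption is the same shape with a different mode.
s3.put_object(Bucket="my-bucket", Key="c.txt", Body=b"hello",
              ServerSideEncryption="aws:kms:dsse", SSEKMSKeyId=KMS_KEY)
```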

What if you don't want to change all of your applications? In this case, you can use a bucket-level default. And again, if you want to use SSE-S3, you don't need to do anything: as I said before, AWS is automatically applying SSE-S3 to all new objects. But if you do want to use a variety of KMS, you can use default encryption at the bucket level to specify your preferred encryption mode. It's a one-time bucket setting which will automatically encrypt all new objects. You specify the encryption mode that you want to use, and if you choose KMS, you're going to pick your KMS key. And if you choose KMS, we're also recommending that you use bucket keys. We're going to talk about this feature later, but it's a cost-savings feature for those of you who need to use KMS.
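
A sketch of that one-time bucket setting, with placeholder names, including the bucket keys flag mentioned above:

```python
import boto3

s3 = boto3.client("s3")

# Default all new objects to SSE-KMS under a customer managed key, and
# enable Bucket Keys to reduce KMS request costs.
s3.put_bucket_encryption(
    Bucket="my-bucket",  # placeholder
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": ("arn:aws:kms:us-east-1:111122223333:key/"
                                   "1234abcd-12ab-34cd-56ef-1234567890ab"),
            },
            "BucketKeyEnabled": True,
        }]
    },
)
```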

All right, we're going to take a quiz and I require participation to keep me entertained.

All right. So if we have a put where there's no encryption specified in the object header, and the bucket doesn't have a default enabled: raise your hand if you think the object will not be encrypted, and raise your hand if you think the object will be encrypted with SSE-S3. I'm so glad - if you take nothing else away from this talk, you know that you're now getting default encryption. Ok?

But what if the object put has SSE-KMS specified, but the bucket doesn't have an encryption default enabled? Raise your hand if you think you're getting SSE-S3; raise your hand if you think you're getting SSE-KMS. Awesome.

So that was too easy. All right. Now we have an example where the put specifies SSE-S3 explicitly in the header, but the bucket has an SSE-KMS default enabled. Raise your hand if you think you're going to get SSE-S3, and raise your hand if you think you're going to get SSE-KMS.

All right, for this one you're actually going to get SSE-S3, and the mental model here is: when you have two instructions to S3 and they conflict, S3 takes the more granular one - in this case, what we say at the object level. But you might be wondering how you can prevent these conflicts. For example, if you have a regulatory requirement that you use a specific KMS key, how can you make sure that you only get objects that are KMS encrypted in your bucket? The way that you would do this is with a bucket policy. We're going to have tons of discussion about policies later in this talk, but this example denies puts that are not KMS.

Of note, this policy is going to accept puts that specify KMS via a header in the put, like we talked about, or that rely on a default encryption setting of SSE-KMS. Depending on your requirements, you may need to go one step further. This is a bucket policy that specifies a customer managed KMS key ID that must be used in order for puts to succeed on this bucket. And again, this works regardless of whether that key is set at the put or from a default bucket setting.
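
A hedged sketch of both statements in one bucket policy (the bucket name and key ARN are placeholders; per the behavior described above, puts that rely on an SSE-KMS bucket default also satisfy these conditions):

```python
import json
import boto3

BUCKET = "my-bucket"  # placeholder
KMS_KEY = ("arn:aws:kms:us-east-1:111122223333:key/"
           "1234abcd-12ab-34cd-56ef-1234567890ab")  # placeholder

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Reject any put whose effective encryption mode is not SSE-KMS.
            "Sid": "DenyNonKMSPuts",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {"StringNotEquals": {
                "s3:x-amz-server-side-encryption": "aws:kms"}},
        },
        {   # One step further: require one specific customer managed key.
            "Sid": "DenyWrongKMSKey",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {"StringNotEquals": {
                "s3:x-amz-server-side-encryption-aws-kms-key-id": KMS_KEY}},
        },
    ],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```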

All right. This section is a bit less on the what to do and the why, and a bit more on the how it works. So if you're curious about the technical details, here they are. Our first example shows how encryption works with SSE-KMS when a customer puts an object and specifies that this object should be encrypted with a KMS key.

S3 sends a request to KMS to get a key. KMS, remember, has its own resource policy, and it evaluates whether this customer has the permissions to encrypt data with this particular key. If KMS determines that they do, KMS sends S3 back a plaintext data key. S3 uses that data key to encrypt the object, and S3 stores an encrypted version of that unique plaintext data key alongside the object. S3 then destroys the plaintext data key.

Now, on the get: the customer gets an object that is SSE-KMS encrypted. S3 sends that encrypted data key to KMS. KMS again evaluates the resource policy to determine that the customer has the permissions to do this, and if so, sends back the plaintext data key, which S3 uses to decrypt the object and send it back to the customer. S3 then destroys that plaintext data key.
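
For intuition only, here's the same envelope-encryption dance sketched client-side with KMS's GenerateDataKey and Decrypt APIs (the key ARN is a placeholder, and S3's internal implementation of course differs):

```python
import os

import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

kms = boto3.client("kms")
KEY_ARN = ("arn:aws:kms:us-east-1:111122223333:key/"
           "1234abcd-12ab-34cd-56ef-1234567890ab")  # placeholder

# "Put": ask KMS for a fresh data key. KMS checks its key policy here.
resp = kms.generate_data_key(KeyId=KEY_ARN, KeySpec="AES_256")
plaintext_key, encrypted_key = resp["Plaintext"], resp["CiphertextBlob"]

# Encrypt the object with the plaintext key, keep the *encrypted* key
# alongside the ciphertext, then destroy the plaintext key.
nonce = os.urandom(12)
ciphertext = AESGCM(plaintext_key).encrypt(nonce, b"object bytes", None)
del plaintext_key

# "Get": hand the stored encrypted key back to KMS (the key policy is
# evaluated again), re-obtain the plaintext key, and decrypt the object.
plaintext_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
original = AESGCM(plaintext_key).decrypt(nonce, ciphertext, None)
del plaintext_key
```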

I briefly mentioned bucket keys when we talked about setting defaults at your bucket level. Bucket keys is a feature that we launched in 2020, designed to help our customers save money on request costs to KMS. Customers who have enabled this feature have saved a cumulative $80 million since it launched three years ago. So let's take a look at how it works and how it's a little bit different from KMS.

The first put for a bucket-key-enabled bucket looks similar to KMS. Again, the customer is going to put an object, they're going to say it should be encrypted with KMS, they're going to give us a key, and they're going to say it should be a bucket-key-encrypted object.

S3 is again going to send a request to KMS, and KMS is going to send back a bucket key, presuming that this customer can use the KMS key. S3 will derive a unique plaintext data key from that bucket key, use that key to encrypt the object, store an encrypted version of that bucket key alongside the object in your bucket, and destroy the plaintext data key.

Now, the next time a customer puts an object in the bucket-key-enabled bucket, S3 is going to use the time-limited S3 bucket key to derive a unique data key to encrypt the object. S3 again stores the encrypted bucket key alongside your object and destroys the plaintext data key. But importantly, there's no request to KMS.

And so here's where our customers are saving money on their request costs to KMS. On the get, when a customer gets an object that was bucket-key encrypted and the time-limited bucket key is not already in S3, S3 is going to send that encrypted bucket key to KMS. KMS will decrypt it and send the bucket key back after verifying that the permissions are all in place. S3 will re-derive the plaintext data key that was used to encrypt your object, use that to decrypt your object, and send it back to you. And as always, we're destroying that plaintext data key.

Now, the next time the customer gets an object that was encrypted with the same bucket key, within the time-limited window, S3 is going to go ahead and use that bucket key to re-derive your plaintext data key, give you your object back, and destroy the plaintext data key - again with no request to KMS, which is where we save money and improve performance.

One of the things we hear from customers is: I really like this bucket key feature, I'm into saving money, but my bucket is a multi-tenant bucket. I have different customers of my own that have their own customer managed keys that they use when they write data to my bucket. Can I still use bucket keys? And the answer is yes - it's designed to work for the multi-tenant bucket.

In this case, you're going to enable bucket keys in your bucket default encryption configuration and have your end customers add their preferred customer managed KMS key to the request header. When you upload an object and request a different customer managed KMS key in the encryption request header, your object is still going to have bucket keys enabled. The effectiveness of the bucket key will vary depending on the workload and how many objects are written to your bucket.

Ok. So we've talked about the encryption options in S3, a framework for deciding which one is right for your use case and how it works under the hood. So now let's talk about how you can audit the encryption status of objects in your environment.

S3 Storage Lens provides organization-wide visibility into storage usage and activity to improve cost efficiency and data protection. It has an interactive dashboard built right into the S3 console, providing a central, organization-wide view of S3 storage across all of your accounts and buckets. You can drill into more detailed views by storage class, bucket, and prefix. So we're going to look at a couple of questions that can be answered with Storage Lens.

You might be interested in which buckets have encrypted data. Storage Lens can pinpoint the buckets that have unencrypted data or an incorrect configuration on the bucket. Here, the encrypted bytes include any type of S3 encryption that you may have enabled on the bucket. And here's an example bucket that has hundreds of thousands of unencrypted objects - 63% of the bucket's contents - which the owner may want to address, depending on their business's requirements.

We also hear from customers like, "Hey, I heard about that great Bucket Keys feature, the easy button to save money. I want to know if we should re-encrypt our existing KMS data and save money on reads." Here we recommend that customers spend a little time looking into their usage patterns. If you have a read-heavy bucket with objects encrypted via object-level KMS settings, you should carefully evaluate those patterns. You'll use Storage Lens to identify KMS requests across your organization and zoom in on the accounts and buckets that have the most requests. If you typically only access data for a short period of time after it's created, we recommend letting your objects age out until Bucket Keys are being used for all new objects. If your data is very read-heavy over longer periods of time - perhaps years - it may make sense to re-encrypt your existing object-level KMS objects with Bucket Keys.

And this bucket here has a lot of requests compared to the rest of the organization, so this customer is going to look at only this bucket when they evaluate whether they want to re-encrypt their data.

Maybe you do want to re-encrypt your data - maybe for Bucket Keys, or maybe because we enabled encryption at rest by default for all new objects in January and, while you've long been able to request other encryption types, you now want to mass re-encrypt or encrypt your existing data in S3. You don't want to do it one by one. S3 Batch Operations is a capability of S3 that can perform this mass encryption or re-encryption, rather than requiring you to write a script that iterates through all of your objects and copies each in place.

Batch Operations lets you submit a manifest of all the objects - which an inventory report can provide - and it will drive the job to completion even if you have many, many objects. Batch Operations offers numerous types of operations, including tagging and deleting, but for re-encryption or encryption you're going to use the Copy operation in place, and you're going to specify your desired mode of encryption. Ok?
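
A hedged sketch of such a job via the S3 Control API; every ARN, the ETag, and the IAM role are placeholders, and the manifest would typically come from an inventory report:

```python
import boto3

s3control = boto3.client("s3control")

# Copy-in-place job that rewrites every object in the manifest with SSE-KMS.
s3control.create_job(
    AccountId="111122223333",                                 # placeholder
    ConfirmationRequired=True,
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/batch-ops-role",  # placeholder
    Operation={
        "S3PutObjectCopy": {
            "TargetResource": "arn:aws:s3:::my-bucket",       # copy in place
            "NewObjectMetadata": {"SSEAlgorithm": "KMS"},
            "SSEAwsKmsKeyId": ("arn:aws:kms:us-east-1:111122223333:key/"
                               "1234abcd-12ab-34cd-56ef-1234567890ab"),
        }
    },
    Manifest={
        "Spec": {"Format": "S3BatchOperations_CSV_20180820",
                 "Fields": ["Bucket", "Key"]},
        "Location": {
            "ObjectArn": "arn:aws:s3:::my-bucket/manifests/objects.csv",
            "ETag": "replace-with-manifest-etag",
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::my-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-reports",
        "ReportScope": "FailedTasksOnly",
    },
)
```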

So we've covered the basics and we're going to cover more advanced IAM policy authoring later. But right now, I want to talk about the features that S3 provides out of the box to assess the security posture of your buckets.

Security in the cloud - the customer's responsibility in our shared responsibility model - comprises both preventative and detective controls, and you always need both. What we're about to talk about is in the realm of detective controls.

We offer a few features in S3 that provide out-of-the-box visibility into how your buckets are configured and how they're being accessed. When you have many buckets in your AWS accounts, you probably want a quick way to view your existing permissions to prevent unintentional access permissions. For this scenario, we created a tool called Access Analyzer. This feature analyzes permissions for your buckets and provides a simple dashboard that shows buckets with public access as well as buckets that are shared with external accounts.

So let's look at an example. Here, you're going to select Access Analyzer for S3 on the left panel and select your region. Once you do that, you're going to see a list of findings. Access Analyzer reviews your bucket policy, bucket ACL, access point, and Block Public Access settings to produce findings. It will list the buckets that have public access in the top panel, and buckets that are shared with external AWS accounts in the bottom panel.

Here we see that we have 4 buckets with public access and 2 buckets with access from other AWS accounts. The dashboard also tells us the mechanism that granted that permission, such as bucket ACL or bucket policy, so we can go there to make adjustments if required.
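
The same findings are also available programmatically; a minimal sketch, assuming an analyzer already exists (the analyzer ARN is a placeholder):

```python
import boto3

aa = boto3.client("accessanalyzer")
ANALYZER_ARN = ("arn:aws:access-analyzer:us-east-1:111122223333:"
                "analyzer/my-analyzer")  # placeholder

# List findings for S3 buckets and label each one public vs. cross-account.
resp = aa.list_findings(
    analyzerArn=ANALYZER_ARN,
    filter={"resourceType": {"eq": ["AWS::S3::Bucket"]}},
)
for finding in resp["findings"]:
    kind = "public" if finding.get("isPublic") else "cross-account"
    print(finding["resource"], kind)
```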

The ability to audit who accessed your data is really important. To keep a detailed record of all API calls made to your S3 bucket, you can turn on S3 server access logs or AWS CloudTrail logs for your S3 bucket. These logs are useful in compliance and security audits. With the log files, you can query and perform analysis of log data to gain insights into your access patterns.

So you might be wondering why S3 offers two types of logging for data access, and when you should use each. Much like IAM, S3 predates CloudTrail - CloudTrail launched about 10 years ago. Today we recommend AWS CloudTrail, since all of your other AWS service audit events will go there and you'll be able to apply similar practices to analyzing and auditing access to S3. For example, if your organization audits access events separately, AWS CloudTrail can send those events to a central logging account.

S3 server access logs work well if longer delivery times are acceptable and you're not trying to unify your log analysis activities with the other AWS services that you use. There's no cost for S3 server access logs, although the normal S3 storage costs apply to the logs stored in S3.
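
Turning on server access logs is one call per bucket; a sketch with placeholder names (the target bucket must permit log delivery):

```python
import boto3

s3 = boto3.client("s3")

# Deliver access logs for my-bucket into my-log-bucket under a prefix.
s3.put_bucket_logging(
    Bucket="my-bucket",  # placeholder
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-log-bucket",  # placeholder
            "TargetPrefix": "access-logs/my-bucket/",
        }
    },
)
```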

All right. And with that, I'm gonna hand it over to Becky to chat about IAM and S3.

Thanks so much, Meg. So, you know, you can't really talk about securing your stuff in AWS without talking about Identity and Access Management - IAM. AWS, the cloud, is accessed via API, and authentication and authorization for all of these APIs in AWS - including S3, which almost all of our customers use - is done with IAM, which every one of our customers uses.

And if you're a hands-on builder in the cloud - if you're actually implementing workloads in the cloud - you are going to be writing these IAM policies at some point.

The other thing I'll say about this is that people, particularly at the beginning of their cloud journey, often ask: how do I build up cloud skills? It's a vast domain - how do I learn about it? And my recommendation - this is a personal opinion, perhaps a little bit biased because I work on this stuff - is to start by getting a good grasp on IAM. Learning IAM is kind of the analog of learning how to learn for the cloud, because IAM works the same for all of our services.

We're going to look at it through the lens of S3 right now, because that's actually where you're probably going to focus most of your attention in IAM. But it works the same regardless of what you're doing in the cloud.

So IAM, like I said, is permissions for S3 and for all the other AWS services. Everything is done with an API. Each of these API calls is individually signed, authenticated, and authorized against these policies. And there are policies in a lot of places in AWS - we're not even going to talk about all of them here today. There are policies on the IAM principal, on the caller; there are policies on the resource, on the S3 bucket. Both of these policies come into play when S3 is figuring out whether you can read or write or do something else with this data. Ok?

So we're going to jump right in. I'm going to teach you how to read and write these policies, because a lot of people find this daunting when all they see is these JSON policies. But it's actually very straightforward and can always be broken down into first principles - so to speak, spelled differently.

So you have your IAM principal, typically a role. This is the entity that gets to make the call, and they have a policy attached to them. As you saw early on in Meg's part of the talk, just because you're in the same account as the S3 bucket doesn't automatically mean you get access. There actually needs to be an IAM policy on the role saying that they get access, or on the bucket saying that they get access.

So let's look at a policy that we might have written for this IAM principal. These policies are made up of one or more statements - sometimes many, many statements. Each of these statements is going to say either Allow or Deny, which means exactly what you think it does; here we're trying to allow access. They all talk about an Action - here the Action is GetObject, which is read access to an object - and you see that the Resource here ends in a wildcard.

If you're wondering what that thing is with all the colons that starts with arn, that's an Amazon Resource Name, or ARN, and it's the fully qualified identifier for anything you have in AWS. This is how it looks for S3 buckets, and specifically for the objects in those buckets. So we said we're allowing you to take the GetObject action on anything that matches that prefix.

You can see that the API call that we're making does in fact match this statement. So this statement - along with any other statements that are also matched by the attributes, the properties, of the request being made - gets taken into account. Since this is the only statement in town for our example right now, and nobody's out there saying deny, this user is allowed to read this data.
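
Reconstructed as a sketch (the bucket name is a placeholder), the statement just described looks like this:

```python
# Identity policy: allow reading any object under the bucket prefix.
read_objects = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-data-bucket/*",  # objects, hence the /*
    }],
}
```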

So ok, we can read some objects that match that wildcard. Now, say this role belongs to some kind of workload, an application. A very common pattern in an application is: I first need to list the objects that I have before I read them - maybe I'm a query engine of some kind, so I've got to list them all out before I can read them.

Well, that's actually a different action. It's a different request, and it returns different data. Whereas GetObject will return the contents of an object, ListObjects is going to return just a list of metadata for a bunch of objects. So it's a different thing being returned - it's a different action.

So how do we fill in the blanks here? It's obviously not s3:GetObject. Well, you fill them in like this, and you might be wondering: how did she know that?

At some point, I'm pretty sure some of you are going to just know this by heart, but you don't actually have to. I've got a QR code up here for my favorite page of the AWS documentation. This is the IAM service-by-service documentation. If you go to that page, you'll see down the left rail a list of all of our services, and for each service they show you a handy table that summarizes exactly how IAM works for this service.

So S3 has one of these. And if I go down the table looking for the thing that lets me list the contents of a bucket - which is what ListObjectsV2 does - you might think it's called ListObjects, because the other one was called GetObject and the API method was called GetObject. This one's actually called ListBucket. Slightly different name.

You do want to actually look at this table and make sure you're getting the Action name right. And in the Resource, you'll notice we don't have wildcards anymore. That's because the thing that the ListBucket action acts on - the direct object of that sentence - is a bucket, not an object. So what you're seeing up there is how you spell out an ARN for a bucket.
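
Putting the two statements together, a sketch of the combined policy (placeholder bucket name):

```python
# Note the two different resources: the bucket itself for ListBucket,
# and the objects (with the trailing /*) for GetObject.
list_and_read = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": "s3:ListBucket",
         "Resource": "arn:aws:s3:::my-data-bucket"},
        {"Effect": "Allow",
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::my-data-bucket/*"},
    ],
}
```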

So this is the statement I'd add if my application needed to list the contents as well. Now we're kind of getting the pattern here. We also talked a bit about encryption earlier, right? We could have used SSE-KMS, server-side encryption with KMS, to encrypt our data using a key.

And if you remember from that part of the talk, the reason you do this is so that you have a second resource in AWS: this KMS key - you'll also see it called a CMK, customer managed key, and it's the customer managed part that matters here. This key actually has a policy that you can control, a second layer of control. Ok? So IAM clearly comes into play here.

So what's going to happen here? I've got an object in my bucket, and I'm telling you that I have previously encrypted it with server-side encryption with KMS. And you can see I wrote an IAM policy - this is exactly how we learned to write IAM policies for reading the contents of an object. I want to give you a second to think about what happens here. And you know, because you're good at taking tests, that I wouldn't be asking you this question if this just works, right?

So no, this actually doesn't work. And the reason why, again, is the second level of control: you actually need access to two different AWS resources to do this. We covered the S3 bucket - from S3's perspective, you are allowed to see the contents of this object. However, the contents of this object are currently ciphertext, so they're not going to be very useful to you. In order to decrypt this object - and this all happens under the hood at S3 - S3, on your behalf, under your identity and your permissions, is going to make an onward call to an action in KMS called Decrypt, to decrypt that intermediate key and be able to see your data. And because I don't have any statements saying that anything like that is allowed, we don't have access yet - you're going to get an access denied on this.

So of course we can fix this. S3 is going to say access denied, because although it could give you the ciphertext, that's not very useful. How do we fix it? We fix it by granting this principal the permission that they need to the key as well. So I'm showing you: you go to your IAM documentation page, you choose AWS Key Management Service, you find the Decrypt action, and this is how it tells you to write it. You can see the pattern is very similar, right? This is how you fully qualify a KMS key in a particular region, in a particular account, with these ARNs. Ok? So that's the pattern. Everything else in IAM is a repetition of that pattern. That's really all there is going on.
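
Sketched out, the missing statement is a grant on the second resource, the key (the region, account, and key ID are placeholders):

```python
# Permission to decrypt with the specific KMS key that protects the data.
allow_kms_decrypt = {
    "Effect": "Allow",
    "Action": "kms:Decrypt",
    "Resource": ("arn:aws:kms:us-east-1:111122223333:key/"
                 "1234abcd-12ab-34cd-56ef-1234567890ab"),
}
```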

Now, if you think about what we use IAM for - what anybody uses IAM for in the cloud - it's actually two different categories of things. Of course, they both have to do with access management, but there are sort of two sides of a coin, particularly if you think about a service like S3. What might you put in your S3 bucket? You might put a data lake in an S3 bucket that has tons of different use cases accessing it from all kinds of different places. And of course, in many cases - particularly in a non-small organization - you're going to have multiple AWS accounts, maybe many AWS accounts. So you are going to need to write policy on the S3 bucket to make the data accessible outside the account for all of these data lake use cases. And then on the other side, having shared the data, you also need to do something that we now have a word for in AWS: create a perimeter around this data. You want to make some assertions about the outer bounds of who you're actually sharing this with. So let's talk about both sides of this coin, because you do both of these in IAM. And in fact, we're going to focus on the S3 bucket policy.

So let's first talk about IAM for data sharing - this half of the picture. We're going to be sharing data; the bucket's private by default. But if you have more than one AWS account, it is almost certain that at some point you are going to have a workload in this account that needs to access data in an S3 bucket in that account - almost guaranteed. One example that's very easy to wrap your head around: you might be using the S3 bucket to hold some shared configuration data for your workloads. You might want to keep that in one place in your organization, but you have the workload running on EC2 or Lambda or something over here. And of course, by default, out there we don't have access.

So how do we do that? Well, from what we've done so far, we know how to write a GetObject policy here; we can read the contents of this bucket. But here's the wrinkle: we're in a different account from the S3 bucket. So think about this for a second - what happens here? You don't have access. You don't have access because, if you think about it, this should not grant you access: somebody in an account over here shouldn't be able to write a policy about a bucket that I own over there. They shouldn't be able to go to the locksmith and print a key to your house, right? This shouldn't work. But of course, this is a very common and expected use case, so you want a secure and specific way to make it work.

So how do we do that? Well, think about what the problem is in this picture - here's a little bit of jargon - we had two authorities who could issue policies. We have the authority who owns the caller: that's account one. We have the authority who owns the bucket: that's account four. Don't you think they both kind of need to have issued a policy saying this access is allowed? That's exactly what happens: when there are two issuers, two authorities, what you're going to have is two rounds of evaluation, and they both need to pass. Because anything else wouldn't make sense. Ok.

So what I'm showing you is just another IAM policy, on the S3 bucket. It follows a lot of the same physics, except there's one new thing: the Principal element. With the Principal element, you're saying this statement is about account one. If it's account one making the request, that Principal element is going to be matched; if it's somebody else, it's not. The Principal element is necessary on a resource policy because you need to say who you're talking about. When the policy is attached to a principal, you don't need to say who it's talking about, because it's whoever made the call. Ok.
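
A sketch of that bucket policy, with placeholder account IDs and names: the Principal names the caller's role in the other account, and both this policy and the caller's own identity policy must allow the request:

```python
# Resource policy on the bucket, written in the bucket-owning account.
cross_account_share = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/app-role"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::shared-config-bucket/*",
    }],
}
```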

So now I have both of these policies: two issuing authorities, and both of them say that this works. So we have access - good, well-configured, specific access across accounts to our S3 bucket. Now let's scale this up. In anything but a very tiny, getting-started AWS environment in the modern era, you're going to have multiple AWS accounts. If you read all of our best practice advice about how to use AWS accounts, it more or less says use an account per workload - in fact, I would add, per region. An account is like a box of things that go together in a workload, and you might be doing this fairly granularly. So if you're doing that, you're going to have a lot of accounts. I'm showing you six accounts here, but there might be a couple of extra zeros on that. And you'll probably have these accounts tied together in something called an AWS organization. This is how you pay for the thing with a single bill. The organization also gives you a lot of really useful security controls that can span your organization - I'm not going to talk about that today, but it's a good thing to know about if you're doing security in a larger AWS environment. And the organization also allows you to scale various things about IAM policy, because IAM understands your organization.

So over here, I've written a bucket policy. Let's say I have workloads writing in all these accounts in my org, and there are hundreds of them, a thousand of them. Well, I could write that S3 bucket policy to just list account one, account two, account three, account four. But if my intention is that accounts in this organization should have access, I could do something like this. And you'll notice here - because you came to this talk because you're interested in security - that the Principal has a star, and that might make you feel kind of uneasy. Actually, it shouldn't, because you'll notice down here - this is our first time seeing a Condition, and these conditions work the way you think: all of the conditions in a statement need to match in order for the statement to get picked up. This condition is very specific about principals: it says the caller must be a principal from my organization. So that's what this means.
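
A sketch of that statement (the org ID and bucket name are placeholders): the star Principal is pinned down by the aws:PrincipalOrgID condition:

```python
# Share with every principal in my organization, and no one else.
org_wide_share = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::shared-config-bucket/*",
        "Condition": {"StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}},
    }],
}
```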

Let's scale up a little bit more. You might find - and we find this actually happens, especially with customers with data lakes in S3 buckets - that over time you're identifying another use case where the data needs to be shared from the bucket, and another use case, and another, and another. And you start to accumulate a lot of statements in this bucket policy. You can get up to 20k characters in an S3 bucket policy, which by my very rough arithmetic lets you talk about maybe mid tens - 30 to 50, that kind of number - of these use cases. It depends on the length of the strings, of course, but I'm in the right ballpark for order of magnitude. And you start to add these statements. I don't know how many of you have ever started to write a bucket policy that looks like this? Yeah, you probably have.

Well, if you recognize this bucket policy, I'm going to tell you about a few things - one of them new - that you should be looking at if you're getting worried about whether your bucket policy will scale to support all of these use cases. One of them is access points. S3 has this feature called access points, and the easiest way to describe them is that they are additional endpoints to your S3 bucket. They go through the same policy evaluation, with the bucket policy evaluated and the principal evaluated, but they have two very important aspects. Aspect one: the access point itself has an IAM policy on it. Aspect two: you can have up to 10,000 of these. So this adds four more zeros to the amount of IAM policy you can write. And you can see I have red, blue, and yellow access points here. If you remember that bucket policy accumulating use case statements, you can imagine factoring each of those out into access points; your bucket policy would instead just say: I allow access that is delegated through my access points. That's the pattern you should be looking at if you are scaling up access to your bucket and the thing you're worried about is IAM policy space - you have these relatively static use cases for sharing, they're just very numerous. Access points are a great fit for that.
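
A sketch of the pattern, with placeholder names: create an access point per use case, then collapse the bucket policy into a single statement that delegates to access points owned by the account (the s3:DataAccessPointAccount condition key is the standard way to express that):

```python
import json
import boto3

ACCOUNT = "111122223333"  # placeholder
BUCKET = "my-data-lake"   # placeholder

# One access point per use case; its own policy carries the fine-grained rules.
boto3.client("s3control").create_access_point(
    AccountId=ACCOUNT, Name="analytics-ap", Bucket=BUCKET)

# Bucket policy reduced to: allow anything arriving through my access points.
delegate_to_access_points = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "*",
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
        "Condition": {"StringEquals": {"s3:DataAccessPointAccount": ACCOUNT}},
    }],
}
boto3.client("s3").put_bucket_policy(
    Bucket=BUCKET, Policy=json.dumps(delegate_to_access_points))
```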

The other thing about access points helps with compatibility. Because S3 is such a ubiquitous concept in the world, you have a lot of tools and applications that are written to speak the S3 API and expect a bucket name. So to help with that, when you use access points, you'll notice that S3 assigns an alias to the access point.

It's the name of the access point you asked for, plus a little suffix generated by S3. You can drop this string in wherever you use an S3 bucket name, and it's just going to work. So for anything that's expecting an S3 bucket name where you want to use access points, look at that alias. We've launched a few new things with access points in the last year - the aliases aren't from this year, but I mention them because you can now also create an access point in one account in front of an S3 bucket in another account. All the IAM stuff works with that. That's useful for some use cases where you need the sharing to be controlled from another account.

And then finally, like I said, 10,000 is a pretty big number. But this week we launched something that takes the granularity of access control in S3 one step further. I'm not going to spend a lot of time on it today - we do have a talk, I believe it's on Wednesday, STG337, on scaling access for large-scale data lakes, where we're going to talk about this new capability a lot - but just to show you what we're doing here, I'm going to show a screenshot of this new feature.

So, Access Grants. They are kind of what they sound like, and I'm going to drill in on one of these here. We talk to a lot of customers who have data lakes in S3, and they think about access to their data lakes in terms of folders, and users and groups who should have access to that folder - often in terms of projects, right? Now, of course, if you've been working with S3 for a while, you know there's no such thing as a folder in S3 - that's not really a thing, there are just slash characters in the object names - but lots of people think about it that way. So this lets you take a prefix in S3, and it gives you a simplified and highly scalable mechanism for granting access on a prefix-by-prefix, use-case-by-use-case basis, so you can manage each of them individually.

The other thing you might notice here is the grants: you can grant to groups and users from your directory. If you're using Microsoft Entra ID, or any number of identity providers that integrate with IAM Identity Center, you can make grants directly to those entities and have applications that access data directly as those entities. That's a new capability called trusted identity propagation that came from IAM Identity Center this week. So there's a lot of really exciting new stuff in this feature. That's all I'm going to say about it today, but if your use case looks like this, you now have a way to express that more directly.

Now, those are basically your options as you're trying to share data in a well-controlled way outside your account. And once you're doing that, you - or the people who are responsible for looking after security in your organization - are going to want to know: how do I know that excess permission has not been granted? How do I know that this fits within the bounds of what our organization considers acceptable sharing of data?

Well, you do that with IAM too, and for S3 that's primarily focused on the S3 bucket policy. Ok, so who does need access to your data? Well, your AWS accounts - we talked about a number of reasons why the same account, or many accounts in your organization, might need access to the data. Of course, not everybody has access to everything; that's where the fine-grained policy stuff comes in. But if they're in your organization, they're someone who could have access. And also AWS services: loads of AWS services are integrated with S3 because it gives you that scalable storage, and so many AWS services work with data, right? Analytics, machine learning, observability services, CloudTrail - they all work with your S3 bucket. And basically, for most customers, you can draw a circle around this picture and say: I want to exclude everything else that isn't in this picture. This is a particular concern when you're thinking about cloud security in a large organization. It's often the case that there are thousands of accounts, and a lot of people are going to be operating in those accounts. You want an easy way to make sure that none of these actors, no matter how good their cloud skills are, is sharing data beyond your bounds.

There's a set of techniques for this called the AWS data perimeter, and we have a great white paper on it - you can look that up and read about all of the techniques, with great specific examples of how to do this. Your S3 bucket policy can actually assert a bunch of things about what can't happen, creating a perimeter around the massive amounts of data in this bucket. So we're going to focus on Deny statements - which means get your brain ready for double negatives and triple negatives, because these say what can't happen. Whenever a Deny statement matches the request, game over: the access is denied.

You're probably trying to parse this IAM policy while I'm talking, so let me parse it for you - and by the way, you can look up what any of these conditions mean. The key thing here is I'm saying: deny S3 actions on this bucket if the caller is not from my organization - so they're from somewhere outside the circle - and the caller is not an AWS service. That last part is important too, because I want, say, the CloudTrail service to be able to write to my bucket.
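
A sketch of that identity-perimeter statement, following the shape of the published data perimeter examples (the org ID and bucket name are placeholders):

```python
# Deny unless the caller is from my org or is an AWS service principal.
identity_perimeter = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "EnforceIdentityPerimeter",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
        "Condition": {
            "StringNotEqualsIfExists": {"aws:PrincipalOrgID": "o-exampleorgid"},
            "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
        },
    }],
}
```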

So I just said they're outside that magenta circle that we drew, and what that means is I do not want to allow this access. And I'm actually going to mention that last week, in the pre:Invent launches, we launched a really useful capability for scaling these patterns. You're looking at one statement here that says: not from my org and not an AWS service - deny it. But what if they are an AWS service, like CloudTrail or something, writing data into your S3 bucket? Well, you want to assert that this access is being done on your behalf. CloudTrail operates on many customers' behalf; you want to make sure it's operating on behalf of your organization. If you know security jargon, I'm talking about the confused deputy problem. So last week we launched - you can look it up - a condition key called aws:SourceOrgID. There will be examples for how to write these deny statements in a scalable way.

So if you have an S3 bucket where you're expecting AWS services to interact with it on your behalf, you can exclude all interactions that are not on your behalf, and do it in a single, scalable statement. Networks are often also an important component of these data perimeters. Now, you're probably all up to speed on zero trust kinds of thinking: you know that coming from the right network is never, by itself, a good reason to grant someone access. But network is an important signal. If you're coming from the wrong network - even if you're presenting the right identity, even if you're saying all the right things - you probably should be excluded from this bucket. If I could draw in three dimensions - a spheroid, or ellipsoid, around my bucket - network would be another dimension of it.

So we're going to write another one of these bucket policies, and I'm going to throw in a concept called VPC endpoints. When you're running workloads in AWS, VPC is Virtual Private Cloud - it's your data center in the cloud, the network that you control for your compute in the cloud. And VPC has this feature called VPC endpoints. You can create them to many AWS services, most especially S3, and that helps S3 identify the network that a request came from. If you're trying to create these data perimeter assertions, you can see why this is useful: if I have a good signal of the network a request came from, then I can exclude access from unexpected networks. That's exactly what I'm doing here.

Now, you see a little bit more on this slide than what I just said, so I'll break this one down for you. The first condition says the call is not coming from my expected VPC - or VPCs; if I have a bunch of networks in the game here, I would have a list. And the call is not coming in over the public endpoint from an IP address range that I'm expecting - maybe I have corporate IP address ranges that I'm expecting, and I say they're not coming from there. Then some more conditions: the caller isn't an AWS service - because the AWS service, say CloudTrail, accessing my bucket is actually going to come from whatever network CloudTrail runs in, which is not going to be my VPC. And the call isn't being made on my behalf; it's not an onward call - it's not S3 calling KMS, or Athena calling S3. There are a lot of patterns like this around AWS; they're covered by the aws:ViaAWSService and aws:CalledVia conditions.

So I'm saying: this call is being made directly by a principal. They're not an AWS service making an onward request, and they're not coming from a network I'm expecting. So I just want to exclude that access. That's what's going on here.
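
And a sketch of the network-perimeter statement just described, again following the published data perimeter examples (the VPC endpoint ID, IP range, and bucket name are placeholders):

```python
# Deny direct calls that come from neither my VPC endpoints nor my expected
# IP ranges; AWS services and on-my-behalf onward calls are carved out.
network_perimeter = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "EnforceNetworkPerimeter",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
        "Condition": {
            "StringNotEqualsIfExists": {"aws:SourceVpce": "vpce-0exampleid"},
            "NotIpAddressIfExists": {"aws:SourceIp": "203.0.113.0/24"},
            "BoolIfExists": {
                "aws:PrincipalIsAWSService": "false",
                "aws:ViaAWSService": "false",
            },
        },
    }],
}
```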

All right, we just took you on a whirlwind tour of all of the things that you need to and can do to get a really good handle on security in S3. If you come away from this talk with just one takeaway, it's that this is actually easier than a lot of people think, because - we're really proud to say - a lot of the settings that we strongly recommend are now the default. You would actually have to do work to not configure your buckets this way. You're getting encryption by default, you're getting Block Public Access by default, and you're getting the older access control system, ACLs, turned off by default.

We talked a little bit about encryption and how that works. Really, you have two options: the option where you do nothing and get encryption anyway, and the option where you want that second level of control - you want to write more IAM policies, I'm not joking, you actually want the second level of control - to be able to use customer managed KMS keys with S3. We talked about how you get good visibility into what's actually going on: visibility into the requests being made, which is server access logs and CloudTrail, and being able to assess your security posture in a static way using IAM Access Analyzer, which is backed by some of our automated reasoning techniques at AWS.

And then we talked about the two faces of IAM. We talked about IAM for granting access, and the ways you do that for certain patterns at different levels of scale. And then we talked about the negative side of IAM - being able to use IAM to create a data perimeter to exclude use cases that you're just not expecting. So that's how you secure S3. I want to say thank you, and I hope you have an excellent time at re:Invent this year. It looks like a fantastic conference. I hope you go to a lot of great sessions and a lot of fun events.

Thank you so much for coming. Have a great week.
