Simplifying data security in a complex data lake environment

Right. Hello. My name is Zachary Friedman, and I'm Director of Product at Immuta. And I'm Huey, a Product Manager on Amazon S3. We are excited to talk to you today about simplifying data security in a complex data lake environment.

So we're going to briefly talk about customer requirements and context for access controls in a hybrid data lake and data warehouse environment. Then Huey will talk about a new feature for access controls on Amazon S3, and finally we will briefly talk about our new Immuta Amazon S3 integration.

S3 fundamentally forms the storage backbone for all of this data, period. No matter where the data is actually accessed, it is stored in S3. For instance, if you're a Snowflake customer, that data is in S3; it is just abstracted away from you. If you use Databricks, the data is in S3 and an admin can access it. And obviously, if you use Redshift Spectrum, that data and its existence are known to the creator of an external table. So one core question quickly emerges if you are responsible for maintaining one of these cross-platform, multi-engine analytical data platforms with separate compute and storage layers: how should I think about access control here?

One approach is to allow no raw data access directly from object storage when data is in S3. This lets you control fine-grained access through specific, known access points: query engines like Redshift Spectrum, Snowflake, Databricks, and so on. The problem with this approach is that it is limiting. For one, data scientists need to wait for data to land in a data platform before using it. Open source tools can only be used if they support connecting to those data platforms. A straight file read without any data transformation could incur unnecessary compute costs. And people can only access data through a data platform, which means they need a login to that platform. So in these large-scale analytical communities, with separate storage and compute access patterns, this quickly becomes challenging. There are also plenty of reasons to limit direct S3 access: traditionally, managing complex and dynamic access control logic over S3 files has been challenging. Until now.

All right, thanks, Zach. Hi everyone, I'm Huey from the Amazon S3 product team. Just out of curiosity, how many of you actually work in security and data governance at your companies? OK, sweet. And of course, a few Immuta folks. So this is a new feature we launched on Sunday, two days ago, called Amazon S3 Access Grants. It does a couple of things. First, it allows you to grant S3 access directly to users and groups in your external corporate directory. Today, S3, like most AWS services, mostly authenticates and authorizes against IAM, so the identity needs to be an IAM principal: an IAM user or role. This new feature allows you to grant access directly to directory users and groups.

The second thing: we heard from a lot of customers that they want to define access not with IAM policies, the bracketed JSON, but in a more intuitive grant style, similar to how you manage permissions in a relational database. With Access Grants, you can define permissions in that intuitive grant style, and it is also highly scalable. In the examples on the left of the slide, the first grants an access level to directory users and directory groups; the last grants access to an IAM role. Behind the scenes, Access Grants vends just-in-time, least-privilege, short-term credentials when a client needs access. Based on the grants, it does lookups to figure out whether the request is authorized; if it is, it vends credentials that the client can then use to access S3. Last but not least, we heard from many customers that auditing is super important, especially if your company is in financial services, government, healthcare, or another heavily regulated industry where you want to see end-user access: who accessed what data, at exactly what time. The feature comes with an integration with AWS CloudTrail, so you can use CloudTrail to get that native, detailed audit history.
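To make the "grant style" and the authorization lookup more concrete, here is a minimal local sketch of how a request could be matched against registered grants. It assumes a simplified model: each grant names a grantee (directory user, directory group, or IAM principal), an S3 scope, and one of the permission levels `READ`, `WRITE`, or `READWRITE`, and the most specific matching scope wins. The `Grant` class, the `authorize` function, and all bucket and identity names are hypothetical illustrations, not the Access Grants implementation.

```python
from dataclasses import dataclass

# Which requested permissions each granted level satisfies (a simplifying assumption).
SATISFIES = {
    "READ": {"READ"},
    "WRITE": {"WRITE"},
    "READWRITE": {"READ", "WRITE", "READWRITE"},
}


@dataclass(frozen=True)
class Grant:
    grantee_type: str  # "DIRECTORY_USER", "DIRECTORY_GROUP", or "IAM"
    grantee_id: str    # user/group identifier or IAM principal ARN
    scope: str         # e.g. "s3://bucket/prefix/*"
    permission: str    # "READ", "WRITE", or "READWRITE"


def scope_covers(scope, target):
    # A trailing "*" means "anything under this prefix"; otherwise require an exact match.
    if scope.endswith("*"):
        return target.startswith(scope[:-1])
    return target == scope


def authorize(grants, identities, target, permission):
    """Return the most specific grant that authorizes the request, or None."""
    matches = [
        g for g in grants
        if (g.grantee_type, g.grantee_id) in identities
        and scope_covers(g.scope, target)
        and permission in SATISFIES[g.permission]
    ]
    # Prefer the longest (most specific) scope among the matches.
    return max(matches, key=lambda g: len(g.scope), default=None)


# Hypothetical grants and a caller who is both a directory user and a group member.
grants = [
    Grant("DIRECTORY_GROUP", "finance", "s3://corp-data/finance/*", "READ"),
    Grant("DIRECTORY_USER", "alice", "s3://corp-data/finance/reports/*", "READWRITE"),
    Grant("IAM", "arn:aws:iam::111122223333:role/etl", "s3://corp-data/*", "READWRITE"),
]
alice = {("DIRECTORY_USER", "alice"), ("DIRECTORY_GROUP", "finance")}
```

For example, `authorize(grants, alice, "s3://corp-data/finance/ledger.csv", "READ")` matches only the group grant, while a `WRITE` request to the same object is denied because no grant covering it allows writes.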

So how does it work? A concrete scenario: you have your bucket with, say, PDFs, MP4s, JSON, and a bunch of other unstructured blob data, and you have users who want access to that data. Traditionally, you would provision IAM principals for those users, allow them to assume those roles, and use bucket policies to manage access. In the world of Access Grants, you first create what's called an Access Grants instance, which is a per-Region, per-account logical grouping for all your grants. Then you register your grants, the permissions saying who has what level of access to what, in the intuitive grant style we talked about. When clients need access to S3, they call the instance with a GetDataAccess request, saying, "Here's who I am, here's the S3 bucket, prefix, or object I would like to access; can I get access?" If the request is authorized based on the grants you registered, Access Grants vends back S3 credentials. These are standard AWS STS short-term credentials, and the client can then use them to access S3.
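The register-then-request flow above maps onto the `s3control` client in boto3. The sketch below assumes an Access Grants instance and a location (an S3 scope registered together with an IAM role that Access Grants assumes) already exist, created via `create_access_grants_instance` and `create_access_grants_location`; the account ID, location ID, group ID, and bucket names are placeholders. The client is passed in as a parameter so the functions can be tried against the small test double at the bottom without contacting AWS; `FakeS3Control` is a hypothetical stand-in, not part of boto3.

```python
def register_grant(s3control, account_id, location_id, sub_prefix,
                   grantee_type, grantee_id, permission):
    """Register one grant under an existing Access Grants location."""
    response = s3control.create_access_grant(
        AccountId=account_id,
        AccessGrantsLocationId=location_id,
        # Path under the location's registered scope, e.g. "finance/*".
        AccessGrantsLocationConfiguration={"S3SubPrefix": sub_prefix},
        # GranteeType is "DIRECTORY_USER", "DIRECTORY_GROUP", or "IAM".
        Grantee={"GranteeType": grantee_type, "GranteeIdentifier": grantee_id},
        Permission=permission,  # "READ", "WRITE", or "READWRITE"
    )
    return response["AccessGrantId"]


def request_credentials(s3control, account_id, target, permission="READ",
                        duration_seconds=900):
    """Call GetDataAccess and return kwargs usable with boto3.client("s3", ...)."""
    response = s3control.get_data_access(
        AccountId=account_id,
        Target=target,  # e.g. "s3://corp-data/finance/*"
        Permission=permission,
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]  # temporary STS-style credentials
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


class FakeS3Control:
    """Hypothetical test double that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def create_access_grant(self, **kwargs):
        self.calls.append(("create_access_grant", kwargs))
        return {"AccessGrantId": "example-grant-id"}

    def get_data_access(self, **kwargs):
        self.calls.append(("get_data_access", kwargs))
        return {"Credentials": {"AccessKeyId": "AKIDEXAMPLE",
                                "SecretAccessKey": "example-secret",
                                "SessionToken": "example-token"}}


client = FakeS3Control()  # in real use: boto3.client("s3control", region_name=...)
grant_id = register_grant(client, "111122223333", "example-location-id",
                          "finance/*", "DIRECTORY_GROUP", "example-group-id", "READ")
s3_kwargs = request_credentials(client, "111122223333", "s3://corp-data/finance/*")
```

With a real `s3control` client, the returned dictionary could be passed as `boto3.client("s3", **s3_kwargs)` to read objects under the granted prefix until the credentials expire.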

Before we built this feature, we saw a couple of large customers building very similar constructs themselves. People call this pattern a credential broker, credential vendor, session token generator, and so on. So let's walk through a standard scenario where Access Grants and Immuta might come into the picture and how they might help you.
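For readers who have not seen the home-grown "credential broker" pattern, here is a minimal in-memory sketch of its core loop: check the caller against a grants table, record an audit entry (who, what, when, and whether it was allowed), and hand back a short-lived, prefix-scoped token. Everything here is hypothetical and simplified: real brokers typically mint STS session credentials rather than random tokens, and the clock is passed in explicitly only so the behavior is deterministic.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Credential:
    token: str
    target: str       # prefix the credential is scoped to
    permission: str
    expires_at: float


@dataclass
class CredentialBroker:
    """Hypothetical home-grown credential vendor (not an AWS API)."""
    grants: dict                     # user -> set of (prefix, permission)
    ttl_seconds: float = 900.0
    audit_log: list = field(default_factory=list)

    def get_data_access(self, user, target, permission, now):
        allowed = any(
            target.startswith(prefix) and permission == perm
            for prefix, perm in self.grants.get(user, ())
        )
        # Every request is audited, whether or not it is allowed.
        self.audit_log.append((now, user, target, permission, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not {permission} {target}")
        return Credential(secrets.token_hex(16), target, permission,
                          now + self.ttl_seconds)


def is_valid(cred, object_url, permission, now):
    """A credential works only before expiry, under its prefix, for its permission."""
    return (now < cred.expires_at
            and object_url.startswith(cred.target)
            and permission == cred.permission)


broker = CredentialBroker(grants={"dana": {("s3://lake/ml/", "READ")}})
cred = broker.get_data_access("dana", "s3://lake/ml/", "READ", now=1000.0)
try:
    broker.get_data_access("dana", "s3://lake/hr/", "READ", now=1001.0)
except PermissionError:
    pass  # denied, but still recorded in the audit log
```

Access Grants moves this lookup, credential vending, and audit trail into S3 itself, which is why teams no longer need to operate a service like this.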

Let's say you work on data governance at your company, and your data lake stack is fairly complex. It has Snowflake, Databricks, Starburst, and of course Redshift. You need to govern that data, make sure the right people have the right access, enforce and register policies, and provision role-based access. So you have finance and marketing, you have roles set up for them, and you have permissions that govern their access. But increasingly, with the rise of generative AI and machine learning, you have new personas like data scientists and machine learning engineers. They need direct access to S3, probably from a notebook environment: they pull data from S3, do some exploratory analysis, and then feed that data directly to a machine learning model. But today you manage that access mostly with IAM and bucket policies, which is a somewhat separate paradigm from how you would do it with Databricks and Snowflake, in the more traditional analytics and relational database grant style.

What you want is to be able to govern your S3 access the same way you govern your data access for Databricks, Snowflake, Redshift, and Starburst. That's where Immuta can come in and help.

Building on the S3 Access Grants feature, what Immuta has built is, as Huey said, the ability to grant S3 access alongside cloud-native data platforms in a common security model and policy engine. What Immuta adds to existing concepts of role-based access control is attribute-based access control. This lets you take attributes of identities: an Immuta user abstracts over the equivalent concept of an AWS IAM identity or a human AD group or AD user, and that identity can have attributes, like what department they're in, which can get very fine-grained. You can also take attributes of data. You represent an S3 prefix from the privilege grants as what we call a data source in Immuta, and you can tag that S3 prefix, or the abstraction over it, as having specific attributes, for instance, contains PII. You can use something like Amazon Macie to actually classify data inside of files and then use the results to build access policy logic.
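As a rough illustration of attribute-based access control, the sketch below combines user attributes with data-source tags in a single rule. The rule itself is an assumption chosen for illustration and is not Immuta's policy language: a source tagged `Department:<name>` is readable only by users in that department, and a source tagged `PII` additionally requires a hypothetical `pii_approved` attribute. In practice, tags such as `PII` could come from a classifier like Amazon Macie; here they are hard-coded.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    attributes: dict = field(default_factory=dict)  # e.g. {"department": "finance"}


@dataclass
class DataSource:
    s3_prefix: str
    tags: set = field(default_factory=set)          # e.g. {"Department:finance", "PII"}


def can_read(user, source):
    """Hypothetical ABAC rule combining identity attributes with data tags."""
    departments = {t.split(":", 1)[1] for t in source.tags
                   if t.startswith("Department:")}
    # If the data is tagged with departments, the user must belong to one of them.
    if departments and user.attributes.get("department") not in departments:
        return False
    # PII-tagged data additionally requires an explicit approval attribute.
    if "PII" in source.tags and not user.attributes.get("pii_approved", False):
        return False
    return True


alice = User("alice", {"department": "finance", "pii_approved": True})
bob = User("bob", {"department": "finance"})
ledger = DataSource("s3://corp-data/finance/ledger/", {"Department:finance"})
payroll = DataSource("s3://corp-data/finance/payroll/", {"Department:finance", "PII"})
campaigns = DataSource("s3://corp-data/marketing/campaigns/", {"Department:marketing"})
```

The design point is that one rule covers every user and prefix: adding a new finance analyst or tagging a new prefix changes the outcome without writing a new grant by hand.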

And finally, you get detailed access audit via CloudTrail, based on the CloudTrail integration that S3 Access Grants provides, to meet privacy requirements. In the near term, that will all be centralized into Immuta in a platform-agnostic, unified data model.

I just wanted to share a quote from a joint customer that Huey and I both work with closely. This is from a principal software engineer at Booking.com, one of the folks responsible for building centralized access controls across Snowflake and S3. By leveraging both S3 Access Grants and the Amazon S3 integration from Immuta, they were able to build a single control plane for data owners and governors, the folks responsible for deciding who is authorized to access what, to manage data access at scale: regardless of whether the data is in S3, regardless of whether it is structured or unstructured, and whether it lives in their data lake or in their cloud-native data platform.

Moreover, because the integration is based on a new, native S3 access control capability, they said it gives them confidence that the controls will be enforced consistently. The fact that it's native access also means there's no agent and no proxy sitting in front of the data to protect it. You can think of Immuta as a control plane orchestrator: you write a single data access policy in Immuta, and we push it out to S3 Access Grants. Then, at data access time, you call the GetDataAccess endpoint that Huey mentioned a few slides back, and that's just an AWS-to-AWS call. There's no Immuta sitting in front of S3.
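The "write one policy, push it out as grants" idea can be sketched as a compile step: evaluate a single policy over every identity and data source, and emit one `create_access_grant` request per allowed pair. This is a hypothetical illustration of the control-plane pattern, not how Immuta's integration is built. The policy, the user identifiers (real directory grantees would be IAM Identity Center IDs), the account ID, and the location ID are placeholders. Each emitted dictionary uses the keyword arguments of boto3's `s3control` `create_access_grant` call, so `push` could send them with a real client; at read time the client then calls GetDataAccess directly, with nothing from the control plane in the data path.

```python
def compile_grants(policy, users, sources, account_id, location_id):
    """Evaluate one policy over all users and sources; emit grant requests."""
    requests = []
    for user_id, attrs in users.items():
        for sub_prefix, tags in sources.items():
            if policy(attrs, tags):
                requests.append({
                    "AccountId": account_id,
                    "AccessGrantsLocationId": location_id,
                    "AccessGrantsLocationConfiguration": {"S3SubPrefix": sub_prefix},
                    "Grantee": {"GranteeType": "DIRECTORY_USER",
                                "GranteeIdentifier": user_id},
                    "Permission": "READ",
                })
    return requests


def push(s3control, requests):
    """Send each compiled request to Access Grants via a boto3 s3control client."""
    return [s3control.create_access_grant(**r)["AccessGrantId"] for r in requests]


# Hypothetical single policy: PII-tagged prefixes require an approval attribute.
def policy(attrs, tags):
    return "PII" not in tags or attrs.get("pii_approved", False)


users = {"user-alice": {"pii_approved": True}, "user-bob": {}}
sources = {"finance/*": {"PII"}, "public/*": set()}
requests = compile_grants(policy, users, sources,
                          "111122223333", "example-location-id")
```

Here Alice receives grants on both prefixes and Bob only on `public/*`; changing the policy and recompiling is what keeps the S3 grants consistent with the rules defined for the other platforms.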

So, thank you everyone for coming to our talk. We do have time for questions if anybody has any.
