What’s new in Amazon DataZone

Sheka Verma: All right. Can you guys hear me? Yay? All right, I see a lot of familiar faces in the room. Thank you again for joining us today to hear more about Amazon DataZone. I am Sheka Verma, the head of product for Amazon DataZone. I'm also joined by my co-presenters: Steve McPherson, the general manager for Amazon DataZone, as well as Mahesh Chandni, one of our very prized customers, who will walk you through his journey with DataZone, which I think will be very useful for a lot of you.

All right. And I see a few other customers in the room here, whom I'll highlight as we get to those sections. Are you excited? Yeah. Did you get some food before coming here, or are you going to try to slide up? OK. All right, good, good, good. I'm glad you got some food, because we have a packed agenda for you.

So we at Amazon are very customer obsessed, as I hope you all know, so we're going to start and end with the customer. We start with the voice of the customer: what did we hear from our customers that led us to create Amazon DataZone? Then we will go through what's new in Amazon DataZone. If you heard Adam's keynote today, there's an exciting thing that he announced. Did anybody hear it? Did anybody hear Adam's keynote today? A few of us. Yeah, a few of you said that you heard it and that's why you came to the session. We have some exciting stuff to show you there, which will really simplify your cataloging and data discovery quite a bit, so I'm excited about that. We also have two demos, because we want you to see some of the new things we have created, so we'll share those. And then finally we end with our customer deep dive, where Mahesh will lead us through the journey on DataZone. I think you'll find it quite informative wherever you are in your journey to explore DataZone. So let's get started.

Every customer I meet wants to be data driven. Is there anybody here who doesn't want to be data driven? Oh right, everybody. That's why you're here. We want to really make data available to everybody in the organization, but it's hard. So let's quickly recap what all of you have told us about why being data driven is hard. Number one, finding the right data is hard, right? In this particular case, I'm just walking you through a simple example, which I am sure resonates with a lot of you: a sales, a marketing, and a finance team wanting to share data and collaborate amongst themselves. Does that resonate with anybody here? Yeah. All right, cool.

So, number one, finding the data is hard. Even when you find the data, getting access to it is hard: who's the owner? Can I trust this data? That is super hard. When you do get access to the data, you all have different tools of your choice, right? Some people like visual tools, some people like notebooks, some people like query editors. How do you hook up that same consistent data with the tool of your choice? That is super hard. Then you want to be able to work with a team; data analysis, analytics in general, is a team sport. So how do you share it securely with the team that you're working with? That is hard. And last but not least, making sure that you're following the rules and regulations that you're supposed to follow as part of your organization is very hard, because governance is currently hidden in each of these different data sources and each of the different tools that you're using for your analysis. All right.

So we all want to be enabling self-service analytics for our customers. What are the key components of self-service analytics? Let's quickly walk through them. Number one, your users need to be able to discover, understand, and access data across your organization. Number two, they need to be able to collaborate with multiple people on the same data problems: same data set, same consistent access, different people, and different tools. Then you need to be able to hook up to a variety of analysis and BI tools. How many people here use just one analytics tool? Nobody; that's what I thought. It's a very diverse landscape, right? So you want to be able to use the tools of your choice with the same consistent data set. Then you want to drive self-service through some sort of single pane that all your users can go to, so that you can point them to one thing and say, hey, go here, start your journey, and go from there. And last but not least, you want the mechanisms to govern data and tools consistently across the tools of your choice and across the data sources that you want to make available. Super duper hard today, but that's what we are after.

So as you can see, we have bold ambitions, and a bunch of this you can do across the sources that we support. I'd love to show you some of these things today. So just to recap: self-service analytics with Amazon DataZone, what does it help you do? It helps you catalog, discover, govern, and analyze data across your organizational boundaries. You have a set of data producers, who are the teams that own the data; these are your data owners. Revenue is often owned by a finance team, campaign data is usually owned by a marketing team, and so on and so forth, right? So you want the owners of the data to be able to share that data consistently, and the consumers of the data can use that data consistently within the governance guardrails of whatever the producer asked them to do.

Let's go a little deeper. How many folks have seen a version of this slide before? I just want to see. OK, so there are quite a few new people in the crowd here. Awesome, I love it. So what is within this square box in the middle here with DataZone? Number one, we start with something called a domain, which is really an organizational construct that you can use across your organization to decide the boundary within which data sharing and data consumption is very free flowing. Within that domain, you will have one catalog for all of that domain. You can have multiple projects and environments to associate the infrastructure that you want to use for your analytics tools, and you can have consistent governance across all of those projects and data sets.

So what is a domain, some of you may ask. Often I've found that customers like to think of domains as their lines of business, their organizational hierarchy. It could be sales, it could be marketing, it could be loans for a financial services company, it could be campaign analysis for an e-commerce company. It could be whatever you want it to be, but you need to think of it as how much you want to share data across that construct of a domain, because across that domain you get a portal that you can drive all your users to. So if you have one domain, for example, for your entire company, you can give one data home page to your entire company and say: go there, look for the data you need, and go from there to start your journey. And of course, this is all available through APIs, because most of you, I think, must have either homegrown solutions or other analytics solutions that you're already using that you'd want to hook up with DataZone. So we provide APIs to be able to do that very easily, and Mahesh is actually going to talk about that as part of his presentation as well. Does this make sense? Can I get a few nods or something? OK, cool. Thank you.
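As a rough illustration of that API surface, here is a minimal sketch of creating a domain programmatically. It assumes the boto3 "datazone" client; the role ARN and names are placeholders, and the exact parameters may differ by SDK version, so treat this as a sketch rather than a definitive recipe.

```python
import boto3

# Assumption: the boto3 "datazone" client is available in your SDK version.
dz = boto3.client("datazone", region_name="us-east-1")

# Create an organizational domain; the execution role ARN below is a placeholder.
domain = dz.create_domain(
    name="enterprise-analytics",
    description="Company-wide domain for self-service analytics",
    domainExecutionRole="arn:aws:iam::123456789012:role/DataZoneExecutionRole",
)

# The response includes the data portal URL you can hand to your users.
print(domain["id"], domain.get("portalUrl"))
```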

All right. So that ugly picture that we saw earlier, where things were separate and disconnected: what happens when you have DataZone is that you can go to DataZone for all of that. You can go to DataZone to find the data that you need, request and get access to it, connect it to the tools of your choice, collaborate on it with your team members with this construct of a project that we have introduced in DataZone, and follow the governance rules of your company, your line of business, your team, whatever that may be. Cool.

So we have this cool thing called a live poll in this session, which is where we can gather your input on some of these things. Oftentimes customers think of DataZone as an enterprise data catalog, or a data marketplace, or an enterprise data ecosystem. So our question is: do you have something like this? If you go to this QR code or the URL, you should be able to respond. So I'll give you a few minutes.

Yeah, everybody is doing something, awesome. So just take a few seconds and answer that question and we'll see the results right here in a few. All right, so let's see. Some folks have licensed the software, some build it themselves, some are evaluating options right now, and some said no. OK. All right, quite an even spread. I think you're going to find this presentation quite interesting and exciting. You know, at re:Invent we are all about reinventing, so if you have something already, it may give you ideas on how you expand it, change it, morph it into something else, or use DataZone: your choice. All right. Thank you, everybody.

So let me show you a quick demo of the story that I just told you, where there are two teams that want to share data between themselves. Sales has created a data set that finance wants access to. Finance also has multiple skill sets and personas that want access to different tools. So let's see how this would work in DataZone.

What we're going to do here is this: the finance team has a data project in DataZone. They will ask for access to the sales data set. Sales will approve their request, and then finance team members will be able to use a variety of tools of their choice with that data set. So let's go.

All right. So this is the DataZone portal, which users can access with their corporate credentials. It's single sign-on; this is not in the console, it's outside of the console. They just come to this web page. The finance team is going to their finance analytics project. On the right, you can see there is a variety of things available for them: there's a data warehouse, there's a data lake, there's SageMaker as well as QuickSight. SageMaker and QuickSight are available in private preview right now, so if you're interested, please do contact us or your account team and we can hook you up.

Now, they're looking for the sales data. They just found this data asset, retail sales transactions, and they're going to review some of the information that is available. There's a readme that is easily understandable. Since this was coming from Glue, there is technical information that we have pulled in directly from Glue; you get that. Now, the interesting part is some of this other information which is available here. You can see who the owner of the data is: Will Smith. I trust Will Smith, so I'm probably going to go for this data asset. There is additional information that can be added via metadata forms that you can configure for your domain, so this can all be very standard information across your assets. There's a data quality form so that you can see what kind of data quality score has been given to this particular asset. It just helps you understand and trust the data a little bit better.

So the finance team member who was looking for this is like, OK, this looks cool, the schema looks cool, I'm going to ask for access. They click on subscribe and just provide a quick justification. The workflow, you know, routes it to an approval. Now that the approval request has been created, we'll switch to the sales side, who just got a request for approving this particular data asset. They'll go into their incoming requests. They see that the request is there: OK, who is asking for this thingy? Click on this. OK.

Cool, Frank. OK, I know Frank. I know he needed this, so I'm going to approve it. Clicked on approve. Go ahead.

Now we'll switch back into the finance project. If we go to finance analytics and the data tab, you can see that there is a transactions table available here, and it's been added to three environments.

I'll tell you a little bit more about this: those three environments are what give you the three different kinds of tool sets that your finance users wanted access to.

So here we go. We went back into the project, we clicked on Athena, and we landed in Athena here. The important thing to note is that the context you were carrying, the project and the table that you just got access to, is carried here with you. You don't have to go and look for it anywhere, you don't have to establish new connections, whatnot; it's right there. You can preview it, and you were able to get to the data right here in Athena without having to dig through and find other things.

The plumbing aspect of connecting DataZone to all of these services is what we do in the background so that you don't have to do it yourself. It's readily available for you.

Now, the second thing: we have an analyst who wants to go into QuickSight. Again, you saw that there was a quick link there to get into QuickSight. They clicked on QuickSight, and if you've ever used QuickSight before, they just landed in here. You can see that, again, the data context of the project and the data set is carried with them.

So they're going to quickly create some analysis in QuickSight here. They click the same table; they're going to query the data quickly. We'll create a quick calculation. How many of you use QuickSight here? Any QuickSight users? Some of you. OK, cool.

So if you want to use this, let us know and we can hook you up in the private preview that we have going on. We'll create a quick calculation and then off we go. You can see that the data is available and you can do all of that very, very quickly.

There are also some very cool gen AI features that were released in QuickSight, if you caught that news. Essentially all of that is available with the governed data set that you just used via DataZone, which is super cool. All right, I looked at this. I'm going to create a quick graph. Looks good. I'm done; this works.

So you were able to get from DataZone to BI and do your analysis, and now I'm going to quickly show you SageMaker. Again, you're back in DataZone and you clicked on the SageMaker link. Now you want to use a notebook to do your analysis.

So you're landing in SageMaker Studio with this connection. Again, your context is carried. You can use the data with Data Wrangler or Studio notebooks, whatever you are used to using within SageMaker. And, you know, save the data set, update the data set, whatever, and it saves it back in its location.

How many folks here use SageMaker? A lot more, OK. All right. Well, if you want to use this, let us know. So again, here we are, kick-starting a notebook, and we'll do a quick query on the data set that we just showed you, so that we can conclude this part of the SageMaker journey.

In here, you'll see that the same table that we've been talking about is now available. You were able to readily get to it; you didn't have to go hunt for it from within the notebook. So that concludes this particular demo.
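As a rough idea of the kind of query a notebook user might run against the subscribed Glue table once fulfillment has happened, here is a sketch using the awswrangler library. The database and table names are hypothetical stand-ins for what the subscription would have materialized into this project's environment.

```python
import awswrangler as wr

# Hypothetical names: the Glue database and table that the subscription
# materialized into this project's data lake environment.
df = wr.athena.read_sql_query(
    sql="SELECT region, SUM(amount) AS total_sales "
        "FROM retail_sales_transactions GROUP BY region",
    database="finance_analytics_db",
)
print(df.head())
```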

I know it went rather quick, but I hope you saw that your users were always able to go back to DataZone and start their journey from there. They were able to find their asset and get access to it via the project. There were a few people in that project, right? So it's not just one person; it's a team that can be associated with the project. And then they were able to jump into the tools of their choice.

It was the same, consistent data and same governance across all of those. How cool is that? Like it? Yay. All right, cool. Well, we have cooler stuff coming.

OK. So let me talk a little bit about some of the things that I showed you there. One of the first things that people ask us when they start their journey with DataZone is: what the heck is a domain?

It's a very key organizing construct for you when you get started with DataZone, especially for large companies. I've seen customers start with one domain; I've seen some companies that want segregation start with multiple domains.

So imagine a scenario where everybody in your organization wants to share data very readily, but there is an HR team, and these HR people do not want to share their data. They can get a separate domain where it's very segregated and it's available only to the folks that they add to the domain.

So you can decentralize your teams like that, right? You can decide how, architecturally, you want to organize data across your company and set up domains that way. Then you can take the existing AWS accounts that you're already using and associate them with the appropriate domains.

So those domains now have an association with a set of accounts, and data can be shared amongst them. Of course, each domain is connected to your identity provider. You saw that, right? Super simple. No accounts to remember.

No IAM accounts, no roles to worry about in the AWS landscape or console, and of course it's available across multiple regions. So what does that allow you to do?

For each of those domains, you're sharing data assets very readily. And when we say data assets, it's, you know, Glue tables, it's Redshift tables, and there's something we call custom assets, which can be really anything. Anything you want to make available for a variety of folks within your organization for sharing purposes, you can add in DataZone as a custom asset, and it can come from anywhere too.

So anything that you can catalog in Glue can be made available as a discoverable item in DataZone, which is a very large list of things. Now, to further improve the searchability and findability of data, you can associate business glossaries and standard terms, and you can create metadata forms. In the example that I showed you, those were the little snippets of tables that I had stacked one under the other.

The data owner, the data quality information: those are all specific metadata forms that you can create and make available, and everything here is searchable across the catalog, so it's really super easy to find stuff.

So just a couple of screenshots to show you how easy it is. It's very simple: you search, and as you type it searches and gives you a listing. I don't know how much you can see, but in the screenshot on the right, you can see in the left panel that you can search things by glossary term.

There are a variety of asset types and facets that you can search on. The asset detail page went by quickly in our demo earlier, but here you can see the segregation of the metadata forms and how the technical metadata, business metadata, and glossary data are all available together. OK?

Projects. So the project that we talked about earlier, the finance analytics project, had multiple things on the right. If you remember, there was a data warehouse, a data lake, SageMaker, QuickSight, Athena; there was just a bunch of things on the right.

Those are environments within a project. Think of your project as a business-use-case-focused container where you want a few people to work together on a specific problem with a few data sets

and a few environments. The project gives you the ability to construct that sort of container for your folks. The users and the data get authorized at the project level, and the infrastructure and the tools are really at the environment level.

So that's how these two things work together. For those of you who may have tried DataZone during preview: environments are something that we introduced recently with our GA to give you that flexibility of infrastructure. So again, within a project you can have multiple environments, which allows you to go into the multiple tools that we support.
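If you script this instead of using the portal, a project and one of its environments could be created roughly like the sketch below. It assumes the boto3 datazone client; the domain ID and environment profile ID are placeholders, and exact parameter names may differ by SDK version, so check the docs before relying on it.

```python
import boto3

dz = boto3.client("datazone")

# A project is the business-use-case container: a few people plus a few data sets.
project = dz.create_project(
    domainIdentifier="dzd_exampledomainid",   # placeholder domain ID
    name="finance-analytics",
    description="Finance team collaboration on sales data",
)

# An environment ties the project to actual tooling/infrastructure, based on an
# environment profile (e.g. a data lake profile) that an administrator set up.
env = dz.create_environment(
    domainIdentifier="dzd_exampledomainid",
    projectIdentifier=project["id"],
    environmentProfileIdentifier="ep_exampleprofileid",  # placeholder profile ID
    name="finance-data-lake",
)
print(project["id"], env["id"])
```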

All right, cool. So we are now going to go into some of the automation and gen AI capabilities, and I'd love to invite Steve on stage here. Let me leave you with our second live poll question.

Think about it, and Steve will guide you through the next few slides. Thank you, Steve.

OK. Thank you. Well, thank you guys. So it seems like the polls are in, and there's generally a trend towards larger numbers of people needing access to data in a company, which, you know, is intuitive: when you have people doing work in an information worker environment, they need access to data. That makes sense.

And from that perspective, it resonates with one of the reasons that we built DataZone, which was to give people and teams easy access to data. So, Sheka took you through some of the foundational primitives of Amazon DataZone: the business data catalog itself, and the projects system, which allows us to tie infrastructure to grants.

Now, I want to walk you through the elements of DataZone that actually allow you to do governed sharing. Within DataZone, we have introduced the concepts of publication and subscription as a way of communicating access grants and data between users and between tools.

So with that, data producers, pretty obviously, go into DataZone, they advertise the data that exists, they curate it, and then they publish it to the catalog. Subscribers go to the catalog as well; they search for data, with reasonable context on it, and they request access to the data.

Now this is important, because normally this happens through sneakernet: someone sending an email, some document that goes somewhere, or very vast, complicated systems that people have to build to track all these things and understand them.

And so we've built these primitives of publication and subscription to facilitate that in a distributed organization. So I want to dive in a little bit more to walk you through these.

So, the publishing flow. If you're looking at publishing data from, say, the AWS Glue Data Catalog or Amazon Redshift, you can automate publishing through data sources, which can automatically create assets in your inventory in DataZone.

The tooling that we have for automated discovery covers the AWS Glue Data Catalog and Redshift; those go through automated publishing. If you have data that is not covered by those integrations, you can publish it through our API.

Also, through the UI you can do manual publishing of assets, and you can really cover anything that you want, any arbitrary description of things, especially with the flexibility that DataZone now has with custom asset types and metadata forms. All of this can be programmatically integrated as well.
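As one illustration of that programmatic path, the sketch below creates an asset of a custom type and publishes it to the catalog with the boto3 datazone client. The type identifier, form content, and IDs are placeholders, and the publish call shown here is my understanding of the API, so please verify the operation names and parameters against the current SDK documentation.

```python
import boto3
import json

dz = boto3.client("datazone")
DOMAIN = "dzd_exampledomainid"       # placeholder domain ID
PROJECT = "prj_exampleprojectid"     # placeholder owning project ID

# Register an asset of a custom asset type, attaching a metadata form you configured.
asset = dz.create_asset(
    domainIdentifier=DOMAIN,
    owningProjectIdentifier=PROJECT,
    name="marketing-campaign-extract",
    typeIdentifier="custom_dataset",          # a custom asset type defined in the domain
    formsInput=[{
        "formName": "DataOwnerForm",          # a metadata form configured for the domain
        "content": json.dumps({"owner": "marketing-team@example.com"}),
    }],
)

# Publish the asset so it becomes a discoverable listing in the catalog.
dz.create_listing_change_set(
    domainIdentifier=DOMAIN,
    entityIdentifier=asset["id"],
    entityType="ASSET",
    action="PUBLISH",
)
```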

Once you have these things in DataZone in your project, you can then curate the data, which is important to make it discoverable and usable and to really connect with why people may want it. And then from there, obviously, you publish it to the catalog, and that's what makes it discoverable to others.

Conversely, if you go to the subscribing side, a business user will have some need for data; they will search for it, find it, and request access to the data. This is again formalized through the subscription process. The request is then reviewed, and it is either approved or rejected.

And if it is approved, the grant is then fulfilled. Whether it is a Glue table, a Redshift table, or you have a custom integration to do your own subscription fulfillment, those assets are then materialized into the appropriate target.

So if it's a Glue table, it would be pushed into the appropriate Glue environment. And as you saw in the demo, the data was made available in QuickSight and SageMaker through that process of fulfillment.
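To make that request-and-approve handshake concrete, here is a sketch of the same flow through the API, again assuming the boto3 datazone client. All identifiers are placeholders and parameter names should be checked against your SDK version.

```python
import boto3

dz = boto3.client("datazone")
DOMAIN = "dzd_exampledomainid"   # placeholder domain ID

# Consumer side: the finance project requests access to a published listing.
req = dz.create_subscription_request(
    domainIdentifier=DOMAIN,
    subscribedListings=[{"identifier": "lst_examplelistingid"}],
    subscribedPrincipals=[{"project": {"identifier": "prj_financeprojectid"}}],
    requestReason="Quarterly revenue reconciliation",
)

# Producer side: the sales project owner reviews and approves the request.
dz.accept_subscription_request(
    domainIdentifier=DOMAIN,
    identifier=req["id"],
    decisionComment="Approved for finance analytics",
)
# Fulfillment then materializes the grant into the subscriber's environments,
# e.g. sharing a Glue table into the project's data lake environment.
```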

So with those primitives, you have a business data catalog, you have projects that can connect you to tooling, and you have a formalized way for your organization, in a distributed fashion, to actually publish things, subscribe to things, track it all, and get the appropriate permissions into the infrastructure.

So people can access the data. That's a whole bunch of stuff that people have had to build themselves for a long time, and it's been really hard and really frustrating.

One of the other big challenges here is actually taking all of that stuff, these basic capabilities, and getting beyond just the physical connectivity of the data to the point where people can actually understand it, like understand what the data is.

And when we talk to customers, specifically when they're building out their internal infrastructure for data sharing, there's a huge amount of effort that people have to put in, not only into the infrastructure of it, but also just the rote process of documenting data so that people can discover it: identifying tags, identifying different pieces of information, and generally what to do with it.

And so with that, I am very excited to announce the integration of generative AI into Amazon DataZone, specifically automated descriptions powered by generative AI.

This is something that was announced in the keynote today and is now live in the product. So if any of you are current Amazon DataZone users, you can go in and check it out on the site.

With this, beyond just facilitating the actual connectivity of data and the rote, platform-level part of it, we're also able to really aid search and discovery of valuable assets, recommend how to use the data, and give guidance to users, all through generative AI, and fundamentally to significantly increase the productivity of teams that want to get going in a data sharing environment.

OK. And with that, I am going to go into a demo and show you this. We'll pick up from the same finance team. The finance team has a data set, and I'm going to take this from nothing all the way to the end, where we've published it into the catalog, and we're going to use the automated generation to fill out the whole catalog information for it.

So with that, we'll kick it off. OK, so I'm a user, I'm going in, I'm going to create the data. We're starting from nothing. Here I am, clicking into Athena first. I'm going to do it like most of us: I'm going to create a table, and that table has some fairly cryptic column names and things in it. So this is it; it exists physically now. Next I'm going to go and make it discoverable to users.
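For reference, a table with that sort of cryptic naming could be created with a DDL statement like the sketch below, issued here through the boto3 Athena client. The database name, S3 locations, and column names are invented purely for illustration.

```python
import boto3

athena = boto3.client("athena")

# A deliberately cryptic DDL: short column names like "rgn" and "cty" are
# exactly what the automated descriptions later have to explain.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS pmt_onl_us (
    txn_id STRING,
    rgn    STRING,
    cty    STRING,
    amt    DOUBLE,
    ts     TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3://example-bucket/payments/online/us/'
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "finance_raw"},                  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
```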

So I'm going into DataZone, and I'm going to run an automated metadata job. I'm kicking that off now, and I'm going to see the results coming back. You can see that DataZone has already figured out how to take the cryptic name of the table and turn it into something readable, "payments online." Now, I know that this specifically needs to change because it's about the US market, so I can edit it. This is what you do: you curate, you edit the form here.

So from there, I'm now going to go look at some other interesting data in the schema. We've gone from rgn for region to actual full names of columns and all of this just started from the create statement that we had a minute ago in Athena. So this is great. Now, I've gotten some column names out of it and I've gotten the table name stuff. But what if I could actually fully describe the data with that one click that we just did there?

I have now generated thoughtful, computer-generated summaries of the data, how to use it, and what it means within the organization. After reviewing it, I can see this is pretty good. It has also extended the descriptions to the columns as well, to take us even further from the cryptic names all the way to something where somebody who is coming in to read it could fully understand what this data set is and could discover it in the catalog.

And in a minute and 40 seconds, as you just saw in the video, we've gone from something being created from nothing in Athena, in infrastructure, to being pushed all the way through with full documentation into a place where people can easily discover the data across the organization and use it.
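If you want to trigger that same automated-description job outside the portal, my understanding is that the datazone API exposes a metadata generation run for it. The sketch below is based on that assumption (verify the operation names and parameters in your SDK version), and all identifiers are placeholders.

```python
import boto3

dz = boto3.client("datazone")

# Kick off generative-AI business descriptions for one asset (assumed API).
run = dz.start_metadata_generation_run(
    domainIdentifier="dzd_exampledomainid",
    owningProjectIdentifier="prj_financeprojectid",
    type="BUSINESS_DESCRIPTIONS",
    target={"type": "ASSET", "identifier": "ast_exampleassetid"},
)

# Poll the run before reviewing the suggested names and descriptions in the portal.
status = dz.get_metadata_generation_run(
    domainIdentifier="dzd_exampledomainid",
    identifier=run["id"],
)["status"]
print(status)
```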

Before this generative AI integration, before automated descriptions, you may have started with something that had very cryptic column names like we have here, and very sparse information. This probably looks familiar from every wiki that anyone has ever created to try to help people understand data: you start with the physical stuff that's obvious and hope that the users will figure it out, that the context will let people know that rgn obviously means region and cty means country.

And with this integration, we have now taken that very sparse information and made it something that is fully searchable within DataZone, with very practical, useful descriptions of what the data means: what do the individual columns mean, how can you use this in processing, what sort of analysis might you want to do with it?

And what previously would have been hours of work is done in a matter of seconds and is instantly available in your organization for everyone to benefit from. This reduces the infrastructure work that you've historically had to build, and the manual work and thoughtfulness of having someone actually sit down, go through hundreds of tables or hundreds of data sets, and describe them in a way that the human on the other end can make sense of.

So that's very cool. We just got to GA, and we're obviously cranking out stuff as we hear from customers. The generative AI space is critically important for us, and we are moving quickly to extend these capabilities. Think back to what I described earlier in the context of publication and subscription. When you think of what publication and subscription mean in terms of a project, where a project may represent a daily report or a weekly report, you want to give access to the purpose of that report. You don't want to give it to a person, Sally, who may win the lottery and go somewhere.

So that convention of the project creates a closure of access grants that can be used to govern automated processes, and in general teams who want to interact with data, but it can also be used fundamentally to govern the access that generative AI processes have. So if you want to say that when training this model, or using this RAG application, I want it to have these specific grants, these are the things that come through DataZone projects, and we'll be building further integration with AWS services to make that easier and easier.

Similarly, we're really extending the search capabilities of the catalog to have interactive chats where you can say, hey, what is the data related to this business report? Or, you know, I'm trying to find this event. And then actually getting to the point where we can ask questions like: hey, there was an event in our system last week, I want to understand which customers were impacted by it, and interact with the LLM, which has full understanding of the catalog (which is otherwise public within the domain, public to any user), as if you were going to a human who understood every single thing that was in the catalog and could guide you through it.

Yes, you want to get that; yes, you want to get that. And then it can say, here's the type of analysis that you want to do. These are the things that we're continuing to build on with this work in generative AI. With that, I will hand off to Mahesh.

I hope you can hear me. All good. All right, good afternoon, everyone. My name is Mahesh Shawa and I lead the data platforms team at CPP, also known as Canadian Pension Plan Investments. We are an investment management firm based out of Canada. Today I'll be sharing with you our experience of using the DataZone service. But before that, a very quick word about my employer. CPP Investments is a global investment management organization. We are stewards of the Canadian Pension Plan. The fund is valued at over $500 billion, and we rank among the world's topmost pension funds. According to a recent report by Global Sovereign Wealth Fund, CPP delivered the highest 10-year returns, at 10.9%, whereas other pension funds averaged 7.4%. In just over 20 years, we have grown CPP assets from $36 billion to a little over $570 billion. That's an amazing feat.

So how do we do that? Of course, a lot of credit goes to an amazing, talented team at the fund, but there's also a lot of rigor as far as processes and supporting technologies are concerned. There is segregation of duties across multiple investment functions, and every investment decision is validated. This requires a lot of information flow across various departments, and data sharing is key. All of this data has to be highly accurate; there's no room for any inconsistency in our business.

All right. So before I start, let's talk about our architecture before and after we moved to the cloud. In the legacy state, to solve their data needs, departments typically copied data from source operational systems to their local databases. This was done using batch, project-specific ETL pipelines, which were built and managed by central IT teams using a combination of Informatica, DataStage, and other standard ETL tools. Over a period of time, this led to a lot of data sprawl and data consistency issues. There was also heavy reliance on central IT teams and on tribal knowledge, because of the lack of metadata. Departments were losing trust in the data, and teams were doing manual reconciliations just to ensure that the data was accurate, wasting a lot of valuable time and effort in the process.

To solve our legacy-state data problems, we built a cloud-native data platform on AWS using data mesh principles. First, we federated the ownership of the data to the teams that had the data knowledge, essentially creating data products. Next, we created a data platform that secures and virtualizes all data in the organization, enabling in-place data sharing. This solved our data sprawl problem, minimized our data consistency issues, and simplified our landscape.

Every box you see in this architecture diagram is an AWS account. In the center, you see a data platform account, which virtualizes all data in the organization through a centralized enterprise data catalog. This architecture is an implementation of the hub-and-spoke data mesh pattern; some of you may be familiar with it.

In this pattern, the hub enables federation, and the spokes are various data domains which house the data products as well as the data producers and consumers. The hub does not contain any data. Instead, it manages the entire data mesh via a central control plane, and the control plane integrates and orchestrates various services across all these multiple AWS accounts. The control plane also collects and stores metadata centrally, and it enables data producers and consumers to collaborate amongst themselves in a self-service manner.

We designed our data platform with self-service in mind; we were very conscious of avoiding a ticket-based culture. Building a self-service interface and a unified abstraction over a distributed data mesh which spans multiple AWS accounts requires a lot of technical expertise, custom engineering, and integration across multiple AWS services. We created a custom control plane which simplifies the platform onboarding experience and runtime operations by hiding the technical complexities of the various AWS services.

The control plane onboards an AWS account onto the platform and creates the various components required for entitlements, security, data catalog, observability, and many other data services. The control plane also democratizes data services so that producers and consumers can collaborate amongst themselves using the data platform in a self-service manner.

Leveraging the control plane, a data producer can publish data onto the platform by themselves. At data onboarding time, they specify a basic set of minimum metadata, which the control plane takes, applies entitlements to, and shares across the organization. Leveraging the enterprise data catalog, consumers search and discover the data using an in-house-built data portal and request access to it if needed, and the control plane provisions the user-level access based on the rules set up by the publishers.
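Purely as an illustration of the kind of access grant such a control plane might issue when it provisions user-level access (this is not CPP's actual implementation), here is a sketch of a Lake Formation table grant using the boto3 lakeformation client, with placeholder names throughout.

```python
import boto3

lf = boto3.client("lakeformation")

# Hypothetical example: grant a consumer role SELECT on a published Glue table,
# driven by the access rules the publisher set up for that data product.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/finance-analyst"
    },
    Resource={
        "Table": {
            "CatalogId": "123456789012",
            "DatabaseName": "sales_domain_db",
            "Name": "retail_sales_transactions",
        }
    },
    Permissions=["SELECT"],
)
```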

So all of this workflow I just explained is built in house. In summary, our custom control plane democratizes data services, accelerates mesh implementation, and simplifies mesh operations.

Building a comprehensive data mesh control plane requires a lot of technical skills and a lot of maintenance overhead, which we always wished we could somehow offload to a vendor as a service. Unfortunately, there was no vendor out there in the market which implemented a comprehensive data mesh as a product. When we heard the announcement of the DataZone service, we were very excited, because it aligned very well with our vision of self-service as well as democratization for producers and consumers.

DataZone resonates very well with our control plane, and we are very happy to have partnered with AWS during the build of the service. It not only helps us simplify our architecture but also makes the interactions between our producers and consumers almost frictionless. And most importantly, it's an AWS managed service and appears to be the missing layer, or the link, that we were always looking for from a vendor to simplify and mature our architecture.

So far, we have had a very positive experience with the service. Our data stewards really like the data portal interface, which helps them curate the data using business name generation, metadata forms, as well as the business glossary. We really like the environments concept, as it helps encapsulate and accelerate data enablement across multiple accounts and multiple consumption patterns. The managed subscription workflow and its integration with Lake Formation fit very well with our service account provisioning use case. And the automatic data sharing across multiple accounts, which also integrates with Lake Formation, simplifies our custom engineering solution for catalog sharing; these were all parts of the control plane.

As the next step, we are working very closely with AWS and the DataZone team and moving forward with the adoption of DataZone in our platform. This will help replace parts of our control plane as well as our in-house-built data portal. We're also looking forward to testing and integrating the new gen AI features. And we're working with the DataZone team on integrating additional capabilities into the service, such as tag-based access control, data quality, data profiling, and data lineage, all built into DataZone, which will further improve data comprehension, data security, and trust in the data for our consumers.

From the CPP side, we continue to work very closely with AWS to mature and simplify how our data products are built, discovered, and consumed across the organization. We appreciate AWS for their customer-focused approach and for building solutions for customers like us, so that we can focus on solving business challenges. That's all I had for my lightning talk; I'm more than happy to connect offline and have some more detailed discussions. Feel free to connect, and thank you very much for listening in. Back to Sheka.

That was awesome. Thank you, Mahesh and Steve. Was that awesome? Yay, fired up. OK, cool. All right. So I see a few more customers in the crowd here, but I just wanted to highlight a few more stories, which are all available online for you to see: Bristol Myers, Web It, Thought Garden, Health Holus, a variety of companies across many industries have started using DataZone, and we're getting really good, interesting feedback. And like Mahesh mentioned, we love working closely with customers, so we continue to enhance the service based on all of this feedback.

So with that, let's do our final live poll question, and then we'll go into Q&A. Based on what you know so far and what we've shown you, does this fill a current gap in your portfolio? We would love to know. Let's take a few. OK, cool. Need to learn more, awesome. Well, that's why we are here. We have a ton of sessions in the next few days as well. If you are just hearing about DataZone, please do check out the additional sessions we have; I'll share more details here. But, well, thank you folks so much. Do take a couple of minutes to put in the feedback in the survey. Really appreciate you guys. Thanks.
